Can AI diagnose, treat patients better than doctors? Israeli study finds out

Evaluators highlighted several advantages of the AI system over human physicians. 

A MEDICAL ROBOT designed to facilitate cancer screening using artificial intelligence is displayed during a technology start-ups and innovation fair last year in Paris. (photo credit: JULIEN DE ROSA/AFP via Getty Images)

Many physicians tremble when they read about the high quality of artificial intelligence recommendations for diagnosis and treatment of patients, equal to or better than those of human doctors. No one believes that physicians will become obsolete due to AI, but the advance can improve their performance.

A new study led by Prof. Dan Zeltzer, a digital health expert from the Berglas School of Economics at Tel Aviv University (TAU), compared the accuracy of such recommendations made by AI to those of physicians at the famed Cedars-Sinai Medical Center in Los Angeles. Cedars-Sinai, which runs CS-Connect, a virtual urgent-care clinic, decided to collaborate with an Israeli start-up called K Health.

The paper was recently presented at the annual conference of the American College of Physicians (ACP) and published in the journal Annals of Internal Medicine under the title “Comparison of initial artificial intelligence (AI) and final physician recommendations in AI-assisted virtual urgent care visits.”

“Cedars-Sinai operates a virtual urgent care clinic offering telemedical consultations with physicians who specialize in family and emergency care,” Zeltzer explained.

“Recently, an AI system was integrated into the clinic – an algorithm based on machine learning that conducts initial intake through a dedicated chat, incorporates data from the patient’s medical record, and provides the attending physician with detailed diagnostic and treatment suggestions at the start of the visit – including prescriptions, tests, and referrals,” he said. 

“When confidence is sufficient, AI presents diagnosis and management recommendations (prescriptions, laboratory tests, and referrals),” the digital health expert said. “After interacting with the algorithm, patients proceed to a video visit with a physician who ultimately determines the diagnosis and treatment. To ensure reliable AI recommendations, the algorithm – trained on medical records from millions of cases – offers suggestions only when its confidence level is high, giving no recommendation in about one out of five cases. 

“In this study, we compared the quality of the AI system’s recommendations with the physicians’ actual decisions in the clinic.”

THE RESEARCHERS examined a sample of 461 online clinic visits over one month during the summer of 2024. The study focused on adult patients with relatively common symptoms – respiratory, dental, urinary, and vaginal. In all visits reviewed, patients were initially assessed by the algorithm, which provided recommendations, and then treated by a physician in a video consultation. 

Afterwards, all recommendations – from both the algorithm and the physicians – were evaluated by a panel of four doctors with at least a decade of clinical experience, who rated each recommendation on a four-point scale: optimal, reasonable, inadequate, or potentially harmful. The evaluators assessed the recommendations based on the medical histories of the patients, the information collected during the visit, and transcripts of the video consultations.

The compiled ratings led to compelling conclusions: AI recommendations were rated as optimal in 77% of cases, compared to only 67% of the physicians’ decisions; at the other end of the scale, AI recommendations were rated as potentially harmful in a smaller portion of cases than physicians’ decisions (2.8% versus 4.6%). In 68% of the cases, the AI and the physician received the same score; in 21% of cases, the algorithm scored higher than the physician; and in 11% of cases, the physician’s decision was considered better.
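The arithmetic behind these figures can be checked with a short sketch. It uses only the percentages reported above; the variable and function names are illustrative assumptions, not part of the study, and the script derives nothing beyond simple differences and shares.

```python
# Figures as reported in the article (percent of rated cases).
# All names below are hypothetical, chosen for this illustration only.
optimal = {"ai": 77.0, "physician": 67.0}
harmful = {"ai": 2.8, "physician": 4.6}
agreement = {"same_score": 68, "ai_higher": 21, "physician_higher": 11}


def point_gap(rates):
    """Percentage-point difference, AI minus physician."""
    return round(rates["ai"] - rates["physician"], 1)


optimal_gap = point_gap(optimal)   # AI rated optimal more often
harmful_gap = point_gap(harmful)   # negative: AI rated harmful less often

# The three agreement categories should account for every rated case.
breakdown_total = sum(agreement.values())

# Among cases where the two scores differed, the share in which
# the algorithm received the higher score.
differing = agreement["ai_higher"] + agreement["physician_higher"]
ai_share_of_differing = round(agreement["ai_higher"] / differing * 100, 1)

print(f"Optimal gap: {optimal_gap} percentage points")
print(f"Harmful gap: {harmful_gap} percentage points")
print(f"Breakdown total: {breakdown_total}%")
print(f"AI scored higher in {ai_share_of_differing}% of differing cases")
```

In other words, the AI led by 10 percentage points on optimal ratings, and when the two scores differed, the algorithm was the higher-rated one in roughly two out of three cases.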

The explanations supplied by the evaluators for the differences in ratings highlight several advantages of the AI system over human physicians. 

Advantages of AI

First, AI adheres more strictly to medical association guidelines – for example, not prescribing antibiotics for a viral infection. Second, AI more comprehensively identifies relevant information in the medical record, such as recurrent cases of a similar infection that may influence the appropriate course of treatment. And third, AI more precisely identifies symptoms that could point to a more serious condition, such as eye pain reported by a contact lens wearer, which may indicate an infection.

Physicians, on the other hand, are more flexible than the algorithm and have an advantage in assessing the patient’s actual condition. For example, if a COVID-19 patient reports shortness of breath, a doctor may recognize it as relatively mild respiratory congestion, whereas the AI, relying solely on the patient’s answers, might refer the patient unnecessarily to the emergency room.

ZELTZER CONCLUDED that, “in this study, we found that AI, based on a targeted intake process, can provide diagnostic and treatment recommendations that are, in many cases, more accurate than those made by doctors.

“One limitation of the study is that we do not know which of the physicians reviewed the AI’s recommendations in the available chart or to what extent they relied on these recommendations,” he said. “Thus, the study only measured the accuracy of the algorithm’s recommendations and not their impact on the physicians.”

He added that the study is unique because it tested the algorithm in a real-world setting with actual cases, while most studies focus on examples from certification exams or textbooks. 

“The relatively common conditions included in our study represent about two-thirds of the clinic’s case volume, and thus the findings can be meaningful for assessing AI’s readiness to serve as a tool that supports a decision by a doctor in his practice,” Zeltzer said.

“We can see a time soon when algorithms assist in an increasing portion of medical decisions, bringing certain data to the doctor’s attention, helping them to make faster decisions with fewer human errors,” he predicted. “Of course, many questions still remain about the best way to implement AI in the diagnostic and treatment process, as well as the optimal integration between human expertise and AI in medicine.”

When asked if Israeli doctors are fearful that AI could replace them, Zeltzer said the answer was mixed.

“I am not aware of any Israel-specific polls on that. In general, the sentiment toward AI tends to be mixed: excitement about potential alongside concerns about harmful and disruptive impacts, and claims that it is overhyped. 

“In healthcare, the debate isn’t new,” he explained. “Back in 2016, Geoffrey Hinton, often called the ‘godfather of AI’ and later a Nobel laureate, predicted that AI would outperform radiologists within five years. Nine years later, AI accuracy in radiology has improved dramatically, but no radiologist has lost their job to AI.

“There are several reasons for that. AI excels at some tasks but lags in others. Health systems are cautious and move slowly. Safety, trust, and regulation all slow adoption. Therefore, it is far more likely that we will see AI supporting doctors and care workflows, especially in systems already short on clinicians, rather than replacing them.”