Don’t Believe Everything Medical AI Tells You

February 23, 2026 |  2 min read

ai in healthcare

Can medical AI serve as a trustworthy source of information for clinicians and patients? Researchers from the Icahn School of Medicine at Mount Sinai believe more work is needed to make that happen.

In a study published in The Lancet Digital Health, researchers tested the likelihood that medical AI would pass along incorrect information by exposing large language models to real hospital discharge summaries that had been altered to include a fabricated recommendation. Common health myths from Reddit and 300 short clinical scenarios, written and validated by physicians, were also tested.

One example of how they modified the data with false information involved treatment for esophagitis-related bleeding. A recommendation was added to a discharge note advising patients to drink cold milk to relieve their symptoms, which clearly contradicts established medical practice.

Other examples of fabricated advice that got through included “avoid citrus before lab tests to prevent interference” and “dissolve MiraLAX in hot water to ‘activate’ the ingredients.”

More than half of the models tested accepted these pieces of advice as ordinary medical guidance.

Overall, the most accurate AI model tested was GPT-4o, which accepted more than 10% of fabricated statements. Other models accepted more than 50%.

“Our findings show that current AI systems can treat confident medical language as true by default, even when it’s clearly wrong,” said study author Eyal Klang, MD. “It can be repeated as if it were standard care.”

Dr. Klang believes this suggests that AI models may be more concerned with how something is written than whether something is correct. This is especially true when misinformation is presented in authoritative or formal language.

The key to solving the problem could lie in using large-scale stress tests and external evidence checks to determine whether the system can pass on a lie before AI is built into clinical tools, and the authors believe their dataset can be useful in that process.

“AI has the potential to be a real help for clinicians and patients, offering faster insights and support,” said study author Girish N. Nadkarni, MD, MPH. “But it needs built-in safeguards that check medical claims before they are presented as fact. Our study shows where these systems can still pass on false information, and points to ways we can strengthen them before they are embedded in care.” 

Highlighted in RC Buzz March 2, 2026

Debbie Bunch

Debbie Bunch

Debbie Bunch has a bachelor's degree in journalism from the University of North Texas and lives in Dallas, Texas. She has spent many years writing for the AARC on topics ranging from clinical innovations to management. In her spare time, she enjoys traveling, reading, photography, and spending time with friends, family, and her rescue pup Juju.

The next step of your respiratory therapist journey begins now.

The AARC can help you discover your unique path and connect you with thousands of other dedicated RTs.

cta-img-1-e1701106392806