A multi-agent LLM outperforms human experts in diagnosing rare diseases.
For millions of people living with rare diseases, getting a diagnosis can feel like an endless nightmare. On average, patients spend more than five years bouncing between specialists, collecting misdiagnoses, and undergoing unnecessary treatments before anyone figures out what’s actually wrong. In rare disease medicine, this exhausting journey even has a name: the diagnostic odyssey.
A new AI system called DeepRare may be about to change that.
What Makes Rare Diseases So Hard to Diagnose
There are roughly 7,000 known rare diseases — conditions affecting fewer than 1 in 2,000 people — and together they affect over 300 million people worldwide. About 80% are genetic in origin. The problem is that no single doctor, no matter how experienced, can hold all of that knowledge in their head. These conditions are often complex and multisystemic, cases are scarce (making traditional AI training nearly impossible), and hundreds of new rare genetic diseases are identified every year.
Standard AI tools have struggled here for exactly these reasons. You can’t simply train a model on thousands of examples when only a handful of cases exist.
A Smarter Approach: Multi-Agent AI
DeepRare, developed by an international research team and published in Nature, takes a fundamentally different approach. Rather than trying to train a model directly on rare disease cases, it uses a general-purpose large language model (DeepSeek-V3) to orchestrate a network of more than 40 specialized tools. Think of it less like a single expert and more like a well-coordinated diagnostic team.
The system works in three tiers. A central host LLM acts as the lead clinician: it breaks down the diagnostic problem, decides which tools to deploy, synthesizes evidence, and runs self-reflection loops to challenge its own conclusions. A middle layer of six specialized agent modules handles tasks such as extracting standardized symptoms from clinical notes and searching real-time knowledge sources, including PubMed and Wikipedia. The outer tier consists of the raw data sources those agents draw on.
When something doesn’t add up, the system loops back, digs deeper, and tries again — much like a careful physician reconsidering a case.
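To make the three-tier loop concrete, here is a minimal toy sketch in Python. It is not DeepRare's code. It contains no LLM, and the two "agents" are hypothetical lookup functions over invented data (`DISEASE_PHENOTYPES`, `LITERATURE_EVIDENCE`). The "self-reflection" step is reduced to a simple confidence check: the top candidate must lead the runner-up by an assumed `margin`. That check decides whether the host stops or loops back to consult another agent.

```python
from typing import Callable

# Hypothetical toy data: disease names and phenotype sets are invented
# for illustration and do not come from DeepRare or any real database.
DISEASE_PHENOTYPES = {
    "Disease A": {"seizures", "ataxia", "cataracts"},
    "Disease B": {"seizures", "hearing loss", "short stature"},
}
LITERATURE_EVIDENCE = {
    "Disease B": {"hearing loss", "short stature"},
}

# An "agent" maps a patient's symptom set to candidate scores in [0, 1].
Agent = Callable[[set[str]], dict[str, float]]


def phenotype_agent(symptoms: set[str]) -> dict[str, float]:
    """Score each disease by the fraction of its phenotypes the patient shows."""
    return {d: len(p & symptoms) / len(p) for d, p in DISEASE_PHENOTYPES.items()}


def literature_agent(symptoms: set[str]) -> dict[str, float]:
    """Score diseases by overlap with (toy) literature-derived evidence."""
    return {d: len(p & symptoms) / len(p) for d, p in LITERATURE_EVIDENCE.items()}


def host_diagnose(symptoms: set[str], agents: list[Agent], margin: float = 0.4):
    """Hypothetical host loop: consult agents one at a time, pooling scores.

    After each round the host checks whether its top candidate leads the
    runner-up by `margin`. If not, it loops back and consults the next agent.
    Returns the ranked candidates and how many agents were consulted.
    """
    totals: dict[str, float] = {}
    ranking: list[str] = []
    rounds = 0
    for rounds, agent in enumerate(agents, start=1):
        for disease, score in agent(symptoms).items():
            totals[disease] = totals.get(disease, 0.0) + score
        # Average over the agents consulted so far; a disease an agent did
        # not mention implicitly contributes 0 for that round.
        combined = {d: t / rounds for d, t in totals.items()}
        ranking = sorted(combined, key=combined.get, reverse=True)
        top = combined[ranking[0]]
        runner_up = combined[ranking[1]] if len(ranking) > 1 else 0.0
        if top - runner_up >= margin:
            break  # confident enough: stop consulting further agents
    return ranking, rounds


ranking, rounds = host_diagnose({"seizures", "hearing loss"},
                                [phenotype_agent, literature_agent])
print(ranking, rounds)  # ['Disease B', 'Disease A'] 2
```

With the symptoms `{"seizures", "hearing loss"}`, the phenotype agent alone leaves the two candidates too close to call (2/3 versus 1/3). The host therefore loops back and consults the literature agent before settling on "Disease B" after two rounds. A real system would replace these lookups with tool calls and LLM reasoning. The control flow shown here is only an assumed simplification of the "check, then dig deeper" pattern.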
Outperforming the Best in the Field
The results are striking. Tested across more than 6,400 cases spanning 2,919 distinct rare diseases and 14 medical specialties, DeepRare ranked the correct diagnosis first 57% of the time, outpacing the next-best AI model by nearly 24 percentage points.
More impressively, when pitted directly against five experienced rare disease physicians (each with at least a decade of experience), DeepRare came out ahead: the AI ranked the correct diagnosis first 64% of the time, versus 55% for the human doctors. The physicians were allowed to use search engines but not AI tools, which approximates real-world clinical conditions.
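The headline figures above measure how often the correct disease is the system's first-ranked answer, a metric usually called top-1 accuracy (or Recall@1). The short Python sketch below shows how such a score is computed from ranked guesses. The function name `top_k_accuracy` and the four toy cases are hypothetical and are not drawn from the study's data.

```python
def top_k_accuracy(ranked_predictions: list[list[str]], truths: list[str],
                   k: int = 1) -> float:
    """Fraction of cases whose true diagnosis appears among the first k guesses.

    Hypothetical helper for illustration; k=1 gives top-1 accuracy.
    """
    if len(ranked_predictions) != len(truths):
        raise ValueError("need exactly one ranked list per case")
    if not truths:
        raise ValueError("no cases to score")
    hits = sum(truth in preds[:k] for preds, truth in zip(ranked_predictions, truths))
    return hits / len(truths)


# Four invented cases: each inner list is a ranked differential diagnosis.
preds = [["X", "Y"], ["Y", "X"], ["Z", "X"], ["X", "Z"]]
truths = ["X", "X", "Z", "Z"]
print(top_k_accuracy(preds, truths, k=1))  # 0.5
print(top_k_accuracy(preds, truths, k=2))  # 1.0
```

Because each score is a fraction of cases, a gap such as "24 percentage points" is an absolute difference between two such fractions, not a relative improvement.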
Critically, the system’s reasoning was also validated. Independent physicians reviewed a sample of 180 cases and found that 95% of the references DeepRare cited were both accurate and directly relevant to its conclusions. This isn’t a black box — it shows its work.
Why This Is a Big Deal
What DeepRare represents isn’t just a better diagnostic tool; it’s a potential lifeline for patients lost in the system. For families who have spent years and fortunes searching for answers, a system that can surface the likely rare disease diagnosis in seconds could be transformative.
It also signals a broader shift in how AI is being applied to medicine. The future may not be a single all-knowing model, but rather intelligent systems that know how to ask the right questions, consult the right sources, and check themselves before they wreck someone’s life with a wrong answer.
The diagnostic odyssey has gone on long enough. Tools like DeepRare suggest the end of it may finally be in sight.
This topic was featured in Great News podcast episode 34.
Source: Lifespan Research Institute

