
Xinti Sun 1, Qiyang Hong 1, Mengyan Zhang 1, Yuyan Li 1, Tingwei Chen 1, Zigeng Huang 1, Guihan Liang 2, Wenjun Tang 1, Sulin Xu 1, Xiaolin Ni 1, Junling Pang 1, Peixing Wan 3, Erping Long 4
Cell Rep Med. 2026 Jan 20;7(1):102547.
PMID: 41494532 PMCID: PMC12866169 DOI: 10.1016/j.xcrm.2025.102547
Abstract
Medical reasoning is fundamental to clinical decision-making, underpinning tasks such as patient communication, diagnosis, and treatment planning. Inspired by psychological findings that peer interaction promotes self-correction, we introduce model confrontation and collaboration (MCC), a debate intelligence framework that goes beyond static ensemble methods: diverse large language models (LLMs) iteratively refine their reasoning through structured, multi-round confrontation and collaboration, integrating critique and self-reflection. On multiple-choice benchmarks, MCC achieved mean accuracies of 92.6% on MedQA and 84.8% on PubMedQA, and performed strongly on the medical subsets of MMLU. In long-form medical question answering, MCC outperformed all individual LLMs and the domain-specific LLM Med-PaLM 2 in both physician and layperson evaluations. In diagnostic dialog tasks, MCC further excelled in both history-taking and diagnostic accuracy, reaching a top-1 diagnosis rate of 80%. These results position MCC as a scalable, model-agnostic framework that advances medical reasoning through collaborative deliberation.
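The multi-round debate loop the abstract describes can be sketched in a few lines. This is a minimal, illustrative skeleton, not the authors' implementation: the names (`debate`, `Agent`), the early-stopping rule, and the majority-vote aggregation are assumptions, and in a real system each agent callable would wrap a call to a different LLM.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical sketch of a multi-round confrontation/collaboration loop.
# An "agent" is any callable mapping (question, peer_answers) -> answer;
# in practice each would prompt a different LLM with its peers' critiques.
Agent = Callable[[str, Sequence[str]], str]

def debate(question: str, agents: Sequence[Agent], rounds: int = 3) -> str:
    """Run a structured multi-round debate and return a consensus answer."""
    # Round 0: each agent answers independently, with no peer context.
    answers = [agent(question, []) for agent in agents]
    for _ in range(rounds - 1):
        # Confrontation/collaboration: each agent re-answers after seeing
        # every peer's current answer, enabling critique and self-correction.
        answers = [
            agent(question, [a for j, a in enumerate(answers) if j != i])
            for i, agent in enumerate(agents)
        ]
        # Early stop once the panel fully agrees (assumed convergence rule).
        if len(set(answers)) == 1:
            break
    # Final aggregation: majority vote over the last round's answers.
    return Counter(answers).most_common(1)[0][0]

# Toy agents: one that never revises, two that defer to their peers' majority.
stubborn = lambda q, peers: "B"
def conformist(q, peers):
    return Counter(peers).most_common(1)[0][0] if peers else "A"

print(debate("Which diagnosis?", [stubborn, conformist, conformist]))  # → B
```

In the toy run, the conformists start at "A" but switch after seeing the stubborn agent's answer, so the panel converges on "B" in the second round; with real LLM agents, the peer context would carry full rationales rather than bare labels.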