ChatGPT Now Dispenses Advices to Doctors Like a Colleague

Hospitals have begun using “decision support tools” powered by artificial intelligence that can diagnose disease, suggest treatment or predict a surgery’s outcome. But no algorithm is correct all the time, so how do doctors know when to trust the AI’s recommendation? A new study led by Qian Yang, assistant professor of information science in the Cornell Ann S. Bowers College of Computing and Information Science, suggests that if AI tools can counsel the doctor like a colleague – pointing out relevant biomedical research that supports the decision – then doctors can better weigh the merits of the recommendation.

The researchers will present the new study, “Harnessing Biomedical Literature to Calibrate Clinicians’ Trust in AI Decision Support Systems,” in April (23 to 28) at the Association for Computing Machinery CHI Conference on Human Factors in Computing Systems. Previously, most AI researchers have tried to help doctors evaluate suggestions from decision support tools by explaining how the underlying algorithm works, or what data was used to train the AI. But an education in how AI makes its predictions wasn’t sufficient, Yang said. Many doctors wanted to know if the tool had been validated in clinical trials, which typically does not happen with these tools.

A doctor’s primary job is not to learn how AI works,” Yang said. “If we can build systems that help validate AI suggestions based on clinical trial results and journal articles, which are trustworthy information for doctors, then we can help them understand whether the AI is likely to be right or wrong for each specific case.”

To develop this system, the researchers first interviewed nine doctors across a range of specialties, and three clinical librarians. They discovered that when doctors disagree on the right course of action, they track down results from relevant biomedical research and case studies, taking into account the quality of each study and how closely it applies to the case at hand.

Yang and her colleagues built a prototype of their clinical decision tool that mimics this process by presenting biomedical evidence alongside the AI’s recommendation. They used GPT-3 to find and summarize relevant research. (ChatGPT is the better-known offshoot of GPT-3, which is tailored for human dialogue.)