We already knew that ChatGPT was not trustworthy, especially when it comes to our health. But a new study has just shown that OpenAI's famous chatbot is especially bad at diagnosing diseases in children: when put to the test, it failed in more than 80% of cases.
The new investigation was carried out by a team at Cohen Children's Medical Center in New York. The researchers asked the most recent version of ChatGPT to solve 100 pediatric cases published between 2013 and 2023 in JAMA Pediatrics and NEJM, two major medical journals in the United States.
The methodology was simple. The researchers pasted the text from each case study and gave ChatGPT an instruction: "List a differential diagnosis and a final diagnosis." A differential diagnosis is a method used to suggest a preliminary diagnosis, or several of them, based on the patient's medical history and physical examinations. The final diagnosis refers to the definitive cause of the symptoms.
The answers given by the artificial intelligence were rated by two other pediatricians, who were blinded to the rest of the study. There were three possible scores: "correct," "incorrect," and "does not fully capture the diagnosis."
In the end, ChatGPT gave a correct answer in only 17 of the 100 pediatric cases. On 11 occasions, it did not fully capture the diagnosis. In the remaining 72, the artificial intelligence simply got it wrong. Counting both erroneous and incomplete results, the chatbot failed 83% of the time. "This study highlights the invaluable role that clinical experience plays," the authors note.
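The arithmetic behind that headline figure is straightforward. A quick sketch using the counts reported in the study:

```python
# Score counts reported in the study (100 pediatric cases)
correct = 17
incomplete = 11   # "does not fully capture the diagnosis"
incorrect = 72

total = correct + incomplete + incorrect
# Both incomplete and incorrect answers count as failures
failure_rate = (incomplete + incorrect) * 100 / total

print(total)         # 100
print(failure_rate)  # 83.0
```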
Pediatricians cannot rely on ChatGPT to diagnose children
The researchers pointed out that diagnosis in children is particularly challenging: in addition to taking all the symptoms into account, one must consider how age affects them. In ChatGPT's case, the group found that the chatbot had difficulty detecting known relationships between various conditions, connections that an experienced doctor would identify.
The chatbot, for example, was unable to make the connection between autism and scurvy (a vitamin C deficiency). Neuropsychiatric conditions such as autism can lead to restricted diets and cause vitamin deficiencies. But ChatGPT did not notice this and in one case ended up diagnosing a rare autoimmune disease instead.
The World Health Organization (WHO) had already warned last year that "care" must be taken when using artificial intelligence tools such as ChatGPT in medical care. The organization cautioned that the data used to train these systems may be "biased" and can generate misleading information that could harm patients.
Another study, from Long Island University in New York, warns that ChatGPT is also very bad at answering medication queries. Those researchers asked the chatbot 39 questions related to drug use. OpenAI's artificial intelligence failed in 75% of cases.
ChatGPT is clearly not ready to be used as a diagnostic tool, whether for children or adults. But the team at Cohen Children's Medical Center believes more selective training could improve results. Meanwhile, they say, these types of systems can be useful for administrative tasks or for writing instructions to patients. Nothing more, for now.