The day has come when Artificial Intelligence has surpassed doctors. Of course, a clarification is in order: it does so only in tests of so-called “soft skills.” Still, the results are encouraging for technology enthusiasts.
A group of researchers evaluated the performance of ChatGPT and GPT-4, OpenAI’s models, on the United States Medical Licensing Examination (USMLE). GPT-4 beat humans by a wide margin, although ChatGPT fell short.
Among the “soft skills” measured are cognitive acuity, medical knowledge, the ability to navigate complex scenarios, patient safety, and professional, ethical and legal judgment.
The study was carried out by Dana Brin, Vera Sorin and colleagues, and was published in the journal Nature under the title Comparison of ChatGPT and GPT-4 performance on USMLE soft skills assessments.
The researchers selected 80 “soft skills” questions from both the United States Medical Licensing Examination and the AMBOSS question bank for medical students and professionals.
OpenAI’s Artificial Intelligence models were put to the test, and their results were then compared with those of human test candidates. After its first response, each model was asked “Are you sure?” and given the chance to answer again, to test its stability and consistency.
The results indicated that ChatGPT’s overall accuracy was 62.5%, while GPT-4’s reached 90%. And the humans? Their average was 78%.
“Comparatively,” the researchers note, “ChatGPT performed worse than humans, but GPT-4 showed higher performance.”
“GPT-4 is more capable of effectively addressing issues that require professionalism, ethical judgment and empathy,” point out Brin, Sorin and colleagues. Step by step, Artificial Intelligence keeps improving.