ChatGPT-3.5 Shows Moderate Accuracy on Medical Genetics Exam, Research Reveals

A new study has found that ChatGPT-3.5 answered questions from a specialist medical genetics exam with moderate accuracy and consistent results across repeated attempts, indicating potential for educational support. However, its limitations in complex, domain-specific reasoning highlight the need for further development before wider application. The study was published in the journal Laboratory Medicine by Klaudia Paruzel and Michał Ordak.
To test ChatGPT-3.5, the researchers selected 456 publicly available questions from the Polish national specialist exam in medical laboratory genetics. The questions were classified by topic (genetic changes, diagnostic techniques, clinical cases, and calculations) and by complexity (simple or complex). Each question was put to ChatGPT three times to assess not only the correctness of its answers but also their consistency across repeated sessions. Statistical tests were then used to compare performance across topic categories, complexity levels, and repeated sessions.
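The scoring step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the record layout, the function names, and the four toy records are all assumptions invented for the example, and the statistical tests used in the study are not reproduced here because the article does not name them.

```python
from collections import defaultdict

def accuracy_by_group(records, key):
    """Fraction of correct answers per group, pooling all repeated attempts."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        for answer in rec["answers"]:
            total[rec[key]] += 1
            correct[rec[key]] += int(answer == rec["correct"])
    return {group: correct[group] / total[group] for group in total}

def accuracy_by_round(records):
    """Fraction of correct answers in each repeated session, in order."""
    rounds = len(records[0]["answers"])
    return [
        sum(rec["answers"][i] == rec["correct"] for rec in records) / len(records)
        for i in range(rounds)
    ]

# Toy records, invented purely for illustration (NOT data from the study).
# Each question carries its topic, its complexity, the answer key, and the
# model's answer in each of three sessions.
records = [
    {"category": "calculations", "complexity": "simple",
     "correct": "B", "answers": ["B", "B", "B"]},
    {"category": "clinical case", "complexity": "complex",
     "correct": "A", "answers": ["C", "A", "C"]},
    {"category": "genetic changes", "complexity": "simple",
     "correct": "D", "answers": ["D", "D", "A"]},
    {"category": "diagnostic techniques", "complexity": "complex",
     "correct": "C", "answers": ["C", "C", "C"]},
]

by_category = accuracy_by_group(records, "category")
by_complexity = accuracy_by_group(records, "complexity")
by_round = accuracy_by_round(records)

print(by_category)
print(by_complexity)
print(by_round)
```

Per-group and per-session accuracies like these are the quantities that would then be compared with a statistical test to judge whether the differences between categories, complexity levels, or sessions are larger than chance.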
Key Findings
- Overall Accuracy: ChatGPT correctly answered 59% of the questions, statistically significant (P < 0.001).
- By Category:
  - Calculation-based questions: 71% accuracy
  - Genetic methods and genetic changes: ~60% accuracy
  - Clinical case-based questions: 37% accuracy
- By Complexity (P = 0.001):
  - Simple questions: 63% accuracy
  - Complex questions: 43% accuracy
- Consistency: The AI model performed consistently across the three repeated sessions (P = 0.43), which reflects reliable output even when the same questions are asked repeatedly.
The study concluded that although ChatGPT-3.5 shows moderate accuracy and stable performance on medical laboratory genetics exam questions, it falls short on complex questions and clinical case-based reasoning. Consequently, the current version may assist in education but is not yet adequate for advanced or high-stakes use in genetic medicine. Further advances in AI reasoning and domain adaptation will be required before these tools can be introduced into professional medical education or practice with confidence.
Reference:
Klaudia Paruzel, Michał Ordak. Assessment of ChatGPT-3.5 performance on the medical genetics specialist exam. Laboratory Medicine, 2025, lmaf038. https://doi.org/10.1093/labmed/lmaf038