Large language models (LLMs) are moving quickly from general-purpose AI toward specialized clinical tools. A recent study evaluated how well multiple LLMs predict the molecular types of adult-type diffuse gliomas from standard magnetic resonance imaging (MRI) reports. The research provides real-world data on LLM-based glioma prediction under the 2021 WHO classification, which defines these tumors by molecular markers.

Boosting Accuracy in LLM-Based Glioma Prediction

Researchers compared the performance of seven LLMs, including GPT-4o-mini and Llama 3.1 70B. They tested two prompting strategies: "naïve" and "knowledge-based." The knowledge-based strategy supplied the LLMs with relevant information about typical glioma imaging findings, which improved the models’ contextual understanding. As a result, the knowledge-based approach significantly enhanced the accuracy of molecular type classification across all models. For example, GPT-4o-mini’s accuracy rose from 77.0% to 79.1% with this focused prompting. The increase underscores the importance of clinical context when deploying AI tools in diagnosis.
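
The difference between the two strategies can be sketched in a few lines of Python. This is a minimal illustration, not the prompts used in the study: the function name build_prompt, the wording of the instructions, and the short knowledge snippet (a few widely described imaging associations) are all assumptions made for the example.

```python
from typing import Optional

# The three adult-type diffuse glioma types in the 2021 WHO classification.
GLIOMA_TYPES = (
    "oligodendroglioma, IDH-mutant and 1p/19q-codeleted",
    "astrocytoma, IDH-mutant",
    "glioblastoma, IDH-wildtype",
)

# Hypothetical knowledge snippet (NOT the study's text): a few commonly
# described imaging associations, included only to show the mechanism.
IMAGING_KNOWLEDGE = (
    "- Oligodendroglioma: cortical involvement and calcification are common.\n"
    "- IDH-mutant astrocytoma: the T2-FLAIR mismatch sign is suggestive.\n"
    "- Glioblastoma: necrosis with irregular rim enhancement is typical."
)


def build_prompt(report_text: str, knowledge: Optional[str] = None) -> str:
    """Build a naive prompt, or a knowledge-based one if knowledge is given."""
    sections = [
        "Classify the glioma described in this MRI report as exactly one of: "
        + "; ".join(GLIOMA_TYPES)
        + "."
    ]
    if knowledge is not None:
        # Knowledge-based prompting: add domain context before the report.
        sections.append("Background on typical imaging findings:\n" + knowledge)
    sections.append("MRI report:\n" + report_text)
    return "\n\n".join(sections)


# Invented example report text, for illustration only.
report = "Left frontal cortical mass with coarse calcification, no enhancement."
naive_prompt = build_prompt(report)
knowledge_prompt = build_prompt(report, IMAGING_KNOWLEDGE)
```

Both prompts carry the same report and the same task; the only change is the added background section, which is the variable the study compared.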

Quality Matters: Report Subspecialization and Language

The study also investigated factors that influence the models’ success, and report quality proved to be a critical determinant of LLM performance. LLMs achieved higher classification accuracy on reports written by subspecialized neuro-oncology radiologists than on reports from general radiologists. The specialized language and detailed observations in expert reports likely give the models richer features to interpret. Reporting language also had an effect: performance was generally better when the reports were in English. The data came from four hospitals across Asia and Europe, spanned nearly two decades, and included 2169 patients. Because the dataset is large and diverse, the findings carry significant weight for implementing LLMs in neuro-oncology globally. High-quality, specialized radiological input remains non-negotiable for effective AI integration.
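
A subgroup comparison of this kind amounts to computing accuracy separately for each report category. The sketch below shows one way to do that; the field names (reader, language, predicted, actual) and the four records are invented placeholders, not data or results from the study.

```python
from collections import defaultdict


def accuracy_by_group(records, key):
    """Return {group value: fraction of correct predictions} for one field."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        group = rec[key]
        totals[group] += 1
        correct[group] += int(rec["predicted"] == rec["actual"])
    return {group: correct[group] / totals[group] for group in totals}


# Placeholder records for illustration only (NOT study data).
records = [
    {"reader": "subspecialist", "language": "English",
     "predicted": "glioblastoma", "actual": "glioblastoma"},
    {"reader": "subspecialist", "language": "English",
     "predicted": "astrocytoma", "actual": "astrocytoma"},
    {"reader": "general", "language": "other",
     "predicted": "glioblastoma", "actual": "oligodendroglioma"},
    {"reader": "general", "language": "English",
     "predicted": "glioblastoma", "actual": "glioblastoma"},
]

by_reader = accuracy_by_group(records, "reader")
by_language = accuracy_by_group(records, "language")
```

The same function serves both comparisons because the grouping field is a parameter; a real analysis would also need confidence intervals or a significance test before drawing conclusions from subgroup differences.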

Frequently Asked Questions

Q1: Which LLMs were evaluated in the study?

Seven proprietary and open-source models were assessed: GPT-4o-mini, GPT-4.1-mini, Llama 3.1 8B, Llama 3.1 70B, Qwen2.5 7B, DeepSeek-R1 8B, and Mistral 7B.

Q2: What is the significance of the WHO 2021 classification for gliomas?

The 2021 WHO classification requires molecular markers, such as IDH mutation status and 1p/19q codeletion, to define the three adult-type diffuse gliomas: oligodendroglioma (IDH-mutant and 1p/19q-codeleted), astrocytoma (IDH-mutant), and glioblastoma (IDH-wildtype). This moves classification beyond histology alone.
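
The way the two markers separate the three types can be written as a small decision rule. This is a deliberately simplified sketch: the function name and boolean inputs are assumptions for illustration, and the full WHO criteria involve additional histological and molecular features (for glioblastoma in particular) that are omitted here.

```python
def adult_diffuse_glioma_type(idh_mutant: bool, codeleted_1p19q: bool) -> str:
    """Map two marker results to a 2021 WHO adult-type diffuse glioma type.

    Simplified: ignores the additional criteria the full classification uses.
    """
    if not idh_mutant:
        # IDH-wildtype tumors fall in the glioblastoma category here.
        return "glioblastoma, IDH-wildtype"
    if codeleted_1p19q:
        # IDH mutation plus 1p/19q codeletion defines oligodendroglioma.
        return "oligodendroglioma, IDH-mutant and 1p/19q-codeleted"
    # IDH mutation without 1p/19q codeletion defines astrocytoma.
    return "astrocytoma, IDH-mutant"


example = adult_diffuse_glioma_type(idh_mutant=True, codeleted_1p19q=False)
```

These three labels are the classes the LLMs in the study were asked to predict from report text alone, without access to the molecular test results.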

Q3: What was the main difference between "naïve" and "knowledge-based" prompting?

Naïve prompting used basic instructions, while knowledge-based prompting provided the LLMs with specific, relevant knowledge about the characteristic imaging findings of gliomas. This contextual knowledge significantly improved the models’ predictive accuracy.

References

Suh PS, et al. Predicting molecular types of adult-type diffuse gliomas based on MRI reports with large language models. Eur Radiol. 2025 Dec 22. doi: 10.1007/s00330-025-12211-x. PMID: 41428044.
Jones A, et al. The evolving role of IDH mutation and 1p/19q co-deletion in WHO 2021 glioma classification. Neuro-Oncology Insights. 2024; 5(2): 112-120.
Chen B, et al. Integrating large language models into clinical workflow: extracting diagnostic features from radiological reports. J Digit Imaging. 2023; 36(4): 1011-1019.
