The timely communication of critical and unexpected findings in diagnostic imaging has a significant impact on patient outcomes, and AI tools that analyze radiology reports are emerging as a way to streamline this process. A recent study (Umeno et al., Eur Radiol 2025) examined whether fine-tuned large language models (LLMs) can accurately identify high-priority (HP) radiology reports, that is, reports containing unexpected findings of high urgency or severity. Flagging these reports quickly is essential for efficient clinical decision-making.
Performance of LLMs in Triage
The study used 1,906 reports for training and 176 for testing, with a balanced ratio of HP to non-HP cases. The researchers fine-tuned four LLMs: Llama2 7B, Llama3 8B, Llama3 Elyza 8B, and Llama 3.1 8B. They also compared four input settings, ranging from the findings alone to a comprehensive set of clinical information, so that the best-performing combination of model and input could be identified.
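Each input setting amounts to choosing which report fields the model sees. The sketch below is a hypothetical illustration, not the study's code: the `Report` class, its field names, the `INPUT_SETTINGS` table, and the cumulative grouping of fields into four settings are all assumptions. The article names the findings, the referring department, the clinical diagnosis, and the examination request details as inputs, but not the exact composition of each setting or the prompt format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Report:
    # Hypothetical record; field names are illustrative, not the study's schema.
    findings: str
    referring_department: Optional[str] = None
    clinical_diagnosis: Optional[str] = None
    examination_request: Optional[str] = None


# Hypothetical cumulative input settings, from findings-only to the fullest set.
INPUT_SETTINGS = {
    "findings_only": ["findings"],
    "findings_dept": ["findings", "referring_department"],
    "findings_dept_dx": ["findings", "referring_department", "clinical_diagnosis"],
    "full": ["findings", "referring_department", "clinical_diagnosis", "examination_request"],
}


def build_input(report: Report, setting: str) -> str:
    """Join the fields selected by `setting` into one labelled text block."""
    parts = []
    for field in INPUT_SETTINGS[setting]:
        value = getattr(report, field)
        if value:  # skip missing fields instead of emitting empty sections
            label = field.replace("_", " ").capitalize()
            parts.append(f"{label}: {value}")
    return "\n".join(parts)


r = Report(findings="Example findings text.", referring_department="Orthopedics")
print(build_input(r, "findings_dept"))
```

With `setting="findings_dept"`, only the findings and the referring department are passed on, which mirrors the field combination that performed best in the study. Fields missing from a given report are skipped rather than emitted as empty sections.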
The Llama3 Elyza 8B model, given the findings and the referring department as input, performed best, with an area under the precision-recall curve (PRAUC) of 0.962, an area under the receiver operating characteristic curve (ROCAUC) of 0.968, an accuracy of 0.915, and a recall (sensitivity) of 0.932. Adding further clinical context, such as the clinical diagnosis before the examination or details of the examination request, did not improve performance. This suggests that a small, targeted set of inputs can be as effective as a more comprehensive one.
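To make the reported metrics concrete, the following minimal sketch computes accuracy, recall, ROCAUC, and a PRAUC estimate for a binary HP/non-HP classifier in plain Python. The labels and scores are toy values, not data from the study; the 0.5 decision threshold is an assumption; and average precision is only one common way to estimate PRAUC (the study's exact computation is not described here). Tied scores are not given special treatment in `average_precision`.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), treating 1 as HP and 0 as non-HP."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred) -> float:
    """Fraction of all reports classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def recall(y_true, y_pred) -> float:
    """Sensitivity: fraction of true HP reports that were flagged."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def roc_auc(y_true, scores) -> float:
    """Probability that a random HP report outscores a random non-HP one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores) -> float:
    """Mean precision at the rank of each HP report (one PRAUC estimate)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / sum(y_true)


# Toy example (not study data): 1 = HP, 0 = non-HP.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(round(accuracy(y_true, y_pred), 3))           # 0.667
print(round(recall(y_true, y_pred), 3))             # 0.667
print(round(roc_auc(y_true, scores), 3))            # 0.889
print(round(average_precision(y_true, scores), 3))  # 0.917
```

ROCAUC and average precision depend only on how the scores rank HP reports relative to non-HP reports, whereas accuracy and recall also depend on the chosen threshold. This is why a single model can be described both by threshold-free AUC values and by an operating-point recall such as 0.932.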
Improving Workflow with AI Triage of Radiology Reports
Accurately identifying HP radiology reports has clear clinical relevance. Automatically flagging urgent reports could support faster communication of critical findings, which in turn may speed up clinical decisions and improve patient safety. Automating this initial triage step is therefore a promising way to improve the efficiency of the radiology workflow, allowing physicians to direct their immediate attention to the most critical cases and ensure timely intervention.
Frequently Asked Questions
Q1: What defines a high-priority (HP) radiology report in this study?
HP reports are those containing unexpected findings classified as having high urgency or severity, based on recommendations from the Academy of Medical Royal Colleges.
Q2: Which Large Language Model demonstrated the best performance in identifying HP reports?
The fine-tuned Llama3 Elyza 8B model, using the findings and the referring department as input, performed best, achieving a PRAUC of 0.962 and a ROCAUC of 0.968.
References
- Umeno A et al. Identification of high-priority radiology reports with unexpected findings using fine-tuned large language models. Eur Radiol. 2025 Dec 24. doi: 10.1007/s00330-025-12252-2. PMID: 41441998.
