The Large Language Model (LLM) revolution has reached the heart of radiology informatics. RadLex, a widely adopted radiology-specific ontology, standardizes terminology for both clinical and research use. However, its coverage of real-world clinical radiology reports is often limited because radiologists use a wide range of linguistic variations. To address this, researchers conducted a study assessing the impact of an LLM-generated RadLex expansion on multinational chest CT report datasets, and the method proved both practical and scalable for improving natural language processing (NLP) applications in the field.
RadLex Limitations and LLM’s Role
RadLex provides a standardized vocabulary, but linguistic variation in free-text reports consistently limits its utility in real-world NLP applications, and historical efforts to expand ontologies have been manual and time-consuming. The resulting coverage gap presents a major barrier to high-fidelity data extraction. The new study used an LLM, specifically Gemini 2.0 Flash Thinking, to generate a large-scale expansion of the RadLex lexicon. The LLM produced lexical variants, including morphologic and orthographic variants, acronyms, and abbreviations, as well as strict semantic synonyms for 40,000 RadLex preferred terms. Researchers applied detailed constraints during generation to maintain strict semantic alignment, then tested the expanded terminology list against five separate datasets of chest CT reports from Korea, Spain, Turkey, and the United States.
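To make the evaluation concrete, a lexicon expansion like this is typically applied through dictionary-based concept matching: each surface variant maps back to its RadLex preferred term, and a report "covers" a concept if any variant appears in its text. The sketch below is a minimal, hypothetical illustration (the tiny `LEXICON`, the example report, and the matching strategy are assumptions for demonstration, not the study's actual pipeline, which involves hundreds of thousands of entries):

```python
import re

# Hypothetical miniature lexicon: maps surface variants (synonyms,
# abbreviations, orthographic variants of the kind the LLM generated)
# to RadLex preferred terms. Illustrative entries only.
LEXICON = {
    "ground-glass opacity": "ground-glass opacity",
    "ground glass opacity": "ground-glass opacity",
    "ggo": "ground-glass opacity",
    "pulmonary emphysema": "emphysema",
    "emphysema": "emphysema",
    "pleural effusion": "pleural effusion",
}

def extract_concepts(report: str) -> set[str]:
    """Return the preferred terms whose variants appear in the report.

    Variants are matched case-insensitively on word boundaries. Note this
    measures lexical coverage only; it does not handle negation or context.
    """
    text = report.lower()
    found = set()
    for variant in LEXICON:
        if re.search(r"\b" + re.escape(variant) + r"\b", text):
            found.add(LEXICON[variant])
    return found

report = "Diffuse GGO in both lungs. No pleural effusion. Mild emphysema."
print(sorted(extract_concepts(report)))
# → ['emphysema', 'ground-glass opacity', 'pleural effusion']
```

Adding variants like "GGO" is exactly what raises lexical coverage: with only the preferred term "ground-glass opacity" in the dictionary, this report would miss that concept entirely.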
Key Findings: The LLM RadLex Expansion
The sheer scale of the LLM-generated content is immediately apparent. The original RadLex-provided expansion contained 17,515 terms; the LLM-generated expansion added 208,465 lexical variants and 69,918 synonyms. This increase translated directly into performance gains across all multinational datasets. The LLM-generated expansion achieved a lexical coverage rate of 81.9% to 85.6%, well above the 67.5% to 75.3% achieved by the RadLex-provided expansion. It also demonstrated greater semantic recall, ranging from 81.6% to 91.4%, compared with 64.0% to 80.3%. Precision scores for the LLM expansion were slightly lower, falling between 94.8% and 98.2%, versus the RadLex-provided expansion's 100.0%. Nevertheless, the F1 score, which balances precision and recall, was higher for the LLM-generated list (0.91-0.95 vs 0.86-0.91), confirming that the LLM approach successfully balanced term coverage and semantic fidelity.
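The precision-recall tradeoff above is easy to verify by hand: the F1 score is the harmonic mean of precision and recall, so a small precision loss is outweighed by a large recall gain. The quick computation below uses illustrative endpoints drawn from the reported ranges (these are not paired per-dataset values, just a sanity check on the arithmetic):

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Illustrative endpoints from the reported ranges, not paired per-dataset results.
llm_f1 = f1(0.948, 0.914)     # LLM expansion: slightly lower precision, higher recall
radlex_f1 = f1(1.000, 0.803)  # RadLex-provided: perfect precision, lower recall

print(round(llm_f1, 2), round(radlex_f1, 2))
# → 0.93 0.89
```

Both values land inside the reported F1 ranges (0.91-0.95 and 0.86-0.91), illustrating why the higher-recall LLM expansion wins overall despite sub-100% precision.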
Clinical and Research Implications
The LLM-based approach offers a practical, scalable, and generalizable solution for refining and updating radiology ontologies. This is crucial because LLMs are now extensively used in various radiological tasks, including report generation and clinical decision support. By improving the lexical coverage and semantic recall of RadLex, the foundation for subsequent NLP applications becomes far more robust. Essentially, better ontology coverage facilitates the creation of high-quality, labeled datasets for machine learning, which directly aids in addressing the growing volume of medical images and the radiologist shortage. Furthermore, standardizing terminology through an expanded RadLex enhances data interoperability and supports critical workflows like quality assurance and research data collection. This methodology can be applied to other specialized medical terminologies, ultimately benefiting EHR integration and global clinical research.
Frequently Asked Questions
Q1: What problem does the LLM RadLex expansion solve?
It solves the problem of limited coverage in the existing RadLex ontology. Radiologists use a wide range of linguistic variations (synonyms, abbreviations, spelling variants) in free-text reports, which prevents Natural Language Processing (NLP) tools from accurately recognizing medical concepts.
Q2: How did the LLM-generated expansion perform compared to the original RadLex?
The LLM expansion significantly outperformed the original by achieving a higher lexical coverage rate and a greater F1 score (0.91-0.95 vs 0.86-0.91). This means it successfully identified more terms with a strong balance of precision and recall, with only a small loss of semantic precision.
References
- Lee T et al. Large Language Model-Generated Expansion of the RadLex Ontology: Application to Multinational Datasets of Chest CT Reports. AJR Am J Roentgenol. 2026 Jan 14. doi: 10.2214/AJR.25.34243. PMID: 41532616.
- Zhang Y et al. The impact of large language models on radiology: a guide for radiologists on the latest innovations in AI. Insights Imaging. 2024 Mar 29.
- Chen C, Ma Y, Zhang Y, et al. Analysis of RadLex Coverage and Term Co-occurrence in Radiology Reporting Templates. AMIA Annu Symp Proc. 2012;2012:124-133.
- RadLex radiology lexicon. RSNA. Available at: https://www.rsna.org/Informatics/radlex-radiology-lexicon. Accessed January 15, 2026.
