Weakly-supervised natural language processing with BERT-Clinical for automated lesion information extraction from free-text MRI reports in multiple sclerosis patients
Abstract
Purpose: To investigate how bidirectional encoder representations from transformers (BERT)-based models can help extract treatment response information from free-text radiology reports.

Materials and methods: This study involved 400 brain MRI reports from 115 participants with multiple sclerosis. New MRI lesion activity, including new or enlarging T2 (newT2) and enhancing T1 (enhanceT1) lesions used to assess treatment responsiveness, was identified with BERT-based named entity recognition. Two other associated entities were also identified: the remaining brain MRI lesions (regT2) and lesion location. Report sentences containing any of the four entities were labeled for model development, totaling 2568 sentences. Four established BERT models were investigated, each integrated with a conditional random field layer for lesion versus location classification and trained with varying sample sizes (500–2000 sentences). Regular expressions were then applied for lesion subtyping. Models were evaluated with a flexible F1 score, among other metrics.

Results: Clinical-BERT performed best, achieving the highest testing flexible F1 scores of 0.721 for lesion and location classification, 0.741 for lesion-only classification, and 0.771 for regT2 subtyping. Only Clinical-BERT improved consistently as the training sample size grew, and it also achieved the best area under the curve of 0.741 for lesion classification when trained with 2000 sentences. PubMed-BERT achieved the highest testing flexible F1 scores of 0.857 for location-only classification, and 0.846 and 0.657 for subtyping newT2 and enhanceT1, respectively.

Conclusion: Despite the small sample size, our methods, particularly Clinical-BERT, demonstrate the potential for extracting critical treatment-related information from free-text radiology reports.
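Although the full methods are not shown here, the abstract outlines a BERT encoder combined with a conditional random field (CRF) layer for named entity recognition over the four lesion-related entities. The sketch below shows one minimal way such a tagger could be assembled in PyTorch; it is illustrative only and not the authors' implementation. The checkpoint name (emilyalsentzer/Bio_ClinicalBERT), the BIO label scheme, the third-party pytorch-crf package, and all hyperparameters are assumptions.

```python
# Illustrative sketch only: a BERT encoder with a CRF layer for BIO tagging of the
# four entities named in the abstract. Checkpoint, labels, and hyperparameters are
# assumptions, not the authors' configuration.
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer
from torchcrf import CRF  # third-party pytorch-crf package (assumed CRF implementation)

# BIO tags over the four entities described in the abstract
LABELS = ["O",
          "B-newT2", "I-newT2",
          "B-enhanceT1", "I-enhanceT1",
          "B-regT2", "I-regT2",
          "B-location", "I-location"]

class BertCrfTagger(nn.Module):
    def __init__(self, encoder_name="emilyalsentzer/Bio_ClinicalBERT", num_labels=len(LABELS)):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)
        self.crf = CRF(num_labels, batch_first=True)

    def forward(self, input_ids, attention_mask, labels=None):
        # Per-token emission scores from the BERT encoder
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        emissions = self.classifier(hidden)
        mask = attention_mask.bool()
        if labels is not None:
            # The CRF returns a log-likelihood; negate it to obtain a training loss
            # (labels must hold valid tag indices even at padded positions)
            return -self.crf(emissions, labels, mask=mask, reduction="mean")
        return self.crf.decode(emissions, mask=mask)

tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = BertCrfTagger()
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)  # assumed learning rate

# Example inference on one report sentence (untrained tagging head, for illustration only)
sentence = "There is one new enhancing lesion in the left frontal lobe."
enc = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    tag_ids = model(enc["input_ids"], enc["attention_mask"])
print([LABELS[i] for i in tag_ids[0]])
```

In practice, the labeled sentences would be batched, subword tokens aligned to word-level BIO labels, and the encoder and CRF parameters fine-tuned end to end, mirroring the 500–2000-sentence training subsets described above.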