Automated information extraction from echocardiography reports for the development of a congenital heart disease database

Authors

  • Paradorn Chan-On, Department of Pediatrics, Udon Thani Hospital, Udon Thani

Keywords:

Congenital heart disease, Health information systems, Data extraction, Optical character recognition, Artificial intelligence, Natural language processing, Hospital administration, Thailand

Abstract

Congenital heart disease (CHD) is a common condition in children in Thailand. Clinical data are often kept in different systems without standardization, and building a registry manually requires substantial time and staff effort, which is a challenge in hospitals with heavy workloads. The aim of this study was to develop and evaluate an automated system to create a CHD registry from echocardiography reports. An automated data extraction workflow was developed using N8N as the primary integration platform. The system processes image-based echocardiography reports, beginning with optical character recognition (OCR) to convert images to text. An agentic AI, powered by the ChatGPT-4o model, then extracts key clinical information, transforming the unstructured text into structured JSON data according to a specifically designed instruction set. These structured data are automatically written to a Google Sheet, which functions as the patient registry database. Accuracy was evaluated by comparing the system's output against data manually verified by a pediatric cardiologist. Efficiency was assessed by comparing the automated processing time with manual data entry time. A total of 301 echocardiography reports were processed. The automated system identified the main diagnosis with 86.7% accuracy and additional diagnoses with 88.0% accuracy. The average processing time was 12.3 seconds per report, compared with 84 seconds for manual entry, saving 87.7 seconds per report (p < 0.001). The system had a major error rate (process failure) of 0.997% and a minor error rate (data extracted with issues) of 9.9%. The system proved to be an accurate, efficient, and reliable method for automatically creating a specialized CHD patient registry from existing clinical reports. This approach offers a viable way to overcome resource limitations and improve clinical data management within the Thai public health system.
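As an illustration of the final steps described in the abstract, the following minimal Python sketch shows one way the model's JSON output might be validated, flattened into a registry row, and scored against reviewer-verified values. It is not the study's implementation: the field names in REGISTRY_COLUMNS, the function names, and the matching rule (case-insensitive exact match) are all assumptions made for this example, and the OCR, model call, and Google Sheet write are deliberately left out.

```python
import json

# Hypothetical registry columns; the study's actual instruction set
# and sheet layout are not given in the abstract.
REGISTRY_COLUMNS = ["hn", "exam_date", "main_diagnosis", "additional_diagnoses"]


def parse_extraction(raw_json: str) -> dict:
    """Parse the model's JSON output and check that every expected field exists."""
    record = json.loads(raw_json)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    missing = [col for col in REGISTRY_COLUMNS if col not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return record


def to_row(record: dict) -> list:
    """Flatten one record into a spreadsheet row, joining list values with '; '."""
    row = []
    for col in REGISTRY_COLUMNS:
        value = record[col]
        if isinstance(value, list):
            value = "; ".join(str(item) for item in value)
        row.append("" if value is None else str(value))
    return row


def accuracy(predicted: list, reference: list) -> float:
    """Fraction of reports whose extracted value matches the reviewer's value.

    Assumed matching rule: exact match after trimming and lower-casing.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predicted, reference)
    )
    return matches / len(reference)
```

With a made-up record such as {"hn": "000001", "exam_date": "2025-01-15", "main_diagnosis": "VSD", "additional_diagnoses": ["PDA", "ASD"]}, parse_extraction followed by to_row yields one row with the additional diagnoses joined into a single cell. Rejecting records with missing fields before they reach the sheet is one way to separate process failures from extractions that complete but contain issues.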

 


 


Published

2026-06-16

How to Cite

Chan-On, P. (2026). Automated information extraction from echocardiography reports for the development of a congenital heart disease database. Journal of the Thai Medical Informatics Association, 12(1), 41–48. Retrieved from https://he03.tci-thaijo.org/index.php/jtmi/article/view/5816