Patient Information Extraction Using Optical Character Recognition
Keywords:
Information Extraction, Optical Character Recognition, OCR
Abstract
Digital transformation now plays an important role in public health agencies, driving greater use of information. However, some patient data are still stored as hard copies, scanned documents, images, and PDFs, which must be converted into digital form before they can be reused. This paper proposes a technique that uses Optical Character Recognition (OCR) to convert the text of physical documents into digital form, extracting all the text from photocopies into a structured data format. Experimental studies showed that the proposed technique makes the digitized documents fully searchable and editable, with average accuracies of about 74.62% for extracting attributes and 68.46% for extracting values from printed documents. Using OCR helps public health agencies turn paper documents into digital form easily, makes more effective use of information, and reduces the amount of paper taking up space in the organization.
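To make the two reported figures more concrete, here is one way attribute and value extraction could be scored. It is a minimal sketch, not the authors' method, and rests on several assumptions: the page has already been converted to plain text by an OCR engine such as Tesseract; each field appears on its own line as `label: value`; "attribute accuracy" is the share of ground-truth attributes that were found; and "value accuracy" is the share whose extracted value matches exactly. The names `FIELD_PATTERN`, `extract_fields`, and `extraction_accuracy`, and the sample record, are hypothetical.

```python
import re
from typing import Dict, Tuple

# Hypothetical field layout (an assumption, not from the paper): a text label,
# then ":" or "=", then the value, e.g. "Patient Name: Somchai Jaidee".
FIELD_PATTERN = re.compile(r"^\s*([A-Za-z][A-Za-z ./]*?)\s*[:=]\s*(.+?)\s*$")


def extract_fields(ocr_text: str) -> Dict[str, str]:
    """Turn raw OCR text into attribute -> value pairs, one per matching line.

    Attribute names are lower-cased so they can be compared with ground truth;
    if an attribute appears twice, the later line wins.
    """
    fields: Dict[str, str] = {}
    for line in ocr_text.splitlines():
        match = FIELD_PATTERN.match(line)
        if match:
            fields[match.group(1).strip().lower()] = match.group(2)
    return fields


def extraction_accuracy(predicted: Dict[str, str],
                        truth: Dict[str, str]) -> Tuple[float, float]:
    """Return (attribute accuracy, value accuracy) as percentages of truth fields."""
    if not truth:
        return 0.0, 0.0
    n = len(truth)
    # An attribute counts as extracted if its label was recognized at all.
    found_attributes = sum(1 for attr in truth if attr in predicted)
    # A value counts only if it was recognized character-for-character.
    correct_values = sum(1 for attr, value in truth.items()
                         if predicted.get(attr) == value)
    return 100.0 * found_attributes / n, 100.0 * correct_values / n


if __name__ == "__main__":
    # Hypothetical OCR output: "45" misread as "4S", and one line uses "-".
    sample = "Patient Name: Somchai Jaidee\nAge: 4S\nBlood Type: O\nDiagnosis - Hypertension"
    truth = {"patient name": "Somchai Jaidee", "age": "45",
             "blood type": "O", "diagnosis": "Hypertension"}
    print(extraction_accuracy(extract_fields(sample), truth))  # (75.0, 50.0)
```

In this made-up sample, misreading `45` as `4S` lowers value accuracy but not attribute accuracy, while the `Diagnosis - Hypertension` line is missed entirely because its separator is neither `:` nor `=`. Character-level OCR errors of this kind are one plausible reason value accuracy can trail attribute accuracy, as it does in the reported results.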