Patient Information Extraction Using Optical Character Recognition
Keywords:
Information Extraction, Optical Character Recognition, OCR
Abstract
Digital transformation plays an important role in moving public health agencies into the digital age and in enabling wider use of information. However, some patient data is still stored as hard copies, scanned documents, images, and PDFs, which must be converted into digital form for future use. This paper proposes a technique for converting the text of a physical document into digital format using Optical Character Recognition (OCR), which attempts to extract all the text from photocopies into a structured data format. The experimental studies showed that the proposed technique makes the digitized documents fully searchable and editable, with an average accuracy of about 74.62% for extracting attributes and 68.46% for extracting values from printed documents. Using OCR helps public health agencies turn documents into digital form easily, supports more effective use of information, and reduces the amount of paper taking up space in the organization.
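As an illustration of the attribute and value extraction reported above, the following minimal sketch shows one way OCR output could be turned into structured data and scored. It is not the paper's implementation. The sample text, the `attribute: value` line layout, the function names `extract_pairs` and `accuracy`, and the two accuracy definitions are all assumptions made for this example; the OCR step itself is not shown, and the text is assumed to have already been produced by an OCR engine.

```python
import re

# Hypothetical OCR output for one printed form (not data from the paper).
# The third line contains a typical OCR confusion (letter "O" for digit "0"),
# and the fourth line has lost its colon.
OCR_TEXT = """\
Patient Name: Jane Doe
Age : 54
Blood Pressure: 12O/80
Diagnosis Hypertension
"""

# Assumed layout: one "attribute: value" pair per line.
LINE_PATTERN = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")


def extract_pairs(text):
    """Return a dict mapping each recognized attribute to its value."""
    pairs = {}
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if match:
            pairs[match.group(1)] = match.group(2)
    return pairs


def accuracy(extracted, truth):
    """Return (attribute accuracy, value accuracy) against ground truth.

    Assumed definitions: an attribute counts as correct when its name is
    found; a value counts as correct when it matches the truth exactly.
    """
    attr_hits = sum(1 for key in truth if key in extracted)
    value_hits = sum(1 for key, value in truth.items()
                     if extracted.get(key) == value)
    return attr_hits / len(truth), value_hits / len(truth)


# Hypothetical ground truth for the same form.
GROUND_TRUTH = {
    "Patient Name": "Jane Doe",
    "Age": "54",
    "Blood Pressure": "120/80",
    "Diagnosis": "Hypertension",
}

pairs = extract_pairs(OCR_TEXT)
attr_acc, value_acc = accuracy(pairs, GROUND_TRUTH)
print(f"attribute accuracy: {attr_acc:.2%}, value accuracy: {value_acc:.2%}")
```

In this toy sample, value accuracy comes out lower than attribute accuracy because a value can be misread even when its attribute label is recognized, which is one plausible reason for a gap of the kind reported in the abstract.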