Managing Unstructured Healthcare Data

Background

Modern technology has generated more data than ever before. In healthcare, data is critical for making decisions. However, healthcare professionals struggle to manage the large amount of unstructured data they now face. Ineffective data management can lead to the following scenario.


The Consequence of Unmanaged Data in Real Life 

Imagine visiting your doctor’s office. Your doctor carefully listens to your symptoms and enters them into an Electronic Patient Record system. She then reviews the information and prescribes medication to help you recover from the flu. You feel better after taking the medication for two weeks. However, the same symptoms return a month later, so you visit your doctor again. She accesses your medical record and your family medical history to determine whether you have a chronic illness. But she has already seen about 30 patients and reviewed over 60 patient records that day. You are her last patient, and she is tired. She skims your medical history and prescribes insufficient medication.

Your doctor’s oversight and the lack of time for patient record review led to the wrong treatment. How can we utilize technology to ensure mistakes like this do not happen again? We first need to understand the difference between structured and unstructured data. 

Structured vs. Unstructured Data

Data are often divided into two types: structured and unstructured. Structured data, as the name suggests, is information that can be stored and displayed in a consistent and organized manner. This type of data can be validated against expected or biologically plausible ranges and can be easily analyzed and interpreted. Examples of health data in this category include data coded with a standardized code system such as SNOMED CT, LOINC, or ICD-CM. Structured data also include numerical values such as height, weight, and blood pressure, categorical values such as blood type, and ordinal values such as the stage of a disease at diagnosis.
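To make the idea of range validation concrete, here is a minimal Python sketch of a structured vital-signs record that checks each field against a plausible range. The VitalSigns class, its field names, and the limits in PLAUSIBLE_RANGES are hypothetical and chosen only for illustration; they are not clinical reference values.

```python
from dataclasses import dataclass

# Hypothetical plausibility limits for illustration only (not clinical guidance).
PLAUSIBLE_RANGES = {
    "height_cm": (30.0, 250.0),
    "weight_kg": (0.5, 400.0),
    "systolic_mmhg": (50, 260),
    "diastolic_mmhg": (20, 160),
}

# Categorical field: the only accepted values.
BLOOD_TYPES = {"A+", "A-", "B+", "B-", "AB+", "AB-", "O+", "O-"}


@dataclass
class VitalSigns:
    height_cm: float
    weight_kg: float
    systolic_mmhg: int
    diastolic_mmhg: int
    blood_type: str

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the record passed."""
        problems = []
        # Numerical fields: check each value against its plausible range.
        for field, (low, high) in PLAUSIBLE_RANGES.items():
            value = getattr(self, field)
            if not low <= value <= high:
                problems.append(f"{field}={value} outside [{low}, {high}]")
        # Categorical field: check membership in the allowed set.
        if self.blood_type not in BLOOD_TYPES:
            problems.append(f"unknown blood type {self.blood_type!r}")
        # Cross-field consistency check.
        if self.diastolic_mmhg >= self.systolic_mmhg:
            problems.append("diastolic pressure not below systolic")
        return problems
```

Because every field has a known type and meaning, checks like these are straightforward; no comparable check can be written directly against a free-text note.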

Unlike structured data, unstructured data are often free text and narratives that most analytics software cannot collect and analyze with numerical methods to derive useful insights. Unstructured data are therefore much more difficult to analyze and interpret. Free text cannot be categorized as easily as a structured, numerical data point. For example, a blood pressure reading is represented by just two numbers, whereas clinical information such as the symptoms a patient reports during a doctor’s visit is often recorded as unstructured text. A physician’s note describing symptoms requires human interpretation because of its domain-specific vocabulary, potential spelling errors, and abbreviations.
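The sketch below illustrates why free text resists simple matching: the same symptom can appear as an abbreviation, a variant phrase, or a misspelling. It maps a few such variants to one preferred term with regular expressions. The VARIANTS table and the normalize function are hypothetical and far smaller than the curated terminologies real systems depend on.

```python
import re

# Hypothetical variant table for illustration only; real systems rely on
# curated clinical terminologies, not a short hand-written dictionary.
VARIANTS = {
    "sob": "shortness of breath",
    "short of breath": "shortness of breath",
    "shortnes of breath": "shortness of breath",  # misspelling
    "h/a": "headache",
    "head ache": "headache",
}


def normalize(note: str) -> str:
    """Lower-case a note and replace known variants with a preferred term."""
    text = note.lower()
    # Try longer variants first so multi-word phrases win over short abbreviations.
    for variant in sorted(VARIANTS, key=len, reverse=True):
        # Require a non-word character (or the string edge) on both sides,
        # so "sob" does not match inside "sober".
        pattern = r"(?<!\w)" + re.escape(variant) + r"(?!\w)"
        text = re.sub(pattern, VARIANTS[variant], text)
    return text
```

For example, normalize("Pt c/o SOB and H/A since Monday") returns "pt c/o shortness of breath and headache since monday". Note that "c/o" is left untouched because it is not in the table, which shows how quickly a fixed lookup list runs out of coverage.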

Therefore, unstructured free-text data must be converted into a more structured format. Done manually, this conversion can be time-consuming and may not capture all of the information. Natural Language Processing solutions can help solve this problem.

The Need for Natural Language Processing Solutions

According to McKinsey, Natural Language Processing (NLP) is a “specialized brand of AI focused on the interpretation and manipulation of human-generated spoken or written data.” Given the rate at which unstructured clinical information is created, automated NLP solutions are needed to analyze this text and generate structured representations.
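As a rough illustration of what generating a structured representation can mean, the following Python sketch matches terms from a small lexicon in a note and emits one record per mention, with character offsets and a simple negation flag. The lexicon, the DEMO-000x placeholder codes, the negation cues, and the 30-character look-back window are all assumptions made for this example; production NLP systems use full terminologies and far more robust linguistic analysis.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-lexicon: surface form -> (preferred term, placeholder code).
# The DEMO codes are invented; a real system would map to a terminology
# such as SNOMED CT.
LEXICON = {
    "shortness of breath": ("Dyspnea", "DEMO-0001"),
    "sob": ("Dyspnea", "DEMO-0001"),
    "fever": ("Fever", "DEMO-0002"),
    "cough": ("Cough", "DEMO-0003"),
    "headache": ("Headache", "DEMO-0004"),
}

# Simplified rule-based negation: a cue negates a mention when it occurs
# shortly before it in the same sentence.
NEGATION_CUES = ("no", "denies", "without", "negative for")
LOOKBACK_CHARS = 30


@dataclass
class Concept:
    term: str
    code: str
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention
    negated: bool


def is_negated(text: str, start: int) -> bool:
    window = text[max(0, start - LOOKBACK_CHARS):start]
    window = re.split(r"[.;]", window)[-1]  # stay within the current sentence
    return any(re.search(r"\b" + re.escape(cue) + r"\b", window) for cue in NEGATION_CUES)


def extract_concepts(note: str) -> list[Concept]:
    """Return one structured record per lexicon match, ordered by position."""
    text = note.lower()
    found = []
    for surface, (term, code) in LEXICON.items():
        for match in re.finditer(r"\b" + re.escape(surface) + r"\b", text):
            found.append(Concept(term, code, match.start(), match.end(),
                                 is_negated(text, match.start())))
    return sorted(found, key=lambda c: c.start)
```

For the note "Pt reports cough and SOB. Denies fever.", extract_concepts returns Cough and Dyspnea as affirmed mentions and Fever as negated. Keeping the negation flag matters: "denies fever" and "fever" should not produce the same structured record.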

The Benefits of Turning Unstructured Data into Structured Data

There are multiple benefits to using an NLP system to produce accurate and efficient solutions in healthcare. First, it reduces the time required for manual expert review. Healthcare professionals will spend less time reading and interpreting Electronic Health Records and free text. The same benefit applies to safety reviewers at the Food and Drug Administration, who read large numbers of narratives in reports about medical products. Practitioners trying to keep up to date with the medical literature will also save time by having accurate information readily available.

The second benefit is the ability to process data automatically at large scale. Being able to manage and mine clinical data in large volumes or across long time spans is vital for implementing algorithms that identify patients at risk of certain diseases. It also means that all of the available information, not only the already structured fields, can inform a decision. Because much of this information remains unexplored for lack of a structured format, the insights added by NLP solutions could lead to more knowledge about the progression and treatment of diseases.
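Once notes are converted into concept records, large-scale processing can be as simple as aggregating them per patient. The sketch below, in the spirit of the scenario at the start of this article, flags patients for whom the same affirmed concept appears on several distinct visit dates within a time window. The record layout (patient ID, visit date, term, negated flag), the flag_recurring function, and its default thresholds are hypothetical and are not clinical criteria.

```python
from collections import defaultdict
from datetime import date, timedelta


def flag_recurring(records, min_visits=2, window_days=90):
    """Return {patient_id: {term, ...}} for concepts affirmed on at least
    `min_visits` distinct visit dates that fall within `window_days`."""
    dates = defaultdict(set)  # (patient_id, term) -> set of visit dates
    for patient_id, visit_date, term, negated in records:
        if not negated:  # ignore negated mentions such as "denies fever"
            dates[(patient_id, term)].add(visit_date)

    flagged = defaultdict(set)
    for (patient_id, term), visit_dates in dates.items():
        ordered = sorted(visit_dates)
        # Slide over the sorted dates: any run of `min_visits` dates spanning
        # at most `window_days` triggers a flag.
        for i in range(len(ordered) - min_visits + 1):
            if ordered[i + min_visits - 1] - ordered[i] <= timedelta(days=window_days):
                flagged[patient_id].add(term)
                break
    return dict(flagged)


records = [
    ("p1", date(2023, 1, 5), "Fever", False),
    ("p1", date(2023, 2, 10), "Fever", False),
    ("p1", date(2023, 2, 10), "Cough", True),
    ("p2", date(2023, 1, 5), "Fever", False),
    ("p2", date(2023, 9, 1), "Fever", False),
]
```

With the default settings, flag_recurring(records) returns {"p1": {"Fever"}}: patient p1 reported fever twice within about five weeks, while p2's two reports are more than 90 days apart and p1's cough mention is negated. Running the same logic over millions of records is exactly the kind of task that is impractical to do by hand.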

How Can CareIndexing Help You?

CareIndexing is an NLP solution, specific to the healthcare industry, that automatically converts unstructured text into structured, codified content. One key benefit of CareIndexing is that it uses HealthTerm to enhance concept recognition. The concepts found in unstructured free text can be sorted into groupings related to the area of interest, such as diseases or procedures. In identifying diseases and clinical findings in discharge summaries, the quality of CareIndexing’s concept recognition is superior to the existing standard among open-source approaches.
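Grouping concepts by area of interest can be sketched by reading a category from each concept’s label. The example below assumes labels end in a semantic tag in parentheses, in the style of SNOMED CT fully specified names such as "Pneumonia (disorder)". The group_by_tag function is hypothetical and does not describe how CareIndexing or HealthTerm work internally.

```python
import re
from collections import defaultdict

# Matches a trailing semantic tag such as "(disorder)" or "(procedure)".
TAG = re.compile(r"\(([^()]+)\)\s*$")


def group_by_tag(labels):
    """Group concept labels by their trailing semantic tag."""
    groups = defaultdict(list)
    for label in labels:
        match = TAG.search(label)
        tag = match.group(1) if match else "untagged"
        groups[tag].append(label)
    return dict(groups)
```

For instance, group_by_tag(["Pneumonia (disorder)", "Appendectomy (procedure)", "Fever (finding)", "Chest pain"]) puts each label under "disorder", "procedure", or "finding", and collects "Chest pain" under "untagged". Deriving the grouping from the terminology rather than from a hand-maintained list keeps categories consistent as new concepts are added.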
