The volume of medical text data - especially digitized, unstructured narrative text - has grown rapidly with the adoption of EHRs and other digital tools used by clinicians, researchers and patients. This medical text is fueling a surge of NLP model development, aimed at turning medical text datasets into insights that impact care delivery, biomedical research, and much more. As healthcare, medical device, and pharmaceutical companies deepen their investments in AI, we’re seeing a growing number of clients developing NLP models, and an increased need for medical text annotations.
We’ve taken a deep dive into the nuances of NLP in healthcare, and the medical text datasets at its foundation, and are excited to share our insights in this blog series.
We'll start with 4 topics -
We hope you’re as excited as we are to dig deeper into NLP and medical text, but first let’s level set. What is NLP and what do we mean by ‘medical text’?
NLP, or Natural Language Processing, is a field of artificial intelligence in which computers analyze, understand, and derive meaning from human language in a useful way. NLP is being applied in most parts of the healthcare value chain - in fact, 36% of healthcare organizations planned to implement NLP in 2021, according to a study by Gradient Flow.
From accelerating subject recruitment in clinical trials, to supporting decisions and diagnoses at the point of care, to helping consumers find the accurate health information they need - there’s a lot of impact to be excited about, and we’ll give many detailed examples in a later post in this series.
At Centaur Labs we use ‘medical text’ as shorthand for multiple types of health-related data stored as text. Medical text encompasses 3 sub-categories of text - clinical text, biomedical text, and ‘other’ health text.
Clinical text is defined as text collected during the course of ongoing patient care in the formal healthcare system, or as part of a formal clinical trial program, according to the University of Washington. We often think of clinical text as written by clinicians themselves, for example when making clinical notes in the electronic health record (EHR), in a prescription write-up, or in a pathology report.
However, patients also generate an increasing volume of clinical text. They may describe the beginning of their healthcare journey, e.g. in a discussion with a chatbot automating part of patient intake, or in a typical in-person clinical visit with a therapist. Patients may also describe the middle or end of their healthcare journey, e.g. in a transcript of a follow-up call with a clinician, an insurance claim, or a discharge description.
Clinical text can also be written by others in the healthcare ecosystem who participate in patient care, such as insurance claims processors or family members.
Biomedical text is any data stored as text that is collected or created in the course of medical research. Often this text is written by research scientists in the form of scientific papers published in academic journals, or as unpublished intellectual property. Unpublished text is owned and managed by the academic institution or company that employed the scientist and funded the research it summarizes.
There are many types of health-related text generated outside patient care delivery and scientific research. For example, users of consumer wellness applications, like Noom or Headspace, share health information in text formats as they interact with private groups, or company ‘coaches’. Consumers also share health information in public forums like Twitter and Reddit. Government agencies publish health-related text for the public, and regulatory bodies publish guidelines and rulings for their constituents. Companies in the healthcare ecosystem train their workforces - whether clinicians or pharmaceutical sales reps - with platforms that deliver content containing health-related text.
These sub-categories of medical text have one thing in common - they contain health concepts and terminology, and subtle relationships between words and phrases that are difficult for an untrained eye to identify. Those with an interest in medicine, healthcare and biology - whether a seasoned clinician, a medical student, or a medical enthusiast - are best suited to interpret medical text. In this blog series, we’ll focus on NLP that leverages medical text - whether clinical, biomedical or ‘other’ - as this is where some of the most compelling innovation is taking place.
What kind of medical text are you working with? What applications of NLP in healthcare excite you most? Connect with us on LinkedIn or send us a note at email@example.com - we’d love to hear from you.