[New Series] Introducing our NLP in Healthcare blog series

Ali Devaney, Marketing
August 19, 2022

The volume of medical text data - especially digitized and unstructured narrative text - has taken off with the adoption of EHRs and other digital tools used by clinicians, researchers and patients. All of this medical text is fueling a surge of NLP model development, to turn those medical text datasets into insights that impact care delivery, biomedical research, and much more. As every healthcare, medical device, and pharmaceutical company deepens their investments in AI, we’re seeing a growing number of clients developing NLP, and an increased need for medical text annotations.

Our NLP in healthcare blog series

We’ve taken a deep dive into the nuances of NLP in healthcare, and the medical text datasets at its foundation, and are excited to share our insights in this blog series.

We'll start with 4 topics - 

  1. 9 of the most common types of medical text datasets - From text in clinical notes, to chatbots, we’ll explain some of the most common types of medical text available for NLP projects. 
  2. 5 types of medical text annotation - In this blog we’ll go deeper on the common types of annotation applied to medical text, from key phrase tagging to entity linking.
  3. Best practices for annotating medical text [8 Tips] - Centaur Labs focuses on providing accurate and scalable data annotations for AI in the medical and life sciences. We’ll share some of our best practices so you can make your medical text annotation workstreams a success. 
  4. 12 impressive healthcare companies developing NLP for healthcare - Finally, our favorite, we’ll showcase some interesting ways digital health, medical device and pharmaceutical companies are leveraging medical text for their NLP projects.

Overview of NLP and medical text

We hope you’re as excited as we are to dig deeper into NLP and medical text, but first let’s level set. What is NLP and what do we mean by ‘medical text’?

What is NLP and how is it being applied in healthcare?

NLP, or Natural Language Processing, is a field of artificial intelligence in which computers analyze, understand, and derive meaning from human language in a useful way. NLP is being applied in most parts of the healthcare value chain - in fact, 36% of healthcare organizations planned to implement NLP in 2021, according to a study by Gradient Flow.

From accelerating subject recruitment in clinical trials, to supporting decisions and diagnoses at the point of care, to helping consumers find the accurate health information they need - there’s a lot of impact to be excited about, and we’ll give many detailed examples in a later post in this series.

What is medical text?

At Centaur Labs we use ‘medical text’ as shorthand for multiple types of health-related data stored as text. Medical text encompasses 3 sub-categories of text - clinical text, biomedical text, and ‘other’ health text. 

Clinical text

Clinical text is defined as text collected during the course of ongoing patient care in the formal healthcare system, or as part of a formal clinical trial program, according to the University of Washington. We often think of clinical text as written by clinicians themselves, for example when making clinical notes in the electronic health record (EHR), in a prescription write up, or in a pathology report.

However, patients also generate an increasing volume of clinical text. They may describe the beginning of their healthcare journey, i.e. a discussion with a chatbot automating part of patient intake, or in a typical in-person clinical visit with a therapist. Patients may also describe the middle or end of their healthcare journey, i.e. a transcript of a follow up call with a clinician, an insurance claim, or a discharge description.

Clinical text can also be written by others in the healthcare ecosystem who participate in patient care, such as insurance claims processors or family members. 

Biomedical text

Biomedical text is any data stored as text collected or created throughout the course of medical research. Often this text is written by research scientists in the form of scientific papers published in academic journals, or as unpublished intellectual property. This unpublished text is owned and managed by the academic institution or company who employed the scientist and funded the research the paper summarizes.

Other health text

There are many types of health-related text generated outside patient care delivery and scientific research. For example, users of consumer wellness applications, like Noom or Headspace, share health information in text formats as they interact with private groups, or company ‘coaches’. Consumers also share health information in public forums like Twitter and Reddit. Government agencies publish public health-related text to their populations, and regulatory bodies publish guidelines and rulings to their constituents. Companies in the healthcare ecosystem train their workforces - whether clinicians or pharmaceutical sales reps - with platforms that deliver content containing health-related text. 

Medical text is difficult to interpret

These sub-categories of medical text have one thing in common - they contain health concepts and terminology, and subtle relationships between words and phrases that are difficult for an untrained eye to identify. Those with interest in medicine, healthcare and biology - whether a seasoned clinician, a medical student, or a medical enthusiast - are best suited to interpret medical text. In this blog series, we’ll focus on NLP that leverages medical text - whether clinical, biomedical or ‘other’ - as this is where some of the most compelling innovation is taking place. 

4 key takeaways

  1. NLP in healthcare is booming, with 36% of healthcare organizations implementing NLP in 2021
  2. Medical text has three subcategories - 1) clinical text, 2) biomedical text and 3) ‘other’ health text. 
  3. Medical text, unlike administrative or business operations text, contains health related concepts, terminology and relationships difficult for an untrained eye to identify. 
  4. This blog series focuses on NLP that leverages medical text!

What now?

What kind of medical text are you working with? What applications of NLP in healthcare excite you most? Connect with us on LinkedIn or send us a note at - we’d love to hear from you.

Related posts