
From MVP to scaleup: how to 10X your medical data labeling pipeline

Chris Hilger
January 10, 2021

While Google’s DeepMind is grabbing headlines for beating radiologists at breast cancer detection,¹ there is a very real medical AI revolution going on in the background. 

By 2021, Gartner predicts that “75% of healthcare delivery organizations (HDOs) will have invested in an AI capability that is explicitly improving either operational performance or clinical outcomes.”² According to Grand View Research, the AI healthcare market is expected to grow to over $31B by 2025.³

Powering these advances are large, labeled medical datasets. As a general rule, the larger (and more accurately labeled) the dataset, the better the AI model. But as with all nascent fields, medical AI companies are facing some typical growing pains – or rather, scaling pains.

Let's say your initial medical AI model showed promising results and you are now looking to invest further and grow your team. As you scale your business, your data pipeline must scale as well. Instead of labeling hundreds or thousands of cases per month, you might need to label tens or hundreds of thousands per month.



This post will first look at why traditional labeling pipelines are so hard to scale. Then, we'll explain how our solution can 10X your labeling pipeline in a shorter time frame and with higher accuracy.

Bonus: Read how we have helped companies to scale up their data labeling pipeline using the method outlined in the post.


Why are most medical labeling pipelines hard to scale?

The core problem with scaling most medical labeling pipelines is that it is difficult to maintain a high standard of quality as the operation grows, and trying to do so sucks an increasingly large amount of time away from your most valuable people. Did you hire your data science team to spend all of their time recruiting, managing, and quality-controlling an army of medical data labelers, or to build lifesaving AI?


The difference between growth and scale – Though the terms are sometimes used interchangeably, growing and scaling a process are quite different. Applied to medical data labeling, growing your pipeline means producing greater labeling volume by adding resources in proportion: for every additional 1,000 cases you need to label, you recruit, manage, and quality-control one extra labeler.

Scaling your pipeline implies that you can increase the volume of data labeled without having to substantially increase the number of resources. To label large amounts of medical data, having a scalable model is essential.

But why is it so hard to maintain a high standard of label quality while scaling a medical data labeling pipeline?


It boils down to two core components:

  1. The type of labeler network used to generate labels
  2. The type of platform used to manage the labeling process (especially the quality control process)

Types of labeler networks and their scalability

There are two typical approaches for getting your data labeled:

  1. Managing labeling in-house
  2. Outsourcing to a labeling vendor


Approach 1: Managing data labeling in-house

For a new company, time is precious. It is easy to underestimate just how much management is needed to run a successful labeling project.

To illustrate this, let’s look at the typical data labeler groups: 

  • Board-certified physicians may seem like they will generate highly accurate labels. However, they disagree more often than you might think, as discussed in our post here. Furthermore, many experts aren’t especially motivated to maintain the highest level of quality when they are labeling data for AI. They have busy and demanding day jobs, with lives on the line right in front of them (so they might be doing the labeling when they’re winding down after a couple of glasses of wine!). Lastly, they are typically too expensive to be a realistic option at scale.
  • Medical students may be more cost-effective, but recruiting and managing them will slow your workflow. Medical students are oftentimes even busier than board-certified physicians, they have a tendency to overcommit, and attrition is high. Performance and motivation might start out acceptable, but quickly degrade over time.


The common denominator is that these workflows need to be managed. They take up too much time and can quickly drain your resources. This is time that could and should be spent on valuable tasks such as sourcing new business or developing your medical AI. 

Building out an in-house labeling pipeline from scratch requires significant engineering support. Designing the workflow has to include interfacing with annotation software, QC processes, and reporting.


Key disadvantages of managing data labeling in-house:

  • Time-intensive
  • Cost-intensive
  • QC process is difficult
  • Requires in-house data annotation software integration

For these reasons, it’s very common for medical AI companies to turn to an external vendor.

Approach 2: Working with data labeling vendors

Most vendors offer a simplistic "what you see is what you get" approach. You deliver the unlabeled data and you get back labeled data. This makes it more difficult to iterate over time and, more importantly, to produce high-quality results in the first place.

Two common data labeler groups when outsourcing are:

  • Blended offshore teams are often even more cost-effective than medical students, but they also lack the expertise required to do tasks well. A salesperson's promise that dedicated labelers will be trained on your task is often insufficient to maintain quality. After all, vendors face business pressure to use whichever hired staff are available, rather than those who are best at your task. It also means that your workflow is less adaptable – throughput is slow to ramp up, and maxes out based on the number of employees trained. Vendors might have an MD on staff who provides initial samples to clients and does heavy QC in the early going, but once the non-expert workers are let loose, the quality can degrade and be spotty at best.
  • Crowdsourcing platforms such as Amazon’s Mechanical Turk are certainly both affordable and scalable, but accuracy is by far the lowest of all the options. These platforms are meant for unskilled tasks and are inadequate for tasks requiring even a modest degree of medical skill.

The common denominator here is that costs are lower, but so is quality.

What is an on-demand expert labeler network and why is it more scalable?


An on-demand expert labeler network offers a new approach to medical data labeling. We use this approach at Centaur Labs.

Bonus: Explore how we have helped companies to scale their data labeling pipeline using our on-demand expert labeler network.


Imagine the scalability of Amazon’s Mechanical Turk but with accuracy better than that produced by in-house board-certified physicians. It’s possible with an on-demand expert labeling network. Tasks are distributed to an on-demand network of experts around the world, who compete with one another to label the data most accurately. Labelers are compensated on a performance basis, so the very best labelers earn significantly more than average ones. The best athletes (and data scientists!) earn more than average ones, so shouldn't this be true for labelers?


How is this approach to labeling different? Using an on-demand expert network of labelers has some clear advantages:

  • Top performers can be identified and rewarded, making it much easier to maintain a healthy, high-performing labeling network.
  • The recruiting process is simpler, quicker and more cost-effective thanks to a performance-based incentive model. It’s easy to attract new labelers if project capacity needs to be increased. As tasks are completed on demand, the need to use hired staff is completely avoided.
  • Labelers can be trained through continuous practice. This continually improves the educational ecosystem for the entire network. Labelers can practice by doing to improve their skills, but they aren't trusted or rewarded until they prove that they have what it takes.
  • Motivation stays high. Since performance is measured constantly, even the world's best doctors have to try their best on every case to earn their keep.
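To make the "measured constantly" and "prove they have what it takes" ideas concrete, here is a minimal Python sketch. It assumes (this is not stated above) that performance is scored against cases with a known reference answer, and the names `LabelerRecord` and `is_trusted`, along with the threshold values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class LabelerRecord:
    """Running score for one labeler on cases with a known reference answer."""
    correct: int = 0
    total: int = 0

    def update(self, submitted: str, reference: str) -> None:
        # Count every scored case, and count it as correct on an exact match.
        self.total += 1
        if submitted == reference:
            self.correct += 1

    @property
    def accuracy(self) -> float:
        # A labeler with no scored cases has no demonstrated accuracy yet.
        return self.correct / self.total if self.total else 0.0


def is_trusted(record: LabelerRecord,
               min_cases: int = 50,
               min_accuracy: float = 0.9) -> bool:
    # Hypothetical thresholds: a labeler must have enough scored cases
    # AND a high enough accuracy before their labels are relied upon.
    return record.total >= min_cases and record.accuracy >= min_accuracy


# Usage: three practice cases, two answered correctly.
record = LabelerRecord()
for submitted, reference in [("malignant", "malignant"),
                             ("benign", "malignant"),
                             ("benign", "benign")]:
    record.update(submitted, reference)

trusted_with_defaults = is_trusted(record)  # False: too few cases so far
```

Because the record keeps updating with every scored case, a labeler who stops paying attention loses trusted status just as a newcomer has to earn it.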


The power of a dynamic labeling platform 

The next important consideration is how a dynamic labeling platform can yield significant improvements to labeling accuracy, particularly at large data volumes. 

Dynamic vs static labeling platforms

Generally speaking, there are two fundamental platform types when it comes to medical data labeling: 

A static labeling platform is one that does not improve after a label is generated. This is because labeling is done by an individual expert, typically paid by the hour. Once the label is generated, the labeler moves on to the next case and no additional information is recorded. Additionally, labeling performance can decrease as the volume of labeled data increases. For example, QC processes and pipelines that worked with just a few labelers break down upon scaling to tens or hundreds of thousands of cases. Most labeling models (both in-house and external vendors) use this approach.

A dynamic labeling platform continuously measures labeler performance and uses those measurements to distribute labeling work efficiently and accurately. Since labelers are continuously measured on accuracy, and multiple labelers provide their opinion on each case, the most controversial cases can be viewed multiple times by the most accurate users. It's a bit like making sure you're getting the ball into the hands of your most clutch player so they can be the one to take the big game-ending shot. Additionally, opinions can be aggregated intelligently: opinions generated from high-performance labelers are given more weight. The power of a dynamic labeling platform really shows as the volume of labeled data increases.
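As a rough illustration of giving more weight to high-performance labelers, the Python sketch below implements an accuracy-weighted vote. The log-odds weighting is a common textbook choice picked here as an assumption, not a description of any particular platform's method, and `weighted_vote` and the sample accuracies are hypothetical.

```python
import math
from collections import defaultdict


def weighted_vote(opinions, accuracy):
    """Aggregate one case's opinions, weighting each labeler by accuracy.

    opinions: {labeler_id: label}
    accuracy: {labeler_id: measured accuracy between 0 and 1}
    Returns the winning label and the per-label scores.
    """
    scores = defaultdict(float)
    for labeler, label in opinions.items():
        # Unknown labelers default to 0.5, which gives them zero weight.
        acc = accuracy.get(labeler, 0.5)
        # Clamp so that 0% or 100% accuracy does not produce infinite weight.
        acc = min(max(acc, 0.01), 0.99)
        # Log-odds weight (an assumed scheme): grows quickly as accuracy
        # approaches 1, and is 0 for a coin-flip labeler.
        scores[label] += math.log(acc / (1 - acc))
    winner = max(scores, key=scores.get)
    return winner, dict(scores)


# Usage: one highly accurate labeler disagrees with two weaker ones.
opinions = {"a": "pneumonia", "b": "normal", "c": "normal"}
accuracy = {"a": 0.95, "b": 0.60, "c": 0.60}
label, scores = weighted_vote(opinions, accuracy)
```

In this example a simple majority vote would return "normal", while the weighted vote sides with the single labeler who has the strongest track record.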


Advantages of using a dynamic labeling platform for labeling your medical data



The key advantage of using a dynamic labeling approach is that accuracy improves as labeling volume increases. Each medical labeling task is different, but generally after several hundred cases, the dynamic model really begins to outperform any other alternative. This is in contrast to a static labeling platform, which can decline in labeling accuracy at scale due to poor QA processes.

Dynamic labeling platforms unlock improvements to accuracy as scale increases


Key advantages of a dynamic labeling platform:

  • Intelligently aggregates expert opinions to boost labeling accuracy by weighting labelers on their task-specific performance.
  • Offers a difficulty metric that aids in QA. A dynamic platform can measure difficulty for each individual case within a task. This is a more objective way of finding which cases are controversial or difficult. More labelers can be efficiently reallocated to difficult cases if needed to improve overall model accuracy.
  • Provides better data insights with organized, easily sortable metadata about all labels. This allows you to identify areas of bias in your training data and opportunities to create a more robust dataset.
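One simple way to picture a per-case difficulty metric is to score how much the labelers disagree. The Python sketch below uses the normalized entropy of the votes on each case; this is an assumed stand-in for illustration, not a description of how any specific platform computes difficulty, and `difficulty`, `needs_more_reads`, and the threshold are hypothetical.

```python
import math
from collections import Counter


def difficulty(labels):
    """Normalized entropy of the votes: 0.0 if unanimous, 1.0 if evenly split."""
    counts = Counter(labels)
    if len(counts) <= 1:
        return 0.0
    n = len(labels)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    # Divide by the maximum entropy for the number of distinct labels seen.
    return entropy / math.log(len(counts))


def needs_more_reads(cases, threshold=0.9):
    """Return case IDs at or above the threshold, most difficult first."""
    scored = {case_id: difficulty(votes) for case_id, votes in cases.items()}
    flagged = [cid for cid, d in scored.items() if d >= threshold]
    return sorted(flagged, key=lambda cid: scored[cid], reverse=True)


# Usage: three cases with four opinions each (hypothetical data).
cases = {
    "case-1": ["nodule", "nodule", "nodule", "nodule"],  # unanimous
    "case-2": ["nodule", "nodule", "clear", "clear"],    # evenly split
    "case-3": ["nodule", "nodule", "nodule", "clear"],   # mild disagreement
}
flagged = needs_more_reads(cases)
```

Cases that come back flagged are the natural candidates for extra reads by the most accurate labelers, while unanimous cases need no further attention.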

Why scalability should be a primary concern for any medical AI company

Scalability is the key to taking your medical AI to the next level. In order to scale efficiently, you need to plan for it. Start to think in terms of thousands of labels per day rather than just a few hundred. Will you be able to handle the scale needed to take your model to the next level, so it can positively impact patients? If you're interested in scaling your medical labeling pipeline, consider the Centaur Labs approach.

The business case for the Centaur Labs approach

Centaur Labs combines the advantages of an on-demand expert labeler network with a dynamic labeling platform and focuses exclusively on medical use cases. Customers of Centaur Labs enjoy:

  • Enhanced labeling accuracy thanks to intelligently aggregated multiple expert opinions and compensation based on performance, so labelers give 100% effort on every case. 
  • Dedicated customer service for all client engagements. We offer a project manager to work with you to manage the scope, schedule, and budget of your labeling project, resulting in faster and better labeling outcomes.
  • Scale and speed of our on-demand expert network that can rapidly match labeling demand. This means we can produce thousands of labels in days (not weeks or months), accelerating your time to market.

For more information about how we can offer you accurate medical data labels at scale, contact us today.


References:

1) https://deepmind.com/research/publications/International-evaluation-of-an-artificial-intelligence-system-to-identify-breast-cancer-in-screening-mammography

2) https://www.gartner.com/smarterwithgartner/the-need-for-ai-governance-in-healthcare/

3) https://www.grandviewresearch.com/industry-analysis/artificial-intelligence-ai-healthcare-market?utm_source=prnewswire&utm_medium=referral&utm_campaign=hc_16-dec-19&utm_term=ai-healthcare-market&utm_content=rd
