According to researchers, 2.2 billion people are currently diagnosed with at least one gastrointestinal (GI) disease, ranging from Inflammatory Bowel Disease (IBD) to colorectal cancer. Of those, 10 million have IBD, and the disease is becoming more common globally: prevalence rose by 31% in the past 25 years and is projected to keep rising at varying rates throughout the world.
The most effective means of detecting and diagnosing IBD and many other GI diseases is having an endoscopy - a procedure that enables gastroenterologists to directly examine the GI tract. GI doctors and their teams perform approximately 170 million endoscopies each year, in many cases recording video of the procedure with the lenses and lights built into the endoscopes. These recorded endoscopies can then be referenced by clinical teams as part of ongoing care.
Other medical specialties benefit from cutting-edge technologies: surgeons use robotic equipment to assist with surgeries, and pathologists use software to accelerate the interpretation of whole slide images. Similar tools for GI doctors, however, have not been developed and made available for clinical practice and research, resulting in sub-standard care for the growing number of patients living with GI disease.
Satisfai Health is on a mission to close this technical gap, equipping GI doctors with tools to enhance the quality, speed, and affordability of the detection, diagnosis, assessment and treatment of GI disease. Their AI products enable real-time endoscopic video analysis, making it easier for clinicians to quickly and accurately detect lesions relevant to IBD, cancer and Barrett's Esophagus, with support for additional GI diseases planned. All four of their AI products are approved for investigational use both by GI physicians in clinical practice and by pharmaceutical companies and contract research organizations (CROs) as part of the drug development process.
“Data has been a core part of our strategy from the beginning - developing partnerships with leading GI organizations - from Olympus, to AIG Hospitals, to Alimentiv. We’re as proud of the AI solutions we’ve built, as we are enthusiastic about the AI products in our pipeline. Each AI solution we develop will help us achieve our mission to offer the most comprehensive AI platform for GI,” says Solveig Johannessen, Chief Operating Officer at Satisfai Health.
When Satisfai set out to build an algorithm - now called Certai - from scratch to score the severity of IBD, they first needed training datasets of thousands of endoscopic videos. Initially, two GI experts - including Satisfai’s founder - labeled these video datasets by hand. The team quickly learned that, while this in-house approach would produce high quality labels, it would not scale.
“It's a slow process when you have just one doctor going through tens of thousands of frames, especially if they’re practicing and have clinical obligations. Even as a small company it’s not a feasible approach to getting your data labeled - it just takes too long,” says Jennifer Nam, Clinical Operations Lead at Satisfai Health.
Satisfai needed a data annotation solution that offered high throughput, reliably produced accurate annotations, scaled up and down with their needs throughout the model development lifecycle, was easy to use, and was flexible enough to support increasingly complex annotations as their model improved and their training initiatives became more targeted.
Satisfai found this solution in Centaur Labs, the scalable data annotation platform for biomedical and life sciences. The two companies have been collaborating since 2020 to build training and test datasets, creating approximately 200,000 labels and generating 1.8M qualified reads across numerous projects.
Satisfai first worked with Centaur Labs to build a dataset to determine the quality and scorability of video footage collected during colonoscopies. To assess quality, Centaur Labs completed a series of 3 labeling tasks: first determining whether a video clip was taken in the colon and unobstructed by tools collecting biopsies, then whether that clip was of high enough quality to score, and finally the reason for any low-quality clips. Satisfai’s GI professionals created 100-150 Gold Standards for each task, so each class had solid ground truth to ensure quality at scale. Centaur Labs advised Satisfai on how best to sample frames from their video clips, ensuring the most important parts of their videos were labeled. Ultimately, labelers classified one frame sampled from each video clip.
The first task determined whether the frames were both relevant to what a GI doctor would want to assess as part of a colonoscopy and unobstructed by procedural tools, classifying each video as “outside colon”, “tool visible”, “both” or “neither”. The second task determined the quality of the relevant frames, classifying each video as “yes” (high quality) or “no” (not high quality). The final task looked at the low-quality frames only, classifying the quality issue as “fresh blood”, “blurry/water”, “feces” or “none”.
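The routing between the three tasks can be sketched in a few lines of Python. This is an illustrative sketch, not Centaur Labs' or Satisfai's implementation: the function name `next_task` is hypothetical, and it assumes that only frames labeled "neither" in the first task move on to the second, and only frames labeled "no" in the second move on to the third.

```python
from typing import Optional

# Class names are taken from the three task descriptions above.
TASK1_CLASSES = ("outside colon", "tool visible", "both", "neither")
TASK2_CLASSES = ("yes", "no")
TASK3_CLASSES = ("fresh blood", "blurry/water", "feces", "none")


def next_task(task1: Optional[str] = None, task2: Optional[str] = None) -> Optional[int]:
    """Return the number of the next task for a frame, or None if it is done.

    Hypothetical helper: the routing rules are an assumption inferred from
    the task descriptions, not a documented pipeline.
    """
    if task1 is None:
        return 1  # every frame starts with the relevance/obstruction task
    if task1 not in TASK1_CLASSES:
        raise ValueError(f"unknown task 1 label: {task1!r}")
    if task1 != "neither":
        return None  # outside the colon and/or obstructed: stop here
    if task2 is None:
        return 2  # relevant and unobstructed: ask about quality
    if task2 not in TASK2_CLASSES:
        raise ValueError(f"unknown task 2 label: {task2!r}")
    # Only low-quality frames get a reason assigned in task 3.
    return 3 if task2 == "no" else None
```

Under these assumptions, `next_task("neither", "no")` routes a frame to the third task, while `next_task("tool visible")` ends the cascade after the first.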
Centaur Labs labeled 10,000 videos for the first task, 8,200 videos for the second, and 5,300 videos for the third, all at a rate of 4,000 videos per day. Each label was created from an average of 10-16 qualified opinions per video. Interrater agreement was 85-95% and in 90-95% of cases the Centaur Labs annotations were the same as the Gold Standards.
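To make the two quality figures concrete, here is a minimal sketch of how several opinions might be combined into one label and then checked against Gold Standards. The aggregation method Centaur Labs actually uses is not described here, so a plain plurality vote stands in for it; the function names and the definition of "agreement share" are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def aggregate(opinions: Sequence[Hashable]) -> Tuple[Hashable, float]:
    """Plurality vote (an assumed stand-in for the real aggregation).

    Returns the most common opinion and the share of opinions that match it.
    Ties go to the label that was seen first.
    """
    if not opinions:
        raise ValueError("at least one opinion is required")
    label, votes = Counter(opinions).most_common(1)[0]
    return label, votes / len(opinions)


def gold_standard_accuracy(consensus: Dict[str, Hashable], gold: Dict[str, Hashable]) -> float:
    """Fraction of Gold Standard items whose consensus label matches the expert label."""
    shared = [item for item in gold if item in consensus]
    if not shared:
        raise ValueError("no Gold Standard items were labeled")
    return sum(consensus[item] == gold[item] for item in shared) / len(shared)
```

For example, ten opinions split eight "yes" to two "no" would give a consensus of "yes" with an agreement share of 0.8.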
“The speed of Centaur Labs’ work - it’s incredible. If you've never used the product, it’s hard to even realize how much more efficient you can be at labeling if you've got thousands of medically knowledgeable people looking at your data and telling you how it should and shouldn’t be labeled. It’s really the best way to label data - I don’t know what else could be better. I’ve set up a project to label 10,000 frames, and have had results back in as little as a week. It’s simplified my life to the nth degree - I don't have to hound individual clinicians for labeled data. I don’t have my engineering colleagues on my back asking for updates. I just set up a project, release it, and I can share the results before anyone thinks to ask for an update. And I have visibility into the status of the project the whole time, so I know we’re making progress,” says Nam.
Satisfai and Centaur Labs also collaborated on an ulcerative colitis (UC) scoring project, UC being one of the diseases that falls under IBD. A common way to score the severity of UC is to use the Ulcerative Colitis Endoscopic Index of Severity (UCEIS). This scoring system evaluates 3 features of the colonic mucosa: bleeding, vascular pattern, and erosions and ulcers. To score UC severity, Centaur Labs completed four labeling tasks, scoring each video on the three UCEIS features and also according to a different system, the Mayo UC scale. With 1,000 Gold Standards for each task, labelers labeled one frame from each video clip, though 4 adjacent frames were sampled and provided as context. Centaur Labs scored each frame for bleeding and erosion/ulcer severity on scales of 0-3, for vascular pattern on a scale of 0-2, and on the Mayo UC scale of 0-3. If the frame was not of high enough quality to assess, each task had an option to label it as such.
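The four score ranges lend themselves to a small validation sketch. The class and field names below are hypothetical, and `None` is used here to stand for the "not high quality enough to assess" option. The total is the usual UCEIS sum of the three descriptors (0-8), a detail from the published index rather than from this project description.

```python
from dataclasses import dataclass
from typing import Optional

# Score ranges as stated in the task description above.
RANGES = {
    "bleeding": (0, 3),
    "erosions_ulcers": (0, 3),
    "vascular_pattern": (0, 2),
    "mayo": (0, 3),
}


@dataclass(frozen=True)
class FrameScores:
    """Hypothetical container for the four per-frame scores.

    None means the frame could not be assessed for that task.
    """
    bleeding: Optional[int]
    erosions_ulcers: Optional[int]
    vascular_pattern: Optional[int]
    mayo: Optional[int]

    def __post_init__(self) -> None:
        # Reject any score outside its documented range.
        for name, (low, high) in RANGES.items():
            value = getattr(self, name)
            if value is not None and not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}-{high}")

    def uceis_total(self) -> Optional[int]:
        """Sum of the three UCEIS descriptors, or None if any is unassessed."""
        parts = (self.bleeding, self.erosions_ulcers, self.vascular_pattern)
        if any(part is None for part in parts):
            return None
        return sum(parts)
```

A frame scored 2 for bleeding, 3 for erosions and ulcers, and 1 for vascular pattern would have a UCEIS total of 6; the Mayo score is kept separately because it belongs to a different system.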
Running all four tasks concurrently, Centaur Labs was able to produce 48,000 labels in a week. Centaur Labs labeled each of the 42,000 videos four times, once per task, with each task running at a rate of 12,000 videos/week. Each label was created from an average of 8-10 qualified opinions per video. Interrater agreement was 81-84% and in 61-70% of cases the Centaur Labs annotations were the same as the Gold Standards. The Satisfai team was able to add new data to their existing tasks as it became available from partner institutions - no additional setup needed.
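The throughput figures fit together as simple arithmetic, worked through below. The duration and the range of qualified reads are derived estimates that assume a constant labeling rate and 8-10 opinions on every label; they are not figures reported by either company.

```python
TASKS = 4                         # three UCEIS descriptors plus the Mayo UC scale
VIDEOS = 42_000                   # videos labeled once per task
VIDEOS_PER_TASK_PER_WEEK = 12_000

# Four concurrent tasks at 12,000 videos/week each.
labels_per_week = TASKS * VIDEOS_PER_TASK_PER_WEEK

# Every video receives one label per task.
total_labels = TASKS * VIDEOS

# Assumes the rate stays constant for the whole project.
weeks = VIDEOS / VIDEOS_PER_TASK_PER_WEEK

# Assumes every label is backed by 8-10 qualified opinions.
low_reads, high_reads = total_labels * 8, total_labels * 10

print(f"{labels_per_week:,} labels/week, {total_labels:,} labels in about {weeks} weeks")
print(f"roughly {low_reads:,} to {high_reads:,} qualified reads")
```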
“When you have just one person labeling your data, it’s going to be very subjective. But when you have the Centaur Labs network, you can see the quality - you can see where there’s unanimous agreement on a label, where there’s low agreement that could warrant investigation. Getting 10+ opinions makes a big difference in terms of accuracy, quality and consistency. You can be more confident relying on the labels,” says Nam.
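The kind of review Nam describes, separating unanimous labels from contested ones, could look roughly like the sketch below. The function name `triage` and the 0.6 threshold are illustrative assumptions, not part of the Centaur Labs platform.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def triage(
    opinions_by_item: Dict[str, Sequence[Hashable]],
    low_threshold: float = 0.6,  # hypothetical cut-off for "low agreement"
) -> Tuple[List[str], List[str]]:
    """Split items into unanimous ones and ones that may warrant investigation."""
    unanimous, low_agreement = [], []
    for item, opinions in opinions_by_item.items():
        if not opinions:
            continue  # nothing to assess yet
        # Share of opinions that match the most common answer.
        top_votes = Counter(opinions).most_common(1)[0][1]
        share = top_votes / len(opinions)
        if share == 1.0:
            unanimous.append(item)
        elif share < low_threshold:
            low_agreement.append(item)
    return unanimous, low_agreement
```

Items that fall between the two lists have a clear majority without being unanimous; whether those need a second look is a judgment call for the team reviewing the labels.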
With ambitions to develop AI solutions that span the GI tract, having a scalable data annotation system that reliably produces high quality labels is critical.
“The data and the labels can make or break the quality of the product, the outcome of the clinical trial, etc. Having Centaur Labs working alongside us, able to produce high quality labels so quickly has enabled us to improve our algorithm exponentially,” says Nam.
The flexibility of the platform to support both multiple types of annotation tasks and large volumes of data enables Satisfai to use a single platform for all data annotation throughout the training and evaluation stages of model development.
“Centaur Labs has impacted us greatly, accelerating the momentum we have in all of our projects. Without them, we wouldn't nearly be able to make any of the timelines we have for our current projects. We’d be set back months, if not by a year or so. We have an endless amount of exciting projects in the pipeline to support our ambitions to be the most comprehensive AI platform for GI, so we expect our partnership with Centaur Labs to be a long, long term collaboration.”