Volastra Therapeutics quantifies chromosome instability at scale with data annotations from Centaur Labs

Ali Devaney, Marketing
January 11, 2023

Ten million people died of cancer in 2020 alone, according to estimates by the World Health Organization - that’s roughly the population of Sweden, and more than the 6.6 million deaths to date from the global COVID-19 pandemic. To make matters worse, the global burden of cancer is predicted to increase by approximately 50% by 2040.

Volastra Therapeutics, a Fierce15 award-winning biotechnology company, is on a mission to stop cancer in its tracks. To do this, it is taking a new approach. Instead of targeting genetic mutations, which have long been the focus of biotech and pharma drug development efforts, Volastra is targeting the chromosomal instability, or “CIN”, underlying those genetic changes.

CIN - present in 60-80% of human tumors - allows cancers to progress faster and more effectively resist treatment. Volastra has combined its deep understanding of CIN, drug development and data science to develop a powerful CINtech platform. Volastra uses this platform to identify high priority therapeutic targets, gain mechanistic biological insight, and develop novel drugs to treat chromosomally unstable cancers, exemplified by their lead KIF18A inhibitor program.

The data annotation bottleneck 

As part of their effort to understand and master chromosome instability, Volastra is developing machine learning models to help quantify chromosome instability at scale. The team will use these models across the drug development process - from powering preclinical discovery work, to improving their understanding of the underlying biology, to informing their prioritization of targets for further development. Eventually they expect these models will help them quickly characterize patient tissue samples, determine which patients will respond best to their therapies, and decide which patients should participate in their future clinical trials.

To begin developing these models, the team needed a training dataset of pathology slides annotated to identify the presence and type of mitotic events. A few leaders at Volastra who were experienced at characterizing CIN phenotypes - including some of the company’s senior management team - began the arduous and slow process of annotating these pathology images by hand on nights and weekends.

The team knew from the outset that this approach to building the training datasets that power their models would not scale. “In order to build robust deep learning models that can - in the future - automate important CIN-specific analyses, we need to collect a very large volume of high quality annotations for these informative cellular events.”

“These events are very rare and time consuming to find, count, and classify. With the many competing priorities at a fast growing biotech, we knew it would be impossible to scale the required data annotation work in-house,” says Sarah Bettigole, Head of Immunology and Data Science.

Volastra needed a higher-throughput data annotation system that could produce accurate annotations much more quickly than the team could on its own. The system needed to be flexible enough to handle multiple annotation tasks as their needs changed over time, and to require a much lower time investment from the Volastra team.

High quality data annotation that scales

Volastra found this solution in Centaur Labs, the scalable data annotation platform for medical and life sciences. Volastra partnered with Centaur Labs because of its network of annotators with medical backgrounds, its scalability, its multi-reviewer scoring system, and the insights about data annotation quality the platform provides. Together, these capabilities gave the Volastra team confidence that they could get both high quality annotations and higher throughput.

Volastra worked with Centaur Labs to annotate two datasets of cancer cell lines stained with DAPI, and three datasets of human tumor samples stained with H&E. The team wanted to segment all cells in specific mitotic stages in the 15,000 DAPI stained images, and then identify and classify CIN-specific defects. For the three H&E stained datasets, they wanted to segment cells in specific mitotic stages in 119,000 tiled images from various tumor types. Volastra started by annotating subsets of these datasets to serve as examples of high quality annotations - or ‘Gold Standards’. Volastra’s staff pathologists annotated each of these subsets, and the majority opinion became the final label. Once the Gold Standards were created, Centaur Labs generated 154,000 annotations for the datasets.

Centaur Labs segmented the DAPI stained images at a rate of approximately 3,220 images per week, and generated an average of 5 qualified opinions per image. Centaur Labs then classified the images of the 20,000 mitotic cells at a rate of 4,200 per week, with 12 qualified opinions per image. For the H&E images, Centaur Labs segmented the cells in specific mitotic stages across the 119,000 tiled images at a rate of approximately 7,300 images per week, and generated an average of 8 qualified opinions per image.
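To put these rates in perspective, the short Python sketch below works through the arithmetic. The item counts, weekly rates, and opinions per image come from the paragraph above; the assumption that each weekly rate held steady for the whole task is ours, so the resulting durations and totals are rough illustrations rather than reported results.

```python
# Figures quoted above: (items, items annotated per week, average opinions per item).
tasks = {
    "DAPI segmentation": (15_000, 3_220, 5),
    "Mitotic cell classification": (20_000, 4_200, 12),
    "H&E segmentation": (119_000, 7_300, 8),
}


def implied_weeks(items, per_week):
    """Weeks needed if the quoted weekly rate held for the whole task (an assumption)."""
    return items / per_week


def implied_opinions(items, opinions_per_item):
    """Total qualified opinions implied by the quoted per-image average."""
    return items * opinions_per_item


for name, (items, per_week, opinions) in tasks.items():
    weeks = implied_weeks(items, per_week)
    total = implied_opinions(items, opinions)
    print(f"{name}: ~{weeks:.1f} weeks, ~{total:,} opinions")
```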

“I was very surprised to be able to get annotations at such a fast speed, while maintaining annotation quality similar to our experienced pathologists,” says Nazario Bosco, Associate Director of Biology.

In 61-89% of cases, depending on the task, the Centaur Labs annotations matched the Gold Standards.
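For readers curious how such an agreement figure can be computed, here is a minimal Python sketch for a classification task. It assumes a plain majority vote over each image's opinions and an exact label match against the Gold Standard. This is not Centaur Labs' actual scoring system, which the article does not describe, and it does not cover segmentation tasks, where agreement would typically be judged by region overlap. The function names and toy labels are hypothetical.

```python
from collections import Counter


def majority_label(opinions):
    """Return the most common label, or None when there are no opinions or a tie."""
    counts = Counter(opinions).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def agreement_rate(consensus, gold):
    """Fraction of Gold Standard items whose consensus label matches exactly."""
    matches = sum(1 for item, label in gold.items() if consensus.get(item) == label)
    return matches / len(gold)


# Hypothetical toy data for illustration only, not Volastra's.
gold = {"img1": "metaphase", "img2": "anaphase", "img3": "prophase", "img4": "anaphase"}
opinions = {
    "img1": ["metaphase", "metaphase", "anaphase"],
    "img2": ["anaphase", "anaphase", "anaphase"],
    "img3": ["metaphase", "prophase", "metaphase"],
    "img4": ["anaphase", "metaphase", "anaphase"],
}

consensus = {item: majority_label(votes) for item, votes in opinions.items()}
print(agreement_rate(consensus, gold))  # 0.75 (three of four images match)
```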

Accelerating model training and data annotation systems

Centaur Labs was able to annotate 80 whole slide images in just two weeks - 6 times faster than the Volastra team could have done on their own on nights and weekends. It takes approximately an hour to annotate a pathology slide, so - had they completed the annotations in house - it would’ve taken Volastra around 240 hours (80 slides x 1 hour x 3 opinions) and yielded only 3 opinions per slide. This 80 slide annotation project alone would’ve taken at least 12 weeks to complete.
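The numbers in this comparison fit together as follows. The sketch below simply reproduces the arithmetic in Python. Reading "3 opinions" as three in-house opinions per slide is our assumption, chosen because it is the reading that yields the 240 hour figure. The hours per week value is derived from the other figures, not stated by Volastra.

```python
slides = 80
hours_per_slide = 1       # "approximately an hour" per slide
opinions_per_slide = 3    # assumed: three in-house opinions per slide
in_house_weeks = 12       # stated lower bound for the in-house effort
centaur_weeks = 2         # time Centaur Labs took

in_house_hours = slides * hours_per_slide * opinions_per_slide
hours_per_week = in_house_hours / in_house_weeks  # implied nights-and-weekends load
speedup = in_house_weeks / centaur_weeks

print(in_house_hours, hours_per_week, speedup)  # 240 20.0 6.0
```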

“Working with Centaur Labs has allowed us to improve our timelines significantly, and scale up the amount of annotation we can do,” says Bosco.

The importance of speed becomes even clearer when considering the team’s longer term model development plan, involving thousands of images across numerous datasets. 

Centaur Labs is also improving the workflow for Volastra’s pathologists wherever they would still like to annotate by hand. “Instead of having our experts review every piece of data manually, Centaur Labs combs through the entire dataset and generates a shortlist for focused review,” says Bettigole. From a dataset of 53,000 tumor images, for example, Centaur Labs identified the 7,400 images the pathologists should focus on.

“Working with Centaur Labs has allowed us to use our experts more effectively and make faster progress,” says Bettigole.

Volastra is building a durable data annotation system for the long term. “We are training our models on a large diversity of tumor types,” says Bettigole. The flexibility of the Centaur Labs platform to support both multiple types of annotation tasks and large volumes of high quality data enables Volastra to use a single platform for all data annotation throughout the model development lifecycle.

“We’re excited by the high throughput and quality of annotations we get with Centaur Labs - I would definitely recommend working with them.” - Sarah Bettigole, Head of Immunology and Data Science
