Pathologists play an essential role in a patient’s care journey. When a lump is found in the breast as part of routine care, and tissue is biopsied for further analysis, that tissue goes to the pathology lab where it is stained, mounted on a slide, and - increasingly - scanned to produce digital images. Does the patient have cancer? What type of cancer? What stage of cancer? Should the patient get a mastectomy or begin a chemotherapy regimen? Each of these questions is decided or informed by the pathologist’s analysis.
Pathologists also frequently have diverging interpretations of pathology images. Inter-rater reliability - the extent to which pathologists agree with one another and come to the same conclusions - is often very low across many types of pathology analysis. For both of these reasons, it’s critically important that pathologists have cutting-edge tools to effectively and confidently interpret pathology slides.
Dr. Thomas Fuchs, pioneer of the field of computational pathology - the use of AI to analyze images of tissue samples to identify disease and predict outcomes - saw an opportunity to develop AI-enabled tools pathologists could use to inform their analyses of pathology slides.
Paige, the company Dr. Fuchs founded based on his trailblazing research, is a global leader in end-to-end digital pathology solutions and clinical AI applications, differentiated by its weakly supervised approach to model development. “Our unique approach allows us to leverage both structured and unstructured data in tandem, and build clinical-grade models with unprecedented performance,” says Brandon Rothrock, Director of AI Science. The results of their approach speak for themselves - Paige was the first company to receive FDA approval for an AI product in digital pathology with its Paige Prostate Detect solution. The company’s newest product - the Paige Breast Suite, launched in 2021 - aims to help pathologists read slides more efficiently and reduce errors by identifying suspicious regions of interest.
As part of the Breast Suite, Paige is developing a new algorithm that enables pathologists to more easily identify key cellular features within breast cancer tissue. To begin development of this detection algorithm, the team needed a dataset of pathology slides annotated to identify the presence of the cellular feature of interest. The team had 4 staff pathologists - including the Chief Medical Officer - willing to tolerate this tedious annotation task. All of these pathologists had full time roles at Paige unrelated to data annotation. In response, the AI team quickly prototyped an internal annotation tool where the pathologists could view a section of a pathology slide and indicate whether the cellular feature of interest was or was not present. Each pathologist annotated the entire dataset, because the team wanted to build the model on high quality ground truth based on concordant assessments among readers.
While this bootstrapped data annotation system got the team started, and enabled them to build a model with an F1 score of .6, it wouldn’t get them to the finish line with a product their pathology customers could trust. The existing data annotation system was inflexible and could not scale to high throughput.
“Pathology slides are large images - we started thinking ‘how are we going to annotate thousands of structures of interest in each slide?’,” said Fausto Milletarì, Sr. AI Scientist at Paige. “When we knew how much data we needed to improve, we realized we needed a scalable solution to get accurate annotations. Only 4 people in the world could annotate our data using our internal tools - this would not scale.”
Paige needed a data annotation solution that could produce accurate annotations, scale up and down with their needs throughout the model development lifecycle, and that was flexible enough to support both multiple annotation types and increasingly complex annotations as their model improved and their model training initiatives became more targeted.
Paige found this solution in Centaur Labs, the scalable data annotation platform for medical and life sciences. Paige first worked with Centaur Labs to build a test dataset they could use to evaluate the performance of their current model, answering questions like - was their model detecting cells that indeed contained the cellular feature of interest? Or was it identifying false positives?
Centaur Labs annotated a test dataset of 20,000 pathology images of breast tissue. Each image had a bounding box drawn around one cell and annotators needed to determine whether or not the indicated cell contained the cellular feature of interest.
Paige first had 5 staff pathologists annotate a subset of 150 images that would serve as examples of high quality annotations - or ‘Gold Standards’. When there was a 3-2 disagreement across the pathologists, a senior pathologist arbitrated and applied the final annotation. In 57% of cases, a clear majority of the 5 pathologists agreed on whether the cellular feature was present or not, and the final annotation was applied without the senior pathologist. In 42% of cases it was less clear - the pathologists split 3-2, so the final annotation was decided by the senior pathologist. In only 31% of cases was there unanimous agreement among the 5 pathologists, which shows that detecting this cellular feature is difficult even for experienced pathologists.
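The consensus rule described above can be sketched in a few lines of Python. This is a minimal illustration, not Paige’s actual tooling: the function name `gold_standard_label`, the boolean encoding of each vote, and the `arbitrate` callback standing in for the senior pathologist are all assumptions made for the example.

```python
from typing import Callable, Sequence, Tuple


def gold_standard_label(
    votes: Sequence[bool],
    arbitrate: Callable[[Sequence[bool]], bool],
) -> Tuple[bool, str]:
    """Return (label, how_decided) for one image read by 5 pathologists.

    votes: True where a reader marked the cellular feature as present.
    arbitrate: hypothetical stand-in for the senior pathologist's decision.
    """
    if len(votes) != 5:
        raise ValueError("expected exactly 5 votes")
    positives = sum(votes)
    if positives in (0, 5):
        return positives == 5, "unanimous"
    if positives in (1, 4):
        return positives == 4, "majority"
    # 3-2 split: defer to the senior pathologist.
    return arbitrate(votes), "arbitrated"


# Example: four readers see the feature, one does not.
print(gold_standard_label([True, True, True, True, False], lambda v: True))
# (True, 'majority')
```

Returning the decision path alongside the label makes it easy to tally how many images were unanimous, decided by majority, or sent to arbitration.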
The Paige team then used these agreement rates among professional pathologists as a quality benchmark for annotations generated by Centaur Labs. If in at least 47% of cases Centaur Labs annotated the Gold Standards the same way as the 5 pathologists, then Paige would consider the quality of the Centaur Labs annotations to be acceptable.
Once the Gold Standards were created, and the annotation quality benchmark was defined, Centaur Labs began generating annotations for the entire test set of 20,000 images. Centaur Labs worked through the set at a rate of approximately 4,000 images per week, and generated an average of 10 qualified opinions per image. Centaur Labs also exceeded the quality benchmark nearly two times over - in 90% of cases the Centaur Labs annotations were the same as the Gold Standards.
“Centaur Labs annotated the first batch of data so quickly it seemed unreal. The speed of annotations we can now get is 10 fold - we can get 1,000 high quality annotations in one day,” said Milletarì.
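The benchmark check amounts to comparing each aggregated label against its Gold Standard and measuring the fraction that match. The sketch below assumes the multiple opinions per image are combined by simple majority vote; the source does not say how Centaur Labs actually aggregates opinions, and the names `majority_label` and `agreement_rate`, along with the toy data, are hypothetical.

```python
from typing import Dict, List

BENCHMARK = 0.47  # acceptance threshold stated by Paige


def majority_label(opinions: List[bool]) -> bool:
    """Combine several opinions by simple majority (ties count as absent)."""
    return sum(opinions) * 2 > len(opinions)


def agreement_rate(
    opinions_by_image: Dict[str, List[bool]],
    gold: Dict[str, bool],
) -> float:
    """Fraction of Gold Standard images where the aggregated label matches."""
    matches = sum(
        majority_label(opinions_by_image[image_id]) == label
        for image_id, label in gold.items()
    )
    return matches / len(gold)


# Toy data, invented for illustration only.
gold = {"img1": True, "img2": False, "img3": True, "img4": False}
opinions = {
    "img1": [True, True, False],
    "img2": [False, False, False],
    "img3": [False, False, True],  # aggregated label disagrees with gold
    "img4": [False, True, False],
}
rate = agreement_rate(opinions, gold)
print(f"agreement={rate:.2f}, acceptable={rate >= BENCHMARK}")
# agreement=0.75, acceptable=True
```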
After evaluating their model with this test dataset, the Paige team initiated multiple rounds of model retraining. As of November 2022, the latest generation of Paige’s model has achieved an F1 score of .83, up from .6.
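For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall, computed from counts of true positives, false positives, and false negatives. The sketch below shows the calculation; the counts are invented for illustration and are not Paige’s results.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall)."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)  # share of detections that are correct
    recall = tp / (tp + fn)     # share of real features that are detected
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts chosen to land on the two scores mentioned in the text.
print(round(f1_score(tp=60, fp=40, fn=40), 2))  # 0.6
print(round(f1_score(tp=83, fp=17, fn=17), 2))  # 0.83
```

False positives pull down precision and false negatives pull down recall, so a test dataset that exposes either kind of error gives the team a concrete target for the next round of retraining.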
“We were able to achieve this dramatic improvement, in part, because of Centaur Labs. The high quality annotations they generated at scale allowed us to use a tremendous amount of data to improve our model,” said Milletarì.
Building on this progress, the Paige team is now exploring other ways to further tune and train their model. They now want to confirm the model is not producing false negatives - that is, overlooking cells that contain the feature of interest. The flexibility of the Centaur Labs platform to support the development of multiple types of annotation tasks and test datasets means the Paige team can use a single platform for all data annotation throughout the model development lifecycle.
“Working with Centaur Labs is the best way to get thousands of pieces of data annotated in a day, rather than weeks. We don’t have pathologists that are so available to do our annotations, and we don’t have so many people that we can compare many opinions to create consensus,” says Milletarì. “With Centaur Labs we have both - an army of people doing high quality annotations, and annotating the data very quickly.”