Centaur Labs and Dandelion Health are partnering to give AI developers secure and ethical access to de-identified, annotated clinical data—including images, waveforms and structured health records—representing over four million patients across two health systems so AI developers can build better AI products that improve patient health.
NEW YORK – Today, Dandelion Health, Inc. (“Dandelion”) and Centaur Labs announced a partnership that enables AI developers to access high-quality clinical data to build AI products that improve patient health.
Dandelion, a startup founded in 2020, is building the richest AI training dataset in the world. Its de-identified clinical data includes images, waveforms, streaming monitors, full-text notes, structured health records, and more. Today Dandelion has over 4.5 million patient records from two integrated delivery networks (IDNs) that were specifically chosen for demographic and clinical diversity. This dataset will grow to 10-15 million patient records from five IDNs in the coming years. AI developers use Dandelion data to train new models, get FDA approval, and assess performance of existing models to minimize unwanted bias.
Centaur Labs is the leading scalable data annotation platform for the medical and life sciences industries. The Centaur Labs platform has turned biomedical data annotation into a competitive sport, generating 2 million high-quality annotations weekly from a proprietary network of tens of thousands of doctors, medical students, and other professionals, all of whom compete on the gamified platform to annotate data most accurately. Centaur Labs annotates a wide variety of data, including unstructured clinical notes, scientific papers, radiographic images, pathology slides, auscultation audio files, and more. Centaur Labs will increase the richness of the Dandelion dataset by identifying and marking important features – for example, identifying lung nodules in a chest CT, or tagging drugs mentioned in clinical notes. AI developers can then use these annotated datasets to train algorithms to detect the same findings in medical data the algorithms have not yet seen.
Such algorithms could not only improve clinical workflows, but also help healthcare providers identify conditions that are difficult to detect. Dandelion’s point-in-time data (e.g., a patient’s radiology scans or clinical notes) is linked to longer-term quantitative outcomes (e.g., 10-year mortality). Therefore, a mammography algorithm trained on Dandelion-Centaur Labs data could, for example, not only improve the accuracy of mammogram interpretation, but also predict the 5-year risk of breast cancer.
Through this partnership, AI developers can now access annotated data that spans nearly every aspect of U.S. clinical data – including rare and acute conditions and every type of medical imaging (e.g., CTs, MRIs, X-rays, ultrasounds, and biopsies). The result is better, more ambitious AI products that account for bias, advance science, and improve patient health.
Optimizing the AI pipeline
“Dandelion Health’s north star is to improve patient care by accelerating widespread adoption of accurate, trustworthy and equitable AI in clinical practice. By partnering with Centaur Labs we can better address the data access and data labeling bottlenecks currently slowing progress towards this mission,” says Elliott Green, co-founder and CEO of Dandelion Health.
“We’ve seen many clients – from multinational medical device companies to early-stage startups – struggle to get timely access to the data they need to build their AI models,” says Erik Duhaime, co-founder and CEO of Centaur Labs. “We’re thrilled to partner with Dandelion Health to make it easier for clinical AI developers to get off the starting blocks more rapidly, and get their models into production where they can impact patient care.”