
Our collection of open source datasets for medical AI

Chris Hilger
December 3, 2020

We live and breathe medical datasets for AI!

Open source data can be very valuable when you start thinking through how to build a medical data labeling pipeline (more on that in our free white paper).

We thought it would be helpful to organize some of our favorite open source datasets into a list and share it with the community.

In our list, you can explore dozens of datasets by size, category, modality (including X-ray, ultrasound, whole slide images, CT scans, and ECGs), and more. We have also included a brief description of each dataset to help you quickly understand the specific abnormalities of interest, the balance of the data, and the annotations included, such as medical image classifications or segmentations.
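If you keep your own copy of a catalog like this, filtering it by modality, annotation type, or size takes only a few lines of Python. The sketch below is a rough illustration, not a description of how our list is built: the `DatasetEntry` fields, the `find_datasets` helper, and every row in `CATALOG` are hypothetical placeholders rather than real datasets from the collection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    """One row of a dataset catalog. All fields are illustrative assumptions."""
    name: str
    modality: str      # e.g. "X-ray", "Ultrasound", "CT", "ECG"
    category: str      # e.g. "Radiology", "Cardiology"
    num_samples: int   # size of the dataset
    annotation: str    # e.g. "classification" or "segmentation"


# Placeholder rows for demonstration only; these are NOT real datasets.
CATALOG = [
    DatasetEntry("placeholder-chest-set", "X-ray", "Radiology", 5000, "classification"),
    DatasetEntry("placeholder-cardiac-set", "Ultrasound", "Cardiology", 1200, "segmentation"),
    DatasetEntry("placeholder-rhythm-set", "ECG", "Cardiology", 20000, "classification"),
]


def find_datasets(catalog, modality=None, annotation=None, min_samples=0):
    """Return entries that match every filter supplied, largest dataset first."""
    matches = [
        entry for entry in catalog
        if (modality is None or entry.modality == modality)
        and (annotation is None or entry.annotation == annotation)
        and entry.num_samples >= min_samples
    ]
    return sorted(matches, key=lambda entry: entry.num_samples, reverse=True)


if __name__ == "__main__":
    # List classification datasets with at least 1,000 samples.
    for entry in find_datasets(CATALOG, annotation="classification", min_samples=1000):
        print(f"{entry.name}: {entry.modality}, {entry.num_samples} samples")
```

Leaving a filter as `None` skips it, so the same helper covers "everything in ultrasound" and "large segmentation sets" without separate functions.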

A screenshot of our table containing dozens of open source medical image datasets


Access the full collection here.

If you know of any datasets that should be added to this list, please let us know!

