NLP, named entity recognition, and other AI methods extract powerful insights from unstructured medical data stored as text, such as EMRs, de-identified patient information, and scientific literature.
Our annotation services can quickly tag thousands of text strings, conversations, paragraphs, and more, organizing large volumes of data for AI applications. Examples include classifying research citations, identifying medical misinformation on social media sites, and determining drug-target relationships in pharmacological literature.
Dataseer’s vision is a world where full research datasets, not just the final paper, are made available to the scientific community worldwide. Centaur Labs provided the skilled network to label data types across dozens of domains in over 20,000 snippets from academic papers. The speed of Centaur’s labeling pipeline allowed Dataseer to augment rare classes in their training set quickly and confidently.
Factmata teamed up with Centaur Labs to train their content moderation algorithm to identify medically harmful misinformation related to COVID-19. For a dataset of over 4,000 tweets, Centaur Labs collected over 14 opinions per tweet in less than one week. Factmata’s non-linear SVM classifier achieved an initial accuracy of 68% with this input.
scite is developing a smarter citation metric and needs thousands of classified snippets from academic papers to do so. Centaur matched over 7,000 text citation snippets with a skilled labeling force capable of extracting nuance from noise. Work that would have taken scite’s internal labeling team of PhDs months was completed in a few weeks.