Today, we’re getting to know Tom Gellatly, a Centaur Labs co-founder and the VP of engineering!
Tom studied computer science engineering at the University of Michigan. His first job out of college was as the first employee at Sidecar, where he helped launch the first peer-to-peer ridesharing app. He managed Sidecar’s product engineering team until their acquisition by General Motors in 2016. Soon after that, he was recruited by Cruise Automation, a self-driving car startup and another GM acquisition. At Cruise, Tom managed the mapping and data labeling engineering teams. After Tom left Cruise, Erik (co-founder and CEO of Centaur Labs) connected with him about potentially joining Centaur. Given that Tom had a strong startup and data labeling background and he knew Zach and Erik from their college years, Centaur seemed like a perfect fit. Tom has now been with the company for 2.5 years.
I was introduced to data labeling when I joined Cruise, which at the time was about 60 people. Their AI engineers were building algorithms for the sensors on their cars and they needed lots of labeled training data. Cruise had some home-grown data labeling tools but their results weren't great and throughput was far too low. So, one of my first tasks was: How can we take this labeling pipeline that's currently this small internal operation and plug it into Scale API (a new company at the time) to label this data at scale?
At first, this process was straightforward. We designed simple labeling tasks for drawing bounding boxes around cars, bikes, pedestrians, etc. and sent them to Scale API. We designed tasks to QA labels and redraw them if necessary. Things started to get complex as the volume scaled and other engineering teams got involved… everybody wanted labels! Our class definitions were continuously evolving (e.g., an articulated bus looks and behaves differently from a normal bus), which required us to go back and relabel existing data. While Scale handled the bulk of the easy tasks, Cruise continued to develop its own internal tools to solve more niche problems like sensor fusion labeling.
One reason data labeling is so challenging is that you can't fully outsource it: it gets too expensive, and you'll always need some amount of custom tooling to handle your particular use cases. Data labeling is basically an infinite problem: the need for data just continues to increase as the product matures.
At Cruise, most labeling tasks were objective and could be done by anyone who read the instructions and paid attention. If you show someone a street scene, they can tell you definitively whether there’s a stop sign or not. The problem was really about scale, containing costs, and building a smart pipeline.
At Centaur, where we label medical data, that is not the case. If you show a chest X-ray to a room of expert doctors, they’re going to disagree with each other on how many lung nodules there are. Many of our tasks don’t have a “correct” answer per se, which means we need to collect many opinions and figure out how to combine them intelligently. Our tasks also require a higher degree of skill so we can’t ask just anyone. We continuously monitor our labelers’ performance and only trust those who maintain a high level of accuracy. For some tasks, labelers need to review dozens of cases before they’re good enough to be trusted. We need to be extra careful with our instructions and QA because the bar for quality is higher given the nature of the data we work with.
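To make the idea of "collecting many opinions and combining them" concrete, here is a minimal sketch of one common approach, an accuracy-weighted vote. This is an illustration only, not Centaur's actual method: the function name `weighted_vote`, the `min_accuracy` threshold, and the sample accuracy scores are all hypothetical.

```python
from collections import defaultdict


def weighted_vote(opinions, accuracy, min_accuracy=0.7):
    """Combine labeler opinions into one answer (illustrative sketch).

    opinions: dict mapping labeler id -> label they chose
    accuracy: dict mapping labeler id -> measured accuracy in [0, 1]
    min_accuracy: hypothetical trust threshold; labelers below it are ignored
    Returns (winning_label, share_of_trusted_weight), or (None, 0.0)
    if no trusted labeler gave an opinion.
    """
    totals = defaultdict(float)
    for labeler, label in opinions.items():
        acc = accuracy.get(labeler, 0.0)  # unknown labelers get no weight
        if acc < min_accuracy:
            continue  # only trust labelers who maintain high accuracy
        totals[label] += acc

    if not totals:
        return None, 0.0

    best = max(totals, key=totals.get)
    return best, totals[best] / sum(totals.values())


# Hypothetical example: four labelers read the same chest X-ray.
opinions = {"a": "nodule", "b": "no_nodule", "c": "nodule", "d": "no_nodule"}
accuracy = {"a": 0.9, "b": 0.8, "c": 0.75, "d": 0.5}
label, confidence = weighted_vote(opinions, accuracy)
```

In this made-up example, labeler `d` falls below the threshold and is ignored, so the two "nodule" votes outweigh the single trusted "no_nodule" vote. A low confidence share is a natural signal that a case is genuinely ambiguous and may need more opinions.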
I managed engineers at Sidecar and Cruise, but I joined Centaur as a founder. I think for this phase in my career it’s less about trying to do it all myself and more about trying to set up other people to succeed and to build systems that are going to scale. So, it’s been a cool opportunity to take everything that I know and everything that I’ve learned and start again from scratch. And you can never do it “right,” but at least I’m able to draw a lot from mistakes I’ve made and things I’ve worked on in the past, so the goal is to set up this company, tech stack, and product to scale and to be easier to grow.
I love that the problem that we’re solving is the perfect combination of meaningful, fascinating, and hard. There are a lot of hard technical challenges that are quite dull and/or not beneficial to humanity. Every day during our customer standup, we discuss the problems our customers are trying to solve. And all of them are completely different and so fascinating! There is also the human psychology aspect of it with our labelers, where we think, “How can we get them to do this task well? How do we make it fun? How do we make sure that they’re learning from their experience?”
I would say our “Film Strip”! This feature is used for tasks with 3D images, like labeling a brain MRI. Instead of one 2D frame, you actually have a stack of them. Imagine you’ve got a series of 100 images in a row. It doesn’t really work to give a labeler the task of labeling all 100 images... it’s too hard and takes too long. So, what we do is break down the problem: we give each labeler a chunk of images to label while showing them the rest as context. We envisioned this feature working, we shipped it, we realized “OMG it totally works,” and now it’s something we lean really heavily on.
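The chunking idea can be sketched in a few lines. This is only an illustration under assumptions, not the actual Film Strip implementation: the function name `film_strip_tasks`, the fixed `chunk_size`, and the choice to show every other frame as context are all hypothetical.

```python
def film_strip_tasks(num_frames, chunk_size):
    """Split a stack of frames into labeling tasks (illustrative sketch).

    Each task asks the labeler to label one contiguous chunk of frames
    and lists the remaining frames as read-only context.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    tasks = []
    for start in range(0, num_frames, chunk_size):
        end = min(start + chunk_size, num_frames)  # last chunk may be shorter
        to_label = list(range(start, end))
        context = [i for i in range(num_frames) if i < start or i >= end]
        tasks.append({"label": to_label, "context": context})
    return tasks


# A 100-frame stack split into chunks of 10 frames each.
tasks = film_strip_tasks(num_frames=100, chunk_size=10)
```

With these numbers the stack becomes 10 tasks, each asking for labels on 10 frames while the other 90 remain visible for orientation.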
I’m excited about growth. We’ve done a huge amount with a very small team, and we’re entering into a phase where instead of shipping our product out there fast and solving problems as quickly as possible, our team is able to develop good generalized scalable solutions to things. This is all so that we can onboard more and more customers and do this stuff sustainably at scale, all while building and empowering a team who can own these problems.
Well, I just bought a house for the first time, so I’m learning how to garden. I like to go on long bike rides and eat at new restaurants (my wife and I are kind of foodies), and I love to go hiking!
The answer changes, but I think my superpower is that people trust me. People listen to me, and I listen to people. So, when I ask them to do something difficult, or when I say this is gonna work, they believe me, and then we can do it!
Try to give people enough context to understand a problem or proposed solution. Over-explain, write things down, repeat yourself, etc. When you've spent a while thinking about something, it's easy to forget that everyone comes into a conversation with a different level of understanding.