Aiberry builds explainable mental health AI, with a novel affect video dataset annotated by Centaur Labs

Ali Devaney, Marketing
July 6, 2023

An already fragile mental healthcare system was brought to its knees by patient demand in the wake of the pandemic. With headlines like “Half of British psychotherapists turn away patients after cases rise by 10% [...]”, rising suicide rates among youth and BIPOC, and the many quietly struggling without care due to stigma or lack of access, it’s difficult to question the severity of the current mental health crisis. Alongside patients and families, mental healthcare providers carry much of the burden of this strained system - working longer and longer hours, forced to put patients on waitlists as they manage oversubscribed caseloads.

Aiberry, a Health Tech Digital Award-winning start-up, is working to address these challenges by giving mental healthcare providers a comprehensive overview of a patient’s mental health state from the very beginning of the therapeutic relationship. This enables them to create more effective treatment plans and achieve mental health outcomes more efficiently. Founded on the research of Dr. Newton Howard, a leading cognitive neuroscience and computational systems researcher, Aiberry’s AI-powered platform supports mental healthcare providers in their initial mental health screening process, shifting from a paper-based screen to one that is more objective, reliable, and quantifiable. As Aiberry moves its product into production, the data science team is led by Dr. Jason Shumake. The company’s first assessment tool focuses on depression screening and is in clinical use in mental health clinics throughout the US, with plans to add support for additional mental health conditions soon.

In traditional depression screenings, patients visit a healthcare facility and answer 10 to 20 multiple-choice questions regarding the severity of their symptoms. With Aiberry’s multi-modal AI-powered conversational assessment tool, patients open an app on their mobile device or desktop computer and have a guided conversation with their virtual therapeutic assistant, Botberry. This offers a more engaging and dynamic experience for the patient and generates new patient information - both verbal and nonverbal - for analysis. Aiberry’s AI system identifies patterns across this rich patient data - analyzing word choice, sentiment, vocal pause length and frequency, vocal pacing, facial expressions and more. It then aggregates these findings into a patient-specific depression risk score that the provider can incorporate into their evaluation.

Aiberry also breaks down the risk score by sharing “Screening Insights” on Mood, Energy, Concentration, Rate of Speech and Memory Bias.
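In the abstract, that aggregation step resembles a weighted combination of per-modality scores. The sketch below is purely illustrative: the feature names, weights, and normalization are our own assumptions, not Aiberry’s proprietary model.

```python
# Hypothetical modality weights; Aiberry's actual model and features
# are proprietary and almost certainly more sophisticated than this.
MODALITY_WEIGHTS = {
    "text_sentiment": 0.3,
    "speech_pausing": 0.2,
    "speech_pacing":  0.2,
    "facial_affect":  0.3,
}

def depression_risk_score(features: dict) -> float:
    """Combine per-modality scores (each assumed normalized to 0..1)
    into a single 0..1 risk score via a weighted average."""
    return sum(MODALITY_WEIGHTS[k] * features[k] for k in MODALITY_WEIGHTS)

score = depression_risk_score({
    "text_sentiment": 0.8,
    "speech_pausing": 0.6,
    "speech_pacing":  0.5,
    "facial_affect":  0.7,
})
```

The point of a breakdown like the “Screening Insights” is that each term in a combination like this can be surfaced to the provider individually, rather than hiding everything behind one opaque number.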

“Explainability and objectivity are core to our AI and platform development approach. We know it’s critical to both mental health providers and patients themselves to see what our AI system has identified in the screening data, understand what’s influencing the overall depression risk score, and see opportunities for further discussion together,” says Lior Auslander, EVP of Product and Technology.

Adding insights from patient video data 

After building five Screening Insights to explain the patterns identified in patient text and audio, the Aiberry team wanted to add Insights to explain the patterns found in the video data as well.

“We would never want to infer only from audio or only from video, but when combined with the words people use to describe their symptoms, those three pieces together can provide a clearer picture of a person’s mental state,” says Jason Shumake, Director of Data Science.

In order to add insights related to the video data, they needed a training dataset of videos classified by emotion. When developing the insights for the more straightforward audio and text datasets, they were able to confidently use large off-the-shelf open source datasets, open source base models, or build their own models to understand sentiment or pause characteristics. 

When they tried using open source video datasets and base models for emotional expressions, they ran into a number of challenges. These datasets were made up of images of actors making one of seven expressions. Many of the seven classified emotions were irrelevant to the depression screening context - in a 5- to 10-minute screen, patients don’t typically show that wide a range of emotion. On top of this, some expressions relevant to depression screenings - namely, ‘cognitive effort’ or thinking expressions - were not classified. Finally, because the images were of actors, the datasets didn’t transfer well to real-world emotional expression. They also tried an off-the-shelf model for affect assessment.

“It was terrible - it would identify someone as happy when really they were talking and making an ‘E’ phoneme that looked like a smile,” says Shumake. “Ultimately, what was available wasn’t applicable. We needed something more specific and customized to the emotions we were seeing expressed in the screenings.”

Granular and high quality data labeling that scales

Aiberry decided to work with Centaur Labs to build a first-of-its-kind dataset and model. Centaur Labs first did a quality control pass through 8,600 sampled frames to identify low quality frames Aiberry could use to improve their image processing algorithm. Once Aiberry could confidently share high quality frames, Centaur Labs began a more complex phase: evaluating 4-5 frames sampled from each patient video and classifying the patient’s emotional affect as neutral, negative, positive, or cognitive effort.

First, Centaur Labs cleaned the frames, identifying frames where the patient was blurry or speaking, or where something obstructed the musculature around the eyes - such as glasses, glare, or hats - making the expression difficult to evaluate. Centaur Labs leveraged 140 gold standards (examples of high quality classifications), quality-controlled the 8,600 frames in one week, and generated an average of 6 qualified opinions per frame.

Meanwhile, Aiberry had their team of mental health professionals classify a subset of 75 images to serve as gold standards for each of the four affect classifications. Once the gold standards were created, Centaur Labs labeled the 1,846 videos in approximately one week, generating an average of 22 qualified opinions per video. Because these videos contained protected health information (PHI) and had to be managed in accordance with HIPAA, Aiberry worked exclusively with the Centaur Labs HIPAA network, a group of labelers who have signed BAAs and are permitted to view PHI.
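The workflow above has two moving parts: qualifying labelers against expert-made gold standards, and aggregating many qualified opinions per case into a consensus. A minimal sketch of both steps, assuming a simple accuracy threshold and majority vote (the function names and thresholds here are hypothetical, not Centaur Labs’ actual algorithm):

```python
from collections import Counter

def labeler_accuracy(answers: dict, golds: dict) -> float:
    """Score one labeler against gold-standard cases, i.e. the
    expert-classified examples (like Aiberry's 75 clinician-labeled
    images). `answers` maps case id -> this labeler's label."""
    scored = {case: label for case, label in answers.items() if case in golds}
    if not scored:
        return 0.0
    return sum(label == golds[case] for case, label in scored.items()) / len(scored)

def consensus_label(opinions: list, min_agreement: float = 0.5):
    """Majority vote over the qualified opinions gathered on one case
    (~22 per video here). Returns None when no label clears the
    agreement threshold - a useful signal of a genuinely ambiguous case."""
    label, count = Counter(opinions).most_common(1)[0]
    return label if count / len(opinions) >= min_agreement else None
```

Cases where `consensus_label` returns None correspond to the ambiguous faces the labelers argued about in the comments, which, as Shumake notes below, were often the most interesting ones.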

“Once the contest was running, we were really impressed by how quickly the data was labeled,” said Shumake. They also leveraged the comments labelers shared about cases they found unclear. “Sometimes facial expressions are complex and ambiguous, which is why we’re working with a skilled labeling network like Centaur Labs. Labelers would argue in the comments about how to classify the affect, which was a signal to us that those were the complex faces where something interesting was going on.”

Novel contributions to mental health AI and affective computing 

Aiberry set out to create a dataset classifying the expressions made during depression screenings. Not only did they create that dataset, they also produced a novel affect metric.

“We were able to use the disagreement between labelers to convert the categorical affect labels into a continuous metric we’re calling the affect rating. That’s ultimately the training data we used, and it was only possible because of the 20+ opinions we could gather from Centaur Labs on each piece of our data.”
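One plausible way to turn disagreement into a continuous score - an assumption on our part, since Aiberry’s exact formula isn’t public - is to map the valenced categories to numbers and average across all the opinions on a case. With 22 opinions per video, the spread of disagreement naturally produces a graded value rather than a hard class:

```python
# Hypothetical valence mapping; 'cognitive effort' is treated as
# non-valenced here and excluded from the rating (an assumption).
VALENCE = {"negative": -1.0, "neutral": 0.0, "positive": 1.0}

def affect_rating(opinions: list):
    """Collapse many categorical labels on one video into a single
    continuous score in [-1, 1]. Unanimous labels land on -1, 0, or 1;
    labeler disagreement produces the intermediate values."""
    valenced = [VALENCE[o] for o in opinions if o in VALENCE]
    if not valenced:
        return None
    return sum(valenced) / len(valenced)

# 14 positive, 6 neutral, 2 negative opinions -> a mildly positive rating
rating = affect_rating(["positive"] * 14 + ["neutral"] * 6 + ["negative"] * 2)
```

With only a handful of opinions per case this average would be too coarse to be useful; it is the 20+ opinions per video that make the continuous metric meaningful.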

The model they’ve built with this dataset accurately predicts affect ratings, and the team is now exploring ways affect may be influencing depression risk scores.

“As a result of this work, we can now identify a relationship between visual signals and high depression risk scores, and further enhance our capability to raise a warning flag to the treating clinician if a patient’s visual data is inconsistent with their verbal self-report,” says Shumake.

With plans to make an anxiety assessment available in the coming months, this is only the beginning for Aiberry’s AI-enabled platform. As Aiberry continues to advance the field of affective computing, creating novel datasets and models to support their growing portfolio of assessments, the flexibility of the Centaur Labs platform to produce custom data labels at scale will only accelerate the impact Aiberry is poised to have on mental health care. 


