Talk to Me: Deep Learning Identifies Depression in Speech Patterns

by Tony Kontzer

“Talk therapy” is often used by psychotherapists to help patients overcome depression or anxiety through conversation.

A research team at Massachusetts Institute of Technology is using deep learning to uncover what might be called “talk diagnosis” — detecting signs of depression by analyzing a patient’s speech.

The research could lead to effective and inexpensive diagnosis of serious mental health issues.

An estimated one in 15 adults in the U.S. reports having a bout of major depression in any given year, according to the National Institute of Mental Health. The condition can lead to serious disruptions in a person’s life, yet our understanding of it remains limited.

The techniques used to identify depression typically involve mental health experts asking direct questions and drawing educated conclusions.

In the future, these pointed assessments may be less necessary, according to lead researcher Tuka Alhanai, an MIT research assistant and Ph.D. candidate in computer science. She envisions her team’s work becoming part of the ongoing monitoring of individual mental health.

All About the Dataset

A key aspect of getting started with deep learning is getting good data.

That was a challenge for Alhanai when her team went to train its model. She was specifically looking for datasets of conversations in which some of the participants were depressed.

Eventually, she found one from the University of Southern California, which had teamed with German researchers to interview a group of 180 people, 20 percent of whom had some signs of depression. The interviews consisted of 20 minutes of questions about where the subjects lived, who their friends were and whether they felt depressed.

Alhanai was emboldened by the researchers’ conclusion that depression can, in fact, be detected in speech patterns and vocabulary. But she wanted to take things a step further by removing the leading, predictive questions and instead training a model to detect depression during normal, everyday conversation.

“There is significant signal in the data that will cue you to whether people have depression,” she said. “You listen to overall conversation and absorb the trajectory of the conversation and speech, and the larger context in which things are said.”

Alhanai and her team combined the processing power of a cluster of machines running more than 40 NVIDIA TITAN X GPUs with the TensorFlow, Keras and cuDNN deep learning libraries, and set to work training their model.

They fed it snippets of the interviews from the dataset, minus the obvious questions and references to depression, leaving the model to determine whether depression cues were present. They subsequently exposed the model to sections of conversation from a healthy person and a depressed person, and then told the model which one was which.

After enough cycles of this, the researchers would feed the model another section of conversation and ask it to determine whether there was an indication of possible depression. The team trained dozens of models this way, something Alhanai said would not have been possible without access to GPUs.
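The cycle described above (show labeled examples, adjust the model, then ask it about an unseen example) can be sketched in a few lines. This is only an illustration, not the team's method: the article says they trained neural networks with TensorFlow and Keras, while the sketch below swaps in a plain logistic regression written in standard Python. The two-number feature vectors, the labels and the learning settings are all made up for the example, and it assumes each snippet has already been turned into numbers.

```python
import math

def sigmoid(z):
    """Map a raw score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, labels, epochs=500, lr=0.5):
    """Fit a logistic regression with stochastic gradient descent."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            # Predict, compare with the known label, and nudge the parameters.
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def predict(weights, bias, x):
    """Return the model's estimated probability for the positive label."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical feature vectors standing in for conversation snippets.
# Label 1 marks a snippet from a participant with signs of depression, 0 otherwise.
train_x = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.2], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
train_y = [0, 0, 0, 1, 1, 1]
weights, bias = train(train_x, train_y)

# Held-out snippets the model never saw during training.
held_out_x = [[0.15, 0.25], [0.85, 0.8]]
held_out_y = [0, 1]
predictions = [1 if predict(weights, bias, x) >= 0.5 else 0 for x in held_out_x]
accuracy = sum(p == y for p, y in zip(predictions, held_out_y)) / len(held_out_y)
print(predictions, accuracy)
```

The held-out step is what matters here: a model is judged on conversation it was not trained on, which is where an accuracy figure for inference comes from.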

Success Breeds Ambition

Ultimately, the team’s model identified depression from normal conversation with more than 70 percent accuracy during inference, on par with mental health experts’ diagnosis. Each experiment ran on a single TITAN X.

The team reported its findings in a paper submitted to the recent Interspeech 2018 conference in Hyderabad, India, and is now primed to take the work to the next level.

“This work is very encouraging,” said Alhanai. “Let’s get these systems out there and have them do predictions for evaluation purposes — not to use clinically yet, but to collect more data and build more robustness.”

Naturally, Alhanai craves access to faster and more powerful GPUs so she can run more experiments with larger datasets. But her long-term view is to explore the impact that using deep learning to analyze communication — not just speech — can have in diagnosing and managing other mental health conditions.

“Any condition you can hear and feel in speech, or through other gestures, a machine should be able to determine,” she said. “It doesn’t matter what the signal is — it could be speech, it could be writing, it could be jaw movement, it could be muscle tension. It will be a very non-invasive way to monitor these things.”