Giving voice to depression

Carmen Martínez Antón

Over the past few decades, speech technologies have developed at a rapidly increasing pace, driven mainly by the way they make our lives easier. Speech is the most characteristic human trait related to communication; it allows us to share our ideas, thoughts, requests, knowledge, and even, whether we want to or not, our current mood.

Most of us have probably met, or been in contact with, a person suffering from depression. People with depression are often said to be sad or blue, or to lack motivation in their everyday lives. However, this mental disorder is much more complex than we tend to believe. It affects the lives of those who suffer from it not only socially but also at work. For example, something that seems as simple as meeting friends or going to work can turn out to be overwhelming for them. This stress is the most important triggering factor of the disorder and, at the same time, one of its biggest risk factors. As part of this ‘fear’ or ‘passivity’, which may reflect what is going on in their minds at a hormonal level, their voice can sound monotonous, full of pauses, flat, or even trembling.

Current methods for diagnosing depression are not completely reliable. In general, they are based on the answers to subjective questionnaires, whose effectiveness always depends on how willing the individuals are to cooperate. The questions range from the very simple to the very complicated. For instance, individuals may be asked how many hours they have slept in the past few days, whether they have had suicidal thoughts, or even about their interest in sex.

Given this, it does not seem unreasonable to try to use perceptible signs that characterise this disorder, such as the tone of voice, to improve current diagnosis. What would happen if we combined two things: recording the differences between the voices of people with and without the disorder, and measuring their different reactions to stress? Taking a controlled stress test might help to improve diagnosis. And what if the voice, supposedly so characteristic, were recorded during that test?

Excerpt from the three levels of difficulty of the Stroop test, which is used to induce stress in the individuals taking the test.

In fact, the technology required for this purpose is already available: all we need is a mobile phone that shows something potentially stressful on the screen while simultaneously recording the individual’s voice. The recording could then be passed to a mobile app for analysis. For instance, the app could extract a set of predefined parameters from the individual’s voice that could later be used to infer the presence of an illness. These parameters could be the amount of time the individual spends speaking during the recording (e.g., a one-minute audio clip) or how monotonous their voice sounds over that period. The Stroop test, which induces controlled stress, could be used for that purpose. During this test, the subject must say out loud the colour (‘red’, ‘green’, or ‘blue’) in which each of the words ‘red’, ‘green’, or ‘blue’ is written; the word and its colour never match, which is the difficult part of the test.
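The two voice parameters mentioned above, speaking time and monotony, can be illustrated with a rough sketch. It is not the method used in the project: the function names, the energy threshold used to decide when someone is speaking, and the use of pitch variability as a proxy for monotony are all assumptions made for illustration. A lower `pitch_std_hz` would mean a flatter, more monotonous voice.

```python
import numpy as np


def split_into_frames(signal, frame_len, hop):
    """Cut a 1-D signal into overlapping frames of equal length."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])


def estimate_pitch(frame, sr, fmin=75.0, fmax=400.0):
    """Estimate the pitch (Hz) of one frame from the peak of its autocorrelation."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Only search lags that correspond to a plausible range of voice pitch.
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / best_lag


def voice_features(signal, sr, frame_ms=30, hop_ms=10, energy_ratio=0.1):
    """Return speaking time (s) and pitch variability (Hz) for a recording.

    Assumption: a frame counts as "speaking" when its RMS energy exceeds
    `energy_ratio` times the loudest frame. Real systems use more robust
    voice-activity detection.
    """
    frame_len, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    frames = split_into_frames(np.asarray(signal, dtype=float), frame_len, hop)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    active = rms > energy_ratio * rms.max()
    pitches = np.array([estimate_pitch(f, sr) for f in frames[active]])
    return {
        "speaking_time_s": float(active.sum() * hop / sr),
        "pitch_std_hz": float(pitches.std()) if len(pitches) else 0.0,
    }


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 150 * t)           # one second of a flat 150 Hz "voice"
    clip = np.concatenate([tone, np.zeros(sr)])  # followed by one second of silence
    print(voice_features(clip, sr))
```

The synthetic tone stands in for a real recording only so that the sketch runs without an audio file; with real speech, a dedicated pitch tracker would be a safer choice than this simple autocorrelation peak.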


Prototype of the phone app that could monitor and diagnose depression by analysing people’s voices.

As technology keeps advancing, there is a possibility that, at some point, we will be able to use it to detect patterns that help diagnose an illness, without needing to carry out expensive medical tests for the same purpose. For now, however, this line of research is still in full swing. I had the opportunity to join this research project as part of my master’s thesis. At the beginning, I was looking for speech patterns that could distinguish depressed people from those who were not. This was only possible because our research team had access to a database with information about both types of individuals (with and without depression). After observing promising results with our basic classification analysis, we became keener to explore more complex approaches to improve the accuracy of this classification.

We have been working on a simple classification approach with machine learning methods. First, we collect the values of several parameters, taken not only from speech signals but also from physiological signals such as ECG, EMG, or PPG, from a group of people for whom we know who belongs to the control group and who has depression. Then, we “train” a computer algorithm with the data of these two groups (i.e., the “training data”) until it finds specific patterns that can later be used to distinguish depressive from control individuals. Once we have collected the values of these parameters for a new individual, we can use this “already trained” algorithm to classify the new individual into one of the two groups, a decision based on what it has learnt from the “training data”.
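The train-then-classify workflow described above can be sketched in a few lines. The article does not say which algorithm the project uses, so logistic regression is an assumption made here for illustration. The data are random placeholder numbers, not measurements or results, and the three feature columns (speaking time, pitch variability, one generic physiological feature) are hypothetical.

```python
import numpy as np


def make_synthetic_groups(n_per_group=100, seed=0):
    """Random placeholder data: rows are people, columns are three features.

    Hypothetical columns: speaking time (s), pitch variability (Hz), and one
    generic physiological feature. All numbers are invented for the demo.
    """
    rng = np.random.default_rng(seed)
    control = rng.normal([40.0, 30.0, 0.0], [5.0, 5.0, 1.0], size=(n_per_group, 3))
    depressed = rng.normal([30.0, 15.0, 0.5], [5.0, 5.0, 1.0], size=(n_per_group, 3))
    X = np.vstack([control, depressed])
    y = np.concatenate([np.zeros(n_per_group), np.ones(n_per_group)])  # 1 = depressed
    order = rng.permutation(len(y))
    return X[order], y[order]


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression weights with plain gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of class 1
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b


def predict(X, w, b):
    """Assign each row to group 0 (control) or group 1 (depressed)."""
    return (X @ w + b > 0).astype(int)


def run_demo():
    X, y = make_synthetic_groups()
    split = int(0.7 * len(y))  # 70% "training data", 30% held out for checking
    mean, std = X[:split].mean(axis=0), X[:split].std(axis=0)
    X_train, X_test = (X[:split] - mean) / std, (X[split:] - mean) / std
    w, b = train_logistic(X_train, y[:split])
    accuracy = float(np.mean(predict(X_test, w, b) == y[split:]))
    # A new individual is scaled with the training statistics, then classified.
    new_person = (np.array([[32.0, 17.0, 0.4]]) - mean) / std
    return accuracy, int(predict(new_person, w, b)[0])


if __name__ == "__main__":
    accuracy, label = run_demo()
    print(f"held-out accuracy: {accuracy:.2f}, new individual assigned to group {label}")
```

Holding back part of the labelled data is what lets us check whether the learnt patterns generalise to people the algorithm has never seen, which is the situation a screening test would face.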

For now, this is no more than a proposal for the future. Nevertheless, in the fairly near future it might become possible to use this kind of ‘screening test’ to help diagnose different mental disorders. This could lead to the development of faster detection methods for these diseases, which would not only improve the quality of life of those who suffer from them but, more importantly, would allow us to find faster treatments to deal with them.


* * *

By Carmen Martínez Antón, Initiated Researcher (N3) in the ViVoLab and BSICoS groups of the Instituto de Investigación de Aragón (I3A) at the University of Zaragoza (Zaragoza, Spain). The BSICoS group is also part of CIBER de Bioingeniería, Biomateriales y Nanomedicina (Spain) and Instituto de Investigación Sanitaria de Aragón (IIS Aragón, Zaragoza, Spain).