Audio based depression detection using Convolutional Autoencoder

Sara Sardari, Bahareh Nakisa, Mohammed Naim Rastgoo, Peter Eklund

Research output: Contribution to journal › Article › peer-review

35 Citations (Scopus)


Depression is a serious and common psychological disorder that requires early diagnosis and treatment; in severe episodes it may lead to suicidal thoughts. Recently, the need for an effective audio-based Automatic Depression Detection (ADD) system has sparked the interest of the research community. To date, most reported approaches to recognizing depression rely on hand-crafted feature extraction to represent the audio data, combining a wide variety of audio-related features to improve classification performance. However, combining many hand-crafted features, both relevant and less relevant, enlarges the feature space and can lead to high-dimensionality issues, since not all of these features carry significant information about depression. A large number of features makes pattern recognition more difficult and increases the risk of overfitting. To overcome these limitations, we propose an audio-based depression detection framework that adapts a deep learning (DL) technique to automatically extract a compact, highly relevant feature set. The framework uses an end-to-end Convolutional Neural Network-based Autoencoder (CNN AE) to learn relevant, discriminative features directly from raw sequential audio data, and hence to detect depressed people more accurately. In addition, to address the sample imbalance problem, we use a cluster-based sampling technique that greatly reduces the risk of bias towards the majority class (non-depressed). To evaluate the performance and effectiveness of the proposed pipeline, we conduct experiments on the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) dataset and compare the results with hand-crafted feature extraction methods and other leading studies in this domain.
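The abstract does not spell out the cluster-based sampling procedure. One common variant, cluster-centroid undersampling, is sketched here purely as an illustration and is not claimed to be the authors' exact method: the majority (non-depressed) samples are grouped with k-means into as many clusters as there are minority (depressed) samples, and one real majority sample nearest each centroid is kept. Assumptions: each sample is already a fixed-length feature vector (for example an autoencoder embedding), labels are 1 = depressed and 0 = non-depressed, and the helper names `kmeans` and `cluster_undersample` are hypothetical.

```python
import numpy as np


def kmeans(x, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on the rows of x; returns the (k, d) centroid array."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct samples.
    centroids = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every centroid: shape (n, k).
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        updated = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids


def cluster_undersample(x, y, majority_label=0, seed=0):
    """Hypothetical helper: shrink the majority class to the minority-class size.

    Majority samples are grouped into k clusters (k = number of minority
    samples); for each centroid the nearest not-yet-chosen real majority
    sample is kept. All minority samples are kept unchanged.
    """
    maj_idx = np.flatnonzero(y == majority_label)
    min_idx = np.flatnonzero(y != majority_label)
    k = len(min_idx)
    if k == 0 or k > len(maj_idx):
        raise ValueError("expected a non-empty minority class no larger than the majority class")
    centroids = kmeans(x[maj_idx], k, seed=seed)
    dist = ((x[maj_idx][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    taken = np.zeros(len(maj_idx), dtype=bool)
    chosen = []
    for j in range(k):
        # Nearest majority sample to centroid j that has not been picked yet.
        for i in np.argsort(dist[:, j]):
            if not taken[i]:
                taken[i] = True
                chosen.append(maj_idx[i])
                break
    keep = np.sort(np.concatenate([np.array(chosen), min_idx]))
    return x[keep], y[keep]
```

For example, with 20 non-depressed and 5 depressed samples the function returns 5 of each. Keeping real samples nearest the centroids, rather than the centroids themselves, means every retained vector is an actual recording's features while the subset still spans the spread of the majority class.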
The results show that the proposed method outperforms other well-known audio-based ADD models, with an improvement of at least 7% in F-measure for depression classification.
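The F-measure (F1) is the harmonic mean of precision and recall for the positive class, F1 = 2PR / (P + R) = 2TP / (2TP + FP + FN). When non-depressed speakers are the majority class, it is more informative than accuracy: a model that labels everyone non-depressed can still score high accuracy while its F1 for the depressed class is zero. The minimal pure-Python sketch below illustrates this; the function names and the counts are hypothetical and are not results from the paper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives and false negatives for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, fn


def f1_binary(y_true, y_pred, positive=1):
    """F-measure of the positive class: harmonic mean of precision and recall."""
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical test set: 10 depressed (1) and 40 non-depressed (0) speakers.
y_true = [1] * 10 + [0] * 40

# A degenerate model that always predicts "non-depressed".
always_zero = [0] * 50
print(accuracy(y_true, always_zero), f1_binary(y_true, always_zero))  # 0.8 0.0

# A model with 8 true positives, 2 false negatives and 4 false positives.
model = [1] * 8 + [0] * 2 + [1] * 4 + [0] * 36
print(round(f1_binary(y_true, model), 4))  # 0.7273
```

The abstract does not say whether its 7% figure is an absolute or a relative difference in F-measure, so the sketch only shows how the metric itself is computed.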

Original language: English
Article number: 116076
Journal: Expert Systems with Applications
Early online date: 20 Oct 2021
Publication status: Published - 1 Mar 2022
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2021 Elsevier Ltd


Keywords

  • Audio depression detection
  • Semi-supervised learning
  • Convolutional Autoencoder
  • Early depression detection

