JMIR Ment Health. 2025 Oct 22;12:e67802. doi: 10.2196/67802.
ABSTRACT
BACKGROUND: Despite the high prevalence and significant burden of depression, underdiagnosis remains a persistent challenge. Automatic speech analysis (ASA) has emerged as a promising method for depression assessment. However, a comprehensive quantitative synthesis evaluating its diagnostic accuracy is still lacking.
OBJECTIVE: This systematic review and meta-analysis aimed to assess the diagnostic performance of ASA in detecting depression, considering both machine learning and deep learning approaches.
METHODS: We conducted a systematic search across 8 databases (MEDLINE, PsycInfo, Embase, CINAHL, IEEE Xplore, ACM Digital Library, Scopus, and Google Scholar) from January 2013 to April 1, 2025. We included studies published in English that evaluated the accuracy of ASA for detecting depression and reported performance metrics such as accuracy, sensitivity, specificity, or precision, or provided confusion matrices. Study quality was assessed using a modified version of the Quality Assessment of Studies of Diagnostic Accuracy-Revised. A 3-level meta-analysis was performed to estimate the pooled highest and lowest accuracy, sensitivity, specificity, and precision. Meta-regressions and subgroup analyses were performed to explore heterogeneity across the following factors: type of publication, artificial intelligence algorithm, speech features, speech-eliciting task, ground truth assessment, validation approach, dataset, dataset language, participants’ mean age, and sample size.
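NOTE: The four performance metrics named above can all be derived from a binary confusion matrix, which is presumably why confusion matrices were accepted as reported outcomes. The following is a minimal sketch of that derivation, not the authors' extraction code; the class name, the function name, and the example counts are hypothetical and do not come from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix; 'positive' means classified as depressed."""
    tp: int  # depressed, predicted depressed
    fp: int  # not depressed, predicted depressed
    fn: int  # depressed, predicted not depressed
    tn: int  # not depressed, predicted not depressed


def diagnostic_metrics(cm: ConfusionMatrix) -> dict[str, float]:
    """Return accuracy, sensitivity, specificity, and precision."""
    total = cm.tp + cm.fp + cm.fn + cm.tn
    # Each metric is undefined when its denominator is zero.
    if min(total, cm.tp + cm.fn, cm.tn + cm.fp, cm.tp + cm.fp) == 0:
        raise ValueError("a metric is undefined: zero denominator")
    return {
        "accuracy": (cm.tp + cm.tn) / total,
        "sensitivity": cm.tp / (cm.tp + cm.fn),  # recall, true positive rate
        "specificity": cm.tn / (cm.tn + cm.fp),  # true negative rate
        "precision": cm.tp / (cm.tp + cm.fp),    # positive predictive value
    }


# Hypothetical counts for illustration only.
example = ConfusionMatrix(tp=42, fp=10, fn=8, tn=40)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Precision depends on the proportion of depressed participants in the sample, whereas sensitivity and specificity do not, so precision values are harder to compare across datasets with different class balance.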
RESULTS: Of the 1345 records identified, 105 studies met the inclusion criteria. The pooled means of the highest accuracy, sensitivity, specificity, and precision were 0.81 (95% CI 0.79 to 0.83), 0.84 (95% CI 0.81 to 0.86), 0.83 (95% CI 0.79 to 0.86), and 0.81 (95% CI 0.77 to 0.84), respectively, whereas the pooled means of the lowest accuracy, sensitivity, specificity, and precision were 0.66 (95% CI 0.63 to 0.69), 0.63 (95% CI 0.58 to 0.68), 0.60 (95% CI 0.55 to 0.66), and 0.64 (95% CI 0.58 to 0.70), respectively.
CONCLUSIONS: ASA shows promise as a method for detecting depression, though its readiness for clinical application as a standalone tool remains limited. At present, it should be regarded as a complementary method, with potential applications across diverse contexts. Further high-quality, peer-reviewed studies are needed to support the development of robust, generalizable models and to advance this emerging field.
TRIAL REGISTRATION: PROSPERO CRD42023444431; https://www.crd.york.ac.uk/PROSPERO/view/CRD42023444431.
PMID:41124683 | DOI:10.2196/67802