J Voice. 2025 Sep 25:S0892-1997(25)00378-9. doi: 10.1016/j.jvoice.2025.09.012. Online ahead of print.
ABSTRACT
OBJECTIVE: To develop a deep learning model that assesses anxiety and depression from acoustic and lexical biomarkers, analyzing Italian psychotherapy recordings to classify three distinct conditions: depression, anxiety, and no pathology.
METHOD: Five patients diagnosed with either Major Depressive Disorder or Generalized Anxiety Disorder were selected from psychotherapy sessions conducted at RAM Psyche. A clinical psychologist manually analyzed a total of seven audio recordings using the DASS-21 scale, yielding over 1000 audio segments labeled for psychopathological content. Acoustic features and lexical markers were extracted from these recordings and processed through a hybrid architecture that combines a Convolutional Neural Network (CNN) for Mel spectrogram analysis with a Multi-Layer Perceptron (MLP) that integrates the lexical and acoustic inputs. Three model variants (VOM 1.1, 1.2, and 1.3) were trained and evaluated on two custom datasets (DVOM2 and DVOM3), which include both internal patient audio and external neutral voices.
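A minimal sketch of the hybrid design described above, assuming NumPy: a small convolutional branch summarizes a Mel spectrogram into one value per kernel via global average pooling, and that embedding is concatenated with a vector of lexical and acoustic features before a two-layer perceptron outputs probabilities for the three classes. Layer sizes, kernel shapes, the six-feature input, and all names (`HybridVoiceModel`, `conv2d_valid`) are hypothetical; the weights are random and untrained, so the output only illustrates the data flow, not the VOM models' actual architecture or results.

```python
import numpy as np

CLASSES = ("depression", "anxiety", "no pathology")


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - z.max())
    return e / e.sum()


def conv2d_valid(x, kernels):
    """Naive 'valid' 2-D cross-correlation of one input map with K kernels.

    x: (H, W) Mel spectrogram (bands x frames); kernels: (K, kh, kw).
    Returns feature maps of shape (K, H - kh + 1, W - kw + 1).
    """
    k, kh, kw = kernels.shape
    h, w = x.shape
    out = np.empty((k, h - kh + 1, w - kw + 1))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = x[i:i + kh, j:j + kw]
            # Dot each kernel with the patch -> one value per kernel.
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1]))
    return out


class HybridVoiceModel:
    """Hypothetical CNN + MLP late-fusion classifier with random, untrained weights."""

    def __init__(self, n_kernels=4, kernel_size=3, n_tabular=6, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.kernels = rng.normal(0.0, 0.1, (n_kernels, kernel_size, kernel_size))
        self.w1 = rng.normal(0.0, 0.1, (n_kernels + n_tabular, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, len(CLASSES)))
        self.b2 = np.zeros(len(CLASSES))

    def forward(self, mel, tabular):
        # CNN branch: convolution + ReLU, then global average pooling per kernel.
        fmap = relu(conv2d_valid(mel, self.kernels))
        cnn_embedding = fmap.mean(axis=(1, 2))
        # Fusion: append the lexical/acoustic feature vector to the CNN embedding.
        fused = np.concatenate([cnn_embedding, tabular])
        # MLP head: one hidden layer, then softmax over the three classes.
        h = relu(fused @ self.w1 + self.b1)
        return softmax(h @ self.w2 + self.b2)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    mel = np.abs(rng.normal(size=(40, 50)))  # placeholder: 40 Mel bands x 50 frames
    tabular = rng.normal(size=6)             # placeholder lexical + acoustic features
    probs = HybridVoiceModel().forward(mel, tabular)
    print(dict(zip(CLASSES, probs.round(3))))
```

Concatenation (late fusion) is the simplest way to combine a learned spectrogram embedding with hand-crafted features; the published variants may fuse the two branches differently.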
RESULTS: The model classified segments into depression, anxiety, and no pathology with promising results. Feature importance analysis revealed that prosodic cues such as lower pitch, reduced intensity, and increased pauses were highly predictive of depression, while lexical richness and adverb usage were associated with both disorders. Among the model variants, VOM 1.1 showed balanced performance across all three classes, excelling particularly in detecting depression and no pathology. In contrast, VOM 1.2 prioritized the detection of depression and anxiety, occasionally misclassifying ambiguous cases as symptomatic, which suggests heightened sensitivity to subtle pathological cues. VOM 1.3, while maintaining strong classification performance, demonstrated improved robustness on external neutral voices.
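The cues named above can be approximated with simple signal and text measures. The sketch below, in plain Python, assumes mono audio given as a list of float samples in [-1, 1]; it computes a type-token ratio as a proxy for lexical richness, an RMS level in dB as intensity, a pause ratio from frames below an energy threshold, and a crude autocorrelation pitch (F0) estimate. The thresholds, frame length, and function names are illustrative assumptions, not the features actually extracted in the study; counting adverbs would additionally require an Italian part-of-speech tagger.

```python
import math
import re


def type_token_ratio(text):
    """Lexical richness proxy: distinct word forms / total words."""
    tokens = re.findall(r"\w+", text.lower())  # \w is Unicode-aware, so accented letters count
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def frames(signal, frame_len):
    """Split samples into non-overlapping frames, dropping a trailing partial frame."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, frame_len)]


def rms_db(frame):
    """RMS level in dB relative to full scale (samples assumed in [-1, 1])."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def pause_ratio(signal, frame_len=400, threshold_db=-40.0):
    """Fraction of frames quieter than threshold_db, treated here as pauses."""
    chunks = frames(signal, frame_len)
    if not chunks:
        return 0.0
    return sum(1 for f in chunks if rms_db(f) < threshold_db) / len(chunks)


def mean_intensity_db(signal, frame_len=400, threshold_db=-40.0):
    """Mean RMS level over non-pause frames only."""
    levels = [rms_db(f) for f in frames(signal, frame_len)]
    voiced = [v for v in levels if v >= threshold_db]
    return sum(voiced) / len(voiced) if voiced else float("-inf")


def estimate_f0(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Crude pitch estimate: lag with the largest autocorrelation in [fmin, fmax] Hz."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else 0.0
```

Dedicated pitch trackers and voice activity detectors are more robust than these definitions; the sketch only shows how each named cue can be turned into a number that a classifier could consume.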
CONCLUSIONS: The Voice of Mind model demonstrates the feasibility of using speech data to support mental health diagnostics. Its capacity to distinguish between depression and anxiety, while maintaining generalization across nonpathological voices, suggests its potential as a clinical decision-support tool.
PMID:40998607 | DOI:10.1016/j.jvoice.2025.09.012