Front Psychiatry. 2025 Aug 5;16:1648585. doi: 10.3389/fpsyt.2025.1648585. eCollection 2025.

ABSTRACT

INTRODUCTION: Depression is highly prevalent among college students, and accurately identifying risk factors is essential for timely intervention. Given the limitations of traditional linear models in managing high-dimensional data, this study employed machine learning techniques to predict depressive symptoms.

METHOD: Data were collected from 1,635 Chinese college students and included 38 sociodemographic, psychological, and social variables. Four machine learning algorithms (Random Forest, XGBoost, LightGBM, and Support Vector Machine) were evaluated.

RESULTS: The Random Forest model achieved the highest discriminative performance, with an AUC of 0.87 and an accuracy of 0.79, and identified sleep disturbance, perceived stress, experiential avoidance, and self-criticism as key predictors. SHapley Additive exPlanations (SHAP) analysis further revealed that deteriorating sleep quality and heightened stress levels significantly increased the risk of depressive symptoms.
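
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how AUC and accuracy can be computed from a classifier's predicted probabilities. It is not the authors' pipeline: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the numbers bear no relation to the study data.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs in which the positive
    case receives the higher score; tied pairs count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of cases classified correctly at a fixed threshold
    (0.5 is an assumed cut-off, not one taken from the study)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: 1 = depressive symptoms present, 0 = absent.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.7, 0.1]

print(roc_auc(labels, scores))   # 15 of 16 pairs ranked correctly
print(accuracy(labels, scores))  # 6 of 8 cases classified correctly
```

AUC depends only on how the model ranks cases, whereas accuracy depends on the chosen threshold, which is why the two figures can differ for the same model.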

DISCUSSION: These findings validate the effectiveness of Random Forest in capturing complex data interactions and offer actionable insights for targeted mental health interventions. Future studies should improve generalizability by incorporating more diverse samples and physiological biomarkers.

PMID:40838255 | PMC:PMC12361154 | DOI:10.3389/fpsyt.2025.1648585