Analysis of depression in social media texts through the Patient Health Questionnaire-9 and natural language processing

Nam Hyeok Kim; Ji Min Kim; Da Mi Park; Su Ryeon Ji; Jong Woo Kim

doi:10.1177/20552076221114204

Digit Health. 2022 Jul 17;8:20552076221114204. doi: 10.1177/20552076221114204. eCollection 2022 Jan-Dec.

Authors

Nam Hyeok Kim¹, Ji Min Kim², Da Mi Park², Su Ryeon Ji¹, Jong Woo Kim³

Affiliations

¹ Department of Mathematics, Hanyang University, Seoul, Republic of Korea.
² Business Administration, Hanyang University, Seoul, Republic of Korea.
³ School of Business, Hanyang University, Seoul, Republic of Korea.

Abstract

Objective: Although depression is emerging as a major social problem, the rate at which people with depression use mental health services remains low. The purpose of this study was to use natural language processing to classify sentences written by social media users according to the nine depression symptoms of the Patient Health Questionnaire-9, and to assess users' depression in a natural setting based on the classification results.

Methods: First, two sentence classifiers were trained: the Y/N sentence classifier, which determines whether a user's sentence is related to depression, and the 0-9 sentence classifier, which further categorizes the sentence according to the depression symptoms of the Patient Health Questionnaire-9. Then a depression classifier, a logistic regression model, was built to classify whether the writer of the sentences is depressed. The trained sentence classifiers and the depression classifier were then used to analyze users' social media text and determine their depression status.
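The two-stage flow described in the Methods can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the keyword-based `toy_yn` and `toy_symptom` functions stand in for the trained sentence classifiers, the label convention (0 = no symptom, 1-9 = Patient Health Questionnaire-9 item) is an assumption, and the per-user feature vector (share of a user's sentences assigned to each symptom) is one plausible input to a logistic regression model, since the abstract does not specify the features.

```python
import math
from collections import Counter

NUM_SYMPTOMS = 9  # the PHQ-9 covers nine depression symptoms

# Hypothetical keyword lookup standing in for a trained 0-9 sentence classifier.
# Labels follow PHQ-9 item numbers (2: hopelessness, 3: sleep, 4: fatigue).
TOY_KEYWORDS = {"hopeless": 2, "sleep": 3, "tired": 4}


def toy_symptom(sentence):
    """Return an assumed symptom label 1-9, or 0 if no symptom keyword matches."""
    lowered = sentence.lower()
    for keyword, label in TOY_KEYWORDS.items():
        if keyword in lowered:
            return label
    return 0


def toy_yn(sentence):
    """Stand-in for the Y/N classifier: is the sentence depression-related?"""
    return toy_symptom(sentence) != 0


def user_features(sentences, yn_classifier, symptom_classifier):
    """Build a 9-dimensional vector: share of sentences per PHQ-9 symptom."""
    counts = Counter()
    for sentence in sentences:
        if not yn_classifier(sentence):
            continue  # stage 1: drop sentences unrelated to depression
        label = symptom_classifier(sentence)  # stage 2: assign a symptom
        if 1 <= label <= NUM_SYMPTOMS:
            counts[label] += 1
    total = len(sentences) or 1  # avoid division by zero for empty input
    return [counts[k] / total for k in range(1, NUM_SYMPTOMS + 1)]


def logistic_probability(features, weights, bias):
    """Logistic regression score: sigmoid of a weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
```

In practice the weights and bias would be fitted on labeled users rather than set by hand, and a user would be classified as depressed when the resulting probability exceeds a chosen threshold such as 0.5.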

Results: In our experiments, the proposed depression classifier achieved an average accuracy of 68.3%, outperforming the baseline depression classifier, which used only the Y/N sentence classifier and achieved an average accuracy of 53.3%.

Conclusions: This study is significant in that it demonstrates the possibility of determining depression from social media users' textual data alone.

Keywords: Depression; Patient Health Questionnaire-9; deep learning; machine learning; natural language processing; social media.