NLP facilitates a long, hard look at linguistic cues to mental health
Natural language processing of social media posts is useful for identifying depression, anxiety and suicidal thinking, but models trained on population data cannot discern long-term patterns in any one person’s state of mind.
So report researchers in Australia who applied a popular NLP tool, Linguistic Inquiry and Word Count (LIWC), to posts from 38 bloggers and social media users over a nine-month period.
Bridianne O’Dea, PhD, of the University of New South Wales and colleagues describe the project in a study published May 19 in PLOS ONE.
The team found LIWC succeeded in predicting mental health scores of self-reporting participants whose data were not in the training set.
It also turned up significant associations between linguistic features and psychological health across participants.
However, when tested on the longitudinal data of individual subjects, LIWC yielded no conclusive correlations between patterns in language and shifts in symptoms over the 36 weeks of the study.
“This indicates that the model trained by group-level data could identify those with a mental illness but was not able to detect individual changes in mental health over time,” O’Dea and co-authors comment.
At the same time, the lack of group-to-individual generalizability “may also indicate that the underlying processes are indeed non-ergodic, that is, the relationship between linguistic features and mental health may not be equivalent across individuals and time,” they write. “This represents a significant challenge to past studies and cautions the use of group-level linguistic markers for inferring individuals’ mental health status.”
Further, the team adds:
“As outlined, the relationship between linguistic features and mental health state may be specific to subgroups, such as the nature of the mental health problem or demographics such as gender, age and cultural identity. Patterns of linguistic expression may also differ according to the volume, type and frequency of the collected social media data, with the language conventions, word counts and social norms of each [social media] platform likely to influence findings.”
The study is available in full for free.