NLP facilitates a long, hard look at linguistic cues to mental health

Natural language processing of social media posts is useful for identifying depression, anxiety and suicidal thinking, but models trained on population data cannot discern long-term patterns in any one person’s state of mind.

So report researchers in Australia who applied a popular NLP tool, Linguistic Inquiry and Word Count (LIWC), to posts from 38 bloggers and social media users over a nine-month period.

Bridianne O’Dea, PhD, of the University of New South Wales and colleagues describe the project in a study published in PLOS ONE on May 19.

The team found that a model built on LIWC features succeeded in predicting the mental health scores of self-reporting participants whose data were not in the training set.

It also turned up significant associations between linguistic features and psychological health across participants.

However, when tested on the longitudinal data of individual subjects, the model found no conclusive correlations between patterns in language and shifts in symptoms over the course of the 36 weeks.

“This indicates that the model trained by group-level data could identify those with a mental illness but was not able to detect individual changes in mental health over time,” O’Dea and co-authors comment.  

At the same time, the lack of group-to-individual generalizability “may also indicate that the underlying processes are indeed non-ergodic, that is, the relationship between linguistic features and mental health may not be equivalent across individuals and time,” they write. “This represents a significant challenge to past studies and cautions the use of group-level linguistic markers for inferring individuals’ mental health status.”

Further, the team adds:

As outlined, the relationship between linguistic features and mental health state may be specific to subgroups, such as the nature of the mental health problem or demographics such as gender, age and cultural identity. Patterns of linguistic expression may also differ according to the volume, type and frequency of the collected social media data, with the language conventions, word counts and social norms of each [social media] platform likely to influence findings.

The study is available in full for free.

Dave Pearson

Dave P. has worked in journalism, marketing and public relations for more than 30 years, frequently concentrating on hospitals, healthcare technology and Catholic communications. He has also specialized in fundraising communications, ghostwriting for CEOs of local, national and global charities, nonprofits and foundations.
