sullivannicole / PUL-thru

A high-precision PU-Learner based EWS for student depression

PUL-thru

Abstract

Suicide is the second leading cause of death for adolescents in the US [2], and the rate of suicide among Minnesota teens currently outpaces the national rate [2]. Undiagnosed or untreated depression is one of the leading causes of suicide; however, depression in young people often goes undiagnosed [26]. Moreover, there is an oft-overlooked racial component to suicides in the state: the suicide rate among Native American/Indian adolescents in Minnesota is triple that of any other racial/ethnic group, and suicide rates among young Black Minnesotans are increasing [2]. To provide treatment and prevent suicide, detecting depression early is critical [22] [35]. Current work in predicting student depression has either relied too heavily on custom data collection or failed to attain usable precision [20] [10] [13]. The objectives of this project were therefore two-fold: develop an early warning system (EWS) for student depression that (1) doesn't require any custom data collection and (2) achieves high precision and F1. To meet the first requirement, we limited features in our EWS to those that could be engineered from administrative school data already collected as a matter of other regulatory or functional requirements. Towards our second end, we applied a machine learning approach called Positive and Unlabeled Learning (PU Learning). Using the resulting framework, which we've dubbed PUL-thru, we achieved a maximum precision of 1 (average: 0.74), a maximum F1 of 0.81 (average: 0.66), and a maximum AUPRC of 0.93 (average: 0.71) across all back-test sets, identifying 94 students 6 months in advance of their depression diagnosis, with only 36 false positives across all 5 years back-tested.
That means that, were the district to deploy PUL-thru, an average of 13 students would receive mental health outreach each semester, with 74% of those students (on average) actually needing those services (according to our ground truth). To help the district develop meaningful, individualized interventions for students whom our EWS predicts will receive a depression diagnosis, we constructed a causal model and found that the strongest deterrent to a depression diagnosis was high academic performance in the previous semester, while an increase in excused absences conferred a slight elevation in the risk of a depression diagnosis.
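
The headline figures in the abstract can be cross-checked with a little arithmetic. The sketch below is not taken from the repository; it assumes two semesters per back-tested year (10 back-test sets in total), and the function and variable names are illustrative only. It reproduces the 13 students flagged per semester and shows that the pooled precision (all flags combined) comes out slightly below the reported 0.74, presumably because the latter is an average of per-set precision values.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of flagged students who went on to receive a diagnosis."""
    return true_pos / (true_pos + false_pos)


# Counts reported in the abstract, summed over all back-test sets.
true_positives = 94
false_positives = 36

# Assumption (not stated explicitly in the abstract): 5 years x 2 semesters.
semesters = 5 * 2

flagged_total = true_positives + false_positives
flagged_per_semester = flagged_total / semesters
pooled_precision = precision(true_positives, false_positives)

print(f"students flagged in total:  {flagged_total}")
print(f"flagged per semester:       {flagged_per_semester:.0f}")
print(f"pooled precision:           {pooled_precision:.3f}")
```

Pooled precision weights every flagged student equally, whereas an average over back-test sets weights every semester equally, so the two need not agree.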

Code

Substantial feature engineering was required in this project; feature engineering scripts are contained in the data folder. Extensive model experimentation was performed; code for tuning some of the contender models is given in the model folder.
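
For readers new to PU Learning, the following is a minimal sketch of one well-known approach, the label-frequency correction of Elkan and Noto. It is offered only as an illustration of the general idea and is not necessarily the method used in PUL-thru. It assumes that diagnosed students are a random sample of all truly depressed students, and that some classifier (hypothetical here) has already produced scores estimating the probability that a student is labeled. All scores, names, and the 0.8 threshold are made up for the example.

```python
from statistics import mean


def estimate_label_frequency(held_out_positive_scores):
    """Estimate c = P(labeled | truly positive) as the mean classifier
    score over held-out students who are known (labeled) positives."""
    return mean(held_out_positive_scores)


def corrected_probability(score, label_frequency):
    """Convert P(labeled | x) into an estimate of P(truly positive | x),
    capped at 1.0 because the ratio can exceed 1 in finite samples."""
    return min(1.0, score / label_frequency)


# Hypothetical scores from a classifier trained to separate labeled
# (diagnosed) students from unlabeled students.
held_out_positive_scores = [0.40, 0.50, 0.60]
unlabeled_scores = [0.05, 0.25, 0.45, 0.60]

c = estimate_label_frequency(held_out_positive_scores)
posteriors = [corrected_probability(s, c) for s in unlabeled_scores]

# A high threshold trades recall for precision (illustrative value only).
threshold = 0.8
flagged = [p >= threshold for p in posteriors]

for score, posterior, flag in zip(unlabeled_scores, posteriors, flagged):
    print(f"score={score:.2f}  corrected={posterior:.2f}  flagged={flag}")
```

The point of the correction is that unlabeled students are not treated as confirmed negatives: a modest raw score can still correspond to a high probability of depression once the labeling rate is taken into account.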

Data

The actual data used in this project are not publicly available; however, for transparency about how the logic described in the paper (paper folder) was implemented, a majority of the source code is made available here.


Languages

R 72.9%, Python 27.1%