Predicting mental health diagnoses with machine learning based on questionnaire survey data
Background Mental health problems constitute a major occupational health challenge. We analyzed the associations between psychiatric diagnoses and responses to an occupational health questionnaire using a machine learning classifier.
Methods The study material included occupational health questionnaires and psychiatric diagnoses from 11,828 customers of an occupational healthcare provider. Using XGBoost, a supervised machine learning classifier, we aimed to predict whether an individual received a psychiatric diagnosis during the first two years after answering the occupational health questionnaire.
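As an illustration of the prediction target described in the Methods, the following minimal sketch derives a binary label from a questionnaire response date and a list of diagnosis dates. The function names (`add_years`, `diagnosed_within_window`), the data layout, and the window boundaries (strictly after the response date, up to and including the two-year mark) are assumptions made for this example and are not taken from the study.

```python
from datetime import date
from typing import Iterable


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (hypothetical helper)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; fall back to 28 February.
        return d.replace(year=d.year + years, day=28)


def diagnosed_within_window(
    response_date: date,
    diagnosis_dates: Iterable[date],
    years: int = 2,
) -> bool:
    """True if any diagnosis falls within `years` after the questionnaire response.

    Assumption: the window is (response_date, response_date + years].
    """
    window_end = add_years(response_date, years)
    return any(response_date < d <= window_end for d in diagnosis_dates)


# Hypothetical respondent: answered on 1 March 2015, diagnosed on 15 June 2016.
label = diagnosed_within_window(date(2015, 3, 1), [date(2016, 6, 15)])
print(label)
```

A label of this kind would serve as the target of a supervised classifier, with the questionnaire items as input features.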
Results Models based on all items in the occupational health questionnaire, as well as models based on the seven most important items, performed markedly better in predicting a psychiatric diagnosis than a trivial model based only on age and gender or a random classifier. The most important items in the prediction were related to stress, sadness, and exhaustion.
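One common way to compare a classifier against an uninformative baseline is the area under the ROC curve (AUC), where a classifier with no discriminative power scores 0.5. The abstract does not state which performance metric was used, so the choice of AUC here is an assumption, and the labels and scores below are made-up toy values rather than study data. The sketch computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = diagnosis within the window, 0 = no diagnosis.
labels = [0, 0, 1, 1, 0, 1]
model_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]  # made-up model outputs
constant_scores = [0.5] * len(labels)           # uninformative baseline

print(roc_auc(labels, model_scores))     # 8 of 9 positive/negative pairs ranked correctly
print(roc_auc(labels, constant_scores))  # all pairs tied, so 0.5
```

The pairwise form is quadratic in the number of respondents; for a dataset of the size described here, a rank-based implementation from a standard library would be the more practical choice.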
Conclusions Using machine learning methods, we were able to predict psychiatric diagnoses from a general occupational health questionnaire and to automatically identify the questionnaire items most relevant to our prediction problem. The approaches we used may prove useful in other studies in the field of occupational health and safety.
Olli Haavisto, Ari Väänänen, Pekka Varje, Simo Taimela, Ara Taalas, Oskar Niemenoja, Niina Nieminen