Data Quality Matters: Suicide Intention Detection on Social Media Posts Using a RoBERTa-CNN
Suicide remains a major global health concern that urgently calls for innovative approaches to early detection and intervention. This paper focuses on identifying suicidal intentions in Reddit posts from the SuicideWatch community and presents a novel suicide detection approach based on RoBERTa-CNN, a variant of RoBERTa (Robustly optimized BERT approach). RoBERTa captures textual information and models semantic relationships within text well, and adding a Convolutional Neural Network (CNN) head enhances its ability to capture important patterns in large datasets. To evaluate RoBERTa-CNN, we conducted experiments on the Suicide and Depression Detection dataset and obtained strong results: RoBERTa-CNN achieves a mean accuracy of 98% with a standard deviation (STD) of 0.0009, and a mean AUC above 97.5% with an STD of 0.0013. RoBERTa-CNN also outperforms competitive methods, demonstrating its robustness and its ability to capture nuanced linguistic patterns associated with suicidal intention. These results indicate that RoBERTa-CNN detects suicidal intention in text data effectively.
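The abstract does not specify the filter widths, number of filters, pooling strategy, or how the CNN head is attached to RoBERTa, so the following is only a minimal, dependency-free sketch of one common form of CNN classification head: a 1D convolution over the token axis, ReLU, global max-pooling, and a linear layer with a sigmoid output. The input matrix stands in for RoBERTa's final hidden states (one row per token). All names, dimensions, and the random weights are hypothetical illustrations, not the authors' configuration; a real model would use a deep-learning framework, actual RoBERTa outputs, and trained weights.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # rows = tokens, columns = hidden dimensions


def conv1d_relu(hidden: Matrix, kernels: List[Matrix], biases: List[float]) -> List[List[float]]:
    """Slide each kernel (width x hidden_dim) along the token axis, then apply ReLU.

    Valid convolution with stride 1: each feature map has len(hidden) - width + 1 values.
    """
    seq_len = len(hidden)
    feature_maps = []
    for kernel, bias in zip(kernels, biases):
        width = len(kernel)
        if seq_len < width:
            raise ValueError("sequence is shorter than the kernel width")
        fmap = []
        for start in range(seq_len - width + 1):
            total = bias
            for offset in range(width):
                row = hidden[start + offset]
                total += sum(w * x for w, x in zip(kernel[offset], row))
            fmap.append(max(0.0, total))  # ReLU
        feature_maps.append(fmap)
    return feature_maps


def cnn_head(hidden: Matrix, kernels: List[Matrix], biases: List[float],
             out_weights: List[float], out_bias: float) -> float:
    """Hypothetical head: conv1d -> ReLU -> global max-pool -> linear -> sigmoid."""
    maps = conv1d_relu(hidden, kernels, biases)
    pooled = [max(fmap) for fmap in maps]  # one pooled value per filter
    logit = out_bias + sum(w * p for w, p in zip(out_weights, pooled))
    return 1.0 / (1.0 + math.exp(-logit))  # probability of the positive class


# Toy demo with made-up sizes (RoBERTa-base actually uses a hidden size of 768).
random.seed(0)
HIDDEN_DIM, SEQ_LEN, NUM_FILTERS, KERNEL_WIDTH = 8, 6, 4, 3

# Random stand-in for RoBERTa's last hidden states (NOT real embeddings).
hidden = [[random.gauss(0, 1) for _ in range(HIDDEN_DIM)] for _ in range(SEQ_LEN)]
kernels = [[[random.gauss(0, 0.1) for _ in range(HIDDEN_DIM)]
            for _ in range(KERNEL_WIDTH)] for _ in range(NUM_FILTERS)]
biases = [0.0] * NUM_FILTERS
out_weights = [random.gauss(0, 0.1) for _ in range(NUM_FILTERS)]

prob = cnn_head(hidden, kernels, biases, out_weights, 0.0)
print(f"P(positive class) = {prob:.3f}")
```

Max-pooling over the token axis is one plausible way a CNN head could "capture important patterns": each filter responds to a local n-gram-like window of contextual embeddings, and the pooled value records its strongest match anywhere in the post, regardless of position.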