Unsupervised cyber bullying detection in social networks

Di Nardo, Emanuel; Di Capua, Michele; Petrosino, Alfredo

doi:10.1109/ICPR.2016.7899672

Modern young people (“digital natives”) have grown up in an era dominated by new technologies, where communication happens almost in real time and places no limits on establishing relationships with other people or communities. However, the pace of this evolution does not allow young people to consciously distinguish acceptable behaviors from potentially harmful ones, and a new phenomenon known as cyber bullying is becoming increasingly evident, attracting the attention of educators and the media. Cyber bullying is defined as “willful and repeated harm inflicted through the use of electronic devices” [1]. In this paper we propose a possible solution for the automatic detection of bullying traces in a social network, using techniques derived from NLP (Natural Language Processing) and machine learning. Specifically, we design a model inspired by Growing Hierarchical Self-Organizing Maps (GHSOMs) that can efficiently cluster documents containing bullying traces and is built upon semantic and syntactic features of textual sentences. We fine-tuned our model to work with the social network Twitter, but we also tested it on other social networks such as YouTube and Formspring. Finally, we report our results, showing that the proposed unsupervised approach can be used effectively, with good performance, in some scenarios.