Understanding and Discovering Deliberate Self-harm Content in Social Media

Author(s): Yilin Wang, Jiliang Tang, Jundong Li, Baoxin Li, Yali Wan, Clayton Mellina, Neil O’Hare, Yi Chang
Venue: International Conference on World Wide Web
Date: April 2017

Type of Experiment: Case Study
Sample Size: 1000000000
Class/Experience Level: Pre-Highschool Student, Highschool Student
Participant Selection: Participants were chosen at random; the authors selected 1 billion Flickr posts
Data Collection Method: Observation


The study analyses Flickr posts in an attempt to identify posts made by users who engage in self-harm. The authors look at several kinds of metrics: tags, photo attributes (arousal, contrast, dominance, etc.), post content (verbs, nouns, adverbs, readability, and sentiment), and the times at which posts are made. Each metric is compared against "normal" users. After determining where the main differences between self-harming users and normal users lie, the authors evaluate their algorithms experimentally. The evaluation finds that the algorithms do help identify self-harm posts.

They also propose both a supervised and an unsupervised algorithm. The difference is that in the supervised scenario, labels are available to guide the machine learning process. However, since labeling social media posts is expensive and time-consuming, the authors also propose an unsupervised algorithm, which works without labels. These sections have a lot of math notation, so if you are interested in the specifics of the algorithms, you should read the paper.
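
To make the supervised vs. unsupervised distinction concrete, here is a minimal Python sketch. It is not the authors' method: it swaps in two generic textbook techniques (a nearest-centroid classifier and plain k-means), and the two features per post (a sentiment score and a readability score, both scaled to 0..1) as well as every number in `features` are invented purely for illustration.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))

def fit_supervised(features, labels):
    """Nearest-centroid classifier: uses the labels to build one centroid per class."""
    classes = sorted(set(labels))
    cents = [centroid([f for f, y in zip(features, labels) if y == c]) for c in classes]
    return classes, cents

def predict(model, point):
    """Return the class whose centroid is closest to `point`."""
    classes, cents = model
    return classes[nearest(point, cents)]

def fit_unsupervised(features, k=2, iters=10):
    """Plain k-means: groups the posts without ever seeing a label."""
    cents = [list(f) for f in features[:k]]  # simple deterministic init: first k points
    for _ in range(iters):
        assign = [nearest(f, cents) for f in features]
        for j in range(k):
            members = [f for f, a in zip(features, assign) if a == j]
            if members:  # keep the old centroid if a cluster ends up empty
                cents[j] = centroid(members)
    return [nearest(f, cents) for f in features]

# Hypothetical toy data: [sentiment, readability] per post, made up for this sketch.
features = [[0.10, 0.20], [0.90, 0.80], [0.20, 0.10],
            [0.80, 0.90], [0.15, 0.25], [0.85, 0.75]]
labels = ["flagged", "normal", "flagged", "normal", "flagged", "normal"]

# Supervised: the labels guide training, and new posts get a named class.
model = fit_supervised(features, labels)
prediction = predict(model, [0.12, 0.18])

# Unsupervised: no labels are used; the output is only anonymous cluster ids.
clusters = fit_unsupervised(features, k=2)
```

Note the practical difference in what comes out: the supervised model returns a named class for a new post, while the unsupervised run only says which posts belong together, so someone still has to inspect the clusters to decide what each one means.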