Towards Tracking Opium Related Discussions in Social Media

Albert Park, Mike Conway

Abstract


Objective

We aim to develop an automated method to track opium-related discussions on the social media platform Reddit. As a first step towards this goal, we use a keyword-based approach to track how often Reddit members discuss opium-related issues.

Introduction

In recent years, the use of social media has increased at an unprecedented rate. For example, the popular social media platform Reddit (http://www.reddit.com) had 83 billion page views from over 88,000 active sub-communities (subreddits) in 2015. Members of Reddit made over 73 million individual posts and over 725 million associated comments in the same year [1].

We use Reddit to track opium-related discussions because Reddit allows throwaway and unidentifiable accounts, which are suitable for stigmatized discussions that may not be appropriate for identifiable accounts. Reddit members exchange conversation via a forum-like platform, and members who have achieved a certain status within the community are able to create new topically focused groups called subreddits.

Methods

First, we used a dataset archived by a Reddit member who collected the data through Reddit's official Application Programming Interface (API) (https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/). The dataset comprises 239,772 subreddits (both active and inactive), 13,213,173 unique user IDs, 114,320,798 posts, and 1,659,361,605 associated comments made from October 2007 to May 2015. Second, we identified 10 terms associated with opium: ‘opium’, ‘opioid’, ‘morphine’, ‘opiate’, ‘hydrocodone’, ‘oxycodone’, ‘fentanyl’, ‘oxy’, ‘heroin’, and ‘methadone’. Third, we preprocessed the entire dataset, which included structuring the data into monthly time frames, converting text to lowercase, and stemming keywords and text. Fourth, we employed a dictionary approach to count and extract timestamps, user IDs, posts, and comments containing opium-related terms.
Fifth, we normalized each frequency count by dividing it by the overall number of the respective variable (posters, posts, or comments) for that period.

Results

According to our dataset, Reddit members do discuss opium-related topics. The normalized frequency count of posters shows that, on average, less than one percent of members talk about opium-related topics (Figure 1). Although the community as a whole does not frequently talk about opium-related issues, this still amounts to more than 10,000 members in 2015 (Figure 2). Moreover, members of Reddit have created a number of subreddits, such as ‘oxycontin’, ‘opioid’, ‘heroin’, and ‘oxycodon’, that explicitly focus on opioids.

Conclusions

We present preliminary findings on developing an automated method to track opium-related discussions on Reddit. Our initial results suggest that members of the Reddit community discuss opium-related issues, although only a small fraction of members contribute to these discussions.

We identify several directions for future work to better track opium-related discussions on Reddit. First, the automated method needs to be developed further to employ more sophisticated techniques, such as knowledge-based and corpus-based approaches, to better extract opium-related discussions. Second, the automated method needs to be thoroughly evaluated by measuring the precision, recall, accuracy, and F1-score of the system. Third, given how many members use social media to discuss these issues, it will be helpful to investigate the specifics of their discussions.

Figure 1. Line graphs of normalized frequency counts for posters, comments, and posts that contained opium-related terms.

Figure 2. Line graphs of raw frequency counts for posters, comments, and posts that contained opium-related terms.
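
The preprocessing, dictionary matching, and normalization steps described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. The record field names (`created_utc`, `author`, `body`), the function names, and the `crude_stem` helper are assumptions made for illustration; in particular, `crude_stem` only strips a trailing plural "s" and stands in for whatever stemmer was actually used, which the abstract does not name.

```python
import re
from collections import defaultdict
from datetime import datetime, timezone

# The ten keywords listed in the Methods.
OPIUM_TERMS = {
    "opium", "opioid", "morphine", "opiate", "hydrocodone",
    "oxycodone", "fentanyl", "oxy", "heroin", "methadone",
}


def crude_stem(token):
    """Hypothetical stand-in for a real stemmer: strips one trailing plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


STEMMED_TERMS = {crude_stem(term) for term in OPIUM_TERMS}


def mentions_term(text):
    """Lowercase the text, tokenize on letters, and look up each stemmed token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return any(crude_stem(token) in STEMMED_TERMS for token in tokens)


def monthly_normalized_counts(records):
    """Return, per month, the share of comments and of posters that match.

    `records` is an iterable of dicts with assumed keys 'created_utc'
    (epoch seconds), 'author', and 'body'.
    """
    total_comments = defaultdict(int)
    matching_comments = defaultdict(int)
    all_authors = defaultdict(set)
    matching_authors = defaultdict(set)

    for record in records:
        stamp = datetime.fromtimestamp(record["created_utc"], tz=timezone.utc)
        month = stamp.strftime("%Y-%m")
        total_comments[month] += 1
        all_authors[month].add(record["author"])
        if mentions_term(record["body"]):
            matching_comments[month] += 1
            matching_authors[month].add(record["author"])

    # Normalize each raw count by the overall count for the same month.
    return {
        month: {
            "comment_share": matching_comments[month] / total_comments[month],
            "poster_share": len(matching_authors[month]) / len(all_authors[month]),
        }
        for month in total_comments
    }
```

Exact token matching keeps ‘oxy’ from matching unrelated words such as "oxygen", at the cost of missing misspellings and slang; this is the kind of limitation that the knowledge-based and corpus-based approaches mentioned in the Conclusions would address.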

DOI: https://doi.org/10.5210/ojphi.v9i1.7652



Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org