Opioid Surveillance using Social Media: How URLs are shared among Reddit members

Albert Park, Mike Conway



We aim to understand (1) how frequently URLs are shared and (2) what types of URLs are shared in opioid-related discussions on the social media platform Reddit.


Nearly 100 people per day die from opioid overdose in the United States. Further, prescription opioid abuse is assumed to be responsible for a 15-year increase in opioid overdose deaths [1]. At the same time, the increasing use of social media brings increasing opportunity to seek and share information. For instance, 80% of Internet users obtain health information online [2], including from popular social interaction sites like Reddit (http://www.reddit.com), which had more than 82.5 billion page views in 2015 [3]. On Reddit, members often share information and include URLs to supplement it. Understanding how frequently URLs are shared and what types of URLs are shared can improve our knowledge of information seeking/sharing behaviors, as well as of the domains of information shared on social media. Such knowledge has the potential to provide opportunities to improve public health surveillance practice. We use Reddit to track opioid-related discussions and then investigate the types of URLs that Reddit members share in those discussions.


First, we used a dataset [4], made available on Reddit, that has been used in several informatics studies [5,6]. The dataset comprises 13,213,173 unique member IDs, 114,320,798 posts, and 1,659,361,605 associated comments made on 239,772 subreddits (i.e., sub-communities, both active and inactive) from October 2007 to May 2015. Second, we identified 9 terms that are associated with opioids: 'opioid', 'opium', 'morphine', 'opiate', 'hydrocodone', 'oxycodone', 'fentanyl', 'heroin', and 'methadone'. Third, we preprocessed the entire dataset (i.e., converted text to lowercase and removed punctuation) and extracted discussions containing opioid terms, along with their metadata (e.g., user ID, post ID), via a lexicon-based approach. Fourth, we extracted URLs from these discussions using Python, categorized the URLs by domain, and then visualized the results in a bubble chart [7].
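The third and fourth steps can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the URL regular expression, and the choice of whole-word matching (so a plural such as 'opioids' would not match) are assumptions made for illustration, since the abstract does not specify these details. URLs are taken from the raw text rather than the punctuation-stripped text, because removing punctuation would destroy them.

```python
import re
import string
from collections import Counter
from urllib.parse import urlparse

# The nine opioid terms listed in the abstract.
OPIOID_TERMS = {"opioid", "opium", "morphine", "opiate", "hydrocodone",
                "oxycodone", "fentanyl", "heroin", "methadone"}

# Assumed URL pattern: http(s) links up to whitespace, quotes, or brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\[\]()\"']+")

# Translation table that turns every ASCII punctuation character into a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def mentions_opioid(text):
    """Lowercase, strip punctuation, and look for any lexicon term as a whole word."""
    tokens = text.lower().translate(_PUNCT_TO_SPACE).split()
    return any(token in OPIOID_TERMS for token in tokens)


def extract_urls(text):
    """Return URLs found in the raw text, trimming trailing sentence punctuation."""
    return [match.rstrip(".,;:!?") for match in URL_PATTERN.findall(text)]


def domain_of(url):
    """Return the host name of a URL, lowercased and without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def domain_counts(texts):
    """Count shared domains across all texts that mention an opioid term."""
    counts = Counter()
    for text in texts:
        if mentions_opioid(text):
            counts.update(domain_of(url) for url in extract_urls(text))
    return counts
```

Calling `domain_counts` on an iterable of post and comment bodies yields a `Counter` keyed by domain, which could then feed a bubble chart or any other frequency plot.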


We extracted 1,121,187 posts/comments that were made by 328,179 unique member IDs on 8,892 subreddits. Of these 1,121,187 posts/comments, 82,639 (7.37%) contained URLs; together they contained 272,551 URLs in total, of which 138,206 were unique. The types of URLs shared in these opioid-related discussions are summarized in Figure 1. Color represents the type of shared URL and bubble size represents the number of shared URLs: ‘.com’ is in blue, ‘.org’ is in orange, and ‘.gov’ is in green.
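The summary figures above follow from simple counting, sketched below in Python under stated assumptions. The function names are hypothetical, and uniqueness is assumed to mean exact string equality between URLs; the abstract does not say how unique URLs were determined. The second helper groups domains by their final label (e.g., 'com', 'org', 'gov'), mirroring the color coding of the figure.

```python
from collections import Counter


def summarize_url_sharing(urls_per_post):
    """Summarize URL sharing.

    urls_per_post: one list of URL strings per post/comment (possibly empty).
    """
    n_posts = len(urls_per_post)
    posts_with_urls = sum(1 for urls in urls_per_post if urls)
    all_urls = [url for urls in urls_per_post for url in urls]
    return {
        "posts": n_posts,
        "posts_with_urls": posts_with_urls,
        # Percentage of posts/comments that contain at least one URL.
        "percent_with_urls": round(100 * posts_with_urls / n_posts, 2) if n_posts else 0.0,
        "total_urls": len(all_urls),
        # Assumption: two URLs are the same only if their strings match exactly.
        "unique_urls": len(set(all_urls)),
    }


def tld_counts(domains):
    """Count domains by their last dot-separated label (e.g., 'com', 'org', 'gov')."""
    return Counter(domain.rsplit(".", 1)[-1] for domain in domains if "." in domain)
```

As a check on the reported proportion, `round(100 * 82639 / 1121187, 2)` gives 7.37, matching the percentage stated above.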


We present preliminary findings concerning the types of URLs shared in opioid-related discussions among Reddit members. Our initial results suggest that Reddit members openly discuss opioid-related issues and that URL sharing is a part of information sharing. Although members share many URLs from reliable information sources (e.g., ‘ncbi.nlm.nih.gov’, ‘wikipedia.org’, ‘nytimes.com’, ‘sciencedirect.com’), many of the ‘.com’ URLs (e.g., ‘youtube.com’, ‘reddit.com’, ‘google.com’, ‘amazon.com’) may contain high- and/or low-quality information. Further investigation of these URLs is needed to fully understand information seeking/sharing behaviors on social media and to identify opportunities for improving public health surveillance practice, such as tracking misinformation dissemination.


DOI: http://dx.doi.org/10.5210/ojphi.v10i1.8419

Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org