Evaluation of approaches that adjust for biases in participatory surveillance systems

Kristin Baltrusaitis, Kathleen Noddin, Colleen Nguyen, Adam Crawley, John S. Brownstein, Laura F. White



To estimate and compare influenza attack rates (ARs) in the United States (US) using different approaches to adjust for reporting biases in participatory syndromic surveillance data.


Because the dynamics and severity of influenza in the US vary each season, yearly estimates of disease burden in the population are essential to evaluate interventions and allocate resources. The CDC uses data from a national healthcare-based surveillance system and mathematical models to estimate the overall burden of disease in the general population. Over the past decade, crowd-sourced syndromic surveillance systems have emerged as a digital data source that collects health-related information in near real-time. These systems complement traditional surveillance systems by capturing individuals who do not seek medical care and by allowing for a longitudinal view of illness burden. However, because not all participants report every week and participants are more likely to report when ill, the number of weekly reports is temporally and spatially inconsistent, and the estimates of disease burden and incidence may be biased. In this study, we use data from Flu Near You (FNY), a participatory surveillance system based in the US and Canada (1), to estimate and compare influenza-like illness (ILI) ARs using different approaches to adjust for reporting biases in participatory surveillance data.


This analysis uses FNY data from the 2015-16 influenza season. Four approaches to bias adjustment were assessed. The first approach includes all FNY participants, defined as users and household members, who submitted at least one symptom report, whereas the second approach includes only FNY participants who submitted at least 10 symptom reports. The third approach includes all FNY participants who submitted at least one symptom report, but drops the first symptom report for each participant. For the first three approaches, all missing reports were assumed to be non-ILI when estimating attack rates. Finally, the fourth approach includes FNY participants who submitted at least 10 symptom reports and uses multiple imputation to account for missing reports. Age-stratified and overall estimates of ILI ARs were calculated for each of the four approaches by dividing the sum of the weekly incident cases of ILI, defined as a participant's first report of fever with cough and/or sore throat, by the population at risk at the beginning of the period.
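The first three approaches can be illustrated with a minimal Python sketch. The data layout (tuples of participant ID, week, and an ILI flag), the function name `attack_rate`, and its parameters are hypothetical and are not taken from the FNY analysis. The sketch also assumes that, for the third approach, participants whose only report is dropped remain in the denominator, which the description above does not specify. Multiple imputation (the fourth approach) is not shown.

```python
from collections import defaultdict


def attack_rate(reports, min_reports=1, drop_first=False):
    """Estimate an ILI attack rate from symptom reports.

    reports: iterable of (participant_id, week, has_ili) tuples (hypothetical layout).
    min_reports: minimum number of reports needed to enter the cohort.
    drop_first: if True, discard each participant's earliest report.
    """
    by_participant = defaultdict(list)
    for pid, week, has_ili in reports:
        by_participant[pid].append((week, has_ili))

    at_risk = 0
    cases = 0
    for history in by_participant.values():
        if len(history) < min_reports:
            continue  # participant is not part of the cohort
        at_risk += 1  # assumption: cohort size is the population at risk
        history.sort()  # order reports by week
        if drop_first:
            history = history[1:]
        # Missing weeks are treated as non-ILI, so only submitted
        # reports can contribute a case. Each participant counts once,
        # which equals summing weekly first-time ILI reports.
        if any(has_ili for _, has_ili in history):
            cases += 1
    return cases / at_risk if at_risk else float("nan")


# Toy data, invented for illustration only
reports = [
    ("a", 1, True), ("a", 2, False),
    ("b", 1, False), ("b", 3, True),
    ("c", 2, False),
    ("d", 1, False), ("d", 2, False), ("d", 4, False),
]

print(attack_rate(reports))                   # approach 1: 0.5
print(attack_rate(reports, min_reports=3))    # approach 2 (threshold 3 here): 0.0
print(attack_rate(reports, drop_first=True))  # approach 3: 0.25
```

In the toy data, participant "a" reports ILI only in their first report, so dropping first reports removes that case while the denominator stays the same. This is the mechanism by which the third approach yields smaller ARs.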


During the 2016-2017 influenza season, FNY received an average of 10,723 unique symptom reports per week from 46,390 registered users and their household members. For FNY, the youngest age group assessed, 5-17, had the largest ILI AR, and the ILI ARs decreased with increasing age for all approaches. Overall, the approach that drops all first reports had the smallest ARs, whereas the approach that selects a cohort of users who submitted at least 10 reports during the season and imputes the missing reports had the largest ARs. Although the influenza ARs estimated by the CDC were lower than the ILI ARs estimated using FNY data for all age groups, a similar pattern was observed across age groups, except for the 50-64 age group, which had the largest influenza AR.


As expected, the ARs estimated using FNY data were greater than the CDC’s influenza ARs because FNY estimates ARs of ILI and does not adjust for the probability of reporting ILI when experiencing non-flu illness. The approach of dropping the first report had the smallest ARs because, during the 2015-16 influenza season, the weekly percentage of ILI cases that were first-time reports ranged from 18% to 59%. This approach was developed to adjust for the potential correlation between symptom presence and willingness to join the platform. However, important information about the dynamics of disease may be lost when using this approach. The multiple imputation method was used only for individuals who submitted at least 10 reports to maintain a missing data rate below 30%. The imputation model also assumed that data were missing at random, which may not be appropriate in this case because approximately 30% of FNY users have reported that they are more likely to report when ill. As shown in Table 1, the AR estimate depends on the bias adjustment approach. Simulation-based studies should be performed to further evaluate these methods.
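As a rough illustration of what such a simulation might look like, the following Python sketch generates weekly illness and reporting for synthetic participants and compares the known attack rate with two of the estimators. It is not the authors' method: the function `simulate` and all parameter values (weekly ILI probability, reporting probabilities when well and when ill, season length) are assumptions chosen only for demonstration.

```python
import random


def simulate(n=5000, weeks=20, p_ili=0.02,
             p_report_well=0.3, p_report_ill=0.6, seed=0):
    """Simulate illness-dependent reporting; all defaults are illustrative."""
    rng = random.Random(seed)
    true_cases = naive_cases = drop_first_cases = reporters = 0
    for _ in range(n):
        ever_ill = False
        reported = []  # ILI flags of submitted reports, in week order
        for _ in range(weeks):
            ill = rng.random() < p_ili
            ever_ill = ever_ill or ill
            # Reporting depends on illness status (not missing at random)
            p_report = p_report_ill if ill else p_report_well
            if rng.random() < p_report:
                reported.append(ill)
        true_cases += ever_ill
        if reported:  # participant submitted at least one report
            reporters += 1
            naive_cases += any(reported)           # missing weeks = non-ILI
            drop_first_cases += any(reported[1:])  # first report discarded
    if reporters == 0:
        return None
    return {
        "true_ar": true_cases / n,
        "all_reporters_ar": naive_cases / reporters,
        "drop_first_ar": drop_first_cases / reporters,
    }


print(simulate())
```

Varying `p_report_ill` relative to `p_report_well` shows how the strength of illness-dependent reporting changes the gap between each estimator and the true AR. A fuller study would also need to model illness-dependent enrollment and the imputation approach.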


1. Smolinski MS, Crawley AW, Baltrusaitis K, Chunara R, Olsen JM, Wójcik O, et al. Flu Near You: Crowdsourced Symptom Reporting Spanning 2 Influenza Seasons. Am J Public Health. 2015
2. Rolfes MA, Foppa IM, Garg S, Flannery B, Brammer L, Singleton JA, et al. Estimated Influenza Illnesses, Medical Visits, Hospitalizations, and Deaths Averted by Vaccination in the United States. 2016 Dec 9 [2017 Sept 25];


DOI: http://dx.doi.org/10.5210/ojphi.v10i1.8908

Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org