OJPHI: Vol. 5
Journal Information
Journal ID (publisher-id): OJPHI
ISSN: 1947-2579
Publisher: University of Illinois at Chicago Library
Article Information
©2013 the author(s)
open-access: This is an Open Access article. Authors own copyright of their articles appearing in the Online Journal of Public Health Informatics. Readers may copy articles without permission of the copyright owner(s), as long as the author and OJPHI are acknowledged in the copy and the copy is used for educational, not-for-profit purposes.
Electronic publication date: Day: 4 Month: 4 Year: 2013
collection publication date: Year: 2013
Volume: 5
E-location ID: e148
Publisher Id: ojphi-05-148

An Improved Algorithm for Outbreak Detection in Multiple Surveillance Systems
Angela Noufaily*1
Doyo Enki1
Paddy Farrington1
Paul Garthwaite1
Nick Andrews2
Andre Charlett2
1The Open University, Milton Keynes, United Kingdom;
2Health Protection Agency, London, United Kingdom
*Angela Noufaily, E-mail: a.noufaily@open.ac.uk

Abstract
Objective

To improve the performance of the large-scale multiple statistical surveillance system for infectious disease outbreaks in England and Wales, with a view to reducing the number of false reports while retaining good power to detect genuine outbreaks.

Introduction

There has been much interest in the use of statistical surveillance systems over the last decade, prompted by concerns over bioterrorism, the emergence of new pathogens such as SARS and swine flu, and the persistent public health problems of infectious disease outbreaks. In the United Kingdom (UK), statistical surveillance methods have been in routine use at the Health Protection Agency (HPA) since the early 1990s and at Health Protection Scotland (HPS) since the early 2000s (1,2). These are based on a simple yet robust quasi-Poisson regression method (1). We revisit the algorithm with a view to improving its performance.

Methods

We fit a quasi-Poisson regression model to baseline data.

One limitation of the current algorithm is the small number of baseline weeks it uses. To make use of more baseline data, we propose a simple seasonal adjustment using factors, extending the model to include a 10-level factor.
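The abstract does not say how the ten levels are laid out. The sketch below assumes, for illustration only, one 7-week level centred on the current week's position in the annual cycle and nine consecutive 5-week levels covering the remaining 45 weeks of a 52-week year. The function name `seasonal_level` and this layout are assumptions, and 53-week years are ignored.

```python
def seasonal_level(week_offset, weeks_per_year=52):
    """Return a seasonal factor level (0-9) for a baseline week.

    `week_offset` is the baseline week's distance from the current week
    (negative for past weeks). Assumed layout: level 0 is the 7-week window
    centred on the current week's position in the annual cycle; levels 1-9
    are consecutive 5-week blocks covering the remaining 45 weeks.
    """
    # Position in the annual cycle, shifted so the centred window maps to 0..6.
    shifted = (week_offset + 3) % weeks_per_year
    if shifted < 7:
        return 0
    # The remaining positions 7..51 fall into nine blocks of five weeks.
    return 1 + (shifted - 7) // 5


# Levels for one full year of baseline weeks preceding the current week.
levels = [seasonal_level(offset) for offset in range(-52, 0)]
```

In a fitted model these levels would enter as a categorical covariate alongside the trend, so that baseline weeks from the whole year, not only those close to the current week, could contribute to the fit.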

We always fit the trend component, irrespective of its statistical significance.
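A minimal sketch of the baseline fit follows. It assumes, for brevity, a model with only an intercept and a linear trend (the seasonal factor is omitted), fitted as log mu_t = a + b*t by iteratively reweighted least squares. The dispersion phi is estimated from the weighted Pearson statistic and floored at 1, a convention assumed here. The trend is used in prediction whatever its significance. Function names are hypothetical.

```python
import math


def fit_quasi_poisson_trend(y, t, weights=None, max_iter=50, tol=1e-10):
    """Fit log(mu_i) = a + b * t_i by IRLS and return (a, b, phi).

    `weights` are optional prior weights (e.g. from baseline down-weighting);
    phi is the weighted Pearson dispersion estimate, floored at 1.
    """
    n = len(y)
    w0 = list(weights) if weights is not None else [1.0] * n
    # Start from the weighted mean count on the log scale, with no trend.
    mean = sum(wi * yi for wi, yi in zip(w0, y)) / sum(w0)
    a, b = math.log(max(mean, 0.5)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(a + b * ti) for ti in t]
        # Working response and working weights for the Poisson log link.
        z = [a + b * ti + (yi - mi) / mi for ti, yi, mi in zip(t, y, mu)]
        w = [wi * mi for wi, mi in zip(w0, mu)]
        # Solve the 2x2 weighted least-squares normal equations.
        s0 = sum(w)
        s1 = sum(wi * ti for wi, ti in zip(w, t))
        s2 = sum(wi * ti * ti for wi, ti in zip(w, t))
        r0 = sum(wi * zi for wi, zi in zip(w, z))
        r1 = sum(wi * ti * zi for wi, ti, zi in zip(w, t, z))
        det = s0 * s2 - s1 * s1
        a_new = (s2 * r0 - s1 * r1) / det
        b_new = (s0 * r1 - s1 * r0) / det
        done = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if done:
            break
    mu = [math.exp(a + b * ti) for ti in t]
    pearson = sum(wi * (yi - mi) ** 2 / mi for wi, yi, mi in zip(w0, y, mu))
    phi = max(1.0, pearson / (n - 2))  # two fitted parameters
    return a, b, phi


def predict(a, b, t0):
    """Expected count at time t0; the trend b is kept even if non-significant."""
    return math.exp(a + b * t0)
```

Keeping `b` unconditionally avoids the switch between trend and no-trend models that a significance test would otherwise trigger from one week to the next.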

We are concerned that the existing weighting procedure is too drastic: a baseline week is down-weighted whenever its standardized Anscombe residual exceeds 1. This condition was chosen empirically to avoid reducing the sensitivity of the system when the baselines contain large outbreaks, but it may increase the false positive rate (FPR) unduly when the baselines contain no outbreaks or only small ones. We investigate several alternatives, including restricting down-weighting to weeks whose Anscombe residuals exceed 2 or 3.
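The down-weighting step can be sketched as follows, with the threshold exposed as a parameter so that the original value of 1 can be compared with 2 or 3. The form used here (Anscombe residuals standardized by the dispersion and leverage, weight s^-2 above the threshold, weights rescaled to sum to the number of baseline weeks) is our reading of Farrington et al. (1) and should be treated as an assumption; the function names are hypothetical.

```python
import math


def anscombe_residuals(y, mu, phi, leverage):
    """Standardized Anscombe residuals for a quasi-Poisson fit."""
    return [
        1.5 * (yi ** (2 / 3) - mi ** (2 / 3))
        / (mi ** (1 / 6) * math.sqrt(phi * (1 - hi)))
        for yi, mi, hi in zip(y, mu, leverage)
    ]


def baseline_weights(residuals, threshold=1.0):
    """Down-weight baseline weeks whose residual exceeds `threshold`.

    Such weeks get weight s**-2, all others weight 1; the weights are then
    rescaled so that they sum to the number of baseline weeks.
    """
    raw = [s ** -2 if s > threshold else 1.0 for s in residuals]
    gamma = len(raw) / sum(raw)
    return [gamma * w for w in raw]


# Three baseline weeks, the last one containing a large excess.
res = anscombe_residuals([10, 10, 30], [10, 10, 10], 1.0, [0.0, 0.0, 0.0])
```

Raising `threshold` leaves moderately high weeks at full weight, which keeps more of the baseline variability in the model and so tends to raise the threshold and lower the FPR.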

We evaluate a new reweighting scheme informed by past decisions. Under this adaptive scheme, baseline weeks in which an alarm was flagged are down-weighted to reduce their effect on current predictions. Here, the reweighting criterion is the value of the exceedance score.
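The abstract does not give the exact adaptive rule, so the sketch below is only illustrative. It assumes the exceedance score X = (y - mu0)/(U - mu0), where U is the upper threshold, is stored for each past week; weeks that raised an alarm (X > 1) receive weight 1/X^2, all other weeks weight 1, and the weights are rescaled to sum to the number of baseline weeks. The 1/X^2 form and the function names are hypothetical.

```python
def exceedance_score(y, mu0, upper):
    """X = (y - mu0) / (U - mu0); X > 1 means the count exceeded U."""
    return (y - mu0) / (upper - mu0)


def adaptive_weights(past_scores):
    """Hypothetical rule: weeks that flagged an alarm (X > 1) get 1/X**2.

    All other weeks get weight 1; weights are rescaled to sum to the
    number of baseline weeks.
    """
    raw = [1.0 / x ** 2 if x > 1 else 1.0 for x in past_scores]
    gamma = len(raw) / sum(raw)
    return [gamma * w for w in raw]


# Scores recorded when each baseline week was itself the current week.
weights = adaptive_weights([0.2, 0.5, 2.0, -0.3])
```

Because the scores come from decisions already made, no residuals need to be recomputed at each run; the weights depend only on the stored alarm history.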

Finally, we investigate the validity of the upper threshold values based on the quasi-Poisson model when the data are generated using known negative binomial distributions.
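One way to check the upper threshold is to compare the quasi-Poisson threshold with the exact quantile of a negative binomial distribution having the same mean mu0 and variance phi*mu0. The sketch assumes the 2/3-power-transformation threshold of Farrington et al. (1), with z the standard normal quantile at level q (0.995 by default here) and `var_mu0` the estimated variance of the fitted mean; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def qp_threshold(mu0, phi, var_mu0=0.0, q=0.995):
    """Quasi-Poisson upper threshold based on the 2/3-power transformation."""
    z = NormalDist().inv_cdf(q)
    spread = math.sqrt(phi * mu0 ** (1 / 3) + var_mu0 * mu0 ** (-2 / 3))
    return mu0 * (1 + (2 / 3) * z * spread / mu0 ** (2 / 3)) ** 1.5


def nb_quantile(mu, phi, q=0.995):
    """Smallest k with P(Y <= k) >= q for Y ~ NB(mean mu, variance phi*mu).

    Requires phi > 1. The size parameter mu / (phi - 1) gives
    variance mu + mu**2 / size = phi * mu.
    """
    size = mu / (phi - 1)
    p = size / (size + mu)
    pmf = p ** size  # P(Y = 0)
    cdf, k = pmf, 0
    while cdf < q:
        # P(Y = k + 1) = P(Y = k) * (k + size) / (k + 1) * (1 - p)
        pmf *= (k + size) / (k + 1) * (1 - p)
        k += 1
        cdf += pmf
    return k
```

Evaluating both functions over a grid of means and dispersions indicates where the quasi-Poisson threshold sits above or below the exact negative binomial quantile; a simulation study would instead estimate the FPR directly from generated counts.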

Results

Our evaluation of the existing algorithm showed that the false positive rate (FPR) is too high.

A novel feature of our new models is that they make use of much more baseline data. This resulted in better estimation of the trend and variance, and decreased the FPR. In addition, we found that the trend should always be fitted, even when non-significant (or extreme). This reduces discrepancies in the results from one week to the next.

The adaptive reweighting scheme was found to give broadly equivalent results to the reweighting method based on scaled Anscombe residuals. Using the latter, as in the original HPA method but with a much higher threshold for reweighting, decreased the FPR further.

Our investigations also suggest that the negative binomial model is a reasonable one, though not ideal in all circumstances. Thus, there is a good case for replacing the quasi-Poisson model with the negative binomial.

One of the unusual features of the HPA system is that it is run every week on a database of more than 3300 distinct organisms, which is likely to produce a large number of aberrations. We found that retaining the exceedance score approach, based on the 0.995 quantile, is perfectly reasonable. This involves ranking aberrant organisms in order of exceedance.
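Ranking by exceedance score can be sketched as follows. The score is taken as X = (y - mu0)/(U - mu0), where U is the upper threshold at the 0.995 quantile, so that X > 1 exactly when the count exceeds its threshold. This definition, the tuple layout of the input, the organism labels and the function name are assumptions for illustration.

```python
def rank_aberrations(reports):
    """Return flagged organisms sorted by decreasing exceedance score.

    `reports` is an iterable of (organism, observed, mu0, upper) tuples,
    where `upper` is the threshold at the 0.995 quantile.
    """
    flagged = []
    for name, y, mu0, upper in reports:
        score = (y - mu0) / (upper - mu0)
        if score > 1:  # count exceeds its threshold
            flagged.append((name, score))
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged


# Hypothetical weekly output for three organisms.
ranked = rank_aberrations([("A", 12, 4, 10), ("B", 30, 5, 15), ("C", 3, 2, 8)])
```

Sorting by score rather than by raw excess puts organisms with small expected counts on the same footing as common ones, which matters when thousands of series are screened each week.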

Conclusions

We have undertaken a thorough evaluation of the HPA’s outbreak detection system based on simulated and real data. The main conclusion from this evaluation is that the FPR is too high, owing to a combination of factors, notably excessive down-weighting of high baselines and reliance on too few baseline weeks.


Acknowledgments

This research was supported by a project grant from the Medical Research Council, and by a Royal Society Wolfson Research Merit Award.


References
1. Farrington CP, Andrews NJ, Beale AJ, Catchpole MA. A Statistical Algorithm for the Early Detection of Outbreaks of Infectious Disease. Journal of the Royal Statistical Society Series A 1996;159:547–563.
2. McCabe GJ, Greenhalgh D, Gettingby G, Holmes E, Cowden J. Prediction of infectious diseases: an exception reporting system. Journal of Medical Informatics and Technologies 2003;5:67–74.

Article Categories:
  • ISDS 2012 Conference Abstracts

Keywords: outbreak, negative binomial regression, quasi-Poisson.




Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org