Journal Information Journal ID (publisherid): OJPHI ISSN: 19472579 Publisher: University of Illinois at Chicago Library 
Article Information ©2013 the author(s) openaccess: This is an Open Access article. Authors own copyright of their articles appearing in the Online Journal of Public Health Informatics. Readers may copy articles without permission of the copyright owner(s), as long as the author and OJPHI are acknowledged in the copy and the copy is used for educational, notforprofit purposes. Electronic publication date: Day: 4 Month: 4 Year: 2013 collection publication date: Year: 2013 Volume: 5Elocation ID: e88 Publisher Id: ojphi0588 

Spatial Scan Statistics for Models with Excess Zeros and Overdispersion  
Max Sousa de Lima^{2}  
Luiz H. Duczmal*^{1}  
Letícia P. Pinto^{1}  
1Universidade Federal de Minas Gerais, Belo Horizonte, Brazil; 

2Universidade Federal do Amazonas, Manaus, Brazil 

*Luiz H. Duczmal, Email: duczmal@ufmg.br 



Abstract 
Objective
To propose a more realistic model for disease cluster detection, through a modification of the spatial scan statistic to account simultaneously for inflated zeros and overdispersion.
Introduction
Spatial Scan Statistics [^{1}] usually assume Poisson or Binomial distributed data, which is not adequate in many disease surveillance scenarios. For example, small areas distant from hospitals may exhibit a smaller number of cases than expected in those simple models. Also, underreporting may occur in underdeveloped regions, due to inefficient data collection or the difficulty to access remote sites. Those factors generate excess zero case counts or overdispersion, inducing a violation of the statistical model and also increasing the type I error (false alarms). Overdispersion occurs when data variance is greater than the predicted by the used model. To accommodate it, an extra parameter must be included; in the Poisson model, one makes the variance equal to the mean.
Methods
Tools like the Generalized Poisson (GP) and the Double Poisson [^{2}] may be a better option for this kind of problem, modeling separately the mean and variance, which could be easily adjusted by covariates. When excess zeros occur, the Zero Inflated Poisson (ZIP) model is used, although ZIP’s estimated parameters may be severely biased if nonzero counts are too dispersed, compared to the Poisson distribution. In this case the Inflated Zero models for the Generalized Poisson (ZIGP), Double Poisson (ZIDP) and Negative Binomial (ZINB) could be good alternatives to the joint modeling of excess zeros and overdispersion. By one hand, Zero Inflated Poisson (ZIP) models were proposed using the spatial scan statistic to deal with the excess zeros [^{3}]. By the other hand, another spatial scan statistic was based on a PoissonGamma mixture model for overdispersion [^{4}]. In this work we present a model which includes inflated zeros and overdispersion simultaneously, based on the ZIDP model. Let the parameter p indicate the zero inflation. As the the remaining parameters of the observed cases map and the parameter p are not independent, the likelihood maximization process is not straightforward; it becomes even more complicated when we include covariates in the analysis. To solve this problem we introduce a vector of latent variables in order to factorize the likelihood, and obtain a facilitator for the maximization process using the EM (ExpectationMaximization) algorithm. We derive the formulas to maximize iteratively the likelihood, and implement a computer program using the EM algorithm to estimate the parameters under null and alternative hypothesis. The pvalue is obtained via the Fast Double Bootstrap Test [^{5}].
Results
Numerical simulations are conducted to assess the effectiveness of the method. We present results for Hanseniasis surveillance in the Brazilian Amazon in 2010 using this technique. We obtain the most likely spatial clusters for the Poisson, ZIP, PoissonGamma mixture and ZIDP models and compare the results.
Conclusions
The Zero Inflated Double Poisson Spatial Scan Statistic for disease cluster detection incorporates the flexibility of previous models, accounting for inflated zeros and overdispersion simultaneously. The Hanseniasis study case map, due to excess of zero cases counts in many municipalities of the Brazilian Amazon and the presence of overdispersion, was a good benchmark to test the ZIDP model. The results obtained are easier to understand compared to each of the previous spatial scan statistic models, the Zero Inflated Poisson (ZIP) model and the PoissonGamma mixture model for overdispersion, taken separetely. The EM algorithm and the Fast Double Bootstrap test are computationally efficient for this type of problem. 
The authors acknowledge the grants provided by FAPEAM and CNPq.
[1].  Kulldorff M. 1999Spatial scan statistics: Models, calculations and applicationsGlaz J, Balakrishnan NScan Statistics and Applications Springer; Netherlands: :303–322. 
[2].  Efron B. 1986;Double Exponential Families and Their Use in Generalized Linear RegressionJournal of the American Statistical Association 81:709–721. 
[3].  Cançado A, da Silva C, da Silva M. 2011;A zeroinflated Poissonbased spatial scan statisticEmerging Health Threats Journal 2011;4 
[4].  Zhang T, Zhang Z, Lin G. 2012;Spatial scan statistics with overdispersionStatistics in Medicine 31(8):762–774. 
[5].  Davidson R, MacKinnon JG. 2001 “Improving the reliability of bootstrap tests”, Queen’s University Institute for Economic Research Discussion Paper No. 995, revised. 
Article Categories:
Keywords: Scan statistics, Zero inflated, Overdispersion, ExpectationMaximization algorithm. 