Journal Information. Journal ID (publisher-id): OJPHI. ISSN: 1947-2579. Publisher: University of Illinois at Chicago Library.
Article Information. ©2010 the author(s). Open Access: This is an Open Access article. Authors own copyright of their articles appearing in the Online Journal of Public Health Informatics. Readers may copy articles without permission of the copyright owner(s), as long as the author and OJPHI are acknowledged in the copy and the copy is used for educational, not-for-profit purposes. Collection publication year: 2010. Electronic publication date: 9 April 2010. Volume: 2. Issue: 1. Elocation ID: ojphi.v2i1.2837. DOI: 10.5210/ojphi.v2i1.2837. Publisher ID: ojphi023.

A confidence-based aberration interpretation framework for outbreak conciliation
Shamir Nizar Mukhi, PhD^{1}  
^{1} University of Manitoba; Public Health Agency of Canada

Correspondence: Shamir Nizar Mukhi, PhD. Email: shamir.nizar.mukhi@phac-aspc.gc.ca



Abstract 
Health surveillance can be viewed as an ongoing systematic collection, analysis, and interpretation of data for use in planning, implementation, and evaluation of a given health system, in potentially multiple spheres (e.g., animal, human, environment). As we move into a technologically sophisticated era, there is a need for cost-effective and efficient health surveillance methods and systems that can rapidly identify potential bioterrorism attacks and infectious disease outbreaks. The main objective of such methods and systems is to reduce the impact of an outbreak by enabling appropriate officials to detect it quickly and to implement timely and appropriate interventions. Identifying an outbreak and/or potential bioterrorism attack days to weeks earlier than traditional surveillance methods would potentially reduce morbidity, mortality, and outbreak-associated economic consequences. Proposed here is a novel framework that takes into account the relationships between aberration detection algorithms and produces an unbiased confidence measure for identifying the start of an outbreak. Such a framework would enable a user and/or a system to interpret the anomaly detection results generated by multiple algorithms with some indication of confidence.
Recent advances in technology have made it possible to gather, integrate, and analyze large amounts of data in real time or near real time. These new technologies have touched off a renaissance in public health surveillance. For the most part, the traditional purposes of health surveillance have been to monitor long-term trends in disease ecology and to guide policy decisions. With the introduction of real-time capabilities, data exchange now holds the promise of facilitating early event detection and assisting in day-to-day disease management.
With the availability of dozens of different aberration detection algorithms, it is possible, if not probable, to get different results from different algorithms executed on the same dataset. The results of the study in [^{1}] suggest that commonly used algorithms for disease surveillance often do not perform well in detecting aberrations other than large and rapid increases in daily counts relative to baseline levels. A new approach, denoted here as the Confidence-based Aberration Interpretation Framework (CAIF), may help address this issue in disease surveillance by using a collective rather than algorithm-specific approach.
Consider a system with multiple anomaly detection algorithms, as illustrated in Figure 1. Due to differences in the implementation of the algorithms and the parameters used (e.g., thresholds, training periods, and averaging windows), the outbreak decisions may vary significantly from one algorithm to another. On the other hand, these decisions may also be very similar for some set of algorithms. These two extremes create a dilemma for decision makers: most of the algorithms in a system may suggest an outbreak, but without knowing the relationships between those algorithms, the resulting decision can be biased.
The Outbreak Detection Problem
As illustrated, there are three main points of concern:
These three concerns result in trade-offs between false positives, false negatives, and detection time, which are typically addressed by examining sensitivity, specificity, and time-to-detect parameters.
In summary, a framework is needed that enables a user or system to interpret anomaly detection results with some indication of confidence. That is, is there a potential start of an outbreak with twenty percent confidence, or with ninety percent confidence? Presented here is a framework that takes into account the relationships between algorithms and produces an unbiased confidence measure for identifying the start of an outbreak.
The proposed anomaly interpretation framework aims to enhance surveillance decision-making by combining the results of multiple aberration detection algorithms through the use of key result metrics. Figure 2 depicts the four steps of the proposed framework and the linkages between them.
The Confidencebased Aberration Interpretation Framework
Traditionally, specificity and sensitivity have been used to compare algorithms and their performance. In this study, these two parameters are key to identifying a subset of algorithms (referred to as the minimal set) that is sufficient to deduce an overall decision on the start of an outbreak. The hypothesis is that the system may not require all candidate algorithms to reach a good decision, as some of them may provide redundant information.
The sensitivity of an algorithm for a given dataset is defined as the total number of outbreaks during which the algorithm flagged (at least once per outbreak) divided by the total number of outbreak periods in the dataset (see Note 1). The specificity of an algorithm for a given dataset, on the other hand, is defined as the total number of non-outbreak days on which the method did not flag divided by the total number of non-outbreak days in that dataset [^{2}]:
Sensitivity = (True Positive Count) / (Total Number of Outbreaks)
Specificity = (True Negative Count) / (Total Number of Non-Outbreak Days)
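These definitions can be sketched in code. The helper below is illustrative only (the function name and the half-open period convention are assumptions, not from the paper); it takes boolean day-by-day series for the algorithm's flags and the ground-truth outbreak labels:

```python
from typing import List, Tuple

def sensitivity_specificity(flags: List[bool],
                            outbreak: List[bool]) -> Tuple[float, float]:
    """Per-outbreak sensitivity and per-day specificity.

    flags[d]    -- True if the algorithm flagged day d
    outbreak[d] -- True if day d belongs to an outbreak period
    An outbreak period is a maximal run of consecutive outbreak days;
    it counts as detected if the algorithm flags at least once inside it.
    """
    # Split the timeline into maximal runs of outbreak days.
    periods = []
    d = 0
    while d < len(outbreak):
        if outbreak[d]:
            start = d
            while d < len(outbreak) and outbreak[d]:
                d += 1
            periods.append((start, d))  # half-open interval [start, d)
        else:
            d += 1

    detected = sum(1 for s, e in periods if any(flags[s:e]))
    sensitivity = detected / len(periods) if periods else 0.0

    quiet_days = [d for d in range(len(outbreak)) if not outbreak[d]]
    true_negatives = sum(1 for d in quiet_days if not flags[d])
    specificity = true_negatives / len(quiet_days) if quiet_days else 0.0
    return sensitivity, specificity
```

Note that sensitivity is counted per outbreak period, not per outbreak day, matching the definition above.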
In addition to specificity and sensitivity, a third parameter, time to detection (TTD), defined as the average number of days from the first day of an outbreak until the algorithm flags it, plays a vital role in the forthcoming analysis. This parameter is very important, as it aids in segregating a set of algorithms into groups (or classes) and provides a clear differentiation between algorithms based on its interpretation.
Figure 3 illustrates, in time, the progression of a sample outbreak over multiple days. Periods with no outbreaks are referred to as peace-time, while outbreak-mode refers to a time period containing outbreak days.
A sample outbreak
The three parameters discussed in this section provide a wealth of insight toward the goal of identifying a minimal subset of algorithms sufficient for generating an overall confidence value for an anomaly indicator.
The agreement analyzer quantifies the degree of agreement, or relationship, between any two algorithms executed on the same dataset. That is, are all candidate algorithms producing unique results? Or do some algorithms yield similar results and thus provide no added value to the overall decision? This step of the framework exploits such relationships between any two algorithms using two quite different approaches: correlation and the kappa coefficient.
Correlation is one of the most common and most useful statistics. A correlation, r, is a single number that describes the degree of linear relationship between two variables (also referred to as bivariate relationship). A positive relationship, in general terms, means that higher scores on one variable tend to be paired with higher scores on the other and that lower scores on one variable tend to be paired with lower scores on the other.
The correlation between two variables, in this case the two algorithms' values or decisions, can be obtained using [^{3}]:
r = \frac{N\sum xy - (\sum x)(\sum y)}{\sqrt{\left[N\sum x^2 - (\sum x)^2\right]\left[N\sum y^2 - (\sum y)^2\right]}}
\rho_{\text{correlation}} = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nn} \end{pmatrix}
A minimum agreement threshold based on correlation, T_A^{correlation}, needs to be defined; it is used in the next step of the framework to identify the nearest neighbors of each algorithm based on the strength of the relationships.
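As a sketch, the pairwise agreement matrix over the algorithms' daily decision series can be computed directly from the formula above. The function names are illustrative; the paper does not prescribe an implementation:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric series,
    computed from the sum-based formula in the text."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return num / den if den else 0.0

def correlation_matrix(decisions):
    """decisions[i] is algorithm i's daily 0/1 decision series;
    returns the n-by-n matrix of pairwise correlations."""
    n = len(decisions)
    return [[pearson_r(decisions[i], decisions[j]) for j in range(n)]
            for i in range(n)]
```

A constant (all-zero or all-one) decision series has zero variance; the sketch returns 0.0 for such pairs rather than dividing by zero.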
An alternative to the correlation matrix is the kappa coefficient, an index that compares the observed agreement against that which might be expected by chance. Kappa can be thought of as the chance-corrected proportional agreement; possible values range from +1 (perfect agreement) through 0 (no agreement above that expected by chance) to −1 (complete disagreement).
Cohen's kappa coefficient approach [^{4}] can be used to generate the kappa coefficient matrix. Consider a 2×2 table capturing the decision outcomes of the two algorithms being compared, as shown in Figure 4.
Kappa coefficient: 2 by 2 table
The following formula was used to compute the kappa coefficient between any two algorithms:
\kappa = \frac{P_o - P_c}{1 - P_c}, \qquad P_o = \frac{NN + YY}{T}, \qquad P_c = \frac{NN + NY}{T} \cdot \frac{NN + YN}{T} + \frac{NY + YY}{T} \cdot \frac{YN + YY}{T}
ρ_kappa, the agreement matrix based on kappa coefficients, is obtained using the above formulas as follows:
\rho_{\text{kappa}} = \begin{pmatrix} \kappa_{11} & \kappa_{12} & \cdots & \kappa_{1n} \\ \kappa_{21} & \kappa_{22} & \cdots & \kappa_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \kappa_{n1} & \kappa_{n2} & \cdots & \kappa_{nn} \end{pmatrix}
Once the kappa matrix has been computed, it is necessary to consider the significance of the obtained agreement values between any pair of algorithms. Landis and Koch [^{5}] give the following benchmarks for interpreting the κ value. Although inexact, these provide a useful benchmark for the significance of the above matrix.
κ < 0: poor agreement
0.00–0.20: slight agreement
0.21–0.40: fair agreement
0.41–0.60: moderate agreement
0.61–0.80: substantial agreement
0.81–1.00: almost perfect agreement
Based on the results and the benchmarks above, the minimum agreement threshold based on kappa, T_A^{kappa}, can be deduced; it is set to 0.5. This is the value used in the next step of the framework to identify the nearest neighbors of each algorithm based on the strength of the relationships.
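A minimal sketch of the kappa computation for two boolean decision series, following the 2×2-table formulas above. The cell naming (NN/NY/YN/YY, with the first letter being algorithm a's decision) matches the text; everything else is an illustrative choice:

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two boolean decision series of equal length."""
    T = len(a)
    nn = sum(1 for x, y in zip(a, b) if not x and not y)
    ny = sum(1 for x, y in zip(a, b) if not x and y)
    yn = sum(1 for x, y in zip(a, b) if x and not y)
    yy = sum(1 for x, y in zip(a, b) if x and y)
    po = (nn + yy) / T                        # observed agreement
    pc = ((nn + ny) / T) * ((nn + yn) / T) \
       + ((ny + yy) / T) * ((yn + yy) / T)    # agreement expected by chance
    return (po - pc) / (1 - pc) if pc != 1 else 1.0
```

Identical series give κ = 1, perfectly opposed series with balanced marginals give κ = −1, matching the interpretation range described above.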
Once the sensitivity, specificity, and time-to-detect parameters are well established for each algorithm and the agreement level between every possible algorithm pair is known, a minimal set of algorithms can be identified that is sufficient to produce a quantifiable confidence value for the overall decision. Figure 5 illustrates the five-step process developed to identify this minimal set based on the results of the previous two steps of the proposed framework.
Minimal Set Identification Process
This step of the framework yields a minimal subset of candidate algorithms that have minimal relation to each other and thus form a nearly independent minimal set, sufficient to deduce a confidence measure for an outbreak decision on a given day.
The final step of the proposed framework pulls together the findings of the first three steps and works out a scheme that produces a value corresponding to overall confidence. There are three main parameters to investigate.
The first parameter is the rate of change (referred to as rise rate) of actual daily count values over a specific time period, which provides some basic knowledge of the positive or negative trend over the last few days and also yields the speed with which the change is occurring.
Rise rate analysis
Figure 6 illustrates a typical snapshot of daily counts data, where the y-axis represents the daily raw count and the x-axis represents the day, with D(Δ) representing the current day. The rate of change (λ) is computed using basic linear regression [^{6}] to define the line that best fits the daily count values:
\lambda = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}
To be effective, the computation of rate is limited to a specific time frame referred to as an epidemiologically significant window, Δ, which is defined in number of days.
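The rise rate over the window Δ can be sketched as a plain least-squares slope, per the formula above. The windowing convention (the last Δ days, with x = 1…Δ) is an assumption for illustration:

```python
def rise_rate(counts, delta):
    """Slope of the least-squares line through the last `delta` daily counts."""
    window = counts[-delta:]
    n = len(window)
    xs = list(range(1, n + 1))            # day index within the window
    sx, sy = sum(xs), sum(window)
    sxy = sum(x * y for x, y in zip(xs, window))
    sxx = sum(x * x for x in xs)
    den = n * sxx - sx * sx
    return (n * sxy - sx * sy) / den if den else 0.0
```

A strictly increasing window gives a positive slope, a falling window a negative one, which is exactly the directional signal the framework reads from λ.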
The next parameter of interest analyzes the importance of the current day's count with respect to Δ. That is, does today's count follow the trend identified by the linear regression, or is it drastically different and thus deserving of special attention? As shown in Figure 7, there could be a scenario where the past (Δ − 1) values yield a negative slope, while the current day's value (h) is very high but cannot, on its own, pull the linear regression to the positive slope that would be more accurate in this case.
Count delta
For such cases, the framework takes into account a second parameter of interest called count delta (ω). This value is simply the ratio between the current day's value, h, and the average value over Δ:
\omega = \frac{h}{\frac{1}{\Delta} \sum_{i=I-\Delta+1}^{I} X_i}
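Count delta is then a one-liner; note that, per the summation above, the window average includes the current day (the function name is an illustrative choice):

```python
def count_delta(counts, delta):
    """Ratio of the current day's count to the mean count over the
    window of the last `delta` days (current day included)."""
    window = counts[-delta:]
    mean = sum(window) / len(window)
    return counts[-1] / mean if mean else 0.0
```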
Based on the output of step three of the framework, the individual outbreak decision flags need to be considered. These provide the third parameter of interest, ϕ_{i}, where i ranges over the algorithms in the minimal set. Each ϕ_{i} takes one of two values: true, meaning an outbreak has been detected by algorithm i, and false, meaning a no-outbreak decision by algorithm i.
The overall objective of the framework is to produce a set of algorithms, as minimal as possible, to evaluate an aberration decision for any given day with some confidence value. Given the availability of multiple algorithms, a system that facilitates incremental confidence building from the contributions of the various algorithms needs to be developed. A bimodal approach to confidence evaluation is proposed to address this, as shown in Figure 8.
This bimodal approach is based on the concept of contributions to the positive and negative confidence of a decision. The fundamental premise of the proposed scheme is a rule set, defined as the set of rules that collectively contribute to either positive or negative confidence. Positive confidence is a measure of the collective strength of the rules supporting identification of the start of an outbreak; negative confidence is a measure of the collective strength of the rules arguing against it. Rule sets are made of weighted combinations of the identified parameters of interest; the details of rule sets are discussed shortly. Once the rule set has been identified, appropriate weights (or points) are assigned to the members of the rule set contributing to either side. The rules that contribute to positive confidence through the collective summation of their respective points (p) are referred to as the R set; the rules that contribute to negative confidence through the collective summation of their respective points (n) are referred to as the L set. Each side adds up its collective contribution, and the overall confidence is then (p − n), with 0 as the no-decision point.
Point assignment scheme
The following rules contribute to incremental positive confidence (R side rules):
φ_i = true, ∀ i ∈ K
λ_d > T_u · λ_{d−1}
ω_d > T_u · ω_{d−1}
The following rules contribute to incremental negative confidence (L side rules):
φ_i = false, ∀ i ∈ K
λ_d < T_d · λ_{d−1}
ω_d < T_d · ω_{d−1}
The use of λ and ω requires the introduction of threshold values that define the decision points in both the upward and downward directions. The scheme therefore uses a parameter T_u for the positive (upside) threshold and T_d for the negative (downside) threshold. Both values could be computed using sophisticated approaches such as neural networks; however, a simple, intuitive approach using hysteresis (Figure 9) was adopted. That is, λ and ω contribute to positive confidence only if the current day's values are at least T_u times larger than the previous day's values, and they contribute to negative confidence only if the current day's values are less than T_d times the previous day's values. This approach helps identify abrupt rises and falls in the count values with respect to the immediate history. The proposed rule of thumb is T_u ≈ 3 · T_d.
Threshold hysteresis
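The bimodal evaluation can be sketched as follows. The rule layout (one φ rule per minimal-set algorithm, followed by the λ and ω rules) follows the R-side and L-side rule lists above; the function signature, the point-list layout, and the default thresholds (taken from the values used later in the study) are assumptions:

```python
def confidence(flags, lam, lam_prev, omega, omega_prev,
               r_points, l_points, t_u=1.15, t_d=0.5):
    """Overall confidence = (points of fired R rules) - (points of fired L rules).

    flags    -- dict of algorithm name -> True/False outbreak decision
    r_points -- R-side points: one per algorithm (sorted by name), then λ, then ω
    l_points -- same layout for the L side
    """
    names = sorted(flags)
    # R rules fire on outbreak flags and on hysteresis-exceeding rises.
    r_fired = [flags[n] for n in names] + [lam > t_u * lam_prev,
                                           omega > t_u * omega_prev]
    # L rules fire on no-outbreak flags and on hysteresis-exceeding falls.
    l_fired = [not flags[n] for n in names] + [lam < t_d * lam_prev,
                                               omega < t_d * omega_prev]
    p = sum(pt for pt, hit in zip(r_points, r_fired) if hit)
    n = sum(pt for pt, hit in zip(l_points, l_fired) if hit)
    return p - n
```

A positive return value supports the start-of-outbreak hypothesis, a negative one argues against it, and 0 is the no-decision point, as described above.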
To summarize, there are a total of Z = 2(K + 2) rules that define a specific rule set ζ_i for a given point assignment i. To simplify the representation of the rules and the associated point assignments for the L and R sides, a concise convention was designed as follows:
ζ_i = ⟨1 L^{p_1^L} R^{p_1^R}, 2 L^{p_2^L} R^{p_2^R}, 3 L^{p_3^L} R^{p_3^R}, …, V L^{p_V^L} R^{p_V^R}⟩
where V = Z/2 and each entry gives the L-side and R-side points assigned to rule v.
With Z/2 possible rules on each side, the most obvious choice is a balanced system, with the maximum number of points for negative confidence equal to the maximum number of points for positive confidence, each a multiple of Z/2. That is, if both sides matched in their outcomes, the overall confidence value would equate to 0, the indecision line. To explore a wider base of different point values and their effects on the overall decision, a system that exercises the point assignment with an unbiased (random) allocation of points is necessary. Before such a system can be developed, however, the maximum number of points for each side (M) needs to be established. This can be achieved as follows:
\sum_{i=1}^{Z/2} p_i = M \quad \text{(summed over the rules of one side)}
Maximum number of points
In Figure 10, the x-axis represents M and the y-axis represents the total number of point assignment possibilities per side for Z = 12 (that is, Z/2 = 6 rules per side). In this specific case, M = 12 seems reasonable, as it is located at the knee of the rising curve and provides 6188 assignment possibilities, a number quite manageable for simulation purposes.
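The 6188 figure can be reproduced by a stars-and-bars count: the number of ways to distribute M points, as non-negative integers, over the rules of one side (the function name is illustrative):

```python
from math import comb

def assignments_per_side(num_rules, max_points):
    """Number of ways to give non-negative integer points to `num_rules`
    rules so that they sum to exactly `max_points` (stars and bars):
    C(max_points + num_rules - 1, num_rules - 1)."""
    return comb(max_points + num_rules - 1, num_rules - 1)
```

With 6 rules per side and M = 12 this gives C(17, 5) = 6188, matching the value quoted above.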
Now that the rules and the point assignment method have been designed, a system is needed that interprets the outcomes of applying the identified rules and associated points and yields an optimal point assignment producing the desired outcome. The proposed approach is to group the sensitivity and specificity values obtained from numerous random point assignments into clusters of interest, as shown in Figure 11. The idea is to identify specific areas of interest (AOI) on this scatter plot that produce outcomes superior to any single algorithm. Three AOIs are identified: high specificity (top left); high sensitivity (bottom right); and maximum combined sensitivity/specificity (the knee).
Clusters
Any of the commonly used clustering techniques may be used to identify the AOIs. The proposed approach utilizes the k-means clustering technique [^{7}], as it allows specification of the initial centroids of the desired clusters. This is attractive since, as discussed above, one typically wants to examine very specific clusters, for instance those providing both high specificity and high sensitivity, that is, AOI(3).
The objective of the k-means approach is to minimize the total intra-cluster variance, or squared error function:
V = \sum_{i=1}^{k} \sum_{x_j \in S_i} \left\| x_j - \mu_i \right\|^2
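A minimal pure-Python k-means sketch with user-supplied seed centroids, which is the property the text relies on to target a specific AOI. This is illustrative only; a production system would use an existing implementation such as R's kmeans or scikit-learn's KMeans:

```python
def kmeans(points, centroids, iterations=50):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points.  `centroids` seeds
    the clusters, so a desired area of interest can be targeted directly."""
    centroids = [list(c) for c in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)
        for i, members in enumerate(clusters):
            if members:  # empty clusters keep their previous centroid
                centroids[i] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids, clusters
```

Here each point would be a (specificity, sensitivity) pair from one random point assignment, and the seeds would be placed inside the three AOIs.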
Application of the clustering methodology yields a multitude of rule sets ζ_i, each of which produces a sensitivity/specificity pair ν_{ζ_i}, yielding:
\psi_k = \{ \nu_{\zeta_i} \}, \quad \forall i \in k
The proposed CAIF framework utilizes a number of variables as follows:
Based on this list, the following sets, referred to as the CAIF Variables, Parameters, and Outputs, are populated through the various steps of the framework:
CAIF Variables = {N, ρ, T_A, K, Z, T_u, T_d, Δ}
CAIF Parameters = {λ, ω, φ_j}
CAIF Outputs = {ζ_i, ν_{ζ_i}, ψ_k}
A simulation environment was set up that comprised a custom simulator for some aspects of the proposed approach, as well as an open-source package (R [^{8}]) to compute the various statistical and epidemiological parameters used in the proposed approach. The data for simulation were obtained from the CDC [^{2}].
Nine candidate algorithms were selected based on a literature review of the most commonly used aberration detection algorithms: 3-day (MA3), 5-day (MA5), and 7-day (MA7) moving averages; weighted moving average (WMA); exponentially weighted moving average (EWMA); cumulative sum (CUSUM); and the Early Aberration Reporting System C1–C3 [^{9}]. The epidemiological parameters (sensitivity, specificity, and time to detect) were computed using the simulation environment. A minimal set, identified using Step 3 of the proposed framework, was [WMA, CUSUM, C1, C3].
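As an illustration of what one such candidate algorithm looks like, here is a simple moving-average flagger. The ratio threshold is an invented illustrative choice, not the parameterization used in the study (EARS C1–C3, for instance, flag on standardized deviations from a short baseline rather than a ratio):

```python
def ma_flag(counts, window=3, threshold=2.0):
    """Flag day d when its count exceeds `threshold` times the moving
    average of the preceding `window` days.  Days without a full
    baseline are never flagged."""
    flags = [False] * len(counts)
    for d in range(window, len(counts)):
        baseline = sum(counts[d - window:d]) / window
        flags[d] = baseline > 0 and counts[d] > threshold * baseline
    return flags
```

The boolean series returned by each such algorithm is exactly the input consumed by the agreement analyzer and the φ_i decision flags described earlier.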
The CAIF variable list was found to be:
{N = 9, ρ = ρ_kappa, T_A = 0.5, K = 4, Z = 12, T_u = 1.15, T_d = 0.5, M = 12, Δ = 7}
The CAIF simulator was set up to perform numerous iterations to produce a large variety of point assignments, using a randomized strategy in which only unique combinations of points for each set were allowed. This produced a scatter plot of specificity against sensitivity, over which k-means clustering was applied to identify points that lie within the desired AOIs (Figure 12).
Identified areas of interest
From Table 1, the three clusters of interest representing the AOIs were 2, 5, and 10, with centroids (98.35, 53.42), (66.50, 94.63), and (86.89, 94.41), respectively. For AOI(1), none of the point assignments provided a better result than simply running the WMA algorithm, which yielded specificity and sensitivity of (99.17, 52.12). Thus, the proposed framework provides no benefit when the highest possible specificity is desired. On the other hand, for AOI(2), the identified centroid of (66.50, 94.63) defined a cluster of about 125 point assignments, some of which provided better results than any single algorithm.
Table 1. Cluster centres

Cluster   Specificity (%)   Sensitivity (%)
1         92.94             88.15
2         98.35             53.42
3         84.93             92.50
4         90.15             87.38
5         66.50             94.63
6         88.28             90.78
7         94.52             54.74
8         89.10             54.39
9         81.46             95.92
10        86.89             94.41
For AOI(3), the identified centroid of (86.89, 94.41) is quite close to the result produced by the EARS C3 algorithm. However, this cluster has over 200 point assignments, some of which yield higher sensitivity and specificity values than EARS C3, which itself provides the best pair among all algorithms in the candidate set. For example, the following rule set yields (86.39, 95.50):
⟨1 L^1 R^0, 2 L^6 R^0, 3 L^1 R^2, 4 L^0 R^3, 5 L^3 R^5, 6 L^1 R^2⟩
R-side rules:
1R: NA → 0 points
2R: NA → 0 points
3R: φ_WMA = true → 2 points
4R: φ_C1 = true → 3 points
5R: λ_d > T_u · λ_{d−1} → 5 points
6R: ω_d > T_u · ω_{d−1} → 2 points
L-side rules:
1L: φ_CUSUM = false → 1 point
2L: φ_C3 = false → 6 points
3L: φ_WMA = false → 1 point
4L: NA → 0 points
5L: λ_d < T_d · λ_{d−1} → 3 points
6L: ω_d < T_d · ω_{d−1} → 1 point
Next, one of the rule sets from the AOI(3) cluster was applied to a sample outbreak within the simulated data sets to confirm its effectiveness. Figure 13 illustrates a snapshot that superimposes the daily counts during outbreak mode with the confidence measure computed using the above rule set.
Simulated outbreak analysis
As shown, the framework suggests an outbreak day with a confidence measure of +1 (1/12, or 8.33% positive confidence) on day 6, a day before the outbreak starts (point A). Although a false positive, it is a weak one that aids in planning for the following day, which carries a strong positive confidence measure of +7, translating to 7/12, or 58.3% positive confidence (point B). This is exactly what the framework was designed to do: identify the start of an outbreak, with some level of confidence, at an early stage. Note also that as the outbreak progresses, the confidence drops to negative values. This is because the framework is intended to monitor the initial start of an outbreak; as the values stabilize during an outbreak, the confidence measure for the start of an outbreak diminishes, as expected.
Detailed step-by-step simulation results for the proposed framework are provided in [^{10}].
The rule set for AOI(3) from the previous section was applied to a subset of real emergency room visit data from the Canadian Early Warning System (CEWS).
As shown in Figure 14, one of the key observations is that the indication that an outbreak would occur in the next few days was identified by a higher confidence value on Day 8, which was most likely the first day of an outbreak curve peaking on Day 11. Further, the confidence measure was computed from the minimal set identified by the proposed framework, not the entire set of nine algorithms. That is, the minimal set identified by the proposed framework was sufficient to detect the start of an event a few days earlier than it was actually detected.
The following is an analysis of some of the days with interesting observations.
Using the proposed framework, the initial start of activity would have been identified with significant confidence on Day 8, rather than the delayed identification that most likely occurred on Day 11.
Application of CAIF to a Real Scenario
The following list highlights some limitations of the proposed framework and thus potential areas for future research:
A novel aberration interpretation framework has been proposed for producing a confidence-based system decision, focusing on high confidence values at the start of an outbreak. The framework comprises multiple steps that allow identification of a subset of algorithms, together with a dynamic point assignment scheme for computing a balanced decision.
The proposed framework provides a multitude of benefits:
The proposed framework is also adaptable and extensible; it captures the essential elements of a confidence-based decision process.
Note 1: A single outbreak usually lasts more than one day.
Many thanks to Dr. Robert McLeod of the Department of Electrical and Computer Engineering at the University of Manitoba for his support and guidance throughout the course of this research.
1. Jackson M, Baer A, Painter I, Duchin J. A simulation study comparing aberration detection algorithms for syndromic surveillance. BMC Medical Informatics and Decision Making. 2007;7.
2. Centers for Disease Control and Prevention (CDC). Data Sets. Available at: http://www.bt.cdc.gov/surveillance/ears/datasets.asp
3. Trochim W. Correlation. Available at: http://www.socialresearchmethods.net/kb/statcorr.htm
4. Cohen J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement. 1960;20:37–46.
5. Landis J, Koch G. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–174.
6. Waner S, Costenoble S. Linear Regression. Available at: http://people.hofstra.edu/faculty/stefan_waner/realworld/tutorialsf0/frames1_5.html
7. Wikipedia. K-means Algorithm. Available at: http://en.wikipedia.org/wiki/Kmeans_algorithm
8. The R Package for Statistical Computing. Available at: http://www.rproject.org/
9. Hutwagner L, Thompson W, Seeman G, Treadwell T. The Bioterrorism Preparedness and Response Early Aberration Reporting System (EARS). Journal of Urban Health: Bulletin of the New York Academy of Medicine. 2003;80(2, Supplement 1):i89–i96.
10. Mukhi S. An Integrated Approach to Real-Time Biosurveillance in a Federated Data Source Environment. PhD Thesis, University of Manitoba; June 2007.
Keywords: Health, surveillance, outbreak, bioterrorism, anomaly, syndromic, confidence, infectious disease. 