Using Principal Component Analysis to Identify Priority Neighbourhoods for Health Services Delivery by Ranking Socioeconomic Status

Christine Elizabeth Friesen, Patrick Seliske, Andrew Papadopoulos


Objectives. Changes to the Canadian Census in 2010 led to the creation of the National Household Survey (NHS). The voluntary nature of the NHS has important implications for health research in Canada, because the validity of the NHS data used to create socioeconomic status (SES) indices, especially the income variables, is questionable. This study sought to determine the appropriateness of replacing census income information with taxfiler data to produce neighbourhood SES indices.

Methods. Census and taxfiler data for Guelph, Ontario were retrieved for the years 2005, 2006, and 2011. Data were extracted for eleven income and non-income SES variables. Principal component analysis was used to identify the significant principal components in each dataset and the weight of each contributing variable. Variable-specific factor scores were applied to standardized census and taxfiler data values to produce SES scores.
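
The scoring step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `ses_scores`, the use of the correlation matrix, the choice to keep only the first component, and the randomly generated placeholder data are all assumptions made for illustration. The abstract does not state how significant components were selected or how multiple components were combined, so those steps are left out.

```python
import numpy as np


def ses_scores(data, n_components=1):
    """Hypothetical helper: weight standardized variables by PCA loadings.

    data: 2-D array, one row per neighbourhood, one column per SES variable.
    Returns (scores, weights, explained), where explained is the share of
    total variance carried by each principal component.
    """
    X = np.asarray(data, dtype=float)

    # Standardize each variable to zero mean and unit sample variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # Eigen-decomposition of the correlation matrix (assumed here;
    # the abstract does not specify the exact PCA variant).
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)

    # eigh returns eigenvalues in ascending order; sort descending.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Loadings of the retained components act as variable weights.
    # Note: the sign of each eigenvector is arbitrary.
    weights = eigvecs[:, :n_components]

    # Score = weighted sum of standardized variable values.
    scores = Z @ weights
    explained = eigvals / eigvals.sum()
    return scores, weights, explained


# Synthetic placeholder data: 20 neighbourhoods x 11 variables.
# These values are random and carry no real-world meaning.
rng = np.random.default_rng(0)
demo = rng.normal(size=(20, 11))
scores, weights, explained = ses_scores(demo)
```

Because the sign of a principal component is arbitrary, any ranking built on such scores would need the component oriented so that higher values correspond to higher (or lower) SES before neighbourhoods are compared.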

Results. Substituting taxfiler income variables for census income variables yielded SES score distributions and neighbourhood SES classifications similar to those obtained using census variables alone. Combining taxfiler income variables with census non-income variables also produced clearer distinctions between SES levels.

Conclusion. Identifying socioeconomic disparities between neighbourhoods is an important step in assessing the level of disadvantage of communities, and the presented method can be adapted to other locales for this purpose. The ability to replace census income information with taxfiler data when developing SES indices will increase the versatility of public health research and planning in Canada and contribute to the improvement of SES measurement and calculation methods.

Online Journal of Public Health Informatics * ISSN 1947-2579 *