Bias Associated with Mining Electronic Health Records

George Hripcsak, Charles Knirsch, Li Zhou, Adam Wilcox, Genevieve Melton


Large-scale electronic health record research introduces biases compared to traditional manually curated retrospective research. We used data from a community-acquired pneumonia study for which we had a gold standard to illustrate such biases. The challenges include data inaccuracy, incompleteness, and complexity, and they can produce distorted results. We found that a naïve approach approximated the gold standard, but errors on a minority of cases shifted mortality substantially. Manual review revealed errors in both selecting and characterizing the cohort, and narrowing the cohort improved the result. Nevertheless, a significantly narrowed cohort might contain its own biases that would be difficult to estimate.
