Community Based Research Network: Opportunities for Coordination of Care, Public Health Surveillance, and Farmworker Research



Introduction: The lack of aggregated longitudinal health data on farmworkers has severely limited opportunities to conduct research to improve their health status. To correct this problem, we have created the infrastructure necessary to develop and maintain a national Research Data Repository of migrant and seasonal farmworker patients and other community members receiving medical care from Community and Migrant Health Centers (C/MHCs). Project-specific research databases can be readily extracted from this repository.

Methods: The Community Based Research Network (CBRN) has securely imported and merged electronic health record (EHR) data from five geographically dispersed C/MHCs. To demonstrate the effectiveness of our data aggregation methodologies, we also conducted a small pilot study of HbA1c management using clinical, laboratory, and demographic data drawn from the CBRN Data Repository for the first two C/MHCs.

Results: Overall, 67,878 patients (2,858 farmworkers) were seen by the two C/MHCs from January to August 2013. A total of 94,189 encounters were captured, and all could be linked to a unique patient. HbA1c values decreased as the number of tests, or intensity of testing, increased.

Conclusion: This project will form the foundation for an expanding collection of C/MHC data for use by clinicians for medical care coordination, by clinics to assess quality of care, by public health agencies for surveillance, and by researchers under Institutional Review Board (IRB) oversight to advance understanding of the needs and capacity of the migrant and seasonal farmworker population and the health centers that serve them. Approved researchers can request data that constitute a Limited Data Set from the CBRN Data Repository to establish a specific research database for their project.


Little is known about farmworker health on a national basis. In extensive reviews of the health of farmworkers in the U.S., Villarejo noted a particular lack of the nationwide clinical health data needed for health care coordination, surveillance, and health outcomes and hypothesis-driven research. U.S. farmworkers experience a disproportionate frequency of injuries and illnesses associated with their work, as well as significant barriers to healthcare access [1]. The lack of accessible medical care data and aggregated longitudinal health data on farmworkers has severely limited the provision of optimal health care for this vulnerable and often mobile population. The consequences include limited continuity of care, including needed follow-up care, and limited systematic inclusion of farmworkers in reportable disease surveillance systems [2] and in health services and epidemiologic research. To address these issues, we created a community-academic partnership, the Community Based Research Network (CBRN), with initial funding from the National Institute of Environmental Health Sciences. CBRN has built the partnerships and infrastructure needed to securely import and merge electronic health record (EHR) data from five Community and Migrant Health Centers (C/MHCs) across the U.S. into a Health Information Exchange platform, using Business Associate Agreements with each health center.



CBRN consists of two community partners (the National Center for Farmworker Health, NCFH, and Salud Family Health Centers, Ft. Lupton, CO) and three research/academic partners (University of Texas, Texas A&M, and Battelle), with a steering committee consisting of one representative from each partner. Each research/academic partner secured IRB approval from its own institution, including approval of a HIPAA Waiver of Authorization based upon CBRN's status as a research project [3]. NCFH, with support from the steering committee, identified five C/MHCs, one each in Colorado, New York, Washington, California, and Michigan, that met inclusion criteria based upon their geographic distribution, patient population (including farmworkers), current use of and facility with an EHR system, and willingness to share patient medical records, including personal identifiers. Centex Support Systems Services (Centex), a Health Information Exchange (HIE) capable of providing HIPAA-secure health information exchange services, was brought on board by contract and established protocols for collecting data and providing security assurances through Business Associate Agreements (BAAs) with each of the five C/MHCs.

To ensure that the participating health centers are represented in decision making, a representative from each Center was selected to participate on the CBRN National Advisory Committee, which then elected one additional member to serve on the CBRN Steering Committee. The CBRN Steering Committee, the main governing body, developed a process for the review and approval of requests by external researchers to use CBRN data. Data sharing requires unanimous approval by the Steering Committee, oversight by an IRB, and a data use agreement assuring protection of the data and confidentiality. All shared data must meet the standards of a Limited Data Set as defined by HIPAA.

Data Collection

Centex developed the technology necessary to extract EHR data in coordination with each individual Center, extracting data on a quarterly basis to update the prospective longitudinal medical records database called the CBRN Data Repository. To ensure the integrity of the data, Centex conducts an initial completeness check, comparing the numbers of patients and visits in the data maintained by the Center with those downloaded by Centex. Additional quality control measures include:

  • Patient matching and merging: Centex maintains a Master Patient Index (MPI) that contains medical record numbers, other patient ID numbers, SSNs, names, dates of birth, addresses, and other identifiers supplied by the Centers providing the data. These data are then processed by an Entity Identification Service (Mirth Match), which uses a data matching algorithm from the Oklahoma Department of Mental Health that involves blocking, matching weights, and a likelihood threshold to minimize data duplication. Each patient is then assigned a unique identifier within the MPI, allowing associated HIE components to find, exchange, and reference patient data.

  • Data Validation and Semantic Interoperability: Centex uses its proprietary data validation engine (Trails) to validate incoming clinical data. This program has three basic validation procedures:

    • The program checks validity (not accuracy) of clinical codes and formats of dates and source IDs.

    • Data failing this first step is automatically stored for manual inspection and intervention.

    • Data accepted by the above steps are transformed for semantic interoperability: source data (such as sex, race, language, and marital status) are mapped to a standardized code and format used by the warehouse.
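The patient matching step in the first bullet above combines three ideas: blocking, matching weights, and a likelihood threshold. The following is a minimal sketch of how those ideas fit together in general. It is not the Mirth Match or Oklahoma Department of Mental Health algorithm, whose details are not given here; the field names, the blocking key, the weights, the threshold, and the patient records are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical evidence added when a field agrees between two records.
AGREE_WEIGHTS = {"ssn": 8.0, "last_name": 3.0, "first_name": 2.0, "dob": 4.0, "zip": 1.0}
# Hypothetical penalty when both records have a value and the values differ.
DISAGREE_WEIGHTS = {"ssn": -4.0, "last_name": -2.0, "first_name": -1.0, "dob": -3.0, "zip": -0.5}
MATCH_THRESHOLD = 9.0  # hypothetical cut-off for treating two records as one person

# Made-up records for illustration only.
DEMO_RECORDS = [
    {"ssn": "123-45-6789", "first_name": "Maria", "last_name": "Lopez", "dob": "1980-02-14", "zip": "80621"},
    {"ssn": "", "first_name": "maria", "last_name": "LOPEZ", "dob": "1980-02-14", "zip": "80621"},
    {"ssn": "987-65-4321", "first_name": "Luis", "last_name": "Lopez", "dob": "1980-07-01", "zip": "80621"},
]


def block_key(rec):
    """Blocking: only records sharing birth year and last-name initial are compared."""
    return (rec["dob"][:4], rec["last_name"][:1].upper())


def match_score(a, b):
    """Sum agreement and disagreement weights over fields present in both records."""
    score = 0.0
    for field in AGREE_WEIGHTS:
        va, vb = a.get(field), b.get(field)
        if not va or not vb:
            continue  # a missing value contributes no evidence either way
        if va.strip().lower() == vb.strip().lower():
            score += AGREE_WEIGHTS[field]
        else:
            score += DISAGREE_WEIGHTS[field]
    return score


def assign_master_ids(records):
    """Return {record index: master id}, merging pairs that reach the threshold."""
    parent = list(range(len(records)))  # union-find forest over record indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_key(rec)].append(idx)

    for members in blocks.values():
        for i, j in combinations(members, 2):
            if match_score(records[i], records[j]) >= MATCH_THRESHOLD:
                parent[find(j)] = find(i)

    return {idx: f"MPI-{find(idx):06d}" for idx in range(len(records))}
```

With the demo records, the first two rows score 10.0 (name, date of birth, and ZIP agree; the missing SSN is ignored) and share one master identifier, while the third row scores below the threshold against both and keeps its own. Blocking keeps the number of pairwise comparisons manageable, at the cost of missing true matches whose blocking fields disagree.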

Data Sharing

We used a pilot study to test and evaluate the data sharing process and the usefulness of the data contained in the Research Data Repository. The Steering Committee created and approved a data query request to allow Centex to release data to two members of the Steering Committee for analyses. The query included all records dated between 1/1/13 and 8/9/13 from the two health centers that were first recruited into the project.

Data were exported from the repository in 11 comma-delimited files, with linkages available through random identifiers. The files all met Limited Data Set requirements and were transferred using a secure file transfer protocol. The structure of the files is provided in Figure 1, where linking identifiers are designated in bold.

Figure 1

Relational Database Structure for CBRN Research Database


Each of the files received was evaluated descriptively. The numbers of observations in these files were: Patient (N=67,878), Provider (N=242), Encounter (N=94,189), Procedure (N=289,952), Vital Signs (N=1,112,559), Laboratory (N=909,555), Diagnoses (N=165,848), Medications (N=674,783), Immunizations (N=30,712), Allergies (N=15,488), and Next of Kin (N=4,508). The two investigators then reviewed the data files for consistency, missing values, and invalid values or coding errors to verify the quality and completeness of the data. None of the variables was deemed unusable due to a data quality issue.

Pilot Study

Management of HbA1c levels was selected as the subject of the pilot study because of the large burden of diabetes in the CBRN research population, and because it demonstrates how prospective linkages between provider visit and laboratory data can provide indicators to assess quality of care and evaluate health care systems targeting low-income, community-based patients. HbA1c values were examined over time by farmworker status, gender, and ethnicity. Effectiveness of HbA1c monitoring was evaluated by intensity (number of tests in the evaluation period) and changes in HbA1c levels. Finally, a linear regression was constructed to identify variables associated with increasing levels of HbA1c.

For the pilot study, CBRN first carried out data management procedures to link the Patient, Encounter, and Laboratory files into a merged dataset. Indicator variables were then created to classify patients as farmworker vs. non-farmworker, and to identify the sequential ordering of laboratory values (e.g., HbA1c) over time. Creating the farmworker indicator variable was straightforward, requiring only two variables from the Patient file, neither of which had missing values.

On the other hand, identifying which laboratory variables to use to identify HbA1c results was less clear, as the file used "test group" for ordered labs and "test item" for returned labs, but this field was partially completed at the time of ordering. Only a small proportion of the laboratory observations (n=6,000; 0.7%) contained missing values because they had not yet been reported and entered into the health record. The biggest challenge was that not all laboratory entries could be linked to a patient encounter. This turned out to be a structural problem in the way laboratory orders are entered into the EHRs. Centex has been able to establish a protocol that largely eliminates this problem for future use.

After all data management and editing procedures were completed, descriptive statistics were computed for person-level (e.g., demographics), encounter-level (type of clinic, duration of encounter) and laboratory-level variables (HbA1c values, sequential order).
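The data management steps described above (creating a farmworker indicator from the Patient file, selecting reported HbA1c results from the Laboratory file, and numbering each patient's results in date order) can be sketched as follows. This is an illustration only: the column names, the assumption that the two Patient-file variables are migrant and seasonal flags, and the sample rows are all hypothetical, since the actual field names are not listed here.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Tiny made-up extracts standing in for the Patient and Laboratory files.
# All column names are hypothetical.
PATIENT_CSV = """patient_key,migrant,seasonal
p1,1,0
p2,0,0
p3,0,1
"""
LAB_CSV = """patient_key,test_item,result,result_date
p1,HbA1c,8.1,2013-05-20
p1,HbA1c,8.9,2013-01-15
p2,HbA1c,6.4,2013-03-02
p2,Glucose,101,2013-03-02
p3,HbA1c,,2013-04-11
"""


def load(text):
    """Parse comma-delimited text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def farmworker_flags(patients):
    """Assumed rule: a patient is a farmworker if flagged migrant or seasonal."""
    return {p["patient_key"]: p["migrant"] == "1" or p["seasonal"] == "1" for p in patients}


def ordered_hba1c(labs):
    """Group reported HbA1c results by patient, sorted by result date."""
    by_patient = defaultdict(list)
    for row in labs:
        if row["test_item"] != "HbA1c" or not row["result"]:
            continue  # skip other tests and results not yet reported
        by_patient[row["patient_key"]].append(
            (date.fromisoformat(row["result_date"]), float(row["result"]))
        )
    return {key: sorted(values) for key, values in by_patient.items()}


def merged_rows(patients, labs):
    """One row per HbA1c result, with farmworker flag and test order (1, 2, ...)."""
    flags = farmworker_flags(patients)
    rows = []
    for key, tests in ordered_hba1c(labs).items():
        for order, (when, value) in enumerate(tests, start=1):
            rows.append({"patient_key": key, "farmworker": flags[key],
                         "test_order": order, "date": when, "hba1c": value})
    return rows


def first_to_last_change(tests):
    """Change from a patient's first to last HbA1c value (0.0 with one test)."""
    return round(tests[-1][1] - tests[0][1], 2)
```

In the sample data, patient p1 has two results that arrive out of date order; sorting puts the January value first, so the first-to-last change is negative. Patient p3's unreported result is dropped, mirroring the handling of missing laboratory values described above.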


Overall, 67,878 patients (2,858 farmworkers) were seen by the two C/MHCs from January to August 2013. Farmworkers (migrant, seasonal, or both) were more likely than other patients to be male, Hispanic, and Spanish speaking. A total of 94,189 encounters were captured, and all could be linked to a unique patient. A description of the pilot patient population is shown by farmworker status in Table 1.

Table 1

Pilot Test Findings: Patient Population Description 1/2013-8/2013 (2 centers)

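                                Migrant (M)    Seasonal* (S)  Total M&S      Other Patients
[variable label missing]
  [category label missing]      503 (40.1)     863 (53.8)     1366 (47.8)    37631 (57.9)
  [category label missing]      751 (59.9)     741 (46.2)     1492 (52.2)    27381 (42.1)
  [category label missing]      0 (0.0)        0 (0.0)        0 (0.0)        8 (0.0)
[variable label missing]
  < 5 years                     163 (13.0)     129 (8.0)      292 (10.2)     8612 (13.2)
  5-<18 years                   216 (17.2)     319 (19.9)     535 (18.7)     14643 (22.5)
  18+-64 years                  839 (66.9)     1067 (66.5)    1906 (66.7)    38337 (59.0)
  65+ years                     36 (2.9)       89 (5.6)       125 (4.4)      3428 (5.3)
[variable label missing; surviving category labels, row positions unknown: African Am/Black, American Indian, Native Hawaiian / Pacific]
  [category label missing]      252 (20.1)     667 (41.6)     919 (32.2)     41181 (63.3)
  [category label missing]      82 (6.5)       24 (1.5)       106 (3.7)      1875 (2.9)
  [category label missing]      3 (0.2)        0 (0.0)        3 (0.1)        725 (1.1)
  [category label missing]      1 (0.1)        0 (0.0)        1 (0.0)        257 (0.4)
  [category label missing]      0 (0.0)        0 (0.0)        0 (0.0)        21 (0.0)
  [category label missing]      916 (73.0)     913 (56.9)     1829 (64.0)    20961 (32.2)
[variable label missing; surviving category label, row position unknown: Not Hispanic/Latino]
  [category label missing]      957 (76.3)     1133 (70.6)    2090 (73.1)    33453 (51.5)
  [category label missing]      174 (13.9)     421 (26.2)     595 (20.8)     28706 (44.1)
  [category label missing]      123 (9.8)      50 (3.1)       173 (6.0)      2861 (4.4)
[variable label missing]
  [category label missing]      178 (14.2)     525 (32.7)     703 (24.6)     40949 (63.0)
  [category label missing]      911 (72.7)     1036 (64.6)    1947 (68.1)    21627 (33.3)
  [category label missing]      165 (13.2)     43 (2.7)       208 (7.3)      2444 (3.8)
Marital Status [surviving category label, row position unknown: Legally Separated]
  [category label missing]      445 (35.5)     740 (46.1)     1185 (41.5)    39216 (60.3)
  [category label missing]      267 (21.3)     598 (37.3)     865 (30.3)     16267 (25.0)
  [category label missing]      5 (0.4)        15 (0.9)       20 (0.7)       186 (0.3)
  [category label missing]      6 (0.5)        7 (0.4)        13 (0.5)       664 (1.0)
  [category label missing]      10 (0.8)       32 (2.0)       42 (1.5)       2441 (3.8)
  [category label missing]      7 (0.6)        30 (1.9)       37 (1.3)       1051 (1.6)
  [category label missing]      514 (41.0)     182 (11.3)     696 (24.4)     5195 (8.0)
[variable label missing]
  [category label missing]      12 (1.0)       0 (0.0)        12 (0.4)       880 (1.4)
  [category label missing]      1242 (99.0)    1604 (100.0)   2846 (99.6)    64140 (98.7)
[variable label missing]
  [category label missing]      3 (0.2)        12 (0.7)       15 (0.5)       511 (0.8)
  [category label missing]      1251 (99.8)    1592 (99.3)    2843 (99.5)    64509 (99.2)
[variable label missing]
  [category label missing]      2 (0.2)        27 (1.7)       29 (1.0)       475 (0.7)
  [category label missing]      1252 (99.8)    1577 (98.3)    2829 (99.0)    64545 (99.3)
Chronic Diagnosis
  [category label missing]      315 (25.1)     497 (31.0)     812 (28.4)     14274 (22.0)
  [category label missing]      939 (74.9)     1107 (69.0)    2046 (71.6)    50739 (78.0)
  [category label missing]      0 (0.0)        0 (0.0)        0 (0.0)        7 (0.0)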

*Includes 24 records for patients who identified themselves as both M&S.

In our Pilot Laboratory Data File, 8,563 HbA1c laboratory test results were distributed among 7,158 patients. Patients were tested up to five times in the period for which data were collected. The distribution of repeated tests was similar for farmworkers and non-farmworkers. Mean HbA1c values and their ranges are displayed by farmworker status, language, gender, and order of observation in Table 2. Mean values increased for the second and third tests, which was expected because only patients with higher values are likely to be re-tested multiple times.

Table 2

Average HbA1c values for the first, second, and third tests by demographic variables.

Variable                      1st Test                   2nd Test                   3rd Test
                              Mean (Range; N)            Mean (Range; N)            Mean (Range; N)
Farmworker status
  Farmworker                  6.9 (4.5-14.6; n=400)      7.9 (5.2-13.0; n=80)       8.2 (5.4-14.0; n=18)
  Non-farmworker              7.1 (4.2-17.8; n=6,758)    8.0 (4.8-17.4; n=1,160)    8.3 (5.0-16.0; n=131)
[variable label missing]
  [category label missing]    6.9 (4.3-16.2; n=4,249)    8.0 (4.8-17.3; n=724)      8.3 (5.0-14.3; n=92)
  [category label missing]    7.3 (4.2-17.8; n=2,907)    8.0 (4.8-17.4; n=516)      8.3 (4.7-16.0; n=57)
[variable label missing; surviving category label, row position unknown: Not Hispanic]
  [category label missing]    6.9 (4.2-15.8; n=2,839)    7.7 (4.8-17.4; n=496)      8.0 (4.7-16.0; n=61)
  [category label missing]    7.2 (4.5-17.8; n=4,203)    8.2 (4.8-17.3; n=732)      8.5 (5.0-14.0; n=87)

Note: Observations with missing/unknown demographic values are not shown.

The sample size was insufficient to examine the fourth and fifth tests.

When the change in HbA1c levels from an individual patient's first to last test was examined, the mean decrease in HbA1c grew as the number of tests, or intensity of testing, increased (see Table 3). This finding is consistent with the results of a recently published clinical trial of the effectiveness of patient-centered care in the control of type 2 diabetes [4]. A linear regression model was constructed that included farmworker status, gender, and ethnicity for the first observed HbA1c value only. A higher first HbA1c value was associated with not being a farmworker vs. being a farmworker (Coef.=0.34; t=3.06; p=0.002), being male vs. being female (Coef.=0.38; t=7.49; p<0.001), and being Hispanic vs. non-Hispanic (Coef.=0.40; t=7.67; p<0.001).
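For reference, the regression described above can be written out explicitly. The indicator coding shown here (1 = non-farmworker, 1 = male, 1 = Hispanic) is an assumption inferred from the direction of the reported contrasts, and the intercept is left symbolic because it is not reported.

```latex
\[
\mathrm{HbA1c}^{(1)}_{i} = \beta_0
  + \beta_1\,\mathrm{NonFarmworker}_i
  + \beta_2\,\mathrm{Male}_i
  + \beta_3\,\mathrm{Hispanic}_i
  + \varepsilon_i ,
\qquad
\hat{\beta}_1 = 0.34,\quad \hat{\beta}_2 = 0.38,\quad \hat{\beta}_3 = 0.40 .
\]
```

Under this additive coding, the predicted first HbA1c value for a male, Hispanic non-farmworker would exceed that of a female, non-Hispanic farmworker by 0.34 + 0.38 + 0.40 = 1.12 percentage points of HbA1c.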

Table 3

Average first-visit HbA1c values and change in HbA1c between first and last visit, by number of visits

                                      N      Minimum   Maximum   Mean      Std. Deviation
All Patients with HbA1c Labs
  First HbA1c Lab Value (%)           7158   4.2       17.8      7.093     2.1367
  Diff in HbA1c Lab Value (%)                -8.20     8.50      -0.0363   0.67318
  Time Diff in HbA1c Labs (days)             0         238       22.06     51.821
Patients with only 1 HbA1c Lab
  First HbA1c Lab Value (%)           5918   4.3       17.8      6.863     2.0542
  Diff in HbA1c Lab Value (%)                0.00      0.00      0.0000    0.00000
  Time Diff in HbA1c Labs (days)             0         0         0.00      0.000
Patients with only 2 HbA1c Labs
  First HbA1c Lab Value (%)           1091   4.8       17.6      8.138     2.1834
  Diff in HbA1c Lab Value (%)                -8.20     8.50      -0.1889   1.57079
  Time Diff in HbA1c Labs (days)             0         238       120.74    43.080
Patients with only 3 HbA1c Labs
  First HbA1c Lab Value (%)           136    4.9       13.7      8.510     2.0635
  Diff in HbA1c Lab Value (%)                -6.80     5.60      -0.2963   1.82892
  Time Diff in HbA1c Labs (days)             24        234       176.40    35.700
Patients with only 4 HbA1c Labs
  First HbA1c Lab Value (%)           10     4.2       14.5      8.970     3.0096
  Diff in HbA1c Lab Value (%)                -5.30     1.00      -0.8300   1.76387
  Time Diff in HbA1c Labs (days)             119       219       164.10    31.963
Patients with 5 HbA1c Labs
  First HbA1c Lab Value (%)           3      8.6       13.1      10.333    2.4214
  Diff in HbA1c Lab Value (%)                -5.10     0.00      -1.7667   2.88848
  Time Diff in HbA1c Labs (days)             142       189       173.33    27.135

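As a quick arithmetic check, the per-group rows of Table 3 can be recombined to recover the totals reported in the text (7,158 patients and 8,563 test results) and the pooled means in the "All Patients" rows. The sketch below uses only numbers from Table 3; the function and variable names are illustrative.

```python
# (number of HbA1c labs, patients, mean first value, mean first-to-last change),
# one tuple per group in Table 3.
TABLE3 = [
    (1, 5918, 6.863, 0.0000),
    (2, 1091, 8.138, -0.1889),
    (3, 136, 8.510, -0.2963),
    (4, 10, 8.970, -0.8300),
    (5, 3, 10.333, -1.7667),
]


def totals(rows):
    """Return (total patients, total test results) across groups."""
    patients = sum(n for _, n, _, _ in rows)
    tests = sum(k * n for k, n, _, _ in rows)
    return patients, tests


def pooled_mean(rows, column):
    """Patient-weighted mean of a group mean (column 2 = first value, 3 = change)."""
    patients, _ = totals(rows)
    return sum(row[1] * row[column] for row in rows) / patients


patients, tests = totals(TABLE3)
print(patients, tests)                    # 7158 8563
print(round(pooled_mean(TABLE3, 2), 3))   # 7.093
print(round(pooled_mean(TABLE3, 3), 4))   # -0.0363
```

The pooled values match the "All Patients" rows, which confirms that the small overall mean change (-0.0363) is dominated by the 5,918 patients with a single test, whose change is zero by construction.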
Process for Data Access

CBRN has developed an application process that allows qualified researchers under IRB supervision to include data from the CBRN Data Repository in their research. Proposed projects must be unanimously approved by the CBRN Steering Committee. Once a project is approved, the researcher may request a query of the Data Repository to obtain fully anonymous count data in support of a research grant application. If the grant is funded, the researcher may request more detailed queries of the Data Repository to obtain the data required for the approved research. To obtain these data, the researcher must sign a HIPAA-compliant Data Use Agreement with Centex stating that the data provided will be securely maintained; that they will be used exclusively for the stated research purpose(s); that the identities of the individual health centers and individual subjects will not be conveyed with the data, and that the receiving researchers will make no attempt to reconstruct these identities; and that all data will be destroyed or returned once the research is completed. Centex will then create a Project Specific Database (PSD) by querying the CBRN Data Repository for the approved data request. The PSD will have personal identifiers replaced by random identifiers for each patient and C/MHC, but may include exact dates related to individual patients (birth and service dates) and ZIP-code-level address information, as allowable in a Limited Data Set as defined by HIPAA.
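The identifier replacement step for a Project Specific Database can be illustrated with a minimal sketch. This is not Centex's procedure and is not a compliance tool; the field names and the list of direct identifiers are hypothetical. It only shows the idea of swapping patient and health center identifiers for random ones, consistently within one project, while keeping dates and ZIP code.

```python
import secrets

# Hypothetical field names; the repository schema is not specified in the text.
DIRECT_IDENTIFIERS = {"name", "ssn", "street_address", "medical_record_number"}
KEY_FIELDS = ("patient_id", "center_id")  # replaced by random project-specific IDs


def build_psd(records):
    """Return a project-specific copy of `records` with random identifiers.

    Direct identifiers are dropped; dates and ZIP code are kept, as the
    text says a Limited Data Set allows. The mapping from real to random
    identifiers stays inside this function and is not returned.
    """
    mappings = {field: {} for field in KEY_FIELDS}
    output = []
    for rec in records:
        clean = {k: v for k, v in rec.items() if k not in DIRECT_IDENTIFIERS}
        for field in KEY_FIELDS:
            real = clean[field]
            if real not in mappings[field]:
                # 16 hex characters drawn from a cryptographic random source
                mappings[field][real] = secrets.token_hex(8)
            clean[field] = mappings[field][real]
        output.append(clean)
    return output
```

Because the mapping is rebuilt on every call, two projects would receive different random identifiers for the same patient, so their extracts could not be joined on those identifiers.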


In addition to the noted limitation of linking laboratory values to encounters, another challenge was how to work with demographic fields and other data associated with an individual rather than with a clinic visit (encounter). These fields, which include farmworker status and important risk factors such as smoking status, are frequently overwritten when updated. CBRN has had preliminary discussions with Centex about preserving these "historical" fields in a separate file so that changes in patient status can be captured over time. This will be implemented as we continue to prospectively add data to the Research Data Repository.

A more important concern for future research on farmworkers is the relatively small number of farmworkers in this dataset (<5%). We believe this may be an identification issue, in which clinics tend to categorize patients according to how medical payments are made rather than according to potential research questions. We need to continue to investigate how a farmworker is identified and whether identification is based on a billing/payor assignment instead of a special population designation category. On a national basis, federally qualified health centers that are designated as Migrant Health Centers receive a portion of their annual grant as PHS 330 (g) funding in order to address the cost of care for farmworker patients. The amount of this grant is calculated on the basis of a projection of the number of patients to be seen, and historically new start awards are seldom enough to cover the cost of care for more than two or three medical encounters, let alone provision of dental, behavioral, or ancillary services such as outreach, transportation, interpretation, or environmental services, which are essential to serving this population. Therefore, if a farmworker qualifies for other public or private third-party reimbursement, the clinic staff might not document the patient's farmworker status and may instead classify the patient according to eligibility for payment. Possible misclassification of farmworkers through their inclusion in the "other" category needs to be carefully assessed.

Future work should explore missing and additional information, including linkage of family members. Of particular interest would be the linkage of mothers to their newborn children to facilitate reproductive studies. Despite these remaining challenges, we have documented the methodologies necessary to extract data from the data repository, using two health centers, and demonstrated our ability to meaningfully use the data for research. Data from three more participating C/MHCs have been transferred to Centex since our Pilot Study. These C/MHCs use electronic health record systems developed by three EHR vendors, which further demonstrates the flexibility of the process created for incorporating multiple health centers into the CBRN Research Data Repository.

Discussion and Conclusion

Over the last three years, we built a network with the infrastructure to facilitate future health services research, public health surveillance, and epidemiologic research on farmworkers in comparison to non-farmworkers in the primary care setting. Our pilot results demonstrate that a limited dataset could be generated using EHR data merged from different C/MHCs. Further, it was shown that it was feasible to develop a cumulative dataset based on these data and that this dataset could support longitudinal surveillance, prevention, and research studies. Interpretation of these results is limited by the fact that we could not tell how many previous tests had been conducted on these subjects, a problem that will diminish as ongoing longitudinal data collection extends the follow-up period. Thus, our pilot study demonstrated that longitudinal patient encounter and laboratory data from multiple health centers could be successfully collected, linked, and merged to provide useful patient care and research information. Future use of this repository will be guided by mutually engaged partners including healthcare providers, community health organizations, and academic researchers.

This electronic linkage and the resulting Data Repository provide an initial national source of both clinical health data and farmworker population demographics with which C/MHCs can better serve their patients, evaluate their success, participate in disease reporting to public health agencies, and demonstrate need. The repository will be sustained by improved opportunities for coordination of care, engagement by community-academic partnerships, the desire of C/MHCs for local and comparative data, and future research funding opportunities from federal agencies that have prioritized farmworker research in their vision, goals, and current research agendas to improve health outcomes, reduce health disparities, and increase access to health care for underserved populations, including immigrant, Latino, and young workers in the agricultural sector. In 2011, there were 1,128 US federally-supported health centers that served over 20 million patients, including 862,808 agricultural workers and their dependents [5]. Expansion of this network to other C/MHCs could evolve into a pioneering demonstration of a national health information exchange.


This project was supported by the National Institute of Environmental Health Sciences (Grant No. 1RC4ES019405-01), the Southwest Center for Agricultural Safety and Health at The University of Texas HSC at Tyler (CDC/NIOSH Cooperative Agreement No. U50 OH07541), and the Southwest Center for Occupational and Environmental Health, a NIOSH Education and Research Center (Grant No. 5T42OH008421). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of CDC/NIOSH or NIH. This project would not have been possible without the information technology support of Centex Systems Support Services (Maurice Samuels, Bryan White, Anthony Nelson) and the enthusiastic support of participating Community and Migrant Health Centers, especially Jerry Brasher and Maria de Jesus Diaz-Perez from Salud Family Health Centers, and Mary Zelazny and Lawreen Duel from Finger Lakes Community Health.



[1] Villarejo D. Health-related Inequities Among Hired Farm Workers and the Resurgence of Labor-intensive Agriculture. Kresge Foundation, 2012. Accessed 8/17/13.

[2] Klompas M, McVetta J, Lazarus R, Eggleston E, Haney G. 2012. Integrating clinical practice and public health surveillance using electronic medical record systems. Am J Prev Med. 42(6) (Suppl 2), S154-62. doi: 10.1016/j.amepre.2012.04.005. PMID: 22704432.

[3] U.S. Department of Health and Human Services. Office of the Secretary. Federal Register. 45 CFR Parts 160 and 164. [45 CFR 164.512 (i)(1)(i) and 45 CFR 164.512 (i)(2)(ii)] Standards for Privacy of Individually Identifiable Health Information; Final Rule. Accessed 10/8/13.

[4] Slingerland AS, Herman WH, Redekop WK, Dijkstra RF, Jukema JW. 2013. Stratified patient-centered care in type 2 diabetes: A cluster-randomized, controlled clinical trial of effectiveness and cost-effectiveness. Diabetes Care. Epub ahead of print. doi: 10.2337/dc12-1865. PMID: 23949558.

[5] National Association of Community Health Centers (NACHC). United States Health Center Fact Sheet, 2011. Accessed 9/7/13.