Nationwide COVID-19 health records at your fingertips via data enclave, accelerating research and care

A new National Center for Advancing Translational Sciences (NCATS) initiative gives clinicians and researchers a centralized national data resource, the National COVID Cohort Collaborative (N3C), to accelerate advances in COVID-19 research and care.

The N3C is the product of a partnership among the National Center for Advancing Translational Sciences-supported Clinical and Translational Science Awards (CTSA) Program hubs, the National Center for Data to Health (CD2H), Clinical and Translational Research networks, the National Patient-Centered Clinical Research Network, Observational Health Data Sciences and Informatics, Accrual to Clinical Trials/Informatics for Integrating Biology and the Bedside, and TriNetX Global Health Research Network. NCATS has overall stewardship of the N3C project.

More than 70 institutions across the country have committed to providing electronic health records of people who were tested for COVID-19 or who had related symptoms. In addition, institutions submit data on individuals infected with pathogens that can support comparative studies, including SARS 1, MERS and H1N1. Once obtained, the data are harmonized into a consistent format for use within the data enclave.

“There is no shortage of clinical data within institutions; however, in the United States these data are not structured the same way nor are they accessible for shared analytics by our nation’s scientists,” according to an N3C informational brochure. “The N3C aims to unite COVID-19 data, enabling innovative machine learning and statistical analyses that require a large amount of data – more than is available in any given institution.”

The secure, cloud-based N3C platform offers registered users three data access tiers. The first is a synthetic dataset: artificial data designed to be statistically comparable to the originally submitted data, with no protected health information (PHI). The second tier is a de-identified dataset that also contains no PHI. The third tier, referred to as limited data, retains two identifying elements: dates of service and patient ZIP codes.

Once in the N3C enclave, researchers can use one of the data tiers to study COVID-19 treatments and the disease's interactions with other health conditions and care settings, including diabetes, critical care, kidney disease, immunocompromised status, pregnancy, and many more. The curated data span different time periods, geographic locations, and levels of care.

Multiple domain teams are set up within the N3C enclave so that researchers with shared interests, such as cardiology, emergency services or medicine, neurology, or oncology, can analyze data and collaborate more efficiently. Domain teams are composed of clinical and subject matter experts, statisticians, informaticists, and machine learning specialists.

Suzanne McCahan, PhD, assistant research scientist with Nemours Biomedical Research Informatics Center, explained that researchers and clinicians can get involved in the N3C enclave by joining an existing domain team or proposed project, but they also have the option to propose new research initiatives.

“I am really excited about N3C, not only because it will provide wide and wise access to data that will enable many professional and citizen scientists to find important relationships involving COVID-19 infection, clinical features, concomitant medical conditions, and social-environmental circumstances, but also because it will serve as a model for the provision of data concerning so many other national health crises from asthma to sickle cell disease,” Julian Solway, M.D., Director of the Institute for Translational Medicine at the University of Chicago, shared on the N3C website. “N3C demonstrates that, as a nation, we can share data for the common good. Let’s build on its success to speed health improvement and disease prevention for all Americans.”