BIDS-BCHSI Research Xchange Forum -- Synthetic Electronic Health Record

BIDS-BCHSI Research Xchange Forum 
Date: Monday, March 1, 2021 
Time: 12:30-1:30 PM Pacific Time
Location: Virtual Participation 
Register to receive the virtual access link.



12:30-1:30 PM: Research Talk by Haley Hunter-Zinck, 2019-2021 I4H Fellow

TITLE: Comparison of synthetic electronic health record data generation techniques for training predictive clinical models

ABSTRACT: Synthetic data is gaining attention as a way to facilitate access to electronic health record (EHR) data for building predictive clinical models. Several methodologies currently exist for generating synthetic data. Some, such as methods based on generative adversarial networks or other machine learning or statistical techniques, rely on access to real, patient-level EHR data. Others, such as Synthea, do not depend on record-level EHR access and instead use publicly available, aggregate data resources. Here, we perform quantitative and qualitative comparisons of different synthetic data generation methodologies for the purpose of building clinical predictive models from EHR data. We formulate comparable synthetic datasets with CorGAN and Synthea, using the Veterans Health Administration’s COVID-19 Shared Data Resource as both a template and a benchmark. Using each synthetic dataset, we train models to predict COVID-19 outcomes, such as transfer to the intensive care unit or mortality, and validate the synthetically trained models on a real test dataset to measure and compare model utility. We also qualitatively compare the synthetic data generators on aspects such as privacy risks, required data inputs, manual effort, and computational requirements for training the generators.

The BIDS-BCHSI Research Xchange Forum is an open discussion platform for the interdisciplinary exchange of ideas and research projects at the intersection of healthcare and data science. Participants are invited to engage in a variety of activities, including presentations of work in progress, discussions and critiques of recent papers and AI methods in healthcare, introductions to new tools and methods, and opportunities to foster new collaborations. Invited speakers include leading voices in AI and healthcare, and active conversations invite participants to share fresh perspectives. Clinicians with an interest in data science methods and tools, as well as data science faculty and researchers with applications or interests in healthcare and the health sciences, are welcome and encouraged to participate. Regular participants will also include the I4H Fellows, as well as postdocs, staff, and faculty from UC Berkeley, UCSF, and Johnson & Johnson. The immediate goals of this Forum are to share our current research projects with a wider audience and to increase engagement and improve communication among the three host organizations. Meetings will be held virtually on the first Monday of each month from 12:30 to 1:30 PM Pacific Time, and interested members of the UC Berkeley, UCSF, and Johnson & Johnson communities are invited to sign up for our mailing list to receive announcements of upcoming webinars. Please contact for more information.


Haley Hunter-Zinck
UC Berkeley, UCSF


UCSF Bakar Computational Health Sciences Institute
Berkeley Institute for Data Science (BIDS)
Janssen - Pharmaceutical Companies of Johnson & Johnson
UC Berkeley
Johnson & Johnson Innovation