Dr. Apeltsin holds a PhD in Biomedical Informatics from UCSF. He is a co-founder of Primer.ai, a machine learning company that focuses on natural language processing. Dr. Apeltsin’s book, “Data Science Bookcamp”, has been featured on the best-seller list of Manning Publishing. His specialties include large-scale analytics, genomics, text analysis, and advanced machine learning techniques.
PROJECTS
Lupus & Pulmonary Arterial Hypertension: This project proposes a machine learning framework to diagnose disease onset and progression from electronic health records. The framework will be applied to two debilitating diseases that are notoriously difficult to diagnose: Lupus and Pulmonary Arterial Hypertension. Both structured and unstructured patient record data will be used to train the models. The unstructured data will originate primarily from recorded clinical notes. These notes are a valuable source of predictive signal. Unfortunately, their contents are also highly disordered. Thus, state-of-the-art language models, such as BERT and XLNet, might need to be fine-tuned to extract value from the text data. Ideally, the predictive signal in the clinical texts will be made interpretable in order to gain physician trust. All validated models will be publicly released once rigorous algorithmic measures are taken to ensure full patient privacy protection.