Leonard Apeltsin, PhD

Former Fellow

Dr. Apeltsin holds a PhD in Biomedical Informatics from UCSF. He is a co-founder of Primer.ai, a machine learning company that focuses on natural language processing. Dr. Apeltsin’s book, “Data Science Bookcamp,” has been featured on the best-seller list of Manning Publications. His specialties include large-scale analytics, genomics, text analysis, and advanced machine learning techniques.


PROJECTS 

Lupus & Pulmonary Arterial Hypertension: This project proposes a machine learning framework for diagnosing disease onset and progression from electronic health records. The framework will be applied to two debilitating diseases that are notoriously difficult to diagnose: lupus and pulmonary arterial hypertension. Both structured and unstructured patient record data will be used to train the models. The unstructured data will come primarily from recorded clinical notes. These notes are a valuable source of predictive signal, but their contents are also highly disordered. Thus, state-of-the-art language models, such as BERT and XLNet, may need to be fine-tuned to extract value from the text. Ideally, the predictive signal in the clinical texts will be made interpretable in order to gain physician trust. All validated models will be publicly released once rigorous algorithmic measures have been taken to ensure full patient privacy protection.


 

UC Berkeley, UCSF


