Data Science

The Erdős Institute

I was a member of The Erdős Institute's Fall 2023 cohort, where I completed the Data Science Boot Camp with distinction in recognition of my group's capstone project. (Certificate)

Projects

The Silent Emergency - Predicting Preterm Birth

Capstone Project for Data Science Boot Camp

Preterm birth is a primary cause of infant mortality and morbidity in the United States, affecting approximately 1 in 10 births. Rates are notably higher among Black women (14.6%) than among White (9.4%) and Hispanic (10.1%) women. Despite its prevalence, preterm birth remains difficult to predict because of its multifaceted etiology, which is rooted in environmental, biological, genetic, and behavioral interactions. Our project uses machine learning techniques to predict preterm birth from electronic health records. These data intersect with social determinants of health and so reflect some of the interactions that contribute to preterm birth. Recognizing that underrepresentation in healthcare research perpetuates racial and ethnic health disparities, we take care to use diverse data so that the model performs equitably across underrepresented populations.

Project page link (includes 5 min summary video)   GitHub link

Executive Summary

Slides

Umpiring in the Age of Technology - A Study of Pitch Calling by MLB Umpires

In 2024, the pitch-calling abilities of home plate umpires are under more scrutiny than ever. Entities like Ump Scorecards publish summaries of home plate umpire performance on social media for every Major League Baseball (MLB) game, and, as of June 25th, all AAA-level games in Minor League Baseball (MiLB) use the automated ball-strike (ABS) challenge system. Given the importance of correct calls to both game outcomes and fans, as well as the looming possibility of an ABS system in MLB games, we investigate both:

GitHub link

Scatter plot of correct pitch calls by horizontal and vertical pitch location

Executive Summary

Classifying 2024 MLB Pitches

No two pitchers truly throw the same pitch the same way. For example, Tim Hill threw his four-seam fastball with an average of 17.9 in of arm-side movement and 4.4 in of vertical movement, at 90.7 MPH. Mason Miller, on the other hand, threw his four-seam fastball with 9.7 in of arm-side movement and 16.6 in of vertical movement, at 100.9 MPH. In this project, we classify pitch types using observable data, paying special attention to the effect of including or excluding pitcher IDs and to the differences between classical machine learning models and a simple neural network.

GitHub link

Executive Summary

DataCamp

I have supplemented my work with The Erdős Institute and my independent projects by completing the following DataCamp course tracks:

Developing Large Language Models

SQL Fundamentals