Data Science
The Erdős Institute
I was a member of The Erdős Institute's Fall 2023 cohort, where I completed the Data Science Boot Camp with distinction in recognition of my group's capstone project. (Certificate)
Projects
The Silent Emergency - Predicting Preterm Birth
Capstone Project for Data Science Boot Camp
Preterm birth is a primary cause of infant mortality and morbidity in the United States, affecting approximately 1 in 10 births. Rates are notably higher among Black women (14.6%) than among White women (9.4%) and Hispanic women (10.1%). Despite its prevalence, preterm birth remains challenging to predict because of its multifaceted etiology, which is rooted in environmental, biological, genetic, and behavioral interactions. Our project uses machine learning techniques to predict preterm birth from electronic health records. These data intersect with social determinants of health, reflecting some of the interactions that contribute to preterm birth. Recognizing that underrepresentation in healthcare research perpetuates racial and ethnic health disparities, we take care to use diverse data to ensure equitable model performance across underrepresented populations.
Project page link (includes 5 min summary video) GitHub link

Executive Summary

Slides
Umpiring in the Age of Technology - A Study of Pitch Calling by MLB Umpires
In 2024, the pitch-calling abilities of home plate umpires are under more scrutiny than ever. Entities like Ump Scorecards publish summaries of home plate umpire performance for every Major League Baseball (MLB) game on social media, and, as of June 25th, all AAA-level games in Minor League Baseball (MiLB) use the automated ball-strike (ABS) challenge system. Given the importance of correct calls to both game outcomes and fans, as well as the looming possibility of an ABS system in MLB games, we investigate both:
the efficacy of machine learning models in predicting umpires' ball/strike calls, and
which features, other than pitch location, most influence ball and strike calls.
Correct pitch calls by horizontal and vertical pitch location

Executive Summary
Classifying 2024 MLB Pitches
No two pitchers truly throw the same pitch the same way. For example, Tim Hill threw his four-seam fastball with an average of 17.9 in of arm-side movement and 4.4 in of vertical movement at 90.7 MPH. Mason Miller, on the other hand, threw his four-seam fastball with 9.7 in of arm-side movement and 16.6 in of vertical movement at 100.9 MPH. In this project, we classify pitch types using observable data, paying special attention to the inclusion or exclusion of pitcher IDs and to the differences between classical machine learning models and a simple neural network.

Executive Summary
DataCamp
I have supplemented my work with The Erdős Institute and other independent projects by completing the following DataCamp course tracks:
Developing Large Language Models
Introduction to Deep Learning with PyTorch
Intermediate Deep Learning with PyTorch
Deep Learning for Text with PyTorch
Introduction to LLMs in Python
Working with Llama 3
LLMOps Concepts
SQL Fundamentals
Introduction to SQL
Intermediate SQL
Joining Data in SQL
Data Manipulation in SQL
PostgreSQL Summary Stats and Window Functions
Functions for Manipulating Data in PostgreSQL
Database Design