Predicting Sepsis in the Intensive Care Unit
Building a classifier on large-scale EHR data to accurately predict a common hospital complication within the ICU and improve patient outcomes.
Analytics: Naive Bayes, logistic regression, time-series forecasting, decision trees, support vector machines Tools: PostgreSQL, Unix shell, R (ggplot2, caret), Python (sklearn)
Providing user-specific analytics and visualizations of iPhone text messages, including response rate, average response time, and sentiment score. Code available on GitHub.
Analytics: sentiment analysis, natural language processing Tools: R, Python
Mental Health and Substance Use in New York City
A Department of Health / Columbia University GRAPH initiative to estimate the disability-adjusted life years (DALYs) and costs associated with mental illness in NYC and to map disparities in psychopathology and substance use.
Analytics: Markov chains, Monte Carlo simulation, cost-effectiveness analysis, multiple imputation Tools: Stata, TreeAge, ArcGIS, R (maptools, sp)
Epigenetic research analyzing capture bisulfite sequencing data to identify differentially methylated regions. Focus on ovarian and breast cancer. Sample R code available on GitHub.
Analytics: regression analysis, local likelihood smoothing Tools: R, Unix shell
Childhood Obesity and the Environment
An analysis of prenatal environmental exposures and food frequency data and their effect on childhood obesity. Sample SAS code available on GitHub.
Analytics: regression analysis, data imputation, hypothesis testing Tools: SAS, Unix shell
A collaboration with Frank Chen where snippets of data science, programming, and public health come together to share a meal. Check out our series on physician misconduct, MDMA data, electronic health records, and more at our GitHub repo.
Visualizations: dot graph with slider, stacked area chart with hover-overs Tools: R (ggplot2, dplyr), D3.js
Visualizations: Rickshaw line graph, NVD3 bar chart Tools: R (Shiny, rCharts)
A survival analysis using data from the General Social Survey to investigate the effects of multigenerational households on mortality risk. Manuscript currently under review. Sample Stata code available on GitHub.
Analytics: Cox proportional-hazards regression, seemingly unrelated regression, factor analysis Tools: Stata, Unix shell
Visualizations: choropleth maps, heatmaps Tools: R, QGIS
Visualizations: choropleth maps Tools: R (Shiny)
A state-wide high school outreach program focusing on cultural and educational enrichment.
A slice-of-life podcast documenting the long-distance friendship of two college friends and their philosophical musings. Check out the first episode!