Current Projects

Predicting Sepsis in the Intensive Care Unit

Building a classifier on large-scale EHR data to predict sepsis, a common hospital complication, in the ICU and improve patient outcomes.

Analytics: Naive Bayes, logistic regression, time-series forecasting, decision trees, support vector machines Tools: PostgreSQL, Unix shell, R (ggplot2, caret), Python (sklearn)

SMS Analysis

Providing user-specific analytics and visualizations of iPhone text messages, including response rate, average response time, and sentiment score. Code available on GitHub.

Analytics: sentiment analysis, natural language processing Tools: R, Python

Mental Health and Substance Use in New York City

A Department of Health / Columbia University GRAPH initiative to estimate the DALYs and costs associated with mental illness in NYC and map the disparities in psychopathology and substance use.

Analytics: Markov chains, Monte Carlo simulation, cost-effectiveness analysis, multiple imputation Tools: Stata, TreeAge, ArcGIS, R (maptools, sp)

DNA Methylation Analysis

Epigenetic research analyzing capture bisulfite sequencing data to identify differentially methylated regions. Focus on ovarian and breast cancer. Sample R code available on GitHub.

Analytics: regression analysis, local likelihood smoothing Tools: R, Unix shell

Childhood Obesity and the Environment

An analysis of prenatal environmental exposures and food frequency data and their effect on childhood obesity. Sample SAS code available on GitHub.

Analytics: regression analysis, data imputation, hypothesis testing Tools: SAS, Unix shell


A collaboration with Frank Chen where snippets of data science, programming, and public health come together to share a meal. Check out our series on physician misconduct, MDMA data, electronic health records, and more at our GitHub repo.

Visualizations: dot graph with slider, stacked area chart with hover-overs Tools: R (ggplot2, dplyr), D3.js

Past Projects

Visualizing Ebola

A contribution to Daniel Chen’s interactive plotting project to visualize the Ebola data using RStudio’s web application framework Shiny. Check out the working prototype and the GitHub repo.

Visualizations: Rickshaw line graph, NVD3 bar chart Tools: R (Shiny, rCharts)

Multigenerational Households

A survival analysis using data from the General Social Survey to investigate the effects of multigenerational households on mortality risk. Manuscript currently under review. Sample Stata code available on GitHub.

Analytics: Cox proportional-hazards regression, seemingly unrelated regression, factor analysis Tools: Stata, Unix shell

NYC Open Data

Using NYC Open Data to map district and city-wide issues and produce statistical reports to inform government action. Check out the presentation slides, blog post, and visualization.

Visualizations: choropleth maps, heatmaps Tools: R, QGIS

County Health Rankings Map

An interactive visualization of County Health Rankings data using RStudio’s web application framework Shiny. Check out the source code and the accompanying presentation slides.

Visualizations: choropleth maps Tools: R (Shiny)

Ban The Bins

Crowd-sourced data collection and social media campaign to map illegal clothing donation bins in New York City using the Ushahidi web platform. Visit the crowd map website!

Justice in Housing Court

Leading web and research support for Intro 214, a bill to provide legal representation to low-income tenants facing eviction in New York City. Sample policy memo available here.


A state-wide high school outreach program focusing on cultural and educational enrichment.

JR Podcast

A slice-of-life podcast documenting the long-distance friendship of two college friends and their philosophical musings. Check out the first episode!