My research spans epidemiology, biostatistics, bioinformatics, and computational biology, so I can offer a variety of research opportunities for students and interns interested in data analysis or scientific software development, particularly using R and Bioconductor.

Integrative Analysis of Multi-Assay Genomic Experiments

This project develops scalable R / Bioconductor software infrastructure and data resources to integrate complex, heterogeneous, and large cancer genomic experiments. The falling cost of genomic assays makes it feasible to collect multiple data types (e.g., gene and transcript expression, structural variation, copy number, methylation, and microRNA data) from the same set of clinical specimens. Substantial resources are also now available from large consortium efforts such as The Cancer Genome Atlas (TCGA). Existing analysis pipelines focus on a single data type, leaving a critical need for tools that support integrative analysis across multiple genomic assays. R / Bioconductor provides standardized genomic data structures and annotations that have been widely adopted in the cancer genomics research community. This project adapts R / Bioconductor to meet the growing conceptual and computational complexity of multi-assay cancer genomic experiments, including through the creation of the MultiAssayExperiment package.

Public Health Human Microbiome Analysis

I am interested in the human microbiome as an ongoing link between host and environment, and in its role in human health and disease. Together with the laboratory of Nicola Segata at the University of Trento, we developed the curatedMetagenomicData package for Bioconductor, which provides curated microbiome profiles for thousands of human-associated microbiomes. As part of the New York City Health and Nutrition Examination Study, we have profiled the oral microbiome of a representative sample of the NYC population. The study collected extensive lifestyle, health, and socio-demographic data, along with oral rinse specimens for microbiome analysis, from a randomized, population-representative sample of 1,500 adults. Among other aims, it will evaluate changes to the oral cavity associated with a wide range of tobacco exposures (cigarettes, secondhand smoke, hookah, and e-cigarettes) in this racially and ethnically diverse, population-based sample of NYC adults.

Multi-omic Analysis of Molecular Subtypes of Cancer

Ovarian cancer is a molecularly heterogeneous disease in which clinically similar cases can respond very differently to treatment. Several major studies have identified potential molecular subtypes of ovarian cancer, while others have been unable to do so. Each study reporting subtypes has offered related but different definitions, and has presented limited and differing datasets for validating how discrete the subtypes are and how they associate with patient outcome. As a result, the robustness and clinical utility of transcriptome subtypes of high-grade serous ovarian cancer remain controversial. This project provided a systematic, comparative meta-analysis of the proposed molecular subtypes, and is now using DNA sequencing data from The Cancer Genome Atlas to investigate whether these subtypes are fundamentally suitable for guiding targeted treatment, by characterizing subtype heterogeneity within individual tumors.