The NIH Human Microbiome Project (HMP) was a longitudinal study of healthy adults aged 18 to 40, conducted from 2007 to 2012 across four institutions (Baylor College of Medicine, the Broad Institute, the J. Craig Venter Institute, and Washington University). It produced a comprehensive reference for the composition, diversity, and variation of the healthy human microbiome. This SummarizedExperiment-class object contains 16S rRNA sequencing data for variable regions 1–3, generated from samples collected at five major body sites. Available participant metadata and phylogenetic trees are included.

Usage

V13(metadata = FALSE)

Format

A SummarizedExperiment-class object with 43,140 features and 2,898 samples:

colData

RSID

a random subject identifier

VISITNO

visit number, between 1 and 3

SEX

sex, female or male

RUN_CENTER

center where sample sequencing took place: Baylor College of Medicine (BCM), the Broad Institute (BI), the J. Craig Venter Institute (JCVI), or the Genome Sequencing Center at Washington University (WUGC)

HMP_BODY_SITE

body site where the sample was collected

HMP_BODY_SUBSITE

body subsite where the sample was collected

SRS_SAMPLE_ID

a sample identifier to be used when comparing 16S rRNA samples to whole metagenome shotgun (WMS) samples

rowData

CONSENSUS_LINEAGE

the most detailed lineage description shared by the sequences within an OTU

SUPERKINGDOM

superkingdom taxonomy, assumed to be Bacteria

PHYLUM

phylum taxonomy parsed from CONSENSUS_LINEAGE

CLASS

class taxonomy parsed from CONSENSUS_LINEAGE

ORDER

order taxonomy parsed from CONSENSUS_LINEAGE

FAMILY

family taxonomy parsed from CONSENSUS_LINEAGE

GENUS

genus taxonomy parsed from CONSENSUS_LINEAGE
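As a rough illustration of how the rank columns above could be parsed from CONSENSUS_LINEAGE, the following base R sketch splits a lineage string on semicolons and picks out each rank by its prefix. The example string, the `Root;p__...;g__...` layout, and the helper name `parse_lineage` are assumptions made for illustration; they are not taken from the package source.

```r
# Hypothetical lineage string; the "Root;p__...;g__..." layout is an assumption.
lineage <- "Root;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus"

# Hypothetical helper: returns a named character vector with one entry per rank,
# using NA for any rank that does not appear in the lineage string.
parse_lineage <- function(x) {
  prefixes <- c(PHYLUM = "p__", CLASS = "c__", ORDER = "o__",
                FAMILY = "f__", GENUS = "g__")
  parts <- strsplit(x, ";", fixed = TRUE)[[1]]
  vapply(prefixes, function(prefix) {
    hit <- parts[startsWith(parts, prefix)]
    if (length(hit) == 0L) NA_character_ else substring(hit[1], nchar(prefix) + 1L)
  }, character(1))
}

ranks <- parse_lineage(lineage)
print(ranks)
```

An OTU whose shared lineage stops at a higher rank would simply yield NA for the deeper ranks under this sketch.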

Source

The following source information is derived from the HMP Data Analysis and Coordination Center:

Following a July 2010 16S data freeze, data was downloaded from NCBI SRA projects SRP002395: Human Microbiome Project 16S rRNA Clinical Production Phase I, and SRP002012: Human Microbiome Project 454 Clinical Production Pilot. This dataset corresponds to over 5,700 samples and over 10,000 sequence preps. 16S variable region 3–5 (V35) was sequenced for the entire set of samples, and variable region 1–3 (V13) for a subset of samples.

The QIIME (Quantitative Insights Into Microbial Ecology) software package was used to process HMP 16S data using an OTU-binning strategy to which taxonomic classification is added.

Raw 16S sequence and metadata, available through https://tinyurl.com/y7ev836z, were demultiplexed using QIIME. OTU picking was performed for the V1–3 and V3–5 region sequences using OTUPipe, which includes error correction, chimera checking through UCHIME, clustering via UCLUST, and postprocessing by picking the optimal representative sequence centroid. Taxonomy was assigned using the RDP classifier version 2.2.

The resulting OTU tables were checked for mislabeling and contamination, as described in the SOP available through https://tinyurl.com/y7ev836z. Alpha and beta diversity for each sample, along with Procrustes analysis, were computed using QIIME with default parameters.

All QIIME output files are available through https://tinyurl.com/y7ev836z, for both the V1–3 and V3–5 variable regions, as well as Procrustes summary data. SOPs and custom scripts are also available through https://tinyurl.com/y7ev836z.

If you're interested in joint analysis of 16S and shotgun metagenomic datasets from the HMP, pairing up data from the same microbiome samples can initially seem tricky. The HMP Sample Flow Schematic indicates how these sample IDs are related experimentally, and provides tables joining 16S dataset "SN" and "PSN" identifiers with metagenomic dataset "SRS" identifiers.
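The pairing described above can be sketched in base R as a join on the SRS identifier. The two data frames below are toy stand-ins, and the SRS values and the `wms_samples` table are hypothetical; in practice the 16S side would come from the colData of this object, where the PSN identifiers are the column names and SRS_SAMPLE_ID is a column.

```r
# Toy stand-in for the 16S sample metadata (PSN plus SRS_SAMPLE_ID).
# The SRS values are made up; NA marks a sample with no shotgun counterpart.
coldata_16s <- data.frame(
  PSN = c("700013549", "700014386", "700114963"),
  SRS_SAMPLE_ID = c("SRS000001", NA, "SRS000003"),
  stringsAsFactors = FALSE
)

# Toy stand-in for a table of whole metagenome shotgun (WMS) samples.
wms_samples <- data.frame(
  SRS_SAMPLE_ID = c("SRS000001", "SRS000003", "SRS000004"),
  stringsAsFactors = FALSE
)

# Drop 16S samples without an SRS identifier, then keep only those
# whose SRS identifier also appears in the WMS table.
has_srs <- !is.na(coldata_16s$SRS_SAMPLE_ID)
paired <- merge(coldata_16s[has_srs, ], wms_samples, by = "SRS_SAMPLE_ID")
print(paired)
```

Filtering out missing identifiers before the join is a deliberate choice here, since `merge()` would otherwise treat NA keys as matching each other by default.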

Four files were used to construct this SummarizedExperiment-class object.

OTU table file with PSN identifiers: https://tinyurl.com/y74gqpho

Subject metadata files with PSN identifiers: https://tinyurl.com/y8adlfso

Subject metadata files with SRS identifiers: https://tinyurl.com/ybmn7q8m

Representative sequence phylogenetic trees: https://tinyurl.com/ybp8mzgj

Arguments

metadata

logical; if TRUE, only the metadata is downloaded rather than the entire resource

Note

The "PSN" identifiers were used as the colnames of the SummarizedExperiment-class object; see Source for additional information.

Examples

V13()
#> see ?HMP16SData and browseVignettes('HMP16SData') for documentation
#> loading from cache
#> class: SummarizedExperiment 
#> dim: 43140 2898 
#> metadata(2): experimentData phylogeneticTree
#> assays(1): 16SrRNA
#> rownames(43140): OTU_97.1 OTU_97.10 ... OTU_97.9997 OTU_97.9999
#> rowData names(7): CONSENSUS_LINEAGE SUPERKINGDOM ... FAMILY GENUS
#> colnames(2898): 700013549 700014386 ... 700114963 700114965
#> colData names(7): RSID VISITNO ... HMP_BODY_SUBSITE SRS_SAMPLE_ID