The NIH Human Microbiome Project (HMP) was a longitudinal study conducted
from 2007 to 2012 across four institutions (Baylor College of Medicine, the
Broad Institute, the J. Craig Venter Institute, and Washington University) of
healthy adults aged 18 to 40 that produced a comprehensive reference for the
composition, diversity, and variation of the healthy human microbiome. This
SummarizedExperiment-class
object
represents 16S rRNA sequencing data for variable regions 3–5 that was
performed on samples collected at five major body sites – available
participant metadata as well as phylogenetic trees are included.
Format
A SummarizedExperiment
with
45,383 features and 4,743 samples:
colData
- RSID
a random subject identifier
- VISITNO
visit number, between 1 and 3
- SEX
sex, female or male
- RUN_CENTER
center where sample sequencing took place: Baylor College of Medicine (BCM), the Broad Institute (BI), the J. Craig Venter Institute (JCVI), or the Genome Sequencing Center at Washington University (WUGC)
- HMP_BODY_SITE
body site where the sample was collected
- HMP_BODY_SUBSITE
body subsite where the sample was collected
- SRS_SAMPLE_ID
a sample identifier to be used when comparing 16S rRNA samples to whole metagenome shotgun (WMS) samples
rowData
- CONSENSUS_LINEAGE
the most detailed lineage description shared by the sequences within an OTU
- SUPERKINGDOM
superkingdom taxonomy, assumed to be Bacteria
- PHYLUM
phylum taxonomy parsed from
CONSENSUS_LINEAGE
- CLASS
calss taxonomy parsed from
CONSENSUS_LINEAGE
- ORDER
order taxonomy parsed from
CONSENSUS_LINEAGE
- FAMILY
family taxonomy parsed from
CONSENSUS_LINEAGE
- GENUS
genus taxonomy parsed from
CONSENSUS_LINEAGE
Source
The following source information is derived from the HMP Data Analysis and Coordination Center:
Following a July 2010 16S data freeze, data was downloaded from NCBI SRA projects SRP002395: Human Microbiome Project 16S rRNA Clinical Production Phase I, and SRP002012: Human Microbiome Project 454 Clinical Production Pilot. This dataset corresponds to over 5,700 samples and over 10,000 sequence preps. 16S variable region 3–5 (V35) was sequenced for the entire set of samples, and variable region 1–3 (V13) for a subset of samples.
The QIIME (Quantitative Insights Into Microbial Ecology) software package was used to process HMP 16S data using an OTU-binning strategy to which taxonomic classification is added.
Raw 16S sequence and metadata, available through https://tinyurl.com/y7ev836z, were demultiplexed using QIIME. OTU picking was performed for the V1–3 and V3–5 region sequences using OTUPipe, which includes error correction, chimera checking through UCHIME, and clustering via UCLUST, and postprocessing by picking the optimal representative sequence centroid. Taxonomy was assigned using the RDP classifier version 2.2.
The resulting OTU tables were checked for mislabeling and contamination, as described in the SOP available through https://tinyurl.com/y7ev836z. Alpha and beta diversity for each sample and Procrustes analysis were established using QIIME with default parameters.
All QIIME output files are available through https://tinyurl.com/y7ev836z, for both the V1–3 and V3–5 variable regions, as well as Procrustes summary data. SOPs and custom scripts are also available through https://tinyurl.com/y7ev836z.
If you're interested in joint analysis of 16S and shotgun metagenomic datasets from the HMP, pairing up data from the same microbiome samples can initially seem tricky. The HMP Sample Flow Schematic indicates how these sample IDs are related experimentally, and provides tables joining 16S dataset "SN" and "PSN" identifiers with metagenomic dataset "SRS" identifiers.
Four files were used to construct this
SummarizedExperiment-class
object.
OTU table file with PSN identifiers: https://tinyurl.com/y9rbpjl7
Subject metadata files with PSN identifiers: https://tinyurl.com/yaz35f22
Subject metadata files with SRS identifiers: https://tinyurl.com/y9xjqm29
Representative sequence phylogenetic trees: https://tinyurl.com/y9exxlgr
Value
A SummarizedExperiment
object
Note
The "PSN" identifiers were used as the colnames
of the
SummarizedExperiment
object, see source
for additional information.
Examples
V35()
#> see ?HMP16SData and browseVignettes('HMP16SData') for documentation
#> loading from cache
#> class: SummarizedExperiment
#> dim: 45383 4743
#> metadata(2): experimentData phylogeneticTree
#> assays(1): 16SrRNA
#> rownames(45383): OTU_97.1 OTU_97.10 ... OTU_97.9998 OTU_97.9999
#> rowData names(7): CONSENSUS_LINEAGE SUPERKINGDOM ... FAMILY GENUS
#> colnames(4743): 700013549 700014386 ... 700114717 700114750
#> colData names(7): RSID VISITNO ... HMP_BODY_SUBSITE SRS_SAMPLE_ID