Create a MultiAssayExperiment from specific assays and cohorts
Source: R/curatedTCGAData.R
curatedTCGAData.Rd
curatedTCGAData assembles data on-the-fly from ExperimentHub to provide cohesive MultiAssayExperiment container objects. All the user has to do is provide TCGA disease code(s) and assay types. It is highly recommended to use the companion package TCGAutils, developed to work with TCGA data specifically from curatedTCGAData and some flat files.
Usage
curatedTCGAData(
    diseaseCode = "*",
    assays = "*",
    version,
    dry.run = TRUE,
    verbose = TRUE,
    ...
)
Arguments
- diseaseCode
character() A vector of TCGA cancer cohort codes (e.g., COAD)
- assays
character() A vector of TCGA assays, glob matches allowed; see below for more details
- version
character(1) One of 1.1.38, 2.0.1, 2.1.0, or 2.1.1 indicating the data version to obtain from ExperimentHub. Version 2.1.1 includes various improvements as well as the addition of the RNASeq2Gene assay and subtype updates. See the version section for details.
- dry.run
logical(1) Whether to return the dataset names before actual download (default TRUE)
- verbose
logical(1) Whether to show the dataset currently being (down)loaded (default TRUE)
- ...
Additional arguments passed on to the ExperimentHub constructor
Value
A MultiAssayExperiment of the specified assays and cancer codes, or an informative data.frame of resources when dry.run is TRUE
Details
This function will check against available resources in ExperimentHub. Only the latest runDate ("2016-01-28") is supported. Use dry.run = FALSE to download remote datasets and build an integrative MultiAssayExperiment object. For a list of 'diseaseCodes', see the curatedTCGAData-package help page.
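The two-step workflow described above (preview first, then download) can be sketched as follows. This is a minimal sketch, not part of the package documentation: the cohort, assay, and version are borrowed from the Examples on this page, and the `interactive()` / `requireNamespace()` guard is an assumption added so the code does not contact ExperimentHub when the package or a session is unavailable.

```r
# Arguments shared by the preview and the download. "ACC", "CNASNP", and
# "2.0.1" are taken from the Examples on this page.
query <- list(diseaseCode = "ACC", assays = "CNASNP", version = "2.0.1")

# Guard (an assumption of this sketch): only query ExperimentHub from an
# interactive session in which curatedTCGAData is installed.
if (interactive() && requireNamespace("curatedTCGAData", quietly = TRUE)) {
    library(curatedTCGAData)

    # Step 1: dry.run = TRUE (the default) returns a data.frame that lists
    # the matching resources without downloading them.
    preview <- do.call(curatedTCGAData, c(query, list(dry.run = TRUE)))
    print(preview[, c("title", "file_size")])

    # Step 2: the same query with dry.run = FALSE downloads the data and
    # builds the MultiAssayExperiment.
    acc <- do.call(curatedTCGAData, c(query, list(dry.run = FALSE)))
    print(acc)
}
```

Checking the `file_size` column of the preview before the second call is a cheap way to avoid an unexpectedly large download.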
Available Assays
Below is a list of partial ExperimentList assay names and their respective descriptions. These assays can be entered as part of the assays argument in the main function. Partial glob matches are allowed such as: 'CN*' for "CNASeq", "CNASNP", "CNVSNP" assays. Credit: Ludwig G.
ExperimentList data types   Description
----------------------------------------------------------------------------
SummarizedExperiment*
  RNASeqGene                Gene expression values
  RNASeq2Gene               RSEM TPM gene expression values
  RNASeq2GeneNorm           Upper quartile log2 normalized RSEM TPM gene
                            expression values
  miRNAArray                Probe-level miRNA expression values
  miRNASeqGene              Gene-level log2 RPM miRNA expression values
  mRNAArray                 Unified gene-level mRNA expression values
  mRNAArray_huex            Gene-level mRNA expression values from Affymetrix
                            Human Exon Array
  mRNAArray_TX_g4502a       Gene-level mRNA expression values from Agilent
                            244K Array
  mRNAArray_TX_ht_hg_u133a  Gene-level mRNA expression values from Affymetrix
                            Human Genome U133 Array
  GISTIC_AllByGene          Gene-level GISTIC2 copy number values
  GISTIC_ThresholdedByGene  Gene-level GISTIC2 thresholded discrete copy
                            number values
  RPPAArray                 Reverse Phase Protein Array normalized protein
                            expression values
RangedSummarizedExperiment
  GISTIC_Peaks              GISTIC2 thresholded discrete copy number values
                            in recurrent peak regions
SummarizedExperiment with HDF5Array DelayedMatrix
  Methylation_methyl27      Probe-level methylation beta values from Illumina
                            HumanMethylation 27K BeadChip
  Methylation_methyl450     Probe-level methylation beta values from Infinium
                            HumanMethylation 450K BeadChip
RaggedExperiment
  CNASNP                    Segmented somatic Copy Number Alteration calls
                            from SNP array
  CNVSNP                    Segmented germline Copy Number Variant calls from
                            SNP Array
  CNASeq                    Segmented somatic Copy Number Alteration calls
                            from low pass DNA Sequencing
  Mutation*                 Somatic mutations calls
  CNACGH_CGH_hg_244a        Segmented somatic Copy Number Alteration calls
                            from CGH Agilent Microarray 244A
  CNACGH_CGH_hg_415k_g4124a Segmented somatic Copy Number Alteration calls
                            from CGH Agilent Microarray 415K

* All can be converted to RangedSummarizedExperiment (except RPPAArray) with TCGAutils
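To get a feel for how a glob such as 'GISTIC*' selects assay names, the matching can be imitated in base R. This is only an illustration using `utils::glob2rx()` and `grepl()`; it is not necessarily how curatedTCGAData resolves the assays argument internally, and `match_glob()` is a hypothetical helper defined here for the sketch. The assay names are a subset of those in the table above.

```r
# A subset of the assay names listed in the table above.
assay_names <- c(
    "RNASeqGene", "RNASeq2Gene", "RNASeq2GeneNorm",
    "miRNAArray", "miRNASeqGene", "mRNAArray",
    "GISTIC_AllByGene", "GISTIC_ThresholdedByGene", "GISTIC_Peaks",
    "RPPAArray", "CNASNP", "CNVSNP", "CNASeq", "Mutation"
)

# Hypothetical helper: translate a glob into a regular expression and keep
# the names that match it (matching is case-sensitive).
match_glob <- function(pattern, x) {
    x[grepl(utils::glob2rx(pattern), x)]
}

gistic <- match_glob("GISTIC*", assay_names)
rnaseq2 <- match_glob("RNASeq2*", assay_names)

print(gistic)
print(rnaseq2)
```

Running a pattern through a local check like this, or through a dry run of the main function, shows which assays a glob will pull in before anything is downloaded.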
version
Version 2.1.1 provides a couple of corrections to the colData for ovarian cancer (OV) and skin cancer (SKCM). In these new data, the cancer subtype variables are fully available. One can obtain the mapping of columns to subtypes in the colData with the getSubtypeMap function in TCGAutils.
Version 2.1.0 provides gene-level log2 RPM miRNA expression values for miRNASeqGene data and log2 normalized RSEM values for RNASeq2GeneNorm assays. Previously, the data provided were read counts and normalized counts, respectively. See issue #53 on GitHub for additional details.
Version 2.0.1 includes various improvements, including an additional assay that provides RNASeq2Gene data as RSEM TPM gene expression values (issue #38). Additional changes include genomic information for RaggedExperiment type data objects where '37' is now 'GRCh37' as reported in issue #40. Datasets (e.g., OV, GBM) that contain multiple assays that could be merged are now provided as merged assays (issue #27). We corrected an issue where mRNAArray assays were returning DataFrames instead of matrix type data (issue #31).
Version 1.1.38 provides the original run of curatedTCGAData and is provided for legacy reasons.
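The version notes above can be condensed into a small lookup, which may help when deciding which version string to pass. This is a hedged sketch in base R: `version_features()` is a hypothetical helper, not a function of curatedTCGAData or TCGAutils, and it assumes that each change carries forward to all later versions.

```r
# The four version strings accepted by the version argument.
supported_versions <- c("1.1.38", "2.0.1", "2.1.0", "2.1.1")

# Hypothetical helper: report which of the documented changes a version
# contains, assuming changes accumulate from one version to the next.
version_features <- function(version) {
    stopifnot(is.character(version), length(version) == 1L)
    if (!version %in% supported_versions) {
        stop(
            "'version' must be one of: ",
            paste(supported_versions, collapse = ", ")
        )
    }
    v <- numeric_version(version)
    list(
        # RNASeq2Gene assay with RSEM TPM values (added in 2.0.1)
        rnaseq2gene_assay = v >= numeric_version("2.0.1"),
        # log2 values for miRNASeqGene and RNASeq2GeneNorm (added in 2.1.0)
        log2_mirna_and_rsem = v >= numeric_version("2.1.0"),
        # colData subtype corrections for OV and SKCM (added in 2.1.1)
        ov_skcm_subtype_fixes = v >= numeric_version("2.1.1")
    )
}

str(version_features("2.0.1"))
```

Comparing with `numeric_version()` rather than plain strings avoids mis-ordering versions such as "1.1.38" and "1.1.4".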
Examples
curatedTCGAData(
    diseaseCode = c("GBM", "ACC"), assays = "CNASNP", version = "2.0.1"
)
#> See '?curatedTCGAData' for 'diseaseCode' and 'assays' inputs
#> Querying EH with: ACC_CNASNP-20160128
#> Querying EH with: GBM_CNASNP-20160128
#>    ah_id               title file_size       rdataclass rdatadateadded
#> 1 EH4737 ACC_CNASNP-20160128    0.8 Mb RaggedExperiment     2021-01-27
#> 2 EH4875 GBM_CNASNP-20160128    5.2 Mb RaggedExperiment     2021-01-27
#>   rdatadateremoved
#> 1             <NA>
#> 2             <NA>
curatedTCGAData("BRCA", "GISTIC*", "2.0.1")
#> See '?curatedTCGAData' for 'diseaseCode' and 'assays' inputs
#> Querying EH with: BRCA_GISTIC_AllByGene-20160128
#> Querying EH with: BRCA_GISTIC_Peaks-20160128
#> Querying EH with: BRCA_GISTIC_ThresholdedByGene-20160128
#>    ah_id                                  title file_size
#> 1 EH4773         BRCA_GISTIC_AllByGene-20160128    1.2 Mb
#> 2 EH4774             BRCA_GISTIC_Peaks-20160128      0 Mb
#> 3 EH4775 BRCA_GISTIC_ThresholdedByGene-20160128    0.3 Mb
#>                   rdataclass rdatadateadded rdatadateremoved
#> 1       SummarizedExperiment     2021-01-27             <NA>
#> 2 RangedSummarizedExperiment     2021-01-27             <NA>
#> 3       SummarizedExperiment     2021-01-27             <NA>