Functions to convert row annotations to ranges and RaggedExperiment to RangedSummarizedExperiment
Source: R/simplifyTCGA.R
simplifyTCGA.Rd
This group of functions converts row annotations, given as either gene symbols or miRNA symbols, to row ranges based on the 'TxDb' and 'org.Hs' database packages. It also simplifies the representation of RaggedExperiment objects to RangedSummarizedExperiment.
Usage
simplifyTCGA(obj, keep.assay = FALSE, unmapped = TRUE)
symbolsToRanges(obj, keep.assay = FALSE, unmapped = TRUE)
mirToRanges(obj, keep.assay = FALSE, unmapped = TRUE)
CpGtoRanges(obj, keep.assay = FALSE, unmapped = TRUE)
qreduceTCGA(obj, keep.assay = FALSE, suffix = "_simplified")
Arguments
- obj
A MultiAssayExperiment object obtained from curatedTCGAData
- keep.assay
logical (default FALSE) Whether to keep the SummarizedExperiment assays that have been converted to RangedSummarizedExperiment
- unmapped
logical (default TRUE) Whether to include an assay containing the data that could not be mapped to the reference database
- suffix
character (default "_simplified") A character string to append to the name of each newly modified assay for qreduceTCGA.
Value
A MultiAssayExperiment with any gene expression, miRNA, copy number, and mutation assays converted to RangedSummarizedExperiment objects
Details
The original SummarizedExperiment containing either gene symbol or miR annotations is replaced or supplemented by a RangedSummarizedExperiment for those annotations that could be mapped to GRanges, and optionally another SummarizedExperiment for annotations that could not be mapped to GRanges.
qreduceTCGA
Using TxDb.Hsapiens.UCSC.hg19.knownGene as the reference, qreduceTCGA reduces the data by applying either the weightedmean or nonsilent function (see below) to non-mutation or mutation data, respectively. Internally, it uses RaggedExperiment::qreduceAssay() to reduce the ranges to the gene level.
qreduceTCGA will update genome(x) based on the NCBI reference annotation, which includes the patch number (e.g., GRCh37.p14), as provided by the seqlevelsStyle setter, seqlevelsStyle(gn) <- "NCBI". qreduceTCGA uses the NCBI genome annotation as the default reference.
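As a small illustration of the seqlevelsStyle setter mentioned above, the following sketch renames the sequence levels of a toy GRanges from UCSC to NCBI style. The object gn and its coordinates are made up for illustration, and no genome is set on it, so this shows only the renaming step and not everything qreduceTCGA does internally.

```r
library(GenomicRanges)
library(GenomeInfoDb)

## Toy ranges with UCSC-style sequence names (hypothetical coordinates)
gn <- GRanges(c("chr1:100-200", "chrX:500-900"))
seqlevels(gn)

## Switch to NCBI-style names ("chr1" becomes "1", "chrX" becomes "X")
seqlevelsStyle(gn) <- "NCBI"
seqlevels(gn)
```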
RaggedExperiment mutation objects become a genes by patients RangedSummarizedExperiment object containing '1' if there is a non-silent mutation somewhere in the gene, and '0' otherwise, as obtained from the Variant_Classification column in the data.
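The non-silent rule described above can be sketched as a reducer with the same three-argument shape as weightedmean. This is a minimal stand-in written for illustration, assuming that the scores are Variant_Classification values and that silent mutations are labelled "Silent"; the package's own nonsilent function may differ in detail.

```r
## Hypothetical stand-in for the 'nonsilent' reducer: TRUE when any
## mutation overlapping the gene is classified as something other
## than "Silent" (ranges and qranges are unused in this sketch)
nonsilent <- function(scores, ranges, qranges)
    any(scores != "Silent")

## Toy Variant_Classification values for mutations overlapping one gene
hit <- nonsilent(c("Silent", "Missense_Mutation"))
miss <- nonsilent(c("Silent", "Silent"))

## Encode as 1 (non-silent mutation present) or 0 (otherwise)
as.integer(c(hit, miss))
```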
weightedmean <- function(scores, ranges, qranges) {
    isects <- GenomicRanges::pintersect(ranges, qranges)
    sum(scores * BiocGenerics::width(isects)) /
        sum(BiocGenerics::width(isects))
}
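To make the weightedmean function concrete, here is a small worked example on toy data. The segment coordinates, scores, and query range are made up, and the query range is assumed to be supplied once per overlapping segment, so that pintersect() can pair them element by element.

```r
library(GenomicRanges)

weightedmean <- function(scores, ranges, qranges) {
    isects <- GenomicRanges::pintersect(ranges, qranges)
    sum(scores * BiocGenerics::width(isects)) /
        sum(BiocGenerics::width(isects))
}

## Two overlapping copy number segments with made-up scores
ranges <- GRanges(c("chr1:1-10", "chr1:6-20"))
scores <- c(2, 4)

## Hypothetical gene range, repeated once per segment
qranges <- rep(GRanges("chr1:5-15"), 2)

## Overlap widths are 6 (positions 5-10) and 10 (positions 6-15), so
## the result is (2 * 6 + 4 * 10) / (6 + 10) = 3.25
wm <- weightedmean(scores, ranges, qranges)
wm
```

Each score is weighted by the number of bases its segment shares with the gene, so a segment that barely touches the gene contributes little to the gene-level value.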
"CNA" and "CNV" segmented copy number are reduced using a weighted mean in the rare cases of overlapping (non-disjoint) copy number regions.
These functions rely on TxDb.Hsapiens.UCSC.hg19.knownGene and org.Hs.eg.db to map to the 'hg19' NCBI build. Use the liftOver procedure for datasets that are provided against a different reference genome (usually 'hg18'). See an example in the vignette.
Examples
library(TCGAutils)
library(curatedTCGAData)
library(GenomeInfoDb)
accmae <-
    curatedTCGAData(diseaseCode = "ACC",
        assays = c("CNASNP", "Mutation", "miRNASeqGene", "GISTICT"),
        version = "1.1.38",
        dry.run = FALSE)
#> Querying and downloading: ACC_CNASNP-20160128
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> loading from cache
#> Querying and downloading: ACC_miRNASeqGene-20160128
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> loading from cache
#> Querying and downloading: ACC_Mutation-20160128
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> loading from cache
#> Querying and downloading: ACC_colData-20160128
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> loading from cache
#> Querying and downloading: ACC_metadata-20160128
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> loading from cache
#> Querying and downloading: ACC_sampleMap-20160128
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> loading from cache
#> harmonizing input:
#> removing 655 sampleMap rows not in names(experiments)
## update genome annotation
rex <- accmae[["ACC_Mutation-20160128"]]
## Translate build to "hg19"
tgenome <- vapply(genome(rex), translateBuild, character(1L))
genome(rex) <- tgenome
accmae[["ACC_Mutation-20160128"]] <- rex
simplifyTCGA(accmae)
#> 403 genes were dropped because they have exons located on both strands of the
#> same reference sequence or on more than one reference sequence, so cannot be
#> represented by a single genomic range.
#> Use 'single.strand.genes.only=FALSE' to get all the genes in a GRangesList
#> object, or use suppressMessages() to suppress this message.
#> Warning: cannot switch some hg19's seqlevels from UCSC to NCBI style
#> 'select()' returned 1:1 mapping between keys and columns
#> Warning: The 2 combined objects have no sequence levels in common. (Use
#> suppressWarnings() to suppress this warning.)
#> Warning: more than one seqlevels style supplied, using the 1st one only
#> Warning: cannot switch some hg19's seqlevels from UCSC to NCBI style
#> Warning: 'experiments' dropped; see 'drops()'
#> harmonizing input:
#> removing 270 sampleMap rows not in names(experiments)
#> Warning: cannot switch some hg19's seqlevels from UCSC to NCBI style
#> harmonizing input:
#> removing 80 sampleMap rows not in names(experiments)
#> A MultiAssayExperiment object of 4 listed
#> experiments with user-defined names and respective classes.
#> Containing an ExperimentList class object of length 4:
#> [1] ACC_Mutation-20160128_simplified: RangedSummarizedExperiment with 22912 rows and 90 columns
#> [2] ACC_CNASNP-20160128_simplified: RangedSummarizedExperiment with 22912 rows and 180 columns
#> [3] ACC_miRNASeqGene-20160128_ranged: RangedSummarizedExperiment with 1002 rows and 80 columns
#> [4] ACC_miRNASeqGene-20160128_unranged: SummarizedExperiment with 44 rows and 80 columns
#> Functionality:
#> experiments() - obtain the ExperimentList instance
#> colData() - the primary/phenotype DataFrame
#> sampleMap() - the sample coordination DataFrame
#> `$`, `[`, `[[` - extract colData columns, subset, or experiment
#> *Format() - convert into a long or wide DataFrame
#> assays() - convert ExperimentList to a SimpleList of matrices
#> exportClass() - save data to flat files