Skip to contents

Additional helper functions for cleaning and uncovering metadata within a downloaded MultiAssayExperiment from curatedTCGAData.

Usage

getSubtypeMap(multiassayexperiment)

getClinicalNames(diseaseCode)

TCGAsplitAssays(multiassayexperiment, sampleCodes = NULL, exclusive = FALSE)

sampleTables(multiassayexperiment, vial = FALSE)

Arguments

multiassayexperiment

A MultiAssayExperiment object

diseaseCode

A TCGA cancer code (e.g., "BRCA")

sampleCodes

character (default NULL) A string of sample type codes (refer to data(sampleTypes); TCGAsplitAssays section)

exclusive

logical (default FALSE) Whether to return only assays that contain all codes in sampleCodes

vial

(logical default FALSE) whether to display vials in the table output

Value

  • getSubtypeMap: A data.frame with explanatory names and their in-data variable names. They may not be present for all cancer types.

  • getClinicalNames: A vector of common variable names that may be found across several cancer disease codes.

Details

Note that for getSubtypeMap, the column of in-data variable names may need to go through make.names to be found in the colData of the MultiAssayExperiment.

getSubtypeMap

provides a two column data.frame with interpreted names and in-data variable names. 'Name' usually refers to the colData row names a.k.a. the patientID.

getClinicalNames

provides a vector of common variable names that exist in the colData DataFrame of a curatedTCGAData MultiAssayExperiment object. These variables are directly obtained from the BroadFirehose clinical data (downloaded with getFirehoseData) and tend to be present across cancer disease codes.

TCGAsplitAssays

Separates samples by indicated sample codes into different assays in a MultiAssayExperiment. Refer to the sampleTypes data object for a list of available codes. This operation generates n times the number of assays based on the number of sample codes entered. By default, all assays will be split by samples present in the data.

sampleTables

Display all the available samples in each of the assays

Examples


library(curatedTCGAData)

gbm <- curatedTCGAData("GBM", c("RPPA*", "CNA*"), version = "2.0.1", FALSE)
#> Querying and downloading: GBM_CNACGH_CGH_hg_244a-20160128
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> loading from cache
#> Querying and downloading: GBM_CNACGH_CGH_hg_415k_g4124a-20160128
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> loading from cache
#> Querying and downloading: GBM_CNASNP-20160128
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> loading from cache
#> Querying and downloading: GBM_RPPAArray-20160128
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> loading from cache
#> Querying and downloading: GBM_colData-20160128
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> loading from cache
#> Querying and downloading: GBM_metadata-20160128
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> loading from cache
#> Querying and downloading: GBM_sampleMap-20160128
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> loading from cache
#> harmonizing input:
#>   removing 5922 sampleMap rows not in names(experiments)

getSubtypeMap(gbm)
#>          GBM_annotations                          GBM_subtype
#> 1             Patient_ID                                 Case
#> 2   methylation_subtypes                 MGMT promoter status
#> 3      mutation_subtypes                    IDH/codel subtype
#> 4  histological_subtypes                            Histology
#> 5          mrna_subtypes                     Original Subtype
#> 6          mrna_subtypes                Transcriptome Subtype
#> 7          mrna_subtypes    Pan-Glioma RNA Expression Cluster
#> 8          mrna_subtypes  IDH-specific RNA Expression Cluster
#> 9   methylation_subtypes   Pan-Glioma DNA Methylation Cluster
#> 10  methylation_subtypes IDH-specific DNA Methylation Cluster
#> 11  methylation_subtypes   Supervised DNA Methylation Cluster
#> 12  methylation_subtypes          Random Forest Sturm Cluster
#> 13      protein_subtypes                         RPPA cluster

sampleTables(gbm)
#> $`GBM_CNACGH_CGH_hg_244a-20160128`
#> 
#>  01  10  11 
#> 267 145  26 
#> 
#> $`GBM_CNACGH_CGH_hg_415k_g4124a-20160128`
#> 
#>  01  10 
#> 169 169 
#> 
#> $`GBM_CNASNP-20160128`
#> 
#>  01  02  10  11 
#> 577  13 488  26 
#> 
#> $`GBM_RPPAArray-20160128`
#> 
#>  01  02 
#> 233  11 
#> 

TCGAsplitAssays(gbm, c("01", "10"))
#> Warning: Some 'sampleCodes' not found in assays
#> Warning: Inconsistent barcode lengths: 28, 27
#> A MultiAssayExperiment object of 7 listed
#>  experiments with user-defined names and respective classes.
#>  Containing an ExperimentList class object of length 7:
#>  [1] 01_GBM_CNACGH_CGH_hg_244a-20160128: RaggedExperiment with 81512 rows and 267 columns
#>  [2] 10_GBM_CNACGH_CGH_hg_244a-20160128: RaggedExperiment with 81512 rows and 145 columns
#>  [3] 01_GBM_CNACGH_CGH_hg_415k_g4124a-20160128: RaggedExperiment with 57975 rows and 169 columns
#>  [4] 10_GBM_CNACGH_CGH_hg_415k_g4124a-20160128: RaggedExperiment with 57975 rows and 169 columns
#>  [5] 01_GBM_CNASNP-20160128: RaggedExperiment with 602338 rows and 577 columns
#>  [6] 10_GBM_CNASNP-20160128: RaggedExperiment with 602338 rows and 488 columns
#>  [7] 01_GBM_RPPAArray-20160128: SummarizedExperiment with 208 rows and 233 columns
#> Functionality:
#>  experiments() - obtain the ExperimentList instance
#>  colData() - the primary/phenotype DataFrame
#>  sampleMap() - the sample coordination DataFrame
#>  `$`, `[`, `[[` - extract colData columns, subset, or experiment
#>  *Format() - convert into a long or wide DataFrame
#>  assays() - convert ExperimentList to a SimpleList of matrices
#>  exportClass() - save data to flat files

getClinicalNames("COAD")
#>  [1] "years_to_birth"                      
#>  [2] "vital_status"                        
#>  [3] "days_to_death"                       
#>  [4] "days_to_last_followup"               
#>  [5] "tumor_tissue_site"                   
#>  [6] "pathologic_stage"                    
#>  [7] "pathology_T_stage"                   
#>  [8] "pathology_N_stage"                   
#>  [9] "pathology_M_stage"                   
#> [10] "gender"                              
#> [11] "date_of_initial_pathologic_diagnosis"
#> [12] "days_to_last_known_alive"            
#> [13] "radiation_therapy"                   
#> [14] "histological_type"                   
#> [15] "residual_tumor"                      
#> [16] "number_of_lymph_nodes"               
#> [17] "race"                                
#> [18] "ethnicity"