This function removes colData variables that have a high proportion of missing values and that contain keywords.
Usage
trimColData(
  multiassayexperiment,
  maxNAfrac = 0.2,
  keystring = c("portion", "analyte")
)
Arguments
- multiassayexperiment
A MultiAssayExperiment object with colData
- maxNAfrac
(numeric, default 0.2) A decimal between 0 and 1 giving the maximum fraction of NA values allowed per column
- keystring
(character) A vector of keywords used to match variables for removal
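
The way maxNAfrac and keystring interact can be sketched in base R on a plain data.frame. This is only an illustration, not the TCGAutils implementation: the helper name trim_columns_sketch and the toy clinical table are hypothetical, and the sketch assumes that keywords are matched against column names and that either criterion alone is enough to drop a column.

```r
# Hypothetical helper illustrating the two filters on a plain data.frame.
# Assumptions (not confirmed by the package documentation): keywords are
# matched against column names, and a column is dropped if it fails either test.
trim_columns_sketch <- function(df, maxNAfrac = 0.2,
                                keystring = c("portion", "analyte")) {
    stopifnot(is.data.frame(df), nrow(df) > 0,
              maxNAfrac >= 0, maxNAfrac <= 1)
    # Fraction of NA values in each column
    na_frac <- colMeans(is.na(df))
    too_sparse <- na_frac > maxNAfrac
    # Flag columns whose names contain any of the keywords
    has_keyword <- if (length(keystring) > 0) {
        grepl(paste(keystring, collapse = "|"), names(df))
    } else {
        rep(FALSE, ncol(df))
    }
    df[, !(too_sparse | has_keyword), drop = FALSE]
}

# Toy clinical table (made-up values, for illustration only)
clinical <- data.frame(
    patientID      = c("P1", "P2", "P3", "P4", "P5", "P6"),
    years_to_birth = c(44, 50, NA, 56, 40, 61),       # 1/6 NA: kept
    days_to_death  = c(358, NA, NA, 558, NA, NA),     # 4/6 NA: dropped
    portion_weight = c(10, 20, 30, 40, 50, 60),       # matches "portion": dropped
    stringsAsFactors = FALSE
)

trimmed <- trim_columns_sketch(clinical)
names(trimmed)
#> [1] "patientID"      "years_to_birth"
```

Raising maxNAfrac in the sketch (for example to 0.7) would retain days_to_death, while portion_weight would still be dropped by the keyword filter.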
Value
A MultiAssayExperiment object
Examples
example(getSubtypeMap)
#>
#> gtSbtM> library(curatedTCGAData)
#>
#> gtSbtM> gbm <- curatedTCGAData("GBM", c("RPPA*", "CNA*"), version = "2.0.1", FALSE)
#> Querying and downloading: GBM_CNACGH_CGH_hg_244a-20160128
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> loading from cache
#> Querying and downloading: GBM_CNACGH_CGH_hg_415k_g4124a-20160128
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> loading from cache
#> Querying and downloading: GBM_CNASNP-20160128
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> loading from cache
#> Querying and downloading: GBM_RPPAArray-20160128
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> loading from cache
#> Querying and downloading: GBM_colData-20160128
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> loading from cache
#> Querying and downloading: GBM_metadata-20160128
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> loading from cache
#> Querying and downloading: GBM_sampleMap-20160128
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> loading from cache
#> harmonizing input:
#> removing 5922 sampleMap rows not in names(experiments)
#>
#> gtSbtM> getSubtypeMap(gbm)
#> GBM_annotations GBM_subtype
#> 1 Patient_ID Case
#> 2 methylation_subtypes MGMT promoter status
#> 3 mutation_subtypes IDH/codel subtype
#> 4 histological_subtypes Histology
#> 5 mrna_subtypes Original Subtype
#> 6 mrna_subtypes Transcriptome Subtype
#> 7 mrna_subtypes Pan-Glioma RNA Expression Cluster
#> 8 mrna_subtypes IDH-specific RNA Expression Cluster
#> 9 methylation_subtypes Pan-Glioma DNA Methylation Cluster
#> 10 methylation_subtypes IDH-specific DNA Methylation Cluster
#> 11 methylation_subtypes Supervised DNA Methylation Cluster
#> 12 methylation_subtypes Random Forest Sturm Cluster
#> 13 protein_subtypes RPPA cluster
#>
#> gtSbtM> sampleTables(gbm)
#> $`GBM_CNACGH_CGH_hg_244a-20160128`
#>
#> 01 10 11
#> 267 145 26
#>
#> $`GBM_CNACGH_CGH_hg_415k_g4124a-20160128`
#>
#> 01 10
#> 169 169
#>
#> $`GBM_CNASNP-20160128`
#>
#> 01 02 10 11
#> 577 13 488 26
#>
#> $`GBM_RPPAArray-20160128`
#>
#> 01 02
#> 233 11
#>
#>
#> gtSbtM> TCGAsplitAssays(gbm, c("01", "10"))
#> Warning: Some 'sampleCodes' not found in assays
#> Warning: Inconsistent barcode lengths: 28, 27
#> A MultiAssayExperiment object of 7 listed
#> experiments with user-defined names and respective classes.
#> Containing an ExperimentList class object of length 7:
#> [1] 01_GBM_CNACGH_CGH_hg_244a-20160128: RaggedExperiment with 81512 rows and 267 columns
#> [2] 10_GBM_CNACGH_CGH_hg_244a-20160128: RaggedExperiment with 81512 rows and 145 columns
#> [3] 01_GBM_CNACGH_CGH_hg_415k_g4124a-20160128: RaggedExperiment with 57975 rows and 169 columns
#> [4] 10_GBM_CNACGH_CGH_hg_415k_g4124a-20160128: RaggedExperiment with 57975 rows and 169 columns
#> [5] 01_GBM_CNASNP-20160128: RaggedExperiment with 602338 rows and 577 columns
#> [6] 10_GBM_CNASNP-20160128: RaggedExperiment with 602338 rows and 488 columns
#> [7] 01_GBM_RPPAArray-20160128: SummarizedExperiment with 208 rows and 233 columns
#> Functionality:
#> experiments() - obtain the ExperimentList instance
#> colData() - the primary/phenotype DataFrame
#> sampleMap() - the sample coordination DataFrame
#> `$`, `[`, `[[` - extract colData columns, subset, or experiment
#> *Format() - convert into a long or wide DataFrame
#> assays() - convert ExperimentList to a SimpleList of matrices
#> exportClass() - save data to flat files
#>
#> gtSbtM> getClinicalNames("COAD")
#> [1] "years_to_birth"
#> [2] "vital_status"
#> [3] "days_to_death"
#> [4] "days_to_last_followup"
#> [5] "tumor_tissue_site"
#> [6] "pathologic_stage"
#> [7] "pathology_T_stage"
#> [8] "pathology_N_stage"
#> [9] "pathology_M_stage"
#> [10] "gender"
#> [11] "date_of_initial_pathologic_diagnosis"
#> [12] "days_to_last_known_alive"
#> [13] "radiation_therapy"
#> [14] "histological_type"
#> [15] "residual_tumor"
#> [16] "number_of_lymph_nodes"
#> [17] "race"
#> [18] "ethnicity"
(gbm_trimmed <- trimColData(gbm))
#> A MultiAssayExperiment object of 4 listed
#> experiments with user-defined names and respective classes.
#> Containing an ExperimentList class object of length 4:
#> [1] GBM_CNACGH_CGH_hg_244a-20160128: RaggedExperiment with 81512 rows and 438 columns
#> [2] GBM_CNACGH_CGH_hg_415k_g4124a-20160128: RaggedExperiment with 57975 rows and 338 columns
#> [3] GBM_CNASNP-20160128: RaggedExperiment with 602338 rows and 1104 columns
#> [4] GBM_RPPAArray-20160128: SummarizedExperiment with 208 rows and 244 columns
#> Functionality:
#> experiments() - obtain the ExperimentList instance
#> colData() - the primary/phenotype DataFrame
#> sampleMap() - the sample coordination DataFrame
#> `$`, `[`, `[[` - extract colData columns, subset, or experiment
#> *Format() - convert into a long or wide DataFrame
#> assays() - convert ExperimentList to a SimpleList of matrices
#> exportClass() - save data to flat files
head(colData(gbm_trimmed))[1:5]
#> DataFrame with 6 rows and 5 columns
#> patientID years_to_birth vital_status days_to_death
#> <character> <integer> <integer> <integer>
#> TCGA-02-0001 TCGA-02-0001 44 1 358
#> TCGA-02-0003 TCGA-02-0003 50 1 144
#> TCGA-02-0004 TCGA-02-0004 59 1 345
#> TCGA-02-0006 TCGA-02-0006 56 1 558
#> TCGA-02-0007 TCGA-02-0007 40 1 705
#> TCGA-02-0009 TCGA-02-0009 61 1 322
#> tumor_tissue_site
#> <character>
#> TCGA-02-0001 brain
#> TCGA-02-0003 brain
#> TCGA-02-0004 brain
#> TCGA-02-0006 brain
#> TCGA-02-0007 brain
#> TCGA-02-0009 brain