This vignette describes the two main user-facing functions for downloading data from cBioPortal and representing it in R. cBioDataPack uses cBioPortal's legacy data distribution method (pre-packaged study tarballs). The cBioPortalData function takes a more flexible approach: it requests data through the cBioPortal API based on several parameters, including the molecular profiles available for a study.
cBioDataPack downloads the packaged study data from cBioPortal and returns an integrative MultiAssayExperiment representation.
## Use ask=FALSE for non-interactive use
cBioDataPack("laml_tcga", ask = FALSE)
## A MultiAssayExperiment object of 13 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 13:
## [1] cna_hg19.seg: RaggedExperiment with 13571 rows and 191 columns
## [2] CNA: SummarizedExperiment with 24776 rows and 191 columns
## [3] linear_CNA: SummarizedExperiment with 24776 rows and 191 columns
## [4] methylation_hm27: SummarizedExperiment with 10919 rows and 194 columns
## [5] methylation_hm450: SummarizedExperiment with 10919 rows and 194 columns
## [6] mutations_extended: RaggedExperiment with 2584 rows and 197 columns
## [7] mutations_mskcc: RaggedExperiment with 2584 rows and 197 columns
## [8] RNA_Seq_expression_median: SummarizedExperiment with 19720 rows and 179 columns
## [9] RNA_Seq_mRNA_median_all_sample_Zscores: SummarizedExperiment with 19720 rows and 179 columns
## [10] RNA_Seq_mRNA_median_Zscores: SummarizedExperiment with 19719 rows and 179 columns
## [11] RNA_Seq_v2_expression_median: SummarizedExperiment with 20531 rows and 173 columns
## [12] RNA_Seq_v2_mRNA_median_all_sample_Zscores: SummarizedExperiment with 20531 rows and 173 columns
## [13] RNA_Seq_v2_mRNA_median_Zscores: SummarizedExperiment with 20440 rows and 173 columns
## Functionality:
## experiments() - obtain the ExperimentList instance
## colData() - the primary/phenotype DataFrame
## sampleMap() - the sample coordination DataFrame
## `$`, `[`, `[[` - extract colData columns, subset, or experiment
## *Format() - convert into a long or wide DataFrame
## assays() - convert ExperimentList to a SimpleList of matrices
## exportClass() - save all data to files
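The printout above lists the accessors that work on the returned object. The sketch below exercises a few of them on the laml_tcga study. It assumes the study has already been downloaded once, so that cBioDataPack reads it from the local cache, and that MultiAssayExperiment and SummarizedExperiment are installed. The object name laml and the choice of the CNA assay are illustrative only.

```r
library(cBioPortalData)
library(MultiAssayExperiment)
library(SummarizedExperiment)

# Served from the local cache after the first download
laml <- cBioDataPack("laml_tcga", ask = FALSE)

# Names of the assays listed in the printout (cna_hg19.seg, CNA, ...)
names(experiments(laml))

# Patient-level clinical data: one row per patient
dim(colData(laml))

# How each assay column (sample) maps back to a patient
head(sampleMap(laml))

# Extract a single assay by name; CNA is a SummarizedExperiment
cna <- laml[["CNA"]]
assay(cna)[1:5, 1:3]

# Keep only patients that have data in every assay
laml_complete <- laml[, complete.cases(laml), ]
```

complete.cases() on a MultiAssayExperiment flags patients with measurements in every assay, so laml_complete keeps only those patients; with many assays this subset can be much smaller than the full cohort.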
The cBioPortalData function provides a more flexible and granular way to request a MultiAssayExperiment object by study ID, molecular profile, gene panel, and sample list. It requires an API object created with cBioPortal().
cbio <- cBioPortal()
acc <- cBioPortalData(api = cbio, by = "hugoGeneSymbol", studyId = "acc_tcga",
    genePanelId = "IMPACT341",
    molecularProfileIds = c("acc_tcga_rppa", "acc_tcga_linear_CNA")
)
## harmonizing input:
## removing 1 colData rownames not in sampleMap 'primary'
acc
## A MultiAssayExperiment object of 2 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 2:
## [1] acc_tcga_rppa: SummarizedExperiment with 57 rows and 46 columns
## [2] acc_tcga_linear_CNA: SummarizedExperiment with 339 rows and 90 columns
## Functionality:
## experiments() - obtain the ExperimentList instance
## colData() - the primary/phenotype DataFrame
## sampleMap() - the sample coordination DataFrame
## `$`, `[`, `[[` - extract colData columns, subset, or experiment
## *Format() - convert into a long or wide DataFrame
## assays() - convert ExperimentList to a SimpleList of matrices
## exportClass() - save all data to files
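Valid values for studyId, molecularProfileIds, and genePanelId can be looked up through the same API object before calling cBioPortalData. This is a minimal sketch, assuming the discovery helpers getStudies(), molecularProfiles(), genePanels(), and sampleLists() exported by cBioPortalData, and assuming each returns a table whose ID column is named as shown; column names may differ between package versions. Like the example above, it needs network access to the cBioPortal API.

```r
library(cBioPortalData)

cbio <- cBioPortal()

# All studies hosted on the portal; studyId values such as "acc_tcga"
studies <- getStudies(cbio)
head(studies$studyId)

# Molecular profiles for one study, e.g. "acc_tcga_rppa"
# (assumed column name: molecularProfileId)
profiles <- molecularProfiles(cbio, studyId = "acc_tcga")
profiles$molecularProfileId

# Gene panels known to the portal, e.g. "IMPACT341"
# (assumed column name: genePanelId)
panels <- genePanels(cbio)
head(panels$genePanelId)

# Named sample lists defined for the study
# (assumed column name: sampleListId)
lists <- sampleLists(cbio, studyId = "acc_tcga")
lists$sampleListId
```

Any of the returned IDs can then be passed to cBioPortalData in place of the values used above.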
If a download is interrupted, the cache may become corrupted. To clear the cache for a particular study, use the removeCache function. Note that removeCache only works for data downloaded with the cBioDataPack function.
removeCache("laml_tcga")
To clear the entire cBioPortalData cache, delete the cache directory:
## recursive = TRUE is required: unlink() does not delete directories otherwise
unlink("~/.cache/cBioPortalData/", recursive = TRUE)
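The hard-coded path above matches the Linux system shown in the session information; on other platforms the cache may live elsewhere. A more portable sketch looks the directory up with rappdirs::user_cache_dir(). This assumes cBioPortalData keeps its cache in rappdirs' default user cache directory for the application name "cBioPortalData" (which resolves to ~/.cache/cBioPortalData on Linux), so check the printed path before relying on it.

```r
# Assumption: cBioPortalData stores its cache in rappdirs' default
# user cache directory for the application name "cBioPortalData"
cache_dir <- rappdirs::user_cache_dir("cBioPortalData")
message("cBioPortalData cache directory: ", cache_dir)

if (dir.exists(cache_dir)) {
    # recursive = TRUE lets unlink() remove the directory and its contents
    unlink(cache_dir, recursive = TRUE)
}
```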
sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04 LTS
##
## Matrix products: default
## BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] cBioPortalData_2.2.11 MultiAssayExperiment_1.16.0
## [3] SummarizedExperiment_1.20.0 Biobase_2.50.0
## [5] GenomicRanges_1.42.0 GenomeInfoDb_1.26.7
## [7] IRanges_2.24.1 S4Vectors_0.28.1
## [9] BiocGenerics_0.36.1 MatrixGenerics_1.2.1
## [11] matrixStats_0.58.0 AnVIL_1.2.0
## [13] dplyr_1.0.5 BiocStyle_2.18.1
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-6 fs_1.5.0
## [3] bit64_4.0.5 progress_1.2.2
## [5] httr_1.4.2 rprojroot_2.0.2
## [7] GenomicDataCommons_1.14.0 tools_4.0.3
## [9] utf8_1.2.1 R6_2.5.0
## [11] DBI_1.1.1 withr_2.4.2
## [13] prettyunits_1.1.1 tidyselect_1.1.0
## [15] TCGAutils_1.10.1 bit_4.0.4
## [17] curl_4.3 compiler_4.0.3
## [19] cli_2.4.0 rvest_1.0.0
## [21] textshaping_0.3.3 formatR_1.9
## [23] xml2_1.3.2 desc_1.3.0
## [25] DelayedArray_0.16.3 rtracklayer_1.50.0
## [27] bookdown_0.21 readr_1.4.0
## [29] askpass_1.1 rappdirs_0.3.3
## [31] pkgdown_1.6.1 rapiclient_0.1.3
## [33] RCircos_1.2.1 Rsamtools_2.6.0
## [35] systemfonts_1.0.1 stringr_1.4.0
## [37] digest_0.6.27 rmarkdown_2.7
## [39] XVector_0.30.0 pkgconfig_2.0.3
## [41] htmltools_0.5.1.1 dbplyr_2.1.1
## [43] fastmap_1.1.0 limma_3.46.0
## [45] rlang_0.4.10 rstudioapi_0.13
## [47] RSQLite_2.2.6 generics_0.1.0
## [49] jsonlite_1.7.2 BiocParallel_1.24.1
## [51] RCurl_1.98-1.3 magrittr_2.0.1
## [53] GenomeInfoDbData_1.2.4 futile.logger_1.4.3
## [55] Matrix_1.3-2 Rcpp_1.0.6
## [57] fansi_0.4.2 lifecycle_1.0.0
## [59] stringi_1.5.3 yaml_2.2.1
## [61] RaggedExperiment_1.14.2 RJSONIO_1.3-1.4
## [63] zlibbioc_1.36.0 BiocFileCache_1.14.0
## [65] grid_4.0.3 blob_1.2.1
## [67] crayon_1.4.1 lattice_0.20-41
## [69] Biostrings_2.58.0 splines_4.0.3
## [71] GenomicFeatures_1.42.3 hms_1.0.0
## [73] ps_1.6.0 knitr_1.32
## [75] pillar_1.6.0 codetools_0.2-18
## [77] biomaRt_2.46.3 futile.options_1.0.1
## [79] XML_3.99-0.6 glue_1.4.2
## [81] evaluate_0.14 lambda.r_1.2.4
## [83] data.table_1.14.0 BiocManager_1.30.12
## [85] vctrs_0.3.7 tidyr_1.1.3
## [87] openssl_1.4.3 purrr_0.3.4
## [89] assertthat_0.2.1 cachem_1.0.4
## [91] xfun_0.22 ragg_1.1.2
## [93] survival_3.2-10 tibble_3.1.1
## [95] RTCGAToolbox_2.20.0 GenomicAlignments_1.26.0
## [97] AnnotationDbi_1.52.0 memoise_2.0.0
## [99] ellipsis_0.3.1