The cBioDataPack function allows the user to download and process cancer study datasets found in MSKCC's cBioPortal. Output datasets use the MultiAssayExperiment data representation to facilitate analysis and data management operations.

  use_cache = TRUE,
  names.field = c("Hugo_Symbol", "Entrez_Gene_Id", "Gene"),
  ask = TRUE



character(1) The study identifier from cBioPortal as in


logical(1) (default TRUE) create the default cache location and use it to track downloaded data. If the data are found in the cache, they will not be re-downloaded. A path to the data cache location can also be provided.


A character vector of possible column names for the column that is used to label ranges from a mutations or copy number file.
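The idea of a priority list of candidate column names can be sketched in base R as below. This is only an illustration under stated assumptions: the toy data frame and the `label_col` variable are hypothetical, and whether the package resolves multiple matches in exactly this way is an assumption, not something this page confirms.

```r
# Candidate label columns, in order of preference (the documented default)
names.field <- c("Hugo_Symbol", "Entrez_Gene_Id", "Gene")

# Hypothetical stand-in for a parsed mutations or copy number file
toy <- data.frame(
  Entrez_Gene_Id = c(7157L, 672L),
  Gene = c("TP53", "BRCA1"),
  value = c(0.1, -0.3)
)

# intersect() keeps the order of its first argument, so the first hit
# is the most preferred candidate that is present in the file
hits <- intersect(names.field, colnames(toy))
label_col <- hits[1]

# Assumed usage: take the labels from the selected column
labels <- toy[[label_col]]
```

Here "Hugo_Symbol" is absent from the toy data, so the sketch falls back to the next candidate in the vector.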


A logical vector of length one indicating whether to prompt the user before downloading and loading the study as a MultiAssayExperiment. If TRUE, the user will be prompted to continue for studies that are not currently building as MultiAssayExperiment based on previous testing (in a non-interactive session, no data will be downloaded and built unless ask = FALSE).


A MultiAssayExperiment object


The list of datasets can be found in the studiesTable dataset by running data("studiesTable"). Some datasets may not be available for download and are not guaranteed to be represented as MultiAssayExperiment data objects. In a random sample of 100 study identifiers (drawn with set.seed(1234)), about 76 percent could be successfully represented as MultiAssayExperiment objects. Please refer to the website for the full list of available datasets. Users who would like to prioritize particular datasets should open GitHub issues at the URL in the DESCRIPTION file. For a more fine-grained approach to downloading data from the cBioPortal API, refer to the cBioPortalData function.
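Selecting study identifiers of interest from such a list can be sketched with base R pattern matching. The `ids` vector below is a hand-typed stand-in holding the six identifiers printed in the examples; in practice it would be taken from studiesTable[["cancer_study_id"]]. The "_tcga" suffix convention is inferred from those identifiers and is an assumption.

```r
# Stand-in for studiesTable[["cancer_study_id"]] (first six entries only)
ids <- c(
  "paac_jhu_2014", "mel_tsam_liang_2017", "all_stjude_2015",
  "all_stjude_2016", "aml_ohsu_2018", "laml_tcga"
)

# Keep identifiers that end in "_tcga"
tcga_ids <- grep("_tcga$", ids, value = TRUE)

# Keep identifiers that start with "all_"
all_ids <- ids[startsWith(ids, "all_")]
```

Because not every study is guaranteed to build as a MultiAssayExperiment, a filtered identifier is only a candidate for download, not a guarantee of success.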


The cBioDataPack function accesses data from the location given by the cBio_URL option. By default, it points to an Amazon S3 bucket location. Previously, it pointed to ''. This recent change (> 2.1.17) should provide faster and more reliable downloads for all users. See the URL using cBioPortalData:::.url_location. If there are mirrors that host this data, the location can be changed by setting the cBio_URL option with options() before running the function; the value is looked up as getOption("cBio_URL", "").
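Setting and then restoring the option can be sketched in base R as follows. The mirror URL is hypothetical and used only for illustration; no download is attempted here.

```r
# Hypothetical mirror location (an assumption, not a real host)
mirror <- "https://mirror.example.org/cbioportal-datahub/"

# options() returns the previous value(s), which makes restoring easy
old <- options(cBio_URL = mirror)

# Look up the option the same way the documentation describes
current <- getOption("cBio_URL", "")

# A call such as cBioDataPack("acc_tcga", ask = FALSE) would go here
# while the option is set

# Restore the previous setting (an unset option is removed again)
options(old)
```

Saving the return value of options() avoids leaving the session pointed at the mirror after the download step.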

See also


Levi Waldron, Marcel R., Ino dB.


data(studiesTable)
head(studiesTable[["cancer_study_id"]])
#> [1] "paac_jhu_2014" "mel_tsam_liang_2017" "all_stjude_2015"
#> [4] "all_stjude_2016" "aml_ohsu_2018" "laml_tcga"
# ask=FALSE for non-interactive use
mae <- cBioDataPack("acc_tcga", ask = FALSE)
#> Study file in cache: acc_tcga
#> Working on: /tmp/RtmpAQUKGa/8c26d4c20e2_acc_tcga/acc_tcga/data_cna_hg19.seg
#> Working on: /tmp/RtmpAQUKGa/8c26d4c20e2_acc_tcga/acc_tcga/data_CNA.txt
#> Working on: /tmp/RtmpAQUKGa/8c26d4c20e2_acc_tcga/acc_tcga/data_linear_CNA.txt
#> Working on: /tmp/RtmpAQUKGa/8c26d4c20e2_acc_tcga/acc_tcga/data_methylation_hm450.txt
#> Working on: /tmp/RtmpAQUKGa/8c26d4c20e2_acc_tcga/acc_tcga/data_mutations_extended.txt
#> Working on: /tmp/RtmpAQUKGa/8c26d4c20e2_acc_tcga/acc_tcga/data_mutations_mskcc.txt
#> Working on: /tmp/RtmpAQUKGa/8c26d4c20e2_acc_tcga/acc_tcga/data_mutsig.txt
#> Working on: /tmp/RtmpAQUKGa/8c26d4c20e2_acc_tcga/acc_tcga/data_RNA_Seq_v2_expression_median.txt
#> Working on: /tmp/RtmpAQUKGa/8c26d4c20e2_acc_tcga/acc_tcga/data_RNA_Seq_v2_mRNA_median_all_sample_Zscores.txt
#> Working on: /tmp/RtmpAQUKGa/8c26d4c20e2_acc_tcga/acc_tcga/data_RNA_Seq_v2_mRNA_median_Zscores.txt
#> Working on: /tmp/RtmpAQUKGa/8c26d4c20e2_acc_tcga/acc_tcga/data_rppa_Zscores.txt
#> Working on: /tmp/RtmpAQUKGa/8c26d4c20e2_acc_tcga/acc_tcga/data_rppa.txt
#> Warning: Multiple prefixes found, using keyword 'region' or taking first one
#> Warning: Multiple prefixes found, using keyword 'region' or taking first one