Use these functions only when a particular study is not currently available as a MultiAssayExperiment representation; otherwise, use cBioDataPack. Provide a cancer_study_id from the studiesTable to retrieve the study tarball from cBioPortal. cBioDataPack uses these functions under the hood to download, untar, and load the tarball datasets with caching.

As stated in ?cBioDataPack, not all studies currently work as MultiAssayExperiment objects. As of July 2020, about 80% of datasets can be successfully imported into the MultiAssayExperiment data class. Please open an issue if you would like the team to prioritize a study. You may also check studiesTable$pack_build for a more current status.
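The pack_build check mentioned above can be sketched in base R. The data frame below is a toy stand-in for studiesTable that carries only the two columns named in the text; the second row, the helper name pack_build_status, and the assumption that pack_build is a logical column are illustrative and not taken from the package.

```r
# Toy stand-in for studiesTable with only the two columns named in the text.
# The second row is a made-up placeholder, not a real study, and pack_build
# is assumed here to be a logical column.
toy_studies <- data.frame(
  cancer_study_id = c("acc_tcga", "hypothetical_study"),
  pack_build = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up the build status for one study identifier.
# Returns NA when the identifier is not present in the table.
pack_build_status <- function(studies, cancer_study_id) {
  studies$pack_build[match(cancer_study_id, studies$cancer_study_id)]
}

pack_build_status(toy_studies, "acc_tcga")
pack_build_status(toy_studies, "not_listed")
```

With the real table, the same lookup would indicate whether cBioDataPack is likely to succeed for a study or whether the step-by-step functions documented here are the better starting point.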

downloadStudy(
  cancer_study_id,
  use_cache = TRUE,
  force = FALSE,
  url_location = getOption("cBio_URL", .url_location)
)

untarStudy(cancer_study_file, exdir = tempdir())

loadStudy(filepath, names.field = c("Hugo_Symbol", "Entrez_Gene_Id", "Gene"))
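The usage above reads the default url_location from the cBio_URL option. The following sketch uses only base R to show how such an option can be set temporarily and restored; the mirror URL is a hypothetical placeholder, and the fallback value stands in for the package-internal .url_location.

```r
# Default location as documented for url_location.
default_url <- "https://cbioportal-datahub.s3.amazonaws.com"

# Hypothetical mirror, used purely for illustration.
mirror_url <- "https://example.org/datahub-mirror"

# options() returns the previous values, so they can be restored later.
old_opts <- options(cBio_URL = mirror_url)

# Same lookup pattern as in the usage: option first, fallback second.
active_url <- getOption("cBio_URL", default_url)

# Restore whatever was set before (or unset the option again).
options(old_opts)
restored_url <- getOption("cBio_URL", default_url)
```

Setting the option once per session avoids passing url_location to every call.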

Arguments

cancer_study_id

character(1) The study identifier from cBioPortal as in https://cbioportal.org/webAPI

use_cache

logical(1) (default TRUE) whether to create the default cache location and use it to track downloaded data. If the data are found in the cache, they will not be re-downloaded. A path to the data cache location can also be provided.

force

logical(1) (default FALSE) whether to force a re-download of the data from the remote location

url_location

character(1) (default "https://cbioportal-datahub.s3.amazonaws.com") the URL location for downloading packaged data. Can be set using the 'cBio_URL' option (see ?cBioDataPack for more details)

cancer_study_file

character(1) indicates the on-disk location of the downloaded tarball

exdir

character(1) indicates the folder location to put the contents of the tarball (default tempdir(); see also ?untar)

filepath

character(1) indicates the folder location where the contents of the tarball are located (usually the same as exdir)

names.field

A character vector of possible column names for the column that is used to label ranges from a mutations or copy number file.
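To illustrate what a vector of "possible column names" means for names.field, here is a minimal base R sketch. The helper pick_names_field is hypothetical, and the rule it applies (take the first candidate, in the order given, that appears among the file's columns) is an assumption made for illustration, not a description of the package's internal logic.

```r
# Hypothetical helper: return the first candidate from names.field that is
# present among the column names of a file, or NA if none match.
pick_names_field <- function(column_names,
                             names.field = c("Hugo_Symbol",
                                             "Entrez_Gene_Id",
                                             "Gene")) {
  hits <- names.field[names.field %in% column_names]
  if (length(hits) == 0L) NA_character_ else hits[[1L]]
}

# Candidate order decides, not the column order in the file.
pick_names_field(c("Entrez_Gene_Id", "Hugo_Symbol", "Start_Position"))

# Only one candidate present.
pick_names_field(c("Gene", "chrom", "loc.start"))
```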

Value

  • downloadStudy - The file location of the data tarball

  • untarStudy - The directory location of the contents

  • loadStudy - A MultiAssayExperiment-class object
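Because each function returns the input of the next, the three steps can be chained. The wrapper below is a sketch, assuming the cBioPortalData package is installed and the network is reachable; the name load_study_or_null and the choice to return NULL on failure are illustrative. It may be useful given that not every study can be imported as a MultiAssayExperiment.

```r
# Hypothetical wrapper: download, untar, and load a study in one call.
# Any error in the three steps is reported as a message and NULL is returned.
load_study_or_null <- function(cancer_study_id, exdir = tempdir()) {
  tryCatch({
    # File location of the data tarball
    tarball <- cBioPortalData::downloadStudy(cancer_study_id)
    # Directory location of the extracted contents
    study_dir <- cBioPortalData::untarStudy(tarball, exdir = exdir)
    # MultiAssayExperiment object
    cBioPortalData::loadStudy(study_dir)
  }, error = function(e) {
    message("Could not load '", cancer_study_id, "': ", conditionMessage(e))
    NULL
  })
}

# Not run here because it needs the package and network access:
# mae <- load_study_or_null("acc_tcga")
```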

See also

cBioDataPack, MultiAssayExperiment

Examples

(acc_file <- downloadStudy("acc_tcga"))
#> Study file in cache: acc_tcga
#> BFC7
#> "/github/home/.cache/R/cBioPortalData/34324bf2ff1d_acc_tcga.tar.gz"
(file_dir <- untarStudy(acc_file, tempdir()))
#> [1] "/tmp/RtmptBqLur/34324bf2ff1d_acc_tcga"
loadStudy(file_dir)
#> Working on: /tmp/RtmptBqLur/34324bf2ff1d_acc_tcga/acc_tcga/data_cna_hg19.seg
#> Working on: /tmp/RtmptBqLur/34324bf2ff1d_acc_tcga/acc_tcga/data_CNA.txt
#> Working on: /tmp/RtmptBqLur/34324bf2ff1d_acc_tcga/acc_tcga/data_linear_CNA.txt
#> Working on: /tmp/RtmptBqLur/34324bf2ff1d_acc_tcga/acc_tcga/data_methylation_hm450.txt
#> Working on: /tmp/RtmptBqLur/34324bf2ff1d_acc_tcga/acc_tcga/data_mutations_extended.txt
#> Working on: /tmp/RtmptBqLur/34324bf2ff1d_acc_tcga/acc_tcga/data_mutations_mskcc.txt
#> Working on: /tmp/RtmptBqLur/34324bf2ff1d_acc_tcga/acc_tcga/data_mutsig.txt
#> Working on: /tmp/RtmptBqLur/34324bf2ff1d_acc_tcga/acc_tcga/data_RNA_Seq_v2_expression_median.txt
#> Working on: /tmp/RtmptBqLur/34324bf2ff1d_acc_tcga/acc_tcga/data_RNA_Seq_v2_mRNA_median_all_sample_Zscores.txt
#> Working on: /tmp/RtmptBqLur/34324bf2ff1d_acc_tcga/acc_tcga/data_RNA_Seq_v2_mRNA_median_Zscores.txt
#> Working on: /tmp/RtmptBqLur/34324bf2ff1d_acc_tcga/acc_tcga/data_rppa_Zscores.txt
#> Working on: /tmp/RtmptBqLur/34324bf2ff1d_acc_tcga/acc_tcga/data_rppa.txt
#> Warning: Multiple prefixes found, using keyword 'region' or taking first one
#> Warning: Multiple prefixes found, using keyword 'region' or taking first one
#> A MultiAssayExperiment object of 11 listed
#>  experiments with user-defined names and respective classes.
#>  Containing an ExperimentList class object of length 11:
#>  [1] cna_hg19.seg: RaggedExperiment with 16080 rows and 90 columns
#>  [2] CNA: SummarizedExperiment with 24776 rows and 90 columns
#>  [3] linear_CNA: SummarizedExperiment with 24776 rows and 90 columns
#>  [4] methylation_hm450: SummarizedExperiment with 15473 rows and 80 columns
#>  [5] mutations_extended: RaggedExperiment with 20166 rows and 90 columns
#>  [6] mutations_mskcc: RaggedExperiment with 20166 rows and 90 columns
#>  [7] RNA_Seq_v2_expression_median: SummarizedExperiment with 20531 rows and 79 columns
#>  [8] RNA_Seq_v2_mRNA_median_all_sample_Zscores: SummarizedExperiment with 20531 rows and 79 columns
#>  [9] RNA_Seq_v2_mRNA_median_Zscores: SummarizedExperiment with 20440 rows and 79 columns
#>  [10] rppa_Zscores: SummarizedExperiment with 191 rows and 46 columns
#>  [11] rppa: SummarizedExperiment with 192 rows and 46 columns
#> Functionality:
#>  experiments() - obtain the ExperimentList instance
#>  colData() - the primary/phenotype DataFrame
#>  sampleMap() - the sample coordination DataFrame
#>  `$`, `[`, `[[` - extract colData columns, subset, or experiment
#>  *Format() - convert into a long or wide DataFrame
#>  assays() - convert ExperimentList to a SimpleList of matrices
#>  exportClass() - save all data to files