Note that these functions should be used when a particular
study is not currently available as a MultiAssayExperiment
representation. Otherwise, use cBioDataPack. Provide a cancer_study_id
from getStudies and retrieve the study tarball from the cBio Genomics
Portal. These functions are used by cBioDataPack under the hood to
download, untar, and load the tarball datasets with caching. As stated in
?cBioDataPack, not all studies can currently be represented as
MultiAssayExperiment objects. As of July 2020, about 80% of datasets can
be successfully imported into the MultiAssayExperiment data class. Please
open an issue if you would like the team to prioritize a study. You may
also check getStudies(buildReport = TRUE)$pack_build for the current
status.
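The status check mentioned above can be sketched as follows. This is a minimal illustration, not part of the package: pack_build_status is a hypothetical helper, and it assumes that getStudies takes the API object returned by cBioPortal() as its first argument and that the result has studyId and pack_build columns. Calling it requires network access.

```r
# Hypothetical helper (not part of cBioPortalData): look up the tarball
# build status recorded for a single study identifier.
pack_build_status <- function(study_id) {
  if (!requireNamespace("cBioPortalData", quietly = TRUE)) {
    stop("the cBioPortalData package is required")
  }
  # Assumption: getStudies() accepts the API object as its first argument
  api <- cBioPortalData::cBioPortal()
  studies <- cBioPortalData::getStudies(api, buildReport = TRUE)
  # Assumption: the table has 'studyId' and 'pack_build' columns
  studies$pack_build[studies$studyId == study_id]
}
```

For example, pack_build_status("acc_tcga") would return the recorded build status for that study. If the study is reported as building, cBioDataPack is the more convenient entry point.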
Usage
downloadStudy(
  cancer_study_id,
  use_cache = TRUE,
  force = FALSE,
  url_location = getOption("cBio_URL", .url_location),
  ask = interactive()
)
untarStudy(cancer_study_file, exdir = tempdir())
loadStudy(
  filepath,
  names.field = c("Hugo_Symbol", "Entrez_Gene_Id", "Gene", "Composite.Element.REF"),
  cleanup = TRUE
)
Arguments
- cancer_study_id
- character(1) The study identifier from cBioPortal as seen in the dataset links at https://www.cbioportal.org/datasets.
- use_cache
- logical(1) (default TRUE) Whether to create the default cache location and use it to track downloaded data. If the data are found in the cache, they will not be re-downloaded. A path to a data cache location can also be provided.
- force
- logical(1) (default FALSE) Whether to force a re-download of the data from the remote location.
- url_location
- character(1) (default "https://cbioportal-datahub.s3.amazonaws.com") The URL location for downloading packaged data. Can be set using the 'cBio_URL' option (see ?cBioDataPack for more details).
- ask
- logical(1) Whether to prompt the user before downloading and loading a study MultiAssayExperiment that is not currently building based on previous testing. Set to interactive() by default. In a non-interactive session, data download will be attempted; this is equivalent to ask = FALSE. The argument is also used when a cache directory needs to be created by downloadStudy.
- cancer_study_file
- character(1) The on-disk location of the downloaded tarball.
- exdir
- character(1) The folder in which to place the contents of the tarball (default tempdir(); see also ?untar).
- filepath
- character(1) The folder where the contents of the tarball are located (usually the same as exdir).
- names.field
- character() Possible names for the column that will be used to label ranges for data such as mutations or copy number (defaults: "Hugo_Symbol", "Entrez_Gene_Id", "Gene", and "Composite.Element.REF"). Values are cycled through and eliminated when no data are present or duplicates are found. Values in the corresponding column must be unique in each row.
- cleanup
- logical(1) (default TRUE) Whether to delete the untarred contents from the exdir folder.
Value
- downloadStudy - The file location of the data tarball 
- untarStudy - The directory location of the contents 
- loadStudy - A MultiAssayExperiment-class object 
Details
When attempting to load a dataset using loadStudy, note that the
cleanup argument is set to TRUE by default. Change the argument to
FALSE if you would like to keep the untarred data in the exdir
location. downloadStudy and untarStudy are not affected by this change.
The tarball of the downloaded data is cached via BiocFileCache when
use_cache is TRUE.
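As an illustration of the cleanup behavior described above, the sketch below loads a study while keeping the extracted files on disk. The wrapper name load_study_keep_files is hypothetical and not part of the package; it only chains the three documented functions with cleanup = FALSE. Running it requires network access on the first call, after which the tarball should come from the cache.

```r
# Hypothetical wrapper (not part of cBioPortalData): load a study and
# keep the untarred files in 'exdir' for later inspection.
load_study_keep_files <- function(study_id, exdir = tempdir()) {
  if (!requireNamespace("cBioPortalData", quietly = TRUE)) {
    stop("the cBioPortalData package is required")
  }
  # Download the tarball (use_cache = TRUE by default)
  tarball <- cBioPortalData::downloadStudy(study_id)
  # Extract the tarball contents into 'exdir'
  study_dir <- cBioPortalData::untarStudy(tarball, exdir = exdir)
  # cleanup = FALSE leaves the extracted files in place after loading
  mae <- cBioPortalData::loadStudy(study_dir, cleanup = FALSE)
  list(
    mae = mae,
    files = list.files(study_dir, recursive = TRUE, full.names = TRUE)
  )
}
```

A call such as res <- load_study_keep_files("acc_tcga") would return the MultiAssayExperiment in res$mae and the paths of the retained files in res$files.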
Examples
acc_file <- downloadStudy("acc_tcga")
#> Study file in cache: acc_tcga
acc_file
#>                                                               BFC4 
#> "/github/home/.cache/R/cBioPortalData/a1d7e2b6e07_acc_tcga.tar.gz" 
file_dir <- untarStudy(acc_file, tempdir())
file_dir
#> [1] "/tmp/RtmpdbKWbS/a1d7e2b6e07_acc_tcga"
loadStudy(file_dir)
#> Working on: /tmp/RtmpdbKWbS/a1d7e2b6e07_acc_tcga/acc_tcga/data_cna.txt
#> Working on: /tmp/RtmpdbKWbS/a1d7e2b6e07_acc_tcga/acc_tcga/data_cna_hg19.seg
#> Working on: /tmp/RtmpdbKWbS/a1d7e2b6e07_acc_tcga/acc_tcga/data_linear_cna.txt
#> Working on: /tmp/RtmpdbKWbS/a1d7e2b6e07_acc_tcga/acc_tcga/data_methylation_hm450.txt
#> Working on: /tmp/RtmpdbKWbS/a1d7e2b6e07_acc_tcga/acc_tcga/data_mrna_seq_v2_rsem.txt
#> Working on: /tmp/RtmpdbKWbS/a1d7e2b6e07_acc_tcga/acc_tcga/data_mrna_seq_v2_rsem_zscores_ref_all_samples.txt
#> Working on: /tmp/RtmpdbKWbS/a1d7e2b6e07_acc_tcga/acc_tcga/data_mrna_seq_v2_rsem_zscores_ref_diploid_samples.txt
#> Working on: /tmp/RtmpdbKWbS/a1d7e2b6e07_acc_tcga/acc_tcga/data_mutations.txt
#> Working on: /tmp/RtmpdbKWbS/a1d7e2b6e07_acc_tcga/acc_tcga/data_mutsig.txt
#> Working on: /tmp/RtmpdbKWbS/a1d7e2b6e07_acc_tcga/acc_tcga/data_rppa.txt
#> Working on: /tmp/RtmpdbKWbS/a1d7e2b6e07_acc_tcga/acc_tcga/data_rppa_zscores.txt
#> Warning:  Multiple prefixes found, using keyword 'region' or taking first one
#> Warning:  Multiple prefixes found, using keyword 'region' or taking first one
#> A MultiAssayExperiment object of 10 listed
#>  experiments with user-defined names and respective classes.
#>  Containing an ExperimentList class object of length 10:
#>  [1] cna: SummarizedExperiment with 24776 rows and 90 columns
#>  [2] cna_hg19.seg: RaggedExperiment with 16080 rows and 90 columns
#>  [3] linear_cna: SummarizedExperiment with 24776 rows and 90 columns
#>  [4] methylation_hm450: SummarizedExperiment with 15754 rows and 80 columns
#>  [5] mrna_seq_v2_rsem: SummarizedExperiment with 20531 rows and 79 columns
#>  [6] mrna_seq_v2_rsem_zscores_ref_all_samples: SummarizedExperiment with 20531 rows and 79 columns
#>  [7] mrna_seq_v2_rsem_zscores_ref_diploid_samples: SummarizedExperiment with 20440 rows and 79 columns
#>  [8] mutations: RaggedExperiment with 20166 rows and 90 columns
#>  [9] rppa: SummarizedExperiment with 192 rows and 46 columns
#>  [10] rppa_zscores: SummarizedExperiment with 191 rows and 46 columns
#> Functionality:
#>  experiments() - obtain the ExperimentList instance
#>  colData() - the primary/phenotype DataFrame
#>  sampleMap() - the sample coordination DataFrame
#>  `$`, `[`, `[[` - extract colData columns, subset, or experiment
#>  *Format() - convert into a long or wide DataFrame
#>  assays() - convert ExperimentList to a SimpleList of matrices
#>  exportClass() - save data to flat files