Obtain pre-packaged data from cBioPortal and represent as a MultiAssayExperiment object
Source: R/cBioDataPack.R
cBioDataPack.Rd
The cBioDataPack function allows the user to download and process cancer study datasets found in MSKCC's cBioPortal. Output datasets use the MultiAssayExperiment data representation to facilitate analysis and data management operations.
Usage
cBioDataPack(
cancer_study_id,
use_cache = TRUE,
names.field = c("Hugo_Symbol", "Entrez_Gene_Id", "Gene"),
cleanup = TRUE,
ask = interactive(),
check_build = TRUE
)
Arguments
- cancer_study_id
character(1)
The study identifier from cBioPortal as seen in the dataset links at https://www.cbioportal.org/datasets.
- use_cache
logical(1)
(default TRUE) Create the default cache location and use it to track downloaded data. If data are found in the cache, they will not be re-downloaded. A path to the data cache location can also be provided.
- names.field
character()
Possible column names for the column that will be used to label ranges for data such as mutations or copy number (defaults: "Hugo_Symbol", "Entrez_Gene_Id", "Gene", and "Composite.Element.REF"). Values are cycled through and eliminated when no data are present or duplicates are found. Values in the corresponding column must be unique in each row.
- cleanup
logical(1)
Whether to delete the untar-red contents from the exdir folder (default TRUE).
- ask
logical(1)
Whether to prompt the user before downloading and loading a study MultiAssayExperiment that is not currently building based on previous testing. Set to interactive() by default. In a non-interactive session, data download will be attempted; this is equivalent to ask = FALSE. The argument is also used when a cache directory needs to be created when using downloadStudy.
- check_build
logical(1L)
Whether to check the build status of the studyId using an internal dataset. This argument should be set to FALSE if using alternative hostnames, e.g., 'pedcbioportal.kidsfirstdrc.org'.
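As an illustration of how the arguments fit together, the following is a minimal sketch of a call written for a non-interactive script, with every argument from the Usage section spelled out. It assumes network access and that the cBioPortalData and MultiAssayExperiment packages are installed; the study identifier "acc_tcga" is the one used in the Examples section.

```r
library(cBioPortalData)
library(MultiAssayExperiment)

# Sketch of a fully specified call for use in a script:
# ask = FALSE avoids any prompt, and the default cache is used so that
# a later run does not download the same tarball again.
mae <- cBioDataPack(
    cancer_study_id = "acc_tcga",
    use_cache = TRUE,
    names.field = c("Hugo_Symbol", "Entrez_Gene_Id", "Gene"),
    cleanup = TRUE,
    ask = FALSE,
    check_build = TRUE
)

# List the assays that were assembled from the study files
names(experiments(mae))
```

Passing ask = FALSE explicitly makes the behavior the same whether the code is run interactively or in batch mode.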
Value
A MultiAssayExperiment object
Details
The full list of study identifiers (studyIds) can be obtained from getStudies(). Currently, only ~72% of datasets can be represented as MultiAssayExperiment data objects from the data tarballs. Refer to getStudies(..., buildReport = TRUE) and its "pack_build" column to see which study identifiers are not building. Users who would like to prioritize particular datasets should open GitHub issues at the URL in the DESCRIPTION file. For a more fine-grained approach to downloading data from the cBioPortal API, refer to the cBioPortalData function.
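The build report described above can be used to pick out studies before attempting a download. The sketch below assumes network access and that the "pack_build" column marks building studies with TRUE (either as a logical or as the string "TRUE"); the variable names builds, building_ids, and failing_ids are illustrative only.

```r
library(cBioPortalData)

cbio <- cBioPortal()

# buildReport = TRUE adds build-status columns, including "pack_build"
studies <- getStudies(cbio, buildReport = TRUE)

# Assumption: TRUE in "pack_build" means the tarball currently builds
# into a MultiAssayExperiment; %in% treats missing values as not building
builds <- studies[["pack_build"]] %in% TRUE

building_ids <- studies[["studyId"]][builds]
failing_ids <- studies[["studyId"]][!builds]

# Proportion of studies that currently build from the data tarballs
mean(builds)
```

Checking a study identifier against building_ids first avoids the prompt that cBioDataPack raises for studies that are known not to build.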
cBio_URL
The cBioDataPack function accesses data from the cBio_URL option. By default, it points to an Amazon S3 bucket location. Previously, it pointed to 'http://download.cbioportal.org'. This recent change (> 2.1.17) should provide faster and more reliable downloads for all users. See the URL using cBioPortalData:::.url_location. If there are mirrors that host this data, the location can be changed by setting the cBio_URL option with options(cBio_URL = "https://some.url.com/") before running the function.
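In base R, options are set with options() and read back with getOption(). The sketch below shows one way to point the cBio_URL option at a mirror for the duration of a download and then restore the earlier value. The URL is the placeholder from the text above, not a real mirror, and the variable names old and current are illustrative.

```r
# Placeholder mirror URL, used only for illustration.
# options() returns the previous value(s) so they can be restored later.
old <- options(cBio_URL = "https://some.url.com/")

# Read the value back to confirm what is now set
current <- getOption("cBio_URL")
current

# ... call cBioDataPack() here while the mirror is set ...

# Restore the previous setting (unsets the option if it was not set before)
options(old)
```

Saving and restoring the old value keeps the change local to one block of code, which can help when the same session also needs the default location.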
Examples
cbio <- cBioPortal()
head(getStudies(cbio)[["studyId"]])
#> [1] "all_stjude_2015" "all_stjude_2013" "acyc_fmi_2014" "acyc_jhu_2016"
#> [5] "acyc_mda_2015" "acyc_mgh_2016"
mae <- cBioDataPack("acc_tcga")
#> Study file in cache: acc_tcga
#> Working on: /tmp/RtmpdbKWbS/a1d7e2b6e07_acc_tcga/acc_tcga/data_cna.txt
#> Working on: /tmp/RtmpdbKWbS/a1d7e2b6e07_acc_tcga/acc_tcga/data_cna_hg19.seg
#> Working on: /tmp/RtmpdbKWbS/a1d7e2b6e07_acc_tcga/acc_tcga/data_linear_cna.txt
#> Working on: /tmp/RtmpdbKWbS/a1d7e2b6e07_acc_tcga/acc_tcga/data_methylation_hm450.txt
#> Working on: /tmp/RtmpdbKWbS/a1d7e2b6e07_acc_tcga/acc_tcga/data_mrna_seq_v2_rsem.txt
#> Working on: /tmp/RtmpdbKWbS/a1d7e2b6e07_acc_tcga/acc_tcga/data_mrna_seq_v2_rsem_zscores_ref_all_samples.txt
#> Working on: /tmp/RtmpdbKWbS/a1d7e2b6e07_acc_tcga/acc_tcga/data_mrna_seq_v2_rsem_zscores_ref_diploid_samples.txt
#> Working on: /tmp/RtmpdbKWbS/a1d7e2b6e07_acc_tcga/acc_tcga/data_mutations.txt
#> Working on: /tmp/RtmpdbKWbS/a1d7e2b6e07_acc_tcga/acc_tcga/data_mutsig.txt
#> Working on: /tmp/RtmpdbKWbS/a1d7e2b6e07_acc_tcga/acc_tcga/data_rppa.txt
#> Working on: /tmp/RtmpdbKWbS/a1d7e2b6e07_acc_tcga/acc_tcga/data_rppa_zscores.txt
#> Warning: Multiple prefixes found, using keyword 'region' or taking first one
#> Warning: Multiple prefixes found, using keyword 'region' or taking first one