The cBioPortalData R package aims to import cBioPortal datasets into Bioconductor as MultiAssayExperiment objects. Features of the package include:

- the MultiAssayExperiment integrative container for coordinating and representing the data;
- harmonized subsetting and reshaping into convenient wide and long formats, provided by MultiAssayExperiment.

To install from Bioconductor (recommended for most users; this installs the release or development version that matches your version of Bioconductor):
```r
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("cBioPortalData")
```
Installing from GitHub gives the bleeding-edge version. This is generally not necessary, because changes here are also pushed to bioc-devel, and installing the development version generally requires running bioc-devel.
To load the package:

```r
library(cBioPortalData)
```
cBioPortalData is a work in progress because of ongoing changes in data curation and in the cBioPortal API specification. Users can load the studiesTable dataset with data(studiesTable) for an overview of the studies that are available and that currently build as MultiAssayExperiment representations. About 98% of the studies build via the API (api_build) and about 73% build from the packaged data (pack_build); these include additional datasets that were not previously available. Feel free to file an issue to request that fixing any of the remaining datasets be prioritized.
```r
cbio <- cBioPortal()
studies <- getStudies(cbio, buildReport = TRUE)
table(studies$api_build)
#> 
#> FALSE  TRUE 
#>     5   308
table(studies$pack_build)
#> 
#> FALSE  TRUE 
#>    86   227
```
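The build report can also be used to select studies programmatically. A minimal sketch, assuming the table returned by getStudies(buildReport = TRUE) carries a character studyId column alongside the logical api_build and pack_build columns tabulated above:

```r
library(cBioPortalData)

cbio <- cBioPortal()
studies <- getStudies(cbio, buildReport = TRUE)

# Studies that build through the API route; which() skips FALSE and NA
api_ok <- studies$studyId[which(studies$api_build)]

# Studies that build through both the API and the packaged tarballs
both_ok <- studies$studyId[which(studies$api_build & studies$pack_build)]

length(api_ok)
head(both_ok)
```

Using which() rather than a bare logical index avoids NA rows appearing in the result if a study has no build status recorded.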
The cBioPortalData() function provides flexible and granular access to cBioPortal data from cbioportal.org/api. This option is best used with a particular gene panel of interest: it lets users download sections of a study's data defined by combinations of molecular profiles and a gene panel.
```r
gbm <- cBioPortalData(
    api = cbio, by = "hugoGeneSymbol", studyId = "gbm_tcga",
    genePanelId = "IMPACT341",
    molecularProfileIds = c("gbm_tcga_rppa", "gbm_tcga_mrna")
)
gbm
#> A MultiAssayExperiment object of 2 listed
#>  experiments with user-defined names and respective classes.
#>  Containing an ExperimentList class object of length 2:
#>  [1] gbm_tcga_rppa: SummarizedExperiment with 67 rows and 244 columns
#>  [2] gbm_tcga_mrna: SummarizedExperiment with 336 rows and 401 columns
#> Functionality:
#>  experiments() - obtain the ExperimentList instance
#>  colData() - the primary/phenotype DataFrame
#>  sampleMap() - the sample coordination DataFrame
#>  `$`, `[`, `[[` - extract colData columns, subset, or experiment
#>  *Format() - convert into a long or wide DataFrame
#>  assays() - convert ExperimentList to a SimpleList of matrices
#>  exportClass() - save data to flat files
```
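The studyId, molecularProfileIds, and genePanelId values passed to cBioPortalData() can be looked up through the API before building a request. A minimal sketch using the package's molecularProfiles(), genePanels(), and getGenePanel() helpers; the column names used here (molecularProfileId, genePanelId) are assumed to mirror the API's JSON fields and are worth checking against the returned tables:

```r
library(cBioPortalData)

cbio <- cBioPortal()

# Molecular profiles in the GBM study; the molecularProfileId column
# supplies values for the molecularProfileIds argument
mols <- molecularProfiles(cbio, studyId = "gbm_tcga")
mols[["molecularProfileId"]]

# All gene panels known to the API, then the genes in IMPACT341
panels <- genePanels(cbio)
impact <- getGenePanel(cbio, genePanelId = "IMPACT341")
head(impact)
```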
The cBioDataPack() function downloads a study from the cbioportal.org/datasets website as a packaged tarball and returns it as a MultiAssayExperiment object. This option suits users who want all the data for a particular study.
```r
acc <- cBioDataPack("acc_tcga")
acc
#> A MultiAssayExperiment object of 11 listed
#>  experiments with user-defined names and respective classes.
#>  Containing an ExperimentList class object of length 11:
#>  [1] cna_hg19.seg: RaggedExperiment with 16080 rows and 90 columns
#>  [2] CNA: SummarizedExperiment with 24776 rows and 90 columns
#>  [3] linear_CNA: SummarizedExperiment with 24776 rows and 90 columns
#>  [4] methylation_hm450: SummarizedExperiment with 15755 rows and 80 columns
#>  [5] mutations_extended: RaggedExperiment with 20166 rows and 90 columns
#>  [6] mutations_mskcc: RaggedExperiment with 20166 rows and 90 columns
#>  [7] RNA_Seq_v2_expression_median: SummarizedExperiment with 20531 rows and 79 columns
#>  [8] RNA_Seq_v2_mRNA_median_all_sample_Zscores: SummarizedExperiment with 20531 rows and 79 columns
#>  [9] RNA_Seq_v2_mRNA_median_Zscores: SummarizedExperiment with 20440 rows and 79 columns
#>  [10] rppa_Zscores: SummarizedExperiment with 191 rows and 46 columns
#>  [11] rppa: SummarizedExperiment with 192 rows and 46 columns
#> Functionality:
#>  experiments() - obtain the ExperimentList instance
#>  colData() - the primary/phenotype DataFrame
#>  sampleMap() - the sample coordination DataFrame
#>  `$`, `[`, `[[` - extract colData columns, subset, or experiment
#>  *Format() - convert into a long or wide DataFrame
#>  assays() - convert ExperimentList to a SimpleList of matrices
#>  exportClass() - save data to flat files
```
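Because the result is a MultiAssayExperiment, its harmonized subsetting and reshaping apply directly. A minimal sketch using standard MultiAssayExperiment functions on acc; it assumes the RNA-seq assay uses Hugo symbols as row names, and the two gene symbols are illustrative choices, not part of the original example:

```r
library(cBioPortalData)
library(MultiAssayExperiment)

acc <- cBioDataPack("acc_tcga")

# Keep two assays, then keep only the patients measured in both
sub <- acc[, , c("RNA_Seq_v2_expression_median", "methylation_hm450")]
sub <- intersectColumns(sub)

# Restrict the RNA-seq assay to two illustrative genes
rna <- acc[c("TP53", "CTNNB1"), , "RNA_Seq_v2_expression_median"]

# Long format: one row per assay, feature, and sample value
long <- longFormat(rna)
head(long)

# Wide format: one row per patient, one column per assay-feature pair
wide <- wideFormat(rna)
dim(wide)
```

Subsetting by assay name keeps the sample map and colData in sync, so the reshaped tables stay keyed by the same patient identifiers as the full object.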