oncoKBData

Lifecycle: experimental

The oncoKBData package exposes the OncoKB API through an R client. This vignette demonstrates access through the public API. To learn more about the OncoKB database, visit https://www.oncokb.org.

Installation

To install the development version of oncoKBData, use:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("waldronlab/oncoKBData")

Introduction

The oncoKBData package provides access to the OncoKB database through its public API. Access with a licensed token is also possible.

API representation

To use the OncoKB API, we must first instantiate an API object as provided by the rapiclient and AnVIL packages.

library(oncoKBData)
oncokb <- oncoKB()

Note that for private API access, users must change the api. argument in the oncoKB function.

Operations

Check available tags, operations, and descriptions as a tibble:

tags(oncokb)
## # A tibble: 20 × 3
##    tag          operation                                       summary         
##    <chr>        <chr>                                           <chr>           
##  1 Annotations  annotateCopyNumberAlterationsGetUsingGET_1      annotateCopyNum…
##  2 Annotations  annotateCopyNumberAlterationsPostUsingPOST_1    annotateCopyNum…
##  3 Annotations  annotateMutationsByGenomicChangeGetUsingGET_1   annotateMutatio…
##  4 Annotations  annotateMutationsByGenomicChangePostUsingPOST_1 annotateMutatio…
##  5 Annotations  annotateMutationsByHGVSgGetUsingGET_1           annotateMutatio…
##  6 Annotations  annotateMutationsByHGVSgPostUsingPOST_1         annotateMutatio…
##  7 Annotations  annotateMutationsByProteinChangeGetUsingGET_1   annotateMutatio…
##  8 Annotations  annotateMutationsByProteinChangePostUsingPOST_1 annotateMutatio…
##  9 Annotations  annotateStructuralVariantsGetUsingGET_1         annotateStructu…
## 10 Annotations  annotateStructuralVariantsPostUsingPOST_1       annotateStructu…
## 11 Cancer Genes utilsAllCuratedGenesGetUsingGET_1               utilsAllCurated…
## 12 Cancer Genes utilsAllCuratedGenesTxtGetUsingGET_1            utilsAllCurated…
## 13 Cancer Genes utilsCancerGeneListGetUsingGET_1                utilsCancerGene…
## 14 Cancer Genes utilsCancerGeneListTxtGetUsingGET_1             utilsCancerGene…
## 15 Info         infoGetUsingGET_1                               infoGet         
## 16 Levels       levelsDiagnosticGetUsingGET_1                   levelsDiagnosti…
## 17 Levels       levelsGetUsingGET_1                             levelsGet       
## 18 Levels       levelsPrognosticGetUsingGET_1                   levelsPrognosti…
## 19 Levels       levelsResistanceGetUsingGET_1                   levelsResistanc…
## 20 Levels       levelsSensitiveGetUsingGET_1                    levelsSensitive…
head(tags(oncokb)$operation)
## [1] "annotateCopyNumberAlterationsGetUsingGET_1"     
## [2] "annotateCopyNumberAlterationsPostUsingPOST_1"   
## [3] "annotateMutationsByGenomicChangeGetUsingGET_1"  
## [4] "annotateMutationsByGenomicChangePostUsingPOST_1"
## [5] "annotateMutationsByHGVSgGetUsingGET_1"          
## [6] "annotateMutationsByHGVSgPostUsingPOST_1"

Note. Access to the annotations API requires a token.
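Because the annotations operations need a token, it can be handy to separate them from the rest of the listing. The sketch below is only an illustration: `ops` is a hand-built stand-in holding four rows copied from the printed tibble, and the names `public_ops` and `ops_per_tag` are hypothetical. In a live session, the same subsetting should work on the result of tags(oncokb).

```r
# Toy stand-in for tags(oncokb): four rows copied from the printed tibble.
ops <- data.frame(
  tag = c("Annotations", "Cancer Genes", "Info", "Levels"),
  operation = c(
    "annotateCopyNumberAlterationsGetUsingGET_1",
    "utilsAllCuratedGenesGetUsingGET_1",
    "infoGetUsingGET_1",
    "levelsGetUsingGET_1"
  ),
  stringsAsFactors = FALSE
)

# Keep only the operations outside the token-protected "Annotations" tag.
public_ops <- ops[ops$tag != "Annotations", ]

# Count how many operations fall under each tag.
ops_per_tag <- table(ops$tag)

print(public_ops$operation)
```

Base R subsetting is used here so that the sketch has no dependencies; with dplyr attached, a filter() call on the tag column would be an equivalent choice.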

Levels of Evidence

To retrieve the levels of evidence for all types (i.e., ‘therapeutic’, ‘diagnostic’, ‘prognostic’, and ‘FDA’), run the levelsOfEvidence function.

(loe <- levelsOfEvidence(oncokb))
## DataFrame with 16 rows and 4 columns
##     levelOfEvidence            description        htmlDescription    colorHex
##         <character>            <character>            <character> <character>
## 1           LEVEL_1 FDA-recognized bioma.. <span><b>FDA-recogni..     #33A02C
## 2           LEVEL_2 Standard care biomar.. <span><b>Standard ca..     #1F78B4
## 3          LEVEL_3A Compelling clinical .. <span><b>Compelling ..     #984EA3
## 4          LEVEL_3B Standard care or inv.. <span><b>Standard ca..     #BE98CE
## 5           LEVEL_4 Compelling biologica.. <span><b>Compelling ..     #424242
## ...             ...                    ...                    ...         ...
## 12        LEVEL_Px1 FDA and/or professio.. <span><b>FDA and/or ..     #33A02C
## 13        LEVEL_Px2 FDA and/or professio.. <span><b>FDA and/or ..     #1F78B4
## 14        LEVEL_Px3 Biomarker is prognos.. <span>Biomarker is p..     #984EA3
## 15         LEVEL_R1 Standard care biomar.. <span><b>Standard of..     #EE3424
## 16         LEVEL_R2 Compelling clinical .. <span><b>Compelling ..     #F79A92

The returned DataFrame carries important metadata. The names of the metadata elements are:

## [1] "oncoTreeVersion" "ncitVersion"     "dataVersion"     "appVersion"     
## [5] "apiVersion"      "publicInstance"  "genomeNexus"
metadata(loe)["oncoTreeVersion"]
## $oncoTreeVersion
## [1] "oncotree_2019_12_01"
metadata(loe)[["apiVersion"]]
## $version
## [1] "v1.5.0"
## 
## $major
## [1] 1
## 
## $minor
## [1] 5
## 
## $patch
## [1] 0
## 
## $suffixTokens
## list()
## 
## $stable
## [1] TRUE
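The two metadata calls above differ in what they return: single brackets keep the list wrapper, while double brackets extract the element itself. The following sketch illustrates this on plain R objects. `md` and `levels_shown` are hand-built stand-ins filled with values printed above, not objects returned by the package, and the mapping of the "Px" and "R" prefixes to prognostic and resistance levels is an assumption based on the level names.

```r
# Toy stand-ins built from values printed above; a real session would use
# loe <- levelsOfEvidence(oncokb) and md <- metadata(loe) instead.
md <- list(
  oncoTreeVersion = "oncotree_2019_12_01",
  apiVersion = list(version = "v1.5.0", major = 1L, minor = 5L, patch = 0L)
)
levels_shown <- c("LEVEL_1", "LEVEL_2", "LEVEL_3A", "LEVEL_3B", "LEVEL_4",
                  "LEVEL_Px1", "LEVEL_Px2", "LEVEL_Px3", "LEVEL_R1", "LEVEL_R2")

# `[` keeps the list wrapper; `[[` extracts the element itself.
wrapped <- md["oncoTreeVersion"]      # list of length 1
onco_tree <- md[["oncoTreeVersion"]]  # character string

# Nested elements can be reached by chaining `[[`.
api_version <- md[["apiVersion"]][["version"]]

# Strip the leading "v" to compare against a minimum version.
api_ok <- package_version(sub("^v", "", api_version)) >= "1.5.0"

# Assumption: the name prefix encodes the level type
# ("Px" = prognostic, "R" = resistance).
prognostic <- levels_shown[grepl("^LEVEL_Px", levels_shown)]
resistance <- levels_shown[grepl("^LEVEL_R", levels_shown)]
```

Recording the API and data versions alongside any downstream results is one way to keep an analysis traceable when the database is updated.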

Gene tables

The API allows retrieval of curated genes where there is a single gene per observation:

curatedGenes(oncokb)
## # A tibble: 933 × 13
##    grch37Isoform grch37RefSeq grch38Isoform grch38RefSeq entrezGeneId hugoSymbol
##    <chr>         <chr>        <chr>         <chr>               <int> <chr>     
##  1 ENST00000265… NM_000927.4  ENST00000622… NM_00134894…         5243 ABCB1     
##  2 ENST00000318… NM_005157.4  ENST00000318… NM_005157.4            25 ABL1      
##  3 ENST00000502… NM_007314.3  ENST00000502… NM_007314.3            27 ABL2      
##  4 ENST00000321… NM_139076.2  ENST00000321… NM_139076.2         84142 ABRAXAS1  
##  5 ENST00000272… NM_020311    ENST00000272… NM_020311           57007 ACKR3     
##  6 ENST00000331… NM_00119995… ENST00000573… NM_00119995…           71 ACTG1     
##  7 ENST00000263… NM_00111106… ENST00000263… NM_00111106…           90 ACVR1     
##  8 ENST00000257… NM_004302    ENST00000257… NM_004302              91 ACVR1B    
##  9 ENST00000241… NM_001278579 ENST00000241… NM_001278579           92 ACVR2A    
## 10 ENST00000381… NM_018702.3  ENST00000381… NM_018702.4           105 ADARB2    
## # ℹ 923 more rows
## # ℹ 7 more variables: oncogene <lgl>, highestSensitiveLevel <chr>,
## #   highestResistanceLevel <chr>, summary <chr>, background <chr>, tsg <lgl>,
## #   highestResistancLevel <chr>

The API also provides a longer list of cancer-associated genes, in which the same hugoSymbol can appear in multiple rows due to multiple geneAliases:

## # A tibble: 3,275 × 17
##    hugoSymbol entrezGeneId grch37Isoform grch37RefSeq grch38Isoform grch38RefSeq
##    <chr>             <int> <chr>         <chr>        <chr>         <chr>       
##  1 ABL1                 25 ENST00000318… NM_005157.4  ENST00000318… NM_005157.4 
##  2 ABL1                 25 ENST00000318… NM_005157.4  ENST00000318… NM_005157.4 
##  3 ABL1                 25 ENST00000318… NM_005157.4  ENST00000318… NM_005157.4 
##  4 AKT1                207 ENST00000349… NM_00101443… ENST00000349… NM_00101443…
##  5 AKT1                207 ENST00000349… NM_00101443… ENST00000349… NM_00101443…
##  6 AKT1                207 ENST00000349… NM_00101443… ENST00000349… NM_00101443…
##  7 AKT1                207 ENST00000349… NM_00101443… ENST00000349… NM_00101443…
##  8 AKT1                207 ENST00000349… NM_00101443… ENST00000349… NM_00101443…
##  9 ALK                 238 ENST00000389… NM_004304.4  ENST00000389… NM_004304.4 
## 10 AMER1            139285 ENST00000330… NM_152424.3  ENST00000374… NM_152424.3 
## # ℹ 3,265 more rows
## # ℹ 11 more variables: oncokbAnnotated <lgl>, occurrenceCount <int>,
## #   mSKImpact <lgl>, mSKHeme <lgl>, foundation <lgl>, foundationHeme <lgl>,
## #   vogelstein <lgl>, sangerCGC <lgl>, geneAliases <list>, tsg <lgl>,
## #   oncogene <lgl>
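Since the same hugoSymbol can span several rows in the longer table, it is often useful to count the repeats or reduce the table to one row per gene. The sketch below does both with base R. `genes` is a hand-built stand-in containing only the hugoSymbol and entrezGeneId values of the ten rows printed above, and the names `rows_per_gene` and `one_per_gene` are hypothetical.

```r
# Toy stand-in: hugoSymbol and entrezGeneId of the ten rows printed above.
genes <- data.frame(
  hugoSymbol = c("ABL1", "ABL1", "ABL1", "AKT1", "AKT1",
                 "AKT1", "AKT1", "AKT1", "ALK", "AMER1"),
  entrezGeneId = c(25L, 25L, 25L, 207L, 207L,
                   207L, 207L, 207L, 238L, 139285L),
  stringsAsFactors = FALSE
)

# Number of rows per gene symbol, most repeated first.
rows_per_gene <- sort(table(genes$hugoSymbol), decreasing = TRUE)

# One row per gene: drop repeated (hugoSymbol, entrezGeneId) pairs.
one_per_gene <- unique(genes[, c("hugoSymbol", "entrezGeneId")])

print(rows_per_gene)
print(one_per_gene)
```

Deduplicating on the identifier columns alone discards whatever differs between the repeated rows, so this is only appropriate when a simple gene list is all that is needed.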

Session Information

sessionInfo()
R version 4.5.0 (2025-04-11)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] S4Vectors_0.47.0    BiocGenerics_0.55.0 generics_0.1.3     
[4] oncoKBData_0.99.7   AnVIL_1.21.3        AnVILBase_1.3.1    
[7] dplyr_1.1.4         BiocStyle_2.37.0   

loaded via a namespace (and not attached):
 [1] utf8_1.2.5           rappdirs_0.3.3       sass_0.4.10         
 [4] futile.options_1.0.1 digest_0.6.37        magrittr_2.0.3      
 [7] evaluate_1.0.3       bookdown_0.43        fastmap_1.2.0       
[10] jsonlite_2.0.0       formatR_1.14         promises_1.3.2      
[13] BiocManager_1.30.25  httr_1.4.7           rapiclient_0.1.8    
[16] codetools_0.2-20     httr2_1.1.2          textshaping_1.0.1   
[19] jquerylib_0.1.4      cli_3.6.5            shiny_1.10.0        
[22] rlang_1.1.6          futile.logger_1.4.3  cachem_1.1.0        
[25] yaml_2.3.10          BiocBaseUtils_1.11.0 tools_4.5.0         
[28] httpuv_1.6.16        DT_0.33              lambda.r_1.2.4      
[31] curl_6.2.2           vctrs_0.6.5          R6_2.6.1            
[34] mime_0.13            lifecycle_1.0.4      fs_1.6.6            
[37] htmlwidgets_1.6.4    miniUI_0.1.2         ragg_1.4.0          
[40] pkgconfig_2.0.3      desc_1.4.3           pkgdown_2.1.2       
[43] pillar_1.10.2        bslib_0.9.0          later_1.4.2         
[46] glue_1.8.0           Rcpp_1.0.14          systemfonts_1.2.3   
[49] xfun_0.52            tibble_3.2.1         tidyselect_1.2.1    
[52] knitr_1.50           xtable_1.8-4         htmltools_0.5.8.1   
[55] rmarkdown_2.29       compiler_4.5.0