The aim of the package is to expose the OncoKB API through an R client. This vignette demonstrates public API access. To learn more about the OncoKB database, visit https://www.oncokb.org.

Installation

To install the development version of oncoKBData, use:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("waldronlab/oncoKBData")

Introduction

The oncoKBData package provides access to the OncoKB database through its public API. Access to the private API is also possible with a licensed token.

API representation

In order to use the OncoKB API, we must instantiate an API object as provided by the rapiclient and AnVIL packages.

oncokb <- oncoKB()

Note that for private API access, users must change the api. argument in the oncoKB function.

Operations

Check available tags, operations, and descriptions as a tibble:

tags(oncokb)
#> # A tibble: 20 × 3
#>    tag          operation                                       summary                             
#>    <chr>        <chr>                                           <chr>                               
#>  1 Annotations  annotateCopyNumberAlterationsGetUsingGET_1      annotateCopyNumberAlterationsGet    
#>  2 Annotations  annotateCopyNumberAlterationsPostUsingPOST_1    annotateCopyNumberAlterationsPost   
#>  3 Annotations  annotateMutationsByGenomicChangeGetUsingGET_1   annotateMutationsByGenomicChangeGet 
#>  4 Annotations  annotateMutationsByGenomicChangePostUsingPOST_1 annotateMutationsByGenomicChangePost
#>  5 Annotations  annotateMutationsByHGVSgGetUsingGET_1           annotateMutationsByHGVSgGet         
#>  6 Annotations  annotateMutationsByHGVSgPostUsingPOST_1         annotateMutationsByHGVSgPost        
#>  7 Annotations  annotateMutationsByProteinChangeGetUsingGET_1   annotateMutationsByProteinChangeGet 
#>  8 Annotations  annotateMutationsByProteinChangePostUsingPOST_1 annotateMutationsByProteinChangePost
#>  9 Annotations  annotateStructuralVariantsGetUsingGET_1         annotateStructuralVariantsGet       
#> 10 Annotations  annotateStructuralVariantsPostUsingPOST_1       annotateStructuralVariantsPost      
#> 11 Cancer Genes utilsAllCuratedGenesGetUsingGET_1               utilsAllCuratedGenesGet             
#> 12 Cancer Genes utilsAllCuratedGenesTxtGetUsingGET_1            utilsAllCuratedGenesTxtGet          
#> 13 Cancer Genes utilsCancerGeneListGetUsingGET_1                utilsCancerGeneListGet              
#> 14 Cancer Genes utilsCancerGeneListTxtGetUsingGET_1             utilsCancerGeneListTxtGet           
#> 15 Info         infoGetUsingGET_1                               infoGet                             
#> 16 Levels       levelsDiagnosticGetUsingGET_1                   levelsDiagnosticGet                 
#> 17 Levels       levelsGetUsingGET_1                             levelsGet                           
#> 18 Levels       levelsPrognosticGetUsingGET_1                   levelsPrognosticGet                 
#> 19 Levels       levelsResistanceGetUsingGET_1                   levelsResistanceGet                 
#> 20 Levels       levelsSensitiveGetUsingGET_1                    levelsSensitiveGet
head(tags(oncokb)$operation)
#> [1] "annotateCopyNumberAlterationsGetUsingGET_1"      "annotateCopyNumberAlterationsPostUsingPOST_1"   
#> [3] "annotateMutationsByGenomicChangeGetUsingGET_1"   "annotateMutationsByGenomicChangePostUsingPOST_1"
#> [5] "annotateMutationsByHGVSgGetUsingGET_1"           "annotateMutationsByHGVSgPostUsingPOST_1"

Note: Access to the Annotations operations requires a token.
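
Because tags() returns a table, the operations can be narrowed to a single tag with ordinary subsetting. The sketch below uses a small hand-built data frame, ops, as a stand-in for rows 15 to 20 of the tags(oncokb) output shown above, so it runs without network access or a token. The name ops is hypothetical; with a live API object, the same subsetting should apply to the returned tibble.

```r
# Stand-in for rows 15-20 of the tags(oncokb) output shown above;
# `ops` is a hypothetical name used for illustration only.
ops <- data.frame(
    tag = c("Info", rep("Levels", 5)),
    operation = c(
        "infoGetUsingGET_1",
        "levelsDiagnosticGetUsingGET_1",
        "levelsGetUsingGET_1",
        "levelsPrognosticGetUsingGET_1",
        "levelsResistanceGetUsingGET_1",
        "levelsSensitiveGetUsingGET_1"
    ),
    stringsAsFactors = FALSE
)

# Keep only the operations filed under the "Levels" tag
levels_ops <- ops[ops$tag == "Levels", "operation"]
levels_ops
```

Subsetting on the tag column in this way gives a quick view of which operations belong to a given group, for example those that do not need a token.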

Levels of Evidence

To retrieve the levels of evidence for all types (i.e., ‘therapeutic’, ‘diagnostic’, ‘prognostic’, and ‘FDA’), run the levelsOfEvidence function.

(loe <- levelsOfEvidence(oncokb))
#> DataFrame with 16 rows and 4 columns
#>     levelOfEvidence            description        htmlDescription    colorHex
#>         <character>            <character>            <character> <character>
#> 1           LEVEL_1 FDA-recognized bioma.. <span><b>FDA-recogni..     #33A02C
#> 2           LEVEL_2 Standard care biomar.. <span><b>Standard ca..     #1F78B4
#> 3          LEVEL_3A Compelling clinical .. <span><b>Compelling ..     #984EA3
#> 4          LEVEL_3B Standard care or inv.. <span><b>Standard ca..     #BE98CE
#> 5           LEVEL_4 Compelling biologica.. <span><b>Compelling ..     #424242
#> ...             ...                    ...                    ...         ...
#> 12        LEVEL_Px1 FDA and/or professio.. <span><b>FDA and/or ..     #33A02C
#> 13        LEVEL_Px2 FDA and/or professio.. <span><b>FDA and/or ..     #1F78B4
#> 14        LEVEL_Px3 Biomarker is prognos.. <span>Biomarker is p..     #984EA3
#> 15         LEVEL_R1 Standard care biomar.. <span><b>Standard of..     #EE3424
#> 16         LEVEL_R2 Compelling clinical .. <span><b>Compelling ..     #F79A92

The function returns a DataFrame whose metadata records version information for the API and its data sources:

names(metadata(loe))
#> [1] "oncoTreeVersion" "ncitVersion"     "dataVersion"     "appVersion"      "apiVersion"      "publicInstance"  "genomeNexus"

metadata(loe)["oncoTreeVersion"]
#> $oncoTreeVersion
#> [1] "oncotree_2019_12_01"

metadata(loe)[["apiVersion"]]
#> $version
#> [1] "v1.5.0"
#> 
#> $major
#> [1] 1
#> 
#> $minor
#> [1] 5
#> 
#> $patch
#> [1] 0
#> 
#> $suffixTokens
#> list()
#> 
#> $stable
#> [1] TRUE
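
The two metadata calls above differ in the extraction operator: single brackets return a list of length one that keeps the element name, whereas double brackets return the element itself. A minimal sketch with a plain list, meta, standing in for metadata(loe) follows. The name meta is hypothetical, the values are copied from the output above, and only a subset of the fields is included.

```r
# Plain list standing in for metadata(loe); values are taken from the
# output above and `meta` is a hypothetical name.
meta <- list(
    oncoTreeVersion = "oncotree_2019_12_01",
    apiVersion = list(version = "v1.5.0", major = 1, minor = 5, patch = 0)
)

# `[` keeps the list wrapper and the element name
sub <- meta["oncoTreeVersion"]
class(sub)

# `[[` extracts the element itself, so nested fields can be reached directly
api <- meta[["apiVersion"]]
api$version
```

Double brackets are usually the more convenient choice when the value is passed on to other code, for instance to record the API version alongside an analysis.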

Gene tables

The API allows retrieval of curated genes, with one gene per row:

curatedGenes(oncokb)
#> # A tibble: 933 × 13
#>    grch37Isoform  grch37RefSeq grch38Isoform grch38RefSeq entrezGeneId hugoSymbol oncogene highestSensitiveLevel highestResistanceLevel
#>    <chr>          <chr>        <chr>         <chr>               <int> <chr>      <lgl>    <chr>                 <chr>                 
#>  1 ENST000002657… NM_000927.4  ENST00000622… NM_00134894…         5243 ABCB1      TRUE     ""                    ""                    
#>  2 ENST000003185… NM_005157.4  ENST00000318… NM_005157.4            25 ABL1       TRUE     "1"                   "R1"                  
#>  3 ENST000005027… NM_007314.3  ENST00000502… NM_007314.3            27 ABL2       TRUE     ""                    ""                    
#>  4 ENST000003219… NM_139076.2  ENST00000321… NM_139076.2         84142 ABRAXAS1   FALSE    ""                    ""                    
#>  5 ENST000002729… NM_020311    ENST00000272… NM_020311           57007 ACKR3      TRUE     ""                    ""                    
#>  6 ENST000003319… NM_00119995… ENST00000573… NM_00119995…           71 ACTG1      FALSE    ""                    ""                    
#>  7 ENST000002636… NM_00111106… ENST00000263… NM_00111106…           90 ACVR1      TRUE     ""                    ""                    
#>  8 ENST000002579… NM_004302    ENST00000257… NM_004302              91 ACVR1B     FALSE    ""                    ""                    
#>  9 ENST000002414… NM_001278579 ENST00000241… NM_001278579           92 ACVR2A     TRUE     ""                    ""                    
#> 10 ENST000003813… NM_018702.3  ENST00000381… NM_018702.4           105 ADARB2     TRUE     ""                    ""                    
#> # ℹ 923 more rows
#> # ℹ 4 more variables: summary <chr>, background <chr>, tsg <lgl>, highestResistancLevel <chr>

The cancerGeneList function returns a longer list of genes associated with cancer, in which the same hugoSymbol can appear in multiple rows because of multiple geneAliases:

cancerGeneList(oncokb)
#> # A tibble: 3,275 × 17
#>    hugoSymbol entrezGeneId grch37Isoform   grch37RefSeq   grch38Isoform  grch38RefSeq oncokbAnnotated occurrenceCount mSKImpact mSKHeme
#>    <chr>             <int> <chr>           <chr>          <chr>          <chr>        <lgl>                     <int> <lgl>     <lgl>  
#>  1 ABL1                 25 ENST00000318560 NM_005157.4    ENST000003185… NM_005157.4  TRUE                          7 TRUE      TRUE   
#>  2 ABL1                 25 ENST00000318560 NM_005157.4    ENST000003185… NM_005157.4  TRUE                          7 TRUE      TRUE   
#>  3 ABL1                 25 ENST00000318560 NM_005157.4    ENST000003185… NM_005157.4  TRUE                          7 TRUE      TRUE   
#>  4 AKT1                207 ENST00000349310 NM_001014431.1 ENST000003493… NM_00101443… TRUE                          7 TRUE      TRUE   
#>  5 AKT1                207 ENST00000349310 NM_001014431.1 ENST000003493… NM_00101443… TRUE                          7 TRUE      TRUE   
#>  6 AKT1                207 ENST00000349310 NM_001014431.1 ENST000003493… NM_00101443… TRUE                          7 TRUE      TRUE   
#>  7 AKT1                207 ENST00000349310 NM_001014431.1 ENST000003493… NM_00101443… TRUE                          7 TRUE      TRUE   
#>  8 AKT1                207 ENST00000349310 NM_001014431.1 ENST000003493… NM_00101443… TRUE                          7 TRUE      TRUE   
#>  9 ALK                 238 ENST00000389048 NM_004304.4    ENST000003890… NM_004304.4  TRUE                          7 TRUE      TRUE   
#> 10 AMER1            139285 ENST00000330258 NM_152424.3    ENST000003748… NM_152424.3  TRUE                          7 TRUE      TRUE   
#> # ℹ 3,265 more rows
#> # ℹ 7 more variables: foundation <lgl>, foundationHeme <lgl>, vogelstein <lgl>, sangerCGC <lgl>, geneAliases <list>, tsg <lgl>,
#> #   oncogene <lgl>
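
When one row per gene is needed, the repeated entries can be counted and collapsed. The sketch below rebuilds only the hugoSymbol and entrezGeneId columns of the ten rows printed above in a plain data frame, genes, a hypothetical stand-in for the result of cancerGeneList(oncokb). It assumes that the repeated rows differ only in columns that are not needed downstream.

```r
# Stand-in for the first ten rows of cancerGeneList(oncokb) shown above,
# restricted to two columns; `genes` is a hypothetical name.
genes <- data.frame(
    hugoSymbol = c(rep("ABL1", 3), rep("AKT1", 5), "ALK", "AMER1"),
    entrezGeneId = c(rep(25L, 3), rep(207L, 5), 238L, 139285L),
    stringsAsFactors = FALSE
)

# Number of rows per gene symbol
counts <- table(genes$hugoSymbol)
counts

# Collapse to one row per gene
genes_unique <- unique(genes[, c("hugoSymbol", "entrezGeneId")])
nrow(genes_unique)
```

Checking the counts first shows how much duplication is present before any rows are dropped.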

Session Information

sessionInfo()
R version 4.5.0 Patched (2025-04-15 r88148)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] BiocStyle_2.37.0    S4Vectors_0.47.0    BiocGenerics_0.55.0 generics_0.1.3      oncoKBData_0.99.4   AnVIL_1.21.3
[7] AnVILBase_1.3.1     dplyr_1.1.4         colorout_1.3-2

loaded via a namespace (and not attached):
 [1] utf8_1.2.4           rappdirs_0.3.3       futile.options_1.0.1 digest_0.6.37        magrittr_2.0.3       evaluate_1.0.3
 [7] fastmap_1.2.0        rprojroot_2.0.4      jsonlite_2.0.0       processx_3.8.6       pkgbuild_1.4.7       ps_1.9.1
[13] formatR_1.14         promises_1.3.2       BiocManager_1.30.25  httr_1.4.7           purrr_1.0.4          rapiclient_0.1.8
[19] codetools_0.2-20     httr2_1.1.2          cli_3.6.5            shiny_1.10.0         rlang_1.1.6          futile.logger_1.4.3
[25] remotes_2.5.0        yaml_2.3.10          BiocBaseUtils_1.11.0 tools_4.5.0          httpuv_1.6.16        DT_0.33
[31] lambda.r_1.2.4       curl_6.2.2           vctrs_0.6.5          R6_2.6.1             mime_0.13            lifecycle_1.0.4
[37] fs_1.6.6             htmlwidgets_1.6.4    usethis_3.1.0        miniUI_0.1.2         pkgconfig_2.0.3      desc_1.4.3
[43] callr_3.7.6          clipr_0.8.0          pillar_1.10.2        later_1.4.2          rsconnect_1.3.4      glue_1.8.0
[49] Rcpp_1.0.14          xfun_0.52            tibble_3.2.1         tidyselect_1.2.1     rstudioapi_0.17.1    knitr_1.50
[55] xtable_1.8-4         htmltools_0.5.8.1    rmarkdown_2.29       compiler_4.5.0