
HGNCREST

Lifecycle: experimental

The HGNCREST package provides functions for querying the REST API of the HUGO Gene Nomenclature Committee (HGNC). The functions follow the HGNC REST API documentation at https://www.genenames.org/help/rest/. The package supports three main operations:

  1. fetching general information about the HGNC database (hgnc_info)
  2. fetching information about a specific gene (hgnc_fetch)
  3. searching for genes based on a query (hgnc_search)
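Each operation corresponds to an endpoint of the HGNC REST service. The sketch below builds the request URLs in base R, assuming the base URL https://rest.genenames.org and the info, fetch/{field}/{value} and search/{field}/{query} paths described in the HGNC REST documentation. hgnc_rest_url() is a hypothetical helper for illustration only; it is not part of HGNCREST and is not necessarily how the package builds its requests.

```r
# Hypothetical helper (not exported by HGNCREST): build an HGNC REST URL.
# Assumes the base URL https://rest.genenames.org and the endpoint layout
# info, fetch/<field>/<value>, search/<field>/<query>.
hgnc_rest_url <- function(operation = c("info", "fetch", "search"),
                          field = NULL, value = NULL) {
    operation <- match.arg(operation)
    base <- "https://rest.genenames.org"
    if (operation == "info") {
        return(paste(base, "info", sep = "/"))
    }
    if (is.null(field) || is.null(value)) {
        stop("'field' and 'value' are required for ", operation)
    }
    # Percent-encode spaces and other unsafe characters in the value
    paste(base, operation, field, utils::URLencode(value), sep = "/")
}

hgnc_rest_url("info")
hgnc_rest_url("fetch", "ena", "BC040926")
hgnc_rest_url("search", "symbol", "BRCA1")
```

According to the HGNC REST documentation, the response format (JSON or XML) is selected with the HTTP Accept header, for example Accept: application/json.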

Installation

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("waldronlab/HGNCREST")

Package load

library(HGNCREST)

General information

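The hgnc_info function returns general information about the HGNC database as a list: the last modification date (lastModified), the number of records (numDoc), the response header, and the searchableFields and storedFields metadata.

hgnc_info()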

## $lastModified
## [1] "2025-05-06T13:38:21.968Z"
## 
## $numDoc
## [1] 45901
## 
## $responseHeader
## $responseHeader$QTime
## [1] 1
## 
## $responseHeader$status
## [1] 0
## 
## 
## $searchableFields
##  [1] "hgnc_id"          "rna_central_id"   "alias_name"       "locus_group"     
##  [5] "symbol"           "location"         "mane_select"      "name"            
##  [9] "rgd_id"           "entrez_id"        "status"           "uniprot_ids"     
## [13] "alias_symbol"     "ccds_id"          "omim_id"          "ucsc_id"         
## [17] "mgd_id"           "prev_symbol"      "curator_notes"    "refseq_accession"
## [21] "ena"              "locus_type"       "ensembl_gene_id"  "vega_id"         
## [25] "prev_name"       
## 
## $storedFields
##  [1] "locus_type"             "horde_id"               "bioparadigms_slc"      
##  [4] "enzyme_id"              "prev_name"              "date_symbol_changed"   
##  [7] "refseq_accession"       "mgd_id"                 "homeodb"               
## [10] "omim_id"                "alias_name"             "gtrnadb"               
## [13] "pubmed_id"              "alias_symbol"           "date_approved_reserved"
## [16] "ccds_id"                "location"               "name"                  
## [19] "uuid"                   "lsdb"                   "status"                
## [22] "alias_name"             "_version_"              "cosmic"                
## [25] "rna_central_id"         "date_name_changed"      "ensembl_gene_id"       
## [28] "vega_id"                "mirbase"                "location"              
## [31] "prev_symbol"            "curator_notes"          "cd"                    
## [34] "mamit-trnadb"           "ena"                    "lncipedia"             
## [37] "snornabase"             "prev_name"              "gene_group"            
## [40] "merops"                 "ucsc_id"                "uniprot_ids"           
## [43] "imgt"                   "symbol"                 "mane_select"           
## [46] "rgd_id"                 "entrez_id"              "date_modified"         
## [49] "lncrnadb"               "gencc"                  "locus_group"           
## [52] "orphanet"               "iuphar"                 "hgnc_id"               
## [55] "agr"                    "gene_group_id"          "pseudogene.org"
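Because hgnc_info returns a plain list, as the output above shows, its elements can be inspected with base R. For example, comparing storedFields with searchableFields shows which fields are returned in results but cannot be used as search keys. The sketch below uses a small hand-copied subset of the fields printed above instead of a live call, so it runs offline; with network access, the list would come from hgnc_info(). Note that the printed storedFields vector contains repeated names (for example alias_name and location).

```r
# Subset of the hgnc_info() output shown above, copied by hand so the
# example runs offline. With network access: info <- hgnc_info()
info <- list(
    lastModified = "2025-05-06T13:38:21.968Z",
    searchableFields = c("hgnc_id", "symbol", "name", "locus_type",
                         "alias_name", "location"),
    storedFields = c("hgnc_id", "symbol", "name", "locus_type",
                     "alias_name", "location", "gene_group",
                     "alias_name", "location", "date_modified")
)

# Stored fields that cannot be used as search keys;
# setdiff() also drops the repeated names in storedFields
stored_only <- setdiff(info$storedFields, info$searchableFields)
stored_only

# Parse the ISO 8601 modification timestamp (UTC, fractional seconds)
last_mod <- as.POSIXct(info$lastModified,
                       format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")
last_mod
```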

Searchable fields

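The searchableFields function is a convenience function that returns only the searchable fields of the HGNC database, as a character vector. These are the fields that can be used to look up or search for genes with hgnc_fetch and hgnc_search.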

##  [1] "ccds_id"          "alias_symbol"     "uniprot_ids"      "status"          
##  [5] "entrez_id"        "rgd_id"           "name"             "mane_select"     
##  [9] "symbol"           "location"         "locus_group"      "alias_name"      
## [13] "hgnc_id"          "rna_central_id"   "prev_name"        "vega_id"         
## [17] "locus_type"       "ensembl_gene_id"  "ena"              "refseq_accession"
## [21] "curator_notes"    "prev_symbol"      "mgd_id"           "ucsc_id"         
## [25] "omim_id"
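Since fetch and search requests are keyed on searchable fields, a misspelled field name can be caught before any request is sent. The helper below, check_searchable(), is hypothetical and not part of HGNCREST; it checks a field against a vector of searchable fields, such as the one returned by searchableFields. A few names from the output above are hard-coded so the example runs offline.

```r
# Hypothetical helper: stop early if 'field' is not a searchable field.
check_searchable <- function(field, fields) {
    if (!is.character(field) || length(field) != 1L) {
        stop("'field' must be a single character string")
    }
    if (!field %in% fields) {
        # Suggest approximate matches to help with typos
        hint <- agrep(field, fields, max.distance = 2, value = TRUE)
        msg <- sprintf("'%s' is not a searchable field", field)
        if (length(hint)) {
            msg <- paste0(msg, "; did you mean: ",
                          paste(hint, collapse = ", "))
        }
        stop(msg, call. = FALSE)
    }
    invisible(field)
}

# Subset of the searchable fields printed above
fields <- c("symbol", "name", "ena", "locus_type", "status", "hgnc_id")

check_searchable("ena", fields)        # passes silently
try(check_searchable("symbl", fields)) # error suggesting "symbol"
```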

Fetching gene information

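The hgnc_fetch function returns a tibble with the stored information for the gene specified by the searchableField and value arguments. In this example, the gene is looked up by its ENA (European Nucleotide Archive) accession number.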

hgnc_fetch("ena", "BC040926")
## # A tibble: 1 × 27
##   date_approved_reserved gene_group_id hgnc_id  alias_symbol date_symbol_changed
##   <chr>                  <list>        <chr>    <list>       <chr>              
## 1 2009-07-20T00:00:00Z   <int [1]>     HGNC:37… <chr [1]>    2010-11-25T00:00:0…
## # ℹ 22 more variables: locus_group <chr>, refseq_accession <list>,
## #   lncipedia <chr>, vega_id <chr>, name <chr>, uuid <chr>, status <chr>,
## #   locus_type <chr>, prev_symbol <list>, ena <list>, symbol <chr>,
## #   entrez_id <chr>, gene_group <list>, date_name_changed <chr>,
## #   ensembl_gene_id <chr>, location_sortable <chr>, prev_name <list>,
## #   ucsc_id <chr>, rna_central_id <list>, date_modified <chr>, location <chr>,
## #   agr <chr>
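Several columns of the fetched tibble, such as alias_symbol, prev_symbol and gene_group, are list columns, because a gene can have several values for each of them. One way to flatten them for export (for example to CSV) is to collapse each element into a single delimited string. The sketch below uses base R and a toy data.frame with placeholder values, not real HGNC records; flatten_list_cols() is a hypothetical helper, and it should apply equally to a tibble, whose list columns are plain lists.

```r
# Toy data with a list column, mimicking the shape of hgnc_fetch() output.
# Values are placeholders, not real HGNC records.
genes <- data.frame(symbol = c("GENE1", "GENE2"))
genes$alias_symbol <- list(c("ALIAS1", "ALIAS2"), character(0))

# Hypothetical helper: collapse every list column into sep-separated
# strings; empty entries become NA
flatten_list_cols <- function(df, sep = "|") {
    is_list <- vapply(df, is.list, logical(1))
    df[is_list] <- lapply(df[is_list], function(col) {
        vapply(col, function(x) {
            if (length(x) == 0L) NA_character_
            else paste(as.character(x), collapse = sep)
        }, character(1))
    })
    df
}

flat <- flatten_list_cols(genes)
flat$alias_symbol
```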

Searching for genes

The hgnc_search function searches for genes that match a query. It returns a data.frame with the HGNC ID (hgnc_id), approved symbol (symbol) and relevance score (score) of each matching gene.

hgnc_search("symbol", "BRCA1")
##     hgnc_id symbol    score
## 1 HGNC:1100  BRCA1 4.694909
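Identifiers in the hgnc_id column have the form HGNC:<number>. If a downstream table stores only the numeric part, the prefix can be stripped with base R. The helper hgnc_id_number() below is hypothetical and not part of HGNCREST; the input rows are copied by hand from search results printed in this document so that the example runs offline.

```r
# Rows copied from hgnc_search() output printed in this document
res <- data.frame(
    hgnc_id = c("HGNC:1100", "HGNC:7110", "HGNC:7111"),
    symbol  = c("BRCA1", "MKNK1", "MKNK2"),
    score   = c(4.694909, 8.771411, 8.771411)
)

# Hypothetical helper: "HGNC:1100" -> 1100L; malformed IDs become NA
hgnc_id_number <- function(id) {
    ok <- grepl("^HGNC:[0-9]+$", id)
    out <- rep(NA_integer_, length(id))
    out[ok] <- as.integer(sub("^HGNC:", "", id[ok]))
    out
}

res$hgnc_num <- hgnc_id_number(res$hgnc_id)
res
```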

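The hgnc_search function also supports more complex queries through the query argument, which accepts a character vector of terms following the HGNC REST API query syntax: wildcards (ZNF*), boolean operators (AND) and field:value filters (status:Approved). The example below lists approved genes whose symbols start with ZNF.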

hgnc_search("symbol", c("ZNF*", "AND", "status:Approved")) |>
    head()
##      hgnc_id       symbol    score
## 1 HGNC:12991         ZNF2 1.018147
## 2 HGNC:13089         ZNF3 1.018147
## 3 HGNC:13139         ZNF7 1.018147
## 4 HGNC:13154         ZNF8 1.018147
## 5 HGNC:55280      ZNF8-DT 1.018147
## 6 HGNC:56757 ZNF8-ERVK3-1 1.018147
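The examples in the HGNC REST documentation write compound queries as a single string in which terms and operators are joined with +, for example ZNF*+AND+status:Approved. Under that assumption, the sketch below shows one way a character vector like the one above could be turned into such a string, percent-encoding spaces inside each term. build_hgnc_query() is hypothetical, and HGNCREST may combine the terms differently.

```r
# Hypothetical helper: join query terms and operators into one HGNC query
# string. Assumes "+" as the separator (as in the HGNC REST documentation
# examples) and percent-encodes characters such as spaces within each term.
build_hgnc_query <- function(terms) {
    stopifnot(is.character(terms), length(terms) >= 1L)
    encoded <- vapply(terms, utils::URLencode, character(1),
                      USE.NAMES = FALSE)
    paste(encoded, collapse = "+")
}

build_hgnc_query(c("ZNF*", "AND", "status:Approved"))
build_hgnc_query(c("MAPK interacting", "AND",
                   "locus_type:gene with protein product"))
```

With reserved = FALSE (the default), utils::URLencode leaves characters such as *, : and + unchanged and encodes spaces as %20.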

Searching for genes based on multiple criteria

In this example, we search for genes whose name matches “MAPK interacting” and whose locus type is “gene with protein product”. Because “locus_type” is a searchable field in the HGNC database, it can be used to filter the search results.

hgnc_search(
    "name",
    c("MAPK interacting", "AND", "locus_type:gene with protein product")
)
##     hgnc_id symbol    score
## 1 HGNC:7110  MKNK1 8.771411
## 2 HGNC:7111  MKNK2 8.771411

Conclusion

The HGNCREST package provides a convenient way to query the HGNC REST API from R. It allows users to fetch information about specific genes, search for genes based on a query, and get general information about the HGNC database.

sessionInfo

## R version 4.5.0 (2025-04-11)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] HGNCREST_0.99.6  httr2_1.1.2      BiocStyle_2.37.0
## 
## loaded via a namespace (and not attached):
##  [1] vctrs_0.6.5          cli_3.6.5            knitr_1.50          
##  [4] rlang_1.1.6          xfun_0.52            textshaping_1.0.1   
##  [7] jsonlite_2.0.0       glue_1.8.0           htmltools_0.5.8.1   
## [10] BiocBaseUtils_1.11.0 ragg_1.4.0           sass_0.4.10         
## [13] rappdirs_0.3.3       rmarkdown_2.29       tibble_3.2.1        
## [16] evaluate_1.0.3       jquerylib_0.1.4      fastmap_1.2.0       
## [19] yaml_2.3.10          lifecycle_1.0.4      bookdown_0.43       
## [22] BiocManager_1.30.25  compiler_4.5.0       fs_1.6.6            
## [25] pkgconfig_2.0.3      htmlwidgets_1.6.4    systemfonts_1.2.3   
## [28] digest_0.6.37        R6_2.6.1             utf8_1.2.5          
## [31] pillar_1.10.2        curl_6.2.2           magrittr_2.0.3      
## [34] bslib_0.9.0          tools_4.5.0          pkgdown_2.1.2       
## [37] cachem_1.1.0         desc_1.4.3