A few functions are available to search for build versions, either from NCBI or UCSC.
- translateBuild: translates between UCSC and NCBI build versions
- extractBuild: uses grep patterns to find the first build within the string input
- uniformBuilds: replaces build occurrences below a threshold level of occurrence with the alternative build
- correctBuild: ensures that the build annotation is correct based on the NCBI/UCSC website. If not, use translateBuild with the indicated 'style' input
- isCorrect: checks whether the build is exactly as annotated
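The pattern search described for extractBuild can be pictured with a small base R sketch. This is not the package's implementation: the helper name `find_first_build` and the regular expression are assumptions made for illustration, and the real function may recognise more naming variants.

```r
# Hypothetical helper (not part of the package): return the first substring
# that looks like a UCSC ("hg19") or NCBI ("GRCh37") build name.
find_first_build <- function(string) {
    stopifnot(is.character(string), length(string) == 1L)
    # Assumed pattern: "grch" or "hg" followed by digits, any letter case.
    pattern <- "(grch|hg)[0-9]+"
    hit <- regexpr(pattern, string, ignore.case = TRUE)
    if (hit[1L] == -1L)
        return(NA_character_)  # no build-like token found
    regmatches(string, hit)
}

find_first_build("sample.nocnv_grch38.seg.txt")
find_first_build("calls_hg19_chr1.tsv")
find_first_build("no_build_here.txt")
```

The sketch returns the match in its original letter case, so a follow-up call to correctBuild would still be needed to normalise it.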
Arguments
- from
character() A vector of build versions, typically from genome() (e.g., "37"). The build vector must be homogeneous (i.e., length(unique(x)) == 1L).
- to
character(1) The name of the desired build version (either "UCSC" or "NCBI"; default: "UCSC")
- build
A vector of build version names (default: c("UCSC", "NCBI"))
- style
character(1) The annotation style, either 'UCSC' or 'NCBI'
- string
A single character string
- builds
A character vector of builds
- cutoff
numeric(1L) An inclusive threshold tolerance value for missing values and translating builds that are below the threshold
- na
character() The values to be considered as missing (default: c("", "NA"))
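The interplay of the cutoff and na arguments can be illustrated with a rough base R sketch. Everything here is an assumption for illustration: the helper name `make_uniform`, the default cutoff of 0.2, and the choice to overwrite rare entries with the most frequent build rather than translating them.

```r
# Hypothetical sketch (not the package's code): make a build vector uniform
# by overwriting values whose share is at or below `cutoff`.
make_uniform <- function(builds, cutoff = 0.2, na = c("", "NA")) {
    # Treat the user-specified placeholders as missing values.
    builds[builds %in% na] <- NA_character_
    # Counts of non-missing builds; table() drops NA by default.
    counts <- table(builds)
    stopifnot(length(counts) > 0L)
    major <- names(counts)[which.max(counts)]
    # Share of every value, with missing entries counted as their own group.
    shares <- prop.table(table(builds, useNA = "ifany"))
    rare <- names(shares)[as.vector(shares) <= cutoff]
    # Inclusive threshold: values at or below the cutoff are replaced.
    builds[builds %in% rare] <- major
    builds
}

make_uniform(rep(c("GRCh37", "hg19"), times = c(5, 1)))
make_uniform(c(rep(c("GRCh37", "hg19"), times = c(5, 1)), "NA"))
make_uniform(c("GRCh37", "GRCh37", "hg19"))
```

Under these assumptions the first two calls return only "GRCh37", while the third leaves "hg19" in place because its share (one third) is above the cutoff.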
Value
translateBuild: A character vector of translated genome builds
extractBuild: A character string of the build information available
uniformBuilds: A character vector of builds where all builds are identical, i.e., `identical(length(unique(build)), 1L)`
correctBuild: A character string of the 'corrected' build name
isCorrect: A logical indicating if the build is exactly as annotated
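As a mental model for the translated values, the mapping can be sketched as a two-column lookup. The table `build_map` and the function `translate_sketch` below are hypothetical; only the three pairs that appear in this page's examples or are otherwise well established are listed, and the package's real mapping may cover more builds.

```r
# Hypothetical lookup table (not the package's data); build names follow the
# spelling used in the examples on this page.
build_map <- data.frame(
    UCSC = c("hg17", "hg19", "hg38"),
    NCBI = c("GRCh35", "GRCh37", "GRCh38"),
    stringsAsFactors = FALSE
)

translate_sketch <- function(from, to = c("UCSC", "NCBI")) {
    to <- match.arg(to)
    other <- setdiff(names(build_map), to)
    # Case-insensitive match against the style we are translating from.
    idx <- match(tolower(from), tolower(build_map[[other]]))
    out <- build_map[[to]][idx]
    # Inputs already in the target style are returned in canonical case.
    same <- match(tolower(from), tolower(build_map[[to]]))
    out[is.na(out)] <- build_map[[to]][same][is.na(out)]
    out  # unknown builds stay NA
}

translate_sketch("GRCh35", "UCSC")
translate_sketch("hg19", "NCBI")
```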
Details
The correctBuild function takes the input and ensures that the build name matches the specified style. Otherwise, it returns the build in the correct style for use with seqlevelsStyle.
Currently, the function does not support patched builds (e.g., 'GRCh38.p13'). Build names are taken from the website:
https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.26/
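Because patched builds are not supported, one possible workaround is to drop the patch suffix before passing the name on. The helper `strip_patch` below is a hypothetical base R sketch, not a function from the package, and whether discarding the patch level is acceptable depends on the analysis.

```r
# Hypothetical pre-processing step: remove a trailing patch suffix such as
# ".p13" so that only the major build name remains.
strip_patch <- function(build) {
    sub("\\.p[0-9]+$", "", build)
}

strip_patch(c("GRCh38.p13", "GRCh37", "hg19"))
```

The cleaned value could then be passed to correctBuild, for example `correctBuild(strip_patch("GRCh38.p13"), "NCBI")`.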
Examples
translateBuild("GRCh35", "UCSC")
#> [1] "hg17"
correctBuild("grch38", "NCBI")
#> [1] "GRCh38"
correctBuild("hg19", "NCBI")
#> [1] "GRCh37"
isCorrect("GRCh38", "NCBI")
#> [1] TRUE
isCorrect("hg19", "UCSC")
#> [1] TRUE
extractBuild(
"SCENA_p_TCGAb29and30_SNP_N_GenomeWideSNP_6_G05_569110.nocnv_grch38.seg.txt"
)
#> NCBI
#> "grch38"
buildvec <- rep(c("GRCh37", "hg19"), times = c(5, 1))
uniformBuilds(buildvec)
#> [1] "GRCh37" "GRCh37" "GRCh37" "GRCh37" "GRCh37" "GRCh37"
navec <- c(rep(c("GRCh37", "hg19"), times = c(5, 1)), "NA")
uniformBuilds(navec)
#> [1] "GRCh37" "GRCh37" "GRCh37" "GRCh37" "GRCh37" "GRCh37" "GRCh37"