In CNV analysis, it is often of interest to summarize individual calls across the population, (i.e. to define CNV regions), for subsequent association analysis with e.g. phenotype data.
Usage
populationRanges(
grl,
mode = c("density", "RO"),
density = 0.1,
ro.thresh = 0.5,
multi.assign = FALSE,
verbose = FALSE,
min.size = 2,
classify.ranges = TRUE,
type.thresh = 0.1,
est.recur = FALSE
)
Arguments
- grl
A
GRangesList
.- mode
Character. Should population ranges be computed based on regional density ("density") or reciprocal overlap ("RO"). See Details.
- density
Numeric. Defaults to 0.1.
- ro.thresh
Numeric. Threshold for reciprocal overlap required for merging two overlapping regions. Defaults to 0.5.
- multi.assign
Logical. Allow regions to be assigned to several region clusters? Defaults to
FALSE
.- verbose
Logical. Report progress messages? Defaults to
FALSE
.- min.size
Numeric. Minimum size of a summarized region to be included. Defaults to 2 bp.
- classify.ranges
Logical. Should CNV frequency (number of samples overlapping the region) and CNV type (gain, loss, or both) be annotated? Defaults to
TRUE
.- type.thresh
Numeric. Required minimum relative frequency of each CNV type (gain / loss) to be taken into account when assigning CNV type to a region. Defaults to 0.1. That means for a region overlapped by individual gain and loss calls that both types must be present in >10 in order to be typed as 'both'. If gain or loss calls are present below the threshold they are ignored.
- est.recur
Logical. Should recurrence of regions be assessed via a permutation test? Defaults to
FALSE
. See Details.
Value
A GRanges
object containing the summarized
CNV ranges.
Details
CNVRuler procedure that trims region margins based on regional density
Trims low-density areas (usually <10% of the total contributing individual calls within a summarized region).
An illustration of the concept can be found here: https://www.ncbi.nlm.nih.gov/pubmed/22539667 (Figure 1)
Reciprocal overlap (RO) approach (e.g. Conrad et al., Nature, 2010)
Reciprocal overlap of 0.51 between two genomic regions A and B:
requires that B overlaps at least 51% of A, *and* that A also overlaps at least 51% of B
Approach:
At the top level of the hierarchy, all contiguous bases overlapping at least 1bp of individual calls are merged into one region. Within each region, we further define reciprocally overlapping regions with the following algorithm:
Calculate reciprocal overlap (RO) between all remaining calls.
Identify pair of calls with greatest RO. If RO > threshold, merge and create a new CNV. If not, exit.
Continue adding unclustered calls to the region, in order of best overlap. In order to add a call, the new call must have > threshold to all calls within the region to be added. When no additional calls may be added, move to next step.
If calls remain, return to 1. Otherwise exit.
GISTIC procedure (Beroukhim et al., PNAS, 2007) to identify recurrent CNV regions
GISTIC scores each CNV region with a G-score that is proportional to the total magnitude of CNV calls in each CNV region. In addition, by permuting the locations in each sample, GISTIC determines the frequency with which a given score would be attained if the events were due to chance and therefore randomly distributed. A significance threshold can then be used to determine scores / regions that are unlikely to occur by chance alone.
References
Kim et al. (2012) CNVRuler: a copy number variation-based case-control association analysis tool. Bioinformatics, 28(13):1790-2.
Conrad et al. (2010) Origins and functional impact of copy number variation in the human genome. Nature, 464(7289):704-12.
Beroukhim et al. (2007) Assessing the significance of chromosomal aberrations in cancer: methodology and application to glioma. PNAS, 104(50):20007-12.
Examples
grl <- GRangesList(
sample1 = GRanges( c("chr1:1-10", "chr2:15-18", "chr2:25-34") ),
sample2 = GRanges( c("chr1:1-10", "chr2:11-18" , "chr2:25-36") ),
sample3 = GRanges( c("chr1:2-11", "chr2:14-18", "chr2:26-36") ),
sample4 = GRanges( c("chr1:1-12", "chr2:18-35" ) ),
sample5 = GRanges( c("chr1:1-12", "chr2:11-17" , "chr2:26-34") ) ,
sample6 = GRanges( c("chr1:1-12", "chr2:12-18" , "chr2:25-35") )
)
# default as chosen in the original CNVRuler procedure
populationRanges(grl, density=0.1, classify.ranges=FALSE)
#> GRanges object with 3 ranges and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr1 1-12 *
#> [2] chr2 11-18 *
#> [3] chr2 25-36 *
#> -------
#> seqinfo: 2 sequences from an unspecified genome; no seqlengths
# density = 0 merges all overlapping regions,
# equivalent to: reduce(unlist(grl))
populationRanges(grl, density=0, classify.ranges=FALSE)
#> GRanges object with 2 ranges and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr1 1-12 *
#> [2] chr2 11-36 *
#> -------
#> seqinfo: 2 sequences from an unspecified genome; no seqlengths
# density = 1 disjoins all overlapping regions,
# equivalent to: disjoin(unlist(grl))
populationRanges(grl, density=1, classify.ranges=FALSE)
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr1 2-10 *
#> -------
#> seqinfo: 2 sequences from an unspecified genome; no seqlengths
# RO procedure
populationRanges(grl, mode="RO", ro.thresh=0.5, classify.ranges=FALSE)
#> GRanges object with 5 ranges and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr1 1-12 *
#> [2] chr2 11-18 *
#> [3] chr2 15-18 *
#> [4] chr2 18-35 *
#> [5] chr2 25-36 *
#> -------
#> seqinfo: 2 sequences from an unspecified genome; no seqlengths