Read exon-level expression files and create a GRangesList
Source: R/makeGRangesListFromExonFiles.R
makeGRangesListFromExonFiles.Rd
This function serves to read exon-level expression data. It works for exon quantification (raw counts and RPKM) and junction quantification (raw counts) file paths and represents such data as a GRangesList. The data files can be downloaded via the Genomic Data Commons (GDC) Legacy Archive.
Usage
makeGRangesListFromExonFiles(
filepaths,
sampleNames = NULL,
fileNames = basename(filepaths),
getBarcodes = TRUE,
rangesColumn = "exon",
nrows = Inf
)
Arguments
- filepaths
character() vector of file paths containing TCGA exon data usually obtained from the GDC
- sampleNames
character() vector of TCGA barcodes to be used as names for the
GRangesList
output (default NULL)- fileNames
character() vector of file names as downloaded from the Genomic Data Commons Legacy archive (default
basename(filepaths)
)- getBarcodes
logical(1). Whether to query the GDC API with the
filenameToBarcode
and obtain the TCGA barcodes from the file names (default TRUE); see details.- rangesColumn
character(1). The name of the column in the data containing the ranges information (default "exon"); see details.
- nrows
numeric(1). The number of rows to return from each of the files read in (all rows by default; default Inf)
Value
A GRangesList object
Details
The rangesColumn
name in the GDC data files is usually "exon"
but can be changed with the rangesColumn
argument, if different.
To avoid programmatically obtaining TCGA barcodes from the GDC
API, set the getBarcodes
to FALSE
. When getBarcodes
is set to
FALSE
, the file names are used to name the elements of the GRangesList
output.
Examples
## Load example file found in package
pkgDir <- system.file("extdata", package = "TCGAutils", mustWork = TRUE)
exonFile <- list.files(pkgDir, pattern = "cation\\.txt$", full.names = TRUE)
filePrefix <- "unc.edu.32741f9a-9fec-441f-96b4-e504e62c5362.1755371."
## Add actual file name manually (due to Windows OS restriction)
makeGRangesListFromExonFiles(exonFile,
fileNames = paste0(filePrefix, basename(exonFile)),
sampleNames = "TCGA-AA-3678-01A-01R-0905-07")
#>
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#> exon = col_character(),
#> raw_counts = col_double(),
#> median_length_normalized = col_double(),
#> RPKM = col_double()
#> )
#> GRangesList object of length 1:
#> $`TCGA-AA-3678-01A-01R-0905-07`
#> GRanges object with 100 ranges and 3 metadata columns:
#> seqnames ranges strand | raw_counts median_length_normalized
#> <Rle> <IRanges> <Rle> | <numeric> <numeric>
#> [1] chr1 11874-12227 + | 4 0.492918
#> [2] chr1 12595-12721 + | 2 0.341270
#> [3] chr1 12613-12721 + | 2 0.398148
#> [4] chr1 12646-12697 + | 2 0.372549
#> [5] chr1 13221-14409 + | 39 0.632997
#> ... ... ... ... . ... ...
#> [96] chr1 881782-881925 - | 179 1
#> [97] chr1 883511-883612 - | 151 1
#> [98] chr1 883870-883983 - | 155 1
#> [99] chr1 886507-886618 - | 144 1
#> [100] chr1 887380-887519 - | 158 1
#> RPKM
#> <numeric>
#> [1] 0.322477
#> [2] 0.449436
#> [3] 0.523655
#> [4] 1.097661
#> [5] 0.936105
#> ... ...
#> [96] 35.4758
#> [97] 42.2492
#> [98] 38.8033
#> [99] 36.6933
#> [100] 32.2085
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
#>