Skip to contents

This function serves to read exon-level expression data. It works for exon quantification (raw counts and RPKM) and junction quantification (raw counts) file paths and represents such data as a GRangesList. The data files can be downloaded via the Genomic Data Commons (GDC) Legacy Archive.

Usage

makeGRangesListFromExonFiles(
  filepaths,
  sampleNames = NULL,
  fileNames = basename(filepaths),
  getBarcodes = TRUE,
  rangesColumn = "exon",
  nrows = Inf
)

Arguments

filepaths

character() vector of file paths containing TCGA exon data usually obtained from the GDC

sampleNames

character() vector of TCGA barcodes to be used as names for the GRangesList output (default NULL)

fileNames

character() vector of file names as downloaded from the Genomic Data Commons Legacy archive (default basename(filepaths))

getBarcodes

logical(1). Whether to query the GDC API with the filenameToBarcode and obtain the TCGA barcodes from the file names (default TRUE); see details.

rangesColumn

character(1). The name of the column in the data containing the ranges information (default "exon"); see details.

nrows

numeric(1). The number of rows to return from each of the files read in (all rows by default; default Inf)

Value

A GRangesList object

Details

The rangesColumn name in the GDC data files is usually "exon" but can be changed with the rangesColumn argument, if different. To avoid programmatically obtaining TCGA barcodes from the GDC API, set the getBarcodes to FALSE. When getBarcodes is set to FALSE, the file names are used to name the elements of the GRangesList output.

Author

M. Ramos

Examples


## Load example file found in package
pkgDir <- system.file("extdata", package = "TCGAutils", mustWork = TRUE)
exonFile <- list.files(pkgDir, pattern = "cation\\.txt$", full.names = TRUE)

filePrefix <- "unc.edu.32741f9a-9fec-441f-96b4-e504e62c5362.1755371."

## Add actual file name manually (due to Windows OS restriction)
makeGRangesListFromExonFiles(exonFile,
    fileNames = paste0(filePrefix, basename(exonFile)),
    sampleNames = "TCGA-AA-3678-01A-01R-0905-07")
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   exon = col_character(),
#>   raw_counts = col_double(),
#>   median_length_normalized = col_double(),
#>   RPKM = col_double()
#> )
#> GRangesList object of length 1:
#> $`TCGA-AA-3678-01A-01R-0905-07`
#> GRanges object with 100 ranges and 3 metadata columns:
#>         seqnames        ranges strand | raw_counts median_length_normalized
#>            <Rle>     <IRanges>  <Rle> |  <numeric>                <numeric>
#>     [1]     chr1   11874-12227      + |          4                 0.492918
#>     [2]     chr1   12595-12721      + |          2                 0.341270
#>     [3]     chr1   12613-12721      + |          2                 0.398148
#>     [4]     chr1   12646-12697      + |          2                 0.372549
#>     [5]     chr1   13221-14409      + |         39                 0.632997
#>     ...      ...           ...    ... .        ...                      ...
#>    [96]     chr1 881782-881925      - |        179                        1
#>    [97]     chr1 883511-883612      - |        151                        1
#>    [98]     chr1 883870-883983      - |        155                        1
#>    [99]     chr1 886507-886618      - |        144                        1
#>   [100]     chr1 887380-887519      - |        158                        1
#>              RPKM
#>         <numeric>
#>     [1]  0.322477
#>     [2]  0.449436
#>     [3]  0.523655
#>     [4]  1.097661
#>     [5]  0.936105
#>     ...       ...
#>    [96]   35.4758
#>    [97]   42.2492
#>    [98]   38.8033
#>    [99]   36.6933
#>   [100]   32.2085
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
#>