Differential expression analysis for datasets of a compendium

This function applies selected methods for differential expression (DE) analysis to selected datasets of an expression data compendium.

Usage

runDE(
  exp.list,
  de.method = c("limma", "edgeR", "DESeq2"),
  padj.method = "flexible",
  parallel = NULL,
  ...
)

metaFC(exp.list, max.na = round(length(exp.list)/3))

writeDE(exp.list, out.dir = NULL)

plotDEDistribution(exp.list, alpha = 0.05, beta = 1)

plotNrSamples(exp.list)

Arguments

exp.list: Experiment list. A list of datasets, each being of class SummarizedExperiment.
de.method: Differential expression method. See documentation of deAna.
padj.method: Method for adjusting p-values for multiple testing. For available methods see the man page of the stats function p.adjust. Defaults to 'flexible', which applies a dataset-specific correction strategy. See details.
parallel: Parallel computation mode. An instance of class BiocParallelParam. See the vignette of the BiocParallel package for switching between serial, multi-core, and grid execution. Defaults to NULL, which then uses the first element of BiocParallel::registered() for execution. If not changed by the user, this accordingly defaults to multi-core execution on the local host.
...: Additional arguments passed to EnrichmentBrowser::deAna.
max.na: Integer. Determines for which genes a meta fold change is computed. By default, excludes genes for which the fold change is not annotated in >= 1/3 of the datasets in exp.list.
out.dir: Character. Determines the output directory where DE results for each dataset are written to. Defaults to NULL, which then writes to a subdir named 'de' in tools::R_user_dir("GSEABenchmarkeR").
alpha: Statistical significance level. Defaults to 0.05.
beta: Absolute log2 fold change cut-off. Defaults to 1 (2-fold).

Value

runDE returns exp.list with DE measures annotated to the rowData slot of each dataset. writeDE writes to file. plotDEDistribution plots to a graphics device.

Details

DE studies typically report a gene as differentially expressed if the corresponding DE p-value, corrected for multiple testing, falls below the chosen significance level. Enrichment methods that work directly on the list of DE genes are therefore substantially influenced by the multiple testing correction.

An example is the frequently used over-representation analysis (ORA), which assesses the overlap between the DE genes and a gene set under study based on the hypergeometric distribution (see Appendix A of the EnrichmentBrowser vignette for an introduction).
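The overlap test described above can be sketched in base R. This is a minimal illustration of a one-sided hypergeometric test, not the implementation used by the package; the function name ora_pvalue and its arguments are hypothetical.

```r
# Hypothetical helper (not part of GSEABenchmarkeR): one-sided
# hypergeometric test for the overlap of DE genes and a gene set.
ora_pvalue <- function(de.genes, gene.set, universe)
{
    # restrict both gene lists to the measured genes
    de.genes <- intersect(de.genes, universe)
    gene.set <- intersect(gene.set, universe)

    k <- length(intersect(de.genes, gene.set))  # DE genes in the set
    m <- length(gene.set)                       # genes in the set
    n <- length(universe) - m                   # genes not in the set
    d <- length(de.genes)                       # number of DE genes

    # P(X >= k) for X ~ Hypergeometric(m, n, d)
    stats::phyper(k - 1, m, n, d, lower.tail = FALSE)
}

universe <- paste0("g", 1:100)
gene.set <- paste0("g", 1:10)
de.genes <- paste0("g", 1:5)
p <- ora_pvalue(de.genes, gene.set, universe)
```

Here all five DE genes fall into the gene set, so the p-value is the probability of drawing five set members in five draws from the universe.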

ORA is inapplicable if there are few genes satisfying the significance threshold, or if almost all genes are DE.

Using padj.method="flexible" accounts for these cases by applying multiple testing correction depending on the degree of differential expression:

the correction method from Benjamini and Hochberg (BH) is applied if it renders >= 1% and <= 25% of all measured genes as DE,
the p-values are left unadjusted if the BH correction results in < 1% DE genes, and
the more stringent Bonferroni correction is applied if the BH correction results in > 25% DE genes.
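The three rules above can be sketched with stats::p.adjust. This is a minimal illustration under stated assumptions, not the package's implementation: the function name flexible_padj is hypothetical, and calling a gene DE when its BH-adjusted p-value is below alpha = 0.05 is an assumption made here for the sketch.

```r
# Hypothetical helper (not part of GSEABenchmarkeR) illustrating the
# flexible correction strategy described above.
# Assumption: a gene counts as DE if its BH-adjusted p-value < alpha.
flexible_padj <- function(pvals, alpha = 0.05,
                          min.frac = 0.01, max.frac = 0.25)
{
    bh <- stats::p.adjust(pvals, method = "BH")
    frac.de <- mean(bh < alpha, na.rm = TRUE)

    # < 1% DE genes under BH: leave p-values unadjusted
    if (frac.de < min.frac) return(pvals)

    # > 25% DE genes under BH: apply the more stringent Bonferroni
    if (frac.de > max.frac)
        return(stats::p.adjust(pvals, method = "bonferroni"))

    # otherwise: keep the BH correction
    bh
}

few.de  <- seq(0.2, 1, length.out = 100)    # no gene passes BH
some.de <- c(rep(1e-6, 10), rep(0.5, 90))   # 10% pass BH
many.de <- c(rep(1e-6, 50), rep(0.5, 50))   # 50% pass BH

res.few  <- flexible_padj(few.de)
res.some <- flexible_padj(some.de)
res.many <- flexible_padj(many.de)
```

In this sketch, few.de is returned unadjusted, some.de receives the BH correction, and many.de receives the Bonferroni correction.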

Note that the resulting p-values should not be used for assessing the statistical significance of DE genes within or between datasets. They are solely used to determine which genes are included in the analysis with ORA, where the flexible correction ensures that the fraction of included genes is roughly of the same order of magnitude across datasets.

Alternative strategies could also be applied, such as taking a constant number of genes for each dataset or excluding ORA methods from the assessment altogether.

Author

Ludwig Geistlinger <Ludwig.Geistlinger@sph.cuny.edu>

Examples


    # reading user-defined expression data from file
    data.dir <- system.file("extdata/myEData", package="GSEABenchmarkeR")
    edat <- loadEData(data.dir)

    # differential expression analysis
    edat <- runDE(edat)

    # visualization of per-dataset DE distribution
    plotDEDistribution(edat)


    # calculating meta fold changes across datasets 
    mfcs <- metaFC(edat, max.na=0) 

    # writing DE results to file
    out.dir <- tempdir()
    out.dir <- file.path(out.dir, "de")
    if(!dir.exists(out.dir)) dir.create(out.dir)
 
    writeDE(edat, out.dir)