Evaluation of enrichment methods on random gene sets

This function evaluates the proportion of rejected null hypotheses, i.e. the fraction of significant gene sets, that an enrichment method reports when applied to random gene sets of a defined size.
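As a rough illustration of the quantity being estimated, the following base-R sketch draws random gene sets of a fixed size from simulated gene-level statistics with no true differential expression and reports how many of them come out significant. The helper randomSetRejection() and its per-set z-test are hypothetical stand-ins chosen for brevity. They are not camera or any other method from sbeaMethods(), and this is not how evalRandomGS is implemented internally. For a well-calibrated test, the result should be close to alpha, i.e. around 5%.

```r
# Hypothetical helper (NOT part of GSEABenchmarkeR): estimates the percentage
# (or proportion) of random gene sets that a simple test calls significant.
randomSetRejection <- function(stats, nr.gs = 100, set.size = 5,
                               alpha = 0.05, perc = TRUE) {
  pvals <- vapply(seq_len(nr.gs), function(i) {
    idx <- sample(length(stats), set.size)   # draw one random gene set
    # Stand-in test: for N(0, 1) statistics, the scaled set mean is
    # approximately N(0, 1) under the null
    z <- mean(stats[idx]) * sqrt(set.size)
    2 * pnorm(-abs(z))                       # two-sided p-value
  }, numeric(1))
  prop <- mean(pvals < alpha)                # fraction of rejected nulls
  if (perc) 100 * prop else prop
}

set.seed(1)
stats <- rnorm(1000)   # simulated gene-level statistics, no true DE
randomSetRejection(stats)
```

Because every gene set is random, any rejection here is a false positive. Values well above alpha would point to an anti-conservative method.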

Usage

evalRandomGS(
  method,
  se,
  nr.gs = 100,
  set.size = 5,
  alpha = 0.05,
  padj = "none",
  perc = TRUE,
  reps = 100,
  rep.block.size = -1,
  summarize = TRUE,
  save2file = FALSE,
  out.dir = NULL,
  ...
)

Arguments

method: Enrichment analysis method. Either a character scalar chosen from sbeaMethods() or nbeaMethods(), or a user-defined function implementing a method for enrichment analysis.
se: An expression dataset of class SummarizedExperiment.
nr.gs: Integer. Number of random gene sets. Defaults to 100.
set.size: Integer. Gene set size, i.e. the number of genes in each random gene set. Defaults to 5.
alpha: Numeric. Statistical significance level. Defaults to 0.05.
padj: Character. Method for adjusting p-values for multiple testing. For available methods, see the man page of the stats function p.adjust. Defaults to "none".
perc: Logical. Should the percentage (between 0 and 100, default) or the proportion (between 0 and 1) of significant gene sets be returned?
reps: Integer. Number of replications. Defaults to 100.
rep.block.size: Integer. When running in parallel, splits reps into blocks of the indicated size. Defaults to -1, which means that reps is not partitioned.
summarize: Logical. If TRUE (default), returns the mean (mean) and the standard deviation (sd) of the proportion of significant gene sets across the reps replications. Use FALSE to return the full vector storing the proportion of significant gene sets for each replication.
save2file: Logical. Should results be saved to file for subsequent benchmarking? Defaults to FALSE.
out.dir: Character. Output directory to which results are saved. Defaults to NULL, in which case results are written to tools::R_user_dir("GSEABenchmarkeR") if save2file is TRUE.
...: Additional arguments passed to the selected enrichment method.
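The interplay of alpha, padj and perc can be sketched with stats::p.adjust on a handful of p-values. The helper propSignificant() and the p-values are made up for illustration. The sketch only assumes that a gene set counts as significant when its (adjusted) p-value falls below alpha, which is the usual convention but is not spelled out on this page.

```r
# Hypothetical helper (NOT part of GSEABenchmarkeR): percentage (or proportion)
# of gene sets whose adjusted p-value falls below alpha.
propSignificant <- function(pvals, alpha = 0.05, padj = "none", perc = TRUE) {
  adj <- p.adjust(pvals, method = padj)   # "none" leaves p-values unchanged
  prop <- mean(adj < alpha)
  if (perc) 100 * prop else prop
}

pvals <- c(0.001, 0.008, 0.02, 0.045, 0.3)   # made-up p-values for 5 gene sets
propSignificant(pvals)                        # 80
propSignificant(pvals, padj = "BH")           # 60
propSignificant(pvals, padj = "bonferroni")   # 40
propSignificant(pvals, perc = FALSE)          # 0.8
```

Stricter adjustments reject fewer null hypotheses, so the reported percentage for random gene sets shrinks as padj moves from "none" to "BH" to "bonferroni".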

Value

A named numeric vector of length 2 storing the mean and standard deviation of the proportion of significant gene sets across the reps replications (summarize=TRUE); or a numeric vector of length reps storing the proportion of significant gene sets for each replication (summarize=FALSE).
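The relationship between the two return shapes can be sketched in base R. The per-replication values below are made up and merely stand in for a summarize=FALSE result with reps = 3. The sketch assumes that sd denotes the usual sample standard deviation as computed by stats::sd.

```r
# Made-up per-replication percentages of significant gene sets,
# i.e. the shape of a summarize = FALSE result with reps = 3
perRep <- c(4, 6, 5)

# Collapsing them into the summarize = TRUE shape:
# a named numeric vector of length 2
summarized <- c(mean = mean(perRep), sd = sd(perRep))
summarized   # mean = 5, sd = 1
```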

Author

Ludwig Geistlinger <Ludwig.Geistlinger@sph.cuny.edu>

Examples


    # loading two datasets from the GEO2KEGG compendium
    geo2kegg <- loadEData("geo2kegg", nr.datasets = 2)
#> Loading GEO2KEGG data compendium ...

    # only considering the first 1000 probes for demonstration
    geo2kegg <- lapply(geo2kegg, function(d) d[1:1000,]) 

    # preprocessing and DE analysis for both datasets
    geo2kegg <- maPreproc(geo2kegg)
#> Summarizing probe level expression ...
    geo2kegg <- runDE(geo2kegg)

    evalRandomGS("camera", geo2kegg[[1]], reps = 3)
#>     mean       sd 
#> 5.333333 1.527525
