R implementation of the LEfSe method

Perform a LEfSe analysis: the function carries out differential analysis between two sample classes for multiple features and uses linear discriminant analysis to establish their effect sizes. Subclass information for each class can be incorporated into the analysis (see examples). Features with large differences between two sample classes are identified as biomarkers.

lefser(
  relab,
  kruskal.threshold = 0.05,
  wilcox.threshold = 0.05,
  lda.threshold = 2,
  classCol = "CLASS",
  subclassCol = NULL,
  assay = 1L,
  trim.names = FALSE,
  checkAbundances = TRUE,
  method = "none",
  ...,
  groupCol,
  blockCol
)

Arguments

relab: A SummarizedExperiment-class with relative abundances in the assay
kruskal.threshold: numeric(1) The p-value for the Kruskal-Wallis Rank Sum Test (default 0.05). If multiple hypothesis testing is performed, this threshold is applied to corrected p-values.
wilcox.threshold: numeric(1) The p-value for the Wilcoxon Rank-Sum Test when 'subclassCol' is present (default 0.05). If multiple hypothesis testing is performed, this threshold is applied to corrected p-values.
lda.threshold: numeric(1) The effect size threshold (default 2.0).
classCol: character(1) Column name in colData(relab) indicating class, usually a factor with two levels (e.g., c("cases", "controls"); default "CLASS").
subclassCol: character(1) Optional column name in colData(relab) indicating the subclasses, usually a factor with two levels (e.g., c("adult", "senior"); default NULL), but can be more than two levels.
assay: The i-th assay matrix in the SummarizedExperiment ('relab'; #' default 1).
trim.names: Default is FALSE. If TRUE, this function extracts the most specific taxonomic rank of organism.
checkAbundances: logical(1) Whether to check if the assay data in the relab input are relative abundances or counts. If counts are found, a warning will be emitted (default TRUE).
method: Default is "none" as in the original LEfSe implementation. Character string of length one, passed on to p.adjust to set option for multiple testing. For multiple pairwise comparisons, each comparison is adjusted separately. Options are "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr" (synonym for "BH"), and "none".
groupCol: (DEFUNCT) Column name in colData(relab) indicating groups, usually a factor with two levels (e.g., c("cases", "controls"); default "GROUP").
blockCol: (DEFUNCT) Optional column name in colData(relab) indicating the blocks, usually a factor with two levels (e.g., c("adult", "senior"); default NULL).
...: Additional inputs to lower level functions (not used).

Value

The function returns a data.frame with two columns, which are names of features and their LDA scores.

Details

The LEfSe method expects relative abundances in the expr input. A warning will be emitted if the column sums do not result in 1. Use the relativeAb helper function to convert the data in the SummarizedExperiment to relative abundances. The checkAbundances argument enables checking the data for presence of relative abundances and can be turned off by setting the argument to FALSE.

Examples


    data(zeller14)
    zeller14 <- zeller14[, zeller14$study_condition != "adenoma"]
    tn <- get_terminal_nodes(rownames(zeller14))
    zeller14tn <- zeller14[tn,]
    zeller14tn_ra <- relativeAb(zeller14tn)

    # (1) Using classes only
    res_class <- lefser(zeller14tn_ra,
                        classCol = "study_condition")
#> The outcome variable is specified as 'study_condition' and the reference category is 'CRC'.
#>  See `?factor` or `?relevel` to change the reference category.
    # (2) Using classes and sub-classes
    res_subclass <- lefser(zeller14tn_ra,
                        classCol = "study_condition",
                        subclassCol = "age_category")
#> The outcome variable is specified as 'study_condition' and the reference category is 'CRC'.
#>  See `?factor` or `?relevel` to change the reference category.