Perform a LEfSe analysis: the function carries out differential analysis between two sample groups for multiple microorganisms and uses linear discirminant analysis to establish their effect sizes. Subclass information for each class can be incorporated into the analysis (see examples). Microorganisms with large differences between two sample groups are identified as biomarkers.

lefser(
  relab,
  kruskal.threshold = 0.05,
  wilcox.threshold = 0.05,
  lda.threshold = 2,
  groupCol = "GROUP",
  blockCol = NULL,
  assay = 1L,
  trim.names = FALSE,
  checkAbundances = TRUE,
  ...,
  expr
)

Arguments

relab

A SummarizedExperiment with relative abundances in the assay

kruskal.threshold

numeric(1) The p-value for the Kruskal-Wallis Rank Sum Test (default 0.05).

wilcox.threshold

numeric(1) The p-value for the Wilcoxon Rank-Sum Test when 'blockCol' is present (default 0.05).

lda.threshold

numeric(1) The effect size threshold (default 2.0).

groupCol

character(1) Column name in colData(relab) indicating groups, usually a factor with two levels (e.g., c("cases", "controls"); default "GROUP").

blockCol

character(1) Optional column name in colData(relab) indicating the blocks, usually a factor with two levels (e.g., c("adult", "senior"); default NULL).

assay

The i-th assay matrix in the SummarizedExperiment ('relab'; default 1).

trim.names

If TRUE extracts the most specific taxonomic rank of organism.

checkAbundances

logical(1) Whether to check if the assay data in the relab input are relative abundances or counts. If counts are found, a warning will be emitted (default TRUE).

expr

(deprecated) Use relab instead. A SummarizedExperiment with relative abundances in the assay

...

Additional inputs to lower level functions (not used).

Value

The function returns a data.frame with two columns, which are names of microorganisms and their LDA scores.

Details

The LEfSe method expects relative abundances in the expr input. A warning will be emitted if the column sums do not result in 1. Use the relativeAb helper function to convert the data in the SummarizedExperiment to relative abundances. The checkAbundances argument enables checking the data for presence of relative abundances and can be turned off by setting the argument to FALSE.

Examples

    # (1) Using classes only
    data(zeller14)
    # exclude 'adenoma'
    zeller14 <- zeller14[, zeller14$study_condition != "adenoma"]
    res_group <- lefser(zeller14, groupCol = "study_condition")
#> Warning: Convert counts to relative abundances with 'relativeAb()'
#> The outcome variable is specified as 'study_condition' and the reference category is 'CRC'.
#>  See `?factor` or `?relevel` to change the reference category.
    head(res_group)
#>                                                                                                                                 Names
#> 1                                                                                         k__Bacteria|p__Bacteroidetes|c__Bacteroidia
#> 2                                                                        k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales
#> 3                                                                                                        k__Bacteria|p__Bacteroidetes
#> 4                                                  k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Porphyromonadaceae
#> 5                  k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Lachnospiraceae|g__Coprococcus|s__Coprococcus_eutactus
#> 6 k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Lachnospiraceae|g__Coprococcus|s__Coprococcus_eutactus|t__GCF_000154425
#>      scores
#> 1 -6.364356
#> 2 -6.364356
#> 3 -6.364325
#> 4 -5.791754
#> 5 -5.622091
#> 6 -5.566016

    # (2) Using classes and sublasses
    data(zeller14)
    # exclude 'adenoma'
    zeller14 <- zeller14[, zeller14$study_condition != "adenoma"]
    res_block <- lefser(
         zeller14, groupCol = "study_condition", blockCol = "age_category"
    )
#> Warning: Convert counts to relative abundances with 'relativeAb()'
#> The outcome variable is specified as 'study_condition' and the reference category is 'CRC'.
#>  See `?factor` or `?relevel` to change the reference category.
    head(res_block)
#>                                                                                                                                                                                Names
#> 1                                                                               k__Bacteria|p__Fusobacteria|c__Fusobacteriia|o__Fusobacteriales|f__Fusobacteriaceae|g__Fusobacterium
#> 2                                                                                                k__Bacteria|p__Fusobacteria|c__Fusobacteriia|o__Fusobacteriales|f__Fusobacteriaceae
#> 3                                                                                k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Porphyromonadaceae|g__Porphyromonas
#> 4 k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Porphyromonadaceae|g__Porphyromonas|s__Porphyromonas_asaccharolytica|t__Porphyromonas_asaccharolytica_unclassified
#> 5                                               k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Porphyromonadaceae|g__Porphyromonas|s__Porphyromonas_asaccharolytica
#> 6                                                                                                                    k__Bacteria|p__Fusobacteria|c__Fusobacteriia|o__Fusobacteriales
#>      scores
#> 1 -5.233182
#> 2 -5.212977
#> 3 -5.039454
#> 4 -4.987723
#> 5 -4.932632
#> 6 -4.840816