vignettes/lefser.Rmd
lefser.Rmd
Lefser is metagenomic biomarker discovery tool that is based on LEfSe tool and is
published by Huttenhower
et al. 2011. Lefser
is the R implementation of the
LEfSe
method.
Using statistical analyses, lefser
compares microbial
populations of healthy and diseased subjects to discover differencially
expressed microorganisms. Lefser
than computes effect size,
which estimates magnitude of differential expression between the
populations for each differentially expressed microorganism. Subclasses
of classes can also be assigned and used within the analysis.
To install Bioconductor and the lefser
package, run the
following commands.
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("lefser")
Then load the lefser
package.
lefser
The lefser
function can be used with a
SummarizedExperiment
.
Load the zeller14
example dataset and exclude ‘adenoma’
conditions.
data(zeller14)
zeller14 <- zeller14[, zeller14$study_condition != "adenoma"]
Note. lefser
supports only two-group contrasts.
The colData
in the SummarizedExperiment
dataset contains the grouping column study_condition
which
includes the ‘control’ and ‘CRC’ groups.
table(zeller14$study_condition)
##
## control CRC
## 66 91
There can be subclasses in each group condition. In the example
dataset we include age_category
as a subclass of
study_condition
which includes ‘adults’ and ‘seniors’. This
variable will correspond to the blockCol
input
argument.
table(zeller14$age_category)
##
## adult senior
## 91 66
We can create a contingency table for the two categorical variables.
table(zeller14$age_category, zeller14$study_condition)
##
## control CRC
## adult 46 45
## senior 20 46
We can now use the lefser
function. It provides results
as a data.frame
with the names of selected microorganisms
and their effect size.
res <- lefser(zeller14, groupCol = "study_condition", blockCol = "age_category")
## Warning in lefser(zeller14, groupCol = "study_condition", blockCol =
## "age_category"): Convert counts to relative abundances with 'relativeAb()'
## The outcome variable is specified as 'study_condition' and the reference category is 'control'.
## See `?factor` or `?relevel` to change the reference category.
head(res)
## Names
## 1 k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Ruminococcaceae|g__Ruminococcus
## 2 k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales
## 3 k__Bacteria|p__Firmicutes|c__Bacilli
## 4 k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Ruminococcaceae|g__Ruminococcus|s__Ruminococcus_sp_5_1_39BFAA
## 5 k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Ruminococcaceae|g__Ruminococcus|s__Ruminococcus_sp_5_1_39BFAA|t__GCF_000159975
## 6 k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Eubacteriaceae|g__Eubacterium|s__Eubacterium_hallii
## scores
## 1 -6.193941
## 2 -5.914844
## 3 -5.903174
## 4 -5.558376
## 5 -5.558376
## 6 -5.507090
lefserPlot
lefserPlot(res)
When using phyloseq
objects, we recommend to extract the
data and create a SummarizedExperiment
object as
follows:
##
## Attaching package: 'phyloseq'
## The following object is masked from 'package:SummarizedExperiment':
##
## distance
## The following object is masked from 'package:Biobase':
##
## sampleNames
## The following object is masked from 'package:GenomicRanges':
##
## distance
## The following object is masked from 'package:IRanges':
##
## distance
fp <- system.file(
"extdata", "study_1457_split_library_seqs_and_mapping.zip",
package = "phyloseq"
)
kostic <- suppressWarnings({
microbio_me_qiime(fp)
})
## Found biom-format file, now parsing it...
## Done parsing biom...
## Importing Sample Metdadata from mapping file...
## Merging the imported objects...
## Successfully merged, phyloseq-class created.
## Returning...
counts <- unclass(otu_table(kostic))
colData <- as(sample_data(kostic), "data.frame")
## create a SummarizedExperiment object
SummarizedExperiment(
assays = list(counts = counts), colData = colData
)
## class: SummarizedExperiment
## dim: 2505 190
## metadata(0):
## assays(1): counts
## rownames(2505): 304309 469478 ... 206906 298806
## rowData names(0):
## colnames(190): C0333.N.518126 C0333.T.518046 ... 32I9UNA9.518098
## BFJMKNMP.518102
## colData names(71): X.SampleID BarcodeSequence ... HOST_TAXID
## Description
You may also consider using
makeTreeSummarizedExperimentFromPhyloseq
from the
mia
package (example not shown).
## R version 4.3.2 (2023-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] phyloseq_1.46.0 lefser_1.12.2
## [3] SummarizedExperiment_1.32.0 Biobase_2.62.0
## [5] GenomicRanges_1.54.1 GenomeInfoDb_1.38.6
## [7] IRanges_2.36.0 S4Vectors_0.40.2
## [9] BiocGenerics_0.48.1 MatrixGenerics_1.14.0
## [11] matrixStats_1.2.0 BiocStyle_2.30.0
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-7 permute_0.9-7 sandwich_3.1-0
## [4] rlang_1.1.3 magrittr_2.0.3 multcomp_1.4-25
## [7] ade4_1.7-22 compiler_4.3.2 mgcv_1.9-1
## [10] systemfonts_1.0.5 vctrs_0.6.5 reshape2_1.4.4
## [13] stringr_1.5.1 pkgconfig_2.0.3 crayon_1.5.2
## [16] fastmap_1.1.1 XVector_0.42.0 labeling_0.4.3
## [19] utf8_1.2.4 rmarkdown_2.25 ragg_1.2.7
## [22] purrr_1.0.2 xfun_0.42 modeltools_0.2-23
## [25] zlibbioc_1.48.0 cachem_1.0.8 jsonlite_1.8.8
## [28] biomformat_1.30.0 highr_0.10 rhdf5filters_1.14.1
## [31] DelayedArray_0.28.0 Rhdf5lib_1.24.2 parallel_4.3.2
## [34] cluster_2.1.6 R6_2.5.1 coin_1.4-3
## [37] bslib_0.6.1 stringi_1.8.3 jquerylib_0.1.4
## [40] Rcpp_1.0.12 bookdown_0.38 iterators_1.0.14
## [43] knitr_1.45 zoo_1.8-12 Matrix_1.6-5
## [46] splines_4.3.2 igraph_2.0.2 tidyselect_1.2.0
## [49] abind_1.4-5 yaml_2.3.8 vegan_2.6-4
## [52] codetools_0.2-19 lattice_0.22-5 tibble_3.2.1
## [55] plyr_1.8.9 withr_3.0.0 evaluate_0.23
## [58] desc_1.4.3 survival_3.5-8 Biostrings_2.70.2
## [61] pillar_1.9.0 BiocManager_1.30.22 foreach_1.5.2
## [64] generics_0.1.3 RCurl_1.98-1.14 ggplot2_3.5.0
## [67] munsell_0.5.0 scales_1.3.0 glue_1.7.0
## [70] tools_4.3.2 data.table_1.15.2 fs_1.6.3
## [73] mvtnorm_1.2-4 rhdf5_2.46.1 grid_4.3.2
## [76] ape_5.7-1 libcoin_1.0-10 colorspace_2.1-0
## [79] nlme_3.1-164 GenomeInfoDbData_1.2.11 cli_3.6.2
## [82] textshaping_0.3.7 fansi_1.0.6 S4Arrays_1.2.1
## [85] dplyr_1.1.4 gtable_0.3.4 sass_0.4.8
## [88] digest_0.6.34 SparseArray_1.2.4 TH.data_1.1-2
## [91] farver_2.1.1 memoise_2.0.1 htmltools_0.5.7
## [94] pkgdown_2.0.7 multtest_2.58.0 lifecycle_1.0.4
## [97] MASS_7.3-60.0.1