Installation
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("SingleCellMultiModal")
Load libraries
library(MultiAssayExperiment)
library(SingleCellMultiModal)
library(SingleCellExperiment)
CITE-seq dataset
CITE-seq data combine two data types measured at the same time from the same cells. The first is scRNA-seq data; the second is a panel of antibody-derived tags (ADT), which can number about a hundred (the dataset shown here contains 13). This dataset is provided by Stoeckius et al. (2017).
Downloading datasets
The available datasets can be listed with the default options, where dry.run = TRUE:
CITEseq(DataType="cord_blood", modes="*", dry.run=TRUE, version="1.0.0")
## Dataset: cord_blood
## ah_id mode file_size rdataclass rdatadateadded rdatadateremoved
## 1 EH3795 scADT_Counts 0.2 Mb matrix 2020-09-23 <NA>
## 2 EH3796 scRNAseq_Counts 22.2 Mb matrix 2020-09-23 <NA>
## 3 EH8228 coldata_scRNAseq 0.1 Mb data.frame 2023-05-17 <NA>
## 4 EH8305 scADT_clrCounts 0.8 Mb matrix 2023-07-05 <NA>
Setting dry.run = FALSE instead downloads the data and creates the MultiAssayExperiment object. In this example, modes = "*" retrieves every available mode, including scADT_Counts:
mae <- CITEseq(
    DataType="cord_blood", modes="*", dry.run=FALSE, version="1.0.0"
)
## Warning: 'ExperimentList' contains 'data.frame' or 'DataFrame',
## potential for errors with mixed data types
mae
## A MultiAssayExperiment object of 3 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 3:
## [1] scADT_clr: matrix with 13 rows and 7858 columns
## [2] scADT: matrix with 13 rows and 7858 columns
## [3] scRNAseq: matrix with 36280 rows and 7858 columns
## Functionality:
## experiments() - obtain the ExperimentList instance
## colData() - the primary/phenotype DataFrame
## sampleMap() - the sample coordination DataFrame
## `$`, `[`, `[[` - extract colData columns, subset, or experiment
## *Format() - convert into a long or wide DataFrame
## assays() - convert ExperimentList to a SimpleList of matrices
## exportClass() - save data to flat files
Example with actual data:
experiments(mae)
## ExperimentList class object of length 3:
## [1] scADT_clr: matrix with 13 rows and 7858 columns
## [2] scADT: matrix with 13 rows and 7858 columns
## [3] scRNAseq: matrix with 36280 rows and 7858 columns
Exploring the data structure
Check row annotations:
rownames(mae)
## CharacterList of length 3
## [["scADT_clr"]] CD3 CD4 CD8 CD45RA CD56 CD16 CD10 CD11c CD14 CD19 CD34 CCR5 CCR7
## [["scADT"]] CD3 CD4 CD8 CD45RA CD56 CD16 CD10 CD11c CD14 CD19 CD34 CCR5 CCR7
## [["scRNAseq"]] ERCC_ERCC-00104 HUMAN_A1BG ... MOUSE_n-R5s25 MOUSE_n-R5s31
Take a peek at the sampleMap:
sampleMap(mae)
## DataFrame with 23574 rows and 3 columns
## assay primary colname
## <factor> <character> <character>
## 1 scADT_clr TACAGTGTCTCGGACG TACAGTGTCTCGGACG
## 2 scADT_clr GTTTCTACATCATCCC GTTTCTACATCATCCC
## 3 scADT_clr GTACGTATCCCATTTA GTACGTATCCCATTTA
## 4 scADT_clr ATGTGTGGTCGCCATG ATGTGTGGTCGCCATG
## 5 scADT_clr AACGTTGTCAGTTAGC AACGTTGTCAGTTAGC
## ... ... ... ...
## 23570 scRNAseq AGCGTCGAGTCAAGGC AGCGTCGAGTCAAGGC
## 23571 scRNAseq GTCGGGTAGTAGCCGA GTCGGGTAGTAGCCGA
## 23572 scRNAseq GTCGGGTAGTTCGCAT GTCGGGTAGTTCGCAT
## 23573 scRNAseq TTGCCGTGTAGATTAG TTGCCGTGTAGATTAG
## 23574 scRNAseq GGCGTGTAGTGTACTC GGCGTGTAGTGTACTC
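The row count above follows from the structure of the map: each of the three assays contributes one row per cell, so 3 x 7858 = 23574. The following is a minimal sketch of the same pattern on a toy object. The matrices `adt` and `rna`, the barcodes, and the feature names are made up for illustration and are not taken from the cord blood data.

```r
library(MultiAssayExperiment)

# Hypothetical toy data: 4 cells measured in two assays
cells <- c("cellA", "cellB", "cellC", "cellD")
adt <- matrix(1:8, nrow = 2, dimnames = list(c("CD3", "CD4"), cells))
rna <- matrix(0L, nrow = 3, ncol = 4,
              dimnames = list(c("gene1", "gene2", "gene3"), cells))

# With no colData or sampleMap supplied, the constructor derives both
# from the column names of the experiments
toy <- MultiAssayExperiment(
    experiments = ExperimentList(scADT = adt, scRNAseq = rna)
)

# One row per assay/cell pair: 2 assays x 4 cells = 8 rows
smap <- sampleMap(toy)
nrow(smap)
table(smap$assay)
```

On the downloaded object, `table(sampleMap(mae)$assay)` should likewise report the same number of cells for each assay.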
scRNA-seq data
The scRNA-seq data are accessible with the name scRNAseq, which returns a matrix object.
head(experiments(mae)$scRNAseq)[, 1:4]
## TACAGTGTCTCGGACG GTTTCTACATCATCCC GTACGTATCCCATTTA
## ERCC_ERCC-00104 0 0 0
## HUMAN_A1BG 0 0 0
## HUMAN_A1BG-AS1 0 0 0
## HUMAN_A1CF 0 0 0
## HUMAN_A2M 0 0 0
## HUMAN_A2M-AS1 0 0 0
## ATGTGTGGTCGCCATG
## ERCC_ERCC-00104 0
## HUMAN_A1BG 0
## HUMAN_A1BG-AS1 0
## HUMAN_A1CF 0
## HUMAN_A2M 0
## HUMAN_A2M-AS1 0
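The feature names shown in this document carry a prefix (ERCC_, HUMAN_, MOUSE_) before the first underscore. As a hedged sketch, the prefix can be used to tabulate or filter features. Treating it as a source label is an assumption inferred from the names themselves, not a convention documented here. The example runs on a handful of names copied from the output rather than on the full matrix.

```r
# Feature names copied from the output shown in this document
features <- c("ERCC_ERCC-00104", "HUMAN_A1BG", "HUMAN_A1BG-AS1",
              "HUMAN_A1CF", "MOUSE_n-R5s25", "MOUSE_n-R5s31")

# Everything before the first underscore is treated as the source label
# (assumption inferred from the names, not a documented convention)
origin <- sub("_.*$", "", features)
table(origin)

# Keep only the features labelled as human
human <- features[origin == "HUMAN"]
human
```

The same idea applies to the real matrix by replacing `features` with `rownames(experiments(mae)$scRNAseq)` and using the logical index to subset rows.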
scADT data
The scADT data are accessible with the name scADT, which returns a matrix object.
head(experiments(mae)$scADT)[, 1:4]
## TACAGTGTCTCGGACG GTTTCTACATCATCCC GTACGTATCCCATTTA ATGTGTGGTCGCCATG
## CD3 36 34 49 35
## CD4 28 21 38 29
## CD8 34 41 52 47
## CD45RA 228 228 300 303
## CD56 26 18 48 36
## CD16 44 38 51 59
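The object also contains a scADT_clr assay alongside the raw scADT counts. To give a feel for what a centered log-ratio (CLR) transformation does, the sketch below applies one common per-cell variant to the six ADT rows printed above. The pseudocount of 1 and the per-cell centering are assumptions made for illustration; this is not necessarily the procedure used to produce scADT_clr.

```r
# ADT counts copied from the output shown above (6 tags x 4 cells);
# matrix() fills column by column, so each row of numbers below is one cell
adt <- matrix(
    c(36, 28, 34, 228, 26, 44,
      34, 21, 41, 228, 18, 38,
      49, 38, 52, 300, 48, 51,
      35, 29, 47, 303, 36, 59),
    nrow = 6,
    dimnames = list(
        c("CD3", "CD4", "CD8", "CD45RA", "CD56", "CD16"),
        c("TACAGTGTCTCGGACG", "GTTTCTACATCATCCC",
          "GTACGTATCCCATTTA", "ATGTGTGGTCGCCATG")
    )
)

# Per-cell CLR with a pseudocount of 1: log1p(count) minus the mean
# log1p(count) of that cell (illustrative variant, see the note above)
log_adt <- log1p(adt)
clr <- sweep(log_adt, 2, colMeans(log_adt))

round(clr, 2)
```

After centering, the values in each column average to zero, so a tag such as CD45RA stands out as positive relative to the other tags in the same cell.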
SingleCellExperiment object conversion
Because some widely used workflows (such as those in the SingleCellExperiment vignette or the CiteFuse vignette) use the SingleCellExperiment object for CITE-seq data, we provide a way to convert our CITE-seq MultiAssayExperiment object into a SingleCellExperiment object, with scRNA-seq data as counts and scADT data as altExps. The conversion is requested through the DataClass argument:
sce <- CITEseq(
    DataType="cord_blood", modes="*", dry.run=FALSE, version="1.0.0",
    DataClass="SingleCellExperiment"
)
## Warning: 'ExperimentList' contains 'data.frame' or 'DataFrame',
## potential for errors with mixed data types
sce
## class: SingleCellExperiment
## dim: 36280 7858
## metadata(0):
## assays(1): counts
## rownames(36280): ERCC_ERCC-00104 HUMAN_A1BG ... MOUSE_n-R5s25
## MOUSE_n-R5s31
## rowData names(0):
## colnames(7858): TACAGTGTCTCGGACG GTTTCTACATCATCCC ... TTGCCGTGTAGATTAG
## GGCGTGTAGTGTACTC
## colData names(6): adt.discard mito.discard ... celltype markers
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(1): scADT
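The layout printed above (a main counts assay plus an alternative experiment named scADT) can be reproduced on a small scale to show how the two modalities are retrieved. This is a minimal sketch on a toy object: the matrices, barcodes, and gene names are hypothetical, and naming the assay inside the alternative experiment "counts" is an assumption that may not match the downloaded object.

```r
library(SingleCellExperiment)

# Hypothetical toy data: 3 cells with RNA and ADT measurements
cells <- c("cellA", "cellB", "cellC")
rna <- matrix(c(0, 2, 1, 0, 0, 5), nrow = 2,
              dimnames = list(c("gene1", "gene2"), cells))
adt <- matrix(c(36, 28, 34, 21, 49, 38), nrow = 2,
              dimnames = list(c("CD3", "CD4"), cells))

# RNA goes in the main assay; ADT is attached as an alternative experiment
toy_sce <- SingleCellExperiment(
    assays = list(counts = rna),
    altExps = list(scADT = SummarizedExperiment(assays = list(counts = adt)))
)

counts(toy_sce)                            # main (RNA) assay
altExpNames(toy_sce)                       # names of alternative experiments
assay(altExp(toy_sce, "scADT"), "counts")  # ADT matrix
```

For the downloaded `sce`, `assayNames(altExp(sce, "scADT"))` shows which assay names are actually available before extracting the ADT matrix.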
Session Info
## R version 4.5.0 (2025-04-11)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] SingleCellExperiment_1.30.0 SingleCellMultiModal_1.20.1
## [3] MultiAssayExperiment_1.34.0 SummarizedExperiment_1.38.1
## [5] Biobase_2.68.0 GenomicRanges_1.60.0
## [7] GenomeInfoDb_1.44.0 IRanges_2.42.0
## [9] S4Vectors_0.46.0 BiocGenerics_0.54.0
## [11] generics_0.1.3 MatrixGenerics_1.20.0
## [13] matrixStats_1.5.0 BiocStyle_2.36.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 dplyr_1.1.4 blob_1.2.4
## [4] filelock_1.0.3 Biostrings_2.76.0 fastmap_1.2.0
## [7] BiocFileCache_2.16.0 digest_0.6.37 mime_0.13
## [10] lifecycle_1.0.4 KEGGREST_1.48.0 RSQLite_2.3.11
## [13] magrittr_2.0.3 compiler_4.5.0 rlang_1.1.6
## [16] sass_0.4.10 tools_4.5.0 yaml_2.3.10
## [19] knitr_1.50 S4Arrays_1.8.0 htmlwidgets_1.6.4
## [22] bit_4.6.0 curl_6.2.2 DelayedArray_0.34.1
## [25] abind_1.4-8 withr_3.0.2 purrr_1.0.4
## [28] desc_1.4.3 grid_4.5.0 ExperimentHub_2.16.0
## [31] cli_3.6.5 rmarkdown_2.29 crayon_1.5.3
## [34] ragg_1.4.0 httr_1.4.7 rjson_0.2.23
## [37] BiocBaseUtils_1.10.0 DBI_1.2.3 cachem_1.1.0
## [40] AnnotationDbi_1.70.0 formatR_1.14 BiocManager_1.30.25
## [43] XVector_0.48.0 vctrs_0.6.5 Matrix_1.7-3
## [46] jsonlite_2.0.0 bookdown_0.43 bit64_4.6.0-1
## [49] systemfonts_1.2.3 magick_2.8.6 jquerylib_0.1.4
## [52] glue_1.8.0 pkgdown_2.1.2 BiocVersion_3.21.1
## [55] UCSC.utils_1.4.0 tibble_3.2.1 pillar_1.10.2
## [58] rappdirs_0.3.3 htmltools_0.5.8.1 GenomeInfoDbData_1.2.14
## [61] R6_2.6.1 dbplyr_2.5.0 textshaping_1.0.1
## [64] evaluate_1.0.3 lattice_0.22-7 AnnotationHub_3.16.0
## [67] png_0.1-8 SpatialExperiment_1.18.0 memoise_2.0.1
## [70] bslib_0.9.0 Rcpp_1.0.14 SparseArray_1.8.0
## [73] xfun_0.52 fs_1.6.6 pkgconfig_2.0.3