#mae-discussion
2016-11-21
Marcel Ramos Pérez (15:21:37): > @Marcel Ramos Pérez has joined the channel
Marcel Ramos Pérez (15:22:28): > set the channel topic: Discuss the MultiAssayExperiment
2016-11-26
Sean Davis (22:23:58): > @Sean Davis has joined the channel
2016-11-28
Phil Chapman (13:18:09): > @Phil Chapman has joined the channel
Lucas Schiffer (13:18:09): > @Lucas Schiffer has joined the channel
Levi Waldron (13:18:09): > @Levi Waldron has joined the channel
Vince Carey (13:18:09): > @Vince Carey has joined the channel
Tim Triche (13:18:09): > @Tim Triche has joined the channel
2016-12-15
Sean Davis (08:25:13): > I’m going to have to miss the phone call again today. Sorry!
Phil Chapman (08:56:17): > Likewise sorry, have a good xmas everyone
Lucas Schiffer (10:44:39): > I don’t believe there is a call today. Is that correct, @Marcel Ramos Pérez?
Marcel Ramos Pérez (11:10:55): > Hi @channel, our next call would be on the 22nd at 12 PM EST AFAIK
Sean Davis (11:23:17): > It is pretty bad when I cannot trust my own calendar.
Phil Chapman (11:38:02): > oh cool I think i can make that!
Marcel Ramos Pérez (11:40:33): > @Sean Davis the calendar event might have been modified recently. See you then! @Phil Chapman
Tim Triche (13:54:00): > seeyanextweek
2017-05-04
Kasper D. Hansen (12:51:19): > @Kasper D. Hansen has joined the channel
2017-05-19
Aedin Culhane (14:16:45): > @Aedin Culhane has joined the channel
2017-05-22
Ludwig Geistlinger (04:55:20): > @Ludwig Geistlinger has joined the channel
Ludwig Geistlinger (05:08:23): > Not sure whether this is the right channel, but I would be interested in our status regarding MAE visualization + interactive exploration. A quick look into the literature turned up the caOmicsV package (https://bioconductor.org/packages/caOmicsV) and a recent review (https://www.ncbi.nlm.nih.gov/pubmed/?term=27585944). Are there any additional ongoing efforts in the community that I should be aware of and that allow straightforward coupling with the MAE design? - Attachment (Bioconductor): caOmicsV > caOmicsV package provides methods to visualize multi-dimentional cancer genomics data including of patient information, gene expressions, DNA methylations, DNA copy number variations, and SNP/mutations in matrix layout or network layout. - Attachment (ncbi.nlm.nih.gov): Exploring and visualizing multidimensional data in translational research platforms. - PubMed - NCBI > Brief Bioinform. 2016 Sep 1. pii: bbw080. [Epub ahead of print]
2017-05-30
Levi Waldron (03:21:18): > Hey @Ludwig Geistlinger sorry I didn’t see this message - I’ve turned on all notifications for this channel so I’ll get the message next time!
Levi Waldron (03:43:45): > I do remember someone visiting Roswell who was working on a TCGA browser. @Martin Morgan can you remind me who that was?
Levi Waldron (03:46:21): > BTW, I just made a MultiAssayExperiment video to accompany the paper with re-submission - thoughts welcome: https://www.youtube.com/watch?v=w6HWAHaDpyk - Attachment (YouTube): MultiAssayExperiment demo
2017-06-01
Sean Davis (08:12:50): > http://www.biorxiv.org/content/early/2017/05/31/139071
Sean Davis (08:13:41): > http://www.biorxiv.org/content/biorxiv/early/2017/05/31/139071.full.pdf
Sean Davis (08:14:10): > Abstract: Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm.
2017-07-31
Lorena Pantano (12:35:18): > @Lorena Pantano has joined the channel
2017-11-29
Matthew McCall (09:41:47): > @Matthew McCall has joined the channel
2017-12-11
Ricard Argelaguet (16:47:43): > @Ricard Argelaguet has joined the channel
2018-08-02
Petr Smirnov (11:44:21): > @Petr Smirnov has joined the channel
2018-10-26
Dror Berel (17:52:36): > @Dror Berel has joined the channel
2019-01-17
Lluís Revilla (07:25:55): > @Lluís Revilla has joined the channel
2019-01-22
Dror Berel (11:14:10): > Which of the following is better practice when you have an MAE with replicates, and each assay is a SummarizedExperiment:
> a. Keep the MAE’s colData compact, with covariates at the SAMPLE level ONLY, and merge them into the replicates (each SummarizedExperiment’s colData) later on.
> b. Merge the (redundant) SAMPLE information from the MAE’s colData into each assay’s SummarizedExperiment colData up front.
2019-01-23
Levi Waldron (09:13:54): > Hi @Dror Berel - for true replicates, I would merge the redundant sample information in the colData. You can then use `replicated`, `anyReplicated`, and `mergeReplicates` to manage them. On the other hand, for things like time series, tumor/normal pairs, I would keep these separate in the colData.
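A minimal sketch of the three replicate helpers named in the reply above. The toy matrix, the sample names, and the `sex` covariate are made up for illustration and do not come from this thread; patient "P1" is given two technical replicates so the helpers have something to act on.

```r
library(MultiAssayExperiment)

# Hypothetical toy assay: columns P1.a and P1.b are replicates of patient P1
mat <- matrix(
    1:6, nrow = 2,
    dimnames = list(c("g1", "g2"), c("P1.a", "P1.b", "P2.a"))
)

# Patient-level covariates live once in the MAE colData
pheno <- DataFrame(sex = c("F", "M"), row.names = c("P1", "P2"))

# The sampleMap links each assay column (colname) to a patient (primary)
smap <- DataFrame(
    assay = factor(rep("rna", 3)),
    primary = c("P1", "P1", "P2"),
    colname = colnames(mat)
)

mae <- MultiAssayExperiment(
    experiments = ExperimentList(rna = mat),
    colData = pheno,
    sampleMap = smap
)

anyReplicated(mae)             # one logical per assay: does it contain replicates?
replicated(mae)                # per assay, which columns are replicates of which patient
merged <- mergeReplicates(mae) # collapse replicates (the default summary is the mean)
```

Because the covariates stay in the MAE-level colData, they are stored once per patient rather than copied into each replicate column.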
2019-12-04
Jonathan Carroll (17:39:26): > @Jonathan Carroll has joined the channel
2020-06-26
Jenny Smith (15:57:17): > @Jenny Smith has joined the channel
2020-08-06
Laurent Gatto (13:01:02): > @Laurent Gatto has joined the channel
2020-12-12
Huipeng Li (00:39:46): > @Huipeng Li has joined the channel
2020-12-13
Kelly Eckenrode (13:41:41): > @Kelly Eckenrode has joined the channel
2021-01-05
Malte Thodberg (09:26:40): > @Malte Thodberg has joined the channel
2021-01-22
Annajiat Alim Rasel (15:47:56): > @Annajiat Alim Rasel has joined the channel
2021-02-26
Tim Triche (13:11:28): > question: is there support for taking an existing MAE and directly saving the components to HDF5 or other DelayedArray-able backends?
Tim Triche (13:11:53): > e.g.
>
> R> scNMT
> A MultiAssayExperiment object of 3 listed
>  experiments with user-defined names and respective classes.
>  Containing an ExperimentList class object of length 3:
>  [1] rna: RangedSummarizedExperiment with 22084 rows and 116 columns
>  [2] acc: RangedSummarizedExperiment with 95030342 rows and 116 columns
>  [3] meth: RangedSummarizedExperiment with 13464893 rows and 116 columns
> Functionality:
>  experiments() - obtain the ExperimentList instance
>  colData() - the primary/phenotype DataFrame
>  sampleMap() - the sample coordination DataFrame
>  `$`, `[`, `[[` - extract colData columns, subset, or experiment
>  *Format() - convert into a long or wide DataFrame
>  assays() - convert ExperimentList to a SimpleList of matrices
>  exportClass() - save all data to files
Tim Triche (13:12:39): > perhaps `exportClass` but that is a LOT of manpage
Tim Triche (13:13:51): > it looks like `exportClass` is for exporting to text files. So that’s a no. Let’s see what happens going the other way round
Tim Triche (13:28:56): > nvm it looks like I have previously done this thing and just need to make it portable/movable
Marcel Ramos Pérez (13:30:31): > Hi Tim, @Tim Triche we have been working on `saveHDF5MultiAssayExperiment`, which takes the `assays` and saves them to a single H5 file. This feature is still in the devel version of MultiAssayExperiment and will be updated to reduce the loss of annotations / metadata
Tim Triche (13:37:41): > Thanks! That’s exactly what I’m looking for. If it is aware of whether an object in an MAE is already backed by a DelayedArray that will be good — I have seen data loss when I made the mistake of saving an HDF5-backed SE to its existing directory in the past
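A minimal sketch of the save/load round trip discussed above, assuming a MultiAssayExperiment release that exports `saveHDF5MultiAssayExperiment` and `loadHDF5MultiAssayExperiment` and that the HDF5Array package is installed. The `miniACC` example dataset, the temporary directory, and the `"miniACC"` prefix are stand-ins chosen for illustration, not the scNMT object from the thread.

```r
library(MultiAssayExperiment)

# Small example MAE shipped with the package (stand-in for a real dataset)
data("miniACC")

# Write to a fresh directory so an existing HDF5-backed object on disk
# is never overwritten in place (the data-loss scenario mentioned above)
out_dir <- tempfile("mae_h5_")

saveHDF5MultiAssayExperiment(
    miniACC,
    dir = out_dir,
    prefix = "miniACC",  # explicit file prefix, reused when loading
    replace = FALSE      # do not clobber an existing directory
)

# Reload; the assays should come back as HDF5-backed (DelayedArray) data
mae_h5 <- loadHDF5MultiAssayExperiment(dir = out_dir, prefix = "miniACC")
mae_h5
```

Passing the same `prefix` to both calls avoids relying on how the default prefix is derived, which may differ between versions.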
2021-05-11
Megha Lal (16:44:17): > @Megha Lal has joined the channel
2021-06-05
Chris Vanderaa (11:17:55): > @Chris Vanderaa has joined the channel
2021-06-25
Chris Vanderaa (04:34:24): > Do you know why replacing an element in an `ExperimentList` object is now (since a few weeks ago) taking so much time? I want to recursively modify and replace some assays in a large `MAE` object. Here is a reproducible example to illustrate what I mean:
> First, I fetch the data
>
> library(scpdata)
> scp <- specht2019v3()
> el <- experiments(scp) ## Extract the ExperimentList object
> se <- el[[1]] ## Get one of the assays (SummarizedExperiment)
>
> Now, I compare the timing when replacing the first assay using `[[` on the `ExperimentList` against replacing the `@listData` slot directly (I know this can be dangerous, but see the gain in time):
>
> microbenchmark::microbenchmark(
>     el[[1]] <- se,
>     el@listData[[1]] <- se,
>     times = 1
> )
>
> Output:
>
> Unit: microseconds
>                    expr         min          lq        mean      median          uq         max neval
>           el[[1]] <- se 8331134.156 8331134.156 8331134.156 8331134.156 8331134.156 8331134.156     1
>  el@listData[[1]] <- se      48.892      48.892      48.892      48.892      48.892      48.892     1
>
> I’m not sure whether this is linked to `ExperimentList` or more generally to `List`. So I prefer asking here before opening irrelevant issues on GitHub :wink:
Marcel Ramos Pérez (11:01:28): > This is due to the size of the `el` object. Somewhere down the line in the `[[<-` and `[<-` methods for the `List` class there are operations that do things to the data (maybe `updateObject`) that add time to the operation. Here is the same operation on a smaller ExperimentList:
>
> suppressPackageStartupMessages({
>     library(scpdata)
>     library(MultiAssayExperiment)
> })
> #> snapshotDate(): 2021-06-21
>
> scp <- specht2019v3()
> #> see ?scpdata and browseVignettes('scpdata') for documentation
> #> loading from cache
> el <- experiments(scp) ## Extract the ExperimentList object
> se <- el[[1]] ## Get one of the assays (SummarizedExperiment)
> example(ExperimentList, echo = FALSE)
>
> microbenchmark::microbenchmark(el[[1]] <- se, ExpList[[1]] <- se, times = 1)
> #> Unit: milliseconds
> #>                expr        min         lq       mean     median         uq
> #>       el[[1]] <- se 6035.18780 6035.18780 6035.18780 6035.18780 6035.18780
> #>  ExpList[[1]] <- se   67.17864   67.17864   67.17864   67.17864   67.17864
> #>         max neval
> #>  6035.18780     1
> #>    67.17864     1
>
> Direct slot access is not recommended because it can lead to errors.
Chris Vanderaa (18:06:24) (in thread): > Thank you very much Marcel for your input! What would be your advice to work around this for large `ExperimentList`/`MultiAssayExperiment`/`QFeatures` objects? The example I show above is a typical data set size in single-cell proteomics and the issue I raise makes any data transformation very slow…
2021-06-26
Kasper D. Hansen (04:18:50): > Well, to me it sounds like we need some fix here. But first, I think it is important to rule out that something special is happening because you assign the same `se` that is already in the slot. I.e. take the “fast” approach, change `se` first and then assign. I.e.
Kasper D. Hansen (04:19:34): >
> se2 <- se + 1
> microbenchmark(ExpList[[1]] <- se2, times = 1)
Kasper D. Hansen (04:20:45): > If this is as fast as before, then I strongly think we need to change some internals. It is not clear to me that the added robustness of `updateObject()` (if that’s the cause) is worth a reallocation and massive slowdown.
2021-06-29
Chris Vanderaa (04:45:42): > Thank you very much @Kasper D. Hansen for considering my problem! Here is the output of some additional tests based on your suggestion:
>
> Unit: microseconds
>                          expr         min          lq        mean      median          uq         max neval
>                 el[[1]] <- se 6507666.182 6507666.182 6507666.182 6507666.182 6507666.182 6507666.182     1
>                el[[1]] <- se2 6636018.208 6636018.208 6636018.208 6636018.208 6636018.208 6636018.208     1
>        el@listData[[1]] <- se      55.899      55.899      55.899      55.899      55.899      55.899     1
>       el@listData[[1]] <- se2      58.432      58.432      58.432      58.432      58.432      58.432     1
>            ExpList[[1]] <- se  135455.744  135455.744  135455.744  135455.744  135455.744  135455.744     1
>           ExpList[[1]] <- se2  105851.493  105851.493  105851.493  105851.493  105851.493  105851.493     1
>   ExpList@listData[[1]] <- se      34.091      34.091      34.091      34.091      34.091      34.091     1
>  ExpList@listData[[1]] <- se2      36.151      36.151      36.151      36.151      36.151      36.151     1
>
> To make things clear, note that `el` has size 1.3 Gb and `ExpList` has size 30 kb. The larger the `List`, the longer it takes. I have no idea what is happening under the hood when replacing an element in a `List` object, and I’m very surprised it takes so much more time than replacing an element in a basic `list` object. It’s as if the whole object is fully checked/replaced/updated/… and the timing increases even when the replacement (here `se` or `se2`) is the same size (4.7 Mb). If you think this is worth opening an issue, could you please tell me which repo? (my wild guess from Marcel’s answer would be to open an issue in `S4Vectors`)
Sean Davis (13:21:19) (in thread): > Nice experiment!
2021-06-30
Marcel Ramos Pérez (19:22:52): > Follow the issue here: https://github.com/Bioconductor/S4Vectors/issues/86
2021-07-01
Chris Vanderaa (04:55:46) (in thread): > Excellent! Thanks for the heads up!
2022-01-28
Megha Lal (11:13:30): > @Megha Lal has left the channel
2022-08-29
Margaret Turner (19:12:37): > @Margaret Turner has joined the channel
2023-01-10
Vince Carey (10:49:07): > @Vince Carey has left the channel
2023-04-20
Chris Vanderaa (04:35:38): > @Chris Vanderaa has left the channel
2023-09-15
Leo Lahti (04:55:19): > @Leo Lahti has joined the channel
2024-02-09
Marcel Ramos Pérez (10:11:37): > archived the channel