#singlecellexperiment

2017-08-08

Davide Risso (10:09:20): > @Davide Risso has joined the channel

Davide Risso (10:09:21): > set the channel description: Discuss implementation of the SingleCellExperiment class

Aaron Lun (10:09:42): > @Aaron Lun has joined the channel

Vladimir Kiselev (10:09:42): > @Vladimir Kiselev has joined the channel

Davide Risso (10:11:13): > OK, so this is thinking out loud so stop me if it makes no sense

Davide Risso (10:12:25): > but I was thinking it would be nice to have a logcounts() method that computes the log counts if the slot is empty and simply retrieves them if it’s already populated

Davide Risso (10:12:34): > too hidden to the user?

Davide Risso (10:14:06): > I guess the problem is, like Sean mentioned, that for an object that fits in memory we don’t want to populate it with so many extra slots and create a huge object when the computations would be straightforward

Davide Risso (10:14:26): > so perhaps being more explicit with getters and setters is better

Davide Risso (10:14:49): > at least people know that there are no hidden side effects

Peter Hickey (10:22:36): > @Peter Hickey has joined the channel

Peter Hickey (10:29:32): > I favour explicit with no side effects. E.g., > suppose logcounts was empty, either logcounts() computes and returns the values each time it is called or it has to silently modify the SCE as a side effect to ensure that repeated calls to logcounts() don’t re-compute the data (or logcounts() could be memoised but I don’t know whether that helps user-level transparency and would still require a large matrix to be kept around)

Kevin Rue-Albrecht (10:34:12): > @Kevin Rue-Albrecht has joined the channel

Aaron Lun (10:39:50): > I think logcounts should just fail if it hasn’t been set.

Aaron Lun (10:40:02): > It’s not SingleCellExperiment’s job to compute it.

Aaron Lun (10:48:14): > We’re just suggesting that, if you were to compute some kind of log-transformed count-like value, you should put it into logcounts.

Aaron Lun (10:48:31): > Not logCounts, or log_counts, or - dear god - exprs.
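
The explicit, side-effect-free behaviour argued for above can be sketched in base R S4. This is only an illustration under stated assumptions: the ToySCE class and the getAssay()/setAssay() helpers are hypothetical names, not the actual SingleCellExperiment code. The getter fails when the assay was never set, and the user computes and stores the values explicitly.

```r
library(methods)

# Hypothetical toy container: NOT the real SingleCellExperiment class.
setClass("ToySCE", representation(assays = "list"))

# Getter: fetch a named assay, failing loudly if it was never set.
setGeneric("getAssay", function(x, name) standardGeneric("getAssay"))
setMethod("getAssay", "ToySCE", function(x, name) {
    if (!name %in% names(x@assays)) {
        stop("assay '", name, "' not found")
    }
    x@assays[[name]]
})

# Setter: the only way an assay gets populated, so nothing is hidden.
setGeneric("setAssay", function(x, name, value) standardGeneric("setAssay"))
setMethod("setAssay", "ToySCE", function(x, name, value) {
    x@assays[[name]] <- value
    x
})

# Named wrappers fix the naming convention ("logcounts", not "logCounts").
logcounts <- function(x) getAssay(x, "logcounts")
`logcounts<-` <- function(x, value) setAssay(x, "logcounts", value)

counts_mat <- matrix(c(0, 1, 3, 7), nrow = 2)
sce <- new("ToySCE", assays = list(counts = counts_mat))

# The getter computes nothing: asking for an unset assay is an error.
res <- tryCatch(logcounts(sce), error = function(e) conditionMessage(e))

# The user performs the transformation explicitly and stores the result.
logcounts(sce) <- log2(getAssay(sce, "counts") + 1)
```

With this design a repeated call to logcounts() is a plain lookup, and the object only grows when the user asks it to.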

Sean Davis (12:52:46): > @Sean Davis has joined the channel

Nitesh Turaga (12:53:27): > @Nitesh Turaga has joined the channel

Marcel Ramos Pérez (12:53:43): > @Marcel Ramos Pérez has joined the channel

Sean Davis (12:58:09): > It might make sense to focus on developing an API (methods and classes) without immediately specifying an implementation. That could even take the form of a virtual class and generics so that folks could be free to inherit from an empty shell and supply behavior. The line between API spec and implementation is blurry, I realize, though.

Sean Davis (13:00:07): > Also, I think it is worth thinking about how many methods and slots to load onto a single base class. There is nothing that says that there can be only one Bioconductor base class for SC data. I am stating the obvious, I suspect.

Aaron Lun (13:51:10): > I’ve always thought of SingleCellExperiment as a euro, and the various single-cell-related analysis packages as distinct countries in the eurozone. You can take your SCE from one package and use it in another package without having to do a coercion, i.e., currency transfer. You can go up/down denominations (derived <-> base classes), but everything is easily transferable.

Aaron Lun (13:56:14): > To extend the analogy further, I guess we’d probably have the same amount of bickering, too.

Aaron Lun (17:34:40): > So the named assay getters/setters: yay or nay?

Aaron Lun (17:35:11): > Same for the distance matrices: yay or nay?

Aaron Lun (17:36:21): > (Don’t use ambiguous emojis like :scream:, I won’t be able to figure out what they mean.)

Davide Risso (18:07:46): > :ghost:

Davide Risso (18:07:54): > (kidding)

Davide Risso (18:10:33): > yes on the setters/getters, I think it will be good to introduce naming conventions

Davide Risso (18:12:33): > I think overall that adding a distance matrices slot a la assays() is beneficial (I will probably end up using them in clusterExperiment if they’re there, and this already makes two packages using them)

2017-08-09

Aaron Lun (05:20:50): > I’ll drag @Andrew McDavid into this channel, as he probably has some strong opinions on this.

Andrew McDavid (05:20:54): > @Andrew McDavid has joined the channel

Aaron Lun (06:09:42): > Note that, if you do have a slot for distance matrices, they will not be handled by cbind operations. This is because there is no coherent and generalizable way to combine two distance matrices together (as you need to fill in the off-diagonal blocks, and the only way to do that is to calculate the distances again). Thus, any attempt to do cbind (or [<- on the columns) should ignore distance matrices, probably spitting out a warning.
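
A small base R sketch of why two distance matrices cannot simply be combined. The toy data and variable names are made up for illustration. The per-batch matrices only fill the diagonal blocks, and the cross-batch block has to be recomputed from the original values.

```r
set.seed(1)

# Two batches of cells (columns are cells), 5 genes each.
a <- matrix(rnorm(5 * 3), nrow = 5)
b <- matrix(rnorm(5 * 4), nrow = 5)

# Per-batch cell-cell distances (dist() works on rows, hence t()).
da <- as.matrix(dist(t(a)))
db <- as.matrix(dist(t(b)))

# Naive combination: only the two diagonal blocks are known.
n <- ncol(a) + ncol(b)
naive <- matrix(NA_real_, n, n)
naive[1:3, 1:3] <- da
naive[4:7, 4:7] <- db

# Entries between a cell of batch a and a cell of batch b stay unknown.
n_missing <- sum(is.na(naive))

# Filling them requires going back to the data and recomputing.
full <- as.matrix(dist(t(cbind(a, b))))
```

Here n_missing counts the 2 x 3 x 4 cross-batch entries that no amount of rearranging da and db can supply.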

Aaron Lun (06:12:19): > Arguably the same reasoning applies to dimensionality reduction coordinates - there are very few cases where you would want to combine separate PCA results together. However, it’s easy enough to do, and I guess I’ve done it before (separate t-SNEs of quite distinct populations that I wanted to put on the same plot) so we’ll continue to support it, I suppose.

hcorrada (07:08:54): > @hcorrada has joined the channel

Peter Hickey (09:05:16): > at least a warning seems appropriate

Kasper D. Hansen (09:58:46): > @Kasper D. Hansen has joined the channel

Kasper D. Hansen (10:00:55): > I strongly oppose storing distance matrices in the basic single cell container. There are multiple arguments against it. First, it goes against our existing paradigm of having a separate fit / dist object (not saying this is a holy paradigm, but I note it goes against it).

Kasper D. Hansen (10:02:26): > More importantly, computing and manipulating distances on experiments with many cells (1M) is much, much harder than accessing the individual data. In fact, it is pretty clear to me that this will not get done for large experiments and we will have to develop (or re-use) clustering methods which do not involve computing all pairwise distances, both for time and space.

Kasper D. Hansen (10:03:27): > Having scalable code for distances is likely to involve new tricks (if we want to compute them) so I don’t think it is the right time to add this into a core class

Davide Risso (10:18:26): > Just to continue here what I was thinking of in the other channel

Davide Risso (10:18:43): > SingleCellExperiment was designed with RNA-seq in mind

Davide Risso (10:18:47): > so perhaps that

Davide Risso (10:18:54): > is the way it should remain

Davide Risso (10:19:12): > we could write another class for methylation

Davide Risso (10:19:38): > and a (virtual) super-class that both extend

Davide Risso (10:19:55): > or is this overthinking it?

Aaron Lun (10:20:02): > Yeah, probably.

Aaron Lun (10:20:18): > What does @Peter Hickey need for methylation?

Aaron Lun (10:20:29): > SummarizedExperiment capabilities… and what else?

Aaron Lun (10:20:37): > No spike-ins, no size factors… reduced dims?

Davide Risso (10:20:39): > OTOH is SummarizedExperiment already that super-class?

Peter Hickey (10:20:47): > i havent thought of what i need specifically for single-cell methylation

Peter Hickey (10:21:38): > methylation has concept of ‘spike ins’ (an unmethylated lambda phage genome) but i havent been using/leveraging like spike ins in RNA-seq

Aaron Lun (10:29:05) (in thread): > I’m happy to go either way on this one. Yes, the presence of a distance matrix slot would imply a specific clustering strategy, which wouldn’t fit into a concept of a base class. On a more practical level, if many packages need it, then I would be willing to live with the philosophical discomfort of a distance slot (that can potentially be empty). This seems easier for users and developers to work with, than to have yet another subclass.

Aaron Lun (10:38:17): > I would guess that the current named assay wrappers are the most general - counts can apply to gene/transcript-level counts, bin-level counts (for CNVs in genomic sequencing) or region-level counts (for ATAC-seq). Methylation will require more specialized accessors - and that’s fine, they can be implemented in the relevant packages. It seems to me that you’d want to implement them anyway for SummarizedExperiment, in which case they can be applied to SingleCellExperiment without much extra effort.

Peter Hickey (10:41:25): > as example, in BSseq we subclass SummarizedExperiment to create a BSseq class that has methods like getMeth() and getCoverage(). so the counts() accessor would just error because it found no such named assay (I think). i can live with that

Kasper D. Hansen (10:42:18): > subclasses are very easy to work with. Especially if it is a core subclass

Kasper D. Hansen (10:42:35): > (core = present in the package)

Stephanie Hicks (11:01:20): > @Stephanie Hicks has joined the channel

Aaron Lun (11:40:59): > @Peter Hickey Yeah, it should just break with “i not found” or something like that.

Martin Morgan (12:09:50): > @Martin Morgan has joined the channel

Aaron Lun (12:22:50): > So if there are no objections about the named assay wrappers, I’ll merge the branch into the master.

Lorena Pantano (14:27:01): > @Lorena Pantano has joined the channel

Aaron Lun (14:50:09): > @Kasper D. Hansen @Davide Risso @Sean Davis Just wondering what the name of the subclass would be. My creative juices are finished for the day, so… ClusteredCellExperiment?

Davide Risso (14:52:00): > wouldn’t that imply that you’re storing clustering info in the object?

Peter Hickey (14:52:08): > SomethingShortOhForTheLoveOfGodSomethingShort

Aaron Lun (14:53:41): > Okay, well, SingleCellExperimentWithDistances. Obviously.

Aaron Lun (14:54:13): > Okay, it’s almost 8pm and I’m not functional anymore. I’m going home for dinner; I hope the google docs doesn’t turn into an ASCII cat or something by the time I get back.

Aaron Lun (14:55:20) (in thread): > > |\ *,,,---,,* > /,`.-'`' -. ;-;;,_ > |,4- ) )-,_..;\ ( `'-' > '---''(*/--' `-'\*) >

Lorena Pantano (15:02:00) (in thread): > /*_/
> ( o o ) > /
> **// meow! > /
> /
> / *_* /

Kasper D. Hansen (15:12:02): > I would start by having a class only with distances and not with expression data, essentially a wrapper around a dist object. I predict that is going to have its own HDF problems etc etc. Once those (and the API) have been worked around, then potentially create a “merged” class of the two classes

Kasper D. Hansen (15:12:48): > That gives you flexibility with almost no issues. If it turns out that dist only class is not used by anyone, we can always delete it

Sean Davis (15:42:51): > What methods would be necessary for a new dist-like object to be useful?

2017-08-10

Fanny Perraudeau (00:59:59): > @Fanny Perraudeau has joined the channel

Aaron Lun (04:23:50): > Named assay branch has been merged. The vignette needs some updating though.

Aaron Lun (07:02:21): > Updated the vignette.

Aaron Lun (10:46:19): > @Davide Risso Give the updates a look over and copy it over to the Bioc build if you’re happy.

Davide Risso (11:05:31): > sure! I’ll do it tonight!

Aaron Lun (12:45:39): > needs a version bump tho

2017-08-11

Aaron Lun (14:28:53): > Did this happen?

Aaron Lun (14:29:03): > The update, I mean.

Davide Risso (14:47:40): > nope… sorry I had too many things to do yesterday! It will happen soon, though!

Aaron Lun (14:51:56): > ’kay

Davide Risso (15:28:54): > okay… so do you know how to do this? Because I’m struggling every time that I have to merge Github to svn… I’ve been cherry-picking my commits in the other repositories… is that what we are supposed to do?

Lorena Pantano (15:29:53): > that is what works for me, but not sure if it is the right way, the only way I could get it done :slightly_smiling_face:

Davide Risso (15:32:11): > yes, exactly. This is also the only way that I can get it done, but I was wondering if there is a better method.

Davide Risso (15:32:40): > The problem is that right now the two histories are completely unrelated so there is no way to merge

Davide Risso (15:33:39): > I’m wondering if we should just wait for August 16 :slightly_smiling_face:

Lorena Pantano (15:37:09): > I am waiting for that for all my commits, let’s see

Kasper D. Hansen (15:38:07): > I just copy my changes over

Kasper D. Hansen (15:38:26): > That makes the svn log kind of useless. But it is quick and easy

Kasper D. Hansen (15:38:59): > And then my log entry is “From version 1.2.3 on Github”

Kasper D. Hansen (15:39:10): > so I know where to get the log

Sean Davis (15:58:20): > I’ve been waiting for git transition, also. I’ll still end up with the “unrelated histories” problem, but at least that will happen only once.

Nitesh Turaga (15:59:07): > @Sean Davis Yep, just once if it does happen.

Kevin Rue-Albrecht (16:00:37): > I think I’m doing like Kasper: I have git and svn set up in the same folder, ignoring each other’s config files. On a regular basis pushing to my GitHub, then committing to svn only for each release

Davide Risso (16:17:21): > well.. good to know I’m not the only one struggling!

Davide Risso (16:18:46): > @Aaron Lun are you in a hurry to merge to bioc devel or should we wait for the git transition?

Aaron Lun (18:52:20): > I also do it like Kevin and Kasper have described, using some scripts for convenience.

Aaron Lun (18:52:55): > No hurry, but sooner rather than later before we forget about it.

Kasper D. Hansen (19:40:23): > I have them in separate directories unlike Kevin

Kasper D. Hansen (19:40:51): > Same directory sounds … potentially bad

2017-08-12

Kevin Rue-Albrecht (04:16:53): > I must admit, it took me a few attempts to get it working. If I recall properly, even the order in which I checked out the two version control repos mattered, or at least made it easier. But once past the git and svn ignore setup, I now just have to switch the version control in RStudio and both git/svn only see what they’re supposed to see. > Again I’m not particularly happy either about the distinct histories, especially in anticipation of the git transition. But once it’s git on all sides, it should make it easier to follow a single workflow… looking forward!

Aaron Lun (06:29:54): > Ah, I didn’t read Kevin’s comment fully. Yes, I use separate directories to avoid chaos.

Tim Triche (14:10:44): > @Tim Triche has joined the channel

Sean Davis (15:17:19): > Hi, all. I created the #bioc_git channel to discuss all things bioc and git. If we don’t use it, it can go away, but I suspect there will be some interest.

2017-08-17

Aaron Lun (06:53:15): > @Davide Risso I have cloned SingleCellExperiment from the Bioc git and started making some commits (https://github.com/LTLA/SingleCellExperiment). I think our best strategy is for someone (either you or me) to have a “central” repository on Github that everyone else forks or makes publicly visible PRs to. This should make it easier to avoid surprise conflicts from (relatively hidden) commits to the Bioc git. - Attachment (GitHub): LTLA/SingleCellExperiment > Clone of the Bioconductor repository for the SingleCellExperiment package, see https://bioconductor.org/packages/devel/bioc/html/SingleCellExperiment.html for the official development version.

Aaron Lun (06:54:24): > I personally will be archiving all my existing Git repositories for my other packages (csaw, diffHic, etc.). Far too much effort would be required to reconcile the histories, and for no apparent gain.

Aaron Lun (06:57:26): > In any case, I don’t have write access to BioC’s SingleCellExperiment, so I guess it’ll have to be you.

Kevin Rue-Albrecht (07:00:08): > I was just looking at package guidelines a few minutes ago, and spotted that ‘All authors mentioned in the package DESCRIPTION file are entitled to modify package source code.’ > I think they initially give write access only to the package submitter, but if you want I’m pretty sure they can give write access to other authors mentioned in DESCRIPTION (which you clearly are)

Aaron Lun (07:43:19): > Really? All authors? I assume this only refers to people labelled with [aut]?

Davide Risso (08:20:01): > @Aaron Lun happy to be the person with the central repository and I can definitely archive the old repository. Just to make sure, are your commits from the github version or from the Bioc version (I still have to last week changes)

Aaron Lun (08:21:23): > I have manually committed to a clone of the BioC version. I would suggest that you archive the SingleCellExperiment repo you currently have in Github (I would just rename it to archive-SingleCellExperiment, to keep a record); once you do so, I will transfer ownership of the Github clone of the Bioc repository to your account.

Kevin Rue-Albrecht (08:46:24): > I can’t talk from experience, because I’ve always maintained packages by myself, but I copy pasted “‘All authors mentioned in the package DESCRIPTION file are entitled to modify package source code.’” from https://bioconductor.org/developers/package-guidelines/ I’m sure @Martin Morgan or @Nitesh Turaga can answer precisely, but the way I understand it, all people mentioned in the DESCRIPTION file are entitled to request write access directly to the central Bioc repo.

Kasper D. Hansen (08:48:51): > That seems like it should be package specific. I am sure some packages want commits to go through the maintainer

Kevin Rue-Albrecht (08:48:51): > Still, if it was me, I would also prefer a separate ‘central’ repo between developers as a safe place to merge contributions before releasing on the central Bioc repo

Kevin Rue-Albrecht (08:49:32): > @Kasper D. Hansen - exactly, I agree

Martin Morgan (08:53:42): > I’ll add some git-related comments to #bioc_git

Davide Risso (09:29:43): > @Aaron Lun I just archived my SingleCellExperiment repo

Davide Risso (09:29:50): > it’s now at https://github.com/drisso/archive-SingleCellExperiment - Attachment (GitHub): drisso/archive-SingleCellExperiment > archive-SingleCellExperiment - This is the archived version of SingleCellExperiment with the history before Bioc submission.

Davide Risso (09:30:05): > Feel free to transfer the new one to me anytime

Aaron Lun (09:30:36): > Sweet, I will transfer now. I think archiving is probably safer and more appropriate than doing a Switcheroo, as the histories are completely divergent.

Aaron Lun (09:31:38): > Done. You should probably update your affiliation on your Github account page…

Davide Risso (09:32:59): > :smile:

Davide Risso (09:35:29): > Done!:grin:

Aaron Lun (09:40:44): > Excellent. Now I can really sense your PI-ness from your Github account page.

Davide Risso (09:47:42): > so… does it mean that I should add git.bioconductor.org as a remote upstream in this repo?

Aaron Lun (09:52:06): > Yes.

Davide Risso (10:12:47): > Done! All your commits have been pushed to the bioconductor git

Aaron Lun (10:12:59): > Sweet.

Davide Risso (10:13:36): > is there a way to “see” the git server… something like the old https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks

Kasper D. Hansen (10:16:00): > (perhaps move the git discussion into #bioc_git)

Davide Risso (10:16:17): > yeah, sorry!

2017-08-22

Aaron Lun (13:09:31): > @Davide Risso Was thinking about clustering without distance matrices. If you’re using graphs, they can be stored as sparse matrices (where non-zero values indicate the presence of an edge between the corresponding cells). This could be easily stored in the same type of structure that would hold distance matrices, allowing us to use a common slot type for both.

Aaron Lun (13:14:05): > Trying to think of a name for the slot. Something like pairedColData, reflecting the fact that each entry of each matrix holds data for a pair of columns.
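
A minimal sketch of the idea, assuming the Matrix package is installed. The five-cell edge list and the list named paired are made up for illustration and do not come from any existing slot. A graph is stored as a sparse cell-by-cell matrix whose non-zero entries mark edges, and it has the same shape as a dense distance matrix over the same cells.

```r
library(Matrix)

# Hypothetical 5-cell example: edges of an undirected neighbour graph.
edges <- data.frame(from = c(1, 1, 2, 4), to = c(2, 3, 3, 5))

# Sparse symmetric adjacency matrix; a non-zero entry marks an edge.
graph <- sparseMatrix(i = c(edges$from, edges$to),
                      j = c(edges$to, edges$from),
                      x = rep(1, 2 * nrow(edges)),
                      dims = c(5, 5))

# A dense distance matrix for the same cells is also cell-by-cell.
coords <- matrix(seq_len(10), nrow = 5)
distances <- as.matrix(dist(coords))

# Both fit one list-like container keyed by name (the name is hypothetical).
paired <- list(graph = graph, distances = distances)
same_shape <- all(vapply(paired,
                         function(m) identical(dim(m), c(5L, 5L)),
                         logical(1)))
```

Only the eight non-zero entries of the graph are stored, whereas the distance matrix always holds all 25 values.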

Aaron Lun (13:15:11): > Brings me back to my Hi-C days, actually, thinking of pairs of things.

2017-08-25

Aaron Lun (04:25:34): > Minor tweaks to the show method.

Aaron Lun (13:52:01): > Minor fixes to the SingleCellExperiment constructor example.

Aaron Lun (13:57:39): > @Davide Risso Do you want to commit them to BioC? I’m also happy to do it myself if given access.

Davide Risso (14:21:27): > either way is fine with me. You should already have access to the repo, no?

2017-08-26

Aaron Lun (05:39:16): > Not to the BioC one. I can ask for it, if that’s alright with you.

2017-09-07

Michael Steinbaugh (07:20:18): > @Michael Steinbaugh has joined the channel

2017-10-03

Sean Davis (10:29:53): > Just FYI, there is a small collection of batched fibroblast cells here, perhaps useful for some QC/normalization work. https://github.com/singlecell-batches/getting-started - Attachment (GitHub): singlecell-batches/getting-started > getting-started - How to get started with the single cell batches comparison

2017-10-04

Sean Davis (05:48:00): > This framework was applied to the 10x 1.3M cell brain dataset. https://www.biorxiv.org/content/early/2017/10/03/197244 with code here https://github.com/iaconogi/bigSCale

2017-10-06

David Jenkins (16:33:36): > @David Jenkins has joined the channel

2017-10-26

Levi Waldron (19:20:18): > @Levi Waldron has joined the channel

Levi Waldron (19:20:40): > Just saw this: https://www.biorxiv.org/content/early/2017/10/23/207704 - Attachment (bioRxiv): ascend: R package for analysis of single cell RNA-seq data > Abstract Summary: ascend is an R package comprised of fast, streamlined analysis functions optimized to address the statistical challenges of single cell RNA-seq. The package incorporates novel and established methods to provide a flexible framework to perform filtering, quality control, normalization, dimension reduction, clustering, differential expression and a wide-range of plotting. ascend is designed to work with scRNA-seq data generated by any high-throughput platform, and includes functions to convert data objects between software packages. Availability: The R package and associated vignettes are freely available at https://github.com/IMB-Computational-Genomics-Lab/ascend. Contact: joseph.powell@uq.edu.au Supplementary information: An example dataset is available at ArrayExpress, accession number E-MTAB-6108

2017-10-27

Sean Davis (06:42:01): > Added “ascend” to list: https://github.com/seandavi/awesome-single-cell - Attachment (GitHub): seandavi/awesome-single-cell > awesome-single-cell - List of software packages for single-cell data analysis, including RNA-seq, ATAC-seq, etc.

Lorena Pantano (13:56:33): > http://www.scrna-tools.org

2017-11-01

Martin Morgan (06:50:59): > A couple of things… > > Aaron implemented TENxBrainData() https://github.com/LTLA/TENxBrainData which uses ExperimentHub() to store the relevant data > > > suppressMessages(sce <- TENxBrainData::TENxBrainData()) > > sce > class: SingleCellExperiment > dim: 27998 1306127 > metadata(0): > assays(1): counts > rownames: NULL > rowData names(2): Ensembl Symbol > colnames(1306127): AAACCTGAGATAGGAG-1 AAACCTGAGCGGCTTC-1 ... > TTTGTCAGTTAAAGTG-133 TTTGTCATCTGAAAGA-133 > colData names(4): Barcode Sequence Library Mouse > reducedDimNames(0): > spikeNames(0): > > Also, the original 10x hdf5 file is available as > > > library(ExperimentHub) > > find = "TENxBrainData/1M_neurons_filtered_gene_bc_matrices_h5.h5" > > query(ExperimentHub(), find) > - Attachment (GitHub): LTLA/TENxBrainData > TENxBrainData - An ExperimentHub package for the 1.3 million brain cell 10X single-cell RNA-seq data set.

Martin Morgan (06:53:04): > Also, the ‘loom’ format is kind of a SummarizedExperiment in hdf5; there’s a new parser at SummarizedExperiment::makeSummarizedExperimentFromLoom() (in git, should be available after today’s build, or from biocLite("Bioconductor/SummarizedExperiment"), both using R-devel / bioc-devel).

Martin Morgan (06:57:26): > (here’s loom: http://loompy.org/loompy-docs/format/index.html)

Peter Hickey (19:22:41): > loom looks interesting. are there other pkgs/tools using it?

Peter Hickey (19:45:19): > @Aaron Lun: Cool to have TENxBrainData! how is the data stored in the HDF5 file? that weird sparse representation from 10X or a chunked + compressed, dense matrix?

2017-11-02

Martin Morgan (08:30:41): > The script at https://github.com/LTLA/TENxBrainData/blob/master/inst/scripts/make-data.R describes how the data are made – chunked dense matrix - Attachment (GitHub): LTLA/TENxBrainData > TENxBrainData - An ExperimentHub package for the 1.3 million brain cell 10X single-cell RNA-seq data set.

Aaron Lun (12:24:12): > @Martin Morgan Would you like me to transfer the repository to Bioconductor, if it makes it easier for you to make the necessary changes?

Martin Morgan (16:55:19): > Kind of an interesting discussion about loom, with two limitations being (a) the row / col data may change relatively more frequently (e.g., via addition of summary stats) than the assay data, and it might be valuable to version this, so that one ends up creating copies of very large data to track changes in very small data; and (b) maybe this scheme is ok for simple row / col data, but what about implementing typical consortium data e.g., TCGA that is really query-able relational data?

Kasper D. Hansen (17:00:11): > I see loom is done by the Linnarsson lab. Do they have any track record with software?

2017-11-29

Matthew McCall (09:43:11): > @Matthew McCall has joined the channel

2017-12-11

Ricard Argelaguet (16:47:17): > @Ricard Argelaguet has joined the channel

2018-02-14

Aaron Lun (15:02:48): > @Hervé Pagès I just noticed that GRanges has protected fields for “start”, “end”, etc. What is the motivation behind this? I ask because I wonder if we can do something similar for SingleCellExperiment; currently it protects certain row/column metadata fields from the user by tucking them into separate slots completely.

Aaron Lun (15:03:50): > It also seems that a fairly innocuous command can generate an invalid object:

 gr0 <- GRanges(Rle(c("chr2", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
+                     IRanges(1:10, width=10:1))
 gr0$strand <- 2
 validObject(gr0)
Error in validObject(gr0) :  etc etc

2018-02-28

Daniel Van Twisk (15:18:12): > @Daniel Van Twisk has joined the channel

2018-03-03

Aedin Culhane (09:36:56): > @Aedin Culhane has joined the channel

2018-03-07

Michael Steinbaugh (13:04:49): > @Aaron Lun @Hervé Pagès What’s the recommended method for adding spike-in sequences inside a GRanges object to be used with a SingleCellExperiment? For example, if I have a dataset using EGFP, I’d like to include “EGFP” inside isSpike() but also set an empty range for the spike-in

Aaron Lun (13:05:23): > Not sure what you’re needing here.

Aaron Lun (13:05:36): > If you want to say that EGFP is the spike in, you can do this easily.

Michael Steinbaugh (13:05:45): > Perfect yeah I just want to do that

Aaron Lun (13:05:49): > If you want to set a GRanges for EGFP, you can also do that easily.

Aaron Lun (13:06:18): > For the former, you can just do: > > isSpike(sce, "EGFP") <- 1 # if your first gene is EGFP. >

Aaron Lun (13:06:30): > For the latter, you can probably do something like: > > rowRanges(sce)[1] <- GRanges("plasmid", IRanges(1, 100)) >

Aaron Lun (13:06:56): > Though it’s been a while since I tried that, so I’m not 100% sure it works.

Michael Steinbaugh (13:07:12): > Ah so set it with something like plasmid. Nice, thanks Aaron

2018-03-09

Elizabeth Purdom (10:09:27): > @Elizabeth Purdom has joined the channel

2018-03-15

Martin Morgan (18:30:58): > Single cell experimenters might wish to cast their eye over (provide a brief review of) this package https://github.com/Bioconductor/Contributions/issues/675 - Attachment (GitHub): ccfindR · Issue #675 · Bioconductor/Contributions > Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor Repository: https://github.com/hjunwoo/ccfindR Confirm the following by editing each c…

Kevin Rue-Albrecht (18:39:06): > I know I’m already biased against a package when the first thing I see is « Please re-use the SingleCellExperiment representation » :sweat_smile:

Kevin Rue-Albrecht (18:39:53): > #Bioc4ever

2018-03-30

Matt Ritchie (07:53:02): > @Matt Ritchie has joined the channel

2018-04-12

Davide Risso (15:54:40): > @Aaron Lun I noticed that the github repo is a few commits ahead of bioc-devel. Is that on purpose or should we push to upstream?

2018-04-13

Aaron Lun (08:51:56): > No reason, I would think.

Davide Risso (09:00:04): > Ok I’ll push today

Aaron Lun (09:17:46): > I just did it.

Davide Risso (10:15:08): > thanks

2018-04-19

Sean Davis (06:57:00): > Just FYI, the Human Cell Atlas data are available in preview:https://preview.data.humancellatlas.org/

2018-04-20

Aedin Culhane (12:48:18): > Thanks Sean. Have we a port of these into bioc pkgs?

2018-04-21

Sean Davis (12:57:04): > Not that I know of.

Aaron Lun (13:04:31): > I think it would be difficult at this point - it’s just FASTQs, as far as I can tell.

Sean Davis (13:04:52): > And I get a sense that the data are still in flux.

2018-04-22

Kasper D. Hansen (14:07:09): > But not sure we should wait until the HCA have gotten around to it.

Sean Davis (14:11:39): > Assuming that we wanted to start with raw data, what would be the desired workflow? What would we want the final container(s) to be? Are there use cases that already exist that would benefit from more example data?

Aaron Lun (17:09:29): > The 10X is pretty straightforward; via CellRanger (or scPipe, for a more BioC flavour) and then get the count matrix at the end, packaged as a SingleCellExperiment (easily done using read10xCounts in DropletUtils). The other one is good-old Smart-seq2, so we can just plug it into any RNA-seq alignment and read-counting program (e.g., Rsubread, which works pretty well with this data in my experience). Again, this gives a count matrix that can be packaged as a SingleCellExperiment.

Aaron Lun (17:10:10): > As for use cases… not sure we have anything that would benefit from the scope of this data.

Aaron Lun (17:19:33): > Though perhaps on second thought, CellRanger may not be happy with the quantity of data involved… might require some care.

2018-04-23

Stephanie Hicks (05:54:49): > how so?

Aaron Lun (06:35:23): > Perhaps not so much CellRanger, but Matrix encounters integer overflows for sparse matrices that are too large.

Aaron Lun (06:36:40): > Having said that, CR will try to generate BAM files (which will be huge for these data) and it will try to perform secondary analyses e.g. clustering and t-SNE generation, which would probably take a long time.

Aaron Lun (06:37:09): > So there’s probably something to be said for processing the data in chunks.
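
A rough base R illustration of the scale involved, using the dimensions of the 10X brain data set printed earlier in this channel. Exactly where Matrix overflows is not stated here, so the first half only shows that the entry count exceeds R's integer range. The second half sketches chunked processing on a small stand-in matrix, with an arbitrary chunk size.

```r
# Dimensions of the 10X brain data set as printed earlier in this channel.
n_genes <- 27998L
n_cells <- 1306127L

# Integer arithmetic overflows: R returns NA (with a warning).
as_int <- suppressWarnings(n_genes * n_cells)

# Double arithmetic gives the true number of matrix entries.
as_dbl <- as.numeric(n_genes) * as.numeric(n_cells)
too_big <- as_dbl > .Machine$integer.max

# Chunked processing on a small stand-in matrix: per-cell totals are
# computed one block of columns at a time, so only one block needs to
# be held in memory at once.
counts <- matrix(seq_len(60), nrow = 6)  # 6 genes x 10 cells
chunk_size <- 4
starts <- seq(1, ncol(counts), by = chunk_size)
totals <- unlist(lapply(starts, function(s) {
    cols <- s:min(s + chunk_size - 1, ncol(counts))
    colSums(counts[, cols, drop = FALSE])
}))
```

The chunked totals match a single colSums() call on the whole matrix, which is what makes block-wise processing safe for column-wise statistics.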

Stephanie Hicks (08:50:49): > Ah ok. & agreed on processing data in chunks. Even though the HCA data is still in flux, I’m supportive of @Sean Davis’s suggestion. It might be a good idea to draft out what a desired workflow is & what the final containers would be. especially for all of us working downstream (e.g. @Kasper D. Hansen, @Davide Risso, @Elizabeth Purdom, @Raphael Gottardo).

2018-04-26

Aaron Lun (01:49:13): > @Davide Risso @Keegan Korthauer suggest changing the class name to LinearEmbeddingMatrix and give it matrix-like behaviour, for dim/dimnames and during subsetting (i.e., drop=TRUE by default and returning a vector).

Keegan Korthauer (01:49:16): > @Keegan Korthauer has joined the channel

Aaron Lun (01:49:47): > This ensures that the object is truly interchangeable with matrices when stored in the SCE reddim slot.

Stephanie Hicks (02:01:12): > makes sense. Will toy around with LinearEmbedding (soon –> LinearEmbeddingMatrix) this week as input to the kmeans function we’re working on for the SingleCellExperiment object.

Aaron Lun (08:50:54): > LinearEmbeddingMatrix is implemented with docs. Need to clean up show and add a $<- method for factorData.

Aaron Lun (09:04:29): > Need to allow setting of metadata in constructor.

Aaron Lun (09:37:09): > also subsetting is not robust when it’s a character string, or Rle… need to dig up the SE internal functions that handle such cases.

Aaron Lun (09:49:57): > Tests added covering most LEM functionality.

Aaron Lun (18:14:53): > @Hervé Pagès I was thinking of putting together a document to help BioC developers derive from base classes, e.g., SummarizedExperiment. The idea would be to have some “best practices” for the derivation process - and perhaps it would be possible to even extend concepts like parallelSlotNames to accommodate matrix-like slots or column-based slots, which would allow us to avoid having to manually define the dreaded *bind and subset (assignment) methods.

Kevin Rue-Albrecht (18:19:23): > That would be a worthy effort Aaron, thanks. I think a lot of developers like myself only learn about ‘tricks’ like parallelSlotNames during review of their submitted package. It would also probably facilitate the job of package reviewers to have such a resource towards which they may redirect package developers

Aaron Lun (18:22:32): > Dealing with NAMES is an extreme pain. Character or NULL-ness requires a lot of iffing around to cover all possibilities.

2018-04-27

Aaron Lun (16:53:57): > @Hervé Pagès I am thinking of putting this document as a vignette in the SummarizedExperiment package itself, would that be a good idea or should I put it in a separate repo?

Hervé Pagès (16:54:01): > @Hervé Pagès has joined the channel

Aaron Lun (22:29:38): > I’ve put up a draft as a PR.

Aaron Lun (23:02:35): > Bumping @Kasper D. Hansen @Martin Morgan, if interested. Also added some thoughts about how to potentially simplify the derivation process. Link is at https://github.com/Bioconductor/SummarizedExperiment/pull/10. - Attachment (GitHub): Detailed instructions for class extension by LTLA · Pull Request #10 · Bioconductor/SummarizedExperiment > This is very much a work in progress, just adding it as a PR to put it on the radar and to get feedback.

Martin Morgan (23:44:50): > I don’t think it belongs in SummarizedExperiment, which is basically a ‘user’ package; maybe S4Vectors…?

2018-04-28

Aaron Lun (14:35:43): > Okay. I just put it there because the end of Herve’s original vignette had a bit on extending the SE class. Happy to know where it should actually go - though S4Vectors would seem like an odd place IMO, because it precedes the introduction of the concept of the SE class?

2018-04-29

Kasper D. Hansen (07:46:21): > I think it belongs in SummarizedExperiment - that is where the class is defined. But it could have a headline like “For developers: $CURRENT_TITLE”

2018-05-01

Davide Risso (14:47:26): > Hi @Aaron Lun what did we decide to do for those cases in which one needs to store a reducedDim() matrix that was created with only a subset of genes?

Davide Risso (14:47:52): > Also, we currently still require that the reducedDims() elements are matrices

Davide Risso (14:48:06): > should we change that to matrix-like objects?

Aaron Lun (14:48:12): > After discussing with @Kasper D. Hansen, I decided to just store the loading matrix, and no subset link.

Aaron Lun (14:48:34): > It makes some sense to subset PCs, because if we re-project we get the same PCs anyway

Aaron Lun (14:48:43): > but it doesn’t make sense to subset the loading vectors, which would change our PCs.

Davide Risso (14:48:53): > so that means we get rid of them?

Aaron Lun (14:48:57): > Huh?

Davide Risso (14:49:01): > oh OK sorry

Davide Risso (14:49:07): > we just keep the full matrix

Davide Risso (14:49:16): > even when subsetting

Davide Risso (14:49:18): > correct?

Aaron Lun (14:49:45): > Yep

Aaron Lun (14:49:52): > Anyway, yes, it should be matrix-like.

Aaron Lun (14:50:06): > Just get rid of the part in the validity method that enforces is.matrix-ness, if there was anything like that.

Aaron Lun (14:50:08): > Can’t remember.

Davide Risso (14:50:43): > yeah I can do it

Davide Risso (14:52:00): > so just to be clear: in a typical workflow one would likely select highly variable genes, do PCA, store the factors and the loadings in an element of reducedDims

Davide Risso (14:52:47): > that would work, because we don’t care about the number of features in LinearEmbeddingMatrix

Aaron Lun (14:53:15): > Yes, so there’s only ever subsetting by row for elements of reducedDims.
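
A minimal base R sketch of the workflow described above, assuming a toy matrix and plain `prcomp()` output rather than the actual LinearEmbeddingMatrix class. The names `hvg`, `scores` and `loadings` are illustrative only. It shows that subsetting cells only touches the rows of the scores, while the loadings stay whole and can still be used to re-project the kept cells.

```r
set.seed(1)
# Toy log-expression matrix: 50 genes (rows) x 20 cells (columns).
expr <- matrix(rnorm(50 * 20), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("cell", 1:20)))

# Pick the 10 most variable genes (a stand-in for a proper HVG selection).
hvg <- names(sort(apply(expr, 1, var), decreasing = TRUE))[1:10]

# PCA on cells, so cells are the rows of the input to prcomp().
pca <- prcomp(t(expr[hvg, ]), rank. = 3)
scores   <- pca$x         # cells x PCs, parallel to the cells
loadings <- pca$rotation  # selected genes x PCs, independent of the cells

# Subsetting cells only subsets the rows of the scores.
keep <- c("cell3", "cell7", "cell11")
scores_sub <- scores[keep, , drop = FALSE]

# Re-projecting the kept cells with the full loadings gives the same PCs.
centred <- scale(t(expr[hvg, keep]), center = pca$center, scale = FALSE)
reproj <- centred %*% loadings
```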

Aaron Lun (14:53:30): > If you like, write it up as a PR so we can both stare at it tomorrow.

Aaron Lun (14:55:39): > yeah

Aaron Lun (14:57:47): > Probably needs a couple of extra tests for the LEM+SCE interaction, esp for subsetting and combining.

Davide Risso (14:58:44): > yep

2018-05-02

Aaron Lun (04:48:01): > God, this celltrails guy is argumentative.

Davide Risso (13:11:45): > I wonder if there is a way to deprecate the use of ExpressionSet in new packages. There is really no reason why one would do that over SummarizedExperiment other than “I’m used to ExpressionSets” which seems to be his main argument.

Davide Risso (13:12:18): > Could BiocCheck return a WARNING, or at least a NOTE, when a package uses ExpressionSet, suggesting SummarizedExperiment instead?

Sean Davis (13:19:05): > There is a lot of code out there that uses ExpressionSet. For new work, it makes sense to strongly encourage SE, but I would hesitate to have BiocCheck issue warnings.

Davide Risso (14:52:59): > Yeah, I was thinking specifically about new packages. I realize that a warning is not a good idea, perhaps just mentioning this in the guidelines for new packages on the website?

Peter Hickey (14:55:18): > http://bioconductor.org/developers/package-guidelines/ > > Make use of appropriate existing … classes (e.g., ExpressionSet, … > looks like the guidelines need an update :astonished:

Peter Hickey (14:55:23): > Are these pages PR-able?

Peter Hickey (14:55:45): > the link given in that paragraph (http://bioconductor.org/developers/package-guidelines/developers/how-to/commonMethodsAndClasses/) is broken too

Kasper D. Hansen (14:55:53): > I think you can argue for ExpressionSet still, but not for his usecase

Kasper D. Hansen (14:56:15): > Also, he just needs to get over it. I mean he submitted 2 days ago and we have standards. Not our problem his paper is accepted

Martin Morgan (15:14:15): > @Peter Hickey thanks for the typo-spotting; the repo (I’ll fix this) is at https://github.com/Bioconductor/bioconductor.org with pages under content/ and the url on the web site leading typically to a .md file. > > Remember it’s not polite to talk behind people’s back, and that this is a public forum.

Aaron Lun (20:17:55): > In any case, I just gave an anecdote from my past, hopefully it is sufficiently illustrative of what happens when one does not listen to the package reviewer.

2018-05-03

Aaron Lun (05:32:55): > @Davide Risso Don’t know if we should add an rbind check to ensure that the LEMs being combined have the same feature loadings?

Aaron Lun (05:34:27): > Otherwise you could do very silly things

Davide Risso (10:45:18): > Yeah but how do we do that? Are feature names mandatory? Otherwise, how can we know that it’s the same features?

Aaron Lun (10:59:58): > No, we just check that the feature loading matrix is the same.

Aaron Lun (11:10:53): > I have this implemented.

Davide Risso (13:25:19): > @Aaron Lun Did you push upstream after merging my pull request?

Aaron Lun (13:32:32): > Nope.

Aaron Lun (13:33:20): > But now done.

Davide Risso (13:35:45): > Thanks!

Aaron Lun (13:37:49): > Didn’t bump, though.

Aaron Lun (13:38:41): > Should we enforce equivalent feature loadings in LEMs to be rbinded?

Kasper D. Hansen (13:39:45): > Loyal Goff is part of the latent space CZI team. He is asking where this class lives? I will ask him to join here

Kasper D. Hansen (13:40:01): > this class = LinearEmbeddingSomething

Kasper D. Hansen (13:40:29): > or should I direct him to the CZI slack Bioconductor channel?

Aaron Lun (13:41:26): > Hm. Might as well have a discussion here, it’s not like we get a lot of activity here anyway.

Davide Risso (13:43:05): > Anyway, LinearEmbeddingMatrix lives in the SingleCellExperiment package

Loyal (13:51:07): > @Loyal has joined the channel

Aaron Lun (13:57:25): > See the develop branch for enforced equality of featureLoadings (within a tolerance) during rbind.

Aaron Lun (13:57:46): > a486d7aa4fbf543d40d4d6d235c75e62daf5dbac
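
A minimal base R sketch of the kind of check described above, assuming each embedding is a plain list with `scores` and `loadings` matrices. The helper `rbind_embeddings()` and its `tolerance` argument are hypothetical and are not the code in the commit referenced above.

```r
# Hypothetical helper: combine two embeddings only if their feature
# loadings agree within a numerical tolerance.
rbind_embeddings <- function(x, y, tolerance = 1e-8) {
    same <- all.equal(x$loadings, y$loadings, tolerance = tolerance)
    if (!isTRUE(same)) {
        stop("feature loadings differ between embeddings: ",
             paste(same, collapse = "; "))
    }
    list(scores = rbind(x$scores, y$scores), loadings = x$loadings)
}

load <- matrix(c(0.6, 0.8, -0.8, 0.6), nrow = 2,
               dimnames = list(c("geneA", "geneB"), c("PC1", "PC2")))
a <- list(scores = matrix(1:4, nrow = 2), loadings = load)
b <- list(scores = matrix(5:8, nrow = 2), loadings = load + 1e-12)

# Tiny numerical noise in the loadings is tolerated.
combined <- rbind_embeddings(a, b)

# Loadings with swapped components are rejected with an error.
swapped <- list(scores = matrix(9:12, nrow = 2), loadings = load[, 2:1])
res <- try(rbind_embeddings(a, swapped), silent = TRUE)
```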

2018-06-06

Aaron Lun (14:15:13): > @Davide Risso Am thinking of converting our currently “internal” fields to an actual protected field in the colData and rowData (guarded by modifications to colData<- and rowData<-, or lower-level derivatives if necessary). This would reduce the maintenance burden considerably. However, note that internal fields would be stored as nested DataFrames, so this will probably not play nice with ggplot. Or maybe it will, I don’t really know.

Kevin Rue-Albrecht (15:05:08) (in thread): > maybe your iSEE:::.extract_nested_DF has a brighter future higher up the dependency chain?

Kasper D. Hansen (17:23:35): > protected fields in colData / rowData would be nice in general

Kasper D. Hansen (17:23:54): > I could use them elsewhere

Kasper D. Hansen (17:24:17): > Not sure what I mean by protected, but something along the lines of them being the package responsibility

Kasper D. Hansen (17:25:14): > might be good enough to need something like colData(X, force = TRUE)$PROTECTED_NAME = VALUE

Kasper D. Hansen (17:25:39): > and then specific commands for extracting

Aaron Lun (17:59:43) (in thread): > I would have preferred for as.data.frame,DataFrame-method to do something sensible in this regard - currently it just falls over.

Aaron Lun (18:01:52): > Yes, if this turns out to be useful one could imagine putting a check for protected fields in the colData<- method for SummarizedExperiment itself. The function that generates protected fields can then be specialized for different SE subclasses to allow for different protected fields, allowing us all to avoid redefining colData<- for our own classes.

Aaron Lun (18:03:17): > This could be part of a fairly extensive “making it easier to derive from SE” wishlist that I have… having done this procedure at least three times, I still don’t think I have the hang of it..

Kevin Rue-Albrecht (18:05:33) (in thread): > aren’t we just talking about putting the code of iSEE:::.extract_nested_DF into the body of as.data.frame,DataFrame-method?

Aaron Lun (18:07:19) (in thread): > Sort of, depending how name clashes are resolved. Something to consider if you want to PR into S4Vectors.

2018-06-07

Kasper D. Hansen (05:44:35): > Yes, @Aaron Lun’s suggestion of putting it in SummarizedExperiment and allowing a list of protected fields is exactly what I would like.

Kasper D. Hansen (05:46:11): > An alternative to having protected columns of colData is SummarizedExperiment level support for 2 structures: colData and packageColData (ok, probably suboptimal naming scheme, but intention is clear).

Kasper D. Hansen (05:47:30): > This is a cleaner design IMO. We also need to think about accessing (and printing for example) protected columns. Should they be returned by a standard call to colData() (I don’t think so, and I also don’t think they should be printed as default).

Kasper D. Hansen (05:47:45): > perhaps protectedColData()

Kasper D. Hansen (05:48:15): > I see plenty of use for this in minfi

Kasper D. Hansen (05:48:56): > and stuff like sizeFactors etc in expression packages

Aaron Lun (06:00:01): > The two structure setup is currently what we have in SingleCellExperiment, with the standard colData and an internal int_colData. It works, but for some applications it’s a bit annoying as you need to change the access strategy if you want something from the internal fields.

Kasper D. Hansen (06:05:27): > I don’t see how that is avoidable. If we want protected fields we need to make them less accessible to the user.

Kasper D. Hansen (06:06:36): > we could have convenience functions like colData_all() which could extract both user-level and protected data and do a cbind and make sure we have unique names. But at some level we cannot have the cake and eat it too.

Kasper D. Hansen (06:07:55): > But I am getting pretty convinced that doing this would be useful at the project-level.

Kasper D. Hansen (06:08:12): > By which I mean in SummarizedExperiment

Aaron Lun (06:14:42): > Yes, that’s also the current strategy in SCE with colData(internal=TRUE).

Aaron Lun (06:15:43): > The reason to switch to a protected field in colData instead would be to allow users to re-use read semantics, while preventing writes to the protected field other than through the approved methods.

Hervé Pagès (16:00:30): > How do you guys feel about using a naming convention for hidden fields e.g. fields prefixed with . would be hidden i.e. not returned by colData() or rowData() by default? Other tools already use this convention e.g. ls() in R and Unix command ls. More precisely: we could add the all argument to colData() and rowData(). When all=TRUE, the full DataFrame would be returned, otherwise only its non-hidden columns. all would be FALSE by default. This assumes that the end-user will never run in a situation where s/he needs to store user data in .* columns though, and I’m not sure this is a reasonable assumption. What do you think?
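
A minimal base R sketch of the dot-prefix convention proposed above, using a plain `data.frame` in place of a DataFrame. The function name `get_coldata()` is hypothetical; only the `all` argument and its `FALSE` default come from the proposal.

```r
# Hypothetical getter: hide columns whose names start with "." unless all=TRUE.
get_coldata <- function(df, all = FALSE) {
    if (all) return(df)
    hidden <- startsWith(colnames(df), ".")
    df[, !hidden, drop = FALSE]
}

cd <- data.frame(batch = c("a", "b"), .size_factor = c(0.9, 1.1),
                 check.names = FALSE)

visible_names <- colnames(get_coldata(cd))              # "batch"
all_names     <- colnames(get_coldata(cd, all = TRUE))  # "batch" ".size_factor"
```

The weak point noted in the message above is visible here too: a user column that happens to start with a dot would silently disappear from the default view.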

2018-06-08

Kasper D. Hansen (05:28:19): > It is not a bad idea. What about subsetting: I think we should assume they are always carried through and do not count for numbering purposes.

Kasper D. Hansen (05:29:18): > We clearly have some things to think about. The advantage of two structures is that everything would carry through “automatically” instead of having some of the columns being “special”

Kasper D. Hansen (05:30:37): > Similarly, say the user wants to update the colData with something like colData(OBJECT) <- NEW_DF. Then it shouldn’t (I think) overwrite existing hidden fields. Which now makes it a column(s) replacement/deletion instead of the whole object

Kasper D. Hansen (05:30:43): > If that even makes any sense

Kasper D. Hansen (05:31:01): > I am not exactly expressing myself clearly here.

Aaron Lun (05:31:11): > Hiding fields based on a dot prefix seems a bit too broad brush to me. Class-by-class specification seems “cleaner”.

Aaron Lun (05:31:46): > And yes, having two structures would make life easier.

Aaron Lun (05:32:16): > In some respects, at least.

Kasper D. Hansen (05:32:40): > That’s my feel as well. I think the main drawbacks we have articulated so far are

Kasper D. Hansen (05:33:18): > 1) it requires changing the class structure (potentially big) > 2) more work by the developer if they want all the colData; a cbind is necessary

Kasper D. Hansen (05:34:09): > Of these 2) seems super minor and I would rather put some load on the developer. 1) is bigger, in that it both needs new code/design and then we need to address existing objects. We have done so before, but clearly not something which should be undertaken lightly

Aaron Lun (05:35:16): > Well, SCE already has 1), so we know it’s possible.

Aaron Lun (05:35:40): > The question is whether we want to move this to SE, and how to do so most effectively.

Aaron Lun (05:36:51): > There’s a few comments at https://github.com/Bioconductor/SummarizedExperiment/pull/10 about an idea for parallelSlotNames for the other dimensions, which would allow automatic subsetting and combining behaviour.

Aaron Lun (05:38:58): > The idea is that if you have a new slot that’s parallel to the column or row, you just add the slot name to the set of slot names, and - presto - you get instant (correct) subsetting and combining behaviour. If this were implemented, colData would obviously be the first entry in the slots parallel to the columns; but you could easily imagine just adding internal_colData to this set of names, and that’s job done. No need to modify all of the subsetting/combining methods.

Aaron Lun (05:40:05): > As you say, 2) is not so bad either. Happily enough, DataFrame does not care about name clashes anymore: > > > library(S4Vectors) > > a <- DataFrame(a=1) > > cbind(a,a) > DataFrame with 1 row and 2 columns > a a > <numeric> <numeric> > 1 1 1 >

Aaron Lun (05:40:52): > So you can just cbind everything and it won’t crash even if the user was unfortunate enough to use the same column name as one of the hidden fields.
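
A minimal sketch of the idea described above, using only the `methods` package. The class `ToySE`, the generic `parallelSlotNamesByColumn()` and the helper `subset_columns()` are hypothetical illustrations; none of them exist in SummarizedExperiment.

```r
library(methods)

# Toy container with two slots that run parallel to the columns.
setClass("ToySE", representation(assay = "matrix", colData = "data.frame",
                                 int_colData = "data.frame"))

# Hypothetical generic: which slots run parallel to the columns?
setGeneric("parallelSlotNamesByColumn",
           function(x) standardGeneric("parallelSlotNamesByColumn"))
setMethod("parallelSlotNamesByColumn", "ToySE",
          function(x) c("colData", "int_colData"))

# One column-subsetting routine, written once and driven by the slot names.
subset_columns <- function(x, j) {
    x@assay <- x@assay[, j, drop = FALSE]
    for (s in parallelSlotNamesByColumn(x)) {
        slot(x, s) <- slot(x, s)[j, , drop = FALSE]
    }
    x
}

se <- new("ToySE", assay = matrix(1:6, nrow = 2),
          colData = data.frame(batch = c("a", "b", "c")),
          int_colData = data.frame(size_factor = c(0.9, 1.0, 1.1)))
sub <- subset_columns(se, c(1, 3))
```

Under this design a derived class would only need its own `parallelSlotNamesByColumn()` method that appends its extra slot names; the subsetting code itself would not change.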

Aaron Lun (05:41:07): > The user’s column should probably be in front, though.

Kasper D. Hansen (05:42:42): > For 1) I was referring to changing SummarizedExperiment, at the root. Making a new class is different (easier but more cruft)

Aaron Lun (05:47:08): > Yes, that’s what I was thinking as well.

Aaron Lun (05:49:01): > The idea would be for SE to provide parallelSlotNamesByColumn or something, and us derivers can just modify the set of slot names that we need to be parallel.

Aaron Lun (05:49:22): > And if there is such a function in SE, then it would be trivial to add (to itself) the int_colData slot.

Aaron Lun (05:49:36): > No need for extra work in writing the subset and combine methods.

Kasper D. Hansen (05:53:09): > So the question I would have here is the question of overengineering. Do we need unlimited colData (and rowData)-like slots with custom names, or do we just need 2 : public and private?

Aaron Lun (05:55:09): > I have a few classes that would benefit from adding additional slots to parallelSlotNames(ByColumn).

Aaron Lun (05:55:48): > For SCE, I would have added the slot for handling reduced dimensions to this set, as this is parallel to the columns.

Aaron Lun (05:57:10): > You could say that I could store entire matrices as columns in a DataFrame, which is also possible. But this may not be a generally applicable solution.

Aaron Lun (05:58:07): > Definitely we need to specify new slots that are parallel to the rows.

Aaron Lun (05:58:44): > e.g., InteractionSet has a GInteractions object running parallel to the rows.

Hervé Pagès (11:58:10): > i see what you mean. Yes we need to be careful about what the colData setter would do. You’re right that it will behave more like a column replacement/deletion/addition operation with respect to the “full” DataFrame. From an end-user point of view (i.e. with respect to the “visible” DataFrame), nothing would change i.e. it would still look and feel as a whole colData replacement. I was thinking that the advanced user who needs to touch the hidden columns (call him/her an “authorized user”) would use a switch e.g. colData(se, force=TRUE) <- DF to perform a brute replacement. This would let him/her do anything. By default (i.e. when force=FALSE), one would only be allowed to replace/delete/add visible columns.

Hervé Pagès (12:06:17): > The main advantage of this vs adding new slots is that we don’t touch the internal representation so we don’t break hundreds of serialized objects. Also this scheme could be re-used for the rowData and for the mcols of other Vector derivatives.
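
A minimal base R sketch of the setter semantics described above, using plain `data.frame`s. The function `set_coldata()` and its `protected` argument are hypothetical; only the `force` switch comes from the message. Raising an error when a protected column is supplied without `force=TRUE` is one possible reading of "only be allowed to replace/delete/add visible columns".

```r
# Hypothetical setter. 'protected' names the package-owned (hidden) columns.
set_coldata <- function(old, value, protected, force = FALSE) {
    stopifnot(nrow(value) == nrow(old))
    if (force) return(value)  # brute replacement for "authorized" callers
    if (any(colnames(value) %in% protected)) {
        stop("cannot modify protected columns without force=TRUE")
    }
    # Replace the visible part, re-attach the existing protected columns.
    kept <- old[, colnames(old) %in% protected, drop = FALSE]
    cbind(value, kept)
}

old <- data.frame(batch = c("a", "b"), .size_factor = c(0.9, 1.1),
                  check.names = FALSE)
new <- data.frame(treatment = c("x", "y"))

# Looks like a whole-object replacement, but the hidden column survives.
updated <- set_coldata(old, new, protected = ".size_factor")

# With force=TRUE the hidden column is lost along with everything else.
forced <- set_coldata(old, new, protected = ".size_factor", force = TRUE)
```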

Aaron Lun (12:10:33): > Hm, that’s true.

Hervé Pagès (12:58:31): > An orthogonal discussion: maybe it’s time for us to start a discussion with R core (@Michael Lawrence? @Martin Morgan?) about how readRDS() and load() could be enhanced to update objects on-the-fly. Maybe this could be done via an onUnserialize hook (details to be discussed somewhere else). This would let us register arbitrary code to be executed after the object is loaded and before it’s returned to the user. Then we could really focus on the best way to improve our data structures without being afraid to touch their internal representation. We’ve fixed hundreds or maybe thousands of serialized S4 objects over the last 12 years and now with AnnotationHub and ExperimentHub hosting thousands or maybe tens of thousands of serialized S4 objects, we’ve put ourselves in a situation where it’s almost impossible to make any significant progress on our data structures. It’s frustrating and will hurt the project as a whole in the long term.

Kasper D. Hansen (15:32:32): > At least run updateObject() per default on load would be good

Hervé Pagès (16:07:44): > Right. A technical problem is that updateObject is a Bioconductor S4 generic (defined in the BiocGenerics package) and readRDS() and load() are in base so cannot call updateObject(). I don’t think they’d want to move updateObject to base or, more generally speaking, add an S4 generic to it. We would need to discuss the best way to implement this “on unserialize hook”. Where would be the best place to discuss that?
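
A minimal base R sketch of what an update-on-read step could look like at user level. This is only a wrapper around `readRDS()`, not a hook inside base R, and the names `update_hooks` and `read_and_update()` are hypothetical.

```r
# Hypothetical registry mapping a class name to a function that modernizes
# old instances of that class.
update_hooks <- new.env()

# Hypothetical wrapper: read an object, then run the registered updater.
read_and_update <- function(file) {
    obj <- readRDS(file)
    hook <- update_hooks[[class(obj)[1]]]
    if (is.function(hook)) obj <- hook(obj)
    obj
}

# Example: old "toy_record" objects lacked a 'version' field.
update_hooks[["toy_record"]] <- function(obj) {
    if (is.null(obj$version)) obj$version <- 2L
    obj
}

path <- tempfile(fileext = ".rds")
saveRDS(structure(list(value = 42), class = "toy_record"), path)
restored <- read_and_update(path)
```

The limitation raised in the thread still applies: callers have to use the wrapper, because `readRDS()` and `load()` themselves know nothing about the registry.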

2018-06-26

Elana Fertig (08:37:21): > @Elana Fertig has joined the channel

2018-07-19

Aaron Lun (18:06:07): > @Davide Risso I see you’re working on the develop branch. Keep in mind that there is one extra change to the LEM class there, where it throws errors upon attempts to rbind two LEMs with different feature loadings. I was never sure whether this change was a good idea or not, which is why it’s been in limbo for a while.

Davide Risso (18:19:59): > I’m not working on develop… I updated directly master, but then realized that develop was way behind master and merged master into develop

Davide Risso (18:20:08): > That’s why I pushed develop

Davide Risso (18:20:36): > But I was wondering why you didn’t merge those changes to master

2018-07-20

Aaron Lun (04:07:47): > Okay, good.

Aaron Lun (04:08:12): > Well, we might as well have a discussion about the LEM behaviour I mentioned above.

Aedin Culhane (16:09:46): > Hi. Is there a known subsetting issue of rowData with SingleCellExperiment_1.2.0?

Aedin Culhane (16:11:19): > rowData is not in the same order as rownames.

Aedin Culhane (16:11:54): > and when subset (used logical subset) rowData has the nrow of the SingleCellExperiment class, but doesn’t match

Aedin Culhane (16:12:09): > Should I just update or is this a known issue?

Aaron Lun (16:38:39): > Uh - not quite sure what you mean.

Aaron Lun (16:39:11): > Do you have a code example?

Aedin Culhane (16:56:27): > Hi. I created a SingleCellExperiment from 10x data. However the order of rowData is not the same as rownames

Aedin Culhane (16:56:28): > > rownames(x)[1:2] > [1] "ENSG00000243485" "ENSG00000237613" > > rowData(x)[1:2,] > DataFrame with 2 rows and 6 columns > ID Symbol CHR START END > > 1 ENSG00000000003 TSPAN6 X 100627109 100639991 > 2 ENSG00000000005 TNMD X 100584802 100599885 > STRAND > > 1 -1 > 2 1 > > class(x) > [1] "SingleCellExperiment"

Aedin Culhane (16:57:45): > If I use a logical to subset x by row, the rowData does not contain the rownames(x)

Aedin Culhane (16:58:14): > > class(rowData(x)) > [1] "DataFrame" > attr(,"package") > [1] "S4Vectors"

Aaron Lun (16:58:41): > Hm. We do overwrite rowData(), but that shouldn’t do much.

Aaron Lun (16:59:32): > I presume that everything was correct going into the constructor?

Aedin Culhane (16:59:54): > Is there a check on consistency between rowData and rownames? I thought SingleCellExperiment would inherit that from SummarizedExperiment.

Aaron Lun (17:00:40): > AFAIK that doesn’t occur even in SummarizedExperiment, due to the inability of the BioC-release DataFrame to handle duplicate row names. Or something.

Aaron Lun (17:00:57): > This was one of the behaviour changes that’ll happen in BioC-devel.

Aaron Lun (17:01:46): > Anyway, with internal=FALSE, the SCE rowData() just calls the SE rowData(). So it’s worth checking whether you get the same behaviour with the SE, in which case it’s S.E.P.

Aedin Culhane (17:02:57): > So the row filters of genes with zero counts, have messed up rowData.

Aedin Culhane (17:03:29): > Might be worth mentioning in the workflows etc…

Aaron Lun (17:04:07): > Wait, I don’t think I understand the problem here. Can you describe exactly what happened?

Aedin Culhane (17:04:25): > A fix I have done, is filtering the rowData and counts matrix separately.

Aaron Lun (17:05:12): > Are you saying that row-subsetting of the SCE object results in assays and rowData that are not synchronized?

Aedin Culhane (17:05:20): > Yes,

Aaron Lun (17:05:32): > :shocked_face_with_exploding_head:

Aaron Lun (17:05:52): > I assume you’re running on BioC-release. @Davide Risso can you check this?

Aedin Culhane (17:06:36): > Sorry this isn’t devel. It’s on release

Aaron Lun (17:06:42): > SCE [ should just dispatch to the SE [, so I’m not sure how this happened.

Aedin Culhane (17:07:16): > That’s what I expected

Davide Risso (17:07:58): > Mmmm.. this indeed seems very strange

Davide Risso (17:08:22): > @Aedin Culhane any chance you have a reproducible script of what you did?

Aedin Culhane (17:08:41): > > class(x) > [1] "SingleCellExperiment" > attr(,"package") > [1] "SingleCellExperiment" > > rownames(x[1:5,]) > [1] "ENSG00000243485" "ENSG00000237613" "ENSG00000186092" > [4] "ENSG00000238009" "ENSG00000239945" > > rowData(x[1:5,])[,1:3] > DataFrame with 5 rows and 3 columns > ID Symbol CHR > > 1 ENSG00000000003 TSPAN6 X > 2 ENSG00000000005 TNMD X > 3 ENSG00000000419 DPM1 20 > 4 ENSG00000000457 SCYL3 1 > 5 ENSG00000000460 C1orf112 1

Davide Risso (17:10:26): > I meant, how you created x

Davide Risso (17:10:34): > so that I can try the same example

Aedin Culhane (17:10:58): > xDirs=grep("hg19",list.dirs(baseDir), value=TRUE) > x<-read10xCounts(xDirs) > save(x, file=file.path(wkdir,"./KM_scRNAseq.rda"))

Aedin Culhane (17:11:40): > I then overwrote the rowData

Aedin Culhane (17:11:42): > library(EnsDb.Hsapiens.v86) > Rxx <- select(EnsDb.Hsapiens.v86, keys=rowData(x)$ID, > column=c("SEQNAME", "GENESEQSTART", "GENESEQEND","SEQSTRAND"), keytype="GENEID", multiVals="first") > colnames(Rxx)= c("EnsEMBLID", "CHR", "START", "END", "STRAND") > rowData(x)<- merge(rowData(x), Rxx,by.x="ID" ,by.y= "EnsEMBLID" , all=TRUE) > > summary(rowData(x)$CHR=="MT")

Davide Risso (17:12:06): > mm… I guess then it could be a problem in rowData(x)<-

Aedin Culhane (17:12:13): > possibly.

Davide Risso (17:12:34): > can you check if the rownames were fine before that step?

Aedin Culhane (17:13:04): > However adding additional annotation to rowData is not an unusual requirement. I haven’t seen this with SE

Aedin Culhane (17:14:22): > I actually overwrote it so I’d need to run the script again

Aedin Culhane (17:14:57): > I didn’t expect to introduce an error there:wink:

Davide Risso (17:16:28): > No problem, I’m trying a similar example on my machine

Davide Risso (17:16:35): > let’s see if I hit the same issue

Aaron Lun (17:21:24): > Keep in mind that merge re-orders everything.

Aedin Culhane (17:22:21): > > x<-read10xCounts(xDirs) > > rownames(x)[1:2] > [1] "ENSG00000243485" "ENSG00000237613" > > rowData(x)[1:2] > DataFrame with 32738 rows and 2 columns > ID Symbol > > 1 ENSG00000243485 MIR1302-10 > 2 ENSG00000237613 FAM138A

Aedin Culhane (17:23:33): > So it’s ok after read10xCounts.

Aedin Culhane (17:23:40): > Just re-running next bit

Aedin Culhane (17:24:20): > (Doesn’t help that there are several merge functions… sorry)

Davide Risso (17:24:58): > One thing that I notice is that rowData(x) has no rownames

Davide Risso (17:25:04): > not sure if that matters

Aedin Culhane (17:25:37): > > rowData(x)<- merge(rowData(x), Rxx,by.x="ID" ,by.y= "EnsEMBLID" , all=TRUE) > > > > rownames(x)[1:2] > [1] "ENSG00000243485" "ENSG00000237613" > > rowData(x)[1:2,] > DataFrame with 2 rows and 6 columns > ID Symbol CHR START END STRAND > > 1 ENSG00000000003 TSPAN6 X 100627109 100639991 -1 > 2 ENSG00000000005 TNMD X 100584802 100599885 1

Aedin Culhane (17:26:10): > > class(rowData(x)) > [1] "DataFrame" > attr(,"package") > [1] "S4Vectors"

Davide Risso (17:26:25): > Yeah, definitely the problem is not with the subsetting, it’s with the rowData<- method

Aedin Culhane (17:26:44): > Yes… it’s with overwriting rowData

Davide Risso (17:26:54): > I think that the problem is that your merge() is not preserving the order and so you are overwriting the rowData

Aedin Culhane (17:27:04): > Yes I agree.
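
A minimal base R sketch of the reordering problem diagnosed above and one way to guard against it, using toy `data.frame`s and made-up IDs rather than the real 10X object. `merge()` sorts on the key by default, so the result has to be put back into the original row order (here with `match()`) before it is assigned back.

```r
# Row annotation in the same order as the assay rows.
ids <- c("ENSG3", "ENSG1", "ENSG2")
row_data <- data.frame(ID = ids, Symbol = c("C", "A", "B"))
extra <- data.frame(EnsEMBLID = c("ENSG1", "ENSG2", "ENSG3"),
                    CHR = c("1", "2", "X"))

# merge() sorts on the key by default, so the row order changes.
merged <- merge(row_data, extra, by.x = "ID", by.y = "EnsEMBLID", all = TRUE)

# Restore the original order before assigning the result back.
fixed <- merged[match(ids, merged$ID), ]

# Cheap sanity check to run before any rowData(x) <- ... assignment.
stopifnot(identical(as.character(fixed$ID), ids))
```

If the annotation lookup returns several rows per key, `match()` keeps only the first hit for each ID, which also keeps the number of rows equal to the number of features.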

Davide Risso (17:27:11): > Somehow SingleCellExperiment does not check

Davide Risso (17:27:23): > when it probably should

Davide Risso (17:27:55): > but I’m not sure how it can check if your rowData() has no rownames

Davide Risso (17:28:06): > how does it know that it should check the order of the ID column?

Davide Risso (17:28:37): > One more thing to check would be what happens if you try with SummarizedExperiment

Aedin Culhane (17:28:41): > SE checks that rownames(rowData(x)) are equal to rownames(x)

Davide Risso (17:28:57): > just to see whether it’s a SummarizedExperiment or SingleCellExperiment issue

Aedin Culhane (17:29:04): > However without that, could it check that rowData(x)[,1] is the same?

Davide Risso (17:29:09): > what if you try the same code after

Davide Risso (17:29:23): > x <- as(x, "SummarizedExperiment")

Aedin Culhane (17:29:39): > Ok, give me a minute or two

Aedin Culhane (17:31:01): > PS .. how do I add code properly on slack

Aedin Culhane (17:31:08): > @Aedin Culhane uploaded a file: Untitled - File (Plain Text): Untitled

Aedin Culhane (17:31:14): > (worked it out)

Aedin Culhane (17:33:26): > when I convert x to a SummarizedExperiment, it removes the rownames

Aedin Culhane (17:33:31): > @Aedin Culhane uploaded a file: Untitled - File (Plain Text): Untitled

Davide Risso (17:33:53): > mmm

Davide Risso (17:34:04): > OK, give me some time to dig into it

Davide Risso (17:34:37): > It still looks strange to me that your rowData(x) doesn’t show row names

Davide Risso (17:34:49): > in my example for instance: > > > rowData(x)[,1:3] > DataFrame with 5 rows and 3 columns > feature_symbol is_feature_control is_feature_control_ERCC > <character> <logical> <logical> > 1/2-SBSRNA4 1/2-SBSRNA4 FALSE FALSE > A1BG A1BG FALSE FALSE > A1BG-AS1 A1BG-AS1 FALSE FALSE > A1CF A1CF FALSE FALSE > A2LD1 A2LD1 FALSE FALSE >

Aedin Culhane (17:34:59): > Let me double check read10xcounts

Aedin Culhane (17:35:09): > and see why I am not getting rownames

Aedin Culhane (17:42:03): > …. as a feature request for SCE and SE… it might be nice to have an option to auto-populate rowData given a known keytype ;-))

Aaron Lun (17:42:24): > Elaboration?

Davide Risso (17:42:38): > I think that the issue is with the rownames of rowData

Aaron Lun (17:42:40): > like from AnnotationDbi::select?

Davide Risso (17:43:03): > and it might be an issue of the SingleCellExperiment constructor

Aaron Lun (17:43:34): > unlikely.

Aaron Lun (17:44:27): > We don’t do anything to deviate from SE handling of the row names.

Davide Risso (17:44:53): > @Davide Risso uploaded a file: Untitled - File (Plain Text): Untitled

Davide Risso (17:45:08): > Shoot!

Davide Risso (17:45:20): > This works in Bioc devel

Davide Risso (17:45:54): > @Davide Risso uploaded a file: Untitled - File (Plain Text): Untitled

Aaron Lun (17:45:59): > See my previous comments about S4Vectors DataFrame name handling changes, and the new use.names=TRUE default.

Davide Risso (17:46:45): > Just to clarify: my first snippet is Bioc-devel and has no issues, the second snippet is bioc release and it’s not what I would expect

Aedin Culhane (17:46:58): > I used DropletUtils’ version 1.0.2

Aaron Lun (17:48:22): > Differences in behaviour due to a newly added...torowData’s function definition in BioC-devel.

Aedin Culhane (17:48:47): > Friday evening… sorry

Davide Risso (17:52:00): > On further exploration, the rownames have nothing to do with it… even with rownames,rowData<-doesn’t check for inconsistencies

Davide Risso (17:52:11): > @Davide Risso uploaded a file: Untitled - File (Plain Text): Untitled

Davide Risso (17:52:25): > This is SummarizedExperiment, not SingleCellExperiment

Davide Risso (17:54:55): > So, definitely not an issue of SingleCellExperiment… should SummarizedExperiment throw an error here?

Aedin Culhane (17:56:58): > Isn’t there code to check whether an object is valid (validObject???)? Does this throw an error?

Aedin Culhane (18:00:28): > Sorry, I have to get home to the babysitter. Thanks Davide and Aaron for your help on this. I will avoid using rowData()<- and put extra checks in if I do. Let’s follow up in Toronto. I will be there Tuesday evening

Davide Risso (18:01:45): > :+1: see you in Toronto! Have a great weekend!

Aedin Culhane (18:08:43): > Thanks

2018-07-25

Neke Ibeh (09:33:10): > @Neke Ibeh has joined the channel

Diya Das (13:06:50): > @Diya Das has joined the channel

C. Mirzayi (please do not tag this account) (15:57:48): > @C. Mirzayi (please do not tag this account) has joined the channel

Ben Johnson (16:02:21): > @Ben Johnson has joined the channel

2018-07-26

Matthew Oldach (12:56:33): > @Matthew Oldach has joined the channel

2018-07-31

Aaron Lun (06:27:25): > I am going to split up the SCE source code to make it easier to navigate.

Kevin Rue-Albrecht (06:40:51): > while you look at it, it seems that the accessors to the internal slots are not exported in the NAMESPACE, despite the @export instruction in the roxygen blocks. That is speaking of the master branch.

Aaron Lun (06:53:58): > Huh? Well, they wouldn’t be.

Aaron Lun (06:54:03): > Because they’re internal.

Aaron Lun (06:54:17): > I’ve only recently been thinking of exporting them, and mentioning their proper use in the docs.

Kevin Rue-Albrecht (06:57:55): > Indeed, I wasn’t sure whether they were supposed to be exported or not. The slots are internal, but I suppose that developers of downstream packages should access them with the accessor function. The question is then whether that should be prefaced by SingleCellExperiment:: or not

Kevin Rue-Albrecht (07:00:10): > argh, hang on, I’m crazy. You’re right, they don’t have the @export instruction. Everything around them does, but not them

Aaron Lun (07:01:10): > Well, I’m going to export them so that developers are able to use them. But I’m going to put a lot of flags around the help page.

Kevin Rue-Albrecht (07:01:38): > Looking forward to it :slightly_smiling_face:

Aaron Lun (08:18:36): > @Davide Risso Why do we need SingleCellExperiment.Rproj?

Aaron Lun (08:31:03): > And why do we have package status in the vignette?

Davide Risso (09:15:20) (in thread): > It makes it easier to open a dedicated session in RStudio if you’re working in parallel on different projects

Davide Risso (09:15:55) (in thread): > That’s there for historical reasons, from when the readme was a link to the vignette

Davide Risso (09:16:11) (in thread): > You can get rid of it if you want

Aaron Lun (10:26:52): > @channel Those of you who are interested, have a look at the refactor branch, in particular vignettes/devel.Rmd.

Aaron Lun (10:26:57) (in thread): > Done.

Michael Steinbaugh (10:28:02): > Will do, thanks Aaron

2018-08-01

Aaron Lun (14:57:34): > If no one says anything, I’ll assume that silence means consent and will merge changes on Saturday.

2018-08-02

Kevin Rue-Albrecht (03:44:15): > @Aaron Lun I finally just had a look. Nice cleanup, and the devel.Rmd tells me all I want to know at the moment. Cheers!

Aaron Lun (08:05:19): > https://github.com/drisso/SingleCellExperiment/pull/24 - Attachment (GitHub): Large-scale refactoring of internals by LTLA · Pull Request #24 · drisso/SingleCellExperiment > Reorganized functions to separate source files for easier navigation. Added reducedDimNames<- method. Added withDimnames= option for reducedDim(s) methods. Exported internal getters and setters….

Davide Risso (08:24:22): > Thanks @Aaron Lun, really nice vignette!

Aedin Culhane (12:18:23): > SCAPhttp://advances.sciencemag.org/content/4/8/eaat8573 - Attachment (Science Advances): Accelerating a paradigm shift: The Common Fund Single Cell Analysis Program > It has become exceedingly important to understand the precise molecular profiles of the nearly 40 trillion cells in an adult human because of their role in determining health, disease, and therapeutic outcome. The National Institutes of Health (NIH) Common Fund–supported Single Cell Analysis Program (SCAP) was designed to address this challenge. In this review, we outline the original program goals and provide a perspective on the impact of the program as a catalyst for exploration of heterogeneity of human tissues at the cellular level. We believe that the technological advances in single-cell RNA sequencing and multiplexed imaging combined with computational methods made by this program will undoubtedly have an impact on broad and robust applications of single-cell analyses in both health and disease research.

Davide Risso (14:40:42): > Hey @Aaron Lun (or anyone else @here) random question: do you have any favorite method to detect cell doublets?

Aaron Lun (14:41:26): > scran has doubletCells and doubletCluster, depending on what you have available.

Aaron Lun (14:41:35): > You’ll need devel.

Davide Risso (14:41:53): > what do you mean by what you have available?

Aaron Lun (14:42:43): > if you have clusters, doubletCluster is often more interpretable, and provides more information. But it requires that your doublets cluster separately, which may not always be possible.

Aaron Lun (14:43:28): > If you just have cells, doubletCells does the usual “simulate doublets and match them up to real cells”. Which doesn’t need doublets, but just gives you scores. Higher = more likely to be a doublet, though this is more difficult to interpret. You usually have to find outliers to actually call the doublets.

Davide Risso (14:45:15): > thanks this is very useful!

Davide Risso (14:46:02): > would it scale to 1M cells?

Aaron Lun (14:46:07): > ¯\_(ツ)_/¯

Aaron Lun (14:46:40): > doubletCells does chunk-wise processing to simulate the doublets, so memory isn’t much of an issue.

Aaron Lun (14:47:00): > The real problem is the PCA at the start and the nearest neighbours detection at the end.

Aaron Lun (14:48:03): > Well, the nearest neighbours is parallelized, but the initial PCA is not.

Aaron Lun (14:49:58): > doubletCluster doesn’t care much either way, as you’ll have already done the hard work in clustering.

Aaron Lun (14:50:44): > Oh, doubletCells may only be in https://github.com/MarioniLab/scran. Can’t remember, need to push scater and SingleCellExperiment first. - Attachment (GitHub): MarioniLab/scran > Clone of the Bioconductor repository for the scran package, see https://bioconductor.org/packages/devel/bioc/html/scran.html for the official development version.

Aaron Lun (14:53:19): > I should mention that, even if you have 1 million cells, you wouldn’t have generated them in a single run. Doublets only form within runs, so you should be running the function for each run separately.

Davide Risso (14:56:36): > Good point, thanks!

2018-08-04

Aaron Lun (06:16:06): > Pushed.

2018-08-16

Marcus Kinsella (16:58:28): > @Marcus Kinsella has joined the channel

2018-09-13

Aaron Lun (17:24:52): > Also, if anyone using the latest version of SCE manages to generate an object where reducedDimNames(sce) returns NULL, please let me know how you did it. I’ve seen it once in someone else’s interactive session but neither he nor I was able to reproduce it. This is an error and should not occur.

Aaron Lun (17:26:08) (in thread): > @Davide Risso http://bioconductor.org/packages/devel/workflows/vignettes/simpleSingleCell/inst/doc/work-6-doublet.html

2018-09-14

Aaron Lun (12:09:20): > Does one of the core packages have a function that converts an arbitrary subsetting vector into integer indices? I’ve found myself rewriting this function across many different packages.

Kevin Rue-Albrecht (12:10:14): > I must misunderstand: how is this different from which?

Aaron Lun (12:11:16): > characters

Aaron Lun (12:11:31): > or non-integer numerics.

Aaron Lun (12:11:44): > So then:
>
> if (is.logical(rows)) { rows <- which(rows) }
> else if (is.character(rows)) { rows <- match(rows, rownames(x)) }
> else { rows <- as.integer(rows) }

Kevin Rue-Albrecht (12:12:00): > ow

Marcel Ramos Pérez (12:13:23): > what if it’s a factor? :scream_cat:

Aaron Lun (12:14:13): > Then madness happens, I suppose.

Aaron Lun (12:14:27): > Probably gets coerced to an integer. Which is how factors subset anyway.

Kevin Rue-Albrecht (12:15:12): > exactly.. but now think of all the users who read.table with the default stringsAsFactors = TRUE :wink:

Aaron Lun (12:15:15): > The full code looks something like this:
>
> .subset_to_index <- function(subset, x, byrow=TRUE)
> # Converts arbitrary subsetting vectors to an integer index vector.
> {
>     if (byrow) {
>         dummy <- seq_len(nrow(x))
>         names(dummy) <- rownames(x)
>     } else {
>         dummy <- seq_len(ncol(x))
>         names(dummy) <- colnames(x)
>     }
>
>     if (!is.null(subset)) {
>         dummy <- dummy[subset]
>     }
>     out <- unname(dummy)
>     if (any(is.na(out))) {
>         stop("'subset' indices out of range of 'x'")
>     }
>     return(out)
> }

Aaron Lun (12:16:01): > If you’re subsetting with factors and not expecting integer behaviour… then that’s too bad.
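
The factor caveat can be seen in a small base-R sketch. It borrows the named-index trick from `.subset_to_index` in a cut-down form; the helper name `to_index` and the toy matrix `x` are made up for illustration and are not part of any package.

```r
# Toy matrix with row names; `x` and `to_index` are hypothetical, for illustration only.
x <- matrix(1:6, nrow = 3, dimnames = list(c("a", "b", "c"), NULL))

# Subset a named integer vector and let base R's `[` resolve
# logical, character, negative or numeric subscripts to positions.
to_index <- function(subset, x) {
    dummy <- seq_len(nrow(x))
    names(dummy) <- rownames(x)
    out <- unname(dummy[subset])
    if (anyNA(out)) stop("'subset' indices out of range of 'x'")
    out
}

lgl <- to_index(c(TRUE, FALSE, TRUE), x)  # 1 3
chr <- to_index(c("c", "a"), x)           # 3 1
neg <- to_index(-2, x)                    # 1 3

# A factor is indexed by its integer codes, not by its labels:
f <- factor(c("c", "a"))                  # levels "a", "c"; codes 2, 1
fac <- to_index(f, x)                     # 2 1, i.e. rows "b" and "a"
fixed <- to_index(as.character(f), x)     # 3 1, i.e. rows "c" and "a"
```

Converting the factor with `as.character()` before subsetting is the usual way to get label-based behaviour.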

2018-09-19

Hervé Pagès (11:29:51): > normalizeSingleBracketSubscript() in S4Vectors does that. It handles many kinds of subsets e.g. factor-Rle, IRanges, etc… It’s used by the subsetting methods of many core objects e.g. GRanges, SummarizedExperiment, GAlignments, etc…

2018-09-20

JiefeiWang (12:49:07): > @JiefeiWang has joined the channel

2018-09-24

Kim-Anh Lê Cao (21:49:01): > @Kim-Anh Lê Cao has joined the channel

2018-10-03

Aaron Lun (10:28:33): > @Hervé Pagès normalizeSingleBracketSubscript is good for rows, but is there anything for columns?

Hervé Pagès (18:11:03): > You could use DelayedArray:::normalizeSingleBracketSubscript2() for this. It presents an interface that is more versatile than normalizeSingleBracketSubscript():
>
> > m <- matrix(1:15, ncol=5, dimnames=list(NULL, letters[1:5]))
> > DelayedArray:::normalizeSingleBracketSubscript2(-2, ncol(m), colnames(m))
> [1] 1 3 4 5
> > DelayedArray:::normalizeSingleBracketSubscript2(c("d", "a"), ncol(m), colnames(m))
> [1] 4 1
>
> DelayedArray:::normalizeSingleBracketSubscript2 is just a workaround for the limitations of normalizeSingleBracketSubscript’s interface but the right thing to do would be to fix normalizeSingleBracketSubscript. Has been on my list for a while but is not going to happen soon…

2018-10-24

Vince Carey (11:26:59): > @Vince Carey has joined the channel

Davide Risso (12:00:21): > Answering @Martin Morgan’s question here, I think that no, conversion is not the right word

Davide Risso (12:02:22): > I think we should implement a coerce method for Seurat objects in SCE

Davide Risso (12:02:57): > but there may be some loss of information

Martin Morgan (12:10:09): > I guess I was thinking that it would be more useful to write an interface to the format, and as one application of the interface be able to create a SCE.

Aaron Lun (12:18:46): > I’d rather not have the coerce method live in SCE.

Aaron Lun (12:19:20): > It would introduce an unnecessary dependency on Seurat (which itself depends on a whole lot of other things).

Kevin Rue-Albrecht (12:21:36): > I agree with Aaron, but just out of curiosity, would Enhances cause a dependency?

Aaron Lun (12:36:17): > They don’t enhance us. We enhance them.

Kevin Rue-Albrecht (12:36:48): > oh then i just misunderstood Enhances :grimacing:

Vince Carey (12:37:19): > Maybe the best thing to do is to submit pull requests to seurat that implement things that we think are useful. X[G,S] could be interpreted for X an instance of seurat. reducedDims() could easily be implemented. and so on.

Hervé Pagès (12:39:06): > Wouldn’t Suggests be fine? Would avoid the debate of who enhances who…

Vince Carey (12:39:08): > My “conversion” request is probably unsound – given the sparse Matrix or hdf5 representations that can be used in seurat.

Aaron Lun (12:40:49): > I don’t want to have to be responsible for keeping a SeuratObject conversion method operational.

Kevin Rue-Albrecht (12:41:24) (in thread): > I think packages listed in Suggests are considered dependencies by devtools, which goes back to Aaron’s point that it would be unnecessarily painful to pull in all of Seurat’s dependencies

Martin Morgan (12:42:00): > But I think (very vague recollection, ‘from the hip’) that their hdf5 is actually in the tenx format; I think for ‘us’ the thing to do would be to maintain a SeuratExperiment package or similar, but I’d be highly nervous that Seurat will just change content of their files without (needing to) consider the consequences…

Martin Morgan (12:42:46): > I think the endeavor is bigger than a coerce method, and somewhat orthogonal to the analytic purposes of SingleCellExperiment, and so deserving of a separate package…

Kevin Rue-Albrecht (12:46:34): > I vaguely started a mini-package (https://github.com/Bioconductor/Contributions/issues/826) that was solely dedicated to interfacing Seurat to Bioconductor. I didn’t dig through the entire Seurat object, which as @Davide Risso wrote, leads to loss of information. - Attachment (GitHub): (inactive) BioSeurat · Issue #826 · Bioconductor/Contributions > Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor Repository: https://github.com/kevinrue/BioSeurat Confirm the following by editing each …

Aaron Lun (12:47:16): > scran has a convertTo method as well. It was - and continues to be - a real pain to maintain.

Aaron Lun (12:47:36): > I would be happy for someone to take that off my hands.

Kevin Rue-Albrecht (12:49:04): > Which makes the idea of stashing this pain into a separate ‘ugly duck’ package fairly sensible

Kevin Rue-Albrecht (12:50:37): > In my case, I merely focused on the information that was of interest to me, which I suppose each of us starts with. Perhaps a community effort could progressively fill gaps? I don’t mind which function we start from. It could start with scran’s version, and progress from there

Aaron Lun (12:51:19): > I just want this to not be my problem.

Kevin Rue-Albrecht (12:51:42): > I know. Me neither. But maybe we can divide the pain. At least we can all cry together when Seurat updates their data structure.

Aaron Lun (12:56:38): > well, the convertTo code is all yours if you want to deal with it.

Vince Carey (13:01:39): > Changes to the data structure should not be important if the methods that are exposed behave consistently. What methods do we need for, say, iSEE, to succeed with an instance of seurat as input?

Vince Carey (13:04:00): > It can’t be that much – the Convert method in Seurat generates a compliant SCE in a little more than a screenful of code.

Vince Carey (13:05:00): > So I don’t think we should take all the interoperability obligations on our shoulders, particularly if the benefits of consistent method behavior are made clear, and the valuable methods are properly supported.

Aaron Lun (13:07:34): > iSEE is extremely tied into the SE framework. You’d have to have all the SE methods supported, e.g., assay, colData, rowData, to name a few.

Aaron Lun (13:09:12): > And anyway - if their Convert is already doing the job, isn’t this problem solved?

Vince Carey (13:14:41): > 1) most of those methods are one-liners given what already exists in seurat class, 2) the problem seems mostly solved for a given instance – but more complex data stores may lead to challenges, 3) assuming the external developer has an interest in conversion, to ensure consistent interoperability, tests should be on hand for all the methods used to interoperate. if the methods and tests don’t exist, tested pull requests should be welcomed by the external developer.

Vince Carey (13:15:32): > i am saying that our project (maybe me) should write the converting methods and associated tests on objects of the different classes, and submit them as pull requests to Seurat.

Vince Carey (13:16:06): > i don’t want to make more work for you. i did a fair amount of interoperability work in MLInterfaces … it seems to have held up ok because most of the remote methods are pretty stable

Aaron Lun (13:16:45): > Well, if it’s seurat-side, then that’s fine from my end.

Kasper D. Hansen (14:00:55): > It could be a suggests

2018-10-26

Aaron Lun (11:31:56): > @Vince Carey The problem should be fixed in the GH version.

Aaron Lun (11:32:07): > Well, once I push it, give me a second to check.

Aaron Lun (11:39:48): > Now, the real question is why, in the class hierarchy of SE -> RSE -> SCE, if I define the setAs for RSE -> SCE, the as(se, "SingleCellExperiment") doesn’t use it. I would have thought that it would progressively call coercion methods, i.e., SE -> RSE (from SummarizedExperiment) and then my defined RSE -> SCE.

Kasper D. Hansen (11:47:43): > Well, the RSE has more information than the SE. So when you go from RSE to SCE you may use this additional information, which is not necessarily available for SE objects

Kasper D. Hansen (11:48:11): > What if your coercion method used a RSE-specific slot?

Aaron Lun (11:48:57): > My point is that there already exists an SE->RSE conversion. Now I’ve defined a RSE to SCE conversion. So why does the S4 system not automatically chain them together when I ask for a SE->SCE conversion?

Aaron Lun (11:49:17): > Certainly these implicit type promotions happen in other languages, e.g., C++.

Kasper D. Hansen (11:51:33): > Ah, I didn’t understand that

Kasper D. Hansen (11:51:47): > No good answer here, someone who is more deep in the internals might know

Aaron Lun (11:59:25): > Well, I guess I was wrong on the C++ - the compiler is only allowed to make one implicit conversion, so it wouldn’t be allowed to chain things together.

Hervé Pagès (12:28:48): > Everything would behave like we all expect in the following case:
>
> setClass("A", slots=c(stuff="ANY"))
> setClass("B", contains="A")
> setClass("C0", slots=c(some_other_stuff="ANY"))
>
> (I purposely didn’t make C0 a subclass of B)
>
> setAs("A", "C0", function(from) cat("Hi there, I'm the coercion method from A to C0\n"))
> selectMethod("coerce", c("B", "C0"))
> # Method Definition:
> #
> # function (from, to = "C0", strict = TRUE)
> # cat("Hi there, I'm the coercion method from A to C0\n")
> #
> # Signatures:
> #         from to
> # target  "B"  "C0"
> # defined "A"  "C0"
>
> as(new("B"), "C0")
> # Hi there, I'm the coercion method from A to C0
>
> But if C extends B, bad things happen:
>
> setClass("C", contains="B")
>
> setAs("A", "C", function(from) cat("Hi there, I'm the coercion method from A to C\n"))
>
> selectMethod("coerce", c("B", "C"))
> # Method Definition:
> #
> # function (from, to = "C", strict = TRUE)
> # cat("Hi there, I'm the coercion method from A to C\n")
> #
> # Signatures:
> #         from to
> # target  "B"  "C"
> # defined "A"  "C"
>
> as(new("B"), "C")
> # An object of class "C"
> # Slot "stuff":
> # NULL
>
> selectMethod("coerce", c("B", "C"))
> # Method Definition:
> #
> # function (from, to = "C", strict = TRUE)
> # {
> #     obj <- new("C")
> #     as(obj, "B") <- from
> #     obj
> # }
> #
> # Signatures:
> #         from to
> # target  "B"  "C"
> # defined "B"  "C"
>
> Bingo! An automatic B -> C coercion method was created that overrides my A -> C coercion method! And these automatic coercion methods almost always return broken objects when they coerce from a class to a subclass (like in the case of the automatic RSE -> SCE coercion). What makes things even worse is that the method is created only the 1st time you try to coerce from B to C, so selectMethod("coerce", c("B", "C")) was “lying” earlier when I called it before trying to do as(new("B"), "C").
>
> The coercion framework in S4 is conceptually very powerful but it has some scary dark corners. Nothing that couldn’t be fixed/revisited IMO but I had no luck so far convincing the R folks to improve these things: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16423, https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16194, https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16421, https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16422

Aaron Lun (12:36:58): > So, the moral of the story here is that we need to explicitly define both B->C and A->C?

Aaron Lun (12:37:17): > In the case where the class hierarchy looks like A => B => C.
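
A minimal sketch of that workaround, using only the `methods` package. The classes mirror the A/B/C hierarchy from the example above; the extra slot `via` is a made-up marker added here purely to show which coercion method ran, and nothing in this sketch is taken from the SingleCellExperiment source.

```r
library(methods)

# Toy hierarchy A => B => C. The `via` slot is hypothetical and only
# records which coercion method produced the object.
setClass("A", slots = c(stuff = "ANY"))
setClass("B", contains = "A")
setClass("C", contains = "B", slots = c(via = "character"))

# Define BOTH coercions explicitly, before as() is ever called, so that
# no automatically generated B -> C method gets a chance to take over.
setAs("A", "C", function(from) new("C", stuff = from@stuff, via = "A -> C"))
setAs("B", "C", function(from) new("C", stuff = from@stuff, via = "B -> C"))

from_b <- as(new("B", stuff = 1), "C")
from_a <- as(new("A", stuff = 2), "C")

from_b@via  # "B -> C"
from_a@via  # "A -> C"
```

Because each source class has its own directly defined method, `as()` finds an exact match for both calls and the contents of `stuff` are carried over rather than reset.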

Aaron Lun (12:38:15): > In any case, @Vince Carey, this is fixed and pushed.

Kevin Rue-Albrecht (12:41:02): > @Aaron Lun Just thinking.. in a world where “chained conversion” could be identified automatically (irrespective of language), what would you have happen if all of the conversion methods below simultaneously existed in your environment?
> - A -> B
> - B -> C
> - A -> C
>
> How would you have the program “choose” whether to use sequentially A -> B -> C or the explicit A -> C ?

Aaron Lun (12:41:14): > Shortest path.

Aaron Lun (12:41:26): > Same way that S4 chooses method dispatch.

Aaron Lun (12:41:50): > Of course, this results in problems if two methods are tied for shortest path, which triggers ambiguity warnings.

Hervé Pagès (12:42:53): > Now, just to be clear, my above rant was about the silly automatic coercion getting in the way. I’m not saying S4 should be smart enough to do the chaining automatically but at least my original A -> C coercion should have been called when I tried to coerce B to C.

Aaron Lun (12:52:54): > Well, in any case, I’ve just added both SE->SCE and RSE->SCE explicitly.

Vince Carey (15:05:46): > Thanks. Now here’s a question – do we really need RSE? Isn’t it just an SE with a more elaborate rowData than usual? Wrong forum I guess but perhaps someone can clarify. Years ago I may have been enamored of RSE and pushed in the other direction.

Aaron Lun (16:05:14): > I think this initially came from @Davide Risso, maybe @Peter Hickey as well for his methylation data. I guess we were of the opinion that it didn’t hurt.

Aaron Lun (16:05:39): > I mean, if there are no coordinates to store, then there’s not much overhead anyway, so no harm done.

2018-10-29

Hervé Pagès (19:36:19): > Related to the SE -> RSE -> SCE coercions: just noticed that the Extensions vignette in the SummarizedExperiment package maybe could say a few words about how to implement those coercions and mention the pitfalls discussed earlier.

Aaron Lun (23:14:05): > Done, as requested.

2018-10-31

Raphael Gottardo (16:51:58): > @Raphael Gottardo has joined the channel

Raphael Gottardo (16:53:48): > Team, has there been any work on supporting multiple omics assays at the single-cell level? So sort of like an MAE structure for single-cell data? A SummarizedExperiment x MAE. Perhaps that’s already supported? I am thinking of CITE-Seq or ATAC-seq and RNA-seq on the same cells. Or VDJ scRNA-seq.

Raphael Gottardo (16:54:06): > @Martin Morgan @Aaron Lun @Rob Amezquita

Rob Amezquita (16:54:09): > @Rob Amezquita has joined the channel

Jayaram Kancherla (16:57:06): > @Jayaram Kancherla has joined the channel

Martin Morgan (17:09:01): > if each of the data types can be represented in a SummarizedExperiment (or RaggedExperiment or any other matrix-like object) then they can be combined into a MultiAssayExperiment. SummarizedExperiment can store DelayedArray, so very large data. @Marcel Ramos Pérez might have more to say

Aaron Lun (17:19:54): > That seems like a fairly straightforward application of the MAE, provided each data set can be stored as a SCE.

Raphael Gottardo (17:22:10): > Great, thanks @Martin Morgan. Do you see any additional development that needs to be done in that space?

Aaron Lun (17:23:39): > I will tag in @Ricard Argelaguet, who does exactly this stuff with multi-omics.

Ricard Argelaguet (17:27:15): > I did actually start a package to do integration of single cell omics, but never finished it. If you are interested @Raphael Gottardo we could try to push this

Ricard Argelaguet (17:31:24): > i.e. MultiAssayExperiment object with added functionalities. The important thing is also to get good classes for some single cell modalities. For example, Bisulfite sequencing data is currently not very well supported and there are very few tools available in Bioconductor. I was mainly focused on this

Aaron Lun (18:01:49): > Well, bsseq comes to mind.

Kasper D. Hansen (22:59:26): > bsseq should work for single cell bisulfite sequencing. In practice, for some current datasets, they don’t really have single CpG resolution which means smaller data

2018-11-01

Ricard Argelaguet (06:53:18): > ah cool, i just saw that it now handles hdf5 and delayed array backends. Running out of memory was my main limitation when i originally tried bsseq (a while ago) with a few thousand cells

Ricard Argelaguet (06:54:30): > i will give it a shot again, thanks!

Davide Risso (08:34:21): > @Levi Waldron and I have written a proposal for some extensions to MAE and SCE to accommodate multi-patient multi-modal data

Davide Risso (08:34:49): > Happy to join efforts, although we didn’t really do anything yet

Davide Risso (08:35:09): > Except brainstorming some ideas

Stephanie Hicks (08:38:09): > yeah, this is something I think is worth pushing forward considering all the multiomics data from the same cells coming. But then again, I now also hear a lot about multiomics from not the same cells, but from the same population of cells.

Stephanie Hicks (08:38:20): > happy to help brainstorm ideas if useful

Raphael Gottardo (10:02:46): > Yes I think we should include something on that. This would nicely complement what we propose.

2018-11-04

Tim Triche (17:17:18): > Interested in doing this if it helps move MOFA forward

Tim Triche (17:18:01): > Talking with Vince tomorrow about plugging a lot of multi omic datasets (some with matched single cell data) into restfulSE

2018-11-20

Vince Carey (12:15:25): > is there a “cell type signature” resource we can use in Bioconductor? for example, i have a cluster of cells and i obtain a collection of DE genes … i can manually submit the list of mouse genes to immgen.org and it will give some indication of cell types with similar signatures. I do not see an indication of an API for that system.

Kevin Rue-Albrecht (12:33:02) (in thread): > We would love that in our group. There are several bioRxiv papers that seem to work on identifying and using signatures to classify new experiments. I’ve given some thought to the problem myself over the summer (PCA, NNMF, …) but haven’t yet come up with any satisfying solution to define, organise, and predict cell types/state/phenotype

Kevin Rue-Albrecht (12:40:21) (in thread): > trying to illustrate with a few resources, none of them “Bioc-compliant” yet, as far as I could see
> - https://www.biorxiv.org/content/early/2018/08/20/395004.1
> - https://github.com/dviraran/SingleR
> - https://www.biorxiv.org/content/early/2018/07/15/369538
>
> - Attachment (bioRxiv): Decomposing cell identity for transfer learning across cellular measurements, platforms, tissues, and species. > New approaches are urgently needed to glean biological insights from the vast amounts of single cell RNA sequencing (scRNA-Seq) data now being generated. To this end, we propose that cell identity should map to a reduced set of factors which will describe both exclusive and shared biology of individual cells, and that the dimensions which contain these factors reflect biologically meaningful relationships across different platforms, tissues and species. To find a robust set of dependent factors in large-scale scRNA-Seq data, we developed a Bayesian non-negative matrix factorization (NMF) algorithm, scCoGAPS. Application of scCoGAPS to scRNA-Seq data obtained over the course of mouse retinal development identified gene expression signatures for factors associated with specific cell types and continuous biological processes. To test whether these signatures are shared across diverse cellular contexts, we developed projectR to map biologically disparate datasets into the factors learned by scCoGAPS. Because projecting these dimensions preserve relative distances between samples, biologically meaningful relationships/factors will stratify new data consistent with their underlying processes, allowing labels or information from one dataset to be used for annotation of the other — a machine learning concept called transfer learning. Using projectR, data from multiple datasets was used to annotate latent spaces and reveal novel parallels between developmental programs in other tissues, species and cellular assays. 
Using this approach we are able to transfer cell type and state designations across datasets to rapidly annotate cellular features in a new dataset without a priori knowledge of their type, identify a species-specific signature of microglial cells, and identify a previously undescribed subpopulation of neurosecretory cells within the lung. Together, these algorithms define biologically meaningful dimensions of cellular identity, state, and trajectories that persist across technologies, molecular features, and species. - Attachment (GitHub): dviraran/SingleR > SingleR: Single-cell RNA-seq cell types Recognition - dviraran/SingleR - Attachment (bioRxiv): scPred: Single cell prediction using singular value decomposition and machine learning classification > Single-cell RNA sequencing has enabled the characterization of highly specific cell types in many human tissues, as well as both primary and stem cell-derived cell lines. An important facet of these studies is the ability to identify the transcriptional signatures that define a cell type or state. In theory, this information can be used to classify an unknown cell based on its transcriptional profile; and clearly, the ability to accurately predict a cell type and any pathologic-related state will play a critical role in the early diagnosis of disease and decisions around the personalized treatment for patients. Here we present a new generalizable method (scPred) for prediction of cell type(s), using a combination of unbiased feature selection from a reduced-dimension space, and machine-learning classification. scPred solves several problems associated with the identification of individual gene feature selection and is able to capture subtle effects of many genes, increasing the overall variance explained by the model, and correspondingly improving the prediction accuracy. 
We apply scPred to scRNA-seq data from pancreatic tissue, colorectal tumor biopsies, and circulating dendritic cells, and show that scPred is able to classify cell subtypes with an accuracy of 96.1-99.2%. Furthermore, we demonstrate that the feature selection step of scPred is able to discriminate from both transcriptional variation between single-cell RNA-sequencing protocols, and between laboratory batch effects, and still predict cell subtype with an accuracy greater the 96%. Collectively, our results demonstrate the utility of scPred as a single cell prediction method that can be used for a wide variety of applications. The generalized method is implemented in software available here: https://github.com/IMB-Computational-Genomics-Lab/scPred/

Vince Carey (12:45:23): > Thanks Kevin! I’ll take a look.

Raphael Gottardo (12:57:26): > @Vince Carey There is flowCL but that’s not great. We should do more work in this space, and try to leverage the cell ontology.

Rob Amezquita (13:13:29) (in thread): > +100 to this, @Valentin Voillet and I use a private repo of genes we’ve curated but nothing that’s been properly evaluated/created in a sensible manner

Peter Hickey (14:39:39): > +1 for doing more on this. > my first effort has been to start packaging ImmGen data into an ExperimentHub resource to simplify creating DE lists to compare from it to a single cell DE list > then, to make ‘cell type signature’ lists available (also from ExperimentHub?)

Aaron Lun (14:40:08): > Suggest shifting this to a different channel.

Peter Hickey (14:40:13): > also looking to do the same for data on https://www.haemosphere.org/ - Attachment (haemosphere.org): Haemosphere > Gene Expression Analysis Tool

Aaron Lun (14:40:28): > #singlecellexperiment was intended for S4 data containers and stuff.

Aaron Lun (14:40:38): > Unless I’m misunderstanding what you guys are talking about.

Kevin Rue-Albrecht (14:42:50): > I’m personally thinking more about a container “orthogonal” to SCE. Meaning that SCE instances contain experiments, while here the discussion is about signatures to extract/apply from/to SCE instances

Aaron Lun (14:43:25): > Hmph.

Kevin Rue-Albrecht (14:43:59): > from a ML perspective, I’d be keen to see a predict(x, sce) function

Valentin Voillet (14:45:20): > @Valentin Voillet has joined the channel

Kevin Rue-Albrecht (14:45:24): > (“views are my own”)

Aaron Lun (14:45:26): > That’s pretty hardcore.

Aaron Lun (14:45:48): > w.r.t. the underlying statistical machinery that needs to be sorted out, not just the object containers.

Aaron Lun (14:46:50): > Well, IMO, the cell ontology stuff seems complicated enough to deserve its own channel, but hey, whatever.

Kevin Rue-Albrecht (14:46:58): > let’s say i’m looking way down the line

Kevin Rue-Albrecht (14:47:27): > kind of how batchelor aims to wrap around various batch correction methods

Aaron Lun (14:47:44): > yeah, and that’s why we have #sc-batch-correction.

Kevin Rue-Albrecht (14:48:27): > perhaps it’s time for a #sc-cell-signature ?

Aaron Lun (14:50:21): > Yes.

2018-11-23

Aaron Lun (08:48:43): > @Martin Morgan @Michael Love I noticed SummarizedExperiment has a readKallisto function. should that be made obsolete by tximeta?

Michael Love (08:55:40): > @Michael Love has joined the channel

Aaron Lun (08:57:15): > Bit of an odd place to put this function.

Michael Love (08:57:37): > Redundancy doesn’t hurt but in theory they provide same functionality

Michael Love (08:59:16): > tximeta will also optionally reduce the inferential replicates by sample variance

Aaron Lun (09:00:25): > I’d say there should be Only One Way To Do It, at least in this specialised case. I don’t see the benefit in maintaining two versions of the same thing. I don’t have any stake in this (not being a maintainer of either relevant package), but that’s my 2c as a user.

Michael Love (09:02:32): > readKallisto was developed around the same time as tximport and I remember discussing with MM that my plan was to have a simple pkg with no S4 and no dependencies (tximport) and then another one with more dependencies that outputs SE (tximeta)

Michael Love (09:03:36): > tximeta then didn’t come around for another 2 years bc I was preoccupied :-)

Aaron Lun (09:05:04): > Fair enough. But from what you’re saying, it sounds like we don’t need SE::readKallisto anymore.

Michael Love (09:07:39): > Potentially yes. We will continue to develop tximport and tximeta and provide SE output for many different quant methods

Michael Love (09:08:47): > (It’s holiday in US so may not get replies from others until Mon. I’m avoiding family by going on Slack :wink:)

Aaron Lun (09:09:25): > ah right. That’s why it’s so quiet around here.

Kasper D. Hansen (20:33:19): > I agree with @Aaron Lun

2018-11-24

Martin Morgan (12:29:40): > Me too, I’ll try to remember to deprecate readKallisto; it was a fast intermediate step…

2018-11-26

Aaron Lun (05:10:57): > I would like to get rid of SingleCellExperiment::mutate and the other verbs. Is there a Bioc-tidy package I can pass them off to? Biobroom seems closest.

2018-11-29

Aaron Lun (07:42:42): > @Martin Morgan Does TENxMatrix() assume 0- or 1-based indices in indices? The 10X documentation is not clear on this at all, but given it’s designed to be read by Python, one would assume that it’s 0-based.

Aaron Lun (07:49:01): > Looking at the code suggests it’s 0-indexed.

Aaron Lun (07:53:14): > But the current TENxMatrix code seems to assume that it’s 1-indexed, at least when I as.matrix it.

Aaron Lun (08:04:45): > Oh, no, it’s fine. My mistake.

2018-12-04

Avi Srivastava (08:43:36): > @Avi Srivastava has joined the channel

Avi Srivastava (08:51:52): > Hi guys, we (PI: Rob Patro) recently preprinted a tool, Alevin, which performs quantification of droplet-based single cell protocols. Along with improved quantification estimates, Alevin also has the capability to perform bootstrapping of the gene counts matrix per cell. I am aware that the format of the estimates of single cell experiments is itself an active research area, let alone multiple bootstrap matrices per experiment, but just wanted to explore the idea of what should be an ideal format to dump, say, an |i.j.k| size matrix where i is the number of cells, j is the number of genes and k is the number of bootstraps. > We have a couple of ideas for the same, one could be (g is gene, c is cell, b is bootstrap id): > > g1,g2,g3 > c1b1,-,-,- > c1b2,-,-,- > c1b3,-,-,- > c2b1,-,-,- > c2b2,-,-,- > c2b3,-,-,- > > Let me know if you guys have comments/suggestions on this.

Aaron Lun (08:52:44): > I think @Michael Love has something for this in tximeta.

Aaron Lun (08:53:17): > Old version of scater just chucked bootstrap estimates in as separate assays of the SCE.

Aaron Lun (08:53:42): > I recall some bits of code floating around in the package to do this, in fact. Haven’t purged all of it yet.

Davide Risso (09:25:54): > one assay per bootstrap sample, i.e., k [i,j] matrices would fit with the current SCE class
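
A minimal base-R sketch of the "one assay per bootstrap" idea, assuming the flat layout proposed above (rows named c1b1, c1b2, ..., columns are genes). The sizes, the `flat` matrix, and the `boot1`, `boot2`, ... names are made up for illustration; nothing here is Alevin or tximeta output.

```r
# Toy sizes (hypothetical): 2 cells, 3 genes, 3 bootstraps.
n_cells <- 2
n_genes <- 3
n_boot <- 3

# Flat layout from the proposal: one row per (cell, bootstrap), one column per gene.
row_ids <- paste0("c", rep(seq_len(n_cells), each = n_boot),
                  "b", rep(seq_len(n_boot), times = n_cells))
flat <- matrix(seq_len(n_cells * n_boot * n_genes),
               nrow = n_cells * n_boot, byrow = TRUE,
               dimnames = list(row_ids, paste0("g", seq_len(n_genes))))

# Bootstrap id of every row, in the same order as row_ids.
boot_id <- rep(seq_len(n_boot), times = n_cells)

# One genes-x-cells matrix per bootstrap, i.e. k matrices that could each be an assay.
boot_assays <- lapply(split(seq_len(nrow(flat)), boot_id), function(idx) {
    mat <- t(flat[idx, , drop = FALSE])
    colnames(mat) <- paste0("c", seq_len(n_cells))
    mat
})
names(boot_assays) <- paste0("boot", seq_len(n_boot))
```

Each element of `boot_assays` has the same dimensions and dimnames, which is what storing them as parallel assays of one object would require.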

Kevin Rue-Albrecht (09:31:52): > Perhaps goes without saying, but I assume that each of those assays would point to an HDF5-backed matrix, right?

Michael Love (09:32:50): > we’re thinking of what alevin should put on disk before tximeta reads it in to a SE / SCE shape

Michael Love (09:33:11): > one option is to record {i,j,k,x} quadruples

Michael Love (09:33:50): > currently tximeta puts in inferential replicates as extra assays infRep1, infRep2 etc.

Kevin Rue-Albrecht (09:34:03): > We just had a presentation from Páll Melsted yesterday in Oxford, who presented the BUS format that Kallisto now outputs for single-cell data. Any opinions?

Michael Love (09:34:11): > (also these can be summarized with varReduce=TRUE which will make a variance matrix)

Michael Love (09:34:46): > one difference between BUS and what Avi and Rob are doing is that alevin is outputting the inferred gene counts

Kevin Rue-Albrecht (09:35:06): > I’ve only run the kallisto bus pipelines this morning, and I don’t immediately have time to play with the BUS files. So my feedback is somewhat limited at the moment

Michael Love (09:36:10): > i guess the question we have now is that alevin infers gene counts for each cell, and has inferential replicates of this (for now bootstrap but possibly also Gibbs in the future), and we want to store this on disk, and then pull them into R as an SE / SCE

Aaron Lun (09:37:19): > It’s worth breaking down the concerns here. The first is how we represent the bootstrap replicates, and I think storing them as separate assays would be fairly sensible. The second is how to store this on disk, which is a bit trickier if you want to be able to use out-of-memory representations for each matrix in R.

Aaron Lun (09:37:49): > One example would be to store each replicate as a HDF5DataSet, which is directly compatible with HDF5Arrays.

Aaron Lun (09:38:16): > Sparser representations would be tricky but still accommodated with minor tweaks (e.g., to TENxMatrix) if you don’t hold them as quadruples.

Michael Love (09:38:25): > yes, i’m inclined to use HDF5 except that the data is so sparse

Michael Love (09:38:51): > so we should read up on TENxMatrix

Michael Love (09:39:10): > where is this function?

Aaron Lun (09:39:31): > In HDF5Array. Note that it won’t work directly for you right now, but you can take the same ideas and generalize it pretty easily.

Michael Love (09:40:04): > ok, we’ll look for inspiration

Rob Patro (09:40:46): > @Rob Patro has joined the channel

Michael Love (09:40:58): > worst case we could use HDF5, rather than storing as some custom type (really don’t want to do this)

Michael Love (09:41:14): > oops, have to run, i’ll check in later

Avi Srivastava (09:42:45): > On the discussion of HDF5, https://github.com/CINPLA/exdir is worth considering. - Attachment (GitHub): CINPLA/exdir > Directory structure standard for experimental pipelines. - CINPLA/exdir

2018-12-06

Vladimir Kiselev (16:48:39): > Related to the discussion above - is there a good tutorial on using hdf5/singlecellexperiment/loom and corresponding export/import/conversion? We are trying to generalize the output of our scRNA-seq pipeline so that it can then be read into R/python packages with as few conversions as possible. Has anyone already achieved that?

Martin Morgan (16:51:58): > Maybe the LoomExperiment package and vignette is a starting place? https://bioconductor.org/packages/LoomExperiment - Attachment (Bioconductor): LoomExperiment > The LoomExperiment class provide a means to easily convert Bioconductor’s

2018-12-08

Charlotte Soneson (04:43:53): > @Charlotte Soneson has joined the channel

2018-12-10

Mark Robinson (15:15:31): > @Mark Robinson has joined the channel

2018-12-12

Fabiola Curion (16:36:30): > @Fabiola Curion has joined the channel

2018-12-14

Rena Yang (12:45:53): > @Rena Yang has joined the channel

2018-12-22

Aaron Lun (22:47:14): > @Davide Risso SCE travis is broken.

2018-12-23

Kevin Rue-Albrecht (04:13:54): > Single cell, single cell, single in their well > Oh what fun it is to fix the code on Christmas day - hey!

2018-12-28

Aaron Lun (01:25:36): > @Davide Risso Travis is still broken.

2019-01-03

Aedin Culhane (13:10:40) (in thread): > Also have a look at some of the work from @Benjamin Haibe-Kains group. They were trying to standardize signatures

Benjamin Haibe-Kains (13:10:46): > @Benjamin Haibe-Kains has joined the channel

2019-01-07

Davide Risso (10:11:19) (in thread): > fixed!

Aaron Lun (10:12:09) (in thread): > :+1:

2019-01-08

Laurent Gatto (02:03:48): > @Laurent Gatto has joined the channel

Sean Davis (11:24:07): > Small speaker announcement on #random that might be of interest to folks here.

2019-01-10

Levi Waldron (11:49:41): > Not strictly single-cell related, but does anyone know of a DelayedMatrix implementation of base::rowsum(), or a workaround for accomplishing something similar without resorting to as.matrix?

Levi Waldron (11:57:43): > And is there another more appropriate channel for HDF5-related discussion?

Sean Davis (11:58:29): > #bigdata-rep, @Levi Waldron.

Kasper D. Hansen (12:31:54): > This is supposed to be the realm of DelayedMatrixStats

Tim Triche (13:36:43): > rowsum not rowSums

Martin Morgan (14:26:44): > The high-level impl is to split the row index key by group grp then iterate and stitch together, which is actually sapply(split(key, grp), function(idx, m) colSums(m[idx,]), m)
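
The split-iterate-stitch idea can be tried on an ordinary in-memory matrix. This is only a base-R sketch on toy data, not the DelayedMatrix method; the two tweaks to the one-liner (`drop = FALSE`, so that single-row groups stay matrices, and a final `t()`, because `sapply()` yields one column per group) are additions made here.

```r
# Toy data (illustrative only): 6 rows falling into 3 groups.
m <- matrix(as.numeric(1:12), nrow = 6,
            dimnames = list(paste0("r", 1:6), c("a", "b")))
grp <- c("g1", "g2", "g1", "g3", "g2", "g1")

# Split the row indices by group, sum each block of rows, then stitch together.
# drop = FALSE keeps a one-row block as a matrix so colSums() still works.
key <- seq_len(nrow(m))
by_group <- sapply(split(key, grp),
                   function(idx, m) colSums(m[idx, , drop = FALSE]), m)

# sapply() returns columns-by-groups here (m has two columns), so transpose
# to get the groups-by-columns layout that rowsum() uses.
manual <- t(by_group)
reference <- rowsum(m, grp)
```

On this toy input `manual` and `reference` agree, which is a quick way to sanity-check a block-wise implementation against `base::rowsum()`.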

Tim Triche (14:47:32): > R> methods("rowsum") > [1] rowsum,ANY-method rowsum,DelayedMatrix-method > [3] rowsum,HDF5Matrix-method rowsum,matrix-method > [5] rowsum.data.frame rowsum.default > see '?methods' for accessing help and source code >

Tim Triche (14:47:37): > welp, I’m dum

Tim Triche (14:47:52): > (how do you blockquote code in Slack?)

Peter Hickey (14:48:42): > I think I also added an analogous colsum()

Tim Triche (14:49:17): > rowsum.default, after error checking, ends up being > > .Internal(rowsum_matrix(x, group, ugroup, na.rm, as.character(ugroup))) >

Peter Hickey (14:49:36) (in thread): > Three tick fencing for code chunks

Tim Triche (14:50:19) (in thread): > muchas gracias :thumbsup:

Tim Triche (14:51:35): > @Peter Hickey yep > > standardGeneric for "colsum" defined from package "DelayedMatrixStats" >

Tim Triche (14:52:15): > moving back to #bigdata-rep since the interaction between this and remote HDF5 (via HSDS) is of interest

2019-01-11

Rob Amezquita (15:27:54): > Who made this figure originally? Could I get the original figure from PPT/AI? (@Davide Risso @Aaron Lun) - File (PNG): Pasted image at 2019-01-11, 12:27 PM

Aaron Lun (19:32:00): > Looks like the old SE figure from one of BioC core team, modified by Davide for the extra slots.

Aaron Lun (19:32:17): > You can see how the line widths for the reduced dim is slightly different from that of the other boxes.

2019-01-17

Vince Carey (15:10:11) (in thread): > Hi @Peter Hickey I noticed some comments in the singlecell expt paper concerning desires for ImmGen annotation. I don’t see anything on ExperimentHub via query(), is this still in the offing? Thanks

Peter Hickey (17:57:58) (in thread): > I still want to do it. time, as ever, is the issue. > I’ll set aside a day next week and update you

2019-01-24

Steve Lianoglou (13:56:31): > @Steve Lianoglou has joined the channel

2019-01-25

Stephanie Hicks (23:08:56): > by chance does anyone have just the count table of the Usoskin 2015 data? (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE59739) - Attachment (ncbi.nlm.nih.gov): GEO Accession viewer > NCBI’s Gene Expression Omnibus (GEO) is a public archive and resource for gene expression data.

2019-01-26

Charlotte Soneson (01:22:17): > @Stephanie Hicks We made one version for https://www.nature.com/articles/nmeth.4612 (based on data from http://linnarssonlab.org/drg/). Script here: https://github.com/csoneson/conquer_comparison/blob/master/scripts/generate_Usoskin_mae.R, count table available in the dataset bundle that can be downloaded from http://imlspenticton.uzh.ch/robinson_lab/conquer_de_comparison/ (I can also send you only this one if you want).

Charlotte Soneson (01:22:54): > Also, @Koen Van den Berge processed it for https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1406-4 (script: https://github.com/statOmics/zinbwaveZinger/tree/master/realdata/usoskin, data: https://github.com/statOmics/zinbwaveZinger/tree/master/datasets)

Stephanie Hicks (12:22:53): > Thanks @Charlotte Soneson! I remembered someone already having done this, but couldn’t quite put my finger on it

2019-01-28

Davide Risso (09:43:26): > Hi @Aaron Lun, I’m not sure if this is the right channel, but what happens in scater currently if I have a SCE with a DelayedMatrix and I run normalize()?

Aaron Lun (09:43:35): > Some magic.

Aaron Lun (09:43:53): > See ?normalizeSCE, assuming you’re on devel.

Davide Risso (09:44:53): > oh ok so it returns a DelayedMatrix, right?

Aaron Lun (09:45:08): > Under certain conditions.

Davide Risso (09:45:37): > yes, I’m reading… I was not looking at the devel version before…

Davide Risso (09:45:55): > This is great! Thanks!

2019-01-31

Aaron Lun (15:21:13): > Man, I wish I could separate out the actual computing stuff in scater from all the ggplotting gunk.

Rob Amezquita (15:23:13): > @Aaron Lun this is something i’ve been hoping for… i ended up writing a function that basically formats the colData/reducedDim and any desired expression features into a data frame for constructing my own ggplots

Rob Amezquita (15:24:22): - File (R): .extractFrame()

Aaron Lun (15:25:47): > Going home now, but will think about this.

2019-02-05

Aaron Lun (05:33:05): > I WILL PAY GOOD MONEY FOR SOMEONE TO MAKE A SINGLE CELL DATA FORMAT CONVERTER PACKAGE.

Aaron Lun (05:33:35): > You can start from scran::convertTo if you like, as long as you take it off my hands.

Aaron Lun (05:35:04): > And by “good money”, I mean gold

Aaron Lun (05:35:09): > nescafe gold blend, that is.

Kevin Rue-Albrecht (05:35:32): > sorry, for my own clarification: converting from what to what?

Aaron Lun (05:36:04): > Anything to anything.

Aaron Lun (05:36:20): > SCE to CellDataSets, SCE to Seurat, vice versa, whatever.

Kevin Rue-Albrecht (05:37:42): > ah right, I had the “vice versa” started at https://github.com/kevinrue/BioSeurat, but then Pete rightly pointed out that Seurat finally included a method for that: https://github.com/Bioconductor/Contributions/issues/826

Kevin Rue-Albrecht (05:39:26): > Still. It’d be nice to have a consistent set of as commands, rather than each package implementing their own style of conversion, e.g. Seurat::Convert

Aaron Lun (05:40:25): > The problem is that things don’t map directly to each other.

Kevin Rue-Albrecht (05:40:34): > I can’t remember precisely, and I’m too lazy to check, but last time I tried, some of the Seurat conversion methods were crashing. Not sure if it was Convert or as.SingleCellExperiment

Kevin Rue-Albrecht (05:41:27): > Well, as long as the “mappable” bits are converted, isn’t that the point of as methods? i.e. dropping the unmappable bits?

Aaron Lun (05:41:43): > Well, that’s the thing. Some of the mappings seem ambiguous.

Kevin Rue-Albrecht (05:47:47): > Oh definitely. Which I think is why at the moment the consensus is to let each package developer write their own “export” conversion methods, to decide what they think should map to each other format they accept to support

Kevin Rue-Albrecht (05:49:08): > I doubt anyone would like to take the responsibility of making that decision for everyone. As you said for #sc-signature I’d expect an “endless stream of complaints about misclassification”

Aaron Lun (06:03:53): > Argh. So I can’t get rid of convertTo.

Federico Marini (07:08:34): > @Federico Marini has joined the channel

Federico Marini (07:09:19): > Hi everyone :slightly_smiling_face: Given the fact that the SCE class was conceived and born here, you might have some first hand info

Federico Marini (07:09:52): > Q: are there any plans to incorporate the reducedDim slots/functionality also in the “simple” SummarizedExperiment objects?

Kevin Rue-Albrecht (07:15:07): > @Federico Marini I’m curious: what do you think would be the benefit? > I don’t see much overhead in separating the two classes, it “tags” the SCE as having the extra slot available, and it also separates responsibilities and testing of features between the “core” SE, and the additional functionality brought in by the SCE.

Federico Marini (07:15:58): > I’d like to have the computed PCA (for bulk) stored in a slot, as for SCE

Federico Marini (07:16:11): > In the analysis, I can always cast SE to SCE

Federico Marini (07:16:33): > I was thinking for example for package devel, where I would need the extra dependency

Kevin Rue-Albrecht (07:21:10): > Well, that’s the thing that I always say: SCE is just a name, and the class inherits from SE, so what’s the issue with running bulk analyses in SCE containers? Packages that expect SE should not see the difference (so long as they use S4 dispatch or inherits, but not class(x) == "SE")
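
A small sketch of why the check matters, using two toy S4 classes as stand-ins. `ToySE`, `ToySCE` and `nAssays` are hypothetical names made up here; they are not the Bioconductor classes or any real generic.

```r
library(methods)

# Hypothetical stand-ins: a parent class and a child that adds one slot.
setClass("ToySE", slots = c(assays = "list"))
setClass("ToySCE", contains = "ToySE", slots = c(reducedDims = "list"))

x <- new("ToySCE", assays = list(counts = matrix(0, 2, 2)))

# Comparing the class name exactly misses the child class.
exact_match <- class(x) == "ToySE"    # FALSE

# Inheritance-aware checks accept it.
via_is <- is(x, "ToySE")              # TRUE
via_inherits <- inherits(x, "ToySE")  # TRUE

# S4 dispatch also falls back to the method defined for the parent.
setGeneric("nAssays", function(obj) standardGeneric("nAssays"))
setMethod("nAssays", "ToySE", function(obj) length(obj@assays))
n <- nAssays(x)
```

Code written against the parent class keeps working on the child as long as it relies on dispatch or on `is()`/`inherits()` rather than on string equality of the class name.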

Federico Marini (07:22:02): > no issue for that. it is more that I would avoid using an SCE in a package where I do not depend/import SingleCellExperiment

Kevin Rue-Albrecht (07:23:02): > ah that - if the package uses it, then the package must import (or rather “depend”) to have access to the class and methods. I don’t see a way around that

Kevin Rue-Albrecht (07:24:34): > And to be fair, SCE doesn’t add that many extra dependencies on top of SE: > > Depends: SummarizedExperiment > Imports: S4Vectors, methods, BiocGenerics, utils, stats > Suggests: testthat, BiocStyle, knitr, rmarkdown, scRNAseq, magrittr, > Rtsne, Matrix >

Kevin Rue-Albrecht (07:26:12): > Anyway, curious to hear from the SCE dev team on the subject

Federico Marini (07:28:48): > I think it is more a core-SE - at least the decision. The SCE team can be the best opinionated opinion on this :smile:

Kevin Rue-Albrecht (07:37:54): > Oh sorry right, yes, I actually should have said: especially core team, I just have a hunch that the discussion might have already happened and that the SCE dev team also has the answer.

Kayla Interdonato (07:46:57): > @Kayla Interdonato has joined the channel

Martin Morgan (08:51:57): > The problem with adding slots that are only sometimes used is that the type of the object is compromised – all code just starts with if (is.null(...)) stop() and the user wonders at the point of a function that mostly fails.

Kevin Rue-Albrecht (09:03:33): > I do love that aspect of class names “tagging” available slots (or even particular constraints on existing slots: https://master.bioconductor.org/help/course-materials/2017/Zurich/S4-classes-and-methods.html#extending-a-class-without-adding-any-slot)

Tim Triche (11:38:37): > arbitrary-format-to-HDF5 and HDF5-to-arbitrary-format seems like the most efficient approach, no?

2019-02-22

Vince Carey (06:32:56) (in thread): > hi pete … just wondering …

Kevin Rue-Albrecht (06:47:52): > In terms of memory efficiency is there a recommended syntax between assay(sce[use.genes, ], assay.type) and assay(sce, assay.type)[use.genes, ]? > i.e. subsetting the object and extracting the assay -vs- extracting the full assay and then subsetting? > I recently faced an issue on RStudio cloud (due to their very restrictive memory limit), which prompted me to think more carefully about what happens under the hood. > Apparently, “[R performs] a full memory copy before making the mutation” > > https://community.rstudio.com/t/killed-error-during-installation-for-many-packages/24239/5?u=kevinrue - Attachment (RStudio Community): “Killed” error during installation for many packages > When I open that project and run gc() I see: > gc() used (Mb) gc trigger (Mb) max used (Mb) Ncells 5475504 292.5 8828510 471.5 8828510 471.5 Vcells 16484956 125.8 23798222 181.6 19711902 150.4 I am not certain how R mutates into the copy, but if it performs a full memory copy before making the mutation, then you could very easily run out of memory here.

Kevin Rue-Albrecht (06:48:28): > I’m asking here, in the absence of a #summarizedexperiment channel

Aaron Lun (06:50:19): > Well, you should do withDimnames=FALSE for starters.

Kevin Rue-Albrecht (06:51:41): > point taken

Martin Morgan (08:28:43): > extract then subset is more efficient, as with e.g., data.frame df$foo[1:10] (good) vs df[1:10,]$foo (less good)
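
The data.frame comparison can be reproduced directly. A minimal sketch on made-up data; it only shows that the two orders give the same values, and the comments describe where the extra work comes from rather than measuring it.

```r
set.seed(1)

# Toy data.frame with several columns (sizes are arbitrary).
df <- data.frame(foo = rnorm(1e5), bar = rnorm(1e5), baz = rnorm(1e5))

# Extract, then subset: only the 'foo' vector is touched.
extract_first <- df$foo[1:10]

# Subset, then extract: every column is subset and a new data.frame is
# built before 'foo' is pulled out of it.
subset_first <- df[1:10, ]$foo

same_result <- identical(extract_first, subset_first)
```

The same reasoning would apply to `assay(sce, assay.type)[use.genes, ]` versus `assay(sce[use.genes, ], assay.type)`, where subsetting the whole object also subsets every other assay and the row metadata.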

Kevin Rue-Albrecht (08:33:04): > Ok, thanks!

Martin Morgan (09:52:12): > Also, while on the topic, generally, updating objects is very expensive in R so the strategy when doing frequent updates is to separate the part of the object to update, make many updates, then update the object. Usually this is very conveniently (for reuse, for unit tests, for logical comprehension of your program, …) implemented as a function, so > > .update_assay <- function(assay, ...) { > ... > modified_assay > } > > assay(se) <- .update_assay(assay(se), ...) > > It is much less costly to update vectors and matrices than data.frame or higher-level objects, so the .update_*() should generally be implemented on these types.
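
A filled-in version of that pattern, as a sketch only. A plain list stands in for the SE (so `se$assay` replaces `assay(se)`), and the `increments` argument is a made-up example of "many updates".

```r
# Hypothetical container: a plain list standing in for an SE-like object.
se <- list(assay = matrix(0, nrow = 3, ncol = 4), meta = "toy")

# All the small updates happen on the bare matrix inside the helper.
.update_assay <- function(assay, increments) {
    for (i in seq_along(increments)) {
        assay[i, ] <- assay[i, ] + increments[i]
    }
    assay
}

# One extraction, many cheap matrix updates, one assignment back.
se$assay <- .update_assay(se$assay, increments = c(1, 2, 3))
```

The helper takes and returns a matrix, so it can be unit-tested without constructing the container at all.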

Kevin Rue-Albrecht (09:58:28): > Really good point. I can only speak for myself, but running things on HPCs tends to let me forget keeping an eye on memory and performance.

Aaron Lun (10:15:54): > I run everything on my desktop with 8 GB ram so I ALWAYS have an eye on memory and performance.

2019-02-24

Peter Hickey (20:12:54) (in thread): > Thanks for the nudge, Vince. I created a local package but in doing so found some errors in the annotations used. I’m currently working with the data generators to resolve these

2019-02-28

Vince Carey (05:38:00): > hi @Aaron Lun maybe this is not the right place for this … but I am trying to run reads.Rmd from simpleSingleCell in devel … and I am hitting > > > mito <- which(rowData(sce)$CHR=="chrM") > > sce <- calculateQCMetrics(sce, feature_controls=list(Mt=mito)) > Error in bp.out[[1]] : subscript out of bounds > > at > > label: unnamed-chunk-12 > Quitting from lines 219-222 (reads.Rmd) > Error in bp.out[[1]] : subscript out of bounds > > Any thoughts?

Aaron Lun (05:57:03): > BiocParallel broke and was fixed. https://github.com/Bioconductor/BiocParallel/issues/96

2019-03-15

Sean Davis (11:01:05): > New funding opportunity: https://www.i2cell.science/the-award/ > > The Fourmentin-Guilbert Scientific Foundation is inviting applicants to submit proposals to the I2CELL Seed Award. It is intended to support a 3 years experimental project in biology to encourage experimental approaches that explore the algorithmic processing of information in biological systems. The experimental dimension of a research proposal is of major importance as well as the biological question which is addressed.

2019-04-03

Lambda Moses (01:05:35): > @Lambda Moses has joined the channel

Levi Mangarin (15:46:20): > @Levi Mangarin has joined the channel

2019-04-23

darlanminussi (12:54:40): > @darlanminussi has joined the channel

2019-04-26

Almut (09:47:20): > @Almut has joined the channel

2019-05-06

Firas (10:31:43): > @Firas has joined the channel

2019-05-08

Ming Tang (13:51:26): > @Ming Tang has joined the channel

2019-05-13

Michael Love (11:38:44): > hi scater folks (@Davis McCarthy, @Aaron Lun), the release vignette says > > scater also provides wrapper functions readSalmonResults or readKallistoResults to import transcript abundances from the kallisto and Salmon pseudo-aligners. > but i think you’ve dropped these functions, in lieu of tximeta which can easily convert from SE to SCE, right? that was the deprecation message at least

Aaron Lun (12:11:37): > yes. Must have forgotten to kill the commentary in the vignette.

Aaron Lun (12:14:49): > @Michael Love Could you open an issue on the repo and I’ll fix it when I get home.

Michael Love (12:23:01): > ok

2019-05-16

Sridhar N (11:21:22): > @Sridhar N has joined the channel

2019-05-20

Assa (05:28:22): > @Assa has joined the channel

Sean Davis (21:07:46): > Posted a preprint of a comparison of PCA methods over on#sc-batch-correction.

2019-05-21

Sridhar N (14:37:48): > where can i find marker genes to determine cell type for mm10? is there a database or paper that talks about such markers?

Aaron Lun (14:38:36): > probably tabula muris has a list of signatures for all identified clusters, if you dig around in their supporting material.

Sridhar N (14:38:47): > I see

Sridhar N (14:38:58): > thanks, all i came across was Haemopedia (de Graaf et al.)

Sridhar N (14:39:43): > ohh yea, tabula muris is indeed what i was looking for detailed information

2019-05-22

Brendan Innes (15:48:28): > @Brendan Innes has joined the channel

2019-05-30

FeiZhao (18:57:13): > @FeiZhao has joined the channel

2019-06-23

Ameya Kulkarni (22:10:03): > @Ameya Kulkarni has joined the channel

2019-06-24

Komal Rathi (09:22:53): > @Komal Rathi has joined the channel

2019-06-25

Aaron Wolen (10:17:11): > @Aaron Wolen has joined the channel

Sonali (15:52:33): > @Sonali has joined the channel

2019-06-26

Junhao Li (13:29:48): > @Junhao Li has joined the channel

2019-06-28

Andrew McDavid (06:48:07): > @Davide Risso @Aaron Lun Have either of you used MultiAssayExperiment in anger for CiteSeq/ReapSeq data? If so, wondering how well-suited you found it?

Davide Risso (08:39:48): > No, sorry. I don’t have direct experience with cite-seq data…

Aaron Lun (11:12:59): > Apparently according to @Rob Amezquita it is not good. We discussed this and it is better to stuff it in the colData.

Andrew McDavid (12:09:16): > Or I saw you had a thread on bioc devel with someone where you mentioned putting it as another row in the assay https://stat.ethz.ch/pipermail/bioc-devel/2019-January/014573.html

Aaron Lun (12:12:43): > There’s multiple options on the table, depending on how you envisage its use. Putting them as another row in the assays would make sense if you intend to apply the same algorithms to the Ab tags as to the endogenous genes. (Even if you never apply the algorithm to both sets at the same time, you could just easily re-use the same functions while subsetting the matrix.)

Aaron Lun (12:13:28): > If you intend to treat the Ab tag data as metadata, e.g., for annotation or colouring plots, it makes more sense to cut them out and stuff them in the colData. Same goes if you never/rarely intend to apply the same algorithms to the ab tags as to the endogenous genes.

Andrew McDavid (12:13:42): > Yeah, I agree…multiple options.

Aaron Lun (12:14:09): > The second option is “safer” in the sense that you won’t accidentally forget to do the subsetting and then apply an operation on all rows at once, which probably wouldn’t make sense.

Aaron Lun (12:14:26): > The first option is the default as that’s just the natural way to store all the counts in a single matrix.

Andrew McDavid (12:15:34): > 1. MultiAssayExperiment is maybe best suited if there are completely different algorithms needed, and they need to operate on the data as a matrix. But downside is that a lot of code (that could work) won’t work.

Andrew McDavid (12:16:36): > 2. Putting it in colData is fine, but clutters it up, and would be fragile for any algorithms that need to operate on the data as matrix rather than a sequence of columns.

Aaron Lun (12:16:53): > 2 can be overcome with a nested entry.

Aaron Lun (12:17:40): > > library(S4Vectors) > a <- DataFrame(row.names=LETTERS) > a$FACS <- matrix(rnorm(260), ncol=10) >
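
The same nesting can be tried without Bioconductor installed. This sketch uses a base data.frame as a stand-in for the S4Vectors DataFrame (an assumption made here for portability; colData itself is a DataFrame), with made-up values.

```r
set.seed(1)

# One row per "cell", named A..Z, and no ordinary columns yet.
cd <- data.frame(row.names = LETTERS)

# A whole 26 x 10 matrix stored as a single nested column.
cd$FACS <- matrix(rnorm(260), ncol = 10)

n_columns <- ncol(cd)        # the matrix counts as one column
nested_dim <- dim(cd$FACS)   # cells x markers

# Row subsetting carries the nested matrix along with it.
first_three <- cd[1:3, , drop = FALSE]

# "Transposed": flip it to get the markers-by-cells layout used by assays.
markers_by_cells <- t(cd$FACS)
```

Because the nested matrix is parallel to the rows, subsetting the cells keeps the per-cell measurements in sync without any extra bookkeeping.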

Aaron Lun (12:17:47): > Transposed, but that’s okay.

Andrew McDavid (12:18:14): > 3. Putting it in assays is maybe also fine, but has the risk of introducing incorrect behavior for some algorithms that are expecting just one class of counts there.

Andrew McDavid (12:18:33): > ^^regarding 2, that’s an appealing thought

Andrew McDavid (12:19:15): > I’ll confess that whenever I’ve gotten DataFrames with nested columns I just curse and immediately try to figure out how to unnest.

Aaron Lun (12:19:52): > INCEPTION BAAAAAM

Aaron Lun (12:20:09): > Is this a DataFrame? Or a DataFrame within a DataFrame?

Andrew McDavid (12:21:20): > Indeed. I guess using a nested column in practice may require PRs in scater, etc.

Andrew McDavid (12:21:55): > And is there an S4Vectors::unnest and S4Vectors::nest?

Aaron Lun (12:21:55): > scater’s vis tools already support nested columns if the internal nesting is a DF.

Andrew McDavid (12:22:00): > Ah nice

Aaron Lun (12:22:08): > Wouldn’t be much effort to extend this to matrices, just need to change [[ to [,].

Andrew McDavid (12:31:02): > Or how about this tasty chestnut: > > sce = SingleCellExperiment(assay = matrix(rnorm(1000), 100, 10)) > rowData(sce)$sce = sce >

Andrew McDavid (12:32:02): > …don’t try to print rowData(sce), though.

Aaron Lun (12:32:17): > OH YEAH.

Aaron Lun (12:32:26): > Put another one in there!

Andrew McDavid (12:32:35): > lol

Aaron Lun (12:33:10): > With some careful overriding of the methods, you could create a memory bomb.

Andrew McDavid (12:34:30): > It looks like it already triggers some sort of infinite recursion when it tries to call[

Andrew McDavid (12:35:30): > anyways, nesting seems like the best option of 1-3, though maybe some additional batteries should be included in S4Vectors to handle nesting and unnesting.

Aaron Lun (12:37:24): > Nested matrices will probably fail in scater off the shelf - happy to take PRs, make sure you read through the scater-vis-args docs (can’t remember exactly what I called it).

Andrew McDavid (12:40:35): > I’ll confess the motivation for this actually is scRNAseq + linked repertoire sequencing, but figured CiteSeq was the area where we might have already dealt with this.

Aaron Lun (12:41:21): > That’s a whole other kettle of fish.

Aaron Lun (12:41:24): > Painful fish.

Aaron Lun (12:41:58): > Even so; if you have a data structure that is parallel to the samples, you can nest it in the colData.

Andrew McDavid (12:43:57): > I do have a data structure, and it could potentially be nested in the colData. Though in effect I am extending a tibble, so I may run into trouble trying to put it into a DataFrame :scream_cat:

Aaron Lun (12:45:02): > I hate these goddamn tibbles. Took me 5 minutes to figure out how to get a column out of it.

Andrew McDavid (12:46:00): > [[]]

Andrew McDavid (12:46:34): > effectively, drop = FALSE always, which i’d argue should have been the default anyways.

Aaron Lun (12:46:58): > Well, that ship has sailed a long time ago.

Andrew McDavid (12:47:58): > in any case, here’s another unholy snippet: > > a <- DataFrame(row.names=LETTERS) > a$tbl = tibble(letters, numbers = 1:26) >

Andrew McDavid (12:49:12): > which seems to just work, but I am…suspicious…

Andrew McDavid (12:50:11): > This has been helpful. Thanks Aaron!

Rob Amezquita (13:39:48): > super helpful to watch from afar, this is great

2019-06-30

Aaron Lun (22:55:19): > @Rob Amezquita Can you describe exactly why using an MAE was bad?

Aaron Lun (22:55:48): > Fundamentally the same issue applies to spike-in data, and it’s annoyed me enough to want to fix it.

Aaron Lun (22:56:10): > And part of the fix will just naturally apply to Ab data as well.

2019-07-01

Andrew McDavid (10:29:47): > Having tried to use MAE for repertoire data, my takeaway is that: > i) it’s overkill (experiments have the same samples, or are a simple subset of one another, or nearly so) > ii) there would need to be way more batteries included for downstream analysis to be efficient and enjoyable > iii) at least for my data, needing to store the RepSeq data in transposed “expression” format (rows “features = covariates”, columns cells) wasn’t going to fly, since my data is not an atomic matrix.

Andrew McDavid (10:31:57): > I think that nested colData does seem like a pretty good solution, because it doesn’t require learning a new API (for users) and gives developers quite a bit of flexibility (can stick an S4 object in there). I started hacking a nicer interface for the RepSeq data and will try to post a proof of concept in the next week or so.

Andrew McDavid (10:33:03): > …And will see what it looks like when it gets plugged into the scater vis tools.

Rob Amezquita (13:31:07): > @Aaron Lun mostly ii) - not as natural. I believe it could work with using only the SCE portion with existing SC analysis, but yeah, something lighter weight like SCE is preferable to MAE at this stage given the widespread support for SCE. So really, mostly i) and ii) per @Andrew McDavid’s point above

2019-07-03

Aaron Lun (02:17:44): > Okay, so (iii) is probably not in scope for this, as RepSeq is another kettle of fish.

Aaron Lun (02:18:32): > (ii) can be managed - makes me grateful that I used S4, because we can just throw a MAE specialization with an experiment= argument to have it work off the bat.

Aaron Lun (02:22:59): > (i) is… fair enough.

Aaron Lun (02:28:10): > But we should be able to protect people in (i) by giving them a simple way to split an existing SCE based on feature type into an MAE.

Aaron Lun (02:28:54): > Which just leaves addition of methods in (ii). That is… tolerable.

Aaron Lun (02:31:02): > I can handle the analytics, but I’ll be damned if I have to go through another round of touching ggplot code in scater.

2019-07-04

Martin Treppner (10:39:24): > @Martin Treppner has joined the channel

Sridhar N (20:54:01): > is there a way to plot overall normalized expression of a gene from single cell data, meaning total expression minus the cluster/celltype

Aaron Lun (21:07:09): > Not really sure what you mean, but scater::plotExpression?

Sridhar N (21:17:48): > Well just like we can plot Tpm from bulk rna seq for each gene across genotypes

Aaron Lun (21:18:15): > So, plotExpression with x="your_cluster_label".

Sridhar N (21:19:15): > from seurat - File (PNG): Screen Shot 2019-07-04 at 8.18.39 PM.png

Sridhar N (21:19:37): > plotExpression is that in seurat?

Sridhar N (21:19:58): > oh no you said this scater::plotExpression

Sridhar N (21:20:07): > ok

Sridhar N (21:20:16): > does scater take seurat object?

Aaron Lun (21:22:11): > Nope.

Sridhar N (21:22:40): > looks like it does, no?

Sridhar N (21:22:41): > https://satijalab.org/seurat/v3.0/conversion_vignette.html - Attachment (satijalab.org): Satija Lab > Lab Webpage —

Aaron Lun (21:23:12): > Well, scater doesn’t.

Aaron Lun (21:23:19): > I know, because I wrote those functions.

Sridhar N (21:23:19): > aah ok

Aaron Lun (21:26:40): > The easiest way to mimic the plot above is to use plotExpression() with x set to your cell type and colour_by set to your treatment.

Aaron Lun (21:27:21): > It doesn’t create the mirrored violin plots, but I’m sure you could do that with more ggplot2-fu.

Sridhar N (21:27:39): > i want to combine all the celltypes

Sridhar N (21:27:46): > total expression

Aaron Lun (21:28:27): > Well, just plotExpression() with the specified feature, then.

Aaron Lun (21:28:48): > Not that it seems particularly useful, but then again, I don’t know what you’re doing with it.

Sridhar N (21:29:02): - File (PNG): Screen Shot 2019-07-04 at 8.28.49 PM.png

Sridhar N (21:29:10): > something like this

Sridhar N (21:30:30): > we expect a particular gene to have higher expression in celltype(X) in KO vs WT, but we don’t see it

Sridhar N (21:30:56): > wanted to see if we see increase in overall expression

Sridhar N (21:31:20): > just a sanity check

Aaron Lun (21:31:51): > Just set x="whatever EC and RegC are" in plotExpression() for all cells.

Aaron Lun (21:32:01): > Or subset the SCE to your cell type of interest.

Sridhar N (21:33:58): > that worked!

Sridhar N (21:34:10): > cheers mate!

2019-07-05

Kevin Missault (05:31:35): > @Kevin Missault has joined the channel

2019-07-09

Stevie Pederson (21:31:35): > @Stevie Pederson has joined the channel

2019-07-15

Oriol Pavón (13:36:23): > @Oriol Pavón has joined the channel

2019-07-17

John Hutchinson (14:32:57): > @John Hutchinson has joined the channel

2019-07-22

Aaron Lun (01:43:59): > @channel For those of you who aren’t on the BioC-devel mailing list: https://stat.ethz.ch/pipermail/bioc-devel/2019-July/015305.html

2019-08-01

Loyal (13:14:20): > Is there a converter between SingleCellExperiment and AnnData? Can’t seem to find it if it exists.

Federico Marini (15:43:14): > I think I saw something like that

Federico Marini (15:43:36): > https://github.com/theislab/anndata2ri

Federico Marini (15:43:52): > Did not try directly, though

Tim Triche (15:44:12): > this seems like a Fabian thing to do

Federico Marini (15:44:43): > I did have a quick chat with Malte Luecken after the publication of their recent manuscript

Federico Marini (15:44:56): > and he mentioned this tiny thingy to me

Tim Triche (15:45:04): > why do people still use rpy instead of reticulate :grimacing:

Tim Triche (15:45:16): > oh well, if it works, who cares how

Federico Marini (15:45:33): > I was trying to re-do their notebook just with R to put up a sample instance of iSEE

Federico Marini (15:45:52): > > oh well, if it works, who cares how > in these cases, yes

Tim Triche (15:46:11): > that’s awesome! I’ve been breaking iSEE lately and trying to figure out how to have it automatically expose MultiAssayExperiment guts

Federico Marini (15:46:37): > uh

Federico Marini (15:46:45): > do please keep us all up to date :slightly_smiling_face:

Tim Triche (15:46:54): > I know, of course it’s a bad idea, that is my specialty

Federico Marini (15:47:08): > the extension to multiomics is something that can be quite appealing, and not just to you

Tim Triche (15:47:28): > I’m comfortable being the village idiot

Aaron Lun (15:47:32): > People did see the altExp stuff, right?

Tim Triche (15:47:39): > no where is that

Aaron Lun (15:47:45): > see my link above.

Tim Triche (15:48:02): > this? https://rdrr.io/github/drisso/SingleCellExperiment/src/tests/testthat/test-sce-altExp.R - Attachment (rdrr.io): drisso/SingleCellExperiment source: tests/testthat/test-sce-altExp.R > tests/testthat/test-sce-altExp.R defines the following functions:

Friederike Dündar (15:48:12): > @Friederike Dündar has joined the channel

Aaron Lun (15:48:13): > commit <- match.arg(commit)

Aaron Lun (15:48:16): > whoops

Aaron Lun (15:48:22): > https://stat.ethz.ch/pipermail/bioc-devel/2019-July/015305.html

Federico Marini (15:48:32): > Hi @Friederike Dündar

Friederike Dündar (15:48:42): > :wave:

Federico Marini (15:48:46): > pointer for you: https://github.com/theislab/anndata2ri

Tim Triche (15:48:46): > oh! I see @Aaron Lun, from the previous MAE discussion

Tim Triche (15:48:47): > thanks

Dan Bunis (15:50:32): > @Dan Bunis has joined the channel

Friederike Dündar (15:51:05) (in thread): > yeah, I found that, too, but didn’t immediately get how it would work, but I guess I’ll just have to try and see

Jared Andrews (16:25:42): > @Jared Andrews has joined the channel

Federico Marini (17:09:15) (in thread): > same here, did not try that extensively

Friederike Dündar (17:09:33) (in thread): > how far did you get?

Federico Marini (17:09:47) (in thread): > in the end it is just possible to export everything needed by hand, but I can imagine an autoconverter is an excellent shortcut

Friederike Dündar (17:12:25) (in thread): > how do you “export by hand”?

2019-08-02

Dave Tang (02:28:09): > @Dave Tang has joined the channel

Gabriele Sales (04:04:53): > @Gabriele Sales has joined the channel

Kellie Kravarik (08:40:39): > @Kellie Kravarik has joined the channel

Federico Marini (16:42:24) (in thread): > Well, load into python via scanpy, and then write out to something that gets loaded back into R - this being quite suboptimal

Federico Marini (16:42:46) (in thread): > using the .obs, .data and so on slots of the loaded object

Friederike Dündar (16:46:10) (in thread): > right

2019-08-03

Mikhael Manurung (13:53:02): > @Mikhael Manurung has joined the channel

Shila Ghazanfar (14:42:32): > @Shila Ghazanfar has joined the channel

2019-08-05

Koen Van den Berge (09:28:08): > @Koen Van den Berge has joined the channel

2019-08-09

Aaron Lun (01:30:42): > Is there no one I can convince to take scran::convertTo off my hands?

Aaron Lun (01:30:45): > I hate this function so much.

Friederike Dündar (09:24:37): > that’s not a particularly good sell

Friederike Dündar (09:25:32): > what’s the bother?

2019-08-14

Mike Smith (07:25:10): > @Mike Smith has joined the channel

2019-08-16

Constantin Ahlmann-Eltze (04:41:16): > @Constantin Ahlmann-Eltze has joined the channel

Kevin Rue-Albrecht (18:20:49): > incoming PR:grimacing:

2019-08-17

Kevin Rue-Albrecht (12:38:33): > FYI, fixing unit tests on the PR. Also, I just realized that I accidentally wiped out withDimnames=TRUE. Fixing that too

Kevin Rue-Albrecht (13:54:11): > > This should go into the body of the tryCatch, don’t know why you need to catch the NULL here. > > Oh wait, just replace this with the dep warning. But make sure you use if (…) {. > What’s the dep warning? @Aaron Lun

Aaron Lun (13:54:59): > The Dep warning for the old NULL returns

Kevin Rue-Albrecht (13:56:05): > Also: sorry, I copy pasted from SummarizedExperiment and adapted from there, it looks more sane if I move code like this: > > out <- tryCatch({ > internals[, type] > }, error=function(err) { > stop(msg, "\n'", type, "' not in names(reducedDims(<", class(x), ">))") > }) >

Aaron Lun (13:56:54): > Well, that will be what it will eventually be, but right now, we can’t throw, we need to return a NULL with a dep warning.

Kevin Rue-Albrecht (13:59:14): > > But make sure you use if (…) > Sorry.. I don’t deprecate stuff much. How do you want to use if here?

Kevin Rue-Albrecht (14:03:16): > Is that behaviour good for you? > > > reducedDim(sce, "PCA") > NULL > Warning message: > In value[[3L]](cond) : > 'reducedDim(<SingleCellExperiment>, type="character", ...)' invalid subscript 'type' > 'PCA' not in names(reducedDims(<SingleCellExperiment>)) > NULL return value will be deprecated and error will be thrown >

Kevin Rue-Albrecht (14:03:43): > It didn’t require an if as you mentioned, so I’m not sure

Kevin Rue-Albrecht (14:05:16): > (Fine tuning the message itself is another detail)

Aaron Lun (14:06:02): > I wouldn’t want the error message at all, just use thing <- try(..., silent=TRUE). Then check the output of the thing, and if it’s an error class, return NULL with a deprecation warning. The entire thing will then switch over to a tryCatch block in the next release.
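The pattern described in this message can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: lookup_result and get_or_null are hypothetical names and the store is a plain list, so none of this is the actual SingleCellExperiment code.

```r
# Hypothetical stand-in for the internal lookup that throws on a bad 'type'.
lookup_result <- function(store, type) {
    if (!type %in% names(store)) {
        stop("'", type, "' not in names(store)")
    }
    store[[type]]
}

# Deprecation-period behaviour: swallow the error silently, warn, return NULL.
get_or_null <- function(store, type) {
    out <- try(lookup_result(store, type), silent = TRUE)
    if (inherits(out, "try-error")) {
        warning("NULL is deprecated.", call. = FALSE)
        return(NULL)
    }
    out
}

store <- list(PCA = matrix(0, 2, 2))
found <- get_or_null(store, "PCA")                     # the stored matrix
missing_res <- suppressWarnings(get_or_null(store, "TSNE"))  # NULL, warning muffled
```

Once the deprecation period ends, the try() wrapper would presumably be replaced by a tryCatch() that rethrows a clearer error, as the message says.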

Kevin Rue-Albrecht (14:16:35): > Can’t finish this now. Dinner. Though we’re basically there. As you said, I’ve got the future behaviour. I just need to hide that behind a deprecation warning for now

Tim Triche (14:22:35): > oh hey, I meant to ask people who know, is this supposed to be happening in “stable” devel?

Tim Triche (14:22:42): > > R> scRNA > class: SingleCellExperiment > dim: 33694 35417 > metadata(0): > assays(1): counts > rownames(33694): ENSG00000243485 ENSG00000237613 ... ENSG00000277475 > ENSG00000268674 > rowData names(2): ID Symbol > colnames(35417): A_scRNA_AAACCTGCAAGTAATG A_scRNA_AAACCTGCAGCGAACA ... > U_scRNA_TTTGTCATCCTACAGA U_scRNA_TTTGTCATCTGTTGAG > colData names(3): Sample Barcode subject > Error in if (objectVersion(object) >= "1.7.1") { : > argument is of length zero >

Tim Triche (14:23:25): > It also seems to mangle other packages’ callNextMethod()s.

Tim Triche (14:23:30): > > R> cytof > class: daFrame > dim: 4938816 21 > metadata(2): experiment_info cofactor > assays(1): exprs > rownames: NULL > rowData names(3): sample_id condition patient_id > colnames(21): CD49d CD11a ... CD57 CD127 > colData names(3): channel_name marker_name marker_class > Error in reducedDimNames(object) : > no slot of name "reducedDims" for this object of class "daFrame" >

Aaron Lun (14:25:58): > No, it shouldn’t be happening. I thought I fixed all of these. MRE?

Tim Triche (14:28:45): > > R> packageVersion("SingleCellExperiment") > [1] '1.7.4' >

Tim Triche (14:28:53): > Too old?

Aaron Lun (14:29:42): > That should have been fine.

Tim Triche (14:30:14): > > R> packageVersion("SummarizedExperiment") > [1] '1.15.6' >

Tim Triche (14:30:21): > Is that the problem somehow?

Tim Triche (14:30:55): > It seems unlikely that SE would be the root but I don’t know.

Tim Triche (14:30:59): > > R> bulkRNA > class: SummarizedExperiment > dim: 173259 8 > metadata(0): > assays(3): est_counts tpm eff_length > rownames(173259): ENST00000415118 ENST00000448914 ... ENST00000630922 > ENST00000630347 > rowData names(2): length Symbol > colnames(8): A_bulkRNA B_bulkRNA ... T_bulkRNA U_bulkRNA > colData names(8): n_targets n_bootstraps ... call donor_id >

Aaron Lun (14:31:23): > Can you make an MRE?

Tim Triche (14:31:52): > I will need to use something simpler, or rehash it onto S3, but yes.

Tim Triche (14:32:26): > I need to go take care of an appointment in the next 30 minutes but after that I can make a reprex.

Aaron Lun (14:32:28): > The problem is likely updateObject that gets called under the hood.

Tim Triche (14:33:02): > I expect so, I just couldn’t figure out where. I’ll see if I can use CATALYST to provoke it without touching SingleCellExperiment directly

Tim Triche (16:57:15): > reproducible example (after updating packages and using R --vanilla)! - File (R): reprex.R

Tim Triche (16:57:33): > sessionInfo() from the above - File (LaTeX/sTeX): sessionInfo.tex

Tim Triche (16:58:04): > I don’t see why/how this occurs, but it’s easily reproducible in my installation

2019-08-18

Aaron Lun (03:31:40): > The daFrame example is because CATALYST defines its own reducedDim functions that mess around with the SCE slots. They shouldn’t have done that, they should be using callNextMethod() rather than touching @reducedDims directly.

Aaron Lun (03:31:53): > I’m more concerned about the scRNA example.

Aaron Lun (03:38:32): > The delay may be solved by https://github.com/drisso/SingleCellExperiment/issues/31

Aaron Lun (03:49:48): > https://github.com/HelenaLC/CATALYST/issues/54

Kevin Rue-Albrecht (11:15:59): > Is that what you’d expect for now @Aaron Lun? > > > reducedDim(sce) > NULL > Warning message: > NULL is deprecated. > > reducedDim(sce, 1L) > NULL > Warning message: > NULL is deprecated. > > reducedDim(sce, "PCA") > NULL > Warning message: > NULL is deprecated. >

Kevin Rue-Albrecht (11:16:53): > With the next version throwing error messages such as > > > reducedDim(sce) > Error in .local(x, type, ...) : > 'reducedDim(<SingleCellExperiment>, ...) length(reducedDims(<SingleCellExperiment>)) is 0' >

Kevin Rue-Albrecht (12:50:40): > Right. I think it is done. PR updated.

Aaron Lun (13:00:32): > The deprecation message could be better. But fundamentally yes.

Kevin Rue-Albrecht (13:01:24): > Well. That part is down to personal taste, so I’ll let you rephrase rather than guess

Aaron Lun (13:46:23): > Finish my comments and I’ll merge.

Kevin Rue-Albrecht (13:53:11): > OK, so type="unnamed" rather than "", as you initially suggested?

Aaron Lun (13:53:57): > Yes.

Kevin Rue-Albrecht (13:54:01): > Also, does that mean I get to fix reducedDims? [current implementation] > > if (is.null(names(value))) { > colnames(collected) <- character(length(value)) > } >

Aaron Lun (13:54:15): > Yes, spam them with unnamed1, etc.

Kevin Rue-Albrecht (13:54:25): > :ok_hand:
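The naming rule agreed on here can be sketched in base R as below. This is a minimal sketch under assumptions: fill_unnamed is a hypothetical helper operating on a plain list, and numbering by position is an illustrative choice, since the chat only says "unnamed1, etc." and does not show the package's actual implementation.

```r
# Hypothetical helper: give every missing or empty name an "unnamedN" label.
fill_unnamed <- function(value) {
    nms <- names(value)
    if (is.null(nms)) {
        nms <- character(length(value))  # no names at all: start from ""
    }
    empty <- is.na(nms) | nms == ""
    # Assumption: N is the element's position in the list.
    nms[empty] <- paste0("unnamed", which(empty))
    names(value) <- nms
    value
}

both_unnamed <- names(fill_unnamed(list(1, 2)))
partly_named <- names(fill_unnamed(list(1, PCA = 2)))
```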

Kevin Rue-Albrecht (14:08:59): > Uh. Just checking before I delete existing unit tests, this doesn’t work anymore when sce does not have any reddim result. Though this new error matches the behaviour that you’ve described in the PR, because numeric assignment doesn’t work if the index is out of bounds > > reducedDim(sce, 1) <- d1 >

Kevin Rue-Albrecht (14:10:04): > That said, it’s a whole block of unit tests that would go away then: > > test_that("reducedDim getters/setters work with numeric indices", { > # In the absence of reducedDim > # currently return NULL with a deprecation message > # future will throw an error > expect_null(reducedDim(sce)) # during deprecation > expect_warning(reducedDim(sce), "NULL is deprecated") # during deprecation > # expect_error(reducedDim(sce), "is 0") # after deprecation > expect_null(reducedDim(sce, 2)) # during deprecation > expect_warning(reducedDim(sce, 2), "NULL is deprecated") # during deprecation > # expect_error(reducedDim(sce, 2), "invalid subscript 'type'") # after deprecation > expect_null(reducedDim(sce, "PCA")) # during deprecation > expect_warning(reducedDim(sce, "PCA"), "NULL is deprecated") # during deprecation > # expect_error(reducedDim(sce, "PCA"), "subscript contains invalid names") # after deprecation > > # This gets a bit confusing as the order changes when earlier elements are wiped out. > reducedDim(sce, 1) <- d1 > expect_identical(reducedDim(sce), d1) > expect_identical(reducedDim(sce, 1), d1) > expect_identical(reducedDimNames(sce), "") > reducedDim(sce, 2) <- d2 > expect_identical(reducedDim(sce), d1) > expect_identical(reducedDim(sce, 2), d2) > expect_identical(reducedDimNames(sce), c("unnamed1", "unnamed2")) > > mult <- d1 * 5 > reducedDim(sce, "PCA") <- mult # d1 is the second element. > expect_identical(reducedDim(sce, 1), d1) > expect_identical(reducedDim(sce, 2), d2) > expect_identical(reducedDim(sce, 3), mult) > expect_identical(reducedDimNames(sce), c("", "", "PCA")) > > reducedDim(sce, 1) <- NULL # d2 becomes the first element now. > expect_identical(reducedDim(sce), d2) > expect_identical(reducedDim(sce, 1), d2) > expect_identical(reducedDim(sce, 2), reducedDim(sce, "PCA")) > expect_identical(reducedDimNames(sce), c("", "PCA")) > > reducedDim(sce) <- NULL # 'mult' becomes the first element. 
> expect_identical(reducedDim(sce), mult) > expect_identical(reducedDimNames(sce), "PCA") > reducedDim(sce) <- d2 # d2 now overwrites the first element. > expect_identical(reducedDim(sce, 1), d2) > expect_identical(reducedDimNames(sce), "PCA") > > expect_error(reducedDim(sce, 5) <- d1, "subscript out of bounds") > }) >

Kevin Rue-Albrecht (14:11:13): > I guess I just have to edit the first couple of assignments (index 1 and 2) to a reducedDims<- instead

Aaron Lun (14:14:27): > Huh? Why do those tests go away?

Kevin Rue-Albrecht (14:15:45): > actually, not exactly go away, I just need to edit them, because they rely on reducedDim(sce, 1) <- d1 and reducedDim(sce, 2) <- d2 working, which now don’t work anymore to create new results

Kevin Rue-Albrecht (14:15:57): > I’ll paste you my new version in a minute, almost done

Kevin Rue-Albrecht (14:18:35): > here’s the updated version, note: > - the new expected errors for reducedDim(<numeric>) > - the use of reducedDims to compensate for the previously working code > - the unnamed1,2 results > > test_that("reducedDim getters/setters work with numeric indices", { > # In the absence of reducedDim > # currently return NULL with a deprecation message > # future will throw an error > expect_null(reducedDim(sce)) # during deprecation > expect_warning(reducedDim(sce), "NULL is deprecated") # during deprecation > # expect_error(reducedDim(sce), "is 0") # after deprecation > expect_null(reducedDim(sce, 2)) # during deprecation > expect_warning(reducedDim(sce, 2), "NULL is deprecated") # during deprecation > # expect_error(reducedDim(sce, 2), "invalid subscript 'type'") # after deprecation > expect_null(reducedDim(sce, "PCA")) # during deprecation > expect_warning(reducedDim(sce, "PCA"), "NULL is deprecated") # during deprecation > # expect_error(reducedDim(sce, "PCA"), "subscript contains invalid names") # after deprecation > > # This gets a bit confusing as the order changes when earlier elements are wiped out. > expect_error(reducedDim(sce, 1) <- d1, "invalid subscript 'type'\nsubscript out of bounds") > expect_error(reducedDim(sce, 2) <- d1, "invalid subscript 'type'\nsubscript out of bounds") > > reducedDims(sce) <- list(d1, d2) > expect_identical(reducedDim(sce), d1) > expect_identical(reducedDim(sce, 2), d2) > expect_identical(reducedDimNames(sce), c("unnamed1", "unnamed2")) > > mult <- d1 * 5 > reducedDim(sce, "PCA") <- mult # d1 is the second element. > expect_identical(reducedDim(sce, 1), d1) > expect_identical(reducedDim(sce, 2), d2) > expect_identical(reducedDim(sce, 3), mult) > expect_identical(reducedDimNames(sce), c("unnamed1", "unnamed2", "PCA")) > > reducedDim(sce, 1) <- NULL # d2 becomes the first element now. 
> expect_identical(reducedDim(sce), d2) > expect_identical(reducedDim(sce, 1), d2) > expect_identical(reducedDim(sce, 2), reducedDim(sce, "PCA")) > expect_identical(reducedDimNames(sce), c("unnamed2", "PCA")) > > reducedDim(sce) <- NULL # 'mult' becomes the first element. > expect_identical(reducedDim(sce), mult) > expect_identical(reducedDimNames(sce), "PCA") > reducedDim(sce) <- d2 # d2 now overwrites the first element. > expect_identical(reducedDim(sce, 1), d2) > expect_identical(reducedDimNames(sce), "PCA") > > expect_error(reducedDim(sce, 5) <- d1, "subscript out of bounds") > }) >

Aaron Lun (14:19:44): > Looks good, move this comment ” # This gets a bit confusing as the order changes when earlier elements are wiped out.” to the first NULL assignment.

Kevin Rue-Albrecht (14:21:26): > Done and pushed

Aaron Lun (14:23:02): > Okay, if you want to get your name on Authors@R, do the same for altExp.

Kevin Rue-Albrecht (14:25:20): > You’re gonna get me killed by both Steve and Anna if I do anything else tonight. And next weekend (bank holiday) I’m visiting my brother and almost 3 month old nephew

Aaron Lun (14:26:43): > Geez, I remember when I had holidays.

Kevin Rue-Albrecht (14:27:38): > and guess who’s in charge of the “data sharing and visualisation” session of our group retreat this Wednesday :stuck_out_tongue:

Aaron Lun (14:27:50): > Uh. Steve?

Kevin Rue-Albrecht (14:28:41): > ha. ha. (just in case you were seriously asking: no it’s me)

Aaron Lun (14:31:21): > A GUI for analysis? We already have that, it’s called Rstudio.

Kevin Rue-Albrecht (18:33:56): > > Okay, if you want to get your name on Authors@R, do the same for altExp. > :money_mouth_face:

Kevin Rue-Albrecht (18:34:04): > see PR https://github.com/drisso/SingleCellExperiment/pull/36

Kevin Rue-Albrecht (19:22:43): > PR updated. Gotta sleep now.

Kevin Rue-Albrecht (19:23:33): > PS: making Travis describe warnings as errors is so depressing :stuck_out_tongue:

Luke Zappia (21:29:00): > @Luke Zappia has joined the channel

2019-08-19

Kin Lau (17:52:05): > @Kin Lau has joined the channel

2019-08-22

Hervé Pagès (04:06:01): > I was wondering if you guys have considered removing the SCE part from splitSCEByAlt. So just splitByAlt. Or maybe splitByAltExp to make the link with the altExp feature completely explicit?

Kevin Rue-Albrecht (04:54:09): > Without being the voice of SCE, I’ve just looked at the function now and I agree that the current name isn’t super explicit (or perhaps is too explicit, rather) > That said, my suggestion would be splitAltExps. > Unless there is a grammar point in the naming that I’m not aware of, I would argue that there is no real need for the “By”. First because one experiment will remain the “main” one (i.e. not all experiments are “alt”), then (related) because I’d describe the operation as splitting altExps “away” from the “main” experiment

Aaron Lun (11:50:59): > I have no opinion on any choice other than the one that reduces the amount of work for me.

2019-08-23

Aaron Lun (18:24:41): > Done.

2019-08-24

Aaron Lun (02:12:36): > @Hervé Pagès just noticed the DFrame change you made to SingleCellExperiment. Y’know, if S4Vectors provided a function that gave me an empty DataFrame with a given number of rows, I’d just use that instead of new()-ing it, which should abstract away all of the implementation details.

Aaron Lun (02:13:31): > Then you wouldn’t have to go poking around in all of S4Vectors’ downstreams. Seems like it could save us both some time.

Martin Morgan (17:41:40) (in thread): > maybe just fun to note, but > > > .Internal(inspect(seq_len(10000000))) > @7fca610901c8 13 INTSXP g0c0 [NAM(7)] 1 : 10000000 (compact) > > indicates that seq_len(10000000) is actually represented in a compact way (maybe start = 1, by = 1, length = 10000000) rather than as a literal vector, and that this propagates through the DataFrame constructor > > > .Internal(inspect(DataFrame(x = seq_len(10000000))$x)) > @7fca610e4778 13 INTSXP g0c0 [NAM(7)] 1 : 10000000 (compact) > > so DataFrame(x = seq_len(10000000))[, FALSE] costs almost nothing in terms of space / time.

Aaron Lun (17:43:11) (in thread): > Hm, interesting, if a bit voodoo-ish.

Aaron Lun (17:43:59) (in thread): > I was just hoping for a simple nrows argument in DataFrame().

Hervé Pagès (18:05:36) (in thread): > Not sure the world is ready for it: > > library(S4Vectors) > df0 <- new("DFrame", nrows=10000000L) > df1 <- S4Vectors:::make_zero_col_DataFrame(10000000) > df2 <- DataFrame(x = seq_len(10000000))[, FALSE] > > identical(df0, df1) > # [1] TRUE > identical(df0, df2) > # [1] TRUE > > library(microbenchmark) > microbenchmark(df0 = new("DFrame", nrows=10000000L), > df1 = S4Vectors:::make_zero_col_DataFrame(10000000), > df2 = DataFrame(x = seq_len(10000000))[, FALSE]) > # Unit: microseconds > # expr min lq mean median uq max neval cld > # df0 905.783 986.7485 1083.4116 1014.376 1065.151 2654.957 100 b > # df1 166.112 182.7905 192.8524 193.593 200.180 223.136 100 a > # df2 2370.436 2530.8700 2676.7625 2599.868 2684.884 4285.616 100 c > > Sounds like the time has come to unleash the power of make_zero_col_DataFrame by exporting it.

Aaron Lun (18:06:58) (in thread): > Yeah!

Aaron Lun (18:07:24) (in thread): > at last, I can get all those 900 microseconds back.

Hervé Pagès (18:13:25) (in thread): > Excuse me but you’ll get 2431.821 microseconds back! Huge difference. And you can get back another extra 20 microseconds by doing new2("DFrame", nrows=10000000L, check=FALSE), which is basically what make_zero_col_DataFrame is doing.

Aaron Lun (18:24:56) (in thread): > Phwoar! Close to the metal, this stuff.

2019-08-25

Aaron Lun (14:25:13): > Planning to move LEM from SCE to BiocSingular. Thoughts?

2019-08-26

Aaron Lun (00:02:54): > Is there a quick way to trawl BioC’s codebase to see if anyone is actually using the LEM?

Aaron Lun (01:27:49): > And while I’m making requests, it would be great if the scat in show,SummarizedExperiment-method got exposed somewhere. I use it in every show method for my 2-dimensional S4 classes, so there’s at least 4 copies scattered around my codebase.

Hervé Pagès (02:15:29): > scat() now in S4Vectors 0.23.20 and renamed coolcat()

Aaron Lun (02:58:07): > oh yeah

Aaron Lun (02:58:12): > :+1:

2019-09-06

Aaron Lun (01:42:31): > @Kevin Rue-Albrecht look forward to the next version of SCE.

Kevin Rue-Albrecht (02:38:30): > I do. I look even more forward to using SCE again to its full extent. Currently using a Seurat workflow for consistency with earlier analyses. No wonder I’m so confused all the time, switching back and forth :sweat_smile:

2019-09-07

Aaron Lun (20:05:21): > @channel Second-last call for anyone using the LinearEmbeddingMatrix. This is going to be moved from SingleCellExperiment to BiocSingular unless someone speaks up. Last call will be on the BioC-devel mailing list.

Loyal (20:07:40): > We use it (loosely but more is planned) in ProjectR, but I have no problem getting it from BiocSingular if it’s moving. Thanks for the heads up.

2019-09-09

Elana Fertig (09:27:33): > We use it for CoGAPS

Elana Fertig (09:28:01): > it would be nice not to lose it — we went to a lot of work to code it to be compliant with it

Tim Triche (10:24:05): > @Elana Fertigit appears just to be moving from SingleCellExperiment to BiocSingular

Elana Fertig (10:51:36): > not sure how much work that’ll be — is there an option to deprecate slowly?

Martin Morgan (11:10:39): > FWIW the ‘best practice’ is to deprecate in the current bioc-devel (preferably early in the devel cycle) and then make defunct the next cycle, so that developers have almost 12 months from first warning to dire consequences… http://bioconductor.org/developers/how-to/deprecation/

Davide Risso (12:03:57): > Is there a compelling case for moving LEM? If not, leaving it in SCE could be a way for Elana et al to avoid depending on one more package…

Aaron Lun (12:08:35): > Because the LEM has nothing to do with an SCE.

2019-09-10

Elana Fertig (07:24:53) (in thread): > Thanks!

Elana Fertig (07:25:16): > There is definitely a logic to that….

Elana Fertig (07:26:40): > If it’s going to be moved, though, can we generalize it? Right now, as the class BiocSingular is written, it’s focused on SVD. It would be nice to have the class be a hub for more factorizations

Elana Fertig (07:27:52): > Eg CoGAPS and projectR aren’t limited to SVD. It made sense to adopt the linear embeddings class in single cell experiment since it was generalized. It’ll be confusing why we did that and annoying to redo all that infrastructure again for an SVD based class.

Aaron Lun (11:28:28): > I don’t really care where it goes, as long as it’s not in the SCE.

Tim Triche (11:34:55): > so… carve out a separate package to house it? maybe call it LEM?

Friederike Dündar (11:36:43): > stanislaw

Davide Risso (11:43:08): > Just to be clear, the class definition will stay the same, it will just get moved to a different package. (If I understand correctly) So no need to rewrite anything based on SVD on your part Elana.

Davide Risso (11:44:20): > I can see why Aaron doesn’t want it in SCE, but it looks like LEM might be more general than the BiocSingular package?

Davide Risso (11:44:40): > A dedicated package means yet another package to maintain though…

Federico Marini (12:10:39): > moonlander?

Federico Marini (12:10:45): > apollo?

Federico Marini (12:11:47): > > > available::available("apollo") > Urban Dictionary can contain potentially offensive results, > should they be included? [Y]es / [N]o: > 1: n > ── apollo ──────────────────────────────────────────────────────────────────────────────────────── > Name valid: ✔ > Available on CRAN: ✖ > Available on Bioconductor: ✖ > Available on GitHub: ✔ > Abbreviations: http://www.abbreviations.com/apollo > Wikipedia: https://en.wikipedia.org/wiki/apollo > Wiktionary: https://en.wiktionary.org/wiki/apollo > Sentiment:??? >

Tim Triche (16:13:12): > so I am still getting this irritating error as of 5 minutes ago, with updated everything: > > Error in if (objectVersion(object) < "1.7.1") { : > argument is of length zero > > This is with a SingleCellExperiment that I created 5 minutes ago from TCGA single-cell data with:

Tim Triche (16:13:41): > > runs <- lapply(scDirs, read10xCounts, col.names=TRUE) >

Tim Triche (16:14:16): > I looked up the method, and it expects to see an element version in @int_metadata

Tim Triche (16:14:21): > > #' @export > setMethod("objectVersion", "SingleCellExperiment", function(x) { > int_metadata(x)$version > }) >

Tim Triche (16:14:52): > But what I find instead is:

Tim Triche (16:15:01): > > grep("version", names(int_metadata(scRNA)), value=TRUE) > [1] "508084.version" "548327.version" > [3] "721214.version" "782328.version" > [5] "809653.version" "ND_083017.version" > [7] "ND_090617.version" "Normal_sorted_170531.version" > [9] "Normal_sorted_170607.version" >

Tim Triche (16:15:06): > This seems like a bug.

Tim Triche (16:16:13): > A kludge fixes it, though:

Tim Triche (16:16:47): > > int_metadata(scRNA)$version <- Reduce(unique, int_metadata(scRNA)[versions]) > > show(scRNA) > class: SingleCellExperiment > dim: 32991 109626 > metadata(0): > assays(1): counts > rownames(32991): ENSG00000000003 ENSG00000000005 ... ENSG00000283118 > ENSG00000283125 > rowData names(2): ID Symbol > colnames(109626): 508084_AAACCTGAGAAACGCC_1 508084_AAACCTGAGAGTGAGA_1 > ... Normal_sorted_170607_TTTGTCATCTAGAGTC_1 > Normal_sorted_170607_TTTGTCATCTTACCTA_1 > colData names(18): UPN Sample ... AML FLT3.ITD > reducedDimNames(0): > spikeNames(0): > altExpNames(0): >

Tim Triche (16:17:28): > Somewhere there is a logical mismatch between what is written into int_metadata and what is expected to be written into there.

Aaron Lun (16:17:45): > It’s caused by a cbind/rbind at some point.

Tim Triche (16:18:12): > I figured as much. Can the kludge be added as a fallback, in the event all versions agree?

Tim Triche (16:18:26): > If versions do not agree, then there is a different sort of problem.

Aaron Lun (16:19:04): > Well, it doesn’t really matter as both functions run updateObject anyway.

Tim Triche (16:19:23): > updateObject does not succeed if int_metadata(sce)$version is undefined.

Tim Triche (16:19:27): > That’s the root of all this.

Aaron Lun (16:20:11): > No, what I mean is that at some point in the creation of sce, there was some r/cbinding done.

Aaron Lun (16:20:50): > Both functions run updateObject before actually doing the binding, so there is never any problem with out of date versions.

Aaron Lun (16:21:33): > The real problem is that c() on named lists seems to concatenate the outer name onto the name of each value.

Aaron Lun (16:21:36): > > c(A=list(B=13), C=list(B=2312)) >
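The renaming described above can be reproduced in base R. The sketch below uses plain character strings as stand-ins for the version entry stored in int_metadata (an assumption for illustration), and `collapse_versions` is a hypothetical helper showing the "collapse if all versions agree" fallback, not the fix that went into SingleCellExperiment.

```r
# c() on named lists prefixes each outer name onto the inner name,
# so the combined list no longer has an element called "version".
merged <- c(A = list(version = "1.7.1"), B = list(version = "1.7.1"))
print(names(merged))            # "A.version" "B.version"
print(is.null(merged$version))  # TRUE

# Comparing NULL yields logical(0), and if() rejects a zero-length condition.
err <- tryCatch(if (merged$version < "1.7.1") "old" else "ok",
                error = function(e) conditionMessage(e))
print(err)

# Hypothetical fallback: collapse the prefixed entries when they all agree.
collapse_versions <- function(meta) {
  hits <- grep("\\.version$", names(meta))
  if (length(hits) == 0L) return(meta)
  vals <- unique(unname(meta[hits]))
  if (length(vals) != 1L) stop("versions disagree; refusing to collapse")
  meta[hits] <- NULL          # drop the prefixed copies
  meta$version <- vals[[1]]   # restore a single top-level entry
  meta
}

fixed <- collapse_versions(merged)
print(fixed$version)
```

If the prefixed versions disagree, the helper stops rather than guessing, which matches the point that disagreement would be a different sort of problem.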

Tim Triche (16:21:53): > ugh, you’re right

Aaron Lun (16:22:02): > cbind/rbind will also combine int_metadata in a manner consistent with SE’s cbind and rbind

Tim Triche (16:22:09): > that said, the kludge still fixes the resulting problem

Aaron Lun (16:22:16): > this had the unfortunate effect of also doing whatever this was doing.

Aaron Lun (16:24:42): > You can try the GH version and see if it fixes your problem.

Aaron Lun (17:11:24): > Fixed and tested. 1.7.9.

Tim Triche (17:29:13): > ohhh

Tim Triche (17:29:15): > ok thanks

Tim Triche (17:29:20): > will that go into devel tonight?

Aaron Lun (17:29:40): > Yes.

Tim Triche (17:29:45): > I can install the Github version in the meantime. Thanks much.

2019-09-16

Joan (10:14:39): > @Joan has joined the channel

Joan (10:26:36): > does anyone have an idea of how to remove batch effects from ADT data? I have looked at Aaron’s pipeline for removing batch effects using batchelor (https://bioconductor.org/packages/release/workflows/vignettes/simpleSingleCell/inst/doc/batch.html#6_controlling_the_merge_order), but didn’t find anything on ADT data. Can we apply this process to the ADT data too? @Aaron Lun

Aaron Lun (11:29:46): > This should be discussed at #sc-batch-correction

Joan (11:49:08): > ok, I will go there, thanks Aaron~

2019-09-27

Aaron Lun (01:12:19): > <!channel> If anyone was using scater::logNormCounts and relying on the normalized values in the alternative experiments, I have changed the default to use_altexps=FALSE. This is to be more forgiving in cases where people have alternative experiments but are just trying to do things with the main data; all-zero counts for a cell in the alternative experiments cause logNormCounts to break (correctly) when use_altexps=TRUE, which is annoying when you don’t care about the alternative experiments for a particular analysis.

Aaron Lun (01:12:36): > This is active as of 1.13.22.

Elana Fertig (09:24:20) (in thread): > Yes!

2019-09-28

Aaron Lun (14:58:34): > How do people use CITE-seq data for cell-based analyses? Do they stick them together with the genes and do clustering altogether, or is there something more sophisticated?

Aaron Lun (14:59:45): > Wondering if I should write a combineAltExps function for SCE.

2019-09-30

Joan (09:06:42): > we cluster the RNA and ADT data separately, but cross reference the clustering result

Aaron Lun (15:02:38): > Sounds like there’s a bunch of utilities there that could be thrown into a package if someone wants to get the ball rolling.

2019-10-02

Kellie Kravarik (18:55:43) (in thread): > The Satija lab has been presenting a method to weigh the CITEseq/RNA data together at conferences – most recently SCG 2019 last week. I’m not sure it’s really better – I will let the community argue about that for a while – but it does make a pretty picture that aligns with canonical antibody based cell classification very well.

2019-10-04

Aaron Lun (22:32:16) (in thread): > Interesting. I think it would be a useful test case to try out some of BioC’s multi-modality methods (e.g., MOFA).

2019-10-07

Lauren Hsu (09:41:32): > @Lauren Hsu has joined the channel

2019-10-14

Kathy Sivils (15:32:39): > @Kathy Sivils has joined the channel

2019-10-23

Tokuwa Kanno (15:16:59): > @Tokuwa Kanno has joined the channel

2019-10-27

Sean Davis (11:03:05): > Single cell transcriptomics DREAM Challenge with spatial component. https://www.synapse.org/#!Synapse:syn15665609/wiki/582909 - Attachment (synapse.org): Synapse | Sage Bionetworks > Synapse is a platform for supporting scientific collaborations centered around shared biomedical data sets. Our goal is to make biomedical research more transparent, more reproducible, and more accessible to a broader audience of scientists. Synapse serves as the host site for a variety of scientific collaborations, individual research projects, and DREAM challenges.

Hervé Pagès (18:03:29): > Looks like the challenge was in 2018?

Avi Srivastava (20:28:12): > :slightly_smiling_face: I think Sean meant this https://www.synapse.org/#!Synapse:syn20692755/wiki/595096, ?? the problem statement looks v interesting! - Attachment (synapse.org): Synapse | Sage Bionetworks

Avi Srivastava (20:29:31): > oh my bad may be as he said spatial component.

Sean Davis (20:39:47): > @Hervé Pagès and @Avi Srivastava, I meant to point to the spatial aspect of the 2018 DREAM Challenge. I know there has been some interest in working on spatially-resolved data and thought that a DREAM challenge dataset might be of interest.

Sean Davis (20:39:55): > Sorry for the confusion….

Sean Davis (20:42:25) (in thread): > Tagging @Vince Carey @Stephanie Hicks….

Vince Carey (20:44:24): > Right, we have an “emerging topic” on spatial transcriptomics and this dataset could be a nice exemplar – I have not tried to grab it yet, if you have it in a bucket or can post a link it could accelerate the process. @Stephanie Hicks, comments welcome.

Vince Carey (20:46:11): > @Avi Srivastava, new lineage problem also looks interesting … form a team?

Elana Fertig (21:28:14): > we are doing this for the multi-omics workshop we are hosting with @Aedin Culhane and Kim-Anh Le Cao

Elana Fertig (21:28:44): > link to one spatial dataset is here

Elana Fertig (21:28:46): > https://github.com/BIRSBiointegration/Hackathon/tree/master/seqFISH

Avi Srivastava (21:47:51) (in thread): > Oh yea sure, although I don’t have much experience in this area but happy to contribute in any meaningful way.

Avi Srivastava (21:49:19): > Another droplet based single cell spatial data https://github.com/tudaga/NMFreg_tutorial/blob/master/NMFreg_Tutorial_cerebellum_puck180430_6.ipynb, as easy as git pull. Companion paper https://science.sciencemag.org/content/363/6434/1463

2019-10-28

Sean Davis (06:26:15): > [OFFTOPIC]: If anyone would be willing to start a section on spatial transcriptomics to https://github.com/seandavi/awesome-single-cell, it would be much appreciated!

Sean Davis (06:28:55) (in thread): > https://twitter.com/seandavis12/status/1188762760513019904 - Attachment (twitter): Attachment > ROGUE: an entropy-based universal metric for assessing the purity of single cell population https://www.biorxiv.org/content/10.1101/819581v1 > “determination & annotation of cell subtypes is often subjective and arbitrary…Here we present an entropy-based statistic…to quantify the purity of…clusters.” https://pbs.twimg.com/media/EH9U4LKWkAE_bBN.jpg

Sehyun Oh (11:08:44): > @Sehyun Oh has joined the channel

Aaron Lun (22:33:30): > I swore we had a visualization channel, but I’ll just post it here.

2019-10-29

Aaron Lun (00:14:56): > A himawari plot (because sunflowerplot was already taken). Each row is a cell type, each column is a gene, and the color represents the log-fold change from the average. So far, this is identical to a normal heatmap - but check out the petals, which represent the proportion of cells in which a gene is detected. - File (PNG): test.png

Aaron Lun (00:16:36): > The aim of this test is to see if we can get an alternative to those ubiquitous dot plots, which make me anxious because of the complex interactions between the size, background color and the color scale.

Aaron Lun (00:23:29): > I had some plot ideas based on the game of craps, using the number of chips to represent the proportion of cells, but I haven’t tried to implement it. Mostly because of the proposed name of the plotting function.

Aaron Lun (00:57:35): > Ugh, and googling “himawari plot” gives me all these anime references. Or maybe google just knows what I usually look for.

Aaron Lun (00:57:42): > Damn all this machine learning!

Peter Hickey (00:59:06): > perhaps 10 petals instead of 12? > fwiw i also get anime without the google history

Aaron Lun (01:03:58): > well, I guess we can be pretty sure that no one has this name already.

Aaron Lun (01:04:14): > Someone should start a visualization channel, if we didn’t have one already.

Aaron Lun (02:27:29): > Top markers from the Zeisel dataset. - File (PNG): test.png

Federico Marini (04:55:31): > Maybe it is worth starting a #coolviz channel?

Federico Marini (04:55:42): > I like the looks, Aaron!

Avi Srivastava (09:06:36) (in thread): > Oh Nice !

Rob Amezquita (10:15:00): > this is adorable

Rob Amezquita (10:15:39): > but yep, also getting lots of anime

Rob Amezquita (10:16:23): > https://www.indifferentlanguages.com/words/sunflower - Attachment (indifferentlanguages.com): Do You Know How to Say Sunflower in Different Languages? > Looking for ways to say sunflower in other languages? Check out our list for saying sunflower in different languages. Be ready to meet a foreign friend!

Avi Srivastava (10:41:55) (in thread): > How about making the color of the petals the same all round? It would just make the lighter color petals show up more brightly and save some squinting.

Aaron Lun (11:35:54) (in thread): > That was my first version, but it didn’t look great to me.

Aaron Lun (14:38:59): > #cool-vis

2019-10-30

Stephanie Hicks (21:46:51) (in thread): > done :upside_down_face: https://github.com/seandavi/awesome-single-cell/pull/170

2019-10-31

Sean Davis (07:15:29): > https://github.com/seandavi/awesome-single-cell/pull/170#issuecomment-548321963

Ambrose Carr (11:13:03): > @Ambrose Carr has joined the channel

Stephanie Hicks (11:58:40): > haha! Thank you @Sean Davis!

2019-11-04

Izaskun Mallona (07:58:00): > @Izaskun Mallona has joined the channel

2019-11-05

Michelle Miron (13:40:32): > @Michelle Miron has joined the channel

2019-11-07

Kevin Blighe (11:27:04): > @Kevin Blighe has joined the channel

2019-11-08

Alan O’C (08:22:47): > @Alan O’C has joined the channel

2019-11-10

Charlotte Soneson (10:02:09): > You can have just two dimensions, but the number of rows in the reduced dim needs to be the same as the number of columns in the SCE (i.e., one per cell).
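A minimal sketch of that constraint, assuming the SingleCellExperiment package is installed; the toy counts and the names "toy" and "bad" are made up for illustration.

```r
library(SingleCellExperiment)

# Toy data: 10 genes (rows) by 6 cells (columns).
counts <- matrix(rpois(60, lambda = 5), nrow = 10, ncol = 6)
sce <- SingleCellExperiment(assays = list(counts = counts))

# Two-dimensional embedding with one row per cell: accepted.
ok <- matrix(rnorm(ncol(sce) * 2), nrow = ncol(sce), ncol = 2)
reducedDim(sce, "toy") <- ok
print(dim(reducedDim(sce, "toy")))

# One row per gene instead of one per cell: the setter should refuse it.
bad <- matrix(rnorm(nrow(sce) * 2), nrow = nrow(sce), ncol = 2)
res <- tryCatch({ reducedDim(sce, "bad") <- bad; "accepted" },
                error = function(e) "rejected")
print(res)
```

The number of columns of the embedding is free; only its row count is tied to the cells.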

2019-11-16

Martin Morgan (10:56:05): > I’ve been working on a small package https://github.com/mtmorgan/HCAmtxzip to import the pre-built ‘mtx.zip’ Human Cell Atlas datasets available from https://data.humancellatlas.org/explore/projects (the result from clicking on the download button under the ‘Matrix’ column). The data could be downloaded as .loom and then read using LoomExperiment (https://bioconductor.org/packages/LoomExperiment), but I’m interested in the mtx.zip archives anyway… > > The count data from each experiment are big enough to want to represent them as a sparse matrix, but not so big that they need to be represented on disk. > > How should I store the count data? Currently I use Matrix::readMM() to read the mtx files, creating a dgTMatrix class. But should I be aiming for dgCMatrix or for HDF5Array, or…? Looking for maximum interoperability within the Bioconductor single cell ecosystem. - Attachment (Bioconductor): LoomExperiment > The LoomExperiment class provide a means to easily convert Bioconductor’s

Aaron Lun (20:04:55): > a dgCMatrix is usually the better way to go. dgT’s would work but are less efficient (and get silently converted to dgC’s inside most Matrix functions anyway). Hopefully there’s not too many non-zero entries to pop the max integer limit.
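A small sketch of the dgT-to-dgC conversion, using only the Matrix package. The triplet matrix is built by hand with spMatrix() as a stand-in for what readMM() returns; the toy values are made up.

```r
library(Matrix)

# Triplet (dgTMatrix) form: one (row, column, value) record per non-zero entry.
trip <- spMatrix(nrow = 4, ncol = 3,
                 i = c(1L, 3L, 4L, 2L),
                 j = c(1L, 1L, 2L, 3L),
                 x = c(5, 2, 7, 1))
print(class(trip))

# Column-compressed form, the representation most Matrix code works on.
csc <- as(trip, "CsparseMatrix")
print(class(csc))

# Same values either way; only the storage layout differs.
same <- all(as.matrix(trip) == as.matrix(csc))
print(same)

# The number of stored non-zero entries has to stay below the integer limit.
print(length(csc@x) < .Machine$integer.max)
```

Converting once, right after import, avoids repeated implicit conversions later on.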

2019-11-18

Sean Davis (11:03:34): > https://twitter.com/seandavis12/status/1196435524556349440 - Attachment (twitter): Attachment > The New York Genome Center is hosting an @NCBI #SingleCell in the #cloud code-a-thon from January 15-17, 2020. > Submissions for project proposals are due December 2nd. > https://support.nlm.nih.gov/Single_Cell_Codeathon_Project_Proposal/ > @DCHackathons @nygenome > #bioinformatics #RNASeq https://pbs.twimg.com/media/EJqXwlBXsAAFCJu.jpg

2019-11-19

Mohamed Gunady (12:11:27): > @Mohamed Gunady has joined the channel

2019-11-20

Siyuan Ma (10:03:45): > @Siyuan Ma has joined the channel

Nolan Nichols (12:01:39): > @Nolan Nichols has joined the channel

Russ Bainer (12:02:40): > @Russ Bainer has joined the channel

Pratima Chennuri (21:27:16): > @Pratima Chennuri has joined the channel

2019-11-26

Sean Davis (08:26:53): > I thought this looked interesting: https://www.biorxiv.org/content/10.1101/853457v1

2019-12-01

Aaron Lun (19:47:08) (in thread): > So @Hervé Pagès can I get this or what?

2019-12-02

Hervé Pagès (02:08:56) (in thread): > Absolutely young man but you need to watch your manners. I’ll rename make_zero_col_DataFrame -> make_zero_col_DFrame and export it tomorrow.

Aaron Lun (02:11:11) (in thread): > :konata:

Hervé Pagès (11:40:10) (in thread): > Done in S4Vectors 0.25.3

Aaron Lun (11:41:04) (in thread): > :+1:

Hervé Pagès (19:08:41): > @Davide Risso It was brought to my attention that zinbwave doesn’t support dgCMatrix or HDF5Matrix as input. Are there plans to support this at some point? I’d be happy to help with this.

2019-12-03

Davide Risso (09:48:55): > Hi @Hervé Pagès, yes this is on my radar, but in the context of a bigger refactoring of zinbwave to make it more scalable, so it may take a while

Davide Risso (09:49:39): > if people think that this is urgent enough, I can of course start by supporting dgCMatrix and HDF5Matrix objects before the other changes

2019-12-04

Benjamin Reisman (09:27:50): > @Benjamin Reisman has joined the channel

2019-12-05

Juan Ojeda-Garcia (11:18:03): > @Juan Ojeda-Garcia has joined the channel

Aedin Culhane (17:14:13): > @Davide Risso We started refactoring mogsa::mbpca to work with svds and support sparse matrices. Would love your advice on the best approach

Hervé Pagès (17:38:15) (in thread): > Not urgent. Was just curious. Thx for the update.

2019-12-06

Somesh (12:21:15): > @Somesh has joined the channel

2019-12-10

Robert Ivánek (05:40:51): > @Robert Ivánek has joined the channel

Chris Vanderaa (07:41:31): > @Chris Vanderaa has joined the channel

Camille BONAMY (11:57:46): > @Camille BONAMY has joined the channel

2019-12-11

Milan Malfait (10:16:30): > @Milan Malfait has joined the channel

Elisabetta Mereu (11:19:57): > @Elisabetta Mereu has joined the channel

2019-12-16

Aedin Culhane (12:33:23): > If you missed last month’s Human Cell Atlas General Meeting in Spain, 10-11 October, or you would like to rewatch talks, the playlist of videos is here: https://www.youtube.com/playlist?list=PLkef4SGmngdYmuzT4cJvpueHk2pFHJ4em

2019-12-17

Jean Yang (02:41:58): > @Jean Yang has joined the channel

Ellis Patrick (15:44:33): > @Ellis Patrick has joined the channel

2019-12-20

Domenick Braccia (08:36:38): > @Domenick Braccia has joined the channel

Paul Harrison (18:00:28): > @Paul Harrison has joined the channel

2019-12-22

Sara Fonseca Costa (16:08:29): > @Sara Fonseca Costa has joined the channel

2019-12-23

dylan (10:36:55): > @dylan has joined the channel

2019-12-25

Aaron Lun (02:06:04): > scater 1k commits.

2020-01-13

Lori Shepherd (13:05:30): > @Lori Shepherd has joined the channel

Lori Shepherd (13:24:24): > Hello <!channel> - For other projects (AnVIL) we have been investigating creating dockstore tools related to Bioconductor packages and workflows (https://dockstore.org). We were wondering if anyone might be interested in creating a “tool” that takes raw files and creates a SingleCellExperiment object - there is already some work on analysis of the SingleCellExperiment object, but actually creating the object from the raw files seems like it could be a useful tool to create.

Aaron Lun (13:26:24): > From what? scRNAseq has loads of these Rmarkdown files that could probably be converted to tools. Problem is that it deals with a very heterogeneous format so no two raw datasets are the same. Even when they’re from the same lab.

Vince Carey (13:28:56): > Agreed, we’d want to get clear on the upstream elements to be preprocessed/organized into SCE, from different protocols. Does it make sense to consider a family of BAM files, like the RNAseqData.HNRNPC.chr14 … or would one need to deal with FASTQ …

Vince Carey (13:33:08): > Presumably for something like the Tasic data in scRNAseq, we’d get something set up from https://www.ncbi.nlm.nih.gov/sra?term=SRP061902 to represent the raw data and these would be inputs to the proposed tool.

Aaron Lun (13:33:38): > The FASTQs? Or the count matrix?

Vince Carey (13:34:22): > Personally I think it makes sense to work from the most primitive data type available.

Aaron Lun (13:34:47): > Sure, but users should know to expect it to be harder to reproduce the published results.

Vince Carey (13:38:33): > Yes, if I understand Lori’s interest, we are focusing on implementation of scalable methods inside a (somewhat generally unfamiliar) environment where raw data are readily available but the transformations to analyzable quantities are not. Once a given workflow is set up in dockstore, the analysis based on that workflow is reproducible “by definition” because all the computational resources are in a docker container, and all the steps are in WDL. What we don’t have many examples of are WDL that take us from raw resources to the richly self-describing data containers like SCE, and a template for doing that from a given primitive resource could be repurposed conveniently for similar but independently performed experiments.

Lori Shepherd (14:28:12): > The idea came from that there is already a workflow started in AnVIL that mimics the singlecellexperiment vignette but it starts with the data provided in the package which is already nicely formatted into a SCE object - but it would be useful to provide guidance to go from the raw files to this object.

Martin Morgan (14:32:00): > The HCA recently published their ‘optimus’ pipeline https://data.humancellatlas.org/pipelines/optimus-workflow to dockstore https://dockstore.org/search?search=optimus (based on https://github.com/HumanCellAtlas/skylab/blob/965bf836973835b55c86b7c4059af63f13e8f23f/pipelines/optimus/Optimus.wdl ?); if this is close enough to ‘best practice’ then maybe it is a simple matter of adding / checking that the output is easily incorporated into SingleCellExperiment, and combining this with an example / standard Bioc-based analysis.

Kasper D. Hansen (14:46:27): > Having a pipeline from FASTQ files to SCE / count matrix would be really, really nice. We have always avoided getting into this space directly, but there is no doubt that it is likely to see heavy use.

Tim Triche (14:57:39): > we have one called streampipe

Tim Triche (14:58:01): > it takes an ENA archive as a URL and streams the reads into Kallisto/Bustools, then hands them off to scanpy and scvelo

Tim Triche (14:58:39): > I wrote a few shims to drag the derived trajectories, velocities, etc. into SCE by way of Seurat

Tim Triche (14:59:04): > and some added kludges perhttps://romanhaa.github.io/blog/paga_to_r/

Tim Triche (14:59:12): > I have a rotation student working on this now

Tim Triche (14:59:22): > we were going to send it off to F1000R at the end of his rotation

Tim Triche (15:00:04): > there is a set of normal bone marrows with bulk, scRNA, and CyTOF that we decided to target since we already had the outputs from it

Tim Triche (15:03:39): > but the point-and-shoot approach was the entire purpose of building it

Tim Triche (15:03:57): > the Optimus pipeline appears to be the opposite – download everything, run through STAR, do a bunch of other stuff

Tim Triche (15:04:34): > our goal was idiotically simple: point at some FASTQs on someone else’s server and recover counts (spliced and unspliced), trajectories, and velocities

Tim Triche (15:05:08): > some of the code is on GitHub, the rest needs to go up, but if this is what you’re after, that’s what we wrote. We use it all the time.

Tim Triche (15:05:36): > and by “we” I mean a joint effort from my lab and the core since I never seem to have any time to do anything anymore.

Tim Triche (15:06:52): > The workflow part of it was meant to showcase, OK you’ve got your trajectory-assembled velocity-estimated cells into an SCE, now compare the result of “early” merging/estimation with “late” merging/estimation (for e.g. trajectories since that was the part we wanted). That’s Brandon’s project.

Tim Triche (15:07:05): > But the pipe is already there.

Tim Triche (17:21:28): > it is encapsulated as a snakemake workflow, I have asked Brandon to put up the minimum viable product (MVP) that shoots out spliced and unspliced abundances for downstream usage… hopefully a decoupled (all hardcoded paths removed) version will be up before week’s end.

Tim Triche (17:22:02): > HCA Optimus: obtain all possible inputs and outputs, save to disk, also save ZARR files/metadata maps

Michael Love (17:23:21): > Just to throw in for Alevin we have Alevin output to tximeta which makes a SummarizedExperiment object with sparse assay data (and then we can wrap in SCE) and could toolify that (CC @Avi Srivastava)

Tim Triche (17:23:40): > streampipe: given a URL to a pair of FASTQ files, stream them through kbus and count the spliced/unspliced abundance per cell

Tim Triche (17:23:50): > I don’t see why it couldn’t do the same for Alevin

Tim Triche (17:26:35): > can Alevin handle a not-demultiplexed pair of FASTQs and allocate reads to barcodes midway through? I know you and Rob like to stream

Tim Triche (17:27:44): > for us we just wanted a pipeline that would allow us to quickly estimate velocity-informed trajectories

Tim Triche (17:30:08): > interesting, Caleb opened a ticket for RNA velocity for Alevin some time ago!

Tim Triche (17:30:50): > for kbus it’s simple, use one transcriptome index for spliced and another for unspliced (exons + readLength_overhang), I would assume that alevin can do the same?

Tim Triche (17:31:13): > it’s quite straightforward, we just feed the resulting estimates to scanpy and scvelo to reconstruct.

Michael Love (17:33:46) (in thread): > Let me ping @Avi Srivastava on that

Michael Love (17:34:34) (in thread): > Definitely work in progress. Maybe they can give update

Tim Triche (17:35:42) (in thread): > Avi self-assigned apparently? In any event, it would be nice to send things straight through to tximeta regardless of aligner. It’s just a better way forward

Tim Triche (17:37:43): > would you be willing to fingerprint a merged (PDX-friendly) human-mouse cDNA+ncDNA txome? It’s our default starting point for both spliced and unspliced

Michael Love (17:38:07) (in thread): > Yeah i think they may be able to give you progress report but ill let Avi handle it

Tim Triche (17:38:21): > since the identifiers are disjoint, we just concatenate. logic is as for ncDNA+cDNA > cDNA alone.

Tim Triche (17:39:00): > we can presumably feed tximeta on that basis.
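The "identifiers are disjoint, so just concatenate" logic can be sketched in base R. Everything here is a toy assumption: the IDs and sequences are made up, and `combine_txomes` is a hypothetical helper, not part of streampipe or tximeta.

```r
# Toy transcriptomes as named character vectors (made-up IDs and sequences).
human <- setNames(c("ATGGCC", "ATGTTT"), c("ENST_demo_1", "ENST_demo_2"))
mouse <- setNames(c("ATGGCA", "ATGCCC"), c("ENSMUST_demo_1", "ENSMUST_demo_2"))

# Hypothetical helper: concatenate only if no identifier occurs in both sets.
combine_txomes <- function(a, b) {
  clash <- intersect(names(a), names(b))
  if (length(clash) > 0L) {
    stop("shared identifiers: ", paste(clash, collapse = ", "))
  }
  c(a, b)
}

merged <- combine_txomes(human, mouse)
print(names(merged))
```

The explicit clash check makes the disjointness assumption visible instead of relying on it silently.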

Michael Love (17:39:42) (in thread): > So this has come up before and there are two sticking points: (1) does it work w GA4GH API (it may actually) and (2) it’s not clear to me how to deal w the seqnames

Michael Love (17:40:00) (in thread): > Is there an example of cross org seqnames done “correctly”

Tim Triche (17:40:02) (in thread): > ohhhhh good point

Tim Triche (17:40:09) (in thread): > genbank contigs

Tim Triche (17:40:22) (in thread): > that would be the canonical way to do it

Tim Triche (17:40:27) (in thread): > then there is no ambiguity

Tim Triche (17:42:22) (in thread): > per usual, ENSEMBL does it sensibly:

Tim Triche (17:43:21) (in thread): > >ENSMUST00000207772.2 cdna chromosome:GRCm38:16:59185219:59189928:-1 gene:ENSMUSG00000109020.3 gene_biotype:polymorphic_pseudogene transcript_biotype:polymorphic_pseudogene gene_symbol:Olfr197 description:olfactory receptor 197 [Source:MGI Symbol;Acc:MGI:3030031]

Tim Triche (17:43:33) (in thread): > the assembly is encoded into the contig

Tim Triche (17:47:06) (in thread): > >ENST00000355273.2 cdna chromosome:GRCh38:3:98282888:98283832:1 gene:ENSG00000197938.5 gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:OR5H2 description:olfactory receptor family 5 subfamily H member 2 [Source:HGNC Symbol;Acc:HGNC:14752]

Tim Triche (17:47:21) (in thread): > this is why we preferred ENSEMBL for Arkas
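To illustrate how the assembly travels with the contig in those headers, here is a base R sketch. `parse_ensembl_header` is a hypothetical helper, and it assumes the third space-separated field always has the form chromosome:ASSEMBLY:NAME:START:END:STRAND, as in the two headers quoted above (shortened here).

```r
# Hypothetical parser for Ensembl cDNA FASTA headers.
parse_ensembl_header <- function(header) {
  fields <- strsplit(sub("^>", "", header), " ", fixed = TRUE)[[1]]
  loc <- strsplit(fields[3], ":", fixed = TRUE)[[1]]
  list(transcript = fields[1],
       assembly   = loc[2],
       seqname    = loc[3],
       contig     = paste(loc[2], loc[3], sep = ":"))
}

mouse <- parse_ensembl_header(
  ">ENSMUST00000207772.2 cdna chromosome:GRCm38:16:59185219:59189928:-1 gene:ENSMUSG00000109020.3")
human <- parse_ensembl_header(
  ">ENST00000355273.2 cdna chromosome:GRCh38:3:98282888:98283832:1 gene:ENSG00000197938.5")

print(mouse$contig)  # "GRCm38:16"
print(human$contig)  # "GRCh38:3"
```

Qualifying the seqname with the assembly keeps a mouse chromosome and a human chromosome of the same name apart in a merged transcriptome.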

Tim Triche (17:49:48): > also, regarding coordinates and GA4GH – honestly they’re supposed to support HGVS, right? In which case the chromosome (contig) is supposed to be encoded into the assembly name, per (e.g.) https://varnomen.hgvs.org/recommendations/DNA/variant/complex/

Tim Triche (17:50:09): > (GA4GH was originally supposed to facilitate graph genome based variant mining)

Tim Triche (17:52:27): > so e.g. human chrX / X is supposed to be https://www.ncbi.nlm.nih.gov/nuccore/NC_000023.11 which is equivalent to GRCh38.p13:X

Tim Triche (17:54:15): > black 6 mouse chrX is https://www.ncbi.nlm.nih.gov/nuccore/NC_000086.7 which is equivalent to GRCm38.p6:X

Tim Triche (17:54:37): > so for reasonable transcriptomes there should never be a clash, either at transcript or contig level

Tim Triche (17:54:58): > spliced vs unspliced is another matter, not sure what to say about that one :wink:

Tim Triche (17:55:50): > anyways. I can have Brandon add on a default fingerprint for non-alevin spliced+unspliced to see if tximeta can play nice, which would rule.

Tim Triche (17:56:49): > (I’m still not sure how to fingerprint unspliced for velocity, but burn that bridge when we come to it eh)

Tim Triche (17:57:09) (in thread): > :thumbsup:

Michael Love (18:29:21) (in thread): > It will be hash based

Michael Love (18:29:36) (in thread): > At least thats been the discussion for more than a year now

Michael Love (18:30:12) (in thread): > Its better identifier for many reasons than a numeric id, esp since it can be computed
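The idea of an identifier that can be computed from the sequence itself can be sketched in base R. This is an illustration only: MD5 via tools::md5sum() is a stand-in, and the digest algorithm and normalization rules actually used by tximeta or GA4GH may differ. `seq_digest` is a hypothetical helper.

```r
# Content-derived identifier: the digest depends only on the sequence.
# tools::md5sum() hashes files, so the sequence is written to a temporary
# file without a trailing newline first.
seq_digest <- function(sequence) {
  path <- tempfile()
  on.exit(unlink(path))
  writeLines(toupper(sequence), path, sep = "")
  unname(tools::md5sum(path))
}

a <- seq_digest("ACGTACGT")
b <- seq_digest("acgtacgt")   # same sequence, different case
d <- seq_digest("ACGTACGA")   # one base changed

print(a == b)  # TRUE
print(a == d)  # FALSE
```

Anyone holding the same sequence can recompute the same identifier, which is what a numeric accession cannot offer.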

Michael Love (18:35:14) (in thread): > If there is interest in having a fastq -> Alevin -> tximeta -> SCE tool i can put some effort to it, but if we want to leverage optimus thats ok also

Avi Srivastava (23:36:31) (in thread): > Just to add, EMBL already have an alevin based pipeline integrated into their system https://github.com/ebi-gene-expression-group/scxa-droplet-quantification-workflow. As a developer of alevin, I can be biased but I think defining best practice in this respect might need more work. I’ll check with HCA people about the difference in the pipeline for DCA and gene expression groups.

Avi Srivastava (23:47:12) (in thread): > @Kasper D. Hansen I totally agree, and that was exactly our thought process when we published alevin. Moreover, alevin is not just about rapidly generating FASTQ -> count; it’s also about how one can model the whole process in a principled fashion, avoiding the bias generated by throwing away a significant number of gene multi-mapping reads, while reporting uncertainty in the form of inferential replicates, which to date no other tool accounts for. > To add more to the mix, once the cell/gene count matrix is generated, it’s also important to consider how much time / memory it takes to actually load the data for downstream analyses; at https://github.com/COMBINE-lab/EDS/tree/prior you can find various benchmarks we did comparing different formats.

Avi Srivastava (23:57:40) (in thread): > Yep, I think it’s relatively easy to do .

2020-01-14

Avi Srivastava (00:06:00) (in thread): > @Tim Triche let me start by saying very nice piece of work with streampipe! > However, I might need a little more context with non-demultiplexed pair of FASTQ: do we mean non-demultiplexed FASTQs based on the CBs of 1 experiment? If yes, then that’s exactly the use case alevin is designed for. Alevin consumes a pair of FASTQ files with intermixed CBs (for 1 library / experiment), performs CB whitelisting, transcriptome / genome alignment and UMI deduplication, and generates gene counts internally, all through one command. > However, when multiple experiments are combined the rate of CB collision increases drastically, and alevin is not designed to handle that. For example, the mouse 3 million cell HCA experiment is actually 164 separate experiments, which have to be quantified separately with alevin.

Kasper D. Hansen (07:24:31) (in thread): > I just read the github readme. Why? A 2x saving over loom without random access seems pretty uninteresting to me. It seems you’re interested in adding random access support. That will make it much more interesting of course.

Kasper D. Hansen (07:24:54) (in thread): > (This is about the file format)

Kasper D. Hansen (07:26:01) (in thread): > Anyway, don’t answer because I’m sure you have long term plans for the file format. I’m just commenting on what’s shipped.

Tim Triche (10:00:09) (in thread): > Perfect, that’s exactly what I was wondering. Yeah this seems quite pluggable then. If people want velocity fast, use kbus for now, if they want tximeta use alevin for now. Eventually the two can inter operate. Perfect

Tim Triche (10:39:16) (in thread): > :+1:

Tim Triche (10:40:05): > Actually I just realized how to do that (hash unspliced)

Tim Triche (10:40:26): > Will implement (ok Brandon will) & document

Tim Triche (10:41:19): > It makes mixed species easier too since the introns diverge more (if you’re going to hash the sequence)

Vince Carey (10:57:21) (in thread): > can you give references on this hashing concept?

Michael Love (11:13:06) (in thread): > GA4GH: https://www.ga4gh.org/news/ga4gh-releases-refget-api-for-accessing-genomic-reference-sequence-data/

Michael Love (11:13:32) (in thread): > tximeta’s use of hashing (in Bioconductor): https://www.biorxiv.org/content/10.1101/777888v1

Michael Love (11:14:40) (in thread): > GA4GH’s spec for representing variants:https://vr-spec.readthedocs.io/en/1.0/

Michael Love (13:09:50): > sounds interesting :slightly_smiling_face: maybe i’ll port us over to a new channel tho

Michael Love (13:10:35): > here’s a channel: #hash-multi-species

Tim Triche (13:23:28): > done

Tim Triche (13:24:21): > meanwhile we decided to stuff streampipe into a container (i.e. the snakemake repo stays up, but the default “get me some mouse marrow s/u” demo and transcriptome goes in a singularity container so that people can see it go and how to make it give them the same for any FASTQ URL)

Aaron Lun (14:12:12): > Second moving to another channel. Let me know if the SCE needs any extensions.

Tim Triche (14:47:21): > well, since you mentioned it

Tim Triche (14:49:02): > I wrote some shims to shove unspliced & velocity estimates into altExp slots

Tim Triche (14:49:32): > the trajectory computed by PAGA is more “interesting”

Tim Triche (14:49:44): > i’ve just shoved that into metadata

Tim Triche (14:50:53): > are there any people going straight from h5ad adata files to SCEs without using Seurat as an intermediary?

Aaron Lun (14:52:46): > Sounds like there’s room for another package to go directly from those files to SCEs. The SCE package itself has no concept of I/O, it just implements the class.

Aaron Lun (14:53:11): > I don’t know what the PAGA trajectory structure is, but if it’s not parallel to the columns, then the metadata is the best and only location.

Tim Triche (14:54:06): > good point, no reason to complicate matters. something like DropletUtils that pulls features into an SCE, but from adata (h5ad) instead of 10X matrix + cells style output

Aaron Lun (14:54:43): > Basically, yes. Once basilisk gets in, you can even write a package containing a suite of SCE-compatible wrappers around scanpy.

Tim Triche (14:54:51): > what does basilisk do

Aaron Lun (14:55:09): > easiest just to show https://github.com/LTLA/basilisk/blob/master/vignettes/motivation.Rmd

Tim Triche (14:56:09): > dear god

2020-01-16

Aaron Lun (00:06:56): > YES.

Aaron Lun (00:07:01): > I’VE DONE IT

Aaron Lun (00:07:22): > http://bioconductor.org/spb_reports/basilisk_buildreport_20200115213515.html#tokay2_check_anchor

Aaron Lun (00:12:56): > Am I great? Or the greatest?

2020-01-17

Alan O’C (08:41:44): > Is there an R wrapper for FI-tSNE and/or is there a plan to make one?

Aaron Lun (11:18:30): > As I mentioned on some other channel, I would imagine that the Rtsne maintainer would be pleased to get a PR on this.

Aaron Lun (11:19:10): > Doing so in this manner would also allow us to directly use it throughout BioC’s single-cell stack.

Alan O’C (11:23:54): > Cool, I must have missed that. Just checking to see if someone here is on it already. Unfortunately I think it may require refactoring the existing fitsne library rather than just wrapping it (or doing some major hacking)

Ahmet Kurdoglu (14:00:57): > @Ahmet Kurdoglu has joined the channel

2020-01-21

Qian Liu (14:45:47): > @Qian Liu has joined the channel

2020-01-22

Stephany Orjuela (05:24:12): > @Stephany Orjuela has joined the channel

Hervé Pagès (11:22:02): > What do the SCE experts in this channel think of a submission that relies on Seurat objects instead of SCE objects: https://support.bioconductor.org/p/127714/ (Too bad this discussion didn’t happen on the BioC-devel mailing list.) Would it be reasonable to ask them to use SCE objects or are there things that can only be done (or are easier to do) with Seurat objects? Feel free to chime in. Thanks!

Alan O’C (11:30:11): > (not an expert but) why submit to Bioconductor if you don’t depend on any Bioc packages and (from the sounds of it) won’t be a dependency for any?

Aaron Lun (11:33:24): > I would say that, if it’s a BioC package, and it’s a single-cell package, it MUST be able to take SCE inputs, off the bat, without any user intervention. Otherwise there is simply no point.

Lori Shepherd (11:33:47): > I was trying to stress the fact that they should alter it to, at minimum, be able to use Seurat objects and/or SCEs, but I don’t think that came across well enough ….

Alan O’C (11:40:39) (in thread): > actually, they depend on biomaRt

Aaron Lun (11:41:58): > Do you want me to chip in?

Aaron Lun (11:42:11): > Really this should be a BioC-devel mailing list conversation.

Vince Carey (11:49:18): > @Alan O’C one reason to submit to BioC would be to take advantage of the multiplatform build and distribute features that present some advantages relative to CRAN. Whether or not it really fits in with the ecosystem in a sound way will affect its chances of acceptance … The conversion vignette provided at Seurat (I linked to it over at #osca-review recently, lacking awareness of a more germane target) is surely useful but is it in a package and will it be maintained in sync with bioc releases?

Vince Carey (11:49:40): > https://community-bioc.slack.com/archives/GGR3WMFH8/p1579532050001500

Aaron Lun (11:50:02): > I have started conversations with the sceasy maintainers about getting their stuff into BioC. https://github.com/cellgeni/sceasy/issues/5

Vince Carey (11:50:04): > https://satijalab.org/seurat/v3.1/conversion_vignette.html - Attachment (satijalab.org): Satija Lab > Lab Webpage —

Alan O’C (11:52:06): > There’s a “download this file from dropbox” line in that vignette that makes me feel extremely uneasy

Aaron Lun (11:53:28): > I have a snarky comment about their engineering practices in general, but I’ll keep that to myself.

2020-01-29

sangsookim (04:33:51): > @sangsookim has joined the channel

sangsookim (04:39:41): > > library(scRNAseq) > sce <- ReprocessedFluidigmData() > Error in ReprocessedFluidigmData() : could not find function "ReprocessedFluidigmData" >

Alan O’C (04:44:08) (in thread): > > > sce <- ReprocessedFluidigmData() > snapshotDate(): 2019-10-22 > see ?scRNAseq and browseVignettes('scRNAseq') for documentation > downloading 1 resources > retrieving 1 resource >

Alan O’C (04:44:18) (in thread): > I think you’re using an old version of Bioconductor

Peter Hickey (04:44:28) (in thread): > also works for me. try BiocManager::valid()

Peter Hickey (04:44:39) (in thread): > and follow advice therein

Peter Hickey (04:45:23) (in thread): > think you’ll need to be on BioC 3.10 (this is reported in BiocManager::valid() or by BiocManager::version())

2020-01-31

sangsookim (00:20:48) (in thread): > Thank you very much. Upgrading to 3.10 solved the problem.
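
The version check suggested in this thread can be sketched in a few lines of R. This is only an illustration: the helper `needs_upgrade()` is a hypothetical name, not part of BiocManager, and the BiocManager calls are guarded so they only run in an interactive session where the package is installed.

```r
# Hypothetical helper: compare Bioconductor versions numerically.
# A plain string comparison would wrongly rank "3.9" above "3.10".
needs_upgrade <- function(current, required = "3.10") {
    package_version(as.character(current)) < package_version(required)
}

if (interactive() && requireNamespace("BiocManager", quietly = TRUE)) {
    current <- BiocManager::version()   # Bioconductor version in use
    if (needs_upgrade(current)) {
        message("Bioconductor ", as.character(current),
                " is older than 3.10; see BiocManager::install(version = \"3.10\")")
    }
    # BiocManager::valid() then reports packages that are out of date
    # or too new for that Bioconductor version.
}
```

Running `BiocManager::valid()` after the upgrade and following its advice, as recommended above, is what brings the installed packages in line with the release.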

Alan O’C (09:01:21): > Are the spike-in molecule counts available for any of the scRNAseq data that include ERCC? I don’t see them in coldata or metadata

Alan O’C (09:02:38): > I mean, I know they’re available if I go back to the papers and find out myself, I’m more asking if somebody was smart and nice enough to have already done so

Avi Srivastava (09:41:35): > https://github.com/ppapasaikas/ECCB2018_SC/blob/master/data/ERCC_conc.txt

Alan O’C (09:44:23): > That doesn’t tell me what volume was used in what volume of solution in each experiment

Aaron Lun (11:28:42): > I can confidently say that it’s not in the scRNAseq package. I know where the concentration lives in ArrayExpress for LunSpikeInData(), but I didn’t bother to put it in.

Alan O’C (11:57:53): > Cool just thought it was worth checking. Great resource btw

Alan O’C (11:58:18): > by which I mean thanks for the work in putting it together

Aaron Lun (11:58:26): > If you want to make a PR to add concentrations…

Alan O’C (12:00:55): > You’d add it to the Rmds in inst/scripts right?

Aaron Lun (12:02:24): > That’s correct. I would upload it probably as a separate object, possibly just a named numeric vector; this is because they will be parallel to the rowData of the altExps holding those spike-ins, not parallel to the main rowData itself. (Otherwise you would have to fill in everything else with NAs for concentrations.)

Aaron Lun (12:02:39): > If you have a dataset you’re interested in and want to add spike-in concentrations to, we could iterate over it.

Alan O’C (12:06:50): > I think there’s a couple that might be interesting but yeah would start with one, prob Zeisel
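
A minimal sketch of the "named numeric vector" idea discussed above, in base R. The concentration values are placeholders, not real data, and `spike_ids` stands in for what something like `rownames(altExp(sce, "ERCC"))` might return; the altExp name is an assumption.

```r
# Placeholder concentrations keyed by spike-in ID (values are NOT real data).
spike_conc <- c("ERCC-00002" = 100, "ERCC-00003" = 10, "ERCC-00004" = 1)

# Stand-in for the row names of the spike-in altExp (assumed).
spike_ids <- c("ERCC-00004", "ERCC-00002", "ERCC-00009")

# Align the vector to the spike-in rows; IDs without a known
# concentration become NA instead of silently shifting values.
aligned <- spike_conc[match(spike_ids, names(spike_conc))]
names(aligned) <- spike_ids
aligned
```

Because the vector is keyed by spike-in ID, it stays parallel to the spike-in rows only, and nothing needs to be padded with NAs for the endogenous genes.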

2020-02-04

koki (18:38:22): > @koki has joined the channel

2020-02-05

Will Macnair (06:35:33): > @Will Macnair has joined the channel

2020-02-07

Nitin Sharma (04:27:00): > @Nitin Sharma has joined the channel

2020-02-08

Aaron Lun (01:14:37): > <!channel> No action required, but https://github.com/drisso/SingleCellExperiment/pull/45 may be of interest to some.

GitHub (Legacy) (01:18:59): > was added to this conversation

Sridhar N (09:51:12): > with the current cellranger output there is a file that outputs umap co-ordinates (barcode,umap1,umap2). i was able to assign cell types to each barcode using an external tool.. i was wondering how to map gene expression to each celltype cluster? is there a straightforward way to do this, or do i have to create a 10x object and import it using sce?

Sridhar N (09:51:55): > > testumap <- ggplot(xx, aes(x = `UMAP-1`, y = `UMAP-2`)) + > geom_point(aes(colour = factor(`Cell Type`)), shape = 3) > testumap > - File (PNG): Screen Shot 2020-02-08 at 8.48.42 AM.png

Aaron Lun (13:03:51): > I don’t understand the problem. Just use scran::findMarkers or something similar.

Sridhar N (13:12:58): > well i was being over ambitious

Sridhar N (13:13:07): > ended up using seurat

USLACKBOT (13:14:50): > This message was deleted.

Aaron Lun (13:18:13): > excellent

Aaron Lun (13:19:06): > Anyway, @Sridhar N, why can’t you just run scran::findMarkers(<your log-expression matrix>, <your factor of cell types>)?

Aaron Lun (13:22:22): > Insofar as it is possible, all of the functions that I write that operate on an SCE can also operate on a naked matrix.

Aaron Lun (13:22:46): > This means that people who just have a matrix and don’t want to wrap it in an SCE can still use those functions.

Aaron Lun (13:23:43): > My personal peeve is with packages that require you to construct an object in order to use the functions; from a programming perspective, this annoys me to no end.

Aaron Lun (13:25:01): > I have to start at the bottom of their stack and work my way up, whereas what should really happen (and what I try to support) are “skybridges” between stacks, e.g., if you just have a reduced dimension result in a matrix and want to run clustering on that, you should be able to do so without constructing a SCE or Seurat or CellDataSet or whatever.
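
The "works on a naked matrix too" pattern described above could look roughly like this in S4. It is a sketch under stated assumptions: `groupMeans` and `ToyContainer` are hypothetical names made up for illustration, and the toy class merely stands in for an SCE so the example runs without Bioconductor installed.

```r
library(methods)

setGeneric("groupMeans", function(x, groups, ...) standardGeneric("groupMeans"))

# Core method: operates on a plain features-by-cells matrix.
setMethod("groupMeans", "matrix", function(x, groups, ...) {
    groups <- factor(groups)
    stopifnot(length(groups) == ncol(x))
    vapply(levels(groups),
           function(g) rowMeans(x[, groups == g, drop = FALSE]),
           numeric(nrow(x)))
})

# Toy container standing in for an SCE (hypothetical class). A real method
# would pull the matrix out with an accessor such as logcounts(x).
setClass("ToyContainer", representation(logcounts = "matrix"))
setMethod("groupMeans", "ToyContainer", function(x, groups, ...) {
    groupMeans(x@logcounts, groups, ...)   # thin wrapper around the matrix method
})

mat <- matrix(c(1, 3, 5, 7,
                2, 4, 6, 8), nrow = 2, byrow = TRUE,
              dimnames = list(c("geneA", "geneB"), paste0("cell", 1:4)))
cell_types <- c("T", "T", "B", "B")
res <- groupMeans(mat, cell_types)
```

The object method only extracts the matrix and delegates, so users holding a bare matrix (or a matrix from another stack) never need to build the container first.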

Jared Andrews (13:40:10) (in thread): > You should take a look at dittoSeq if you haven’t already. Dan’s got some very nice visualizations and code that you may be able to build upon.

Aaron Lun (13:50:01) (in thread): > I would like to see a whole bunch of options for SCE visualization. No problem with having multiple packages to do that as long as I can easily shop around with the base SCE currency.

Aaron Lun (14:11:34) (in thread): > If it becomes a Bioconductor package, then I’ll use it. If there’s enough such packages, then I can even write a chapter in the book about visualization, but that’s a big if.

Aaron Lun (14:14:05) (in thread): > Nagging @Dan Bunis about his submission.

Jared Andrews (14:21:39) (in thread): > Yeah, dittoSeq doesn’t do flow, so those are nice functions.

Dan Bunis (14:23:09) (in thread): > Thanks for the tag. I’ll take a look. dittoSeq is submitted as well. It’s also a scDataViz tool that does little other than making plots. It works with SCEs as well as Seurats + bulk RNAseq (as non-SCE SummarizedExperiments by the time it’s accepted, currently as a custom S4 that I’ve been requested to scrap).

Dan Bunis (14:24:11) (in thread): > But yea, not built towards cytometry data.

Sridhar N (14:30:32): > agree

Dan Bunis (14:31:40): > For Seurat, @Sridhar N, the equivalent functions are FindMarkers or FindAllMarkers. You’ll have to assign your clusters to Idents(object) first if you don’t already have them there.

Sridhar N (14:33:43): > ye that is what i did

Sridhar N (14:34:32): > my goal was to see expression of certain genes across various celltypes and groups

Dan Bunis (14:34:41): > I also agree that working with plain matrices is ideal. It ends up being quite different from self-contained objects in practice for many features, but is certainly something I’d plan to add into dittoSeq after it’s into bioc

Sridhar N (14:35:09): - File (PNG): Screen Shot 2020-02-08 at 1.33.48 PM.png

Sridhar N (14:35:25): > > color = grDevices::colors()[grep('gr(a|e)y', grDevices::colors(), invert = T)] > DotPlot(integrated, features = features, > group.by ="cell_types",split.by = "group", > cols = sample(color, 200)) + RotatedAxis() >

Aaron Lun (14:35:55): > Wait, is that RotatedAxis a Seurat function as well?

Sridhar N (14:36:50): > haha yes

Sridhar N (14:37:01): > https://satijalab.org/seurat/v3.0/visualization_vignette.html - Attachment (satijalab.org): Satija Lab > Lab Webpage —

Aaron Lun (14:37:04): > yegods

Aaron Lun (14:37:12): > Those guys really love reinventing the wheel

Dan Bunis (14:37:15): > Seurat did try to get fancy with some ggplot viz mods.

Sridhar N (14:37:21): > although it does not have an option to facet wrap based on group which will be ideal

Aaron Lun (14:37:25): > what was wrong with a coord_flip()?

Sridhar N (14:37:32): > ¯\_(ツ)_/¯

Sridhar N (14:37:48): > i might export the dotplot object and add a facet wrap thingy my self

Kevin Rue-Albrecht (14:37:53): > on the topic, did you see that one: https://cran.r-project.org/web/packages/ggeasy/index.html

Kevin Rue-Albrecht (14:40:00): > haven’t used it yet, but it looks convenient: https://twitter.com/carroll_jono/status/1223279115479830528 - Attachment (twitter): Attachment > I am so very excited to announce that {ggeasy} is now a CRAN package! Having trouble remembering how to tweak your {ggplot2} components? We got you covered. https://CRAN.R-project.org/package=ggeasy https://pbs.twimg.com/media/EPn130hU0AAcqVr.jpg

Aaron Lun (14:40:23): > All this viz stuff is definitely not my problem. The sooner I can deprecate all of scater’s viz commands, the better.

Dan Bunis (14:40:36): > You might be able to just + facet(...) @Sridhar N

Sridhar N (14:41:42): > like this > > color = grDevices::colors()[grep('gr(a|e)y', grDevices::colors(), invert = T)] > DotPlot(integrated, features = features, > group.by ="cell_types",split.by = "group", > cols = sample(color, 200)) + facet_wrap(~group) >

Sridhar N (14:41:43): > ?

Dan Bunis (14:42:43): > @Aaron Lun I do plan to get dittoSeq to the point where you can replace them. Assuming I get dittoSeq in this cycle, hopefully within the April - October cycle!

Aaron Lun (14:42:55): > Good, good.

Dan Bunis (14:43:24): > @Sridhar N I forget the exact syntax, but something like that is what I was thinking!

Aaron Lun (14:44:04): > A good exercise would be to go through the book and list the dittoSeq equivalent for each scater-related visualization call.

Sridhar N (14:44:20): > nah that does not work

Aaron Lun (14:45:55): > That’s probably because they pasted the group onto the cell label, so you don’t have a separate group in the data.frame anymore.
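
If the group really was pasted onto the label, one workaround is to split it back out into its own column before faceting. The sketch below assumes the labels look like `<cell type>_<group>` with the group after the last underscore; that separator and layout are assumptions, not something confirmed in this conversation, and the labels are made up.

```r
# Made-up labels of the assumed form "<cell type>_<group>".
plot_df <- data.frame(id = c("T_cell_ctrl", "T_cell_stim", "B_cell_ctrl"),
                      stringsAsFactors = FALSE)

# Everything after the last underscore is taken as the group ...
plot_df$group <- sub("^.*_", "", plot_df$id)
# ... and everything before it as the cell type (which may itself contain "_").
plot_df$cell_type <- sub("_[^_]*$", "", plot_df$id)
plot_df
```

With a separate `group` column in the data frame, a `facet_wrap(~group)` layer has something to facet on.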

Dan Bunis (14:48:47) (in thread): > Oh I’d gone through and seen nothing that could not be done. Just I will need to open up how my data.type input works to allow all the scater-related options. …a good reminder cuz that had fallen off my todo list

Aaron Lun (14:49:39): > At the end of the day, unfortunately, we generally have to fall back to raw ggplot to get exactly the figure that we want.

Dan Bunis (14:50:21): > For making dataframes yourself for this purpose, dittoSeq::gene and dittoSeq::meta may help.

Dan Bunis (14:51:35): > They aren’t truly necessary likely, but they standardize grabbing such data from either Seurat or SCE :man-shrugging:

Dan Bunis (14:57:57): > Allowing grabbing an extra gene or metadata column(s) into the data frame is also going into my dittoSeq todo list so that something like this would be easier.

Aaron Lun (16:37:57): > @Dan Bunis I also have a makePerCellDF and makePerFeatureDF function in scater to operate on SCEs. Are you depending on scater ATM? It would be nice if you could PR any useful things from your functions down into scater so that people can benefit from it at a lower level.

Dan Bunis (16:45:35): > Currently my depends are limited as I don’t want to force Seurat peeps to have SCE packages or vice versa.

Sridhar N (16:49:09): > is this the most recent detailed workflow for scater? https://f1000research.com/articles/5-2122

Aaron Lun (16:49:29): > Oh no, that’s ancient.

Aaron Lun (16:49:51): > The book is the workflow.

Aaron Lun (16:49:53): > https://osca.bioconductor.org/ - Attachment (osca.bioconductor.org): Orchestrating Single-Cell Analysis with Bioconductor > Online companion to ‘Orchestrating Single-Cell Analysis with Bioconductor’ manuscript by the Bioconductor team.

Sridhar N (16:50:12): > sweet ta!

Aaron Lun (16:50:27): > scater itself doesn’t have a “workflow”, per se, as the individual package documentation is mostly only useful for other developers.

Aaron Lun (16:50:37): > The vignette is somewhat helpful for end-users but not really because it can’t provide a lot of context.

Aaron Lun (16:50:54): > The book provides that context and how all of the different packages synergize with each other.

Aaron Lun (16:51:59): > Alright, I’m going to do some fun stuff before my night of pain.

Aaron Lun (16:52:25): > Try to add a colLabels dialect to SCE to make it easier to programmatically get/set labels.

Aaron Lun (16:52:40): > Because after that is a review

Aaron Lun (16:52:43): > and then TAXES.

Sridhar N (16:52:52): > haha

Aaron Lun (17:03:24): > @Dan Bunis In any case you can always have your gene just call scater::makePerCellDF, and then people who aren’t using SCEs don’t need to download scater (you can just have it in Suggests). And then we can just have one very good SCE->DF converter rather than two okay-ish ones.

Dan Bunis (17:10:18): > When I first made gene (and .which_data) and meta, dittoSeq wasn’t built for SCEs yet. I’ll always need them to have separate code for Seurats, but I could make them rely on scater::makePerCellDF for SCEs.

Aaron Lun (17:11:01): > That sounds like it could be a plan.

Aaron Lun (17:11:23): > It’s these kind of low-visibility, high-impact changes that keep the project moving.
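
The "keep scater in Suggests and only call it for SCEs" plan can be sketched as follows. `perCellData` is a hypothetical function name, and the matrix branch is only a stand-in for whatever Seurat-specific (or other) code a package would keep; only `scater::makePerCellDF` is taken from the discussion above.

```r
# Hypothetical converter: delegate to scater only when given an SCE.
perCellData <- function(object, ...) {
    if (inherits(object, "SingleCellExperiment")) {
        # scater is an optional (Suggests) dependency, so check at run time.
        if (!requireNamespace("scater", quietly = TRUE)) {
            stop("install 'scater' to use SingleCellExperiment inputs")
        }
        return(scater::makePerCellDF(object, ...))
    }
    if (is.matrix(object)) {
        # Stand-in for the non-SCE code path: features x cells -> one row per cell.
        return(as.data.frame(t(object)))
    }
    stop("unsupported input of class ", class(object)[1])
}

mat <- matrix(1:6, nrow = 2,
              dimnames = list(c("geneA", "geneB"), paste0("cell", 1:3)))
df <- perCellData(mat)
```

Users who never pass an SCE never trigger the `requireNamespace()` check, so they do not need scater installed.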

Aaron Lun (18:20:14): > Anyway, y’all got a week to comment on https://github.com/drisso/SingleCellExperiment/pull/45 before I just do it.

Aaron Lun (19:48:16): > @Dan Bunis we should spec out what you would like to get in the scater function before any PRs are made.

Aaron Lun (19:48:37): > Just procrastinating with this review, I really don’t want to read this ms.

Aaron Lun (19:56:41): > ARGH

Aaron Lun (19:56:50): > This review is just so boring.

Aaron Lun (20:02:53): > I just had an epiphany.

Aaron Lun (20:03:00): > WHY AM I DOING THESE REVIEWS FOR FREE???

Aaron Lun (20:03:12): > Ridiculous. I’m getting ripped off.

Aaron Lun (20:03:24): > I’ve decided. That’s the last review I ever do, ever.

Aaron Lun (20:03:44): > Well, okay, except for @Charlotte Soneson’s one, but that’s more of a code review.

Aaron Lun (20:04:53): > I feel better already.

Aaron Lun (20:20:14): > @Sridhar N if you’re interested in full control over all aspects of the visualization, then as I mentioned before, it is likely that you will have to dig under the hood. For example: > > library(scRNAseq) > library(scater) # load before calling logNormCounts() > library(reshape2) # for melt() > sce <- ZeiselBrainData() > sce <- logNormCounts(sce) > > ids <- colData(sce)[,c("level1class", "tissue")] # cell type and group. > my.genes <- sample(rownames(sce), 5) # pick 5 genes, or your markers, whatever. > > # Compute average log-count per tissue/cell type combination. > # Also compute proportion of cells with detected expression. > ave <- sumCountsAcrossCells(sce, ids=ids, exprs_values='logcounts', average=TRUE, subset_row=my.genes) > prop <- numDetectedAcrossCells(sce, ids=ids, average=TRUE, subset_row=my.genes) > > # Melt this into a long-form data.frame. > combined <- melt(assay(ave)) > colnames(combined) <- c("Gene", "Column", "Average") > combined$Proportion <- melt(assay(prop))[,3] > combined <- cbind(combined, colData(ave)[combined$Column,]) > > # Profit. > ggplot(combined, mapping=aes(x=Gene, y=level1class)) + > geom_point(mapping=aes(size=Proportion, colour=Average)) + > scale_color_gradient(low="white", high="red") + > facet_wrap(~tissue) >

Aaron Lun (20:21:45): > This creates a dot plot faceted by the group. Note the use of a mono-directional colour scale: this is deliberate, see comments in https://ltla.github.io/SingleCellThoughts/general/visualization.html.

Sridhar N (20:21:50): > +1

Sridhar N (20:22:28): > haha > > For anyone who doesn't know what I'm talking about, it's the Seurat-style dot plots here: >

Sridhar N (20:22:33): > good on ya mate

Aaron Lun (20:26:50): > IMO, the purpose of @Dan Bunis and @Kevin Blighe’s packages is to provide quick-and-dirty visualization calls that are good enough for interactive analysis and slapping together reports and presentations, but if you want something fine-tuned for publication, you’re going to have to bleed a little.

Aaron Lun (20:27:49) (in thread): > that’s a shame. I mean, it seems like you’re the only contributor.

Dan Bunis (21:18:25): > I also want to add that dittoSeq is certainly for generating plots quickly, but it actually does aim to get you (very close to) publication-ready plots when you want them. I have re-labeling / re-ordering, modifying titles, overlaying trajectory arrows onto UMAPs, and all sorts of other things built right in as simple inputs.

Dan Bunis (21:20:39): > It doesn’t have a DotPlot viz yet though, so even if I added the ability to throw extra data into the ggplot(data) dataframe tonight, I imagine that might not matter to you @Sridhar N?

Sridhar N (21:26:41): > dont sweat it

Sridhar N (21:26:47): > do it when you have time

Sridhar N (21:26:55): > i am hacking on stuff to make this work

Sridhar N (21:27:06): > but having it will be cool for broader audience

Aaron Lun (21:48:10): > @Dan Bunis I don’t doubt that, but I would imagine that when you say “publication-ready”, you mean “ready for your publications”. Which is fine and all, but it’s worth keeping in mind that convenient high-level functions impose the author’s opinion about how things should be done, and if people don’t like those choices and they want their plots to look “just so”, then they’ll have to work at a lower level. This is a tension that occurs everywhere but is most pronounced in visualization where every man and his dog has an opinion about how they want their plots to look.

Aaron Lun (21:49:18): > FYI grayscale FTW.

Dan Bunis (21:55:31): > Lol that’s valid

2020-02-09

Alan O’C (15:35:32) (in thread): > It’s not even coord_flip, it’s just rotating the labels XD > > > RotatedAxis > function (...) > { > rotated.theme <- theme(axis.text.x = element_text(angle = 45, > hjust = 1), validate = TRUE, ...) > return(rotated.theme) > } >

Aaron Lun (16:42:53) (in thread): > Oh dear.

2020-02-11

Theresa Alexander (15:36:31): > @Theresa Alexander has joined the channel

2020-02-12

Thanh Le Viet (09:42:34): > @Thanh Le Viet has joined the channel

2020-02-17

Aaron Lun (01:02:59): > It is done.

Arshi Arora (12:29:12): > @Arshi Arora has joined the channel

Andrew Skelton (17:55:03): > @Andrew Skelton has joined the channel

2020-02-19

Paula Beati (13:22:50): > @Paula Beati has joined the channel

Aaron Lun (14:19:16): > @Dan Bunis I have SOLVED the DF problem. The extreme branch of scater on GitHub now allows you to do this insane thing: > > ggcells(sce) + geom_point(aes(x=TSNE.1, y=TSNE.2, color=My_Gene_Of_Interest)) + facet_wrap(~some_colData_variable) > > It may not be obvious why this is insane, so I will leave that as an exercise for the reader. (Unless you want to cheat and look at the PR’s diff.)

Dan Bunis (14:21:42): > So… ggsce essentially does sce to giant, all inclusive, dataframe then outputs ggplot(data=sce)?

Aaron Lun (14:21:50): > Go on.

Dan Bunis (14:21:58): > That is indeed extreme. Love it.

Dan Bunis (14:22:50): > If only I could expect the same from Seurat, I’d be able to remove a lot of the internals from dittoSeq

Aaron Lun (14:23:03): > Now, you might say, gee Aaron, that’ll desparsify a sparse matrix! Or it’ll drag in all of the content in a HDF5Matrix! I can’t do that for my 1 million cell dataset!

Aaron Lun (14:23:22): > Then you might say to yourself, “wow, Aaron’s really lost it this time. He’s totally off his rocker.”

Aaron Lun (14:23:48): > And that’s where you would be only partially wrong.

Dan Bunis (14:24:07): > lol

Aaron Lun (14:24:14): > The solution is indeed crazy, but it certainly won’t realize your sparse or file-backed matrices.

Dan Bunis (14:29:36): > I will take a look at how you actually handled this when I get a chance. But that sounds awesome and can allow me to heavily simplify some of my code down the line. I currently remain in the “final push” for my paper.:crossed_fingers:that it’ll be submitted this week!

2020-02-20

Alan O’C (05:17:27): > Could you not subset the data before turning it into a giant DF? Or do you just have a vendetta against RAM

Aaron Lun (12:37:49): > That’s what I was doing before, and it sucked: > > stuff <- makePerCellDF(sce, "My_gene", "My_metadata", "Some_other_thing") > ggplot(stuff) + geom_point(aes(x=My_gene, y=My_metadata)) + facet_wrap(~Some_other_thing) > > You can see how I have to specify things twice to build this plot; not good.

Aaron Lun (12:38:20): > But now you can just do: > > ggcells(sce) + geom_point(aes(x=My_gene, y=My_metadata)) + facet_wrap(~Some_other_thing) > > without breaking sparsity or whatever.

Aaron Lun (12:39:46): > Major quality of life stuff, because now you can just keep adding layers without having to go back and modify the call to create the initial DF.

Alan O’C (12:45:15): > I was thinking ggcells(sce, aes(x=mygene, y=yourgene)) + geom_point() and only grab what’s in that mapping, but then I guess you could add another geom or stat that tries to grab another gene and completely break it so fair enough

Aaron Lun (13:03:47): > > library(TENxBrainData) > sce <- TENxBrainData() > library(scater) > df <- makePerCellDF(sce) >

Aaron Lun (13:04:30): > Just gonna say: > > > dim(df) > [1] 1306127 28002 > > And yet my laptop only has 16 GB of RAM. Ho ho ho.

Aaron Lun (13:04:45): > > > class(df) > [1] "data.frame" >

Alan O’C (13:12:13): > How have you squeezed an order of magnitude there?

Aaron Lun (13:13:01): > How indeed.

Alan O’C (13:20:53): > I’d be lying if I said I understood altrep or C++ templates, but I get the gist. Very cool!

Martin Morgan (13:41:09): > sounds interesting @Aaron Lun can you be a little less cryptic about what you did? I guess you’re talking about https://github.com/davismcc/scater/pull/103/commits?

Aaron Lun (13:54:28): > Hold on, let me fix that example: > > library(TENxBrainData) > library(scater) # for ggcells() > sce <- TENxBrainData() > rownames(sce) <- rowData(sce)$Ensembl > ggcells(sce, exprs_values="counts") + geom_point(aes(x=ENSMUSG00000089699, y=ENSMUSG00000102343)) > > This isn’t a very interesting plot, and it takes some time (blame ggplot and the file-backed HDF5Matrix) but it serves as a demonstration of the transparency of the solution.

Aaron Lun (14:26:53): > Anyway @Martin Morgan it’s all ALTREP’d vectors that are lazily extracted from the assay matrix.

Aaron Lun (14:27:19): > So we can have a DF full of those guys that don’t take up any space and are only materialized when requested by the later ggplot layers.

Aaron Lun (14:27:53): > In fact, the slowest part is making the DF in the first place; I had to do the class() <- "data.frame" hack to bypass the data.frame() constructor.

Aaron Lun (14:28:21): > Another fun fact is that the vectors extracted from HDF5Matrices are themselves ALTREPs. Et tu, @Hervé Pagès.
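
For readers unfamiliar with the class() <- "data.frame" hack mentioned above, here is a base R sketch of the general idea. It is not the scater implementation: `make_df_fast` is a hypothetical helper, and the columns here are ordinary vectors rather than lazily materialized ALTREP vectors.

```r
# Hypothetical helper: turn a named list of equal-length columns into a
# data.frame without going through the data.frame() constructor, which
# would inspect and possibly copy every column.
make_df_fast <- function(columns) {
    n <- length(columns[[1L]])
    stopifnot(all(lengths(columns) == n), !is.null(names(columns)))
    attr(columns, "row.names") <- .set_row_names(n)  # compact row names
    class(columns) <- "data.frame"
    columns
}

df <- make_df_fast(list(cell = seq_len(5), gene = c(0, 2, 1, 0, 3)))
dim(df)
```

Since only attributes are set, the column vectors are never touched, which is what would let lazily evaluated columns stay unmaterialized until a later step actually reads them.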

Kasper D. Hansen (15:06:53): > holy shit batman

Kasper D. Hansen (15:07:19): > I have only read the description here, but it sounds amazing

Federico Marini (15:12:01): > That’s serious ALTREP wizardry. That legal?!

Kasper D. Hansen (15:13:31): > oh yeah, that is for sure one purpose of ALTREP as I understand it

Ludwig Geistlinger (15:58:29): > @Ludwig Geistlinger has joined the channel

2020-02-21

Teun van den Brand (07:34:38): > @Teun van den Brand has joined the channel

JiefeiWang (09:43:22): > Hi @Aaron Lun, it is nice to see that ALTREP gets useful here, here is a closed issue that might be helpful: https://github.com/Bioconductor/Contributions/issues/1222

JiefeiWang (09:48:14): > Basically I have developed a package that enables users to develop ALTREP objects using just R code. Bioconductor guys and I had a discussion with R core developers on whether an ALTREP API can call an R function. Luke felt that it is a bad and unsafe idea, so I stopped developing this package and closed the issue.

JiefeiWang (09:54:11): > I am wondering if you will have the same issue. I found some R functions in your materialize function, possibly it can be safer to do them at a pure C++ level?

Aaron Lun (11:33:38): > @JiefeiWang yes I was thinking about your package when I started writing.

Aaron Lun (11:48:58): > I hope it might be okay. Maybe not. It can be made definitely safe for ordinary and sparse matrices; much harder to do so for others.

Aaron Lun (12:21:36): > For example, considering the HDF5Matrix class; I would have to create a LogNormMatrix class with a beachmat hook, then I would hope for HDF5Array to accept my long-standing PR. This would give a pure C++ solution (more or less) to the extraction of data from a log-normalized HDF5Matrix class.

JiefeiWang (12:57:47): > Sounds like an ambitious plan:thinking_face:

Aaron Lun (13:03:40): > Or, if I knew more about working with processes in C, I would create a fork to do all the potential GC-triggering stuff; dump the vector into a binary file; and then read it back in the parent process.

JiefeiWang (13:04:53): > well, that’s even harder than a PR..

JiefeiWang (13:05:24): > I would say if you do not use DATAPTR API (I guess in your example you did not, right?), the current implementation should be fine.

Aaron Lun (13:05:33): > If the process idea holds up, it would be more general than the HDF5 thing I mentioned above, which only works for a specific HDF5Matrix.

Aaron Lun (13:06:12): > Well, I’m not using DATAPTR, but the Dataptr method does call back into R, and reading Luke and Gabe’s comments suggests that will be a problem anywhere.

JiefeiWang (13:07:32): > another solution is to simply throw an error when dataptr is called, if you want 100% safety.

JiefeiWang (13:09:07): > that will limit the use case of your ALTREP object, but for your purpose I believe no one wants to load the entire data into their memory.

JiefeiWang (13:09:25): > so it should be fine.

JiefeiWang (13:11:08): > also it prevents users from burning their computers.

Aaron Lun (13:14:12): > Hold on. Are we talking about the same Dataptr? If you’re talking about https://github.com/davismcc/scater/blob/6bcd6f17e414f945211ed1f3ff41cd59a3861a50/src/altreps.cpp#L110-L112 then an error would basically break the entire usage pattern. Or are you talking about some kind of external dataptr?

JiefeiWang (13:15:33): > we are talking the same thing. I’m using cell phone, give me one second to bring my laptop.

JiefeiWang (13:20:38): > Here is your old example > > stuff <- makePerCellDF(sce, "My_gene", "My_metadata", "Some_other_thing") > ggplot(stuff) + geom_point(aes(x=My_gene, y=My_metadata)) + facet_wrap(~Some_other_thing) > > I believe the point here is to lazily load the data when it is required, so you can create a huge data.frame object without depleting your memory. Am I right?

Aaron Lun (13:21:36): > That was the old version, yes.

Aaron Lun (13:21:48): > The new version does away with the initial stuff call.

JiefeiWang (13:22:16): > So when the second line is executed, will the DATAPTR function be called?

Aaron Lun (13:22:51): > Most probably.

Aaron Lun (13:23:10): > I mean, that’s the whole point.

JiefeiWang (13:23:31): > I see, I originally thought your example would not touch DATAPTR at all.

JiefeiWang (13:24:48): > Actually, the ggplot function can call the other ALTREP APIs to get the data; it does not have to be DATAPTR (ideally)

Aaron Lun (13:26:48): > I wouldn’t know about how ggplot does things internally; I don’t think I would be able to rely on their use of one API or another.

JiefeiWang (13:27:11): > I agree

JiefeiWang (13:27:57): > Ok, my suggestion does not work in this case, at least not in this version of ggplot

JiefeiWang (13:29:37): > (Possibly we can make an issue to urge the author of ggplot to use the “safe” ALTREP API, not the “unsafe” DATAPTR)

Aaron Lun (13:30:49): > I don’t really think it makes a difference for me because all of my ALTREP functions just call into the same code that Dataptr uses anyway. Unless there are better guarantees about being able to call R code from those other functions compared to using DATAPTR.

JiefeiWang (13:32:16): > I think we do have. From Luke’s comment, the GC is off only when calling DATAPTR

JiefeiWang (13:33:21): > which means we are safe to call R functions in Elt, region and subset functions. Unless Luke forgot to mention them.

JiefeiWang (13:34:51): > It is OK for your function to call DATAPTR and eventually call an R function, because the GC is not turned off.

JiefeiWang (13:40:35): > Here is Luke’s comment from the issue > > At the moment the ALTREP infrastructure disables GC around calls to > Dataptr and STRSXP Elt and Set_elt methods as these are quite likely > to need to allocate. GC is re-enabled after the method returns or is > left by a jump. The infrastructure does not disable GC around others > method calls as doing so can impact performance. Some methods do not > need it, like Duplicate, since those will only be called in a context > where GC is expected. For others, like integer and real Elt methods > calling code may not have protected things, so GC need to be disabled > by a method implementer for a method that wants to allocate. >

Aaron Lun (13:51:04): > I was of the understanding that we want to suspend the GC if we’re calling R code inside our ALTREP method, otherwise that could trigger garbage collection and break the client code that might be calling our ALTREP code.

Aaron Lun (13:52:06): > On the other hand, Gabe does say that it wouldn’t be good to run R code with the GC off…

Aaron Lun (13:53:56): > Oh, right. So putting 2 and 2 together, this suggests that it is safe to run ALTREP methods from client code that does not assume that the GC will not run.

Aaron Lun (13:54:07): > Problem being that we don’t control what the client code will call.

Aaron Lun (13:55:11): > And even real_Elt will be a problem. R code will just allocate whatever, and we can’t turn it off.

JiefeiWang (13:55:25): > > I was of the understanding that we want to suspend the GC if we're calling R code inside our ALTREP method, otherwise that could trigger garbage collection and break the client code that might be calling our ALTREP code. > > I think this is correct when the GC is not expected by the client code.

JiefeiWang (14:00:56): > I guess the problem with DATAPTR is that when the API was designed, it just returned a simple pointer, so there was no chance of GC at all. Therefore developers (including package developers) did not protect any objects before calling DATAPTR. When ALTREP came into the game, it broke this rule: anyone can do anything in DATAPTR, so they had to turn the GC off before entering DATAPTR. Otherwise it would break tons of packages.

Aaron Lun (14:01:33): > Yes, that’s right. I had the same issues when I was using the raw C API several years ago and ALTREP was first coming onto the scene.

Aaron Lun (14:01:44): > It was like, what, INTEGER needs PROTECTing?

Aaron Lun (14:02:09): > Part of why I switched to Rcpp in the hope that Dirk & co. would take care of the PROTECTion for me.

JiefeiWang (14:02:55): > Possibly you are right, some package developers may not protect around the elt function either.

JiefeiWang (14:04:32): > If that’s true, it will be unsafe to call any R function in any ALTREP API.

Aaron Lun (14:09:48): > I think I’m going to create pure C++ code for ordinary and sparse matrices, which probably covers 90% of use cases.

Aaron Lun (14:10:03): > The rest will be caught with my crazy process idea.

JiefeiWang (14:12:24): > Sounds good, possibly the idea of sparse matrices can be a new package? It sounds like we are reinventing the R wheel at the C++ level.

Aaron Lun (14:12:46): > I’ve already re-invented the wheel once in beachmat.

Aaron Lun (14:12:53): > Might as well reinvent it again. sigh

Aaron Lun (14:13:21): > Probably could just link to Matrix but I don’t want to have to deal with all that C junk.

Alan O’C (14:24:00): > Very interesting discussion. Sorry to inject plebeian questions, but what exactly is the danger in suspending GC during R code? That R code may depend on GC running, or that it may trigger GC while it’s suspended? And how dangerous are we talking? (ie is it relatively safe to play with ggcells without fear of crashing and burning an R session with all manner of segfault?)

Aaron Lun (14:24:17): > ¯*(ツ)*/¯

Aaron Lun (14:25:04): > Probably no GC will run, which could be problematic for memory management.

Alan O’C (14:27:57): > I’m guessing Jiefei if you got as far as submitting to Bioc it wasn’t completely catastrophic, though I guess as in that thread there’s a difference between not catastrophic and production-ready

JiefeiWang (14:44:55): > Agree @Alan O’C. It is hard to answer how serious it is, but I can give a potentially problematic example. Imagine that your ALTREP calls an R function, and the function calls another ALTREP. Then the GC will be turned off twice, and turned on twice. Not sure if the first turning on will cause the GC to start doing its duty.

Aaron Lun (19:16:28): > Alright, backstreet’s back. This time with safe lazy loading for ordinary and sparse matrices. > > library(scRNAseq) > sce <- MacoskoRetinaData() > library(scater) > sce <- logNormCounts(sce) > df <- makePerCellDF(sce) > > Need to think about how to handle the general case, e.g., for HDF5Matrices. Probably will try this process thing on the WE.

2020-02-22

Aaron Lun (01:15:44): > POC for processes seems to do well. The premise is that the ALTREP method writes out parameters to file (nothing fancy, just a binary dump of very simple parameters like the requested column/row index). We have previously set up a persistent process that polls an agreed-upon directory; upon seeing a new parameter file in that location, the separate process reads it in, subsets the matrix to extract the requested row/column, and dumps the vector into another binary file. The ALTREP method is in turn waiting for the expected binary file and reads it back in to populate the requested array/value.
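
The file-based exchange described above can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the function names, file naming scheme and the write-then-rename precaution are hypothetical and not taken from the actual proof of concept, and the "watcher" is run in the same session here purely so that the example is self-contained.

```r
# Hypothetical helper names; the real implementation is C++ on the ALTREP side.
exchange_dir <- tempfile("exchange_")
dir.create(exchange_dir)

# "ALTREP side": dump the requested column index as a raw binary integer.
write_request <- function(dir, id, column) {
    tmp <- file.path(dir, sprintf("request_%s.tmp", id))
    writeBin(as.integer(column), tmp)
    # Assumption: rename after writing so the watcher never reads a partial file.
    file.rename(tmp, file.path(dir, sprintf("request_%s.bin", id)))
}

# "Watcher side": one polling pass over the agreed-upon directory.
serve_requests <- function(dir, mat) {
    requests <- list.files(dir, pattern = "^request_.*\\.bin$", full.names = TRUE)
    for (req in requests) {
        column <- readBin(req, what = "integer", n = 1L)
        id <- sub("^request_(.*)\\.bin$", "\\1", basename(req))
        tmp <- file.path(dir, sprintf("response_%s.tmp", id))
        writeBin(as.double(mat[, column]), tmp)
        file.rename(tmp, file.path(dir, sprintf("response_%s.bin", id)))
        unlink(req)
    }
}

# "ALTREP side": wait for the expected response file and read it back in.
read_response <- function(dir, id, n, timeout = 5) {
    path <- file.path(dir, sprintf("response_%s.bin", id))
    deadline <- Sys.time() + timeout
    while (!file.exists(path)) {
        if (Sys.time() > deadline) stop("timed out waiting for response")
        Sys.sleep(0.01)
    }
    out <- readBin(path, what = "double", n = n)
    unlink(path)
    out
}

mat <- matrix(as.double(1:12), nrow = 3)
write_request(exchange_dir, "a1", column = 2L)
serve_requests(exchange_dir, mat)  # in the described design this runs in a separate R process
column2 <- read_response(exchange_dir, "a1", n = nrow(mat))
```

In the design described in the message, `serve_requests()` would sit in a loop inside the persistent process, so none of its R-level allocation happens in the session that owns the ALTREP.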

Aaron Lun (01:16:46): > MUCH work required to harden this entire procedure - I have already learnt many new things about how the parallel package operates - but if we can toughen it up, it allows us to execute arbitrary R operations without affecting the GC behavior of the parent R process. I am assuming that the GC of the R that we started in the separate process (started via system) is completely independent of the parent. I would be shocked if that was not the case.

Aaron Lun (01:24:35): > Right. Now I have to do @Charlotte Soneson’s thing, so that’s it from me today.

Aaron Lun (20:33:49): > It is done. https://github.com/LTLA/LazyAssayVectors

Aaron Lun (20:33:57): > Well, more or less. But it works in pure C(++) from ALTREP’s perspective. Little does it know that there’s a whole R process running on the other side.

Aaron Lun (20:35:19): > Currently handles the communication between processes via the filesystem and tempdir. Probably could be more neatly done with sockets, but I don’t know enough about them to attempt it FTTB.

Aaron Lun (22:13:28): > Lol > > > # Normally, this matrix would require several TB of RAM: > > library(Matrix) > > y <- rsparsematrix(1e6, 1e6, density=0.0000001) > > rownames(y) <- sprintf("Gene_%i", seq_len(nrow(y))) > > > > # This is possible, despite appearing to require ridiculous RAM usage. > > df <- createLazyRows(y) > > object.size(df) > 8000127999808 bytes >
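
As a rough check on the number above, the size of a fully materialized 1e6 x 1e6 double-precision matrix can be worked out by hand. This is back-of-the-envelope arithmetic only; the slightly larger figure reported by object.size presumably also counts per-vector overhead and names, which are not modelled here.

```r
n_row <- 1e6
n_col <- 1e6
bytes_per_double <- 8

# Payload alone, ignoring headers, names and attributes.
dense_bytes <- n_row * n_col * bytes_per_double
dense_tib <- dense_bytes / 1024^4

format(dense_bytes, scientific = FALSE)
round(dense_tib, 2)
```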

Aaron Lun (22:24:00): > Oh R. You poor thing.

Jared Andrews (23:02:35): > Misread that as LazyAssVectors at first

Aaron Lun (23:04:41): > indeed

2020-02-23

Aaron Lun (02:23:33): > This abuse is hilarious. > > library(TENxBrainData) > sce <- TENxBrainData() > df <- createLazyColumns(counts(sce)) > object.size(df) > ## 146463857880 bytes >

Aaron Lun (02:23:47): > Show off how much RAM you can afford to your friends!

Aaron Lun (02:25:14): > That’s at least $2K on RAM alone if that object was real.

Aaron Lun (02:57:57): > In hindsight, that’s actually a real problem, because if someone tries to save df accidentally, boom. The computer’s bricked.

Aaron Lun (03:42:33): > Having reflected on this for a while, it’s probably safer not to use ALTREPs for this application. There’s too little control over where the materialization might happen and the penalty is too severe. In general, it seems like ALTREPs are best suited for improving speed rather than bypassing memory limits; in the former case, unintended materialization just causes things to run a bit slower, no real harm, but the latter case will render the R session unusable.

Aaron Lun (03:43:15): > So, I guess we’re back to specifying the row and column names in ggcells. Which is probably for the best.

Aaron Lun (03:48:39): > That said, @JiefeiWang may find the code in LazyAssayVectors useful for reviving SharedObject. It is simple to modify the watcher concept to execute arbitrary R code; you can serialize a function and its arguments and pass it to the watcher process during construction of the ALTREP, and then when the ALTREP is materialized, just send a simple message to the watcher to execute that code. Then you can just pick up the results inside the ALTREP method. As far as the ALTREP knows, there is no R code inside the same process, so we sidestep all of the GC/error issues discussed previously.
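
The "serialize a function and its arguments" idea can be sketched with base R serialization. This is a hedged illustration, not code from LazyAssayVectors or SharedObject: the task layout, the file name and `run_task()` are hypothetical, and the watcher step is run in the same session only to keep the example self-contained.

```r
# Hypothetical task file; in the described design it would live where the watcher looks.
task_file <- tempfile(fileext = ".rds")

# At ALTREP construction time: bundle the function with its arguments.
task <- list(
    fun  = function(x, i) x[, i],
    args = list(x = matrix(1:6, nrow = 2), i = 3L)
)
saveRDS(task, task_file)

# In the watcher process, on receiving the "materialize" message:
run_task <- function(path) {
    task <- readRDS(path)
    do.call(task$fun, task$args)
}

result <- run_task(task_file)
```

The ALTREP method would then only need to read `result` back from wherever the watcher writes it, so no R evaluation happens inside the method itself.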

Jialin Ma (22:29:14): > @Jialin Ma has joined the channel

2020-02-25

Hervé Pagès (00:37:05): > This was an interesting experiment @Aaron Lun. Sounds like you had a lot of fun playing with ALTREP! Another route to explore is to make ggplot2 work directly on a DataFrame object, or DelayedMatrix object, or any rectangular object that we want to support. After taking a quick look at this it seems that this could be achieved pretty easily and outside ggplot2 by just defining ggplot2::fortify() and rlang::eval_tidy() methods for DataFrame objects: > > library(S4Vectors) > library(ggplot2) > > ## Same as ggplot2:::fortify.data.frame > fortify.DataFrame <- function(model, data, ...) model > > library(rlang) > ## Just a proof of concept. A serious implementation > ## should of course avoid coercing the DataFrame to > ## data.frame, which sounds feasible in theory. > eval_tidy.DataFrame <- > function(expr, data=NULL, env=caller_env()) > { > eval_tidy(expr, as.data.frame(data), env=env) > } > > Then: > > df <- data.frame(aa=1:100, bb=runif(100), cc=(1:100)/100) > ggplot(as(df, "DataFrame")) + geom_point(aes(x=bb, y=cc, color=aa)) # works! > > This would open the door to support for all kinds of out-of-memory rectangular objects (granted that ggplot2 internals never try to expand the full object e.g. by calling as.list or as.data.frame on it). You could even imagine support for SE derivatives (after all you’re free to pick up your variables along the rows rather than the columns in your eval_tidy.SummarizedExperiment method if you want). > > One major blocker at the moment is that rlang::eval_tidy() is not a generic so for the above to work you need to use this slightly modified rlang (https://github.com/hpages/rlang) where I’ve turned eval_tidy() into an S3 generic. Maybe the trickiest part will be to convince the rlang folks to make such a change. Sounds like a good mission for the working group recently announced on the #tidiness_in_bioc channel (see @Michael Lawrence post on building bridges between Bioconductor and the tidyverse).

Aaron Lun (00:39:55): > Yes, I too hit the wall at eval_tidy.

Aaron Lun (00:40:53): > I had written a delayed.data.frame to try to trick ggplot into doing what I wanted. Got most of the way but couldn’t sneak past it at the C level.

Teun van den Brand (03:05:27): > I’m also working on circumventing the eval_tidy and had some success, but it is nowhere near usable yet. The next roadblock is a hard rlang::is_vector() check on all columns, which is FALSE for S4 classes (https://github.com/tidyverse/ggplot2/issues/3835). See https://community-bioc.slack.com/archives/CPL94P675/p1581926938002800 in the cool-vis channel. With regards to the DataFrame class, I cannot see any way that it would survive the internals of ggplot as it is combined and split out into base::data.frame at many points. It would probably be safest to convert to base::data.frame at the fortify stage and let column classes handle any problems. - Attachment: Attachment > I’ve made some progress shepherding Vector-based classes through ggplot. Any thoughts on what classes would be cool to have in ggplot? > df <- DataFrame( > x = GRanges(c("chr1:1000-2000", "chr1:1500-2500", > "chr2:3000-4000", "chr3:4000-4500")), > y = c(1, 2, 3, 2) > ) > > ggplot(df, aes(xmin = x, xmax = x, ymin = y - 0.2, ymax = y + 0.2)) + > geom_rect()

Hervé Pagès (04:02:57): > I should have said that my suggestion to have ggplot2 support arbitrary rectangular objects via definition of specialized ggplot2::fortify() and rlang::eval_tidy() methods is for rectangular objects where the columns (or rows) are ordinary vectors (like in Aaron’s use case where the rows of an SCE object are atomic vectors), not S4 objects, so the rlang::is_vector check should not cause problems. Looks like our goals are slightly different.

JiefeiWang (06:49:01) (in thread): > Thank you, Aaron. That’s an interesting suggestion. I am actually working on SharedObject and I will take a look at your implementation.

Aaron Lun (11:28:31) (in thread): > If you take it forward, you will probably want to use sockets at the C level, then you can just piggy-back off process creation in parallel::makePSOCKCluster. Tricky bit is writing to the socket connection, you’ll want to mimic how R’s serialize does it but the C code is impenetrable to me.

Raphael Gottardo (12:04:56): > You guys should look at ggcyto. @Mike Jiang can comment on this.

2020-02-28

Yi Wang (16:20:52): > @Yi Wang has joined the channel

2020-03-04

Aaron Lun (00:29:17): > On a completely different note. Does anyone want to help me maintain scater?

Aaron Lun (01:21:08): > 3000 downloads a month, 72nd package in terms of downloads. A decent amount of bragging rights if you’re willing to put in the work.

Mike Smith (08:53:58): > You can pitch this in the next Developers’ teleconference

Aaron Lun (11:54:19): > Oh yes.

Tim Triche (12:03:55): > @Ben Johnson @Jacob Morrison you know you want to do this thing

Jacob Morrison (12:03:59): > @Jacob Morrison has joined the channel

Aaron Lun (12:08:33): > The succession model follows that of the Sith, see #sc-signature for more details.

Tim Triche (12:11:33): > perfectly reasonable

Alan O’C (13:02:32): > Perhaps, it seems fairly stable from release to release for the most part

2020-03-06

Pierre-Luc Germain (14:51:33): > @Pierre-Luc Germain has joined the channel

2020-03-12

Aaron Lun (22:26:37): > @Alan O’C (i) where did you get the volume/dilution numbers from and (ii) isn’t the dilution referring to that of the spike-in solution before adding the volume?

Aaron Lun (22:29:22): > Right, Val’s paper.

Aaron Lun (22:31:28): > Can you confirm those numbers from Zeisel’s original paper? Can’t access it behind the paywall.

Aaron Lun (22:51:35): > re. dilutions: I’m pretty sure that they should be interpreted as e.g. 9 nL of a 1:20000-diluted solution. Your argument description suggested that the 9 nL was 1/20000th of the final reaction volume; this cannot be the case on the Fluidigm C1, which only has 9.1 nL per cell anyway.

Aaron Lun (22:52:45): > I took the liberty of fixing the arg, but if you think I’m wrong, you can always bring it :slightly_smiling_face:

Aaron Lun (22:53:56): > And also we are GO to add this to other datasets. You’ll probably have to hard reset your fork. And don’t forget to add your name to the DESCRIPTION.

2020-03-13

Alan O’C (07:09:19): > Yeah I got the numbers from the power analysis paper. You’re quite right about the semantics, I wasn’t at all careful in being clear there

Alan O’C (07:10:17): > Great, I’ll dig through the spreadsheet later

2020-03-16

cigdemak (13:23:08): > @cigdemak has joined the channel

Dan Bunis (17:14:19): > @Sridhar N fyi faceting is added in dittoSeq now (https://github.com/dtm2451/dittoSeq). Both through an automatic split.by variable or a manual extra.vars variable which doesn’t do the splitting, but lets you control your own via > > dittoPlot(…, extra.vars = c("meta1", "meta2")) + > facet_grid(rows = vars(meta1), cols = vars(meta2)) > > Also, dittoSeq has been accepted into Bioconductor :tada:

Sridhar N (17:15:14): > yay!

Sridhar N (17:15:20): > that is awesome

Sridhar N (17:15:29): > many thanks!

2020-03-17

Giuseppe D’Agostino (07:27:27): > @Giuseppe D’Agostino has joined the channel

Jianhong (20:05:39): > @Jianhong has joined the channel

2020-03-18

Crowy (12:18:14): > @Crowy has joined the channel

2020-03-23

Peter Allen (12:05:24): > @Peter Allen has joined the channel

2020-03-24

Edgar (13:24:10): > @Edgar has joined the channel

2020-03-25

Aaron Lun (01:52:48): > @Alan O’C are we going to get some more spike-ins?

Alan O’C (06:29:28): > Sure, though it slipped my mind over the last while tbh

Alan O’C (08:24:39): > Hmm, seems there’s only a couple more in Valentine’s paper so I’ll have to actually put in the leg work for the rest. Sad

brian capaldo (13:32:35): > @brian capaldo has joined the channel

2020-03-31

Yagoub Ali Ibrahim Adam (12:30:31): > @Yagoub Ali Ibrahim Adam has joined the channel

2020-04-01

Raphaël Bonnet (12:14:32): > @Raphaël Bonnet has joined the channel

2020-04-05

Pavitra Roychoudhury (18:55:06): > @Pavitra Roychoudhury has joined the channel

2020-04-16

Dylan Harwood (09:31:59): > @Dylan Harwood has joined the channel

Xinran Tian (14:27:26): > @Xinran Tian has joined the channel

2020-04-20

Nils Eling (02:49:09): > @Nils Eling has joined the channel

2020-04-22

Levi Waldron (13:39:56): > FYI we’re organizing a single-cell multimodal data journal club: https://community-bioc.slack.com/archives/C35G93GJH/p1587573511180200 - Attachment: Attachment > Hi all! > > I am new to this community and have introduced myself here today. I am a postdoctoral researcher with @Levi Waldron. > > I’d like to invite you to a newly forming journal club about single cell multimodal (SCMM) data with @Levi Waldron and @Davide Risso’s groups. If you have experience working with SCMM or are interested in learning more about the literature, please consider joining us. Take a look at the list of article ideas here. > > Please fill out this doodle to help us schedule our first meeting time: https://doodle.com/poll/3ry8kg66yv3rsbvq > > Join us in the #education-and-training channel. This is where I will post SCMM journal club updates. > > In addition to the journal club, if you are interested in sharing a SCMM dataset of yours or one of interest, please share with us. To streamline our database curation, please read the following guidelines here. Parties interested in contributing would have to make a pull request to the repository. > > I’m looking forward to reading with you!

2020-04-23

Tim Triche (11:08:56): > @Ben Johnson this seems like it might be right up your alley :wink:

2020-04-24

Sridhar N (11:47:47): > what is the best way to plot only selected cells using dittoseq?

Sridhar N (11:47:59): > > multi_dittoPlot(scrna, genes.toplot, group.by = "cell_types", split.by= "group", > vlnplot.lineweight = 0.2, jitter.size = 0.3, > cells.use = meta("cell_types", scrna) == "Mac") >

Sridhar N (11:48:20): > I could do it for one celltype "Mac"

Sridhar N (11:48:30): > is there a magic command to pass a list?

Jared Andrews (11:51:45): > @Dan Bunis Might have some magic to help with that. I think just using %in% and a vector will work though.

Jared Andrews (11:52:28): > > cells.use = meta("cell_types", scrna) %in% c("Mac", "and", "cheese") >

Sridhar N (11:53:27): > lol

Sridhar N (11:53:33): > :face_palm:

Sridhar N (11:53:37): > only if i had tried

Sridhar N (11:53:38): > thanks!

Dan Bunis (12:00:38): > Yup! As @Jared Andrews gave already, %in% like this is commonly what I use as %in% is often the easiest method. But there are many ways you could do it as cells.use for all dittoSeq fxns accepts any of these: > * a string vector of cellnames (contents colnames(object) for SCEs or Seurat-v3) > * a logical vector the same length as there are columns in the object > * a numeric vector containing the indices of the wanted cells
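
The three accepted forms can be illustrated on a toy named vector in base R. This is only a sketch: `cell_types` is hypothetical stand-in data whose names play the role of colnames(object), and no dittoSeq function is called here.

```r
# Toy stand-in for a per-cell metadata column; names act as cell names.
cell_types <- c(cell1 = "Mac", cell2 = "T", cell3 = "and",
                cell4 = "B", cell5 = "cheese")
wanted <- c("Mac", "and", "cheese")

# 1. Logical vector with one entry per cell.
as_logical <- cell_types %in% wanted

# 2. Numeric vector of the indices of the wanted cells.
as_indices <- which(as_logical)

# 3. Character vector of cell names.
as_names <- names(cell_types)[as_logical]

# All three pick out the same cells.
same_cells <- identical(names(cell_types[as_logical]), names(cell_types[as_indices])) &&
    identical(names(cell_types[as_indices]), names(cell_types[as_names]))
```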

Devika Agarwal (12:05:55): > @Devika Agarwal has joined the channel

Dan Bunis (12:06:26): > I’ll also add that while dittoSeq’s meta("metaname", object) function was useful in writing the package as it ensures a standard output with every value named as its cell’s name, simple object$metaname does suffice for most purposes and is easier to type (with tab completion too!).

Dan Bunis (12:07:15): > So cells.use = scrna$cell_types %in% c("Mac", "and", "cheese") would have the same effect.

Sridhar N (12:10:20): > awesome thanks!

2020-04-25

Daniela Cassol (17:29:20): > @Daniela Cassol has joined the channel

2020-04-27

Charlotte Rich-Griffin (04:06:21): > @Charlotte Rich-Griffin has joined the channel

Kelly Eckenrode (10:56:58): > @Kelly Eckenrode has joined the channel

2020-04-29

Aaron Lun (02:38:41): > @here As intimated some time ago in #developers-forum, I am now proceeding with the divorce between scater’s important processing functionality and its despised plotting capabilities. This shouldn’t affect anyone as everything will still be back-compatible and all functions will still be available via re-exports; the change is that all of the things I care about will now live in https://github.com/LTLA/scuttle while all the plotting functions will stay in scater. > > If you’re passionate about single-cell visualization and you want to try your hand at maintaining a package, let me know and I will onboard you onto scater’s development team (currently: myself) over the course of this release cycle.

Anamaria Elek (03:05:11): > @Anamaria Elek has joined the channel

Hervé Pagès (13:04:46): > scuttle? I’m surprised nobody went for scout already

Aaron Lun (13:05:11): > There’s “utilities” in the name somewhere.

Hervé Pagès (13:05:46): > ah ok

Alan O’C (13:14:15): > Sounds interesting, I was already considering making a PR to remove point outlines from the reduced dim plots

Aaron Lun (13:14:46): > Do you want it?

Aaron Lun (13:14:54): > It’s a renovator’s delight.

Alan O’C (13:15:24): > I’m sensing some estate agent vibes here

Alan O’C (13:15:31): > “It’s a real fixer-upper”:smile:

Aaron Lun (13:15:45): > took the words right out of my mouth

Dan Bunis (13:19:55): > I’d like to help some. Just not sure how much I can commit to.

Alan O’C (13:21:48): > Sure why not. I mean, I took over a CRAN package that I don’t even use, at least here I can get some benefit:joy:

Aaron Lun (13:22:01): > Good, good.

Aaron Lun (13:22:08): > I’ll finalize the divorce over the next week.

2020-04-30

Sean Davis (08:26:10): > Software as a metaphor for life….

Sean Davis (08:27:13): > Or is it the other way around?

Federico Marini (10:38:25): > > Software as a metaphor for life…. > Oh, here’s another 6-words-story:slightly_smiling_face:

2020-05-01

Norbert Tavares (17:43:08): > @Norbert Tavares has joined the channel

2020-05-03

Nitin Sharma (12:18:00): > Hello everyone, I have created a channel #singlecell-queries for more general queries regarding single-cell analysis.

2020-05-04

Nadine Bestard-Cuche (06:27:07): > @Nadine Bestard-Cuche has joined the channel

Aaron Lun (13:44:28): > Har har har. scuttle builds and passes check. Most custody arrangements with scater have been resolved, just a few more bits and pieces left to hammer out.

2020-05-05

Zhiyuan Hu (04:54:15): > @Zhiyuan Hu has joined the channel

2020-05-06

Aaron Lun (19:53:08): > scuttle has been submitted. I will begin the process of stripping down scater soon…

Aaron Lun (19:56:42): > In fact, the real problem is in contacting Davis, who has basically just disappeared.

2020-05-07

Stephanie Hicks (00:42:27): > Fwiw I see Davis tweet from time to time, so I’m assuming he is alive at least:upside_down_face:

Shila Ghazanfar (04:30:56): > I dont think Davis has disappeared, he’s just down under:upside_down_face::flag-au:

Sanchit Saini (07:08:43): > @Sanchit Saini has joined the channel

Ben Story (08:30:15): > @Ben Story has joined the channel

Tim Triche (18:25:13) (in thread): - Attachment: rim shot

2020-05-09

Aaron Lun (20:53:55): > @Alan O’C are you ready?

2020-05-10

Alan O’C (04:42:07): > Sure thing

Aaron Lun (04:44:00): > I’ve sent an email to Davis about getting control over the scater GitHub repo. Let’s see how that goes.

Alan O’C (07:50:48): > Planning on transferring the repo or adding myself and Dan as collaborators?

Sangram Keshari Sahu (09:26:56): > @Sangram Keshari Sahu has joined the channel

Aaron Lun (13:42:12): > Yes.

2020-05-11

Giuseppe D’Agostino (22:09:31): > anybody know any methods to build a graph using a large (“Cholmod error ‘problem too large’“-large) sparse adjacency matrix?

Aaron Lun (22:10:31): > I don’t know what context that error occurs in, but I would just which() and arrayInd it and pass that to make_graph.

Giuseppe D’Agostino (22:12:04): > it happens when your RAM craps out trying to go from a dgCMatrix to a matrix

Giuseppe D’Agostino (22:13:38): > thanks tho, I’ll see what happens using arrayInd

Aaron Lun (22:14:30): > igraph can convert from graph to sparse, so I’m surprised it can’t go back the other way.

Aaron Lun (22:14:50): > Note that the above approach requires you to set non-1 edge weights manually.
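
The which()/arrayInd route suggested above might look roughly like this. It is a sketch on a small dense stand-in matrix (hypothetical data) so that it runs in base R; the igraph step is guarded because igraph may not be installed, and treating the graph as undirected via the upper triangle is an assumption about the use case.

```r
# Small dense stand-in for a large sparse adjacency matrix.
adj <- matrix(0, nrow = 4, ncol = 4)
adj[1, 2] <- adj[2, 1] <- 1
adj[2, 3] <- adj[3, 2] <- 2.5
adj[3, 4] <- adj[4, 3] <- 1

# Linear indices of nonzero entries, upper triangle only so that
# each undirected edge is listed once.
hits <- which(adj != 0 & upper.tri(adj))
pairs <- arrayInd(hits, dim(adj))   # two-column matrix of (row, column)
weights <- adj[hits]

# make_graph() takes a flat vector: from1, to1, from2, to2, ...
edge_vector <- as.vector(t(pairs))

if (requireNamespace("igraph", quietly = TRUE)) {
    g <- igraph::make_graph(edge_vector, n = nrow(adj), directed = FALSE)
    # Non-1 edge weights have to be attached by hand.
    g <- igraph::set_edge_attr(g, "weight", value = weights)
}
```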

Giuseppe D’Agostino (22:18:01): > it actually works directly using graph.adjacency on the sparse matrix. I probably should have started from that

2020-05-12

Aaron Lun (03:38:42): > @Alan O’C It has begun.

Aaron Lun (03:40:57): > master and RELEASE_* branches are protected and require a reviewed PR to modify. We’ll use this release cycle as an onboarding period to make sure all is well before any formal transfer of maintainership.

Alan O’C (06:12:44): > Sounds good, just got the email

Aaron Lun (11:49:00): > SHOW ME WHAT YOU GOT

Aaron Lun (20:53:44): > excellent, excellent.

2020-05-13

Michael Love (09:24:19): > is Amezquita et al 2019 an ok journal citation for the SCE container?

Michael Love (09:24:32): > SCE doesn’t have a journal article CITATION it looks like

Federico Marini (09:29:38): > before that I had Huber 2014 when I referred to it as “an extension of the SE”

Michael Love (09:32:38): > right, Amezquita et al explicitly describes it so seems more appropriate

Federico Marini (09:44:17): > Could be added probably in the CITATION piece of SCE @Davide Risso and @Aaron Lun?

Davide Risso (09:49:30): > yes, as far as I know, Amezquita et al. is the only paper in which we formally describe the SCE class. It should be added in the CITATION of the SCE, I will try to do this when I find the time.

Michael Love (10:10:17): > “I will try to do this when I find the time” is what i have said to every human i talked to in the past 8 weeks

Hervé Pagès (13:01:30): > in the past 8 weeks only? That’s what I’ve said to my wife in the past 15 years

Aedin Culhane (17:24:02): > @Lauren Hsu and I would like to know the best class for a List of SingleCellExperiment objects. We want to easily populate the reducedDim slot across multiple SCEs. We have looked at extending ExperimentList and MAE. We could add checks that ensure elements of an MAE or ExperimentList are all SCE (and have a reducedDim slot). Or could extend SingleCellExperiment to List or constrain ExperimentList, MAE, checking that the objects have a reducedDim slot (aka SCE)? I talked to @Marcel Ramos Pérez about this. We spoke about @Levi Waldron’s single cell multi modal, an @Aaron Lun SCE extension for storing additional meta data. Any suggestions @Peter Hickey @Kevin Blighe

Aedin Culhane (17:26:02): > Also, where is the best place to look for the most efficient DelayedArray/off-disk subsetting (intersectRows/Cols) functions for a list of SingleCellExperiment objects?

2020-05-14

Levi Waldron (15:29:57): > Briefly, List gives you metadata, ExperimentList adds row subsetting and intersection of rows, MAE adds a shared colData with map to experiments, more subsetting and reshaping, and more helpers like column intersection.

Levi Waldron (15:32:31): > There’s a big difference in how many methods each implements, so depends on whether you will use those extra methods.

Levi Waldron (15:39:39): > MAE tends to be a more natural fit for multiple experiments on the same specimens. ExperimentList for a meta analysis, multiple studies providing comparable data on different specimens

Aedin Culhane (15:49:27): > Thanks @Levi Waldron. We don’t expect that the single cell ’omics will always come from the same cells; although some experiments do have paired data, we are primarily interested in features. That’s why we were looking at List or ExperimentList. However we also didn’t want to extend a class if one already existed, which is why we posted the question to see if others have already done this.

Aedin Culhane (15:51:33): > Where should we look to find the most memory-efficient and fastest implementation of subsetting multiple sparse matrices? (DelayedArray or other off-disk?)

Hervé Pagès (16:24:47): > @Aedin Culhane If the set of features is the same across all your SingleCellExperiment objects then it should be possible to cbind() all your objects into one big SCE object where you keep track of the grouping of the columns in a colData variable. Don’t know what the implications would be for the typical single cell workflow though e.g. some adjustments would probably be needed to accommodate the special nature of this Frankenstein SCE.

Aaron Lun (16:25:35): > And if the features are different but the cells are the same, SCE has an altExp capability.

Alan O’C (16:39:31): > Wouldn’t using a single SCE (with or without abusing altExp) make it difficult to run dimreds separately for each “element of the list”? Seems like you’d expend much more effort making existing tools aware of how to handle objects like that than just dumping standard SCEs into a container class

Alan O’C (16:40:06): > (assuming you want to treat the batches/whatever we’re talking about here more separately more often than jointly)

Alan O’C (16:40:21): > Though I guess altExp is just a list of SEs so maybe disregard that

Hervé Pagès (16:49:55): > It seems to me that after dumping standard SCEs into a list-like container, whatever that container would be, you’d still have to expend some effort making existing tools aware of how to handle that object.

Alan O’C (16:54:08): > I’d see lapply(list, function) (okay, probably a bit more than that tbf) as being less effort than trying to stop all functions that take SCE as input from doing weird things to your data

Hervé Pagès (16:59:13): > Sure, that would be a much cleaner solution. But that would only work if you can write specialized methods for your list-like object, which you cannot do without introducing a new class. And IIUC Aedin wants to avoid that.

Hervé Pagès (17:09:11): > @Aaron Lun Would it make sense to consider alternative experiments along the 2nd dimension of an SCE?

Aaron Lun (17:09:52): > I don’t see the point. Then it might as well be a list of SCEs.

Aaron Lun (17:10:44): > I mean, what are people trying to do with this?

Hervé Pagès (17:11:13): > I don’t know. > A difference with a list of SCEs would be that they would share the same set of features.

Hervé Pagès (17:14:13): > But I agree that some solid use cases would be needed before embarking on something like this.

2020-05-15

Federico Marini (04:51:23): > iSEE as of now takes the pre-analyzed SCE objects - whatever is in there, will be used to display

Federico Marini (04:51:42): > the integration (MNN, CCA, whatevs) is still to be done offline of shiny

Aedin Culhane (08:38:12): > Thanks. The purpose is cross-study integration. Basically we are using dimension reduction to project/align points. Therefore we just need tools within the functions to store the projections, then we return a list of SCEs or sparse matrices with a reducedDim slot. We have just been creating a list, using lapply etc.; however, I wondered what others were doing.

Aedin Culhane (08:39:39): > We then have plot and score functions to assess the effectiveness of the methods

2020-05-18

B P Kailash (08:43:57): > @B P Kailash has joined the channel

Huipeng Li (09:23:18): > @Huipeng Li has joined the channel

2020-05-19

Tobias Hoch (11:18:29): > @Tobias Hoch has joined the channel

2020-05-22

Mike Smith (10:34:07): > Is there anything in Bioconductor that can import the AnnData h5ad format (https://anndata.readthedocs.io/en/stable/) ?

Charlotte Soneson (10:41:07): > I think there was a plan to submit sceasy to Bioconductor (https://github.com/cellgeni/sceasy/issues/5) - however, it requires an outdated version of anndata right now.

Charlotte Soneson (10:43:24): > Seurat can read it too (and then convert to SCE…) https://satijalab.org/seurat/v3.1/conversion_vignette.html

Mike Smith (10:45:42): > Cool thanks. You can probably guess what this is for :grimacing:

Charlotte Soneson (10:45:51): > Yep

Charlotte Soneson (10:46:12): > Please don’t rely on an old version of anndata :smile: (I need the new one)

Mike Smith (10:49:44): > I’ll stick with Seurat (assuming that’s ok). It’s mostly to demonstrate that software already exists to read these formats, so I just want something that works.

Charlotte Soneson (10:50:26): > :+1: Seurat has worked fine for me

Aaron Lun (11:53:52): > Y’know, now that basilisk is up, you could even write a native parser using Python code and transfer it over with reticulate.

Tim Triche (13:06:00): > or use this https://theislab.github.io/scanpy-in-R/

Tim Triche (13:06:12): > and thanks again Aaron for writing basilisk!

2020-05-24

Alexander Toenges (14:35:21): > @Alexander Toenges has joined the channel

2020-05-25

Luke Zappia (02:48:34) (in thread): > This is an ok solution but I think a basilisk package would be nice, especially for more novice users. I’d like to give it a go if I can find the time. > > P.S. If anyone has feedback on the Scanpy in R tutorial I would be keen to hear it!

Kaveh Moeini (09:18:18): > @Kaveh Moeini has joined the channel

2020-06-01

Shuyu Zheng (03:06:23): > @Shuyu Zheng has joined the channel

2020-06-03

Aaron Lun (03:02:41): > Now, @Alan O’C, for your greatest maintainer challenge yet: choosing an anime picture to put as the social media preview for scater.

Aaron Lun (03:03:03): > I wonder if this still shows up: https://github.com/LTLA/basilisk

Aaron Lun (03:04:54): > Hm. I guess someone turned it off.

Aaron Lun (03:05:48): > Well, if anyone tweets a link to one of my repos, you’ll get some sweet anime previews.

Aaron Lun (03:10:17): > Anyway, the purge has begun. scater has been hollowed out, with most of its contents migrated to scuttle.

Alan O’C (04:47:57): > You must know how much anime I watch - that would indeed be my greatest challenge :smile:

Alan O’C (04:48:57): > Sounds good, I’m knee deep in conditionals atm but hoping to make some changes later today. I’ll survey the damage at the same time to understand what’s left

2020-06-04

Aaron Lun (02:10:42): > Well, while you do that, I’m going to work on my anime GIF slackbot for one of my other slack groups.

hcorrada (11:16:06): > oh my, we were warned :confused:

2020-06-06

Aaron Lun (19:26:02): > IT IS DONE https://ltla.github.io/acceptable-anime-gifs/ - Attachment (acceptable-anime-gifs): REST API documentation > A curated set of GIFs that are reasonably SFW, primarily used to power annoying Slackbots.

Olagunju Abdulrahman (19:58:16): > @Olagunju Abdulrahman has joined the channel

2020-06-07

Alan O’C (06:48:43) (in thread): > God help us all

Yingxin Lin (19:46:28): > @Yingxin Lin has joined the channel

Kasper D. Hansen (22:14:30): > How is the rating decided @Aaron Lun?

Aaron Lun (22:15:33): > By a qualified expert

Aaron Lun (22:15:35): > aka me.

Aaron Lun (22:15:49): > This is not all, of course. BEHOLD

Aaron Lun (22:16:09): > Which you can use like so: https://github.com/LTLA/anime-slack-cronjob

2020-06-08

Tanzeel Tagalsir (03:00:39): > @Tanzeel Tagalsir has joined the channel

Davide Risso (10:30:04) (in thread): > Done. And in less than a month :grimacing:

Aaron Lun (11:01:24) (in thread): > You shouldn’t need a textVersion, it gets auto-generated now

Davide Risso (11:03:35) (in thread): > oh, didn’t know that.. will get rid of it

NABISUBI PATRICIA (16:18:50): > @NABISUBI PATRICIA has joined the channel

2020-06-09

Shankar Shakya (10:05:59): > @Shankar Shakya has joined the channel

Mark Tefero Kivumbi (15:38:00): > @Mark Tefero Kivumbi has joined the channel

Taoyu Mei (17:16:32): > @Taoyu Mei has joined the channel

2020-06-10

MounikaGoruganthu (11:23:56): > @MounikaGoruganthu has joined the channel

Ye Zheng (12:04:49): > @Ye Zheng has joined the channel

Vandhana (16:03:35): > @Vandhana has joined the channel

2020-06-16

Aaron Lun (03:06:13): > @Charlotte Soneson you’ll want to check the updated velociraptor docs at ?scvelo. I wrote some words about the role of the three different matrices.

Charlotte Soneson (03:06:31): > Ok, will do

Aaron Lun (03:08:29): > If we can add a velocyto wrapper, we’re basically done.

Aaron Lun (03:08:33): > Oh, and some tests.

Aaron Lun (03:08:35): > And a vignette

Aaron Lun (03:08:41): > Well, okay, there’s still some way to go.

Charlotte Soneson (03:13:25): > Do we need the velocyto wrapper? It’s already available as an R package, and scVelo fits the same model if you choose the steady-state mode.

Aaron Lun (03:14:48): > Is it on CRAN?

Charlotte Soneson (03:15:22): > Not that I know: https://github.com/velocyto-team/velocyto.R

Aaron Lun (04:22:09): > So we’ll have to wrap the PyPI version. Which we would have to do anyway, I always understood the Python version to be more coherent.

Tim Triche (09:45:43): > is scvelo capable of using spanning counts?

Tim Triche (09:46:53): > the velocyto.R package is no longer actively developed IIRC, Ben was saying that La Manno’s lab is focusing on the python version for any further development

Tim Triche (09:48:53): > @Aaron Lun you recommend randomSVD or irlbaSVD for the default bsparam?

Tim Triche (09:52:02): > also @Aaron Lun is this (https://www.biorxiv.org/content/10.1101/404962v1.full) process automated anywhere in (say) scuttle? (looking through scuttle docs)

jessi elderkin (09:53:59): > @jessi elderkin has joined the channel

Charlotte Soneson (10:31:30): > > is scvelo capable of using spanning counts? > As far as I know it just takes a spliced and an unspliced count matrix, and where you put the spanning reads in the quantification is up to you. > > the velocyto.R package is no longer actively developed IIRC, Ben was saying that La Manno’s lab is focusing on the python version for any further development > I haven’t really used the python version, but the GitHub repo was last updated ~1.5 year ago - I have no insights into whether this is currently maintained/actively developed.

Jordan Veldboom (10:31:32): > @Jordan Veldboom has joined the channel

Tim Triche (10:41:50): > @Charlotte Soneson in sce_helpers.R you just use scounts as counts – did that end up being the most stable?

Charlotte Soneson (10:42:49): > We don’t really use the X matrix for anything - the reduced dimension information is provided externally (calculated from the spliced-only quantification). So I would say it doesn’t really matter there. We just need something in that slot.

Tim Triche (10:42:54): > the dataset I’m working on does not have raw data for one of the 8 samples, so I’m working with their summary data first and will go back & reprocess with Salmon in a bit. It’s microwell-seq data – you probably know which one :slightly_smiling_face:

Tim Triche (10:43:04): > ok perfect. that’s kind of what I thought

Charlotte Soneson (10:44:28): > Well, actually, that was not completely true. The PCA used for the moments is calculated from the spliced counts, but it didn’t seem to make a big difference (as long as it wasn’t obtained from only the unspliced counts).

Charlotte Soneson (10:44:57): > Hard to say what is really “best” (whatever that means here).

Tim Triche (10:49:14): > “most practical and least worst” :wink:

Charlotte Soneson (10:51:47): > :slightly_smiling_face:

Tim Triche (10:54:54): > ugh, I realized that I rewrote a bunch of your functions in sce_helper for microwell-seq, except that third matrix has me calling Reduce(union, …) all over the place

Aaron Lun (11:07:54) (in thread): > Depends on whether you’re file-backed or not. Random is much faster for file-backed matrices, but IRLBA has more stable convergence.

Aaron Lun (11:09:15) (in thread): > The safest, guaranteed solution is to turn on downsample=TRUE in logNormCounts. This will also discard heaps of data, but that’s how it is.

Aaron Lun (11:09:30) (in thread): > Ideally, I would use something like GLM-PCA, but scry is just SO SLOW.

Tim Triche (11:14:38) (in thread): > very helpful observation. thanks!

Aaron Lun (11:15:07) (in thread): > FastAutoParam tries to choose automatically, but it’s not very smart sometimes.

Aaron Lun (11:15:43) (in thread): > it would be fair to say that spliced counts are more likely to ignore those ambiguous reads?

Tim Triche (11:16:23) (in thread): > for downsampling does it make sense to bootstrap the process? And yes, both GLM-PCA and BFA have been horrendously slow in my experience

Aaron Lun (11:16:59) (in thread): > What do you mean by bootstrapping the downsampling?

Charlotte Soneson (11:20:02) (in thread): > More likely than normal counts? I think it depends on the quantification method (e.g., whether it checks if reads are consistent with a transcript model). And the velocity counting wouldn’t really ignore them, but rather assign them to unspliced; the spliced+unspliced count will generally be > normal count. For reads falling in regions that could be either exonic or intronic, yes, they would typically be counted in the normal count but it’s not clear where they would be counted by the velocity counting.

Aaron Lun (11:20:15): > Also, I wonder whether we can say that, if the dimreds are supplied, the output is totally independent of the supplied X.

Tim Triche (11:20:41) (in thread): > > If 'downsample=TRUE', counts for each cell are randomly > downsampled instead of being scaled. This is occasionally useful > for avoiding artifacts caused by scaling count data with a strong > mean-variance relationship. Each cell is downsampled according to > the ratio between 'down.target' and that cell's size factor. > (Cells with size factors below the target are not downsampled and > are directly scaled by this ratio.) If 'log=TRUE', a > log-transformation is also performed after adding 'pseudo.count' > to the downsampled counts. > > We automatically set 'down.target' to the 1st percentile of size > factors across all cells involved in the analysis, but this is > only appropriate if the resulting expression values are not > compared across different 'normalizeCounts' calls. To obtain > expression values that are comparable across different > 'normalizeCounts' calls (e.g., in 'modelGeneVarWithSpikes' or > 'multiBatchNorm'), 'down_target' should be manually set to a > constant target value that can be considered a low size factor in > every call. >
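
For readers who want the downsampling rule quoted above in concrete form, here is a minimal Python sketch. It is not scuttle's implementation: the function name `downsample_cell` is hypothetical, and independent binomial thinning is used as a stand-in for whatever sampler the package actually uses. It only illustrates the "ratio between the target and the cell's size factor" logic from the quoted documentation.

```python
import random


def downsample_cell(counts, size_factor, down_target, rng):
    """Thin one cell's counts by the ratio down_target / size_factor.

    Hypothetical helper; binomial thinning is an assumed stand-in sampler.
    """
    ratio = down_target / size_factor
    if ratio >= 1:
        # Size factor at or below the target: scale by the ratio, no sampling.
        return [c * ratio for c in counts]
    # Binomial thinning: keep each of the c molecules with probability `ratio`.
    return [sum(rng.random() < ratio for _ in range(c)) for c in counts]
```

A deep cell (size factor 2, target 1) loses roughly half of its counts at random, while a shallow cell (size factor 0.5) is simply scaled by 2.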

Tim Triche (11:21:18) (in thread): > I’m digging through the code to see how the randomization happens and where, but this arose when discussing how to map bulk samples onto single cell atlases most informatively

Aaron Lun (11:22:19) (in thread): > Well, I can’t imagine that the spliced counts are more tolerant of introns than the usual count matrix.

Tim Triche (11:22:27) (in thread): > also, DropletUtils seems to do a different thing (read-level downsampling) which I’m trying to understand (and understand whether one can use a richer bulk/deep dataset as a sort of “tech reps” in a comparison, but where the variability within the pseudo “tech reps” is actually somewhat informative about heterogeneity)

Charlotte Soneson (11:23:07) (in thread): > Agreed

Tim Triche (11:24:11) (in thread): > ultimately it seems a shame to throw out vast amounts of detail from bulk or plate-seq just because 10X/inDrop/dropSeq/microwell-seq is shallow

Aaron Lun (11:24:29) (in thread): > In any case, downsampling is kind of a nuclear option, so I wouldn’t bother unless you have a good reason for doing so.

Aaron Lun (11:24:43) (in thread): > If you’re correcting across protocols, downsampling is probably the least of your problems.

Tim Triche (11:25:56) (in thread): > probably true, and even more fun when attempting to merge spliced/unspliced in a way that doesn’t obliterate the information derived from their relationship

Aaron Lun (11:26:27) (in thread): > Though in the worst case, it is the only legitimate choice; if the shallow dataset does not have enough signal to noise to represent the heterogeneity of interest, there’s no way you can make it up - the only way to make them comparable is to discard information from the deeper dataset.

Tim Triche (11:31:04) (in thread): > which comes back to, suppose I randomly downsample Bulk A and Bulk B 10 times apiece, then project them into the space of Landscape Portrait Atlas 2000. Is it unreasonable to look at how the bulks “spread” in this way?

Aaron Lun (11:32:41) (in thread): > I would just make pseudo-bulks of the landscape portrait and compare them to the bulk data like that. Nice, fast and easy, avoids coverage problems.
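
The pseudo-bulk idea mentioned here amounts to summing counts over all cells that share a label. A minimal Python sketch follows, assuming cells are stored as per-cell count vectors with one label each; `pseudobulk` is a hypothetical name, not a function from any package discussed in this channel.

```python
def pseudobulk(counts, labels):
    """Sum per-cell count vectors within each label (e.g. cluster or sample).

    counts: list of per-cell count vectors, all of the same length.
    labels: one label per cell.
    """
    sums = {}
    for cell, label in zip(counts, labels):
        if label not in sums:
            sums[label] = [0] * len(cell)
        # Add this cell's counts to the running total for its label.
        sums[label] = [s + c for s, c in zip(sums[label], cell)]
    return sums
```

Each resulting profile has bulk-like depth, so it can be compared to real bulk samples without downsampling the bulk data.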

Tim Triche (11:34:24) (in thread): > if I want to figure out K for a matrix factorization (most particularly NMF), one way to do so is via 5xCV, where I blow away (set to NA) 20% of the matrix entries in a mutually exclusive fashion, run the decomposition at various values of K on all five, then tally the absolute or mean reconstruction error from multiplying W and H and subtracting off the original, full X. Whichever K sucks least based on mean/median/max MAE in 5xCV, we keep that one. Does this strategy work in a fashion for downsampling?
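
The masking step of the cross-validation scheme described in this message can be sketched as follows. This is only an illustration in plain Python: the factorization itself is left out, all function names are hypothetical, and scoring the error on the held-out entries only (rather than on the full matrix) is an assumption.

```python
import random


def cv_folds(n_rows, n_cols, n_folds=5, seed=0):
    """Split all matrix entries into mutually exclusive folds of (row, col)."""
    entries = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    random.Random(seed).shuffle(entries)
    return [entries[k::n_folds] for k in range(n_folds)]


def mask_matrix(x, held_out):
    """Return a copy of x with the held-out entries set to None (i.e. NA)."""
    held = set(held_out)
    return [[None if (i, j) in held else v for j, v in enumerate(row)]
            for i, row in enumerate(x)]


def mean_abs_error(x, x_hat, held_out):
    """Mean absolute reconstruction error on the held-out entries."""
    return sum(abs(x[i][j] - x_hat[i][j]) for i, j in held_out) / len(held_out)
```

For each candidate K, one would fit the factorization on every masked matrix, reconstruct it as W times H, and keep the K with the lowest error across the five folds.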

Tim Triche (11:35:10) (in thread): > the metacell approach is certainly going to do a better job of telling us where we “land” if a bulk represents a semi-homogeneous population

Tim Triche (11:35:46) (in thread): > suppose instead that you have some truly wacky tissue, say, a teratoma or a liquid tumor that can’t decide if it’s lymphoid, myeloid, erythroid, or something else entirely

Tim Triche (11:36:26) (in thread): > single-cell data from the tissue would be great but some of these tumors only show up a few times a year even at the busiest tertiary referral hospitals

Aaron Lun (11:36:52) (in thread): > Well, that’s another problem entirely. Don’t think mapping would help you there. You’d have to use the usual deconvolution methods.

Tim Triche (11:37:12) (in thread): > or perhaps we have multiple time points from clinical trials for a type of specimen that’s been seen a dozen times… ever

Tim Triche (11:37:37) (in thread): > fair, but again, deconvolve to what? References for these cell types rarely exist

Tim Triche (11:38:21) (in thread): > I guess at least the latter gives you the “anchor points” so that does address it in a way that is more satisfying. Like a convex hull, “it’s somewhere in here, and the center of mass is HERE”

Tim Triche (11:38:30) (in thread): > OK I think you’re right and this is the way to go. Thanks !

Aaron Lun (11:39:04) (in thread): > I don’t think there’s much you can do with the mapping in that case. It would be equally ambiguous.

Aaron Lun (11:40:08) (in thread): > I also didn’t understand your prior comment about cross validation to choose k; you know the required extent of downsampling to match two libraries or batches, so there’s no need to randomize that to pick an appropriate proportion to downsample to.

Tim Triche (11:40:19) (in thread): > sure, but as long as we can represent the ambiguity, that’s the goal. I expect that the downsamples would show up inside the hull most of the time anyways.

Tim Triche (11:40:46) (in thread): > yeah that’s why I was wondering about read-level sampling instead.

Tim Triche (11:41:17) (in thread): > with counts you know where they land. but that’s not really what’s coming out of the prep or off the sequencer. It’s an interpretation of where the fragments most likely came from and how many.

Charlotte Soneson (11:42:57): > I think that is correct (and running the current code suggests it to be true), but I haven’t looked in detail at all the scVelo code, and it would be worth confirming (maybe @Luke Zappia could help us out :slightly_smiling_face:)?

Tim Triche (11:43:53) (in thread): > the thought was, suppose you want 10% of the depth of a sample. one approach is to downsample 10-fold the counts and be done with it. Another approach would be to take 10% of the unique reads, see where they land, and do this 10 times (or 1% x 100, or whatever) to see whether the pseudo-reps end up looking mostly the same or have a substantial amount of divergence between them. Somewhat of a proxy for the cells in the soup when doing it on bulk, although probably pointless for a single-cell plate-seq prep.
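
The repeated read-level subsampling proposed in this message can be sketched in a few lines of Python. The sketch assumes each read has already been assigned to a feature (so `read_assignments` is just a list of gene names, one per read); the function name is hypothetical and this is not how DropletUtils implements its downsampling.

```python
import random
from collections import Counter


def subsample_replicates(read_assignments, fraction, n_reps, seed=0):
    """Draw n_reps random subsets of reads and tabulate counts per feature.

    read_assignments: list with one feature name per read (assumed input).
    fraction: proportion of reads to keep in each pseudo-replicate.
    """
    rng = random.Random(seed)
    k = int(len(read_assignments) * fraction)
    # Each replicate samples reads without replacement, then counts by feature.
    return [Counter(rng.sample(read_assignments, k)) for _ in range(n_reps)]
```

Comparing the resulting count tables across replicates shows how much the profile moves under resampling alone, which is purely technical variation.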

Aaron Lun (11:45:13) (in thread): > This would probably be less interesting than you imagine; it would be purely technical variation, so the bootstrapped replicates would probably be very tight. It probably will not give a useful measure of the uncertainty of the assignments in the cases you describe.

Aaron Lun (11:46:21) (in thread): > Well, it’s not just the variance, but it’s also the fact that your mapping or deconvolution might be totally biased (e.g., if you’re missing an important cell type in the reference), so this or any other approach would give you high confidence in an inaccurate result.

Tim Triche (11:49:34) (in thread): > ugh. I hate it when you’re right

Aaron Lun (11:50:16) (in thread): > I hear that a lot

Kasper D. Hansen (14:14:11): > Which package do you have these velocity helper? functions in?

Aaron Lun (14:14:28): > https://github.com/kevinrue/velociraptor

Kasper D. Hansen (14:15:21): > We have been doing some work on velocity from a methods perspective lately, but don’t have code right now. But we have been taking the methods apart (mostly scvelo)

Kasper D. Hansen (14:15:37): > We still need to put them back together again

Paul Hoffman (18:49:06): > @Paul Hoffman has joined the channel

2020-06-17

Luke Zappia (02:13:55) (in thread): > I’m happy to ask the scVelo developers how the different matrices are used. Probably worth making them aware of velociraptor anyway unless there are any objections to that?

Charlotte Soneson (02:33:52) (in thread): > Thank you! Totally agree.

Luke Zappia (06:06:44): > Reply about the different matrices from Volker (scVelo author): - File (Plain Text): Untitled

Shijie C. Zheng (10:38:00): > @Shijie C. Zheng has joined the channel

Aaron Lun (11:11:06): > So I take it that if we supply the reduced dims ourselves, then X is surplus to requirements.

2020-06-18

Aaron Lun (02:56:57): > @Luke Zappia I think we’re pretty close re. zellkonverter. Maybe some extra effort required to protect against weirdo numpy arrays in AnnData2SCE and maybe some more care spent in converting HDF5Arrays to the equivalent h5py representation where possible. But I think all of the major functionality is there.

Luke Zappia (08:11:27): > The package is available here if anyone is interested https://github.com/theislab/zellkonverter

Ruben Dries (09:00:39): > @Ruben Dries has joined the channel

pamela himadewi (09:54:44): > @pamela himadewi has joined the channel

Davide Risso (12:13:53): > that’s awesome!

Stephanie Hicks (13:05:54): > nice!!

Tim Triche (13:28:59): > this is terrific @Luke Zappia

2020-06-19

Luke Zappia (02:22:17): > As always @Aaron Lun can take a lot of the credit for the implementation details, I mostly just set up the infrastructure so far.

Kevin Rue-Albrecht (12:24:58) (in thread): > You and me both :sweat_smile:

2020-06-21

Aaron Lun (20:07:26): > putting together the talk for BioC 2020 and was wondering what I did over the past year.

Stephanie Hicks (23:54:02): > i am very much looking forward to your talk @Aaron Lun!

Aaron Lun (23:55:29): > there shall be memes.

2020-06-22

Kevin Rue-Albrecht (04:47:35): > YOLO https://yihui.org/en/2019/03/yolo-karl/ https://twitter.com/tam07pb915/status/1101434299369185280 - Attachment (yihui.org): The Implementation of yolo = TRUE in xaringan via yolofy() - Yihui Xie | 谢益辉 > It has been more than two years since Karthik requested the (most famous?) yolo = TRUE feature in xaringan, and I feel amused to see people still having great fun with it. The implementation was … - Attachment (twitter): Attachment > This xaringan thing I’ve been fiddling with has a header setting called yolo, and if you set it to true it randomly inserts a picture of some guy, lol, my sides hurt, lol

Federico Marini (05:09:13): > I do have a git yolo alias btw :smile:

Tim Triche (10:29:57): > does it rebase, merge, or obliterate

Federico Marini (13:36:05): > add -u + commit -m “yolo” + push -f origin master

Tim Triche (13:43:14): > ahahahahah that’s awesome

Rob Patro (13:45:01): > amazeballs

Aaron Lun (13:45:39): > FYI SingleCellExperiment now has rowPairs and colPairs if anyone wants to store row or column pairings.

2020-06-23

Aaron Lun (23:27:13): > My god. Velocyto is not easy to use if you’re not going in with Loom files.

Jared Andrews (23:37:11): > I gave it a half hearted attempt but figured it would alter my results enough down the line that it probably wasn’t worth the effort for what I was doing. Hate re-editing figures because the dimensionality reductions changed slightly.

Aaron Lun (23:37:21): > WHATEVER. I’ll just write everything to a loom file and let them sort it out.

Aaron Lun (23:37:54): > I can’t even figure out how to inject my own PCs.

Aaron Lun (23:43:34): > Well, whatever. I refuse to let their API design choices become my problem.

2020-06-24

Aaron Lun (00:06:34): > @Daniel Van Twisk by happy coincidence, colPairs and rowPairs seem to have almost the same implementation as LoomGraph. So you might be able to just use those bits instead.

Aaron Lun (00:37:43): > OMG i’ve done it. I’ve gotten velocyto to run in velociraptor.

Aaron Lun (00:37:47): > Now that was a journey.

Aaron Lun (00:37:58): > “Don’t stop. Believing.”

Aaron Lun (23:15:00): > Dammit, I just can’t figure out how to use this thing.

Aaron Lun (23:15:15): > You know what? It’s not worth it.

Aaron Lun (23:26:32): > We will also need a function to do the projection of the future states onto an existing embedding. Are people just doing some kind of nearest neighbors?

Tim Triche (23:50:05): > AFAIK that’s exactly what people are doing. Which begs numerous questions

Aaron Lun (23:55:08): > I guess the whole pile is already pretty dodgy, so what’s a bit more going to matter?

2020-06-25

Aaron Lun (00:00:15): > I can imagine at least one way of doing it slightly better, but I’ve got to say, this RNA velocity stuff doesn’t exactly inspire enthusiasm.

Aaron Lun (00:12:29): > Also, I’m observing a 1e-8 stochasticity in the scvelo pseudotime results, no matter how many seeds I set in both Python and R. Have you seen this before @Charlotte Soneson?

Aaron Lun (00:14:23): > Oh. I know what it probably is. Someone’s probably turned on multithreading and we’re seeing the consequence of numerical precision differences depending on how the CPU decides to do things like summation.
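
The effect suspected here is easy to reproduce: floating-point addition is not associative, so a multithreaded reduction that groups the same terms differently can change the last few bits of a sum. A tiny Python illustration with plain floats (no threads involved, just two groupings of the same three numbers):

```python
import math

# Same three terms, two different groupings of the additions.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# The results differ in the last bits even though the inputs are identical.
differs = left_to_right != right_to_left

# Comparisons in tests should therefore use a tolerance, not exact equality.
close_enough = math.isclose(left_to_right, right_to_left, rel_tol=1e-9)
```

Setting seeds cannot remove this kind of discrepancy, because no random number generation is involved; only a fixed summation order (for example, a single thread) would.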

Aaron Lun (00:14:29): > Great. Just great.

Aaron Lun (01:15:47): > Anyway, tests are added. Vignette is added. All we need is a function to map the velocity vectors onto arbitrary embeddings and we’re done.

Aaron Lun (01:16:02): > Well, and for @Luke Zappia to finish off zellkonverter and submit it to BioC.

Charlotte Soneson (01:42:10) (in thread): > In order to ‘project’ the velocities onto an existing embedding, scVelo would calculate a weighted average of the displacement vectors in the embedding, with weights defined by a transition matrix (https://github.com/theislab/scvelo/blob/master/scvelo/tools/velocity_embedding.py), derived from the cosine similarities between estimated velocities and displacement vectors in the original space.
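
A simplified Python sketch of the projection described above, for one cell. This is not scVelo's code: the function names, the exponential kernel with width `sigma`, and the normalization of embedding displacements to unit length are all assumptions made for illustration, and any further corrections scVelo applies are omitted.

```python
import math


def _sub(a, b):
    """Displacement vector pointing from a to b."""
    return [y - x for x, y in zip(a, b)]


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def _cosine(u, v):
    nu, nv = _norm(u), _norm(v)
    if nu == 0 or nv == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def project_velocity(i, expr, velocity, embedding, neighbors, sigma=0.1):
    """Project cell i's velocity onto an embedding (hypothetical helper).

    Weights come from the cosine similarity between the velocity and the
    displacement to each neighbor in the original (expression) space.
    """
    weights = [math.exp(_cosine(velocity, _sub(expr[i], expr[j])) / sigma)
               for j in neighbors]
    total = sum(weights)
    arrow = [0.0] * len(embedding[i])
    for w, j in zip(weights, neighbors):
        step = _sub(embedding[i], embedding[j])
        length = _norm(step) or 1.0  # avoid dividing by zero
        for k, s in enumerate(step):
            # Weighted average of unit displacements in the embedding.
            arrow[k] += (w / total) * s / length
    return arrow
```

Because the arrow is a weighted average over observed neighbors only, it can point only towards cells that exist in the dataset.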

Aaron Lun (01:46:18) (in thread): > Sounds like a really complicated way of just looking for neighbors.

Aaron Lun (01:46:44) (in thread): > it’s still relying on cell-cell transitions so it won’t catch movement to future states where there are no observed cells.

Charlotte Soneson (01:49:01) (in thread): > Not sure what you mean by “just looking for neighbors” here. Neighbors in which space?

Aaron Lun (01:52:43) (in thread): > neighbors of observed cells to the projected future state in the original (gene expression) space.

Aaron Lun (01:58:56) (in thread): > I’d imagine that they’d be doing this projection anyway to construct the transition matrix.

Aaron Lun (01:59:34) (in thread): > In any case, it doesn’t really matter. Let’s just slap together another function that calls the embedding function and we’ll consider the case closed.

Charlotte Soneson (02:05:19) (in thread): > Yes, I think the difference is just between considering nearest neighbors in a Euclidean sense, or instead focus on whether a cell is in the direction indicated by the velocity vector.

Lambda Moses (16:20:49): > The REALLY annoying part about that is that arrows near the end of trajectories within the dataset would point back, because there are no cells beyond the trajectories to compare to.

Aaron Lun (16:21:15): > yes, I was wondering whether exactly that would happen.

Aaron Lun (16:21:39): > OH WELL.

Aaron Lun (16:28:37): > Thinking lightly about it, it’s probably solvable with some approximations, but probably not worth the effort either.

Avi Srivastava (16:28:45): > maybe we can pad imaginary cells near the end-point with some extrapolation.

Aaron Lun (16:30:12): > IMO the best solution is to repeat the UMAP/tSNE with the projected future state of the cells, where we modify the algorithm to give those cells no weight (i.e., they receive weightings from the real cells but they do not contribute back, so they’re kind of “read-only” with respect to the behavior of the algorithm). For the same seed, this will yield identical results to the previous run with the real cells only, but now you’ll have the embedded future states as well.

Aaron Lun (16:30:47): > I thought about doing this for Rtsne because I was editing the C++ code anyway. But then I forgot about it.

Lambda Moses (16:31:34): > Sounds like a good idea

Aaron Lun (16:32:21): > The REAL problem is that I can’t look at Rtsne’s C++ code without being tempted to refactor the entire thing, which will set me back a week.

Aaron Lun (16:32:44): > And obviously, I’m not the maintainer, and Jesse’s a cool guy but there’s still a bit of give and take there.

brian capaldo (17:55:42): > I am sorry, I have to ask, when do you sleep @Aaron Lun?

Aaron Lun (17:56:19): > from about 1 to 7:30 am PDT.

Aaron Lun (17:56:57): > though past about 12 am is really just spent watching call of duty warzone videos.

brian capaldo (18:01:37): > Fair enough. Alright, back to scATAC for me

2020-06-26

Aaron Lun (01:49:20): > Embedding is done.

Tim Triche (13:00:49): > that was fast

Tim Triche (13:03:39) (in thread): > I wonder if Brad’s principal curves embedding would catch this better and allow for something resembling extrapolation without it being complete BS

Aaron Lun (13:05:02): > well, I was just calling scVelo’s function, I didn’t bother doing the fancy thing described above.

Tim Triche (13:07:33): > still, it’s the useful bit – going to use it today in fact

Tim Triche (13:08:31): > there is a quirk with scVelo stochastic vs. dynamical that might be relevant for this, in that dynamical is initialized differently from stochastic, but really they should be initialized the same. need to write up the details about this but it came up in lab journal club when we reviewed the paper

2020-06-27

Aaron Lun (19:48:34): > Some perspectives: https://github.com/LTLA/scRNAseq/issues/15

Jared Andrews (20:57:08): > Oh, had no idea monocle3 uses SCE now.

2020-06-28

Stephanie Hicks (15:05:51): > Thanks for writing out the perspective @Aaron Lun

Tim Triche (20:52:59): > yeah the Big Ball of Mud design pattern gets a lot of mileage within Seurat. Thanks @Aaron Lun for taking a stand. Some of this stuff is ridiculous.

2020-06-29

Lukas Weber (13:41:42): > @Lukas Weber has joined the channel

2020-06-30

Frank Rühle (06:21:15): > @Frank Rühle has joined the channel

2020-07-01

000 (09:00:31): > @000 has joined the channel

2020-07-02

brian capaldo (11:25:29): > Apologies if this is the wrong place for this, but is there a SingleCellExperiment approach for scATAC yet? Getting really sick of having to look up how to access slots every time for CDS and Seurat.

Aaron Lun (13:01:43): > I’ve never tried it, but I bet you could probably just pretend it’s RNA-seq and ram it through the usual pipeline.

Aaron Lun (13:59:31): > I don’t foresee any reason that wouldn’t work, especially if the difficult task of quantifying the reads has already been done.

Tim Triche (14:48:14): > chromVAR produces something like that – we used it for early (Fluidigm) scATAC in an SE

Aaron Lun (15:43:28): > @Luke Zappia I think it’s time.

2020-07-03

Luke Zappia (02:49:50) (in thread): > That’s scarily ominous :ghost:. I assume you mean to submit zellkonverter?

Aaron Lun (04:18:06) (in thread): > damn straight

Aaron Lun (04:18:51) (in thread): > you should add some tests for the new Pairs, but that’s the only thing I can think of.

Luke Zappia (04:43:35) (in thread): > Is it worth waiting until the changes to the submission system have been sorted out?

Aaron Lun (14:53:14) (in thread): > I dunno, you can ask.

Lambda Moses (16:22:49) (in thread): > Then I wonder why Seurat is more popular than SingleCellExperiment

Aaron Lun (16:31:19) (in thread): > For much the same reason that the tidyverse is popular despite being an inferior programming paradigm.

Aaron Lun (16:37:56) (in thread): > I should also add that this popularity is only at the user level. Do a rev-dep count to see how many packages have a hard dependency on Seurat compared to the SCE.

Aaron Lun (16:38:11) (in thread): > (Hard being Imports or Depends.)

2020-07-04

Umar Ahmad (08:20:42): > @Umar Ahmad has joined the channel

2020-07-05

Freeman Wang (22:08:13): > @Freeman Wang has joined the channel

2020-07-06

Aaron Lun (02:54:44) (in thread): > Just do it, I would say.

Aaron Lun (02:55:48): > Need a new package name for some clustering routines that I’m going to spin out of scran.

Aaron Lun (02:57:16): > Probably “sclub”.

Aaron Lun (02:57:28): > Package Title will be “Ain’t no cluster like a single-cell cluster”.

Aaron Lun (03:05:33): > Oh wait. I’ve got it.

Kevin Rue-Albrecht (03:11:16): > How about scut? > Looks like clusters + even a couple of potential pseudotime branches in there - File (PNG): image.png

Kevin Rue-Albrecht (03:12:14) (in thread): > I can somewhat picture a sticker on that theme

Aaron Lun (03:12:27): > Title will be “It’s Clustering Functions, Silly!”

Constantin Ahlmann-Eltze (03:57:31): > Hey@Aaron Lun, you seem to extract a lot of functionality from scran. Is there any chance thecalculateSumFactors()will get its own package? :)

Luke Zappia (05:47:23) (in thread): > So I’ve just had a better look at sceasy (which I should probably have done earlier) and I feel like we have just replicated what they have. Is there a good reason for doing this rather than focusing our efforts in one place?

Tim Triche (12:02:41): > schlub is obviously the right choice. single-cell hierarchical / linearized unsupervised branchpoints

Lambda Moses (13:24:26) (in thread): > Honestly, while I mostly use Seurat for EDA, I’m also quite uncomfortable putting Seurat in “Imports” in my packages. “Suggests” might be more tolerable.

Aaron Lun (22:15:28) (in thread): > (i) I’ve nagged sceasy to get their stuff onto BioC, to no avail. It seems that they are very far off, judging by the state of the repo (no man, no tests, no vignette). > (ii) scope of zellkonverter is limited to SCE/AnnData conversion, and thus avoids excessive dependencies for downstream packages. > (iii) basilisk.

Aaron Lun (22:16:19) (in thread): > I could well imagine that sceasy would depend on zellkonverter, in fact. It’s a matter of who gets there first.

2020-07-07

Pablo Latorre-Doménech (03:16:07): > @Pablo Latorre-Doménech has joined the channel

Luke Zappia (03:50:09) (in thread): > :+1: Those seem like reasonable things to me. I just don’t like duplicating things but I suppose there is enough motivation in this case.

Mehdi Pirooznia (09:25:25): > @Mehdi Pirooznia has joined the channel

Aaron Lun (11:48:48) (in thread): > Also a reminder that 19 is an excellent candidate for a squash merge.

2020-07-08

Luke Zappia (04:29:53) (in thread): > FYI I have some stuff on today and tomorrow but I’m planning on submitting on Friday if there’s anything you want to get in before then.

Charlotte Soneson (10:32:09) (in thread): > This was a warning bell for me: > > To use sceasy ensure the anndata package (version has to be < 0.6.20) is installed

Charlotte Soneson (10:33:37) (in thread): > Btw, do you have an idea why velociraptor will suddenly fail to install for me complaining that > > **** using non-staged installation via StagedInstall field > Error in initialize(value, ...) : object '.AnnDataDependencies' not found > Calls: <Anonymous> ... BasiliskEnvironment -> new -> initialize -> initialize > > I have .AnnDataDependencies in the session even (I thought it might have been a renaming or so): > > > .AnnDataDependencies > [1] "anndata==0.7.3" "h5py==2.10.0" "hdf5==1.10.5" "natsort==7.0.1" > [5] "numpy==1.18.5" "packaging==20.4" "pandas==1.0.4" "scipy==1.4.1" > [9] "sqlite==3.30.1" >

Luke Zappia (10:54:02) (in thread): > No, sorry. I haven’t tried velociraptor yet. I guess you could try reinstalling zellkonverter? It’s definitely called that though…

Charlotte Soneson (10:54:43) (in thread): > Thanks. Just installed zellkonverter now from GitHub, so it should be up-to-date.

Aaron Lun (11:14:30) (in thread): > If you’re roxygenating, you need to manually put in .AnnDataDependencies the first time, because roxygen builds the NAMESPACE by running the source code.

Charlotte Soneson (11:16:19) (in thread): > I’m installing both packages from GitHub. zellkonverter exports .AnnDataDependencies, and velociraptor imports it in the respective NAMESPACEs.

Aaron Lun (11:17:52) (in thread): > then there shouldn’t be any problems.

Charlotte Soneson (11:20:17) (in thread): > Yeah…ok, will continue digging. Thx

Michael Love (15:06:58): > Random comment: i’ve been working with a student who has little R or bioinformatics experience, and never heard of single cell until now, using the OSCA online book, and it is really amazing to see how much can be done so quickly. > > Thanks to all the hard work from this group in making these packages and classes clean, simple to use, and then putting together all the material onto the website. > > I told the student that making a UMAP from a single cell dataset is basically one figure toward a paper so they are on their way :rolling_on_the_floor_laughing:

Tim Triche (15:25:19): > seconding this – three interns have gone from 0 to pretty damn good so far this summer

2020-07-10

Rajesh Shigdel (03:42:39): > @Rajesh Shigdel has joined the channel

Aaron Lun (12:32:19): > @Luke Zappia nice.

Aaron Lun (12:32:33): > But FYI (emphasis mine): > > The mandatory ‘Description’ field should give a comprehensive description of what the package does. One can use several (complete) sentences, but only one paragraph. It should be intelligible to all the intended readership (e.g. for a CRAN package to all CRAN users). **It is good practice not to start with the package name, ‘This package’ or similar.** As with the ‘Title’ field, double quotes should be used for quotations (including titles of books and articles), and single quotes for non-English usage, including names of other packages and external software. This field should also be used for explaining the package name if necessary. URLs should be enclosed in angle brackets, e.g. ‘<https://www.r-project.org>’: see also Specifying URLs. - Attachment (cran.r-project.org): Writing R Extensions > Writing R Extensions

2020-07-11

Aaron Lun (01:58:39): > @Kevin Rue-Albrecht we should polish up anything that needs to be polished up with velociraptor and get it in the system.

2020-07-12

Andrew McDavid (23:15:44): > Is anyone aware of a “right join”-like operator for SingleCellExperiment implemented anywhere? Basically I want to pad my assays and rowData with NAs (eventually to be zeros) for all features in x that are not in y. Use case is for cell type classification using SingleR, with a bunch of experiments where I only have access to filtered data, and invariant (null) genes have been filtered out on a per-experiment basis. In this case, null is highly informative, so taking the intersection of all the common features is not working well.

Aaron Lun (23:17:04): > Well, I don’t think you would do well with filling with zeroes, either, TBH.

Aaron Lun (23:22:33): > SingleR treats lack of expression in zeroes as relevant information, so replacing NAs with zeroes would be making some kind of statement.

Aaron Lun (23:22:49): > Why don’t you just do the annotation on each experiment separately without combining stuff beforehand?

Andrew McDavid (23:26:57): > I have one experiment I am training on, and then want to classify a bunch of other experiments. AFAIK, I have to take the common intersection of everything if I want to have the classifier be stable for all the experiments.

Andrew McDavid (23:28:28): > I am comfortable assuming NA = 0 because of how (I am assuming…) the filtered expression matrices were created. Of course, it would be better to get raw everything, but I am dealing with various tech-challenged collaborators.

2020-07-13

Luke Zappia (02:36:31) (in thread): > BiocCheck was complaining it wasn’t long enough so I edited the description and probably broke this. I’ll try and fix it for the next version.

Aaron Lun (02:57:14): > I wouldn’t worry about the concept of stability, just let SingleR take the intersection of your training dataset and each individual test dataset (but not all at the same time). You’ll still get the annotations; you would only need a common set of genes if you wanted to compare the scores but that doesn’t make much sense anyway. Or, if in some situation it did make sense, and an important marker in one dataset is no longer present in the feature set of another dataset, then filling with zeroes isn’t going to help.

Aaron Lun (03:07:54): > I mean, it would only make sense to fill with zeros if the gene doesn’t exist. Like you’re working with different species or something. Seems a stretch to say that all non-HVGs have all-zero expression.

Aaron Lun (03:26:47): > but anyway, to answer the immediate question, the usual approach is to subset with padded rows and then replace the row identities. > > expanded <- sce[c(1,1,1,1,seq_len(nrow(sce))),] > assay(expanded)[1:4,] <- 0 > # update rowRanges as required. >
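
The padding idea above can be sketched outside of R as well. This is a minimal plain-Python illustration, not SingleCellExperiment or SingleR API: the function name `pad_missing_features` and the toy data are hypothetical, and filling with zero encodes the assumption (debated in this conversation) that filtered genes really were all-zero.

```python
def pad_missing_features(reference_genes, filtered_rows, n_cells, fill=0):
    """Return one row per reference gene, in reference order.

    Genes absent from `filtered_rows` get a row of `fill` values, which
    encodes the assumption that they were dropped for being all-zero.
    """
    padded = {}
    for gene in reference_genes:
        if gene in filtered_rows:
            row = list(filtered_rows[gene])
            if len(row) != n_cells:
                raise ValueError(
                    f"{gene}: expected {n_cells} cells, got {len(row)}"
                )
        else:
            # Gene was filtered out upstream; pad it back in.
            row = [fill] * n_cells
        padded[gene] = row
    return padded


# Hypothetical toy data: gene "B" was filtered out of the test experiment.
train_genes = ["A", "B", "C"]
test_filt = {"A": [1, 2], "C": [0, 3]}
padded = pad_missing_features(train_genes, test_filt, n_cells=2)
print(padded)
```

Keeping the reference order means the padded test data lines up row for row with the training data.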

Genevieve Stein-O’Brien (10:15:02): > @Genevieve Stein-O’Brien has joined the channel

Melanie Loth (10:58:13): > @Melanie Loth has joined the channel

Andrew McDavid (16:25:06) (in thread): > I don’t understand this comment. I have a setting where I am stuck with objects test_filt where test_filt = test[rowSums(assay(test))>0,], say. But both test_filt and train are CellRanger hg19 experiments, from the same organism and mapped to the same reference. So the “missing” rows from test_filt are not unknown or unmeasured, they really were zero but were just filtered upstream of me. If I understand the principles behind SingleR, these genes could be informative, especially if they are markers of some of my cell subsets in train.

Aaron Lun (16:26:10) (in thread): > i thought you said they were filtered as HVGs.

Andrew McDavid (16:27:17) (in thread): > Filtered because they were unexpressed in a particular experiment.

Andrew McDavid (16:28:02) (in thread): > Thanks, that should do the trick…I will report back if this makes a substantive difference.

2020-07-14

Ashu Sethi (15:50:59): > @Ashu Sethi has joined the channel

2020-07-15

Jessica (09:43:57): > @Jessica has joined the channel

Aaron Lun (18:55:35): > If someone’s up for it, one could build a package around fit-SNE with: > > library(basilisk) > > # Set-up: > fit.env <- BasiliskEnvironment('fitsne', pkgname="scran", > packages=c("opentsne==0.4.3")) > basiliskStart(fit.env) > > m <- matrix(rnorm(2000000), ncol=20) # made-up. > oot <- reticulate::import("openTSNE") > stuff <- oot$TSNE(perplexity=30, n_jobs=8L) > system.time(res <- stuff$fit(m)) >

Aaron Lun (19:05:37): > 100 seconds for 100k cells, not bad.

Aaron Lun (19:06:13): > If you’re wondering why I’m using openTSNE instead of the fitsne package, it’s because the latter doesn’t have Windows wheels.

Alan O’C (19:22:32): > Sounds intriguing. I got partway through refactoring the original fitsne C++ but gave up, it was a bit too clever for me.

Aaron Lun (19:23:02): > I wouldn’t have minded… but then they said they were distributing prebuilt DLLs for windows, and I was like, NO.

2020-07-16

wmuehlhaeuser (03:03:15): > @wmuehlhaeuser has joined the channel

Pedro Baldoni (04:09:43): > @Pedro Baldoni has joined the channel

Alan O’C (21:21:34): > Well I came up with a clever package name so now I can’t resist. Also nice that it allows embedding of new points

Aaron Lun (21:22:48): > Don’t see any new repos on your feed

Aaron Lun (21:26:22): > what did you call it?

Alan O’C (21:27:01): > Still fighting to find a nice way to overwrite py_to_r in some cases and not others

Alan O’C (21:27:08): > snifter

Aaron Lun (21:27:29): > What is this for?

Alan O’C (21:28:54): > The py_to_r shenanigans? To handle embedding new data. The TSNE embedding inherits numpy.ndarray

Aaron Lun (21:29:45): > is this on a repo somewhere?

Alan O’C (21:33:06): > Is now https://github.com/Alanocallaghan/snifter

Aaron Lun (21:34:32): > First point is that, inside a package, the reticulate code after basiliskStart should be wrapped in basiliskRun.

Aaron Lun (21:35:27): > Second point is that the BasiliskEnvironment construction should live in basilisk.R, see the vignette for deets.

Aaron Lun (21:37:42): > Third point is that I don’t really see the point of all the top-level S3 stuff, there don’t seem to be other types that are relevant here.

Alan O’C (21:39:40): > Yeah my bad, was based on a 30 second “Can I get this to run” scan of the basilisk docs

Aaron Lun (21:40:05): > I look forward to seeing it when you’re ready.

Alan O’C (22:45:38): > I thought there was a distinction between the return value of fit() and the embedding object, turns out that’s not the case. Seems to play reasonably nicely now dispatching on the python class

2020-07-17

Chitrasen (12:35:00): > @Chitrasen has joined the channel

Aaron Lun (15:29:51): > why is it called snifter, anyway?

Aaron Lun (15:52:09): > @Alan O’C I’m just going to list all the suggestions as issues, easier to keep track of them.

Alan O’C (15:59:25): > Anagram of “R fitsne”, meaning a small drink of liquor

Alan O’C (15:59:38): > I’m biased in that I really like the word

Alan O’C (15:59:54): > Yeah issues make the most sense, thanks!

2020-07-19

Charlotte Soneson (15:25:32) (in thread): > > **** using non-staged installation via StagedInstall field > Error in initialize(value, ...) : object '.AnnDataDependencies' not found > Calls: <Anonymous> ... BasiliskEnvironment -> new -> initialize -> initialize > > Still/again trying to understand why velociraptor fails to install on our server…was just thinking - could it make any difference if BASILISK_USE_SYSTEM_DIR=1? That’s the setup where it’s failing (it installs fine elsewhere, but there are also other differences).

Aaron Lun (16:48:11) (in thread): > oooh. Yes. Try adding a zellkonverter:: to that.

Aaron Lun (17:33:28) (in thread): > The cause is that configure will try to run basilisk.R BEFORE velociraptor is even installed, meaning that it doesn’t ever actually import zellkonverter symbols.

2020-07-20

Dr Awala Fortune O. (02:38:58): > @Dr Awala Fortune O. has joined the channel

Dr Awala Fortune O. (02:39:28): > hello everyone

Charlotte Soneson (04:26:49) (in thread): > Aah yes, that did it :+1:. I added this to the current PR (vignette branch) - which now also passes the GHA checks.

Ting Sun (06:51:06): > @Ting Sun has joined the channel

Jennifer Doering (09:59:45): > @Jennifer Doering has joined the channel

Alexandra Garnham (19:18:56): > @Alexandra Garnham has joined the channel

2020-07-21

Manoj Teltumbade (00:14:41): > @Manoj Teltumbade has joined the channel

James MacDonald (15:48:53): > @James MacDonald has joined the channel

2020-07-22

Anke Busch (03:54:45): > @Anke Busch has joined the channel

Dr Isha Goel (09:34:35): > @Dr Isha Goel has joined the channel

Kurt Showmaker (11:47:33): > @Kurt Showmaker has joined the channel

2020-07-23

Bishoy Wadie (03:03:19): > @Bishoy Wadie has joined the channel

Biljana Stankovic (05:10:48): > @Biljana Stankovic has joined the channel

Mindy (11:29:34): > @Mindy has joined the channel

Aaron Lun (23:58:58): > @Peter Hickey what happened to the cellbench data!?

2020-07-24

Peter Hickey (00:19:00) (in thread): > life?

Peter Hickey (00:19:05) (in thread): > will chase up after bioc

Aaron Lun (00:22:51) (in thread): > yeah i used to have one of those

Peter Hickey (00:31:38) (in thread): > can recommend finding it again. 5 stars

Aaron Lun (00:32:57) (in thread): > probably still in the drawer at wehi.

2020-07-26

Subhajit Dutta (01:04:59): > @Subhajit Dutta has joined the channel

2020-07-27

Isha Goel (07:35:37): > @Isha Goel has joined the channel

CristinaChe (08:42:49): > @CristinaChe has joined the channel

Noor Pratap Singh (08:43:14): > @Noor Pratap Singh has joined the channel

Helen Horkan (08:43:18): > @Helen Horkan has joined the channel

Arun Chavan (12:09:35): > @Arun Chavan has joined the channel

Will Townes (12:28:27): > @Will Townes has joined the channel

Will Townes (12:29:45): > @Will Townes has left the channel

2020-07-28

jackgisby (09:06:37): > @jackgisby has joined the channel

Ray Su (10:53:08): > @Ray Su has joined the channel

Rajiv Kumar Tripathi (16:33:24): > @Rajiv Kumar Tripathi has joined the channel

2020-07-29

Riyue Sunny Bao (17:40:05): > @Riyue Sunny Bao has joined the channel

Brianna Barry (21:33:18): > @Brianna Barry has joined the channel

2020-07-30

Nastasja (03:28:19): > @Nastasja has joined the channel

beyondpie (10:11:48): > @beyondpie has joined the channel

Tim Howes (12:30:10): > @Tim Howes has joined the channel

Ayush Raman (12:42:14): > @Ayush Raman has joined the channel

Hyun-Hwan Jeong (18:44:06): > @Hyun-Hwan Jeong has joined the channel

2020-07-31

sani (09:41:35): > @sani has joined the channel

bogdan tanasa (13:57:31): > @bogdan tanasa has joined the channel

2020-08-01

Nick Borcherding (11:39:57): > @Nick Borcherding has joined the channel

2020-08-02

Antonio Colaprico (00:11:57): > @Antonio Colaprico has joined the channel

2020-08-03

Aaron Lun (03:09:42): > @Philippe Boileau I’m going to try using scPCA for cell cycle effect removal, don’t know if you have any opinions on this.

Philippe Boileau (03:09:47): > @Philippe Boileau has joined the channel

Sunil Nahata (04:08:42): > @Sunil Nahata has joined the channel

Lara Ianov (11:39:20): > @Lara Ianov has joined the channel

Junyan Xu (12:58:19): > @Junyan Xu has joined the channel

Aaron Lun (22:27:43): > I guess he didn’t.

Philippe Boileau (23:20:49): > Hey @Aaron Lun! Sorry for the delay. I think this could be an interesting application of cPCA/scPCA. I’m happy to discuss if you’d like.

Philippe Boileau (23:29:44) (in thread): > I also recommend setting alg = "rand_var_proj" if running scPCA. It’s significantly faster than the other methods.

Aaron Lun (23:31:03) (in thread): > I actually slapped together a proof-of-concept here: https://github.com/Bioconductor/OrchestratingSingleCellAnalysis/pull/35.

Aaron Lun (23:31:20) (in thread): > It’s really just one paragraph mentioning how it could be done, there’s not a lot more than that

Aaron Lun (23:32:34) (in thread): > I need to dig up a better example, though, because the 416B dataset has something else going on that is correlated with the cell cycle, which makes it seem like the cPCA had no effect.

2020-08-04

Philippe Boileau (00:21:21) (in thread): > Oh, awesome! the paragraph looks good to me. I can take a crack at fixing the issues you’ve pointed out over the weekend.

Philippe Boileau (00:22:31) (in thread): > And yeah, the assumption that cPCA and scPCA make about the decomposition of signal and noise in the target and background can be quite restrictive. Finding an appropriate background dataset isn’t always easy.

Philippe Boileau (00:23:18) (in thread): > I’m happy to help look for an example, too.

Aaron Lun (00:56:45) (in thread): > great, that would help a lot. If it goes well, I can see (s)cPCA being a very cool way to tackle that particular problem.

Philippe Boileau (01:44:30) (in thread): > Great, I’ll keep you posted!

Aaron Lun (01:47:18) (in thread): > Also, if you find a nice demonstration dataset, let’s see if we can shove it into scRNAseq.

Philippe Boileau (12:15:22) (in thread): > Sounds good to me.

rohitsatyam102 (14:29:00): > @rohitsatyam102 has joined the channel

Peter Hickey (17:34:14): > anyone know of a package that uses the LinearEmbeddingMatrix class?

Aaron Lun (17:34:46): > I think CoGAPS was using it at some point.

Peter Hickey (17:36:03) (in thread): > indeed https://github.com/FertigLab/CoGAPS/search?q=LinearEmbeddingMatrix&unscoped_q=LinearEmbeddingMatrix

Peter Hickey (17:36:29) (in thread): > guessing you’re not finding it useful/necessary?

Aaron Lun (17:37:18) (in thread): > No, I don’t think it even belongs there. I would love to move it somewhere else.

2020-08-05

Hans-Rudolf Hotz (03:23:29): > @Hans-Rudolf Hotz has joined the channel

shr19818 (13:46:09): > @shr19818 has joined the channel

2020-08-06

Aaron Lun (20:07:09): > yes, that’s probably mine.

2020-08-07

Mikhail Dozmorov (20:02:08): > @Mikhail Dozmorov has joined the channel

2020-08-09

bogdan tanasa (02:13:21): > Dear all, my apologies in advance as I am posting the question below also here in the forum (it is about Monocle3 that is built on SingleCellExperiment ), and if some people find it not-appropriate please forgive me, i am very pressed by time, i really do not intend to offend anyone, and i believe that you have the expertise ; i have not heard yet from the authors of Monocle3) : i would like to understand better please how i could extract the genes that are differentially expressed at SPECIFIC TIME POINTS (on a TRAJECTORY) in SPECIFIC CLUSTERS/PARTITIONS ; for example considering the example in : https://cole-trapnell-lab.github.io/monocle3/docs/differential/ i have done : > > ############################################### > > colData(cds)$clusters = clusters(cds) > colData(cds)$partitions = partitions(cds) > colData(cds)$pseudotime = pseudotime(cds) > > ############################################### > > partition1_cds <- cds[, colData(cds)$partitions == 1] > > cds_subset = partition1_cds > > ############################################### > > subset_pr_test_res <- graph_test(cds_subset, neighbor_graph="principal_graph", cores=4) > > pr_deg_ids <- row.names(subset(subset_pr_test_res, q_value < 0.05)) > > ############################################### > > the question being : how can i find the DE expressed genes along a trajectory associated with partition1, at each time bin : > > unique(colData(cds_subset)$embryo.time.bin) > [1] 330-390 170-210 210-270 > 650 270-330 390-450 450-510 510-580 130-170 > [10] 580-650 > > many thanks ! - Attachment (cole-trapnell-lab.github.io): Monocle 3 > Monocle - A powerful software toolkit for single-cell analysis

2020-08-11

Aaron Lun (12:25:06): > ¯\_(ツ)_/¯

Aaron Lun (12:25:07): > tradeSeq.

bogdan tanasa (12:51:10): > yes, thank you Aaron, i have been talking with the authors of TradeSeq, and they are wonderfully supportive

bogdan tanasa (12:54:28): > i am running both algorithms on our scRNAseq data (for some reasons, the lab that has generated the data was a bit more inclined towards Monocle2/Monocle3). Nevertheless, tradeSeq has been working well, and on the side note, i shall mention that it looks that the area of measuring the differential state across trajectories is open to new algorithms (and questions :)

2020-08-17

Ying Xu (05:25:05): > @Ying Xu has joined the channel

Aaron Lun (19:02:55): > Is anyone from the ArchR dev team here?

2020-08-18

Daniel Baker (09:00:41): > @Daniel Baker has joined the channel

2020-08-20

Avi Srivastava (09:36:25): > https://github.com/k3yavi/SingleCellExperiment do we support rust in bioconductor? :stuck_out_tongue_winking_eye:

Aaron Lun (11:06:45): > it… could be done, with sufficient conda-fu.

Daniel Baker (11:53:06): > @Daniel Baker has left the channel

2020-08-21

Aaron Lun (13:04:04): > @Luke Zappia FYI if you’re going to be uploading a lot of H5AD datasets for zellkonverter testing, in the long term you might consider the approach used by https://github.com/LTLA/DropletTestFiles/; I’m using that for DropletUtils testing so that I don’t have to add all the required files to DropletUtils itself.

Aaron Lun (13:12:51): > big files, small files, multiple versions of the same files; lots of opportunities for some really thorough testing

2020-08-23

FelixErnst (11:19:52): > @FelixErnst has joined the channel

2020-08-24

Evgeniy Rumynskiy (07:12:42): > @Evgeniy Rumynskiy has joined the channel

Aaron Lun (23:55:59): > @Kevin Rue-Albrecht let’s get velociraptor into the submission queue.

2020-08-25

Kevin Rue-Albrecht (08:26:44) (in thread): > I’m looking at it now. I don’t have a clue yet what’s causing the Killed signal on GHA. Any chance we exceed memory allowance?

Kevin Rue-Albrecht (08:51:25) (in thread): > Watching memory usage locally, I saw it shoot above 11 GB.

Kevin Rue-Albrecht (08:52:20) (in thread): > From https://docs.github.com/en/actions/reference/virtual-environments-for-github-hosted-runners#supported-runners-and-hardware-resources > > Each virtual machine has the same hardware resources available. > > 2-core CPU > > 7 GB of RAM memory > > 14 GB of SSD disk space

Kevin Rue-Albrecht (08:57:32) (in thread): > I can only think of downsampling right now, but that’s not very nice to do to the dataset in the vignette

Kasper D. Hansen (10:07:48): > This package is super useful to us

Kevin Rue-Albrecht (10:41:11): > So.. I’ve cleaned up velociraptor as much as possible for submission, that said: > 1. We have 2 issues that fail our GHA: #12 which has been around since we have a vignette (most likely due to exceeding memory allowance on GHA), and #13, a sysreq Python-related issue which popped up when I pushed my unrelated cleanup commits today > 2. I know it’s almost there, but we still need zellkonverter to be accepted and added to bioc-devel, as BiocCheck throws an ERROR for packages not on CRAN or Bioc. > 3. A few BiocCheck warnings that are easy to solve with a bit of @Aaron Lun’s prose, e.g. * WARNING: Description field in the DESCRIPTION file is too concise

Alan O’C (10:43:25) (in thread): > Issue 13 is only with the bioc image, if you use regular ubuntu it’s not a problem. I don’t know why the apt repos are different

Kevin Rue-Albrecht (10:46:23) (in thread): > argh thanks.. do you have/know of a GHA workflow that gives an example workaround?

Alan O’C (10:47:31) (in thread): > Here https://github.com/Alanocallaghan/snifter/blob/master/.github/workflows/R-CMD-check.yaml But the relevant bit is just the ubuntu OS entry: - { os: ubuntu-latest, r: 'release', bioc: 'devel'}

Kevin Rue-Albrecht (10:50:05) (in thread): > Thanks. I’ve hopped between so many GHA workflows over the last year that I’ve lost track of what “best practices” are… I thought the bioc image was supposed to be the closest thing to the bioc build system

Alan O’C (10:52:20) (in thread): > Yeah it’s a bit odd. ideally sysreqs and/or the bioc docker image should get patched but this is at least a workaround for now

Aaron Lun (11:06:10): > 2. should be solved as we’re just waiting for zellkonverter to build on BioC, I believe.

Aaron Lun (11:06:42): > or maybe not, I got confused with snifter.

Aaron Lun (11:07:29) (in thread): > must be because of the densification for the velocity vectors.

Kevin Rue-Albrecht (11:09:58): > Speaking of snifter, @Alan O’C had to run GHA on ubuntu directly instead of the bioc image. That’s my backup plan but I’m checking with Nitesh if we can get the image to work

Kevin Rue-Albrecht (11:17:47) (in thread): > does scvelo run on python 2?

Kevin Rue-Albrecht (11:18:05) (in thread): > nitesh and I both found: https://github.com/r-hub/sysreqsdb/blob/ae2f753971798547b292b1a92f9ced48b4e0cffb/sysreqs/python.json#L7

Aaron Lun (11:18:31) (in thread): > basilisk shouldn’t even be provisioning a python 2 environment

Kevin Rue-Albrecht (11:19:48) (in thread): > I’m not sure exactly how sysreq works out the chain of dependencies, but that’s the only hit I got for python-minimal in that repo

Kevin Rue-Albrecht (11:21:10): > From @Nitesh Turaga > > ok, with regards to sysreqs … here is what I think is happening, > > > > 1. https://packages.ubuntu.com/xenial/python-minimal: python-minimal is from Ubuntu Xenial 16.04 (two releases ago!!! ) … so the sysreqs package is not generating the correct apt-get install <command>. > > 2. https://github.com/r-hub/sysreqsdb/blob/master/platforms/linux-x86_64-ubuntu-gcc.json This is proved by this…. > > 3. It gets the requirement from 16.04 and tries to install it on 20.04 which doesn’t exist. - Attachment (packages.ubuntu.com): Ubuntu – Details of package python-minimal in xenial > minimal subset of the Python language (default version)

Nitesh Turaga (11:25:08): > For what it’s worth, (i’m ok being wrong here) … I think this should be https://github.com/kevinrue/velociraptor/actions/runs/223748967/workflow#L33-L37 > > sudo apt-get install python2-minimal ## not sure if you need this, bioconductor_docker has python2 and python3 installed > sudo apt-get install pandoc-citeproc > sudo apt-get install -y devscripts >

Alan O’C (12:00:01): > Good catch. sysreqs gets python-minimal from this file https://github.com/r-hub/sysreqsdb/blob/ae2f753971798547b292b1a92f9ced48b4e0cffb/sysreqs/python.json

Alan O’C (12:01:28): > I guess you can probably remove sysreqs if using the docker image? Presumably it’s got everything you need

Aaron Lun (12:01:51): > can someone point me to the actual error?

Alan O’C (12:03:03) (in thread): > https://github.com/Alanocallaghan/snifter/runs/957942685?check_suite_focus=true > > Reading state information... > Package python-minimal is not available, but is referred to by another package. > This may mean that the package is missing, has been obsoleted, or > is only available from another source > However the following packages replace it: > python2-minimal >

Aaron Lun (12:03:32) (in thread): > ok, good, it’s not an inherent basilisk problem.

Aaron Lun (12:03:42) (in thread): > that is all I wanted to know.

Alan O’C (12:04:03) (in thread): > Yeah it’s just an apt problem with sysreqs and modern ubuntu

Kevin Rue-Albrecht (12:07:20): > Progress. > Now I run into a new python error: https://github.com/kevinrue/velociraptor/runs/1027268633?check_suite_focus=true#step:8:652 > > ImportError: cannot import name 'concat' from 'anndata' (/github/home/.cache/basilisk/1.1.9/velociraptor-0.99.0/env/lib/python3.7/site-packages/anndata/__init__.py) >

Aaron Lun (12:09:28): > I have even less idea what is going on here.

2020-08-26

Iwona Belczacka (03:54:37): > @Iwona Belczacka has joined the channel

2020-08-27

FelixErnst (04:23:16) (in thread): > I guess the problem is with differences in ubuntu 18.04 and 20.04. python-minimal was renamed to python2-minimal. See https://pkgs.org/search/?q=python-minimal vs. https://pkgs.org/search/?q=python2-minimal. python-minimal doesn’t seem to be available in ubuntu 20.04…
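
The rename described above amounts to a release-dependent lookup. The following is a minimal sketch, assuming (as stated in this thread, not independently verified here) that the switch from python-minimal to python2-minimal falls between Ubuntu 18.04 and 20.04. The helper name `python2_minimal_package` is hypothetical and is not part of sysreqs.

```python
def python2_minimal_package(ubuntu_release):
    """Pick the apt package name for minimal Python 2 on an Ubuntu release.

    `ubuntu_release` is a string such as "18.04" or "20.04".
    """
    major, minor = (int(part) for part in ubuntu_release.split("."))
    # Assumption taken from the thread: renamed between 18.04 and 20.04.
    if (major, minor) >= (20, 4):
        return "python2-minimal"
    return "python-minimal"


for release in ("16.04", "18.04", "20.04"):
    print(release, python2_minimal_package(release))
```

A requirements database keyed on a single package name, with no release-specific override, would produce exactly the failure shown earlier in the thread.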

2020-08-28

Aaron Lun (15:41:29): > 82 dependencies on SCE!

Aaron Lun (15:41:52): > Let’s add one more; @Kevin Rue-Albrecht can you throw velociraptor into the build system?

Aaron Lun (15:42:02): > https://bioconductor.org/packages/devel/bioc/html/zellkonverter.html is now available - Attachment (Bioconductor): zellkonverter (development version) > Provides methods to convert between Python AnnData objects and SingleCellExperiment objects. These are primarily intended for use by downstream Bioconductor packages that wrap Python methods for single-cell data analysis. It also includes functions to read and write H5AD files used for saving AnnData objects to disk.

Kevin Rue-Albrecht (16:03:54): > have you had a pass at the package? i’ve bumped up version etc, but GHA still doesn’t pass check (and thus doesn’t get to BiocCheck either)

Aaron Lun (16:04:22): > does it pass locally?

Kevin Rue-Albrecht (16:04:51): > last time I checked, yes, i’m trying again now

Kevin Rue-Albrecht (16:07:28): > > * WARNING: Description field in the DESCRIPTION file is too concise >

Kevin Rue-Albrecht (16:08:05): > either throw me another sentence here, or otherwise, i just kicked off branch bioccheck

Kevin Rue-Albrecht (16:09:32): > uh oh… now i’m getting the same error during check as the GHA > > E> Quitting from lines 64-67 (userguide.Rmd) > E> Error: processing vignette 'userguide.Rmd' failed with diagnostics: > E> ImportError: cannot import name 'concat' from 'anndata' (/Users/kevin/Library/Caches/basilisk/1.1.9/velociraptor-0.99.0/env/lib/python3.7/site-packages/anndata/__init__.py) >

Kevin Rue-Albrecht (16:10:11) (in thread): > I guess it’s concatenate now https://anndata.readthedocs.io/en/stable/anndata.AnnData.concatenate.html Not sure yet where to fix that

Kevin Rue-Albrecht (16:12:11) (in thread): > though https://github.com/theislab/anndata/blob/77a633cc825b428d02dabdf75491841c3a175a8d/anndata/tests/test_concatenate.py#L16

Kevin Rue-Albrecht (16:14:01) (in thread): > perhaps time to bump to scvelo==0.2.2

Kevin Rue-Albrecht (16:18:24) (in thread): > hm.. doesn’t seem to help either

Kevin Rue-Albrecht (16:39:30) (in thread): > See https://anndata.readthedocs.io/en/latest/#id1

Kevin Rue-Albrecht (16:41:57) (in thread): > for some reason conda still installs version 0.7.3

Kevin Rue-Albrecht (16:42:07) (in thread): > probably just needs another while

Kevin Rue-Albrecht (16:50:06) (in thread): > argh no I understand now, zellkonverter::.AnnDataDependencies declares anndata==0.7.3
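
The mismatch identified above, where an upstream list pins one version while a downstream package needs another, can be detected mechanically. This is a minimal sketch in plain Python: `find_pin_conflicts` is a hypothetical helper (not part of basilisk or zellkonverter), and the `wanted` list is an illustrative combination of versions mentioned in this thread rather than any package's actual requirement list.

```python
def find_pin_conflicts(base_pins, wanted_pins):
    """Return {package: (base_version, wanted_version)} for disagreeing pins.

    Both arguments are lists of "name==version" strings.
    """
    def to_dict(pins):
        return dict(pin.split("==", 1) for pin in pins)

    base, wanted = to_dict(base_pins), to_dict(wanted_pins)
    return {
        name: (base[name], version)
        for name, version in wanted.items()
        if name in base and base[name] != version
    }


# Subset of the pins printed earlier in the thread.
base = ["anndata==0.7.3", "h5py==2.10.0", "numpy==1.18.5"]
# Illustrative downstream wishes (assumed for this example).
wanted = ["anndata==0.7.4", "scvelo==0.2.2", "numpy==1.18.5"]
conflicts = find_pin_conflicts(base, wanted)
print(conflicts)
```

Packages that appear in only one list are ignored, since those cannot conflict.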

Kevin Rue-Albrecht (16:51:46): > anyway.. bottom line is that there are a few bugs to fix before submitting to bioc i think

Kevin Rue-Albrecht (16:52:44) (in thread): > alright, it seems that manually running conda install anndata==0.7.4 and then running R CMD check again (locally) gets the vignette to pass again (locally)

Aaron Lun (16:56:07) (in thread): > Is this a problem with zellkonverter?@Luke Zappia?

Kevin Rue-Albrecht (16:56:22) (in thread): > yep, i’m sending in a PR in 3…2…1

Aaron Lun (16:57:56) (in thread): > How did zellkonverter pass its R CMD Check, then?

Kevin Rue-Albrecht (17:01:38) (in thread): > I don’t know.

Kevin Rue-Albrecht (17:01:53) (in thread): > How do you manage (delete) basilisk environments?

Aaron Lun (17:02:40) (in thread): > `basilisk::clearObsoleteDir` should do the job, though this is usually handled automatically.

Kevin Rue-Albrecht (17:02:41) (in thread): > To clear my slate, I just `rm -rf`’ed /Users/kevin/Library/Caches/basilisk/

Aaron Lun (17:02:54) (in thread): > Or `basilisk::clearExternalDir()`.

Aaron Lun (17:03:01) (in thread): > You can also just delete it for a single package.

Kevin Rue-Albrecht (17:27:17) (in thread): > I can confirm: https://github.com/theislab/zellkonverter/pull/21 fixes it for me (locally)

Kevin Rue-Albrecht (17:29:31) (in thread): > R CMD check is then happy with 0E 0W 2N

Aaron Lun (18:55:15) (in thread): > I can’t even run velociraptor locally, I’m having trouble with pip for numba.

Aaron Lun (18:58:07) (in thread): > We should nail down a couple of the `pip` packages into `conda`: > > cycler, pillow, kiwisolver, matplotlib, click, numpy-groupies, llvmlite, numba, loompy, stdlib-list, sinfo, seaborn, setuptools-scm, numexpr, tables, threadpoolctl, joblib, scikit-learn, decorator, networkx, umap-learn, patsy, statsmodels, tqdm, get-version, legacy-api-wrap, scanpy > > We need to nail down as many dependencies as possible.

Aaron Lun (19:04:20) (in thread): > I suspect the failure arose from an update in one of these anndata-dependent packages that wasn’t there when we first set up the app; these need to be pinned down to avoid surprises like this.

Aaron Lun (19:08:49) (in thread): > pip packages escape the conda package manager, so we can’t rely on conda to do this for us.

Aaron Lun (19:13:02) (in thread): > I also realized that they just uploaded up-to-date versions of scvelo in the past 2 months to conda; we should use these directly, though I’ll have to figure out how to include the bioconda channel.

Aaron Lun (19:53:54) (in thread): > Added support to specify additional channels - check out the advice in `?setupBasiliskEnv` about versioning.

Aaron Lun (19:53:57) (in thread): > 1.1.10.

Aaron Lun (20:13:03) (in thread): > So, `BasiliskEnvironment()` where `scvelo==0.2.2` is in `packages=` and `channels=c("bioconda", "conda-forge")`

Aaron Lun (20:13:32) (in thread): > Plus as many of the other packages as you can, check out `?setupBasiliskEnv` for some recommendations.

2020-09-04

Goutham Atla (08:24:54): > @Goutham Atla has joined the channel

2020-09-07

Tyrone Chen (20:58:36): > @Tyrone Chen has joined the channel

2020-09-09

Aaron Lun (22:22:35): > @Kevin Rue-Albrecht we should be in position. See my previous comments and try pulling down from bioconda with the `channels=` argument.

2020-09-10

Kevin Rue-Albrecht (09:44:50) (in thread): > I’ve successfully passed R CMD check locally using simply
>
>     velo.env <- BasiliskEnvironment("env", "velociraptor",
>         packages="scvelo==0.2.2", channels = c("bioconda"))
>
> but
>
> > as many of the other packages
>
> then I checked which versions of all the other packages in `zellkonverter::.AnnDataDependencies` were installed in the environment above, and pinned down those versions as an internal variable `.scvelo_dependencies`
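
A minimal sketch of the pinning approach described in this thread, assuming basilisk 1.1.10 or later (the version that added `channels=`). The `pin()` helper is hypothetical and only builds `name==version` strings; the only versions used are the two mentioned in the thread, so a real `.scvelo_dependencies` would be considerably longer.

```r
# Hypothetical helper (not part of basilisk): turn a named character vector
# of versions into conda-style "name==version" pins.
pin <- function(versions) {
    stopifnot(!is.null(names(versions)), all(nzchar(names(versions))))
    paste0(names(versions), "==", versions)
}

# Only versions quoted in the thread; the full dependency list is not shown here.
.scvelo_dependencies <- pin(c(anndata = "0.7.4"))

# Construct the environment definition only if basilisk is installed.
if (requireNamespace("basilisk", quietly = TRUE)) {
    velo.env <- basilisk::BasiliskEnvironment(
        envname = "env", pkgname = "velociraptor",
        packages = c("scvelo==0.2.2", .scvelo_dependencies),
        channels = c("bioconda", "conda-forge")
    )
}
```

Keeping the pins in a named vector makes it easier to diff them against what ends up installed in the environment.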

Kevin Rue-Albrecht (10:35:23): > On the bright side, we’re back to failing because the GHA exceeds the runner memory limit. Everything runs fine for me locally. https://github.com/kevinrue/velociraptor/runs/1096740834?check_suite_focus=true

Kevin Rue-Albrecht (10:36:28): > How about you run R CMD check on your side, edit `basilisk.R` exactly how you prefer to see it, and then we’ll merge and submit?

Aaron Lun (11:10:52) (in thread): > yup

Aaron Lun (11:22:52): > we should probably trim the test dataset to avoid blowing out to 11 GBs, that’s crazy for a vignette run.

Aaron Lun (11:23:20): > The dataset isn’t even that big, it’s only a few thousand cells. Would have expected ~4 GB at most.

Aaron Lun (11:24:20): > oh yeah, you can get rid of the `Remotes:` in the `DESCRIPTION`.

Kevin Rue-Albrecht (12:22:26): > Not sure where you see the `Remotes`, master branch? Latest progress is on `bioccheck`

Kevin Rue-Albrecht (12:30:06): > As a naively simple workaround, I’m just checking if removing the `velo.out[2-4]` objects keeps memory usage within the GHA limit. Otherwise I’ll let you downsample the object however you see fit.

Kevin Rue-Albrecht (12:30:30): > https://github.com/kevinrue/velociraptor/runs/1097552051?check_suite_focus=true

Aaron Lun (12:30:33): > just take the first 500 cells or something

Aaron Lun (12:30:37): > not a big deal.

Kevin Rue-Albrecht (14:45:41): > are we good bossman? @Aaron Lun - File (PNG): image.png

Aaron Lun (14:45:52): > I dunno, are we?

Kevin Rue-Albrecht (14:46:27): > 500 cells got it passed. I’m checking what else needs doing before submission

Kevin Rue-Albrecht (14:46:33): > it’s been a while

Aaron Lun (23:57:21): > looks good

Aaron Lun (23:57:45): > 500 cells gives a crappy looking plot, but we’ll have something better looking in the chapter.

2020-09-11

Kevin Rue-Albrecht (05:57:49): > https://github.com/Bioconductor/Contributions/issues/1632 :tada:

Kevin Rue-Albrecht (05:58:46): > #randomfact about 1632 > > September 1 – Battle of Castelnaudary: A rebellion against French king Louis XIII is crushed. The leader of the rebellion, Gaston, Duke of Orléans, the brother of Louis XIII, surrenders.

Aaron Lun (11:17:49): > um.

Aaron Lun (11:18:07): > You could have a random fact poster every time someone opens an issue on contributions.

Charlotte Soneson (11:19:09): > I look forward to when it starts predicting the future in 400 submissions or so

Kevin Rue-Albrecht (11:45:55): > I’d be curious to see a machine learning attempt at predicting the likelihood of acceptance for package submissions based on information parsed from the past 1571+ closed submissions ;)

Luke Zappia (11:47:13) (in thread): > High chance it would just predict based on whether or not Aaron is an author :stuck_out_tongue_closed_eyes:

Kevin Rue-Albrecht (11:47:40) (in thread): > probably also a predictor of how fast the package is accepted :stuck_out_tongue:

Kevin Rue-Albrecht (11:48:21) (in thread): > I had a more ‘serious’ hypothesis: e.g., the number of occurrences of `test_that` in the package positively correlates with likelihood of acceptance

Jacob Morrison (11:49:04): > @Jacob Morrison has left the channel

Luke Zappia (11:49:33) (in thread): > There are probably a bunch of small things like that which are good signs

Kevin Rue-Albrecht (11:52:25) (in thread): > it’d be fun to learn them :slightly_smiling_face: hopefully it won’t encourage “acceptance-hacking” with people copy pasting thousands of `testthat::expect_identical(1, 1)` :stuck_out_tongue:

Matt Stone (12:48:56): > @Matt Stone has joined the channel

Hervé Pagès (14:07:05) (in thread): > 2032: return to Earth orbit of 3rd stage of Apollo 12 Saturn V, 63 years after its launch: https://en.wikipedia.org/wiki/2032

Aaron Lun (15:23:32): > looks like we just need a `.Rbuildignore` on your project files.

2020-09-12

Aaron Lun (01:55:42): > Note that `listPackages(velociraptor:::velo.env)` is still showing quite a few packages (~70) that probably need to be explicitly listed

Aaron Lun (01:56:14): > (If you use basilisk 1.1.16 you can just copy-paste most of them into `basilisk.R`.)

2020-09-14

Ilir Sheraj (06:24:01): > @Ilir Sheraj has joined the channel

2020-09-21

Chris Cheshire (03:38:07): > @Chris Cheshire has joined the channel

2020-09-24

Ludwig Geistlinger (15:01:37): > @Davide Risso @Aaron Lun Question about a potential addition to `scRNAseq`: > > We have five 10x Genomics datasets published alongside our recent Cancer Research publication. We did a number of OSCA analyses (preproc, QC, cell cycle, SingleR, …) and other analyses (slingshot, monocle3, inferCNV, consensusOV, …), with results being annotated as `colData` and `rowData` to the 5 SCEs. > > We deposited the raw data at GEO, and folks interested in reproducing or using certain parts of the processed data / results could create them from scratch following the vignettes. But certain steps are rather time-consuming (hours / days depending on where you compute) and not everyone might be able or willing to do that. > > Now I thought maybe it’s best / most convenient to make the fully processed and annotated SCEs available via ExperimentHub, and that `scRNAseq` actually already provides a systematic framework for that. However, the vignette section on adding new datasets seems to generally discourage adding 10x datasets to `scRNAseq`? Is an independent data package the better choice?

Aaron Lun (15:06:56): > That particular section got removed.

Aaron Lun (15:07:21): > Though in your case, it would seem to make more sense to have a dedicated data package.

Aaron Lun (15:07:34): > scRNAseq is really an orphanage for datasets that have no real home or relation to each other.

Aaron Lun (15:08:57): > If it were an scRNAseq package, you would get exactly one function to use to handle all the different objects and options, so that might be too restrictive.

Ludwig Geistlinger (15:15:15) (in thread): > If it’s my call, I’m really just searching for a clean post-publication way of conveniently serving the fully processed+annotated datasets. A single function returning the SCE(s) would thus be sufficient for me, and if possible I would actually like to avoid setting up a new package, and re-use existing infrastructure (that you already have in place) where possible.

2020-09-25

Kasper D. Hansen (05:56:38): > Best practices for this would be pretty nice. By “this” I mean that you want to bundle and release data files for analysis for other users, and where data files are data where it’s often the same data processed at various stages (to avoid comp. overhead but also to enhance repro. by providing the result so to speak). In principle, as long as the datasets are not enormous, it should be possible to host them somewhere and expose them easily to end users.

Kasper D. Hansen (05:57:27): > This is one of the use cases where we don’t necessarily want to store it on the Bioc-paid-for S3 bucket but want to be able to have the data on some server and have people write a package that can suck it down somehow.

Vince Carey (08:15:02): > Any thoughts about a requestor-pays google cloud storage bucket? To foster experimentation, the Bioc Foundation has an AWS S3 bucket where we could put a few of Ludwig’s files (?) for folks to demonstrate alternative approaches. I’d place some examples in there. I think we could set up a requestor-pays GCS bucket too and then develop some estimates on costs to acquire. We’d also want to get information on “analyzing and reducing” “in the cloud”, which should not be so hard these days, and might be cheaper than downloading large files.

Vince Carey (08:15:39): > This sounds like it COULD be a #bigdata-rep entry?

Vince Carey (08:16:48): > @BJ Stubbs any thoughts about tracking costs for requestor pays events?

BJ Stubbs (08:16:53): > @BJ Stubbs has joined the channel

Kasper D. Hansen (08:28:25): > Well, IMO (and I acknowledge I am pushing my own agenda) we want the ecosystem to be completely indifferent wrt. where the data is hosted. I think it’s up to the supplier to decide what they want to do. We could have an option for a requester-pays bucket, but I don’t think we want to force people to use this

Kasper D. Hansen (08:29:08): > The main criticism of this suggestion has historically been that if people can host wherever, we have no ability to version or to guarantee reproducibility.

Kasper D. Hansen (08:29:39): > Which is true. But IMO this is the case of the perfect becoming the enemy of the good.

Kasper D. Hansen (08:30:12): > We could build in a system where a package contains an md5 hash of the remote resource and verifies the hash upon download so at least we know it’s the same.
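
A minimal sketch of the hash check suggested above, using base R’s `tools::md5sum()`. The `verify_md5()` helper is hypothetical, and a temporary local file stands in for the downloaded resource; in a real package the recorded hash would be a constant shipped with the package and the file would come from the remote host.

```r
# Hypothetical helper: stop unless the file at `path` has the MD5 hash
# that was recorded (e.g. as a constant inside the package).
verify_md5 <- function(path, expected_md5) {
    observed <- unname(tools::md5sum(path))  # NA if the file is missing
    if (!identical(observed, expected_md5)) {
        stop("MD5 mismatch for ", path, ": expected ", expected_md5,
             ", got ", observed)
    }
    invisible(TRUE)
}

# Stand-in for a downloaded resource.
path <- tempfile(fileext = ".txt")
writeLines("example payload", path)
recorded <- unname(tools::md5sum(path))

verify_md5(path, recorded)  # passes silently

# Simulate the remote file changing after the hash was recorded.
writeLines("something else", path)
changed <- tryCatch(verify_md5(path, recorded), error = function(e) "mismatch")
```

This only detects that the content changed; it does not address the availability concern raised later in the discussion.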

Martin Morgan (08:49:08): > As Kasper notes the Hubs already allow redirection, so it’s a policy decision about allowing ‘third party’ hosting. I know we all have best intentions, but really I think this would lead rapidly to dead data links and hence broken packages. Also I don’t think it’s our role [or the individual study authors] to host large primary data sets – there are other repositories for that. > > If the policy were to change, then some effort should be invested in making sure, programmatically, that the data remain available and, when no longer available, the dependent packages (and their dependencies…) are deprecated / removed.

2020-09-28

Qirong Lin (06:53:58): > @Qirong Lin has joined the channel

2020-09-29

Kasper D. Hansen (05:44:16): > @Martin Morgan I’ll add some observations. There is an implicit tension between “we do not want to host large resources” (which I agree with) and issues with data decay.

Kasper D. Hansen (05:49:15): > But for several repositories we can’t just add whatever we want. And we cannot put up whatever file format we want. And I think there is a lot of value in both lightly and heavily curated (cleaned) datasets, and we all share that.

Kasper D. Hansen (05:52:25): > For example, with the curatedMetagenomicData package, the authors did a huge service to the community, and Bioconductor essentially hosts the data in ExperimentHub (right? Am I misunderstanding something here?). It seems to me that some of the work @Aaron Lun is doing with exposing scRNA is the same (perhaps less curation): an important dataset gets added to ExperimentHub and has support through a specific R package. But how can we make this easier (perhaps it is easy enough) and also, do we (the project) want to continue to host all of this?

Kasper D. Hansen (05:54:13): > I am advocating for the model where a package supports (multiple) datasets which are hosted outside of our S3 data bucket. Perhaps this should not be ExperimentHub but it should have a similar interface etc.

Kasper D. Hansen (05:54:49): > From a package perspective, being able to query resources and tell the user if the resources are available right now seems sensible and something we should make easy. How that factors into a deprecation decision is not clear to me.

Kasper D. Hansen (05:57:02): > Anyway, we’re going to have some discussion of this when we submit recount3 (very soon), since it is very related.

Tim Triche (08:34:17): > Will be quite interested in your take. The full scNMT data, for example, is huge, and it was just a pilot. The single-cell clinical trial correlatives are going to be obscene. Huge waste of money for NIH if they just need to be reprocessed out of GDC each time.

Martin Morgan (08:55:38): > I’m fine with / would encourage the recount data to be hosted outside ExperimentHub, but to be accessible via EH and an ExperimentHub package. I guess the size makes data duplication intimidating, and somehow the recount people have ‘credibility’ with the project; but this seems ad hoc rather than policy-driven.

2020-09-30

Vince Carey (06:10:21): > https://www.nature.com/sdata/policies/repositories#life enumerates data repositories. Towards the bottom of the page there are “Generalist repositories” with costs and size limits listed. Metadata about repositories are collected at re3data.org and FAIRshare.org. Bioconductor has entries in both of these. To store at dryad would cost about $5000/TB without any additional arrangement; see https://datadryad.org/stash/our_community#institutional for the membership concept leading to a fee waiver. To get fees waived the foundation could try to obtain a membership. I think we’ll have to form a working group to get clear on storage and access to large curated data in the Bioc framework.

Robin Zhou (16:49:41): > @Robin Zhou has joined the channel

2020-10-01

Aaron Lun (02:47:01): > Anyway, to cycle back to the original question for @Ludwig Geistlinger. If you have an official place to pull out the components of the SCE (usually GEO or ArrayExpress), `scRNAseq` could be a potential host. The policy - at least for this package - is that I need to be able to see and re-run the code to generate (the components of) the SCEs. There are some situations where I have pulled content from lab-hosted servers, but I haven’t been too happy about doing that. > > However, if we put aside the technical issues, it really does seem like you would get a lot more value from hosting it in a dedicated package, like Charlotte’s `DuoClustering2018` or Jonny’s `MouseGastrulationData`. You would have more control over the contents and interface, and you could update things on your own schedule.

Ludwig Geistlinger (08:05:37): > Thanks for these suggestions, Aaron. Following also the above discussion, I think I have a better idea of what’s currently possible, and will look into what’s involved for setting up a dedicated package, going beyond just pulling the fully processed + annotated SCEs serialized as RDS from some host site as currently done in the github vignettes.

2020-10-04

Aedin Culhane (02:39:22): > I brought down some devel and am having fun… with SingleCellExperiment, DelayedArray and matrixStats
>
>     class: SingleCellExperiment
>     dim: 27629 26945
>     metadata(0):
>     assays(1): counts
>     Error in matrixStats::rowRanges(x, …) :
>       Argument ‘x’ must be a matrix or a vector.

Aaron Lun (05:22:16): > see comments in#bigdata-rep.

Aedin Culhane (12:36:08): > Thanks

Aedin Culhane (12:39:54): > I am in the lucky situation of having multiple technical replicates with 98% similar barcodes and also biological replicates. However, what is the most efficient way to store these? If I concatenate these into one sce, there are duplicated barcodes. Should they be multiple sce in the one SingleCellExperiment? I have HTO data in the altExp slot. Each observation has between 5,500-6,000 cells (barcodes)

Aedin Culhane (12:41:14): > Are there nice tools in any pkg for analysis of replicates or that can take advantage of the replicates?

Tim Triche (12:46:24): > why not unique-ify the barcodes (-1, -2, -3, whatever) and store a column for the actual barcode in the `colData`
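
A minimal sketch of the unique-ifying idea above, using a plain `data.frame` as a stand-in for `colData(sce)`. The barcodes and the `rep_id` column are made-up illustration values, not data from this thread.

```r
# Toy barcodes: two technical replicates that share most barcodes.
barcode <- c("AAAC", "AAAG", "AACT", "AAAC", "AAAG", "AATT")
rep_id  <- c(1, 1, 1, 2, 2, 2)

# Unique column names are "<barcode>-<replicate>"; the original barcode
# is kept as its own column so it can still be matched across replicates.
cd <- data.frame(barcode = barcode, rep_id = rep_id,
                 row.names = paste0(barcode, "-", rep_id))

rownames(cd)
```

With a real SingleCellExperiment the same pattern would presumably be `sce$barcode <- colnames(sce)` followed by assigning the suffixed names to `colnames(sce)`.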

Tim Triche (12:46:41): > alternatively collapse them to droplet/cell barcode like BAP2 does for scATAC

Tim Triche (12:46:52): > not that I’ve had to do this repeatedly lately or anything :wink:

Tim Triche (12:47:31): > (bap2 has its own problems, btw; it routinely shits the bed even on nodes with 384GB of RAM and 40 cores)

Tim Triche (12:48:04): > (but the theoretical aspect of “multiple bead barcodes for the same cell” is sound)

Tim Triche (12:48:33): > you can do this with either tags (in the BAM) or metadata

Tim Triche (12:48:41): > I use tags most of the time these days

Aedin Culhane (12:48:49): > Thanks Tim I considered this, but it messes with the HTO mapping. Unless i repeat the HTO matrix, which seems a waste

Aedin Culhane (12:49:11): > It would be better to store as an array or multi-exo?

Tim Triche (12:49:12): > one way or another you’re going to have to map those features

Tim Triche (12:49:53): > `altExps` IIRC – that’s what they’re for

Tim Triche (12:50:15): > going to punt to @Aaron Lun on this though

Tim Triche (12:51:10): > the fact of the matter is that droplets are many-to-many cell-to-bead in the general case

Aedin Culhane (12:51:11): > I am using altExps for the HTO matrix (5 x 6.00)… Then I have 8 sce datasets of 25,000 x 6000 +/- 500 barcodes

Tim Triche (12:51:38): > I’m not sure I’m following here

Tim Triche (12:52:00): > how did you end up with library tech reps from the same ADT-barcoded cells with just one ADT for each

Tim Triche (12:52:29): > (many-to-many relationship holds even more so in this case, it seems)

Tim Triche (12:53:49): > (one ADT per Ab per cell, presumably, but multiple runs of the same UMIs or CBs?)

Aedin Culhane (12:54:08): > It was a complex design.. the cells were labelled and pooled, they were run on 2 lanes.

Tim Triche (12:54:32): > oh ok – so how many tech vs. biological reps per assay per cell?

Aedin Culhane (12:54:48): > It’s actually a nice dataset for playing with methods though :wink:

Tim Triche (12:54:52): > I believe it

Tim Triche (12:55:21): > it sounds like the data really lives in a sparse tensor for at least one of the assays then?

Tim Triche (12:55:40): > e.g. [feature, cell, lane]

Tim Triche (12:56:04): > because that would be fun for factorizing

Tim Triche (12:56:33): > not sure how many people run tech*biological reps across lanes :slightly_smiling_face:

Aedin Culhane (12:56:44): > Yup… my plan.. just working out the most efficient way to store the data. For corral we play with MAE, but its not ideal. There are 2 biological replicates A, B and for each 4 technical replicates. Each is a pool of cells, labelled with HTO

Aedin Culhane (12:57:11): > Yes,.. I am surprised by the design.. but secretly excited that I get to play with it :wink:

Tim Triche (12:58:30): > are they hashtagged, ADT’ed, or… ?

Tim Triche (13:00:12): > in any event if one can index an additional dimension in a DelayedArray then the mRNA UMI count can live in a sparse tensor. As an added benefit one could just sum or median/MAD the reps for a biologically unique cell if need be
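
A minimal sketch of the [feature, cell, lane] idea above, using a small dense base R `array` as a stand-in. It is neither sparse nor a DelayedArray, and the dimensions and counts are made up; it only illustrates collapsing the replicate dimension by sum or median.

```r
set.seed(7)

# Toy counts indexed as [feature, cell, lane].
arr <- array(rpois(2 * 3 * 4, lambda = 2), dim = c(2, 3, 4),
             dimnames = list(paste0("gene", 1:2),
                             paste0("cell", 1:3),
                             paste0("lane", 1:4)))

# Collapse the lane (replicate) dimension for each feature/cell pair.
summed  <- apply(arr, c(1, 2), sum)
medians <- apply(arr, c(1, 2), median)
```

Both results are ordinary feature-by-cell matrices, so they could be stored as assays in a single SCE.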

Aedin Culhane (13:00:31): > Yes Antibody tags (cite-seq,,, HTO protocol)

Tim Triche (13:00:37): > not sure if SingleCellExperiment knows how to index into Z/k though

Tim Triche (13:00:55): > Would be ideal if the thing could be sparse and also HDF5 backed. Not sure about that either.

Tim Triche (13:01:15): > Herve or Aaron would know these things. I’m afraid I don’t

Aedin Culhane (13:01:47): > Thats your expertise…

Tim Triche (13:02:14): > indeed, my expertise is lacking

Tim Triche (13:03:24): > hmmm

Tim Triche (13:03:50): > did you play with `rowPairs` and `colPairs` in SCE? That could potentially address some of these issues

Tim Triche (13:04:51): > they’re supposed to be for (e.g.) kNN graphs etc. but seem like they might be abuse-able for your needs too

Aedin Culhane (13:05:29): > no I’ll look at them

Tim Triche (13:05:33): > and can be sparse, which is nice since presumably your replicate graph is sparse w/r/t cells :slightly_smiling_face:

Aedin Culhane (13:05:42): > yes…

Aedin Culhane (13:06:01): > data is dgCMatrix

Tim Triche (13:07:05): > it’s tempting to think of each tech rep as an altExp but that doesn’t really make sense unless they are to be normalized per-lane or some such

Aedin Culhane (13:07:48): > So we can align etc.. for now I’ll use our corralm MAE hack… but would love to work on the most efficient way to do this https://bioconductor.org/packages/devel/bioc/vignettes/corral/inst/doc/corralm_alignment.html

Tim Triche (13:08:38): > `corral` looks super handy, I didn’t know about it before. Especially if spike-in standards (cells/template) are involved, it looks incredibly powerful

Tim Triche (13:08:56): > the lack of independence between library size and cell type or cell state breaks so many bulk assumptions it’s ridiculous

Tim Triche (13:09:08): > and extends to other assays besides mRNA

Tim Triche (13:09:33): > being able to load up corral with actual controls seems absurdly powerful

Aedin Culhane (13:11:06): > Yes.. agree.. so many assumptions we made on bulk are problematic… we sat on a paper for years (Schwede et al., paper) that showed that molecular types reflect cell composition, which can be dependent on tissue sampling location…. I think there will be many such finds when we fully get our heads around the potential of sce (beyond estimation of cell lineage)

Tim Triche (13:11:18): > (e.g. the amount of accessible chromatin and mRNA copy number in a cell with half the SMARCB1 dosage or active EWS-FLI1 is in no way comparable to a full-dosage or KD cell… all the sophistry in the world can’t conceal that)

Aedin Culhane (13:11:57): > … Tim, let’s zoom during the week.. we are going off on a tangent for the channel :wink:

Tim Triche (13:12:00): > similarly, anyone who has tried to extract mRNA from naive B/T cells or neutrophils knows full well that they are much harder to get good libraries from than activated T cells or monocytes

Tim Triche (13:12:03): > you got it

Tim Triche (13:12:13): > but it’s not really off topic for SCE – SCE is capable of handling this

Tim Triche (13:12:23): > (in my defense, that’s why I like SCE)

Tim Triche (13:12:51): > SCE was designed to have spike-in payloads and colPairs can be abused to capture spike-in cells as library size factor standards.

Tim Triche (13:13:12): > Incidentally, if you’re using corral with MAEs of SCEs, we should talk about the full scNMT dataset

Tim Triche (13:13:33): > I reprocessed the damned thing and broke a lot of HDF5 / SE machinery in the process. But it’s done now.

Aedin Culhane (13:13:36): > splitAsList(sce, sce$batch) would be handy (into MAE)

Tim Triche (13:14:42): > `colLabels` and `splitAsList` could make a nice combo

Aedin Culhane (13:14:44): > Yup we broke MAE too

Tim Triche (13:14:57): > heheheheh I’m delighted that it’s not just me

Tim Triche (13:15:00): > breaking everything in sight

Tim Triche (13:15:08): > SCE is sturdy though

Tim Triche (13:15:54): > @Aaron Lun said long ago that the reason for NOT having an altExps <-> MAE coercion was the heavyweightness of it all

Tim Triche (13:15:59): > maybe that bears revisiting?

Aedin Culhane (13:16:03): > MAE was a good start but has limits…designing good classes is hard.. so more kudos to SCE

Tim Triche (13:16:07): > yes

Tim Triche (13:16:18): > thanks @Aaron Lun for suffering so we (mostly) don’t have to

Aedin Culhane (13:16:35): > :fast_parrot:

Aaron Lun (13:47:47): > can someone condense the question for me.

Tim Triche (14:02:36): > what’s the best way to shove a bunch of tech reps with distinct UMIs but shared ADTs into an SCE. I think. @Aedin Culhane

Aaron Lun (14:04:15): > If you want them in a SCE, you should just cbind the matrices together, assuming all samples have the same set of features.
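
A minimal sketch of the `cbind` suggestion above, using plain count matrices as stand-ins for the per-replicate SCEs. The gene and cell names are invented; the point is the check that both replicates carry the same feature set, and the reordering of rows before combining.

```r
set.seed(1)
genes <- paste0("gene", 1:4)

# Two toy replicates with the same features in a different row order.
rep1 <- matrix(rpois(12, 5), nrow = 4,
               dimnames = list(genes, paste0("cell", 1:3, "-1")))
rep2 <- matrix(rpois(8, 5), nrow = 4,
               dimnames = list(rev(genes), paste0("cell", 1:2, "-2")))

# Combining only makes sense when the feature sets agree.
stopifnot(setequal(rownames(rep1), rownames(rep2)))

# Align rows to the first replicate, then bind the columns.
combined <- cbind(rep1, rep2[rownames(rep1), , drop = FALSE])

# Record which replicate each column came from (would go into colData).
batch <- rep(c("rep1", "rep2"), c(ncol(rep1), ncol(rep2)))
```

`cbind()` on SCE objects follows the same pattern, with the replicate label stored as a `colData` column.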

Aaron Lun (14:04:40): > If they have different sets of features, or if you want to preserve the hierarchy of tech reps nested in bio reps, not much choice but to use an MAE.

Aedin Culhane (22:53:21): > Thanks, that’s what I thought, but I wanted to check

2020-10-07

Anjali Silva (09:07:51): > @Anjali Silva has joined the channel

2020-10-09

Yuyao Song (13:37:38): > @Yuyao Song has joined the channel

2020-10-10

Davide Corso (10:50:55): > @Davide Corso has joined the channel

2020-10-13

Aaron Lun (00:51:10): > looking at a public dataset now, and man, I’ve got to say… same study, same organism, same condition, same technology… but different gene annotations between replicates.

Aaron Lun (00:51:27): > they couldn’t have made life more difficult for `scRNAseq` if they had tried.

Assa (07:03:09): > I hope this is the right platform here. Does anyone know of single-cell experiments with really just one single-cell? The plan is to try a single-cell experiment where the cells are really rare (neurons). So is it possible to do such an experiment, where I have only 1-2 cells per time-point?

Assa (07:07:46): > Would it be possible to run such an experiment with just 1-2 cells per condition/Timepoint similar to a bulk RNA-Seq experiment, where each cell is a biological replica?

Friederike Dündar (07:19:07): > Why shouldn’t it be possible? I.e. what are the caveats you have identified? While the 10X Genomics/droplet-based techniques would probably not be appropriate, you may just need to use a different single-cell sequencing platform, e.g. plate-based ones.

Assa (07:21:28): > I have more of a difficulty to understand the analysis part of this kind of data.

Assa (07:22:26): > getting the single cells is possible, but would this need to be analyzed similar to “normal” single-cell experiment or is it better to do it as a bulk-method?

Alan O’C (07:22:51): > What type of analysis do you want to do?

Assa (07:23:36): > checking expression behavior over different timepoints.

Alan O’C (08:07:32): > I suspect you’d want to have quite a few replicates to be confident in the results of a differential expression analysis applied using cells as replicates

Alan O’C (08:12:39): > I think as long as the tool you’re using can handle the experimental design you wish to analyse, you would be okay using either bulk or single cell tools. Since differential expression is fairly computationally “cheap” you could use multiple methods to ensure the results make sense

2020-10-15

Pol Castellano (04:43:23): > @Pol Castellano has joined the channel

2020-10-17

Kevin Blighe (08:25:06): > @Kevin Blighe has joined the channel

2020-10-18

Noah Pieta (00:23:01): > @Noah Pieta has joined the channel

hiro (20:01:31): > @hiro has joined the channel

2020-10-19

brian capaldo (14:03:28): > is there an apply function for sce objects? Like, say I want to execute a function on all clusters, or for every gene?

Aaron Lun (14:05:52): > Not to my knowledge. There is probably a better way to do whatever you are trying to do.

Aaron Lun (14:06:38): > at least for the genes. For clusters, I put in a PR for a `splitByCol` function: https://github.com/Bioconductor/SummarizedExperiment/pull/45

brian capaldo (14:08:30): > doing some lineage tracing, so every cell is assigned to a clone, and am going to build an MST for each one. Was hoping for an apply function of some sort, but no worries, it’s not a hard thing to do

brian capaldo (14:08:45): > each one being each clone

Aaron Lun (14:10:23): > the `splitByCol` function will probably help, if you follow it up with an `lapply`.
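
A minimal sketch of the split-then-`lapply` pattern described above, written with base R only because `splitByCol` was still a pull request at this point. A plain matrix stands in for the SCE, the clone labels are made up, and the per-clone computation is a placeholder for something like building an MST.

```r
set.seed(42)

# Toy expression matrix: 5 genes x 12 cells, each cell assigned to a clone.
x <- matrix(rpois(60, 3), nrow = 5,
            dimnames = list(paste0("gene", 1:5), paste0("cell", 1:12)))
clone <- rep(c("A", "B", "C"), each = 4)

# Column indices per clone; subsetting an SCE by these indices works the same way.
by_clone <- split(seq_len(ncol(x)), clone)

# Placeholder per-clone computation: mean count per gene.
per_clone <- lapply(by_clone, function(idx) rowMeans(x[, idx, drop = FALSE]))
```

Because the work is expressed as a function over a list, swapping `lapply` for `parallel::mclapply` is a one-line change on platforms that support forking.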

brian capaldo (14:11:03): > ah yeah

Aaron Lun (14:11:09): > But honestly, I just write `for` loops if I have a non-trivial anonymous function - it’s just easier to debug.

brian capaldo (14:11:10): > that should be close enough

brian capaldo (14:11:17): > yeah, same

brian capaldo (14:11:43): > i just like apply functions cause I can usually get mclapply working

brian capaldo (14:12:00): > for some reason, I struggle with parallel for loops

Martin Morgan (16:22:17): > There are a couple of “gotchas” with `for` loops that ‘people’ often fall into.
>
> Not using anonymous functions in `*apply()` can help with debugging. Also, when working with complicated objects remember that it is better to update once rather than many times. And finally, minimize information to operate on, e.g., no need for dimnames. So
>
>     .apply1 <- function(x) {
>         ## operation on, e.g., a plain-old vector `x`
>     }
>
>     .apply_sce <- function(sce) {
>         assay(sce, withDimnames = FALSE) <-
>             apply(assay(sce, withDimnames = FALSE), 2, .apply1)
>         sce
>     }
>
> Probably a performance disaster would be
>
>     .misapply_sce <- function(sce) {
>         for (j in seq_len(ncol(sce)))
>             assay(sce)[,j] = .apply1(assay(sce)[,j])
>         sce
>     }
>
> I partly verified this with `.apply1 = log1p` and `sce = scRNAseq::BachMammaryData()`, where `.apply_sce()` took about a minute, and I killed `.misapply_sce()` after about 5 minutes.
>
> Here’s a real performance hit when using `for` loops
>
>     n = 100000
>     x = integer()
>     for (i in seq_len(n))
>         x = c(x, i)
>
> This ‘copy and append’ makes `n (n - 1) / 2` copies of an element in `x`, and even though `n` is not that large, takes about a minute for me. In contrast
>
>     sapply(seq_len(n), function(i) i)
>
> takes a tenth of a second (and of course `1:n` is very fast, even for huge `n`)

Hervé Pagès (17:10:28): > Right but I think it’s also worth clarifying that `lapply()`/`sapply()` have no inherent performance benefits over `for`. There seems to be a myth that the looping mechanics are faster in `lapply()`/`sapply()` than in a `for` loop. AFAIK they are not inherently faster. It’s just that, by using `lapply()`/`sapply()`, you’re less likely to do the very inefficient things that you can do in a `for` loop, like growing an object at each step. This is because `lapply()`/`sapply()` make this harder to do, but not impossible. But if you are careful not to do this kind of inefficient thing in your `for` loop, there’s no reason it can’t be as fast as an `lapply()`/`sapply()`. In some situations it’s even going to be faster. Furthermore, if you’re not going to do anything with the huge list returned by `lapply()`, using a `for` loop is the right thing to do:
>
>     fake_write_to_disk <- function(i) NULL
>     system.time(lapply(1:1e7, fake_write_to_disk))
>     #    user  system elapsed
>     #   4.944   0.016   4.960
>     system.time(for (i in 1:1e7) fake_write_to_disk(i))
>     #    user  system elapsed
>     #   2.290   0.000   2.291

Alan O’C (17:12:24): > Indeed; apply families are just nice wrappers for loops. The R inferno is a good/entertaining read for this and IIRC avoids the overemphasis on functional programming that people seem to fall into

Martin Morgan (17:56:13): > Totally understand that *apply() are implemented as essentially for loops (is there an alternative?), but for novice through intermediate programmers (and for pros most of the time!) the use of *apply() seems like a much better way to go. I think saying they are ‘just nice wrappers…’ undersells and misguides! Also taking 2s longer for 1e7 calls doing nothing seems like a small performance penalty (if the function did almost anything then that cost would determine performance) compared to the hours (literally!) a for loop of that size would cost in the copy-and-append example. > > I agree that a for loop seems like a good choice when the return value is uninteresting, but for conceptual reasons more than performance.

Alan O’C (18:00:45): > Saying that you should not use for loops in R also undersells (the language) and misguides though; understanding why performance penalties arise is the important element

Hervé Pagès (18:16:05): > You even see people doing all kinds of ugly things in their lapply() loops, like using <<-, when a for loop would have been the natural thing to use. But somehow they’ve been told since they were little that for loops in R are a “very very bad thing”. :wink:

2020-10-20

FelixErnst (02:11:42) (in thread): > Yes, for loops are evil. I mean this in the sense that they are found in every other language and, if someone switches over to R from another language, vectorization is a totally foreign concept. Just to break the ice, the first sentence is sometimes helpful. I have seen so many for loops with a ton of if statements, which can be refactored into a nice little lapply

Mikhael Manurung (06:36:52) (in thread): > As long as you pre-allocate then there should not be any performance penalty in using for loops, right?

Alan O’C (06:39:26) (in thread): > I think modify-in-place can also be costly for large objects because R copies objects to a new temp object and then copies back. data.table is an exception I think

brian capaldo (13:13:53): > To be clear, I have no qualms about using for loops and understand the “right” way to use them. I like apply functions because I have far fewer issues converting apply into mclapply code vs converting for loops into parallel for loops. It’s completely a personal bias, not one rooted in any sort of misconceptions about the approaches.

Mikhael Manurung (13:39:48) (in thread): > Is it because of data.table’s in-place modification?

Mike Smith (13:42:30): > Anyone fancy discussing this in the next devel call? Maybe it’s a little too ‘basic’ but I think there’s definitely some kind of Dunning-Kruger relationship where you can replace ‘confidence’ with ‘willingness to use loops’

Edgar (17:22:28): > I would use either apply, mclapply or parallel for loops. I don’t care which, as long as I solve my issue quickly. I like for loops since I can use if statements.

Adele Barugahare (18:08:08): > @Adele Barugahare has joined the channel

Chris Chiu (19:16:39): > @Chris Chiu has joined the channel

2020-10-21

Sudarshan (04:33:30): > @Sudarshan has joined the channel

Hervé Pagès (22:36:12) (in thread): > This is misleading. What you want to tell people coming from a non-vectorized programming language is that most of the time they can take advantage of vectorization to avoid loops, not just to avoid for loops. With a vectorized language like R, doing sapply(x, toupper) on character vector x is almost as bad as doing the same thing with a for loop. Loops are evil, not just for loops.
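
The sapply(x, toupper) case mentioned above can be written out as a small base R sketch. The vector x is made-up example data; the point is only that both forms return the same result, while the vectorized call passes the whole vector to toupper() once instead of once per element.

```r
x <- c("alpha", "beta", "gamma")

# Looping version: sapply() calls toupper() once per element.
looped <- sapply(x, toupper, USE.NAMES = FALSE)

# Vectorized version: a single call on the whole character vector.
vectorized <- toupper(x)

stopifnot(identical(looped, vectorized))
```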

2020-10-23

Tim Triche (12:03:43): > bplapply with a job directory can be quite handy when running Very Large Jobs over a Substantial Period Of Time

ImranF (16:44:25): > @ImranF has joined the channel

2020-10-29

Jordan L. (13:51:05): > @Jordan L. has joined the channel

2020-11-03

Pablo Rodriguez (11:40:22): > @Pablo Rodriguez has joined the channel

2020-11-04

Regina Reynolds (15:56:29): > @Regina Reynolds has joined the channel

2020-11-09

Al J Abadi (23:06:12): > @Al J Abadi has joined the channel

2020-11-11

Joshua Shapiro (09:09:35): > @Joshua Shapiro has joined the channel

watanabe_st (19:25:12): > @watanabe_st has joined the channel

2020-11-17

Carmen Abaurre (07:10:40): > @Carmen Abaurre has joined the channel

2020-11-18

eugenia.galeota (05:39:28): > @eugenia.galeota has joined the channel

Liliana Zięba (11:39:41): > @Liliana Zięba has joined the channel

2020-11-19

David Dittmar (08:32:51): > @David Dittmar has joined the channel

2020-11-23

Dominique Paul (08:38:45): > @Dominique Paul has joined the channel

2020-11-26

Aaron Lun (15:13:09): > @Alan O’C I added a Dockerfile but you’ll probably want to deploy it on your dockerhub account and then repoint the GHA to use that instead. Currently it’s using iSEE’s image.

Aaron Lun (15:17:31): > ah damn. destiny’s not building on BioC so our actions are failing. Maybe we want to start pruning some of these methods that don’t get a lot of screentime.

Aaron Lun (15:20:44): > Or, we could just force in destiny in the dockerfile by installing it from the github sources.

Aaron Lun (15:20:48): > ¯\_(ツ)_/¯

Alan O’C (16:28:12): > Thanks. Yeah that’s a bit of a pain, I wouldn’t be opposed to axing it but then I’m hardly a diffusion map aficionado

Aaron Lun (23:49:16): > Anyway, I think I’m happy with the PR, so review the changes and squash it when you’re happy

2020-11-27

Alan O’C (09:41:21): > I do prefer low being yellow with viridis on a white background but I don’t feel strongly enough to change it

Luke Zappia (09:56:07) (in thread): > I always find it confusing when yellow is low, seems backward to me. I think maybe I can’t get past the blue == cold == low vs yellow/red == hot == high comparison.

Alan O’C (10:04:07) (in thread): > blue/red I agree but dark blue vs yellow as in viridis I have no implicit association

2020-12-03

cottamma (10:07:36): > @cottamma has joined the channel

2020-12-10

Phil Xie (04:35:42): > @Phil Xie has joined the channel

2020-12-12

Huipeng Li (00:38:17): > @Huipeng Li has joined the channel

2020-12-14

Nick Owen (13:21:52): > @Nick Owen has joined the channel

Bharati Mehani (20:03:33): > @Bharati Mehani has joined the channel

2020-12-15

Fredrick E. Kakembo (01:50:16): > @Fredrick E. Kakembo has joined the channel

Jenny Brown (03:02:57): > @Jenny Brown has joined the channel

2020-12-21

Giacomo Antonello (04:21:06): > @Giacomo Antonello has joined the channel

Faris Naji (07:22:56): > @Faris Naji has joined the channel

Yue Pan (09:06:44): > @Yue Pan has joined the channel

2020-12-22

Wancen Mu (09:19:46): > @Wancen Mu has joined the channel

Aaron Lun (21:54:39): > @Kevin Rue-Albrecht do you want to own the SCEGallery?

2020-12-25

Paul Myers (21:41:20): > @Paul Myers has joined the channel

2020-12-26

Kevin Blighe (10:31:24): > @Kevin Blighe has joined the channel

Charlotte Soneson (10:36:37): > https://bioconductor.org/packages/zellkonverter/

Aaron Lun (17:33:44): > “convert to H5” is pretty vague.

Aaron Lun (17:34:12): > There’s at least one other HDF5-based format (loom).

Hervé Pagès (21:12:49): > Pretty vague indeed. If you just want to move the assay data of a SingleCellExperiment object from memory to disk (as HDF5 datasets), maybe take a look at ?saveHDF5SummarizedExperiment in the HDF5Array package.

2020-12-27

Kevin Rue-Albrecht (08:32:35) (in thread): > i can’t really take on another open ended initiative right now

2021-01-01

Bernd (14:06:53): > @Bernd has joined the channel

2021-01-19

Andrea Gobbini (04:35:41): > @Andrea Gobbini has joined the channel

2021-01-22

Annajiat Alim Rasel (15:45:38): > @Annajiat Alim Rasel has joined the channel

2021-01-24

Modeline Longjohn (16:40:48): > @Modeline Longjohn has joined the channel

2021-01-28

Shannan Ho Sui (09:42:10): > @Shannan Ho Sui has joined the channel

Friederike Dündar (15:22:35): > Is there a reason that the colData of SCE and altExp(sce) are separate? Or did I mess up somewhere along the road?

Aaron Lun (15:23:47): > allows you to put modality-specific colData in the altExps if you like. e.g., “CITE-seq-derived clusters” vs “RNA-seq-derived clusters”. If you don’t care about the distinction, you can use altExp(sce, withColData=TRUE).

Friederike Dündar (15:24:24): > aha!

Friederike Dündar (15:24:25): > thanks

Friederike Dündar (15:25:14): > I guess that’s for when I create the altExp?

Friederike Dündar (15:26:03): > will that translate throughout, i.e. if I change something in sce’s colData, will that change be visible to altExp?

Aaron Lun (15:27:41): > Not sure what you mean by “visibility”, but if you change something in colData(sce) and then do altExp(sce, withColData=TRUE), then yes, the former change will be propagated to the object returned by the latter.

Friederike Dündar (15:29:03): > yes. had tripped myself up with an erroneous pair of ()

Friederike Dündar (15:29:07): > all good

Aaron Lun (15:29:40): > BTW I don’t think we have an altExp(sce, withColData=TRUE) <- X option in the setter. I guess it would do the reverse of what withColData=TRUE does for the getter. I’ll need to think about the symmetry here.

2021-02-02

Aaron Lun (16:38:47): > Possibly of interest: https://bioconductor.org/packages/devel/bioc/vignettes/SingleCellExperiment/inst/doc/apply.html

Friederike Dündar (17:33:09): > ah, that’s cute! Thanks!

2021-02-05

MARC SUBIRANA I GRANÉS (09:57:22): > @MARC SUBIRANA I GRANÉS has joined the channel

gargi (16:26:32): > @gargi has joined the channel

2021-02-11

Peter Allen (11:05:14): > I’m not sure if this has been answered here or not but I want to do an scRNA integration analysis (case/control) and I have 4 scRNA samples that have been preprocessed separately through cellranger. I created individual Seurat objects then combined them. > > dataset_loc <- mapping$folder > ids <- as.character(mapping$names) > > d10x.data <- sapply(dataset_loc, function(i){ > d10x <- Read10X(i) > colnames(d10x) <- paste(sapply(strsplit(colnames(d10x),split="-"),'[[',1L),mapping$names[match(i, mapping$folder)],sep="-") > d10x > }) > > experiment.data <- do.call("cbind", d10x.data) > > experiment.aggregate <- CreateSeuratObject( > experiment.data, > project = "scRNA integration", > min.cells = 10, > min.features = 200, > names.field = 2, > names.delim = "\\-") > > For normalization, I separated the groups (case/control), normalized them using the NormalizeData() function and then integrated them via anchors. My question is: Is that typically how normalization is done? I also wanted to know if the difference in the number of cells between groups (21,067 vs 11,626) would affect my study?

Jared Andrews (11:07:42): > Seurat is not a bioconductor package and the devs are not here.

Jared Andrews (11:08:24): > I’d recommend asking on their github or biostars.

Aaron Lun (11:22:13): > The equivalent BioC commands would be something like DropletUtils::read10xCounts(), which would produce an SCE object; and then batchelor::fastMNN(), which would perform the correction. With some normalization and feature selection in between.

Aaron Lun (11:36:39): > Probably something along the lines of: > > ## Untested! > library(DropletUtils) > sce <- read10xCounts(all.10x.dirs) > > # QC > library(scuttle) > stats <- perCellQCMetrics(sce, subset=list(Mito=grepl("MT-", rowData(sce)$Symbol))) # or whatever to get the mito genes. > discard <- quickPerCellQC(stats, batch=sce$Sample, sub.fields="subsets_Mito_percent") # extra arguments go to isOutlier(), which takes batch= > sce <- sce[,!discard$discard] > > # Multi-batch normalization > library(batchelor) > sce <- multiBatchNorm(sce, batch=sce$Sample) > > # Feature selection > library(scran) > dec <- modelGeneVar(sce, block=sce$Sample) > hvgs <- getTopHVGs(dec, n=5000) # or however many genes you like > > # Correction. > mnn.out <- fastMNN(sce, batch=sce$Sample, subset.row=hvgs, correct.all=TRUE) > reducedDim(sce, "corrected") <- reducedDim(mnn.out) # for real work > assay(sce, "corrected") <- assay(mnn.out) # for plotting >

Aaron Lun (11:37:26): > Half a dozen diagnostic plots omitted, but you get the general idea.

Peter Allen (13:16:28): > Thanks!

2021-02-13

ImranF (06:51:29): > @ImranF has joined the channel

Wes W (12:34:16): > @Wes W has joined the channel

2021-02-15

Hojae Lee (21:17:36): > @Hojae Lee has joined the channel

Hojae Lee (21:21:24): > Hello, I’m sorry if this is a very obvious question, but may I ask the difference in use cases for a SingleCellExperiment object and a Seurat object? From what I have gathered so far, it seems like SingleCellExperiment, combined with scran and/or scater, can perform a wider variety of transformations or normalization, while Seurat objects are more “locked-in”. However, are there more downstream analyses where a Seurat object is more appropriate? > > Thank you very much in advance!

Jared Andrews (22:44:33): > Seurat is not a bioconductor package. Many bioconductor packages will not support it, though there are relatively simple ways to convert between them.

Jared Andrews (22:46:09): > SingleCellExperiment objects are backwards compatible and will not undergo breaking changes between versions for the most part, whereas Seurat is I think on its 4th major version change (though they do provide functions to update objects, I think).

2021-02-16

Hervé Pagès (01:17:19): > Aaron’s (@Aaron Lun) perspectives on the Seurat vs SingleCellExperiment question are also worth reading: https://github.com/LTLA/scRNAseq/issues/15

Sridhar N (01:45:54): > hold my :beer:

2021-02-19

abdullah hanta (01:27:24): > @abdullah hanta has joined the channel

2021-02-23

Wynn Cheung (10:33:06): > @Wynn Cheung has joined the channel

2021-02-26

Scott Lipnick (13:38:40): > @Scott Lipnick has joined the channel

2021-03-01

Diana Hendrickx (03:18:12): > @Diana Hendrickx has joined the channel

Diana Hendrickx (03:18:32): > @Diana Hendrickx has left the channel

2021-03-05

Peter Allen (08:54:01): > I wasn’t sure which channel to pose my question in, but I was wondering if there is a method to oversample single cell ATAC/RNA bam files? There’s a Picard method for undersampling (DownsampleSam) but I haven’t found anything for oversampling, or whether it’s even possible.

ImranF (09:24:56): > When working with large SCE objects (100+ k cells), I often find myself subsampling the cells (but proportional to the cluster sizes; i.e. some grouping factor in colData). Wondering if this is worthy of a feature in scater/scran

2021-03-11

Calli Dendrou (04:08:55): > @Calli Dendrou has joined the channel

2021-03-22

Tim Triche (08:01:46) (in thread): > I’ve implemented this in velocessor for such situations; happy to contribute to wherever since velocessor is veering off into other directions anyhow
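
Subsampling cells in proportion to cluster size, as described in this thread, can be sketched in base R. This is an illustration under stated assumptions, not the velocessor implementation and not a function from scater or scran: subsample_by_group is a hypothetical helper, and the cluster labels are simulated.

```r
# Hypothetical helper: pick column indices so that each group keeps roughly
# the same fraction of its cells. NA group labels are dropped by split().
subsample_by_group <- function(groups, prop) {
    stopifnot(prop > 0, prop <= 1)
    by.group <- split(seq_along(groups), groups, drop = TRUE)
    picked <- lapply(by.group, function(idx) {
        n <- max(1L, round(length(idx) * prop))  # keep at least one cell per group
        idx[sample.int(length(idx), n)]          # sample.int avoids sample()'s length-1 quirk
    })
    sort(unlist(picked, use.names = FALSE))
}

set.seed(1)
clusters <- rep(c("a", "b", "c"), times = c(1000, 300, 20))
keep <- subsample_by_group(clusters, prop = 0.1)
table(clusters[keep])  # 100 cells from "a", 30 from "b", 2 from "c"
```

With an SCE the indices would be used as sce[, keep], with groups taken from a colData column.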

2021-03-23

Philipp Schäfer (13:03:03): > @Philipp Schäfer has joined the channel

2021-03-24

Harry Danwonno (20:08:00): > @Harry Danwonno has joined the channel

2021-03-26

Yile He (13:27:28): > @Yile He has joined the channel

2021-04-02

Nathan Johnson (21:56:01): > @Nathan Johnson has joined the channel

2021-04-06

Lindsay Hayes (00:04:46): > @Lindsay Hayes has joined the channel

2021-04-17

Aaron Lun (19:41:52): > @Luke Zappia splatter might benefit from https://github.com/LTLA/DelayedRandomArray

Aaron Lun (19:42:13): > e.g. to generate a 20k vs 1e6 count matrix: > > ngenes <- 20000 > log.abundances <- runif(ngenes, -2, 5) > > nclusters <- 20 # define 20 clusters and their population means. > cluster.means <- matrix(2^rnorm(ngenes*nclusters, log.abundances, sd=2), ncol=nclusters) > > ncells <- 1e6 > clusters <- sample(nclusters, ncells, replace=TRUE) # randomly allocate cells > cell.means <- DelayedArray(cluster.means)[,clusters] > > dispersions <- 0.05 + 10/cell.means # typical mean variance trend. > > y <- RandomNbinomArray(c(ngenes, ncells), mu=cell.means, size=1/dispersions) >

Hervé Pagès (22:42:52): > That’s cool!

2021-04-19

Wes W (17:41:39) (in thread): > also loops with recursion or functions with recursion see that R penalty… the server I run my code on now is pretty beefy and it probably wouldn’t be an issue, but back in the day I learned the hard way by crashing my session that way… I tend to do all my quick data wrangling recursion stuff in another language then import that output back into R…

Wes W (17:44:15) (in thread): > that being said I had a while loop inside a while loop I made for testing out various parameters in a UMAP for a single cell data set (just changing spanning distance and nn, to 25 combinations) that R didn’t like either, the one while loop worked fine, it was nesting them that bugged out… but I might have done something wrong there so can’t blame the language on that one…

2021-04-21

Aishwarya Mandava (13:55:14): > @Aishwarya Mandava has joined the channel

2021-04-23

Wes W (18:51:40): > Hey crew, not a Bioc question, but I just ran into a little fire so thought I would ask my single cell crew. I was just in the middle of a mkfastq and it crashed out because I ran out of space on the disk… not something that generally happens on my RAID, but it’s been an odd week… looking at the error log, it looks like it was attempting to stash stuff to disk, does that mean there is a way to restart the process and pick up where I left off once I clear off some WGS and old scRNA?

Aaron Lun (18:53:18): > seems like a better question for #singlecell-queries, but generally, I think that cellranger does recover from its last good state. That’s what I vaguely remember when I last ran it.

Wes W (18:53:42): > thanks @Aaron Lun

2021-04-27

Aaron Lun (02:38:11): > @Luke Zappia zellkonverter is broken in devel due to changes in HDF5Array::H5ADMatrix’s handling of the layer specification.

Luke Zappia (02:43:59) (in thread): > Sigh. Do you know if there is an issue or something with details of the change? Hopefully I can spend some time on zellkonverter today.

Aaron Lun (02:46:41) (in thread): > the old behavior allowed us to specify a path to X, or layers/blah. The new behavior requires us to pass in layers=NULL or layers="blah", respectively. This is a pain because the information extracted from the reticulate object is the former, so some mangling is required to get it into the latter.

Aaron Lun (02:47:28) (in thread): > probably something in .extract_or_skip_assay with name <- basename(name), then set layers=if (name!="X") name

Luke Zappia (02:47:48) (in thread): > Hmmmm…ok. Thanks!

Aaron Lun (02:49:08) (in thread): > basically, if the name we got from the reticulate object is "X", or "/X", we want to set layers=NULL; otherwise we want to set layers=basename(name).
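
The name-to-layer mapping described in the last few messages can be written as a small base R function. This is a sketch only: layer_from_name is a hypothetical helper, not zellkonverter’s actual code, and it handles just the path mangling, not the HDF5Array call itself.

```r
# Hypothetical helper: translate a dataset path taken from the AnnData object
# into the value expected for the layer argument (NULL means "use X").
layer_from_name <- function(name) {
    name <- basename(name)           # "/X" -> "X", "layers/blah" -> "blah"
    if (name == "X") NULL else name
}

stopifnot(
    is.null(layer_from_name("X")),
    is.null(layer_from_name("/X")),
    identical(layer_from_name("layers/blah"), "blah")
)
```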

Aaron Lun (02:52:19) (in thread): > Technically, the most correct approach is to do: > > if (HDF5Array:::h5isgroup(file, name)) { > mat <- HDF5Array::H5SparseMatrix(file, name) > } else { > mat <- HDF5Array::HDF5Array(file, name) > } > > to avoid any nasty future surprises related to changes in the name of the layers group. We can just copy the HDF5Array:::h5isgroup definition for the time being. Ideally it would be put into rhdf5 but I don’t know what happened to the maintainer of that package.

Aaron Lun (02:53:00) (in thread): > the above also has the advantage that it avoids redundant loading of the dimnames for each assay.

Aaron Lun (02:53:27) (in thread): > Anyway, stuff that in somewhere, have a look at my PR, and we’ll see if we can get an R-native writer in before the next release.

2021-04-29

Aaron Lun (00:40:10): > @Dan Bunis just noticed that your dittoSeq::importDemux seems to be emitting a dgTMatrix. You may wish to convert this to a dgCMatrix, the latter is much more efficient for various operations. (And in fact, a lot of Matrix functions will just convert a dgTMatrix to a dgCMatrix before doing stuff, e.g., colSums).

Dan Bunis (02:07:42): > Honestly, I have no idea if anyone other than me and a rotation student have ever used that function, and I look at it quite infrequently, but the only thing that function does to the input Seurat/SCE object is add metadata via$.

Dan Bunis (02:08:44): > maybe something upstream, I’ll take a look. ¯\_(ツ)_/¯

2021-04-30

JAVAN OKENDO (02:54:37): > @JAVAN OKENDO has joined the channel

Jovan Tanevski (03:57:41): > @Jovan Tanevski has joined the channel

Ibra Lujumba (06:51:21): > @Ibra Lujumba has joined the channel

Anita Ghansah (20:26:46): > @Anita Ghansah has joined the channel

2021-05-03

Winfred Gatua (02:17:21): > @Winfred Gatua has joined the channel

Oluwaseyi Ashaka (05:04:51): > @Oluwaseyi Ashaka has joined the channel

2021-05-09

Aaron Lun (01:04:19): > @Alan O’C perhaps we should Depends: scuttle in scater, save everyone another library() call.

2021-05-10

ImranF (13:54:17): > Does SingleCellExperiment (or maybe SummarizedExperiment) have an analog to Seurat::SplitObject()? This looks similar (https://rdrr.io/github/mikelove/tximeta/man/splitSE.html), but isn’t

Aaron Lun (14:01:48): > I dunno. What does it do?

ImranF (14:03:25): > Splits a single SCE into a list based on a colData factor. I just wrote a hacky version below: > > splitSCE <- function(sce, colData.var) { > vv <- unique(colData(sce)[[colData.var]]) # var vector > sce.list <- lapply(vv, function(v) sce[,colData(sce)[[colData.var]]==v]) > names(sce.list) <- vv > return (sce.list) > } > sce.list <- splitSCE(sce, "lib_id") > > Just seems like something that would be there

Aaron Lun (14:29:17): > A long time ago, I put in a PR to SE about it, it never got merged. Then I accidentally overwrote the PR with another feature request and forgot about it.

Tim Triche (14:31:29): > this does seem to get reimplemented a lot (and tximeta::splitSE does something different, namely splitting out rows from assays for e.g. velocity calculations)

Alan O’C (14:31:39): > I do use that pattern pretty often, and I don’t know that it needs to be massively more complex than that

brian capaldo (14:33:18): > I think I actually asked about this a while ago too, and a similar conversation ensued
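
The split-by-factor pattern discussed here can also be written around split() on the column indices, so the grouping is computed once rather than once per level. The sketch below uses a plain matrix as a stand-in so it runs without Bioconductor; split_columns is a hypothetical name, and the assumption is that an SCE would behave the same way because it supports the same x[, j] subsetting.

```r
# Hypothetical helper: split any object that supports x[, j] column subsetting
# by a per-column factor. Columns with an NA label are dropped by split().
split_columns <- function(x, f) {
    idx <- split(seq_len(ncol(x)), f, drop = TRUE)
    lapply(idx, function(j) x[, j, drop = FALSE])
}

m <- matrix(1:12, nrow = 2, dimnames = list(NULL, paste0("cell", 1:6)))
lib_id <- c("A", "B", "A", "C", "B", "A")
parts <- split_columns(m, lib_id)
vapply(parts, ncol, integer(1))  # A = 3, B = 2, C = 1
```

For an SCE the call would look like split_columns(sce, sce$lib_id).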

2021-05-11

Megha Lal (16:45:54): > @Megha Lal has joined the channel

2021-05-21

Aaron Lun (02:06:16): > SCE 1.14.1 has combineCols. > > library(SingleCellExperiment) > library(scRNAseq) > sce1 <- ZeiselBrainData() > sce2 <- ZeiselNervousData() > out <- cbind(sce1, sce2) # breaks for many reasons > out <- combineCols(sce1, sce2) # works >

2021-05-27

Aarthi Ravikrishnan (21:38:48): > @Aarthi Ravikrishnan has joined the channel

2021-05-29

Rob Patro (13:14:17): > Hi all, this is maybe more of a general R question, but does anyone know how I can efficiently compute the Spearman correlation between all corresponding columns (cell barcodes) of a pair of SingleCellExperiment objects?

2021-05-30

Aaron Lun (06:27:50): > > library(scuttle) > x1 <- mockSCE() > x2 <- mockSCE() > > library(scran) > prescaled1 <- scaledColRanks(assay(x1), as.sparse=TRUE) > prescaled2 <- scaledColRanks(assay(x2), as.sparse=TRUE) > > library(ScaledMatrix) > scaled1 <- ScaledMatrix(prescaled1, colMeans(prescaled1)) > scaled2 <- ScaledMatrix(prescaled2, colMeans(prescaled2)) > > correlation <- 1 - 2*colSums((scaled1 - scaled2)^2) > correlation[1:10] > > cor(assay(x1)[,1], assay(x2)[,1], method="spearman") # for comparison >
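
For small dense matrices the same paired, column-by-column Spearman correlation can be checked in base R, using the fact that Spearman’s rho is the Pearson correlation of the ranks. This is a sketch with simulated data and a hypothetical function name (paired_spearman); unlike the scran/ScaledMatrix code above it makes no attempt to preserve sparsity, so it is meant as a reference for verification rather than for large datasets. A column with constant values gives NaN.

```r
# Hypothetical helper: Spearman correlation between column j of m1 and
# column j of m2, for every j, without computing the full cross-correlation.
paired_spearman <- function(m1, m2) {
    stopifnot(identical(dim(m1), dim(m2)))
    unit_ranks <- function(m) {
        r <- apply(m, 2, rank)                # average ranks for ties, as in cor()
        r <- sweep(r, 2, colMeans(r))         # centre each column
        sweep(r, 2, sqrt(colSums(r^2)), "/")  # scale each column to unit length
    }
    colSums(unit_ranks(m1) * unit_ranks(m2))
}

set.seed(42)
a <- matrix(rpois(200, 5), nrow = 20)
b <- matrix(rpois(200, 5), nrow = 20)
rho <- paired_spearman(a, b)

# Compare against cor() applied one column pair at a time.
expected <- vapply(seq_len(ncol(a)),
                   function(j) cor(a[, j], b[, j], method = "spearman"),
                   numeric(1))
stopifnot(isTRUE(all.equal(rho, expected)))
```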

Rob Patro (08:22:35): > Thanks, Aaron!

2021-06-01

Charlotte Soneson (03:06:42): > @Luke Zappia / @Aaron Lun - is there a way currently to get the data from adata.raw.X into a SingleCellExperiment with zellkonverter?

Luke Zappia (03:11:06) (in thread): > Not by default, we have kinda ignored the raw slot for now. If you have the Python object in an R environment can you try passing adata.raw to AnnData2SCE() (something like AnnData2SCE(adata$raw))? I can’t remember if raw is a complete anndata or not but if it is that might work.

Charlotte Soneson (03:16:21) (in thread): > Ok, let me try that

Charlotte Soneson (03:47:34) (in thread): > That does not seem to do it: > > > AnnData2SCE(res$raw) > Error in py_get_attr_impl(x, name, silent) : > AttributeError: 'Raw' object has no attribute 'layers' >

Charlotte Soneson (03:51:31) (in thread): > But I can access the matrix ‘manually’ (res$raw$X), so maybe that’s the way to go.

Luke Zappia (03:53:06) (in thread): > :crying_cat_face: Looks like maybe we should add some support for this. Maybe open an issue as a reminder? > > In the meantime if all you want is the X matrix you can probably extract it directly with something like: > > X_mat <- t(adata$raw$X) > > Obviously that’s pretty manual though and there could be issues if reticulate doesn’t like the matrix formats. Sorry I’m not more helpful.

Charlotte Soneson (03:54:34) (in thread): > Yes, I think I’ll do that for now - it seems to work, and we have some controls that we can check the values against. I’ll open an issue as well. Thanks a lot!

Charlotte Soneson (04:27:10) (in thread): > Btw, I also tried to work around it by replacing the original X matrix in res and running AnnData2SCE, but that seems to give me an error as well (although this works with the original X) > > > res$X <- res$raw$X > > AnnData2SCE(res) > Error in as(mat, "dgCMatrix") : > no method or default for coercing "dgRMatrix" to "dgCMatrix" >

Luke Zappia (04:29:04) (in thread): > Huh, that’s weird. Surely that method exists…? Thanks!

Paul Hoffman (11:10:19) (in thread): > {Matrix} doesn’t define methods for converting dgRMatrix to dgCMatrix or vice versa. Instead, you have to go through the parent classes of RsparseMatrix and CsparseMatrix (as(mat, "CsparseMatrix")). From what I’ve seen, it tends to always convert dgR to dgC, but {Matrix} is weird sometimes

Luke Zappia (11:13:13) (in thread): > Ah, yes! I vaguely remember running into this before. Thanks!

2021-06-02

Rob Patro (21:29:38): > Any idea why I might be getting the following when running the Spearman calculation code above? > > Error in blockApply(x, FUN = FUN, ..., grid = grid, as.sparse = NA, BPPARAM = BPPARAM) : > formal argument "as.sparse" matched by multiple actual arguments >

Aaron Lun (21:31:29): > not beyond making sure you’re on the latest version of everything

Aaron Lun (21:32:04): > though that code works on both the current release and previous devel, so I didn’t think there was anything too special in between.

Rob Patro (21:44:16): > Ahh, looks like I am a (stable) release behind!

Rob Patro (21:44:24): > that probably explains it.

2021-06-04

Flavio Lombardo (05:53:19): > @Flavio Lombardo has joined the channel

Wes W (08:41:27): > Hey all, I am doing some benchmarks on various integration methods for the data in my study and writing a few custom functions. I have searched Google and maybe I am just searching it wrong, but is there a way to set a default reduction on an sce object? My code works fine if I do reducedDim(sce, 'MYintegration') but that only works if I can pass the known reduction to the function. Because non-integration data is also in the reducedDims (PCA, UMAPs, maybe a dozen UMAPs) I don’t want to run a loop for reduction[i] because then I will be doing a bunch of computations I don’t care about on a bunch of reductions (which would just waste CPU cycles). I can of course hard-code passed reductions as a list into the function, BUT before I do that, for ease of use for anyone else trying to run my function in the future, is there a way to set a reducedDim on an sce object as default? Sorry if I am overthinking this

ImranF (09:15:17): > FWIW, it’s better to be explicit (regarding which embedding you’re operating on). With that said, your downstream functions could work on the (brittle) assumption that the first reducedDim is the “default”. The onus is then on you to ensure the reducedDim you care about is always the first one returned by reducedDims(sce)

Wes W (09:29:53) (in thread): > thanks

2021-06-09

Lorenzo Bonaguro (03:40:19): > @Lorenzo Bonaguro has joined the channel

2021-06-16

Rob Patro (21:52:38): > Hi @Aaron Lun: So, in the process of trying to figure out the random set of heuristics used by CellRanger’s (bootleg-ish) implementation of emptyDrops, my student and I took a look at STARsolo, in which Alex has reverse engineered what they are doing. My student “backported” that to an R implementation that makes use of the proper emptyDrops. I know that you mentioned in https://github.com/MarioniLab/DropletUtils/issues/42 that you have no interest in chasing down those random heuristics yourself (which I understand, it was a pain even having to go fetch them out of the implementation that was already reverse engineered for us). However, would you be interested in this function as a PR maybe?

Rob Patro (21:54:36): > We were only interested in it mainly because it seems these random heuristics have a big effect specifically when filtering for single-nucleus RNA-seq samples (though we’ve not necessarily validated that these heuristics are in any sense optimal). But I’m not sure if it’d even be something you’d be interested in having in DropletUtils or not.

2021-06-17

Aaron Lun (11:33:14): > my initial inclination is that I don’t want to bother with it, but it depends on how much code can be shared with the original emptyDrops() implementation. If it shares the core Monte Carlo code, then perhaps.

Aaron Lun (11:34:07): > Otherwise you could just make a new package and stick the function in there.

Rob Patro (11:34:52): > Our function is just a simple wrapper around emptyDrops

Rob Patro (11:35:04): > it uses emptyDrops out of the box

Rob Patro (11:35:42): > the other trappings are pre and post emptyDrops heuristics that CellRanger (& hence default STARsolo) apply to what is otherwise a theoretically justified filtering approach …

Aaron Lun (11:39:48): > hm, I see

Aaron Lun (11:40:08): > well, if you make a PR I’ll have a look at it

Aaron Lun (11:40:33): > in fact, if I can’t tell the difference between the PR code and my own, I’ll probably just merge it

Aaron Lun (11:41:15): > e.g. 4 space indent, no tidyverse nonsense, <- instead of =, etc. etc.

Rob Patro (12:14:36): > Thanks! I’ll ping my student about it (with the style suggestions). My R-fu is weak, so I’m trusting him to make otherwise reasonable code suggestions. Of course, once the PR is submitted, we’ll incorporate any feedback you have there until you feel comfortable with a merge.

Rob Patro (15:19:29): > Ok — PR submitted: https://github.com/MarioniLab/DropletUtils/pull/64.

2021-06-24

Ilaria Billato (08:29:02): > @Ilaria Billato has joined the channel

2021-07-01

Ben Story (08:44:42): > I’m working on a package where I have big matrices with each column containing a single cell. The rows in this case are variant counts OR allele frequencies OR likelihood values. Currently I’m using an internal hand-made class but I’m wondering if there would be an alternative such as a SingleCellExperiment object or something that would be more appropriate and Bioconductor-friendly that I could conform my data to?

Chris Vanderaa (11:36:05) (in thread): > If your 3 modalities (counts, frequencies, likelihood) represent the same features (same number of rows), then you should be able to use SingleCellExperiment and store the data as separate matrices in the assays(). > If your modalities have different numbers of rows, you might rather use MultiAssayExperiment where each matrix is then stored as a separate SingleCellExperiment. Moreover, if the rows are linked between matrices, you might be interested in using QFeatures.

2021-07-05

Chouaib Benchraka (01:57:05): > @Chouaib Benchraka has joined the channel

Bernd (12:38:40): > I am suddenly getting the following error message: > unable to find an inherited method for function ‘bindCOLS’ for signature ‘“SingleCellExperiment”’

Bernd (12:39:14): > I believe this happens when I try to use cbind with two SingleCellExperiment objects.

Bernd (12:40:10): > I am in the middle of deploying some programs to some machines for a course that should happen in 2 days and it would be great if I could get this to work:wink:

Bernd (12:40:20): > any suggestions are welcome

Bernd (13:15:50): > It always helps to write things down in chat groups. It really helps you find the problem yourself :smile:. Somehow the library was not loaded at the right moment… solved. Sorry for the spam

Bernd (14:06:56): > Let’s try this again: > I get the following error message on Linux/Ubuntu R4.1: > Error in .aggregate_and_align_colnames(all_colnames, strict.colnames = strict.colnames) : the DFrame objects to combine must have the same column names. > On Mac, there is no error message and it does a full outer merge (which is what I want). > I tried playing with strict.colnames, but no luck…

Bernd (14:19:22): > I found this in the docs: > The colnames in colData(SummarizedExperiment) must match or an error is thrown. > > But why is it working on Mac? Might this be related to loading libraries in a different order again??

Martin Morgan (14:52:03): > Do BiocManager::version() and BiocManager::valid() report the same on the two systems?

Bernd (16:48:06): > not exactly the same. too many packages involved to post here. The ones not valid are “BiocParallel”, “ComplexHeatmap”, “ggnetwork”, “InteractiveComplexHeatmap” on the one side and “BiocParallel”, “ComplexHeatmap”, “gargle”, “InteractiveComplexHeatmap”, GenomeInfoDb, Seurat, spatstat on the linux side

Bernd (16:49:54): > both: bioconductor 3.12, R 4.1.0, different BLAS/LAPACK and locale

Bernd (16:50:46): > somehow I cannot copy from the VM (linux) to the mac (with slack)

Bernd (17:14:06): > i updated the old packages on the linux machine. Still same problem.

Bernd (17:15:25): > Which machine has the real problem? According to the documentation the Mac, but I prefer the way the Mac is behaving…:wink:

2021-07-06

Bernd (01:07:44): > I also tried to make sure that the libraries are called in the right order, but that didn’t help either. > How can I check which default values are being used? There was this strict.colnames, for example, that I didn’t set; maybe that is the difference?

Aaron Lun (02:06:26): > An MRE would be helpful.

Davide Risso (02:08:06): > This may be unrelated to your problem (but it actually may indirectly solve it)… the latest Bioconductor release is 3.13. I would recommend using that for a course rather than teaching older versions of packages

Davide Risso (02:09:56): > Also, isn’t BiocManager::valid() complaining that you’re using Bioc 3.12 with R 4.1?

Bernd (05:02:35): > Sorry, quite embarrassing… It seems the fault is on my side. A bit difficult to explain, but in the end I should have first looked at the docs and then identified that the real problem is the side that didn’t produce an error message. Thus, I don’t think it will help anyone to get into the details of this issue and you might want to delete the whole conversation. Sorry for the spam, really embarrassed :flushed:… > And thanks for pointing out the bioc version issue.

Friederike Dündar (06:17:26): > These things always happen while stress-preparing for a class :slightly_smiling_face:

Friederike Dündar (06:17:32): > #beenthere

Friederike Dündar (06:17:42): > #willbethereagain

Davide Risso (06:22:50): > exactly! No need to be embarrassed!

Bernd (10:41:41): > Thanks for your understanding :wink:

2021-07-12

Jared Andrews (12:21:39): > If I wanted to cram a bunch of arbitrary dataframes (say, several DESeq2 results) into an SCE, would the internal metadata be the place to do that or is there a more appropriate option?

Bernd (12:44:34) (in thread): > Just to let you know the presentation was ok and the tutorial as well. We might be able to share parts of the tutorial in the future through youtube…

2021-07-13

Luke Zappia (03:40:44) (in thread): > I think that’s probably the best place if they don’t fit into rowData or colData. If it is an internal package thing use int_metadata, but if it is something the user should see or just for your analysis I would use regular metadata. > > DE results specifically you could probably put in rowData if you have results for every gene (maybe as a list column), but metadata is probably simpler.

Jared Andrews (09:01:54) (in thread): > Alright, that’s what I figured, thanks.
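
A rough sketch of the metadata route discussed in this thread, using toy data frames in place of real DESeq2 results. The entry names `de_results`, `treated_vs_control`, `late_vs_early`, and `my_pkg` are invented for illustration only.

```r
library(SingleCellExperiment)

genes <- paste0("gene", 1:3)
sce <- SingleCellExperiment(
    assays = list(counts = matrix(1:6, nrow = 3,
                                  dimnames = list(genes, c("cellA", "cellB"))))
)

# Toy stand-ins for DE result tables (not real DESeq2 output)
de_a <- data.frame(gene = genes, log2FoldChange = c(1.2, -0.4, 0.1),
                   padj = c(0.01, 0.30, 0.90))
de_b <- data.frame(gene = genes, log2FoldChange = c(-2.0, 0.3, 0.8),
                   padj = c(0.001, 0.50, 0.04))

# User-facing results: regular metadata, here as a named list of data frames
metadata(sce)$de_results <- list(treated_vs_control = de_a,
                                 late_vs_early = de_b)

# Package-internal bookkeeping: int_metadata (hypothetical entry)
int_metadata(sce)$my_pkg <- list(note = "internal state goes here")

names(metadata(sce)$de_results)
metadata(sce)$de_results$treated_vs_control
```

Keeping the tables in a single named list under one metadata entry makes them easy to find later without cluttering the top level of metadata(sce).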

2021-07-14

Peter Allen (14:14:14): > @Peter Allen has joined the channel

Peter Allen (14:16:34): > Is there a vignette for how to process 10x runs for ~55 samples (each with their own run)? Google has not been too forgiving. I noticed there’s a vignette on multi-sample comparisons but I’ve read various threads mentioning having to preprocess separately. - Attachment (bioconductor.org): Chapter 14 Multi-sample comparisons | Orchestrating Single-Cell Analysis with Bioconductor > Or: how I learned to stop worrying and love the t-SNEs.

ImranF (14:58:26): > Well it depends on what you mean by “process”, but everything before clustering/integration would have to be done per-sample

Friederike Dündar (15:07:25): > lapply is your friend

Peter Allen (15:07:55) (in thread): > Right. I’m on board with that, but at what point would I integrate/merge the samples into a sce?

Friederike Dündar (15:13:10): > That chapter you’re referring to above is pretty comprehensive and verbose IMO, make sure to look at the “history” tabs to see how the processing was done

Friederike Dündar (15:13:34): - File (PNG): image.png

ImranF (17:30:04) (in thread): > Off the top of my head… after QC, doublet removal, variance stabilizing… Though I’d recommend you first familiarise yourself with the “standard” scRNA workflow
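
The "lapply, then merge" pattern suggested above might look roughly like the sketch below. It uses simulated count matrices instead of real 10x output, and the helper `make_sample()` and the library-size filter are placeholders for real per-sample loading and QC, not a recommended pipeline.

```r
library(SingleCellExperiment)

set.seed(42)
genes <- paste0("gene", 1:20)
samples <- c("s1", "s2", "s3")   # stand-ins for the ~55 real samples

# Hypothetical helper: simulate one sample as a SingleCellExperiment
make_sample <- function(id, n_cells = 10) {
    m <- matrix(rpois(length(genes) * n_cells, lambda = 5),
                nrow = length(genes),
                dimnames = list(genes, paste0(id, "_cell", seq_len(n_cells))))
    sce <- SingleCellExperiment(assays = list(counts = m))
    sce$sample <- rep(id, ncol(sce))   # record the sample of origin in colData
    sce
}

# Step 1: build (in practice: load) each sample separately
sce_list <- lapply(samples, make_sample)
names(sce_list) <- samples

# Step 2: per-sample processing, here a placeholder library-size filter
sce_list <- lapply(sce_list, function(x) x[, colSums(counts(x)) > 0])

# Step 3: combine once the per-sample steps are done; cbind() needs the
# same rows and the same colData columns in every object
combined <- do.call(cbind, unname(sce_list))

table(combined$sample)
```

Unique cell names per sample (the `s1_cell1` style prefix here) avoid duplicated column names after the merge.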

2021-07-19

Leo Lahti (17:02:44): > @Leo Lahti has joined the channel

2021-07-22

Konrad J. Debski (03:48:35): > @Konrad J. Debski has joined the channel

2021-07-23

Batool Almarzouq (15:54:06): > @Batool Almarzouq has joined the channel

2021-07-26

Wes W (08:58:32): > probably not sce related, but just in case anyone else has had this problem. I normally save my sce objects as rds in R (if there is a better way, please do tell me). > > I haven’t had any issues except that saveRDS is slow and there is no progress bar; I just wait until the process ends and assume it worked. For about 2 years now, it has. > > But today I tried to open an object from 2 weeks ago and I got this error: ReadItem: unknown type 49, perhaps written by later version of R > > Googling didn’t reveal anything about type 49 that I could find, although about 30 other type numbers come up on Google with a similar error, including the “perhaps written by later version of R” part. > > None of the “solutions” for the other errors seemed applicable to me (plenty of disk space, same version of R). I tried to open the file on two different operating systems with the same R version and I get the same error. > > I think the file is corrupt, although the file size is what you would expect from my previous older saves. > > Is there anything else I can try to restore and/or load this file? Thank you

Pablo Rodriguez (11:52:13) (in thread): > Try qs::qread() or vroom::vroom(); maybe there is a chance the format is valid and can be read by one of those

Wes W (12:06:03) (in thread): > def worth a shot , thanks Pablo, trying now

2021-07-27

Vince Carey (07:38:10) (in thread): > @Wes W would it be possible to make the offending object available in, e.g., cloud storage?

Wes W (08:34:12) (in thread): > @Pablo Rodriguez sadly no luck with that approach. @Vince Carey yep, I could do that, you have an idea?

Vince Carey (09:26:57) (in thread): > well, nothing specific, but I think other core members might be able to diagnose better with access to the object. you could run R under a debugger to step through events leading to the bad ReadItem – it might be useful to know the result of validObject(sce) prior to serialization (if you can recreate), value of sessionInfo at time of production, etc.

2021-07-30

lalchung nungabt (04:31:04): > @lalchung nungabt has joined the channel

2021-08-04

ChiaSin (09:59:13): > @ChiaSin has joined the channel

Tim Triche (16:21:16): > just in case it hasn’t been said before, sometimes when Seurat operations crash your R session on a machine with 128GB of RAM and 8 cores, switching to an SCE can make things go faster, use less RAM, and actually complete. Seurat really is a pig (and a slow one, at that)

Tim Triche (16:22:05): > tempted to post a Human Fetal Cell Atlas (or, actually, two) as a demonstration. Monocle3 at least uses SCE to inherit from. Seurat uses something that explodes into a fiery ball of RAM consumption

Jared Andrews (16:32:15): > scale.data outputs a dense matrix that balloons object size wildly, iirc. I think SCTransform does the same.

Tim Triche (16:43:10): > that’s the one that killed my machine – scale.data

Tim Triche (16:44:17): > one of the grad students I advise has implemented an extremely fast and robust NMF approach that seems to render most of these things pointless. It’s also far more interpretable than typical approaches that assume sphericity

Tim Triche (16:44:56): > Takes about 30 seconds to iterate over the human fetal cell atlas subsets that we care about (blood and heart cells).

Kevin Blighe (18:47:33): > Tim (or others), would it be feasible to use the Human Fetal Cell Atlas as a reference for mouse foetal cells by merely converting the gene names between human-to-mouse (or vice-versa). I have seen this being done, but wasn’t sure if it’s ‘a thing’

Tim Triche (20:51:25): > Neal Young did something like this for blood, and Cao demonstrates correspondence between the two at least in the hematopoietic branches in https://science.sciencemag.org/content/370/6518/eaba7721/tab-figures-data#sec-10 - Attachment (Science): A human cell atlas of fetal gene expression

Tim Triche (20:53:21): > We’ve been doing something not entirely unlike this for mouse CITEseq to look at L-R interactions and rig up something like a poor man’s PICseq; using Mus.musculus and Homo.sapiens to rationalize orthologs seems to work better than Orthology.eg.db, for whatever that’s worth.

2021-08-05

Assa (01:50:43): > I hope this is the right forum to maybe search for information about my question, but if not, please let me know.

Assa (01:50:52): > Is there a preferred way to handle zeros in an scRNA-Seq data set, depending on the technology used? I mean, should a 10x data set be handled differently than a SMART-Seq data set? How do I decide whether I should impute or use a zero-inflated algorithm? Any comments on that?

Julia Philipp (03:07:51): > @Julia Philipp has joined the channel

Prateek Arora (04:53:28): > @Prateek Arora has joined the channel

Stephanie Hicks (08:27:36) (in thread): > @Assa great question! Yes, this is a widely discussed topic in the field. Briefly, how many zeros we “expect” to see does seem to vary across technologies. This difference matters not only for how we preprocess the data (e.g. imputation), but also for how we model the data (e.g. dimensionality reduction or differential expression). Here are a few papers you might find relevant to evaluating “how many zeros we expect to see in 10x data”. Specifically, they show 10x data are not zero inflated: > * https://www.nature.com/articles/s41587-019-0379-5 > * https://academic.oup.com/bioinformatics/article/33/21/3486/3952669 > * https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1861-6 > In terms of just imputation, here is a paper on evaluating and benchmarking 18 scRNA-seq imputation methods where both 10x and smart-seq data are discussed: > * https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02132-x > Two other more recent papers on this topic that you might find relevant: > * https://www.biorxiv.org/content/10.1101/2020.12.28.424633v1 > * https://www.biorxiv.org/content/10.1101/477794v2 > Happy single-cell data analysis-ing!

Assa (08:36:46) (in thread): > Thanks for this greatly detailed and expertly structured answer. All I have to do now is start reading. :slightly_smiling_face:

Stephanie Hicks (08:41:21) (in thread): > sure thing happy to help. Enjoy!

Stephanie Hicks (08:44:08) (in thread): > @Assa oh and p.s. for future queries, you might find #singlecell-queries a better place to ask such Qs, as this channel is primarily focused on Qs around working with the SingleCellExperiment class / objects. :slightly_smiling_face:

Assa (08:49:28) (in thread): > great, didn’t see that channel. thanks again.

Ambarish S. Ghatpande (09:01:44): > @Ambarish S. Ghatpande has joined the channel

shristi shrestha (10:25:38): > @shristi shrestha has joined the channel

2021-08-06

Wanmin Dai (01:09:45): > @Wanmin Dai has joined the channel

Wanmin Dai (09:26:43): > Hello everyone, just wondering has anybody used Zellkonverter to convert between anndata and sce? It seems that my sce object exported by the readH5AD function does not have a count assay?

Luke Zappia (09:54:26) (in thread): > The assays converted by readH5AD() will be named the same as they were in the .h5ad file. Most likely this will be X but there might be others as well. Whether X is counts or not depends on what is saved in the file.

Wanmin Dai (09:59:38) (in thread): > Thanks a lot Luke! That indeed seems to be the case and there is a count assay under X. Sorry I’m not familiar with anndata, but is there a way to rename the X assay so downstream analysis packages can correctly recognize it as counts?

Luke Zappia (10:15:26) (in thread): > Once you have the SingleCellExperiment you can access it like any other assay so something like counts(sce) <- assay(sce, "X") should work.

Wanmin Dai (10:17:20) (in thread): > Thanks a lot, much appreciated!

Luke Zappia (10:17:33) (in thread): > If you have zellkonverter v1.2.1 you can also set the X_name argument in readH5AD(), so readH5AD(file, X_name = "counts") should give you an assay named “counts” rather than “X”.

Felix M (19:29:43): > @Felix M has joined the channel

2021-08-10

Woo (08:18:29): > @Woo has joined the channel

2021-09-07

Andrew Jaffe (14:52:08): > @Andrew Jaffe has joined the channel

2021-10-06

Frederick Tan (12:04:16): > @Frederick Tan has joined the channel

Frederick Tan (12:16:20): > While attempting to create some “cached” intermediates for a workshop, we ran into an issue. Instructors are attempting to save ExperimentHub data e.g. > > sce <- TENxPBMCData() > saveRDS( sce, "sce.pbmc.rds" ) > > However, when participants attempt to use the data e.g. > > sce <- readRDS( "sce.pbmc.rds" ) > assay( sce, "counts" ) > > They get this error > > Error in value[[3L]](cond) : 'assay(<SingleCellExperiment>, i="character", ...)' invalid subscript 'i' failed to open file '/home/idies/.cache/ExperimentHub/4123a2dff8_1611' > > Is this a supported use case? Wasn’t able to find any options to break the link to .cache and “localize” everything to the object. Note that this is a mini example and the problem initially cropped up when attempting to cache the results of a multisample integration so it’d be great to be able to have students start with that object :slightly_smiling_face:

Lori Shepherd (12:26:38): > I can’t reproduce this locally – is it being run on docker, in anvil, or else? > > > library("TENxPBMCData") > > sce <- TENxPBMCData() > snapshotDate(): 2021-10-06 > see ?TENxPBMCData and browseVignettes('TENxPBMCData') for documentation > loading from cache > > saveRDS( sce,"sce.pbmc.rds" ) > > new session > > > library("TENxPBMCData") > > sce <- readRDS( "sce.pbmc.rds" ) > > assay( sce, "counts" )<33694 x 4340> sparse matrix of class DelayedMatrix and type "integer": > [,1] [,2] [,3] [,4] ... [,4337] [,4338] [,4339] > ENSG00000243485 0 0 0 0 . 0 0 0 > ENSG00000237613 0 0 0 0 . 0 0 0 > ENSG00000186092 0 0 0 0 . 0 0 0 > ENSG00000238009 0 0 0 0 . 0 0 0 > ENSG00000239945 0 0 0 0 . 0 0 0 > ... . . . . . . . . >

Marcel Ramos Pérez (12:33:54): > I also can’t reproduce on Bioc 3.14, R version 4.1.1, TENxPBMCData version 1.11.1. > > library(TENxPBMCData) > # [truncated...] > sce <- TENxPBMCData() > #> snapshotDate(): 2021-10-06 > #> see ?TENxPBMCData and browseVignettes('TENxPBMCData') for documentation > #> downloading 1 resources > #> retrieving 1 resource > #> loading from cache > saveRDS(sce, "sce_pbmc.rds") > sce2 <- readRDS("sce_pbmc.rds") > head(assay(sce2, "counts")) > #> <6 x 4340> sparse matrix of class DelayedMatrix and type "integer": > #> [,1] [,2] [,3] [,4] ... [,4337] [,4338] [,4339] > #> ENSG00000243485 0 0 0 0 . 0 0 0 > #> ENSG00000237613 0 0 0 0 . 0 0 0 > #> ENSG00000186092 0 0 0 0 . 0 0 0 > #> ENSG00000238009 0 0 0 0 . 0 0 0 > #> ENSG00000239945 0 0 0 0 . 0 0 0 > #> ENSG00000239906 0 0 0 0 . 0 0 0 > #> [,4340] > #> ENSG00000243485 0 > #> ENSG00000237613 0 > #> ENSG00000186092 0 > #> ENSG00000238009 0 > #> ENSG00000239945 0 > #> ENSG00000239906 0 >

Nitesh Turaga (12:34:22): > What version of R are you running @Frederick Tan

Nitesh Turaga (12:34:23): > ?

Nitesh Turaga (12:34:43): > and bioconductor.

Frederick Tan (12:40:42): > Using containers on sciserver.org (R 4.0.3 / Bioconductor 3.12)

Frederick Tan (12:41:12): > @Lori Shepherd When you say “new session” are you deleting your .cache directory?

Lori Shepherd (12:41:34): > no – opening a new R session in a different terminal (as I am working command line)

Frederick Tan (12:42:09): > We don’t see this if the creator of the object attempts to load … only when a different user attempts to load.

Frederick Tan (12:42:57): > Don’t quite understand what this is > > > sce@assays@data@listData$counts@seed@filepath > [1] "/home/idies/.cache/ExperimentHub/4123a2dff8_1611" >

Lori Shepherd (12:45:28): > my guess based on that, then, is that because it’s a DelayedArray-based object, saving it stores the path to the files and not the resolved object. I don’t know how sciserver.org works or how its file system is laid out, but I would imagine it’s a permissions issue accessing the cache under a user’s home directory. Maybe try saving the cache in a location or with permissions accessible to all users?

Lori Shepherd (12:45:33): > but just a guess –

Frederick Tan (12:49:44): > Definitely students don’t have access to other instructor’s .cache. So two questions > 1. Is this a supported use case for ExperimentHub? i.e. Instructor downloads, saves, and then shares with students so that all the students don’t have to “run” ExperimentHub? > 2. Does SingleCellExperiment already have all the data in the object or is it tied to the cache? Is there a way to “localize” and break the link to the cache?

Lori Shepherd (13:47:40): > It should be a use case. You can control the location of the cache generation, and there are functions to “export” a cache to share with collaborators, which might be the route that needs to be taken – @Michael Love did you ever play around with the use case of using it across various users for direct access to a shared location? > > I can’t answer the SingleCellExperiment question as I don’t have enough experience to know how it works under the hood as far as the localized vs delayed data aspect

Marcel Ramos Pérez (13:53:18): > Hi Frederick @Frederick Tan, I was able to reproduce the issue by saving the file as above and then trying to load it in a docker container (with a different cache): > > > assay(sce, "counts") > Error in value[[3L]](cond) : > 'assay(<SingleCellExperiment>, i="character", ...)' invalid subscript 'i' > H5Fis_hdf5() returned an error > > The portable way to share H5-based SCE objects is to use HDF5Array::saveHDF5SummarizedExperiment and to share a zip of that folder. > > library(HDF5Array) > # instructor saves data into folder > saveHDF5SummarizedExperiment(sce, "my_sce") > # students would then run below after unzipping the folder > asce <- loadHDF5SummarizedExperiment(dir = "my_sce") > assay(asce, "counts") > # <33694 x 4340> sparse matrix of class DelayedMatrix and type "integer": > # [,1] [,2] [,3] [,4] ... [,4337] [,4338] [,4339] [,4340] > # ENSG00000243485 0 0 0 0 . 0 0 0 0 >

Frederick Tan (14:06:53): > Thank you @Marcel Ramos Pérez! Passing this along to the instructors. Wondering if there’s a place this could be further advertised … seems tricky since we encountered the problem through ExperimentHub -> TENxPBMCData -> SingleCellExperiment

claudiozanettini (14:18:58): > @claudiozanettini has joined the channel

Michael Love (20:04:48) (in thread): > Unfortunately not :confused: The plan was to create a space where multiple users of a Linux group have read/write permission. Seems like it would work

Michael Love (20:11:20) (in thread): > Instructor could use a .cache location which is readable? That seems like a useful workshop technique which would be good to try out

Frederick Tan (20:29:31) (in thread): > Yes, I think we were able to test the shared .cache and that it works (need to confirm)

Frederick Tan (20:30:41) (in thread): > But was curious about other options … definitely going to explore saveHDF5SummarizedExperiment later (e.g. does it package up and then restore the relevant .cache files?)

2021-10-07

Hervé Pagès (14:36:34) (in thread): > We’re open to suggestions for how to improve the situation. Please discuss here: https://github.com/Bioconductor/SummarizedExperiment/issues/59

2021-10-21

Emily Collins (14:28:26): > @Emily Collins has joined the channel

2021-10-22

JP Cartailler (10:41:50): > @JP Cartailler has joined the channel

2021-10-28

Jun Yu (11:27:53): > @Jun Yu has joined the channel

2021-10-29

Enrico Ferrero (13:22:26): > @Enrico Ferrero has joined the channel

2021-11-02

Aedin Culhane (15:09:06): > Who is working on interoperability between SCE and anndata? (is there a project on this)

Paul Hoffman (15:21:06) (in thread): > https://github.com/cellgeni/sceasy

Alan O’C (15:43:26) (in thread): > zellkonverter as well?

Federico Marini (18:51:12) (in thread): > zellkonverter is simply the:bomb:

2021-11-03

Luke Zappia (03:38:31) (in thread): > Happy to talk about zellkonverter if you want. There’s also the R anndata package and anndata2ri if you want to do things from the Python side.

Davide Risso (07:18:57) (in thread): > this seems all very related to one of the #hca-data-insights discussions planned for this week. It would be great to have more input/participants if you want to join (see the #hca-data-insights channel for details)

2021-11-08

Paula Nieto García (03:29:39): > @Paula Nieto García has joined the channel

2021-11-10

Gabriel Hoffman (11:25:33): > @Gabriel Hoffman has joined the channel

2021-11-11

kent riemondy (14:39:32): > @kent riemondy has joined the channel

2021-11-24

Heather Bouzek (14:46:29): > @Heather Bouzek has joined the channel

2021-12-14

Megha Lal (08:24:10): > @Megha Lal has left the channel

2021-12-27

Leo Lahti (15:25:22): > What would be the best way to split a SummarizedExperiment object by sample groups (categories defined in colData)?

ImranF (15:37:50): > Hi Leo, this keeps coming up. Brief discussion and code there: https://community-bioc.slack.com/archives/C6KJHH0M9/p1620669257022600 - Attachment: Attachment > Does SingleCellExperiment (or maybe SummarizedExperiment) have an analog to Seurat::SplitObject()? This looks similar (https://rdrr.io/github/mikelove/tximeta/man/splitSE.html), but isn’t

Leo Lahti (16:23:26) (in thread): > Thanks!
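
For reference, one simple way to split by a colData category is to split the column indices and subset. This is a generic sketch on a toy object, not necessarily the code from the linked discussion; the column name `group` is made up.

```r
library(SummarizedExperiment)

# Toy object: 3 features x 4 samples, with a grouping column in colData
m <- matrix(1:12, nrow = 3,
            dimnames = list(paste0("g", 1:3), paste0("s", 1:4)))
cd <- DataFrame(group = c("a", "b", "a", "b"),
                row.names = paste0("s", 1:4))
se <- SummarizedExperiment(assays = list(counts = m), colData = cd)

# Split column indices by group, then subset the object once per group
idx <- split(seq_len(ncol(se)), se$group)
se_list <- lapply(idx, function(i) se[, i])

names(se_list)
colnames(se_list$a)
```

The result is a named list with one SummarizedExperiment per group. Since SingleCellExperiment supports the same column subsetting, the same approach should carry over to SCE objects.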

2021-12-30

Tuomas Borman (03:42:42): > @Tuomas Borman has joined the channel

2021-12-31

Leo Lahti (17:41:49): > The sampleMap mechanism allows non-trivial (i.e. other than one-to-one) matchings between colData and the individual experiments in MultiAssayExperiment. But does anyone have tips for a solution (class?) that provides a similar non-trivial matching scheme for features between experiments in MultiAssayExperiment?

2022-01-01

Vince Carey (12:05:59): > @Marcel Ramos Pérez @Levi Waldron ^^

Levi Waldron (12:50:28) (in thread): > It’s something we’ve talked about but not implemented, and I’d be happy to revisit with some use cases in mind.

2022-01-02

Leo Lahti (03:32:34) (in thread): > One possible use case is in microbiome research, where the features are often taxonomic units (genus, species..), but they can also be functional units (genes, pathways..). Mappings between taxonomy and function are available, and often relevant for research. Better support for feature matching could make it easier to look at questions like 1) which taxa may have contributed to some detected genes; or 2) which pathways might be altered, given the observed shifts in taxonomic composition.

Leo Lahti (03:44:40) (in thread): > The question came up in this thread: https://github.com/microbiome/mia/issues/190. This is not exactly single-cell related but there might be similar questions with SCE.

2022-01-03

Kurt Showmaker (17:05:24): > @Kurt Showmaker has joined the channel

2022-01-19

Stephany Orjuela (10:10:29): > @Stephany Orjuela has left the channel

2022-01-25

Norbert Tavares (17:42:30): > Has anyone tried to infer race/ancestry from single-cell transcriptomics data, essentially approximating genotyping assays from scRNA-seq data?

2022-01-28

Andrew Jaffe (11:42:31): > i assume that an intermediary output of this pipeline would be such a genotype matrix necessary to infer ancestry: https://www.nature.com/articles/nbt.4042?report=reader - Attachment (Nature): Multiplexed droplet single-cell RNA-sequencing using natural genetic variation > Nature Biotechnology - Droplet single-cell RNA-seq is applied to large numbers of pooled samples from unrelated individuals.

Andrew Jaffe (11:43:01): > (i’ve never used that pipeline but genotype variation has been used to demultiplex mixed pools of cells from different donors in the same 10x channel)

Norbert Tavares (17:29:37) (in thread): > Yep. I’m chatting with Ye and Zaitlen about what next steps are needed to infer ancestry.

2022-02-01

Lindsay Clark (09:58:09): > @Lindsay Clark has joined the channel

2022-02-10

Susanna (03:45:32): > @Susanna has joined the channel

2022-02-15

Gene Cutler (12:01:40): > @Gene Cutler has joined the channel

2022-03-05

Giulio Benedetti (15:17:26): > @Giulio Benedetti has joined the channel

2022-03-10

Ramon Massoni-Badosa (11:27:25): > @Ramon Massoni-Badosa has joined the channel

Ramon Massoni-Badosa (12:05:37): > Hi all! In the context of the human cell atlas (HCA), we have generated a multimodal atlas of human tonsils. We are currently developing the HCATonsilData package, which similarly to the TabulaMurisSenisData and HCAData packages, aims to provide programmatic access to our dataset via ExperimentHub. For scRNA-seq, CITE-seq and Spatial transcriptomics I think it will be pretty straight-forward. However, I’m running into some problems with scATAC-seq. I’ve analyzed the data with Signac, so now I have the data in the form of Seurat/ChromatinAssay objects. Is there any SingleCellExperiment extension to accommodate this? Most of the slots are similar, but there are some like the FragmentsObject which are ATAC-specific.

Ramon Massoni-Badosa (12:06:39): > Our vision is to make our data as interoperable as possible, so ideally we would like to provide fast access to the different analysis tools (archR, Signac, etc)

Ramon Massoni-Badosa (12:08:26): > thanks in advance for your help!

Martin Morgan (12:22:51) (in thread): > @Ramon Massoni-Badosa not directly relevant to the multi-modal aspect, but just wanted to point out, in case you haven’t seen them, the hca package (in Bioconductor) and cellxgenedp package (submitted to Bioconductor), which provide convenient programmatic ways of discovering and accessing HCA data resources, including datasets that have been processed using standard HCA pipelines. > > The vignettes of both hca (two vignettes, from the hca landing page) and cellxgenedp (the Rmd file in the github repo should still be pretty readable; check out the shiny app on this branch!) provide very useful introductions. - Attachment (Bioconductor): hca (development version) > This package provides users with the ability to query the Human Cell Atlas data repository for single-cell experiment data. The projects(), files(), samples() and bundles() functions retrieve summary information on each of these indexes; corresponding *_details() are available for individual entries of each index. File-based resources can be downloaded using files_download(). Advanced use of the package allows the user to page through large result sets, and to flexibly query the ‘list-of-lists’ structure representing query responses.

Ramon Massoni-Badosa (12:51:45) (in thread): > thanks @Martin Morgan! will check

2022-03-18

Olga Malkova (15:41:27): > @Olga Malkova has joined the channel

2022-03-21

Pedro Sanchez (05:02:49): > @Pedro Sanchez has joined the channel

2022-03-23

Doortje Theunissen (04:39:17): > @Doortje Theunissen has joined the channel

Susanna (13:43:45): > Hi all. Does anyone know how to deal with an h5ad file? Doing some research I found the Seurat library could help. From Seurat, I tried Convert(‘input_file.h5ad’, dest = ‘new_file.h5seurat’) and then I’m supposed to load the new file with LoadH5Seurat(‘new_file.h5seurat’). > However I’m having trouble with the first part: the Convert function throws me an error saying “Error: Not a sparce matrix”. Has this ever happened to anyone? Thank you :)

Charlotte Soneson (13:50:31) (in thread): > Not a direct answer to your Seurat problem (note that Seurat is not a Bioconductor package, so this is often not the best forum for Seurat-specific questions), but if you want a SingleCellExperiment object, I would recommend taking a look at the zellkonverter package: https://www.bioconductor.org/packages/release/bioc/vignettes/zellkonverter/inst/doc/zellkonverter.html

Susanna (13:51:11) (in thread): > Thank you, will do!

2022-03-29

Tim Triche (13:14:32): > oh hey

Tim Triche (13:14:45): > my postdoc was squawking about how S4 can’t play nice with C++

Tim Triche (13:15:18): > I mentioned that the whole point of the shallow reference class at the heart of SummarizedExperiment/SingleCellExperiment is to deal with indirection

Tim Triche (13:15:41): > am I full of misconceptions or is he? If I didn’t have a 2pm I’d dig into the current code to definitively settle this

Tim Triche (13:16:05): > also, I tried converting some .h5ad files to Seurat multimodal objects the other day and it did not work as expected. The altExp formalism is less full of surprises.

Tim Triche (13:16:11): > SCE is so much nicer

Alan O’C (13:18:24): > What does > > S4 can’t play nice with C++ > mean exactly?

Tim Triche (13:19:37): > his complaint was coming from a career C++ programmer who likes maximum freedom in terms of matrix representations and indexing

Tim Triche (13:20:05): > so I can sort of kind of understand it, but between beachmat and SCE/DelayedArray, I don’t necessarily agree

Alan O’C (13:21:11): > Ah right, sounds like a discussion that’s way above my pay grade. Carry on

Tim Triche (13:22:02): > not necessarily – the question is whether it is ever a good idea to reinvent the wheel

Hervé Pagès (19:01:57): > FWIW shallow reference was dropped from the SE/SCE internals in 2019. See https://github.com/Bioconductor/SummarizedExperiment/commit/e8a159a81e8d5805c7781301bcd5c68d066531d7 If your (probably serialized) objects still use a ShallowSimpleListAssays internally, it’s time to pass them thru updateObject(). If you need to do this in a more automated fashion for a bunch of objects in your package or in a collection of packages, take a look at the updateObject package (new in BioC 3.15): https://bioconductor.org/packages/updateObject Go straight to the vignette. - Attachment (Bioconductor): updateObject (development version) > A set of tools built around updateObject() to work with old serialized S4 instances. The package is primarily useful to package maintainers who want to update the serialized S4 instances included in their package. This is still work-in-progress.

Tim Triche (19:34:47): > oh wow

Tim Triche (19:34:51): > this is really helpful

Tim Triche (19:34:56): > what’s the default guts now?

Tim Triche (19:36:07): > this was Zach’s beef, thanks @Hervé Pagès… I will look into whether his preferences can still be satisfied re: simple access to Matrix-shaped objects (he has a particularly fast and compact representation that he’d like to use)

Hervé Pagès (19:58:13): > Default guts is SimpleList which is basically an ordinary list. In the era of on-disk assay data, there’s not much benefit in providing reference semantics at the level of the list of assays itself (which is what ShallowSimpleListAssays was providing). Maybe the case could still be made for reference semantics at the level of the individual assays but that’s no longer SummarizedExperiment business, it’s the business of whatever matrix-like object you stick in your SE object :wink: Zach should be able to stick his fast and compact objects directly in an SE as long as those objects satisfy some basic requirements e.g. dim(), dimnames(), [ + optionally cbind() and rbind() (if the receiving SE needs to support these operations). > > A few years ago we relaxed the requirements so that anything that supports this core set of operations can be used in a SE object or derivative. We also support multidimensional assays as long as the first 2 dimensions match dim(SE).

2022-03-30

Kasper D. Hansen (08:15:37): > That’s a very helpful description which makes total sense to me

2022-04-03

rohitsatyam102 (09:47:53): > Hi everyone. I was exploring my scRNASeq data set trying to understand which approach is better for its normalization. Here are the mean-variance plots of raw counts, lognormalised and sctransformed data for the control sample. I also combined both control and treatment and used the edgeR estimateCommonDisp function and plotMeanVar to see if my data follows a negative-binomial or Poisson distribution. I know that as the sample number (here cells are acting as individual samples) increases NB distribution approaches Poisson. Can these plots help in making a choice? I see that downstream, SCT creates a lot of clusters while lognormalisation creates only a few (post integration). Since I have no reason to believe that there should be 23 cell types, I wonder what could explain this: is SCT leading to overcorrection here? - File (PNG): image.png - File (PNG): image.png - File (PNG): image.png

rohitsatyam102 (09:53:28) (in thread): > Here is the code used: https://gist.github.com/Rohit-Satyam/c7187f9c0ddfc8bb1bd3d8c2c8b9dd8a

ImranF (13:10:43) (in thread): > A couple of quick thoughts: > 1. Check the overlap (or lack thereof) in HVGs after lognormalization and SCTransform. If there are, let’s say, <15% shared HVGs… that would be peculiar > 2. Under the assumption that both control and treatment samples have similar cell composition… maybe just check where clusters map to your known/expected cell types? and then examine the “rest”?

rohitsatyam102 (14:18:45) (in thread): > 1. It’s 66.4% (I use 2000 variable features in each approach). I use VariableFeatures to retrieve HVGs and create the Venn diagram shown below. - File (PNG): image.png
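
The overlap check suggested above can be sketched in base R. The gene names and the two HVG vectors below are toy placeholders (in a Seurat workflow they would come from VariableFeatures() on each object), and the toy sets are built so that the shared fraction is 0.664 purely for illustration. Whether the 66.4% quoted above is "shared / 2000" or a Jaccard index is not stated in the thread, so both are computed.

```r
# Toy HVG sets of 2000 genes each; real sets would come from the two
# normalization workflows (these names are hypothetical).
hvg_lognorm <- paste0("gene", 1:2000)
hvg_sct     <- paste0("gene", 673:2672)

shared <- intersect(hvg_lognorm, hvg_sct)

# Fraction of one method's HVGs that the other method also selects.
frac_shared <- length(shared) / length(hvg_lognorm)

# Jaccard index: shared genes relative to the union of both sets.
jaccard <- length(shared) / length(union(hvg_lognorm, hvg_sct))

frac_shared
jaccard
```

The two numbers can differ a lot for the same Venn diagram, so it helps to say which one is being reported when comparing against a threshold such as 15%.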

rohitsatyam102 (14:20:05) (in thread): > 2. I believe they do have similar composition. When I merge the control and treatment matrix and make UMAP they almost overlap greater than 90%.

rohitsatyam102 (14:31:24) (in thread): > UPDATE 1: I also checked if mitochondrial genes could be at play, so I went on to check whether the mitochondrial genes selected as HVGs differed across normalization approaches. - File (PNG): image.png

2022-04-04

ImranF (14:05:28) (in thread): > > 2. I believe they do have similar composition. When I merge the control and treatment matrix and make UMAP they almost overlap greater than 90%. > If cell composition is similar, try SCtransform()ing independently on each sample, and then integrating. Though, off the top of my head, you will have to select the same HVGs for both samples. If you don’t do that, you might get slightly different sets of HVGs for the ctrl and trmt samples (thus making downstream stuff annoying).

ImranF (14:05:53) (in thread): > Nonetheless, as I said earlier, you might just have to check what those “extra” clusters are

2022-04-05

rohitsatyam102 (05:43:19) (in thread): > Hi @ImranF. Yes, I did the analysis by independently normalising the samples and then performing integration. That’s exactly what the Seurat workflow suggests, and I used SelectIntegrationFeatures to find HVGs.

2022-04-06

rohitsatyam102 (22:22:11) (in thread): > Did some more analysis similar to the sctransform paper but couldn’t conclude anything!! - Attachment (BioMed Central): Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression - Genome Biology > Single-cell RNA-seq (scRNA-seq) data exhibits significant cell-to-cell variation due to technical factors, including the number of molecules detected in each cell, which can confound biological heterogeneity with technical effects. To address this, we present a modeling framework for the normalization and variance stabilization of molecular count data from scRNA-seq experiments. We propose that the Pearson residuals from “regularized negative binomial regression,” where cellular sequencing depth is utilized as a covariate in a generalized linear model, successfully remove the influence of technical characteristics from downstream analyses while preserving biological heterogeneity. Importantly, we show that an unconstrained negative binomial model may overfit scRNA-seq data, and overcome this by pooling information across genes with similar abundances to obtain stable parameter estimates. Our procedure omits the need for heuristic steps including pseudocount addition or log-transformation and improves common downstream analytical tasks such as variable gene selection, dimensional reduction, and differential expression. Our approach can be applied to any UMI-based scRNA-seq dataset and is freely available as part of the R package sctransform, with a direct interface to our single-cell toolkit Seurat. - File (PNG): image.png - File (PNG): image.png

2022-04-14

Dario Righelli (05:29:28): > @Dario Righelli has joined the channel

Dario Righelli (05:30:16): > Hi all, > > I don’t know where to put this, but as discussed with @Davide Risso @Charlotte Soneson @Michael Stadler and @Robert Ivánek at the Bioc retreat in Heidelberg, I’m working on a function for reading 10x-Multiome data starting from the default root path. Here you can find the current working implementation with some example data. > > We were discussing whether to put this into the DropletUtils package or to create a new package with some additional tools for handling 10x-Multiome data.

2022-04-15

Michael Stadler (05:50:54): > @Michael Stadler has joined the channel

2022-04-18

Tim Triche (11:44:29) (in thread): > Having it in a separate package would make it easier to find, if decoupling it is not too hard. Might also be nice for adapting to scCUT&Tag etc

2022-04-25

Peter Hickey (18:56:42): > @Aaron Lun should combineCols() work in this case of combining 2 SCEs with overlapping genes but distinct samples? > > library(SingleCellExperiment) > > # Simulate 2 SCEs so that they have some genes (rows) in common but distinct > # samples (columns) > sce1 <- scater::mockSCE(ncells = 10, ngenes = 30) > sce1 > #> class: SingleCellExperiment > #> dim: 30 10 > #> metadata(0): > #> assays(1): counts > #> rownames(30): Gene_0001 Gene_0002 ... Gene_0029 Gene_0030 > #> rowData names(0): > #> colnames(10): Cell_001 Cell_002 ... Cell_009 Cell_010 > #> colData names(3): Mutation_Status Cell_Cycle Treatment > #> reducedDimNames(0): > #> mainExpName: NULL > #> altExpNames(1): Spikes > sce2 <- scater::mockSCE(ncells = 20, ngenes = 40) > # Modify colnames so there is no clash with 1st SCE. > colnames(sce2) <- toupper(colnames(sce2)) > sce2 > #> class: SingleCellExperiment > #> dim: 40 20 > #> metadata(0): > #> assays(1): counts > #> rownames(40): Gene_0001 Gene_0002 ... Gene_0039 Gene_0040 > #> rowData names(0): > #> colnames(20): CELL_001 CELL_002 ... CELL_019 CELL_020 > #> colData names(3): Mutation_Status Cell_Cycle Treatment > #> reducedDimNames(0): > #> mainExpName: NULL > #> altExpNames(1): Spikes > > sce <- combineCols(sce1, sce2) > #> Error in (function (assays = SimpleList(), rowData = NULL, rowRanges = GRangesList(), : the rownames and colnames of the supplied assay(s) must be NULL or > #> identical to those of the RangedSummarizedExperiment object (or > #> derivative) to construct > > se <- combineCols( > as(sce1, "SingleCellExperiment"), > as(sce2, "SingleCellExperiment")) > #> Error in (function (assays = SimpleList(), rowData = NULL, rowRanges = GRangesList(), : the rownames and colnames of the supplied assay(s) must be NULL or > #> identical to those of the RangedSummarizedExperiment object (or > #> derivative) to construct >

2022-04-27

Ynnez Bestari (10:23:37): > @Ynnez Bestari has joined the channel

Genoa Polumbo (10:23:46): > @Genoa Polumbo has joined the channel

Ynnez Bestari (10:27:23): > We would like to invite everyone interested in working with single-cell and spatial omics data to a seminar with Katy Börner on constructing a reference atlas for mapping healthy cells in the human body within the NIH Human BioMolecular Atlas Program. Please see below for meeting time and link: > Speaker: Katy Börner, Victor H. Yngve Distinguished Professor of Intelligent Systems Engineering, Indiana University > Title: Human Reference Atlas Construction and Usage > When: May 2nd, 2022 4:00-5:00 PM Eastern Time (United States and Canada) > > Join Zoom meeting https://harvard.zoom.us/j/97173440183?pwd=eHI1ODRub0p5NGNEZncwU0lURlJjdz09 Password: 421790 > > Join by telephone (use any number to dial in) +1 929 436 2866 +1 312 626 6799 +1 301 715 8592 +1 346 248 7799 +1 669 900 6833 +1 253 215 8782 International numbers available: https://harvard.zoom.us/u/ausGPPR0P One tap mobile: +19294362866,,97173440183# US (New York) > > Join by SIP conference room system > Meeting ID: 971 7344 0183 97173440183@zoomcrc.com

2022-05-02

Ynnez Bestari (16:15:02): > Hello all, the seminar with Dr. Katy Börner is now live. You may join here: Join Zoom meeting https://harvard.zoom.us/j/97173440183?pwd=eHI1ODRub0p5NGNEZncwU0lURlJjdz09 Password: 421790

2022-05-07

rohitsatyam102 (12:40:22): > Sorry that I am posting it here but @Aaron Lun can you please confirm whether the answer to my own query on the Bioconductor Forum is valid? Actually I couldn’t find many posts/queries on the Biostars/Bioconductor forums where people use bulk RNASeq to annotate scRNASeq, so I am a bit skeptical.

2022-05-08

Tim Triche (10:44:12): > Fwiw, not speaking for Aaron, but most people view bulk RNAseq as a mixture of single cells in various states (cycle, metabolic, TF variation) even if sorted or clonal (cf. panel E in https://www.nature.com/articles/nature14590/figures/10). As a result, it is more common to fit a model for how much of each cell state and/or cell type appears to be present in a bulk sample (eg learning an NMF model from single cells and then solving for the hat matrix in each bulk sample using the learned weight matrix from the sc data).
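
As a rough sketch of the "solve for the mixture in each bulk sample" step: the code below assumes a signature matrix `W` (genes x cell types) has already been learned from single-cell data, for example by NMF. Here `W` is simulated, the cell type names are hypothetical, and plain least squares followed by clipping and renormalising stands in for a proper non-negative solver, so this is an illustration of the idea and not the method of any particular package.

```r
set.seed(42)

n_genes <- 50
cell_types <- c("typeA", "typeB", "typeC")  # hypothetical labels

# Stand-in for a learned signature matrix (genes x cell types).
W <- matrix(rexp(n_genes * length(cell_types)), nrow = n_genes,
            dimnames = list(paste0("gene", seq_len(n_genes)), cell_types))

# Simulate one noiseless bulk profile as a known mixture of the signatures.
true_props <- c(typeA = 0.6, typeB = 0.3, typeC = 0.1)
bulk <- as.vector(W %*% true_props)

# Least-squares estimate of the mixing weights for this bulk sample.
fit <- lm.fit(x = W, y = bulk)

# Clip negatives and rescale so the weights read as proportions.
est <- pmax(fit$coefficients, 0)
est <- est / sum(est)
round(est, 3)
```

With real, noisy data the unconstrained solution can go negative or become unstable, which is one reason dedicated deconvolution tools use constrained or Bayesian fits.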

Jared Andrews (12:00:12): > Many of the celldex datasets are from bulk RNA-seq of sorted cell types and work fine for scRNA-seq.

2022-05-09

Tim Triche (08:55:31): > Define “work fine”

Tim Triche (08:59:31): > “Provide answers that are not outrageously incorrect in healthy tissues” is a different bar from “capture relevant biology” (latter being experiment dependent, but see e.g. https://www.nature.com/articles/s43018-022-00356-3 for a situation where aberrant cellular interactions appear to be a key aspect of the underlying biology) - Attachment (Nature): Cell type and gene expression deconvolution with BayesPrism enables Bayesian integrative analysis across bulk and single-cell RNA sequencing in oncology > Nature Cancer - Danko and colleagues develop BayesPrism, a bulk RNA sequencing deconvolution tool to infer cell type composition and cell-specific expression levels across clinical cancer datasets.

Tim Triche (09:00:19): > Not saying you are wrong, but “work[s] fine” is necessarily subjective and contextual.

Jared Andrews (10:09:07): > Fine as in “give you reasonable cell type annotations given an appropriate reference dataset”. If a single cell dataset isn’t annotated for the cell states specified, then using a single cell dataset as a reference isn’t going to get you any closer than a bulk one. Given the goal is cell type annotation of a single cell dataset rather than deconvolution of a bulk RNA-seq dataset and that the SingleR book shows examples of where using a bulk RNA-seq dataset to annotate a scRNA-seq dataset yields reasonable results; yeah, I’d say as a general method, it works “fine” for the intended purpose. > > Of course, if you’re trying to differentiate between cell states or whatnot, then a reference containing samples in those states with them appropriately labeled will be necessary.

2022-05-10

Genoa Polumbo (10:55:03): > Interested in working with single-cell and spatial omics data? Please fill out a brief survey as we construct HuBMAP: a reference atlas for mapping healthy cells in the human body within the NIH Human BioMolecular Atlas Program. https://app.smartsheet.com/b/form/0c3f6d63a3784b648a4083db9bbdcb56

rohitsatyam102 (14:57:56) (in thread): > Interesting! Do you have recommendations for packages/tools that can be used for deconvoluting bulk RNASeq of a non-human organism such as Plasmodium, some universal tool (like SingleR was for transferring labels from bulk to scRNASeq no matter what the organism)? BayesPrism is for human specifically!!

2022-05-12

Theresa Alexander (11:30:45): > Question regarding using sc-RNAseq data for a vignette: > I want to have examples for both an SCE and Seurat object. For the Seurat example, would it be better to download some 10X data with something like the following: > > url <- "https://cf.10xgenomics.com/samples/cell/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz" > destfile <- "./pbmc3k_filtered_gene_bc_matrices.tar.gz" > download.file(url, destfile) > > Or better to grab an SCE object from the scRNAseq or celldex package and coerce to a Seurat object? > Thanks in advance!

Dan Bunis (15:29:46): > If you can, I actually recommend having your build system rely on Seurat as little as possible. I switched the dittoSeq vignette from using Seurat & SCE for its data example (scRNAseq grab, as.Seurat, Seurat pre-processing for Seurat user comfort =p, then as.SCE) to just SCE (scRNAseq and pre-processing with code from the OSCA book) because sometimes Seurat just breaks in a version and gets removed from CRAN.

2022-05-18

Friederike Dündar (13:57:07): > You can have non executed pseudo code for the Seurat example

Alan O’C (14:04:24): > The issue with example code that doesn’t execute as part of automated builds is it becomes obsolete and therefore useless surprisingly quickly

2022-05-20

Theresa Alexander (08:18:07) (in thread): > Thanks!

2022-05-23

rohitsatyam102 (17:15:10): > When comparing the GO terms obtained after ORA from bulk RNAseq and GO terms obtained from scRNAseq per cluster, I observe a huge disparity. Is it correct to directly compare the GO terms like this? Even when I combine DE genes from all clusters and then perform ORA the difference persists!! What’s the ideal way of comparing the bulk and scRNAseq DEGs? (I am performing ORA separately on up- and down-regulated genes)

rohitsatyam102 (17:19:01) (in thread): > I was thinking of pseudobulking but I don’t have two replicates per condition. I would like to hear the canonical way in which most people in our community do this!!

2022-06-23

Stephanie Hicks (21:08:46): > the CZI Single-Cell Biology Data Insights Cycle 2 RFA was just announced https://chanzuckerberg.com/rfa/single-cell-data-insights. Due date is Aug 25 - Attachment (Chan Zuckerberg Initiative): New Funding Opportunity for Single-Cell Computational Biology > Advancing tools and resources for analyzing single-cell biology datasets to gain greater insights into health and disease.

2022-07-04

Andrew J. Rech (19:45:15): > @Andrew J. Rech has joined the channel

2022-07-07

Clara Pereira (14:27:15): > @Clara Pereira has joined the channel

2022-07-13

Alan Aw (14:59:01): > @Alan Aw has joined the channel

2022-07-15

Ashley Robbins (15:18:26): > @Ashley Robbins has joined the channel

2022-07-18

Filip Stachura (09:24:59): > @Filip Stachura has joined the channel

2022-07-19

Anna Reisetter (17:22:03): > @Anna Reisetter has joined the channel

2022-07-28

Mervin Fansler (17:21:16): > @Mervin Fansler has joined the channel

2022-08-02

Alvaro Sanchez (05:09:42): > @Alvaro Sanchez has joined the channel

2022-08-04

László Kupcsik (21:58:58): > @László Kupcsik has joined the channel

2022-08-11

Rene Welch (17:16:20): > @Rene Welch has joined the channel

2022-08-15

Michael Kaufman (13:15:53): > @Michael Kaufman has joined the channel

2022-09-04

Gurpreet Kaur (14:59:55): > @Gurpreet Kaur has joined the channel

2022-09-27

Jennifer Holmes (16:14:51): > @Jennifer Holmes has joined the channel

vin (22:14:27): > @vin has joined the channel

2022-10-06

Devika Agarwal (05:38:06): > @Devika Agarwal has joined the channel

2022-10-20

Connie Li Wai Suen (01:26:06): > @Connie Li Wai Suen has joined the channel

2022-10-28

Brian Schilder (08:31:16): > @Brian Schilder has joined the channel

Vandenbulcke Stijn (12:29:50): > @Vandenbulcke Stijn has joined the channel

2022-10-31

Chenyue Lu (10:05:41): > @Chenyue Lu has joined the channel

2022-11-06

Sherine Khalafalla Saber (11:21:33): > @Sherine Khalafalla Saber has joined the channel

2022-12-02

Vince Carey (11:06:48): > Quiet here? > > The second effort is a supplemental library called SOMA.io that will enable users to convert SOMA-backed objects to and from the two most popular domain-specific formats: anndata and Seurat. > > – that’s from https://github.com/single-cell-data/SOMA

Alan O’C (11:19:51) (in thread): > https://xkcd.com/927/ - Attachment: Standards > [Title text] “Fortunately, the charging one has been solved now that we’ve all standardized on mini-USB. Or is it micro-USB? Shit.”

2022-12-08

Tim Triche (11:15:36) (in thread): > SOMA is interesting. Happy to discuss

Tim Triche (11:17:28) (in thread): > It’s one of two backends that could stand to be hooked in via reference classes. Talked with @Martin Morgan about this in San Jose (and Zach with the TileDB folks). As an API contract it is attractive

2022-12-12

Umran (17:58:19): > @Umran has joined the channel

Lexi Bounds (17:59:49): > @Lexi Bounds has joined the channel

2022-12-13

Ana Cristina Guerra de Souza (09:01:30): > @Ana Cristina Guerra de Souza has joined the channel

Xiangnan Xu (18:33:04): > @Xiangnan Xu has joined the channel

Yue Cao (19:04:30): > @Yue Cao has joined the channel

2022-12-14

Bárbara Zita Peters Couto (14:11:14): > @Bárbara Zita Peters Couto has joined the channel

Lijia Yu (19:38:04): > @Lijia Yu has joined the channel

2022-12-20

Jennifer Foltz (10:41:38): > @Jennifer Foltz has joined the channel

2023-01-08

Pageneck Chikondowa (05:33:15): > @Pageneck Chikondowa has joined the channel

2023-01-26

Yu Zhang (12:33:00): > @Yu Zhang has joined the channel

2023-01-31

Ahmad Al Ajami (09:11:20): > @Ahmad Al Ajami has joined the channel

2023-02-13

Hervé Pagès (10:44:52): > @Hervé Pagès has left the channel

2023-02-20

Iivari (02:28:26): > @Iivari has joined the channel

2023-02-22

michaelkleymn (01:44:29): > @michaelkleymn has joined the channel

2023-02-23

Claire Seibold (15:49:01): > @Claire Seibold has joined the channel

2023-02-28

Ramin (15:30:12): > @Ramin has joined the channel

2023-03-08

Ayantika Sen (01:36:40): > @Ayantika Sen has joined the channel

2023-03-10

Edel Aron (15:27:51): > @Edel Aron has joined the channel

2023-03-17

Michael Milton (00:49:02): > @Michael Milton has joined the channel

Michael Milton (00:52:35) (in thread): > I recently got an update on SOMA’s progress: https://github.com/single-cell-data/TileDB-SOMA/issues/1087#issuecomment-1468614647

Michael Milton (00:56:25) (in thread): > Close to a release, but they don’t seem to have any SCE support yet

Tim Triche (02:07:36) (in thread): > Not sure that the R API is really so close to release. Conversion between Seurat and SCE is of course fairly straightforward. TileDB customers want Seurat because the big ball of mud is familiar :man-shrugging:

Martin Morgan (06:32:28) (in thread): > I think SCE should be something independent of SOMA; ‘we’ (in a separate package) assemble an SCE from calls to SOMA. I could start a repository for this…

Michael Milton (06:48:30) (in thread): > It seems like they plan on doing it anyway, though

Martin Morgan (07:43:50) (in thread): > I guess I was thinking of creating an SCE from SOMA, rather than the other way around, or is that in the works too?

Michael Milton (07:47:51) (in thread): > Oh, yes I agree that is important. It’s not mentioned in the issue I linked only because it was already implemented in their old implementation: https://tiledb-inc.github.io/tiledbsc/reference/SOMA.html#method-to-summarized-experiment-

Michael Milton (07:48:14) (in thread): > But from the discussion it seems that they’re keen to implement both directions of the SCE <-> SOMA conversion

Tim Triche (08:46:41) (in thread): > In an ideal world, a lazy SCE with something like a SQLDataFrame for cell/sample metadata would be awesome

Tim Triche (08:48:48) (in thread): > Who’s presenting this at bioc2023? @Martin Morgan are you going to update people? You’re best qualified I’d think!

Tim Triche (08:50:53) (in thread): > The caching infrastructure implemented by Stefano is super cool but I wonder if it’s realistic for it to keep up with incoming merges (several on the horizon) and multimodal data (ibid). I suppose it would be ideal for multiple approaches to get presentations at BioC!

Tim Triche (08:51:47) (in thread): > Ulterior motive, I’ve realized that having lazy covariates and lazy data appeals immensely to me of late. HDF5 is swell except when it isn’t

Martin Morgan (10:02:34) (in thread): > @Michael Milton would be a great person to present at Bioc2023, on this and CuratedAtlasQueryR (probably travel funds could be found either through a travel scholarship or…) ! With respect to lazy data, I have h5ad (this package is very immature and will probably remain that way…) for interfacing with h5ad files; data stays on disk until asked… I’ve also spoken with Luke Zappia (zellkonverter author) about the value of laziness in his mature readH5AD function. - Attachment (Bioconductor): zellkonverter > Provides methods to convert between Python AnnData objects and SingleCellExperiment objects. These are primarily intended for use by downstream Bioconductor packages that wrap Python methods for single-cell data analysis. It also includes functions to read and write H5AD files used for saving AnnData objects to disk.

Michael Milton (10:14:00) (in thread): > I would certainly be interested in presenting, though I hadn’t made any plans to make a submission…!

Michael Milton (10:14:52) (in thread): > It’s funny you should talk about anndata. I’ve been looking into (lazy) anndata backends for SCE recently.

Michael Milton (10:18:08) (in thread): > zellkonverter’s native R reader is very promising, and I can get some good performance with it. It even supports lazy (“backed”) loading for all data matrices.

Michael Milton (10:21:28) (in thread): > There’s also MuData which has a similar native R reader and writer.

2023-03-30

Ludwig Geistlinger (11:01:09): > Mark your calendars for a CCB seminar special with Aaron Lun, > the mastermind behind the Orchestrating Single-Cell Analysis with Bioconductor (OSCA) online book! > > Aaron will speak about the journey that led to the OSCA book > from a developer’s perspective in his talk: > > Code, sweat, and tears: how the OSCA sausage was made > > When: April 03, 2023, 3 PM ET > Where: https://harvard.zoom.us/j/97173440183?pwd=eHI1ODRub0p5NGNEZncwU0lURlJjdz09

Stefania Pirrotta (11:46:29): > @Stefania Pirrotta has joined the channel

2023-03-31

Laura Masatti (06:45:46): > @Laura Masatti has joined the channel

2023-04-05

Steve Lianoglou (12:52:02) (in thread): > I’ve been out of the loop and only just saw this – any chance this seminar was recorded?

Ludwig Geistlinger (13:04:49) (in thread): > Yes we’ve recorded the talk and I’ll have a link for you soon

2023-04-06

Steve Lianoglou (01:08:38) (in thread): > great, thanks!

Ludwig Geistlinger (13:13:59) (in thread): > Here is the recording: https://youtu.be/NCBUBP4Ll9I - Attachment (YouTube): Code, sweat, and tears: how the OSCA sausage was made

2023-04-12

aaronwolen (15:21:07): > @aaronwolen has joined the channel

2023-04-21

Kozo Nishida (14:22:04): > @Kozo Nishida has joined the channel

2023-05-04

Leopoldo Valiente (16:28:50): > @Leopoldo Valiente has joined the channel

2023-05-17

Hassan Kehinde Ajulo (12:18:13): > @Hassan Kehinde Ajulo has joined the channel

2023-05-18

Oluwafemi Oyedele (05:54:26): > @Oluwafemi Oyedele has joined the channel

2023-06-14

Stevie Pederson (21:30:16): > @Stevie Pederson has left the channel

2023-06-19

Pierre-Paul Axisa (05:12:28): > @Pierre-Paul Axisa has joined the channel

2023-06-28

Andrew Ghazi (10:59:48): > @Andrew Ghazi has joined the channel

2023-07-12

Axel Klenk (19:33:54): > @Axel Klenk has joined the channel

2023-07-24

Leo Lahti (17:46:12): > I wanted to ask the following - someone on this channel may have thought about this already: > > -> the missing support for DataFrame in ggplot2: > * ggplot2 eats data.frames but it does not understand S4Vectors::DataFrame > * multiple Bioconductor packages (incl. the SCE framework) are now using DataFrame instead of data.frame > * hence, one must do ggplot(as.data.frame(DF)) when DF is a DataFrame; OSCA also seems to do so: https://github.com/OSCA-source/OSCA.advanced > * but it would be handy if we could just call: ggplot(DF) (i.e. if ggplot2 could support DataFrame) > * problem: ggplot2 is not a Bioconductor package, so it would be less straightforward to request adding support > Has someone thought about a solution to this (or is anyone aware of a solution)?

Kevin Rue-Albrecht (18:34:40) (in thread): > I always spend extra time when teaching ggplot2 and bioc highlighting that irritating point. I never invested the time to study a fix beyond the as.data.frame trick. > That said, ggplot2 accepts other data.frame-like objects, like tibble and data.table so there’s got to be something we can do on the bioc side to tag DataFrame as something ggplot2 can understand. What does ggplot2 rely on to decide that an object is similar enough to a data.frame to work with it?

Leo Lahti (18:35:22) (in thread): > Yes, I have the same problem: this is rather confusing when teaching Bioconductor methods. Would have to explore ggplot2 more closely unless someone already got somewhere with this.

Michael Milton (19:23:32) (in thread): > It looks like you could implement fortify, which is called to convert the data into a data frame. I think tibble and data.table work because they both inherit from data.frame which has a fortify.data.frame implementation. DataFrame doesn’t inherit from data.frame. I guess that was a deliberate decision.

2023-07-25

Kevin Rue-Albrecht (06:19:04) (in thread): > Ah right then it sounds feasible and overdue if ‘fortify’ could be the answer

Kevin Rue-Albrecht (07:36:11) (in thread): > PS: I wasn’t there when a decision was made but data.frame being S3 (derives from list), I think that DataFrame was simply built from scratch to be an “all-S4” class, entirely managed by Bioconductor core team, with minimal dependency on external packages/classes.

Kevin Rue-Albrecht (07:52:46) (in thread): > OK, so basically, this seems to work indeed > > library(S4Vectors) > library(ggplot2) > > df <- DataFrame( > x = 1:10, > y = rnorm(10) > ) > > fortify.DataFrame <- function(model, data, ...) { > as.data.frame(model) > } > > ggplot(df, aes(x, y)) + > geom_point() >

Kevin Rue-Albrecht (07:53:17) (in thread): > Gotta go for lunch now, but I’ll make a PR on S4Vectors and see what the maintainers say about it

Kevin Rue-Albrecht (07:54:22) (in thread): > FYI: the ggplot2::fortify man page says this: - File (PNG): image.png

Kevin Rue-Albrecht (07:54:48) (in thread): > So this fortify hack might only be a temporary fix. Needs further investigation

Kevin Rue-Albrecht (08:12:00) (in thread): > problem: to define fortify.DataFrame in S4Vectors, it would need to add ggplot2 in its Imports: which I think would add too many dependencies for such a small gain. > Also, I’m trying to understand what “wider range of methods” refers to in “I now recommend using the broom package, which implements a much wider range of methods.” > Sounds like it’s the augment generic that needs to be defined now: https://github.com/tidymodels/broom/blob/081117811b8d517c5aec3de2f94648883c2a6d3e/NEWS.md?plain=1#L990

Kevin Rue-Albrecht (08:22:53) (in thread): > https://broom.tidymodels.org/articles/adding-tidiers.html > > Tidiers for objects from BioConductor belong in biobroom - Attachment (broom.tidymodels.org): Writing new tidier methods > broom

Kevin Rue-Albrecht (08:30:22) (in thread): > this is ugly for different reasons but seems to work https://github.com/kevinrue/S4Vectors/tree/kra-fortify

Kevin Rue-Albrecht (09:13:50) (in thread): > (ugly#1: I’m importing from a package listed in Suggests)

Kevin Rue-Albrecht (09:17:44) (in thread): > basically, I’m facing a chicken-and-egg problem: > * ggplot2 should but cannot Suggests a Bioconductor package (can it?) because install.packages("ggplot2") would not know where to get S4Vectors from, at least not without users running options(repos = BiocManager::repositories()) > * S4Vectors should, but probably doesn’t want to, add ggplot2 to Imports, so that it can @importFrom ggplot2 fortify, so that it can export fortify.DataFrame

Leo Lahti (09:35:01) (in thread): > Hmm.

Kevin Rue-Albrecht (09:45:13) (in thread): > I’ve opened this PR to continue the discussion with the S4Vectors developers https://github.com/Bioconductor/S4Vectors/pull/116 - Attachment: #116 add fortify.DataFrame > Hi there > > TL;DR: This PR makes the following code possible: > > > library(S4Vectors) > > df <- DataFrame( > x = 1:10, > y = rnorm(10) > ) > > library(ggplot2) > > ggplot(df, aes(x, y)) + > geom_point() > > # see also: > fortify(df) > > > Without the need for as.data.frame(df) in the ggplot() call. > > * * * > > Context: > > As per https://community-bioc.slack.com/archives/C6KJHH0M9/p1690235172607369 > > I’ve messed around S4vectors a bit to test feasibility, and somehow landed on my feet with something that seems to work. > > I’ll be honest, I’m not even sure why R allows me to do it, but it seems that I can importFrom a package that is listed in Suggests (i.e., not in Depends). > > I added ggplot2 to Suggests because I don’t like the idea of having it under Imports. It just feels wrong to automatically install ggplot2 and its own dependencies as a dependency of S4Vectors. S4Vectors should remain a lightweight package. > > I suppose that if users have ggplot2 installed, the import statement “just works”, and if they don’t have ggplot2 installed.. well… they don’t have any reason to call ggplot() on a DataFrame object :D > > I’m aware that this PR is unlikely to be the final fix (if any is possible at all). I just aim to give a starting point to the discussion. > > Also, I’ve considered other approaches, but run into chicken-and-egg issues: > > • I suspect ggplot2 will not accept to Suggests: S4Vectors, as I don’t see any Bioconductor package in its existing Imports/Suggests (https://cran.r-project.org/web/packages/ggplot2/index.html) and install.packages() cannot see Bioconductor packages without messing with options(repos). > • I suspect S4Vectors will not accept to Imports: ggplot2, to justify more cleanly importFrom(ggplot2, fortify).
Same reason as above: keep S4Vectors dependencies to a minimum > • I noted that ?ggplot2::fortify states “Rather than using this function, I now recommend using the broom package, which implements a much wider range of methods. fortify() may be deprecated in the future.” However, it is not clear to me what needs to be done in broom (or biobroom)

Kevin Rue-Albrecht (09:45:42) (in thread): > but we can continue to discuss unofficially here :slightly_smiling_face:

Michael Milton (18:47:23) (in thread): > Remind me why you need to import ggplot2 in order to define a fortify method?

Michael Milton (18:50:49) (in thread): > Also you can have an S4 class inherit from an S3 one, but it’s a bit odd: > > > DataFrame = setClass("DataFrame", contains="data.frame") > > inherits(DataFrame(), "data.frame") > [1] TRUE >

2023-07-26

Kevin Rue-Albrecht (04:11:56) (in thread): > Building/installing S4Vectors fails if the generic “fortify” is not imported from ggplot2 when the method fortify.DataFrame is exported from S4Vectors

Jenny Drnevich (13:02:53): > @Jenny Drnevich has joined the channel

2023-07-28

Konstantinos Daniilidis (13:47:16): > @Konstantinos Daniilidis has joined the channel

Benjamin Yang (15:59:00): > @Benjamin Yang has joined the channel

2023-08-02

Jamin Liu (14:41:56): > @Jamin Liu has joined the channel

2023-08-03

USLACKBOT (10:05:43): > This message was deleted.

Michael Love (10:41:04) (in thread): > what does filter() do?

Michael Love (10:41:34) (in thread): > like filter cells based on colData?

Michael Love (10:43:27) (in thread): > @stefano mangiola has implemented this in tidySingleCellExperiment > > library(tidySingleCellExperiment) > sce |> filter(Phase == "G1") >

Ellis Patrick (10:43:43) (in thread): > fantastic!!!! Thanks

Michael Love (10:43:43) (in thread): > we have a workshop Friday at 10am: https://tidybiology.github.io/tidyomicsWorkshopBioc2023/articles/tidyGenomicsTranscriptomics.html - Attachment (tidybiology.github.io): Tidy genomic and transcriptomic single-cell analyses > tidyomicsWorkshopBioc2023

Ellis Patrick (10:44:03) (in thread): > This is exactly what I want

Michael Love (10:44:32) (in thread): > Room 307, boguns welcome

Flávia E. Rius (11:21:33): > @Flávia E. Rius has joined the channel

Ritika Giri (15:59:38): > @Ritika Giri has joined the channel

2023-08-04

Trisha Timpug (09:36:20): > @Trisha Timpug has joined the channel

2023-08-11

Hervé Pagès (02:15:16): > @Hervé Pagès has joined the channel

Hervé Pagès (02:24:51) (in thread): > Does anybody know why their default fortify() method (ggplot2:::fortify.default()) doesn’t simply try to call as.data.frame() on the supplied object? Might be worth asking them. It would make ggplot() work on any object that supports as.data.frame(), e.g. DataFrame, matrix, dgCMatrix, DelayedMatrix, SparseMatrix, etc. Why add the additional requirement that these objects must implement a fortify() method when all that is needed is that they support as.data.frame()?
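
The suggestion above can be sketched in base R. This is only an illustration under stated assumptions: `fortify_sketch` is a hypothetical generic invented for this example, not ggplot2's actual `fortify()` or its internals. The default method tries `as.data.frame()` and raises an error only when the coercion fails.

```r
# Hypothetical stand-in for a fortify()-like generic; the name is illustrative only.
fortify_sketch <- function(model, ...) UseMethod("fortify_sketch")

# data.frame input passes through unchanged.
fortify_sketch.data.frame <- function(model, ...) model

# Default method: attempt as.data.frame() before giving up.
fortify_sketch.default <- function(model, ...) {
  out <- tryCatch(as.data.frame(model), error = function(e) NULL)
  if (!is.data.frame(out)) {
    stop("`data` must be a data.frame or coercible by as.data.frame()")
  }
  out
}

# A plain matrix has no dedicated method, so it goes through the default.
m <- matrix(1:6, ncol = 2, dimnames = list(NULL, c("x", "y")))
df <- fortify_sketch(m)
```

With a fallback of this shape, any class that already provides an `as.data.frame()` method would be accepted without writing a separate method for the generic.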

2023-08-12

Kevin Rue-Albrecht (09:18:10) (in thread): > Good question. Don’t know. @Leo Lahti do you want to approach the ggplot2 people and ask?

Leo Lahti (12:23:37) (in thread): > Yes, I can do.

Leo Lahti (12:39:52) (in thread): > I think this also concerns tidyverse functions, for instance dplyr::full_join(colData(se1), colData(se2)) does not work, whereas dplyr::full_join(as.data.frame(colData(se1)), as.data.frame(colData(se2))) seems to work > > -> should something be done with that, too..?

Leo Lahti (13:24:37) (in thread): > Ok I opened this issue in ggplot now:https://github.com/tidyverse/ggplot2/issues/5390 - Attachment: #5390 Support for formats other than data.frame > Problem Whereas ggplot2 supports data.frame, many other data structures are available that could benefit from the ability to use ggplot2 functionality. Examples include e.g. DataFrame, matrix, dgCMatrix, DelayedMatrix, SparseMatrix, etc. Many of these classes support as.data.frame() and can be easily converted into a data.frame. However, the need to do this with every ggplot2 function call becomes rapidly very repetitive. > > Suggested solution The default fortify() method, ggplot2:::fortify.default() could just try to call as.data.frame() on the supplied object. This would directly make ggplot() work on any object that supports as.data.frame() (e.g. DataFrame, matrix, dgCMatrix, DelayedMatrix, SparseMatrix, etc.) > > Bioconductor/S4Vectors#116 > > Let’s load libraries and example data > > > library(S4Vectors) > library(ggplot2) > data(iris) > > > Usual data.frame works as expected: > > > ggplot(iris, aes(x=Sepal.Width, y=Sepal.Length)) + geom_point() > > > DataFrame does not work, and ggplot call throws and error: > > > ggplot(DataFrame(iris), aes(x=Sepal.Width, y=Sepal.Length)) + geom_point() > > > > Error in fortify():
> > ! data must be a <data.frame>, or an object coercible by fortify(),
> > not a object. > > At the moment our default solution has been to always add as.data.frame() around DataFrame objects, like: > > > ggplot(as.data.frame(d), aes(x, y)) + geom_point() > > > There was initial discussion that related to the challenges this adds to teaching standard plotting in ecosystems that rely on classes that are closely related to data.frame but not that. > > Initial thought was to solve this in the S4Vectors class (for DataFrame), see the PR by @kevinrue - then @hpages pointed out the more general solution described above. > > -> Could ggplot add the as.data.frame check to extend the support to other formats than data.frame? If yes, we might be able to provide a PR.

Hervé Pagès (15:01:49) (in thread): > Thanks @Leo Lahti!

2023-08-18

Victor Yuan (12:35:38): > @Victor Yuan has joined the channel

2023-08-20

Jacques SERIZAY (10:38:27): > @Jacques SERIZAY has joined the channel

2023-09-03

Lea Seep (09:52:49): > @Lea Seep has joined the channel

2023-09-13

Christopher Chin (17:05:09): > @Christopher Chin has joined the channel

2023-09-15

Leo Lahti (04:56:30): > @Leo Lahti has joined the channel

2023-09-18

Krithika Bhuvanesh (16:54:25): > @Krithika Bhuvanesh has joined the channel

Krithika Bhuvanesh (16:54:36): > Has anyone used monocle3 in Bioconductor ?

Krithika Bhuvanesh (17:04:38): > I’m trying to install monocle3 on Google Colab for teaching, but somehow it doesn’t work for me. If anyone has had success, please connect with me

2023-09-19

Ramon Massoni-Badosa (06:37:45) (in thread): > could you please share the error message?

2023-09-20

Alik Huseynov (04:51:09): > @Alik Huseynov has joined the channel

Hervé Pagès (12:06:58) (in thread): > This was finally merged: https://github.com/tidyverse/ggplot2/pull/5404 Don’t know how long it will take to make it to CRAN though.

2023-09-21

Leo Lahti (04:23:05) (in thread): > Yes, noted! So cool. We have started the preps to adopt the change in our systems :pray: :heavy_check_mark: :muscle: :fast_parrot:

Leo Lahti (04:23:31) (in thread): > Thanks @Hervé Pagès and all others.

Jacques SERIZAY (09:06:23) (in thread): > Awesome to know that we won’t have to run as.data.frame every time we want to plot variables from a DataFrame, thanks to the people who managed to add that to ggplot2 :hugging_face: :tada:! > Following up on this issue, dplyr and tidyr verbs still do not work on DataFrames. > > > library(S4Vectors) > Loading required package: stats4 > Loading required package: BiocGenerics > > Attaching package: 'BiocGenerics' > > The following objects are masked from 'package:stats': > > IQR, mad, sd, var, xtabs > > The following objects are masked from 'package:base': > > anyDuplicated, aperm, append, as.data.frame, basename, cbind, > colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find, > get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply, > match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, > Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort, > table, tapply, union, unique, unsplit, which.max, which.min > > Attaching package: 'S4Vectors' > > The following object is masked from 'package:utils': > > findMatches > > The following objects are masked from 'package:base': > > expand.grid, I, unname > > > library(ggplot2) > > library(dplyr) > > Attaching package: 'dplyr' > > The following objects are masked from 'package:S4Vectors': > > first, intersect, rename, setdiff, setequal, union > > The following objects are masked from 'package:BiocGenerics': > > combine, intersect, setdiff, union > > The following objects are masked from 'package:stats': > > filter, lag > > The following objects are masked from 'package:base': > > intersect, setdiff, setequal, union > > > df <- DataFrame(x = c(1,2,3), y = c(1, 4, 9)) > > ggplot(df, aes(x, y)) + geom_line() > > mutate(df, z = x * 2) > Error in UseMethod("mutate") : > no applicable method for 'mutate' applied to an object of class "c('DFrame', 'DataFrame', 'SimpleList', 'RectangularData', 'List', 'DataFrame_OR_NULL', 'Vector', 'list_OR_List', 'Annotated', 
'vector_OR_Vector')" > > This is a limitation to “tidy” genomic workflows (e.g. plyranges-based), as summarize or select(..., .drop_ranges = TRUE) return a DataFrame. So one cannot currently pipe dplyr/tidyr verbs to a previous plyranges::summarize(). This can easily be solved by inserting an as_tibble() (or as.data.frame()) call in the pipe, but it’s not very “tidy”. I have heard of DFplyr, which implements dplyr methods for DataFrame objects themselves, but I have never used it myself; it has not been integrated into Bioc (AFAIK) and it seems unmaintained. Are there other officially supported alternatives? Otherwise, is there any on-going work to support this? And if not, should it be added to the tidyomics challenge? > > > df <- GRanges("I:1-10") |> group_by(strand) |> summarize(width) > > df > DataFrame with 1 row and 2 columns > strand width > <Rle> <IntegerList> > 1 * 10 > > df |> mutate(id = 1) > Error in UseMethod("mutate") : > no applicable method for 'mutate' applied to an object of class "c('DFrame', 'DataFrame', 'SimpleList', 'RectangularData', 'List', 'DataFrame_OR_NULL', 'Vector', 'list_OR_List', 'Annotated', 'vector_OR_Vector')" > > df |> as_tibble() |> mutate(id = 1) > # A tibble: 1 × 3 > strand width id > <fct> <list> <dbl> > 1 * <int [1]> 1 >

Leo Lahti (09:24:43) (in thread): > I agree we would have this need, too.

2023-09-25

Kerim Secener (09:12:44): > @Kerim Secener has joined the channel

2023-11-02

Sunil Poudel (10:52:04): > @Sunil Poudel has joined the channel

Sunil Poudel (10:59:58): > Our next seminar on Mon, Nov 06, 3-4 AM ET, will feature Ricard Argelaguet, who will discuss principles and challenges in single-cell data integration. Join us under the zoom link provided in the flyer below! - Attachment (Nature): Computational principles and challenges in single-cell data integration > Nature Biotechnology - As the number of single-cell experiments with multiple data modalities increases, Argelaguet and colleagues review the concepts and challenges of data integration. - File (PDF): CCB_SeminarFlyer_Ricard.pdf

2023-12-01

Tram Nguyen (10:16:32): > @Tram Nguyen has joined the channel

2023-12-15

Zepeng QU (15:04:33): > @Zepeng QU has joined the channel

2024-01-10

Bernie Mulvey (15:04:33): > @Bernie Mulvey has joined the channel

2024-01-11

Nilesh Kumar (12:01:21): > @Nilesh Kumar has joined the channel

2024-01-22

Assa (11:45:49): > @Assa has left the channel

2024-02-15

Leo Lahti (01:02:09): > Anyone considering coming to ECCB (https://eccb2024.fi)? I am thinking of putting together a workshop/tutorial on Orchestrating Multi-Assay Analyses with Bioconductor. That would ideally be a joint session by developers from different fields (single cell, microbiome, transcriptome, metabolome, proteome) + using SummarizedExperiment / MultiAssayExperiment family of tools.

Wes W (08:54:33) (in thread): > If there was a travel grant available I’d come workshop with you Leo

2024-02-16

Leo Lahti (06:01:17) (in thread): > Thanks - I submitted a microbiome focused multiomic application. Let’s see how that will go and then plan more.

Tim Triche (13:58:50) (in thread): > this sounds rad. when does it happen?

Tim Triche (13:59:01) (in thread): > I could at least send one of my postdocs (I think)

Tim Triche (13:59:24) (in thread): > ooh september so infinitely long days!!! :tada:

Wes W (13:59:51) (in thread): > or just make me an acting post-doc and send me haha > > it worked for Wesley Crusher

Tim Triche (14:00:15) (in thread): > are you eligible for T32 funding

2024-02-19

Leo Lahti (05:30:20) (in thread): > ECCB is in Sep 16-20 in Turku, Finland

Leo Lahti (05:30:34) (in thread): > But we don’t know yet if the suggestion goes through

2024-03-02

Aaron Lun (17:35:32): > Hm. I was sure we had a SingleR channel somewhere, but can’t remember it anymore. Anyway, @Jared Andrews @Dan Bunis @Friederike Dündar: celldex is moving from EHub to the new gypsum framework that should make it a lot easier to contribute new datasets. I’ve already transferred all of the existing datasets - hopefully correctly, we’ll see once celldex 1.13.2 hits the shelves. If you want to add your own reference datasets, you already have owner permissions: > > > gypsum::fetchPermissions("celldex") > $owners > $owners[[1]] > [1] "LTLA" > > $owners[[2]] > [1] "j-andrews7" > > $owners[[3]] > [1] "dtm2451" > > $owners[[4]] > [1] "friedue" > > > $uploaders > list() > > So you can go nuts and follow the instructions at https://github.com/LTLA/celldex/blob/master/vignettes/userguide.Rmd#L207. (You don’t have to wait for the upload permissions in 5, because you already have permission as a celldex owner; and I just added everyone who wasn’t a collaborator to the repo.) Similarly, if you know someone who wants to upload a reference dataset, you can get them to follow those instructions and then you follow the instructions at https://github.com/LTLA/celldex?tab=readme-ov-file#maintainer-notes to review/approve their upload. > > P.S. If you do upload something, you’ll be the first non-self uploaders, so there might be some turbulence. > > P.P.S. I think the Cloudflare storage is currently being billed to me personally, which I don’t mind but just don’t go too crazy with what you upload. There’s a per-package quota anyway so you won’t be able to make me homeless but nonetheless. > > P.P.P.S. I’m on vacation starting today, so responses might be a bit slower than usual.

Jared Andrews (17:50:21): > A vacation? How scandalous, Aaron.

Jared Andrews (17:50:57): > Obviously, I kid - enjoy your break. Looks cool though, I may have some datasets to add at some point.

Lori Shepherd (20:24:37): > Should I add an rdatadateremoved to the celldex objects so they are no longer visible in ExperimentHub?

Aaron Lun (20:43:30): > not right now I think, see my response in #osca-book

2024-03-04

Dan Bunis (09:33:34): > Sounds awesome! Thanks, Aaron. Enjoy your vacation!

2024-03-05

Pratibha Panwar (01:32:40): > @Pratibha Panwar has joined the channel

Vince Carey (14:29:32): > Intrepid souls who are working with the devel branch may wish to explore the newest version of scRNAseq. One untoward event that needs explanation: > > > z = PaulHSCData(legacy=FALSE) > The value -2^31 was detected in the dataset. > This has been converted to NA within R. > The value -2^31 was detected in the dataset. > This has been converted to NA within R. > > We know the message comes from rhdf5, but what is it about the data that is triggering this?

Aaron Lun (16:54:36): > nothing to worry about, I just forgot to contact @Mike Smith to get an option in rhdf5 to silence the message

Aaron Lun (16:54:49): > see my response inhttps://github.com/LTLA/scRNAseq/issues/44 - Attachment: #44 extreme value detection event with new PaulHSCData, not seen in legacy call > > > z = PaulHSCData(legacy=FALSE) > The value -2^31 was detected in the dataset. > This has been converted to NA within R. > The value -2^31 was detected in the dataset. > This has been converted to NA within R. > > > seen on windows and linux > > > > sessionInfo() > R Under development (unstable) (2024-01-10 r85797) > Platform: x86_64-pc-linux-gnu > Running under: Ubuntu 22.04.4 LTS > > Matrix products: default > BLAS: /home/vincent/R-4-4-dist/lib/R/lib/libRblas.so > LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0 > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > time zone: America/New_York > tzcode source: system (glibc) > > attached base packages: > [1] stats4 stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] scRNAseq_2.17.1 SingleCellExperiment_1.25.0 > [3] SummarizedExperiment_1.33.3 Biobase_2.63.0 > [5] GenomicRanges_1.55.3 GenomeInfoDb_1.39.6 > [7] IRanges_2.37.1 S4Vectors_0.41.3 > [9] BiocGenerics_0.49.1 MatrixGenerics_1.15.0 > [11] matrixStats_1.2.0 rmarkdown_2.25 > > loaded via a namespace (and not attached): > [1] DBI_1.2.2 bitops_1.0-7 httr2_1.0.0 > [4] biomaRt_2.59.1 rlang_1.1.3 magrittr_2.0.3 > [7] gypsum_0.99.9 compiler_4.4.0 RSQLite_2.3.5 > [10] GenomicFeatures_1.55.3 png_0.1-8 vctrs_0.6.5 > [13] stringr_1.5.1 ProtGenerics_1.35.2 pkgconfig_2.0.3 > [16] crayon_1.5.2 fastmap_1.1.1 dbplyr_2.4.0 > [19] XVector_0.43.1 utf8_1.2.4 Rsamtools_2.19.3 > [22] bit_4.0.5 xfun_0.42 aws.s3_0.3.21 > [25] zlibbioc_1.49.0 cachem_1.0.8 jsonlite_1.8.8 > [28] progress_1.2.3 blob_1.2.4 rhdf5filters_1.15.2 > 
[31] DelayedArray_0.29.7 Rhdf5lib_1.25.1 BiocParallel_1.37.0 > [34] parallel_4.4.0 prettyunits_1.2.0 R6_2.5.1 > [37] stringi_1.8.3 rtracklayer_1.63.0 Rcpp_1.0.12 > [40] knitr_1.45 base64enc_0.1-3 Matrix_1.6-4 > [43] tidyselect_1.2.0 abind_1.4-5 yaml_2.3.8 > [46] codetools_0.2-19 curl_5.2.0 lattice_0.22-5 > [49] alabaster.sce_1.3.3 tibble_3.2.1 KEGGREST_1.43.0 > [52] evaluate_0.23 BiocFileCache_2.11.1 alabaster.schemas_1.3.1 > [55] xml2_1.3.6 ExperimentHub_2.11.1 Biostrings_2.71.2 > [58] pillar_1.9.0 BiocManager_1.30.22 filelock_1.0.3 > [61] generics_0.1.3 startup_0.21.0 RCurl_1.98-1.14 > [64] BiocVersion_3.19.1 ensembldb_2.27.1 hms_1.1.3 > [67] alabaster.base_1.3.20 alabaster.ranges_1.3.3 glue_1.7.0 > [70] alabaster.matrix_1.3.12 lazyeval_0.2.2 tools_4.4.0 > [73] AnnotationHub_3.11.1 BiocIO_1.13.0 GenomicAlignments_1.39.4 > [76] XML_3.99-0.16.1 rhdf5_2.47.4 grid_4.4.0 > [79] AnnotationDbi_1.65.2 GenomeInfoDbData_1.2.11 HDF5Array_1.31.5 > [82] restfulr_0.0.15 cli_3.6.2 rappdirs_0.3.3 > [85] fansi_1.0.6 S4Arrays_1.3.4 dplyr_1.1.4 > [88] AnnotationFilter_1.27.0 alabaster.se_1.3.4 digest_0.6.34 > [91] SparseArray_1.3.4 rjson_0.2.21 memoise_2.0.1 > [94] htmltools_0.5.7 lifecycle_1.0.4 httr_1.4.7 > [97] aws.signature_0.6.0 bit64_4.0.5 > > >

2024-03-07

Hervé Pagès (15:08:21): > Seems like PaulHSCData(legacy=FALSE) and PaulHSCData(legacy=TRUE) have some significant differences (e.g. their number of rows is different). Is this expected?

Aaron Lun (18:44:01): > hm.

Aaron Lun (18:44:02): > oh.

Aaron Lun (18:46:51): > just some getter-specific logic that I accidentally restricted to legacy=TRUE.

Aaron Lun (18:47:24): > should be fixed in 2.17.5.

Hervé Pagès (20:16:14): > Thanks. Also the colData() of the legacy dataset has a column (Well_ID) not present in the gypsum-backed dataset. Maybe some unit tests could help make sure that the gypsum-backed stuff is a drop-in replacement for the EH-backed stuff.

2024-03-26

Aaron Lun (20:42:21): > thoughts on deprecating asSparse= in the *Pairs functions of the SCE: https://github.com/drisso/SingleCellExperiment/issues/73#issuecomment-2019477191; comments welcome. - Attachment: Comment on #73 colPairs(x, asSparse = TRUE) fails when there are no metadata columns > I’ll take the PR since it’s already been put together, but longer term, I am wondering whether it would be better to (i) deprecate the asSparse= and (ii) export the .hits2mat function (and its counterpart for converting a matrix to a Hits object). This would simplify the interface; the SCE only cares about Hits going in and out, and exactly how this is converted to/from a sparse matrix is up to the user/application outside of the SCE. > > I don’t ever use the colPairs and rowPairs functionality so I have no relevant experience/opinion on this matter, but happy to take thoughts from people who do use it.

2024-03-27

abhich (05:45:05): > @abhich has joined the channel

2024-04-18

Weston Elison (15:53:39): > @Weston Elison has joined the channel

2024-04-25

Mercedes Guerrero (05:02:15): > @Mercedes Guerrero has joined the channel

2024-04-28

Danielle Callan (08:39:09): > @Danielle Callan has joined the channel

2024-04-29

Amarinder Singh Thind (08:29:13): > @Amarinder Singh Thind has joined the channel

Samuel Gunz (09:35:26): > @Samuel Gunz has joined the channel

2024-05-06

Michal Kolář (11:56:25): > @Michal Kolář has joined the channel

2024-05-07

Carlo Pecoraro (12:11:06): > @Carlo Pecoraro has joined the channel

2024-05-09

Philippe Laffont (07:40:00): > @Philippe Laffont has joined the channel

2024-05-26

Michael Milton (22:04:09): > Are there any known concerns with using zellkonverter’s native R anndata reader? What is meant by “slightly different output”?

2024-05-30

Luke Zappia (03:05:57) (in thread): > The process is different, which can result in some inconsistencies (mostly for less common objects). The normal Python reader goes H5AD -(anndata)-> Python -(reticulate)-> R while the R reader goes H5AD -(rhdf5)-> R. At each conversion step decisions have to be made about what formats to use, and sometimes they can be different (often due to limitations of different packages). The R reader should work and put things in the same places but it’s not thoroughly tested. There is an effort to develop a much more robust native R interface but it’s a bit stalled at the moment.

Michael Milton (03:50:15) (in thread): > Thanks!

2024-06-13

Aedin Culhane (08:32:02): > set the channel topic: SingleCellExperiment Class in Bioconductor

2024-06-19

Aedin Culhane (12:49:51): > Anyone interested in working with BD data?

Maria Doyle (13:27:25): > @Maria Doyle has joined the channel

Tim Triche (20:54:21) (in thread): > Sort of. Have some AbSeq data lying around

2024-06-30

Nicolas Peterson (13:09:06): > @Nicolas Peterson has joined the channel

2024-07-05

Antonin Thiébaut (04:37:33): > @Antonin Thiébaut has joined the channel

2024-07-08

Sean Davis (09:24:29): - Attachment: Attachment > https://github.com/seandavi/awesome-single-cell still looking for updates. There is no Bioconductor section–would love to see that contribution!

Pedro Sanchez (09:58:45) (in thread): > Hi Sean! I have sort of an internal database about single-cell tools, so I may merge it with your repository! I’ll try to do it during my spare time during this summer

Pedro Sanchez (09:59:53) (in thread): > However, I’d be interested in knowing what the overlap is with similar efforts such as https://www.scrna-tools.org/. What’s your take on this? @Luke Zappia - Attachment (scrna-tools.org): scRNA-tools > A catalogue of single-cell RNA-sequencing analysis tools

Luke Zappia (11:04:54) (in thread): > There is some overlap (I tried to add tools from Sean’s page) but I would say they are complementary. scRNA-tools both has more information on tools and is more restricted (only includes tools, not other single-cell stuff). FYI (if you didn’t know already), scRNA-tools is now in “low-maintenance mode” which means I am no longer actively adding tools but will still make updates based on community contributions (similar to what Sean does).

Sean Davis (11:06:16) (in thread): > Definitely complementary resources. Luke’s is definitely more structured when it comes to tools. He has done an amazing job of organizing a “wild west” for the community!

Pedro Sanchez (11:09:54) (in thread): > Cool! Thank you for your answers. I’ll try to contribute during this month and the next one

2024-07-11

Hothri Moka (07:21:06): > @Hothri Moka has joined the channel

2024-07-30

Jorge Kageyama (17:48:03): > @Jorge Kageyama has joined the channel

2024-07-31

Zahraa W Alsafwani (17:21:18): > @Zahraa W Alsafwani has joined the channel

2024-08-08

Zhu Yujia (01:36:38): > @Zhu Yujia has joined the channel

2024-08-11

Lu Yang (13:29:59): > @Lu Yang has joined the channel

2024-08-19

Rema Gesaka (09:37:30): > @Rema Gesaka has joined the channel

2024-08-20

Ahmad Al Ajami (10:48:08): > I want to merge two sce objects with the same colData but slightly different sets of features. The merged object should include all features (so the union, not the intersection) from both sce objects. For features that are present in only one of the original objects, I want to fill the missing entries with zeros. > Is there an easy and straightforward way to do this without having to create a new sce object?

Alik Huseynov (11:39:17) (in thread): > Basically full join replacing NAs with zeros? > Probably these would help: SummarizedExperiment::combineRows > SummarizedExperiment::combineCols
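
The "full join with zero fill" idea can be illustrated on plain matrices in base R. This is a minimal sketch under assumptions, not the SummarizedExperiment implementation: `m1` and `m2` are toy stand-ins for the count assays of two objects with different cells, and `pad_rows` is a hypothetical helper written for this example.

```r
# Toy count matrices standing in for the assays of two objects; all names are made up.
m1 <- matrix(1:4, nrow = 2,
             dimnames = list(c("geneA", "geneB"), c("cell1", "cell2")))
m2 <- matrix(5:8, nrow = 2,
             dimnames = list(c("geneB", "geneC"), c("cell3", "cell4")))

# Hypothetical helper: expand a matrix to a common feature set,
# filling features it does not contain with zeros.
pad_rows <- function(m, features) {
  out <- matrix(0, nrow = length(features), ncol = ncol(m),
                dimnames = list(features, colnames(m)))
  out[rownames(m), ] <- m
  out
}

# Union of features, then column-bind the padded matrices.
all_features <- union(rownames(m1), rownames(m2))
merged <- cbind(pad_rows(m1, all_features), pad_rows(m2, all_features))
```

The padded matrix could then be used to build a new object. Whether zero is a sensible fill value depends on whether an absent feature really means "not detected" in that dataset.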

brian capaldo (12:54:55): > Trying to run scoreMarkers on a fairly large sce object (~130,000 cells), but getting an error I can’t quite figure out > > > # score markers > > label_markers <- scoreMarkers(sce_integrated, > group = sce_integrated$label, > block = sce_integrated$batch) > Error in FUN(x, table, nomatch = nomatch, incomparables = incomparables) : > 'match' requires vector arguments > > findMarkers works for some reason

Aaron Lun (13:23:39) (in thread): > nothing comes to mind, might need to see the traceback.

brian capaldo (13:24:45) (in thread): > will do that as soon as findMarkers finishes

brian capaldo (13:39:50) (in thread): > > > # score markers > > label_markers <- scoreMarkers(sce_integrated, > group = sce_integrated$label, > block = sce_integrated$batch) > Error in FUN(x, table, nomatch = nomatch, incomparables = incomparables) : > 'match' requires vector arguments > > traceback() > 13: FUN(x, table, nomatch = nomatch, incomparables = incomparables) > 12: match(desired.comparisons, averaged.comparisons) > 11: match(desired.comparisons, averaged.comparisons) > 10: .cross_reference_to_desired(pre.ave$averaged.comparisons, desired.comparisons, > collapse.symmetric = collapse.symmetric) > 9: .scoreMarkers(assay(x, assay.type), groups, ...) > 8: .local(x, ...) > 7: .nextMethod(x, groups = groups, ...) > 6: eval(call, callEnv) > 5: eval(call, callEnv) > 4: callNextMethod(x, groups = groups, ...) > 3: .local(x, ...) > 2: scoreMarkers(sce_integrated, group = sce_integrated$label, block = sce_integrated$batch) > 1: scoreMarkers(sce_integrated, group = sce_integrated$label, block = sce_integrated$batch) >

Aaron Lun (13:40:32) (in thread): > hm.

Aaron Lun (13:40:44) (in thread): > that doesn’t ring any bells at all.

Aaron Lun (13:40:59) (in thread): > if I had to guess, looks like S4Vectors::match() isn’t being imported properly.

brian capaldo (13:44:05) (in thread): > at the moment, findMarkers is sufficient. I’ll try this on some other sce objects I have floating around, and see if I can put together a reprex. This environment is a mess as I am swapping between seurat/singleCellExperiment/cell_data_set, so that could also be contributing

2024-08-21

Laura Symul (08:58:04): > @Laura Symul has joined the channel

2024-08-26

Krithika Bhuvanesh (22:50:04): > @Krithika Bhuvanesh has left the channel

2024-09-10

Alex Qin (03:46:35): > @Alex Qin has joined the channel

2024-09-20

Camille Guillermin (09:30:17): > @Camille Guillermin has joined the channel

2024-10-18

Felipe ten Caten (11:11:37): > @Felipe ten Caten has joined the channel

2024-11-27

brian capaldo (13:11:39) (in thread): > finally figured this out, it was seurat masking match

2024-12-04

brian capaldo (12:07:13) (in thread): > welp, misspoke, it’s actually monocle3
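
For anyone debugging a similar masking problem, a base R sketch of how to check which definition of a function name wins. It is shown here in a clean session, where only base defines `match`; with other packages attached, `find()` would list them in search-path order.

```r
# Which attached environments define `match`, in search-path order?
# The first entry is the one a bare `match(...)` call resolves to.
find("match")

# Name of the environment that owns the function currently bound to `match`.
environmentName(environment(match))

# Calling through an explicit namespace sidesteps whatever is masking the name.
idx <- base::match(c("b", "z"), c("a", "b"))
```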

2025-01-09

Ammar Sabir Cheema (11:40:51): > @Ammar Sabir Cheema has joined the channel

2025-01-27

Lambda Moses (23:58:42): > I wonder if those who designed SCE have thought about this. Dimension reduction methods such as PCA, NMF, and many others return both the cell projections and gene loadings, and the gene loadings are often really interesting and I do make a big deal out of them. Right now it’s not very straightforward to access the gene loadings, which are stored in the attributes of the reducedDim entry, and the structure of the attributes on gene loading and things like variance explained is not standardized. So what if there’s a field in int_elementMetadata analogous to reducedDims but for gene loadings, so some entries in reducedDims have corresponding entries in that field? Or should there be an S3 or S4 class to standardize the structure of the attributes for gene loading and variance explained and to give nice getters and setters?
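
To make the "loadings hidden in attributes" point concrete, here is a base R sketch on made-up data. The attribute names `rotation` and `percentVar` are assumptions chosen for illustration; the fact that they must be known in advance is exactly the lack of standardization described above.

```r
set.seed(1)
# Toy expression matrix: 5 genes x 20 cells (made-up data).
expr <- matrix(rnorm(100), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("cell", 1:20)))

# prcomp() expects observations (cells) in rows, so transpose.
pca <- prcomp(t(expr), rank. = 2)

# Cell projections, with gene loadings and percent variance attached as attributes.
pcs <- pca$x
attr(pcs, "rotation") <- pca$rotation
attr(pcs, "percentVar") <- (100 * pca$sdev^2 / sum(pca$sdev^2))[1:2]

# Getting the loadings back requires knowing the attribute name.
loadings <- attr(pcs, "rotation")
```

A matrix like `pcs` can be used as a reduced-dimension entry, while the loadings travel along only as an unstructured attribute.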

2025-01-28

Stephanie Hicks (05:43:29): > hi @Lambda Moses! Thanks for the suggestion. Similar to you, I’m also a fan of leveraging the gene loadings. However, one of the challenges becomes how much to add vs not add when designing a class. For example, there are some that don’t even like SingleCellExperiment for single cell and would prefer to have just a SummarizedExperiment (basically to have it as minimal / lightweight as possible). And then there are others who would love to see a lot more added to SingleCellExperiment. It’s always a tension. :confused:

Kasper D. Hansen (08:14:44): > I think the design issue here is that currently SCE supports some kind of dimension reduction for the cells, but is agnostic to the specifics. “Explained variation” is specific to PCA, and not all dimension reduction methods have gene loadings.

Kasper D. Hansen (08:20:44): > With the current design, every cell has a small number of “points” which is the reduced dimension. You could for example say there is a reduced dimension slot, and that slot either contains (say) the result from a PCA or a UMAP or a NMF. That would give you total flexibility in storage. But in that case, you have some downstream impacts. Because anytime you would want to write something taking a SCE as input, you would potentially need to special case what dimension reduction tool you used. Now, you can just grab a row of the matrix and you have what the cells have been mapped to.

Vince Carey (12:49:14): > There is the #biocclasses channel for strategic discussion. Use cases are welcome … experimental class/method extension code would also be welcome. Dimension reduction processes and outcomes are not simple and users/devs should be assisted in computing/retrieving relevant details. But how to accomplish this without being overly prescriptive or rigid is the big question.

Lambda Moses (13:38:44): > I think it makes sense to standardize the gene loading part a bit while making it optional for dimension reductions that don’t have gene loadings. PCA is not the only one with loadings and standardization of the field serves many different dimension reduction methods. I already mentioned NMF. There’re also variants of PCA and NMF like those that enforce sparsity. MULTISPATI PCA also has loadings and eigenvalues (Moran’s I times variance explained). NSF (a form of spatial NMF) has loadings and we can get something like variance explained from postprocessing. But I would favor some flexibility, like no specific slots for PCA, UMAP, or NMF. That way we can store results, say, from NMF with different numbers of factors under different names. Meanwhile I’ve never used colPairs in SCE.

Peter Hickey (16:30:18): > FWIW I had a bit of a discussion with @jackgisby around this when reviewing their package ReducedExperiment: > * https://github.com/Bioconductor/Contributions/issues/3592#issuecomment-2544496329 > * https://github.com/Bioconductor/Contributions/issues/3592#issuecomment-2575271067 > They might also be interested in contributing to this discussion - Attachment: Comment on #3592 ReducedExperiment

Vince Carey (18:18:56): > glad u brought this up pete – very worthwhile reading

2025-02-03

Tim Triche (14:58:06) (in thread): > there is a class called LinearEmbeddingMatrix for storing this stuff, and we’ve written accessors for it as a side effect of visualization code for dimreds in Singlet. I am curious whether ReducedExperiment can be modified to take advantage of its implementation for similar purposes; it appears that the author has considered module stability and enrichment as criteria for factor enrichment analysis.

Lambda Moses (15:00:28) (in thread): > Good to know. Somehow I’ve never heard of it before.

2025-02-04

Lambda Moses (02:07:54) (in thread): > Then I wonder why scater’s runPCA and runNMF don’t use LinearEmbeddingMatrix while it’s clearly applicable

Tim Triche (05:11:36) (in thread): > Probably because the current accessors are kind of wonky. I implemented wrappers in singlet but the UX leaves much to be desired

2025-03-13

Tim Triche (08:17:30) (in thread): > Now I’m starting to wonder whether the ReducedDim* hierarchy couldn’t stand an overhaul. I can’t plot factors in iSEE the way I do features and that kind of sucks.

Tim Triche (08:18:27) (in thread): > i.e. “which factor/channel discriminates tissue vs. translocation vs. ageGroup” is a chore and usually involves squashing the SummarizedExperiment into a smaller one with metadata() holding the original-dimensional features, which is a bit absurd

2025-04-25

Marisa Loach (04:57:39): > @Marisa Loach has joined the channel