#osca-book

2019-08-15

Stephanie Hicks (14:12:41): > @Stephanie Hicks has joined the channel

Stephanie Hicks (14:12:41): > set the channel description: To discuss questions, problems, and contributions to the Orchestrating Single Cell Analysis (https://osca.bioconductor.org) bookdown book

Mike Smith (14:12:41): > @Mike Smith has joined the channel

Rob Amezquita (14:12:41): > @Rob Amezquita has joined the channel

Aaron Lun (14:12:41): > @Aaron Lun has joined the channel

Federico Marini (14:12:41): > @Federico Marini has joined the channel

Kevin Rue-Albrecht (14:12:41): > @Kevin Rue-Albrecht has joined the channel

Charlotte Soneson (14:12:41): > @Charlotte Soneson has joined the channel

Stephanie Hicks (14:15:56): > @Tim Triche

Tim Triche (14:16:29): > @Tim Triche has joined the channel

Sean Davis (14:19:09): > @Sean Davis has joined the channel

Laurent Gatto (14:19:12): > @Laurent Gatto has joined the channel

Qian Liu (14:30:05): > @Qian Liu has joined the channel

Tim Triche (14:35:42): > :oscar:

Federico Marini (14:47:00): > Hi :slightly_smiling_face:

Charlotte Soneson (14:47:39): > :wave:

Aaron Lun (16:24:10): > Right. Volunteers for things.

Aaron Lun (16:24:16): > What did I say needed to be done?

Aaron Lun (16:24:33): > Big ticket items are trajectory and cell annotation.

Aaron Lun (16:24:46): > The latter really refers to signatures but we could put in a bit about SingleR as well.

Tim Triche (16:26:15): > tomorrow I can find out if Kin Lau (in our group) is up for singleR/cell-identity

Tim Triche (16:26:27): > it would be a very good fit if so.

Stephanie Hicks (16:33:14): > For trajectory, are we thinking of using slingshot? I’m happy to help there

Stephanie Hicks (16:34:02): > Also, I can help contribute a single nucleus qc since I brought it up. But it will likely be a month or two before I get to it.

Aaron Lun (16:52:18): > You can use whatever you like for trajectory.

Boris Hejblum (18:51:54): > @Boris Hejblum has joined the channel

Aaron Lun (23:53:41): > Right. Onto the rest of the data integration chapter.

2019-08-16

Aaron Lun (00:22:18): > Oh @Rob Amezquita! Where art thou?!

Aaron Lun (00:22:42): > Well, at least you’re still alive. I can see your Github commits.

Aaron Lun (00:23:56): > Argh. Can’t get any paywalled articles at home.

Kevin Rue-Albrecht (04:21:02): > Even with this? https://sci-hub.tw - Attachment (sci-hub.tw): Sci-Hub: removing barriers to the spread of knowledge > The world’s first pirate resource to open mass access to tens of millions of scientific articles

Tiago C. Silva (12:54:18): > @Tiago C. Silva has joined the channel

Aedin Culhane (16:37:01): > @Aedin Culhane has joined the channel

2019-08-17

Kevin Wang (00:17:03): > @Kevin Wang has joined the channel

Aaron Lun (04:34:43): > Alright, time to finish the fight.

Aaron Lun (04:35:20): > data integration while listening to the con air theme song.

2019-08-18

Aaron Lun (13:49:30): > Maybe @Rob Amezquita turned off notifications?

Aaron Lun (13:49:49): > Anyone in Seattle? Can someone physically nudge him to pay attention to the slack?

Rob Amezquita (15:13:13): > Hey! Went backpacking and went dark. Will be back at it tomorrow

2019-08-19

Mikhael Manurung (16:12:14): > @Mikhael Manurung has joined the channel

2019-08-20

Aaron Lun (17:35:39): > @Kevin Rue-Albrecht is the chapter done?

Aaron Lun (17:35:59): > Get rid of the redundant t-sne citation and turn the “deployed examples” into a sentence.

Kevin Rue-Albrecht (17:39:47): > I haven’t touched anything for a while. I can look at it tomorrow night. Gotta sleep now, I need to look awake at the lab day tomorrow.

Aaron Lun (17:40:25): > Need to get your conference goggles

Aaron Lun (17:41:10): - File (PNG): image.png

Aaron Lun (17:41:36): > I used to wear something like this during my lab meetings.

Kevin Rue-Albrecht (17:44:52): > yeah… I don’t think our lab meetings work the same way then xD

Kevin Rue-Albrecht (17:47:57): > Alright, so tomorrow night: chapter polish and wrapping up the #isee update for logNormCounts(exprs_values="tophat_counts"). After that I’m offline at my brother’s for the weekend

2019-08-21

Aaron Lun (17:44:13): > Scran is all yellow. Gogogogogogo.

Kevin Rue-Albrecht (17:48:31): > Just wrapping up the iSEE "logNormCounts" fixes. > Let me check this now > > Get rid of the redundant t-sne citation and turn the “deployed examples” into a sentence.

Kevin Rue-Albrecht (17:52:23): > OK.. I don’t get it: what “redundant t-sne citation” ? I don’t see duplicates in ref.bib

Aaron Lun (17:53:42): > There should already be the same citation somewhere in that bib file. It must be there, because I copied the bib file from simpleSingleCell.

Kevin Rue-Albrecht (17:54:20): > Well, I searched for “tsne” case insensitive. oh hang on maybe with hyphen then

Aaron Lun (17:54:36): > van2008visualizing?

Aaron Lun (17:54:42): > Can’t remember what it was.

Kevin Rue-Albrecht (17:55:33): > gotcha “van2008visualizing”

Kevin Rue-Albrecht (17:58:23): > Preview: > > For demonstration and inspiration, we refer readers to the following examples of deployed applications:

Kevin Rue-Albrecht (18:15:36): > Dammit, we mentioned > > The shorter panel codes are useful for the configuration of tours, described in the section Examples of usages for iSEE apps. > We do have that section, but it doesn’t include a tour yet. Charlotte wrote a 3-step one for the workshop. I’ll throw that in now.

Kevin Rue-Albrecht (18:45:04): > Chapter should be good to go now.

Kevin Rue-Albrecht (18:45:36): > Happy to take suggestions for future versions, but that won’t happen this week

2019-08-24

Aaron Lun (17:37:06): > Anyone?

Aaron Lun (17:37:08): > Talk to me.

Aaron Lun (17:45:26): > @Rob Amezquita? What’s the progress on our timelines?

Aaron Lun (17:45:45): > Who else made promises about this book?

2019-08-25

Aaron Lun (17:25:42): > Well, insofar as I’m the only one writing anything, I’m going to play the game of “how many of my own citations can I get into each chapter?”

Aaron Lun (17:25:54): > Turns out it’s a lot.

Aaron Lun (17:26:31): > If you ever wondered what the connection was between single-cell RNA-seq and Hi-C… well, it’s there.

Aaron Lun (19:29:17): > Writing this book is so boring. I’ve been going at it for hours now.

Stephanie Hicks (21:51:43): > @Aaron Lun progress was made in the trajectory Rmd https://github.com/Bioconductor/OSCABase/pull/4

Aaron Lun (22:01:23): > hooray

Aaron Lun (22:01:31): > will look at it after dinner

Aaron Lun (23:20:55): > alright, it’s time.

2019-08-26

Aaron Lun (01:37:14): > Comments added.

Stephanie Hicks (09:32:33): > thanks @Aaron Lun! I’ll tackle these edits

2019-08-28

Davide Risso (05:57:28): > @Davide Risso has joined the channel

2019-08-30

Aaron Lun (01:35:57): > Writing the annotation section.

Aaron Lun (01:36:07): > Geez. GO is useless.

Aaron Lun (01:36:15): > “metabolic process”. Wow, thanks.

Kevin Rue-Albrecht (04:36:46): > true, personally I find this one more informative > > GO:0016706 | oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, 2-oxoglutarate as one donor, and incorporation of one atom each of oxygen into both donors > > > xx <- select(GO.db, keys(GO.db), columns(GO.db), "GOID") > paste0(xx[which.max(nchar(xx$TERM)), c("GOID", "TERM")], collapse = " | ") >

Aaron Lun (19:51:29): > Alright @channel, got some long weekend tasks to do.

Aaron Lun (19:52:24): > I guess no one’s here, then.

Rob Amezquita (19:54:34): > I am!

2019-08-31

Stephanie Hicks (02:40:00): > @Aaron Lun, I’m out of the office all week / weekend, but back on Tuesday and will pick up with trajectory then! :slightly_smiling_face:

Aaron Lun (02:45:13): > Cell annotation on the march.

2019-09-01

Aaron Lun (13:00:00): > @Rob Amezquita you’re going to need to install scran from GitHub, it’s yet to build.

Aaron Lun (14:42:54): > This quick start section is anything but. I’m tossing it out.

2019-09-02

Rob Amezquita (11:58:37) (in thread): > But my code for fgsea is so useful :sob:

Aaron Lun (11:59:25) (in thread): > Move it to the cell annotation section where it belongs.

Aaron Lun (19:48:31): > oh god it’s almost over

Aaron Lun (19:48:39): > been booking for three days.

Aaron Lun (20:04:33): > Right, I’ve had enough.

Aaron Lun (20:05:05): > no more booking for today

Aaron Lun (20:07:26): > @Rob Amezquita You need to make sure those workflows are showing up in a separate Part, not in the same Part as the topics.

Aaron Lun (20:09:53): > I think it’s because grun ended up on top of lun for some reason in _update.sh.

Aaron Lun (20:10:08): > alright, off to do some laundry.

2019-09-05

Aaron Lun (13:12:07): > @channel We need some urgent dev-ops support for this book. @Rob Amezquita is getting slaughtered on the hutch’s servers.

Stephanie Hicks (13:12:40): > what specifically?

Tim Triche (13:13:32): > ah shit Jenny’s not on this slack. she probably knows whatever is wrong there

Aaron Lun (13:13:47): > In the past few months we have daily builds but have actually only built the book twice.

Aaron Lun (13:14:26): > Part of that’s on my updates, which can’t be helped. But part of it’s on the apparent instability of the R environment of the hutch.

Aaron Lun (13:14:48): > For example, did you know that they compile with -march=native? Which means that any packages with native code built on one node may not execute on another.

Tim Triche (13:15:07): > oh gross

Rob Amezquita (13:17:14): > YEP.

Stephanie Hicks (13:17:27): > WAT

Stephanie Hicks (13:17:35): > what’s the reason for that?

Rob Amezquita (13:17:37): > cos theyre all about “optimizing”

Tim Triche (13:17:48): > @Rob Amezquita have you talked with Dirk about that?

Rob Amezquita (13:18:05): > ive got a thread pleading for a super minimal version of R to be installed

Stephanie Hicks (13:18:11): > @Rob Amezquita what computational resources are needed to build the book?

Rob Amezquita (13:18:14): > no packages, no march-native, nothing

Rob Amezquita (13:18:27): > just something that can execute daily builds

Aaron Lun (13:18:29): > Probably 16 GB RAM, one core.

Aaron Lun (13:18:45): > We don’t use multi core extensively. Maybe only in big-data.Rmd

Rob Amezquita (13:19:00): > @Tim Triche im just gonna call them at this point. you think Dirk is the best go to or John?

Rob Amezquita (13:19:10): > getting pretty urgent since this thing is gonna go out the door v soon

Stephanie Hicks (13:19:19): > @Rob Amezquita and you can’t install R locally in your home directory?

Rob Amezquita (13:19:33): > nope, i get all sorts of weird library errors, GLIBCXX not found whatever

Aaron Lun (13:19:53): > What we really want is to just build it on the Bioc machines.

Stephanie Hicks (13:19:57): > that’s insanity

Aaron Lun (13:20:03): > It’s a BioC book.

Aaron Lun (13:20:11): > It’s hosted on bioconductor.org

Stephanie Hicks (13:20:22): > have you asked Martin?

Tim Triche (13:20:24): > @Rob Amezquita not sure, I just pinged Jenny Smith in Soheil’s lab since she works with both

Rob Amezquita (13:20:41): > thanks @Tim Triche!

Rob Amezquita (13:22:03): > we could also look into containers, maybe once the book is more stable, to solve some of the problems

Rob Amezquita (13:23:10): > but for now, i think the easiest thing to do is just get this built on the hutch servers by having a clean, minimal build of R

Rob Amezquita (13:24:02): > also, any tips on the Make stuff would be appreciated: https://github.com/bioconductor/OrchestratingSingleCellAnalysis

Rob Amezquita (13:24:19): > so anything invoked by the Makefile, which is mostly made up of the _*.sh files

Rob Amezquita (13:24:43): > any tips/optimizations would be :pray:

Kevin Rue-Albrecht (13:34:18): > Haven’t used containers extensively, but it sounds like they would provide a much more stable environment in this case. Names that come to mind are @Sean Davis (compiled the bioc2019 workshops), @Nitesh Turaga and @Davis McCarthy are big advocates too

Rob Amezquita (13:47:24): > okay, ive gotten on the phone with the Hutch IT and one of them is gonna make getting this built and stable a priority

Rob Amezquita (13:47:37): > @Aaron Lun theres no more errors right?? :stuck_out_tongue:

Rob Amezquita (13:47:56): > (no more code errors hopefully)

Kevin Rue-Albrecht (13:52:43): > Was there ever a plotting strike team? I’m guilty of showing interest and not living up to it. From scrolling the current version from the top, it looks like the first plot I come across is base R (https://osca.bioconductor.org/quick-start.html, section 6.2). - Attachment (osca.bioconductor.org): Chapter 6 Quick Start | Orchestrating Single-Cell Analysis with Bioconductor > Online companion to ‘Orchestrating Single-Cell Analysis with Bioconductor’ manuscript by the Bioconductor team.

Rob Amezquita (13:53:12): > that chapters gonna be removed

Rob Amezquita (13:53:41): > but yeah, the plots could use some love. but i think we should freeze on any code work on OSCABase until we figure out the build system

Kevin Rue-Albrecht (13:53:48): > I must admit that having a compiled version to consult would make it a lot easier to spot the plots to update than compiling each chapter locally

Aaron Lun (13:54:10): > Anything I wrote from scratch is base R.

Aaron Lun (13:54:20): > I will only consider changes if the total number of characters is reduced.

Rob Amezquita (13:54:21): > also a policy for plots would be good, and plotting related code. would be good to have the code folding implemented for any plotting-related code in particular

Kevin Rue-Albrecht (13:54:27): > > we should freeze on any code work on OSCABase until we figure out the build system > Absolutely. Not a good idea to mix up structural and aesthetic updates

Kevin Rue-Albrecht (13:55:26): > > would be good to have the code folding implemented > I agree even more here. One of the first things that slowed me down in diving in the plotting aspects is the idea of turning the existing concise code into 10-line ggplot blocks

Rob Amezquita (13:55:30): > okay putting this here:

Rob Amezquita (13:55:33): > how would you tell R to install libraries to a custom location, and preserve that custom location, and make sure that libraries when loaded are pulled from there first before trying the site libraries

Rob Amezquita (13:55:57): > i want to get rid of my personal library location variable, so that anyone can build it

Kevin Rue-Albrecht (13:55:58): > .Rprofile?

Kevin Rue-Albrecht (13:56:37): > with a bunch of lib.loc, lib, and .libPaths values set there? (details TBD)

Rob Amezquita (13:56:45): > yeah but what do i set those to

Sean Davis (13:57:18) (in thread): > Starting thread on containers….

Sean Davis (13:57:24) (in thread): > See: https://github.com/Bioconductor/BiocWorkshops2019/blob/master/README.md

Kevin Rue-Albrecht (13:57:29): > @Charlotte Soneson put up a fight with multiple libraries when we were preparing the bioc2019 workshop I believe

Rob Amezquita (13:57:43): > so the reason i have to have this in the first place is because i run a BiocManager::valid() call, and it checks the R_LIBS_SITE by default (or something thats not the personal library where the latest packages are)

Sean Davis (13:57:47) (in thread): > And: https://github.com/Bioconductor/BiocWorkshops2019/tree/master/inst/docker

Rob Amezquita (13:58:18): > and then for installation as well, just to avoid the issue of it trying to write to the R_LIBS_SITE location where you dont have permissions (e.g. this build is being run as a user)
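
One way to approach the library question above, as a minimal sketch rather than what the OSCA build scripts actually do: create a build-specific library and prepend it with `.libPaths()`, so that it is searched first and becomes the default install target. The directory name `osca-lib` is a made-up placeholder, and a temporary directory is used only to keep the example self-contained.

```r
# Hypothetical build-specific library; 'osca-lib' is a placeholder name.
build_lib <- file.path(tempdir(), "osca-lib")
dir.create(build_lib, recursive = TRUE, showWarnings = FALSE)

# Remember the current search path, then put the new library first.
old_paths <- .libPaths()
.libPaths(c(build_lib, old_paths))

# The first entry is where install.packages() writes by default and
# where library() looks first; the other libraries remain as fallbacks.
first_lib <- .libPaths()[1]
print(.libPaths())

# Installs and checks could then target that library explicitly.
# Assumed usage, not run here (needs network access and BiocManager):
# BiocManager::install("scran", lib = first_lib)
# BiocManager::valid(lib.loc = first_lib)
```

To make this persist without a personal variable, the `.libPaths()` call could live in a project-level `.Rprofile`, or the directory could be supplied through the `R_LIBS_USER` environment variable before R starts; either way the directory has to exist beforehand, otherwise R silently drops it from the search path.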

Rob Amezquita (13:58:32) (in thread): > thanks sean!

Rob Amezquita (13:58:37) (in thread): > how often is this docker updated

Rob Amezquita (13:58:51) (in thread): > or, i guess if we are reinstalling the packages anyways the issue is mostly moot.

Kevin Rue-Albrecht (13:58:53) (in thread): > @Rob Amezquita Here’s the only snippet I could pull out from my DM with Charlotte > > > BiocManager::version() > [1] '3.10' > > BiocManager::valid(lib = .libPaths()[1]) > [1] TRUE >

Rob Amezquita (13:59:14) (in thread): > yeap thats what i do

Sean Davis (13:59:15) (in thread): > I wanted to point to the process, not the docker container.

Kevin Rue-Albrecht (13:59:32) (in thread): > OK. Well that’s as much as I can help right now, I’m afraid

Rob Amezquita (13:59:35) (in thread): > although sometimes depending on the specifics i have seen it be that the personal library is not first

Rob Amezquita (13:59:54) (in thread): > so im afraid its not entirely dependable as a solution, maybe a good bandaid for now though to just use the .libPaths()[1]

Sean Davis (13:59:55) (in thread): > The container is not updated; it is a static artifact so that the workshops and software are available as is for whenever.

Sean Davis (14:01:39) (in thread): > Start with the bioconductor full docker image that includes libraries for installing packages. Install packages into a docker volume so that you can tar up the package directory, once installed. That will bring you to the point where the instructions to run the docker image pick up.

Aaron Lun (14:02:15) (in thread): > Asking him now. @Martin Morgan

Sean Davis (14:02:18) (in thread): > The Dockerfile in the docker directory is probably close to what you want to start with.

Aaron Lun (14:02:43) (in thread): > I don’t know of any. But clearly that doesn’t mean that there are none. ¯\_(ツ)_/¯

Rob Amezquita (14:03:05) (in thread): > awesome, thanks for the tips! ill look more into this

Sean Davis (14:03:34) (in thread): > Since the “master” workshop package had all the dependencies included, installation was pretty much just: BiocManager::install('MASTERWORKSHOPPACKAGE').

Sean Davis (14:09:09) (in thread): > This is also addressed by using docker.

Sean Davis (14:11:11) (in thread): > We have a small working group working on a lightweight build system for gitbook. https://github.com/Bioconductor/workshopbuilder

Sean Davis (14:11:42) (in thread): > @Marcel Ramos Pérez, @Lorena Pantano, @Nathan Sheffield, and @Levi Waldron are the main players right now.

Sean Davis (14:12:32): - Attachment: Attachment > We have a small working group working on a lightweight build system for gitbook. https://github.com/Bioconductor/workshopbuilder

Sean Davis (14:13:14): > We’d welcome input. #biocworkshops

Lorena Pantano (14:13:21): > @Lorena Pantano has joined the channel

Levi Waldron (14:22:27): > @Levi Waldron has joined the channel

Rob Amezquita (14:43:19) (in thread): > :thumbsup:

Marcel Ramos Pérez (15:26:18): > @Marcel Ramos Pérez has joined the channel

2019-09-09

Mike Smith (11:32:00): > I’ve applied for some resources on the German bioinformatics cloud network I’m a member of, with the thought that we could have a dedicated build machine there that we have complete control over. I’ll post back here when they approve/decline the application.

Aaron Lun (14:04:35): > Rob and I are getting absolutely killed here.

2019-09-10

Peter Hickey (19:18:50): > @Peter Hickey has joined the channel

2019-09-15

Koen Van den Berge (14:52:30): > @Koen Van den Berge has joined the channel

2019-09-17

Rob Amezquita (14:49:00): > anyone have a good dataset suggestion off the top of their head for writing about trajectory analysis?

Aaron Lun (14:49:46): > scRNAseq::RichardTCellData has a T cell activation time course

Aaron Lun (14:50:06): > You’ll need to get 1.99.6, which is still not built yet.

Kevin Rue-Albrecht (16:15:27): > Is it just me or is the new dark theme a bit intense?

Kevin Rue-Albrecht (16:16:05): > Personal opinion but I liked the light background better

Rob Amezquita (16:25:06): > either the dark theme is indeed unpopular or the light theme proponents are more outspoken about their preference, cant figure out which is more likely.. :thinking_face:

Aaron Lun (16:25:31): > The dark theme is stupid.

Aaron Lun (16:25:36): > I can’t even see the code.

Aaron Lun (16:25:46): > And the plot background is white, which is jarring.

Rob Amezquita (16:26:38): > fine changing it back now

Kevin Rue-Albrecht (16:26:50): > Aaron beat me at the politically correct way of putting it

Rob Amezquita (16:26:52): > if/when it compiles itll revert to white default

Aaron Lun (16:26:54): > Don’t forget to get better syntax highlighting.

Rob Amezquita (16:27:05): > and with zenburn as the highlighter

Rob Amezquita (16:27:20): > speak now or hold your peace until the next successful build if you have a different preference

Federico Marini (16:27:54): > Could we have the highlighting…BiocStyle?

Rob Amezquita (16:28:15): > @Federico Marini how would that be implemented in the bookdown?

Rob Amezquita (16:28:28): > right now im changing the syntax highlighter via the _output.yml options

Federico Marini (16:28:29): > no clue as of now

Federico Marini (16:28:51): > My guess would be some css toying around

Federico Marini (16:29:12): > A best bet on the go-to-guys for this:

Rob Amezquita (16:29:12): > yeah, i think it could probably get added to the style.css then

Federico Marini (16:29:22): > - andrzej oles, if he is around in Slack

Federico Marini (16:29:49): > @Mike Smith who did a more than neat job with the MSMB book

Federico Marini (16:30:14): > (for which there is also a MSMBstyle package, if I recall correctly)

Rob Amezquita (16:31:04): > awesome, yeah if theres a concrete gameplan for how to go about changing the syntax highlighting, happy to accept a PR to change it

Kevin Rue-Albrecht (16:32:57): > I’m trying to find the information again, but I’m afraid it would rather require to define a biocbook output format rendering function.

Kevin Rue-Albrecht (16:32:59): > See ?bookdown::gitbook

Rob Amezquita (16:34:22): > definitely would be cool for the upcoming Springer Bioconductor Everything series

Kevin Rue-Albrecht (16:40:25): > That could be helpful https://bookdown.org/yihui/rmarkdown/format-derive.html > - Deriving from built-in formats > - Fully custom formats > - Using a new format - Attachment (bookdown.org): 18.1 Deriving from built-in formats | R Markdown: The Definitive Guide > The first official book authored by the core R Markdown developers that provides a comprehensive and accurate reference to the R Markdown ecosystem. With R Markdown, you can easily create reproducible data analysis reports, presentations, dashboards, interactive applications, books, dissertations, websites, and journal articles, while enjoying the simplicity of Markdown and the great power of R and other languages.

Aaron Lun (16:49:15): > You can go peel out the BSPARAM=IrlbaParam calls, all scran/scater defaults have been switched.

2019-09-18

Davide Risso (04:45:00) (in thread): > The data that we used to develop slingshot are quite nice… I can add the dataset to the scRNAseq if there’s interest in using it

Davide Risso (04:46:22) (in thread): > https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE95601 - Attachment (ncbi.nlm.nih.gov): GEO Accession viewer > NCBI’s Gene Expression Omnibus (GEO) is a public archive and resource for gene expression data.

Rob Amezquita (08:59:44) (in thread): > Yes please!

Aedin Culhane (11:09:56): > My student Lauren Hsu is working through the osca book and is happy to provide feedback/suggestions/edits

Rob Amezquita (11:39:31): > thanks @Aedin Culhane we would be happy to take any and all feedback! please let her know to submit a GH issue/PR and/or to reach out to me/Aaron if she has any questions

2019-09-21

Aaron Lun (19:11:48): > marco

2019-09-23

Rob Amezquita (10:30:48): > polo

Kevin Rue-Albrecht (11:36:46): > ralph lauren

Stephanie Hicks (13:08:53): > pollo :chicken:

2019-09-29

Aaron Lun (21:15:03): > People should try to spot pain points in the book. Stuff that is too hard to type should be wrapped in a function in one of the packages.

2019-10-01

Aaron Lun (01:16:35): > I’ve almost finished killing simpleSingleCell, so this book had better be building reliably.

Charlotte Soneson (01:32:51): > @Aaron Lun Speaking of simpleSingleCell, we were looking at the batch effect chapter of the OSCA book yesterday and there are some references to chosen in the text that seem not to have a correspondence in the code (supposedly lifted over from simpleSingleCell, where they do).

Aaron Lun (01:41:08): > Lines?

Aaron Lun (01:41:17): > Just mention them in the PR.

Charlotte Soneson (01:48:49): > Yeah, I didn’t make it to the end yet. Will do when I get to work.

Aaron Lun (01:49:02): > Fix them if the fixes are obvious

Aaron Lun (02:39:46): > Fixed

Aaron Lun (02:40:01): > Merge master into your PR to avoid conflicts.

Charlotte Soneson (02:43:06): > Done

2019-10-03

Aaron Lun (14:14:30): > @Rob Amezquita can you describe to @Vince Carey the problems with the book compilation?

Aaron Lun (14:22:41): > From what I remember: > - The Hutch insists on compiling with -march=native, which means that C++ code routinely breaks if we install packages on a different node than the one we use to build the book. > - Intermittent problems with accessing (already locally cached!) data from ExperimentHub or AnnotationHub; my guess would be something is broken with POSIX locking and sqlite. > - Intermittent problems with accessing knitr cached variables, which are heavily used in the “workflows” section. > > Now, I say “intermittent”, but because we do so much of it, there is guaranteed to be at least one failure during a single compilation of the book!

Rob Amezquita (17:59:36): > ^ yep and yep, those are the main problems

2019-10-04

Lauren Hsu (09:15:06): > @Lauren Hsu has joined the channel

2019-10-05

Aaron Lun (00:24:33): > @Rob Amezquita give your HCA chapter a better name. You’re not analyzing all of the HCA.

Aaron Lun (19:09:08): > After several months, SSC migration is complete, and all readers are redirected to the book.

Aaron Lun (19:09:18): > So… we better get building, @Rob Amezquita. Like, right now.

Rob Amezquita (20:45:16): > nice! alright i am sending an ad hoc build since the cron system (still!) doesnt seem to like doing the build for whatever reason…time to troubleshoot whatever bugs got introduced..

Rob Amezquita (20:52:43): > would anyone know why a cron job would just “ghost” out of doing its job? like maybe it got killed or something? > > label: unnamed-chunk-1 > |................... | 29% > ordinary text without R code > > |..................... | 32% > label: quality-control > Loading required package: ggplot2 > > Those are the last few lines of the build log, and it just…ends there. :thinking_face:

Rob Amezquita (20:57:05): > emailed one of the scicomp people, maybe the build got too intense resource wise and the job manager killed it would be my naive guess..

Aaron Lun (21:58:40): > All workflows can be run on my laptop, except maybe that HCA one because I haven’t tried.

Aaron Lun (21:59:07): > Also, @Rob Amezquita, read the slack and rename your damn chapter.

2019-10-07

Aaron Lun (19:38:08): > @Stephanie Hicks the trajectory chapter needs finishing, lest we live with the current placeholder forever.

Stephanie Hicks (21:16:08): > ack! thanks for the reminder @Aaron Lun, will take care of that this week

2019-10-08

Rob Amezquita (09:29:33): > @Stephanie Hicks in case you havent seen it :smile: https://osca.bioconductor.org/trajectory-analysis.html - Attachment (osca.bioconductor.org): Chapter 17 Trajectory Analysis | Orchestrating Single-Cell Analysis with Bioconductor > Online companion to ‘Orchestrating Single-Cell Analysis with Bioconductor’ manuscript by the Bioconductor team.

Stephanie Hicks (09:41:08): > hahahaha

Stephanie Hicks (09:41:26): > can i tweet this?

Stephanie Hicks (09:41:28): > :smile:

Stephanie Hicks (09:41:34): > it will motivate me to finish

Rob Amezquita (09:41:40): > haha yes!

2019-10-09

Aedin Culhane (17:25:39): > Hi Quick questions. I am trying to read a GEO study that had a bunch of csv.tar.gz files with cells in rows, genes in columns. There are different numbers of rows and columns in each file. I checked the OSCA and Aaron Bibles (https://bioconductor.org/packages/release/workflows/vignettes/simpleSingleCell/inst/doc/reads.html) The latter uses read.delim and cbind

Aedin Culhane (17:26:19): > Whats the best way to get these into a SingleCellExperiment

Aaron Lun (17:30:48): > Sounds like you’d have to read each CSV in one at a time, if these are count tables.

Aaron Lun (17:31:24): > scater::readSparseCounts may help if the tables are very large; this creates a sparse matrix in a chunk-wise fashion, rather than trying to read it all in and convert it at once.

Lauren Hsu (17:32:28): > What is the best way to merge them though?

Aaron Lun (17:33:43): > cbind. And if there’s different numbers of rows, you’ll just have to take the intersection, or do something that’s otherwise sensible.

Lauren Hsu (17:35:41): > ah, okay. is there a way to merge them as SingleCellExperiment objects and force the non-overlapping rows to have na or 0?

Aaron Lun (17:36:59): > Not at the moment. I don’t see a real application for that. NA is correct but would make the object very difficult to work with, 0 is misleading.
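
The read-then-cbind advice above could look roughly like the following. This is a sketch under stated assumptions, not a tested recipe for the GEO study in question: the two toy tables stand in for what `read.csv()` would return from each file (cells in rows, genes in columns), and all cell, gene and batch names are invented.

```r
# Toy stand-ins for two per-sample count tables: cells in rows, genes in columns.
tab1 <- matrix(1:6, nrow = 2,
    dimnames = list(c("cellA", "cellB"), c("Gene1", "Gene2", "Gene3")))
tab2 <- matrix(1:9, nrow = 3,
    dimnames = list(c("cellC", "cellD", "cellE"), c("Gene2", "Gene3", "Gene4")))
tables <- list(batch1 = tab1, batch2 = tab2)

# Transpose so that genes are rows and cells are columns,
# which is the layout SingleCellExperiment expects.
counts <- lapply(tables, t)

# Keep only the genes present in every table (the intersection).
common <- Reduce(intersect, lapply(counts, rownames))

# Subset each table to the shared genes, in the same order, and cbind.
merged <- do.call(cbind, lapply(counts, function(x) x[common, , drop = FALSE]))

# Record which table each cell came from.
batch <- rep(names(counts), vapply(counts, ncol, integer(1)))

print(merged)
print(batch)

# Not run here (needs the SingleCellExperiment package):
# sce <- SingleCellExperiment::SingleCellExperiment(assays = list(counts = merged))
# sce$batch <- batch
```

Taking the intersection drops genes that are missing from any one table, which sidesteps the choice between NA and 0 for the non-overlapping rows.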

Lauren Hsu (18:34:10): > Got it, thanks

2019-10-10

Federico Marini (14:56:05): > General Q to you @Rob Amezquita but also for the other main contributors

Federico Marini (14:56:21): > Are we thinking of compiling also the pdf for the book?

Federico Marini (14:56:40): > Would be awesome to have a one-file snapshot of it

Rob Amezquita (15:01:15): > at some point yes

Rob Amezquita (15:02:00): > once we finish off a few major missing parts and can call it a “stable” version i think we can provide a snapshotted PDF on the site

Federico Marini (15:02:18): > awesome

2019-10-11

Aaron Lun (12:16:07): > @channel I need more volunteers to put in real data examples of integrating multiple datasets with one of the displayed methods.

Aaron Lun (12:16:34): > They don’t have to be fleshed out chapters, just write it in the vein of https://osca.bioconductor.org/merged-pancreas.html - Attachment (osca.bioconductor.org): Chapter 30 Merged human pancreas datasets | Orchestrating Single-Cell Analysis with Bioconductor > Online companion to ‘Orchestrating Single-Cell Analysis with Bioconductor’ manuscript by the Bioconductor team.

Aaron Lun (19:34:53): > Come on people. There’s got to be at least one of you up for this.

2019-10-12

Aaron Lun (02:50:49): > Seriously? No volunteers? At all?

Tim Triche (10:16:38): > oh hey

Tim Triche (10:17:44): > I actually need to integrate some fetal, pediatric, and adult blood development datasets

Tim Triche (10:18:25): > So I might as well work through the same process to integrate all the Smart-Seq2, 10X, nanopore, etc. data properly

Tim Triche (10:19:07): > similarly with a bunch of xenograft datasets where I don’t care nearly as much about the tumors as the stroma

Tim Triche (10:19:51): > Incidentally, I re-ran singleR in parallel and serial with the 60k cells, the parallel speedup was marginal even on a 48-core box

Tim Triche (10:20:17): > that was surprising. I’ll try running it serial-then-parallel to see if the loading is the choke point and get a reprex.

Aaron Lun (12:59:10): > Suggest shifting the last part back to #sc-signature

Aaron Lun (14:33:48): > iSEE authors need to fix their chapter, some of the colData columns have disappeared from the SCE and they don’t work anymore.

Aaron Lun (14:35:06): > poking @Kevin Rue-Albrecht @Federico Marini @Charlotte Soneson

Aaron Lun (14:35:27): > Make sure to clear the tenx-unfiltered-pbmc4k cache and re-run it.

Kevin Rue-Albrecht (15:11:28): > Message received. Though you can also open a GH issue as a sticky reminder. We’ll poke you back with a PR.

Kevin Rue-Albrecht (18:20:45): > Speaking of, re-knitting tenx-unfiltered-pbmc4k.Rmd gives me the following error > > Quitting from lines 190-197 (tenx-unfiltered-pbmc4k.Rmd) > Error: all(c("CD14", "CD68", "MNDA") %in% topset) is not TRUE > > Running the code interactively now to see what may have changed …

Kevin Rue-Albrecht (18:25:20): > Also > > # Checking 'marker.set' and 'chosen.platelets' are consistent. > stopifnot(identical(marker.set, markers[[chosen.mono]])) > > looks inconsistent (‘chosen.platelets’ vs ‘chosen.mono’)

Kevin Rue-Albrecht (18:30:19) (in thread): > @Aaron Lun looks like chosen.mono <- 7 is the closest for me, although I only get 2 of the 3 markers: > > > c("CD14", "CD68", "MNDA") %in% topset > [1] TRUE FALSE TRUE >

Aaron Lun (18:37:26) (in thread): > Make sure you’re up to date with GitHub scran

Kevin Rue-Albrecht (18:38:56) (in thread): > Living on the devel > > Bioconductor version 3.10 (BiocManager 1.30.7), ?BiocManager::install for help > > BiocManager::valid() > [1] TRUE >

Kevin Rue-Albrecht (18:39:10) (in thread): > But let me try again.

Kevin Rue-Albrecht (18:39:24) (in thread): > I might not have been fully up to date earlier this evening

Aaron Lun (18:40:03) (in thread): > GitHub scran is several versions ahead of BioC devel

Kevin Rue-Albrecht (18:40:15) (in thread): > owwww

Kevin Rue-Albrecht (18:40:55) (in thread): > way to keep me on my toes

Aaron Lun (18:40:59) (in thread): > Well, now it’s just one version ahead.

Aaron Lun (18:41:16): > yeas, copy pasted.

Kevin Rue-Albrecht (18:56:05) (in thread): > Ok cool it worked now, with the GH version. > I’ll have a look at iSEE tomorrow, if no one gets there before
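
A minimal sketch of the marker check from the thread above, written so that it picks the candidate cluster by overlap and reports which genes are missing, instead of stopping with a bare "is not TRUE". The `top_sets` list is toy data invented for illustration; in the chapter the per-cluster top sets would come from the real marker results, and the names `top_sets`, `overlap` and `missing` are not taken from the book's code.

```r
# Expected monocyte markers, as in the failing check.
expected <- c("CD14", "CD68", "MNDA")

# Toy stand-in for the top marker set of each cluster (hypothetical values).
top_sets <- list(
    "1" = c("CD3E", "IL7R", "CCR7"),
    "7" = c("CD14", "MNDA", "LYZ"),
    "9" = c("PPBP", "PF4", "CD68")
)

# Count how many of the expected markers appear in each cluster's top set.
overlap <- vapply(top_sets, function(s) sum(expected %in% s), integer(1))

# Pick the best-matching cluster instead of hard-coding its number.
chosen.mono <- names(overlap)[which.max(overlap)]

# Say exactly which markers are absent from the chosen cluster.
missing <- setdiff(expected, top_sets[[chosen.mono]])
if (length(missing) > 0) {
    message("cluster ", chosen.mono, " lacks: ", paste(missing, collapse = ", "))
}
```

With the toy data this selects cluster "7" and reports CD68 as missing, which mirrors the 2-of-3 result in the thread. Whether a partial match should be accepted or treated as an error is a decision for the chapter authors.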

2019-10-14

Fabiola Curion (04:54:30): > @Fabiola Curion has joined the channel

Fabiola Curion (05:06:33): > hi all, sorry for spamming, i read a couple of chapters from the book (e.g., https://osca.bioconductor.org/integrating-datasets.html) and i know you’re using the 3k and 4k 10x datasets for demo but those are quite old and i was wondering if anyone could suggest a newer healthy PBMC published dataset that they’ve used happily (no obvious technical issues and good quality). I need a 10X healthy human PBMC sample (>5k cells), possibly version 2-3 chemistry to align (MNNcorrect!) to a disease dataset I have - also i only have male patients. thanks! - Attachment (osca.bioconductor.org): Chapter 13 Integrating Datasets | Orchestrating Single-Cell Analysis with Bioconductor > Online companion to ‘Orchestrating Single-Cell Analysis with Bioconductor’ manuscript by the Bioconductor team.

Davide Risso (06:11:58): > would this work? https://data.humancellatlas.org/explore/projects/cc95ff89-2e68-4a08-a234-480eca21ce79

Davide Risso (06:19:50): > In general, the list of available HCA datasets is available here and some may be useful (I see blood, but cannot tell if it’s exactly what you need): https://data.humancellatlas.org/explore/projects

Fabiola Curion (06:20:01): > hi Davide, thanks, i thought of that too, but they were collected from bone marrow and umbilical cord so i’m wary that i’ll see a lot of differences just because my samples were collected from peripheral blood instead. i need to make some statements about cell type frequencies too so that can be a problem

Davide Risso (06:24:12): > I guess none of the datasets in the TENxPBMCData are new enough?

Davide Risso (06:24:31): > The pbmc68k has at least the advantage of having many cells

Fabiola Curion (06:29:14): > It makes sense to start screening the HCA data (and the 10x ones, i think they are old but some of them may not be as bad as i thought), i was just hoping that someone had already tried to do something similar to my task. it’s very complicated to draw conclusions if your own experiment doesn’t have controls, and also sometimes dangerous to use someone else’s data as control…anyway i will try to screen for the “perfect control PBMC dataset” and hopefully it won’t be too difficult. thanks!

Tim Triche (12:39:59): > how about this: https://bioconductor.org/packages/release/data/experiment/html/HCAData.html - Attachment (Bioconductor): HCAData > This package allows a direct access to the dataset generated by the Human Cell Atlas project for further processing in R and Bioconductor, in the comfortable format of SingleCellExperiment objects (available in other formats here: http://preview.data.humancellatlas.org/).

Tim Triche (12:40:29): > Now I’m thinking about how to package the fetal liver data similarly… that experiment is such a beast

Stephanie Hicks (23:27:27): > @Aaron Lun, I know I’m way overdue on the trajectory chapter. I am making progress this week (https://github.com/Bioconductor/OSCABase/pull/4#) and anticipate it being done tomorrow if all goes well :)

Aaron Lun (23:31:36): > k

2019-10-15

Mikhael Manurung (06:54:26): > In the osca-book chapter 8 (Feature Selection), modelGeneVar was used to model gene-wise variance. However, I cannot find it in the current ref manual of scran. Is the function from the developer version of scran? Is it in any way different from the existing trendVar?

Rob Amezquita (10:31:53): > @Mikhael Manurung yes currently the book is running on Bioc-devel (3.10) (you can check your version with BiocManager::version()); you can switch over via BiocManager::install(version = '3.10') and update your packages to the devel versions

Aaron Lun (11:19:53): > it works better than trendVar() “off-the-shelf” when fitting a trend to the variances of the endogenous genes. Previously, trendVar used a naive loess, which would give inappropriate weight to the majority of low-abundance genes.
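
A minimal sketch of how to check whether the installed scran already provides `modelGeneVar`, along the lines of the answer above. The helper `exports_function` is a hypothetical name introduced here; it relies only on base R. The BiocManager calls are the ones quoted in the reply and are left commented out because they can touch the network or modify the library.

```r
# Does an installed package export a given function?
# Returns FALSE when the package itself is not installed.
exports_function <- function(pkg, fun) {
    requireNamespace(pkg, quietly = TRUE) && fun %in% getNamespaceExports(pkg)
}

# Sanity check of the helper against a base package.
exports_function("stats", "loess")

# The check relevant to the question: FALSE means scran is either not
# installed or older than the version the book was built with.
exports_function("scran", "modelGeneVar")

# Report the Bioconductor version in use:
# BiocManager::version()

# Switch to the version the book was running on at the time of this thread:
# BiocManager::install(version = "3.10")
```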

Mikhael Manurung (13:19:51): > By the way, is this slack group the right place for me to ask stuffs related to the osca-book? Or would the support forum be more appropriate? I wouldn’t want to spam this channel of course.

Aaron Lun (17:39:56): > support site is preferred in case other people would have the same question, given that you can’t easily search slack histories from google.

Aaron Lun (17:40:16): > But if it’s something quick or you think it’s a bug, I’ll take it here.

2019-10-24

Aaron Lun (15:10:24): > @channel It’s time to scroll through the book again and poke holes in it. Typos, confusing wording, inconsistent logic, etc.

2019-10-29

Jack Zhu (15:58:38): > @Jack Zhu has joined the channel

2019-11-01

Tim Triche (08:39:24): > quick q, is it reasonable to drop mentions of Datacamp from section 3.3 (and anywhere else in the book)? They have proven to be a bad actor, both individually (their CEO) and as a corporation, have hurt a personal friend, and have been so furtive about avoiding accountability that the Rstudio folks scrubbed all mention of them: https://twitter.com/rstudio/status/1117889813879746560 - Attachment (twitter): Attachment > 1. We have removed all links to DataCamp from our websites and have requested that DataCamp stop using our instructors’ names and likenesses on their website.

Tim Triche (08:40:45): > It would be different if they hadn’t tried to dodge accountability for months and months, but the effort was clearly deliberate, and there’s just no place for that in an otherwise friendly community (IMHO, hopefully in yours as well; maybe most don’t know)

Tim Triche (08:41:28): > Also I bolted together some trajectory inference examples if @Stephanie Hicks is too busy :wink: since I needed to plug that hole

Tim Triche (08:41:43): > https://trichelab.github.io/PJS/slingshot/

Tim Triche (08:42:43): > Still picking through the rubble of a lot of data structure changes that affected MTseeker, although I did implement a dual-reference strategy for tracking mitochondrial exchange and evolution in PDXs. Maybe that can go in later :wink:

Stephanie Hicks (08:42:47): > I am supportive of dropping mentions of Datacamp

Tim Triche (08:43:30): > I was surprised that UMAP vs. PCA and exclusion of dead cells made such a huge difference in lineage/trajectory inference btw

Stephanie Hicks (08:43:34): > also, this is as far as I’ve gotten in trajectory (https://github.com/Bioconductor/OSCABase/pull/4#issuecomment-544795083)

Stephanie Hicks (08:43:46): > I would welcome help @Tim Triche :upside_down_face:

Tim Triche (08:44:05): > oh hey

Tim Triche (08:44:19): > so have you tried plotting bulk samples onto inferred SC lineages?

Stephanie Hicks (08:44:28): > i have not

Tim Triche (08:44:33): > because that was the only reason I ever cared about sc in the first place :slightly_smiling_face:

Tim Triche (08:45:19): > I’m going to give it a shot with some clinical samples against inferred fetal/adult hematopoietic lineage trees for today’s workshop (and for a paper that has been languishing for months due to me sucking and not being able to figure it out)

Tim Triche (08:46:06): > if you work with Pat Brown (for example) you know that one of the open questions is whether the cell of origin is different in pediatric vs adult leukemia, and whether that makes a clinical difference in how people should be treated

Tim Triche (08:46:29): > communicating this clearly to anything other than ultraspecialists has been hellish for many years

Tim Triche (08:46:47): > the single-cell framework makes it “easy”, for certain values of easy

Tim Triche (08:47:01): > (which is to say, “barely possible”)

Stephanie Hicks (08:47:12): > pat brown aka inventor of the microarray and the impossible burger?

Tim Triche (08:47:20): > no Patrick Brown at Hopkins

Stephanie Hicks (08:47:26): > oh lol

Tim Triche (08:47:33): > pediatric heme/onc

Stephanie Hicks (08:47:37): > got it

Tim Triche (08:47:49): > Pat Brown is cool too but I hear the burgers are keeping him busy.

Tim Triche (08:48:01): > I need to look up their middle names.

Stephanie Hicks (08:48:02): > too many cool pat browns to keep track of

Tim Triche (08:48:25): > Anyways, I’ll check in later. Interested to hear whether snRNAseq works on FFPE for you.

Tim Triche (08:48:45): > With any luck the abuse from workshop participants will produce something worthwhile for OSCA too.

Stephanie Hicks (08:48:57): > yeah, still in the phase of making sure we are mapping to the right thing.

Stephanie Hicks (08:49:06): > I know aligning to mRNA is not right

Tim Triche (08:49:17): > as in kallisto? or… ?

Stephanie Hicks (08:49:26): > but can’t figure out how to create a “pre mRNA” transcriptome with both introns and exons

Tim Triche (08:49:57): > seen this? https://bustools.github.io/BUS_notebooks_R/velocity.html

Stephanie Hicks (08:50:14): > aka https://support.bioconductor.org/p/125544/

Tim Triche (08:50:28): > we have been doing a lot of single cell total RNAseq lately and will have to deal with it more, soon

Charlotte Soneson (08:52:48): > @Stephanie Hicks We have been doing this a bit - happy to discuss/provide our code if you think that might help https://community-bioc.slack.com/archives/CM2CUGBGB/p1572612566072000 - Attachment: Attachment > but can’t figure out how to create a “pre mRNA” transcriptome with both introns and exons

Stephanie Hicks (08:53:39): > @Charlotte Soneson that would be really helpful!

Stephanie Hicks (08:55:03): > any code would be great, but happy to talk if you have some time. :slightly_smiling_face:

Stephanie Hicks (08:57:13): > i was thinking I could just map to the genome, but then that doesn’t really take into account different types of pre mRNA isoforms

Tim Triche (09:24:11): > the kallisto/bustools approach is meant to enable fast RNA velocity calculations (like for PAGA/scvelo) but the use of flanking cDNA sequences seems like it ought to suffice for your needs too. The examples are all in R using ensembldb to build the txome, although I don’t see why it couldn’t be generalized for any arbitrary GTF and flanking sequence length (other than mapping uncertainty within big introns holding a lot of repetitive sequence)

Tim Triche (09:26:01): > * to build the cDNA txome and cDNA+intronflanking txome, so as to compare the relative abundance of the two for scvelo
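
A minimal sketch of the coordinate arithmetic behind a "cDNA + intron flanking" reference as discussed above: introns are the gaps between consecutive exons, each padded by a configurable number of bases into the neighbouring exons. This is an illustration in base R under stated assumptions (one transcript, non-overlapping exons, 1-based inclusive coordinates); `flanked_introns` and the `flank` argument are hypothetical names, not the API of BUSpaRse or any other package, and the appropriate flank length is left to the reader.

```r
# Intron intervals of one transcript, padded by `flank` bases on each side.
flanked_introns <- function(exon_start, exon_end, flank) {
    stopifnot(length(exon_start) == length(exon_end), flank >= 0)
    # Work on exons sorted by position.
    ord <- order(exon_start)
    exon_start <- exon_start[ord]
    exon_end <- exon_end[ord]
    n <- length(exon_start)
    # A single-exon transcript has no introns.
    if (n < 2) {
        return(data.frame(start = integer(0), end = integer(0)))
    }
    # An intron spans the gap between consecutive exons.
    intron_start <- exon_end[-n] + 1
    intron_end <- exon_start[-1] - 1
    # Pad into the neighbouring exons without leaving the transcript.
    data.frame(
        start = pmax(intron_start - flank, exon_start[1]),
        end = pmin(intron_end + flank, exon_end[n])
    )
}

# Three exons, 10 bases of flank: intervals 91-210 and 291-410.
flanked_introns(c(1, 201, 401), c(100, 300, 500), flank = 10)
```

Sequences for these intervals would still have to be extracted from the genome and combined with the cDNA sequences, and the mapping uncertainty inside large repetitive introns mentioned above is not addressed by this arithmetic.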

Tim Triche (09:26:51): > cf. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1663-x which is the paper I was psychotic enough to present today :scream: - Attachment (Genome Biology): PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells > Single-cell RNA-seq quantifies biological heterogeneity across both discrete cell types and continuous cell transitions. Partition-based graph abstraction (PAGA) provides an interpretable graph-like map of the arising data manifold, based on estimating connectivity of manifold partitions ( https://github.com/theislab/paga ). PAGA maps preserve the global topology of data, allow analyzing data at different resolutions, and result in much higher computational efficiency of the typical exploratory data analysis workflow. We demonstrate the method by inferring structure-rich cell maps with consistent topology across four hematopoietic datasets, adult planaria and the zebrafish embryo and benchmark computational performance on one million neurons.

Tim Triche (09:29:09): > they mention, but don’t really emphasize, that RNA velocity estimates via scvelo can be used to inform trajectories. We saw this a lot in the CD4/CD8 data from infection time courses, where instantaneous “rare cell” populations resolved upon plotting RNA velocity into cells transiting known states. It turns out that velocyto’s estimation procedure is arcane (to be polite) and since we wanted to use the magnitude of the velocity vector to shade plot alpha values, down the rabbit hole we went. PAGA turned out to be on the other end of the rabbit hole.

Tim Triche (11:41:30): > nb. more on the whole “include 100bp of flank vs include the whole introns” here: https://liorpachter.wordpress.com/2019/07/01/high-velocity-rna-velocity/ - Attachment (Bits of DNA): High velocity RNA velocity > This post is the fifth in a series of five posts related to the paper “Melsted, Booeshaghi et al., Modular and efficient pre-processing of single-cell RNA-seq, bioRxiv, 2019”.

Tim Triche (11:41:50): > BUSPaRse is now in BioC, fwiw.

Stephanie Hicks (12:17:19): > Thanks @Tim Triche!

Aaron Lun (13:29:53): > I don’t have much to say here, as long as we don’t drag in too many system dependencies.

Aaron Lun (13:30:46): > The reliability of the hutch’s servers is a real problem and we can’t be adding more failure points.

Rob Amezquita (15:07:15) (in thread): > absolutely 100% supportive of this. i simply havent made time to go back and edit out/replace all the links…do you have alternate resources I could link to for helping people get started with R? would help a lot with editing

2019-11-04

Tim Triche (08:47:01) (in thread): > Um, @Michael Love’s and @Rafael Irizarry’s awesome course? http://genomicsclass.github.io/book/ or https://courses.edx.org/courses/HarvardX/PH525x/1T2014/dffde833663e4f71ab64246ebe5598d1/?

Tim Triche (08:47:30) (in thread): > Usually I just send interns/techs/rotation students/anyone who sets foot in my lab to Mike’s online course.

Tim Triche (08:48:56) (in thread): > However, maybe it’s better to find a more general overview like Jeff Leek’s or Kasper’s courses? There are a few :slightly_smiling_face:

Tim Triche (09:03:53): > (Looking at @Stephanie Hicks’s PR – yeah I can see how this sort of thing becomes a problem – ugh)

Tim Triche (09:04:05): > What is the relationship between OSCAbase and OSCA-the-book?

Tim Triche (09:09:34): > I am wondering whether a comparison between trajectories in the HCA adult BM vs. Haniffa’s fetal liver data is worthwhile (it is for me personally, but like you said, maybe it breaks the Hutch’s servers… ugh)

Rob Amezquita (10:17:22) (in thread): > Yeah I think a more general overview would be good - is there a course that has a similar interactive sort of a platform for coding? A la Codecademy and the like

Rob Amezquita (10:17:41) (in thread): > Courses are great but have a lot of requirements for getting up and running.

Tim Triche (11:23:47) (in thread): > Maybe this sort of thing? https://learnr-examples.shinyapps.io/ex-data-summarise/

Tim Triche (11:24:48) (in thread): > specifically https://learnr-examples.shinyapps.io/ex-data-summarise/#section-summarise-groups-with-summarise (run code / submit answer)

Rob Amezquita (11:29:08) (in thread): > thats a nice resource, but still not quite what im looking for. turns out though codecademy has a pretty good substitute that im looking for with interactive console

Rob Amezquita (11:29:17) (in thread): > https://www.codecademy.com/learn/learn-r - Attachment (Codecademy): Learn R | Codecademy > R is a popular language used by data scientists and researchers. If you are working with data, R is a fantastic language to learn.

Tim Triche (11:38:01) (in thread): > That looks like a good resource.

Rob Amezquita (11:42:40) (in thread): > https://github.com/Bioconductor/OSCABase/commit/fe06be17922cd43ea04d9b349ee809681ee3148f

Rob Amezquita (11:43:27) (in thread): > thanks again tim for bringing this to my attention and prompting me to update it - i shouldve just updated it way before as i knew about the DC stuff but hadnt made time to simply update the resource.

Aaron Lun (11:45:39): > The base provides the Rmarkdown files and various functions required strictly to compile individual reports. The book contains all of the HTMLs that get generated. We split them so that people wanting to edit the Rmarkdowns didn’t have to work in the repo with huge blobs because of the images.

Tim Triche (11:50:01): > OK cool, thanks. Now I have an idea of where to put things. I was playing with geometric sketching and attempting to characterize the effect of sampling on trajectories (in a very limited sense), that might be worthwhile given that slingshot can take hours to run on the datasets I’ve been using. (Although it does a very nice job in those hours!)

Tim Triche (11:55:17) (in thread): > :thumbsup:

Tim Triche (11:55:24) (in thread): > thanks for doing this.

Rob Amezquita (12:04:58) (in thread): > this could be an interesting inclusion on “trajectory”, if we could get a precomputed ExperimentHub based dataset for this sort of data would definitely be nice to include in OSCA

Tim Triche (12:06:08) (in thread): > bingo

Tim Triche (12:06:31) (in thread): > the PAGA and scVelo papers discuss this, I think it’s the endgame for these types of analyses

Rob Amezquita (12:06:55) (in thread): > do you have a pipeline for generating this sort of data?

Rob Amezquita (12:07:23) (in thread): > i have one worked out using kallisto-bustools, maybe i should just run it on the famous PBMC example dataset and add it to TENxPBMCData or something..

Tim Triche (12:08:19) (in thread): > Somewhat. Mondays are usually my writing days but I will check on its status as soon as I get in today. We wanted to automate the kallisto -> bustools -> scvelo -> paga parts and possibly figure out what degree of downsampling is acceptable to retain the unique bits of information in a dataset (since geosketch is pretty fast)

Tim Triche (12:08:41) (in thread): > HCA bone marrow would be my choice for an example (ulterior motives as usual)

2019-11-07

Mike Smith (11:01:57) (in thread): > What’s the current paradigm for building, it sounds like it’s a headache? If it helps I’ve got access to a 28 core/64GB VM where I’m the sys admin. I’m playing with running R studio server there, and if it would be helpful for people to edit and build the book via that interface it’d be a cool use of the resources. > > The login can be found at http://134.176.27.124 & send me a message if you’re interested in trying it out.

Rob Amezquita (11:06:31): > current build runs on the hutch servers as a cron job overnight…it actually has started to become a lot more reliable (until i broke it a couple days ago), just has been a bit of a headache while things have been in major flux

Rob Amezquita (11:07:09): > i love the idea of an interactive interface for editing though @Mike Smith, that is a neat idea for simplifying the editing process!

Kevin Blighe (11:28:01): > @Kevin Blighe has joined the channel

2019-11-21

Aaron Lun (11:30:57): > @Rob Amezquita book needs separate builds for devel and release.

Rob Amezquita (14:13:42): > alrighty finally got the dang HCA stuff fixed and sorted out so making a new version now. @Aaron Lun what would be a good strategy to separate the builds for devel and release?

Rob Amezquita (14:14:23): > and should they each publish separate websites? e.g. a devel branch that publishes to osca-devel.bioconductor.org

Aaron Lun (14:15:39): > yes.

Rob Amezquita (14:16:11): > how exactly? i dont think github pages supports publishing from multiple branches to separate sites

Kevin Rue-Albrecht (14:16:23): > > yes > you asked for it @Rob Amezquita :rolling_on_the_floor_laughing:

Aaron Lun (14:16:47): > Well, how is BioC hosting it? Can’t they just pull from a separate branch?

Rob Amezquita (14:18:04): > via github pages -> CNAME something something

Rob Amezquita (14:18:10): > i dont understand the details obviously

Rob Amezquita (14:18:41): > easiest solution is to create another repo entirely

Rob Amezquita (14:18:48): > and host from there

Rob Amezquita (14:19:17): > and sync the two with the exception of the install file using devel bioc branch

Aaron Lun (14:19:43): > Note that you can just install dependencies from the Suggests: field in OSCAUtils.

Aaron Lun (14:20:26): > There’s a full list there.

Rob Amezquita (14:21:13): > but why?

Rob Amezquita (14:21:35): > sorry didnt follow the jump there

Aaron Lun (14:22:08): > Reduce information in Hutch-specific scripts required to build the book.

Rob Amezquita (14:24:46): > right now it pulls the dependencies straight from leafing through OSCABase - so the main argument to change would be in the _install.sh script to change the version from 3.10 to devel

Rob Amezquita (14:25:04): > so im not getting where OSCAUtils is fitting into that?

Aaron Lun (14:25:08): > The installation script would no longer need to scrape the files.

Rob Amezquita (14:25:26): > ah okay, how is OSCAUtils getting its dependencies?

Aaron Lun (14:26:14): > there’s another function inside it that scrapes the files. The point being to minimize the information in the scripts.

Rob Amezquita (14:27:13): > i see i see, so thats going to change the _install.sh script - did you submit a PR for that yet?

Rob Amezquita (14:27:35): > if not i can look into it

Aaron Lun (14:27:38): > No.

Rob Amezquita (14:27:50): > doesnt solve the issue tho of publishing multiple versions of osca

Rob Amezquita (14:28:02): > a release vs devel version

Aaron Lun (14:28:11): > It does allow us to move more build information out of the repo.

Aaron Lun (14:28:27): > So you could imagine a OSCABuild repo that contains version-agnostic scripts.

Aaron Lun (14:28:49): > Then the Orchestrating* and Orchestrating-devel repos will literally only contain the output of running those scripts.

Aaron Lun (14:29:08): > No need to keep anything in sync between those two base repos.

Rob Amezquita (14:29:16): > that would be nice…i see now what youre proposing…

Rob Amezquita (14:29:21): > ingenious

Aaron Lun (14:29:28): > If the scripts can be generalized, then they arguably belong in oscabase.

Rob Amezquita (14:29:39): > essentially migrate as much of the make stuff to OSCAUtils

Rob Amezquita (14:30:04): > and have OSCAUtils just generate the website a la pkgdown sites

Aaron Lun (14:30:26): > … yes. But there had better be no hutch-specific stuff in there.

Rob Amezquita (14:31:35): > so at the moment i think the main hutch specific stuff is in the cron job that actually builds the book and pulls my custom R version for building

Rob Amezquita (14:33:19): > yeah so the cronjob right now is the main hutch specific thing in order to deal with installation hiccups (installs only work on the head/login node, and the worker nodes for whatever reason just cant find key libraries for installs, hence the separation)

Rob Amezquita (14:33:32): > _cron.sh and _cron-install.sh

Aaron Lun (14:34:49): > Well. let’s just get started with everything else.

Aaron Lun (14:35:24): > make a new dir (build) in oscabase and move all general build scripts there, noting the ability to strip down install.sh. Check that all suggests are covered though.
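
A minimal sketch of the "check that all Suggests are covered" step, assuming the dependencies are recovered by scanning the R Markdown sources for `library()`/`require()` calls and `pkg::fun` usage. The function `scrape_packages` and the example inputs are hypothetical; this is not the scraping function that lives in OSCAUtils, and a regex scan of this kind can pick up false positives from prose.

```r
# Extract package names referenced in a character vector of source lines.
scrape_packages <- function(lines) {
    # library(pkg), require(pkg), requireNamespace("pkg"), quotes optional.
    lib_pat <- "(library|require|requireNamespace)\\(\\s*[\"']?([A-Za-z][A-Za-z0-9.]*)"
    lib_hits <- unlist(regmatches(lines, gregexpr(lib_pat, lines, perl = TRUE)))
    lib_pkgs <- sub(lib_pat, "\\2", lib_hits, perl = TRUE)
    # pkg::fun and pkg:::fun
    ns_pat <- "[A-Za-z][A-Za-z0-9.]*:::?"
    ns_hits <- unlist(regmatches(lines, gregexpr(ns_pat, lines, perl = TRUE)))
    ns_pkgs <- sub(":::?$", "", ns_hits)
    sort(unique(c(lib_pkgs, ns_pkgs)))
}

# Toy input standing in for the contents of the .Rmd files.
rmd_lines <- c(
    "library(scran)",
    "sce <- scater::runPCA(sce)",
    "require(\"BiocParallel\")",
    "Some prose without any code."
)

# Toy stand-in for the parsed Suggests: field.
suggests <- c("scran", "scater")

# Packages used in the sources but not declared in Suggests.
setdiff(scrape_packages(rmd_lines), suggests)
```

In practice the lines would come from `readLines()` over the report files and the Suggests list from the package DESCRIPTION; an empty result from `setdiff()` would mean that installing from Suggests alone is enough.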

Rob Amezquita (14:38:31): > sounds like a plan, then we’ll have a squeaky clean Orch* and can make a Orchrelease and Orchdevel repo to publish on

Sean Davis (16:42:34): > Are you doing these builds on Linux?

Sean Davis (16:42:59): > Has anyone tried to use the bioconductor_full (after installing tinytex) to do the builds?

Sean Davis (16:43:51): > If that works once, it would free folks from the Hutch machines and we could work toward better more robust automation.

Sean Davis (16:44:37): > The docker approach worked reasonably well for last year’s workshops.

2019-11-27

Aaron Lun (23:49:17): > is rob alive?

Aaron Lun (23:49:19): > @Rob Amezquita!

2019-11-28

Aaron Lun (21:47:26): > HELLO

Aaron Lun (22:48:31): > ROB

2019-11-30

Aaron Lun (00:52:09): > Bone marrow 380000 cells is sending my laptop into swap.

Aaron Lun (00:52:24): > It took 2 hours to run. Geez.

Aaron Lun (01:46:02): > @Rob Amezquita I cut out your gene set bit because it uses a non-CRAN package, and I cut out the slingshot bit because it depends on hard-coded cluster identities without prior checks on what those clusters are meant to be.

2019-12-02

Rob Amezquita (13:55:26) (in thread): > its on Linux on the Hutch clusters, but haven’t tried the docker/singularity approach yet, unfortunately our file system is not well configured to support it and involves getting even more help to get started, but may be the way to go forward from here

Rob Amezquita (13:55:46): > awesome! thanks!

Aaron Lun (13:56:22): > ETA on the devel builds?

Rob Amezquita (13:57:30): > devel builds are every night since i switched over to devel (the current codebase isnt going to work with release, will need to rollback the repo to a commit from 11-15 and tag that version)

Rob Amezquita (13:59:15): > and then from there update the website attribution info to the publication, and that’ll be the “release” 3.10 version of the book (probably would need to also cut out trajectory and any other WIP sections)

Rob Amezquita (13:59:43): > still need to decouple the build scripts out from the main book repo into the base repo

Sean Davis (14:59:07) (in thread): > Do you need a cluster, or just one or more machines?

Rob Amezquita (18:03:39) (in thread): > dont necessarily need a cluster, just a machine that can dedicate ~2-4 hours to building/installing on a regular basis. for now going to try to work things out with my IT to get proper sys admin support

Rob Amezquita (18:07:23) (in thread): > ive emailed scicomp at the Hutch to see about getting some consultation about how to get this set up, will report back if/when i get some good news on this front

Aaron Lun (22:25:24): > As in, you have switched, or what?

Sean Davis (22:44:53) (in thread): > @Rob Amezquita perhaps we should talk (audio) at some point. We can likely accomplish this with much less fuss in a cloud environment.

Sean Davis (22:47:58) (in thread): > As an aside, Matthew Trunnell at the Hutch might be interested in getting one or more of his folks interested. Let me know if you want an intro.

Rob Amezquita (23:03:52) (in thread): > Thanks for the tip @Sean Davis I know Matthew and will reach out to him too - agreed that a cloud environment could be a lot less headache

2019-12-04

Aaron Lun (11:52:05): > @Rob Amezquita I don’t see builds. Talk to me.

2019-12-05

Juan Ojeda-Garcia (11:15:40): > @Juan Ojeda-Garcia has joined the channel

Aaron Lun (17:22:36): > @Rob Amezquita build build build buildbuilbd uilbduilbduilbduildbudilbdluidb

Kevin Rue-Albrecht (17:23:04): > sounds like build spirit

2019-12-06

Aaron Lun (11:31:35): > Y’know @Rob Amezquita it wouldn’t hurt to give us a status update.

2019-12-10

Robert Ivánek (05:40:22): > @Robert Ivánek has joined the channel

Chris Vanderaa (07:42:18): > @Chris Vanderaa has joined the channel

2019-12-13

Aaron Lun (00:40:45): > I guess @Rob Amezquita is done for the year, then.

Aaron Lun (00:40:52): > Will anyone else step up to build this book?

Stephanie Hicks (00:44:09): > If I remember correctly, i believe @Sean Davis said he had intentions of helping to solve that problem @Aaron Lun (i.e. building the book). But I’m unclear about what stage that’s in?

Aaron Lun (01:20:58): > I dreamt of a world where I could go to sleep and the book would build and I would wake up and see the book.

2019-12-16

Mike Smith (09:58:04): > @Aaron Lun let’s come up with a solution for this then. Unfortunately the VM I linked to before is currently out of action because the entire university is offline after a security incident (https://bit.ly/2PPTLPh), so an interactive build machine is off the table for now, but I see no reason we can’t use EMBL hardware & the BioC docker images to do this on a routine basis.

Aaron Lun (11:43:43): > Everything should work off the bat with BioC-devel, interactive builds should not be required. I would do this myself but the HCA workflow doesn’t play nice with our cluster - some kind of memory issue during parallelization.

Sean Davis (14:38:40): > Is there a build script somewhere in the repo? What are the memory and compute requirements for the build?

Sean Davis (14:39:48): > In other words, if one had a linux instance running bioconductor_full, what would one do to build the book?

Aaron Lun (16:30:33): > There are some instructions in https://github.com/Bioconductor/OrchestratingSingleCellAnalysis

Aaron Lun (16:30:36): > README

Aaron Lun (16:31:04): > It should be a case of make update, make knit and make install. Probably could be streamlined but that’s where it is right now.

Mike Smith (16:55:36): > Which version of BioC is it supposed to use? https://github.com/Bioconductor/OrchestratingSingleCellAnalysis/blob/b31f2328aca88de9edafefd42cb81c3dd77f41b0/_install.sh#L6 seems to suggest BioC 3.10 rather than devel (unless I’m getting my versions mixed up).

Aaron Lun (16:57:45): > TL;DR is to use BioC 3.11.

Aaron Lun (16:58:23): > We did discuss having a version for 3.10, and there is a tagged version on the repo that was last known to work with 3.10, but that’s not the priority right now.

Sean Davis (17:00:50): > Thx, @Aaron Lun. I was looking at another fork of the repo. Does anyone know the rough max compute or memory requirements to build the current iteration?

Mike Smith (17:06:13): > _cron.sh requests a node with 128GB RAM and 20GB temp space but I’ve no idea how much of an over-requisition that is.

Sean Davis (17:11:10): > Thx, @Mike Smith. I’m just trying to scope out how big a cloud instance would be needed. Obviously, bigger is more expensive….

Mike Smith (17:12:28): > I’m going to try and get a build working on our cluster tonight, so hopefully I’ll have some first hand numbers on what it required once that’s completed.

Sean Davis (17:14:12): > Great. Thanks. If you have any feedback after going through a build on what works and what doesn’t out-of-the-box, that can save some headache when I sandbox something.

Aaron Lun (18:38:51): > @Mike Smith Those values are almost certainly overstated. I can run every workflow on my laptop (16 GB RAM).

Aaron Lun (18:39:09): > The HCA one goes into swap every now and then, mostly because of a bug with parallelization in DA.

2019-12-17

Mike Smith (11:15:25): > Predictably this has taken a bit longer than hoped, and I’ve learnt a lot about the order .Renviron files are loaded in, but I’ve finally got to a stage where a build failure appears to be content related rather than environment e.g. > > Quitting from lines 178-180 (P1_W04.data-infrastructure.Rmd) > Error in (function (classes, fdef, mtable) : > unable to find an inherited method for function 'colData' for signature '"DFrame"' > Calls: local ... withVisible -> eval -> eval -> colData -> <Anonymous> >

Mike Smith (11:16:42): > @Aaron Lun How do you and Rob share build logs? I guess they were pushed to the git repo?

Aaron Lun (11:32:50): > There is an OSCAlogs repository in rob’s account.

Aaron Lun (11:33:24): > Hm. P1 isn’t my jurisdiction.

Aaron Lun (19:15:17): > Should be fixed, remember to make update.

2019-12-18

Mike Smith (15:46:19): > So it’s a bit rudimentary, but I’ve started to put together a build request tool at https://www.huber.embl.de/users/msmith/osca-builder/ If you press the button at the top it’ll launch a build job on our cluster. Any currently running builds are shown in the next panel, and the log for the currently running/most recently run build displays in the text box.

Aaron Lun (15:46:59): > just pressed the button

Aaron Lun (15:47:40): > Ugh. Looks like make update isn’t happy.

Mike Smith (15:48:04): > It’s still got my local changes, from before you accepted the pull request

Mike Smith (15:51:26): > There’s probably a hundred ways this will break, I guess it would be better to clone a fresh version with each build, but I wanted to see if I could get the webpage-to-cluster communication working

Aaron Lun (15:51:44): > looks pretty good to me

Aaron Lun (15:51:49): > Getting live updates

Mike Smith (15:52:58): > Also it’s failing somewhere in chapter 2, so I’ve no idea if the job settings will let it complete when there’s no code errors.

Mike Smith (15:53:15): > The log should update every second.

Aaron Lun (15:53:24): > I’ll have a look at it now.

Mike Smith (15:55:07): > If you submit multiple jobs it’ll either interleave them together or keep wiping each other out, not sure which since they write to the same log file.

Mike Smith (15:55:26): > If it looks useful I can smooth out the rough edges

Aaron Lun (15:56:31): > This is very useful.

Aaron Lun (15:59:06): > re chapter 2 error: I would guess that it’s because there’s a missing make knit step in your build.

Aaron Lun (16:00:37): > To give some background; there are some analysis outputs that are used throughout the book. However, I don’t want to have to build them at every usage, and I don’t want to have to mandate that people read the analysis-generating code before they read the actual book content. So we build those analysis outputs first with make knit, and then we retrieve the cached values for use throughout the book.

Mike Smith (16:03:49): > Ah ok. I’ll add that to the build and remove my changes so that make update will run again.

Aaron Lun (20:05:32): > Looks like it’s stalling on the integration step of the HCA workflow. That takes an hour on my laptop with 8 cores. What’s the resourcing on the cluster, and does it respond to MulticoreParam?

Aaron Lun (20:16:01): > I should clarify - the entire HCA workflow takes an hour on my laptop, not just the integration step.

Aaron Lun (20:20:22): > If I had to guess, there was a known bug in DelayedArray where it wasn’t responding to the correct parallelization parameters (see https://github.com/Bioconductor/DelayedArray/pull/58). batchelor had coded its way around it, but because the bug has now been fixed, the hack doesn’t work anymore; this means that fastMNN() does not respond to the specified BPPARAM object and just uses the default.

Aaron Lun (20:21:19): > On clusters, this is highly problematic as MulticoreParam thinks that all the threads on the node are available, not just the ones allocated by the job scheduler. In my case, this resulted in 50 processes fighting over 8 threads.
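
A minimal shell sketch of one way to sidestep the oversubscription described above: ask the scheduler how many cores the job was actually given and pass that number to R explicitly. `SLURM_CPUS_PER_TASK` is only set when the job requests `--cpus-per-task`; the function name `allocated_cores` and the fallback order are assumptions for illustration, not part of the book's build scripts.

```shell
# Print the number of cores this job is allowed to use.
# Prefers the SLURM allocation; falls back to nproc, then to 1.
allocated_cores() {
    if [ -n "${SLURM_CPUS_PER_TASK:-}" ]; then
        echo "$SLURM_CPUS_PER_TASK"
    elif command -v nproc >/dev/null 2>&1; then
        nproc
    else
        echo 1
    fi
}

# Hypothetical usage: register an explicit worker count with BiocParallel
# instead of relying on the default.
# Rscript -e "BiocParallel::register(BiocParallel::MulticoreParam(workers = $(allocated_cores)))"
```

Whether a given function honours the registered backend is a separate question, as the fastMNN() discussion above shows.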

2019-12-19

Mike Smith (02:32:10): > Yes, the call to modelGeneVar seems to spin up as many processes as there are cores, which is not ideal, but it doesn’t seem to take more than a couple of minutes and SLURM doesn’t kill it for being a badly behaved job. As you say, the really long step appears to be fastMNN, it timed out after 4 hours last night. Does that require a lot of disk access? The 8 R processes are using 100% CPU, suggesting that’s not the case, but this step isn’t spawning 50 processes (at least not at the moment).

Aaron Lun (11:28:03): > It hits the disk to read in the full matrix about 8 times; each step is a matrix multiplication via rsvd.

Aaron Lun (11:28:35): > For me, each full matrix read takes ~5 minutes across 8 cores.

Aaron Lun (11:28:53): > But I have a SSD, so that’s probably on the “very fast” side.

Mike Smith (11:35:01): > It seems to get past that step now. The previous job allocation only asked for 4 cores (I haven’t looked at the code properly), so I wonder if the run last night just got stuck in some job contention. It requests 8 cores now, and SLURM seems to just be cool with it using the whole node’s worth when it hits that bug. I’ve also adjusted the location of the *Hub caches so they should be on higher performance drives than yesterday.

Aaron Lun (11:35:48): > Ok, cool.

Mike Smith (11:36:03): > Do you know if R can return a ‘failure’ code when something like R -e rmarkdown::render() fails to complete?

Aaron Lun (11:36:03): > I did update batchelor yesterday, and DA should also be bumped, so the bug should disappear soon enough.

Mike Smith (11:37:30): > At the moment the make clean fails with > > Quitting from lines 47-61 (merged-pancreas.Rmd) > Error in x[i, , drop = FALSE] : > invalid or not-yet-implemented 'Matrix' subsetting > > but then the make build just carries on regardless until it too fails because files it’s expecting to find are missing

Mike Smith (11:38:08): > It’d be nice to stop at the first failure, rather than mask it with tons of lines of output from the next step

Aaron Lun (11:38:15): > Hm. Well, that error is also kind of weird.

Mike Smith (11:38:25): > I guess I can just check for the existence of all files that are expected

Aaron Lun (11:39:33): > This works well enough: > > R -e "Blah" > echo $? >

Aaron Lun (11:39:59): > I guess we could set the Makefile to have set -ue && … or something like that.
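
A small shell sketch of the "stop at the first failure" idea, assuming (as in the `echo $?` check above) that each build step reports failure through a non-zero exit status. Rather than the `set -ue` approach floated above, it uses an explicit loop so the failing step is named in the output; the helper `run_steps` is hypothetical and not part of the repository.

```shell
# Run each command string in order and stop at the first non-zero exit status.
run_steps() {
    for step in "$@"; do
        echo ">>> $step"
        # Each step runs in a child shell; its exit status decides whether we continue.
        if ! sh -c "$step"; then
            echo "step failed: $step" >&2
            return 1
        fi
    done
}

# Hypothetical usage with the make targets mentioned in this channel:
# run_steps "make update" "make knit" "make install"
```

Because later steps never start, the first error stays at the bottom of the log instead of being buried under output from the next stage.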

Aaron Lun (11:40:37): > In any case, just put up with it for the time being, I’m planning to migrate all of this stuff to OSCABase so that the Orchestrating* Repo basically just has the bookdown output.

2019-12-24

Kevin Rue-Albrecht (05:10:20): > Isn’t « orchestrating … » Bioc-patented by now? :yum: https://twitter.com/bibryam/status/1209391512037052417?s=12 - Attachment (twitter): Attachment > Orchestrating Storage with Kubernetes and Implications for Persisting State > https://buff.ly/2Zg8ozC

Federico Marini (05:19:19): > better not open up pandora’s vase on patents

Federico Marini (05:19:22): > :stuck_out_tongue:

2019-12-25

Aaron Lun (02:48:32): > Compilation of workflows should now be greatly improved, now they are only ever rendered once. This should shave an hour off compilation time, at least.

2019-12-26

Aaron Lun (00:49:02): > Check out this sentence: > > In some situations, we will have performed within-batch analyses to characterize salient aspects of population heterogeneity. > Woah. So many big words.

Aaron Lun (00:49:31): > just playing scrabble all the time when I’m writing this book.

2020-01-01

Aaron Lun (15:53:37): > @Mike Smith this should now be as simple as: > > library(OSCAUtils) > spawnBook("some_dir") > compileWorkflows("some_dir") >

Aaron Lun (15:53:44): > and then you’re ready to go for the bookdown rendering.

2020-01-02

Aaron Lun (22:13:21): > The latest error makes no sense. What’s the version of scater that’s running on your side?

2020-01-06

Rob Amezquita (14:04:57): > Okay back online, looks like the repo has acquired some new wizardry! is building on the fred hutch servers still a thing or is it officially going to be moving to another site? if its still going to be on fh servers i can work today on updating the book to accommodate the new OSCAUtils functionality

Aaron Lun (14:05:21): > where on earth were you?

Rob Amezquita (14:06:46): > working on too many other projects for demanding biologists/offline for the holidays

Rob Amezquita (14:07:32): > im hoping to get onto a regular monthly schedule for maintaining the book, rn goal is to get this compiling again before the 15th so that its not overly stale

Rob Amezquita (14:07:52): > catching myself up on OSCABase/Orchestrating repos..

Koen Van den Berge (14:59:52): > @Koen Van den Berge has left the channel

Rob Amezquita (19:06:37): > @Aaron Lun looks like scater hiccuped on installing the devel version: > > installing to /home/ramezqui/Rbuild/devel-foss-2016b/library/00LOCK-scater/00new/scater/libs > **** R > **** inst > **** byte-compile and prepare package for lazy loading > Error: object 'make_zero_col_DFrame' is not exported by 'namespace:S4Vectors' >

Aaron Lun (19:07:02): > Looks like you need the devel version of S4Vectors.

2020-01-07

Mike Smith (09:53:08) (in thread): > So this requires the HEAD of the master branch of OSCABase, rather than the 5ce8d16 commit tagged in Bioconductor/OrchestratingSingleCellAnalysis , right?

Aaron Lun (11:28:00): > Yes. I think you should be just able to enter the directory and run those commands and it will update OSCABase for you.

Mike Smith (12:00:04): > Yep, but in order to get the version of OSCAUtils with those functions, I had to update OSCABase manually, a bit of a chicken/egg problem.

Mike Smith (12:01:28): > How does this integrate with the existing set of make … commands? It looks like a replacement for the make knit step, is that correct?

Aaron Lun (12:04:16): > Yes, it’s a replacement.

Aaron Lun (12:05:06): > In practice, I usually have a separate clone of the OSCABase repo. and then I call those commands on another directory to populate it.

Aaron Lun (12:05:33): > The entire “Orchestrating” repository is an output directory so I never directly write code to it.

Aaron Lun (19:02:40): > Hello?

2020-01-08

Rob Amezquita (10:08:30): > running the spawnBook/compileWorkflows flow that you have in OSCABase here; for future reference, would installing OSCAUtils install all the dependencies for the workflows as well? it looks like it has a bunch of the required packages in the suggests, just wondering if thats complete

Aaron Lun (11:23:14): > Yes. This can be continually updated by calling a function named something like createDependencies() (not exactly sure what it was called).

Aaron Lun (11:23:37): > This will scrape all Rmarkdown files for ::, library and require calls.

Rob Amezquita (11:26:31): > great so once the compileWorkflows() function finishes whats the next step for compiling the rest? or is that where the make scripts will take over?

Aaron Lun (11:27:53): > You just do normal bookdown stuff on the repo. There is nothing special left to do.

Aaron Lun (11:28:08): > That’s assuming that I put enough stuff in the sundries.

Mike Smith (12:04:11): > I went with bookdown::render_book("index.Rmd", "bookdown::gitbook", quiet = FALSE, output_dir = "docs", new_session = TRUE) and that seems sufficient to start building the book

Mike Smith (12:04:53): > For me it errors out when doing some sanity checking in chapter P2_W14: > > Quitting from lines 453-455 (P2_W14.protein-abundance.Rmd) > Error: all(getMarkerEffects(of.interest["PD-1", ]) > 1) is not TRUE >

Aaron Lun (12:05:12): > well, at least we got a bit further.

Mike Smith (12:05:29): > Looking at the results produced they’re very different from what’s currently online

Aaron Lun (12:05:45): > Probably some of the cluster numbers jumped around

Aaron Lun (12:05:55): > I don’t remember changing any code that would have caused that, though.

Aaron Lun (12:37:52): > Works fine for me.

Aaron Lun (12:38:08): > oh wait, hold on, wrong chapter.

Aaron Lun (12:42:19): > right, okay, got it.

Aaron Lun (12:45:01): > Whoops. Forgot I changed a parameter from log10 to log2. This should be fixed now.

Aaron Lun (12:45:28): > If you re-run it, just set compileWorkflows(fresh=FALSE) to re-use the cached values, given that they haven’t changed.

Aaron Lun (12:46:57): > I would prefer for routine runs to use fresh=TRUE, but debugging runs might as well use fresh=FALSE if the bug isn’t occurring in the workflows.
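
The fresh=TRUE versus fresh=FALSE trade-off can be pictured with a small shell sketch. This is not how compileWorkflows() is implemented; `compile_cached`, its argument order, and the idea of keying the cache on a single output file are all assumptions made for illustration.

```shell
# compile_cached OUTPUT FRESH COMMAND [ARGS...]
# Runs COMMAND unless FRESH is FALSE and OUTPUT already exists,
# in which case the existing output is reused untouched.
compile_cached() {
    output=$1
    fresh=$2
    shift 2
    if [ "$fresh" = "FALSE" ] && [ -e "$output" ]; then
        echo "reusing cached $output"
        return 0
    fi
    "$@"
}
```

A routine run would pass TRUE so every output is regenerated, while a debugging run would pass FALSE and skip the steps whose outputs are already on disk.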

Mike Smith (13:00:11): > Cool, thanks. Default on my build page is fresh=FALSE but when I get some time I’ll try and make two buttons or something so the choice can be made.

Aaron Lun (22:26:58): > oh

Aaron Lun (22:27:00): > oh my god

Aaron Lun (22:27:03): > it’s built!

2020-01-09

Mike Smith (03:16:26): > :+1:

Rob Amezquita (11:16:09): > got the build working here too! okay im going to update the Orchestrating* repo to this version, and then for OSCABase we just need to add @Mike Smith’s bookdown bit, and then I’ll update the cron job

Rob Amezquita (11:16:35): > unless @Mike Smith you have it working elsewhere now and we dont need the hutch servers?

Rob Amezquita (11:22:51): > nooooo

Rob Amezquita (11:22:58): > tried pushing but some of the SVGs are too damn big

Rob Amezquita (11:23:06): > 250MB

Rob Amezquita (11:24:28): > these need to be converted to a different, flatter filetype: > > remote: Resolving deltas: 100% (161/161), completed with 42 local objects. > remote: warning: File docs/P3_W06.grun-pancreas_files/figure-html/unref-416b-variance-1.svg is 73.56 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB > remote: warning: File docs/P3_W15.hca-bone-marrow_files/figure-html/unref-nest-var-1.svg is 75.54 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB > remote: warning: File docs/P3_W09.segerstolpe-pancreas_files/figure-html/unref-seger-variance-1.svg is 53.12 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB > remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com. > remote: error: Trace: c23834e67ed8395b47e1d34e7dca1651 > remote: error: See http://git.io/iEPt8g for more information. > remote: error: File docs/P3_W15.hca-bone-marrow_files/figure-html/unref-hca-bone-umap-1.svg is 243.44 MB; this exceeds GitHub's file size limit of 100.00 MB > remote: error: File docs/P3_W07.muraro-pancreas_files/figure-html/unref-muraro-variance-1.svg is 125.98 MB; this exceeds GitHub's file size limit of 100.00 MB > remote: error: File docs/P3_W15.hca-bone-marrow_files/figure-html/unref-hca-bone-mito-1.svg is 146.00 MB; this exceeds GitHub's file size limit of 100.00 MB > remote: error: File docs/P3_W15.hca-bone-marrow_files/figure-html/unref-hca-bone-qc-1.svg is 435.28 MB; this exceeds GitHub's file size limit of 100.00 MB > To git@github.com:Bioconductor/OrchestratingSingleCellAnalysis.git > ! [remote rejected] master -> master (pre-receive hook declined) > error: failed to push some refs to 'git@github.com:Bioconductor/OrchestratingSingleCellAnalysis.git' >
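
A rejected push like the one above can be anticipated by listing oversized files before committing. The sketch below is a hypothetical helper, not something in the repo; it assumes a `find` that accepts the `M` size suffix (GNU and BSD find do), and note that `-size` rounds file sizes up to whole mebibytes before comparing.

```shell
# list_large_files DIR MIN_MIB
# Print every regular file under DIR that is larger than MIN_MIB mebibytes.
list_large_files() {
    find "$1" -type f -size +"$2"M
}

# Hypothetical pre-push check of the rendered book against the 50 MB warning threshold:
# list_large_files docs 50
```

Any SVG it reports is a candidate for rendering with a raster device instead.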

Rob Amezquita (11:25:13): > but besides that minor issue, how do we want to handle builds?

Mike Smith (11:25:51): > My builder at https://www.huber.embl.de/users/msmith/osca-builder/ completed too. I don’t necessarily want to supplant your Hutch build system if it’s working fine, but it seemed helpful to allow people (i.e. @Aaron Lun) to kick off a build on demand, and view the output log if it failed.

Rob Amezquita (11:26:08): > oh i hadnt seen this!

Rob Amezquita (11:26:44): > im totally fine supplanting the Hutch system, unfortunately it gets clobbered by the installing of packages step

Rob Amezquita (11:27:19): > given that the book is currently on devel, it looks like that may be an issue for a while (esp if @Aaron Lun is actively working on it and developing new pkg functionality at the same time)

Rob Amezquita (11:27:42): > is your system more reliable with updating the packages/staying current with the devel branch of Bioconductor?

Rob Amezquita (11:28:30): > and having an interactive book builder is just awesome

Mike Smith (11:29:10): > It’s based on the bioconductor_full docker image so there shouldn’t be any system dependencies/module loading etc issues, and the first step of any build is to update all installed packages to the latest CRAN/BioC-devel versions

Rob Amezquita (11:29:58): > thats awesome!! yeah the Hutch has hardware issues with even running Singularity, let alone Docker, so it was impossible to get that going without significant IT support, of which i had no say in getting

Rob Amezquita (11:32:00): > if youre okay with it Mike, I’d love to have the builds not on the Hutch systems because its a headache and your solution is way more seamless/easy to kick off

Rob Amezquita (11:32:39): > and that way i can try to actually spend more time contributing content vs just trying to sysadmin this thing into working..

Mike Smith (11:39:02): > Yer sure. There’s definitely some rough edges to smooth out, but I’m happy to try and do that. We’ll have to work out some way to actually get the results in the repo, I guess giving my Github account access would be sufficient

Rob Amezquita (11:40:29): > lets start there!

Aaron Lun (11:40:55): > Creating PNGs instead of SVGs would fix the file size problem.

Rob Amezquita (11:41:15): > but also sorry i dont want to like, load you up with responsibility either mike, because i know its a bear to sysadmin this helluva project

Rob Amezquita (11:42:25): > so please shout out if you need help with getting something going - and as long as i manage R package installs the book should build fine assuming no bugs here at the hutch if a manual build is needed for whatever reason

Sean Davis (13:33:59): > If you all have a dockerfile that you can share, along with instructions on how you use it to run the build, that would be great. We can potentially orchestrate this using a cloud-based process, so less tied to one infrastructure/person.

Mike Smith (14:04:02): > Sure. It’s actually running on singularity, just based originally on the bioconductor_full docker. I’ll polish it up and then push to the main repo for the book.

Rob Amezquita (14:31:59): > for now im going to recompile the book - just added this to the README for OSCABase, but if we have the Cairo package present, it’ll render figures using the CairoPNG device - im rerunning bookdown now so I can push the latest version up, excited to see it live

2020-01-10

Kevin Rue-Albrecht (08:41:40): > Nature vol 577 page 160 :yum: - File (JPEG): Image from iOS

Rob Amezquita (10:57:35): > one thing i ran into is retaining the images (bioconductor sticker, workflow diagrams, the favicon, book cover, etc) and the CNAME file in the final build. that needs to be somewhere (maybe sundries?)

Rob Amezquita (10:58:51): > that said, the book has been officially updated: https://osca.bioconductor.org/ - Attachment (osca.bioconductor.org): Orchestrating Single-Cell Analysis with Bioconductor > Online companion to ‘Orchestrating Single-Cell Analysis with Bioconductor’ manuscript by the Bioconductor team.

Aaron Lun (11:30:13): > Some of that stuff doesn’t seem like it needs to be in the repo. The BioC sticker should be in BiocStickers, the workflow diagram can be pulled from the paper.

Rob Amezquita (11:32:25): > so youre thinking pull it from online resources?

Rob Amezquita (11:32:29): > that sounds doable

Rob Amezquita (11:32:49): > but the favico must stay

Aaron Lun (11:33:05): > See how I’m pulling it for the memes.

Rob Amezquita (11:34:57): > we must always have a WIP section so that the memes stay

Rob Amezquita (11:35:46): > another thing that will be good to do is to track views/downloads of a PDF book (working on compiling that @Federico Marini right now!)

Federico Marini (11:55:39): > Yayyy :slightly_smiling_face:

Stephanie Hicks (13:22:25): > thank you for getting the book built @Rob Amezquita!!

Rob Amezquita (14:44:59): > really all thanks to @Mike Smith and @Aaron Lun bug fixes i just hit the play button and it finally worked :sweat_smile:

Rob Amezquita (14:45:05): > some exciting updates in there!

Stephanie Hicks (15:55:49): > thank you @Mike Smith @Aaron Lun!!

2020-01-13

Peter Hickey (01:03:21): > i’m teaching a 2-day workshop next month based on https://osca.bioconductor.org/. > this is a non-commercial event for researchers at WEHI and perhaps other research institutes. > > i’m not a lawyer but this seems permitted under the ‘CC BY-NC-ND 3.0 US’ license (https://creativecommons.org/licenses/by-nc-nd/3.0/us/). > However, my reading of the license is that i can’t distribute any slides adapted from this material (the ‘no derivatives’ clause). > is that the intention? > > Finally, is it okay to include figures from OSCA in a flyer advertising the workshop? - Attachment (osca.bioconductor.org): Orchestrating Single-Cell Analysis with Bioconductor > Online companion to ‘Orchestrating Single-Cell Analysis with Bioconductor’ manuscript by the Bioconductor team.

Aaron Lun (01:11:16): > I can’t imagine there being a problem.

Aaron Lun (01:15:52): > Though note that some functions are devel only, so keep an eye out for those.

Sean Davis (06:27:27): > My reading of the No Derivatives is that slides would be a derivative. Copying an entire chapter to support a course would not be. It might make sense to go back to contributors to ask about changing the license to https://creativecommons.org/licenses/by/4.0/.

Rob Amezquita (10:47:33): > Yeah can definitely update license on next go around - I had thought “no derivatives” would mean taking a chapter, editing small parts of it, and claiming it as your own. But can change the license to the one you propose @Sean Davis I don’t have strong feelings about it

Rob Amezquita (10:49:15): > And @Peter Hickey realistically, even if you were infringing, I don’t think we have enough cash to litigate so…free for all! Good luck on the workshop (and if/when you get a chance, would love to see the slides!)

2020-01-14

Stephanie Hicks (09:42:07): > that works for me too!

2020-01-16

Aaron Lun (15:30:56): > We’re not rebuilding.

2020-01-17

Aaron Lun (17:48:47): > Hey! @Rob Amezquita!

Rob Amezquita (17:49:09): > boop!

Rob Amezquita (17:49:14): > shall we rebuild from the FHCRC servers

Aaron Lun (17:49:53): > Well, whatever you were doing before.

Rob Amezquita (17:49:59): > a completely manual build, haha

Aaron Lun (17:50:06): > Until someone figures out a better way to do it.

Rob Amezquita (17:50:45): > yessiree!

Rob Amezquita (17:51:50): > i gotta revamp the build scripts to work from OSCABase - should I just add a build script inside of OSCABase or have it separate from the repos and just have one that works on the cluster?

Aaron Lun (17:52:11): > What kind of build scripts can there possibly be?

Aaron Lun (17:52:44): > Surely everything is captured inside sundries.

Rob Amezquita (18:38:00): > images for one didnt seem to get imported into there (problem with linking out to publication for image is that its walled off i would think?), but so the script i would need to make for the FHCRC servers to build nightly again would a) install/update packages on the head/login cluster, where HW isnt an issue and then b) kick off a job on a compute node to compile the book via sbatch

Rob Amezquita (18:38:39): > kinda sorta already in _cron.sh just need to update that (in older commits of Orch* repo)

Aaron Lun (19:06:43): > What? The NM image is walled off? I thought we paid for OA

2020-01-18

Aaron Lun (22:32:32): > I fixed the image thing. Give me some builds.

2020-01-21

Vince Carey (07:32:03): > @Vince Carey has joined the channel

2020-01-22

Aaron Lun (00:32:12): > Builds? Builds!? My kingdom for some builds!

Aaron Lun (00:35:40): > @Mike Smith can you set up an auto-pusher to the Orchestrating Github from your end? It looks like your system is actually working.

Mike Smith (15:25:26): > Yep sure, I’ll take a look tomorrow.

2020-01-23

Aaron Lun (16:45:53): > @Mike Smith?

Mike Smith (17:37:40): > Tomorrow tomorrow?

Aaron Lun (19:12:57): > That’s the weekend?

2020-01-25

Aaron Lun (20:32:05): > So.

Aaron Lun (20:32:07): > @Mike Smith

2020-01-27

Aaron Lun (20:09:41): > Hello?

Aaron Lun (20:09:43): > Anyone?

Aaron Lun (20:35:40): > Is @Rob Amezquita still alive? And will @Stephanie Hicks finish the trajectory chapter?

Stephanie Hicks (20:38:48): > i’m alive! and slowly digging myself out of my completely overwhelming fall semester of teaching. Good news is that i’m not teaching this spring :tada:

Stephanie Hicks (20:39:13): > so finishing that chapter is something i’m very close to finishing!

Aaron Lun (20:39:51): > good, good.

2020-01-28

Helena L. Crowell (05:09:22): > @Helena L. Crowell has joined the channel

Helena L. Crowell (05:20:19): > Dear all, > I noticed a while ago that although the title of this (totally awesome!) resource is “Orchestrating single-cell analysis”, it is, to my knowledge, missing a workflow for analysis of single-cell mass cytometry (CyTOF) data. > > A couple of years back, I wrote CATALYST for preprocessing of CyTOF data. It has since been extended to provide a framework for differential discovery (DA and subpopulation-specific state changes a la chapter 14) drawing from an F1000 workflow. > > At the time, the package was pretty terrible (!!) Locally, we have now developed an all-things-SCE way of doing things (including preprocessing, gating, visualisation, downstream analysis), which leverages, for example, scater for visualization & dimension reduction, diffcyt for differential testing (which in turn uses edgeR and limma). The main goal being > * to get away from the rather ugly data structures used in the cytometry community (i.e., flowFrames and flowSets and GatingSets). > * to in turn have better communication with all the other tools available that are applicable to this data. > * to eventually omit the need of uploading things to Cytobank, FlowJo, R and back and forth for analysis. > Now, sorry for the long rant. Long story short: > I was wondering if there was any interest at all to incorporate a chapter on “Analyzing CyTOF data” that includes a full R-/Bioc-based pipeline for preprocessing, gating, and differential discovery?

Kevin Rue-Albrecht (06:46:21): > There are quite a few people around me who work with CyTOF data (I haven’t had the pleasure yet). My point being that a resource/reference for a Bioc workflow would likely be beneficial. I know I would look it up if I had to prepare myself for an analysis with an overview of a typical workflow.

Kevin Rue-Albrecht (06:49:52): > I’m sure there are workflows floating around the web, but OSCA seems like a natural place for any data modality linked with single-cell analysis. Integration with protein abundance is already mentioned for CITE-seq (https://github.com/Bioconductor/OSCABase/blob/af32759067cd18e10c1ad565b6aa4c3620e7afa6/analysis/protein-abundance.Rmd#L7), so why not CyTOF ?

Aaron Lun (12:09:03): > I don’t mind, but one thing to be aware of is that this book is already very large, and any error at any point will break the build. (And no, please don’t tell me caching is the answer.)

Aaron Lun (12:11:22): > Actually, I guess I do mind. The cytof chapter doesn’t seem like it would share anything with the other chapters in terms of processing, normalization, transformations, etc., so the benefit of having it as another chapter in the same book seems a bit tangential. CITE-seq is only thrown in because it is (i) still count data and (ii) tightly integrated with gene expression data.

Aaron Lun (12:16:37): > I think you would benefit from an entirely separate book, then you would actually get multiple chapters to talk about the various gating, QC and normalization strategies. We can help with some of the logistics; despite its appearance, OSCA actually involves a fair amount of thought into how to coordinate data across chapters.

Aaron Lun (12:17:22): > In fact, I would very much like to see a book for all cytometry, something like cyto.bioconductor.org

Aaron Lun (12:18:11): > But that’s probably a bigger project.

2020-01-29

Helena L. Crowell (00:29:58): > Thanks Aaron for killing all my hopes and dreams as I knew you would. > Ok, that makes sense. So maybe you experienced ones have some advice on how one could go about this? Did you all meet like on the street? I’m thinking I know a fair amount of competent people in the cyto area and could reach out to some… > Independent of osca or not, I do think it’s totally worth having a bioc resource for all things cyto, and I’m sure it’d be appreciated by many.

Aaron Lun (00:31:13): > Build it and they will come.

Aaron Lun (00:32:25): > Or in other words: promises are cheap. Start working and see who puts their money where their mouth is with a PR.

Tim Triche (12:44:00): > @Helena L. Crowell I used your workflow as a template for analysing Hourigan’s CyTOF data alongside both bulk and 10X scRNAseq from the same donors. I recently assigned the comparison as a workflow writeup to a rotation student (because I’m a psychopath, I guess). I could put him in touch if you’re willing to be a bit patient; I did already write the code for all three datasets (importing so that they can be compared) and I figured this type of setup is a good benchmark for “what happens when I do X in preprocessing, Y in integration, or Z in experimental design, on a real dataset with multimodal measurements of biological replicates?”

Tim Triche (12:44:53): > Chris (Hourigan) is supportive and, as it turns out, has extra aliquots of all the marrows involved, in the event that something else needs to be standardized (or would greatly advance the field if it were). So there’s that.

Tim Triche (12:46:09): > We implemented a “reprocess a remote 10X BAM” stream-based approach for incrementally testing different approaches to quantification, normalization, integration, etc. so that people don’t end up with huge files all over the place. I don’t know if something like that is feasible with the CyTOF data; I just downloaded it into my Dropbox and processed it.

Tim Triche (12:46:27): > There are a number of features of Hourigan’s design that (IMHO) recommend it for this particular task.

Tim Triche (12:47:54): > If the workflow is compact and people take a shine to it, perhaps there will be interest in merging it back into OSCA. But it seems like a workflow and paper demonstrating how to do these things on samples that have had all the same stuff measured would make a fine stand-alone example too.

Aaron Lun (12:50:43): > If you make a PR to OSCABase, I will review it for suitability.

Aaron Lun (12:51:44): > I should note at the start that any non-R dependency is a deal-breaker.

Aaron Lun (13:01:07): > Well, I guess that depends on whether basilisk might be usable at that point. But even that’s pretty touch and go.

Tim Triche (13:01:50): > streampipe is separate from R; all the code that I used for my first pass at bulk, 10X, and CyTOF data loading for the samples is in R. If anything, we might provide the output of streampipe/scanpy/scvelo as an option.

Tim Triche (13:02:27): > I am interested in whether @Helena L. Crowell immediately recoils upon witnessing my clumsy approach at CyTOF data loading :slightly_smiling_face:

Tim Triche (13:03:44): > CyTOF not 10X. I just used DropletUtils for 10X, at least when I initially wrote the code. Now I’m more interested in better using the spliced and unspliced counts. That might create a job for basilisk, since I’m not aware of anything that works as cleanly as scvelo and runs in R…

Tim Triche (13:06:33): > For a first pass, though, I’d prefer to keep everything in R, to the extent of just loading an SCE of output from our usual streampipe processing if any exploration of velocity-based trajectories is relevant. (The CyTOF data is more dense but lower-dimensional than the 10X data, with about an order of magnitude more observations per subject than 10X cells post-filtration.)

Tim Triche (13:09:44) (in thread): > See rambling conversation between Aaron and myself. I wrote something that attempts to do this, but it’s not good enough. I’ve assigned it to a rotation student, and he agreed (perhaps unknowingly) to go after it, so… it might be a great opportunity for him to have your guidance, as I’m not a CyTOF expert. Lab colleagues have done a lot of IMC/Helios work, but the prospect of learning across bulk/sc/CyTOF data is an exciting one.

Aaron Lun (13:29:15): > Well, if you make a minimalist PR, I will be happy to look at it.

Aaron Lun (13:31:21): > It should have some kind of general theme to be a topic chapter; otherwise you should consider making it a workflow chapter.

Tim Triche (13:58:24): > The original idea (thematically) was to have matched data on biological replicates as a framework for evaluating processing/analysis methods, and coincidentally it highlighted the ready availability of Bioconductor libraries to work with all three data types. I’m not sure any CyTOF-loading code exists in Python, for example. If it does I don’t know that I’d trust it.

Tim Triche (13:59:02): > Arguably, there’s substantial overlap with (say) CITE-seq, but the amount of events you can collect with CyTOF dwarfs that of CITE.

Helena L. Crowell (13:59:24): > *Just a random note that I’m at dinner but am excited to reply late :)

Tim Triche (13:59:46): > OK cool. I was going to go talk with Brandon and was curious what you thought. I’ll take that as an “it’s not completely insane” :wink:

Tim Triche (14:00:32): > We have trajectories, etc. plotted on all of the samples already, but that’s one of the areas where I felt like More Research Is Needed (tm) after looking at the results. Happy to take this into a sidebar, email, whatever.

Tim Triche (14:01:41) (in thread): > I just saw your muscat preprint. I do believe this may have legs. Say hi to Marc :wink:

Aaron Lun (16:47:08): > Regarding the theme: the closest one I would consider to your description would be “benchmarking”. Do be mindful of the tone in which the chapter is written. The emphasis is on “this is what you can do, using this data as a demonstration”. It would be out of scope to go into a detailed discussion of the peculiarities of this particular dataset and its interpretation; if you want to do that, write a paper :slightly_smiling_face:

Tim Triche (17:34:01): > Yes and yes

Tim Triche (17:34:34): > Sort of a flip side to the CITE chapter.

2020-01-30

Aaron Lun (12:21:09): > @Mike Smith is this build happening or what?

2020-02-02

Mikhael Manurung (03:35:31) (in thread): > Happy diffcyt/CATALYST user here :wave: I am willing to participate in writing this up. It would be nice indeed to have an OSCA-like resource but for the cytometry community.

2020-02-03

Laurent Gatto (13:26:58) (in thread): > Hi @Helena L. Crowell et al. We (@Chris Vanderaa and myself) are doing some developments on mass spectrometry-based proteomics at the single cell level (see for example the SCoPE2 technology http://dx.doi.org/10.1101/665307), using SingleCellExperiment objects and testing some of the pipelines in the osca book. Depending on how things evolve and mature, we thought it would have been a possible addition to the osca book (given that the title is technology agnostic) - probably not then.

Tim Triche (13:43:02): > @Laurent Gatto consider looking in #cyto-spec-book

Helena L. Crowell (13:58:13): > Yes, so I made a channel #cyto-book (prob should come up with a better name) to discuss & collect ideas. If you’re not opposed to the idea, I’m thinking a proteomics bioc book would be a good home for all cyto & mass spec related things. I started putting together some things in a bookdown & am happy to share a (mostly empty) draft at some point, when it’s no longer too embarrassing. > Unfortunately I’m quite busy with moving to the US this month, but will continue when I can & am hoping we can get together some highly motivated people to contribute. I already have a few lined up :)

Helena L. Crowell (14:05:14) (in thread): > That sounds great Laurent. Just saw this & I replied in the osca channel. > I am very motivated to make this crazy idea into something & am holding a couple meetings this week. Also I’m planning to get some folks from the cyto community involved whom I know, including imaging mass cytometry pros. > Mass spec would fit in quite well I think! Though osca is near 40 chapters with the title “…Single-Cell…” it seems there’s no room for all I (and maybe you as well) would want to say about other data. And just 1 chapter seems like missing out on many things.

Laurent Gatto (16:17:24) (in thread): > Thank you Helena. I will be looking forward to your cyto-book initiative. In addition to mass spec, we also have interest in flow cytometry in my lab.

Laurent Gatto (16:18:00) (in thread): > Thanks, and no worries, take your time. All the best with your move!

Laurent Gatto (16:18:47) (in thread): > By the way, re imaging mass cytometry, the #spatial channel might also be a good fit for that.

Aaron Lun (20:37:02): > Before you guys disappear to another channel, some parting words on book building. I can tell you that this book was a real pain, and that’s in the best case where I have effectively full control over both its content and almost all of the software packages that it uses. Even now, we cannot reliably build the thing (last successful build was on the 9th, and both builders have been MIA).

Aaron Lun (20:41:48): > The moral of the story is to keep your book’s scope tightly defined from an implementation perspective. Regardless of how both cytometry and mass spec are “proteomics”, it seems to me that these are really, really, two different books. I cannot imagine that MS preprocessing (MALDI-TOF? Orbitrap? I dunno) is the same as that from cytof, and it would seem that their dimensionalities are totally different; I would guess that your MS data is much higher dimensional than the ~50 tags we get from Cytof. Shoehorning both things into the same book makes it a real pain to build because the failure points are doubled. Remember, it only takes one error in one chapter to cause the whole compilation to stop, and this is where the fingerpointing begins.

2020-02-04

Helena L. Crowell (01:11:01): > Thanks… Very motivational all that :ghost: I get your point. Then again, no need to reach the same scope (>35 chapters). Let’s see. Too early to fight about it ;)

Helena L. Crowell (01:17:09): > So to summarize your advice: write your own book, we have no space for you! Never mind, don’t write a book at all, it’s too painful, or write two books instead :exploding_head: (just kidding)

Sean Davis (05:39:13): > Just to add here, consider doing something other than a book. I think all of us who have successfully produced one were surprised about the amount of work and the fragility of the bookdown system as a collaborative editing system. > Chapters are the meat of the book, are easy to produce and manage, and are publishable in an academic sense. Consider alternatives to a book such as partnering with a journal, producing bioconductor workflows, or a collection of independent websites, organized into a collection.

Matt N Tran (08:27:22): > @Matt N Tran has joined the channel

2020-02-05

Tim Triche (12:10:35) (in thread): > or maybe “write chapters, and don’t try too hard to force them into a narrative” which seems like it can help us avoid the towering complexity of (e.g.) OSCA

Aaron Lun (12:11:42): > The narrative is not really the problem.

Aaron Lun (12:11:59): > It’s simply dependencies.

Tim Triche (12:12:10) (in thread): > This was kind of my sensation all along – a full book is thankless and mostly un-recognized work compared to a glam paper (OSCA Nature Methods) or even a heavily used package’s vignette. If we can write standalone workflow “chapters” we have a chance, if we can’t it will likely die

Tim Triche (12:12:22): > And yes the dependencies seem to pile up in a hurry.

Tim Triche (12:12:39): > I think we discussed CATALYST issues in this very forum some months ago

Tim Triche (12:12:53): > in fact that’s where the code I posted in #cyto-spec-book got debugged

2020-02-07

Rob Amezquita (12:38:08): > Going to chime in here @Helena L. Crowell - given the title, yes, makes sense that we should include all things single cell, including mass/flow cytometry analysis, but given the current book’s structure, at this point it would probably be out of scope especially since (as far as I know) it uses fairly different pipelines. > > Also, yeah, like @Tim Triche (or someone?) said, maintaining a book of this complexity is fairly thankless, esp since once it gets to this size typically CI/CD tools don’t work due to the long build times, and it’s hard to keep up with the changes that happen (more a blessing than a curse, and I’m grateful @Aaron Lun has spearheaded making OSCA the necronomicon that it is!! wouldn’t be as awesome as it is without his tireless efforts). > > given the current issues with keeping the build going (I’m able to come in/fix things/manually build at a rate of ~1/month), I think adding the complexities of cytometry would make it even more difficult than it already is for me/@Mike Smith. > > all that said, yeah, I think a separate flow/mass cytometry analysis book would be awesome and more than warranted. in fact, @Raphael Gottardo and @Greg Finak have been discussing this as well internally in our group. so you’d have support from the getgo from other willing collaborators as well! > > re: narrative: I think this is something the community sorely needs - the “BiocVerse” is extremely complex and difficult for newcomers to navigate, esp since Bioconductor has taken a “micro-packages” (dare I say tidyverse?) approach (which from an engineering perspective, is much more sane than everything-in-one) - so I think books such as OSCA are very valuable in that they provide a cohesive framework for folks to read/understand various steps.
vignettes for individual packages with more detail will always be necessary, but books to tie up the various packages into cogent narratives are imho the next necessary step that bioconductor needs to take to continue to be viable. it’s the whole reason @Raphael Gottardo and @Stephanie Hicks and I wanted to start OSCA in the first place (with hat tip to simpleSingleCell by Aaron as additional inspiration), and I would love to see more of these sorts of resources happen!

Raphael Gottardo (12:38:12): > @Raphael Gottardo has joined the channel

Greg Finak (12:38:12): > @Greg Finak has joined the channel

Aaron Lun (12:39:45): > @Rob Amezquita It looks like you’re back, so maybe you could get to work on the interoperability chapter.

Aaron Lun (12:40:00): > Or the trajectory one, I don’t care which one.

Rob Amezquita (12:40:44) (in thread): > yeah it’s getting close to my 1/month self commitment so that’s in order :slightly_smiling_face:

Rob Amezquita (13:03:31) (in thread): > yeap, totally agree on the fragility issue…my only thing with the bioconductor workflows is their lack of visibility/polish compared to having a fully compiled book - maybe there could be a good compromise solution in the works that puts great workflows/tomes front and center with their own domain a la osca.bioconductor.org and that have a more, how to say, book-ish styling? agreed though, the bookdown business has been a freaking nightmare (although nonetheless, impressive that it’s even possible with all R)

Sean Davis (13:38:44) (in thread): > Agreed that branding is important.

Rob Amezquita (13:54:14) (in thread): > absolutely! i mean, science in some ways is marketing right? :slightly_smiling_face:

2020-02-10

Mike Smith (16:29:58): > I’ve updated the builder so you can choose to clear the cache or not, and now it will (hopefully) create a preview git branch if a build completes successfully and push this to Github. The author can then take a look at the rendered HTML version at www.huber.embl.de/users/msmith/osca-builder/docs and check they’re happy, before doing a pull request to the master branch. That way we don’t automatically override what’s currently published, but anyone can trigger a build and anyone with permission can update the website.

Mike Smith (16:31:33): > However, it’s currently failing because the NNLM package has been retired from CRAN (https://cran.r-project.org/web/packages/NNLM/index.html) and so we get the error: > > Quitting from lines 255-274 (P2_W05.reduced-dimensions.Rmd) > Error in loadNamespace(name) : there is no package called 'NNLM' >
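
A hedged sketch of how a chapter chunk could guard against a dependency disappearing, as happened with NNLM above. The helper name `use_if_available` and the placeholder package name are illustrative assumptions, not part of OSCA or OSCAUtils; only base R's `requireNamespace()` is relied on.

```r
# Hypothetical helper (not from OSCAUtils): run `expr_fun` only if `pkg` is
# installed; otherwise emit a message and return `fallback`.
use_if_available <- function(pkg, expr_fun, fallback = NULL) {
    if (requireNamespace(pkg, quietly = TRUE)) {
        expr_fun()
    } else {
        message("Package '", pkg, "' is not installed; skipping this step.")
        fallback
    }
}

# 'stats' ships with R, so the function is executed.
res_present <- use_if_available("stats", function() stats::median(c(1, 5, 9)))

# A made-up package name standing in for a retired package: the function is
# never called and the fallback is returned instead of aborting the build.
res_missing <- use_if_available(
    "notARealPackage12345",
    function() stop("never run"),
    fallback = NA
)
```

Whether skipping a step is acceptable depends on the chapter; for content that later sections rely on, failing loudly (as the builder did here) may still be the better behaviour.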

Aaron Lun (16:31:34): > Yeah!

Aaron Lun (16:31:41): > Okay, that’s fine, I’ll deal with that.

Aaron Lun (16:31:56): > Thanks @Mike Smith.

Mike Smith (16:32:40): > I expect the auto push stuff will fail the first time it gets there, but once it’s building I’ll iron out the kinks.

Aaron Lun (22:33:16): > It’s… it’s beautiful.

Aaron Lun (22:36:57): > I think the other proposed books will benefit greatly from this infrastructure. If it helps, I can simplify our book’s build process so that the only special thing you need to do is install OSCAUtils (i.e., no compileWorkflows() or spawnBook()); after that, it’ll be a straight run through bookdown. Then you can just plug and play different books into the same build system.

Aaron Lun (22:37:55): > (However, it would still be helpful to have the two buttons so that caches are not cleaned out unnecessarily, which allows easier debugging and reduces the computational load on the build system.)

2020-02-11

Aaron Lun (01:17:44): > Hm. 350000 cells takes a while.

Aaron Lun (01:18:08): > I was hoping it would be done in < 1 hour, but I guess IO on a server is slower than my SSD.

Mike Smith (02:46:57): > I need to port the base images over to Nitesh’s new Docker files, although hopefully that won’t bring everything crashing down.

Aaron Lun (02:47:25): > No probs, just trigger a new job when you’re done.

Aaron Lun (02:47:34): > I’ll be asleep before this one finishes anyway.

Mike Smith (02:48:37): > I already maintain Wolfgang’s MSMB book in a totally different build system, so it would be great to try and formalize this for others to use. It’d also be nice to have the log for previous builds. I wanted to see what had gone wrong with the previous run, but by the time I opened it not on my phone screen you’d already kicked off a new build.

2020-02-12

Aaron Lun (02:54:58): > Fixing bugs…

2020-02-19

Daniel Newhouse (11:27:22): > @Daniel Newhouse has joined the channel

2020-02-20

Aaron Lun (19:23:30): > Can you make some gifs to accompany the chapter?

2020-02-25

Aaron Lun (00:03:23): > Whooops. @Federico Marini see comment above. But then I realized it would be easier to just make videos and link to them.

Aaron Lun (00:03:50): > You could make some videos with some Bioconductor-licensed BGM

Aaron Lun (02:17:41): > @Mike Smith It’s working nicely but are you sure that “build” doesn’t clear the cache? The timings seem a bit slower than I thought they would be; hard to tell, but if you can confirm then I know what to worry about re. package efficiency.

Federico Marini (03:31:34): > Guess videos are easier to control in terms of playback

Federico Marini (03:31:51): > but if you want there is the oneliner I use to convert mov to gif

2020-03-01

Aaron Lun (19:48:39): > too long.

Aaron Lun (19:48:42): > slingshot takes too long.

Aaron Lun (19:48:52): > Only 14000 cells and I’m clocking past 20 minutes.

Aaron Lun (19:49:06): > It should have been done in <2 minutes, max.

2020-03-02

Mike Smith (09:36:20): > Which section is this? I’ll run it outside of the book builder and check how long it takes

Mike Smith (09:54:24) (in thread): > When you press ‘Build’ it sets the environment variable FRESH=0 when the job is launched. This variable is printed in the very first line of the output build log so I can check. Assuming FRESH=0 the build script then follows: > > ## Make sure OSCAUtils is up to date + all other packages > BiocManager::install("Bioconductor/OSCABase", subdir = "package", update = TRUE, ask = FALSE) > > ## /scratch/msmith/OSCA should be populated if FRESH=0 & we've run at least once > OSCAUtils::spawnBook("/scratch/msmith/OSCA") > > ## Part 1 - Common workflow building > OSCAUtils::compileWorkflows("/scratch/msmith/OSCA", fresh=FALSE) > > ## Part 2 - Compile complete book > setwd("/scratch/msmith/OSCA"); bookdown::render_book("index.Rmd", "bookdown::gitbook", quiet = FALSE, output_dir = "docs", new_session = TRUE) > > Let me know if you see something wrong in that workflow
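
A minimal sketch of how a build script might turn the `FRESH` environment variable into a logical flag. The message above only confirms that ‘Build’ sets `FRESH=0`; treating `FRESH=1` as "clear the cache" is an assumption, and `should_clear_cache` is a hypothetical name rather than an OSCAUtils function.

```r
# Assumption: the builder exports FRESH as "0" (keep the cache) or "1"
# (clear it). Anything else is rejected so a typo cannot wipe the cache.
should_clear_cache <- function(value = Sys.getenv("FRESH", unset = "0")) {
    if (!value %in% c("0", "1")) {
        stop("FRESH must be '0' or '1', got: '", value, "'")
    }
    value == "1"
}

# Simulate the two buttons of the builder.
Sys.setenv(FRESH = "0")
keep <- should_clear_cache()   # FALSE: reuse cached results
Sys.setenv(FRESH = "1")
clear <- should_clear_cache()  # TRUE: start from scratch

# Echo the current value so it shows up in the build log.
message("FRESH=", Sys.getenv("FRESH"))
```

The resulting logical could then be handed to the `fresh=` argument seen in the script above instead of a hard-coded `FALSE`.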

Aaron Lun (11:38:27): > Hm. Looks fine.

Aaron Lun (11:38:51) (in thread): > It’s a PR; not yet in the book, not your problem.

2020-03-08

Aaron Lun (03:51:30): > @Mike Smith it builds! but can’t find git.

2020-03-09

Mike Smith (05:08:20): > That sounds like I’ve reached the point where the image manifest needs to be changed, so I’ll switch over to Nitesh’s new docker images and run some tests.

2020-03-11

Aaron Lun (20:16:43): > @Kelly Street I’m going to hit your repo with a whole bunch of requests.

Kelly Street (20:21:37): > @Kelly Street has joined the channel

Kelly Street (20:24:22) (in thread): > Also, just saw your earlier comment about runtime, but approx_points might help with that?

Aaron Lun (20:24:35) (in thread): > Yes, was testing that now.

Aaron Lun (20:24:45) (in thread): > Broke my R installation by updating brew, so it’s taking a while.

Aaron Lun (20:30:33) (in thread): > looks pretty good

2020-03-22

Leonardo Collado Torres (22:00:07): > @Leonardo Collado Torres has joined the channel

2020-03-24

Aaron Lun (01:07:14): > One of the odd benefits of working from home is that I now have two laptops. So I can spare my usual home computer to compile the book while I do stuff on my work computer. Ho ho ho.

Aaron Lun (01:07:25): > And with that, the book has been recompiled.

Aaron Lun (01:19:21): > @Kelly Street did my requested changes make their way through to BioC-devel?

Aaron Lun (01:19:52): > On a related note, you can see the latest draft of that chapter. https://osca.bioconductor.org/trajectory-analysis.html - Attachment (osca.bioconductor.org): Chapter 17 Trajectory Analysis | Orchestrating Single-Cell Analysis with Bioconductor > Online companion to ‘Orchestrating Single-Cell Analysis with Bioconductor’ manuscript by the Bioconductor team.

Aaron Lun (01:20:04): > Bit messed up but there’s some solid slingshotty goodness down there.

Peter Hickey (01:22:40): > some rmarkdown weirdness at the start > > processing file: P3_W12.nestorowa-hsc.Rmd > > | > | | 0% | > |. | 2% ordinary text without R code > > | > |.. | 4% label: unref-setup (with options) List of 2 $ echo : logi FALSE $ results: chr “asis” > > | > |…. | 5% ordinary text without R code > > | > |….. | 7% label: data-loading Loading required package: SingleCellExperiment Loading required package: SummarizedExperiment Loading required package: GenomicRanges Loading required package: stats4 Loading required package: BiocGenerics Loading required package: parallel >

Aaron Lun (01:25:47): > Oh yeah, that’s intentional, to show how much effort was involved in getting the whole thing to work.

Aaron Lun (01:25:53): > Just kidding. Looking at it now.

Aaron Lun (02:22:58): > Should be fixed. Will build again (sigh) tomorrow.

Kelly Street (10:22:47) (in thread): > embedCurves is there! Should I also add the function that extracts the data.frame for plotting curves? > And the branch identification stuff is still just on the branchID branch on github. If that was working as it’s supposed to, I would be open to adding it, as well.

2020-03-25

Aaron Lun (00:55:56) (in thread): > @Kelly Street yes and yes.

brian capaldo (13:30:22): > @brian capaldo has joined the channel

2020-03-27

Aaron Lun (12:55:07): > Nope.

Aaron Lun (12:55:27): > I’ve no appetite to do that, but you’re more than welcome to push it.

2020-03-31

Aaron Lun (00:28:37): > @Mike Smith has there been any progress on the builder, or do I continue compiling it on my laptop?

Dan Bunis (13:31:31): > @Dan Bunis has joined the channel

2020-04-02

Talha (07:39:01): > @Talha has joined the channel

2020-04-09

Aaron Lun (01:18:59): > @Rob Amezquita I see that no progress has been made on the interoperability chapter.

2020-04-30

Devika Agarwal (05:34:26): > @Devika Agarwal has joined the channel

2020-05-04

Aaron Lun (02:38:53): > We need to sort out this release/devel situation on the book.

2020-05-05

Stephanie Hicks (07:25:38): > I apologize for being out of the loop @Aaron Lun. Are you looking to have two versions of the book (eg devel vs release)? That makes a lot of sense

Stephanie Hicks (07:26:04): > What is needed to make that happen?

Lukas Weber (23:07:19): > @Lukas Weber has joined the channel

2020-05-06

Aaron Lun (04:45:25): > Probably need another repo. Something like https://osca-dev.bioconductor.org/. Who set this up in the first place? Was it @Martin Morgan?

Stephanie Hicks (07:25:03): > Yes, I believe @Rob Amezquita asked @Martin Morgan

Martin Morgan (08:37:17): > @Martin Morgan has joined the channel

Martin Morgan (08:37:44) (in thread): > happy to facilitate with whatever; let me know the name and who to give admin access to…

Avi Srivastava (11:04:23): > @Avi Srivastava has joined the channel

Aaron Lun (12:22:56) (in thread): > I think I’d need a Bioconductor/OrchestratingSingleCellAnalysis-dev repository, and a copy of whatever URL magic was used to get https://osca-dev.bioconductor.org/ to point to the resultant GitHub pages.

Martin Morgan (12:27:16) (in thread): > is it confusing to have osca & OrchestratingSingleCellAnalysis repos?

Aaron Lun (12:29:04) (in thread): > I was planning to rename the repos as follows: > * OSCABase -> OrchestratingSingleCellAnalysis-base > * OrchestratingSingleCellAnalysis -> OrchestratingSingleCellAnalysis-release > * And then the new OrchestratingSingleCellAnalysis-dev. Or devel, I suppose. > Should be pretty clear that all of those are the same set of things. The release and dev repos only exist to host content and should have no real commits on them; the base repo is where all the action is at, as it provides the raw material for the two others.

Martin Morgan (15:57:41) (in thread): > the repo and osca-dev.bioconductor.org should be good to go

Aaron Lun (15:58:02) (in thread): > okay, thanks. I’ll sort it out tonight.

2020-05-07

Aaron Lun (02:04:43): > @Martin Morgan speaking of which, I have another book that I would like to live in the Bioconductor domain name.

Aaron Lun (11:54:53): > oh man. Could you tell them to post on the support site? I don’t have a biostars account.

Aaron Lun (11:55:37): > thx

Aaron Lun (15:35:20): > DEV version is up: http://osca-dev.bioconductor.org/ - Attachment (osca-dev.bioconductor.org): Orchestrating Single-Cell Analysis with Bioconductor > Online companion to ‘Orchestrating Single-Cell Analysis with Bioconductor’ manuscript by the Bioconductor team.

Aaron Lun (15:35:33): > Do we have a stable location for the sticker that I can repoint to?

Marcel Ramos Pérez (15:41:39): > Do you mean https://github.com/Bioconductor/BiocStickers/?

Aaron Lun (16:12:12): > Or even better.

Aaron Lun (16:12:16): > Check it out now.

Martin Morgan (16:24:45): > but the animation synthesizes one strand 5’ -> 3’, the other 3’ -> 5’ ??

Aaron Lun (16:40:59): > @Kevin Rue-Albrecht

Kevin Rue-Albrecht (17:28:48): > Wooops

Kevin Rue-Albrecht (17:29:22): > :sweat_smile:

Kevin Rue-Albrecht (17:30:07): > I was so focused on making it look like a loading bar, I forgot biology for a moment… well an hour …

Martin Morgan (18:31:25): > if you’re tweaking, the helix is also incorrect (left vs right-handed)… :slightly_smiling_face:

Kevin Rue-Albrecht (18:38:11): - File (GIF): ezgif-5-83a653eae543.gif

Kevin Rue-Albrecht (18:38:33): > I haven’t tweaked the helix (I’m aware that has been a recurrent topic)

Kevin Rue-Albrecht (18:39:23): > before “optimisation” (ezgif) - File (GIF): ezgif-5-555bf51297e7.gif

Kevin Rue-Albrecht (18:44:35): > well i have no credit for the original sticker, I just borrow the sticker file from BiocStickers and patiently delete a circle at a time to make the GIF

Kevin Rue-Albrecht (18:45:51): > fixing the left/right-handing will be somewhat more complicated and left to another day, as it requires resizing - and maybe repositioning - each individual circle

Kevin Rue-Albrecht (18:46:56): > bonus (?) point for the new GIF: I’ve made the background transparent.

Kevin Rue-Albrecht (18:47:30): > Though I can’t really think of any good reason to put the GIF on any other background than white anyway

Marcel Ramos Pérez (19:23:56): > Maybe use gganimate?

2020-05-08

Kevin Rue-Albrecht (04:02:24): > thanks for the reminder @Marcel Ramos Pérez. gganimate has been on my list for a while (for other purposes), seems worth checking out for this application too

Kevin Rue-Albrecht (04:03:31): > I liked the idea of working with the original files to preserve Laurent’s work

Kevin Rue-Albrecht (18:26:36): > Forgot to bring over here the series of posts from #sticker_joy. Here is a GIF sticker with right-handed helix and 5’->3’ synthesis > I renamed the file without realising that it would break OSCA, so I’ve also opened a PR for http://osca-dev.bioconductor.org/ to update that - File (GIF): Bioconductor-parallel.gif

Aaron Lun (18:27:09): > They don’t seem to finish on time.

Kevin Rue-Albrecht (18:27:19): > Blue has an extra two circles

Kevin Rue-Albrecht (18:28:23): > Check out #sticker_joy. I had a version where blue started with an offset of 2 to finish in sync, but that looks even weirder

Kevin Rue-Albrecht (18:31:25): > Anyway, over and out for today. I’ve messed around enough with Laurent’s work for one day

Aaron Lun (18:33:00): > FWIW I would actually like to see a faster gif with one strand at a time, like you did before. Then you get a higher frame rate and you don’t have to worry about the lack of sync.

Martin Morgan (18:33:02) (in thread): > @Kevin Rue-Albrecht this is so great, Kevin! The helix is correct and replication is in the right direction!

Kevin Rue-Albrecht (18:36:38): > I’ll make it faster tomorrow, but you’re looking for a *Serial sticker: https://github.com/Bioconductor/BiocStickers/blob/master/Bioconductor/Bioconductor-serial.gif

Kevin Rue-Albrecht (18:38:00) (in thread): > Thanks! I’ve PR’ed a couple of versions on the BiocStickers. Feedback welcome about speed, etc.

Kevin Rue-Albrecht (18:39:26): > :bioc-load:

2020-05-16

Aaron Lun (03:34:02): > @Martin Morgan is osca.bioconductor.org pointing at the right place? I changed the repo name so it probably isn’t looking at https://bioconductor.github.io/OrchestratingSingleCellAnalysis-release/ anymore. - Attachment (bioconductor.github.io): Orchestrating Single-Cell Analysis with Bioconductor > Online companion to ‘Orchestrating Single-Cell Analysis with Bioconductor’ manuscript by the Bioconductor team.

Martin Morgan (06:48:31): > this file https://github.com/Bioconductor/OrchestratingSingleCellAnalysis-release/blob/327c497c9c189adc6e5e0f2fb444cd804e6a918d/docs/CNAME turns out to be important

Martin Morgan (12:31:29): > @Kevin Blighe no, it’s better to wait for the permanent solution (@Aaron Lun restoring CNAME) rather than pointing to a temporary fix…

Aaron Lun (14:05:23): > It is done.

Kevin Rue-Albrecht (14:07:58): > :) - File (GIF): Bioconductor-serial (1).gif

Aaron Lun (14:36:49): > I hope OSCA isn’t being used as bedtime reading for children. Not really the target audience, I must say.

Aaron Lun (14:58:09): > Good for them.

Aaron Lun (14:58:24): > I remember when I was a high schooler.

Aaron Lun (14:58:33): > Geez, that was a decade ago.

Aaron Lun (14:58:49): > Getting old.

Stephanie Hicks (15:02:45): > @Kevin Rue-Albrecht, I turned your gif into an emoji :bioc-hex:!

Kevin Rue-Albrecht (15:03:18): > Awesome! I did the note alone for that too :bioc-load:

Stephanie Hicks (15:03:27): > :thumbsup:

Stephanie Hicks (15:03:39): > :happy_goat:

Aaron Lun (19:58:39): > How many CPUs do I have to play with on the BioC build system for workflow packages?

Aaron Lun (19:58:57): > My laptop can’t take much more of running this 350k cell analysis routinely.

2020-05-17

Aaron Lun (01:38:21): > I’ve got it. I know how to deceive the BBS into building this bloody book.

Aaron Lun (01:47:09): > But first, I need to get this onto BioC-devel: https://github.com/LTLA/rebook.

Aaron Lun (01:47:38): > Oh, this’ll be great. I’ll be able to just relentlessly abuse the workflow build system to get it to make my book.

Aaron Lun (01:47:55): > No more waiting 3 hours for my laptop to churn through the book by itself!

Aaron Lun (01:48:30): > 2 of those hours are okay, I can still do other things. the last hour uses all my CPU and RAM so I need to do something else.

Aaron Lun (01:48:35): > Like go clean my apartment.

Aaron Lun (14:43:58): > We don’t even have an OSCA 1.

2020-05-18

Alexander Toenges (09:02:15): > @Alexander Toenges has joined the channel

Alexander Toenges (09:03:40): > The section on RNA velocity is currently blank. Is this under construction / released soon?

Aaron Lun (11:40:55): > It’ll be done whenever someone does it.

2020-05-21

Aaron Lun (19:13:15): > @Davide Risso > > library(scRNAseq) > se <- ZeiselBrainData() > > library(scater) > se <- logNormCounts(se) > > library(scran) > dec <- modelGeneVar(se) > hvgs <- getTopHVGs(dec, prop=0.1) > > system.time(se <- runPCA(se, subset_row=hvgs)) > > library(zinbwave) > out <- zinbwave(se[hvgs,], zeroinflation=FALSE, residuals=FALSE) > > I’m not getting any reduced dims in out.

2020-05-22

Davide Risso (08:27:20): > I will have a look… is this with Bioc release or devel?

Davide Risso (08:52:02): > Oh, I see the default value of K is 0, you should specify a K greater than zero, typically 10 works well, and the method doesn’t scale well with K so I would not go much beyond 10

Davide Risso (08:52:48): > In case you haven’t noticed, there is also the scry package in the new release which implements SCE friendly GLM-PCA which should be faster than zinbwave

Davide Risso (08:54:06): > > > out <- zinbwave(se[hvgs,], K=2, zeroinflation=FALSE, residuals=FALSE) > > out > class: SingleCellExperiment > dim: 967 3005 > metadata(0): > assays(2): counts logcounts > rownames(967): Plp1 Trf ... Tgfbr2 Syt7 > rowData names(1): featureType > colnames(3005): 1772071015_C02 1772071017_G12 ... 1772066098_A12 > 1772058148_F03 > colData names(11): tissue group # ... level2class sizeFactor > reducedDimNames(2): PCA zinbwave > altExpNames(2): ERCC repeat >

Aaron Lun (12:57:00): > Seems that the default value of K should not be zero. Kind of defeats the point.

Aaron Lun (12:57:23): > Anyway, I was hoping to get a quick and dirty method that uses the existing sizeFactors.

Aaron Lun (12:57:33): > Is this the GLM-PCA?

Aaron Lun (13:00:25): > Well, I’m going to stop you right there: https://github.com/willtownes/glmpca/blob/edc04cccc374644f9213015e17e8ee6d53f61e72/R/glmpca.R#L303 I can’t countenance that.

Davide Risso (14:28:18) (in thread): > Yeah, I can’t remember why we thought that was a good idea… probably for those that use it to compute the weights since K=0 is faster…

2020-05-23

Stephanie Hicks (00:53:38): > Yeah, it’s not great that line. There is a group who is interested in extending this to work with other matrices e.g. hdf5. But to be fair we just started on these extensions.

2020-05-25

Aaron Lun (18:32:21): > Also, tradeSeq::fitGAM is telling me that it’ll take 39 minutes for ~1000 cells. That’s like 20% of the time required to build the entire book!

2020-05-30

Aaron Lun (22:19:01): > I need an intron count dataset.

Stephanie Hicks (22:35:55): > doing some RNA velocity?

Aaron Lun (22:37:45): > For the book.

Stephanie Hicks (22:41:35): > right, so I know @Charlotte Soneson @Avi Srivastava @Rob Patro have generated a few count matrices with introns (https://github.com/csoneson/rna_velocity_quant) for https://www.biorxiv.org/content/10.1101/2020.03.13.990069v1

Stephanie Hicks (22:42:03): > hmm, but upon a closer look, this looks like the scripts to generate the matrices, not the matrices themselves

2020-05-31

Charlotte Soneson (03:09:57): > Happy to share some count matrices, e.g. the ones generated here: https://combine-lab.github.io/alevin-tutorial/2020/alevin-velocity/. Depending on how you’re planning to structure things, there are also example data sets built into scVelo (e.g. https://scvelo.readthedocs.io/Pancreas.html). - Attachment (combine-lab.github.io): Alevin Velocity > RNA Velocity with alevin

Aaron Lun (03:19:21): > It needs to be in a SCE.

Charlotte Soneson (03:54:11): > Ok. So I guess there are several options: > * I can share the SCE we got from the tutorial I linked to above > * I can share any of the objects we used for the paper that Stephanie mentioned (processed from raw data using the code in the GitHub repo linked in her post) > * If you want to get one of the scVelo example data sets in SCE format, you could do e.g. > > > download.file("https://github.com/theislab/scvelo_notebooks/raw/master/data/Pancreas/endocrinogenesis_day15.h5ad", "endocrinogenesis_day15.h5ad") > x <- Seurat::ReadH5AD("endocrinogenesis_day15.h5ad") > sce <- SingleCellExperiment(assays = list(counts = GetAssayData(GetAssay(x, "spliced")), spliced = GetAssayData(GetAssay(x, "spliced")), unspliced = GetAssayData(GetAssay(x, "unspliced"))), colData = x@meta.data) >

Aaron Lun (04:05:08): > Can’t you just make a PR into scRNAseq?

Charlotte Soneson (04:08:08): > Yes, I could do that, adding just an example for one of the data sets from the paper (I have more than 50 SCE objects in there).

Aaron Lun (04:09:44): > Or you can make your own EHub package. I don’t really care.

Aaron Lun (04:10:04): > I sure as hell am not going to take a scenic tour through AnnData and Seurat to get what I want.

2020-06-01

Aaron Lun (02:25:19): > @Charlotte Soneson where are these count matrices?

Aaron Lun (02:25:40): > from the alevin website.

Aaron Lun (02:27:02): > and what’s with the wonky styling on the website? half the screen is useless.

Helena L. Crowell (03:02:52): > @Helena L. Crowell has left the channel

Charlotte Soneson (03:21:32): > Here’s the SCE generated in the tutorial - File (Gzip): txis.rds

Aaron Lun (03:22:50): > I’ll need to be able to pull this from somewhere for the book builder to work.

Aaron Lun (03:23:33): > Perhaps it is just easiest to wrap up all the relevant code into an Rmarkdown and make a PR into scRNAseq.

Charlotte Soneson (03:28:51): > Yes, I’ll get to that

Aaron Lun (03:30:34): > How long does it take for that chunk to run?

Charlotte Soneson (03:31:24): > Which chunk? The whole tutorial up to the generation of the SCE?

Aaron Lun (11:05:53): > yeah

Charlotte Soneson (11:06:23): > The actual analysis takes maybe 30 min-1h or so, most of the time goes to download the data.

Aaron Lun (11:06:53): > okay. Maybe I’ll just get you to upload it to EHub as well.

2020-06-02

Stephanie Hicks (12:49:44): > random question. Would anyone here (or can you recommend someone) who has read and used the #osca-book online to help them analyze their own data be open to talk with a journalist from Nature about your experience with the OSCA book? He is writing up an article, and wants to talk to people who have been using it. His goal is to understand the impact of the book on the world of single cell through people’s experiences.

Ludwig Geistlinger (14:06:34): > @Ludwig Geistlinger has joined the channel

Peter Hickey (18:16:23) (in thread): > we’ve been using it as the basis for the reports we generate in the single cell core facility where i lead the bioinf. also used it to teach workshops

Stephanie Hicks (19:29:20) (in thread): > Would it be OK if I pass along your name and email to have the journalist reach out to you?

Peter Hickey (19:58:27) (in thread): > sure thing

2020-06-03

Chris Vanderaa (05:02:59) (in thread): > I have used it to dive into my first scRNA-Seq analysis. If that’s the interest of the journalist, I am willing to share my experience with the OSCA book as a newbie.

Stephanie Hicks (16:01:05) (in thread): > ok thanks! I passed along your name and email address @Chris Vanderaa

Aaron Lun (19:49:26): > FYI I’m making solid progress in tricking the BBS into building the book. Much sleight of hand required here to convince the system to run bookdown.

2020-06-06

Olagunju Abdulrahman (19:58:36): > @Olagunju Abdulrahman has joined the channel

2020-06-10

Aaron Lun (00:57:07): > Y’know @Martin Morgan I wonder whether we could have a page on the Bioconductor website that lists all of the available books. I already have two entries to add there (this one and the SingleR book), there’s the spatial book, and then there’s the DelayedArray developer’s guide.

Aaron Lun (00:57:18): > Even better if those books were all available in the bioconductor.org domain.

Aaron Lun (00:58:16): > Even better still if those books were built by the BBS. I have managed to get more than half of the OSCA book to build there using simpleSingleCell as a trojan horse, see http://bioconductor.org/checkResults/devel/workflows-LATEST/simpleSingleCell/malbec1-buildsrc.html.

Martin Morgan (05:09:33): > in some ways this sounds like an iteration on the ‘workflow’ concept, but where the presentation is something other than a single package. Or is it really that the ‘books’ could be wrapped in the package infrastructure, buying quite a bit in terms of infrastructure re-use?

Aaron Lun (10:21:01): > I think it may well be possible to build the book using the package infrastructure, but deployment will have to be another piece.

Aaron Lun (10:46:07): > There is an interesting question of how many compute resources the workflow builder has. Does it have more than my laptop? I guess we’ll find out!

Martin Morgan (12:24:57): > The workflow builder uses the same servers as the nightly builders, so they are quite powerful but competing with other uses; ‘long running’ is probably fine, while ‘using a lot of memory & tens of cpus’ would not work. It would be interesting to use books as a way to explore alternative build systems @Vince Carey e.g., rent a large compute node for an hour every week…

Aaron Lun (12:26:23): > Max resource usage for the book is the hour it takes to compile the HCA chapter (16 GB RAM, 10 CPUs). Don’t know if that’s large in your definition.

Martin Morgan (12:29:40): > That’s maybe 1/3 to 1/2 the compute power of the build system nodes, which I think are fully occupied maybe 16-20 hours a day. So it’s a substantial commitment that probably impinges on the overall builds… @Hervé Pagès will correct the details

Aaron Lun (12:32:10): > Hm. Guess I could turn the CPU usage down. What’s the workflow time limit?

Hervé Pagès (12:37:55): > @Hervé Pagès has joined the channel

Hervé Pagès (12:56:41): > 2h time limit for workflows (see TIMEOUT in glyph table at top of https://bioconductor.org/checkResults/3.12/workflows-LATEST/). The workflows builds are squeezed in a short window between 2 software build runs so giving them more time would create an overlap. I’ve been thinking of making the software builds take a break on Saturdays so we’d have the full power of the builders available for the Long Tests builds. There would probably still be enough time left to also run the workflow builds that day (with an extended time limit).

Aaron Lun (12:58:09): > Hm. Even at full resources, it still takes about 3 hours to compile the book on my laptop. Mind you, 2 of those hours are run with <8 GB and 1 core.

Aaron Lun (12:59:05): > Also it uses a lot of HDF5Arrays, and my laptop has an SSD, so I often budget a ~2-fold slowdown on a more distributed filesystem.

Hervé Pagès (13:02:00): > I think we could easily give the workflows builds 5 or 6 hours on Saturdays.

Hervé Pagès (13:04:03): > or more. I mean we would have a full 24 h window to run the Long Tests and workflows builds.

Hervé Pagès (13:04:25): > that should be enough?

Aaron Lun (13:07:43): > Yes, that should be sufficient.

Aaron Lun (13:08:37): > Are there any hard caps on the available compute resources (e.g., via cgroups) or is it a free-for-all?

Hervé Pagès (13:15:55): > No enforced ones besides the limitation of the hardware: 20 cores, 32 GB of RAM for the Linux builders. (The Linux builders are a little bit outdated and lagging behind with respect to the other builders but they’re still by far the fastest.) Of course we expect package/workflow authors to be good citizens and to not abuse the resources. Please don’t laugh!

Hervé Pagès (13:18:40): > FWIW we should get a much more powerful Linux builder soon (with SSD).

Hervé Pagès (14:03:05): > I see simpleSingleCell build time is very close to the 2h limit. We could probably increase that limit to 3h (plan A) and keep the current schedule (workflows builds run on Tuesdays and Fridays) but if we want to go beyond that we’d probably have to go for plan B (run the workflow builds once a week on Saturdays).

Hervé Pagès (14:09:43): > Plan C would be to have some dedicated builds for books maybe? The input to the build system should still be in the form of a package though so using the workflows builds for that sounds like a natural fit. Plan D is to tag some workflows as books (e.g. via a BiocViews term or via .BBSoptions) and have the workflows builder recognize the tag for all kinds of special treatments.

Aaron Lun (16:17:17): > Let’s see how far we go with A, and how much of the book is built before the TIMEOUT. Probably it’ll get into the HCA chapter but it won’t finish it.

Hervé Pagès (22:49:14): > I’ve increased the timeout limit from 2h to 3h. Let’s see how it goes ~tomorrow~ on Friday.

2020-06-22

Vince Carey (07:12:10): > It would be nice if bioconductor.org could take care of these tasks with its own resources, but I think we need a different model for the long run. Getting an estimate for the charges for an OSCA-book build in the AnVIL system seems totally appropriate for all parties concerned. Once we get an estimate for annual support we can try to get money from STRIDES or build up foundation funds to cover. @Aaron Lun do you – or a book team member – want to spend some time with me to go over how to make an AnVIL workspace adequate to build the book there?

Aaron Lun (13:35:40): > What would this entail?

Aaron Lun (21:40:41): > if you just need a dockerfile, i can make that no problems.

2020-06-25

Almut (08:05:57): > @Almut has joined the channel

2020-06-27

Hervé Pagès (17:32:59): > @Aaron Lun Looks like the simpleSingleCell workflow needs more than 3h on malbec1. I’ve just increased the timeout limit again from 3h to 4h. Really curious to see how much we can stretch the window for the workflow builds without compromising the daily software builds. Let’s see how it goes next Tuesday.

Aaron Lun (17:34:35): > k

2020-07-01

Aaron Lun (00:08:32): > OMG

Aaron Lun (00:08:38): > IT’S DONE IT

Aaron Lun (00:12:20): > THE JOURNEY IS OVER

Hervé Pagès (13:12:40): > Made it in 3h 58m 8s so very close to timing out again. pffff!

2020-07-02

Aaron Lun (14:08:24): > That’s actually not as bad as I thought, it’s only +1 hour compared to my laptop’s runtime.

Aaron Lun (14:08:32): > Maybe even less. Maybe only +30 minutes.

Aaron Lun (21:00:20): > On a side note, there’s something weird with the BBS’s pandoc.

Aaron Lun (21:01:25): > BBS: - File (PNG): Screenshot from 2020-07-02 18-00-38.png

Aaron Lun (21:01:51): > Mine: - File (PNG): Screenshot from 2020-07-02 18-01-10.png

Aaron Lun (21:02:35): > Note the differences in the color scheme. Also there is no space between the code and output block with BBS’s pandoc.

Hervé Pagès (22:08:36): > mmh, doesn’t look good. malbec1 has pandoc 2.1-1 which is the version included with Ubuntu 18.04 LTS, so quite old given that the latest version is 2.10 (from 2020-06-29). I just installed pandoc 2.7.3 (luckily the Pandoc people provide Linux binaries so the installation is easy). Note that we also have 2.7.3 on the Windows and Mac builders. We purposely stay away from the most recent versions as they seem to break a few Bioconductor packages (we actually had to downgrade to 2.7.3 recently on the Windows and Mac builders to work around this). > Let’s see how things go tomorrow (next scheduled workflow build) but you’ll need a version bump in simpleSingleCell to trigger propagation of the new tarball.

2020-07-03

Aaron Lun (00:30:05): > Excellent.

Hervé Pagès (19:15:58): > Same problem with pandoc 2.7.3. Updating to the latest pandoc (2.10) doesn’t help either (just tried this on my laptop, still running Ubuntu 16.04 here). The HTML source code I get locally is the same as the online HTML: > > <p>To inspect the object, we can simply type <code>sce</code> into the console to see some pertinent information, which will display an overview of the various slots available to us (which may or may not have any data).</p> > <div class="sourceCode" id="cb10"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb10-1"><a href="data-infrastructure.html#cb10-1" aria-hidden="true"></a>sce</span></code></pre></div> > <pre><code>## class: SingleCellExperiment > ## dim: 10 3 > ## metadata(0): > ## assays(1): counts > ## rownames(10): gene_1 gene_2 ... gene_9 gene_10 > ## rowData names(0): > ## colnames(3): cell_1 cell_2 cell_3 > ## colData names(0): > ## reducedDimNames(0): > ## altExpNames(0):</code></pre> > > so the lack of space between the code and output block suggests a CSS issue.

Hervé Pagès (19:27:07): > sessionInfo() after running bookdown::render_book('index.Rmd') on Part 1 of the book only: > > > sessionInfo() > R version 4.0.2 Patched (2020-06-24 r78746) > Platform: x86_64-pc-linux-gnu (64-bit) > Running under: Ubuntu 16.04.6 LTS > > Matrix products: default > BLAS: /home/hpages/R/R-4.0.r78746/lib/libRblas.so > LAPACK: /home/hpages/R/R-4.0.r78746/lib/libRlapack.so > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > loaded via a namespace (and not attached): > [1] compiler_4.0.2 magrittr_1.5 bookdown_0.20 htmltools_0.5.0 > [5] tools_4.0.2 rstudioapi_0.11 yaml_2.2.1 stringi_1.4.6 > [9] rmarkdown_2.3 knitr_1.29 stringr_1.4.0 digest_0.6.25 > [13] xfun_0.15 rlang_0.4.6 evaluate_0.14 >

Aaron Lun (21:10:50): > I have just realized that my version of pandoc is actually pretty ancient, 1.19.2.4.

Aaron Lun (21:38:27): > god. Figured it out. highlights.js does NOT play nice with the latest pandoc.

2020-07-07

Vivek Das (02:58:53): > @Vivek Das has joined the channel

2020-07-11

Vince Carey (20:40:33) (in thread): > Sorry to have lost track of this. If book-building is still of concern just let me know and I can look into the idea of building on AnVIL.

Aaron Lun (20:57:49) (in thread): > I don’t know. We have functional builds on the BBS but is that the way forward?

2020-07-12

Vince Carey (08:06:20) (in thread): > If you can make a dockerfile and point me to the current source repo for book i will engage with AnVIL to produce an image of book.

2020-07-17

Hervé Pagès (15:42:18): > @Aaron Lun For some unclear reasons, the tarball generated by R CMD build simpleSingleCell seems corrupted: > > > untar("simpleSingleCell_1.13.5.tar.gz", "simpleSingleCell/DESCRIPTION") > /bin/tar: Skipping to next header > /bin/tar: Skipping to next header > /bin/tar: Exiting with failure status due to previous errors > Warning message: > In untar("simpleSingleCell_1.13.5.tar.gz", "simpleSingleCell/DESCRIPTION") : > '/bin/tar -xf 'simpleSingleCell_1.13.5.tar.gz' 'simpleSingleCell/DESCRIPTION'’ returned error code 2 > > Any idea what could cause this? FWIW I ran across this while looking at the logs for the propagation scripts where this is causing problems (I’m working on a workaround). Kind of unexpected that R CMD build can produce invalid tarballs under certain conditions. I’d rather have it fail. :disappointed:

Aaron Lun (15:44:43): > Not sure, but if I had to say, it’s because one of the chapters instantiates a conda environment in the working directory. (This is not a permanent thing; it should be replaced by the basilisk client velociraptor once the base package zellkonverter makes it past the SPB.) The result is that the tarball also includes a 300 MB conda environment with super-long paths. This went through once so I don’t know why it fails now, but if there’s a candidate for a failure point, this would be it.

Aaron Lun (15:45:17): > I think the book also pulls down some files into a local BiocFileCache that I didn’t remember to delete in the Makefile, but this should be smaller.

Hervé Pagès (18:00:13): > Thanks. Almost everything goes thru except the citation. More precisely: extracting the tarball with tar zxf or installing it in R works as expected and without complaining. However listing the content of the tarball with tar ztf or extracting individual files from it returns error code 2. This was breaking biocViews::extractCitations() which is used in our propagation script for extracting the citations that get displayed on the website. I tried to make some changes to it to make it robust to corrupted tarballs.
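
A minimal base-R sketch of the kind of check discussed in the messages above: list a tarball's members, then extract a single member by name, which is the step that failed for simpleSingleCell. This is not the propagation script's actual code; the package name `demoPkg` and the helper `listing_ok()` are hypothetical, and R's internal tar implementation is used here only so that the example does not depend on the system `tar`.

```r
# Build a tiny stand-in for a package source tarball in a temporary directory.
# "demoPkg" is a hypothetical package name used only for illustration.
workdir <- tempfile("tarcheck")
dir.create(file.path(workdir, "demoPkg"), recursive = TRUE)
writeLines(c("Package: demoPkg", "Version: 0.0.1"),
           file.path(workdir, "demoPkg", "DESCRIPTION"))

tarball <- file.path(workdir, "demoPkg_0.0.1.tar.gz")
oldwd <- setwd(workdir)
tar(tarball, files = "demoPkg", compression = "gzip", tar = "internal")
setwd(oldwd)

# Hypothetical helper: TRUE if the archive can be listed without any
# error or warning, FALSE otherwise.
listing_ok <- function(path) {
  out <- tryCatch(untar(path, list = TRUE, tar = "internal"),
                  error = function(e) NULL,
                  warning = function(w) NULL)
  !is.null(out)
}

# Rough equivalent of 'tar ztf': list the members without extracting.
members <- untar(tarball, list = TRUE, tar = "internal")

# Extract only the DESCRIPTION member, using the name exactly as listed.
target <- members[basename(members) == "DESCRIPTION"]
exdir <- file.path(workdir, "extracted")
untar(tarball, files = target, exdir = exdir, tar = "internal")
desc <- read.dcf(file.path(exdir, target))
```

Running `listing_ok()` on a freshly built tarball before propagation would be one way to fail early instead of discovering the problem when citations are extracted.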

2020-07-20

Ting Sun (15:59:54): > @Ting Sun has joined the channel

shr19818 (16:00:06): > @shr19818 has joined the channel

2020-07-21

Aaron Lun (02:29:27): > @Vince Carey https://github.com/Bioconductor/OrchestratingSingleCellAnalysis-base contains a Dockerfile that should have all the book’s dependencies. I haven’t tried running it on the container yet, though it should be as easy as booting it up in interactive mode, opening R and calling bookdown::render().

Vince Carey (05:33:55): > @Aaron Lun i just started building the image … hit the libglpk.so error for igraph …

Vince Carey (08:02:10): > lots of noise during the image build process with sudo docker build -t vjcitn/oscabk:v1 OrchestratingSingleCellAnalysis-base/ do you use other parameters? > > : In file(con, "r") : > URL 'https://bioconductor.org/config.yaml': status was 'Couldn't resolve host name' > 2: In file(con, "r") : > URL 'http://bioconductor.org/config.yaml': status was 'Couldn't resolve host name' > **** testing if installed package can be loaded from final location > During startup - Warning messages: > 1: In file(con, "r") : > URL 'https://bioconductor.org/config.yaml': status was 'Couldn't resolve host name' > 2: In file(con, "r") : > URL 'http://bioconductor.org/config.yaml': status was 'Couldn't resolve host name' > **** testing if installed package keeps a record of temporary installation path > * DONE (EnsDb.Hsapiens.v86) > > The downloaded source packages are in > ‘/tmp/RtmpQuP6yx/downloaded_packages’ > Warning: unable to access index for repository https://bioconductor.org/packages/3.12/bioc/src/contrib: > cannot open URL 'https://bioconductor.org/packages/3.12/bioc/src/contrib/PACKAGES' > Warning: unable to access index for repository https://bioconductor.org/packages/3.12/data/annotation/src/contrib: > cannot open URL 'https://bioconductor.org/packages/3.12/data/annotation/src/contrib/PACKAGES' > Warning: unable to access index for repository https://bioconductor.org/packages/3.12/data/experiment/src/contrib: > cannot open URL 'https://bioconductor.org/packages/3.12/data/experiment/src/contrib/PACKAGES' > Warning: unable to access index for repository https://bioconductor.org/packages/3.12/workflows/src/contrib: > cannot open URL 'https://bioconductor.org/packages/3.12/workflows/src/contrib/PACKAGES' > Warning: unable to access index for repository https://packagemanager.rstudio.com/all/__linux__/bionic/291/src/contrib: > cannot open URL 'https://packagemanager.rstudio.com/all/__linux__/bionic/291/src/contrib/PACKAGES' > Warning messages: > 1: package ‘rebook’ is not available (for R version 4.0.0) > 2: In install.packages(...) : > installation of package ‘monocle’ had non-zero exit status > 3: In install.packages(...) : > installation of package ‘iSEE’ had non-zero exit status > 4: In install.packages(...) : > installation of package ‘slingshot’ had non-zero exit status > 5: In install.packages(...) : > installation of package ‘batchelor’ had non-zero exit status > 6: In install.packages(...) : > installation of package ‘scran’ had non-zero exit status > 7: In install.packages(...) : > installation of package ‘tradeSeq’ had non-zero exit status > > > > > Removing intermediate container 0d32ada094b6 > ---> 22448915536a > Step 4/5 : RUN R --quiet -e "BiocManager::install('bookdown')" > ---> Running in 4a8c20f02112 > During startup - Warning messages: > 1: In file(con, "r") : > URL 'https://bioconductor.org/config.yaml': status was 'Couldn't resolve host name' > 2: In file(con, "r") : > URL 'http://bioconductor.org/config.yaml': status was 'Couldn't resolve host name' > > BiocManager::install('bookdown') > Error: Bioconductor version cannot be validated; no internet connection? > Execution halted > The command '/bin/sh -c R --quiet -e "BiocManager::install('bookdown')"' returned a non-zero code: 1 >

Aaron Lun (11:11:59): > jeez, it is noisy.

Aaron Lun (11:12:21): > I wonder if we could get@Martin Morganto link the repo to dockerhub. I can’t do that because I don’t own the GH org.

Martin Morgan (12:04:54): > I’ll pass to @Nitesh Turaga – so ‘the repo’ is Bioconductor/OrchestratingSingleCellAnalysis-base?

Nitesh Turaga (12:05:17): > @Nitesh Turaga has joined the channel

Aaron Lun (12:05:36): > yep.

Nitesh Turaga (12:05:53): > let’s not do “base” right??

Nitesh Turaga (12:06:07): > Let’s just call it Bioconductor/OrchestratingSingleCellAnalysis thoughts?

Nitesh Turaga (12:06:34): > and the tags on it can be whatever you want…

Aaron Lun (12:06:36): > if we had a better deployment system for books beyond github, then sure.

Aaron Lun (12:07:01): > The “-base” distinguishes it from the repos that actually host the compiled books, “-release” and “-devel”.

Nitesh Turaga (12:07:45): > I see, as opposed to tags + branches which bioconductor_docker uses now?

Nitesh Turaga (12:08:31): > It’s more “git-like” to use branches and more docker-like to use tags. But this isn’t my responsibility to maintain, so you can call it “base” if you think that’s the best way forward

Aaron Lun (12:09:06): > OSCA-base actually does have branches for the release version of all of the content, but these are distinct from the actual repos that host the compiled version of the book.

Nitesh Turaga (12:09:20): > I see!

Aaron Lun (12:09:22): > Fundamentally, this is because GH Pages only creates one site per repo.

Nitesh Turaga (12:09:40): > Ok, so, you need me to push this image to bioconductor ?

Aaron Lun (12:10:05): > just to register a dockerhub entry so that people don’t have to manually build it themselves.

Nitesh Turaga (12:10:43): > ok, is there a place I can download a pre-built image on some registry? Or I have to build it the first time and push?

Aaron Lun (12:11:02): > wait wait wait.

Aaron Lun (12:11:16): > Can’t you just indicate to dockerhub that it should build from the repo?

Aaron Lun (12:11:21): > That’s what I always do.

Nitesh Turaga (12:11:30): > Yes…

Aaron Lun (12:11:30): > I never actually build my own images.

Nitesh Turaga (12:12:15): > Yes there is….sorry…brain just stopped for a second

Nitesh Turaga (12:12:51): - File (PNG): Screen Shot 2020-07-21 at 12.12.42 PM.png

Aaron Lun (12:13:01): > guh.

Aaron Lun (12:13:03): > hm.

Nitesh Turaga (12:13:08): > _?

Aaron Lun (12:13:24): > Well, I guess we could rename it sans the “-base”, that’s mostly historical anyway.

Aaron Lun (12:13:43): > Once we have a better deployment solution for the books, that’s what it would be.

Aaron Lun (12:13:56): > The “-base” is only required to differentiate this repo from “-release” and “-devel”.

Nitesh Turaga (12:14:37): > I’ll wait for a final decision …

Aaron Lun (12:14:44): > Well, I just did it.

Nitesh Turaga (12:14:46): > I’m for not having -base

Aaron Lun (12:15:12): > I wish we could have these books hosted on a BioC server instead.

Nitesh Turaga (12:15:25): > Also, we need it to be all lowercase on dockerhub

Nitesh Turaga (12:15:52): > bioconductor/orchestratingsinglecellanalysis

Nitesh Turaga (12:16:04): > See error message above..

Nitesh Turaga (12:16:27): > But that shouldn’t matter…

Nitesh Turaga (12:16:43): > we don’t need the github repo and image to be named the same:confused:

Nitesh Turaga (12:19:19): > https://hub.docker.com/repository/docker/bioconductor/orchestratingsinglecellanalysis/general

Aaron Lun (12:21:04): > great, thanks. I can’t see the build in progress until it finishes, I guess I don’t have permissions.

Nitesh Turaga (12:21:15): > I can give you permissions.

Aaron Lun (12:21:17): > not a problem, I have faith in my dockerfile.

Nitesh Turaga (12:31:41): > It failed…

Aaron Lun (12:31:58): > IT BETRAYED ME.

Aaron Lun (12:32:10): > Maybe you should give me access then.

Aaron Lun (12:32:12): > ltla.

Nitesh Turaga (12:32:14): > Lol, I think it’s because of missing igraph dependencies

Nitesh Turaga (12:32:22): > libglpk

Nitesh Turaga (12:32:23): > :confused:

Aaron Lun (12:32:26): > Hold on, I just installed those.

Nitesh Turaga (12:32:42): > Trying to figure out how to get you access

Nitesh Turaga (12:32:44): > one second

Nitesh Turaga (12:33:53): > what’s your dockerhub id?

Aaron Lun (12:34:00): > ltla

Aaron Lun (12:34:12): > I think. I actually can’t remember, I haven’t logged in myself for a while.

Aaron Lun (12:34:27): > As in, I’m already logged in, so I never actually type my login.

Aaron Lun (12:34:42): > And I’m on the wrong computer today so I can’t actually log in!

Nitesh Turaga (12:37:05): > Did you get an invite?

Aaron Lun (12:37:35): > yep, looks like that works.

Nitesh Turaga (12:38:27): > You should see “admin” rights on that image….But i’m wondering if “read and write” are more appropriate.

Aaron Lun (12:38:53): > ah you foiled my dastardly plans.

Nitesh Turaga (12:39:49): > ok, read and write it is….

Aaron Lun (12:41:14): > i can’t actually see an error message on the builds section, though.

Nitesh Turaga (12:41:34): > damn it…dockerhub….ok..

Nitesh Turaga (12:42:03): > How about now?

Aaron Lun (12:42:37): > nope, I’m just seeing “no autobuilds available”.

Aaron Lun (12:42:44): > oh wait, hold on

Aaron Lun (12:42:51): > I had to click “manage repository”

Nitesh Turaga (12:42:57): > hard refresh…“Cmd + Shift + R”

Nitesh Turaga (12:43:02): > obliterate the cache

Aaron Lun (12:43:31): > it’s the manage repo, cache clearing doesn’t do anything.

Nitesh Turaga (12:43:35): > you should see this view

Nitesh Turaga (12:43:42): - File (PNG): Screen Shot 2020-07-21 at 12.43.30 PM.png

Aaron Lun (12:44:04): > yes, I do now. The “public” view doesn’t contain this information.

Nitesh Turaga (12:44:28): > Perfect!

2020-07-22

Aaron Lun (12:06:08): > @Vince Carey as promised: https://hub.docker.com/r/bioconductor/orchestratingsinglecellanalysis

Vince Carey (12:32:44) (in thread): > can you give me the exact command? render is not exported from the bookdown in the image…

Aaron Lun (12:33:19) (in thread): > Oh, whoops. bookdown::render_book("index.Rmd").

Vince Carey (13:36:08) (in thread): > failed for lack of RMTstat which I then installed and restarted.

Vince Carey (13:37:00) (in thread): > thoughts about parallelization? is there a ‘make’ concept to avoid redundant recompilation?

Aaron Lun (13:37:11) (in thread): > not that I’m aware of.

Aaron Lun (13:37:24) (in thread): > that would also be complicated, as some of the chapters are dependent on each other.

Aaron Lun (13:38:35) (in thread): > we could compile some of the workflow chapters in parallel, and use their cached contents to enable rapid execution later on.
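The 'make' concept asked about here comes down to a timestamp comparison: rebuild an output only when it is missing or older than its source. A minimal shell sketch of that check, assuming hypothetical file names (`chapter.Rmd`, `chapter.html`) and a hypothetical `needs_build` helper; none of these come from the book's actual Makefile:

```shell
# Hypothetical scratch directory and file names, for illustration only.
workdir=$(mktemp -d)
src="$workdir/chapter.Rmd"
out="$workdir/chapter.html"

# Rebuild when the output is missing or older than its source (make's rule).
needs_build() {
    [ ! -e "$2" ] || [ "$1" -nt "$2" ]
}

touch -t 202001010000 "$src"    # source written first, no output yet
needs_build "$src" "$out" && first="build" || first="skip"

touch -t 202001020000 "$out"    # output produced after the source
needs_build "$src" "$out" && second="build" || second="skip"

touch -t 202001030000 "$src"    # source edited after the last build
needs_build "$src" "$out" && third="build" || third="skip"

echo "$first $second $third"
rm -rf "$workdir"
```

With a real Makefile, chapters that depend on each other would be declared as prerequisites, and `make -j` would then run only the independent targets in parallel.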

Vince Carey (13:40:01) (in thread): > sounds like a plan … sounds like a book … about itself

Aaron Lun (13:46:58) (in thread): > there’s probably a whole bunch of hidden Suggests in there. I will guess that statmod is another one. As well as various scater Suggested packages, TBH.

Vince Carey (13:51:37) (in thread): > A dependency analysis would probably be worthwhile. This book is not going away any time soon but some dependencies might…

Aaron Lun (13:59:18) (in thread): > well, figuring out hidden suggests is pretty hard, it seems like an empirical thing.

Vince Carey (14:25:58) (in thread): > clustree

Vince Carey (14:49:53) (in thread): > So for applications of this sort, how about extending library() to fslibrary(), a fail-safe version that will use an installer if the requested library is not found? I think it is called for here. Of course it does not exist.

Vince Carey (16:00:51) (in thread): > celldex

Aaron Lun (16:01:23) (in thread): > my god

Aaron Lun (16:01:43) (in thread): > hold on, Rtsne didn’t fail somewhere?

Vince Carey (16:50:36) (in thread): > goana.default – GO.db is not installed or can’t be loaded … no problems with Rtsne yet.

Aaron Lun (16:52:27) (in thread): > working on a parallel make now.

Vince Carey (16:54:32) (in thread): > great. we need an installation verification mechanism also to fail early. should the “book” also be a “package”?

Aaron Lun (16:55:23) (in thread): > I don’t think there’s a way to catch uninstalled Suggests because you don’t know which Suggests end up being needed.

Vince Carey (16:59:13) (in thread): > Well then we just want a piece of code that knows all the packages needed to build the book. IMHO that would be the result of rownames(installed.packages()) after a successful build. It might have some unneeded packages but that is a minor concern.

Aaron Lun (17:00:45) (in thread): > yes, that will be the case; rebook::updateDependencies should do it.

Aaron Lun (17:29:27) (in thread): > Updated the requirements, image is rebuilding; added a Makefile for parallel builds.

Aaron Lun (17:30:05) (in thread): > well. We would need to install some kind of parallel make in the docker container.

Aaron Lun (17:33:04) (in thread): > actually, maybe it’s already there.

Vince Carey (17:53:23) (in thread): > > |........ | 11% > label: unnamed-chunk-3 (with options) > List of 1 > $ indent: chr " " > > Quitting from lines 53-63 (data-integration.Rmd) > Error in .Call2("C_h5getdimscales", filepath, name, scalename, PACKAGE = "HDF5Array") : > failed to open file '/tmp/RtmpZ4vat3/BiocFileCache/c4a19cd583f_1605' > Calls: local ... h5readDimnames -> get_h5dimnames -> h5getdimscales -> .Call2 > > Execution halted > Error in Rscript_render(f, render_args, render_meta, add1, add2) : > Failed to compile data-integration.Rmd >

Aaron Lun (17:53:58) (in thread): > ugugh.

Vince Carey (17:54:20) (in thread): > space?

Aaron Lun (17:55:01) (in thread): > Maybe it’s because BiocFileCache doesn’t know where to make the cache when it’s in a docker image.

Aaron Lun (17:55:34) (in thread): > Though that doesn’t make a whole lot of sense either.

Aaron Lun (18:19:08) (in thread): > I think the BiocFileCache defaults back to the tempdir() if rappdirs can’t find an otherwise suitable location. It’s clear to see that this would be broken, as the tmpdir is destroyed when a report ends and thus the cache contents would not persist to the next report (which is kind of the point of the cache).

Aaron Lun (18:21:01) (in thread): > Suggest doing the following before calling bookdown: > > Sys.setenv(EXPERIMENT_HUB_CACHE="ExperimentHub", ANNOTATION_HUB_CACHE="AnnotationHub") > > See if that helps.

Aaron Lun (18:21:55) (in thread): > Note that this will require destruction of all existing cached objects, so best to just reboot the container. You can use the latest docker hub build, I added all the extra dependencies.
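The same two variables can also be set from the shell before R starts, since R inherits the environment of the process that launches it (in a Dockerfile, `ENV` plays the same role). A minimal sketch, assuming the relative cache directory names suggested above; the `sh -c` child process is only a stand-in for an R session:

```shell
# Cache locations as suggested in the thread (relative directory names).
export EXPERIMENT_HUB_CACHE="ExperimentHub"
export ANNOTATION_HUB_CACHE="AnnotationHub"

# Exported variables are inherited by child processes (R, Rscript, make).
# The child shell here is a stand-in for an R session calling Sys.getenv().
eh_seen=$(sh -c 'printf "%s" "$EXPERIMENT_HUB_CACHE"')
ah_seen=$(sh -c 'printf "%s" "$ANNOTATION_HUB_CACHE"')

echo "child sees: $eh_seen $ah_seen"
```

Inside the R session, `Sys.getenv("EXPERIMENT_HUB_CACHE")` is the matching check that the value actually arrived.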

Vince Carey (18:28:33) (in thread): > ok .. FWIW dies here > {--} > > pbmc3k <- all.sce$pbmc3k > > > dec3k <- all.dec$pbmc3k > > > pbmc3k > class: SingleCellExperiment > dim: 32738 2609 > metadata(0): > assays(2): counts logcounts > rownames(32738): ENSG00000243485 ENSG00000237613 ... ENSG00000215616 > ENSG00000215611 > rowData names(3): ENSEMBL_ID Symbol_TENx Symbol > colnames: NULL > colData names(13): Sample Barcode ... sizeFactor label > Loading required package: HDF5Array > Loading required package: DelayedArray > Loading required package: stats4 > Loading required package: Matrix > Loading required package: matrixStats > Loading required package: BiocGenerics > Loading required package: parallel > > Attaching package: ‘BiocGenerics’ > > The following objects are masked from ‘package:parallel’: > > clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, > clusterExport, clusterMap, parApply, parCapply, parLapply, > parLapplyLB, parRapply, parSapply, parSapplyLB > > The following objects are masked from ‘package:stats’: > > IQR, mad, sd, var, xtabs > > The following objects are masked from ‘package:base’: > > anyDuplicated, append, as.data.frame, basename, cbind, colnames, > dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep, > grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget, > order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, > rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply, > union, unique, unsplit, which.max, which.min > > Loading required package: S4Vectors > > Attaching package: ‘S4Vectors’ > > The following object is masked from ‘package:Matrix’: > > expand > > The following object is masked from ‘package:base’: > > expand.grid > > Loading required package: IRanges > > Attaching package: ‘DelayedArray’ > > The following objects are masked from ‘package:matrixStats’: > > colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges > > The following objects are masked from ‘package:base’: > > aperm, 
apply, rowsum > > Loading required package: rhdf5 > reducedDimNames(3): PCA TSNE UMAP > altExpNames(0): > > > pbmc4k <- all.sce$pbmc4k > > > dec4k <- all.dec$pbmc4k > > > pbmc4k > class: SingleCellExperiment > dim: 33694 4182 > metadata(0): > assays(2): counts logcounts > rownames(33694): ENSG00000243485 ENSG00000237613 ... ENSG00000277475 > ENSG00000268674 > rowData names(3): ENSEMBL_ID Symbol_TENx Symbol > colnames: NULL > colData names(13): Sample Barcode ... sizeFactor label > reducedDimNames(3): PCA TSNE UMAP > altExpNames(0): > > > ## ----------------------------------------------------------------------------- > > universe <- intersect(rownames(pbmc3k), rownames(pbmc4k)) > > > length(universe) > [1] 31232 > > > # Subsetting the SingleCellExperiment object. > > pbmc3k <- pbmc3k[universe,] > Error in .Call2("C_h5getdimscales", filepath, name, scalename, PACKAGE = "HDF5Array") : > failed to open file '/tmp/RtmpjVPyo4/BiocFileCache/c8c27ed79d_1605' >

Vince Carey (18:29:08) (in thread): > i will update image and restart with the new env settings

Aaron Lun (18:29:56) (in thread): > good luck

Aaron Lun (18:30:12) (in thread): > If it works I’ll put it into the image itself.

Vince Carey (18:38:46) (in thread): > updated image and ran Sys.setenv but > > label: unnamed-chunk-3 (with options) > List of 1 > $ message: logi FALSE > > Quitting from lines 40-41 (data-infrastructure.Rmd) > Error in library(SingleCellExperiment) : > there is no package called 'SingleCellExperiment' > Calls: local ... withCallingHandlers -> withVisible -> eval -> eval -> library > > Execution halted > Error in Rscript_render(f, render_args, render_meta, add1, add2) : > Failed to compile data-infrastructure.Rmd >

Aaron Lun (18:39:15) (in thread): > oh god. what

Aaron Lun (18:39:29) (in thread): > oh crap. damn, I know why.

Aaron Lun (18:39:40) (in thread): > Okay, let me rebuild that image. I’ll throw in the environment variables while I do so.

Aaron Lun (18:45:22) (in thread): > hey, does the container already have Make?

Aaron Lun (18:45:59) (in thread): > I’m guessing it does, otherwise packages wouldn’t compile.

Vince Carey (19:08:28) (in thread): > /usr/bin/make is present

Aaron Lun (19:53:38) (in thread): > hoorah! it’s done.

Vince Carey (23:22:00) (in thread): > sadly, we come to > > aperm, apply, rowsum > > Loading required package: rhdf5 > |....... | 10% > ordinary text without R code > > |........ | 11% > label: unnamed-chunk-3 (with options) > List of 1 > $ indent: chr " " > > Quitting from lines 53-63 (data-integration.Rmd) > Error in .Call2("C_h5getdimscales", filepath, name, scalename, PACKAGE = "HDF5Array") : > failed to open file '/tmp/RtmpPK48fu/BiocFileCache/3a6a2ee194_1605' > Calls: local ... h5readDimnames -> get_h5dimnames -> h5getdimscales -> .Call2 > > Execution halted > Error in Rscript_render(f, render_args, render_meta, add1, add2) : > Failed to compile data-integration.Rmd > > again

Aaron Lun (23:32:06) (in thread): > gawaw

Aaron Lun (23:32:31) (in thread): > Did my env vars actually stick?

Vince Carey (23:34:13) (in thread): > > > Sys.getenv("EXPERIMENT_HUB_CACHE") > [1] "/ExperimentHub" > > leading slash ok?

Aaron Lun (23:37:36) (in thread): > I think so, that’s intended from a docker container.

Aaron Lun (23:37:46) (in thread): > Now that I’m on a linux machine, I’ll boot it up and have a look.

2020-07-23

Aaron Lun (00:09:25) (in thread): > Hm.

Aaron Lun (00:20:13) (in thread): > What happens if you just do rmarkdown::render("data-integration.Rmd")? Clear out any *_cache.

Aaron Lun (00:25:06) (in thread): > Passes through fine for me.

Vince Carey (07:33:22) (in thread): > > aperm, apply, rowsum > > Loading required package: rhdf5 > |....... | 10% > ordinary text without R code > > |........ | 11% > label: unnamed-chunk-3 (with options) > List of 1 > $ indent: chr " " > > Quitting from lines 53-63 (data-integration.Rmd) > Error in .Call2("C_h5getdimscales", filepath, name, scalename, PACKAGE = "HDF5Array") : > failed to open file '/tmp/RtmpcFOoUB/BiocFileCache/211a2138a_1605' > > > tempdir() > [1] "/tmp/RtmpSWGq8h" > > dir(tempdir()) > [1] "callr-env-b1972b581" > [2] "HDF5Array_dataset_creation_global_counter" > [3] "HDF5Array_dump" > [4] "HDF5Array_dump_files_global_counter" > [5] "HDF5Array_dump_log" > [6] "HDF5Array_dump_names_global_counter" > > dir("/tmp/RtmpcFOoUB/BiocFileCache/") > character(0) > > dir("/tmp/RtmpcFOoUB/") > character(0) >

Vince Carey (07:57:06) (in thread): > I pulled the image just now. Some things changed. I tried rmarkdown::render('data-integration.Rmd') and then scater was not found!

Vince Carey (09:49:16) (in thread): > started fresh … shocking new error > > output file: cell-annotation.knit.md > > Bioconductor version 3.12 (BiocManager 1.30.10), ?BiocManager::install for help > > > processing file: data-integration.Rmd > |. | 1% > ordinary text without R code > > |.. | 3% > label: setup (with options) > List of 2 > $ echo : logi FALSE > $ results: chr "asis" > > |... | 4% > inline R code fragments > > |.... | 6% > label: unnamed-chunk-1 (with options) > List of 2 > $ results: chr "asis" > $ echo : logi FALSE > > |..... | 7% > ordinary text without R code > > |...... | 8% > label: unnamed-chunk-2 > Quitting from lines 39-45 (data-integration.Rmd) > Error in all.sce$pbmc3k : $ operator is invalid for atomic vectors > Calls: local ... handle -> withCallingHandlers -> withVisible -> eval -> eval > > Execution halted > Error in Rscript_render(f, render_args, render_meta, add1, add2) : > Failed to compile data-integration.Rmd >

Vince Carey (09:50:28) (in thread): > > stvjc@stvjc-XPS-13-9300:~/OSCABOOK$ sudo docker pull bioconductor/orchestratingsinglecellanalysis:latest > [sudo] password for stvjc: > latest: Pulling from bioconductor/orchestratingsinglecellanalysis > Digest: sha256:5b096ab814c4897e7668ff3db711395b9c697e1aa180738641f9db3c261eb11c > Status: Image is up to date for bioconductor/orchestratingsinglecellanalysis:latest > docker.io/bioconductor/orchestratingsinglecellanalysis:latest >

Aaron Lun (11:13:15) (in thread): > my god, it’s chaos.

Vince Carey (13:33:12) (in thread): > do we agree on the id of the container image in use? can you send me your exact calling parameters?

Vince Carey (13:34:01) (in thread): > these seem to be precisely the things that docker is supposed to insulate us from!

Aaron Lun (14:32:20) (in thread): > I was using 5d7cc994.

Vince Carey (18:34:01) (in thread): > > REPOSITORY TAG IMAGE ID CREATED SIZE > vjcitn/remlarge latest 7df6d21496dd 8 hours ago 4.68GB > bioconductor/orchestratingsinglecellanalysis latest e03ca322a5a2 14 hours ago 7.62GB > vjcitn/remlarge <none> 46d6d2ff4db1 17 hours ago 4.46GB > mikelove/alevin2bioc latest 161a4f4d70f8 4 days ago 5.07GB > bioconductor/bioconductor_docker devel 78860d63165a 6 days ago 3.78GB > jmacdon/bioc2020anno latest e9620baa47df 3 weeks ago 5.86GB > stvjc@stvjc-XPS-13-9300:~$ sudo docker pull bioconductor/orchestratingsinglecellanalysis > Using default tag: latest > latest: Pulling from bioconductor/orchestratingsinglecellanalysis > Digest: sha256:5b096ab814c4897e7668ff3db711395b9c697e1aa180738641f9db3c261eb11c > Status: Image is up to date for bioconductor/orchestratingsinglecellanalysis:latest > docker.io/bioconductor/orchestratingsinglecellanalysis:latest > stvjc@stvjc-XPS-13-9300:~$ > > … can you confirm we are using same, and can you tell me exactly the command line used when you have success?

Aaron Lun (19:14:20) (in thread): > hold on, need to boot up my computer that can actually run docker.

Aaron Lun (19:18:05) (in thread): > Right, I have: > > REPOSITORY TAG IMAGE ID CREATED SIZE > bioconductor/orchestratingsinglecellanalysis latest e03ca322a5a2 15 hours ago 7.62GB > bioconductor/orchestratingsinglecellanalysis <none> 533644d9b159 24 hours ago 7.62GB > <none> <none> 1aff42e33f3a 2 days ago 3.8GB > <none> <none> 0b44d0e09e67 2 days ago 3.78GB > <none> <none> f902c5a86364 2 days ago 3.78GB > bioconductor/bioconductor_docker devel 78860d63165a 6 days ago 3.78GB >

Aaron Lun (19:18:34) (in thread): > Then I do: > > sudo docker run -it e03ca322a5a2 bash >

Aaron Lun (19:18:47) (in thread): > Open R, and then rmarkdown::render("data-integration.Rmd").

Aaron Lun (19:19:44) (in thread): > What. No package named scater?

Aaron Lun (19:21:06) (in thread): > This is surreal.

Aaron Lun (19:24:11) (in thread): > Ah. The docker image itself failed to install scater, because it failed to verify the download size for ggbeeswarm from Rstudio. Wacky.

Aaron Lun (19:25:43): > @Nitesh Turaga is there a way to fail the entire image build if an R package fails to install? Looks like it just quietly keeps on going, yielding a somewhat broken image.

Vince Carey (19:29:54) (in thread): > So that was a random event hm? We need a retry method. This fault-tolerant library/installer may be needed.

Aaron Lun (19:31:11) (in thread): > well, the ideal solution would be for the docker build to fail so that I know to retry it. Once the docker image is built, there should not be any further issues.

Vince Carey (19:31:43) (in thread): > In a sense it seems antithetical to the containerization approach to leave so much up to realtime installation. We could make a book-defined binary repo, or install them into the container. Is the latter approach the intended approach?

Aaron Lun (19:32:24) (in thread): > Hold on, hold on. The installation is already happening during container build.

Vince Carey (19:32:47) (in thread): > OK – so it was supposed to be there but because of a random event it was not.

Aaron Lun (19:32:53) (in thread): > yes.

Aaron Lun (19:33:25) (in thread): > That build should have failed much more noisily.

Vince Carey (19:34:09) (in thread): > OK, interesting. So you will build and push again and then I will try again.

Aaron Lun (19:34:44) (in thread): > yep. Probably the build will finish in an hour.

Aaron Lun (19:34:50) (in thread): > all things going well.

Aaron Lun (19:34:56) (in thread): > Are you already using anvil for this?

Aaron Lun (19:35:57) (in thread): > My god. It’s a 7 GB image!

Aaron Lun (19:46:09) (in thread): > Okay, the image is rebuilding. Note that the book content now lives in /home/book to avoid messing with any of the stuff in /.

Vince Carey (20:10:49) (in thread): > no i am not using anvil yet. just verifying that i have a working system before doing the surgery on the dockerfile to get it to run on anvil. i have a new ubuntu 20 dell xps with 1TB SSD so i can use docker without constantly filling disk

Aaron Lun (20:16:47) (in thread): > what needs to happen to the dockerfile to get it to work on Anvil?

Vince Carey (21:40:04) (in thread): > i sent you an email on that… it is a conjecture … but i think it is likely to work

2020-07-24

Aaron Lun (00:01:11) (in thread): > looks like it’s done.

Aaron Lun (00:23:19) (in thread): > can repro the error.

Aaron Lun (01:00:27) (in thread): > aha. needed a Sys.setenv(EXPERIMENT_HUB_ASK=FALSE) in there. Will modify the Dockerfile in a bit.

Nitesh Turaga (08:37:16): > Hmm…fail the entire build….You can install packages separately in an install.R script. Say something like this in the Dockerfile > > ADD install.R /tmp/ > > RUN Rscript /tmp/install.R && rm -rf /tmp/install.R > > Then, maybe do some magic in the R file. With tryCatch{}, and if there is an exception sys.exit()?

Nitesh Turaga (08:57:30): > But that may just exit the R process but not break the dockerfile build.

Nitesh Turaga (08:59:41): > The way to make a RUN command fail is RUN exit 1
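A RUN step fails whenever its command returns a non-zero exit status, so the `&&` chain in the Dockerfile snippet above already propagates a failure from the install step. The remaining problem is getting the R script itself to exit non-zero. In R, an uncaught error under Rscript does that, and `quit(status = 1)` is the explicit form. `install.packages()` reports a failed installation as a warning rather than an error, which is consistent with the build quietly carrying on. A minimal shell sketch of the propagation, using a hypothetical `install_step` function as a stand-in for `Rscript /tmp/install.R`:

```shell
# Hypothetical stand-in for `Rscript /tmp/install.R`; pretend the install failed.
install_step() {
    echo "installing packages..."
    return 1
}

# Same shape as `RUN Rscript /tmp/install.R && rm -rf /tmp/install.R`.
# When the left side fails, `&&` skips the cleanup and the whole chain
# returns non-zero, which is the condition that fails a RUN step.
install_step && echo "cleanup"
chain_status=$?

echo "chain exit status: $chain_status"
```

So the R script only needs to turn a failed installation into an error or an explicit non-zero exit, for example by checking afterwards that every requested package can be loaded.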

Kevin Rue-Albrecht (09:33:47): > Isn’t that a job for pipefail? (disabled by default) https://www.gnu.org/software/bash/manual/html_node/The-Set-Builtin.html#:~:text=pipefail,posix - Attachment (gnu.org): The Set Builtin (Bash Reference Manual) > The Set Builtin (Bash Reference Manual)

Kevin Rue-Albrecht (09:34:40): > See also https://sipb.mit.edu/doc/safe-shell/ - Attachment (sipb.mit.edu): Writing Safe Shell Scripts > MIT Student Information Processing Board

Nitesh Turaga (09:34:44): > Do you have a working example of this in a dockerfile?

Kevin Rue-Albrecht (09:35:04): > nope - I haven’t got to that use case yet. Just offering a hunch

Nitesh Turaga (09:35:31): > I see

Nitesh Turaga (09:35:37): > ^ get it?:smile:

Kevin Rue-Albrecht (09:36:03): - File (PNG): image.png

Kevin Rue-Albrecht (09:38:08) (in thread): > life at work (before lockdown…) has become hell: I hadn’t realised how often people around me said “I see” > Now it’s just impossible to focus. I hear those words All. The. Time.

Kevin Rue-Albrecht (09:39:04) (in thread): > It’s almost refreshing working from home. Much less frequent:laughing:

Kevin Rue-Albrecht (09:40:56): > Back to topic, I would try > > RUN set -euf -o pipefail > > or perhaps more simply > > RUN set -e -o pipefail > > based on https://sipb.mit.edu/doc/safe-shell/

Kevin Rue-Albrecht (09:42:11): > Normally the settings should persist and apply to the following RUN instructions.

Kevin Rue-Albrecht (09:42:45): > Whether it’s enough to cause the error exit as desired… I’m just as curious as you to know
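One caveat on persistence: each RUN instruction starts its own shell, so options set with `set` in one RUN are gone by the next. Docker's documented alternatives are to put `set -o pipefail &&` at the start of the RUN that contains the pipe, or to switch shells once with `SHELL ["/bin/bash", "-o", "pipefail", "-c"]`. Also, pipefail only matters for pipelines; a plain failing command already fails its RUN. A minimal sketch of both behaviours, assuming `bash` is installed and using `false | true` as a stand-in for a failing command piped into a succeeding one:

```shell
# Default: a pipeline reports only the status of its last command.
status_default=0
bash -c 'false | true' || status_default=$?

# With pipefail, a failure anywhere in the pipeline is reported.
status_pipefail=0
bash -c 'set -o pipefail; false | true' || status_pipefail=$?

# A fresh shell (like the next RUN instruction) does not remember the option.
status_fresh=0
bash -c 'false | true' || status_fresh=$?

echo "default=$status_default pipefail=$status_pipefail fresh=$status_fresh"
```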

Aaron Lun (11:32:39) (in thread): > should be working now, give it a spin.

2020-07-30

Ayush Raman (12:44:32): > @Ayush Raman has joined the channel

2020-07-31

bogdan tanasa (13:56:34): > @bogdan tanasa has joined the channel

2020-08-17

Roye Rozov (02:10:15): > @Roye Rozov has joined the channel

2020-09-25

Anna Liza Kretzschmar (22:55:30): > @Anna Liza Kretzschmar has joined the channel

2020-10-06

Aaron Lun (00:29:23): > Hey, who owns the copyright for this thing?

Aaron Lun (00:37:57): > Well, congratulations. Bioc owns the copyright, I suppose.

Aaron Lun (00:46:48): > And why do we have such an annoying CC BY-NC-ND license? Why can’t we just use CC-BY?

Stephanie Hicks (14:48:51): > i honestly can’t remember how we landed on that copyright.

Aaron Lun (14:49:36): > well, the copyright is probably “The Authors, 2020”. The license could be relaxed to CC BY which is probably more reasonable… I mean, no commercial use?

Lukas Weber (14:59:49): > I think CC BY is usually better, since NC can cause unexpected problems. I seem to remember there was a blog post by Lior Pachter about how NC caused problems with using kallisto in Galaxy.

Aaron Lun (15:00:43): > without commercial use, that kind of means I can’t even use it.

Lukas Weber (15:00:47): > yep

Lukas Weber (15:00:53): > strictly speaking

2020-10-11

Kozo Nishida (21:42:23): > @Kozo Nishida has joined the channel

2020-10-18

Adele Barugahare (20:25:36): > @Adele Barugahare has joined the channel

2020-10-30

brian capaldo (13:47:41): > Can I just say that the random comments in code blocks in the book keep me very engaged.

Aaron Lun (13:51:01): > we aim to please

brian capaldo (13:56:59): > is it a horrible idea to perform DE on a pseudotimecourse with limma, or should I stick with monocle’s functions (they just take forever sometimes)?

Aaron Lun (14:17:33): > plain ol’ linear models are usually decent for finding the major trends

brian capaldo (14:18:31): > that’s what I figured

Aaron Lun (14:19:01): > there’s not much point using limma specifically, though, because you’ll have so many cells that the empirical Bayes shrinkage won’t matter.

Aaron Lun (14:19:15): > TSCAN::testPseudotime has a simple wrapper that sorts out the spline for you.

brian capaldo (14:19:42): > aw man, I was hoping to avoid yet another library

brian capaldo (14:19:48): > I’ll check it out

brian capaldo (14:21:09): > i probably need it to sort out my lineage barcodes anyways

2020-11-05

brian capaldo (12:47:31): > thanks for TSCAN::testPseudotime(), works very nice. I have a quick question. I’m running it on a Y shaped trajectory, and there’s only 1 column in my pseudotime vector. I should be splitting the Y into two separate paths and testing them independently, correct?

Aaron Lun (23:39:02): > A Y, or a V? There should be at least two columns in the pseudotime if there’s a branch point in there.

2020-11-06

brian capaldo (10:06:20): > It’s y shaped, I figured it out. Using monocle3 so I had to do some igraph manipulation to format correctly

2020-11-11

Joshua Shapiro (09:09:16): > @Joshua Shapiro has joined the channel

2020-11-12

Philippe Boileau (15:08:51): > @Philippe Boileau has joined the channel

2020-11-14

Kozo Nishida (01:01:25): > Does bookdown have the ability to combine multiple languages (translations) into one site? > (If bookdown has it, ) it would be nice if osca-book (and other online book packages) would accept (language) translations (e.g. Spanish, Mandarin, Japanese) as pull requests. > What do you think about this?

Aaron Lun (01:22:05): > If you can figure out how to do it, I would be open to it.

Aaron Lun (02:44:43): > One immediate concern would be how to keep those translations in sync.

Kozo Nishida (03:18:12): > Unfortunately, bookdown doesn’t seem to have a mechanism to manage multiple language translations with 1 repo (like Sphinx). > (Sorry, I didn’t know that…)

Kozo Nishida (03:22:50): > We may need to submit the translated “online book” package as another package (like OSCA-ja, SingleRBook-ja). > (But I don’t know if Bioconductor will accept such approach.)

Kozo Nishida (03:25:26): > As you have mentioned, syncing the translation with the original is also an issue.

Hervé Pagès (05:07:26): > One concern with this approach is that all the book translations contain the same code chunks so basically we would end up building many times the same thing. E.g. 10 x 4h = 40h just for having 10 translations of the OSCA book. Doesn’t sound optimal.

Stephanie Hicks (07:31:43): > @Kozo Nishida another approach might be to convert the RMarkdown text in the book to another language using something like Amazon Polly. We use the ari package to convert courses we build in R Markdown into videos in other languages. https://johnmuschelli.com/ari_paper/

2020-11-19

Kevin Blighe (08:30:09): > @Kevin Blighe has joined the channel

David Dittmar (08:33:13): > @David Dittmar has joined the channel

2020-11-24

David Dittmar (04:28:31): > Hello! Just a minor comment on Section 6.5 Removing low-quality cells: > > lost <- calculateAverage(counts(sce.416b)[,!discard]) > kept <- calculateAverage(counts(sce.416b)[,discard]) > > should be > > lost <- calculateAverage(counts(sce.416b)[,discard]) > kept <- calculateAverage(counts(sce.416b)[,!discard]) > > :blush:

Kevin Rue-Albrecht (07:02:56): > https://github.com/Bioconductor/OrchestratingSingleCellAnalysis/blob/9fe25f178ffefa4cdb9f571889032d02333573c5/quality-control.Rmd#L413 @David Dittmar fork -> edit -> PR and get your name as a contributor to this monster :grin:

David Dittmar (09:48:09) (in thread): > thank you! will do!

2020-12-02

Konstantinos Geles (Constantinos Yeles) (05:43:08): > @Konstantinos Geles (Constantinos Yeles) has joined the channel

2020-12-03

Mikhael Manurung (08:32:33): > I noticed that Aaron used the word “obligatory” a lot for tSNE/UMAP plots. To me, it sounded like he is implying that these plots are not giving any additional insights but people will always ask for it so he did it anyway. Am I reading too much into it?

Stephanie Hicks (09:33:20): > I don’t want to speak for@Aaron Lun, but it is true that some version of tsne/umap plots are almost always in single cell papers.

Aaron Lun (11:13:18): > it’s 50% sarcasm. But only 50%.

Aaron Lun (11:16:30): > maybe give or take a few percent.

Mikhael Manurung (14:20:46): > One thing that I frequently heard about tSNE plots is their usefulness for inspecting the quality of clustering (I don’t even know what was meant by quality). But how can we actually do that, especially when tSNE may artificially break the embedding? We can’t even use it to infer the relatedness between clusters. I do use tSNE to quickly check the similarity of my cells across batches. Is there any principled use of tSNE plots?

2020-12-12

Huipeng Li (00:39:13): > @Huipeng Li has joined the channel

2020-12-17

Hervé Pagès (14:23:30): > In Figure 4.1 (https://bioconductor.org/books/3.12/OSCA/data-infrastructure.html#background), the yellow strip is located near the top of the colData and reducedDims components so corresponds to cells with low indices (1, 2, 3, 4). OTOH the yellow strip in the assays component is located to the right which suggests cells with high indices. Should probably be moved to the left. > FWIW some early drafts of the Nature paper had the same issue but I believe it was corrected in the final version.

2020-12-21

Harithaa Anand (04:11:04): > @Harithaa Anand has joined the channel

2021-01-01

Bernd (14:06:05): > @Bernd has joined the channel

2021-01-10

Aaron Lun (21:17:55): > @Hervé Pagès I have some craaazy ideas to split up the book into separate compilation units while preserving the smoothness of links between the chapters.

Aaron Lun (21:18:42): > Amongst other things, it will require the books to become full fledged packages so that I can import the linking information across books.

Aaron Lun (21:19:20): > The idea would be to improve the resilience of the book to sporadic failures in one section or another.

2021-01-11

Aaron Lun (01:36:29): > my god. It’ll be beautiful.

Aaron Lun (01:36:50): > OSCA.intro, OSCA.basic, OSCA.advanced, OSCA.workflows

2021-01-12

Hugo Tavares (04:04:31): > @Hugo Tavares has joined the channel

Hervé Pagès (15:14:24): > @Aaron Lun Sounds good, I’m a big fan of modularization/compartmentalization too. But I guess this means we’ll need to revisit the deployment mechanism so all the OSCA.xxx subbooks deploy to the same URL or what do you have in mind?

Aaron Lun (15:16:09): > I think that each subbook will be its own package and can deploy to its own location, i.e., books/3.12/OSCA.basic. The tricky part is to be able to link smoothly between books; this will be handled on the R side rather than having anything to do with the BBS.

Aaron Lun (15:16:37): > We can start with OSCA.intro, which is pretty lightweight; I can prep that up.

Hervé Pagès (15:17:01): > In other words I won’t have anything to do. I like that:wink:

Aaron Lun (15:17:21): > well, aside from waving them through the submission process

Hervé Pagès (15:17:41): > of course

Hervé Pagès (15:21:37): > what about deploying to sub-locations e.g. books/3.12/OSCA/intro, books/3.12/OSCA/basic, etc.. so we keep the full thing under books/3.12/OSCA?

Aaron Lun (15:22:41): > that could also be a possibility. Would just require some change in the smart-linking to detect the extra layer of redirection, e.g., based on a field in the DESCRIPTION.

Hervé Pagès (15:24:01): > I’ll let you figure out what’s easier. In any case, supporting the subdeployments would be an easy change to the current deployment script.

2021-01-16

Aaron Lun (00:36:30): > alright! it’s time! long weekend, time to carve up the book.

2021-01-17

Aaron Lun (05:16:28): > IT HAS BEGUN

2021-01-18

Aaron Lun (03:00:53): > My god, it’s beautiful.

2021-01-21

Aaron Lun (03:03:09): > Y’know, now that I look at it, I realize that some of my chapters are effectively books in their own right, and have sections that should be split into separate chapters.

2021-01-22

Martin Morgan (09:47:49): > https://bioconductor.org/books/devel/OSCA/ from https://bioconductor.org/books/devel/ doesn’t lead anywhere; should it?

Hervé Pagès (09:52:41): > This is expected. I don’t think the book has ever propagated in devel.

Aaron Lun (11:17:22): > that is correct. Part of the motivation for the fragmentation is to improve the probability of the build.

Annajiat Alim Rasel (15:44:54): > @Annajiat Alim Rasel has joined the channel

2021-02-01

Aaron Lun (01:29:26): > all redirects are now in place.

2021-02-11

Andrew Jaffe (11:51:04): > @Andrew Jaffe has joined the channel

2021-02-13

Aaron Lun (23:34:55): > I have noticed… that I love parentheses.

Aaron Lun (23:35:02): > my god, there’s so many damn parentheses.

Aaron Lun (23:35:11): > (Even entire sentences are often in parentheses.)

Aaron Lun (23:35:18): > (Or in fact, entire paragraphs.)

2021-02-14

Aaron Lun (01:18:01): > https://github.com/OSCA-source

2021-02-15

Aaron Lun (00:11:55): > No one’s going to comment on the super-awesome organization avatar? Took me an hour to make.

2021-02-24

Russ Bainer (18:52:14): > @Russ Bainer has joined the channel

2021-02-26

Aaron Lun (15:11:26): > @Mikhael Manurung @Wes W @Ana Beatriz Villaseñor Altamirano you are the three volunteers I have collected so far, so I’m just organizing everyone over here.

Ana Beatriz Villaseñor Altamirano (16:33:21): > @Ana Beatriz Villaseñor Altamirano has joined the channel

Ana Beatriz Villaseñor Altamirano (16:33:57): > Excellent! nice to meet you all:smile:

Aaron Lun (16:36:19): > So, let me remind myself who wants to do what.

Aaron Lun (16:37:06): > IIRC @Mikhael Manurung was going to start work on a miloR chapter. I should add that the authors are also interested in contributing something, so it could be a good opportunity to get some more heads into the system.

Aaron Lun (16:37:27): > And @Wes W was going to give a shot at doing the multimodal analysis subbook.

Aaron Lun (16:37:51): > there is another item of work but it’s pretty boring and I think we should concentrate on these two pieces anyway.

Wes W (18:29:39): > @Wes W has joined the channel

Wes W (18:34:46): > perfect, I have a big CITE-seq / BCRseq / scRNA 10X experiment that just came off the novaseq this week, so next week I will be jumping into the analysis and it will be a good chance for me to start some sub chapter for the multimodal work as i go…

Aaron Lun (18:36:07): > Right - a good way to do it would be to figure out the gaps in the BioC ecosystem and we’ll just progressively fill them in as we go along.

2021-02-28

Wes W (09:21:25): > perfect

2021-03-08

yue you (21:29:00): > @yue you has joined the channel

2021-03-12

Ana Beatriz Villaseñor Altamirano (00:17:26): > How can I/we help? Replicate the analysis?

Wes W (09:47:56): > So I have started working on doing CITE-seq / scRNA seq stuff from my pipeline… just substituting some of my packages to use all Bioconductor packages… little bit of data wrangling but it’s going well… I will share the github for the new multimodal book tonight, send me your github user names and I will add you to the team and people can start adding in and working on the sections they want too

Aaron Lun (11:18:07): > excellent, excellent

Aaron Lun (11:18:23): > I’m still shuffling content around between the existing OSCA subbooks to make it easier to read.

Aaron Lun (19:59:38): > I’d be interested in knowing how much value Seurat’s complicated WNN method adds to the downstream analysis, for example.

Aaron Lun (20:00:20): > @Ana Beatriz Villaseñor Altamirano what would you like to do? Bunch of options here.

Aaron Lun (20:01:14): > You could help @Wes W or you could go through the current book and tell me the bits that you find confusing so that I can clarify them.

Aaron Lun (20:03:08): > Or you could help write the milo chapter, but I don’t know who has gotten started. tagging @Mike Morgan, plus @Mikhael Manurung expressed some interest in doing this.

2021-03-13

Ana Beatriz Villaseñor Altamirano (10:04:13): > Github: https://github.com/AnaBVA @Aaron Lun I don’t have any preferences. I can start by giving some comments/thoughts on the current book and when someone else needs/wants help I jump into it.

Aaron Lun (14:42:30): > Sounds good. Just post issues in the relevant repos at https://github.com/OSCA-source/

2021-03-15

Mike Morgan (05:12:08): > @Mike Morgan has joined the channel

2021-03-17

Mike Morgan (07:44:46): > @Aaron Lun we’ve not started on the Milo chapter - do you have a deadline that we can work towards/stop this getting sidelined on our end?

Aaron Lun (11:26:49): > Ideally this release at the end of april. Kind of depends on getting milo in BioC first, tho

2021-03-18

Aaron Lun (03:07:50): > @Alan O’C can you add a section about densvis and snifter to https://github.com/OSCA-source/OSCA.advanced/blob/master/inst/book/more-reddim.Rmd ? - Attachment: inst/book/more-reddim.Rmd > --- > output: > html_document > bibliography: ref.bib > --- > > # Dimensionality reduction, redux > > ## Overview > > link("dimensionality-reduction", "OSCA.basic") introduced the key concepts for dimensionality reduction of scRNA-seq data. > Here, we describe some data-driven strategies for picking an appropriate number of top PCs for downstream analyses. > We also demonstrate some other dimensionality reduction strategies that operate on the raw counts. > For the most part, we will be again using the @zeisel2015brain dataset:

library(scran)
top.zeisel <- getTopHVGs(dec.zeisel, n=2000)
set.seed(100) 
sce.zeisel <- fixedPCA(sce.zeisel, subset.row=top.zeisel)

More choices for the number of PCs

Using the elbow point

A simple heuristic for choosing the suitable number of PCs \(d\) involves identifying the elbow point in the percentage of variance explained by successive PCs. This refers to the “elbow” in the curve of a scree plot as shown in Figure @ref(fig:elbow).

# Percentage of variance explained is tucked away in the attributes.
percent.var <- attr(reducedDim(sce.zeisel), "percentVar")

library(PCAtools)
chosen.elbow <- findElbowPoint(percent.var)
chosen.elbow

plot(percent.var, xlab="PC", ylab="Variance explained (%)")
abline(v=chosen.elbow, col="red")

Our assumption is that each of the top PCs capturing biological signal should explain much more variance than the remaining PCs. Thus, there should be a sharp drop in the percentage of variance explained when we move past the last “biological” PC. This manifests as an elbow in the scree plot, the location of which serves as a natural choice for \(d\). Once this is identified, we can subset the reducedDims() entry to only retain the first \(d\) PCs of interest.

# Creating a new entry with only the first 'chosen.elbow' PCs,
# which is useful if we still need the full set of PCs later.
reducedDim(sce.zeisel, "PCA.elbow") <- reducedDim(sce.zeisel)[,1:chosen.elbow]
reducedDimNames(sce.zeisel)

From a practical perspective, the use of the elbow point tends to retain fewer PCs compared to other methods. The definition of “much more variance” is relative so, in order to be retained, later PCs must explain an amount of variance that is comparable to that explained by the first few PCs. Strong biological variation in the early PCs will shift the elbow to the left, potentially excluding weaker (but still interesting) variation in the next PCs immediately following the elbow.
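To make the idea of an "elbow" concrete, here is a minimal base-R sketch of one common geometric definition: the point on the scree curve that lies farthest from the straight line joining the first and last points. This is an illustration only. The helper name find_elbow_sketch and the percentages are made up, and no claim is made that PCAtools::findElbowPoint() uses exactly this rule.

```r
# Hypothetical helper (not from PCAtools): pick the index of the point on the
# scree curve that is farthest from the chord joining the first and last points.
find_elbow_sketch <- function(variance) {
  # Expect at least three values, sorted in non-increasing order.
  stopifnot(length(variance) >= 3, !is.unsorted(rev(variance)))
  n <- length(variance)
  x <- seq_len(n)
  x1 <- 1; y1 <- variance[1]
  x2 <- n; y2 <- variance[n]
  # Perpendicular distance from each (x, variance) point to the chord.
  dist.to.chord <- abs((y2 - y1) * x - (x2 - x1) * variance + x2 * y1 - y2 * x1) /
    sqrt((y2 - y1)^2 + (x2 - x1)^2)
  which.max(dist.to.chord)
}

# Made-up percentages of variance explained by ten PCs.
toy.percent.var <- c(30, 20, 12, 3, 2.5, 2, 1.5, 1, 0.8, 0.5)
toy.elbow <- find_elbow_sketch(toy.percent.var)
toy.elbow
```

With these toy values the three large leading PCs pull the chord upwards, so the farthest point sits right where the curve flattens out, which matches the "shift to the left" behaviour described in the text.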

Using the technical noise

Another strategy is to retain all PCs until the percentage of total variation explained reaches some threshold \(T\). For example, we might retain the top set of PCs that explains 80% of the total variation in the data. Of course, it would be pointless to swap one arbitrary parameter \(d\) for another \(T\). Instead, we derive a suitable value for \(T\) by calculating the proportion of variance in the data that is attributed to the biological component. This is done using the denoisePCA() function with the variance modelling results from modelGeneVarWithSpikes() or related functions, where \(T\) is defined as the ratio of the sum of the biological components to the sum of total variances. To illustrate, we use this strategy to pick the number of PCs in the 10X PBMC dataset.

library(scran)
set.seed(111001001)
denoised.pbmc <- denoisePCA(sce.pbmc, technical=dec.pbmc, subset.row=top.pbmc)
ncol(reducedDim(denoised.pbmc))

The dimensionality of the output represents the lower bound on the number of PCs required to retain all biological variation. This choice of \(d\) is motivated by the fact that any fewer PCs will definitely discard some aspect of biological signal. (Of course, the converse is not true; there is no guarantee that the retained PCs capture all of the signal, which is only generally possible if no dimensionality reduction is performed at all.) From a practical perspective, the denoisePCA() approach usually retains more PCs than the elbow point method as the former does not compare PCs to each other and is less likely to discard PCs corresponding to secondary factors of variation. The downside is that many minor aspects of variation may not be interesting (e.g., transcriptional bursting) and their retention would only add irrelevant noise.

Note that denoisePCA() imposes internal caps on the number of PCs that can be chosen in this manner. By default, the number is bounded within the “reasonable” limits of 5 and 50 to avoid selection of too few PCs (when technical noise is high relative to biological variation) or too many PCs (when technical noise is very low). For example, applying this function to the Zeisel brain data hits the upper limit:

set.seed(001001001)
denoised.zeisel <- denoisePCA(sce.zeisel, technical=dec.zeisel, 
    subset.row=top.zeisel)
ncol(reducedDim(denoised.zeisel))
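The threshold logic described above can be sketched in base R with made-up numbers. This only illustrates the arithmetic (the ratio \(T\), the cumulative cut-off and the 5 to 50 bounds); it is not the internal code of denoisePCA(), and truncating negative biological components at zero is an assumption of this sketch.

```r
# Made-up per-gene variances; in practice these would come from a variance
# modelling step such as the ones named in the text.
total.var <- c(5.0, 3.0, 2.0, 1.0, 1.0)
tech.var  <- c(1.0, 1.0, 1.0, 0.8, 0.7)

# Biological component per gene (truncation at zero is an assumption here).
bio.var <- pmax(total.var - tech.var, 0)

# T: ratio of the summed biological components to the summed total variances.
T.prop <- sum(bio.var) / sum(total.var)

# Made-up proportions of variance explained by successive PCs.
prop.var <- c(0.40, 0.20, 0.15, 0.10, 0.08, 0.07)

# Smallest number of PCs whose cumulative proportion reaches T.
d.raw <- which(cumsum(prop.var) >= T.prop)[1]

# Bound the choice within the default limits mentioned in the text.
d.capped <- min(max(d.raw, 5L), 50L)

c(T = T.prop, raw = d.raw, capped = d.capped)
```

In this toy case the uncapped choice is small, so the lower bound takes over, which mirrors the situation where technical noise is high relative to biological variation.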

This method also tends to perform best when the mean-variance trend reflects the actual technical noise, i.e., estimated by modelGeneVarByPoisson() or modelGeneVarWithSpikes() instead of modelGeneVar() (link("sec:spikeins", "OSCA.basic")). Variance modelling results from modelGeneVar() tend to understate the actual biological variation, especially in highly heterogeneous datasets where secondary factors of variation inflate the fitted values of the trend. Fewer PCs are subsequently retained because \(T\) is artificially lowered, as evidenced by denoisePCA() returning the lower limit of 5 PCs for the PBMC dataset:

dec.pbmc2 <- modelGeneVar(sce.pbmc)
denoised.pbmc2 <- denoisePCA(sce.pbmc, technical=dec.pbmc2, subset.row=top.pbmc)
ncol(reducedDim(denoised.pbmc2))

Based on population structure

Yet another method to choose \(d\) uses information about the number of subpopulations in the data. Consider a situation where each subpopulation differs from the others along a different axis in the high-dimensional space (e.g., because it is defined by a unique set of marker genes). This suggests that we should set \(d\) to the number of unique subpopulations minus 1, which guarantees separation of all subpopulations while retaining as few dimensions (and noise) as possible. We can use this reasoning to loosely motivate an a priori choice for \(d\) - for example, if we expect around 10 different cell types in our population, we would set \(d \approx 10\).

In practice, the number of subpopulations is usually not known in advance. Rather, we use a heuristic approach that uses the number of clusters as a proxy for the number of subpopulations. We perform clustering (graph-based by default, see link("clustering-graph", "OSCA.basic")) on the first \(d^*\) PCs and only consider the values of \(d^*\) that yield no more than \(d^*+1\) clusters. If we detect more clusters with fewer dimensions, we consider this to represent overclustering rather than distinct subpopulations, assuming that multiple subpopulations should not be distinguishable on the same axes. We test a range of \(d^*\) and set \(d\) to the value that maximizes the number of clusters while satisfying the above condition. This attempts to capture as many distinct (putative) subpopulations as possible by ret…
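The selection rule in the paragraph above can be written down in a few lines of base R. The cluster counts below are invented for illustration (in a real analysis they would come from clustering on the first \(d^*\) PCs), and breaking ties in favour of the smallest \(d^*\) is an assumption of this sketch.

```r
# Candidate numbers of PCs and the (made-up) number of clusters found with each.
candidate.d <- 5:12
n.clusters  <- c(7, 8, 8, 9, 10, 10, 11, 11)

# Keep only candidates that yield no more than d* + 1 clusters.
ok <- n.clusters <= candidate.d + 1

# Among those, take the candidate with the most clusters; which.max() returns
# the first maximum, so ties go to the smallest d*.
chosen.d <- candidate.d[ok][which.max(n.clusters[ok])]
chosen.d
```

Here the two smallest candidates are rejected as overclustered, and the choice falls on the first candidate that reaches the largest permitted cluster count.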

Aaron Lun (03:08:28): > Just run it on Zeisel and stick it at the end after my failed attempt to use NMF.

Alan O’C (05:38:49): > @Alan O’C has joined the channel

Alan O’C (05:39:48): > Sure, I don’t promise to be extremely quick about it though

2021-03-19

Aaron Lun (18:04:19): > @Kelly Street @Koen Van den Berge @Hector Roux de Bézieux would any/some/all of you be interested in creating an OSCA.trajectory subbook?

Aaron Lun (18:04:47): > BioC’s book infrastructure gives you a lot of space (and compute time) to go into a lot of detail about the best workflow and various options on real, big datasets.

Aaron Lun (18:05:46): > And of course, if you write the book, you can pick the software to use.

2021-03-20

Kelly Street (22:00:58): > Yeah, I may be interested. I don’t know if there is a whole book’s worth of stuff to say there, but it definitely would be nice to have a big unified resource to point at.

2021-03-21

Alexander Toenges (07:47:02): > Can I give some suggestions towards points that I would find valuable and which are frequently asked or discussed e.g. at biostars or various bioinfo Slacks: > * choice of dim. reduction – which are recommended, how does outcome differ and which are clearly not recommended > * common pitfalls during analysis, choice of alternatives to the default options and how that influences results / in which situation can it be beneficial to change defaults > * useful diagnostics to decide whether a trajectory makes sense > * basically just seeing the tools in action together with some nice plotting and analysis examples can already give inspiration to many users and would therefore be valuable > * a good description of the method to new users that is somewhat easy to understand without reading the entire paper that the tool is based on, strengths and limitations, hands-on advice, something like this. Especially this point is imho (among many others) a key advantage of OSCA vs e.g. Seurat vignettes which often do not explain at all what is going on under the hood

Aaron Lun (14:13:03): > bring on the issues or PRs.

Alexander Toenges (16:08:37): > that was all related to the trajectory section you mentioned.

Aaron Lun (20:52:25): > oh

Aaron Lun (20:53:12): > well, @Kelly Street, I’m sure that between the vignettes for slingshot, tradeSeq, and condiment, there’s enough for three chapters. Plus another few for interesting case studies. And I’m sure we could flesh out the RNA velocity section as well in the current chapter.

Aaron Lun (20:53:22): > If everyone chips in with a chapter, we’ll have this done in no time.

2021-03-22

Koen Van den Berge (07:38:24): > @Koen Van den Berge has joined the channel

Kelly Street (11:15:31): > Sounds good, I’d be happy to help out!

Koen Van den Berge (17:26:28): > Thanks for the suggestion Aaron, we’re hoping to be able to do this!

Alan O’C (21:11:23): > To get bookdown to run, I had to install_github the book sections (otherwise packageDescription doesn’t find OSCA.advanced). Is that right or did I just miss a trick somewhere?

Aaron Lun (22:13:27): > Hm. I thought OSCA.advanced was discoverable by BiocManager::install, at least for devel. Perhaps @Hervé Pagès might know some specifics.

2021-03-23

Hervé Pagès (06:17:33): > The latest BiocManager knows about the book repo: > > > library(BiocManager) > Bioconductor version 3.13 (BiocManager 1.30.11), ?BiocManager::install for help > > repositories() > BioCsoft > "https://bioconductor.org/packages/3.13/bioc" > BioCann > "https://bioconductor.org/packages/3.13/data/annotation" > BioCexp > "https://bioconductor.org/packages/3.13/data/experiment" > BioCworkflows > "https://bioconductor.org/packages/3.13/workflows" > BioCbooks > "https://bioconductor.org/packages/3.13/books" > CRAN > "https://cran.rstudio.com" > > but OSCA.advanced is not there because I don’t think it has built since I moved the book builds from rex3 to malbec2 on March 10.

Philipp Schäfer (13:02:49): > @Philipp Schäfer has joined the channel

2021-03-24

Hervé Pagès (11:20:37): > Book builds are all green today https://bioconductor.org/checkResults/3.13/books-LATEST/ and BiocManager::install("OSCA.advanced") now should work. :tada:

Aaron Lun (11:22:16): > oh, it’s beautiful

Hervé Pagès (11:28:11): > yes and the perfect horizontal alignment of the names in the Maintainer column is neat:wink:

Aaron Lun (11:34:43): > there’s more where that came from!

Aaron Lun (11:35:01): > actually, there is more, OSCA.multisample should be making its way to you at some point.

Nitesh Turaga (11:35:11): > @Hervé Pagès @Aaron Lun Are there any packages which are not in Bioconductor or CRAN which are needed to build the book?

Nitesh Turaga (11:35:20): > @Vince Carey Might be interested in this question too

Aaron Lun (11:35:30): > not beyond other book packages.

Aaron Lun (11:35:39): > the books need each other to build.

Nitesh Turaga (11:35:53): > Huh…i might need you to elaborate a little bit…

Aaron Lun (11:36:05): > the books link to each other, and they share R objects between each other.

Aaron Lun (11:36:22): > So they basically Suggests each other so that everyone is present at install time.

Nitesh Turaga (11:36:52): > Oh i see….so if I were to build this book on the AnVIL, i’d have to build all of them …

Aaron Lun (11:37:19): > Technically, you have to install all of them, even if you only build one.

Hervé Pagès (12:13:57) (in thread): > No. The build system only knows about the CRAN and BioC repos, like BiocManager::install() does, and installs everything from that.

Hervé Pagès (23:56:10): > Maybe you’re just simplifying but if that’s literally the case then this should be reflected in the deps. For example OSCA.advanced only imports OSCA.workflows and suggests OSCA.basic. This means that it only needs those 2 subbooks to be installed before it can be built. Please don’t assume that the build system will always install all the subbooks before it starts building them. In theory it should be enough that it installs A’s deps (including suggested packages) before it can build A.

2021-03-25

Aaron Lun (00:33:05): > Yes, that’s right.

Aaron Lun (00:34:31): > The problem is that BiocManager::install() apparently doesn’t install the other books, so it’s just safest to install them all if one is doing it manually.

Hervé Pagès (01:55:26): > By “the other books” I guess you mean those in Suggests. Yeah it’s a bummer that using dependencies=c("Depends", "Imports", "LinkingTo", "Suggests") treats Suggests recursively, leading to installation of hundreds of unneeded packages. Has always been a deal breaker for me. It’s sad because I suspect that the typical use case for passing "Suggests" to the dependencies argument, at least for developers, is to install everything that will be needed to run R CMD build or R CMD check on the package.
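The difference between "A's direct deps including Suggests" and "Suggests followed recursively" can be seen with tools::package_dependencies() on a toy package database. This is a sketch under stated assumptions: only the OSCA.advanced row follows what is said in the thread (imports OSCA.workflows, suggests OSCA.basic); every other row, and the helper mk_row, is hypothetical.

```r
library(tools)

# Hypothetical helper to build one row in the shape of available.packages().
mk_row <- function(pkg, imports = NA, suggests = NA) {
  c(Package = pkg, Depends = NA, Imports = imports,
    LinkingTo = NA, Suggests = suggests)
}

# Toy database; all rows except OSCA.advanced are invented for illustration.
db <- rbind(
  mk_row("OSCA.advanced", imports = "OSCA.workflows", suggests = "OSCA.basic"),
  mk_row("OSCA.workflows", suggests = "OSCA.multisample"),
  mk_row("OSCA.basic", suggests = "OSCA.intro"),
  mk_row("OSCA.multisample"),
  mk_row("OSCA.intro")
)
rownames(db) <- db[, "Package"]

fields <- c("Depends", "Imports", "LinkingTo", "Suggests")

# What is needed to build OSCA.advanced: its own deps, Suggests included.
direct <- package_dependencies("OSCA.advanced", db = db,
                               which = fields, recursive = FALSE)[[1]]

# What gets pulled in if Suggests is also followed for every dependency.
everything <- package_dependencies("OSCA.advanced", db = db,
                                   which = fields, recursive = TRUE)[[1]]

direct
everything
```

As far as I know, install.packages(pkgs, dependencies = TRUE) is documented to apply Suggests only to the named packages and to use just Depends, Imports and LinkingTo for the packages it adds, which is closer to the "install what is needed to build A" use case than passing the four field names explicitly.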

Aaron Lun (02:23:53): > On other matters: is OSCA.multisample going through the system automatically, or do you have to manually add it to the book builds?

Hervé Pagès (10:54:50): > I shouldn’t have to do anything.

Hervé Pagès (10:59:53): > Oh, looks like it’s been added to the workflows manifest instead of the books manifest. Let me fix that.

Hervé Pagès (11:05:07): > done

2021-03-26

Hervé Pagès (13:34:48): > OSCA.multisample has joined the party: https://bioconductor.org/checkResults/3.13/books-LATEST/

Aaron Lun (13:35:14): > great

Aaron Lun (13:35:24): > failure is expected, I’ll have a look at it later.

2021-03-27

Aaron Lun (21:43:35): > Was I meant to be able to do BiocManager::install("OSCA"), or are we still waiting for BiocManager on CRAN?

Hervé Pagès (22:26:33): > Works for me. But this is only supported starting with BioC 3.13. Are you trying to do this in release?

Aaron Lun (23:15:48): > I’m getting the old: > > > BiocManager::install("OSCA") > Bioconductor version 3.13 (BiocManager 1.30.10), R Under development (unstable) > (2021-01-24 r79876) > Installing package(s) 'OSCA' > Warning message: > package ‘OSCA’ is not available for this version of R > > on a mostly valid() installation. (Ignoring packages that are too new because I’m developing them).

2021-03-28

Hervé Pagès (02:01:08): > Hmm.. I have BiocManager 1.30.11. Oh right, I probably installed it at some point from GitHub (https://github.com/Bioconductor/BiocManager) because I wanted to give it a try. Can’t remember exactly what I did, that must have been at least 1 month ago. Sorry for the confusion.

Aaron Lun (22:22:33): > Hm. I’m still not seeing OSCA on 1.30.12.

2021-03-29

Hervé Pagès (15:03:08) (in thread): > > > library(BiocManager) > Bioconductor version 3.13 (BiocManager 1.30.12), ?BiocManager::install for help > > > repositories() > BioCsoft > "https://bioconductor.org/packages/3.13/bioc" > BioCann > "https://bioconductor.org/packages/3.13/data/annotation" > BioCexp > "https://bioconductor.org/packages/3.13/data/experiment" > BioCworkflows > "https://bioconductor.org/packages/3.13/workflows" > BioCbooks > "https://bioconductor.org/packages/3.13/books" > CRAN > "https://cran.rstudio.com" > > > install("OSCA") > Bioconductor version 3.13 (BiocManager 1.30.12), R Under development (unstable) > (2021-03-08 r80083) > Installing package(s) 'OSCA' > trying URL 'https://bioconductor.org/packages/3.13/books/src/contrib/OSCA_1.1.17.tar.gz' > Content type 'application/x-gzip' length 444451 bytes (434 KB) > ================================================== > downloaded 434 KB > > * installing *source* package 'OSCA' ... > ** using staged installation > ** inst > ** help > No man pages found in package 'OSCA' > *** installing help indices > ** building package indices > ** installing vignettes > ** testing if installed package can be loaded from temporary location > Warning: replacing previous import 'BiocStyle::pdf_document' by 'rmarkdown::pdf_document' when loading 'OSCA' > Warning: replacing previous import 'BiocStyle::md_document' by 'rmarkdown::md_document' when loading 'OSCA' > Warning: replacing previous import 'BiocStyle::html_document' by 'rmarkdown::html_document' when loading 'OSCA' > ** testing if installed package can be loaded from final location > Warning: replacing previous import 'BiocStyle::pdf_document' by 'rmarkdown::pdf_document' when loading 'OSCA' > Warning: replacing previous import 'BiocStyle::md_document' by 'rmarkdown::md_document' when loading 'OSCA' > Warning: replacing previous import 'BiocStyle::html_document' by 'rmarkdown::html_document' when loading 'OSCA' > ** testing if installed package keeps a record of temporary installation path > * DONE (OSCA) > > The downloaded source packages are in > '/tmp/RtmpVtMvYq/downloaded_packages' > Updating HTML index of packages in '.Library' > Making 'packages.html' ... done

2021-03-30

Aaron Lun (02:24:33) (in thread): > huh.

Aaron Lun (02:24:50) (in thread): > wasn’t working last night, but whatever.

Hervé Pagès (12:52:12) (in thread): > ¯\_(ツ)_/¯

2021-03-31

Ana Beatriz Villaseñor Altamirano (14:31:04) (in thread): > I had the same problem with install("OSCA") but restarting R session solved the issue. Maybe a broken connection with a repo?

Hervé Pagès (15:18:13) (in thread): > or maybe some caching (local or on the Amazon CloudFront side) got in the way

2021-04-01

RGentleman (12:07:32): > @RGentleman has joined the channel

2021-04-02

Hervé Pagès (12:24:45): > @Aaron Lun Clicking on the “Orchestrating Single-Cell Analysis with Bioconductor” link in https://bioconductor.org/books/3.13/OSCA.multisample/ takes me to a plain HTTP connection. Any chance relative URLs can be used to cross link the sub-books? - Attachment (bioconductor.org): Multi-Sample Single-Cell Analyses with Bioconductor

2021-04-03

Vince Carey (05:45:42): > I’d vote for having a landing page like a package landing page, that provides information on installation. I think there is a plan for this, @Lori Shepherd @Hervé Pagès

Lori Shepherd (05:45:51): > @Lori Shepherd has joined the channel

Aaron Lun (06:26:52): > this has been discussed.

Aaron Lun (06:36:15) (in thread): > Not easily. The books are all in different packages, and no book knows the relative path to any other book. One could make it work by applying insider knowledge of how you organized the books’ HTMLs, but it would mean that those links would be useless for anyone attempting to build the books outside of the BBS.

Aaron Lun (06:37:13) (in thread): > I know such people exist because I’ve received a fair number of people asking about the inane license. I don’t know how many people are actively rebuilding the entire set of books, but I could well imagine that they’re tweaking individual chapters and compiling them, and the links should at least be comprehensible in those cases.

Aaron Lun (06:39:12) (in thread): > If there is a compelling reason to have relative links, I guess we could have rebook respond to an environment variable that specifies the path to the directory containing all the books. That would be pretty fragile and would break for any book with Rmd’s in subdirectories, but it would work well enough for the current set-up.
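For illustration, the environment-variable idea floated in this thread might look roughly like the base-R sketch below. Everything here is hypothetical: book_link is not a rebook function, BOOK_BASE_URL is an invented variable name, and clustering.html is a made-up page name. Only the https://bioconductor.org/books/3.13 prefix is taken from URLs quoted in the channel.

```r
# Hypothetical helper (NOT part of rebook): build a cross-book link from a
# base location that can be overridden through an environment variable.
book_link <- function(book, page,
                      base = Sys.getenv("BOOK_BASE_URL",
                                        "https://bioconductor.org/books/3.13")) {
  base <- sub("/+$", "", base)  # drop any trailing slashes
  paste(base, book, page, sep = "/")
}

# Default: an absolute HTTPS link.
Sys.unsetenv("BOOK_BASE_URL")
absolute.link <- book_link("OSCA.basic", "clustering.html")

# With the (invented) variable pointing one directory up, the same call gives a
# relative link, assuming all books are deployed side by side.
Sys.setenv(BOOK_BASE_URL = "..")
relative.link <- book_link("OSCA.basic", "clustering.html")
Sys.unsetenv("BOOK_BASE_URL")

absolute.link
relative.link
```

The relative form only resolves correctly when every page sits exactly one level below the shared directory, which is the fragility with Rmd files in subdirectories that was pointed out above.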

Lori Shepherd (10:12:37): > Yes. We just haven’t gotten to it yet. We need to make a views page for it and then I can make updates for auto generation on the website code.

Hervé Pagès (15:14:13) (in thread): > How about working under the assumption that all the sub-books are going to be deployed under https://bioconductor.org/books/3.13/OSCA/?

Hervé Pagès (15:16:33) (in thread): > or under http://some.place.org/some/path/OSCA

Hervé Pagès (15:19:14) (in thread): > then no need for an environment variable to control rebook?

Aaron Lun (15:40:52) (in thread): > I can’t see how this will work easily. It would require rebook::link to understand whether the target book was in the same “family” as the book that is currently being compiled. There’s no concept of subbooks in rebook; when I talk about the OSCA subbooks, this is just how I organize the books in my head, there’s no code to reinforce it formally.

Aaron Lun (15:42:53) (in thread): > One could add a family concept but it is superfluous for anything else. And even then, the relative paths still introduce a lot of fragility and assumptions that I mentioned above.

Hervé Pagès (15:55:55) (in thread): > It seems that you prefer to think of the OSCA subbooks as a collection of separated books that can be hosted in arbitrary places. I guess in my mind the subbooks are parts of the same big book with the OSCA component as the main component to put them all together. > Anyways, let’s use an environment variable to tell rebook the URL where all the components are going to be hosted. Will that address the original issue that if someone starts navigating the book in HTTPS mode they’ll remain in HTTPS mode, and if they start in HTTP mode they’ll remain in HTTP mode?

Aaron Lun (16:01:21) (in thread): > I didn’t realize that was the original issue. Well, it seems like BiocStyle::Biocbook creates a HTTP link rather than HTTPS. Seems like the simplest fix is to just make it use HTTPS all the time, we shouldn’t be using HTTP anyway.

Hervé Pagès (16:02:47) (in thread): > Yes, there’s the issue of remaining in HTTPS mode. But I also want to emphasize that relative URLs are so much better than absolute URLs so should be used whenever possible. For example, if the book uses relative URLs, then we can put the entire HTML version of the book in a tarball. Then people can extract the tarball on their laptop and can read and navigate the book even when they are offline.

Hervé Pagès (16:05:59) (in thread): > And using relative URLs also solves the issue of preserving the HTTP/HTTPS mode.

2021-04-04

Aaron Lun (07:38:07) (in thread): > But that’s the thing. What belongs in this hypothetical tarball? Each book can rebook::link to any other BBS-produced book, so if we were to use relative paths, the tarball would potentially need to include every book in the build cohort. For example, I could link to non-OSCA books like SingleRBook or the yet-to-be-published OSTA book. Even if you set it up so that people have to download the OSCA books as a set, I don’t have a mechanism for producing relative links within the OSCA suite (as mentioned above, there isn’t even a concept of such a suite) and HTTP(S) links to other books.

Hervé Pagès (14:53:36) (in thread): > Well, it’s really a matter of point of view here. If you consider that OSCA.intro and OSCA.basics are not more connected to each other than OSCA.intro and csawBook, then there’s indeed no reason to worry that all the cross-linking uses external links. It’s still kind of unfortunate though that this breaks mirroring.

Aaron Lun (17:01:07): > In any case, rebook (and basilisk) have both been refactored to use dir.expiry. rebook has an expiry interval of 7 days, while basilisk has an expiry interval of 30 days. So e.g. if I bumped a book’s version at every build for a week, I would keep a maximum of 4 caches before one expires.

2021-04-08

Aaron Lun (03:00:46): > @Kelly Street @Koen Van den Berge https://github.com/OSCA-source/OSCA.trajectory

Aaron Lun (03:01:20): > slowly migrating all content there. I’ll need you to expand on the chapters for slingshot, tradeSeq and the other package whose name I’ve forgotten.

Stephanie Hicks (07:45:53) (in thread): > condiments

2021-04-17

Aaron Lun (00:50:27): > @Alan O’C so am I going to get a PR for your reddim packages?

2021-04-18

Alan O’C (04:04:51): > Yeah, it’s on the way (ish)

Aaron Lun (20:53:05): > great. don’t worry about building the whole book, just make sure that chapter works.

2021-04-19

Alan O’C (08:04:54): > All good. Code’s written and I know what the text will be I just haven’t gotten as far as moving it from my brain into a text editor just yet

2021-05-11

Megha Lal (16:45:19): > @Megha Lal has joined the channel

2021-05-14

Aaron Lun (01:29:34): > @Alan O’C what would it take to get you to write a short section on “visualization options for marker detection” at the bottom of marker-detection.Rmd in OSCA.basic?

Aaron Lun (01:30:25): > i can offer you a 20% off voucher for a dispensary

Aaron Lun (01:30:47): > and I’ll sweeten the pot with free delivery

Alan O’C (04:20:43): > Will see how bored I am this weekend (probably very)

Alan O’C (07:47:44): > There may be lots to do; I’m just a very boring person

2021-06-11

Wes W (11:33:00) (in thread): > did you do this? my saturday plans were canceled and I am happy to do some visualization options write up

Alan O’C (11:34:11) (in thread): > I did not

2021-06-12

Wes W (23:16:01) (in thread): > sending you a dm , i’ll jump on it

2021-07-16

Lori Shepherd (12:43:03): > @Lori Shepherd has left the channel

2021-07-19

Leo Lahti (17:01:56): > @Leo Lahti has joined the channel

2021-07-23

Leo Lahti (14:13:31): > As far as I can see, the runMDS function does not accept distance matrices that are calculated outside of this function. Correct? This would be handy because in some cases a user (like me) would like to calculate distances based on measures that are not included in the runMDS options (vegan::vegdist). Is there such an option, or would it be possible to add such an option? Perhaps @Aaron Lun knows best?

Aaron Lun (14:14:53): > uh

Aaron Lun (14:15:06): > you could put a request in scater’s repo

Aaron Lun (14:15:24): > i think there was some work related to runMDS done by @Alan O’C and someone else - maybe @FelixErnst

Leo Lahti (14:28:58): > Oh, ok. The function manpage mentioned you as the function author and this relates to some OSCA examples, so I thought this could be a natural place to chat before opening an issue. But yes, I could continue with scater unless the others have comments.

Batool Almarzouq (15:53:44): > @Batool Almarzouq has joined the channel

Alan O’C (15:59:24): > I thought the current version does support custom distance matrices, but yeah it was Felix’s work. See https://github.com/Alanocallaghan/scater/pull/126 (I thought there was an issue too but I can’t find it)

Alan O’C (16:00:25): > Well not distance matrices but a function to calculate distance matrices is about the same

2021-07-24

FelixErnst (06:50:23): > @FelixErnst has joined the channel

FelixErnst (06:53:01): > I don’t think that runMDS can use a distance matrix in any event, since the function actually does create such a matrix and stores it in reducedDims. An appropriate distance matrix can of course be stored in reducedDims, but I think for a discussion the scater github repo would be better suited
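The workaround discussed in this thread (compute the distances yourself, run the MDS outside of runMDS, then store the coordinates) can be sketched in base R. This is only an illustration under stated assumptions: the count matrix is simulated, `bray_curtis()` is a hypothetical hand-written stand-in for `vegan::vegdist`, and the final `reducedDim()` assignment is shown as a comment because it needs a SingleCellExperiment object.

```r
# Simulated counts: 20 features (rows) x 10 samples (columns).
set.seed(1)
counts <- matrix(
  rpois(200, lambda = 5), nrow = 20, ncol = 10,
  dimnames = list(paste0("feature", 1:20), paste0("sample", 1:10))
)

# Hypothetical helper: Bray-Curtis dissimilarity between columns,
# standing in for a measure such as vegan::vegdist().
bray_curtis <- function(m) {
  n <- ncol(m)
  d <- matrix(0, n, n, dimnames = list(colnames(m), colnames(m)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(m[, i] - m[, j])) / sum(m[, i] + m[, j])
    }
  }
  as.dist(d)
}

# Classical MDS on the externally computed distances.
d <- bray_curtis(counts)
coords <- cmdscale(d, k = 2)
colnames(coords) <- c("MDS1", "MDS2")

# With a SingleCellExperiment `sce` whose columns match rownames(coords),
# the result could then be stored alongside other embeddings:
# SingleCellExperiment::reducedDim(sce, "MDS_custom") <- coords
head(coords)
```

The coordinates keep the sample names as row names, which is what makes the later assignment into reducedDims line up with the columns of the object.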

2021-07-30

Kevin Blighe (16:17:21): > I wonder if there is any desired way to cite the SingleCellExperiment object graphical overview from OSCA? > So far, I have: > Amezquita R, Lun A, Hicks S, Gottardo R (2021), Chapter 4, Orchestrating Single-Cell Analysis with Bioconductor, http://bioconductor.org/books/3.13/OSCA.intro/the-singlecellexperiment-class.html.

Aaron Lun (16:18:20): > yes, that’s where it’s from

Aaron Lun (16:18:24): > oh, the paper.

Kevin Blighe (16:19:13): > Basically, just using the Figure 4.1 in my BioC 2021 slides

Aaron Lun (16:19:46): > just use the nat methods paper

2021-08-22

Leo Lahti (17:20:46): > Hi - I have been looking at OSCA source materials. I would be curious to know more about the motivation behind splitting the book into multiple repositories - is this explained somewhere, or easy to summarize briefly? Is it just about managing dependencies or speeding up conversions during development?

2021-08-23

Aaron Lun (11:51:57): > well, the entire book was a major pain in the ass to build in one go

Aaron Lun (11:52:13): > i’m sure the history of this channel would attest to that

Aaron Lun (11:52:48): > failures in any one step would bring the entire build to a halt. sometimes not even my fault, e.g., if the Hubs were acting up.

Aaron Lun (11:53:20): > By having multiple builds, there was an improved chance that, at any build cycle, one of the subbooks would make it all the way to the end.

Aaron Lun (11:53:48): > Then the next build cycle doesn’t have to worry about that subbook, and so on, until all the subbooks are built.

Alexander Toenges (12:31:37): > How long does it take to build the entire book?

Aaron Lun (12:36:10): > dunno. maybe 2-3 hours

Aaron Lun (12:36:19): > single thread. Probably less now that it’s distributed into parallel builds.

Alan O’C (12:40:03): > Even the sub-books are pretty hefty builds, to be fair

Leo Lahti (12:57:23): > Ok clear, thanks.

Leo Lahti (12:59:02): > I was asking as we are building the other book on microbiome studies, and now looking at how to organize. No immediate need to split into multiple repositories but this may become necessary.

2021-09-09

Julien Roux (01:58:24): > @Julien Roux has joined the channel

2021-09-16

Henry Miller (18:35:15): > @Henry Miller has joined the channel

Henry Miller (18:35:21): > @Henry Miller has left the channel

2022-01-28

Megha Lal (11:13:57): > @Megha Lal has left the channel

2022-02-25

Ana Beatriz Villaseñor Altamirano (11:25:00): > Hi! A quick question: for citing the new book OSCA, multi-sample, do you prefer a citation from the Bioconductor OSCA paper? Or cited as a book or GitHub repo? Or both? - Attachment (bioconductor.org): Multi-Sample Single-Cell Analyses with Bioconductor > Multi-Sample Single-Cell Analyses with Bioconductor - Attachment (Nature): Orchestrating single-cell analysis with Bioconductor > Nature Methods - This Perspective highlights open-source software for single-cell analysis released as part of the Bioconductor project, providing an overview for users and developers.

Ludwig Geistlinger (12:52:47): > This is likely best answered by Aaron, but I think you always want to cite the paper. Citing the corresponding chapter in addition is optional, but would certainly not hurt.

Ana Beatriz Villaseñor Altamirano (13:53:08) (in thread): > Thanks!!

2022-03-01

GuandongShang (03:47:15): > @GuandongShang has joined the channel

GuandongShang (03:49:52): > Hi, everyone. I am reading OSCA, and I am wondering whether there is a plot showing all the packages mentioned, like the one for the tidyverse family - File (PNG): image.png - File (PNG): image.png

Stephanie Hicks (08:30:25): > Not to my knowledge @GuandongShang

Federico Marini (08:47:18): > There’s a (notoriously incomplete) slide we use in some workshops for iSEE you can use as a starter - https://isee.github.io/iSEEWorkshop2020Slides/#7

Federico Marini (08:47:30): > but that is far from being comprehensive

Martin Morgan (10:26:24): > Maybe a start would be to visit each of https://github.com/orgs/OSCA-source/repositories and read the ‘Depends’ field of the DESCRIPTION files, along the lines of > > href = "https://raw.githubusercontent.com/OSCA-source/OSCA.advanced/master/DESCRIPTION" > pkgs = strsplit(read.dcf(url(href), "Depends"), ",[[:space:]]*")[[1]] > > Maybe the individual packages could be grouped based on biocViews, read (via read.dcf) from https://bioconductor.org/packages/devel/bioc/VIEWS (also data/experiment and data/annotation in addition to bioc)? > > For what it’s worth I think there are a lot of packages involved > > > db = available.packages(repos = BiocManager::repositories()) > > deps = unique(unlist(tools::package_dependencies(pkgs, db, recursive=TRUE))) > > length(deps) > [1] 266
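The `read.dcf` step above can be tried offline. The following is a minimal sketch that assumes a made-up DESCRIPTION file (the package name and its Depends entries are hypothetical) written to a temporary file, and adds two clean-up steps that are not in the original snippet: stripping version requirements and dropping the `R` entry.

```r
# Hypothetical DESCRIPTION contents; a real run would read the file
# from one of the OSCA-source repositories instead.
desc_lines <- c(
  "Package: OSCA.example",
  "Version: 0.0.1",
  "Depends: R (>= 4.0), scran, scater,",
  "    SingleCellExperiment"
)
desc_file <- tempfile(fileext = ".dcf")
writeLines(desc_lines, desc_file)

# Extract the Depends field and split it on commas, as in the message above.
depends <- read.dcf(desc_file, fields = "Depends")[1, "Depends"]
pkgs <- strsplit(depends, ",[[:space:]]*")[[1]]

# Extra clean-up (an addition for this sketch): drop version requirements
# such as "(>= 4.0)", empty entries, and R itself.
pkgs <- trimws(sub("[[:space:]]*\\(.*\\)$", "", pkgs))
pkgs <- setdiff(pkgs[nzchar(pkgs)], "R")
print(pkgs)
```

The resulting vector is what would be passed to `tools::package_dependencies()` together with the `available.packages()` database, a step that needs network access and is therefore left out here.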

2022-03-05

Giulio Benedetti (15:17:43): > @Giulio Benedetti has joined the channel

2022-03-21

Pedro Sanchez (05:02:14): > @Pedro Sanchez has joined the channel

2022-05-09

Chiaowen Joyce Hsiao (16:27:00): > @Chiaowen Joyce Hsiao has joined the channel

2022-05-19

Vince Carey (09:05:04): > We need to have some discussion of the maintenance of “book” resources distributed by Bioconductor. The OSCA book needs a maintenance team. At this time basic and multisample are failing to build in 3.15. Herve has identified a number of changes to upstream packages that cause failed sanity checks. It is good that the sanity checks are there, but identifying the source of failure can be complex. Changes to igraph and AUCell have been identified so far. > > It seems worthwhile to rescue the OSCA book for 3.15 if possible. Going forward we need some policies to sustainably introduce and maintain monograph-level resources in the project. One concept that has been broached is organizing and presenting book modules along the lines of the “conference workshop” contributions. We definitely want to reduce the role of special infrastructure or exceptional management (e.g., permitting lack of unit tests) in book management within the project.

Sean Davis (09:40:38) (in thread): > I’m happy to discuss the workshop package stuff that we use for Orchestra. @Mike Smith may have some thoughts given his experience with Wolfgang’s and Susan’s book.

Alan O’C (09:46:40) (in thread): > Perhaps some kind of revdep checks would be useful? Or a cron job that looks at the dep graph and identifies possibly breaking upstream changes (not just for books)

Leo Lahti (17:29:44) (in thread): > This is relevant for us as well, working on the OMA book which is somewhat similar to OSCA but related to microbiome studies. Ideally, we would like to distribute it later (when mature) through Bioc, in a similar way.

Peter Hickey (18:47:52) (in thread): > I would try to help in some way. OSCA has been / is an invaluable resource in my eyes

Ludwig Geistlinger (21:39:43) (in thread): > I second Pete and would also volunteer to help / join a maintenance team for the OSCA book.

2022-05-20

Alan O’C (03:48:19) (in thread): > I would also be happy to help, given the number of people I’ve recommended it to

Mike Smith (06:03:58) (in thread): > A collection of thoughts from our MSMB book > Dependencies > * Reliance on large numbers of packages (I think MSMB has 120 direct dependencies, ~ 500 packages get installed) and code rot is a definite problem. I’d estimate we get a build failure introduced by something outside our control every two months. > * We’re currently failing because the networksis package was removed from CRAN on 09-05-2022 > * I haven’t got any numbers to back this up, but I feel like removal of packages from CRAN is a more frequent problem than breaking changes in code we rely on - but it’s probably close to 50/50. > * I think we’ve had deprecated Bioc packages too, but there’s more warning that it’s going to happen so it isn’t as disruptive. > Build system > * The current build strategy is to use GitHub Actions to build the complete book independently with both the Bioc Release and Bioc Devel docker images every two weeks. We also build both versions whenever a change is pushed to GitHub, and the pipeline can be triggered manually too. > * All installed packages (both CRAN and Bioc) are updated before each run. > * In the case of a successful build with the Release workflow the resulting web pages are deployed immediately - this has gone wrong e.g. once the book “built” but several chapters were just missing! > * The devel workflow exists only to get a “heads up” on what we need to fix soon. It’s broken more often than not (perhaps indicative of how frequent problems are) but I will check the build logs periodically and either amend the book or contact the maintainer of the package that’s causing the error. > Book structure > * MSMB isn’t a package. It’s just a collection of Rmd files, which get processed with bookdown::render_book() + a small amount of extra code. > * Code in each chapter is independent of all the others. You can reference figures etc between chapters, but if an object should be reused it either needs to be saved as an RDS and loaded or created dynamically in both places. I think this rule was put in place very early in the writing process. > * The two points above let us build all the chapters in parallel. It’s not amazingly beneficial, but MSMB currently takes ~30 mins to build vs > 2hrs when it was a single process. > * It also allows reporting on the status of each chapter rather than just “the book failed somewhere” and reporting the first problem encountered. > Problem reporting > * Currently you have to check the status of the GitHub action page manually, probably because you realise a change hasn’t propagated to the website. Setting up an automated notification is on my TODO list. > * We don’t have formal unit tests. There are numerous stopifnot() statements in hidden code blocks. FWIW I don’t think one of these has ever triggered a build failure. > * We modify the knitr warning infrastructure to allow code blocks to have “known warnings” masked. That way we can suppress warnings that are known and not interesting e.g. ggplot2 warnings like “stat_contour..: Zero contours were generated” but still report new things that crop up over time. > * It’s hard to detect some types of change e.g. if a figure no longer looks like it did before, and we don’t make much effort to do so at the moment. > Responsibilities > * Fairly clear delimitation of responsibility for maintaining the book. > * Chapters are maintained by either Susan or Wolfgang. Any problem that requires modifying the ‘content’ of the book is handled by the respective chapter owner e.g. if a package disappears what alternative should be used or is the content removed? > * I maintain the build infrastructure and will make necessary changes to source files that don’t alter the meaning or interpretation of code e.g. default argument changes
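Two of the ideas in the message above, per-chapter status reporting and masking of “known warnings”, can be illustrated with plain R condition handling. This is a sketch only and not the MSMB build code: the `chapters` list holds hypothetical stand-in functions rather than rendered Rmd files, and `run_chapter()` and `known_warnings` are names invented for the illustration.

```r
# Hypothetical stand-ins for chapters; a real build would render Rmd files.
chapters <- list(
  intro = function() {
    x <- 1:10
    stopifnot(length(x) == 10)  # sanity check, as in a hidden code block
    invisible(x)
  },
  clustering = function() {
    warning("stat_contour: Zero contours were generated")
    invisible(NULL)
  },
  broken = function() stop("package 'foo' is not available")
)

# Warning messages that are expected and not worth reporting.
known_warnings <- c("Zero contours were generated")

# Run one chapter in isolation: errors become a status string,
# known warnings are silenced, and unknown warnings are collected.
run_chapter <- function(fun) {
  new_warnings <- character(0)
  status <- tryCatch(
    withCallingHandlers(
      {
        fun()
        "success"
      },
      warning = function(w) {
        msg <- conditionMessage(w)
        is_known <- any(vapply(known_warnings, grepl, logical(1),
                               x = msg, fixed = TRUE))
        if (!is_known) new_warnings <<- c(new_warnings, msg)
        invokeRestart("muffleWarning")
      }
    ),
    error = function(e) paste("failed:", conditionMessage(e))
  )
  list(status = status, new_warnings = new_warnings)
}

# Every chapter is attempted, so one failure does not hide the others.
report <- lapply(chapters, run_chapter)
status <- vapply(report, function(r) r$status, character(1))
print(status)
```

Because each chapter runs inside its own `tryCatch()`, the report lists the outcome of every chapter instead of stopping at the first problem encountered.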

Hervé Pagès (12:40:21) (in thread): > Thanks @Peter Hickey @Ludwig Geistlinger @Alan O’C for volunteering. We’re going to set you up soon.

Nitesh Turaga (13:35:46) (in thread): > @Peter Hickey @Ludwig Geistlinger @Alan O’C You are now maintainers and have access to all 6 OSCA repos. Please update the maintainer list as you progress through this.

Ludwig Geistlinger (13:41:55) (in thread): > Thanks, Nitesh. Based on Mike’s comments, a spreadsheet where folks can sign up for individual chapters will likely be a good start.

Hervé Pagès (14:14:38) (in thread): > Thanks @Nitesh Turaga! @Peter Hickey @Ludwig Geistlinger @Alan O’C The OSCA book is divided into sub-books (OSCA, OSCA.intro, OSCA.basic, etc…) and each sub-book is a package with Aaron currently as the maintainer. All the sub-books are on GitHub at https://github.com/OSCA-source. I’m about to give you access to all the repos there. First thing you guys will want to decide is who takes care of what sub-book and update the DESCRIPTION files accordingly. Then push the changes to both GitHub and git.bioconductor.org, and see your changes show up here in the next couple of days: > * release: https://bioconductor.org/checkResults/release/books-LATEST/ > * devel: https://bioconductor.org/checkResults/devel/books-LATEST/ > See https://bioconductor.org/checkResults/ for the frequency of these builds. > > FWIW I’ve made some progress on fixing the current errors. I’ve only applied my fixes to 3.16 at the moment but will port them to 3.15 soon. The error that we still see for OSCA.basic in 3.16 should go away when the AUCell folks address the following issue: https://github.com/aertslab/AUCell/issues/28

2022-05-21

Peter Hickey (18:28:13) (in thread): > Thanks. I’m on leave until June 8, happy for others to assign me a sub-book/chapters as you go through it

2022-05-23

Alan O’C (15:42:12) (in thread): > Thanks Hervé! Hopefully the AUCell folks oblige, because removing or working around it seems a pain. I’d be happy to take on OSCA and OSCA.multisample, as the former is basically just a collection of links. However if anybody feels they have expertise in multisample I’d be delighted to take one of the earlier books instead, as they’re probably substantially less work :smile: Also, if org PATs are a thing, it might be worth adding one to the repo so that the default action passes?

Vince Carey (16:02:20) (in thread): > @Alan O’C can you give a little more context about PAT and the default action? I think PATs are defined only at the user, not the organizational level, but a PAT for a user with organization access will work at the organization level unless its scope is limited.

Alan O’C (16:03:31) (in thread): > The default GHA for every repo is 401ing (rate limit), the log suggests a PAT might get around this https://github.com/OSCA-source/OSCA/runs/6530394271?check_suite_focus=true

Alan O’C (16:03:52) (in thread): > Oh, never mind! Actually invalid creds.

2022-06-01

Ludwig Geistlinger (11:28:49) (in thread): > Sorry just circling back after a week of vacation on this. Thanks @Alan O’C for getting started on this. Sounds good to me if you are taking on OSCA + OSCA.multisample. @Peter Hickey any preferences on how we divide OSCA.intro, OSCA.basic, OSCA.advanced, OSCA.workflows among us? I think OSCA.basic and OSCA.advanced are likely the largest chunks among those 4, and it might make sense to pair OSCA.basic and OSCA.workflows as the workflows seem to be based to the largest extent on basic processing. > > Means I could take on OSCA.intro and OSCA.advanced, and you OSCA.basic and OSCA.workflows, or vice versa? We can also check whether there are additional volunteers if the maintenance burden of two sub-books per person turns out to be too high.

Wes W (21:31:50) (in thread): > Happy to help also. did we setup a list for things that need doing so i can jump on and take a task? > > ah yes the AUCell issue, it broke some of my pipelines earlier this year but sorted it out with some dirty hacks…

2022-06-02

Ludwig Geistlinger (09:28:20) (in thread): > Sure @Wes W, if you are interested in taking over maintenance of one of the chapters OSCA.basic, OSCA.advanced, or OSCA.workflows, we can add you to the maintenance team.

Wes W (09:44:39) (in thread): > sounds good @Ludwig Geistlinger, yeah for continuity i could do clustering in both basic and advanced… I could grab droplet processing as well, I am actually giving a talk on it at the BioC2022 event in Seattle, so could be a good opportunity to maybe transfer some of that stuff in (although it’s heavily influenced by Luke’s approaches)

Wes W (09:48:39) (in thread): > i do a lot of multimodal sc work , and while i think there is a lot of updates the book needs in this area, to be honest with my ambition vs my time availability , I dont think I would have the time to update that section

Hervé Pagès (11:59:22) (in thread): > Hi @Wes W, I’d be happy to grant you write access to the OSCA book on GitHub. What’s your GitHub username?

Alan O’C (12:30:33) (in thread): > Feel free to chip in on https://github.com/OSCA-source/OSCA/issues/3

Wes W (17:14:07) (in thread): > @Hervé Pagès it’s “Varix”

Wes W (17:24:13) (in thread): > https://github.com/Varix

Hervé Pagès (18:07:24) (in thread): > Ok. You should have received invites to collaborate to the various OSCA repos. Please accept ASAP. Thanks!

Wes W (20:14:51) (in thread): > accepted :smiley:

2022-06-21

Kozo Nishida (11:45:23): > Hi all, > The OSCA book licenses are different in https://bioconductor.org/books/release/ (CC BY-NC-ND 3.0 US) and https://bioconductor.org/books/release/OSCA/ (CC BY 4.0). > Which is correct? > (I hope CC BY 4.0 for the translation.) - Attachment (bioconductor.org): Orchestrating Single-Cell Analysis with Bioconductor > Or: how I learned to stop worrying and love the t-SNEs.

2022-06-28

Wes W (12:11:09) (in thread): > i have a long trip coming up (70 hours of travel, 35 hours each way) and can really get into some of the issues for the area of the book I was assigned and do some maintenance and updates. Is anyone free early next week to help me pump out a quick to-do list? else I can just eyeball without priority issues I find.

Peter Hickey (19:18:34) (in thread): > I’ve just come down with COVID, but perhaps by then I’ll be feeling better and can help? I do not envy you with that amount of travel :grimacing:

Wes W (20:32:15) (in thread): > haha thanks :smiley: some places are just really hard to get to still

Peter Hickey (21:13:43) (in thread): > living in Australia, I’d say most places are really hard to get to :wink:

2022-07-15

Ashley Robbins (15:18:27): > @Ashley Robbins has joined the channel

2022-07-28

Mervin Fansler (17:20:25): > @Mervin Fansler has joined the channel

2022-07-29

Alex Mahmoud (16:55:28): > @Alex Mahmoud has joined the channel

2022-08-30

Lori Shepherd (13:41:58): > @Lori Shepherd has joined the channel

2022-10-06

Devika Agarwal (05:37:40): > @Devika Agarwal has joined the channel

2022-11-24

Peter Hickey (19:37:49): > saw this getting a bit of publicity on twitter https://twitter.com/AnnaCSchaar/status/1595820503666565120 - Attachment (twitter): Attachment > 1/n Looking for the best practices in your day-to-day unimodal or multimodal single-cell data analysis for single-cell transcriptomics, chromatin accessibility, surface protein, TCR/BCR or spatial omics? We have your back with the https://sc-best-practices.org online book. https://twitter.com/fabian_theis/status/1595810545365565447 - Attachment (twitter): Attachment > 1/n Single-cell data analysis is complex with many tools to choose from - but which ones work best and should be used when? Led by @LukasHeumos and @AnnaCSchaar, we wrote a free online book that aims to guide single-cell data analysts: https://sc-best-practices.org

Peter Hickey (19:38:18): > Took a quick look and just suggested they take a bit more care when discussing prior work https://github.com/theislab/single-cell-best-practices/issues/114 - Attachment: #114 Care when summarising prior art > Disclosure: I’m a maintainer of the Orchestrating Single Cell Analysis with Bioconductor book (OSCA). > > Hi, > > I’ve just started reading your guide and look forward to learning more about the Python-side of single-cell analysis, in particular.
> I’m confident your guide will garner a lot of interested readers, and so I’d like to ask that you please take care when discussing prior art in this area.
> For example, some of what you currently have about OSCA is incorrect or misleading, such as: > > > However, [OSCA] does not comprise advanced topics such as RNA velocity, spatial transcriptomics and others. Moreover, additional modalities such as ATAC-Seq or CITE-Seq data, or the multimodal integration of these are not covered. > > OSCA does have some content on: > > • <https://bioconductor.org/books/3.16/OSCA.advanced/trajectory-analysis.html#rna-velocity|RNA Velocity> > • <https://bioconductor.org/books/3.16/OSCA.advanced/integrating-with-protein-abundance.html#integrating-with-protein-abundance|Integrating with protein abundance (e.g., CITE-seq)> > • <https://bioconductor.org/books/3.16/OSCA.advanced/integrating-with-protein-abundance.html#integration-with-gene-expression-data|Multimodal integration (e.g. integrating RNA and antibody data)> > > OSCA is not intended to cover scATAC-seq (or other single-cell ’omics) or spatial transcriptomics, but I’d argue it covers much more than “basic single-cell RNA-Seq analysis”. > > For spatial transcriptomics, there is Orchestrating Spatially-Resolved Transcriptomics Analysis with Bioconductor (OSTA), which is perhaps not as well-developed as OSCA but has been around since 2020. > > Thanks for your consideration,
> Pete

Stephanie Hicks (20:16:41): > Thank you @Peter Hickey for opening up the issue. Hopefully they will be responsive.

Stephanie Hicks (20:40:44): > I guess one positive spin is “Imitation is the sincerest form of flattery” – Oscar Wilde :upside_down_face: - File (PNG): Screenshot 2022-11-24 at 8.37.12 PM.png - File (PNG): Screenshot 2022-11-24 at 8.37.25 PM.png

Stephanie Hicks (20:50:57): > Upon reading through it a bit, maybe it’s still more of a work in progress? I’m excited to see where it goes though. But I’m not sure I get the purpose of some of the chapters, e.g. the entire feature selection chapter is just demo-ing scry (https://bioconductor.org/packages/scry) from what I can tell? https://www.sc-best-practices.org/preprocessing_visualization/feature_selection.html - Attachment (Bioconductor): scry > Many modern biological datasets consist of small counts that are not well fit by standard linear-Gaussian methods such as principal component analysis. This package provides implementations of count-based feature selection and dimension reduction algorithms. These methods can be used to facilitate unsupervised analysis of any high-dimensional data such as single-cell RNA-seq.

Stephanie Hicks (21:09:07): > also, there are some factually incorrect statements… :confused: > > “Bioconductor is a project which develops, supports and shares free open source software with a focus on rigorous and reproducible analysis of data for many different biological assays including single-cell. A homogeneous developer and user experience and extensive documentation with user friendly vignettes are the biggest strengths of Bioconductor. Seurat is a well regarded R package specifically designed for the analysis of single-cell data. It offers tooling for all steps of the analysis including multimodal and spatial data. The well written vignettes and the large user-base is what Seurat is known for. However, both R options can run into scalability issues for extremely large datasets (more than half a million cells) which motivated the Python based community to develop the scverse ecosystem.”

Peter Hickey (21:23:27): > it’s definitely rough - there’s empty chapters (ironically often on topics they say OSCA doesn’t cover but actually does!)

Peter Hickey (21:24:37): > and agreed that there’s other factually wrong stuff about BioC and R in there. I didn’t want to pile on or get into language wars, but because of the lab this is coming from I’m concerned about it becoming gospel to new readers

Aaron Lun (22:31:14): > imho this is standard operating procedure these days; if you don’t dump all over existing work, how will you ever get more grant money?

Aaron Lun (22:31:42): > i guess i did a little of it to seurat, now the scanpy folks do it to both of us, and someone else will do it to them in time

Aaron Lun (22:31:49): > a beautiful vicious circle of life

2022-11-25

Martin Morgan (18:29:12): > I spoke with Anna at a recent CZI meeting. She was very interested in engaging the Bioconductor community, perhaps aware of the limited exposure their group had to OSCA, etc. I’d encourage further positive engagement…

Stephanie Hicks (20:12:31): > Thanks Martin! That’s great to hear

2022-11-26

Vince Carey (23:03:36): > @Davide Risso… it would be in line with your suggestion in a recent TAB meeting to have some bridge building among single-cell research groups. Thoughts on how we should proceed? A satellite meeting or session in (the vicinity of) EuroBioc 2023?

2022-11-27

Davide Risso (04:25:38): > I can certainly bring it up at the EuroBioc2023 organization meeting

Davide Risso (04:27:53): > Since they are based in Europe, it might be a good venue to discuss these issues

Vince Carey (08:12:08): > Perhaps we have time to propose a session on Agile Monograph Production for Computational Biology at ISMB or a conference of similar scope?

Davide Risso (14:18:11): > As far as I can tell from here they only accept submissions for proceedings and tutorials. Do you know what is the channel to organize a session at ISMB?

Davide Risso (14:22:24): > I guess a tutorial on “OSCA for large scale data” at ISMB could also be a good idea if the goal is to make sure people that don’t know Bioconductor well are aware that we can indeed analyze datasets with more than half a million cells…

Ludwig Geistlinger (16:18:53): > And OSCA Advanced, Chapter 14 Dealing with big data will be a good starting point for such a tutorial. I think the whole “R solutions do not scale for half a million cells” is taken directly from the scanpy paper back from 2018 - where already in the conclusion of this paper it states “Just before submission of this manuscript, a C++ library that provides simple interfacing of HDF5-backed matrices in R was made available as a preprint” … pointing to Aaron’s beachmat solution. So that claim is fairly outdated.

2022-11-28

Federico Marini (15:35:44) (in thread): > Can confirm her very positive and open attitude. She is one of the persons spearheading this now, but it was still at a work-in-progress stage just two weeks ago - I was surprised it got advertised so early

Federico Marini (15:38:31): > I don’t have the background to say it is feasible/how easily feasible that would be, but how nice would it be to have one “unifying” live book, backed by quarto in the rendering, which really promotes interoperable code and workflows?

Federico Marini (15:39:14): > but at least for some simple website-oriented projects, quarto just feeds on unchanged qmd files and seems to… just work

Federico Marini (15:39:27): > probably bookdown is a different beast

Stephanie Hicks (16:32:41): > I think it’s possible from a technical perspective! I’ve forced myself to learn it this fall for my course and it truly is amazing as everyone says it is.

Stephanie Hicks (16:33:49): > I’d be happy to join you in proposing that idea to others if there is interest

Vince Carey (16:54:20) (in thread): > hi @Federico Marini can you clarify a bit about the topic of live book and the role of interoperable code and workflows? I know quarto has simplified the incorporation of multiple languages in one document, but is that an actual goal of the monographs we are talking about? additionally, is quarto going to be a replacement of pkgdown?

Peter Hickey (17:53:36): > I’m busy this week with BioC Asia, but if people would like to follow up either in the original issue (https://github.com/theislab/single-cell-best-practices/issues/114) or a follow-up please go ahead - Attachment: #114 Care when summarising prior art

2022-11-29

Federico Marini (04:23:53) (in thread): > long shot here, but it could fit perfectly to a CZI call among the OSS-friendly ones

Assa (09:09:39): > @Assa has joined the channel

2022-11-30

Luke Zappia (06:18:33): > @Luke Zappia has joined the channel

Luke Zappia (06:19:14): > Might be just me but I can’t seem to access the release version of OSCA at the moment (devel works fine though)

Luke Zappia (06:29:11): > Just seeing the discussion above, I have been one of the contributors to the single-cell best practices book. It’s definitely still a work in progress (IMO probably a bit rough still to have been announced publicly but there was pressure to “get it out”). There is a conscious effort to try and make it language-agnostic but there is still some bias showing through in places that needs to be fixed. Everything will eventually be externally reviewed as well which should help but I don’t think that process has started yet. If you have any concerns you want to raise I’m sure Lukas and Anna who are leading the project would be happy to hear them. I’m happy to pass things on/put people in contact but GitHub issues would also be welcome.

Luke Zappia (06:29:59) (in thread): > Never mind, it’s working now. Just very slow to load for some reason.

Lukas Weber (10:28:09) (in thread): > (different Lukas, not me :upside_down_face:)

Calandra Grima (22:16:02): > @Calandra Grima has joined the channel

2022-12-01

Chris Chiu (05:21:39): > @Chris Chiu has joined the channel

2022-12-04

Wolfgang Huber (03:07:54): > @Wolfgang Huber has joined the channel

Wolfgang Huber (03:15:11) (in thread): > I think this would be very valuable, > (a) for the visibility it gets at the largest bioinformatics conference (people we may not otherwise reach and that are still relevant) > (b) teaching your stuff is the best way to fix it. > ISMB 2023 is 23-27 July in Lyon/France (beautiful city near the Alps with great food) and the deadline for tutorial proposals is in two weeks (19 Dec).

Wolfgang Huber (03:19:08) (in thread): > Can we put together a team for this?

Wolfgang Huber (03:20:54) (in thread): > For the older books (2005, 2008), we did accompany them with a fair bit of travelling and evangelizing, which actually is fun. It’s a bit like a musician doing the “world tour” after releasing their new album :slightly_smiling_face:

Davide Risso (11:42:42) (in thread): > I’m happy to help (perhaps even lead) the effort of submitting the tutorial before the deadline and I should be able to go to Lyon this summer. Any other volunteer?

Davide Risso (11:48:51): > The theme could be large scale single cell data analysis with Bioconductor, and as @Ludwig Geistlinger has pointed out there is already some material in OSCA as well as in @Stephanie Hicks’ large scale book (https://github.com/stephaniehicks/large-scale-data-base)

Davide Risso (11:51:11): > From the instructions: “A PDF of your tutorial proposal must be uploaded. Please include tutorial title, proposed speakers, target audience, an abstract, proposed learning objectives and proposed agenda for the half day or full day tutorial.”

Ludwig Geistlinger (12:46:10) (in thread): > What is the deadline? I am booked out for December but I would be happy to help one way or another as of January.

Vince Carey (13:06:41) (in thread): > I’d be interested in working on the tutorial. We should decide whether we think it should be “runnable on laptop” or “use scalable computing environment in cloud”. I am more interested in the latter approach.

Ludwig Geistlinger (13:12:00) (in thread): > Ah I just see in Wolfgang’s comment that the deadline is 19 Dec. So I will not be able to help much for the proposal, but could help then in putting together the actual tutorial in the first half of 2023.

Davide Risso (14:05:36) (in thread): > Thanks @Ludwig Geistlinger! I think I can manage the proposal with minimal input from others that wish to be involved. The main thing would be having a list of people that at least potentially can come to Lyon in July.

Davide Risso (14:06:54) (in thread): > @Vince Carey if we go for a full day tutorial perhaps we can do both?

Vince Carey (14:26:03) (in thread): > I would think so, but preparing the cloud solutions would take some time and administrative work … The NHGRI AnVIL system could probably be used, but perhaps a European academic cloud would work better. I think we’d want to do the development of the cloud part in a way that could be transferred to one or another provider with relatively little effort.

Ludwig Geistlinger (15:21:22) (in thread): > Count me in for Lyon in July, Davide

2022-12-05

Wolfgang Huber (04:24:59) (in thread): > I am happy to help with writing the tutorial (to the extent of several days) and also the proposal (some hours), under the leadership of Davide. > I am not yet sure whether I can come to Lyon (it’ll probably be XOR with Bioc in Boston due to family considerations), but depending on what’s needed I can try to engage people in the group here, e.g. encourage Constantin Ahlmann-Eltze, Mike Smith or Junyan Lu

Wolfgang Huber (04:26:24) (in thread): > @Vince Carey, re European Academic Cloud, Mike Smith would know best what may work here (my impression is that there are options).

Wolfgang Huber (04:31:36) (in thread): > Re the cloud / scalability topic raised by Vince, my intuition would be to separate this into two sessions: one more about the conceptual science: preprocessing/QC, automating cell “type” annotation, differential expression & cell “type” abundance when comparing different complex samples (tissues) across conditions. I think this is the most important part for ISMB. > And one more about getting it done computationally, which is also important but contingent on the first.

Stephanie Hicks (10:15:21) (in thread): > I’m also happy to help with the writing of the tutorial. Dec 19th is the deadline, correct? Do we have a google doc started?

Aaron Lun (16:49:08): > @Aaron Lun has left the channel

2022-12-06

Davide Risso (04:44:30): > I’ve started a google doc here: https://docs.google.com/document/d/11N5E8IPI2L4_XPLRt6LxLx6My4WKfTAAkbzM7mF_L3M/edit?usp=sharing - File (Google Docs): ISMB 2023 OSCA tutorial

Davide Risso (04:45:07): > It’s just the headers for now. The proposal has a max of 4 pages and must include a draft of the schedule with speakers and titles

Davide Risso (04:45:44): > The first thing to decide is whether we aim for online or in person, half-day or full-day and what are the people involved

Davide Risso (04:47:33): > I would say that we have enough material to cover a full day and following @Wolfgang Huber’s suggestion I would perhaps suggest for the morning session to be more conceptual (QC, cell type annotation, DE, etc.) and the afternoon more technical (HDF5, cloud, etc.)

Wolfgang Huber (07:35:21): > My 2 eurocents: 1. in person, 2. agree

Stephanie Hicks (11:13:35): > my 2 uscents: i agree with @Wolfgang Huber :point_up:

2022-12-10

Vince Carey (15:57:41) (in thread): > I guess we do not have a doc yet? @Davide Risso

2022-12-11

Davide Risso (03:36:09) (in thread): > Yes we do, I had shared it in the main channel: https://docs.google.com/document/d/11N5E8IPI2L4_XPLRt6LxLx6My4WKfTAAkbzM7mF_L3M/edit?usp=sharing - File (Google Docs): ISMB 2023 OSCA tutorial

2022-12-12

Lexi Bounds (17:59:19): > @Lexi Bounds has joined the channel

2022-12-13

Ana Cristina Guerra de Souza (09:00:40): > @Ana Cristina Guerra de Souza has joined the channel

2022-12-14

Vince Carey (11:19:36): > Looks like we need to get cracking with this proposal. Suppose we focus on what can be done on a laptop, first. What is (are) the exemplary dataset(s) that we would like to focus on? Are the DropletTestFiles good enough? Or if we want to use orchestra, what’s a role for https://mtmorgan.github.io/HCABiocTraining/ ?? Maybe too far downstream to start with, but a good target nevertheless. - Attachment (mtmorgan.github.io): Introduction to Human Cell Atlas Data Access & Analysis in R / Bioconductor > A brief but comprehensive introduction to Human Cell Atlas data retrieval and analysis in R / Bioconductor.

Chris Chiu (11:31:22): > @Chris Chiu has left the channel

2022-12-15

Isaac Virshup (09:40:31): > @Isaac Virshup has joined the channel

Isaac Virshup (09:59:05): > Hey all, > > Just got pointed to the tutorial discussion here by @Luke Zappia. scverse is also looking into submitting an ISMB/ECCB tutorial proposal around scRNA-seq analysis :sweat_smile:! > > I’ve just sent an email to @Vince Carey, @Davide Risso, and @Ludwig Geistlinger seeing if we can do some coordination on differentiating our proposals here. Please let me know if you’d like to be CC-ed!

2022-12-18

Davide Risso (16:52:29): > Sorry I’ve been swamped and unable to contribute to the tutorial proposal. I can put something together tomorrow, any last minute feedback will be appreciated. @Vince Carey @Ludwig Geistlinger @Wolfgang Huber can I put your names as potential speakers?

Davide Risso (16:53:14): > For Wolfgang it would be either Constantin or Mike rather than you, correct?

Davide Risso (16:53:56): > Just a reminder that tomorrow is the deadline for submission

Davide Risso (16:55:04): > Apologies again for the last minute notice

Ludwig Geistlinger (17:24:10): > Sure you can put my name as potential speaker

2022-12-20

Jennifer Foltz (10:40:59): > @Jennifer Foltz has joined the channel

Davide Risso (16:03:59): > To everyone interested: I submitted a proposal for an ISMB tutorial, loosely based on the OSCA book (+ some highlights of Martin’s work on the HCA and on facilities for large data). Sorry that I couldn’t share with you a final draft before submitting but I finished it very last minute!

Davide Risso (16:04:16): > We should know by the end of January whether the tutorial is selected

Stephanie Hicks (16:41:24): > thank you @Davide Risso for leading this!

2022-12-22

Davide Risso (03:24:11): > Hi everyone, in case we want to capitalize on the ISMB tutorial effort and take @Wolfgang Huber’s suggestion of performing a “world book tour” for OSCA, I got this email from ABRF: > > Please complete this short form to submit an Educational Workshop Proposal for the 2023 ABRF Meeting in Boston, MA. The 2023 Pre-Meeting Workshops are scheduled for Sunday, May 7, 2023. The 2023 workshops will be in-person only, and scheduled for either half-day (4 hours) or full-day (8 hours) sessions. Workshop registrants will be required to pay an additional registration fee, in addition to the meeting registration. Workshop organizers do not need to identify sponsors: ABRF will work with Corporate Relations and leadership to determine appropriate fees and sponsorship opportunities. > > Educational Workshops are supported by registration fees for speakers' travel reimbursement: for each accepted workshop, up to 3 speakers will be reimbursed up to $500 for hotel and travel-related expenses. Speakers presenting virtually will not receive travel reimbursements. > A maximum of four workshops slots will be available: submit your proposal early and before the January 15, 2023 deadline for full consideration. > We look forward to hearing from you. > > The conference is in May in Boston and the deadline for submitting the proposal is Jan 15, 2023. I won’t be able to go, but perhaps someone local would be interested in presenting the book?

Davide Risso (03:33:06): > https://web.cvent.com/event/c098c911-6b50-4c56-9422-11cbf32a088a/summary

Davide Risso (03:33:11): > https://docs.google.com/forms/d/e/1FAIpQLSdiS7oAeA9dmgrGw32cvYg817Wm_caCzqcqB0N-FgAlqbKVyg/viewform

Stephanie Hicks (08:30:30): > Hey that’s great! Yeah, very supportive if someone local in Boston wanted to lead that. I’ll be in Seattle the week before, so I doubt I would be able to travel over the weekend.

Wes W (09:59:21): > I can be in Boston then. It’s a short train ride.

Vince Carey (13:32:32): > Interesting, I can probably help out in Boston too. Never heard of ABRF though.

Laurent Gatto (15:33:29): > ABRF is quite active in proteomics - they have run several comparative studies across different labs.

2022-12-23

Davide Risso (01:25:58): > Yes, they were involved with the MAQC/SEQC studies as well. Its target is people working in cores so might be a good audience of people developing/applying standardized pipelines and they’re typically interested in single-cell analysis best practices

Jill Lundell (10:59:30): > @Jill Lundell has joined the channel

2022-12-24

Davide Risso (05:32:29): > If someone wants to lead this effort I’m happy to share my ISMB final proposal as a starting point

Vince Carey (06:14:47): > I am on board, but not so active in SC research so would mainly be a facilitator, and will step aside if someone else wants to lead. I am sure we could find travel money for someone who really wanted to do this instead of me. I would like to know about a platform that should be used – has anyone taught “from the book” in orchestra?

Vince Carey (06:16:36): > Curiously the eponymous link at orchestra does not resolve – it gives “nan”, but the workshop does launch into R 4.0.2, Bioc 3.12. - File (PNG): badosca.png

2023-01-03

Hans-Rudolf Hotz (04:47:04): > @Hans-Rudolf Hotz has joined the channel

Wes W (19:33:08): > Keep me posted on how I can help @Davide Risso

2023-01-04

Davide Risso (06:39:21): > Thanks @Wes W, as I said I won’t be able to be in Boston for ABRF, so it makes sense that someone else is the main contact for the application. Would you be willing to do it? The application form is quite straightforward (see second link above), and I can share with you the ISMB proposal for inspiration.

2023-01-05

Wes W (10:15:33): > Happy to be the main contact for ABRF in Boston.

2023-01-12

Davide Risso (09:01:52): > Great @Wes W. Here’s my ISMB proposal if you need inspiration to complete the form: https://docs.google.com/document/d/1xLiohdkqHHloNphFyyv2QvYIV99Fa-NljXAPbfiO0f0/edit?usp=sharing - File (Google Docs): OSCA ISMB tutorial

2023-01-13

Wes W (00:58:53): > Submitted! @Davide Risso

2023-01-27

Davide Risso (12:10:44): > https://media0.giphy.com/media/3zFcbgHoIXzykQc7vU/giphy.webp?cid=6c09b9528ae1a8d4267b5f72045fc1b939e467757cb7b21f&rid=giphy.webp&ct=g - Attachment: Attachment

Davide Risso (12:11:03): > We’re going to Lyon!

Davide Risso (12:11:30): > Hello Davide - Thank you for your submission to the ISMB/ECCB 2023 Tutorial Program. We are pleased to inform you that your submission has been accepted for presentation. Our open call for submissions yielded nearly 50 high quality and interesting submissions and we were very pleased with the quality of the proposals and the wide variety of topics offered. >
> We will be in touch over the next few weeks with you regarding details to assist you in your preparation of the tutorial including key dates regarding the development of your detailed schedule and presentation materials. >
> We look forward to your involvement. >
> Best regards, >
> Annette McGrath, Patricia M. Palagi, Madelaine Gogol > Chairs, Tutorial Committee

Ludwig Geistlinger (12:12:06): > Congrats, great news indeed!

Stephanie Hicks (12:28:50): > Congratulations!

Leonardo Collado Torres (12:39:43): > Nice!

2023-01-31

Ahmad Al Ajami (09:10:48): > @Ahmad Al Ajami has joined the channel

2023-02-09

Wes W (02:27:45): > That’s awesome news

Wes W (02:38:05): > Sadly we did not get the workshop for ABRF =(

Hervé Pagès (14:55:20): > @Hervé Pagès has left the channel

2023-02-13

Vince Carey (16:05:03): > Multiple build failures in 3.17 branch. OSCA.intro is green, other chapters red.

Vince Carey (16:05:34): > OSCA.basic failing in 3.16

Vince Carey (16:06:47): > When working with chapter 12 code I noted > > > set.seed(1010010) > > > altExp(sce) <- runTSNE(altExp(sce)) > > > colLabels(altExp(sce)) <- factor(clusters.adt) > > > plotTSNE(altExp(sce), colour_by="label", text_by="label", text_color="red") > Error in self$palette(n) : attempt to apply non-function > In addition: Warning message: >

Alan O’C (16:17:36): > I’ll check it out, thanks for flagging

Alan O’C (18:49:28): > This one’s a scater issue (well, really a viridis issue), will fix shortly

Peter Hickey (19:36:56): > OSCA.basic is probably https://github.com/OSCA-source/OSCA.basic/issues/10 that I need to fix, sorry

2023-02-14

Alan O’C (19:55:06): > Fix on the way to bioc for the devel stuff on my end

2023-02-15

Peter Hickey (03:44:04): > Think I’ve fixed OSCA.basic in release (unsure about devel, which has other issues related to scater) > I’ll keep an eye out for the next build report

Vince Carey (11:35:20) (in thread): > just wondering if you know a workaround for this … poking around scater has not produced a clue

Alan O’C (11:36:02) (in thread): > It should be fixed in the current devel?

Alan O’C (11:37:40) (in thread): > Ah, I guess the new version isn’t built yet. It’s on github alanocallaghan/scater@master if you need it now

Vince Carey (11:38:11) (in thread): > great, thanks

2023-02-17

Peter Hickey (03:41:47): > OSCA.basic is fixed in devel but I’m having trouble pushing the same fix to RELEASE_3_16 (see https://community-bioc.slack.com/archives/C6MVC96AZ/p1676623072030339) - Attachment: Attachment > I’m having trouble pushing to the RELEASE_3_16 branch of OSCA.basic > > OSCA.basic % git log > commit 7f72c9d1b5fcc5ad36135c844b6cc53888cc0e65 (HEAD -> RELEASE_3_16, origin/RELEASE_3_16) > Author: Peter Hickey <peter.hickey@gmail.com> > Date: Wed Feb 15 19:43:05 2023 +1100 > > Version bump > > commit 02aaa25edad393e75ee3aac310afbe7d9176c433 > Author: Peter Hickey <peter.hickey@gmail.com> > Date: Wed Feb 15 19:41:24 2023 +1100 > > Fix link > > - See https://github.com/OSCA-source/OSCA.basic/issues/10#issuecomment-1430829599 > - Closes #10 > > commit 634198fdcfb2e9b640d6d977ee2ee977dd6e7f0a (upstream/RELEASE_3_16) > Author: Peter Hickey <peter.hickey@gmail.com> > Date: Wed Feb 15 16:59:02 2023 +1100 > > Update README > > OSCA.basic % git push upstream RELEASE_3_16 > Enumerating objects: 13, done. > Counting objects: 100% (13/13), done. > Delta compression using up to 4 threads > Compressing objects: 100% (8/8), done. > Writing objects: 100% (8/8), 829 bytes | 165.00 KiB/s, done. > Total 8 (delta 5), reused 0 (delta 0) > remote: Traceback (most recent call last): > remote: File "hooks/pre-receive.h00-pre-receive-hook-dataexp-workflow", line 89, in <module> > remote: apply_hooks(hooks_dict) > remote: File "hooks/pre-receive.h00-pre-receive-hook-dataexp-workflow", line 71, in apply_hooks > remote: if hooks_dict["pre-receive-hook-merge-markers"]: # enable hook > remote: KeyError: 'pre-receive-hook-merge-markers' > To git.bioconductor.org:packages/OSCA.basic > ! [remote rejected] RELEASE_3_16 -> RELEASE_3_16 (pre-receive hook declined) > error: failed to push some refs to 'git@git.bioconductor.org:packages/OSCA.basic'

Lori Shepherd (08:11:43): > Sorry about that. We updated the hooks yesterday. I honestly didn’t test the workflow hook, just software. I think I have it corrected if you wanted to try again at your earliest convenience

Peter Hickey (17:50:46): > Thanks, Lori. Can confirm it’s working again :slightly_smiling_face:

2023-02-19

Alan O’C (16:43:09): > Is RELEASE_3_16 for basic up to date on bioc? I still see the old build report. Otherwise all green :slightly_smiling_face:

Peter Hickey (20:55:43): > it looks to be from my end > > % git log > commit 7f72c9d1b5fcc5ad36135c844b6cc53888cc0e65 (HEAD -> RELEASE_3_16, upstream/RELEASE_3_16, origin/RELEASE_3_16) > Author: Peter Hickey <peter.hickey@gmail.com> > Date: Wed Feb 15 19:43:05 2023 +1100 > > Version bump > > commit 02aaa25edad393e75ee3aac310afbe7d9176c433 > Author: Peter Hickey <peter.hickey@gmail.com> > Date: Wed Feb 15 19:41:24 2023 +1100 > > Fix link > > - See https://github.com/OSCA-source/OSCA.basic/issues/10#issuecomment-1430829599 > - Closes #10 > > commit 634198fdcfb2e9b640d6d977ee2ee977dd6e7f0a > Author: Peter Hickey <peter.hickey@gmail.com> > Date: Wed Feb 15 16:59:02 2023 +1100 > > Update README >

Peter Hickey (20:57:23): > But I don’t see it reflected in the BioC git logs: https://bioconductor.org/developers/rss-feeds/gitlog.release.xml The most recent commit there is 634198fdcfb2e9b640d6d977ee2ee977dd6e7f0a

Peter Hickey (20:58:33): > And the most recent build still refers to v1.6.1 in RELEASE_3_16 (https://www.bioconductor.org/checkResults/3.16/books-LATEST/OSCA.basic/nebbiolo2-buildsrc.html) > That build is 1-2 days after I thought I pushed those 2 most recent commits, which correspond to v1.6.2

Peter Hickey (20:59:17): > @Lori Shepherd we might please need some help from you or the core team to understand why these 2 most recent commits don’t seem to have been picked up by the BioC git and/or build machine

2023-02-20

Mike Smith (05:04:13): > @Peter Hickey I now see your 7f72c9d1b5fcc5ad36135c844b6cc53888cc0e65 reflected in https://bioconductor.org/developers/rss-feeds/gitlog.release.xml. For what it’s worth, anecdotally I’ve noticed a lag a few times in the last couple of months between a commit being made and it appearing in the RSS feed. I’ve not spent enough time to work out if it’s systematic, but I’ve definitely hunted for a commit, not found it despite other more recent commits to other repos appearing there, and then it’s been in the log the next day.

Alan O’C (05:06:41) (in thread): > Didn’t know there was a global git log, that’s handy. Is it linked to anywhere?

Mike Smith (05:19:16) (in thread): > There’s this page (https://bioconductor.org/developers/gitlog/) which then links to separate RSS feeds for master and release branches. I use the RSS feeds to check which packages should be updated in code.bioconductor.org. That’s why I’ve noticed the lag when I can’t find a commit I’m expecting in the code browser.

Alan O’C (05:21:33) (in thread): > Nice, thanks. Don’t judge me for my commit messages :rolling_on_the_floor_laughing:

Lori Shepherd (09:37:01) (in thread): > I’ll check too. The RSS feed is controlled by the hooks too, so I may need to adjust it

Lori Shepherd (09:37:55) (in thread): > Keep in mind the release builds are on a different frequency and only occur 2 or 3 times a week

Peter Hickey (15:41:50): > OSCA.basic build now okay in RELEASE_3_16

Peter Hickey (15:42:07): > back to all OK for OSCA in both release and devel

2023-02-25

Vince Carey (02:35:30): > query on ch 12 that addresses CITEseq with ADT counts in altExp vs SingleCellMultiModal that uses MultiAssayExperiment to manage rna and adt together … of interest to working group on classes @Laurent Gatto ?? (noted your comment on scMultiome) does classes working group have a slack channel?

Laurent Gatto (03:59:57) (in thread): > Yes, we have #biocclasses - I’ll make a note of this for future discussions.

Laurent Gatto (04:02:28) (in thread): > See https://github.com/Bioconductor/BiocClassesWorkingGroup/issues/9 - Attachment: #9 altExp vs MultiAssayExperiment to manage multiple assays > A message from Vince on slack > > > query on ch 12 that addresses CITEseq with ADT counts in altExp vs SingleCellMultiModal that uses MultiAssayExperiment to manage rna and adt together … of interest to working group on classes @Laurent ?? (noted your comment on scMultiome) does classes working group have a slack channel?

2023-03-02

Davide Risso (16:28:41) (in thread): > FYI the DataClass argument in SingleCellMultiModal::CITEseq() allows you to get the data as SCE with altExp (see https://github.com/waldronlab/SingleCellMultiModal/blob/e020f9a6ba7791139fcef2260513346c7b1da7bb/vignettes/CITEseq.Rmd#L109)

2023-03-10

Edel Aron (15:28:58): > @Edel Aron has joined the channel

2023-03-23

Davide Risso (04:33:10): > Hi all, I received this email from ISMB: > > Dear ISMB/ECCB 2023 Tutorial and Workshop Leads, > > ISCB has recently been approached by a book publisher seeking interest in developing a training book series as part of the bioinformatic publishing series. After speaking with the chair of the publications committee and one of the chairs of the education committee, we wanted to gauge the interest of the community to see if this is something that may be of interest. > > The authors of the books will have publishing agreements with the publisher and have the rights to the royalties. ISCB's commitment would be to ensure a pipeline of proposals (1-5 per year). Additionally, ISCB members will get purchasing discounts. > > Please take a minute to answer our interest survey - https://forms.gle/o2uRZhGTG1vR5nK37. Your feedback would be invaluable to our decision making. > > Thank you in advance for your time. > > ~ Diane > > Just a reminder that we are presenting a tutorial at ISMB based on OSCA. Any interest in a printed version of the OSCA book (or something derived from it) if the opportunity arises?

Alan O’C (08:08:57): > It seems like a printed version of OSCA would remove most of the benefits it has now living within the Bioc ecosystem

Stephanie Hicks (22:03:44): > i very much agree with@Alan O’C

2023-03-24

H. Emre (07:28:25): > @H. Emre has joined the channel

RGentleman (14:21:11): > i also agree - a printed version would be more problematic than online resources. It is less interactive and could provide some support challenges -

2023-03-30

Ludwig Geistlinger (11:01:54): > Mark your calendars for a CCB seminar special with Aaron Lun, > the mastermind behind the Orchestrating Single-Cell Analysis with Bioconductor (OSCA) online book! > > Aaron will speak about the journey that led to the OSCA book > from a developer’s perspective in his talk: > > Code, sweat, and tears: how the OSCA sausage was made > > When: April 03, 2023, 3 PM ET > Where: https://harvard.zoom.us/j/97173440183?pwd=eHI1ODRub0p5NGNEZncwU0lURlJjdz09

Robert Shear (11:46:06): > @Robert Shear has joined the channel

Andres Wokaty (12:22:02): > @Andres Wokaty has joined the channel

Peter Hickey (17:40:42): > Could it please be recorded?

Ludwig Geistlinger (17:43:31) (in thread): > Sure we’ll record

2023-03-31

Ilaria Billato (08:57:50): > @Ilaria Billato has joined the channel

2023-04-04

Jacques SERIZAY (05:01:15): > @Jacques SERIZAY has joined the channel

2023-04-06

Ludwig Geistlinger (13:15:00): > @Peter Hickey here is the recording from Aaron’s talk: https://youtu.be/NCBUBP4Ll9I - Attachment (YouTube): Code, sweat, and tears: how the OSCA sausage was made

2023-04-20

Chris Vanderaa (04:36:12): > @Chris Vanderaa has left the channel

2023-04-25

Peter Hickey (18:52:55): > Thanks @Ludwig Geistlinger!

2023-05-07

Peter Hickey (22:53:03): > https://bioconductor.org/books/release/OSCA/ is still pointing to the BioC 3.16 version. Shouldn’t it be 3.17 and how can it be fixed? - Attachment (bioconductor.org): Orchestrating Single-Cell Analysis with Bioconductor > Or: how I learned to stop worrying and love the t-SNEs.

2023-05-08

Lori Shepherd (06:31:32): > @Andres Wokaty can you check on these links please

Andres Wokaty (10:09:27): > I fixed the release link; however, the devel books haven’t propagated yet due to errors or packages not being available.

Vince Carey (15:46:45): > at https://bioconductor.org/books/release/OSCA/ the bioconductor-sticker image link is broken – first text line after “Welcome” - Attachment (bioconductor.org): Orchestrating Single-Cell Analysis with Bioconductor > Or: how I learned to stop worrying and love the t-SNEs.

Peter Hickey (18:24:35) (in thread): > Thanks Jennifer! Is that something that will need to be manually updated upon each release?

Peter Hickey (18:24:53) (in thread): > I’ll look into the OSCA book builds in devel

Andres Wokaty (18:26:01) (in thread): > Yes, actually I didn’t know about this link or I didn’t remember it. It should be one of the release tasks for the core, specifically me. I am still learning things :)

Peter Hickey (18:26:39) (in thread): > all good :slightly_smiling_face: I wasn’t sure if it was something us in the OSCA group needed to do

Peter Hickey (18:33:38) (in thread): > Thanks, Vince. Will address in https://github.com/OSCA-source/OSCA/issues/8 - Attachment: #8 Update link to BioC sticker gif > URL below needs to be updated to https://raw.githubusercontent.com/Bioconductor/BiocStickers/devel/Bioconductor/Bioconductor-serial.gif in both release and devel. > > https://github.com/OSCA-source/OSCA/blob/fe7b362a3215622e06e987aa021d7eec2d0a1fba/inst/book/index.Rmd#L9 (OSCA/inst/book/index.Rmd, line 9 in fe7b362)

Peter Hickey (19:38:40) (in thread): > Resolved

Peter Hickey (19:41:28): > Who is admin of https://github.com/OSCA-source? We need to update the default branch to ‘devel’ on GitHub (https://contributions.bioconductor.org/branch-rename-faqs.html)

2023-05-10

Alan O’C (12:57:50): > fyi I broke at least some of osca by making the names of reducedDim always be (eg) TSNE1, UMAP1 etc rather than dim1 etc before. I’ve fixed multisample, will check the others later

Peter Hickey (20:31:37) (in thread): > @Vince Carey @Andres Wokaty this was fixed in release for v1.10.1 (https://github.com/OSCA-source/OSCA/commits/RELEASE_3_17) and the build passed and has the green ‘propagated’ circle (https://bioconductor.org/checkResults/3.17/books-LATEST/OSCA/nebbiolo1-buildsrc.html) but the published version is still on v1.10.0 (https://bioconductor.org/books/release/OSCA/) - Attachment (bioconductor.org): Orchestrating Single-Cell Analysis with Bioconductor > Or: how I learned to stop worrying and love the t-SNEs.

Peter Hickey (20:32:07) (in thread): > is something amiss?

Andres Wokaty (22:29:46) (in thread): > Thanks for bringing this up. I needed to correct the schedule for propagation. I went ahead and ran it since it was supposed to run earlier today.

Peter Hickey (23:03:23) (in thread): > Thank you Jennifer!

2023-05-15

Peter Hickey (19:09:38): > @Ludwig Geistlinger would you please take a look at OSCA.advanced. The builds are failing in both release (https://bioconductor.org/checkResults/3.17/books-LATEST/OSCA.advanced/nebbiolo1-buildsrc.html) and devel (https://bioconductor.org/checkResults/3.18/books-LATEST/OSCA.advanced/nebbiolo2-buildsrc.html) > @Alan O’C would you please take a look at OSCA.multisample. The build is failing in release (https://bioconductor.org/checkResults/3.17/books-LATEST/OSCA.multisample/nebbiolo1-buildsrc.html) but OK in devel (https://bioconductor.org/checkResults/3.18/books-LATEST/OSCA.multisample/)

Ludwig Geistlinger (19:38:39): > Thanks Pete. I am currently ooo but will be taking a look next week when I am back.

2023-05-16

Alan O’C (05:04:04): > I’ve pushed a fix to multisample devel on the 10th May that should’ve fixed that error

Alan O’C (05:04:37): > I can also have a look at advanced (it’s also due to my handiwork in scater)

Alan O’C (10:17:40): > There should be a fix on bioc for advanced now too, although given that my multisample fix hasn’t shown up don’t take that as a given

Peter Hickey (17:30:45) (in thread): > OSCA.multisample is fixed in devel, but not in release. Do you need to cherry pick the fix from devel to release?

Alan O’C (19:10:52) (in thread): > Ah fair, I mixed up my bioc versions. both books should be fixed in both versions next build

2023-05-17

Hassan Kehinde Ajulo (12:17:01): > @Hassan Kehinde Ajulo has joined the channel

Peter Hickey (23:30:16) (in thread): > Thanks, Alan! All books are now green in release and devel (although not propagating in devel due to some dependencies not being available)

2023-05-18

Oluwafemi Oyedele (05:53:27): > @Oluwafemi Oyedele has joined the channel

2023-05-19

Alan O’C (04:25:02): > Cheers, sorry. In future I need to remember to run through the builds before pushing scater changes

2023-05-21

Aaron Lun (13:57:15): > @Aaron Lun has joined the channel

Aaron Lun (13:57:28): > do we have a DOI for the book itself, just like we have for packages?

Lori Shepherd (17:51:07): > As far as I know we have not assigned a DOI for the books through Bioconductor

2023-05-22

Peter Hickey (17:47:01): > @Lori Shepherd or @Vince Carey do you know the answer to this? https://community-bioc.slack.com/archives/CM2CUGBGB/p1683589288330799 - Attachment: Attachment > Who is admin of https://github.com/OSCA-source? We need to update the default branch to ‘devel’ on GitHub (https://contributions.bioconductor.org/branch-rename-faqs.html)

Marcel Ramos Pérez (17:49:59) (in thread): > perhaps @Alan O’C or @Aaron Lun know the answer?

Alan O’C (18:19:44) (in thread): > I don’t, but I think it was @Hervé Pagès who made us developers before?

Hervé Pagès (21:59:45): > @Hervé Pagès has joined the channel

Hervé Pagès (22:07:42) (in thread): > @Peter Hickey When the OSCA team was formed one year ago, I was granted admin rights by @Aaron Lun so I could grant write access to all members of the team. I don’t really see myself as part of that team so maybe someone else should be granted admin rights. Would you be ok with that @Peter Hickey?

Peter Hickey (22:26:18) (in thread): > yep, happy to do that

Hervé Pagès (23:55:55) (in thread): > Great. You now have Admin role on all the repos.

Peter Hickey (23:56:28) (in thread): > thanks!

2023-07-21

Aaron Lun (14:32:17): > @here Sometime in the next release cycle, I would like to switch scran and scuttle to use libscran with the new tatami representations. This should give several-fold speed-ups for large datasets… > > But it is likely that all the results will change. And if I do this, the book will break (see https://www.youtube.com/watch?v=NCBUBP4Ll9I for details). The question is what everyone here wants to do about it. > > One option is to go ahead with the breaking changes. This could be a good opportunity to streamline and update the book. For example, I’ve come around to the idea that further QC on top of Cellranger’s emptyDrops-based cell filtering is unnecessary. > > The other option is to do all my changes in a new package, e.g., scran2. This will preserve the book builds but the book’s contents will be obsolete. > > I would obviously prefer option 1, but I can only do so much, and I’ll have my hands full with the packages. Is there anyone who is interested in working on the OSCA book 2nd edition? There’s probably a paper in there somewhere about how to write this kind of book (e.g., dir.expiry, rebook).

2023-07-22

Vince Carey (09:32:00): > What’s your time frame Aaron? Next release cycle means “during 3.18”? Our community/TAB need to discuss in some detail how we want to manage monograph-size assets. It would be nice to “crowd-source” the update/verification process and we have computational resources to help with this so that collaborators can focus on content/tool upgrade with minimal attention to configuration. I would also like there to be a little more discussion about the cost/benefit of rebook, which has a good motivation. Have other authors (OSTA, RforMassSpectrometry, MSMB) had the same use case? I would say there are potential papers about both the concept of the integrated monograph-size computational resource and its management, and about the specific tooling underlying OSCA… but that idea of writing it up has been bouncing around for a bit. Should we have a look at manubot? Seriously. - Attachment (manubot.org): Manubot - Manuscripts, open and automated > A tool set and workflow for scholarly publishing that is open, collaborative, continuous, automated, reproducible, and free.

Vince Carey (09:33:32): > Also of note at #biocbooks

Aaron Lun (11:29:45): > @Vince Carey the time frame is whatever it needs to be. Strictly speaking, for my own/company’s use, I already have what I need at https://github.com/LTLA/scran.chan. > > It just seems unfortunate that everyone else is stuck on the old slow stuff, hence this suggestion.

2023-07-23

Alan O’C (05:19:04): > I would be happy to do some work to update the books, although my time is also limited as my current job isn’t even directly using R at the moment. I guess the first step would be to try to drop in the new scran version and see just how much breaks, and thus vaguely how much work it will be?

Alan O’C (05:19:57): > Also Aaron, is basilisk broken at the moment? My depends on pkgs are broken with some unintelligible (to me) conda error, but they work fine locally, and the same is true as far as I can see for basilisk

2023-07-24

Aaron Lun (10:58:34): > see comments on #biocpython

Peter Hickey (17:35:37): > in principle, I support option 1.

Peter Hickey (17:35:40): > @Aaron Lun will switching the internals of scran and scuttle to use libscran change the R-level API much? the scran.chan R-level API looks rather different, is why i ask

Aaron Lun (17:46:30): > i think so, unfortunately. maybe this change could be minimized and the “standard” usage pattern will be preserved, but a lot of less-commonly-used options will likely disappear. > > This probably affects some esoteric aspects of the book. I remember the QC section having a million different options, lots of different curve fitting for the mean-variance trend, etc. Those have all been dropped in scran.chan for various reasons, e.g.: > * QC seems less necessary for 10X data than in the wild days of plate-based assays, given that the cell filtering already does a decent job AND each QC step runs the risk of throwing away distinct cell types. > * UMI data in general has fairly well-behaved mean-variance trends, which eliminates the need for the shenanigans involved in fitting a trend to read count data. > Plus some other thoughts, but I don’t actively analyze sc data anymore (other than what looks okay in kana runs), so it’s hard for me to suggest general edits to the book.

Peter Hickey (17:49:37): > hmm, so that sounds like it’s gonna be more work than what Alan was proposing (and I was thinking) of “try to drop in the new scran version and see just how much breaks, and thus vaguely how much work it will be”

Alan O’C (19:16:12): > Makes sense, though obviously more work isn’t the most exciting thing to hear of, we should at some point be prepared to move on from the original contents

Alan O’C (19:16:40): > Having said that the notion of removing rather than adding stuff is not terrible

2023-07-25

Aaron Lun (02:27:29): > There is a general problem of finding someone to stump up some time/money on the book. I don’t know that any of us are really funded for this - I know I’m not - but if this is to be considered a serious resource, it should be maintained as such.

Davide Risso (02:28:56): > As someone that currently analyzes single-nuclear and multiome data I would be cautious in saying that we don’t need QC filtering. We find a lot of crappy “cells” even after removing empty droplets… I agree that the variance modeling can be streamlined as in my experience the vanilla modelGeneVar does a pretty good job.

Davide Risso (02:33:41): > Totally agree that we need some people funded to work on this. @Vince Carey I wonder if the training aim of the U24 grant could help

Davide Risso (02:39:34): > FWIW @Ludwig Geistlinger, @Marcel Ramos Pérez, @Dario Righelli and I have tried to condense OSCA in a one day tutorial just delivered at ISMB. This (https://bioconductor.github.io/ISMB.OSCA/index.html) is a first attempt at a more streamlined less encyclopedic version of the material. Your feedback is of course much appreciated! - Attachment (bioconductor.github.io): Orchestrating Large-Scale Single-Cell Analysis with Bioconductor > This tutorial aims to provide a solid foundation in using Bioconductor tools for single-cell RNA-seq analysis by walking through various steps of typical workflows using example datasets.

Dario Righelli (02:39:37): > @Dario Righelli has joined the channel

Ludwig Geistlinger (07:14:40): > There is a bit of tension here between what we should do and what the current OSCA maintenance team has the capacity to do. It seems clear that the OSCA book, like an ordinary R/Bioc package, should undergo enhancements, new features, and deprecation under the devel/release paradigm. If there are significant improvements to the book like the ones Aaron describes, we should certainly be able to incorporate those as a way of keeping the book relevant + up-to-date. Now we are primarily constrained here by what Alan, Pete, and I can do on the side, which seems to provide insufficient resources for the task at hand. > > I agree with Aaron that given the visibility that the OSCA book lends to the project (It looks like we are looking at >500 citations by the end of the year; I think we are missing download stats and traffic on the OSCA homepage to get a sense of how frequently the book is accessed/used), an effort should be made to provide proper funding that would allow us to hire somebody who could take on this task more seriously. Will there be another round of CZI EOSS? We should consider a proposal that does not only focus on the maintenance component but also on new features and additional functionality currently not covered by the book.

Ludwig Geistlinger (09:26:30) (in thread): > Some impressions below from our OSCA tutorial at the ISMB, which was sold out and very well received, providing some more evidence for continued relevance of the book as an essential single-cell analysis and education resource. Shout-out also to @Ilaria Billato and @Stefania Pirrotta for reviewing the materials and to @Alex Mahmoud for providing us with the infrastructure for running the tutorial in the new BiocWorkshops Galaxy framework! > > As Davide had pointed out the tutorial is a light version of the book that concentrates on essential aspects for getting started with the book (“The OSCA book in a day”). > The tutorial is in large parts a faithful copy of the OSCA book, but also adds contents that are not (yet) covered in the OSCA book such as interoperability with other popular single-cell analysis ecosystems and accessing data from the Human Cell Atlas. - File (JPEG): Davide.jpeg - File (JPEG): Ludwig.jpeg - File (JPEG): Dario.jpeg - File (JPEG): Marcel.jpeg

2023-07-26

Wes W (08:52:10): > Happy to be more involved in Book2. Option 1 sounds alright to me. I think when I first joined the book team I was double assigned to part of the book that was already assigned to someone else and this has left me largely unutilized in the team structure. > > I do work with scRNA transcriptome, CITE-seq, and spatial data on the daily so I am kinda in the thick of it in terms of how the data is coming off these platforms these days. I still use a combination of CellRanger’s filtered barcodes + emptyDrops (for some samples emptyDrops only adds 200-400 cells CellRanger missed) for my starting cells and GEM selection. Hopefully this doesn’t become the non-norm too soon as I just wrote a package to help my post-docs and PhD students do this more easily. > > Let me know how I can chip in more as another publication here or there will really help my K99 application

Stefania Pirrotta (10:17:56): > @Stefania Pirrotta has joined the channel

Hervé Pagès (12:44:54): > Just a reminder that whatever changes are going to happen in scran and scuttle, the usual recommendation applies, that is, the previous API/behavior should be preserved as much as possible and deprecated for the next 6 months, at least. Otherwise other packages or workflows that depend on the “old behavior” will break, not just the OSCA book.

2023-07-27

Estella Dong (08:06:33): > @Estella Dong has joined the channel

2023-07-28

Konstantinos Daniilidis (13:47:24): > @Konstantinos Daniilidis has joined the channel

Benjamin Yang (15:58:11): > @Benjamin Yang has joined the channel

2023-09-14

Vince Carey (07:04:10): > I noticed that the contributors page is out of date in various ways … @Stephanie Hicks @Raphael Gottardo can supply updates?

Stephanie Hicks (08:45:24): > That’s a great point @Vince Carey! I think I could guess some of the more recent contributors, but maybe look at just contributors on the github page to help inform that?

Stephanie Hicks (08:45:32): > But very supportive of updating the contributors page!

Wes W (11:58:17): > Thanks @Vince Carey!!

2023-09-15

Leo Lahti (04:55:51): > @Leo Lahti has joined the channel

Ludwig Geistlinger (16:51:31): > Is it possible to obtain some stats around how often the OSCA book has been accessed over the course of the last months/years?

2023-09-21

Robert Shear (13:14:34): > While we are logging access on Google Analytics, it’s designed for sophisticated explorations of user behavior and so is somewhat tough to use. I think the easiest way to answer Ludwig’s question is with an Athena query in AWS. At the moment, we only have the last 6 months of log data available. But as part of the stats replacement, we will be moving the logs back to S3. So sometime in the next few weeks, we ought to be able to answer questions like this back 7+ years. The thing that GA has that our logs don’t is a user cookie. So we can count unique IPs but not individual users. This makes a difference behind a firewall where the traffic is all routed through a NAT server. > Here’s the skeleton for the Athena query > > with t1 as ( > select request_ip, count(*) n, min("date") start_date, max("date") end_date > from "default"."cloudfront_logs" > where starts_with(uri, '/books/release/OSCA') > group by request_ip > ) > select count(*) unique_ips, sum(n) total_gets, > min(start_date) first_date, max(end_date) last_date > from t1 > > And here’s the results > > # unique_ips total_gets first_date last_date > 1 23029 586345 2023-03-13 2023-09-16 >
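The two-step aggregation in the Athena query above (count requests per IP, then collapse to one summary row) can be sketched in base R. This is only an illustration under stated assumptions: the `logs` data frame and every row in it are invented toy data, and only the column names and the URI prefix are taken from the query.

```r
# Toy stand-in for the CloudFront log table. Column names follow the query
# above; the rows are invented purely for illustration.
logs <- data.frame(
  request_ip = c("10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.3"),
  date = as.Date(c("2023-03-13", "2023-04-01", "2023-05-20",
                   "2023-09-16", "2023-06-02")),
  uri = c("/books/release/OSCA/", "/books/release/OSCA.basic/",
          "/books/release/OSCA.intro/", "/books/release/OSCA/",
          "/packages/release/bioc/"),
  stringsAsFactors = FALSE
)

# WHERE starts_with(uri, '/books/release/OSCA')
osca <- logs[startsWith(logs$uri, "/books/release/OSCA"), ]

# Inner query (t1): one entry per request_ip holding its request count.
per_ip <- table(osca$request_ip)

# Outer query: collapse the per-IP entries into a single summary row.
# Taking min/max over all rows equals min/max over the per-IP min/max.
summary_row <- data.frame(
  unique_ips = length(per_ip),
  total_gets = sum(per_ip),
  first_date = min(osca$date),
  last_date  = max(osca$date)
)
print(summary_row)
```

As in the real logs, several users behind one NAT address would collapse into a single `request_ip` here, so `unique_ips` is a lower bound on distinct users.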

Ludwig Geistlinger (13:58:53): > Thanks @Robert Shear - that’s super helpful!

2023-10-04

Lambda Moses (19:00:53): > @Lambda Moses has joined the channel

2023-11-05

Peter Hickey (17:15:57): > @Andres Wokaty The OSCA.workflows book is failing to build in BioC 3.18 due to an issue loading a resource from ExperimentHub (EH3493; https://bioconductor.org/checkResults/3.18/books-LATEST/OSCA.workflows/nebbiolo2-buildsrc.html) > I think this could be due to an issue with the ExperimentHub cache on nebbiolo2; could you please check if that’s the case?

2023-11-06

Andres Wokaty (09:44:05) (in thread): > Thanks for letting me know. There was an issue and I forced the resource to be downloaded again.

Peter Hickey (15:40:19) (in thread): > Thank you!

2023-11-08

Peter Hickey (19:51:09): > FYI build failure for OSCA.intro in BioC 3.18 is hopefully intermittent. I think it’s the same issue as https://community-bioc.slack.com/archives/C056CEJTH5Z/p1699433071811819, which Herve has sought to fix - Attachment: Attachment > I lost track of the conversation, does anyone know if the problem above might be the cause of the error I’m seeing for {zellkonverter}? > > Quitting from lines 42-48 [read] (zellkonverter.Rmd) > Error: processing vignette 'zellkonverter.Rmd' failed with diagnostics: > ImportError: /home/biocbuild/.cache/R/basilisk/1.14.0/zellkonverter/1.12.0/zellkonverterAnnDataEnv-0.10.2/lib/python3.11/site-packages/h5py/../../.././libcurl.so.4: undefined symbol: nghttp2_option_set_no_rfc9113_leading_and_trailing_ws_validation > Run `reticulate::py_last_error()` for details. > It’s only on some platforms so maybe it could also be a cache thing?

2023-12-01

Tram Nguyen (10:16:26): > @Tram Nguyen has joined the channel

2023-12-14

Marc Elosua (15:39:51): > @Marc Elosua has joined the channel

2024-01-08

Connie Li Wai Suen (17:06:22): > @Connie Li Wai Suen has joined the channel

2024-01-12

Lori Shepherd (09:28:29): > <!channel>The OSCA book is failing on the devel builds for almost all sections and OSCA.multisample is failing in both release and devel – are maintainers aware and working on fixes?

2024-01-14

Peter Hickey (14:50:16): > i’ve been working on it

Peter Hickey (14:51:02): > All books now building in BioC 3.18

Peter Hickey (14:51:36): > Fixed issues in most books in BioC 3.19, with OSCA.multisample and OSCA.advanced to be fixed today

2024-01-15

Peter Hickey (17:33:33): > update: some fixed, some still broken (with new problems revealed as sub-books start building successfully). I’ll keep working on it this week until I get them all building

2024-01-16

Alan O’C (05:27:43): > Thanks Peter! I’ll try have a look as well, was struggling with package problems previously

2024-01-21

Axel Klenk (16:12:33): > @Axel Klenk has joined the channel

2024-01-22

Peter Hickey (19:01:04): > Looking for some help with issues I can’t reproduce locally in fixing OSCA in BioC 3.19: > 1. ~~OSCA.basic gives error could not find function "rowRanges" when compiling quality-control.Rmd (https://bioconductor.org/checkResults/3.19/books-LATEST/OSCA.basic/nebbiolo1-buildsrc.html)~~ > 2. OSCA.advanced ~~gives error Corrupt Cache: index file when running AnnotationHub::AnnotationHub() as part of more-qc.Rmd (https://bioconductor.org/checkResults/3.19/books-LATEST/OSCA.advanced/nebbiolo1-buildsrc.html)~~ but is broken for different reasons (see thread) > 3. ~~OSCA.workflows gives error Corrupt Cache: index file when running scRNAseq::LunSpikeInData(which = "416b") as part of lun-416b.Rmd (https://bioconductor.org/checkResults/3.19/books-LATEST/OSCA.workflows/nebbiolo1-buildsrc.html)~~ > (1) I got no idea > (2) and (3) seem to be pointing to corrupt AnnotationHub/ExperimentHub caches. > > 4. OSCA.multisample is also broken (https://bioconductor.org/checkResults/3.19/books-LATEST/OSCA.multisample/nebbiolo1-buildsrc.html), ~~but I think that’s because it depends on the muraro-pancreas.Rmd from OSCA.workflows in using-corrected-values.Rmd, so hopefully once OSCA.workflows is working then we’ll get a clean build of OSCA.multisample~~ but for different reasons (see thread)

2024-01-23

Luke Zappia (02:48:13) (in thread): > I’ve seen the cache issue on other packages so I think it just needs to be reset on some machines

Alan O’C (08:43:52) (in thread): > Having a look now, sorry for the delay

Andres Wokaty (13:33:24) (in thread): > I reset the cache yesterday so it should clear up on the next build. if it doesn’t, please ping me.

Alan O’C (19:52:02) (in thread): > basic compiled locally for me, also no idea. Think you’re right about the cache problem for the rest

Peter Hickey (21:50:13) (in thread): > Thanks for taking a look and confirming!

Peter Hickey (21:50:23) (in thread): > I’ll keep an eye on the next build

2024-01-28

Alan O’C (19:11:20) (in thread): > Multisample is proper broken, will look again tomorrow

2024-01-29

Alan O’C (07:23:00) (in thread): > Advanced also broken still, not just the cache error. Link above for advanced is wrong: https://bioconductor.org/checkResults/devel/books-LATEST/OSCA.advanced/nebbiolo1-buildsrc.html

2024-01-30

Peter Hickey (19:42:54) (in thread): > Fixed one issue in OSCA.advanced (https://github.com/OSCA-source/OSCA.advanced/commits/devel/) but the trajectory.Rmd chapter is properly broken, which I’m working through

Peter Hickey (19:45:53) (in thread): > OSCA.multisample is broken because of change in edgeR described in https://github.com/OSCA-source/OSCA.multisample/issues/16. I reverted to the legacy edgeR behaviour for BioC 3.18 but it would seem a better idea to move to the new behaviour for BioC 3.19 except that it changes the results and the surrounding text in the book - Attachment: #16 Change of default legacy in edgeR::glmQLFit() with edgeR v4.0.7 changes results in cluster-abundance.Rmd > From edgeR news file: > > > Changes in version 4.0.0 (2023-10-25) > > > > • New statistical methods implemented in glmQLFit() to ensure accurate estimation of the quasi-dispersion for data with small counts. The new method computes adjusted residual deviances with adjusted degrees of freedom to improve the chisquare approximation to the residual deviance. The new methodology includes the new argument ‘top.proportion’ for glmQLFit() to specify the proportion of highly expressed genes used to estimate the common NB dispersion used in the new method. The output DGEGLM object contains new components ‘leverage’, ‘unit.deviance.adj’, ‘unit.df.adj’, ‘deviance.adj’, ‘df.residual.adj’ and ‘working.dispersion’. The new method can be turned on ‘legacy=FALSE’. By default, glmQLFit() will give the same results as in previous releases of edgeR. > > Recently, the default changed from legacy=TRUE to legacy=FALSE in the release branch (BioC 3.18): https://code.bioconductor.org/browse/edgeR/commit/1f0de5e1fa24e436315e13fad517e1bdac502fdd
> @gksmyth: It’s a bit unexpected for the default value to change in the release version, so I’d like to confirm that this change was intended for the release version (BioC 3.18) and not just the devel version (BioC 3.19) of edgeR? > > @LTLA, @alanocallaghan:
> With regards to inst/book/cluster-abundance.Rmd in OSCA.multisample, we can revert to the previous behaviour just by adding legacy=TRUE in the appropriate places which seems the best solution for BioC 3.18 (I’ll make a PR).
> For BioC 3.19, we can either adapt the text to the new results (i.e. use the default legacy=FALSE) or stick with the old results and text (i.e. also make legacy=TRUE). > > * * * > > Example based on inst/book/cluster-abundance.Rmd > > Extract example and setup DGEList object > > > library(rebook) > suppressPackageStartupMessages(library(SingleCellExperiment)) > extractCached( > system.file("book/pijuan-embryo.Rmd", package = "OSCA.multisample"), > chunk = "dimensionality-reduction", > objects = "merged") > #> <button class="rebook-collapse">View set-up code (Chapter \@ref(chimeric-mouse-embryo-10x-genomics))</button> > #> <div class="rebook-content"> > #> > #> > #> #--- loading ---# > #> library(MouseGastrulationData) > #> sce.chimera <- WTChimeraData(samples=5:10) > #> sce.chimera > #> > #> #--- feature-annotation ---# > #> library(scater) > #> rownames(sce.chimera) <- uniquifyFeatureNames( > #> rowData(sce.chimera)$ENSEMBL, rowData(sce.chimera)$SYMBOL) > #> > #> #--- quality-control ---# > #> drop <- sce.chimera$celltype.mapped %in% c("stripped", "Doublet") > #> sce.chimera <- sce.chimera[,!drop] > #> > #> #--- normalization ---# > #> sce.chimera <- logNormCounts(sce.chimera) > #> > #> #--- variance-modelling ---# > #> library(scran) > #> dec.chimera <- modelGeneVar(sce.chimera, block=sce.chimera$sample) > #> chosen.hvgs <- dec.chimera$bio > 0 > #> > #> #--- merging ---# > #> library(batchelor) > #> set.seed(01001001) > #> merged <- correctExperiments(sce.chimera, > #> batch=sce.chimera$sample, > #> subset.row=chosen.hvgs, > #> PARAM=FastMnnParam( > #> merge.order=list( > #> list(1,3,5), # WT (3 replicates) > #> list(2,4,6) # td-Tomato (3 replicates) > #> ) > #> ) > #> ) > #> > #> #--- clustering ---# > #> g <- buildSNNGraph(merged, use.dimred="corrected") > #> clusters <- igraph::cluster_louvain(g) > #> colLabels(merged) <- factor(clusters$membership) > #> > #> #--- dimensionality-reduction ---# > #> merged <- runTSNE(merged, dimred="corrected",
external_neighbors=TRUE) > #> merged <- runUMAP(merged, dimred="corrected", external_neighbors=TRUE) > #> ``` > #> > #> </div> > abundances <- table(merged$celltype.mapped, merged$sample) > abundances <- unclass(abundances) > ``` > > **Performing the DA analysis** > > ``` > suppressPackageStartupMessages(library(edgeR)) > # Attaching some column metadata. > extra.info <- colData(merged)[match(colnames(abundances), merged$sample), ] > y.ab <- DGEList(abundances, samples = extra.info) > keep <- filterByExpr(y.ab, group = y.ab$samples$tomato) > y.ab <- y.ab[keep, ] > design <- model.matrix(~factor(pool) + factor(tomato), y.ab$samples) > y.ab <- estimateDisp(y.ab, design, trend="none") > > # Change in behaviour > # edgeR v4.0.9 (> v4.0.6) > fit.ab <- glmQLFit(y.ab, design, robust = TRUE, abundance.trend = FALSE) > # edgeR v4.0.6 behaviour (legacy = TRUE) > fit.ab.legacy <- glmQLFit(y.ab, design, robust = TRUE, abundance.trend = FALSE, legacy = TRUE) > # Change in behaviour > fit.ab$var.prior > #> [1] 2.995648 > fit.ab.legacy$var.prior > #> [1] 1.254041 > > # Consequential change in results > res <- glmQLFTest(fit.ab, coef=ncol(design)) > res.legacy <- glmQLFTest(fit.ab.legacy, coef=ncol(design)) > topTags(res, n = Inf) > #> Coefficient: factor(tomato)TRUE > #> logFC logCPM F PValue > #> ExE ectoderm -6.61386847 13.02497 44.00496047 2.682303e-08 > #> Mesenchyme 1.16998749 16.29382 15.11355411 3.104565e-04 > #> Allantois 0.77885715 15.50702 5.68283632 2.113409e-02 > #> Erythroid3 -0.64025988 17.28041 5.21921194 2.680090e-02 > #> Cardiomyocytes 0.78103396 14.86430 4.84112733 3.263443e-02 > #> Neural crest -0.78224850 14.76462 4.61438627 3.678044e-02 > #> 
Endothelium 0.75511256 14.28905 3.81255096 5.671491e-02 > #> Haematoendothelial progenitors 0.62212276 14.72323 3.05073961 8.709715e-02 > #> ExE mesoderm 0.35404299 15.67835 1.28015130 2.634921e-01 > #> Pharyngeal mesoderm 0.35492435 15.72073 1.27978174 2.635601e-01 > #> Forebrain/Midbrain/Hindbrain -0.29051795 16.54919 1.01467831 3.188358e-01 > #> Def. endoderm 0.59254011 12.40076 0.97192278 3.291447e-01 > #> Surface ectoderm 0.30115212 15.97647 0.95756471 3.327076e-01 > #> Erythroid2 -0.25947602 15.91448 0.67463758 4.155004e-01 > #> Caudal Mesoderm -0.46915460 12.08574 0.48677204 4.887368e-01 > #> Paraxial mesoderm -0.18710612 15.77605 0.36209038 5.501793e-01 > #> Intermediate mesoderm -0.21718837 14.31334 0.32958277 5.685855e-01 > #> Somitic mesoderm -0.19641966 14.26757 0.25071372 6.188613e-01 > #> Erythroid1 0.16020607 14.62373 0.19124084 6.638472e-01 > #> Gut 0.10147578 15.18933 0.09319683 7.614717e-01 > #> Spinal cord -0.09975658 15.29656 0.09233149 7.625471e-01 > #> NMP -0.09935100 15.14231 0.08772911 7.683620e-01 > #> Blood progenitors 2 -0.09595636 13.57505 0.04718809 8.289512e-01 > #> Rostral neurectoderm 0.03196157 13.37318 0.00457413 9.463593e-01 > #> FDR > #> ExE ectoderm 6.437527e-07 > #> Mesenchyme 3.725478e-03 > #> Allantois 1.471217e-01 > #> Erythroid3 1.471217e-01 > #> Cardiomyocytes 1.471217e-01 > #> Neural crest 1.471217e-01 > #> Endothelium 1.944511e-01 > #> Haematoendothelial progenitors 2.612914e-01 > #> ExE mesoderm 6.142294e-01 > #> Pharyngeal mesoderm 6.142294e-01 > #> Forebrain/Midbrain/Hindbrain 6.142294e-01 > #> Def. endoderm 6.142294e-01 > #> Surface ectoderm 6.142294e-01 > #> Erythroid2 7.122864e-01 > #> Caudal Mesoderm 7.819788e-01 > #> Paraxial mesoderm 8.027090e-01 > #> Intermediate mesoderm 8.027090e-01 > #> Somitic mesoderm 8.251484e-01 > #> Erythroid1 8.382130e-01 > #> Gut 8.382130e-01 > #> Spinal cord 8.382130e-01 > #> NMP 8.382130e-01 > #> Bl…

2024-01-31

Alan O’C (04:18:16) (in thread): > Ah great thanks! Main thing is knowing what’s actually caused the change, otherwise it’s a bit of a guessing game

2024-02-02

Lori Shepherd (08:12:56): > Heads up: OSCA.basic depends on AUCell. AUCell has been failing on devel since at least Dec 12 when we reinstalled R and realized CRAN removed rbokeh. We have been reaching out to AUCell but they have been unresponsive. AUCell is therefore at risk for deprecation. If anyone knows the AUCell maintainer it would be beneficial to reach out to them asap to fix the package.

Hervé Pagès (12:00:59) (in thread): > Have you tried here? https://github.com/aertslab/AUCell/issues

Lori Shepherd (12:02:21) (in thread): > no i tried their email; in principle I don’t hunt down alternate forms of communication. I’ll post an issue there too.

Vince Carey (13:20:42) (in thread): > @Laurent Gatto do you happen to know Stein Aerts, in whose lab AUCell was hatched?

Laurent Gatto (13:41:14) (in thread): > I’ve met him and can try to contact him directly. Will do this week-end.

2024-02-04

Laurent Gatto (05:02:42) (in thread): > Following up here: I emailed Stein Aerts yesterday, explaining the issue with rbokeh and offering help to fix it. He said he would look into it.

Peter Hickey (14:58:26) (in thread): > FWIW dropping AUCell from OSCA has been on my radar for a while because it’s gone through more than 1 long period where it broke or the behaviour changed with little warning

Peter Hickey (15:00:03) (in thread): > * https://github.com/OSCA-source/OSCA.basic/issues/4# > * https://github.com/OSCA-source/OSCA.basic/issues/8 - Attachment: #8 Cell type annotation chapter broken because of breaking changes in AUCell

Laurent Gatto (15:03:50) (in thread): > Thank you for your efforts - this does feel very frustrating.

2024-02-09

Levi Waldron (09:25:33): > @Levi Waldron has left the channel

2024-02-15

Alan O’C (17:27:54) (in thread): > Multisample also had some changes in the pseudotime graph, but they seemed inconsequential. Fingers crossed should be fixed on both release and devel now

Peter Hickey (18:18:12) (in thread): > thanks!

Peter Hickey (18:18:43) (in thread): > did you make some changes and have you pushed these to bioc git and github?

2024-02-16

Alan O’C (04:01:27) (in thread): > Had just pushed them to bioc, but now on gh as well

Alan O’C (04:02:19) (in thread): > Changes were just to add legacy=TRUE and to change the adjacency tests on the pseudotime graph. Didn’t dig into why the graph had changed, though, but it was different in release and devel

2024-02-18

Peter Hickey (15:10:43) (in thread): > Great, it’s got a clean build now again in BioC 3.19

Peter Hickey (15:10:54) (in thread): > @Ludwig Geistlinger can you please take a look at OSCA.advanced in BioC 3.19

2024-02-19

Ludwig Geistlinger (10:54:37) (in thread): > Thanks Pete. Sure, will take a look. Will need to wait until Friday though.

2024-02-23

Chris Magnano (16:34:40): > @Chris Magnano has joined the channel

2024-02-25

Ludwig Geistlinger (12:58:51): > It seems there is a caching issue for OSCA.intro in both release and devel that I can’t reproduce locally. Can we clear the caches on the builders and see whether the issue persists? Maybe @Lori Shepherd.

Ludwig Geistlinger (13:02:58) (in thread): > It seems some recent changes to a not yet identified package in the trajectory stack are changing the shape of the trajectory discussed in Chapter 10 of OSCA.advanced - triggering some of the included sanity checks. Working on a fix.

Lori Shepherd (16:34:13): > I can look into it tomorrow

2024-02-26

Alan O’C (05:48:48) (in thread): > Sounds like it’s probably the same as the pseudotime issues in multisample

Lori Shepherd (12:15:09): > having trouble trying to debug this as well. It looks like that section of code is creating its own unique BiocFileCache but in the working directory of where it is run – and I’m not sure where that actually ends up on the builder for books to check for the conflict of file or to even reset it

Lori Shepherd (12:16:57): > @Andres Wokaty / @Hervé Pagès any ideas?

Andres Wokaty (14:23:56): > Is there still an issue with this? The build reports for OSCA.intro are all passing. Am I missing something?

Lori Shepherd (14:27:20): > looks like it cleared up on its own

Ludwig Geistlinger (14:32:20): > Interesting, well even better, thank you both @Lori Shepherd and @Andres Wokaty!

Ludwig Geistlinger (16:48:19) (in thread): > yes seems to come from changes in TSCAN or one of its dependencies

2024-03-02

Aaron Lun (17:24:13): > scRNAseq is being updated to use the new gypsum framework, and the new version should be available once it passes check on the build system (2.17.2). > > This update should be transparent to all scRNAseq users; the main user-visible change (for now) is that the files are being pulled from the new gypsum backend instead of ExperimentHub. (Check out https://github.com/ArtifactDB/gypsum-worker for more information.) > > The OSCA book uses many of scRNAseq’s datasets and should hopefully not be affected; nonetheless, this is an advance notice to keep watch for changes in the book compilation due to any errors during ETL from EHub to gypsum. > > As for the motivation: the shift to gypsum simplifies the addition of new datasets to the package, i.e., scRNAseq maintainers can now add, update and review datasets themselves without waiting for manual intervention from the BioC core team. The datasets can also be read in other languages like Python, though we haven’t gotten around to writing a user-friendly interface like the scRNAseq R package. > > OSCA maintainers might consider leveraging gypsum to store intermediate analysis results, e.g., the workflow outputs that are used as input to some of the chapters. By computing and storing these intermediate artifacts - say, once per release cycle - and then saying “here’s one we prepared earlier” at the start of every chapter, maintainers can eliminate the complex dependencies between books/chapters and improve the robustness of the build. > > Anyway, I’m on vacation starting today, so responses may be less frequent, but hopefully it all goes well.
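The “here’s one we prepared earlier” pattern suggested above can be sketched in base R as compute-once, load-afterwards. This is a minimal illustration and not how gypsum works: a local temporary directory stands in for the remote store, and `cached_artifact()`, `slow_workflow()` and the artifact name are all hypothetical names made up for the example.

```r
# Hypothetical helper: return a stored artifact if present, otherwise compute
# it once and store it. A local directory stands in for a remote backend.
cached_artifact <- function(name, compute, cache_dir) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  path <- file.path(cache_dir, paste0(name, ".rds"))
  if (!file.exists(path)) {
    saveRDS(compute(), path)   # the expensive step runs only on a cache miss
  }
  readRDS(path)
}

# Fresh cache location for this session (assumption: per-session temp dir).
cache_dir <- tempfile("osca-artifacts-")

# Toy stand-in for an expensive workflow; n_runs counts how often it executes.
n_runs <- 0
slow_workflow <- function() {
  n_runs <<- n_runs + 1
  data.frame(cell = paste0("cell", 1:3), cluster = c(1L, 2L, 1L))
}

# A "workflow" chapter computes the artifact; a later chapter just loads it.
first  <- cached_artifact("toy-clusters", slow_workflow, cache_dir)
second <- cached_artifact("toy-clusters", slow_workflow, cache_dir)
```

After both calls `slow_workflow()` has run once, so a downstream chapter depends only on the stored file rather than on re-running the upstream chapter's code.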

Lori Shepherd (20:23:27): > So assuming then i should add an rdatadateremoved to the eh scRNAseq objects so they are no longer available past this date?

Aaron Lun (20:42:48): > give it some more time, i think, just to make sure everything’s fine and we can roll back if there are problems. the new and old mechanisms won’t conflict with each other (and in fact, the current getters have a legacy=TRUE option to continue pulling from EHub). > > The plan would be to soft-deprecate the EHub assets in the next release, and then actually deprecate them in the release after that, and then we can add the flag to stop availability. Some applications (e.g. kana) also pull directly from the EHub API without going through R, so those would also need to be adapted.

2024-03-03

Sridhar N (23:45:23): > @Sridhar N has joined the channel

2024-03-04

Frederick Tan (08:50:53): > @Frederick Tan has joined the channel

2024-03-05

Vince Carey (11:59:20): > since new scRNAseq is being discussed here, need to make sense of > > > z = PaulHSCData(legacy=FALSE) > The value -2^31 was detected in the dataset. > This has been converted to NA within R. > The value -2^31 was detected in the dataset. > This has been converted to NA within R. > > (2.17.1)

2024-03-07

Jared Andrews (15:40:17): > @Jared Andrews has joined the channel

2024-03-13

Kozo Nishida (06:23:21): > Is there a pre-set-up Docker image to reproduce https://github.com/OSCA-source/OSCA.multisample code?

Alan O’C (07:00:51): > No, the book doesn’t use docker, but you could write a simple Dockerfile by starting from the general bioc image and then installing the (sub) book dependencies as in the README, eg clone the book repo then run BiocManager::install(remotes::local_package_deps(dependencies=TRUE))

2024-03-19

Alan O’C (05:52:04): > Workflows is broken again/still broken in devel: https://bioconductor.org/checkResults/devel/books-LATEST/OSCA.workflows/nebbiolo1-buildsrc.html

Aaron Lun (20:41:42): > looks like the colData column names changed. I think this is caused upstream; ArrayExpress itself changed the names, I just re-used the same script to pull it down.

Aaron Lun (21:09:19): > alright, the getter now restores the old colnames, but keep in mind that fetchDataset uses the new colnames.

2024-03-27

abhich (05:46:07): > @abhich has joined the channel

2024-03-28

Ivan Osinnii (02:42:19): > @Ivan Osinnii has joined the channel

2024-04-03

Ivan Osinnii (07:49:38): > Hi everyone. I am using singleR package to run automatic cell annotation using sce objects. I observed likely a batch effect between my query and reference datasets. To correct for that I used MNN quick correction described in chapter 1.2 and 1.6 of OSCA multisample book. quick.corrected datasets contain only “reconstructed” assay with many negative values inside. I split them again to query and ref datasets. And I seemingly cannot run logNormCounts function on them in order to provide “logcounts” for reference in SingleR. Could you please suggest how could I overcome this?

2024-04-04

Ivan Osinnii (07:48:29): > Could someone please check the source file for an example from the book Basic Chapter 7 > > sce.tasic <- TasicBrainData() > > Gives an error download failed > web resource path: ‘https://experimenthub.bioconductor.org/fetch/2594’ > reason: Internal Server Error (HTTP 500).

Vince Carey (10:05:19): > I had no problems in R-patched/Bioc 3.18 or R-devel/Bioc 3.19, using library(scRNAseq) before the call, although the latter uses a different API to obtain the data. Please provide sessionInfo() outcome when reporting problems.

Ivan Osinnii (11:57:16) (in thread): > Thank you. When I called library(scRNAseq) before sce.tasic <- TasicBrainData() it worked. I just loaded this library in the beginning of the script to load ZeiselBrainData() and I thought there is no need to load this library second time

Vince Carey (13:04:36) (in thread): > Glad it worked, but the 500 error probably has a different origin. If it happens again please be in touch. It could very well be intermittent and not reproducible.

Vince Carey (13:58:54): > I noticed that CodeDepends (github) has been archived from CRAN. I suspect it has a significant role in rebook?

Aaron Lun (14:00:14): > ah great.

2024-04-05

Aaron Lun (12:41:06): > i pinged gabe directly.

2024-04-07

Peter Hickey (22:00:18): > I got back today from 3 weeks leave and see lots of new(?) breakages in devel for OSCA (https://bioconductor.org/checkResults/3.19/books-LATEST/). These don’t seem to have been discussed here. Has anyone started looking into it? I’ll do so later this week

2024-04-08

Alan O’C (04:38:09): > Sorry yeah, I briefly looked into multisample and I think? it was related to the error in workflows, but I’ll need to check

Alan O’C (07:04:13): > Yeah, I think multisample is broken because of workflows. I’ll leave it up to whoever is in charge of workflows, but the error seems to be clustering again. The error here: https://bioconductor.org/checkResults/3.19/books-LATEST/OSCA.workflows/nebbiolo1-buildsrc.html is checking if there’s a strong donor effect in the clustering results. I’d say 0.395 is still “strong” so can probs just be updated. Probably the same thing causing clustering/pseudotime changes elsewhere

Peter Hickey (18:26:35): > Thanks, Alan. OSCA.workflows is me, so I’ll start with it

2024-04-14

Vince Carey (07:38:33): > https://bioconductor.org/books/3.18/OSCA.basic/cell-type-annotation.html#motivation-3 has a hyperlink to the SingleR book in ltla.github.io that does not resolve. Last sentence of 7.2.1. I will see about making a PR. - Attachment (bioconductor.org): Chapter 7 Cell type annotation | Basics of Single-Cell Analysis with Bioconductor

2024-04-15

Alan O’C (10:53:22): > Hmm, with the amount of external links we should probably be linting the URLs as part of the checks
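
A minimal sketch of what such a URL lint could look like, in base R. The function names `extract_urls()` and `check_urls()` are hypothetical (they are not part of rebook or the book build), and the regular expression is a rough heuristic rather than a full URL grammar. `check_urls()` needs network access, so it is defined here but not called.

```r
# Pull http(s) URLs out of lines of Rmd/markdown text (heuristic pattern).
extract_urls <- function(lines) {
    pattern <- "https?://[^\\s)>\"'\\]]+"
    hits <- unlist(regmatches(lines, gregexpr(pattern, lines, perl = TRUE)))
    # Drop sentence punctuation that got glued onto the end of a link.
    unique(sub("[.,;:]+$", "", hits))
}

# Return the HTTP status for each URL, or NA if the request itself fails.
# Assumes R was built with libcurl support (needed by curlGetHeaders).
check_urls <- function(urls) {
    vapply(urls, function(u) {
        tryCatch(
            as.integer(attr(curlGetHeaders(u), "status")),
            error = function(e) NA_integer_
        )
    }, integer(1))
}

# Made-up input lines; only the first URL comes from this channel.
rmd <- c(
    "See the [SingleR book](https://bioconductor.org/books/release/SingleRBook/introduction.html).",
    "No links on this line.",
    "Two more: https://a.example/x and http://b.example/y,"
)
urls <- extract_urls(rmd)
print(urls)
```

In a real check one would presumably run `extract_urls()` over every `.Rmd` file in a sub-book and fail, or just warn, when `check_urls()` reports a status of 400 or above.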

Alan O’C (10:54:02): > Also sad that’s gone, I recall there being a few good resources on that site, if I remember well one about data fraud

Aaron Lun (10:56:47): > huh i don’t remember deleting it

Alan O’C (11:04:07): > There I was assuming your corporate overlords had told you to play nice :upside_down_face:

Aaron Lun (11:16:28): > hold on, there might be separate issues here. > > The link to the singler book should probably point to the bioconductor domain, i probably forgot to update it in the intro. > > The data fraud stuff was https://ltla.github.io/SingleCellThoughts/

Aaron Lun (11:19:14): > i forgot i wrote that. holy crap it’s a fun read.

Aaron Lun (11:20:12): > should update it to mention dall-e for fake image generation

Aaron Lun (11:20:53): > actually, there is a section on neural networks. wow.

Alan O’C (11:24:21): > Yes, I like to point people to it on the topic of “what if somebody vaguely competent decided to fake their data?” rather than the truly substandard forgeries people tend to catch

Alan O’C (11:25:29): > Yeah presumably the SingleR link should point to https://bioconductor.org/books/release/SingleRBook/introduction.html, although I guess ideally it should point to the same version as OSCA - Attachment (bioconductor.org): Chapter 1 Introduction | Assigning cell types with SingleR > The SingleR book. Because sometimes, a vignette just isn’t enough.

2024-04-16

Alan O’C (10:34:44) (in thread): > Narrowed it down slightly @Peter Hickey, the metadata has changed because of changes in scRNAseq, the old crosstab of donor is: > AZ HP1502401 HP1504101T2D HP1504901 HP1506401 HP1507101 > 96 352 383 383 383 383 > HP1508501T2D HP1509101 HP1525301T2D HP1526901T2D > 383 383 384 384 > > The new one is > > H1 H2 H3 H4 H5 H6 T2D1 T2D2 T2D3 T2D4 > 96 352 383 383 383 383 383 383 384 384 > > So the numbers line up, but in QC we subset some of the samples because they’re low quality to define the quality metrics in the others, therefore the QC changes so everything downstream changes
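
The comparison in this message can be reproduced from the two tables alone. This is a small base-R sketch using only the counts quoted here; the variable names are illustrative, and it deliberately does not guess which old donor ID corresponds to which new label, since the message does not say.

```r
# Cells per donor as reported before and after the scRNAseq metadata change.
old <- c(AZ = 96, HP1502401 = 352, HP1504101T2D = 383, HP1504901 = 383,
         HP1506401 = 383, HP1507101 = 383, HP1508501T2D = 383,
         HP1509101 = 383, HP1525301T2D = 384, HP1526901T2D = 384)
new <- c(H1 = 96, H2 = 352, H3 = 383, H4 = 383, H5 = 383, H6 = 383,
         T2D1 = 383, T2D2 = 383, T2D3 = 384, T2D4 = 384)

# The group sizes agree, which is what "the numbers line up" refers to.
same_sizes <- identical(sort(unname(old)), sort(unname(new)))

# No label survives the rename, so code that selects donors by their old
# IDs now matches zero cells instead of raising an error.
shared_labels <- intersect(names(old), names(new))
matched_by_old_id <- sum(new[names(new) %in% names(old)])

print(same_sizes)
print(shared_labels)
print(matched_by_old_id)
```

A subset that silently becomes empty is consistent with the QC step changing without any error being thrown at that point.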

Alan O’C (10:34:49) (in thread): > I’ll try make a PR

Peter Hickey (18:48:48) (in thread): > Thanks for making the PR. I’ve found a few changes to the metadata for datasets in scRNAseq that I’ve been following up with Aaron (https://github.com/LTLA/scRNAseq/issues/47, https://github.com/LTLA/scRNAseq/issues/48). I’ll take a look this morning as I’m not yet sure if this is related to those, but I’ll sort out a solution for all these minor annoyances - Attachment: #47 colData change for GrunPancreasData in BioC 3.19; worth making legacy = TRUE the default in BioC 3.19 - Attachment: #48 colData change for SegerstolpePancreasData in BioC 3.19

2024-04-17

Peter Hickey (02:04:53) (in thread): > I think OSCA.workflows is now fixed and updated on bioc git. I think this will fix some of the build/check failures for the other sub-books, but still investigating

Alan O’C (05:23:58) (in thread): > Ah nice one, I thought you might have been too busy to get to the workflows bit yet so wanted to get started if not. Yes will try look into that today

2024-04-21

Ludwig Geistlinger (14:59:58): > The trajectory analysis chapter in the OSCA advanced sub-book currently fails to build in devel because tradeSeq is not available. It seems tradeSeq is not available because clusterExperiment is not available. It seems clusterExperiment is not available because howmany has been removed from CRAN. Can this be fixed? @Elizabeth Purdom @Davide Risso @Hector Roux de Bézieux @Koen Van den Berge

Elizabeth Purdom (15:00:02): > @Elizabeth Purdom has joined the channel

Hector Roux de Bézieux (15:00:02): > @Hector Roux de Bézieux has joined the channel

2024-04-22

Vince Carey (06:04:04) (in thread): > Hi Ludwig – did you try to contact the howmany author? The email in the howmany DESCRIPTION seems out of date. IIUC Meinshausen is at ETHZ but the email is Oxford.

Ludwig Geistlinger (14:35:57) (in thread): > Thanks Vince. I am passing this question on to @Elizabeth Purdom and @Davide Risso. Also how essential is howmany for clusterExperiment? Can it be replaced if needed?

2024-04-24

Davide Risso (13:15:30) (in thread): > Hi Ludwig, thanks for the heads-up! Elizabeth is on it and hopefully the issue will be fixed soon.

Ludwig Geistlinger (13:28:17) (in thread): > Thanks for the update Davide!

2024-05-06

Michal Kolář (11:57:11): > @Michal Kolář has joined the channel

2024-05-09

Philippe Laffont (07:39:54): > @Philippe Laffont has joined the channel

2024-05-15

Sunil Nahata (08:31:52): > @Sunil Nahata has joined the channel

2024-05-21

Vince Carey (07:15:23): > I am doing some rebuilding (actually just running vignettes) in AnVIL and have run into > > Error in FUN(X[[i]], ...) : object 'hk' not found > > a few times. I solve it with hk=1 before building. Any clues?

Peter Hickey (18:51:41): > no idea, sorry. never seen that before when building locally

2024-05-27

Peter Hickey (18:57:43) (in thread): > @Ludwig Geistlinger any updates? the trajectory chapter remains broken in both BioC 3.19 and 3.20

2024-05-28

Ludwig Geistlinger (10:29:42) (in thread): > Thanks for the nudge Pete. The update here is that the clusterExperiment/tradeSeq issue seems to have been resolved and we are back to the original issue that the shape of the trajectory has changed which triggers some of the failsafe checks. Looking into that based on the PR that you had put in to fix that at some point.

Koen Van den Berge (15:45:34) (in thread): > Thank you, @Ludwig Geistlinger. Let me know if anything is required from the tradeSeq side.

Peter Hickey (17:59:40) (in thread): > That PR was pretty old and still work in progress. i’m unsure how much will directly apply.

2024-05-29

Ludwig Geistlinger (16:31:13) (in thread): > I’ve put in a fix in devel. If it gets through on the builders, I’ll migrate over to release as well.

2024-06-02

Ludwig Geistlinger (15:40:49) (in thread): > The trajectory analysis chapter in the OSCA advanced sub-book currently fails to build in devel because of an issue with velociraptor. Do you already have this on the radar Kevin @Kevin Rue-Albrecht?

2024-06-03

Kevin Rue-Albrecht (06:27:57) (in thread): > I’ve been fixing velociraptor recently. > v1.15.2 is on its way: https://bioconductor.org/checkResults/devel/bioc-LATEST/velociraptor/

Kevin Rue-Albrecht (06:28:46) (in thread): > although I’m not sure whether the issue on Linux is something I can fix or a BBS issue: > > ImportError: /var/cache/basilisk/1.17.0/velociraptor/1.15.1/env/lib/python3.9/site-packages/PIL/../../../libtiff.so.6: undefined symbol: jpeg12_write_raw_data, version LIBJPEG_8.0 > > https://bioconductor.org/checkResults/devel/bioc-LATEST/velociraptor/nebbiolo2-buildsrc.html

Kevin Rue-Albrecht (07:18:56) (in thread): > Looking into it further, I really can’t tell what changed. > It seems the new conda environment is using ‘libjpeg-turbo==3.0.0’, https://github.com/kevinrue/velociraptor/blob/devel/R/basilisk.R#L51C1-L52C1 While it wasn’t specified before.

Kevin Rue-Albrecht (07:19:41) (in thread): > All i can say is that it works on the linux cluster at work. I don’t know what to look for to compare with the BBS

Kevin Rue-Albrecht (08:33:26) (in thread): > I don’t understand how libjpeg-turbo works. https://github.com/libjpeg-turbo/libjpeg-turbo/ it seems to have jpeg12_write_raw_data https://github.com/libjpeg-turbo/libjpeg-turbo/blob/3c17063ef1ab43f5877f19d670dc39497c5cd036/jsamplecomp.h#L118 However, I don’t know whether it’s supposed to get LIBJPEG_8.0 from > Am I reading this right that it’s using whatever libjpeg it finds in the system and declares that version here? https://github.com/libjpeg-turbo/libjpeg-turbo/blob/3c17063ef1ab43f5877f19d670dc39497c5cd036/release/libjpeg.pc.in#L8 In which case it points to an update needed on the BBS rather than anything I can do in velociraptor

Kevin Rue-Albrecht (09:52:44) (in thread): > this seems relevant; https://github.com/python-pillow/Pillow/issues/6864

Hervé Pagès (22:14:04) (in thread): > The mystery here is that I can run the following code with no problem in an interactive session on nebbiolo2 (Ubuntu 22.04): > > library(scuttle) > sce1 <- mockSCE() > sce2 <- mockSCE() > spliced <- counts(sce1) > unspliced <- counts(sce2) > library(velociraptor) > out <- scvelo(list(X=spliced, spliced=spliced, unspliced=unspliced)) > > It also works fine in the context of R CMD check. But it fails with the ImportError: ... libtiff.so.6: undefined symbol: jpeg12_write_raw_data, version LIBJPEG_8.0 error when run in the context of R CMD build :thinking_face: Can someone else reproduce this R CMD build error on Ubuntu 22.04? Everything works fine on my laptop (Ubuntu 23.10). @Kevin Rue-Albrecht What OS do they have on your linux cluster at work?

2024-06-04

Kevin Rue-Albrecht (03:40:11) (in thread): > uname -a > > Linux 5.15.0-89-generic #99-Ubuntu SMP Mon Oct 30 20:42:41 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux > > > $ lsb_release -a > No LSB modules are available. > Distributor ID: Ubuntu > Description: Ubuntu 22.04.2 LTS > Release: 22.04 > Codename: jammy >

Kevin Rue-Albrecht (03:40:30) (in thread): > Thanks for looking into this !

Hervé Pagès (12:01:58) (in thread): > I can reproduce this on bioconductor_docker:devel so it’s not a peculiarity of nebbiolo2. The same happens on bioconductor_docker:devel: the error only happens in the context of R CMD build, but everything works fine interactively or in the context of R CMD check.

Vince Carey (13:03:38) (in thread): > is python involved? i have been distracted. the different phases of R build vs check could use, unintentionally, different dynamic libs for python modules that kevin might have gotten lucky on

2024-06-06

Vince Carey (21:19:36) (in thread): > @Kevin Rue-Albrecht i just want to verify that what succeeds “at your work server” is R CMD build. The mystery to this point is that manual installation of the package (without build) produces a working code stack, but R CMD build cannot resolve a symbol in libtiff.so. “Debugging” R CMD build is tricky. I’ve explored various ways of modifying the task (e.g., introducing a newer miniconda) to no avail.

2024-06-07

Kevin Rue-Albrecht (04:16:18) (in thread): > well, what succeeds is build followed by running the example code in ?scvelo

Vince Carey (06:03:28) (in thread): > Can you run these commands > > > tsyms = system(sprintf("nm %s/velociraptor/1.15.2/env/lib/libtiff.so", getExternalDir()), intern=TRUE) > > grep("jpeg_write_raw", tsyms, value=TRUE) > [1] " U jpeg_write_raw_data@LIBJPEG_8.0" > [2] "0000000000034ce0 t TIFFjpeg_write_raw_data" > [3] "0000000000038990 t TIFFjpeg_write_raw_data" > > and compare to my output

Vince Carey (06:07:05) (in thread): > (getExternalDir requires library(basilisk.utils) or equivalent)
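
When reading `nm` output like the lines quoted above, it can help to separate symbols a library defines from ones it only references (type `U`, which must be resolved from another shared object at load time). A base-R sketch with a hypothetical helper `undefined_symbols()`, run on the three quoted lines rather than on a real library:

```r
# Return the names of undefined ("U") symbols from lines of `nm` output.
# Defined symbols have three fields (address, type, name); undefined ones
# have no address, so only two fields remain after trimming.
undefined_symbols <- function(nm_lines) {
    fields <- strsplit(trimws(nm_lines), "[[:space:]]+")
    is_undef <- vapply(fields, function(f) length(f) == 2 && f[1] == "U",
                       logical(1))
    vapply(fields[is_undef], function(f) f[2], character(1))
}

# Sample input copied from the grep() result quoted in the thread.
tsyms <- c(
    "                 U jpeg_write_raw_data@LIBJPEG_8.0",
    "0000000000034ce0 t TIFFjpeg_write_raw_data",
    "0000000000038990 t TIFFjpeg_write_raw_data"
)
print(undefined_symbols(tsyms))
```

Applied to the full `tsyms` vector from the `system()` call, this would list every symbol that libtiff expects some other library, such as libjpeg, to provide.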

Vince Carey (06:17:09) (in thread): > reading the 3.20 build log for OSCA.advanced (but not for velociraptor) more closely i see that an error is coming up with .activate_fallback(…) … this could be an important clue.

Kevin Rue-Albrecht (07:09:01) (in thread): > Right now > > > tsyms = system(sprintf("nm %s/velociraptor/1.15.2/env/lib/libtiff.so", getExternalDir()), intern=TRUE) > nm: '/ceph/project/sims-lab/albrecht/R-cache/R/basilisk/1.14.1/velociraptor/1.15.2/env/lib/libtiff.so': No such file > Warning message: > In system(sprintf("nm %s/velociraptor/1.15.2/env/lib/libtiff.so", : > running command 'nm /ceph/project/sims-lab/albrecht/R-cache/R/basilisk/1.14.1/velociraptor/1.15.2/env/lib/libtiff.so' had status 1 > > If I run the command directly on the command line > > $ nm /ceph/project/sims-lab/albrecht/R-cache/R/basilisk/1.14.1/velociraptor/1.15.2/env/lib/libtiff.so > nm: '/ceph/project/sims-lab/albrecht/R-cache/R/basilisk/1.14.1/velociraptor/1.15.2/env/lib/libtiff.so': No such file >

Kevin Rue-Albrecht (07:09:21) (in thread): > > > grep("jpeg_write_raw", tsyms, value=TRUE) > character(0) >

Vince Carey (11:21:54) (in thread): > the basilisk version in use is out of date. i don’t know whether that could block your view of the problem we are seeing. In devel branch I am seeing 1.17.0.

2024-06-15

Ludwig Geistlinger (06:13:34) (in thread): > Note that this issue now also started to appear in release for velociraptor and consequently also for OSCA.advanced.

2024-06-17

Kevin Rue-Albrecht (04:29:52) (in thread): > I’ve lost track of what I’m supposed to test and how

Kevin Rue-Albrecht (04:59:06) (in thread): > FYI, we don’t have R 4.4 on the institute’s Linux cluster. I guess I’ll have to figure out again if and how to use docker images

Kevin Rue-Albrecht (05:15:44) (in thread): > We’re forced to use apptainer to run docker images. > I’ve tried running the bioc devel image > > apptainer run docker://bioconductor/bioconductor_docker:devel > > and after unpacking the layers it crashed with > > INFO: Creating SIF file... > s6-overlay-preinit: fatal: unable to mkdir /var/run/s6: Function not implemented >

Kevin Rue-Albrecht (06:27:28) (in thread): > Forget the last issue above. Solved it with IT. > I’m now working within the container, figuring out whether I can install all the dependencies and replicate the issue.

Kevin Rue-Albrecht (08:31:15) (in thread): > OK. So, I’ve managed to reproduce the error in the bioc-devel docker image > > libiconv-1.17 | 689 KB | ########## | 100% > Preparing transaction: ...working... done > Verifying transaction: ...working... done > Executing transaction: ...working... done > > Quitting from lines 76-82 [unnamed-chunk-5] (velociraptor.Rmd) > Error: processing vignette 'velociraptor.Rmd' failed with diagnostics: > ImportError: /ceph/project/sims-lab/albrecht/R-cache/R/basilisk/1.17.0/velociraptor/1.15.2/env/lib/python3.9/site-packages/PIL/../../../libtiff.so.6: undefined symbol: jpeg12_write_raw_data, version LIBJPEG_8.0 > Run `reticulate::py_last_error()` for details. > --- failed re-building 'velociraptor.Rmd' > > The question is what do I do now:confused:

Kevin Rue-Albrecht (08:43:27) (in thread): > @Vince Carey In the bioc-devel container running on my institute’s Linux cluster, I get the same as you > > > tsyms = system(sprintf("nm %s/velociraptor/1.15.2/env/lib/libtiff.so", getExternalDir()), intern=TRUE) > > grep("jpeg_write_raw", tsyms, value=TRUE) > [1] " U jpeg_write_raw_data@LIBJPEG_8.0" > [2] "0000000000034ce0 t TIFFjpeg_write_raw_data" > [3] "0000000000038990 t TIFFjpeg_write_raw_data" >

Kevin Rue-Albrecht (09:20:13) (in thread): > I don’t think I’ve linked this one yet: https://github.com/python-pillow/Pillow/issues/7269 which points to libjpeg-turbo as I also homed in on in one of my earlier messages https://community-bioc.slack.com/archives/CM2CUGBGB/p1717413536290059?thread_ts=1713725998.833219&cid=CM2CUGBGB Perhaps naive, my next objective is to pin pillow to the last version before they switched to libjpeg-turbo

Vince Carey (15:58:16) (in thread): > hoping that this works!

Kevin Rue-Albrecht (17:05:21) (in thread): > couldn’t find any. :confused:

Kevin Rue-Albrecht (17:07:30) (in thread): > I’ve tried pinning a few versions of pillow (randomly) back to 9.0.5 and actually broke the Conda solver with alleged conda version conflicts which did not look like conflicts (there were several hundred if not thousands of lines of conflict but the ones I inspected did not show any actual conflict)

Kevin Rue-Albrecht (17:07:33) (in thread): > anyway, I’ve opened https://github.com/python-pillow/Pillow/issues/8148 - Attachment: #8148 Pillow installed via Conda on Linux errors with libtiff.so.6: undefined symbol: jpeg12_write_raw_data, version LIBJPEG_8.0 > What did you do? > > I am debugging an issue for a Bioconductor R package that doesn’t build on Linux at the moment:
> https://bioconductor.org/checkResults/devel/bioc-LATEST/velociraptor/nebbiolo2-buildsrc.html > > I have replicated the issue on a HPC at work. However, the HPC does not provide the version of R that I need, I have followed IT guidelines and used Apptainer to work in a container that provides everything I need to replicate the issue. > > The issue arose when I ran R CMD build on the Bioconductor R package that uses a Conda environment to run some Python code internally. > > > apptainer shell \ > --writable-tmpfs \ > -B /project/sims-lab/albrecht/git/kevinrue/velociraptor \ > -B /project/sims-lab/albrecht/gypsum \ > -B /ceph/project/sims-lab/albrecht/R-cache \ > /project/sims-lab/albrecht/mycontainers/velociraptor.sif > > # -B /project/sims-lab/albrecht/gypsum \ needed to download and access test data > # -B /ceph/project/sims-lab/albrecht/R-cache \ needed to set up the conda environment using basilisk > > # needed to access data during R CMD build > export GYPSUM_CACHE_DIR="/project/sims-lab/albrecht/gypsum" > > # R CMD build > R CMD build velociraptor > > > What did you expect to happen? > > The package used to pass R CMD build, e.g. https://bioconductor.org/checkResults/3.18/bioc-LATEST/velociraptor/nebbiolo2-install.html > > What actually happened? > > The R function that creates the Conda environment and runs the Python code crashes with the error > > > ImportError: /ceph/project/sims-lab/albrecht/R-cache/R/basilisk/1.17.0/velociraptor/1.15.2/env/lib/python3.9/site-packages/PIL/../../../libtiff.so.6: undefined symbol: jpeg12_write_raw_data, version LIBJPEG_8.0 > > > Which I have seen in the previous issue #7269 but cannot find a way to fix in my case. > > What are your OS, Python and Pillow versions? 
> > • OS: Linux > • Python: 3.9.12 > • Pillow: 10.3.0 > > > Please paste here the output of running: > > python3 -m PIL.report > or > python3 -m PIL --report > > Or the output of the following Python code: > > from PIL import report > # or > from PIL import features > features.pilinfo(supported_formats=False) > > > > -------------------------------------------------------------------- > Pillow 10.3.0 > Python 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:34:28) > [GCC 10.3.0] > -------------------------------------------------------------------- > Python executable is /ceph/project/sims-lab/albrecht/R-cache/R/basilisk/1.17.0/velociraptor/1.15.2/env/bin/python > System Python files loaded from /ceph/project/sims-lab/albrecht/R-cache/R/basilisk/1.17.0/velociraptor/1.15.2/env > -------------------------------------------------------------------- > Python Pillow modules loaded from /ceph/project/sims-lab/albrecht/R-cache/R/basilisk/1.17.0/velociraptor/1.15.2/env/lib/python3.9/site-packages/PIL > Binary Pillow modules loaded from /ceph/project/sims-lab/albrecht/R-cache/R/basilisk/1.17.0/velociraptor/1.15.2/env/lib/python3.9/site-packages/PIL > -------------------------------------------------------------------- > --- PIL CORE support ok, compiled for 10.3.0 > --- TKINTER support ok, loaded 8.6 > --- FREETYPE2 support ok, loaded 2.12.1 > --- LITTLECMS2 support ok, loaded 2.16 > --- WEBP support ok, loaded 1.4.0 > --- WEBP Transparency support ok > --- WEBPMUX support ok > --- WEBP Animation support ok > --- JPEG support ok, compiled for libjpeg-turbo 3.0.0 > --- OPENJPEG (JPEG2000) support ok, loaded 2.5.2 > --- ZLIB (PNG/ZIP) support ok, loaded 1.2.11 > --- LIBTIFF support ok, loaded 4.6.0 > ***** RAQM (Bidirectional Text) support not installed > ***** LIBIMAGEQUANT (Quantization method) support not installed > --- XCB (X protocol) support ok > -------------------------------------------------------------------- >

Kevin Rue-Albrecht (17:40:48) (in thread): > in the meantime, I can only throw my hands in the air, because I’m out of my depth here, I have no clue if the fix is within my power or waiting for pillow or libjpeg-turbo to fix whatever the issue is

2024-06-18

Kevin Rue-Albrecht (05:46:16) (in thread): > TL;DR: https://github.com/python-pillow/Pillow/issues/8148#issuecomment-2174599174 > * The maintainer of pillow thinks the issue is out of his hands > * He redirected me to those controlling the build process of conda-forge pillow - Attachment: Comment on #8148 Pillow installed via Conda on Linux errors with libtiff.so.6: undefined symbol: jpeg12_write_raw_data, version LIBJPEG_8.0 > In #7269, the undefined symbol was in _imaging.so, so I do think that’s different. > > Allow me to try and understand - you’ve pasted the features/report info, showing the Pillow installation and the versions of its dependencies. So you are able to install Pillow successfully. What you’re talking about is something else importing Pillow, and that triggers the error. > > > that doesn’t build on Linux at the moment > > So it did build on Linux previously at https://bioconductor.org/checkResults/3.18/bioc-LATEST/velociraptor/nebbiolo2-install.html, but now fails at https://bioconductor.org/checkResults/devel/bioc-LATEST/velociraptor/nebbiolo2-buildsrc.html. > > Any idea what triggered this change? > > > to run some Python code internally. > > From a quick browse of https://github.com/kevinrue/velociraptor, is scvelo where this becomes a Python matter? So was this triggered by the recent kevinrue/velociraptor#71? > > Ultimately, I’m not sure that we’ll be able to fix this from our end. It sounds like the error is in how libtiff was built, and we don’t control the build process of conda-forge pillow - that is maintained over at https://github.com/conda-forge/pillow-feedstock. Or maybe it is even a task for https://github.com/conda-forge/libtiff-feedstock, since they will willing to deal with conda-forge/libtiff-feedstock#78

Kevin Rue-Albrecht (05:50:24) (in thread): > I’ve tried to run R CMD build on the latest version of velociraptor for Bioc 3.18 using bioconductor/bioconductor_docker:RELEASE_3_18 and even there Conda fails to resolve the scvelo environment (we didn’t pin everything at the time). > In other words, I don’t think we can even mimic the behaviour of velociraptor in BioC 3.18 anymore. To confirm that, I’ll try running the example in ?scvelo using BioC 3.18, installing all packages from Bioc without building anything locally. I’m afraid that - due to incomplete pinning of conda packages - the package doesn’t work anymore in BioC 3.18 even though it’s reported as passing the build at the time

Vince Carey (07:17:04) (in thread): > We are on the same page. I wonder if there is a way to reconstruct the version collection that worked in the past. Did you ever push a container that had a working environment? You could use python’s session_info to get all the associated versions. Marcel has put effort into collating calendar dates with contemporaneous bioc/CRAN package collections (BiocArchive). I wonder if anyone has tackled the problem for conda?

Kevin Rue-Albrecht (07:55:49) (in thread): > Unfortunately, I don’t have a copy of the full environment or its specs. I’m chasing up people who might.

Kevin Rue-Albrecht (08:56:12) (in thread): > @Charlotte Soneson gave me a snapshot of the working scvelo environment from bioc 3.18. I’m checking now whether I can plug that back in the devel version of velociraptor (if so, I’ll do release next). > Note: > * That will be a step backward in terms of scvelo (0.3.2 -> 0.2.2) > * I hope for it to be a temporary setback while the issue is investigated at https://github.com/conda-forge/libtiff-feedstock/issues/104 - Attachment: #104 libtiff.so.6: undefined symbol: jpeg12_write_raw_data, version LIBJPEG_8.0 > Solution to issue cannot be found in the documentation. > > ☑︎ I checked the documentation. > > Issue > > May I redirect you to python-pillow/Pillow#8148 to give you all the details and some additional context without copy pasting here? > > I am happy to clarify anything you need to investigate further! > > Installed packages > > ``` > # Name Version Build Channel > _libgcc_mutex 0.1 conda_forge conda-forge > _openmp_mutex 4.5 2_gnu conda-forge > absl-py 2.1.0 pyhd8ed1ab_0 conda-forge > anndata 0.10.7 pyhd8ed1ab_0 conda-forge > annotated-types 0.7.0 pyhd8ed1ab_0 conda-forge > anyio 4.3.0 pyhd8ed1ab_0 conda-forge > aom 3.9.0 hac33072_0 conda-forge > array-api-compat 1.6 pyhd8ed1ab_0 conda-forge > arrow 1.3.0 pyhd8ed1ab_0 conda-forge > beautifulsoup4 4.12.3 pyha770c72_0 conda-forge > blessed 1.19.1 pyhe4f9e05_2 conda-forge > blosc 1.21.5 hc2324a3_1 conda-forge > boto3 1.34.111 pyhd8ed1ab_0 conda-forge > botocore 1.34.111 pyge38_1234567_0 conda-forge > brotli 1.1.0 hd590300_1 conda-forge > brotli-bin 1.1.0 hd590300_1 conda-forge > brotli-python 1.1.0 py39h3d6467e_1 conda-forge > bzip2 1.0.8 hd590300_5 conda-forge > c-ares 1.28.1 hd590300_0 conda-forge > ca-certificates 2024.2.2 hbcca054_0 conda-forge > cachecontrol 0.14.0 pyhd8ed1ab_0 conda-forge > cachecontrol-with-filecache 0.14.0 pyhd8ed1ab_0 conda-forge > cached-property 1.5.2 hd8ed1ab_1 conda-forge > cached_property 1.5.2 pyha770c72_1 conda-forge > certifi 2024.2.2
pyhd8ed1ab_0 conda-forge > cffi 1.16.0 py39h7a31438_0 conda-forge > charset-normalizer 3.3.2 pyhd8ed1ab_0 conda-forge > chex 0.1.86 pyhd8ed1ab_0 conda-forge > cleo 2.1.0 pyhd8ed1ab_0 conda-forge > click 8.1.7 unix_pyh707e725_0 conda-forge > colorama 0.4.6 pyhd8ed1ab_0 conda-forge > contextlib2 21.6.0 pyhd8ed1ab_0 conda-forge > contourpy 1.2.1 py39h7633fee_0 conda-forge > crashtest 0.4.1 pyhd8ed1ab_0 conda-forge > croniter 1.3.15 pyhd8ed1ab_0 conda-forge > cryptography 42.0.7 py39h8169da8_0 conda-forge > cycler 0.12.1 pyhd8ed1ab_0 conda-forge > dateutils 0.6.12 py_0 conda-forge > dav1d 1.2.1 hd590300_0 conda-forge > dbus 1.13.6 h5008d03_3 conda-forge > deepdiff 7.0.1 pyhd8ed1ab_0 conda-forge > distlib 0.3.8 pyhd8ed1ab_0 conda-forge > dnspython 2.6.1 pyhd8ed1ab_1 conda-forge > docrep 0.3.2 pyh44b312d_0 conda-forge > dulwich 0.21.7 py39hd1e30aa_0 conda-forge > email-validator 2.1.1 pyhd8ed1ab_0 conda-forge > email_validator 2.1.1 hd8ed1ab_0 conda-forge > et_xmlfile 1.1.0 pyhd8ed1ab_0 conda-forge > etils 1.6.0 pyhd8ed1ab_0 conda-forge > exceptiongroup 1.2.0 pyhd8ed1ab_2 conda-forge > expat 2.6.2 h59595ed_0 conda-forge > fastapi 0.111.0 pyhd8ed1ab_0 conda-forge > fastapi-cli 0.0.4 pyhd8ed1ab_0 conda-forge > filelock 3.14.0 pyhd8ed1ab_0 conda-forge > flax 0.8.3 pyhd8ed1ab_0 conda-forge > fonttools 4.51.0 py39hd1e30aa_0 conda-forge > freetype 2.12.1 h267a509_2 conda-forge > fsspec 2023.12.2 pyhca7485f_0 conda-forge > get-annotations 0.1.2 pyhd8ed1ab_0 conda-forge > gmp 6.3.0 h59595ed_1 conda-forge > gmpy2 2.1.5 py39h048c657_1 conda-forge > h11 0.14.0 pyhd8ed1ab_0 conda-forge > h2 4.1.0 py39hf3d152e_0 conda-forge > h5py 3.11.0 nompi_py39h24b94d4_101 conda-forge > hdf5 1.14.3 nompi_h4f84152_101 conda-forge > hpack 4.0.0 pyh9f0ad1d_0 conda-forge > httpcore 1.0.5 pyhd8ed1ab_0 conda-forge > httpx 0.27.0 pyhd8ed1ab_0 conda-forge > hyperframe 6.0.1 pyhd8ed1ab_0 conda-forge > icu 73.2 h59595ed_0 conda-forge > idna 3.7 pyhd8ed1ab_0 conda-forge > importlib-metadata 7.1.0 
pyha770c72_0 conda-forge > importlib-resources 6.4.0 pyhd8ed1ab_0 conda-forge > importlib_metadata 7.1.0 hd8ed1ab_0 conda-forge > importlib_resources 6.4.0 pyhd8ed1ab_0 conda-forge > inquirer 3.1.4 pyhd8ed1ab_0 conda-forge > itsdangerous 2.2.0 pyhd8ed1ab_0 conda-forge > jaraco.classes 3.4.0 pyhd8ed1ab_1 conda-forge > jax 0.4.27 pyhd8ed1ab_0 conda-forge > jaxlib 0.4.23 cpu_py39hf4a887c_2 conda-forge > jeepney 0.8.0 pyhd8ed1ab_0 conda-forge > jinja2 3.1.4 pyhd8ed1ab_0 conda-forge > jmespath 1.0.1 pyhd8ed1ab_0 conda-forge > joblib 1.4.2 pyhd8ed1ab_0 conda-forge > keyring 24.3.1 py39hf3d152e_0 conda-forge > keyutils 1.6.1 h166bdaf_0 conda-forge > kiwisolver 1.4.5 py39h7633fee_1 conda-forge > krb5 1.21.2 h659d440_0 conda-forge > lcms2 2.16 hb7c19ff_0 conda-forge > ld_impl_linux-64 2.40 h55db66e_0 conda-forge > legacy-api-wrap 1.4 pyhd8ed1ab_1 conda-forge > lerc 4.0.0 h27087fc_0 conda-forge > libabseil 20240116.2 cxx17_h59595ed_0 conda-forge > libaec 1.1.3 h59595ed_0 conda-forge > libavif16 1.0.4 hd2f8ffe_4 conda-forge > libblas 3.9.0 22_linux64_openblas conda-forge > libbrotlicommon 1.1.0 hd590300_1 conda-forge > libbrotlidec 1.1.0 hd590300_1 conda-forge > libbrotlienc 1.1.0 hd590300_1 conda-forge > libcblas 3.9.0 22_linux64_openblas conda-forge > libcurl 8.8.0 hca28451_0 conda-forge > libdeflate 1.20 hd590300_0 conda-forge > libedit 3.1.20191231 he28a2e2_2 conda-forge > libev …

Kevin Rue-Albrecht (09:40:30) (in thread): > looks like a step back will be helping us for the time being https://github.com/kevinrue/velociraptor/actions/runs/9565453250/job/26368421543 Hopefully we’ll be able to leap forward whenever the libtiff issue gets resolved

Vince Carey (09:48:44) (in thread): > :thumbsup:

2024-06-19

Vince Carey (14:08:52) (in thread): > But lots of red for 3.19 chapters at this time… e.g., segfault in advanced chunk 26/62

Peter Hickey (17:55:53) (in thread): > Specifically, OSCA.basic, OSCA.advanced, OSCA.multisample, OSCA.workflows. > > All seem to be because of an issue with chunks involving SingleR and not something i’ve seen before. > SingleR is building fine. > Hoping it was something sporadic and will check again next week, but if someone can reproduce that would help

2024-06-20

Kevin Rue-Albrecht (04:41:46) (in thread): > Woohoo so it’s not just me :stuck_out_tongue_winking_eye:

Vince Carey (08:48:46) (in thread): > Within the 3.19 container (freshly pulled) with OSCA.basic sources for cell-annotation.Rmd, I hit > > processing file: cell-annotation.Rmd > |............ | 26% [unnamed-chunk-7] > Quitting from lines at lines 126-132 [unnamed-chunk-7] (cell-annotation.Rmd) > Error in `sce.muraro$label`: > ! $ operator is invalid for atomic vectors >

Vince Carey (08:49:24) (in thread): > oh but sce.muraro is NOT_AVAILABLE, which means I am building it wrong.

Vince Carey (08:50:56) (in thread): > giving up on reproducing within container, but it should be illuminating. what is a good way to build book chapters short of full compilation?

Peter Hickey (17:26:35) (in thread): > The NOT_AVAILABLE is a caching issue, I think

Peter Hickey (17:26:55) (in thread): > From OSCA/rebook’s own caching mechanism, I think

2024-06-21

Vince Carey (08:36:54) (in thread): > yes, NOT_AVAILABLE means I am just not respecting the order of computation that would occur with a full book build. Maybe if I did that first I could work on individual chapters. but a shortcut would be good to have.

2024-06-24

Peter Hickey (17:39:25): > OSCA building successfully again in BioC 3.19, with the exception of OSCA.intro which looks like a sporadic BiocFileCache issue (https://bioconductor.org/checkResults/3.19/books-LATEST/OSCA.intro/nebbiolo1-buildsrc.html)

Peter Hickey (17:40:37): > OSCA failing in BioC 3.20, but these all seem to point to an issue loading resource AH73905 from AnnotationHub. I tested locally and could not reproduce, so again assuming this is a sporadic caching or hub issue

Lori Shepherd (20:04:01) (in thread): > I can check both of these tomorrow

2024-06-25

Lori Shepherd (07:42:01) (in thread): > So I’m not sure how much I can help with this. If there are duplicate entries then that would be an issue with the code and how files are entered/retrieved? I could reset the cache but if it keeps occurring then there should be more investigation. Can you give me the code that is run to reproduce this and I can try to get more info on the builder? Since the chunks are unnamed I’m having trouble pulling out what code I should run

Lori Shepherd (07:45:24) (in thread): > this should be resolved. let me know if it is not

Kevin Rue-Albrecht (08:27:36) (in thread): > oh crap - it seems that i forgot to push the latest commit upstream, i only pushed it to github (velociraptor)

Peter Hickey (17:44:13) (in thread): > These are the 2 chunks in that file involving BiocFileCache: > 1. https://github.com/OSCA-source/OSCA.intro/blob/9e41ed50128cb4fa072060338946aa123a332b4d/inst/book/getting-datasets.Rmd#L89-L97 > 2. https://github.com/OSCA-source/OSCA.intro/blob/9e41ed50128cb4fa072060338946aa123a332b4d/inst/book/getting-datasets.Rmd#L138-L140 > Both work locally for me (BioC 3.20) and the same chunks are fine on nebbiolo2 when building the book in BioC 3.20 (https://bioconductor.org/checkResults/3.20/books-LATEST/OSCA.intro/)

2024-06-26

Lori Shepherd (09:32:44) (in thread): > I don’t know if the books get a different cache location or are set differently but I just went onto nebbiolo1 and both of these sections work when I run them manually too. It seems like it’s an issue with the 2nd chunk that creates its own unique cache instead of using a system default – if I can figure out where that is on nebbiolo1 I can see if there are actually duplicates or clear it but not sure where it is yet

Lori Shepherd (09:54:27) (in thread): > I can’t find where it’s created, but is there a reason for creating the separate specific cache for raw data that would get created in the working directory? The default cache uses tools::R_user_dir("BiocFileCache", which="cache"). Maybe it would be appropriate, and easier to debug, to use a standard cache location instead of the working directory: BiocFileCache(tools::R_user_dir("raw_data", which="cache"), ask=FALSE)?

Lori Shepherd (09:55:37) (in thread): > It shouldn’t matter and it should work – but since I can’t find the cache location it’s making it harder to track it down because I don’t know what the working directory is at this point of compiling the book

Peter Hickey (19:02:10) (in thread): > @Aaron Lun any recollection of why this was done? As Lori asks above: “is there a reason for creating the separate specific cache for raw data that would get created in the working directory?”

Peter Hickey (19:03:48): > And back to failing BioC 3.19 with same SingleR-related errors as we had a couple of builds back (https://community-bioc.slack.com/archives/CM2CUGBGB/p1718834153985399?thread_ts=1713725998.833219&cid=CM2CUGBGB).

Peter Hickey (19:05:12): > And bizarre failures in BioC 3.20 > * OSCA.basic: could not find function "rowRanges" (https://bioconductor.org/checkResults/3.20/books-LATEST/OSCA.basic/nebbiolo2-buildsrc.html) > * OSCA.advanced: Error in .mapIds() (https://bioconductor.org/checkResults/3.20/books-LATEST/OSCA.advanced/nebbiolo2-buildsrc.html)

Peter Hickey (19:06:27): > Don’t have time to look more closely before the next build, so will wait until then because these again smell like weird sporadic failures that aren’t due to changes in the book itself

Aaron Lun (19:24:43) (in thread): > probably so i could easily reclaim hard drive space by deleting just that single directory, instead of having to periodically prune the global cache.

Lori Shepherd (21:17:16) (in thread): > But could it be specialized in the cache space instead of working directory, so it’s still separate but standard and easier to find/debug?
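
A minimal sketch of what Lori's suggestion might look like: a dedicated raw-data cache that still lives under the standard per-user cache root. The helper names `raw_cache_dir()` and `open_raw_cache()` are hypothetical and not part of OSCA or BiocFileCache; the name "raw_data" is taken from the message above. It assumes R >= 4.0 for `tools::R_user_dir()`.

```r
# Hypothetical helper: where a dedicated raw-data cache would live.
# tools::R_user_dir() only computes the path; it does not create anything.
raw_cache_dir <- function(name = "raw_data") {
    tools::R_user_dir(name, which = "cache")
}

# Hypothetical helper: open (and create if needed) the cache at that path.
# Only runs if BiocFileCache is installed.
open_raw_cache <- function(name = "raw_data") {
    if (!requireNamespace("BiocFileCache", quietly = TRUE)) {
        stop("BiocFileCache is not installed")
    }
    BiocFileCache::BiocFileCache(raw_cache_dir(name), ask = FALSE)
}

cache_dir <- raw_cache_dir()
message("raw-data cache would live at: ", cache_dir)

# Usage (commented out so the sketch has no side effects):
# bfc <- open_raw_cache()
# Reclaiming disk space is still a single-directory delete:
# unlink(cache_dir, recursive = TRUE)
```

This would keep Aaron's "delete one directory" workflow while making the location predictable on the build machines.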

2024-06-27

Vince Carey (06:58:53): > Wow. I feel like the book is a kind of canary in the coal mine. We can’t have such chaotic build results for stable code. Has anyone looked at a github action approach for osca? Alternately one could consider a new edition using BiocBook, which does have a github-action foundation.

Ludwig Geistlinger (08:40:06): > The osca book is a complex construct with many dependencies. It showcases how many different Bioconductor packages can be used to “orchestrate single-cell analysis”. Given the many dependencies and thus many opportunities for breakage, I wouldn’t think that a github action approach or a BiocBook approach would reduce the observed frequency of breakages. What would reduce the frequency of breakage would be to slim down the dependency stack of the OSCA book, but this comes at the cost of being less representative of the many different packages that exist in Bioc for single-cell analysis.

Vince Carey (10:33:02): > Is “could not find function ‘rowRanges’” plausibly related to dependency situation? I guess I would start with a container that can build the book reliably for linux – do we have one?

Vince Carey (10:34:39): > It is just too hard to reproduce errors to diagnose them.

Aaron Lun (11:46:54): > i suspect many of the current problems are caused by the inter-subbook sharing of compiled results, which uses some extreme knitr cache magic. In hindsight this was probably too complicated. > > A hypothetical next edition of the book should probably stop doing that. With the upcoming libscran libraries, it should be cheap to just re-run the desired prior analysis for each chapter.

Ludwig Geistlinger (13:16:11) (in thread): > With dependency situation I was primarily referring to some of the recent problems with eg tradeSeq and velociraptor

Stephanie Hicks (20:32:19): > I would echo the sentiments that the inter-subbook dependencies should be reduced in another iteration, if folks agree and/or have the bandwidth

Peter Hickey (21:54:29): > I have no bandwidth to do more than keep it running as best we can in its current form. > Since no new content is really being added to OSCA or has been for the past couple of years, I actually think that’s an okay state to be in. > > Because the content of the book isn’t changing much, the intermittent (but more frequent than we’d like) build failures don’t really bother me too much - as long as there’s a recent-ish rendering of the book available then the content is still up to date in its rendered form and available on the web. > It gets a bit more stressful when these failures occur near release time - but everything is more stressful around then :upside_down_face: On the other hand, the difficulties in building the book (be it locally or on BioC servers) certainly don’t help with having people contribute to the book. > But it’s not clear to me that that’s the number 1 reason why the content has been fairly stable over the last few years. > Again, this is a situation I don’t mind because I don’t think the book should focus on or endorse the latest whizbang high-profile software/publication but should focus on the concepts, promoting critical thinking about results (which is sorely lacking in much of scRNA-seq analysis material IMO), and well-used software

2024-06-28

Hervé Pagès (21:06:40): > Book all green today in release (https://bioconductor.org/checkResults/3.19/books-LATEST/) and looking much better in devel (https://bioconductor.org/checkResults/3.20/books-LATEST/). Yes, a lot of sporadic, hard to reproduce errors.

2024-06-30

Nicolas Peterson (13:08:49): > @Nicolas Peterson has joined the channel

Peter Hickey (18:22:37) (in thread): > pushed a fix for devel. hopefully all green next build

2024-07-01

Alan O’C (04:57:25): > The book is in a weird spot, as we discussed around the grant application. Would need a lot of work to update or upgrade, which none of us can really spare for what would be fairly thankless work (in terms of career outputs). As it is, full of random build errors. But I agree with Peter: as it stands, random build errors are annoying, but something I can generally deal with. I definitely can’t deal with a rewrite at the moment. > > Is there any way we could reduce the tendency to cache between books without needing to totally rewrite? Or would that just balloon the build times too much without the faster libscran code?

2024-07-22

Peter Hickey (18:03:01): > @Andres Wokaty When I navigate to https://bioconductor.org/books/release/OSCA/book-contents.html and click on any of the subbooks I get sent to the BioC 3.18 versions (e.g., https://bioconductor.org/books/3.18/OSCA.intro/) rather than the current release (e.g. https://bioconductor.org/books/release/OSCA.intro/). Can you please look into this? Thanks

Andres Wokaty (18:04:48) (in thread): > Sure, I’ll look into it tomorrow if I don’t get to it today

Andres Wokaty (18:20:57) (in thread): > Ok, I needed to fix a symlink. Everything appears to be correct now.

Peter Hickey (18:58:56) (in thread): > Thanks for the quick fix!

2024-07-23

Aaron Lun (19:49:30): > FYI planning to update Annoy by several patch numbers (1.17.0 to 1.17.3) to fix a memory leak (https://github.com/spotify/annoy/releases). This should not change the results of any neighbor-based steps, but it might, who knows.

2024-07-26

Chenyue Lu (14:57:13): > @Chenyue Lu has joined the channel

2024-08-09

Alan O’C (06:27:27): > Workflows is broken in devel: https://bioconductor.org/checkResults/devel/books-LATEST/OSCA.workflows/nebbiolo2-buildsrc.html > > # processing file: lun-416b.Rmd > # 1/48 > # 2/48 [unref-setup] > # 3/48 > # 4/48 [loading] > # Error: > # ! failed to load resource > # name: AH73905 > # title: Ensembl 97 EnsDb for Mus musculus > # reason: Required tables gene, tx, tx2exon, exon, chromosome are not present in the database! > # Backtrace: > # 1. scRNAseq::LunSpikeInData(which = "416b") > # 2. scRNAseq:::.define_location_from_ensembl(...) > # 3. scRNAseq:::.pull_down_ensdb(species, ahub.id = ahub.id) > # 5. AnnotationHub()[[ahub.id]] > # 6. AnnotationHub (local) .local(x, i, j = j, ...) > # 7. AnnotationHub:::.Hub_get1(x[idx], force = force, verbose = verbose) > # 8. base::tryCatch(...) > # 9. base (local) tryCatchList(expr, classes, parentenv, handlers) > # 10. base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]]) > # 11. value[[3L]](cond) >

Alan O’C (06:28:01): > Multisample error in release: > > # processing file: using-corrected-values.Rmd > # 1/36 > # 2/36 [setup] > # 3/36 > # 4/36 [unnamed-chunk-1] > # 5/36 > # 6/36 [unnamed-chunk-2] > # 7/36 > # 8/36 [unnamed-chunk-3] > # 9/36 > # 10/36 [unnamed-chunk-4] > # > # ***** caught segfault ***** > # address 0x3ca2, cause 'memory not mapped' > # > # Traceback: > # 1: grouped_medians(curptr, as.integer(flabels) - 1L, nlevels(flabels), nthreads = num.threads) >

Alan O’C (06:28:27): > idk if I can get to these for the next week+ but will try

Aaron Lun (13:25:24): > second issue might be related to https://community-bioc.slack.com/archives/CEQ04GKEC/p1723100240326649?thread_ts=1723001276.043899&cid=CEQ04GKEC - Attachment: Attachment > @Andres Wokaty Looks like nebbiolo1 used the devel versions of some packages e.g. DelayedArray (0.31.11) and SparseArray (1.5.31) for the latest BioC 3.19 data-experiment builds: https://bioconductor.org/checkResults/3.19/data-experiment-LATEST/nebbiolo1-R-instpkgs.html > Not sure how this could happen :thinking_face:

Lori Shepherd (14:23:53): > And I just checked on the Annotation resource manually and it is working, so that will likely clear up on the next build as well…

2024-08-19

Rema Gesaka (09:38:31): > @Rema Gesaka has joined the channel

2024-08-21

Alan O’C (05:31:09): > Workflows still broken in devel, multisample fine in release now

Lori Shepherd (10:17:21): > @Kevin Rue-Albrecht velociraptor is still failing on linux in devel with some conda errors? It’s been failing for one reason or another since June – did something need to be done on our end or are you still working on it? I ask here as OSCA.advanced lists it as a dependency, in case it becomes deprecated or continues to fail into the next release

Kevin Rue-Albrecht (10:21:21) (in thread): > argh - sorry worked on it back in June and had it working on linux on github action (https://github.com/kevinrue/velociraptor/actions/runs/9680893309) but struggled to get all three working as I ran into platform-specific conda dependency issues. > It’s been a busy year for me with eurobioc to organise in Oxford on top of everything else, so I’ve dropped the ball on this one

Kevin Rue-Albrecht (10:30:49) (in thread): > Each OS and architecture needs a different conda environment, which takes me days of trial and error to figure out, let alone update. For instance Windows is stuck with an old version of scvelo because the more recent one depends on the conda package ‘jaxlib’ that is not available for windows. > I’ve documented those things in ?scvelo but would need to take another stab at it

Kevin Rue-Albrecht (10:33:34): > On a separate note, what’s the story about ‘TIMEOUT’ (3.19) and ‘NA’ (3.20) on Windows Server? https://bioconductor.org/checkResults/3.20/bioc-LATEST/velociraptor/

Kevin Rue-Albrecht (11:02:19) (in thread): > Actually, I’m confused why it’s failing on linux on the BBS. It’s working on my linux machine and in the linux github action

Charlotte Soneson (11:07:34) (in thread): > I think this is because basilisk recently changed to using miniforge, which means that it’s now using (in devel only) a much newer conda version, including using mamba to solve environments, and in several cases it just doesn’t manage to solve the same environments as before. We saw the same in sketchR and orthos (and recently pushed updated environments for both). I think using the same approach as you did in June for velociraptor would work to get the new specifications.

Peter Hickey (18:26:14) (in thread): > Looks okay to me? https://bioconductor.org/checkResults/3.20/books-LATEST/OSCA.workflows/

2024-08-22

Alan O’C (05:27:01) (in thread): > Oh nice, might’ve been a cache thing (although p sure I force refreshed)

2024-08-23

Kevin Rue-Albrecht (04:22:12) (in thread): > I’ve pushed upstream/devel an updated version 1.15.5 which contains an environment that I’ve successfully tested on an Ubuntu 24.04 Desktop - GitHub Actions hasn’t been reliable for velociraptor lately

Martin Morgan (07:05:49): > @Martin Morgan has left the channel

2024-08-28

Kevin Rue-Albrecht (05:04:52) (in thread): > @Lori Shepherd BBS now fails with the same error as github > * https://bioconductor.org/checkResults/3.20/bioc-LATEST/velociraptor/nebbiolo2-buildsrc.html > * https://github.com/kevinrue/velociraptor/actions/runs/10521842994/job/29153275263 > I’ve opened github issues where I thought it was relevant but haven’t figured out the source of the issue yet > * https://github.com/kevinrue/velociraptor/issues/78 > * https://github.com/python-pillow/Pillow/issues/8148

Kevin Rue-Albrecht (09:18:45): > Having tried a variety of approaches interactively and through automation, it seems to all come back to @Hervé Pagès’s observation https://community-bioc.slack.com/archives/CM2CUGBGB/p1717516918056459?thread_ts=1713725998.833219&cid=CM2CUGBGB I’ve had success: > * Using the bioc-devel Docker container > * Making a derived container with velociraptor and all its dependencies, installed using BiocManager > * Installing velociraptor from source using remotes::install_local(, build_vignettes=FALSE) > * Running the example code from ?scvelo which then creates the micromamba environment and runs the code successfully > However, R CMD build velociraptor keeps failing with the error > > ImportError: /github/home/.cache/R/basilisk/1.17.2/velociraptor/1.15.5/env/lib/python3.11/site-packages/PIL/../../../libtiff.so.6: undefined symbol: jpeg12_write_raw_data, version LIBJPEG_8.0 > I’m truly lost at this point - Attachment: Attachment > I can reproduce this on bioconductor_docker:devel so it’s not a peculiarity of nebbiolo2. The same happens on bioconductor_docker:devel: the error only happens in the context of R CMD build, but everything works fine interactively or in the context of R CMD check.

2024-08-29

Kevin Rue-Albrecht (02:45:36): > I’m no expert, at that level anyway, but this sounds like a plausible possibility for my situation https://stackoverflow.com/a/33307233 However, if it’s truly a discrepancy in compiler versions, how do I resolve the problem across platforms and machines? - Attachment (Stack Overflow): Installing R package with undefined symbol > I have installed R-3.0.1 with intel13 and am trying to install reshape2 version 1.2.2. I am installing it from source with R CMD INSTALL reshape2 but I get the following error: > > Error in dyn.load(f…

Kevin Rue-Albrecht (05:57:25): > :face_with_open_eyes_and_hand_over_mouth: https://github.com/kevinrue/velociraptor/actions/runs/10612115572/job/29413105915 Seems like adding anaconda to the channels solved it for Linux ?!? > I honestly forgot how I got there, but one thing I noticed was that libjpeg-turbo was stuck at version 3.0.0 on conda-forge while anaconda was at 3.0.3 https://anaconda.org/anaconda/libjpeg-turbo

Mike Smith (07:48:26): > Maybe not very helpful information, but relying on the anaconda channel would prohibit installation of your package here at EMBL. Anaconda Inc have been pursuing the institute for licensing fees, and at the moment any of the Anaconda maintained channels are inaccessible inside our firewall and we’re forbidden from trying to access them. I presume this would break the installation of velociraptor for us here, and maybe other sites are similarly affected.

Kevin Rue-Albrecht (09:01:33) (in thread): > Thanks. It may not help me fix the bug, but it is important feedback nonetheless. > Unfortunately, in the current situation, I am faced with the choice of 1) a package that I cannot get to build on Linux without anaconda, so no one can install it 2) a package that builds with Anaconda so some may be allowed to install it. > I was aware of the new licensing policy of Anaconda, and merely tried this approach as a hail mary.

Kevin Rue-Albrecht (09:05:04) (in thread): > I’m almost annoyed that it works. I would really love to understand the issue and solve it using a 100% conda-forge environment.

Mike Smith (09:08:41) (in thread): > This seems to be an interesting example of the value Anaconda claim they provide i.e. a curated set of packages that work together (sounds very like Bioconductor to me!). Most people here have been of the opinion that “fine, we’ll just use conda-forge & bioconda”, but it seems you’ve struck an instance where the community channels aren’t providing a viable set of packages.

Kevin Rue-Albrecht (12:24:10) (in thread): > Actually also a good use case for https://kevinrue.github.io/BiocChallenges/ - Attachment (kevinrue.github.io): Challenges for the Bioconductor community > This package hosts challenges contributed by and for the Bioconductor community. > It provides functionality to manage, filter, and display challenges as articles of a pkgdown website. > Challenges are bite-sized projects led by volunteers, encouraging collaboration and sharing of expertise between contributors.

Kevin Rue-Albrecht (12:24:34) (in thread): > “Get this working without using anaconda” :grin:

Vince Carey (13:48:23) (in thread): > Does this amount to making a wheel for PIL that works on bioconductor’s build system? Maybe@Jayaram Kancherlacould comment on this?

Jayaram Kancherla (13:48:30): > @Jayaram Kancherla has joined the channel

Vince Carey (13:49:58) (in thread): > I wonder if we will wind up making a “channel” of our own for problematic components.

Vince Carey (13:51:40) (in thread): > And maybe the context can be shared with@Björn Grüning

Björn Grüning (13:51:43): > @Björn Grüning has joined the channel

Kevin Rue-Albrecht (15:52:42) (in thread): > Happy to share whatever information is useful on request. I’ve been dragging this issue for months with so many red herrings now that my storytelling would be a mess. > First, fingers crossed the latest version bump will work on the BBS like it just did on GitHub action. I’m not taking anything for granted anymore :sweat_smile:

Jayaram Kancherla (20:54:01) (in thread): > Sorry joining this conversation half way through, what is PIL? > > we’ve also had to move away from Anaconda due to licensing and what I hear is an absurd cost they quoted for Roche/Genentech. Most of our workflows are now based on miniforge which uses conda-forge as the default channel. For packages that are not available, we have an internal PyPI where we publish them to setup environments or use in various CD processes. > > To Mike’s comment, I am not sure if conda is more like bioconductor, they don’t test integration across packages, it’s more robust in the availability of libraries (python and beyond) that pass their security checks and kind of works right now. conda-forge seems more like ad-hoc.

2024-08-30

Kevin Rue-Albrecht (04:05:43) (in thread): > pillow/PIL is the Python Imaging Library https://anaconda.org/anaconda/pillow It seemed to be the thing triggering the error on the BBS https://bioconductor.org/checkResults/3.20/bioc-LATEST/velociraptor/nebbiolo2-buildsrc.html However, it’s a rather long story and the bug only seems to happen during R CMD build when building the vignette. If the package is built without the vignette, the conda environment is created just fine at runtime.

Kevin Rue-Albrecht (04:07:02) (in thread): > Like I wrote earlier, right now I’m stuck between two bad solutions: > 1. a package that doesn’t build on the BBS/GitHub and therefore that no one can use > 2. a package that builds on GitHub (waiting to see on BBS) using anaconda, and therefore that some people may use

2024-09-07

Aaron Lun (01:14:31): > BIOCNEIGHBORS HAS UPDATED

2024-09-09

Aaron Lun (23:42:36): > I don’t know what’s happening with the two package plan we had before the grant got sunk, but I’ve fulfilled my end: https://github.com/Bioconductor/Contributions/issues/3536 - Attachment: #3536 scrapper > Update the following URL to point to the GitHub repository of
> the package you wish to submit to Bioconductor > > • Repository: https://github.com/libscran/scrapper > > Confirm the following by editing each check box to ‘[x]’ > > • I understand that by submitting my package to Bioconductor,
> the package source and all review commentary are visible to the
> general public. > • I have read the Bioconductor Package Submission
> instructions. My package is consistent with the Bioconductor
> Package Guidelines. > • I understand Bioconductor <https://bioconductor.org/developers/package-submission/#naming|Package Naming Policy> and acknowledge
> Bioconductor may retain use of package name. > • I understand that a minimum requirement for package acceptance
> is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS.
> Passing these checks does not result in automatic acceptance. The
> package will then undergo a formal review and recommendations for
> acceptance regarding other Bioconductor standards will be addressed. > • My package addresses statistical or bioinformatic issues related
> to the analysis and comprehension of high throughput genomic data. > • I am committed to the long-term maintenance of my package. This
> includes monitoring the support site for issues that users may
> have, subscribing to the bioc-devel mailing list to stay aware
> of developments in the Bioconductor community, responding promptly
> to requests for updates from the Core team in response to changes in
> R or underlying software. > • I am familiar with the Bioconductor code of conduct and
> agree to abide by it. > > I am familiar with the essential aspects of Bioconductor software
> management, including: > > • The ‘devel’ branch for new packages and features. > • The stable ‘release’ branch, made available every six
> months, for bug fixes. > • Bioconductor version control using Git
> (optionally via GitHub). > > For questions/help about the submission process, including questions about
> the output of the automatic reports generated by the SPB (Single Package
> Builder), please use the #package-submission channel of our Community Slack.
> Follow the link on the home page of the Bioconductor website to sign up.

2024-09-10

Aaron Lun (01:01:32) (in thread): > getting back to the PIL issue: smells like conda is finding the wrong libtiff, possibly from R itself. Normally the fallback provides some protection against this but either it’s not being triggered properly in a BUILD context or the fallback R itself has the wrong libtiff. Hard to say without a local repro.

Aaron Lun (01:15:58) (in thread): > for example, if I look at my ubuntu’s system libjpeg.so.8, I only see jpeg_write_raw_data (note the lack of 12). The libjpeg in the conda environment should have the +12 version; at least, my fallback environment does. Which suggests that the real problem is that BUILD is not allowing the fallback to run.

Mike Smith (04:11:48) (in thread): > What is the fallback environment? Is that a conda thing, a basilisk thing, or a system thing? I was able to reproduce Kevin’s issue in a Docker container, but couldn’t figure out why there was a difference in run-time linking between the BUILD context and running it as a regular user. I agree it certainly looks like it’s linking to the wrong version of libjpeg, but I couldn’t determine how it was selecting the library to link.

Kevin Rue-Albrecht (04:52:10) (in thread): > @Mike Smith Obviously something better answered by Aaron, but in the meantime, I’ve dug a bit and it seems to be a basilisk.utils thing: https://rdrr.io/github/LTLA/basilisk.utils/man/getFallbackREnv.html

Aaron Lun (11:35:01) (in thread): > the purpose of the fallback is discussed in the “testing package loads” section of ?basiliskStart

Aaron Lun (11:38:04) (in thread): > one could debug() the code to check whether the fallback is being triggered in a non-BUILD context. To diagnose build-only failures is a bit harder, but in the few cases I’ve had to do it, I just pack saveRDS() calls to an absolute path inside the code and then readRDS the outputs to inspect the state at each step.
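
The saveRDS() approach Aaron describes could be sketched roughly as below, using base R only. `checkpoint()`, `run_steps()` and `dump_dir` are hypothetical names for illustration and do not come from basilisk or velociraptor; the two arithmetic steps stand in for whatever code only fails during R CMD build. The sketch writes under `tempdir()` so it is harmless to run, but for a real build-only failure a fixed absolute path would be needed, since the build's temporary directory is gone by the time you want to look.

```r
# Directory for the dumps. Replace with a fixed absolute path when
# debugging a real build, because tempdir() does not outlive the session.
dump_dir <- file.path(tempdir(), "build-debug")
dir.create(dump_dir, showWarnings = FALSE, recursive = TRUE)

# Hypothetical helper: save a value plus some context, then pass it through.
checkpoint <- function(value, label) {
    saveRDS(
        list(
            value = value,
            wd = getwd(),
            ld_library_path = Sys.getenv("LD_LIBRARY_PATH")
        ),
        file.path(dump_dir, paste0(label, ".rds"))
    )
    invisible(value)
}

# Stand-in for the code under investigation, with a dump after each step.
run_steps <- function(x) {
    y <- checkpoint(x * 2, "step1")
    z <- checkpoint(y + 1, "step2")
    z
}

result <- run_steps(1:3)

# Later, from an interactive session, inspect the state at each step.
step1 <- readRDS(file.path(dump_dir, "step1.rds"))
step2 <- readRDS(file.path(dump_dir, "step2.rds"))
str(step1)
```

Recording the working directory and LD_LIBRARY_PATH alongside each value is just one guess at useful context for a linking problem like the libtiff one; any other state can be added to the list in the same way.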

Ludwig Geistlinger (14:10:33): - Attachment: Attachment > Hi <!channel>: > > As discussed in yesterday’s education meeting, we would like to open up our OSCA-based scRNA-seq module for feedback from the community and review by Bioconductor’s education committee (GitHub repo / rendered course page). Our goal is to collect and implement feedback over the course of the next month prior to proceeding with contributing the module to the Carpentries incubator and the collection of Bioconductor teaching modules. > > A bit of context: > > This workshop is based on the OSCA tutorial that Davide Risso, Dario Righelli, Marcel Ramos, and myself gave at the ISMB conference last year. The tutorial is a light version of the OSCA book that concentrates on essential aspects for getting started with the book (“The OSCA book in a day”). The tutorial is in large parts a faithful copy of the OSCA book, but also adds contents that are not (yet) covered in the OSCA book such as interoperability with other popular single-cell analysis ecosystems and accessing data from the Human Cell Atlas. > > How to provide feedback: > > We welcome feedback provided through GitHub issues. Contributions via pull requests should be discussed via GitHub issues first. > > Next steps: > > Feedback provided by Oct 07, 2024, will be considered and incorporated where appropriate during a dedicated sprint session. If you would like to participate in the sprint, please provide your availability here.

2024-09-13

Mike Smith (05:59:00) (in thread): > A bit more digging suggests this is not actually due to the BUILD per se but rather when it’s being knitted by rmarkdown. Running rmarkdown::render("velociraptor/vignettes/velociraptor.Rmd") will also produce the error, with a few more details than you get from BUILD e.g. > > 10/23 [unnamed-chunk-5] > Error in `.activate_fallback()`: > ! ImportError: /root/.cache/R/basilisk/1.17.2/velociraptor/1.15.5/env/lib/python3.11/site-packages/PIL/../../../libtiff.so.6: undefined symbol: jpeg12_write_raw_data, version LIBJPEG_8.0 > Run `reticulate::py_last_error()` for details. > Backtrace: > 1. velociraptor::scvelo(...) > 2. velociraptor::scvelo(...) > 3. velociraptor (local) .local(x, ...) > 7. velociraptor (local) .nextMethod(x, ..., sf.X = sf.X, dimred = dimred) > 8. velociraptor (local) .local(x, ...) > 9. velociraptor:::.scvelo(...) > 10. basilisk::basiliskRun(...) > 11. basilisk::basiliskStart(...) > 12. basilisk:::.activate_fallback(...) >

2024-09-20

Camille Guillermin (09:30:28): > @Camille Guillermin has joined the channel

2024-09-24

Zhu Yujia (11:25:51): > @Zhu Yujia has joined the channel

2024-09-26

Kevin Rue-Albrecht (08:10:50) (in thread): > https://github.com/kevinrue/velociraptor/pull/91 without relying on anaconda - Attachment: #91 Update Linux environment to avoid anaconda - File (PNG): image.png

Kevin Rue-Albrecht (08:11:29) (in thread): > just installing a very specific set of packages, as per https://github.com/conda-forge/libtiff-feedstock/issues/104#issuecomment-2375893029 - Attachment: Comment on #104 libtiff.so.6: undefined symbol: jpeg12_write_raw_data, version LIBJPEG_8.0 > I have been having the same issue. > > > ! ImportError: /opt/conda/envs/scil3extfig/lib/python3.10/site-packages/PIL/../../../libtiff.so.6: undefined symbol: jpeg12_write_raw_data, version LIBJPEG_8.0 > > > > My conda env with the error: > > > python=3.10.14 libtiff=4.6.0 pillow=10.0.1 matplotlib=3.8.4 numpy=1.26.4 cooltools=0.5.2 pairtools=1.0.3 cooler=0.9.3 bioframe=0.6.4 pygenometracks=3.3 tqdm seaborn=0.13.2 pandas=1.5.3 > > > > Downgrading to libtiff=4.5.1 and pillow=10.0.0 fixed the issue. - File (PNG): image.png

2024-11-07

Malvika Kharbanda (22:11:00): > @Malvika Kharbanda has joined the channel

2024-11-20

Alan O’C (10:53:56): > Got an error in multisample that I’ll be looking into in the coming days. Any off-the-cuff guesses for why MNN and/or clustering would have changed here? https://bioconductor.org/books/3.20/OSCA.multisample/merged-pancreas.html - Attachment (bioconductor.org): Chapter 8 Human pancreas (multiple technologies) | Multi-Sample Single-Cell Analyses with Bioconductor > Chapter 8 Human pancreas (multiple technologies) | Multi-Sample Single-Cell Analyses with Bioconductor

Aaron Lun (12:57:43): > don’t think it was me

Peter Hickey (16:54:32): > not specifically, but i just fixed something in OSCA.basic (annotation chapter) in BioC 3.20 and found while doing it that there were some small changes to numbers of cells/cluster and I think some small changes either with the cluster numbers or cell type numbers for some of the pancreas datasets.

Peter Hickey (16:54:39): > sorry that’s not more specific

Hervé Pagès (21:04:06): > Another change in igraph, maybe? A new version made it to CRAN about 1 month ago. Already caused some problems before: https://github.com/OSCA-source/OSCA.multisample/issues/8

2024-11-26

Alan O’C (04:39:29): > Yeah I couldn’t spot anything in the immediate Bioc deps, igraph would make sense. Going to be a bit of a pain to debug, hopefully can make some time this week

2024-12-10

Aaron Lun (17:54:08): > note: minor change to the default alpha= of emptyDrops, rationale described at https://github.com/MarioniLab/DropletUtils/pull/118 - Attachment: #118 Set alpha=Inf as the default for testEmptyDrops. > This is because a finite alpha is not universally safer; despite representing an overdispersed multinomial, using it to compute p-values for a multinomial-distributed vector will actually be anticonservative! > > This means that an inaccurate alpha cannot be brushed under the carpet. Previously, I was trusting that any finite alpha would be safer than alpha=Inf, but it seems that this is not true. We can’t just ignore poor estimates of alpha that we get from the assumed-ambient counts. > > So, we just keep it simple and default alpha=Inf, which is exactly correct for the multinomial case (which should hopefully handle most use cases). This also reduces runtime and aligns with the CellRanger defaults.

Peter Hickey (18:24:25) (in thread): > Thanks. Will take a look once the latest version of DropletUtils propagates and we get builds of OSCA

Aaron Lun (18:52:51) (in thread): > the old default has been there for 7 years, truly an epochal event

2024-12-16

Peter Hickey (02:39:13) (in thread): > Looking into affected files > > % grep -l emptyDrops OSCA*/inst/book/*Rmd > OSCA.advanced/inst/book/doublet-detection.Rmd > OSCA.advanced/inst/book/droplet-processing.Rmd > OSCA.advanced/inst/book/nuclei-analysis.Rmd # False positive (results identical in BioC 3.20 and 3.21) > OSCA.multisample/inst/book/ambient-problems.Rmd # False positive (function not actually called) > OSCA.workflows/inst/book/tenx-unfiltered-pbmc4k.Rmd >

Peter Hickey (02:39:56) (in thread): > https://bioconductor.org/books/3.20/OSCA.advanced/doublet-detection.html#identifying-inter-sample-doublets > > # 3.20 > length(is.cell) > ## [1] 21780 > > # 3.21 > > length(is.cell) > [1] 21526 >

Peter Hickey (02:41:11) (in thread): > https://bioconductor.org/books/3.20/OSCA.advanced/droplet-processing.html#testing-for-empty-droplets > > # 3.20 > summary(e.out$FDR <= 0.001) > ## Mode FALSE TRUE NA's > ## logical 989 4300 731991 > > # 3.21 > > summary(e.out$FDR <= 0.001) > Mode FALSE TRUE NA's > logical 1179 4110 731991 >

Peter Hickey (02:41:39) (in thread): > https://bioconductor.org/books/3.20/OSCA.advanced/droplet-processing.html#cell-calling-options > > # 3.20 > summary(is.cell) > ## Mode FALSE TRUE NA's > ## logical 1384 7934 15770 > > # 3.21 > > summary(is.cell) > Mode FALSE TRUE NA's > logical 1389 7929 15770 >

Peter Hickey (02:42:55) (in thread): > https://bioconductor.org/books/3.20/OSCA.workflows/unfiltered-human-pbmcs-10x-genomics.html#quality-control-2 > > # 3.20 > set.seed(100) > e.out <- emptyDrops(counts(sce.pbmc)) > sce.pbmc <- sce.pbmc[,which(e.out$FDR <= 0.001)] > unfiltered <- sce.pbmc > stats <- perCellQCMetrics(sce.pbmc, subsets=list(Mito=which(location=="MT"))) > high.mito <- isOutlier(stats$subsets_Mito_percent, type="higher") > sce.pbmc <- sce.pbmc[,!high.mito] > summary(high.mito) > ## Mode FALSE TRUE > ## logical 3985 315 > > # 3.21 > > set.seed(100) > > e.out <- emptyDrops(counts(sce.pbmc)) > > sce.pbmc <- sce.pbmc[,which(e.out$FDR <= 0.001)] > > unfiltered <- sce.pbmc > > stats <- perCellQCMetrics(sce.pbmc, subsets=list(Mito=which(location=="MT"))) > > high.mito <- isOutlier(stats$subsets_Mito_percent, type="higher") > > sce.pbmc <- sce.pbmc[,!high.mito] > > summary(high.mito) > Mode FALSE TRUE > logical 3805 305 >
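To put the differences reported above side by side, here is a minimal sketch in base R that tabulates the before/after counts and their relative change. The numbers are copied from the messages above; the data frame layout and the column names (`bioc_3_20`, `bioc_3_21`, `diff`, `pct`) are only illustrative assumptions.

```r
# Counts copied from the thread above (BioC 3.20 vs 3.21 builds).
# Column and row labels are illustrative, not taken from the books.
counts <- data.frame(
    metric = c("doublet-detection: length(is.cell)",
               "droplet-processing: FDR <= 0.001 is TRUE",
               "droplet-processing: is.cell is TRUE",
               "pbmc4k workflow: cells kept after mito filter"),
    bioc_3_20 = c(21780, 4300, 7934, 3985),
    bioc_3_21 = c(21526, 4110, 7929, 3805),
    stringsAsFactors = FALSE
)

# Absolute and relative change when moving to the new alpha=Inf default.
counts$diff <- counts$bioc_3_21 - counts$bioc_3_20
counts$pct <- round(100 * counts$diff / counts$bioc_3_20, 2)
print(counts)

# Sanity check: the cells entering the pbmc4k mito filter (kept + removed)
# equal the number of droplets passing FDR <= 0.001 in each release.
entering_3_20 <- 3985 + 315
entering_3_21 <- 3805 + 305
```

The last two values match the 4300 and 4110 droplets reported for the droplet-processing chapter, which suggests the workflow change is driven entirely by the `emptyDrops()` step rather than by the downstream QC.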

Peter Hickey (02:47:34) (in thread): > Some of these have knock-on consequences because datasets are used in other chapters/books

Peter Hickey (02:48:50) (in thread): > Do we adjust book content (text, sanity checks, etc.) for new default or tweak emptyDrops() parameters in book so that content remains unchanged?

2024-12-17

Alan O’C (10:46:12) (in thread): > I didn’t manage to get anywhere with multisample rolling back igraph a few versions. Results look qualitatively similar, the main difference seems to be we get 2 more clusters that are subsets of acinar and alpha cells, this crosstab vs the one at the bottom here: https://bioconductor.org/books/3.19/OSCA.multisample/merged-pancreas.html > > > table(proposed, clusters) > clusters > proposed 1 2 3 4 5 6 7 8 9 10 > acinar 6 352 0 0 1 1 0 65 0 0 > alpha 6 1 10 0 1 1440 2 0 2 425 > beta 5 3 920 2 0 3 7 0 0 0 > co-expression 0 0 26 0 0 11 0 0 1 1 > delta 2 0 3 0 1 1 0 0 309 0 > ductal 624 5 4 5 0 1 1 0 0 0 > endothelial 0 0 0 1 33 0 0 0 0 0 > epsilon 0 0 0 0 0 1 7 0 0 0 > gamma 0 0 0 0 0 0 280 2 0 0 > mesenchymal 1 0 0 79 0 0 0 0 0 0 > mhc class ii 4 0 0 0 0 0 0 0 0 0 > nana 8 2 1 0 1 9 0 0 1 4 > none/other 4 0 1 0 4 2 1 0 0 0 > stellate 1 0 0 70 1 0 0 0 0 0 > unclassified 0 0 0 2 0 0 0 0 0 0 > unclassified endocrine 0 0 2 0 0 3 0 0 1 0 > unclear 4 0 0 0 0 0 0 0 0 0 > > Will probably just lower the accepted rand threshold unless there’s any suggestions for other culprits or objections

Peter Hickey (17:04:06): > i’ve certainly just tweaked cutoffs when results didn’t seem qualitatively different

2024-12-18

Aaron Lun (01:12:14) (in thread): > up to you. that is the fundamental question at the heart of the Biocbook problem. i guess you could put something like alpha=NULL # (back-compatibility only) to indicate to users that they needn’t add this.

Alan O’C (06:15:01): > Yeah, bit nervous when I don’t know why but ultimately it’s not used anywhere else at the moment

Michael Totty (15:04:15): > @Michael Totty has joined the channel

2025-01-09

Ammar Sabir Cheema (11:40:02): > @Ammar Sabir Cheema has joined the channel

Peter Hickey (19:31:34): > Slowly making my way through the updates identified in the thread above about the change to the default alpha of emptyDrops() (https://community-bioc.slack.com/archives/CM2CUGBGB/p1733871248326239) > * [x] OSCA.workflows (https://github.com/OSCA-source/OSCA.workflows/commit/6ab1fc45a33848136d6ac0282c42b0ece5e2476a) > * [ ] OSCA.advanced WIP currently tracking in branch https://github.com/OSCA-source/OSCA.advanced/tree/emptyDrops-changes - Attachment: Attachment > note: minor change to the default alpha= of emptyDrops, rationale described at https://github.com/MarioniLab/DropletUtils/pull/118

2025-01-15

Alan O’C (09:09:39): > Ah was just about to ask if the workflows/advanced errors were similar to the ones I saw in multisample and if they were on anybody’s radar, thanks

2025-01-20

Peter Hickey (16:41:31): > I’ll try to return to OSCA.advanced this week

2025-02-20

António Domingues (14:38:40): > @António Domingues has joined the channel

2025-03-05

Peter Hickey (22:18:43): > With BioC 3.21 scheduled for release in just over a month, we need to get OSCA building in devel. > Some of the sub-books have been failing for a long time: > * @Ludwig Geistlinger https://bioconductor.org/checkResults/3.21/books-LATEST/OSCA.advanced/nebbiolo1-buildsrc.html > * @Alan O’C https://bioconductor.org/checkResults/3.21/books-LATEST/OSCA.multisample/nebbiolo1-buildsrc.html > I’m happy to help, but I’d first like to know where you are with these issues

2025-03-06

Ludwig Geistlinger (14:57:49): > Thanks for the nudge Pete. I’ll take a look and report back.

2025-03-07

Alan O’C (07:22:28): > Oh thanks, I didn’t spot any notification. I’ll have a look today

2025-03-10

Alan O’C (08:43:21): > Multisample relates to this, I think, which is apparently temporary. @Hervé Pagès am I safe to ignore and wait for a DelayedArray update or should I look to resolve in the interim? https://github.com/Bioconductor/DelayedArray/commit/e1703455e5feead4fba6ee65922dc531d4e10e63

Ludwig Geistlinger (18:08:15) (in thread): > Update: I am trying to reproduce the error locally. Some difficulties arise from this command in more-clustering.Rmd: > > > extractFromPackage("tenx-unfiltered-pbmc4k.Rmd", package="OSCA.workflows", > + chunk="dimensionality-reduction", objects="sce.pbmc") >

Ludwig Geistlinger (18:09:06) (in thread): > which gives: > > > sce.pbmc > [1] "NOT AVAILABLE" >

Ludwig Geistlinger (18:13:12) (in thread): > However, when executing the code that the above extractFromPackage command produces in an interactive session, the sce.pbmc is returned as a SingleCellExperiment that can be plugged into subsequent commands without error

Ludwig Geistlinger (18:15:22) (in thread): > Am I missing something?

Ludwig Geistlinger (18:18:44) (in thread): > Btw extracting from other vignettes/rmds from OSCA.workflows works just fine, for example > > extractFromPackage("lun-416b.Rmd", package="OSCA.workflows", > chunk="loading", objects="sce.416b") > > produces the SingleCellExperiment object for sce.416b

Ludwig Geistlinger (18:19:08) (in thread): > also @Alan O’C

Peter Hickey (19:33:04) (in thread): > i think it happens when the cache is out of date or improperly filled. > So clearing your local cache might help (along with old builds of the book)

Ludwig Geistlinger (20:17:12) (in thread): > Thanks Pete. Is clearing the cache here just deleting the contents of the book cache directory? > > > bookCache('OSCA.workflows') > [1] "~/Library/Caches/rebook/OSCA.workflows/1.15.3" >

Ludwig Geistlinger (21:02:52) (in thread): > in related news: I adapted the failsafe in more-clustering.Rmd that was causing the error in OSCA.advanced and pushed the fix. As we observed before there seem to be some fluctuations in the clustering outputs across the book introduced by an obscure source, probably somewhere in the igraph stack.

Peter Hickey (22:15:29) (in thread): > I think that’s the location. there might also be some files in your working directory (or wherever you are building the book locally)

2025-03-11

Alan O’C (05:56:02) (in thread): > Don’t think this affects the multisample stuff, at least not yet. > > Yeah, that’s where the cache is at. If you ran rmarkdown::render on any individual files then there’ll be local cache for those as well

2025-03-17

Hervé Pagès (13:59:14) (in thread): > I was travelling last week and not paying attention sorry. I don’t have much context. What error are you seeing and where?

2025-03-22

Ludwig Geistlinger (14:49:32) (in thread): > Hi @Peter Hickey what is the status of the emptyDrops-changes branch, can this be merged into devel?

Ludwig Geistlinger (15:20:00): > There is currently a caching issue in OSCA.intro that I can’t reproduce locally, @Lori Shepherd any chance we could clear the cache on nebbiolo1 and see whether this resolves the issue?

Lori Shepherd (16:22:17): > Interesting… Yes I can look into it on Monday.

Ludwig Geistlinger (16:53:45) (in thread): > Thanks!

2025-03-24

Lori Shepherd (07:46:23) (in thread): > @Ludwig Geistlinger do you know where the cache is that the osca book uses? when I try the code in getting-datasets.Rmd it doesn’t look like it’s using the same location for a default cache as I cannot find an entry associated with it

Lori Shepherd (07:46:49) (in thread): > do you know if it specifies a unique location for the cache?

Ludwig Geistlinger (09:14:26) (in thread): > The code here suggests that this would just be the default location

Lori Shepherd (09:27:26) (in thread): > > > library(BiocFileCache) > > bfc <- BiocFileCache(ask=FALSE) > > url <- file.path("ftp://ftp.ncbi.nlm.nih.gov/geo/series", > "GSE85nnn/GSE85241/suppl", > "GSE85241%5Fcellsystems%5Fdataset%5F4donors%5Fupdated%2Ecsv%2Egz") > > bfc > class: BiocFileCache > bfccache: /home/biocbuild/.cache/R/BiocFileCache > bfccount: 208 > For more information see: bfcinfo() or bfcquery() > > bfcquery(bfc, url) > # A tibble: 0 × 13 > # ℹ 13 variables: rid <chr>, rname <chr>, create_time <dbl>, access_time <dbl>, > # rpath <chr>, rtype <chr>, fpath <chr>, last_modified_time <dbl>, > # etag <chr>, expires <dbl>, mtbls_id <chr>, mtbls_assay_name <chr>, > # derived_spectral_data_file <chr> > > So it’s not being found in the cache at all – and not ideal to reset since this is the default cache and has 208 other entries for other packages

Lori Shepherd (09:29:56) (in thread): > I could run the code manually – but if there is some other issue going on it could potentially mask them. Which is why I was curious if we knew for sure it is the default cache being used and there isn’t a different location being set

Vince Carey (10:29:41) (in thread): > see https://github.com/OSCA-source/OSCA.intro/issues/6 .. not a solution but a request for more defensive programming in the book code. - Attachment: #6 “the caching issue on BBS” > This chunk seems to be causing problems on BBS at this time. > > > library(BiocFileCache) > bfc <- BiocFileCache(ask=FALSE) > url <- file.path("ftp://ftp.ncbi.nlm.nih.gov/geo/series", > "GSE85nnn/GSE85241/suppl", > "GSE85241%5Fcellsystems%5Fdataset%5F4donors%5Fupdated%2Ecsv%2Egz") > > # Making a symbolic link so that the later code can pretend > # that we downloaded the file into the local directory. > muraro.fname <- bfcrpath(bfc, url) > local.name <- URLdecode(basename(url)) > unlink(local.name) > if (.Platform$OS.type=="windows") { > file.copy(muraro.fname, local.name) > } else { > file.symlink(muraro.fname, local.name) > } > > > > IMHO this code could be improved with some defensiveness. I don’t know why we are seeing > > > # Error in bfcrpath(bfc, url) : not all 'rnames' found or unique. > # Calls: <Anonymous> ... withVisible -> eval -> eval -> bfcrpath -> bfcrpath > # > # Quitting from getting-datasets.Rmd:88-105 [unnamed-chunk-2] > # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > # <error/rlang_error> > # Error in `bfcrpath()`: > # ! not all 'rnames' found or unique. > # --- > # Backtrace: > # ▆ > # 1. ├─BiocFileCache::bfcrpath(bfc, url) > # 2. └─BiocFileCache::bfcrpath(bfc, url) > # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > # > # Execution halted > > > > as a first-in-line response to this. A softer landing could implement a policy for non-unique rnames (use the most recent one, for example), and bfcrpath could be more explicit: is the rname not found, or is it non-unique? 
Well, that’s an issue for BiocFileCache, stay tuned. > > Nevertheless, this bit of book code should a) use named chunks, b) use bfcinfo to figure out what is available, and c) retrieve from web or cache as appropriate. Can we try that instead of mucking with the BBS cache?

Lori Shepherd (10:34:12) (in thread): > also seems to have recurred in the past, see previous https://community-bioc.slack.com/archives/CM2CUGBGB/p1719265165435919 too

Peter Hickey (16:39:49) (in thread): > no don’t merge it. I’ll try to finish it this week with a proper PR

Ludwig Geistlinger (16:47:33) (in thread): > thanks, this will help resolving current breakage in OSCA.advanced due to updates of the alpha param

2025-03-25

Vince Carey (09:31:44) (in thread): > And it looks like the problem “went away”. It can be reproduced if desired and the defensive pattern demonstrated if there is interest. Hints: use bfcadd to explicitly produce a duplicate resource (should this be defended against?), use bfcquery to check presence, when multiple replies are present, pick one, warn, hope for the best.

Vince Carey (09:34:30) (in thread): > Lesson to me: “BiocFileCache methods should not fail unless there is no reasonable course of action. If a query is ambiguous, the package should make a guess about what to do, offer an option in the call sequence to define the guessing or fail policy, and obey the policy set by the caller.”

Vince Carey (09:35:36) (in thread): > We don’t want “refresh cache” to constitute a well-accepted course of action or suggestion to problems with cached resources. We can plop this discussion into developer-forum….

Lori Shepherd (09:36:49) (in thread): > I don’t necessarily agree with that – if caches are supposed to be reproducible “taking a guess” is not a good solution – but maybe printing out more information to do a better query would be

Lori Shepherd (09:39:00) (in thread): > it could also be guarded against by not using the helper function that adds/extracts the path in one call – Martin pushed for this functionality, which I always thought might cause problems – in better practice, querying to see if it exists, adding it to the cache if not, and grabbing that path if so is the more extensive programming practice that would also protect against this and allow the user to determine the action rather than the cache taking guesses

Vince Carey (09:39:03) (in thread): > Caches and reproducibility are a perennial problem. A guess like “use the most recent of multiple entries” is IMHO a reasonable course of action; the user should be informed that this was done. An alternate policy is “use the first of multiple entries”, and another policy is “fail on multiple”.
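The three policies mentioned above ("most recent", "first", "fail on multiple") can be sketched in base R without touching a real cache. This is only an illustration under stated assumptions: `resolve_hits()` is a hypothetical helper, not part of BiocFileCache, and `hits` is assumed to be a data frame with `rid`, `rpath` and `create_time` columns, loosely mirroring the columns printed by `bfcquery()` earlier in the thread.

```r
# Hypothetical helper (not a BiocFileCache function): decide which cached
# path to use given the rows returned by a cache query.
resolve_hits <- function(hits, policy = c("most_recent", "first", "fail")) {
    policy <- match.arg(policy)
    n <- nrow(hits)
    if (n == 0L) {
        # Not found: the caller should download and add the resource.
        return(NULL)
    }
    if (n == 1L) {
        return(hits$rpath[1])
    }
    if (policy == "fail") {
        stop("resource name is not unique: ", n, " cache entries match")
    }
    # Ambiguous query: pick one entry and tell the user which one.
    pick <- if (policy == "first") 1L else which.max(hits$create_time)
    warning("resource name is not unique; using entry ", hits$rid[pick],
            " (policy: ", policy, ")")
    hits$rpath[pick]
}

# Toy query result with a duplicated resource (made-up ids and paths).
hits <- data.frame(
    rid = c("BFC1", "BFC2"),
    rpath = c("/cache/old.csv.gz", "/cache/new.csv.gz"),
    create_time = c(100, 200),
    stringsAsFactors = FALSE
)
chosen <- suppressWarnings(resolve_hits(hits, "most_recent"))
```

Returning `NULL` for "not found" keeps that case distinct from "non-unique", which is the distinction the current `bfcrpath()` error message does not make. Whether guessing or failing is the better default is exactly what is being debated in this thread, so the policy is left as an explicit argument.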

Vince Carey (09:40:04) (in thread): > How to use the package is certainly open for discussion. But “don’t let deep infrastructure fail if there are reasonable bailout actions” is a principle I think we should consider.

Vince Carey (09:40:56) (in thread): > How did the problem “go away” BTW? Do we know?

Ludwig Geistlinger (09:42:29) (in thread): > I suspect there was a temporary downtime at NCBI/GEO that caused this

Lori Shepherd (09:43:04) (in thread): > seems consistent with what we have been seeing on those websites and would then account for it missing on the earlier run but being added correctly now

Ludwig Geistlinger (09:43:21) (in thread): > Do we actually see an entry in the default cache location now that things ran through?

Lori Shepherd (09:43:48) (in thread): > I agree that perhaps the condensed message could be improved and checking for each scenario (duplicate vs not found) would help with better debugging and a more informative message to the end user

Lori Shepherd (09:44:57) (in thread): > yes we do see it in the default cache location

Ludwig Geistlinger (09:45:46) (in thread): > ok so at least we resolved this piece of the puzzle

2025-03-28

Alan O’C (05:47:10) (in thread): > The error here traces back to binary operations between delayed and regular arrays https://bioconductor.org/checkResults/devel/books-LATEST/OSCA.multisample/nebbiolo1-buildsrc.html

Hervé Pagès (13:07:04) (in thread): > > library(Matrix) > library(ResidualMatrix) > design <- model.matrix(~gl(5, 50)) > y0 <- rsparsematrix(nrow(design), 200, 0.1) > y <- ResidualMatrix(y0, design) > extract_array(y, list(1:2, 1:3)) # works > > But it no longer seems to work when in the context of batchelor::regressBatches(): > > library(batchelor) > regressBatches(y, y) > # Error in validObject(.Object) : invalid class "ResidualMatrix" object: > # the supplied seed must support extract_array() > > Feels like an import problem somewhere. I’ll try to dig into this later today…

2025-03-31

Hervé Pagès (16:20:01) (in thread): > This line (from the extract_array() method for ResidualMatrixSeed objects defined in the ResidualMatrix package): > > resid <- get_matrix2(x2) - get_Q(x2) %*% get_Qty(x2) > > assumes that - is defined between matrix-like object get_matrix2(x2) and matrix object get_Q(x2) %*% get_Qty(x2). However there’s no such guarantee in general, even if the two operands are guaranteed to be conformable. This is why the following fails (y being the ResidualMatrix object obtained in my above post): > > rms <- ResidualMatrixSeed(y) > extract_array(rms, list(1:2, 1:3)) > # Error in get_matrix2(x2) - get_Q(x2) %*% get_Qty(x2) : > '-' between a DelayedArray object and an array is not supported yet > > The following change fixes the problem: > > hpages@XPS15:~/ResidualMatrix$ git diff > diff --git a/R/seed.R b/R/seed.R > index 3b5e4ce..356f7be 100644 > --- a/R/seed.R > +++ b/R/seed.R > @@ -185,11 +185,11 @@ setMethod("extract_array", "ResidualMatrixSeed", function(x, index) { > index <- rev(index) > } > x2 <- subset_ResidualMatrixSeed(x, index[[1]], index[[2]]) > - resid <- get_matrix2(x2) - get_Q(x2) %*% get_Qty(x2) > + resid <- as.matrix(get_matrix2(x2)) - get_Q(x2) %*% get_Qty(x2) > if (was_transposed) { > resid <- t(resid) > } > - as.matrix(resid) > + resid > }) > > Do you want to report the issue on the ResidualMatrix repo on GitHub @Alan O’C?
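The reasoning behind the patch above can be illustrated without DelayedArray or ResidualMatrix. The sketch below uses a hypothetical S4 class, `LazyMatrix`, as a stand-in for any matrix-like object that supports `as.matrix()` but defines no arithmetic of its own; it is an assumption made for illustration and not how either package is implemented.

```r
library(methods)

# Hypothetical stand-in for a matrix-like class: it can be coerced with
# as.matrix() but has no arithmetic methods.
setClass("LazyMatrix", representation(data = "matrix"))
setMethod("as.matrix", "LazyMatrix", function(x, ...) x@data)

lazy <- new("LazyMatrix", data = matrix(1:6, nrow = 2))
dense <- matrix(1, nrow = 2, ncol = 3)

# Direct subtraction fails because "-" is not defined for this pair of
# operands, even though their dimensions are conformable.
direct <- tryCatch(lazy - dense, error = function(e) "error")

# Coercing the matrix-like operand first yields an ordinary matrix, so the
# subtraction is always defined and no final as.matrix() call is needed.
resid <- as.matrix(lazy) - dense
```

This mirrors both halves of the diff: the coercion moves to the operand whose class is not guaranteed, and the trailing `as.matrix(resid)` becomes redundant because the result is already an ordinary matrix.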

2025-04-02

Peter Hickey (23:32:46) (in thread): > Okay, PR made after getting it to build locally. Pushed to BioC and hopefully the build goes smoothly

2025-04-03

Ludwig Geistlinger (08:45:52) (in thread): > Thanks Pete!

2025-04-08

Alan O’C (04:05:01) (in thread): > Oh sorry, I was TKO last week with something like COVID. Pete’s opened an issue and I posted your patch there