#hca_rfa

2017-07-27

Aaron Lun (16:12:46): > @Aaron Lun has joined the channel

Aaron Lun (16:12:47): > set the channel description: To coordinate BioC-related submissions to the CZI HCA RFA

Peter Hickey (16:13:08): > @Peter Hickey has joined the channel

Aaron Lun (16:14:20): > Hey @Peter Hickey could you throw in a link to the Google docs with the notes?

Aaron Lun (16:14:29): > I will put something on the BioC devel mailing list about this channel.

Peter Hickey (16:15:26): > i will. i was planning to tidy it up…would you prefer that or the raw notes?

Aaron Lun (16:16:39): > Happy for you to clean it up; I guess we should allow people to write to it?

Aaron Lun (16:16:55): > Assuming no malicious edits.

Peter Hickey (16:17:37): > i’ll set up a clean doc that ppl can edit based on my scrawled notes

Peter Hickey (16:17:49): > i’ll do over coffee tomorrow morning

Aaron Lun (16:18:12): > Okay, I’ll wait till then to post something on BioC-devel.

Davis McCarthy (16:18:34): > @Davis McCarthy has joined the channel

Davide Risso (16:18:48): > @Davide Risso has joined the channel

Andrew McDavid (16:18:48): > @Andrew McDavid has joined the channel

Stephanie Hicks (16:18:48): > @Stephanie Hicks has joined the channel

Martin Morgan (16:19:26): > @Martin Morgan has joined the channel

Wolfgang Huber (16:19:27): > @Wolfgang Huber has joined the channel

Vince Carey (16:19:37): > @Vince Carey has joined the channel

Aedin Culhane (16:19:37): > @Aedin Culhane has joined the channel

Aaron Lun (16:22:06): > Feel free to invite other peeps from the meet.

Peter Hickey (16:29:15): > @Aaron Lun got mixed up with what you were asking for…rfa notes are https://docs.google.com/document/d/198NNo7q9olhTwLhkBJrRcVrYL8Ua4jDTPS6ms3-V2a0/edit?usp=sharing

2017-07-28

Aaron Lun (00:08:24): > @Aaron Lun pinned a message to this channel.

Aaron Lun (14:03:29): > For starters, I think it would be great if we could see if BiocParallel works off-the-shelf with cloud computing services.

Kasper D. Hansen (14:03:51): > @Kasper D. Hansen has joined the channel

Vince Carey (17:50:04): > @Aedin Culhane hi will arrange invite

Martin Morgan (22:43:54): > BiocParallel: each cloud service is different, so one needs to write an AWSParam(), GoogleParam(), etc… but this sounds like a good idea.

2017-07-29

Aaron Lun (18:36:37): > Yes, that’s pretty much what I was thinking.

2017-07-30

Aedin Culhane (19:52:14): > Would the Future package in R (https://cran.r-project.org/web/packages/future/) and its extensions from Henrik Bengtsson (https://github.com/HenrikBengtsson/BiocParallel.FutureParam) or the cloudyR packages be useful (http://cloudyr.github.io/)? - Attachment (GitHub): HenrikBengtsson/BiocParallel.FutureParam > :rocket: R package: BiocParallel.FutureParam - a BiocParallelParam Class for Futures - Attachment (cloudyr.github.io): the cloudyr project > The goal of this initiative is to make cloud computing with R easier, starting with robust tools for working with cloud computing platforms. The project’s initial work is with Amazon Web Services, various crowdsourcing platforms, and popular continuous integration services for R package development.

2017-07-31

Aaron Lun (07:59:47): > Fascinating. I’ll admit that I don’t understand much about future, though.

Aaron Lun (08:03:25): > The cloud stuff definitely sounds like a good thing to submit for the RFA. This is probably outside the expertise of John’s group, but I’m sure we’ll be happy to support it as much as we can.

Aaron Lun (08:04:53): > Currently I’m thinking of writing something for BigDataAlgorithms; but it would be really cool if someone worked on improving interactive visualization, e.g., some SingleCellBrowser package.

2017-08-02

Kevin Rue-Albrecht (12:46:40): > @Kevin Rue-Albrecht has joined the channel

2017-08-07

Kevin Rue-Albrecht (10:05:20): > Just as a pre-meeting question, @Aaron Lun are you thinking something along the lines of scater_gui()? or something more focused on tackling high density plots (e.g. how to represent a million cells without a million data points) ?

Aaron Lun (10:08:48): > I was thinking something specifically for generating plots. For example, a swap-in replacement for plot() that would open a browser, or something like that.

Aaron Lun (10:11:55): > People keep telling me that plotly can do this, but I’ll believe it when I see it.

Aaron Lun (10:12:52): > Or specifically, can it make a metadata-rich scatter plot without requiring 20 lines of code?

Kevin Rue-Albrecht (10:17:16): > Right, thanks. I’m just trying to get a sense of the challenge to plot big data, before even considering interactivity. > I found that ggplot for instance, really doesn’t scale up well for heat maps using geom_tile

Aaron Lun (10:22:16): > Right. There’s at least two challenges during visualization; one regarding the scale of the data, the other regarding its richness.

Kevin Rue-Albrecht (10:44:54): > Cool. I look forward to the call to hear more about both aspects. > I need to do a bit more research before I can suggest anything (I haven’t used them much yet but tibbles just popped to mind to address one aspect of data richness given that they don’t coerce their input; this way data type could be preserved and facilitate later plotting)

2017-08-08

Aaron Lun (10:27:20): > Argh. Writing this common section is like pulling teeth.

Aaron Lun (11:41:27): > Well, I’ve written drafts for “1. To develop and propagate a common class for storing and manipulating single-cell data in R”

Aaron Lun (11:41:32): > and “2. To improve the representation of very large single-cell data sets in R using HDF5 files”

Aaron Lun (11:43:15): > Currently writing something about the cloud.

Aaron Lun (11:43:38): > Need someone to write something about multimodal stuff, because I’m getting seriously bored.

Kevin Rue-Albrecht (11:44:58): > Common section between the three axes outlined yesterday right? (1. Infrastructure 2. Methods, 3. Visualisation)

Kevin Rue-Albrecht (11:48:33): > hang on, nevermind, I was trying to help by suggesting a link with “To facilitate the development of interoperable methods”, but I have a feeling you covered that in your 1. I’ll keep watching and learning.

Aaron Lun (12:21:44): > @Peter Hickey Did you have any ideas about multimodality?

Peter Hickey (12:23:00): > can you point me to anything where they say they’re actually doing multiple assays and which ones?

Aaron Lun (12:26:00): > … good question.

Davide Risso (12:42:48): > Do we have a shared document?

Davide Risso (12:44:01): > The first bullet point of the rfa is “Developing standard formats and analysis pipelines for genomic, proteomic, and imaging data, in forms that enable consistent use of these pipelines by numerous experimental labs”

Davide Risso (12:44:32): > So I guess they will have proteomics and imaging data? :thinking_face:

Davide Risso (12:45:21): > Also: “Supporting analytical methods and machine learning approaches to solving problems such as multimodal integration”

Aaron Lun (12:45:39): > @Davide Risso We will have a shared document once I find John and pester him about the 1000 word thing.

Aaron Lun (12:46:43): > As for the multimodalities; this is probably known only to the upper echelons of the HCA.

Aaron Lun (12:46:59): > I guess they’re waiting for the pilot studies from RFA1 before they make a decision.

Davide Risso (12:48:21): > Re-reading the RFA imaging data is mentioned several times so I wonder if that’s what they mean by multimodal: imaging + genomics

Davide Risso (12:48:35): > Which is not really our strong suit

Aaron Lun (12:54:19): > There’s already some stuff for imaging, e.g., https://github.com/chanzuckerberg/starfish - Attachment (GitHub): chanzuckerberg/starfish > starfish - *fish – a standardized analysis pipeline for image-based transcriptomics

Aaron Lun (12:54:27): > Which blew my mind, because it’s not in matlab.

Aaron Lun (12:54:33): > My old archnemesis.

Aaron Lun (12:54:51): > Anyway, the draft 1000 words are done; now I have to find out what John actually meant.

Aaron Lun (17:28:08): > Okay, just talked to the boss. Apparently he got the “shared 1000 word” format from talking directly to the CZI coordinator for this RFA round. He is happy to try to get a written statement from the CZI on this matter for interested parties: I guess @Martin Morgan @Vince Carey @Davide Risso, anyone else who wants to be cc’d on that email?

2017-08-09

Aaron Lun (04:59:44): > Also, https://docs.google.com/document/d/1ZOiPLDCoI97P0rH01HOk5KDnBL-NjqMtFKFumMXwXCI/edit?usp=sharing

Aaron Lun (05:00:32): > Caveats:

Aaron Lun (05:00:45): > 1. I’m not sure exactly what’s meant to be in the 1000 words yet.

Aaron Lun (05:01:02): > 2. I ran out of puff in the cloud computing part, so that’s why it’s a bit short.

Aaron Lun (05:01:57): > 3. Currently I’ve only enabled commenting on that link until we’ve nailed down the structure.

Aaron Lun (05:03:20): > 4. It’s also a bit over 1000 words, but we can cut words once we’re finished.

Aaron Lun (05:03:57): > 5. Yeah, I know most of the citations are mine. They were just the easiest to remember.

Kasper D. Hansen (10:06:22): > Just FYI, my lab is quite interested in participating in this. I am interested in also adding some work on scalable algorithms for this type of data. I have existing grant text from our work on recount2 which could be used as a starting point. Ie. do something standard like PCA or clustering for 1M cells.

Davide Risso (10:06:48): > Hi Aaron, I’ve read the proposal and it looks pretty good. I have just a major comment: in both aims 1 and 2 the balance between preliminary results and new (potentially exciting) proposed work is a bit off. It reads as we already did all the cool things and now we’re left with necessary but not exciting work

Kasper D. Hansen (10:07:00): > I have been traveling for vacation, so a bit out of the loop. Will get up to speed today

Kasper D. Hansen (10:07:48): > @Davide Risso my guess is that they are also interested in taking existing small scale approaches and making them production ready and scalable for extremely large datasets. That is not trivial

Kasper D. Hansen (10:07:59): > But let me read and think and get up to speed first

Davide Risso (10:08:22): > OK, good to know

Davide Risso (10:09:19): > I was just wondering if we could reorganize the aims a bit so that aim 1 is not just about a new S4 class but includes scaling up the methods in the existing packages (focus on performance)

Davide Risso (10:09:36): > with the second aim focused on HDF5 / on disk methods

Aaron Lun (10:09:52): > Yes, that’s possible.

Davide Risso (10:10:03): > alternatively we could push more on the multimodality in aim 1

Aaron Lun (10:10:38): > Actually that might be better, as I want to keep aim 1 more about the containers.

Aaron Lun (10:10:51): > Aim 2 should be about scalability (part of which is the on-disk representation).

Aaron Lun (10:11:15): > However, I don’t know much about what needs to be done with MultiAssayExperiment.

Davide Risso (10:14:29): > does SingleCellExperiment have all we need to store methylation data?

Peter Hickey (10:15:02): > yes, it would have multiple assays, however

Peter Hickey (10:15:32): > at least 2 of (Methylated, Unmethylated, Coverage) and potentially 1 ‘smoothed’ methylation level. so counts etc. wouldn’t be appropriate

Davide Risso (10:15:54): > right

Davide Risso (10:16:14): > so if we want SingleCellExperiment to be general for both assays

Davide Risso (10:16:36): > should the additional slots that Aaron proposed be in a derived RNA-seq specific class?

Aaron Lun (10:17:03): > I mean, they’re not slots, they’re just convenience methods.

Aaron Lun (10:17:22): > The question would be whether you want to derive a specific class for those methods to operate on.

Aaron Lun (10:17:40): > You probably wouldn’t have spike-ins for methylation data anyway…?

Davide Risso (10:18:02): > right… sorry, I’ll continue on #singlecellexperiment

Ayshwarya Subramanian (11:33:44): > @Ayshwarya Subramanian has joined the channel

Stephanie Hicks (11:35:51): > Hi Aaron, thanks for starting the proposal! I read through it and had two questions/comments. (1) Is this specific for just the infrastructure portion that we discussed in phone call on Mon? If so, that’s totally fine. As it reads now it feels like all three project aims are “broadly” related to infrastructure (e.g. containers & scalability), but if we wanted to make it more all encompassing, I think including work on more methods development e.g. scalable algorithms/BigDataAlgorithms for this data (as @Kasper D. Hansen and @Kevin Rue-Albrecht mentioned) and work on data viz would be awesome too. (2) If we were going to go the more all encompassing route, would something like this work? Project aims: > > 1. To develop BioC infrastructure to manipulate and improve the representation of very large, multi-modal single-cell data in R (and something about scalability in the title too) > 1.A e.g. develop appropriate containers (common S4 objects, multi-modal stuff) > 1.B e.g. make data scalable (file-backed representations, and cloud computing) > 1.C open to suggestions > > 2. To develop algorithms/methods to analyze single-cell data in R/BioC > 2.A e.g. as @Kasper D. Hansen mentioned something like PCA or clustering for a million cells or as @Aaron Lun wrote, BigDataAlgorithms to operate on HDF5Matrix objects > 2.B more proposed methods from others e.g. @Davide Risso > 2.C open to suggestions > > 3. To develop open-source data viz tools in R/BioC to handle very large single-cell data > 3.A I remember @Peter Hickey mentioned something about Di Cook’s PhD student working generally in this area. I know less about this topic, but I think being able to address the question of how to visualize 1M+ cells is very relevant.

Aaron Lun (11:40:05): > @Stephanie Hicks Do any of us want to do visualization, or are good at it?

Aaron Lun (11:40:23): > Happy to reorganize it if we have volunteers for that part.

Aaron Lun (11:41:34): > I stuck with infrastructure because that’s Bioconductor’s bread and butter.

Aaron Lun (11:42:02): > I also sort of consider the methods as infrastructure, SVD and kNN and linear models are basic enough.

Stephanie Hicks (11:43:21): > mmm, didn’t @Lorena Pantano mention wanting to help with the data viz portion on phone call? Not sure if we had PIs who would volunteer though.

Aaron Lun (11:51:04): > I think we’d need at least a PI-level commitment to that part (unless @Lorena Pantano is a PI?) if it’s going to be put down as a specific project aim.

Stephanie Hicks (11:52:57): > Yeah, I totally agree. For some reason I thought others (PIs) mentioned they were interested, but maybe I’m not remembering correctly.

Aaron Lun (11:54:11): > I should also note that I just made up the cloud stuff in the current Aim 3, about which I know nothing. @Martin Morgan and @Vince Carey will probably have some better opinions on what needs to be done.

Aaron Lun (12:02:25): > Anyway, I’ve started a section in Pete’s google doc saying who wants to do what. I’ve filled in other people based on the contents of this channel, but just correct/elaborate it as you go along.

Stephanie Hicks (12:04:43): > to minimize scrolling for anyone else, here is Pete’s google doc: https://docs.google.com/document/d/198NNo7q9olhTwLhkBJrRcVrYL8Ua4jDTPS6ms3-V2a0/edit

Aaron Lun (12:10:28): > Looking at it again, Aim 1 could be sexier. Not quite sure how to do that; maybe will talk more about multimodalities.

Aaron Lun (12:10:50): > Aim 2 is pretty sexy, and aim 3 can be fleshed out by someone who knows what they’re talking about.

Aaron Lun (12:29:09): > Or we could split it into three alternative aims:

Aaron Lun (12:29:18): > 1) Data access and representation (i.e., containers, API for the DCP)

Aaron Lun (12:29:32): > 2) Scalability infrastructure (cloud + HDF5)

Aaron Lun (12:29:41): > 3) Scalability algorithms (bigDataAlgorithms)

Aaron Lun (12:30:28): > And we could probably stuff visualization into (3) somewhere.

Vince Carey (12:32:12): > I stuck a big wad of text at the end of the google doc, with a different expression of aims. Feel free to use or discard, I have to run off.

Aaron Lun (12:41:35): > Thanks @Vince Carey. Agree about the S4, it’s probably too technical to start with. I also like contracting it into two aims, this will reduce the amount of technical waffle I need to put in. It also gives us some space for visualization, if anyone is still interested in that.

Kevin Rue-Albrecht (12:46:17): > As @Aaron Lun and @Stephanie Hicks mention PI-level commitment, I just wanted to clarify that I’m not a PI, I’ve joined the call on Monday because I’m a computational biology postdoc involved in various scRNA-seq analyses in the University of Oxford (UK), and that I’m keen to get involved and help out wherever I’m welcome, starting with feedback including performance of methods and infrastructure on in-house data. Mainly from a user’s perspective, with a decent experience of Bioc package development.

Davide Risso (13:09:24): > What exactly do we mean by visualization?

Davide Risso (13:10:33): > I can imagine people want scatterplots and heatmaps, right?

Davide Risso (13:11:08): > Is the challenge that with a million points it’s challenging to draw a heatmap or a scatterplot with base R graphics?

Davide Risso (13:11:24): > Or are we thinking more about interactivity?

Davide Risso (13:11:48): > Or something else entirely?

Kevin Rue-Albrecht (13:17:09): > @Davide Risso: I was asking @Aaron Lun about this yesterday. Initially I naively thought about a smart re-use of existing ggplot2, ComplexHeatmap, and base functions, but Aaron made a valid point: “Probably will involve some extra work, especially regarding how to retrieve data as needed rather than plotting everything at once.”

Kevin Rue-Albrecht (13:18:22): > and “The interactive stuff can probably be handled with a combination of plotly and HTML5”. > I can’t really comment on that one, because I still haven’t used plotly

Kevin Rue-Albrecht (13:23:20): > But I can definitely say from experience that large heat maps using ggplot2::geom_tile do not scale particularly well: in a Shiny app, it was taking ~15s to draw a heat map of genotypes (218 variants x 5,844 samples).

Kevin Rue-Albrecht (13:27:13): > I have no idea what’s happening behind the scenes in plotting libraries, I have no doubt that they are already optimised for speed in many ways. Rather, I guess that new types of plots are likely needed to represent large single cell data sets before it really makes sense to address interactivity.

Kevin Rue-Albrecht (13:30:59): > For instance, (off the top of my head, might be a stupid idea) scatter plots might be improved for large single-cell (SC) data sets by collapsing groups of “very similar” cells as a single data point of larger size. > (My naive pragmatic suggestion to avoid trying to plot 1 million data points)
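
A minimal sketch of the "collapse very similar cells into one larger point" idea from the message above. Everything here is an assumption for illustration: cells are treated as 2-D coordinates (e.g. from a dimensionality reduction), "very similar" is approximated by falling into the same square grid cell, and `collapse_points` and `cell_size` are hypothetical names, not part of any existing package. Python is used only to keep the sketch self-contained.

```python
from collections import defaultdict
from math import floor


def collapse_points(points, cell_size):
    """Group 2-D points by square grid cell and summarise each group.

    Returns one dict per occupied grid cell holding the centroid
    ("x", "y") and the number of collapsed points ("n"), which a
    plotting layer could map to marker size.
    """
    bins = defaultdict(list)
    for x, y in points:
        # Points sharing a grid cell are treated as "very similar".
        key = (floor(x / cell_size), floor(y / cell_size))
        bins[key].append((x, y))

    summaries = []
    for key in sorted(bins):
        members = bins[key]
        n = len(members)
        cx = sum(p[0] for p in members) / n
        cy = sum(p[1] for p in members) / n
        summaries.append({"x": cx, "y": cy, "n": n})
    return summaries


# Five toy "cells" forming two tight groups.
points = [(0.1, 0.2), (0.3, 0.4), (0.2, 0.1), (5.0, 5.1), (5.2, 5.3)]
collapsed = collapse_points(points, cell_size=1.0)
```

With this toy input the five points collapse into two summary points, so the number of drawn markers depends on the number of occupied grid cells rather than the number of cells in the data set. A real implementation would probably want similarity defined on expression or cluster membership rather than on plot coordinates alone.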

Aaron Lun (13:38:25): > I must admit to being a bit bemused as to how @Vince Carey managed to edit that document; I thought the link was comment-only. Well, if you see something you want to edit… just do it.

Peter Hickey (13:44:00): > @Aaron Lun it is set to open edit. i can switch to closed edit (for only those here already) and view for others (adding them as needed) if you like

Aaron Lun (13:44:12): > whoops.

Aaron Lun (13:44:24): > yes, please, if you will.

Aaron Lun (13:44:33): > Don’t know much about these newfangled “docs”

Aaron Lun (13:44:44): > What happened to good old Latex?

Peter Hickey (13:46:53) (in thread): > nevermind, was getting my google docs mixed up. you created this one, right? in which case you can control the sharing settings (button on top right of screen)

Aaron Lun (13:47:31) (in thread): > Yeah, it’s got “Anyone with the link can comment”. So how did Vince slap down that text?

Aaron Lun (13:47:46) (in thread): > I mean, can you edit it?

Stephanie Hicks (13:48:15) (in thread): > I like this set of Aims too.

Stephanie Hicks (13:51:43) (in thread): > I tested both Aaron and Pete’s documents. I can edit both.

Peter Hickey (13:52:25) (in thread): > i think mine is set to full blown editing (e.g. your changes aren’t marked up) but aaron’s is set to ‘suggesting’ so changes are marked up by user ID/anonymous

Stephanie Hicks (13:53:08) (in thread): > Ah that makes sense. In Aaron’s it seemed like “track changes” was enabled.

Aaron Lun (13:53:39) (in thread): > next time I’ll do this in latex and get everyone to just send PRs.

Stephanie Hicks (14:09:35) (in thread): > @Kevin Rue-Albrecht @Davide Risso I think some people will want data viz tools that retrieve only a portion of the data as needed and others will want to use all the data (e.g. 1M cells). Within those two frameworks, I can imagine different needs. e.g. if you are just retrieving a portion of the data as needed, you can prob use base scatterplots, heatmaps. Others will want more interactive tools (e.g. collapsing similar cells into single points and being able to “zoom in” and “zoom out”, etc).

Davide Risso (14:14:51): > I think what Vince added is a comment, but google doc renders it as text

Vince Carey (14:16:02): > I am sorry if I disturbed the authoring. You can get access to previous versions of text in google docs, so I feel there is minimal need to control edits among this group.

Davide Risso (14:19:28) (in thread): > Overleaf!

Aaron Lun (14:20:49): > @Vince Carey No problems; I wasn’t really bothered, I was just confused about Google doc’s permission system.

Aaron Lun (14:32:10): > Okay, I’ve finished taking the bits of Vince’s text that I liked, and it feels much better now.

Aaron Lun (14:34:21): > Much more high-level and inspirational, I think. Could be made even more so; feel free to edit.

Aaron Lun (14:35:15): > Full editing is now enabled, WGPCGR.

Aaron Lun (14:37:06) (in thread): > Don’t get me started on overleaf….

Stephanie Hicks (14:38:33) (in thread): > Agreed

Davide Risso (14:38:47) (in thread): > Yes, this looks very good!

Davide Risso (14:39:35) (in thread): > 998 words :smile:

Aaron Lun (14:42:27): > There’s a bit of fat around, so if that gets trimmed we can probably put in a section on visualization, if we have any takers.

Davide Risso (14:42:41): > So any final word from John on the 1,000 + 500 words rule?

Aaron Lun (14:42:58): > He’s emailed the CZI person, hopefully get a response before the end of the week.

Davide Risso (14:43:03): > I.e., should we start thinking of each individual 500 word section?

Kevin Rue-Albrecht (14:44:10): > @Kevin Rue-Albrecht pinned a message to this channel.

Peter Hickey (14:44:28) (in thread): > re viz: have emailed stuart lee who’s working with Di Cook to see if there’s any overlap with his PhD work

Aaron Lun (14:44:48) (in thread): > Is this the old WEHI stuart?

Peter Hickey (14:45:21) (in thread): > yep, phd with Di and Matt Ritchie. so it’s viz + genomics i think

Aaron Lun (14:46:42): > Also: I don’t know about you guys, but I didn’t want to get into bespoke single-cell methods. Mostly because we could have just all applied separately, there’s no real advantage to working together if we all just go and make our own method.

Aaron Lun (14:47:14) (in thread): > Fun fact: we used to share anime recommendations.

Davide Risso (14:47:29): > I agree. It seems that if we really can “solve” SVD, nearest neighbors, linear models

Davide Risso (14:47:53): > then it would be really easy to build on those blocks for each of our own methods

Davide Risso (14:49:09): > I think we should just be careful with the individual part of the application and make sure that we are all proposing something different

Vince Carey (14:49:10): > Apropos visualization, http://gehlenborglab.org/ – he did a nice job at the conference – but i assume no one in this group is working with him? should i approach him for interest?

Aaron Lun (14:50:58) (in thread): > Ah, the upset guy; yeah, that was good.

Aaron Lun (14:53:13) (in thread): > Should we hold off until we decide whether or not the application needs a dedicated section on visualization? It would be cool, but maybe the infrastructure stuff would be enough.

Aaron Lun (14:55:02) (in thread): > I’m hoping that we come to a consensus on Monday, so there should still be plenty of time to ask?

Vince Carey (14:55:18) (in thread): > yes – BTW i can’t tell if this is public or just a 2person thread

Vince Carey (14:56:04): > I would agree with Davide that we should put some time into the separation of tasks for the different PIs and their coinvestigators.

Vince Carey (14:56:21): > Perhaps stake some claims in the google doc?

Aaron Lun (14:56:44): > Yep, started a section in Pete’s google doc (before I realized that mine was editable).

Aaron Lun (14:56:56): > I can copy the stuff I have in there, give me a sec.

Peter Hickey (14:57:08) (in thread): > ‘threads’ are public but separated from the main chat

Peter Hickey (14:57:34) (in thread): > there’s a tag on the left panel ‘all threads’ where you can see them listed

Peter Hickey (21:12:38): > heard back from Stuart Lee (PhD student of Di Cook and Matt Ritchie). visualisation of large genomics data is the planned topic of one of his PhD thesis chapters, so they could be a great fit. have invited him to join the slack channel and will try to have a chat with him in the next couple of days to learn more of their plans

2017-08-10

Aedin Culhane (02:53:00): > @Aedin Culhane pinned a message to this channel.

Aaron Lun (04:09:43): > Okay, John got a response back from the CZI about this; let me see how I can stuff it into the Slack channel.

Aaron Lun (04:10:54): > @Aaron Lun shared a file: CZI Format Response - File (Canvas): CZI Format Response

Aaron Lun (04:12:21): > Looks like it’s a bit more flexible than 1000 words, so we can expand or trim as needed; though 1000 is probably a good figure to aim for.

Aaron Lun (04:13:09): > @Aaron Lun pinned the Canvas CZI Format Response to this channel.

Davide Risso (09:51:28): > Yes, I wouldn’t go beyond 1000 as it may be tough to talk about the specifics of each project in less than 600 words

Aaron Lun (10:45:42): > @Davide Risso I was thinking of having another meeting on Monday to finalize our strategy.

Aaron Lun (10:45:57): > Who had Bluejeans set up?

Kevin Rue-Albrecht (10:59:06): > I remember 10 participants listed, with 9 sharing their screens: > Aaron, Vince, Martin, Davide, Peter, Stephanie, Kevin, Lorena > (that makes 8 so far, I can’t remember the last two)

Kevin Rue-Albrecht (11:00:10): > Ah, Sean (9)

Aaron Lun (11:02:33): > But presumably it was hosted by someone, or someone was… paying… for it?

Kevin Rue-Albrecht (11:03:03): > haaaa “set up” in this way, sorry :sweat_smile:

Davide Risso (11:04:19): > @Martin Morgan set up the meeting

Aaron Lun (11:04:22): > Yes, had to get a dictionary to remember what “paying” for things means.

Aaron Lun (11:04:25): > Ah, okay.

Davide Risso (11:05:06): > But yes, I agree that having another meeting on Monday dedicated to the RFA would be good

Kevin Rue-Albrecht (11:05:17): > @Aaron Lun original message - Attachment: Attachment > Hi everyone, > just a gentle reminder that we will have a group call next Monday (8/7) at 12noon EST. > The Bioconductor foundation offered to pay for the use of the Bluejeans conference software (up to 50 people in the call) so there’s room for everybody! > I will send more info and links to join the call on Monday.

Davide Risso (11:12:10): > Should we do it again at 12noon EST? I don’t think that I can set up the meeting. I believe that only @Martin Morgan has that power

Aaron Lun (11:20:32): > Yep, sounds good.

Aaron Lun (11:20:38): > if others are happy with that.

Kevin Rue-Albrecht (11:25:43) (in thread): > Thinking about the interactive zoom-in/out as a Google Maps, I’m toying with the analogy of houses collapsing into cities, regions, countries, … > It occurs to me that the ‘collapse’ of multiple data points could be done in a few different ways: > 1) visually close points (e.g. turning the plot progressively into a heat map representing areas dense in cells, irrespective of their phenotype/cluster membership), > 2) points that share phenotype (either experimentally known, or resulting from unsupervised clustering methods) that could be collapsed as semi-transparent discs potentially overlapping > > Also, typically I would expect the ‘collapse’ to take place according to some arbitrary (configurable) threshold in terms of data points (e.g. < 1,000 cells: individual data points; 1e3-1e5: discs, etc…). In order to maintain some compromise between ‘time to display’, ‘image file size’, ‘information content’. > > Again just thoughts, if it can help get the ball rolling.
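
The configurable-threshold idea in the message above could be sketched roughly as follows. The two cut-offs (1,000 and 1e5) come from the message; treating anything above 1e5 as a density heat map is an assumption, since the message only says "etc…", and the function and parameter names are hypothetical.

```python
def choose_representation(n_visible, point_max=1_000, disc_max=100_000):
    """Pick how to draw the cells currently in view.

    n_visible: number of cells inside the current zoom window.
    point_max: below this count, draw every cell individually.
    disc_max:  up to this count, draw collapsed discs; above it,
               fall back to a density heat map (assumed behaviour).
    """
    if n_visible < point_max:
        return "points"
    if n_visible <= disc_max:
        return "discs"
    return "density"


# Zooming in reduces the visible count and switches representation.
zoom_levels = [1_000_000, 50_000, 800]
chosen = [choose_representation(n) for n in zoom_levels]
```

Keeping the thresholds as parameters matches the "arbitrary (configurable)" wording, so the trade-off between time to display, image file size, and information content can be tuned per data set.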

Peter Hickey (11:34:32): > I’m likely to be unavailable. @Kasper D. Hansen may be available and is probably who you want anyway to discuss grants

Aaron Lun (12:16:32): > Yeah, let’s talk money. I’ll see if I can drag along John.

Aaron Lun (13:03:40): > Wondering whether I should merge the cloud bit and free up a section on visualization…

Aaron Lun (13:06:33): > Or maybe the cloud stuff is strong enough to stand on its own.

Aaron Lun (13:06:51): > :persevere:

Aaron Lun (13:29:37): > Okay, it’s been substantially de-waffled, down to about 850 words. I think it’s probably enough words for the shared component, each application will then have ~800 unique words to talk about prior contributions, proposed work, proposal for evaluation/dissemination, and the statement of commitment.

Aaron Lun (13:33:13): > Looks pretty good, if I do say so myself. Or at least, much better than it was before. Thanks to @Vince Carey for calibrating the level of the aims.

Kasper D. Hansen (14:05:21): > I’ll be on the call. As I recall, the RFA was completely opaque wrt. money. Any intel on what we can ask for?

Aaron Lun (14:08:10): > 500,000 USD for one year, according to John.

Aaron Lun (14:08:19): > per individual application.

Aaron Lun (14:09:42): > But I think that’s the very top end of applications; I don’t think we’ll be asking for nearly as much.

Kasper D. Hansen (14:16:18): > No, it would be way way less I think

Kasper D. Hansen (14:17:12): > (per application)

Stuart Lee (21:13:41): > @Stuart Lee has joined the channel

Stuart Lee (22:17:43): > Hi everyone, I’m @Di Cook’s PhD student working on visualisation for genomics. Di and I are keen to contribute to the visualisation part of the grant. Broadly, my thesis topic overlaps with some of the content in the grant. At the moment I’m working in collaboration with Mike Lawrence on a package called query (a dplyr-like interface for Bioconductor classes with deferred evaluation) and ggbio2 (an overhaul of the grammar of graphics for biology). The reason we like the grammar is because it provides a framework for reasoning about how the data relates to the graphic (in that you can think about how realisations of random variables are being mapped to aesthetics). We’ve been thinking about issues around scalability and interactivity and how other methods such as guided tours and scagnostics could be useful in scRNA-seq. I would be keen to build a package that extends ggbio2 specifically for SingleCellExperiment. Anyway always happy to discuss further if you’re interested.

Di Cook (22:17:50): > @Di Cook has joined the channel

2017-08-11

Kasper D. Hansen (10:21:33): > Several people are probably going to draft stuff over weekend, in prep for Monday. May I suggest we all use google docs to write and share, posting a link here when there is something to look at.

Kasper D. Hansen (10:22:28): > I think - but could be wrong - that the more coordinated our efforts are, the better. They clearly want to encourage collaborations. We have a strong track record of that, so let’s build on that strength

Aaron Lun (11:14:42): > John will hopefully be able to join our call 30 minutes in, once he finishes some other HCA-related stuff.

Davide Risso (12:59:53): > FYI from the HCA slack: “CZI will be hosting a Reddit Ask Me Anything on Monday, August 14th- opening at 7:30am PT- our team will answer questions from 10:30am- 12:30pm PT. The goal is to discuss questions related to our second request for applications for the Human Cell Atlas related to collaborative computational tools (https://chanzuckerberg.com/initiatives/rfa)” - Attachment (The Chan Zuckerberg Initiative): Request for Applications – The Chan Zuckerberg Initiative > The Chan Zuckerberg Initiative seeks applications for the development of computational tools, algorithms, visualizations, and benchmark datasets in support of the Human Cell Atlas.

Vince Carey (15:47:06) (in thread): > Stuart, this sounds great. If I recall correctly the RFA encourages links to sites and videos … so if you can post some on guided tours and scagnostics in genomics it would be appreciated. The logistics of getting a proposal out of your institution would involve someone registering with the CZI system and I think this requires email back and forth with CZI, so if that hasn’t begun it would be wise to start.

2017-08-13

Di Cook (06:55:47): > i’ve registered now

Davide Risso (18:18:20): > Hi all, this is a friendly reminder that we will have a meeting to discuss strategies for the CZI RFA tomorrow (Monday) at 12noon EST

Davide Risso (18:18:37): > To connect, please use the link: https://bluejeans.com/985480508 - Attachment (bluejeans.com): Blue Jeans Network | Video Collaboration in the Cloud > Blue Jeans Network - Interoperable, Cloud-based, Affordable Video Conferencing Service

Davide Risso (18:19:20): > To join via phone : > 1) Dial: +1.408.740.7256 (United States) +1.408.317.9253 (Alternate number) > (see all numbers - https://www.bluejeans.com/numbers) > 2) Enter Conference ID : 985480508 - Attachment (Blue Jeans Network): Audio Conferencing Using Telephone Numbers - BlueJeans > Make audio conference calls with BlueJeans using any of these telephone numbers. Enjoy one touch audio conferencing and experience easy to use conference calls.

Davide Risso (18:29:27): > Agenda: https://docs.google.com/document/d/120MAhngbIe_EGi2ObnKyBMfuFpGBqK_3inOmRdHZnmU/

Kasper D. Hansen (22:18:12): > @Di Cook The RFA seems to have a particular focus on web-based visualization technologies. Also, note that they are probably especially interested in viz tools for very large numbers of cells (at least that’s my guess)

Stuart Lee (23:42:32): > @Di Cook and I have added in a short paragraph as an aim 3 now, so you can get an idea of what we would contribute.

2017-08-14

Mike Smith (05:02:25): > @Mike Smith has joined the channel

Kasper D. Hansen (11:55:00): > We should write something about distributed model fits.

Kasper D. Hansen (11:55:39): > or distribution of data summaries

Lorena Pantano (11:56:31): > @Lorena Pantano has joined the channel

Kasper D. Hansen (13:11:11): > So we decided to aim at having individual applications ready Friday, shared via google doc

Kasper D. Hansen (13:11:20): > We have another call same time next Monday

Kasper D. Hansen (13:14:19): > https://www.reddit.com/r/science/comments/6tlrbk/hi_reddit_were_a_group_of_scientists_and/ - Attachment (reddit): Hi Reddit, we’re a group of scientists and engineers from the Chan Zuckerberg Initiative and we’re helping to build a Human Cell Atlas. Ask Us Anything! • r/science > Hey Reddit! We’re a group of scientists and engineers from the Chan Zuckerberg Initiative – a philanthropic organization founded by Mark…

Kasper D. Hansen (13:16:11): > Hmm, I thought it was about the specific RFA and instead it seems to be some general Qs

Martin Morgan (13:16:12): > set up a reminder “RFA discussion https://bluejeans.com/985480508 at 12 noon EST” in this channel at 9AM Monday, August 21, Eastern Daylight Time.

Kasper D. Hansen (13:16:32): > Not really the place to ask for legal advice on including out-of-institution people

Davide Risso (13:26:46): > yeah, I’ll just send an email..

Aaron Lun (13:27:14): > @Davide Risso Put down some old stuff regarding big data algorithms from an old draft, now on the back end of the Google docs; some of it may be useful to you.

Davide Risso (13:27:28): > thanks!

Kasper D. Hansen (13:56:44): > from the ama “Glad you asked! Analysis and re-analysis of HCA data requires a lot of compute firepower - exactly how much depends on the type of data and stage of processing. RNA sequence alignment, peak calling, joint re-processing, and image segmentation are examples of “embarrassingly parallel” tasks that require thousands of independent cores at the scale of data we’re handling. High-dimensional clustering of cells by expression profile is an example of a more HPC-heavy task requiring a single big CPU or GPU node. To accommodate compute and storage needs, we are building a data storage backbone on public cloud object storage technologies. This multi-replica storage system will be used by partners in the HCA project to openly share and publish all the data HCA produces. We will be storing the data on several different public clouds and regions, so that researchers can “bring compute to the data” - for example, by running very large computational workflows to process imaging and RNA sequencing data in the cloud, without encountering processing bottlenecks. (AK)”

Aaron Lun (14:22:03): > got the ball rolling with a link on our point of the google docs.

2017-08-15

Aaron Lun (06:48:00): > @Martin Morgan Would Herve be considered part of your group for the purposes of this application? I’m thinking of proposing direct support for DelayedMatrix objects in beachmat.

Martin Morgan (08:02:59): > @Aaron Lun herve is part of my group, yes

Aaron Lun (11:29:06): > John suggests that we have a common part of the title, e.g., “:

Davide Risso (11:38:53): > I agree. Also, are the 500 words of “Collaborative network” the place to list each other’s application? Should this be the same across applications or not?

Aaron Lun (11:41:35): > Yes, I think we’ll talk about other applications in that point. My understanding is that it probably won’t be the same, not least because you won’t be talking about how you’ll collaborate with yourself. As long as people keep their “Who wants to do what” updated; we can make sure that everyone is singing from the same page on Monday. Check out my docs for an example.

Aaron Lun (11:43:36): > Has anyone heard from @Andrew McDavid about this?

Aaron Lun (11:46:30): > Regarding the common part of the title: I currently have “Reinforcing the Bioconductor framework for the Human Cell Atlas”. Open to better suggestions, particularly ones involving music-related puns; but “orchestrating” has already been used in the 2015 paper, and talking about a single-cell symphony seems too contrived.

Kevin Rue-Albrecht (12:12:12): > “Tuning Bioconductor […]” ?

Aaron Lun (12:12:27): > Lol.

Aaron Lun (12:12:37): > Of course, it’s not the conductor that gets tuned.

Vince Carey (12:13:12): > suggest that the title not read as if we are using HCA to reinforce Bioc but the other way around

Vince Carey (12:13:45): > i.e., HCA should come first

Kasper D. Hansen (12:30:19): > Hardening instead of Reinforcing

Kasper D. Hansen (12:30:54): > I would think we write a common 500-word collaborative network and then people just delete their own entry

Kevin Rue-Albrecht (12:49:48): > I don’t want to waste too much of everyone’s time on music puns, but ‘arrangement’ fits the situation fairly well, considering the project aims (“a musical reconceptualization of a previously composed work”) > Plus the word itself is ‘normal’ enough to avoid unnecessary eye-rolling ^^ > Again, not a top priority, but I’m just putting it out there before I forget if someone wants to make something of it

Aaron Lun (12:56:24): > Regarding the collaborative network: parts of it can definitely be common, e.g., describing who is involved and a brief overview of their role in Bioconductor. But I think that each application would benefit from focusing on specific interactions in the network and how those collaborations relate to the proposed work.

Aaron Lun (13:03:51): > I’ve put an example of what that might consist of in the google docs for our individual application; the bit where we describe everyone is common, but the pairwise interactions between groups are more likely to be specific. For example, Kasper and Davide will probably work together on the big data algorithms, but that isn’t particularly relevant to our application.

Aaron Lun (13:06:36): > @Vince Carey Not quite sure I understand the difference in the HCA <-> Bioc order.

Vince Carey (13:08:53): > instead of “f(Bioc) for g(HCA)” i suggest “g(HCA) with f(Bioc)” – for example “Building and using the Human Cell Atlas with Bioconductor:

Aaron Lun (13:12:12): > Okay. John was complaining about my current title being too long, so maybe “ the Human Cell Atlas with Bioconductor: ”.

Aaron Lun (13:12:34): > Trying to think of a good word here, but “Grokking the Human Cell Atlas with Bioconductor” doesn’t quite work.

Aaron Lun (13:14:26): > Maybe “Exploring”.

Vince Carey (13:14:36): > Enhancing?

Kasper D. Hansen (13:34:32): > Should we or should we not put bioc in title. Shouldn’t the focus be on HCA

Kasper D. Hansen (13:35:20): > One possibility “Hardening single cell analysis infrastructure for the Human Cell Atlas”

Kasper D. Hansen (13:36:02): > And then start the summary with something like “The platform of choice for interacting and exploring single cell data is R/Bioconductor”

Aaron Lun (13:36:16): > I think explicitly mentioning Bioconductor in the title is nice; after all, it’s the glue that ties us all together.

Kasper D. Hansen (13:37:46): > Yeah, I see that, but it seems the starting point might be more “What can we do with HCA in Bioconductor” as opposed to “How can we bring value to HCA” where the answer to the second one is “Of course we need to improve Bioc if necessary; everyone uses it”

Kasper D. Hansen (13:38:04): > But I don’t think title matters much to be honest

Martin Morgan (13:48:12): > Exploiting the bioc tagline ‘Statistical analysis and comprehension of the Human Cell Atlas: …’

Di Cook (16:49:11): > I am not sure if you want to have a vis part or not. Stuart and I are ok, either way. If you want to include some vis we need to know how you would like it to be included.

Stephanie Hicks (20:47:23): > question: When it says “Prior contributions in this area and preliminary results (not required)”, did you interpret that as your own prior contributions or more generally prior contributions?

Kasper D. Hansen (21:03:09): > @Di Cook We decided that the core of all of our proposals was the work on core infrastructure and algorithms supporting the transition from 100s of cells to 1,000,000s of cells and beyond. In all the proposed activities we are to some extent doing stuff which is simple and done for small scale experiments, but scaling it to much bigger datasets, which - we believe - is non trivial. In this context, visualization that follows this line of thought (scaling existing useful things to 1,000,000s of cells) would fit under current Aim 2. In my understanding - but I could be wrong because I have heard very few details on this - your group seems to be more interested in developing novel visualization tools for capturing this high dimensional data. This is super worthwhile, something that I am sure the CZI would be very interested in - but it doesn’t seem to fit into the joint proposal we are working on in this channel. I should note that I believe several groups here are submitting >1 proposal, and use their “other” proposal to cover more novel methodology (I guess).

Kasper D. Hansen (21:03:45): > @Di Cook One issue here is that the short format and the shared text implies that joint proposals should have a somewhat tight core.

Kasper D. Hansen (21:03:47): > @Di Cook I also think a novel visualization proposal (perhaps even involving multiple PIs) would be seen as very interesting - it is clear that this field needs novel visualization and the RFA is quite explicit about seeking them.

Kasper D. Hansen (21:06:41): > I hope this is a fair summary of the discussion we had Monday.

Kasper D. Hansen (21:07:11): > @Stephanie Hicks always your own work. Hints at the de facto NIH requirement for preliminary results

Kasper D. Hansen (21:08:07): > @Stephanie Hicks We should put stuff down on our existing collaboration, meetings, start of the SingleCellExperiment package, DelayedArray etc etc

Di Cook (22:04:12): > @Kasper D. Hansen Thanks for the explanation Kasper! No worries. We might wait and see if the overall proposal is funded, and if so leverage some of the new infrastructure for vis in a later round.

Raphael Gottardo (23:29:03): > @Raphael Gottardo has joined the channel

2017-08-16

Aedin Culhane (00:56:25): > Hi everyone. Sorry I was absent, I was on a kids-on-the-beach get-away. I like Martin’s idea for the title. Bioconductor makes the data accessible to statisticians. Not all statisticians are Python savvy, so building a framework in Bioc opens up HCA data to the statistical community, which in turn encourages new methods development.

Aedin Culhane (01:13:04): > Also from the ama: We are working with the GA4GH, Broad Institute, and the Common Workflow Language group to develop better ways to enable portable, reproducible scientific workflows. Our data coordination platform has this as a top priority. We are currently considering using technologies like Dockstore, CWL, WDL, AWS Batch, Google Genomics Pipelines, etc. (which all utilize Docker in their stacks) to enable this goal. > CZI itself doesn’t run sequencing assays, but HCA partner and funded labs do. Aside from using RSEM on HiSeq reads (which some of our pipelines definitely do!), there will probably be sequencing datasets coming from Oxford Nanopore and various other technologies. We focus on selecting and funding the most promising efforts using these new technologies, including sample prep, sequencing, normalization, and denoising projects and algorithms. Some of our funded project are specifically for comparing between them. > We’re always looking at new experimental methods and assays, and expect that RiboSeq and other innovative assays will figure prominently in our funded projects. > We agree wholeheartedly that doing science is easier and more powerful when you have large core facilities with standardized protocols, lots of automation and sophisticated monitoring. At the same time we must balance this against equitability, diversity and sourcing of scientific ideas from the wider community. Striking the right balance and working as a community toward gold standard pipelines and best practice protocols is a big priority for us. We won’t be starting major core facilities of our own, but we will be encouraging distributed reproducible science through agreement on these best practices and through technology sharing among project partners. (AK)

Raphael Gottardo (02:17:49) (in thread): > I agree with this. I don’t think we need a common part in the title. Same for the actual proposal. 1600 words is short, so I would keep it specific to the actual work to be done. The common part can go into “Collaborative network”. But even that should be specific, so perhaps just a few sentences that we all include to make it clear that we all talked to one another, like the two overall aims we’re working towards.

Raphael Gottardo (02:18:45) (in thread): > Another idea would be to have a core proposal that mentions all other projects, one that could be led by Martin.

Aedin Culhane (02:52:04): > On https://github.com/HumanCellAtlas, they are creating data bundles which are uploaded to S3 bucket org-humancellatlas-data-bundle-examples. It might be worth us looking at these to make sure the Bioc SingleCellExperiment classes will be happy with the meta-data formats - Attachment (GitHub): Human Cell Atlas > GitHub is where people build software. More than 23 million people use GitHub to discover, fork, and contribute to over 64 million projects.

Kasper D. Hansen (09:21:41) (in thread): > I see what you’re saying re. duplicating text, but I think we have been told very specifically to duplicate text in the proposal itself. Remember this is after all a pre-proposal.

Kasper D. Hansen (09:22:08) (in thread): > I don’t think it is a good idea to deviate from specific instructions, even if they are not clear on RFA page

Mike Smith (10:14:17): > So is there a consensus on where we want to include the common text? I guess section 1 or 2 in their 6 point Proposal outline might work? I find it quite hard to read an explicit set of instructions and then kind of ignore them.

Aaron Lun (10:30:49): > Yes, John reckons we should put the common text in section 1 (Summary) and 2 (Project aims), along with a sentence in the Collaborative network saying that this is a collaboration - not 100% sure of the text there yet.

Raphael Gottardo (13:37:27): > Well they said it was ok to do that but we certainly don’t have to. Don’t you think it would be redundant if they read the same text 15 times, and in a way useless? It might be better to mention each other in specific ways and then say in the Collaborative network that we’re all working together as part of Bioconductor. Happy to go with what everyone thinks is best but at the same time, given the very limited amount of text we can put, we should try to keep the common text to the bare minimum.

Raphael Gottardo (13:38:27): > Again, rereading the email response, it says: > > “Proposal sounds great and I think in general that two parts you outline are perfect - essentially portions of common text that tie together the general goals and then some portion of the application that specifies the contributions/questions that your group will specifically be working on. The ratio of the two sections is up to you and I suppose depends on the degree of overlap - nothing fixed. There is a field in the application that will capture collaborators and link the applications so it should be all good.”

Raphael Gottardo (13:39:00): > So the response does mention the Collaborative network part as the way to do that. That’s what I understand from it.

Davide Risso (13:54:01): > I guess one question is whether the applications will be reviewed necessarily by the same reviewers

Davide Risso (13:54:21): > but I don’t think we know that

Aedin Culhane (18:24:38): > These are pre-proposals so they may keep the reviews in-house or among a small team.

Raphael Gottardo (18:43:16): > @Greg Finak have a look at the discussion on this channel.

Greg Finak (18:43:19): > @Greg Finak has joined the channel

Steve Tsang (19:01:24): > @Steve Tsang has joined the channel

2017-08-17

Aaron Lun (04:37:04) (in thread): > Jonah’s response confirms John’s initial query, that the shared text will be in the main proposal, part of the 1600 word limit. The collaborative network is a separate part of it that confirms that we are all working together. Yes, there will be redundancies, but if they didn’t want that, they shouldn’t have gone with the whole “each PI has to submit their own application” thing on one hand and encouraged collaborations on the other.

Aaron Lun (05:06:10): > In any case, I’ve spent some time trimming down the common text to keep it below 900 words. This should give us enough space (700 words) to discuss the proposed work in each of our individual applications; each of us will only be tackling a small part of the aim(s), so I can’t imagine there’s that much to say. As John said earlier, multiple applications are allowed, so you can also propose to develop your own methods independently of this joint proposal.

Kasper D. Hansen (08:21:39): > We could also have a nested structure. Some part everyone has and then some part only some groups have. For example a common section on HDF5 could be useful in >1 group. Perhaps this is overthinking it though

Aaron Lun (08:22:24): > That’s a good idea if the HDF5 people are up for it.

Kasper D. Hansen (08:24:16): > I suggest we all write, perhaps going over the 1600 words and then we look at stuff over weekend and fix things

Kasper D. Hansen (08:24:45): > On the assumption that cutting is easy (which is not always true)

Mike Smith (08:26:01): > I’ve got a paragraph on ‘Why HDF5?’ since writing about improvements for rhdf5 without mentioning it didn’t make a huge amount of sense. Happy for it to be included elsewhere when we review next week.

Aaron Lun (08:32:05): > I do think, though, that the current common section is pretty mature - Wolfgang, John and others (Martin and Aedin, I think?) have gone through it at least once, and I don’t want to add more text as this cramps up the space available for our individual applications.

Kasper D. Hansen (08:37:01): > If anything it should be made briefer

Kasper D. Hansen (08:37:13): > But let’s write the specifics and then assess

Davide Risso (09:21:16): > I’ve added a link to my specific proposal in the shared Google doc since I’m leaving tonight

Davide Risso (09:22:05): > Unfortunately I’ll have to miss Monday’s phone call, but @Stephanie Hicks will be there for our group

Davide Risso (09:24:09): > Our proposal is currently >900 words, but hopefully Kasper is right and cutting will be easy :slightly_smiling_face:

Raphael Gottardo (12:16:59): > 900 words is way too long. I think it should be 500 or less.

Davide Risso (12:33:53): > @Raphael Gottardo are you talking about the shared or individual part?

Raphael Gottardo (12:34:37): > The shared.

Aaron Lun (14:49:11): > Sub 800 now, which is pretty decent.

Raphael Gottardo (14:50:11): > To be honest, I will likely use a very trimmed down version of it.

Raphael Gottardo (14:50:27): > But it reads well as is.

2017-08-18

Aedin Culhane (10:48:54): > is the google doc link to the individual proposals?

Mike Smith (10:53:14): > The one pinned by @Aaron Lun is the ‘main’ version with the common text (+ other stuff) and then links to the separate proposals are appearing in the ‘Which PI wants to do what?’ section of that.

Aedin Culhane (10:53:38): > Thanks @Mike Smith

Aedin Culhane (17:09:28): > Maybe of interest to some people… webinar series from 10x genomics http://go.10xgenomics.com/l/172142/2017-08-07/cp15d

2017-08-19

Rafael Irizarry (13:54:39): > @Rafael Irizarry has joined the channel

Rafael Irizarry (14:10:30): > Hi all! sorry for joining late. I was hoping to contribute a proposal more on the methods side. Talk about how these methods are necessary to get usable data, but that without the proposed infrastructure it is impossible to make them practical. I’ll be checking things in soon.

2017-08-20

Rafael Irizarry (14:07:30): > Anybody know what “drop down list” means here: https://chanzuckerberg.com/initiatives/rfa?utm_source=CZI+Science+Grants&utm_campaign=6923dba4ab-RFA+2+Now+Open&utm_medium=email&utm_term=0_54d3511fe0-6923dba4ab-47278821&mc_cid=6923dba4ab&mc_eid=767c2e66b8? - Attachment (The Chan Zuckerberg Initiative): Request for Applications – The Chan Zuckerberg Initiative > The Chan Zuckerberg Initiative seeks applications for the development of computational tools, algorithms, visualizations, and benchmark datasets in support of the Human Cell Atlas.

Vince Carey (15:10:17): > when u apply to their site there will be a fixed set of options to define project focus; you must use one of their terms

2017-08-21

Aaron Lun (07:36:55): > @Martin Morgan Are we having a final meeting about this in the afternoon?

Martin Morgan (07:47:32): > yes at noon contact info is > > To join the Meeting: https://bluejeans.com/115420454 > To join via Room System: > Video Conferencing System: bjn.vc -or- 199.48.152.152 > Meeting ID : 115420454 > > To join via phone : > 1) Dial: +1.408.740.7256 (United States) +1.408.317.9253 (Alternate number) > (see all numbers - http://bluejeans.com/numbers) > 2) Enter Conference ID : 115420454 - Attachment (bluejeans.com): Blue Jeans Network | Video Collaboration in the Cloud > Blue Jeans Network - Interoperable, Cloud-based, Affordable Video Conferencing Service - Attachment (Blue Jeans Network): Audio Conferencing Using Telephone Numbers - BlueJeans > Make audio conference calls with BlueJeans using any of these telephone numbers. Enjoy one touch audio conferencing and experience easy to use conference calls.

Rafael Irizarry (13:20:45): > if the proposals have two PIs does it make sense to list them both in the main doc?

Stephanie Hicks (13:36:05) (in thread): > my understanding is that they want only one person listed as the PI per application https://chanzuckerberg.com/initiatives/rfa/faq - Attachment (The Chan Zuckerberg Initiative): RFA Frequently Asked Questions - Chan Zuckerberg Initiative > More about the Chan Zuckerberg Initiative’s RFA for Collaborative Computational Tools for the Human Cell Atlas.

Aaron Lun (15:01:54): > Note that, regardless of what happens, we will be submitting on Friday, not least because Monday is a bank holiday in the UK and there’d be no one to complain to if something went wrong.

2017-08-22

Keegan Korthauer (09:17:31): > @Keegan Korthauer has joined the channel

Aedin Culhane (12:34:43): > The HumanCellAtlas channel have announced creation of a scRNAseq benchmark dataset project. See the channel #benchmark https://humancellatlas.slack.com/archives/C2EP65G59/p1503403667000055

Aedin Culhane (12:35:19): > holger.heyn@cnag.crg.eu announced it

Nitesh Turaga (14:16:30): > @Nitesh Turaga has joined the channel

2017-08-23

Martin Morgan (07:13:03): > the revised draft of the first two ‘shared’ components is (are?) complete. Feel free to comment. I’ll work on the relatively short third component (summary for the main proposal) and citations this morning. https://docs.google.com/document/d/1ZOiPLDCoI97P0rH01HOk5KDnBL-NjqMtFKFumMXwXCI/edit?usp=sharing

Martin Morgan (07:14:25): > I’ve given the consortium a name “Statistical Analysis and Comprehension of the Human Cell Atlas with R / Bioconductor” and in the collaborative network section added titles associated with each pi; there are FIXMEs for @Wolfgang Huber @Vince Carey @Kasper D. Hansen.

Martin Morgan (11:09:23): > @Aaron Lun what’s the most authoritative way to cite your 10x exploration?

Aaron Lun (11:09:51): > Currently, it would be the beachmat bioRxiv paper (http://www.biorxiv.org/content/early/2017/07/24/167445). - Attachment (bioRxiv): beachmat: a Bioconductor C++ API for accessing single-cell genomics data from a variety of R matrix types > Recent advances in single-cell RNA sequencing have dramatically increased the number of cells that can be profiled in a single experiment. This provides unparalleled resolution to study cellular heterogeneity within biological processes such as differentiation. However, the explosion of data that are generated from such experiments poses a challenge to the existing computational infrastructure for statistical data analysis. In particular, large matrices holding expression values for each gene in each cell require sparse or file-backed representations for manipulation with the popular R programming language. Here, we describe a C++ interface named beachmat, which enables agnostic data access from various matrix representations. This allows package developers to write efficient C++ code that is interoperable with simple, sparse and HDF5-backed matrices, amongst others. We perform simulations to examine the performance of beachmat on each matrix representation, and we demonstrate how beachmat can be incorporated into the code of other packages to drive analyses of a very large single-cell data set.

Martin Morgan (11:11:47): > perfect, thanks!

Martin Morgan (11:18:20): > @Peter Hickey @Kasper D. Hansen is there a citation to your very large methylation analysis using DelayedArray ?

Kasper D. Hansen (11:18:34): > no, unfortunately

Kasper D. Hansen (11:20:19): > Well, depends on POV. You could put down “L Rizzardi∗, P Hickey∗, V Rodriguez, R Tryggvadottir, C Callahan, A Idrizi, KD Hansen†, AP Feinberg†. > Neuronal brain region-specific DNA methylation and chromatin accessibility are associated with neuropsychiatric disease heritability. > bioRxiv 2017.”

Kasper D. Hansen (11:20:32): > where we analyze matrices which are 28.M x 48

Kasper D. Hansen (11:20:46): > But the really large stuff we have been doing is not out yet

Martin Morgan (11:35:44): > ok I’ll use that anyway, as bigger than small thanks

Raphael Gottardo (13:13:10): > @Martin Morgan Thanks, it’s great. I have modified the summary to 1) remove references as they can’t be included there and 2) tone down the large-scale aspect of things (you made it sound like it was already great) as we want to further develop this as part of our proposed work.

Raphael Gottardo (14:27:25): > @Martin Morgan I have modified the summary again addressing your comments, let me know if you like it better.

Raphael Gottardo (21:06:46): > @Vince Carey @Wolfgang Huber @Kasper D. Hansen Can you guys add your titles to the google doc? Specifically in the collaborative network so that it can be added to all applications (including mine). Thanks!

Kasper D. Hansen (22:29:51): > Vince and I have entered titles; Wolfgang is probably sleeping

2017-08-24

Martin Morgan (07:06:46): > tagging @Mike Smith for a title, in case that’s a better channel to @Wolfgang Huber

Mike Smith (07:10:44) (in thread): > I don’t work at EMBL on Thursdays, but I sent a nudging email this morning when I saw the message from @Raphael Gottardo, and I’ll chase up later today.

Stephanie Hicks (10:24:01): > Is the plan still to have a : where the Common title is “Exploring the Human Cell Atlas with Bioconductor”? I see some applications have it, and others don’t.

Mike Smith (10:34:14) (in thread): > I’ve added a title for the Huber group. @Raphael Gottardo are you planning to link to your submission in the document?

Raphael Gottardo (10:47:03): > @Mike Smith Done, I hadn’t realized we were supposed to add a link.

Vince Carey (12:17:08): > Is our “focus area of project” “Computational Biology”? that is on the dropdown

Aaron Lun (12:18:29): > I think it would be closer to data infrastructure?

Aaron Lun (12:18:41): > There’s not much actual biology in our application.

Vince Carey (12:19:33): > hmm. maybe there is a bit more in ours that is devoted to cell and gene ontology. perhaps we do not need to select the same focus term

Raphael Gottardo (12:25:28): > I was thinking Data Science, since that term includes everything we do.

Vince Carey (12:41:33): > Note that the “approximately 250 word brief summary” in the fluxx portal is actually limited to 1750 characters. Use small words.

Martin Morgan (17:08:33): > #hca_rfa might be helpful in our proposals to emphasize ‘tertiary’ analysis tools using the ‘consumer API’ and working on ‘derived’ data, from https://docs.google.com/document/d/1gufaG-aRI0Zp98W5ztXVQknDFo1TBalBcyP1eG7JWig/edit (from https://drive.google.com/drive/u/0/folders/0B1PPPeXlIFfjTjJLR0toeWhPQ1k)

Martin Morgan (17:10:31): > Also somehow that the algorithms and analysis do not exist to provide a finished product (e.g., RESTful endpoints where one can perform machine learning algorithm X on arbitrary samples) for others to consume; the point of science / our proposal is to do new things… (oops, bit of a rant at the end there…)

Davide Risso (18:10:14): > Just a quick note that my institute is Weill Cornell Medical College (not School)

Davide Risso (18:10:41): > I’ve changed it in the common text, but if you copied an earlier version in your own application, please update, thanks :slightly_smiling_face:

Davide Risso (21:13:24): > We also (slightly) changed the title

2017-08-25

Aaron Lun (06:50:55): > We’ve submitted.

Martin Morgan (08:40:38): > @Aaron Lun thanks Aaron for your excellent work moving the 10x and SingleCellExperiment work forward, and then for getting this channel going and drafting / negotiating the common text; a really positive contribution!

Aedin Culhane (12:26:20): > Hi Martin, check the terms of the RFA. Tim mentions something about funding the purple/blue/green but not red (tertiary) at the moment. Rafa’s normalization, and much of the infrastructure might be part of the main data pipeline. Ontology could both feed the metadata (data ingestion) and tertiary - Attachment: Attachment > #hca_rfa might be helpful in our proposals to emphasis ‘tertiary’ analysis tools using the ‘consumer API’ and working on ‘derived’ data, from https://docs.google.com/document/d/1gufaG-aRI0Zp98W5ztXVQknDFo1TBalBcyP1eG7JWig/edit (from https://drive.google.com/drive/u/0/folders/0B1PPPeXlIFfjTjJLR0toeWhPQ1k)

Aedin Culhane (12:27:04): > “Gold” Standard test datasets; 13,000 human fibroblasts (in vitro), 6 batches, 800-5000 cells/batch: https://github.com/singlecell-batches/getting-started from olgabotvinnik http://www.olgabotvinnik.com - Attachment (GitHub): singlecell-batches/getting-started > getting-started - How to get started with the single cell batches comparison

Martin Morgan (13:44:35): > @Aedin Culhane there’s no mention of colors or secondary / tertiary in the RFA, and something like “The goal is to support a diverse set of well-validated tools to analyze, consume, integrate, and explore Human Cell Atlas data” sure sounds like what we’re proposing. > > A slide toward the end of the deck at https://drive.google.com/drive/u/0/folders/0B5bKxHHklLF1MFN2XzZqR0FaTTA says “We are not tertiary analysis”, which I read as saying the DCP won’t do tertiary analysis; I don’t view this (am I wrong?) as commentary on funding. > > The RFA says “[CZI engineers will] enhance and package their tools, and link them to the Human Cell Atlas Data Coordination Platform (DCP), if appropriate and desired”. > > The FAQ says “We are excited to help grantees integrate their work into the platform wherever useful, and new portals are certainly one possible result of funded projects” and ‘portals’ is the word used to describe tertiary analysis (red box). > > So I don’t see evidence that tertiary analysis is not desirable (and it’s too late anyway… :disappointed:)

2017-08-26

Davide Risso (18:51:13): > Hi all, I didn’t really want to work on anything serious this afternoon so I’ve put together a simple schema of the scRNA-seq Bioconductor packages + infrastructure to add as a figure in my proposal. In case anyone wants to reuse it / take inspiration from it, I’m uploading the Illustrator file. I’ve highlighted the packages of my specific part, but that’s easy to change. Also, let me know if I forgot any important package!

Davide Risso (18:51:41): > @Davide Risso uploaded a file: schema_hca.ai - File (Illustrator File): schema_hca.ai

Kasper D. Hansen (20:53:08): > looks good, except the lines from the green circles look like they go into specific layers of the blue one

Kasper D. Hansen (20:54:14): > I would definitely link the green infrastructure bubble to the top layer (qc/normalization) and the second bubble to both dimensionality reduction and clustering

Kasper D. Hansen (21:04:27): > and perhaps also to QC/norm

Davide Risso (22:07:43): > Ah! I didn’t think about that… I’ll modify it tomorrow! - Attachment: Attachment > looks good, except the lines from the green circles looks like they go into specific layers of the blue one

2017-08-27

Kasper D. Hansen (14:40:53): > make them arrows perhaps

Davide Risso (16:14:10): > I’ve made two more versions

Davide Risso (16:14:23): > @Davide Risso uploaded a file: schema_hca.ai - File (Illustrator File): schema_hca.ai

Davide Risso (16:14:41): > @Davide Risso uploaded a file: schema_hca2.ai - File (Illustrator File): schema_hca2.ai

Kasper D. Hansen (19:56:26): > I like the arrows. I would put an arrow from second bubble into first step (QC), since that bubble also has stuff like delayed matrix

Kasper D. Hansen (20:37:50): > @Kasper D. Hansen uploaded a file: schema_hca.ai - File (Illustrator File): schema_hca.ai

Stephanie Hicks (20:43:11): > I agree it’s technically more correct, however I feel like it’s a lot of arrows floating around…

Kasper D. Hansen (20:44:42): > it shows how improvements in those packages affect the entire stack

Stephanie Hicks (20:47:43): > yeah, I know. And it says “workflow” on the side, so maybe there should be arrows, but just adding my two cents that my initial reaction when I saw the schematics was that “woah that’s a lot of arrows” (combo of both blue and green arrows). Felt a bit distracting.

Kasper D. Hansen (20:49:51): > one could also have a single green arrow from both green bubbles to the blue one, just signifying it inputs into the stack

Stephanie Hicks (21:00:17): > definitely. curious to hear other people’s input too before making changes

Kasper D. Hansen (21:01:28): > fair chance that we two are the only ones using it

2017-08-28

Davide Risso (08:47:39): > I agree with Stephanie that the arrows are a bit distracting

Davide Risso (08:47:52): > That’s why I tried the version with no arrows at all

Davide Risso (08:48:33): > Which I actually like better…

Davide Risso (08:49:12): > But I see your point

Kasper D. Hansen (08:49:33): > It depends on what the purpose is. Remember, a big part of our work is providing the green bubbles. Nice to show that the green bubbles are essential to multiple stages of the stack

Davide Risso (08:50:28): > Yes, I see that.. perhaps I’ll try one last version with less ‘invasive’ arrows and see how it looks

Davide Risso (08:50:37): > And then we can make a decision

Kasper D. Hansen (08:51:06): > I agree aesthetically it is less pleasing for sure

Davide Risso (09:43:42): > @Davide Risso uploaded a file: schema_hca.ai - File (Illustrator File): schema_hca.ai

Davide Risso (09:43:47): > smaller arrows

Davide Risso (09:45:30): > I still prefer the no arrows version, but this seems less distracting than the previous version

Davide Risso (09:46:01): > should we make a final decision? Stephanie and I are almost ready to submit!

Stephanie Hicks (09:47:23): > Thanks Davide! I think I prefer the no arrows, but would you be willing to make one more version with just one green arrow (Kasper’s suggestion from last night)

Stephanie Hicks (09:47:25): > ?

Davide Risso (10:00:14): > Sure, but I’m not sure I fully understand. One green arrow to the top of the blue circle, but from which green bubble?

Davide Risso (10:00:57): > From both?

Davide Risso (10:01:05): > So two arrows?

Davide Risso (10:01:28): > Or from infrastructure to algorithms and from algorithms to the top of the blue?

Stephanie Hicks (10:05:40): > Though to be fair, I’m not in love with that idea either.

Davide Risso (10:16:37): > @Davide Risso uploaded a file: schema_hca3.ai - File (Illustrator File): schema_hca3.ai

Davide Risso (10:16:42): > what about this?

Kasper D. Hansen (10:20:21): > I think I’ll go with the multiple arrows. And that is just because my stuff is mostly about justifying why it’s a good idea to do what I am going to do

Kasper D. Hansen (10:20:37): > since it connects all over

Kasper D. Hansen (10:20:57): > But I don’t think we need the same figure, and the new one is more pretty :slightly_smiling_face:

Kasper D. Hansen (10:21:43): > Also, are you sure you have made the point in your proposal about extreme (millions to billions) scalability. I think that is important to make clear, otherwise you’ll get “doesn’t this already exist”

Stephanie Hicks (10:35:43): > Thanks Davide. I think I still prefer the no arrows, but I agree with Kasper that we don’t necessarily need the same figure. I’ll leave it up to you though.

Stephanie Hicks (10:37:39): > Good point @Kasper D. Hansen. I think we did address that, but always worth going back to stress that even more.

Davide Risso (10:42:01): > Yes, I’ll make sure to stress even more the extremely large scale of the data

Aedin Culhane (11:09:10): > @Davide Risso please can we add a “button” or mention of the ontology component that Vince and I have?

Aedin Culhane (11:09:23): > Shall I edit the schema, or do you want to do it?

Davide Risso (11:10:32): > If you don’t mind editing the schema yourself, that would probably be better, since I didn’t have much time to look at what you proposed in terms of ontology

Davide Risso (11:10:49): > and I’m afraid to misrepresent it

Aedin Culhane (11:11:00): > Ok. Which version should I edit. I know there was “arrows/no arrows” debate

Aedin Culhane (11:11:35): > I have a meeting now. But will do this when I’m back (12:30)

Davide Risso (11:11:39): > Well… I think that @Kasper D. Hansen will use the multi arrow version and @Stephanie Hicks and I will use the no arrow version

Davide Risso (11:12:03): > but if you modify the one that you prefer, I can port the changes to the other version

Aedin Culhane (11:12:17): > ok

Kasper D. Hansen (13:07:18): > What do / did people use as “focus area”?

Vince Carey (13:07:54) (in thread): > we used data science

Stephanie Hicks (13:08:09) (in thread): > we used data science too

Aedin Culhane (14:20:52): > @Aedin Culhane uploaded a file: schema_hca3_v2b.ai and commented: Suggestion b - File (Illustrator File): schema_hca3_v2b.ai

Aedin Culhane (14:21:15): > @Aedin Culhane uploaded a file: schema_hca_v2a.ai and commented: suggestion a - File (Illustrator File): schema_hca_v2a.ai

Aedin Culhane (14:21:52): > I added the onto support onto both the mickey mouse and arrows versions of the schema_hca files.

Kasper D. Hansen (15:07:05): > technically we should make all the package names in the green blobs blue. I think Davide did what he did to highlight his work

Aedin Culhane (15:08:27): > Thanks for explaining the color scheme. Let me know if you want to edit or if you want me to edit

Kasper D. Hansen (15:08:41): > I am just guessing the color scheme to be honest

Davide Risso (15:29:22): > Yes. I just highlighted in colors the packages that Stephanie and I will contribute / enhance in our part of the proposal

Davide Risso (15:29:52): > the idea was that each individual PI would highlight their own specific contributions

Davide Risso (15:30:09): > in their version of the figure

Davide Risso (15:48:06): > OK, I’ve updated my figure to add RestfulSE and OntoPlus and I’m now ready to submit!

Kasper D. Hansen (16:18:03): > submitted

2017-09-24

hcorrada (19:59:35): > @hcorrada has joined the channel

2018-01-18

Martin Morgan (12:52:03): > <!channel> The document is at https://docs.google.com/document/d/1NbHof0Uh4aCC6UbMDXlUOtEgnu4TrzmD7O6KfnnelL4/edit?usp=sharing. I’ve provided overall structure and copied relevant sections from my proposal in, and will edit somewhat the content. It would be great for y’all to do the same, focusing especially on punchy and doable ‘Deliverables’ coupled with fleshed-out but punchy aims 3-5 and then more extensive ‘Proposed work and deliverables’. The structure for the aims reflects our telephone conversation, with responsibilities annotated in the ‘Deliverables’ section. There is no page limit; we want to keep things realistic.

Martin Morgan (12:53:38): > @Aaron Lun feel free to revise Aims & deliverables 1 & 2

Davide Risso (14:23:48): > Thanks @Martin Morgan I should have some time tonight or tomorrow to go through the text and hopefully contribute something

Aaron Lun (15:23:11): > @Martin Morgan Should the beachmat/SingleCellExperiment stuff be a separate aim? Or can I stuff it into Project Aim 2 (with a title change to “Develop standard representations of large single-cell data in semantically rich and established R / Bioconductor objects.”)?

Aaron Lun (15:32:12): > Also, any preference on how to handle everyone’s citations?

Aaron Lun (15:36:03) (in thread): > To that end I tweaked Project Aim 2.

Martin Morgan (15:42:45) (in thread): > yes that’s better I think

Martin Morgan (15:43:19): > I’m actually ignorant of best practices for reference management in google docs…

Aaron Lun (15:49:10) (in thread): > I think we’ll have to consider breaking up project aim 2 into sub aims/deliverables (e.g., as done for aim 4), as there’s going to be at least three major things: HDF5Array/DelayedArray, SingleCellExperiment/beachmat, and HDF5 improvements/rhdf5.

Martin Morgan (15:54:38) (in thread): > For the Deliverables section I think we should keep it as simple and ‘achievable’ as possible, so that the end of the funding period the deliverable can be marked as ‘done’. So for instance “Incremental improvements to packages for … such as x, y” is easier to feel good about ticking off than “improve package x”, “improve package y”. Breaking them out in the Proposed work section (e.g., as paragraphs for each) seems legit

Raphael Gottardo (16:55:13): > I use paperpile, which is really nice.

2018-01-19

Kasper D. Hansen (08:45:04): > agree with Gottardo

Davide Risso (15:54:55): > I’ve started adding text to Project Aim 4, mainly copy-pasting from my proposal and Kasper’s. @Kasper D. Hansen I hope you’re fine with me reusing your text!

Davide Risso (15:55:13): > @Raphael Gottardo feel free to add the relevant parts of your proposal

Kasper D. Hansen (20:18:21): > of course

2018-01-22

Aaron Lun (06:45:24): > I’ve cleaned up proposal 2, missing a bit about rhdf5 though @Mike Smith

Raphael Gottardo (15:13:28) (in thread): > Ok, I have added some text. I have also added some text about HDF hsds

2018-01-24

Mike Smith (06:36:00) (in thread): > rhdf5 paragraph added

Raphael Gottardo (18:23:57): > @Martin Morgan What’s the status of the proposal? What’s our deadline for finishing it?

Martin Morgan (20:05:14): > @Raphael Gottardo I’ll try to revise the proposal tomorrow (@Vince Carey @Aedin Culhane @Rafael Irizarry we need some material from your projects). The due date for the proposals is the 2nd (next Friday), so I’d like to go into the weekend with a complete draft, and come out on Monday with a final version. Remember that each individual group needs to prepare a budget and some group-specific things, and submit to CZI independently.

Kasper D. Hansen (22:41:01): > Any idea on admin involvement in the proposal? I realize with dread that it might have to go through ORA?

Kasper D. Hansen (23:16:21): > Looking at the online form. Every project needs a clear “Deliverable” separate box at upload

Kasper D. Hansen (23:16:51): > Also, it says “Please also confirm that you can attend at least 2-3 meetings and hackathons, both in person and remote, with the whole group as well as with smaller subgroups of collaborators working on similar projects.”

Kasper D. Hansen (23:17:15): > That’s a fair amount of travel it seems. From Jeremy, I understand that 1st meeting is paid for by CZI.

Raphael Gottardo (23:19:52): > It’s your entire budget right there!

2018-01-25

Aaron Lun (09:28:38): > I’ve just set up paperpile.

Kasper D. Hansen (09:32:57): > at least - given it is a foundation grant - I assume it is easy to move $$ between budget categories.

Aedin Culhane (13:09:44): > @Martin Morgan I merged in our original project aims and fixed the references (very easy with paperpile ;-)) Thanks @Raphael Gottardo. I’ll chat with Vince about what needs to be revised

Martin Morgan (21:52:44): > There are some missing sections >
> - all: oops, I should have asked for short synopsis of ‘Prior contributions and preliminary results’ > - @Rafael Irizarry Deliverables, Project Aim 3, Proposed work > - @Aedin Culhane @Vince Carey Deliverables > - @Davide Risso paperpile references. Can BigDataAlgorithms be named something more HCA-specific, in line with reviewer comments to avoid generic functionality? > > The proposal reads like there is way more work than we can possibly deliver – Can each section, especially Aim 3, prioritize? I view this as a contract, and since we are writing it ourselves we should set out work that we (a) want to and (b) realistically can accomplish in the short funding period.

Kasper D. Hansen (22:11:30): > I fixed some of the references for Davide.

Kasper D. Hansen (22:11:55): > I think we should rename - at least in the grant - BigDataAlgorithms to scMatrixAlgorithms

Kasper D. Hansen (22:12:10): > We can always release package under a different name I think

Kasper D. Hansen (22:12:52): > I agree with @Martin Morgan comments on feasibility. In some sense we want to promise the minimal we can deliver and still make them happy

Kasper D. Hansen (22:17:08): > @Davide Risso I think we want to promise SVD/PCA - we know we can do this in theory. This will not be trivial to make fully scalable and integrated with HDF5. We also want to do more, but I think we should leave the door open. For example mention a few and say that the order in which we work on these will be determined by their relative importance to HCA

Kasper D. Hansen (22:17:32): > Unless there are some you know you want to look at

2018-01-26

Aedin Culhane (00:31:08): > Hi Kasper and Davide. Seurat uses the R package irlba in their function RunPCA. Since the group are familiar with Seurat, it might be worth mentioning advantage over irlba, or at least say you’ll compare to irlba https://www.rdocumentation.org/packages/Seurat/versions/2.2.0 - Attachment: Attachment > @Davide Risso I think we want to promise SVD/PCA - we know we can do this in theory. This will not be trivial to make fully scalable and integrated with HDF5. We also want to do more, but I think we should leave the door open. For example mention a few and say that the order in which we work on these will be determined by their relative importance to HCA

Aedin Culhane (00:32:53): > Have you reached out to the Seurat team?

Davide Risso (06:56:16): > @Kasper D. Hansen I think yours is a good point. I will edit the related sections to reflect this.

Davide Risso (07:01:24): > @Aedin Culhane I know Rahul, the main author of Seurat, and I can reach out to him if needed. But I’m not sure what is the advantage of talking to them at this stage. What do you have in mind?

Davide Risso (07:07:15): > As far as I understand, the irlba package implements a Lanczos bidiagonalization method, which we mention in our proposal. The irlba package implements it for (in-memory) dense and sparse matrices. We propose to implement it for out of memory data.
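
A minimal sketch of the out-of-memory idea in the message above, under loud assumptions: it is written in pure Python rather than R, it uses plain power iteration on AᵀA as a simpler stand-in for the Lanczos bidiagonalization that irlba implements, and `iter_row_chunks`, `gram_matvec`, and `top_singular_value` are hypothetical names that come from neither irlba nor the proposal. The point it illustrates is only that the matrix is touched one block of rows at a time, so the blocks could equally be read from a file-backed store.

```python
import math
import random


def iter_row_chunks(matrix, chunk_rows):
    """Yield consecutive blocks of rows; a stand-in for reading blocks from disk."""
    for start in range(0, len(matrix), chunk_rows):
        yield matrix[start:start + chunk_rows]


def gram_matvec(chunks, v):
    """Compute (A^T A) v one row block at a time, without needing all of A at once."""
    result = [0.0] * len(v)
    for block in chunks:
        for row in block:
            # (row . v) is one entry of A v; adding row * (row . v) accumulates A^T (A v).
            dot = sum(a * b for a, b in zip(row, v))
            for j, a in enumerate(row):
                result[j] += a * dot
    return result


def top_singular_value(make_chunks, n_cols, n_iter=200, seed=0):
    """Power iteration on A^T A: returns (leading singular value, right singular vector).

    make_chunks is a zero-argument callable returning a fresh iterator over row
    blocks, so each pass re-reads the data instead of caching it.
    """
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    sigma = 0.0
    for _ in range(n_iter):
        w = gram_matvec(make_chunks(), v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
        # For a unit vector v, ||A^T A v|| approaches the largest eigenvalue sigma^2.
        sigma = math.sqrt(norm)
    return sigma, v


# Usage: a tiny 3 x 2 matrix processed two rows at a time.
toy = [[3.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
sigma, v = top_singular_value(lambda: iter_row_chunks(toy, 2), n_cols=2)
print(sigma, v)
```

Each iteration makes one full pass over the data, so a method that converges in fewer passes (such as Lanczos-type methods) matters more as the matrix grows.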

Davide Risso (07:07:49): > I agree though that we could explicitly mention the Seurat package, since the CZI is familiar with it

Aaron Lun (07:34:43): > A number of single-cell packages use irlba. monocle and scran both use it, so there’s nothing particularly special about Seurat in this regard. Any improvements to these approximate PCA methods would benefit all of us.

Davide Risso (07:48:41): > Good point! I will add a sentence highlighting this.

Kasper D. Hansen (21:12:16): > Does anyone know the start and end date of the grant?

Kasper D. Hansen (21:12:32): > I mean, I know it apprx. but not sure it is still current

Kasper D. Hansen (23:19:23): > everyone: should we have clear deliverables at the end of each aim? We need them for upload.

Kasper D. Hansen (23:19:45): > @Davide Risso we should chat about our deliverables.

2018-01-27

Martin Morgan (08:34:58): > The overall structure is from the instructions; I think one set of deliverables at the top, elaborated (and with no feature creep) below in the aims is appropriate. For the size and duration of the grant the punchy deliverables at the top are more than enough – we’ll have five.

2018-01-28

Aaron Lun (11:29:53) (in thread): > I’ve added our prior contributions, currently cutting down on section 2.

Aaron Lun (11:44:44) (in thread): > Actually, cutting section 2 is pretty hard. I’m happy to triage the SingleCellExperiment work and merge it with one of the previous paragraphs.

Aaron Lun (14:16:21) (in thread): > @Martin Morgan Do you want me to help with any editing?

Aaron Lun (14:22:16) (in thread): > Well, I’m going home now, but I can pick up tomorrow.

2018-01-29

Martin Morgan (07:17:47): > The text is ‘complete’ but > > - There are some references (bioRxiv and maybe some Bioc packages) that need to be paperpiled; search the text for [ (mostly from the Risso proposal). I think these need to be entered into paperpile ‘by hand’? > > - Incorporating a figure or two might be useful, e.g., Kasper’s version of the overall structure > > - I fleshed out the ‘prior work’ section to cover all participants, but now it occupies a lot of space relative to the body of the proposal. I propose that this section be revised as outlined in the doc, and will do so (feel free to contribute yourselves) later today unless there are serious objections.

Kasper D. Hansen (08:28:48): > From skimming it, it looks good. I dont

Kasper D. Hansen (08:29:07): > I don’t think it needs extensive editing for shortness.

Kasper D. Hansen (08:29:22): > or rather said, is that worth the effort given unlimited space

Aaron Lun (09:54:05): > Do we even need to have explicit references for Bioconductor packages?

Aaron Lun (09:57:42): > Adding them manually would be a real pain.

Martin Morgan (09:58:57): > My thought was that this ends up on a public repository somewhere, and one could use this material to promote the existing project infrastructure as much as to win the grant

Aaron Lun (10:02:34): > Can’t figure out how to insert multiple authors - it keeps mushing “Davide Risso” with “Aaron Lun” to form all sorts of weird combinations.

Aaron Lun (10:02:59): > Finally got it.

Kasper D. Hansen (10:21:15): > Fixed the references in Aim 4

Kasper D. Hansen (10:21:24): > Any other packages which needs to get fixed?

Kasper D. Hansen (10:21:30): > package citations that is

Kasper D. Hansen (10:22:42): > I searched for cite. If anyone has anything, I figured it out, so comment here

Aaron Lun (10:23:06): > I’m going through and adding package citations for Davide’s paragraph.

Kasper D. Hansen (10:23:17): > I did it

Kasper D. Hansen (10:23:34): > we don’t need to add the package citations for our own stuff

Kasper D. Hansen (10:23:47): > but we cited irlba which is relevant because its used by seurat etc

Aaron Lun (10:24:05): > He has a number of citations in the prior contributions that are still “[”.

Kasper D. Hansen (10:24:12): > oh

Kasper D. Hansen (10:24:19): > I was taking aim 4

Aaron Lun (10:24:30): > Good that we’re not doing redundant work, then.

Aaron Lun (10:25:51): > Though I guess a lot of these prior contributions stuff will get deleted in the clean-out.

Aaron Lun (10:28:36): > I think we’ve fixed all the citations, I’ll leave the reconfiguring of the prior work to Martin.

Kasper D. Hansen (10:32:31): > Did you rerun the references?

Aaron Lun (10:33:03): > Yep.

Kasper D. Hansen (10:39:10): > great

Kasper D. Hansen (10:42:54): > put in an placeholder in grant title so we don’t get the same title.

Vince Carey (15:29:21): > What do we insert for ? In the title it might be the submitting institution name? Or the PI?

Kasper D. Hansen (15:55:53): > I was thinking PI but I have not gone through and checked everywhere

Kasper D. Hansen (15:56:14): > I can do that more systematically tonight

Raphael Gottardo (16:04:50): > Do we really need to insert something? They know what we’re doing and it would be pretty obvious who the PI/institution is. So my suggestion is to leave the title as is.

Raphael Gottardo (16:08:45): > Can someone slack everyone here once the final version is ready, so that we can be sure to all upload the same.

Kasper D. Hansen (16:25:02): > There are many INDIVIDUAL placeholders in the proposal. I was just trying to make sure they don’t have a unique(title) check

Kevin Rue-Albrecht (16:27:47): > @Kevin Rue-Albrecht has left the channel

Davide Risso (17:22:59): > Sorry I was traveling today and didn’t see the slack notifications

Davide Risso (17:23:11): > Thanks for fixing the references!

Martin Morgan (21:01:15): > The final PDF for part 1 is at https://drive.google.com/file/d/1HTmxnyV6En6cwntglolO_dnKZNg0hkeb/view?usp=sharing. The cut-and-paste material for the other parts are in the google doc, but below the main proposal; I changed the capabilities to ‘can comment’ but only for sanity sake, happy to open it up again if desired

2018-01-30

Aaron Lun (04:14:03): > Great work, Martin. Looks really nice.

Martin Morgan (04:43:22): > A huge part of the credit goes to you, Aaron, thanks! And of course to the others in the group!

Mike Smith (08:01:57): > It looks great. Thanks everyone!

Kasper D. Hansen (12:09:39): > Reading instructions since I am partly uploading. In part I they say “Please also mention any specific coordination planned with your collaborators, either as originally planned or updated based on discussions with your Science Officer as appropriate.”

Kasper D. Hansen (12:09:54): > Should we have a section on coordination?

Kasper D. Hansen (12:10:10): > Or should we just ignore

Aaron Lun (13:02:52): > If you were to write anything, it sounds like it would be about specific collaborations between you and other groups on particular things? e.g., you and Davide on the algorithms, for example.

Aaron Lun (13:03:10): > For us (Marioni), I guess it would be with Wolfgang and Martin.

Aaron Lun (13:03:33): > Not sure that we’d need a common text for that section, because it will be very group-dependent.

Aaron Lun (13:04:21): > (We haven’t started the upload yet - still waiting for a signature from EMBL.)

Raphael Gottardo (13:05:13): > I would leave it blank since they already know about our collaboration. I think this would be if you were to collaborate with others that are not funded through this effort.

Kasper D. Hansen (13:50:55): > point taken. I was thinking about mentioning slack channel, meetings, past collaborations. We have a strong existing network. But they also kind of know that and since we are funded they can always ask how we plan to do the coordination

Martin Morgan (13:58:08): > If no one has pulled the trigger yet I can add a short paragraph and regenerate the PDF. I’ll do it by 5pm Eastern.

Davide Risso (14:59:26): > What did we decide to do for the title? Should we include or not the contribution?

Davide Risso (15:00:01): > I’m fine with adding the paragraph

Davide Risso (15:00:36): > I will also read the proposal one more time and make sure to send any comment before 5pm

Davide Risso (15:09:42): > Sorry, I meant: it is fine with me to add the paragraph - Attachment: Attachment > I’m fine with adding the paragraph

Kasper D. Hansen (16:19:18): > I’m in favor of adding name to title, but that is not surprising I guess. I don’t see it matters too much and it’s easy to do

Kasper D. Hansen (16:19:49): > on the other hand we list PI on submission, so perhaps it is irrelevant

Kasper D. Hansen (16:19:59): > Ok, I’ll trust the more exp. people who says it doesn’t matter

Kasper D. Hansen (16:20:06): > scrap the idea

Martin Morgan (16:36:56): > There are two short and relatively lame paragraphs on p. 10 describing collaboration. Updated PDF at https://drive.google.com/open?id=1HTmxnyV6En6cwntglolO_dnKZNg0hkeb

2018-01-31

Martin Morgan (07:01:06): > Wolfgang made many small ‘wordsmithing’ comments that nonetheless improve the document. I accepted the comments with one outstanding (for @Aaron Lun) and I’ll do the very final-final version of the document by 10:30 Eastern this morning. Sorry that the goal line is shifting a bit at the end.

Aaron Lun (07:42:05): > Modified text in response to the comment.

Martin Morgan (10:33:28): > ok I changed the link permissions to ‘view’! the pdf is updated at the google drive link above.

Aedin Culhane (16:24:59): > Summary of the recent HCA DCP Quarterly meeting (From HCA Slack group). https://docs.google.com/presentation/d/1G1QV5fRnWG7etlKvF5DCXdADqnQ9gVgyH9O94tcscf0/edit#slide=id.p Meeting materials https://drive.google.com/drive/u/0/folders/0B8_GJ4pSlhgxQUFvVExoWDhndnc

Vince Carey (18:59:24): > Do we have common language for “Project purpose” (255 char)? Our admin is doing the upload and has asked for this

2018-02-01

Martin Morgan (05:05:19): > @Vince Carey Project purpose is in the google doc https://docs.google.com/document/d/1NbHof0Uh4aCC6UbMDXlUOtEgnu4TrzmD7O6KfnnelL4/edit?usp=sharing after the references

Aaron Lun (05:50:16): > Submitted.

2018-02-02

Davide Risso (11:40:38): > I’m ready to submit too! I will do it by the end of the day!

Kasper D. Hansen (11:44:23): > me too

Raphael Gottardo (11:59:37): > Submitted yesterday!

2018-02-05

Aedin Culhane (12:11:54): > from PQG email list : Do you have junior faculty members (Assistant/Associate Professors) in your organization that would benefit from a novel, interdisciplinary opportunity to collaborate? Please encourage them to apply for the 2018 Data Science Innovation Lab: “The Mathematics of Single Cell Dynamics” taking place June 25th-29th, 2018 at the Riverhouse on the Deschutes in Bend, OR. The application deadline is Feb 28th, 2018 11:59PM Eastern Time, after which candidate selection process will commence, with final selection and notification of our 2018 Data Science Innovation Lab Fellows is intended by April 30th, 2018. > >
> > The intent of the 2018 Data Science Innovation Lab framework is to foster the formation of new interdisciplinary collaborations which will (but not be limited to) generate creative strategies for addressing challenges associated with the mathematical approaches towards quantifying single cell heterogeneity in situ and developing novel tools for visualization. Such challenges arise from multifaceted data structures like single cell high throughput data sets and spatio-temporal images, sparse or missing data, streaming of non-stationary time series data, the need for integration from multiple sources of data, cellular environment effects, perturbative single cell sampling, etc. This Data Science Innovation Lab is intended to bring together expertise from the mathematical, statistical, and biomedical fields, to address interdisciplinary topics in biomedical data science critical to the effective use of single cell multi-spatio-temporal data. > >
> > Early-career investigators (Assistant/Associate Professors) from a broad diversity of quantitative (Mathematics, Statistics, Biostatistics and Computer Science) and biomedical (behavioral, biology, biophysical, clinical science, ecology, and epidemiology ) disciplines are highly encouraged to apply. However, application to the event is open to any biomedical investigator who has research questions with an associated single cell big data challenge/acquisition or any quantitative investigator with relevant approaches and methodology to the analysis/quantification of single cell big data. Competitively selected participants will take part in this mentored, facilitated five-day residential workshop to form new interdisciplinary teams to tackle these data science challenges. At the end of the workshop, the teams will have developed the foundation for a novel research proposal suitable for submission to the NIH or NSF to compete for potential funding. > >
> > The 2018 Data Science Innovation Lab is being organized by the Big Data to Knowledge (BD2K) Training Coordination Center with the help of KnowInnovation, Inc. and is supported by the National Institutes of Health and the National Science Foundation. For more information about the 2018 Data Science Innovation Lab and the application process, please visit the website http://bigdatau.org/innovationlab2018 Specific questions can be referred to bigdatau@ini.usc.edu

2018-02-21

Aedin Culhane (14:26:29): > ScRNA intensive five-day residential workshop https://bigdatau.ini.usc.edu/innovationlab2018

2018-03-08

Davide Risso (10:17:13): > Hi all, should we organize a phone call with all the groups to start coordinating on the deliverables for the CZI project? In particular for our group, it would be extremely useful to talk to @Kasper D. Hansen and @Raphael Gottardo to decide on how to divide the work and on practical things: e.g., one package vs multiple packages, etc.

Davide Risso (10:17:26): > Are you guys available for a phone call next week or so?

Davide Risso (10:17:37): > Pinging @Stephanie Hicks and @Elizabeth Purdom too

Davide Risso (11:15:08): > BTW I just found out that there’s a HCA meeting going on now

Davide Risso (11:15:28): > live stream: https://www.youtube.com/watch?v=Y6nESW9p2k0

Vince Carey (11:27:31): > There are apparently 80 watching – could that be a limit? I only see a single slide and no audio

Stephanie Hicks (11:36:18): > Same here. Except I think it’s starting up again

Vince Carey (11:36:21): > audio just began … crowd noise

Vince Carey (11:36:26): > video on

Stephanie Hicks (13:37:55): > live stream is now discussing “The Data Portal” to query, download and access HCA data

Kasper D. Hansen (15:07:13): > Was this announced on the HCA slack channel? The communication completely sucks

Stephanie Hicks (15:14:56): > not sure. Interestingly, the last question/complaint at the end of the session was about too much communication on the HCA slack channel. the audience member requested a quarterly summary be provided

Aaron Lun (15:15:48): > Is anyone else going to the CZI meeting in SF in April?

Stephanie Hicks (15:19:02): > @Davide Risso and I RSVP-ed. Still tentative if I can go though.

Aaron Lun (15:20:00): > oh good.

Peter Hickey (15:20:36): > i’m probably going

Davide Risso (15:20:49) (in thread): > Yes, this morning. That’s how I found out. As far as I know there was no announcement of the conference ahead of time in the slack, but I may just have missed it…

2018-03-09

Kasper D. Hansen (07:33:37) (in thread): > what channel? I am clearly not in the right channels and there is a million now

Kasper D. Hansen (07:34:06): > I’m planning on going

Vince Carey (07:39:49): > me too

Elizabeth Purdom (10:10:54): > @Elizabeth Purdom has joined the channel

Martin Morgan (18:23:06): > I’ll be there too. If you’d like to schedule a meeting, @Davide Risso, I suggest adding @channel to get everyone’s attention

2018-03-14

Daniel Van Twisk (13:59:46): > @Daniel Van Twisk has joined the channel

2018-03-23

Aedin Culhane (13:48:02): > Hi. What are the main competitors to 10x? Also, which scRNAseq approaches retain spatial info?

Aaron Lun (14:33:08): > Main competitors to 10x in droplet-based methods would be inDrop and Drop-seq, but 10X seems to have crushed them; the Chromium system is much easier to use, albeit more expensive and difficult to customize and/or trouble-shoot. More generally for high-throughput scRNA-seq, there is seq-well, microwell-seq and combinatorial indexing (sci-seq), which have varying claims of throughput and cost improvements over 10X. Though it’s fair to say that 10X has a dominant market position right now.

Aaron Lun (14:37:23): > Regarding scRNA-seq approaches retaining spatial information: FISSEQ seems to be the closest to what you’re asking for. Otherwise, I don’t think there are any dissociation-based methods that preserve spatial info, at least without having a separate tagging step to record the original position of the cell (e.g., PMID: 27198043). Of course, there are also seqFISH and merFISH, which can handle 100-1000 genes, last I heard.

Stephanie Hicks (21:25:05): > @Aedin Culhane I was at a meeting this week and Emily Alden from Harvard presented a short talk on an experimental technique to preserve the spatial structure and quantification of mRNA (https://biostat.wustl.edu/dacc/wp-content/uploads/2018/03/2018-NHGRI-Annual-Meeting-Agenda-3-8.pdf). I did a quick google search and couldn’t find a paper, but I can only assume it’s coming out soon.

2018-03-30

Matt Ritchie (07:49:50): > @Matt Ritchie has joined the channel

2018-04-09

Davide Risso (09:40:32): > Hi @channel! I believe that you also received the email from Jeremy regarding our 30 mins presentation at the Santa Cruz meeting. How do we want to organize this? Are people available for a quick phone call this week?

Martin Morgan (09:43:30): > @Davide Risso Sounds like a good idea; want to set up a doodle poll? Most days / times other than Thursday (all day) and 2pm (all days) work for me.

Davide Risso (09:45:14): > Ok, will do!

Kasper D. Hansen (09:53:06): > should we frame the discussion first on slack?

Kasper D. Hansen (09:53:35): > I think we should have few presenters; perhaps one.

Davide Risso (09:54:03): > Perhaps we can start with a list of who is going?

Kasper D. Hansen (09:57:57): > From my group: myself and Pete Hickey

Davide Risso (10:02:54): > Ok, I’ve created a google doc in which to add who is going to the meeting and with a rough draft of the presentation outline (obviously very preliminary but what I could think of in 2 mins): https://docs.google.com/document/d/1-46wwobIMR2RxBs1T7Tkin53Dou50mirzGXkyr5oVuQ

Kasper D. Hansen (10:03:41): > could you give permission to kasperdanielhansen@gmail.com

Davide Risso (10:04:10): > Sure

Kasper D. Hansen (10:06:10): > If “sure” means “I’ve done it”, I can report it doesn’t work

Davide Risso (10:07:07): > No, “sure” means I’m trying to do it but I currently only have my phone and I can’t seem to make it work

Davide Risso (10:07:37): > For some reason only people from Berkeley can edit now :thinking_face:

Davide Risso (10:08:42): > Can you try now?

Kasper D. Hansen (10:08:53): > success!

Davide Risso (10:09:21): > I wanted to allow anyone with the link to edit, but I can’t on my phone

Davide Risso (10:09:38): > Can you? Or am I the only one who can grant access to other people?

Kasper D. Hansen (10:09:49): > https://docs.google.com/document/d/1-46wwobIMR2RxBs1T7Tkin53Dou50mirzGXkyr5oVuQ/edit?usp=sharing

Kasper D. Hansen (10:09:56): > I fixed it for you

Davide Risso (10:10:25): > Thanks!

Kasper D. Hansen (10:10:27): > Kind of weird that I can see on twitter there is a HCA analysis jamboree at the broad right now

Kasper D. Hansen (10:10:48): > I guess there are the important people and then the riff-raff who get sent to california

Davide Risso (10:10:50): > Yes, I also found out yesterday

Davide Risso (10:12:10): > I think @Aaron Lun might be there?

Davide Risso (10:12:35): > It’s interesting how few people know of these things…

Kasper D. Hansen (10:12:39): > well, his boss is :slightly_smiling_face:

Martin Morgan (10:48:38): > The links above are ‘view only’ for me

Davide Risso (10:51:31): > I can’t see how to change the global settings from my phone, but I added you

Davide Risso (10:51:39): > You should now be able to edit

Davide Risso (10:52:47): > As soon as I have access to the file with my laptop I should be able to change the global settings

Aaron Lun (11:05:14): > Yes, I’ll be there, I’m afraid. Currently in Boston for the HCA Jamboree.

Aedin Culhane (11:06:04): > @Stephanie Hicks Thanks. I haven’t heard Emily Alden speak, but a google of the title of the talk found this has been presented a few places. It seems to be work she did as a research assistant in Jeremy Edwards Lab in the Dept of Chemistry at University of New Mexico. Jeremy Edwards Lab seems to be doing some interesting seq tech dev research https://scholar.google.com/citations?hl=en&user=f_wUhnUAAAAJ&view_op=list_works&sortby=pubdate. Her LinkedIn profile still says Univ of New Mexico (https://www.linkedin.com/in/emilynalden), so maybe she has only recently moved to Harvard. I couldn’t find her in the Harvard “system”

Aedin Culhane (11:08:19): > @Stephanie Hicks I couldn’t find a paper either, but I found an abstract online: http://www.w-qbio.org/wp-content/uploads/2018WinterQ-BioAbstracts.pdf “Visualization and Transcriptome Sequencing of Histological Tissues from Multiple Organs while Preserving Spatial Information of the mRNA” > Emily Alden, Radha Swaminathan, Jeremy Edwards. Dept of Chemistry and Chemical Biology, University of New Mexico. > > They seem to adhere the tissue to a glass slide and sequence in situ. Are they limited in the number of genes they can sequence, or in the number of cells?

Davide Risso (11:13:50): > Ok now everyone with the link should be able to edit

Kasper D. Hansen (11:23:41): > I kind of feel we should ask Jeremy to give a presentation explaining the relationship between the CZI effort and HCA

Aaron Lun (11:51:09): > I always thought HCA = science and CZI = money.

Davide Risso (14:57:26): > @channel I’ve created a doodle poll for possible times to talk on the phone this week (either Wed or Fri given Martin’s availability). If we cannot agree on a time this week, I can send another poll for next week.

Davide Risso (14:57:58): > The link: https://doodle.com/poll/vqewrzxmeeni87wa - Attachment (doodle.com): Doodle: HCA presentation planning > All times EST

2018-04-11

Raphael Gottardo (07:40:13): > @Davide Risso I am traveling all week, and thus can’t attend. Also, I agree with what others have said that we should have only one speaker, and given the current set up it should be @Martin Morgan assuming he is ok with it.

Davide Risso (08:46:49): > Ok, it looks like not a lot of people can meet this week. Should I send a new doodle for next week or is slack + google doc enough to coordinate? If there’s only one speaker the latter might be good enough @channel

Aaron Lun (08:46:54): > I’m still in Boston

Kasper D. Hansen (08:47:29): > I would be open to have 2 (perhaps 3) speakers, just not 8.

Martin Morgan (08:50:47): > I’m also up for a couple or three speakers, and have no ‘need’ to be one of them

Stephanie Hicks (09:53:42): > Not that I remember her saying. The images she had in her slides though were amazingly cool. - Attachment: Attachment > @Stephanie Hicks .I couldn’t find a paper either but I found an abstract online http://www.w-qbio.org/wp-content/uploads/2018WinterQ-BioAbstracts.pdf “Visualization and Transcriptome Sequencing of Histological Tissues from Multiple Organs while Preserving Spatial Information of the mRNA > Emily Alden, Radha Swaminathan, Jeremy Edwards. Dept of Chemistry and Chemical Biology, University of New Mexico, > > They seems to adhere the tissue to a glass slide and sequence in situ. Are they limited to the number of genes they are sequences, or number of cells?

Stephanie Hicks (09:57:47): > @Aaron Lun How is that going? - Attachment: Attachment > Yes, I’ll be there, I’m afraid. Currently in Boston for the HCA Jamboree.

Aaron Lun (10:03:54): > Walked into Boston and got sick. Spent Day 2 back at the hotel.

Stephanie Hicks (10:14:51): > oh I’m sorry to hear that! :disappointed: How many days is it?

Aaron Lun (10:17:40): > Day 3 now. Still a bit :face_vomiting:

Aedin Culhane (14:48:43): > Aaron anything those of us in Boston can do to help?

Aaron Lun (14:49:31): > Thanks Aedin, I should be fine now. Head’s cleared up mostly, the pain has migrated to my stomach, which is more manageable..

Aedin Culhane (14:50:15): > Can I bring you anything?

Aaron Lun (14:50:50): > No thanks, but I appreciate the thought.

Aedin Culhane (15:13:20): > Aaron can I share the jamboree 3 tasks ?

Aaron Lun (15:13:46): > I suppose so, I didn’t get any instructions on it being classified

Aedin Culhane (15:13:50): > links to the Jamboree tasks: https://github.com/HumanCellAtlas/hca-jamboree-how-many-cells https://github.com/HumanCellAtlas/hca-jamboree-sampling https://github.com/HumanCellAtlas/hca-jamboree-representative-genes - Attachment (GitHub): HumanCellAtlas/hca-jamboree-how-many-cells > Contribute to hca-jamboree-how-many-cells development by creating an account on GitHub. - Attachment (GitHub): HumanCellAtlas/hca-jamboree-sampling > Contribute to hca-jamboree-sampling development by creating an account on GitHub. - Attachment (GitHub): HumanCellAtlas/hca-jamboree-representative-genes > Contribute to hca-jamboree-representative-genes development by creating an account on GitHub.

Aedin Culhane (15:14:42): > Channel #boston-jamboree on the hca slack

Aedin Culhane (15:15:30): > Thanks aaron.

2018-04-12

Vince Carey (06:43:53): > Can you issue invitations to the #boston-jamboree channel?

Vince Carey (07:55:11): > I have had a quick look at the jamboree tasks. Do these help guide the formulation of the 30 minute bioc overview?

Aaron Lun (08:13:53): > I would not really consider them.

Aaron Lun (08:14:23): > They were just ideas slapped together by a few PIs on the weekend to give the grunts something to do.

Aaron Lun (08:14:51): > Most of our proposed work would be at a lower level, based on what I recall…

Davide Risso (09:48:43): > Thanks @Vince Carey, your outline is much better than what I originally proposed. Given the outline, it sounds like you or @Martin Morgan are the best options as speakers, if you agree. Or someone else in the technical advisory board.

Vince Carey (09:50:00): > Davide, thanks … I think we have a way to go in conceptualizing our overview. My points are available for use … but more voices are needed.

Davide Risso (09:50:16): > I also wanted to remind everyone that Jeremy offered to schedule a phone call to discuss our presentation. I’m not sure if we want to, but he was proposing to meet next week so we should probably get back to him either way.

Vince Carey (09:53:11): > @Aaron Lun – let’s not disregard them (jamboree tasks) completely though – 1) are any proposed solutions available? i did not see any issues on the noted repos; 2) are the demonstration data (which i have not tried to assemble) readily accommodated in our toolset already?

Vince Carey (09:55:04): > @Davide Risso – good to remember the phone offer from Jeremy. are you willing to do it? i think arranging a group may be hard and unnecessary.

Aaron Lun (09:56:06): > (1) Not really, though it depends on whether any of the groups follow up with their work. (2) They’re either loom or mtx files, so yes, we should be able to handle them.
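
Note on the mtx point above: the Matrix Market coordinate layout is simple enough that a reader fits in a few lines. The following is a minimal sketch in Python, not the tooling anyone in this thread used; the function name `read_mtx` and the toy data are hypothetical, and it assumes integer counts in the sparse "coordinate" layout only.

```python
import io


def read_mtx(handle):
    """Parse a Matrix Market 'coordinate' file into (shape, {(row, col): value}).

    Hypothetical helper for illustration; assumes integer-valued entries.
    """
    header = handle.readline()
    if not header.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("only the sparse 'coordinate' layout is handled here")

    # Comment lines start with '%' and come before the size line.
    line = handle.readline()
    while line.startswith("%"):
        line = handle.readline()

    # Size line: number of rows, number of columns, number of stored entries.
    n_rows, n_cols, n_entries = (int(x) for x in line.split())

    counts = {}
    for _ in range(n_entries):
        i, j, value = handle.readline().split()
        # Indices in the file are 1-based; store them 0-based.
        counts[(int(i) - 1, int(j) - 1)] = int(value)
    return (n_rows, n_cols), counts


# Toy 3 x 2 matrix (e.g. genes x cells) with three non-zero counts.
example = io.StringIO(
    "%%MatrixMarket matrix coordinate integer general\n"
    "% toy data, not from the jamboree\n"
    "3 2 3\n"
    "1 1 5\n"
    "3 1 2\n"
    "2 2 7\n"
)
shape, counts = read_mtx(example)
```

Only non-zero entries are stored, which is why the format suits sparse count matrices.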

Davide Risso (09:56:36): > Sure, I can do it on April 18th. It would be good if by that time we have a semi-complete outline so that I know what to tell Jeremy :slightly_smiling_face:

Davide Risso (09:56:48): > I can then report back here his feedback

Aaron Lun (09:57:10): > Of course, no one was using BioC-devel at the ~hackathon~ jamboree, so it was effectively a moot point.

Davide Risso (09:58:00): > @Aaron Lun meaning that they were using BioC-release or that everyone was using python?

Aaron Lun (09:58:07): > release

Aaron Lun (09:58:10): > and/or python

Aaron Lun (09:58:14): > I would say 50:50 split

Davide Risso (09:58:40): > this looks good/useful if it works as promised: https://github.com/HumanCellAtlas/DoubletDetection - Attachment (GitHub): HumanCellAtlas/DoubletDetection > DoubletDetection - Doublet detection in single-cell RNA-seq data.

Davide Risso (09:58:46): > was it also part of the jamboree?

Aaron Lun (09:58:50): > last year’s

Davide Risso (09:58:55): > oh ok

Aaron Lun (09:59:13): > Yes, this year’s tasks were even less inspiring than last year’s, IMO.

Aaron Lun (09:59:30): > How we’re meant to “hack” experimental design is not clear to me.

Aaron Lun (09:59:45): > Spent my non-sick days wondering exactly what we were meant to be doing.

Davide Risso (09:59:53): > I mean doublets identification is a really important task IMO

Davide Risso (10:00:04): > Not so clear about the other tasks

Aaron Lun (10:00:45): > Well, I wasn’t excited by this year’s tasks, that’s all I’m saying.

Aaron Lun (10:01:31): > The doublet paper is nice but is in danger of getting scooped by similar work from the Klein group

Vince Carey (10:02:56): > We’ll have to remain cognizant of the small quantity of resources made available to bioc for this project. Being clear about how the HCA will benefit from adopting bioc practices may be important in the overview … because the HCA-focused effort we can contribute is pretty limited. Showing that we understand how HCA can benefit Bioc is also important for the overview.

2018-04-13

Aaron Lun (10:04:36): > Looking at the google docs now

Aaron Lun (10:05:52): > Should we start putting together slides somewhere?

Vince Carey (10:06:11): > Yes

Vince Carey (10:06:29): > Google slides seem quite adequate IMO

Vince Carey (10:06:43): > But if you need something more functional let’s make a repo

Aaron Lun (10:07:01): > yeah, google slides are probably fine.

Vince Carey (10:07:28): > @Martin Morgan surely has the best general overview material

Aaron Lun (10:09:16): > I remember this: https://docs.google.com/presentation/d/1aHyZvY_8CXzgecynORZoen5qasVk_SJ2U-Na3WaHYgE/edit#slide=id.p

Vince Carey (10:11:11): > good start! very dense. i was thinking about a screenshot of the support site devoted to single cell Q&A but the search facility return on a naive query is not super useful

Aaron Lun (10:14:43): > I’m going to start unpacking it. I was thinking of organizing it into three sections (one per speaker).

Aaron Lun (10:15:13): > Though this wouldn’t necessarily correspond to the three slides.

Davide Risso (10:16:36): > If you agree, I can set up a meeting with Jeremy for the 18th (next Wednesday) assuming he’s still available

Davide Risso (10:17:47): > It would be good if by then we have a semi-defined outline, and even better if we have some draft slides

Vince Carey (10:18:01): > should be quite feasible

Aaron Lun (10:18:08): > sounds good

Davide Risso (10:18:17): > alright, emailing him now

Aaron Lun (10:20:59): > I’m putting down some words for now, but I can add some pictures later to make it prettier once we know what we want.

Aaron Lun (10:21:58): > I think we should start with “what is bioconductor” before transitioning to “how bioconductor will help the HCA”

Davide Risso (10:23:19): > Vince’s first point in the doc seems a good place to start: “Current state of Bioconductor: longevity, user base, developer base”

Davide Risso (10:24:22): > instead of writing stuff from scratch perhaps @Martin Morgan or others have some general Bioc material that can be used for the intro

Davide Risso (10:25:02): > so perhaps focus on the “how bioc will help HCA” and we can fill the intro later

Aaron Lun (10:27:49): > There are two aspects here: how we can help them generally and the specific items of proposed work

Davide Risso (10:29:55): > Do you mean just a list of our proposed deliverables or specifics on how we plan to implement them?

Vince Carey (10:31:01): > i dropped in pngs of two excerpts from the Nat Gen endorsement – use if desired

Aaron Lun (10:31:08): > sure, thanks

Aaron Lun (10:34:03): > I’m just putting out the overall structure, no flesh at all at this point.

Vince Carey (10:40:30): > It’s an open question how to balance between talking about (1) bioconductor generally, which may be “familiar” to many in the audience, (2) about bioconductor more specifically, concerning the shared infrastructure and approaches to achieving agility (including changing R when necessary), and (3) talking about the specific proposals which the HCA group agrees should be done. (3) is surely important and (2) is probably not so familiar and may deserve more weight than (1). So perhaps go rapidly through (1), present a clear view of (3) and then indicate how (2) will help to do (3) and indeed more throughout HCA. Somehow I think we have to abrogate the exclusion pattern to date … there should have been more representation of Bioc at the jamboree even if there were some gaps in the jamboree’s conception. Indeed having a few of our people engaged in the problem formulations could have made a difference to their scientific clarity and relevance.

Aaron Lun (10:41:52): > That’s because the project is full of Silicon Valley types who just want to throw deep learning at everything.

Vince Carey (10:42:55): > 10-4 good buddy. But we have to try. We don’t have time to waste.

Aaron Lun (10:43:12): > Agreed

Vince Carey (10:49:46): > Tech question: I just looked at the ipython notebook for the osmFISH loom file https://github.com/linnarsson-lab/osmFISH_celltype_analysis/blob/master/osmFISH_Cell_Type_Analysis.ipynb and it says use python 3.6 … I have not gotten involved with python 3 … are you using that? does loomExperiment need it? - Attachment (GitHub): linnarsson-lab/osmFISH_celltype_analysis > osmFISH_celltype_analysis - Notebook to perform the clustering, region classification and spatial analysis of the osmFISH cortex dataset.

Aaron Lun (10:51:28): > No Python should be necessary for just creating a LoomExperiment, I would think; it’s just an HDF5 file. I think all the Python was only for their analysis.
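
Note on the "it's just an HDF5 file" point: an HDF5 container is identifiable by a fixed 8-byte signature, so any language with an HDF5 reader can open a loom file. A rough sketch of that check, using only the standard library; the helper name `looks_like_hdf5` and the toy file are hypothetical. It only inspects offset 0, whereas the HDF5 format also permits the signature at later offsets (512, 1024, ...), and it does not parse any loom content.

```python
import os
import tempfile

# The 8-byte signature that opens an HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file starts with the HDF5 signature (offset 0 only)."""
    with open(path, "rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


# Toy demonstration: a file carrying only the signature, not a real loom file.
with tempfile.TemporaryDirectory() as tmp:
    fake_loom = os.path.join(tmp, "toy.loom")
    with open(fake_loom, "wb") as handle:
        handle.write(HDF5_SIGNATURE + b"\x00" * 8)
    is_hdf5 = looks_like_hdf5(fake_loom)
```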

Aaron Lun (11:43:15): > Staring at our final proposal, I think we should merge aims 3 and 4; the difference between them is pretty subtle for a talk.

Aaron Lun (12:18:20): > Okay, I stuck all of our aims in. It’s a bit text heavy, but I can’t easily see how we could put more pictures in.

Vince Carey (12:25:39): > good start … maybe we could do a hangout monday … with luck more investigators will weigh in in the mean time – some may not be watching slack so an email to the PIs to give the link with this slideset may be in order.

Davide Risso (12:32:08): > Good idea. I will create a doodle poll and send an email to all the PIs

Aaron Lun (13:03:33): > Well, I’m out of ideas for this, so I’ll let others play around with it for a while.

Raphael Gottardo (15:47:17): > I have added some comments to the slides. Reducing the number of aims (for the presentation) makes sense. We don’t want to be over ambitious (or seem that way) given our budget.

Raphael Gottardo (15:47:44): > I am traveling and won’t be back until Wednesday but please let me know if you have any questions.

2018-04-14

Aaron Lun (14:28:39): > I’ve filled in more of the empty slides, but I am now pooped.

Aaron Lun (14:31:45): > It’s frustratingly word-heavy, but there’s no obvious places or reasons to insert schematics. Some charts would be nice for the “impact” slide.

Martin Morgan (14:55:21): > I found Michael’s slides here http://bioconductor.org/help/course-materials/2017/BioInfoSummer/bioc-bioinfosummer-2017.pdf quite insightful in building a case for Bioc in an integrative environment

Aaron Lun (15:53:10): > Ha, I like the obstacle course

2018-04-16

Davide Risso (09:45:24): > According to the doodle poll, the best time for the meeting is today at 3pm. I’ve created a zoom meeting. Info below.

Davide Risso (09:45:26): > Bioconductor HCA presentation > Scheduled: Apr 16, 2018 at 3:00 PM to 4:30 PM > Location: https://weillcornell.zoom.us/j/753822843 Hi there, > > Davide Risso is inviting you to a scheduled Zoom meeting. > > Topic: Bioconductor HCA presentation > Time: Apr 16, 2018 3:00 PM Eastern Time (US and Canada) > > Join from PC, Mac, Linux, iOS or Android: https://weillcornell.zoom.us/j/753822843 Or iPhone one-tap: > US: +14086380968,,753822843# or +16468769923,,753822843# > Or Telephone: > Dial (for higher quality, dial a number based on your current location): > US: +1 408 638 0968 or +1 646 876 9923 or +1 669 900 6833 > Meeting ID: 753 822 843 > International numbers available: https://zoom.us/u/cR78sr6q Or an H.323/SIP room system: > H.323: > 162.255.37.11 (US West) > 162.255.36.11 (US East) > 221.122.88.195 (China) > 115.114.131.7 (India) > 213.19.144.110 (EMEA) > 202.177.207.158 (Australia) > 209.9.211.110 (Hong Kong) > 64.211.144.160 (Brazil) > 69.174.57.160 (Canada) > Meeting ID: 753 822 843 > > SIP: 753822843@zoomcrc.com Or Skype for Business (Lync): https://weillcornell.zoom.us/skype/753822843

Aedin Culhane (16:00:51): > Thanks everyone.

Davide Risso (16:23:45): > Thanks to all of you who were in the call. I’ve copied my poorly taken notes into the google doc: https://docs.google.com/document/d/1-46wwobIMR2RxBs1T7Tkin53Dou50mirzGXkyr5oVuQ/edit?usp=sharing

Davide Risso (16:23:54): > Let me know if I forgot something.

Vince Carey (16:47:06): > looks good – one thing we keep forgetting: flow cytometry. we should have at least 10% of content devoted to that. @Raphael Gottardo have a look at the evolving slide deck – we are hoping to include some compelling case studies

2018-04-17

Aaron Lun (13:41:37): > @Davide Risso I think the comments about flow cyt algorithms being used for single-cell clustering would be better off when you talk about existing Bioconductor software in your section.

Davide Risso (14:02:44): > that’s fine. @Raphael Gottardo I’m really not familiar with the latest and greatest of flow cytometry... do you have any suggestions for a slide?

Vince Carey (14:05:21): > maybe @Mike Jiang … @Raphael Gottardo said he was traveling through weds.

Mike Jiang (14:05:37): > @Mike Jiang has joined the channel

Aaron Lun (14:35:17): > @Vince Carey, do you have something eye catching for ontologies?

Vince Carey (14:35:58): > i will work on it

Aaron Lun (14:38:43): > I will trim down the ontology description as well, feel free to modify if I’ve failed to capture the salient bits.

Vince Carey (14:38:51): > @Aedin Culhane does anything come to mind vis-a-vis ontologies and HCA? it should not be hard to make the point that we already do this well and are close to Helen Parkinson’s group. I would say that we’d want to be able to establish terms for newly established cell types, and evidence measures for their signatures and roles, for rapid uptake by users.

Aaron Lun (14:41:35): > What does the “provenance information” refer to in the ontology aim?

Aaron Lun (14:45:49): > I don’t suppose I could get a few hypothetical case studies of what you have in mind with the ontology API? The text in the proposal was quite abstract, and I’m not sure exactly how it would be used.

Aaron Lun (14:45:50): > I can imagine sample-level, cell-level and gene-level ontology terms. Sample-level would be the various attributes of the donor, the sequencing center, the various protocols, etc. Cell-level would be… cell type? Gene-level is the usual functional stuff, e.g., GO or KEGG or something.

Vince Carey (14:46:35): > I think it should be interpreted as “given an object, we can tell how it was made”. but at the moment i don’t have the document. let me look

Vince Carey (14:47:16): > so maybe provenance here is “given a term, we can tell who created it and how it is to be applied”

Aaron Lun (14:47:37): > Right

Aaron Lun (14:56:16): > On another note, does anyone have experience designing a data format? Not an in-memory representation, but an actual format, e.g., BAM, various microarray stuff? I was thinking that we could add a slide about how BioC people have experience with this kind of stuff.

Vince Carey (14:59:14): > To get more concrete – consider the analysis of the 10x data. You find some legitimate clusters of cells and are able to assert expression signatures that distinguish them. Some of these signatures will correspond to known cell types – but how exactly would you couple that knowledge to your finding and the data? We want to have some Bioconductor functions to aid in doing that. For clusters with signatures that are truly novel, we want to let the user baptize them and add the finding to a curated cell type ontology. EBI has some mechanisms for doing that. The provenance issue: once this process gets going, at the boundaries of knowledge, there will be controversy. So when one uses a term one wants to be able to be very precise about its meaning and genesis. That’s workpiece 1. Workpiece 2 says that whatever we come up with in 1 will be able to be bound into SummarizedExperiments to facilitate finding those cells that seem to be covered by the term(s). Workpiece 3 says take the concepts of 1 and put them into EBI’s ontology management infrastructure. Workpiece 4 says do it in shiny.
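
Note on workpieces 1 and 2 above: one way to picture "a term with provenance, bound to cells" is a small record type plus a lookup. This is a language-neutral sketch in Python under stated assumptions, not the proposed Bioconductor design; every name in it (`Term`, `AnnotatedCells`, the field names, the toy cell ids) is hypothetical, and a real implementation would attach the terms to a SummarizedExperiment rather than a plain dictionary.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    """A cell type label carrying its own provenance (hypothetical fields)."""
    label: str        # name given to the cluster
    created_by: str   # who introduced the term
    definition: str   # how the term is meant to be applied
    source: str       # a curated ontology name, or "provisional"


@dataclass
class AnnotatedCells:
    """Stand-in for a container whose per-cell metadata holds terms."""
    cell_ids: list
    labels: dict = field(default_factory=dict)  # cell id -> Term

    def annotate(self, ids, term):
        """Bind a term to the given cells."""
        for cell_id in ids:
            if cell_id not in self.cell_ids:
                raise KeyError(cell_id)
            self.labels[cell_id] = term

    def cells_for(self, label):
        """Find the cells covered by a term, in their original order."""
        return [c for c in self.cell_ids
                if c in self.labels and self.labels[c].label == label]


cells = AnnotatedCells(cell_ids=["c1", "c2", "c3"])
novel = Term(
    label="novel cluster A",
    created_by="hypothetical analyst",
    definition="cells expressing a toy signature",
    source="provisional",
)
cells.annotate(["c1", "c3"], novel)
covered = cells.cells_for("novel cluster A")
```

Keeping the creator and definition on the term itself is what lets a user state precisely which meaning of a contested label they applied.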

Vince Carey (15:00:09): > Now let’s make the slide. :joy:

Vince Carey (15:02:21): > The only format made in Bioc that I can think of is GDS. Martin may have had a role in that.

Aedin Culhane (15:08:29): > @Vince Carey Probably briefly mention the vast annotation resources that exist already, and that we will expand ontology support for known/published ontologies/annotation (Helen’s stuff, Ols, oxo, Zooma, webulous etc) but also facilitate labeling by matching gene lists to experimental signatures derived from studies that are mapped to EFO/uberon/cell ontology etc (aka GXA).

Martin Morgan (16:12:44): > I had no input on gds

Aaron Lun (16:58:29): > Is there anything we want to say on data formats, then? That doesn’t already overlap with data representations (i.e., SE, eSets, and so on)

Aaron Lun (17:02:53): > Because I thought some of us had a less-than-good opinion of loom as a format, and could suggest something better

2018-04-18

Mike Smith (03:22:57): > Back in the day Andy Lynch and I designed a compressed format for beadarray data (paper, package). I’m not sure I’d hold it up as transformative, but you wouldn’t be lying if you said we had ‘experience’. - Attachment (PubMed Central (PMC)): BeadDataPackR: A Tool to Facilitate the Sharing of Raw Data from Illumina BeadArray Studies > Microarray technologies have been an increasingly important tool in cancer research in the last decade, and a number of initiatives have sought to stress the importance of the provision and sharing of raw microarray data. Illumina BeadArrays provide a … - Attachment (Bioconductor): BeadDataPackR > Provides functionality for the compression and decompression of raw bead-level data from the Illumina BeadArray platform

Kasper D. Hansen (03:24:29): > We have tons of experience as users, having written parsers for other people’s formats, for essentially every format out there

Aaron Lun (05:02:43): > There is currently an empty slide (number 12) that I was hoping to fill up with some general comments about our experience with biological data formats. @Kasper D. Hansen @Mike Smith and others, if you could put some quick thoughts about what we know/expect/want about data formats, I will talk about it.

Stephanie Hicks (11:39:50): > Hi <!channel>, @Davide Risso, @Elizabeth Purdom and I have started a #hca_clustering channel to discuss progress on clustering implementations related to this project. All are welcome.

Vince Carey (12:00:03): > @Aaron Lun On the ontology front, I did add a slide that sketches the workpieces noted previously. It is very dry, I think inherently so. One way to energize the topic would be to make the following points: interoperability depends strongly on shared controlled terminology - but at the boundaries of knowledge, term interpretation and adoption is inherently controversial. We want to make it easy to use speculatively introduced terminology for features or cell types, with clear provenance, and to facilitate sensitivity analysis of the effects of term choice.

Aaron Lun (12:08:35): > I can make a pretty schematic illustrating these points

Aaron Lun (12:08:54): > What exactly are you thinking of when you say “sensitivity analysis”?

Vince Carey (12:14:23): > I mean making it easy to compare analyses that use different terminologies. A crude example is assessing whether gene sets that come from KEGG or reactome yield compatible biological stories. I would assume that in HCA the options for forming sets of samples or features are going to be very numerous and speculative so we want to avoid lock-in to any given approach.
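
One crude way to start such a comparison is to measure how much two gene sets that claim to describe the same biology actually overlap. The sketch below uses the Jaccard index. The set names and gene symbols are made up for illustration and are not taken from KEGG or Reactome.

```python
def jaccard(a, b):
    """Jaccard index of two collections: |A and B| / |A or B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical gene sets for the "same" pathway from two terminology sources.
source_a = {"pathway_x": {"G1", "G2", "G3", "G4"}}
source_b = {"pathway_x": {"G2", "G3", "G4", "G5"}}

# Three shared genes out of five distinct genes.
score = jaccard(source_a["pathway_x"], source_b["pathway_x"])
print(round(score, 3))
```

A low overlap does not by itself mean the downstream stories disagree, so this would only be a first screen before rerunning the analysis under each terminology.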

Martin Morgan (13:19:10): > my intention is to finish the short intro section with slide 5; I put in bold names of people I understand to be at the meeting but please update.

Aaron Lun (13:40:01): > @Martin Morgan Do you want to mention the aim of the project somewhere earlier, before you start talking about people? Then I can just go straight into the data access and representation.

Aaron Lun (13:40:40): > e.g., mention it in slide 4, allowing us to delete slide 6.

Martin Morgan (13:50:30): > sure I did that and zapped 6

Davide Risso (18:14:57): > <!here> just a reminder that tomorrow at 10am PST / 1pm EST I will be talking to Jeremy and showing him our draft presentation. Please have a look at the slides that we have so far and let me (us) know if we’re missing any of the points that we want to make.

Raphael Gottardo (18:15:44): > @Davide Risso I am working on the flow cytometry slide(s). Should be done before the end of the day.

Davide Risso (18:16:00): > great! thanks @Raphael Gottardo!

Raphael Gottardo (19:19:38): > @Davide Risso I have added 2 slides. These are a bit dense. Let me know if you have any questions.

Vince Carey (19:57:39): > is anyone working on the datasets provided at https://preview.data.humancellatlas.org/… our IT folks are downloading but hit a snag

2018-04-19

Vince Carey (00:50:41): > The files for ischemic sensitivity are triples of FASTQ with I1, R1, R2 in the filenames, corresponding to 8bp, 26bp, and 98bp reads respectively. I don’t see anything in our single cell workflows relating to this layout.

Mike Smith (03:25:21): > @Mike Smith pinned a message to this channel.

Aaron Lun (04:19:36): > Sounds like the typical 10X data.
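
For orientation, the I1/R1/R2 layout can be sketched as follows. The example groups files into triples and splits a 26bp R1 read into a cell barcode and a UMI. The 16bp + 10bp split is an assumption based on 10x Chromium v2 chemistry, and the filenames are hypothetical examples in the usual Illumina naming style, not the actual HCA files.

```python
import re
from collections import defaultdict

# Matches the read-type tag embedded in the filename, e.g. "_R1_".
READ_TAG = re.compile(r"_(I1|R1|R2)_")


def group_fastq_triples(filenames):
    """Group FASTQ files that differ only in their I1/R1/R2 tag."""
    groups = defaultdict(dict)
    for name in filenames:
        match = READ_TAG.search(name)
        if match is None:
            continue  # not part of an I1/R1/R2 triple
        key = READ_TAG.sub("_", name, count=1)
        groups[key][match.group(1)] = name
    return dict(groups)


def split_r1(seq, barcode_len=16, umi_len=10):
    """Split an R1 read into (cell barcode, UMI); lengths are assumptions."""
    if len(seq) < barcode_len + umi_len:
        raise ValueError("R1 read shorter than barcode + UMI")
    return seq[:barcode_len], seq[barcode_len:barcode_len + umi_len]


# Hypothetical filenames for one lane of one sample.
files = [
    "sample_S1_L001_I1_001.fastq.gz",  # 8bp sample index
    "sample_S1_L001_R1_001.fastq.gz",  # 26bp barcode + UMI
    "sample_S1_L001_R2_001.fastq.gz",  # 98bp cDNA read
]
triples = group_fastq_triples(files)
barcode, umi = split_r1("A" * 16 + "C" * 10)
print(triples)
print(barcode, umi)
```

Only R2 carries transcript sequence under this layout, which is why the triples need a demultiplexing and counting step before count-based tools can use them.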

Aaron Lun (05:38:18): > @Vince Carey have a look at slide 10 and see whether it captures the essence of the motivation behind the ontology work

Aaron Lun (06:01:33): > I also distilled the “ontology facets” into a few sentences to make it more digestible.

Aaron Lun (06:10:08): > @Davide Risso Suggest coordinating our “deliverables” slides to have the same format.

Vince Carey (07:41:00): > “typical 10x data” in the sense that one has to use cellranger to get something that dropletUtils can work with?

Aaron Lun (07:45:39): > Yes.

Aaron Lun (07:46:04): > In Bioconductor terms, there is also scPipe, though I have not used it myself.

Aaron Lun (07:46:31): > That will get you from the fastq to the raw counts. @Matt Ritchie may elaborate

Kasper D. Hansen (07:48:05): > I would love to get a small briefing on this: what is the typical output from the machines and what has to be done to the data to use various Bioc tools. Would be worthwhile to know for next week. We can talk about it in SF

Vince Carey (08:09:30): > @Aaron Lun slide 10 looks great. might hold slide 11 for later if there are questions on that aim. the data release mentioned previously is encouraging in that the metadata employs established ontologies wherever possible.

Vince Carey (08:18:12): > i added a couple of comments to the slides. wolfgang commented on the “one week” metric. you chose an impoverished computing environment to do the work, probably with a good reason. we might want to give an explicit definition of “scalable” that helps to focus on closing gaps between current and desired performance. the definition in http://repository.cmu.edu/sei/399/ is “Scalability is the ability to handle increased workload by repeatedly applying a cost-effective strategy for extending a system’s capacity”, and we aim to produce code that has increased throughput as CPUs/RAM are added to the environment, but that also functions in relatively impoverished environments. - Attachment (repository.cmu.edu): “On System Scalability” by Charles B. Weinstock and John Goodenough > A significant number of systems fail in initial use, or even during integration, because factors that have a negligible effect when systems are lightly used have a harmful effect as the level of use increases. This scalability problem (i.e., the inability of a system to accommodate an increased workload) is not new. However, the increasing size (more lines of code, greater number of users, widened scope of demands, and the like) of U.S. Department of Defense systems makes the problem more critical today than in the past. This technical note presents an analysis of what is meant by scalability and a description of factors to be considered when assessing the potential for system scalability. The factors to be considered are captured in a scalability audit, a process intended to expose issues that, if overlooked, can lead to scalability problems.

Aaron Lun (08:21:25): > Hm. The only reason was because I was too lazy to set up R on our new cluster. Don’t know if that’s a good reason, but hey, we can do it on the desktop.

Aaron Lun (08:23:59): > I should mention that the bottlenecks are mostly on scran’s side. It would be straightforward to parallelize, but because I was running things in the background on my desktop, I wanted a core or two to actually do my day-to-day work.

Vince Carey (08:24:38): > i thought so … so you don’t even need to mention the time taken.

Aaron Lun (08:26:30): > yeah, that’s probably true.

Vince Carey (08:26:31): > IMHO. but it is useful to know that if we have K cores with modest RAM per core we can get approximately K-fold increase in throughput. when that is not the case it is useful to know why.
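
The "approximately K-fold" statement can be checked with two simple numbers per run: speedup (time on one core divided by time on K cores) and parallel efficiency (speedup divided by K). The timings below are invented purely to show the arithmetic and are not measurements from this project.

```python
def speedup(t_one_core, t_k_cores):
    """How many times faster the K-core run is than the 1-core run."""
    return t_one_core / t_k_cores


def efficiency(t_one_core, t_k_cores, k):
    """Fraction of ideal K-fold scaling achieved (1.0 means perfect)."""
    return speedup(t_one_core, t_k_cores) / k


# Hypothetical wall-clock times in minutes, keyed by number of cores.
timings = {1: 120.0, 4: 32.0, 8: 20.0}

report = {
    k: (speedup(timings[1], t), efficiency(timings[1], t, k))
    for k, t in timings.items()
}
for k, (s, e) in sorted(report.items()):
    print(f"{k} cores: speedup {s:.2f}, efficiency {e:.2f}")
```

Efficiency dropping well below 1.0 as cores are added is the signal to go looking for the "why" mentioned above, for example a serial step or an I/O bottleneck.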

Aaron Lun (08:26:55): > I can guarantee that to be the case for everything but the PCA step at the end.

Kasper D. Hansen (08:27:25): > We can do that for PCA as well. I promise.

Kasper D. Hansen (08:27:50): > As long as we have a final step where we have 4-8Gb I think

Vince Carey (08:29:02): > I think it is important to be clear on whether this is deterministic PCA or partial PCA or based on random projections…

Aaron Lun (08:29:21): > here it was randomized PCA

Aaron Lun (08:29:34): > Though my understanding is that it has very well-understood convergence

Kasper D. Hansen (08:30:03): > Randomized PCA gives exact results

Kasper D. Hansen (08:30:26): > It is guaranteed to be equal to standard PCA up to an essentially arbitrary precision

Aaron Lun (08:30:29): > yeah, that’s what I thought

Kasper D. Hansen (08:30:47): > like something with error of 1e-18 for the settings I use.

Kasper D. Hansen (08:31:03): > which is far below the precision of PCA in the first place

Aaron Lun (08:31:08): > cool.

Kasper D. Hansen (08:31:41): > perhaps it’s not 1e-18. It’s some small number though.

Kasper D. Hansen (08:32:10): > And there is a parameter which can make this arbitrarily small with a tiny increase in time.

Kasper D. Hansen (08:32:56): > The only issue with random projection PCA is that you only get the first k singular values and vectors where k is chosen by the user.
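
A minimal sketch of random-projection PCA, written with plain NumPy, shows where the user-chosen k and the precision parameter enter. It follows the standard randomized range-finder idea with a few power iterations. It is not the implementation used in any package discussed here, the parameter names are assumptions, and the accuracy it reaches depends on how quickly the singular values decay.

```python
import numpy as np


def randomized_pca(x, k, oversample=10, n_iter=2, seed=0):
    """Approximate the top-k singular triplets of the column-centered x."""
    rng = np.random.default_rng(seed)
    centered = x - x.mean(axis=0)
    size = min(k + oversample, min(centered.shape))
    # Random projection: sample the range of the centered matrix.
    omega = rng.standard_normal((centered.shape[1], size))
    q, _ = np.linalg.qr(centered @ omega)
    # Power iterations sharpen the basis; more iterations, smaller error.
    for _ in range(n_iter):
        z, _ = np.linalg.qr(centered.T @ q)
        q, _ = np.linalg.qr(centered @ z)
    # Exact SVD of the small projected matrix.
    u_small, s, vt = np.linalg.svd(q.T @ centered, full_matrices=False)
    u = q @ u_small
    return u[:, :k], s[:k], vt[:k]


# Toy data: rank-5 signal plus a little noise (200 cells x 50 features).
rng = np.random.default_rng(1)
signal = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 50))
x = signal + 1e-3 * rng.standard_normal((200, 50))

u, s, vt = randomized_pca(x, k=5)
exact = np.linalg.svd(x - x.mean(axis=0), compute_uv=False)[:5]
rel_err = float(np.max(np.abs(s - exact) / exact))
print(rel_err)
```

Only the first k components come back, as noted above. Here `n_iter` plays the role of the parameter that trades a little extra time for a smaller error.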

Aaron Lun (08:33:17): > ah, that’s fine.

Aaron Lun (08:33:28): > for my applications anyway

Kasper D. Hansen (08:33:41): > There are other types of fast PCA (mentioned in proposal) and I am not up to date on the precision of the computation for those, but I assume it’s good

Aaron Lun (08:33:55): > sounds promising.

Aaron Lun (08:34:01): > Do they require choice of k as well?

Kasper D. Hansen (08:34:04): > yes

Kasper D. Hansen (08:34:22): > so I have not really seen the benefit in digging into the internals of those

Kasper D. Hansen (08:34:34): > I’m starting with what I know

Aaron Lun (08:34:53): > I guess it might be useful if they have a more convenient data access pattern. Who knows.

Kasper D. Hansen (08:35:09): > I doubt so

Kasper D. Hansen (08:35:53): > but perhaps I should check up on that

Kasper D. Hansen (08:35:57): > it’s a good point

Davide Risso (09:16:17) (in thread): > Yes, I will reformat my slide to look like yours

Martin Morgan (10:45:35): > one thing about the loom format, for instance, is that it doesn’t represent all the richness of SingleCellExperiment, e.g., reducedDims, int_colData - Attachment: Attachment > Is there anything we want to say on data formats, then? That doesn’t already overlap with data representations (i.e., SE, eSets, and so on)

Aaron Lun (11:13:04): > I guess I can just slip it in when I talk about BioC’s experience with experimental data in slide 11.

Mike Smith (11:17:30): > I’m not sure I’ve really improved slide 5 with my effort to break it in two. I wanted to highlight which bits of the ‘wall’ we were going to address, but I hadn’t really appreciated it was packages below the line and concepts above (and we want to tackle everything), so now it’s just a brighter set of bricks.

Aaron Lun (11:19:39): > I didn’t realize it was that, I thought below the line was foundations and above the line was the other stuff

Mike Smith (11:45:25): > Same, until I started thinking about colours and realised that was the existing clustering

Aaron Lun (13:32:31): > some of the other groups are pretty chaotic

Aaron Lun (13:36:21): > The “scale” and “compression” groups might be relevant

Aaron Lun (13:36:36): > And we can always talk to the various algorithm groups

Aaron Lun (13:37:00): > “portals” didn’t really have much re. data access; seemed to be mostly visualization

Davide Risso (13:39:08): > where on the website are you looking at?

Aaron Lun (13:39:24): > if you click on the links you go into the group materials

Davide Risso (13:39:36): > also, given what Jeremy said, do you think we need a slide on how to talk to python?

Aaron Lun (13:39:40): > though some of them are probably working out of the system, like we are

Aaron Lun (13:39:54): > Probably just a line mentioning interoperability between languages?

Aaron Lun (13:40:03): > C++, Java, Python

Aaron Lun (13:40:20): > I don’t know where that fits in our setup

Aaron Lun (13:40:28): > I think we can just respond if asked

Davide Risso (13:40:34): > yeah

Aaron Lun (13:40:40): > We have far too many slides anyway

Aaron Lun (13:41:19): > Need to cut 1-2, hopefully from the intro when that gets cleaned up

Davide Risso (13:41:51): > Well, we will definitely cut at least one from 5-7 right?

Aaron Lun (13:42:02): > At least two, I would say from 4-7.

Davide Risso (13:42:33): > And I should probably cut 27 and just talk about those points in 28

Aaron Lun (13:42:48): > Yeah, too much text anyway for reading. Shame, I like the font.

Aaron Lun (13:43:44): > I think the main thing is that after dimensionality reduction, scRNA-seq data is much like flow data (or Cytof, if one is to be pedantic), and it may be possible to leverage existing methods for the latter to use on the former.

Aaron Lun (13:44:11): > And BioC has a large catalogue of flow methods, which makes it a logical choice

Aaron Lun (13:47:24) (in thread): > I think a good place to mention this (as an aside) is in the conclusion where you talk about collaborations. You can just say that the R development framework has a strong record in multi-language environments (C++ via Rcpp, Java via rJava, python via reticulate), so we can collaborate with people even if they’re not using R.

Aaron Lun (13:47:36): > I would leave this just as a comment, though, no need to go into a lot of detail unless asked.

Davide Risso (13:47:49): > Sounds good

Davide Risso (13:48:10): > As for the cytof, I agree that that should be the message

Aaron Lun (13:59:17): > And indeed, this is where our links with the R core team come in handy. If you read Writing R Extensions thoroughly, there are clear guidelines for how to include native code and Java code in packages. Imagine if this could also be done for Python (or Julia, or whatever the kids are doing these days).

Davide Risso (15:28:41): > <!channel> so that what Aaron and I said doesn’t sound too cryptic, the only real piece of feedback that we got from Jeremy was that they are very interested in interoperability between scanpy, Seurat, and Bioconductor.

Davide Risso (15:31:14): > One slide at the end on how we envision such interoperability (either via a common hdf5 format or via integration between R and python) might be a good idea.

Kasper D. Hansen (15:32:32): > Sure

Kasper D. Hansen (15:32:52): > of course Seurat should use our stuff when it’s mature and scanpy are sooo happy with themselves

Martin Morgan (15:33:07): > loom being the hdf5 format?

Davide Risso (15:33:33): > perhaps, if we are happy with loom — I really don’t have any experience with it

Kasper D. Hansen (15:33:47): > interoperability with seurat should be easy in principle

Davide Risso (15:34:19): > I agree, but perhaps we should be proactive and ask them at the meeting what they need that is currently not in the SingleCellExperiment class

Davide Risso (15:34:33): > I suspect very little, if anything

Davide Risso (15:35:15): > Otherwise we can always create a coercion method, but wouldn’t it be better if they adopted SingleCellExperiment?

Davide Risso (15:35:34): > @Stephanie Hicks has some experience with the Seurat S4 class

Kasper D. Hansen (15:37:16): > The issue they would have - and I understand that - is depending on something before it is ready

Kasper D. Hansen (15:37:38): > If Seurat moves to SingleCellExperiment, which they should when it’s mature, what happens if things are not working?

Kasper D. Hansen (15:37:55): > So the question should really be what do they need

Stephanie Hicks (15:53:38): > I mean, I assume Seurat (version whatever the current release is) would just remain in CRAN, and there would be an effort made to overhaul Seurat entirely to depend on the SingleCellExperiment object, which would be submitted to BioC devel

Stephanie Hicks (15:54:02): > and whenever Seurat BioC devel version moves to the release, then the CRAN version would no longer be supported?

Kasper D. Hansen (16:02:08): > That all assumes they want to do so. But they have success with their current model, so why change it? On CRAN they have control over release time etc. Just playing devil’s advocate here

Mike Smith (16:23:01): > Just to reiterate what Davide said, I think Jeremy’s expression was “you’re winning” when he saw the state of our slides. In addition to the scanpy, Seurat discussion he more generally mentioned that if you see any other group on https://grants.czi.technology/human-cell-atlas/comp-tools/ where you think there is potential for overlap/collaboration then the talk is a good place to highlight it so you can seek each other out during the meeting.

Mike Smith (16:29:16): > Loom wasn’t mentioned explicitly as a data format, more just the idea that it would be cool if there was some effort made towards allowing objects/results created in one language or tool to be easily accessible in another. The work by @Martin Morgan and @Daniel Van Twisk towards creating datasets that are not transposed when you compare rhdf5 with python is a good start towards that if we want to go the HDF5 route.
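
The transposition issue comes from storage order: R keeps matrices column-major, while HDF5 and NumPy default to row-major. The sketch below simulates the effect with NumPy only. No HDF5 file is written, and the assumption that the writer stores the buffer unchanged with reversed dimensions is a simplification that should be checked against the rhdf5 documentation.

```python
import numpy as np

# The matrix as R would show it for matrix(1:6, nrow = 2):
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
r_matrix = np.array([[1, 3, 5],
                     [2, 4, 6]])

# R stores it column by column, so the raw buffer is 1, 2, 3, 4, 5, 6.
buffer = r_matrix.flatten(order="F")

# Assumed writer behaviour: keep the buffer as is and record the dimensions
# reversed, so a row-major reader interprets the same bytes as 3 x 2.
as_read_row_major = buffer.reshape((3, 2), order="C")

# The row-major reader therefore sees the transpose of the R matrix.
restored = as_read_row_major.T
print(as_read_row_major)
print(restored)
```

Either side can undo this with a transpose, but agreeing on one convention in the file avoids every reader having to know who wrote it.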

Aaron Lun (17:07:13): > Seurat’s not going to move to the SCE class for maturity reasons or otherwise, because it’s just more work for them without any obvious gain. They just don’t think in terms of interoperability, because in their minds, Seurat is the ultimate solution for everything. If we want them to move, we have to make them move, by providing so much good stuff with the SCE class that the case for transitioning is overwhelming. That’s part of why I’ve been frenetically putting stuff together (e.g., iSEE, DropletUtils) and promoting its use throughout the SingleCell biocViews.

Aaron Lun (17:15:22) (in thread): > Yeah, I was pretty irritated with the scanpy hype as well. I am curious to see how long this enthusiasm lasts - they’re going to find that maintaining successful infrastructure is a lot more work than it looks.

Aaron Lun (17:17:44) (in thread): > And on that note, scone still isn’t using the SCE! @Davide Risso!

Davide Risso (17:22:39) (in thread): > True. But you are the one that sees Michael at all the jamborees! You should force him!

Davide Risso (17:23:33) (in thread): > Plus, one package at a time, we’re getting there… clusterExperiment was higher on the list of priorities

Aaron Lun (17:44:14) (in thread): > Well, I was too :face_vomiting: to really communicate…

Stephanie Hicks (20:54:44): > @Davide Risso on the topic of interoperability between R and Python, I would mention (maybe include a note somewhere on a slide) recent efforts announced today from RStudio and Wes with Ursa Labs: https://blog.rstudio.com/2018/04/19/arrow-and-beyond/

2018-04-21

Aaron Lun (09:02:28): > So - uh - what are we doing with slides 4-7?

Aaron Lun (09:03:57) (in thread): > Fascinating. Was sold on the C++ bindings, that will be very interesting.

Martin Morgan (10:17:15): > 4 seems meaningful independent of 5-7. I don’t think there’s much content in 5-7 other than to list projects / people, so I’m not sure there’s value in spreading it to two slides. I didn’t intend to go into each project, or even mention the titles. But I guess that’s just the justification for the original flow; if people want a more ponderous approach here that’s ok with me…

Aaron Lun (10:25:26): > Yes, 4 seems sensible. For 5-7; perhaps the important thing seems to be the people involved, rather than the titles? If the people are listed under particular themes, I imagine that would be more than enough detail.

Martin Morgan (10:41:24): > yes, maybe with pictures of those attending

Mike Smith (13:22:33): > I feel no affection for slides 6-7 so feel free to trash them if you like

Aaron Lun (13:32:23): > and the pictures are done. I picked the most photogenic ones, if I do say so myself.

Aaron Lun (13:33:24): > @Davide Risso’s faculty photo makes him look like a communist revolutionary.

Aaron Lun (13:40:44): > 28 slides now, minus three intermissions = 25 slides. So that should do, I would think.

Aaron Lun (13:41:53): > @Mike Smith Might be worth using your green/blue colouring scheme on slide 4, though.

2018-04-23

Stephanie Hicks (08:55:09) (in thread): > yeah, I’m pretty intrigued too. The attempt to save and read files between python and R (feather) was a good first step, but still too clunky imho. This has the potential to be a game changer.

2018-04-24

Vince Carey (08:45:59): > I added two comments to the slides. Martin can consider whether slide 2 could be amplified with some details on our software engineering, specifically issues of synchronization with the base language. Slide 9 had been discussed, I thought, and the points of contact between R/Bioc and pieces of the DCP are more numerous than shown.

Vince Carey (09:33:39): > Straw man: github + travisCI + docker are sufficient for an ecosystem addressing annotation + analysis + curated experiment data

Aaron Lun (09:44:15): > Um. I suppose it could be, for a development environment? I mean, that’s basically how our workflows operate.

Raphael Gottardo (12:47:40): > Regarding the slides, @Mike Jiang and I have been on several calls with the DCP group to discuss HDF5 and the new HSDS service. Is it worth mentioning, as an interaction between Bioconductor and DCP? This is part of their effort to benchmark file formats.

Vince Carey (12:50:40): > Yes. However we had a discussion with Marcus Kinsella (CZI), and there was a bit of hedging about strength of commitment to HDF5 at his end. So it makes sense to be pragmatic.

Aaron Lun (12:51:22): > I will mention HDF5, but will qualify it with comments about how it is not the be-all-and-end-all of our out-of-memory approach.

Aaron Lun (12:56:08): > As for file formats: I was just going to make some general comments on how BioC can help DCP file format choices. Don’t know if I should be more specific. If you give me a short sentence to say, I’ll try to remember it.

Davide Risso (14:05:40): > <!channel>Are you all happy with the presentation? If so I can make a pdf and send it to Jeremy.

Raphael Gottardo (14:07:09) (in thread): > Agree, I wanted to mention the fact that we can advise the DCP as opposed to saying we should select HDF5.

Davide Risso (18:01:23): > For those of you not in the czi slack, there is currently a discussion on curating a collection of datasets for benchmarks

Davide Risso (18:01:54): > I think that ExperimentHub is a very natural venue for that

Peter Hickey (18:03:43): > do you have the address for that slack? i couldn’t find it

Davide Risso (18:04:41): > https://cziscience.slack.com/join/shared_invite/enQtMzQ1NDQxNzAwNTMzLTRmNjIzZTkzMTYzY2NlM2YzYmE4NTYxNDczZmI2YTQ4YmU5NjdlMTA0MzljMDY4ODY2NWEyNGM1N2E1NTBjMmI

Kasper D. Hansen (20:05:41): > That was great guys!

Kasper D. Hansen (20:05:55): > I never knew Aaron had such passion for ontologies

Wolfgang Huber (20:18:20): > On the topic of efficient computation, it might be worth having another look at renjin - not to replace regular R, but to send some compute-intensive tasks to. See e.g. https://twitter.com/wolfgangkhuber/status/930823845543587840 and docs.renjin.org/en/latest/package/ - Attachment (twitter): Attachment > Impressive…. > From: “Using Renjin as an R Package” (http://docs.renjin.org/en/latest/package) https://pbs.twimg.com/media/DOrzI7eWsAA4WeL.jpg

Peter Hickey (20:43:08) (in thread): > Great job!

Stephanie Hicks (20:48:58): > I’m guessing the presentation was today? How did it go? Also how was Day 1?

Davide Risso (21:03:01): > Day 1 is not over yet :tired_face:

Davide Risso (21:05:46): > the presentation was fine, Peter Kharchenko approached me to say that it would be good to have at least some coercion from his pagoda object to SingleCellExperiment

Davide Risso (21:08:35): > perhaps we can convince him to switch to SingleCellExperiment

2018-04-25

Stephanie Hicks (10:31:22): > oh, that would be good. Hopefully Peter means he’s willing to develop the coercion function?

Stephanie Hicks (10:31:58): > I haven’t really used pagoda, so I’m not sure how difficult that would be

Aaron Lun (12:07:25): > Yes, the idea would be to have a coercion function on their end.

Aaron Lun (12:07:53): > Otherwise implementing a coercion in SingleCellExperiment would be crazy, we’d have to support every man and his dog.

Davide Risso (12:10:24): > yes, coercion is on their end, but they asked for help figuring it out

Davide Risso (12:11:06): > he mentioned that he was having compatibility issues between Matrix and Bioconductor

Stephanie Hicks (12:14:01): > so i heard a rumor there were bonfires and s’mores on the beach yesterday. What’s on tap today?

Davide Risso (12:17:07): > Beach volley or Santa Cruz boardwalk or hiking in the forest :sunglasses:

Keegan Korthauer (12:18:32): > @Davide Risso you neglected beach yoga!

Davide Risso (12:18:50): > Right! Sorry! :slightly_smiling_face:

Davide Risso (12:19:13): > I’m so into yoga that it didn’t even register

Keegan Korthauer (12:19:17): > I’m not offended; I chose hiking!

Kasper D. Hansen (12:20:27): > I was very tempted by the beach yoga, but I think I’ll do hiking as well

Stephanie Hicks (12:21:28): > too many great choices! have fun you guys and plz send pics :slightly_smiling_face:

Davide Risso (12:22:45): > Sorry that you couldn’t make it!

Stephanie Hicks (12:23:34): > ha, no apologies needed!

Vince Carey (15:00:55): > We have a repo at https://github.com/vjcitn/biocHCA where we collect notes on tasks to do at this meeting, if possible. If you would like push access to the repo just give me the github id to use - Attachment (GitHub): vjcitn/biocHCA > biocHCA - Notes on Bioconductor tasks for HCA 2018 meeting

Stephanie Hicks (16:07:20): > @Stephanie Hicks pinned a message to this channel.

Daniel Van Twisk (21:23:22): > I need to ask a question regarding LoomExperiment. Since we are planning on increasing the interoperability between SingleCellExperiment and seurat in hopes of them eventually adopting SingleCellExperiment objects, I am unsure of what the future of LoomExperiment is with regards to storing the added slots of SingleCellExperiment. I need to point out that loomR does have a method of exporting seurat objects to loom format using Convert() and if we are making SingleCellExperiment objects more like seurat, the functionality seems to exist there.

Aaron Lun (22:59:30): > I think we still need a LoomExperiment distinct from loomR, because hdf5r seems crappy and we want to use our HDF5Array machinery. We also need it to be distinct from a SCE as there’s weird slots in the Loom file that I don’t really want to support generally in a SCE, but should (in theory) be supported for whoever wants them.

2018-04-26

Daniel Van Twisk (00:27:16): > Okay. I’m going to try to implement LoomExperiment with the new SingleCellExperiment slots. I’m a bit more optimistic about things now since it seems that loomR has found a way around some of our initial concerns.

Martin Morgan (05:14:53): > Not sure whether this has been resolved face-to-face, but ‘we’ need a plan for ‘trainee’ presentations for Thursday. Who is up for that?

Kasper D. Hansen (12:12:30): > So @Aaron Lun told me yesterday about HDF5 cache size and its impact on performance in beachmat. I think this is critical information we should make sure we internalize while we are together (@Mike Smith, @Peter Hickey)

Aaron Lun (12:15:29) (in thread): > Nominating @Keegan Korthauer and @Mike Smith.
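
A back-of-the-envelope model shows why the chunk cache size matters for row-wise access to a chunked HDF5 matrix. This is a deliberately simplified sketch: it assumes chunks are either all kept or all evicted between consecutive rows, and it does not describe beachmat's actual logic. The 1 MiB figure is used as the HDF5 default per-dataset chunk cache, which is worth confirming for the library version in use.

```python
import math


def chunks_per_row(ncol, chunk_cols):
    """Number of chunks that one full row passes through."""
    return math.ceil(ncol / chunk_cols)


def cache_bytes_for_row_access(ncol, chunk_shape, itemsize=8):
    """Cache needed to hold every chunk touched by one row."""
    chunk_rows, chunk_cols = chunk_shape
    return chunks_per_row(ncol, chunk_cols) * chunk_rows * chunk_cols * itemsize


def chunk_reads_for_all_rows(nrow, ncol, chunk_shape, cache_bytes, itemsize=8):
    """Approximate chunk reads when iterating over all rows in order."""
    chunk_rows, chunk_cols = chunk_shape
    per_row = chunks_per_row(ncol, chunk_cols)
    needed = cache_bytes_for_row_access(ncol, chunk_shape, itemsize)
    if cache_bytes >= needed:
        # Every chunk is read once and reused for the rows it contains.
        return math.ceil(nrow / chunk_rows) * per_row
    # Simplification: chunks are evicted before reuse, so each row re-reads.
    return nrow * per_row


# Hypothetical 10,000 x 10,000 matrix of doubles in 100 x 100 chunks.
shape, chunk = (10_000, 10_000), (100, 100)
needed = cache_bytes_for_row_access(shape[1], chunk)
small = chunk_reads_for_all_rows(*shape, chunk, cache_bytes=1 * 1024**2)
large = chunk_reads_for_all_rows(*shape, chunk, cache_bytes=16 * 1024**2)
print(needed, small, large)
```

Under these assumptions a cache that is too small costs a hundredfold more chunk reads, which is the kind of gap that makes cache size worth setting deliberately.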

Keegan Korthauer (12:25:58) (in thread): > Happy to!

Davide Risso (14:54:28): > @Davide Risso uploaded a file: LinearEmbeddingMatrix.png - File (PNG): LinearEmbeddingMatrix.png

Vince Carey (16:07:39): > here’s a request for the final discussion – in describing the SingleCellExperiment work, explain clearly what the colData component is, and ask that it be easy to generate a highly informative colData from the HCA metadata about an experiment.

Keegan Korthauer (16:58:22): > @Vince Carey by the final discussion, do you mean the 5 min presentation? If so, I think we should have room for this as long as @Mike Smith only has 1 slide and about 2 minutes of explanation to discuss parallel hdf5

Keegan Korthauer (16:59:45): > @Keegan Korthauer uploaded a file: @wizard_of_oz @daviderisso prototype LEM slide and commented: Does it make sense? Any more dim red methods to add? More ‘interpretation’ boxes to add on the right? - File (PDF): @wizard_of_oz @daviderisso prototype LEM slide

Aaron Lun (17:00:15): > Projection is probably more important than factor properties, right?

Keegan Korthauer (17:00:25): > Ah yes

Vince Carey (17:01:38): > @Keegan Korthauer, yes i was hoping that this could be suggested somewhere in your brief slides … if it fits. if not, no problem!

Davide Risso (17:04:03): > @Keegan Korthauer I would also mention low-rank reconstruction / approximation of the original data

Keegan Korthauer (17:05:31): > Any ideas for pictures to represent projection and low-rank representations?

Davide Risso (17:11:08): > I think just the word projection instead of clustering could work with the same picture

Davide Risso (17:12:09): > Low rank approximation could be heatmaps? :thinking_face:

Keegan Korthauer (17:48:20): > @Keegan Korthauer uploaded a file: new & improved LEM slide - File (PDF): new & improved LEM slide
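
Projection and low-rank reconstruction both fall out of the same two pieces, per-sample factors and per-feature loadings. The sketch below uses a plain truncated SVD in NumPy to show the two operations. The function names are hypothetical and this is not the LinearEmbeddingMatrix class itself, only an illustration of what such an object would let one compute.

```python
import numpy as np


def fit_embedding(x, k):
    """Return (feature means, loadings, sample factors) from a rank-k PCA."""
    mean = x.mean(axis=0)
    u, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    loadings = vt[:k].T          # features x k
    factors = u[:, :k] * s[:k]   # samples x k
    return mean, loadings, factors


def project(new_x, mean, loadings):
    """Place new samples into the existing low-dimensional space."""
    return (new_x - mean) @ loadings


def reconstruct(factors, mean, loadings):
    """Low-rank approximation of the original data."""
    return factors @ loadings.T + mean


# Toy data of exact rank 3 (30 samples x 8 features).
rng = np.random.default_rng(0)
x = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 8))

mean, loadings, factors = fit_embedding(x, k=3)
approx = reconstruct(factors, mean, loadings)
reprojected = project(x, mean, loadings)
print(np.abs(approx - x).max())
```

Because the toy data has rank 3, three components reconstruct it to within floating-point error; on real data the residual is what a heatmap comparison would show.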

Davide Risso (17:48:54): > Nice!

Keegan Korthauer (17:56:34): > ok added it to the slide deck along with a slide with the full SingleCellExperiment graphic, where I’ll explain colData and importance of a compatible HCA metadata representation

Mike Smith (18:31:22): > One slide from me added, 2 mins should be fine!

Aaron Lun (19:45:51): > @Keegan Korthauer worth mentioning that the LEM is tested and documented!

Keegan Korthauer (19:46:46): > Definitely!! :smile:

Aaron Lun (19:46:49): > which is like 90% of the work.

Davide Risso (19:47:23): > is there still time to add the codecov badge to the slide? :wink:

2018-04-30

Davide Risso (12:52:09): > Relevant to some of our discussions: https://twitter.com/xkcdcomic/status/990960967373635584?s=21 :upside_down_face: - Attachment (twitter): Attachment > Python Environment https://xkcd.com/1987/ https://m.xkcd.com/1987/ https://pbs.twimg.com/media/DcCZv4WV4AELKaI.jpg

2018-05-09

Stephanie Hicks (21:06:42): > Is there a plan to submit a Birds of a Feather session on progress made for the HCA-CZI project? Would be good to present to the larger BioC community what we’ve been up to the past year and what we still have planned.

2018-05-10

Stephanie Hicks (06:13:38): > Is someone already in the process of writing up a short proposal (http://bioc2018.bioconductor.org/call-for-abstracts)? If not, I’m happy to start that. - Attachment (BioC 2018): BioC 2018: Where Software and Biology Connect > Where Software and Biology Connect. July 25 - 27, Toronto, Canada.

Martin Morgan (06:58:01): > Would be great if you did that Stephanie…

Stephanie Hicks (11:27:40): > I started a birds of a feather proposal for the HCA-CZI project. If everyone could edit directly, that would be helpful. Due date is May 17. https://docs.google.com/document/d/1Zmc78gvHaXvpuT5f9ey5-T_so3fUmQ5ywAeHR-vMA5s/edit?usp=sharing

Davide Risso (14:52:20): > Thanks @Stephanie Hicks! Looks good! I’m not sure I have much to add, except perhaps that we could discuss possible alternatives to HDF5 for data representation

2018-05-11

Aedin Culhane (17:08:40): > @Vince Carey Happy to contribute to S4 dimensionality reduction class. I did a comparison of different implementations of PCA recently. The “naming” convention of the matrices is all over the place!! Summary is at https://github.com/aedin/ODSC_2018/tree/master/PCA_vignette - Attachment (GitHub): aedin/ODSC_2018 > ODSC_2018 - Slides, tutorial presented at ODSC 2018

Aedin Culhane (17:14:48): > @Aaron Lun quick basic question. I have some bulk and scRNAseq data with too few n that I need to compare. The scRNAseq data was prepared using different tissue dissociation protocols and the question is which protocol is best. However the scRNAseq counts are low, so the data QC is an issue. What tips/suggestions do you have? Thanks ;-))))

2018-05-12

Aaron Lun (10:44:00) (in thread): > Hm. What library prep protocol did they use?

Aaron Lun (10:46:22) (in thread): > The most obvious thing to do is to check the proportion of reads mapping to mitochondria, or to spike-ins. Suboptimal dissociation should cause some cell damage and increase those proportions.
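
The mitochondrial check suggested above can be sketched in a few lines. This is a minimal illustration in plain Python, not code from the thread: the function name `mito_fraction`, the `mt-` gene prefix and the toy counts are all assumptions made for the example, and a real analysis would use an established QC package on the actual count matrix.

```python
def mito_fraction(counts, gene_names, prefix="mt-"):
    """Return, per cell, the fraction of counts assigned to mitochondrial genes.

    counts: list of per-cell lists of counts, one entry per gene.
    gene_names: gene symbols aligned with the entries of each cell.
    prefix: assumed naming convention for mitochondrial genes (hypothetical).
    """
    is_mito = [name.lower().startswith(prefix) for name in gene_names]
    fractions = []
    for cell in counts:
        total = sum(cell)
        mito = sum(c for c, m in zip(cell, is_mito) if m)
        # Cells with no counts at all get NaN instead of a division error.
        fractions.append(mito / total if total > 0 else float("nan"))
    return fractions


# Toy data, invented for illustration only.
genes = ["Actb", "mt-Co1", "mt-Nd1", "Gapdh"]
toy_counts = [
    [50, 5, 5, 40],    # 10 of 100 counts are mitochondrial
    [10, 30, 30, 30],  # 60 of 100 counts are mitochondrial
]
fractions = mito_fraction(toy_counts, genes)
```

Comparing the distribution of these fractions between dissociation protocols is one way to look for the cell damage described above; the same calculation applies to spike-in transcripts by changing the gene selection.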

Aaron Lun (10:47:59) (in thread): > The other obvious thing is to just throw everything together and form some clusters (hoping for negligible batch effects, if they were sensible and processed everything at the same time). A not-so-good dissociation protocol might be losing cells of a particular type, which should manifest as the depletion of cells from particular clusters.
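
The depletion check described above amounts to comparing cluster composition across protocols. The sketch below assumes cluster labels have already been obtained from some joint clustering; the function name `cluster_proportions`, the protocol labels and the toy assignments are hypothetical.

```python
from collections import Counter


def cluster_proportions(protocols, clusters):
    """Return {protocol: {cluster: proportion of that protocol's cells}}."""
    totals = Counter(protocols)
    pairs = Counter(zip(protocols, clusters))
    all_clusters = sorted(set(clusters))
    return {
        p: {c: pairs[(p, c)] / totals[p] for c in all_clusters}
        for p in sorted(totals)
    }


# Toy assignments, invented for illustration: protocol "B" contributes
# far fewer cells to cluster 2 than protocol "A" does.
protocols = ["A"] * 10 + ["B"] * 10
clusters = [1] * 5 + [2] * 5 + [1] * 9 + [2] * 1
props = cluster_proportions(protocols, clusters)
```

A cluster whose proportion drops sharply under one protocol is a candidate for a cell type lost during dissociation, provided batch effects are negligible as noted above.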

2018-05-16

Stephanie Hicks (06:24:27): > I plan to submit the BOF proposal this afternoon. If you wish, feel free to make edits or suggestions.

Stephanie Hicks (11:04:38): > BOF proposal submitted: https://github.com/Bioconductor/BioC2018/issues/5 - Attachment (GitHub): SIG: Statistical Analysis and Comprehension of the Human Cell Atlas in R/Bioconductor · Issue #5 · Bioconductor/BioC2018 > Introduction of yourself: Bioconductor developers involved in the Chan Zuckerberg Initiative (CZI) to develop collaborative computational tools for the Human Cell Atlas (HCA). Should it be held dur…

2018-06-22

Stephanie Hicks (05:54:01): > This seems relevant@Aedin Culhanehttp://science.sciencemag.org/content/early/2018/06/20/science.aat5691 - Attachment: Attachment > Hi . What are the main competitors to 10x. Also which scRNAseq approaches retain spatial info - Attachment (Science): Three-dimensional intact-tissue sequencing of single-cell transcriptional states > Retrieving high-content gene-expression information while retaining 3D positional anatomy at cellular resolution has been difficult, limiting integrative understanding of structure and function in complex biological tissues. Here we develop and apply a technology for 3D intact-tissue RNA sequencing, termed STARmap (Spatially-resolved Transcript Amplicon Readout Mapping), which integrates hydrogel-tissue chemistry, targeted signal amplification, and in situ sequencing. The capabilities of STARmap were tested by mapping 160 to 1,020 genes simultaneously in sections of mouse brain at single-cell resolution with high efficiency, accuracy and reproducibility. Moving to thick tissue blocks, we observed a molecularly-defined gradient distribution of excitatory-neuron subtypes across cubic millimeter-scale volumes (>30,000 cells), and discovered a short-range 3D self-clustering in many inhibitory-neuron subtypes that could be identified and described with 3D STARmap.

2018-07-12

Stephanie Hicks (10:46:29): > For #bioc2018, we have a BOF session scheduled for 11am on developer day to update the bioC community on our ongoing project with the CZI-HCA and to get feedback. I started a set of google slides (https://docs.google.com/presentation/d/1fhXUcnJvjRC-_Z3SA4_5x9tIXl186oNTKXUw3ogfnHk/edit?usp=sharing) to present. Could each of the 8 groups add 2-3 slides describing (i) what they proposed to do, (ii) any progress you’ve made / ideas you’re thinking about, and (iii) timeline / future for what you plan to do in the next year? I pre-filled out blank slides with the titles of the 8 projects to organize it a bit.

Aaron Lun (11:59:03): > Hm.

Aaron Lun (11:59:10): > Got to remember what I proposed to do.

Vince Carey (17:32:28): > @Stephanie Hicks thanks for doing this. More generally, has anyone seen today’s post by Tim Tickle concerning Brain Atlas activities related to clustering? It sounds like there will be calls and Bioc teams could get involved.

Vince Carey (17:33:14): > https://cziscience.slack.com/archives/C8ME9JMBP/p1531427694000183

Davide Risso (18:37:48): > @Vince Carey I’m part of the Brain Initiative group

Davide Risso (18:38:32): > I don’t think there are currently any open calls (i.e. $$$) related to this

Davide Risso (18:39:04): > This is a consortium made of awardees from last year Brain initiative grants

2018-07-13

Raphael Gottardo (03:43:49): > Have you guys seen this: > PAR-18-844 https://grants.nih.gov/grants/guide/pa-files/PAR-18-844.html Investigator Initiated Research in Computational Genomics and Data Science (R01 Clinical Trial Not Allowed) > NIH/NHGRI > Deadline: November 16, 2018; July 16, 2019; November 16, 2019; July 16, 2020; November 16, 2020; July 16, 2021 > The purpose of this funding opportunity announcement (FOA) is to invite applications for a broad range of research efforts in computational genomics, data science, statistics, and bioinformatics relevant to one or both of basic or clinical genomic science, and broadly applicable to human health and disease. This FOA supports fundamental genomics research developing innovative analytical methodologies and approaches, early stage development of tools and software, and refinement or hardening of software and tools of high value to the biomedical genomics community. Work supported under this FOA should be enabling for genomics and be generalizable or broadly applicable across diseases and biological systems. All applications should address how the methods would scale to address larger and larger data sets. - Attachment (grants.nih.gov): PAR-18-844: Investigator Initiated Research in Computational Genomics and Data Science (R01 Clinical Trial Not Allowed) > NIH Funding Opportunities and Notices in the NIH Guide for Grants and Contracts: Investigator Initiated Research in Computational Genomics and Data Science (R01 Clinical Trial Not Allowed) PAR-18-844. NHGRI

Raphael Gottardo (03:44:18): > I wonder if we should try to write an application to continue some of the work we’ve proposed to CZI?

Raphael Gottardo (03:45:14): > Also, should we think about writing a review paper on single-cell genomics analysis with Bioconductor? Sort of like the @Wolfgang Huber et al. paper.

Aaron Lun (05:37:31): > I’m up for that.

Stephanie Hicks (05:55:12): > those are great ideas @Raphael Gottardo. I’m up for that too

Aaron Lun (06:28:30) (in thread): > I’ve added a couple of slides on what I’ve done so far.

Kasper D. Hansen (09:14:04): > sure

Davide Risso (17:23:36): > Happy to participate as well!

Martin Morgan (19:39:31): > :thumbsup:

2018-07-16

Stephanie Hicks (09:26:41) (in thread): > thank you!

Stephanie Hicks (09:30:40): > As #bioc2018 is next week, just wanted to send a friendly reminder to @Vince Carey @Aedin Culhane @Martin Morgan @Daniel Van Twisk @Mike Smith @Raphael Gottardo @Mike Jiang @Kasper D. Hansen @Peter Hickey @Davide Risso @Rafael Irizarry @Keegan Korthauer et al to add a few slides (see message below) for the BOF session on developer day. Thank you everyone! - Attachment: Attachment > For #bioc2018, we have BOF session scheduled for 11am on developer day to update the bioC community on our ongoing project with the CZI-HCA and to get feedback. I started a set of google slides (https://docs.google.com/presentation/d/1fhXUcnJvjRC-_Z3SA4_5x9tIXl186oNTKXUw3ogfnHk/edit?usp=sharing) to present. Could each of the 8 groups could add 2-3 slides describing (i) what they proposed to do, (ii) any progress you’ve made / ideas you’re thinking about, and (iii) timeline / future for what you plan to do in the next year. I pre-filled out blank slides with the titles of the 8 projects to organize it a bit.

Martin Morgan (11:13:32) (in thread): > Daniel and I will add a slide on plans for HCA data access (currently slide 6). Do we want something about DelayedArray (currently slide 11)?

Stephanie Hicks (11:52:29) (in thread): > Feel free to move things around. Just wanted to put something down for people to edit.

2018-07-19

Vince Carey (00:06:35): > Thanks again Stephanie – @Aedin Culhane I have added some material that should suffice but your comments/changes welcome.

Brendan Innes (14:17:29): > @Brendan Innes has joined the channel

2018-07-24

Stephanie Hicks (20:35:34): > @here If anyone has slides to add for the BOF session tomorrow, could you add them tonight?

2018-07-25

Sehyun Oh (10:45:41): > @Sehyun Oh has joined the channel

Stephanie Hicks (12:00:14): > Here is a list of questions from the audience at the BOF: > > * Pete asking to create a formal place (e.g. github issue) for a wish list of algorithms that we want/need to be implemented. e.g. we have committed to do PCA, k-means, linear models, etc with the CZI, but what are others (to engage the larger bioC community)? > * Mike asking whether we can get summary-level information (e.g. medoid profiles for cell types in HCA data, similar to GTEx); not clear that we are ready for this though, because it’s not clear we have defined cell types > * Lori asking how much we have worked with 10X Genomics people. Should we ask 10X Genomics for help with 3rd party development?
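
To make the "medoid profile" idea from the second question concrete: a medoid is the observed profile with the smallest total distance to all other profiles in its group. The sketch below is an illustration only, using Euclidean distance, invented two-gene profiles and hypothetical labels; it presupposes that cell type labels exist, which is exactly the open issue raised above.

```python
def medoid_index(profiles):
    """Index of the profile with the smallest total Euclidean distance to the rest."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    totals = [sum(dist(p, q) for q in profiles) for p in profiles]
    return min(range(len(profiles)), key=totals.__getitem__)


def medoids_by_label(profiles, labels):
    """Return {label: medoid profile} for each group of cells sharing a label."""
    groups = {}
    for profile, label in zip(profiles, labels):
        groups.setdefault(label, []).append(profile)
    return {label: members[medoid_index(members)] for label, members in groups.items()}


# Toy expression profiles (two genes per cell), invented for illustration.
profiles = [[0, 0], [1, 0], [10, 0], [5, 5], [6, 5], [7, 5]]
labels = ["T", "T", "T", "B", "B", "B"]
summary = medoids_by_label(profiles, labels)
```

Unlike a mean profile, the medoid is an actual cell, so it is less affected by outliers such as the `[10, 0]` profile in the toy data. The pairwise distance computation is quadratic in group size, so a large atlas would need subsampling or an approximate method.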

Aaron Lun (12:11:16): > I did email 10X about putting some Bioconductor software at https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/rkit. No response.

Stephanie Hicks (14:05:41): > Interesting @Aaron Lun. I mentioned that question to @Martin Morgan and he said @Aedin Culhane might know someone at 10X?

Aaron Lun (14:06:10): > ¯\_(ツ)_/¯

Aaron Lun (14:06:50): > That would help.

Aaron Lun (14:07:09): > I didn’t chase it up though.

Stephanie Hicks (14:09:21): > Is there a “living document” that gets updated with each BioC release depicting a high level (maybe tree like webpage?) overview of what all is available in BioC? I know there is the How To for Methods and Classes (https://bioconductor.org/developers/how-to/commonMethodsAndClasses/), and you can search for something if you happen to know it exists (https://bioconductor.org/packages/release/BiocViews.html), but there were a ton of talks this morning containing packages that I hadn’t heard of. And I’m not sure how else I would have found them otherwise.

Aaron Lun (14:17:30): > It would be nice to have an automatically generated D3 graph that people can use to explore the Bioconductor ecosystem if they don’t know what they’re looking for.

Aaron Lun (14:17:39): > Colourable by biocViews, etc.

Stephanie Hicks (14:18:16): > oooh yeah, that would be nice

Stephanie Hicks (15:15:18): > Another idea discussed on coffee break with @Davide Risso & @Peter Hickey: what if you used NLP on the description in each BioC package to try and create similarities between packages for the D3 graph?
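
One simple way to turn package descriptions into similarities, offered here only as a sketch of the idea above, is TF-IDF weighting followed by cosine similarity. The package names and description strings below are made up, and the smoothed IDF formula is one common choice among several; nothing here is taken from an existing Bioconductor tool.

```python
import math
import re
from collections import Counter


def tfidf_vectors(descriptions):
    """Map each package name to a {term: tf-idf weight} dictionary."""
    docs = {
        name: Counter(re.findall(r"[a-z0-9]+", text.lower()))
        for name, text in descriptions.items()
    }
    n_docs = len(docs)
    # Document frequency: in how many descriptions each term occurs.
    doc_freq = Counter(term for counts in docs.values() for term in counts)
    return {
        name: {
            term: tf * math.log((1 + n_docs) / (1 + doc_freq[term]))
            for term, tf in counts.items()
        }
        for name, counts in docs.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical package names and descriptions, for illustration only.
descriptions = {
    "pkgA": "clustering of single cell RNA-seq data",
    "pkgB": "normalization and clustering of single cell data",
    "pkgC": "visualization of genomic intervals and annotation tracks",
}
vectors = tfidf_vectors(descriptions)
sim_ab = cosine(vectors["pkgA"], vectors["pkgB"])
sim_ac = cosine(vectors["pkgA"], vectors["pkgC"])
```

The resulting pairwise similarities could serve as edge weights in a package graph. Terms that appear in every description receive zero weight under this IDF formula, which keeps words like "of" from linking unrelated packages.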

Aedin Culhane (15:20:58): > Hi @Aaron Lun, yes I know a couple of people at 10x and I went out there earlier in the year (Feb) and gave a presentation on Bioc. At the time I showed them Martin’s 10X package and DelayedArray. I asked them to update the website’s info on Bioconductor. It never happened. Deanna Church is the main bioinfo person there that you need to contact.

Aedin Culhane (15:21:32): > deanna.church@10xgenomics.com

Davide Risso (15:30:30): > Aaron somebody beat you at the D3 package representation of packages!

Stephanie Hicks (15:31:10): > Boom

Stephanie Hicks (15:35:00): > Thank you @Shian Su! Looking forward to trying out BiocExplorer

2018-07-28

Vince Carey (18:55:47): > I thought our BOF session was very good. What should we do next? A white paper spelling out available results? The situation with the preview data seems unfavorable relative to expectations; the fact that Luyi Tian quantified the ischemic sensitivity data (using scPipe) was made known on slack but has not been acknowledged on the site. As @Elana Fertig mentioned to me, maybe future funding will be determined by our demonstration of self-organization and independent progress, so a) writing something up and b) making some noise about it may be in order.

Elana Fertig (18:55:52): > @Elana Fertig has joined the channel

Aaron Lun (19:44:07): > For me, I would like to see a <10 hour pipeline for 1 million cells from counts to clusters.

Aaron Lun (19:50:22): > Though putting my cynical hat on, this is more a demonstration of capability, rather than any pretense of scientific utility. Working on a 100,000 cell data set (or specifically, sitting next to someone who is working on it), I would say that raw computational time is negligible compared to trying to figure out what the results mean, e.g., interpretation of the clusters can take ~1 week of communication with domain experts.

2018-07-29

Vince Carey (14:20:24): > Can we think about scalability as opposed to clock time? Taking the preview ischaemic sensitivity data as a target, we have one quantification by Luyi, done prior to the availability of alevin. Is anyone considering quantifying this dataset with alevin/has anyone done it? If not – and if there is no dataset considered a more salient demonstration – I will take it on.

Vince Carey (14:25:30): > I would also propose that we do not downplay the scientific element of data architecture and design as fundamental components of the enterprise. Systematic benchmarking of out-of-memory methods undertaken thus far is important and should be brought to full maturity. This and other “pre-biological” elements of these workflows have to be given a lot of respect and resources for the long-run biology to be successful.

Davide Risso (15:24:45): > I agree with @Vince Carey that a well done benchmarking of out-of-memory methods would be an important scientific resource, and not trivial given what @Kasper D. Hansen and @Peter Hickey were saying about speed dependence on disk performance

Martin Morgan (18:49:15): > is there not room for some commentary / development of appropriate statistical / computational methods for working with summaries (better than down-sampling) of large data? This seems like the right thing to do – fast exploratory analysis followed by comprehensive analysis only when the way forward is clear

2018-07-30

Aedin Culhane (17:18:31): > Hi. Has anyone processed the immune cell data at https://preview.data.humancellatlas.org/ ?

2018-07-31

Aaron Lun (08:38:55): > Thx @Aedin Culhane for the email to 10x :party_parrot:

Aedin Culhane (15:28:23): > That parrot is getting everywhere… @Sean Davis has started something!!!!

Aedin Culhane (15:28:48): > @Aaron Lun did you see I also tweeted your workflow and added them to tweet ;-)))

Aaron Lun (16:07:54): > Oh - that’s nice to know. I’m not on Twitter so I don’t keep track of these things.

Neke Ibeh (16:19:55): > @Neke Ibeh has joined the channel

2018-08-01

Aedin Culhane (00:06:52): > ahhh each time I check slack and see one of those dancing parrots……..

2018-08-06

Kasper D. Hansen (14:01:21): > > As @Elana Fertig mentioned to me, maybe future funding will be determined by our demonstration of self-organization and independent progress, so a) writing something up and b) making some noise about it may be in order.

Kasper D. Hansen (14:02:11): > I think this is important to think about. I think benchmarking is important to us, but will not matter much to funders. I think getting Monocle and Seurat to use SingleCellExperiment would be a great achievement given how excited the CZI people were about this prospect

Kasper D. Hansen (14:02:41): > I think that access to the downloads via ExperimentHub would be very important

Kasper D. Hansen (14:02:59): > I guess nothing has happened re. definition of the API but perhaps it is time to make some noise about this

Kasper D. Hansen (14:03:04): > We might even have some influence …

Aaron Lun (14:03:51): > I think I saw @Daniel Van Twisk ask about the API on the HCA slack

Kasper D. Hansen (14:04:23): > But for future potential funding we need to declare ourselves a success and that has to be reflected in achievements of course

Aaron Lun (14:05:20): > yeah

Kasper D. Hansen (14:05:22): > @Aaron Lun Ok. Too many channels with too little relevant action

Kasper D. Hansen (14:05:47): > But are the decision makers on these channels? Perhaps an email to our PO about who to contact might be good

Aaron Lun (14:06:04): > Well all I see are DCP people throwing out updates on the schema

Kasper D. Hansen (14:06:08): > Saying that we have watched slack but we don’t understand if anything/what is happening

Aaron Lun (14:06:13): > Like “changing biological_sex to sex”.

Kasper D. Hansen (14:07:03): > I don’t understand the overall design, but I have also paid zero attention. All I am saying is that (a) we have great experience to contribute and (b) it will look good to be hungry for development

Kasper D. Hansen (14:07:26): > Like “we need this to move our end forward. What exactly is the timeline, who are the important people and channels?”

Kasper D. Hansen (14:07:37): > As I see it we lose nothing by being a bit pushy

Aaron Lun (14:10:16): > I guess I could talk to John to get the number of someone we can talk to at the coalface

Aaron Lun (14:11:22): > But the restful interface should have been out now, so maybe we should just do it rather than talk to people.

Kasper D. Hansen (14:20:33): > is “should have” = “has been” or is it vaporware?

Kasper D. Hansen (14:20:54): > Also, I think we should ask John, but we should also - for psychological reasons - consider asking the CZI people

Aaron Lun (14:21:12): > ¯\_(ツ)_/¯

Aaron Lun (14:21:17): > let me have a look at the slack channel

Aaron Lun (14:21:53): > there’s this: https://github.com/HumanCellAtlas/data-store but I don’t know half the words in the README. - Attachment (GitHub): HumanCellAtlas/data-store > data-store - Design specs and prototypes for the HCA Data Storage System (“blue box”)

Aaron Lun (14:30:43): > And there’s this: https://github.com/HumanCellAtlas/dcp-cli - Attachment (GitHub): HumanCellAtlas/dcp-cli > dcp-cli - HCA Data Coordination Platform Command Line Interface

Aaron Lun (14:31:46): > guess we could slap together something that mimics it via ExperimentHub

Martin Morgan (16:25:21): > @Daniel Van Twisk has started work on HCABrowser. I’ll talk with him some more, now that I’ve looked at the page-with-half-known-words, and perhaps we can quickly establish our own test data storage system for all to play with

Daniel Van Twisk (20:10:42): > I messaged some dcp people regarding authorization to use some of the locked features of the data storage platform. They said that they are looking to open these features to the public in the near future (not sure how near that is). It may be good to set up a test data storage platform if our needs exceed their making the platform available, but it would be much easier to just get access to theirs.

Kasper D. Hansen (21:08:49): > Perhaps if we go through the right persons we can be elevated above “the public”

Daniel Van Twisk (21:24:23): > An email from a whitelisted domain (DSS_SUBSCRIPTION_AUTHORIZED_DOMAINS_ARRAY) is needed to access authorized features. Maybe if we can get a token from someone who has a whitelisted email we can access it? > > DSS_SUBSCRIPTION_AUTHORIZED_DOMAINS_ARRAY=(chanzuckerberg.com ucsc.edu broadinstitute.org ebi.ac.uk {human-cell-atlas-travis-test,broad-dsde-mint-{dev,test,staging}}.iam.gserviceaccount.com)
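
For anyone wanting to check quickly whether an address would fall under that list, the sketch below expands the shell brace pattern by hand and tests the domain part of an email. It is an assumption that the data store compares domains by exact match; how the DSS actually performs the check has not been verified here, and the function name `is_authorized` and the example addresses are hypothetical.

```python
# Domains copied from the array quoted above, with the brace pattern
# {human-cell-atlas-travis-test,broad-dsde-mint-{dev,test,staging}} expanded by hand.
AUTHORIZED_DOMAINS = [
    "chanzuckerberg.com",
    "ucsc.edu",
    "broadinstitute.org",
    "ebi.ac.uk",
    "human-cell-atlas-travis-test.iam.gserviceaccount.com",
    "broad-dsde-mint-dev.iam.gserviceaccount.com",
    "broad-dsde-mint-test.iam.gserviceaccount.com",
    "broad-dsde-mint-staging.iam.gserviceaccount.com",
]


def is_authorized(email, domains=AUTHORIZED_DOMAINS):
    """Return True if the email's domain exactly matches a listed domain.

    Exact matching is an assumption made for this sketch; the real service
    may apply different rules.
    """
    local, sep, domain = email.strip().lower().rpartition("@")
    return bool(sep and local) and domain in domains


# Hypothetical addresses, for illustration only.
ebi_ok = is_authorized("someone@ebi.ac.uk")
other_ok = is_authorized("someone@example.org")
```

Under the exact-match assumption, an address at a subdomain of a listed domain would not pass.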

2018-08-07

Daniel Van Twisk (13:22:13): > They messaged me again and said they are looking to make the resource public before the end of the year.

2018-08-08

Vince Carey (11:35:45): > That doesn’t sound suitable for collaborative development. But @Aaron Lun should have an ebi.ac.uk email address, no? I have a broadinstitute email address and can poke in if that will be helpful. But you have to give me the explicit steps. After we get a sense of what is at stake, technically, we should have a dialogue with Jeremy Miller about how to reduce siloing.

Aaron Lun (11:36:58): > I don’t have an EBI address, because I’m officially hired by Cancer Research UK. I try to avoid going to the EBI, it’s not a pleasant place.

Aaron Lun (11:37:21): > However, there are a few people in the group who do have this address, and if the steps are simple, I can ask them to help out.

Vince Carey (11:39:56): > :+1:

Kasper D. Hansen (13:45:33): > I think for political reasons we should get in touch with CZI people. It shows informal desire to move along

Aaron Lun (14:03:56): > I agree, and at least Marcus has been responsive in chats with @Raphael Gottardo and @Mike Jiang.

Elana Fertig (20:07:37): > We are looking to create standardized outputs for latent spaces in the Nov 5 meeting. It fits in very naturally with the single cell class. Does someone want to join in?

Elana Fertig (20:08:07): > I’m sure you would be welcome at the meeting - we are already doing that for CoGAPS

Elana Fertig (20:08:29): > And need to think about standardized outputs to text for sharing with python

Elana Fertig (20:08:36): > It would be very low hanging fruit.

Elana Fertig (20:08:53): > Plus a fun meeting with donuts.

Martin Morgan (20:15:07): > one thing about the HCA access is that authorization is required for actions that modify data, so there is still a great deal to do in simply accessing existing (such as it is) resources.

2018-08-10

Martin Morgan (11:36:28): > @channel CZI would like to check in on how things are going, next Thursday 8/16 8am Pacific / 11am Eastern / 4pm Cambridge. Please thumbs up (one per grant) if you’re able to come, thumbs down otherwise.

Kasper D. Hansen (11:38:14): > We should have a list of “wants”, like the testing permission

Davide Risso (12:19:48): > I have another meeting at that time that is very difficult to move, perhaps @Stephanie Hicks can represent our grant

2018-08-11

Martin Morgan (19:45:41): > Here’s a link for meeting prep notes, including a link to the CZI-provided agenda: https://docs.google.com/document/d/12q13l1DiCXc8GcglZD07O_nmrG9UGZUhSCyDQqFzYpk/edit?usp=sharing

2018-08-13

Martin Morgan (12:16:38): > some organization added to the doc above; would be great to have further bullet-point progress directed toward project aims. Also is there anything meaningful to say about collaboration with the larger group of CZI researchers?

Aaron Lun (13:57:01): > I’ve been doing a fair bit of work that could be called “HCA-related” but most of it relates more to John’s other grant.

Aaron Lun (13:57:44): > And I collaborate with other HCA-involved groups but not on HCA stuff, e.g., iSEE.

Davide Risso (14:31:03): > A paper describing the clusterExperiment package has been accepted in PLOS CompBio as a software note and this was work by @Elizabeth Purdom and me (among others) related to clustering. We do have some rudimentary HDF5 support and we definitely want to include the mini-batch k-means into the clusterExperiment framework once it’s mature so this could be worth mentioning.

Aaron Lun (14:31:04): > Was I meant to put stuff in the aims, or was I meant to put stuff in “updates”?

Aaron Lun (14:31:19): > Well, I’ve moved it to the “updates” in any case.

Davide Risso (14:32:21): > I should say “provisionally accepted”

Martin Morgan (14:42:13): > I meant to have the ‘progress’ section have each aim introduce updates; the ‘updates’ were initial notes from Davide (I guess…). Can you update the ‘progress’ section instead?

Martin Morgan (14:43:15): > I like the idea of ‘looking ahead’ so inserted some structure there; there are numbered sections for material to be filled in

Aaron Lun (14:45:19): > done

Aaron Lun (14:46:52): > From my end this feels like 30K GBP worth of work.

Aaron Lun (14:47:06): > Well, once the beachmat refactor is complete, anyway.

Aaron Lun (18:05:03): > Do we have details on the call? URL, etc.?

Martin Morgan (18:12:07): > No details yet; I’ll forward contact info to ‘Fiona Griffin’ of those with thumbs up

2018-08-14

Aedin Culhane (10:03:47): > @Elana Fertig could the in press review count as Bioc/Fertig interaction…. ?

Elana Fertig (10:04:41): > It definitely could

Elana Fertig (10:04:45): > I don’t know

Elana Fertig (10:04:55): > I get the sense they want to see collaborations forming through the consortium

Elana Fertig (10:05:05): > I dunno — this is all rumor I hear — it may not be founded on anything

Elana Fertig (10:05:49): > We are also making a push that our latent spaces tools that are in R use the infrastructure of #singlecellexperiment, which we will encourage the group to do as well

Davide Risso (10:08:09): > I think that will be a good point to emphasize since they seem to care a lot about interoperability

Elana Fertig (10:08:57): > @Davide Risso maybe we can arrange a skype call or something for the latent spaces meeting on Nov 5

Elana Fertig (10:09:02): > we are going to be focusing on output formats

Elana Fertig (10:09:08): > for that reason

Elana Fertig (10:09:17): > it would be great to have the R stuff standardized

Davide Risso (10:09:30): > sure! Happy to organize a skype call… what is the Nov 5 meeting?

Elana Fertig (10:09:34): > and think about how we can get the python code to feed into something similar

Elana Fertig (10:09:37): > on the czi channel

Elana Fertig (10:09:40): > check out the latent spaces

Elana Fertig (10:09:44): > I’ll send you a message

Elana Fertig (10:09:52): > it’ll be in person in Philly if you can make it

Davide Risso (10:10:21): > thanks! And sorry but too many channels!

Elana Fertig (10:11:27): > too many slacks!!!

Elana Fertig (10:11:31): > I just sent a message to the group there

Elana Fertig (10:11:41): > I’m biased — I think it would be really helpful

Elana Fertig (10:11:44): > thanks for being open to it!

Aedin Culhane (11:19:11): > NSForest github is now up @Vince Carey https://github.com/JCVenterInstitute/NSForest

Aedin Culhane (11:19:55): > @Elana Fertig are you following the hca slack? There seem to be a million channels; which do you recommend?

Elana Fertig (12:52:47): > Hard to know.

Elana Fertig (12:52:55): > For you latent spaces and multiplication

Elana Fertig (12:53:04): > Multiomics

Aedin Culhane (15:01:48): > thanks @Elana Fertig but I can’t find that channel. Can you invite me?

Elana Fertig (15:02:04): > sorry — it’s on the CZI initiative one

Elana Fertig (15:02:05): > not HCA

Aedin Culhane (16:00:44): > I’m not on the CZI slack. Is it closed? Or can you send an invite?

Raphael Gottardo (20:01:48): > @Martin Morgan Do we have call info for Thursday?

Kasper D. Hansen (20:08:04): > Yes and we got an invite

Raphael Gottardo (20:08:47): > I don’t think I got the invite. Can you forward it to me? I will try to join. I am not 100% sure I can yet.

Martin Morgan (22:05:10): > forwarded to your fhcrc address; our working notes https://docs.google.com/document/d/12q13l1DiCXc8GcglZD07O_nmrG9UGZUhSCyDQqFzYpk/edit?usp=sharing with a link to their agenda

2018-08-15

Raphael Gottardo (13:14:37): > Thanks!

Vince Carey (17:03:14): > I do not see an invite in my mbox. Hints on how to search properly or a link would be appreciated.

Martin Morgan (17:11:19): > forwarded under ‘Organizing a group call’

Vince Carey (17:15:20): > thanks!

Mike Jiang (22:47:19): > Is this the call link? https://czi.zoom.us/j/464345375 I didn’t receive the email either - Attachment (Zoom Video): Join our Cloud HD Video Meeting now

2018-08-16

Vince Carey (05:08:56): > that looks right but I will forward the email to you @Mike Jiang

Aaron Lun (05:50:15): > I might be a bit late (~10 min), depends on how fast a previous meeting finishes up.

Martin Morgan (08:56:55): > @Mike Jiang yes

Kasper D. Hansen (09:00:51): > It’s in 2h, right? I am getting confused by all the talk this far in advance …

Davide Risso (09:32:21): > can you forward the email to me as well? in the remote case that I can get out of my other meeting…

Davide Risso (09:32:50): > and yes it’s at 11am

Martin Morgan (09:49:06): > @Keegan Korthauer (are you the right person?) is there a bullet point for Aim 3 at https://docs.google.com/document/d/12q13l1DiCXc8GcglZD07O_nmrG9UGZUhSCyDQqFzYpk/edit?usp=sharing

Keegan Korthauer (10:02:39) (in thread): > I’ll ping @Rafael Irizarry about this - thanks!

Martin Morgan (10:04:31) (in thread): > ok thanks; our meeting is at 11 this morning, so… (I’ll send an invite to you in case you’re able to attend)

Stephanie Hicks (11:31:05): > sorry if I missed this. what is the nov meeting? Are any of us going?

Kasper D. Hansen (11:38:41): > I found a SGC genomics meeting at the Broad at end of Oct. Registration closed

Kasper D. Hansen (11:39:00): > I am assuming what they are saying is that they expect a HCA meeting right next to this general SC meeting

Vince Carey (12:05:12): > Nice discussion. Apropos “R scripts” as an approach to getting regular builds on substantial resources, I hope that a github repo can be established for the purpose of contributing/maintaining these scripts and docker files. I don’t know that it is appreciated that the compute engine must run the appropriate version of R, the devel branch of Bioc and the up-to-the-minute versions of all relevant infrastructure. Presumably the dockerfiles will encode all these constraints.

Aaron Lun (12:05:46): > @Stephanie Hicks I’m going to the SGC meeting. Though just because I go where I’m sent.

Kasper D. Hansen (12:06:44): > Yes, I can help setup the docker image

Kasper D. Hansen (12:06:55): > It will be useful for other reasons as well

Vince Carey (12:07:07): > And make a github repo and invite us … I agree that it will be a good resource in its own right.

Kasper D. Hansen (12:07:23): > Not that I am a docker expert but I have people around me who are (like James Taylor)

Aaron Lun (12:07:46): > I can throw in a couple of scripts. With enough (~10) cores it should take < 1 day to finish for the 1M cells

Kasper D. Hansen (12:08:09): > We’ll let them pay for it:slightly_smiling_face:

Kasper D. Hansen (12:08:21): > We just want it to be fast eventually

Kasper D. Hansen (12:08:29): > I mean even faster

Aaron Lun (12:09:06): > The current bottleneck is…

Aaron Lun (12:09:07): > the PCA.

Aaron Lun (12:10:50): > Putting this up again, for those of you who have forgotten: https://github.com/Bioconductor/TENxBrainAnalysis - Attachment (GitHub): Bioconductor/TENxBrainAnalysis > TENxBrainAnalysis - R scripts for analyzing the 1.3 million brain cell data set from 10X Genomics

Aaron Lun (12:11:04): > … which I no longer have write access to.

Martin Morgan (12:13:57): > LTLA has write access to the github repos

Aaron Lun (12:14:08): > oh.

Aaron Lun (12:14:37): > I assumed that got cleaned out when the permissions got reset, given that I’m not a Bioc org member anymore.

Stephanie Hicks (13:27:38): > so besides the Nov meeting, do we have other “deadlines” that we’ll need to be thinking about? e.g. in March? I know@Martin Morganasked about timelines to Marcus, but it wasn’t clear to me what all was happening at the end of this project.

Aaron Lun (13:28:03): > ¯\_(ツ)_/¯

Aaron Lun (13:28:20): > I haven’t heard anything from John about this either.

Stephanie Hicks (13:31:04): > Another question @Davide Risso and I were talking about yesterday: is there a plan to create one R package for the algorithms in Aim 4 (https://docs.google.com/document/d/12q13l1DiCXc8GcglZD07O_nmrG9UGZUhSCyDQqFzYpk/edit#heading=h.xw66qghh66k7)?

Aaron Lun (13:38:00): > I think one R package would be a bit much

Aaron Lun (13:38:14): > Makes more sense to have one package for each class of algorithms

Aaron Lun (13:41:29): > easier to maintain and manage, and also for users to choose what they need

Davide Risso (14:52:25): > Are there some notes / a summary of today’s meeting?

Martin Morgan (14:55:38): > I think the agenda and linked pages, although these were only updated a little during the call. https://docs.google.com/document/d/1rxOzl5b_ihJztop9hRyhUfa9V3wSSIwP7abZzxvxrXU/edit?usp=sharing

Kasper D. Hansen (14:58:20): > My plan is to start by making an R package for the PCA. Not that I am against being part of a bigger thing, but I am predicting a lot of work before it is production ready

Kasper D. Hansen (14:58:34): > and yes @Aaron Lun I think of the PCA almost daily these days

Davide Risso (15:09:01): > thanks @Martin Morgan!

Marcus Kinsella (17:04:31): > @Marcus Kinsella has joined the channel

2018-08-28

Stephanie Hicks (12:50:09): > could someone send me a link to the hca-czi meeting happening in Nov at Broad?

Stephanie Hicks (13:06:55): > the only meeting I can find is this one: http://www.weizmann.ac.il/conferences/SCG2018/, but I thought the CZI was having a satellite meeting?

Marcus Kinsella (13:09:01): > i think it’s an HCA general meeting nov 1-2

Stephanie Hicks (13:09:31): > do you happen to have a link?

Marcus Kinsella (13:10:27): > i do not, and i unfortunately doubt the existence of one

Stephanie Hicks (13:11:11): > do you know who I might contact about how to get registered?

Marcus Kinsella (13:13:13): > i’d ask jonahjcool@chanzuckerberg.com. it’s not a czi meeting, but he usually knows who you’re supposed to talk to

Stephanie Hicks (13:13:29): > awesome, thanks @Marcus Kinsella

2018-08-29

Kasper D. Hansen (14:34:56): > Ok Stephanie and I have sent some emails to Jonah about this. To the other PIs here: I took the liberty of suggesting that we provide a short (common to every subaward) update on our work at this meeting…. if some of us actually manage to join

2018-09-05

Martin Morgan (11:42:36): > <!channel> CZI Seed Networks RFA https://www.chanzuckerberg.com/science/rfa/seed-networks is posted. Thoughts? - Attachment (The Chan Zuckerberg Initiative): Request for Applications | Seed Networks for the Human Cell Atlas > The Chan Zuckerberg Initiative invites applications for three-year projects to form Seed Networks for the Human Cell Atlas (HCA).

2018-09-07

Martin Morgan (05:57:47): > <!channel> any comments on the CZI RFA?

Aaron Lun (06:00:55): > Well, at least there’s no requirement for gut work.

Stephanie Hicks (06:40:42): > Thanks for the reminder about this Martin! yes, @Kasper D. Hansen, @Davide Risso and I had some discussions about this a few weeks back. At that time, we knew there would be another RFA, but not the details. Now we know the details, so I’m curious what they think. I’m very much interested in submitting another bioc application. I’m curious about the CZI’s definition of a computer scientist: “At least one Principal Investigator of the Seed Network must be a computer scientist or software engineer.”

Stephanie Hicks (06:42:26): > Also, the RFA seems to stress they want to fund “new networks” of individuals e.g. “to support the continued growth of nascent projects and to incubate new networks”. It’s not clear to me that we fall into that category. However, for a lot of what they are looking for, we are definitely a good fit.

Stephanie Hicks (06:44:38): > In the eligibility section, it says multiple PIs can apply to join multiple seed network applications. so us submitting a bioc application as a group would not prevent one PI submitting an application on his/her own too. that’s nice.

Stephanie Hicks (06:45:50): > I got the sense they want the applications to be very focused on the biology (not necessarily gut as @Aaron Lun mentioned) e.g. “These projects should establish the HCA as a resource for applications such as clarifying genetic variants associated with disease, cell type-specific drug toxicity, or therapeutic applications”

Stephanie Hicks (06:48:59): > especially the list of examples in the sentence below “Specific examples may include, but are not limited to”

Stephanie Hicks (06:50:43): > But there are a lot of possibilities for what we could propose next after this year’s application. I’m curious to hear others’ thoughts?

Davide Risso (10:41:12): > I would definitely be in favor of submitting another bioc-wide application if there is interest

Davide Risso (10:42:06): > I think that we could be included in the definition of computer scientist, the way it’s written in the first paragraph of “Seed Networks” is “including at least one computational biologist or software engineer”

Davide Risso (10:42:38): > I think the definition is broad enough to include people like us

Davide Risso (10:43:43): > I also think that we fall into the definition of “the continued growth of nascent projects”

Davide Risso (10:44:04): > it would be strange if they wouldn’t consider for funding the networks that were part of the pilot phase

Davide Risso (10:44:40): > otherwise, participating in the pilot phase would disqualify candidates for getting the full funding

Davide Risso (10:44:54): > I think they use this language to encourage PIs that submitted individual applications to submit proposals together in the spirit of the groups that they created in Aptos

Davide Risso (10:45:45): > but our group was a bit of an outlier since we formed it independently

Raphael Gottardo (10:46:08): > I agree, I think we could and should submit a renewal. I think we would need to make it very HCA specific (i.e. why is it important for HCA).

Kasper D. Hansen (11:01:21): > Re. “computer scientist” - it is referred to as “computational biologist” elsewhere on the RFA. It is therefore a bit unclear what exactly they mean.

Kasper D. Hansen (11:02:16): > I also think we could and should. I do think there is some chance that they’ll prefer to fund teams which include data generators, but I think we can make a good case for being important infrastructure

Marcus Kinsella (16:04:34): > hey if you have questions, you can email sciencegrants@chanzuckerberg.com, somebody should respond pretty quickly

Raphael Gottardo (16:07:01): > Thanks @Marcus Kinsella @Martin Morgan Do you want to take the lead on this?

2018-09-14

Raphael Gottardo (11:56:27): > @Martin Morgan Have you been thinking of the RFA and getting some info?

Martin Morgan (16:40:29): > yes I’ve been thinking, I’ll reach out now..

Raphael Gottardo (18:12:00): > Ok, thanks Martin.

2018-09-18

Stephanie Hicks (19:03:12): > @Kasper D. Hansen and I emailed CZI to ask a few Qs about the Seed Network RFA. > > 1) The RFA is pretty clear that a team has to include computational biologists. Can a team be exclusively computational? > 2) How big do you envision the teams to be - i.e. are you thinking of funding a few large teams or many small teams? > 3) Are you envisioning each network to be tissue specific or can it be cross-tissue analyses? > 4) Does each network have to advance all 4 of the overarching scientific goals (build teams, contribute data, support comp bio, develop new tech and benchmark datasets) or “just” some of them? > 5) Are you willing to give feedback on a short overview of some ideas? (It is not uncommon for an NIH PO to respond to a specific aims page prior to review). > > FWIW, these were their responses: > > For (1): Yes, a Seed Network can be entirely computational. However, the goals of the Seed Networks should be considered and a grounding on the progress of the HCA is critical - so a computational focus that is not tied to the biological goals of the HCA may not be ideal. There is no requirement for a non-computational lab. > > For (2): Teams must consist of 3 or more co-principal investigators (co-PIs). In SMApply (the application portal) the applicant can list up to 15 co-PIs. More than 15 applicants will require you to contact us at sciencegrants@chanzuckerberg.com. > > For (3): There is no requirement for a single tissue of interest - if you look at the example projects (https://www.chanzuckerberg.com/science/rfa/seed-networks), they touch on tissue-specific questions but also cross-tissue analysis. Both are critical for the progress of the HCA. > > For (4): There is no requirement that a proposal addresses all 4 of the goals, although we encourage proposals to touch on as many as possible. These goals are not presented as an eligibility criteria.
> > For (5): We are looking for bold, innovative ideas and high risk/high impact projects and strongly encourage applicants to think creatively and ambitiously. To be fair to all applicants, we are not able to offer feedback on proposals in advance of submission. - Attachment (The Chan Zuckerberg Initiative): Request for Applications | Seed Networks for the Human Cell Atlas > The Chan Zuckerberg Initiative invites applications for three-year projects to form Seed Networks for the Human Cell Atlas (HCA).

2018-09-24

Kim-Anh Lê Cao (21:46:01): > @Kim-Anh Lê Cao has joined the channel

2018-10-04

Levi Waldron (12:55:59): > @Levi Waldron has joined the channel

Martin Morgan (16:42:00): > If you’re interested in participating in a Bioconductor ‘Seed Network’ proposal please take a second to complete the doodle poll for a meeting next week. https://doodle.com/poll/r4dr8ygwd2upein8 - Attachment (doodle.com): Doodle: Bioc / CZI Seed Network > Doodle radically simplifies the process of scheduling events, meetings, appointments, etc. Herding cats gets 2x faster with Doodle. For free!

Martin Morgan (16:42:59): > Oops, should have tagged the doodle meeting poll <!channel>

Stephanie Hicks (20:31:55): > Thanks Martin! If only the 10th works for most people, I’m happy to try and move meetings.

2018-10-08

Stephanie Hicks (05:48:57): > @Martin Morgan did we decide on a time for this week? Oct 11th from 12-1pm EST has most votes?

Martin Morgan (22:48:46): > <!channel> Oct 11 12-1; https://bluejeans.com/742063948

2018-10-09

BJ Stubbs (13:56:34): > @BJ Stubbs has joined the channel

2018-10-14

Vince Carey (07:54:36): > sorry to have missed the call, is there an agenda or minutes document?

Martin Morgan (08:38:18): > https://docs.google.com/document/d/1aK8ZjGHVRoaojWb_SnfNCVltIbsVwQb7Cb5RoY50ffQ/edit?usp=sharing

2018-10-15

hcorrada (08:10:28): > Awesome! I am looking forward to working on viz item with @Stephanie Hicks and @Davide Risso!

hcorrada (08:11:53): > Also, who would be on the imputation item? I have a student thinking about this and could help

Stephanie Hicks (08:12:42): > @hcorrada Davide, Elizabeth and I had that in our original czi proposal last year (but it got cut when our budget got cut).

Stephanie Hicks (08:12:53): > so I mentioned it as a possibility of something we could work on for this proposal

Stephanie Hicks (08:12:59): > would love to work with you on this

hcorrada (08:13:07): > Great!

hcorrada (08:13:53): > I think we’re meeting for palmtree soon, we could make it next week? We can discuss both items then

Stephanie Hicks (08:26:41): > that works. Did we settle on a time? If not, we probably should coordinate soon with @Davide Risso because of the time difference

Davide Risso (08:29:38): > happy to discuss both items next week! Any time in the AM EST should work for me!

hcorrada (08:30:30): > Awesome. I’ll be in touch to set up a meeting

Kevin Rue-Albrecht (14:57:53): > @Kevin Rue-Albrecht has joined the channel

2018-10-17

Raphael Gottardo (23:51:24) (in thread): > I have added a few comments to the doc.

2018-10-18

Federico Marini (05:11:08): > @Federico Marini has joined the channel

Kasper D. Hansen (14:31:54): > HCA GENERAL MEETING, CAMBRIDGE, MA, NOVEMBER 1-2, 2018 > Hosted by the Broad Institute of MIT and Harvard. > > This meeting will focus on the work of the networks to build atlases of key organs including lung, gut, brain, immune, heart, vasculature, kidney and liver. We will hear updates on progress in data collection and analysis, discuss plans for organ-wide mapping, provide updates on the activities of the HCA Working Groups and hear about the Data Coordination Platform. > > While participation in person in the HCA General Meetings is by invitation, all are welcome to register here (https://docs.google.com/forms/d/e/1FAIpQLSfhqe30vUOhNOFKZDBJ8EZqAufPb0p3IRNhh3P4wXKTAR7Wcw/viewform) to join remotely. They will be listed as formal attendees. We will also be live streaming all plenary sessions and breakout sessions. > > Any inquiries regarding this meeting should be sent to meetings@humancellatlas.org. > > For more information, please visit https://www.humancellatlas.org/news/27

Kasper D. Hansen (14:32:23): > This was posted in the HCA slack channel today. This is the meeting we have heard about elsewhere

2018-10-24

Gabriele Sales (05:54:09): > @Gabriele Sales has joined the channel

2018-10-27

Martin Morgan (13:30:11): > I’ve started to outline the proposal at https://docs.google.com/document/d/1aK8ZjGHVRoaojWb_SnfNCVltIbsVwQb7Cb5RoY50ffQ/edit?usp=sharing Please take the current text (title and abstract) with a grain of salt. If you’re interested in participating, we need to develop an overall budget ASAP, and to start to enumerate areas where we wish to contribute. There are highlighted sections to be completed in the proposal; please do so by MONDAY OCTOBER 29

2018-10-29

Stephanie Hicks (16:14:33): > Thanks Martin! I’ve added a section on interactive graphics

Aedin Culhane (17:27:48): > For visualization.. they have included Morpheus from the Broad for heatmaps.. It can be run through R Shiny https://integration.data.humancellatlas.org/analyze/visualization-components/visualization-components/

Federico Marini (17:35:38): > :wave: joining the whole discussion as a newcomer, also on behalf of the iSEE devel team

Federico Marini (17:36:25): > Would the package be worth considering in this scope?

Federico Marini (17:37:19): > in brief: take any SummarizedExperiment object, view different panels of its data, with plots and tables linked by brushing, and a spray of reproducibility with self-generated code

Stephanie Hicks (20:10:03): > oh cool! thanks @Aedin Culhane

Stephanie Hicks (20:10:14): > I think that’s a great idea @Federico Marini

2018-10-30

Charlotte Soneson (13:37:19): > @Charlotte Soneson has joined the channel

2018-10-31

Jayaram Kancherla (09:34:53): > @Jayaram Kancherla has joined the channel

Raphael Gottardo (16:54:42): > @Rob Amezquita will be helping write our part on clustering RNA-seq and CITE-seq

Rob Amezquita (16:54:45): > @Rob Amezquita has joined the channel

Raphael Gottardo (16:55:32): > @Rob Amezquita See google doc above. I have also posted some questions on the #singlecellexperiment channel regarding multi-omics data structure for single-cell data.

2018-11-11

Stephanie Hicks (10:14:20): > @Martin Morgan I added one paragraph under Goal 3 for data visualization

Stephanie Hicks (10:16:38): > unfortunately, I won’t be able to make the blue jeans meeting on Monday, but happy to help contribute more writing.

2018-11-12

Stephanie Hicks (07:35:04): > @Martin Morgan my meeting was canceled! :blush: I’ll be there at 11am EST

Kasper D. Hansen (08:49:43): > I’m teaching at that time.

Stephanie Hicks (11:55:35): > @Martin Morgan demo to use for iSEE https://marionilab.cruk.cam.ac.uk/iSEE_pbmc4k/

Stephanie Hicks (11:55:45): > and https://community-bioc.slack.com/archives/C8BJLSP8T/p1542041651304800 - Attachment: Attachment > Here is the home page: https://marionilab.cruk.cam.ac.uk There are other two examples, including a CyTOF example to show off with measurements for 172,791 cells

Stephanie Hicks (11:56:13): > Maybe we include a screenshot of the pbmc4k iSEE demo?

Federico Marini (12:13:53): > FWIW, I did take a video of that. No voice, just mouse movements to follow the tour

Kevin Rue-Albrecht (12:21:16): > FWIW2, i forgot to finish my thought on the #isee channel. [The live demos are probably the most straightforward toys…], but I’ve also made Docker containers of them here, to make a point about reproducibility and portability: https://hub.docker.com/u/kevinrue/

Stephanie Hicks (12:38:41): > Thank you @Kevin Rue-Albrecht!

Stephanie Hicks (12:43:40): > @Davide Risso @Aedin Culhane @Martin Morgan here is my commitment letter

Davide Risso (13:06:10): > Thank you @Stephanie Hicks!

Stephanie Hicks (15:16:42): > you’re welcome @Davide Risso! no idea if that’s actually what they want. but at least it emphasizes my commitment :joy:

Kasper D. Hansen (16:12:06): > And you’re even using our new letterhead. Have you made it into an actual template?

Aedin Culhane (19:39:31): > Did you see Gary Bader’s paper on F1000 https://f1000research.com/articles/7-1522/v1 - Attachment (f1000research.com): F1000Research Article: scClustViz – Single-cell RNAseq cluster assessment and visualization. > Read the latest article version by Brendan T. Innes, Gary D. Bader, at F1000Research.

2018-11-13

Stephanie Hicks (05:59:52): > Cool thanks!

Aaron Lun (06:00:53): > I think @Brendan Innes is floating around somewhere.

Davide Risso (06:09:39): > @Stephanie Hicks fyi in the letter it should be “Chan Zuckerberg Initiative” not institute

Davide Risso (06:10:20): > I know because I was shamelessly copy-pasting :grimacing:

Stephanie Hicks (06:32:54): > oops thanks!

Stephanie Hicks (06:35:23): > @Martin Morgan I put together this figure: https://docs.google.com/presentation/d/1kyK4HqMG9V7b3UDWu1D_Q84CnIxbx4Y45m3mp-TAEkQ/edit?usp=sharing in my sleepy state last night it somehow seemed better, but oh well! Feel free to edit & use it or not. totally fine either way.

Stephanie Hicks (06:35:42): > One comment from @Davide Risso: “One small comment on the figure, I would replace the vignette() statement from the scalability section (people may not even know what a vignette is) and I would add a pryr::object_size() instead to show that it doesn’t use much RAM”

Stephanie Hicks (06:37:13): > @Martin Morgan if you run that code chunk and copy and paste it with the output in the google doc I can add it

Stephanie Hicks (06:39:47): > @Martin Morgan following @Davide Risso’s correction, can you use this letter of commitment instead of the one I previously sent you? - File (PDF): 2018-CZI-Bioconductor-Hicks-LetterofCommitment.pdf

Martin Morgan (06:56:45) (in thread): > > > pryr::object_size(tenx) > 200 MB > > But I don’t know if that’s impressive or not; the bulk of the data are actually the column names! > > > pryr::object_size(colnames(tenx)) > 115 MB > > pryr::object_size(colData(tenx)) > 195 MB >

Martin Morgan (06:57:28) (in thread): > can you send this to me by email?

Stephanie Hicks (07:34:12) (in thread): > sent

Stephanie Hicks (07:38:59) (in thread): > updated!

Stephanie Hicks (07:52:10): > the subheadings look soooo much better @Aedin Culhane thank you!

Federico Marini (11:52:17): > @Stephanie Hicks not so essential, but I spotted a typo in the slide

Federico Marini (11:52:33): > the final line in panel D should read pryr::object_size

Martin Morgan (12:05:38): > the proposal body is down to 1967 words, so please review especially your sections and correct only egregious problems…

Martin Morgan (15:52:29): > I’ll take a snapshot of the HCA document now, for final off-line tweaks before submission

Martin Morgan (16:35:46): > The submit button has been pressed, thanks for the effort on this!

Federico Marini (16:40:02): > Even if only taking part from the sideline, well done :wink:

Stephanie Hicks (18:00:08): > @Martin Morgan just a heads up. Apparently the CZI portal is down.

Kasper D. Hansen (18:01:34): > He has already submitted

Stephanie Hicks (18:02:34): > :sweat_smile:

Martin Morgan (18:45:38): > I mean, where do these so-called emoji come from?

Stephanie Hicks (19:33:53): > Lol I’m betting @Leonardo Collado Torres

2018-11-14

Aedin Culhane (12:33:48): > Thanks for all of your hard work Martin

Aedin Culhane (16:03:22): > Not sure where to post this, but resources for images/trees on cell lineages/predicted gene lists for cell types etc. http://tcs.cira.kyoto-u.ac.jp/research-e.html

Stephanie Hicks (22:12:04): > Thank you @Martin Morgan!!

2018-12-04

Aaron Lun (12:32:43): > @Davide Risso add “human-cell-atlas” tag to SingleCellExperiment.

Davide Risso (13:09:27): > Done! Added bioconductor too since github suggested it…

Aaron Lun (13:10:19): > I added “bioconductor-package”, which might be a bit more on-point.

Aaron Lun (13:11:10): > probably should also add “single-cell-rna-seq”

Aaron Lun (13:17:30): > I just noticed that mbkmeans purports to do nearest-neighbors. You know we have BiocNeighbors? Consider concentrating k-NN algorithmic development there.

Aaron Lun (13:22:22): > In particular, more approximate algorithms would be appreciated.

2018-12-05

Davide Risso (07:03:59): > Yeah, that’s an old purpose… we decided to make it a more focused package only on k-means

Davide Risso (07:05:51): > I changed the readme file yesterday (I think) to reflect this. Is there anywhere else where we say that the purpose is nearest neighbors?

Davide Risso (07:11:25): > Oh I see… the description! Fixed now!

2019-01-23

Martin Morgan (09:09:21): > <!channel> I think the initial funding period runs until the end of February. I think we should hold a virtual symposium (maybe 15 minute presentations from each group, with round-table discussion at the end?) highlighting our accomplishments. This would be a good opportunity to make the successes more widely known. I’ve set up a doodle poll and invite your participation; I think we have east & west coast and European participation, so plausible times are probably in the 11 - 2 time frame. We can advertise this in the HCA community more broadly when the schedule firms up just a bit… https://doodle.com/poll/vhnh5xh6iuigeacz. Please let others on your team know… - Attachment (doodle.com): Doodle: Bioc HCA Symposium > Doodle radically simplifies the process of scheduling events, meetings, appointments, etc. Herding cats gets 2x faster with Doodle. For free!

Stephanie Hicks (09:10:22): > Great idea Martin!

Aaron Lun (09:10:47): > Done.

Kasper D. Hansen (09:15:24): > Speaking of which, does anyone know if there is a mechanism for a no-cost extension?

Marcel Ramos Pérez (10:37:10): > @Marcel Ramos Pérez has joined the channel

Marcus Kinsella (11:06:57) (in thread): > Yes there is. I’d ping Jonah or Fiona in cziscience slack or email

2019-01-25

Aedin Culhane (12:09:50): > Hi. Which datasets are good for estimating the expected stochastic variability in scRNAseq? So given a population of cells of the same type, if we identify two populations, are they “real” or just an artifact of noise in the expt? Is there a good replicate/gold standard dataset for estimating expected variability?

2019-01-29

Martin Morgan (17:06:45): > <!channel> The symposium has been scheduled for 12 - 2pm Eastern time, Wednesday 20 February. > > Meeting URL: https://bluejeans.com/272538961 Schedule: https://docs.google.com/document/d/1xjUWJ5-WLFyuAmDPQrHNIzD_gkD4IVxquwZlJrdFCG8/edit?usp=sharing Please edit the schedule to include your presentation. Feel free to share the bluejeans link with your colleagues. Please indicate intention to attend by clicking on the ‘thumbs up’ emoji (or not-too-ambiguous emoji of your choice) below.

2019-01-30

Kim-Anh Lê Cao (01:58:59) (in thread): > @Aedin Culhane We may have data for you: https://www.biorxiv.org/content/10.1101/433102v2 we are pushing a new update of this manuscript

2019-01-31

Aedin Culhane (07:40:15): > Thanks

Martin Morgan (12:14:23): > What would we need to do to identify and use a real cohort from the HCA, using Bioc tools developed during this funding? Maybe as a first pass identify ‘manually’ or outside of R data that would be interesting, and then figure out how to express this using HCABrowser, retrieve using HCAMatrixBrowser, ingest using SingleCellExperiment / LoomExperiment etc…

Kasper D. Hansen (12:18:07): > Let me see if I can parse this. You’re suggesting showing off a workflow where we do something like this on the 20th? Or is this more long range?

Martin Morgan (13:43:03): > yes, illustrating this on the 20th

2019-02-07

Davide Risso (11:49:59): > Hi everyone, perhaps it’s just me, but I wasn’t immediately clear on the goal and target audience of the February 20th symposium.

Davide Risso (11:50:08): > I think that Martin’s idea is to advertise it broadly as a way to show what we have done for this round of funding.

Davide Risso (11:50:34): > I have the feeling that, for a variety of reasons, we cannot yet deliver a convincing start-to-end workflow for large single-cell data analysis. For instance, there are some issues with subsetting and random access of HDF5 files (see Levi’s comments on the #bigdata-rep channel) that make a lot of the analysis steps sub-optimal.

Davide Risso (11:50:49): > I also think that we are very close to having a satisfactory workflow and I’m afraid that rushing this at the end of the month will make it look like an incomplete effort.

Davide Risso (11:51:01): > In our specific case, we have a working clustering package, but we still have to perform extensive tests and benchmarking both in terms of accuracy and speed/efficiency.

Davide Risso (11:51:23): > Personally, I think that an internal round-table of what we have accomplished so far as a group is very valuable and might be a first step to perhaps produce a more polished final report on the project (perhaps to be delivered at Bioc2019?) that would be closer to the final deliverable of the Bioconductor group.

Davide Risso (11:51:31): > I wanted to at least discuss this before the 20th and see what others think.

Aaron Lun (12:44:17): > An end-to-end may be doable, if someone has the clustering step fixed. I spent some time working on Rtsne to make it scalable.

Aaron Lun (12:44:34): > And I’m helping out on the uwot package.

Davide Risso (12:59:52): > We have a working clustering method but not extensively tested

Davide Risso (13:00:00): > Or benchmarked

Aaron Lun (13:03:25): > Does it run fast enough is all I really care about ATM.

Davide Risso (13:11:59): > Minibatch k-means is fast, but if you need to extract the mini batches from hdf5 because the data don’t fit in memory you pay a big price
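
For readers unfamiliar with the method under discussion, here is a minimal sketch of mini-batch k-means (the per-center step-size update described by Sculley, 2010), in plain Python rather than R and entirely in memory. It is not the mbkmeans implementation; `minibatch_kmeans`, its parameters and the toy data are hypothetical. The line that draws `batch` is the step that turns into a random read from disk when the data live in an HDF5 file.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def minibatch_kmeans(points, k, batch_size=32, n_iter=100, init=None, seed=0):
    """Toy mini-batch k-means; returns a list of k centers."""
    rng = random.Random(seed)
    # Start from user-supplied centers, or from k randomly chosen observations.
    start = init if init is not None else rng.sample(points, k)
    centers = [list(p) for p in start]
    counts = [0] * k  # number of points assigned to each center so far

    for _ in range(n_iter):
        # Draw a random mini-batch (on disk, this is the costly random read).
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assign every batch point to its nearest center, with centers held fixed.
        nearest = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in batch]
        # Move each center towards its points with a per-center step of 1 / count.
        for p, j in zip(batch, nearest):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = [(1 - eta) * c + eta * x for c, x in zip(centers[j], p)]
    return centers


# Two well-separated toy clusters around (0, 0) and (10, 10).
data_rng = random.Random(1)
points = [(data_rng.uniform(-0.5, 0.5), data_rng.uniform(-0.5, 0.5)) for _ in range(200)]
points += [(10 + data_rng.uniform(-0.5, 0.5), 10 + data_rng.uniform(-0.5, 0.5)) for _ in range(200)]

# Seed one center in each cluster so the toy run is easy to reason about.
centers = sorted(minibatch_kmeans(points, k=2, init=[points[0], points[-1]]))
```

Each iteration touches only `batch_size` observations, which is why the method is fast in memory and why the cost of fetching those observations dominates once the data are on disk.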

Davide Risso (13:12:34): > Perhaps we don’t care, but this is the kind of thing that is not optimized yet and can make a big difference imo

Aaron Lun (13:20:23): > You’ll be working on the PCs, so there’s no problem there.

Aaron Lun (13:20:31): > Everything is in memory at this point.

Davide Risso (13:26:58): > I only tested it today but it looks like irlba has the same tradeoff… it’s much slower for hdf5 data than for in memory data, isn’t it?

Davide Risso (13:27:54): > Anyway, I’m working on a complete example and I may have more specifics when that’s done

Davide Risso (13:28:20): > That is scran norm + pca + minibatch kmeans

Aaron Lun (13:30:22): > Of course, but we have to do the PCA anyway. Randomized SVD only involves a handful of matrix multiplications, throw them across 10 cores and it’ll be done in half an hour.
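
To make the “handful of matrix multiplications” concrete, here is a minimal randomized SVD sketch in the style of Halko, Martinsson and Tropp (2011), written with NumPy rather than R. It is an illustration only, not the code used in the analysis scripts; `randomized_svd`, its defaults and the toy matrix are assumptions. Every product involving `A` is one pass over the data, and each pass can in principle be split across row blocks and cores.

```python
import numpy as np


def randomized_svd(A, rank, n_oversample=10, n_power_iter=2, seed=0):
    """Approximate the top `rank` singular triplets of A."""
    rng = np.random.default_rng(seed)
    k = min(rank + n_oversample, min(A.shape))
    # Pass 1: project A onto k random directions.
    Y = A @ rng.standard_normal((A.shape[1], k))
    # Optional power iterations sharpen the subspace (two passes over A each).
    for _ in range(n_power_iter):
        Q, _ = np.linalg.qr(Y)
        Z, _ = np.linalg.qr(A.T @ Q)
        Y = A @ Z
    Q, _ = np.linalg.qr(Y)
    # Final pass: B is a small (k x n_cols) matrix that is cheap to decompose.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank]


# Toy check on an exactly rank-5 matrix.
rng = np.random.default_rng(42)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 50))
U, s, Vt = randomized_svd(A, rank=5)
s_exact = np.linalg.svd(A, compute_uv=False)[:5]
```

With the defaults above there are six products involving `A` in total; the QR and small SVD steps operate on matrices with only `k` columns or rows.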

Martin Morgan (14:56:26): > Returning to Davide’s question, I think it would be valuable to talk to more than ourselves, so advertise at least a little that the symposium is happening. I think this will be good scientifically (exposure to more potential use cases), building our ‘user base’, and politically (showing the outside world that we have been making progress).

Raphael Gottardo (16:35:38): > I agree with @Martin Morgan, everything is work in progress, but that’s when we should be getting feedback.

Kasper D. Hansen (20:13:02): > Depends on whether the purpose is feedback or (political) show off

2019-02-08

Martin Morgan (08:54:52) (in thread): > Either way don’t you want to talk to an audience? I view this as a very positive opportunity

Kasper D. Hansen (09:25:48) (in thread): > I think @Davide Risso’s point is that if we do this primarily for political reasons, we should make sure we have something to show off.

Martin Morgan (09:27:08) (in thread): > To me the idea is to show off our stuff, to ourselves and whoever will listen; ‘political’ is a distraction here

Stephanie Hicks (16:34:18) (in thread): > @Martin Morgan is the idea that the symposium is made publicly available? or what audience were you thinking?

Stephanie Hicks (16:40:31): > @Martin Morgan @Raphael Gottardo I agree that a symposium is a wonderful idea, but it’s not clear to me if we should be aiming to show a final product or just a progress update? Could you clarify if the goal is to show a final product or just generally describe what progress we have made? I think the concern for me @Davide Risso and @Elizabeth Purdom is that while we have a working clustering method, we are running into problems with extracting data from hdf5 files. Also, we haven’t really benchmarked or extensively tested it.

Stephanie Hicks (16:41:14): > Would it be sufficient to just present the mini-batch clustering method, but say that we are still working on how to most efficiently extract data from HDF5 files?

Raphael Gottardo (17:10:50): > Work in progress. I think it might be a good way to show that we’ve done quite a bit already but more work/funding is needed. So I think it’s perfectly reasonable to present it this way.

2019-02-13

Vince Carey (06:29:09): > @Stephanie Hicks would you give an example of the problems extracting data from hdf5 files? I am aware of the general material in bigdata-rep.

Mike Smith (08:10:21): > @Stephanie Hicks To echo Vince, I’m happy to discuss possible strategies for optimising either the structure of your HDF5 files, or how to apply rhdf5. Feel free to open an issue on your own Github repo and tag @grimbough (like the MSnbase guys did at https://github.com/lgatto/MSnbase/issues/403) or open an issue at https://github.com/grimbough/rhdf5/issues

Davide Risso (08:28:11): > Well, I think that what @Stephanie Hicks is referring to is very similar to what Levi was showing in #bigdata-rep

Davide Risso (08:29:27): > After subsetting and taking the log, it takes forever to run our clustering function on the DelayedMatrix, but if we realize the matrix in a new HDF5Matrix and apply the same function it’s much faster
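
A toy illustration of why realizing first can help, assuming the delayed log is re-applied every time a block is read and the clustering function makes many passes over the data. This is plain Python, not DelayedArray; `DelayedLog`, `row_sums` and the evaluation counter are hypothetical stand-ins.

```python
import math


class DelayedLog:
    """Wraps raw rows and applies log1p lazily, on every read."""

    def __init__(self, rows):
        self.rows = rows
        self.n_evals = 0  # how many log1p calls have been made so far

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, i):
        self.n_evals += len(self.rows[i])
        return [math.log1p(x) for x in self.rows[i]]

    def realize(self):
        """Apply the delayed operation once and return plain in-memory rows."""
        self.n_evals += sum(len(r) for r in self.rows)
        return [[math.log1p(x) for x in r] for r in self.rows]


def row_sums(mat, n_passes):
    """Stand-in for an iterative algorithm that re-reads every row on each pass."""
    total = 0.0
    for _ in range(n_passes):
        for i in range(len(mat)):
            total += sum(mat[i])
    return total


raw = [[float(i + j) for j in range(100)] for i in range(50)]

lazy = DelayedLog(raw)
row_sums(lazy, n_passes=10)
evals_delayed = lazy.n_evals    # 10 passes x 50 rows x 100 values

eager = DelayedLog(raw)
row_sums(eager.realize(), n_passes=10)
evals_realized = eager.n_evals  # a single pass: 50 rows x 100 values
```

This only captures the repeated-computation part of the cost. In the real case the realized HDF5Matrix is still on disk, so read patterns matter as well.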

Davide Risso (08:30:52): > I will try to write a small general example and create an issue somewhere (possibly in our own repo), but what was worrying me the most is what Levi observed on extracting random subsets from a HDF5 matrix, since our method relies heavily on it.
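
One way to see why random subsets are costly: assuming a chunked, compressed HDF5 layout in which a whole chunk is read and decompressed to serve any element inside it, the work scales with the number of distinct chunks touched rather than the number of columns requested. The sketch below (plain Python; the matrix width, chunk width and batch size are made-up numbers, not the layout of any real file) counts chunks for a contiguous block versus a random mini-batch of the same size.

```python
import random


def chunks_touched(col_indices, chunk_width):
    """Number of distinct column chunks needed to read the given columns."""
    return len({j // chunk_width for j in col_indices})


n_cols = 1_000_000   # hypothetical number of cells (columns)
chunk_width = 1_000  # hypothetical number of columns per chunk
batch_size = 500

contiguous = range(batch_size)
scattered = random.Random(0).sample(range(n_cols), batch_size)

n_contiguous = chunks_touched(contiguous, chunk_width)
n_scattered = chunks_touched(scattered, chunk_width)
print(n_contiguous, n_scattered)
```

Under these assumptions the contiguous block sits in a single chunk, while the scattered batch touches a few hundred chunks to fetch the same number of columns.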

Aedin Culhane (15:06:35): > Has anyone tried https://github.com/KlugerLab/FIt-SNE which is implemented with an R wrapper? Paper describing it in Nature Methods https://www.nature.com/articles/s41592-018-0308-4.epdf (Sorry for my absence… I had the flu) - Attachment (nature.com): Fast interpolation-based t-SNE for improved visualization of single-ce > FIt-SNE, a sped-up version of t-SNE, enables visualization of rare cell types in large datasets by obviating the need for downsampling. One-dimensional t-SNE heatmaps allow simultaneous visualization of expression patterns from thousands of genes.

Aaron Lun (16:29:44): > We had a discussion about this in #bigdata-rep.

2019-02-14

Martin Morgan (12:55:16): > <!channel> I’ve updated the schedule, and emphasized the ‘work in progress’ nature of things. Please review, and consider adding links to yourself and slides / other presentation aids. https://docs.google.com/document/d/1xjUWJ5-WLFyuAmDPQrHNIzD_gkD4IVxquwZlJrdFCG8/edit?usp=sharing

Kayla Interdonato (14:12:23): > @Kayla Interdonato has joined the channel

2019-02-15

Lori Shepherd (09:15:00): > @Lori Shepherd has joined the channel

Aedin Culhane (10:47:49): > Nice visualization that might be nice for scRNAseq https://www.theguardian.com/politics/ng-interactive/2019/feb/15/how-brexit-revealed-four-new-political-factions - Attachment (the Guardian): How Brexit revealed four new political factions > Analysis of Commons voting patterns shows how Europhobe and Europhile rebels from both main parties are forming new parliamentary blocs

Aedin Culhane (10:49:52): > Canvas and d3. http://bl.ocks.org/Jverma/70f7975a72358e6d69cdd4bf6a0569e7. Is there an R/Bioconductor implementation? - Attachment (bl.ocks.org): Canvas + D3: 20K points with mouseover > Janu Verma’s Block 70f7975a72358e6d69cdd4bf6a0569e7

2019-02-18

Aaron Lun (12:57:24): > @Martin Morgan Did you want to promote this on the support site or Bioc-devel?

Martin Morgan (13:11:36): > Thanks for the nudge @Aaron Lun I announced on slack, bioc-devel, and the cziscience slack (I’m not really sure if that slack is still active…). > > I changed the permissions associated with the public link to ‘comment’, so if there are those who would like to add links to slides / etc please let me know with your email and I’ll enable your account with edit permissions.

Martin Morgan (13:12:10): > I tried to emphasize that this is work in progress, so expectations are not inflated and that we’re all comfortable talking about where we’ve got to.

Laurent Gatto (13:25:20): > @Laurent Gatto has joined the channel

Diego Diez (21:14:54): > @Diego Diez has joined the channel

2019-02-19

Ruizhu HUANG (06:01:49): > @Ruizhu HUANG has joined the channel

Catalina Vallejos (08:19:33): > @Catalina Vallejos has joined the channel

Philipp Wahle (11:45:19): > @Philipp Wahle has joined the channel

Adonis Cedeno (11:55:54): > @Adonis Cedeno has joined the channel

Johnson Zhang (13:17:01): > @Johnson Zhang has joined the channel

Aedin Culhane (14:21:18): > Is Bioconductor part of the Horizon 2020 HCA? https://ec.europa.eu/info/funding-tenders/opportunities/portal/screen/opportunities/topic-details/sc1-bhc-31-2019 - Attachment (ec.europa.eu): Funding & tenders > The Funding and Tenders Portal is the single entry point (the Single Electronic Data Interchange Area) for applicants, contractors and experts in funding programmes and procurements managed by the European Commission.

Aedin Culhane (14:21:42): > Pilot actions to build the foundations of a human cell atlas > ID: SC1-BHC-31-2019 > Type of action: > RIA Research and Innovation action > Deadline Model : single-stage > Opening: 26 July 2018 > Deadline: 16 April 2019 17:00:00 Brussels time

Aedin Culhane (14:24:29): > Also there is a webinar series for the HCA on https://www.protocols.io/webinars - Attachment (protocols.io): protocols.io webinars > Still testing protocols.io’s functionalities and would like to learn more about the platform, feel free to join our 45-minute webinar

Aedin Culhane (14:25:20): > Human Cell Atlas Method Development Community > MAR 5, 2019 AT 10:00 AM PST

Aedin Culhane (14:25:33): > This was announced on the general HCA slack channel

2019-02-20

Kasper D. Hansen (10:19:01): > Just FYI for today: we have a snow storm in Baltimore. I expect to be on the call, but I might have to only be there part time. Also, connecting from home so hopefully that’ll work

Stephanie Hicks (10:21:41): > Yes, i will be on the call, but only for a portion of the time. Trading off juggling kids with my husband today.

Aaron Lun (10:58:41): > I have no kids and the weather is nice, so I’ll be there all of the time.

Kellie Kravarik (11:36:45): > @Kellie Kravarik has joined the channel

Tim Triche (11:45:33): > @Tim Triche has joined the channel

Tim Triche (11:46:57): > will the bluejeans link be added to the Google docs agenda before the proceedings begin?

Martin Morgan (11:47:39): > yep! in about 5 minutes…

Tim Triche (11:49:27): > :thumbsup:

Tim Triche (11:49:32): > thanks for organizing this

Martin Morgan (14:41:17): > A big thanks to everyone for participating in our virtual symposium today. We’ll try to get a slightly edited version of each talk available in the coming days.

Stephanie Hicks (14:41:53): > Thank you @Martin Morgan for organizing!

Kasper D. Hansen (14:46:14): > thanks, I enjoyed the presentations

Federico Marini (14:57:57): > Would it be worth collecting the presentations and assembling them on a dedicated page?

Martin Morgan (14:59:48): > Yes, we’ll collect the presentations…

2019-03-01

Mike Smith (03:27:47): > I’ve just been reminded about the reports we have to submit by the end of the month, and apparently we’ve been provided with the ‘reporting templates’ already. Since we can’t find any mention of them here or the report submission site, I was wondering if anybody else has been sent a template or knows where to track them down.

Kasper D. Hansen (18:14:02): > I have not seen any templates

Kasper D. Hansen (18:14:24): > Davide and I submitted no cost extensions, we just wrote something in google doc

Marcus Kinsella (18:16:37): > there is no template, but if you have questions, you should reach out to Fiona

2019-03-06

Mike Smith (06:12:30): > Great, thanks for the info.

2019-03-26

Kasper D. Hansen (17:07:09): > Following the call I realize that it was unclear to me how much space we have for deliverables.

Martin Morgan (17:19:56): > I’ll check (Friday or Monday) but I think there is space and that we do not have to do much – a table of deliverables x quarter of delivery.

2019-04-03

Kayla Interdonato (10:15:29): > Slides and videos from the virtual symposium have now been added as course materials on the Bioconductor website (https://www.bioconductor.org/help/course-materials)

2019-05-05

Firas (11:02:34): > @Firas has joined the channel

2019-05-16

Stephanie Hicks (20:41:43): > fwiw https://chanzuckerberg.com/rfa/essential-open-source-software-for-science/ - Attachment (Chan Zuckerberg Initiative): Coming Soon: Essential Open Source Software for Science > The Chan Zuckerberg Initiative will soon invite applications for one-year open source software projects that are essential to biomedical research. This RFA is the first of a series. CZI will invite applications during three distinct cycles, with rounds beginning June 18, 2019; mid-December 2019; and mid-June 2020.

2019-05-17

Mike Smith (05:54:40) (in thread): > One year feels like such a short amount of time to hire someone for, but I like the initiative.

2019-06-21

Stephanie Hicks (21:18:59): > :tada: https://chanzuckerberg.com/newsroom/czi-awards-68-million-support-growth-of-human-cell-atlas/ - Attachment (Chan Zuckerberg Initiative): The Chan Zuckerberg Initiative Awards $68 Million to Support the Growth of the Human Cell Atlas > The Chan Zuckerberg Initiative’s Seed Networks grants bring together scientists, computational biologists, software engineers, and physicians to support the continued development of the Human Cell Atlas, an international effort to map all cells in the human body.

2019-06-23

Ameya Kulkarni (22:09:41): > @Ameya Kulkarni has joined the channel

Matt Brauer (23:05:45): > @Matt Brauer has joined the channel

2019-06-24

Komal Rathi (09:22:59): > @Komal Rathi has joined the channel

2019-06-26

Junhao Li (13:28:17): > @Junhao Li has joined the channel

2019-07-01

Stephanie Hicks (06:46:13): > @Martin Morgan, quick question. Do we know yet when the award will start? Our original proposal had said June 2019. I’m also happy to reach out to Fiona, but wanted to check here first. Thanks!

Martin Morgan (16:31:45): > my admin says ‘it should be soon’…

2019-11-20

Nolan Nichols (12:01:43): > @Nolan Nichols has joined the channel

Nolan Nichols (12:02:07): > @Nolan Nichols has left the channel

Russ Bainer (12:03:14): > @Russ Bainer has joined the channel

2019-12-06

Somesh (12:21:19): > @Somesh has joined the channel

2019-12-07

Juan Ojeda-Garcia (18:44:56): > @Juan Ojeda-Garcia has joined the channel

2019-12-11

Aaron Lun (11:38:13): > @Aaron Lun has left the channel

2019-12-12

Tim Triche (17:50:42): > @Tim Triche has left the channel

2019-12-24

dylan (12:02:16): > @dylan has joined the channel

2020-01-07

Nitesh Turaga (09:18:35): > @Nitesh Turaga has left the channel

2020-01-23

Charlotte Soneson (04:12:17): > @Charlotte Soneson has left the channel

2020-02-06

Stuart Lee (19:08:43): > @Stuart Lee has left the channel

2020-03-03

Levi Waldron (11:21:16): > @Levi Waldron has left the channel

2020-07-31

Dr Awala Fortune O. (16:27:52): > @Dr Awala Fortune O. has joined the channel

2020-12-12

Huipeng Li (00:40:04): > @Huipeng Li has joined the channel

2021-01-22

Annajiat Alim Rasel (15:44:11): > @Annajiat Alim Rasel has joined the channel

2021-05-11

Megha Lal (16:45:03): > @Megha Lal has joined the channel

2021-07-16

Lori Shepherd (12:43:16): > @Lori Shepherd has left the channel

2021-10-14

Wolfgang Huber (10:24:21): > @Wolfgang Huber has left the channel

2021-10-26

Mike Smith (03:40:56): > @Mike Smith has left the channel

2021-11-08

Paula Nieto García (03:27:37): > @Paula Nieto García has joined the channel

2022-01-28

Megha Lal (11:13:06): > @Megha Lal has left the channel

2022-02-01

Stephanie Hicks (20:25:32): > @Stephanie Hicks has left the channel

2024-02-09

Marcel Ramos Pérez (10:16:04): > archived the channel