#m
2022-10-17
stefano mangiola (02:39:19): > @stefano mangiola has joined the channel
Michael Milton (02:41:53): > @Michael Milton has joined the channel
Martin Morgan (02:41:53): > @Martin Morgan has joined the channel
Alex Mahmoud (02:41:53): > @Alex Mahmoud has joined the channel
Julie Iskander (04:41:36): > @Julie Iskander has joined the channel
stefano mangiola (10:45:40): > Hello Team, > > now that our team is formed and our goals are clearer, I wanted to create this group to communicate across institutes. > > The people involved so far are > * Stefano Mangiola (myself) - Tony Papenfuss Lab > * Michael Milton - Julie Iskander Lab > * Alex Mahmoud - Martin Morgan, Vince Carey Team/lab > * Connie Li - Alex Garnham Lab > * Nishika Kapuruge - Alex Garnham Lab > > The proposed goal is to output a Human-Cell-Atlas harmonisation/integration paper within the next 6 months. With the current team, I think it is very possible. After publication, we will be able to keep working on the project and improving it, so we will optimise our work to reach an acceptable (very impactful) stage quickly. > > The working paper is here: https://docs.google.com/document/d/11zGOnMKONZzAEnfzmf-wriqWIvqAcPx3lpws4MSl20w/edit?usp=sharing > The GitHub code repository (the name will change) is here: https://github.com/stemangiola/HCAquery > The project to-do list will be organised here: https://github.com/users/stemangiola/projects/2 > Analysis and organisation flowchart: https://app.mural.co/invitation/mural/covid7029/1666019549891?sender=udbe1ea99c9618abb07196978&key=2e2effcc-f217-4cd6-a0d1-3ccc7f08201c > > Some of the high-level goals are > * Harmonise and curate cell-type annotation using a multi-method consensus strategy, providing annotation confidence scores > * Integrate the cell-macro type to get integrated PCA and refine harmonised annotations > * Evaluate original cell-type label switching and evaluate heterogeneity of standards and biases > * Distribute integrated datasets through a remote database
Connie Li Wai Suen (10:48:24): > @Connie Li Wai Suen has joined the channel
Alexandra Garnham (10:48:51): > @Alexandra Garnham has joined the channel
Nishika Kapuruge (17:57:02): > @Nishika Kapuruge has joined the channel
2022-10-18
Michael Milton (00:17:18): > Hi all, I’ve been doing some benchmarking to look into what format we should store the count data in so that we can query it efficiently with R. In my initial benchmark I compared three approaches: using the SingleCellExperiment (SCE) class, using H5Array directly, and an SQLite-based approach in which I converted the count matrix into a sparse matrix stored in an indexed SQLite DB. You can look at my benchmarking code here: https://github.com/WEHI-ResearchComputing/HcaBenchmarks > The main finding was that, at least in these fairly naive benchmarks, using SCE directly was consistently the fastest method. There must be some heavy optimisations in the SCE class for this to happen. I’m quite surprised it outperformed an indexed SQL database, but perhaps that’s because of the number of format conversions involved.
Michael Milton (00:17:43): > Here are the benchmark results - File (Plain Text): Untitled
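For readers unfamiliar with the SQLite-based approach mentioned above, here is a minimal sketch of the general idea: keep only the non-zero entries of a count matrix as (gene, cell, count) rows in an indexed table, then query by gene. This is an illustration in Python, not the R code from the linked benchmark repository; the table layout, the function names (`build_sparse_count_db`, `query_gene`) and the toy data are all assumptions made for the example.

```python
import sqlite3


def build_sparse_count_db(dense_counts, gene_names, cell_names):
    """Store the non-zero entries of a genes x cells matrix in SQLite.

    Hypothetical schema: one row per non-zero (gene, cell, count) triplet.
    """
    conn = sqlite3.connect(":memory:")  # in-memory DB, for illustration only
    conn.execute(
        "CREATE TABLE counts ("
        "gene TEXT NOT NULL, cell TEXT NOT NULL, count INTEGER NOT NULL)"
    )
    # Sparse representation: zero entries are simply not stored.
    triplets = [
        (gene, cell, value)
        for gene, row in zip(gene_names, dense_counts)
        for cell, value in zip(cell_names, row)
        if value != 0
    ]
    conn.executemany("INSERT INTO counts VALUES (?, ?, ?)", triplets)
    # Indexes let lookups by gene or by cell avoid a full table scan.
    conn.execute("CREATE INDEX idx_counts_gene ON counts (gene)")
    conn.execute("CREATE INDEX idx_counts_cell ON counts (cell)")
    conn.commit()
    return conn


def query_gene(conn, gene):
    """Return {cell: count} for the non-zero counts of one gene."""
    rows = conn.execute(
        "SELECT cell, count FROM counts WHERE gene = ? ORDER BY cell",
        (gene,),
    )
    return dict(rows.fetchall())


# Toy data (made up for this example): 3 genes x 3 cells.
genes = ["CD3E", "CD19", "MS4A1"]
cells = ["cell_a", "cell_b", "cell_c"]
counts = [
    [0, 5, 0],
    [2, 0, 0],
    [0, 0, 7],
]

conn = build_sparse_count_db(counts, genes, cells)
cd3e_counts = query_gene(conn, "CD3E")
n_stored = conn.execute("SELECT COUNT(*) FROM counts").fetchone()[0]
```

Note that every query result still has to be converted back into a matrix-like object before downstream analysis, which may be part of the conversion overhead speculated about in the message above.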
Tony Papenfuss (01:22:20): > @Tony Papenfuss has joined the channel
stefano mangiola (02:13:34): > has renamed the channel from “hca-integration” to “m”
2022-10-19
Vince Carey (10:36:09): > @Vince Carey has joined the channel
stefano mangiola (10:36:48): > @Vince Carey I will rename the group once it becomes private
Vince Carey (10:38:31): > OK @stefano mangiola but I’d just comment that there seems to be a lot of benefit in being in the open as early as possible. Is there proprietary material being discussed/developed in the repo?
stefano mangiola (11:12:06): > I see. Now that our focus has pivoted more towards the integration of HCA rather than just a harmonised database/API, I got a bit overwhelmed, as it is clear that the consortium itself will tackle (or is already tackling) this challenge, and I would not like to cause a rush as we are a small team. Maybe I got anxious without good reason. I am more than happy to take suggestions from the more experienced.
2022-10-21
stefano mangiola (05:08:55): > Hello, I think the consensus is that, with the goal of going public as soon as possible (I completely agree), it might be beneficial to keep this private for the initial phase. I will start a Slack chat instead, and close this channel for the time being.
2022-10-27
Marcel Ramos Pérez (10:55:13): > @Marcel Ramos Pérez has joined the channel
Marcel Ramos Pérez (10:55:29): > archived the channel