#spatialdata-devel

2023-04-28

Helena L. Crowell (10:15:39): > @Helena L. Crowell has joined the channel

Helena L. Crowell (10:15:39): > set the channel description: S4 class to read/write/represent OME-Zarr/SpatialData-Python

Giovanni Palla (10:16:03): > @Giovanni Palla has joined the channel

Luca Marconato (10:16:03): > @Luca Marconato has joined the channel

Tim Treis (10:16:03): > @Tim Treis has joined the channel

Constantin Ahlmann-Eltze (10:16:03): > @Constantin Ahlmann-Eltze has joined the channel

Helena L. Crowell (10:18:11): > Hey Mr.s! Was a real pleasure working with you guys and getting things rolling. Sorry I had to storm off in such a rush. Will be in touch after taking a little time to cover the outstanding basics (tests, documentation, working examples etc.). Let’s meet when there’s some more progress & see from there! :call_me_hand:

Helena L. Crowell (10:19:36): > PS: the trams here suck & DB is late :alarm_clock: back to :flag-ch: :sparkles: :snow_capped_mountain:

2023-05-02

Helena L. Crowell (05:19:36): > So, @Giovanni Palla, you mentioned (regular?) meeting(s)… Let me know when you (both?) would be available. I’m trying to track progress, and noting down questions as I go. So it would be helpful to check in now and then :muscle:

Luca Marconato (06:03:02): > Hi @Giovanni Palla, what about blocking 30 min of the Thursday spatialdata meetings? We meet with the community every two weeks from 11 to 12. Usually we speak for about 30 min with external people and in the remaining 30 we discuss among ourselves. But we could use that time to discuss Bioc updates, since we talk among us in Zulip/Zoom more often anyway

Luca Marconato (06:03:15): > the next meeting would be the next week

Giovanni Palla (06:06:55): > I think this is a great idea!

Giovanni Palla (06:07:13): > would it work for you, @Helena L. Crowell?

Helena L. Crowell (06:09:30): > Perfect. So biweekly (starting next week), Thu at 11:30? (except next Thu I am going to Mallorca for the weekend :see_no_evil:)

Giovanni Palla (08:23:00): > we can do it some other time next week (e.g. Wednesday)?

Helena L. Crowell (08:26:19) (in thread): > Yes, any time before would work. Though if we do Wed then I can’t work on what we discuss until I’m back… but whenever you’re available works for me & after next week I’ll put Thursdays in my calendar :blush:

Giovanni Palla (10:44:48) (in thread): > whatever you think is best! I could do next Wed if you want

2023-05-03

Helena L. Crowell (12:37:49) (in thread): > gee, so sorry, I realized my flight is on Wed not Thu… so we’ll have to reschedule :confused: but no worries - got enough to do :smile:

2023-05-04

Giovanni Palla (05:09:11) (in thread): > ok then, the week after - no problem!

2023-06-09

Aedin Culhane (17:25:43): > @Aedin Culhane has joined the channel

Ahmad Alkhan (17:26:37): > @Ahmad Alkhan has joined the channel

2023-09-13

Ruben Dries (10:57:28): > @Ruben Dries has joined the channel

Christopher Chin (17:05:21): > @Christopher Chin has joined the channel

2023-09-20

Alik Huseynov (04:22:23): > @Alik Huseynov has joined the channel

2023-09-29

Wes W (10:47:09): > @Wes W has joined the channel

2024-01-04

Artür Manukyan (05:43:44): > @Artür Manukyan has joined the channel

2024-03-11

Artür Manukyan (12:29:13): > Hey all, I was wondering if there are any efforts/plans to define load/read functions for spatial tech (Xenium, Visium etc.)? https://github.com/HelenaLC/SpatialData If so, would they be embedded in the package or be a separate entity like spatialdata-io?

Helena L. Crowell (12:38:31) (in thread): > “plans” is perhaps a little optimistic, but yeah, the idea would be to have a class, visualization, and utils package (readers and writers) + n > 1 methods packages for useful operations (e.g., transformations, aggregation, stats etc.) - however, the latter only makes sense once the class is in order, which it isn’t. Some discussion is happening, but nothing concrete at this point. If/when there are developments, it’ll be made known.

Luca Marconato (13:04:50) (in thread): > I think the quickest approach will be to have a CLI in the Python spatialdata-io package which takes the raw data and saves it to Zarr; in this way the R library can read the data without the need to implement the reader. The main problem with the readers is not the implementation but the maintenance, as the versions of SpaceRanger and XeniumAnalyzer keep evolving.

Luca Marconato (13:05:31) (in thread): > This PR tracks the status of the CLI; unfortunately it’s buried under other PRs and issues atm, but we will get there. https://github.com/scverse/spatialdata-io/pull/72 - Attachment: #72 Added first simple version of CLI script. > This PR adds a command line interface for calling readers. At the moment, this doesn’t work due to non-merged PRs on spatialdata-io. > > These are the things that still need to be done in this code to make it work: > > ☐ Add flags for each specific technology to ensure all of them work > ☐ Wait for PR #28 to be merged

Artür Manukyan (15:59:06) (in thread): > Ah thanks guys, a CLI does in fact sound amazing. > > However, I still believe a strong class/container such as SpatialData deserves its own standalone interfaces on both the Python and R side. > > Also @Helena L. Crowell - the class in the R package should now be capable of building custom SpatialData objects at least and writing to some zarr, right? I will play with it now :smile:

Helena L. Crowell (16:33:20) (in thread): > No, not really, unfortunately… as I said, there are discussions happening, but the class as-is should not be used/expected to be read/write compatible with what’s in Python; were it ready, it would be in Bioc devel and not just my GH. (e.g., we are missing a delayed zarr backend, proper handling of zattrs, zarr store representation etc. before a proper class can be built; this is all under discussion but will need time…)

Artür Manukyan (16:37:52) (in thread): > Ah gotcha! thanks again Helena

2024-03-15

Davide Risso (08:08:16): > @Davide Risso has joined the channel

Marcel Ramos Pérez (08:40:25): > @Marcel Ramos Pérez has joined the channel

Laurent Gatto (08:49:44): > @Laurent Gatto has joined the channel

Charlotte Soneson (09:21:06): > @Charlotte Soneson has joined the channel

Vince Carey (09:32:38): > @Vince Carey has joined the channel

Jenny Drnevich (10:02:54): > @Jenny Drnevich has joined the channel

Lukas Weber (10:47:57): > @Lukas Weber has joined the channel

2024-03-17

Peter Hickey (18:02:36): > @Peter Hickey has joined the channel

2024-03-18

Dario Righelli (05:44:10): > @Dario Righelli has joined the channel

Mark Keller (06:53:12): > @Mark Keller has joined the channel

Stephanie Hicks (08:07:20): > @Stephanie Hicks has joined the channel

Hervé Pagès (14:02:36): > @Hervé Pagès has joined the channel

2024-03-21

Leonardo Collado Torres (09:29:58): > @Leonardo Collado Torres has joined the channel

Nick Eagles (09:56:13): > @Nick Eagles has joined the channel

2024-05-17

Michal Kolář (10:00:02): > @Michal Kolář has joined the channel

2024-08-14

Kylie Bemis (22:38:19): > @Kylie Bemis has joined the channel

2024-08-17

Vince Carey (07:15:31): > This channel needs some action! In thread I will put some benchmarking regarding handling of ‘large’ parquet files that can arise with xenium …

Vince Carey (07:22:40) (in thread): > The basis is transcripts.parquet shipped with lung cancer data from 10x - Attachment (10x Genomics): Preview Data: FFPE Human Lung Cancer with Xenium Multimodal Cell Segmentation - 10x Genomics

Vince Carey (07:23:54) (in thread): > xx is produced with arrow::read_parquet: > > > xx > # A tibble: 12,165,021 × 11 > transcript_id cell_id overlaps_nucleus feature_name x_location y_location > <dbl> <chr> <int> <chr> <dbl> <dbl> > 1 2.82e14 UNASSIGNED 0 STEAP4 66.8 1439. > 2 2.82e14 UNASSIGNED 0 THBS2 202. 1422. > 3 2.82e14 UNASSIGNED 0 CXCR4 60.8 1427. > ... >

Vince Carey (07:24:40) (in thread): > > > microbenchmark(z <- xx |> filter(x_location < 220 & y_location > 1410), times=20) > Unit: milliseconds > expr min lq > z <- filter(xx, x_location < 220 & y_location > 1410) 81.7286 103.8816 > mean median uq max neval > 106.8381 106.6605 112.5011 127.0887 20 >

Vince Carey (07:28:24) (in thread): > using duckdb with the parquet seems faster > > > con = dbConnect(duckdb()) > > vi = dbExecute(con, "create view tx as select * from parquet_scan('transcripts.parquet')") > > microbenchmark(z <- tbl(con, "tx") |> filter(x_location < 220 & y_location > 1410), times=20) > Unit: milliseconds > expr min > z <- filter(tbl(con, "tx"), x_location < 220 & y_location > 1410) 7.27563 > lq mean median uq max neval > 7.457816 7.684494 7.582807 7.808913 9.218928 20 >

Vince Carey (07:30:28) (in thread): > These observations are part of an informal exploration of how one might “lighten the stack” associated with this modality – https://vjcitn.github.io/XenSCE/ - Attachment (vjcitn.github.io): Simple classes and methods for managing a Xenium exemplar dataset > Define a relatively light class for managing Xenium data. Totally experimental.

Vince Carey (07:36:26) (in thread): > Putting to one side the important topic of interfacing with zarr, a question is whether we will want to have something like saveHDF5SummarizedExperiment that can manage in a unified way the R experiment class data, quantifications in HDF5 files and positional data recorded in parquet files? And another is whether duckdb is really advantageous for working with the parquet.

Helena L. Crowell (08:56:56) (in thread): > Just wondering, have you tried arrow::read_parquet with as_data_frame=FALSE, and what difference it makes to in-memory representation and speed of dplyr operations?

Vince Carey (09:29:44) (in thread): > Thanks Helena, much better that way: > > > nn = arrow::read_parquet("transcripts.parquet", as_data_frame=FALSE) > > microbenchmark(z <- nn |> filter(x_location < 220 & y_location > 1410), times=20) > Unit: milliseconds > expr min lq > z <- filter(nn, x_location < 220 & y_location > 1410) 6.558221 6.790354 > mean median uq max neval > 7.022074 6.947779 7.04997 8.184056 20 >

Vince Carey (09:33:01) (in thread): > but now z is > > > z > Table (query) > transcript_id: uint64 > cell_id: string > overlaps_nucleus: uint8 > feature_name: string > x_location: float > y_location: float > z_location: float > qv: float > fov_name: string > nucleus_distance: float > codeword_index: int32 > > * Filter: ((x_location < 220) and (y_location > {value:double = 1410})) > See $.data for the source Arrow object > > … how do we dig out the locations?

Helena L. Crowell (09:33:50) (in thread): > Yeah, it’s a bit fiddly … basically all my functions do things like select > filter > … > pull or as.data.frame to get the data & plot, e.g.

Helena L. Crowell (09:34:44) (in thread): > see here some methods I had written a while back – https://github.com/HelenaLC/SpatialData/blob/main/R/PointFrame.R

Vince Carey (10:03:58) (in thread): > I think for comparability we need to put a collect() in the benchmark, all the last benchmark does is create the query. Including the collect increases the median time to 22ms. I have to put a collect in with the duckdb example too. Benchmarking with lazy computation takes more care than I have shown so far. > > > microbenchmark(z <- tbl(con, "tx") |> filter(x_location < 220 & y_location > 1410) |> collect(), times=20) > Unit: milliseconds > expr > z <- collect(filter(tbl(con, "tx"), x_location < 220 & y_location > 1410)) > min lq mean median uq max neval > 83.23629 85.57663 94.74392 92.81108 102.0001 114.3132 20 >

Vince Carey (10:04:59) (in thread): > Probably don’t need to consider duckdb any more. Just use arrow and dplyr knowledgeably.
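Vince’s caveat generalizes beyond arrow/duckdb: timing a lazy pipeline without materializing it only measures plan construction. A toy illustration in pure Python (stdlib only; a generic sketch, not the thread’s R code):

```python
import time

# Lazy "query": building a generator does no filtering work at all.
data = list(range(5_000_000))

t0 = time.perf_counter()
lazy = (x for x in data if x % 7 == 0)  # plan only; nothing evaluated yet
t_build = time.perf_counter() - t0

t0 = time.perf_counter()
result = list(lazy)  # analogous to collect(): the filter actually runs here
t_collect = time.perf_counter() - t0

# Timing only the build step drastically understates the real cost.
print(f"build: {t_build * 1e3:.3f} ms, collect: {t_collect * 1e3:.1f} ms")
```

The same applies when benchmarking dplyr-on-arrow against duckdb: wrap the measured expression in collect() (or otherwise force materialization) so both backends are timed doing comparable work.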

2024-08-23

Artür Manukyan (04:43:44): > Going back to zarr, will ZarrArray be strictly available in SpatialData or would it have its own package (https://github.com/HelenaLC/SpatialData/blob/main/R/ZarrArray.R) like HDF5Array? > > There is also this really nice zarr implementation (https://github.com/keller-mark/pizzarr). I would say its utility is quite similar to rhdf5.

Helena L. Crowell (04:50:19) (in thread): > jumping in here … ZarrArray was my ad-hoc solution to internally deal with .zarr array data + supplementary .zattrs (json-like) … this was when Rarr was on devel, and only supported reading the array data, without metadata > > that said, if we end up conforming with zarr, I would def be in favor of an independent class to represent 1) zarr + zattrs, and 2) zarr stores, where multiple resolutions (pyramid layers) are available; this is also something Rarr didn’t cover at the time > > hoping that future Rarr developments could cover this, along with delayed support :pray:

Artür Manukyan (16:43:43) (in thread): > So this will perhaps be the topic of Bioc Oxford? https://community-bioc.slack.com/archives/C03MKFSS7V2/p1722606138883569?thread_ts=1719579952.747229&cid=C03MKFSS7V2 - Attachment: Attachment > Hi, @Sanket Verma @Davide Risso @Mike Smith I’ve just opened an issue at the EuroBioC2024 repo proposing this BoF. @Charlotte Soneson told me she may be also interested, so let’s have it proposed and will see whether we get enough interest. If it happens, we’ll open a Zoom call so that Sanket could join.

Artür Manukyan (16:49:06) (in thread): > However, I must say my experience with pizzarr so far has been quite good; it covers many utilities that Rarr doesn’t.

2024-08-27

NILESH KUMAR (11:36:51): > @NILESH KUMAR has joined the channel

2024-09-03

Hervé Pagès (15:06:32) (in thread): > @Artür Manukyan pizzarr sounds promising, thanks for sharing. Hopefully they’ll consider submitting to CRAN. > I also like the idea of a dedicated ZarrArray package, with a ZarrArray class that brings the DelayedArray framework on top of pizzarr, similar to HDF5Array objects but for zarr instead of hdf5.

Artür Manukyan (15:26:10) (in thread): > Thanks @Hervé Pagès, I have started one here to see how it would look: https://github.com/BIMSBbioinfo/ZarrArray. Let’s see how it goes …

Hervé Pagès (15:38:18) (in thread): > Oh, interesting! I’ll check it out. Thanks

2024-09-11

Ellis Patrick (03:01:13): > @Ellis Patrick has joined the channel

2024-09-13

Gobi Dasu (18:18:04): > @Gobi Dasu has joined the channel

2024-09-25

Alik Huseynov (07:09:15): > @Helena L. Crowell - regarding the SpatialData R package, how can it be used in downstream spatial data analysis, e.g. starting with making a SpatialExperiment object from the SpatialData class? If that kind of workflow exists (besides this one), could you provide an example workflow? Thanks

Luca Marconato (13:03:57) (in thread): > Hi, pre-answer before Helena gives full details. We are going to be actively working on this in the next 2 months (first scheduled meeting with Helena is on Friday). We will give updates here :blush:

Alik Huseynov (14:12:25) (in thread): > thanks Luca!

2024-10-01

Alik Huseynov (08:21:31) (in thread): > @Luca Marconato will that upcoming hackathon cover some of that in R? https://spatialhackathon.github.io/ - Attachment (ELIXIR-Germany SpaceHack): Home > Github Pages for SpatialHackathon

Luca Marconato (08:37:57) (in thread): > I am not sure about this; unfortunately I won’t be able to attend the SpaceHack hackathon. But it could be a good occasion to work on interoperability for those attending.

2024-10-10

Alik Huseynov (15:45:12): > @Luca Marconato and @Giovanni Palla: > For spatial neighbors graph building, does spatial contiguity-based (such as rook or queen contiguity) graph building already exist in scverse or squidpy? > This would be the most relevant for VisiumHD or similar (for segmentation-based methods, only if cell boundaries share a vertex or an edge)

2024-10-16

Giovanni Palla (23:58:25) (in thread): > hi @Alik Huseynov, yes squidpy does that for visium, I believe it would technically work for visium hd as well, but it should be tested, especially because I expect there is gonna be some sway between bins

Giovanni Palla (23:59:16) (in thread): > you can take a look at the implementation here: https://github.com/scverse/squidpy/blob/main/src/squidpy/gr/_build.py

2024-10-17

Giovanni Palla (00:00:15) (in thread): > like, if you pass n_neighbors=4 and coord_type=grid, it should do what you expect on visium HD, but I’m afraid it won’t be consistent; this may in part be because the implementation isn’t robust, but I think it’s more due to the actual numerical values of the coordinates

Giovanni Palla (00:00:59) (in thread): > see e.g. this https://github.com/scverse/squidpy/blob/80621fc29c594f1a7b5c61b3fa968c56eecaeac9/src/squidpy/gr/_build.py#L412 which we had to introduce for visium

Giovanni Palla (00:01:29) (in thread): > ofc suggestions/pull requests always welcome

Alik Huseynov (07:08:53) (in thread): > > especially because I expect there is gonna be some sway between bins > Hi @Giovanni Palla, you mean there is some displacement between spot (or bin) pairs? In that case pairwise distances between their centroids won’t be the same. > This is related, though for older ST Visium: https://www.biorxiv.org/content/10.1101/362624v2.full#F7 I think rearranging array spots uniformly, like with centroidal Voronoi tessellation, would solve that.

Luca Marconato (12:07:15) (in thread): > Not sure if it is useful: the sway between the bins may be explained by the fact that a small rotation is present for bin data.

Alik Huseynov (13:31:37) (in thread): > Yes, that’s another one. When the small rotation is present (maybe even at different magnitudes), additional preprocessing of that data should be done, like re-alignment or similar to remove any shifts. > Non-uniform spot positions are likely due to array printing processes (some technical variability). > In general, I think those need to be corrected, to ensure consistency and robustness.
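For reference, the rook vs. queen contiguity distinction raised above can be sketched on an idealized regular grid. A pure-Python toy with a hypothetical helper (not squidpy’s implementation; it assumes perfectly uniform bins, which, per the discussion, real Visium HD arrays may violate):

```python
def grid_contiguity(nrow, ncol, queen=False):
    """Adjacency lists for a regular grid of bins:
    rook  = neighbours sharing an edge (up to 4),
    queen = neighbours sharing an edge or a vertex (up to 8)."""
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]          # rook moves
    if queen:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]   # add diagonal moves
    adj = {}
    for r in range(nrow):
        for c in range(ncol):
            adj[(r, c)] = [
                (r + dr, c + dc)
                for dr, dc in steps
                if 0 <= r + dr < nrow and 0 <= c + dc < ncol
            ]
    return adj

rook = grid_contiguity(3, 3)
queen = grid_contiguity(3, 3, queen=True)
print(len(rook[(1, 1)]), len(queen[(1, 1)]))  # interior bin: prints 4 8
```

On rotated or non-uniformly printed arrays, a k-nearest-neighbours search on centroids (as squidpy’s grid mode effectively does) can deviate from this ideal adjacency, which is the inconsistency Giovanni describes.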

Lambda Moses (13:39:25): > @Lambda Moses has joined the channel

2024-10-21

Alik Huseynov (06:21:11): > probably I missed it, but is someone working on SpatialData (also in R) being able to read technology-specific .zarr files, like from VisiumHD and Xenium?

Vince Carey (09:34:32): > Is the “technology specific .zarr” format specifically emitted by a version of Xenium … and if so, is there an example on the 10x downloads page? I suspect that there is, but if you could pin down an example and link that gives the format you are interested in, there are new developments in Rarr that should be examined.

Artür Manukyan (09:39:42) (in thread): > cell coordinates for example are also given as .zarr as an output, haven’t checked myself if Rarr can read it but I suspect it can!

Artür Manukyan (09:39:58) (in thread): > https://www.10xgenomics.com/support/software/xenium-onboard-analysis/latest/advanced/xoa-output-zarr - Attachment (10x Genomics): Overview of Xenium Zarr Output Files - Official 10x Genomics Support > 10x Genomics In Situ Software Suite

Artür Manukyan (09:41:31) (in thread): > Rarr should work as long as you can point it to the group name under the zarr store, hence it can read the polygon_sets

Alik Huseynov (09:44:32) (in thread): > Xenium-specific zarr is explained in the comment above :arrow_up: One can use the XOA v3 multimodal public example

Alik Huseynov (09:48:55) (in thread): > In general, I think that if a spatial technology provides .zarr output, then one should be able to read it, be it in SpatialData, Rarr or any other related packages

Luca Marconato (10:10:58) (in thread): > Hi, we are generally not parsing the Xenium-specific .zarr files for Xenium (and Visium HD) because the information is mostly redundant and present elsewhere in handier formats (e.g. point coordinates in .parquet). Still, sometimes some information in some versions is present only in the Zarr files and we parse it from there (for instance, IIRC the mapping between the label indices and the cell indices was present only in Zarr in some older versions; while now I think it’s available also in the parquet files, we are still reading it from Zarr).

Luca Marconato (10:12:59) (in thread): > We haven’t finalized the support for XOA 3.0, but if we can avoid using the Zarr files we’ll stop reading them. > > Regarding where to find the info, I also recommend the resource shared by @Artür Manukyan, together with the changelog from 10x (which is well curated), and also these tiny datasets: https://www.10xgenomics.com/support/software/xenium-onboard-analysis/latest/resources/xenium-example-data. We asked 10x to provide some small datasets for CI purposes and they were kind enough to upload them (they are currently available only for Xenium data, not for Visium HD). - Attachment (10x Genomics): Example Xenium Datasets - Official 10x Genomics Support > 10x Genomics In Situ Software Suite

Alik Huseynov (12:03:03) (in thread): > .parquet is definitely handier than .zarr, I agree. > So, this means that unless strictly necessary, SpatialData won’t provide support for reading technology-specific Zarr files, correct?

Mike Smith (14:25:20): > @Mike Smith has joined the channel

2024-10-22

Edward Zhao (03:10:21): > @Edward Zhao has joined the channel

2024-10-23

Alik Huseynov (03:22:21) (in thread): > any updates on that, interoperability with R and timeline? Thanks!

Cindy Fang (09:57:11): > @Cindy Fang has joined the channel

Luca Marconato (12:09:49) (in thread): > Currently at a conference but from this weekend I will pick up this and give updates.

2024-10-27

Artür Manukyan (18:53:57): > Shall we start opening some issues on HelenaLC/SpatialData? https://github.com/HelenaLC/SpatialData/issues/53 > > I must say that I am not a big fan of basilisk, and don’t know how much it is favored by peeps at bioc, but there are efforts on R-native solutions to this: https://github.com/scverse/anndataR - Attachment: #53 implementation and default behaviour of readTable

2024-10-28

Alik Huseynov (02:19:12) (in thread): > I think first we need to know from @Helena L. Crowell and others who are involved what the status of that package is, its integration with spatial Bioc packages, and how the object can be used in spatial downstream analysis.

Helena L. Crowell (02:51:28) (in thread): > Hi both! I really appreciate the enthusiasm. But I would be grateful if we could hold off in-depth discussion until after Nov 15th. There will be a SpatialData hackathon in Basel, Switzerland, where many Bioc folks will be present as well. We hope to strategize then (scverse & Bioc together) how to best move forward. So please be a little patient :pray: FWIW, I replied to the GH issue you posted re basilisk just now.

Alik Huseynov (03:15:35) (in thread): > Ok, thanks! Let’s see what the plan looks like then after 15.11

2024-10-30

Estella Dong (17:01:25): > @Estella Dong has joined the channel

Shila Ghazanfar (17:01:25): > @Shila Ghazanfar has joined the channel

2024-10-31

Mike Smith (08:20:20): > Thanks for the interesting discussion yesterday. At some point EBImage was mentioned, specifically in the context of calling computeFeatures() on an image. Would there be any enthusiasm for supporting Zarr images natively in EBImage? At the moment I think it only supports TIFF, JPEG, and PNG, with a suggestion to use RBioFormats to convert from others. Given the seeming coalescence and enthusiasm for Zarr in the bioimaging world, could this have a big impact for those wanting to do image analysis in R? I think it’ll be quite a lot of work, particularly if we wanted to support out-of-memory operations on Zarr images, but might be something to explore.

Artür Manukyan (08:26:33) (in thread): > Dear Mike, at some point we customized some DelayedArray-based packages to define image pyramids in R, taking in either HDF5Array or ZarrArray. With that we were able to devise ggplot2-based shiny applications that access images in different layers in a speedy fashion: https://x.com/arturman_sl/status/1841873001207210151 Would it be possible to support pyramids in EBImage too? Or is something like that already possible with TIFF?

Mike Smith (08:35:10) (in thread): > I’m far from the EBImage expert, but have contributed little bits over time. AFAIK it doesn’t have any support for image pyramids. It just reads a single image into an in memory array representation, and then works on this. I have no idea what would happen currently if you tried to give it a disk-backed array. My guess would be either realize it in memory, or crash, depending on the operation you tried to perform.

Mike Smith (08:36:15) (in thread): > I think it would probably also be a lot of work to support the pyramids, and if you’re already utilising OpenCV maybe there wouldn’t be much to gain from EBImage :shrug:

Artür Manukyan (08:38:41) (in thread): > I would love it though if EBImage handled pyramids and disk-backed image arrays at the same time. Perhaps the same pyramid-based operations could be useful for TIFF as well, hence you could support both OME-TIFF and OME-ZARR

Artür Manukyan (08:39:05) (in thread): > OpenCV is another deal; at least I don’t personally use it for visualization in R

Alik Huseynov (08:42:25) (in thread): > in SpatialFeatureExperiment, OME-TIFF (which was a major issue for us when reading Xenium data) can be read; @Lambda Moses wrote that part. > check this: https://lambdamoses.github.io/SFEWorkshop2024/articles/workshop.html#images Image operations can also be done using terra and EBImage

Mike Smith (08:43:11) (in thread): > My reference to OpenCV was for the identification of features etc. I suspect it’s more sophisticated than EBImage::computeFeatures(). For visualising, then yes, maybe updating EBImage would be good.

Artür Manukyan (08:52:04) (in thread): > Yes, RBioFormats handles that beautifully!

Artür Manukyan (08:52:47) (in thread): > I guess there is some effort to define zarr formats within BioFormats which will translate to RBioFormats immediately

Artür Manukyan (08:53:10) (in thread): > The Java aspect is the only thing I dont like, but what do you do

2024-11-01

Artür Manukyan (09:37:48) (in thread): > @Josh Moore would know better the status of zarr IO in BioFormats

2024-11-04

Vince Carey (09:10:07): > The merfish.zarr in inst/extdata has zmetadata but I do not see anything that would constitute experiment metadata – what is the source, what version of platform, etc. Can we state a minimal standard for experiment provenance and ensure that all our data artifacts can be checked for compliance? [Repeats text of an issue I filed at Helena’s SpatialData repo]

Luca Marconato (10:06:43) (in thread): > Correct, the zmetadata is just used to store the consolidated metadata, i.e. to make filenames discoverable in storage systems where ls is not available (such as S3), and it is not designed for general metadata. Personally, I haven’t worked on the metadata problem, but Wouter-Michiel Vierdag (SpatialData core dev) has. I will ask him to join this channel so he can add more details. But in short, his approach, which saw some experiments in the latest NGFF challenge, is to use RO-Crate + LinkML to have a hierarchical system to store the metadata that is also extendable and queryable. RO-Crate and Zarr would work in a compatible way, in the sense that the path of a Zarr store can also be read as a RO-Crate container. > > Some previous discussions (in which I haven’t taken part, so I can’t give more details) can be found here: > * https://forum.image.sc/t/community-call-metadata-in-ome-ngff/77570 > * https://forum.image.sc/t/ro-crate-and-omero/80610 > * https://github.com/ome/ome2024-ngff-challenge/issues/4 - Attachment (researchobject.org): Research Object Crate (RO-Crate) > RO-Crate is a community effort to establish a lightweight approach to packaging research data with their metadata. It is based on schema.org annotations in JSON-LD, and aims to make best-practice in formal metadata description accessible and practical for use in a wider variety of situations, from an individual researcher working with a folder of data, to large data-intensive computational research environments. - Attachment (linkml.io): LinkML - Linked data Modeling Language > LinkML is a language for modeling linked data - Attachment: #4 Where to place RO Crate Metadata > Where to place the ro-crate-metadata.json file > > The ro-crate-metadata.json contains descriptions of files relative to its location, within the same directory it is placed in. They can describe individual files and directories.
It is not clear to me whether ro-crate-tools would expect / have a standard way of handling multiple ro-crate-metadata.json files within nested directory structures (i’ve not seen an example of this in the spec) > > From discussions on 2024/07/02 there were a few different options of where to place this file relative to the zarr. I’m not sure what are the most relevant design ’scoring’ factors to consider e.g. size/effort to create ro-crate-metadata.json or promoting adoption (e.g. by creating a standard that can be used beyond zarr files / by image formats that we would want to convert to zarr) > > 1. In the directory outside the zarr > > > root-directory/ > | ro-crate-metadata.json > | my-image1.zarr/ > | | .zgroup > | | .zattrs > | | 0/ > | | 1/ > ... > | my-image2.zarr/ > | | .zgroup > | | .zattrs > | | 0/ > | | 1/ > ... > > > > consequences: > > • Can describe more than one zarr while reusing parts of the metadata descriptions > • Not intrinsically tied to the zarr format, so can be used to describe other image files using the same metadata structure > • Could also contain non-image files (e.g. parquet files, meshes) “cleanly” (i.e. without putting things in the OME-Zarr container that the spec doesn’t explicitly describe) > • Would support holding multiple images together with, e.g. transform metadata > • Tools pointed at the OME-Zarr can’t discover the ro-crate metadata without making assumptions about the hierarchy structure > > 2. Inside the zarr, at the multiscale image level > > > my-image1.zarr/ > | ro-crate-metadata.json > | .zgroup > | .zattrs > | 0/ > | 1/ > … > > > > consequences: > > • The zarr file effectively becomes an ro-crate, so the metadata is more difficult to ‘lose’ > • Each RO-Crate contains a single multiscale image, possibly with a single set of labels > • Easy to add ro-crate-metadata.json to an existing OME-Zarr without restructuring > > 3. 
Within a zarr group wrapping other zarrs > > > zarr-group.zarr/ > | ro-crate-metadata.json > | 0/ > | | 0/ > | | 1/ > ... > | 1/ > | | 0/ > | | 0/ > … > > > > consequences: > > • The zarr file effectively becomes an ro-crate, so the metadata is more difficult to ‘lose’ > • Easy to add ro-crate-metadata.json to an existing OME-Zarr without restructuring > • Can describe more than one zarr while reusing parts of the metadata descriptions > • Can easily coexist with option 2

Vince Carey (10:21:58) (in thread): > Thanks, that’s an interesting-looking stack. I would like to propose that our archives have a README.txt or README.md that gives some basic orientation, because the path to making RO-crate so far involves video tutorials etc. and I would like to be in a 0-day position to “know where the data came from and what it is about”. I hesitate to handle any data about which I can’t answer those basic questions – we don’t even state (in the artifact) the fact that the merfish.zarr has its origin in … scverse …

Vince Carey (10:24:54): > I made a PR to SpatialData at HelenaLC. I’d like to propose that our packages pass R CMD check, and when they fail a clear issue is provided concerning what needs to be fixed.

Vince Carey (10:31:05): > Another topic: “zarr” organizes a family of files. We want to be able to hand these around – I have adopted “zip” to put them together and then the user can take them apart. Good or bad? Do utilities exist for operating on compressed unified archives of zarr?

Mark Keller (10:34:43) (in thread): > Technically zarr stores are more abstract. A zarr “DirectoryStore” uses a family of files to store its data. A zarr ZipStore, on the other hand, achieves what you want. You just need to be careful when zipping a directory store, as you do not want to include the root directory as part of the internal paths of the zip file. For instance: > > To zip a zarr directory store, you need to zip recursively and without the root directory included in the internal paths. > > # enter the zarr store and zip its contents > cd path/to/input.h5ad.zarr && zip -r ../output.h5ad.zarr.zip . > # cannot simply do > # zip -r path/to/output.h5ad.zarr.zip path/to/input.h5ad.zarr >

Artür Manukyan (10:52:21) (in thread): > @Mark Keller what exactly happens if you zip the root folder directly, though?

Mark Keller (10:54:28) (in thread): > The keys to things in the zarr store will include the "path/to/input.h5ad.zarr" prefix. So, using AnnData as an example, accessing the X array from the root of the store z would look like z["/X"] in the first case, while the latter would require z["/path/to/input.h5ad.zarr/X"] instead, which is probably not intended.
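Mark’s point can be reproduced with just the standard library. The sketch below (file and array names are made up) zips a toy directory store two ways and shows how the choice of archive-internal paths changes the keys a reader would see:

```python
import os
import tempfile
import zipfile

# Build a toy "directory store": <root>/input.zarr/X/0 (one fake chunk file)
root = tempfile.mkdtemp()
store = os.path.join(root, "input.zarr")
os.makedirs(os.path.join(store, "X"))
with open(os.path.join(store, "X", "0"), "wb") as f:
    f.write(b"chunk")

# Wrong: arcnames relative to the store's *parent* keep the "input.zarr/" prefix
wrong = os.path.join(root, "wrong.zip")
with zipfile.ZipFile(wrong, "w") as zf:
    for dirpath, _, files in os.walk(store):
        for name in files:
            full = os.path.join(dirpath, name)
            zf.write(full, arcname=os.path.relpath(full, root))

# Right: arcnames relative to the store root itself,
# equivalent to `cd input.zarr && zip -r ../out.zip .`
right = os.path.join(root, "right.zip")
with zipfile.ZipFile(right, "w") as zf:
    for dirpath, _, files in os.walk(store):
        for name in files:
            full = os.path.join(dirpath, name)
            zf.write(full, arcname=os.path.relpath(full, store))

print(zipfile.ZipFile(wrong).namelist())   # ['input.zarr/X/0']
print(zipfile.ZipFile(right).namelist())   # ['X/0']
```

Only the second archive gives keys like `X/0` at the root, which is what a reader opening the zip as a store would expect.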

Vince Carey (11:01:33) (in thread): > Very interesting, thanks.

2024-11-05

Estella Dong (05:34:32): > A more advanced version of Parquet: https://github.com/spiraldb/vortex/. It could be interesting to see the additional benefits for storing tables, and how it could extend GeoParquet.

2024-11-06

Artür Manukyan (08:31:58): > FYI https://github.com/scverse/anndataR/pull/190

Dharmesh Dinesh Bhuva (22:07:48): > @Dharmesh Dinesh Bhuva has joined the channel

Malvika Kharbanda (22:08:33): > @Malvika Kharbanda has joined the channel

Farhan Ameen (22:15:21): > @Farhan Ameen has joined the channel

2024-11-07

Shila Ghazanfar (17:54:33): > hi all - I'm curious if anyone has worked with the .gef files from BGI Stereo-seq before? This is an h5 file which seems to include the per-nanoball locations and counts, but I'm finding it hard to understand how that information is being stored & represented. Thanks so much!

Artür Manukyan (17:59:31) (in thread): > Hmm, this is a question rather for the #spatial channel, but perhaps I might help a little — we established some internal workflows to process .gef files

Shila Ghazanfar (19:24:34) (in thread): > thanks Artur !

Peter Hickey (21:32:39) (in thread): > Matt Ritchie’s lab have looked at some Stereo-seq/STOmics data, too. @Changqing might know something

Luca Marconato (23:44:42) (in thread): > You may also find it useful to see how we parse the .gef files in spatialdata-io. Code parts here and here.

Shila Ghazanfar (23:51:43) (in thread): > Thank you Pete and Luca! I think I've figured it out. Appreciate it!
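For getting oriented in an unfamiliar HDF5 container like a .gef file, a small tree-walker helps before reaching for format-specific parsers. This is a generic sketch assuming h5py is installed; the actual .gef group names (e.g. the gene-expression bins) vary by Stereo-seq software version and are not assumed here.

```python
import h5py

def h5_tree(path):
    """Return a list of strings describing every group and dataset
    in an HDF5 file, with shapes and dtypes for datasets."""
    lines = []

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            lines.append(f"{name}: dataset, shape={obj.shape}, dtype={obj.dtype}")
        else:
            lines.append(f"{name}: group")

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return lines

# Hypothetical usage against a .gef file:
# for line in h5_tree("sample.gef"):
#     print(line)
```

`visititems` walks the whole hierarchy, so this prints every bin level and attribute-bearing group in one pass; from there one can inspect individual datasets with slicing.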

2024-11-11

Harry Mueller (18:08:14): > @Harry Mueller has joined the channel

2024-11-15

Louise (15:00:37): > @Louise has joined the channel

2025-01-03

Josh Moore (11:41:44): > @Josh Moore has joined the channel

Josh Moore (11:43:20) (in thread): > Sorry @Artür Manukyan, wasn't watching the channel. If you mean OME-Zarr, it should work with the additional jars on your class path. Generic zarr support is currently unplanned.

2025-01-04

Alik Huseynov (12:36:10) (in thread): > Hi both @Luca Marconato and @Helena L. Crowell. Any updates on reading a SpatialData object in R? > E.g., I would like to load an existing xenium_test.zarr object, then make a SpatialExperiment or ideally SpatialFeatureExperiment object for downstream spatial analysis. > If there is some preliminary workflow/vignette to play around with, please do share. Thanks!

Helena L. Crowell (15:57:20) (in thread): > Mm, sorry for the silence… a social media/blog post was supposed to happen after the hackathon with a summary/links etc.… hopefully coming soon! … in the meantime, you can check out my GH for what we got thus far… all under devel, but what you described should be doable :crossed_fingers:

Alik Huseynov (18:28:11) (in thread): > Thanks! It would be great to have a link to that post once ready. > I checked your repo :+1:

Luca Marconato (18:54:27) (in thread): > > a social media/blog post was supposed to happen after the hackathon with a summary/links etc. > Yes, apologies, coming soon! I do not have access to the scverse social media and we didn’t manage to finalize it before the holidays started.

Luca Marconato (18:57:03) (in thread): > This is the BioHackrXiv pre-print that we submitted: https://osf.io/preprints/biohackrxiv/8ck3e — you can find all the details there already. - Attachment (OSF): 1st SpatialData Developer Workshop > This pre-print is aimed at sharing the results of the “1st SpatialData workshop,” an in-person event organized by the SpatialData team and funded by the Chan Zuckerberg Initiative (CZI) that brought together expertise from different fields, including methods developers of a variety of tools for single-cell and spatial omics. The purpose is to explore new directions to advance the field of spatial omics. By leveraging multiple programming languages, including Python, R, and JavaScript, the event focuses on four central hackathon tracks: R interoperability: This track aims to enhance the integration and compatibility of R and Python with the SpatialData Python framework by using the language-agnostic SpatialData Zarr file format (which follows, when possible, the NGFF specification). Visualization interoperability: This track is dedicated to improving the seamless integration of visualization tools across different systems and programming languages via a tool-agnostic view configuration. Scalability and benchmarking: Participants will identify, benchmark, and address computational bottlenecks within the SpatialData framework. Ergonomics and user-friendliness: This track focuses on enhancing the usability and accessibility of the SpatialData framework for both first-time users and third-party developers. These tracks aim to foster collaboration and innovation, driving advancements in the analysis and infrastructure of spatial omics and imaging data.

2025-01-05

Ilariya Tarasova (22:26:31): > @Ilariya Tarasova has joined the channel

2025-01-08

Maria Doyle (14:10:59): > @Maria Doyle has joined the channel

2025-01-09

Maria Doyle (09:24:43): > Hi everyone, > Check out the new Bioconductor blog post on the SpatialData hackathon and workshop held in November in Switzerland, organised by scverse with Bioconductor participation. Read more here: https://blog.bioconductor.org/posts/2025-01-08-bioc-in-scverse-workshop/ Thanks to @Artür Manukyan for writing the post and to everyone who contributed! - Attachment (Bioconductor community blog): 2024 SpatialData Workshop – Bioconductor community blog > R/Bioconductor developers participating in the 1st scverse SpatialData Workshop

Ammar Sabir Cheema (11:41:03): > @Ammar Sabir Cheema has joined the channel

2025-01-10

Brian Repko (09:26:53): > @Brian Repko has joined the channel

2025-01-26

Alik Huseynov (12:55:44) (in thread): > > ..in the meantime, you can check out my GH for what we got thus far … all under devel, but what you described should be doable :crossed_fingers: > would it make more sense to have a SpatialExperiment object in tables by default instead of SCE? Since sdata["table"] has obsm['spatial'] polygon centroids as well.

Luca Marconato (13:46:09) (in thread): > One note: obsm['spatial'] is absent in most datasets (we kept it for Xenium for legacy reasons), as we decouple spatial information from tabular annotations.

Alik Huseynov (14:57:58) (in thread): > Thanks. So by default sdata["table"] has only AnnData (non-spatial single-cell data), and any spatial info like polygon centroids, shapes etc. can be added to it afterwards if needed.

Luca Marconato (15:08:24) (in thread): > exactly, and also vice versa: you can have a SpatialData object that has only images or geometry information and then add one or more tables later

Luca Marconato (15:08:43) (in thread): > (or you can have an object that directly has everything)

2025-01-28

Alik Huseynov (04:04:25) (in thread): > > One note: obsm['spatial'] is absent in most datasets (we kept it for Xenium for legacy reasons).. > @Luca Marconato is such legacy behavior kept, or will it be kept for the next SpatialData updates?

Luca Marconato (05:44:36) (in thread): > One should not rely on it even now, as it’s basically present only for Xenium data.

2025-02-06

Kalyanee (09:21:38): > @Kalyanee has joined the channel

2025-02-13

Mengbo Li (17:11:42): > @Mengbo Li has joined the channel

2025-03-07

Emanuele Pitino (05:36:48): > @Emanuele Pitino has joined the channel

2025-04-21

Jennifer Slotnick (22:44:36): > @Jennifer Slotnick has joined the channel