#miaverse

2020-09-24

FelixErnst (02:37:44): > @FelixErnst has joined the channel

Domenick Braccia (02:38:06): > @Domenick Braccia has joined the channel

Ruizhu HUANG (02:38:06): > @Ruizhu HUANG has joined the channel

FelixErnst (02:39:18): > Hi I created the channel to maybe keep the discussion a bit more tidy. Please invite people to join, if you think they might be interested

FelixErnst (02:39:38): > https://github.com/FelixErnst/MicrobiomeExperiment

Ruizhu HUANG (05:59:42): > If I understand correctly, `MicrobiomeExperiment` so far has a new slot called `microbiomeData`. Is there a plan yet for what to put in there?

FelixErnst (07:04:30): > I am not sure.

FelixErnst (07:08:27): > This was historically put into `rowData`, but I didn’t like the idea, since a `MicrobiomeExperiment` with a changed `rowData` does not necessarily behave like a `SummarizedExperiment`. (The developer of the class stored in the `rowData` slot has to make sure the class behaves like a `DataFrame`.) I chose to move it to an extra slot, because then it can be easily coerced to a normal `SE` by shedding the slot in question. In addition, it is not the concern of the `MicrobiomeExperiment` devs to reimplement functions acting just on the `rowData` (I am thinking about connections to `MultiAssayExperiment`, etc.)

FelixErnst (07:09:46): > Currently, I don’t see why `microbiomeData` would be necessary anymore. The tree is stored in `rowTree` and the taxonomic data is in `rowData` for now.

FelixErnst (07:10:25): > the reference sequence can also be made parallel to `rowData` as a separate slot.

FelixErnst (07:12:00): > For me the benefits of a new class and/or new slot are becoming smaller and smaller

Domenick Braccia (10:49:48) (in thread): > Hey Felix, you read my mind! thanks for making this channel

hcorrada (10:50:09): > @hcorrada has joined the channel

Jayaram Kancherla (10:50:09): > @Jayaram Kancherla has joined the channel

Levi Waldron (10:55:26): > @Levi Waldron has joined the channel

2020-09-25

FelixErnst (02:53:17): > So I think I set up the GH repo correctly for development (development against devel on master and against release on the release branch). Once the `BiocStyle` issue is solved, GitHub Actions should also work. > I also added a `referenceSeq` slot parallel to the rows, which makes use of the S4Vectors infrastructure for this. So `referenceSeq` is either `NULL` or `length(referenceSeq) == nrow(x)`.
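The "either `NULL` or one entry per row" rule can be sketched with a plain S4 validity function. This is only a toy illustration: `ToyExperiment` is a hypothetical class, a character vector stands in for the sequence container, and the actual package relies on the S4Vectors parallel-slot mechanism rather than a hand-written check.

```r
library(methods)

# Hypothetical toy class for illustration; not the MicrobiomeExperiment class.
setClass(
  "ToyExperiment",
  representation(counts = "matrix", referenceSeq = "ANY"),
  validity = function(object) {
    seqs <- object@referenceSeq
    # Valid states: no sequences at all, or exactly one per row of counts
    if (!is.null(seqs) && length(seqs) != nrow(object@counts)) {
      return("'referenceSeq' must be NULL or have one entry per row")
    }
    TRUE
  }
)

counts <- matrix(1:6, nrow = 3,
                 dimnames = list(paste0("asv", 1:3), c("s1", "s2")))

# Both valid: no reference sequences, or one per row
ok_null <- new("ToyExperiment", counts = counts, referenceSeq = NULL)
ok_seq <- new("ToyExperiment", counts = counts,
              referenceSeq = c("ACGT", "ACGA", "TTGA"))

# Invalid: two sequences for three rows; capture the validity error message
bad <- tryCatch(
  new("ToyExperiment", counts = counts, referenceSeq = c("ACGT", "ACGA")),
  error = function(e) conditionMessage(e)
)
```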

2020-09-28

hcorrada (13:51:56): > Thanks for this @FelixErnst!

FelixErnst (14:24:47): > no problem. I am looking forward to collaborating on this.

FelixErnst (14:25:09): > we can also pull it back to your repo, if you’d like

FelixErnst (14:25:45): > And if you can think of other people interested in contributing, feel free to add them to the channel

2020-10-08

FelixErnst (02:55:55): > Hi Aaron @Aaron Lun. I have a question regarding the `sumCountsAcrossFeatures`/`sumCountsAcrossCells` functions from the `scater` package. The functionality is also required for microbiome experiments and is implemented for the phyloseq classes as well. Since there was an idea of switching to `SCE`-backed objects, starting with `TreeSummarizedExperiment` (see #miaverse), I ported this quite central function to the `MicrobiomeExperiment` package. Then I realized that you had indeed implemented the feature already with more efficiency, so I piggybacked on that implementation from `scater`. And here it comes: the name of the function is very much geared towards single cell experiments. Is there any interest in changing the name of the function and/or moving it into `SingleCellExperiment`? Please let me know what you think about that

Aaron Lun (02:57:44): > @Aaron Lun has joined the channel

Aaron Lun (02:58:05): > well it’s actually moved to scuttle.

FelixErnst (02:58:18): > in devel?

Aaron Lun (02:58:21): > yes.

FelixErnst (03:33:17): > Thanks for the advice. From what I can tell so far, there is nothing more to change. The implementation of the methods on the `SE` object types is a bit different, but the functions on the matrix are the stuff we need. > > Based on a bit of history, the functions in `ME` are just named `mergeRows`/`mergeCols`. Since `scuttle` is also geared towards single cell analysis, I don’t see how a more generic function name would make sense in that package. From a concept point of view it may. Would you agree?

Aaron Lun (03:34:35): > that would be correct.

Aaron Lun (03:35:03): > You could propose to write another package underneath the two. But I don’t have a good idea of what that would be called or what its scope would be.

FelixErnst (03:35:47): > Me neither. `SEutils` is just too 2019

FelixErnst (03:35:51): > :slightly_smiling_face:

FelixErnst (03:37:22): > Bee like in `sumSum`?

FelixErnst (03:53:50): > scope: Unification of common data modification functions?

FelixErnst (03:54:04): > Any comments welcome

2020-10-12

Aaron Lun (12:44:24): > what’s this SEtup thing?

Aaron Lun (12:44:30): > is it for this?

FelixErnst (12:49:04): > yup. The name was just a stupid idea probably

2020-10-14

FelixErnst (06:55:23): > @Aaron Lun @Alan O’C I added a generic `calculateDistance` to `SEtup`, which is a way to define alternate distance calculations from the microbiome world (e.g. JSD and UniFrac). Since `runMDS` is defaulting to `dist`, I had to add `runMDS2` as well (for now at least). Happy to discuss what might be done about this

Alan O’C (06:55:58): > @Alan O’C has joined the channel

Alan O’C (10:07:51): > I could change mds to use `dist_fun=dist` and `dist_method="euclidean"`, would that work for you?

2020-10-17

FelixErnst (14:12:24): > That would be a step in the right direction. Thanks. For testing purposes I set up a function `runMDS2` in `SEtup` with the argument called `FUN`, since `?dist` already has an argument called `method` and I didn’t want to cause confusion there.

FelixErnst (14:14:49): > Maybe it is sufficient to pass the dots on to the dist function. The `method` argument would then work out of the box, and other arguments for other distance methods would work as well
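A minimal base R sketch of the "pass the dots on" idea, assuming samples are in rows. `run_mds_sketch` and its argument names are hypothetical and only mimic the shape of the `runMDS2` experiment described above; this is not the scater or SEtup code.

```r
# Hypothetical helper for illustration; not the scater or SEtup function.
run_mds_sketch <- function(x, FUN = stats::dist, ncomponents = 2, ...) {
  d <- FUN(x, ...)  # everything in `...` (e.g. `method`) goes to FUN
  stats::cmdscale(stats::as.dist(d), k = ncomponents)
}

set.seed(1)
x <- matrix(rpois(40, lambda = 5), nrow = 8)  # 8 samples x 5 features

# Default: Euclidean distances from stats::dist
mds_euclid <- run_mds_sketch(x)

# `method` is not an argument of the wrapper; it reaches dist() via the dots
mds_manhattan <- run_mds_sketch(x, method = "manhattan")
```

Because the wrapper itself has no `method` argument, there is no name clash with `?dist`, and any other distance function with its own extra arguments can be swapped in through `FUN`.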

Leo Lahti (14:55:03): > @Leo Lahti has joined the channel

Joey McMurdie (14:56:14): > @Joey McMurdie has joined the channel

FelixErnst (14:58:18): > Since I got some replies from people who have been involved in the microbiome field longer than I have, I want to recap the situation. > > My interest in restructuring code for microbiome data started when I realized that phyloseq’s data structure was basically an SE and that Ruizhu @Ruizhu HUANG had already laid the groundwork for tree annotation data in her `TreeSummarizedExperiment` package. Based on this, a lot of different re-implementations and overlap of required functionality within the single cell and microbiome worlds became apparent in several places.

FelixErnst (14:58:34): > I currently see this effort having better chances, if it becomes a group effort. I currently think in two ways about this.

FelixErnst (14:59:57): > First, any overlap with the `SingleCellExperiment` world should be used, since Aaron and all the others spent a lot of time and energy on this and came up with a complex and yet understandable vocabulary for data structure, functionality, workflows and package structure, which we can already use and don’t have to reinvent. What is left to do here is to “convert” single cell specific vocabulary into general SE vocabulary, where applicable. For this we can only hope that Aaron and others see the benefit as well and help us out. > > For this I played around with a mockup `SEtup`, which currently contains just the merge functions, which slot in between the existing `aggregateAcrossFeatures` and `sumCountsAcrossFeatures` of the `scater` package to recreate the functionality as seen in phyloseq, including the archetype argument.

FelixErnst (15:00:29): > The second thing is the gathering of functions which people in the field are using and want to see updated to work with SE objects or ME objects. For this, two or three new packages might be on the horizon to structure functions in a way that we don’t end up with one big monolithic package. > > Here I see a blank page in front of me, since I lack experience in the microbiome field. I could see from all the existing packages that broadly two topics exist: microbiome specific calculations and visualization. So maybe a package each would be something to think about. Currently two functions (JSD and UniFrac) are in MicrobiomeExperiment, which could find a better home in a specific package.

FelixErnst (15:02:00) (in thread): > I also saw that `SEtools` is currently implementing some of the functions. So there is additional overlap

FelixErnst (15:03:43) (in thread): > Functions from `phyloseq`, `microbiome` and `microbiomeutilities` come to mind here.

FelixErnst (15:04:59): > So how do you think about it? What would you be interested in?

Alan O’C (15:13:21) (in thread): > Yeah I somehow thought the dots were passed elsewhere but that would indeed be better. Can you link the `SEtup` repo?

FelixErnst (15:22:54) (in thread): > https://github.com/FelixErnst/SEtup

FelixErnst (15:25:14) (in thread): > I played around with a generic wrapper `calculateDistance`; maybe this is a bit too much wrapping?

Alan O’C (15:43:51) (in thread): > With SCE at least I think MDS assumes you want to do it on a matrix of the primary modality

Alan O’C (15:46:51) (in thread): > btw I think `do.call(FUN, c(list(x),list(...)))` can just be `FUN(x, ...)`?

FelixErnst (16:02:24) (in thread): > Sure, if the argument is a function. If someone puts the function name in a character vector, it will work only with the first, wouldn’t it? I just thought about someone reusing the function in a dynamic context

Alan O’C (16:03:06) (in thread): > You probably want `FUN <- match.fun(FUN)` first in any case

FelixErnst (16:07:01) (in thread): > I think do.call does exactly that, but writing it verbosely in the package might be better, I agree

Alan O’C (16:08:14) (in thread): > Maybe it’s just that `c(list(x), list(y))` gives me PTSD :slightly_smiling_face:
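The three calling styles from this thread can be compared side by side in base R. The wrapper names below are made up for the comparison; the point is only how each style treats a function given by name as a character string.

```r
x <- matrix(c(0, 3, 4, 0), nrow = 2)  # two points: (0, 4) and (3, 0)

# Three ways of forwarding `x` and the dots to a user-supplied function
call_direct <- function(x, FUN, ...) FUN(x, ...)
call_docall <- function(x, FUN, ...) do.call(FUN, c(list(x), list(...)))
call_matched <- function(x, FUN, ...) {
  FUN <- match.fun(FUN)  # resolves a function object or a function name
  FUN(x, ...)
}

d1 <- call_direct(x, dist, method = "manhattan")     # function object: works
d2 <- call_docall(x, "dist", method = "manhattan")   # do.call accepts a name
d3 <- call_matched(x, "dist", method = "manhattan")  # match.fun resolves the name

# A character name with the plain call fails, because `FUN` is then not a function
direct_with_name <- tryCatch(
  call_direct(x, "dist"),
  error = function(e) "error"
)
```

With `match.fun()` up front, the plain `FUN(x, ...)` call handles both cases, which is the variant suggested in the thread.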

Leo Lahti (16:40:23) (in thread): > Absolutely. And we are together closely connected to all three.

Leo Lahti (16:47:13): > We could experiment with some well selected cases as part of the other pkg dev work we are now doing with microbiome, microbiomeutilities and some others. This could result in a spinoff pkg that uses the new structure and can inspire further extensions. Could also facilitate better structuring of the existing work in the above mentioned packages. Microbiome, microbiomeutilities and phyloseq are all somewhat heterogeneous.

Leo Lahti (16:53:54): > Beta diversity analysis and visualization is a clear candidate theme. Also phylogeny aware factorization and transformations. Also hierarchies in the sample space should be considered because TreeSE supports this.

Leo Lahti (16:55:10): > I would keep the class structure package rather minimal and focused on core functionality. This would support long term maintenance

Leo Lahti (17:31:50): > How about a `phylodiversity` pkg that would focus on (phylogeny-aware) diversity measures? For instance.

FelixErnst (18:20:46) (in thread): > For my taste this is a bit too specific/unspecific. Phylo is an abbreviation, which can mean a lot of different things and is used in different fields. In addition, a bunch of `phylo*` packages exist on CRAN. Just focusing on diversity measures is also a bit too narrow, isn’t it? We could just start with `Mia` for microbiome analysis and be done with it for now. If too many functions end up in there, we can restructure at that point.

FelixErnst (18:22:10) (in thread): > Sadly the package name `microbiome` is already taken :grin:

Leo Lahti (18:28:23) (in thread): > We are at the moment reconsidering the organization of microbiome/microbiomeSeq/microbiomeutilities/seqtime and in principle the microbiome pkg could also be used for this. But perhaps it is indeed clearer to start a fresh one. I agree about the names but I still think that it might make sense to define a specific scope for the package first, then come up with a suitable descriptive name. One could choose a generic name and just see what ends up being contributed there, but in my experience this could be taken to full completion more efficiently if the scope is well defined and it is possible to assess when the package is “ready”.

Leo Lahti (18:30:04) (in thread): > I am just afraid that with a generic name such as mia we will again end up having a monolithic heterogeneous package. Although I like the name mia. Even if that name is kept, it could make sense to think of a scope.

Leo Lahti (18:30:18) (in thread): > In general, I would keep analysis and visualization in separate pkgs.

FelixErnst (18:52:07) (in thread): > Sure, but it doesn’t help to be too specific from the beginning, because then you have to keep too many things going. How about this: Let’s try to convert every vignette you have written for `microbiome` and Joey for `phyloseq` into an SE based workflow (or other workflows/vignettes you are interested in). This way it will become clear which functions need to be written/converted and which functions from `scater`, etc. can be used out of the box. For example the whole plotting for reduced dimensions is basically done in `scater`. So there is actually a lot of stuff inherited.

FelixErnst (18:54:05) (in thread): > And the vignettes will be a good way to keep track of how functions are used.

FelixErnst (18:55:37) (in thread): > For example the composition bar plots are something I will have a look at next week

Leo Lahti (19:09:56) (in thread): > I am into the vignette idea in general, however, our vignettes contain various materials and many of these are seldom used in practice. I would perhaps look at selected parts of the vignettes that would seem to benefit most from ME. There is clear value in 0) just getting it all going. At the same time, there are at least the following other aims (?): 1) assess the practical aspects of using ME (benefits/bottlenecks); 2) start to lay the groundwork for a pkg ecosystem based on ME; 3) attract other contributors. Does converting all vignettes to use ME support these aims 0-3 in the optimal way? Or are there other aims? What would be the priority?

Leo Lahti (19:12:17) (in thread): > Sudarshan (microbiomeutilities) has also worked on barplots lately. Perhaps some synergies could be found.

hcorrada (19:23:09): > For diversity, please consider interfacing with pkgs from http://statisticaldiversitylab.com/software

Leo Lahti (19:24:50): > agreed

2020-10-18

FelixErnst (03:24:17): > read my mind. I had a brief conversation with Amy Willis via mail about why breakaway was no longer on CRAN. Maybe this is an opportunity to get this going again

FelixErnst (03:26:16) (in thread): > Nope, I think it’s pretty much how you described it. Would you mind if I set this up on GitHub and send you an invitation?

Leo Lahti (03:57:34) (in thread): > It’s welcome. I don’t know about scheduling but we can see how it goes.

Leo Lahti (07:48:32): > What is the current home for MicrobiomeExperiment, is it https://github.com/HCBravoLab/MicrobiomeExperiment

Leo Lahti (07:49:46): > or would it be FelixErnst/MicrobiomeExperiment.git

Leo Lahti (07:50:06): > just thinking what’s the primary source for forks and PRs

FelixErnst (09:30:24): > FelixErnst/MicrobiomeExperiment.git

Leo Lahti (10:15:31): > Can you summarize what you would see as the key extensions in `MicrobiomeExperiment`, as compared to `TreeSummarizedExperiment`

Leo Lahti (10:16:13): > Because the latter already seems to be rather well suited (as such) for microbiome data.

hcorrada (10:21:13): > It would be great to standardize feature annotation (metagenomeFeatures is an early attempt). That should live in MicrobiomeExperiment.

FelixErnst (10:33:53): > The missing feature compared to what `phyloseq` and the early implementation of `MicrobiomeExperiment` offered was the definition of a dedicated reference sequence slot. This is the only addition to the class structure, which I added via the vertical slot mechanism from `S4Vectors`.

FelixErnst (10:34:48): > On the side of functions I added the tax_glom replacement `agglomerateByRank`

FelixErnst (10:35:30): > and the underlying functions of `mergeRows`, which reuse the assay summary functions from `scater`
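The core operation behind merging rows by a taxonomic rank is a grouped column-wise sum of the count matrix. A minimal base R sketch with made-up counts and genus labels, using `rowsum()` as a stand-in (it is not the `mergeRows`, `agglomerateByRank`, or scater implementation):

```r
# Toy count matrix: 4 features (rows) x 3 samples (columns); values are made up
counts <- matrix(
  c(10, 0, 5,
     2, 3, 0,
     1, 1, 1,
     0, 4, 6),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("asv", 1:4), paste0("sample", 1:3))
)

# Hypothetical rank assignment, one label per row
genus <- c("Bacteroides", "Bacteroides", "Prevotella", "Prevotella")

# Sum the counts of all rows that share a label; one output row per label
merged <- rowsum(counts, group = genus)
```

Per-sample totals are unchanged by the merge, which is a quick sanity check for any agglomeration step.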

FelixErnst (10:39:08): > From `metagenomeFeatures` I got the impression that it offers some functionality analogous to what `TxDb` does. Is that correct?

FelixErnst (10:41:29): > It might be possible to create a dedicated taxonomic container extending from `DataFrame`. However, since many examples don’t have all levels populated, it seems to be overkill, since all columns become optional, which turns the idea of a dedicated class on its head.

FelixErnst (10:44:07): > From my point of view an on-the-fly check of the existing data structure in both rowData and rowTree should be sufficient for now. See `taxonomy.R` for low-level examples.

Leo Lahti (10:48:02): > right

FelixErnst (11:49:33): > https://github.com/microbiome/mia

Sudarshan (15:16:31): > @Sudarshan has joined the channel

Leo Lahti (15:17:25): > I added @Sudarshan, i.e. Sudarshan a.k.a. `microbiomeutilities` wizard, here as he was showing interest too.

Sudarshan (15:32:08): > Thanks!

2020-10-19

Sudarshan (03:25:00): > I read the discussion above and it seems like a very exciting development! Sharing my thoughts here. Will `MicrobiomeExpt` support/integrate `MultiAssayExperiment`? > From an end-user perspective, the ability to integrate 16S, WGS, metatranscriptomics, metabolomics assays will be a key requirement in coming years. > Another aspect can be the ability to structure metagenome-assembled genomes as single-cell equivalents with lineage/taxonomic information and consisting of metagenomic or metatranscriptomic count data. > This can further enhance the usage of `scater` and `MicrobiomeExpt` for microbiome research. I am not an advanced R expert and not sure how complicated this will be…

FelixErnst (03:28:00): > From what I understand, `MultiAssayExperiment` can hold any type of `SE` object. This already works for `ME`. See http://www.bioconductor.org/packages/release/bioc/vignettes/MultiAssayExperiment/inst/doc/MultiAssayExperiment.html for more details (Section 3.1)

FelixErnst (03:32:04): > a metagenome assembly also starts off with a count matrix, and what does the feature annotation look like? `rowData` and `rowRanges` are available out of the box, so you have the choice to populate feature annotation even with `GenomicRanges` if such data is available

Sudarshan (04:01:43): > Ah yes! The access to `MAE` is something that can be demonstrated in tutorials with examples. For MAGs, a rough figure of what I was thinking based on the philosophy of `SingleCellExperiment`/`scater` - File (PNG): image.png

FelixErnst (04:05:02): > Sure, as long as you have enough RAM :grin: But then again, you can switch to HDF5-backed sparse matrices and you should be good to go

FelixErnst (04:06:21): > Maybe we need to tweak `saveHDF5SummarizedExperiment`/`loadHDF5SummarizedExperiment` to make this work for `ME`. I am not sure what Aaron has done to make this work for `SCE`

FelixErnst (04:09:01): > or even if there is anything to do

Ruizhu HUANG (04:24:05) (in thread): > Maybe `rowRanges` is also a possibility to store the information of metagenome features?

FelixErnst (04:24:54) (in thread): > sure. `rowData` and `rowRanges` are both available

Ruizhu HUANG (04:27:57) (in thread): > Maybe I am missing some information here. Then why do we need a new slot for the reference sequence?

FelixErnst (04:29:04) (in thread): > the reference sequence slot uses the `Biostrings` classes, which are much faster and also offer some QC functions (no invalid chars, etc)

FelixErnst (04:29:14) (in thread): > and it is easier to type

Ruizhu HUANG (04:29:45) (in thread): > ah… I see.

FelixErnst (04:30:11) (in thread): > `rowRanges(me)$seq` is a bit clumsy and it is not easy to fix the name of the column. It could also be `rowRanges(me)$refSeq` or any other name.

FelixErnst (04:30:28) (in thread): > Then we would need to write a validity function for this… and so on

Ruizhu HUANG (06:12:57) (in thread): > We could update `TreeSummarizedExperiment` to provide the `refSeq` if you think that would make it easier for others in the community to work with. It doesn’t seem to be a big change to the structure, if I understand it correctly.

FelixErnst (06:59:25) (in thread): > We definitely could do this. Are you thinking of integrating the slots into the class `TreeSummarizedExperiment`, or just putting the `ME` in the package `TreeSummarizedExperiment`, which then would export the `TSE` and `ME` classes?

FelixErnst (07:05:37) (in thread): > But this is actually an interesting idea. I have the problem that I cannot use UniFrac with the complete matrix, but have to aggregate it to the species level and construct a tree from this data (because `toTree` removes duplicate rows). With JSD I see differences when I compute the distances on the complete data vs. the aggregated data. I suspect that this might be the same for UniFrac, but I cannot test it. What I have is an alignment of the 70k ASVs and the taxonomic table. Could a tree be constructed which is aware of the alignment distances, to distinguish between the nodes based on the same taxonomic information?

FelixErnst (07:47:10) (in thread): > However, that is probably fasttree territory, isn’t it?

Ruizhu HUANG (08:04:51) (in thread): > I was thinking we could probably integrate the slots into the class `TreeSummarizedExperiment` to keep the family tree of SummarizedExperiment as simple as possible. What do you think? > For the problem with `UniFrac`, I guess the function is coded in a way that requires each row of the matrix to map one-to-one to a leaf of the tree? In the tested data, you probably have multiple rows sharing exactly the same lineage or the same leaf.

FelixErnst (08:05:35) (in thread): > yup to the UniFrac thing. I am trying fasttree on an alignment. Let’s see how this works

FelixErnst (08:08:03) (in thread): > Regarding merging `MicrobiomeExperiment` and `TreeSummarizedExperiment`: I agree with keeping things simple. However, since `TSE` is already on BioC, it makes it difficult (or rather not as effortless) to develop without a clear timetable. If we push changes to `TSE` master, they will end up on Bioc in half a year without any chance of intervening

FelixErnst (08:08:21) (in thread): > That puts pressure on this, which I think we might want to avoid.

FelixErnst (08:09:38) (in thread): > If we commit to it now, we have to follow through, regardless. And this is a decision I am not really comfortable to take, especially since it will put stuff on your plate

FelixErnst (08:10:23) (in thread): > So my gut feeling right now is to keep them separate and merge once we are sure the light at the end of the tunnel is not the train

FelixErnst (08:12:34) (in thread): > What do you think?

2020-10-20

Amy Willis (11:57:51): > @Amy Willis has joined the channel

2020-10-21

FelixErnst (07:12:02): > @channel I have a question regarding the numerous distance measures used in the microbiome world. Multiple implementations exist, for example for Bray-Curtis, and I was wondering if and which packages you use for this. `vegan` seems to be a linchpin for many of these measures. Would general accessors for `vegdist` distance calculation and their piping to MDS and `reducedDim` be something worth the effort?

Sudarshan (07:44:23): > I think so… `vegan` has had a very long life of stable maintenance with developers still actively involved. Safer bet. Piping to `reducedDim` is also good, but I am not sure how it deals with negative values.

Sudarshan (07:46:02): > Not sure but maybe vegan itself has an answer for handling negative values. @Leo Lahti do you have experience with this?

FelixErnst (07:51:33): > `reducedDim` is just a slot for holding MDS data. The `dist` object gets scaled via `cmdscale` and the result is stored in `reducedDim`
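The distance-then-`cmdscale` pipeline can be sketched in base R. `bray_curtis` below is a hypothetical hand-rolled helper for illustration only (in practice `vegan::vegdist` would be the maintained choice), and the abundances are made up. Three samples are used so that the distances can be reproduced exactly in two dimensions.

```r
# Hypothetical helper: Bray-Curtis dissimilarity between rows (samples) of x,
# computed as sum(|x_i - x_j|) / sum(x_i + x_j)
bray_curtis <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
    }
  }
  as.dist(d)
}

# Made-up abundances: 3 samples (rows) x 3 taxa (columns)
abund <- matrix(
  c(10, 2, 0,
     1, 9, 2,
     0, 3, 8),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("sample", 1:3), paste0("taxon", 1:3))
)

d <- bray_curtis(abund)

# Classical MDS: one row of coordinates per sample. A matrix like this is
# what would be stored as a reduced dimension result.
coords <- cmdscale(d, k = 2)
```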

Leo Lahti (12:27:12): > vegan is a good and reliable option, yes.

Leo Lahti (12:27:47): > (although there might be other good ones)

2020-10-22

Leo Lahti (09:53:17): > By the way, the abstract submission deadline for the European Bioconductor Meeting 2020 is Oct 30 (next week Friday) - that might be a good place to receive early feedback.. what do you think? https://eurobioc2020.bioconductor.org/abstracts - Attachment (eurobioc2020.bioconductor.org): EuroBioc2020 > European Bioconductor Virtual Meeting 14-18 December 2020

Leo Lahti (09:54:12): > could be a talk or a workshop.

Leo Lahti (09:54:18): > (or a poster..)

Leo Lahti (09:55:21): > Short talks (8 minutes + 2 minutes for questions) focusing on a package, an application note, or the presentation of a new research project that seeks input and collaborations from the community.

Leo Lahti (09:55:37): > Short workshop (30 - 45 minutes) such as a demonstration of a package or a workflow.

FelixErnst (09:55:53): > That’s a good idea. But I don’t have the time to prepare anything. A talk/poster would require at least some finished things (e.g. submission ready) before I would feel comfortable spending time on preparing it

FelixErnst (09:56:16): > But if someone else does have the time, please feel free to submit something.

Leo Lahti (09:56:18): > I could have time to prepare something. The short talk format would be quite suitable for a work that is at this stage.

FelixErnst (09:57:09): > That would be fantastic. My guess is that by the time you have prepared something, we would have a quite comprehensive list of which things are missing

Leo Lahti (09:58:18): > Yep exactly. This is just the abstract. And we could submit in the format that is intended for early feedback i.e. short event.

Leo Lahti (09:59:31): > Ok I can have a look. I will let you know then. Quite crucial in fact to gather early feedback from the broader community.

2020-10-24

Sudarshan (03:08:54): > FYI https://github.com/compbiomed/animalcules Based on `se` and `mae`

FelixErnst (04:17:25): > Wow. I didn’t know about this

Charlotte Soneson (10:28:29): > @Charlotte Soneson has joined the channel

Leo Lahti (16:19:42): > yeah that’s cool

2020-10-25

Leo Lahti (06:42:23): > Do you know the reason why `HCBravoLab` has ceased working on `MicrobiomeExperiment`? Were there key obstacles? @hcorrada any tips? Also relevant if we continue developing this, and go to Bioc meeting to discuss the work (as it was initially started by others).

FelixErnst (07:00:41): > That’s an interesting point, since some issues on the HCBravoLab GitHub repo are still quite relevant. From the tests I see that some additional files were present in `inst/extdata` at some point in time, which I can’t find in the repo.

FelixErnst (07:01:28): > Since the fork was started to explore, I wouldn’t mind pushing back to the original repo

Leo Lahti (07:05:39): > i would hesitate submitting this to EuroBioc until we get comments/approval from the original dev

FelixErnst (07:06:06): > FYI: I did some work yesterday and the referenceSeq slot now also accepts `DNAStringSetList`, so that multiple sequences can be added

FelixErnst (07:06:28) (in thread): > I agree

FelixErnst (07:07:58): > So either `DNAStringSetList` or `DNAStringSet` can be used.

FelixErnst (07:11:29): > I also would like to get opinions on the naming style issue. If you have time please head over to https://github.com/FelixErnst/MicrobiomeExperiment/issues/12 and offer an opinion and vote in the poll

Leo Lahti (08:10:04): > nice

Leo Lahti (08:40:11): > How about adding md versions of the vignettes in vignette/ folder of MicrobiomeExperiment. Could be readily browsed in the web then.

Leo Lahti (08:43:14): > .. and the same for mia

FelixErnst (08:58:59): > sure. go for it

hcorrada (09:26:35): > Hello there. No worries on presentation/ownership etc. We stopped because other projects unfortunately took our attention and didn’t find our way back into it.

hcorrada (09:27:29): > So if you are willing, you can take over as leads

hcorrada (09:27:54): > (Which you have for all intents :-) )

FelixErnst (09:34:29): > thx for the clarification

FelixErnst (09:35:05): > However, input is most welcome

Leo Lahti (11:41:15): > Ok thanks @hcorrada - it is just the first thought that comes to mind that you might have abandoned the project because it became obvious that it was too challenging or unnecessary, or something. That would be good to know, good we are not missing anything obvious. Do you think we should also confirm with @HCBravo before proceeding..?

hcorrada (12:24:55): > That’s me :-)

Leo Lahti (12:38:22): > oh I thought you are just colleagues, great :smile:

Leo Lahti (12:44:57): > The main benefits that I can currently see in MicrobiomeExperiment class, compared to phyloseq, are: > 1. Tighter links to other widely used and well-established classes (SE, TSE, etc.); adding robustness and providing access to a vast number of already implemented utilities; also the fact that these classes are already familiar to many can potentially support rapid and wide adoption after the initial groundwork. > 2. Hierarchical structure available for both feature as well as sample space > 3. Improved support for additional feature information (e.g. sequence classes) > 4. More general (covers phyloseq as a special case; this can help with conversions) > -> I wonder if I am still missing some key points.

FelixErnst (12:47:37): > faster than phyloseq

Leo Lahti (12:48:02): > The main shortcomings: > 5. Not yet adopted by the community; tools and examples are missing; adoption will take time and support is not guaranteed (does not necessarily matter that much, if our own teams can benefit from this class anyway) > Some more examples would be needed to demonstrate more clearly the benefits of 1-4. Otherwise it easily looks like an empty exercise. So far we managed to do pretty much all we need with phyloseq. But this is partially because we did not do other things in the lack of good supporting structures..

Leo Lahti (12:48:19) (in thread): > How much. Are there benchmarks somewhere?

FelixErnst (12:49:00) (in thread): > no weird conversion making access times to sample_data and tax_table ridiculous for 70kx2k data

FelixErnst (12:49:02) (in thread): > nope

Leo Lahti (12:49:40) (in thread): > Well it will be easy to demonstrate this kind of thing later. Some benchmarking will inevitably be needed.

FelixErnst (12:50:30) (in thread): > just personal experience. there is some conversion between data.frame and one of the data types, which is incredibly slow

FelixErnst (13:54:32) (in thread): > I don’t seem to be able to reproduce it now. But here is a quickly assembled benchmark script

FelixErnst (13:56:56) (in thread): > some conversions are slower, subsetting functions are slower and use more memory somehow, a bigger phyloseq object shows a weird number of samples. there seems to be some environment jumbling going on, because I cannot explain it otherwise - File (R): bench.R

FelixErnst (13:59:27) (in thread): > lines 106-107 produce the following output and I cannot correct the number of samples shown for the phyloseq object

FelixErnst (13:59:29) (in thread): > > fun_p2() > #> phyloseq-class experiment-level object > #> otu_table() OTU Table: [ 57648 taxa and 1950 samples ] > #> sample_data() Sample Data: [ 26 samples by 8 sample variables ] > #> tax_table() Taxonomy Table: [ 57648 taxa by 7 taxonomic ranks ] > fun_m2() > #> class: MicrobiomeExperiment > #> dim: 57648 1950 > #> metadata(0): > #> assays(1): '' > #> rownames(57648): 549322 522457 ... 200359.2 271582.2 > #> rowData names(7): Kingdom Phylum ... Genus Species > #> colnames(1950): CL3 CC1 ... Even274 Even374 > #> colData names(8): X.SampleID Primer ... Description add > #> reducedDimNames(0): > #> altExpNames(0): > #> rowLinks: NULL > #> rowTree: NULL > #> colLinks: NULL > #> colTree: NULL > #> referenceSeq: NULL >

FelixErnst (14:02:53): > 6. multiple assays

FelixErnst (14:03:03): > 7. sparse matrix support

FelixErnst (14:04:25): > 8. multiomics integration via `MultiAssayExperiment`

FelixErnst (14:07:59): > 9. future improvements in the `SE` classes are directly inherited. also future problems are inherited, but I am a glass half-full kind of person :grin:

Leo Lahti (14:25:20): > Thanks

Leo Lahti (14:26:35) (in thread): > What do you mean by this? We can use MultiAssayExperiment with MicrobiomeExperiment as well as with phyloseq. Do you mean that there is something special in the MicrobiomeExperiment class itself that allows multiple assays?

Leo Lahti (14:27:36) (in thread): > Me too. But this is essentially the same as point 1, isn’t it?

FelixErnst (14:28:22) (in thread): > sure. You can add multiple assays by using assay(me, "someothername") <-

FelixErnst (14:28:46) (in thread): > e.g. the original count matrix is not lost by using relAbundanceCounts

FelixErnst (14:29:48) (in thread): > Yes :expressionless:

FelixErnst (14:31:52) (in thread): > So if you don’t overwrite any data, you can have one single object which contains all the data of your analysis. Also, with altExp() you can store agglomerated data in the same object, and if you, for example, subset to certain samples, this is carried through to the alternative experiments
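
A minimal sketch of the behaviour described in this thread, assuming the Bioconductor package `SingleCellExperiment` (the class that provides `altExp()`) is installed. The toy counts, the assay name `relabundance`, the `Genus` labels and the hand-rolled aggregation with `rowsum()` are illustrative assumptions, not functions or data from `MicrobiomeExperiment`.

```r
library(SingleCellExperiment)

# Toy count matrix: 4 taxa x 3 samples (made-up numbers)
counts <- matrix(c(10,  0,  5,
                   20,  5,  5,
                   30,  5,  0,
                   40, 10, 10),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("taxon", 1:4), paste0("S", 1:3)))

x <- SingleCellExperiment(assays = list(counts = counts))
rowData(x)$Genus <- c("A", "A", "B", "B")

# Add a second assay next to the counts; the original matrix is kept
assay(x, "relabundance") <- sweep(counts, 2, colSums(counts), "/")

# Store genus-level sums as an alternative experiment in the same object
genus_counts <- rowsum(counts, group = rowData(x)$Genus)
altExp(x, "Genus") <- SingleCellExperiment(assays = list(counts = genus_counts))

# Subsetting samples on the main object also subsets the alternative experiment
sub <- x[, c("S1", "S3")]
assayNames(sub)
dim(altExp(sub, "Genus"))
```

Both assays and the genus-level experiment travel together through the subsetting step, which is the "one single object" point made above.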

Leo Lahti (14:57:07) (in thread): > ok great. Isn’t this somewhat inefficient? I mean, transformations are fast, for instance, so it would make sense to store just the information about the transformation function instead of the transformed data?

FelixErnst (14:58:39) (in thread): > If you want to perform a single calculation, I would guess so, but as soon as you use the data multiple times the situation shifts

Leo Lahti (14:59:02) (in thread): > And the other question is when multiple assays would be better handled with assay rather than mae

Leo Lahti (14:59:29) (in thread): > Ok at least if multiple times means a high number.

FelixErnst (15:04:36) (in thread): > I think a general answer is not possible for these considerations. All assays of an SE must have the same dimensions, which is not the case for the MAE. Here the colData can map flexibly to other colData of the individual experiments

Sudarshan (15:34:19) (in thread): > Here is a nice demonstration and alternative regarding speed of phyloseq

Sudarshan (15:36:23) (in thread): > I think the size of data that can be handled with delayed arrays is a plus compared to phyloseq

Leo Lahti (17:16:46) (in thread): > yeah right

Leo Lahti (17:17:06) (in thread): > at least this is somewhat simpler for parallel multi-omics

Leo Lahti (17:17:29) (in thread): > :heavy_check_mark:

Leo Lahti (17:18:55) (in thread): > here -where?

Leo Lahti (17:19:04) (in thread): > the above you mean?

Leo Lahti (17:21:33) (in thread): > not surprising, thinking how much optimization the se class has probably gone through. phyloseq is great in many ways but to my understanding, the resources have been more limited than with the se class

Sudarshan (17:28:03) (in thread): > @Leo Lahti and I keep an updated list here https://microsud.github.io/Tools-Microbiome-Analysis/ Since 2017 this list has grown from 30 to 78 pkgs. There are many more for sure but it is difficult to keep track of all tools :nerd_face: - Attachment (microsud.github.io): List of R tools for microbiome data analysis > A list of R environment based tools for microbiome data exploration, statistical analysis and visualization

Leo Lahti (18:49:22) (in thread): > But this is possible also with phyloseq. Is there a particular benefit in that regard?

Leo Lahti (18:55:58): > Hi all - I drafted very quickly an abstract that we could submit to EuroBioc2020 (DL on Friday): https://docs.google.com/document/d/1AHBhmtAO9wRIRFshsW-aBHDx6h3jQTSspHNKLZUiWso/edit?usp=sharing - File (Google Docs): Bioc / MicrobiomeExperiment / 2020

Leo Lahti (18:58:39): > Now, I am not sure who should be on the author list. I put there now @FelixErnst, @Sudarshan, @hcorrada, and @Leo Lahti. Emails are required, so I would ask everyone to at least add their email to confirm that inclusion on the author list and the submission is OK. And I would like to know if anyone else should be added or if you have comments on the author order.

Leo Lahti (18:58:54): > In addition, feel free to provide comments and suggestions, or even improve the text.

Leo Lahti (18:59:35): > I did not find info on abstract length. Perhaps it does not matter. Let’s keep it short anyway.

2020-10-26

FelixErnst (10:09:23): > @Ruizhu HUANG interested in this as well?

FelixErnst (10:13:11): > I added some half sentences. However, I wasn’t aware that they are not tracked, are they?

FelixErnst (10:13:59): > Otherwise a very nice read. Clear, concise, to the point. Thank you.

Leo Lahti (10:19:36): > They would be tracked if you switch on the “Suggesting” mode from the top right corner

Leo Lahti (10:19:48): > but this is so short that I don’t think it is critical now

Leo Lahti (10:19:55): > i will read it through anyway before submitting

Leo Lahti (10:20:46): > Thanks for the positive feedback.

Ruizhu HUANG (11:14:46): > Thanks, @Leo Lahti and @FelixErnst. It looks very clear and nice. I have nothing new to add, only very minor comments there.

Leo Lahti (11:18:39): > Great!

Sudarshan (12:24:11): > Well written and clear for me!

Leo Lahti (12:25:18): > Perfect

Leo Lahti (12:25:51): > @hcorrada email and approval for the text still needed - and perhaps comments on who we are missing from the author list.

Leo Lahti (12:26:58) (in thread): > Institutional email might look better in abstract although I am not sure if the emails will be public. Probably not. Perhaps it does not matter.

Sudarshan (12:27:36) (in thread): > sorry here is the link https://github.com/mikemc/speedyseq

Sudarshan (12:30:27) (in thread): > changed. For now this is the one with UMC Utrecht but will change in December :sweat_smile:

FelixErnst (12:30:53): > @Domenick Braccia?

hcorrada (12:33:13): > I will look in about 3 hours. Is that ok?

Leo Lahti (12:55:48): > sure, we have 3 more days :slightly_smiling_face:

Leo Lahti (12:56:15) (in thread): > good

Leo Lahti (12:56:26) (in thread): > oh yeah

Leo Lahti (12:56:41) (in thread): > scientists should have universal permanent emails

Domenick Braccia (13:02:29): > Hey all, nice work on the draft. I will provide comments later today. My full name: Domenick James Braccia, my email: dbraccia@umd.edu

Leo Lahti (13:55:27): > This all seems to flow so smoothly

2020-10-27

Leo Lahti (07:14:29): > I would like to submit this tomorrow, so if you still have anything let me know.

FelixErnst (10:40:36) (in thread): > I have seen that also a couple of months ago, but I didn’t have a closer look at it. Do you have any experience with it/ever used it?

Sudarshan (10:46:19) (in thread): > Tested the psmelt function and it is faster than the phyloseq one. For meltAssay, I tried to use dplyr and tibble by default to be faster.

2020-10-28

FelixErnst (04:47:01): > a few tweaks in the draft. let me know, what you think about them

Leo Lahti (04:52:20): > it’s good. I am still waiting for @hcorrada approval (email)

Leo Lahti (16:57:24): > Ok - I assumed that @hcorrada is OK and I just submitted.

Leo Lahti (16:57:40): > I think we have a good case. Let us see how it flies.

hcorrada (20:11:31): > thanks!

2020-10-29

Leo Lahti (08:47:59): > now I realize that in the above discussions @Ruizhu HUANG was also tagged as a potential co-author. I had missed that discussion and no one had added her name in the abstract draft (I asked to add any changes there directly, in order not to miss anything), therefore it is not currently included. I can see if it can be added in the abstract, if I get confirmation from @Ruizhu HUANG whether she would like to be included as a co-author in the MicrobiomeExperiment EuroBioc2020 short talk submission.

FelixErnst (08:48:51): > yes I guess she added herself at one point but it got overwritten?

FelixErnst (08:49:02): > Fiona = Ruizhu Huang

Leo Lahti (08:49:14): > oh, then I do not know why it is not there

FelixErnst (08:49:23): > She is the author of TreeSummarizedExperiment

Leo Lahti (08:49:24): > must have been some mistake

Leo Lahti (08:49:27): > i know

Ruizhu HUANG (08:49:59): > No problem. It should be fine if you have submitted.

FelixErnst (08:51:14): > Hey, ruizhu, let’s get you added :+1: I think Leo can make that happen

Leo Lahti (08:51:25): > ok it seems i can edit the submission, no prob

Ruizhu HUANG (08:51:27): > I hesitated to add my name because I was in high demand on other projects, and not sure how much I could contribute to the work. :joy:

Sudarshan (08:51:51): > Nice!

Leo Lahti (08:51:53): > well I am sure your part is crucial as any contributions to tse class are also contributions to me

Leo Lahti (08:52:07): > (me=ME class)

FelixErnst (08:52:12): > Yeah, but this is not about doing a lot of work (which you already did), it is about getting feedback

Leo Lahti (08:52:29): > i am just thinking how to sort the name ordering

Leo Lahti (08:52:44): > @Ruizhu HUANG as second last author?

FelixErnst (08:52:54): > I would suggest a 20-sided die :grin:

FelixErnst (08:52:58): > Yeah sure

Leo Lahti (08:53:01): > cool

Leo Lahti (08:53:28): > done

Ruizhu HUANG (08:53:46): > I am happy to be put anywhere :blush:, and would be interested to know the feedback

Ruizhu HUANG (08:53:52): > Thank you!

Leo Lahti (08:54:17): > !

Atul Deshpande (10:48:00): > @Atul Deshpande has joined the channel

Vince Carey (13:03:22): > @Vince Carey has joined the channel

Leo Lahti (17:00:20): > Nice, welcome :slightly_smiling_face:

Leo Lahti (18:13:37): > Dear all - > > The project will ultimately benefit from having a clear home. > > We already have the GitHub microbiome organization available. It has not been very actively maintained, so it could be quite easily rewritten and converted into a home for a joint project. Using this readily available organization for our purposes would have at least the following advantages: > 1. using an organization account may facilitate long-term development and maintenance as admin lists can be easily updated and in the end an organization is not really tied to any single individual; using an organization as a project home allows development on personal accounts as well through forking and PRs. > 2. using a non-personal organization account might make it more attractive for community-minded contributors; > 3. the microbiome.github.io organization and its microbiome R pkg already have some visibility in the relevant communities and could reach a substantial user base (but we can update the contributor list so that everyone’s efforts are clear and well acknowledged so that it is a genuine community thing); > 4. we already have 4 teams (FI / BE / NL / UK) who have agreed to combine efforts in developing R tools for microbiome data; if the MicrobiomeExperiment would join and we develop the whole thing as a joint effort of class structure + packages + tutorials + perhaps a blog, then directing these efforts on one landing site where everything is based on the new class might help to gather the necessary momentum. In the end there will be many more independent packages if the project is successful, but I am thinking of the initial set of resources that will help us to bring together the community, get good visibility and break through. > Would be great to hear what you think.

2020-10-30

Sudarshan (03:51:44): > 5. In addition, workshops and summer/spring schools can be organized under one umbrella. Allowing for wider user support and promotion of tools.

Sudarshan (03:52:32): > We have already organised some within the microbiome framework in EU and Asia

Ruizhu HUANG (05:02:13): > Hi @Charlotte Soneson @FelixErnst @Leo Lahti @Sudarshan, > I wonder whether we should include the referenceSeq slot in the TSE, and leave ME to work on structures that are not supported by the current TSE (e.g., MultiOmics).

Leo Lahti (05:04:59): > At least that would be quite a remarkable change to the current one.

Leo Lahti (05:05:47): > By the way, spatial or temporal data is another aspect that could be supported more..

FelixErnst (05:07:54): > > leave ME to work on structures that are not supported by the current TSE (e.g., MultiOmics). > I guess this is already covered in any case through MultiAssayExperiment. TSE and ME are both supported as elements in ExperimentList, so I think that shouldn’t be a problem

FelixErnst (05:16:23): > regarding merging ME and TSE, I think the case could be made that adding the reference slots is a minor addition for a specific field, whereas adding the tree is a major technical one. > > I think maybe we need to discuss scopes of packages to get on the same page about which package is doing what. Currently I see it as follows: > > 1. classes for microbiome analysis > 2. data wrangling > 3. data analysis > 4. data presentation > > Is this something we agree on?

FelixErnst (05:19:04) (in thread): > Anyone?

Ruizhu HUANG (05:23:05) (in thread): > right, I have not had the chance to use MultiAssayExperiment yet. The dimensions of the assays in MAE can differ? So, I am not sure how easy or difficult it is to adapt the current TSE or ME to have the functionality of MAE.

FelixErnst (05:24:17) (in thread): > There is no need for this. Have a look at the vignette or the workshop by Marcel Ramos from Bioc2020. I think those resources can explain it better than I can

Leo Lahti (05:25:19) (in thread): > Do you mean that we would initially have one package for each of these?

Leo Lahti (05:25:58) (in thread): > I can’t contribute much to the discussion before evening due to 2 funding DLs that are looming this afternoon. But after that I will have more time to think of this and other things.

FelixErnst (05:26:35) (in thread): > yeah that could be possible

Ruizhu HUANG (05:26:51) (in thread): > I went through the current code of ME. It seems some functions like agglomerate and merge, which do data wrangling, could go to mia, which works on data analysis?

FelixErnst (05:27:35) (in thread): > for example. I was also thinking about this

FelixErnst (05:28:00) (in thread): > I think the current situation draws some quite blurry lines

FelixErnst (05:28:29) (in thread): > But this is nothing I am worrying about, since we can only make stuff up as we go along.

FelixErnst (05:31:19) (in thread): > @Leo Lahti no problem.

FelixErnst (05:34:46) (in thread): > I think that this is definitely a goal I also have in mind, since I am using those resources. I really don’t care in which namespace the project lives.

FelixErnst (05:37:01) (in thread): > If we get the conversion of a core set of tools done, the next step would be to convert also the resources mentioned above. So if you want to commit to this, you can start converting the tutorials, where possible. This would tie everything together and make it available not only as packages, but as resources for new users

Ruizhu HUANG (05:38:23) (in thread): > Maybe it is worth deciding at this stage whether ME is for data wrangling or for the class?

FelixErnst (05:40:01) (in thread): > The name fits only the class and not the data wrangling part

Ruizhu HUANG (05:55:28) (in thread): > That’s what confuses me. The current ME package seems to work more on the data wrangling part. So, should I send new wrangling functions as PRs to mia and stop updating the ME package, as the class part was planned to go to TSE?

FelixErnst (06:44:37) (in thread): > always these hard questions :grin:

FelixErnst (06:45:07) (in thread): > I would tend to say mia

FelixErnst (06:45:28) (in thread): > but I am really not sure since this is quite fresh on my mind

Leo Lahti (10:41:14) (in thread): > Yes exactly.

Leo Lahti (11:01:03) (in thread): > I agree very much that ME would be dedicated to the class part only.

Sudarshan (13:41:18) (in thread): > Should we all (as many as possible :sweat_smile:) have a virtual brainstorming session? Or we can also meet after the EU BioC; then @Leo Lahti can share with us the input he gets and we brainstorm together :slightly_smiling_face:

Leo Lahti (14:24:46) (in thread): > or perhaps both and

Leo Lahti (14:27:30) (in thread): > It is starting to seem to me that the best way to go about this could be to start drafting a shared online resource showing how the classes and tools can be used. I briefly chatted about this also with @FelixErnst - the exact decisions on how to organize material in packages etc can and will evolve over time but decisions are easier to make when we have real code. Converting microbiome & microbiomeutilities tutorials to use the new class can be one thing to do. That could already provide many examples.

Sudarshan (14:36:46) (in thread): > This I agree with completely. meltAssay was kind of the same: we started with an example and then converted it to code. This is useful especially for me as I am not a trained bioinformatician and can contribute more from an end-user perspective :slightly_smiling_face:

Leo Lahti (14:52:47) (in thread): > well often that helps..

2020-10-31

Leo Lahti (19:30:20): > Actually the 4 points above are good in a sense but there are many specific tools and subtopics. For instance, time series might require their own packages for most points. We can just collect stuff in packages to start with, but in the end it might be sensible to aim at a set of base packages that provide just certain fundamentals.

Leo Lahti (19:35:59) (in thread): > That’s also an option, if TSE would clearly benefit from the referenceSeq slot. Then again, I think MultiOmics might already be handled well by the option for having multiple assays in TSE, or by using MAE. I am not sure what would be left for ME specifically. Time series is one aspect that might need more support. Or there could be some other customization on top of TSE if referenceSeq would go there.. have to think what that might be..

2020-11-02

Domenick Braccia (09:56:00): > @Leo Lahti I am up for having a virtual brainstorming session post EU BioC to discuss feedback and future directions!

Charlotte Soneson (09:58:29): > Just a note in case you want potentially broader input: EuroBioc considers suggestions for birds-of-a-feather/discussion sessions on specific topics (also submitted through OpenReview).

Leo Lahti (11:16:19): > @Domenick Braccia - great - let’s do!

Leo Lahti (11:16:38): > @Charlotte Soneson - great - but I think the submission DL closed yesterday?

Charlotte Soneson (11:17:22): > It’s extended until the 16th

Leo Lahti (11:35:05): > oh it was that much - I thought it was this Monday :slightly_smiling_face: - then we should have a look

Leo Lahti (11:35:30): > perhaps next week - also other things may get fwd by that. I will come back to it then.

Leo Lahti (16:21:22): > We could convert example data sets from here to the new class: https://github.com/twbattaglia/MicrobeDS

Leo Lahti (16:22:15): > would be helpful as many examples and case studies for these are available already. We have some additional ones, too.

Leo Lahti (16:24:12): > We could do that conversion and PR but it might be best to wait until we have a better consensus on the final structure of MicrobiomeExperiment - this is a chicken-and-egg problem; it is easier to proceed also with testing once we can fix the first serious version.

Leo Lahti (16:26:04): > @Ruizhu HUANG - did you have any specific thoughts on supporting spatial (e.g. geography) or temporal (e.g. time series) variation in (Tree)SummarizedExperiment? I have not worked on single cell data, not sure what the developments are on that side.

2020-11-03

FelixErnst (03:37:14): > Usually this works by setting up a specific factor in the colData()

FelixErnst (03:38:11): > Then you can reference this as a function argument and the geographic or temporal information is taken into account that way

FelixErnst (03:38:56): > So for this we don’t need to change anything IMHO, since SummarizedExperiment supports this type of data storage out of the box

FelixErnst (04:48:45) (in thread): > This would be best suited to end up in ExperimentHub. scRNAseq is a similar data supplying package
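
A minimal sketch of the `colData()` approach described above, assuming the `SummarizedExperiment` package is installed. The column names `timepoint` and `site`, their levels and the toy counts are hypothetical and only illustrate how a factor stored with the samples can drive selection or grouping.

```r
library(SummarizedExperiment)

# Toy count matrix: 3 taxa x 4 samples
counts <- matrix(1:12, nrow = 3,
                 dimnames = list(paste0("taxon", 1:3), paste0("S", 1:4)))

# Hypothetical sample annotation: a time point and a sampling site per sample
col_data <- DataFrame(
  timepoint = factor(c("day0", "day0", "day7", "day7")),
  site      = factor(c("gut", "skin", "gut", "skin")),
  row.names = colnames(counts)
)

se <- SummarizedExperiment(assays = list(counts = counts), colData = col_data)

# The factor can be used to select samples ...
day7 <- se[, se$timepoint == "day7"]

# ... or to group them, here summing the counts per time point
per_timepoint <- t(rowsum(t(assay(se, "counts")), group = se$timepoint))
```

A function that accepts the name of such a column as an argument can then take the temporal or geographic structure into account without any change to the class itself.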

Leo Lahti (07:07:47): > Yes this is true but at same time, there are dedicated R classes for time series and spatial data. Well, we can think about it again if clear needs will pop up.

FelixErnst (07:08:29): > If they are required, they could be created on the fly and used internally

FelixErnst (07:09:00): > But an example workflow would bring clarity to this

FelixErnst (07:10:24): > https://github.com/fionarhuang/TreeSummarizedExperiment

Leo Lahti (07:18:01) (in thread): > right

Leo Lahti (07:18:28) (in thread): > We can prepare something on that.

Leo Lahti (07:19:09): > good

Leo Lahti (07:21:25): > Then @Ruizhu HUANG already proposed to put the sequence slot in TSE - if that is done, do we have any need for ME?

Leo Lahti (07:22:36): > Not sure, though, what @Ruizhu HUANG’s latest take on this is

FelixErnst (07:23:10): > I have prepared a PR for this, which is currently open.

Leo Lahti (07:23:32): > to move it to TSE?

FelixErnst (07:24:00): > My current thinking is, that we can use ME as a wrapper package pulling both TSE and mia and any additional package for microbiome analysis

FelixErnst (07:24:38): > Bioc frowns on this a bit, but at least there is only one point of entry.

FelixErnst (07:25:13) (in thread): > yes. Changes to mia are also in progress to fit the separation described above

Leo Lahti (07:26:02): > Do you mean that mia would contain microbiome specific methods for the TSE class?

FelixErnst (07:26:09): > Yes

Leo Lahti (07:26:42): > sorry I mean ME would contain microbiome specific methods for the TSE class

FelixErnst (07:26:50): > For now, the ME class is still active, since I didn’t want to let the name go just yet without discussing it in the PR

Leo Lahti (07:30:32): > There is a clear need for domain specific methods even if the class is the same. Interesting, I have not seen this earlier. Typically classes are quite domain specific.

FelixErnst (07:32:18): > I am not sure, what you mean with this

FelixErnst (07:33:58): > Based on this > 1. classes for microbiome analysis > 2. data wrangling > 3. data analysis > 4. data presentation > > it becomes clear that 1. aka TSE contains the class, 2. and 3. are the domain of mia and 4. is something else

FelixErnst (07:36:17): > without the functions for data wrangling, the ME package becomes very empty. it would have been only the class and this is just very thin. In TSE it is closer to its origin and with one package you get both

Leo Lahti (07:42:00): > Yep.

Leo Lahti (07:42:03): > What is the difference between ME and mia then?

Leo Lahti (07:42:05): > The methods in ME would be somehow more fundamental?

FelixErnst (07:49:02): > The ME package is a wrapper, a gateway for getting into microbiome analysis, but the technical things should go into mia

FelixErnst (07:50:06): > If we need a playing field, ME can be that as well

FelixErnst (07:51:18): > But if it is clearly fitting to one of the four points above, IMO, it can go into the package with the matching scope

Leo Lahti (07:52:22): > Ok I am not sure if I get it yet but we will see.

FelixErnst (08:01:22): > I updated the README of ME to summarize this. If anything is still unclear, let’s write it down right away in the README (PRs for this and other stuff welcome)

Leo Lahti (08:03:58): > Perfect.

FelixErnst (08:24:40): > https://github.com/microbiome/miaViz

Ruizhu HUANG (09:09:03) (in thread): > I think there is a SpatialExperiment under active development, see #spatialexperiment. I have not really followed how it is going …

2020-11-05

Leo Lahti (04:50:49): > Hi @FelixErnst - I saw the latest additions topTaxa, getAbundanceSample, getAbundanceFeature etc

FelixErnst (04:51:24): > Yeah, I posted a review 5 min ago

Leo Lahti (04:51:30): > which brings me to think that we already have many of the same in the microbiome R package with different names (top_taxa, abundances etc.).

Leo Lahti (04:52:54): > We probably can’t immediately change all that even as we start supporting the ME class. This means that there will be many parallel naming schemes.

Leo Lahti (04:54:09): > The package is already in wide use. Yes, it is one thing to do class methods and another thing to provide additional analysis functions, and not a huge problem to have parallel names (or aliases). But I thought it is good to note this.

FelixErnst (04:59:57): > Sure. I think these are the normal growing pains we will encounter. It might be an idea to keep track of these overlaps via a table in the book (column names: phyloseq functions and MicrobiomeExperiment function). This way the switch would be easier for users of phyloseq and microbiome. We could also add a top_taxa wrapper and mark it as defunct, so that it can be a smooth transition. All these incongruences are worth it IMHO, since I am missing the verb in some of these functions.

Leo Lahti (05:01:06): > Table is a good idea. I am not sure if we are going to abandon microbiome. At least for now I am mainly thinking of adding ME support to start with.

FelixErnst (05:01:28): > That might be quite a personal viewpoint, but I think it is good style to differentiate accessors and functions this way. It makes the code more readable
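
A minimal sketch of such a transition wrapper in base R. The names `topTaxa` and `top_taxa` come from the discussion above, but both function bodies here are hypothetical stand-ins that work on a plain count matrix, not the code of `mia` or `microbiome`. Note that in base R `.Defunct()` stops with an error while `.Deprecated()` only warns, so a wrapper meant to keep old code running would presumably start with the latter; that choice is an assumption of this sketch.

```r
# Hypothetical new-style function: names of the n most abundant taxa
# (rows) of a count matrix, ranked by their total count
topTaxa <- function(x, n = 5L) {
  totals <- sort(rowSums(x), decreasing = TRUE)
  names(totals)[seq_len(min(n, length(totals)))]
}

# Old-style name kept as a thin wrapper that warns and forwards the call
top_taxa <- function(x, n = 5L) {
  .Deprecated("topTaxa")
  topTaxa(x, n = n)
}

# Toy data: 3 taxa x 2 samples
counts <- matrix(c( 1,  2,
                   50, 60,
                    7,  8),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("taxonA", "taxonB", "taxonC"), c("S1", "S2")))

# Old code keeps working; the deprecation warning is silenced here for brevity
res <- suppressWarnings(top_taxa(counts, n = 2))
```

A mapping table of old and new names, as suggested above, could be generated from the same list of wrappers.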

Leo Lahti (05:02:45): > That is true too.

FelixErnst (05:07:44): > That’s something you have to decide. I don’t have a clear grasp of how much work that is. mia has the benefit that it is a clean slate, whereas microbiome has a user base and is tried and tested. Personally I see PRs into mia, transferring experience you have gained with microbiome, as easier and less dangerous (microbiome is here to stay for the phyloseq world, and maybe the MicrobiomeExperiment will be a non-starter; in that case you wouldn’t have spent too much time on adapting microbiome to something which isn’t)

Leo Lahti (05:12:44): > Yes I agree in many ways. At the same time, phyloseq is a special case of ME. It should be straightforward to make microbiome interpret ME objects as well and even gradually evolve the package towards ME. Will need to think about it. I see the problems with this, too, just not sure if they are really that remarkable in the end.

Leo Lahti (05:15:01): > What is clear is that it does not make sense to have too much overlap.

2020-11-08

Leo Lahti (05:23:51): > We are planning to next add standard transformation functions used in microbiome research from the microbiome pkg to mia (clr, z, log10, log10p, compositional). Any comments meanwhile are welcome. The plan is to add a given transformation to the list of assay objects when the transformation is called.

FelixErnst (05:39:18): > Can you describe the functions in more detail or provide links to a description?

FelixErnst (05:43:03): > Since scuttle already has a normalizeCounts function, we might be able to use this here as well
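
A minimal base R sketch of the five transformations named above, using their common textbook definitions. The function name `transform_counts`, the pseudocount of 1 in the clr branch, the per-taxon direction of the z-transform and the plain named list standing in for the assay list are all assumptions of this sketch; the implementations in `microbiome` or `mia` may differ in such details.

```r
# Taxa in rows, samples in columns, as in a SummarizedExperiment assay
transform_counts <- function(x,
                             method = c("compositional", "log10", "log10p", "clr", "z"),
                             pseudocount = 1) {
  method <- match.arg(method)
  switch(method,
    # relative abundance: every sample (column) sums to 1
    compositional = sweep(x, 2, colSums(x), "/"),
    # plain log10; zero counts become -Inf
    log10 = log10(x),
    # log10(1 + x), which keeps zero counts at zero
    log10p = log10(x + 1),
    # centered log-ratio per sample, with a pseudocount to avoid log(0)
    clr = {
      logx <- log(x + pseudocount)
      sweep(logx, 2, colMeans(logx), "-")
    },
    # z-transform per taxon (row): subtract the row mean, divide by the row sd
    z = (x - rowMeans(x)) / apply(x, 1, sd)
  )
}

# Toy data: 3 taxa x 3 samples
counts <- matrix(c(10,  0,  5,
                   20,  5,  5,
                   70, 15, 10),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("taxon", 1:3), paste0("S", 1:3)))

# Stand-in for the assay list: each transformation is stored under its own name
assays <- list(counts = counts)
for (m in c("compositional", "clr")) {
  assays[[m]] <- transform_counts(counts, m)
}
names(assays)
```

Storing each result under the name of its transformation keeps the original counts untouched, in line with the plan to add the transformed data as a further assay.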

Leo Lahti (06:10:10): > I will have a look at that first.

Leo Lahti (06:48:54): > Shall we discuss here the location of the main organization & branch? Regarding mia and related packages, and MiaBook.

Leo Lahti (06:50:51): > The difference to a standard pkg development project is that we aim to create a set of interoperable core packages and a related gitbook, which may then support many other independent contributions.

FelixErnst (06:51:55): > I think this is best done in issues and PRs

Leo Lahti (06:52:19): > Also fine.

FelixErnst (06:52:31): > No copying of links and redirection required

Leo Lahti (06:53:22): > Yes at least for the more technical parts of the discussion that is better suited.

Leo Lahti (10:26:44) (in thread): > In addition to data wrangling, analysis and visualization, I think there would be use for packages that are more dedicated to ME time series or data simulation. We are currently working on both topics.

2020-11-12

Leo Lahti (13:08:22) (in thread): > I do not find any mention of this on the EuroBioc2020 website. But I guess a short (30-45min) or long workshop (90-120min) could be designed this way?

Leo Lahti (13:10:01) (in thread): > Will need to think if this would fit the MicrobiomeExperiment work. If others have any opinions it would be great to hear. It should be a facilitated session with some hands-on activities. But at this moment the MicrobiomeExperiment project is at the construction stage, and somehow I feel it may not be ready for a workshop that could be used to throw in new developers. The time would be pretty short for getting them on board, and this Slack channel is already available for anyone.

Leo Lahti (13:15:02) (in thread): > If we had a workshop, it could be 1) practical introduction to and testing of the new tools in ME; or 2) guidance for setting up the package and gitbook repositories to get started as a developer; or 3) collection of pros & cons & suggestions in smaller subteams, with a wrap-up in the end.

Leo Lahti (13:16:43) (in thread): > We could plan something on this but I will be active mainly if we are somehow collectively reaching the consensus that it would be useful at this point. Another option is to now have just the short talk, and then next year a longer workshop.

Charlotte Soneson (13:44:22) (in thread): > Sorry, it seems this didn’t actually make it to the website. There is a special submission category called ‘discussion session/birds-of-a-feather’ (it’s mentioned in the announcement of the deadline extension, and on OpenReview). It’s a quite open format: > > Birds-of-a-Feather/Discussion: flexible format that may be proposed in the submission or to be determined, bringing together interested people for interactive discussion >

Leo Lahti (14:37:26) (in thread): > Ahaa ok thanks. Do you think we could suggest both short talk and this one?

Charlotte Soneson (14:37:59) (in thread): > Yes, sure.

2020-11-14

Leo Lahti (11:25:21): > Dear all - I just submitted additionally a birds-of-a-feather session with a slightly modified abstract. The DL is already on Monday and in my understanding there are no real objections, so I just submitted. It can still be edited until the Monday DL so if you have any comments on the submission feel free to drop a line. I assume that all authors received a copy in their emails.

2020-11-22

Tuomas Borman (16:16:53): > @Tuomas Borman has joined the channel

Leo Lahti (16:42:29): > May I introduce @Tuomas Borman - he is working with us at the University of Turku, Finland, and contributing to the development of the ME ecosystem. We have started to migrate some functions from the microbiome pkg to mia, hopefully soon the first PRs.

2020-11-23

Tuomas Borman (07:40:47): > Hello everyone! Thanks Leo for introducing me. It is nice to contribute to this interesting project

2020-11-24

Leo Lahti (08:50:07): > Thank you for submitting an abstract for EuroBioC2020. We are pleased to inform you that your submission has been accepted for a birds-of-a-feather session (1 h).

Leo Lahti (08:50:13): > https://openreview.net/forum?id=_Usnt8Uy5lk&noteId=GGS04Er5CW6 - Attachment (openreview.net): OpenReview > Promoting openness in scientific communication and the peer-review process

Domenick Braccia (08:50:48): > Great news! thanks for sharing that, @Leo Lahti

Leo Lahti (08:57:01): > I will come back to it later as I have time to start planning in more detail.

Leo Lahti (08:57:29): > Those of you who would like to join should register for the eurobioc conf asap. I guess it would be useful to have some of us there, not just me.

Domenick Braccia (09:02:20): > I’ve just registered for EuroBioc

Leo Lahti (19:05:49): > “In addition to the talk, we also encourage all presenters to present a poster at the conference. Note that the format is flexible (it does not have to be a ‘traditional’ poster, see https://eurobioc2020.bioconductor.org/abstracts). If you would like to present a poster, please email the organizing committee at eurobioc2020@stat.unipd.it no later than Dec. 7.” - Attachment (eurobioc2020.bioconductor.org): EuroBioc2020 > European Bioconductor Virtual Meeting 14-18 December 2020

Leo Lahti (19:06:35): > -> At least some simple but nice poster should be easy to set up, I think it is worth it and planning to email the organizers but feel free to comment.

2020-11-25

FelixErnst (04:26:50): > That sounds really great.

FelixErnst (04:28:22): > What is your plan for collaboration on this and what kind of contribution would you like to have?

Leo Lahti (04:39:33): > This is shaping up, and suggestions from everyone are welcome. I won’t have time to focus on this before the weekend but in general, I guess the talk + birds-of-a-feather + poster will form one coherent package with the same set of authors; and we should design this so that it will spread the word about the initiative and gather critical feedback. Re: contribution - for instance commenting/augmenting presentation material, and volunteering to help in the birds-of-a-feather session (active participation is already a form of support I guess).

FelixErnst (04:41:29): > That sounds good. My suggestion, would be that you define a structure of the talk (10 min? for birds-of-a-feather) and split sections for individual contribution. Would that work for you/everyone?

FelixErnst (04:41:54): > I will also be there, so participation is a given from my side

Leo Lahti (04:44:50): > sounds feasible

Sudarshan (14:51:59): > I will not be able to actively contribute to this in the coming weeks because I am in India fulfilling personal commitments.

Leo Lahti (15:01:36): > noprob, this will be a thing for the next decade:slightly_smiling_face:

Domenick Braccia (15:13:39): > I have time to prepare the poster

Domenick Braccia (15:15:01): > I’ll send a sketch of one over by this Friday, probably end of work day US time

Leo Lahti (16:27:38): > Ok that’s great!

Leo Lahti (16:28:02): > we are not in a real hurry with that (yet) but better earlier than later

Domenick Braccia (17:01:11): > Are there any institution logos that you all would like on the poster? @Leo Lahti @FelixErnst

Aaron Lun (17:10:06): > @Aaron Lun has left the channel

2020-12-02

FelixErnst (09:01:34): > Not sure

FelixErnst (09:01:39): > but I guess yes

2020-12-04

FelixErnst (17:44:52): > mia now contains several new functions: splitByRanks, unsplitByRanks, taxonomicTree and addTaxonomicTree

FelixErnst (17:45:36): > In addition, plotRowTree is also now a bit more mature in miaViz

FelixErnst (17:45:40): > https://microbiome.github.io/miaViz/reference/plotTree.html - Attachment (microbiome.github.io): Plotting tree information enriched with information — plotTree > Based on the stored data in a TreeSummarizedExperiment a tree can be plotted. From the rowData, the assays as well as the colData information can be taken for enriching the tree plots with additional information.

2020-12-05

FelixErnst (13:24:28): > I removed the breakaway dependency from mia until it is clear if and how breakaway will head to CRAN or Bioconductor

Leo Lahti (13:56:59): > well at least this is necessary if mia is to be submitted to Bioconductor.

FelixErnst (13:57:31): > and that's the goal without a doubt

Leo Lahti (14:44:35): > yeah

2020-12-07

Leo Lahti (10:02:43): > We also recommend uploading your slides and/or poster to the F1000 Bioconductor gateway (https://f1000research.com/gateways/bioconductor/about-this-gateway). This is a great opportunity to get credit for your work and increase its visibility and impact as your work will be assigned a DOI and be hosted permanently on the F1000 Bioconductor gateway. Make sure to indicate EuroBioC2020 in the gateway area and European Bioconductor Meeting in the conference field. @Domenick Braccia also - Attachment (f1000research.com): About Bioconductor | Gateways | F1000Research > Read the latest peer reviewed Bioconductor articles and more on F1000Research

2020-12-10

Domenick Braccia (20:25:14): > Hey everyone, I made a mockup figure for MicrobiomeExperiment. I borrowed the format from the original SummarizedExperiment figure as well as @Ruizhu HUANG’s figure for TreeSE. I am planning to use this in the poster presentation, so let me know if you have any suggestions. - File (PNG): MBExp.png

2020-12-11

Leo Lahti (03:08:35): > Nice! Could you perhaps highlight how it differs from TreeSE? It is possible that @Ruizhu HUANG is including this in TreeSE and ME would become obsolete. But I think this can be discussed in the meeting because it is a relevant and more generic question whether TreeSE will be a sufficient basis for a new R microbiome ecosystem.

Ruizhu HUANG (03:18:50): > Nice figure! Agree with Leo. I think it is good to get some feedback from the meeting. We will also do some revisions/updates for TSE based on the reviewers’ comments on the TSE article in F1000. So, it would also be a good time to add new slots in TSE if TSE is considered to work as a sufficient basis.

FelixErnst (03:21:18): > I agree, nice figure! I include these types of figures in the PR adding ME to the TSE. https://github.com/fionarhuang/TreeSummarizedExperiment/pull/43/files

FelixErnst (03:24:35): > However, after the discussion on the issue on the TSE repo, it remains open whether the refSeq slot gets directly added to TSE. Currently there seems to be a tendency in favour of it, so as not to add another *SE class to the Bioc-Universe

FelixErnst (03:25:29) (in thread): > TSE does support col and row tree. Also the tree tips do not have to be in a 1:1 relationship with the rows

FelixErnst (03:27:28) (in thread): > Have a look at https://microbiome.github.io/MiaBook/taxonomic-information.html#generate-a-taxonomic-tree-on-the-fly to see what the result of this can look like - Attachment (microbiome.github.io): Chapter 3 Taxonomic information | Microbiome Analysis > Chapter 3 Taxonomic information | Microbiome Analysis

FelixErnst (03:28:48) (in thread): > There the tree contains all the branches which can be distinguished by the taxonomic information, but each tip is linked to multiple rows on average (1645 <-> 19216)
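The many-to-one link described in this thread (several rows pointing at the same tree tip) can be sketched as follows. This is a minimal illustration in Python used as neutral pseudocode, not the TSE implementation: the row names, the taxon labels and the helper `link_rows_to_tips` are all hypothetical, and the tip of a row is assumed to be simply its taxonomic label.

```python
from collections import defaultdict

# Hypothetical rows (features), each carrying the taxonomic label that
# is assumed here to identify its tip in the taxonomic tree.
row_taxonomy = {
    "ASV1": "Bacteroides",
    "ASV2": "Bacteroides",
    "ASV3": "Prevotella",
    "ASV4": "Faecalibacterium",
    "ASV5": "Prevotella",
}


def link_rows_to_tips(row_taxonomy):
    """Group rows by the tree tip they map to (many rows per tip allowed)."""
    tip_to_rows = defaultdict(list)
    for row, tip in row_taxonomy.items():
        tip_to_rows[tip].append(row)
    return dict(tip_to_rows)


tip_to_rows = link_rows_to_tips(row_taxonomy)
n_tips = len(tip_to_rows)
n_rows = len(row_taxonomy)

# Fewer tips than rows: each tip is linked to more than one row on average.
print(f"{n_tips} tips <-> {n_rows} rows")
```

The point of the sketch is only that the tree can be much smaller than the row dimension, as in the 1645 <-> 19216 case mentioned above.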

Leo Lahti (04:11:24) (in thread): > Sounds great! Would be more clear to include it before formal publication when there is a chance.

Leo Lahti (04:15:25): > Yes. Anyway it will be good to hear feedback. If TreeSE with this will serve all the purposes, that will simplify everything.

Leo Lahti (04:17:44): > Btw - I aim to work on the Birds-of-a-feather plan for next week EuroBioC today (few hours from now) - I will post here a suggestion later today and you can see if/how to support (all optional).

Alan O’C (04:51:38): > @Alan O’C has left the channel

Domenick Braccia (07:58:21): > Thank you for the input everyone. I can change the (mbe) to (tse) in the figure and/or mention orally during the poster that the extra slots and functions one would utilize for microbiome data are already supported by the tse object

Domenick Braccia (07:59:02): > @FelixErnst thanks for pointing me to the book chapter. I’ll be sure to picture all of the possible ways to format the tree

Leo Lahti (08:03:15): > I am wondering if we should keep the mbe name for now because this creates a contrast to the current tse that might facilitate discussion. Even if the conclusion then is that mbe will not be needed in the end?

Leo Lahti (08:03:31): > Anyway, no strong opinions.

Domenick Braccia (09:17:54): > yes, that was my thinking with writing it as mbe… but it is a very minor detail

Leo Lahti (09:32:01): > Ok hi all - well to get started I drafted a brief plan on birds-of-a-feather session here:

Leo Lahti (09:32:02): > https://docs.google.com/document/d/1AHBhmtAO9wRIRFshsW-aBHDx6h3jQTSspHNKLZUiWso/edit - File (Google Docs): Bioc / MicrobiomeExperiment / 2020

Leo Lahti (09:32:30): > Most important is now that we list there which of us will be able to join, and what to cover.

Leo Lahti (09:33:35): > We have 1 hour and we can split to smaller teams for group work (e.g. class + packages + book) or we can just keep all in the same session and go through each topic together. Perhaps it also depends on the number of participants.

Leo Lahti (09:36:09): > As the outcome, I think we would aim 1) to discuss whether TSE is a sufficient basis for the ecosystem or whether we need a separate ME class; 2) to have a stronger (idea of a) roadmap for the development and how the work and community should be organized (or if the current model is good).

Leo Lahti (09:39:29): > If we do not split into smaller groups, then I can chair the whole session if that’s easier but it would still help tremendously if all of us prepare and actively participate in the discussion so that it keeps going.

Domenick Braccia (09:44:35): > I feel that one hour is maybe too short to break into groups and then try and re-convene and wrap things up cohesively.

Domenick Braccia (09:45:34): > The drawback of keeping it all together and asking for feedback and planning is that it is not helpful to beginners, but that can be addressed in the very beginning so that people know what they are getting into

Domenick Braccia (09:46:32): > It seems like we need a high level, honest discussion about the state of microbiome analysis in R, which will inevitably be geared towards more experienced users. But I think that is OK

Leo Lahti (09:48:15): > Yes I was thinking the same. So let us keep it all together. I think we can do it as one session, it is also a matter of coordination.

Leo Lahti (09:48:57): > I think it is ok. If there are less experienced users, it may still be rewarding for them to participate. And I am not sure how we could approach this otherwise.

Leo Lahti (09:50:47): > Then the structure Intro -> Content -> Conclusion is probably fine. But I am thinking of the “Content” part. That could be like 10 min per topic (class -> R & data pkgs -> book/documentation -> community management), summarizing our current suggestion and then just gathering any feedback that may arise.

Leo Lahti (09:51:21): > Ok with other suggestions as well, if you have any.

Domenick Braccia (09:52:23): > I like this organization

Leo Lahti (09:56:05): > We could potentially split some of the “Content” part if any of you is enthusiastic about taking the lead on discussing one of those aspects. Or I can do that for all aspects and just wishing to have active support from everyone for the discussions. I think we should have 1-3 summary slides per discussion topic just to get it going. Perhaps an interactive slide or document that can be filled while we are talking.

Leo Lahti (09:57:46): > If I coordinate the overall event, it could help if someone could be writing down the feedback points that arise. Indeed I think that should be a shared document that can be filled in also by others. But someone should take the lead in ensuring that the relevant points are written down.

Domenick Braccia (09:58:13): > I can take this role

Leo Lahti (09:58:18): > Awesome.

Leo Lahti (09:58:36): > I mark this in the plan, thanks so much.

Leo Lahti (09:59:53): > Is @Ruizhu HUANG going to join this session? It would be the most natural thing if she wants to give a very short overview of the TSE class (like 5 min or so)? But I can give it as well, no problem.

Ruizhu HUANG (10:30:40): > I did not register for the Bioconductor conference this year. It would be great if you @Leo Lahti could help to give a short overview.

FelixErnst (10:31:38): > So back from my meeting

FelixErnst (10:32:13): > I can also take some of the responsibilities, like the introduction of the TSE structure

Ruizhu HUANG (10:36:15): > Yeah, that would be great!

FelixErnst (10:36:35): > Looking at the split of the content section, which I really like, I would maybe do the TSE and mia section, because that leads naturally from one to the other

Leo Lahti (11:49:21) (in thread): > Sure!

Leo Lahti (11:49:40) (in thread): > This would be good. I think you are a bit better equipped to do that.

Leo Lahti (11:50:27): > Ok, if you do TSE and mia and then I do Book + Community? And Domenick collects notes. I can also handle intro + conclusion part.

Leo Lahti (11:50:53): > This is rather clear then. I finalize this over the weekend and prepare some supporting slides.

2020-12-12

Huipeng Li (00:39:34): > @Huipeng Li has joined the channel

Domenick Braccia (22:44:28): > Hi all - I have a draft of the poster for next week’s conference. Please check it out and suggest comments as there is still time to include / exclude / change anything on there currently. Thanks everyone for the feedback advice with making this! - File (PNG): MBExp_poster_20201212.png

2020-12-13

FelixErnst (04:30:54) (in thread): > the package is named scater not scatter

Sudarshan (07:53:34) (in thread): > Cool layout and content! > Suggestion: “Inputs are OTU/ASV count tables with or without taxonomic classification with accompanying sample information.”

Sudarshan (07:55:50) (in thread): > “We propose a new family of …”

FelixErnst (08:54:17) (in thread): > Regarding TSE I think the phrasing “We propose that the recent addition of TreeSummarizedExperiment as new member of the SE family be used in concert with a new suite of tools ….”

Leo Lahti (12:47:41) (in thread): > My name: “Lahti” not “Lathti”

Leo Lahti (12:53:00) (in thread): > I think this is very clear and nice. One thing that we could potentially add is a link to microbiome.github.io as the URL where we can gather all relevant info & links (classes, mia & other pkgs, MiaBook..). Unfortunately it is not updated yet but I was planning to do this on Monday so that it is ready before the poster session. I will also polish the page in the same go, add all co-authors etc. I think it would be beneficial to have a project landing page.

Kelly Eckenrode (13:41:49): > @Kelly Eckenrode has joined the channel

hcorrada (15:26:09): > Hello all, I write with regards to the metagenomeFeatures package (https://bioconductor.org/packages/release/bioc/html/metagenomeFeatures.html). Our group did this as a proof of concept of how to regularize metagenomic feature annotation inspired by GenomicFeatures and the various TxDB packages used for RNA-seq analysis (for example). We have not found much adoption of the idea, and are leaning towards deprecating the package. I wanted to reach out here however before doing so to see if there is interest in keeping it around and if it fits the current strategy you are all pursuing. Thanks in advance!! - Attachment (Bioconductor): metagenomeFeatures > metagenomeFeatures was developed for use in exploring the taxonomic annotations for a marker-gene metagenomic sequence dataset. The package can be used to explore the taxonomic composition of a marker-gene database or annotated sequences from a marker-gene metagenome experiment.

Leo Lahti (15:30:43): > Hmm.. right. Do you see how it could be best used in the new context?

Leo Lahti (15:35:30): > (I should delve into it in more detail before I could assess how this could be used to complement the ME)

hcorrada (15:41:36): > The most straightforward use is to provide a regular interface to annotation databases to extract taxonomy tables, and marker gene sequences. I see those are part of the ME object, so the use case would be to extract this info through metagenomeFeatures and pass to ME upon construction.

FelixErnst (15:42:33): > I have to admit, that I currently only have hands on experience with taxonomic annotation

FelixErnst (15:45:49): > Having said that, I think that MetaGenomicFeatures, or as you named it metagenomeFeatures, has a place in the landscape for providing aggregation/conversion from annotation data to GRanges type annotation, which can be easily put into a RSE. Am I correct that this is basically what you have in mind with “regular interface to annotation databases to extract taxonomy tables, and marker gene sequences”?

hcorrada (15:50:41): > This is only for taxonomic annotation, so not a GRanges. This could provide annotated rowData in an SE containing the tax table, reference sequence (a DNAStringSet), etc. from the specific versioned annotation packages (e.g., https://bioconductor.org/packages/release/data/annotation/html/silva128.1MgDb.html) - Attachment (Bioconductor): silva128.1MgDb > Metagenome annotation package with for the SILVA SSR rRNA database release 128.1, Bacterial and Archeal sequences. Contains a MgDb-class object, defined in the metagenomeFeatures package.

FelixErnst (16:04:52): > I think the idea is still valid, especially for workflows taking into account this type of data. The problem regarding the lack of adoption is in my mind due to the fact that every tool matching samples, aka reads, to annotation has brought its own input format for the annotation data. For example dada2: https://benjjneb.github.io/dada2/training.html. So in this regard it doesn’t fill a big void. I think that if it could offer more automatic tools for creating an annotation database (just point at the Silva, RDP or gg file and get an rsqlite), it might find better adoption

FelixErnst (16:06:11): > The silva, rdp and gg releases all have a different standard and if this conversion could be automated into a standard like you have with metagenomeFeatures it might be worth the effort.

FelixErnst (16:12:12): > This would solve the problem of having to rely on other people generating the database you want to use for tool X and metagenomeFeatures would be the central converter, which would bring a unified setter/getter system with an underlying database. So in the current state it does not bridge this gap, or at least I haven’t found it in the 10 minutes I looked for it. The data seems to be there, but the accessors are not

FelixErnst (16:14:00): > Maybe we can have a call on this, since a discussion would be helpful and might make sense for discussing requirements. I can only offer my point of view, but there may be alternatives out there, which can offer more perspectives on the decision you outlined above

hcorrada (16:17:28): > Sure, that would be good. When do you all meet next?

FelixErnst (16:17:47): > EuroBioc2020 is probably next :slightly_smiling_face:

Leo Lahti (16:34:48): - File (PDF): Lahti.pdf

Leo Lahti (16:35:35): > Attached the draft version of my short talk slides (8 min). All feedback and suggestions are welcome. I hope I did not forget to mention anyone.

Leo Lahti (16:38:06): > I was planning to recycle some of this also for the BoF session. I could link also to the poster on these slides (not done now) but since the talk is later than the poster session, I am not sure (@Domenick Braccia)

Domenick Braccia (16:40:13): > I can submit the poster to F1000 as the EuroBioc organizers suggested, and then you can link the poster and/or mention that it is in submission too

Domenick Braccia (16:43:23) (in thread): > the next time will probably be the EuroBioc2020 poster session which is Tuesday, 18:15 - 19:15 CET.

Domenick Braccia (16:44:10) (in thread): > But for a more focused meeting about metagenomeFeatures, we may need to find a time that suits across all 4 or so time zones :slightly_smiling_face:

Leo Lahti (16:44:54): > That’s good.

FelixErnst (18:01:13) (in thread): > scatter->scater

Domenick Braccia (20:47:30) (in thread): > @FelixErnst thanks for that, changing this now

Domenick Braccia (20:47:43) (in thread): > @Leo Lahti I think these slides are great! no comments.

Sudarshan (21:18:33) (in thread): > Better to add a QR code for the update lists website https://microsud.github.io/Tools-Microbiome-Analysis/ > Easier in virtual meeting to scan from screen > > Same for the microbiome.io website - Attachment (microsud.github.io): List of R tools for microbiome data analysis > A list of R environment based tools for microbiome data exploration, statistical analysis and visualization

Domenick Braccia (21:19:34) (in thread): > sure thing

2020-12-14

Leo Lahti (01:47:43) (in thread): > Ok good

2020-12-15

Francesc Català (05:42:44): > @Francesc Català has joined the channel

Leo Lahti (08:58:26): > Hi all - I just quickly updated links on the website. If you have more ideas on how to take it further you can let me know, or send a PR: http://microbiome.github.io/

Leo Lahti (08:58:56): > We can convert this into a proper website as the time allows, I have some available templates for that. But too much for this week

Sudarshan (09:42:06) (in thread): > Thanks! It is a good starting point for the project.

Leo Lahti (16:29:44): > Our video recording is available at https://drive.google.com/file/d/1ZffYu_1P59XRw_U4pCRMqYeqAllTD1TH/view?usp=sharing - File (MPEG 4 Video): Wednesday-LeoLahti-MicrobiomeExperiment.mp4

Leo Lahti (16:50:32): > ... and the slides: https://f1000research.com/slides/9-1464 - Attachment (f1000research.com): Slide: Upgrading the R/Bioconductor ecosystem for microbiome research has been published by F1000Research. > Read this work by Lahti L, at F1000Research.

2020-12-16

FelixErnst (07:08:57): > My slides for the birds-of-a-feather session. Any thoughts/comments? - File (HTML): 20201218_boaf_TSE_mia.html

Leo Lahti (07:48:46): > Seems clear

Leo Lahti (07:48:57): > phyloseq reference is for 2020 although the paper that it links to is 2013

Leo Lahti (07:49:07): > slightly confusing

FelixErnst (07:50:59): > hmm I think I made a mistake there

FelixErnst (07:51:06): > but the package has been on bioc for 8 years

Leo Lahti (07:51:09): > my name Lathi -> Lahti (latter is the correct one)

FelixErnst (07:51:16): > dang

FelixErnst (07:51:22): > sry for that

Leo Lahti (07:51:44): > domenick had Lathti and anyway i need to correct this all the time

FelixErnst (07:52:33): > the funny thing is that I always pronounce it correctly in my head and never thought about using the th there

Leo Lahti (07:52:36): > the slides are good, no other comments

Leo Lahti (07:52:39): > :slightly_smiling_face:

Leo Lahti (07:53:02): > sometimes it is also spellchecker i think

Matteo Calgaro (10:51:25): > @Matteo Calgaro has joined the channel

Henrik Eckermann (11:47:51): > @Henrik Eckermann has joined the channel

Domenick Braccia (12:53:02): > @Matteo Calgaro great talk today! we are very interested in what you do and we see some overlap between our interests

Domenick Braccia (12:54:13) (in thread): > I realized after your talk that I already had your Jan 2020 bioRxiv paper in my TO READ list. moving it to the top of the pile! :slightly_smiling_face:

Matteo Calgaro (13:10:22) (in thread): > Thank u Domenick! :+1: Great talk by @Leo Lahti too! I’m interested in your work :star-struck: We should meet to understand these overlaps… I think that in a month I will have a clearer idea about my package, and I’d like to make it fully compatible with yours. :+1:

Domenick Braccia (13:12:25) (in thread): > sounds great. I also want to spend some time playing with some of the curatedMetagenomicData studies, converting them into TreeSE objects and running mia methods on them

Matteo Calgaro (13:14:11) (in thread): > I missed the poster session yesterday… Tell me if u’d like to meet in the lounge, maybe tomorrow?

Leo Lahti (13:28:35) (in thread): > Yes, we could!

Leo Lahti (13:29:07) (in thread): > Let’s just pick a suitable time.

Domenick Braccia (13:29:22) (in thread): > Yes absolutely. I could do one of the earlier breaks

Matteo Calgaro (13:31:17) (in thread): > ok for me, we can dm tomorrow on Airmeet platform. :slightly_smiling_face: See u tomorrow :wave:

Hena Ramay (17:44:21): > @Hena Ramay has joined the channel

2020-12-17

James MacDonald (11:13:03): > @James MacDonald has joined the channel

Rene Welch (17:35:05): > @Rene Welch has joined the channel

2020-12-18

Henrik Eckermann (02:26:25): > Hi, will students need a running installation of mia for the workshop tonight? I could install MicrobiomeExperiment. But I failed to install mia on 3 different R versions (4.0.1, 4.0.2 and 4.0.3) on Mac. Also tried R 4.0.1 on linux and 4.0.2 on Windows. I did not explore all options yet but if students will have to install it tonight during the workshop, they might run into unexpected errors when using BiocManager::install("FelixErnst/mia").

Leo Lahti (02:42:39): > Hmm

Leo Lahti (02:42:42): > Thanks!

Leo Lahti (02:43:54): > In my view, we don’t necessarily need to run things in the workshop (except if the presenters themselves are showing something). The one hour time is not sufficient for both teaching and critically assessing the new ideas.

Leo Lahti (02:44:35): > We have about 10-15 minutes per topic (class structure, package ecosystem, book/documentation, community building).

Leo Lahti (02:45:52): > So, I think the idea is not to provide a training workshop for this system now but instead a critical discussion workshop on whether this is needed in the first place, what should be taken into account, how to proceed.. I am not sure if @FelixErnst @Domenick Braccia would like to add something.

Leo Lahti (02:47:00): > Anyway it might be a good idea to point prospective contributors to the correct instructions on how to install and test it. Perhaps we could set up a getting started page or something, if time allows before the evening.

Henrik Eckermann (02:49:28): > OK great. Then there is no problem for now! Just wanted to make you aware + I could have helped if anyone wanted me to try something out in terms of installation!

Leo Lahti (02:52:02): > Right, you are not necessarily the only one who has this in mind.

FelixErnst (02:52:16): > That was also my take on it since it is a birds-of-a-feather session. I will show some of the things we currently have in a few slides. The slides were generated from these resources https://github.com/FelixErnst/EuroBioc2020/tree/master/Microbiome which also contains a Dockerfile

FelixErnst (02:53:34): > It should build and give you a local RStudio session with the current state of mia. It takes a few minutes to build though (~45min?)

Ruizhu HUANG (04:24:21): > Hi @FelixErnst @Leo Lahti @Domenick Braccia @Sudarshan, > Nice talk and poster! Have you got interesting feedback about whether we should keep the MicrobiomeExperiment class or just start with TreeSummarizedExperiment? I am doing revisions and updates of TSE. It would be a good time if we want to add new slots there.

Ruizhu HUANG (04:30:30): > https://github.com/fionarhuang/TreeSummarizedExperiment/pull/43

FelixErnst (04:44:24): > Hi Ruizhu. We have the birds-of-a-feather session this evening. After that we can report back. Would that work for you?

Ruizhu HUANG (05:17:03): > ah… OK. Thanks, Felix!

Leo Lahti (10:54:52): > Here are preliminary slides for the BoF session https://docs.google.com/presentation/d/1L6WaNe9LneEFYbaMb5WYQGRrUIzhqFJth6OgNbp-98I/edit?usp=sharing - File (Google Slides): EuroBioC2020-BoF

Eva Kohnert (11:12:23): > @Eva Kohnert has joined the channel

Pratheepa Jeganathan (12:03:48): > @Pratheepa Jeganathan has joined the channel

Henrik Eckermann (12:12:33): > it was great to see you all discuss this with the community and especially with Susan as one of the creators of the phyloseq package. I think that was optimal! Thanks a lot for this effort!

Leo Lahti (12:14:53): > Great! I guess it was a bit technical for those who are not already familiar with these techniques

Leo Lahti (12:15:21): > but as far as I could see it was a positive response overall

Shirin Moossavi (12:51:50): > @Shirin Moossavi has joined the channel

Domenick Braccia (14:20:23): > Thank you to everyone who helped out and who contributed to the conversation today and throughout the week. I look forward to working together towards the future of microbiome research and R!

FelixErnst (14:38:01) (in thread): > Thanks from my side as well. It was fun to discuss this and I am happy about all the contributions everyone provided. Thanks to @Leo Lahti and you for preparing the drafts, thanks to @Ruizhu HUANG for TSE and everyone else who contributed before I joined

Leo Lahti (15:20:56) (in thread): > ... and to @FelixErnst for giving it a strong push forward!

Leo Lahti (15:27:16): > About MicrobiomeChallenges (https://github.com/microbiome/BiocMicrobiomeChallenges) - I am in a team that is planning to organize a DREAM challenge on microbiome research as part of a COST action ML4microbiome. We need both public as well as proprietary (confidential) data so that the algorithms for the challenge can be trained with the public data, and benchmarked on the private data. This could be potentially aligned with BiocMicrobiomeChallenges. If anyone has good suggestions for a challenge let me know, we are exploring the options at the moment. It could be about classification, prediction, or other verifiable tasks based on microbiome profiles.

Domenick Braccia (15:28:19) (in thread): > looks like I am getting a 404 error when I try to click that link

FelixErnst (15:34:34) (in thread): > it is private since I need to modify it to work

Leo Lahti (15:35:17) (in thread): > ah

FelixErnst (15:35:46) (in thread): > I need to author a PR for BiocChallenges to get more topics available. should be done on Monday

Domenick Braccia (15:37:03) (in thread): > sounds good! no rush

2020-12-19

Sudarshan (05:59:12): > FYI single cell RNAseq for microbes https://science.sciencemag.org/content/early/2020/12/16/science.aba5257

2020-12-20

Giacomo Antonello (11:31:40): > @Giacomo Antonello has joined the channel

2020-12-21

Harithaa Anand (04:10:58): > @Harithaa Anand has joined the channel

FelixErnst (04:32:21): > So I guess the consensus is, that we add the reference seq slot to TSE?

Luiz Gustavo dos Anjos Borges (08:09:19): > @Luiz Gustavo dos Anjos Borges has joined the channel

Ruizhu HUANG (15:29:19) (in thread): > Thanks, Felix! I have added some comments in the PR. Not sure whether I understand the issue about drop.links and copy.links correctly…

2020-12-23

Tuomas Borman (12:14:54): > I am currently working on the documentation about dominance indices for the MiaBook. I am wondering what the suitable level of information is. > > For example, should there be just information about: > > 1) What are dominance indices? > 2) Which dominance indices can be calculated with mia? And then we could add references to lead the reader to additional information. > > Or should the text be more “complete”? For example, should it contain more specific information about single indices, e.g., how Gini is calculated and so on?

Leo Lahti (17:59:13): > Literature references should be enough in most cases.

Shirin Moossavi (18:17:30): > I agree with Leo.

Leo Lahti (18:28:32): > Dominance indices are conceptually very close to alpha diversity indices. Perhaps they deserve their own method but I’m wondering if these should be available through the estimateDiversity function as well.

2020-12-24

FelixErnst (05:16:36) (in thread): > I think this depends on how well documented the indices are. If @Tuomas Borman is writing about it for MiaBook, a boiled-down version might also be a good addition to the details section of mia

Leo Lahti (07:28:09) (in thread): > Indeed, some technical details may better go in the @details field in package roxygen

Shirin Moossavi (19:14:36) (in thread): > By the way, I don’t know what you mean by dominance indices? I have never heard of them. It is important that correct ecological terminology is used.

2020-12-25

Leo Lahti (06:28:25) (in thread): > Yes. This is somewhat fuzzy. The term “dominance index” seems to be used on a regular basis related to indices such as Simpson index (not inverse Simpson), Berger-Parker, McNaughton’s dominance, Gini index, and some other indices that are conceptually similar to alpha diversity. Some papers seem to use evenness and dominance as exchangeable terms (https://academic.oup.com/femsec/article/43/1/1/509981) but because many of the so-called dominance indices do also account for species richness (not just evenness), they seem more like diversity indices to me. I am not sure if there is any real difference between alpha diversity and dominance, except that some indices are typically called “dominance indices” AND that dominance and diversity indices are negatively correlated: increasing dominance is (in general) associated with decreasing diversity. Both indices seem to be used in ecological literature.
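As a rough illustration of the indices named in this message, here is a minimal Python sketch (neutral pseudocode, not the mia implementation). It assumes the usual textbook definitions: Berger-Parker as the relative abundance of the most abundant taxon, McNaughton's dominance as the summed relative abundance of the two most abundant taxa, and the Simpson index as the sum of squared relative abundances. The function names and the example counts are hypothetical.

```python
def relative_abundances(counts):
    """Convert raw counts to proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def berger_parker(counts):
    """Relative abundance of the single most abundant taxon."""
    return max(relative_abundances(counts))


def mcnaughton(counts):
    """Summed relative abundance of the two most abundant taxa."""
    top_two = sorted(relative_abundances(counts), reverse=True)[:2]
    return sum(top_two)


def simpson(counts):
    """Simpson index: probability that two random reads share a taxon."""
    return sum(p * p for p in relative_abundances(counts))


# Hypothetical samples with the same richness (4 taxa) but different evenness.
skewed = [50, 30, 15, 5]
even = [25, 25, 25, 25]

for name, sample in [("skewed", skewed), ("even", even)]:
    print(
        name,
        round(berger_parker(sample), 3),
        round(mcnaughton(sample), 3),
        round(simpson(sample), 3),
    )
```

All three values are higher for the skewed sample than for the even one although both contain four taxa, which is the sense in which these indices measure dominance rather than richness.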

FelixErnst (09:32:54) (in thread): > This distinction is probably important to keep track of. I think there should be subchapter explaining the distinction as presented in the book, even though that might be just for the sake of the book.

Leo Lahti (14:00:15) (in thread): > It is fine and I can perhaps have a look. Meanwhile, the details and references for specific indices should now go to @details in roxygen I think - so @Tuomas Borman you do not need to add examples to MiaBook right now, just check that the @details field has sufficient details. You can see examples from other functions and other packages, for instance the vegan package does a very good job in explaining methodological details and sources through roxygen so perhaps you can have a look at that.

Shirin Moossavi (16:27:11) (in thread): > No, they are all diversity indices. The difference is that some just take the presence into account (these are the richness indices) and the others such as Simpson take abundance into account as well (these are diversity indices). And all are measures of alpha diversity.

2020-12-26

Tuomas Borman (03:13:57) (in thread): > @Leo Lahti Ok, I will check @details

Leo Lahti (08:03:37) (in thread): > @Shirin Moossavi I agree that “dominance indices” are “diversity indices”, if we accept the definition that any index influenced by both richness and evenness is a diversity index. I think this is the common definition. Nevertheless, “dominance index” is a concept that seems to appear in ecological literature on its own. All or most of these indices are (in general) negatively correlated with the common alpha diversity indices. Examples include Berger-Parker and McNaughton’s dominance; these are called “dominance”, not “diversity” indices. It is also noteworthy that the Simpson index is sometimes called a dominance index, whereas the inverse Simpson index (its reciprocal) is the corresponding diversity index. There are some conceptual differences here. This being said, it is up to us whether we like to emphasize dominance as a separate concept or not. Dominance and diversity are closely linked.

Leo Lahti (18:49:12) (in thread): > Anyway, dominance could go below the general category of “alpha diversity” as much as richness and evenness indices go there. Even though its sign is different than in any of these.
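Editor's note: the relation between dominance and diversity discussed in this thread can be illustrated with a minimal base R sketch. The abundance vectors and function names below are made up for illustration and are not the mia API; the formulas are the standard ones (Simpson index as the sum of squared relative abundances, inverse Simpson as its reciprocal, Berger-Parker as the relative abundance of the most abundant taxon).

```r
# Hypothetical helper functions (illustration only, not part of mia).
simpson_dominance <- function(x) {
  p <- x / sum(x)   # relative abundances
  sum(p^2)          # high value = few taxa dominate
}
inverse_simpson <- function(x) 1 / simpson_dominance(x)
berger_parker <- function(x) max(x) / sum(x)

# Two made-up communities with the same richness (4 taxa).
skewed <- c(taxonA = 50, taxonB = 30, taxonC = 15, taxonD = 5)
even   <- c(taxonA = 25, taxonB = 25, taxonC = 25, taxonD = 25)

result <- sapply(
  list(skewed = skewed, even = even),
  function(x) c(simpson_dominance = simpson_dominance(x),
                inverse_simpson   = inverse_simpson(x),
                berger_parker     = berger_parker(x))
)
print(round(result, 3))
```

For the skewed community the Simpson index is 0.365 and Berger-Parker is 0.5; for the even community both drop to 0.25 while inverse Simpson rises from about 2.74 to 4. This is the negative correlation between dominance and diversity mentioned above.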

Shirin Moossavi (18:51:11) (in thread): > Got you, either way, explanation is key. I can help with that

Leo Lahti (18:53:12) (in thread): > Great. The “testing” branch in MiaBook has now a first (very) preliminary version of the diversity explanations. Feel free to comment / improve / PR: https://github.com/microbiome/MiaBook/tree/testing

Leo Lahti (18:54:43) (in thread): > .. and many details could go to mia/estimateDominance roxygen instead: https://github.com/FelixErnst/mia/blob/master/R/estimateDominance.R

Leo Lahti (18:56:23) (in thread): > The level of detail for the book has not been fixed yet. We should probably keep it relatively concise for now, focusing on understandable examples and providing well-thought references. The roxygen pages can contain many more details.

Shirin Moossavi (19:12:55): > One suggestion, I think the following should be added to the quality control section: > 1- Identification of potential reagent contaminants; important for low biomass samples. decontam is a commonly used package for this, which is compatible with phyloseq, so it would be good to incorporate it in Mia. > 2- Batch comparison and correction for large-scale studies. It would also be useful for identifying additional reagent contaminants in low biomass samples. We have done extensive work on this (currently under revision), and I can share our approach with you.

2020-12-27

Leo Lahti (04:03:38): > I agree and the reason that these are not included so far is mainly that we have been busy with other aspects. If you have a chance to contribute to MiaBook text related to these that would be very welcome, too.

Leo Lahti (04:11:09): > I have now added these as issues in mia: https://github.com/FelixErnst/mia/issues/

Shirin Moossavi (21:52:38) (in thread): > I’d be happy to.

2020-12-29

Leo Lahti (14:25:46) (in thread): > @Shirin Moossavi if you have comments on a related issue for evenness indices, kindly check https://github.com/FelixErnst/mia/issues/33

2020-12-30

Shirin Moossavi (00:02:06) (in thread): > Just to let you know, I am taking a few days off. I’ll start working on it in the new year. Happy Holidays by the way.

Leo Lahti (04:00:34) (in thread): > Happy holidays..!

Hena Ramay (14:38:51) (in thread): > @Leo Lahti We have collected ~20 publicly available 16S vaginal microbiome datasets to look for a particular species, but they can be used for other purposes like predicting common state types or BV status based on the taxonomic profiles. I am happy to put you in touch with the PI with expertise in the vaginal microbiome. She would be able to help with the private datasets.

Leo Lahti (16:35:43) (in thread): > @Hena Ramay great! I would be very interested to have a look at this in more detail. There is ultimately a panel who makes the decisions but we do not have too many options yet and this sounds quite promising.

Leo Lahti (16:36:03) (in thread): > Is the public collection available publicly yet (for browsing purposes)..?

Hena Ramay (16:57:47) (in thread): > No unfortunately not right now but I am working on it. We are making an R package with the processed data but I can share the details about the datasets with you probably next week.

2020-12-31

Leo Lahti (06:07:12) (in thread): > perfect

2021-01-07

Julie Aubert (06:09:44): > @Julie Aubert has joined the channel

2021-01-14

Leo Lahti (13:06:17): > might be interesting for some here

Leo Lahti (13:06:20): > We’re excited to announce @sherlockpholmes as our next speaker. She will be talking about microbiomes and how to use R’s workflow to perform statistical analysis on those complex bacterial communities. Interested? Join us this Thursday at 7pm UTC https://buff.ly/3qnPvHB - Attachment (twitter.com): Dr Susan Holmes :green_book::computer: (@SherlockpHolmes) | Twitter > The latest Tweets from Dr Susan Holmes :green_book::computer: (@SherlockpHolmes). Statistician,Mother,Prof,Grandmthr.”It is a capital mistake to theorize before one has data.Insensibly one begins to twist facts to suit theories”.Sherlock:lower_left_fountain_pen:. Stanford - Attachment (YouTube): Why R? Webinar 029 - Susan Holmes - Why using R for analysis of the human microbiome is a good idea > - speaker Susan Holmes http://statweb.stanford.edu/~susan/ - webinars http://whyr.pl/webinars/ - subscribe http://whyr.pl/subscribe/ - slack http://whyr.pl/slac

Leo Lahti (13:06:38): > Today CET 8pm (1 hour from now)

FelixErnst (13:51:39): > they killed the stream

FelixErnst (13:51:41): > omfg

FelixErnst (13:52:28): > they did start it by accident and stopped it after 15 sec

FelixErnst (13:52:35): > now the link is used up

Domenick Braccia (13:52:39): > yes I am seeing that

Domenick Braccia (13:52:41): > oh no!!

FelixErnst (14:04:49): > they removed the YouTube video. probably working in a second - Attachment (YouTube): Why R? Webinar 029 - Susan Holmes - Why using R for analysis of the human microbiome is a good idea

Leo Lahti (14:40:40): > yes there were problems..

2021-01-16

Hena Ramay (16:21:14): > I am working on testing a few longitudinal analysis methods. Came across this latest one: https://www.nature.com/articles/s41587-020-0660-7. If anyone has used it or has any comments, please share. Might be useful for MiaBook too - Attachment (Nature Biotechnology): Context-aware dimensionality reduction deconvolutes gut microbial community dynamics > Gut microbiome composition is associated with phenotypes as revealed by a dimensionality reduction tool.

Leo Lahti (18:13:56): > We might like to consider including this somehow but need to check how readily usable it is.

2021-01-19

Sudarshan (01:20:44): > https://sites.google.com/umn.edu/decipheringmicrobiomes2019/home

Guangchuang Yu (20:21:53): > @Guangchuang Yu has joined the channel

2021-01-22

Annajiat Alim Rasel (15:44:38): > @Annajiat Alim Rasel has joined the channel

Leo Lahti (19:05:42): > #BioC2021 virtual conference (4-6 Aug 2021) abstract submission open until March 9 https://bioc2021.bioconductor.org -> I am wondering if August would be a good time for an update of the progress & introducing the work to the broader Bioc community? - Attachment (bioc2021.bioconductor.org): BioC2021 > Site template made by devcows using hugo

2021-01-27

FelixErnst (03:46:24): > This is a good opportunity and my guess would be that until then, the functions are available to show the capabilities of a workflow based on TSE and all the other packages

Leo Lahti (03:49:34): > I can take care of preparing the abstract draft if there are no other suggestions/preferences.

2021-01-28

mirna (10:40:56): > @mirna has joined the channel

2021-01-29

FelixErnst (08:40:41): > microbiomeDataSets has been accepted as an ExperimentData package to Bioconductor. Thanks to all the contributors and testers! > > If you want to see your or any other dataset you are working with as a part of the package, please feel free to submit a PR on GitHub. (The package is aimed at microbiome sequencing efforts with a clear separation to any metagenomic datasets)

FelixErnst (08:41:06): > https://github.com/microbiome/microbiomeDataSets

Leo Lahti (09:32:36): > Really nice.

Leo Lahti (09:37:37): > Depends: R (>= 4.1) but the latest R version is R 4.0.3 …

Leo Lahti (09:37:56): > this seems to prevent installation for some at least, I got a report from CR

FelixErnst (09:38:01): > This is currently only available in Bioc-devel

Leo Lahti (09:38:18): > this also applies to the Github version

Leo Lahti (09:39:25): > I got this reported

Leo Lahti (09:39:26): > I have been trying to download the new version of “mia” using BiocManager::install("FelixErnst/mia") with the updated loadFromQIIME2 function but it says > > ERROR: this R is version 4.0.3, package ‘microbiomeDataSets’ requires R  >= 4.1

FelixErnst (09:39:36): > Just install via the release branch: BiocManager::install("microbiome/microbiomeDataSets@release")

FelixErnst (09:42:09): > @Leo Lahti does that work?

Leo Lahti (10:00:38): > I seem to get the same error after running BiocManager::install("microbiome/microbiomeDataSets@release") successfully and then BiocManager::install("FelixErnst/mia")

Leo Lahti (10:00:57): > ERROR: this R is version 4.0.3, package ‘microbiomeDataSets’ requires R >= 4.1 > * removing ‘/home/lemila/R/x86_64-pc-linux-gnu-library/4.0/microbiomeDataSets’ > Error: Failed to install ‘mia’ from GitHub: > Failed to install ‘microbiomeDataSets’ from GitHub: > (converted from warning) installation of package ‘/tmp/RtmpP3cpbC/file1334b2d6a4b1d/microbiomeDataSets_0.99.5.tar.gz’ had non-zero exit statu

FelixErnst (10:02:13): > Can you try BiocManager::install("FelixErnst/mia", update=FALSE)?

Leo Lahti (10:08:33): > ERROR: this R is version 4.0.3, package ‘microbiomeDataSets’ requires R >= 4.1 > Error: Failed to install ‘mia’ from GitHub: > Failed to install ‘microbiomeDataSets’ from GitHub: > (converted from warning) installation of package ‘/tmp/RtmpP3cpbC/file1334b5448e6f9/microbiomeDataSets_0.99.5.tar.gz’ had non-zero exit status

FelixErnst (10:09:29): > BiocManager::install("FelixErnst/mia", update=FALSE, dependencies = FALSE)?

Leo Lahti (10:10:48): > ERROR: dependency ‘microbiomeDataSets’ is not available for package ‘mia’ > * removing ‘/home/lemila/R/x86_64-pc-linux-gnu-library/4.0/mia’ > Error: Failed to install ‘mia’ from GitHub: > (converted from warning) installation of package ‘/tmp/RtmpP3cpbC/file1334b54c3f6f4/mia_0.98.31.tar.gz’ had non-zero exit status

FelixErnst (10:11:18): > But then microbiomeDataSets was not installed

Leo Lahti (10:11:40): > true. let me see, I made some mistake

Leo Lahti (10:12:53): > I am not sure what changed but now it works after I ran BiocManager::install("microbiome/microbiomeDataSets@release") a second time

Leo Lahti (10:13:17): > I thought installation had been successful already but perhaps I was mistaken.

FelixErnst (10:14:54): > Main thing is, it works

2021-02-09

Sudarshan (12:58:02): > https://twitter.com/daanspeth/status/1354946835924750349?s=09 - Attachment (twitter): Attachment > The BVCN is hosting a virtual conference from June 7-11, 2021, and we are looking for speakers. > Nominate yourself and/or others here: https://forms.gle/cD71qbQ7rBbVkJ6V7 > > what is the BVCN conference? A short thread: https://pbs.twimg.com/media/Es288hqVkAoMXJ2.jpg

2021-02-11

Leo Lahti (06:16:28): > Some nice demo data for microbiomeDataSets? https://www.nature.com/articles/s41591-019-0559-3 - Attachment (Nature Medicine): A library of human gut bacterial isolates paired with longitudinal multiomics data enables mechanistic microbiome research > A comprehensive biobank of bacterial isolates with longitudinal and multi-omics characterization will advance understanding of the diversity and functions of human gut bacteria.

Leo Lahti (08:38:18): > what would you think about using github issues to list this kind of potentially interesting data sets

FelixErnst (09:54:03): > I think a GH issue is the right place to track these. In addition, we could get the MicrobiomeChallanges started

FelixErnst (09:54:50): > It would be an exercise in getting to know TSE and ExperimentHub

FelixErnst (09:57:02): > Probably only the 16S data is within the scope of microbiomeDataSets

Sudarshan (10:00:08) (in thread): > Yes and count data are not available. We may need to contact the authors.

FelixErnst (10:00:45): > I thought the data was supposed to be available through the website

Sudarshan (10:01:40) (in thread): > Code and 16S rarefied counts are available in the GitHub repo.

Leo Lahti (10:57:04) (in thread): > Ah. Ok.. Wouldn’t it be ok to have sequencing-based abundance tables in microbiomeDataSets?

Leo Lahti (10:57:41) (in thread): > right. So is it suitable for microbiomeDataSets then in the end?

FelixErnst (10:58:08) (in thread): > well, that’s a last resort. processed count data is a bit :face_vomiting: to work with

FelixErnst (10:58:16) (in thread): > cannot really do anything with it

FelixErnst (10:58:50) (in thread): > I guess not, since only rarefied counts are available

Leo Lahti (11:00:37) (in thread): > Right

Leo Lahti (11:00:58) (in thread): > well interesting downstream analyses can be possible

Sudarshan (13:43:10) (in thread): > I go with Felix. We need raw counts for all assays to be meaningful for teaching as well as tool development

2021-02-17

abdullah hanta (16:07:01): > @abdullah hanta has joined the channel

2021-02-22

Leo Lahti (05:30:11): > Frontiers seems to have a new “topic” for Methods for Single-Cell and Microbiome Sequencing Data https://www.frontiersin.org/research-topics/13026/methods-for-single-cell-and-microbiome-sequencing-data - Attachment (Frontiers): Methods for Single-Cell and Microbiome Sequencing Data > The last few years have seen a simultaneous explosion of high-throughput single-cell as well as metagenomics sequencing technologies for quantifying DNA/RNA levels within individual cells/species. These breakthroughs together with other ’omics studies (such as proteomics, metabolomics, etc.) pave the way for exploring biological systems at an unprecedented level of detail. While single-cell technologies allow us to look closely into inter-cellular variations and interactions and intra-tissue heterogeneity in biological samples, metagenomics sequencing enables in-depth studies of microbial ecosystems in much finer resolution than previously envisioned. Both these technologies have independently led to the development of novel computational and statistical methods that encompass data preprocessing, modeling, and inference. Despite the progress, there is still much work to be done to meet the challenges and make use of the opportunities posed by the resulting new data types. Although there are special issues in statistical and computational biology journals focusing on each individual data analysis, researchers from these fields as well as practitioners of these technologies would greatly benefit from a combined issue, where it is possible to exchange ideas, raise new questions, and form future collaborations, building upon the lessons learned in one field and reverse-translating the gained knowledge from one field to the other. This is particularly relevant as both these tec…

2021-02-26

Diana Hendrickx (11:57:34): > @Diana Hendrickx has joined the channel

Diana Hendrickx (12:11:15): > Hi, I’m a postdoctoral researcher and working with Clara Belzer at the Laboratory of Microbiology at Wageningen University. I have a background in mathematics/statistics and I’m working on a predictive model for persistence of cow’s milk allergy combining different types of data (16S amplicon sequencing, clinical, immunology, proteomics, metabolomics), measured at 3 time points. As I have a quite different background than the other people in my group, which are lab scientists, I like to get in touch with other people in the field of microbiome data analysis.

Leo Lahti (12:16:44): > Nice, welcome :slightly_smiling_face:

Leo Lahti (12:18:03): > The TSE ecosystem is specifically suited for multiomics so it sounds like a good match.

Leo Lahti (12:18:46): > A limitation is that it is so far less widely adopted but this could be changing.

FelixErnst (12:23:55): > TSE = TreeSummarizedExperiment :slightly_smiling_face:

2021-03-01

Leo Lahti (15:41:05): > Do we already have functions to remove specific samples or specific taxa (similar to microbiome pkg remove_taxa or remove_samples)?

FelixErnst (15:43:20): > I don’t think so, but of course you are free to construct a logical, character or integer vector for subsetting to the desired samples, features

FelixErnst (15:45:17): > Just construct a vector i and/or j and save the output of tse[i,j]

FelixErnst (15:47:51): > I often use rowData(tse)$mean_rel > threshold or any such combinations. (mean_rel can be easily set using rowData(tse)$mean_rel <- rowMeans(assay(tse,"relabundance")))

FelixErnst (15:48:34): > Of course there is no requirement to save the mean relative abundance in rowData. i <- rowMeans(assay(tse,"relabundance")) > threshold works as well

FelixErnst (15:53:50): > !is.na(rowData(tse)$Phylum) & rowData(tse)$Phylum == "whatever" will produce a vector for subsetting based on taxonomic information
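Editor's note: the subsetting approach described in the messages above can be sketched in base R. To keep the sketch self-contained, a plain matrix and a data.frame stand in for assay(tse, "relabundance") and rowData(tse); all names and values are made up. Under that assumption, the same i and j vectors would be passed to tse[i, j].

```r
# Stand-in for assay(tse, "relabundance"): taxa in rows, samples in columns.
relabundance <- matrix(
  c(0.60, 0.50, 0.70,
    0.30, 0.45, 0.25,
    0.10, 0.05, 0.05),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxon1", "taxon2", "taxon3"), c("s1", "s2", "s3"))
)
# Stand-in for rowData(tse): one taxonomy column with a missing value.
row_data <- data.frame(
  Phylum = c("Firmicutes", NA, "Firmicutes"),
  row.names = rownames(relabundance)
)
threshold <- 0.1

# Row index 1: keep taxa whose mean relative abundance exceeds the threshold.
i_abund <- rowMeans(relabundance) > threshold
# Row index 2: keep taxa of one phylum, guarding against NA.
i_phylum <- !is.na(row_data$Phylum) & row_data$Phylum == "Firmicutes"
# Column index: explicitly exclude a sample by name.
j <- !colnames(relabundance) %in% c("s2")

# Combine the logical vectors and subset once.
kept <- relabundance[i_abund & i_phylum, j, drop = FALSE]
print(kept)
```

Removing samples or taxa is then just negating the logical vector (or using %in% with !), which is why a dedicated remove function adds little beyond a name.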

Leo Lahti (15:55:39): > Sure, there are many ways to do it no doubt. The question is mainly, do we think that an easy shortcut would be useful. I think it would be.

Leo Lahti (15:56:20): > Like, we can do most of these things even in base R by writing enough code..

FelixErnst (15:56:49): > Which use cases do you want to cover? I think there are too many to really make it user friendly

FelixErnst (15:58:56): > The problem I see is that remove??? can only cover so many use cases

FelixErnst (16:00:00): > Eventually someone needs something which these functions cannot do. And then the learning curve is really steep, if everything got abstracted away up until this point

Leo Lahti (16:00:21): > The use cases I have in mind would be similar to subsetting of the data. Instead of including samples based on a given variable (like in phyloseq::subset_samples), we could explicitly exclude specified samples from the data.

FelixErnst (16:02:38): > subset_samples is the way for phyloseq to provide the [ functionality. So this is just a different name, and one which is special and only usable with this type of object. [ is canonical and used throughout R. My guess would be that this is what users are most familiar with, especially when they start

Leo Lahti (16:04:12): > Ok so we could encourage users to use the [ functionality for any sort of subsetting or removal?

FelixErnst (16:06:41): > that would be my favorite. The knowledge gained by this approach is transferable to any other SE object type and thus will benefit the user in the long run. (If a user is familiar with the other types, they will not go looking for other functions)

FelixErnst (16:08:07): > Excluding specific taxa is in my mind a contradiction, since we cannot provide a general function to exclude specific things. The specificity would need to be provided again by the user, so that we wouldn’t have gained anything

Leo Lahti (16:13:08): > Fine by me. But good to discuss it through.

Leo Lahti (16:15:42): > I have just opened a bunch of issues in mia, this was the list of things from microbiome pkg that could be potentially transferred. No need to transfer all but we can systematically discuss and close those cases through the issues.

2021-03-03

Leo Lahti (13:50:10): > Novo Nordisk has an international “data science” call that could match with the tse/mia ecosystem. But Danish colleagues should be involved. Do we have any? https://novonordiskfonden.dk/da/grants/data-science-collaborative-research-programme-2021/ - one challenge also is that the DL is already March 16. But I just saw this today.. - Attachment (Novo Nordisk Fonden): Data Science Collaborative Research Programme 2021 - Novo Nordisk Fonden > Purpose  The Data Science Collaborative Research Programme aims to support synergistic research collaborations rooted in data science which:   lead to new or improved core data science algorithms, methods and technologies. and/or explore and expand data science applications to real-world scientific problems within the scope of the NNF Data Science Initiative (see Areas of Support) The […]

2021-03-08

Vince Carey (05:34:35): > @Kasper D. Hansen ^^

2021-03-12

Levi Waldron (03:31:13) (in thread): > I just would add that although [ comes automatically with SummarizedExperiment, I think it is also useful to replicate the phyloseq API since it is so widely used, and provide a drop-in replacement. Defining verbs also gives the opportunity to customize in ways that using [ doesn’t.

FelixErnst (03:58:10) (in thread): > That’s one decision we are wrestling with. I argue above that we cannot anticipate every use case in advance and are better off nudging the user in the direction of using [, because this can be used in R universally for 2+ dimensional data. Since the subset function (which subset_samples refers to) is actually only designed to work on vectors, I also have a hard time justifying it on a conceptual level. That’s also the reason why there would need to be two functions, subset_samples and subset_features. So I am willing to compromise here and add the functions with a deprecation warning that they will be removed in Bioc 3.15+. Do you think that is reasonable?

Leo Lahti (14:10:14): > @FelixErnst you mentioned that runMDS2 will be gone by release. What will be the replacement?

FelixErnst (14:10:39): > runMDS from scater

FelixErnst (14:10:54): > But the features needed are just in devel

FelixErnst (14:11:24): > So to be able to work and test stuff in release I added runMDS2 as a test bed

Leo Lahti (14:32:34): > Ok.

2021-03-15

Leo Lahti (04:07:29): > Did we conclude something about Faith’s phylogenetic diversity? Can’t find it in mia issues. @Sudarshan

FelixErnst (04:10:49): > I don’t think so. It was discussed on Slack, I think

Leo Lahti (04:42:50): > Right. This is my impression too. I just thought we already opened an issue but if this is not the case and we agree that this is good to have then I could open an issue.

FelixErnst (04:43:17): > I think this was about the implementation, wasn’t it?

FelixErnst (04:43:42): > Since the package implementing it is not on CRAN or Bioc, we cannot depend on it

Sudarshan (04:43:44): > Yes it was about implementation

Leo Lahti (04:44:06): > Ah right.

Leo Lahti (04:44:37): > Would be nice to complement all the non-tree based indices with one tree-based index, for comparisons.

Sudarshan (04:44:46): > Picante for Faith’s pd is on cran

Leo Lahti (04:45:37): > Then it should be possible to use?

Sudarshan (04:45:40): > Miabook is where we were discussing it

Sudarshan (04:45:45): > Yes

Sudarshan (04:46:00): > I show that in Miabook

Leo Lahti (04:47:02): > It would add a dependency in mia, that’s a minus. On the other hand it is clearly more handy for users if it is readily available through mia.

Sudarshan (04:48:24): > Yes it’s only one function but adds an entire pkg

FelixErnst (04:48:49): > How about a suggests?

Sudarshan (04:52:59): > Do you want to add in Mia or MiaBook?

FelixErnst (04:57:33): > If we put in suggests, I think it can go into mia.

Sudarshan (04:58:17): > Would need a wrapper function or just in vignettes?

FelixErnst (04:59:52): > I don’t know. What’s the function’s name in picante?

Leo Lahti (05:10:28): > I would personally like to add it as one of the indices in mia::estimateDiversity

Sudarshan (08:54:42): > picante::pd()

Leo Lahti (08:59:18): > Could be even re-implemented relatively easily.

Sudarshan (09:01:42): > If you can speed up the calculations would be nice. The picante::pd() function is slow.
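Editor's note: to make the re-implementation idea concrete, here is a minimal base R sketch of Faith's phylogenetic diversity for a rooted tree. It is not the picante or mia implementation and makes no claim about speed. The tree, the edge table layout (loosely modelled on the ape edge matrix), and all function names are assumptions for illustration. PD is taken here as the total length of all branches on the paths from the present tips to the root.

```r
# Hypothetical rooted tree: tips are nodes 1..3, root is node 4.
#   4 -> 5 (length 1), 5 -> t1 (2), 5 -> t2 (2), 4 -> t3 (3)
edge <- data.frame(
  parent = c(4, 5, 5, 4),
  child  = c(5, 1, 2, 3),
  length = c(1, 2, 2, 3)
)
tip_names <- c("t1", "t2", "t3")

# Indices of the edges on the path from one node up to the root.
edges_to_root <- function(node, edge) {
  idx <- integer(0)
  repeat {
    e <- match(node, edge$child)  # edge leading into this node
    if (is.na(e)) break           # no incoming edge: root reached
    idx <- c(idx, e)
    node <- edge$parent[e]
  }
  idx
}

# Faith's PD: sum of the lengths of the union of those paths.
faith_pd <- function(present_tips, edge) {
  if (length(present_tips) == 0) return(0)
  used <- unique(unlist(lapply(present_tips, edges_to_root, edge = edge)))
  sum(edge$length[used])
}

# Made-up count table: taxa (tips) in rows, samples in columns.
counts <- matrix(
  c(10, 0, 4,
     5, 0, 0,
     0, 7, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(tip_names, c("s1", "s2", "s3"))
)

# A taxon counts as present when its count is above zero.
pd <- apply(counts > 0, 2, function(present) faith_pd(which(present), edge))
print(pd)
```

In this toy example s1 (t1 and t2 present) gets PD 5, s2 (only t3) gets 3, and s3 (t1 and t3) gets 6. Shared branches are counted once, which is the point of taking the union of the paths.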

Leo Lahti (09:06:04): > right

Leo Lahti (09:06:17): > I add this to issues

2021-03-17

FelixErnst (05:38:24): > The mia repository was moved to the microbiome organization. Please update your git remote settings to reflect this. The new remote should point to github.com/microbiome/mia.git

Leo Lahti (06:03:43): > I am preparing some smaller documentation and code unification things.

Leo Lahti (08:05:20): > I updated the initial project landing page a bit: https://microbiome.github.io

Leo Lahti (08:05:42): > if there are web wizards who like to make it better, feel free to jump in :slightly_smiling_face:

2021-03-18

FelixErnst (04:12:50): > I am about to merge a PR introducing the subset* functions. However, I am not so sure that they will survive the review process, since they include a message about not using them and refer to [ for subsetting. Let’s see how it goes

Sudarshan (04:14:03): > How can you sell a product labelled as “there are better options than this”? :wink:

FelixErnst (04:16:09): > We will see. The rationale behind it is to make a transition easier, so it is not without good reason.

Sudarshan (04:24:24) (in thread): > Yes. I can guarantee that the user base of phyloseq will appreciate these small but useful functions

Leo Lahti (04:36:31) (in thread): > yep

2021-03-19

Leo Lahti (04:31:29): > Would be great to come up with a short name and hex logo for the ME ecosystem.

FelixErnst (04:31:45): > I will ask Johannes later today

FelixErnst (04:31:59): > Maybe he has a good idea

Leo Lahti (04:32:09): > cool

FelixErnst (05:46:17): > Here we go: https://github.com/Bioconductor/Contributions/issues/1987 - Attachment: #1987 mia > Update the following URL to point to the GitHub repository of
> the package you wish to submit to Bioconductor > > • Repository: https://github.com/microbiome/mia > > Confirm the following by editing each check box to ‘[x]’ > > I understand that by submitting my package to Bioconductor,
> the package source and all review commentary are visible to the
> general public. > I have read the Bioconductor Package Submission
> instructions. My package is consistent with the Bioconductor
> Package Guidelines. > I understand that a minimum requirement for package acceptance
> is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS.
> Passing these checks does not result in automatic acceptance. The
> package will then undergo a formal review and recommendations for
> acceptance regarding other Bioconductor standards will be addressed. > My package addresses statistical or bioinformatic issues related
> to the analysis and comprehension of high throughput genomic data. > I am committed to the long-term maintenance of my package. This
> includes monitoring the support site for issues that users may
> have, subscribing to the bioc-devel mailing list to stay aware
> of developments in the Bioconductor community, responding promptly
> to requests for updates from the Core team in response to changes in
> R or underlying software. > I am familiar with the Bioconductor code of conduct and
> agree to abide by it. > > I am familiar with the essential aspects of Bioconductor software
> management, including: > > ☑︎ The ‘devel’ branch for new packages and features. > ☑︎ The stable ‘release’ branch, made available every six
> months, for bug fixes. > ☑︎ Bioconductor version control using Git
> (optionally via GitHub). > > For help with submitting your package, please subscribe and post questions
> to the bioc-devel mailing list.

2021-03-20

watanabe_st (01:57:49): > @watanabe_st has joined the channel

2021-03-26

FelixErnst (05:46:13): > A GitHub Project is now set up at https://github.com/orgs/microbiome/projects/1. Maybe this helps to get a better overview of the packages on microbiome data and SummarizedExperiment objects and their current issues and PRs. Feel free to contribute in any way!

Leo Lahti (05:49:56): > so nice, thanks again @FelixErnst

2021-03-29

Leo Lahti (15:29:40): > This for microbiomes too https://twitter.com/strnr/status/1376598463589277699?s=19 - Attachment (twitter): Attachment > tidyseurat: Interfacing Seurat with the #Rstats tidyverse https://www.biorxiv.org/content/10.1101/2021.03.26.437294v1 > https://github.com/stemangiola/tidyseurat > https://cran.r-project.org/web/packages/tidyseurat/ https://pbs.twimg.com/media/ExqpAWVXAAUyszQ.png

Leo Lahti (16:24:43): > Also this one.. https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001165 - Attachment (journals.plos.org): Is “bioinformatics” dead? > Why would a computational biologist with 40 years of research experience say bioinformatics is dead? The short answer is, in being the Founding Dean of a new School of Data Science, what we do suddenly looks different.

2021-04-08

FelixErnst (02:52:17) (in thread): > Both packages, mia and miaViz, got accepted. Thanks to all the Authors and Contributors!

Leo Lahti (04:14:32): > wohooo

Leo Lahti (04:16:35): > makes things easier

FelixErnst (04:43:18): > Yes it does. Let’s hope that Bioc 3.13 is going to be released in April, but my guess would be May, since I haven’t seen an official release date for R 4.1 yet

Jayaram Kancherla (09:02:18): > not sure if this is already looked at - https://siamcat.embl.de/ - Attachment (siamcat.embl.de): Statistical Inference of Associations between Microbial Communities And host phenoTypes > Pipeline for Statistical Inference of Associations between Microbial Communities And host phenoTypes (SIAMCAT). A primary goal of analyzing microbiome data is to determine changes in community composition that are associated with environmental factors. In particular, linking human microbiome composition to host phenotypes such as diseases has become an area of intense research. For this, robust statistical modeling and biomarker extraction toolkits are crucially needed. SIAMCAT provides a full pipeline supporting data preprocessing, statistical association testing, statistical modeling (LASSO logistic regression) including tools for evaluation and interpretation of these models (such as cross validation, parameter selection, ROC analysis and diagnostic model plots).

Leo Lahti (09:08:39): > Support for TreeSummarizedExperiment?

FelixErnst (09:39:49): > That would be quite nice

FelixErnst (09:43:23): > I had a look at the vignette and they wrap phyloseq in their own class

Sudarshan (09:43:39) (in thread): > Yes

Sudarshan (09:44:06) (in thread): > Also when loading microbiome pkg there is clash for meta() function

FelixErnst (09:53:49) (in thread): > Well that is just a clash, which happens. With an S4 system it might be less of a problem and more readily solved, but of course that requires dependencies to be sorted out.

Leo Lahti (17:05:40): > This fails - I wasn’t sure if opening issue to mia is good before asking here. I would expect that this would work with default example data sets. library(mia); data(GlobalPatterns); scater::runMDS(GlobalPatterns, FUN = vegan::vegdist, name = "MDS_BC", exprs_values = "counts") with > > Error in .calculate_mds(mat, transposed = !is.null(dimred), …) : > unused argument (FUN = function (x, method = “bray”, binary = FALSE, diag = FALSE, upper = FALSE, na.rm = FALSE, …) > { > ZAP <- 1e-15 > if (is.na(pmatch(method, “euclidian”))) method <- “euclidean” > METHODS <- c(“manhattan”, “euclidean”, “canberra”, “bray”, “kulczynski”, “gower”, “morisita”, “horn”, “mountford”, “jaccard”, “raup”, “binomial”, “chao”, “altGower”, “cao”, “mahalanobis”, “clark”, “chisq”, “chord”) > method <- pmatch(method, METHODS) > inm <- METHODS[method] > if (is.na(method)) stop(“invalid distance method”) > if (method == -1) stop(“ambiguous distance method”) > x <- as.matrix(x) > if (!na.rm && anyNA(x)) stop(“missing values are not allowed with argument ‘na.rm = FALSE’”) > if (!(is.numeric(x) || is.logical(x))) stop(“input data must be numeric”) > if (!method %in% c(1, 2, 6, 16, 18) && any(rowSums(x, na.rm = TRUE) == 0)) warning(“you have empty rows: their dissimilarities may be meaningless in method”, dQuote(inm)) > if (!method %

2021-04-09

FelixErnst (03:08:09): > I think you are using the release version of scater

FelixErnst (03:08:53): > runMDS is implemented in scater, and the support for other distance functions is only available in devel
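
For anyone stuck on the release version, the same idea (Bray-Curtis dissimilarity followed by classical MDS) can be sketched in base R. This is only an illustration on a toy count matrix, not on GlobalPatterns; the helper name `bray_curtis` is hypothetical, and `vegan::vegdist` remains the more complete choice when it is available.

```r
# Bray-Curtis dissimilarity between the rows (samples) of a numeric matrix.
# Hypothetical helper for illustration; vegan::vegdist() is the usual choice.
bray_curtis <- function(mat) {
  n <- nrow(mat)
  d <- matrix(0, n, n, dimnames = list(rownames(mat), rownames(mat)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(mat[i, ] - mat[j, ])) / sum(mat[i, ] + mat[j, ])
    }
  }
  as.dist(d)
}

# Toy counts: taxa in rows, samples in columns (the SummarizedExperiment layout)
set.seed(1)
counts <- matrix(rpois(40, lambda = 10), nrow = 8,
                 dimnames = list(paste0("taxon", 1:8), paste0("sample", 1:5)))

# Distances are computed between samples, so transpose first
d <- bray_curtis(t(counts))

# Classical MDS on the dissimilarities; one row of coordinates per sample
mds <- cmdscale(d, k = 2)
```

With a real TreeSE object the coordinates could presumably be stored through `reducedDim(tse, "MDS_BC") <- mds`, provided the sample order matches the columns of the object.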

Leo Lahti (04:11:40): > ahaa ok good

Leo Lahti (04:11:44): > will test

2021-04-12

Leo Lahti (09:34:05): > I am planning to add a new data set in microbiomeDataSets. The data is available in Figshare. I noticed that none of our current data sets include a make-data.R script (only make-metadata.R scripts) in microbiomeDataSets/inst/scripts. I am, however, planning to add one, showing how the data is constructed from the Figshare files. Any comments? - Attachment (figshare): NAFLD and XOS project, rat data, gut microbiota > Data repository for the research manuscript “Xylo-oligosaccharides in prevention of hepatic steatosis and adipose tissue inflammation: associating taxonomic and metabolomic patterns in fecal microbiomes with biclustering” by Hintikka, J et al 2021.

Sudarshan (13:17:40): > Not sure if the FigShare files are generalized across studies. Maybe we show a few examples in OMA?

Leo Lahti (13:37:36): > They are not. I guess I will just build this from the files.

FelixErnst (13:46:08): > A make-data script is always good. It allows for reproducibility if needed.

FelixErnst (13:47:11): > So if you can put all the conversion into a script, it would definitely be a solid setup

Leo Lahti (16:05:24): > Yes the question was mainly if that should also fetch the data from the permanent repo (Figshare) or can I assume that user has downloaded those files locally before running the make data script.

Leo Lahti (16:05:44): > Otherwise I agree completely and I see no other way basically.

Leo Lahti (16:08:30): > microbiomeDataSets does not seem to have make-data files for any other data, though (at least not in inst/scripts/, where I think they could be expected to be).

FelixErnst (16:26:14): > The steps a make-script would need to perform would be 1. downloading the data and 2. converting them to files compatible with ExperimentHub. After the upload to ExperimentHub, the data would be available for use (via the appropriate download function)

FelixErnst (16:27:34): > So the user would need to download the data from FigShare. The conversion would be done once by us and then the download would take place from EH

FelixErnst (16:28:29): > On the second note, we didn’t actually produce the make-scripts for the other datasets. Maybe that has been a bit of an oversight, but I guess the data structure was not as fixed for those.

FelixErnst (16:29:04): > They were assembled to completeness and saved for EH once

2021-04-13

Leo Lahti (01:39:50): > Yes, this was my understanding of the procedure as well. The uncertainty related to the level of automation regarding the initial downloads. Because it is a single event, it is sufficient to provide the source URL and mention manual download if automation seems problematic. Figshare downloads seem to require user-specific API keys.
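
The two steps discussed above (obtain the files, then convert them for ExperimentHub) might look roughly like the sketch below. It assumes the source files were downloaded manually beforehand; the function name `make_data`, the file name `counts.csv`, and the CSV layout are all hypothetical and do not describe the actual Figshare record.

```r
# Hypothetical make-data.R sketch. File names and CSV layout are assumptions,
# not the actual structure of the Figshare record.
make_data <- function(in_dir, out_dir) {
  counts_file <- file.path(in_dir, "counts.csv")
  if (!file.exists(counts_file)) {
    stop("Please download the source files manually into: ", in_dir)
  }
  # First column holds feature IDs; remaining columns are samples
  counts <- as.matrix(read.csv(counts_file, row.names = 1, check.names = FALSE))

  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  out_file <- file.path(out_dir, "counts.rds")
  saveRDS(counts, out_file)
  invisible(out_file)
}

# Demo with a temporary directory standing in for the manual download
in_dir <- file.path(tempdir(), "figshare_download")
dir.create(in_dir, showWarnings = FALSE)
toy <- data.frame(s1 = c(5, 0), s2 = c(2, 7), row.names = c("otu1", "otu2"))
write.csv(toy, file.path(in_dir, "counts.csv"))

out_file <- make_data(in_dir, file.path(tempdir(), "eh_files"))
restored <- readRDS(out_file)
```

Stopping with a clear message when the input is missing keeps the manual download step explicit, which seems to fit the single-event workflow described here.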

2021-04-14

FelixErnst (04:11:57): > I guess that a make-script makes sense in this case as well, even though it may be just for us.

2021-04-16

Sudarshan (04:27:28): > https://arxiv.org/abs/2104.07266 - Attachment (arXiv.org): A Critique of Differential Abundance Analysis, and Advocacy for an… > It is largely taken for granted that differential abundance analysis is, by default, the best first step when analyzing genomic data. We argue that this is not necessarily the case. In this…

Sudarshan (04:29:01): > The different “-isms” and related debate

Leo Lahti (05:13:32): > great

2021-04-19

Leo Lahti (15:10:02): > Microbiome data science in the SummarizedExperiment universe. Felix G.M. Ernst, Tuomas Borman, Sudarshan Shetty, Ruizhu Huang, Domenick James Braccia, Hector Bravo, Leo M Lahti. Decision: Accept (Short talk - 10min) > > Comment: We are pleased to inform you that your proposed talk has been accepted after peer review for inclusion in the BioC2021 programme. > > Registration is now open. Please register here: https://bioc2021.bioconductor.org/registration/

2021-04-24

Leo Lahti (15:43:40): > Any pointers to comparisons/benchmarking on computational efficiency of the SE class and its derivatives? Or TSE vs phyloseq? Preferably in peer-reviewed publications.

Sudarshan (17:42:18) (in thread): > Not aware of any comparisons

2021-04-28

Sudarshan (08:21:12): > Dynamic Bayesian Networks for Integrating Multi-omics Time Series Microbiome Data: https://msystems.asm.org/content/6/2/e01105-20

2021-04-30

Tuomas Borman (09:09:37): > loadFromBiom (and makeTreeSummarizedExperimentFromBiom) is currently returning a TSE object. However, the returned TSE includes only assay, rowData, and colData. > 1. Should the returned object be SE instead of TSE? Then it would be similar to loadFromMothur, which returns SE. > 2. Or should it be possible to add additional files, like rowTree, as an input?

FelixErnst (09:27:41): > Which type of tree does the biom file format provide?

FelixErnst (09:28:46): > And how is the link between assay data and tree tips/nodes provided?

Tuomas Borman (09:46:56): > I’m not sure if the biom file format includes any type of tree; I don’t have that much experience with it

FelixErnst (09:47:47): > Then I would go for option 1 for now

FelixErnst (09:53:04): > Does anyone have more experience with the biom file format?

Leo Lahti (10:04:46): > It can include a tree, and this is an important option. But it does not always contain a tree.

Leo Lahti (10:05:46): > But there is already the R package rbiom. We should also avoid duplication: https://cran.r-project.org/web/packages/rbiom/index.html - Attachment (cran.r-project.org): rbiom: Read/Write, Transform, and Summarize ‘BIOM’ Data > A toolkit for working with Biological Observation Matrix (‘BIOM’) files. Features include reading/writing all ‘BIOM’ formats, rarefaction, alpha diversity, beta diversity (including ‘UniFrac’), summarizing counts by taxonomic level, and sample subsetting. Standalone functions for reading, writing, and subsetting phylogenetic trees are also provided. All CPU intensive operations are encoded in C with multi-thread support.

Leo Lahti (10:05:56): > biom format is very common

FelixErnst (10:06:35): > In mia the biom file is accessed using biomformat: http://bioconductor.org/packages/release/bioc/html/biomformat.html - Attachment (Bioconductor): biomformat > This is an R package for interfacing with the BIOM format. This package includes basic tools for reading biom-format files, accessing and subsetting data tables from a biom object (which is more complex than a single table), as well as limited support for writing a biom-object back to a biom-format file. The design of this API is intended to match the python API and other tools included with the biom-format project, but with a decidedly

FelixErnst (10:06:46): > Do you have an example file with a tree?

Leo Lahti (10:11:21): > right

Leo Lahti (10:11:38): > not right now, perhaps @Sudarshan has? I am sure we can find one online

Sudarshan (11:13:58): > I am not aware of a tree being part of biom files. I haven’t used biom for the past few years now. I know it can have sample data and seq data, but I never saw tree data. It was always a separate .nwk or .tre file.

Leo Lahti (11:31:53): > @Tuomas Borman you have the tree.nwk file in our example data.

Leo Lahti (11:33:30): > Does that work?

Leo Lahti (11:34:00): > it is a “Newick tree” and by some quick browsing I see that it is also mentioned with biom / QIIME2

Tuomas Borman (11:52:47): > Yep, I saw the separate tree, but I was wondering whether it could also be included in the biom file in some cases. > > So, we go with option 1

Leo Lahti (12:56:46): > Good for now.

2021-05-01

FelixErnst (06:11:12): > I think it would be best to accept only formats which can be read in using ape::read.tree. This is much easier to maintain than introducing a second dependency for tree loading

Leo Lahti (06:57:55): > No disagreement, and Newick trees are included.
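
When the tree arrives as a separate Newick file, a quick check that its tips match the feature IDs of the abundance table may be useful before attaching it as rowTree. The sketch below is illustrative only: `missing_tips` is a hypothetical helper, the Newick string and IDs are toy data, and the regular-expression fallback is a crude stand-in that is only there so the example runs without ape installed.

```r
# Return feature IDs that have no matching tip in the tree (base R only)
missing_tips <- function(feature_ids, tip_labels) {
  setdiff(feature_ids, tip_labels)
}

# Toy Newick tree and the feature IDs of a matching abundance table
newick <- "((otu1:0.1,otu2:0.2):0.3,otu3:0.4);"
feature_ids <- c("otu1", "otu2", "otu3")

if (requireNamespace("ape", quietly = TRUE)) {
  # ape::read.tree() parses Newick from a file or, as here, from a string
  tree <- ape::read.tree(text = newick)
  tip_labels <- tree$tip.label
} else {
  # Crude fallback without ape: labels are the names directly before a colon
  tip_labels <- regmatches(
    newick, gregexpr("[A-Za-z0-9_]+(?=:)", newick, perl = TRUE)
  )[[1]]
}

# Empty result means every feature has a tip in the tree
unmatched <- missing_tips(feature_ids, tip_labels)
```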

2021-05-06

Pratheepa Jeganathan (10:38:21): > @Pratheepa Jeganathan has joined the channel

2021-05-07

Leo Lahti (13:56:50): > I am thinking of a good short name for the MicrobiomeExperiment framework. Preferably a single word / acronym. Some options: “miaverse” (not sure how much we like to associate with this), “mia” (can refer to the project in addition to the pkg but some risk of confusion), other..?

Sudarshan (14:01:25): > I am very bad at this:sweat_smile:

2021-05-08

Leo Lahti (04:33:06): > I got question from “Why R?” webinar series about (non-senior) speakers. I have promised to talk about microbiome research in summer but there we have many possible R topics related to that. Anyone interested, let me know.

2021-05-11

Megha Lal (16:45:15): > @Megha Lal has joined the channel

2021-05-12

Leo Lahti (12:34:47): > Some colleagues organizing this one:https://www.microbiomics2021.org/

Leo Lahti (12:35:24): > interesting since there are clear links between Single cell & microbiome analytics in miaverse

2021-05-22

Moritz E. Beber (16:00:36): > @Moritz E. Beber has joined the channel

2021-05-27

Aarthi Ravikrishnan (02:45:49): > @Aarthi Ravikrishnan has joined the channel

2021-05-30

Chris Fields (13:03:38): > @Chris Fields has joined the channel

2021-06-28

Leo Lahti (18:58:57): > I am starting to think that miaverse would be a good name for the overall TSE framework. Any comments?

2021-06-29

Tuomas Borman (04:56:46) (in thread): > I think it sounds nice and simple, and it describes the thing. I googled the name. “Miiverse” was Nintendo’s social network, but there shouldn’t be meanings for “miaverse” that could be adverse: https://en.wikipedia.org/wiki/Miiverse - Attachment: Miiverse > Miiverse was a social network for Nintendo 3DS and Wii U, created by Nintendo System Development and Hatena, and powered by the Nintendo Network. Integrated into many games, Miiverse allowed players to interact and share their experiences by way of handwritten messages or drawings, text, screenshots, and sometimes game videos in dedicated communities. It was available via any web browser, and a dedicated app version originally planned for tablets and smartphones. All users who signed up for a Nintendo Network ID were automatically given a Miiverse profile per account, represented by the Mii avatar connected to said Nintendo Network ID. > Miiverse was announced on June 3, 2012 during a pre-E3 2012 Nintendo Direct event; the service initially launched on the Wii U on November 18, 2012 and was later made available for the Nintendo 3DS on December 9, 2013. A web-based portal was opened on April 25, 2013. Miiverse was discontinued on November 7, 2017 at 10:00 PM PST. The service was discontinued worldwide simultaneously at this point in time. The majority of time zones were in November 8. Only two time zones along the western side of the Americas remained in November 7. Nintendo of America, which is based in Redmond, Washington, experienced Miiverse shutting down at 10 PM on November 7, which coincides with 3 PM on November 8 in Kyoto, Nintendo’s main HQ. The service never launched on the Nintendo Switch. However, games such as Splatoon 2 and Super Mario Maker 2 include a community messaging feature that is reminiscent of Miiverse’s handwritten message/drawing function.

Himel Mallick (05:15:48): > @Himel Mallick has joined the channel

Leo Lahti (05:41:57) (in thread): > This can bring us additional vibes

Tuomas Borman (07:45:54) (in thread): > That’s true

2021-07-02

Rajesh Shigdel (04:32:48): > @Rajesh Shigdel has joined the channel

Chouaib Benchraka (04:34:03): > @Chouaib Benchraka has joined the channel

2021-07-07

Leo Lahti (02:52:26): > The Bioconductor conference program is now online, and the miaverse talk seems to be on Friday Aug 6 in a session on related topics - there are also (many) other interesting microbiome talks/workshops on this and other days: https://www.airmeet.com/e/3124e6e0-8b3d-11eb-adfc-b1c12ad96800 - Attachment (Airmeet): Bioconductor Annual Conference 2021 | Airmeet > Bioconductor conference highlights current developments within and beyond the Bioconductor project. See more info at https://bioc2021.bioconductor.org

Leo Lahti (11:20:42): > Suggestions for open microbiome data sets that could be used to demonstrate how to import BIOM files into the tse format? Should have biom file (abundance and taxonomy), sample metadata, and phylogenetic tree file.

Chris Fields (13:26:34): > @Leo Lahti would you know if there are plans to have this recorded and posted? I will not be available then. More than happy to pay for cost to access this if needed, well worth it

Sudarshan (14:04:46) (in thread): > https://github.com/mibwurrepo/EdwardsJ_2019_EquineCoreMicrobiome we have two datasets here

Leo Lahti (16:32:38) (in thread): > I think the talks will be recorded and may be even freely available afterwards. At least we can take a note on this and post here the info.

Leo Lahti (16:33:23) (in thread): > Thanks for the comment:grinning:

2021-07-15

Leo Lahti (03:52:24): > We are planning to rename this channel #miaverse to better reflect the current developments. Any comments welcome before this is done..

2021-07-16

Leo Lahti (09:15:37): > We were thinking here that a shorter name for TreeSummarizedExperiment would be quite useful in docs and edu materials, and it is better to stick to one abbreviation than use many. Options include TSE and TreeSE at least. I prefer TreeSE as it is clearer. I would like to use this to refer to the class & data container in most materials. How would this sound? Well, let’s say that I will start using TreeSE in docs from now on unless other discussion arises.

Lori Shepherd (13:54:46): > @Lori Shepherd has joined the channel

Lori Shepherd (14:07:33): > @Lori Shepherd has left the channel

2021-07-23

Leo Lahti (13:46:56): > @Ruizhu HUANG a question for you; when subsetting TreeSE rows, is there a way to collapse the tree at the same time? By default, the original tree seems to stay: > rowTree(GlobalPatterns[1:3,]) > Phylogenetic tree with 19216 tips and 19215 internal nodes. > Tip labels: > 549322, 522457, 951, 244423, 586076, 246140, … > Node labels: > , 0.858.4, 1.000.154, 0.764.3, 0.995.2, 1.000.2, … > Rooted; includes branch lengths.

Batool Almarzouq (15:53:29): > @Batool Almarzouq has joined the channel

2021-07-24

Ruizhu HUANG (11:22:44) (in thread): > That is a very good question. rowTree does not directly output the collapsed tree because > (1) the numeric ID of a node changes after the tree is pruned, which might lead to confusion if users need to do some mapping to the original tree in later steps. > (2) In your example case, it is quite clear how the tree should be collapsed after subsetting because rows are only mapped to leaves. For cases that have rows mapped to internal and leaf nodes of the tree, it becomes difficult to decide how the tree structure should be updated. > A possible way to get the collapsed tree in your example case would be ape::keep.tip. :blush: > > > GP <- GlobalPatterns > > ape::keep.tip(phy = rowTree(GP), tip = rowLinks(GP[1:3, ])$nodeNum) > > Phylogenetic tree with 3 tips and 2 internal nodes. > > Tip labels: > 549322, 522457, 951 > Node labels: > 0.995.2, 1.000.2 > > Rooted; includes branch lengths. >

Leo Lahti (11:46:22) (in thread): > Thanks a lot for this, I will explore a bit. > > Yes, I can see the problem. However, the simple linking structure that does not link to internal nodes is relatively common. It could be worthwhile to consider supporting easy collapsing in such cases? Although there are obvious limitations.

2021-07-25

Ruizhu HUANG (10:18:20) (in thread): > Maybe subsetByLeaf is the function we are talking about here? It updates the tree structure after the subsetting. It was not included in the TreeSE package, but was shown as an example in the section “Custom functions for the TreeSE” in F1000 (https://f1000research.com/articles/9-1246). We could of course add it to the package if that would be useful to users. :blush:

Leo Lahti (11:08:16): > FYI: PR opened to add S3 support for TreeSE in the philr package: https://github.com/jsilve24/philr/pull/17

Leo Lahti (11:19:40) (in thread): > Oh yes, this looks like the right fit. At least I would have immediate use for it.

Ruizhu HUANG (12:28:09) (in thread): > Great, I will polish the code of subsetByLeaf and add it to the package. :blush:

Leo Lahti (18:50:52) (in thread): > Awesome

2021-07-30

Leo Lahti (08:17:43): > Does anyone know how to color points in plotAbundanceDensity? The following does not work and I could not find a way easily: library(mia); library(miaViz); data(GlobalPatterns); tse <- GlobalPatterns; plotAbundanceDensity(tse, abund_values = "relabundance", point_size=1, colour="red"); plotAbundanceDensity(tse, abund_values = "relabundance", point_size=1) + geom_jitter(color="red")

2021-08-03

Yagmur Simsek (05:56:56): > @Yagmur Simsek has joined the channel

Levi Waldron (07:27:41): > FYI everyone, > 1. curatedMetagenomicData 3 (https://bioconductor.org/packages/curatedMetagenomicData/, https://waldronlab.io/curatedMetagenomicData/) now uses TreeSummarizedExperiment for all its taxonomic relative abundance data (now >20,000 samples from 86 studies). It includes phylogenetic trees as rowTree and taxonomic information in rowData, and the vignette recommends use of mia::splitByRanks to populate altExps with taxonomic relative abundances at levels higher than species. Feedback welcome! > 2. @Ludwig Geistlinger and I are hosting a table at BioC2021, August 5 11:30am PT to discuss some challenges relating to taxonomy/phylogeny and other issues coming up from our upcoming bugsigdb.org that we’d love to involve others in. - Attachment (Bioconductor): curatedMetagenomicData > The curatedMetagenomicData package provides standardized, curated human microbiome data for novel analyses. It includes gene families, marker abundance, marker presence, pathway abundance, pathway coverage, and relative abundance for samples collected from different body sites. The bacterial, fungal, and archaeal taxonomic abundances for each sample were calculated with MetaPhlAn3 and metabolic functional potential was calculated with HUMAnN3. The manually curated sample metadata and standardized metagenomic data are available as (Tree)SummarizedExperiment objects. - Attachment (waldronlab.io): Curated Metagenomic Data of the Human Microbiome > The curatedMetagenomicData package provides standardized, curated human microbiome data for novel analyses. It includes gene families, marker abundance, marker presence, pathway abundance, pathway coverage, and relative abundance for samples collected from different body sites. The bacterial, fungal, and archaeal taxonomic abundances for each sample were calculated with MetaPhlAn3 and metabolic functional potential was calculated with HUMAnN3. 
The manually curated sample metadata and standardized metagenomic data are available as (Tree)SummarizedExperiment objects.

Ludwig Geistlinger (07:29:42): > @Ludwig Geistlinger has joined the channel

Leo Lahti (07:57:24): > Thanks @Levi Waldron, this is great and we had already spotted the new TreeSE support in curatedMetagenomicData. Will do our best to join these sessions despite the time differences US/EU.

Leo Lahti (07:58:30): > We will have a short presentation on the miaverse framework in the Bioconductor meeting on Fri Aug 6, 10:30AM PT.

2021-08-05

Leo Lahti (04:12:54): > The miaverse presentation slides for this Bioc meeting are now live: https://f1000research.com/slides/10-748 - Attachment (f1000research.com): Slide: Microbiome data science in the SummarizedExperiment universe has been published by F1000Research. > Read this work by Lahti L, at F1000Research.

2021-08-06

Leo Lahti (08:49:08): > We have a miaverse table in the lounge. Let us meet there after today’s presentation. It was put at 4pm PT because I was unclear about the time when reserving. This is 2am in Finland, I cannot promise to be up but it is Friday night so let’s see:slightly_smiling_face:

Leo Lahti (13:52:27): > Hmm interesting question “pangenomic structures in this context. Do you think we need a different datastructure for those data?”

Leo Lahti (13:52:31): > any comments?

Leo Lahti (14:15:27): > I am in the lounge at least ..:face_with_rolling_eyes:

2021-08-11

Leo Lahti (06:20:01): > FYI all here as well, we will organize a short 3-day PhD level virtual course on microbiome time series analyses on Nov 2-4. We had to switch to a virtual meeting, and there is now space for more participants. There is a nice lineup of speakers, and some hands-on sessions as well. Registration can be done through the website: http://msysbiology.com/microbialtimeseries/

2021-08-13

Leo Lahti (16:54:25) (in thread): > Hi @Ruizhu HUANG - any updates on this..?

2021-08-14

Leo Lahti (18:53:52): > TreeSE support added to philr: https://github.com/jsilve24/philr/pull/17

2021-08-15

Ruizhu HUANG (11:48:54) (in thread): > Hi @Leo Lahti, sorry for the late update. I have just added the new function subsetByLeaf to the Bioconductor package (version 2.0.3 in release and 2.1.4 in devel). This update is now available on GitHub. Bioconductor probably has about a one-day delay.

Leo Lahti (12:14:28) (in thread): > awesome :slightly_smiling_face:

Leo Lahti (12:14:54) (in thread): > Thanks a lot @Ruizhu HUANG - I will soon update our own documentation when this new feature becomes available.

Leo Lahti (17:48:04): > Dear all - if you have received email about removed permissions no worries, this means that you have been assigned the relevant permissions through a team instead, I just wanted to clarify the management structure as the project is growing. If anything missing, let me know.

2021-08-17

Tuomas Borman (13:13:15): > I found a quite interesting feature from SE constructor. Sample names of assay are ordered based on colData. However, the values in the table stay the same –> wrong values to wrong samples > > Can anyone replicate this? I think this is quite weird, and very hard for user to notice - File (HTML): se_bug.html - File (Plain Text): se_bug.Rmd

Vince Carey (14:58:10): > yes, i think i would have expected behavior like > > > assay(se) = assay_mod > Error in `assays<-`(`*tmp*`, withDimnames = withDimnames, ..., value = `*vtmp*`) : > please use 'assay(x, withDimnames=FALSE)) <- value' or 'assays(x, > withDimnames=FALSE)) <- value' when the dimnames on the supplied > assay(s) are not identical to the dimnames on SummarizedExperiment > object 'x' > > in the construction

Vince Carey (14:58:44): > @Hervé Pagès this is worth a look

Hervé Pagès (14:58:47): > @Hervé Pagès has joined the channel

Hervé Pagès (15:02:55): > @Tuomas Borman Can you please open an issue on GitHub (https://github.com/Bioconductor/SummarizedExperiment/issues) with a minimalist reproducible example + sessionInfo(). Thx!

Tuomas Borman (15:11:34): > Sure
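
Until the constructor behaviour is clarified, one defensive habit is to align the sample metadata to the assay columns by name before building the object. The sketch below uses plain base R on toy data; `align_col_data` is a hypothetical helper, not part of SummarizedExperiment.

```r
# Toy assay (taxa x samples) and sample metadata listed in a different order
counts <- matrix(1:6, nrow = 2,
                 dimnames = list(c("otu1", "otu2"), c("s1", "s2", "s3")))
col_data <- data.frame(group = c("c", "a", "b"),
                       row.names = c("s3", "s1", "s2"))

# Hypothetical helper: reorder metadata rows to follow the assay columns,
# and fail loudly if any sample has no metadata
align_col_data <- function(counts, col_data) {
  idx <- match(colnames(counts), rownames(col_data))
  if (anyNA(idx)) {
    stop("Samples missing from colData: ",
         paste(colnames(counts)[is.na(idx)], collapse = ", "))
  }
  col_data[idx, , drop = FALSE]
}

aligned <- align_col_data(counts, col_data)

# After alignment the names agree position by position
names_agree <- identical(rownames(aligned), colnames(counts))
```

Passing `aligned` rather than `col_data` to the constructor should then leave no room for values and sample names to drift apart, whatever the constructor does with mismatched names.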

2021-08-18

Leo Lahti (12:07:41): > Let’s put this here, too: https://twitter.com/AedinCulhane/status/1427988110827212800?s=09 - Attachment (twitter): Attachment > Have you developed a @Bioconductor package… was it a challenge? Do you want to help another would-be developer get their R scripts into a Bioconductor Package… #giveback Expression of interest for mentors form forms.gle/fya5JEArTT5kNE… https://pbs.twimg.com/media/E9E7J-IXsAMBvag.jpg

2021-08-22

Leo Lahti (17:27:48): > In some cases, user may want to group taxa into custom groups; for instance to simplify analysis by grouping rare taxa into a group “Other”. Such grouping can help to reduce dimensionality of the data, while retaining the information on abundances (the proportion of those rare taxa may differ across samples, and this information is potentially valuable). However, such operation will break the tree since the grouped (e.g. rare) taxa may come from different parts of the tree. My immediate impression is that the only way to keep the tree for a coherent analysis would be to subset the data and collapse the tree so that the rare taxa are simply discarded i.e we cannot both group them, and keep a tree. But because discarding rare taxa would discard information I am curious if there has been discussion on this, are there any alternatives that could allow both keeping the tree in some form as well as grouping some of the taxa across different parts of the tree this way .. @Ruizhu HUANG?

2021-08-26

Leo Lahti (03:22:09): > Perhaps not feasible then.

Leo Lahti (03:25:16): > Another question on TreeSE - now the subsetting based on colData would go like tse[, colData(tse)[, "SampleType"] == "Ocean"] but does something prevent us from having a simplification like tse[, SampleType == "Ocean"]?

Leo Lahti (03:27:35): > This may be problematic if row/colData refers to internal tree nodes, but if this were defined on leaves it should work. Too error prone?
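
The proposed shorthand relies on evaluating an unquoted condition inside the sample metadata. A minimal base-R sketch of that mechanism, on a plain matrix and data.frame standing in for the assay and colData, is below; `subset_cols` is a hypothetical helper and not an existing TreeSE method. Note that `tse[, tse$SampleType == "Ocean"]` is already a shorter form, since `$` on a SummarizedExperiment reaches into colData.

```r
# Toy stand-ins: an assay matrix and its sample metadata (colData)
counts <- matrix(1:8, nrow = 2,
                 dimnames = list(c("otu1", "otu2"), paste0("s", 1:4)))
col_data <- data.frame(SampleType = c("Ocean", "Soil", "Ocean", "Feces"),
                       row.names = colnames(counts))

# Hypothetical helper: evaluate an unquoted condition inside the metadata,
# falling back to the caller's environment for other variables
subset_cols <- function(mat, meta, condition) {
  keep <- eval(substitute(condition), meta, parent.frame())
  stopifnot(is.logical(keep), length(keep) == ncol(mat))
  mat[, keep, drop = FALSE]
}

ocean <- subset_cols(counts, col_data, SampleType == "Ocean")
```

The usual caveat with this style is name clashes: a variable in the calling environment with the same name as a metadata column is silently shadowed, which may be part of why the explicit form is the default.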

Hervé Pagès (06:12:27): > @Hervé Pagès has left the channel

2021-09-12

Leo Lahti (03:31:43): > We just saw this one, thanks to @Sudarshan for the tip: https://github.com/YuLab-SMU/MicrobiotaProcess - I think from @Guangchuang Yu? Based on another tree-based derivative of SummarizedExperiment and available via Bioconductor: https://www.bioconductor.org/packages/devel/bioc/html/MicrobiotaProcess.html

Levi Waldron (22:25:26): > @channel I am an organizer of the Microbiome Virtual International Forum, a monthly 3h mini-conference with our first meeting this Tuesday at 10am. We have a lineup of speakers from all over the world, and registration is free (see the program at https://www.microbiome-vif.org/program/). There are currently >400 registrants, but fortunately we don’t have a room size limit, so please come join, and also consider submitting an abstract for a future forum. You can follow on Twitter at https://twitter.com/MicrobiomeVIF or email at https://groups.google.com/g/mvif. - Attachment (microbiome-vif.org): Program, > PROGRAM Next Meeting: Atlantic: SEPTEMBER 14th 2021 Pacific: SEPTEMBER 16th 2021 Premiere SessionAtlantic Time: 14th September 2021: 10. - Attachment (twitter.com): Microbiome Virtual International Forum (@MicrobiomeVIF) | Twitter > The latest Tweets from Microbiome Virtual International Forum (@MicrobiomeVIF). MVIF, free recurring bite-sized alternative to multi-day microbiome conferences. Sept 2021 (online) registration: https://t.co/DBQNdyfjzS…. Worldwide

2021-09-16

Chris Fields (10:58:25): > @Levi Waldron are you announcing these regularly here (or other places like Twitter, etc)? I couldn’t attend this but def. interested in attending future ones

Levi Waldron (20:07:21): > Most reliable source for all information including abstract deadlines etc is https://twitter.com/microbiomevif and the web site, but I’ll try to announce here when new programmes are posted. - Attachment (twitter.com): Microbiome Virtual International Forum (@MicrobiomeVIF) | Twitter > The latest Tweets from Microbiome Virtual International Forum (@MicrobiomeVIF). MVIF, free recurring bite-sized alternative to multi-day microbiome conferences. Sept 2021 (online) registration: https://t.co/DBQNdyfjzS…. Worldwide

2021-09-25

Haichao Wang (07:20:37): > @Haichao Wang has joined the channel

Mikey C (19:04:51): > @Mikey C has joined the channel

2021-10-11

David Mateo García (13:50:53): > @David Mateo García has joined the channel

David Mateo García (14:00:52): > Hi all. My name is David, and my colleague Maria Ángeles and I are testing miaverse. We’re students of the Radboud Microbiota’s Course. We’re trying to generate an SE object with our own data (counts, tax and samples), using this code: > > se <- SummarizedExperiment(assays = list(counts = counts), > colData = samples, > rowData = tax) > > All of our data are in data.frame format. When we used the code above, the following error appears: Error in validObject(.Object) : invalid class “SummarizedExperiment” object: ‘x@assays’ is not parallel to ‘x’ > We would really appreciate it if you could help us with this problem. > > Thanks beforehand, > David and Maria Ángeles

Vince Carey (14:09:19): > I sure don’t understand the error message. Can you tell us a little more, like dim(counts), dim(samples), dim(tax), and also provide the result of sessionInfo() after the error is triggered? Thanks!

2021-10-19

Maria Angeles Martinez Rodriguez (07:08:27): > @Maria Angeles Martinez Rodriguez has joined the channel

Maria Angeles Martinez Rodriguez (07:23:48): > Dear Vince, our dim, error (with traceback) and sessionInfo() are below, thank you beforehand: > dim(counts) 6285 1312 > dim (samples) 1312 106 > dim (tax) 6286 7 Error in validObject(.Object) : invalid class “SummarizedExperiment” object: ‘x@assays’ is not parallel to ‘x’ > 7. > stop(msg, “:”, errors, domain = NA) > 6. > validObject(.Object) > 5. > initialize(value, …) > 4. > initialize(value, …) > 3. > new(“SummarizedExperiment”, NAMES = names, elementMetadata = rowData, colData = colData, assays = assays, metadata = as.list(metadata)) > 2. > new_SummarizedExperiment(assays, ans_rownames, rowData, colData, metadata) > 1. > SummarizedExperiment(assays = list(counts = counts), colData = samples, rowData = tax) sessionInfo() R version 4.1.0 (2021-05-18) > Platform: x86_64-w64-mingw32/x64 (64-bit) > Running under: Windows 10 x64 (build 19042) > > Matrix products: default > > locale: > [1] LC_COLLATE=Catalan_Spain.1252 LC_CTYPE=Catalan_Spain.1252 LC_MONETARY=Catalan_Spain.1252 > [4] LC_NUMERIC=C LC_TIME=Catalan_Spain.1252 > > attached base packages: > [1] parallel stats4 stats graphics grDevices utils datasets methods base > > other attached packages: > [1] tibble_3.1.2 readr_2.0.0 DESeq2_1.32.0 > [4] dplyr_1.0.7 miaViz_1.0.1 ggraph_2.0.5 > [7] scater_1.20.1 ggplot2_3.3.5 scuttle_1.2.0 > [10] mia_1.0.6 TreeSummarizedExperiment_2.0.2 Biostrings_2.60.1 > [13] XVector_0.32.0 SingleCellExperiment_1.14.1 SummarizedExperiment_1.22.0 > [16] Biobase_2.52.0 GenomicRanges_1.44.0 GenomeInfoDb_1.28.1 > [19] IRanges_2.26.0 S4Vectors_0.30.0 BiocGenerics_0.38.0 > [22] MatrixGenerics_1.4.0 matrixStats_0.59.0 phyloseq_1.36.0 > > loaded via a namespace (and not attached): > [1] plyr_1.8.6 igraph_1.2.6 lazyeval_0.2.2 > [4] splines_4.1.0 BiocParallel_1.26.1 digest_0.6.27 > [7] foreach_1.5.1 htmltools_0.5.1.1 viridis_0.6.1 > [10] fansi_0.5.0 magrittr_2.0.1 memoise_2.0.0 > [13] ScaledMatrix_1.0.0 cluster_2.1.2 DECIPHER_2.20.0 > [16] tzdb_0.1.2 
annotate_1.70.0 graphlayouts_0.7.1 > [19] colorspace_2.0-2 blob_1.2.2 ggrepel_0.9.1 > [22] xfun_0.24 crayon_1.4.1 RCurl_1.98-1.3 > [25] jsonlite_1.7.2 genefilter_1.74.0 survival_3.2-11 > [28] iterators_1.0.13 ape_5.5 glue_1.4.2 > [31] polyclip_1.10-0 gtable_0.3.0 zlibbioc_1.38.0 > [34] DelayedArray_0.18.0 BiocSingular_1.8.1 Rhdf5lib_1.14.2 > [37] scales_1.1.1 DBI_1.1.1 Rcpp_1.0.7 > [40] viridisLite_0.4.0 xtable_1.8-4 decontam_1.12.0 > [43] tidytree_0.3.4 bit_4.0.4 rsvd_1.0.5 > [46] httr_1.4.2 RColorBrewer_1.1-2 ellipsis_0.3.2 > [49] pkgconfig_2.0.3 XML_3.99-0.6 farver_2.1.0 > [52] locfit_1.5-9.4 utf8_1.2.1 tidyselect_1.1.1 > [55] rlang_0.4.11 reshape2_1.4.4 AnnotationDbi_1.54.1 > [58] munsell_0.5.0 tools_4.1.0 cachem_1.0.5 > [61] DirichletMultinomial_1.34.0 generics_0.1.0 RSQLite_2.2.7 > [64] ade4_1.7-17 evaluate_0.14 biomformat_1.20.0 > [67] stringr_1.4.0 fastmap_1.1.0 yaml_2.2.1 > [70] ggtree_3.0.2 knitr_1.33 bit64_4.0.5 > [73] tidygraph_1.2.0 purrr_0.3.4 KEGGREST_1.32.0 > [76] nlme_3.1-152 sparseMatrixStats_1.4.0 aplot_0.0.6 > [79] compiler_4.1.0 rstudioapi_0.13 png_0.1-7 > [82] beeswarm_0.4.0 treeio_1.16.1 geneplotter_1.70.0 > [85] tweenr_1.0.2 stringi_1.7.3 lattice_0.20-44 > [88] Matrix_1.3-4 vegan_2.5-7 permute_0.9-5 > [91] multtest_2.48.0 vctrs_0.3.8 pillar_1.6.2 > [94] lifecycle_1.0.0 rhdf5filters_1.4.0 BiocManager_1.30.16 > [97] BiocNeighbors_1.10.0 data.table_1.14.0 bitops_1.0-7 > [100] irlba_2.3.3 patchwork_1.1.1 R6_2.5.0 > [103] gridExtra_2.3 vipor_0.4.5 codetools_0.2-18 > [106] MASS_7.3-54 assertthat_0.2.1 rhdf5_2.36.0 > [109] withr_2.4.2 GenomeInfoDbData_1.2.6 hms_1.1.0 > [112] mgcv_1.8-36 grid_4.1.0 beachmat_2.8.0 > [115] tidyr_1.1.3 rmarkdown_2.11 DelayedMatrixStats_1.14.0 > [118] rvcheck_0.1.8 ggnewscale_0.4.5 ggforce_0.3.3 > [121] ggbeeswarm_0.6.0

Vince Carey (08:50:01): > I would question why there are 6286 rows in tax but 6285 in counts. This would seem to be exactly the condition identified as “not parallel” although I would agree the terminology is unclear.

2021-10-22

Maria Angeles Martinez Rodriguez (03:40:38): > Dear Vince, > we totally agree with you; the number of observations in counts and tax must be equal. > We wanted to merge the two data frames (counts and metadata) by the sample_id variable to keep only the data that is relevant for us (removing the extra information in counts, such as positive controls and other non-relevant information). > We then realized that when we use this code: count_match <- counts_predimed[-1571, metadata_match_def$sample_id] in order to reduce the number of variables (from 1571 to 1312), we get the 1312 variables but we lose 1 observation: instead of 6286 we get 6285. > Do you have any idea why? How is this possible? > Thank you beforehand > MA

Leo Lahti (07:05:13) (in thread): > If you can create a count matrix and metadata data.frame (colData) that have the same dimensions and sampleIDs, then I think you should be able to construct the SE object.

Leo Lahti (07:11:50) (in thread): > The samples can be identified based on the row and column names: > > # Load some example data > library(mia) > data(GlobalPatterns) > counts <- assay(GlobalPatterns, "counts") > samples <- colData(GlobalPatterns) > > # Pick random sample subset > s <- sample(rownames(samples), 5) > > # Construct SE where we have same > # samples for abundance table (counts) > # and sample metadata (colData) > se <- SummarizedExperiment(assays = list(counts = counts[,s]), colData = samples[s, ]) >

Leo Lahti (07:12:37) (in thread): > If you can make sure that the samples match between the input data tables, then the problem might be solved?
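
A possible explanation for the lost observation, sketched on toy data: in R, a negative index in the row position of `x[i, j]` drops that row, so `counts_predimed[-1571, ...]` would remove row 1571 while the column index selects the samples. The object names below are hypothetical stand-ins, not the data from the thread.

```r
# Toy stand-in for the count table: 4 features (rows) x 3 samples (columns)
counts_toy <- matrix(seq_len(12), nrow = 4,
                     dimnames = list(paste0("otu", 1:4), paste0("s", 1:3)))
keep_samples <- c("s1", "s3")  # hypothetical sample IDs to retain

# A negative row index removes that row in addition to selecting the columns
with_negative_row <- counts_toy[-2, keep_samples]

# Leaving the row position empty keeps every row
columns_only <- counts_toy[, keep_samples]

dim(with_negative_row)  # one row fewer than counts_toy
dim(columns_only)       # same number of rows as counts_toy
```

If the intent was only to select samples, leaving the row position empty keeps all features, so counts and tax stay parallel.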

2021-11-08

Quang Nguyen (10:55:14): > @Quang Nguyen has joined the channel

2021-11-09

Levi Waldron (09:44:20): > @here the Atlantic session of microbiome-vif.org n. 3 is starting in ~20 minutes, sorry for the late notice! The session for Pacific time zones will follow in two days. The keynote today is by Frederic Bushman, “The human virome in health and disease”, and there are a number of other contributed talks and research highlights / open-access paper highlights (program at https://www.microbiome-vif.org/program/). Registration is free: https://hopin.com/events/microbiome-vif-n-3 We do have a newsletter now if you want to be informed of upcoming events and abstract registration deadlines (would love to see Bioconductor-related talks!) - Attachment (microbiome-vif.org): Program, > PROGRAM Next Meeting: Atlantic: NOVEMBER 9th 2021 PACIFIC: NOVEMBER 11th 2021 Premier Session Atlantic Time: 9th NOVEMBER:  10. 00am New York 15 - Attachment (hopin.com): Microbiome-VIF n.3 - Nov 09 | Hopin > Get tickets to Microbiome-VIF n.3, taking place 11/09/2021 to 11/10/2021. Hopin is your source for engaging events and experiences.

2021-11-26

Francesc Català (06:43:47): > @Francesc Català has left the channel

2021-12-02

Maria Angeles Martinez Rodriguez (09:30:15): > hello! Do you know if there is any code equivalent to prune_taxa in miaverse?

Leo Lahti (13:21:01): > How about this: > > library(mia) > # Example data > data(GlobalPatterns, package="mia") > tse <- GlobalPatterns > # Pick "taxa" (the level present in the data) > taxa <- rownames(tse)[1:3] > # Subsetting (as in phyloseq::prune_taxa) > tse.pruned <- tse[taxa, ]

2021-12-06

Tuomas Borman (09:32:50): > Hi! > > Currently, if I have understood correctly (@Ruizhu HUANG), tse[1:10, ] updates rowLinks but it does not remove the extra leaves from rowTree. In order to subset the data including rowTree, you need to do something like this: > > library(mia) > data(GlobalPatterns) > tse <- GlobalPatterns > # Subset > tse <- tse[1:5, ] > # rowTree includes 19216 tips. > tse <- subsetByLeaf(tse, rowLeaf = rownames(tse)) > # rowTree includes 5 tips > > For instance, phyloseq::filter_taxa also subsets the phylo tree. > > Because of that, the subsetting function of TreeSE could have an additional option for also subsetting rowTree (and colTree). For example, it could be like this: tse[1:5, , subsetTree = FALSE] What do you all think? Is this something that could be implemented? > > -Tuomas

Leo Lahti (11:15:09): > To me this would seem useful.

Leo Lahti (11:18:08): > This feature could fit TreeSummarizedExperiment pkg

2021-12-12

Yagmur Simsek (14:06:26): > Hi! @FelixErnst I just noticed there is a build error of the mia package in Bioconductor, wanted to let you know in case you haven’t noticed.

2021-12-14

Asiye (04:51:18): > @Asiye has joined the channel

Megha Lal (08:23:10): > @Megha Lal has left the channel

2021-12-27

Leo Lahti (13:48:06): > Is there a way to split a SummarizedExperiment object by sample categories? > > This one splits by features (rows), not by samples (cols): split(se, colData(se)[, field]) and the functions that I find with search engines are all for splitting by feature categories.

Leo Lahti (13:50:24): > For instance, if colData(se)[, "subject"] specifies the subject ID, what would be the best way to split the SE object by subject? Something easier than: > > spl <- split(colnames(se), colData(se)[, "subject"]) > se_list <- lapply(spl, function (i) {se[, i]})

2021-12-28

Leo Lahti (05:01:05): > Time to consider EuroBioC submissions for workshops, posters, talks: https://eurobioc2022.bioconductor.org/ (submission January 15 - February 10, 2022).

2022-01-03

Yagmur Simsek (15:44:04): > Is there a way to combine TreeSE objects other than cbind()?

Leo Lahti (16:43:29): > For what purpose?

Yagmur Simsek (18:23:36): > A TreeSE object is split by subject ID and each subject is stored as a TreeSE in a list. I want to put all elements of this list back together. cbind() does the job but also throws an error: > > Error in (function (classes, fdef, mtable) : > unable to find an inherited method for function 'bindCOLS' for signature "TreeSummarizedExperiment" >

2022-01-04

Leo Lahti (03:20:20): > Do you have reproducible example with some demo data set?

Yagmur Simsek (03:51:36): > You can see the example here in the function. However, the function is not in the main branch, so it won’t be available with library(miaTime): https://github.com/microbiome/miaTime/blob/devel/R/getTimeDivergence.R

Leo Lahti (13:26:56): > is there a problem with cbind?

Yagmur Simsek (14:51:39): > It works within the function and does the job, however devtools::check() gives an error

Leo Lahti (16:57:46): > Can you use SEtools::mergeSEs(se_list) instead of cbind and also add SEtools to Imports in DESCRIPTION

Leo Lahti (16:57:57): > seems to solve the issue
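
The split-by-subject and recombine steps discussed above can be sketched on a plain matrix, which uses the same column indexing as se[, i]. This is only an illustration with made-up data; it does not reproduce the bindCOLS error or the SEtools::mergeSEs behaviour.

```r
# Toy stand-in for assay(se): 3 features x 6 samples (hypothetical data)
counts <- matrix(1:18, nrow = 3,
                 dimnames = list(paste0("f", 1:3), paste0("s", 1:6)))
# Stand-in for colData(se)[, "subject"]
subject <- c("A", "B", "A", "C", "B", "A")

# Split the sample names by subject, then subset the columns per group
spl <- split(colnames(counts), subject)
mat_list <- lapply(spl, function(i) counts[, i, drop = FALSE])

# Recombine; the columns now come grouped by subject
combined <- do.call(cbind, mat_list)

# Restore the original sample order by indexing with the original names
restored <- combined[, colnames(counts), drop = FALSE]
```

Note that recombining changes the column order, so reindexing by the original sample names is worth doing before comparing with the input.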

2022-01-10

Levi Waldron (12:32:00): > FYI the program for this week’s Microbiome Virtual International Forum, tomorrow for Atlantic time zones and Thursday for Pacific time zones. Register for free at https://www.microbiome-vif.org/. - Attachment (microbiome-vif.org): Home, > WELCOME TO MICROBIOME VIRTUAL INTERNATIONAL FORUM! January 11th 2022 MICROBIOME Microbiome Virtual International Conference (MVIF) is a recurring… - File (PNG): MVIF January program .png

2022-01-19

Maria Angeles Martinez Rodriguez (03:39:24): > Hi! We are working with miaverse and have converted a phyloseq object into a TSE. Then we filtered all the data with this code:

Maria Angeles Martinez Rodriguez (03:39:46): > #filtering script > otu_taxa_genus <- aggregate(otu_taxa1[,-1], by=list(otu_taxa1$taxa_1), sum) > otu_taxa_prev <- otu_taxa_genus[which(rowSums(otu_taxa_genus>0)>=ncol(otu_taxa_genus[,-1])*.1),] > otu_taxa_prev <- otu_taxa_prev[,which(colSums(otu_taxa_prev>0)>=nrow(otu_taxa_prev)*.1)] > otu_kept <- otu_taxa1[otu_taxa1$taxa_1 %in% otu_taxa_prev$Group.1,] > otu_kept <- otu_kept[,colnames(otu_kept) %in% colnames(otu_taxa_prev)]

Maria Angeles Martinez Rodriguez (03:43:48): > Now, our problem is that we want to join the filtered data with the previous TSE. To do that, we tried code that is used in phyloseq, but it is not working: > # match the otus and samples filtered in the original ps (not clr transformed) > filtered_tse <- prune_taxa(rownames(otu_kept), tse) # we tried tse instead of ps (phyloseq) > filtered_tse <- prune_samples(colnames(otu_kept), filtered_tse) # we tried tse instead of ps (phyloseq) > Could you help us? Thanks!! :innocent:

Leo Lahti (04:34:12): > Is the question how to subset features and samples for TreeSE objects?

Leo Lahti (04:36:30): > One problem that I see is that you are here trying to apply phyloseq tools to TreeSE objects. But phyloseq tools are not applicable to TreeSE objects. You need to operate on the TreeSE object directly with the appropriate tools, or you can convert TreeSE back to a phyloseq object and then use phyloseq tools. To operate directly on the TreeSE object, you could do tse[rownames(otu_kept), colnames(otu_kept)]

Maria Angeles Martinez Rodriguez (05:08:11) (in thread): > thanks for your quick answer, we are going to try it!

Maria Angeles Martinez Rodriguez (05:21:40) (in thread): > we got it!! thank you so much!!!

2022-01-20

David Mateo García (03:50:34) (in thread): > Our main goal is to compare miaverse vs phyloseq, so we were looking for the equivalent “prune” function in miaverse. We saw that miaverse is easier than phyloseq, so thank you again for the code.

2022-01-22

Leo Lahti (09:56:52) (in thread): > If you also want to collapse the tree you can try TreeSummarizedExperiment::subsetByLeaf and that typically works, unless you have a complicated tree

Leo Lahti (09:59:15) (in thread): > The [] method keeps the original tree because the right collapse procedure may depend on the tree structure and it is difficult to provide a general solution. But this applies to both data formats, phyloseq and TreeSE

2022-02-03

Maria Angeles Martinez Rodriguez (06:21:55) (in thread): > Hi @Leo Lahti! We are working with miaverse. We tried again the code that you gave us, but this time we first agglomerated by "genus", so after that we need to bring our otu_kept into tse again: tse_agg <- tse[rownames(otu_kept_tse_agg), colnames(otu_kept_tse_agg)]. However, this did not work; we obtained this error: “Error: 10 specified rows can’t be found”.

Maria Angeles Martinez Rodriguez (06:22:19) (in thread): > Thanks beforehand!:slightly_smiling_face:

Leo Lahti (11:04:42) (in thread): > hmm have you checked what is in rownames(otu_kept_tse_agg)?

Leo Lahti (11:05:24) (in thread): > .. and what exact aggregation command you have used?

Maria Angeles Martinez Rodriguez (11:13:42) (in thread): > Hi! We used tse_agg <- agglomerateByRank(tse, rank = "Genus") > In phyloseq, people use this code: ps_agg <- aggregate_taxa(ps, level = "Genus", verbose = TRUE) > After agglomeration by genus, we follow some steps: > 1. # extract data frame: > counts_tse_agg <- as.data.frame(tse_agg@assays@data@listData[["counts"]]) > taxa_tse_agg <- as.data.frame(rowData(tse_agg)) # tse_agg > metadata_tse_agg <- data.frame(colData(tse_agg)) > 2. # remove "Species" column from taxa > taxa_tse_agg$Species <- NULL

Maria Angeles Martinez Rodriguez (11:24:20) (in thread): > 3. # import metadata > metadata_BL <- read_excel("C:/Users/29213673-Q.PDI/Desktop/MICROBIOTA_PPLUS/FILTRADO/metadata_z.xlsx") > metadata_BL1 = column_to_rownames(metadata_BL, var="sample_ID_otu") # convert the variable (sample_ID_OTU) to rownames > 4. # generate sample_ID_otu column > metadata_BL1$sample_ID_otu <- metadata_BL$sample_ID_otu > 5. # match names in otu table (counts_clr) with names in metadata_z (metadata BL1) > otu_BL_tse_agg = counts_tse_agg[-1571,metadata_BL1$sample_ID_otu] > 6. # create a column with OTU_IDs > otu_BL1_tse_agg <- rownames_to_column(otu_BL_tse_agg, var = "OTU_ID") # the OTUs move from rownames to a variable (column) > taxa_1_tse_agg <- rownames_to_column(taxa_tse_agg, var = "OTU_ID") # the OTUs move from rownames to a variable (column) > 7. # merge columns in taxa and add a separator > taxa_merged_tse_agg <- taxa_1_tse_agg %>% unite(taxa_1_tse_agg, Domain, Phylum, Class, Order, Family, Genus, sep="_", remove=T) # we will have only two variables now > > 8. # merge otu and taxa_merged > otu_taxa_tse_agg <- merge(taxa_merged_tse_agg, otu_BL1_tse_agg, by="OTU_ID") > otu_taxa1_tse_agg = column_to_rownames(otu_taxa_tse_agg, var = "OTU_ID") > 9. # finally, the filtering script > otu_taxa_genus_tse_agg <- aggregate(otu_taxa1_tse_agg_2[,-1], by=list(otu_taxa1_tse_agg_2$taxa_1_tse_agg), sum) > otu_taxa_prev_tse_agg <- otu_taxa_genus_tse_agg[which(rowSums(otu_taxa_genus_tse_agg>0)>=ncol(otu_taxa_genus_tse_agg[,-1])*.1),] > otu_taxa_prev_tse_agg <- otu_taxa_prev_tse_agg[,which(colSums(otu_taxa_prev_tse_agg>0)>=nrow(otu_taxa_prev_tse_agg)*.1)] > otu_kept_tse_agg <- otu_taxa1_tse_agg_2[otu_taxa1_tse_agg_2$taxa_1_tse_agg %in% otu_taxa_prev_tse_agg$Group.1,] > otu_kept_tse_agg <- otu_kept_tse_agg[,colnames(otu_kept_tse_agg) %in% colnames(otu_taxa_prev_tse_agg)] > > Up to step 9, everything runs well; the problem comes when we want to insert the otu_kept_tse_agg data into our tse_agg with this code: tse_agg <- tse[rownames(otu_kept_tse_agg), colnames(otu_kept_tse_agg)]. However, this did not work; we obtained this error: “Error: 10 specified rows can’t be found”.

Maria Angeles Martinez Rodriguez (11:25:09) (in thread): > sorry for a lot of text, but I wanted to try to explain it well:slightly_smiling_face:thank you

Leo Lahti (14:54:25) (in thread): > The above code is mainly for phyloseq but your problem is about TreeSE. It would be informative to see how you have formed the object tse. You cannot really fluently mix phyloseq and TreeSE methods; they are not interchangeable. You can convert TreeSE to phyloseq, and use phyloseq tools, or convert phyloseq to TreeSE, and use TreeSE tools. My impression is that you have now formed some taxonomic grouping with phyloseq, and you are trying to apply it to a TreeSE object that does not correspond to the same taxonomic level because it has not been aggregated. But I am not sure because the code above does not tell how the object tse has been created. > > It would be easier to debug if you sent the data object and code with a reproducible error. Or at least show the output for head(rownames(otu_kept_tse_agg)) and for head(rownames(tse)) and also mean(rownames(otu_kept_tse_agg) %in% rownames(tse))

2022-02-05

Leo Lahti (04:22:53) (in thread): > If you can verbally specify the task that you need to carry out, starting from the original data, we can see if we could suggest similar example code.
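
The row-name checks suggested in this thread can be sketched on plain character vectors. The two vectors below are hypothetical stand-ins for rownames(tse) and rownames(otu_kept_tse_agg); the point is only to show how to list the names that cannot be found before subsetting.

```r
# Hypothetical stand-ins for rownames(tse) and rownames(otu_kept_tse_agg)
tse_rows  <- c("OTU1", "OTU2", "OTU3", "OTU4")
kept_rows <- c("OTU2", "Genus:Bacteroides", "OTU4")

# Fraction of requested rows that exist in the target object
found <- kept_rows %in% tse_rows
share_found <- mean(found)

# Names that would trigger a "specified rows can't be found" style error
missing_rows <- setdiff(kept_rows, tse_rows)

# Subsetting only with the shared names avoids the error,
# but silently drops the rows listed in missing_rows
shared_rows <- intersect(kept_rows, tse_rows)
```

If missing_rows is not empty, the two objects were probably built at different taxonomic levels or with different naming schemes, which is the situation described above.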

2022-02-25

Erwann SCAON (09:50:08): > @Erwann SCAON has joined the channel

Erwann SCAON (09:59:37): > Hello, I’m new to this channel. > I’m looking for some kind of “data container comparison”: phyloseq vs TreeSummarizedExperiment. > Do you have some helpful links (I’ve found this one for the moment: https://microbiome.github.io/OMA/data-introduction.html#background)? > > In my team we will soon start developing internal functions for microbiome analysis (custom ones for specific needs) and we are trying to evaluate what our data container should be (ease of use, integration, scalability, maintainability, documentation, etc.)

Leo Lahti (10:03:43): > There are no comprehensive comparisons between these, and to some extent they can serve different purposes. It might be a good idea to write a summary about this. Are there some particular things you are into?

Leo Lahti (10:06:37): > Both have different strengths. The ecosystem of tools is broader for phyloseq as it has been around for a longer time. The TreeSE is based on more recently published container technologies, and can benefit from developments in the broader SummarizedExperiment space (used in e.g. Single Cell). The original publications of SummarizedExperiment and TreeSummarizedExperiment discuss some computational benefits in general.

Leo Lahti (10:08:41): > I used to develop tools for phyloseq for about 10 years, and still maintain the microbiome R/Bioc package. But we have switched all new development efforts to the TreeSE framework since last year. There are bridges between the two containers; you can convert between these in most common use cases.

Leo Lahti (10:09:24): > There is more support for multiple assays and multi-omics through TreeSE I think.

2022-03-05

Giulio Benedetti (15:16:55): > @Giulio Benedetti has joined the channel

2022-03-30

sarah i (10:03:56): > @sarah i has joined the channel

sarah i (16:28:58): > Hi there! I messaged about doing functional analysis on github, i’m stuck again with doing the beta diversity analyses, really sorry!

sarah i (16:29:49): > I ran > > se <- runMDS(se, FUN = vegan::vegdist, name = "MDS_BC", exprs_values = "counts") > > exactly as in https://microbiome.github.io/OMA/beta-diversity.html#estimating-beta-diversity and I get this. > > Error in (function (classes, fdef, mtable) : > unable to find an inherited method for function 'reducedDim<-' for signature '"SummarizedExperiment", "character"' > > Is my data import still incorrect? - Attachment (microbiome.github.io): Chapter 9 Beta Diversity | Orchestrating Microbiome Analysis > Chapter 9 Beta Diversity | Orchestrating Microbiome Analysis

Leo Lahti (16:45:08): > Hmm - - any chance you could share the se object with us?

Leo Lahti (16:45:30): > I do not immediately see where the problem would be if the object is expected to be ok.

sarah i (16:51:52): > Sure, but not sure how to export an SE object, this is what’s written though: > > class: SummarizedExperiment > dim: 1250 7 > metadata(0): > assays(1): counts > rownames(1250): K00001 K00002 ... K22622 K22897 > rowData names(7): AT.2 PS.1 ... TN.2 TN.3 > colnames(7): AT-2 PS-1 ... TN-2 TN-3 > colData names(3): Species Site bray >

Leo Lahti (17:42:45): > I will come back to it tomorrow but if you can paste here the outputs from dim(rowData(se)) and dim(colData(se)) and dim(assay(se,"counts")) that might help.

Leo Lahti (17:43:10): > The se you could just save with saveRDS(se, file="se.rds") ?

2022-03-31

Tuomas Borman (03:48:55): > Hi, > > the problem might be that SE does not have a reducedDim slot > > Does this work? > > se <- as(se, "TreeSummarizedExperiment") > > se <- runMDS(se, FUN = vegan::vegdist, name = "MDS_BC", exprs_values = "counts") > > If so, then we have to fix this bug

sarah i (09:53:58): > @Leo Lahti here is the output: > > > dim(rowData(se)) > [1] 1250 7 > > dim(colData(se)) > [1] 7 2 > > dim(assay(se,"counts")) > [1] 1250 7 > > @Tuomas Borman that script worked to runMDS, but when I try to plot it > > se <- runMDS(se, FUN = vegan::vegdist, name = "MDS_BC", exprs_values = "counts", keep_dist = TRUE) > > plotMDS(se, "MDS") > > I get this error: > > Error in value[[3L]](cond) : > invalid subscript 'type' in 'reducedDim(<TreeSummarizedExperiment>, type="character", ...)': > 'MDS' not in 'reducedDimNames(<TreeSummarizedExperiment>) >

sarah i (09:55:06): > i’m also wondering if i need to make it a TreeSummarizedExperiment if i don’t have a phylogenetic tree? would the package still work if my data is a SummarizedExperiment?

Leo Lahti (10:59:06): > It should work as a SE but it is possible that this is currently broken and should be fixed. This is what Tuomas referred to. Seems more likely now, I would wait for his input.

Tuomas Borman (11:23:09): > 1 The thing is that SummarizedExperiment does not have a reducedDim slot, whereas TreeSummarizedExperiment does. You can check that by printing the object; you can’t see reducedDimNames in SE. > > These run* functions store the result in the reducedDim slot. That is why you got the first error. (In my opinion the error should say “no function for SE”, i.e. the method’s input type is wrong.) The function works only for objects that have a reducedDim slot. > > There are calculate* functions (e.g., calculateMDS) that do not store the result in reducedDim. Their output is the MDS result itself. You can use those if you do not want to convert your data into TreeSE. > > However, then you cannot use the plotReducedDim/plotMDS functions, which are quite handy, I think…

Tuomas Borman (11:27:42): > 2 You get that error because you do not have “MDS” results in reducedDim. plotMDS tries to find results with that name, but you have specified the name “MDS_BC”, so you cannot use plotMDS. However, that does not cause any problems. plotMDS(tse, ...) is actually a wrapper for plotReducedDim(tse, dimred = "MDS"). So try to use plotReducedDim(se, dimred = "MDS_BC")

Tuomas Borman (11:31:51): > 3 In order to use these wrappers, you need reducedDim, so you have to convert the data into TreeSE. > > There are only a few functions that work only for TreeSE (like runMDS, which requires reducedDim, but you can still use calculateMDS, which also works for SE…). So the package works for SE objects as well. Converting data into TreeSE does not cause any harm, though.

Leo Lahti (11:52:09): > Indeed, you can have the data in the TreeSE format even if you do not have tree. In order to take full advantage of the available methods.

sarah i (14:11:26): > Thank you so much for the thorough explanation everyone Leo and Tuomas, I’m pretty sure that solves basically everything I want. Regarding MDS_BC, is that like, using Bray Curtis matrix in MDS plot? Is it possible to also use Unweighted UniFrac values on a PCoA table?

sarah i (14:14:58): > wait, sorry, unweighted unifrac can only be calculated when there’s a phylogenetic tree, right?

Tuomas Borman (15:17:48): > That example in OMA is actually a little bit misleading. name can be anything; it is just the name of your analysis. The result is stored in reducedDim under that name. > > If name is not specified, name = "MDS" and you can use plotMDS without this hassle. > > With FUN you specify the distance function. In this case (FUN = vegan::vegdist) it is vegdist. With method you specify the dissimilarity that is used; the argument is passed on to vegdist in this case. Because the default in vegdist is "bray", runMDS uses Bray-Curtis by default when FUN = vegdist. > > So, this would be a better way to calculate Bray-Curtis MDS: runMDS(se, FUN = vegan::vegdist, method = "bray", abund_values = "counts")

Tuomas Borman (15:19:16): > You can calculate UniFrac distances and create a plot based on those as well: > > runMDS(esophagus, FUN = calculateUnifrac, name = "Unifrac", > tree = rowTree(esophagus), > exprs_values = "counts", > ntop = nrow(esophagus)) > > But as you said, you need a phylogenetic tree.

Leo Lahti (17:06:23) (in thread): > should we update the OMA example if it is misleading..?

Leo Lahti (17:14:17) (in thread): > .. andrunMDSis from thescaterpkg and it is usingexprs_valuesas argument name instead ofabund_valuesat least based onhelp(runMDS)and OMA examples? > > This is of course a bit out of sync with methods inmia, e.g.runNMDSis usingabund_values. > > We cannot changescater but we might like to consider providing a wrapper that allows harmonizing the argument names to avoid confusion.. not really ideal either but it does not seem like a good thing to have functions like runMDS and runNMDS that do the same thing but have similar but different argument names for the same thing..

Leo Lahti (17:15:23) (in thread): > This should be resolved somehow, and OMA examples and explanations possibly clarified so that these things become more clear.

2022-04-01

Tuomas Borman (01:38:40) (in thread): > ohh, yep it is exprs_values…. > > At least we should probably improve the example > > Wrapper… maybe, it might be good

Tuomas Borman (03:23:12) (in thread): > But I think plotReducedDim is preferred in our examples, and it works with abund_values

Leo Lahti (10:29:35) (in thread): > Yes. Could you open an issue on this to mia/OMA..? Perhaps even solve when time will allow..

Tuomas Borman (13:26:02) (in thread): > Yes I actually already modified the example (added method = “bray”)
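
To make the FUN / method discussion above concrete, here is a rough base-R sketch of what a Bray-Curtis MDS amounts to: a pairwise Bray-Curtis dissimilarity between samples followed by classical MDS with stats::cmdscale. The helper bray_curtis() and the toy data are made up for illustration, and this is not a description of the scater or mia internals; it assumes samples are stored in columns, as in a SummarizedExperiment assay.

```r
# Hypothetical helper: Bray-Curtis dissimilarity between the columns
# (samples) of a features x samples count matrix
bray_curtis <- function(counts) {
  n <- ncol(counts)
  d <- matrix(0, n, n, dimnames = list(colnames(counts), colnames(counts)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # sum of absolute differences divided by the total abundance of the pair
      d[i, j] <- sum(abs(counts[, i] - counts[, j])) /
        sum(counts[, i] + counts[, j])
    }
  }
  as.dist(d)
}

# Toy count table: 50 features x 6 samples (simulated, not real data)
set.seed(1)
counts <- matrix(rpois(50 * 6, lambda = 5), nrow = 50,
                 dimnames = list(paste0("K", 1:50), paste0("S", 1:6)))

bc  <- bray_curtis(counts)   # one dissimilarity per pair of samples
mds <- cmdscale(bc, k = 2)   # classical MDS: one row of coordinates per sample
```

The name given to the result (for example "MDS_BC") is only a label for where the coordinates are stored; it does not select the dissimilarity.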

2022-04-25

Indrik Wijaya (02:45:45): > @Indrik Wijaya has joined the channel

2022-04-29

Leo Lahti (14:57:33): > We will run an introductory course on microbiome analysis with R/Bioconductor using the TreeSummarizedExperiment / MultiAssayExperiment framework. In Oulu, Finland on June 20-23, 2022: https://microbiome.github.io/course_2022_oulu/ -> Feedback, questions welcome.

2022-05-03

David Mateo García (06:01:20) (in thread): > Hi Leo, this seems like a good course. I did the miaverse introduction at the last Radboud Summer School. Is this course a good complement to it? > Thanks in advance!

Leo Lahti (07:05:43) (in thread): > Great to hear. This is meant to be an introductory course on using (Tree)SummarizedExperiment and MultiAssayExperiment data containers in microbiome studies. We will partially cover the same topics as in Radboud 2021 but using entirely different data sets and example cases, and you can work with your own data sets if you prefer. Some new functionality and material has been introduced, and the MultiAssayExperiment part and integration of multiple data tables / omics was not as well covered last summer in Radboud. The last day is dedicated to this. > > This course is in live format, not online, in case this is an issue. Oulu in northern Finland is very nice in June, though :-)

David Mateo García (07:10:58) (in thread): > Finland seems to be nice in any case :smile: Thanks for the explanation, Leo!

2022-05-17

David Mateo García (03:34:13) (in thread): > Hi Leo, I sent the motivation letter to Jenni Hekkala, but I have a few questions: > * Is the course in English? (I suppose so) > * Where is the link to join the course and pay the fees? I want to register early but I can’t find the website to do it. Maybe I must wait for further instructions? > Thanks in advance!

2022-05-18

Leo Lahti (03:02:04) (in thread): > Yes, we will have the course in English (except if every participant speaks fluent Finnish)

Leo Lahti (03:05:24) (in thread): > We had the first registration deadline on May 20 and the idea is to inform all accepted participants immediately afterwards, with more details (including payment info). It takes some time for us to process; I would say that you will get the information email early next week at the latest. They are arranging this for the first time in Oulu, and preferred to do it this time by email instead of a registration website.

Leo Lahti (03:05:54) (in thread): > I have asked that Jenni confirms to everyone she has received the emails. Did you get confirmation from her about that? Just checking, I will make sure she got it.

Leo Lahti (03:06:53) (in thread): > Nice to meet in person.

David Mateo García (04:43:18) (in thread): > Yes, Jenni confirmed my registration, thank you!

2022-05-19

Maria Angeles Martinez Rodriguez (06:23:27): > Hi Leo, for David and me (María Angeles) it would be great to join the course! > We have already sent the motivation letter > Best regards

2022-05-20

Michal (09:06:54): > @Michal has joined the channel

2022-05-30

David Mateo García (07:12:16) (in thread): > Hi Leo, any news about the course? Thanks!

Leo Lahti (11:43:48) (in thread): > Hi - thanks for your question. You should have received the confirmation today, everyone was accepted in the course. Have you got email from Jenni?

2022-06-09

Leo Lahti (15:46:44) (in thread): > Just to check - have you received all info that you needed so far? We are planning to send instructions to familiarize with the online teaching environment + detailed program soon (by Monday at the latest).

2022-06-29

Henrik Eckermann (08:01:45): > Hi everyone, > > I am preparing a pull request for the team behind ANCOMBC to add support for tse (currently only pseq is supported). They use one function called data_core in which they extract the necessary data (abundance table and metadata, using the functions abundances(pseq) and meta(pseq)). This data is then prepared for the analyses etc. So, it should be as simple as modifying the first part of data_core like this: > > if (class(data) == "phyloseq") { > feature_table = abundances(data) > meta_data = meta(data) > } else if (class(data) == "TreeSummarizedExperiment") { > feature_table = counts(data) > meta_data = as.data.frame(colData(data)) > } > > Before I prepare this further and submit the pull request I wanted to ask you guys: > > - is this written generally enough? What I mean is: does the counts function always work to extract the abundances as counts? I assume that this should be the best way as our convention is to call the assay with the counts “counts”. We could add a short description in the documentation of the ancombc function that there should be an assay “counts” in the data. ANCOMBC needs counts. But data_core does not check for phyloseq either whether the abundances are counts, clr etc. Thus, if the abundances in pseq are relative, you get an unspecific error that does not tell you that your input is incorrect. It occurs because of how they process the data further down the line in the function… > - do you see any other issues I may be overlooking by doing it this way?

Tuomas Borman (08:35:19): > Good! > > In my opinion, it is better to have an option to specify the table because “counts” is just a name. The name of the counts table can be anything. Although, the option can be hidden… > > internal_function <- function(x, abund_values = "counts", ...){ > if( !abund_values %in% assayNames(x) ){ > stop("'abund_values' must be one of assayNames(x).") > } > assay <- assay(x, abund_values) > } > > Best would be that the function checks that the table contains integers…
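
A minimal sketch of the two checks mentioned above (the requested table exists, and it contains counts), written against a plain named list of matrices that stands in for assays(x). The function name extract_counts() and the toy tables are hypothetical and are not part of ANCOMBC or mia.

```r
# Hypothetical helper: pick a table by name and verify that it looks like counts
extract_counts <- function(assays, abund_values = "counts") {
  # assays: named list of matrices, standing in for assays(x)
  if (!abund_values %in% names(assays)) {
    stop("Assay '", abund_values, "' not found. Available: ",
         paste(names(assays), collapse = ", "), call. = FALSE)
  }
  mat <- assays[[abund_values]]
  # Counts are taken to be non-missing, non-negative whole numbers
  is_count <- is.numeric(mat) && !anyNA(mat) &&
    all(mat >= 0) && all(mat == round(mat))
  if (!is_count) {
    stop("Assay '", abund_values, "' does not contain counts ",
         "(non-negative whole numbers).", call. = FALSE)
  }
  mat
}

# Toy tables: one with counts, one with relative abundances
toy_assays <- list(
  counts       = matrix(c(0, 3, 5, 2), nrow = 2),
  relabundance = matrix(c(0, 0.6, 1, 0.4), nrow = 2)
)

feature_table <- extract_counts(toy_assays)            # passes both checks
rel_attempt <- try(extract_counts(toy_assays, "relabundance"), silent = TRUE)
```

Failing early with a specific message addresses the unspecific downstream error described earlier when relative abundances are passed in.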

Henrik Eckermann (09:31:09): > Thanks for feedback Tuomas. I will try to incorporate this. Might send you a PM at some point to assure I did what you intended…

2022-06-30

Leo Lahti (08:25:56): > Yes I agree with Tuomas.

2022-07-04

Henrik Eckermann (11:28:22): > I created the pull request today: https://github.com/FrederickHuangLin/ANCOMBC/pulls. I will keep track of it. > > P.S.: I discussed further with Tuomas how to implement the changes and we figured this would be the most straightforward way to do it with the setup they currently have for ANCOMBC.

2022-07-05

Leo Lahti (10:01:39): > Super.

Leo Lahti (10:04:25): > One thing I noticed is the use of “assay_name” as an argument. In the mia framework this same thing is typically called “abund_values”. It might be useful to harmonize the terminology if possible. Simplest would be to change assay_name to abund_values in the ANCOMBC PR. However I also tend to think that “assay_name” (or even just “assay”) would be more clear also in the mia framework and I am wondering if we should change it there instead. This could be made backward compatible by gradually deprecating “abund_values” as an argument name. Any thoughts on this, @Tuomas Borman @Sudarshan @FelixErnst, others?

Leo Lahti (10:05:43): > In practice abund_values refers to the input assay; to me “abund_values” is intuitively less clear than “assay” or “assay_name”. However, if the source of abundance values could be an altExp or MAE experiment then “assay” or “assay_name” might also be misleading. I am not sure if there are any such cases though

Tuomas Borman (10:12:16): > I would say “assay_name” might be the best since there could be any kind of experiments, and “abundance values” refer more to bacterial abundances than biomolecule concentrations for example (Also I think ‘assay_name’ is better than just ‘assay’ because it is more clear) > > We could change “abund_values” to “assay_name” in miaverse

Henrik Eckermann (11:03:48): > To me also assay_name or assay would be most intuitive/clear. I will wait to see what comes out of this discussion.

Henrik Eckermann (11:10:08): > * how about unit tests? for instance checking that same results could be obtained with phyloseq and SE (there are conversion functions between these so it should be easy to check) > What I tested is whether the results that come out of the data_core function (which extracts the abundance table and metadata from pseq/tse) are equal when extracted from either data type. But I only tested this for the atlas data set. Should I maybe test it for more datasets to be sure? If the results of data_core are the same, then anything downstream should be the same, too. Therefore, I did not test the altered ancombc function (where only the argument was added and one argument name was changed, which I will change to “x” based on your feedback).

Leo Lahti (12:19:26): > Ok, let’s change to assay_name but ensure backward compatibility

Leo Lahti (12:20:20) (in thread): > Perhaps the tests by pkg authors are enough if they are now adding many

Leo Lahti (12:21:05) (in thread): > I think it might be good to test that outcome is the same for both formats?

Henrik Eckermann (12:46:59) (in thread): > yes for sure. I tested only once using the atlas data. There the outcome was the same except for different sorting in the dataframe (but no influence on results).
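
A check like the one described in this thread (same results from both input formats, apart from row order) could be sketched in base R as follows. The two result tables and the helper `sort_by_key` are hypothetical stand-ins, not ANCOMBC output; in a package test suite the final comparison would typically be wrapped in a testthat expectation.

```r
# Hypothetical result tables standing in for output obtained from a
# phyloseq input and from a TreeSE input: same rows, different row order.
res_pseq <- data.frame(taxon = c("A", "B", "C"), lfc = c(0.5, -1.2, 2.0))
res_tse  <- res_pseq[c(3, 1, 2), ]

# Sort by the key column and reset row names so that only content is compared.
sort_by_key <- function(df, key) {
  out <- df[order(df[[key]]), , drop = FALSE]
  rownames(out) <- NULL
  out
}

same_results <- isTRUE(all.equal(sort_by_key(res_pseq, "taxon"),
                                 sort_by_key(res_tse, "taxon")))
same_results
```

Sorting on a shared key before comparing keeps the test from failing on ordering differences that do not affect the results.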

2022-07-06

Henrik Eckermann (04:16:36): > > @Leo Lahti: Also: ensure backward compatibility(!). If earlier workflows with ANCOMBC have used the “phyloseq” argument name, that should work also in the future. For instance accepting also “phyloseq” as an (optional) argument and generating deprecation message that recommends to use “x” (or “data”) instead. > Do you mean introducing the “…” argument? And then in the function I must check if “phyloseq” exists in the list of optional arguments and if so extract data from there, right? Do you maybe have an example of how you coded this. That way might go quicker and makes sure I stick to convention. I will search now as well…

Tuomas Borman (04:50:47): > Does this work? Then you can easily remove phyloseq in the future. However, I’m not sure if this is the best way... > > @param phyloseq Will be deprecated > @param x Phyloseq or TreeSE object > ... > the_function <- function(x = phyloseq, param2, param3, phyloseq){ > check(x) > res <- calc_something(x) > } >

Henrik Eckermann (05:03:57): > yes that should work. I will do it that way for now. Thanks Tuomas.
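
For reference, a variant of the pattern above that also emits the deprecation message Leo asked for might look like this. It is only a sketch: `run_analysis` and its body are hypothetical and not the code that went into the ANCOMBC PR.

```r
# Hypothetical function: `phyloseq` is the old argument name, `x` the new one.
run_analysis <- function(x = NULL, ..., phyloseq = NULL) {
  if (!is.null(phyloseq)) {
    # Old name used: warn, then forward the value to the new argument.
    warning("'phyloseq' is deprecated; use 'x' instead.", call. = FALSE)
    if (is.null(x)) x <- phyloseq
  }
  if (is.null(x)) stop("'x' must be provided.", call. = FALSE)
  # Placeholder for the real computation.
  nrow(x)
}

m <- matrix(1:6, nrow = 3)
run_analysis(x = m)                            # new name, no warning
suppressWarnings(run_analysis(phyloseq = m))   # old name still works
```

Keeping the old argument last and defaulting it to NULL means existing positional calls keep binding to `x`, and the old argument can later be dropped without touching the rest of the signature.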

Henrik Eckermann (06:07:43): > ok pull request has been altered: https://github.com/FrederickHuangLin/ANCOMBC/pull/77

Leo Lahti (10:11:03) (in thread): > One data set is enough. I was meaning to add unit tests (if they support that in the package in general)

2022-07-07

Henrik Eckermann (11:41:12): > @Tuomas Borman The PR has been accepted. After installing the dev version: > > library(devtools) > install_github("FrederickHuangLin/ANCOMBC", ref = "develop") > > you can use ANCOMBC with tse. The support will be implemented in the next version together with some other changes :+1:

Tuomas Borman (11:46:25): > SUPER!!

Leo Lahti (12:38:26): > @henrikeckermann87 open an issue in OMA to update the examples when the release is ready (or already now)? If not done yet. It is easy to forget this after some months otherwise.

2022-07-18

Leo Lahti (15:01:30): > @Tuomas Borman did you notice this one: “TreeSummarizedExperiment (TSE) now allows rowTree() and colTree() to work as both setters and getters, provides a new slot referenceSeq() to store sequence information, and replaces aggValue with aggTSE to provide more flexible data aggregation. The combination of multiple TSE objects is enabled, for which a new column whichTree is added in LinkDataFrame for rowLinks()/colLinks() to register which rows and columns are mapped to which trees in rowTree() & colTree(). Also, an example analysis of CyTOF data is added as a new use case of TreeSummarizedExperiment.” from the TreeSE F1000 paper by @Ruizhu HUANG https://f1000research.com/articles/9-1246 - Attachment (f1000research.com): F1000Research Article: TreeSummarizedExperiment: a S4 class for data with hierarchical structure. > Read the latest article version by Ruizhu Huang, Charlotte Soneson, Felix G.M. Ernst, Kevin C. Rue-Albrecht, Guangchuang Yu, Stephanie C. Hicks, Mark D. Robinson, at F1000Research.

Leo Lahti (15:08:18): > Interesting concept to merge distinct TreeSEs so that each of them can maintain link to a different tree. I wonder what might be an example use case.

Leo Lahti (15:08:49): > Conceptually this is different from the case where a shared tree is assumed (or trees removed) during the merge operation.

Leo Lahti (15:13:21): > Regarding merging: if the merged TreeSEs do not share the tree, it would be ideal to utilize the above-mentioned mechanism and keep all trees. The current documentation of mia::mergeSEs does not state clearly how it handles this (discard the trees if not shared between objects?)

Tuomas Borman (15:40:33): > This more flexible agglomeration is a new thing, I actually haven’t taken that into account > > I think the main purpose of mergeSEs is to combine TreeSEs as an initial step, and it just checks that rowTree/colTree node labels match with rownames. Otherwise the tree is just discarded > > However, this might be good to consider

Leo Lahti (16:30:47): > Because this option already exists I feel that it should be the default to keep all trees?

Leo Lahti (16:31:21): > So instead of discarding one could keep them.

Leo Lahti (16:31:29): > And combine when possible.

Leo Lahti (16:31:48): > Someone asked about a rowLink at the course, so I looked it up..

2022-07-21

Leo Lahti (15:41:06): > https://environmentalmicrobiome.biomedcentral.com/articles/10.1186/s40793-022-00426-0?s=03 @Sudarshan - Attachment (BioMed Central): specificity: an R package for analysis of feature specificity to environmental and higher dimensional variables, applied to microbiome species data - Environmental Microbiome > Background Understanding the factors that influence microbes’ environmental distributions is important for determining drivers of microbial community composition. These include environmental variables like temperature and pH, and higher-dimensional variables like geographic distance and host species phylogeny. In microbial ecology, “specificity” is often described in the context of symbiotic or host parasitic interactions, but specificity can be more broadly used to describe the extent to which a species occupies a narrower range of an environmental variable than expected by chance. Using a standardization we describe here, Rao’s (Theor Popul Biol, 1982. https://doi.org/10.1016/0040-5809(82)90004-1, Sankhya A, 2010. https://doi.org/10.1007/s13171-010-0016-3 ) Quadratic Entropy can be conveniently applied to calculate specificity of a feature, such as a species, to many different environmental variables. Results We present our R package specificity for performing the above analyses, and apply it to four real-life microbial data sets to demonstrate its application. We found that many fungi within the leaves of native Hawaiian plants had strong specificity to rainfall and elevation, even though these variables showed minimal importance in a previous analysis of fungal beta-diversity. In Antarctic cryoconite holes, our tool revealed that many bacteria have specificity to co-occurring algal community composition. Similarly, in the human gut microbiome, many bacteria showed specificity to the composition of bile acids. Finally, our analysis of the Earth Microbiome Project data set showed that most bacteria show strong ontological specificity to sample type. Our software performed as expected on synthetic data as well. Conclusions specificity is well-suited to analysis of microbiome data, both in synthetic test cases, and across multiple environment types and experimental designs. The analysis and software we present here can reveal patterns in microbial taxa that may not be evident from a community-level perspective. These insights can also be visualized and interactively shared among researchers using specificity’s companion package, specificity.shiny.

2022-07-23

Leo Lahti (06:49:49): > This will help to simplify some (Tree)SE workflows: https://www.bioconductor.org/packages/devel/bioc/vignettes/tidySummarizedExperiment/inst/doc/introduction.html

Leo Lahti (06:50:04): > Also https://github.com/stemangiola/tidySingleCellExperiment

2022-09-12

Samuel Gamboa (13:31:23): > @Samuel Gamboa has joined the channel

2022-09-16

Andrea Sottosanti (06:12:05): > @Andrea Sottosanti has joined the channel

2022-09-23

Henrik Eckermann (04:19:32): > Hi, > > after using the agglomerateByRank function, it seems I can no longer use the package tidySummarizedExperiment. At least the output is no longer what I would expect (please see the code example below). After using the filter function, I would expect to get a new tse object in which only the samples that fulfill the filter criterion are retained. However, after using the agglomerateByRank function, it no longer returns a tse object but a tibble with the warning “tidySummarizedExperiment says: The resulting data frame is not rectangular (all genes for all samples), a tibble is returned for independent data analysis.” > > Is this a bug? If yes, I would open an issue on github. Here is an example: > > library(mia) > library(tidySummarizedExperiment) > > > data(GlobalPatterns) > tse <- GlobalPatterns > # normal behavior is this > filter(tse, SampleType == "Soil") > # after agglomerateByRank, the behavior changes > tse_a <- agglomerateByRank(tse, "Genus") > filter(tse_a, SampleType == "Soil") >

Tuomas Borman (05:23:51): > Hi, > > thanks for the good example > > It works like this: > > # after agglomerateByRank, the behavior changes > tse_a <- agglomerateByRank(tse, "Genus", onRankOnly = T) > filter(tse_a, SampleType == "Soil") > > I think this is a bug in tidySummarizedExperiment > > Here, the filter function checks if “data cannot be a SummarizedExperiment” https://github.com/stemangiola/tidySummarizedExperiment/blob/4198d94e4eee0f51f55d6c81a47fbc73e57da047/R/dplyr_methods.R#L327 > > This function checks if each feature is present in the assay ncol times, i.e., if there are no duplicates in the rows of the assay https://github.com/stemangiola/tidySummarizedExperiment/blob/4198d94e4eee0f51f55d6c81a47fbc73e57da047/R/validation.R#L3 > > –> When agglomerating, it might be that a taxon is not present at a certain taxonomy level –> it gets the lowest taxonomy level available as a name –> multiple rows can be named equally –> this function returns FALSE –> the result is that the object is not SE –> since the object is actually SE, this is a bug

Henrik Eckermann (05:44:10): > Thanks Tuomas! Seems like something worth opening up an issue on their github I would say. I am going ahead to do that unless someone disagrees.

2022-09-24

Leo Lahti (10:51:45): > Does mia throw an error if multiple rows would have the same name after agglomeration? I think that could be warranted as well..

Tuomas Borman (11:21:33): > Only the names are the same, but the rows are different. I think this should not be an error. However, we could just add a make_unique = TRUE option there (agglomerateByRank is already using the .get_taxonomic_label function that has that option –> we could just enable it...)
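
To illustrate the idea of making clashing row names unique, here is a minimal base R sketch. The labels are invented, and base R `make.unique()` is used only as a stand-in; this is not how mia’s internal `.get_taxonomic_label` is implemented.

```r
# Hypothetical row labels after agglomeration: two features lacked a genus
# and fell back to the same higher-rank label.
labels <- c("Genus:Bacteroides", "Family:Lachnospiraceae",
            "Family:Lachnospiraceae", "Genus:Prevotella")

# The names clash although the underlying rows differ.
has_duplicates <- anyDuplicated(labels) > 0

# Base R make.unique() appends a numeric suffix to repeated names.
unique_labels <- make.unique(labels)
unique_labels
```

With unique names, checks that count how often each feature name occurs no longer see the same name twice.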

2022-09-25

Leo Lahti (07:24:04): > hmm I think that might be a neat idea, even if rarely used

2022-09-27

Jennifer Holmes (16:15:35): > @Jennifer Holmes has joined the channel

2022-09-29

Leo Lahti (02:36:45): > I wonder if this could be useful for the tree part of TreeSE: “Ggtree: A serialized data object for visualization of a phylogenetic tree and annotation data” https://onlinelibrary.wiley.com/doi/full/10.1002/imt2.56

Tuomas Borman (08:39:00): > We have utilized it in miaViz::plotRowTree

Leo Lahti (09:41:13): > aha, cool :smile:

2022-10-08

Leo Lahti (16:08:22): > https://twitter.com/olafurw/status/1578704185809244160?s=20&t=MpGQM310pv0RELDVvE8wbQ - Attachment (twitter): Attachment > All unit tests passing.

2022-10-17

Leo Lahti (12:47:58): > If this inspires something: https://enblacar.github.io/SCpubr-book/ - Attachment (enblacar.github.io): About this package | SCpubr > Generating high quality, publication-ready plots for Single Cell transcriptomics.

Leo Lahti (12:52:07): > This, too: https://twitter.com/wolfgangkhuber/status/1582031364987052032?s=31 - Attachment (twitter): Attachment > Beta release of a quarto version of “Modern Statistics for Modern Biology” with @SherlockpHolmes
> With many thanks to @CrowellHL and @grimbough . > https://www.huber.embl.de/msmb-quarto/

2022-10-19

Leo Lahti (13:47:23): > benchdamic is an R/Bioc pkg for benchmarking differential abundance methods, and seems to support TreeSE now: https://bioconductor.org/packages/release/bioc/vignettes/benchdamic/inst/doc/intro.html @Henrik Eckermann @Tuomas Borman

Tuomas Borman (13:48:11): > Cool!

2022-10-20

Moritz E. Beber (13:01:50): > Hey folks, > I typically do most of my plumbing work in Python and only the last part of the statistical analysis/plotting in R. So I was curious if there is a way to create the data structure for a TreeSummarizedExperiment from Python? Or maybe there is another format that is close enough so that it can be quickly converted in R? Thank you for any insights.

2022-10-21

Leo Lahti (03:37:32): > I am not aware but if you store your files in standard formats (like csv, tre etc) then it should be straightforward to import in R and combine into TreeSE. Or you could use some R/Py compatible compressed formats like feather to transfer matrices/dataframes between the two? But the TreeSE itself is not available in Python afaik.

Leo Lahti (03:37:58): > Reading the data in R and putting it into TreeSE should not be many lines.

Moritz E. Beber (03:52:19) (in thread): > Okay, thanks. I was just curious if I can prepare a “data package” from Python directly. I was mostly concerned with preparing the tree in the right way. I’ll use arrow then as you recommend and figure something out for the taxonomy.

Leo Lahti (08:30:50) (in thread): > i would be curious to know, too, if there is some nice way to prepare data in a good way in Python

2022-10-24

Marja Heiskanen (16:24:56): > @Marja Heiskanen has joined the channel

2022-10-28

Moritz E. Beber (11:36:27) (in thread): > Since the BIOM format is supported, I will try to create such an object in Python and then load it in R. - Attachment (microbiome.github.io): Chapter 2 Microbiome Data | Orchestrating Microbiome Analysis > Chapter 2 Microbiome Data | Orchestrating Microbiome Analysis

Leo Lahti (16:54:25) (in thread): > Yes mia has biom importers.

2022-12-19

Renuka Potbhare (08:09:58): > @Renuka Potbhare has joined the channel

Matti Ruuskanen (14:17:27): > @Matti Ruuskanen has joined the channel

Matti Ruuskanen (14:19:01): > Is there a better way to include additional multiomic “assays” of the samples (in addition to microbiome data) in a TreeSE container, than just putting them in colData?

Leo Lahti (14:19:39): > Table 2.1 in OMA summarizes our recommendations: https://microbiome.github.io/OMA/containers.html#data-containers - Attachment (microbiome.github.io): Chapter 2 Microbiome Data | Orchestrating Microbiome Analysis > Chapter 2 Microbiome Data | Orchestrating Microbiome Analysis

Leo Lahti (14:20:12): > The first thing to consider is if they have the same number of features or a different number?

Matti Ruuskanen (14:20:23) (in thread): > Also, I find it quite difficult that when I add new data sets to an existing object, I have to first remove all mismatching samples from both the TreeSE and the new table.

Leo Lahti (14:20:30): > is it like taxonomic vs. metabolomic combination?

Leo Lahti (14:20:46): > or Species vs. Genus; or CLR vs. relabundance?

Leo Lahti (14:21:03): > different types have slightly different possibilities.

Matti Ruuskanen (14:21:15): > I have a TreeSE object with taxonomic data with a count table, I want to add metabolomic data and a polygenic risk score

Leo Lahti (14:21:28): > there are mechanisms of assay, altExp, and MultiAssayExp

Leo Lahti (14:23:32): > Ok. For real multi-omics the recommended way is to use MultiAssayExperiment, which can collect together several TreeSummarizedExperiments (or other data containers, if you have some specialized ones, depending on the omic). The advantage is that you can share the same sample data (colData) across these, and for instance subsetting operations apply to all data objects on the same go

Matti Ruuskanen (14:23:55) (in thread): > The figure is too small / bad quality to study how I can construct the objects, even when I open the raw image in another tab.

Matti Ruuskanen (14:24:44): > So solution is to construct 3 separate TreeSE objects and then combine them in a MultiAssayExperiment?

Leo Lahti (14:24:55): > Yes. I am checking if I can find example code.

Matti Ruuskanen (14:25:45): > If I have overlapping “metadata” of other sample measurements, is it still worth it to copy the data to three separate objects and combine? I could make do with a single TreeSE object by just putting the other omics in the colData :wink:

Leo Lahti (14:25:49): > It is not possible to simplify much more when you like to keep them all in a single object (which I would recommend by default since it is easy to extract individual experiments from that whenever necessary).

Matti Ruuskanen (14:26:36): > The downside of putting additional omics sets in the colData is that then you have to extract and subset it manually to the features in each omics sets, though.

Leo Lahti (14:26:56): > If the sample metadata is shared, then you can include it just once for the MAE (MultiAssayExperiment). However you can additionally have dataset-specific metadata fields in each TreeSE..

Leo Lahti (14:27:51): > Yes, exactly, that’s the problem. Including them in colData sort of overlooks the fact that they are full real data sets.

Leo Lahti (14:29:13): > This script shows an example, although here you have SummarizedExperiment (not TreeSE). But it works exactly the same way (lines 116-136): https://github.com/microbiome/microbiomeDataSets/blob/master/inst/scripts/make-hintikka-xo-data.R

Matti Ruuskanen (14:30:35): > Okay, that looks like what I was looking for. Would be nice to have this kind of constructing examples in the OMA book directly

Matti Ruuskanen (14:30:56): > Specifically: > > sem <- SummarizedExperiment(assays = list(counts=otu_cecum), > rowData = tax) > sen <- SummarizedExperiment(assays = list(nmr=nmr)) > seb <- SummarizedExperiment(assays = list(signals=bm)) > > ## Create a MultiAssayExperiment instance > ExpList <- ExperimentList(list(microbiota=sem, > metabolites=sen, > biomarkers=seb)) > mae <- MultiAssayExperiment(experiments = ExpList, > colData = meta_cecum) >

Leo Lahti (14:33:30): > I certainly agree. I think this is a gap that should be filled asap.

Leo Lahti (14:34:54): > I opened an issue :slightly_smiling_face: https://github.com/microbiome/OMA/issues/205

Leo Lahti (14:35:18) (in thread): > Just note that we would strongly recommend using TreeSE instead of SE.

Leo Lahti (14:35:52) (in thread): > I also notice this example has a shortcoming: it would be better to use SimpleList instead of list.. :slightly_smiling_face:

Leo Lahti (14:35:58) (in thread): > both work anyway

Leo Lahti (14:38:00): > We will need to run a hackathon soon to troubleshoot many issues at one go.

Leo Lahti (14:38:46) (in thread): > The Table 2.1?

Leo Lahti (14:39:00) (in thread): > Ah you are looking at Figure 2.1

Leo Lahti (14:39:10) (in thread): > There is also Table 2.1

Leo Lahti (14:40:15) (in thread): > Yes Fig. 2.1 is not meant to teach how to construct the objects, more like to give an overview of available stuff.

Leo Lahti (14:40:33) (in thread): > Table 2.1 is just summarizing the different options (assay, altExp, MAE) and their differences.

Leo Lahti (14:40:39) (in thread): > The example we really need to add still.

Leo Lahti (14:41:06) (in thread): > MAE is a bit more advanced and we almost never really get into that on courses, so the material is lagging a bit.

Leo Lahti (14:43:17) (in thread): > @Tuomas Borman has created two functions to calculate cross-associations between different experiments: https://microbiome.github.io/mia/reference/getExperimentCrossAssociation.html

Leo Lahti (14:43:32) (in thread): > There are some examples in OMA ch 13: https://microbiome.github.io/OMA/multi-assay_analyses.html - Attachment (microbiome.github.io): Chapter 13 Multi-assay analyses | Orchestrating Microbiome Analysis > Chapter 13 Multi-assay analyses | Orchestrating Microbiome Analysis

Leo Lahti (14:48:13) (in thread): > In fact you could also combine phyloseq objects into MAE.. although I would go with TreeSE.

2022-12-21

Asiye (12:17:10) (in thread): > I volunteer to run the tests and contribute :raised_hand: as I’ve already built my MAE of 3 TSEs

Matti Ruuskanen (12:49:52) (in thread): > This is how I ended up doing it: > > #create multiomic object > ExpList <- ExperimentList(list(microbiota=data.matrix(assays(centrifuge_data)[["rclr"]]), metabolites=assays(NMR_data)[[1]], genetics=assays(PRS_data)[[1]])) > mae <- MultiAssayExperiment(experiments = ExpList, colData = colData(centrifuge_data)) >

Matti Ruuskanen (12:50:43) (in thread): > “centrifuge_data” is an existing TSE

Matti Ruuskanen (12:51:31) (in thread): > There are some little data wrangling issues, like you need to check that the tables have all the same samples and are in the correct orientation and such…

Leo Lahti (13:22:05) (in thread): > I would suggest to use TreeSEs as ExperimentList elements, this would help to make sure that they have the correct orientations by default etc. and make the code more readily applicable to any multi-omic case like this. I can see, though, that for simple use cases you might like to reduce to matrices immediately as MAE allows that.
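
The manual wrangling step mentioned above (checking that the tables share samples and have the same orientation) could be sketched in base R like this. The matrices and sample names are invented for illustration, and features x samples orientation is assumed.

```r
# Hypothetical assay matrices (features x samples) with partly
# overlapping samples in different column order.
microbiota <- matrix(1:12, nrow = 3,
                     dimnames = list(paste0("taxon", 1:3),
                                     c("S1", "S2", "S3", "S4")))
metabolites <- matrix((1:6) / 10, nrow = 2,
                      dimnames = list(c("m1", "m2"),
                                      c("S3", "S1", "S5")))

# Keep only the shared samples and put them in the same order in both tables.
shared <- intersect(colnames(microbiota), colnames(metabolites))
microbiota_sub  <- microbiota[, shared, drop = FALSE]
metabolites_sub <- metabolites[, shared, drop = FALSE]

# Sanity check before building any container from the tables.
stopifnot(identical(colnames(microbiota_sub), colnames(metabolites_sub)))
```

Using `drop = FALSE` keeps the results as matrices even if only one sample is shared.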

Leo Lahti (13:23:45) (in thread): > Do you really need to do data.matrix on the line list(microbiota=data.matrix(assays(centrifuge_data)[["rclr"]]) ?

Leo Lahti (13:24:04) (in thread): > I thought that the assays are already numeric matrices

Leo Lahti (13:24:36) (in thread): > A preferred way to call an assay would be: assay(centrifuge_data, "rclr")

Leo Lahti (13:25:12) (in thread): > By the way I generally do not prefer rCLR for transformation. It seems to be problematic and it hasn’t been widely adopted either despite being around since 2019.

Matti Ruuskanen (13:54:30) (in thread): > ah, well I had a lot of issues retaining the numerical features with matrices

Matti Ruuskanen (13:55:30) (in thread): > I’m not sure how the object conversions convert matrices, but as.matrix() puts all numeric columns to character columns, which are not accepted later on in my pipeline

Matti Ruuskanen (13:55:42) (in thread): > so I tried several things to correct this… not sure if these are needed

Tuomas Borman (14:38:53) (in thread): > In matrix, there can be only one data type (numeric or character). If there are numeric and character values, as.matrix converts all values into character. If there are only numeric values, values should stay as numeric > > –> Are there some columns that have character values?

Tuomas Borman (14:39:04) (in thread): > L3 <- LETTERS[1:3] > char <- sample(L3, 10, replace = TRUE) > (d <- data.frame(x = 1, y = 1:10, char = char)) > > is.numeric(d[,1]) > is.numeric(d[,2]) > is.numeric(d[,3]) > > (mat <- as.matrix(d)) > > is.numeric(mat[,1]) > is.numeric(mat[,2]) > is.numeric(mat[,3]) > > (mat <- as.matrix(d[,1:2])) > > is.numeric(mat[,1]) > is.numeric(mat[,2])

Leo Lahti (14:55:14) (in thread): > If you construct TreeSE as expected (in microbiome context), then you can always trust later in the workflow that the assay is a numeric matrix (features x samples)

Leo Lahti (14:55:28) (in thread): > This makes it easier to standardize everything.

Matti Ruuskanen (15:05:51) (in thread): > yes, there were some features in the assays which were problematic (sample codes) and shouldn’t have been there. I corrected that issue, but the data.matrix() function apparently remained in my code, since it didn’t do anything harmful

Matti Ruuskanen (15:06:57) (in thread): > There were also some issues with the raw assay files themselves, as some columns which were supposed to be numeric had non-numeric NA or “not determined” codes :smile:

Leo Lahti (15:10:35) (in thread): > Right. But that’s part of the initial data cleaning that should be done anyway, not an issue with TreeSE per se. After constructing data objects you should be able to trust them well.

Matti Ruuskanen (15:13:37) (in thread): > yes, but being new to TreeSE and all of that, I am not sure what features it is enforcing on the different parts of the container. It was part of my troubleshooting :slightly_smiling_face:

Leo Lahti (15:41:30) (in thread): > Yes, now you know! :muscle:

2023-01-02

Virginie Stanislas (04:56:01): > @Virginie Stanislas has joined the channel

2023-01-16

Katariina Pärnänen (05:13:44): > @Katariina Pärnänen has joined the channel

2023-01-20

Leo Lahti (13:17:03): > Something to consider? > > Bioconductor (@Bioconductor) tweeted at 3:30 PM on Fri, Jan 20, 2023: > Bioconductor plans to participate in the @Outreachy Mar-Aug 2023 internship cycle. We’re seeking Outreachy interns who would like to work on a Bioconductor project and mentors to propose projects. > > Feb 6 - deadline to apply to be an intern > Feb 24 - deadline to submit projects > (https://twitter.com/Bioconductor/status/1616427968347181058?t=EL0zl9cAzEuJHdV_dohrNA&s=03) - Attachment (twitter): Attachment

2023-01-23

David Mateo García (10:26:24): > Seems interesting. I’ll take a look, thanks for sharing.

2023-01-24

Leo Lahti (09:39:21): > CALL FOR APPLICATIONS: GRANTS FOR SEMINARS EXPLORING HUMAN-MICROBIAL RELATIONS > > CSSM, the Centre for the Social Study of Microbes at the University of Helsinki, is a hub for social scientists and artists conducting research on human-microbial relations. We aim to develop theory and methods to better make sense of the complex relations between humans, nonhumans, microbes, and their environments. We now invite applications from researchers, groups or collectives to join this network and organise theory production seminars external to the University of Helsinki! > > The CSSM seminar grant is a competitive grant call to organise a 2-3 day seminar on a theme and at a location proposed by an applicant external to the CSSM. The objective is to maximise the opportunities for networking among social researchers and artists conducting research on human-microbial relations at the applicant’s host institution (or nearby). Similar to what we do in the CSSM, the seminar should aim to develop theory and methods to better make sense of the complex relations between humans, nonhumans, microbes, and their environments. > > Vision, innovation and creativity will be key factors in selecting recipients for this grant, as will be the potential of the seminar to support/develop collaborative networks beyond the applicant group and CSSM. Applicants from the Global South and members of minorities are especially welcome to apply. > > Successful application will require institutional affiliation in order to process the payment of the grant. The seminar should take place between March and December 2023. Please submit your application, between 2-5 pages, to cssm@helsinki.fi by 17th of February 2023. Your application should at least include: > * the theme and rationale of your seminar > > * time, location and planned number of participants > > * how does the seminar contribute to theory development i.e. to better grasp the complex relations between humans, nonhumans, microbes, and their environments? > > * budget plan (max. €11,000) including breakdown of costs > > * the innovation and creativity of your vision > > * (possible) previous experience organising event and institutional support – potential outputs > > * networking plan – how do you intend to expand this network? > > * how does the seminar benefit the Global South and members of minorities (if applicable) > We are looking forward to your applications! We will aim to get back to you with our decision by the end of February. Please spread the call in your local networks, thank you. - Attachment (The Centre for the Social Study of Microbes): Frontpage – The Centre for the Social Study of Microbes > Centre for the Social Study of Microbes Helsinki, Finland The Centre for the Social Study of Microbes at the University of Helsinki is a hub for social scientists and artists conducting research on human-microbial relations. We aim to develop theory and methods to better make sense of the complex relations between humans, nonhumans, microbes, and […]

2023-01-28

Leo Lahti (11:00:56): > Also note the STAMPS course with the excellent lineup of teachers: https://www.mbl.edu/education/advanced-research-training-courses/course-offerings/[…]tegies-and-techniques-analyzing-microbial-population-structures - Attachment (Marine Biological Laboratory): Strategies and Techniques for Analyzing Microbial Population Structures (STAMPS) | Marine Biological Laboratory > The STAMPS course promotes dialogue and the exchange of ideas between experts in environmental and microbiome analysis and offers interdisciplinary bioinformatics and statistical training to practitioners of molecular microbial ecology and genomics.

2023-01-31

Moritz E. Beber (10:33:57): > @Leo Lahti I finally worked a bit on the Python -> biom-format -> R -> TreeSummarizedExperiment idea. It’s not working out so well :laughing: I can transfer the data just fine but getting the taxonomy across and into a row tree is annoying. First issue is that, since biom uses HDF5, any input is converted into a full table. That means any taxonomy needs to be converted into a “canonical” table format (every lineage needs to have the same ranks). Second, it doesn’t seem like I can readily create a taxonomy from both numeric identifiers and names together.

Moritz E. Beber (10:37:59): > So it’s probably a better idea to load the taxonomy separately.

Francesc Català-Moll (14:34:52): > @Francesc Català-Moll has joined the channel

Leo Lahti (15:25:09) (in thread): > Ah, great that you explored that. Is the biom/HDF5 problem on the Python or the R side, or both?

Moritz E. Beber (16:31:57) (in thread): > Since I wanted to write to biom format from Python, the problem is more on the Python side for me (but would equally apply if you wrote to biom from R). It simply expects a full table of equal ranks, it does not handle unequal lineages by itself, i.e., it cannot handle lists of different lengths denoting the lineages. > > Since I didn’t know if the code that creates a TreeSummarizedExperiment would handle gaps in the taxonomy table and I didn’t know how to apply the original taxonomy identifiers in addition to the labels coming from the lineages, I stopped there for now rather than generating such a table with gaps.
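
A “canonical” table with gaps, as discussed in this thread, could be built from lineages of unequal depth roughly as follows. This is a base R sketch with invented lineages and rank names; whether downstream TreeSE constructors accept the resulting NA entries is not addressed here.

```r
# Hypothetical lineages of unequal depth, stored as named character vectors.
ranks <- c("Kingdom", "Phylum", "Class", "Genus")
lineages <- list(
  tax1 = c(Kingdom = "Bacteria", Phylum = "Firmicutes",
           Class = "Bacilli", Genus = "Lactobacillus"),
  tax2 = c(Kingdom = "Bacteria", Phylum = "Bacteroidetes"),
  tax3 = c(Kingdom = "Archaea")
)

# Index each lineage by the full rank vector; ranks that are absent become NA,
# so every row ends up with the same set of columns.
tax_table <- t(vapply(lineages, function(l) unname(l[ranks]),
                      character(length(ranks))))
colnames(tax_table) <- ranks
tax_table
```

Padding with NA keeps the table rectangular without inventing labels for the missing ranks.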

2023-02-03

Leo Lahti (04:09:20): > The scater package has now switched from the exprs_values argument to assay_name in PR #187 and this supports the mia package family conventions. - Attachment: #187 Adding assay_name aliases > This PR addresses issue #186 . > > Changes: > > • Added assay_name as an alias for exprs_values in function calls (“API”); also changed the functions to use assay_name internally so that deprecation of exprs_values later is easier; no changes in actual functionality have been introduced. > • Similar aliases and updates to by_exprs_values (by_assay_name), altexp_exprs_values (altexp_assay_name) and exprs_logged (assay_logged) > • Very minor code polishing on the way > • DESCRIPTION file info updated accordingly > • manpages updated with devtools::document() > > Checks: > > • All tests pass, except some UMAP related failure that is independent of this PR. > • build(), check(), BiocCheck() OK (with that one independent exception) > > TODO (to discuss before adding): > > • Update inst/NEWS.md if this is desirable? > • Add deprecation messages for exprs_values (or perhaps later?) > • Update examples, tests, and vignettes to use assay_name (or perhaps later?)

2023-02-15

Leo Lahti (02:14:18): > Related to this, I would like to have some discussion on naming conventions in mia vs. scater & scuttle. > > Now mia and scater use underscores in arguments (e.g. assay_name) > > The scuttle package prefers dot arguments (e.g. assay.type), using naming conventions summarised in scuttle issue #21. > * dots as separators in argument names > * lower camel case in function names > * upper camel case in class names > * underscores in internal function names > These three packages are interoperable, and harmonized naming conventions would be beneficial. > > The scuttle package has suggested a clear scheme to follow, and it seems that this is the scheme for the SCE container from which TreeSE inherits. The scater package would seem to agree to change to dot argument names (see https://github.com/Alanocallaghan/scater/issues/189). Ideally this should be done before the next Bioc April release. > > I suggest switching to these naming conventions in the mia package family. At least for the argument names (other names could wait as they are less critical for package interoperability), so that they become harmonized between these packages. The old names would be deprecated but remain functional. A similar change would be done to scater via PR, as initially agreed. > > Before we proceed to implement these changes I would welcome comments. @Tuomas Borman and all others!

Tuomas Borman (14:41:32): > Yes, I think we should follow scuttle, scater and the rest of the SCE ecosystem; it helps interoperability (there are more disadvantages if we do not change argument names…) > > We should also check that arguments are named consistently in other respects (assay.name vs assay.type –> assay.type might be preferred since scuttle uses it) so that these packages work together as seamlessly as possible > > Our functions already follow lower camel case (at least most of them; we also have functions like runDMN)

2023-02-16

Leo Lahti (10:53:11): > I would go with a clear logic, and scuttle has a clear suggestion. Then the rest of the packages (including the mia toolkit) should follow suit. Trying to comply with many different external packages with varying naming conventions is not going to be sustainable in the longer run. > > In the short term it could be enough to change assay_name to assay.type (and the same for other closely related names). If we can harmonize other things in the same go then I certainly support that.

Leo Lahti (10:55:09): > I can make the PR for the assay_name stuff, as this follows directly from the earlier PR to scater and will make these packages compatible at least for this part. > > If we would like to add other argument names then let us have a separate issue and/or PR to assess the overall workload and feasibility. If such changes have to be made it would be best to do so now, while the system is shaping up.

Tuomas Borman (12:24:35): > Sounds good

2023-03-06

Thomas Klammsteiner (08:27:06): > @Thomas Klammsteiner has joined the channel

Andres Wokaty (13:30:03): > @Andres Wokaty has joined the channel

2023-03-08

Reece (12:53:37): > @Reece has joined the channel

OLADIRAN OLUWASHOLA (18:40:38): > @OLADIRAN OLUWASHOLA has joined the channel

2023-03-09

K Nodia (08:06:40): > @K Nodia has joined the channel

2023-03-10

Dennis Ndubi (04:32:08): > @Dennis Ndubi has joined the channel

Dennis Ndubi (04:33:24): > Greetings mentors and fellow applicants, Dennis Ndubi here, an Outreachy intern. I am interested in contributing to the project “Optimize microbiome data science framework”, thus I would like to connect with people who are already working on it for further guidance and help. Looking forward to building a great network and community. Thank you.@Tuomas Borman

K Nodia (04:35:08): > Welcome @Dennis Ndubi, I am K Nodia, an Outreachy applicant. I got interested in this project and I will be contributing here. The mentor for this project is @Tuomas Borman, and the community will be here to help where necessary. Thank you

K Nodia (04:38:02): > Additionally, miaverse consists of multiple packages. You can contribute to the mia package > for instance (it is the most important package in the miaverse) > > From the website that you have there, you can find all these packages: mia, miaViz, miaTime… > > So mia would be https://github.com/microbiome/mia

Dennis Ndubi (04:38:41): > Thank you@K Nodiafor the cordial welcome. Am also looking forward to making contributions to the project and make an impact in open source

K Nodia (04:40:33): > you are welcome

Tuomas Borman (04:48:04): > Welcome @Dennis Ndubi and @K Nodia! Good news for the miaverse community! As you noticed, miaverse got accepted as one of the Outreachy projects, so we got new contributors

K Nodia (05:06:34): > it is my pleasure to be here:raised_hands:

K Nodia (07:41:23): > Greetings; while I'm trying to work on the small tasks listed on the Outreachy website, I am having trouble locating the exercises, starting from task 4 (“Do exercise 19.1 from OMA”). I am checking in the repository but I'm somehow not seeing them. @Tuomas Borman and community, thanks for the help

Tuomas Borman (07:45:36): > We have had some changes in OMA (apparently one chapter has been removed) –> but anyway, these exercises refer to this: https://microbiome.github.io/OMA/exercises.html –> so exercise 19.1 would now be 18.1 - Attachment (microbiome.github.io): Chapter 18 Exercises | Orchestrating Microbiome Analysis > Chapter 18 Exercises | Orchestrating Microbiome Analysis

K Nodia (08:17:57): > Thank you

sophy (08:46:32): > @sophy has joined the channel

2023-03-11

Muluh (02:16:15): > @Muluh has joined the channel

Muluh (03:01:36): > The chapters have shifted again and now we have the Exercises in Chapter 21. I’m not quite sure, but that is where they are for now. Please correct me if I’m wrong. @Tuomas Borman

Leo Lahti (03:49:35): > Hi @Muluh - thanks for your observation. Now the exercises are again in Chapter 19 as initially announced. My apologies for this - I am responsible for the switching chapter numbers. We had an exotic build failure that only occurred after merging a PR, and I had to rerun the book a couple of times in order to identify and fix it. Now all seems to work again and the chapter numberings are in their original places.

Leo Lahti (03:50:16): > The automated builds always take time, so this had to be done over the last 1-2 days.

Muluh (03:51:01): > Please, I have a question. Are we expected to submit a single quarto/rmd/r file for all tasks, or should individual tasks with instructions to create new files be submitted separately?

Muluh (03:51:46) (in thread): > Thank you for the clarification:raised_hands:

Leo Lahti (03:52:24) (in thread): > now it is again ch 19

Leo Lahti (03:52:54) (in thread): > @Tuomas Borman will be best placed to answer this, but I think that a single file would be fine as long as it is clear.

Muluh (03:53:21) (in thread): > okay thank you.

Tuomas Borman (06:19:12) (in thread): > Yes, single file

Namusisi Sharon (09:11:03): > @Namusisi Sharon has joined the channel

2023-03-12

Muluh (01:11:02): > Hello @Tuomas Borman. I have a question. Should we add links to the PRs for the exercise on fixing typos in vignettes, or can we submit the file and continue working on the typo task? Also, when we clone the package for the exercise on unit tests with devtools, is our objective to ensure that all tests are successful (i.e. maybe explicitly installing dependency packages before the test run etc.), or is it okay if we have some failed tests? :slightly_smiling_face:

Leo Lahti (06:58:50) (in thread): > Hi @Muluh - can I ask for clarification on what you mean by adding links to the PRs?

Leo Lahti (06:59:45) (in thread): > Re: unit tests, I think it would be good to have all tests pass. The master branch is clean and there should be no unit test issues at the moment as far as we know (assuming that you have installed all dependencies locally).

Aliyu Atiku Mustapha (07:05:52): > @Aliyu Atiku Mustapha has joined the channel

Leo Lahti (08:13:43): > Should we have a small “How to contribute” subsection in the OMA tutorial itself, or is the README in the GitHub repo already sufficient as is (https://github.com/microbiome/OMA)?

K Nodia (08:34:05): > I guess it would be a good idea to add it @Leo Lahti

Muluh (09:22:57) (in thread): > Oh, I meant there is a task for us to fix typos in package vignettes. I wanted to find out if we can submit the rest of the tasks while still looking for typos, or whether we have to fix typos and add the links to the rmd file before sending it by mail

Muluh (09:26:15) (in thread): > Also, I am having a hard time finding typos in package vignettes to fix. However, I came across some beginner tasks such as: https://github.com/microbiome/OMA/issues/101. Can I be assigned to complete it?

Leo Lahti (09:27:00) (in thread): > Let us see what @Tuomas Borman says

Leo Lahti (09:27:37) (in thread): > This is also a nice PR task for someone :hugging_face:

Muluh (09:27:59): > Hello I found this cheatsheet. hope it helps my fellow outreachy applicants with devtools. - File (PDF): devtools-cheatsheet.pdf

Muluh (09:28:29) (in thread): > Can I work on this?

Leo Lahti (09:42:28) (in thread): > For sure. Could you first open a github issue in OMA, then we can assign it to you and it is open for comments when you are working on the PR.

Andres Wokaty (09:42:38) (in thread): > Hi, I’m the coordinator. I want to caution everyone: because OMA’s license makes it not open source, work on OMA itself is not appropriate for Outreachy applicants. Outreachy applicants should work on open source projects.

Leo Lahti (09:52:32) (in thread): > Right, that’s true Jennifer. Apologies for this. We can direct the Outreachy efforts to the R package vignettes instead. Those are fully open source and do not have the NC clause. > > This specific Slack channel itself has for a long time been dedicated to the whole mia ecosystem, including OMA. I think it is fair to say that contributions to OMA are welcome but they cannot be considered as part of the Outreachy activities. Does this sound feasible, @Andres Wokaty? > > We are open to reconsidering the OMA license but this has to be discussed with all co-authors and that will take more time. The initial decision on CC-BY-SA-NC for OMA dates back 3-4 years and takes into account that the NC clause is particularly common with books, see e.g. https://web.stanford.edu/class/bios221/book; on the other hand, OSCA seems to go full CC-BY (https://bioconductor.org/books/release/OSCA). If anyone knows good literature discussing this question in the context of online books in more depth, let us know.

Andres Wokaty (09:58:53) (in thread): > I appreciate your understanding and this discussion. I just want to make sure that we honor our commitments to Outreachy and applicants.

Leo Lahti (10:02:06) (in thread): > Yes I entirely agree. > > We will add the “How to contribute” chapter to OMA in other ways, and continue discussing the license with anyone who likes to join in the discussion.

Andres Wokaty (10:03:16) (in thread): > If OMA goes open source, maybe consider submitting to Bioconductor?:slightly_smiling_face:

Leo Lahti (10:04:28) (in thread): > Yes, that has also been on the list of possibilities.

Muluh (10:47:43) (in thread): > Okay, thank you for the clarification, @Leo Lahti, @Andres Wokaty. I will continue searching for typos in package vignettes. However, if there are issues that fit the Outreachy commitments and can add to my contributions, I will gladly have a look at them :)

2023-03-13

Tuomas Borman (02:33:10) (in thread): > 1. Yes, you can send me (tvborm@utu.fi) the Rmd/R/quarto document that includes those small tasks (exercises from OMA) –> then you can continue working on the other tasks > 2. You can also fix typos in function documentation > 3. As discussed in the other thread, you cannot contribute to OMA (currently)

Parul Chaddha (08:56:57): > @Parul Chaddha has joined the channel

Muluh (14:12:29): > Hello @Tuomas Borman. Please, do we get a notification after submitting our qmd file?

anshika bhatt (20:55:25): > @anshika bhatt has joined the channel

2023-03-14

Tuomas Borman (02:04:51) (in thread): > Yes, I try to give everyone some feedback, so it might take some time

Muluh (07:48:18): > Hello @Tuomas Borman, @Leo Lahti, I came across this comment. I am thinking it might be a typo (should be global, right?) but I am not entirely sure. I have raised a PR in the microbiome repo for the other typos I fixed. Should I fix this typo and amend the commit? - File (PNG): patch.PNG

Tuomas Borman (07:50:51) (in thread): > Hi, yes, that’s a typo > > Fix & add to PR, thanks

Elisheba Asiimwe Joanita (10:03:21): > @Elisheba Asiimwe Joanita has joined the channel

Elisheba Asiimwe Joanita (10:10:03): > Greetings mentors and fellow applicants, my name is Elisheba Asiimwe, an Outreachy intern applicant. So sorry for joining the project late, but I am very much interested in contributing to the “Optimize Microbiome data science framework” project. > I would love to connect with anyone who is already working on it for further guidance and help on how I may contribute. I look forward to working and connecting with you all. Thank you @Tuomas Borman

Muluh (10:31:10) (in thread): > I am reposting from the mentor: Create a Quarto, R, or Rmd file to answer the questions and send it to Tuomas (tvborm@utu.fi). I’m also happy to help if you have questions. Remember to comment the lines and explain what you are doing. If you run into an issue or an error while doing the exercises, please report it (e.g., add the code and the printed output to your document). > > 1. What are the commands to get X from a TreeSE object? > > - Abundance table > > - Feature data > > - Sample metadata > > - Phylogenetic tree > > 2. Describe the TreeSE object in your own words. Benefits, the purpose, the structure… > > 3. Describe miaverse and its benefits and disadvantages in your own words. > > 4. Do exercise 19.1 from OMA > > 5. Do exercises 19.2.1 - 19.2.3 from OMA > > 6. Do exercises 19.2.4 - 19.2.5 from OMA > > 7. Do exercise 19.4.1 from OMA > > 8. Do exercise 19.4.2 from OMA > > 9. Do exercise 19.5.1 from OMA > > 10. Do exercise 19.6.1 from OMA > > 11. Give feedback on function documentation. > > - Are the functions easy to use? Are the examples descriptive? Is the TreeSE object clear to use? If not, what kind of information would you need to understand the concept better? > > Medium tasks: > > 12. Fix typos in package vignettes and create a pull request to the corresponding package. > > 13. Clone one of the packages of miaverse. Using devtools commands, run the unit tests. (Take a screenshot and add it to the document that you send to tvborm@utu.fi. Remember to explain what you did.) > > 14. Clone one of the packages of miaverse. Using devtools commands, create the documentation. Make a modification to a method's documentation (the R folder contains the methods). Create the documentation again and open the document (the man directory contains the documents). (Take a screenshot and add it to the document that you send to tvborm@utu.fi. Remember to explain what you did.)
> > > Repositories: > > mia: https://github.com/microbiome/mia > miaViz: https://github.com/microbiome/miaViz > OMA: https://github.com/microbiome/OMA > > From OMA (https://microbiome.github.io/OMA/), you get more detailed information on miaverse. - Attachment (microbiome.github.io): Orchestrating Microbiome Analysis > Orchestrating Microbiome Analysis

Elisheba Asiimwe Joanita (10:33:25) (in thread): > Thank you:blush:

Muluh (10:33:32) (in thread): > These are the tasks for contribution. I have completed some and you can reach out. If you face any difficulty, we can have a look together:)

Elisheba Asiimwe Joanita (10:34:03) (in thread): > Okay, I will be reaching out

Muluh (10:34:32) (in thread): > Exercises 19.1 and co are found here: https://microbiome.github.io/OMA/exercises.html - Attachment (microbiome.github.io): Chapter 19 Exercises | Orchestrating Microbiome Analysis > Chapter 19 Exercises | Orchestrating Microbiome Analysis

Elisheba Asiimwe Joanita (10:35:56) (in thread): > Okay

Aliyu Atiku Mustapha (10:36:09): > @Aliyu Atiku Mustapha has left the channel

Emmanuel Taiwo devbird007 (10:50:06): > @Emmanuel Taiwo devbird007 has joined the channel

Juanjuan Song (12:08:25): > @Juanjuan Song has joined the channel

Tuomas Borman (15:23:08): > Hello, > > miaverse has already got nice contributions, thanks! As you know, miaverse consists of multiple packages. All those package repositories can be found here: https://github.com/microbiome > > However, the website also contains other repos that are not in miaverse or part of this Outreachy project. > > –> You can contribute to all package repos that start with mia (mia, miaViz, miaTime, miaSim)

K Nodia (20:46:18): > Help: Hello everyone; I have been facing this issue, what may be wrong? I tried to install mia but was not successful > > Warning in install.packages : > package 'mia' is not available for this version of R > > A version of this package for your version of R might be available elsewhere, > see the ideas at https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages > > I am working on the exercises and this is 19.2.1 - File (PNG): Screenshot from 2023-03-15 03-36-41.png

Andres Wokaty (21:18:46) (in thread): > BiocManager::install() is usually the recommended way to install Bioconductor packages. Try using it to install mia?

Muluh (21:37:47) (in thread): > What version of R and R studio are you using?

2023-03-15

K Nodia (03:41:12) (in thread): > [1] “R version 4.2.2 Patched (2022-11-10 r83330)” that s the version i m using

Leo Lahti (04:08:35) (in thread): > Hmm, for me mia is available for R-4.2.2. What installation command have you been using?

2023-03-16

Elisheba Asiimwe Joanita (03:45:53) (in thread): > Hey, hope you are well. I was just wondering where we do parts 1, 2 and 3 of the exercise, or do we just skip to the chapter 19 exercises?

Tuomas Borman (03:51:31) (in thread): > Hi, do them all in the same single quarto/Rmd/R file > > So one file should contain all these small tasks –> then send it to me: tvborm@utu.fi

Elisheba Asiimwe Joanita (03:59:16) (in thread): > okay thank you

K Nodia (14:19:07) (in thread): > I am going to reinstall everything from scratch, hope it will help

K Nodia (16:10:14) (in thread): > I have tried to uninstall and reinstall R but no change. Any idea of something else I should look at? When I use BiocManager::install() it starts and downloads all the packages, but at the end I get this error: > installation of package 'Biostrings' had non-zero exit status > 7: In install.packages(…) : > installation of package ‘SingleCellExperiment’ had non-zero exit status > 8: In install.packages(…) : > installation of package ‘MultiAssayExperiment’ had non-zero exit status > 9: In install.packages(…) : > installation of package ‘TreeSummarizedExperiment’ had non-zero exit status > 10: In install.packages(…) : > installation of package ‘scuttle’ had non-zero exit status > 11: In install.packages(…) : > installation of package ‘scater’ had non-zero exit status > 12: In install.packages(…) : > installation of package ‘mia’ had non-zero exit status :upside_down_face:

Andres Wokaty (17:49:59) (in thread): > It seems like a missing dependency somewhere. I would try installing each of these dependencies separately, starting with Biostrings, as in BiocManager::install("BioStrings"). If you get an error, I would share all the results so we can look through it together. It also might be helpful to know your operating system.

2023-03-17

K Nodia (00:56:02) (in thread): > Thank you for the help. I'm using Ubuntu as an OS, and the above suggestion gives me this error output: > Installing package(s) ‘BioStrings’ > Warning messages: > 1: package 'BioStrings' is not available for Bioconductor version '3.16' > > A version of this package for your version of R might be available elsewhere, > see the ideas at https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages > 2: Perhaps you meant 'Biostrings' ?

Vivian Ginikachukwu Ikeh (02:18:32): > @Vivian Ginikachukwu Ikeh has joined the channel

Vivian Ginikachukwu Ikeh (02:28:33): > Greetings everyone, I am Vivian, an outreachy applicant, just discovered I have been on the wrong channel all the while and just found this channel for the project. I hope to make contributions, notwithstanding the limited time. And I will be glad if anyone can put me through. Thank you

K Nodia (02:31:29): > welcome

K Nodia (02:32:20): > @Vivian Ginikachukwu Ikeh the important thing is that you understand the miaverse and the data container that it is utilizing. To start, there are some exercises and the OMA book that you can look at. Those exercises should be on the Outreachy website. After you have finished those, you can contribute to the packages (information on those is also available on the website [hopefully]) > > -From Mentor

Vivian Ginikachukwu Ikeh (02:33:02): > @K Nodia thank you

Leo Lahti (05:54:20) (in thread): > Did you try to install Biostrings instead of BioStrings?

Leo Lahti (05:54:27) (in thread): > As the message suggests.

Rishi Singh (08:55:34): > @Rishi Singh has joined the channel

2023-03-19

Reece11 (12:47:29): > @Reece11 has joined the channel

2023-03-21

Vivian Ginikachukwu Ikeh (00:28:45): > Hi everyone, I am having issues installing the package

Vivian Ginikachukwu Ikeh (00:55:42): > Done.

K Nodia (17:46:01) (in thread): > I'm sorry for replying late, I had some issues with my PC

K Nodia (17:55:46) (in thread): > It is still refusing. Would you please assist me with some guidance? Maybe I am going wrong somewhere in the steps?

Andres Wokaty (17:57:20) (in thread): > Can you share the output for BiocManager::install("Biostrings")?

K Nodia (17:57:48) (in thread): > yes in few minutes

K Nodia (18:00:39) (in thread): > > BiocManager::install("Biostrings") > 'getOption("repos")' replaces Bioconductor standard repositories, see > 'help("repositories", package = "BiocManager")' for details. > Replacement repositories: > CRAN: https://cloud.r-project.org > Bioconductor version 3.16 (BiocManager 1.30.20), R 4.2.2 Patched (2022-11-10 r83330) > Installing package(s) 'Biostrings' > also installing the dependencies 'RCurl', 'GenomeInfoDb' > > trying URL 'https://cloud.r-project.org/src/contrib/RCurl_1.98-1.10.tar.gz' > Content type 'application/x-gzip' length 731446 bytes (714 KB) > ================================================== > downloaded 714 KB > > trying URL 'https://bioconductor.org/packages/3.16/bioc/src/contrib/GenomeInfoDb_1.34.9.tar.gz' > Content type 'application/x-gzip' length 3474381 bytes (3.3 MB) > ================================================== > downloaded 3.3 MB > > trying URL 'https://bioconductor.org/packages/3.16/bioc/src/contrib/Biostrings_2.66.0.tar.gz' > Content type 'application/x-gzip' length 12426149 bytes (11.9 MB) > ================================================== > downloaded 11.9 MB > > * installing *source* package 'RCurl' ... > ** package 'RCurl' successfully unpacked and MD5 sums checked > ** using staged installation > checking for curl-config... no > Cannot find curl-config > ERROR: configuration failed for package 'RCurl' > * removing '/usr/local/lib/R/site-library/RCurl' > ERROR: dependency 'RCurl' is not available for package 'GenomeInfoDb' > * removing '/usr/local/lib/R/site-library/GenomeInfoDb' > ERROR: dependency 'GenomeInfoDb' is not available for package 'Biostrings' > * removing '/usr/local/lib/R/site-library/Biostrings' > > The downloaded source packages are in > '/tmp/RtmpRa7c5u/downloaded_packages' > Warning messages: > 1: In install.packages(...) : > installation of package 'RCurl' had non-zero exit status > 2: In install.packages(...) : > installation of package 'GenomeInfoDb' had non-zero exit status > 3: In install.packages(...) : > installation of package 'Biostrings' had non-zero exit status > > >

K Nodia (18:02:14) (in thread): > @Andres Wokaty Here it is above

Andres Wokaty (18:03:03) (in thread): > Were you able to install any packages?

K Nodia (18:03:59) (in thread): > I was doing the OMA exercises, and when I wanted to install mia, that's where the issues started

K Nodia (18:04:23) (in thread): > I have not yet installed any

Andres Wokaty (18:04:41) (in thread): > Can you show me ls -la /usr/local/lib/R/site-library/ in your terminal?

Andres Wokaty (18:05:25) (in thread): > I want to see if you have any packages in there. That’s where R packages are usually installed

K Nodia (18:06:00) (in thread): > Oooh, there are a lot: 492

Andres Wokaty (18:06:59) (in thread): > Can you show me a few?

K Nodia (18:07:24) (in thread): > > $ ls -la /usr/local/lib/R/site-library/ > total 492 > drwxr-xr-x 123 root root 4096 Mar 22 00:59 . > drwxr-xr-x 3 root root 4096 Mar 10 14:22 .. > drwxr-xr-x 9 root root 4096 Mar 14 11:44 ape > drwxr-xr-x 7 root root 4096 Mar 16 22:21 askpass > drwxr-xr-x 12 root root 4096 Mar 14 11:53 beachmat > drwxr-xr-x 8 root root 4096 Mar 14 11:37 beeswarm > drwxr-xr-x 6 root root 4096 Mar 14 11:40 BH > drwxr-xr-x 14 root root 4096 Mar 14 11:43 Biobase > drwxr-xr-x 9 root root 4096 Mar 14 11:39 BiocBaseUtils > drwxr-xr-x 7 root root 4096 Mar 14 11:40 BiocGenerics > drwxr-xr-x 7 root root 4096 Mar 16 22:21 BiocIO > drwxr-xr-x 7 root root 4096 Mar 16 22:13 BiocManager > drwxr-xr-x 9 root root 4096 Mar 14 11:54 BiocNeighbors > drwxr-xr-x 10 root root 4096 Mar 22 00:51 BiocParallel > drwxr-xr-x 8 root root 4096 Mar 14 11:55 BiocSingular > drwxr-xr-x 5 root root 4096 Mar 16 22:13 BiocVersion > drwxr-xr-x 8 root root 4096 Mar 14 11:36 bit > drwxr-xr-x 10 root root 4096 Mar 14 11:41 bit64 > drwxr-xr-x 7 root root 4096 Mar 14 11:36 bitops > drwxr-xr-x 6 root root 4096 Mar 22 00:51 blob > drwxr-xr-x 7 root root 4096 Mar 14 11:40 cachem > drwxr-xr-x 11 root root 4096 Mar 14 11:40 cli > drwxr-xr-x 13 root root 4096 Mar 14 11:35 colorspace > drwxr-xr-x 8 root root 4096 Mar 14 11:40 cpp11 > drwxr-xr-x 6 root root 4096 Mar 14 11:39 crayon > drwxr-xr-x 7 root root 4096 Mar 14 11:39 DBI > drwxr-xr-x 7 root root 4096 Mar 22 00:52 dbplyr > drwxr-xr-x 8 root root 4096 Mar 14 11:56 decontam > drwxr-xr-x 9 root root 4096 Mar 14 11:51 DelayedArray > drwxr-xr-x 7 root root 4096 Mar 14 11:55 DelayedMatrixStats > drwxr-xr-x 11 root root 4096 Mar 14 11:39 digest > drwxr-xr-x 9 root root 4096 Mar 14 11:55 dplyr > drwxr-xr-x 9 root root 4096 Mar 14 11:45 dqrng > drwxr-xr-x 7 root root 4096 Mar 16 22:20 ellipsis > drwxr-xr-x 8 root root 4096 Mar 14 11:40 fansi > drwxr-xr-x 7 root root 4096 Mar 14 11:36 farver > drwxr-xr-x 7 root root 4096 Mar 14 11:36 fastmap > 
drwxr-xr-x 7 root root 4096 Mar 16 22:21 filelock > drwxr-xr-x 7 root root 4096 Mar 22 00:52 FNN > drwxr-xr-x 9 root root 4096 Mar 14 11:36 formatR > drwxr-xr-x 6 root root 4096 Mar 14 11:45 futile.logger > drwxr-xr-x 6 root root 4096 Mar 14 11:36 futile.options > drwxr-xr-x 6 root root 4096 Mar 14 11:40 generics > drwxr-xr-x 7 root root 4096 Mar 14 11:36 GenomeInfoDbData > drwxr-xr-x 7 root root 4096 Mar 14 11:56 ggbeeswarm > drwxr-xr-x 8 root root 4096 Mar 14 11:55 ggplot2 > drwxr-xr-x 8 root root 4096 Mar 14 11:56 ggrepel > drwxr-xr-x 8 root root 4096 Mar 14 11:40 glue > drwxr-xr-x 8 root root 4096 Mar 14 11:43 gridExtra > drwxr-xr-x 7 root root 4096 Mar 22 00:52 gtable > drwxr-xr-x 6 root root 4096 Mar 22 00:52 hms > drwxr-xr-x 11 root root 4096 Mar 14 11:47 IRanges > drwxr-xr-x 8 root root 4096 Mar 14 11:36 irlba > drwxr-xr-x 9 root root 4096 Mar 14 11:36 isoband > drwxr-xr-x 8 root root 4096 Mar 14 11:36 jsonlite > drwxr-xr-x 6 root root 4096 Mar 14 11:36 labeling > drwxr-xr-x 6 root root 4096 Mar 14 11:41 lambda.r > drwxr-xr-x 8 root root 4096 Mar 14 11:36 lazyeval > drwxr-xr-x 7 root root 4096 Mar 14 11:44 lifecycle > drwxr-xr-x 8 root root 4096 Mar 14 11:40 magrittr > drwxr-xr-x 6 root root 4096 Mar 14 11:43 MatrixGenerics > drwxr-xr-x 9 root root 4096 Mar 14 11:40 matrixStats > drwxr-xr-x 6 root root 4096 Mar 14 11:45 memoise > drwxr-xr-x 7 root root 4096 Mar 16 22:21 mime > drwxr-xr-x 7 root root 4096 Mar 14 11:40 munsell > drwxr-xr-x 8 root root 4096 Mar 16 22:21 openssl > drwxr-xr-x 8 root root 4096 Mar 14 11:39 permute > drwxr-xr-x 6 root root 4096 Mar 14 11:50 pheatmap > drwxr-xr-x 7 root root 4096 Mar 14 11:50 pillar > drwxr-xr-x 6 root root 4096 Mar 14 11:40 pkgconfig > drwxr-xr-x 7 root root 4096 Mar 14 11:36 plogr > drwxr-xr-x 8 root root 4096 Mar 14 11:41 plyr > drwxr-xr-x 8 root root 4096 Mar 14 11:37 png > drwxr-xr-x 6 root root 4096 Mar 16 22:21 prettyunits > drwxr-xr-x 7 root root 4096 Mar 16 22:22 progress > drwxr-xr-x 8 root root 4096 Mar 
14 11:50 purrr > drwxr-xr-x 6 root root 4096 Mar 14 11:40 R6 > drwxr-xr-x 7 root root 4096 Mar 14 11:49 ragg > drwxr-xr-x 7 root root 4096 Mar 16 22:21 rappdirs > drwxr-xr-x 6 root root 4096 Mar 14 11:40 RColorBrewer > drwxr-xr-x 16 root root 4096 Mar 14 11:39 Rcpp > drwxr-xr-x 12 root root 4096 Mar 14 11:43 RcppAnnoy > drwxr-xr-x 12 root root 4096 Mar 14 11:42 RcppEigen > drwxr-xr-x 8 root root 4096 Mar 14 11:42 RcppHNSW > drwxr-xr-x 9 root root 4096 Mar 14 11:46 RcppML > drwxr-xr-x 7 root root 4096 Mar 14 11:37 RcppProgress > drwxr-xr-x 8 root root 4096 Mar 14 11:51 reshape2 > drwxr-xr-x 10 root root 4096 Mar 16 22:20 rjson > drwxr-xr-x 7 root root 4096 Mar 16 21:31 rlang > drwxr-xr-x 9 root root 4096 Mar 14 11:52 RSQLite > drwxr-xr-x 7 root root 4096 Mar 14 11:37 rsvd > drwxr-xr-x 7 root root 4096 Mar 14 11:44 Rtsne > drwxr-xr-x 10 root root 4096 Mar 14 11:45 S4Vectors > drwxr-xr-x 7 root root 4096 Mar 14 11:51 ScaledMatrix > drwxr-xr-x 6 root root 4096 Mar 14 11:45 scales > drwxr-xr-x 9 root root 4096 Mar 14 11:41 sitmo > drwxr-xr-x 6 root root 4096 Mar 14 11:39 snow > drwxr-xr-x 8 root root 4096 Mar 14 11:46 sparseMatrixStats > drwxr-xr-x 8 root root 4096 Mar 14 11:39 stringi > drwxr-xr-x 9 root root 4096 Mar 14 11:50 stringr > drwxr-xr-x 7 root root 4096 Mar 16 22:20 sys > drwxr-xr-x 9 root root 4096 Mar 14 11:41 systemfonts > drwxr-xr-x 9 root root 4096 Mar 14 11:45 textshaping > drwxr-xr-x 8 root root 4096 Mar 22 00:52 tibble > drwxr-xr-x 9 root root 4096 Mar 14 11:56 tidyr > drwxr-xr-x 7 root root 4096 Mar 14 11:50 tidyselect > drwxr-xr-x 7 root root 4096 Mar 14 11:56 tidytree > drwxr-xr-x 8 root root 4096 Mar 14 11:56 treeio > drwxr-xr-x 6 root root 4096 Mar 10 14:24 txtplot > drwxr-xr-x 8 root root 4096 Mar 14 11:37 utf8 > drwxr-xr-x 8 root root 4096 Mar 14 11:50 uwot > drwxr-xr-x 9 root root 4096 Mar 16 21:32 vctrs > drwxr-xr-x 9 root root 4096 Mar 14 11:44 vegan > drwxr-xr-x 8 root root 4096 Mar 14 11:37 vipor > drwxr-xr-x 8 root root 4096 Mar 14 11:56 
viridis > drwxr-xr-x 6 root root 4096 Mar 14 11:37 viridisLite > drwxr-xr-x 7 root root 4096 Mar 14 11:36 withr > drwxr-xr-x 10 root root 4096 Mar 22 00:52 XML > drwxr-xr-x 10 root root 4096 Mar 16 22:21 xml2 > drwxr-xr-x 9 root root 4096 Mar 14 11:49 XVector > drwxr-xr-x 8 root root 4096 Mar 16 22:21 yaml > drwxr-xr-x 6 root root 4096 Mar 14 11:36 yulab.utils > drwxr-xr-x 9 root root 4096 Mar 14 11:36 zlibbioc >

K Nodia (18:08:01) (in thread): > the above is the output of the command

Andres Wokaty (18:10:08) (in thread): > What is your user? It looks like everything is owned by root. I am guessing when you run R that you are not root so it isn’t allowing you to install. I would probably change the group permissions to allow writing on that directory and change the group ownership to your user.

K Nodia (18:12:05) (in thread): > I'm running R as root: sudo -i R or sudo R

Andres Wokaty (18:13:23) (in thread): > ok, probably in the future you want to consider not running as root :slightly_smiling_face:

K Nodia (18:13:42) (in thread): > okay Thanks let me do it

Andres Wokaty (18:15:15) (in thread): > if you do that, make sure you change the group and permissions so that your user can alter the site-library

K Nodia (18:25:57) (in thread): > I have just changed them but without success, still the same errors

K Nodia (18:27:08) (in thread): > Should I delete each and everything again and then start a new setup?
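The group and permission change suggested here could look roughly like the sketch below. The helper name `grant_group_write` is hypothetical, and the group passed to it is an assumption to adapt to your own user; on a root-owned directory such as the site-library the `chgrp` and `chmod` calls need to be run with sudo rights.

```shell
# Hypothetical helper: hand a library directory over to a group and
# make it (and everything below it) writable for that group.
grant_group_write() {
    lib_dir=$1   # directory holding installed R packages
    group=$2     # group that should be allowed to write there
    chgrp -R "$group" "$lib_dir" && chmod -R g+w "$lib_dir"
}

# Example call (path taken from the thread; commented out because it needs root):
# grant_group_write /usr/local/lib/R/site-library "$(id -gn)"
```

Afterwards, `ls -ld` on the directory should show a `w` in the group part of the permission string.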

Andres Wokaty (18:28:19) (in thread): > When you say a new set up, are you talking about R?

K Nodia (18:28:54) (in thread): > deleting R and Rstudio

K Nodia (18:29:07) (in thread): > and anything related to the installation

Andres Wokaty (18:29:31) (in thread): > let’s try to work on the permissions first

Andres Wokaty (18:29:59) (in thread): > In R, can you show the results of .libPaths()? This shows you the library paths where R tries to install. It should be site-library

Andres Wokaty (18:30:55) (in thread): > After that, can you show me again a few entries from your site-library so I can see how you changed the permissions and ownership to your user?

K Nodia (18:31:59) (in thread): > okay

K Nodia (18:33:16) (in thread): > > .libPaths() > [1] "/home/kk/R/x86_64-pc-linux-gnu-library/4.2" > [2] "/usr/local/lib/R/site-library" > [3] "/usr/lib/R/site-library" > [4] "/usr/lib/R/library"
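By default R installs into the first entry of `.libPaths()`, so it can help to see which of these directories exist and are writable for the current user. A minimal sketch from the terminal, where `check_lib_paths` is a hypothetical helper and the paths are simply copied from the output above:

```shell
# Hypothetical helper: for each library path given, report whether it
# exists and whether the current user may write to it.
check_lib_paths() {
    for lib_dir in "$@"; do
        if [ ! -d "$lib_dir" ]; then
            echo "missing: $lib_dir"
        elif [ -w "$lib_dir" ]; then
            echo "writable: $lib_dir"
        else
            echo "read-only: $lib_dir"
        fi
    done
}

# Paths copied from the .libPaths() output in the thread
check_lib_paths \
    "/home/kk/R/x86_64-pc-linux-gnu-library/4.2" \
    "/usr/local/lib/R/site-library" \
    "/usr/lib/R/site-library" \
    "/usr/lib/R/library"
```

Note that when run as root every existing directory is reported as writable, so the check is only informative for a normal user account.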

K Nodia (18:33:27) (in thread): > the output

K Nodia (18:35:16) (in thread): > Entries in site-library: drwxrwxrwx 9 root kk 4096 Mar 16 21:32 vctrs``drwxrwxrwx 9 root kk 4096 Mar 14 11:44 vegan``drwxrwxrwx 8 root kk 4096 Mar 14 11:37 vipor``drwxrwxrwx 8 root kk 4096 Mar 14 11:56 viridis``drwxrwxrwx 6 root kk 4096 Mar 14 11:37 viridisLite``drwxrwxrwx 7 root kk 4096 Mar 14 11:36 withr``drwxrwxrwx 10 root kk 4096 Mar 22 00:52 XML``drwxrwxrwx 10 root kk 4096 Mar 16 22:21 xml2``drwxrwxrwx 9 root kk 4096 Mar 14 11:49 XVector``drwxrwxrwx 8 root kk 4096 Mar 16 22:21 yaml``drwxrwxrwx 6 root kk 4096 Mar 14 11:36 yulab.utils``drwxrwxrwx 9 root kk 4096 Mar 14 11:36 zlibbioc

Andres Wokaty (18:35:48) (in thread): > How about for the first path that starts with /home/kk?

K Nodia (18:36:15) (in thread): > ls -la /usr/local/lib/R/site-library/ > total 492 > drwxrwxrwx 123 root kk 4096 Mar 22 00:59 . > drwxr-xr-x 3 root root 4096 Mar 10 14:22 .. > drwxrwxrwx 9 root kk 4096 Mar 14 11:44 ape > drwxrwxrwx 7 root kk 4096 Mar 16 22:21 askpass > drwxrwxrwx 12 root kk 4096 Mar 14 11:53 beachmat

K Nodia (18:36:28) (in thread): > same

Andres Wokaty (18:37:23) (in thread): > are you running ubuntu?

K Nodia (18:37:54) (in thread): > yes

K Nodia (18:39:07) (in thread): > kk@rdm:~$ lsb_release -a``No LSB modules are available.``Distributor ID: Ubuntu``Description: Ubuntu 22.04.2 LTS``Release: 22.04``Codename: jammy

Andres Wokaty (18:41:06) (in thread): > When you tried installing, were you using R from the terminal or in RStudio?

K Nodia (18:41:23) (in thread): > R in terminal

Andres Wokaty (18:42:20) (in thread): > Can you try BiocManager::install("RCurl")?

K Nodia (18:43:14) (in thread): > okay

Andres Wokaty (18:43:59) (in thread): > You were able to install packages previously, so something happened

K Nodia (18:43:59) (in thread): > > BiocManager::install("RCurl")``'getOption("repos")' replaces Bioconductor standard repositories, see``'help("repositories", package = "BiocManager")' for details.``Replacement repositories:`` CRAN: https://cloud.r-project.org``Bioconductor version 3.16 (BiocManager 1.30.20), R 4.2.2 Patched (2022-11-10`` r83330)``Installing package(s) 'RCurl'``trying URL 'https://cloud.r-project.org/src/contrib/RCurl_1.98-1.10.tar.gz'``Content type 'application/x-gzip' length 731446 bytes (714 KB)``==================================================``downloaded 714 KB``* installing **source** package 'RCurl' ...``**** package 'RCurl' successfully unpacked and MD5 sums checked``**** using staged installation``checking for curl-config... no``Cannot find curl-config``ERROR: configuration failed for package 'RCurl'``* removing '/home/kk/R/x86_64-pc-linux-gnu-library/4.2/RCurl'``The downloaded source packages are in`` '/tmp/RtmpN4doOr/downloaded_packages'``Warning message:``In install.packages(...) :`` installation of package 'RCurl' had non-zero exit status``>

K Nodia (18:44:11) (in thread): > that's the error output

Andres Wokaty (18:44:45) (in thread): > Can you try this way: install.packages('RCurl', repos = "https://cran.r-project.org")?

K Nodia (18:46:18) (in thread): > > install.packages('RCurl', repos = "https://cran.r-project.org")``Installing package into '/home/kk/R/x86_64-pc-linux-gnu-library/4.2'``(as 'lib' is unspecified)``trying URL 'https://cran.r-project.org/src/contrib/RCurl_1.98-1.10.tar.gz'``Content type 'application/x-gzip' length 731446 bytes (714 KB)``==================================================``downloaded 714 KB``* installing **source** package 'RCurl' ...``**** package 'RCurl' successfully unpacked and MD5 sums checked``**** using staged installation``checking for curl-config... no``Cannot find curl-config``ERROR: configuration failed for package 'RCurl'``* removing '/home/kk/R/x86_64-pc-linux-gnu-library/4.2/RCurl'``The downloaded source packages are in`` '/tmp/RtmpN4doOr/downloaded_packages'``Warning message:``In install.packages("RCurl", repos = "https://cran.r-project.org") :`` installation of package 'RCurl' had non-zero exit status``>

Andres Wokaty (18:46:27) (in thread): > Maybe you need to check if libcurl is installed: https://techoverflow.net/2020/04/22/how-to-fix-rcurl-cannot-find-curl-config-or-checking-for-curl-config-no/ - Attachment (TechOverflow): How to fix RCurl ‘Cannot find curl-config’ or ‘checking for curl-config… no’ - TechOverflow > Problem: You want to install RCurl using Rscript -e “install.packages(‘RCurl’)” but you see an error message like Installing package into ‘/usr/local/lib/R/site-library’ (as ‘lib’ is unspecified) trying URL ‘https://cloud.r-project.org/src/contrib/RCurl_1.98-1.2.tar.gz’ Content type ‘application/x-gzip’ length 699583 bytes (683 KB) ================================================== downloaded 683 KB * installing source package ‘RCurl’ … **** package ‘RCurl’ successfully unpacked and MD5 sums checked Continue reading →

K Nodia (18:47:16) (in thread): > let me check it out

Andres Wokaty (18:47:46) (in thread): > You can also try sudo apt search libcurl4 to see if it is listed as installed

K Nodia (18:50:28) (in thread): > It seems to accept installing RCurl now

Andres Wokaty (18:50:43) (in thread): > Try to install Biostrings with BiocManager

K Nodia (18:50:44) (in thread): > **** building package indices > **** testing if installed package can be loaded from temporary location > **** checking absolute paths in shared objects and dynamic libraries > **** testing if installed package can be loaded from final location > **** testing if installed package keeps a record of temporary installation path > * DONE (RCurl) > > The downloaded source packages are in > ‘/tmp/RtmpAyCHJB/downloaded_packages’

Andres Wokaty (18:51:41) (in thread): > Sometimes you have to look at the package documentation to see if it lists any SystemRequirements, because you have to install those separately.
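
SystemRequirements is a field in a package's DESCRIPTION file, so it can be read with base R's `read.dcf()`. The sketch below assumes you have a DESCRIPTION file at hand, for example from an unpacked source tarball; the helper name `read_system_requirements` and the package `examplepkg` are made up for illustration.

```r
# Read the SystemRequirements field from a DESCRIPTION file.
# read.dcf() returns NA for fields that are not present.
read_system_requirements <- function(description_file) {
  fields <- read.dcf(description_file,
                     fields = c("Package", "SystemRequirements"))
  setNames(fields[, "SystemRequirements"], fields[, "Package"])
}

# Illustrative DESCRIPTION for a made-up package (not a real one).
desc <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: examplepkg",
  "Version: 0.1.0",
  "SystemRequirements: libcurl"
), desc)
read_system_requirements(desc)

# For an installed package, the DESCRIPTION ships with it.
# 'stats' declares no system requirements, so this gives NA.
read_system_requirements(system.file("DESCRIPTION", package = "stats"))
```

The field is free text written by the package author, so the name of the matching system package (for example on Ubuntu) still has to be looked up separately.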

K Nodia (18:53:29) (in thread): > BiocManager::install("Biostrings") Accepted also: **** testing if installed package keeps a record of temporary installation path > * DONE (Biostrings) > > The downloaded source packages are in > ‘/tmp/RtmpAyCHJB/downloaded_packages’ > >

Andres Wokaty (18:53:42) (in thread): > Nice. Try installing mia.

K Nodia (18:59:48) (in thread): > trying and ongoing

K Nodia (19:02:29) (in thread): > Warning messages:``1: In install.packages(...) :`` installation of package 'Cairo' had non-zero exit status``2: In install.packages(...) :`` installation of package 'DirichletMultinomial' had non-zero exit status``3: In install.packages(...) :`` installation of package 'ggrastr' had non-zero exit status``4: In install.packages(...) :`` installation of package 'scater' had non-zero exit status``5: In install.packages(...) :`` installation of package 'mia' had non-zero exit status``> Only 5 in the warning message; some other packages were showing as done

Andres Wokaty (19:03:56) (in thread): > Probably some were installed, but these weren’t. I would look at their landing pages to see if there are any missing system dependencies.

K Nodia (19:04:30) (in thread): > yeah some were installed

Andres Wokaty (19:04:38) (in thread): > Cairo is probably a CRAN package. I don’t know ggrastr, but I think the others are Bioconductor packages.

K Nodia (19:05:40) (in thread): > Thank you for the assistance, I really appreciate it

Andres Wokaty (19:07:42) (in thread): > Cairo has a system dependency: https://cran.r-project.org/web/packages/Cairo/index.html. DirichletMultinomial has another: https://www.bioconductor.org/packages/release/bioc/html/DirichletMultinomial.html. I’ll leave the others for you to check out. - Attachment (cran.r-project.org): Cairo: R Graphics Device using Cairo Graphics Library for Creating High-Quality Bitmap (PNG, JPEG, TIFF), Vector (PDF, SVG, PostScript) and Display (X11 and Win32) Output > R graphics device using cairographics library that can be used to create high-quality vector (PDF, PostScript and SVG) and bitmap output (PNG,JPEG,TIFF), and high-quality rendering in displays (X11 and Win32). Since it uses the same back-end for all output, copying across formats is WYSIWYG. Files are created without the dependence on X11 or other external programs. This device supports alpha channel (semi-transparent drawing) and resulting images can contain transparent and semi-transparent regions. It is ideal for use in server environments (file output) and as a replacement for other devices that don’t have Cairo’s capabilities such as alpha support or anti-aliasing. Backends are modular such that any subset of backends is supported. - Attachment (Bioconductor): DirichletMultinomial > Dirichlet-multinomial mixture models can be used to describe variability in microbial metagenomic data. This package is an interface to code originally made available by Holmes, Harris, and Quince, 2012, PLoS ONE 7(2): 1-15, as discussed further in the man page for this package, ?DirichletMultinomial.

K Nodia (19:09:31) (in thread): > on it

2023-03-22

Leo Lahti (10:32:42) (in thread): > Sorry about all this; the dependencies always cause trouble when starting for the first time.

Leo Lahti (10:33:11) (in thread): > If there are any ideas on how to improve the documentation or other, this will be very welcome. We have tried to reduce dependencies.

2023-03-23

Vivian Ginikachukwu Ikeh (09:33:02): > Hello @Tuomas Borman @Leo Lahti, what am I expected to do next, please? I got a notification to fix an error

Leo Lahti (09:44:23) (in thread): > Can you check the error message in the PR, and then try to fix the error?

Leo Lahti (09:45:08) (in thread): > After fixing, run devtools::check() and if it goes through all right you can commit + push your changes to the same branch, then they will become part of an updated PR.

Vivian Ginikachukwu Ikeh (09:45:38) (in thread): > Okay > Let me do that, thank you

2023-03-24

Vivian Ginikachukwu Ikeh (10:39:50): > Hi @Leo Lahti, so I have been trying to fix the error since yesterday and noticed R wasn’t accessing it; I keep getting an error message. Is there another way I could approach it, or did I do something wrong when cloning the master? For some unknown reason my R version isn’t recognizing the Rmd file. What should I do please?

Leo Lahti (12:26:52): > Can you report more exactly how the problem can be reproduced? Which repository did you clone and which command did you run?

Vivian Ginikachukwu Ikeh (15:11:40): > I cloned https://github.com/microbiome/miaViz and ran devtools::check().

2023-03-25

Muluh (04:10:23): > can you send a screenshot of the error?

Muluh (04:11:40): > also when you run devtools::check() are you prompted to install some packages like in the attached image? - File (PNG): miaviz1.PNG

Muluh (05:48:53): > hello @Vivian Ginikachukwu Ikeh I’m not quite sure which error you have, but make sure you have done the following before running devtools::check(): install all dependencies including miaTime, and install the Rtools version corresponding to your R version. To check your R version, just go to Tools -> Global Options in RStudio. Rtools is needed since devtools::check() updates the documentation, then builds and checks the miaViz package locally. I think it should work fine if you do these steps :)

Vivian Ginikachukwu Ikeh (12:10:58): > Alright, I think I didn’t install all that. I will try it now

2023-03-29

Leo Lahti (15:04:23): > https://forms.gle/n1mjjzGwAtNssX7i7 - Attachment (Google Docs): Call for Community-Led Events > Do you want to find and connect with Bioconductor users and developers who share similar interests to you? > The Bioconductor Community Advisory Board are calling on new and experienced Bioconductor users and developers to suggest ideas and host (or co-host) community events with our support. > > We would like to enable Bioconductor community-led events that are engaging and open to all. Have you creative ideas you think will engage Bioconductor users or developers? Would you like to host (or co-host) a Bioconductor webinar series, vignette presentation by new package developers, journal club, Discord, Twitch, TikToks, Reddit Ask Me Anything sessions, viewing parties for Bioconductor workshops or social Bioconductor discussions? If so, we would love you to get involved! > > Why get involved? > As a host you’ll be empowered to create amazing opportunities for members to connect. You’ll also raise your profile and obtain valuable leadership experience to add to your CV. > > How do you apply? > Complete this form if you are interested to contribute events for the Bioconductor community! > > Events must abide by the Bioconductor Code of Conduct

Leo Lahti (15:05:06): > Would this be something for the microbiome community, if a coordinator could be found?

2023-03-30

Tuomas Borman (02:53:31) (in thread): > worth to consider

2023-03-31

Saksham Gupta (06:39:12): > @Saksham Gupta has joined the channel

2023-04-01

Muluh (03:44:14): > Hello, Please have a look at https://github.com/microbiome/mia/pull/349. The check still fails on test and I can’t figure out the bug. Also I would like to know if the logical difference between skip_if_not_installed(****************) and skip_if_not(require(*****************) could have a damaging effect on tests? - Attachment: #349 Skip failed test due to absence of dependency package > :bug: Fix test run to avoid failed unit test as a result of dependency package not installed.
> - Before, unit test fails when dependency package is not installed
> - now, unit test skips if dependency package is not installed
> Fix typo in unit test file.
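
On the `skip_if_not_installed()` versus `skip_if_not(require())` question: as documented in testthat, `skip_if_not_installed()` checks with `requireNamespace()`, which loads a package without attaching it, whereas `require()` also puts the package on the search path as a side effect. The base-R sketch below shows only that difference; `splines` is used purely as a stand-in for an optional dependency because it ships with R.

```r
pkg <- "splines"  # stand-in for an optional dependency; ships with R
is_attached <- function(p) paste0("package:", p) %in% search()

attached_before <- is_attached(pkg)

# An "is it installed?" check: loads the namespace at most, never attaches it.
available <- requireNamespace(pkg, quietly = TRUE)
attached_after_check <- is_attached(pkg)

# require() additionally puts the package on the search path.
loaded <- suppressPackageStartupMessages(require(pkg, character.only = TRUE))
attached_after_require <- is_attached(pkg)

print(c(before        = attached_before,
        after_check   = attached_after_check,
        after_require = attached_after_require))
```

Once a package is attached, later tests in the same session can call its functions and load its datasets unqualified, so a test may pass or fail depending on what ran before it. That is the same kind of order dependence as two packages shipping a dataset with the same name.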

Leo Lahti (03:45:34): > Hi - I am not sure but I noticed that there is something on “stop(gettextf("invalid call in method dispatch to '%s' (no default method)", "estimateDiversity"), domain = NA))(x, assay.type, assay_name, index, name, ...)), ): unable to find an inherited method for function ‘estimateDiversity’ for signature ‘"phyloseq"’”

Leo Lahti (03:46:55): > -> Now, phyloseq is another data container and a different one from TreeSummarizedExperiment - it seems to me that somewhere there are calls to phyloseq functions and/or data sets

Leo Lahti (03:48:12): > It is possible that a loaded example data set is available from both phyloseq and one of the mia packages. This could be avoided with something like data(peerj13075, package="mia")

Leo Lahti (03:48:23): > i.e. when loading example data, specify the package, too.

Leo Lahti (03:49:22): > You could add this change in the R/ files and unit tests (in tests/ folder) and in vignettes/ Rmd files

Muluh (04:55:23): > Thank you @Leo Lahti. While doing the changes, I noticed it’s not obvious how to identify the package from which the dataset is to be loaded. How can I know the package? For example, phyloseq and mia both have the dataset GlobalPatterns. How do I know when to use mia/phyloseq?

Leo Lahti (06:18:01): > If you want to use phyloseq objects then do > data(GlobalPatterns, package="phyloseq") > > If you want to use TreeSummarizedExperiment objects then do > data(GlobalPatterns, package="mia") > > However notice that some of the TreeSE example data sets are also in packages miaViz, miaTime, miaSim etc.

Leo Lahti (06:18:25): > So just explicitly tell the package when you are loading the data. That is a good practice anyway I guess.
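
A small sketch of loading a dataset from an explicitly named package into its own environment, so the result does not depend on which packages happen to be attached. The helper `load_dataset` is hypothetical; it only wraps `utils::data()` with its `list`, `package` and `envir` arguments. The mia call is guarded because the package may not be installed.

```r
# Load a named dataset from a named package into its own environment and
# return it, so nothing in the global workspace is overwritten or masked.
load_dataset <- function(name, package) {
  if (!requireNamespace(package, quietly = TRUE)) {
    stop("Package '", package, "' is not installed")
  }
  env <- new.env()
  utils::data(list = name, package = package, envir = env)
  get(name, envir = env)
}

# Runs anywhere: 'datasets' ships with R.
iris_df <- load_dataset("iris", "datasets")

# The case from the thread, guarded because mia may not be installed.
if (requireNamespace("mia", quietly = TRUE)) {
  tse <- load_dataset("GlobalPatterns", "mia")
  print(class(tse))
}
```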

Leo Lahti (06:18:32): > (although not very common)

Tuomas Borman (06:33:45) (in thread): > Or maybe only for those cases when needed? > > In most cases data(GlobalPatterns) should work because phyloseq is used only internally in miaverse and phyloseq is not loaded into session

Muluh (07:02:05): > Awesome, the bug is flushed. The check runs fine. https://github.com/microbiome/mia/pull/349 - Attachment: #349 Skip failed test due to absence of dependency package > :bug: Fix test run to avoid failed unit test as a result of dependency package not installed.
> - Before, unit test fails when dependency package is not installed
> - now, unit test skips if dependency package is not installed
> Fix typo in unit test file.

Muluh (07:11:59): > I have a question @Leo Lahti @Tuomas Borman. I noticed that during dataset loading in miaverse packages, we sometimes use, for example, data(GlobalPatterns***************) and data(“GlobalPatterns”**************)… Is there a particular reason for the difference? If not, can we harmonise the loads?

Leo Lahti (07:46:04) (in thread): > No reason, would be nice to harmonize.

Leo Lahti (07:46:14) (in thread): > I prefer without quotes since obviously they are unnecessary

Leo Lahti (07:47:20) (in thread): > Both ok. But who knows in which sessions these are being used by different users. If phyloseq and mia are both loaded, then it will depend on the order of loading.

Leo Lahti (07:48:20) (in thread): > I’ve bumped into this many times accidentally, not a problem since I can spot the problem when it occurs but this is not the case for everyone who copies example code and starts modifying the workflows.

Muluh (09:06:15) (in thread): > OK great. I will pull up PRs so that we land the harmonisation

Muluh (09:21:23): > PTAL https://github.com/microbiome/mia/pull/356, and https://github.com/microbiome/miaTime/pull/75 - Attachment: #356 Harmonize Dataset Load command - Attachment: #75 Harmonize dataset load command

Muluh (10:00:45): > Also, https://github.com/microbiome/miaViz/pull/74, I’m not sure but I think it fails because miaTime install has been conditioned so the vignette doesn’t build. - Attachment: #74 Harmonize Dataset Load command > :+1: Remove quotes from dataset load command as it is trivial for its purpose.

2023-04-02

khadijah (09:04:25): > @khadijah has joined the channel

2023-04-06

Muluh (04:44:10): > Hello, @Leo Lahti, @Tuomas Borman, the check still fails because of miaTime installation. How can we fix it? I have tried several changes already and nothing seems to work.

Tuomas Borman (05:03:14) (in thread): > Have you synced the branch with origin/main?

Leo Lahti (06:42:28) (in thread): > @Tuomas Borman was last checking this and running tests I think, looking fwd to his comment.

Tuomas Borman (06:53:29) (in thread): > Oh, I replied to the wrong thread… > > Have you synced the branch with origin/main? –> there was an issue with miaTime installation

Tuomas Borman (06:56:09) (in thread): > (now there is a problem with miaTime –> mia::mergeSEs but that is another story) –> I will fix as soon as possible

Tuomas Borman (06:57:07) (in thread): > (mergeSEs requires that similarly named colData columns are of the same class, which is not the case in objects when miaTime::someFunction is used)

Leo Lahti (11:46:06) (in thread): > :heavy_check_mark:

2023-04-08

Leo Lahti (10:06:35): > EuroBioC DL for abstracts, April 14: https://eurobioc2023.bioconductor.org/

Muluh (16:03:50) (in thread): > Hello@Leo Lahti@Tuomas Borman, what can I do to assist with the bug in miaTime?

2023-04-10

Leo Lahti (14:33:09): > Has anyone tested this one? https://github.com/cafferychen777/ggpicrust2

2023-04-11

Tuomas Borman (02:30:38) (in thread): > I haven’t > > (They have logo t-shirt and hat –> mia hat would be cool :D)

Tuomas Borman (02:37:59) (in thread): > Hello, the problem lies on mia::mergeSEs –> I have to check the specific problem and I will let you know

2023-04-18

Matthew Broerman (13:53:24): > @Matthew Broerman has joined the channel

Levi Waldron (16:25:04): > This Thursday from 09:00 - 12:00 NYC time is this month’s Microbiome International Forum (Atlantic edition - the Pacific edition is today 9pm NYC time). It’s free and I am one of the organizers. > * This week’s keynote is Elaine Holmes of Murdoch University, speaking on Mapping the functionality of the microbiome using metabolic phenotyping. Short talks include: > * Evolution of the breastfed infant gut microbiome across the first year of life in the BLOSOM cohort, > * Do you see what I see? Improving color accessibility and organization of microbiome data visualizations with the microshades R package, > * Genomic analysis of cultivated infant microbiomes identifies Bifidobacterium 2’-fucosyllactose utilization can be facilitated by co-existing species, > * The microbial and metabolic landscape of infant cystic fibrosis: the gut-lung axis > https://www.microbiome-vif.org/en-US/-/future-events/mvif18-prof-elaine-holmes

Leo Lahti (16:43:27): > Multi-omic data analysis with R/Bioconductor: Oulu Summer School, June 19-21, 2023 (Mon-Wed) * For program & schedule, travel tips, registration, and other details, see the course homepage https://microbiome.github.io/course_2023_oulu

2023-05-04

sophy (13:02:34): > @sophy has left the channel

2023-05-17

Artur Sannikov (02:58:54): > @Artur Sannikov has joined the channel

2023-05-23

David Mateo García (05:31:27): > Good morning. When trying to load a .biom file, the following error appears. Any ideas on how to solve it? Could it be because I updated R and RStudio to their latest versions? > > Error in biomformat::read_biom(file) : Both attempts to read input file: > biom either as JSON (BIOM-v1) or HDF5 (BIOM-v2). > Check file path, file name, file itself, then try again.

Tuomas Borman (06:24:59): > Hello, > > that comes from biomformat::read_biom (and jsonlite::fromJSON / rhdf5::h5read) https://github.com/joey711/biomformat/blob/4e13f6fb3611b715b1c5b87d62c851f65d1453b6/R/IO-methods.R#L76 hard to say what the problem is. So the file is the same? Can you give the session info?

David Mateo García (06:59:01): > It was a really stupid issue. I wrote: > se <- loadFromBiom("mydoc"), where it must be loadFromBiom(mydoc).
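
The mix-up above (a quoted name instead of the variable that holds the path) can be caught before any reader is called. This is a base-R sketch with a hypothetical helper, `check_input_file`; it is not part of mia or biomformat.

```r
# Fail early with a clear message when a path does not point to a file,
# instead of letting the reader report a generic parsing failure.
check_input_file <- function(path) {
  if (!is.character(path) || length(path) != 1L) {
    stop("'path' must be a single string")
  }
  if (!file.exists(path)) {
    stop("No such file: ", normalizePath(path, mustWork = FALSE))
  }
  invisible(path)
}

mydoc <- tempfile(fileext = ".biom")  # stand-in for the real file path
file.create(mydoc)

check_input_file(mydoc)         # the variable holds a real path: passes
try(check_input_file("mydoc"))  # the literal string "mydoc" is not a file
```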

Tuomas Borman (09:42:04): > Ahhh, well that happens > > I like easy mistakes because they are usually also fixed easily –> quick stress relief

2023-05-24

David Mateo García (05:14:00): > Sure hahaha

2023-06-01

Asiye (09:20:37): > @Asiye has joined the channel

2023-06-05

Stefanie Peschel (07:23:12): > @Stefanie Peschel has joined the channel

2023-06-06

Isabel Fernandez Escapa (14:26:28): > @Isabel Fernandez Escapa has joined the channel

2023-06-19

Ida Holopainen (11:21:35): > @Ida Holopainen has joined the channel

2023-07-25

David Mateo García (10:52:07): > Hi all! > > I’m having difficulties properly reading a .biom file with miaverse. This file was provided to us by the sequencing service. > I’ve used the ‘loadFromBiom’ function, and the file seems to load correctly, but the data is organized differently than in the examples from the OMA book (like GlobalPatterns). > Specifically, it doesn’t add the colData names or the rowData names in their place. Additionally, the taxa seem to be stored in rownames. > > Do you know how I could fix this?

David Mateo García (10:52:23): > Thanks!

Tuomas Borman (11:16:06) (in thread): > Hi, just guessing what is the exact problem – > > 1. rowData is completely empty? > 2. How about colData (colData does not include data necessarily)? > 3. And assay? Does it look correct? > > rownames can be long depending on how OTUs/ASVs are named in BIOM file. > > If the rowData does not include taxonomy ranks you can try to run this line: loadFromBiom(file, removeTaxaPrefixes = TRUE, rankFromPrefix = TRUE)

Leo Lahti (12:13:17) (in thread): > Did it work?

2023-07-26

David Mateo García (02:51:23) (in thread): > 1. No, if I open the file ‘se’, the ‘rowData names’ is empty. The ‘rownames’ display taxonomy in a strange format (instead of showing the OTU ID, it shows taxa like d__Archaea; p; o; and so on). When using the command ‘head(rowData(se))’, nothing is displayed. > 2. The command ‘head(colData(se))’ doesn’t display anything either, just like before. If I open the file ‘se’, the ‘colnames’ contain the IDs of my samples, and in ‘colData names’, there’s nothing. > 3. The counts do not appear as a count table either; instead, my samples are combined with the taxonomy (which appeared in ‘rownames’).

David Mateo García (02:51:55) (in thread): > I will try the command you suggested. Thank you.

David Mateo García (02:54:43) (in thread): > I have tried the command, and it has reported the following error: > Error in `colnames<-`(`*tmp*`, value = ranks) : > ‘value’ must be a character vector in colnames(x) <- value

Tuomas Borman (03:39:00) (in thread): > Hmmm, are there other BIOM formats… > > Does this work? Is the output correct? > library(biomformat) > obj <- biomformat::read_biom(file) > # Print version of BIOM > obj$format > counts <- as(biomformat::biom_data(obj), "matrix") > sample_data <- biomformat::sample_metadata(obj) > feature_data <- biomformat::observation_metadata(obj) > rownames(counts) <- rownames(feature_data) <- biomformat::rownames(obj) > colnames(counts) <- rownames(sample_data) <- biomformat::colnames(obj) > counts > sample_data > feature_data

David Mateo García (03:53:44) (in thread): > The code works until it reaches ‘rownames’ (counts), showing this error: > Error in `rownames<-`(`*tmp*`, value = c(“d_Archaea; p; c; o; f; g; s_”, : > attempt to set ‘rownames’ on an object with no dimensions > > The same happens with ‘colnames’ (counts), and the error is: > Error in `rownames<-`(`*tmp*`, value = c(“P21101-01”, “P21101-02”, “P21101-03”, : > attempt to set ‘rownames’ on an object with no dimensions > > ‘Counts’ displays the taxonomy, not the count table. Sample and Feature are null (it’s normal in sample because I haven’t added the metadata yet).

David Mateo García (03:54:55) (in thread): > Could it be due to the format of the biom file? Can I provide any instructions to the sequencing service to obtain the file differently?

Tuomas Borman (04:16:36) (in thread): > Here is information on the BIOM format: https://biom-format.org/documentation/format_versions/biom-2.1.html Here is a package that reads the BIOM file: https://www.bioconductor.org/packages/release/bioc/manuals/biomformat/man/biomformat.pdf (mia is using that internally) > > –> I’m not completely sure, but it seems that the BIOM file is somehow created incorrectly. biomformat::biom_data() should fetch the abundance table from the BIOM object. > > You can also try to read abundance table, taxonomy table and sample metadata manually by using biomformat package (or other way, BIOM object is just a list of lists). > > In the figure, the data slot includes the counts table, rows the taxonomy info and columns the sample metadata - File (PNG): image.png

David Mateo García (05:23:00) (in thread): > I’ll try it, thanks!

Leo Lahti (14:14:23): > Next year a similar one for OMA..? https://twitter.com/AedinCulhane/status/1684147155802488834?t=5E7Nf5zla9Apj_ycuwyKng&s=31 - Attachment (Twitter): Aedin Culhane (@AedinCulhane@genomic.social) on Twitter > :point_right: Orchestrating Large-Scale Single-Cell Analysis with Bioconductor > > @Bioconductor @drighelli @drisso1893 @M2RUseR & Ludwig Geistlinger presented sold-out workshop at #ISMBECCB2023 #ISMB2023 > > All workshop materials are available online at https://t.co/7q3vwT7QiP > > Try it out!

2023-07-27

Artur Sannikov (08:42:53): > Hi, when trying to agglomerate the data at higher taxonomic ranks, I get a warning > > Warning message: > 'counts' includes binary values. > Agglomeration of it might lead to meaningless values. > Check the assay, and consider doing transformation again manually with agglomerated data. > > > > library(mia) > data(HintikkaXOData, package = "mia") > mae <- HintikkaXOData > mae[[1]] <- agglomerateByPrevalence(mae[[1]], rank = "Phylum") >

Artur Sannikov (08:43:38): > I had a look at counts and it seems that a new group “Other” is generated which mostly has 0s, and some 1s. Is this the reason for this warning?
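
A rough illustration of why a merged “Other” group can look binary even when the full table clearly holds counts. The helper `looks_binary` and the toy matrix are made up for illustration; this is not mia's actual check.

```r
# TRUE when every non-missing value is 0 or 1.
looks_binary <- function(x) all(x[!is.na(x)] %in% c(0, 1))

# Made-up counts: one abundant taxon and two rare ones, in two samples.
counts <- matrix(c(120, 0, 1,
                    35, 1, 0),
                 nrow = 3,
                 dimnames = list(c("Firmicutes", "Rare1", "Rare2"),
                                 c("s1", "s2")))

looks_binary(counts)                                        # whole table
looks_binary(counts[c("Rare1", "Rare2"), , drop = FALSE])   # rare taxa only
```

The second call returns TRUE although the values are genuine counts, so a check applied only to the rare subset cannot tell counts from presence/absence indicators.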

Leo Lahti (18:59:34): > Thanks, this is a very useful observation. > > By the way, in the latest development version we have updated the name of this function to mergeFeaturesByPrevalence (the old name still works but it will throw a warning). You can update with devtools::install_github("microbiome/mia"). It is also good to give the argument as_relative=TRUE explicitly, for clarity. > > Regarding the issue, two points: > 1. Indeed, it seems that your analysis is correct; the “Other” group only contains 0’s and 1’s. Then the system is not sure if these are actual counts that are sensible to sum up, and throws a warning. In principle this works as expected. In practice, we already know from the context that it is actual count data because we could test that the original mae[[1]] is count data, and hence any subset of it (those that will be merged under the “Other” category) will also be. This could deserve a small fix that would check the “count” status in such cases for the original input only. This will require thinking a bit about the logic of the method. > 2. I noticed also another point; agglomerating by rank or prevalence will give different total read counts per sample, although they would be expected to give identical counts (just different grouping of the rows). > > > colSums(assay(agglomerateByRank(mae[[1]], rank = "Phylum"))) > colSums(assay(agglomerateByPrevalence(mae[[1]], rank = "Phylum"))) > > This is because the Phylum rank includes NAs for some rows: sum(is.na(rowData(mae[[1]])$Phylum)) yields 93. These are omitted with agglomerateByPrevalence but not with agglomerateByRank (they will be included as an NA row in the latter). It would be most logical that the NA row would be included also in the data that is agglomerated by prevalence. The user can choose whether they want to merge such an NA row further. 
One problem with the NA row is that these may come from different phyla, and hence grouping them together in the phylum level agglomeration is potentially misleading. I would solve this by providing a binary argument that excludes the NA phyla by default in all agglomerations (rank, prevalence, or other grouping variable) but the user could choose to keep these by switching the argument (then they are aware of this and can maintain the original read count, which might be relevant in some cases). @Tuomas Borman any chance to have a look, or at least open a proper issue on these?
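
The effect of the NA rows on per-sample totals can be reproduced with a toy matrix and base R's `rowsum()`. The counts, taxon names and the "Unassigned" label below are made up; the two variants only mimic "drop rows with missing Phylum" versus "keep them as their own group", not the internals of the mia functions.

```r
# Made-up counts for four taxa in two samples; taxon3 has no Phylum.
counts <- matrix(c(5, 0, 3, 2,
                   1, 4, 0, 6),
                 nrow = 4,
                 dimnames = list(paste0("taxon", 1:4), c("s1", "s2")))
phylum <- c("Firmicutes", "Bacteroidota", NA, "Firmicutes")

# Variant A: rows with a missing Phylum are dropped before summing.
keep <- !is.na(phylum)
dropped <- rowsum(counts[keep, , drop = FALSE], phylum[keep])

# Variant B: rows with a missing Phylum are kept as a separate group.
phylum_kept <- ifelse(is.na(phylum), "Unassigned", phylum)
kept <- rowsum(counts, phylum_kept)

colSums(counts)   # original totals per sample
colSums(dropped)  # s1 loses the reads of taxon3
colSums(kept)     # totals are preserved
```

Only variant B preserves the original read count per sample, which is the trade-off the proposed argument would expose to the user.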

2023-07-28

David Mateo García (03:30:24): > Is there any problem with the function transformAssay? I’m trying to get relabundance but couldn’t find the function. I have mia, miaViz and miaTime on.

Leo Lahti (04:04:03): > Have you updated to the latest version with devtools::install_github("microbiome/mia")?

Leo Lahti (04:04:17): > then restart and try again

David Mateo García (04:50:02): > Thank you!

Artur Sannikov (06:23:21) (in thread): > Thanks Leo, what do you mean by “count status”? > > Regarding the second point, in OMA book, it’s suggested to remove NA values at Phyla and lower ranks before going on with the analysis. In my case, I did it in the data cleaning step. > > # Remove species which miss information at Phylum level and lower > mae[[1]] <- mae[[1]][!is.na(rowData(mae[[1]])$Phylum), ] >

Artur Sannikov (06:32:10) (in thread): > Also transformCounts is called transformAssay in the new version.

Artur Sannikov (08:44:33): > Hi, another observation. In the CCA section, you say to > > # Give unique names so that we do not have problems when we are creating a plot > rownames(mae[[1]]) <- getTaxonomyLabels(mae[[1]]) > > I don’t get what the reason is. It works in exactly the same way even if I comment this line out. - Attachment (microbiome.github.io): Chapter 13 Multi-assay analyses | Orchestrating Microbiome Analysis > Chapter 13 Multi-assay analyses | Orchestrating Microbiome Analysis

Leo Lahti (08:52:33) (in thread): > By “count” status I mean the case where 0/1 are counts (and not binary indicators)

Leo Lahti (08:54:30) (in thread): > Removal of NAs: yes, we might need to discuss if it is better to leave it to the user to remove these, or shall we include that as an option in the merge function. In any case, the different merge functions (mergeFeatures, mergeFeaturesByRank, mergeFeaturesByPrevalence..) should handle the case in the same way. Otherwise it will be too confusing.

Leo Lahti (08:54:51) (in thread): > Yes we have renamed transformCounts to transformAssay because it can transform any assay, not just counts and the name is also more clear.

Leo Lahti (11:01:48) (in thread): > Hmm I guess that we have initially had an example with some other data set that used to have duplicated row names, or no row names. This line could indeed be removed from OMA (could make PR if you have also some other improvements?)

Konstantinos Daniilidis (13:47:33): > @Konstantinos Daniilidis has joined the channel

Benjamin Yang (15:57:58): > @Benjamin Yang has joined the channel

2023-08-01

David Mateo García (03:59:50): > Hello again. A colleague is having trouble installing mia on her Mac. Some packages (like miaTime or scater) are not being installed, and therefore, mia is not being installed correctly. Do you know how this can be fixed? Thank you!

Leo Lahti (09:52:17) (in thread): > the commands and error messages would be necessary for troubleshooting.

2023-08-04

Leo Lahti (08:54:11): > Something to adapt later https://twitter.com/mikelove/status/1687436534822174720?s=20 - Attachment (Twitter): Michael Love on Twitter > Presenting a workshop with @steman_research at #BioC2023: > > “Tidy genomic and transcriptomic single-cell analyses” > > New packages provide a tidy-syntax API to manipulate @Bioconductor objects. Metadata/structure of the original object is preserved. > > https://t.co/AvXYz2Nm5v

2023-08-10

Leo Lahti (06:15:54): > Reading Pat Schloss’s latest preprint with rarefaction experiments on alpha & beta diversity, the paper deals nicely with all the earlier controversies and it seems to me that some support for that (averaging results from many rarefaction rounds) could be provided. > > I opened two issues on this: > https://github.com/microbiome/mia/issues/417 (support for alpha diversity calculations in mia) > https://github.com/microbiome/OMA/issues/332 (extra examples on beta diversity in OMA)

Leo Lahti (06:16:20): > Also curious to hear criticism.

2023-08-24

Moritz E. Beber (16:42:10): > @Leo Lahti and anyone else interested: I finally made some progress on importing metagenomic profiles from Python into R. First, some advertisement since we recently published taxpasta. Then, I just released a new version that can include a tax table with the BIOM format file. So you can now do a few things: > 1. biomformat::read_biom("result.biom") > > > biom object. > type: OTU table > matrix_type: dense > 43 rows and 2 columns > > 2. You can also use obj <- phyloseq::import_biom("result.biom") > > phyloseq-class experiment-level object > otu_table() OTU Table: [ 43 taxa and 2 samples ] > tax_table() Taxonomy Table: [ 43 taxa by 17 taxonomic ranks ] > > 3. You can then create a TSE: TreeSummarizedExperiment(assays = list(count = phyloseq::otu_table(obj)), rowData = phyloseq::tax_table(obj)) > > class: TreeSummarizedExperiment > dim: 43 2 > metadata(0): > assays(1): count > rownames(43): 1 131567 ... 9605 9606 > rowData names(17): Rank1 Rank2 ... Rank16 Rank17 > colnames(2): sample1 sample2 > colData names(0): > reducedDimNames(0): > mainExpName: NULL > altExpNames(0): > rowLinks: NULL > rowTree: NULL > colLinks: NULL > colTree: NULL > > The only thing that’s still missing for me is to turn the tax table into a tree. There are various functions in Python to do so, but as far as I can tell the BIOM format cannot store a tree. So I could either export a separate file with a tree in NEWICK format or convert the tax table into a tree in R. I haven’t found a quick way to do that yet, but I suspect it would not be too hard when using a proper graph library. - Attachment (Journal of Open Source Software): TAXPASTA: TAXonomic Profile Aggregation and STAndardisation > Beber et al., (2023). TAXPASTA: TAXonomic Profile Aggregation and STAndardisation. Journal of Open Source Software, 8(87), 5627, https://doi.org/10.21105/joss.05627

Leo Lahti (17:11:54): > Great to hear about the progress. Some remarks / questions: > > 2-3: Why not directly import into TreeSE, e.g. with > > obj <- biomformat::read_biom("result.biom") > tse <- mia::makeTreeSEFromBiom(obj) > > or more directly just: > > tse <- mia::loadFromBiom("result.biom") > > You could add a taxonomy tree as follows. However, this is not an actual sequence-based phylogenetic tree but one that describes the hierarchies in the rowData / taxonomic table: > > tse <- mia::addTaxonomyTree(tse) >

2023-08-27

Moritz E. Beber (03:55:18) (in thread): > Honestly, because I didn’t see those functions in the documentation page I was looking at :smile: (I was looking here: https://bioconductor.org/packages/release/bioc/vignettes/TreeSummarizedExperiment/inst/doc/Introduction_to_treeSummarizedExperiment.html). I’ll try immediately.

Moritz E. Beber (04:07:03) (in thread): > Okay, loadFromBiom works just fine; however, adding the taxonomy doesn’t work yet, as there are more columns in the rowData than just the taxonomy table. That would need a bit of code.

2023-08-28

Leo Lahti (09:44:25) (in thread): > Right. Those functions are not in the TreeSummarizedExperiment package but in the mia package. Therefore they are in the mia documentation. Perhaps this raises the question whether we should cross-link more between these resources - ping @Ruizhu HUANG?

Leo Lahti (09:45:18) (in thread): > Can you share example code showing how you tried to add the tree, so we can see how to fix it?

Leo Lahti (09:48:26) (in thread): > For me the extra fields in rowData do not matter when running addTaxonomyTree

2023-08-29

Moritz E. Beber (13:44:40) (in thread): > > > tse <- mia::loadFromBiom("result.biom") > > tse > class: TreeSummarizedExperiment > dim: 43 2 > metadata(0): > assays(1): counts > rownames(43): 1 131567 ... 9605 9606 > rowData names(18): rank_lineage taxonomy1 ... taxonomy16 taxonomy17 > colnames(2): sample1 sample2 > colData names(0): > reducedDimNames(0): > mainExpName: NULL > altExpNames(0): > rowLinks: NULL > rowTree: NULL > colLinks: NULL > colTree: NULL > > > > > mia::addTaxonomyTree(tse) > Error in h(simpleError(msg, call)) : > error in evaluating the argument 'X' in selecting a method for function 'lapply': argument must be coercible to non-negative integer > > > > > mia::taxonomyRanks(tse) > character(0) > > > > > rowData(tse) > DataFrame with 43 rows and 18 columns > rank_lineage taxonomy1 taxonomy2 taxonomy3 taxonomy4 taxonomy5 taxonomy6 > <character> <character> <character> <character> <character> <character> <character> > 1 > 131567 > 2759 superkingdom Eukaryota > 33154 superkingdom;clade Eukaryota > 4751 superkingdom;clade;k.. Eukaryota Fungi > ... ... ... ... ... ... ... ... >

2023-08-31

Leo Lahti (04:58:29) (in thread): > hmm is there any chance you could send the data (or modified version of it) to facilitate debugging?

Leo Lahti (04:58:37) (in thread): > in case this is not reproducible in our demo data sets

2023-09-02

Moritz E. Beber (10:14:33) (in thread): > I don’t mind sharing at all, it’s more or less fake data. - File (Binary): result.biom

Leo Lahti (14:59:05) (in thread): > @Akewak Jeba would you have a chance to check this?

Akewak Jeba (14:59:11): > @Akewak Jeba has joined the channel

2023-09-03

Akewak Jeba (05:50:25) (in thread): > Sure I will check this

2023-09-05

Akewak Jeba (07:07:00) (in thread): > I checked whether the taxonomic information is usable by mia, and it returned FALSE > > checkTaxonomy(tse) > [1] FALSE > attr(,"msg") > [1] "FALSE" >

Leo Lahti (08:29:19) (in thread): > If some feature information (rowData) exists, is there a more specific reason why it is not accepted?

2023-09-06

David Mateo García (09:37:47): > Hello everyone. I have a question that may be very basic but is hindering my progress in analyzing results. Using miaverse and following the OMA book, I’m calculating some alpha diversity indices (such as the Shannon index). In my sample, no significant differences are observed between groups. > > Now, the service that sequenced our samples calculated these indices differently: they took ten rarefaction measures per sample and used them to compare between groups, observing significant differences. > Seeing that both methods yield different results, my question is as follows: am I doing something wrong with miaverse? Should I consider the rarefaction measures? > > I hope I have explained myself clearly. Any suggestions are more than welcome.

Moritz E. Beber (10:38:33) (in thread): > Did you do any sub-sampling in your analysis? It might indeed work better if you have similar number of sequenced content in all of your samples.

Leo Lahti (18:31:17) (in thread): > I agree it might work better and be better justified. > > This is difficult to automate in a general case for all possible analyses, but mia provides the necessary tools to implement such a workflow and we should probably add some examples to OMA showing how to do it. > > Did you see this recent paper from Pat Schloss, with nice analyses of this: https://www.biorxiv.org/content/10.1101/2023.06.23.546313v1.full.pdf

2023-09-07

David Mateo García (06:18:52) (in thread): > Thanks for the paper, it was very informative. Some examples of how to implement rarefaction would be useful. Do you know where I could find the code?

Leo Lahti (12:36:23) (in thread): > I think he made all his code available, but that is more base R I think. For mia, you can use subsampleCounts to rarefy, and then estimateDiversity to calculate alpha diversity for that rarefied data as usual. The for loop over multiple rarefaction rounds you would need to write yourself. For group-wise comparison I think you would just (critically) see how the other paper did that, or at simplest you can run e.g. 20-100 rarefactions, do the comparison for each round, and get an empirical p-value for the significance. Or you can also come up with fancier scenarios to do this, I guess.
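
A minimal sketch of the loop described above, written in base R so that it runs without any packages. The toy count matrix, the group labels, and the helpers `rarefy_sample()` and `shannon()` are all assumptions made for illustration; in a mia workflow, subsampleCounts and estimateDiversity would take the place of the two helpers. The sketch averages the Shannon index over rarefaction rounds and then compares groups once on the averages, which is only one of the options mentioned in the thread.

```r
# Base-R stand-in for the workflow discussed in the thread (not mia code).
set.seed(1)

# Hypothetical toy data: 30 taxa x 8 samples, two groups of four samples.
counts <- matrix(rpois(30 * 8, lambda = 20), nrow = 30,
                 dimnames = list(paste0("taxon", 1:30), paste0("s", 1:8)))
group <- rep(c("A", "B"), each = 4)

# Rarefy one sample: draw `depth` reads without replacement.
rarefy_sample <- function(x, depth) {
  reads <- rep(seq_along(x), times = x)
  tabulate(sample(reads, depth), nbins = length(x))
}

# Shannon index from a vector of counts.
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

depth <- min(colSums(counts))   # rarefy every sample to the shallowest one
n_rounds <- 50

# One row per rarefaction round, one column per sample.
shannon_rounds <- t(replicate(n_rounds, {
  rarefied <- apply(counts, 2, rarefy_sample, depth = depth)
  apply(rarefied, 2, shannon)
}))
colnames(shannon_rounds) <- colnames(counts)

# Average over rounds, then compare the groups on the averaged values.
mean_shannon <- colMeans(shannon_rounds)
test <- wilcox.test(mean_shannon ~ group)
test$p.value
```

Running the comparison inside the loop instead, and summarising the resulting p-values, would correspond to the per-round variant suggested in the message above.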

Leo Lahti (12:37:20) (in thread): > I hope we could add some examples soonish (contributions welcome :smile:)

Moritz E. Beber (16:45:23) (in thread): > I highly, highly recommend subsampling your raw sequencing reads directly, for example, from FASTQ files. My experience with sampling taxon counts is that the increased richness due to sequencing depth carries over and is not completely eliminated by sampling at the counts level, but only by sampling sequencing reads. There are a couple of tools for doing so, for example, https://github.com/stjude-rust-labs/fq. And I’ve written a nextflow pipeline for doing so, although I haven’t maintained it in a while: https://github.com/Midnighter/tyche/tree/dev.

2023-09-08

Leo Lahti (11:20:12) (in thread): > That’s a good point I think. One problem is that sometimes analysts do not have access to that data if they are provided with count tables directly. But this might be a better approach when one can do it. Although, it is outside of the usual Bioconductor downstream analysis. > > Do you have/know actual quantitative benchmarks between those two options?

2023-09-09

Moritz E. Beber (04:15:05) (in thread): > Hmm, no, I don’t have access to the numbers any more where I checked this. However, we are running some benchmarking experiments for https://github.com/nf-core/taxprofiler/ soon, so that is something we could include. I think it’d actually be very interesting to show and we can then even do so for many metagenomic profilers at once. Would you/your group be able to suggest suitable public whole-genome, metagenomic samples?

Leo Lahti (06:03:29) (in thread): > Hmm I guess we would not even critically need sample metadata here since the consistency of the estimates can be measured independently just from the sequencing data. Although sample metadata could provide additional support. Why not directly use public metagenome data from curated open databases, such as EBI MGnify (https://www.ebi.ac.uk/metagenomics/browse/super-studies) - even for different environments like sea, earth, human, animal, plant.. We have some specific interest in HoloFood (which is integrated with MGnify), which has a dedicated data portal for some model organisms: https://www.holofooddata.org - Attachment (holofooddata.org): HoloFood Data Portal > Data resources generated by the HoloFood hologenomics project

Moritz E. Beber (11:12:36) (in thread): > Those are nice samples, yes. Although I really wanted to use human faecal samples in order to test some questions around host removal, too.

Leo Lahti (13:51:53) (in thread): > Do you have a desired minimum depth?

Moritz E. Beber (15:43:48) (in thread): > Not less than 5M read pairs, I think.

2023-09-12

Akewak Jeba (08:48:45) (in thread): > Here is my understanding: > I noticed that the imported biom file did not contain any colData yet, so only an empty dataframe appears in this slot. > > head(colData(tse)) > DataFrame with 2 rows and 0 columns > > So I think the error occurs because we did not add sample metadata to colData(tse).

Leo Lahti (16:41:47) (in thread): > Hmm.. but the problem was with addTaxonomyTree and that should be independent of colData.

Leo Lahti (16:44:22) (in thread): > Did you check what addTaxonomyTree is trying to do and where the error occurs? My guess is that it is because the rowData does not contain official taxonomic rank names (from mia::TAXONOMY_RANKS) and addTaxonomyTree would probably need those. Also mia::checkTaxonomy returns FALSE, perhaps because of this.

Leo Lahti (16:45:42) (in thread): > If this is the reason, then the question is of course whether we should allow an option to build a rowTree also from other hierarchies in rowData. I think the user should be able to do this if they want to. @Tuomas Borman your take on it..?

2023-09-13

Moritz E. Beber (05:54:33) (in thread): > Please note that there is some flexibility on my part in how that BIOM file is created. So I can also adjust it there. However, I cannot directly influence the column names. They will always be some name + a numeric index. That’s why I provide the ranks in a separate column. I could fairly quickly define a function to create the expected rowData but I’m not sure where such a function should then live. I could just document it on the taxpasta side, but I don’t want to release an R package just for the one function.

Tuomas Borman (07:15:42) (in thread): > 1. > > The problem is that those taxonomy rank columns are not detected. What is the format of this feature data / rowData / taxonomy table? Or did you create it yourself? If this is a common format, we could support it. > > If the format is not common, we cannot do much in the loadFromBiom function and the output will include taxonomy* columns in rowData. > > You can manually assign correct column names after the TreeSE is created; it should only require a one-liner. > > 2. > Should we support building taxonomy trees from other hierarchies… In this case, the better solution is to name those columns correctly, otherwise other functions will not work (for example mergeFeaturesByRank). > > But on the other hand, we could make this function more general; there might be other use cases in the future where hierarchy trees need to be created. > > > 3. > Again, this file includes lots of “unofficial” taxonomy ranks that are not supported by mia (for example superkingdom). We have to think about a solution for this. > > For example, for this reason addTaxonomyTree would not currently work even if the column names were correct (those extra taxonomy ranks are not taken into account when creating a tree). > > Should there be examples of how to manipulate the TAXONOMY_RANKS variable in mia? However, I think this could break things, and we need to modify those functions to support user-defined taxonomy ranks

Moritz E. Beber (07:18:05) (in thread): > The ranks are coming directly from the NCBI taxonomy. So I wouldn’t call them unofficial. Maybe a bit unusual to have the full list of ranks there rather than having it reformatted to the typical seven ranks.

Leo Lahti (07:38:51) (in thread): > I am afraid that if we do not create a solution that is relatively straightforward, this question keeps on coming back. @Tuomas Borman should we create a proper issue(s) and aim to come up with a more definite solution? > > The more general solution might not be so critical in case we can assume that most use cases would deal with the “ranks”, although we still need to decide how those ranks should be defined.

2023-09-14

Tuomas Borman (04:27:28) (in thread): > We have an open issue on this: https://github.com/microbiome/mia/issues/219 - Attachment: #219 taxonomy ranks are too restictive > Hi, > > in case one’s taxonomy ranks fall outside of what is hard-coded in mia::TAXONOMY_RANKS, it breaks functions like taxonomyRanks. > > For example: > > > > agglomerateByRank(my_tse, "Strain") > Error: 'rank' must be a value from 'taxonomyRanks()' > > > One solution would be to be able to customize TAXONOMY_RANKS. Or include all possible ranks in TAXONOMY_RANKS. Especially eukaryotes have a bunch of more ranks. > > Thanks!
> Bela

Leo Lahti (08:58:16): > https://x.com/steman_research/status/1702184792920674566?t=wgBNuty_Tg7q6qE3bhjZ6A&s=31 - Attachment (X (formerly Twitter)): Stefano Mangiola on X > :tada::broom: The #tidyomics ecosystem is official! > > Into #omic data analysis? Spanning #Seurat @Bioconductor #SCE, #SE, #GRanges, #Citometry? > Now, just use #tidyverse ! > > Co-led with @mikelove and @TonyPapenfuss @WEHI_research #singlecell > > The preprint https://t.co/jGEy2BVj5C

Moritz E. Beber (15:34:24) (in thread): > Thanks for driving this forward. Felix Ernst comes across as pretty grumpy. I’m glad that our discussion has been much more pleasant so far.

2023-09-15

Leo Lahti (04:52:36): > @Leo Lahti has joined the channel

2023-09-20

Leo Lahti (17:30:23): > ggplot2 now supports DataFrame objects as well: https://github.com/tidyverse/ggplot2/issues/5390 - Attachment: #5390 Support for formats other than data.frame > Problem Whereas ggplot2 supports data.frame, many other data structures are available that could benefit from the ability to use ggplot2 functionality. Examples include e.g. DataFrame, matrix, dgCMatrix, DelayedMatrix, SparseMatrix, etc. Many of these classes support as.data.frame() and can be easily converted into a data.frame. However, the need to do this with every ggplot2 function call becomes rapidly very repetitive. > > Suggested solution The default fortify() method, ggplot2:::fortify.default() could just try to call as.data.frame() on the supplied object. This would directly make ggplot() work on any object that supports as.data.frame() (e.g. DataFrame, matrix, dgCMatrix, DelayedMatrix, SparseMatrix, etc.) > > Let’s load libraries and example data > > > library(S4Vectors) > library(ggplot2) > data(iris) > > > Usual data.frame works as expected: > > > ggplot(iris, aes(x=Sepal.Width, y=Sepal.Length)) + geom_point() > > > DataFrame does not work, and ggplot call throws and error: > > > ggplot(DataFrame(iris), aes(x=Sepal.Width, y=Sepal.Length)) + geom_point() > > > > Error in fortify():
> > ! data must be a <data.frame>, or an object coercible by fortify(),
> > not a object. > > At the moment our default solution has been to always add as.data.frame() around DataFrame objects, like: > > > ggplot(as.data.frame(d), aes(x, y)) + geom_point() > > > There was initial discussion that related to the challenges this adds to teaching standard plotting in ecosystems that rely on classes that are closely related to data.frame but not that. > > Initial thought was to solve this in the S4Vectors class (for DataFrame), see the PR by @kevinrue - then @hpages pointed out the more general solution described above. > > -> Could ggplot add the as.data.frame check to extend the support to other formats than data.frame? If yes, we might be able to provide a PR.

2023-09-21

Leo Lahti (05:08:22) (in thread): > This specific question is a bit tricky because merging taxonomic ranks is conceptually different from just merging features. And there are standards but there is also need for flexibility..

Leo Lahti (05:09:40) (in thread): > I am sort of getting in favor of the idea that we would just show, by examples, how users can define the taxonomy ranks themselves, if they are not happy with the standard defaults. That’s an extra step but it would help to keep some control over this while allowing flexibility.

2023-10-13

pande.erawijantari (05:34:56): > @pande.erawijantari has joined the channel

USLACKBOT (12:47:36): > Utuhas joined this channel by invitation fromcommunity-bioc.

Leo Lahti (18:23:51): > https://x.com/samanthapgraham/status/1712834019019141263?t=Ko06VyVGcJSb3LT9nArEgA&s=31 would be cool demo data for mia..? - Attachment (X (formerly Twitter)): Samantha Graham on X > So that’s our story so far! We’re excited to share this with the community and would love your feedback and ideas. The goods: > * Dataset https://t.co/VopPEah8y3 > * R package: https://t.co/CUmc3AgY6f > * Website: https://t.co/XbKGLNdmeN > * Preprint! https://t.co/eLFFYCFWc7

2023-10-14

Leo Lahti (04:26:35): > They support TreeSummarizedExperiment (!) > > “The resulting object is a TreeSummarizedExperiment object. Currently, the “tree” part of the TreeSummarizedExperiment is not populated, but that is on the roadmap.”

2023-10-17

Tuomas Borman (07:44:48): > #miaverse participates in #hacktoberfest. We are open for contributions! See e.g., repos: > > mia: https://github.com/microbiome/mia > miaViz: https://github.com/microbiome/miaViz > OMA: https://github.com/microbiome/OMA

2023-10-30

Moritz E. Beber (05:05:32) (in thread): > Hi everyone, > I was curious if there have been any updates on this issue? It’s a very desirable feature for the taxprofiler users. If you can’t make it happen at the moment, just let me know and I’ll see about coding up something on my own.

Tuomas Borman (05:17:47) (in thread): > Hello! I understand; unfortunately no updates yet. We have lots of things on the table currently. We are planning to get miaverse finalized before the next release so that no big changes will come after that to existing functions (function naming, parameters etc). > > Then there are also these functional things to implement. If you come up with a good solution, we are more than happy if you are willing to contribute. > > The mia package has a global variable TAXONOMY_RANKS which specifies the taxonomy ranks

Leo Lahti (07:55:44) (in thread): > Thanks @Tuomas Borman. But isn’t it so that one could in fact use the TAXONOMY_RANKS to essentially achieve exactly what @Moritz E. Beber needs here? The only thing that is needed is a clear example (or a pointer to where to find it, if it exists). Automation we can think about later as you suggest, if it appears to be more problematic to do in the general case.

2023-11-02

Tuomas Borman (06:47:51) (in thread): > It would go like this, but this is not the tidiest way and just temporary solution > > library("mia") > data("GlobalPatterns") > > tse <- GlobalPatterns > > colnames(rowData(tse)) <- c("Kingdom", "Phylum", "Class", "Taxa1", "Family", "Genus", "Species") > > rowData(tse)[["extra_column"]] <- rep(c("asd", "qwe")) > > asd <- c("kingdom", "phylum", "class", "taxa1", "family", "genus", "species", "extra_column") > > assignInNamespace("TAXONOMY_RANKS", asd, ns = asNamespace("mia")) > > splitByRanks(tse) > > addTaxonomyTree(tse) > > rowData(agglomerateByRank(tse, "Genus")) > > rowData(agglomerateByRank(tse, "Taxa1")) >

Leo Lahti (07:09:47) (in thread): > addTaxonomyTree adds the tree based on taxonomy rankings (just replicating the hierarchy in the taxonomic mapping table between ranks) but I assume that a regular user may instead like to agglomerate the real phylogenetic tree? @Moritz E. Beber which is your preference?

2023-11-07

Moritz E. Beber (05:08:55) (in thread): > What do you mean exactly by “agglomerate the real phylogenetic tree”? Just to be clear from my side, when I hear phylogenetic tree, I tend to think of computed trees based on sequence similarity as is done for ASVs of, for example, 16S sequencing. The taxprofiler pipeline is only for whole-genome sequencing data and analysis using reference databases, so the taxonomies are fixed beforehand. Maybe we are talking about the same thing, but I just wanted to make sure. When you talk about agglomeration, do you mean to summarize the relative abundance profile at a particular rank, or are you talking about reformatting the taxonomy to, for example, the seven typical ranks when looking at bacteria? If you mean the latter, I’d say that users should reformat the taxonomy and prepare their reference databases thus before even using the taxprofiler pipeline. If you mean the former, then I expect there are existing functions that can be used? At least the phyloseq package and BIOM itself already offer this.

Leo Lahti (08:02:45) (in thread): > Yes, probably it is best to prepare reference dbs already before profiling. > > Often this is not done, and data is available at species/ASV or such resolution, and the user may like to agglomerate it to higher levels. Then read counts or (relative) abundances are summed up, and the tree can be collapsed as well. Such functionality is available in phyloseq and in mia. > > What I meant is that we are talking about two somewhat different types of trees above. The first type is computed based on sequence similarity and we would call this a phylogenetic tree; the second type is not computed at all but is just a graphical representation of the hierarchy that we have in the taxonomy table (or rowData) and perhaps we could call this a taxonomy tree or something? addTaxonomyTree adds that latter type of tree. I have noticed that novice users often mix up these two different “trees”. I assume that you were interested in the first type of tree, which is computed based on sequence similarities. I am not sure how often users need the second type but it has been available.
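
The agglomeration step mentioned above (summing counts of features that share a higher rank) can be illustrated with a small base-R sketch. The count matrix and the one-column taxonomy table are hypothetical toy data, and `rowsum()` is used here only as a stand-in; in mia the same operation would go through functions such as agglomerateByRank, which also handle the rest of the object.

```r
# Hypothetical toy table: counts for five species in two samples.
counts <- matrix(c(10, 0, 5, 3, 2,
                    4, 6, 0, 1, 7),
                 ncol = 2,
                 dimnames = list(paste0("sp", 1:5), c("sample1", "sample2")))

# Taxonomy table (what rowData would hold); the genus of sp5 is unknown.
taxonomy <- data.frame(
  Genus = c("Bacteroides", "Bacteroides", "Prevotella", "Prevotella", NA),
  row.names = rownames(counts)
)

# Sum the counts of all species that share a genus. Rows without a genus
# are dropped here; keeping them in a separate group is an equally valid choice.
keep <- !is.na(taxonomy$Genus)
genus_counts <- rowsum(counts[keep, , drop = FALSE],
                       group = taxonomy$Genus[keep])
genus_counts
```

The result has one row per genus and the same sample columns as the input, so the per-sample totals of the retained features are preserved.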

Moritz E. Beber (08:09:42) (in thread): > Okay, we seem to agree on the semantics :slightly_smiling_face:. Users of the taxprofiler pipeline will only ever be looking for a taxonomy tree, since the taxprofiler pipeline does not offer any other kind of analysis. https://nf-co.re/taxprofiler/ - Attachment (nf-co.re): taxprofiler: Introduction > Highly parallelised multi-taxonomic profiling of shotgun short- and long-read metagenomic data

Leo Lahti (08:11:47) (in thread): > ahaa, great! Fixing the semantics is always a good start :smile:

Leo Lahti (08:15:26) (in thread): > So - you can redefine TAXONOMY_RANKS and then do any operations as usual in mia (as Tuomas shows in the above example). > > Would that be sufficient for the time being? > > We could certainly think about building more wrappers for some of those routine operations, if you have suggestions. Trying to find a balance in flexibility and automation.

Tuomas Borman (09:06:45) (in thread): > Discussion is also open here: https://github.com/microbiome/mia/issues/219

2023-11-08

Moritz E. Beber (09:42:32) (in thread): > Thank you. I think, I can work with that example. My ideal outcome would be a function, that can turn the tax table into a tree. I like the more functional approach of simple input -> output, rather than adding directly to an object. I’ll dig a bit in your code and see what I can come up with.
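
A sketch of the "simple input -> output" idea from the message above: a plain function that takes a taxonomy table and returns a parent/child edge list describing the taxonomy tree. The function name `taxonomy_to_edges()`, the "root" node, and the rank-prefixed node labels are all hypothetical choices for illustration, not part of mia or taxpasta; the sketch assumes the columns are ordered from the highest rank to the lowest.

```r
# Turn a taxonomy table (one column per rank, highest rank first) into a
# parent/child edge list. Hypothetical helper, not part of any package.
taxonomy_to_edges <- function(tax) {
  tax <- as.data.frame(tax, stringsAsFactors = FALSE)
  edges <- list()
  for (i in seq_len(nrow(tax))) {
    parent <- "root"
    for (rank in colnames(tax)) {
      name <- tax[i, rank]
      if (is.na(name) || name == "") next   # skip missing ranks
      # Prefix with the rank so equal names at different ranks stay distinct.
      child <- paste0(rank, ":", name)
      edges[[length(edges) + 1]] <- c(parent = parent, child = child)
      parent <- child
    }
  }
  # Shared ancestors produce repeated edges; keep each edge once.
  unique(as.data.frame(do.call(rbind, edges), stringsAsFactors = FALSE))
}

# Hypothetical toy taxonomy with one missing rank.
tax <- data.frame(
  Kingdom = c("Bacteria", "Bacteria", "Eukaryota"),
  Phylum  = c("Firmicutes", "Bacteroidota", NA),
  Genus   = c("Lactobacillus", "Bacteroides", "Saccharomyces")
)
edges <- taxonomy_to_edges(tax)
edges
```

When a rank is missing, the next available rank is attached directly to the last known ancestor. The resulting edge list could then be handed to a tree or graph library of choice.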

2023-11-22

David Mateo García (02:42:37): > Hi all, > I’m writing to ask you about DDA, specifically Maaslin2. There are some parts of it that I can’t quite grasp, and despite my efforts in searching, I haven’t found a clear answer to resolve my doubts. > 1. What exactly do the coefficients of the model measure? Is there a limit to them, or should they fall within a specific range - for example, between 0 and 1? > 2. How are q-values applied and interpreted? By default, I’m using the BH correction at 0.25. > 3. What should I consider before applying Maaslin2? In my case, I’m trying to relate taxa to cognitive variables. When I use all the variables at once, I don’t get significant results. If I model variable by variable, I get some significances. Finally, if I model by grouping some variables (mostly related to each other), I get more significances. Interpreting this has become a headache. > These might be very basic questions, and perhaps you could recommend some reading on the subject. Thank you very much for taking the time to read and respond.

Leo Lahti (02:45:32) (in thread): > @Himel Mallick might be the guy for maaslin2..?

2023-11-28

Himel Mallick (09:48:24) (in thread): > @David Mateo García - I apologize for the late reply. I am no longer actively maintaining MaAsLin2 but if you could post your question in the bioBakery forum (https://forum.biobakery.org/c/downstream-analysis-and-statistics/maaslin/10), I am sure you will receive a faster response. All the best!

2024-01-11

USLACKBOT (08:59:01): > Utuhas removed themselves from this channel.

2024-01-18

Théotime Pralas (02:33:23): > @Théotime Pralas has joined the channel

2024-01-27

Leo Lahti (02:18:30): > https://x.com/VincentAB/status/1751027981005336892?s=31 - Attachment (X (formerly Twitter)): Vincent Arel-Bundock (@VincentAB) on X > Grant McDermott just released a new version of his 𝚙𝚕𝚘𝚝𝟸 package for #RStats. It’s an ultra-lightweight package that makes it easier to draw facets, area plots, ribbons, legends, and more using Base R plotting functions. Very cool stuff: https://t.co/8r6zoClwrN

2024-02-01

Axel Klenk (17:28:34): > @Axel Klenk has joined the channel

2024-02-05

Leo Lahti (06:46:28): > We will run a course on the gut microbiome-brain axis in Radboud, The Netherlands in July, including hands-on practice with mia & Bioconductor: https://www.ru.nl/en/education/education-for-professionals/overview/brain-bacteria-and-behaviour-understanding-the-gut-brain-axis-rss406 - Attachment (ru.nl): Brain, Bacteria and Behaviour: understanding the gut-brain axis (RSS4.06) | Radboud University > In this hands-on course you will learn the latest evidence on the role of commensal gut microbiota in the gut-brain axis, and how to study interactions between gut microbiome and brain functioning, behaviour and psychiatric diseases.

2024-02-19

Leo Lahti (05:30:48): > MicrobiotaProcess: https://www.sciencedirect.com/science/article/pii/S2666675823000164

Leo Lahti (05:33:33): > This seems to provide an alternative to TreeSummarizedExperiment. Not clear to me what would be the added advantage of their MPSE class (which merges SummarizedExperiment + treedata) but curious to learn. Converters with TreeSE are provided.

Vince Carey (11:32:01) (in thread): > I was not aware that this package has been in Bioconductor for 4 years. https://bioconductor.org/packages/MicrobiotaProcess - Attachment (Bioconductor): MicrobiotaProcess > MicrobiotaProcess is an R package for analysis, visualization and biomarker discovery of microbial datasets. It introduces MPSE class, this make it more interoperable with the existing computing ecosystem. Moreover, it introduces a tidy microbiome data structure paradigm and analysis grammar. It provides a wide variety of microbiome data analysis procedures under the unified and common framework (tidy-like framework).

2024-04-15

Leo Lahti (16:44:35) (in thread): > Well I think it is a good idea to have. Which package is something to discuss.

2024-04-18

Leo Lahti (05:30:21): > Abstract submission is open for the #EuroBioC2024 conference in Oxford :gb: Sep 4-6 (https://eurobioc2024.bioconductor.org/)! Submit an abstract by April 26 for > * Short talk > * Package demo > * Poster > * Contributed session > :point_right: Submit here: https://openreview.net/group?id=bioconductor.org/EuroBioC/2024/Conference - Attachment (OpenReview): EuroBioC 2024 Conference > Welcome to the OpenReview homepage for EuroBioC 2024 Conference

2024-04-28

Danielle Callan (08:33:46): > @Danielle Callan has joined the channel

2024-05-02

Leo Lahti (16:22:06): > R interface to the EBI MGnify metagenomics resource is now in Bioconductor, including support for the TreeSummarizedExperiment data container: https://bioconductor.org/packages/devel/bioc/html/MGnifyR.html - Attachment (Bioconductor): MGnifyR (development version) > Utility package to facilitate integration and analysis of EBI MGnify data in R. The package can be used to import microbial data for instance into TreeSummarizedExperiment (TreeSE). In TreeSE format, the data is directly compatible with miaverse framework.

2024-05-16

Moritz E. Beber (10:57:07): > Hi @Leo Lahti and @Tuomas Borman, > After a long break, I’m continuing on my quest to create a TSE from a BIOM file. Can you please help me to understand why the tree generation fails for the following file and script? (Please note that due to group metadata, I had to customize reading the BIOM file a bit.) If you run the steps of the load_biom function individually, then it fails at addTaxonomyTree (or the updated addHierarchyTree). Please use the attached BIOM file for testing. > > load_biom <- function(filename) { > raw <- rhdf5::h5read(filename, "/", read.attributes = TRUE) > biom <- create_biom(raw) > > if (is.null(raw$observation$`group-metadata`$ranks)) { > simpleWarning("The BIOM file does not contain taxonomy information; unable to generate a taxonomic tree.") > return(mia::makeTreeSEFromBiom(biom)) > } > > ranks <- get_ranks(raw) > > tse <- mia::makeTreeSEFromBiom(biom) > SummarizedExperiment::rowData(tse) <- create_row_data(biom, ranks) > utils::assignInNamespace("TAXONOMY_RANKS", tolower(ranks), ns = asNamespace("mia")) > mia::splitByRanks(tse) > mia::addTaxonomyTree(tse) > > return(tse) > } > > > create_biom <- function(h5array) { > data = biomformat:::generate_matrix(h5array) > rows = biomformat:::generate_metadata(h5array$observation) > columns = biomformat:::generate_metadata(h5array$sample) > shape = c(length(data),length(data[[1]])) > > id = attr(h5array,"id") > vs = attr(h5array,"format-version") > format = sprintf("Biological Observation Matrix %s.%s",vs[1],vs[2]) > format_url = attr(h5array,"format-url") > type = "OTU table" > generated_by = attr(h5array,"generated-by") > date = attr(h5array,"creation-date") > matrix_type = "dense" > matrix_element_type = "int" > > return(biomformat:::namedList(id,format,format_url,type,generated_by,date,matrix_type,matrix_element_type, > rows,columns,shape,data) |> biomformat:::biom()) > } > > get_ranks <- function(raw) { > ranks <- strsplit(raw$observation$`group-metadata`$ranks, ";", fixed = TRUE)[[1]] > return(ranks) > } > > create_row_data <- function(biom, ranks) { > meta <- biomformat::observation_metadata(biom) > column_names <- colnames(meta) > indeces <- startsWith(column_names, "taxonomy") > > if (sum(indeces) != length(ranks)) { > stop("The number of generic taxonomy* columns differs from the number of ranks.") > } > > colnames(meta) <- replace(column_names, indeces, ranks) > return(meta) > } > - File (Binary): complete.biom

Leo Lahti (12:53:21) (in thread): > Thanks @Moritz E. Beber - we are having a look at this asap.

2024-05-17

Akewak Jeba (08:11:43) (in thread): > From my debugging I found out that we need to apply addTaxonomyTree to each element of the list

Akewak Jeba (08:12:15) (in thread): > tse_list <- mia::splitByRanks(tse)

Tuomas Borman (08:36:05) (in thread): > @Akewak Jeba Can you give a working example of importing that BIOM file?

Akewak Jeba (22:45:07) (in thread): > I added this at the end > > test_file <- "path_to_file/complete.biom" > if (!file.exists(test_file)) { > stop("The file complete.biom could not be found. Please check the file path.") > } else { > tse <- load_biom(test_file) > print(tse) > } >

2024-05-18

Moritz E. Beber (05:29:51) (in thread): > I was more or less blindly following @Tuomas Borman’s previous example. I see now in the documentation for splitByRanks that it indeed returns a list with one TSE element per rank. That makes sense to me, but how do I then add a taxonomic tree for the full TSE?

Moritz E. Beber (05:38:36) (in thread): > Just noticed that the following code runs, but I’m not sure that the result is correct, as I can’t draw it. Do missing values need to be explicit NAs? Currently, they are empty strings. > > > mia::getHierarchyTree(tse) > > Phylogenetic tree with 5 tips and 11 internal nodes. > > Tip labels: > Family:_5_1, Family:Lachnospiraceae, Family:_6, Family:, Family:_5_2 > Node labels: > Root:Root, Superkingdom:Bacteria, Clade:, Class:_1, Order:_1, Clade:Terrabacteria group, ... > > Rooted; includes branch lengths. > > > ggtree::ggtree(mia::getHierarchyTree(tse)) > Error in child_list[[i]] : > attempt to select less than one element in integerOneIndex > In addition: Warning message: > In y[tip.idx] <- 1:Ntip * step : > number of items to replace is not a multiple of replacement length >

Moritz E. Beber (05:45:39) (in thread): > So indeed, when I replace empty strings with NA, I can add a tree, however, I can’t plot it. > > > tse <- mia::addHierarchyTree(tse) > > miaViz::plotRowTree(tse) > ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned. > ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned. > ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned. > ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned. >
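
The empty-string-to-NA replacement mentioned above can be sketched in base R. This is a minimal illustration, not the code used in the thread: `row_data` is a hypothetical stand-in for the taxonomy columns of `rowData(tse)`, its values are made up, and `empty_to_na` is a helper name invented for this sketch.

```r
# Hypothetical stand-in for the taxonomy columns of rowData(tse);
# the values are illustrative only.
row_data <- data.frame(
    Kingdom = c("Bacteria", "Bacteria", "Bacteria"),
    Phylum  = c("Terrabacteria group", "", "Terrabacteria group"),
    Class   = c("Clostridia", "", ""),
    stringsAsFactors = FALSE
)

# Replace empty strings with NA, column by column.
# Existing NAs and non-character columns are left untouched.
empty_to_na <- function(df) {
    df[] <- lapply(df, function(col) {
        if (is.character(col)) {
            col[!is.na(col) & col == ""] <- NA_character_
        }
        col
    })
    df
}

row_data <- empty_to_na(row_data)
print(row_data)
```

For a real TreeSE, the same helper could presumably be applied to `as.data.frame(rowData(tse))`, with the result converted back via `S4Vectors::DataFrame()` before assignment; that round trip is an assumption and was not tested here.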

Moritz E. Beber (05:47:47) (in thread): > The output seems correct, though. > > > TreeSummarizedExperiment::rowTree(tse) > > Phylogenetic tree with 1 tips and 5 internal nodes. > > Tip labels: > Family:Lachnospiraceae > Node labels: > Root:Root, Superkingdom:Bacteria, Clade:Terrabacteria group, Class:Clostridia, Order:Lachnospirales > > Rooted; includes branch lengths. >

Tuomas Borman (06:00:38) (in thread): > > So indeed, when I replace empty strings with NA, I can add a tree, however, I can’t plot it. > That will be fixed by https://github.com/microbiome/mia/pull/547 This is your rowData > > taxonomy1 Kingdom Phylum Class Order Family > <character> <character> <character> <character> <character> <character> > Kingdom:Bacteria Root Bacteria > Phylum:Terrabacteria group Root Bacteria Terrabacteria group > Class:Clostridia Root Bacteria Terrabacteria group Clostridia > Order:Lachnospirales Root Bacteria Terrabacteria group Clostridia Lachnospirales > Family:Lachnospiraceae Root Bacteria Terrabacteria group Clostridia Lachnospirales Lachnospiraceae > > The same phylogenetic path is shared by all rows, which means that the calculated hierarchy tree is just one line - Attachment: #547 getHierarchyTree: empty cells > I noticed that only NA cells were considered as empty. This led to error –> the created tree was not matching with rows if cells contained for instance character ““.

Tuomas Borman (06:03:00) (in thread): > > miaViz::plotRowTree(tse, show_label = TRUE) >

Tuomas Borman (06:03:36) (in thread): - File (PNG): image.png

Tuomas Borman (06:08:07) (in thread): > Also note that the hierarchy tree is not a taxonomy tree. It is a tree that represents only the hierarchy between rows. A taxonomy tree or phylogeny commonly refers to a tree that takes into account the genetic distances between taxa. That is why we changed the name

Tuomas Borman (06:14:14) (in thread): > So to import your data, you could do something like this (PR is not merged yet so addHierarchyTree is not working yet) > > library(mia) > file_name <- "complete.biom" > # Import BIOM file (importBIOM is the new name) > tse <- loadFromBiom(file_name) > > # Rename rowData fields > ranks <- c("taxonomy2" = "Kingdom", "taxonomy3" = "Phylum", "taxonomy4" = "Class", "taxonomy5" = "Order", "taxonomy6" = "Family") > ind <- match( names(ranks), colnames(rowData(tse))) > colnames(rowData(tse))[ ind ] <- ranks > # Or we could set taxonomy ranks based on names that we have in rowData > # setTaxonomyRanks(colnames(rowData(tse))) > > # Add hierarchy tree > tse <- addHierarchyTree(tse) > > # Agglomerate to all levels (agglomerateByRanks will be the new name) > altExps(tse) <- splitByRanks(tse, agglomerate.tree = TRUE) >

Moritz E. Beber (07:03:17) (in thread): > Fantastic, that looks promising. I look forward to the new mia version then.

Tuomas Borman (09:15:50) (in thread): > Cool, we will keep you informed. Updates will be there in couple of days

2024-05-20

Tuomas Borman (13:36:58) (in thread): > @Moritz E. Beber Now it should work (version 1.13.10) > > Any feedback is welcome!

Moritz E. Beber (15:33:39) (in thread): > The latest version I seem to be getting from BiocManager ‘devel’ is 1.13.0 (from May 1), do I need to install from GitHub directly?

Moritz E. Beber (15:38:47) (in thread): > Okay, got version 1.13.11 directly from GitHub.

Moritz E. Beber (15:53:14) (in thread): > Now the code runs fine :slightly_smiling_face: and looks correct to me. The last thing is that miaViz::plotRowTree(tse, show_label = TRUE) still spits out a couple of messages, although it does draw the “tree”. > > ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned. > ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned. > ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned. > ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned. > ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned. > - File (PNG): image.png

Moritz E. Beber (15:54:49) (in thread): > (That’s with miaViz 1.13.0, Bioconductor 3.20, and R 4.4.0).

2024-05-21

Tuomas Borman (09:16:16) (in thread): > Yes, there are lots of changes coming, so we do not want to push them to Bioconductor yet; we want to test them thoroughly first. > > Those warnings come from tidytree::as_tibble(rowTree(tse)); they are not harmful. There are some checks in tidytree. The structure of the tree is quite awkward, and other datasets do not cause these warnings

Moritz E. Beber (14:45:23) (in thread): > I’ll see about adding a little better test data. Thank you for all your help :slightly_smiling_face:

Tuomas Borman (16:04:58) (in thread): > No problem, happy to help!

Moritz E. Beber (16:52:40) (in thread): > I’ve updated the R package with the latest developments https://github.com/taxprofiler/taxpasta2tse. Now I need to adjust taxpasta to actually format the BIOM file in this way :laughing:. When those two are in sync, I might consider publishing on Bioconductor.

2024-05-22

Tuomas Borman (06:57:38) (in thread): > Hmmm, sounds interesting, I’ll take a look

Leo Lahti (07:01:10) (in thread): > taxpasta had flipped out of my mind but that seems extremely useful, overall. If it was in Bioconductor that would be even better. We should see if this can be highlighted in some way in the processing examples in our own documentation material.

Leo Lahti (07:01:39) (in thread): > There are also other importers in mia for TreeSE, would those help (in addition to BIOM)?

Leo Lahti (07:02:18) (in thread): > The “Data loading” section in mia function reference: https://microbiome.github.io/mia/reference/index.html

Leo Lahti (07:04:11) (in thread): > Just thinking if taxpasta2tse will provide a general purpose importer for all kinds of possible formats, it might be more clear to support that, as compared to listing a variety of format-specific importers in mia

2024-05-23

Ely Seraidarian (03:32:20): > @Ely Seraidarian has joined the channel

shishr (04:21:25): > @shishr has joined the channel

Moritz E. Beber (11:46:43) (in thread): > For now, my goal was to only support the BIOM format output from taxpasta, as that can contain all required information (counts + taxonomy) in one file. It will be taxpasta’s role to ingest various formats and produce the BIOM output. Taxpasta was created for and is used in the https://nf-co.re/taxprofiler/ nextflow pipeline. Taxpasta itself is a Python CLI, so I didn’t have plans to put it on Bioconductor so far. - Attachment (taxpasta.readthedocs.io): TAXPASTA > TAXonomic Profile Aggregation and STAndardisation - Attachment (nf-co.re): taxprofiler: Introduction > Highly parallelised multi-taxonomic profiling of shotgun short- and long-read metagenomic data

Tuomas Borman (13:32:07) (in thread): > If the format is BIOM, mia::importBIOM() should already work, or?

2024-05-24

Moritz E. Beber (12:21:07) (in thread): > To import the basic data, yes, but everything we talked about here with regard to creating a taxonomy tree would not work automatically. That’s what I wanted to achieve with this custom function. If you think that it could have a place in the mia package, that’d be fantastic, of course, but I had the impression that was not the case.

2024-05-27

Tuomas Borman (11:25:33) (in thread): > Yes, the idea of mia is to provide basic functions to do single tasks and not for larger pipelines. But we can think about that. Also one thing to consider is the automatic creation of hierarchy tree which might cause problems for users that do not know what they are doing

Moritz E. Beber (11:39:43) (in thread): > We may have different views on the tree. Since taxpasta passes through the taxonomy from an NCBI taxdump formatted input, problems should only be introduced if someone customized that taxonomy in a wrong way. Can happen, of course, but the convenience is worth it IMO.

2024-05-28

Leo Lahti (15:45:18) (in thread): > I tend to agree with @Moritz E. Beber about the simplicity. Do I get it correctly that taxpasta is not using the actual phylogenetic tree but it uses the tree that can be constructed based on taxonomic mapping hierarchy? Why not the real tree?

Moritz E. Beber (16:38:52) (in thread): > Taxpasta only transforms (standardizes) the output from many different metagenomic profilers. Those profilers will use the taxonomy. All of the current profilers are based on reference databases for whole genomes or marker genes. Thus they use fixed taxonomies and do not create their own phylogenies on the fly. Taxpasta only uses the taxonomy to annotate names and lineages.

Moritz E. Beber (16:40:01) (in thread): > There was a user request recently for supporting other profilers like DADA2, we would have to see how they export their phylogeny and how to pass that through then.

2024-05-29

Leo Lahti (12:47:51) (in thread): > @Tuomas Borman I think we could consider that taxpasta related importer in mia?

2024-05-30

Tuomas Borman (01:41:40) (in thread): > Yes, of course, this sounds interesting

Leo Lahti (03:13:12) (in thread): > Would you be willing to open a PR, @Moritz E. Beber? The main issue could be that we want to maintain certain implementation standards. But these should be relatively straightforward to pick up by checking how some other related function has been implemented.

Leo Lahti (03:13:58) (in thread): > Or if some details need to be discussed first, opening an issue could be an option.

Moritz E. Beber (14:49:40) (in thread): > Sure, having it directly in mia would be fantastic. When you look at https://github.com/taxprofiler/taxpasta2tse are there some obvious things that are missing or need to be changed? I guess, you prefer pascal casing but that’s an easy fix.

2024-05-31

Tuomas Borman (02:02:16) (in thread): > Quite minor things that are easy to address. > > 1. This is for a specific purpose (to import taxpasta to TreeSE), so the name cannot be load_biom. Instead it could be importTaxpasta. > > 2. The documentation should be extensive. It should describe what the function does. As this is importing data from a tool that is not as widely used as BIOM itself, for instance, it should link to taxpasta. Also look at the documentation in mia; it should follow the same idea. > > 3. Coding style should also follow mia’s coding style so that the code is consistent within the package (Bioc guidelines). Also add more comments to the code. > > 4. There is some more Bioconductor-specific stuff. For example, the indentation should be a multiple of 4 spaces. Also the line width should not exceed 80 characters (it is ok to exceed this if necessary). See more here: https://contributions.bioconductor.org/r-code.html?q=inden#indentation > > 5. There are unit tests already, which is good. > > 6. Add input checks. filename should be a character value, so you should test that it is character. > > 7. We follow a specific naming convention: > > ClassName (e.g. TreeSummarizedExperiment) > functionName (e.g. agglomerateByRanks) > parameter.name (e.g. agglomerate.tree) > .internal_function (e.g. .calculate_overlap) > > 8. Instead of calling functions like this: rhdf5::h5read, use @importFrom rhdf5 h5read and call the function without specifying the package. - Attachment (contributions.bioconductor.org): Chapter 15 R code | Bioconductor Packages: Development, Maintenance, and Peer Review > Everyone has their own coding style and formats. There are however some best practice guidelines that Bioconductor reviewers will look for. can be a robust, fast and efficient programming language…
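
Points 6 and 7 above (input checks and the naming convention) could look roughly like the following. This is only a sketch under stated assumptions: `.check_file_name` and the `file.name` parameter are hypothetical names chosen to match the listed convention, `importTaxpasta` is merely the name suggested in point 1, and the actual import logic is left out.

```r
# Hypothetical internal helper, named in the .internal_function style.
.check_file_name <- function(file.name) {
    # A single, non-missing character value is expected.
    if (!(is.character(file.name) && length(file.name) == 1L &&
            !is.na(file.name))) {
        stop("'file.name' must be a single character value.", call. = FALSE)
    }
    # The file must exist before any reading is attempted.
    if (!file.exists(file.name)) {
        stop("'file.name' does not point to an existing file.",
            call. = FALSE)
    }
    invisible(TRUE)
}

# Hypothetical exported function in functionName style. Only the input
# check is shown; reading the BIOM file and building the TreeSE are omitted.
importTaxpasta <- function(file.name) {
    .check_file_name(file.name)
    invisible(file.name)
}
```

Failing early with a clear message keeps errors from surfacing deep inside the HDF5 reading code, which is the motivation for checking the argument first.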

2024-06-01

Leo Lahti (04:39:50) (in thread): > Perfect. Btw. we are also ourselves now updating these naming conventions, so some of the current functions are throwing (or soon throwing) deprecation messages. They should remain functional, however. Sorry for that inconvenience.

2024-06-05

Adrian Hirt (05:48:02): > @Adrian Hirt has joined the channel

2024-07-11

Danielle Callan (10:14:47): > hey folks, wondering if anyone knows of an existing method to go from treeSE to biom? i know mia has the biom importer, but im looking to go the opposite direction as well.

Leo Lahti (13:13:48): > I don’t think that export functions exist. Would be great addition, though

2024-07-12

Tuomas Borman (05:28:52) (in thread): > Hello! > > Now there is a method that converts TreeSE to BIOM: convertToBIOM() > > I just merged the PR https://github.com/microbiome/mia/pull/606. Install mia directly from GitHub (version > 1.13.30). Any feedback is welcome! - Attachment: #606 convertToBIOM > #605 > > Function for converting SE to BIOM.

Danielle Callan (06:56:02) (in thread): > That’s awesome!!!! :sunglasses: You are greatly appreciated sir.

2024-07-16

Leo Lahti (06:39:17) (in thread): > It was fast :smile:

2024-07-30

Li-Fang Yeo (02:44:16): > @Li-Fang Yeo has joined the channel

2024-08-14

Jayaram Kancherla (13:34:22): > @Jayaram Kancherla has left the channel

2024-08-16

Li-Fang Yeo (02:46:04): > Hello! > > I want to run RDA for multiple combinations of variables. But each combination takes more than half a day to run. Is there a way to save the matrix somewhere and just feed in different combinations of covariates? > > calculate_beta_diversity <- function(tse) { > mia::mergeFeaturesByRank(tse, rank = "Species") %>% > mia::transformAssay(method = "relabundance") %>% > mia::runRDA(assay.type = "relabundance", > formula = assay ~ BL_AGE + MEN, > distance = "bray", > na.action = na.exclude) > } > tse_species <- compute_or_load_result(function(x) calculate_beta_diversity(tse), "cache/beta-diversity-htn.rds") > > rda_info <- attr(SingleCellExperiment::reducedDim(tse_species, "RDA"), "significance")

2024-08-19

Rema Gesaka (09:40:03): > @Rema Gesaka has joined the channel

2024-08-21

Laura Symul (08:57:26): > @Laura Symul has joined the channel

2024-08-27

Muluh (07:23:33) (in thread): > Hi, sorry for the late reply on this issue. I’m delighted to assist you in reducing the time taken for the calculation. To begin, I would like to know if you can share: > 1. Dataset dimension > 2. Is it possible for me to have a copy of the dataset to attempt reproducing the issue? > Just a comment here that mia is constantly developed and the function mergeFeaturesByRank is deprecated in favour of agglomerateByRank; transformAssay is the current name and replaces the older transformCounts. We can also try your calculations with the latest version in development to identify where the issue is. > > BR.

Leo Lahti (16:31:18) (in thread): > Thanks @Muluh - I would not consider RDA as a beta diversity measure, it goes beyond beta diversity as it is also finding associations with the target variable, not just calculating dissimilarities.

Leo Lahti (16:31:47) (in thread): > So function name “calculate_beta_diversity” seems somehow non-standard with this one.

Leo Lahti (16:36:18) (in thread): > The slow operation in RDA is indeed the beta diversity calculation. It is a very good question whether that can be calculated just once. I don’t think this is supported currently. The solution would be to add beta diversity matrix in the TreeSE data object, then add support for that in mia::runRDA (to pick the pre-calculated dissimilarities from there). ping @Tuomas Borman

Leo Lahti (16:49:33) (in thread): > I opened an issue on this one https://github.com/microbiome/mia/issues/633 - Attachment: #633 RDA speedup > Running mia::runRDA can be very slow for large data sets. This is problem in particular when we want to calculate alternative RDA models with different formula (e.g. assay ~ BMI + AGE vs. assay ~ BMI vs. assay ~ AGE etc), as in: > > > mia::runRDA(tse, > assay.type = "relabundance", > formula = assay ~ BL_AGE + MEN, > distance = "bray", > na.action = na.exclude) > > > > One problem is that the beta diversity is here re-calculated for every combination. > > Speedups could be obtained by using pre-calculated beta diversity matrix, stored in TreeSE object and then supporting the use of that instead, e.g. something like: > > > mia::runRDA(meta(tse)$betadiv, > assay.type = "relabundance", > formula = assay ~ BL_AGE + MEN, > distance = "bray", > na.action = na.exclude) > > > > Implementation details can be discussed but this would be a substantial improvement.
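
Until something like this is supported in mia, the "compute the dissimilarity once, reuse it for every formula" idea can be sketched outside mia. The sketch below is an assumption-laden illustration, not mia's API: the data are simulated, `bray_curtis` is a small base-R helper written for this example, and the model fitting swaps in `vegan::dbrda()`, which accepts a dissimilarity object on the left-hand side of the formula. The fitting step only runs if vegan is installed.

```r
# Toy stand-ins: 'abund' is a samples x taxa relative-abundance matrix and
# 'covariates' holds per-sample variables (names borrowed from the thread).
set.seed(1)
abund <- matrix(runif(40), nrow = 10)
abund <- abund / rowSums(abund)
covariates <- data.frame(
    BL_AGE = rnorm(10, mean = 50, sd = 10),
    MEN = rep(c(0, 1), 5)
)

# Bray-Curtis dissimilarity, computed once:
# d(i, j) = sum(|x_i - x_j|) / sum(x_i + x_j)
# (fine for toy data; vegan::vegdist(x, method = "bray") is the usual choice)
bray_curtis <- function(x) {
    n <- nrow(x)
    d <- matrix(0, n, n)
    for (i in seq_len(n)) {
        for (j in seq_len(n)) {
            d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
        }
    }
    as.dist(d)
}
bc <- bray_curtis(abund)

# Cache the dissimilarities so later sessions can skip the slow step.
cache_file <- tempfile(fileext = ".rds")
saveRDS(bc, cache_file)
bc <- readRDS(cache_file)

# Reuse the same dissimilarity object for several covariate combinations.
formulas <- list(bc ~ BL_AGE + MEN, bc ~ BL_AGE, bc ~ MEN)
if (requireNamespace("vegan", quietly = TRUE)) {
    fits <- lapply(formulas, function(f) vegan::dbrda(f, data = covariates))
}
```

Whether the results match `mia::runRDA()` exactly (for example in how missing covariate values are handled via `na.action`) was not checked here.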

Leo Lahti (16:50:05) (in thread): > Did you manage to solve it, @Li-Fang Yeo?

2024-08-28

Li-Fang Yeo (01:54:46) (in thread): > Hello everyone! Thank you for your replies. I find that I learn something new every time I muster up the courage to ask a question in a forum. I decided on the slow and steady way, which is to run every combination on my laptop overnight. :smiley: But, this is definitely a recurring problem because other datasets will also require me to run RDA for multiple combinations of covariates.

Li-Fang Yeo (02:00:28) (in thread): > I have since last week, changed the code to look like the standard one that Leo posted in github. Because I wasn’t a 100% sure how to manipulate that code I posted. (i inherited codes from ppl in my team)

Leo Lahti (02:17:24) (in thread): > We should try to solve this rapidly but good if you are able to proceed with your current model.

2024-09-04

Samuel Gamboa (14:31:22): > Hi, all. Is there a shorter way to merge all features agglomerated by rank plus the original features into a single SE than using splitByRanks, adding the original features to the output list, and merging? (example below) > > library(mia) > data("GlobalPatterns") > dim(GlobalPatterns) > #> [1] 19216 26 > > l <- splitByRanks(GlobalPatterns) > l$features <- GlobalPatterns > > mergedSE <- mergeSEs(l, collapse.cols = TRUE) > #> Merging with full join... > #> 1/8 2/8 3/8 4/8 5/8 6/8 7/8 8/8 > #> Merging rowTree... > #> Warning in convertNode(tree = value, node = olab[olab %in% lab]): Multiple > #> nodes are found to have the same label. > dim(mergedSE) > #> [1] 21908 26 >

2024-09-05

Muluh (06:43:45) (in thread): > Hi @Samuel Gamboa! If you don’t want to use splitByRanks, there’s an alternative with agglomerateByRanks, which stores the list in the altExp slots. Currently we don’t support direct methods to merge altExps with the main experiment, so this alternative might be longer than your solution. > > tse <- agglomerateByRanks(GlobalPatterns) > L <- altExps(tse) > L$features <- GlobalPatterns > merged <- mergeSEs(L, collapse.cols = TRUE) >

Samuel Gamboa (17:05:24) (in thread): > Thank you!

2024-09-06

Leo Lahti (03:35:41) (in thread): > Hmm, I am wondering how common this use case would be. Is this something that should be supported..?

2024-09-08

Sounkou Mahamane Toure (11:28:47): > @Sounkou Mahamane Toure has joined the channel

2024-09-17

Li-Fang Yeo (08:46:41): > Hello, > > I am trying to import a HUMAnN file using mia. But the error i get is Error: ‘file’ must be a single character value. > > library(mia) #1.13.36 > test <- read_delim("test.txt", delim = "\t") > tseb <- mia::importHUMAnN (test) > > ##I got this toy data from HUMAnN website. My own data doesn't work either. > # Pathway $SAMPLENAME_Abundance > UNMAPPED 140.0 > UNINTEGRATED 87.0 > UNINTEGRATED|g__Bacteroides.s__Bacteroides_caccae 23.0 > UNINTEGRATED|g__Bacteroides.s__Bacteroides_finegoldii 20.0 > UNINTEGRATED|unclassified 12.0 > PWY0-1301: melibiose degradation 57.5 >

Tuomas Borman (08:53:32) (in thread): > Hello, > > does this work: > > mia::importHUMAnN("test.txt")

Li-Fang Yeo (08:59:47) (in thread): > it does!! okay thank you =)

Tuomas Borman (09:02:59) (in thread): > Cool!

2024-10-11

Ji Hen Lau (01:33:32): > @Ji Hen Lau has joined the channel

2024-11-19

Muluh (12:07:35): > For package submission to bioconductor, you can refer to: > 1. https://www.rpubs.com/Saskia/554320 > 2. https://contributions.bioconductor.org/submission-overview.html > To verify your package’s compatibility before submission, > Run the following checks: > > #! bash > R CMD build package_name > R CMD check package_name_version.tar.gz > > > > #! R > BiocManager::install("BiocCheck") > BiocCheck::BiocCheck("path/to/package_name") > > > > #! R > devtools::check() > # some duplicated checks here but you can't be > # too sure > devtools::test() > devtools::run_examples() > devtools::build_vignettes() > > > > #! R > tools::checkPackage("path/to/package_name") > > > > #! R > BiocManager::valid() > - Attachment (contributions.bioconductor.org): Overview | Bioconductor Packages: Development, Maintenance, and Peer Review > The following page gives an overview of the submission process along with key principles to follow. See also Package Guidelines for package specific guidelines and requirement and the Bioconductor…

Leo Lahti (15:59:31): > In my experience the command line versions do more extensive testing

Leo Lahti (16:00:13): > Along these lines: > ~/bin/R-4.2.2/bin/R CMD build ../../ # --resave-data # --no-examples --no-build-vignettes > ~/bin/R-4.2.2/bin/R CMD check microbiome_1.21.1.tar.gz # --no-build-vignettes --no-examples > ~/bin/R-4.2.2/bin/R CMD BiocCheck microbiome_1.21.1.tar.gz > ~/bin/R-4.2.2/bin/R CMD INSTALL microbiome_1.21.1.tar.gz

2024-11-25

Thomaz Bastiaanssen (06:57:44): > @Thomaz Bastiaanssen has joined the channel

2024-11-26

Hassan Diab (05:53:33): > @Hassan Diab has joined the channel

Hassan Diab (06:10:21): > Hello, > I am using UniFrac to calculate PCoA (for beta diversity analysis). I tried to runMDS on a tse object that is agglomerated by Species but I got the error message: > “Incompatible tree and abundance table! Please try to provide ‘node.label’” > I then extracted the nodeLabs corresponding to the detected species and set the node.label argument accordingly (as shown in the code below). The error is gone after I do that, but is this the right way of doing it? > > # Transform count assay to relative abundances > tse_cyto <- transformAssay(tse_cyto, assay.type = "counts", method = "relabundance") > > # Agglomerate by Species and subset by prevalence > tse_cyto <- subsetByPrevalentFeatures(tse_cyto, rank = "Species", assay.type = "relabundance", prevalence = 5/100, detection = 0.1/100) > > # Extract nodes_label > row_links_dataframe <- as.data.frame(rowLinks(tse_cyto)) > nodes_label <- row_links_dataframe$nodeLab > > # Run MDS with UniFrac > tse_cyto <- runMDS(tse_cyto, > FUN = getDissimilarity, > name = "Unifrac", > method = "unifrac", > tree = rowTree(tse_cyto), > ntop = nrow(tse_cyto), > assay.type = "counts", > weighted = TRUE, > node.label = nodes_label)

Tuomas Borman (06:33:14) (in thread): > Hello! > > That is correctly done. > > The phylogeny and the matrix are sent to an external function to calculate UniFrac. Because of the agglomeration done in subsetByFeatures(), the rownames are updated to refer to species; however, the node names in the tree are not updated. That is why the function cannot match nodes with rows anymore based on names; it needs the linkages that are provided by node.label, just like you did. > > Note that the phylogeny is not pruned in agglomeration. You might want to consider updating it with subsetByPrevalent(..., update.tree = TRUE). Also note that this just prunes the tree, i.e., simplifies its structure to reflect the current data; it does not modify the node names, so you should still provide the linkages with node.label

Hassan Diab (07:12:06) (in thread): > okay great! Thanks for the help. Much appreciated.

2024-11-27

Leo Lahti (04:59:32): > Should we consider changing defaults so that they will better match with these standard use cases..?

Tuomas Borman (06:18:36) (in thread): > Yes, I think the tree should be updated by default. I made an issue: https://github.com/microbiome/mia/issues/661 > > update.tree was added to the functions afterwards, which is why we did not want to modify the default behavior, so we set the default choice to FALSE (I think that was the reason). > > Also, perhaps because [] does not have a tree update option, we decided to set this to FALSE. However, the pruning can be done after subsetting with [], which is illustrated in OMA: https://microbiome.github.io/OMA/docs/devel/pages/taxonomy.html#sec-update-tree

Leo Lahti (06:46:00) (in thread): > Right. That is also a motivation for having the separate subsetting functions that take care of such things?

Leo Lahti (06:46:14) (in thread): > (unless we modify the [] operation, which we could in principle do)

2024-11-30

Thomaz Bastiaanssen (05:40:24): > Hi all, > > I’m going through OMA and I was wondering, what is the recommended way to get relevant data from a tse or mae into long format, for example for more specialised plotting or statistical assessment? > > Is it meltSE()?

Tuomas Borman (07:03:50): > Hi! > > With > > library(mia) > data(HintikkaXOData) > mae <- HintikkaXOData > tse <- mae[[1]] > colData(tse) <- colData(mae) > > meltSE(tse, assay.type = "counts", add.col = c("Rat", "Diet"), add.row = TRUE) > > you can convert a SummarizedExperiment to long format. With > > longFormat(mae, i = c("counts", "nmr", "signals"), colDataCols = c("Rat", "Diet")) > > you can convert a MultiAssayExperiment to long format.

Tuomas Borman (07:09:55): > Thanks, I added info on that: https://github.com/microbiome/OMA/pull/645 - Attachment: #645 Example on melting MAE

Thomaz Bastiaanssen (07:58:01): > Brilliant, thanks Tuomas!

2024-12-04

Ben Valderrama (11:24:22): > @Ben Valderrama has joined the channel

Thomaz Bastiaanssen (11:28:12): > Hi all, please allow me to introduce @Ben Valderrama, a PhD researcher in John Cryan's microbiota-gut-brain axis group in Cork, interested in helping out @Leo Lahti @Tuomas Borman @Muluh. > > Brilliant bioinformatician and overall lovely guy!

Ben Valderrama (11:32:24) (in thread): > Thank you for the kind introduction:raised_hands:. Hello everyone, it’s my pleasure to join this channel. I’m very excited to see how I can contribute to this amazing project

Tuomas Borman (11:40:14) (in thread): > Hello @Ben Valderrama! Great to get you onboard!

2024-12-07

Leo Lahti (05:17:10) (in thread): > Hello!

2024-12-12

Janetta Top (05:16:27): > Hi all, I would like to remove samples from a TSE with, e.g., fewer than 500 reads/counts. Does anyone know how to script this? I did not find this in OMA.

Tuomas Borman (07:22:56) (in thread): > Hello! > > Does this work for you? > > library(mia) > library(scater) > > data(GlobalPatterns) > tse <- GlobalPatterns > # Add library sizes to colData > tse <- addPerCellQC(tse) > # Get library size from colData. Create a logical vector denoting those samples that > # have at least 500 counts > keep <- tse[["total"]] >= 500 > # Subset > tse <- tse[, keep]

Janetta Top (09:03:39) (in thread): > Thanks a lot for the suggestion. It indeed worked!

Leo Lahti (15:18:13) (in thread): > I think the last two code lines could also be shortened to just tse <- tse[, tse$total >= 500]

Leo Lahti (15:20:05) (in thread): > Or for the entire operation on one line: tse <- tse[, colSums(assay(tse, "counts")) >= 500]

Leo Lahti (15:21:22) (in thread): > We could have something like readCounts(tse) as a wrapper, but it is almost as fast and more transparent to write colSums(assay(tse, "counts")).
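
The filter discussed in this thread can be sketched in base R on a plain count matrix, without mia or scater. The matrix values and the taxon and sample names below are made up for illustration; with a TreeSE, the matrix would come from assay(tse, "counts") as in the messages above.

```r
# Toy count matrix: rows are taxa, columns are samples (values are made up)
counts <- matrix(
    c(300, 100, 50,
      400, 600, 20,
      100, 900, 10),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("taxonA", "taxonB", "taxonC"), c("S1", "S2", "S3"))
)

# Library size = total counts per sample (column sums)
lib_size <- colSums(counts)

# Keep samples with at least 500 counts; drop = FALSE keeps the matrix shape
keep <- lib_size >= 500
filtered <- counts[, keep, drop = FALSE]
```

Here S3 has only 80 counts in total, so it is the only sample removed.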

2024-12-16

Leo Lahti (19:03:28): > Maaslin3 supports SummarizedExperiment: https://www.biorxiv.org/content/10.1101/2024.12.13.628459v1 - Attachment (bioRxiv): MaAsLin 3: Refining and extending generalized multivariable linear models for meta-omic association discovery > A key question in microbial community analysis is determining which microbial features are associated with community properties such as environmental or health phenotypes. This statistical task is impeded by characteristics of typical microbial community profiling technologies, including sparsity (which can be either technical or biological) and the compositionality imposed by most nucleotide sequencing approaches. Many models have been proposed that focus on how the relative abundance of a feature (e.g. taxon or pathway) relates to one or more covariates. Few of these, however, simultaneously control false discovery rates, achieve reasonable power, incorporate complex modeling terms such as random effects, and also permit assessment of prevalence (presence/absence) associations and absolute abundance associations (when appropriate measurements are available, e.g. qPCR or spike-ins). Here, we introduce MaAsLin 3 (Microbiome Multivariable Associations with Linear Models), a modeling framework that simultaneously identifies both abundance and prevalence relationships in microbiome studies with modern, potentially complex designs. MaAsLin 3 also newly accounts for compositionality with experimental (spike-ins and total microbial load estimation) or computational techniques, and it expands the space of biological hypotheses that can be tested with inference for new covariate types. On a variety of synthetic and real datasets, MaAsLin 3 outperformed current state-of-the-art differential abundance methods in testing and inferring associations from compositional data. 
When applied to the Inflammatory Bowel Disease Multi-omics Database, MaAsLin 3 corroborated many previously reported microbial associations with the inflammatory bowel diseases, but notably 77% of associations were with feature prevalence rather than abundance. In summary, MaAsLin 3 enables researchers to identify microbiome associations with higher accuracy and more specific association types, especially in complex datasets with multiple covariates and repeated measures. > > ### Competing Interest Statement > > C.H. declares the following associations: Seres Therapeutics (scientific advisory board, microbiome therapies), Microbiome Insights (scientific advisory board, microbiome data generation), Zoe (scientific advisory board), Empress (scientific advisory board, microbiome therapies).

2024-12-25

Leo Lahti (22:22:29): > Just saw this one: https://bioconductor.org/books/release/OHCA/

2024-12-26

Rasmus Hindström (14:35:48): > @Rasmus Hindström has joined the channel

2024-12-28

Pascal-Onaho (07:55:39): > @Pascal-Onaho has joined the channel

2025-01-11

Leo Lahti (16:10:05): > I was trying to get full taxonomy labels for a tse object (a combination of multiple levels), using > > library(mia) > data(peerj13075) > tse <- peerj13075 > getTaxonomyLabels(tse, with.rank=TRUE)

Leo Lahti (16:10:08): > This outputs stuff like

Leo Lahti (16:10:09): > > head(getTaxonomyLabels(tse, with.rank=TRUE)) > [1] "genus:Abiotrophia" "genus:Abyssicoccus" "genus:Acetobacterium" > [4] "genus:Acetonema" "genus:Acholeplasma" "genus:Achromobacter"

Leo Lahti (16:10:53): > But I would like to get something like > “Bacteria|Firmicutes|Bacilli|Lactobacillales|Aerococcaceae|Abiotrophia”

Leo Lahti (16:11:19): > or

Leo Lahti (16:11:20): > “k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Aerococcaceae|g__Abiotrophia”

Leo Lahti (16:11:32): > etc.

Leo Lahti (16:11:58): > So the full taxonomy in the name, or at least something more than just genus, for instance family + genus.

Leo Lahti (16:12:31): > I thought we already had a function for this but getTaxonomyLabels does not seem to do the job based on its manpage.

Leo Lahti (16:12:46): > ping @Tuomas Borman @Muluh?

2025-01-12

Tuomas Borman (05:58:24): > We have only getTaxonomyLabels, and its idea is to get the “tidy”, shortest possible names that identify each taxon. > > If the idea is to rename each row in the format “Bacteria|Firmicutes|…”, we do not have that functionality currently. However, if there is a need, we could extend the getTaxonomyLabels function.
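
Until such an extension exists, the requested labels can be built by hand. Below is a minimal base R sketch, assuming the taxonomy ranks are available as character columns ordered from highest to lowest rank. The helper full_taxonomy_label() is hypothetical and not part of mia, and the toy table is made up for illustration.

```r
# Toy taxonomy table; in a TreeSE this would come from rowData(tse)
tax <- data.frame(
    kingdom = c("Bacteria", "Bacteria"),
    phylum = c("Firmicutes", "Actinobacteria"),
    genus = c("Abiotrophia", NA),
    row.names = c("OTU1", "OTU2")
)

# Hypothetical helper (not part of mia): join the non-missing ranks of each
# row with a separator, from the highest rank to the lowest
full_taxonomy_label <- function(tax, sep = "|") {
    apply(tax, 1, function(x) paste(x[!is.na(x)], collapse = sep))
}

labs <- full_taxonomy_label(tax)
```

With a TreeSE, the input could presumably be as.data.frame(rowData(tse)[, taxonomyRanks(tse)]), and the result could then be assigned to rownames(tse) if the labels are unique.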

2025-01-13

Leo Lahti (15:37:49): > In the example above, we have > > library(mia) > data(peerj13075) > tse <- peerj13075 > labs <- getTaxonomyLabels(tse, with.rank=TRUE) > labs[62:63] > [1] "genus:Amycolatopsis" "genus:Amycolatopsis_1"

Leo Lahti (15:37:55): > Checking rowData:

Leo Lahti (15:37:56): > > rowData(tse)[62:63,] > DataFrame with 2 rows and 6 columns > kingdom phylum class order > > OTU184 Bacteria Actinobacteria Actinobacteria Pseudonocardiales > OTU185 Bacteria Actinobacteria Actinobacteria Pseudonocardiales > family genus > > OTU184 Pseudonocardiaceae Amycolatopsis > OTU185 Pseudonocardiaceae Amycolatopsis

Leo Lahti (15:38:33): > In this case the rownames refer to OTUs. Row names are often more informative than indices.

Leo Lahti (15:39:39): > Not sure if anything should be done but it is an option to modify the function so that the unique labels would be instead something like > > “genus:Amycolatopsis_OTU184” “genus:Amycolatopsis_OTU185”

2025-01-14

Tuomas Borman (11:46:00): > Does this make sense? The OTU information must be in taxonomy table as rownames can be arbitrary. > > > data(peerj13075) > > tse <- peerj13075 > > # Add OTUs to taxonomy table and set taxonomy ranks of mia > > rowData(tse)[["OTU"]] <- rownames(tse) > > setTaxonomyRanks(colnames(rowData(tse))) > > # Get taxonomy labels which correspond to OTUs > > labs <- getTaxonomyLabels(tse) > > labs[62:63] > [1] "OTU184" "OTU185" > > # Get taxonomy labels with genus > > labs <- getTaxonomyLabels(tse, with.rank=TRUE, lowest.rank = "genus") > > labs[62:63] > [1] "genus:Amycolatopsis_OTU184" "genus:Amycolatopsis_OTU185" > > https://github.com/microbiome/mia/pull/672

Leo Lahti (15:30:53): > I think it is a feasible solution. Rather often the OTU information is only in the row names.

2025-01-28

Rasmus Hindström (08:39:08): > Is the mia package's GitHub Discussions page the right place to post support questions? I’m having some difficulty getting taxa names onto the plots, in place of the non-informative rownames from the imported data. There must be a simple solution that I am blind to.

Tuomas Borman (09:17:51): > Hello! You can ask here or there. Slack might not be accessible for everyone, so GH Discussions is another place to ask questions and seek help. As there are no discussions yet in GH, the threshold to start one might be higher; you could be the first one :smile: You could post the code there in the discussion: https://github.com/microbiome/OMA/discussions

2025-01-29

Rasmus Hindström (05:42:43): > Strangely, the issue has resolved itself overnight. The code now correctly produces the plots with the informative taxonomy labels. For context, yesterday I tried to subset my data and plot with plotAbundanceDensity(). Plotting on the subsets resulted in non-informative feature names inherited from Qiime2’s default naming of features, while plotting over the whole tse, without prior subsetting, resulted in plots with informative feature names. > > I’ll open the flood gates and start posting more questions in the Q&A forum in OMA/discussions as I proceed with the analysis. I anticipate more questions will arise.

Chris Fields (10:42:28): > Hi all, sorry to intrude. I asked a few general questions related to miaverse on #microbiome_metagenome. If you have any constructive input we’d greatly appreciate it: https://community-bioc.slack.com/archives/C5EHVREKZ/p1738113178039239 - Attachment: Attachment > I have a few general (and hopefully not controversial!) questions for the community here, and maybe it’s worth a poll at some point.

2025-01-30

Leo Lahti (11:23:00): > Maaslin3 added support for assay type: https://forum.biobakery.org/t/support-for-summarizedexperiment-assays/7772/3

2025-02-04

Hassan Diab (02:01:34): > Hello, it is mentioned in the mia book that UniFrac uses rarefied counts. > > Does the runMDS function do the rarefaction automatically when “method=unifrac”, or should I do the rarefaction by setting the arguments “niter”, “sample” and “replace” (as suggested in the book) within runMDS? > > Thanks

Tuomas Borman (02:16:09) (in thread): > Hello! > * Rarefaction is not applied automatically > * When rarefaction is applied, the function utilizes the vegan::avgdist function (https://rdrr.io/cran/vegan/man/avgdist.html) > * To enable it, the user must specify niter > * Unfortunately, rarefaction is not supported at the moment when unifrac is specified (we should state this clearly in the documentation) > > So the call would look something like this: runMDS(tse, assay.type = "counts", FUN = getDissimilarity, method = "euclidean", transf = "rclr", niter = 100)

Hassan Diab (02:27:32) (in thread): > okay thanks! So I should use rarefyAssay() to get the rarefied counts then I can use that assay for UniFrac. Is this correct?

Tuomas Borman (02:36:23) (in thread): > No, that is not the correct way to do it. I will come back to you in a couple of days. I will check whether the support can be easily added.

Tuomas Borman (10:58:30) (in thread): > Hello! > > Couple days have passed in the Imaginary Land and now there is support for rarefaction when unifrac is applied. > > Install the latest mia from GitHub (version 1.15.21): remotes::install_github("microbiome/mia") > > unifrac + rarefaction > > library(mia) > library(scater) > > data(GlobalPatterns) > tse <- GlobalPatterns > > tse <- runMDS( > tse, > assay.type = "counts", > FUN = getDissimilarity, > method = "unifrac", > tree = rowTree(tse), > sample = 10000, > niter = 10 > ) > plotReducedDim(tse, "MDS", colour_by = "SampleType") > > Happy to hear if there is something that could still be improved, and other feedback also; it helps a lot!

Leo Lahti (13:05:57) (in thread): > Unifrac is a tree-based method to start with. Could we have rowTree(tse) as the default for the “tree” argument, or is there a need to specify this explicitly?

Tuomas Borman (13:43:11) (in thread): > Because we use scater::runMDS the tree must be fed manually. This is the only way to do this currently > > I have been thinking about getMDS and addMDS to solve this problem. They would be just simple wrappers for run/calculate but they could feed the phylogeny automatically to scater functions

Leo Lahti (15:51:59) (in thread): > By the way, vegan rclr implementation is not yet ready. The current rclr version in vegan should better not be used.

Leo Lahti (15:52:57) (in thread): > I thought that the tree gets passed to getDissimilarity(..) and that scater::runMDS does include tree as an argument?

Leo Lahti (15:53:38) (in thread): > Those wrappers might be useful for microbiome folks. Just needs to be balanced with implementation + maintenance efforts.

Tuomas Borman (16:20:15) (in thread): > Yes, runMDS passes forward the tree to getDissimilarity() but to its matrix method, not to TreeSE method. That is why we need to give the tree separately > 1. We call runMDS > 2. runMDS extracts abundance matrix > 3. runMDS calls getDissimilarity that calculates dissimilarity based on the abundance matrix

2025-02-05

Hassan Diab (04:46:45) (in thread): > Thank you very much for your help:blush:

Hassan Diab (07:43:02) (in thread): > So I ran the following command > > > row_links_dataframe <- as.data.frame(rowLinks(tse_subset)) > nodes_label <- row_links_dataframe$nodeLab > tse_subset <- runMDS(tse_subset, > assay.type = "counts", > FUN = getDissimilarity, > method = "unifrac", > tree = rowTree(tse_subset), > ntop = nrow(tse_subset), > sample = 10000, > niter = 10, > node.label = nodes_label) > > > which gave the following warning and error > > > Warning: The following sampling units were removed because they were below sampling depth: 820013449-1, 820023033-7, 820027543-9, 820040227-3, 820047185-2, 820048319-7, 820048431-0, 820048711-8, 820049873-3, 820050307-5, 820050405-7, 820050461-7, 820051833-6, 820053121-4, 820056441-7, 820056617-7, 820058409-0, 820058905-8, 820061577-9, 820062233-3, 820064201-2, 820066185-5, 820066425-1, 820066457-8, 820066857-1, 820069465-8, 820076569-0, 820078457-9, 820079497-6, 820081865-9, 820082105-9, 820083113-0, 820083785-8, 820084201-4, 820099907-7, 820200273-3, 820201081-0, 820201097-5, 820201577-1, 820201665-1, 820203329-3, 820205649-2, 820205745-2, 820205809-1, 820206297-4, 820208817-3, 820209529-0, 820209857-6, 820209945-4, 820209969-8, 820210001-8, 820210017-4, 820211635-7, 820212585-3, 820212795-4, 820212835-9, 820213145-2, 820214185-8, 820215945-4, 820216755-7, 820220265-6, 820220415-2, 820220885-3, 820225005-9, 820227675-9, 820228095-0, 820228205-9, 820228525-3, 820228635-3, 820229355-3, 820231633-2, 820231769-3, 820232249-2, 820232409-1, 820232481-4, 820234665-1, 820235913-7, 820236329-7, 820236625-4, 820236777-9, 820237273-9, 820240833-2, 820243515-7, 820245955-8, 820246285-4, 820246745-4, 820247245-6, 820247435-8, 820250085-3, 820250255-3, 820251085-2, 820251345-6, 820251925-0, 820252075-2, 820252405-7 > Warning: non-NULL ‘rownames(value)’ should be the same as ‘colnames(x)’ for ‘reducedDim<-’. This will be an error in the next release of Bioconductor. > Error in .set_internal_character(x, type, value, getfun = int_colData, : > invalid ‘value’ in ‘reducedDim(, type="character") <- value’: > ‘value’ should have number of rows equal to ‘ncol(x)’ > > > but then I removed the rownames that were listed in the warning message from the data frame before running runMDS, and the problem was solved

Tuomas Borman (08:10:07) (in thread): > The first warning tells that the library size of some samples was below sample = 10000 > > What rarefaction does is that it takes a random subsample of counts from each sample, in your case 10000. If a sample does not have that many counts, it is dropped. That is why a commonly used value for sample is the smallest library size in the data: min(scater::perCellQCMetrics(tse)[["total"]]) > > The error is caused because runMDS does not handle cases where some samples are removed (because of too small a library size). I was able to reproduce that and hopefully it will be fixed soon, thanks!
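
The dropping behaviour described above can be sketched in base R as a single rarefaction pass. The helper rarefy_counts() is hypothetical and is not how mia or vegan::avgdist implement it (avgdist repeats the subsampling niter times and averages the resulting dissimilarities); the matrix values are made up.

```r
set.seed(1)

# Toy count matrix: rows are taxa, columns are samples (values are made up).
# Library sizes: S1 = 100, S2 = 10, S3 = 500
counts <- matrix(
    c(50, 30, 20,
      5, 3, 2,
      400, 100, 0),
    nrow = 3,
    dimnames = list(c("taxonA", "taxonB", "taxonC"), c("S1", "S2", "S3"))
)

# Hypothetical helper: subsample every sample to `depth` reads without
# replacement; samples with fewer than `depth` reads are dropped
rarefy_counts <- function(counts, depth) {
    keep <- colSums(counts) >= depth
    counts <- counts[, keep, drop = FALSE]
    out <- vapply(seq_len(ncol(counts)), function(j) {
        # Expand the counts to one entry per read, then draw `depth` reads
        reads <- rep(seq_len(nrow(counts)), times = counts[, j])
        picked <- reads[sample.int(length(reads), depth)]
        tabulate(picked, nbins = nrow(counts))
    }, integer(nrow(counts)))
    matrix(out, nrow = nrow(counts), dimnames = dimnames(counts))
}

# With depth = 100, S2 (10 reads) is dropped
rarefied <- rarefy_counts(counts, depth = 100)

# Using the smallest library size as depth keeps every sample
rarefied_all <- rarefy_counts(counts, depth = min(colSums(counts)))
```

This is why choosing sample as the minimum library size avoids the warning: no sample falls below the sampling depth, at the cost of discarding reads from the deeper samples.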

Hassan Diab (08:11:22) (in thread): > yes I am using the min sample now. thank you very much!

2025-02-06

Leo Lahti (04:04:44) (in thread): > Hi @Tuomas Borman, in your summary above: > 1. We call runMDS > 2. runMDS extracts abundance matrix > 3. runMDS calls getDissimilarity that calculates dissimilarity based on the abundance matrix > > -> This could potentially be solved by making a PR to runMDS so that it would know how to extract the tree from TreeSE and pass it to getDissimilarity for Unifrac? I am not sure how feasible that would be, but they accepted some PR from us earlier.

Tuomas Borman (10:03:34) (in thread): > It could be solved like that, but I think the simplest solution would be to create a wrapper in mia. I am a little bit skeptical about adding tree support to their methods, as unifrac or trees are not really used in single-cell analysis. Moreover, it would add a new dependency to scater.

2025-02-07

Tuomas Borman (08:55:52) (in thread): > Something like this: https://github.com/microbiome/mia/pull/689 - Attachment: #689 addMDS and getMDS > This PR implements wrappers for runMDS and calculateMDS. > > 1. User do not need to specify FUN = getDissimilarity and tree = rowTree(x), node.label = rowLinks(x)[[1]] as they are now default values. > 2. If rarefaction drops off some samples due to insufficient sampling depth, this is handled in addMDS.

Leo Lahti (16:09:13) (in thread): > yes perfect

2025-02-19

Hassan Diab (02:48:23): > Hello! > > I am trying to agglomerate based on a specific condition (butyrate production). For that, I created new genus_sub column in rowData where the genera of the selected taxa are re-named to “buty-producing” then I agglomerate by this new column. However, when I view the rowData of the new “buty-producing” it shows the rowData of one of selected taxa (the 1st one in alphabetical order) instead of showing NAs (since the new group contains different taxonomic groups). I am wondering if this is okay or perhaps I am doing something wrong. Please find the code below. > > {r} > # Subset tse object by filtered rownames > tse_buty <- tse[, col_data_rownames] > > # define genera that are butyrate-producing > buty_genus <- c("Butyricimonas", > "Butyricicoccus_A_77030", > "Butyricicoccus_A_77419", > "Butyrivibrio_A_168226", > "Butyrivibrio_A_180067", > "Odoribacter_865974", > "Agathobacter_164117", > "Agathobacter_164119", > "Anaerobutyricum", > "Anaerostipes", > "Coprococcus_A_121497", > "Coprococcus_A_187866", > "Roseburia", > "Shuttleworthia", > "Faecalibacterium", > "Flavonifractor", > "Pseudoflavonifractor_81068", > "Oscillibacter") > > # I have one taxon at the species level that I want to include in the agglomeration > > # Change the genus name of a species Eubacterium_G ventriosum to buty-producing to include in the agglomeration > rowData(tse_buty)$Genus <- ifelse(rowData(tse_buty)$Species == "Eubacterium_G ventriosum", > "buty-producing", > rowData(tse_buty)$Genus) > > # Agglomerate by Genus > > tse_buty <- agglomerateByRank(tse_buty, rank = "Genus") > > > # Rename the genus of all defined buty-producing genera to "buty-producing" in a new Genus_sub rowData column > genus_renamed <- lapply(rowData(tse_buty)$Genus, function(x){ > if (x %in% buty_genus) {"buty-producing"} else {x} > }) > > rowData(tse_buty)$Genus_sub <- as.character(genus_renamed) > > # Agglomerate by Genus_sub (which contains "buty-producing genera") > > tse_buty <- 
agglomerateByVariable(tse_buty, by = "rows", f = "Genus_sub") > > # Transform to relabundance > > tse_buty <- transformAssay(tse_buty, assay.type = "counts", method = "relabundance") > > # Subset by prevalence and detection level > tse_buty <- subsetByPrevalentFeatures(tse_buty, assay.type = "relabundance", prevalence = 5/100, detection = 0.1/100) > > # View buty-producing rowData > rowData(tse_buty) ["buty-producing", ] >

Hassan Diab (02:49:18) (in thread): > this is the rowname for buty-producing > > Warning: 'subsetByPrevalentFeatures' is deprecated. Use DataFrame with 1 row and 9 columns > Kingdom Phylum Class Order > <character> <character> <character> <character> > buty-producing NA Firmicutes_A Clostridia_258483 Lachnospirales > Family Genus Species Confidence > <character> <character> <character> <numeric> > buty-producing Lachnospiraceae Agathobacter_164117 NA 1 > Genus_sub > <character> > buty-producing buty-producing >

Tuomas Borman (04:59:41) (in thread): > Hello! > > You’re applying agglomeration correctly. agglomerateByVariable works differently than agglomerateByRank. It does simple merging without verifying whether the resulting taxonomy table remains meaningful. In this case, the taxonomy does not align with the “butyrate-producing” group (as you already noticed, it refers to the first taxon in the group). You are right that this is misleading. We will check what to do with this issue.

Leo Lahti (19:11:29) (in thread): > I hope this is fixed rapidly, in case this is a bug.

Leo Lahti (19:15:36) (in thread): > Btw, could this be simplified: > > genus_renamed <- lapply(rowData(tse_buty)$Genus, function(x){ > if (x %in% buty_genus) {"buty-producing"} else {x} > }) > > could also be: > > genus_renamed <- rowData(tse_buty)$Genus %>% as.character > genus_renamed[genus_renamed %in% buty_genus] <- "buty-producing"

2025-02-20

Tuomas Borman (02:04:56) (in thread): > This is not a bug in its “common” meaning. We just do not have a way to combine the values from multiple taxa (the same goes for samples). > > The function has an archetype argument to control which rows to preserve: > > Metadata from the rowData or colData are retained as defined by archetype > > However, that specifies just an index for each preserved row (1 by default, i.e., the first row is kept), so it does not help much. > * How to combine taxonomy? We could replace merged values with NA. > * How to combine other character/factor values? We could replace them with NA, but it just removes information. Or we could instead store all unique values. > * How to combine numeric values? Mean/median might not make sense > > That is why the function currently just returns the values of the first member of each group. > > One option is to improve the documentation and maybe give a warning.

Tuomas Borman (12:08:12) (in thread): > rowData and colData could be merged like this but does this make sense?@Leo Lahti > > library(mia) > library(dplyr) > library(tidyr) > > merge_df <- function(df, f, tse){ > # Merge rows by group > df <- df |> > as.data.frame() |> > mutate(temporal_column = f) |> > group_by(temporal_column) |> > summarise( > across(where(is.numeric), \(x) mean(x, na.rm = TRUE)), # Average numeric columns > across(where(is.character), \(x) paste(unique(x), collapse = ", ")), # Unique character values > across(where(is.factor), \(x) paste(unique(as.character(x)), collapse = ", ")), # Handle factors similarly > across(where(is.list), \(x) list(unlist(x))) # Merge list columns > ) |> > ungroup() |> > select(-temporal_column) |> > DataFrame() > # Replace taxonomy ranks with NA > df[, colnames(df) %in% taxonomyRanks(tse)] <- NA_character_ > return(df) > } > > # Prepare data > data(GlobalPatterns) > tse <- GlobalPatterns > tse <- addAlpha(tse) > rowData(tse)[["butyrate"]] <- sample(c("butyrate", "not_butyrate"), nrow(tse), replace = TRUE) > colData(tse)[["some_random_list"]] <- split(rep(c("asd", "test"), ncol(tse)), rep(seq_len(ncol(tse)), each = 2)) > > # Merge rowData > df <- rowData(tse) > f <- df[["butyrate"]] > df <- merge_df(df, f, tse) > > # Merge colData > df <- colData(tse) > f <- df[["SampleType"]] > df <- merge_df(df, f, tse) >

2025-02-21

Leo Lahti (09:55:44) (in thread): > I suggest we discuss this next week at the office. It carries a lot of different aspects.

2025-03-31

Hassan Diab (04:23:54): > Hello, > > I want to calculate the PCA axes for a TSE object, using runPCA. I have two questions; > > 1) Is CLR-transformation recommended before runPCA? > 2) if so, does runPCA do the CLR-transformation internally, or should I runPCA on the CLR-transformed assay? > > Thanks!

Tuomas Borman (04:37:38) (in thread): > Hello, > > here is a table that summarizes the common options for ordination: https://microbiome.github.io/OMA/docs/devel/pages/beta_diversity.html > 1. PCA on CLR-transformed data corresponds to using the Aitchison distance, which is a common option -> yes, you should apply PCA to CLR-transformed data > 2. runPCA does not do any transformations; you should apply the transformation beforehand with transformAssay and specify the CLR-transformed matrix with assay.type
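
The two steps can be illustrated in base R on a plain matrix, as a sketch of the idea and not of mia's or scater's internals. The clr() helper and the pseudocount of 1 are assumptions made here to handle zeros; the count values are made up.

```r
# Toy count matrix: rows are taxa, columns are samples (values are made up)
counts <- matrix(
    c(10, 0, 5, 20,
      3, 8, 0, 15,
      7, 2, 9, 1),
    nrow = 3, byrow = TRUE,
    dimnames = list(paste0("taxon", 1:3), paste0("S", 1:4))
)

# CLR per sample: log of each value minus the mean log over that sample's taxa.
# The pseudocount of 1 is an assumption made here to avoid log(0).
clr <- function(counts, pseudocount = 1) {
    logx <- log(counts + pseudocount)
    sweep(logx, 2, colMeans(logx))
}

# Step 1: transform first
clr_mat <- clr(counts)

# Step 2: run PCA on the transformed data (prcomp expects samples in rows)
pca <- prcomp(t(clr_mat))
scores <- pca$x

# Euclidean distances between CLR-transformed samples are the Aitchison
# distances, and the full set of PCA scores preserves them
aitchison <- dist(t(clr_mat))
```

In mia, the equivalent would be to store the CLR values as a new assay with transformAssay and then point runPCA at that assay, as described in the message above.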

Leo Lahti (09:03:04) (in thread): > Yes.

2025-04-08

Leo Lahti (16:51:27): > LEfSe now in Bioconductor with SummarizedExperiment support: https://academic.oup.com/bioinformatics/article/40/12/btae707/7908399?login=false

2025-04-26

Leo Lahti (12:10:41): > The enhanced versions of robust CLR and robust Aitchison distance are now in vegan/master on GitHub, hopefully soon also on CRAN (not sure how soon they will upgrade CRAN next time): https://github.com/vegandevs/vegan/pull/667#event-17415567139

Leo Lahti (12:11:03): > This means that these are now becoming available through mia as well.