#miaverse-admins

2021-08-15

Leo Lahti (17:58:48): > @Leo Lahti has joined the channel

Sudarshan (17:59:04): > @Sudarshan has joined the channel

FelixErnst (17:59:04): > @FelixErnst has joined the channel

Tuomas Borman (17:59:04): > @Tuomas Borman has joined the channel

Leo Lahti (18:00:32): > Hi all - the four of us have occasionally discussed organization of the project in a 4-person private chat. I decided to make this a team instead. Others may join if they like (I do not seem to be able to create private channels here) but I think that’s ok most of the time. I am also wondering if we should invite Domenick, Fiona and perhaps some others here.

Leo Lahti (18:07:16): > Our current organization, github.com/microbiome, has 29 repositories at the moment, after I deleted and archived some (of my own old ones). These can be categorized as follows: > 1. Inactive legacy repositories from the time before miaverse (mostly my own stuff + forks) > 2. Active legacy repositories, in particular the phyloseq-based microbiome R package and its tutorial > 3. miaverse R packages and resources (mia, miaViz, microbiomeDataSets, docker at least) > 4. miaverse educational material (course_2021_radboud etc) > I tend to think that it would be good for the project to organize this better. I could either move the non-miaverse stuff out of this organization, or we could move miaverse stuff to a new organization. Just in case, I have already reserved the GitHub organization “miaverse” so we can use that if we like. That might be useful for building identity, and the transition could be smooth: it can be done gradually, so that we first use that as the final release point while still accepting PRs through the microbiome organization for some time. In fact we could still keep misc educational and other material in the microbiome organization, and reserve miaverse purely for proper R packages and related stuff (docker containers?). But I have no problem in sticking to the microbiome organization instead, and moving the older stuff elsewhere. Would be good to hear your thoughts on this.

Sudarshan (22:15:52): > Thanks Leo for the updates and efforts. I agree that in the long run miaverse-organisation seems appropriate. A clean slate. For microbiome pkg and related stuff it would be best not to alter as these are and will be actively used in near future ;) transition to Miaverse at this stage of dev is better than at later more advanced stage.

2021-08-16

Tuomas Borman (03:57:41): > I agree that miaverse organization for miaverse things would be nice

Leo Lahti (04:42:14): > Yes this is what I was thinking too - if we ever transfer to new organization, it is better to do asap.

Leo Lahti (04:42:34): > But let us see what @FelixErnst says.

Leo Lahti (04:45:47): > If we move miaverse material in the new miaverse organization, we can also discuss if we want to have also the different online books there (course materials like “course_2021_radboud” etc. - we may have many of these in the future). In principle it is useful to have in the same place, the minus is that the content is more scattered if we have many different types of repositories and course materials are more specific to a given course and audience whereas R packages are more generic for the whole community. Perhaps this does not matter and we can still keep all in one place at least to start with.

2021-08-20

Leo Lahti (04:36:06): > In fact once we have decided this, I could polish OMA so that the public online version contains only more polished stuff and then we can add more finalized material gradually. The current beta version is a bit too complex at the moment to deserve being browsable online. We could keep that in devel branch and focus most development efforts there; and push ready made stuff to main.

2021-08-21

Leo Lahti (09:19:11): > Hi @FelixErnst - your comment here would be valuable. I am not willing to transfer packages to github/miaverse organization before we all agree on this.

FelixErnst (10:04:14): > Puh this is difficult to put into a clear decision. I am not sure what the benefit would be. The microbiome namespace is quite good and the miaverse is less catchy. Is a clear separation necessary? I mean have a look at Bioconductor. They have repos which haven’t seen a lot of activity in the last couple of years and are legacy IMO. So my gut reaction would be to keep the microbiome namespace as the centre and maybe fork interesting repos into miaverse. This could be like an onboarding targeted at more broad microbiome packages. If you really want to clean it up, move repos only you personally work on / have worked on to your personal namespace “antagomir”

Leo Lahti (11:31:31): > Thanks a lot for your view. I tend to have rather similar thoughts.

Leo Lahti (11:33:34): > We could do it like that. But then, if we keep the material in microbiome namespace, is there real additional benefit from forking material under miaverse namespace? I have no problem keeping all under microbiome for now (less hassle) and moving some other stuff elsewhere. I was mainly thinking that miaverse namespace could help with branding. But it is possible that the added benefit is quite marginal.

Leo Lahti (11:35:17): > So I am asking if we should make a decision to fork / not fork miaverse parts under miaverse namespace while keeping microbiome namespace as the main development hub. I would lean towards not forking (at least not for now) because that easily creates more confusion than it helps. People will find both and will need to do more work to find out where they should submit the PRs in the first place.

Leo Lahti (11:38:08): > Perhaps my suggestion is then: > 1. keep the miaverse namespace reserved but take no action right now (except perhaps cross-linking) > 2. move personal / legacy stuff out of microbiome namespace > 3. some house-keeping on the microbiome namespace to improve clarity

2021-10-27

Leo Lahti (17:44:18): > To be honest, I have concluded that I am not a big fan of the current project name “miaverse” as it is somewhat unclear. But I think it was better than MicrobiomeExperiment which is also very generic. On the other hand, phyloseq is also not a very clear name but now everyone knows it anyway and refers to it fluently. So, I am not sure. I would still like to probe if there are better suggestions, I do not think it is too late to upgrade the name a bit. So far it is just the name of the slack channel and has been mentioned on some digital materials but not so many.

2021-10-28

Tuomas Borman (02:47:54): > I like miaverse, especially the part “mia”, which I think is quite clever. > > But it’s true that no one knows what this is for, if user just reads “miaverse”. Maybe the name should have “microbiome” in it? That kind of name might be little bit more “boring”, but since this is not advertised as intensive as commercial products the name should describe more I think > > “MicrobiomeAnalysis” or similar might be one suggestion that might be easily justified, because then all the packages have name derived from it. > > I think this is worth to discuss and I think that this is one of the latest points to change the name > > My opinion is that miaverse is still good name, but if there is better suggestions then we could consider that

Sudarshan (11:05:03): > One suggestion I can think of is just calling this project microbiomicsR because we hope to have omics expansion/support as next step to phyloseq and microbiome R packages

Sudarshan (11:05:29): > Or R4Microbiomics > There is a MicrobiomeAnalyst tool so MicrobiomeAnalysis will be too close to it.

Sudarshan (11:14:58): > RToolsForMicrobiomics

2021-12-28

Leo Lahti (05:06:41): > I suggest something like this for EuroBioC submissions, is anyone else interested? It will be in Heidelberg in March and would be nice to meet if you can make it. I try to come/go. > * Workshop/talk: “Advances in microbiome analysis with SummarizedExperiment: miaverse” (Leo or other maintainer; in workshop mode we could plan for more interactivity and make it a planning/feedback session) > > * Talk/Poster: Multiomics and/or visualization capabilities in miaverse (Tuomas) > > * Talk/Poster: miaSim (microbiome time series simulation) or microbiomeDataSets (Yagmur)

Leo Lahti (05:07:00): > It is low cost event and planned to be in-person.

2022-01-08

Leo Lahti (06:05:54): > It was suggested in a recent mia issue that a new package for microbiome clustering (miaClust..?) could be created, by following good practices in bluster and more generally the SCE community: https://github.com/microbiome/mia/pull/187

Leo Lahti (06:08:17): > I tend to think that this is a feasible idea. But it would also be a larger undertaking. Before proceeding, I was thinking whether we should talk more closely with Aaron Lun (the author of bluster). Felix do you have a good contact with him? I can also ask but if you know him well it might help to see if this is something to do together. Unless he would prefer extending the bluster pkg itself, which I doubt.

2022-01-11

Leo Lahti (12:56:48): > PR to vegan was like a trip back to early 2000’s. Fortunately, I could still remember some tricks with Rnw etc.

2022-02-10

Leo Lahti (10:33:28): > I think we put something here again - an update of the progress during the past year. > > Submissions for Bioc2022 Conference are now open. The deadline to submit is March 9, 2022. https://bioc2022.bioconductor.org/submissions/ - Attachment (bioc2022.bioconductor.org): Submissions

2022-02-26

Leo Lahti (05:18:22): > I have created a public email list for this project: microbiome@utu.fi

Leo Lahti (05:18:26): > Subscribe at https://lists.utu.fi/mailman/listinfo/microbiome

Leo Lahti (05:19:00): > Archives accumulate at: https://lists.utu.fi/pipermail/microbiome/

Leo Lahti (05:19:54): > I have also added this info to the https://microbiome.github.io homepage

Leo Lahti (05:21:04): > I get many email requests to my personal email now, and thought that a public email list could be a way to share some of that load among the many contributors.

Leo Lahti (05:21:24): > If everyone is happy and this seems to work, I will also announce it in the #miaverse channel.

2022-02-27

Sudarshan (05:41:31): > Thanks for creating this!

2022-03-19

Leo Lahti (13:12:20): > Fyi we will be teaching mia framework in several international courses this year, details TBC later but tentatively: Oulu/Finland 6/2022 - Nijmegen/NL 7/2022 - Pune/India 8/2022 - CSC Finnish supercomputing center 9/2022 - ML4microbiome COST training Barcelona/Spain 10/2022 + perhaps more. All this will push the OMA and mia packages fwd as well.

2022-05-20

Michal (09:06:58): > @Michal has joined the channel

2022-05-30

Leo Lahti (16:13:36): > Hi Michal - just FYI this channel is meant for the #miaverse admins, this is why we are so few here. Anyway if you like to hang around I would think it is ok for now, we do not really have secrets.

Leo Lahti (16:20:03): > Hi @FelixErnst @Sudarshan - the EuroBioc DL is postponed till June 10: https://eurobioc2022.bioconductor.org Me, and I think also @Tuomas Borman and @Yagmur Simsek who are developers, will go there in live mode, and I was thinking that a birds-of-a-feather session on the development of this framework would be a good way to attract potential new contributors, or at least to collect good feedback. Are you planning to join in Heidelberg, Sep 14-16? > > I was thinking that it could be most clear to list in the abstract those authors who will come to the meeting and run the session. We will naturally acknowledge everyone involved there on the spot. If you have other suggestions let me know. > > The development is proceeding at a constant pace. I expect more development this summer as we are working on various course materials and utilizing contributions from interns. Everyone’s experience and help in the process will be really valuable.

2022-05-31

Sudarshan (03:00:54) (in thread): > This will be during deadlines for two personal grant submissions :( I will not be able to join

Sudarshan (03:01:18) (in thread): > Okay with author suggestions you made :slightly_smiling_face:

Leo Lahti (06:13:00) (in thread): > okke!

2022-09-24

Sudarshan (03:08:39): > https://www.researchsquare.com/article/rs-1284357/latest.pdf

Sudarshan (03:09:04): > MicrobiotaProcess: A comprehensive R package for managing and analyzing microbiome and other ecological data within the tidy framework

Leo Lahti (10:53:00): > Cool, they support TreeSE!

2023-03-12

Leo Lahti (10:00:54): > Hi all - and especially @FelixErnst who discussed this with me initially. There have lately been wishes to change the OMA license from CC-BY-NC to CC-BY. > > We initially decided to have the NC clause in because it is specifically used with books (e.g. Modern Statistics for Modern Biology https://web.stanford.edu/class/bios221/book). > > On the other hand, our companion, the OSCA book, seems to be now with full CC-BY: https://bioconductor.org/books/release/OSCA I do see some risk in removing the NC clause but I also think that it is not obvious if this would be doing more good than harm. We are still the main developers with all the expertise, and if someone would take and publish this as a book without including us that might still be beneficial in other ways. > > Now the Outreachy applicants cannot contribute to OMA because Outreachy does not allow contributions to non-open resources, and NC licenses are not open (according to the OS definition, which is fair in my opinion, too). > > I would like to reopen this discussion a bit and see if we should keep the current license or could/should we switch to CC-BY.

2023-03-13

Tuomas Borman (02:45:10): > 1. I don’t have strong opinion on this. > 2. OMA is still beta version. We have second decision point after we have polished the book (hopefully this year) –> it could make more sense then to change the licence > 3. It would be weird situation if someone decides to publish this without us –> I prefer maybe to keep this as it is for now and change whole miaverse to open-source when we have published this formally

Leo Lahti (03:13:16): > Thanks for your views. I think that OMA is the only part of the framework that does not have an open license (although CC-BY-SA-NC is still semi-open). I agree that releasing it upon formal publication could be one feasible option.

Leo Lahti (04:51:15): > The challenge is that the more there are contributors the more complex it will become to ask everyone’s approval for license change (that is necessary)

Sudarshan (14:41:22): > It is better to remove NC and make it fully open source. Since we have a github history of development any unfair publication without developers will not be possible. It will save troubles at a later stage for approval from everyone. I would also emphasize that publication at this stage should also be possible like the OSCA. Works like these will constantly undergo development/ improvements and never be finalized :stuck_out_tongue_winking_eye: For me it already seems challenging with the new job to make time to contribute :sweat_smile:

Sudarshan (14:44:45): > A publication of miaverse will also help me make a case for implementing it at Danone for multi omics microbiome data science.

Leo Lahti (15:36:49) (in thread): > I am confident that we will advance in this during 2023.

2023-03-19

Leo Lahti (14:08:31): > Hey guys - I just submitted a short talk to BioC2023. The DL is today and I couldn’t decide earlier so I didn’t check with you but I assume it is ok. I am not sure about Felix’s position, so to be on the safe side I only included me, Tuomas and Sudarshan in the co-author list. We might be able to update this if there are preferences and I will nevertheless acknowledge everyone in the talk, this is just one virtual event abstract so I thought it is ok with this. It is good for the project to maintain visibility. The abstract is as follows: > > “A number of R/Bioconductor packages for microbiome data science have been released in the recent years. The majority of the existing frameworks and packages focus on the analysis of taxonomic profiling data generated by phylogenetic microarrays, 16S amplicon sequencing, or metagenome analysis. However, there is an increasing need to integrate taxonomic profiles with other measurement types, such as transcriptomics, metabolomics, host genomics, cross-kingdom analysis, and hierarchical side information of the features and samples. Modern data containers, such as the MultiAssayExperiment, have opened up new opportunities for developing a systematic approach to multi-table data integration in microbiome studies. A dedicated ecosystem of tools for microbiome analysis can provide valuable, targeted tools for those focusing on microbiome and metagenome profiling data sets. We will discuss recent advances in microbiome data integration and the associated package ecosystem in R/Bioconductor, highlighting links to related work in other areas such as transcriptomics and single-cell studies that are greatly accelerating these community-driven efforts.”

Leo Lahti (14:08:54): > We should also submit to EuroBioc. The DL is in April and the event is in September (Ghent). I am going but I am also in the org. committee. Any volunteers to lead that one for the mia etc. abstract?

2023-03-20

Tuomas Borman (04:53:33): > Yes, very good; visibility is always good (Bioc Asia might be also worth to consider) > > I’m interested on EuroBioc, i’m considering having a poster

Leo Lahti (05:14:17): > Yes indeed BioC Asia, too (@Sudarshan..?) - I think that DL is a bit later. > > For EuroBioC I would suggest to consider birds-of-a-feather, which can be useful to have dedicated community discussion on the development plans including only those who are interested in that session - they will then drop it into poster if there is too much.

2023-03-21

Sudarshan (12:40:26): > FYI https://www.sciencedirect.com/science/article/pii/S2666675823000164#mmc1 - Attachment (sciencedirect.com): MicrobiotaProcess: A comprehensive R package for deep mining microbiome > The data output from microbiome research is growing at an accelerating rate, yet mining the data quickly and efficiently remains difficult. There is s…

Sudarshan (12:40:41): - File (JPEG): Screenshot_20230321_173951_Samsung Notes.jpg

2023-04-03

Leo Lahti (06:40:35): > fyi I have submitted a mia abstract to “World of Microbiome” conference, the DL is today. I put the names of me & Tuomas in the oral abstract and I will clearly acknowledge all contributors in the talk. I hope this is ok for all.

2023-07-30

Leo Lahti (06:14:51): > Relevant. Picked up from #biocbooks

Leo Lahti (06:14:52): > https://community-bioc.slack.com/archives/CM2CUGBGB/p1689964337695179 - Attachment: Attachment > @here Sometime in the next release cycle, I would like to switch scran and scuttle to use libscran with the new tatami representations. This should give several-fold speed-ups for large datasets… > > But it is likely that all the results will change. And if I do this, the book will break (see https://www.youtube.com/watch?v=NCBUBP4Ll9I for details). The question is what everyone here wants to do about it. > > One option is to go ahead with the breaking changes. This could be a good opportunity to streamline and update the book. For example, I’ve come around to the idea that further QC on top of Cellranger’s emptyDrops-based cell filtering is unnecessary. > > The other option is to do all my changes in a new package, e.g., scran2. This will preserve the book builds but the book’s contents will be obsolete. > > I would obviously prefer option 1, but I can only do so much, and I’ll have my hands full with the packages. Is there anyone who is interested in working on the OSCA book 2nd edition? There’s probably a paper in there somewhere about how to write this kind of book (e.g., dir.expiry, rebook).

Leo Lahti (13:03:16): > My draft slides for Bioc2023. I will record presentation on Monday. If you have suggestions on things to add/fix/remove, or other feedback that’s welcome. I will still have the final look also myself on Monday. Would like to improve the narrative on the slides a bit and highlight the latest developments. But the talk is just 9 min and introduction is needed, too. - File (PDF): bioc2023.pdf

2023-08-04

Leo Lahti (04:59:55): > Some problems with updated function names in OMA: https://github.com/microbiome/OMA/issues/319 It seems this is because many people still have older pkg versions installed while OMA uses the latest function names. > > Just thinking if we should switch back to the old names in OMA in order to allow some more time for various users to gradually upgrade their packages. Hmm. Not necessarily a big change but a bit inconvenient..

Leo Lahti (05:00:22): > In the longer term OMA should be tied to Bioconductor release versions I think.

Leo Lahti (05:03:56): > I would rather not change. The downside could be that some people get bad experiences with things not working, and become discouraged using the material

Tuomas Borman (05:18:57): > Probably better to not switch back and forth –> I would rather keep the names as they are now –> let’s link OMA to Bioc release in the future > > Can we point this issue somehow? Can this issue in github be highlighted or something?

Leo Lahti (06:32:43): > I agree

Leo Lahti (06:33:34): > Hmm OMA package installation instructions at least currently also show how to install mia etc from Bioconductor but then on the other hand sometimes the book uses the latest github version of mia etc.

Leo Lahti (06:34:09): > We could bind OMA to Bioc devel version and make sure that the book only uses stuff that has gone through to Bioc devel. That source we can update on a weekly basis if need be.

Leo Lahti (06:34:23): > (but hopefully no weekly need)

2023-09-15

Leo Lahti (04:52:34): > @Leo Lahti has joined the channel

2024-02-01

Leo Lahti (18:01:01): > I was planning to submit abstract on our microbiome framework to ISME. Let me know if you have any feedback:

Leo Lahti (18:01:01): > https://docs.google.com/document/d/1cNToU38EKefgw1THBaV6xgOkW6IuROeHv8oEG1L4sgc/edit?usp=sharing

2024-07-07

Leo Lahti (04:56:22): > Join us on Tuesday, July 16th for a #tidyomics Zoom celebration and an open discussion about future plans! :star2: :date: Timezones: > > 8:30 AM US (New York, EDT) > 1:30 PM Europe (London, BST) > 10:30 PM Australia (Sydney, AEST) :link: Zoom: https://buff.ly/3xJlMC2 :closed_lock_with_key: Password: 542990

2025-01-19

Thomaz Bastiaanssen (06:38:03): > @Thomaz Bastiaanssen has joined the channel

Thomaz Bastiaanssen (17:08:10): > Hi! Regarding laying out alpha diversity indices in chapter 13, I’m currently including the figure straight from the preprint, also attached for convenience. > Of course, I’m quite happy for us to use this figure here, but do you think a table would be more helpful here? > I could make a reasonably fancy table with the gt library? https://gt.rstudio.com/ - Attachment (gt.rstudio.com): Easily Create Presentation-Ready Display Tables > Build display tables from tabular data with an easy-to-use set of functions. With its progressive approach, we can construct display tables with a cohesive set of table parts. Table values can be formatted using any of the included formatting functions. Footnotes and cell styles can be precisely added through a location targeting system. The way in which gt handles things for you means that you don’t often have to worry about the fine details. - File (PNG): fig_13_1_alphadiv.png

Thomaz Bastiaanssen (17:18:31): > Further, for chapter 14 on beta diversity, in the guidebook we use a decision tree to convey how different beta diversity indices relate. > I stand by the decision tree and I would again be delighted to use it in OMA - it suits the structure we discussed nicely. > However, I do think it is quite an opinionated figure so I’d like to run it by you. Figure attached - part β :grin: - File (PDF): fig_2_new_diversity-1.pdf

2025-01-20

Tuomas Borman (02:33:58) (in thread): > Hi! > > The table sounds good idea. That way we can also update it very easily in the future. If something can be done merely based on R, it should be preferred. Some illustration figures could also be done in R > > (I have done couple figures with chatgpt by inputting my figure made with another program. It gave ggplot code which was rather simple to modify to look as desired)

Tuomas Borman (02:36:15) (in thread): > I like the tree. For reader that do not know what to use, this is easy way to make the decision

Thomaz Bastiaanssen (02:58:09) (in thread): > Sounds good! I’ll give it a go with a table :)

2025-01-21

Muluh (02:23:05): > @Muluh has joined the channel

2025-01-22

Thomaz Bastiaanssen (09:49:47) (in thread): > Hi, please find a screenshot of the new table attached, code in my recent update to PR. - File (PNG): Screenshot_20250122_154838.png

Thomaz Bastiaanssen (09:53:08) (in thread): > I’m currently going with “Hill number” rather than coefficient - just because number seems to be more commonly used in literature - happy to change ofc

Tuomas Borman (10:12:49) (in thread): > Well, looks pretty awesome!!! > > I think “hill number” is ok, but it should be described (I am not microbial ecologists, this was first time I heard “hill number”)

Thomaz Bastiaanssen (10:14:51) (in thread): > Glad you think so! > Yes, hill numbers are covered in the text just above the table (found in PR)

2025-01-24

Thomaz Bastiaanssen (05:59:46): > Hi @Tuomas Borman, regarding rendering the equations, I think it may be a katex vs mathjax issue. I can look into it - I’m using mathjax locally it seems. > > I was wondering whether there is a reason that the _quarto.yml file says mathjax: null?

Thomaz Bastiaanssen (06:00:32): > (katex was also giving me grief initially, namely)

Tuomas Borman (06:02:41): > Hi, how can you specify to use mathjax? > > I just tested locally without building the book. So mathjax: null was not my issue. Although, we have to probably change that also before merging

Tuomas Borman (06:07:52): > Anyways, have you tested to render the whole book and check that the table renders correctly? > 1. You could first temporarily remove all other files from here to speedup rendering: https://github.com/microbiome/OMA/blob/devel/inst/assets/_book.yml > 2. If the rendering works with BiocBook::preview(BiocBook::BiocBook('.')), then all the settings should be ok

Thomaz Bastiaanssen (07:47:27): > Rendering works - just doing some final checks

Thomaz Bastiaanssen (08:08:44): > All good on my end :+1:

Tuomas Borman (10:12:36): > The PR is failing due to phyloseq installation problem, I modified so that phyloseq is installed from GH > > phyloseq has not been available long time via Bioconductor because of build problems: https://bioconductor.org/checkResults/devel/bioc-LATEST/phyloseq/nebbiolo1-buildsrc.html

Tuomas Borman (10:13:30): > I tried to fix the problem, but apparently we have to still wait for something: https://github.com/joey711/phyloseq/pull/1777 - Attachment: #1777 Fix tables and Bioconductor build > Hi! > > I noticed that the Bioc devel build fails: > > https://bioconductor.org/checkResults/devel/bioc-LATEST/phyloseq/nebbiolo1-buildsrc.html > > This is because the vignette build fails: > > > --- re-building ‘phyloseq-basics.Rmd’ using rmarkdown > YAML parse exception at line 2, column 6, > while scanning a block scalar: > did not find expected comment or line break > Error: processing vignette 'phyloseq-basics.Rmd' failed with diagnostics: > pandoc document conversion failed with error 64 > --- failed re-building ‘phyloseq-basics.Rmd’ > > > > > The reason seems to be that pandoc incorrectly interprets the “---” before and after tables as yaml metadata block. (However, I am not sure why this is affecting only devel version as the files seems to be the same; maybe some changes in rendering engines…) > > I was able to reproduce the error locally, and it was fixed by adding line breaks between the table and “---”. The content seems to be showing as intended. > > -Tuomas

Tuomas Borman (10:14:09): > I guess I saw that problem in December first time

Thomaz Bastiaanssen (11:23:30): > Ok - makes sense! > Next, I’d like to prepare a PR for two things: > * Apply same structure to other chapters where it makes sense > * For chapter 14, beta diversity, lay out the different metrics and integrate the decision tree.

Tuomas Borman (11:40:35): > Sounds good!

Tuomas Borman (11:43:28): > My fix did not fix the problem as we have some packages that still try to install phyloseq from Bioconductor and fail… > > I will try to think of a solution for this (maybe we can merge it anyways, but this failing means that the changes are not visible in the book itself, yet) > > Error: Failed to install 'OMA' from local: > 2025-01-24T16:21:05.1506268Z Failed to install 'NetCoMi' from GitHub: > 2025-01-24T16:21:05.1507096Z ! System command 'R' failed > 2025-01-24T16:21:05.1507624Z Execution halted >

Thomaz Bastiaanssen (11:53:58) (in thread): > Ok, makes sense - just thinking out loud - is it perhaps feasible to define phyloseq version in DESCRIPTION to ensure github version?

Tuomas Borman (12:00:53) (in thread): > The phyloseq installation in OMA is already defined so that it fetches the github version. The problem is caused because NetCoMi is trying to install it from Bioconductor > > –> If we put the phyloseq before NetCoMi package in the list, it might be that NetCoMi will not try to install phyloseq again from Bioconductor

Thomaz Bastiaanssen (17:40:15) (in thread): > Good call!

2025-02-04

Tuomas Borman (12:56:01): > Hello @Aura Raulo. Just to make sure that there is no overlapping work, Aura was planning to go through chapters 6-12. @Thomaz Bastiaanssen you were not currently working with these chapters, right?

Thomaz Bastiaanssen (13:25:15): > Hi, yes, that’s correct! No issues on my end

Leo Lahti (15:53:55): > Way to go

2025-02-06

USLACKBOT (09:58:57): > Knowles Lab has joined this channel by invitation from community-bioc.

UH3H1899P (09:58:57): > @UH3H1899P has joined the channel

2025-02-07

UH3H1899P (10:11:41): > Hello! I have a quick and probably a bit nitpicky question about the “getTaxonomyLabels” example code in OMA chapter 6: > > phylum <- is.na(rowData(tse)$Phylum) & vapply(data.frame(apply(rowData(tse)[, taxonomyRanks(tse)[3:7]], 1L, is.na)), all, logical(1)) > getTaxonomyLabels(tse[phylum,]) > > Would there perhaps be a shorter and more elegant way of coding the list of phyla in this tse object? No worries if not, this just seemed a bit extensive for the simple example…

Tuomas Borman (10:20:29): > Yes, that is not a good example. The idea is to show how with.rank works. That example fails to show that because it is too complex > > How about this? > > # Select those features that have Species level info > rank_found <- !taxonomyRankEmpty(tse, rank = "Species") > getTaxonomyLabels(tse[rank_found,]) |> head() > getTaxonomyLabels(tse[rank_found,], with.rank = TRUE) |> head() >

UH3H1899P (10:35:06): > yeah much clearer thanks! I’ll add this to my edits on the chapter

Leo Lahti (16:07:47): > Yep, it’s a good catch. There are some too complex examples currently that do not make sense to have really.

UH3H1899P (16:30:53): > Ok I got through some chapters and suggested some edits, but I’d like to continue a bit next week if that’s ok (still chapters 6-12). I’ll then make a pull request of these changes together. this ok?

2025-02-08

Tuomas Borman (02:31:20): > Sounds good!

2025-02-12

UH3H1899P (05:26:05): > Hi all, could someone help explain the splitOn() function to me in a bit more detail? What does it mean to “split data”?

UH3H1899P (09:35:01): > Heya, I now created a pull request on edits I made to chapters 6-8 to enhance clarity for biologist users. > > Note that these commits also include some more elaborate edits I made to the text bits (not the code bits) of chapter 18 (Networks_learning) before Christmas, but these should not be merged into the main before we have discussed further with the original authors of this chapter 18, as they previously said they “do not approve any changes to this chapter”.

UH3H1899P (09:35:37): > I will continue this editing work in a couple of weeks from now and move on to chapters 9-12. I’ll message you to ask before I start that

UH3H1899P (09:50:08): > Moreover, in addition to the changes I made to the book directly, I have gathered some points, questions and suggestions on these chapters in this separate document. > > At the end of this document I have also included a description of what I think is not great about the existing chapter 18, and a record of my email exchange with the original author of this chapter, Christian Müller. - File (Word Document): OMAnotes 2.docx

2025-02-13

Tuomas Borman (02:26:25) (in thread): > Hi, > > if you have data in a single TreeSE and the data includes some kind of grouping, you can use splitOn. > > For instance, if the data includes samples from multiple cohorts, you can divide these cohorts into separate TreeSE objects. The output is a list of TreeSEs; each TreeSE contains samples from a single cohort. > > Moreover, this can also be done row-wise. You might want e.g. to divide bacteria and viruses into separate objects. > > Does this clarify? I can also give a code example

Tuomas Borman (02:30:08) (in thread): > I can go through these today. > > One option could be that we move these network chapters to “Extra material” as we did with some material from Himel Mallick’s group. –> We can keep the material but create our own chapters on networks (It is a bit of a tricky situation) > > The problem is also that they are using packages outside Bioconductor and CRAN. We should not use them

Tuomas Borman (05:43:46): > The suggested changes in the PR (and related comment with “–> is this ok?” in the word file) are ok. > > I will paste your comments here as it might be good to discuss them

Tuomas Borman (05:44:18): > Chapter 6: > * “In this chapter, we will refer to the co-abundant groups as CAGs, which are clusters of taxa that co-vary across samples.” > * –> This comes a bit out of nowhere. If it refers to some dimensionality reduction process, this should be introduced first. And perhaps CAGs can be defined when they become relevant in the example, rather than up front without context

Tuomas Borman (05:44:30) (in thread): > This is probably something that is just left over from previous versions. I think this can be removed.

Tuomas Borman (05:45:40): > Chapter 6: > * “The dada2 package [@Callahan2016dada2] implements the assignTaxonomy() function, which takes as an input the ASV sequences associated with each row of data and a training dataset” > * –> What is meant by “row of data” here? Does it refer to a sample or a demultiplexed sequence variant? I recommend adding this information, as the row/column structure of data varies between applications and does not tell the reader what is meant here.

Tuomas Borman (05:45:58) (in thread): > It refers to a demultiplexed sequence variant; the output of DADA includes those unique ASVs, and we can then assign taxonomy to them. > > However, I think we can consider modifying this “Assigning taxonomic information”. I feel that it is currently unclear what the message of this subchapter is. > > The “Taxonomic information” chapter currently discusses handling the taxonomy information. This “Assigning taxonomic information” subchapter refers to 16S tools, which should be run prior to mia. Moreover, 16S is just one way to do it; metagenomics is not discussed here at all. > > Suggestions: > > 1. Add a short subchapter that describes the upstream bioinformatics pipelines. We should keep the information very limited and just direct the user to additional resources (maybe this: https://www.metagenomics.wiki/) as the focus of OMA is not upstream tools. > 2. mia includes a converter for DECIPHER::IdTaxa results. We can add that info to the chapter that discusses converters. - Attachment (metagenomics.wiki): Metagenomics > Exploring microbial communities in the human gut, on plant roots, or in soil and water samples. > → Metagenomics and the microbiome > > → Metagenomics Tools > Sequence data analysis > → BLAST → Bowtie2 → SAMtools → Assembly → Shotgun sequencing → 16S tools → Ubuntu / Linux server > > → Taxonomy

Tuomas Borman (05:46:42): > Chapter 6: > * “Since the rowData can contain other data, taxonomyRanks() will return the columns that mia assumes to contain the taxonomic information” > * –> Clarify what you mean by “other data” here. Say for example “Since the rowData can contain both taxonomy-related and other data, use taxonomyRanks() to return columns of rowData that mia assumes to hold the taxonomic information.” - Attachment (rdrr.io): Find an R package > Find an R package according to flexible criteria

Tuomas Borman (05:46:49) (in thread): > I agree. Gram-positive and negative can be an example of “other data” that most people know

Tuomas Borman (05:47:12): > Chapter 6: > * “getTaxonomyLabels() is a multi-purpose function, which turns taxonomic information into a character vector of length(x)…By default, this will use the lowest non-empty information to construct a string with the following scheme level:value. If all levels are the same, this part is omitted, but can be added by setting with.rank = TRUE.” > * –> This part is a bit unclear and it’s not entirely clear here in what order this character vector is. What does “lowest non-empty information” mean? Is there a more down-to-earth way of saying this, like “it will unravel the taxonomy information into a vector in the order of xxx (row by row? Column by column?), made unique.”

Tuomas Borman (05:47:29) (in thread): > * It returns a vector in the same order as the original rows, i.e., for each row it returns a new label. > * Is this better: “The function extracts the most specific available taxonomic information for each row and ensures that the resulting labels are unique.”

Tuomas Borman (05:48:34): > Chapter 6: > * “To apply the loop resolving function resolveLoop() from the TreeSummarizedExperiment package [@R_TreeSummarizedExperiment] within getTaxonomyLabels(), set resolve.loops = TRUE.” > * –> Clarify what this is or make this into an anecdote.

Tuomas Borman (05:48:41) (in thread): > Loops can occur when the same taxon is associated with multiple conflicting parent taxa. This function ensures these taxa are treated as unique.

Tuomas Borman (05:49:19): > Chapter 6: > * “Although the linkages between rows and tree nodes remain correct, the tree retains its original, complex structure.” > * –> Not sure I understand what this means. Is it that the tree tips are not equal to the taxa/features in the row data anymore, but somehow the linkage between them still remains correct? Do you mean different things by “nodes” and “tips”? Can you clarify?

Tuomas Borman (05:49:37) (in thread): > When subsetting, the phylogeny is kept unchanged. This means that the tree includes the original set of tips instead of just those that are included in the subsetted dataset. > > To drop those tips that are no longer included in the dataset, we can prune the tree.

Tuomas Borman (05:50:03): > Chapter 6: > * “A hierarchy tree shows mapping between the taxonomic levels in taxonomic rank table (included in rowData), rather than the detailed phylogenetic relations. Usually, a phylogenetic tree refers to latter which is why we call here the generated tree as “hierarchy tree”.” > * –> A bit more clarity could be good here on how the hierarchy tree differs from a normal phylogenetic tree. What do you mean by “detailed phylogenetic relations” and why are these different from the inferred taxonomic relations?

Tuomas Borman (05:50:16) (in thread): > * Edges in a “real phylogeny” reflect the genetic distance between species. In a hierarchy tree, the edges all have equal length, i.e., there are no “detailed phylogenetic relations”

Tuomas Borman (05:50:33): > Chapter 7: > * What does “split data” really mean? Could be good to explain and give an example

Tuomas Borman (05:50:52) (in thread): > > > data("Tengeler2020") > > tse <- Tengeler2020 > > tse_list <- splitOn(tse, "cohort", by = "samples") > > tse_list > List of length 3 > names(3): Cohort_1 Cohort_2 Cohort_3 >

Tuomas Borman (05:51:50): > Chapter 7: > * “Similar steps can also be applied to rowData. If you have an assay whose rows and columns align with the existing ones, you can add the assay easily to the TreeSE object.” > * –> Here, I think the example does not necessarily follow logically from the first sentence. After the example of how colData can be modified by adding/changing the sample variables, if we say “similar steps can be applied to rowData”, that sounds like we are going to introduce a similar example of modifying/adding variables in the rowData. However, the following example is about the assay. I suggest changing the first sentence to either talk about how rowData columns can or cannot be modified (and then move on to talk about modifications of assays) OR talk only about how we can also modify the assays associated with the rowData.

Tuomas Borman (05:52:04) (in thread): > The idea is to say that rowData() is similar to colData(). They are both DataFrame objects that can be modified with similar methods. Would be good to clarify this.

Tuomas Borman (05:52:38): > Chapter 8: > > “However, subsetting by feature implies a few more obstacles, such as the presence of NA elements and the possible need for agglomeration.” > > –> These examples in this sentence are not informative as the reader does not necessarily know what “NA elements” or “agglomeration” mean at this point. I suggest clarifying what these mean here.

Tuomas Borman (05:52:57) (in thread): > We could add a link to the agglomeration chapter and refer to NAs as missing taxonomy information. Then we could show how to check for missing taxonomy with taxonomyRankEmpty(tse, rank = "Phylum")

Leo Lahti (10:31:00) (in thread): > We do have functions to group features based on similarity/correlation and this can be useful for dimension reduction (in metabolomics they routinely do that, in microbiome research it has been referred to e.g. as “amalgamation” but is less commonly done). It can be left out but it is a potentially useful feature for crude analyses.

Leo Lahti (10:33:22) (in thread): > We can primarily aim to cite external papers or online resources for more info on these?

Leo Lahti (10:34:06) (in thread): > I trust you can see what’s a good balance, but let’s try to avoid writing stuff that merely replicates other sources and stuff that will be a challenge to maintain over time.

Leo Lahti (10:34:54) (in thread): > Also e.g. “butyrate producer” is a category we have actually used in some published research

Leo Lahti (10:35:20) (in thread): > also mappings of taxa to pathways / functions could constitute such info

Leo Lahti (10:35:52) (in thread): > Could be made shorter and simpler, demonstrating the point just by a code example?

Leo Lahti (10:36:14) (in thread): > Seeing the input and output will sometimes say more than verbal explanations

Leo Lahti (10:38:33) (in thread): > Yes, so in this case it splits the original tse data object into three distinct objects

Leo Lahti (10:38:49) (in thread): > in this case by samples (splitting by features is also possible)

Tuomas Borman (10:50:27) (in thread): > We could keep this information, but I think it is currently in the wrong place. This sounds like some kind of agglomeration and if it is, it could fit in that chapter > > tse <- addCluster(tse, name = "group", ...) > tse <- agglomerateByVariable(tse, by = "rows", "group") >

Tuomas Borman (10:53:06) (in thread): > I agree. However, there have been questions on “where to get the data”, so it might be good to explain the mia framework’s position in a larger context. This should not take more than 1-2 paragraphs

Leo Lahti (12:53:31) (in thread): > yes fine

2025-02-14

Ben Valderrama (07:43:54): > @Ben Valderrama has joined the channel

2025-02-17

UH3H1899P (04:17:46): > Thanks a lot Tuomas! I’m on a little holiday this week, but will get back to these asap next week!

UH3H1899P (04:20:25) (in thread): > sounds good. But even if they were extra material, we should be able to modify the material a bit to ensure people don’t use those methods with wrong interpretations.

2025-02-25

Tuomas Borman (07:34:56): > Hi @Thomaz Bastiaanssen Are you working on the DAA chapter? I talked with Juho Pelto and he would be interested in contributing to the DAA chapter in the upcoming weeks. He wrote this nice paper with others: https://arxiv.org/abs/2404.02691 - Attachment (arXiv.org): Elementary methods provide more replicable results in microbial differential abundance analysis > Differential abundance analysis is a key component of microbiome studies. While dozens of methods for it exist, currently, there is no consensus on the preferred methods. Correctness of results in differential abundance analysis is an ambiguous concept that cannot be evaluated without employing simulated data, but we argue that consistency of results across datasets should be considered as an essential quality of a well-performing method. > We compared the performance of 14 differential abundance analysis methods employing datasets from 54 taxonomic profiling studies based on 16S rRNA gene or shotgun sequencing. For each method, we examined how the results replicated between random partitions of each dataset and between datasets from independent studies. While certain methods showed good consistency, some widely used methods were observed to produce a substantial number of conflicting findings. Overall, the highest consistency without unnecessary reduction in sensitivity was attained by analyzing relative abundances with a non-parametric method (Wilcoxon test or ordinal regression model) or linear regression (MaAsLin2). Comparable performance was also attained by analyzing presence/absence of taxa with logistic regression.

Thomaz Bastiaanssen (07:56:30) (in thread): > Hi Tuomas! No, I’m not working on the DAA chapter. Please go ahead! > My next focus for OMA will be the Beta diversity chapter.

Tuomas Borman (08:08:06) (in thread): > Cool!

Ben Valderrama (15:44:39): > Hello @Tuomas Borman, I was locally rendering the book before sending the PR for my contribution to the supervised ML chapter. However, I found this error when rendering the mediation.qmd file (added 2 days ago in the last commit): > > [ 5/25] pages\mediation.qmd > > processing file: mediation.qmd > |............................ | 53% [mediation3] Error: > ! The arguments mediator, assay.type and dimred are mutually exclusive, but 2 were provided. > Backtrace: > x > 1. +-mia::getMediation(...) > 2. \-mia::getMediation(...) > 3. \-mia (local) .local(x, ...) > > Quitting from lines 128-146 [mediation3] (mediation.qmd) > > It seems to be an error caused by the code in the chunk ‘mediation3’. Not sure if others have experienced the same, so any help will be appreciated.

2025-02-26

Tuomas Borman (02:10:50) (in thread): > Hello! > > You are 100% correct with that observation; however, it was caused by a bug in mia that is already fixed > > Can you check > devtools::install_github("microbiome/mia")

Ben Valderrama (03:59:15) (in thread): > Thanks for the response! After following your suggestion, I got the following error: > > Preparing to preview > [ 1/21] pages\mediation.qmd > > > processing file: mediation.qmd > |............................ | 53% [mediation3] Error in `plotMediation()` > > Which was solved with: devtools::install_github("microbiome/miaViz"). Now mediation is finished and it’s working on the other chapters. Thanks :wink:

Ben Valderrama (04:09:51): > On an unrelated note, when I render the book I get > > x Some dependencies found in book pages are not listed in DESCRIPTION: > * Maaslin2 > i Consider adding these dependencies to DESCRIPTION > > And indeed maaslin2 is still used in 2 other chapters: msea.qmd and mmuphin_meta_analysis.qmd. However, maaslin2 is still installed as part of the install_packages.R script, so if people use that to set up their packages it shouldn’t be a problem. Should I add that in the DESCRIPTION and send a pull request, or are you maybe thinking of updating the other 2 chapters to maaslin3?

Leo Lahti (04:18:36): > I think we should just remove maaslin2

Tuomas Borman (04:18:53) (in thread): > Good catch, can you add maaslin2 to DESCRIPTION file? It was removed from there, since I did not notice that it was still used in those files

2025-03-11

Tuomas Borman (08:58:21) (in thread): > Just saw this message… > > The problem is that those extra chapters written by Himel’s group are utilizing Maaslin2. The output of Maaslin3 is different from that of Maaslin2, and the results also differ. > > Ideally we could just use Maaslin3, but those chapters also have other problems, which is why they are now in “extra chapters” (they are not really utilizing mia tools)

2025-03-12

Leo Lahti (11:13:39) (in thread): > Or we could ask them to update it? Himel is also an author in Maaslin3

Leo Lahti (11:13:53) (in thread): > Maaslin3 also has SE support.

Leo Lahti (11:14:21) (in thread): > If they want to be part of the paper I think it could be a reasonable request. And they can still choose.

Tuomas Borman (11:28:39) (in thread): > That would be the best solution. But those chapters should also be updated in general, as they are not using mia

2025-03-13

Leo Lahti (05:42:01) (in thread): > Right

Leo Lahti (05:42:09) (in thread): > I should write there.

Leo Lahti (05:42:20) (in thread): > Perhaps wait until the draft is ready