#tech-advisory-board

2019-07-15

Martin Morgan (14:39:10): > @Martin Morgan has joined the channel

Martin Morgan (14:39:11): > set the channel description: Discuss topics and activities of the Bioconductor technical advisory board

Lori Shepherd (14:43:59): > @Lori Shepherd has joined the channel

Kayla Interdonato (14:44:05): > @Kayla Interdonato has joined the channel

Laurent Gatto (14:44:14): > @Laurent Gatto has joined the channel

Lorena Pantano (14:46:14): > @Lorena Pantano has joined the channel

Nitesh Turaga (14:47:52): > @Nitesh Turaga has joined the channel

Johannes Rainer (14:54:49): > @Johannes Rainer has joined the channel

James Taylor (15:52:33): > @James Taylor has joined the channel

Kevin Rue-Albrecht (16:30:59): > @Kevin Rue-Albrecht has joined the channel

Stephanie Hicks (20:07:45): > @Stephanie Hicks has joined the channel

2019-07-16

Peter Hickey (00:23:04): > @Peter Hickey has joined the channel

Robert Castelo (01:57:18): > @Robert Castelo has joined the channel

Mike Smith (04:31:59): > @Mike Smith has joined the channel

Almut (05:52:04): > @Almut has joined the channel

Lluís Revilla (07:29:48): > @Lluís Revilla has joined the channel

Charlotte Soneson (09:43:04): > @Charlotte Soneson has joined the channel

2019-07-17

hcorrada (10:43:36): > @hcorrada has joined the channel

2019-07-20

Leo Lahti (09:24:17): > @Leo Lahti has joined the channel

2019-07-25

Levi Waldron (12:07:17): > @Levi Waldron has joined the channel

2019-07-28

Martin Morgan (13:04:40): > Nominations for the technical advisory board https://forms.gle/zTZLZJQrHL4ZGEGq9 will close at midnight (ok, 11:59) Eastern time, July 31 - Attachment (Google Docs): Nomination Form: Bioconductor Technical Advisory Board (TAB) > Complete this form to nominate a candidate for a three-year appointment to the Bioconductor Technical Advisory Board (TAB). Self-nomination and nomination of others are acceptable; if nominating someone else please check with them first that they intend to accept the nomination. The composition of the TAB is determined through a nomination and selection process defined in the Bioconductor TAB governance document (https://bioconductor.org/about/technical-advisory-board/TAB-Governance.pdf). The TAB should: - constitute a broad representation of the Bioconductor scientific community - include emerging and established researchers in biological, statistical, and computational domains - represent diversity of gender, race/ethnicity, geography, and other aspects of the Bioconductor community Participation in the TAB obligates the member to regular participation in a monthly TAB teleconference and to follow the Bioconductor Governance Document. The TAB helps determine the overall direction of the project, providing guidance to developers and investigators on new concepts in computational biology and genome research, and acts as an incubation group for the collaborative pursuit of funding for the project.

2019-08-01

Matt Ritchie (12:11:28): > @Matt Ritchie has joined the channel

2019-08-14

Aedin Culhane (14:42:38): > @Aedin Culhane has joined the channel

2019-10-27

Sean Davis (11:02:00): > @Sean Davis has joined the channel

2019-10-29

Kevin Rue-Albrecht (17:35:11): > @Kevin Rue-Albrecht has left the channel

2020-01-08

Lluís Revilla (10:39:58): > Was there a meeting in November? I can’t find the notes from November :sweat_smile:

Martin Morgan (11:10:01): > @Lluís Revilla sorry, these should appear in the usual place https://bioconductor.org/about/technical-advisory-board/ in the next hour or so… @Levi Waldron we need to arrange approval of the December minutes even though we did not have a January meeting (scheduling / holiday conflicts)

Lluís Revilla (11:13:26): > Thanks! Looking forward to what the board has decided on some topics like the Community Advisory Board

Martin Morgan (11:16:38): > @Matt Ritchie is taking a leadership role in forming the Community Advisory Board; the overall structure and governance will be similar to the Technical Advisory Board, but the membership and of course responsibilities will have a different emphasis.

2020-03-09

Lluís Revilla (07:21:51): > Were the notes of the February meeting approved at the latest March meeting?

hcorrada (08:30:17): > @hcorrada has left the channel

Martin Morgan (11:58:43) (in thread): > @Lluís Revilla posted now; thanks for keeping us honest!

Lluís Revilla (12:33:47) (in thread): > :sweat_smile:

2020-03-13

Sean Davis (13:36:59): > The @rOpenSci #rstats package dev guide, including sections on development, software review, and maintenance. devguide.ropensci.org Thought it was worth sharing.

2020-03-14

Leo Lahti (16:56:52): > Yep it’s good.

2020-03-16

Lluís Revilla (07:57:37): > BTW Maëlle Salmon from rOpenSci recently asked for a viewable interface of the git.bioconductor.org repository

Lluís Revilla (07:57:55): > there was also an open issue about how to handle Bioconductor dependencies for CRAN packages (https://github.com/ropensci/dev_guide/issues/210)

2020-05-10

Sangram Keshari Sahu (09:30:44): > @Sangram Keshari Sahu has joined the channel

2020-05-25

Kevin Blighe (18:47:07): > @Kevin Blighe has joined the channel

2020-05-26

Robert Ivánek (03:54:49): > @Robert Ivánek has joined the channel

2020-06-06

Olagunju Abdulrahman (19:58:03): > @Olagunju Abdulrahman has joined the channel

2020-07-24

Michael Love (19:33:26): > @Michael Love has joined the channel

2020-08-24

Sean Davis (10:07:50): > An interesting approach to code analysis across a python data ecosystem: https://github.com/data-apis/python-record-api

2020-09-04

Levi Waldron (08:18:59): > There was discussion in the TAB meeting yesterday about how to modularize chapters in a book or sessions in a course, and @Aaron Lun said the main problem with completely separating the chapters of a book into separate repos / projects was cross-referencing. Would something like this work when combining independently-built .md files? From https://github.com/lierdakil/pandoc-crossref/issues/97: > > pandoc -F pandoc-crossref -F pandoc-citeproc ~/manuscript/*.md -o test.html > > File 1 might look something like this: > > a reference to a figure in file 2 [@fig:figure2] > > ![first figure](/full/path/figure-1.png){#fig:figure1} > > And file 2 like this … > > ![Another figure](/full/path/figure-2.png){#fig:figure2} > > a reference to a figure in file 1 [@fig:figure1] > > Output would be a single document, something like this … > > a reference to a figure in file 2 Fig. 2 > > a reference to a figure in file 1 Fig. 1 >

2020-09-30

RGentleman (13:56:22): > @RGentleman has joined the channel

2020-10-05

Sean Davis (09:07:14): > <!channel>, consider tweeting/retweeting or otherwise sharing the project manager/software developer/manager position at Harvard that will support Bioconductor: https://twitter.com/seandavis12/status/1313084907707600897 https://partners.taleo.net/careersection/bwh/jobdetail.ftl?job=3133887&tz=GMT-04%3A00&tzname=America%2FNew_York - Attachment (twitter): Attachment > #Bioconductor has created a new project management/software dev/manager position and is looking for candidates. > https://partners.taleo.net/careersection/bwh/jobdetail.ftl?job=3133887&tz=GMT-04%3A00&tzname=America%2FNew_York > Join an influential OSS project, interact with great scientists, and help manage a distributed, capable team. > #Jobs #bioinformatics https://pbs.twimg.com/media/EjkDQkDXkAIX0NQ.png - Attachment (partners.taleo.net): IS Project Manager/ Day/ 40 Hrs/ Channing Division of Network Medicine > Click the link provided to see the complete job description.

Sean Davis (09:09:58): > Anyone interested in participating in a Bioconductor Blog if we set one up?

Nitesh Turaga (09:10:25) (in thread): > I am. What is the scope of the blog? (or is it TBD or very flexible?)

Sean Davis (09:14:35) (in thread): > I’d not scope it beyond saying that it should abide by the code of conduct, posts should interest some portion of the bioconductor community, and should be objective when dealing with products, services, or software. Authorship would be a voluntary activity.

Nitesh Turaga (09:15:08) (in thread): > Sounds good. I’d be interested.

Sean Davis (09:18:16) (in thread): > To be specific, each post will be an Rmarkdown (or markdown) document submitted as a pull request to a github repo that can include bibliography, math, htmlwidgets along with normal rmarkdown stuff.

Lluís Revilla (11:00:11): > https://community-bioc.slack.com/archives/CLF37V6C8/p1601903234001800 Perhaps tweeting from Bioconductor’s twitter account will help… - Attachment: Attachment > <!channel>, consider tweeting/retweeting or otherwise sharing the project manager/software developer/manager position at Harvard that will support Bioconductor: > https://twitter.com/seandavis12/status/1313084907707600897 > https://partners.taleo.net/careersection/bwh/jobdetail.ftl?job=3133887&tz=GMT-04%3A00&tzname=America%2FNew_York

2020-10-09

Lluís Revilla (14:51:02): > I’ve seen in the last meeting minutes of 2020-09-03 that the review process will change. If you need data I can provide the dataset used for the blog post mentioned. In case it helps, I also plan to analyze the rOpenSci review process similarly.

2020-10-11

Kozo Nishida (21:43:01): > @Kozo Nishida has joined the channel

2020-10-17

Kevin Blighe (08:25:53): > @Kevin Blighe has joined the channel

2020-10-28

Lluís Revilla (11:48:43): > I finally made public the post reviewing the rOpenSci submissions: https://llrs.dev/2020/09/ropensci-submissions/. I might do a second post analyzing in more detail the role of the reviewers and second submissions - Attachment (B101nfo): rOpenSci submissions | B101nfo > Comparing rOpenSci review process to the Bioconductor review process. Most important differences are external reviewers and build on external machines as well as a longer review time.

Martin Morgan (15:07:46) (in thread): > yeah that looks really great, and it’s interesting to compare the processes – in many ways they have the same structure: submit / ‘pre-review & assign reviewer’ / review / response / accept. > > As mentioned in the rOpenSci channel, the 2 community-based reviewers can invest quite a bit of time (in the random package I looked at, https://github.com/ropensci/software-review/issues/330, the reviewers were investing on the order of a work day in the review) which maybe our current system doesn’t allow, so the comments are perhaps more extensive and also with more ‘domain’ expertise, but qualitatively the two processes seemed to be addressing similar concerns. > > It would be interesting to know whether there were more completely different approaches, e.g., like the F1000 model where the manuscript is available as soon as technical limitations are addressed, but the reviews & status accumulate (and can be renewed) over time – software isn’t static, and utility of a package yesterday does not mean that it will be useful tomorrow.

2020-10-29

Lluís Revilla (05:25:16) (in thread): > Yes, overall the difference isn’t big, only a bit longer. Yes, surprisingly they do not aim for an in-depth review but for an ongoing conversation about the package. That’s why I might do a second post looking at these interactions. > One idea I had to speed up the process and get reviewers is to use BiocViews to ask maintainers with similar BiocViews to review submitted packages similar to theirs. This has the side benefit that it might increase knowledge transfer and intercompatibility. > Not sure if I could get access to F1000 data. Looking at the xml version of the articles, there doesn’t seem to be an easy correspondence between the review and the version of the article. > By January I might start to work on a post on CRAN reviews (I am waiting to accumulate more data), which is closer to the F1000 model.

2020-11-19

Kevin Blighe (08:28:31): > @Kevin Blighe has joined the channel

2020-12-12

Huipeng Li (00:38:02): > @Huipeng Li has joined the channel

2021-03-03

Aedin Culhane (00:40:25): > rOpenSci described their new build system https://r-universe.dev/builds/

2021-03-05

Michael Love (10:54:49): > I’m thinking of creating a #AnnotationHub channel for further discussion going off of yesterday’s call, i don’t think something like this already exists?

Lori Shepherd (10:58:19): > There is a #biochubs channel but I don’t know what the discussion was about

Michael Love (11:10:10): > got it, well might as well keep using that

2021-03-20

watanabe_st (01:58:44): > @watanabe_st has joined the channel

2021-04-02

Nitesh Turaga (12:13:59): > @Vince Carey

Vince Carey (12:14:02): > @Vince Carey has joined the channel

2021-04-06

Aedin Culhane (12:43:15): > Maybe of interest https://paperswithcode.com/datasets - Attachment (paperswithcode.com): Papers with Code - Machine Learning Datasets > 3395 datasets • 43718 papers with code.

2021-05-04

Marcel Ramos Pérez (13:38:40): > @Marcel Ramos Pérez has joined the channel

2021-05-07

Davide Risso (06:31:17): > @Davide Risso has joined the channel

Arjun Krishnan (08:50:04): > @Arjun Krishnan has joined the channel

Hervé Pagès (17:29:34): > @Hervé Pagès has joined the channel

2021-05-11

Megha Lal (16:46:04): > @Megha Lal has joined the channel

2021-05-13

Aedin Culhane (11:22:57): > @Mike Smith raised this point - I thought it’d be nice to create an overview of BioC classes, but this suggests there are over 1500 classes exported across all packages https://code.bioconductor.org/search/search?q=exportClass+file%3ANAMESPACE How is that possible!? - Attachment (code.bioconductor.org): Bioconductor Code: Search > Search source code across all Bioconductor packages

Mike Smith (11:47:47) (in thread): > I might have phrased that slightly differently had it not been in a Google doc comment! > > It was mostly meant as a “do my search results really indicate that?” and, if so, an expression of surprise that despite the strong efforts to reuse classes there’s an average of 0.75 class definitions per software package.

2021-07-14

Michael Love (04:05:00): > I did a few tweaks to the intro to TAB slides, looking good

2021-07-19

Shila Ghazanfar (06:01:31): > @Shila Ghazanfar has joined the channel

Shila Ghazanfar (06:32:56): > hi all! i added some comments to the “Meet the TAB” slides, thanks for putting them together. One thing I wasn’t sure about was whether the slides are available to everyone afterwards? if so then I think it can be more link-heavy, otherwise i think it’s quite comprehensive!

2021-07-26

Charlotte Soneson (02:35:57) (in thread): > Thanks Shila! Yes, I think they should be public afterwards. I’ll look into adding some more links!

2021-08-05

Michael Love (11:37:41): > we can have 9 people on stage + 1 screen share at the TAB session

Michael Love (11:38:01): > @Vince Careyi’ve invited you on stage if you are near your computer

Michael Love (11:41:07): > i’m happy to moderate the Q&A in this session

Michael Love (11:46:52): > ok we have 9 now, so i can rotate people on by kicking someone off

Wes W (11:48:30): > @Wes W has joined the channel

hcorrada (11:49:23): > @hcorrada has joined the channel

Michael Love (11:52:28): > if someone from audience wants to say something just raise your hand and i’ll rotate you on

Michael Love (12:14:14): > i can bring one more person on now — going for Robert G

Aedin Culhane (12:18:36): > The 10 limit per session in Airmeet is a challenge.

Aedin Culhane (12:18:53): > Well done everyone. Should we post links that we mentioned during Bioc2021 here?

Shila Ghazanfar (12:24:16) (in thread): > that’d be great Aedin, we could then pin it to the channel:slightly_smiling_face:

Shila Ghazanfar (12:41:49): > 2018 workshop on maintaining your bioconductor package https://bioconductor.github.io/BiocWorkshops/maintaining-your-bioconductor-package.html shared by @Nitesh Turaga which I think you mentioned may be worth updating? - Attachment (bioconductor.github.io): The Bioconductor 2018 Workshop Compilation > This book contains all the workshops presented at the Bioconductor 2018 Conference

Mikhail Dozmorov (12:48:08): > @Mikhail Dozmorov has joined the channel

Sonali (13:12:16): > @Sonali has joined the channel

2021-08-06

Spencer Nystrom (15:24:31): > @Spencer Nystrom has joined the channel

2021-08-09

Spencer Nystrom (09:28:20): > Hey TAB folks, I have a few questions RE: working groups and resources, etc. > > Ben Tremblay and I have been discussing for some time now doing some work to boost the motif analysis capabilities and training materials available in Bioconductor. One idea we’ve been tossing back and forth is maybe building an “Orchestrating motif analysis with Bioconductor” book. Our focus has mostly been on DNA motifs, but we’d love experience from RNA or protein folks as well to contribute workflows, best practices, and potentially identify areas for improvement in the ecosystem which we can then build. Anyway, this is all sounding like “working group” territory to me, so thought I’d ping here for thoughts & feedback. Cheers!

Michael Love (09:38:22) (in thread): > Sounds like it would be a great resource. Some on the TAB can probably provide best practices advice on book setup and maintenance. Wondering if it could fit under the education WG or also if there is a Bioc books WG. I don’t fully grasp what makes something a project vs. a WG.

Spencer Nystrom (09:44:37) (in thread): > Yeah, as long we’re not working in a vacuum, I’m cool with whatever.

Sean Davis (10:03:23) (in thread): > As far as books go, @Aaron Lun is the wizard. From my own experience, technically building a collaborative book and keeping it updated is no small undertaking.

Spencer Nystrom (10:05:33) (in thread): > Luckily, I don’t think we’ll be knitting data on the scRNAseq scale, so hopefully a little more lightweight on the builds, and more like a normal pkgdown build. But yes, the soft skills I imagine are the greatest hurdle to this stuff.

Spencer Nystrom (10:06:59) (in thread): > (and by “soft” I mean non-book content related)

Aedin Culhane (10:43:22) (in thread): > @Kevin Rue-Albrecht also has a lot of experience getting books up and running

2021-09-06

Eddie (08:23:44): > @Eddie has joined the channel

Eddie (09:12:04): > @Eddie has left the channel

2021-09-30

Sean Davis (08:45:02): > Seems doable. Tracking geographic locations of downloads, particularly over time, might be useful to guide global outreach and translation activities. - Attachment: Attachment > Is it possible to get statistics on which country the package is downloaded from? > I’d like to show that the number of Japanese Bioconductor users is not small (although there are few Bioconductor developers in Japan). > If it can be shown, it can be said that the barrier in Japan is that the information for developers is not available (or accessible==language barrier?).

2021-10-02

Vince Carey (14:41:51): > did tab members receive email with link to agenda for oct 7 call?

Charlotte Soneson (14:44:59) (in thread): > I got it yesterday

2021-10-03

Kevin Blighe (08:51:08) (in thread): > I got the original email but think that it was in error, as I’m not on the TAB

Charlotte Soneson (08:55:51) (in thread): > Kevin, can you clarify - did you get an email on Friday with subject “Agenda link for TAB meeting 7 October”?

Kevin Blighe (09:04:10) (in thread): > Nope, never received that one. > I got one before that to say that there would be an interview, and that, if I wanted to opt out of it, I could reply to Vince.

Charlotte Soneson (09:06:59) (in thread): > Ok, good - then it all seems fine.

Stephanie Hicks (14:40:39) (in thread): > I did not get it. Could you try again @Vince Carey?

Stephanie Hicks (14:41:45) (in thread): > ah, n/m. Found it in my spam.:confused:

2021-10-04

Aedin Culhane (16:51:37) (in thread): > Agree

2021-10-13

Sean Davis (20:09:53): > Thinking about cloud hosting: https://blog.cloudflare.com/introducing-r2-object-storage/ - Attachment (The Cloudflare Blog): Announcing Cloudflare R2 Storage: Rapid and Reliable Object Storage, minus the egress fees > Introducing Cloudflare’s S3-compatible Object Storage service, with zero egress bandwidth charges and automatic migration from S3-compatible services.

2021-10-14

Andres Wokaty (13:41:40): > @Andres Wokaty has joined the channel

2021-10-20

Kevin Blighe (04:45:15) (in thread): > Oh, I got something now. Is it to arrange the interview with La Piana Consulting? > Y’all know I only have excellent things to say about Bioconductor:relaxed:

2021-12-14

Megha Lal (08:23:40): > @Megha Lal has left the channel

2022-03-03

Michael Love (11:58:54): > sorry i’ll join a few min late…

2022-05-09

Aris Budiman (02:20:17): > @Aris Budiman has joined the channel

2022-05-21

Kasper D. Hansen (15:31:12): > @Kasper D. Hansen has joined the channel

2022-06-20

Mike Smith (09:34:18): > I was wondering if anyone on the TAB had thoughts about this post (https://support.bioconductor.org/p/9144841/) regarding package licensing. > > Initially I was sceptical since scran doesn’t copy or distribute any code from dqrng, but the more reading I do (e.g. https://en.wikipedia.org/wiki/GNU_General_Public_License#Linking_and_derived_works) the more I’m convinced that the interpretation in that post might be correct, and just linking to a GPL-licensed library is enough to make scran a derivative work and thus also subject to the same license. I’d be interested to know what others think and what the implications of this might be.

2022-06-21

Vince Carey (03:36:48) (in thread): > I answered the post indicating that we are discussing the matter and will reply later.

Lluís Revilla (04:11:51) (in thread): > I think the same problem happens in CRAN too. AFAIK licenses are not checked to be compatible when checking packages.

2022-07-12

Lluís Revilla (07:20:18): > There seems to be a problem with code.bioconductor.org: release 3.10 seems missing and it is not possible to reach it via a url such as https://code.bioconductor.org/browse/ReactomePA/blob/RELEASE_3_10/DESCRIPTION (but it works for RELEASE_3_9 and RELEASE_3_11). The error I see is: **Oops!** fatal: Path '0/DESCRIPTION' does not exist in 'RELEASE_3_1'. Thanks for the great service @Mike Smith! (and enjoy these days, it is not important, I just wanted to record it somewhere in case it is easy to fix)

2022-07-27

Levi Waldron (12:57:20) (in thread): > A little late to this discussion, but in general I think it is the responsibility of the copyright holder to enforce the terms of their product’s license against derivative products. That means the author of dqrng would have to enforce the license, by asking / compelling the author of scran to comply. I am not a lawyer, but I don’t think this is Bioconductor’s responsibility to resolve; it’s the responsibility of the copyright holders of the original and the derivative works. As a side note, scran additionally depends on GPL-3 licensed software. From the AGPL version 3 section of https://www.gnu.org/licenses/license-list.en.html#GPLCompatibleLicenses it would seem that scran’s current situation is explicitly allowed under section 13 of https://www.gnu.org/licenses/agpl-3.0.html: > > Notwithstanding any other provision of this License, you have permission to link or combine any covered work with a work licensed under version 3 of the GNU General Public License into a single combined work, and to convey the resulting work. The terms of this License will continue to apply to the part which is the covered work, but the work with which it is combined will remain governed by version 3 of the GNU General Public License.

Levi Waldron (12:57:53) (in thread): > I could add this reply to the support site question.

Vince Carey (20:17:35) (in thread): > thanks Levi, i guess i would skip the disavowal of responsibility, but otherwise posting seems fine, and include all text if you feel like it

2022-07-28

Levi Waldron (08:11:57) (in thread): > Done, just made a minimal response to the question of licensing without responding to the comment that Bioconductor should do automated license compatibility checking (which might be possible as an internal check for obvious incompatibilities, but not thoroughly enough to provide a badge or anything like that, I don’t think).
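[Editor's note: a crude version of the “internal check for obvious incompatibilities” mentioned above could look like the following R sketch. It is illustrative only and not legal advice; check_gpl_deps and its heuristic (flagging GPL dependencies of a non-GPL package) are invented for this example.]

```r
## Illustrative only: flag the obvious case discussed above, i.e. a package
## whose declared license is not GPL but which depends on / links to GPL
## packages. This is a made-up heuristic, not a real compatibility checker.
check_gpl_deps <- function(pkg) {
  desc <- utils::packageDescription(pkg)
  ## collect dependency names from Depends / Imports / LinkingTo
  fields <- paste(c(desc$Depends, desc$Imports, desc$LinkingTo), collapse = ",")
  deps <- trimws(sub("\\(.*", "", strsplit(fields, ",")[[1]]))
  deps <- setdiff(deps, c("R", ""))
  ## look up each installed dependency's declared license
  lic <- vapply(deps, function(d) {
    dd <- suppressWarnings(utils::packageDescription(d))
    if (is.list(dd)) dd$License else NA_character_
  }, character(1))
  flagged <- deps[!is.na(lic) & grepl("GPL", lic)]
  if (length(flagged) && !grepl("GPL", desc$License))
    message(pkg, " (", desc$License, ") has GPL-licensed dependencies: ",
            paste(flagged, collapse = ", "))
  invisible(flagged)
}
```

A real check would also need the dependency graph for packages that are not installed locally, and a proper license-compatibility table rather than a single grep.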

2022-07-29

Alex Mahmoud (12:10:50): > @Alex Mahmoud has joined the channel

2022-08-05

Stephanie Hicks (14:15:31): > hi everyone! I wanted to know if someone could connect me with Bioconductor’s contact with Azure? I’ve got a colleague at Hopkins who is looking to know if anyone has a contact at Azure. I said I would check here to see if I could try to make a connection. Thanks!

Nitesh Turaga (14:59:21): > Hi @Stephanie Hicks I’m happy to make the connection. Let me know at nitesh@ds.dfci.harvard.edu

Stephanie Hicks (15:30:24): > thanks@Nitesh Turaga! I’ll connect everyone now.

2022-08-07

Sean Davis (11:18:40): > @Stephanie Hicks, NIH STRIDES has recently added Azure. https://www.nih.gov/news-events/news-releases/nih-expands-biomedical-research-cloud-microsoft-azure Contact info is on the linked page. - Attachment (National Institutes of Health (NIH)): NIH expands biomedical research in the cloud with Microsoft Azure > To date, researchers have used more than 83 million hours of computational resources to access and analyze more than 115 petabytes of high-value biomedical data in the cloud.

2022-08-09

Stephanie Hicks (16:24:13): > got it, thanks @Sean Davis!

2022-10-11

Michael Love (16:08:29): > hi all, I have some thoughts about a new function/package for simplifying liftOver, more in thread :point_down:

Michael Love (16:09:06) (in thread): > Currently I bet most users do: > 1. Find chain file on UCSC website > 2. Download > 3. Unzip > 4. rtracklayer::import > 5. rtracklayer::liftOver > 6. unlist > 7. genome()<- (what about seqinfo/seqlengths?) > Or if you happen to know about Ahub, steps 1-4 can be replaced by: > > ah <- AnnotationHub() > chain <- query(ah, c("chain.gz", "hg19toHg38"))[[1]] > > then you are back to step 5 and onward > > Ideally this could be a single function, something like: > > new <- easylift(old, "hg19", "hg38") > # or > new <- easylift(old, "hg38") # if genome(old) == "hg19" > > I’m happy to do the work but I’d like to get opinions on where this should live and how to do it in a way that is future proof. 99.9% of the time this is hg19 to hg38 or the reverse, and we should make this easy.

Michael Love (16:10:13) (in thread): > it would be great if after liftover we could have genome and seqlengths provided, but the best way i know to do this relies on ftp to UCSC via a call to Seqinfo

Michael Love (16:10:36) (in thread): > what if we put those seqlengths also on Ahub? > > in general any thoughts about the best way to implement this?
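[Editor's note: the steps and the one-liner proposed in this thread could be sketched roughly as below. This is purely illustrative; easylift is the hypothetical function name from the proposal (not an existing package at the time of this thread), and the "hg19ToHg38"-style chain-file naming is an assumption based on the query shown above.]

```r
## Hypothetical sketch of the proposed helper, combining the AnnotationHub
## chain lookup with the liftOver/unlist/genome()<- steps listed above.
library(AnnotationHub)
library(rtracklayer)
library(GenomeInfoDb)

easylift <- function(x, to, from = unique(genome(x))) {
  ah <- AnnotationHub()
  ## e.g. from = "hg19", to = "hg38" -> search term "hg19ToHg38"
  key <- paste0(from, "To", toupper(substring(to, 1, 1)), substring(to, 2))
  chain <- query(ah, c("chain.gz", key))[[1]]
  lifted <- unlist(liftOver(x, chain))
  genome(lifted) <- to   # seqlengths would still need a separate Seqinfo lookup
  lifted
}
```

With genome(old) already set on the input GRanges, new <- easylift(old, "hg38") would match the second form in the proposal; the open questions about where seqlengths come from are exactly what the rest of the thread discusses.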

Sean Davis (16:13:47) (in thread): > You didn’t mention if any of this is happening in R (outside of the Ahub piece). Just for completeness: https://www.bioconductor.org/packages/release/workflows/vignettes/liftOver/inst/doc/liftov.html

Michael Love (16:15:29) (in thread): > sorry lemme make those more specific

Mikhail Dozmorov (21:42:34) (in thread): > Simplifying liftOver would be great. For seqlengths, GenomeInfoDb::getChromInfoFromUCSC (as well as getChromInfoFromNCBI, getChromInfoFromEnsembl) works well even using external data. Not sure about the value of seqlengths on AHub given the number of organisms and somewhat difficult search.

Mikhail Dozmorov (21:43:00) (in thread): > Besides simplifying, there are many issues with liftOver in R. This issue is fully reproducible and remains unanswered: https://community-bioc.slack.com/archives/C35G93GJH/p1612539714062500. I had issues with lifting over coordinates near centromeres, telomeres, short arms. There is CrossMap https://crossmap.readthedocs.io/ that uses the same chain files, but I haven’t tested it on the issue above. Will be interested to help or follow the development - Attachment: Attachment > I encountered a strange bug with rtracklayer::liftOver. In R, it takes one region and lifts it into 21 regions! Using the UCSC genome browser and the command line tool with the identical liftover chain, that one region is correctly lifted into one region. I posted this on the BioC support site, but wonder if anyone ever noted such behavior?

Hervé Pagès (23:52:25) (in thread): > Just played a little bit with rtracklayer::liftOver(). Another issue I see is that it lifts everything, regardless of what the genome field in Seqinfo says: > > library(rtracklayer) > path <- system.file(package="liftOver", "extdata", "hg38ToHg19.over.chain") > ch <- import.chain(path) > gr <- GRanges(c("chr1", "chr2"), IRanges(1, 15000)) > genome(gr) <- c("hg38", "mm10") > unlist(liftOver(gr, ch)) > # GRanges object with 2 ranges and 0 metadata columns: > # seqnames ranges strand > # <Rle> <IRanges> <Rle> > # [1] chr1 10001-15000 * > # [2] chr2 10001-15000 * > # ------- > # seqinfo: 2 sequences from an unspecified genome; no seqlengths > > Mixing chromosomes from different genomes is a rare situation but it should be supported. This one should be easy to fix. > About Seqinfo: setting the correct Seqinfo on the lifted object is kind of tricky at the moment. FWIW I just added an update() method for Seqinfo objects in GenomeInfoDb 1.33.8 in an attempt to make this a little bit easier. For example, for the cur19 object produced in the liftOver workflow, it can be done with: > > lifted_seqlevels <- seqlevels(cur)[genome(cur) %in% "hg38"] > seqinfo(cur19) <- update(seqinfo(cur19), Seqinfo(genome="hg19")[lifted_seqlevels]) > > This can somehow be considered the standard way to set the Seqinfo on the lifted object. Still not as easy as one might have hoped, but that’s something that could be baked into rtracklayer::liftOver() or easylift().

2022-10-12

Michael Love (07:02:12) (in thread): > thanks @Mikhail Dozmorov and @Hervé Pagès for these pointers

Michael Love (07:05:09) (in thread): > re: UCSC, this is too fragile for me. i’ve seen lots of times when getting the seqinfo from UCSC is the breakpoint for an entire workflow, which is funny when you think of how few bytes of data we need (it’s a few hundred bytes), and how stable the information is (as i understand it, assembled molecule lengths do not ever change for builds such as “hgXX” etc., and these come a few times a decade) > > I’m wondering if we just put hg18, hg19, hg38 seqlengths into Ahub as frozen pieces of data (again, I’m happy to do this), we could solve the 99% problem.

Mikhail Dozmorov (07:13:23) (in thread): > Yes, also encountered issues with UCSC. Suggesting mm9, mm10, and mm39 for AHub in addition to human seqlengths - that’ll solve 99.98% of the problems.

Mikhail Dozmorov (07:16:04) (in thread): > T2T seqlengths would also be helpful. But, surprisingly, T2T seqlengths differ slightly between versions 1.1 and 2.0. Although, based on the version descriptions, they should be the same for autosomes. We are looking into it for the excluderanges package.

Michael Love (08:32:57) (in thread): > agree, hg18-hg38 and mm9-mm39

Michael Love (08:36:03) (in thread): > T2T is more in the .02% for a few more years at least

Hervé Pagès (12:35:05) (in thread): > yeah, only for autosomes + mitochondrial chromosome. The other things (scaffolds etc…) are likely to change, even for something like mm10 (see https://github.com/Bioconductor/GenomeInfoDb/issues/27). An alternative to the AnnotationHub solution would be to store those seqlengths in GenomeInfoDb or GenomeInfoDbData. Then a call like getChromInfoFromUCSC("mm10", assembled.molecules.only=TRUE) would be able to use that and wouldn’t need internet access at all.

Michael Love (12:56:34) (in thread): > i think we should propose this for the next TAB meeting and hear from Herve on the +/- of these solutions

Michael Love (12:57:09) (in thread): > the function I am thinking of would just drop scaffolds entirely. i want to solve the 99% problem here, and people can do it [that is, the steps above, including a Seqinfo call to UCSC for seqlengths] manually if they want to work with scaffolds

Michael Love (13:01:15) (in thread): > edit: DON'T DO THIS :slightly_smiling_face: see below. this is just to show that the size of the info is ~400B > > > hg38 <- GenomeInfoDb::Seqinfo(genome="hg38") > > hg38 <- hg38[paste0("chr",c(1:22,"X","Y","M")),] > > save(hg38, file="hg38.rda", compress="xz") > > 392B Oct 12 13:00 hg38.rda >

Michael Love (13:02:02) (in thread): > if we do this for a half dozen human/mouse genomes, this is ~2Kb burden

Michael Love (13:08:30) (in thread): > on the other hand, Ahub is already loaded for this particular use case, Ahub is transparent about why the data is there (bc someone uploaded it), and the process for updating is clear (although in this case the updates are a few times per decade)

Hervé Pagès (13:26:55) (in thread): > I would do this instead: > > > getChromInfoFromUCSC("hg38", assembled.molecules.only=TRUE, as.Seqinfo=TRUE) > Seqinfo object with 25 sequences (1 circular) from hg38 genome: > seqnames seqlengths isCircular genome > chr1 248956422 FALSE hg38 > chr2 242193529 FALSE hg38 > chr3 198295559 FALSE hg38 > chr4 190214555 FALSE hg38 > chr5 181538259 FALSE hg38 > ... ... ... ... > chr21 46709983 FALSE hg38 > chr22 50818468 FALSE hg38 > chrX 156040895 FALSE hg38 > chrY 57227415 FALSE hg38 > chrM 16569 TRUE hg38 > > Putting those on Ahub would work too but it would make it a little bit harder to take advantage of them in getChromInfoFromUCSC(). Also a call like the above would no longer be guaranteed to work when offline.

Michael Love (14:50:57) (in thread): > agree

Michael Love (14:51:53) (in thread): > i think if we are ok saying human and mouse are unique cases where we just don't want to be subject to UCSC connection, and given that these assembled molecule lengths are tied to builds that update a few times a decade, i would like to have these via standard routines in GenomeInfoDb

2022-10-13

Hervé Pagès (00:47:51) (in thread): > Done (see https://github.com/Bioconductor/GenomeInfoDb/commit/345f22c55b8c431f1cf8080af3235f78266ade9c). With this improvement to getChromInfoFromUCSC(), the standard way to set the Seqinfo on the lifted object becomes: > > to_seqinfo <- getChromInfoFromUCSC("hg19", assembled.molecules.only=TRUE, as.Seqinfo=TRUE) > lifted_seqlevels <- seqlevels(cur)[genome(cur) %in% "hg38"] > seqinfo(cur19) <- update(seqinfo(cur19), to_seqinfo[lifted_seqlevels]) > > No internet access needed! :sunglasses:

Michael Love (07:21:31) (in thread): > this is great! looking forward to trying this out. I may mock up easylift() to get more feedback here from interested parties

2022-10-16

Vince Carey (14:07:05): > I have made a pull request to BiocPkgTools that will introduce code that can extract maintainer ORCID from packageDescription output and use rorcid to get metadata about maintainer. I have noticed that lots of packages don’t use Authors@R and don’t have the comment=c(ORCID=…) in the person() entry. I have also noticed that lots of folks don’t keep their ORCID metadata up to date. Can we try to improve the situation with TAB-initiated code and TAB membership?

Vince Carey (14:08:50): > I would also like to reduce the use of maintainer@bioconductor.org as the maintainer field, preferring to have named maintainers for all packages, especially key infrastructure. TAB members will be invited to participate in maintaining packages that lack named individuals as maintainers.

2022-10-18

Vince Carey (09:45:33): > Here's a new approach to TAB agenda construction. I have made a template for the November meeting at https://docs.google.com/document/d/1maef2QtIaMCJ3iDUZFGB1rmj3XRWSGhd9kvya8TZEL4/edit?usp=sharing and TAB members can edit it to introduce topics of interest. The TAB exec committee will shape the meeting agenda prior to the meeting. - File (Google Docs): Bioc TAB agenda Nov 3 2022

Stephanie Hicks (10:18:25): > thanks@Vince Carey!

2022-10-20

Lluís Revilla (10:48:34): > There is the https://bioconductor.org/developers/how-to/commonImportsAndClasses/ which says "Common Bioconductor Methods and Classes", but sometimes these are enforced in the review upon submission, while other packages that use the Bioc prefix are sometimes considered official but are not mentioned there or not enforced in reviews. Is there an official way for Bioconductor to select packages/classes as the official Bioconductor packages for a purpose?

Lori Shepherd (14:48:08): > I proposed starting a working group to revamp and evaluate this page with input from various community outlets, but need volunteers and a lead as I cannot lead such a project right now – I think a suggestion and an appeal for review from the technical advisory board for approval to be listed on the page would suffice for now

Laurent Gatto (15:24:31) (in thread): > I am happy to be involved in this.

Lluís Revilla (16:15:54) (in thread): > According to the website http://workinggroups.bioconductor.org/currently-active-working-groups-committees.html#recommended-classes-and-methods you are already a member @Laurent Gatto :smile: How much work would this be or how would it be organized? - Attachment (workinggroups.bioconductor.org): Chapter 2 Currently Active Working Groups / Committees | Bioconductor Working Groups: Guidelines and activities > The following describe currently active working groups listed in alphabetical order. If you are interested in becoming involved with one of these groups please contact the group leader(s). 2.1…

Laurent Gatto (16:41:02) (in thread): > At least, I seem to be consistent in my choices:smile:

Laurent Gatto (16:42:06) (in thread): > I am under the impression that this has never really taken off. Might be worth bringing it up during the next TAB and take it from there. I’ll add it to the agenda.

2022-10-21

Lori Shepherd (07:16:35) (in thread): > We needed someone to lead it as no one took the lead to poll and start scheduling meetings with interested members

2022-12-15

Laurent Gatto (03:29:28): > I have a point that I would like to bring up during the next TAB. I have observed the following already a couple of times: a package maintainer makes some changes in their Github repo (a very specific example: accepts a PR that fixes a bug), but these changes are never incorporated into the Bioconductor package. The Bioc package ends up having a version greater than the one on Github, but still has the bugs.

Vince Carey (05:07:10): > This is also being reported on bioc-devel by another person. What do you propose to do about this?

Laurent Gatto (06:02:19): > 1. Bioc core to send an email to maintainer. > 2. Possibility for a member of the community to send a PR to Bioconductor. > 3. If 1 fails and nobody takes on 2, deprecate package because (i) maintainer is not responsive and (ii) Github and Bioc versions out-of-sync, which is a big source of confusion.

Leo Lahti (06:35:05): > I have also bumped into this every now and then. Not sure how widespread a problem this is, but some control along those lines might be useful. It would also be a problem, though, if an otherwise useful package were deprecated due to one potentially minor issue like this. Bug fixes are often small and do not necessarily affect core functionality.

Lori Shepherd (07:37:55) (in thread): > I agree with 1. The problem with 2 is that it opens up changes that the maintainer didn't explicitly agree to make public through Bioconductor. – 3 normally only happens if a package is failing, so it's interesting: deprecating a package that is not technically failing but merely out of sync would be a new policy

Michael Love (07:55:55) (in thread): > instead of deprecate what about a banner

Michael Love (07:58:22) (in thread): > something with this level of high priority in terms of visual design - File (PNG): Screenshot 2022-12-15 at 7.57.02 AM.png

Michael Love (07:59:53) (in thread): > but it would say something like “This repository is out of sync with a maintainer-hosted one”

Michael Love (08:00:11) (in thread): > but there’s no way to do this programmatically

Lori Shepherd (08:00:53) (in thread): > We would need some way to find them – plus some will push changes on the github and not necessarily sync because its not ready or they are still working on it

Lori Shepherd (08:01:15) (in thread): > (yes in good practice it would be on a different branch but not everyone does this)

Michael Love (08:02:06) (in thread): > yeah, i think it can’t be easily resolved programmatically, so it creates a burden on someone to check up on whether they are in “compliance” (with this proposed policy)

Michael Love (08:04:31) (in thread): > if you go multiple releases and you have key functionality or bug fixes on GitHub that aren’t ported to Bioc, you are inserting confusion into the project. i totally get why Laurent raises the issue. so having some “stick” that we can use to incentivize folks to not do this…

Michael Love (08:05:08) (in thread): > i've seen it before, and in the cases i can remember it's bc the maintainer has left bioinformatics, but bc it's easy to accept PRs on GitHub they continue to do that

Laurent Gatto (09:29:18) (in thread): > > 3 normally only happens if a package is failing so its interesting if the package is not technically failing but out of sync to deprecate would be a new policy > Maybe this is an indication that we might want to push more unit testing, so that 'small' bugs get caught and we avoid drift between the Bioc and Github versions.

Laurent Gatto (09:31:38) (in thread): > Re 2, I agree it’s not ideal… it’s a bit like somebody taking over partial maintenance. I think it will be very rare… it will happen only in cases where the person suggesting a change in Bioc is (1) well acquainted with the domain/package and (2) has to do it to fix their own package (or package’s dependency).

Laurent Gatto (09:33:31): > I am wondering what is the biggest issue: (1) an important package that gets deprecated and (if really important) replaced, or (2) differences between the official Bioc version and the unofficial Github version. The latter is dramatic for new developers that depend on the faulty package.

Laurent Gatto (09:35:23) (in thread): > @Michael Love - the banner would only partially resolve the issue. It will work for a user that wants to use the latest version (even when the version number is smaller!), but will not solve the dependency problem among Bioconductor packages… if I develop a package that depends on the package in question, there's no way out!

Laurent Gatto (09:36:16): > The fact that several of us (you, Leo, myself, and a researcher in my lab - cf email on bioc-devel) have bumped into these issues indicates that it actually is not that rare.

Michael Love (09:37:47) (in thread): > yes it’s not a solution but a “stick”

Michael Love (09:38:45) (in thread): > like, maybe shame the maintainer into coming into “compliance” (although we would have to list this as a rule somewhere, that we don’t want Bioc devels to have multiple versions of the same software both seemingly the “official” / “current” codebase)

Michael Love (09:39:32) (in thread): > we prob need a good amount of time to discuss this topic, we won’t be able to cover it in 10 min

Kasper D. Hansen (15:18:19): > Is the PR merged into the main branch on Github? We could check if the main branch on Bioc git is the same as the main branch on github

Vince Carey (15:38:43): > fwiw i emailed maintainer and lab PI and have not heard back yet

Laurent Gatto (15:39:12) (in thread): > Not sure if I understand, but PRs with bug fixes that we merged on GH before release have never been pushed to Bioc.

Kasper D. Hansen (15:54:02) (in thread): > You can merge into a branch, right. So we can check if the main branch of bioc is equal to main branch of github

Laurent Gatto (15:58:36) (in thread): > There's at least that one bug fix and the version number that differ between GH master and Bioc master and RELEASE_3_16.

Kasper D. Hansen (21:00:49) (in thread): > yeah, so the issue is that github has not been pushed to Bioc

Kasper D. Hansen (21:01:04) (in thread): > We could in principle automate this btw, in an opt-in way
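As a rough illustration of the opt-in check discussed in this thread, one could compare the HEAD commits of the two remotes with `git ls-remote`. A minimal sketch, assuming `git` is on the PATH; the repo URLs, package name, and branch names below are hypothetical examples, not a tested recipe:

```r
## Sketch of an opt-in sync check: compare the HEAD commit of a package's
## GitHub default branch against its branch on git.bioconductor.org.

remote_head_sha <- function(url, ref) {
  ## `git ls-remote <url> <ref>` prints lines of the form "<sha>\t<ref>";
  ## keep only the SHA of the first matching line.
  out <- system2("git", c("ls-remote", url, ref), stdout = TRUE)
  if (length(out) == 0L) return(NA_character_)
  sub("\t.*$", "", out[[1L]])
}

in_sync <- function(gh_url, bioc_url, gh_ref = "main", bioc_ref = "devel") {
  identical(remote_head_sha(gh_url, gh_ref),
            remote_head_sha(bioc_url, bioc_ref))
}

## Usage (hypothetical package):
## in_sync("https://github.com/owner/MyPkg",
##         "https://git.bioconductor.org/packages/MyPkg")
```

This only detects that the branch tips differ, not which side is ahead; a fuller version would also compare DESCRIPTION versions.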

2022-12-16

Vince Carey (05:30:28): > And have heard nothing back so far. We have been discussing informally the process of offering an apparently abandoned package for adoption by other maintainers. > > - [ ] I am committed to the long-term maintenance of my package. This > includes monitoring the [support site][3] for issues that users may > have, subscribing to the [bioc-devel][4] mailing list to stay aware > of developments in the *Bioconductor* community, responding promptly > to requests for updates from the Core team in response to changes in > *R* or underlying software. > > is part of the submission agreement, but another clause about actions that the project might take in the event of non-response to requests for modifications could be introduced?

Laurent Gatto (10:14:44): > The specific problem that triggered this discussion is being addressed by the maintainer (she is synchronising Bioc and GH). However, more generally, given that it is not an isolated issue, it could still be brought up in the TAB.

Laurent Gatto (10:16:12) (in thread): > It might be useful to explicitly mention in the submission agreement not to forget to push changes from GH's main branch to Bioc.

Kasper D. Hansen (23:50:08): > We should figure out how to write a github action (or something) that pushes changes from github to bioc and files an issue if the push is unsuccessful. Or at least figure out if such an action could be written.

Kasper D. Hansen (23:50:36): > My guess is, for many developers, this is about forgetting
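A sketch of what such an opt-in workflow might look like as a config fragment. Everything here is an assumption for illustration: the `devel` branch name, the `MyPkg` package, the `BIOC_DEPLOY_KEY` secret, and the use of the third-party `webfactory/ssh-agent` action; the failure-handling step is left as a comment.

```yaml
# Hypothetical sketch of an opt-in "push to Bioconductor" workflow.
name: sync-to-bioc
on:
  push:
    branches: [devel]
jobs:
  push-to-bioc:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0          # full history, needed for a plain push
      - uses: webfactory/ssh-agent@v0.9.0
        with:
          ssh-private-key: ${{ secrets.BIOC_DEPLOY_KEY }}
      - name: Push to git.bioconductor.org
        run: |
          git remote add bioc git@git.bioconductor.org:packages/MyPkg.git
          git push bioc HEAD:devel
      # Filing an issue when the push fails could be done in a follow-up
      # step guarded by `if: failure()`.
```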

2022-12-17

Laurent Gatto (01:09:26) (in thread): > Not sure if this should be automatic… I often have changes on GH's main branch at version x.y.1 that I don't want to merge right away into Bioc version x.y.0. For example because I'm waiting for an update in the man page or a unit test. If I were to merge into Bioc now, I would need to bump to x.y.2 for these additional changes to take effect on Bioc. If GH -> Bioc was automatic, that would force me to work in a different branch on GH.

Laurent Gatto (01:10:34) (in thread): > And more generally, the automatic merging will fail if the version on GH isn't kept up-to-date. (Which I imagine will be the case when developers simply forget to keep both in sync).

Lori Shepherd (12:11:18) (in thread): > I don't like the idea of auto-sync with GitHub repos. It will be plagued with merge conflicts and it's not our responsibility to do so.

Lori Shepherd (12:12:59) (in thread): > That's why we have the practice of pushing during the review process now, to get in the habit of having to do both. > If there were some way to determine a discrepancy and send a reminder email, I think that would be more appropriate.

2022-12-21

Mike Smith (03:14:34) (in thread): > I tend to work like @Laurent Gatto describes, and treat the GitHub master branch as a sort of "super devel" which users shouldn't necessarily expect to even build at a particular point in time. However my best practices hat (which I don't wear often) suggests that the development of x.y.1 should be happening in a new branch. Then you could expect master/main to be identical between GitHub and git.bioc. You could even restrict the GitHub master to only accept content from pull requests, which might make the application of an automatic push to Bioconductor after a build/check success less prone to merge conflicts. > > I think it'd be impossible to impose that on developers, but perhaps it could be tested/encouraged if it worked.

Lori Shepherd (07:05:54) (in thread): > probably too much for naive users and as you said impossible to strictly impose

2023-01-14

Ludwig Geistlinger (18:32:47): > @Ludwig Geistlinger has joined the channel

2023-01-29

Aedin Culhane (13:30:03): > Interesting paper that geo-locates GitHub contributions and shows growth of developers in countries other than the EU/USA over the past ten years. Open source software contribution is becoming more global. Code is in Python on GitHub. https://www.sciencedirect.com/science/article/pii/S0040162522000105#fig0001 - Attachment (sciencedirect.com): The Geography of Open Source Software: Evidence from GitHub > Open Source Software (OSS) plays an important role in the digital economy. Yet although software production is amenable to remote collaboration and it… - File (PNG): image.png - File (PNG): image.png

2023-01-31

Hervé Pagès (13:53:54) (in thread): > I find the breakdown per region within European countries interesting, somehow confirming some clichés about rural/urban areas.

2023-02-07

Johannes Rainer (04:59:25) (in thread): > it's also funny to see the difference between South Tyrol (where I am) and Trentino in Italy - both are in the same political region but always competing - and it seems Trentino is outperforming South Tyrol :smile:

2023-02-28

Aedin Culhane (20:51:36) (in thread): > @Lori Shepherd @Maria Doyle @Vince Carey just wondering if this code example might be useful for developing reports on Bioconductor outreach and contributions

2023-03-09

Leo Lahti (16:44:38) (in thread): > Also an interesting choice to publish this in an Elsevier journal.

2023-03-14

Michael Love (09:09:07): > Just wanted to thank Hervé for always managing the issues that come down the line from UCSC :pray: https://github.com/Bioconductor/GenomeInfoDb/issues/82#issuecomment-1409252404 - Attachment: Comment on #82 getChromInfoFromUCSC: anyNA(m32) is not TRUE error for hg38 > So it looks like UCSC has just sneakily changed the assembly that hg38 is based on, again! Used to be GRCh38.p13, now it’s GRCh38.p14. I’ve not seen any announcement. Unfortunately this breaks GenomeInfoDb::getChromInfoFromUCSC("hg38"). > > Not the first time they do this: see issue #30. They’ve even done a change to hg19 about 3 years ago: see issue #9 :cry: > > A fix is on the way. Will only be available in current BioC release (3.16) and devel (3.17) because past releases are frozen and no longer maintained. So, if you are using a past release (@mxw010 you seem to be in that situation), you’ll need to update to the current release.

Hervé Pagès (12:06:45): > You’re welcome. Note that a few days after they made this change the UCSC folks notified me off-list about it. They said it was the last time, so hopefully no more changes to hg38, hg19, mm10, etc… :crossed_fingers:

Michael Love (12:36:33): > Reminds me of manuscript_final_v2.doc

2023-04-06

Michael Love (12:55:51): > Proposal from TAB meeting: what if we use the “Meet the TAB” session at BioC2023 to open up some of our hardest problems for general discussion, e.g. some of our perennial topics like technical debt, books, annotation data in hubs, etc.

Michael Love (14:14:10): > A proposal for the “top left” box on the slash help page: https://docs.google.com/document/d/1DPv6xHn18avSDLHK57iYqktFzV7LwkpOpKgF34-AHno/edit - additions welcome (link fixed)

2023-04-07

Vince Carey (06:52:11): > I am in full agreement with the importance of improving accessibility to doc from a wide range of sources. It becomes a card-catalog problem … what are the best subject-headings to surface? How can we include usage and usability information? Can’t a good vignette by a developer unrelated to book authors be just as useful as a book chapter on a given topic? The vignettes link points to some pretty dry information about what a vignette is…. If we think about available interfaces to “documentation” in the world, what comes to mind? Pubmed – at the top level, all articles are treated identically. Google – ranking plays a role, some “preview” information can be valuable. Newspapers – editorial judgement and front-page design are crucial determinants of value. How do some other projects address this? https://hail.is/gethelp.html … cheatsheet idea is nice but underused? https://satijalab.org/seurat/ … pkgdown used to good advantage … https://scverse.org/learn/ pretty evocative, needs some maint … and their topical interface to the forum, e.g. the sidebar at https://discourse.scverse.org/c/help/muon/34, seems worthy of consideration. > > So, yes, the doc box should be improved, and it would be nice to attach some strategic thinking about supporting effective information retrieval from the project as a whole. TAB and CAB activities in this area are most welcome. - Attachment (scverse): muon > Forum for scverse, a home for discussion about single-cell omics data analysis and software development

Michael Love (09:15:42): > Right. We have a wealth of content, but a new user might be confused by these different types, and I don’t think we are offering enough guidance. A user wants to find documentation that is up-to-date and of the right length for the task. Maybe I need just a code chunk or maybe I need a whole book. We need to convey that these have different lengths and are held up to different reproducibility standards (e.g. CI or not). FWIW none of the above are checked against an evolving codebase right? So we are offering something extra – we should convey that somehow.

Michael Love (09:16:20): > if we are just updating the box, feel free to make edits/suggestions in the link above

2023-04-12

Aedin Culhane (09:02:43): > a vignette browser.

Lluís Revilla (11:26:22): > In case it helps, I think all/most Bioconductor vignettes of software packages can be browsed via the r-universe from rOpenSci: https://bioconductor.r-universe.dev/articles

2023-04-27

Vince Carey (09:59:41): > Our May meeting is coming up. In a recent posting the following topics were mentioned: technical debt, books, annotation data in hubs. We welcome agenda items related to these or other concerns.

Michael Love (10:03:47): > proposal: discussion about authorship on Bioc collaborations. that is, do we want to have any loose guidance about how Bioc-centered projects might define authorship

Vince Carey (10:43:01): > Tentatively I have asked @Hervé Pagès to present on SparseArray, @Robert Shear to present some information on analysis of our AWS footprint and global consumption of bioc bandwidth. I think we’ll have some information on cloud working group activities; other working group members should pipe up if there is info. In all I would like to have a significant chunk of the May meeting devoted to core dev presentations on accomplishments in getting 3.17 out and assessment of goals for the 3.18 period.

Robert Shear (10:43:07): > @Robert Shear has joined the channel

Michael Love (11:08:21) (in thread): > sorry, scratch this (for May). I have to leave at 12:30 for a committee meeting

2023-05-04

Hervé Pagès (05:22:14): > Link to my “SparseArray update” slides for today’s meeting: https://docs.google.com/presentation/d/1_ZKZdAUCKV3sGceU-nspF21oAkjLh0TodKSuilXFXjU/edit?usp=sharing

Kasper D. Hansen (08:50:50): > Looks awesome @Hervé Pagès. My advice: getting the “parsing” to work should be the next step on the TODO list by a wide margin.

Kasper D. Hansen (08:51:24): > Next should be working with DelayedArray and returning sparse blocks (IMO)

2023-05-07

Zuguang Gu (05:17:55): > @Zuguang Gu has joined the channel

Shila Ghazanfar (17:58:58): > hi @Hervé Pagès i’m catching up on the TAB meeting notes and intrigued by SparseArray. at present i have a use-case of 3-dimensional arrays, typically G x N x k. The array is built essentially from G x N sparse matrices, and to create the 3D array i need to resort to dense arrays via abind. my goal afterward is to take the means or sums across the k via something like apply(X, c(1,2), mean). Am i right in understanding SparseArray could address this? i will certainly look more into it as well, thanks!

Stephanie Hicks (20:59:34): > beautiful work@Hervé Pagès!

2023-05-08

Hervé Pagès (00:23:39) (in thread): > thank you

Hervé Pagès (01:30:03) (in thread): > Thanks for the use case! This is something that we definitely want to support. It could be done using something like this (assuming my_sparse_matrices is a list containing your k sparse matrices): > > library(SparseArray) > > ## Construct the 3D sparse array: > svt <- SparseArray:::new_SVT_SparseArray(c(G, N, k), type="integer") > for (i in 1:k) svt[ , , i] <- my_sparse_matrices[[i]] > > ## apply(): > apply(svt, c(1,2), mean) > > 3 notes: > * Unfortunately the svt[ , , i] <- my_sparse_matrices[[i]] subassignments don’t work at the moment :disappointed: I need to fix that (should be a simple fix). > * The use of SparseArray:::new_SVT_SparseArray() is a temporary hack. I still need to expose a user-friendly way to create an empty sparse array of arbitrary dimensions (adding this to the TODO list). > * The call to apply() might be slow if G x N is big. It might be possible to improve this by implementing an apply() method for SVT_SparseArray objects that loops at the C-level. > An alternative solution that might be more efficient is to construct a k x G x N sparse array (instead of a G x N x k array): > > svt2 <- SparseArray:::new_SVT_SparseArray(c(k, G, N), type="integer") > for (i in 1:k) svt2[i, , ] <- my_sparse_matrices[[i]] > apply(svt2, c(2,3), mean) > > In this case each call to mean() will receive values collected along the 1st dimension which will probably be much more efficient (given how the sparse data is organized in memory).

Shila Ghazanfar (02:48:50) (in thread): > oh thats extremely helpful, thanks so much Herve! i’ll follow up with a little more info about this use case

Maria Doyle (05:33:52): > @Maria Doyle has joined the channel

Axel Klenk (08:49:12): > @Axel Klenk has joined the channel

2023-05-11

Hervé Pagès (21:26:14) (in thread): > This use case poses 2 interesting problems: > 1. What’s the best way to construct the big 3D sparse array from the collection of k sparse matrices? > 2. What’s the best way to compute stats across the k matrices? > It seems that 1. can be efficiently accomplished in two steps (assuming my_GxN_matrices is a list containing your k sparse matrices already represented as SparseMatrix objects): > > library(SparseArray) # needs to be SparseArray >= 1.1.3 (BioC 3.18) > > ## Turn the GxN matrices into GxNx1 arrays by adding an artificial 3rd > ## dimension to them: > my_GxNx1_arrays <- lapply(my_GxN_matrices, > function(x) { dim(x) <- c(dim(x), 1L); x } > ) > > ## Bind the GxNx1 arrays along their 3rd dimension (unfortunately we > ## need to use a non-exported utility to do this at the moment, this > ## will need to be addressed): > abind2 <- SparseArray:::.abind_SVT_SparseArray_objects > big_GxNxk_array <- abind2(my_GxNx1_arrays, along=3L) > > For 2. it turns out that base::apply(X, ...) calls as.array(X) internally, turning it into a dense representation, so we must absolutely avoid it. However, this seems to do the job in a pretty efficient manner (it takes advantage of the rowMeans() method for SVT_SparseMatrix objects): > > tmp <- lapply(seq_len(ncol(big_GxNxk_array)), > function(j) rowMeans(big_GxNxk_array[ , j, ])) > res <- do.call(cbind, tmp) > > Performance: I tried the above with different collections of sparse matrices (of type() integer) of various shapes (G x N), sizes (k), and densities. 
> * G=500, N=200, k=1000, density=0.1: > > * T1 (time to construct the 3D object): < 1 sec > * MFP (memory footprint of the 3D object): 109Mb > * T2 (time to compute means across the k matrices): < 1 sec > > * G=500, N=200, k=15000, density=0.1: > > * T1: 7.3 sec > * MFP: 1.6Gb > * T2: 5.5 sec > > * G=200, N=500, k=15000, density=0.1: > > * T1: 10 sec > * MFP: 2.77Gb > * T2: 10.2 sec > > * G=1500, N=750, k=20000, density=0.05: > > * T1: 11.7 sec > * MFP: 10.8Gb (would be 84Gb if it was dense!) > * T2: 36 sec > There’s still room for some improvements e.g. the rowMeans() method is currently implemented in R (it uses a vapply() loop) but it could be moved to the C level. Also we should extend the capabilities of the method and make it work on objects with an arbitrary number of dimensions. Note that this is actually the case for base::rowMeans() and the other functions documented in ?base::colSums. Furthermore, all these functions have a dims argument and we should support that too. This would allow us to use something like rowMeans(big_GxNxk_array, dims=2) instead of the hack above. I’m adding these things to the TODO list.

2023-05-12

Lluís Revilla (06:49:10): > I saw the email and message about new positions on the technical advisory board and I have a question: is the current election because someone is stepping down before their term is up, or is it the natural turnover of roles? (I couldn’t find the dates when the current members were due to leave.)

Michael Love (06:58:32): > There is natural turnover of roles, certain members who have served a three year term can choose to re-apply for their position or not

Michael Love (06:59:24): > “Members and executive officers are elected to a three-year renewable term.”

Michael Love (06:59:31): > https://www.bioconductor.org/about/technical-advisory-board/TAB-Governance.pdf

Michael Love (07:01:08): > Aedin, Shila and my terms are coming up this cycle

Michael Love (07:02:12): > one year ago it was Charlotte, Laurent, Stephanie, Aaron (i can see this in the minutes from May 2022)

Lluís Revilla (07:04:20): > Thanks Michael!

Kasper D. Hansen (08:16:50): > It’s a good question and perhaps - for next time (so high chance of forgetting this comment) - it would be nice to have a couple of sentences on this, to briefly clarify how things work

Lluís Revilla (10:28:19): > In the Code of Conduct committee we have at the end of the page the current members and when their terms finish: https://bioconductor.github.io/bioc_coc_multilingual/. Perhaps this can be added once the new members are added to the website

2023-05-27

Hervé Pagès (18:19:32) (in thread): > An update on this: with SparseArray 1.0.8, the means or sums across the k sparse matrices can be obtained by calling colMeans() or colSums() on the k x N x G sparse array (obtained by transposing the original 3D array with aperm()). With G=1500, N=750, k=20000, and a density of 0.05, the 3D transposition takes a little bit less than 3 min, but then colMeans() or colSums() take less than 2 sec. :slightly_smiling_face:
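The aperm()/colMeans() trick described above can be illustrated with plain base R on a small dense array (my understanding is that SparseArray's colMeans() for SVT_SparseArray objects mirrors these base semantics; the tiny dimensions below are just for demonstration):

```r
## Illustrate the aperm()/colMeans() idea on a small *dense* array.
## dims: G x N x k; we want the element-wise mean across the k slices.
set.seed(1)
G <- 4; N <- 3; k <- 5
a <- array(runif(G * N * k), dim = c(G, N, k))

b <- aperm(a, c(3, 2, 1))        # k x N x G: put k first
m1 <- colMeans(b, dims = 1)      # collapse the k dimension -> N x G matrix

## Same values as the apply() form from earlier in the thread (transposed):
m2 <- apply(a, c(1, 2), mean)    # G x N
stopifnot(all.equal(m1, t(m2)))
```

The point of putting k first is that colMeans(x, dims = 1) collapses exactly the first dimension, which is also the layout that lets the sparse implementation stream through each column of values.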

2023-06-01

Aedin Culhane (13:01:08): > if you would like to contribute (either in-person in Dublin or remotely) to the Elixir All Hands workshop “Opportunities for Bioconductor and ELIXIR communities to co-develop training infrastructure” on June 6th 11-12.30pm GMT please let @Maria Doyle know or comment in the #elixir channel

2023-06-05

Shila Ghazanfar (01:00:12): > Hi TAB, over the weekend there was a post to bioc-devel on ‘BioConductor package vulnerabilities to R-spatial evolution process’ from Prof Roger Bivand flagging several Bioconductor packages having strong dependencies on the sp package, and Roger highlighted this issue on MoleculeExperiment as a most recent example: https://github.com/SydneyBioX/MoleculeExperiment/issues/1 In the specific MoleculeExperiment case, we actually already implemented a superseded function that does not use sp, so I expect this to be quite a straightforward fix (remove the old/obsolete function, remove the dependency), but I think it warrants some further discussion. > > Personally, I wasn’t so aware of the history and interplay between these packages, and I imagine other developers may not be either. I’d be very happy to discuss potential solutions, thanks in advance,

Kasper D. Hansen (10:42:04): > So they are essentially working on replacing the backend of sp. The way they have chosen to do this is to leave it up to downstream users to use functions which don’t depend on the retired dependencies. I was aware this was going on, but not so clear on the details and the various blog posts.

Kasper D. Hansen (10:42:45): > However, it might be useful to implement the environment variable `_SP_EVOLUTION_STATUS_=2` in our build system

Kasper D. Hansen (10:43:03): > This will flag uses of the deprecated packages
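For context, a minimal sketch of how a builder might opt in to the r-spatial evolution status before running checks (the variable name `_SP_EVOLUTION_STATUS_` and its levels come from the r-spatial evolution blog posts; the package tarball name here is hypothetical):

```python
import os

# Level 2 of the r-spatial evolution status makes sp delegate to sf instead of
# the retired rgdal/rgeos/maptools, so checks surface lingering dependencies.
# Levels (per the r-spatial blog posts): 0 = business as usual, 1 = test, 2 = evolved.
env = dict(os.environ, _SP_EVOLUTION_STATUS_="2")

# A builder would then run something like (tarball name hypothetical):
cmd = ["R", "CMD", "check", "MoleculeExperiment_1.0.0.tar.gz"]
print(" ".join(cmd), "with _SP_EVOLUTION_STATUS_ =", env["_SP_EVOLUTION_STATUS_"])
# subprocess.run(cmd, env=env)  # not executed here: requires R and the tarball
```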

2023-06-06

Aedin Culhane (11:22:40): > Citeable workflow hub (DOI) - also includes curated collections of workflows. Should Bioc workflows have a presence? https://workflowhub.eu

Aedin Culhane (11:30:48): > Data DOI - https://datacite.org/value.html - Attachment (datacite.org): DataCite’s Value > DataCite -

2023-06-08

Andres Wokaty (16:05:03) (in thread): > If I understand the blogs correctly, this will force sp to use sf rather than rgdal. We do have some packages that list sp as a dependency; however, we don’t currently have rgdal installed on the builders, so I don’t think we’ll see any benefit from adding it since rgdal is not being used. This should also become the default behavior sometime this month. > > I did find 3 packages that import or list as a dependency rgeos or maptools. I have sent emails to the maintainers that these packages will be archived by CRAN in October.

Kasper D. Hansen (20:30:09) (in thread): > Ah, well, that solves it

Kasper D. Hansen (20:30:33) (in thread): > (This is also my understanding)

2023-06-12

Michael Love (07:29:27): > Eventually I would like to make a PR to the website proposing some edits to the “Documentation” box which appears in various places, before I do that though just posting this again as a google doc summarizing some edits: https://docs.google.com/document/d/1DPv6xHn18avSDLHK57iYqktFzV7LwkpOpKgF34-AHno/edit

Mike Smith (07:34:32) (in thread): > I wonder if this would be good also posted to #biocwebsite since @Maria Doyle and the working group are actively engaging on the potential design at the moment.

Maria Doyle (07:41:37) (in thread): > Yes please do post that there!

2023-07-06

Alex Mahmoud (13:06:58): > @Kasper D. Hansen list of possible VMs re last question on GPUs: (source: https://docs.jetstream-cloud.org/general/vmsizes/) - File (PNG): image.png

2023-07-26

Helena L. Crowell (08:18:59): > @Helena L. Crowell has joined the channel

2023-08-31

Vince Carey (10:47:27): > We have a serious concern about packages with lots of downstream dependencies for which the developer emails do not resolve. TAB members, please think this over.

Vince Carey (10:53:31): > Also – consolidation of communication channels. Tech solutions? AI approaches to scan, integrate, prioritize messages to and within the project?

2023-09-07

Vince Carey (11:12:56): > TAB members who do not have a zoom link for today’s meeting, or a slide link for today’s meeting, should DM me.

2023-09-13

Levi Waldron (08:45:34): > I presented two lightning talks at the NumFOCUS summit this week. My overall impression is that it will be a great organization to be a part of, because 1) they’re open-source enthusiasts and good people who have a lot to offer to help Bioconductor grow and do more things, and 2) it brings together an extremely impressive group of software projects with expertise that we will benefit from and developers who are enthusiastic about sharing and collaboration. These were my talks: > * Growth and Governance of Bioconductor > * Bioconductor Developer Experience

Sean Davis (09:31:50): > This was a good poke that at some point, we might want to review and, where we can, standardize the community standards for each repo: https://github.com/Bioconductor/bioconductor_docker/community - Attachment: Attachment > FYI there seems to be an issue in the Dockerfile when downloading Renviron.site through unsecured HTTP. I’ve opened a PR for the devel Docker image (https://github.com/Bioconductor/bioconductor_docker/pull/85), although it wasn’t clear to me whether direct contributions were welcome or not?

Mike Smith (09:54:59) (in thread): > Maybe this is more (or at least jointly) appropriate for the CAB? Having a CONTRIBUTIONS file in each official repo would be a positive move in my opinion.

2023-09-15

Leo Lahti (04:56:51): > @Leo Lahti has joined the channel

2023-09-21

Aedin Culhane (08:53:29): > We will have a brief (10 min) talk on Bioconductor governance (TAB/CAB/Core/working groups) at EuroBioc2023 tomorrow at 11am CEST: https://eurobioc2023.bioconductor.org/schedule/. We only have 10 min, so just enough time for a quick overview - Attachment (eurobioc2023.bioconductor.org): Schedule > Schedule

2023-09-22

Aakanksha Singh (05:16:13): > @Aakanksha Singh has joined the channel

2023-10-17

saskia (02:03:53): > @saskia has joined the channel

Anna Quaglieri (she/her) (02:04:03): > @Anna Quaglieri (she/her) has joined the channel

2023-10-23

Sean Davis (16:37:56): - Attachment: Attachment > Bioconductor has a long (and successful) history of requiring a single maintainer for each package. However, I wonder if we can and should consider other models. Github and other social coding platforms encourage a much broader definition of software development, including shared responsibility. PR reviews, discussions, issues, projects, blame, and even automated CI/CD enable and encourage the concept of a team of developers that can be structured according to project needs and (local) community governance and engagement. > > I’m curious to hear what others feel about this. Cross-posting to TAB as well.

Lori Shepherd (16:51:53) (in thread): > The hard part is that R CMD (build, I think, but it may be check) enforces this. So really it also comes from R core

Hervé Pagès (16:52:40) (in thread): > Aligning with CRAN on this. Like CRAN, we also require that the maintainer be a real person and not a mailing list or alias that redirects to a group of people (and we’re failing on this with our use of maintainer@bioconductor.org for many core packages).

Lori Shepherd (16:53:01) (in thread): > At least as far as what is explicit in the DESCRIPTION. > And we do grant exceptions, such as extra push access, by request

Hervé Pagès (16:56:15) (in thread): > Note that CRAN/Bioconductor single maintainer policy is not incompatible with collaborative software development on GitHub or other social coding platforms. But someone needs to be our reliable point of contact.

Henrik Bengtsson (17:05:18): > @Henrik Bengtsson has joined the channel

Henrik Bengtsson (17:12:17) (in thread): > I think there should be one maintainer who’s responsible (the package’s Authors@R person with role = "cre"), but that maintainer can then distribute that responsibility however they’d like. Shared responsibility increases the risk of no one being responsible. > > Technically, R CMD build will give an error if you specify more than one maintainer (i.e. two or more person():s with role = "cre"), e.g. > > $ R CMD build teeny > * checking for file 'teeny/DESCRIPTION' ... OK > * preparing 'teeny': > * checking DESCRIPTION meta-information ... ERROR > Authors@R field gives more than one person with maintainer role: > Alice <alice@example.org> [cre] > Bob <bob@example.org> [cre] > > See section 'The DESCRIPTION file' in the 'Writing R Extensions' manual. > $ > > If the discussion is about git push rights, then I think that can be provided in other ways, while still having a single maintainer as the requirement. Maybe there’s another role that can be used for this? From ?person, the supported role:s are defined in <https://www.loc.gov/marc/relators/relaterm.html>. FWIW, role = "cre" is defined there as: > > “Creator [cre]: A person or organization responsible for the intellectual or artistic content of a resource.”

Hervé Pagès (17:15:20) (in thread): > The fact that R Core/CRAN don’t make the distinction between Creator and Maintainer has always puzzled me!

Henrik Bengtsson (17:31:05) (in thread): > Have you asked? My guess is that the term “maintainer” is something R had way before the formal concept of “role” was introduced, and “creator” is the best role for what we mean by an R package maintainer.

Hervé Pagès (18:42:10) (in thread): > That’s going to be a hard sell with me. Why would an R package be any different from other pieces of software? Nobody ever talks about contacting the creator of a package when they actually mean the maintainer. I don’t know what the corresponding terms would be for the on-going action of maintaining a package (“creating”?, like in “I’ve been creating this package for the last 5 years”), or for taking care of the maintenance of a package (taking care of the “creation”?). You create once, and you maintain forever. These are 2 very different things. > Creator is more like “original author” to me.

Stevie Pederson (20:39:07): > @Stevie Pederson has joined the channel

2023-10-24

Johannes Rainer (02:04:58) (in thread): > I agree with Herve - but just also adding some more info regarding the original comment: > > For our RforMassSpec packages we have one developer (the “maintainer/creator”) responsible for pushing updates to Bioconductor. For the development itself through GitHub we have a maintainer “team” (Laurent, Sebastian and myself) and we require reviews for all pull requests from at least one other developer (this can be one of the above “core” developers, or another “author” that contributed PR-related code to the respective package). > > That approach surely slows development down sometimes (depending on availability of the others), but it also ensures that the code is consistent and that more than just the original code developer knows about the code/functionality he/she provided.

2023-10-26

Daniel Niiaziiev (15:51:31): > @Daniel Niiaziiev has joined the channel

2023-10-29

Heather Turner (11:48:23): > @Heather Turner has joined the channel

2023-10-31

Henrik Bengtsson (22:00:37): > Q: Has there been a discussion around hosting packages in the package repo that cannot be installed, because they are broken? For example, https://www.bioconductor.org/packages/DeepBlueR/ is broken on all platforms. It neither builds nor installs, so it won’t even make it to the checks. Yet, it’s distributed via available.packages(repos = "https://bioconductor.org/packages/3.18/bioc"). To me it would make more sense to never serve a package via the repo if it cannot be installed.

Henrik Bengtsson (22:01:43) (in thread): > To clarify, it can still live in the Bioconductor git repo, and still have a package webpage, but I don’t think it should be in the package repo.

2023-11-01

Hervé Pagès (01:58:13) (in thread): > In that case it seems that it would make sense to prune DeepBlueR from the public repo since it has not built on any platform for a long time. Plus it is officially deprecated. > Then there’s the question of packages that are included in the release, that are all green on the daily report, but that suddenly can no longer be installed because they depend on a CRAN package that disappeared. This happens more often than we’d want these days. Should we also remove these packages from the public repos, even though they’ve been officially released? This would mean “unreleasing” them. Would that still be ok?

Hervé Pagès (11:00:45) (in thread): > On that topic I think that packages getting removed from CRAN despite being dependencies of the current BioC release is a serious issue. The CRAN folks are prompt to contact us to complain when a change in a Bioconductor package breaks a CRAN package (they have a significant number of packages that depend on us), but OTOH they will not hesitate to remove a package that we depend on, without consulting or notifying us. Maybe someone could approach them to discuss a less disruptive approach?

Henrik Bengtsson (13:40:57) (in thread): > > Then there’s the question of packages that are included in the release, that are all green on the daily report, but that suddenly can no longer be installed because they depend on a CRAN package that disappeared. This happens more often than we’d want these days. Should we also remove these packages from the public repos, even though they’ve been officially released? This would mean “unreleasing” them. Would that still be ok? > Yes, I think so. If a package cannot be installed, because it has errors itself, or its dependencies don’t install, there’s no point in serving the package via the repo’s PACKAGES file. One reason being (install.packages() or BiocManager::install() doesn’t really matter): > > > BiocManager::install("DeepBlueR") > Warning message: > package 'DeepBlueR' is not available for this version of R > > A version of this package for your version of R might be available elsewhere, > see the ideas at https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages > > is instant and clearer than: > > > BiocManager::install("DeepBlueR") > ... long, windy installation of dependencies, and compilations, until all of a sudden ... > > Execution halted > > ERROR: lazy loading failed for package 'DeepBlueR' > * removing '/home/henrik/R/ubuntu22_04-x86_64-pc-linux-gnu-library/4.3-CBI-gcc11/DeepBlueR' > > The downloaded source packages are in > '/tmp/henrik/Rtmp1BHET7/downloaded_packages' > Warning message: > In install.packages("DeepBlueR") : > installation of package 'DeepBlueR' had non-zero exit status > Old packages: 'evaluate', 'RcppArmadillo', 'xfun' > Update all/some/none? [a/s/n]: > ... > > I think the main question is how soon a package should be removed from the repo. CRAN has a two-week policy for fixing check errors. After that, the package is archived, such that it is not listed in the repo’s PACKAGES file.
The easiest would be to have the same time limit on Bioconductor - because “same expectations everywhere”. A possible advantage with Bioconductor is that the automatic archival of a package could be suspended if there’s a new version sitting in the git repo waiting to be built and checked. > > Side effects of archiving a package > When a package is archived, the only reasonable action is to also archive its hard reverse dependencies. Packages with a soft dependency (via Suggests:) can still depend on an archived package. > > I think the gist is that it should be possible to “trust” a package repo and its PACKAGES database - anything served that way is validated, with a contract saying “these packages can be installed [on at least one platform]”.

Hervé Pagès (14:25:56) (in thread): > Note that Windows and Mac users can actually install and load DeepBlueR so I’m worried that “unreleasing” the package would be a cure that is worse than the disease.

Hervé Pagès (14:34:22) (in thread): > More generally speaking, I have mixed feelings about the idea of unreleasing packages. It seems to go against our concept of releases. CRAN doesn’t have this concept; packages come and go at any time. However, in our case the idea has always been to touch a release as little as possible, and only when strictly necessary.

Henrik Bengtsson (16:07:44) (in thread): > I think “publish” and “archive” might be better terms to use here. I see CRAN (Comprehensive R Archive Network) providing a few “services” to the community. > > FOREVER ARCHIVE: > The first one is that it publishes packages and versions of them until the end of time. When a package has been published on CRAN, it takes a lot for it to be removed from there. I don’t know if it ever happened, but I can imagine a package can be fully removed if it was illegally published in the first place (e.g. copyright, illegal content, …) or malicious. > > INSTALLATION SERVICE: > Then CRAN also provides an R package repository service for installing packages on CRAN using built-in R functions. The set of packages in the package repo is a subset of all packages on CRAN. The CRAN package repo makes a promise that all packages listed in PACKAGES can be installed. If they cannot make that promise, they’ll archive the package (= remove it from PACKAGES). I should also say, install.packages(url) can be used to install from the set of packages that are archived. Technically, old package versions are always archived. > > CHECK SERVICE: > The content of the R package repository is guided by the CRAN package checks that run on R-oldrel, R-release, and R-devel across multiple platforms. The minimal requirement is that no package should remain in the package repository if the checks detect ERRORs (and those errors are not due to recently introduced bugs in R-devel). WARNINGs can also cause a package to be archived, but that process often takes longer. AFAIK, NOTEs are not a cause for a package being archived (but I could be wrong). The CRAN incoming checks, which you have to pass when you submit a new package, or an updated version, will make sure that the published package passes with all OKs. (It’s possible to argue for NOTEs being false positives, or for them not to be fixed, but that requires a manual approval by the CRAN Team.)
> > If you look at Bioconductor the same way, I argue that the INSTALLATION SERVICE of Bioconductor should not let broken packages hang around in the Bioconductor package repository unnecessarily long. Those packages could be archived, just like on CRAN. That would mean archiving under https://bioconductor.org/packages/3.18/bioc/src/contrib/Archive/<pkg>/*.tar.gz. (BTW, Bioconductor doesn’t have such a subfolder, correct? And there is no way to get all source tarballs that ever existed for a specific Bioconductor version, i.e. only the latest version released is available as a source tarball. Everything else has to be re-engineered from the git repo. Or?)

Hervé Pagès (17:03:32) (in thread): > Yes we do: https://bioconductor.org/packages/3.18/bioc/src/contrib/Archive/ > > The CRAN package repo makes a promise that all packages listed in PACKAGES can be installed > We don’t, and we can’t (and I’m not sure they can do that either). > > If you look at Bioconductor the same way > Again, Bioconductor is not CRAN. A major difference is that we have the notion of a release, and they don’t. So we only mimic what they do to a certain extent. The goal was never to copy their model, otherwise we wouldn’t bother to host and distribute packages via our own CRAN-style repos in the first place. > > For the DeepBlueR example, I’m not even convinced that the outcome of BiocManager::install("DeepBlueR") becomes that much better if we “unrelease” the package than if we don’t. With the latter the Linux user gets an installation error (the Windows and Mac users are fine); with the former everybody gets a message that tells them that the package “is not available for this version of R”. Both are frustrating and unsatisfying, but I’m not sure which one is less confusing. At least the latter has the merit of making it clear that the package itself is broken, and that it’s not that somehow we forgot to include the package in the release or that they mistyped the package name. > Furthermore, when the CRAN folks start breaking the 3.18 release by removing packages, I’d rather keep the now broken BioC packages available. I’d rather have the user feel the pain of having packages removed from CRAN, and blame the CRAN folks or the CRAN package maintainers for that, rather than blaming us for removing a package from the release. Evil, I know! :smiling_imp:

Henrik Bengtsson (19:16:29) (in thread): > > The CRAN folks are prompt to contact us to complain when a change in a Bioconductor package breaks a CRAN package (they have a significant number of packages that depend on us), but … > Do you have examples that you can share? I’m trying to better understand the CRAN-Bioconductor relationship. > > OTOH they will not hesitate to remove a package that we depend on, without consulting or notifying us. Maybe someone could approach them to discuss a less disruptive approach? > FWIW, when a package is about to be archived on CRAN, they email maintainers of CRAN packages that are affected. I think those emails go out at the same time the maintainer of the to-be-archived package is notified, i.e. at least two weeks in advance. > > If I understand you correctly, they will not provide the same heads up for reverse dependencies on Bioconductor. Instead, from those Bioconductor package maintainers’ point of view, those CRAN packages just disappear all of a sudden without notice. Do I understand it correctly?

Hervé Pagès (19:39:28) (in thread): > > If I understand you correctly, they will not provide the same heads up for reverse dependencies on Bioconductor. > No heads up AFAIK. Maybe the maintainers of the affected Bioconductor packages are notified, I don’t know. But the core team is not. Being notified would be nice, and better than nothing I guess, but that would still not really address the issue that Bioconductor packages become uninstallable in release.

Henrik Bengtsson (20:07:00) (in thread): > > But the core team is not. > So one actionable proposal to CRAN could be to ask for a generalized mechanism for such notifications that anyone can subscribe to. For instance, https://llrs.dev/post/2021/12/07/reasons-cran-archivals/ mentions https://cran.r-project.org/src/contrib/PACKAGES.in, which is a file that uses X-CRAN-Comment as follows: > > Package: ZIPG > X-CRAN-Comment: Archived on 2023-10-29 as requires archived package 'optimr'. > > Maybe one could ask CRAN to update that file to also show packages scheduled to be archived, e.g. > > Package: foobar > X-CRAN-Upcoming: To be archived on 2023-11-13 as requires archived package 'optimr'. > > That way Bioconductor, and anyone else, can monitor upcoming events. > > This might be something that the R Consortium ‘Repositories’ working group could help look into.
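PACKAGES.in is in DCF (Debian-control) format, so a monitoring service could scan it with a few lines of code. A minimal sketch in Python (note the X-CRAN-Upcoming field is Henrik's hypothetical proposal, not an existing CRAN field; continuation lines are not handled):

```python
# Parse a DCF-style PACKAGES.in fragment into one dict per blank-line-separated record.
def parse_dcf(text):
    records = []
    for block in text.strip().split("\n\n"):
        rec = {}
        for line in block.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                rec[key.strip()] = value.strip()
        records.append(rec)
    return records

sample = """Package: ZIPG
X-CRAN-Comment: Archived on 2023-10-29 as requires archived package 'optimr'.

Package: foobar
X-CRAN-Upcoming: To be archived on 2023-11-13 as requires archived package 'optimr'.
"""

# Packages already archived vs. (hypothetically) scheduled for archival.
archived = [r["Package"] for r in parse_dcf(sample) if "X-CRAN-Comment" in r]
print(archived)  # ['ZIPG']
```

In practice one would fetch the real file over HTTP and filter for records mentioning Bioconductor dependencies.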

Henrik Bengtsson (20:10:20) (in thread): > cc/@Lluís Revilla

Hervé Pagès (21:21:38) (in thread): > Reminds me of the “extra” package repo we had in the early days of the project in addition to the “software”, “data-annot”, “data-exp”, and “workflows” repos. E.g. https://bioconductor.org/packages/2.0/extra/src/contrib/PACKAGES in BioC 2.0, or https://bioconductor.org/packages/3.5/extra/src/contrib/PACKAGES in BioC 3.5. > > We used it to host/redistribute packages not available on CRAN, like packages from the Omega Project (https://www.omegahat.net/), or CRAN package binaries that for some reason CRAN wouldn’t build. All this to support some Bioconductor packages that relied on these packages. Was kind of hacky but handy. We got rid of it in 3.6. > > Anyways, if we had this, we could use it to host those packages that get removed from CRAN and that the BioC release depends on, granted of course that they don’t get removed for breaking the law. > > Another approach maybe would be to modify BiocManager::install() so that it’s able to find dependencies in the CRAN Archive, but I don’t know how feasible that is or how hard that would be.

2023-11-03

Lluís Revilla (09:40:10) (in thread): > Thanks for the ping @Henrik Bengtsson, I was already following the first messages :smile:. Yes, the X-CRAN-Comments is something we (the working group) would like to discuss/could ask CRAN about, especially since it seems that a new (important) repository might exist soon (~years). Some time ago I asked for feedback from the community for a notification service (for CRAN checks and other issues: https://fosstodon.org/deck/@Lluis_Revilla/111149198801072808). I might start working on it. BTW, thanks also for the summary of CRAN services; I want to start a series of posts on (R) repositories.

2023-11-06

Sean Davis (14:22:18): > In case folks here didn’t know, Wes McKinney of pandas, arrow, ibis, etc. is joining Posit. https://wesmckinney.com/blog/joining-posit/

2023-11-14

Federico Marini (14:55:51): > @Federico Marini has joined the channel

2023-11-30

Aedin Culhane (08:31:54) (in thread): > I thought he had been working with them for a while, back in 2018 at Ursa Labs

Aedin Culhane (08:32:47) (in thread): > Ursa became Voltron: https://voltrondata.com - Attachment (Voltron Data): The Leading Designer and Builder of Enterprise Data Systems > The new way to design and build composable data systems.

Aedin Culhane (08:33:01) (in thread): > Think he is still involved with Voltron

Aedin Culhane (08:35:45) (in thread): > @Sean Davis would be delighted if you could explain what it is??

2023-12-01

Sean Davis (11:32:27): > https://datascience.cancer.gov/news-events/news/sharing-cancer-research-software-nih-wants-hear-you - Attachment (datascience.cancer.gov): Sharing Cancer Research Software? NIH Wants to Hear from You! | CBIIT > Are you working with source codes, algorithms, workflows, and other software in your cancer research? NIH wants to hear from you! Respond today to help NIH develop new best-practice guidelines.

2023-12-03

Simple Poll (11:01:03): > @Simple Poll has joined the channel

Simple Poll (11:01:12): > [poll content not captured by the export: unsupported block types (section, actions, context)]

Vince Carey (13:05:17) (in thread): > so while not “using” it, I have fired it up and added some extensions - R and Python. How do we know if we are running up a bill?

Sean Davis (13:24:09) (in thread): > https://github.com/features/copilot

Sean Davis (13:24:46) (in thread): > Flat rate at $10/month.

2023-12-05

Federico Marini (03:33:50) (in thread): > Isn’t it free for Edu accounts?

Sean Davis (13:45:55) (in thread): > Yep.

Sean Davis (13:46:41) (in thread): - File (PNG): image.png

2023-12-07

Lori Shepherd (14:37:56): > https://community-bioc.slack.com/archives/C020U8LU59P/p1701977296231029 - see voting on things I wanted input on – feel free to make comments in threads for them - Attachment: Attachment > I’ll post all the above pinned items individually here – please use thumbs up for agree/move forward, thumbs down for no – and then comment in a thread any comments/remarks

2024-01-05

Sean Davis (15:08:16): > I looked into the availability of citation files. Of the 2200 software packages, there are 647 unique DOIs available. There are 1199 packages with a CITATION file with at least a title (so, many do not have DOIs). The distribution of citations per package is: > > 1 2 3 4 5 6 7 > 765 119 31 15 2 2 3 > > So, 765 packages have only a single citation in the CITATION file while 3 have 7 citations in the CITATION file. > > To get this information, I: > 1. grabbed the biocPkgList for devel > 2. downloaded the source tarball and extracted the CITATION file, if it exists > 3. read the CITATION file with readCitationFile > 4. collapsed the resulting list of bibentries into a single bibentry list > 5. for completeness, created a bibtex file for all the citations, where the bibtex has a “pkg” key added
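The distribution shown above is just a frequency table over per-package citation counts. A minimal sketch of that tabulation step (the per-package counts here are made up; the real input comes from parsing each package's CITATION file):

```python
from collections import Counter

# Hypothetical number of citation entries found in each package's CITATION file.
citations_per_pkg = {"pkgA": 1, "pkgB": 1, "pkgC": 2, "pkgD": 7, "pkgE": 1}

# Frequency table: how many packages have exactly n entries in their CITATION file.
dist = Counter(citations_per_pkg.values())
for n in sorted(dist):
    print(n, dist[n])
# 1 3
# 2 1
# 7 1
```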

Sean Davis (15:17:32) (in thread): > I’ll pull the citations themselves and do some basic bibliometrics as the next step, but some quick hand-checks show that the citation counts are really astonishing for some of these packages.

Sean Davis (15:26:46) (in thread): > Here are some summary statistics:https://gist.github.com/seandavi/001c51c87cc24a4bf93fa648d5581f10

Sean Davis (15:58:11) (in thread): > And of course, you can make these look like anything you want, but here is a keyword co-occurrence network that does start to hint at some structure. - File (PNG): image.png

Kasper D. Hansen (16:35:29) (in thread): > So one next step would be to follow up with authors on this. For example, most (all) of my packages have CITATION files, but I am not sure they have been kept up to date.

Kasper D. Hansen (16:36:00) (in thread): > We should wrap your code into a package. Perhaps grab CITATION from git as opposed to downloading the tarball

Michael Love (19:43:12) (in thread): > This is awesome Sean. The co-authorship graph would be interesting, as these papers tied to packages include non-developers. That adds information to the graph that isn’t present from what we have from the Authors@R. Also the journal could be useful for clustering packages

Michael Love (19:43:29) (in thread): > Some of the bibliometrics would be useful for an automated report for the project

Michael Love (19:44:29) (in thread): > Also looking, at the project level, at what journals/institutes/fields we are getting citations from.

Michael Love (19:46:49) (in thread): > We should also make a push to have all packages contain a CITATION. BiocCheck checks the validity of CITATION, but does it require it?

Sean Davis (20:34:59) (in thread): > I have started a package, but I think we may need to do some offline work to get data resources into place. While we can pull stuff from APIs, when citation counts get into the 6 digits, this is very slow, so I’ll get some automation in place.

Henrik Bengtsson (21:12:30) (in thread): > For my clarification: You’re after packages that don’t cite articles related to the package or the method it implements, correct? (I don’t think CITATION should be used to point to reverse dependencies, that is, articles that make use of the package.)

Sean Davis (22:11:57) (in thread): > You are correct, Henrik. We have the list of packages with CITATION files and the contents of those. Some of those CITATION files do not contain identifiers that are useful for automated metadata extraction. > > WRT your comment about “reverse dependencies”, there are no cases I’ve seen where folks confused the meaning/intent of the CITATION file, at least when looking by eye. We can do some digging to look for outliers, though.

2024-01-08

Lluís Revilla (04:10:40) (in thread): > So almost half of the packages are not published in an article? (Or the authors forgot to add the article.) Is there something the Bioconductor F1000Research gateway could help with? I.e., sending a reminder to those maintainers to see if they forgot to add an article, and if there isn’t one, to publish there if they want and can?

Sean Davis (07:02:47) (in thread): > @Lluís Revilla yes, we’ll be doing a “reminder” campaign to get citations for packages. I’m not surprised to see that many packages are unpublished in the literature, though.

Michael Love (08:10:09) (in thread): > “BiocCheck checks the validity of CITATION, but does it require it?” > > I take back “require” — I think BiocCheck should have a light message like > > “The package is missing a CITATION file. Bioconductor packages are not required to have a CITATION file but it is useful both for users and for tracking Bioconductor project-wide metrics. If you later post a preprint or publish a paper about your Bioconductor package, please add the details with DOI to the CITATION file.”

Kasper D. Hansen (08:52:44) (in thread): > I agree with Michael here

Kasper D. Hansen (08:53:31) (in thread): > But the bigger task ahead is to somehow start the process of following up on the existing CITATION files. Some amount of work, but I think it would be worthwhile for the project

Kasper D. Hansen (08:53:44) (in thread): > Assuming we care about bibliometrics (which I think we should)

Sean Davis (08:56:36) (in thread): > Concerning BiocCheck, anyone willing to make a PR here: https://github.com/Bioconductor/BiocCheck?

2024-01-09

Marcel Ramos Pérez (11:39:44) (in thread): > I can add this to BiocCheck and also include a check for DOI in the CITATION data

Henrik Bengtsson (11:56:02) (in thread): > > … also include a check for DOI in the CITATION data > Two quick comments, which you might already be aware of/have thought about: > 1. R CMD check --as-cran ... validates DOIs (curl lookup etc.). You might be able to leverage that via some _R_CHECK_... settings > 2. Not all citations have DOI:s - how to deal with those?

Sean Davis (12:58:25) (in thread): > For citations that do not have DOIs, we can perform a metadata search (roughly, fuzzy match title, author, and journal). CITATION files are just the “raw data” that supply the input to the ETL that leads to a useful data project for bibliometrics.
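The fuzzy-matching step for DOI-less entries can be sketched with the standard library; difflib here is a stand-in for whatever matcher the real ETL would use, and the candidate titles are just illustrative:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Case-insensitive similarity ratio in [0, 1]; 1.0 means identical strings.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# A CITATION title without a DOI, matched against candidate records
# returned by a bibliographic metadata API (titles illustrative).
query = "Moderated estimation of fold change and dispersion for RNA-seq data"
candidates = [
    "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2",
    "Limma powers differential expression analyses",
]
best = max(candidates, key=lambda c: similarity(query, c))
print(best[:20])  # Moderated estimation
```

A real pipeline would combine title similarity with author and journal matches before accepting a record.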

Sean Davis (13:00:04) (in thread): > In fact, some articles have multiple DOIs in “databases,” so not all DOIs will match to all bibliographic data resources (pubmed, openalex, etc.)

Sean Davis (13:00:48) (in thread): > These are all edge cases; starting with maximizing the population of CITATION files is the place to start.

2024-01-10

Sehyun Oh (10:52:37): > @Sehyun Oh has joined the channel

2024-01-22

Sean Davis (17:11:13): > After some discussion about wishes for the build system, I took advantage of @Vince Carey's budding historical archive of build reports. FYI, each build report is available for each build as a gzipped tarfile. However, these have not been maintained historically and are replaced with each successive build. The data in each tarfile are the same as those available on the website or via BiocPkgTools::biocBuildReport(). With a collection of a couple of hundred of these, captured historically, I pulled out all the relevant data from each and constructed a couple of parquet files (tables). I then loaded those into MotherDuck (duckdb, but as a server). > > At a higher level, the idea is to be able to track the build processes longitudinally and perform analytics over the builds. To give things a try, using duckdb (whatever flavor you want to use: R, python, CLI), start duckdb and then: > > ATTACH 'md:_share/bioc/b0d67cb0-cbf7-4d24-9f39-2fce1e8ea648'; > use bioc; > show tables; > select count(*) from build_reports; > describe build_reports; > select min(started_at), max(ended_at) from build_reports; > select distinct git_branch from build_reports; > describe propagate_statuses; > select array_agg(package) as affected_packages, count(*), propagate from propagate_statuses group by propagate order by count(*) desc; > > Here is what the two tables look like: > > D show build_reports; > ┌──────────────────────┬─────────────┬─────────┬─────────┬─────────┬───────┐ > │ column_name │ column_type │ null │ key │ default │ extra │ > │ varchar │ varchar │ varchar │ varchar │ varchar │ int32 │ > ├──────────────────────┼─────────────┼─────────┼─────────┼─────────┼───────┤ > │ package │ VARCHAR │ YES │ │ │ │ > │ version │ VARCHAR │ YES │ │ │ │ > │ command │ VARCHAR │ YES │ │ │ │ > │ started_at │ TIMESTAMP │ YES │ │ │ │ > │ ended_at │ TIMESTAMP │ YES │ │ │ │ > │ ellapsed_time │ VARCHAR │ YES │ │ │ │ > │ ret_code │ VARCHAR │ YES │ │ │ │ > │ status │ VARCHAR │ YES │ │ │ │
│ > │ package_file │ VARCHAR │ YES │ │ │ │ > │ package_file_size │ VARCHAR │ YES │ │ │ │ > │ machine │ VARCHAR │ YES │ │ │ │ > │ process │ VARCHAR │ YES │ │ │ │ > │ deploy_dest_dir │ VARCHAR │ YES │ │ │ │ > │ warnings │ VARCHAR │ YES │ │ │ │ > │ md5 │ UUID │ YES │ │ │ │ > │ maintainer │ VARCHAR │ YES │ │ │ │ > │ maintainer_email │ VARCHAR │ YES │ │ │ │ > │ git_url │ VARCHAR │ YES │ │ │ │ > │ git_branch │ VARCHAR │ YES │ │ │ │ > │ git_last_commit │ VARCHAR │ YES │ │ │ │ > │ git_last_commit_date │ TIMESTAMP │ YES │ │ │ │ > │ md5_right │ UUID │ YES │ │ │ │ > ├──────────────────────┴─────────────┴─────────┴─────────┴─────────┴───────┤ > │ 22 rows 6 columns │ > └──────────────────────────────────────────────────────────────────────────┘ > > D show propagate_statuses; > ┌──────────────────────┬─────────────┬─────────┬─────────┬─────────┬───────┐ > │ column_name │ column_type │ null │ key │ default │ extra │ > │ varchar │ varchar │ varchar │ varchar │ varchar │ int32 │ > ├──────────────────────┼─────────────┼─────────┼─────────┼─────────┼───────┤ > │ package │ VARCHAR │ YES │ │ │ │ > │ version │ VARCHAR │ YES │ │ │ │ > │ maintainer │ VARCHAR │ YES │ │ │ │ > │ maintainer_email │ VARCHAR │ YES │ │ │ │ > │ git_url │ VARCHAR │ YES │ │ │ │ > │ git_branch │ VARCHAR │ YES │ │ │ │ > │ git_last_commit │ VARCHAR │ YES │ │ │ │ > │ git_last_commit_date │ TIMESTAMP │ YES │ │ │ │ > │ md5 │ VARCHAR │ YES │ │ │ │ > │ platform │ VARCHAR │ YES │ │ │ │ > │ propagate │ VARCHAR │ YES │ │ │ │ > ├──────────────────────┴─────────────┴─────────┴─────────┴─────────┴───────┤ > │ 11 rows 6 columns │ > └──────────────────────────────────────────────────────────────────────────┘ > > I’ve excluded the actual log text from the duckdb tables, but those are available for each entry in the build_reports table (one per row).

Sean Davis (17:12:56): > If you have feedback on this work, let me know. @Vince Carey and I will continue to build and maintain an archive of the build reports. If folks find these tables a useful set of data to have, I can add new data regularly.

Lluís Revilla (18:05:10) (in thread): > If I understand this correctly, this is to store the different builds of packages? Or is this more about the check notes/warnings/errors?

Sean Davis (18:07:35) (in thread): > Not about builds of packages. Only build “metadata.” Just a tidy build report captured over time.

Lluís Revilla (18:28:01): > Hi Board: the CRAN team wants a new package maintainer to deprecate XML: https://stat.ethz.ch/pipermail/r-package-devel/2024q1/010359.html There are many important packages in Bioconductor that would be affected. I explored this here: https://llrs.dev/post/2023/05/03/cran-maintained-packages/ - File (PNG): image.png

Lori Shepherd (19:24:15) (in thread): > This might be good to post on the general channel and possibly on the bioc-devel@r-project.org mailing list to make the broader bioc community aware

Marcel Ramos Pérez (20:12:34) (in thread): > @Lluís Revilla Was a deprecation discussed elsewhere? I don’t see it in the r-pkg-devel email.

2024-01-23

Lluís Revilla (03:09:08) (in thread): > @Marcel Ramos Pérez Not directly, but they say that they want packages to use other XML parsers. I talked with some interested people and they don’t see a viable way to take over maintenance of the package. What’s the benefit for them? I asked a couple of questions on the r-pkg-devel mailing list that might help any potential new maintainer.

Lluís Revilla (03:14:06) (in thread): > @Lori Shepherd I wanted the TAB to be aware of this to decide how to support Bioconductor packages that depend on it, or whether Bioconductor would step up to maintain it. > I already posted it on r-pkg-devel; the TAB can forward that message to the community or provide its own message with a link to the post

Laurent Gatto (03:15:43) (in thread): > I see MSnbase on the graph. I depend on xml2 for more recent packages; it wasn’t around at the time. As far as MSnbase is concerned, I would either switch to xml2 or remove that functionality from the package if it is available in the more recent packages we work on.

Lluís Revilla (03:21:08) (in thread): > @Laurent Gatto Yes, the xml2 package maintainer, Hadley, is open to adding functionality to cover anything that is currently available in XML but not in xml2. > I am not sure which version of Bioconductor I used, probably 3.17 (it was in May when I wrote it); in that version it still depends on XML. Many packages depend on it indirectly.

Martin Morgan (05:39:24) (in thread): > I get > > $ duckdb > v0.8.1 6536a77232 > Enter ".help" for usage hints. > Connected to a transient in-memory database. > Use ".open FILENAME" to reopen on a persistent database. > D ATTACH 'md:_share/bioc/b0d67cb0-cbf7-4d24-9f39-2fce1e8ea648'; > Error: Invalid Error: RPC 'GET_WELCOME_PACK' failed: (UNAVAILABLE, request id: '8783ae0d-c648-496d-9cf3-7099addd2bc9') > > ?

Sean Davis (08:21:19) (in thread): > Apologies, Martin. Motherduck is still very much alpha software and that is where the “_share” is coming from. As an alternative, here are a couple of signed URLs (good through Friday) for parquet files with the same data. > * build_report (about 5.5GB): https://55bf7202fe14474e57a300f56a652f64.r2.cloudflarestorage.com/bioconductor/build_reports/build_report[…]5248894690119416f082c0e4fbb44703e6889a16 > * propagation_statuses (about 11MB): https://55bf7202fe14474e57a300f56a652f64.r2.cloudflarestorage.com/bioconductor/build_reports/propagate_st[…]9e2b7616d2a4fd82359ac3cbe719ee547537b24a > Download the links. The parquet files are just tabular data. For loading into R, see https://arrow.apache.org/docs/r/reference/read_parquet.html or use the duckdb R package. - Attachment (arrow.apache.org): Read a Parquet file — read_parquet > ‘Parquet’ is a columnar storage file format. > This function enables you to read Parquet files into R.

Frederick Tan (08:37:27): > @Frederick Tan has joined the channel

2024-01-27

Michael Love (07:36:24): > There is a thread on R-pkg-devel about possible malware in an archived (not current) package vignette for a CRAN package called poweRlaw. See the thread “[R-pkg-devel] Possible malware(?) in a vignette Colin Gillespie.” Apparently, if you build the archived package’s vignette directly it is fine. The source of the [potential] malware component is not yet clear, but it’s been flagged by security software and tested in a sandbox by a security firm. https://stat.ethz.ch/pipermail/r-package-devel/2024q1/thread.html#10402 https://cran.r-project.org/web/packages/poweRlaw/index.html

RGentleman (09:24:29): > it is most likely a false positive…

Michael Love (09:37:12): > i’m confused from the thread whether or not sandboxed activity was detected. the link says “17 security vendors and no sandboxes flagged this file as malicious”

Michael Love (09:39:09): > but then people in the thread are saying it does do suspicious things in a sandbox

Michael Love (09:39:28): > i can’t interpret the report from VirusTotal

Martin Morgan (11:58:40): > There seem to be two important issues. (1) What steps does Bioconductor take to protect its own infrastructure? (2) How does Bioconductor protect users from malicious software? For (1) one might think of sandboxing the build system (it is to some extent), scanning incoming files, OS best practices (e.g., staying current with OS and other software), running an active virus checker, …. For (2) the obvious vulnerabilities are distribution of malicious R / C / etc. code included in packages, and re-distribution of malicious binaries (of which user-contributed PDFs are one example; cf. here). - Attachment (code.bioconductor.org): Bioconductor Code: Search > Search source code across all Bioconductor packages

Michael Love (14:15:09): > yeah a reminder of the issues with PDF

Vince Carey (21:53:26): > The bioc core discussed this at some length last week. Thanks for checking in.

2024-02-01

Marcel Ramos Pérez (13:54:48): > I created the #wasm channel for anyone interested

2024-02-15

Vince Carey (05:23:25): > Do TAB members view CRAM support as a significant priority for bioc?

Vince Carey (05:36:43): > https://github.com/single-cell-data/TileDB-SOMA/ mentions AnnData and Seurat support. Do we want to ensure that SCE is supported?

Michael Love (07:54:39): > in IGVF, I’ve seen prioritization of AnnData and also MuData for single cell objects. i do think that SCE should be more prominent in the discussion, given how many downstream users want to use R (also genomic ranges are nice to have) https://mudata.readthedocs.io/en/latest/

Michael Love (07:59:40) (in thread): > i can’t say, i think you don’t see it that much because it’s a ~2x savings, and in the end paying for storage is maybe easier than paying the cost of working with new formats and extraction of information. you have to go to the reference and apply diffs to get a read sequence

Michael Love (08:00:32) (in thread): > but i don’t know if it has some future advantages with new data or working with new types of references

Michael Love (08:01:49) (in thread): > grey -> black - File (PNG): Screenshot 2024-02-15 at 8.01.43 AM.png

Vince Carey (08:20:17) (in thread): > Thanks Mike. Absent significant community input on this we will not be prioritizing CRAM support.

Kasper D. Hansen (12:49:55) (in thread): > I am not up to date on this, but I would think that CRAM is being supported through htslib?

Kasper D. Hansen (12:51:44): > Lacking SCE support for stuff like this is likely to be a deal breaker for us. If this approach takes off, lack of SCE support would be a major downside to using Bioc. Of course, if it takes off, we could potentially add it later

Kasper D. Hansen (12:52:10): > But basic SCE support should not be crazy hard for them to add, so asking them to plan it in from the start would not be crazy

Vince Carey (13:30:56) (in thread): > Yes, there is code in htslib to work with CRAM but exposing it to R has proven challenging and effort stopped some time ago. We could resume if prominent scientific use cases emerged in the bioc community. I don’t feel we have bandwidth to do it for the sake of completeness. Some use cases might be met through a reticulate interface to pysam.

2024-03-04

Lluís Revilla (17:03:38): > Comparison operators == and != won’t be allowed to be used on language objects: https://stat.ethz.ch/pipermail/r-devel/2024-March/083254.html

Henrik Bengtsson (19:34:20): > Regarding package dependencies dropping off CRAN because issues are not fixed in time: CRANhaven is a repository for recently archived CRAN packages (https://cranhaven.r-universe.dev/). The gist is that archived CRAN packages get to hang around for a little bit longer on CRANhaven. If not un-archived on CRAN within 4 weeks (*), they eventually get dropped from CRANhaven too. I prototyped this with Bioconductor in mind, but also for the bigger R community. (*) I just picked 4 weeks based on a gut feeling and to see what happens.

Lori Shepherd (20:04:43): > Yet if they are not fixed and disappear, wouldn’t it just delay the inevitable? And if they are fixed, they are back on CRAN, so there is no issue and everything behaves as normal.

Henrik Bengtsson (20:26:55) (in thread): > The idea is to give a cushion for when the maintainer does not have time to fix their CRAN package within the 2-4 week deadline that the CRAN Team gives them. > > In the two days I have had CRANhaven up and running, I’ve already seen a few packages being archived and unarchived on CRAN. (The commits will give more data over time on what fraction returns to CRAN and within what time span. There’s also https://dirk.eddelbuettel.com/cranberries/ one could query.) > > One idea could be to have BiocManager::install() append CRANhaven, to protect the end users from the “noise” of briefly archived CRAN packages. Another way to think about it is that CRANhaven serves users who did not install a specific package in time before it was archived - a little bit like a time machine.
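The "append CRANhaven" idea could be as small as an extra repos entry. This is a sketch of the concept from the thread, not current BiocManager behavior; the URL is the one Henrik posted.

```r
# Sketch: make recently archived CRAN packages installable by appending the
# CRANhaven universe to the repos option. Not what BiocManager::install()
# does today; just the idea being discussed.
repos <- c(getOption("repos"),
           CRANhaven = "https://cranhaven.r-universe.dev")
options(repos = repos)
# install.packages("somePkg")  # would now also look in CRANhaven
```

BiocManager could apply this only in release, per Hervé's later comment, so that devel still surfaces the pain of archived dependencies early.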

2024-03-05

Lluís Revilla (08:03:13) (in thread): > I looked into the data some time ago. There is no need to check the cranberries, as CRAN documents (most of) that process: https://llrs.dev/post/2021/12/07/reasons-cran-archivals/ - Attachment (B101nfo): Reasons why packages are archived on CRAN | B101nfo > Most frequent reason is due to the package not fixed on time, followed by depending on packages archived and policy violation.

Henrik Bengtsson (12:49:54) (in thread): > Very nice; I had forgotten about that report. There you write: > > “This suggests that once a package is archived maintainers do not make the effort to put it back on CRAN except on very few cases were there are multiple attempts. To check we can see the current available packages and see how many of those are still present on CRAN: > > | CRAN | Packages | Proportion | > | ---- | -------- | ----------- | > | no | 3869 | 64% | > | yes | 2183 | 36% | > > Many packages are currently on CRAN despite their past archivation but close to 64% are currently not on CRAN.” > > From this, it sounds like 36% of the archived packages return to CRAN. Is that the correct interpretation? If so, that by itself gives a strong argument for providing a cushion. Then it remains to find a good cutoff for how long a package should stay on CRANhaven, before giving up on it. I guess those stats can be pulled from https://cran.r-project.org/src/contrib/PACKAGES, because each archive/unarchive event has a datestamp.

Lluís Revilla (14:08:53) (in thread): > Yes, 36% of all packages archived returned to CRAN (when I created the post). As time goes this % will lower, and also it could mean that a package was archived, then returned and then was archived for good. The time they were archived could be calculated comparing the archive and current dates and the date when they were archived. This is relatively trivial to do and could provide some estimation for CRANhaven.

Hervé Pagès (19:13:12) (in thread): > Thanks Lluis for the heads-up. @Andres Wokaty We should probably consider updating R on the devel builders (it’s 7 weeks old!) and add _R_COMPARE_LANG_OBJECTS=eqonly to the Renviron.bioc file.

Hervé Pagès (19:23:23) (in thread): > The absolute bummer is when a package dropping off CRAN breaks the current release, so I’d be in favor of appending CRANhaven to BiocManager::install() in release. But maybe not so much in devel, where we want to see and feel the pain sooner rather than later so developers can start planning around it.

2024-03-06

Andres Wokaty (17:35:09) (in thread): > When I reread the post, it seems to say that == and != are allowed on language objects (at least for now) but that ordered comparisons on language objects using <=, <, >=, and > will raise errors soon as part of a larger transition.

Hervé Pagès (17:37:18) (in thread): > correct

Lluís Revilla (17:53:50) (in thread): > It says as a first step the ordered comparison operators will signal an error; I understood that in the future == and != won’t be allowed. > But I’ll stop posting summaries on these heads-ups.

Andres Wokaty (17:58:47) (in thread): > I’m sorry if my comment was discouraging. I need to understand the issue well so that I can send a bioc-devel email about it, as we will use the flag that’s mentioned. I do appreciate that you brought it up and summarized it.
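For concreteness, a minimal example (my own, not from the r-devel post) of the kind of comparison being discussed: ordered comparisons on language objects are the first to become errors, while identical() remains the robust way to compare calls.

```r
# Two syntactically identical calls, as language objects.
e1 <- quote(x + y)
e2 <- quote(x + y)

identical(e1, e2)   # TRUE; unaffected by the r-devel transition
# e1 < e2           # ordered comparison on calls: signals an error as the
#                   # first step of the transition described in the post
# e1 == e2          # still allowed for now (deparse-based comparison)
```

Code that relies on ==, !=, or ordered operators for calls can usually switch to identical() (or deparse() plus a string comparison when that is really what is meant).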

2024-03-07

Laurent Gatto (15:05:54): > Thank you @Henrik Bengtsson for bringing the release period and Bioc/CRAN discussions up. We didn’t discuss the Bioc 6-month vs R 1-year release cycles. I understand the reasons for having shorter cycles in Bioc, but this also comes with some added complexity (such as using R-devel only every second cycle). If there were ways to make using R-devel easier (for example docker containers, although I am not familiar with this on Windows; are R-devel binaries available, or only source?), switching to a 1-year cycle for Bioc may become a possibility? How would such a change impact our community: flexibility for developers, extra (or less) burden for the core group, impact on package maintainers, how easy/difficult to use new/devel packages, … Happy to continue this discussion here or next month.

Hervé Pagès (15:42:43): > Thanks Laurent. It’s good to separate the “6-month vs 1-year release cycles” discussion from the “publish new Bioconductor releases on CRAN” discussion. While I think the latter poses many challenges that would hurt us in many ways, it makes sense to put the former on the table. Just to provide some historical context: the project started with a 6-month period not only because at the time R was also having a new major release every 6 months, but also because: (1) the project was moving fast and all the innovations and improvements had to reach the community in reasonable time; (2) we have hundreds of annotation packages that used to get outdated pretty quickly (all the .db packages for microarrays), so it was important to update them for each release. This is why when R switched from a 6-month to a 1-year release cycle in 2012, we kept our 6-month cycles.

Laurent Gatto (15:47:47): > Thank you for the clarifications. I appreciate the pace of Bioconductor being a very good reason for a short release cycle. For the sake of discussion, assuming installing and using R-devel was easy enough, would the fast pace of Bioconductor still require a 6-month release cycle? Is the higher risk of breaking packages in R-devel a good enough reason against a 1-year release cycle?

Henrik Bengtsson (16:57:54): > I would love to have near-zero friction support for having Bioconductor release and devel installed in parallel, regardless of R version. I know how to work around this myself by modifying R_LIBS_USER on the fly. I think we should make such a setup as simple as possible for Bioconductor package maintainers, but also for end users. > > A major blocker for this is the tie-down to a specific version of R. It’s actually quite rare that there are major changes to base R forcing us to use a specific R version. I argue that Bioconductor shouldn’t go the extra mile that it currently does to make it hard for users to install current Bioc devel on, say, R oldrel. Instead, we should rely solely on the Depends: R (>= some version) feature to declare R version requirements and let R take care of the actual validation. > > By making it super easy for package developers to switch between Bioc versions, we increase the chances for patches and bug fixes to be backported, and for maintainers to stay up-to-date with both Bioc release and devel. I suspect that most maintainers stick with either of them today, because it’s so tedious to switch. I doubt Linux containers will lower this threshold; instead I think we want developers to be able to work in their own day-to-day R environment. > > It’s not super hard to have a BiocManager::useVersion() that allows us to switch between different package library paths. For instance, Bioconductor library paths could live under R_LIBS_USER, e.g. conceptually $R_LIBS_USER/_bioc-3.18/ and $R_LIBS_USER/_bioc-3.19-devel/. This would allow us to break free from the R-release/R-devel time-dependent kerfuffle. Bioc check validation can still be done against R-release/R-devel as done today, and, if wanted, also against multiple R versions if we had the cycles.
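A toy sketch of what Henrik's per-version library idea might look like. use_bioc_lib() is an invented name (not an actual BiocManager function), and the directory layout is just the conceptual one from the message.

```r
# Hypothetical helper: switch to a per-Bioc-version library path under a
# base directory (conceptually R_LIBS_USER). Invented for illustration;
# not an actual BiocManager API.
use_bioc_lib <- function(version, base = Sys.getenv("R_LIBS_USER")) {
  lib <- file.path(base, sprintf("_bioc-%s", version))
  dir.create(lib, recursive = TRUE, showWarnings = FALSE)
  # Put the version-specific library first on the search path.
  .libPaths(c(lib, .libPaths()))
  invisible(lib)
}

# Example with a throwaway base directory instead of the real R_LIBS_USER:
base <- tempfile("libs")
lib <- use_bioc_lib("3.19-devel", base = base)
basename(lib)
```

Packages installed after the switch would land in the version-specific library, so release and devel installations no longer clobber each other.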

Hervé Pagès (19:48:34): > > I argue that Bioconductor shouldn’t go the extra mile that it currently does to make it hard for users to install current Bioc devel on, say, R oldrel. > I’m not very optimistic that, say, BioC 3.18 (which we build and check on R 4.3) would work with, say, R 4.1, without major problems. It would actually not be too hard to force a BioC 3.18 installation on R 4.1. The installation itself might work, but there’s no guarantee that the code will run properly. Let’s try to collect some data on this. I’m going to use R 4.1 to run BUILD and CHECK on a small subset of BioC 3.18 packages, and will post a link to the report here. I’ll use the same subset of packages that I’m using for the BiocCheck builds here: https://bioconductor.org/checkResults/3.19/bioc-testing-LATEST/. > Now that was for backward compatibility of source packages. For backward compatibility of binary packages, things are going to be worse, since there’s no guarantee that a binary package built for R 4.3 will work with R 4.1. And in this case, when things go wrong, it’s usually a crash. Not pretty! > > By making it super easy for package developers to switch between Bioc versions etc… > Let’s discuss ways to make this super easy for developers, but hopefully without opening the can of worms of making it easy for end users to install the latest BioC on old versions of R. The guardrails implemented in BiocManager::install() by default are there for good reasons.

2024-03-08

Hervé Pagès (17:05:05): > Here is the report of using R 4.1.3 to BUILD/CHECK a small subset of 3.18 packages: https://bioconductor.org/checkResults/3.18/bioc-testing-LATEST/ Some notes about these builds: > * They use R 4.1.3, which is from 2022-03-10. R 4.1.3 was the latest release in the R 4.1 series. > * They run on a small subset of software packages (118 packages). > * They run on nebbiolo2, like the daily 3.18 builds. The only two differences between these builds and the daily 3.18 builds are: > > * the version of R that is used: R 4.1.3 for the former, and R 4.3.3 for the latter (4.3.3 was released 9 days ago); > * the set of packages that are built/checked > > * Here’s the daily 3.18 report for nebbiolo2: https://bioconductor.org/checkResults/3.18/bioc-LATEST/nebbiolo2-index.html > * The report for R 4.1.3 is not a blood bath, but it’s instructive to take a close look at packages that fail on this report and not on the daily 3.18 report. Which I did (see below). > * Of course we are not interested in packages that fail on both reports. That’s the case for example for GenomicFeatures, which fails for reasons that have nothing to do with the version of R being used (the UCSC folks broke it again last week :disappointed:). > Here are some interesting failures: > * GOSemSim: It does something like list(letters, LETTERS) |> do.call('rbind', args = _). This works in R 4.3.3 but not in R 4.1.3. With the latter it produces a parsing error. > * S4Vectors: Unit tests fail because as.POSIXlt.numeric(111) only works with recent versions of R. > * beachmat: Last year Aaron started to use C++17 features in his C++ code. However, for some mysterious reason R 4.1.3 wants to use the C++14 compiler when installing the package source tarball to build the vignette. Hence the compilation error. So maybe that’s a bug in R 4.1.3 that got fixed in 4.3.3, I’m not sure.
> So basically some packages are taking advantage of functionality introduced in new versions of R, or they do things that trigger bugs in older versions of R. We can’t blame them for that. Unfortunately, that makes them incompatible with older versions of R. > Note that using Depends: R (>= X.Y.Z) in these packages (which some do but others don’t) doesn’t solve anything, as it only makes the package impossible to install. You could say that it has the merit of being upfront. However, at the end of the day, someone trying to use BioC 3.18 with R 4.1.3 will be left with some packages that are either impossible to install or non-functional, or functional in appearance but returning wrong results (the worst-case scenario). These might be essential packages with dozens or hundreds of revdeps, direct or indirect.

Kasper D. Hansen (17:08:27): > I am wondering what @Henrik Bengtsson’s real use case is. I am suspecting that it is much less general and is really about being able to run bioc-stable and bioc-devel on the same R version

Kasper D. Hansen (17:10:07): > That should work out of the box 50% of the year (when there is only a minor version change in R). The other 50% of the year, it only fails if a package quite rapidly takes advantage of a feature in R-devel. Taking advantage of a new feature “somewhat immediately” is probably much less common than “eventually” (i.e., after a while).

2024-03-12

Erdal Cosgun (13:05:27): > @Erdal Cosgun has joined the channel

2024-03-17

Raihanat Adewuyi (20:42:59): > @Raihanat Adewuyi has joined the channel

2024-04-04

Hervé Pagès (11:43:22): > Putting my slides on changes to infrastructure packages here: https://docs.google.com/presentation/d/1d-ecvi3hqMSLeH9H41GGxlibZxD6nl3Lj3l2cTFcey4/edit?usp=sharing

Lluís Revilla (13:08:44) (in thread): > Is this for people outside the TAB? I tried to read it but it asks me to request access

Andres Wokaty (13:14:51) (in thread): > I think it will be publicly available. @Hervé Pagès needs to share access with anyone who has the link.

Hervé Pagès (13:42:03) (in thread): > oops, sorry! should work now

Lluís Revilla (14:20:54) (in thread): > Thanks! I could see it, and they look like great improvements!

2024-04-05

Laurent Gatto (02:13:33) (in thread): > Thanks Hervé. By the way, I also added a link in the meeting agenda. Hope that’s OK.

2024-04-12

Hervé Pagès (12:49:26) (in thread): > Sure, no problem. Thanks!

2024-04-15

Lluís Revilla (17:53:49): > CRAN maintainers would like to use the newly introduced function pkg2HTML to provide an HTML page for all the packages on CRAN (after the R 4.4 release). However, they want to handle “xrefs to BioC[onductor] targets”. Would it be possible for Bioconductor to serve HTML manual pages for all the (software) packages? That way CRAN could link to them and provide more user-friendly documentation, and Bioconductor HTML manuals could do the same for CRAN packages.

Vince Carey (20:25:35): > I just tried pkg2HTML out. Any suggestions for improving the appearance? I think we can take this on after 3.19 is released, but it may take some time. - File (PNG): image.png

Vince Carey (20:26:09): > For clarity this was done using rocker/r-ver:devel

2024-04-16

Lluís Revilla (01:45:07): > I assume you, @Vince Carey, are using Firefox. Update to R 4.4 and you’ll see that the appearance has improved a lot recently (I used rig, but building from source would work too). > Further changes in the output can be made via the CSS argument of the pkg2HTML function or by sending patches to R core.

Lluís Revilla (10:20:15) (in thread): > Which email would be best to follow up on this? I heard back from CRAN that there are still some issues to think about before starting to work on this.

2024-04-26

Henrik Bengtsson (14:34:19): > Q. Is there a reason why we don’t skip the automatic package version bumps when there have been zero changes? > > Take for instance CGHbase: the code base has not been changed since June 2018, yet it keeps getting bumped four times a year; CGHbase 1.42.0 -> 1.43.0 -> 1.44.0 -> … -> 1.62.0. This makes it really hard for users and developers who depend on the package to know whether there have been any updates. You basically have to inspect the code to know if something has changed. It also causes the NEWS file to become out of date, which makes news(package = "CGHbase") unreliable. If I see a package where NEWS is way behind the package version number, I indirectly question what other parts of the package are falling behind. > > On a related note: the automatic version bump, e.g. 1.1.1 -> 1.2.0, makes it impossible for the developer to have a NEWS file that is up-to-date when 1.2.0 is released. After the version has been bumped automatically, it is too late to update the NEWS file. So, if you want to have an up-to-date NEWS file that mentions 1.2.0, you have to create a dummy 1.2.1 version entry too, which just adds to the confusion for others. - Attachment (code.bioconductor.org): Bioconductor Code: CGHbase > Browse the content of Bioconductor software packages.

Marcel Ramos Pérez (15:00:49) (in thread): > It is not impossible; developers would just have to document their NEWS files in anticipation of the bump (e.g., write it with 1.2.0 in the heading). I agree that artificial version bumps should be reconsidered.

Lluís Revilla (16:17:53) (in thread): > Yes, recently I got a bug report in a forked repository because the package was “updated” in Bioconductor; the user was confused by my fork and wanted to prevent a bug before release

Hervé Pagès (17:53:02) (in thread): > > You basically have to inspect the code to know if something has changed. > or use git log, much easier :wink:

2024-05-06

Michael Love (14:44:31): > I was thinking about writing something about VignetteIndexEntry in the package contribution part about documentation: https://contributions.bioconductor.org/docs.html With multi-vignette packages, I noticed that developers often do not make use of numbering to put the vignettes in a logical order > > E.g. > 1. Getting started with foobar > 2. Plotting and data export with foobar > 3. Advanced analyses with foobar > instead most of the time these show up on the website in alphabetical order - Attachment (contributions.bioconductor.org): Chapter 12 Documentation | Bioconductor Packages: Development, Maintenance, and Peer Review > Package documentation is important for users to understand how to work with your code. 12.1 Bioconductor documentation minimal requirements: a vignette in Rmd or Rnw format with executable code…

Henrik Bengtsson (15:28:03) (in thread): > It might also be worth trying to get built-in support for collating vignettes in R. See https://github.com/r-devel/r-project-sprint-2023/discussions/9 for an effort. There will be an R Dev Day on Fri July 12, immediately after useR! 2024 - Attachment: #9 Collating package vignettes > Background > > In the HTML help and on CRAN package pages, vignettes are ordered by their filenames. > > Issue > > To control the ordering of the vignettes, developers rename the vignette files, for instance, by adding numbers as in future-1-overview.md, future-2-details.md, … The problem with this approach, other than being tedious, is that when injecting new vignettes, or reordering existing ones, the filenames need to be renamed. Renaming files changes existing URLs on CRAN pages, which will break any links to the online vignettes and it might also affect search engines. > > Proposal > > Add a mechanism to control the ordering of package vignettes. This could be done by adding a new vignette markup, e.g. %\VignetteIndex{2} where 2 can be any non-missing integer. The vignettes will be ordered by their relative %\VignetteIndex{<index>} and then by their filenames. A vignette without an explicit index is treated as %\VignetteIndex{+Inf}. > > Note how %\VignetteIndex{<index>} resembles the %\VignetteIndexEntry{<title>} markup name. > > What needs to be updated > > I think that the implementation of this would take place in the tools package, because that is where package building is taking place. Specifically, I believe it’s src/library/tools/R/Vignettes.R that needs to be updated. The functions that I can imagine need to be updated are: > > • tools::vignetteInfo(): https://github.com/wch/r-source/blob/0c748f31b035ab1acc4737f051af526bd4d6ce82/src/library/tools/R/Vignettes.R#L925-L955 > • tools:::.build_vignette_index(): https://github.com/wch/r-source/blob/0c748f31b035ab1acc4737f051af526bd4d6ce82/src/library/tools/R/Vignettes.R#L957-L1022 > > PS.
This proposal was taken from HenrikBengtsson/Wishlist-for-R#92.

Michael Love (15:28:55) (in thread): > i would love to avoid numbering. i despise numbering! haha

Michael Love (15:29:16) (in thread): > i’ve resorted to numbering bc worse is alphabetical

Michael Love (15:30:24) (in thread): > A quick guide > Beginner but beyond quick guide > … > You are now able to do advanced stuff > Zebra?

Martin Morgan (15:38:55) (in thread): > One (maybe unlikely) thing to watch for with numbers is that ‘11. Finally…’ will sort before ‘2. Just Starting…’. What if the build system respected vignette file names for ordering, but used title for display as pkgdown does – ‘a_intro.Rmd’ … ‘z_conclusion.Rmd’? (I guess Henrik doesn’t like this solution…)
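Martin's pitfall is easy to see directly in R; a quick sketch (exact ordering can depend on the collation locale):

```r
# Plain character sorting compares digit by digit, so "11" sorts before "2"
sort(c("2. Just Starting", "11. Finally", "1. Intro"))

# Zero-padding the numbers restores the intended order
sort(c("02. Just Starting", "11. Finally", "01. Intro"))
```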

Michael Love (15:44:42) (in thread): > I will buy a beer for every vignette a developer writes over 9

Michael Love (15:45:46) (in thread): > for pkgdown, I keep filenames obvious and short but i use articles to specify order

Michael Love (15:46:08) (in thread): > https://github.com/nullranges/nullranges/blob/devel/pkgdown/_pkgdown.yml#L23-L32

Michael Love (15:47:49) (in thread): > there aren’t that many multi-vignette packages, so I know this isn’t a big deal, but for devels that did want to do this, it would be nice if we had a unified approach

Martin Morgan (15:50:41) (in thread): > You owe ~six~ ~nine~ a lot of beers!:beer::beer::beer::beer:…:beer::beer: > > > dcf = read.dcf(url("https://bioconductor.org/packages/3.20/bioc/VIEWS")) > > vigs = strsplit(dcf[,'vignettes'], ",") > > sum(lengths(vigs) > 9) > [1] 6 > > sum(lengths(vigs)[lengths(vigs) > 9] - 9) > [1] 21 > > (Only RCy3 seems to get this right for the Bioc web site, using ‘01’, ‘02’, etc.)

Michael Love (15:54:37) (in thread): > over those beers I will explain how to be more concise

Michael Love (15:55:13) (in thread): > i think i’m in the hole for more than nine beers > > a beer for every vignette a developer writes over 9

Henrik Bengtsson (16:58:03) (in thread): > > One (maybe unlikely) thing to watch for with numbers is that ‘11. Finally…’ will sort before ‘2. Just Starting…’. What if the build system respected vignette file names for ordering, but used title for display as pkgdown does – ‘a_intro.Rmd’ … ‘z_conculsion.Rmd’? (I guess Henrik doesn’t like this solution…) > One problem is that one might renumber them when new vignettes are injected, e.g. you want to add a new vignette between mypkg-3-abc and mypkg-4-abc. If you renumber them to mypkg-3-abc and mypkg-5-abc, now all online references to mypkg-5-abc.{html,pdf} are broken. It’s also rather tedious having to hard code the ordering this way. > > On a related note, I prefer that package vignette files are prefixed with the package name, e.g. mypkg-1-abc.pdf. That way, if I download PDF vignettes to my laptop or phone, I won’t end up with ‘intro.pdf’, ‘intro(1).pdf’, ‘intro(2).pdf’, … not knowing what is what.

Henrik Bengtsson (17:02:38) (in thread): > BTW, I’m getting BASIC flashbacks; > > 10 LET N=10 > 11 FOR I=1 TO N > 12 PRINT "Hello, World!" > 13 NEXT I >

Michael Love (17:32:56) (in thread): > > BASIC: May 1, 1964 > :birthday:

Hervé Pagès (19:02:02) (in thread): > > What if the build system respected vignette file names for ordering? > But then order would be inconsistent with browseVignettes("MyPackage") which I think displays (and orders according to) \VignetteIndexEntry. It seems that the landing pages do the same, which is good. So yeah, maybe it’s just a matter of advertising the use of \VignetteIndexEntry in the doc as @Michael Love suggested.

Michael Love (19:05:58) (in thread): > i’ll make a PR tomorrow, and mention that if they plan to write more than 9 use the leading 0:slightly_smiling_face:

Hervé Pagès (19:12:40) (in thread): > yes and avoid Roman numerals
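For concreteness, the title that the landing page orders by lives in the vignette header; a minimal sketch (package name and title hypothetical), with a leading zero so that ten or more vignettes still sort correctly:

```yaml
---
title: "Getting started with foobar"
output: BiocStyle::html_document
vignette: >
  %\VignetteIndexEntry{01. Getting started with foobar}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```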

2024-05-07

Vince Carey (05:52:36) (in thread): > Interesting discussion. I ran into the submission called tidytof this morning and had a look at their pkgdown site, which uses the _pkgdown.yml to structure both the vignette collection and the function man pages (reference). https://keyes-timothy.github.io/tidytof/ - Attachment (keyes-timothy.github.io): Analyze High-dimensional Cytometry Data Using Tidy Data Principles > This package implements an interactive, scientific analysis pipeline for high-dimensional cytometry data built using tidy data principles. It is specifically designed to play well with both the tidyverse and Bioconductor software ecosystems, with functionality for reading/writing data files, data cleaning, preprocessing, clustering, visualization, modeling, and other quality-of-life functions. tidytof implements a “grammar” of high-dimensional cytometry data analysis.

Vince Carey (05:52:55) (in thread): - File (JPEG): tofsh.jpg

Vince Carey (05:55:57) (in thread): > At present that’s irrelevant for most packages, but IMHO it’s a really nice standard to consider when there are multiple concepts through which to guide visitors.

Michael Love (06:50:36) (in thread): > yes i really like structuring the man pages with pkgdown. if you have multiple functions that have similar names you can also use selection functions e.g. > > - title: "Foobar all the things" > contents: > - starts_with("foobar") >

Michael Love (09:20:12) (in thread): > submitted a PR for the docs

2024-05-21

Vince Carey (09:35:41): > CZI/HCA is using r-universe for cellxgene.census. Should we change our policy of restricting dependencies to CRAN or Bioc?

Lori Shepherd (09:41:06): > I really am strongly against this, or at least against as generic a statement as this. CRAN builds and checks and ensures the quality of packages – we should at least have that as a minimum

Lori Shepherd (09:41:29): > r-universe doesn’t enforce failure on checks – it’s available if it builds – we will be lowering the expectation of quality packages being utilized and the idea of continuous testing and reliability

Kasper D. Hansen (10:16:30): > That may be something we could discuss with r-universe?

Kasper D. Hansen (10:17:02): > I mean, it seems a bit weird.

Kasper D. Hansen (10:18:48): > Like, I can see a difference in deciding when to check the package (which is already different between CRAN and Bioc and which I think is also complicated by the computational resources needed to check + which versions of R we check for etc etc etc). But once a check is run, it seems weird to not respect the check, at least if the check errors out (we can again have disagreements on warnings)

Lori Shepherd (10:26:42): > are we limiting this discussion purely to allowing r-universe and not others – would be good to know the scope of the discussion

Kasper D. Hansen (10:28:08): > I would personally be extremely conservative here, but my impression is that r-universe is getting to the point where we should have this discussion

Kasper D. Hansen (10:28:30): > But I have not looked into this in depth and I am happy to be told I am wrong

Marcel Ramos Pérez (10:29:08) (in thread): > The r-universe doesn’t have release and devel concepts. It would be the same as installing from GitHub. There are no guarantees of reproducibility and stability of packages.

Kasper D. Hansen (10:29:46): > All I am saying is that the well-made point @Lori Shepherd has above might be something that we could have a discussion on with the r-universe people. Some decisions are well thought out and people are unlikely to change them, but some decisions are more … random

Kasper D. Hansen (10:30:01) (in thread): > CRAN does not have release and devel

Kasper D. Hansen (10:30:51) (in thread): > I am very happy to be educated on r-universe, which I don’t know enough about, but I think you need to be a bit more precise on your points

Lori Shepherd (10:32:05): > yes. also excellent point by Marcel – it will make it harder to reproduce and debug issues in its current state as well

Marcel Ramos Pérez (10:36:58) (in thread): > But we do, and the same reasons apply for not using a GitHub package with a Bioconductor release. We have ways to match CRAN snapshots with Bioconductor versions for reproducibility (with BiocArchive) but with the r-universe that would be more difficult as it only gives you the latest version of packages. Is it possible to reproduce a workflow in e.g., Bioc 3.17 with an r-universe package?

Kasper D. Hansen (10:41:07) (in thread): > Honest question: is this easy to accomplish if it depends on a CRAN package? I would say no (unless there has been recent developments on this front, which there very well may be). I have always been an advocate of us needing to mirror CRAN packages for this reason

Marcel Ramos Pérez (10:44:48) (in thread): > Relatively, yes, we look through the CRAN archive to find the version of the package that was closest to the last built date for e.g., Bioc release 3.17.

Henrik Bengtsson (10:59:54) (in thread): > Last day of Bioconductor 3.17 was 2023-10-24. A snapshot of the CRAN repository from that date is available from https://packagemanager.posit.co/ as: > > db <- available.packages(repos = "https://packagemanager.posit.co/cran/2023-10-24") >

Henrik Bengtsson (11:06:14) (in thread): > CRAN supports having soft package dependencies (i.e. Suggests) in non-mainstream(*) package repositories by specifying them in Additional_repositories. > > CRAN requires a package to pass R CMD check also when none of the Suggests packages are installed. > > (*) They define mainstream as CRAN and Bioconductor.
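A sketch of the mechanism Henrik describes, as a DESCRIPTION fragment (the package name and repository URL here are hypothetical placeholders):

```
Package: mypkg
Version: 0.1.0
Suggests: cellxgene.census
Additional_repositories: https://example.r-universe.dev
```

Since R CMD check must pass without the suggested package installed, any code using it would be guarded, e.g. with requireNamespace("cellxgene.census", quietly = TRUE).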

Henrik Bengtsson (11:52:07) (in thread): > > CZI/HCA is using r-universe for cellxgene.census. > @Vince Carey, can you pls give a pointer? I have no idea what CZI/HCA is.

Marcel Ramos Pérez (11:52:33) (in thread): > https://chanzuckerberg.github.io/cellxgene-census/cellxgene_census_docsite_installation.html#r

Henrik Bengtsson (12:03:58) (in thread): > Thx/Oh - I thought it was a package named ‘HCA’, heading to Bioc or something, that depends on a package ‘cellxgene.census’ that lives on R-universe. Now I see there’s only one package ‘cellxgene.census’ and it’s on R-universe. So, I assume the question is what to do when a Bioc package wants to depend on ‘cellxgene.census’. > > My suggestion is to always encourage maintainers to consider publishing on CRAN. Going from GitHub-only to R-universe is the first step towards that. Sometimes they’re not aware of the benefits and have only heard tales of horror stories, sometimes they want to reach a certain milestone before doing so, and sometimes there is a real blocker preventing them from doing it. In my experience, most maintainers are willing to publish on CRAN.

Robert Castelo (13:04:10): > We could attempt persuading CZI to submit cellxgene.census to Bioconductor, with the argument that Bioconductor provides minimum standards for archiving, licensing, testing and documentation, which are kind of nice when doing open science.

Kasper D. Hansen (15:32:58): > We should at least communicate with CZI about the choices they are making here.

Lluís Revilla (17:29:57) (in thread): > The R repository working group has been working on this area for a while (~3 years) with various interested parties: from pharmaceutical companies to rOpenSci, which maintains r-universe. I think someone from Bioconductor attended one of the meetings at some point. > There are several ideas for using the r-universe, from multiverse (which also aims to provide devel and release repositories) to cranhaven and others. Besides the technical aspects of relying on one company to provide this for free, and Bioconductor’s mirror there, and several other points raised here, I think that any change should also consider the trust and network effects it would have. CRAN trusts Bioconductor because it checks the packages; if it is possible to depend on a package that is not in either of them, a CRAN package could end up depending on something completely unchecked. > In my opinion, it would be ideal if there was a working group from the R Foundation to coordinate repository activities and decisions, which is why I started the r-devel/repos repository. This would make it easier to take a consensus decision (but harder to get there) and share experience/code between repository maintainers.

Hervé Pagès (20:20:37): > The Mac binary provided by r-universe for tiledbsoma is broken: > > > library(tiledbsoma) > Error: package or namespace load failed for 'tiledbsoma' in dyn.load(file, DLLpath = DLLpath, ...): > unable to load shared object '/Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library/arrow/libs/arrow.so': > dlopen(/Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library/arrow/libs/arrow.so, 0x0006): symbol not found in flat namespace (_utf8proc_category) > > Yes, hopefully at some point they understand the benefits of moving this to CRAN or Bioconductor.

2024-06-07

Hervé Pagès (21:11:51): > Link to the Bioconductor builds and their schedule added to the build report (right below the “Approx. Package Snapshot Date/Time”), as discussed at last meeting: https://bioconductor.org/checkResults/3.20/bioc-LATEST/

2024-06-12

Robert Castelo (11:08:49): > CRAN packages will get a DOI: https://x.com/AchimZeileis/status/1800446746054529212?t=ORxztps6wUj33GCaW8SKjg&s=35 - Attachment (X (formerly Twitter)): Achim Zeileis @zeileis@fosstodon.org (@AchimZeileis) on X > PSA: All #rstats package on #cran will get an official DOI! > > This will facilitate bibliometrics and giving credit to R package authors. > > Registering all 20,000+ packages will still take a few more days. But the first couple of thousand are already live. Example:

2024-07-29

Sean Davis (10:56:24): > Has anyone used pak for package installation? https://pak.r-lib.org/ If so, what were your experiences? In a limited trial (last week or so), I’ve been impressed with the performance. - Attachment (pak.r-lib.org): Another Approach to Package Installation > The goal of pak is to make package installation faster and > more reliable. In particular, it performs all HTTP operations in > parallel, so metadata resolution and package downloads are fast. > Metadata and package files are cached on the local disk as well. pak > has a dependency solver, so it finds version conflicts before > performing the installation. This version of pak supports CRAN, > Bioconductor and GitHub packages as well.

Lluís Revilla (15:47:39) (in thread): > It doesn’t use the standard R toolchain to pick up dependencies (filtering packages or deciding from which repo a package should come). But it is fast, as it caches downloaded packages and installs binaries from P3M. The second part can also be achieved via bspm or related packages on some OSes, or on Ubuntu via apt and r2u.

Lluís Revilla (17:25:32) (in thread): > Correction, it installs packages from github.com/cran (which I think is less secure): https://github.com/pharmaR/regulatory-r-repo-wg/issues/87 - Attachment: #87 Confirm pak is picking up packages from DownloadURL > Following up on #74 > > @borgmaan mentioned today that we need to confirm that the packages being installed from the url specified in the PACKAGES DownloadURL after setting a repo to a local file path as shown in https://github.com/pharmaR/repos/blob/feature/riskscore/dev/merge-riskscore.R > > There are probably plenty of ways of testing this. One idea is to debugonce(download.file); debugonce(download.package). I’m not certain these are used on the backend, but I think they’re probably a safe bet. > > tagging @ramiromagno for interest

2024-07-30

Martin Morgan (11:32:05) (in thread): > At least some published versions failed to use the correct version of Bioconductor for current R; I solved this by setting the environment variable R_BIOC_VERSION to 3.19 or 3.20 as given by BiocManager::version(). This also has the effect that install.packages() will use the specified version of Bioconductor.
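A minimal sketch of that workaround, assuming pak and BiocManager are installed (the installed package is just an example):

```r
# Pin the Bioconductor version that dependency resolution should target,
# matching the version BiocManager reports for the running R
Sys.setenv(R_BIOC_VERSION = as.character(BiocManager::version()))

# pak resolves CRAN/Bioconductor/GitHub dependencies in parallel, with caching
pak::pkg_install("limma")
```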

2024-08-02

Helena L. Crowell (03:28:48): > Just to share here: https://scverse.org/events listing past hackathons - they started ’22 in the EU, last year the 1st event in the US - each event has a set of targets (eg specific data types, but can include docs, testing etc), and outcomes (eg links to repos) are posted after (eg AnnDataR was conceived last year in Heidelberg)… maybe sth Bioc could draw inspiration from in the future to involve the global community in developments that help move/stabilize the project as a whole - tho we are more “mature”, there’s surely lots that could be done. - Attachment (scverse): Events > Open community meeting Scverse community meetings happen every second Tuesday at 6pm CET and are open to everyone! If you are new to scverse, these meetings are a great way to get to know the people behind the project. > We usually start off the meetings with a short presentation about a development-related topic or a new ecosystem package. > The meetings have an open agenda. If you would like to bring up a topic, feel free to add it before or during the meeting!

Sean Davis (12:53:18): > FYI: https://datascience.cancer.gov/news-events/news/funding-available-software-support-open-science - Attachment (datascience.cancer.gov): Funding Available for Software to Support Open Science | CBIIT > NIH has a new funding opportunity to help you develop scientifically sound and sustainable software tools to advance cancer research.

Martin Morgan (13:01:34) (in thread): > This related funding announcement https://grants.nih.gov/grants/guide/rfa-files/RFA-OD-24-011.html might also be of interest – it is meant specifically to support individual ‘research software engineers’ which seems like an interesting opportunity for those actually developing the software to receive direct support. - Attachment (grants.nih.gov): RFA-OD-24-011: NIH Research Software Engineer (RSE) Award (R50 Clinical Trials Not Allowed) > NIH Funding Opportunities and Notices in the NIH Guide for Grants and Contracts: NIH Research Software Engineer (RSE) Award (R50 Clinical Trials Not Allowed) RFA-OD-24-011. ODSS

2024-08-20

Vince Carey (15:03:16): > I am having trouble finding the nomination form. Is there a link on the web site?

Charlotte Soneson (15:04:14): > https://community-bioc.slack.com/archives/C35G93GJH/p1723212858640809 - Attachment: Attachment > :star2: Reminder: Annual Nominations Open for CAB & TAB! :star2: > > Are you interested in contributing to Bioconductor decision-making, or do you know someone who would be a great fit? Join our advisory boards! > > :globe_with_meridians: The Community Advisory Board (CAB) aims to: > • enable productive and respectful participation in the Bioconductor project by users and developers at all levels of experience > • empower user and developer communities by coordinating training and outreach activities > :wrench: The Technical Advisory Board (TAB) aims to: > • develop strategies to ensure long-term technical suitability of core infrastructure for the Bioconductor mission > • identify and pursue technical and scientific aspects of funding strategies for long-term viability of the project > :date: Apply by Aug 31: > • CAB Application Form > • TAB Application Form > Don’t miss this opportunity to impact Bioconductor’s future! > > Feel free to share this on LinkedIn or https://genomic.social/@bioconductor/112932443380312640|Mastodon

Vince Carey (15:04:42): > I found it on the support site too.

2024-09-05

Vince Carey (11:11:38): > bioc tab replacement sept 5 > Thursday, Sep 5 · 12–1 PM > Google Meet joining info by email. plz contact me if u have an issue; i am locked out of zoom and cannot admit members

Vince Carey (11:16:34): > i have edited calendar info for this event

2024-10-03

Hervé Pagès (12:57:32): > Here’s the slide (only one) I had prepared for today’s meeting (NaArray objects): https://docs.google.com/presentation/d/1P-m1WGGrRSlFObCtrKnKQnXryd7DG5j5H0ju9cPbAVA/edit?usp=sharing

Vince Carey (13:25:29): > whoops. sorry that we did not get to this.

Kasper D. Hansen (13:30:31): > Where was the signup sheet for testing the new system that Alex mentioned (or I thought he mentioned)

Vince Carey (13:34:30): > https://forms.gle/bU8jWCee9PkfY4AfA - Attachment (Google Docs): GitHub Source - Testing Volunteers > After completing this form, you will be added as an external collaborator with write access to the GitHub-hosted source of your Bioconductor package. You are then asked to git remote add gitdev https://gitdevtest.bioconductor.org/packages/yourpackage when you first clone your package or in your existing package directory. Subsequently every time you push to devel on git.bioconductor.org, you are asked to also git push gitdev HEAD:devel, i.e. push all your devel changes to BOTH git.bioconductor.org and gitdevtest.bioconductor.org. > Please direct any questions or report any issues to mailto:maintainer@bioconductor.org|maintainer@bioconductor.org

Kasper D. Hansen (13:35:38): > Thanks

Laurent Gatto (15:24:54) (in thread): > That looks very interesting for (single-cell) proteomics, where we do have quite a few missing values!

Hervé Pagès (15:28:16) (in thread): > Great! Good to know that there are other use cases for this:smiley:

Laurent Gatto (15:29:15) (in thread): > https://github.com/UCLouvain-CBIO/scp/issues/75

Hervé Pagès (15:32:10) (in thread): > You seem to have quite a few NaN’s too. You’ll improve sparsity, and hence reduce memory footprint, if you replace them with NA’s.
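A minimal illustration of the NaN-to-NA replacement on a plain numeric matrix (the NaArray representation itself is not needed for the idea):

```r
m <- matrix(c(1, NaN, 0, NA, NaN, 2), nrow = 2)

# is.na() is TRUE for both NA and NaN; is.nan() picks out only the NaN entries
m[is.nan(m)] <- NA

anyNA(m)  # missing entries remain, but all are now plain NA
```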

2024-12-06

Davide Risso (09:59:28): > A direct account of the extent of the F1000R issue - Attachment: Attachment > Thanks Davide. That’s really helpful & informative. :grinning: > > For @Vince Carey, F1000R now insist on all submissions being MS Word documents. Mike Smith wrote a really useful LaTeX template for writing your paper using Rmd, then converting to TeX & pdf, which I recently used & hoped the process might be smooth. Unfortunately, the final strategy for us was to output the Rmd as an MS Word document, which is almost workable, but leads to formatting issues and all equations will need to be re-entered using the MS Equation Editor as they don’t parse correctly. Essentially, the direct reproducibility from code to the submitted document is broken. From submission, the paper is then put through their formatting tool which removes all line breaks from code. So an opening multi-line chunk loading packages would become the single line library(pkg1) library(pkg2) ... library(pkg_n) which means every chunk with more than one line is then broken & needs reformatting and re-testing. Comments also become part of this single line and this destroys tidy syntax and relevant indentation. When pointing out that they had broken all of our code their response was to ask me to send each code chunk (19 for us) as a stand-alone word document “correctly formatted”, so from us they wanted 19 separate MS Word files sent through. I politely declined & pointed out they had each chunk correctly formatted in our initial submission, which after a little frustrated discussion, they conceded and used. It’s still viable to publish there, but it’s no longer safe to assume that the published code will work, which to me, defeats the purpose of a Bioc workflow.

2024-12-29

Yahya Jahun (04:01:53): > @Yahya Jahun has joined the channel

2025-02-06

Hervé Pagès (13:19:04): > To expand a little bit on the discussion about the names of saveObject/readObject/validateObject. These functions are not general functions for saving/reading/validating objects written in any format as their names suggest. They are instead specialized functions for dealing with the alabaster format, which is very new (and still kind of niche at the moment). My point was that the names of these functions should reflect the format that they deal with e.g. saveAlabaster, readAlabaster, validateAlabaster, like most saving/reading functions do. My point was not that they shouldn’t be generic functions (of course they should be because they need a bunch of methods to handle all kinds of objects). I’m just worried that these names are adding an unnecessary level of opacity to what these functions really do. Also the confusion between save() vs saveObject() and validObject() vs validateObject() is real, especially for newcomers to the project. Naming is important!

Vince Carey (20:26:09): > Fair points. I am glad we started the discussion. It seems to me that your idea of saveAlabaster/readAlabaster could be introduced as a protocol for “client” functions. Developers need to know this and provide sufficient explicitness to users. There is an #artifactdb channel where more discussion could take place.

2025-02-21

Sean Davis (09:44:33): > Any thoughts on https://cloud.r-project.org/web/packages/rextendr/vignettes/package.html and Rust in R more generally?

Vince Carey (09:47:31): > Aaron Quinlan’s group submitted a Rust-dependent package which we were able to build, I think, but they abandoned it; I think the dev left. I do not have a big problem with it in an abstract sense. Whether it is appropriate for us to take on another maintenance obligation with flat or diminishing resources is another question.

Vince Carey (09:49:10): > I found this a little difficult to parse in detail but the message is – accessible? - Attachment (theregister.com): Open source maintainers are feeling the squeeze > Overworked, under pressure, and subjected to abuse – is it really worth it?

Sean Davis (09:49:55): > My limited experience with the rust ecosystem and development is that the infrastructure burden is relatively low (even compared to python), but I haven’t done any R integration.

Sean Davis (09:51:55): > On supporting research software engineers, here is a group that a couple of our CU software engineering team are engaging with:https://us-rse.org/ - Attachment (us-rse.org): US-RSE > United States Research Software Engineer Association

Sean Davis (09:52:50): > One member of the team has been successful in getting a couple of small fellowships to support independent work and career development.

Sean Davis (09:53:48): > I’ll think a bit about how we could better support folks, at least with networking and communities like us-rse.

Vince Carey (09:56:06) (in thread): > Is there a “use the new languages while they’re young!” phenomenon? Or is there hard-won knowledge that really makes this work better?

Sean Davis (09:56:06): > I do agree with the article above. We manage our local software engineering team with 25% undirected effort time to try to mitigate some of the squeeze, but I suspect that is unusual and in this climate may be untenable. In practice, though, the day-to-day pressures are very real.

Sean Davis (10:01:24) (in thread): > Don’t know, but here are some related data: https://madnight.github.io/githut/#/pull_requests/2024/1 - Attachment (madnight.github.io): GitHub Language Stats > This website shows the popularity of programming languages on GitHub over time.

Michael Love (10:21:11) (in thread): > make Rust better than other languages in terms of maintenance?

Sean Davis (10:25:50): > Oh, and if anyone has a motivated organizer, this is a sort of handbook for developing a research software engineering community within their organizations/institutions: https://zenodo.org/records/10436166 - Attachment (Zenodo): Getting Started with the RSE Movement within your Organization: A Guide for Individuals > Recognizing the critical role of Research Software Engineers (RSEs), this guide serves as a resource for individuals aspiring to champion the RSE movement within their organizational contexts. By offering practical steps and tips, the guide aims to instigate positive change by connecting RSEs and cultivating a cohesive RSE community. Structured into specific sections, it begins with creating awareness and assessing interest, then progresses to the establishment of an informal RSE community and the recruitment of allies. Ultimately, the guide guides individuals in forming an RSE group or society within their organization. These actions set the foundation for collaborative efforts, support systems, and advocacy, enabling individuals to drive impactful change and foster a conducive environment for the flourishing of Research Software Engineers in their organization.

Anushka Paharia (10:58:08): > @Anushka Paharia has joined the channel

2025-02-22

Lluís Revilla (04:00:21) (in thread): > CRAN has a specific page for developing with Rust: https://cran.r-project.org/web/packages/using_rust.html. But there is also some desire to improve R’s checks around it. - Attachment (cran.r-project.org): Using Rust in CRAN packages > Using Rust in CRAN packages

2025-02-25

Michael Love (09:19:57): > Do we have a rough idea of the 3.21 release schedule ?

Lori Shepherd (09:21:13): > we have been trying to get information from R core on when R 4.5 is going to be released, which we need in order to officially schedule the 3.21 release. Based on history it’s generally the end of April or early May.

Grace Kenney (09:21:22): > @Grace Kenney has joined the channel

Lluís Revilla (09:21:57): > From developer.r-project.org: > > The release of 4.4.3 (“Trophy Case”) is scheduled for Friday 2025-02-28. Release candidate tarballs will be made available during the week up to the release. > > Please refer to the generic checklist for details > > * Tuesday 2025-02-18: START (4.4.3 beta) > * Friday 2025-02-21: CODE FREEZE (4.4.3 RC) > * Friday 2025-02-28: RELEASE (4.4.3) > > >

Lori Shepherd (09:22:31): > that is for patch to 4.4 not for the release of R-devel/R 4.5

Sarah Parker (09:26:53): > @Sarah Parker has joined the channel

Lluís Revilla (09:27:06): > Exactly my point, there is still no information for R 4.5

2025-03-03

W Sun (18:08:37): > @W Sun has joined the channel

2025-03-04

Michael Love (08:45:01): > I’m working on a package that interfaces with IGVF, and I noticed we don’t have a biocViews term for reporter assays, e.g. MPRA, STARR-seq, saturation mutagenesis, transgenic mouse assays like the VISTA enhancer db, etc. These are experiments where libraries are put into cells/tissues to look for a regulatory effect using a reporter. What do people think about adding ReporterAssay? Additionally, is there a broader term, or a complementary term we could use for mutational scanning with respect to protein function?

Vince Carey (09:26:27): > relevant to #edam-collaboration

Claire Rioualen (11:42:24): > @Claire Rioualen has joined the channel

Claire Rioualen (11:42:53) (in thread): > Will look into it

Kasper D. Hansen (12:04:02) (in thread): > I think its a good idea

Michael Love (12:12:54) (in thread): > other terms that may be relevant for IGVF are PerturbationScreen (CRISPR a/i, Perturb-seq) and MutationalScanning

Michael Love (12:13:37) (in thread): > mutational scanning includes some assays that are not reporters, e.g. if the readout is a cell phenotype
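If terms like these were adopted, packages would opt in through the biocViews field of their DESCRIPTION; a hypothetical sketch (the package name is made up, and the ReporterAssay and MutationalScanning terms do not exist in biocViews yet):

```
Package: mpraExample
biocViews: Software, FunctionalGenomics, ReporterAssay, MutationalScanning
```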

2025-04-24

Alex Mahmoud (12:13:12): > If you have any ideas for the collaboration fest / hackathon post-conference, please submit any and all ideas to https://forms.gle/RzrT75DpRh9Vb8Z16 - Attachment (Google Docs): GBCC Cofest Ideas Submission > Submit any and all ideas you want to lead and/or in which you’d like to participate.