#anvil
2018-12-17
Nitesh Turaga (15:21:36): > @Nitesh Turaga has joined the channel
Nitesh Turaga (15:21:36): > set the channel description: Discussions related to the anvil project
Martin Morgan (15:21:36): > @Martin Morgan has joined the channel
Valerie Obenchain (15:21:36): > @Valerie Obenchain has joined the channel
Levi Waldron (15:21:36): > @Levi Waldron has joined the channel
Vince Carey (15:21:36): > @Vince Carey has joined the channel
C. Mirzayi (please do not tag this account) (15:21:36): > @C. Mirzayi (please do not tag this account) has joined the channel
Ludwig Geistlinger (15:21:36): > @Ludwig Geistlinger has joined the channel
Marcel Ramos Pérez (15:21:36): > @Marcel Ramos Pérez has joined the channel
Shweta Gopal (15:21:36): > @Shweta Gopal has joined the channel
BJ Stubbs (15:21:37): > @BJ Stubbs has joined the channel
Kasper D. Hansen (15:21:37): > @Kasper D. Hansen has joined the channel
Sehyun Oh (15:21:37): > @Sehyun Oh has joined the channel
Nitesh Turaga (15:23:29): > Hi @channel, I created this group as a medium of communication for Bioconductor-related tasks and goals for the AnVIL project. If I missed anyone, please invite them to this channel. You all should have received an invitation to collaborate on the www.github.com/Bioconductor/AnVIL_admin repository.
2018-12-19
Nitesh Turaga (13:13:54): > Hi @channel, I thought I’d try asking some of my questions here. > > I’m working through the Leonardo Standalone document, https://docs.google.com/document/d/1RYaLK5uLwxm_zaYTg7nCzKrmaPHtPATOjz9Z2KrI_Ak/edit#. I’ve managed to get through most of it, except being able to “access” the instance after launching it by setting a cookie. > > The very last step in the document describes this. I’ve successfully launched the `leonardo-jupyter:rt-galaxy-dev` image on a new cluster, but can’t seem to get access to it at https://leonardo.dev.anvilproject.org/notebooks/anvil-leo-dev/
Nitesh Turaga (13:14:09): > This is the error: - File (PNG): Screen Shot 2018-12-19 at 1.12.56 PM.png
Nitesh Turaga (13:15:12): > It seems like the cookie authentication mechanism isn’t working. Any thoughts?
BJ Stubbs (13:18:25): > Sometimes I have to re-authenticate. I cannot detect a reason why.
Nitesh Turaga (13:19:02): > Does that actually help you connect to the Jupyter notebook?
BJ Stubbs (13:19:51): > We have mainly been using RStudio, but it should work with a notebook. I will start one up now to verify.
Nitesh Turaga (13:21:32): > RStudio works? Can you give my RStudio image a shot with the new version of R and BiocManager (us.gcr.io/anvil-leo-dev/anvil_bioc_docker)?
BJ Stubbs (13:23:31): > Sure. Shweta put one on dockerhub with SingleCellExperiment, rhdf5client, and such and it worked fine. I will spin one up with yours. Should take about 10 min
Nitesh Turaga (13:25:06): > Thanks.
BJ Stubbs (13:29:44): > I believe that if you authorize once, the authorize link will say “log out” and look like you are authorized even if it times out or whatever is happening. You need to refresh the swagger page then click authorize to see if you are really still authorized. Losing authorization will kick you out of rstudio, but the instance stays up, and when you re-auth you shouldn’t lose anything though.
Valerie Obenchain (17:56:15): > @Nitesh Turaga @BJ Stubbs in the PM meeting today we were asked to give names of those who will present at the face-to-face in January. I know Nitesh was planning a demo. BJ, did you want to do this with him, or should I just say Nitesh solo … ?
2018-12-20
Vince Carey (13:05:37): > @Valerie Obenchain @BJ Stubbs I think our group will have something to present at the face-to-face. Please book the Carey group.
Valerie Obenchain (13:33:16): > Thanks Vince. Will do.
2018-12-22
Vince Carey (05:40:10): > Our RStudio container did not include TeX … I guess we have to deal with this? > > No TeX installation detected (TeX is required to create PDF output). You should install a recommended TeX distribution for your platform: > > Windows: MiKTeX (Complete) - http://miktex.org/2.9/setup (NOTE: Be sure to download the Complete rather than Basic installation) > > Mac OS X: TexLive 2013 (Full) - http://tug.org/mactex/ (NOTE: Download with Safari rather than Chrome *strongly* recommended) > > Linux: Use system package manager >
Martin Morgan (11:30:13): > Does one want to create tex output? It seems a bit heavy these days…
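The message above appears when no TeX installation can be found. A minimal base-R sketch for checking a container before attempting PDF output; the helper name `has_tex` and the list of engines it looks for are assumptions, not part of any AnVIL image:

```r
# Hypothetical helper: TRUE if any of the listed LaTeX engines is on the PATH.
# Sys.which() returns "" for commands it cannot find.
has_tex <- function(engines = c("pdflatex", "xelatex", "lualatex")) {
    any(nzchar(Sys.which(engines)))
}

if (!has_tex())
    message("No TeX engine found; PDF output will fail in this container")
```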
Martin Morgan (11:33:12): > Also, with respect to the R-devel version of Jupyter (and Jupyter in general), it seems to me we should be doing whatever the Docker equivalent of importing rocker/r-devel is, instead of rolling our own? Or actually, reusing the bioc-devel Docker container (or at least its recipe).
Vince Carey (18:18:27): > Both points are acceptable … PDF generation might be valuable at some point, however.
Vince Carey (18:25:53): > I have become more enthusiastic about RStudio notebooks because I have been able to do more with them than with Jupyter in Leo, including authenticating to BigQuery and using Shiny. So whatever gives us RStudio+R-devel is fine with me.
Vince Carey (18:53:46): > Apropos Docker – I followed (with slight modification) remarks at https://github.com/rocker-org/rocker/wiki and was surprised that > > docker run --rm -ti rocker/r-devel > > culminated in > > R version 3.4.4 (2018-03-15) -- "Someone to Lean On" >
- Attachment (GitHub): rocker-org/rocker > R configurations for Docker. Contribute to rocker-org/rocker development by creating an account on GitHub.
2018-12-23
Martin Morgan (19:40:04): > The https://github.com/Bioconductor/AnVIL package has proof-of-concept functionality to authenticate to and use the Leonardo REST API; the cloned repo needs to be completed by adding application credentials as described in the README. > > > api_clusters() %>% select(starts_with("label")) %>% head(3) > No encoding supplied: defaulting to UTF-8. > # A tibble: 3 x 3 > labels.creator labels.clusterName labels.googleProject > <chr> <chr> <chr> > 1 reshg@channing.harvard.edu rhdf5client anvil-leo-dev > 2 nitesh.turaga@gmail.com ntbioc anvil-leo-dev > 3 reshg@channing.harvard.edu jupyter_rstudio_bc anvil-leo-dev >
- Attachment (GitHub): Bioconductor/AnVIL > Interact with AnVIL and Leonardo projects. Contribute to Bioconductor/AnVIL development by creating an account on GitHub.
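The `select(starts_with("label"))` step above keeps only the label columns of the flattened cluster listing. A base-R sketch of the same selection on a small mock data frame standing in for the `api_clusters()` result; the three `labels.*` column names come from the output above, while the values and the extra `status` column are invented for illustration:

```r
# Mock of a flattened cluster listing; values are illustrative only.
clusters <- data.frame(
    labels.creator = c("user1@example.org", "user2@example.org"),
    labels.clusterName = c("rhdf5client", "ntbioc"),
    labels.googleProject = c("anvil-leo-dev", "anvil-leo-dev"),
    status = c("Running", "Stopped"),  # hypothetical non-label column
    stringsAsFactors = FALSE
)

# Base-R analogue of dplyr::select(starts_with(prefix)):
# keep the columns whose names begin with `prefix`.
select_prefix <- function(df, prefix) {
    df[, startsWith(names(df), prefix), drop = FALSE]
}

label_cols <- select_prefix(clusters, "label")
head(label_cols, 3)
```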
2018-12-24
Martin Morgan (09:08:24): > I think the R-devel image installs R-devel alongside R, and you’ll access it with `RD` – https://github.com/rocker-org/rocker/blob/a7ecee9111f3b5dde35555d338a0e30401f5c095/r-devel/Dockerfile#L97. Also TeX seems to be installed there… - Attachment (GitHub): rocker-org/rocker > R configurations for Docker. Contribute to rocker-org/rocker development by creating an account on GitHub.
Martin Morgan (09:12:53): > Following the Docker Hub links on the github.com/rocker-org/rocker page suggests that r-devel hasn’t built in 8 months; drd (a lighter version) built yesterday.
Nitesh Turaga (10:27:44) (in thread): > It’s possible they haven’t rebuilt from the Dockerfile since 3.4.4; I can see that it builds from `svn co https://svn.r-project.org/R/trunk R-devel`, so a fresh build should have the latest version of R-devel.
Nitesh Turaga (10:28:55) (in thread): > I’m thinking if you clone that repo (`rocker/r-devel`) and build it again on your local machine with `docker build -t rocker/r-devel:local .`, it’ll give you the latest version of R-devel.
Martin Morgan (10:42:43): > See https://github.com/rocker-org/rocker/issues/319 for a new r-devel build - Attachment (GitHub): r-devel builds stale? · Issue #319 · rocker-org/rocker > dockerhub reports that this last built 9 months ago? https://hub.docker.com/r/rocker/r-devel/builds
2019-01-08
Valerie Obenchain (09:49:33): > @Valerie Obenchain has left the channel
2019-01-09
Lori Shepherd (06:55:34): > @Lori Shepherd has joined the channel
2019-01-10
Samuela Pollack (09:14:30): > @Samuela Pollack has joined the channel
2019-01-27
Martin Morgan (18:58:26): > RStudio Server Pro (i.e., paid) can launch jobs across Kubernetes clusters: https://resources.rstudio.com/rstudio-conf-2019/rstudio-job-launcher-changing-where-we-run-r-stuff - Attachment (RStudio Job Launcher Changing where we run R stuff - Darby Hadley): RStudio Job Launcher Changing where we run R stuff - Darby Hadley > RStudio Job Launcher provides the ability to start processes within batch processing systems and container orchestration platforms.
2019-01-29
Levi Waldron (11:13:05) (in thread): > Also scaling (the expensive way) - it’s about 5c/minute or $2K/mo if always-on for RStudio Pro on GCP!
Nitesh Turaga (13:50:51): > Just thought I’d bring this to our attention: there are images in the rocker project called `rocker/binder`, with tags `rocker/binder:devel` or `rocker/binder:3.5.2`. They seem to be very elegant solutions for hosting both Jupyter and RStudio within the same container. I’m not sure if this has any implications for AnVIL, but it was very well done.
Nitesh Turaga (15:19:57) (in thread): > It would be great to give it a go if someone has contacts at RStudio with some free credits.:slightly_smiling_face:
Levi Waldron (20:09:40) (in thread): > I would love to have both RStudio and Jupyter available from the Bioconductor Docker images.
2019-01-30
Nitesh Turaga (13:55:19) (in thread): > Yes, it seems like a good feature. I’m not totally sure how `binder` works; I’ll take a look at it over the weekend to explore the technology.
2019-01-31
Nitesh Turaga (19:38:07) (in thread): > @Levi Waldron I was playing around a little with binder, and came up with this: https://github.com/nturaga/bioconductor_binder. You can test out release and devel on your local machine using > > docker run -p 8888:8888 nitesh1989/bioconductor_binder:devel > > or > > docker run -p 8888:8888 nitesh1989/bioconductor_binder:R-3.5.2 > > Check http://localhost:8888 for Jupyter and http://localhost:8888/rstudio for RStudio. You can launch RStudio from Jupyter as well, which is really neat.
Levi Waldron (19:39:04) (in thread): > Oh nice@Nitesh Turaga, I’ll try it!
2019-02-05
Martin Morgan (16:09:13): > @Vince Carey maybe I misunderstood; the call today (now) does have something about Bioc workflows, and I’m not in a position to present about that. Would you like to join?
Vince Carey (16:09:36): > OK
Vince Carey (16:11:27): > can you give me the link
Nitesh Turaga (16:11:38): > https://meet.google.com/xoa-eqym-uxc - Attachment (meet.google.com): Meet > Real-time meetings by Google. Using your browser, share your video, desktop, and presentations with teammates and customers.
Martin Morgan (16:12:22): > I forwarded the invite to your email (includes link to agenda, etc…)
Vince Carey (16:29:21): > can’t get access to the agenda doc alas … have sent request
2019-02-06
Sehyun Oh (14:42:07): > Hi! I’m trying to set up my CNV analysis workflow on AnVIL/FireCloud, to share my workflow and to test the AnVIL platform. The workflow involves applying GATK tools to WES BAM files, followed by analysis in Bioconductor. My understanding is that RStudio / notebooks in Leo do not (yet?) have the ability to call Docker, and that in the short term I should develop this workflow using WDL on FireCloud. Can anyone confirm this? Thanks!
Levi Waldron (14:43:57) (in thread): > Hey Nitesh, I just tried, it works, and it’s great.
Nitesh Turaga (14:44:13) (in thread): > Nice:smile:
Levi Waldron (14:47:22) (in thread): > Would it be premature to convert my https://github.com/waldronlab/bioconductor_devel stuff to use this as a base layer?
Nitesh Turaga (14:47:24): > Hi @Sehyun Oh, my understanding is that you can use the notebooks, but I’m not sure you can transfer data from FireCloud to any of the images launched through Leonardo at the moment, because they are launched under separate Google projects. FireCloud lets you run things under a “freemium” Google account, and Leonardo is running under the “anvil-leo-dev” billing account. There is no sharing between these two at this point. So you should develop your workflow using WDL on FireCloud (it’ll be rebranded as Terra in AnVIL). > > But Leonardo has the ability to launch Docker containers hosted on Docker Hub (if that’s a question you had too, not related to the earlier question).
Levi Waldron (14:48:38) (in thread): > I would like very much to be able to choose between RStudio and Jupyter…
Nitesh Turaga (14:48:56): > If anyone has a better answer, feel free to correct me. This is my current understanding, though.
Nitesh Turaga (14:51:04) (in thread): > You can use this if you’d like. But I’ve only played with it a little, and would have to explore the ‘binder’ technology more to get a real handle on it. For instance, I don’t really like the `token` copy-paste stuff. But if that is OK with you, feel free to convert your images.
Levi Waldron (14:53:09) (in thread): > Yeah, the `token` is annoying, but even then maybe worth the trouble. BTW, why not a `bioconductor_binder:release` image, instead of just `:R-3.5.2`?
Sehyun Oh (14:58:21): > Thanks @Nitesh Turaga! So the RStudio app from Leonardo will be able to launch a Docker container in it? I thought large-scale batch analysis is still done through WDL/CWL in Terra (I guess this is where Dockstore is involved?)…
Nitesh Turaga (14:58:26) (in thread): > Since I was just playing with it, I went with a poor choice of tag names. But I didn’t use the word `release` for a tag name because `rocker` doesn’t do it at any stage; they just use `latest` for the release version, since it’s the default download when using the image without a tag name. > > But in hindsight maybe it should have been `bioconductor_binder:release`.
Nitesh Turaga (14:59:27): > Yes, currently large-scale analysis should be done using FireCloud (Terra) with a WDL workflow.
Levi Waldron (15:00:16) (in thread): > There’s no “latest” tag either: > > docker pull nitesh1989/bioconductor_binder:latest > Error response from daemon: manifest for nitesh1989/bioconductor_binder:latest not found >
Nitesh Turaga (15:00:43) (in thread): > One second, I’ll make a `release` tag and push it up.
Sehyun Oh (15:01:43): > Btw, I’m having trouble installing `Bioconductor/AnVIL`. Can someone help me resolve this issue? Thanks!
Sehyun Oh (15:01:47): > > Bioconductor version 3.7 (BiocManager 1.30.4), R 3.5.0 (2018-04-23) > Installing github package(s) 'Bioconductor/AnVIL' > Downloading GitHub repo Bioconductor/AnVIL@master > ✔ checking for file '/private/var/folders/b2/gq9s7sk56xlfqz1s8hzykd3c0000gn/T/RtmpwlcVno/remotes395122e12e20/Bioconductor-AnVIL-32903d6/DESCRIPTION' ... > ─ preparing 'AnVIL': > ✔ checking DESCRIPTION meta-information ... > ─ checking for LF line-endings in source and make files and shell scripts > ─ checking for empty or unneeded directories > ─ building 'AnVIL_0.0.4.tar.gz' > > * installing *source* package 'AnVIL' ... > ** R > ** inst > ** byte-compile and prepare package for lazy loading > Error in get_api(.api_path(service), config) : unused argument (config) > Error : unable to load R code in package 'AnVIL' > ERROR: lazy loading failed for package 'AnVIL' > * removing '/Library/Frameworks/R.framework/Versions/3.5/Resources/library/AnVIL' > Error in i.p(...) : > (converted from warning) installation of package '/var/folders/b2/gq9s7sk56xlfqz1s8hzykd3c0000gn/T//RtmpwlcVno/file39516f1888c1/AnVIL_0.0.4.tar.gz' had non-zero exit status >
Nitesh Turaga (15:04:44) (in thread): > https://cloud.docker.com/repository/docker/nitesh1989/bioconductor_binder/tags: the `release` tag should be available now, @Levi Waldron
Martin Morgan (15:05:35): > Maybe your rapiclient is installed from CRAN, rather than via `BiocManager::install("bergant/rapiclient")`
Sehyun Oh (15:13:23): > Ha! `BiocManager::install("bergant/rapiclient")` fixed the error. Thanks!!
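The install log says `get_api()` was called with an argument, `config`, that the installed rapiclient did not accept, which fits Martin’s diagnosis of a CRAN build rather than the GitHub one. A small base-R sketch for checking both points; the helper names are hypothetical, and it assumes the GitHub install went through remotes (as `BiocManager::install("owner/repo")` does), which records `RemoteType: github` in the installed DESCRIPTION:

```r
# Hypothetical helper: does `fun` accept an argument called `arg`?
has_arg <- function(fun, arg) arg %in% names(formals(fun))

# Hypothetical helper: was `pkg` installed from GitHub? remotes records
# `RemoteType: github` in the installed DESCRIPTION; CRAN installs lack it.
# Returns NA if the package is not installed at all.
installed_from_github <- function(pkg) {
    desc <- suppressWarnings(utils::packageDescription(pkg))
    if (!inherits(desc, "packageDescription"))
        return(NA)
    identical(desc$RemoteType, "github")
}

# Usage, once rapiclient is installed:
# installed_from_github("rapiclient")
# has_arg(rapiclient::get_api, "config")
```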
BJ Stubbs (16:40:25): > I played around a bit with the API package on my fork. I think it would be useful to make use of the tag data for increased usability. I made a proof-of-concept method.
BJ Stubbs (16:41:19): - File (Plain Text): Untitled
2019-02-12
Nitesh Turaga (15:59:27): > Just a heads-up to the AnVIL team here… the tech call is happening now, starting at 4pm.
Nitesh Turaga (15:59:48): > Meeting link: https://meet.google.com/xoa-eqym-uxc - Attachment (meet.google.com): Meet > Real-time meetings by Google. Using your browser, share your video, desktop, and presentations with teammates and customers.
Vince Carey (16:08:07): > i am on now
Nitesh Turaga (16:45:47): > Ok, so everything Leo is back up again.
2019-02-14
Martin Morgan (09:21:21): > @channel Mo has asked for a bi-weekly update to communicate to the powers that be. Can you provide a (one-sentence) update on your recent activities? Best framed in the context of our ‘milestones’ https://docs.google.com/document/d/1fKvxUPZleDFfEcRzCe-HM4HmZ3pMKsLV3T6dNTPVL7k, which project members should / can have access to. Start or add to a thread for this message. > > I’ll try to use my slack-fu to make this an automatic reminder…
Martin Morgan (09:23:39): > set up a reminder “Please provide one-sentence updates on progress over the last two weeks” in this channel at 9AM every other Thursday (next occurrence is February 21st), Eastern Standard Time.
Martin Morgan (09:28:04): > set up a reminder “Please briefly summarize your AnVIL activities over the last two weeks” in this channel at 6AM every other Thursday (next occurrence is February 28th), Eastern Standard Time.
Nitesh Turaga (12:22:20): > Should we just be writing it here in slack?
Martin Morgan (12:36:44): > I’ll eventually record the summary here https://docs.google.com/document/d/1W0g2uBQeBaURAf2JMaobI-E7ef6zyLZU8Nvb9_b-s9I/edit but post to Slack if that’s more convenient. Please do so by 2pm (I told Mo that I’d have our paragraph by 3).
Nitesh Turaga (12:55:18): > I’ve been working on code to localize/delocalize/install packages in buckets in the AnVIL package. There are also some minor changes I’m making in the Docker images to make them more maintainable. They are currently stable with the Bioc 3.8 and 3.9 versions.
Vince Carey (13:21:24): > BJ has done substantial work on authentication to Leonardo, Terra, Dockstore, and Gen3. The solutions involve a mix of OAuth2 and bearer-in-request-header, and diverse upstream resources need to be visited to retrieve authentication tokens. Ultimately the user’s installation of AnVIL will include all necessary tokens for use with functions that interact with the remote APIs. BJ’s fork (devel branch) has a `tags` method that uses swagger metadata to group method sets: > > > tags(terra) > List of 18 > $ Billing :'data.frame': 6 obs. of 2 variables: > $ Entities :'data.frame': 14 obs. of 2 variables: > $ GA4GH Tool Registry :'data.frame': 7 obs. of 2 variables: > $ Groups :'data.frame': 7 obs. of 2 variables: ... >
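The `tags` method groups an API’s operations by the `tags` recorded in its swagger metadata. A base-R sketch of that grouping idea, not BJ’s implementation: the operation names and summaries below are hypothetical (not taken from the Terra API), and each operation is assumed to carry a single tag, although swagger allows several:

```r
# Hypothetical operation metadata, as might be extracted from a swagger spec.
ops <- data.frame(
    operation = c("listProjects", "getEntity", "createEntity", "listGroups"),
    summary = c("List billing projects", "Get an entity",
                "Create an entity", "List groups"),
    tag = c("Billing", "Entities", "Entities", "Groups"),
    stringsAsFactors = FALSE
)

# Split operations by tag into a named list of two-column data frames,
# similar in shape to the `tags(terra)` output above.
group_by_tag <- function(ops) {
    lapply(split(ops[, c("operation", "summary")], ops$tag),
           function(df) { rownames(df) <- NULL; df })
}

by_tag <- group_by_tag(ops)
str(by_tag, max.level = 1)
```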
Vince Carey (13:31:38): > We are writing a vignette to clarify the current state of authentication. It would be nice to have some commitments that the framework is not going to change. There are 4 different services with different approaches. Vince has been dialoguing with the Dockstore group about translation of Bioc workflows to Dockstore workflows. A crude example is available.
Sehyun Oh (14:23:41) (in thread): > Hi Vince! Can you share that example?
Vince Carey (14:38:46) (in thread): > Hi Sehyun, it is at https://dockstore.org/my-workflows/github.com/vjcitn/vardemo/AnnotatingWGSVariantsWithBioc
Vince Carey (14:58:23): > @Martin Morgan @Nitesh Turaga @Levi Waldron I have a quick question about Docker in AnVIL, specifically in FireCloud. What Docker image should be used to work with bioc-devel? I have tried a few and run into fatal errors connected with zlib, htslib, etc. If git2r is needed, it takes a very long time to install. I know I should take more careful notes and will do so once I get your guidance.
Martin Morgan (15:02:18): > Thanks all for your bi-weekly input, tremendous progress! I think we should have a show-and-tell (everyone spend five minutes demo’ing their favorite result so far) some time next week to coordinate activities. Please click thumbs up / thumbs down if that sounds like a good / bad idea; sad face if Wed 10am would not be a good time
Martin Morgan (15:03:07): > Should have tagged @channel on the last message
Martin Morgan (15:07:28) (in thread): > Start with https://github.com/Bioconductor/AnVIL_Docker/tree/master/rstudio/bioc_3.9, but this is a ‘bare bones’ image and I guess your problem is missing system libraries? I’ll ask @Nitesh Turaga to work on an ‘all-of-Bioc’-capable image, starting from Levi’s existing image and with input from you.
Vince Carey (15:14:51) (in thread): > So the WDL reference would be bioconductor/anvil-rstudio-bioc:3.9. I’ll see where this gets me and report back.
Vince Carey (15:50:55) (in thread): > Using this, we have
Vince Carey (15:51:46) (in thread): > > ERROR: dependency ‘Rsamtools’ is not available for package ‘GenomicAlignments’ > * removing ‘/usr/local/lib/R/site-library/GenomicAlignments’ > ERROR: dependencies ‘Rsamtools’, ‘GenomicAlignments’ are not available for package ‘rtracklayer’ > * removing ‘/usr/local/lib/R/site-library/rtracklayer’ > ERROR: dependencies ‘rtracklayer’, ‘Rsamtools’ are not available for package ‘BSgenome’ > * removing ‘/usr/local/lib/R/site-library/BSgenome’ > ERROR: dependency ‘rtracklayer’ is not available for package ‘GenomicFeatures’ > * removing ‘/usr/local/lib/R/site-library/GenomicFeatures’ > ERROR: dependencies ‘Rsamtools’, ‘rtracklayer’, ‘BSgenome’, ‘GenomicFeatures’, ‘Rhtslib’ are not available for package ‘VariantAnnotation’ > * removing ‘/usr/local/lib/R/site-library/VariantAnnotation’ > ERROR: dependency ‘GenomicFeatures’ is not available for package ‘TxDb.Hsapiens.UCSC.hg19.knownGene’ > * removing ‘/usr/local/lib/R/site-library/TxDb.Hsapiens.UCSC.hg19.knownGene’ > ERROR: dependency ‘BSgenome’ is not available for package ‘BSgenome.Hsapiens.UCSC.hg19’ > * removing ‘/usr/local/lib/R/site-library/BSgenome.Hsapiens.UCSC.hg19’ > ERROR: dependency ‘VariantAnnotation’ is not available for package ‘cgdv17’ > * removing ‘/usr/local/lib/R/site-library/cgdv17’ > ERROR: dependency ‘VariantAnnotation’ is not available for package ‘PolyPhen.Hsapiens.dbSNP131’ > * removing ‘/usr/local/lib/R/site-library/PolyPhen.Hsapiens.dbSNP131’ > ERROR: dependencies ‘VariantAnnotation’, ‘cgdv17’, ‘TxDb.Hsapiens.UCSC.hg19.knownGene’, ‘BSgenome.Hsapiens.UCSC.hg19’, ‘PolyPhen.Hsapiens.dbSNP131’ are not available for package ‘variants’ > * removing ‘/usr/local/lib/R/site-library/variants’ >
Vince Carey (15:51:56) (in thread): > Trouble starts with > > Makefile.Rhtslib:128: warning: overriding recipe for target '.c.o' > /usr/local/lib/R/etc/Makeconf:166: warning: ignoring old recipe for target '.c.o' > cram/cram_io.c:57:19: fatal error: bzlib.h: No such file or directory > #include <bzlib.h> > ^ > compilation terminated. > make[1]: *** [cram/cram_io.o] Error 1 > make: *** [htslib] Error 2 > ERROR: compilation failed for package ‘Rhtslib’ > * removing ‘/usr/local/lib/R/site-library/Rhtslib’ >
Vince Carey (15:54:26) (in thread): > That’s the basic problem – no Rsamtools and some key dependencies can’t be installed.
Sehyun Oh (17:47:15) (in thread): > Hi again @Vince Carey! When I run this Dockstore workflow on FireCloud, it fails with the error message below. Could you help me resolve this? Thanks!
Sehyun Oh (17:47:25) (in thread): - File (Binary): 73c5496c-f101-4680-b17d-4f2dcf759d7f_task1_89d68b1c-1ebb-42ee-8676-4bdcca9ae4ac_call-doVariantWorkflow_stderr
Levi Waldron (21:43:38): > Great idea, I’m just in study section next Weds and Thurs!
Levi Waldron (21:53:03) (in thread): > If you open a shell on the waldronlab/bioconductor_devel image, take note of what you do to install the dependencies (just run R in the same shell to keep it simple), let me know and I’ll add it to the image.
Levi Waldron (21:55:14) (in thread): > But I think you should already be able to install these on that image. I could add “all of Bioconductor” images with the AnVIL base?
Nitesh Turaga (21:58:32): > Question for the team, is anyone able to launch any image using leo??
Nitesh Turaga (21:59:24): > All the images I try to launch go into an error state with this error: > > "errors": [ > { > "errorMessage": "Initialization action failed. Failed action 'gs://leoinit-nitesh9-5bc7bf3f-a759-4811-81ea-f3478486f2a9/init-actions.sh', see output in: gs://leostaging-nitesh9-9c181497-0f19-428b-9c5e-a9714625bba3/google-cloud-dataproc-metainfo/8dbc3e55-22ac-4b31-b884-8423753bc6cd/nitesh9-m/dataproc-initialization-script-0_output", > "errorCode": 3, > "timestamp": "2019-02-15T02:58:23Z" > } > ], >
Nitesh Turaga (21:59:48): > It’s a google bucket which is created along with every dataproc cluster.
2019-02-15
Martin Morgan (10:29:36) (in thread): > Using the waldronlab/bioconductor_devel image probably unblocks Vince; I asked @Nitesh Turaga to create an ‘official’ variant that will appear under Bioconductor/AnVIL_docker; I’m asking Nitesh to do this so that the form of the Dockerfile is consistent with what he’s already done.
BJ Stubbs (11:52:18): > Rob Title is looking into it. He replied to my post in the leonardo channel on the anvil slack.
Nitesh Turaga (11:52:39): > Ok, great. Thanks BJ
BJ Stubbs (14:52:59): > http://tamaszilagyi.com/blog/2018/2018-08-06-kubernetes-parallel/ - Attachment (tamaszilagyi.com): Parallelizing R code on Kubernetes > Kubernetes who? The hype around kubernetes is real, but likely also justified. Kubernetes is an open-source tool that facilitates deployment of jobs and services onto computer clusters. It provides different patterns for different type of workloads, be it API servers, databases or running batch jobs. Not only makes kubernetes running workloads and services easy, it also keeps them running.
BJ Stubbs (14:53:29): > This looks like a good tutorial on getting batch R stuff running in K8S. We will give it a try and report back
2019-02-16
Martin Morgan (03:52:44): > The AnVIL-provided R image uses the pbdZMQ package (implementing ZeroMQ); this, in addition to Redis, is I think commonly used in k8s and fits with the classic batch-parallel processing model and the other pbd* packages.
Vince Carey (09:08:07) (in thread): > > * installing *source* package 'BSgenome.Hsapiens.UCSC.hg19' ... > ** R > ** inst > Warning in file.append(to[okay], from[okay]) : > write error during file append > ** byte-compile and prepare package for lazy loading > ** help > Warning in gzfile(file, mode) : > cannot open compressed file '/usr/local/lib/R/site-library/BSgenome.Hsapiens.UCSC.hg19/help/paths.rds', probable reason 'No such file or directory' > Error in gzfile(file, mode) : cannot open the connection > ERROR: installing Rd objects failed for package 'BSgenome.Hsapiens.UCSC.hg19' > * removing '/usr/local/lib/R/site-library/BSgenome.Hsapiens.UCSC.hg19' > * installing *source* package 'cgdv17' ... >
Vince Carey (09:08:29) (in thread): > disk space limitation?
Vince Carey (09:09:05) (in thread): > in summary, the dockstore workflow is failing many ways at this time, but a week or two ago it worked fine…
Vince Carey (09:10:22) (in thread): > Clearly I need to reduce dependencies in my packages, but I am not sure this one can be avoided; it is useful to have the reference sequence.
2019-02-19
Vince Carey (06:50:37): > @Levi Waldron The Dockstore app for variant annotation is at https://dockstore.org/workflows – if you type “WGS” into the search workflows box, you should come to https://dockstore.org/workflows/github.com/vjcitn/vardemo/AnnotatingWGSVariantsWithBioc:master?tab=info and the WDL is visible under the Files tab.
Vince Carey (06:51:20): > We can also get to the WDL through the AnVIL package’s `dockstore$` API … the documentation needs to be written … @BJ Stubbs has demonstrated this.
2019-02-20
Martin Morgan (06:29:18): > @channel I started a document for agenda / notes for today’s 10am meeting at https://docs.google.com/document/d/1e4Hs94UgBYSoyllotS-rHIVX5X7NT-I-ce1UpRJZK9U. Mostly hoping for a friendly discussion of what we’re all working on so that we can coordinate / be productive together. Feel free to add links / notes on your activities. We’ll meet at https://bluejeans.com/230794278
Nitesh Turaga (12:08:19): > Regarding the access of workspaces through Terra for people whose “fc-credits” (free credits) have run out > > Hi Nitesh, > > Thank you for using Terra and giving us valuable feedback! We are working on migrating all billing infrastructure to Terra but while we work on this functionality, the workaround would be to add your GCP billing account in FireCloud which then would allow you to see the billing account options in Terra when you attempt to make/clone a new workspace. > > Please let us know if you have any other questions or suggestions! > > Sushma >
Vince Carey (20:46:28): > Here is the problem with Dockstore:FireCloud that seems only to come up with waldronlab/bioconductor-devel … when based on release, the workflow succeeds. > > * installing *source* package ‘BSgenome.Hsapiens.UCSC.hg19’ ... > ** R > ** inst > Warning in file.append(to[okay], from[okay]) : > write error during file append > ** byte-compile and prepare package for lazy loading > ** help > Warning in gzfile(file, mode) : > cannot open compressed file '/usr/local/lib/R/site-library/BSgenome.Hsapiens.UCSC.hg19/help/paths.rds', probable reason 'No such file or directory' > Error in gzfile(file, mode) : cannot open the connection > ERROR: installing Rd objects failed for package ‘BSgenome.Hsapiens.UCSC.hg19’ > * removing ‘/usr/local/lib/R/site-library/BSgenome.Hsapiens.UCSC.hg19’ > * installing *source* package ‘cgdv17’ ... > ** R > ** data > ** inst > Warning in file.append(to[okay], from[okay]) : > write error during file append > Warning in file.append(to[okay], from[okay]) : > write error during file append >
Vince Carey (20:47:15): > It seems to me this is a disk space exhaustion. I specify that the runtime should have 40GB disk but it may not be accessible to the R installation.
Vince Carey (20:52:46): > The issues I mentioned with bzlib seem not to be present any more.
Vince Carey (21:56:54): > The space constraint for package installation has been solved using a runtime statement in the WDL: bootDiskSizeGb: 50
Vince Carey (21:58:40): > @BJ Stubbs @Shweta Gopal we can now proceed with the parameterized WDL. I think we still have to watch out for bugs in locateVariants() at present, however.
BJ Stubbs (22:03:55): > @
BJ Stubbs (22:05:43): > Awesome, we will sally forth.
2019-02-26
Martin Morgan (09:06:03): > BJ – for Dockstore, if I authenticate at https://dockstore.org/accounts and add a ‘token’ (hmm, but from which method, google, dockstore, github, …?) into services/dockstore/auth.json, then when I try to create .Dockstore(Service(…)) a page opens in Google telling me that client_id is missing. I guess it’s from https://github.com/Bioconductor/AnVIL/blob/master/R/authenticate.R#L68 where auth.json doesn’t have this information… ??
Martin Morgan (09:06:15): > Should have tagged that @BJ Stubbs
BJ Stubbs (09:39:03): > I clicked on the eye icon under the Dockstore listing, then a token should appear. Create an auth.json file containing {"token" : "yourtoken"} in the service directory for Dockstore. The code in the dockstore.R file should build the rapiclient with this bearer token in the header.
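BJ’s recipe stores the Dockstore token in a one-field JSON file and sends it as a bearer token in the `Authorization` header. A base-R sketch of that round trip, with a placeholder token and a temporary directory standing in for `services/dockstore`; `read_token` and `bearer_header` are hypothetical helpers, and the regular expression only handles this one-field file (a real implementation would use a JSON parser such as jsonlite):

```r
# Write a one-field auth.json, as described above (placeholder token).
auth_dir <- file.path(tempdir(), "services", "dockstore")
dir.create(auth_dir, recursive = TRUE, showWarnings = FALSE)
auth_file <- file.path(auth_dir, "auth.json")
writeLines('{"token" : "yourtoken"}', auth_file)

# Read the token back; the pattern captures the quoted value of "token".
read_token <- function(path) {
    txt <- paste(readLines(path, warn = FALSE), collapse = "")
    m <- regmatches(txt,
        regexec('"token"[[:space:]]*:[[:space:]]*"([^"]*)"', txt))[[1]]
    if (length(m) < 2L)
        stop("no token field in ", path)
    m[2]
}

# Header value to attach to each API request.
bearer_header <- function(token) c(Authorization = paste("Bearer", token))

bearer_header(read_token(auth_file))
```

With httr, such a named vector could be passed as `httr::add_headers(.headers = bearer_header(token))`.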
Martin Morgan (09:53:59): > OK, so this is because when the package is installed (when your code runs…) `interactive()` returns FALSE (https://github.com/Bioconductor/AnVIL/blob/master/R/authenticate.R#L57), so it never gets to line 68; on the other hand `devtools::load_all()` is interactive and then fails… thanks for the insight
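The behaviour Martin describes comes from gating the OAuth branch on `interactive()`: at install time the session is non-interactive, so that branch is skipped, while `devtools::load_all()` runs interactively and reaches it. A hypothetical sketch of such control flow, not the AnVIL package’s actual code; `get_credentials` is an invented name and `start_oauth` is a stand-in supplied by the caller:

```r
# Hypothetical control flow: prefer a stored token; only fall back to an
# interactive OAuth flow when a user is present to complete it.
get_credentials <- function(token_file,
                            is_interactive = interactive(),
                            start_oauth = function() stop("OAuth not configured")) {
    if (file.exists(token_file))
        return(list(source = "file", token = readLines(token_file, n = 1L)))
    if (!is_interactive)
        # e.g. during package installation: never try to open a browser
        return(list(source = "none", token = NULL))
    list(source = "oauth", token = start_oauth())
}

# Non-interactive session without a token file: nothing is attempted.
get_credentials(tempfile(), is_interactive = FALSE)$source
```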
Martin Morgan (10:14:32): > @BJ Stubbs how would I know (is there an endpoint I can invoke?) that I’m communicating and credentialed successfully?
BJ Stubbs (10:26:44): > rjson::fromJSON(httr::content(dockstore$getUser(), "text"))
BJ Stubbs (10:27:05): > If you are authenticated, this will return information about you
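A minimal R sketch of the token flow described above, assuming the jsonlite and httr packages are installed. The helper names (read_dockstore_token(), bearer_header(), whoami()) and the https://dockstore.org/api/users/user endpoint are illustrative assumptions, not code from the AnVIL package.

```r
# Hypothetical helper: read the token from an auth.json file of the form
# {"token": "yourtoken"}
read_dockstore_token <- function(path) {
    auth <- jsonlite::fromJSON(path)
    stopifnot(is.character(auth$token), length(auth$token) == 1L)
    auth$token
}

# Hypothetical helper: the bearer-token header a client would send
bearer_header <- function(token) {
    c(Authorization = paste("Bearer", token))
}

# Hypothetical check (assumed endpoint; needs network access): if the token
# is valid, the user endpoint returns information about the account
whoami <- function(token, url = "https://dockstore.org/api/users/user") {
    response <- httr::GET(url, httr::add_headers(.headers = bearer_header(token)))
    httr::stop_for_status(response)
    jsonlite::fromJSON(httr::content(response, as = "text", encoding = "UTF-8"))
}
```

Usage would be along the lines of token <- read_dockstore_token("services/dockstore/auth.json") followed by str(whoami(token)).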
Martin Morgan (17:15:00): > I implemented both the tags() and dockstore authentication functionality; tags() returns a tibble rather than a list. Also, fwiw, the functions flatten() and str() (and other ideas?) are there to make it easier to work with json, e.g., dockstore$getUser() %>% str() seems to be useful for things that are basically ragged, whereas flatten() is useful when the result can mostly be represented as a tibble. I’m not sure if I’ve got the functionality right, and spent a bit of time (outlined in the commit messages) revising the code to be a little more consistent with the overall package ‘philosophy’. > > Probably authentication needs to be revisited across these services!
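To illustrate the ragged vs tabular distinction without relying on AnVIL's own flatten() (whose implementation isn't shown here), a sketch using jsonlite: an array of records with the same fields simplifies to a data.frame, while nested, ragged content is easier to inspect as a list with str(). The JSON snippets are made up for illustration.

```r
# Records sharing the same fields: jsonlite simplifies them to a data.frame
records <- '[{"name": "wf1", "version": 1}, {"name": "wf2", "version": 2}]'
tabular <- jsonlite::fromJSON(records)

# Ragged, nested content: keep it as a plain list and look at its structure
ragged <- '{"user": "me", "tags": ["a", "b"], "profiles": {"github": {"id": 1}}}'
nested <- jsonlite::fromJSON(ragged, simplifyVector = FALSE)
str(nested, max.level = 2)
```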
2019-02-27
Martin Morgan (15:30:50): > <!channel> please make short updates to https://docs.google.com/document/d/1W0g2uBQeBaURAf2JMaobI-E7ef6zyLZU8Nvb9_b-s9I on progress during the last two weeks, by tomorrow (Thursday) noon Eastern. Happy to give permissions if needed…
2019-02-28
USLACKBOT (06:00:15): > Reminder: Please briefly summarize your AnVIL activities over the last two weeks.
Vince Carey (11:58:31): > @Martin Morgan @Levi Waldron The firecloud TCGA data seems to be in the form of one text file per assay type per subject. These text files can be read from google buckets; I have used cloudml::gs_copy to make the data accessible, but I don’t know if there are more convenient ways to read text from gs: resources. In any event, it seems to me we could populate a bucket with MAEs built from firecloud TCGA data, which would be more convenient. I will go through the exercise of making a SummarizedExperiment for RNA-seq for one tumor and then we can consider how to do this comprehensively.
Martin Morgan (12:52:40) (in thread): > @Nitesh Turaga is there a pull request to localize() gs buckets to the local file system? Is that what Vince would use?
Nitesh Turaga (13:05:01) (in thread): > There isn’t a pull request yet, it’s just on a branch: https://github.com/Bioconductor/AnVIL/tree/nitesh_dev (sync.R). > > Once you launch a notebook and install AnVIL, you should be able to use gsutil_cp(google_bucket = "<name>/<file>", target_path = "my_local_folder") for a single file/folder in the bucket, > > or localize(google_bucket, local_path), which will sync the entire folder.
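A rough sketch of what single-object copy vs whole-folder sync look like as gsutil commands driven from R with system2(). The helpers are hypothetical, not the code on the nitesh_dev branch; they assume gsutil is installed and authenticated on the instance, and they default to a dry run that only returns the command line.

```r
# Hypothetical helpers (not the AnVIL nitesh_dev code) that build gsutil
# argument vectors for the two cases described above
gsutil_cp_args <- function(source, destination) {
    # copy a single object, e.g. gs://<bucket>/<file> -> a local folder
    c("cp", source, destination)
}

gsutil_rsync_args <- function(source, destination) {
    # -m runs transfers in parallel; rsync -r recurses into sub-folders,
    # so the whole bucket "folder" is synced to the destination
    c("-m", "rsync", "-r", source, destination)
}

run_gsutil <- function(args, dry_run = TRUE) {
    if (dry_run) {
        # only report the command that would be run
        return(paste(c("gsutil", args), collapse = " "))
    }
    # shQuote() protects paths containing spaces from the shell
    status <- system2("gsutil", shQuote(args))
    if (status != 0L)
        stop("gsutil exited with status ", status)
    invisible(status)
}
```

For example, run_gsutil(gsutil_rsync_args("gs://my-bucket/data", "my_local_folder"), dry_run = FALSE) would sync a (hypothetical) bucket prefix into a local folder on whichever instance R is running on.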
Vince Carey (14:52:16) (in thread): > Thanks – I will try that. I am not really local however for this application – I am doing this in a firecloud notebook.
Nitesh Turaga (14:53:02) (in thread): > I see, I don’t think this is the right application then.
Martin Morgan (15:23:38) (in thread): > doesn’t localize() localize to the ‘local’ disk, so if you’re running in firecloud it’ll localize to whatever instance you’re running in firecloud?
Nitesh Turaga (15:25:59) (in thread): > yes, it does. I misunderstood Vince’s comment. localize will copy the file to the “jupyter/Rstudio” instance running on firecloud.
Martin Morgan (15:34:14) (in thread): > Thanks for the updates; seems very interesting work…
Levi Waldron (15:35:31): - File (JPEG): Terra being presented at NCI by Danielle Ciofani of the Broad
2019-03-01
Vince Carey (23:15:10) (in thread): > I think the only problem with this approach is that firecloud is just using R 3.5 at this time. Do we need 3.6 for the AnVIL package?
2019-03-03
Levi Waldron (10:25:31) (in thread): > Sorry I’m behind here. What’s the advantage to building SE and MAE objects from the single patient profiles in Firecloud, as opposed to just using curatedTCGAData? curatedTCGAData was a lot of work. We could even create a shared bucket for the ExperimentHub cache.
Vince Carey (14:45:24): > There is no advantage except that of synchrony with what Firecloud actually offers. I need to audit this more carefully, but I find that for COAD the curatedTCGAData image has fewer subjects. That motivated me to do the work in Firecloud. The two approaches are not mutually exclusive and I would say that it would be good to go ahead and put curatedTCGAData in some Firecloud-accessible form, perhaps in gs buckets.
Levi Waldron (16:37:17): > Right - I do have some regret over using RTCGAToolbox for the curatedTCGAData pipeline, but it seems like too much work to re-implement. It would be nice to have even some limited comparisons though. The pipeline for creating curatedTCGAData is at https://github.com/waldronlab/MultiAssayExperiment-TCGA.
Martin Morgan (19:16:35): > What does ‘put curatedTCGAData in some Firecloud-accessible form’ mean? Presumably this is available as a Bioconductor / ExperimentHub package in Firecloud? Or do you mean the underlying files, which are currently a mix of HDF5 (for methylation data) and rda, should be available for use outside R? It seems like the approach here would be to continue to make these available but as plain text files (e.g., csv, with separate row / column / assays, or maybe sqlite, or…) with more derived objects created ‘on the fly’. Or instead to take the extra step and develop a full (MariaDB?) database representation of this for truly dynamic queries… this sounds like a very big job (but not altogether uninteresting or un-useful…)
Vince Carey (19:56:53): > I think this needs discussion. I created a single SummarizedExperiment from the native TCGA content in Firecloud workspaces to see what was involved. This still needs to be checked – I noted that one clinical data file for one individual asserted to be available in a kind of manifest was not present where it was supposed to be. I would like to compare what I got to both curatedTCGAData and to Genomic Data Commons.
Vince Carey (19:57:59): > The database representation might not be necessary if we have access to BigQuery. But if we do not, or we don’t like the BigQuery representation, then this seems worth considering.
Vince Carey (19:59:19): > I think we should consider something in addition to just “letting users install and use curatedTCGAData” because, at least in the notebook setting, it does not seem straightforward to have a durable cache of resources, like the BiocFileCache or *Hub support we rely on when working on laptops.
Levi Waldron (20:00:16): > I think cBioPortal has done the work of putting TCGA in a database and providing API access, and I’m looking forward to providing an MAE wrapper to that. Not sure how it compares to the BigQuery database.
Levi Waldron (20:01:19): > I hope it will be straightforward to provide mountable volumes containing AnnotationHub / ExperimentHub caches and compiled packages.
Vince Carey (20:01:24): > Would you use DelayedArray interface? (for the cBioPortal interface)
Levi Waldron (20:03:29): > My current thinking is that it will use Rapiclient to construct ordinary matrices and DataFrames in SEs and MAEs. The API is really intended for small slices of the data, not bulk downloads.
Vince Carey (20:08:53): > OK. I don’t plan to do much more than verify what I did in the TCGA notebook. But the task of creating SEs from the Firecloud content seems pretty generic and could be done wholesale if there was interest. I wouldn’t want to do it if it duplicated some other Bioconductor approach, but the way of achieving persistent and shareable access to *Hub type resources is still not clear to me.
Martin Morgan (20:15:46): > Does the notebook remember state at all? So just storing the hub cache in the correct location? Alternatively, using AnVIL::localize / delocalize to push the cache to / retrieve from a google bucket, at least for individual users? I realize localize / delocalize is still a work in progress…
Levi Waldron (20:21:09): > FireCloud workspaces do, and they are convenient for letting you mount data volumes to local directories. But I guess any container running on GCP should be able to mount a public bucket to a local directory? I could look into this.
Levi Waldron (20:30:53): > @Vince Carey BTW I’m still worried that DelayedArray has an unsolved (and maybe unsolvable?) performance bottleneck in access of arbitrary rows. I think your question of how to achieve persistent and shareable access to *Hub type resources is something we should discuss carefully.
Martin Morgan (20:35:17) (in thread): > I think this is what localize / delocalize is doing, using gsutil. Just wanting us not to reinvent the wheel (of course inventing the right wheel…); the functionality is currently on a branch, at https://github.com/Bioconductor/AnVIL/blob/nitesh_dev/R/sync.R
Martin Morgan (20:37:52): > Maybe useful to remember that DelayedArray != HDF5Array, and the problem is in HDF5. That said, I think if the use case is to subset data that fits in memory, and then to work with it (rather than wanting to process data that is too large for memory), then one should just do that – get the subset and represent it as a plain-old matrix without any ‘delay’ involved.
Levi Waldron (20:53:26): > The faults lie with HDF5, but that distinction probably doesn’t matter to users. I’ll formalize and post a more realistic use case and benchmark to #bigdata-rep involving processing a dataset that (could be) too large for memory. In such a case I don’t know what other answers there may be.
Kasper D. Hansen (20:56:58): > (There have been some recent fixes to rhdf5)
Kasper D. Hansen (20:57:37): > Leaving that aside, if you think about it, it is not clear that any file format which can store “really big” data has random access to rows and columns.
Kasper D. Hansen (20:58:08): > Not that I will defend HDF5; there could be many problems with implementation
2019-03-04
Vince Carey (17:46:52): > VJC update for this week. See https://github.com/vjcitn/mmSCQC/blob/master/README.md
Vince Carey (17:52:38): > Very rudimentary. Upside is that you do not need to install R or any packages to do a QC of single cell data found in EBI ArrayExpress. Just need docker and cwltool or cromwell. NB I made my own docker image; I know we will want to do that differently but working from simple images led to long delays as things like RSQLite and Rhtslib were installed. Happy to take any guidance on this.
Vince Carey (17:55:37): > The simpleSingleCell workflow seems like a good example for workflow language as there are multiple separable, orderable tasks. Need some ideas on how to add syntax to the Rmd to foster derivation of WDL or CWL (+R).
2019-03-05
Martin Morgan (16:25:38): > CCDG CRAMs in buckets – oh oh, CRAM hasn’t (yet) been implemented in Rsamtools; we’ve only recently had an updated Rhtslib with those capabilities…
Martin Morgan (16:36:23): > Am I just being dense about what we are targeting, terra vs. gen3 vs …?
Nitesh Turaga (16:36:45): > No, I think it’s a very good question
Nitesh Turaga (16:36:56): > I’m not sure why Gen3 is the face of AnVIL really
Nitesh Turaga (16:37:03): > It’s just a data source
Vince Carey (16:44:10): > @Nitesh Turaga you have lots of API/app progress?
Nitesh Turaga (16:45:11): > Not sure I have any. Martin already touched on most of my work these last two weeks.
Vince Carey (16:45:37): > ok …
Nitesh Turaga (16:46:14): > What kind of API progress did you have in mind? I will be merging the “sync” branch of AnVIL R package this week.
Vince Carey (16:46:43): > @Martin Morgan maybe we would not work directly with CRAM, but instead have variants called from the CRAMs into individual-level VCFs
Vince Carey (16:47:00): > That is use an external tool to call from CRAM
BJ Stubbs (16:47:37): > Are the tech call folks aware of the anvil package? I don’t see a link anywhere
Nitesh Turaga (16:48:27): > Hmm, I think it’s been touched on two weeks ago in the call.
Martin Morgan (16:49:11) (in thread): > yes, sounds fine for bulk-processing use cases
Vince Carey (16:49:36): > Are we ready for users? staging.anvil.io looks like something one could actually use. Or is this only accessible to whitelisted developers in this project?
BJ Stubbs (16:53:45): > I can authenticate against it using the api, but I am having issues mapping the endpoints. We can probably clear it up with some convos with the right people.
2019-03-08
Martin Morgan (15:37:42): > <!channel> if you’re interested in getting access to the eMerge data set for development (not research) purposes, please thumbs-up here
2019-03-13
Vince Carey (15:58:17) (in thread): > @Martin Morgan Are you still looking for updates by Thursday noon? Don’t want to miss any deadlines
Martin Morgan (22:39:04): > Yes, updates at https://docs.google.com/document/d/1W0g2uBQeBaURAf2JMaobI-E7ef6zyLZU8Nvb9_b-s9I/edit?usp=sharing I think there’ll be an automated reminder from slack in a few hours…
2019-03-14
USLACKBOT (06:00:29): > Reminder: Please briefly summarize your AnVIL activities over the last two weeks.
2019-03-19
Vince Carey (14:23:15): > Anvil activities: see google doc
Vince Carey (14:23:31): > Are we expecting a call at 4pm today?
Sehyun Oh (16:06:25): > I also thought we have one, but it seems not…
2019-03-22
Vince Carey (11:05:52) (in thread): > any progress on data access?
Martin Morgan (11:28:39) (in thread): > I passed this on to Mo, who said he’d update when there was news… no update from Mo, but I’ll ping him anyway…
Martin Morgan (11:52:43) (in thread): > “We have submitted our application and are awaiting response from eMerge.”
2019-03-25
Martin Morgan (13:29:21): > Would we like to have another conference call, maybe on Friday at noon? Thumbs up or down; if that doesn’t work then I’ll do a doodle poll for next week…
Martin Morgan (13:31:07): > <!channel>on the above…
2019-03-26
Martin Morgan (08:10:07): > <!channel> we’ll meet Friday March 29 at 12pm, Eastern at https://bluejeans.com/516331493
Nitesh Turaga (16:48:45): > I thought that was a very good meeting, especially how we actually get “something” done in Terra. Thanks @Vince Carey.
Martin Morgan (17:20:21): > yes @Vince Carey it was very valuable for you to drive it in the direction you did
Vince Carey (17:21:51): > :blush:
2019-03-28
USLACKBOT (06:00:28): > Reminder: Please briefly summarize your AnVIL activities over the last two weeks.
Martin Morgan (06:03:14): > <!channel> – updates should go to https://docs.google.com/document/d/1W0g2uBQeBaURAf2JMaobI-E7ef6zyLZU8Nvb9_b-s9I/edit
2019-03-29
Nitesh Turaga (13:07:37): > https://github.com/Bioconductor/AnVIL/projects
Nitesh Turaga (13:07:38): > Maybe here?
Vince Carey (14:27:25): > Yes, I think that could work. Maybe transfer key elements from Martin’s spreadsheet. As long as we have a small number of columns and keep the card sets focused, it is worth a try.
Martin Morgan (14:37:37): > i’ll create a separate repo for this, for the overall Bioc AnVIL project rather than this particular package.
2019-03-30
Martin Morgan (20:12:56): > I cloned rapiclient to https://github.com/Bioconductor/AnVIL_rapiclient (the package in the repository is still called rapiclient). It has a specific version number with extension -1, -2, … to indicate that it is a derivative of a particular version of the upstream package. > > The https://github.com/Bioconductor/AnVIL package specifies the precise version of rapiclient required; see https://github.com/Bioconductor/AnVIL/README.md. I tried to improve the show method for the clients. @BJ Stubbs are there specific operations that are failing? Can you open issue(s) on the AnVIL package?
BJ Stubbs (20:14:47): > I can probably fix the code. I will fork our version and make it work with our api.
BJ Stubbs (20:15:08): > Then submit a pull request
2019-03-31
Martin Morgan (16:00:18): > I updated https://github.com/Bioconductor/AnVIL_Admin as a place for us to coordinate and communicate Bioc / AnVIL development activities. > > Projects https://github.com/Bioconductor/AnVIL_Admin/projects > - Backlog – Year 1 contains a high-level overview of year 1 objectives. This should mostly be considered ‘read only’ > - Active Features is meant to summarize what each of us is currently working on – most of us should be able to add and track our own cards. Please try to keep to just one or two active cards, with bullets & checkboxes to summarize progress (rather than many cards, one per task) so that the overall gestalt is clear. > > docs/ – the markdown files here get processed (on commit) to https://bioconductor.github.io/AnVIL_Admin > - I put some preliminary material up, including two ‘Current activities’ that I thought could be a model for other activities – a link to some sort of resource where additional information can be found, plus a short description (to be updated regularly) on the current status of the activity. > > Please > - Update the Active Features project with current activities > - Update docs/index.md with brief summaries of what you’re doing. Feel free to make these via pull requests if you want a quick review before posting. > - Make suggestions, etc, as issues on AnVIL_Admin - Attachment (bioconductor.github.io): Bioconductor / AnVIL > Administration repository for AnVIL project
2019-04-01
Vince Carey (10:51:14): > This looks good @Martin Morgan – the link between cards and docs is a little foggy to me. I just added a card directly. Is that OK on its own or should a card be accompanied by a contribution to docs/?
Vince Carey (10:52:14): > FWIW here’s my card
Vince Carey (10:53:09): > I guess markdown doesn’t work here?
Martin Morgan (10:58:42): > Cards don’t need to be tied to docs/ contributions. I think of docs/ as the public face of the work-in-progress, so if you want to advertise your work then modify docs/. Apparently limited markdown in slack: https://get.slack.help/hc/en-us/articles/202288908-Format-your-messages - Attachment (Slack Help Center): Format your messages > Use formatting to add clarity and detail to your message when you need it. There are two ways to format your text, depending on which Slack app you’re using: Format as you type: On desktop and mo…
Vince Carey (11:04:05): > OK, I had not visited https://bioconductor.github.io/AnVIL_Admin/ … that is very nice. I hope we can make it more dynamic and attract lots of hits. - Attachment (bioconductor.github.io): Bioconductor / AnVIL > Administration repository for AnVIL project
Vince Carey (13:44:54): > @Martin Morgan have you considered the projects setup as a pattern that other teams should follow? – certainly would be nice to see James’ group’s activities in such a layout.
Vince Carey (13:45:04): > Release early and often.
Martin Morgan (14:43:46): > I’ll bring this up at the project manager meeting on Wednesday
2019-04-02
Vince Carey (15:17:57): > is there a 4pm tech call today?
Vince Carey (15:41:16): > I see that there is not: https://docs.google.com/document/d/1XcTR3rDFP4oE_4Ggl1WfD7nRfKPSHmWaE2_amIjfuk4/edit?usp=sharing
2019-04-08
Martin Morgan (09:46:32): > @Shweta Gopal (or others in the <!channel>) I wanted to get up to speed on kubernetes essentials today at 11 or 12; any chance of a meeting, maybe at https://bluejeans.com/504661320?
Shweta Gopal (10:24:37): > Hi Martin, Yes 11 or 12 pm would be okay. Thanks.
Martin Morgan (10:50:37): > Ok let’s meet at 11 at https://bluejeans.com/504661320
Shweta Gopal (10:51:41): > Sorry, I think 12 would be better. Would that be okay?
Vince Carey (10:51:54): > Martin have you engaged Sean Davis on this? I have had a brief interchange on K8S with him and https://vjcitn.github.io/chanKubernetes/ collects some of the points he raised. - Attachment (Channing / Kubernetes): Channing Network Medicine / Kubernetes at Landmark > project arrangement attempt for k8s methods
Vince Carey (10:53:06): > I’ll be happy to get on the call today … but I would say we might want to do a bit more homework to set up a presentation … whatever you like
Martin Morgan (10:55:33): > yes 12 would be fine; I’m just looking for some self-education and am happy to spend the informal time today; @Sean Davis is of course welcome to join!
Shweta Gopal (10:56:25): > @Martin Morgan thanks!
Sean Davis (10:57:02): > @Sean Davis has joined the channel
Sean Davis (10:57:13): > Happy to join you all.
Levi Waldron (10:58:16): > See you there.
Vince Carey (12:35:49): > slides by Shweta and BJ: https://docs.google.com/presentation/d/1Y7g_6X8I6DPaNK84EzWNo1wVpfAwdORGt6kcgcPYOV4/edit?usp=sharing
Martin Morgan (13:04:30): > so what’s helm ?
Martin Morgan (13:21:01): > helm: templating system for complicated kubernetes deployments, e.g., https://github.com/CloudVE/galaxy-kubernetes/blob/v3/galaxy/templates/service.yaml; galaxy to be presented tomorrow
Sean Davis (14:38:07): > With regard to helm and managing kubernetes configurations: https://www.reddit.com/r/kubernetes/comments/b4wigh/what_are_some_best_practices_for_organizing/ - Attachment (reddit): r/kubernetes - What are some best practices for organizing kubernetes yaml when dealing with multiple microservices in a project? > 17 votes and 16 comments so far on Reddit
2019-04-09
Vince Carey (06:10:29): > https://kubespray.io/#/, discussed at https://opensource.com/article/19/3/bringing-kubernetes-bare-metal-edge - Attachment (kubespray.io): Kubespray - Deploy a Production Ready Kubernetes Cluster > Deploy a Production Ready Kubernetes Cluster - Attachment (Opensource.com): Bringing Kubernetes to the bare-metal edge > New Kubespray features enable Kubernetes clusters to be deployed across next-generation edge locations.
Vince Carey (15:54:28): > Tech call today, right?
2019-04-10
Martin Morgan (03:20:29): > set up a reminder “Please briefly summarize your AnVIL activities over the last two weeks https://docs.google.com/document/d/1W0g2uBQeBaURAf2JMaobI-E7ef6zyLZU8Nvb9_b-s9I/edit?usp=sharing” in this channel at 6AM every other Thursday (next occurrence is tomorrow), Eastern Daylight Time.
Vince Carey (13:16:36): > CloudRun was referenced in the call yesterday: https://www.zdnet.com/article/google-cloud-platform-launches-cloud-run-aims-to-bring-enterprise-workloads-to-serverless-kubernetes/ - Attachment (ZDNet): Google Cloud Platform launches Cloud Run, aims to bring enterprise workloads to serverless, Kubernetes | ZDNet > Google’s aim for Cloud Run is to make it easier to run more enterprise workloads via containers, integration, and serverless functions.
2019-04-11
USLACKBOT (06:00:24): > Reminder: Please briefly summarize your AnVIL activities over the last two weeks https://docs.google.com/document/d/1W0g2uBQeBaURAf2JMaobI-E7ef6zyLZU8Nvb9_b-s9I/edit?usp=sharing
2019-04-16
Vince Carey (21:46:56): > Hi – i tried to update my AnVIL and got a strange > > Error: package or namespace load failed for 'AnVIL' in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]): > namespace 'rapiclient' 0.1.2.0002 is being loaded, but == 0.1.2.2.2 is required > Error: loading failed > recover called non-interactively; frames dumped, use debugger() to view > Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) : > namespace 'rapiclient' 0.1.2.0002 is being loaded, but == 0.1.2.2.2 is required >
Vince Carey (21:48:28): > I am seeing Version: 0.1.2.0002-2 at https://github.com/Bioconductor/AnVIL_rapiclient/blob/master/DESCRIPTION
Vince Carey (21:48:48): > And I see a 0.1.3 for rapiclient at @BJ Stubbs github repo
Vince Carey (21:49:31): > I think I am at the master branch of https://github.com/Bioconductor/AnVIL.git
2019-04-17
Martin Morgan (07:08:33): > Is your AnVIL current? I see a dependency there on rapiclient 0.1.2.0002-2, which matches rapiclient
Vince Carey (08:30:02): > mystery … i don’t know where this 0.1.2.2.2 requirement came from. i removed both AnVIL and rapiclient, reinstalled, and the problem went away
Martin Morgan (08:50:08): > R reports 0.1.2.0002-2 in a ‘normal’ form 0.1.2.2.2. The 000x form seems to come from the tidyverse people; seems like a mistake. The requirement for 0.1.2.0002-2 is from the AnVIL package, kept in sync between master AnVIL and master Bioconductor/rapiclient whenever the master branch of rapiclient changes
Martin Morgan (08:55:00): > <!channel> – about version numbering – we’re a fork of bergant/rapiclient, which is at version 0.1.2.0002. Since ideally we’d like to provide upstream pull requests, the idea is to provide versions that differ from the base but do not force a version bump – that’s the job of the upstream package maintainer. So we add another field to the version, using -1, -2, etc. for our contributions. Going to something like 0.1.3, as BJ did, isn’t really a good idea; at some point we might expect the upstream package to increment to 0.1.3, and then there are two implementations of the same version. Also, versions (at least once ‘published’) are generally one-way – a version 0.1.3 can’t be replaced by version 0.1.2.0003, because R (and Bioconductor, and humans) would recognize 0.1.3 as ‘more recent’ than 0.1.2.0003.
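A short illustration of the normalization and ordering described here, using base R's package_version(). It also suggests why the earlier loading error appeared: an upstream rapiclient 0.1.2.0002 normalizes to 0.1.2.2, which is not equal to the required 0.1.2.2.2.

```r
# R splits version strings on "." and "-" into integer components, so
# leading zeros are dropped and "-" is treated like "."
v <- package_version("0.1.2.0002-2")
as.character(v)
# [1] "0.1.2.2.2"

# this is the normalized form reported in the "== 0.1.2.2.2 is required" error
v == package_version("0.1.2.2.2")
# [1] TRUE

# the upstream version without the -2 suffix is a different version
package_version("0.1.2.0002") == v
# [1] FALSE

# comparison is component-wise, so 0.1.3 always sorts after any 0.1.2.x
package_version("0.1.3") > package_version("0.1.2.0003")
# [1] TRUE
```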
BJ Stubbs (12:48:27): > Interesting. What version should I change my fork to before issuing a pull request?
Martin Morgan (13:11:19): > yeah, a good question. My logic above says 0.1.2.0002-2.1 if it’s based on 0.1.2.0002-2
Kasper D. Hansen (13:57:24): > does it make sense to talk with the rapiclient author to get more … sane versioning
Vince Carey (14:18:10): > Oh, that point about normalization of version numbers is important. I was dumbfounded.
Vince Carey (14:21:11): > Quick question about anvil/leo – I am unclear about the role of leo – are there examples of bioinformatic processes that use leo? The main api topics are cluster and notebooks. Should we write a workflow (for internal demonstration at this point) that uses the AnVIL package to do something?
Martin Morgan (14:27:49) (in thread): > Yes I’ll try to do that; I think he’ll be accommodating. > > Part of the sane version numbering has to do with forks-of-forks-of repositories. I don’t really know what the best practice is, especially since version number is in many ways identical to (not easily ordered) git hash.
Martin Morgan (14:34:02) (in thread): > This sounds like ‘why leo / why terra?’ and if so I think it’s useful to ask what leo is giving us – I think flexibility (you can use ‘any’ docker container). If the flexibility came almost for free (you didn’t have to think about how to spin up your specialized docker container…) then it would seem to be a good thing. So to that extent I think it would be interesting to come up with a demonstration that uses AnVIL to spin up a workflow or jupyter notebook on an ‘arbitrary’ docker image. > > FWIW there was a post by Gábor Csárdi on the R-package-devel mailing list where he said rhub::check(<package-file>, platform = "debian-clang-devel") would do R CMD check using an arbitrary image. Didn’t seem to work in practice, but seemed like a pretty interesting idea
Vince Carey (15:01:28) (in thread): > When you write “uses AnVIL” you mean the AnVIL package, right?
BJ Stubbs (15:22:47) (in thread): > Leo and workflows is a bit interesting. I think you would need an rstudio container with docker and cromwell or cwltool installed. Then, you could run the cwltool system command from rstudio to run the workflow. Notebooks you can just push to the bucket or import in jupyter i think
Sean Davis (16:02:57) (in thread): > For workflows on cromwell (server), take a look at: https://seandavi.github.io/wdlRunR/.
Vince Carey (18:13:55) (in thread): > Yes, wdlRunR takes care of cromwell/wdltool acquisition and positioning. So we could definitely get mileage out of that. One issue we have with cromwell in AnVIL is that the deployment there does not allow use of BigQuery – there is a “scopes” restriction that has to be relaxed. This impedes convenient demonstration of TCGA analyses that use ISB-CGC image. The issue is known and filed but the time for solution is unknown.
Vince Carey (18:21:47) (in thread): > Could we have an agreement that bergant uses 0.x.y, when we fork 0.x.y we use 0.x.y.z, z a simple integer. In our group, if a pull request is generated on 0.x.y.z the DESCRIPTION file uses 0.x.y.(z+1) – there could be collisions among pull requests and the owner of 0.x.y.z (implicitly Martin) resolves them to a new value of z … the resolution process could be hard but probably will not be, and certain requests could be deferred for resubmission as z increases.
Martin Morgan (19:41:50) (in thread): > The current scheme is 0.x.y.z (bergant) and 0.x.y.z-w (AnVIL), with our increments on w. Certainly we could change the "-" to a ".". Probably it’s still useful to think of a scheme for fork-of-fork-of-fork, which is what BJ has; presumably he doesn’t want to version as w+1, because perhaps someone else will get a pull request in before he’s ready… I’ll ask bergant about simplifying version numbering to 0.x.y, along with a minor pull request tomorrow or Friday.
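A quick check of the fork-of-fork ordering in base R, with version strings chosen for illustration: since "-" and "." are both component separators, 0.x.y.z-w and 0.x.y.z.w compare identically, and appending a further component (as a fork of the AnVIL fork might) sorts after its base but before the next AnVIL increment.

```r
# upstream (bergant), AnVIL fork, a fork of the AnVIL fork, next AnVIL increment
versions <- package_version(c("0.1.2.0002", "0.1.2.0002-2",
                              "0.1.2.0002-2.1", "0.1.2.0002-3"))

# versions compare component by component; a version that extends another
# with extra components sorts after it
sorted <- sort(versions)
print(sorted)

# "-" and "." are interchangeable separators, so switching the AnVIL scheme
# from 0.x.y.z-w to 0.x.y.z.w does not change the ordering
package_version("0.1.2.0002-2.1") == package_version("0.1.2.0002.2.1")
# [1] TRUE
```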
2019-04-19
Sean Davis (08:16:21) (in thread): > Just to be clear, the “cromwell” that is used with wdlRunR is a standalone server. The scopes used are customizable.
Vince Carey (13:12:06): > Will we need to write an interface to rawls for AnVIL package?
Martin Morgan (13:42:33): > Yes i’ll do that. I’ll also update the authentication approach; I’ll add to the project board
Martin Morgan (13:46:48): > @BJ Stubbs if you’ve got bug fixes it works better (for me) if these come as simple pull requests rather than saving things up in a big complicated one
Martin Morgan (13:56:34): > <!channel> it would be good to see a workflow without a Broad expert guide; maybe BJ or someone else would like to do that some time next week? Maybe express availability via emojis: Tues :+1: 1-2 or :tada: 3-4, or Wed :clap: 1-2 or :ok_hand: 3-4? Plus general updates?
BJ Stubbs (14:01:43): > If we change the auth approach, I think we should make sure the user sees which account they will log in as first
Nitesh Turaga (14:01:51): > /poll Time for next week? “Tuesday 1-2” “Tuesday 3-4” “Wednesday 1-2” “Wednesday 3-4”
Unknown User (14:01:51):
2019-04-20
BJ Stubbs (00:39:10): > Fyi for 2 weeks i teach tues 3-5.50
2019-04-22
BJ Stubbs (13:50:46): > Hmm, discussion on the anvil slack client api channel brings up an interesting point. If we hard wire the apis into the package, then release schedules will prevent us from updating if the devs change things. Should we have an annotation hub like solution to fetch new specifications if they are detected?
Vince Carey (13:54:50): > IMHO we need a continuous integration framework and ‘devel’/‘release’ streams for the API. We need a test suite that informs us that our developments are or are not consistent with the prevailing API and any proposed changes. Is this something to ask for?
Sean Davis (14:14:58): > Parts of the API can be discovered if using swagger/openapi. It would be useful to think about how the package can adapt to changes. For example, the GDC API and data model are quite complex, but the GenomicDataCommons package uses built-in discovery methods, making the package pretty stable even as the underlying data model evolves.
Sean Davis (14:15:45): > That said, testing seems like a necessity.
Martin Morgan (14:16:02): > <!channel> let’s aim for Wednesday 1-2 for a meeting, at https://bluejeans.com/801233735
Martin Morgan (14:18:06): > I think we no longer have to ‘hardwire’ the api, we can read the json / yaml from the web. Of course bioc versions older than current release / devel fall off the maintenance bandwagon, but that’s because the upstream API changes…
Vince Carey (14:19:41): > In the world of testing it seems natural to consider a mock server that can run locally … the available data for responses would be sharply curtailed but you would have enough to distinguish valid from invalid requests for a given deployment of the API. Is this done?
Martin Morgan (14:33:24): > I have no experience implementing mock services in this way; I don’t have a good sense for how comprehensive / helpful they can be, especially when the interface is generated from the advertised API ?
Sean Davis (16:00:03): > There are two different flavors of testing here. The API itself needs tests, both for conformance to the stated schema, etc., and functional testing to check validity of data and behavior. The client may implement some of those tests as well, but further testing that is specific to the client (returns a data frame, etc.) is perhaps somewhat orthogonal to the API-focused testing.
Sean Davis (16:00:11): > As for mocking, see: https://app.swaggerhub.com/help/integrations/api-auto-mocking
Sean Davis (16:02:15): > With a fleshed-out OpenAPI/Swagger spec, a mock server can be set up using SwaggerHub. @Vince Carey, I haven’t checked to see if a mock server can be run locally, but I don’t necessarily see a need for that except for offline work.
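Vince's "distinguish valid from invalid requests" idea can be illustrated locally in a few lines. The sketch below is a minimal Python stand-in for a mock endpoint. It checks a request against the parameters a spec operation declares, then returns either a 400 with the problems found or a canned example. This is not how SwaggerHub auto-mocking is implemented. The operation definition and the `x-mock-response` key are made up for illustration.

```python
# Hypothetical operation definition, shaped like a Swagger 2.0 operation object.
CREATE_CLUSTER = {
    "operationId": "createClusterV2",
    "parameters": [
        {"name": "googleProject", "in": "path", "required": True, "type": "string"},
        {"name": "clusterName", "in": "path", "required": True, "type": "string"},
        {"name": "clusterRequest", "in": "body", "required": True},
    ],
    # Made-up canned response returned for any valid request.
    "x-mock-response": {"status": "Creating"},
}

TYPE_CHECKS = {"string": str, "integer": int, "boolean": bool}


def mock_call(op, request):
    """Return (http_status, body): 400 with a list of problems, or 200 with the canned example."""
    problems = []
    for param in op["parameters"]:
        name = param["name"]
        if name not in request:
            if param.get("required"):
                problems.append(f"missing required parameter: {name}")
            continue
        expected = TYPE_CHECKS.get(param.get("type"))
        if expected is not None and not isinstance(request[name], expected):
            problems.append(f"{name} should be {param['type']}")
    declared = {p["name"] for p in op["parameters"]}
    problems += [f"unknown parameter: {key}" for key in request if key not in declared]
    if problems:
        return 400, {"errors": problems}
    return 200, op["x-mock-response"]


print(mock_call(CREATE_CLUSTER, {"googleProject": "p"}))
```

A client test suite could run against something like this offline. Conformance and functional testing of the real API would still be separate, as Sean points out.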
2019-04-24
Martin Morgan (09:57:33): > @channel Reminder: AnVIL team meeting, the workflows edition, at https://bluejeans.com/801233735 today at 1pm
Vince Carey (15:38:29): > Thanks again to BJ for a very comprehensive and concrete illustration of the API functionalities. I believe his slides will be linked into the github.io site soon if they are not there already.
Vince Carey (15:39:43): > I would like to get clearer on the relationship of leonardo to terra and Bioconductor. My attitude to date is that the best way to get Bioconductor users comfortable in terra is to build jupyter notebooks embodying key workflows, and to share the workspaces holding those notebooks.
Vince Carey (15:40:18): > However, issues with getting access to R 3.6 are a problem for that outlook, as they would bar us from using current images of the key software, annotation, and data.
Vince Carey (15:41:46): > Therefore an approach to using terra that is container-oriented seems to be essential. The sooner it becomes clear how to do that, the better – particularly if the notebook interface can be used.
Martin Morgan (16:04:22): >
```
> tags(leonardo, "cluster")
# A tibble: 8 x 3
  tag     operation         summary
  <chr>   <chr>             <chr>
1 cluster createCluster     Creates a new Dataproc cluster in the given project…
2 cluster createClusterV2   Creates a new Dataproc cluster in the given project…
3 cluster deleteCluster     Deletes an existing Dataproc cluster in the given p…
4 cluster getCluster        Get details of a Dataproc cluster
5 cluster listClusters      List all active clusters
6 cluster listClustersByPr… List all active clusters within the given Google pr…
7 cluster startCluster      Starts a Dataproc cluster
8 cluster stopCluster       Stops a Dataproc cluster
```
> allows you to create / start / etc an arbitrary container
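The `tags()` output above is essentially a table of (tag, operationId, summary) rows, read from the `tags` and `summary` fields of each operation in the spec. The rough Python analogue below shows one way to compute that table. It is not the AnVIL implementation. The spec fragment borrows two summaries from the tibble above, while its paths and the `getStatus` operation are hypothetical.

```python
# Hypothetical spec fragment; summaries mirror the tibble above, "getStatus" is made up.
SPEC = {
    "paths": {
        "/api/cluster/{googleProject}/{clusterName}/start": {
            "post": {"operationId": "startCluster", "tags": ["cluster"],
                     "summary": "Starts a Dataproc cluster"}
        },
        "/api/cluster/{googleProject}/{clusterName}/stop": {
            "post": {"operationId": "stopCluster", "tags": ["cluster"],
                     "summary": "Stops a Dataproc cluster"}
        },
        "/status": {
            "get": {"operationId": "getStatus", "tags": ["status"],
                    "summary": "Hypothetical status endpoint"}
        },
    }
}


def tags(spec, tag=None):
    """Return sorted (tag, operationId, summary) rows, optionally for one tag only."""
    rows = []
    for item in spec.get("paths", {}).values():
        for op in item.values():
            if not isinstance(op, dict) or "operationId" not in op:
                continue  # skip non-operation entries such as a shared "parameters" list
            for t in op.get("tags", ["untagged"]):
                if tag is None or t == tag:
                    rows.append((t, op["operationId"], op.get("summary", "")))
    return sorted(rows)


for row in tags(SPEC, "cluster"):
    print(*row, sep="  ")
```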
Vince Carey (20:50:46): > Thanks. The R method seems hard to debug:
```
> leonardo$createClusterV2(googleProject="anvil-leo-dev", clusterName="vincemmsc", rstudioDockerImage="vjcitn/mmsc1:latest") -> oo
Error in parameters[[which(parameter_idx)]] :
  recursive indexing failed at level 3

Enter a frame number, or 0 to exit

1: leonardo$createClusterV2(googleProject = "anvil-leo-dev", clusterName = "vi
2: get_message_body(op_def, x)

Selection: 0
```
2019-04-25
USLACKBOT (06:00:29): > Reminder: Please briefly summarize your AnVIL activities over the last two weeks: https://docs.google.com/document/d/1W0g2uBQeBaURAf2JMaobI-E7ef6zyLZU8Nvb9_b-s9I/edit?usp=sharing
2019-04-29
Martin Morgan (11:44:06): > Both AnVIL_rapiclient and AnVIL have been updated to address the problem Vince mentions above. There were actually two parts to the fix: supporting multiple “in:” parameters, and providing a convenient representation for JSON `{}`, which is the symbol `AnVIL::empty_object`. So
```r
leonardo$createClusterV2(
    googleProject="anvil-leo-dev", clusterName="vincemmsc",
    rstudioDockerImage="vjcitn/mmsc1:latest",
    labels = empty_object
)
```
> should work
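A dedicated symbol is needed because an empty R `list()` does not say whether it means a JSON array `[]` or an object `{}`. jsonlite, for example, typically writes it as `[]`. The same idea can be sketched in Python as a sentinel value. The hypothetical request-body builder below drops unset (`None`) fields but always emits the sentinel as `{}`. `EMPTY_OBJECT`, `message_body` and the `unsetOption` field are illustrative only and are not part of AnVIL or Leonardo.

```python
import json


class _EmptyObject:
    """Sentinel meaning 'send an empty JSON object {}' (hypothetical analogue of AnVIL::empty_object)."""

    def __repr__(self):
        return "EMPTY_OBJECT"


EMPTY_OBJECT = _EmptyObject()


def message_body(**fields):
    """Serialize request fields, dropping unset (None) values and mapping the sentinel to {}."""
    body = {}
    for key, value in fields.items():
        if value is None:
            continue  # optional field not supplied: leave it out of the body
        body[key] = {} if value is EMPTY_OBJECT else value
    return json.dumps(body, sort_keys=True)


print(message_body(rstudioDockerImage="vjcitn/mmsc1:latest", labels=EMPTY_OBJECT, unsetOption=None))
```

The point of the sentinel is that "explicitly empty object" and "not supplied" stay distinguishable all the way to serialization.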
BJ Stubbs (15:04:14) (in thread): > This is due to rapi_client
BJ Stubbs (15:04:23) (in thread): > If you use my version, it should work
BJ Stubbs (15:04:46) (in thread): > https://github.com/bjstubbs/AnVIL_rapiclient
BJ Stubbs (15:05:20) (in thread): > The error is caused by the approach to parsing the parameters in the get_message_body function
2019-05-01
Vince Carey (07:38:49): > @Martin Morgan @Sehyun Oh @Nitesh Turaga @Levi Waldron should we write a brief joint document on data curation vs. data ingestion? I felt the call last night was disappointing in terms of the process in hand for decision-making about standards and tools for conformance checking. I will start a Google doc and provide a link soon.
Vince Carey (08:09:43): > Here is an early sketch of a data ingestion/curation note: https://docs.google.com/document/d/1TNwY-6E879m8PWOTIAaN0CvZnyrnSkccaw-NGGy-3Y8/edit?usp=sharing
Samuela Pollack (09:19:23): > @Samuela Pollack has left the channel
2019-05-02
BJ Stubbs (16:38:22): > Odd. I think I tried to create a Leo cluster the other day, and it worked fine. But there is no billing account associated with that Google project, so why would it work? I can see the compute instance I created on the account with no billing.
BJ Stubbs (16:42:36): > Also, a shiny app to kill and delete clusters might be a useful addition to the api
2019-05-03
Martin Morgan (09:30:22): > For the Shiny app, is the starting point just some function `cluster_statistics()` or something to return a tibble with instances and their states, and `cluster_manage()` to change state? Just wrapping the Leo API in a more R-friendly way. Obviously the Shiny app is built on top of this (personally I would opt for command line rather than Shiny, so I don’t want to have to reverse-engineer the Shiny app…)
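The command-line layer Martin describes can be sketched as two small functions. One flattens a cluster listing into rows with a count per state, and the other maps a user-facing action onto an API operation. `cluster_statistics()` and `cluster_manage()` are only names floated in the discussion, not existing functions. In this Python sketch the listing's field names and status values are hypothetical stand-ins for a Leonardo `listClusters` response. The operation names are taken from the `tags()` output above.

```python
from collections import Counter

# Hypothetical listClusters-style response; real payloads carry many more fields.
clusters = [
    {"clusterName": "alpha", "googleProject": "proj-a", "status": "Running"},
    {"clusterName": "beta", "googleProject": "proj-a", "status": "Stopped"},
    {"clusterName": "gamma", "googleProject": "proj-b", "status": "Running"},
]


def cluster_statistics(listing):
    """Flatten a cluster listing into (project, name, status) rows plus a count per status."""
    rows = sorted((c["googleProject"], c["clusterName"], c["status"]) for c in listing)
    return rows, Counter(c["status"] for c in listing)


def cluster_manage(call, project, name, action):
    """Dispatch start/stop/delete to an API-calling function `call(operationId, **params)`."""
    operations = {"start": "startCluster", "stop": "stopCluster", "delete": "deleteCluster"}
    if action not in operations:
        raise ValueError(f"unsupported action: {action}")
    return call(operations[action], googleProject=project, clusterName=name)


rows, counts = cluster_statistics(clusters)
for row in rows:
    print(*row)
print(dict(counts))
```

Passing the API caller in as `call` keeps the wrapper testable without a live service. A Shiny (or any other) front end could then sit on top of these two functions.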
Sean Davis (10:54:58): > +1 for the command-line API first.
2019-05-04
Vince Carey (09:34:06): > I think we should not let the “ingestion” topic die … somehow the dialogue Sean and I carried on in the comments section needs to be digested. I can try this, but before I do, @Martin Morgan @Levi Waldron have you had any thoughts? https://docs.google.com/document/d/1TNwY-6E879m8PWOTIAaN0CvZnyrnSkccaw-NGGy-3Y8/edit?usp=sharing
2019-05-05
Martin Morgan (18:57:22): > My feeling is that ‘they’ must already have very well-developed protocols for data ingestion, as they are essentially responsible for the GDC, and other resources? Probably this is so heavy and internalized that it is no longer ‘exposed’ to us – the resources that they’d like to ingest are very substantial, with teams on both contributor and ingestion sides involved. So I’m wondering whether you’re looking for a much lighter weight way to contribute user-level data resources, the efforts of some third party to make a curated data resource available? I’ll try to add a few comments to the doc…
2019-05-06
Vince Carey (12:04:34): > The ingestion protocol prototype document is very good but addresses concerns specific to contributions of data from arbitrary sources. There are data sets that AnVIL is obligated to house and process that need to be ingested – does the protocol need to be vetted and implemented before resources like CCDG are exposed to us? Perhaps, and then we just wait. The waiting is problematic because (IMHO) we have zero access to the endorsed indexed data views. We do not know whether patterns of data and annotation affordance common in Bioconductor will work smoothly with the endorsed representations. My experience with the Terra representation of TCGA and (legacy) 1000 genomes suggests that Bioconductor patterns will not deploy against these representations without significant work. But I really don’t know. That is why I am expressing a bit of impatience in this domain. Whatever work needs to be done to have Bioc users work efficiently with the AnVIL resources may be good for both AnVIL and Bioconductor. Or it may turn out that certain approaches used in Bioconductor (e.g., *Hub) present certain advantages, and ingestion and conveyance of Bioc-oriented resources could be generally beneficial in AnVIL. It would be nice to be gathering performance and feasibility information on these notions.
Martin Morgan (16:33:41) (in thread): > The TCGA & 1000 genomes representations in Terra are basically collections of files, right? And that seems pretty old-fashioned and far removed from our use cases? And there aren’t provisions for more modern or curated approaches [e.g., BigTable representations], for more ‘user’-oriented contributions, etc? I think I get the gist of the ingestion document, but am I understanding the goal or just bringing my own baggage?
Martin Morgan (16:58:11): > I did some weekend work on my vision of k8s: launch redis, generic workers, and an interactive front-end; see the kubernetes section k8s-redis-bioc-example in https://bioconductor.github.io/AnVIL_Admin/
2019-05-07
Nitesh Turaga (13:30:51): > I did some follow-up work on Martin’s k8s repo, https://github.com/nturaga/rstudio-k8s. I managed to get rocker/rstudio launched on k8s using a minikube cluster.
Vince Carey (16:07:53): > is there a call? how does one connect?
Nitesh Turaga (16:08:04): > https://meet.google.com/xoa-eqym-uxc
Martin Morgan (16:56:06): > Vince, if you (or anyone here…) are already a member of Dockstore, I’ll add you to the Bioconductor organization
Martin Morgan (16:56:23): > Should have been @Vince Carey above
Martin Morgan (17:31:25): > Great job @BJ Stubbs & @Shweta Gopal! Also the Robert Johnson reference, which had gone over my head…
BJ Stubbs (17:32:23): > That was@Vince Carey
2019-05-08
Martin Morgan (16:26:34): > Mo points to the NCBI Computational Medicine in the Cloud Hackathon https://ncbiinsights.ncbi.nlm.nih.gov/2019/05/08/computational-medicine-in-the-cloud-june-10-11-2019/ > 10JUN19 and 11JUN19, at JHU > Galaxy team participating via Visualization of Complex Variants > and asks whether there’s interest from other AnVIL teams? - Attachment (NCBI Insights): Computational Medicine in the Cloud Hackathon: June 10-11, 2019 > We are pleased to announce the first ever Computational Medicine in the Cloud Hackathon! NCBI will help run a bioinformatics hackathon in Baltimore, Maryland hosted by the Johns Hopkins University. > We’re specifically looking for folks who have experience in working with complex haplotypes, complex disease, precision medicine, and similar genomic analysis. If this describes you, please apply! This event is for researchers, including students and postdocs, who are already engaged in the use of bioinformatics data or in the development of pipelines for large scale genomic analyses from high-throughput experiments (please note that the event itself will focus on open access public human data). > Potential topics include: > Coherent-phenotype mapping to haplotypes > Mapping haplotype blocks to ontologies > Structural Variants in Health and Disease > Haplotypes and RNA-seq > Complex Variant Structure > Annotation Structures for Complex Variants > Visualization of Complex Variants > Hackathon Logistics > The hackathon runs from 9 am – 6 pm each day, with the potential to extend into the evening hours each day. There will also be optional social events at the end of each day. Working groups of five to six individuals, with various backgrounds and expertise, will be formed into five to eight teams with an experienced leader. These teams will build pipelines and tools to analyze large datasets within a cloud infrastructure.
On both days, we will come together to discuss progress on each of the topics, bioinformatics best practices, coding styles, etc. > There will be no registration fee associated with attending this event. > Note: Participants will need to bring their own laptop to this program. No financial support for travel, lodging, or meals is available for this event. > Datasets > Datasets will come from open access public repositories, with a focus on a number of trios produced by long read sequencing as a base graph and short read datasets in the sequence read archive that have been ported to cloud infrastructure, as well as derivative contigs of the above. > Products > All pipelines and other scripts, software, and programs generated in this hackathon will be added to a public GitHub repository designed for that purpose. > Manuscripts describing the design and usage of the software tools constructed by each team may be submitted to an appropriate journal such as the F1000Research hackathons channel, BMC Bioinformatics, GigaScience, Genome Research or PLoS Computational Biology. Ideally, we will present a graph genome, a number of protocols for associating short read omics data with it, and some derived datasets (e.g. variant calls) from such protocols. > Application > To apply, please complete this form. Initial applications are due on May 22, 2019 by 3 pm EDT. We will select participants based on the experience and motivation they indicate on the form. > Prior participants and applicants are especially encouraged to apply. The first round of accepted applicants will be notified on May 24 by 3 pm EDT, and will have until noon EDT on May 28 to confirm their participation (especially qualified applicants or those traveling internationally may receive acceptances earlier). If you confirm, you must be willing to commit to both days of the event, as confirming and not attending prevents other data scientists from attending this event. 
> Legal > Entrants retain ownership of all intellectual property rights (including moral rights) in the code submitted to as well as developed in the hackathon. Employees of the U.S. Government attending as part of their official duties retain no copyright in their work and their work is in the public domain in the U.S. > The Government disclaims any rights in the code submitted or developed in the hackathon. > Participants agree to publish the code and any related data in GitHub. > For more information, or with any questions, please contact Ben Busby ().
2019-05-09
USLACKBOT (06:00:29): > Reminder: Please briefly summarize your AnVIL activities over the last two weeks: https://docs.google.com/document/d/1W0g2uBQeBaURAf2JMaobI-E7ef6zyLZU8Nvb9_b-s9I/edit?usp=sharing
Vince Carey (11:15:11) (in thread): > Missed this. The representations look like separate files and that may be ideal for key use cases like running GATK or hail. I guess I am waiting for a guarantee that this is the ultimate approach – and then, if it is, to set up GenomicFiles-like interfaces, and then some benchmarking against various ad libitum/R-friendly approaches that we have employed.
2019-05-15
Martin Morgan (14:30:45): > @channel Mo says “Hi Martin, we now have access to the re-sequenced 1000 Genomes data, two subsets of CMG data, and a subset of CCDG data on Terra. These data are shared as workspaces in your Terra account.” Would be great to get some feedback about this if you have a chance…
BJ Stubbs (16:16:27): > Interesting. For the non-1000 Genomes workspaces, you should be able to clone them to get a writable bucket. The data sets contain links to the Google bucket CRAMs and CRAIs. These should be usable in WDL-based workflows and notebooks. We can try to get an example. The 1000 Genomes workspace is a bit more of a pain because it is “requester pays”.
BJ Stubbs (16:17:34): > (copied from anvil slack)
BJ Stubbs (16:17:55): > If you create a new workspace, call it my100G or something. Then download the data from the original anvil-datastorage/1000G-high-coverage-2019 as a TSV (this is just metadata). > You can upload that data to your new my100G workspace. > Now you have access to the sample URLs in your workspace, which has a writable bucket. > With the upgrade to Pipelines API version 2 that happened this week, Cromwell can now access requester-pays resources. > > So, if you run a WDL workflow on your new my100G workspace, Cromwell will eventually send the billing info for the workspace to the bucket, which will give you read access to the data. And since your workspace bucket is writable, you can get output from your workflow. > I am not sure how this works with notebooks at the moment. > I also only tried localizing data to the new workspace bucket; I am not sure if slice access works from within the WDL command.
2019-05-17
Vince Carey (07:26:01): > Just a little update here relative to CRAM. Googling on CRAM and Bioconductor leads to the Rhtslib vignette and a single mention; nothing in the support site. I have updated my samtools/htslib infrastructure and am planning to convert the RNAseqData.HNRNPC.bam.chr14 bam files to cram.
Martin Morgan (09:07:59): > Rsamtools has had no testing with CRAM; that’s an obvious need and we’ll prioritize that…
Vince Carey (21:18:20): > Apropos the Dockstore organization, I should be added with my GitHub identity, which is vjcitn …
2019-05-18
Vince Carey (01:53:42): > Apropos the vision for k8s, @Martin Morgan have you been able to run this in Google Cloud Platform? My efforts to run minikube there have come to grief. I have filled disks … I don’t seem to be able to select the cluster configuration sensibly.
Martin Morgan (07:27:06): > @Vince Carey I pushed some preliminary instructions for use on gcloud to the README. You need to stop minikube, start the gcloud Kubernetes service, and deploy the application. All the `kubectl` commands are the same once the gcloud service is started.
Martin Morgan (10:34:05): > (you’ll want to pull the repository)
Martin Morgan (16:32:38) (in thread): > yes vjcitn is a member of the Bioconductor organization on dockstore.
2019-05-19
Sean Davis (08:44:10): > And for the Google version: https://cloud.google.com/kubernetes-engine/docs/quickstart
2019-05-20
Vince Carey (06:54:06): > Indexed CRAMs for the RNAseqData.HNRNPC.bam.chr14 BAM files can be seen via
```
aws s3 ls s3://biocfound-cram/
```
> An example URL is https://s3.amazonaws.com/biocfound-cram/ERR127302_chr14.cram
2019-05-21
Jeff Gentry (21:31:29): > @Jeff Gentry has joined the channel
2019-05-23
USLACKBOT (06:00:13): > Reminder: Please briefly summarize your AnVIL activities over the last two weeks: https://docs.google.com/document/d/1W0g2uBQeBaURAf2JMaobI-E7ef6zyLZU8Nvb9_b-s9I/edit?usp=sharing
Martin Morgan (07:31:07): > @channel If you can add an update before noon today, that would be great!
2019-05-28
BJ Stubbs (12:09:46): > The AnVIL slack leonardo channel seems to say that the leonardo api in the AnVIL R package needs to be updated, because the leo server we were using may be permanently gone.
Sehyun Oh (12:39:03): > Hey @BJ Stubbs, could you make RStudio available through production Leo following Rob’s instructions?
Sehyun Oh (12:39:43): > It didn’t work for me and I’m wondering whether I missed something… ;(
Sean Davis (12:40:10): > I’ve found that these clients benefit from regular (daily or so) testing, not just testing at package update. Ideally, the CI pipeline for the Leonardo API would also include testing of the clients.
BJ Stubbs (12:55:48): > spinning up now, I’ll let you know if it creates correctly
BJ Stubbs (13:50:47): > I am having trouble getting this to work. I can start a cluster, but can’t get to rstudio
BJ Stubbs (13:50:49): > The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.
Sehyun Oh (13:52:32): > Thanks for trying, BJ. I got the same issue - cluster started but can’t get to rstudio.
Vince Carey (14:02:06): > Have you notified Rob Title on the AnVIL Slack? Maybe invite him over here?
BJ Stubbs (14:17:42): > good point
Martin Morgan (15:08:39) (in thread): > Can you be more specific, with a reproducible example showing what the problem is?
Nitesh Turaga (16:05:25): > No anvil meeting today??
BJ Stubbs (16:07:36): > My calendar says it was canceled
Nitesh Turaga (16:07:54): > I see! Thanks@BJ Stubbs
2019-05-29
Martin Morgan (08:45:18): > Something fun… https://terra.bio/tosc
Jeff Gentry (12:08:31): > Yeah, I was surprised to see that. I think it’s pretty cool, folks should consider poking at it
Jeff Gentry (12:09:34): > Also seeing the Leo discussion above, I’m not often able to directly answer Leo questions but try to keep enough of an eye on the anvil slack to point Rob at them when they come up.
BJ Stubbs (13:17:15): > The Dockstore API is now version 1.6; the AnVIL package has version 1.5.1. I am not sure how significant the changes are.
Martin Morgan (13:19:38): > I’ll update, also AnVIL_rapiclient to support online access of yaml and json; this means that we can access apis as they are published, rather than archived in AnVIL. Should get to this by the end of the day today.
BJ Stubbs (13:24:48): > Cool. I think that will really help with the Bioconductor release schedule. Speaking of updates, should we switch the Leo link to https://notebooks.firecloud.org/?
Jeff Gentry (13:52:55): > Yeah
2019-05-30
Vince Carey (09:38:16) (in thread): > I did sign up for this and was accepted. Waiting on next steps; I would be happy to work with the Galaxy group if R interfaces are of interest.
2019-06-04
Martin Morgan (09:39:53): > @channel It would be helpful to have a Bioc/AnVIL team meeting to communicate conversations at a ‘reverse site visit’ held yesterday at NHGRI and to come up with more formal directions for the next six months. Express your availability for:
> - :+1: Thurs June 13, 11am
> - :tada: Thurs June 13, 3pm
> - :clap: Mon June 17, 12pm
> - :ok_hand: Mon June 17, 2pm
> - :-1: None of the above
2019-06-06
USLACKBOT (06:00:27): > Reminder: Please briefly summarize your AnVIL activities over the last two weeks: https://docs.google.com/document/d/1W0g2uBQeBaURAf2JMaobI-E7ef6zyLZU8Nvb9_b-s9I/edit?usp=sharing
Martin Morgan (13:06:56) (in thread): > @channel I’ll send out an invite for Thurs June 13 at 11am
2019-06-13
Vince Carey (10:55:28) (in thread): > I am sorry, but I do not find an invite. Please paste the contact info into this Slack channel.
Martin Morgan (11:20:39) (in thread): > yeah, well, I didn’t send one out, so it didn’t make it on to my calendar either! Sorry, I’ll try again for a different time…
2019-06-17
Martin Morgan (10:59:26): > The AnVIL package 0.0.13 now has `gsutil_*()` (e.g., ls, cp, rm, rsync) and also `localize()` and `delocalize()` to help with synchronizing files on buckets and instances. There’s also an `install()` for package installation, but I wouldn’t use that at the moment… The help pages should be helpful.
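Conceptually, localizing and delocalizing are bucket-to-instance and instance-to-bucket syncs. The Python sketch below treats them as thin wrappers around `gsutil -m rsync -r`, where `-n` requests a dry run. It is an assumption about the general approach, not the AnVIL package's code. The function names mirror the AnVIL ones only for readability, and the bucket path is made up. By default the functions only build the command. Passing `run=True` executes it and requires `gsutil` to be installed.

```python
import shlex
import subprocess


def _rsync_cmd(source, destination, dry_run=False):
    """Build a recursive, parallel `gsutil rsync` command; -n asks gsutil for a dry run."""
    cmd = ["gsutil", "-m", "rsync", "-r"]
    if dry_run:
        cmd.append("-n")
    return cmd + [source, destination]


def localize(bucket_path, local_dir, dry_run=False, run=False):
    """Sync bucket contents onto the instance (bucket -> local). Hypothetical helper."""
    cmd = _rsync_cmd(bucket_path, local_dir, dry_run)
    if run:
        subprocess.run(cmd, check=True)
    return cmd


def delocalize(local_dir, bucket_path, dry_run=False, run=False):
    """Sync local results back to the bucket (local -> bucket). Hypothetical helper."""
    cmd = _rsync_cmd(local_dir, bucket_path, dry_run)
    if run:
        subprocess.run(cmd, check=True)
    return cmd


print(shlex.join(localize("gs://my-workspace-bucket/data", "data", dry_run=True)))
```

Defaulting to a dry run first is a cheap safeguard, because rsync-style tools can overwrite files at the destination.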
2019-06-27
BJ Stubbs (14:48:10): > I haven’t tested this yet, but it looks like terra notebooks created with leonardo have the bigquery scope
BJ Stubbs (14:48:13): >
```
"scopes": [
  "https://www.googleapis.com/auth/bigquery",
  "https://www.googleapis.com/auth/source.read_only",
  "https://www.googleapis.com/auth/userinfo.email",
  "https://www.googleapis.com/auth/userinfo.profile"
],
```
BJ Stubbs (14:48:30): > so, maybe if you do a
```r
Sys.setenv(GCS_OAUTH_TOKEN = system("gcloud auth application-default print-access-token", intern = TRUE))
```
BJ Stubbs (14:48:45): > you can use bigquery on your billing code
Sean Davis (14:55:19): > SRA and Biosample metadata are available on BigQuery if you need another dataset to play with: https://seandavi.github.io/2019/06/omicidx-on-bigquery/ - Attachment (seandavi(s12)): OmicIDX on BigQuery | seandavi(s12) > OmicIDX is a project to democratize access to omics metadata. As the sizes of omics repositories have grown into the millions of available samples, thinking of the metadata themselves as Big Data seems reasonable. Additionally, by making the metadata more fit-for-use for text mining, natural language processing, ingestion into machine learning or search engines, OmicIDX aims to facilitate augmentation and analysis of these metadata. In practice, the OmicIDX mines data from the NCBI Sequence Read Archive (SRA), updated monthly, and the NCBI Biosample database, updated daily.
2019-06-28
Aedin Culhane (15:56:21): > @Aedin Culhane has joined the channel
2019-07-02
BJ Stubbs (13:00:52): > I would like to delete all buckets associated with the anvil-leo-dev project since it is now defunct and billing is disabled, in order to prevent any unexpected charges. Any objections?
BJ Stubbs (13:01:14): > these were buckets created using the old leonardo test api
Nitesh Turaga (13:01:38): > I’m in favor.
2019-07-09
Nitesh Turaga (13:40:37): > @Vince Carey Do you and your group have an allocation on XSEDE already?
Vince Carey (13:42:04): > yes
Nitesh Turaga (13:43:27): > I’m trying to get started with stuff on XSEDE, and I was wondering if there is anything I should be doing?
2019-07-10
Vince Carey (06:08:06): > Right, I would like to see whether we can get the k8s-redis stuff working there, to start leveraging that against the HumanTranscriptomeCompendium package; https://zonca.github.io/2018/09/kubernetes-jetstream-kubespray.html may be relevant. I think you would build a public Jetstream image that can run the k8s-redis example, and once that is done we can set up some examples of scalably querying 180000 RNA-seq studies in this framework.
Nitesh Turaga (13:30:29): > Useful article for Dataproc stuff, which we’ll get to later on GCP: https://cloud.google.com/solutions/running-rstudio-server-on-a-cloud-dataproc-cluster
BJ Stubbs (14:26:12): > I am shutting down the anvil-leo-dev since it is not associated with a billing account and is defunct
2019-07-24
Martin Morgan (17:08:00): > I believe that AnVIL_GTEX_V8 is available now as a workspace; clicking through presents an ominous red box but apparently one just ignores it
2019-07-25
BJ Stubbs (21:23:03): > Should we switch to bearer token auth in the anvil api package? That way, the package would allow api access to terra from terra, which is not easy at the moment. We have been replicating api calls in httr so we can get entity access from a terra rstudio. There is a bit of confusion on the anvil slack on how to use the package. I will link to slides detailing what I think is the current approach
2019-07-26
Martin Morgan (01:48:23): > can you outline for me what that means, or point me to an appropriate resource that shows how to implement bearer token authentication?
2019-07-30
Vince Carey (13:21:22) (in thread): > FWIW I have made a small notebook that parses a bit of the VCF for 838 subjects. I shared it to Martin and BJ.
Martin Morgan (13:36:31) (in thread): > Is this something that could be mentioned on the technical wg call today? I’m at the CZI seed network meeting and won’t be able to attend (and as far as I know no-one from the Leo app integration group is prepared to present…)
BJ Stubbs (15:04:49): > Sure, sorry for the delay
BJ Stubbs (15:05:59): > Basically, as I understand it, we authenticate to Terra as our Gmail identity. In the current AnVIL package this is done by creating the OAuth app
BJ Stubbs (15:06:32): > But, terra creates resources for us (notebooks, rstudio, etc) under a service “pet” account
BJ Stubbs (15:06:53): > that account can authenticate to all terra/anvil resources that we have access to, but not in the same way
BJ Stubbs (15:07:14): > instead, when we are on the google compute platform, we should be using a bearer token
BJ Stubbs (15:07:29): > We can get one by executing this command on the node
BJ Stubbs (15:07:41): > gcloud auth application-default print-access-token
BJ Stubbs (15:07:57): > and we can use it by adding it into the header like we do with dockstore
BJ Stubbs (15:08:34): > so, to get a listing of workspaces we have access to in Terra from Terra, the following code should work
BJ Stubbs (15:08:53): >
```r
library(httr)

url <- "https://api.firecloud.org/api/workspaces"
# Short-lived bearer token for the identity on the current Google Cloud node
mytoken <- system("gcloud auth application-default print-access-token", intern = TRUE)
temp <- content(GET(url, add_headers(Authorization = paste("Bearer", mytoken))))
```
BJ Stubbs (15:10:52): > The problem is that this token is short-lived, and rapiclient binds the token to the object, so coming up with a refreshing strategy is nontrivial. It might be sufficient to make a small package of httr-based functions for common API calls, since I do not think people would need the full API in a compute environment
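One way around binding a short-lived token to the client object is to ask a small token holder for the header on every request and re-fetch the token once it is older than some threshold. The Python sketch below follows that pattern and uses the same `gcloud auth application-default print-access-token` command quoted above. The 30-minute refresh age is an assumption (Google access tokens typically last about an hour). The class is hypothetical and does not come from AnVIL or rapiclient. The fetcher and clock are injectable so the logic can be checked without gcloud.

```python
import subprocess
import time


def gcloud_token():
    """Ask gcloud for a short-lived access token (the command quoted above)."""
    out = subprocess.run(
        ["gcloud", "auth", "application-default", "print-access-token"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


class RefreshingToken:
    """Cache a token and re-fetch it once it is older than max_age seconds."""

    def __init__(self, fetch=gcloud_token, max_age=30 * 60, clock=time.monotonic):
        self._fetch = fetch
        self._max_age = max_age
        self._clock = clock
        self._token = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        if self._token is None or now - self._fetched_at >= self._max_age:
            self._token = self._fetch()
            self._fetched_at = now
        return self._token

    def headers(self):
        # Build the header at request time instead of storing it on the client object.
        return {"Authorization": f"Bearer {self.get()}"}
```

In use, each HTTP call would take `token.headers()` as its headers, for example against `https://api.firecloud.org/api/workspaces`, so an expired token is replaced transparently between calls.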
2019-08-05
Vince Carey (09:11:09): > It looks like our outlining of activities has flagged: https://docs.google.com/document/d/1W0g2uBQeBaURAf2JMaobI-E7ef6zyLZU8Nvb9_b-s9I/edit?usp=sharing
Vince Carey (09:26:59): > This also seems a little stale: https://github.com/Bioconductor/AnVIL_Admin/projects/3 … I think we are going to be hit with some real deadlines soon. This particular card-oriented layout seems OK, but I find it odd that “last date touched” or something related to age is not a top-level datum on cards. “Updated on May 9” is what I see at the very highest level.
Vince Carey (09:34:33): > I have added two “To Do” items there – 1) define best practices for provisioning terra workspaces/notebooks with R/bioc packages …, 2) produce a scientifically interesting example of working with public data in terra (GTEx? is it public?) using bioc and get the outreach group to blast/blog about it
2019-08-13
Martin Morgan (13:15:55): > @channel Let’s get together early next week to work through milestones and other activities in the near and long term. Please indicate availability for Monday 11am :+1:; noon :tada:; 2pm :clap:; or Tuesday 11am :ok_hand:; 2pm :wink:; or 3pm :face_with_monocle:
2019-08-14
Martin Morgan (00:26:19): > OK, we’ll meet Monday 19 August at noon Eastern at https://bluejeans.com/600792156. See you then!
2019-08-19
Vince Carey (12:02:06): > call is starting
2019-08-20
Nitesh Turaga (14:01:43): > Hi @Vince Carey, there is a Bioconductor image call now.
Vince Carey (17:18:26): > @Martin Morgan did you mention that there is an update to AnVIL APIs that would necessitate significant changes to the AnVIL package? We thought we heard something to this effect on the call with Adrian. Just checking.
Martin Morgan (17:40:01): > I was overly pessimistic; there’s a coordinated API in the works: https://raw.githubusercontent.com/anvilproject/client-apis/master/openapis/output/combined-apis.yml. It uses Swagger 2.0, which is fine for AnVIL; Marcel is using AnVIL_rapiclient for HCAMatrixBrowser, which is based on OpenAPI 3.0.2 and is problematic. It seems like rapiclient is convenient and useful, and should be updated to support OpenAPI (which is at least moderate work…)
Vince Carey (19:16:47): > OK, thanks!
2019-08-21
Vince Carey (00:45:21): > Marshalling Galaxy to do single-cell analysis (mostly scanpy-based at present, although scater is in the toolshed): https://www.ebi.ac.uk/gxa/sc/home - Attachment (ebi.ac.uk): Home > EMBL-EBI Single Cell Expression Atlas, an open public repository of single cell gene expression data
2019-08-26
Martin Morgan (12:04:19): > I believe we’re on the hook for 1/2 of the next (Tuesday) technical meeting. Topics ?
BJ Stubbs (12:38:19): > I am hoping to get a Leo instance up with R and samtools this afternoon. Then, hopefully, we can show how to use GenomicFiles, GenomicFeatures, Rsamtools, and GenomicAlignments to get the count matrices for a chromosome in a notebook without downloading the BAM files, as a demo of the Bioconductor paradigm
Martin Morgan (12:56:09) (in thread): > I’m almost finished a first round of updates to Rsamtools (hoping to push later today or tomorrow) that allows CRAM access. I intend to support gs / s3 remote access through a further tweak to Rhtslib later this week or next.
Nitesh Turaga (13:01:56) (in thread): > I’m working on testing the terra-jupyter-R image on Leo. We can potentially demonstrate something on that image as well: a simple example that showcases (1) a successful Bioc package installation and (2) a failed package installation, pointing out the need for more robust images.
Nitesh Turaga (14:49:59): > @BJ Stubbs or anyone, can you actually see the page https://leonardo.dev.anvilproject.org? I think it’s dead. > > I’m trying to test the image on Leonardo and I can’t seem to get access.
Nitesh Turaga (14:52:22): > https://broadworkbench.atlassian.net/wiki/spaces/AP/pages/100401153/Testing+notebook+functionality+with+Fiab Qi Wang from the Broad sent this to test the image on Leo. And I’m a little lost: what is “Fiab”? > > None of the links on that page are accessible. > > I’m posting here as opposed to the AnVIL page, just to see if anyone else in the group has an idea, before I ask on the other page.
BJ Stubbs (14:53:25): > I believe it is dead
BJ Stubbs (14:53:46): > it was a dev instance of leo, and when the funds ran out it stopped.
Nitesh Turaga (14:54:05): > ok. That makes sense.
BJ Stubbs (14:54:06): > If you have a billing account on terra, you can test the instances there
Nitesh Turaga (14:54:48): > I do, and how?
BJ Stubbs (14:54:53): > For jupyter, you can use the swagger, the api package, or terraplane (terrastation)
BJ Stubbs (14:55:00): > the fastest is the swagger
Nitesh Turaga (14:55:16): > Can you send me a link to the swagger api?
BJ Stubbs (14:55:22): > https://notebooks.firecloud.org/
BJ Stubbs (14:56:02): > in https://notebooks.firecloud.org/#!/cluster/createClusterV2
BJ Stubbs (14:56:43): > put in the billing account as googleproject, a name (there are rules, and I forget what they are, lowercase letters seem to work)
BJ Stubbs (14:57:02): > then in the clusterRequest put
Nitesh Turaga (14:57:10): > Yes, the rest is familiar. I will ask more questions as I have them:smile:
BJ Stubbs (14:57:19): > ah ok, good luck!
Nitesh Turaga (14:57:31): > Thank you.
BJ Stubbs (14:58:27): > One small thing that I didn’t know. If you modify the anvil docker starting with the docker, I think you need to add the end bit from the original dockerfile to make it work
Nitesh Turaga (14:59:16): > You mean the ENTRYPOINT?
BJ Stubbs (14:59:19): > yeah
Nitesh Turaga (15:00:12): > I just modified their file. It still has the ENTRYPOINT.
BJ Stubbs (15:01:20): > ah cool
Nitesh Turaga (16:45:22): > OK, so, as a clarification for people in this group in the future, there is a roundabout way to test the docker image(s) we create. > > Main: launch your docker image on Leonardo via https://notebooks.firecloud.org/#!/cluster/createClusterV2 > 1. Go to https://firecloud.terra.bio/# > 2. In “Options”, choose to see the classic FireCloud page. > 3. Go to “Notebooks Beta”. > 4. Create a notebook (call it whatever). > 5. Choose “Cluster” on the notebook you just created. The image launched through the Swagger UI on Leonardo should already be present. > 6. If you click that, you should see the Jupyter notebook with R. NOTE: R takes a few minutes to start up for some reason.
Nitesh Turaga (17:00:52): > The docker image I used is being developed here: https://github.com/DataBiosphere/terra-docker/tree/master/terra-jupyter-r
2019-08-27
Vince Carey (07:27:54): > I shared workspace+notebook https://app.terra.bio/#workspaces/landmarkanvil2/use_pkg_zip with Martin, Nitesh and BJ. It demonstrates retrieval of a zipped package set from gs to run the popstrat demo in under 2 minutes.
Vince Carey (07:34:30): > The zip file is 600 MB, decompressing to about 1.3 GB of software
Martin Morgan (08:49:47): > @Nitesh Turaga @Vince Carey @BJ Stubbs I started a slide presentation at https://docs.google.com/presentation/d/1V9ijFlTKFlaFAaRZppFAkE7UqdqBkp_BUypbhES9pxE/edit?usp=sharing; please add or link to your own presentations from there. I’ll start with a quick run-through of the first slide, then turn things over to Nitesh.
BJ Stubbs (12:01:26): > Will do. Do you mind if I alter the color theme?
BJ Stubbs (12:38:09): > Odd, jupyter notebook does not seem to be running the .onLoad code for the AnVIL package. It works in the terminal on Jupyter.
BJ Stubbs (13:03:38): > solved, it was the interactive call
Martin Morgan (13:10:07): > in the notebook, interactive() returns FALSE?
BJ Stubbs (14:10:08): > yep
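As found above, the Jupyter R kernel reports interactive() as FALSE, so start-up code guarded only by interactive() is silently skipped in a notebook while it still runs in a terminal session. A minimal sketch of a guard that can be overridden explicitly; the option name anvil.force_init and the helper should_initialise() are hypothetical, not part of the AnVIL package:

```r
## Hypothetical guard for package start-up work.
## interactive() is FALSE under the Jupyter R kernel, so an explicit
## option (hypothetical name: "anvil.force_init") lets notebook users opt in.
should_initialise <- function(force = getOption("anvil.force_init", FALSE)) {
    isTRUE(force) || interactive()
}

## Hypothetical .onLoad() hook that uses the guard
.onLoad <- function(libname, pkgname) {
    if (should_initialise())
        message("initialising ", pkgname)
    invisible(NULL)
}

## In a notebook: set the option before the package is loaded
options(anvil.force_init = TRUE)
.onLoad(libname = NULL, pkgname = "AnVIL")  # emits the message "initialising AnVIL"
```

The override is opt-in, so behaviour in non-interactive batch jobs (Rscript, workflows) is unchanged unless the option is set.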
Martin Morgan (15:51:21): > based on the order of slides & content, I updated the overview & slide deck, with a short intro (martin), docker / leonardo (nitesh), 1k genome (vince), GTEx (BJ)
Martin Morgan (16:37:22): > thanks @BJ Stubbs for the demo; can you point me to, or create a pull request with, bearer authentication?
Martin Morgan (16:38:15): > Also I liked the idea of at least exploring packrat as a tool for saving / restoring installed packages (does it save binary installations?)
Nitesh Turaga (16:39:00): > packrat in R is analogous to “virtualenv” in Python.
Marcel Ramos Pérez (16:44:55): > See also: renv, https://rstudio.github.io/renv/articles/renv.html
Nitesh Turaga (16:46:17): > That’s great @Marcel Ramos Pérez, I was just reading this: https://rstudio.github.io/renv/articles/renv.html#comparison-with-packrat
Martin Morgan (16:51:35): > It would be very helpful in our presentations to consistently use BiocManager::install() to install all packages, CRAN, Bioc, or github, with the sole exception of BiocManager…
Vince Carey (17:21:02): > oh, did I use install.packages()? Sorry …
Vince Carey (17:29:10): > @Nitesh Turaga @Levi Waldron @Marcel Ramos Pérez thanks for the pointers on renv/packrat. Maybe you have run into this: > > > renv::init("/Users/stvjc/abcd") > * Discovering package dependencies ... Done! > * Copying packages into the cache ... Done! > * Lockfile written to '~/abcd/renv.lock'. > Error in file(con, "w") : cannot open the connection > In addition: Warning message: > In file(con, "w") : > cannot open file '/Users/stvjc/Library/Application Support/renv/projects': No such file or directory >
Marcel Ramos Pérez (18:00:16): > Hi Vince @Vince Carey, I’m unable to reproduce the issue on a Mac. Do you have abcd on GitHub somewhere?
Vince Carey (19:22:41): > No. Is github necessarily involved? Are there some steps required to set up the ‘projects’ folder that it can’t find?
Marcel Ramos Pérez (19:27:49): > It’s not required although it would be helpful to have project code posted somewhere so that I can try to reproduce the error. I’ve tried it on my own project and it worked for me.
Vince Carey (21:18:25): > As far as I can tell, neither packrat nor renv has a vignette. I guess the “project” concept has something to do with RStudio. Treating that as implicit has thrown me. But I will try to use it.
Vince Carey (22:33:50): > > > install.packages("gee") > Installing package into '/Users/stvjc/renvd/packrat/lib/x86_64-apple-darwin15.6.0/3.6.1' > (as 'lib' is unspecified) > % Total % Received % Xferd Average Speed Time Time Time Current > Dload Upload Total Spent Left Speed > 0 100 123k 100 0 0 0 123k 0 0 546k0 --:--:-- --:--:-- --:--:-- 545k > > The downloaded binary packages are in > /var/folders/5_/14ld0y7s0vbg_z0g2c9l8v300000gr/T//RtmpWqTepp/downloaded_packages > > install.packages("BiocManager") > Installing package into '/Users/stvjc/renvd/packrat/lib/x86_64-apple-darwin15.6.0/3.6.1' > (as 'lib' is unspecified) > Error in install.packages : unknown input format >
Vince Carey (22:39:13): > RStudio is not helping me out tonight. It is good to know there is a systematic approach to the package archiving I conducted manually in the terra notebook. We can’t bank on RStudio at this point, so renv may not be relevant. I will look into packrat itself presently.
2019-08-28
Vince Carey (07:52:53): > I guess there aren’t many anvil jupyter notebook R users, but I have observed that there are situations in which it is useful to run gc() manually, when memory limits seem to be breached.
Vince Carey (10:10:57): > Progress with packrat: > > install.packages("BiocManager") > BiocManager::install("packrat") > system("gsutil cp gs://bioc_pkgs_aug_2019/popstr_packrat.zip .", intern=TRUE) > system("unzip popstr_packrat.zip") > library(packrat) > on() > .libPaths() >
Vince Carey (10:12:32): > There are two jupyter notebooks – populate_packrat and restore_packrat that illustrate this approach.
Marcel Ramos Pérez (10:30:50) (in thread): > They have a website “vignette”. > renv: https://rstudio.github.io/renv/index.html > packrat: https://rstudio.github.io/packrat/
Marcel Ramos Pérez (10:32:36) (in thread): > I was able to successfully run renv on Bioconductor/RaggedExperiment after removing the *.Rproj file.
Martin Morgan (11:19:27): > Was wondering whether GCS FUSE https://cloud.google.com/storage/docs/gcs-fuse was a viable solution for this? I know the Broad had hesitations, but they’d probably have hesitations about copying / unzipping too!
Vince Carey (13:15:19): > Jeff Gentry had been looking at FUSE. It seems a definite candidate for this problem, constituting a much more general solution.
Martin Morgan (13:28:06): > @BJ Stubbs I was reflecting on your presentation overnight (thanks, they are always stimulating!) and was wondering (a) whether some of the complexity is because the demos tackle too much; e.g., I would probably have been happy being told that you’d written a Dockstore workflow, and then seeing how one invokes it from within AnVIL, without the detail of creating it or even what the workflow does. And (b) whether the complexity of downloading / subsetting / uploading the sample descriptions could be avoided by doing these filtering operations as part of the workflow, taking the full set of data as input and then restricting to the samples of interest..? This might address two things: no need to perform transformations ‘out of band’, and perhaps no need to clone the workspace?
BJ Stubbs (15:15:43): > We can now use the API to get, filter, and create new tables in the workspace data object model from a notebook, which will cut down on the out-of-band work
Sehyun Oh (15:28:21): > @BJ Stubbs Is there any reason you are using a new participant (or sample) table instead of using a set entity?
BJ Stubbs (15:37:08): > @Sehyun Oh I don’t know how to create and upload sample set entities programmatically yet.
2019-08-31
Martin Morgan (13:19:12): > <!here> I made a significant change in AnVIL 0.0.17: the symbols terra, leonardo, … are no longer automatically created on package load. Instead, one has to start a session with, e.g., terra <- Terra().
2019-09-03
Martin Morgan (09:26:11): > Some internal commentary on Vince’s comments here: https://the-anvil.slack.com/archives/GM5C32K2P/p1567508271018100. We tried to come up with ‘topical’ docker images, but quickly ran into four problems. The first is the definition of the images, where the core team does not really have expertise (in proteomics or flow cytometry most clearly, but even in RNASeq differential expression: DESeq2? limma? both? additional packages? etc.). The second is the more-or-less immediate request that we add package ‘X’ because it is a central component of a particular individual’s workflow. The third is the challenge of maintaining each image, requiring that the packages and their dependencies install (obviously this price is paid either by the core team or by the user, and in some ways the core team is in a better position to pay it). The fourth is the ‘fragmentation’ of the user experience: which image am I supposed to use? > > I guess this is like a supermarket, where you can sell beer and chips and pretzels individually, or a ‘beer pretzel special’. We do actually have some approximation of the basket of packages individual IP addresses install, so one could probably arrive at an informed decision about this. > > I personally like the idea of binary installs of packages coupled with user persistence across sessions. Binary packages would require a commitment from the core team to make the binaries, probably as a product of the build system.
Vince Carey (09:44:42): > Good analysis; I will give a couple of rebuttals. Apropos the first and second comments: the user can always customize a session by adding package X in real time, so our topicals are just a proposed convenience. The topicals may live or die; the community will decide. I think you’ve solved the problem of the third comment: maintaining bioconductor_full (which is part of minimal) seems inevitable. Comment four is real, but we already give users that problem by the nature of the project. In summary, I think we want to be sure that the system, long term, is at least as easy to use as a laptop. And I think the persistence issues are a big part of ease of use and have to be costed and solved systemically. Therefore the general discussion should be conducted in AnVIL space, but perhaps not just at bioconductor-in-terra. Bioc is just an example of an ecosystem that will require some innovation to support at the AnVIL level, and the AnVIL team as a whole should figure out how to do it.
Nitesh Turaga (11:05:56): > Just a heads up to the bioc-anvil team: I’ve forked a couple of DataBiosphere repositories into the Bioconductor organization account. > 1. https://github.com/Bioconductor/terra-docker > 2. https://github.com/Bioconductor/leonardo (we only need this to run some automation tests on Leo for the images we publish, but I forked it into the Bioconductor organization account for consistency in what we contribute from our end).
Vince Carey (16:06:03): > Out of curiosity I used packrat to install the 5 packages listed in the terra-docker dockerfile. Resolving all dependencies in Bioc and CRAN led to installation of 110 packages from AnnotationHub to zlibbioc.
Martin Morgan (16:16:26): > Do you mean 110 rather than 134?
Vince Carey (16:44:04): > I got a total of 109 added through packrat, as far as I can tell. There are 29 that seem to be ‘base’. Thus the total package load seems to be 138 for 3.6.1.
Nitesh Turaga (16:50:08): > ok, so, as far as I can tell. There are a few things here. > > On the terra-jupyter-bioconductor image, > > > sapply(.libPaths(), function(x) length(rownames(installed.packages(x)))) > /home/jupyter-user/.rpackages /usr/local/lib/R/site-library > 169 0 > /usr/lib/R/site-library /usr/lib/R/library > 0 29 >
> There are 169 packages installed by “us” (broad and bioc). And there are 29 packages which come as base R.
Nitesh Turaga (16:57:02): > As far as I know, unless there is some difference in opinions on the 5 packages we do install + their dependencies > > RUN R -e 'BiocManager::install(c( \ > "SingleCellExperiment", \ > "GenomicFeatures", \ > "GenomicAlignments", \ > "ShortRead", \ > "DESeq2"))' >
> The 169 won’t really change.
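The 29 packages in /usr/lib/R/library are most likely R’s 14 base plus 15 recommended packages. A sketch for splitting an installation into base, recommended and added packages using the Priority field reported by installed.packages(); the counts depend on the machine it runs on, and a package installed in more than one library is counted once per library:

```r
## Classify installed packages by their Priority field:
## "base" and "recommended" ship with R; NA means the package was added later.
inst <- installed.packages()
priority <- inst[, "Priority"]
priority[is.na(priority)] <- "added"
priority <- factor(priority, levels = c("base", "recommended", "added"))

## Overall counts
counts <- table(priority)
print(counts)

## Counts per library path, comparable to the sapply(.libPaths(), ...) tally
print(table(inst[, "LibPath"], priority))
```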
2019-09-04
Martin Morgan (03:23:52): > I guess my 134 number was a bit casual; in devel I had > > pkgs = c("SingleCellExperiment", "GenomicFeatures", "GenomicAlignments", "ShortRead", "DESeq2") > db = available.packages(repos=BiocManager::repositories()) > deps = tools::package_dependencies(pkgs, db, recursive = TRUE) > length(unique(c(pkgs, unlist(deps)))) #134 >
> but it’s actually release > > db = available.packages(repos=sub("3.10", "3.9", BiocManager::repositories())) > deps = tools::package_dependencies(pkgs, db, recursive = TRUE) > install = unique(c(pkgs, unlist(deps))) > length(install) # 128 >
> and excluding base and recommended packages > > inst = installed.packages() > dflt = rownames(inst)[inst[,"Priority"] %in% c("recommended", "base")] > length(setdiff(install, dflt)) # 108 >
> so maybe one lost sheep between this number and packrat? (packrat itself? BiocManager?) > > As Nitesh points out, there are additional packages installed by the Broad, in particular tidyverse. > > And speaking of release… the goal is for this deliverable to be available in Q3, i.e., this month. We’ll have our release at the end of October, I guess, so AnVIL will be current for about a month …:disappointed:
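The 134 / 128 / 108 figures above come from a transitive closure over Depends, Imports and LinkingTo, which tools::package_dependencies(recursive = TRUE) computes from the available.packages() database, followed by removing base and recommended packages. A self-contained sketch of the same two steps on a small dependency map; the package names are real, but the edges are made up for illustration and do not reflect their actual DESCRIPTION files:

```r
## Illustrative direct-dependency map (not the real DESCRIPTION contents).
direct <- list(
    DESeq2       = c("S4Vectors", "IRanges", "ggplot2", "stats"),
    S4Vectors    = c("BiocGenerics", "methods"),
    IRanges      = c("S4Vectors", "BiocGenerics"),
    BiocGenerics = "methods",
    ggplot2      = c("rlang", "grid"),
    rlang        = character()
)

## Breadth-first transitive closure: `pkgs` plus everything reachable from them.
recursive_deps <- function(pkgs, direct) {
    seen <- character()
    queue <- pkgs
    while (length(queue) > 0L) {
        pkg <- queue[[1L]]
        queue <- queue[-1L]
        if (pkg %in% seen)
            next
        seen <- c(seen, pkg)
        queue <- c(queue, direct[[pkg]])  # NULL for packages not in the map
    }
    seen
}

install <- recursive_deps("DESeq2", direct)
length(install)  # 9

## Drop packages that ship with R, as in the setdiff() step above
dflt <- rownames(installed.packages(priority = c("base", "recommended")))
added <- setdiff(install, dflt)
added
```

Counting the requested packages themselves in the closure, as Martin’s unique(c(pkgs, unlist(deps))) does, is one place where totals like these can drift by one or two.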
Martin Morgan (03:48:56): > <!channel> We’d carefully scheduled (and missed, or at least I did…) our Bioc / AnVIL meeting for Labor Day. I’m scheduling the next for September 16, 12-1pm, at https://bluejeans.com/881434832
Levi Waldron (08:13:29) (in thread): > I’m going to be on vacation and most likely offline the week of September 16, and otherwise I’m on Central European Time and these afternoon meetings will be sketchy for me (making an exception for the monthly TAB dinner meeting:smile:). Starting the 30th, how would others feel about moving this to morning, like 9 or 10am? Sorry to be a pain, and if it’s inconveniencing others I’ll do my best for the current meeting times.
Martin Morgan (08:29:47) (in thread): > 10am could work for me… especially on a different (Tue / Wed / Thur) day
Levi Waldron (08:31:29) (in thread): > 10am Tue is free, Wed I could free up.
Vince Carey (10:45:36) (in thread): > 10am tuesday should work
Vince Carey (10:46:44): > @Martin Morgan can you point me to a formal statement of deliverables by quarter?
Martin Morgan (13:28:49): > I sent a link to ‘AnVIL milestones through end of year’, in the AnVIL PM Weekly Meeting folder of the AnVIL shared folder; was that what you were looking for?
2019-09-05
Martin Morgan (05:51:31) (in thread): > Since Levi is out the week of the 16th anyway we’ll leave that as currently scheduled, Monday the 16th at 12 noon. I’ll send out a recurring invite for Tue Oct 1 at 10am et seq.
Levi Waldron (06:45:24) (in thread): > Thanks Martin.
Levi Waldron (06:51:54): > A tool to make Docker and Singularity easier to run across different computing environments. From our friend Nathan Sheffield, so feature requests will be welcomed. I’ll be trying it out. http://docs.bulker.io/en/latest/motivation/
Martin Morgan (09:05:54): > This paragraph https://github.com/mikelove/tximetaPaper/blob/3f7ff748981547a2188b615b2df2fb421301f5c5/main.tex#L110 and the preceding ones provide an interesting orientation on provenance tracking through workflows
2019-09-06
Vince Carey (11:07:43) (in thread): > One item that is missing is the transition to R 3.6 for terra notebooks. A general synchronization practice should be adopted.
Martin Morgan (12:21:48): > Probably identifying more explicit goals for Q4 would be very helpful (‘deploy w/ RStudio’ is the main formal milestone). Agree that Bioc release management would be excellent. Also robust AnVIL package functionality for, e.g., workspace data discovery / use.
2019-09-07
Vince Carey (09:47:12): > Another policy issue: for Bioc 3.11, we will be working with R-devel. Is it clear how the bioconductor_devel container will acquire/update its R?
Vince Carey (10:20:23): > Just a heads up: I have been working again with dockstore and am revisiting the ‘parameterized workflow’ concept, that I believe can be listed as a Q4 deliverable. At present it seems that a ‘workflow-capable’ docker container is inevitable for this. A mix of persistent resources in google storage and reasonably provisioned containers will help keep container image sizes modest. I have an illustration in the domain of pancancer transcriptomics that I will write up in the next couple of days.
2019-09-08
Vince Carey (12:35:43): > Is there some reason https://github.com/Bioconductor/terra-docker lacks an ‘issues’ element? In any event, is there a chance of getting gsutil into this container, or into bioconductor_full?
Vince Carey (22:25:10): > Do we have a sense for when terra jupyter will run R 3.6?
Vince Carey (22:26:08): > Also, can someone point me to the container (presumably in gcr) that the current terra R for jupyter is based on? It provides R 3.5.2.
Kasper D. Hansen (22:26:43): > Should we talk with the broad people on Tuesday and tell them this needs to be more frequently updated and that we need a recent R
Kasper D. Hansen (22:26:58): > I mean, they need to know, but should we do it on Tuesday
2019-09-09
Martin Morgan (05:59:09): > we have had several exchanges and well-defined communication, including https://the-anvil.slack.com/messages/GM5C32K2P if you’re interested (I think there’s value in having discussions like the current one ‘internally’, so as to present a more-or-less consistent and less confusing agenda to the Broad); this is clearly not a Q3 deliverable, and we have a ‘Q4’ discussion tentatively scheduled for after the terra-jupyter-bioconductor image is made available…
Martin Morgan (06:02:40) (in thread): > gsutil on terra-jupyter-bioconductor seems like a good idea, but maybe at a lower level? @Nitesh Turaga can you ask Qi?
Vince Carey (07:21:20): > here’s my motivation for asking about the R 3.5.2 image … i now have a rubric for converting bioconductor workflow documents into dockstore workflows and (independently … until there is greater concordance between containers for dockstore and terra) jupyter notebooks that run in terra. by going back to Bioc 3.8 I can get workflow code that runs in R 3.5.2. I would like to have a library of jupyter notebooks in terra that reflect the bioc workflow documents. it is a shame that it only reflects bioc 3.8 but it is a template for a process that can work more generally.
Vince Carey (07:22:58): > so in a sense i am surrendering to the lack of agility with respect to R in terra jupyter. but not surrendering with respect to dockstore. we can use state-of-the-art bioc there because we can select our own containers.
Vince Carey (07:25:40): > spoiler – this rubric depends upon collections of compiled packages stored in google bucket and copied to active workspace, to keep container size small. it is admittedly inelegant but it works and seems sufficiently performant.
Nitesh Turaga (07:58:32) (in thread): > Yes, I’ll ask.
Martin Morgan (08:06:06) (in thread): > @Nitesh Turaga can you point Vince to the right location?
Nitesh Turaga (08:35:01) (in thread): > Hi Vince, the current https://app.terra.bio doesn’t give you the option of choosing a notebook runtime environment. It defaults to the image which has R 3.5.2, and you cannot change this. This has to be a modification in the UI coming from the Terra UI folks. I’ve spoken to them about this, and they have an open ticket for it, but it’s not on their agenda in Q3. Adrian Sharma answered this question for me in the AnVIL Slack channel. > > You’d have to navigate to the https://firecloud.terra.bio website, and from there to the older FireCloud UI, to be able to choose the correct runtime for the notebook you want. By runtime I mean the docker image the cluster will be using. > > Once you have that, you can choose to launch the us.gcr.io/broad-dsp-gcr-public/terra-jupyter-bioconductor:0.0.2 image, which is publicly hosted at https://us.gcr.io/broad-dsp-gcr-public/terra-jupyter-bioconductor. The tag 0.0.2 is the working version of bioconductor_full for Terra. You’d need to launch it through https://notebooks.firecloud.org/.
Nitesh Turaga (08:35:29) (in thread): > This image will guide to the right portal hopefully. - File (PNG): Screenshot (2).png
Nitesh Turaga (11:27:28) (in thread): > @Martin Morgan It seems that gsutil is available to us, as it is installed in the terra-jupyter-base image: terra-jupyter-bioconductor extends terra-jupyter-r, which extends terra-jupyter-base. > > If you run the image on Terra or locally > > docker run -it --user root --entrypoint /bin/bash us.gcr.io/broad-dsp-gcr-public/terra-jupyter-bioconductor:0.0.2 > > you can test it with the command ! gsutil --help (in the Jupyter notebook on Terra), or with gsutil --help on the shell locally. > > There is no gcloud config set up. It needs to be configured by authenticating with Google (gcloud auth login). But I’m not sure how this will work from within the notebook on Terra.
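From an R notebook one can also shell out with system2() rather than a ! escape. A small base-R sketch for checking that gsutil is on the PATH before calling it; the helper names has_gsutil() and gsutil_help() are hypothetical, and this does nothing about the missing gcloud authentication mentioned above:

```r
## TRUE if a `gsutil` executable is found on the PATH
has_gsutil <- function() unname(nzchar(Sys.which("gsutil")))

## Return the output of `gsutil --help`, or NA if gsutil is not installed
gsutil_help <- function() {
    if (!has_gsutil())
        return(NA_character_)
    system2("gsutil", "--help", stdout = TRUE, stderr = TRUE)
}

if (has_gsutil()) {
    cat(head(gsutil_help()), sep = "\n")
} else {
    message("gsutil not found on PATH")
}
```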
Sean Davis (13:15:46): > The collections of compiled packages approach is not elegant, but I think it is the recommended approach these days. It also abstracts the need to keep a bunch of containers alive if bioconductor_full is the base from which the compiled packages are derived and later used.
Kasper D. Hansen (13:18:05): > Ok, I have not really been paying attention, so just ignore me
Sean Davis (13:19:37): > It does seem like developing a process to feed trusted Bioc-blessed images into terra would be a win for everyone.
2019-09-11
Vince Carey (09:54:04): > Google Cloud Storage seems a bit harder to use than AWS S3. Specifically, once I place a file into GCS it looks like I need a signed URL to retrieve it, whereas with S3 it is not hard to generate a URL that will work with wget. I am raising this issue to see what is involved in making BiocFileCache work with files that are in GCS. Maybe there is an R client package (googleCloudStorageR?) that takes care of the hard parts. In any event, thoughts about a multi-cloud BiocFileCache are welcome.
Sehyun Oh (10:06:28): > @Vince CareyDo you still need a signed URL, even if you set the bucket permission to public?
Vince Carey (10:08:50): > Ah – you may be right – I need to do some IAM work.
Nitesh Turaga (10:10:02): > Just a general comment about the multi-cloud BiocFileCache: some of that work might be made easier by using a library like CloudBridge (https://github.com/CloudVE/cloudbridge), which does the heavy lifting of providing a unified interface to multiple clouds. > > It might need some work using RStudio’s reticulate, as the software is written in Python.
Martin Morgan (10:28:49) (in thread): > who pays for egress when the bucket is public?
Martin Morgan (10:29:55) (in thread): > are these accessible with gs://…? tagging @Lori Shepherd
Sehyun Oh (10:54:58) (in thread): > I think the bucket owner (data hosting side) is paying, but I’m not completely sure. FYI, here is the reference: https://cloud.google.com/storage/pricing#network-pricing.
Vince Carey (11:04:08) (in thread): > We should be able to enable requester pays: https://cloud.google.com/storage/docs/requester-pays
2019-09-13
Martin Morgan (09:05:17): > Just a reminder about our meeting on Monday at 12 noon Eastern, https://bluejeans.com/881434832, recognizing that Levi can’t come. Future meetings shifted to alternate Tuesdays. Tagging @Lori Shepherd for info
Sean Davis (19:48:11) (in thread): > Public GCS objects are available via URLs that look like: https://storage.cloud.google.com/BUCKET_NAME/PATH/TO/FILE.
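A sketch of turning a gs:// URI into a plain HTTPS URL that download.file() or BiocFileCache could fetch, assuming the object is publicly readable. It uses the https://storage.googleapis.com/BUCKET/OBJECT form, which Google documents for unauthenticated access to public objects (storage.cloud.google.com links go through a Google sign-in). The bucket name is hypothetical, and object names containing spaces or other special characters would additionally need percent-encoding, which this sketch skips:

```r
## Map gs://BUCKET/OBJECT to the public HTTPS endpoint.
## Assumes the object is publicly readable; characters that need
## percent-encoding in object names are not handled here.
gs_to_https <- function(uri) {
    stopifnot(is.character(uri), all(startsWith(uri, "gs://")))
    paste0("https://storage.googleapis.com/", sub("^gs://", "", uri))
}

## Hypothetical public bucket and object
url <- gs_to_https("gs://my-public-bucket/data/sample1.bam")
url  # "https://storage.googleapis.com/my-public-bucket/data/sample1.bam"
```

In principle such a URL can then be handed to BiocFileCache::bfcrpath() like any other web resource; objects in private or requester-pays buckets still need authenticated access.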
Sean Davis (19:50:26) (in thread): > Requester pays is a pretty strong paywall right now for general use. With resources meant to be Anvil only, it probably makes sense, though.
Sean Davis (19:54:33) (in thread): > GCS objects are already available via an S3 protocol if you want a unified approach. A Python example is listed here: https://cloud.google.com/storage/docs/migrating#storage-list-buckets-s3-python Other S3 tooling should work similarly.
2019-09-15
Vince Carey (10:38:38) (in thread): > A paywall in the sense that a prospective user must establish a payment mechanism, presumably a credit card, right? With cloud credits, could requester pays status just charge against the credits? It seems that for the intended data transfer activities the charges would be quite small.
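With requester pays enabled, gsutil has to be told which Google Cloud project to bill, via its top-level -u option (gsutil -u PROJECT cp gs://BUCKET/OBJECT .). A sketch of assembling that call from R; the helper gsutil_cp_args() and the bucket and billing project names are hypothetical placeholders:

```r
## Build the argument vector for `gsutil [-u PROJECT] cp SRC DEST`.
## `billing_project` is the Google project charged for requester-pays
## access; leave it NULL for ordinary buckets.
gsutil_cp_args <- function(source, destination, billing_project = NULL) {
    c(if (!is.null(billing_project)) c("-u", billing_project),
      "cp", source, destination)
}

args <- gsutil_cp_args(
    "gs://hypothetical-requester-pays-bucket/popstr_packrat.zip", ".",
    billing_project = "my-billing-project"  # hypothetical project id
)
## system2("gsutil", args)  # run only where gsutil is installed and authenticated
```

Keeping the argument vector separate from the system2() call makes it easy to inspect the exact command before running it in a notebook.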
2019-09-17
Sehyun Oh (16:53:24): > Can anyone give me a two- or three-sentence answer about why Dockstore is better than others, like Docker Hub or Quay? (Apologies for the very basic question.) So far I only grasp that Dockstore forces more documentation…
Sean Davis (18:06:10): > Quay and Docker Hub are image repositories only. Dockstore ties workflow/tool descriptions in WDL, CWL, or Nextflow to Docker images that can be hosted at Docker Hub or Quay.
Sehyun Oh (18:17:44): > Thanks, @Sean Davis! So Dockstore is more like a repository for docker use cases. Btw, Dockstore itself is not hosting docker images, right? I got very confused by their use of the term ‘tool’. :slightly_frowning_face:
2019-09-18
Vince Carey (10:12:31): > The terminology is a bit challenging IMO. “store” is pretty clear – it is an archive (of workflows). The “dock” comes from the emphasis on reproducibility. Workflows can be composed without any connection to containers. But when a workflow is linked to a container image it becomes possible to reproduce the workflow pretty directly. I guess I should have saved the msireg1 workflow as a tool because the docker container is explicitly identified in the workflow. Live and learn… – added after email from Walt Shands – in Dockstore “a WDL workflow with only one task can be registered as a tool”. Since tools do not have ‘launch in terra’ interface, and none is planned, it was appropriate to register msireg1 as a workflow.
2019-09-23
Vince Carey (09:13:34): > reminder: github.com:Bioconductor/AnVIL_Admin.git is where we are cataloging development progress; the READMEs there indicate how to update. A graphical view of results and targets over time would be useful.
Vince Carey (09:56:09): > I have added a node for “terra workspaces/workflows” to the index.md
Nitesh Turaga (09:56:42): > I have added what I have worked on after the class last week.
Nitesh Turaga (10:40:15): > Just a follow-up on a minor docker image update from the Broad side: to make the R + Bioconductor version more explicit, I’ve suggested BiocManager::install(version="3.9") in the terra-jupyter-r image. > > Justin C from the Broad is doing some work on these images as well. - File (PNG): Screen Shot 2019-09-23 at 8.07.53 PM.png
2019-09-24
Vince Carey (16:12:29): > Should we be looking at how Bioc can interact with galaxy CVMFS? Are there annotation resources equivalent to those in our packages/Hub?
Martin Morgan (18:34:02): > it could be cool at some level; we explored a little. One can effectively ‘mount’ a CVMFS node, and then the distributed file system (e.g., a collection of AH resources) could be available as though on a local file system. I think the mount happens quickly, with transparent localization only when a file is accessed. > > This only helps when the ‘user’ can mount CVMFS. In principle I think we could do this in a docker image or in AnVIL, but it would be too difficult for individuals to do. In practice @Nitesh Turaga went down the path a little, but CVMFS was not playing at all well with docker, requiring low-level access that docker did not provide; for instance, I don’t think there’s a CVMFS docker image, which you’d expect from the CVMFS community if it were easy to do… But I guess Galaxy will figure out a containerized way to make CVMFS available…
2019-09-27
Martin Morgan (06:40:18): > AnVIL version 0.0.23 should use bearer tokens for Terra and Leonardo authentication; tested locally but not on Terra. Note also the gsutil_*() and gcloud_*() functions for interacting with gsutil and gcloud.
2019-10-01
Martin Morgan (10:02:08): > <!channel> reminder of our bi-weekly meeting now…
Vince Carey (10:02:42): > ok
Vince Carey (10:03:38): > link please
Lori Shepherd (10:04:19): > https://bluejeans.com/480153337?src=calendarLink
Nitesh Turaga (12:14:55): > This was held at 10am EST?
Nitesh Turaga (12:18:08): > My apologies for missing it, are there notes for this call?
Nitesh Turaga (12:45:47): > Ok, just an update from my end for this call. The bioconductor image tests are running and there are a few errors which need to be corrected. Particularly installation of ShortRead has timed out on the first run. We are running it again to see if it happens another time (if it fails again, i’ll increase the time out limit for this package). The rest of the tests run fine.
Vince Carey (16:17:02): > nb – i put a pull request in on the anvil page for current tools to indicate Bioconductor in conjunction with jupyter/R
Sehyun Oh (16:20:58): > So… https://app.terra.bio/ and https://anvil.terra.bio are different? What is the relationship between Terra and AnVIL?
Vince Carey (16:54:16): > @Sehyun Oh at the moment I think it is really just the skin and maybe some aspects of authorization/data availability. Longer run I think there may be various other divergences, but at the moment it doesn’t seem there are important functional differences. Maybe a Terra reader could comment further, but I don’t see anyone from the Broad on this channel. Perhaps the question should go on the AnVIL board?
Kasper D. Hansen (21:04:47): > I’m talking about something I’m not an expert on
Kasper D. Hansen (21:05:43): > But Terra is the Broad’s compute platform. They have been “selling” this platform for various projects, including AnVIL. Which means there is Terra, and AnVIL-branded Terra. And probably OTHER PROJECT-terra.
2019-10-11
Lori Shepherd (12:36:48): > Our next bi-weekly meeting is next Tue Oct. 15. We have created an agenda. Please feel free to view and edit.
Lori Shepherd (12:36:51): > https://docs.google.com/document/d/14oM_22WPfpbOXFCP2hkvhOIYAfmtulHj1Es1LOiP_1U/edit?usp=sharing
Vince Carey (13:07:38): > @Lori Shepherd can you give more details on the internal meeting? I thought it would occur on Mondays so that we would have time for the Tuesday AnVIL tech call
Martin Morgan (13:32:27): > I think we’d changed to Tuesdays for @Levi Waldron??? I can work either day (except this coming week, when Monday is a holiday)
Vince Carey (14:03:01): > OK, fine – so we will have limited time to revise prior to anvil meeting. I don’t see the calendar link, alas, so please provide the time of the call.
Martin Morgan (14:04:55): > i forwarded the invite to you just now
Vince Carey (14:22:04): > @Martin Morgan I have not received it. I tried to liberalize my calendar … I looked in spam. Nothing.
Martin Morgan (14:35:43): > Alternate Tuesdays at 10am: https://bluejeans.com/480153337
Martin Morgan (14:53:52): > my intention for this coming Tuesday’s afternoon working group presentation was to do this myself, so only my work needs to be revised in limited time…
BJ Stubbs (18:06:29): > @Martin Morgan I just want to say, the AnVIL API package is really great, and being able to use it in Terra notebooks is a game changer
2019-10-15
Lori Shepherd (10:01:37): > Reminder: our bi-weekly meeting is starting now at https://bluejeans.com/480153337
Sehyun Oh (10:02:51): > I’m attending a local conference this morning and can’t attend the meeting at 10am. Also, I was on vacation for the last 10 days, so not much of an update from my side. Sorry. But I’ll be at the 4pm call. See you then!
Vince Carey (14:50:41): > BJ found this: https://support.terra.bio/hc/en-us/articles/360034829252-Service-Incident-October-15-2019 - Attachment (Terra Support): Service Incident - October 15, 2019 > Summary The issue was found at 1:00 PM EDT on October 15, 2019 and impacts users attempting to: Launch workflows that depend on docker images hosted on Dockerhub Create new Notebook Runtime Enviro…
Martin Morgan (16:48:51): > OK, @Sehyun Oh asked me to share the notebook from today, but I don’t know how to do that!
Sehyun Oh (16:54:25): > I don’t think there is a classy way to do it. ;(
Sehyun Oh (16:55:42): > It seems like the options are 1) share your workspace with me or 2) copy it to another workspace that both sides have access to.
Sehyun Oh (16:56:34): > Does anyone know a better way? @BJ Stubbs @Vince Carey
Sehyun Oh (17:03:57): > I just created a new workspace ‘BioC_on_Terra’ and added you (mtmorgan.bioc@gmail.com) as a reader. Can you check whether you can copy the notebook to this workspace? On the Notebook page, the third option from the ‘three-dots circle’ is for copying.
Vince Carey (19:11:37): > FWIW I think sharing is simplified if you do not specify an authorization domain
Sehyun Oh (21:52:56): > Yes, I didn’t add any authorization domain to this workspace.
2019-10-16
BJ Stubbs (14:59:49): > @Levi Waldron @Martin Morgan @Marcel Ramos Pérez I am pretty close to creating SummarizedExperiments from the AnVIL TCGA workspaces. I have a notebook that shows how to filter the workspaces to TCGA, get the hg38 versions, get the sample and participant data, parse out the GDC UUIDs from the data, and use GenomicDataCommons to download the bio and clinical XML files and the assay data. Now I need to put things together and decide where/how to save the result. I am using BRCA as an example. Moving forward, does it make sense to work towards an HDF5 SummarizedExperiment stored in a workspace-linked Google bucket?
Levi Waldron (15:09:53): > Depends on your objective I guess - do you want to make a curatedTCGAData from genomic data in AnVIL? Or just show a proof of concept on BRCA and near equivalence with curatedTCGAData? BTW TCGAutils helps with UUID mapping.
Martin Morgan (15:13:00): > In this new-fangled cloud world, does it make sense to use a service like BigQuery and have a much more dynamic ‘constructor’?
Levi Waldron (15:15:14): > curatedTCGAData uses HDF5 for methylation data only. It helps to serialize the object quickly, but it seems best only if you are then interested in a small slice of the data; otherwise you end up needing as.matrix(). So maybe Martin’s idea would be more flexible?
BJ Stubbs (15:26:02): > I guess this example is a bit artificial, since the data are open and more easily available elsewhere, but I was going for creating natural Bioconductor objects in AnVIL with no egress, as a proof of concept of how a Bioconductor user might use AnVIL resources. Then maybe try some science with it
Vince Carey (15:37:53): > Apropos BigQuery, it is undoubtedly a performant approach to dealing with TCGA as a database. What we lose relative to SummarizedExperiment/MAE discipline (or what we have to do to recover it via work along the restfulSE line) is an open question that I am looking at as time permits. The fact that Cromwell had to be changed to allow access to BigQuery in terra made me think that the BigQuery image of TCGA is not going to be the “endorsed” one for AnVIL. And that we should do the work of illustrating the Bioconductor approach to cohort representation for a tumor within the AnVIL model, as a benchmark against which we can demonstrate alternate approaches.
Vince Carey (15:43:22): > In BiocOncoTK, pancan_BQ and buildPancanSE allow us to get going pretty quickly with TCGA in BigQuery. But what we have there assumes one tumor and one assay at a time as the basic unit of organization. More aggressive cross-tumor analyses are certainly possible in this framework, but smoother access to bigger chunks of cancer genomic data is conceivable and should be supported in Bioc tooling. Use cases are fundamental; https://doi.org/10.1016/j.cell.2018.03.022 has some I am looking at now.
Sean Davis (15:46:15): > BigQuery TCGA is a one-off and took several months to get right. I don’t expect that most public datasets will end up there. Also, BigQuery costs can become significant for larger datasets that are queried in an ad hoc way.
Sean Davis (15:46:52): > I’m not arguing that BigQuery should be out-of-the-picture, but I do wonder if it is the right fit as a general solution.
Martin Morgan (16:12:38) (in thread): > I see the workspace in the ‘Workspaces’ panel, but not if I try to copy my notebook to it; if I open the workspace and try to ‘Create a New Notebook’ I’m told ‘you do not have access to modify this workspace’
Sehyun Oh (16:52:11) (in thread): > Oops… sorry, I set you as a reader. I switched you to a writer, so it should work now. Could you try it?
Martin Morgan (16:58:35) (in thread): > Yep, it worked (I had to move my notebook [via download and then upload again] to a workspace that did not have particular security groups associated, so that I could copy it to your workspace, with a similar absence of security groups…)
Sehyun Oh (17:04:45) (in thread): > Thanks!
2019-10-17
Vince Carey (10:16:22): > Sketchy thoughts: using the GA4GH conceptual scheme, a federated genomics analysis system = TRS (Tool Registry Service, supporting improvised workflows and custom environments in containers) + WES (Workflow Execution Service, various implementations) + “data commons”, with which tools and executors operate efficiently. Canonical problems for federated genomics, and metrics to compare the value of solutions, are needed. There will not be a general solution, but a few candidates should be well understood, and a calendar-based approach to articulating and working towards new goals relative to the current state should be formulated.
Martin Morgan (11:01:10): > …I think this translates into ‘Milestones for Q1 and Q2’ in AnVIL project manager speak?
Vince Carey (13:36:06): > Could be… I am willing to work on these topics.
2019-10-18
James Taylor (09:06:40): > @James Taylor has joined the channel
2019-10-22
Sehyun Oh (10:59:29): > I’m getting the error below with the AnVIL package: it doesn’t recognize the gcloud binary. I tried to set ‘GCLOUD_SDK_PATH’ to the file path of gcloud, but it didn’t work. Any help?
Sehyun Oh (10:59:31): > > > Terra() > Error: failed to find 'gcloud' binary; set option or environment variable 'GCLOUD_SDK_PATH'? >
Vince Carey (11:07:14): > maybe debug AnVIL:::.gcloud_sdk_find_binary
Martin Morgan (11:15:28): > Please include sessionInfo() to help with knowing your OS, R, and AnVIL versions; my gcloud SDK was installed at ~/bin/google-cloud-sdk, where there is a bin/ directory > > > dir("~/bin/google-cloud-sdk/bin") > [1] "bootstrapping" "bq" > [3] "dev_appserver.py" "docker-credential-gcloud" > [5] "endpointscfg.py" "gcloud" > [7] "git-credential-gcloud.sh" "gsutil" > [9] "java_dev_appserver.sh" "kubectl" > [11] "kubectl.1.11" "kubectl.1.12" > [13] "kubectl.1.13" "kubectl.1.14" >
> So I > > > terra = AnVIL::Terra() > Error: failed to find 'gcloud' binary; set option or environment variable 'GCLOUD_SDK_PATH'? > > Sys.setenv(GCLOUD_SDK_PATH="~/bin/google-cloud-sdk") > > terra = AnVIL::Terra() > > >
> (I used tab completion inside the quotes to make sure the path was valid…)
Sehyun Oh (11:40:43): > Works now.
Sehyun Oh (11:40:56): > > > sessionInfo() > R version 3.6.1 (2019-07-05) > Platform: x86_64-apple-darwin15.6.0 (64-bit) > Running under: macOS Mojave 10.14.6 > > Matrix products: default > BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib > LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib > > locale: > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] AnVIL_0.0.28 dplyr_0.8.3 > > loaded via a namespace (and not attached): > [1] Rcpp_1.0.2 rstudioapi_0.10 magrittr_1.5 tidyselect_0.2.5 R6_2.4.0 > [6] rlang_0.4.0 httr_1.4.1 tools_3.6.1 packrat_0.5.0 lambda.r_1.2.4 > [11] futile.logger_1.4.3 remotes_2.1.0 yaml_2.2.0 assertthat_0.2.1 tibble_2.1.3 > [16] crayon_1.3.4 BiocManager_1.30.8 purrr_0.3.3 formatR_1.7 futile.options_1.0.1 > [21] curl_4.2 rapiclient_0.1.2.3-3 glue_1.3.1 compiler_3.6.1 pillar_1.4.2 > [26] jsonlite_1.6 pkgconfig_2.0.3 >
Sehyun Oh (11:42:43): > > > dir("/Applications/google-cloud-sdk/bin/") > [1] "bootstrapping" "bq" "dev_appserver.py" "docker-credential-gcloud" > [5] "endpointscfg.py" "gcloud" "git-credential-gcloud.sh" "gsutil" > [9] "java_dev_appserver.sh" > > Sys.setenv(GCLOUD_SDK_PATH="/Applications/google-cloud-sdk") > > terra = AnVIL::Terra() > > >
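In both working cases above, GCLOUD_SDK_PATH points at the SDK root (the directory that contains bin/gcloud), not at the gcloud binary itself. A minimal base-R sketch of deriving that root automatically, assuming gcloud is on the PATH and, after symlinks are resolved, lives in the SDK's own bin/ directory (some package-manager installs may be laid out differently). The helper names are hypothetical, not part of AnVIL:

```r
# Hypothetical helpers (not part of AnVIL): derive GCLOUD_SDK_PATH from 'gcloud'.
sdk_root_from_binary <- function(gcloud_bin) {
  # resolve symlinks, then step up from <root>/bin/gcloud to <root>
  dirname(dirname(normalizePath(gcloud_bin, mustWork = FALSE)))
}

set_gcloud_sdk_path <- function() {
  gcloud_bin <- Sys.which("gcloud")   # "" when gcloud is not on the PATH
  if (!nzchar(gcloud_bin))
    stop("'gcloud' not found on the PATH; set GCLOUD_SDK_PATH by hand")
  root <- unname(sdk_root_from_binary(gcloud_bin))
  Sys.setenv(GCLOUD_SDK_PATH = root)
  invisible(root)
}
```

After set_gcloud_sdk_path(), AnVIL::Terra() should find the binary just as it did above once the variable was set by hand.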
Vince Carey (15:06:34): > I will be late to terra call – may not make it for a while
2019-10-23
Vince Carey (12:29:34): > > folder = "gs://firecloud-tcga-open-access/tcga/dcc/gbm/rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data/unc.edu_GBM.IlluminaHiSeq_RNASeqV2.Level_3.1.2.0/" > expr_norm_ > system("mkdir GBM_expr_norm_txt") > GBM_expr_norm_txt > localize(folder, "GBM_expr_norm_txt") > > WARNING: gsutil rsync uses hashes when modification time is not available at > both the source and destination. Your crcmod installation isn't using the > module's C extension, so checksumming will run very slowly. If this is your > first rsync since updating gsutil, this rsync can take significantly longer than > usual. For help installing the extension, please see "gsutil help crcmod". > > Building synchronization state... > Starting synchronization... > Would copy gs://firecloud-tcga-open-access/tcga/dcc/gbm/rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data/unc.edu_GBM.IlluminaHiSeq_RNASeqV2.Level_3.1.2.0/TCGA-02-0047-01A-01R-1849-01.data.txt to file://GBM_expr_norm_txt/TCGA-02-0047-01A-01R-1849-01.data.txt >
Vince Carey (12:32:58): > no data seem to be transferred to the terra workspace through this localize request. folders “edit” and “safe” were produced.
Sean Davis (12:39:02): > https://github.com/Bioconductor/AnVIL/blob/master/R/localize.R#L17-L19
Sean Davis (12:40:02): > Looks like dry is the default for localize()? See https://cloud.google.com/storage/docs/gsutil/commands/rsync and take a look at the -n option.
Sean Davis (12:40:34): > The hint is the Would copy... in the output above.
Vince Carey (12:44:38): > Oh yes, thanks!
Martin Morgan (13:03:00): > Be careful with localize / rsync: misuse can ‘sync’ your empty local folder to your full Google bucket, making your Google bucket empty! Hence the dry= argument.
Vince Carey (15:02:58): > thanks, that sounds like a condition that could be checked and an extra “do you really want to” query could be interposed. A “force” parameter could be included to skip this check/query.
Martin Morgan (19:07:57): > dry= is used by, I think, curatedTCGA… I made this an issue at https://github.com/Bioconductor/AnVIL/issues/18
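Vince's suggested safeguard could look roughly like the following base-R pre-flight check, run before any sync from a local folder up to a bucket. The underlying risk is an rsync that mirrors deletions (gsutil rsync -d removes destination files not present in the source) from an empty source. This is a hypothetical sketch; the function name and force= argument are illustrative, not AnVIL's API:

```r
# Hypothetical pre-flight check (not part of AnVIL) before syncing local -> bucket.
check_upload_source <- function(local_dir, force = FALSE) {
  if (!dir.exists(local_dir))
    stop("'", local_dir, "' does not exist")
  # every file below local_dir, including hidden ones
  files <- list.files(local_dir, recursive = TRUE, all.files = TRUE)
  if (length(files) == 0L && !force)
    stop("'", local_dir, "' is empty; syncing it to the bucket could delete ",
         "the bucket's contents. Use force = TRUE if that is really intended.")
  invisible(length(files))   # number of files that would be synced
}
```

Raising an error by default and requiring an explicit force = TRUE keeps the dangerous case opt-in, in the same spirit as the dry= default.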
Vince Carey (22:38:11): > https://app.terra.bio/#workspaces/landmarkanvil2/BIOC_TCGA_V1_0_GBM/notebooks includes code that, given a tumor site (like ‘brca’) and an assay type (like ‘rsem_genes_norm’; we need a vocabulary of short names), acquires and lightly annotates the matrix of assay quantifications. Tested for ‘acc’, ‘rsem_genes_norm’ and ‘rppa’ … I have shared the workspace pretty broadly; if you cannot see it, let me know. The assembly of the matrix seems reasonably fast; it uses localize and parses all the text files in a naive way.
Vince Carey (22:39:50): > The data seem to be straight from GDC. Apologies if I have failed to take advantage of infrastructure in the GDC package … undoubtedly there is help there for the unwieldy filenames.
2019-10-24
Vince Carey (10:19:23): > Now in https://github.com/vjcitn/BiocAnvilTK: build_tcga_mat
Vince Carey (10:54:26): > Localization is time-consuming. Are there prospects for running read.delim on gs:// files? Maybe there is something in R googleverse to do this?
Sean Davis (11:00:30): > http://code.markedmondson.me/googleCloudStorageR/index.html
Martin Morgan (11:01:27): > Currently you can make a pipe p = gsutil_pipe("gs://...") and then read.delim(p, ...); this still pulls the entire file for processing, and it does so less (much?) efficiently than gsutil_cp() locally and then reading, which I think is also less efficient than localizing several files at once. > > Another potential option is to use tabix to index the file (with the index stored locally) and then read the relevant chunks. This wouldn’t work in Bioc currently, but it could be explored using command-line tabix for feasibility; it might have restrictions on table structure, and the indexing could be as expensive as pulling the file locally anyway… I think the actual queries would transfer just the data requested.
Martin Morgan (11:03:15): > I don’t think googleCloudStorageR really helps here; it provides an alternative (better, yes!) to shelling out to gsutil / gcloud for retrieving the file. But retrieving the file is, I guess, the expensive part of localize?
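In base R, the same "read a table from a command's standard output" idea can be written with pipe(). A sketch, assuming the gsutil command-line tool is installed and authenticated (gsutil cat streams an object to stdout). As noted above, the whole object still travels over the network; this only avoids the local temporary copy. The helper name is hypothetical:

```r
# Hypothetical helper: read a tab-delimited table from a shell command's stdout.
read_delim_from_command <- function(command, args, ...) {
  cmdline <- paste(shQuote(c(command, args)), collapse = " ")
  # read.delim() opens the pipe connection and closes it when done
  read.delim(pipe(cmdline), ...)
}

# Usage (hypothetical bucket/object name):
# tbl <- read_delim_from_command("gsutil", c("cat", "gs://my-bucket/table.txt"))
```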
Vince Carey (11:49:13): > For these textfiles, the retrieval and parsing is not a showstopper. How to construe the results of this sort of work as a reusable intermediate representation for AnVIL users is an open question – I think we should consider how the *Hub concepts play out here. But I don’t want to do too much more until Gen3 is fully operational for us. Observing the performance of the file-per-assay-per-subject for this processed data has been instructive. It is better than I expected. I want to turn to alignments now.
Vince Carey (12:01:15): > New error, possibly transient > > Error in curl::curl_fetch_memory(url, handle = handle): Error in the HTTP2 framing layer > Traceback: > > 1. avtable("sample", namespace = "broad-firecloud-tcga", name = "TCGA_GBM_OpenAccess_V1-0_DATA") > 2. Terra()$getEntities(namespace, name, table) > 3. httr::GET(url = get_url(x), config = get_config(), httr::content_type("application/json"), > . httr::accept_json(), httr::add_headers(.headers = .headers)) > 4. request_perform(req, hu$handle$handle) > 5. request_fetch(req$output, req$url, handle) > 6. request_fetch.write_memory(req$output, req$url, handle) > 7. curl::curl_fetch_memory(url, handle = handle) >
Martin Morgan (12:04:42): > when I’ve seen these simply re-issuing the command usually works, and diagnosing is ‘above my pay grade’ as they say; it seems to be at the curl level and any hints appreciated…
Vince Carey (12:13:45): > Yes it was transient.
Marcel Ramos Pérez (13:11:36) (in thread): > It’s a known issue with workarounds: https://github.com/jeroen/curl/issues/156
BJ Stubbs (15:32:31): > I see this all the time. Maybe 5% of all calls from multiple machines and networks.
BJ Stubbs (15:34:42): > It usually works the second time I run the httr call.
Martin Morgan (15:38:34) (in thread): > I opened an issue… https://github.com/Bioconductor/AnVIL/issues/19
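Since re-issuing the call usually works, flaky requests could be wrapped in a small retry loop until the underlying issue is resolved. A base-R sketch (helper name hypothetical) that retries only when the error message contains the HTTP/2 framing text seen above and re-raises any other error immediately:

```r
# Hypothetical helper: call f(), retrying only errors whose message matches
# 'pattern' (default: the transient HTTP/2 framing error reported above).
with_retry <- function(f, max_attempts = 3L, wait = 1,
                       pattern = "HTTP2 framing layer") {
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error"))
      return(result)
    transient <- grepl(pattern, conditionMessage(result), fixed = TRUE)
    if (!transient || attempt == max_attempts)
      stop(result)        # re-signal the original error
    Sys.sleep(wait)       # brief pause before the next attempt
  }
}

# Usage, with the call from the traceback above:
# tbl <- with_retry(function()
#   avtable("sample", namespace = "broad-firecloud-tcga",
#           name = "TCGA_GBM_OpenAccess_V1-0_DATA"))
```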
2019-10-25
Martin Morgan (06:05:38): > I updated https://github.com/Bioconductor/AnVIL_Admin and https://bioconductor.github.io/AnVIL_Admin/ to try to provide some additional orientation about our container (terra-jupyter-bioconductor, different from bioconductor_full) and where to report issues (R / Bioc issues go on our fork of the DataBiosphere terra-docker repository; our fork is at https://github.com/Bioconductor/terra-docker) - Attachment (bioconductor.github.io): Bioconductor / AnVIL > Administration repository for AnVIL project
2019-10-28
Vince Carey (15:32:16): > call tomorrow at 10?
Lori Shepherd (16:52:12): > Yes. Tomorrow at 10.
2019-10-29
Lori Shepherd (07:53:23): > Here is the agenda for the day. Since we didn’t have much time for updates last meeting, I wanted to leave time for this; feel free to add anything you want to highlight: https://docs.google.com/document/d/1sxCRkfLu9c_qUoOsQBQ1fn81shaWAqjQZGk34NIj1eg/edit?usp=sharing
Lori Shepherd (10:02:30): > Our bi-weekly meeting is starting now; log in if interested
2019-10-31
Lori Shepherd (09:01:38): > Hi <!channel> - Just wanted to see who was planning on attending the Dec meeting? https://sites.google.com/view/anvildeveloperdays/
Martin Morgan (09:02:37): > I’ll be there; is there a link to the agenda? :thumbsup:
Nitesh Turaga (09:02:49): > I’ll register as well.
Lori Shepherd (09:04:32) (in thread): > There is an agenda link on the website page
Nitesh Turaga (09:05:38) (in thread): > https://docs.google.com/document/d/10pq3WOvhfUaLO5RhXgfJyT9ZcL-PZ8xp8LjBYJx25_Q/edit
Nitesh Turaga (09:05:43): > Agenda: https://docs.google.com/document/d/10pq3WOvhfUaLO5RhXgfJyT9ZcL-PZ8xp8LjBYJx25_Q/edit
Vince Carey (09:18:55): > @BJ Stubbs is planning to attend. I probably will not attend given the event’s proximity to BioC Europe, but I see that registration is open until 25 Nov and will let you know if I change my mind.
Lori Shepherd (09:25:54): > Anyone who is planning to attend, please remember to fill out the registration form. There is a link on the main website page: https://sites.google.com/view/anvildeveloperdays/
2019-11-01
Nitesh Turaga (07:12:11): > I remember we spoke about K8s in our bi-weekly meeting; I was going through some slides and found this. > > It seems there are plans in 2020 to add K8s support for scaling Galaxy. It might be a better idea to follow Galaxy/Terra support for K8s before Bioconductor adopts it. - File (PNG): Screen Shot 2019-11-01 at 4.39.59 PM.png
Nitesh Turaga (07:12:18): > Link to presentation: https://docs.google.com/presentation/d/1BTv7wVTWNzzAaV-D2U2J6yYwAcWPQXsCd5yL4e9vDIs/edit#slide=id.g6f4f75eb94_0_60
2019-11-02
James Taylor (12:47:04): > @Nitesh Turaga Although it says “F2F with JHU group in September”, Martin was also at that meeting and involved in the discussions. My impression is that Broad is going to deliver RStudio under the existing Leonardo+Docker model for the moment because they have other “customers” demanding it. Later, things will be redeployed under k8s, at which point the k8s APIs will be exposed for scaling out additional containers.
Martin Morgan (15:17:47): > Clare also presented this at a recent Tuesday technical working group call (linked in the technical call agenda https://docs.google.com/document/d/1XcTR3rDFP4oE_4Ggl1WfD7nRfKPSHmWaE2_amIjfuk4/edit?usp=sharing), and indeed k8s is not prominent in our (Bioconductor) thinking for the current quarter. We’re definitely excited and participating in RStudio delivery via Docker… It was also interesting to read about Galaxy activity in the October 15 quarterly report; I asked about that a bit (off-topic) during Clare’s presentation, and hopefully there’ll be some discussion when Enis next talks at the technical call, including user authentication and Galaxy views on security.
2019-11-03
Nitesh Turaga (13:34:08): > Thanks @James Taylor for the input.
2019-11-04
Sehyun Oh (14:35:36): > Can we access the contents under ‘Workspace > Data > Other Data > Files’ using the AnVIL package?
BJ Stubbs (15:24:04): > Yes, but it is not obvious how
BJ Stubbs (15:27:09): > The files in the “Files” section live in the bucket associated with the workspace
BJ Stubbs (15:30:39): > you can get the bucket name using the API call
BJ Stubbs (15:31:00): > terra$getWorkspace("billingcode", "workspacename")
BJ Stubbs (15:31:39): > the name of the bucket is in the response under “$workspace$bucketName”
BJ Stubbs (15:32:09): > Then you can use Google bucket commands to fetch or put things
BJ Stubbs (15:32:22): > @Sehyun Oh
Sehyun Oh (15:37:25): > Thanks a lot, @BJ Stubbs! I could access the file this way.
Sehyun Oh (15:41:40): > FYI, I’m working on one of the use cases, ‘Terra Notebook in the class’, which Levi actually tried but had difficulty using uploaded data within the notebook. @Levi Waldron
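BJ's recipe can be wrapped in a small helper. A sketch, assuming the getWorkspace() response has been parsed into a nested R list (for example with httr::content()) that carries the bucket name at $workspace$bucketName, as described above; gsutil-style tools then need the gs:// prefix. The helper name is hypothetical:

```r
# Hypothetical helper: turn a parsed getWorkspace() response into a gs:// URL.
workspace_bucket_url <- function(response) {
  bucket <- response[["workspace"]][["bucketName"]]
  if (is.null(bucket) || !nzchar(bucket))
    stop("no 'workspace$bucketName' field in the response")
  paste0("gs://", bucket)
}

# Usage (assumes AnVIL's Terra() service and httr are available):
# terra <- AnVIL::Terra()
# ws <- httr::content(terra$getWorkspace("billingcode", "workspacename"))
# workspace_bucket_url(ws)
```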
BJ Stubbs (15:44:26): > Sounds cool. Glad I could help
Martin Morgan (16:52:50): > I had thought there was an environment variable for the bucket, but thanks for the getWorkspace() hint @BJ Stubbs! If you install the latest version of AnVIL (may require a new AnVIL_rapiclient) > > suppressMessages({ > BiocManager::install("Bioconductor/AnVIL") > }) >
> then the following shows how to discover, write, and read directly to / from the bucket > > ## From within AnVIL... > bucket <- avbucket() # discover bucket > path <- file.path(bucket, "mtcars.tab") > gsutil_ls(dirname(path)) # no 'mtcars.tab'... > write.table(mtcars, gsutil_pipe(path, "w")) # write to bucket > gsutil_stat(path) # yep, there! > read.table(gsutil_pipe(path, "r")) # read from bucket >
> This is shown on the help page ?avbucket.
Marcel Ramos Pérez (16:54:27) (in thread): > Can we add a Remotes: field in the DESCRIPTION?
Martin Morgan (16:56:00) (in thread): > I’ll admit my ignorance of how a Remotes: field works… (esp. with AnVIL_rapiclient, where the package and repository names differ?)
Marcel Ramos Pérez (17:00:39) (in thread): > A pointer to the repository should suffice: Remotes: Bioconductor/AnVIL_rapiclient
Martin Morgan (17:08:53) (in thread): > sorry, I’m still ignorant, how will this make installation smoother?
Marcel Ramos Pérez (17:29:34) (in thread): > By grabbing updates from AnVIL_rapiclient on GitHub automatically when doing remotes::install_github("Bioconductor/AnVIL", repos = BiocManager::repositories())
Marcel Ramos Pérez (17:29:56) (in thread): > > Downloading GitHub repo Bioconductor/AnVIL@master > These packages have more recent versions available. > Which would you like to update? > > 1: All > 2: CRAN packages only > 3: None > 4: rapiclient (e8d1706a6... -> 154198b52...) [GitHub] >
Martin Morgan (17:30:27) (in thread): > I see; why not BiocManager::install("Bioconductor/AnVIL")?
Marcel Ramos Pérez (17:33:05) (in thread): > Because it will install https://cloud.r-project.org/src/contrib/rapiclient_0.1.2.tar.gz if you don’t have rapiclient installed already
Martin Morgan (17:36:15) (in thread): > > > BiocManager::install("Bioconductor/AnVIL") > Bioconductor version 3.11 (BiocManager 1.30.9), R Under development (unstable) > (2019-10-31 r77350) > Installing github package(s) 'Bioconductor/AnVIL' > Downloading GitHub repo Bioconductor/AnVIL@master > Downloading GitHub repo Bioconductor/AnVIL_rapiclient@master > ... > > packageVersion("rapiclient") > [1] '0.1.2.3.4' >
Martin Morgan (17:38:30) (in thread): > So this makes it seem like a win, mostly, and I added the Remotes: field to the DESCRIPTION file, thanks!
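For reference, the relevant part of a DESCRIPTION using this field might look like the excerpt below. Remotes: names the GitHub repository while Imports: names the package, so the differing names (AnVIL_rapiclient vs. rapiclient) need not clash. The excerpt is illustrative, not a copy of AnVIL's actual DESCRIPTION; a quick base-R check with read.dcf() confirms the fields parse:

```r
# Illustrative DESCRIPTION excerpt (not AnVIL's full file).
description <- c(
  "Package: AnVIL",
  "Imports: rapiclient",
  "Remotes: Bioconductor/AnVIL_rapiclient"
)
path <- tempfile("DESCRIPTION")
writeLines(description, path)

# read.dcf() returns a character matrix with one column per requested field
fields <- read.dcf(path, fields = c("Imports", "Remotes"))
fields[1, "Remotes"]
```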
2019-11-05
BJ Stubbs (11:47:45): > Awesome! I need to update terraplane to use all of the new functionality you have added @Martin Morgan
Levi Waldron (13:26:09): > This definitely helps, @BJ Stubbs and @Martin Morgan. But it seems to me to still leave too many implementation details exposed to the user, and I’m not sure whether that’s something that can be solved within the R container or is further upstream. Perhaps I was spoiled by Azure Notebooks, which seems to mount the “workspace” volume as the home directory in the notebook server, so whatever you upload on the “workspace” page, or any subdirectories you create there, just appear as your ~/ home directory within the notebook. That also has the reverse effect that write.csv(mtcars, "mtcars.csv") writes a file that is then visible from the workspace page and remains after the runtime is shut down. Do you think that is something that could be accomplished in the Docker startup call on AnVIL?
BJ Stubbs (13:59:48): > @Levi Waldron Maybe. I am asking on the AnVIL Slack whether the notebooks have the scope needed for gcsfuse; if so, we could in theory add gcsfuse to the Bioconductor Docker image and add a small helper function to create a mounted directory on the notebook-associated bucket, and the user could use that directory to maintain persistence. I am not sure how it would affect performance.
BJ Stubbs (14:00:31): > If I don’t hear back shortly, I will just try to do it and see what happens. If it does work, it could also be a way to approach the binary package idea
Levi Waldron (14:00:47): > Cool!
BJ Stubbs (14:06:37): > Looks like it is worth testing. I will get back to the channel when I have a bit to try this.
Sean Davis (14:37:54): > Keep in mind that gcsfuse does not allow metadata storage, so if AnVIL is using metadata tagging on files, those tags can be lost when accessing files with gcsfuse.
Sean Davis (14:40:57): > Taking a more object-oriented approach can help smooth over some of the incongruities between R and Terra; see https://github.com/sbg/sevenbridges-r#complete-api-r-client for example…
BJ Stubbs (14:50:03): > That is an interesting idea.
Kasper D. Hansen (15:06:48): > From a user perspective, Levi’s experience with Azure is much preferred
Martin Morgan (16:06:32): > another approach might sync a mount point (directory path) to a bucket on shutdown; I think this is how python notebooks persist across runtimes… this would at least be consistent w/ what AnVIL / terra is already doing, so when they come up with something better we could piggy-back…
Martin Morgan (17:01:52): > Also, FWIW, I was hoping to get to an R connection, so that write.table(mtcars, avcon("my/mtcars.tab")) or con = avcon("my/mtcars.tab"); write.table(mtcars, con); read.table(con) works, which is a little closer to a file system. It does cost to read from buckets; I don’t know if this means one should be reminded of the extra step by requiring special action…
Kevin Blighe (21:35:23): > @Kevin Blighe has joined the channel
2019-11-06
BJ Stubbs (12:12:48): > Looks like gcsfuse needs the docker to be run with elevated permissions, so it will not work at the moment
BJ Stubbs (16:52:23): > Just throwing this out there. Gen3 authentication example. > > #load libs > library(httr) > library(jsonlite) > > #token is on profile page at https://staging.theanvil.io > #download the json token > > #set url of login page > url="https://staging.theanvil.io/user/credentials/cdis/access_token" > > #post this token JSON to the login page. This returns a bearer token > b=POST(url,body=fromJSON("Downloads/credentials.json"), encode="json") > > #now our bearer token is our key - like dockstore > mytoken=as.character(fromJSON(content(b,"text"))$access_token) > > #Create the project id query in json > query = '{"query":"{project(first:100){project_id id}}"}'; > > #graphql query processor endpoint > url2="https://staging.theanvil.io/api/v0/submission/graphql/" > > #execute the query on the endpoint > b2=POST(url2, body=query, encode="json",add_headers(c(Authorization=paste("Bearer",mytoken)))) > content(b2) >
2019-11-07
Laurent Gatto (11:52:51): > @Laurent Gatto has joined the channel
2019-11-12
Lori Shepherd (07:44:23): > There is a bi-weekly meeting today at 10am. I have started an agenda here; mostly I wanted to discuss project updates / blockers, working with Terra Outreach on Bioconductor-centric workflows/tutorials/docs/etc., and continued discussion of how we are thinking of defining our Q1 and Q2 goals: https://docs.google.com/document/d/1euLEZWoTWMf5si7JzwkdzNT6IjpR5jd8mdrZBW0P60A/edit?usp=sharing
Martin Morgan (09:30:44): > Thanks Lori I fleshed out a couple of topics I’d like to talk about…
Martin Morgan (09:41:19) (in thread): > I’ve been exploring this a bit; we can discuss at the meeting today
Vince Carey (10:04:11): > waiting for moderator
2019-11-14
Nitesh Turaga (12:22:29): > Since this is not being used, I vote we deprecate this repo, https://github.com/Bioconductor/AnVIL_Docker, and rename it AnVIL_Docker_DEPRECATED.
Nitesh Turaga (12:22:47): > @Martin Morgan @Lori Shepherd and the rest of the AnVIL team, thoughts?
Nitesh Turaga (12:23:33): > I just forked the anvilproject/anvil-docker image and it’ll quickly get confusing.
Martin Morgan (12:27:42): > I think you should simply remove it (or make it private, so that it persists ‘in case…’) since it was really exploratory…
Nitesh Turaga (12:28:04): > Ok. I’ll make it private.
Nitesh Turaga (12:29:48): > The repo has been made private now.
2019-11-19
Martin Morgan (10:10:20): > OK, naive Jupyter question: how do I gracefully interrupt an R computation in a cell? The equivalent of ctrl-C, accomplishing the equivalent of > > > Sys.sleep(10000) > ^C > > >
Nitesh Turaga (10:20:48): > Hmm…that’s interesting….I think there is a “Kernel interrupt” option in the menu.
Martin Morgan (10:24:54): > Unfortunately that doesn’t help for me – the cell still seems to be evaluating…
Nitesh Turaga (10:27:51): > Can you just interrupt the terminal from which you started the jupyter notebook?
Nitesh Turaga (10:28:06): > Oops, this is on the AnVIL channel.
Nitesh Turaga (10:28:29): > But maybe you can hit the “Stop” button on the notebook runtime?
Nitesh Turaga (10:43:44): > Did that work?
Martin Morgan (11:02:11): > No; my only solution has been to restart the notebook runtime. > > Next question: how can I get a shell prompt from a notebook? It seems like I can’t BiocManager::install('Rhdf5lib'), but the notebook is swallowing some output (to stdout??)
Martin Morgan (11:10:21): > Ah, running > > pkg = file.path(tempdir(), "Rhdf5lib_1.6.3.tar.gz") > print(system2("R", c("CMD", "INSTALL", pkg), stdout = TRUE, stderr = TRUE)) >
> tells me that there’s an 00Lock, I guess from a previous failed install…
Nitesh Turaga (11:26:29): > Yes, possibly. That is a “Bug” though, because i’m sure users will come across failed installs and then won’t be able to install again.
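Until that is handled, a stale lock left by a failed install can be found and removed from a notebook cell. R's installer creates 00LOCK-<package> (or 00LOCK) directories inside the library while installing. A base-R sketch, with hypothetical helper names, to be run only when no other installation is in progress:

```r
# Hypothetical helpers: find / remove stale 00LOCK directories in a library.
find_install_locks <- function(lib = .libPaths()[1]) {
  list.files(lib, pattern = "^00LOCK", full.names = TRUE)
}

remove_install_locks <- function(lib = .libPaths()[1]) {
  locks <- find_install_locks(lib)
  unlink(locks, recursive = TRUE)   # only safe if no install is running
  invisible(basename(locks))        # names of the removed lock directories
}
```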
Martin Morgan (12:11:33): > OK, next and final notebook question for today: at some point my nice dplyr tibble output, with fixed-width font and only some columns shown, gets replaced with a ‘pretty’ table that shows all columns, more rows, and doesn’t use a fixed-width font. I can recover the original by explicitly calling print(tbl) rather than relying on auto-printing tbl. Who’s messing with my output and how can I stop them?
Nitesh Turaga (12:12:33): > Can you send a screenshot of the “pretty” table which you see?
Nitesh Turaga (12:12:54): > That might just be the Jupyter notebook’s default way of showing rectangular data. But I can look into it.
Martin Morgan (13:13:25): > Here’s what I get in a new R session; oddly, `print()` cleans things up. Also, `sessionInfo()` shows a number of packages loaded, maybe as a consequence of the kernel or of a system-wide `.Rprofile`? I’d guess the culprit was the `crayon` package, but that’s just a guess… - File (PNG): Screen Shot 2019-11-19 at 1.08.16 PM.png
Marcel Ramos Pérez (13:38:22): > It looks like an html table
Nitesh Turaga (13:38:42): > Yeah. I think that’s default jupyter.
Martin Morgan (13:42:31): > if it’s a default then it can be changed?
Marcel Ramos Pérez (13:54:00): > I’m not seeing any default option that can be modified. https://github.com/jupyter/notebook/pull/1776 They use some type of CSS (Less) to render the table
Nitesh Turaga (14:25:27): > So, just to clarify, `Rhdf5lib` installs in the notebook but you want to see the output?
Martin Morgan (14:34:23): > It installed when I removed the lock, but the notebook only showed me something along the lines of ‘Warning: Rhdf5lib failed to install’ and did not show me the helpful part of the output, which was along the lines of ‘Error: 00LOCK… already exists’. It also doesn’t show the details of the installation, like the C code getting compiled, which in a weird way is reassuring as a ‘progress bar’. I guess it’s some decision in Jupyter notebooks not to show one or the other of stdout or stderr to the user…
Nitesh Turaga (14:34:49): > Right, I see.
Marcel Ramos Pérez (14:49:15): > Perhaps https://github.com/IRkernel/IRkernel is setting some R output defaults?
Martin Morgan (14:49:26): > I think `options(jupyter.rich_display = FALSE)` fixes my display problem; the option is set by `IRdisplay:::.onLoad()`. Well, actually, @Marcel Ramos Pérez, you’re right: it’s set by `IRkernel:::.onLoad()`, which sets
```r
> IRkernel:::jupyter_option_defaults
$jupyter.rich_display
[1] TRUE
...
```
Martin Morgan (15:35:08): > I think my problem with the output is that notebooks don’t echo stderr by default; the following produces no output
```r
cat("foo\n", file = stderr())
```
but I’m not sure how to work around that
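When a front end displays only stdout, a common workaround is to fold stderr into stdout before the output reaches it. Martin’s `system2(..., stdout = TRUE, stderr = TRUE)` call does this for an R subprocess. Below is the same idea as a minimal POSIX shell sketch. `run_merged` and its default log path are hypothetical.

```shell
# run_merged (hypothetical helper): run a command with stderr folded into
# stdout, keep a copy in a log file, and preserve the command's exit status.
run_merged() {
  log=${RUN_MERGED_LOG:-/tmp/run_merged.log}
  if "$@" >"$log" 2>&1; then    # both streams share one file, in order
    status=0
  else
    status=$?
  fi
  cat "$log"                    # everything now arrives on stdout
  return "$status"
}

# Example: an install error message would now be visible, e.g.
# run_merged R CMD INSTALL Rhdf5lib_1.6.3.tar.gz
```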
Sean Davis (16:07:53): > As far as I can tell, this is an open issue: https://github.com/jupyter/notebook/issues/1461
2019-11-20
Vince Carey (12:54:57): > When I need to see stderr I typically use the associated terminal. Were you able to get a terminal? In AnVIL, the terminals freeze after two minutes of initiation, but you can start a new one and there is continuity of the session.
Martin Morgan (16:43:24): > I did something with `system2()`, capturing stderr. Definitely not for the faint of heart. I have also experienced the terminal freeze issue, making the terminal pretty close to useless…
2019-11-21
Nitesh Turaga (12:41:20): > Just checking to clarify again, but I’ve tried `options(jupyter.rich_display = FALSE)`, and it doesn’t seem to help the display. Was there a different resolution to this problem?
2019-11-25
Lori Shepherd (12:10:23): > A reminder about our bi-weekly call tomorrow at 10am EST - https://bluejeans.com/480153337?src=calendarLink I have started an agenda; please add Project Updates: https://docs.google.com/document/d/1Zbyc5SOBQbcELJI9XIHt6C3LneIvkrrQa062yJ-VikA/edit?usp=sharing I would prefer to go over and add/delete Q1/Q2 milestones interactively at tomorrow’s meeting rather than have anyone alter them beforehand. This will be the finalized Bioconductor milestone list to share with the technical group/PMs.
2019-11-26
Vince Carey (07:58:46): > I may not be able to make this call. I have added some items to the updates. I did make some minor changes to the deliverables; probably I should have just done it as comments – sorry.
Lori Shepherd (08:07:43): > That’s okay - I see you moved up the instrumentation toolset from Q1 to Q2, or at least the marker to have a draft. As long as you feel like this is reasonable and have the time to work on it, I’m okay with that.
Nitesh Turaga (10:18:47): > https://github.com/anvilproject/anvil-docker
Nitesh Turaga (10:22:42): > https://github.com/Bioconductor/AnVIL_Docker/tree/master/rstudio/bioc_3.9
Nitesh Turaga (11:05:02): > This is the PR focused on getting the updated/fixed images onto Terra soon: https://github.com/DataBiosphere/terra-ui/pull/1919 My hope is that it will make the update-test-release process smoother.
Vince Carey (11:24:10): - File (JPEG): Image from iOS
Vince Carey (11:24:28): > where the call concluded
Sean Davis (12:08:51): > Coming into port, or sailing out to sea?
2019-11-27
hcorrada (08:40:04): > @hcorrada has joined the channel
Martin Morgan (09:07:10): > Since https://app.terra.bio/#workspaces/featured-workspaces-hca/HCA_Optimus_Pipeline seems to capture the upstream sc workflow (there are alternative flavors at HCA as well, so presumably eventually available from Dockstore), I think @Sehyun Oh’s proposed Q2 deliverable needs to be removed / revised??
Sehyun Oh (11:01:50): > Yes, this workspace appears to overlap with what I proposed. Let’s remove the Q2 deliverable for now.
Nitesh Turaga (11:14:39): > Just an update on the statement I made during the meeting yesterday: the RStudio images from github.com/anvilproject/anvil-docker cannot be tested or run on Leonardo at the moment, as there is ongoing work in progress: https://github.com/DataBiosphere/leonardo/pull/1153. > > However, there is a Bioconductor fork of the `anvil-docker` repo at github.com/Bioconductor/anvil-docker with two images, anvil-rstudio-base and anvil-rstudio-bioconductor.
Nitesh Turaga (11:15:09): > The WIP information on the leonardo front comes from the terra-team.
2019-12-03
Frederick Tan (15:27:24): > @Frederick Tan has joined the channel
2019-12-06
Lori Shepherd (11:00:22): > <!channel> Reminder for our Bioconductor AnVIL bi-weekly meeting next Tue at 10. Also, the bi-weekly technical meeting at 4pm on Dec 10th is our turn for updates and/or demos, so we should think about what we would like to share.
2019-12-09
Lori Shepherd (09:09:45): > <!channel> Would there be any objection to moving tomorrow’s meeting from 10 am EST to 11 am EST?
2019-12-10
Lori Shepherd (08:25:44): > <!channel> Here is the agenda for today’s meeting: https://docs.google.com/document/d/1ulfo6c89kSuPdCvEfc3aSQrNt_z8ezZodse-8-ZVYLM/edit?usp=sharing Last-minute feedback: I would like to start this meeting at 11 am. Will everyone still be able to attend?
Vince Carey (08:32:32): > i think we can do it from belgium
Lori Shepherd (10:59:18): > https://bluejeans.com/480153337?src=calendarLink Here is the link to the meeting happening now, if anyone needs it
Nitesh Turaga (11:22:12): > us.gcr.io/anvil-gcr-public/anvil-rstudio-bioconductor@sha256:7f3275cf4ce70a6d68a8d790b9c2aaaf0508c587e7da1bd18b41895451be8cdc
Nitesh Turaga (11:28:56): > https://github.com/DataBiosphere/terra-docker/tree/master/terra-jupyter-bioconductor
Vince Carey (11:42:17): > app.terra.bio/#workspaces/landmarkanvil2/BIOC_TCGA_V1_0_GBM/notebooks/launch/BIOC%20AnVIL%20TCGA%20GBM.ipynb
Nitesh Turaga (11:45:07): > Just to address a question Martin posed during my update,
Nitesh Turaga (11:45:28): > There seems to be a way to tell the version, which corresponds to the CHANGELOG.md,
Nitesh Turaga (11:46:30): - File (PNG): Screen Shot 2019-12-10 at 11.42.31 AM.png
Nitesh Turaga (11:46:38): > But it’s not obvious at all.
Nitesh Turaga (11:47:06): > Changelog: https://github.com/DataBiosphere/terra-docker/blob/master/terra-jupyter-bioconductor/CHANGELOG.md
BJ Stubbs (11:48:52): > https://app.terra.bio/#workspaces/landmarkanvil2/bioconductorTCGA/notebooks/launch/bioconductorTCGA.ipynb
Nitesh Turaga (12:18:10): > I have a generic question for the group, do you all think the terra UI is intuitive for users?
Vince Carey (12:18:30): > https://support.terra.bio/hc/en-us/articles/360027083172#installing - Attachment (Terra Support): Terra’s Jupyter Notebooks environment Part II: Key operations > This article walks through what happens when you perform key operations with your notebooks. Understanding what is happening behind the curtain can help your process to be more efficient and avoid …
2019-12-11
Christine Choirat (12:06:46): > @Christine Choirat has joined the channel
2019-12-12
Martin Morgan (16:38:37): > @BJ Stubbs can you upload the gen3 R script you showed me earlier today?
BJ Stubbs (16:39:10): > Sure
Martin Morgan (16:41:16): > I guess I’m also interested in which / if any of the steps are part of a published API?
BJ Stubbs (16:42:18): > Where should i put it? I will try to add a reference
Martin Morgan (17:12:58): > maybe add a link (to ‘Programmatic access’) / add it as an ‘appendix’ to https://docs.google.com/document/d/1xYf_xkoDxA-frjByDjnS62eXYttWqYnhdVN5kJtagKs/edit?usp=sharing?
2019-12-17
Nitesh Turaga (12:03:49): > @BJ Stubbs @Martin Morgan How did we bypass the setCookie issue with Jupyter and RStudio notebooks? Or is this just something we are leaving to the Terra group to solve?
Nitesh Turaga (12:04:49): > (Recap on the issue itself, RStudio launched through swagger keeps locking me out as the cookie expires and I have to reauthorize and re-“setCookie” to gain access to the RStudio instance’s IP)
Nitesh Turaga (12:09:12): > It doesn’t happen on the terra platform, so i’m guessing that once RStudio gets on the terra platform, we are “Free” of this issue?
BJ Stubbs (12:31:50): > I haven’t tried it yet, but the other Slack seems to think changing `"autopause": true, "autopauseThreshold": 0` in the payload may fix it. If you set autopause to false, the cluster will never stop, but I am not sure what the number in the threshold means. We can ask Rob Title @Nitesh Turaga
BJ Stubbs (13:48:59): > On another front, it seems that we can change a setting in the gcloud console that will export daily cost info for a Google project, broken down by service type (storage, compute, etc.), into a BigQuery dataset that can be queried. https://cloud.google.com/billing/docs/how-to/export-data-bigquery
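Once the export is enabled, costs can be summed per day and per service with a standard-SQL query. Below is a minimal sketch that assumes the standard billing export schema (`usage_start_time`, `service.description`, `project.id`, `cost`). The table and project names are placeholders, and `billing_by_service_sql` is a hypothetical helper. The query sums gross `cost` and ignores credits.

```shell
# billing_by_service_sql (hypothetical helper): print a standard-SQL query
# that totals gross cost per day and per service for one project.
#   $1 = fully qualified billing export table (placeholder name)
#   $2 = Google project id to report on
billing_by_service_sql() {
  table=$1; project=$2
  cat <<EOF
SELECT
  DATE(usage_start_time) AS day,
  service.description AS service,
  ROUND(SUM(cost), 2) AS total_cost
FROM \`$table\`
WHERE project.id = '$project'
GROUP BY day, service
ORDER BY day, total_cost DESC
EOF
}

# Example (table and project names are placeholders):
# bq query --use_legacy_sql=false \
#   "$(billing_by_service_sql my-billing.billing.gcp_billing_export_v1_XXXXXX my-anvil-project)"
```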
2019-12-20
Lori Shepherd (12:04:59): > <!here> Hello all - I would like to formally suggest cancelling this coming Tue. Dec 24th’s meeting, assuming many are away enjoying the holidays. Happy Holidays and see everyone after the New Year!
2020-01-06
Vince Carey (16:17:21): > @Lori Shepherd will there be a call tomorrow?
Lori Shepherd (16:20:40): > I was planning on it, yes. A short one, just for everyone to check in and get refocused.
2020-01-07
Lori Shepherd (07:55:39): > <!here> Sorry this is late, but here is the draft agenda for the 10 am EST meeting today: https://docs.google.com/document/d/1UXHTsMUmobklWztjxspF2_cIdoFfi3cuzzmJ5lgl9o4/edit?usp=sharing
Lori Shepherd (10:01:41): > be right there everyone - finishing up another meeting
Nitesh Turaga (10:12:24): > https://github.com/Bioconductor/bioconductor_docker
Nitesh Turaga (10:41:43): > https://github.com/mtmorgan/RedisParam
Marcel Ramos Pérez (11:24:47): > RStudio open source vs pro comparison :https://rstudio.com/products/rstudio-server-pro/comparison/
Nitesh Turaga (16:43:12): > https://github.com/Bioconductor/AnVIL_Admin/projects I think we should re-invigorate this page and keep better track of things?
2020-01-08
Lori Shepherd (07:56:53): > To have a glance at the overall picture, it might not be a bad idea to clean it up. To actually keep track of what is being worked on and who is doing it, I would suggest a fully implemented Trello board where stickies can be assigned to people, which, unless I’m missing something obvious, is not possible in GitHub projects. There was the thought (I think yours, when I originally suggested this) that it might be too much to keep track of, with additional deadlines imposed across the agenda, github.io, and a separate Trello board. I propose cleaning this up to get a better broad view of what our projects (current/future deliverables) are [perhaps mimicking the PM Jira board so the team is aware of what is on that board for Bioconductor]. Is there any opinion, yay/nay, on creating a Trello board where everyone can add/assign stickies for what they are working on specifically, or is there a way to do this in the projects?
Levi Waldron (08:28:10): > Just got the invite to this workspace: https://app.terra.bio/#workspaces/help-gatk/Bioconductor_DEV
Vince Carey (10:55:05) (in thread): > I am not clear on the advantages of Trello over the projects board. I think you can make the choice, @Lori Shepherd, and – perhaps – set up cron jobs that ping team members by email to update their board – even to say nothing has occurred.
Sean Davis (10:59:58): > I have been using ZenHub recently. It predates Github Projects, but has some nice features for project management including epics, milestones, dependencies, releases and release reports, and workspaces.
Sean Davis (11:00:52): > https://www.zenhub.com/ - Attachment (ZenHub): ZenHub - Agile Project Management for GitHub > ZenHub is agile project management within GitHub. Add powerful tracking, planning and reporting features to GitHub!
Sean Davis (11:01:11): > Free for public and academic use.
Nitesh Turaga (11:07:53) (in thread): > I fell back on GitHub Projects after having the initial conversation with you and suggesting Trello, @Lori Shepherd, as I felt there was no traction for Trello from the team. I’m actually not even sure how many people have Trello accounts, and thought it would be worthwhile if we at least tried to re-invigorate the GitHub Projects page. > > But if you are on board with Trello, then I think that is the best option, as it allows us to tag people, set due dates, etc. JIRA is a little too complicated for us at the moment; it doesn’t seem worth the time invested in setting it up. > > So, from me, it’s a YAY for Trello:smile:
Vince Carey (11:59:52) (in thread): > We have used Trello locally … it seems good but we just aren’t disciplined enough to keep everything current.
Vince Carey (12:01:24): > Zenhub Integration with github is a big plus.
Peter Hickey (15:56:56): > @Peter Hickey has joined the channel
2020-01-20
Lori Shepherd (15:06:06): > Just a reminder we will have our bi weekly meeting tomorrow at 10 EST. I’ll post an agenda link in the morning
2020-01-21
Lori Shepherd (08:07:34): > https://docs.google.com/document/d/1wzjzPeQbXlg6ZXQfYP0Ou9rndgqZIvAjWmHqujtCQjw/edit?usp=sharing
Lori Shepherd (10:01:21): > <!channel> Reminder: our bi-weekly meeting is starting now at https://bluejeans.com/480153337?src=calendarLink
Vince Carey (10:01:36): > i will be a little late
2020-01-28
Lori Shepherd (07:41:57): > Hello all - mid-week reminder - please add cards/tasks to the project board for Q1 Y2 (projects through end of March), https://github.com/Bioconductor/AnVIL_Admin/projects/5, by next week’s meeting
Lori Shepherd (07:45:41): > Also a reminder: next week, Feb 4th, is our turn to present in the weekly technical meeting. @Vince Carey and @BJ Stubbs, were you still able to put together a presentation for the meeting? We do have a folder on the team drive (Technical Working Group Presentations) to store and share presentations. I think it would be a good plan to go over the presentations in our internal 10 am meeting next week, before the technical call. @Levi Waldron, I know you said you also might be able to put together some notes/feedback but wouldn’t be able to present it yourself because of the time slot. If you still want to prepare this and schedule a meeting with the two of us sometime this week, I would be happy to present your notes on your behalf.
2020-01-30
Lori Shepherd (07:37:51): > I have only heard from @BJ Stubbs. @Levi Waldron/@Vince Carey, are you preparing anything, or will it just be BJ?
Vince Carey (09:29:50): > @lori I am planning on putting some material together on instrumentation. I will release ASAP … maybe this evening. We can then decide how much can be presented when we meet next tues
2020-01-31
Vince Carey (15:26:07): > Still working on this (instrumentation). Will have more over the weekend. Should not be more than 2 slides.
2020-02-03
Vince Carey (05:17:12): > Gave up. Introspection within cromwell within terra not happening in real time. Maybe can get some clues on realtime approaches in the discussion. Otherwise BJ’s itemized billing data could well be the best data for estimating resource requirements, and one needs to wait a day to get it.
Vince Carey (05:19:17): > The timing diagram data are available after the run and if we can figure out how they are produced we will be better off …. but no one has answered my query on slack.
Lori Shepherd (09:15:19): > <!channel> Remember to add cards to https://github.com/Bioconductor/AnVIL_Admin/projects/5 before tomorrow!!! This should be what you’ll be working on through the end of March (these can carry over into Q2 if necessary). So far I only see BJ working on something and Vince with a blocker.
Lori Shepherd (09:15:35): > <!channel> Here is a draft agenda for tomorrow’s meeting: https://docs.google.com/document/d/1PoapPBRW9FNc1qMTsTDPDG_S1xG9JDA2EuuiH27WisE/edit?usp=sharing
Sean Davis (11:52:55): > @Lori Shepherd, how does one add comments to a card? Are the cards linked to issues?
Lori Shepherd (11:57:02) (in thread): > Unfortunately, in the projects I think the cards are just text. I don’t think there is any ability to comment besides adding additional text to the card (which I would recommend, to keep track). The cards are not directly linked to issues, but at least some should represent the Q1/Q2 goals we set
Lori Shepherd (11:57:52) (in thread): > If we have any open issues they should be reflected in a card -
Sean Davis (12:01:22) (in thread): > Got it. Thanks for clarifying.
Lori Shepherd (12:05:52) (in thread): > I would like to use these QY project cards to keep track of whatever people are currently working on relating to the anvil project so we have a better track record of who is doing what:slightly_smiling_face:
BJ Stubbs (15:45:48): > Slides, still working on them, but I know how to get the cromwell results now
BJ Stubbs (15:45:50): > https://docs.google.com/presentation/d/1XWBFDU57Q76b9zIQRcfl4nI44lCJxDgUvNNA6Z6GvRc/edit?usp=sharing
Vince Carey (16:05:21): > here are slides on the blockers for instrumentation as I conceived it: https://docs.google.com/presentation/d/1LIfCNwZQqJn1XLWIVH2ZgllRU4drPX0vJ3WRuohKY_4/edit?usp=sharing
2020-02-04
Lori Shepherd (09:59:44): > https://bluejeans.com/480153337?src=calendarLink link to the meeting starting momentarily
Levi Waldron (13:24:06): > In progress: Terra/AnVIL challenges, by Sehyun and me (I also added a link in “AnVIL - Other slides and documents” on GDrive): https://docs.google.com/presentation/d/1Pg20GOPH5vf7Ve9LYMz_vGFLfMsdvtayAUezmxKSLAU/edit?usp=sharing
Vince Carey (14:59:44): > @Levi Waldron @Sehyun Oh I like your slides. BJ’s use of the billing system might solve some of your cost isolation problems, Levi … not sure. Anyway, our slides so far are at https://docs.google.com/presentation/d/1EB57aEj2wbt_QTovWddRCg2LaZfi97-aJYF5cwrhA7A/edit?usp=sharing and there are 22 …
Vince Carey (15:00:36): > do we want to have a call now to review presentation strategy?
Martin Morgan (15:01:32): > we can use https://bluejeans.com/2024711962
Vince Carey (15:51:31): > we edited the slide deck to a single stream … costs, rstudio app, levi, sehyun
Levi Waldron (15:52:33): > I’m in bed, hope you don’t mind summarizing for me!
Vince Carey (15:52:44): > sure
Martin Morgan (17:09:47): > Thanks everyone for the really provocative presentation today!
2020-02-05
Lori Shepherd (17:11:56): > Hey everyone - Just wanted to share with you that Bioconductor got very good feedback at the PM meeting today from yesterday’s demo - Most PMs were present for it and were quite impressed with the general feeling of “wow I didn’t know that was possible”
2020-02-11
Vince Carey (12:21:15): > Should we try to discuss ideas about package provisioning on AnVIL before next week’s meeting? packrat has been replaced by renv. I have a couple of notebooks that rely on a zip of binaries in a bucket in order to run in reasonable time. This is not a durable solution.
Martin Morgan (12:27:40): > my own feeling is that we should aim for a ‘CRAN-style’ repository of binaries, just as we have for Windows and Mac. In the long term the binaries could be built by the build system, and it is not impossible to imagine install.packages() (hence BiocManager::install()) being updated to install these. In the short term AnVIL::install() could be revisited – I think Nitesh had a poor experience with this primarily because updating the relevant parts of the dependency graph was not incorporated (combined with needing to manually build the binary repository).
Vince Carey (18:04:30): > The repository of binaries is definitely important, but do we want to go further in the direction of a folder with installed images that can be an element of .libPaths()?
Martin Morgan (18:24:37): > do you mean like, in unix-land, `rsync`, or, if I understand correctly, in Google, `fuse`, where a location (URI?) contains the binaries and we just ‘mount’ (in a lazy way) the location? I guess packrat / renv does this on a per-project basis, where each project represents a set of packages. packrat / renv seem ok if one is publishing a subset of ‘containers’ that do things, but a per-container solution seems like it solves the problem for the producer of the container but not for the user who wants flexibility. `fuse` in some ways sounds perfect – we use a fuse mount as a `.libPaths()` entry, and Google lazily resolves packages as needed. Did @BJ Stubbs have some negative experience with fuse, maybe the limitations in permissions and / or performance? > > I’m also thinking that, actually, not all 1800+ packages are used regularly, and that a kind of ‘package server’ might be an interesting solution – ask for package X version Y, and it’s either available as a binary because it has been asked for by someone previously, or it will be built on demand.
Vince Carey (19:28:14): > Agreed on all counts, particularly a component of build on need.
Vince Carey (19:32:53): > perhaps https://stackoverflow.com/questions/34758090/use-gcsfuse-to-mount-google-cloud-storage-buckets-in-a-docker-container is useful - Attachment (Stack Overflow): use gcsfuse to mount google cloud storage buckets in a docker container > I am trying to mount a google cloud bucket from within a docker container and get the following error: [root@cdbdc9ccee5b workdir]# gcsfuse -o allow_other --debug_gcs --key-file=/src/
2020-02-12
Vince Carey (07:19:16): > i am using the image that was used to run RStudio in the last AnVIL meeting, building binaries with the 64-core Terra machine. The resulting binaries will be copied with `gsutil cp` to a bucket. I think for the present, if we can define library and loadNamespace to make transfers to .libPaths, it could be a reasonable approach
Martin Morgan (07:35:18): > I’ll try to implement that functionality in the AnVIL package over the next couple of days. Instead of using a raw gsutil command, consider `AnVIL::gsutil_cp()`.
Thanh Le Viet (09:42:45): > @Thanh Le Viet has joined the channel
Vince Carey (10:02:22): > Yes, the gsutil_cp works fine (one folder at a time, but I take it the contents are copied in parallel), and I have a bucket that will hold binaries copied from /usr/local/lib/R/site-library (I think I filled the local disk while building so I only have about 933 packages in this demonstration). I do not know how to make the gs folder public at this time.
Martin Morgan (10:09:36): > ah, you’re building all packages! cool.
Sean Davis (13:13:45): > Just a note that this is the process we used last year for the Bioc Workshops. If I remember, the workshops were on a docker container into which we copied 777 packages as a TAR archive. Details are here: https://github.com/Bioconductor/BiocWorkshops2019
Vince Carey (14:40:47): > FWIW It looks like it is nontrivial to grant wide permissions to a workspace bucket in AnVIL. BJ can articulate the details. I have to put the binaries in a bucket that I have full control over to make them accessible to the new library()…. will keep you posted.
Sean Davis (15:29:33): > Yes, a public bucket outside of AnVIL is a good starting place. Ideally, configure it as a website so that URLs are active. Adding a DNS entry to the bucket makes the data relocatable if necessary.
Martin Morgan (18:36:13): > I’m not 100% sure that the concept at https://github.com/mtmorgan/pkgserver will work, but I envision the ‘archive’ part of pkgserver running on a docker image (based on bioconductor_docker) and serving binary packages to other bioconductor_docker images running the user-facing part of pkgserver. Right now it serves packages to itself, as illustrated in the README.
2020-02-13
BJ Stubbs (13:41:19): > I think you need an external bucket because the AnVIL Google project is managed by AnVIL, and users do not have the ability to create buckets in the project. I could be wrong, but I believe the IAM for a user’s AnVIL project is composed of only AnVIL and proxy users.
Vince Carey (13:58:28): > If you want to use the 932 package bucket as a server archive within AnVIL please send the proxy (pet account) id to use and i will provide access. I have not gotten around to opening the whole bucket….
Sean Davis (14:57:26): > In case anyone wants to try it: https://codelabs.developers.google.com/codelabs/cloud-webapp-hosting-gcs/index.html
Sean Davis (15:00:01): > https://cloud.google.com/storage/docs/gsutil/commands/acl - Attachment (Google Cloud): acl - Get, set, or change bucket and/or object ACLs | Cloud Storage
2020-02-14
Ines de Santiago (04:48:47): > @Ines de Santiago has joined the channel
Lori Shepherd (07:52:21): > <!channel> A reminder: our bi-weekly meeting is next Tue, Feb 18th. Since Monday is a holiday, I started a draft agenda early: https://docs.google.com/document/d/1y-KfwOtiT5MmcS8yxUeKQsTmuWQzMq_To587XEAnHKA/edit?usp=sharing - Please make sure your project cards are up-to-date on GitHub (https://github.com/Bioconductor/AnVIL_Admin/projects), as we will start with a quick review of the cards and active projects. Any other discussion points or details can be added to the agenda. If there are any extra topics, we can adjust the time schedule. Cheers!
2020-02-15
Vince Carey (15:03:09): > There are now 3159 folders with binary package images at `gs://biocbbs_2020a/packs_3.10`; this bucket should be world-readable. The packages in the software manifest that did not install are
```
 [1] "MeasurementError.cor" "LBE" "xps" "ChIPseqR" "LVSmiRNA"
 [6] "qrqc" "ReactomePA" "FunciSNP" "methyAnalysis" "CNORdt"
[11] "chroGPS" "pRoloc" "maPredictDSC" "Rariant" "polyester"
[16] "MSGFplus" "mAPKL" "R3CPET" "pathVar" "readat"
[21] "synergyfinder" "MWASTools" "MSstatsQC" "methimpute" "phantasus"
[26] "ccfindR" "MSstatsQCgui" "CAGEfightR" "qckitfastq" "proBatch"
[31] "scAlign" "gpuMagic" "circRNAprofiler" "sojourner" "pwrEWAS"
[36] "pulsedSilac" "OmnipathR" "waddR" "MBQN" "netboxr"
[41] "ceRNAnetsim" "mitch" "selectKSigs" "PathNet" "PubScore"
[46] "NPARC" "CSSQ" "biobtreeR" "CeTF" "EnMCB"
[51] "dearseq" "easyreporting" "scMAGeCK" "packFinder" "rSWeeP"
[56] "pmp" "reconsi" "SimFFPE" "DAMEfinder" "DIAlignR"
[61] "structToolbox" "netDx"
```
> The binaries should work with bioconductor_full:release
Vince Carey (15:11:01): > @Martin Morganis there a modified library() available to work with this sort of distribution within terra?
Vince Carey (15:15:23): > It looks like we need 60GB disk to hold the results of additional installations. Tomorrow we should be able to estimate other aspects of resource consumption using BJ’s billing analysis tools.
2020-02-16
Martin Morgan (07:16:07): > I’ll put something together today for what you have. Were these built with `R CMD INSTALL --build`? This seems to be the right way to go, because binaries built this way can be installed via `install.packages(<foo.tar.gz>, repos = NULL)`, including where foo is a URL, and perhaps from a ‘standard’ CRAN-style repository. Probably the setdiff between your built packages and packages built on the build system (available via BiocPkgTools, I think) is a more helpful measure of ‘success’ – there’s no reason we should be building a broken package…
Vince Carey (08:52:20): > Hi – I did not use `--build` … in fact, I just use BiocManager::install and then copy from the installed folders into the bucket. There is no compression or zipping of the package code. It should not be difficult to introduce the conventional approach you describe. However, to test this concept, I wrote a `lib2()`, and the code is at https://gist.github.com/vjcitn/d7b82160907a65fb3a4e83a2e1a1a92c … There are still pitfalls, I am sure.
Vince Carey (08:55:22): > My approach of dropping the folders themselves into the bucket may seem strange – but the thought was that once we determine how to make a global .libPaths() target, the contents of the bucket could just be copied there for direct use.
Vince Carey (10:08:17): > I will make the .tgz and report back.
Vince Carey (12:48:05): > there are currently 1800 .tgz files at gs://biocbbs_2020a/zpacks/ (forgot to tag the bucket name) and the rest should be done in a couple of hours
Vince Carey (15:39:58): > 3159 .tgz files available
Vince Carey (16:21:24): > Linux binary packages are accessible via, e.g., http://storage.googleapis.com/biocbbs_2020a/zpacks/flowMerge_2.34.0.tgz
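An archive published at a URL like this can be unpacked directly into a library directory. Below is a minimal sketch. It assumes each `.tgz` holds a single top-level directory named after the package, and that `curl` and `tar` are available. `install_tgz` is a hypothetical helper. It does not resolve dependencies or check which R version the binary was built for.

```shell
# install_tgz (hypothetical helper): unpack one binary package archive
# into an R library directory.
#   $1 = local path or http(s) URL of the .tgz
#   $2 = target library directory (created if missing)
install_tgz() {
  src=$1; lib=$2
  mkdir -p "$lib"
  case $src in
    http://*|https://*)
      # stream the download straight into tar, no temporary file
      curl -fsSL "$src" | tar -xzf - -C "$lib" ;;
    *)
      tar -xzf "$src" -C "$lib" ;;
  esac
}

# Example (target directory is an assumption):
# install_tgz http://storage.googleapis.com/biocbbs_2020a/zpacks/flowMerge_2.34.0.tgz ~/R/fastlib
```

R would then need that directory on `.libPaths()` to find the package.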
2020-02-17
Sean Davis (13:07:58): > Thanks,@Vince Carey. As for making the non-tgz folders available, consider looking at gcsfuse (see below). You should be able to mount the bucket/path locally as a read-only file system and add it to the .libPaths (ordered to be last in the path, most likely). I’ll give it a shot.
Sean Davis (13:20:39): >
```shell
mkdir /tmp/bioc                                   # mount point
gcsfuse --implicit-dirs biocbbs_2020a /tmp/bioc   # usage: gcsfuse [flags] BUCKET MOUNTPOINT
cd /tmp/bioc
ls
```
Martin Morgan (13:26:33): > How to dockerize that? – create fuse mount in Dockerfile
Sean Davis (15:38:28): > Gcsfuse inside docker OR Gcsfuse on host and mount volume into docker. Both are viable options.
Vince Carey (16:07:34): > I am seeing indications in online commentary that the container must be run in privileged mode to use gcsfuse, and that is apparently a problem for AnVIL. However, the basic idea of using a gcsfuse folder as a .libPaths() component does seem to work in GCP. (I say “seem” because I failed to get R 3.6 running on my GCP machine, but even with R 3.2, library() found the gcsfuse-mounted package and tried to use it. There seems to be some latency in getting access to folders under the mount point; it is not clear just how patient one needs to be.)
Sean Davis (16:52:11): > @Vince Carey, is there a forum to ask the Terra team about gcsfuse usage? It seems like something that might have come up during their many years of using GCP.
Martin Morgan (16:57:15): > @BJ Stubbs has already asked about gcsfuse; I think ‘they’ thought it would work but BJ found that it didn’t; this could be a configuration tweak…
Vince Carey (17:04:17): > yes, there has been some dialogue about it. I am running a docker container in privileged mode in GCP now to see how this works. I installed gcsfuse within the container. It seems to work if you use the --implicit-dirs option when mounting; however, there are very long latencies.
Vince Carey (17:05:37): > library(affy) took 117 seconds
Vince Carey (17:08:38): > I am using a very small machine. example(rma) is taking a very long time … not clear why. I consider this a failed experiment with gcsfuse, but there may be some way of making it work better. @Martin Morgan, do you have what you need for pkgserver experiments?
Vince Carey (17:09:11): > To experiment with gcsfuse on a bare GCP instance (ubuntu),
```shell
sudo apt-get update
sudo apt-get install -y gnupg
export GCSFUSE_REPO=gcsfuse-`lsb_release -c -s`
echo "deb http://packages.cloud.google.com/apt $GCSFUSE_REPO main" | sudo tee /etc/apt/sources.list.d/gcsfuse.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-get update
sudo apt-get install -y gcsfuse
# be sure to use --implicit-dirs with gcsfuse
```
2020-02-18
Martin Morgan (09:12:45): > sorry to be slow on the install().
It would help to have a DCF-style manifest, like https://bioconductor.org/packages/3.11/bioc/src/contrib/PACKAGES but with just Package: and Version: fields.
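For reference, a manifest with only Package: and Version: fields can be written with write.dcf(). This is a minimal sketch that assumes the names and versions come from the packages installed in one library folder. The helper name `write_manifest()` is hypothetical. Naming the output `*.gz` gzips it, as with the PACKAGES.gz mentioned below.

```r
# Hypothetical helper: write a DCF manifest with only Package: and Version:
# for every package installed in `lib` (gzip it by naming the file *.gz).
write_manifest <- function(lib, file) {
    ip <- installed.packages(lib.loc = lib)
    manifest <- ip[, c("Package", "Version"), drop = FALSE]
    con <- if (grepl("\\.gz$", file)) gzfile(file, "w") else file(file, "w")
    on.exit(close(con))
    write.dcf(manifest, file = con)
    invisible(manifest)
}
```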
Vince Carey (09:46:22): > I will add that ASAP.
Vince Carey (09:46:59): > My presence on AnVIL call at 10am will be inconsistent. I will join as permitted and leave from time to time silently.
Vince Carey (13:55:55): > PACKAGES.gz is now present at gs://biocbbs_2020a/zpacks/PACKAGES.gz
Nitesh Turaga (14:58:24): > looks like the anvil tech call has been cancelled today…
Vince Carey (15:33:42): > When run from a folder that is a .libPaths() entry, this function can grab and deposit the contents of a binary package: > > function(x) > { > dir.create(x) > old <- setwd(x) # remember the starting folder > on.exit(setwd(old)) # and return to it even if localize() fails > AnVIL::localize(paste0("gs://biocbbs_2020a/packs_3.10/", x), ".", dry = FALSE) > } >
> To install package p, this should be applied over p and its unsatisfied dependencies.
Nitesh Turaga (15:34:17): > Very nice!
Nitesh Turaga (15:34:45): > So, how do I test it? Just spin up an RStudio session on Terra… and… try it out?
Vince Carey (15:52:52): > Ah – yes, that should suffice. For a simple package like parody with no dependencies it is a pretty trivial test … I have also tried it with removal of Rcpp and replacement using that function and it works.
Vince Carey (15:53:48): > But the trick is correctly computing the dependencies and running over them. One has to recurse through the dependencies of dependencies.
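The recursion can be sketched in base R. This sketch assumes two inputs. The first is a named list `deps` that maps each package to its direct dependencies, for example as parsed from a PACKAGES file. The second is a vector of names that are already installed. Both inputs and the helper name `unsatisfied()` are hypothetical. The result is p plus every transitive dependency that is not yet installed, which is the set the localizing function would be applied over.

```r
# Hypothetical helper: transitive closure of `pkg` over `deps`,
# skipping anything already available in `installed`.
unsatisfied <- function(pkg, deps, installed = character()) {
    todo <- pkg
    seen <- character()
    while (length(todo)) {
        p <- todo[1]
        todo <- todo[-1]
        if (p %in% seen || p %in% installed)
            next
        seen <- c(seen, p)
        # Packages absent from `deps` (e.g. base packages) contribute no
        # further dependencies, so the recursion stops there.
        todo <- c(todo, deps[[p]])
    }
    seen
}
```

With a real dependency list, a call would look like `unsatisfied("parody", deps, rownames(installed.packages()))`.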
Martin Morgan (16:46:45): > I pushed an update to AnVIL (v. 0.0.35) that (a) allows gsutil_cp()
to operate on a vector of files and (b) implements install_precompiled()
, e.g., > > > devtools::load_all() > Loading AnVIL > > lib = tempfile(); dir.create(lib); install_precompiled("Rsamtools", lib, lib) > Warning: 7 packages not available for fast install: > 'methods', 'utils', 'graphics', 'stats', 'parallel', 'grDevices', > 'stats4' > copying 21 packages to local archive > * installing **binary** package 'Rsamtools' ... > * DONE (Rsamtools) > * installing **binary** package 'GenomeInfoDb' ... > * DONE (GenomeInfoDb) > * installing **binary** package 'GenomicRanges' ... >
> The lib = tempfile() stuff is to install into a new directory; without that it would update installed packages in the standard location. Normally the function would be invoked only with a vector of package names. > > The warning in this case is an artifact of using the lib argument, but could be useful if there were packages not available for precompiled install. > > The ‘copying xxx packages to local archive’ step uses gsutil_cp() to copy from the bucket only those files that have not already been copied in the current session. This is what install.packages() does, but using download.file(); here we are using gsutil_cp(), which is faster. > > The binary packages are being installed using install.packages() on the downloads, so this is ‘legal’ from R’s perspective.
Martin Morgan (16:52:12): > There are a couple of things: > > I guess these are ‘tgz’ archives, so built on a Mac? I think they should be .tar.gz for Linux. > > I get the packages from the bucket rather than http:, so that we can use gsutil_cp(), which should be faster (especially in the cloud). > > I get package dependencies from BiocManager::repositories(), but these could be wrong because, e.g., someone changes the dependencies of their package after Vince’s snapshot. I think that the right thing to do is to generate the PACKAGES file with tools::write_PACKAGES('path/to/directory/with/*tgz', type = "mac.binary"), or from the source tarballs used in the snapshot (... type = "source"), or, in the updated iteration, from the Linux-built tarballs.
Martin Morgan (16:56:39): > The full PACKAGES file can be read by available.packages() (with the first argument, contriburl, being the URL of the directory containing PACKAGES, e.g., starting with http://, or with file:// for a local version) to generate the dependency graph via tools::package_dependencies().
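A minimal sketch of that flow. It assumes a local directory that holds a PACKAGES file. available.packages() reads the file through a file:// URL, and tools::package_dependencies(recursive = TRUE) walks the graph over the strong dependency fields. The helper name `recursive_deps()` is hypothetical.

```r
# Read a CRAN-style PACKAGES file from a local directory and compute the
# recursive strong dependencies (Depends, Imports, LinkingTo) of `pkgs`.
recursive_deps <- function(contrib_dir, pkgs) {
    url <- paste0("file://", normalizePath(contrib_dir, "/"))
    db <- available.packages(contriburl = url)
    tools::package_dependencies(pkgs, db = db,
                                which = c("Depends", "Imports", "LinkingTo"),
                                recursive = TRUE)
}
```

Packages that are missing from the PACKAGES file, such as base packages like methods, appear as leaves of the graph. The version requirement on R itself is not reported.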
Martin Morgan (17:00:42): > I’m not 100% sure, but I suspect that really the flow should be to build the source tarballs (R CMD build <pkg>) and then the installed package from that (R CMD INSTALL --build <pkg>.tar.gz).
Kasper D. Hansen (20:36:15): > I am only skimming this, but you’re all aware of the various repos tools like packrat and drat and probably a few others (didn’t Genentech roll out their own?). I know this is different from the copying proposed here, but …
Vince Carey (22:45:59): > @Martin Morgan I used .tgz naively, to indicate that these are binary. The .tgz files are made by manually running tar czf p.tgz p for package p. Everything is done under Linux, specifically within the bjstubbs/anvilca43k container, using a customized “high computer power” Terra runtime with 250GB disk. If I use R CMD INSTALL --build, the binary package names will, I think, include a reference to the platform (e.g., parody_1.44.0_R_x86_64-pc-linux-gnu.tar.gz), which is fine. However, my first objective was to get folders that would serve as elements of a .libPaths() entry. I am not clear that the product of INSTALL --build leads to a different result – maybe there is a checksum computed and stored? It doesn’t look like it. The production of source tarballs was not considered for this exercise. > > I think we should whiteboard a process that we need for AnVIL, which in my view focuses on getting the release versions of packages efficiently into user workspaces and studios. The package set should be quite stable for a release period, and, probably, dependency changes cannot occur under change policies for release. IMHO the main target is satisfying the needs of basic users, with the understanding that advanced users will be able to craft their own library images no matter what we do. We want to automate this process in AnVIL and make use of it in other settings if possible. Of course, all best practices with respect to creating and maintaining repositories should be respected, and I may have missed something – but I think the idea of a cloud-accessible “installed library” may not have a clear precedent or practice. > > @Kasper D. Hansen yes, I am aware that renv has superseded packrat and this will be relevant at the user level. Here the topic is “provisioning the system” and I don’t see a role for those reproducibility tools at this point, but I may be missing something.
Vince Carey (22:54:28): > To summarize – because the actual activities are so naive that folks may not grasp what was done: Given a vector L of package names, BiocManager::install(L, Ncpus=40) will, in our container, install them all (and all their dependencies) in a folder. These installations are immediately reusable as such within the container – anyone who has access to the disk folder can add it to .libPaths() and library() will work for all L packages. Unfortunately, we don’t know how to make the disk folder generated in this way broadly accessible. > > The naive way to proceed is to copy the installed-package folders into a bucket, and have a process for moving the contents of those folders to users as needed, for access through their .libPaths(). Dependency resolution is needed in this approach. > > A more systematic way to proceed is to produce a CRAN-style repository of the (tarred/zipped binary) packages. Dependency resolution comes for free with the standard installation methodology. (N.B., in this setting L is length 1828, derived from a manifest that I think represented all packages in 3.10 but it may have been 3.11.)
2020-02-19
Martin Morgan (05:40:23): > I don’t think there’s value in second-guessing R by assuming tar czf p.tgz p is the same as R CMD INSTALL --build p. For pragmatic reasons, the source tarballs can come from public (CRAN, Bioc) archives (is that your starting point?) rather than starting at the very beginning with a git clone. Might as well do it the R way. > > I don’t think we want to invent a new strategy for making versions available – we should mirror current release, rather than ‘snapshot at some point in time’. The difference can be seen by parsing the directory of archived packages at http://bioconductor.org/packages/3.10/bioc/src/contrib/Archive/, where each entry represents a package that has been updated. According to policy, these are ‘bug fixes’, so they are more correct than, rather than different from, the release snapshot.
Martin Morgan (07:27:31): > Also, as this moves toward ‘production’, it seems like the image used to build these packages should match exactly the image users have access to, rather than a derived image like BJ’s. @Nitesh Turaga can you add the links (to Dockerhub and to the Dockerfile source repository) to https://bioconductor.github.io/AnVIL_Admin/#now? @Nitesh Turaga I think on the tech call a week or so ago it was mentioned that at least the notebook image could also be run locally, not just on Terra; is that the case for the RStudio / Bioconductor image? If not, can you follow up, e.g., on the AnVIL Slack? - Attachment (bioconductor.github.io): Bioconductor / AnVIL > Administration repository for AnVIL project
Vince Carey (07:28:21): > I will certainly adopt best practices that you describe but to be completely clear – I did not use any git checkout or any tarball. I just used BiocManager::install to do what it does, and took its product from the running version of R.
Vince Carey (07:29:43): > The reason I used BJ’s container image was that I knew it produced a running Rstudio … I agree that the standard container images should be used in production.
Vince Carey (07:36:48): > If I understand the pkgserver approach, what we will have in our service is a collection of version-stamped binary tarballs, that will include multiple versions of every package in release, up to the present moment. If a depends on b and only b is modified in release, then two versions of b will be present with different version tags and different hashes, and a will be rebuilt (to capture effects of upstream changes) and two versions of a will be present with different hashes but the same version tag. Is that right?
Martin Morgan (07:41:22): > Currently pkgserver updates b but not a. But yes I think the technically correct requirement is to update the packages that depend on b, both in the archive and in the user installation.
Vince Carey (07:44:02): > Is pkgserver working with the zpacks subfolder of the bucket? I haven’t tried it yet.
Vince Carey (07:56:30): > Can BiocManager::install carry out the tasks of INSTALL --build so that we get a compliant binary tarball in addition to an installation? I was motivated by the realization that BiocManager::install(L, Ncpus=40) got us everything we needed if we were willing to reuse the installed package folders. We get 3160 packages in a few hours of a $5/hr machine. The simplicity is really attractive.
Martin Morgan (08:04:18): > BiocManager::install("BiocGenerics", INSTALL_opts = "--build")
does this, creating the output in the current working directory!
Martin Morgan (08:05:43): > BiocManager::install(ask=FALSE, INSTALL_opts = "--build")
creates updates of any newly versioned package… (oops, no it doesn’t, but it should…)
Martin Morgan (08:06:58): > pkgserver isn’t paying attention to zpacks yet…
Vince Carey (08:09:53): > OK, so we have a very simple automation potential then. Let me know the best way to populate a bucket so that pkgserver can do its magic and I will do that – we should analyze the costs so far and maybe @BJ Stubbs can look into that this afternoon.
BJ Stubbs (08:24:59): > Sure
Martin Morgan (09:19:14): > For what it’s worth, using ‘mocks’ for testing is super-convenient. I wanted to make sure that INSTALL_opts
worked when updating packages with BiocManager. So I implemented my changes and then tested with > > testthat::with_mock( > old.packages = function(...) { > ## claim that BiocGenerics is out-of-date > cbind(Package = "BiocGenerics", Version = "0.32.0", LibPath = .libPaths()[1]) > }, > install.packages = function(pkgs, ..., INSTALL_opts) { > ## check the arguments BiocManager passes through > object <- > identical(pkgs, c(Package = "BiocGenerics")) && > identical(INSTALL_opts, "--build") > testthat::expect_true(object) > }, > BiocManager::install(ask = FALSE, INSTALL_opts = "--build") > ) >
Vince Carey (10:04:52): > Is there a new version of BiocManager to be tried?
Martin Morgan (10:41:49): > just pushed to GitHub, so BiocManager::install('Bioconductor/BiocManager')!
Nitesh Turaga (11:49:54) (in thread): > @Vince Carey @BJ Stubbs You should be able to find the links to the images on the website now.
Sean Davis (12:48:39): > FYI: > Dr. Anthony Philippakis, M.D., Ph.D., Chief Data Officer of the Broad Institute, is presenting on efforts to facilitate storing, sharing, and analyzing genomic and clinical data in a scalable manner. He is the lead PI on the NHGRI-funded AnVIL project, through which our team has been able to participate in a pilot for the NIH-WRNMMC symposium, and has seen firsthand its potential for facilitating collaborative research. Dr. Philippakis is a highly engaging speaker given his unique expertise in clinical research as well as in informatics. > Title: “A Data Platform for Public Health” > Date: Tuesday, February 25, 2020 > Time: 12:00 pm > Location: NIH Campus, Building 1 Wilson Hall > Abstract: The life sciences are in the midst of a data revolution. Inexpensive and accurate genome sequencing is a reality, advanced imaging is routine, and clinical data is increasingly stored in electronic form. In principle, these advances have brought us to the threshold of a new era in medicine, one where the data sciences hold the potential to propel our understanding and treatment of disease. In practice, we are stymied by the operational challenges associated with storing, sharing, and analyzing genomic and clinical data at scale. In this talk, I will overview Broad’s efforts at building a data platform to address these unmet needs by 1) building patient-facing software, 2) performing data engineering, 3) creating machine learning tools, and 4) building a cloud-based researcher environment (Terra). I will also overview flagship applications in precision medicine, infectious disease surveillance, and clinical trial design. > The presentation will also be available to stream over NIH videocast: https://videocast.nih.gov/summary.asp?live=35885&bhcp=1 - Attachment (videocast.nih.gov): NIH VideoCast - A Data Platform for Public Health > A Data Platform for Public Health
Nitesh Turaga (13:19:30) (in thread): > The local testing works
2020-02-20
Martin Morgan (05:47:51) (in thread): > unfortunately I don’t think R passes --build to the parallel processes, so using Ncpus and --build in combination does not work. I also looked at the code and the package directory is tarred with > > utils::tar(filepath, curPkg, compression = "gzip", > compression_level = 9L, > tar = Sys.getenv("R_INSTALL_TAR")) >
Vince Carey (08:27:03) (in thread): > My take on this is that my manual tar czvf should do … if I compute the filepath in a suitable way (it will have platform information). Nonparallel building is a nonstarter IMHO. Do you want to try to fix this? I was about to start a new binary build process but I think I will hold off.
Vince Carey (09:35:18) (in thread): > Ncpus is not the only way to parallelize the process effectively but I don’t have time to set up the plan via dependency analysis.
Martin Morgan (09:45:20) (in thread): > yes I agree just creating archives of the installed packages is the way to go now…
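A minimal sketch of archiving an installed package folder in the way the quoted utils::tar() call does. The file name follows the `<pkg>_<version>_R_<platform>.tar.gz` shape of the parody example above. One assumption is that R.version$platform matches the platform string that R CMD INSTALL --build embeds. The helper name `archive_installed()` is hypothetical.

```r
# Hypothetical helper: tar an installed package folder under a name shaped
# like R CMD INSTALL --build output on Linux, e.g.
# parody_1.44.0_R_x86_64-pc-linux-gnu.tar.gz
archive_installed <- function(pkg, lib, dest = getwd()) {
    desc <- read.dcf(file.path(lib, pkg, "DESCRIPTION"), fields = "Version")
    filename <- sprintf("%s_%s_R_%s.tar.gz",
                        pkg, desc[1, "Version"], R.version$platform)
    filepath <- file.path(normalizePath(dest), filename)
    old <- setwd(lib)      # archive paths are relative to the library
    on.exit(setwd(old))
    utils::tar(filepath, pkg, compression = "gzip",
               compression_level = 9L,
               tar = Sys.getenv("R_INSTALL_TAR"))
    filepath
}
```

Because each call is independent, the calls can be spread over the installed folders with any parallel tool. This sidesteps the Ncpus/--build limitation.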
2020-03-02
Lori Shepherd (08:52:41): > A link to the agenda for tomorrow’s meeting: https://docs.google.com/document/d/1X7JQ17ZLYJz-QAwJ6UlVpIP0eUPJ4ACvU27bhWRkliI/edit?usp=sharing Please use the agenda to make specific notes, but we will be reviewing the projects listed in the project board https://github.com/Bioconductor/AnVIL_Admin/projects/5. As a reminder, March is the end of Q1 in AnVIL eyes.
Sean Davis (12:38:59): > I notice that one of the topics on the agenda is documentation and workshops. Ours, dealing with the Cancer Genomics Cloud Pilots, are dated, but perhaps there are some useful concepts or materials here: http://teamcgc.nci.nih.gov.s3-website-us-east-1.amazonaws.com/ Not bioc-focused, though.
2020-03-03
Lori Shepherd (10:01:26): > meeting link https://bluejeans.com/480153337/webrtc?src=calendarLink happening now
Lori Shepherd (11:03:39): > And just to keep in the back of our minds: the next time Bioconductor presents at the Technical meeting is the end of March, Mar 31.
Nitesh Turaga (12:13:04): > Alright, on larger files and repeating the experiments multiple times, GCSConnection’s http requests win the time challenge for both small and large files. - File (PNG): Screen Shot 2020-03-03 at 12.11.44 PM.png - File (PNG): Screen Shot 2020-03-03 at 12.07.09 PM.png - File (PNG): Screen Shot 2020-03-03 at 12.07.00 PM.png
Nitesh Turaga (12:16:52): > The test files: > 1. the README file is 5 KB > 2. the fasta.gz file is 818 MB > 3. the snp.tar.gz file is 4.7 GB.
Martin Morgan (12:31:20): > thanks! I’m a little surprised but the difference seems clear!
Nitesh Turaga (12:35:39): > Even listing files is faster, - File (PNG): Screen Shot 2020-03-03 at 12.34.53 PM.png
2020-03-05
Vince Carey (08:46:46): > There is a reference to learnr in the AnVIL Slack https://the-anvil.slack.com/archives/CE15M9W3T/p1582838023000700 – I haven’t had a chance to look at this, but it does seem potentially useful for tutorial designs.
2020-03-10
Sehyun Oh (16:04:00): > FYI, the F2F interoperability workshop scheduled for next Tue/Wed in Baltimore is now a remote meeting. The temporary schedule is Tue/Wed 11am-4pm EDT, if anyone is interested in any session.
Nitesh Turaga (16:04:31): > I might stream it during those two days and see what is interesting.
Sehyun Oh (16:07:56): > Updated schedule (https://docs.google.com/document/d/1Jpsbqw_PSp_7TpBRT4bYF1f8wRqKZgKzH0vFVOIOuPs/edit#heading=h.74jhbdwgacle):
Sehyun Oh (16:07:57): > Draft remote agenda (updated 09MAR20) > Day 1 March 17 - Component introduction > * 11 am (EDT) - Stack (IC) introductions > * 11:10 am - Systems Interoperation WG intro/update (5 min) > * 11:15 am - Gen3 > * 12:00 pm - Terra/Firecloud > * 12:45 pm - break > * 1:00 pm - Dockstore > * 1:30 pm - PIC-SURE > * 1:45 pm - Seven Bridges > * 2:30 pm - ISB-CGC > * 3:15 pm - closing > Day 2 March 18 - Use cases > * 11:00 am (EDT) - BDC/AnVIL gold standard use case > * 12:00 pm - SB-CGC use case > * 1:00 pm - break > * 1:15 pm - KFDRC & BDC use case > * 2:15 pm - ISB-CGC/CRDC use case > * 3:15 pm - closing
2020-03-16
Lori Shepherd (12:11:01): > Our bi-weekly meeting is tomorrow, March 17th. Here is the link to the agenda; we will be talking about the current sticky on the board as well as what to present in two weeks to the technical meeting: https://docs.google.com/document/d/1FEXddDvwkdBM03Sm7NRqCPi2tKAftvX7sVZVjdeIuYg/edit?usp=sharing
2020-03-17
Lori Shepherd (10:29:47): > https://bioconductor.github.io/AnVIL_Admin/ - Attachment (bioconductor.github.io): Bioconductor / AnVIL > Administration repository for AnVIL project
Vince Carey (12:49:14): > Just wanted to recall that we had a proposal to set an environment variable within a Bioc container that will identify the container uniquely, and then we would like if possible to pass that identifier into the built packages. We never discussed this but one could imagine adding a non-exported function that evaluates to the container ID with which the package was built
Martin Morgan (13:25:25): > There’s actually now information in /etc/environment in the docker container > > root@52d88733db54:/# sudo su - rstudio > $ env|grep VER > BIOCONDUCTOR_DOCKER_VERSION=3.10.2 >
> we were thinking that this would be exposed as environment variables, but that turns out not to be correct for all user / initial command configurations; something we’re working on…
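The variable is not always exported to the user's environment. One hedged workaround is to check the process environment first and then fall back to parsing the KEY=VALUE lines of /etc/environment. The helper name `docker_version()` and its fallback logic are assumptions; the container does not provide them.

```r
# Hypothetical helper: find BIOCONDUCTOR_DOCKER_VERSION, preferring the
# process environment and falling back to a KEY=VALUE file.
docker_version <- function(var = "BIOCONDUCTOR_DOCKER_VERSION",
                           env_file = "/etc/environment") {
    value <- Sys.getenv(var, unset = NA)
    if (!is.na(value) && nzchar(value))
        return(value)
    if (!file.exists(env_file))
        return(NA_character_)
    lines <- readLines(env_file, warn = FALSE)
    hit <- grep(paste0("^", var, "="), lines, value = TRUE)
    if (!length(hit))
        return(NA_character_)
    # Use the last assignment; strip the key and any surrounding quotes.
    gsub('^"|"$', "", sub(paste0("^", var, "="), "", hit[length(hit)]))
}
```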
Vince Carey (18:17:21): > Is there a gcr container that I can use for Bioc 3.10 + RStudio to build packages with? @Nitesh Turaga
Vince Carey (18:46:01): > It needs to be such that the ‘notebook’ type that runs in a workspace is RStudio. I tried to use us.gcr.io/broad-dsp-gcr-public/terra-jupyter-bioconductor:0.0.12 for a custom Terra runtime and AFAICT it starts Jupyter only.
Vince Carey (19:30:24): > I am ready to build the binaries in GCP but I want to use the best Rstudio-capable container for Bioc 3.10. I could use bioconductor/bioconductor_full:release … perhaps? Blocker follows on that idea:
Vince Carey (19:30:47): - File (PNG): badcont.png
Vince Carey (19:31:51): > The one that I used previously was bjstubbs/anvilca43k which was tailored to run Rstudio … probably we want to use something more up to date and centrally curated
Martin Morgan (21:02:08): > There is a gcr RStudio Bioc image mentioned at https://bioconductor.github.io/AnVIL_Admin/ - Attachment (bioconductor.github.io): Bioconductor / AnVIL > Administration repository for AnVIL project
2020-03-18
Vince Carey (06:23:40): > Great, I am using that now. Here is a question. Did https://storage.googleapis.com/biocbbs_2020a/zpacks/PACKAGES.gz turn the biocbbs_2020a/zpacks resource into a CRAN-style repository, or is something else needed? I will likely repopulate that folder with properly named binary packages after the GCP-based build today.
Vince Carey (06:25:09): > I did get > > Installation path not writeable, unable to update packages: boot, foreign, KernSmooth, > lattice, MASS, Matrix, mgcv, nlme, nnet, survival >
> on first update. Will ignore for now but thought this had been addressed.
Vince Carey (06:29:35): > Also got > > Failed to copy the script/BiocCheck or script/BiocCheckGitClone script > to /usr/local/lib/R/bin. If you want to be able to run 'R CMD > BiocCheck' you'll need to copy it yourself to a directory on your PATH, > making sure it is executable. See the BiocCheck vignette for more > information. >
Vince Carey (07:36:43): > > Warning messages: > 1: packages ‘rHVDM’, ‘plateCore’, ‘charm’, ‘PathNet’, ‘Rchemcpp’, ‘exomePeak’, ‘flipflop’, ‘birte’, ‘SEPA’, ‘CNPBayes’, ‘dSimer’, ‘condcomp’, ‘brainImageR’, ‘gpuMagic’, ‘MBQN’ are not available (for R version 3.6.1) > 2: In install.packages(...) : installation of one or more packages failed, > probably ‘Rmpi’, ‘xps’, ‘mlm4omics’, ‘netbenchmark’, ‘CountClust’, ‘sojourner’, ‘GOSemSim’, ‘chroGPS’, ‘MSstatsQC’, ‘DOSE’, ‘Rcpi’, ‘cellTree’, ‘readat’, ‘CHARGE’, ‘MSstatsQCgui’, ‘GAPGOM’, ‘ViSEAGO’, ‘pulsedSilac’, ‘tRanslatome’, ‘enrichplot’, ‘ccfindR’, ‘scAlign’, ‘waddR’, ‘clusterProfiler’, ‘ReactomePA’, ‘meshes’, ‘scPipe’, ‘cobindR’, ‘DAPAR’, ‘debrowser’, ‘bioCancer’, ‘eegc’, ‘LINC’, ‘CEMiTool’, ‘miRspongeR’, ‘MAGeCKFlute’, ‘GDCRNATools’, ‘epihet’, ‘fcoex’, ‘signatureSearch’, ‘ChIPseeker’, ‘Prostar’, ‘miRSM’, ‘SpidermiR’, ‘MoonlightR’, ‘PathwaySplice’, ‘esATAC’, ‘RNASeqR’, ‘profileplyr’, ‘enrichTF’, ‘StarBioTrek’, ‘ALPS’, ‘TCGAbiolinksGUI’, ‘methylGSA’, ‘scTensor’ > > dim(installed.packages()) > [1] 3092 16 >
Vince Carey (07:40:02): > took an hour and 15 minutes on the $3.06/hr runtime
Vince Carey (07:41:10): > Ncpus=45 … let’s see if some of the failed ones will install on a second try
Vince Carey (07:48:30): > Second round did not help: For completeness (sorry to take up so much channel but the information could be strategically useful)
Vince Carey (07:48:30): > > ERROR: lazy loading failed for package ‘scAlign’ > * removing ‘/usr/local/lib/R/site-library/scAlign’ > ERROR: dependency ‘eva’ is not available for package ‘waddR’ > * removing ‘/usr/local/lib/R/site-library/waddR’ > ERROR: dependency ‘hashmap’ is not available for package ‘scPipe’ > * removing ‘/usr/local/lib/R/site-library/scPipe’ > ERROR: dependency ‘rtfbs’ is not available for package ‘cobindR’ > * removing ‘/usr/local/lib/R/site-library/cobindR’ > ERROR: dependency ‘GOSemSim’ is not available for package ‘DOSE’ > * removing ‘/usr/local/lib/R/site-library/DOSE’ > ERROR: dependency ‘GOSemSim’ is not available for package ‘Rcpi’ > * removing ‘/usr/local/lib/R/site-library/Rcpi’ > ERROR: dependencies ‘MSstatsQC’, ‘RecordLinkage’ are not available for package ‘MSstatsQCgui’ > * removing ‘/usr/local/lib/R/site-library/MSstatsQCgui’ > ERROR: dependency ‘GOSemSim’ is not available for package ‘GAPGOM’ > * removing ‘/usr/local/lib/R/site-library/GAPGOM’ > ERROR: dependency ‘GOSemSim’ is not available for package ‘ViSEAGO’ > * removing ‘/usr/local/lib/R/site-library/ViSEAGO’ > ERROR: dependency ‘GOSemSim’ is not available for package ‘tRanslatome’ > * removing ‘/usr/local/lib/R/site-library/tRanslatome’ > ERROR: dependency ‘Rmpi’ is not available for package ‘ccfindR’ > * removing ‘/usr/local/lib/R/site-library/ccfindR’ > ERROR: dependencies ‘DOSE’, ‘GOSemSim’ are not available for package ‘enrichplot’ > * removing ‘/usr/local/lib/R/site-library/enrichplot’ > ERROR: dependency ‘DOSE’ is not available for package ‘PathwaySplice’ > * removing ‘/usr/local/lib/R/site-library/PathwaySplice’ > ERROR: dependencies ‘DOSE’, ‘enrichplot’, ‘GOSemSim’ are not available for package ‘clusterProfiler’ > * removing ‘/usr/local/lib/R/site-library/clusterProfiler’ > ERROR: dependencies ‘DOSE’, ‘enrichplot’ are not available for package ‘ReactomePA’ > * removing ‘/usr/local/lib/R/site-library/ReactomePA’ > ERROR: dependencies ‘DOSE’, ‘enrichplot’, ‘GOSemSim’ 
are not available for package ‘meshes’ > * removing ‘/usr/local/lib/R/site-library/meshes’ > ERROR: dependency ‘enrichplot’ is not available for package ‘ChIPseeker’ > * removing ‘/usr/local/lib/R/site-library/ChIPseeker’ > ERROR: dependency ‘clusterProfiler’ is not available for package ‘DAPAR’ > * removing ‘/usr/local/lib/R/site-library/DAPAR’ > ERROR: dependencies ‘DOSE’, ‘clusterProfiler’, ‘enrichplot’ are not available for package ‘debrowser’ > * removing ‘/usr/local/lib/R/site-library/debrowser’ > ERROR: dependencies ‘DOSE’, ‘clusterProfiler’, ‘ReactomePA’ are not available for package ‘bioCancer’ > * removing ‘/usr/local/lib/R/site-library/bioCancer’ > ERROR: dependencies ‘clusterProfiler’, ‘DOSE’ are not available for package ‘eegc’ > * removing ‘/usr/local/lib/R/site-library/eegc’ > ERROR: dependencies ‘DOSE’, ‘clusterProfiler’, ‘ReactomePA’ are not available for package ‘LINC’ > * removing ‘/usr/local/lib/R/site-library/LINC’ > ERROR: dependency ‘clusterProfiler’ is not available for package ‘CEMiTool’ > * removing ‘/usr/local/lib/R/site-library/CEMiTool’ > ERROR: dependencies ‘clusterProfiler’, ‘ReactomePA’, ‘DOSE’ are not available for package ‘miRspongeR’ > * removing ‘/usr/local/lib/R/site-library/miRspongeR’ > ERROR: dependencies ‘clusterProfiler’, ‘DOSE’, ‘enrichplot’ are not available for package ‘MAGeCKFlute’ > * removing ‘/usr/local/lib/R/site-library/MAGeCKFlute’ > ERROR: dependencies ‘clusterProfiler’, ‘DOSE’ are not available for package ‘GDCRNATools’ > * removing ‘/usr/local/lib/R/site-library/GDCRNATools’ > ERROR: dependency ‘ReactomePA’ is not available for package ‘epihet’ > * removing ‘/usr/local/lib/R/site-library/epihet’ > ERROR: dependency ‘clusterProfiler’ is not available for package ‘fcoex’ > * removing ‘/usr/local/lib/R/site-library/fcoex’ > ERROR: dependencies ‘clusterProfiler’, ‘DOSE’ are not available for package ‘signatureSearch’ > * removing ‘/usr/local/lib/R/site-library/signatureSearch’ > ERROR: dependencies 
‘clusterProfiler’, ‘DOSE’ are not available for package ‘MoonlightR’ > * removing ‘/usr/local/lib/R/site-library/MoonlightR’ > ERROR: dependencies ‘ChIPseeker’, ‘clusterProfiler’ are not available for package ‘esATAC’ > * removing ‘/usr/local/lib/R/site-library/esATAC’ > ERROR: dependencies ‘clusterProfiler’, ‘DOSE’ are not available for package ‘RNASeqR’ > * removing ‘/usr/local/lib/R/site-library/RNASeqR’ > ERROR: dependency ‘ChIPseeker’ is not available for package ‘profileplyr’ > * removing ‘/usr/local/lib/R/site-library/profileplyr’ > ERROR: dependency ‘clusterProfiler’ is not available for package ‘enrichTF’ > * removing ‘/usr/local/lib/R/site-library/enrichTF’ > ERROR: dependency ‘ChIPseeker’ is not available for package ‘ALPS’ > * removing ‘/usr/local/lib/R/site-library/ALPS’ > ERROR: dependency ‘clusterProfiler’ is not available for package ‘TCGAbiolinksGUI’ > * removing ‘/usr/local/lib/R/site-library/TCGAbiolinksGUI’ > ERROR: dependency ‘clusterProfiler’ is not available for package ‘methylGSA’ > * removing ‘/usr/local/lib/R/site-library/methylGSA’ > ERROR: dependencies ‘ReactomePA’, ‘DOSE’ are not available for package ‘scTensor’ > * removing ‘/usr/local/lib/R/site-library/scTensor’ > ERROR: dependency ‘DAPAR’ is not available for package ‘Prostar’ > * removing ‘/usr/local/lib/R/site-library/Prostar’ > ERROR: dependency ‘miRspongeR’ is not available for package ‘miRSM’ > * removing ‘/usr/local/lib/R/site-library/miRSM’ > ERROR: dependency ‘MAGeCKFlute’ is not available for package ‘SpidermiR’ > * removing ‘/usr/local/lib/R/site-library/SpidermiR’ > ERROR: dependency ‘SpidermiR’ is not available for package ‘StarBioTrek’ > * removing ‘/usr/local/lib/R/site-library/StarBioTrek’ >
Vince Carey (12:13:51): > gs://biocbbs_2020a/bioc_binaries has 3063 binary packages and PACKAGES.gz; first few: > > %vjcair> gsutil ls gs://biocbbs_2020a/bioc_binaries/ | head > gs://biocbbs_2020a/bioc_binaries/ > gs://biocbbs_2020a/bioc_binaries/ABAData_1.16.0_R_x86_64-pc-linux-gnu.tar.gz > gs://biocbbs_2020a/bioc_binaries/ABAEnrichment_1.16.0_R_x86_64-pc-linux-gnu.tar.gz > gs://biocbbs_2020a/bioc_binaries/ABSSeq_1.40.0_R_x86_64-pc-linux-gnu.tar.gz >
Vince Carey (12:21:24): > Via http it looks like > > https://storage.googleapis.com/biocbbs_2020a/bioc_binaries/ABAData_1.16.0_R_x86_64-pc-linux-gnu.tar.gz >
2020-03-19
Martin Morgan (11:58:36): > There are a few things needed to make these available for binary install via standard R commands like install.packages() (and later BiocManager::install()). > 1. The first is that the binaries should be renamed as, e.g., > > > ABAData_1.16.0.tar.gz >
> (this is a ‘workaround’, R won’t recognize the R_x86… part of the file name in a CRAN-style repository). > > 2. The packages need to be placed into a hierarchy that includes > > .../src/contrib/ABAData_1.16.0.tar.gz >
> for instance > > https://storage.googleapis.com/biocbbs_2020a/bioc_binaries/src/contrib/ABAData_1.16.0.tar.gz >
> 3. And finally, one needs a PACKAGES file in src/contrib/PACKAGES
. This can be generated by running > > tools::write_PACKAGES(".../src/contrib/") >
> I guess steps 1 and 3 are most easily handled outside of the Google bucket (maybe there’s some gsutil foo for step 1? gsutil mv?) > > On success, > > available.packages(repos = "https://storage.googleapis.com/biocbbs_2020a/bioc_binaries") >
> should work, as should > > install.packages("Biobase", repos = "https://storage.googleapis.com/biocbbs_2020a/bioc_binaries") >
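Steps 1 to 3 can be sketched in R and run on a local copy of the binaries before they are uploaded to the bucket. This minimal sketch assumes the binaries follow the `<pkg>_<version>_R_<platform>.tar.gz` pattern seen in the bucket listing. The helper name `make_binary_repo()` and its arguments are hypothetical. write_PACKAGES() reads each archive's DESCRIPTION, so the resulting PACKAGES file carries Depends/Imports as well as Version.

```r
# Hypothetical helper: turn a flat directory of INSTALL --build style
# binaries into a CRAN-style repository rooted at `repo`.
make_binary_repo <- function(bin_dir, repo) {
    contrib <- file.path(repo, "src", "contrib")
    dir.create(contrib, recursive = TRUE, showWarnings = FALSE)
    files <- list.files(bin_dir, pattern = "_R_.*\\.tar\\.gz$")
    # Step 1: drop the "_R_<platform>" part, e.g.
    # ABAData_1.16.0_R_x86_64-pc-linux-gnu.tar.gz -> ABAData_1.16.0.tar.gz
    renamed <- sub("_R_.*\\.tar\\.gz$", ".tar.gz", files)
    # Step 2: place the renamed archives under src/contrib
    ok <- file.copy(file.path(bin_dir, files), file.path(contrib, renamed),
                    overwrite = TRUE)
    if (!all(ok))
        stop("failed to copy: ", paste(files[!ok], collapse = ", "))
    # Step 3: write PACKAGES (plus PACKAGES.gz / PACKAGES.rds) from the
    # DESCRIPTION inside each archive
    tools::write_PACKAGES(contrib, type = "source")
    invisible(contrib)
}
```

The resulting `repo` directory could then be copied to the bucket, e.g., with `gsutil -m cp -r`.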
Daniela Cassol (12:59:19): > @Daniela Cassol has joined the channel
Vince Carey (13:00:40): > I will get on this … we’ll have the repo set up by tomorrow
Vince Carey (15:18:12): > Despite the fact that it is a simple metadata change, it is taking hours for gsutil to reposition the files under src/contrib, but we are 2/3 through, so I continue by brute force.
Vince Carey (15:19:35): > Frontloading the following concept: we need to get the “BuildingContainer” metadata into the packages. I propose that it is a field in DESCRIPTION and that its value can be checked against the env var that you are planting in the container.
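A rough sketch of this proposal. It reads the proposed BuildingContainer field from an installed package's DESCRIPTION and compares it with the container's BIOCONDUCTOR_DOCKER_VERSION. The field name comes from the proposal. The helper name `built_in_this_container()` and the exact-match rule are assumptions.

```r
# Hypothetical check: does the package's (proposed) BuildingContainer
# DESCRIPTION field match the container it is being used in?
built_in_this_container <- function(pkg, lib.loc = NULL,
        current = Sys.getenv("BIOCONDUCTOR_DOCKER_VERSION")) {
    built <- utils::packageDescription(pkg, lib.loc = lib.loc,
                                       fields = "BuildingContainer")
    # NA means the field (or the package) is missing
    !is.na(built) && nzchar(current) && identical(built, current)
}
```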
Vince Carey (15:48:21): > It’s all done. The repository is as it should be, with the URL as given by Martin above. But install.packages does not handle the dependencies parameter appropriately; it seems to ignore it with these binary installs. BiocManager::install does not allow specification of repos=. We can compute the necessary dependencies externally to the installer functions, but I personally will not do that any time soon.
Martin Morgan (15:58:46): > Can you clarify what you mean about the dependencies parameter? I think with this approach ‘BuildingContainer’ is intrinsically defined by the CRAN-style repository – one repository per building container.
Martin Morgan (16:04:49): > You mean dependencies aren’t installed; ok I think this can be fixed…
Martin Morgan (16:10:20): > The PACKAGES file looks like > > Package: a4 > Version: 1.34.0 > > Package: a4Base > Version: 1.34.0 >
> but tools::write_PACKAGES()
would generate something that parses the dependencies and other metadata, as > > Package: Biobase > Version: 2.47.3 > Depends: R (>= 2.10), BiocGenerics (>= 0.27.1), utils > Imports: methods > Suggests: tools, tkWidgets, ALL, RUnit, golubEsets > License: Artistic-2.0 > MD5sum: 1a029be082b3140254d6ffbc252e990b > NeedsCompilation: no > > Package: BiocGenerics > Version: 0.33.2 > Depends: R (>= 3.6.0), methods, utils, graphics, stats, parallel > Imports: methods, utils, graphics, stats, parallel > Suggests: Biobase, S4Vectors, IRanges, GenomicRanges, DelayedArray, > Biostrings, Rsamtools, AnnotationDbi, oligoClasses, oligo, > affyPLM, flowClust, affy, DESeq2, MSnbase, annotate, RUnit > License: Artistic-2.0 > MD5sum: 622ec3685fccee61a070316ac104050a > NeedsCompilation: no >
> install.packages()
reconstructs dependencies from this file.
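To illustrate why the richer PACKAGES file matters: the Depends, Imports and LinkingTo fields are what let an installer work out that Biobase also needs BiocGenerics from the same repository. The sketch below is a deliberately simplified, hypothetical resolver (parse_field(), strong_deps() and deps_in_repo() are illustrative names; this is not how install.packages() is implemented), run on a trimmed copy of the entries above.

```r
## Trimmed copy of the PACKAGES entries shown above
packages_txt <- "Package: Biobase
Version: 2.47.3
Depends: R (>= 2.10), BiocGenerics (>= 0.27.1), utils
Imports: methods

Package: BiocGenerics
Version: 0.33.2
Depends: R (>= 3.6.0), methods, utils, graphics, stats, parallel
Imports: methods, utils, graphics, stats, parallel
"
con <- textConnection(packages_txt)
db <- read.dcf(con)
close(con)

## "R (>= 2.10), BiocGenerics (>= 0.27.1)" -> "BiocGenerics":
## drop version requirements and R itself
parse_field <- function(x) {
    if (is.na(x)) return(character())
    x <- trimws(strsplit(gsub("\\([^)]*\\)", "", x), ",")[[1]])
    setdiff(x[nzchar(x)], "R")
}

strong_deps <- function(pkg, db, fields = c("Depends", "Imports", "LinkingTo")) {
    fields <- intersect(fields, colnames(db))
    row <- db[db[, "Package"] == pkg, fields, drop = TRUE]
    unique(unlist(lapply(row, parse_field), use.names = FALSE))
}

## Packages from this repository that must be installed along with `pkg`
deps_in_repo <- function(pkg, db) {
    todo <- pkg
    seen <- character()
    while (length(todo)) {
        p <- todo[1]
        todo <- todo[-1]
        if (p %in% seen) next
        seen <- c(seen, p)
        todo <- c(todo, intersect(strong_deps(p, db), db[, "Package"]))
    }
    setdiff(seen, pkg)
}
```

With only Package and Version in the index, strong_deps() would find nothing, which matches the behaviour Vince saw.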
Vince Carey (16:12:43): > ah – I used a simpler PACKAGES.gz that I had made previously
Martin Morgan (16:13:45): > With respect to BiocManager, setting the site_repository argument to the binary repository puts the binaries ahead of the source repos, giving preference to the binary version of a package over the source version, provided the binary version is >= the source version. In practice, as the binary repo became stale, more packages would be installed from source…
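The preference rule can be restated as a small helper. This is a hypothetical sketch (prefer_binary() is not a BiocManager function); it only spells out the version comparison, using the RMySQL versions that come up later in this channel as an example of a stale binary.

```r
## Hypothetical restatement of the rule: use the binary only when it exists
## and is at least as new as the source package; otherwise fall back to source.
## The real installation would be something like
##   BiocManager::install("Biobase", site_repository = binary_repo)
## where binary_repo is the URL of the CRAN-style binary repository.
prefer_binary <- function(binary_version, source_version) {
    if (is.na(binary_version)) return("source")
    if (package_version(binary_version) >= package_version(source_version))
        "binary"
    else
        "source"
}

prefer_binary("0.10.17", "0.10.20")  # stale binary, so "source"
```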
Martin Morgan (16:29:07): > A little experimenting… write_PACKAGES() can be run on a directory of original files > > ABAData_1.16.0_R_x86_64-pc-linux-gnu.tar.gz >
> and I believe using write_PACKAGES(..., addFiles = TRUE) could have been used to avoid step 1, renaming the files to simple .tar.gz.
Vince Carey (16:58:05): > Well, I have placed a PACKAGES file with the > > Package: a4 > Version: 1.34.0 > Depends: a4Base, a4Preproc, a4Classif, a4Core, a4Reporting > Suggests: MLP, nlcv, ALL, Cairo > License: GPL-3 > NeedsCompilation: no > > Package: a4Base > Version: 1.34.0 > Depends: methods, graphics, grid, Biobase, AnnotationDbi, annaffy, mpm, genefilter, > limma, multtest, glmnet, a4Preproc, a4Core, gplots > Suggests: Cairo, ALL > Enhances: gridSVG, JavaGD > License: GPL-3 > NeedsCompilation: no >
> format at gs://biocbbs_2020a/bioc_binaries/src/contrib/ to no avail. I also tried a gzipped version of the same content. In a bioconductor_full:release session install.packages is not resolving dependencies.
Martin Morgan (17:08:03): > I just tried to download the PACKAGES file via http and got the old version, and a few seconds later the new version; maybe it was stale / cached? n.b., bioconductor/bioconductor_docker is where you want to be… (bioconductor_full has been replaced)
Martin Morgan (17:10:20): > > > remove.packages(c("Biobase", "BiocGenerics")) > Removing packages from '/usr/local/lib/R/site-library' > (as 'lib' is unspecified) > > repo = "https://storage.googleapis.com/biocbbs_2020a/bioc_binaries" > > install.packages("Biobase", repos = repo) > Installing package into '/usr/local/lib/R/site-library' > (as 'lib' is unspecified) > also installing the dependency 'BiocGenerics' > > trying URL 'https://storage.googleapis.com/biocbbs_2020a/bioc_binaries/src/contrib/BiocGenerics_0.32.0.tar.gz' > Content type 'application/x-tar' length 646389 bytes (631 KB) > ================================================== > downloaded 631 KB > > trying URL 'https://storage.googleapis.com/biocbbs_2020a/bioc_binaries/src/contrib/Biobase_2.46.0.tar.gz' > Content type 'application/x-tar' length 2313195 bytes (2.2 MB) > ================================================== > downloaded 2.2 MB > > Bioconductor version 3.11 (BiocManager 1.30.10), ?BiocManager::install for help > * installing **binary** package 'BiocGenerics' ... > * DONE (BiocGenerics) > Bioconductor version 3.11 (BiocManager 1.30.10), ?BiocManager::install for help > * installing **binary** package 'Biobase' ... > * DONE (Biobase) > > The downloaded source packages are in > '/tmp/RtmpR1Gz6r/downloaded_packages' >
Martin Morgan (17:14:31): > Also, I don’t know whether the docker image you built these on is exactly compatible with the bioconductor_docker image…?
Martin Morgan (17:15:21): > And this seems really exciting!
Vince Carey (18:50:57): > The docker container is the one for AnVIL (us.gcr.io/anvil-gcr-public/anvil-rstudio-bioconductor:0.0.3); recall that the build is within AnVIL. We can do a “free range” build with another container/storage target as needed.
Vince Carey (18:51:37): > I was thinking that the PACKAGES might have been cached … explaining my bad results on update. Looks like we are out of the woods.
2020-03-20
Nitesh Turaga (09:35:33): > This works like a charm.
Sean Davis (12:51:44): > This looks great,@Vince Carey. You might consider using some ENVIRONMENT variables inside the docker container (that can be easily overridden) for things like repo path, etc., and then a little custom startup to set stuff appropriately.
Nitesh Turaga (12:58:15): > The Terra application seems to have trouble rendering the display after a little time passes on a high-compute instance. It just freezes and doesn’t let me see either the terminal or the Jupyter notebook. > > Anyone else have that issue?
Nitesh Turaga (12:58:59): - File (PNG): Screen Shot 2020-03-20 at 12.58.40 PM.png
Vince Carey (13:02:19): > There have been hiccups, I think they even did an upgrade of some kind while I was making the repo. Is this reproducible?
Nitesh Turaga (13:04:40): > Yes, I ran an install script to do a triage of the packages which failed to compile on the http://us.gcr.io/anvil-gcr-public/anvil-rstudio-bioconductor:0.0.3 image. I’ll attach the script, but the instance configuration details are in the screenshot.
Vince Carey (13:05:04): > I saw that. I guess I should just try that and see if it will freeze.
Vince Carey (13:05:34) (in thread): > Thanks Sean. It is incredibly simple to do the package construction and archiving. But this concept of marking the originating container and propagating the mark to the binary needs attention. Martin and Nitesh did some work on environment variables but I think it was found wanting. Open to suggestions on startup … I don’t have a good feel for where the gaps are.
Nitesh Turaga (13:06:41): > > options(Ncpus = 54) > > Ncpus = getOption('Ncpus', 1L) > > installed <- rownames(installed.packages()) > biocsoft <- available.packages(repos = BiocManager::repositories()[["BioCsoft"]]) > to_install <- rownames(biocsoft)[!rownames(biocsoft) %in% installed] > > BiocManager::install(to_install, Ncpus = Ncpus) >
Nitesh Turaga (13:07:35): > I’m trying to see why so many packages failed to install on that release image, http://us.gcr.io/anvil-gcr-public/anvil-rstudio-bioconductor:0.0.3. I’m trying to reproduce the package errors you got.
Nitesh Turaga (14:25:48): > Yeah, happened 3 times now. I’m going to test the image in a different way now. This isn’t working. The terra app freezes too much.
Vince Carey (16:29:14): > This may be silly – but I use 300GB disk. You might not have enough for swap?
Nitesh Turaga (16:37:21): > I see….
2020-03-23
Nitesh Turaga (11:18:03): > Hi @Vince Carey, I’ve been able to replicate the package installation errors you are getting. By my count, 167 packages don’t install.
Vince Carey (11:18:33): > so the bigger disk got you going without the hangups?
Nitesh Turaga (11:18:39): > However, I’ve noticed that the Broad people have made multiple changes to the terra-jupyter-r image.
Nitesh Turaga (11:18:50): > Yes, that was it! Thanks for that tip.
Nitesh Turaga (11:19:14): > And the terra-jupyter-bioconductor image is updated because of that.
Vince Carey (11:22:04): > I did not log the builds … did you? Are there any common sources of failure?
Nitesh Turaga (11:24:17): > I did log the build failures, and they all seem to stem from missing dependencies, which is very surprising. I’m trying to look at what changed from http://us.gcr.io/anvil-gcr-public/anvil-rstudio-bioconductor:0.0.3 to http://us.gcr.io/anvil-gcr-public/anvil-rstudio-bioconductor:0.0.12 (see https://github.com/DataBiosphere/terra-docker/blob/master/terra-jupyter-bioconductor/CHANGELOG.md)
Nitesh Turaga (11:27:28): > It has jumped many versions, and is missing dependencies like libpq
- File (PNG): Screen Shot 2020-03-23 at 11.24.40 AM.png
Martin Morgan (11:30:13): > So did the Broad people look at the image and say ‘yeah, we don’t need those libraries [that the bioc folks added], so let’s remove them’? I guess that’s a failure to communicate rather than ill-intentioned…
Nitesh Turaga (11:31:24): > I’m looking at the source now to figure out why these dependencies are not available….if they were removed or something else changed.
Vince Carey (11:31:52): > :+1:
Nitesh Turaga (11:36:46): > If you’d like to replicate these errors locally and see what the issue is, one way is > > docker run --rm -it -p 8000:8000 us.gcr.io/broad-dsp-gcr-public/terra-jupyter-bioconductor:0.0.12 >
> https://localhost:8000/notebooks > > And the list of packages is as listed above.
Nitesh Turaga (11:58:50): > Can we think of reasons why the same package would install on a local test of this image but not on Terra? > > For example, > > I’m able to install the package trena when I run the image locally, but not when it’s run on the Terra app.
Nitesh Turaga (12:22:41): > The jupyter-user doesn’t seem to have access to some of the installed libraries, which is very weird.
2020-03-24
Nitesh Turaga (09:49:51): > Hi, I just wanted to give a longer update on what I was working on yesterday. I’m attaching the triage of each of the failed packages as two markdown files. > 1. I tested the installation of all the packages on terra-jupyter-bioconductor:0.0.3, and most of the package failures Vince pointed out are because dependency packages have been archived on CRAN (these packages are no longer available on CRAN for R 3.6.1). The remaining failures (Rmpi, xps and a couple of other packages) are expected. > 2. The other thing I wanted to point out is that there is now a us.gcr.io/broad-dsp-gcr-public/terra-jupyter-bioconductor:0.0.12 image. This is 9 versions ahead of the version for which the initial binaries were built, so maybe the binaries need to be rebuilt for the latest version? I’ll also try to keep track of when these versions change. > 3. There is a marked difference between the images available on Terra. I assumed that the default (GATK) image was based on our terra-jupyter-bioconductor image, and had trouble replicating the errors. I also noticed that switching images during the session didn’t solve the problem for me. My thinking is that the base standardized image should include Bioconductor system dependencies. - File (Markdown (raw)): bioc_0.0.3.md - File (Markdown (raw)): bioc_0.0.12.md
Nitesh Turaga (09:50:44): > The file bioc_0.0.3.md is the triage with terra-jupyter-bioconductor:0.0.3 > > The file bioc_0.0.12.md is the triage with terra-jupyter-bioconductor:0.0.12
Nitesh Turaga (09:52:13): > These files aren’t formatted very efficiently, but I’m happy to give a rundown of the packages. The to_install vector at the top of each file should be the reference for which Bioconductor packages fail to install.
Vince Carey (09:58:28): > This is great@Nitesh Turaga. It sounds like I need to run the binary build process again? If so I will try to document it so that it can be run by anyone. I will try to get to it in the next couple of days. Any advice you have on accumulating the logs for failed/warning packages would be welcome – this would be collected in the context of a BiocManager::install run on 1800 packages. It would be nice to have some kind of event-based process of creating the binaries, whenever the container image changes.
Nitesh Turaga (10:25:33): > I used the terminal in the Terra application, @Vince Carey, and ran my little script (below) with R CMD BATCH install.R &. This logged the installation in install.Rout. Once it was done, I simply updated my to_install vector, took a look inside install.Rout, and tried to install these packages again to log the issues. > > options(Ncpus = 54) > installed <- rownames(installed.packages()) > biocsoft <- available.packages(repos = BiocManager::repositories()[["BioCsoft"]]) > to_install <- rownames(biocsoft)[!rownames(biocsoft) %in% installed] > BiocManager::install(to_install) > > ## post install check to update "to_install" > installed <- rownames(installed.packages()) > to_install <- rownames(biocsoft)[!rownames(biocsoft) %in% installed] >
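Picking the failures out of install.Rout can also be scripted, which may help with accumulating logs across an 1800-package run. A minimal sketch, assuming the log contains R's usual warning text "installation of package ‘pkg’ had non-zero exit status"; the quote characters depend on the locale, so the pattern accepts any non-word characters around the name. failed_from_log() is a hypothetical helper.

```r
## Hypothetical helper: list packages whose installation failed, based on the
## "installation of package 'x' had non-zero exit status" warnings in a log.
failed_from_log <- function(log_lines) {
    pattern <- paste0("installation of package \\W*([A-Za-z][A-Za-z0-9.]*)\\W*",
                      " had non-zero exit status")
    hits <- regmatches(log_lines, regexec(pattern, log_lines))
    hits <- Filter(length, hits)          # keep only matching lines
    unique(vapply(hits, function(h) h[2], character(1)))
}

## e.g. failed_from_log(readLines("install.Rout"))
```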
Lori Shepherd (13:24:13): > <!channel> Some good news. Since the tech meeting was cancelled last week, everything shifted one week, so we now have until April 7th for our presentation. Let’s plan on having some ideas and demo materials for our own bi-weekly meeting next Tuesday, as well as discussing anything in particular we want to stress to the rest of the technical development team as needs, strong requests, or blockers from our end.
BJ Stubbs (16:02:33): > @Martin Morgan @Vince Carey I updated the billing code and filled out the vignette https://github.com/bjstubbs/AnVILBilling
2020-03-30
Lori Shepherd (14:33:43): > Rich and Mo have asked us if we would still be willing to present tomorrow at the technical meeting at 4 pm, as the Gen3 folks have asked to swap. Martin and I felt like the team has made progress and that we could still put together our presentation for tomorrow’s meeting. So the agenda for tomorrow’s Bioc call will focus on getting organized and prepared for the Technical meeting presentation https://docs.google.com/document/d/1DNab_8hsqd0Yf2BQJVu-c5EUVaFiMBE7fPiFX7tz6og/edit?usp=sharing
2020-03-31
Martin Morgan (08:35:37): > The slide deck for today’s presentation is at https://drive.google.com/open?id=1p-HmaehkbT3OZ9GJ54IiNA8cdPbiYmJys6iZSA5l6VU
Martin Morgan (09:33:21): > @Nitesh Turaga can you update https://bioconductor.github.io/AnVIL_Admin/ to (a) mention the 0.0.12 image in ‘Project Activities – Detailed / Containers’ and (b) I’m not sure exactly, but in the ‘Available Now’ / ‘In Progress’ sections the containers are mentioned without links; maybe these should just be links to the ‘Project Activities – Detailed / Containers’ section??
Lori Shepherd (10:02:06): > Link to BlueJeans if anyone needs it: https://bluejeans.com/480153337/webrtc
Vince Carey (14:16:09): > Quick comments on binary packages. We have focused on software, and probably need to clarify that this approach is essential for preserving the patterns of interaction familiar to bioinformaticians – “expand software repertory on demand, with persistence”. We could address this problem with a container that had 4000 packages installed, but that seems impractical. 4000 binary packages are managed in a few hundred GB of disk, and installation = file transfer. This approach seems quite sound for the “release branch”, where most packages are quite stable. On the devel branch this approach may need more management, as both R and package sources can be unstable. Finally, the Annotation and Experiment packages have not, to my knowledge, been addressed. It isn’t clear that much more is needed, but one could argue that the rarity of compilation in those spaces may allow standard source installation to work?
Vince Carey (14:16:43): > A refresher on “what a bioconductor session is like” might be a good way to start the slide deck…
2020-04-02
BJ Stubbs (20:04:41) (in thread): > Thanks!
2020-04-10
Lori Shepherd (08:27:58): > Here is the link to Tuesday’s agenda, for thought over the weekend. Hope everyone is staying safe and healthy. https://docs.google.com/document/d/1ft7M_sIWGcUf8lDzEXse3YeQuN0DJ3t6hvYSeP9h86g/edit?usp=sharing
2020-04-13
Martin Morgan (10:01:25): > @Vince Carey I added an agenda item ‘Higher than Bioconductor re-orientation’ for any discussion of the topics you’ve been mentioning in email about flagship projects / data ingestion / etc
2020-04-14
Lori Shepherd (10:00:55): > https://bluejeans.com/480153337?src=calendarLink
2020-04-16
Martin Morgan (12:00:08): > Presentation material from high-level interoperability (between-AnVIL-like projects) at https://docs.google.com/presentation/d/1He7naDpPQugrROq8Gh0VjxLiFyDnOuWUukLyToEoNzU. Several relevant presentations on FHIR (standard phenotype representation), RAS (NIH researcher authentication), …
2020-04-17
Vince Carey (08:00:44): > @Martin Morgan can you check the link? It would not resolve for me.
Vince Carey (08:01:32): > ah you have to remove the ‘with’ at the end
Vince Carey (08:09:00): > What does NCPI stand for?
Martin Morgan (09:29:06): > ‘NIH Cloud Platforms Interoperability’
2020-04-20
Vince Carey (06:04:40): > This morning I am attempting to build the binaries for 3.10.
Vince Carey (08:22:02): > It would be very helpful to have a mechanism to speed the delivery of source tarballs. Could we use rsync or something like that? It could consume less time in the expensive runtime.
Vince Carey (09:07:46): > Here is the list of packages that don’t install on the us.gcr.io/anvil-gcr-public/anvil-rstudio-bioconductor:0.0.3 AnVIL runtime > > ‘Rglpk’, ‘rpanel’, ‘Ecdat’, ‘jqr’, ‘sodium’, ‘st’, ‘Rmpi’, ‘glpkAPI’, ‘PairedData’, ‘xps’, ‘mlm4omics’, ‘doMPI’, ‘btergm’, ‘ROI.plugin.glpk’, ‘ssh’, ‘googleway’, ‘geojson’, ‘spatstat’, ‘showtext’, ‘keyring’, ‘BioMark’, ‘netbenchmark’, ‘CountClust’, ‘alphahull’, ‘spaMM’, ‘lwgeom’, ‘geojsonio’, ‘spdep’, ‘emojifont’, ‘IsoriX’, ‘blackbox’, ‘splm’, ‘DHARMa’, ‘stars’, ‘rmapshaper’, ‘spatialreg’, ‘pgirmess’, ‘Infusion’, ‘tmaptools’, ‘tmap’, ‘chroGPS’, ‘cellTree’, ‘readat’, ‘CHARGE’, ‘MBQN’, ‘Fletcher2013b’, ‘ccfindR’, ‘scAlign’, ‘scPipe’, ‘cobindR’, ‘MoonlightR’ >
Nitesh Turaga (09:25:29): > A lot of these failures are expected, and some are because R package dependencies have been archived for R 3.6.
Vince Carey (12:10:25): > Do we want me to create a GS-based repo with these newly created binaries (there are only about 80 based on new sources) or is there a step we should take to mark the packages with information about the generating container? I could imagine adding a line to DESCRIPTION with the container id.
Nitesh Turaga (12:52:48): > I’m ok with this idea of adding a new line with the container ID, although users might confuse this for metadata they need to add while submitting packages.
Vince Carey (12:53:23): > is anvil-rstudio-bioconductor:0.0.3 the appropriate image?
Nitesh Turaga (12:54:17): > I’m trying to think if it is, and this is my thought process: > * This is a community image which will soon be replaced with an RStudio Pro image. So this suggests that maybe it’s not.
Martin Morgan (12:55:06): > I think the info should be at the repository level (in the file path to
Nitesh Turaga (12:55:16): > I’m thinking a more appropriate image could be the latest update we made to the terra-jupyter-r image, which basically combines terra-jupyter-bioconductor and terra-jupyter-r.
Nitesh Turaga (12:55:39): > i.e. this image https://github.com/DataBiosphere/terra-docker/tree/R-bioc-updates
Vince Carey (12:56:46): > Is this a good time to make a repo that would be pretty much final for 3.10? I assume there will be an enduring need for these binaries.
Nitesh Turaga (12:56:58): > I can build this image and push to a gcr location, if that would be helpful for you to build the binaries.
Vince Carey (12:59:14): > There isn’t any rush. I confirmed that I can make them in at most a couple of hours. I think we should just reach a consensus and write it down, as to what the repo consists of and how it should be used. @Martin Morgan If the package itself is not marked, we won’t have any way to tell whether it is being used in an appropriate environment.
Vince Carey (13:01:00): > But the marking if it is to be done has to be done at installation time, I think, or else the checksum will be wrong? or maybe the checksum is made at build time, in which case no change should be made to package content. It still seems to me that we should have something at the level of the binary saying where it came from. Does install.packages record this in package installation metadata? Then the repo-level mark would be adequate.
Sean Davis (13:10:02): > It seems like a Bioc linux-binary installation toolkit would be useful here to help codify the social contracts being described above. Such a system could even be AnVIL-centric (at least to start). That said, I am pretty uninformed about binary package installation tooling built into R.
2020-04-21
Martin Morgan (05:14:18) (in thread): > I’m not sure when this ‘check’ would occur, it isn’t intrinsic to R so it would have to be something that ‘we’ wrote. Isn’t the situation analogous to ‘release’ and ‘devel’ versions of Bioconductor, with installation of correct software determined by repository choice not by information in the package DESCRIPTION?
2020-04-22
Martin Morgan (12:53:08): > @Vince Carey, Nitesh will update the rstudio image to http://us.gcr.io/anvil-gcr-public/anvil-rstudio-bioconductor:0.0.4. It will contain environment variables which describe the image: > > R_PLATFORM="anvil-rstudio-bioconductor" > R_PLATFORM_BINARY_VERSION="0.99.0" >
> The idea is to uniquely identify the image (because the different images have different OS, etc.) and then the ‘version’ of the system dependencies installed. We don’t want to use the image tag (0.0.4) because the container might change for reasons unrelated to the system dependencies (e.g., changing packages installed by default). The 0.99.0 version is meant to be ‘semantic versioning’ with format x.y.z. Changes in z do not require re-building binary packages, whereas changes in x and y do require new packages. > > If you place the binary packages that you build in a CRAN-style repository that encodes this information, e.g., > > https://storage.googleapis.com/anvil-rstudio-bioconductor/0.99/3.10/src/contrib/... >
> where ... contains each built package and a PACKAGES file with package name, version, and dependencies, then the AnVIL package will be updated to provide a function AnVIL::install() (the current install() will be replaced) that looks first for binary packages and then in the usual place for source packages. > > Since the Bioconductor builds for the 3.10 release have stopped, the Bioc binaries should always be installed in preference to the source packages. > > We’ll try this for the rstudio image; I think we’ll then be able to also support the terra-jupyter-r stack (after a pull request is accepted) by building binaries on the terra-jupyter-r image and hosting them at https://storage.googleapis.com/terra-jupyter-r/0.99/3.10/src/contrib/...
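The x.y.z rule can be stated as code. A minimal sketch with hypothetical helper names: binary_series() keeps the x.y part of R_PLATFORM_BINARY_VERSION, which is also the path component used in the repository URL above, and needs_rebuild() is TRUE only when x or y changes.

```r
## Hypothetical helpers for the x.y.z rule: a change in z keeps binaries
## valid, a change in x or y requires rebuilding them.
semver_parts <- function(v) {
    parts <- suppressWarnings(as.integer(strsplit(v, ".", fixed = TRUE)[[1]]))
    if (length(parts) != 3L || anyNA(parts))
        stop("expected a version of the form x.y.z, got: ", v)
    parts
}

## "0.99.0" -> "0.99", as in .../anvil-rstudio-bioconductor/0.99/3.10/
binary_series <- function(v) paste(semver_parts(v)[1:2], collapse = ".")

needs_rebuild <- function(old, new) binary_series(old) != binary_series(new)
```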
Martin Morgan (12:55:30): > From your end, can you make a ‘CRAN-style’ repository at a URL that encodes the R_PLATFORM, R_PLATFORM_BINARY_VERSION, and Bioconductor version, like the one above? > > Also, I think it would be good to document exactly what we’re doing, maybe in a new file docs/binary_build_documentation.md in https://github.com/Bioconductor/AnVIL_admin?
Vince Carey (12:56:29): > Yes, let me know when that container is ready and I will write a script and run it to produce/populate this CRAN-style repo
2020-04-24
Nitesh Turaga (11:45:28): > Just a follow up to this, the environment variables have been renamed to > > TERRA_R_PLATFORM="anvil-rstudio-bioconductor" > TERRA_R_PLATFORM_BINARY_VERSION="0.99.0" >
> I’ll push the latest image to us.gcr.io/anvil-gcr-public/anvil-rstudio-bioconductor:0.0.4 and it should be available to build the binaries on soon. > > Would it be useful to build binaries for the terra-jupyter-r image as well, which is available at us.gcr.io/broad-dsp-gcr-public/terra-jupyter-r:0.0.13? That way we would have binaries for both the terra-jupyter-r and the rstudio images.
Vince Carey (11:50:26) (in thread): > missed this. Maybe the situation I am considering is implausible. Suppose someone gets hold of a binary package and tries to run it in an incompatible environment and an error is raised. As a post-mortem we could check the tag on that package and say “you can only use this with container x”. Could such a mark also be prophylactic? As this practice grows I think we would want to have some approach to systematic protection. You are right that it is not part of R … now.
Vince Carey (11:52:31): > Hi @Nitesh Turaga… I will try to tackle this in the afternoon. Apropos a second repo – are there concerns that the runtimes are not completely compatible?
Nitesh Turaga (11:57:31): > Yes, I’m guessing you’d have to build them separately, at least for the rstudio and jupyter platforms. > > But please keep in mind that the TERRA_R_PLATFORM would change for each of the repos: > * for anvil-rstudio-bioconductor:0.0.4, it would be https://storage.googleapis.com/anvil-rstudio-bioconductor/0.99/3.10/src/contrib/ > * for terra-jupyter-r:0.0.13, it would be https://storage.googleapis.com/terra-jupyter-r/0.99/3.10/src/contrib/ > The binaries built on terra-jupyter-r:0.0.13 can also be used for the terra-jupyter-bioconductor:0.0.13, terra-jupyter-gatk (default image), and terra-jupyter-aou images.
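Putting the renamed variables and the URL scheme together, a session could locate its binary repository roughly as follows. This is a sketch under assumptions: binary_repo_url() is a hypothetical helper (not the AnVIL::install() implementation), the Bioconductor version is passed in rather than detected, and NA is returned when the TERRA_R_PLATFORM variables are not set.

```r
## Hypothetical helper: build the CRAN-style repository URL for the current
## image from TERRA_R_PLATFORM and TERRA_R_PLATFORM_BINARY_VERSION.
binary_repo_url <- function(bioc_version,
                            platform = Sys.getenv("TERRA_R_PLATFORM"),
                            binary_version = Sys.getenv("TERRA_R_PLATFORM_BINARY_VERSION")) {
    if (!nzchar(platform) || !nzchar(binary_version))
        return(NA_character_)   # not running on an image that sets these
    ## only x.y of x.y.z goes into the path
    series <- paste(strsplit(binary_version, ".", fixed = TRUE)[[1]][1:2],
                    collapse = ".")
    sprintf("https://storage.googleapis.com/%s/%s/%s",
            platform, series, bioc_version)
}

## Usage (requires network access):
## repo <- binary_repo_url("3.10")
## install.packages("Biobase", repos = c(repo, BiocManager::repositories()))
```

utils::contrib.url() appends /src/contrib to such a URL, giving the paths listed above.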
Nitesh Turaga (12:36:06): > This is what you are looking for basically. - File (PNG): Screen Shot 2020-04-24 at 12.35.31 PM.png
2020-04-25
Martin Morgan (16:47:36): > I submitted the AnVIL package to Bioconductor. Please review authorship in the DESCRIPTION file Authors@R and vignettes/Introduction.Rmd (there is also an earlier vignette on using docker, from Vince) and let me know if I should have / should not have included anyone … happy to make changes.
Nitesh Turaga (19:58:03): > https://github.com/DataBiosphere/terra-docker/pull/111
Nitesh Turaga (19:58:09): > RStudio Pro is a PR now.
Nitesh Turaga (19:58:16): > So there is progress from the terra folks.
2020-04-27
Lori Shepherd (12:38:20): > Draft agenda for tomorrow’s meeting https://docs.google.com/document/d/1dFKgdJzEJDp_XAWX4wnAysaJm7BJFWEU-tF16o6ORmE/edit?usp=sharing
2020-04-28
Vince Carey (06:21:45): > https://github.com/vjcitn/BiocBBSpack/blob/master/vignettes/gcp.Rmd describes the production of the binary package set, using BiocManager::install on AnVIL … installations are proceeding now. Construction of the repo will follow soon.
Vince Carey (06:40:13): > Are there prospects for creating a warm archive (possibly via rsync) in google storage of all source tarballs from bioc and CRAN? This would save a decent fraction of production time.
Vince Carey (06:41:39): > https://cran.r-project.org/mirror-howto.html?
Levi Waldron (07:22:01) (in thread): > @Vince Carey I or someone on my end could do this. Just a regular CRAN mirror but in a Google Bucket?
Vince Carey (07:35:55) (in thread): > I think so. Just source tarballs, conveniently updateable. But we might want to think about it a bit, with Martin, Nitesh et al. We want to do better than “install.packages” with a repo – IMHO we want to take advantage of gsutil parallel transfer capabilities. However we represent the raw source tarballs, they will have to get to a standard file system for CMD INSTALL to operate on them. What’s a good way of mobilizing gsutil parallel transfer for this step?
Martin Morgan (07:43:50) (in thread): > Use rsync to mirror CRAN https://cran.r-project.org/mirror-howto.html and Bioconductor http://bioconductor.org/about/mirrors/mirror-how-to/. Old school would set these up as cron jobs.
Martin Morgan (07:48:16) (in thread): > about parallel transfer – you’re talking about the binary installs, right? and the assertion is that gsutil will do a meaningfully faster job than download.file (the part of install.packages that might be network bound) for a ‘typical’? subset of packages. Should be easy to measure – get paths to a bunch of files , compare system.time(download.file()) with AnVIL::gsutil_cp and / or gsutil_rsync (not sure that rsync provides fine enough resolution).
Martin Morgan (07:49:00) (in thread): > But I think install.packages gives us a lot; we’d want to focus on tricks to get the downloads faster, not reinvent the package management wheel…
Levi Waldron (07:49:35) (in thread): > Do you know which GCP “zone” or “region” is Terra in? Oh the good old days of cron, it feels like a stone age tool now:joy:
Vince Carey (07:51:23) (in thread): > Actually I am referring to getting copies of source tarballs in preparation for CMD INSTALL to produce binaries – this seems to take an hour+ when done with BiocManager::install on the big machine, which is a waste. I could do it in stages, with tarball acquisition done cheaply but that makes it more complex. I agree on your point about getting downloads faster, not altering anything in installation stack.
Vince Carey (07:51:58) (in thread): > We need 3300+ .tar.gz source files
Vince Carey (07:52:08) (in thread): > 1823 from Bioc and the rest from CRAN
Martin Morgan (07:59:31) (in thread): > Seems like you have to do something over https to get CRAN / Bioc src tarballs into the cloud; rsync is expensive once and then incremental… . Maybe use gsutil to sync the cloud-based CRAN-style repo with a CRAN-style repository on the compute instance and say install.packages(repos = "file:///path/to/repo").
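A minimal sketch of that idea, under assumptions: gsutil is on the PATH, the bucket holds a CRAN-style tree, and sync_repo() / file_repo_url() are hypothetical helpers. gsutil -m rsync -r copies the tree with parallel transfers, and a file:// URL lets install.packages() treat the local copy as an ordinary repository.

```r
## Hypothetical helpers: mirror a CRAN-style bucket to local disk, then point
## install.packages() at the local copy through a file:// URL.
file_repo_url <- function(local_root) {
    paste0("file://", normalizePath(local_root, winslash = "/", mustWork = TRUE))
}

sync_repo <- function(bucket_url, local_root) {
    dir.create(local_root, recursive = TRUE, showWarnings = FALSE)
    ## -m: parallel transfers; rsync -r: recurse into src/contrib/
    status <- system2("gsutil", c("-m", "rsync", "-r", bucket_url, local_root))
    if (status != 0) stop("gsutil rsync failed with status ", status)
    file_repo_url(local_root)
}

## Usage (requires gsutil and bucket access):
## repo <- sync_repo("gs://biocbbs_2020a/bioc_binaries", "~/repo")
## install.packages("Biobase", repos = repo)
```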
Vince Carey (08:04:32) (in thread): > I think that’s it. If gsutil_rsync can populate a gs bucket that would be great, but if it only populates a file system on an instance we would then need to move to the bucket to minimize time on instance.
Levi Waldron (08:23:16) (in thread): > FYI Terra creates buckets that are “Location type: multi-region” and “Location: us (multiple regions in United States)“. Do you know offhand the approx. size of the CRAN + Bioconductor mirrors?
Martin Morgan (08:54:07) (in thread): > maybe not that large; the bioc software (not annotation / expt data) source tarballs are > > bioc/src/contrib$ du -hs > 5.2G >
2020-05-01
Vince Carey (00:15:33): > “https://storage.googleapis.com/anvil-rstudio-bioconductor/0.99/3.10/” is now a valid repository for R sessions running under anvil-rstudio-bioconductor:0.0.4,
Vince Carey (00:16:47): > There are 3105 binary packages present. The process of tarring binary packages was interrupted for some reason. I tried to find and redo corrupted packages. Some may have slipped by.
Nitesh Turaga (08:50:17): > That’s great. Thanks@Vince Carey
Nitesh Turaga (08:50:31): > Btw, the new images will be available on terra on Monday.
2020-05-03
Martin Morgan (05:18:20) (in thread): > BiocManager::install("Bioconductor/AnVIL") will install a version of AnVIL on that image; AnVIL::install(<pkgs>) will install binaries when available, source otherwise. > > It seems like some of the binary packages at this location are not that current, e.g., RMySQL on CRAN is 0.10.20 published 03-14 but in the binary repository it is 0.10.17 (maybe from March, 2019?)
2020-05-04
Vince Carey (05:34:08) (in thread): > looking into this now
Vince Carey (05:48:14) (in thread): > Confirmed. How did you find it? There may be other packages in a similar condition. What is strange is that in the workspace I used to build the binaries, which is still available, RMySQL was current. Then when I installed from the bucket, it was downgraded.
Vince Carey (05:58:22) (in thread): > I’ll also note that my first attempt to install to AnVIL workspace took unusually long. It seems like there’s a “warmup” needed for google storage to supply a package from the repo
Vince Carey (06:11:33) (in thread): > I think the implication is that one should do a BiocManager::valid() before shipping to a new repo. I will try to build this into the solution for the next container (terra-jupyter-r…) which i will work on today
Martin Morgan (06:58:30) (in thread): > I found the problem by trying to update old packages on a new runtime > > pkgs = old.packages(repos=BiocManager::repositories()) >
> I guess RSQLite installed there is old, and your installation step (not documented at https://github.com/vjcitn/BiocBBSpack/blob/master/vignettes/gcp.Rmd?) didn’t update first.
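The version lag found in this thread (RMySQL 0.10.17 in the binary repository vs 0.10.20 on CRAN) can be screened for before a repository is published. Below is a minimal base-R sketch. It assumes both package databases have the shape that available.packages() returns: row names are package names, and there is a Version column. The helper lagging_packages() and the toy data are hypothetical. In practice the two databases would come from available.packages() run against the binary repository and against the upstream repositories.

```r
# Hypothetical helper: report packages whose version in `binary_db` is
# older than in `reference_db`. Both are matrices shaped like the output
# of available.packages() (row names = package names, "Version" column).
lagging_packages <- function(binary_db, reference_db) {
  common <- intersect(rownames(binary_db), rownames(reference_db))
  have <- package_version(binary_db[common, "Version"])
  want <- package_version(reference_db[common, "Version"])
  behind <- have < want   # element-wise version comparison
  data.frame(
    Package = common[behind],
    binary = binary_db[common[behind], "Version"],
    reference = reference_db[common[behind], "Version"],
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Toy databases (hypothetical) mimicking available.packages() output.
make_db <- function(pkgs, versions) {
  m <- cbind(Package = pkgs, Version = versions)
  rownames(m) <- pkgs
  m
}
binary <- make_db(c("RMySQL", "Rsamtools"), c("0.10.17", "2.4.0"))
reference <- make_db(c("RMySQL", "Rsamtools"), c("0.10.20", "2.4.0"))

print(lagging_packages(binary, reference))
```

Comparing with package_version() rather than comparing the strings directly matters here: as strings, "0.10.9" would sort after "0.10.17".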
Vince Carey (07:32:32) (in thread): > it should be sorted now. i am getting 401 errors in rstudio that are interrupting work.
Martin Morgan (08:59:56) (in thread): > I don’t think there are enough packages in PACKAGES.gz > > > repos > [1] "https://storage.googleapis.com/anvil-rstudio-bioconductor/0.99/3.10/" > > db1 = available.packages(repos = repos) > > dim(db1) > [1] 67 17 >
Vince Carey (12:46:02): > “https://storage.googleapis.com/terra-jupyter-r/0.99/3.10/” is now a valid repository for jupyter R sessions running under terra-jupyter-bioconductor:0.0.13
Nitesh Turaga (12:48:52): > Ok, thanks Vince!
Lori Shepherd (12:50:47): > If you received an updated invite to our bi-weekly meeting this was because I updated emails of some participants
2020-05-06
Martin Morgan (17:49:01): > https://packagemanager.rstudio.com/client/#/repos/1/packages/A3 (a couple of steps down in the navigation path) points to binary repositories of CRAN packages, including for some flavors of Linux (top right corner). When the rocker / Bioconductor containers are Ubuntu based (this transition is happening now) all of CRAN will be available for fast installation.
Martin Morgan (17:53:25): > one could run an instance of this on our bioconductor_docker image and customize for serving Bioc packages (as an alternative to asking vince to rebuild packages ad hoc). Just exploring this not advocating for it
Martin Morgan (17:58:32): > …except maybe to host oneself would require a paid product? not sure about that…
Vince Carey (23:07:07): > What is ad hoc about - File (JPEG): anv.jpg
2020-05-07
Vince Carey (16:32:06): > @Nitesh Turaga are there new containers for me to build repos for? I would like to see how concise a code base is needed to create and transfer the binaries.
Nitesh Turaga (17:13:19): > Hi Vince, not yet.
2020-05-08
Martin Morgan (02:52:13) (in thread): > @Vince Carey the rstudio binary repository seems to have only 67 packages in PACKAGES – nrow(available.packages(repos = AnVIL::repositories()[1]))
Nitesh Turaga (13:21:17): > @Vince Carey I have a pull request to the Broad updating the terra-jupyter-r images to 3.11 and R-4.0. If all goes as expected, this should be in soon.
Nitesh Turaga (13:22:08): > Also, another question to @BJ Stubbs and @Vince Carey, the AnVIL-billing R package, is it able to get the “real time” costs being incurred by terra?
Frederick Tan (14:15:37): > Question about Terra’s “notebooks environment”: Can a user only have one Notebook Runtime per Billing Project? I ask because: > * If Workspace A and Workspace B are both linked to the same Billing Project, I don’t seem to be able to run Jupyter in Workspace A and RStudio in Workspace B > * anvil.terra.bio/#clusters lists Billing Projects and not Workspaces
Frederick Tan (14:26:02): > I think that’s a yes based on > > All workspaces within the same Cloud Project share the same notebook VM and its available software > from support.terra.bio/hc/en-us/articles/360027083172-Terra-s-Jupyter-Notebooks-environment-Part-II-Key-operations
Martin Morgan (15:20:33): > Yes, I think that’s correct; one thing about the use of rstudio is that, when your runtime ends, you lose whatever data you had created on the instance. You’d want to persist it in some way, e.g., by copying to a google bucket of your own. You could try BiocManager::install("Bioconductor/AnVIL") and then AnVIL::gsutil_cp("local_file", "gs://.../bucket/file") (see the package vignette, e.g., http://bioconductor.org/packages/devel/bioc/vignettes/AnVIL/inst/doc/Introduction.html)
Frederick Tan (16:00:30): > Thanks!
Frederick Tan (16:02:58): > It sounds similar to output files created by Jupyter Notebooks … only thing that is automatically transferred is the .ipynb?
Martin Morgan (16:10:40): > yes that’s right; the RStudio image is more experimental so more prone to accidental loss. Also the plan is for both notebooks and RStudio to have more natural persistent storage, like a traditional disk, so these things will get better…
Frederick Tan (16:15:16): > Thanks for the peek at the roadmap. Helpful when people ask why something is the way it is to be able to say it should get better :slightly_smiling_face:
Martin Morgan (16:40:29): > FWIW you might also be interested in @BJ Stubbs’s https://github.com/bjstubbs/AnVILBilling for billing estimates. @BJ Stubbs: WOULD BE GREAT TO ADD THIS TO https://bioconductor.github.io/AnVIL_Admin/ - Attachment (bioconductor.github.io): Bioconductor / AnVIL > Administration repository for AnVIL project
BJ Stubbs (16:57:07): > added
BJ Stubbs (16:58:35): > @Frederick Tan I do not believe that you can only have one runtime environment. I think you can only have one if you are shackled to the web interface
BJ Stubbs (16:59:01): > If you use the leonardo api then you should be able to create additional environments
BJ Stubbs (16:59:49): > @Nitesh Turaga If you set up billing to BigQuery then you can get costs about a day later
Martin Morgan (17:01:43) (in thread): > So like > > leo = Leonardo() > leo$createRuntime(...) >
> from, e.g., your gcloud-sdk-equipped laptop? (AnVIL::gcloud_exists() checks a bit for appropriate configuration)
BJ Stubbs (17:01:48): > @Frederick Tan Also, I think it is more valuable to think about workspaces as folders on a computer and the runtime as the processor. No matter what processor you are running, you always have access to all of your folders. It just takes a bit to make use of them all.
BJ Stubbs (17:02:45): > @Frederick Tan The workspaces feel like silos, but they are not. We are working on demonstrations to show this
BJ Stubbs (17:03:46) (in thread): > Yes, although I haven’t tried the new api call yet. I think the createclusterv2 call still works at the moment
BJ Stubbs (17:04:42) (in thread): > The rstudio urls from this process are not obvious though, but we can post a guide somewhere if there is interest
Frederick Tan (17:30:59): > Thanks @BJ Stubbs. The real silo is between Billing Projects, correct?
BJ Stubbs (17:34:12): > @Frederick Tan Not really. On AnVIL you have something called a “pet” account that is a proxy for your identity. Your pet account has billing codes you can use to do things, but your pet account has access to every workspace that you are whitelisted for. So, any billing account can be used to view all of the data that you have access to. The real gatekeepers are authorization domains which restrict who you can share data with.
BJ Stubbs (17:37:16): > @Frederick Tan This is getting into the weeds, but the exception is that the owner of a billing project may have access to the workspaces and metadata for all projects under that billing project even if they are not on the authorization domains for the workspace. They will not have access to the actual data, however.
Frederick Tan (17:41:11): > I think my question is along the lines of when do you need to duplicate data so that a runtime can access the data. Can a runtime access all data a user is authorized to access within that particular Billing Project?
BJ Stubbs (17:42:07): > An AnVIL runtime and cromwell/workflows as well have access to all data associated with your Terra account, not restricted by payment or workspace
BJ Stubbs (17:44:49): > The only time you might want to duplicate data is to create filtered tables in the data tab for workflows. For example, you might want to run an analysis on a subset of participants who responded to a drug. Then you can create a subset table in the data object model and bind your workflow to it
BJ Stubbs (17:45:50): > If you have private google buckets, you can also give access to your pet account to use those in AnVIL/Terra as well
Frederick Tan (17:53:49): > Interesting … perhaps I’m mis-interpreting the last section of this support document https://support.terra.bio/hc/en-us/articles/360027083172-Terra-s-Jupyter-Notebooks-environment-Part-II-Key-operations
BJ Stubbs (18:00:18): > Ah I see. The notebooks infrastructure is more tied to the workspace web-based experience. Every workspace is created with a bucket. The notebooks for a workspace are automatically synced with that bucket. They are the only thing that is. But that is an added convenience. You should be able to read from all of your workspace buckets, and write to any workspace bucket that you have write privileges for, from any workspace. Or no workspace. If you install the AnVIL package on your local machine and have the gcloud sdk set up with your AnVIL account, then you can read and write directly to your buckets. This can be against individual project rules, however. Check with your data provider before egressing data from AnVIL.
Frederick Tan (18:02:38): > Great, thanks for the details! Been hard to find :slightly_smiling_face:
BJ Stubbs (18:03:24): > No problem, let me know if you have any other questions
BJ Stubbs (18:03:59): > Full disclosure I am a user and 3rd party developer, but this is my understanding
Frederick Tan (18:04:53): > Best perspectives :wink:
2020-05-11
Nitesh Turaga (09:54:48): > @Vince Carey We have the latest anvil-rstudio-bioconductor:0.0.5 images up: https://console.cloud.google.com/gcr/images/anvil-gcr-public/US/anvil-rstudio-bioconductor@sha256:baba0c8a28faaee62f1c3f8da2423145e8d2bc83ad2ea9aff7078164dcf83564/details?tab=info&project=anvil-gcr-public.
Vince Carey (09:56:02): > Thanks @Nitesh Turaga… Should I make a whole new repo for this image?
Nitesh Turaga (09:58:15): > I suppose it can go in the same CRAN style repo right? With R-4.0.0 and Bioc-3.11 and the install() function should pull from that repo.
Nitesh Turaga (09:58:59): > Is my thought process correct?
Martin Morgan (11:53:12): > It needs a whole new repository (all new binaries) because the base image has changed (to Ubuntu 18.04), base R has changed, and Bioc version has changed.
2020-05-12
Lori Shepherd (08:30:03): > https://docs.google.com/document/d/1nWF2b0-jQZDEpsrsyj0Ok4R4tOjoSYZLwPIbiHxu7Aw/edit?usp=sharing Here is the link to today’s agenda, meeting at 10 am EST
Nitesh Turaga (10:10:51): > Hi bioc-anvil team, sorry i’m missing today’s meeting. But the docker images are updated to R-4.0 on the RStudio images (anvil-rstudio-base and anvil-rstudio-bioconductor), with the new images available on the google container registry. > > The terra-jupyter-bioconductor images are still in a pull request, and require a “major” version bump, i.e. 1.0.0 from R-4.0 (it was previously 0.0.14 for the R-3.6). I will do that sometime today. Just wanted to give you all an update.
BJ Stubbs (14:16:56): > Thanks @Nitesh Turaga!
BJ Stubbs (17:23:29): > @Frederick Tan I don’t think I know any CCDG/CMG investigators, so I am not sure what tools they use
Frederick Tan (17:24:31): > (in reference to whether CCDG/CMG researchers more likely to use Python in Jupyter, R in Jupyter, or RStudio)
2020-05-14
Vince Carey (15:39:39): > has anyone noticed that help() in Rstudio on anvil doesn’t seem to do anything? with help(mtcars) i get a little flash at bottom of browser (waiting for firecloud) but no help
Vince Carey (15:39:55): > i guess i should say “on terra”
Nitesh Turaga (15:51:12): > Could be some port needs to be open for help() to show up in a panel. Do all help() commands not work?
Frederick Tan (15:54:05): > Was able to get ?mtcars and help(plot) to work on anvil.terra.bio just now …
Vince Carey (16:01:36): > in Rstudio or jupyter?
Vince Carey (16:03:09): > @Nitesh Turaga I am seeing nothing from help() or by pressing “R help” in the Help button at top of Rstudio
Frederick Tan (16:19:48): > RStudio (using us.gcr.io/anvil-gcr-public/anvil-rstudio-bioconductor:0.0.4)
Vince Carey (16:47:30): > Same here. I probably need to reboot? Thanks for checking.
Nitesh Turaga (16:52:11): > Just a heads up @Frederick Tan, there is a 0.0.5 available now.
Nitesh Turaga (16:52:21): > with R-4.0.0 and Bioc 3.11
2020-05-15
BJ Stubbs (12:08:32): > help(mtcars) works for me. I did need to spawn the files/help tab first though
BJ Stubbs (12:08:56): - File (PNG): rstudiohelpanvil.png
Vince Carey (12:31:02): > thanks to all – my pane set was misconfigured
2020-05-19
Martin Morgan (12:43:29): > If I’m following correctly, the R 4.0 rocker image that we use as a basis for bioconductor_docker:* has updated the CRAN repository to point to pre-built binaries, so quick binary installs… > > ~$ docker run -it --rm bioconductor/bioconductor_docker:devel R --quiet -e "BiocManager::repositories()" > Bioconductor version 3.12 (BiocManager 1.30.10), ?BiocManager::install for help > > BiocManager::repositories() > BioCsoft > "https://bioconductor.org/packages/3.12/bioc" > BioCann > "https://bioconductor.org/packages/3.12/data/annotation" > BioCexp > "https://bioconductor.org/packages/3.12/data/experiment" > BioCworkflows > "https://bioconductor.org/packages/3.12/workflows" > CRAN > "https://packagemanager.rstudio.com/all/__linux__/bionic/latest" >
> (hmm, BiocManager::install('dplyr') installs some but not all packages ‘fast’… not sure if this is intentional or otherwise…)
2020-05-20
Vince Carey (11:47:36): > I don’t follow. How does this affect the binary repos in GCP process that I have fallen quite behind on ….?
Martin Morgan (13:19:47): > In principle, since the anvil-rstudio-bioconductor image uses the R 4.0 rocker image, the CRAN packages are already available for binary installation, so we would only need to provide binary images of Bioconductor packages. The terra-jupyter-r image I think does something different… @Nitesh Turaga is the R / Bioc version information on https://bioconductor.github.io/AnVIL_Admin/ (R 3.6 / Bioc 3.10) and https://github.com/DataBiosphere/terra-docker/tree/master/terra-jupyter-r (R 3.6) correct? - Attachment (bioconductor.github.io): Bioconductor / AnVIL > Administration repository for AnVIL project
Nitesh Turaga (15:13:01): > Hi @Martin Morgan I believe they are up to date. The R-4.0.0 terra-jupyter-r pull request never got merged. This is because of the labels Adrian asked about, and once that is resolved that pull request should be merged.
Martin Morgan (15:22:12): > In the ‘containers’ section of AnVIL_admin the links are to terra-jupyter-bioconductor:0.0.14 and anvil-rstudio-bioconductor:0.0.5 which are both R 4.0 / Bioc 3.11 based?
Nitesh Turaga (15:56:37): > The anvil-rstudio-bioconductor:0.0.5 is R-4.0 / Bioc 3.11 based, but the terra-jupyter-r:0.0.14 is R 3.6.3/ Bioc 3.10 based. I’ll update that document so that this is reflected.
Nitesh Turaga (16:05:59): > I’ve updated the document.
Vince Carey (17:46:02): > Interesting situation. So the R that I use to build our binaries should be using binary versions of CRAN packages taken from packagemanager.rstudio.com and using them, right? Furthermore they would be copied to the bucket that we use as the bioc repos – a key gain is that i do not have to compile cran packages. Are we sure there is no risk of runtime library incompatibility?
2020-05-21
Martin Morgan (06:45:07): > slightly better – for the anvil-rstudio-bioconductor:0.0.5 image you’d just need to build and copy to the bucket the Bioconductor packages – everything from CRAN would be handled by packagemanager.rstudio.com. > > there are a lot of new moving pieces here, and it seems fragile, so still an ‘alpha’ idea.
Nitesh Turaga (17:43:46): > The terra-jupyter-r update has been merged. We’ll have the new R-4.0 image in the jupyter notebook by Monday I expect.
2020-05-22
Vince Carey (07:26:26): > Let me know when I should make a new binary repo for AnVIL
Nitesh Turaga (07:27:17): > Thanks Vince, I will. I remember you sent a link to the documentation of the process. Can you send it again, I can’t seem to be able to locate it.
Vince Carey (14:20:00): > There’s nothing worth reading at this time. The snapshot that I posted May 6 needs to be transcribed; github.com/vjcitn/BiocBBSpack has some relevant material but not properly systematic.
2020-05-26
Lori Shepherd (08:39:14): > Working on the agenda now for our 10 am EST meeting – it will be found at https://docs.google.com/document/d/19wjx8gtGVb70CjFiWUhTzeLure7ZcXU7zn7qkUUqhiQ/edit?usp=sharing
Lori Shepherd (10:03:06): > starting now https://bluejeans.com/480153337/
Frederick Tan (11:04:35): > @BJ Stubbs In case you haven’t seen this workspace, the Jupyter Notebook demonstrates Spark using Hail https://anvil.terra.bio/#workspaces/amp-t2d-op/2019_ASHG_Reproducible_GWAS-V2
Frederick Tan (11:56:40): > @Vince Carey Here’s the heavily under construction Jamboree agenda planning document docs.google.com/presentation/d/1jW6Y7w9Tdl-33ttfRCYGNUmOpU2_8JSVRELbPj_oGTo
Frederick Tan (21:57:49): > @Vince Carey et al. Here is an example using github.com/jhudsl/ari > * Movie: drive.google.com/uc?id=1GdJeq2lrlQv0Ba44q3E74_dPdmdkeJO5 > * Slides: docs.google.com/presentation/d/1ezmW_hRlq9gcQO3DQcTtOCvFsuZHbxiXxOre3955LJI > Watching it again reminds me that some of the 30 minute session will be eaten up by time waiting for the runtime environment to spin up
2020-05-27
Vince Carey (07:19:25): > looks pretty nice. would you want to rock their world even more by making a manhattan plot with plotly, that can show metadata for points on hoverover? another thing that would not be too costly to add is code for a shiny app that allows users to pick regions of chromosomes for such a manhattan plot. i wonder if there is some way to warm up the runtimes in advance … it is probably scriptable to start them at a specific clock time.
Frederick Tan (07:25:17): > Two challenges > * Still unclear how much the prior hands-on sessions will explain the concepts of Jupyter Notebooks and Runtime Environments and thus eat into the 30 minute time block … hoping to get that information today > * There might be a decent segment of the Jamboree audience that have never seen R code before
Frederick Tan (07:26:03): > Any ideas on best way to distribute .Rmds so that people can just hit Run / Knit?
Frederick Tan (07:28:54): > That being said, while we’re catering this content for the Jamboree event, I think that showcasing a plotly / shiny Manhattan plot would be great material for people to work through afterwards so let’s do it :slightly_smiling_face:
Vince Carey (07:42:53): > if you share the workspace with me (stvjc@channing.harvard.edu) i will clone it and put some code together. “never seen R” is what it is. what we are doing does not teach R but shows that if you type this you can experience that. The “that” has to be compelling and the “this” can be sharply minimized through planning.
Vince Carey (07:43:26): > how is the big gsutil with long urls of slide 13 going to be carried out? will there be a way to just paste the string into console?
Frederick Tan (08:24:31): > No AnVIL workspace yet … Mo Heydarian I believe will be making a new Billing Project to organize workspaces related to AnVIL training / outreach … after that can add you
Frederick Tan (08:26:51): > re: long urls of slide 13 … the Copy to clipboard on slide 10 will copy to clipboard the entire gsutil cp gs://stuff.
Martin Morgan (09:18:45): > re: long urls – stick in a table in the workspace (AnVIL::avtable_import()) ahead of time, then avtable() %>% pull(url) %>% gsutil_cp()? I think we should pre-install AnVIL on our 4.0+ images… > > Also > * Slide 17 – embrace the tidyverse with read_csv() instead of read.csv() and tbl instead of df > * Slide 18: always pipe (it’s the tidy way…) tbl %>% filter(...) so use of these functions is consistent
Frederick Tan (10:25:05) (in thread): > re: read_csv() and tbl … a little tangential, but I was going to say that I’ve had trouble teaching DESeqDataSetFromMatrix() since it historically used rownames … but just noticed tidy=TRUE!
Frederick Tan (10:25:56) (in thread): > Is Bioconductor making a point to deprecate the use of rownames or is that up to the author?
Frederick Tan (10:33:22): > Question about the design of us.gcr.io/anvil-gcr-public/anvil-rstudio-bioconductor:0.0.5… am I correct that while there are only 11 Bioconductor packages explicitly installed in install.R the apt-get installs are sufficient to install most (all?) of Bioconductor 3.11?
Frederick Tan (10:44:59) (in thread): > Created anvil-outreach/Test and set you as an owner. Let me know if you are able to see it … not sure if you need to be part of the Billing Project or not.
Frederick Tan (10:45:17) (in thread): > … and sent a support ticket as it’s not clear to me whether we can rename the workspace :slightly_smiling_face:
Frederick Tan (11:15:10) (in thread): > Ok, looks like you can’t rename so choose wisely :slightly_smiling_face:
Frederick Tan (11:35:15) (in thread): > In addition to Manhattan plots (exploration) is there an “easy” way to annotate?
Frederick Tan (11:39:01) (in thread): > Though if the idea is to distribute an .Rmd in the workspace and access using av*() then lots of possibilities
Martin Morgan (12:29:16) (in thread): > I think rownames will be with Bioconductor for a long time… they are basically guaranteed to be unique IDs, which makes them exactly appropriate for table joins, which is one way of thinking about central objects like SummarizedExperiment – row, column, assay tables
Martin Morgan (12:30:33) (in thread): > yep, that’s the intention. We are working on ‘binary’ installs (since the OS dependencies are already satisfied) which would mean that most packages would install quickly, especially over fast interconnect
Frederick Tan (12:38:55): > re: AnVIL package … can anvil-rstudio-bioconductor:0.0.5 be updated or would installing a new package require a 0.0.6 release?
Frederick Tan (12:39:49) (in thread): > So do you see general user training being to recommend working with tbl but then recasting to data.frame when rownames are necessary?
Nitesh Turaga (13:12:13): > A new package installation built into the docker image would require 0.0.6.
Vince Carey (14:42:21) (in thread): > should i start building such a binary repository?
Vince Carey (14:47:53) (in thread): > IMHO the project has not really deliberated on this. It seems an interesting topic – There is a channel on tidiness_in_bioc but it has little traffic.
Vince Carey (14:50:19) (in thread): > http://www.bioconductor.org/packages/release/workflows/vignettes/fluentGenomics/inst/doc/fluentGenomics.html
Sean Davis (15:48:26): > Does AnVIL support specifying a custom container as a runtime?
Sean Davis (15:52:56) (in thread): > Not sure where the best place is to discuss the binary build/docker implementation details. Is there a place on Slack to discuss?
Sehyun Oh (16:02:31): > @Sean Davis yes
Sehyun Oh (16:03:41): - File (PNG): Screen Shot 2020-05-27 at 4.01.51 PM.png
Sean Davis (16:05:24): > Thanks, @Sehyun Oh. Then, @Frederick Tan, if it doesn’t add complexity for students, consider building a custom image with all materials including the Rmd and/or notebook files. Easily 15 minutes of the half-hour session will be spent setting up packages, copying files around, etc., otherwise.
Martin Morgan (16:54:16) (in thread): > yes @Vince Carey a binary repository sounds great to me…
Martin Morgan (16:58:09) (in thread): > @Sean Davis there’s already been quite a bit of discussion in this channel, and some iterations of the process; it would be good to have several of these repos (for bioconductor_docker [which includes the R-4.0 jupyter image in AnVIL when it is available?], as well as for the rstudio image) and start using them regularly. Maybe @Vince Carey adding links to available repositories to the ‘In progress’ section of https://bioconductor.github.io/AnVIL_Admin/ would provide a good place to ‘remember’ these activities… - Attachment (bioconductor.github.io): Bioconductor / AnVIL > Administration repository for AnVIL project
Sean Davis (16:59:46) (in thread): > :+1:
Vince Carey (17:16:05) (in thread): > Do we want to use such a setting of repositories() for Bioc 3.11?
Frederick Tan (17:19:26): > Makes sense given that 5 minutes will already be spent spinning up the instance.
Vince Carey (17:34:08) (in thread): > @Nitesh Turaga i am going to target https://storage.googleapis.com/anvil-rstudio-bioconductor/0.99/3.11/src/contrib/ for the newest binary repo, for jamboree? verify @Martin Morgan?
Nitesh Turaga (17:36:38) (in thread): > ok
Martin Morgan (17:42:13) (in thread): > In general, I feel like the devel image is the right place to introduce big changes like this, including on the docker image. I think we should still aim for AnVIL::install() as the place to enable binary package installations in release – i.e., a feature enabled by a package
Vince Carey (22:31:07): > A caveat here is that your custom container must be defined to use the terra-based Rstudio – see e.g., https://github.com/anvilproject/anvil-docker/blob/master/anvil-rstudio-bioconductor/Dockerfile
Vince Carey (22:58:08): > We now have a 3.11 binary repository for AnVIL Rstudio – > > > AnVIL::install("Rsamtools") > trying URL 'https://storage.googleapis.com/anvil-rstudio-bioconductor/0.99/3.11/src/contrib/Rsamtools_2.4.0.tar.gz' > Content type 'application/x-tar' length 5556721 bytes (5.3 MB) > ================================================== > downloaded 5.3 MB > > * installing *binary* package ‘Rsamtools’ ... > * DONE (Rsamtools) >
Vince Carey (22:58:47): > That’s for container image > > us.gcr.io/anvil-gcr-public/anvil-rstudio-bioconductor:0.0.5 >
Vince Carey (23:00:34): > These packages in manifest do not install > > c("affypdnn", "anamiR", "BatchQC", "CALIB", "ccfindR", "cellGrowth", > "cellTree", "CHARGE", "chroGPS", "cobindR", "CountClust", "CTDquerier", > "CVE", "debrowser", "DEDS", "Doscheda", "flowFit", "GeneGeneInteR", > "Genominator", "gpuMagic", "IdMappingAnalysis", "IdMappingRetrieval", > "Imetagene", "lol", "lpNet", "LVSmiRNA", "manta", "MCRestimate", > "Melissa", "MoonlightR", "MSGFgui", "MSGFplus", "MTseeker", "nem", > "netbenchmark", "nethet", "PAPi", "PathwaySplice", "pcaGoPromoter", > "pint", "proteoQC", "QUALIFIER", "R3CPET", "readat", "RIPSeeker", > "SANTA", "scAlign", "sparsenetgls", "splicegear", "trena", "waveTiling", > "xps", "YAPSA") >
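A related check covers gaps such as the 67-entry PACKAGES index noted earlier, or manifest packages like those above that fail to install: compare the expected manifest against the packages the index actually lists. Below is a base-R sketch. It assumes the PACKAGES file content is already available as a character vector, for example via readLines() on a local copy. The helper missing_from_index() and the sample index text are hypothetical.

```r
# Hypothetical helper: which manifest packages have no entry in a
# CRAN-style PACKAGES index? PACKAGES is a DCF file with one blank-line
# separated stanza per package, which base R's read.dcf() parses.
missing_from_index <- function(packages_text, manifest) {
  con <- textConnection(packages_text)
  on.exit(close(con))
  db <- read.dcf(con, fields = c("Package", "Version"))
  setdiff(manifest, db[, "Package"])
}

# Hypothetical two-package index and a manifest that expects four.
pkgs_text <- c(
  "Package: Rsamtools", "Version: 2.4.0", "",
  "Package: BiocManager", "Version: 1.30.10", ""
)
manifest <- c("Rsamtools", "BiocManager", "xps", "YAPSA")

print(missing_from_index(pkgs_text, manifest))
```

Running this before copying a repository to the bucket would show both packages that failed to build and a truncated index.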
Vince Carey (23:01:59): > basilisk python infrastructure works. but conda environment installation is at the user level, so installation of a client like BiocSklearn will lead to python/conda installation efforts on first invocation of a module in a package outside the core set.
2020-05-28
Martin Morgan (06:17:14): > wow, really amusing / :exploding_head: (not meant to convey anger there, just ‘head exploding’) that basilisk works, and the conda-within-docker-within-gcp / python-within-R-within-RStudio-within-terra-… etc. Also thanks for updating https://bioconductor.github.io/AnVIL_Admin/ - Attachment (bioconductor.github.io): Bioconductor / AnVIL > Administration repository for AnVIL project
Frederick Tan (08:56:30): > Jamboree planning meeting today from 11-12pm (Zoom 97018532760 / Agenda docs.google.com/presentation/d/1jW6Y7w9Tdl-33ttfRCYGNUmOpU2_8JSVRELbPj_oGTo). Two questions: > 1. When you volunteered @Vince Carey, were you also volunteering to present the actual Jun 11 2-2:30 session? If not, anyone interested and available? > 2. How do we go about creating a custom image? I think for starters, we just need to add the AnVIL Bioconductor package and have a mechanism for copying into the image a single placeholder .Rmd
Vince Carey (09:00:12): > 0) i will try to take call today, 1) in principle i can do the session as long as i understand the content, 2) you might not need a custom image given the binary repo – we should test what you need to do under the existing 0.0.5 container
Frederick Tan (09:05:24) (in thread): > I’m thinking that having people execute code chunks in an .Rmd is one way to maximize success amongst 100+ people
Vince Carey (09:05:38): > @Frederick Tan is there content for https://app.terra.bio/#workspaces/anvil-outreach/Test that we should be working on?
Frederick Tan (09:05:50) (in thread): > That would give lots more leeway to showcase “exciting” features
Frederick Tan (09:06:38) (in thread): > A custom image would reclaim any minutes required to transfer that .Rmd
Frederick Tan (09:07:11) (in thread): > Just a placeholder right now to make sure that you can access a workspace in that Billing Project
Frederick Tan (09:07:23) (in thread): > Can’t rename so we’ll likely delete that workspace before too long
Vince Carey (09:07:27): > specifically the table noted in the slide set that has the association statistics …
Vince Carey (09:17:22): > OK I found the top variants table
Frederick Tan (09:18:38) (in thread): > @Vince Carey Here’s a more “official” Workspace to work out of https://anvil.terra.bio/#workspaces/anvil-outreach/MaGIC%20Jamboree%202020
Frederick Tan (09:18:58) (in thread): > Let me know if you’re able to access it (I added you as owner)
Frederick Tan (09:20:50) (in thread): > Ok, looks like that’s the Google Bucket associated with the “original” ASHG 2019 V2 workspace
Frederick Tan (09:21:26) (in thread): > i.e. Cloning a workspace doesn’t make a new copy of a data, rather all the data are pointers back to the original workspace
Frederick Tan (09:21:54) (in thread): > So that gs:// should be valid unless they delete that workspace
Frederick Tan (09:22:44) (in thread): > @Vince Carey Apologies if you already know all this, new to me :slightly_smiling_face:
Vince Carey (09:34:45) (in thread): > got it. i will clone it so i don’t mess it up. then we can flow back what is needed
Frederick Tan (09:38:37) (in thread): > Great! I’m working on a movie demonstrating my understanding of how the ASHG 2019 material has traditionally been run, at least from a “click this, now click this” perspective
Vince Carey (09:58:17) (in thread): > @Martin Morgan should another pair of hands be mobilized to create the jupyter-oriented repo?
Vince Carey (10:00:44) (in thread): > Right, the gs:// reference should work … but I am going to put the csv in my clone explicitly. We can discuss whether that produces cognitive dissonance. I think the use case “I have a csv, how do I get it in, what do I have to do to import it to an R session” will be common and might be profitably illustrated here.
Frederick Tan (10:24:07) (in thread): > Yes, I agree. “How to bring your own data”.
Martin Morgan (10:28:18) (in thread): > You’d create a Dockerfile extending the current image, linked from the second item in https://bioconductor.github.io/AnVIL_Admin/#now. I think it needs to be available ‘somewhere’, not necessarily in the google container registry. I also think the Dockerfile can be tested locally; @Nitesh Turaga may have additional insight - Attachment (bioconductor.github.io): Bioconductor / AnVIL > Administration repository for AnVIL project
Frederick Tan (10:44:19) (in thread): > I believe the AnVIL interface makes it easier to “type in” an image from Docker Hub e.g. can type in something like bioconductor/rstudio-demo
Martin Morgan (10:47:25) (in thread): > using the google cloud probably means that the install is fast because the repo is co-located with the anvil compute nodes; I don’t know how important this is in practice, since the start-up time is too long (aargh) anyway and I always start-and-go-somewhere-else.
Frederick Tan (10:48:19) (in thread): > Good point … haven’t tested timing. Wonder if there is caching.
Martin Morgan (10:52:17) (in thread): > In RStudio, choosing the Files tab of the (default) lower-right pane, Upload or More -> Export is actually an interface to your local computer. It’s uploaded to the (transient, I guess) image working directory. Suitable for CSV-sized data. Not sure that the Terra people realize this opportunity for easy ingress / egress.
Frederick Tan (10:57:40) (in thread): > Great point!
Frederick Tan (10:58:08) (in thread): > Ok, I think we should definitely include these two easy “Bring your own data” examples
Vince Carey (10:58:39) (in thread): > Agreed about the opportunity … indeed I was unaware of it until now. I don’t use RStudio. That egress channel is a feature I could imagine getting blocked from Terra once identified. A compliant way to go would be 1) use the + button in the Files folder of the workspace, 2) upload, 3) get the gs:// path, 4) AnVIL::localize?
Nitesh Turaga (11:01:52) (in thread): > There is no caching, @Frederick Tan
Nitesh Turaga (11:02:11) (in thread): > But Martin is right: if you host your image on Docker Hub, the image “pull” times are slightly longer.
Vince Carey (11:02:37) (in thread): > i don’t see contact info for this call
Vince Carey (11:02:55) (in thread): > oh a zoom id in text
Sean Davis (11:39:27) (in thread): > For making a custom container, see: https://github.com/seandavi/BuildABiocWorkshop2020 Essentially, write an R package and include the Dockerfile and .github/workflows/basic_checks.yaml in it. The package and its dependencies will be installed, and the docker image will be automatically created and updated with new pushes. Versioning is based on the git hash, so you can pin the version wherever you like.
Martin Morgan (11:45:21) (in thread): > I guess those files are in avbucket(), so for instance gsutil_ls(avbucket()) or gsutil_cp() or… seems like something that could be simplified – avfiles() or something; will work on that. > > Also, unfortunately, at the moment two environment variables that are supposed to be set are not (https://github.com/anvilproject/anvil-docker/issues/10), so the manual steps avworkspace_namespace("<first-part-of-path-after-#workspaces-in-url>") and avworkspace_name("<second-part-of-path...>") need to be called first
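The two manual values are the two path segments that follow #workspaces in the workspace URL. A minimal Python sketch of that split, assuming the `https://anvil.terra.bio/#workspaces/<namespace>/<name>/...` shape of the URL quoted elsewhere in this channel; `workspace_from_url` is a hypothetical helper, not part of the AnVIL package:

```python
from typing import Tuple
from urllib.parse import unquote, urlsplit


def workspace_from_url(url: str) -> Tuple[str, str]:
    """Return (namespace, name) from a Terra workspace URL (hypothetical helper).

    Assumes the workspace path lives in the URL fragment, e.g.
    https://anvil.terra.bio/#workspaces/<namespace>/<name>/...
    """
    # urlsplit() puts everything after '#' into .fragment
    fragment = urlsplit(url).fragment
    parts = fragment.split("/")
    if len(parts) < 3 or parts[0] != "workspaces":
        raise ValueError(f"not a workspace URL: {url!r}")
    # Percent-decode, e.g. 'MaGIC%20Jamboree%202020' -> 'MaGIC Jamboree 2020'
    return unquote(parts[1]), unquote(parts[2])


namespace, name = workspace_from_url(
    "https://anvil.terra.bio/#workspaces/anvil-outreach/"
    "MaGIC%20Jamboree%202020/workflows/amp-t2d-op/genesis_GWAS"
)
print(namespace, "|", name)  # anvil-outreach | MaGIC Jamboree 2020
```

The returned strings correspond to the values passed to avworkspace_namespace() and avworkspace_name(). The sketch percent-decodes the name; whether the AnVIL functions expect the decoded or the encoded form is not established here.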
Martin Morgan (11:46:10) (in thread): > Also, what’s the opposite of the + button?
Vince Carey (13:57:56): > @Frederick Tan Do we know whether the hits table addresses are hg19 or hg38?
Frederick Tan (14:00:10) (in thread): > If you scroll through the Workflows Input, this_genome_build is set to "hg19": https://anvil.terra.bio/#workspaces/anvil-outreach/MaGIC%20Jamboree%202020/workflows/amp-t2d-op/genesis_GWAS
Frederick Tan (14:00:40) (in thread): > I’m guessing that is just metadata though
Frederick Tan (14:04:11) (in thread): > I’d assume that the “truth” would be in the .gds/.vcf files here: https://console.cloud.google.com/storage/browser/terra-featured-workspaces/GWAS/1kg-genotypes/gds_maf001
Frederick Tan (14:04:51) (in thread): > Those are pointed to in the Workspace Data data table
Vince Carey (14:52:46) (in thread): > I think hg19 is a good bet.
Vince Carey (14:53:02): > Tricky misbehavior of my repo – > > cannot open URL '[https://storage.googleapis.com/anvil-rstudio-bioconductor/0.99/3.11/src/contrib/PACKAGES.rds](https://storage.googleapis.com/anvil-rstudio-bioconductor/0.99/3.11/src/contrib/PACKAGES.rds)': HTTP status was '404 Not Found' > trying URL '[https://storage.googleapis.com/anvil-rstudio-bioconductor/0.99/3.11/src/contrib/EnsDb.Hsapiens.v75_2.99.0.tar.gz](https://storage.googleapis.com/anvil-rstudio-bioconductor/0.99/3.11/src/contrib/EnsDb.Hsapiens.v75_2.99.0.tar.gz)' > Content type 'application/x-tar' length 85340671 bytes (81.4 MB) > ================================================== >
Vince Carey (14:53:27): > I only put PACKAGES.gz there. I think I can reconstruct PACKAGES.rds and see if this goes away.
Vince Carey (15:44:44): > That seems to have worked. But the not-found and additional stuff shown seem to indicate a bug in AnVIL package or R? The reference to 81.4 MB suggests that the binary was found, but it does not get installed.
Vince Carey (15:46:31): > AnVIL Rstudio injects > > html_document: > df_print: paged > toc: yes >
> before my > > BiocStyle::html_document: > highlight: pygments > number_sections: yes > theme: united > toc: yes > --- >
> stymying my effort to use our formatting.
Vince Carey (15:47:05): > This is when I use the “knit to HTML” button
Frederick Tan (15:50:19): > Speaking of RStudio, ideas on when the image will be updated to the just released RStudio 1.3?
Martin Morgan (15:56:02) (in thread): > this seems to be a ‘frankenstein’ error message; PACKAGES.rds isn’t required; it could have been a warning? It seems unrelated to problems with installing the package? I see > > > AnVIL::install("EnsDb.Hsapiens.v75") > trying URL '[https://storage.googleapis.com/anvil-rstudio-bioconductor/0.99/3.11/src/contrib/EnsDb.Hsapiens.v75_2.99.0.tar.gz](https://storage.googleapis.com/anvil-rstudio-bioconductor/0.99/3.11/src/contrib/EnsDb.Hsapiens.v75_2.99.0.tar.gz)' > Content type 'application/x-tar' length 85340671 bytes (81.4 MB) > ================================================== > downloaded 81.4 MB > > * installing **binary** package 'EnsDb.Hsapiens.v75' ... > * DONE (EnsDb.Hsapiens.v75) > > The downloaded source packages are in > '/tmp/RtmpEo0jFw/downloaded_packages' > > library(EnsDb.Hsapiens.v75) > ... > > ## i.e., load without error >
Martin Morgan (16:01:01) (in thread): > Not sure exactly what you’re doing, but I did > > AnVIL::install("BiocStyle") > BiocStyle::use_vignette_html("my.Rmd") >
> which creates a template in the current working directory (accessible in the Files tab in RStudio). Clicking on the file opens it in the RStudio editor. Knitting it knits it :wink: and once I disable popup blocking I have a Bioc-styled document?
Vince Carey (16:07:52): > I suspect not that soon. Here is an oddity about the CSV file with the ‘top hits’ > > > tophits[.Last.value,] > variant.id chr pos allele.index n.obs freq MAC Score Score.SE Score.Stat > 2037 636598 1 177913519 1 2504 0.232 1161 -12.9 3.02 -4.28 > 2038 638710 1 177913519 1 2504 0.232 1161 -12.9 3.02 -4.28 > 2039 640837 1 177913519 1 2504 0.232 1161 -12.9 3.02 -4.28 > 2040 642938 1 177913519 1 2504 0.232 1161 -12.9 3.02 -4.28 > 2041 645033 1 177913519 1 2504 0.232 1161 -12.9 3.02 -4.28 > 2042 647120 1 177913519 1 2504 0.232 1161 -12.9 3.02 -4.28 > 2043 649163 1 177913519 1 2504 0.232 1161 -12.9 3.02 -4.28 > 2044 651245 1 177913519 1 2504 0.232 1161 -12.9 3.02 -4.28 > 2045 653320 1 177913519 1 2504 0.232 1161 -12.9 3.02 -4.28 > 2046 655385 1 177913519 1 2504 0.232 1161 -12.9 3.02 -4.28 > 2047 657468 1 177913519 1 2504 0.232 1161 -12.9 3.02 -4.28 > 2048 659568 1 177913519 1 2504 0.232 1161 -12.9 3.02 -4.28 >
> There are different variant IDs but identical statistics. Just want to make sure this is not a consequence of a bug in the example data generation.
Vince Carey (16:18:04) (in thread): > I think I am confused about the environment of the RStudio terminal versus the RStudio console and the Rmd development page. I kept getting ‘package not installed’ errors on request. But the reference to the man-made monster is surely apt.
Vince Carey (16:18:50) (in thread): > that sounds good.
Martin Morgan (16:27:00) (in thread): > I use the console… but trying just now in the terminal EnsDb.Hsapiens.v75 I also have success… > > rstudio@saturn-d5f1d755-e371-40b7-ae9a-02f007f19a3d:~$ R --quiet -e "AnVIL::install('EnsDb.Hsapiens.v75')" > > AnVIL::install('EnsDb.Hsapiens.v75') > trying URL '[https://storage.googleapis.com/anvil-rstudio-bioconductor/0.99/3.11/src/contrib/EnsDb.Hsapiens.v75_2.99.0.tar.gz](https://storage.googleapis.com/anvil-rstudio-bioconductor/0.99/3.11/src/contrib/EnsDb.Hsapiens.v75_2.99.0.tar.gz)' > Content type 'application/x-tar' length 85340671 bytes (81.4 MB) > ================================================== > downloaded 81.4 MB > > * installing **binary** package 'EnsDb.Hsapiens.v75' ... > * DONE (EnsDb.Hsapiens.v75) > > The downloaded source packages are in > '/tmp/Rtmpj9Kjic/downloaded_packages' > > > > > rstudio@saturn-d5f1d755-e371-40b7-ae9a-02f007f19a3d:~$ >
Frederick Tan (16:44:39) (in thread): > Worth bringing up in the the-anvil.slack.com #jamboree-2020 channel …
Frederick Tan (16:46:29): > Here’s a 10-minute video walking through the ASHG_2019 tutorial … I’d recommend setting Playback speed to 2 :slightly_smiling_face: > * Video – drive.google.com/file/d/1UJjTF_6nlWnn0fWCKg2qJfhM_E9qXrow > * Slides – docs.google.com/presentation/d/1OIF8IeGCc1qNMlDVGo2H7Cr0jgJa1OijiLN63341QVI
Frederick Tan (16:47:26) (in thread): > @Vince Carey Made this without consulting the Terra team, so there’s a good probability this won’t be used as-is for anything at the Jamboree
Frederick Tan (16:48:06) (in thread): > But a likely preview of what people will see before the RStudio session … hopefully with more explanation on how to transfer data into and out of Runtime Environments
Martin Morgan (16:50:13) (in thread): > @Frederick Tan just to clarify, ‘only 11’ packages are installed, but that implies their dependencies, too, so there are about 100 packages pre-installed, I think.
Frederick Tan (16:51:54) (in thread): > Hehe … yeah, took me a bit to figure out how to word that :slightly_smiling_face:
Frederick Tan (16:54:44) (in thread): > e.g. it’s 11 packages explicitly installed by install.R, since BiocManager is installed somewhere upstream
Vince Carey (17:06:30) (in thread): > that’s quite nice. What is the approach to delivery? Do we just play/watch those videos and hang out to answer questions?
Frederick Tan (17:07:07) (in thread): > I haven’t been to any meetings where that has been discussed
Frederick Tan (17:07:35) (in thread): > I imagine that one strategy is to have a real live person narrate the slides
Frederick Tan (17:07:56) (in thread): > And the videos are for people who can’t make it / fall behind / review afterwards
Vince Carey (18:22:43): > <!here> There is something peculiar about the ‘top hits’ CSV file in the ASHG demo workspace – summarize_my_ASHG_Terra_analysis.top_variants.assoc.csv
: > > > tophits[24:30,] > variant.id chr pos allele.index n.obs freq MAC Score Score.SE Score.Stat Score.pval Est Est.SE PVE ref alt snpID P > 24 147209 1 54940165 1 2504 0.153 765 10.3 2.47 4.18 2.91e-05 1.69 0.405 0.00700 T C 1:54940165:T:C 2.91e-05 > 25 149638 1 54940165 1 2504 0.153 765 10.3 2.47 4.18 2.91e-05 1.69 0.405 0.00700 T C 1:54940165:T:C 2.91e-05 > 26 152070 1 54940165 1 2504 0.153 765 10.3 2.47 4.18 2.91e-05 1.69 0.405 0.00700 T C 1:54940165:T:C 2.91e-05 > 27 154409 1 54940165 1 2504 0.153 765 10.3 2.47 4.18 2.91e-05 1.69 0.405 0.00700 T C 1:54940165:T:C 2.91e-05 > 28 156751 1 54940165 1 2504 0.153 765 10.3 2.47 4.18 2.91e-05 1.69 0.405 0.00700 T C 1:54940165:T:C 2.91e-05 > 29 159196 1 54940165 1 2504 0.153 765 10.3 2.47 4.18 2.91e-05 1.69 0.405 0.00700 T C 1:54940165:T:C 2.91e-05 >
> these rows of the table occupy the same position (chr1, pos 54940165) and have identical statistics, but different variant id … identical statistics for linked loci are familiar enough, but multiple variants at the same position? to me it is redolent of a flawed merge.
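One quick way to gauge how widespread this is: group rows by position and alleles and list the positions that carry more than one variant.id. A minimal Python sketch, assuming the column names printed above (variant.id, chr, pos, ref, alt) and reading from an in-memory string rather than the actual CSV; `shared_position_groups` is a hypothetical name:

```python
import csv
import io
from collections import defaultdict


def shared_position_groups(csv_text, key_cols=("chr", "pos", "ref", "alt"), id_col="variant.id"):
    """Map each (chr, pos, ref, alt) key to its sorted variant ids,
    keeping only keys that carry more than one distinct id."""
    groups = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[tuple(row[c] for c in key_cols)].add(row[id_col])
    return {key: sorted(ids) for key, ids in groups.items() if len(ids) > 1}


# Toy input shaped like the rows printed above (only a few columns kept).
toy = (
    "variant.id,chr,pos,ref,alt,Score.pval\n"
    "147209,1,54940165,T,C,2.91e-05\n"
    "149638,1,54940165,T,C,2.91e-05\n"
    "152070,1,54940165,T,C,2.91e-05\n"
)
for key, ids in shared_position_groups(toy).items():
    print(key, len(ids), "variant ids")
```

An empty result would mean every position maps to a single variant id. In R, the same check is roughly sum(duplicated(tophits[, c("chr", "pos", "ref", "alt")])).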
2020-05-29
Frederick Tan (09:16:14): > Here’s the Spanish translation of the ASHG_2019 tutorial … would appreciate feedback from someone with more fluency (Google apparently only has a female voice trained in Spain :)) … interestingly, it’s 30% longer (13 min vs 10 min) > * Video —drive.google.com/uc?id=1kbkwuNCPZrPiLsPTrNJCMv6qG_YoIaMt > * Slides —docs.google.com/presentation/d/1Hu9N-VNLJYt9HpktFEAlERsFQRs8kYEXmlDU6QFtoQ8
Frederick Tan (09:33:24) (in thread): > Presumably this “Bring your own data” would start with the participant downloading data from a link that we provide. One thing to keep in mind is how long it takes for some people to find where files are downloaded on their computer.
Frederick Tan (09:34:31) (in thread): > Testing out Vince’s .Rmd right now … frustrating how much vertical space is lost to the Terra chrome
Martin Morgan (09:58:12) (in thread): > yeah the ‘playground’ thing should go away… not sure where to post issues like that. For fun, it turns out that you can launch an RStudio instance running in the AnVIL and accessible via the browser but from outside AnVIL; you then have the usual amount of screen space. This needs a token refresh periodically, so it’s not really useful; I’ll see if I (or @BJ Stubbs) can remember the code…
2020-05-30
Martin Morgan (13:53:44): > I pushed avfiles_ls() to list files in the workspace bucket (the stuff listed under DATA tab, Files item), plus avfiles_backup() to save compute node files to the bucket, and avfiles_restore() to move files from the bucket to the compute node. These work best if one sets avworkspace_namespace() and avworkspace_name() to the billing-account/workspace-name parts of the URL of the workspace (there’s a bug in the RStudio containers that doesn’t allow these to be captured automatically; this is not necessary in the jupyter notebooks). > > An interesting part of this is that, running on a laptop outside AnVIL but with gsutil appropriately configured (described in the vignette), the ‘compute node’ is the laptop, so avfiles_backup() / avfiles_restore() can be used to move files and folders to / from the cloud. It’s very convenient.
2020-06-01
Frederick Tan (11:24:45): > From bioconductor.org/packages/release/bioc/vignettes/AnVIL/inst/doc/Introduction.html > > AnVIL organizes data and analysis environments into ‘workspaces’. AnVIL-provided Data resources in a workspace are managed in as ‘TABLES’, ‘REFERENCE DATA’, and ‘Workspace Data’, as illustrated in the figure below. > Is that figure available somewhere?
Martin Morgan (11:28:38): > It was just this screen-shot: https://github.com/Bioconductor/AnVIL/blob/master/vignettes/images/AnVIL-Workspace-Data.png; not sure why it doesn’t show up in the vignette…
Frederick Tan (11:40:24) (in thread): > Thanks! Thinking it’d be useful to have a “data flow” diagram with RStudio on one side and annotated arrows to the other side with categories like > * Local > * Google Buckets w/ gsutil_*()
> * Workspace Tables w/ avtable*()
> * Workspace Bucket w/ avbucket()
> Would help underscore the importance of backing up data you care about, @Vince Carey
2020-06-02
Lori Shepherd (08:25:25): > Hello - A reminder that next week’s (6-9-20) morning call will focus on the demo(s) we will be presenting to the technical call later that afternoon. @Vince Carey and @Sehyun Oh, I have started a slide deck in the Bioconductor presentations Google Drive folder – feel free to add slides there or put in a link to whatever materials you are putting together. If anyone has any additional information to add or discuss, the slide deck can be found at https://docs.google.com/presentation/d/1Efg3HHP3YWeSaffzU01MuiucO_PYU493BR9MuHFg6S0/edit?usp=sharing
Nitesh Turaga (10:32:29): > https://support.terra.bio/hc/en-us/articles/360044278391-June-01-2020 R-4.0 images are now available. - Attachment (Terra Support): June 01, 2020 > The following release notes correspond to May 22, 2020 - June 1, 2020. In addition to these changes, this release includes back-end updates to workflows, Google integrations, and notebooks to impro…
2020-06-08
Lori Shepherd (19:15:47): > Just a reminder: tomorrow’s morning meeting will focus on reviewing the material being presented at the 4 pm tech call. I created the slide deck but don’t see any slides or links, so I just wanted to check in, @Vince Carey @Sehyun Oh, and I believe after last week’s tech call there was additional material from Vince or @BJ Stubbs
2020-06-09
Vince Carey (08:52:32): > just getting to it now …
Vince Carey (08:58:52): > @Sehyun Oh what were you planning to present … please link a slide deck or add to https://docs.google.com/presentation/d/1Efg3HHP3YWeSaffzU01MuiucO_PYU493BR9MuHFg6S0/edit?usp=sharing thanks!
Lori Shepherd (10:02:19): > agenda: https://docs.google.com/document/d/1ej6NuAFGMg44gVslIxnjFizNPcKJGwJYqp7tW92FjKk/edit?usp=sharing
Sehyun Oh (11:06:50): > @Martin Morgan When I tried to save a modified, new data table from Notebooks to the Data model using avtable_import(), I got the error below even though this is not a redundant entity type. Any suggestion to resolve this?
Sehyun Oh (11:07:02): > > Error in .avtable_import_set_entity(.data, entity): !anyDuplicated(.data[[entity]]) is not TRUE > Traceback: > 1. avtable_import(data_model_set) > 2. .avtable_import_set_entity(.data, entity) > 3. stopifnot(!anyDuplicated(.data[[entity]]), !anyNA(.data[[entity]])) >
Martin Morgan (12:04:36) (in thread): > can you make this reproducible, e.g., by creating a mini-table that you’re trying to add? Alternatively, you could try > > debugonce(AnVIL:::.avtable_import_set_entity) > avtable_import(...) > Browser> n # several times until you get to the offending line > Browser> anyDuplicated(.data[[entity]]) > Browser> anyNA(.data[[entity]]) >
> and maybe work from there?
Sehyun Oh (13:16:13) (in thread): > With a toy table, I could make the basic type work. But whenever I used the membership: header for a set entity, it seems to give an error.
Sehyun Oh (13:16:17) (in thread): > > Error: 'avtable_import' failed: > Bad Request (HTTP 400). > Invalid first column header, should look like tsvType:entity_type_id > Traceback: > > 1. avtable_import(df) > 2. .avstop_for_status(response, "avtable_import") > 3. stop(message, call. = FALSE) >
Sehyun Oh (13:45:35) (in thread): > Ah… it seems like avtable_import doesn’t allow any redundant value in the first column. But by definition set entities have redundant values in the first, entity column. @Martin Morgan
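To make the mismatch concrete: in a set membership table the set id is repeated once per member, so a check like !anyDuplicated() on the first column (as in the traceback earlier in this thread) fails for any set with more than one member. A minimal Python sketch that renders such a table as TSV, assuming Terra's `membership:<set_type>_id` first header with the member entity type (here `sample`) as the second header; the helper name and the ids are hypothetical:

```python
def membership_tsv(set_type, member_type, sets):
    """Render a set membership table as TSV text (hypothetical helper).

    `sets` maps each set id to the ids of its members; the set id is
    repeated on every row, one row per member.
    """
    lines = [f"membership:{set_type}_id\t{member_type}"]
    for set_id, members in sets.items():
        for member in members:
            lines.append(f"{set_id}\t{member}")
    return "\n".join(lines) + "\n"


tsv = membership_tsv("sample_set", "sample", {"TCGA_COAD": ["sample_1", "sample_2"]})
print(tsv, end="")
first_column = [line.split("\t")[0] for line in tsv.splitlines()[1:]]
print("duplicated set ids:", len(first_column) != len(set(first_column)))  # True
```

A uniqueness check therefore belongs on entity tables only; for membership tables, the (set id, member id) pair is the unit that should not repeat.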
Martin Morgan (14:20:12): > The roadmap slide (second slide) of https://docs.google.com/presentation/d/1Efg3HHP3YWeSaffzU01MuiucO_PYU493BR9MuHFg6S0/edit?usp=sharing has timings for presentations; please feel free to edit (downward)…
Martin Morgan (14:37:28): > @Sehyun Oh for your slides, if you have a chance, BiocManager::install("Bioconductor/AnVIL") should mean > * slide 20: no need to explicitly set avworkspace_*()
> * difference between md5sums reported only once (useful to keep one of these reports in the slides)
> * slide 21: probably just avtables() if you’re under that billing account and in that workspace
Sehyun Oh (14:54:49): > @Martin Morgan I’m not sure about the second point. I’m getting the md5sum report whenever I use avtable and avtables… :thinking_face:
Sehyun Oh (14:54:56): > I updated the other two.
Martin Morgan (15:15:12) (in thread): > hmm, I have > > > packageVersion("AnVIL") > [1] '1.1.13' >
> ?
Sehyun Oh (15:20:10) (in thread): > ‘1.1.12’
Sehyun Oh (15:21:08): > https://community-bioc.slack.com/archives/CEW1G98H1/p1591724735249100?thread_ts=1591715222.248200&cid=CEW1G98H1 - Attachment: Attachment > Ah… it seems like avtable_import doesn’t allow any redundant value for the first column. But by definition set entities have redundant values in the first, entity column. @Martin Morgan
Sehyun Oh (15:21:45): > Can avtable_import take a nested data table for a set entity?
Sehyun Oh (15:22:48) (in thread): > I’ve been using the us.gcr.io/broad-dsp-gcr-public/terra-jupyter-bioconductor:0.0.14 environment without reinstalling the AnVIL package…
Martin Morgan (15:32:30) (in thread): > Yeah; I guess restarting the kernel (one of those jupyter notebook commands) and then BiocManager::install("Bioconductor/AnVIL") would get you the updated package
Frederick Tan (16:59:06): > @Martin Morgan Two follow-up questions on compute-intensive operations on AnVIL > * Would the Kubernetes solution be compatible with packages using BiocParallel etc? > * Planned support for GPUs?
Martin Morgan (17:00:45): > Yes, one option for Kubernetes support uses BiocParallel. I’m not sure that GPU is on our radar, really, partly because historically there hasn’t been a standard GPU to target; maybe that’s solved in the same way that docker images allow binary package installation…
2020-06-10
Frederick Tan (11:48:44): > @Vince Carey Looks like your time slot has been trimmed to 2-2:25pm: tiny.cc/2020magicjam-agenda
2020-06-15
JiefeiWang (07:48:20): > @JiefeiWang has joined the channel
2020-06-22
Lori Shepherd (09:06:02): > <!channel> Here is the draft agenda for our meeting tomorrow. As you will see, it is focused on reviewing our Y2 Q1/Q2 promised deliverables to see if they are still active and should move forward to Y2 Q3/Q4, as well as defining new Y2 Q3/Q4 deliverables to share with NHGRI – We are also starting to make a Year 3 plan. Feel free to jot down notes and we can flesh ideas out tomorrow. I also encourage everyone to look at the GitHub projects board to make sure the Y2 Q2 board is updated – Q2 ends in June; July-Sept is Q3. 6-23-20 Agenda
Frederick Tan (09:21:19) (in thread): > @Lori Shepherd Should there be an explicit agenda item for the BCC2020 workshop or is the card on the GitHub Projects board sufficient?
Lori Shepherd (09:23:12) (in thread): > I’ll add an agenda item for brief discussion – since it was addressed in the outreach call this week – thank you
Martin Morgan (19:14:32): > re: binary builds – see https://github.com/Bioconductor/BiocManager/issues/71#issuecomment-647802622 and ‘we are working on supporting a Bioconductor mirror within RSPM’, which I think is a route toward ‘someone else’ maintaining a bioc binary repository…
2020-06-23
Vince Carey (08:47:16): > Interesting. What’s your take on the absence of macos support?
Vince Carey (08:47:35): > Maybe we don’t care … for AnVIL.
Vince Carey (08:49:28): > Will Rstudio match the system runtime dependencies that we use on AnVIL?
Martin Morgan (08:58:35): > I think they build for the rocker image that we use; I think we’ll have a conversation, though…
Lori Shepherd (09:58:17): > https://bluejeans.com/480153337?src=calendarLink if anyone needs the link for today’s meeting
Sehyun Oh (10:00:57): > The AnVIL/Terra workshop for BioC2020 is assigned to 9-10am on Thursday. Would it work for anyone who wants to be a part? Based on the current rough version (http://waldronlab.io/AnVILWorkshop/index.html), I plan to introduce AnVIL/Terra, use cases (classroom, CNV analysis), and the AnVIL package. I’m thinking we can potentially add additional parts: using Terra outside of the platform (@BJ Stubbs), runtime environment (@Nitesh Turaga), AnVIL package (@Martin Morgan), other analysis example (@Vince Carey). Let me know if you are interested in any part. Also, any suggestions are welcome! - Attachment (waldronlab.io): AnVILWorkshop2020 > We introduce cloud-based genomics platofrm, Terra, and Bioconductor package AnVIL for R-friendly usage of Terra.
Frederick Tan (11:12:52): > Any plans to advertise activities on anvilproject.org? e.g. > * Tools – Docker image, AnVIL, AnVILBilling > * Training – Terra in the Classroom, Sehyun’s workspace, BJ’s and Vince’s Shiny demos
Sehyun Oh (12:11:13): > @Martin Morgan Here is the reference page on the Terra data model: https://support.terra.bio/hc/en-us/articles/360025758392 - Attachment (Terra Support): Managing data with tables > Workspace data tables (in the Data tab) are a convenient way to reference and organize attributes from different sources, including output files from previous analysis. You can use data tables to s…
Sehyun Oh (12:12:56): > “4. File format (sets)” describes the ‘redundancy’ of the first column of set entities that I mentioned. It seems like avtable* doesn’t know this yet. (Note that multiple rows in a set table may have the same set entity id (e.g. TCGA_COAD).)
Sehyun Oh (12:14:35): > “Create a table (tsv file) from scratch” under “6. How to add rows or columns (using tsv files)” explains the format of the first column header I mentioned. (The first column header has to have the format entity:your_entity_id. Note that all headers must end with “_id”.)
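The header rule can be checked locally before an upload is attempted. A minimal Python sketch, assuming only the two prefixes discussed in this thread (entity: and membership:) plus the “_id” suffix; Terra may accept other forms, and `parse_first_header` is a hypothetical helper:

```python
import re

# Assumption: only the 'entity' and 'membership' prefixes discussed in this
# thread are accepted, and the type name must be followed by '_id'.
FIRST_HEADER = re.compile(r"(entity|membership):(\w+)_id")


def parse_first_header(header):
    """Return (tsv_type, entity_type) or raise ValueError (hypothetical helper)."""
    match = FIRST_HEADER.fullmatch(header)
    if match is None:
        raise ValueError(
            f"invalid first column header {header!r}; "
            "expected something like entity:sample_id"
        )
    return match.group(1), match.group(2)


print(parse_first_header("membership:sample_set_id"))  # ('membership', 'sample_set')
```

Using fullmatch() rather than match() keeps a trailing newline or extra text in the header from slipping through.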
Sehyun Oh (12:18:23): > For more information on entity types:https://support.terra.bio/hc/en-us/articles/360033913771-Understanding-Entity-Types - Attachment (Terra Support): Understanding Entity Types > Workflows on Terra handle inputs in one of five ways, based on the category the samples - or “entities” - fall in to. This article will help you understand some of the technical details s…
Sehyun Oh (12:20:35): > In this article, the order of uploading entity files is also explained. (When working with nested data, you sometimes have to upload entity files in a specific order.)
Sehyun Oh (12:20:43): > The order is as follows (“A > B” means entity type A must be uploaded before entity type B): > * participants > samples > * samples > pairs > * participants > participant sets > * samples > sample sets > * pairs > pair sets > * set membership > set entity (e.g. participants > samples > sample set membership > sample set entity.)
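The pairwise rules above form a dependency graph, so any topological order of that graph is a valid upload order. A minimal Python sketch using the standard library's graphlib (Python 3.9+); the entity type names are illustrative, and the last rule (set membership before set entity) applies within each set type and is left out here:

```python
from graphlib import TopologicalSorter

# Each pair reads "first must be uploaded before second" (entity type
# names are illustrative).
MUST_PRECEDE = [
    ("participant", "sample"),
    ("sample", "pair"),
    ("participant", "participant_set"),
    ("sample", "sample_set"),
    ("pair", "pair_set"),
]


def upload_order(pairs):
    """Return one entity-type order that respects every (before, after) pair."""
    sorter = TopologicalSorter()
    for before, after in pairs:
        sorter.add(after, before)  # 'after' depends on 'before'
    return list(sorter.static_order())  # raises graphlib.CycleError on a cycle


print(upload_order(MUST_PRECEDE))
```

Several orders are valid (for example, participant_set can go anywhere after participant), so the printed order is one choice among them.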
Sehyun Oh (12:23:43): > And what this ‘linking’ looks like in the platform:
Sehyun Oh (12:23:47): - File (PNG): Screen Shot 2020-06-23 at 12.22.23 PM.png
2020-06-25
BJ Stubbs (13:03:07): > @Martin Morgan I am having trouble running avtable_import: Error in Terra()$flexibleImportEntities(namespace, name, entities) : > unused argument (entities) > Sorry for the trouble. I think I know how we can create sets now, but I am having trouble uploading the new table.
Martin Morgan (13:29:42) (in thread): > yes I noticed that yesterday; I’ll investigate
2020-06-28
BJ Stubbs (22:18:08): > how about 5 min for the workshop on billing? There is a demo dataset in the package and I can walk people through it
2020-06-29
Sean Davis (09:36:19) (in thread): > In our experience with the cancer genomics clouds, getting people an account DURING the workshops was one of the key effectors of later use.
2020-07-02
Vince Carey (08:14:25): > I just noticed this comment on billing in the workshop. I am back in the instrumentation business and am working with the AnVILBilling package. It’s very useful, but there’s a lot of intrinsic complexity in a) the identities and processes required to get resource usage data shipped to BigQuery and b) the steps needed to filter the data down to i) measurement of resources used, ii) costs incurred, iii) assessment of tradeoffs between programming approaches, e.g., economizing on cores or RAM or i/o …. I have started an instrumentation channel on the AnVIL slack and posed a couple of questions to leading developers there on some technical points. For the user/bioconductor stack, I feel we need some high-level concept of “event” or “task”, easily comprehended at the user level, to which we can attach usage information for convenient interpretation. Right now the queries are mainly time-based, but multiple things can be going on in any given time slice, and it isn’t always clear how to link time-slice data to a specific substantively defined process. Workflow IDs exist but are unwieldy UUIDs. This is a conceptual thicket and the conversations will not be brief. I am just laying out some of the issues here.
2020-07-04
Umar Ahmad (08:21:49): > @Umar Ahmad has joined the channel
2020-07-06
Lori Shepherd (11:51:52): > Link to tomorrow’s agenda – we will be reviewing/populating the Y2 Q3 board – https://docs.google.com/document/d/19ttiVVTtrQJ1d1q534LtFZs5_G9E86A-n7pV3Tk2pL8/edit?usp=sharing
BJ Stubbs (20:12:02): > I am getting a warning about a checksum of the anvil api. Did something change?
BJ Stubbs (20:12:05): > > terra=Terra() > Warning message: > In .service_validate_md5sum(api_reference_url, api_reference_md5sum) : > service version differs from validated version > service url: https://api.firecloud.org/api-docs.yaml > observed md5sum: ebaeb7e317b1702763a1961242d0666f > expected md5sum: 1b1fe131446f829cc81359d0026279f9
2020-07-07
Martin Morgan (06:03:32): > I tried to introduce this as a way of tracking versions, but actually the md5sums change all the time, for both large and small changes. Unfortunately the yaml does not use the ‘version’ field to indicate changes. I have mentioned this previously on slack (and have now opened an issue for leonardo, https://github.com/DataBiosphere/leonardo/issues/1495). > > FWIW I have a fix to the avtables_import() problem, and will try to push that out this morning
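The warning above appears to come from hashing the downloaded API description and comparing it with a recorded value, which is why any edit to the YAML, large or small, trips it even when the ‘version’ field does not change. A minimal Python sketch of that idea; it is not the AnVIL package’s .service_validate_md5sum() code, and `check_md5` and the example URL are hypothetical:

```python
import hashlib
import warnings


def check_md5(content: bytes, expected_md5: str, url: str) -> bool:
    """Warn and return False when the content's md5 differs from the recorded one."""
    observed = hashlib.md5(content).hexdigest()
    if observed != expected_md5:
        warnings.warn(
            "service version differs from validated version\n"
            f"service url: {url}\n"
            f"observed md5sum: {observed}\n"
            f"expected md5sum: {expected_md5}"
        )
        return False
    return True


spec = b"openapi: 3.0.0\ninfo:\n  version: '0.1'\n"
recorded = hashlib.md5(spec).hexdigest()
# Appending a comment changes the hash even though 'version' is untouched.
edited = spec + b"# trivial edit\n"
print(check_md5(spec, recorded, "https://example.org/api-docs.yaml"))    # True
print(check_md5(edited, recorded, "https://example.org/api-docs.yaml"))  # False, with a warning
```

Comparing a declared version string instead would only be reliable if the service bumped it on every change, which, per the message above, the yaml does not do.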
Sehyun Oh (09:33:27): > FYI, I have a doctor’s appointment and can’t make the meeting today. I’ll follow up on the billing part for the BioC workshop through the meeting notes.
Martin Morgan (09:55:35): > BiocManager::install("Bioconductor/AnVIL") should get a version of avtables_import() that works; there are still complaints about out-of-date md5sums, though. @BJ Stubbs
Lori Shepherd (10:02:15): > Link to meeting: https://bluejeans.com/480153337/webrtc
Sehyun Oh (12:26:07): > Btw, due to fraudulent activity, the free credit ($300) program for Terra has been suspended since July 2nd until further notice. Is there any education billing account we can use for the workshop?
Lori Shepherd (12:31:24): > We were planning on trying to figure that portion out in the outreach meetings this week but have to follow up since this week’s outreach conflicts with ISMB
Lori Shepherd (12:31:31): > I’ll follow up and get back to you
Levi Waldron (15:49:19): > Speaking of workshops - this year’s format of one Docker container per workshop could be amenable to providing all workshops as AnVIL workspaces. What do you think? Is AnVIL ready?
Nitesh Turaga (16:36:37): > It might need some tweaking as far as the config file (rserver.conf) for RStudio goes; my memory isn’t fresh, but theoretically yes, since they are derived from the bioconductor_docker image.
Martin Morgan (17:14:02): > FWIW, from Enis of Galaxy: “From the UI perspective, I definitely don’t think Galaxy (or other apps) need to be embedded into the Terra UI. Instead, apps should be accessed directly from the anvilproject.org for example and then the app allows AnVIL data to be browsed from within it.” Hmmm
Vince Carey (17:28:05): > Interesting perspective. Isn’t this (Enis’) concept compatible with that of the AnVIL package, with which we can use R however we like and access data and tools provided “in AnVIL” as we like/are authorized to?
Martin Morgan (17:42:42): > Yes, maybe just a bit more ‘bold’; I’d thought of using AnVIL from outside as something that was fun and a little ‘cheeky’, but maybe it would be better to explore this more directly. Already using it locally is pretty easy; I guess using it in the google cloud is also easy enough, and gets the benefit of cloud computing…
2020-07-08
Sean Davis (08:34:22): > Substitute any container you like in here for a Bioconductor “tool” that can access AnVIL using the AnVIL package, running on Google Cloud. Just to be concrete: https://gist.github.com/seandavi/5da4a73d94bc24236cf204196feddc85
Sean Davis (08:35:06): > Run a Bioconductor Workshop or Docker container on a Google Cloud Instance in two lines….
Martin Morgan (09:56:26): > cool! would be great to make the README.md commands cut & paste-able with line continuation \ in the code chunks. I guess bioconductor_docker doesn’t come with gsutil (@Nitesh Turaga?), so it is not quite AnVIL-ready (the AnVIL package uses gsutil when not in the AnVIL for authentication, I think, for gcloud auth ... print-access token; this could / should be changed, perhaps…)
Sean Davis (10:34:52): > Fixed code chunks – thanks @Martin Morgan. As for authentication, one can add an authentication scope to the gcloud command in the gist to automatically give the image user access to anything he/she can access in AnVIL. We’d have to work out those details, but I suspect the Terra folks could help with that small piece.
Nitesh Turaga (10:59:44) (in thread): > Yes, it does not have the gcloud SDK, although it’s easily installable; installing it would make the image AnVIL-ready, since it could then authenticate. The other thing is that the port where RStudio is served needs to be changed to ENV RSTUDIO_PORT 8001, as in the anvil-rstudio-bioconductor image. Right now, the image launches on 8787, which is a closed port on Terra.
Martin Morgan (12:10:41) (in thread): > I think the idea is that for Sean the container is not ‘in’ Terra, so we don’t have to live by their rules… but we can still do lots of things via the AnVIL package.
Nitesh Turaga (12:13:46) (in thread): > I see…let me re-read it and get back on this issue. Maybe I did not understand what we need to do here properly.
2020-07-14
Levi Waldron (16:32:01): > That’s great, @Sean Davis, although I think it will be a big ask for the more beginner #bioc2020 attendees. We need options, because the previous years’ one-AMI-for-everything approach won’t work. We can have more than one option depending on the attendee’s interest & abilities. Is there enthusiasm for providing a Terra option, and not just for the AnVIL workshop? I think it would look like one workspace per workshop, with instructions to open it in RStudio with that workshop’s custom Docker image (https://bioc2020.bioconductor.org/workshops). It would likely require help from @Nitesh Turaga @Sehyun Oh @BJ Stubbs for the real work of implementing. @Vince Carey @Martin Morgan? - Attachment (bioc2020.bioconductor.org): BioC 2020 > Where Software and Biology Connect. July 27 - 31, Boston, USA.
Martin Morgan (16:50:21): > this seems like a big lift at the last minute… Am I understanding that the workshops are in dockerhub, etc, but there is no way for the (naive) user to run these? Do you know that the dockerhub images can be accessed from within Terra?
Sehyun Oh (17:57:45): > It doesn’t seem so? Terra requires a custom image extended from one of the Terra base images…
Sehyun Oh (17:57:47): >
```json
{
  "causes": [],
  "exceptionClass": "org.broadinstitute.dsde.workbench.leonardo.http.service.InvalidImage",
  "message": "TraceId(1d191c6a526cf9f16d1b656be22bc885) | Image shbrief/anvilworkshop doesn't have JUPYTER_HOME or RSTUDIO_HOME environment variables defined. Make sure your custom image extends from one of the Terra base images.",
  "source": "leonardo",
  "stackTrace": [],
  "statusCode": 404
}
```
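Judging only from the error message above, Leonardo rejects an image whose environment defines neither `JUPYTER_HOME` nor `RSTUDIO_HOME`. Below is a minimal R sketch of a pre-flight check. It assumes the image environment is available as `KEY=VALUE` strings, for example copied from the `Config.Env` field of `docker inspect`. The helper `has_terra_home()` and the example values are hypothetical, and Terra may apply further checks that this does not capture.

```r
# Hypothetical pre-flight check: does an image environment define
# JUPYTER_HOME or RSTUDIO_HOME, as the Leonardo error message requires?
# `env` is a character vector of "KEY=VALUE" strings.
has_terra_home <- function(env) {
    keys <- sub("=.*$", "", env)   # keep only the variable names
    any(c("JUPYTER_HOME", "RSTUDIO_HOME") %in% keys)
}

# Illustrative environments (values are made up)
plain_env <- c("PATH=/usr/local/bin:/usr/bin", "LANG=en_US.UTF-8")
terra_env <- c(plain_env, "RSTUDIO_HOME=/etc/rstudio")
has_terra_home(plain_env)
has_terra_home(terra_env)
```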
Vince Carey (20:22:09): > Can’t we proceed partially as in the past? If I recall correctly, we would pay for an AMI for each participant for the duration of the conference, and it would be endowed with all necessary resources. Here we are using GCP and the resources are disaggregated. We still buy an instance for each participant and identify it to them. But the instance just has Docker on it. We also create a package that will be usable by all, and that handles workshop container acquisition and RStudio startup through a function call. The parameters are the instance id and the workshop name, which is mapped to the required container name.
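Vince's proposed package function might look roughly like the sketch below. It takes an instance id and a workshop name, maps the workshop name to a container, and builds the command that starts RStudio on the participant's instance. Several parts are hypothetical placeholders rather than anything from the discussion: the lookup table, the image names, the use of `gcloud compute ssh --command`, and the rocker-style `PASSWORD` variable. Zone and project flags are also omitted. The function only builds the command string, so it can be inspected before anything is run.

```r
# Hypothetical lookup table: workshop name -> container image.
# The image names are placeholders, not real published images.
workshop_images <- c(
    intro_terra = "example/bioc2020-intro-terra:latest",
    anvil       = "example/bioc2020-anvil:latest"
)

# Build (but do not run) the command that would start a workshop's
# RStudio container on a participant's GCE instance.
workshop_command <- function(instance_id, workshop, password) {
    image <- unname(workshop_images[workshop])
    if (is.na(image))
        stop("unknown workshop: ", workshop)
    # rocker-based RStudio images read the login password from PASSWORD
    docker <- sprintf(
        "docker run -d -p 8787:8787 -e PASSWORD=%s %s", password, image
    )
    # gcloud compute ssh can run a single command on the remote instance
    sprintf("gcloud compute ssh %s --command '%s'", instance_id, docker)
}

cmd <- workshop_command("participant-01", "anvil", "s3cret")
cmd
# system(cmd)  # would actually start the container
```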
Sean Davis (20:43:43): > I’ll put something together.
2020-07-15
Levi Waldron (11:23:23) (in thread): > Want to build this workshop on one of the Terra base images? Of all workshops, this one should be available through AnVIL…
Sehyun Oh (12:04:28) (in thread): > I can check it. But our Docker image will essentially be the base image plus the AnVIL package, FYI.
2020-07-17
Vince Carey (15:32:10): > Just checking here…. Anything I can do to help?
2020-07-20
Dr Awala Fortune O. (02:32:48): > @Dr Awala Fortune O. has joined the channel
Dr Awala Fortune O. (02:33:32): > Hello everyone
Lori Shepherd (11:57:19): > @channel Reminder of our bi-weekly call tomorrow. Please remember the new meeting time of 11 am EST. Here is a link to the agenda: https://docs.google.com/document/d/14Q5Ir7QklxUB01zdEcCup-hS8LLyEeWnxJQFPr7DoLQ/edit?usp=sharing
Frederick Tan (12:01:54) (in thread): > Thanks for the reminder! Is the 11am EST for all meetings going forward?
Lori Shepherd (13:28:53) (in thread): > yes
BJ Stubbs (15:42:37): > FYI, the Swagger page for Leonardo is very different now: https://notebooks.firecloud.org/ It looks like Docker-based runtime creation is deprecated and VM instance creation replaced it?
BJ Stubbs (15:43:33): > Maybe you can still do docker stuff, but it is not clear how
Frederick Tan (16:19:46) (in thread): > Thanks!
2020-07-21
Lori Shepherd (11:03:19): > meeting starting now.
Marcel Ramos Pérez (11:06:16): > set the channel topic: https://bluejeans.com/480153337
Marcel Ramos Pérez (12:04:58): > This may present a hurdle in getting both pkg installation methods to work, @Martin Morgan: https://github.com/wch/r-source/blob/759437b8c8e576174bc12d0d2a2139a6d76068d5/src/library/utils/R/packages2.R#L394
Martin Morgan (12:07:49) (in thread): > The RStudio Package Manager and our own approach both ‘advertise’ the binaries as ‘source’ when deciding which packages to download, but install them differently depending on whether they really are source or binary… Also, R chooses the most recent version of a source package if it’s in two different repositories, so x.y.z in our binary-repository-advertised-as-source would be masked by x.y.z+1 in CRAN source.
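The masking Martin describes can be shown with a toy version of the selection rule. When several repositories offer the same package, the highest version wins. As we understand R's duplicate filter, a tie goes to the repository listed first. This is a simplified re-implementation for illustration only, not the code in `available.packages()`, and the repository labels and versions are made up.

```r
# Toy package index: the same package offered by two repositories.
# Rows are in the order the repositories appear in `repos`.
index <- data.frame(
    Package    = c("x", "x"),
    Version    = c("1.2.3", "1.2.4"),
    Repository = c("binary-advertised-as-source", "CRAN-source"),
    stringsAsFactors = FALSE
)

# For each package keep the row with the highest version;
# on ties keep the first such row (the earlier repository).
pick_latest <- function(index) {
    rows <- split(seq_len(nrow(index)), index$Package)
    keep <- vapply(rows, function(i) {
        v <- package_version(index$Version[i])
        i[which(v == max(v))[1]]
    }, integer(1))
    index[keep, , drop = FALSE]
}

# The CRAN source 1.2.4 masks the binary 1.2.3
pick_latest(index)$Repository
```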
Martin Morgan (13:19:36) (in thread): > @BJ Stubbs is this endpoint https://notebooks.firecloud.org/#/runtimes/createRuntime now the way to go? Specify the Docker image as ‘name’, e.g., us.gcr.io/anvil-gcr-public/anvil-rstudio-bioconductor:0.0.5? Have not tested this…
Martin Morgan (16:28:17): > sorry if I stepped in front / too quickly through your cool developments, BJ. Was wondering if there were GraphQL packages in R; but the two I saw seemed mostly ‘write GraphQL as text, then submit’, which doesn’t seem so helpful…
Vince Carey (17:06:15): > we should work this up for the next bioc-oriented meeting.
2020-07-22
BJ Stubbs (11:06:17): > No problem at all, Martin. I am not aware of a GraphQL package. I was wondering if we could make an object representation of the main nodes (subjects, samples, sequences, etc.) and provide accessor functions for the data in them. Then I think we could create a delayed matrix implementation that would allow access to all of the data via workflows or RStudio without having to copy it to a workspace.
Martin Morgan (11:41:41): > yes; is the data structured enough across sets to know that subjects / samples / sequences / etc will be re-usable? probably a good starting assumption anyway…
2020-07-23
Frederick Tan (15:46:50): > Is it possible to start up and access RStudio outside of the Terra GUI? I’m wondering what the API allows, and have been inspired by the ease of mybinder.org (perhaps a @BJ Stubbs question?)
2020-07-24
BJ Stubbs (12:11:28): > @Frederick Tan You used to be able to, but I am not sure if you can anymore. This code used to work:
BJ Stubbs (12:11:44): >
```r
# function to create a cluster via the Leonardo (Swagger) API with httr
createCluster <- function(googleProject, clusterName, rstudioDockerImage) {
    # get an access token from the gcloud SDK
    cmd <- "gcloud auth print-access-token"
    token <- system(cmd, intern = TRUE)
    url <- paste0(
        "https://notebooks.firecloud.org/api/cluster/v2/",
        googleProject, "/", clusterName
    )
    httr::PUT(
        url = url,
        body = list(toolDockerImage = rstudioDockerImage),
        encode = "json",
        httr::add_headers(Authorization = paste("Bearer", token))
    )
}
```
BJ Stubbs (12:12:05): > And it would start a cluster at an address like https://notebooks.firecloud.org/proxy/BILLINGPROJECT/CLUSTERNAME/rstudio/
BJ Stubbs (12:12:47): > But the cluster API endpoints at https://notebooks.firecloud.org are deprecated now
BJ Stubbs (12:13:07): > There may be a way to use the runtime endpoints to do this, but it is not obvious to me how to do so
Frederick Tan (12:14:58): > Seems like a possible way (or at least used to be a way) to reclaim all that chrome and make it easier to start up RStudio
BJ Stubbs (12:16:08): > It had an issue with timing out, though: you needed to hit a proxy API endpoint occasionally to stay connected. The cluster would stay up, but you would get “unauthorized” errors.
Frederick Tan (12:16:41): > Ahh … ok … too bad
BJ Stubbs (12:17:12): > I know what to ask on the other slack in the leo channel. I will post there
Frederick Tan (12:17:49): > Again, I have been liking the mybinder.org model from a training perspective … note that I haven’t thought through all the security, maintenance, etc. implications
BJ Stubbs (12:25:48): > It is pretty easy to modify the AnVIL Docker images if you wanted to add a collection of notebooks to your image. I think authentication or data egress to a third-party app like that might be problematic.
BJ Stubbs (12:27:00): > Rob Title [12:23 PM]: Yeah, we recommend using the “runtimes” APIs to create both Dataproc and GCE VMs.
Rob Title [12:23 PM]: The `cloudService` enum in `runtimeConfig` can be either “dataproc” or “gce”.
BJ Stubbs [12:24 PM]: @rtitle do you use the same “toolDockerImage” parameter for the Dataproc version?
Rob Title [12:24 PM]: yes
BJ Stubbs (12:27:11): > Looks like a small modification to the code above might work.
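A hedged sketch of what the "small modification" might involve: build a request for the runtimes API instead of the deprecated cluster API. The body follows Rob Title's description, with a `runtimeConfig` whose `cloudService` is "gce" or "dataproc" (lowercase, as he wrote it) plus `toolDockerImage`. Several details are assumptions and should be checked against the Leonardo Swagger page: the endpoint path `/api/google/v1/runtimes/{project}/{name}`, the use of POST, and the exact enum casing. The helper only assembles the request, so it can be inspected without credentials.

```r
# Hypothetical helper: assemble a Leonardo "runtimes" request.
# ASSUMPTION: endpoint path, HTTP verb and body layout are unverified.
runtime_request <- function(googleProject, runtimeName, dockerImage,
                            cloudService = c("gce", "dataproc")) {
    cloudService <- match.arg(cloudService)
    list(
        url = paste0(
            "https://notebooks.firecloud.org/api/google/v1/runtimes/",
            googleProject, "/", runtimeName
        ),
        body = list(
            runtimeConfig = list(cloudService = cloudService),
            toolDockerImage = dockerImage
        )
    )
}

req <- runtime_request(
    "my-billing-project", "my-runtime",
    "us.gcr.io/anvil-gcr-public/anvil-rstudio-bioconductor:0.0.5"
)
str(req)

# Sending it would follow the createCluster() pattern above, e.g.
# token <- system("gcloud auth print-access-token", intern = TRUE)
# httr::POST(req$url, body = req$body, encode = "json",
#            httr::add_headers(Authorization = paste("Bearer", token)))
```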
2020-07-31
bogdan tanasa (13:54:10): > @bogdan tanasa has joined the channel
Erick Cuevas (14:12:41): > @Erick Cuevas has joined the channel
2020-08-03
Hena Ramay (01:44:47): > @Hena Ramay has joined the channel
Lori Shepherd (16:27:20): > Agenda link for tomorrow’s bi-weekly AnVIL meeting at 11 am EST
2020-08-04
Lori Shepherd (11:02:08): > https://bluejeans.com/480153337/webrtc?src=calendarLink meeting starting now
Lori Shepherd (12:10:24): > https://docs.google.com/presentation/d/1e6Uwtj0AhJFpTHlCwktqPwut750DrcDgnH_G9xpmCts/edit?usp=sharing Started a slide deck in Google Slides for next week
Nitesh Turaga (13:25:33): > If anyone wants to test my Azure script, it’s available as a gist: https://gist.github.com/nturaga/0c4335ddc0559287e2d104e4c71ae64e. It has minor modifications and personalizations compared to the one on bioconductor.org/help/docker
2020-08-05
Hans-Rudolf Hotz (03:22:33): > @Hans-Rudolf Hotz has joined the channel
Vince Carey (07:19:06): > I am embarking on building a binary repo in Google Cloud Storage for Bioc 3.12 … if this is redundant with something that could be used from the RStudio repo, please let me know. I am building off the 0.0.6 AnVIL rstudio-bioc container, and I will need to set the version to 3.12 via BiocManager, reinstalling 100 packages that have 3.11-based images.
Vince Carey (07:24:39): > The purpose of this is to build oscabook in AnVIL
Frederick Tan (08:35:31): > How does it handle the data?
Vince Carey (09:39:41): > AFAIK the data are in the ExperimentHub cache. Ideally this cache would be globally accessible and maintained to stay current. Otherwise it gets populated in each build – which I think implies transfers from AWS storage (where *Hub basic resources live) to GCP. It need not be built often but it is still worth looking at opportunities to reduce data transfers.
Frederick Tan (09:43:32): > What would that translate into with regards to “transfer times” to start analyzing the data? I’m asking this with an eye towards potential workshop content.
Vince Carey (11:29:03): > I think we can get those transfer times cut down as a one-off … I should have a fair amount of this done today so we can look at actual approaches.
Vince Carey (11:30:05): > Here are some of the packages in the manifest that don’t install with the 0.0.6 container:
‘Rmpi’, ‘xps’, ‘Polyfit’, ‘ffbase’, ‘explorase’, ‘EasyqpcR’, ‘sapFinder’, ‘matter’, ‘GeneExpressionSignature’, ‘DBChIP’, ‘omicade4’, ‘shinyTANDEM’, ‘Cardinal’, ‘gpuMagic’, ‘RchyOptimyx’, ‘sparsenetgls’, ‘MSGFplus’, ‘nethet’, ‘SeqGSEA’, ‘MSGFgui’, ‘MetCirc’, ‘Doscheda’, ‘oligoClasses’, ‘RNAither’, ‘tRanslatome’, ‘EDDA’, ‘deco’, ‘cmapR’, ‘oligo’, ‘cellHTS2’, ‘crlmm’, ‘mBPCR’, ‘rnaSeqMap’, ‘HTSFilter’, ‘TCC’, ‘CNVPanelizer’, ‘DEsubs’, ‘ccfindR’, ‘scAlign’, ‘scGPS’, ‘pd.mapping50k.xba240’, ‘pdInfoBuilder’, ‘puma’, ‘ArrayExpress’, ‘frma’, ‘imageHTS’, ‘ADaCGH2’, ‘RNAinteract’, ‘cn.farms’, ‘ArrayExpressHTS’, ‘nucleR’, ‘staRank’, ‘SCAN.UPC’, ‘gespeR’, ‘miRLAB’, ‘crossmeta’, ‘coseq’, ‘mimager’, ‘omicRexposome
The full list of fails will be developed later.
Sean Davis (11:39:09) (in thread): > Working to reduce transfer times from commercial cloud resources is often a premature optimization. For the Bioc workshop, folks worked with the TENxBrainData (largish data), and despite dozens of folks accessing it simultaneously, that part of the workshop was run interactively on GCP with data coming from AWS.
Frederick Tan (11:43:50) (in thread): > Makes sense. Out of curiosity, approximately how long did it take to transfer?
Vince Carey (11:44:42): > I just got a status code 503 in my rstudio session
Nitesh Turaga (11:45:00): > on the AnVIL instance?
Martin Morgan (11:45:02): > Useful to filter against failures in the build report, accessible via `BiocPkgTools::biocBuildReport()`
Nitesh Turaga (11:47:15): > This is something I use when I remove failed packages on build reports…
```r
library(dplyr)  # provides %>%, filter() and pull()
rpt = BiocPkgTools::biocBuildReport()
build_rpt_failed_pkgs = rpt %>%
    filter(node == "malbec1",
           stage == "install",
           result == "ERROR") %>%
    pull(pkg)
```
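The same filtering step can also be written in base R and then applied to the install manifest. This minimal sketch assumes the build report has the columns `node`, `stage`, `result` and `pkg`, as in Nitesh's snippet. The toy report and manifest are made up so the example runs offline; in practice `rpt` would come from `BiocPkgTools::biocBuildReport()`.

```r
# Toy stand-in for BiocPkgTools::biocBuildReport() output
rpt <- data.frame(
    pkg    = c("Rmpi", "oligo", "limma", "oligo"),
    node   = c("malbec1", "malbec1", "malbec1", "machv2"),
    stage  = c("install", "install", "install", "install"),
    result = c("ERROR", "ERROR", "OK", "OK"),
    stringsAsFactors = FALSE
)
manifest <- c("limma", "oligo", "Rmpi", "edgeR")

# Packages that failed to install on the Linux builder
failed <- unique(rpt$pkg[
    rpt$node == "malbec1" & rpt$stage == "install" & rpt$result == "ERROR"
])

# Drop them from the manifest before attempting installation
to_install <- setdiff(manifest, failed)
to_install
```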
Vince Carey (11:47:30): > thanks
Sean Davis (12:27:39) (in thread): > You can try it yourself. Spin up any workshop here: http://workshop.bioc.cancerdatasci.org/
```
> library(ExperimentHub)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply,
    parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    anyDuplicated, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval, evalq,
    Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax,
    pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit, which.max, which.min

Loading required package: AnnotationHub
Loading required package: BiocFileCache
Loading required package: dbplyr
> hub <- ExperimentHub()
/home/rstudio/.cache/ExperimentHub
  does not exist, create directory? (yes/no): yes
  |======================================================================| 100%

snapshotDate(): 2020-08-05
> # This dataset is available in two formats: a 'dense matrix' format and a
> # 'HDF5-based 10X Genomics' format. We'll use the 'dense matrix' version for
> # this workshop.
> query(hub, "TENxBrainData")
ExperimentHub with 8 records
# snapshotDate(): 2020-08-05
# $dataprovider: 10X Genomics
# $species: Mus musculus
# $rdataclass: character
# additional mcols(): taxonomyid, genome, description, coordinate_1_based, maintainer, rdatadateadded, preparerclass,
#   tags, rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["EH1039"]]'

           title
  EH1039 | Brain scRNA-seq data, 'HDF5-based 10X Genomics' format
  EH1040 | Brain scRNA-seq data, 'dense matrix' format
  EH1041 | Brain scRNA-seq data, sample (column) annotation
  EH1042 | Brain scRNA-seq data, gene (row) annotation
  EH1689 | Brain scRNA-seq data 20k subset, 'HDF5-based 10x Genomics' format
  EH1690 | Brain scRNA-seq data 20k subset, 'dense matrix' format
  EH1691 | Brain scRNA-seq data 20k subset, sample (column) annotation
  EH1692 | Brain scRNA-seq data 20k subset, gene (row) annotation
> Sys.time()
[1] "2020-08-05 16:22:30 UTC"
> system.time(fname <- hub[["EH1040"]])
see ?TENxBrainData and browseVignettes('TENxBrainData') for documentation
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%

loading from cache
   user  system elapsed
 20.029  10.934  79.307
> file.size(fname)
[1] 5152667614
```
> In short, 5.15 GB in 79 seconds. This is pretty typical of a 1 Gbit/s connection, which is what the node running the workshops uses.
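The throughput in Sean's transcript can be checked from the byte count reported by `file.size(fname)` and the elapsed time reported by `system.time()`:

```r
bytes   <- 5152667614   # file.size(fname) from the transcript
elapsed <- 79.307       # elapsed seconds from system.time()

mb_per_s   <- bytes / elapsed / 1e6      # megabytes per second
mbit_per_s <- bytes * 8 / elapsed / 1e6  # megabits per second
round(c(MB_per_s = mb_per_s, Mbit_per_s = mbit_per_s))
```

That works out to roughly 65 MB/s, or about 520 Mbit/s, which is around half the nominal line rate of a 1 Gbit/s link.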
Frederick Tan (12:46:59) (in thread): > Beautiful, thanks! Can’t wait to use this at AnVIL workshops:slightly_smiling_face:
Sean Davis (13:19:38) (in thread): > Note that Orchestra (what I am calling the workshop platform that runs at workshop.bioc.cancerdatasci.org) is not at all tied to AnVIL.
Vince Carey (14:31:19): > The 3.12 repo should be available now, parallel to 3.11; if someone can bang on it, that would be great.
Nitesh Turaga (14:31:47): > I’ll give it a go
Nitesh Turaga (14:32:14): > What docker image did you use to build it?
Nitesh Turaga (14:32:38): > Or rather, is it usable only through AnVIL?
Vince Carey (16:13:46): > the gcr…0.0.6 for rstudio-bioconductor … if you can pull from gcr it should be usable
Vince Carey (16:44:19): > It isn’t done … there are numerous packages like fs and Rcpp that didn’t get compiled. The package list step probably needs some tweaking.
Nitesh Turaga (16:45:20): > The image is still on R 4.0.0, maybe it’s worth getting it to R 4.0.2 and Ubuntu 20.04 like the bioconductor_docker images …
Vince Carey (16:46:27): > Is that risk-free? I wanted to stick with whatever the 0.0.6 Dockerfile prescribed
Nitesh Turaga (16:50:09): > Considering the only images we depend on are `rocker/tidyverse` -> `anvil-rstudio-base` -> `anvil-rstudio-bioconductor`, I would say it “might” be risk-free. It would be worth evaluating that a little more. Let me get back to you on that.
Vince Carey (16:54:12): > I need to combine the site-library and packages content to get a complete repo. I am doing it now.
Nitesh Turaga (17:10:54): > So, right now, when I try `BiocManager::valid()` it gives me a vector of outdated packages…. I think that’s a result of the packages being installed with R 4.0.0 but with Bioc 3.12….
Nitesh Turaga (17:10:59): > unless i’m mistaken
Vince Carey (19:51:58): > Can you give me exact commands? Things are a bit strange but I think most packages are up to date.
2020-08-06
Nitesh Turaga (08:35:09): > @Vince Carey, I’m trying to reproduce your workflow as given on the AnVIL_Admin page:
1. Install AnVIL (do with Ncpus > 1)
2. Allow updates
3. NOT YET: set `options(repos=AnVIL::repositories())` to get fast install of CRAN packages
Nitesh Turaga (08:35:29): > https://bioconductor.github.io/AnVIL_Admin/ - Attachment (bioconductor.github.io): Bioconductor / AnVIL > Administration repository for AnVIL project
Nitesh Turaga (08:35:42): > But it seems like there is some confusion on step 3….
Vince Carey (08:35:49): > It needs to be corrected but I am on a call. There are minor typos…
Nitesh Turaga (08:38:13): > Can you tell me what the `NOT YET:` is for?
Vince Carey (08:38:57): > I don’t recall. These steps should work:
```r
BiocManager::install("vjcitn/BiocBBSpack", Ncpus = 10)
library(BiocBBSpack)
# Retrieve manifest from Bioconductor git
pl = get_bioc_packagelist("master")
BiocManager::install(pl, Ncpus = 50)
```
Vince Carey (08:44:21): > That will get you a set of folders under `.libPaths()[1]` with binary package contents. Then a function, `BiocBBSpack:::dotarmv`, is used to tar and compress them into another destination folder. You create PACKAGES.gz in the folder `.libPaths()[1]` and place it in the destination of `dotarmv`.
Vince Carey (08:45:22): > Use `tools::write_PACKAGES(unpacked=TRUE)`
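The tar-and-compress step Vince describes might be sketched as follows. `tar_installed()` is a hypothetical stand-in for `BiocBBSpack:::dotarmv`, whose actual behavior is not shown here. The sketch assumes the source-repository naming convention `<pkg>_<version>.tar.gz` and uses only base R. The demo builds a fake "installed" package in a temporary directory so that it runs anywhere.

```r
# Hypothetical stand-in for BiocBBSpack:::dotarmv: tar and gzip each
# installed package folder in `lib` into `dest` as <pkg>_<version>.tar.gz
tar_installed <- function(lib, dest, pkgs = list.files(lib)) {
    force(pkgs)                          # evaluate the default before setwd()
    dest <- normalizePath(dest, mustWork = TRUE)
    old <- setwd(lib)                    # archive relative paths "pkg/..."
    on.exit(setwd(old))
    vapply(pkgs, function(pkg) {
        version <- read.dcf(file.path(pkg, "DESCRIPTION"),
                            fields = "Version")[1, "Version"]
        tarfile <- file.path(dest, sprintf("%s_%s.tar.gz", pkg, version))
        utils::tar(tarfile, files = pkg, compression = "gzip",
                   tar = "internal")
        tarfile
    }, character(1))
}

# Self-contained demo with a fake installed package
lib  <- file.path(tempdir(), "lib")
dest <- file.path(tempdir(), "repo")
dir.create(file.path(lib, "fakepkg"), recursive = TRUE)
dir.create(dest)
writeLines(c("Package: fakepkg", "Version: 0.1.0"),
           file.path(lib, "fakepkg", "DESCRIPTION"))
tarballs <- tar_installed(lib, dest)
basename(tarballs)
```

As Vince describes, the index would then come from `tools::write_PACKAGES(lib, unpacked = TRUE)`, with the resulting PACKAGES files copied into `dest`.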
Nitesh Turaga (14:08:15): > Any reason we can’t get RStudio community edition to show up like this ?? “Community maintainer images by Verified partners” - File (PNG): Screen Shot 2020-08-06 at 2.07.11 PM.png
Nitesh Turaga (14:08:39): > At least it will keep RStudio on people’s radar….
Nitesh Turaga (14:08:44): > for the short term..
Nitesh Turaga (14:10:47): > I know it says “Jupyter environments”…but…can potentially be expanded right?
BJ Stubbs (15:31:43): > Anyone mind if I add a bit of flair to the slide set?
Vince Carey (17:51:11): > what slide set?
Nitesh Turaga (17:51:42): > https://docs.google.com/presentation/d/1e6Uwtj0AhJFpTHlCwktqPwut750DrcDgnH_G9xpmCts/edit?usp=sharing
BJ Stubbs (18:47:27): > I was invited to be an early user of BioData Catalyst; do we have an interest in that platform?
Vince Carey (19:58:07): > absolutely – key for NHLBI topmed …
Vince Carey (19:59:10): > @channel oscabook builds on AnVIL; note the URL - File (PNG): Screenshot from 2020-08-06 19-57-29.png
Sean Davis (20:00:21): > Nice work!
Vince Carey (20:05:21): > For corporate memory: the Dockerfile for the custom environment used is at github.com/vjcitn/OrchestratingSingleCellAnalysis… The container image is at Docker Hub, vjcitn/oscabkdemo1:0.0.1
Vince Carey (20:09:30): > The machine type for the build had 64 cores, 240 GB RAM and 250 GB disk. I doubt that was necessary. At conclusion of build, book folder consumed 16GB, and caches seem to have consumed about 2GB (EXPERIMENT_HUB_CACHE and ANNOTATION_HUB_CACHE targets)
Vince Carey (20:11:33): > I am going to stop the runtime. I do not know at this time how to persist the built book in this workspace to make a shareable resource. Tomorrow.
2020-08-07
Martin Morgan (07:14:11): > amazing!
Vince Carey (07:39:45): > I am thinking of using AnVIL::delocalize to move the html and image elements to the workspace bucket, and then adding some functions that will drive the session to browse these after localization. I don’t see a setting for WORKSPACE_BUCKET among the available environment variables – is that a flaw in my container construction?
Vince Carey (08:15:49): > i have delocalized the HTML and figure data to workspace bucket
Martin Morgan (08:58:30): > The absence of WORKSPACE_BUCKET in the RStudio images is a problem with Leo: https://github.com/anvilproject/anvil-docker/issues/10. It would be useful to mention this on the AnVIL Slack to renew interest.
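Until the RStudio images define `WORKSPACE_BUCKET`, code that needs the bucket can fall back to a value supplied explicitly. Below is a minimal sketch of such a fallback. `workspace_bucket()` is hypothetical and not part of the AnVIL package; it prefers the environment variable, and the `gs://` prefixing is an assumption about the format callers expect. The bucket name in the example is made up.

```r
# Hypothetical helper: prefer WORKSPACE_BUCKET, else an explicit value
workspace_bucket <- function(bucket = NULL) {
    env <- Sys.getenv("WORKSPACE_BUCKET", unset = "")
    if (nzchar(env))
        return(env)
    if (is.null(bucket))
        stop("WORKSPACE_BUCKET is not set; supply 'bucket' explicitly")
    if (!startsWith(bucket, "gs://"))
        bucket <- paste0("gs://", bucket)   # normalize to a gs:// URL
    bucket
}

Sys.unsetenv("WORKSPACE_BUCKET")
workspace_bucket("fc-example-bucket")   # falls back to the argument
```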
Nitesh Turaga (08:58:57) (in thread): > I can do this today.
Nitesh Turaga (09:26:44): > Their answer seems to be they do not have the bandwidth right now for anything but galaxy.
Vince Carey (09:38:57): > I have various time conflicts but it would be nice to whiteboard how we would like to make this workspace a shareable resource for reading and computing through content of the book. Getting the html/png seems pretty easy. The custom container/repo ensures immediate package availability or quick installation. The Rmds should be placed so that the Rstudio session has convenient access to them in the Rstudio file browser. Could Rstudio’s github interface be used to help with version control issues with those? Finally the cached data and annotation should be properly accessible. Do we want to tackle these as a one-off or should we look at this as a class of problems?
Sean Davis (11:24:25): > Just install the OSCA “package” inside Docker as part of the build process and copy the contents of the directory into the rstudio home directory. That should do it, right? Essentially, using the BuildABiocWorkshop template (without the GH actions) will get you what you want if you build the Docker image in the package directory using this Dockerfile: https://github.com/seandavi/BuildABiocWorkshop2020/blob/master/Dockerfile
Vince Carey (11:53:06): > I think this makes sense, but the sources are not in the form of an R package AFAIK: https://github.com/Bioconductor/OrchestratingSingleCellAnalysis… Some more strategizing for reorganization and change management seems needed.
Vince Carey (11:54:54): > I don’t think we are completely clear on the role of AnVIL in oscabook management and building – we have a proof of concept and I would love to have a demonstration by the time of the Tuesday tech meeting on how an AnVIL Rstudio user can take advantage of the content and compute resources. But even this is murky to me at the moment and I am stopping for a few hours.
Sean Davis (13:13:46): > In this case, the DESCRIPTION file is all that is needed for the installation. The `COPY . /home/rstudio` will take care of moving the Rmd files into the home directory. The result will be a Docker image with all dependencies available and the Rmd files. Running `docker build .` in https://github.com/seandavi/OrchestratingSingleCellAnalysis results in the Rmds in the home directory of the RStudio instance and all packages in the DESCRIPTION file pre-installed.
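Sean's point that the DESCRIPTION file is enough works because the packages to pre-install are just the names listed in its `Depends`, `Imports` and `Suggests` fields. Below is a minimal base-R sketch of extracting them. In practice, a function such as `remotes::install_deps()` handles this together with the installation itself. The helper name and the DESCRIPTION content are made up.

```r
# Parse a DESCRIPTION file and return the package names it depends on
description_deps <- function(path,
                             fields = c("Depends", "Imports", "Suggests")) {
    dcf <- read.dcf(path, fields = fields)   # missing fields come back NA
    entries <- unlist(strsplit(dcf[!is.na(dcf)], ","))
    pkgs <- trimws(sub("\\(.*\\)", "", entries))  # drop version specs
    setdiff(unique(pkgs[nzchar(pkgs)]), "R")      # R itself is not installed
}

desc <- tempfile("DESCRIPTION")
writeLines(c(
    "Package: OSCAdemo",
    "Version: 0.0.1",
    "Depends: R (>= 4.0.0)",
    "Imports: scater,",
    "    scran (>= 1.16),",
    "    BiocStyle"
), desc)
description_deps(desc)
```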
Martin Morgan (18:18:32): > I don’t know the osca book details but if each chapter were a stand-alone Rmd then these could be converted to Jupyter notebooks (I think this is a one-liner in knitr) and the collection – all 30+ chapters – published in an AnVIL workspace. This would be an amazing resource
2020-08-08
Vince Carey (05:18:37): > I’ve shared the workspace https://anvil.terra.bio/#workspaces/use-strides/oscabook_new with several team members. Note that the billing account name is part of the workspace URL. I think that means that if you just start it up, `use-strides` will be billed. But I assume that you can clone it and run it associated with a different billing account. If this is so, I should turn down the privileges conferred via sharing … but I want to make sure it is visible and testable by others. The narrative looks like this: - File (PNG): Screenshot from 2020-08-08 05-14-12.png
Vince Carey (05:19:13): > While there is probably scope for converting to Jupyter, the facts that we now use a runtime with Rstudio embedded, and all the chapters are individual Rmds computable in that runtime, make this somewhat less compelling. One can see all the chapters in the Files pane … I describe how to find them in the narrative. However it is not as simple as clicking on a link in a dropdown. Some of the longer-running chapters might be looked at for expression in WDL. I will be looking into this.
Sean Davis (11:06:12) (in thread): > The way Terra implements users, in order to bill against the account, your user must be added to the billing account. Note that this is DISTINCT from being able to access a workspace. In my case, I can access and even modify the workspace as long as no GCP charges are involved: - File (PNG): image.png
Sean Davis (11:08:30) (in thread): > That said, your description of usage was really clear,@Vince Carey. It might be worth reminding folks that they need to clone the repo before launching a notebook environment.
Martin Morgan (11:40:30): > This is great, Vince! For what it’s worth, I cloned and localized as described, then opened a system console through RStudio and installed jupytext:
```sh
pip install jupytext --upgrade
```
I then converted the Rmds to .ipynb:
```sh
~/oscabook_rmds$ ~/.local/bin/jupytext --to notebook *Rmd
```
(There was one glitch: clustering.Rmd had a code chunk labeled in part `Ward's distance`, and the apostrophe needed to be removed.) Finally, I copied these to the Google bucket associated with the workspace:
```r
avworkspace_name("oscabook_new%20copy")
bkt = file.path(avbucket(), "notebooks")
gsutil_cp("~/oscabook_rmds/*ipynb", bkt)
```
My workspace now has several dozen notebooks (but I’ve got to switch runtimes to use them…)
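The apostrophe glitch could be handled before running jupytext by cleaning chunk-header lines. Below is a minimal sketch that is deliberately crude. It strips every straight apostrophe from lines that open an R chunk, including any inside chunk options, leaves code lines untouched, and does not handle typographic quotes. `sanitize_chunk_headers()` is a hypothetical helper.

```r
# Remove straight apostrophes from R Markdown chunk-header lines only
sanitize_chunk_headers <- function(lines) {
    is_header <- grepl("^\\s*```\\{r", lines)
    lines[is_header] <- gsub("'", "", lines[is_header], fixed = TRUE)
    lines
}

rmd <- c("```{r Ward's distance}", "hc <- hclust(d, method = 'ward.D2')", "```")
sanitize_chunk_headers(rmd)
# For a real file, e.g.:
# writeLines(sanitize_chunk_headers(readLines("clustering.Rmd")), "clustering.Rmd")
```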
Martin Morgan (11:42:48): > I guess the standard R / Jupyter runtime won’t have the packages pre-installed, though…
Vince Carey (12:42:13): > They won’t be preinstalled, but maybe we can make a relevant repo for fast installation, assuming the existing 3.12 one isn’t adequate? We also have to work on availability of the data/annotation cache in GCP, perhaps.
Vince Carey (12:43:21): > Anyway, we should have a double-barreled presentation for Tuesday, one part with Rstudio Rmd collection, the other with jupyter.
Vince Carey (21:55:01) (in thread): > So you can clone it, link to your billing account, and then run? Sorry to be dense … I should try this from a google identity not known to the project.
2020-08-09
Sean Davis (07:12:15) (in thread): > Yep.
Martin Morgan (12:26:45): > I guess from Allie’s comment that the ‘best practice’ would be to localize / delocalize from the original workspace bucket, rather than creating `gs://oscabook_public`?
Sean Davis (13:05:37): > Is there a reason to worry about the AH and EH caches? It seems like running the code should take care of pulling things in at runtime. As for user “experience,” in GCP it takes only about 60-80 seconds to pull the 5 GB of TENxBrainData, for example.
Vince Carey (15:37:28) (in thread): > But how do I confer access to that bucket? I need to understand the data tables system to produce references to it, I think.
Martin Morgan (20:07:46) (in thread): > I thought if you delocalized to it, then those cloning the workspace would be able to localize from it… I could be mistaken
2020-08-10
Martin Morgan (09:15:35): > A reminder that the slides for tomorrow’s presentation are at https://docs.google.com/presentation/d/1e6Uwtj0AhJFpTHlCwktqPwut750DrcDgnH_G9xpmCts/edit?usp=sharing; I added names to topics, but feel free to add content as appropriate, including (BJ) making the slides more exciting.
We don’t have a meeting scheduled for tomorrow at 11, but do we want to meet then to go over the slides? yes :+1: / no :thumbsdown:
Vince Carey (13:07:19): > I can’t do 11, however; can we pick a different time? Noon?
Martin Morgan (15:20:58): > I can do 2 or 3…
Vince Carey (15:42:33): > let’s go with 2
Vince Carey (18:27:48) (in thread): > AFAICT the delocalized content in bucket does not get cloned.
Martin Morgan (19:51:35) (in thread): > but the localize command would be from the bucket associated with the original workspace, rather than an arbitrary bucket somewhere in the cloud…. so e.g., billing and updates would be in an obvious place
2020-08-11
Vince Carey (09:47:54): > I put in a ton of slides on oscabook, but I think there are too many. I think the “future directions” part is quite interesting and deserves a good chunk of time. What is the `.n=0` argument for `values()`?
Martin Morgan (10:24:22) (in thread): > It’s documented! But it means ‘get all of them’ and is the way GraphQL specifies this. There’s a meeting scheduled some time next week for the future directions, and I was actually wondering about minimizing this component. We can discuss further at 2
Nitesh Turaga (12:27:31): > I’ve just explored the binary package building again,https://gist.github.com/nturaga/948ac6239ca8f4582b4f3b1bd4609b7d
Nitesh Turaga (12:27:35): > and took a few notes. @Vince Carey, these notes suggest some changes that most likely need to be made to the process. Part of the need is to update the image itself.
Nitesh Turaga (12:30:00): > The piece of the puzzle I’m missing is: how do I get my AnVIL image to install binaries from `gs://anvil-rstudio-bioconductor-test/0.99/3.11/src/contrib/`? Does my Google bucket need any settings to be enabled?
Martin Morgan (12:31:39): > `AnVIL::repositories()` and `AnVIL::install()` do the trick, I think? The bucket has an HTTPS interface…
Nitesh Turaga (12:33:13): > I see, ok, then I seem to have messed up. I created the binaries for 3.11, and `repositories()` is not a function in 3.11
Nitesh Turaga (12:33:18): > I’ll install from github
Nitesh Turaga (12:41:10): > Hmm, I guess my question is: for access through the `https://storage.googleapis.com` HTTPS interface, does my bucket need to be public?
Nitesh Turaga (12:41:52): > Let me play with a little and figure it out.
Nitesh Turaga (12:48:43): > ok, it does need to be public.
Nitesh Turaga (12:53:01): > @Martin Morgan, can I specify a different bucket somehow to `AnVIL::install()`?
```r
AnVIL::install('ABarray', binary_base_url = 'https://storage.googleapis.com/anvil-rstudio-bioconductor-test/0.99/3.11')
```
Notice that Vince’s bucket and my bucket have different names. Mine has `test` attached to its name at the end.
Nitesh Turaga (12:54:06): > That command doesn’t work though, it just defaults to compiling the package.
Martin Morgan (13:51:39): > For purely testing purposes, it is probably sufficient to
```r
repos = BiocManager::repositories()
## edit repos as desired, then...
install.packages(..., repos = repos)
```
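Nitesh's test bucket can be added to such a `repos` vector. This minimal sketch assumes the bucket is public, as established above, so that `gs://BUCKET/PATH` is reachable at `https://storage.googleapis.com/BUCKET/PATH`. The helper names and the `AnVIL_test` label are made up, and the short `repos` vector stands in for `BiocManager::repositories()`.

```r
# Map a gs:// URL to its public HTTPS equivalent
gs_to_https <- function(url) {
    sub("^gs://", "https://storage.googleapis.com/", url)
}

# Prepend the test bucket. Order only breaks version ties; a higher
# version in another repository still wins.
with_test_repo <- function(repos, bucket_url) {
    c(AnVIL_test = gs_to_https(bucket_url), repos)
}

# Stand-in for BiocManager::repositories()
repos <- c(BioCsoft = "https://bioconductor.org/packages/3.11/bioc",
           CRAN = "https://cloud.r-project.org")
repos <- with_test_repo(repos, "gs://anvil-rstudio-bioconductor-test/0.99/3.11")

# install.packages() looks for the PACKAGES index under <repo>/src/contrib
contrib.url(repos["AnVIL_test"], type = "source")
# install.packages("ABarray", repos = repos)
```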
Nitesh Turaga (13:52:08): > I see..ok …let me try that
Martin Morgan (13:53:11): > @channel we can have our 2pm ‘pre-meeting’ at https://bluejeans.com/891598362 in just a few minutes…
Martin Morgan (19:00:17): > thanks all for the presentation, it seemed to go very well!
2020-08-12
Frederick Tan (09:49:52): > Yes, the Gen3 was very cool! Do you have a vignette using the Gen3 and AnVIL packages to find a VCF and localize it to your RStudio session?
Martin Morgan (10:45:05) (in thread): > @Frederick Tan no, seems like an excellent little project… It’s easy to find a file in Gen3, and to find the location of the file in google storage; I have to work through authentication, etc… (I wonder if it’s actually possible to do this yet through other means, e.g., a GUI or python client…)
Martin Morgan (10:53:05) (in thread): > I’ll try to work on this later today…
Martin Morgan (23:26:03) (in thread): > I added a vignette https://github.com/Bioconductor/Gen3/blob/master/vignettes/Gen3TaskVCFFileDiscovery.Rmd to the Gen3 package that illustrates how this works. > > I’ll change it a bit so that the download_*() functions only require a tibble with column object_id, and return the tibble with updated information about the location of the downloaded file. I’ll also ‘vectorize’ the functions so that they can be given multiple object_ids. > > I believe the example works for the 1000 genomes data, which is after all open access, but I’m really not sure that I have the underlying credentials model correct. Probably need yet another lesson on PET accounts … @BJ Stubbs
2020-08-13
Martin Morgan (07:32:04): > I was wondering about coming up with a standard naming scheme for workspaces we intend to use in public settings, e.g., Bioconductor-Workshop-BioC2020 Introduction To Terra or Bioconductor-Package-AnVIL or Bioconductor-Workflow-Orchestrating Single Cell Analysis or …? > > I guess the scheme is Bioconductor-<Activity>-<Topic>, with the rule being that the hyphen - is only used to separate these elements. > > Not sure what the vocabulary is for <Activity>; maybe there is a lightweight existing ontology that we could re-use?
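Reserving the hyphen as a separator makes names like these mechanically buildable and parseable. A minimal Python sketch, assuming exactly the three-part Bioconductor-<Activity>-<Topic> scheme proposed above; make_workspace_name, parse_workspace_name, and the optional activities vocabulary are hypothetical, since no <Activity> vocabulary had been chosen.

```python
from typing import NamedTuple, Optional, Set

PREFIX = "Bioconductor"


class WorkspaceName(NamedTuple):
    activity: str
    topic: str


def make_workspace_name(activity: str, topic: str) -> str:
    """Build Bioconductor-<Activity>-<Topic>; components may not contain hyphens."""
    for part in (activity, topic):
        if not part or "-" in part:
            raise ValueError(f"component must be non-empty and hyphen-free: {part!r}")
    return f"{PREFIX}-{activity}-{topic}"


def parse_workspace_name(name: str, activities: Optional[Set[str]] = None) -> WorkspaceName:
    """Split a name into its parts, optionally checking <Activity> against a vocabulary."""
    parts = name.split("-")
    if len(parts) != 3 or parts[0] != PREFIX or not parts[1] or not parts[2]:
        raise ValueError(f"expected {PREFIX}-<Activity>-<Topic>, got {name!r}")
    if activities is not None and parts[1] not in activities:
        raise ValueError(f"unknown activity {parts[1]!r}")
    return WorkspaceName(activity=parts[1], topic=parts[2])
```

Under this rule, "Bioconductor-Workflow-Orchestrating Single Cell Analysis" parses to activity "Workflow" and topic "Orchestrating Single Cell Analysis", while a name such as "Bioconductor-Workshop-BioC-2020" is rejected because of the extra hyphen.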
Nitesh Turaga (07:44:39): > Maybe an extension of that idea could be a GitHub repo, Bioconductor-AnVIL-notebooks, for the notebooks these workspaces host, or a folder within the AnVIL_admin repo where we can keep versions of the changing notebooks within each workspace?
Frederick Tan (07:46:24): > +1 on “syncing” with a GitHub repo … Dashboard is Markdown so that can be “tracked” as well
Martin Morgan (08:53:55): > Not exactly sure I understand the repo idea? I like that each workspace is somehow explicitly tied to a GitHub repository, but my plan was to process the .Rmd vignettes from the Gen3 package into .ipynb and add those to the workspace. Would I instead / also add them to Bioconductor/AnVIL-notebooks/Packages/Gen3/ (Bioconductor organization, AnVIL-notebooks repository, Packages the <Activity> and the rest forming the <Topic> part of the notebook name)? Doesn’t sound so bad; what about something like the OSCA book? I guess it’s not a notebook, and it’s complex enough to deserve its own repository (which it already has, though maybe not the AnVIL-flavored one?)
Martin Morgan (08:58:40): > (also I need the ‘Dashboard is Markdown’ comment unpacked a bit…)
Frederick Tan (09:28:38): > Some difficulties with AnVIL workspaces are that they’re not always publicly visible, they’re hard to develop collaboratively, and they’re not version controlled
Frederick Tan (09:29:28) (in thread): > The workspace Dashboard tab is a rendering of a Markdown document so I was thinking that it could be “tracked” via a GitHub repository
Frederick Tan (09:30:03) (in thread): > The GitHub repo would be the master … deploy either by manual copy and paste or via the AnVIL package
Martin Morgan (09:54:35) (in thread): > learn something new every day! Can the Dashboard be updated programmatically or by file upload?
Frederick Tan (10:09:12) (in thread): > As far as I know, only manually copying and pasting. Was hoping the AnVIL package could programmatically create an entire Workspace
Vince Carey (10:17:44): > AnVIL workspace catalog visibility seems inherently linked to billing account. And I think the DESCRIPTION part is valuable text that we want to manage with the best version control practices available. I don’t see how to do this within AnVIL. IMHO we want to have full control of all the content in our workspaces and let AnVIL/Terra be the client of that resource. I think this is congruent with the exchange in the thread, which I just saw… but we don’t know how to establish this programmatically – the ‘client’ has to copy and paste the markdown into the DESCRIPTION.
Martin Morgan (10:51:33): > Hmm, maybe it is possible to create and update the workspace! For instance the description is available as a workspace attribute > > > terra = Terra() > > resp = terra$getWorkspace('bioconductor-rpci-anvil', "1000G-high-coverage-2019") > > content(resp)$workspace$attributes$description > [1] "1000 Genomes 2504 phase 3 panel samples sequenced to high coverage\n==================================================================\n\nThis policy refers > ... >
> and there are $setAttributes() and $updateAttributes() methods (the latter is I think a POST method, and probably requires some tweaking to get to work…)
Sean Davis (11:11:30): > Would it be possible to have a “(GitHub) package-to-Terra-workspace” converter that could do something akin to what we do for workshops? Start with a well-formed R package in GitHub, copy it to a (?new) workspace, install-and-test in a supported AnVIL container, optionally convert Rmd to ipynb, and then make everything available as a workspace. Abstracting this process into code then enables anyone with the skills to build an R package to port those developments to AnVIL. Data packages could easily be processed the same way.
Vince Carey (11:35:05): > It all sounds good. Do we want to write the converter, or should Terra folks write an importer? As long as our API is clear and the package content is amenable to this process, it sounds to me as if the conversion API definition and maintenance should be at the Terra end.
Sean Davis (12:11:58): > I suspect that this is just R code running on a laptop (or automated outside Terra) from the AnVIL package in R. > 1. Create a workspace. > 2. git checkout > 3. copy working tree to workspace > 4. Copy workspace metadata details from the R DESCRIPTION file using the code @Martin Morgan mentioned above. > Testing and building a custom image can likely leverage google cloud build, but that step would be worth discussing with the Terra/Google folks, perhaps, as I don’t know enough to point you in the right direction.
2020-08-17
Lori Shepherd (11:21:13): > Here is the agenda for tomorrow’s bi-weekly meeting: agenda 8-18
Martin Morgan (15:29:10): > FWIW I have a package_as_workspace() function that builds the DASHBOARD and populates the NOTEBOOKS tab with vignettes-as-ipynb; a little different from what Sean suggests but that could be a second iteration… will try to have something workable later today. > > Seems like we should create a standard authentication group that anyone (of us) can add anyone to – Bioconductor_Users ?
Martin Morgan (18:24:48): > I made a group Bioconductor_User and added a number of people here as Admin. My idea was that whenever we wanted to make something widely available we simply give access to this group. If someone wants access to something we’ve made widely available, we add them to the group (HAMBURGER -> package_source_as_workspace(); there are help pages in the package too. It does require additional software, as indicated in the vignette (it seems like pandoc is supposed to do the conversion from .md to .ipynb, but this always led to an .html document on my system). > > I made sure to include title / description / authors / license / version info from the DESCRIPTION file into the DASHBOARD, and to link from the dashboard to notebook using vignette titles for display, rather than file names. > > There are some limitations and obvious tweaks, but it seems pretty cool so thanks to Fred for the suggestion!
Frederick Tan (20:59:07) (in thread): > Would love to take a peek: tan@carnegiescience.edu
Vince Carey (22:00:20) (in thread): > Same here: stvjc@channing.harvard.edu
Martin Morgan (22:45:55) (in thread): > I think I have now shared this with Bioconductor_User and you are both a member of that group so should have access
2020-08-18
Frederick Tan (06:25:10) (in thread): > Yes, I can see it now :slightly_smiling_face:
Lori Shepherd (11:00:57): > Meeting starting momentarily: Here is the link https://bluejeans.com/480153337?src=calendarLink
Lori Shepherd (11:01:13): > And agenda link: https://docs.google.com/document/d/135ipNv9757KqByeUCYri2GfCSMF6KbO5w4yfT6tsbac/edit?usp=sharing
Frederick Tan (12:02:49): > Basic question about REST APIs … is that what OSHU is working on?
Martin Morgan (13:13:41): > OSHU is working on unified APIs, but it’s not clear how much forward momentum that project has?
Martin Morgan (13:14:19): > @Vince Carey and others – for the 3pm call this afternoon, I’ve sketched an agenda at https://docs.google.com/document/d/19tmgHMeukSLEDK3_ntd1ZeXyPHBZccSpTfPCyRECqfA/edit?usp=sharing
Vince Carey (14:16:55): > Trying AnVILPublish as stvjc@channing.harvard.edu: > > You are not authorized to create a workspace in billing project bioconductor-rpci-anvil >
Martin Morgan (14:51:20): > yep, you have to use your own billing project… (or we could set up a Bioconductor-wide account, but we haven’t done that yet…)
Nitesh Turaga (14:58:46): > Where is the call going to take place? Which platform?
Vince Carey (14:59:14): > link for 3pm call?
Nitesh Turaga (14:59:21): > Yes
Vince Carey (14:59:33): > i don’t have one
Nitesh Turaga (15:00:06): > https://bluejeans.com/480153337
Nitesh Turaga (15:00:14): > Got it
Nitesh Turaga (15:02:08): > @Martin Morgan waiting for you to start the meeting…
Vince Carey (15:02:14): > waiting for moderator
Martin Morgan (15:03:46): > https://broadinstitute.zoom.us/j/95158159708?pwd=VFp1a0o5ZHVkY2lCN2J4NXlNZVFHUT09
Vince Carey (16:05:56) (in thread): > looks like even echo=FALSE chunks don’t propagate. but there is now an ontoProc workspace … I think this is basically working – we need practices for vignette production that will support this idea well.
Martin Morgan (17:06:43) (in thread): > @Vince Carey Can you go under the teardrop associated with the ontoProc workspace and add ‘Bioconductor_User@firecloud.org’ as a ‘READER’? I just pushed an update that will do this step automatically in the future…
Martin Morgan (17:10:07) (in thread): > Or hey, AnVILPublish::bioconductor_user_access('<billing-account>', '<workspace-name>')
Vince Carey (17:51:17) (in thread): > haven’t tried the latter yet.
Vince Carey (22:10:25) (in thread): > I wonder if we need to do some bulletproofing in this process. I think my runtime is Bioc 3.11 but the version I am trying to publish is from devel branch. Can there be a defense against this?
Vince Carey (23:12:18) (in thread): > Is it possible to use the API to specify the kernel to be used in the jupyter process on startup? I always start in python mode and get an error.
Vince Carey (23:28:57) (in thread): > Now the ontoProc workspace is based on 3.11 code
2020-08-19
Vince Carey (06:12:29) (in thread): > By using rmarkdown::md_document() as the rendering target, we get markdown versions of tables in the ipynb. If we then run through the code in AnVIL, we get a second rendering - File (PNG): Screenshot from 2020-08-19 06-08-46.png
2020-08-20
Martin Morgan (08:40:06) (in thread): > @Vince Carey and others – I’ve made a lot of changes, including to the basic interface – use as_workspace(), etc. > > The Rmd -> jupyter transition doesn’t evaluate cells first, so the duplicated output should be gone (individual cells might have eval = TRUE, and these are still evaluated). Also the R kernel should be the default. > > I’ve been working on publishing bookdown-style repositories, e.g., https://anvil.terra.bio/#workspaces/bioconductor-rpci-anvil/Bioconductor-Workshop-BCC2020 from (a locally modified, as described in the AnVILPublish vignette) https://github.com/Bioconductor/BCC2020. There’s now a NEWS.md for changes, and the vignette / README has been updated at https://github.com/Bioconductor/AnVILPublish
2020-08-21
Vince Carey (14:09:02): > FWIW, the AnVILBilling package (try the fork at vjcitn/AnVILBilling for now while PR is being looked at) has been extended and includes some shiny to help investigate costs over a given interval. I am interested to know if there are authentication or other problems with explaining what has to be in place for this to be usable.
Martin Morgan (16:12:11) (in thread): > Alright, giving this a whirl. I think so far I have managed to create a ‘billing’ project under the same ‘billing’ account that has a project corresponding to the ‘billing’ account in Terra – moderately confusing… > > RPCI Bioconductor AnVIL (Google billing account – who’s paying the bill) > * bioconductor-rpci-anvil – project created when I registered with Terra > * bioconductor-rpci-billing – project created just now… > Just have to wait for costs to be incurred over the next day or so to move on to the next step…
Vince Carey (17:44:24) (in thread): > Be sure to add BigQuery scopes to the new project.
2020-08-25
Vince Carey (12:00:54): > I cannot start shiny apps in anvil rstudio. This seems new. From various computers I am getting connection refused.
Martin Morgan (12:56:16): > <!channel> things to bring up during the standup portion of the technical working group today?
Nitesh Turaga (12:56:43): > The 503 issue on the anvil-rstudio-base and anvil-rstudio-bioconductor images?
Vince Carey (13:01:56): > I was hoping to have the cost explorer running in AnVIL. Let’s see if I can get shiny running.
Nitesh Turaga (13:02:16): > Is the issue the lack of another open port, @Vince Carey?
Vince Carey (13:02:47): > How should I diagnose this? reboot? I will try that
Nitesh Turaga (13:04:42): > Hmm… I was looking at this issue and asked the question; maybe it is totally unrelated: https://github.com/rstudio/shiny/issues/2455
Nitesh Turaga (13:05:50): > I guess you can somehow check if the port 8080 is open….that’s where it wants to deploy the shiny app
BJ Stubbs (14:33:08) (in thread): > PR merged
BJ Stubbs (14:34:17) (in thread): > @Martin Morgan did you create a billing bigquery dataset and enae billing export to bigquery on your project in the google cloud console?
BJ Stubbs (14:34:39) (in thread): > Enable
BJ Stubbs (14:36:15): > I tried changing the port and using the ip address of the node instead of 127.0.0.1 and neither worked
Vince Carey (14:51:00): > i also did these
Sean Davis (15:29:39) (in thread): > https://github.com/rstudio/shiny/issues/2455#issuecomment-497548811 There is no possibility of shiny working on google cloud run (or any truly serverless technology that I know of). AnVIL is not using google cloud run service, so the issues are unrelated.
Sean Davis (15:35:03): > As a workaround, can you just write the CMD for the docker image to run your shiny app (https://shiny.rstudio.com/reference/shiny/1.0.1/runApp.html) using a standard port. This would be an “Rscript” call that loads the library and then runs the app. No Rstudio involved, so no need for a second port or for a “popup” to work….
Sean Davis (15:35:34): > Clearly a workaround, though….
Vince Carey (15:58:54): > Will have a look at that if no solutions emerge soon. Seems a worthy method to know.
Vince Carey (15:59:30): > BTW @Martin Morgan I will share screen to show the new cost visualization if you want, during the standup.
Martin Morgan (16:12:49): > @Vince Carey would you like to link something to the agenda? https://docs.google.com/document/d/1XcTR3rDFP4oE_4Ggl1WfD7nRfKPSHmWaE2_amIjfuk4/edit?usp=sharing
Vince Carey (16:14:10): > did i mess up the hierarchy? seems too indented
Martin Morgan (16:16:59): > looks ok now
2020-08-26
Martin Morgan (07:03:17): > Mike would like a 2 minute video of current status of Bioconductor in AnVIL. It would be good to have a draft by Thursday afternoon (or at least a sketch of content); it is needed on Monday afternoon. Here’s a google doc for notes…https://docs.google.com/document/d/1ZJuJ93WqEEVskGy6y29z4WcUQIRgdMWuanDUt29LVDo/edit?usp=sharing
Vince Carey (07:16:54): > scroll down to the blue band on anvilproject.org – you get a carousel with platforms and tools … bioc is not mentioned… links to here: https://anvilproject.org/tools … as before - Attachment (The AnVIL): Analysis Tools > AnVIL provides access to an array of analysis and visualization solutions for genomic data analysis.
Vince Carey (07:19:16): > AnVIL is on front page of bioconductor.org, under “Use”.
Frederick Tan (07:46:04): > With regards to anvilproject.org, the Portal Group (Thu 12pm) is very interested and receptive to suggestions.
Frederick Tan (07:48:49): > One priority area will be anvilproject.org/training… are there particular items from bioconductor.github.io/AnVIL_Admin that should be promoted?
Vince Carey (08:00:33): > https://waldronlab.io/AnVILWorkshop/seems very pertinent. I am going through it cold right now, establishing a new billing account…. - Attachment (waldronlab.io): AnVILWorkshop2020 > We introduce cloud-based genomics platofrm, Terra, and Bioconductor package AnVIL for R-friendly usage of Terra.
Vince Carey (08:04:24): > seems i can’t clone the mentioned workspace https://app.terra.bio/#workspaces/bioc2020-workshop-jupyter/BioC2020_Workshop_Jupyter with my new identity. “either it does not exist or i don’t have access” – so that resource at waldronlab.io may need updating.
Vince Carey (08:07:37): > I decided to proceed with the public “Tumor_Only_CNV” which clones nicely. I will take notes of my times of activity so that the charges can be investigated tomorrow using AnVILBilling.
Martin Morgan (09:36:59): > @Sehyun Oh @Levi Waldron @Vince Carey Can I suggest we continue to adopt a standard (re-)naming for our bioconductor resources? Either rename or clone workspaces you’d like to make accessible under a scheme like > > Bioconductor-Workshop-TumorOnlyCNV > Bioconductor-Resource-OrchestratingSingleCellAnalysis >
> and add Bioconductor_User@firecloud.org as a user on the workshop DASHBOARD page teardrop (right side) ‘Share’ icon? This will make our contributions much more visible and coherent.
Martin Morgan (09:41:14): > Another small improvement is to figure out what should be at the end of ‘Notebook Runtime’ -> ‘Project Specific Environment’ ‘AnVIL’ link under the first dropdown… I think this is meant to point to the use of the RStudio image…
Levi Waldron (09:43:45) (in thread): > I think that making the workspace available to anyone without being individually given specific permission is still not possible in Terra (“Publish - COMING SOON”), and the workshop attendees had to be given access individually. Although since this is a public workshop we could/should ask for the Terra team to publish it through their backend. @Sehyun Oh can you confirm?
Vince Carey (10:21:31): > I think we need an issue tracker for ideas like these
Martin Morgan (10:22:04): > https://github.com/Bioconductor/AnVIL_Adminwould be an appropriate place to open issues
Vince Carey (10:23:24) (in thread): > that sounds right … the caveats should be more visible at points of contact. apropos publishing i think it is relatively easy to have terra personnel make it “public” but not “featured”.
Sehyun Oh (11:28:59) (in thread): > None of the BioC2020* workspaces is public, and the website/vignette is based on the assumption that I’ll (manually) give access to workshop participants. I’ll update it based on which workspace I can make public through the Terra team.
Sehyun Oh (11:31:05) (in thread): > @Levi Waldron yes, that’s correct - making a workspace public is not directly available yet. I’ll check with the Terra team to make the BioC2020 workspace public and update the workshop webpage accordingly.
2020-08-27
Vince Carey (13:14:41) (in thread): > do we need to discuss video or 3pm meeting?
Martin Morgan (13:39:36) (in thread): > happy to, but also can just discuss then? your call
Martin Morgan (13:42:36) (in thread): > … I don’t really know how to make a video, and don’t really know what the role of the video is in our presentation – is it everything, or just ‘demos’ that could go wrong? Seems like Sehyun’s work is a nice integrative example of using AnVIL, and a Shiny app (for billing, or synthetic cohorts, or…) a nice illustration of the ‘future’ including connection with clinician / researcher.
Martin Morgan (13:45:48) (in thread): > Here’s a screenshot that captures some of the user-facing products… - File (PNG): Screen Shot 2020-08-27 at 1.44.26 PM.png
Martin Morgan (16:24:52) (in thread): > I created three placeholder slides at https://docs.google.com/presentation/d/1As46Ed4OAjnDPZ4hwBcfgadvv3Dega-Dq01zfrJgqqk/edit?usp=sharing for possible short videos?
2020-08-31
Lori Shepherd (13:01:53): > Agenda for tomorrow’s 11 am EST meeting: Agenda 9-1-20
2020-09-09
Vince Carey (14:08:10) (in thread): > @Martin Morgan did you ever make progress with this? I think we are possibly ready to submit AnVILBilling to bioc
2020-09-10
Frederick Tan (14:53:45): > re: usage statistics … can you track how many times us.gcr.io/anvil-gcr-public/anvil-rstudio-bioconductor:0.0.6 has been accessed?
Nitesh Turaga (14:55:04): > nope…not that I know of, and GCR doesn’t give us a way on the console
Sean Davis (16:02:55): > As with everything cloud, there are metrics. When thinking about this stuff, remember Google has to make money, so they know and usually let us know, also. Turns out that GCR is just a google storage bucket.https://medium.com/google-cloud/google-container-registry-statistics-from-gcs-access-logs-3be705abc413 - Attachment (Medium): Google Container Registry statistics from GCS access_logs > Sample flow to extract Google Container Registry usage statistics (image push/pull counts, analytics, etc). GCR images are hosted on…
Nitesh Turaga (16:41:00): > hmm… thanks @Sean Davis I’m able to follow the logic there and get as far as the cloud storage and see the containers folder. But I don’t have access to certain APIs (BQ, Cloud Pub/Sub..) in the anvil-gcr-public project where these containers are hosted. @Frederick Tan It might be a good question for Rob Title and team maybe?
Nitesh Turaga (16:41:39): > They might have something already monitoring the account which hosts the jupyter images, and it might be an easy-to-port solution.
Nitesh Turaga (16:43:35): > But following a link within the medium article Sean sent, https://github.com/salrashid123/gcr_stats. This was very useful.
Frederick Tan (16:46:40): > Thanks! Will try the #terra channel at the-anvil.slack.com
Nitesh Turaga (16:47:20): > The project for the jupyter images on google cloud is terra-docker.
2020-09-11
Martin Morgan (14:10:06) (in thread): > @Vince Carey I’m trying to follow the vignette but get stuck at ‘Setting up a request’. I interpret 3. and 4. as project name and dataset name in the figure above. I don’t know what to use for 5 (table ID); I have two tables, cloud_pricing_export and gcp_billing_export_v1_010C68_FC9D12_2AFEDAT; I don’t know if I’m on the right track with these. setup_billing_request() takes a final argument billing_code, but the vignette and man page are not helpful about what value would work – foo does not :wink:
Vince Carey (14:17:26) (in thread): > Thanks, let me take a look at that. I agree that the terminology is too flat. billing account, billing code, project, … i have a lot of trouble keeping them straight. Given a gcloud auth configuration we should be able to define an object that has all the fields properly organized. I wonder whether gargle would have some helpful items.
Tim Triche (15:04:06): > @Tim Triche has joined the channel
Tim Triche (15:04:50): > hi all – is there an example of scanning in read counts from a CRAM file with Rsamtools or Rhtslib these days? I didn’t find one on support.bioc.org and the only hits on slack were here.
2020-09-12
Martin Morgan (11:05:37): > The branch https://github.com/Bioconductor/Rsamtools/tree/cram is supposed to support this; I’ve been meaning to complete this… It may or may not work in its current implementation… I think you just need to create bf = BamFile("path/to/a.cram") and then use bf in, e.g., scanBam() or probably GenomicAlignments::readGAlignments() and friends. > > Please let me know how this goes…
Martin Morgan (11:05:53): > @Tim Triche ^^
Tim Triche (11:06:26): > Will do, thanks so much @Martin Morgan!
Tim Triche (11:07:38): > I have a test script and a bam/cram/bed version of some reads set up to test. Will grind through it :grin:
Tim Triche (13:48:54): > hmmm. this may be an issue: > > R> install("Bioconductor/Rsamtools", ref="cram") > # ... > /home/tim/R/x86_64-pc-linux-gnu-library/4.0/Rhtslib/include/sam.h:87:30: note: expected 'samfile_t *' {aka 'struct <anonymous> *'} but argument is of type 'htsFile *' {aka 'struct <anonymous> *'} > 87 | void samclose(samfile_t *fp); > | ~~~~~~~~~~~~~~~~~~~~~~^~~ > make: *** [/usr/lib/R/etc/Makeconf:167: bamfile.o] Error 1 > ERROR: compilation failed for package 'Rsamtools' > * removing '/home/tim/R/x86_64-pc-linux-gnu-library/4.0/Rsamtools' > * restoring previous '/home/tim/R/x86_64-pc-linux-gnu-library/4.0/Rsamtools' > R> packageVersion("Rhtslib") > [1] '1.21.1' >
> I will try reinstalling Rhtslib from bioc-git and debugging from there. Thanks for the lead regardless
Tim Triche (13:53:21): > Update: 1.21.1 is the version in bioc-git. Something else must be funky. Will try pulling the cram tree and fixing if possible.
Tim Triche (14:28:42): > OK, pulled the branch and replaced instances of htsFile with samfile_t. This solves the compilation issues, but is not quite sufficient to get the branch up and running:
Tim Triche (14:29:03): > > ** R > ** inst > ** byte-compile and prepare package for lazy loading > ** help > *** installing help indices > ** building package indices > ** installing vignettes > ** testing if installed package can be loaded from temporary location > Error: package or namespace load failed for 'Rsamtools' in dyn.load(file, DLLpath = DLLpath, ...): > unable to load shared object '/home/tim/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-Rsamtools/00new/Rsamtools/libs/Rsamtools.so': > /home/tim/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-Rsamtools/00new/Rsamtools/libs/Rsamtools.so: undefined symbol: cram_tell > Error: loading failed > Execution halted > ERROR: loading failed > * removing '/home/tim/R/x86_64-pc-linux-gnu-library/4.0/Rsamtools' > * restoring previous '/home/tim/R/x86_64-pc-linux-gnu-library/4.0/Rsamtools' >
Tim Triche (14:29:08): > Getting there, though, I think.
Tim Triche (14:53:54): > Bizarre! > > tim@thinkpad-P1:~/Dropbox/Rsamtools/src$ grep cram_tell * > hts_utilities.c: position = cram_tell(fd->fp.cram); > Binary file hts_utilities.o matches > Binary file Rsamtools.so matches >
Tim Triche (14:58:35): > How was htslib 1.7 chosen as the “freeze target” for Rhtslib? > > tim@thinkpad-P1:~/Dropbox/Rhtslib/src/htslib-1.7$ head -1 NEWS > Noteworthy changes in release 1.7 (26th January 2018) >
Tim Triche (14:58:59): > It seems like the htslib ABI has changed recently (bad) but on the other hand the SAM/BAM/CRAM unification seems to have improved (good)
Tim Triche (15:00:11): > > tim@thinkpad-P1:~/Dropbox/Rhtslib/src/htslib-1.7/cram$ grep cram_tell * > cram_io.h:int64_t cram_tell(cram_fd *fd); >
Tim Triche (15:00:16): > definitely in there
Tim Triche (15:05:54): > OH FOR FUDGE’S SAKE: > > tim@thinkpad-P1:~/Dropbox/Rsamtools/src$ nm -gDC Rsamtools.so | grep -i cram | grep -A2 -B2 tell > 00000000000bf3d0 T cram_subexp_decode_free > 00000000000c0b70 T cram_subexp_decode_init > U cram_tell > 00000000000a4a20 T cram_uncompress_block > 000000000009e980 T cram_update_curr_slice >
Tim Triche (15:09:20): > looks like a linking order issue
Tim Triche (15:18:30): > or not: > > tim@thinkpad-P1:~/bioc-git/Rhtslib/src$ nm -gDC Rhtslib.so | grep -c cram_tell > 0 >
Tim Triche (15:32:32): > This is bizarre. cram_tell is in there, but maybe it shouldn’t be?!? > > Noteworthy changes in release 1.10 (6th December 2019) > ... > * Deleted defunct cram_tell declaration. (66c41e2; #915 reported by > Martin Morgan) >
Tim Triche (15:33:05): > (from a checkout of htslib itself)
Tim Triche (15:35:48): > and back to the beginning, perhaps samfile_t is the culprit after all
Tim Triche (15:37:28): > > gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/home/tim/R/x86_64-pc-linux-gnu-library/4.0/Rhtslib/include' -I'/home/tim/R/x86_64-pc-linux-gnu-library/4.0/S4Vectors/include' -I'/home/tim/R/x86_64-pc-linux-gnu-library/4.0/IRanges/include' -I'/home/tim/R/x86_64-pc-linux-gnu-library/4.0/XVector/include' -I'/home/tim/R/x86_64-pc-linux-gnu-library/4.0/Biostrings/include' -fpic -g -O2 -fdebug-prefix-map=/build/r-base-Do_dS_/r-base-4.0.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c bamfile.c -o bamfile.o > bamfile.c: In function '_bam_tryopen': > bamfile.c:19:14: error: 'htsFile' {aka 'struct <anonymous>'} has no member named 'header' > 19 | if (sfile->header == 0) { > | ^~ >
2020-09-14
Vince Carey (22:46:28) (in thread): > Here are clues from the demo_rec in the package: > > > demo_rec@project > [1] "bjbilling" > > demo_rec@dataset > [1] "anvilbilling" > > demo_rec@table > [1] "gcp_billing_export_v1_015E39_38569D_3CC771" > > In other words, the table is the long ‘gcp_billing_export…’ string. The project is the specially created project with BigQuery scope. The dataset is the name of the BigQuery dataset identified as the target of the billing export. Is that enough guidance? If so I will add these details to the vignette.
Vince Carey (22:47:18) (in thread): > @Martin Morgan ^^
Vince Carey (22:54:13) (in thread): > billing_code is the “billing project” that we identify when we start an anvil workspace. it is separate from the ‘project’ in the request, because the anvil workspace project does not have BigQuery scope! (in general)
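Since the four terms are easy to confuse, it may help to keep them together in one record. A minimal Python sketch, not the AnVILBilling API: the BillingRequest class is hypothetical, the project/dataset/table values are the demo_rec ones quoted above, and filtering the export on project.id = billing_code is an assumption about how the workspace billing project would be used. It relies only on BigQuery standard SQL referring to tables as `project.dataset.table`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BillingRequest:
    """Hypothetical grouping of the fields discussed for setup_billing_request()."""
    project: str       # specially created project with BigQuery scope (hosts the export)
    dataset: str       # BigQuery dataset that is the target of the billing export
    table: str         # the long gcp_billing_export_v1_... table name
    billing_code: str  # AnVIL/Terra billing project whose costs are of interest

    def qualified_table(self) -> str:
        # BigQuery standard SQL references a table as `project.dataset.table`.
        return f"`{self.project}.{self.dataset}.{self.table}`"

    def cost_query(self) -> str:
        # Assumption: select the workspace's costs by filtering on project.id;
        # billing_code is meant to be bound as the named parameter @billing_code.
        return (
            "SELECT service.description AS service, SUM(cost) AS cost "
            f"FROM {self.qualified_table()} "
            "WHERE project.id = @billing_code "
            "GROUP BY service"
        )
```

For example, BillingRequest("bjbilling", "anvilbilling", "gcp_billing_export_v1_015E39_38569D_3CC771", "my-anvil-billing-project") (the last value is a placeholder) keeps the query project, which has BigQuery scope, separate from the workspace billing project, which in general does not.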
2020-09-15
Lori Shepherd (08:07:03): > Agenda for 11 am meeting: https://docs.google.com/document/d/1wuucgPi6zcJG-y8nxIjDumFz4BnPCnW90obZEgvh0O0/edit?usp=sharing
Martin Morgan (10:57:54) (in thread): > This gets me the data, thanks. Working through the vignette now…
Nitesh Turaga (12:04:46): > https://youtu.be/pfh8Q9funM0 - Attachment (YouTube): k8s redis demo video
Nitesh Turaga (12:04:52): > https://github.com/Bioconductor/k8sredis/tree/nitesh_dev
Vince Carey (12:41:52) (in thread): > @Tim Triche any updates here? Do you think you will slay the beast? @BJ Stubbs do you have contributions here via changes to htslib, or was that a different issue?
Tim Triche (13:38:09) (in thread): > been going around in circles a bit. cram_tell should be gone, and samfile_t is in fact the right structure to use, but I need to figure out what the right substitute for cram_tell is. I’ll take another whack at it today
2020-09-17
Frederick Tan (20:37:45): > Persistent disk and “Cloud environment” updates https://support.terra.bio/hc/en-us/articles/360049850711
2020-09-18
Vince Carey (05:53:21): > $2.00/month for 50 GB, i.e. $0.04/GB/month (I checked that it is linear downwards from 50 GB)
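The persistent disk arithmetic is easy to sketch. A minimal Python example, assuming strictly linear pricing at the quoted $2.00 per 50 GB per month; the message only confirms linearity at or below 50 GB, so applying it to larger disks is an assumption, and the function name is hypothetical. Decimal avoids binary floating-point rounding in currency amounts.

```python
from decimal import Decimal

# Rate quoted above: $2.00/month for 50 GB, i.e. $0.04 per GB per month.
RATE_PER_GB_MONTH = Decimal("2.00") / Decimal(50)


def persistent_disk_monthly_cost(size_gb: int) -> Decimal:
    """Monthly cost in dollars, assuming linear pricing (hypothetical helper)."""
    if size_gb < 0:
        raise ValueError("size must be non-negative")
    # Round to whole cents.
    return (RATE_PER_GB_MONTH * size_gb).quantize(Decimal("0.01"))
```

Under that assumption, a 50 GB disk comes to $2.00 per month and a 20 GB disk to $0.80.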
Frederick Tan (08:07:58): > Yeah, time to update https://docs.google.com/presentation/d/1DnNRX703nlwEi0VLNqwcBzvGC8vBI1yGR0x1qOCapdY/edit#slide=id.g8ebed1f26c_0_46
2020-09-21
Martin Morgan (06:34:20): > We were asked to prepare a text summary (1/2 page) of our activities for the ECC meeting, in addition to a slide presentation. Here’s the text-based summary so far https://docs.google.com/document/d/1f7Rru7fafOavdg5XSpOOjjtFFSQndRgkpBFZLI3J3Mk/edit?usp=sharing with comments welcome; I think this is to be presented in advance of the meeting, so probably needs to be fleshed out this week.
Nitesh Turaga (12:30:35): > Following up on @Sean Davis’s and @Frederick Tan’s discussion a week or so ago: > > Since images on the GCR are just storage buckets, I wonder if a “pull” of public images on GCR is charged to the project hosting the image on GCR or to the individual who is pulling. I can’t imagine the latter being the case because I can pull a GCR image without having a billing account. Any idea? Or is Google waiving the cost of this entirely? I can’t imagine that either. > > I’m not able to find documentation on this; it just points me to the network egress page https://cloud.google.com/storage/pricing#network-pricing. - Attachment (Google Cloud): Cloud Storage pricing | Google Cloud
Frederick Tan (14:41:43): > Thanks @Nitesh Turaga! Been told that there is a “MixPanel” that reports the name of every workflow that is run and every notebook that is started. Being passed back and forth between people trying to get access, so was hoping we could just a proxy :slightly_smiling_face:
2020-09-23
Frederick Tan (12:48:57): > In case you missed it, the Portal Group is looking for feedback on this diagram: staging.anvilproject.org/about
2020-09-24
Vince Carey (10:11:11): > Thanks Fred – I think the Bioconductor group should discuss this diagram. We can be mentioned in conjunction with Dockstore, and we can have a paragraph at the bottom. We should also be above RStudio, IMHO.
Frederick Tan (10:18:54): > Agreed! Feel free to drop a short message in the #anvil-portal channel on the-anvil.slack.com to give Kevin Osborn a heads up. FYI, the Portal Group meets today at 12pm EDT.
Vince Carey (10:48:26) (in thread): > Is this still bubbling? I am wondering whether pysam through basilisk would be a way to address CRAM ingestion.
Vince Carey (11:18:29) (in thread): > @Tim Triche @Martin Morgan ^^
Martin Morgan (11:25:44) (in thread): > it’s still a work in progress, further from completion than I thought. I have something that reads non-paired alignments (readGAlignments()) but not paired alignments (readGAlignmentPairs()), so there are a few deeper issues
Martin Morgan (11:28:56): > <!channel> I’ll do an ad hoc presentation on Bioconductor in AnVIL at the developer forum at 12pm EDT today at https://bluejeans.com/114067881 if you’re looking for light entertainment…
2020-09-28
Lori Shepherd (11:18:57): > Tomorrow’s agenda: 9-29
Tim Triche (12:45:57) (in thread): > a working readGAlignments() and/or countOverlaps() is all I need for the project at hand, tbh
Martin Morgan (13:14:26) (in thread): > Please try with github.com:Bioconductor/Rhtslib@enable-cram and github.com:Bioconductor/Rsamtools@cram but if you run into problems please stop trying! I’m trying to get there this week…
Tim Triche (13:24:19) (in thread): > you rule
Tim Triche (13:27:04) (in thread): > > install("Bioconductor/Rsamtools@cram") > Bioconductor version 3.11 (BiocManager 1.30.10), R 4.0.0 (2020-04-24) > Installing github package(s) 'Bioconductor/Rsamtools@cram' > Downloading GitHub repo Bioconductor/Rsamtools@cram > Error: Failed to install 'Rsamtools' from GitHub: > Invalid comparison operator in dependency: = >
> ack
Tim Triche (13:27:42) (in thread): > hmmm > > tim@thinkpad-P1:~$ git clone https://github.com/Bioconductor/Rsamtools > Cloning into 'Rsamtools'... > remote: Enumerating objects: 168, done. > remote: Counting objects: 100% (168/168), done. > remote: Compressing objects: 100% (90/90), done. > remote: Total 8717 (delta 101), reused 111 (delta 78), pack-reused 8549 > Receiving objects: 100% (8717/8717), 12.72 MiB | 1.01 MiB/s, done. > Resolving deltas: 100% (6729/6729), done. > tim@thinkpad-P1:~$ cd Rsamtools > tim@thinkpad-P1:~/Rsamtools$ git checkout cram > Branch 'cram' set up to track remote branch 'cram' from 'origin'. > Switched to a new branch 'cram' > tim@thinkpad-P1:~/Rsamtools$ R CMD INSTALL . > * installing to library '/home/tim/R/x86_64-pc-linux-gnu-library/4.0' > * installing **source** package 'Rsamtools' ... > **** using staged installation > **** libs > Error in list(Version = c(1L, 21L, 1L, 1L)) = list(c(1L, 21L, 1L, 1L)) : > invalid (do_set) left-hand side to assignment > * removing '/home/tim/R/x86_64-pc-linux-gnu-library/4.0/Rsamtools' > * restoring previous '/home/tim/R/x86_64-pc-linux-gnu-library/4.0/Rsamtools' >
Tim Triche (13:28:20) (in thread): > maybe it’s time to bump R to 4.0.2 (since I need to fix a bunch of packages for submission/release anyways)
Tim Triche (13:29:53) (in thread): > alright, I canned the LinkingTo: Rhtslib (= 1.21.1.1) and it has installed now
Martin Morgan (13:31:07) (in thread): > yes there’s an update pushed that is just > 1.21.1
Tim Triche (13:31:25) (in thread): > IT WORKS!!! YOU ARE A MADMAN!!!
Tim Triche (13:32:20) (in thread): > > library(Rsamtools) > setwd("~/Dropbox/TricheLab/CRAMtest/") > list.files() > aCRAM <- BamFile("SCRAP207.chr9.chr22.chrM.GRCh38.cram") > aCRAM > class: BamFile > path: SCRAP207.chr9.chr22.chrM.GRCh38.cram > index: SCRAP207.chr9.chr22.chrM.GRCh38.cram.crai > isOpen: FALSE > yieldSize: NA > obeyQname: FALSE > asMates: FALSE > qnamePrefixEnd: NA > qnameSuffixStart: NA >
Tim Triche (13:33:41) (in thread): > > CRAMgal <- readGAlignments(aCRAM) > > head(CRAMgal) > GAlignments object with 6 alignments and 0 metadata columns: > seqnames strand cigar qwidth start end width > <Rle> <Rle> <character> <integer> <integer> <integer> <integer> > [1] 9 + 74M 74 14522 14595 74 > [2] 9 + 75M 75 14525 14599 75 > [3] 9 + 76M 76 14525 14600 76 > [4] 9 + 76M 76 14525 14600 76 > [5] 9 + 76M 76 14525 14600 76 > [6] 9 + 75M 75 14525 14599 75 > njunc > <integer> > [1] 0 > [2] 0 > [3] 0 > [4] 0 > [5] 0 > [6] 0 > ------- > seqinfo: 194 sequences from an unspecified genome >
Tim Triche (13:34:00) (in thread): > this is great! Please merge this into devel if at all possible! Thanks so much!
2020-09-29
Lori Shepherd (11:02:23): > <!channel> Meeting happening now: https://bluejeans.com/480153337?src=calendarLink
Martin Morgan (16:17:48): > hmm… looks like we’re up for the bigger presentation in a couple of weeks!
2020-09-30
Vince Carey (12:21:30): > Would it be wise for us to start an “advanced view” of 3.12 in an AnVIL-targeted docker container? It was awkward to “reinstall” Bioc, as was needed for the oscabook demo. The idea would be to have a GCR-resident container that runs R 4.0.2 and Bioc 3.12 ASAP for testing. The associated binary repo would also be useful.
Frederick Tan (13:30:26): - File (PNG): anvil-terra-bio-mixpanel-q3.png
Frederick Tan (13:30:40) (in thread): > Haven’t seen the raw data, but reported launches on anvil.terra.bio (not terra.bio wide) during the past three months > * 139 Workflows > * 141 Jupyter > * 161 RStudio > More RStudio than I would have expected
Sean Davis (15:01:11) (in thread): > Thanks for sharing,@Frederick Tan.
Frederick Tan (15:29:23): > Google Analytics from anvilproject.org: https://docs.google.com/document/d/1dkCKOja8ey9whujhP8cauUREU09IW2iSATGyi5jtk0g
Jenny Smith (16:47:42): > @Jenny Smith has joined the channel
2020-10-05
Ines de Santiago (19:49:57): > @Ines de Santiago has left the channel
2020-10-06
Lori Shepherd (13:41:32): > <!channel> We are up for the technical presentation next week, Oct. 13. I started the slide deck: slides. Did we want to have a discussion here about what we would like to focus on? If you would like to add slides, feel free; we can review the presentation material on the morning of the 13th at our bi-weekly meeting.
Vince Carey (13:49:35): > I have some material on variant annotation and OpenCRAVAT that can go in if there is time. It is a bit bumpy but later this week I will put some slides in.
2020-10-11
Vince Carey (13:52:39): > it seems to me that on Terra, using the us.gcr.io/anvil-gcr-public/anvil-rstudio-bioconductor:0.0.6 runtime, personally installed packages are in /home/rstudio/packages … and that is, by default, the first entry in .libPaths(). I think it would be better to use a version-stamped folder for personally installed packages, as is customary for installed.packages. Thoughts? FWIW I ran into this when switching from a 3.12 environment to a 3.11 environment … /home/rstudio/packages had inappropriate versions of packages. Admittedly this is a highly unlikely event for users, but still the conventions of installed.packages seem appropriate to follow regardless.
Vince Carey (14:22:25): > also, at this time, 80 packages in the image are regarded as out of date by BiocManager::valid
Vince Carey (14:54:57): > After doing the update, the new packages are in /home/rstudio/packages … and BiocManager does not use them in considering whether there are packages to be updated. It still says there are 80 out of date.
Nitesh Turaga (19:22:26): > Hi Vince, > > Just as a follow-up to this: I believe the Broad team made this change so they could cull the number of .libPaths() entries available on the docker image. > > I think we can modify this if needed. They welcome input from us and are open to making changes to the RStudio image; the changes they have made so far were on our advice.
Vince Carey (23:04:50): > Thanks Nitesh. We can consider this on the next internal call. I think my event was unusual so not an urgent matter but I’d like us to think more about conventional/version-tagged library folders.
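One way to get the version-stamped personal library suggested above is to key the folder on the Bioconductor release and put it first on .libPaths() at startup. A minimal R sketch, assuming a hypothetical `bioc-<version>` naming scheme under /home/rstudio/packages; this is not how the AnVIL images are currently configured, and both helper names are hypothetical:

```r
## Hypothetical naming scheme: one personal library per Bioconductor release,
## e.g. /home/rstudio/packages/bioc-3.12, so that switching between a 3.11 and
## a 3.12 runtime does not pick up packages installed for the other release.
versioned_lib_path <- function(root, bioc_version) {
    file.path(root, paste0("bioc-", as.character(bioc_version)))
}

## Create the folder if needed and place it first on the library search path.
## .libPaths() silently drops directories that do not exist, hence dir.create().
activate_versioned_lib <- function(root, bioc_version) {
    lib <- versioned_lib_path(root, bioc_version)
    dir.create(lib, recursive = TRUE, showWarnings = FALSE)
    .libPaths(c(lib, .libPaths()))
    invisible(lib)
}

## Example use, e.g. from an .Rprofile on a runtime where BiocManager is installed:
## activate_versioned_lib("/home/rstudio/packages", BiocManager::version())
```

Keying on the Bioconductor release rather than the R version is a choice made here because each image ships a single R version.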
2020-10-12
Martin Morgan (17:26:03): > To add – Rob expressed a desire to have the distributed packages immutable, so that the user shooting themselves in the foot can (a) know and (b) recover. This also goes along with the idea of a ‘release’, and would further be one strategy for binary package repositories… > > I’d guess that base R old.packages() reports the old packages, even when another library path has newer versions? This could be a feature request for R, or an improvement implemented in BiocManager. If old.packages() actually does not report the masked, older packages, then this is simply a bug in BiocManager. Is it easy to check whether old.packages() is noisy about masked, old packages? > > The default directory structure created by R in the user directory has the R version, but this seems irrelevant on a docker image with a single R version? But maybe you mean Bioconductor version, and I don’t think there’s a ‘standard’ for that?
Vince Carey (17:47:02): > I agree that it seems irrelevant to note the R version. Let’s ignore this issue for now. Indeed, old.packages looks over the whole set of .libPaths and reports what’s out of date. I did not know about old.packages.
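Since old.packages() (and hence, plausibly, BiocManager::valid()) looks across every library on .libPaths(), an older copy left in the image's library can still be reported even when a newer copy sits earlier on the path. A minimal sketch for listing such masked copies, using only base R's installed.packages() and package_version(); masked_older_copies() is a hypothetical helper:

```r
## Sketch: list packages installed in more than one library on .libPaths()
## where a copy is older than the newest installed copy of the same package.
## `ip` is a matrix shaped like the result of utils::installed.packages().
masked_older_copies <- function(ip = utils::installed.packages()) {
    ip <- ip[, c("Package", "LibPath", "Version"), drop = FALSE]
    ## keep only packages that appear in more than one library
    dup <- ip[, "Package"] %in% ip[duplicated(ip[, "Package"]), "Package"]
    ip <- ip[dup, , drop = FALSE]
    if (nrow(ip) == 0L)
        return(ip)
    ## newest version of each duplicated package
    newest <- tapply(ip[, "Version"], ip[, "Package"], function(v) {
        as.character(max(package_version(v)))
    })
    is_older <- package_version(ip[, "Version"]) <
        package_version(as.character(newest[ip[, "Package"]]))
    ip[is_older, , drop = FALSE]
}

## On a runtime: masked_older_copies()
```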
Vince Carey (17:58:34): > I finished my contributions to the slide deck. Maybe I will try shinytest before we meet tomorrow….
2020-10-13
Lori Shepherd (08:26:20): > I’ll work on a quick agenda for today’s 11 am meeting – given that we have the tech presentation today, I’m guessing most of today’s meeting will be a quick bi-weekly update and then a review of the material we plan on presenting this afternoon
Lori Shepherd (08:30:52): > https://docs.google.com/document/d/1tvRvUJwBzF6aMMxksJfgm5voK6gS_vdRzx3zBxahivI/edit?usp=sharing
Kayla Interdonato (09:32:58): > @Kayla Interdonato has joined the channel
Martin Morgan (15:27:45): > <!here> for those interested in continuing work on the slides, we can use BlueJeans at https://bluejeans.com/2024711962 starting at about 3:30
Sehyun Oh (22:29:15): > I want to work on my workspace outside of the Terra UI. It seems like I need to authenticate, but I’m not familiar with ‘using custom handles’, which is recommended by the authenticate function. Can someone help me with this?
2020-10-14
Martin Morgan (06:25:03): > @Sehyun Oh can you be a bit more specific about what you’re trying to do? I’m not sure what authenticate function you’re referring to?
Sehyun Oh (13:36:20): > @Martin Morgan Oh… please ignore the authenticate function part - I was confused with something else. :sweat_smile:
Sehyun Oh (13:36:35): > I want to access avbucket() from my local RStudio, which is already activated with my gmail account. It seems like my gmail account causes a 401 error, so I’m trying to switch to Terra’s pet- account, and now it asks for a valid credential. I’m lost from here…
Martin Morgan (13:50:43): > AnVIL knows me as mtmorgan.bioc@gmail.com. I have > > > gcloud_cmd("auth", "list") > [1] " Credentialed Accounts" > [2] "ACTIVE ACCOUNT" > [3] " martin.t.morgan@gmail.com" > [4] "* mtmorgan.bioc@gmail.com" > [5] " mtmorgan.rpci@gmail.com" > [6] "" > [7] "To set the active account, run:" > [8] " $ gcloud config set account `ACCOUNT`" > [9] "" >
> which shows that gcloud (i.e., the gcloud command line tool I’ve installed on my computer) knows the right account, and I confirm this with > > > gcloud_account() > [1] "mtmorgan.bioc@gmail.com" >
> and I could have chosen the correct account with > > gcloud_account("mtmorgan.bioc@gmail.com") >
> Similarly for my billing account bioconductor-rpci-anvil
> > > gcloud_project() > [1] "bioconductor-rpci-anvil" >
> (the gory details are available with gcloud_cmd("config", "list")) > > I have set my workspace namespace/name as > > > avworkspace("bioconductor-rpci-anvil/Bioconductor-Package-AnVIL") > [1] "bioconductor-rpci-anvil/Bioconductor-Package-AnVIL" >
> and then > > > avbucket() > [1] "gs://fc-f9eb5fe2-fff1-4b33-94af-d8226b8c1ff6" >
> returns the bucket associated with that workspace…
Sehyun Oh (13:55:57): > Hmm… this is what happens in my local RStudio:
Sehyun Oh (13:55:59): > > > gcloud_cmd("auth", "list") > [1] " Credentialed Accounts" "ACTIVE ACCOUNT" > [3] "* shbrief@gmail.com" "" > [5] "To set the active account, run:" " $ gcloud config set account `ACCOUNT`" > [7] "" > > gcloud_account() > [1] "shbrief@gmail.com" > > gcloud_project() > [1] "waldronlab-terra-rstudio" > > avworkspace() > [1] "waldronlab-terra-rstudio/mtx_workflow_biobakery_ver3" > > avbucket() > Error: lexical error: invalid char in json text. > <!DOCTYPE HTML PUBLIC "-//IETF/ > (right here) ------^ > Called from: parse_string(txt, bigint_as_char) > Browse[1]> >
Sehyun Oh (13:56:13): > From RStudio running in Terra:
Sehyun Oh (13:56:27): > > > gcloud_cmd("auth", "list") > [1] " Credentialed Accounts" > [2] "ACTIVE ACCOUNT" > [3] "* pet-113017026745807664519@waldronlab-terra-rstudio.iam.gserviceaccount.com" > [4] "" > [5] "To set the active account, run:" > [6] " $ gcloud config set account `ACCOUNT`" > [7] "" > > gcloud_account() > [1] "pet-113017026745807664519@waldronlab-terra-rstudio.iam.gserviceaccount.com" > > gcloud_project() > [1] "waldronlab-terra-rstudio" > > avworkspace() > [1] "waldronlab-terra-rstudio/mtx_workflow_biobakery_ver3" > > avbucket() > [1] "gs://fc-071d1d53-e186-44ad-8951-d85538f85502" >
Sehyun Oh (13:58:22): > Any idea what’s going on here?
Martin Morgan (14:24:13): > Can you tell me the output of traceback() after the error in RStudio?
Sehyun Oh (14:30:37): > > > traceback() > 9: parse_string(txt, bigint_as_char) > 8: parseJSON(txt, bigint_as_char) > 7: parse_and_simplify(txt = txt, simplifyVector = simplifyVector, > simplifyDataFrame = simplifyDataFrame, simplifyMatrix = simplifyMatrix, > flatten = flatten, ...) > 6: fromJSON(content(x, as = as, encoding = "UTF-8")) > 5: as.list.response(response) > 4: as.list(response) > 3: as.list(response) at av.R#14 > 2: .avstop_for_status(response, "avbucket") at av.R#495 > 1: avbucket() >
Martin Morgan (15:05:25): > I’d guess that Terra() fails, and then that AnVIL:::.gcloud_access_token() fails? > > If that’s not a good guess, you might try > > debugonce(avbucket) > avbucket() > > and then step through by typing n for each command. Take a look at each output and see that it seems to make ‘sense’, and especially print response before calling .avstop_for_status()
Sehyun Oh (15:10:53): > AnVIL:::.gcloud_access_token() didn’t fail, and the response status is 401…
Sehyun Oh (15:15:46): > Oh… it works from a different local machine somehow. I’ll check a little more what was happening on the other one. Thanks!
2020-10-16
Martin Morgan (13:17:43): > @Qian Liu and @BJ Stubbs might be interested in working together on launching WDL workflow submissions from R
Sean Davis (14:46:56) (in thread): > That would be a fantastic investment, IMHO.
2020-10-19
Martin Morgan (13:32:47): > @BJ Stubbs is there something (in TerraPlane?) where you’ve worked out workflow job submission via the REST API?
2020-10-20
Kevin Blighe (11:32:21): > @Kevin Blighe has joined the channel
Martin Morgan (14:29:24): > <!here>looking for summary activity / blockers for the call later today? I have > * DRS blocking issue partly resolved (thanks arula) > * Bioconductor release scheduled for Oct 28 > * AnVILPublish (publish R packages as AnVIL workspaces) accepted for next Bioconductor release > * Continuing work on binary package install > * Starting work on workflow submission via the AnVIL package
Nitesh Turaga (14:30:40): > Do we want to add that reorg of the docker images for RStudio we spoke about on Friday?
Martin Morgan (14:47:42): > sure…
Vince Carey (15:08:40): > nothing here
Peter Hickey (17:27:07): > @Peter Hickey has left the channel
2020-10-26
Lori Shepherd (11:58:54): > Started a draft agenda for tomorrow’s meeting – please add items to discuss: agenda
Nitesh Turaga (12:38:13): > I might miss tomorrow’s meeting because of the release for Bioc. Just a heads up. I’ll keep track of progress of release and join if I can.
2020-10-27
Nitesh Turaga (15:04:19): > I guess we are set for standup?? Anything else we want to add to the agenda?
Nitesh Turaga (15:14:35): > <!here>Let me know if we think of anything.
Vince Carey (15:53:49): > will do
Vince Carey (15:58:29): > looks good. i will be a little late
2020-10-28
Tim Triche (10:09:45): > @Tim Triche has left the channel
2020-10-29
Frederick Tan (14:43:41): > NCPI has a session on Cloud Costs that I think would appreciate AnVILBilling … is there a screenshot of the exploratory app browse_reck()?
Frederick Tan (14:44:13) (in thread): > @BJ Stubbs @Vince Carey perhaps?
Vince Carey (21:49:33) (in thread): > I can get something shortly
Vince Carey (22:00:25) (in thread): - File (PNG): Screenshot from 2020-10-29 21-58-47.png - File (PNG): Screenshot from 2020-10-29 21-58-11.png
Vince Carey (22:04:42) (in thread): > @Frederick Tan itemized table and interactive figure ^^ This is part of the AnVILBilling package in Bioconductor 3.12; the function is browse_reck(), but one needs a fair amount of setup (an additional project with BigQuery scope, and enabling the billing project to ship daily records to BigQuery) to use this for a fixed clock interval for a specific project. I think more fine-grained cost data are available, but only the next day, and more details would have to go into the app.
Frederick Tan (22:32:10) (in thread): > Beautiful, thank you @Vince Carey!
2020-11-04
Sehyun Oh (09:44:11): > When I use a Terra service (e.g. terra$cloneWorkspace), how can I extract the request information?
Marcel Ramos Pérez (10:18:31): > Try using httr::content
Sehyun Oh (10:21:52): > Is there any way to get Request URL?
Marcel Ramos Pérez (10:23:08): > I don’t see a ‘Request URL’ in the response: https://api.firecloud.org/#/Workspaces/cloneWorkspace
Sehyun Oh (10:26:08): > Yeah, I don’t think the request URL is part of the response body. Because the Terra API documentation is sparse, I have a hard time figuring out which API call is used for a specific Terra service. FYI, I’m doing this to understand what kind of parameters are required.
Marcel Ramos Pérez (10:32:08): > What I’ve done for cBioPortal is to comb over the API swagger website. The other thing you can do is to open the developer tools in the browser with Ctrl + Shift + I and go to the Network tab as you use the Terra website. It will show you what requests are sent out as you perform operations on the website.
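Besides the browser Network tab, the request can also be read back in R: AnVIL's Terra() operations return httr response objects (the traceback above goes through as.list.response()), and an httr response keeps the request it was built from. A minimal sketch; describe_request() is a hypothetical helper that relies only on the `request$method`, `request$url` and `status_code` fields of an httr response:

```r
## Hypothetical helper: summarise which HTTP request produced an httr response.
## resp$request$url is the URL as sent; resp$url is the final URL after any
## redirects; status_code is the server's HTTP status (e.g. 401 when the
## credentials are not accepted).
describe_request <- function(resp) {
    sprintf("%s %s -> HTTP %d",
            resp$request$method, resp$request$url, as.integer(resp$status_code))
}

## Usage (arguments elided; requires AnVIL and working credentials):
## library(AnVIL)
## terra <- Terra()
## resp <- terra$cloneWorkspace(...)
## describe_request(resp)
## httr::content(resp)   # parsed response body
```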
Nitesh Turaga (12:34:05): > The new AnVIL rstudio and jupyter images should be available shortly. They are undergoing review right now.
Frederick Tan (12:52:30) (in thread): > To clarify, this is still RStudio 1.2?
Nitesh Turaga (13:26:15) (in thread): > 1.3
Nitesh Turaga (13:26:48) (in thread): > > RStudio Server > Version 1.3.1093 > © 2009-2020 RStudio, PBC > "Apricot Nasturtium" (aee44535, 2020-09-17) for Ubuntu Bionic > Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36 >
Frederick Tan (13:35:06) (in thread): > Awesome!:champagne:
2020-11-05
Sean Davis (11:04:24) (in thread): > You can get the computable version of the Terra docs here: https://api.firecloud.org/api-docs.yaml
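With the computable spec, the endpoint behind an operation such as cloneWorkspace can be looked up directly. A minimal sketch that walks the `paths` section of a Swagger/OpenAPI document already parsed into an R list; list_operations() is hypothetical, and parsing assumes the yaml package:

```r
## Hypothetical helper: one row per HTTP operation in a parsed Swagger/OpenAPI
## spec, giving the method, the path template, and the operationId.
list_operations <- function(spec) {
    verbs <- c("get", "put", "post", "delete", "patch", "head", "options")
    rows <- list()
    for (path in names(spec$paths)) {
        item <- spec$paths[[path]]
        ## path items may also hold non-verb keys such as "parameters"
        for (verb in intersect(names(item), verbs)) {
            op_id <- item[[verb]]$operationId
            rows[[length(rows) + 1L]] <- data.frame(
                method = toupper(verb), path = path,
                operationId = if (is.null(op_id)) NA_character_ else op_id,
                stringsAsFactors = FALSE)
        }
    }
    do.call(rbind, rows)
}

## Usage (needs the yaml package and network access):
## spec <- yaml::read_yaml("https://api.firecloud.org/api-docs.yaml")
## ops <- list_operations(spec)
## ops[grepl("clone", ops$operationId, ignore.case = TRUE), ]
```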
2020-11-09
Lori Shepherd (11:17:58): > Agenda for tomorrow’s meeting: https://docs.google.com/document/d/155GpgMGsjgp2re3jFQ2HuT0wACykmrRycZ_2AaHe4mI/edit?usp=sharing
Nitesh Turaga (15:17:52): > The docker image updates for RStudio are facing further delays because of the Leo team: https://github.com/anvilproject/anvil-docker/pull/21. But I’ll keep everyone updated as soon as they are ready.
2020-11-10
Lori Shepherd (11:01:22): > Link for the meeting: https://bluejeans.com/480153337?src=calendarLink
Sehyun Oh (11:44:33): > bioBakeryR vignette: https://rpubs.com/shbrief/bioBakeryR
Sehyun Oh (11:45:53): > bioBakeryR Git repo: https://github.com/shbrief/bioBakeryR
Marcel Ramos Pérez (11:47:03) (in thread): > It’s private
Sehyun Oh (11:50:28) (in thread): > Oops… sorry. Fixed.
2020-11-14
Vince Carey (04:10:03): > the AnVIL_admin page needs some updating - File (PNG): Screenshot from 2020-11-14 04-09-00.png
Vince Carey (04:46:09): > igraph cannot load in the latest anvil docker image – revised … this is apparently not reproducible but i leave the material in case it is useful history - File (PNG): Screenshot from 2020-11-14 04-45-09.png
Vince Carey (04:46:25): - File (PNG): Screenshot from 2020-11-14 04-44-34.png
Vince Carey (04:48:12): > I was able to install igraph from source, then - File (PNG): Screenshot from 2020-11-14 04-47-24.png
Vince Carey (04:56:50): > I am hoping that I did something wrong. I am going to start over – but before I do - File (PNG): Screenshot from 2020-11-14 04-56-10.png
Vince Carey (05:15:31): > when I run R within us.gcr.io/anvil-gcr-public/anvil-rstudio-bioconductor:0.0.8 on my laptop I do not run into this problem. igraph is installed as a binary package but loads properly. I will try to reproduce on AnVIL.
Vince Carey (05:38:57): > I started a completely new runtime and was not able to reproduce the error. Maybe just bad luck. It seems that an incorrect binary image of igraph came in when I asked to update all outdated packages as BiocManager::install proceeds. But this did not recur on second attempt.
Martin Morgan (12:51:31): > were you re-using a version of igraph previously installed with the old docker image, e.g., on the persistent disk (which is, I think, at /home/rstudio on the RStudio image)? The screenshot doesn’t show the installation of igraph…
2020-11-15
Vince Carey (06:34:30): > that diagnosis sounds correct
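To check for this kind of stale copy, one can ask which library a package is actually found in and what its DESCRIPTION records about its build. A minimal sketch using base R's find.package() and utils::packageDescription(); package_origin() is a hypothetical helper, and /home/rstudio as the persistent-disk location comes from Martin's guess above:

```r
## Hypothetical helper: report the library a package is found in, its version,
## and the "Built" field from its DESCRIPTION (R version, platform and date of
## the build). A library under the persistent disk (e.g. /home/rstudio/...)
## with an old build date suggests a copy left over from an earlier image.
package_origin <- function(pkg) {
    path <- find.package(pkg)   # first match on .libPaths() (or loaded copy)
    desc <- utils::packageDescription(pkg, lib.loc = dirname(path))
    c(package = pkg, library = dirname(path),
      version = desc$Version, built = desc$Built)
}

## e.g. package_origin("igraph")
```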
2020-11-16
Vince Carey (13:07:59): > I am working on a PR to AnVILPublish that will allow a richer workspace DESCRIPTION element: the idea is that if a README.md is present in the package toplevel folder, it is ingested and pasted ahead of the parsed DESCRIPTION data from the package. It would be nice to have the capacity to add images to the workspace DESCRIPTION.
Martin Morgan (13:51:41): > Sounds good. Without thinking deeply I would have put the README.md after the current Description section, with an emphasis on provenance first. Maybe the provenance info could be presented more compactly? I think images should be doable at this stage, depending on where they are stored (in the package?) Look forward to the PR.
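The README.md idea can be sketched as combining the package's DESCRIPTION fields with an optional top-level README.md, with a switch for the two orderings discussed here (README first, or provenance first). This is a hypothetical illustration only, not AnVILPublish's actual code; workspace_description() and its markdown layout are assumptions:

```r
## Hypothetical sketch (not AnVILPublish code): build workspace description
## markdown from DESCRIPTION plus an optional README.md in the package root.
workspace_description <- function(pkg_dir, readme_first = TRUE) {
    dcf <- read.dcf(file.path(pkg_dir, "DESCRIPTION"),
                    fields = c("Package", "Title", "Version", "Description"))
    ## provenance block parsed from DESCRIPTION
    provenance <- c(
        sprintf("# %s: %s", dcf[1, "Package"], dcf[1, "Title"]),
        "",
        sprintf("Version: %s", dcf[1, "Version"]),
        "",
        dcf[1, "Description"])
    readme_path <- file.path(pkg_dir, "README.md")
    if (!file.exists(readme_path))
        return(paste(provenance, collapse = "\n"))
    readme <- readLines(readme_path, warn = FALSE)
    parts <- if (readme_first) c(readme, "", provenance) else c(provenance, "", readme)
    paste(parts, collapse = "\n")
}
```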
Vince Carey (15:44:59): > Thanks, I will have something soon.
2020-11-17
Martin Morgan (09:53:53): > <!channel>– please add items to report during standup as a thread to this message
Martin Morgan (09:54:28) (in thread): > New avworkflow_*() functionality for updating workflow configurations
Vince Carey (09:58:27) (in thread): > Working through pyAnVIL FHIR utilities by Brian Walsh, anticipating exposing/testing key elements to R users via basilisk in a new workspace
Vince Carey (10:11:07) (in thread): > collectl-enabled container extending terra-bioconductor container for instrumentation: vjcitn/instr in dockerhub
Vince Carey (10:12:48) (in thread): > enhancement to AnVILPublish to incorporate package-level README.md content in workspace DESCRIPTION – small blocker – does DESCRIPTION markdown processing enable incorporation of image files local to the workspace?
Frederick Tan (10:19:26): > @BJ Stubbs Could you elaborate more on this setup step for AnVILBilling? > > 4. Make sure the user of this software has the BigQuery scope on the billing project
Frederick Tan (10:19:31) (in thread): > Don’t see it at cloud.google.com/billing/docs/how-to/export-data-bigquery-setup
Lori Shepherd (10:30:32): > Thinking ahead – we wanted to start brainstorming for Y3 goals. I started a Project Board for a Year 3 Overview. I would encourage everyone to start thinking about these broad topic categories for a discussion at next week’s meeting. Cheers
Vince Carey (10:30:39) (in thread): > Strategizing with JHU OpenCRAVAT developers how to support resources for flexible interactive variant annotation in the AnVIL framework
Martin Morgan (11:07:45) (in thread): > Rmd in /notebooks bucket, cloned w/ notebooks?
Frederick Tan (12:51:41) (in thread): > @Vince Carey @BJ Stubbs Welcome comments on this diagram - File (PNG): GoogleBilling.png
Vince Carey (12:53:45) (in thread): > the billing project must enable the BigQuery capabilities in GCP IAM. I will try to give some details tomorrow.
BJ Stubbs (18:58:08) (in thread): > @Frederick Tan: > * “BigQuery Admin role for the Cloud project that contains the BigQuery dataset that will be used to store the Cloud Billing pricing data > * resourcemanager.projects.update permission for the Cloud project containing the target dataset. This is included in the roles/editor role.” - Attachment (Google Cloud): Controlling access to datasets | BigQuery | Google Cloud
BJ Stubbs (18:59:11) (in thread): > @Frederick Tan basically, under the Google billing account associated with AnVIL, you need to create a new Google project, and then give that new project scope to use BigQuery
2020-11-19
Vince Carey (08:16:41): > AnVILBilling has been updated in devel and RELEASE_3_12 so that browse_reck works. The getBilling function has been exported and is used to simplify aspects of the app. But the code in the app is still too complex.
2020-11-24
Lori Shepherd (10:47:28): > Here is the agenda for today’s 11 am meeting: https://docs.google.com/document/d/1qzzWD7EfxTx8XJmNDwXGMUEdFEEnq0zQyEAjhm7BtcQ/edit?usp=sharing
2020-12-01
Martin Morgan (09:37:54): > <!channel>please add updates for today’s 4pm standup as comments to this thread
Vince Carey (11:03:26) (in thread): > https://app.terra.bio/#workspaces/use-strides/instr_strides – the description for this workspace shows how Rcollectl (under development at github.com/vjcitn) can be run within AnVIL to generate high-resolution data on resource consumption in real time. This is for the instrumentation aim.
Vince Carey (11:03:55) (in thread): > There is an issue recording network traffic. I am not able to do this in Terra as well as I can on my laptop.
Vince Carey (11:07:36) (in thread): > https://app.terra.bio/#workspaces/landmarkanvil2/BiocRepoMaker has a description that gives some details on how to generate the binaries for a repo. It is not fully current; github.com/vjcitn/BiocBuildTools is the next generation; some components may contribute to the k8s approach under development by Nitesh.
Qian Liu (16:01:13): > Are we using the BlueJeans link shown above? @Lori Shepherd thanks!
Qian Liu (16:03:01): > Hi @Lori Shepherd, it seems you are organizing the AnVIL meeting series. What is the schedule; do we meet at 11am every 2 weeks? Thanks!
2020-12-07
Lori Shepherd (11:06:48): > Next meeting tomorrow. Here is the draft agenda: https://docs.google.com/document/d/1_j1i1fLvQY_gl9UBVwzUcy93ZxhLTSKrWLb9nw3hd0s/edit?usp=sharing
Lori Shepherd (11:06:55) (in thread): > Yes every two weeks
2020-12-08
Qian Liu (07:44:58) (in thread): > @Lori Shepherd can you grant me access to the agenda please?
Marcel Ramos Pérez (12:29:38) (in thread): > Can you always make the agenda public? I also need permission to see it.
2020-12-09
Levi Waldron (09:50:41): > I recall @Vince Carey and @Kayla Interdonato have put some Bioconductor training materials on AnVIL. How can I find these? I was wondering whether I could make a useful contribution by transporting Bioconductor workshops and workflows onto AnVIL workspaces and indexing them in a way that makes them more easily findable, and I have a potential part-time research assistant who could do this kind of work.
Levi Waldron (09:53:13): > Also FYI I’m teaching two courses next semester where I could do some eating of our dogfood (and have the students eat it too :smile:), especially since RStudio is available now. Any thoughts welcome.
Vince Carey (09:55:26): > See https://anvil.terra.bio/#workspaces/landmarkanvil2/MaGIC%20Jamboree%202020%20copyVJCedited which is shared to Bioconductor_User@firecloud.org – this was for the jamboree
Frederick Tan (09:55:31): > Just created a Google Doc to try and convert @Kayla Interdonato’s workspace into a Getting Started exercise on anvilproject.org/learn :slightly_smiling_face: https://docs.google.com/document/d/19hjStZNUz5Eqa4MhIcWRFhWz-ymYcgOtzNDRBR73t4M
Vince Carey (09:55:54): > I don’t know where the official workspace that was used in the jamboree is.
Frederick Tan (09:57:36) (in thread): > @Vince Carey Are you referring to this workspace? https://anvil.terra.bio/#workspaces/anvil-outreach/MaGIC%20Jamboree%202020
Frederick Tan (09:58:02) (in thread): > There’s also this “organizer”: https://anvilproject.org/learn/training/magic-jamboree-june-2020
Frederick Tan (09:59:08) (in thread): > @Levi Waldron Thoughts on how you’ll handle Cloud Credits? Sounds like a nice opportunity to get our ducks in order …
Kayla Interdonato (10:46:57): > This is the workspace where I’ve been working on the DESeq2 analysis: https://anvil.terra.bio/#workspaces/bioconductor-rpci-anvil/Bioconductor-Workflow-DESeq2
Martin Morgan (10:56:43) (in thread): > Bioconductor-Workflow-DESeq2 (bulk RNA-seq) includes both workflows (for fastq -> count matrix using salmon) & interactive notebooks. @Levi Waldron you’re not a member of Bioconductor_User; happy to add you if you let me know your AnVIL email address. This will provide access to a number of Bioc workspaces…
Vince Carey (10:56:49) (in thread): > @Frederick Tan yes, you got it right above. Maybe add Bioconductor_User@firecloud.org to writers(?) if not already there….
Frederick Tan (11:05:30) (in thread): > @Martin Morgan Could you also add jtleek@gmail.com and cox@carnegiescience.edu?
Ludwig Geistlinger (12:32:20) (in thread): > would be great if I could also be added? (ludwig.geistlinger@gmail.com)
Martin Morgan (12:39:00) (in thread): > added! Even better, @Frederick Tan and @Marcel Ramos Pérez are already ‘admin’ and can add other users (under Profile -> Group -> Bioconductor_User…)
Frederick Tan (12:41:52) (in thread): > :thumbsup:
Frederick Tan (12:42:14) (in thread): > Would you like to be notified when someone is added?
Marcel Ramos Pérez (12:43:24) (in thread): > Added! @Ludwig Geistlinger
Ludwig Geistlinger (12:44:03) (in thread): > Great, many thanks!
Martin Morgan (12:45:19) (in thread): > no need to notify me, just add – I think of this as increasing visibility without going fully public; there is no $$ cost
Frederick Tan (12:53:50) (in thread): > Great! Just making sure there isn’t any sensitive information that this group would grant access to.
Levi Waldron (13:08:39) (in thread): > @Frederick Tan I haven’t really thought about Cloud credits, other than that I could share a workspace with permission to compute on my own billing account. I’d be happy to implement a better solution though!
Levi Waldron (13:09:07) (in thread): > @Marcel Ramos Pérez would you add me too? (lwaldron.research@gmail.com)
Frederick Tan (13:15:32) (in thread): > Added @Levi Waldron!
Frederick Tan (13:17:10) (in thread): > Also, would love to find a way to offer “block grants” to run courses and workshops. Let me know if you’re interested, @Levi Waldron
Levi Waldron (14:41:15) (in thread): > Yes I’d be happy to help pilot that@Frederick Tan- costs will be low for these courses but it would be good to figure out, not least because if I didn’t already have billing set up, my school (and others I assume) make it a pain at first to pay for Cloud services.
2020-12-10
Lori Shepherd (06:45:33) (in thread): > marcel which email do you want me to use for access?
Lori Shepherd (06:45:43) (in thread): > you can PM me with it
2020-12-15
Martin Morgan (11:41:43): > updates for the tech call this afternoon?
Martin Morgan (11:42:03) (in thread): > * docker image libgit2 issue > * Sehyun’s workshop at EuroBioc2020 > * Sean’s Orchestrate
Vince Carey (14:38:41) (in thread): > nothing to report, really.
2020-12-17
Manojkumar Selvaraju (11:14:12): > @Manojkumar Selvaraju has joined the channel
2020-12-21
Vince Carey (12:09:15) (in thread): > looks like you are working on Arabidopsis here? i have been digging up the airway fastq files for a human example
2020-12-22
Lori Shepherd (08:55:54): > AnVIL meeting today at 11 am EST. Here is the agenda; feel free to add updates or agenda items
Qian Liu (09:56:21) (in thread): > i’ll be in another meeting and miss this one. Thanks!
Lori Shepherd (11:03:10): > link for meeting starting now: https://bluejeans.com/480153337?src=calendarLink
2021-01-02
Vince Carey (09:47:30): > The EuroBioc2020 workshops would seem to offer some good candidates for AnVIL workspaces. AnVILPublish may be able to take on this task. Here is an example: https://kstreet13.github.io/bioc2020trajectories/articles/workshopTrajectories.html … What do we need to do to minimize the effort required to get folks running quickly in something like this? Is the workspace markdown in the description a limiting factor? It would be good to understand the technical gaps. @Frederick Tan your thoughts welcome. - Attachment (eurobioc2020.bioconductor.org): EuroBioc2020 > European Bioconductor Virtual Meeting 14-18 December 2020 - Attachment (kstreet13.github.io): Trajectory inference across conditions: differential expression and differential progression > bioc2020trajectories
Vince Carey (09:49:22): > One question I have is whether the docker containers defined for the workshops can be used as is in AnVIL, or do they need some modifications? If the latter, can we systematize this in conjunction with AnVILPublish?
Vince Carey (09:52:11): > To bump this up, conceptually, my sense is that the requirements of modern analysis workflows aren’t as well-captured by packages as by the combination of package+container. I find this somewhat disturbing but if it is correct, we should consider how to accommodate this.
Vince Carey (09:54:35): > And to give a little bit of context, very experienced bioinformaticians in my lab have proposed that the reproducible locus of a complex workflow be a conda environment. I don’t know enough to critique this but I think it is too weak.
Martin Morgan (12:34:34) (in thread): > tagging @Nitesh Turaga for ^^
Vince Carey (13:50:58): > This is the resource consumption on a 16 core 60GB RAM AnVIL instance to perform quantification of a single human RNA-seq fastq file from the airway workflow. - File (PNG): Screenshot from 2021-01-02 13-47-46.png
Vince Carey (13:52:30): > The DESCRIPTION at https://anvil.terra.bio/#workspaces/landmarkanvil2/Bioconductor-Package-Rcollectl gives the details of the analysis; the display is produced using functions in Rcollectl
Vince Carey (13:53:49): > Essentially we start collectl using cl_start and record the process id, use system() to invoke snakemake, and cl_stop/cl_parse/plot_usage to get the figure. Modeling consumption patterns over larger tasks will ensue.
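The cl_start / system() / cl_stop sequence above is a general "bracket the workload with a background monitor" pattern. A minimal Python sketch of that pattern, assuming placeholder commands in place of collectl and snakemake (this is not the Rcollectl API): start the monitor, record its pid, run the workload, then stop the monitor.

```python
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def monitored(monitor_cmd):
    """Keep a background monitor process running for the duration of a with-block.

    monitor_cmd is a hypothetical stand-in for a sampler such as collectl;
    this is not the Rcollectl API.
    """
    monitor = subprocess.Popen(monitor_cmd)  # plays the role of cl_start()
    try:
        yield monitor  # caller can record monitor.pid
    finally:
        monitor.terminate()  # plays the role of cl_stop()
        monitor.wait()


# Placeholders: a sleeping Python process stands in for the monitor, and a
# one-line script stands in for the snakemake run launched via system().
MONITOR = [sys.executable, "-c", "import time; time.sleep(60)"]
WORKLOAD = [sys.executable, "-c", "print('quantifying...')"]

with monitored(MONITOR) as monitor_proc:
    pid = monitor_proc.pid
    result = subprocess.run(WORKLOAD, capture_output=True, text=True)

print(f"monitor pid {pid}; workload exit code {result.returncode}")
```

Parsing and plotting the recorded samples (cl_parse / plot_usage in Rcollectl) would follow the with-block; that step is not sketched here.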
2021-01-03
Nitesh Turaga (14:04:36) (in thread): > They need some modification. The docker containers in the AnVIL use some settings on top of the bioconductor_docker images.
Vince Carey (20:10:19): > The content of the Rcollectl workspace description is at https://vjcitn.github.io/Rcollectl/ … the workspace is not currently the result of AnVILPublish. - Attachment (vjcitn.github.io): Rcollectl: instrumentation for R processes on linux systems > Provide functions to obtain instrumentation data on processes in a unix environment. Parse output of a collectl run. Vizualize aspects of system usage.
2021-01-04
Martin Morgan (13:26:31) (in thread): > Is that just customization, like defining the current workspace in an environment variable, that is not important for functionality, or do arbitrary images just not work?
Nitesh Turaga (14:48:53) (in thread): > Arbitrary images just do not work. They need specific rstudio PORT (8001) setting and some rstudio server configurations to make it work. > > But, these are minimal settings.
Sean Davis (20:28:21) (in thread): > Would it be useful to work with the AnVIL folks to resolve some of these rough edges? Google, for example, injects the PORT as an environment variable, so the default just works, but it is configurable. It would be nice to have such environment variables available so that one could design docker containers that can be reused within and outside of AnVIL.
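The environment-variable approach Sean describes can be sketched like this. It is an illustration only, assuming a variable named PORT and a fallback of 8001 (the RStudio port Nitesh mentions above); it is not how Terra or Leonardo actually configures containers.

```python
import os

DEFAULT_PORT = 8001  # assumed fallback, matching the AnVIL RStudio port mentioned above


def resolve_port(environ=None, default=DEFAULT_PORT):
    """Return the port to listen on, preferring a PORT environment variable."""
    environ = os.environ if environ is None else environ
    raw = environ.get("PORT")
    if raw is None or raw.strip() == "":
        return default  # nothing injected: keep the container's built-in default
    port = int(raw)
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    return port


print(resolve_port({}))                # uses the default
print(resolve_port({"PORT": "8787"}))  # uses the injected value
```

The same image then runs unchanged inside AnVIL (no PORT set) and on a platform that injects PORT.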
Sean Davis (20:32:10) (in thread): > As an alternative, in the Orchestra platform, I built things to allow mapping to arbitrary ports so that containers that have additional functionality beyond a browser can be used and those that come from third parties and, therefore, have different port mappings, can be used without access to the original Dockerfile.
2021-01-05
Lori Shepherd (07:22:22): > Meeting today at 11 am EST – Here is the link to the Agenda
Nitesh Turaga (08:51:40) (in thread): > I could ask about this.
Nitesh Turaga (09:55:17) (in thread): > Their biggest concern is security of launching arbitrary images. They are easily able to launch custom images already which inherit from certain “base” images like anvil-rstudio-bioconductor.
Vince Carey (11:36:27): > slideset for anvil standup today https://docs.google.com/presentation/d/1klyPvsL2Z72VrGAwzd6ZXnxODqtTdz9EL0b1g2XxUmo/edit?usp=sharing - File (Google Slides): BiocAnVILJan5
Vince Carey (15:52:29): > I am going to lock the slides shortly. They are “anyone with link can edit” but I will change to “comment”
Nitesh Turaga (15:57:21): > Sounds good Vince.
Vince Carey (16:30:21): > This is going long. I am going to truncate the material to be presented. Sehyun’s piece will probably take most of our time. @Sehyun Oh @Levi Waldron I will try to manage the time.
Frederick Tan (16:47:26) (in thread): > @Vince Carey Big picture … I think this is great! A lot of details I don’t understand. Starting with Kayla’s DESeq2 workspace to familiarize the Outreach Group.
Vince Carey (16:48:42) (in thread): > Thanks Fred… any questions you have, just fire away.
Levi Waldron (18:13:37): > Well done, thank you @Vince Carey and @Sehyun Oh!!
Vince Carey (18:14:01): > yes, thank you Sehyun, I think that was very informative!
Sehyun Oh (19:20:48): > Thanks! Hope ‘runnable’ workflow packages can attract some new R users…:crossed_fingers:
2021-01-06
Martin Morgan (12:05:47): > https://the-anvil.slack.com/archives/CGM728FJ4/p1609944683062400for ASHG and SACNAS conference workshop channels
2021-01-12
Vince Carey (05:23:37): > Stand-up items? I have fallen down on the ASHG workshop concept. My impression had been that AnVIL needed us because galaxy could not be done again but this does not seem correct. The scope of the workshop and target audience are foggy to me. I will try to put a little more time into this today. Items that I can report on include a) renovation of the OpenCRAVAT-based variant annotation workspace, b) instrumentation workspace demonstrates RNA-seq quantification with FASTQ from SRA in google storage – two follow-on concerns i) limited flexibility of #cores/ram allocations available in the cloud environment UI, ii) searching SRA with the omicidx API of Sean Davis, c) progress on FHIR (limited)
Frederick Tan (08:28:53) (in thread): > @Vince Carey If I’m remembering correctly, the three ASHG workshop ideas we’ve lightly discussed are > * Single cell RNA-seq with Bioconductor > * Genome assembly with Galaxy > * BDCat/AnVIL interoperability
Vince Carey (10:05:11) (in thread): > Thanks @Frederick Tan. I am not a single-cell expert so I would not take on the first one. BDCat sounds very interesting, is Brian O up for that?
Frederick Tan (10:06:11) (in thread): > Not sure if Brian O’Connor is up for it … presumably Allie is part of the conversation and would be the main developer of the content.
Frederick Tan (10:08:14) (in thread): > re: Single cell with Bioconductor … that was just one idea since you had put in the work to port OSCA … have a topic you would be willing to take on (bulk RNA-seq, variant annotation, etc.)?
Vince Carey (13:51:55) (in thread): > Variant annotation is a possibility, and I renovated the shiny-oriented workspace. But it doesn’t really showcase AnVIL in any specific way.
Frederick Tan (14:11:01) (in thread): > What do you think about highlighting all of the AnVIL specific Bioconductor packages and how they move forward the “workspace as software” idiom
Vince Carey (14:58:05) (in thread): > For AnVIL days?
Frederick Tan (15:12:47) (in thread): > ASHG
Frederick Tan (15:13:17) (in thread): > Not sure if it’s the best audience, but I think it’s a compelling story to tell (and retell)
Vince Carey (16:00:41) (in thread): > My sense is that objective #1 of an ASHG attendee is advances in genetic epi, #2 would be clinical genetics, #3 would be genomic approaches to exposing mechanisms of genetic effects. Bioc would best be used for #3 … the AnVIL* packages seem to me valuable for developers and maybe strategists, but I don’t think there will be too many of those at ASHG.
2021-01-14
Frederick Tan (18:00:50) (in thread): > @Vince Carey Ongoing developments for ASHG proposals > * UCSC/Dockstore is mulling over human pangenome work using Terra Workflows > * Broad/Galaxy is mulling over long read assembly using Galaxy Workflows (and maybe GATK?)
Frederick Tan (18:01:45) (in thread): > Are you still mulling over variant annotation? I think that seeing OpenCRAVAT would be of interest …
2021-01-15
Vince Carey (09:44:09) (in thread): > yes, mulling … do you have time to catch up today? maybe a 10 minute google meet?
Frederick Tan (10:04:02) (in thread): > Sure, available now until 11am, 12-1, after 2
Vince Carey (12:31:42) (in thread): > want to try now?
Vince Carey (12:34:23) (in thread): > To join the video meeting, click this link: https://meet.google.com/vvv-hpqc-yqf Otherwise, to join by phone, dial +1 985-888-0230 and enter this PIN: 966 877 541# > To view more phone numbers, click this link: https://tel.meet/vvv-hpqc-yqf?hs=5 - Attachment (meet.google.com): Meet > Real-time meetings by Google. Using your browser, share your video, desktop, and presentations with teammates and customers.
2021-01-19
Lori Shepherd (09:09:57): > Meeting today at 11 am EST. Here is a draft agenda; please feel free to populate it with project update notes or additional items
Lori Shepherd (11:00:31): > Here’s the meeting link if anyone needs it: https://bluejeans.com/480153337?src=calendarLink
Vince Carey (15:39:19): > Standup outline has been entered in AnVIL meeting agenda
Lori Shepherd (15:45:20): > Thanks Vince!
2021-01-20
Vince Carey (17:05:20): > I just started using RStudio ‘project’ management with source code control in github, within AnVIL. This works well. Anyone else using it?
Vince Carey (17:05:38): > I will illustrate it briefly in the AnVIL days session.
2021-01-21
Vince Carey (13:07:22): > From left field: I wonder whether AnVILPublish should ship the whole package verbatim if possible, in addition to the notebook/DESCRIPTION massaging that is done. Whatever folder substructure works (it could all live under “/notebooks”) can be checked into git, installed, etc.
Martin Morgan (13:18:15): > Yeah, I thought about that. To use the package the user would need to localize it to the local disk. But the package is already available via BiocManager::install(), so would it be that helpful? I think putting the Rmd / md under notebooks would be good…
Vince Carey (13:27:29): > I’m not assuming that the very early image of the package is in github. I am thinking of a very minimal protocol for starting a workspace that can easily evolve to a git-managed package. You could start the DESCRIPTION anywhere, add any relevant code or data, and run AnVILPublish from anywhere. Once the workspace exists, some regimented steps in R or Rstudio in AnVIL get the text into git and get the package installed for continuing development.
2021-01-22
Kozo Nishida (00:34:41): > @Kozo Nishida has joined the channel
Annajiat Alim Rasel (15:47:38): > @Annajiat Alim Rasel has joined the channel
Martin Morgan (16:43:09): > Wow the video in this blog post provides a really great intro to using RStudio / Bioconductor in AnVIL - Attachment (Terra.Bio): Try out RStudio in Terra - Terra.Bio > A sneak preview at RStudio in Terra, with a short video demonstrating how to launch RStudio, import data and use Bioconductor for scRNAseq
Vince Carey (16:54:29): > The dimensions on that SCE are remarkable. It would be nice to take it a little further downstream.
2021-01-26
Nitesh Turaga (12:05:01): > The new image is now working on Terra. Please try to launch a custom image with us.gcr.io/anvil-gcr-public/anvil-rstudio-bioconductor:0.0.10 and test it out. If you see any bugs or inconsistencies, please let me know. > > The reason it’s not in the UI drop-down yet is that the Terra team is having issues updating the menu UI (unrelated to us), but it should be fixed soon, per Rob Title.
Nitesh Turaga (12:06:47): > Some consistency checks that should hold across all users: > 1. > > Sys.getenv("BIOCONDUCTOR_DOCKER_VERSION") > [1] "3.12.31"
> 2. > > .libPaths() > [1] "/home/rstudio/R/x86_64-pc-linux-gnu-library/4.0-3.12" > [2] "/usr/local/lib/R/site-library" > [3] "/usr/local/lib/R/library"
Nitesh Turaga (12:08:07): > 3. Delete your environment while keeping your persistent disk. Then relaunch the same image (us.gcr.io/anvil-gcr-public/anvil-rstudio-bioconductor:0.0.10 for now), and you should see the /home/rstudio/R directory persist along with any packages you install inside of it.
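Check 2 above relies on the user library being named after both the R and Bioconductor versions. A small sketch of that check, assuming the `<R major.minor>-<Bioc version>` naming seen in the .libPaths() output and an R 4.0.x runtime for Bioconductor 3.12; the helper names are hypothetical and not part of the AnVIL image.

```python
from pathlib import PurePosixPath


def expected_user_library(r_version, bioc_version,
                          platform="x86_64-pc-linux-gnu", home="/home/rstudio"):
    """Build the versioned user library path (naming scheme assumed from the output above)."""
    r_major_minor = ".".join(r_version.split(".")[:2])  # "4.0.3" -> "4.0"
    return str(PurePosixPath(home, "R", f"{platform}-library",
                             f"{r_major_minor}-{bioc_version}"))


def check_lib_paths(lib_paths, r_version, bioc_version):
    """True if the first library path is the expected versioned user library."""
    return bool(lib_paths) and lib_paths[0] == expected_user_library(r_version, bioc_version)


# Paths copied from the .libPaths() output in check 2.
observed = [
    "/home/rstudio/R/x86_64-pc-linux-gnu-library/4.0-3.12",
    "/usr/local/lib/R/site-library",
    "/usr/local/lib/R/library",
]
print(check_lib_paths(observed, "4.0.3", "3.12"))
```

Because the version pair is in the path, a session on a newer R/Bioconductor combination would look in a different directory on the same persistent disk rather than mixing incompatible package builds.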
Nitesh Turaga (12:08:29): > <!here>
Vince Carey (13:54:50): > I used this custom image. I had BiocStyle installed previously, but it was missing when I tried to continue a session to compile an Rmd, so compilation failed.
Nitesh Turaga (13:55:55): > Right; because of the way the PDs were built, persistence will work only going forward. This is because the /home/rstudio/packages folder is no longer on your .libPaths()
Nitesh Turaga (13:56:10): > You’d have to add it to your library search path manually.
Vince Carey (13:56:23): > I got 3.12.30 for BIOCONDUCTOR_DOCKER_VERSION
Vince Carey (13:57:51): > I had to restart the Rstudio session to see 3.12.31
Nitesh Turaga (13:58:16): > This is an error I’m unable to understand, actually. @Martin Morgan seemed to have that yesterday too. But the image being distributed has 3.12.31 for BIOCONDUCTOR_DOCKER_VERSION. I’m not sure why the old value would be cached in your browser or by Leonardo.
Nitesh Turaga (13:58:51): > Do you see /home/rstudio/packages from your persistent disk in your RStudio session?
Vince Carey (14:03:54): > it still exists but it is not in .libPaths() … apropos BIOCONDUCTOR_DOCKER_VERSION, the session seems to keep the environment variables until it is turned off.
Nitesh Turaga (14:06:42): > I see. Thanks Vince.
Nitesh Turaga (16:03:23): > No anvil tech call today?
Lori Shepherd (16:04:03): > most of the anvil meetings were cancelled this week for the AnVIL days tomorrow and thur
Nitesh Turaga (16:04:14): > Aaah, ok. Thanks.
Nitesh Turaga (16:04:30): > The document still says Jan 26th.
Martin Morgan (17:05:58): > Am I misunderstanding, or does this look totally messed up? (CCDG data in the 1000 genomes public workspace)? - File (PNG): Screen Shot 2021-01-26 at 5.04.31 PM.png
Vince Carey (17:16:51): > looks bad but i cannot find that workspace
Vince Carey (17:21:57): > now i see it – public – yes, but i don’t really understand files anyway. outta here for a bit
Vince Carey (17:23:09): > maybe ccdg called the 1000 genomes variants too. the ids deeper in look like hapmap/1000g ids.
Vince Carey (21:05:14): - File (PNG): Screenshot from 2021-01-26 17-23-17.png
2021-01-27
Frederick Tan (09:22:27): > It sounds like some of 1000G is counted towards CCDG so it may be “normal” (depends on what’s actually in the .vcfs)
Frederick Tan (10:28:09): > … sounds like it’s correct, though there are some questions as to the naming convention.
Martin Morgan (10:31:02): > Thanks for investigating / clearing that up; definitely the naming convention threw me for a loop.
2021-01-28
Lori Shepherd (15:51:38): > Thanks for participating everyone! Seems like all the sessions made really good progress and had good conversation
Lori Shepherd (16:01:08): > So – not that I want to pile on work when we just got done – but have we started formulating for our Tech presentation next Tue – we still have that morning to review but don’t want it to slip off the radar until that morning
Martin Morgan (17:27:41): > Beth Sheets pointed me to this workspace as a collection of notebooks that illustrate common uses: https://app.terra.bio/#workspaces/anvil-stage-demo/NHGRI%20AnVIL%20Notebooks%20Collection; it’s maintained using a python module, herzog, in https://github.com/DataBiosphere/bdcat_notebooks
Vince Carey (17:54:17): > I’d be happy to talk about the CMG Eye workspace and its FHIR app. I could also discuss the MRC IEU exploration workspace https://anvil.terra.bio/#workspaces/landmarkanvil2/MRC%20GWAS%20Ecosystem%20Explorations, which is now public and is the basis of the ASHG workshop proposal.
Vince Carey (17:58:16): > It also seems relevant to promote discussion of a possible need for git services within AnVIL/Terra. There is a Google source code control service, https://cloud.google.com/source-repositories, that was mentioned by Rob Title and may be relevant. Where should the back end for RStudio/AnVIL git project management live? Not in github.com, I would think. - Attachment (Google Cloud): Cloud Source Repositories | Google Cloud > Fully managed private Git Repositories with integrations for continuous integration, delivery & deployment. Git source control service that helps you release faster
Martin Morgan (19:31:39) (in thread): > github.com has ‘private’ repos; does that mitigate some concerns? Or maybe ‘why not github?’
Vince Carey (19:35:18) (in thread): > just seems to me that relying on something outside the fisma boundary might be problematic. but now that microsoft is involved, maybe it is a non-issue.
2021-02-01
Vince Carey (20:13:12): > started some slides for tuesday meeting with AnVIL tech https://docs.google.com/presentation/d/1GH0CvYYgXzFkcK8y9bISL9GfBz668LrX9YRo_u3Wl-o/edit?usp=sharing - File (Google Slides): bioc 2/2 AnVIL sketch
2021-02-02
Lori Shepherd (06:44:31): > Sorry, forgot to post this yesterday, but here is the agenda for today’s meeting – https://docs.google.com/document/d/1yM8TxkRznTiNd4iAYvDDQc1m8B5HH96GsWstqO0UgFU/edit?usp=sharing
Lori Shepherd (06:47:04) (in thread): > Please fill in project updates accordingly
Vince Carey (07:15:44): > @Lori Shepherd should the timings on the agenda be corrected? Do we need to do standup items as well as the long presentation?
Lori Shepherd (07:16:38): > No – we just do the presentation and skip the normal standup section –
Vince Carey (09:51:57): > I added to the bottom of agenda document what I believe to be the template for reporting to the project officers
Lori Shepherd (09:58:11) (in thread): > Excellent! Thank you for remembering! We can edit and modify here and I can move over to the bi-weekly working group update doc when its complete.
Nitesh Turaga (10:53:52): > My hotel checkout is in 30 mins, so i’ll probably not be in throughout today’s meeting. My apologies.
Nitesh Turaga (10:58:05): > but I’d like to demo the following at the 11am meeting today on the RStudio (v: 0.0.10) image in the Terra app > > > BiocManager::install('Bioconductor/AnVIL') > > AnVIL::repositories() > "https://storage.googleapis.com/bioconductor_docker/packages/3.12/bioc" > BioCsoft > "https://bioconductor.org/packages/3.12/bioc" > BioCann > "https://bioconductor.org/packages/3.12/data/annotation" > BioCexp > "https://bioconductor.org/packages/3.12/data/experiment" > BioCworkflows > "https://bioconductor.org/packages/3.12/workflows" > CRAN > "https://packagemanager.rstudio.com/all/__linux__/focal/latest" > > AnVIL::install('Rhtslib') > trying URL 'https://storage.googleapis.com/bioconductor_docker/packages/3.12/bioc/src/contrib/Rhtslib_1.22.0_R_x86_64-pc-linux-gnu.tar.gz' > Content type 'application/x-tar' length 5196439 bytes (5.0 MB) > ================================================== > downloaded 5.0 MB > Bioconductor version 3.12 (BiocManager 1.30.10), ?BiocManager::install for help > * installing *binary* package 'Rhtslib' ... > * DONE (Rhtslib) > The downloaded source packages are in > '/tmp/RtmpyA3qn8/downloaded_packages' >
Nitesh Turaga (10:58:25): > vs > > > BiocManager::install('Rhtslib') >
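In the demo, AnVIL::install('Rhtslib') fetched a prebuilt binary from the storage.googleapis.com repository that AnVIL::repositories() lists first. The sketch below models that as a simple first-match search over an ordered repository list. It is a simplified illustration only: the package sets are made up, "binary" is a hypothetical label for the first (unnamed in the output) repository, and R's real resolution across repositories may differ (for example, by comparing versions).

```python
from collections import OrderedDict

# Priority order as in the AnVIL::repositories() output; package sets are
# hypothetical placeholders, and "binary" is an assumed label.
REPOS = OrderedDict([
    ("binary", {"Rhtslib"}),
    ("BioCsoft", {"Rhtslib", "AnVIL"}),
    ("CRAN", {"jsonlite"}),
])


def pick_repository(package, repos=REPOS):
    """Return the first repository, in priority order, that offers package."""
    for name, packages in repos.items():
        if package in packages:
            return name
    return None  # not available from any configured repository


for pkg in ("Rhtslib", "AnVIL", "jsonlite", "notapackage"):
    print(pkg, "->", pick_repository(pkg))
```

Putting the binary repository first means a prebuilt package wins whenever one exists, with the Bioconductor and CRAN source repositories acting as fallbacks.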
Martin Morgan (15:01:53): > I’m working on slides at https://bluejeans.com/2024711962 if anyone wants to join…
2021-02-03
Martin Morgan (14:59:49): > I did want to explore the ‘popup workshops’ idea. Looking for ideas about topics at https://docs.google.com/document/d/1N78tFpuIcKzbbGpEvDaQflgQvr8PDXHvKVhnRAj5K-w/edit?usp=sharing where anyone can comment
Nitesh Turaga (19:53:47): > I added one..
2021-02-05
Laurent Gatto (04:09:32): > @Laurent Gatto has left the channel
2021-02-09
Lori Shepherd (14:16:38): > @Martin Morgan / @Vince Carey, is one of you doing the Bioc standup portion at AnVIL this afternoon? <!channel> anything to report for the week?
Vince Carey (14:22:37): > Thanks for the reminder. I can do the standup. I don’t have much to report but will try to assemble some notes shortly.
Vince Carey (14:54:22): > @Nitesh Turaga do you have comments for standup related to docker or kubernetes?
Vince Carey (14:57:33): > Checking AnVIL package hit > > Error: 'DRS resolution' failed: > Internal Server Error (HTTP 500). > Received error while resolving DRS URL. getaddrinfo ENOTFOUND drs.data.humancellatlas.org drs.data.humancellatlas.org:443 > Execution halted >
Martin Morgan (15:27:15): > yes, the HCA removed their DRS endpoints. Try again (I removed the broken example)
Martin Morgan (15:28:19): > (hmm not quite yet)
Martin Morgan (15:29:48): > ok now
Vince Carey (15:49:56): > standup items for today > > - Hardening AnVIL package (increase unit testing) > - Continue development of rstudio docker image > - Strategy for binary repository production and package distribution has matured; .libPaths setting reflects R and bioc versions in use -- different sessions on a given persistent disk will be internally consistent > - Conceptualization of popup workshops > - ASHG GWAS ecosystem workshop proposal environment refinements > - Rob Title aware of shiny pathologies (random timeouts) >
2021-02-12
Lori Shepherd (09:30:26): > I’ve started a draft agenda for next week’s Tue meeting: agenda 2-16. Please feel free to add items. (Monday is a holiday for my institution, so I wanted everyone to have the opportunity to add things before the morning of.)
2021-02-13
Levi Waldron (06:08:46): > Should learnr workshops be runnable on anvil? If it’s feasible I’d like to create a workspace with all the rstudio primers.
Vince Carey (07:19:56): > I didn’t know about https://github.com/jcoliver/learn-r! Is that what you mean? Looks very nice. If the Rmds/data were put in a package, AnVILPublish would finish the job.
2021-02-14
Levi Waldron (11:53:35): > I meant the learnr package (https://rstudio.github.io/learnr/) that uses Shiny to create interactive tutorials, and the rstudio-education primers (https://github.com/rstudio-education/primers) that use it. I guess I’ve answered the question, since I know @BJ Stubbs has run Shiny on AnVIL. https://github.com/jcoliver/learn-r looks great, too!
2021-02-16
Lori Shepherd (07:56:21): > reminder – meeting today at 11 am – please add any items to discuss and project updates to the agenda https://docs.google.com/document/d/1yqjDebRVeYrW-Ea6-_hU7eZmgvi3f2Sb8khu95tVPqc/edit?usp=sharing. At the bottom of the agenda there is also a place to summarize any achievements or concerns to be shared at the 4 pm standup today
2021-02-22
Lori Shepherd (11:49:33): > Let’s start collecting notes for tomorrow’s Tech Working Group stand-up! Please post updates here. @Vince Carey will you be able to summarize at tomorrow afternoon’s call?
Vince Carey (11:50:51): > yes
Vince Carey (15:55:20): > Here is an update: the lol concepts in hca package can be deployed against CMG FHIR data > > dir() > [1] "Observation.json" "Organization.json" "Patient.json" > [4] "Practitioner.json" "ResearchStudy.json" "ResearchSubject.json" > [7] "Specimen.json" "Task.json" > > obs = readLines("Observation.json") > > obsl = lapply(obs, jsonlite::fromJSON) > > obslol = lol(obsl) > > obslol > # class: lol > # number of distinct paths: 17 > # total number of elements: 272 > # number of leaf paths: 11 > # number of leaf elements: 176 > # lol_path(): > # A tibble: 17 x 3 > path n is_leaf > <chr> <int> <lgl> > 1 [*] 16 FALSE > 2 [*].code 16 FALSE > 3 [*].code.coding 16 TRUE > 4 [*].code.text 16 TRUE > 5 [*].extension 16 FALSE > 6 [*].id 16 TRUE >
Vince Carey (15:56:46): > The report goes on…. it is a nice overview of the structure. Gotcha: hca package in Bioconductor github uses |>. Need R devel, or a fork that I made that substitutes magrittr.
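The lol() summary above reports distinct paths into nested JSON records and flags which ones end in leaves. The sketch below shows that general idea in Python. It is loosely modeled on, and not a reimplementation of, hca's lol(): lists become "[*]", dict keys become ".key", and a path is a leaf when it ends in a scalar. The toy records are made up and only echo the FHIR Observation fields shown in the output.

```python
from collections import Counter


def lol_paths(obj, path="", counts=None, leaves=None):
    """Tally JSON-like paths, recording which ones end in scalar values.

    Returns (counts, leaves): counts maps each path to the number of
    elements found there; leaves is the set of paths holding scalars.
    """
    if counts is None:
        counts, leaves = Counter(), set()
    if isinstance(obj, list):
        p = path + "[*]"
        for item in obj:
            counts[p] += 1  # each list element is one element at "[*]"
            lol_paths(item, p, counts, leaves)
    elif isinstance(obj, dict):
        for key, value in obj.items():
            p = f"{path}.{key}"
            if not isinstance(value, list):
                counts[p] += 1  # list values are counted per element instead
            lol_paths(value, p, counts, leaves)
    else:
        leaves.add(path)
    return counts, leaves


# Toy records standing in for the parsed Observation.json lines.
observations = [
    {"id": "obs1", "code": {"text": "HPO", "coding": [{"system": "hpo", "code": "HP:1"}]}},
    {"id": "obs2", "code": {"text": "HPO"}},
]
counts, leaves = lol_paths(observations)
for p, n in sorted(counts.items()):
    print(f"{p:30} {n:3} {p in leaves}")
```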
Martin Morgan (16:44:39) (in thread): > Thanks; I added Depends: R (>= 4.1) just to be difficult; the pipe has already become convenient, and the use of base R pipe can be quite helpful in providing, e.g., clearer debugging / call stacks.
2021-02-23
Vince Carey (08:36:02): > Need more info on last week’s progress here? @Nitesh Turaga? @Levi Waldron?
Levi Waldron (08:58:35): > I started my “DeepPilots Lab Notebook” to document in some detail my AnVIL/Terra for classroom setup and experiences in two courses this semester for Stephen Mosher, @Frederick Tan, @Jeff Leek, Mike Shatz: https://docs.google.com/document/d/1jRn5uZred7RxBOQBdjV3q7Sx3KJcA76jwL3XPOC3rZY/edit?usp=sharing.
Nitesh Turaga (09:01:31): > I’m on the last leg of making the binaries available. I’ve made improvements to launching the application on a k8s cluster, and developed its helm chart, giving us more control over it. With the latest developments in RedisParam, I’m hoping that autoscaling will also be available. I’ve been building the binaries 3 times a week for release and devel since the start of last week.
Vince Carey (09:08:39) (in thread): > Looks great. Should I link to it in the Tuesday tech meeting agenda?
Levi Waldron (09:09:06) (in thread): > Sure if you’d like!
Vince Carey (14:55:25): > standup material draft: https://docs.google.com/document/d/1iPmoNQmIpGnShm-6THcvGch-vWmcgPGQzhACun8soUw/edit?usp=sharing - File (Google Docs): Bioconductor/AnVIL weekly standup notes
2021-03-01
Lori Shepherd (11:12:59): > Agenda for tomorrow’s bi-weekly meeting
2021-03-04
Nitesh Turaga (12:31:53): > Just had a meeting with Gabriella from the Terra team about some RStudio advice they needed. They are trying to integrate Rmd files into the “notebook” section, and wanted a walk-through of how users actually use Rmd files. > > In Jupyter, they just publish the HTML, whether it is rendered with code or not, to the workspace. The user then has the option to “edit” the Jupyter notebook, add to it, and then run it with a runtime if they choose to. > > Rmd files are just text files; they are not in HTML form to publish to a workspace. The issue they are having is “listening” for the ‘rendering’ of the Rmd files. Rendering happens only when the user does a knit, which produces an HTML file with the same name as the Rmd file. One suggestion is that they just check periodically for an HTML file of the same name in the directory.
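The "check periodically for an HTML file of the same name" suggestion could look roughly like the sketch below. It is an illustration, not Terra's implementation: the function name is hypothetical, and treating an .html older than its .Rmd as stale is an added assumption.

```python
import os
import tempfile
from pathlib import Path


def rendered_rmds(directory):
    """Map each .Rmd in directory to its same-named .html, or None if not (re)rendered.

    An .html counts only if it is at least as new as its .Rmd (assumed rule).
    """
    status = {}
    for rmd in sorted(Path(directory).glob("*.Rmd")):
        html = rmd.with_suffix(".html")
        up_to_date = html.exists() and html.stat().st_mtime >= rmd.stat().st_mtime
        status[rmd.name] = html.name if up_to_date else None
    return status


# Demo in a throwaway directory: a.Rmd has been knit, b.Rmd has not.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.Rmd", "a.html", "b.Rmd"):
        Path(tmp, name).write_text("")
    os.utime(Path(tmp, "a.Rmd"), (1000, 1000))
    os.utime(Path(tmp, "a.html"), (2000, 2000))  # knit after the last edit
    status = rendered_rmds(tmp)

print(status)
```

A poller would call this on an interval and publish any .html that newly appears or changes.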
Nitesh Turaga (12:32:32): > They are also wondering about user “flow” when using Rmd files and RStudio. I’m not particularly best at this, as I use it casually.
Nitesh Turaga (12:34:59): > This is being discussed by the product and UX teams at Terra. It gathered steam after Martin’s comments at the tech call it seems.
Frederick Tan (12:56:28): > Another (potential) nuance is using R Markdown vs R Notebooks
Nitesh Turaga (12:57:01): > R notebooks ?? As in using Jupyter with an R kernel ?
Nitesh Turaga (12:57:28): > or within Rstudio, using the R Notebook tab vs R markdown….??
Frederick Tan (12:57:28): > https://bookdown.org/yihui/rmarkdown/notebook.html
Nitesh Turaga (12:58:14): > Aaah I understand… but they both produce HTML files at the end of it, no? And initially their extensions are .Rmd and these both need to be published the same way?
Frederick Tan (12:58:46): > Just commenting that the data flow / hooks have the potential to be different
Nitesh Turaga (12:59:54): > i see…i’ve never used notebooks. Your input to the terra team might be very useful.
Martin Morgan (15:23:06): > If I do > > git clone https://github.com/Bioconductor/AnVILPublish > R CMD build AnVILPublish >
> I end up with a tarball that has content, in part, like > > $ tar tzf AnVILPublish_1.1.2.tar.gz > AnVILPublish/DESCRIPTION > AnVILPublish/R/ > ... > AnVILPublish/README.md > ... > AnVILPublish/inst/doc/AnVILPublishIntro.R > AnVILPublish/inst/doc/AnVILPublishIntro.Rmd > AnVILPublish/inst/doc/AnVILPublishIntro.html > ... >
> The title of the vignette, from the Rmd file, is > > title: "Publishing R / Bioconductor Packages To AnVIL Workspaces" > ... >
> So I might expect on the NOTEBOOKS page to see a link > > Publishing R / Bioconductor Packages To AnVIL Workspaces [RStudio Rmd] >
> Clicking on the link when I didn’t have an RStudio runtime or wanted to ‘preview’ it would link to the static document > > AnVILPublish/inst/doc/AnVILPublishIntro.html >
> Clicking on the link in playground or edit mode with an RStudio runtime would open RStudio with the Rmd file > > AnVILPublish/inst/doc/AnVILPublishIntro.Rmd >
> open and ready for interactive evaluation (note the green ‘evaluate me’ arrow in the R code chunk) in a ‘special’ directory (see the panel on the right; it is account and workspace specific, since the runtime is shared across workspaces [?]). > > Maybe there’s a better slack channel for this? - File (PNG): Screen Shot 2021-03-04 at 3.18.38 PM.png
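The proposal pairs each vignette source in inst/doc with its rendered HTML, then chooses between them depending on whether an RStudio runtime is available. A sketch of that pairing over a built tarball, using Python's tarfile on an in-memory archive whose member names are copied from the listing above; `vignette_pairs` and `link_target` are hypothetical helpers, not AnVILPublish or Terra functions.

```python
import io
import tarfile
from pathlib import PurePosixPath


def vignette_pairs(tar):
    """Pair inst/doc/*.Rmd members with their same-named .html (or None)."""
    names = set(tar.getnames())
    pairs = {}
    for name in sorted(names):
        path = PurePosixPath(name)
        if (path.suffix == ".Rmd" and path.parent.name == "doc"
                and path.parent.parent.name == "inst"):
            html = str(path.with_suffix(".html"))
            pairs[name] = html if html in names else None
    return pairs


def link_target(rmd, html, rstudio_runtime):
    """Proposed behaviour: open the Rmd with a runtime, otherwise preview the HTML."""
    return rmd if rstudio_runtime else html


def build_demo_tarball(members):
    """Create an in-memory .tar.gz containing empty files at the given paths."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as out:
        for member in members:
            info = tarfile.TarInfo(member)
            info.size = 0
            out.addfile(info, io.BytesIO(b""))
    buf.seek(0)
    return tarfile.open(fileobj=buf, mode="r:gz")


demo = build_demo_tarball([
    "AnVILPublish/DESCRIPTION",
    "AnVILPublish/inst/doc/AnVILPublishIntro.R",
    "AnVILPublish/inst/doc/AnVILPublishIntro.Rmd",
    "AnVILPublish/inst/doc/AnVILPublishIntro.html",
])
pairs = vignette_pairs(demo)
for rmd, html in pairs.items():
    print("preview:", link_target(rmd, html, rstudio_runtime=False))
    print("RStudio:", link_target(rmd, html, rstudio_runtime=True))
```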
Nitesh Turaga (15:24:12): > Probably the bioconductor-on-terra channel, @Martin Morgan, in the AnVIL slack.
Nitesh Turaga (15:24:40): > But I guess their overall goal is to make Rmarkdown files available to use via terra UI
Martin Morgan (15:26:04): > Do you mean ‘use but not in RStudio’?
Nitesh Turaga (15:26:58): > Yes, they’d like to have an Rmd file displayed in a similar fashion in the NOTEBOOKS tab,
Nitesh Turaga (15:27:10): - File (PNG): Screen Shot 2021-03-04 at 3.26.52 PM.png
Martin Morgan (15:28:05): > Yes, exactly, and when the notebook link is clicked they either get the HTML as a frame, or the RStudio session above?
Nitesh Turaga (15:31:26): > As a frame, like Jupyter and then it could be linked to an RStudio runtime. But maybe this is better discussed with the terra team. Their most basic question was, “how does one use Rmd” in RStudio (What is the use-case because jupyter seems to do a fine job.) ?
Sean Davis (16:41:15): > A Jupyter notebook (ipynb) *can* include both input and output chunks, so it is not equivalent in content to an .Rmd file. An Rmd does not include rendered output from code blocks, sort of by definition. An Rmd can be rendered via a fast markdown-to-HTML converter if the .md, .html, or .pdf are not available.
Sean Davis (16:41:46): > The R notebook just adds another possibility to the mix. https://bookdown.org/yihui/rmarkdown/notebook.html#notebook-file - Attachment (bookdown.org): 3.2 Notebook | R Markdown: The Definitive Guide
2021-03-08
Tim Triche (13:01:29): > @Tim Triche has joined the channel
2021-03-09
Lori Shepherd (07:32:29): > <!channel> Can we start putting together some notes for @Vince Carey for the 4 pm technical group call standup later today?
Vince Carey (11:17:17): > I got an email that this 4pm call is canceled.
Vince Carey (11:41:02): > should we use at least a fraction of the 4pm interval to catch up internally? maybe to work on the “starting out” document?
Nitesh Turaga (11:42:10): > I see, I didn’t receive that email stating that the call has been canceled. @Lori Shepherd Can you confirm?
Lori Shepherd (11:44:10): > I also did not get an email. I’ll reach out to confirm.
Lori Shepherd (11:52:00): > I just confirmed with Mike. The tech call IS cancelled today
Nitesh Turaga (11:55:53): > Ok great!
Nitesh Turaga (11:58:46): > If folks want to meet and catch up on this document, then i’m open to it.
Vince Carey (15:07:52): > at 4pm we will meet at meet.google.com/sqc-pwhm-ogv to work on startup document
Vince Carey (15:16:36): > This is the current draft https://github.com/Bioconductor/anvil-portal/blob/bioc-intro/content/learn/getting-started/getting-started-with-bioconductor.md - Attachment: content/learn/getting-started/getting-started-with-bioconductor.md > > --- > title: "Getting Started with R / Bioconductor" > author: "Bioconductor" > description: "Guides helping R / Bioconductor users start RStudio or Jupyter for interactive analysis, and workflows for large-scale data processing." > --- > > <!-- > The plan is for the lead sentence of each bullet to lead to a short video describing the topic. > --> > > # Getting Started with R / Bioconductor > > <hero small>This guide helps R / Bioconductor users: establish and familiarize themselves with essential Terra account and workspace concepts; use RStudio and Jupyter Notebooks for interactive analysis; execute workflows for large-scale data processing, including use of R / Bioconductor in the workflow, and management of workflows from within R. The guide indicates how to discover R / Bioconductor workspaces, and how the R / Bioconductor community can contribute to AnVIL and cloud-based computation.</hero> > > ## AnVIL Basics > > - [Getting Started with AnVIL][] provides essential information for setting up a Terra account, billing and cost management, use of Terra workspaces, finding and accessing (public as well as protected) consortium-scale data, and running workflows and interactive analyses. > > ## R / Bioconductor with RStudio or Jupyter > > Getting started > > - [The RStudio runtime][RStudio] provides a familiar cloud-based environment for using R / Bioconductor. > - Access R / Bioconductor through Jupyter notebooks running an R 'kernel'. > > [RStudio]: https://terra.bio/try-rstudio-in-terra/ > > Terra / AnVIL concepts for R / Bioconductor users > > - Where Is My Computer? 
The AnVIL runtime provides the physical machinery for computation (e.g., a 4-core CPU with 16 GB of memory) as well as the local 'disk' storage. Unlike a traditional computer, the compute and storage components are separate from one another. For instance, storage created with one runtime can be used with another runtime. A runtime and persistent disk belong to a single user. They are associated with a billing project. The same runtime / persistent disk can be used across workspaces. > > - Where Is My Data? Local disks, DATA, and workspace buckets. A persistent disk contains data, scripts, packages, and output created by the user in the course of an analysis. Workspaces bring additional data. Tabular summaries of workspace data, e.g., descriptions of participants in the study the workspace encapsulates, are presented under the DATA element, while larger data produced during an analysis may be associated with the workspace 'bucket'. The [AnVIL][AnVIL-package] R / Bioconductor package provides a familiar interface for accessing these resources. > > - Sharing and Cloning; Billing and Cost Control. R / Bioconductor users are particularly interested in reproducibility and sharing, and of course do not want to find themselves stuck with a surprising bill for their computing. ... The [AnVILPublish][] R / Bioconductor package provides a way to easily transform an R package or collection of Rmd files into an AnVIL workspace. Coupled with use of git or another version control system, this provides a good path to collaborative, reproducible, and sharable analysis. > > ## Workflows > > - Workflow Inputs, Execution, and Outputs. ... The [AnVIL][AnVIL-package] package also provides commands that make working with workflows, especially workflow inputs and outputs, easy for R / Bioconductor users. > > - A Bulk RNASeq Differential Expression Workflow. > > ## R / Bioconductor Resources > > - Public Workspaces. > > - Participate in the R / Bioconductor Community. 
> > - Producing customized runtimes. > > [Getting Started with AnVIL]: https://anvilproject.org/learn#getting-started-with-anvil > [AnVIL-package]: https://bioconductor.org/packages/AnVIL > [AnVILPublish]: https://bioconductor.org/packages/AnVILPublish > >
2021-03-15
Lori Shepherd (13:15:51): > Agenda for tomorrow’s bi-weekly meeting https://docs.google.com/document/d/19gSFtqxpYCMq2RZTu3q19aM-fg6MuDyU_h5DrSXdcYM/edit?usp=sharing
2021-03-16
Vince Carey (12:03:10): > Need standup material!!
Nitesh Turaga (12:03:44): > Do you think it’s worth mentioning / demoing the RELEASE_3_12 binary package availability?
Vince Carey (12:09:10): > maybe … i think we have shown this before … let’s see what else materializes
Nitesh Turaga (12:09:21): > ok
Vince Carey (12:10:07): > it seems to me we are in more of a hardening phase right now … we need to get everything in shape for 3.13. I will talk about available videos for outreach.
Vince Carey (12:12:01): > it may be worth mentioning that runtime creations seem slow … prohibitively slow for a working scientist IMHO … factors affecting time from “create” to “use” should be studied? are they mainly controlled by GCP?
Nitesh Turaga (12:13:22): > They are controlled by GCP and the ‘size’ of the images. These are launched in turn by Leonardo.
Vince Carey (12:14:17): > maybe not worth commenting on then. here’s something interesting > > WARNING: Session forced to suspend due to system upgrade, restart, maintenance, or other issue. Your session data was saved however running computations may have been interrupted. > R version change [4.0.3 -> 4.0.0] detected when restoring session; search path not restored > > .libPaths() > [1] "/home/rstudio/R/x86_64-pc-linux-gnu-library/4.0-3.12" > [2] "/usr/local/lib/R/site-library" > [3] "/usr/local/lib/R/library" >
> this happened when i used an old container (for OSCAbook infrastructure)
Nitesh Turaga (12:15:17): > I see…Let me take a look at this…
Lori Shepherd (12:16:20) (in thread): > Movement on binaries has been listed consistently every week – and I’m pretty sure, based on the notes, it was demoed in mid-February.
Vince Carey (12:25:04): > I don’t think it is surprising. The old container has expectations inconsistent with the existing .libPaths target. Then, following the instructions in the orch workspace, we get > > Error: package or namespace load failed for ‘scRNAseq’ in dyn.load(file, DLLpath = DLLpath, ...): > unable to load shared object '/home/rstudio/R/x86_64-pc-linux-gnu-library/4.0-3.12/RSQLite/libs/RSQLite.so': > /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by /home/rstudio/R/x86_64-pc-linux-gnu-library/4.0-3.12/RSQLite/libs/RSQLite.so) > In addition: Warning messages: >
Vince Carey (12:25:55): > I am going to rewrite the instructions so they use actual infrastructure with your binaries and no custom container.
Vince Carey (12:30:26): > Strong argument for CI/CD for workspaces.
Vince Carey (12:30:37): > And the descriptions should be computable.
Vince Carey (12:30:54): > And raise heck when they make false assertions.
Vince Carey (12:35:19): > I think I have some standup material though it is egg on my face that I hope to remove by 4.
Vince Carey (12:39:17): > Trouble: > > > BiocManager::valid() -> vv > > vv > [1] TRUE > > AnVIL::install(c('SingleCellExperiment', 'scater', 'scran', 'uwot')) > Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) : > namespace ‘rlang’ 0.4.9 is already loaded, but >= 0.4.10 is required >
Nitesh Turaga (12:40:10): > Hmm… so somehow it skipped rlang during the binary build?
Vince Carey (12:40:11): > restart required
Vince Carey (12:40:55): > also build rhdf5 from source
Vince Carey (12:43:18): > real world. now hit status code 503 but it keeps plugging along
Vince Carey (12:48:15): > probably the custom environment would have worked had i thrown away the persistent disk. but it is unpleasant to rely upon it when it is not really necessary.
Vince Carey (12:49:48): > BiocNeighbors also compiled from source. Maybe BiocManager::valid found newer version than in the repo?
Nitesh Turaga (12:51:13): > So, i’m confused because there was a successful build yesterday and these binaries are available on the bucket.
Vince Carey (12:52:39): > OK. I will try to understand this better when the compilations stop.
Vince Carey (12:59:26): > This is the command I used > > AnVIL::install(c('SingleCellExperiment', 'scater', 'scran', 'uwot')) >
> and at a minimum rhdf5 and edgeR were compiled from source
Vince Carey (12:59:40): > BiocManager::valid() returns true
Vince Carey (13:00:16): > edgeR is 3.32.1
Nitesh Turaga (13:00:50): > edgeR should have been built by my code.
Nitesh Turaga (13:00:55): > I’ll check why it wasn’t
Nitesh Turaga (13:01:54): > so, edgeR 3.32.1 is available on the bucket
Nitesh Turaga (13:02:01): > I’m not sure why it didn’t pull it down for you
Nitesh Turaga (13:02:08): > https://storage.googleapis.com/bioconductor_docker/packages/3.12/bioc/src/contrib/edgeR_3.32.1_R_x86_64-pc-linux-gnu.tar.gz
Nitesh Turaga (13:02:46): > So is rhdf5 v2.34.0: https://storage.googleapis.com/bioconductor_docker/packages/3.12/bioc/src/contrib/rhdf5_2.34.0_R_x86_64-pc-linux-gnu.tar.gz
Vince Carey (13:04:34): > > > AnVIL::repositories() > BioCsoft > "https://bioconductor.org/packages/3.12/bioc" > BioCann > "https://bioconductor.org/packages/3.12/data/annotation" > BioCexp > "https://bioconductor.org/packages/3.12/data/experiment" > BioCworkflows > "https://bioconductor.org/packages/3.12/workflows" > CRAN > "https://packagemanager.rstudio.com/all/__linux__/focal/latest" > > >
Nitesh Turaga (13:05:04): > You are missing the latest version of AnVIL; install it from GitHub with BiocManager::install('Bioconductor/AnVIL')
Vince Carey (13:06:54): > oh yes … will the CRAN version be updated soon?
Nitesh Turaga (13:07:24): > with the release
Nitesh Turaga (13:07:38): > But…i’ll leave@Martin Morganto add to this
Martin Morgan (13:22:15): > AnVIL is in Bioconductor, and changes are made in devel. Installing from GitHub allows the Bioconductor release version of the AnVIL docker image to use the devel version of the package. After the release, BiocManager::install() will suffice…
Vince Carey (13:26:23): > things go much more smoothly with the right version of AnVIL
2021-03-23
Lori Shepherd (11:09:53): > I won’t be at this afternoon’s tech call – @Vince Carey are you able to present our standup items? <!channel> Please list items to share for standup today here
Vince Carey (11:25:40): > yes i will do the standup.
Lambda Moses (23:04:01): > @Lambda Moses has joined the channel
2021-03-24
Nitesh Turaga (13:26:09): > The devel image for Bioconductor should now be available at us.gcr.io/anvil-gcr-public/anvil-rstudio-bioconductor-devel:3.13.0 and can be launched as a “Custom image”. > > I’m having second thoughts on the naming scheme of the image. Maybe it should just be called us.gcr.io/anvil-gcr-public/anvil-rstudio-bioconductor:3.13.0
Nitesh Turaga (13:26:12): > Thoughts?
Nitesh Turaga (13:44:02): > or just us.gcr.io/anvil-gcr-public/anvil-rstudio-bioconductor:devel
Martin Morgan (14:06:22): > I’d leave it the way it is; it makes it clear that this is the devel version of 3.13.0 and not the release version 3.13.0 that will come out soon.
Nitesh Turaga (14:06:49): > ok, sounds good.
Nitesh Turaga (14:07:25): > I was leaning towards an anvil-rstudio-bioconductor:devel image where the devel tag is always the bleeding edge, and the VERSION file has the correct version. But this sounds good, as long as folks understand what it is.
Chris Williams (16:41:58): > @Chris Williams has joined the channel
Chris Williams (16:42:46): > @Chris Williams has joined the channel
2021-03-30
Lori Shepherd (08:01:03): > Agenda for today’s 11 am meeting: Agenda
2021-04-05
Martin Morgan (15:28:36): > I have created a ‘popup workshop’ announcement, available from https://docs.google.com/document/u/2/d/e/2PACX-1vSVGCaX-wnWyu1TUhhbsoVeTCJ6ODLG53OeMHKRbewGQOqOcMTnZQl7_jrR9kqOPQPlsFN1ecLT4lhd/pub. Please review and note responsibilities for @Levi Waldron and @Vince Carey. There is a sign-up form linked to the announcement. > > The editable announcement is at https://docs.google.com/document/d/1u5sN3dqUcHA3T3GDRK4CW_IzbrspTmOOHSSWbo9DppQ/edit?usp=sharing
2021-04-06
Levi Waldron (06:08:27) (in thread): > Confirmed for May 24!
Nitesh Turaga (09:19:16) (in thread): > Confirmed for May 3rd:smile:
Lori Shepherd (15:11:35): > <!channel> any additional updates for the AnVIL tech meeting this afternoon?
Martin Morgan (15:20:59): > I added a few items to https://docs.google.com/document/d/1XcTR3rDFP4oE_4Ggl1WfD7nRfKPSHmWaE2_amIjfuk4/edit# (I’m not getting weekly reminders from Jenn Vessio?) but would rather not present…
2021-04-07
Levi Waldron (04:30:10): > (copying this FYI from my “DeepPilots” Lab Notebook I’m keeping to give user feedback) Here is what a workspace looks like when I connect with CyberDuck, an SFTP / gsutil / everything client. Some explanations from Rob Title: > > - fc-* is your Terra workspace bucket. > > - leoinit-saturn-* is the Leo initialization bucket. There is one of these per runtime in the project. It contains Leo-internal initialization files. This bucket is deleted when the associated runtime is deleted. > > - leostaging-saturn-* is the Leo staging bucket. There is one of these per runtime in the project. It contains startup script logs, and Dataproc output/logs. It is deleted 2 weeks (iirc) after the runtime is deleted, via a lifecycle rule (we keep it around a while for debugging purposes). > As suggested in the diagram at https://support.terra.bio/hc/en-us/articles/360024056512-Moving-data-to-from-a-workspace-or-external-Google-bucket and confirmed here, moving data between local storage and the persistent disk is a two-step process. RStudio does provide a direct Persistent Disk <–> local storage option through its own file browser. It is necessary to use the command line or AnVIL::gsutil_* to move data between the workspace (“Cloud storage” in the above diagram) and the runtime “Persistent Disk”. A very user-friendly Terra upgrade would be a visual interface to cloud volumes like the one SevenBridges provides for Google and AWS buckets (https://docs.sevenbridges.com/docs/attach-a-google-cloud-storage-volume). This will increase in importance when Terra also supports Azure, to avoid users having to get into Cloud provider-specific weeds. > > Name: saturn-9f0dd023-7252-4617-966b-0cbb8176824e > > Persistent Disk: saturn-pd-cd264e84-d822-40db-8933-a3909634394d - File (PNG): image.png
Vince Carey (05:39:38): > I wonder whether a process could be introduced to allow user relabeling of folders that include uuids … these should not be user-visible.
Martin Morgan (06:14:32): > I’m not actually clear on what we’re looking at here? it’s not the google bucket associated with the workspace? e.g., on any workspace landing page the link at the bottom right - File (PNG): Screen Shot 2021-04-07 at 6.12.58 AM.png
Levi Waldron (18:05:22) (in thread): > It’s the buckets associated with the “GOOGLE PROJECT ID” (starting with bioconductor in your image)
Levi Waldron (18:06:29) (in thread): > I didn’t succeed in connecting using the Google Bucket ID, although that bucket is visible in the listing (the only one starting with fc-)
2021-04-08
Martin Morgan (04:25:35) (in thread): > I see, so CyberDuck is providing a listing of all buckets under the project id. But isn’t the user just interested in the specific bucket(s) associated with the workspace > > AnVIL::avworkspace("bioconductor-rpci-anvil/Bioconductor-Package-AnVIL") > AnVIL::avbucket() >
> or more generally buckets associated with any workspace > > library(AnVIL); library(hca); library(dplyr) > wkspc = httr::content(Terra()$listWorkspaces()) > lol = hca::lol(wkspc) > tibble( > namespace = hca::lol_pull(lol, "[*].workspace.googleProject"), > name = hca::lol_pull(lol, "[*].workspace.name"), > bucket = hca::lol_pull(lol, "[*].workspace.bucketName") > ) >
Levi Waldron (04:27:18) (in thread): > Yes I think you’re right Martin.
2021-04-12
Lori Shepherd (11:29:26): > <!channel>Draft agenda for tomorrow’s meetingagenda 4-13Note: we are up for a presentation at the tech call next week!
2021-04-20
Vince Carey (09:36:44): > it is doubtful that i’ll make 4pm tech call
Nitesh Turaga (09:59:37): > I have a meeting at 11am with Meghan and Rob about the jupyter docker image potential updates.
Martin Morgan (12:45:50): > <!channel>updates for today’s 4pm meeting?
Nitesh Turaga (12:48:50): > I had the initial discussion with the Terra team about basing the terra-jupyter-r image on the same stack as bioconductor_docker. The solution we came up with is to use a “multi-stage” build, where the terra-jupyter-r image involves another FROM statement and adds certain libraries from the bioconductor_docker image. > > There are complexities to this, primarily the fact that terra-jupyter-r is based on Ubuntu 18.04, while bioconductor_docker and its compiled binaries are based on Ubuntu 20.04.
Nitesh Turaga (13:07:59): > https://broadworkbench.atlassian.net/wiki/spaces/IA/pages/1719533597/2021-04-20+R+Images+in+Terra-docker+stack
2021-04-26
Levi Waldron (07:47:01): > I’m currently getting this warning message when installing packages in AnVIL/Terra RStudio sessions; I assume it’s harmless as long as no other binary packages built on 4.0.4 get installed on the RStudio environment (R 4.0.3, Bioconductor 3.12). > > During startup - Warning message: > package 'BiocManager' was built under R version 4.0.4 >
Martin Morgan (08:30:01): > This could potentially be a problem for compiled code that makes a call to an internal function. The problem is that the binary repository BiocManager::repositories()[["CRAN"]] is ‘latest’, but should (?) be pinned to the last day it served builds for R 4.0.3. ‘should’ because maybe our image should be updated to 4.0.4 (but now 4.0.5?) (or do we want to provide a stable environment for our users?), or the binary package repository (RStudio in this case) should know about R patch versions, or…
Kozo Nishida (13:12:16): > @Martin Morgan Thank you for today’s workshop. > Is the binary R package (for Linux) installation (shown at the end of today’s workshop) available only in the AnVIL environment, or is it available in any Ubuntu 18.04 environment?
Martin Morgan (13:15:41): > @Kozo Nishida thanks for coming! This page https://packagemanager.rstudio.com/client/#/repos/1/overview describes how to set up R to use binary packages on Linux; see supported platforms by clicking on the ‘CLIENT OS’ element at the top right
Martin Morgan (13:17:12): > Also, choose the ‘hamburger’ on the left and note that Bioconductor binaries are also available!
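For reference, the client set-up on that page amounts to roughly the following. This is a hedged sketch assuming Ubuntu 20.04 ("focal") and the `all` repository path that `AnVIL::repositories()` reports elsewhere in this channel; check the setup page for the URL matching your own OS. The `HTTPUserAgent` option matters because the repository uses the User-Agent header to decide whether it can serve a pre-built Linux binary; without it, source packages are served.

```r
# Sketch of the Linux binary set-up described on the Package Manager page.
# Assumptions: Ubuntu 20.04 ("focal") and the 'all' repository path shown by
# AnVIL::repositories(); check the setup page for your own OS and URL.

# The repository picks binary vs source from the User-Agent, which must
# report the R version and platform.
options(HTTPUserAgent = sprintf(
    "R/%s R (%s)",
    getRversion(),
    paste(getRversion(), R.version["platform"], R.version["arch"], R.version["os"])
))

options(repos = c(
    CRAN = "https://packagemanager.rstudio.com/all/__linux__/focal/latest"
))

# install.packages() now fetches pre-built binaries where they exist
```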
Kozo Nishida (13:23:16) (in thread): > Thank you for the information! I will check and try those. > This information is helpful because my colleague was worried about the time it takes to install the package.
Vince Carey (15:20:19): > Lori is away so we’ll have to manage the next meeting (tomorrow) ourselves. I have a FHIR+AnVIL meeting tomorrow at 10, so that should provide a little content. I trust the popup went well; I had a conflict. I am also sorry to have missed last week’s presentation, so if a link can be provided I’d love to see it.
Vince Carey (15:23:25): > Maybe we should discuss the versioning issue noted above at 830am at the 11am AnVIL meeting tomorrow.
Martin Morgan (15:48:28): > The slides from last week are at https://docs.google.com/presentation/d/e/2PACX-1vTsg05WspH0oNscKyMpRW_P3YK_Ajm2HH3sS[…]TXN7c31fLLsABwXc1PcmV/pub?start=false&loop=false&delayms=3000
Levi Waldron (16:55:13) (in thread): > How about pinning [["CRAN"]] to the day of the Bioconductor release? I would think most AnVIL users don’t mind having packages be frozen for 6 months, and they could use the CRAN source repo if they want the latest and greatest.
2021-04-27
Martin Morgan (11:00:26): > https://bluejeans.com/480153337for the meeting?
Martin Morgan (12:09:37): > We didn’t talk about a progress report for this afternoon’s meeting? I have > * PopUp workshop 1 done; 16 attendees, ~50 (eventual) registrants. Schedule, including links to material and video of presentation: https://t.co/mf2ztLNFVe. Next workshop (tell your friends): Monday May 3 at noon, US Eastern
Vince Carey (12:17:14): > I took the FHIR call. Some effort will be put into an AnVIL internal FHIR server, using a company called Asymmetric.
Vince Carey (12:24:33): > My work on AnVIL the past week concerned trying to use it to build a tool for software QC. The outcome is https://vjcitn.shinyapps.io/biocQA/ … sadly I really could not use AnVIL per se to do it because I could not run CMD check effectively on many packages owing to broken LaTeX support. I did not want to use a custom container. Now that I know about startup scripts, maybe I could get around that problem. Anyway, if you try the app and put gwasurvivr into the box, all the tabs work. The depnet and funnet tabs are important for QC and could play a role in FISMA evaluation for the packaging protocols we are proposing.
Vince Carey (12:36:48): > app runs in AnVIL FWIW
Tim Triche (13:30:03): > an HL7 server?! wow
2021-04-28
Vince Carey (08:53:17) (in thread): > What will happen if BiocManager::valid() is used in this case? Will it find outdated CRAN packages and provide an option for updating?
Martin Morgan (10:22:37) (in thread): > valid() uses repositories() to assess validity, so it would report a valid installation even if newer versions had been added to CRAN.
Martin Morgan (10:26:39) (in thread): > One possibility (or at least one way to fix the problem of installing 4.0.4 binaries on a 4.0.3 build) is to include both a date-pinned binary repository and a ‘traditional’ CRAN repository, in that order. Packages that haven’t been updated since the binary repository date will be installed as binaries; packages that have been updated will be installed from source. > > This fixes the bug and retains current expectations about how up to date packages are. It also avoids a problem with a pure ‘pinned’ solution, where a CRAN package changes and a Bioconductor release package changes in response; the updated Bioconductor package might not be compatible with the pinned CRAN package.
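The ordering argument can be sketched in R. The snapshot date and the CRAN mirror below are illustrative assumptions. `choose_repo()` is a hypothetical, simplified model of how, as we understand it, `available.packages()` resolves a package offered by several repositories: the highest version wins, and on a tie the repository listed first wins. Under that rule an unchanged package comes from the pinned binary snapshot, while a package updated on CRAN after the snapshot date comes from the source repository.

```r
# Hypothetical repository order: a date-pinned binary snapshot first, then a
# traditional CRAN source repository. The date and the mirror are illustrative.
options(repos = c(
    CRANbinary = "https://packagemanager.rstudio.com/all/__linux__/focal/2021-03-31",
    CRAN       = "https://cloud.r-project.org"
))

# Simplified model of duplicate resolution across repositories: the highest
# version wins; on a tie the repository listed first wins.
choose_repo <- function(versions) {
    # 'versions' is a named character vector, repository name -> version,
    # in the same order as getOption("repos")
    v <- package_version(unname(versions))
    names(versions)[which.max(v == max(v))]
}

# Unchanged since the snapshot date: installed from the pinned binary snapshot
choose_repo(c(CRANbinary = "0.4.10", CRAN = "0.4.10"))
## [1] "CRANbinary"

# Updated on CRAN since the snapshot date: built from source
choose_repo(c(CRANbinary = "0.4.10", CRAN = "0.4.11"))
## [1] "CRAN"
```

Versions are compared with `package_version()`, so 0.4.10 correctly sorts after 0.4.9, which is the kind of comparison in the rlang error earlier in this channel.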
2021-05-04
Lori Shepherd (13:35:44): > <!channel>Any updates for our standup at the technical meeting later today?
Martin Morgan (13:37:07): > I think it’s cancelled…
Lori Shepherd (13:38:35): > oh – excellent – was it announced at last week’s meeting, which I missed? Or are there announcements anywhere? I don’t have any notification of this, but indeed I see that there is no May 4th in the agenda document
Martin Morgan (13:55:32): > Announced at the meeting last week, NCIP meeting conflict. Also I think Jenn Vessio sends out a weekly announcement on the day of: today first to remind and then to cancel … :wink: Not sure about agendas, etc
Vince Carey (14:35:11): > Right, I forgot about this. Sorry folks. We are free to work through the afternoon.
2021-05-10
Charlotte Soneson (01:11:55): > @Charlotte Soneson has joined the channel
Lori Shepherd (11:46:28): > draft agenda for tomorrows bi-weekly meeting :https://docs.google.com/document/d/1EIeZxpQE5hMFmzfnczUkqbGfGqYUymrfDYuEk7X6Vpg/edit?usp=sharingWe were thinking of changing the format to be a rolling agenda document that is just added to each week like the current Tech/PM agenda’s are. We can discuss thoughts on this format at the begin of tomorrows meeting –
2021-05-11
Lori Shepherd (10:59:56): > Here is the link to the meeting if anyone needs it: https://bluejeans.com/480153337?src=calendarLink
Martin Morgan (15:57:28): > @Vince Carey for the call in a few seconds, I’ve added a couple of Issues to the ‘Admin Updates’ section of the agenda document, as well as some details to the ‘Lessons being learned’ section that could be used for updates https://docs.google.com/document/d/1EIeZxpQE5hMFmzfnczUkqbGfGqYUymrfDYuEk7X6Vpg/edit?usp=sharing
Vince Carey (15:59:02): > i think i moved it all to the anvil tech group agenda – have a look and if i missed something just put it in. https://docs.google.com/document/d/1XcTR3rDFP4oE_4Ggl1WfD7nRfKPSHmWaE2_amIjfuk4/edit?usp=sharing - File (Google Docs): AnVIL Weekly Tech call - agenda/notes/action items
Nitesh Turaga (16:35:33): > I looked at multi stage builds, for the PR…and i’m not sure how to merge Ubuntu 18.04 with 20.04. When I said i’d PR to the terra-docker, I meant for multi stage builds and not for the Ubuntu 20.04 update of their entire ecosystem of terra-jupyter images.
Nitesh Turaga (16:36:41): > This seems to have been a misunderstanding with the Terra team.
Nitesh Turaga (16:38:03): > Updating the terra-jupyter stack to Ubuntu 20.04 is a lot of work, which the terra team should undertake IMHO.
Megha Lal (16:43:43): > @Megha Lal has joined the channel
2021-05-18
Vince Carey (12:27:22): > topics for today’s standup?
Vince Carey (12:29:43): > anyone used gcloud source repos? this is GCP git analog and i wonder if it interfaces with Rstudio
Vince Carey (12:30:05): > https://cloud.google.com/source-repositories/docs/quickstart - Attachment (Google Cloud): Quickstart: Create a repository
Martin Morgan (12:43:07): > I think we should highlight the need for an update to bioc-3.13. This implies changes to the cloud environment (from clicking on that icon in the GUI) > * Default: will become R-4.1.0-based > * Terra-maintained R/Bioconductor will become R-4.1.0 > * Legacy GATK and Legacy R / Bioconductor will become the current 4.0.5 images > * Community-maintained RStudio will be 4.1.0 > I think Nitesh knows the details for the tags that need to be added so that the released versions are updated; the Broad needs to update the GUI, I think. Adrian was in sync with this; I’m not sure about Meagan. > > We could also mention trying to make the binary package repository more robust. I have (not necessarily for public consumption) https://gist.github.com/mtmorgan/560d3438c20b4df83789fa5937518f52 which points to the problems that we know about. > > Worth mentioning also another iteration of the popup workshops, with 102 total (cumulative) registrations this week, 24 participants, 15 workspaces cloned, $2.62 in costs. I think most of the ‘lessons learned’ were about the need for robust binary package installation (i.e., internal and not relevant to the tech call); a ‘feel good’ call-out could be the ability to create and use custom images.
Vince Carey (14:59:31): > FWIW https://app.terra.bio/#workspaces/use-strides/Bioconductor-Workshop-OSCA-3-12 description now includes the text of the Dockerfile underlying the popup workshop container
2021-05-24
Lori Shepherd (11:46:23): > Draft Agenda for tomorrow’s biweekly meeting – Note this will be a running agenda from now on – we will have the dates marked in the one file: https://docs.google.com/document/d/1otCjFNvmXvJUwkgsrZHtwYWYjSCfiOseMuukRPWDcaA/edit?usp=sharing
2021-05-25
Nitesh Turaga (20:17:00): > AnVIL release 3.13 binaries are now available on gcloud.
Nitesh Turaga (20:17:37): > We also have individual package logs at bioconductor_docker/packages/3.13/bioc/src/package_logs
2021-05-27
Andres Wokaty (08:35:05): > @Andres Wokaty has joined the channel
2021-06-01
Lori Shepherd (13:40:47): > <!channel> any updates for this afternoon’s tech meeting standup?
2021-06-02
Levi Waldron (05:22:22): > Heads up since the R/Bioconductor release I’m getting this warning. Not harmful itself but makes me nervous for other packages. > > During startup - Warning message: > > package ‘BiocManager’ was built under R version 4.0.4 > > * installing to library ‘/home/rstudio/R/x86_64-pc-linux-gnu-library/4.0-3.12’
Vince Carey (06:17:58): > @Nitesh Turaga^^
Nitesh Turaga (13:27:54): > Hmm….
Nitesh Turaga (13:28:09): > But the release didn’t happen yet in the AnVIL
Nitesh Turaga (13:28:13): > @Levi Waldron
Nitesh Turaga (13:29:05): > So we didn’t update the bioconductor_docker:RELEASE_3_12 image beyond R 4.0.3.
Nitesh Turaga (13:29:55): > But are we expecting users to be aware of every patch version update too? This is a larger question.
Levi Waldron (13:45:15) (in thread): > But I guess biocmanager did get updated in cran
Nitesh Turaga (13:48:25): > @Vince Carey @Martin Morgan separate thread, but this is what I was supposed to be doing https://the-anvil.slack.com/archives/GM5C32K2P/p1603467609019800 to make sure the packages populate here: - File (PNG): Screen Shot 2021-06-01 at 9.31.37 AM.png - File (PNG): Screen Shot 2021-06-01 at 9.31.31 AM.png
2021-06-03
Nitesh Turaga (13:44:27) (in thread): > yes, it did. So there is a version mismatch. > > The same thing happens now on the devel docker image too..
2021-06-07
Vince Carey (13:02:42): > are there API calls for configuring runtimes? if so can AnVILPublish incorporate that capability .. more than publishing workspace content but also dealing with triggering a “cloud environment” launch? this would seem to address m stadler’s question.
Martin Morgan (16:40:47) (in thread): > It’s an interesting question. There is an interface to manage runtimes, but I’m not sure how that would work in a typical scenario, where a user has logged in and navigated to a workspace that happens to have been generated by AnVILPublish — they’ll need to launch a cloud environment, so would go to the widget in the upper right… maybe a feature request would be to restrict available cloud environments based on workspace specifications…
Nitesh Turaga (16:49:28): > The AnVIL bioconductor 3.13 image is now live.
Martin Morgan (16:53:49): > I see the RStudio image but not the Jupyter image (still at 3.12 in the Cloud Environment widget)?
Nitesh Turaga (16:54:22): > Yes, the jupyter image PR has been accepted and will go into production next week from the looks of it.
2021-06-08
Lori Shepherd (10:43:10): > https://docs.google.com/document/d/1otCjFNvmXvJUwkgsrZHtwYWYjSCfiOseMuukRPWDcaA/edit?usp=sharing agenda for today’s meeting
2021-06-15
Lori Shepherd (08:57:55): > Any updates for the anvil tech standup this afternoon?
Nitesh Turaga (09:30:54): > * I left a note on the #bioconductor-on-terra channel about getting UI dropdown updates from the Terra team. No updates yet, but monitor the channel. > * Multi-stage build is a no-go, after a technical discussion with Rob and Qi. > * Starting last week, binaries are built only once a week; we did not see significant uptake in binary package usage. All builds will be on a weekly rotation. We are also capturing build logs for every package.
Nitesh Turaga (10:12:48): > https://broadworkbench.atlassian.net/browse/IA-2810 UI update pushed to Wednesday instead of yesterday.
Martin Morgan (13:56:06) (in thread): > Not sure that these should be posted un-censored, but > * PopUp workshop series concludes. Less than $25 in cloud costs. 20+ in attendance per workshop, 125 contacts. > * Useful interactions with the Broad about Analysis tab & Bioconductor placement, looking forward to updated wireframes & discussion > * Bioconductor release / Cloud Environment GUI — RStudio looking good (R 4.1.0 / Bioc 3.13.0); R / Bioconductor and Legacy R / Bioconductor lagging (known reasons and a PR for updated description). > * Would still like to see regular usage reports > > > Workflow # Interactive sessions > Month Jobs CPU Hours GB Data Python R Galaxy Seqr > --------------------------------------------------------------- > May ... >
2021-06-19
Vince Carey (07:38:45): > For Q3 Bioc activities I would like to see a featured workspace developed with Kasper et al, on recount3, with the theme “SRA in AnVIL via recount3/Bioc”
2021-06-22
Lori Shepherd (08:55:11): > Hi everyone. Here is the agenda for today’s meeting https://docs.google.com/document/d/1otCjFNvmXvJUwkgsrZHtwYWYjSCfiOseMuukRPWDcaA/edit?usp=sharing
Frederick Tan (11:39:10): > Given that Seurat is installed by default, any reason not to have bioconductor.org/packages/AnVIL installed by default? > * https://github.com/anvilproject/anvil-docker/blob/bb8580d27d09f0a5ccb98fc8179c9bc97ae7ff16/anvil-rstudio-bioconductor/Dockerfile#L18
Marcel Ramos Pérez (13:05:53) (in thread): > @Andres Wokaty feel free to reach out to me with pkgdown questions for the BiocAnVIL website. The website can also be configured to re-generate with every commit via GitHub Actions (GHA); I am not sure how it is updated currently.
2021-06-23
Stephen Mosher (16:30:36): > @Stephen Mosher has joined the channel
2021-06-28
Levi Waldron (05:57:54): > Anyone else notice that Bioconductor packages on AnVIL (RStudio) are now installing from source?
Frederick Tan (06:26:35): > Yes, I believe BiocManager::install() is installing from source, and AnVIL::install() from binaries
Vince Carey (07:14:36): > I am considering how to design a “Terra-backed SummarizedExperiment”.
Vince Carey (07:32:00): > The idea is to work from a PFB export to Terra, which gives tables including DRS ids for the assay components.
Levi Waldron (08:36:40) (in thread): > Ahh right, thank you Frederick!
Levi Waldron (08:38:03): > That sounds nice - you would have metadata in-memory, then have a command to localize objects from the tables after potentially subsetting?
Frederick Tan (09:13:14) (in thread): > That being said, I thought that BiocManager::install() installed binaries as well but am not sure if I’m just misremembering cc:@Nitesh Turaga
Vince Carey (09:59:16): > right. more soon
Vince Carey (11:54:10): > Is this meaningful? > > > drs_stat(myd) > Error: 'DRS resolution' failed: > Unauthorized (HTTP 401). > Received error contacting Bond. Invalid authorization token. b'{\n "error": "invalid_token",\n "error_description": "Invalid Value"\n}\n' > > avtables() > # A tibble: 5 x 3 > table count colnames > <chr> <int> <chr> > 1 program 1 program_id, pfb:dbgap_accession_number, pfb:program_name > 2 subject 72 subject_id, pfb:abnormal_wbc_history, pfb:abused_prescription_pill, pfb:active_e… > 3 project 1 project_id, pfb:authz, pfb:code, pfb:dbgap_accession_number, pfb:dbgap_consent, … > 4 sequencing 9510 sequencing_id, pfb:alternative_aligments, pfb:analysis_freeze, pfb:analyte_type,… > 5 sample 3655 sample_id, pfb:autolysis_score, pfb:bss_collection_site, pfb:created_datetime, p… > > drs_stat(myd) > # A tibble: 1 x 10 > fileName size contentType gsUri timeCreated timeUpdated bucket name googleServiceAc… hashes > <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <list> <list> > 1 GTEX-OOB… 1.47e8 applicatio… gs://… 2020-07-08… 2020-07-08… fc-se… GTEx_… <named list [1]> <name… >
Vince Carey (11:54:48): > did avtables ‘wake up’ my authorization somehow? or is this just a transient error that went away?
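The thread does not establish whether the 401 from `drs_stat()` was transient; Martin asks below whether it is reproducible. If it was transient, one generic mitigation is to retry the call a few times before giving up. Below is a minimal base R sketch of that idea. `with_retry()` is a hypothetical helper and not part of the AnVIL package.

```r
## Hypothetical helper (not in AnVIL): call f() up to `times` times, pausing
## `wait` seconds between failed attempts. If every attempt fails, the last
## error is re-signalled.
with_retry <- function(f, times = 3L, wait = 1) {
    for (i in seq_len(times)) {
        result <- tryCatch(f(), error = identity)
        if (!inherits(result, "error"))
            return(result)
        if (i < times)
            Sys.sleep(wait)
    }
    stop(result)
}

## On AnVIL, using the DRS id from the message above:
## with_retry(function() drs_stat(myd))
```

A retry only hides a failure that clears up on its own. A consistently invalid token will still fail after `times` attempts.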
Nitesh Turaga (18:46:47) (in thread): > AnVIL::install() is correct @Frederick Tan
Nitesh Turaga (18:47:15) (in thread): > BiocManager::install() can as well… you just have to replace BiocManager::repositories()
Nitesh Turaga (18:47:29) (in thread): > with AnVIL::repositories()
Frederick Tan (19:12:25) (in thread): > Great, thanks for the clarification!
2021-06-29
Martin Morgan (07:55:36) (in thread): > Using AnVIL::repositories in BiocManager::install won’t work, because one can’t (easily, and by design) set repositories in BiocManager. You could use AnVIL::repositories() in install.packages… (but then why not just use AnVIL::install…?)
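Following Martin's point, the two working routes to binary installs on AnVIL look roughly like the sketch below. It assumes the AnVIL package is installed and that the session runs in an AnVIL/Terra container with matching binaries. The wrapper names are hypothetical; `AnVIL::install()` and `AnVIL::repositories()` are the functions named in this thread.

```r
## Hypothetical wrappers around the two approaches discussed above.
install_via_anvil <- function(pkgs) {
    ## AnVIL::install() installs from the AnVIL binary repositories
    AnVIL::install(pkgs)
}

install_via_base <- function(pkgs) {
    ## base R install.packages() pointed at the same repositories;
    ## repositories cannot easily be overridden in BiocManager::install()
    install.packages(pkgs, repos = AnVIL::repositories())
}
```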
Martin Morgan (07:56:24) (in thread): > I don’t think so; is this reproducible? (how?)
Lori Shepherd (14:11:47): > any updates for the tech standup today?
Nitesh Turaga (14:12:12): > None from my end sorry.
Martin Morgan (14:48:58): > * drs_stat() open issue: improved performance for many requests. Will parallelize to mitigate latency. > * resolved(?) RStudio / R-4.1.0 incompatibility (thanks to Rob & Fred) https://the-anvil.slack.com/archives/GM5C32K2P/p1624381321023800
2021-06-30
Martin Morgan (12:47:34): > @Vince Carey any luck with avtable_paged() for more time-efficient realization of large tables?
Vince Carey (13:28:32): > thanks for reminder. will check soon.
Vince Carey (23:07:36): > This workspace https://anvil.terra.bio/#workspaces/landmarkanvil2/Bioconductor-Package-SraInAnVIL can be part of the Tuesday presentation. The aims are a) to consider how a significant metadata API for public data might be advantageous for thinking through detailed practical application of FHIR, b) to consider how the recount3 quantifications will be usable in AnVIL via Bioconductor, and c) to illustrate HSDS as a vehicle for presenting HDF data through DelayedArray. If this were deployed in GCP, would it be of use to AnVIL?
Vince Carey (23:35:02): > Apropos avtable_paged() @Martin Morgan > > > mark(avtable_paged(table="sequencing", name="gtex_pfb_large", pageSize=5000L)) > |===================================================================================================================================| 100% > |===================================================================================================================================| 100% > # A tibble: 1 x 13 > expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc > <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> > 1 avtable_paged(table = "sequencing", name = "gtex_pfb_large", pageSize = 5000L) 1.74m 1.74m 0.00959 1.08GB 0.220 1 23 > # … with 5 more variables: total_time <bch:tm>, result <list>, memory <list>, time <list>, gc <list> > Warning message: > Some expressions had a GC in every iteration; so filtering is disabled. > > > > mark(avtable_paged(table="sequencing", name="gtex_pfb_large", pageSize=2000L)) > |===================================================================================================================================| 100% > |===================================================================================================================================| 100% > # A tibble: 1 x 13 > expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc > <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> > 1 avtable_paged(table = "sequencing", name = "gtex_pfb_large", pageSize = 2000L) 2.57m 2.57m 0.00650 1.66GB 0.221 1 34 > # … with 5 more variables: total_time <bch:tm>, result <list>, memory <list>, time <list>, gc <list> > Warning message: > Some expressions had a GC in every iteration; so filtering is disabled. >
Vince Carey (23:35:54): > it’s about 95000 records.
2021-07-01
Martin Morgan (04:39:49): > Thanks @Vince Carey, how does that compare with avtable()?
Vince Carey (09:55:42): > Well, the behaviors aren’t consistent with what I experienced on Monday where avtable (the only method tried) seemed intolerably long. Now it is beating the paged approach. > > > mark(zz <- avtable_paged(table="sequencing", name="gtex_pfb_large", pageSize=2000L)) > |===============================================================================================================================| 100% > |===============================================================================================================================| 100% > # A tibble: 1 x 13 > expression min median `itr/sec` mem_alloc `gc/sec` n_itr > <bch:expr> <bch:t> <bch:t> <dbl> <bch:byt> <dbl> <int> > 1 zz <- avtable_paged(table = "sequencing", name = "gtex_pfb_large", pageSize = 2000L) 2.35m 2.35m 0.00708 1.63GB 0.0850 1 > # … with 6 more variables: n_gc <dbl>, total_time <bch:tm>, result <list>, memory <list>, time <list>, gc <list> > Warning message: > Some expressions had a GC in every iteration; so filtering is disabled. > > dim(zz) > [1] 94949 63 > > > mark(zz2 <- avtable(table="sequencing", name="gtex_pfb_large")) > # A tibble: 1 x 13 > expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result > <bch:expr> <bch:> <bch:> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> > 1 zz2 <- avtable(table = "sequencing", name = "gtex_pfb_large") 1.09m 1.09m 0.0153 687MB 0.0614 1 4 1.09m <tibbl… > # … with 3 more variables: memory <list>, time <list>, gc <list> > Warning message: > Some expressions had a GC in every iteration; so filtering is disabled. > > dim(zz2) > [1] 94949 63 >
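Some context on the page sizes: assume `pageSize` is the number of rows fetched per request (an assumption about `avtable_paged()`, not confirmed here). Under that assumption, the 94,949-row `sequencing` table needs the number of requests computed below at each setting. More requests means more per-request latency, which fits the smaller page size being slower in both benchmarks above.

```r
n_rows <- 94949L                  # rows reported by dim(zz) above
page_size <- c(5000L, 2000L)      # the two pageSize values benchmarked
n_requests <- ceiling(n_rows / page_size)
n_requests
## [1] 19 48
```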
2021-07-06
Lori Shepherd (10:21:03): > anvil bi-weekly meeting today . here is the link to the agendahttps://docs.google.com/document/d/1otCjFNvmXvJUwkgsrZHtwYWYjSCfiOseMuukRPWDcaA/edit?usp=sharing
2021-07-07
Lori Shepherd (08:37:33): > <!channel> Since we had a tech presentation this past week, we will have a section in the monthly working group updates. I started a section in the running agenda for the next bi-weekly meeting to review before I move it over into the document that is shared with the project. If anyone wants to start filling in accomplishments / in progress / risks-blockers / upcoming in the meantime, we can review at the next meeting.
Stephen Mosher (08:42:00) (in thread): > Worth pointing out that the monthly working group update (MWGU) report has been migrated to the new AnVIL shared drive (as opposed to the old AnVIL shared folder). Please let @Lori Shepherd or me know if you have trouble getting access!
2021-07-13
Vince Carey (12:44:08): > <!channel> Any news to report on tech call?
Martin Morgan (13:57:49): > * AnVILPublish enhancements to better support bookdown, especially Orchestrating Spatially Resolved Transcriptomics Analysis with Bioconductor (preliminary workspace available…) > * Internal enhancements to redis backend used when building package binaries - Attachment (lmweber.org): Orchestrating Spatially Resolved Transcriptomics Analysis with Bioconductor > Online textbook on ‘Orchestrating Spatially Resolved Transcriptomics Analysis with Bioconductor’
2021-07-19
Lori Shepherd (09:31:24): > Please note the NEW meeting link for tomorrow’s bi-weekly meeting – it has also been updated in the agenda: https://meet.google.com/ied-ouvi-sey - Attachment (meet.google.com): Meet > Real-time meetings by Google. Using your browser, share your video, desktop, and presentations with teammates and customers.
Leo Lahti (16:51:16): > @Leo Lahti has joined the channel
2021-07-20
Vince Carey (07:17:10): > My effort this week spent on renovating support for cost analysis (https://app.terra.bio/#workspaces/landmarkanvil2/use_anvil_billing, shared with Bioconductor_User) and instrumentation (workspace to be posted later today).
Vince Carey (07:17:41): > @Frederick Tan did the “team creation for PI/admin” in AnVIL project/protocol come to fruition?
Lori Shepherd (08:00:09): > Agenda for today’s meeting https://docs.google.com/document/d/1otCjFNvmXvJUwkgsrZHtwYWYjSCfiOseMuukRPWDcaA/edit?usp=sharing Remember to click on the new meeting link and NOT BlueJeans
Frederick Tan (08:04:20) (in thread): > If you’re asking about a single unified dashboard for PIs/admins, then not yet. First step is making more widely available a Jupyter Notebook that leverages the API to create new Billing Projects for team members (if the description that has been conveyed is correct). Latest ETA from Terra is their Aug 16th sprint. @Martin Morgan
Vince Carey (09:09:07): > Here is the workspace for instrumentation https://app.terra.bio/#workspaces/use-strides/use_bioc_instrumentation
Vince Carey (09:19:12): > I had > > # Add back other env vars > RUN echo "TERRA_R_PLATFORM='anvil-rstudio-bioconductor'" >> /usr/local/lib/R/etc/Renviron.site \ > && echo "TERRA_R_PLATFORM_BINARY_VERSION='0.99.1'" >> /usr/local/lib/R/etc/Renviron.site >
> in an old Dockerfile. What are the right values?
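R reads `Renviron.site` at startup, so whatever values an image bakes in can at least be inspected from a running session. Below is a minimal sketch. `terra_platform_info()` is a hypothetical helper. It only reports the current settings; it does not answer which values are correct.

```r
## Hypothetical helper: report the TERRA_R_PLATFORM* variables set via
## Renviron.site, with NA for any variable that is not set.
terra_platform_info <- function() {
    vars <- c("TERRA_R_PLATFORM", "TERRA_R_PLATFORM_BINARY_VERSION")
    Sys.getenv(vars, unset = NA)
}
```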
Vince Carey (09:20:14): > @Stephen Mosher I added a couple of workspace links in the vignette for today. Could you click on them and see if you get access to workspace descriptions with useful graphics? Thanks!
Stephen Mosher (14:56:08) (in thread): > Sorry for the delay, no, I do not have access
Stephen Mosher (14:56:15) (in thread): - File (PNG): image.png
2021-07-21
Vince Carey (07:22:56) (in thread): > Thanks for checking. It is a moot point now. It is unclear to me why access was restricted.
2021-07-23
Sehyun Oh (11:09:53): > Question : “Nothing within RStudio cloud environment in Terra can be shared through cloning workspace.” Is this correct?
Sehyun Oh (11:11:29): > So others can access what I’m doing within RStudio cloud environment only when they have a ‘writer’ access to the workspace? Without cloning, but more like work-together in ‘one’ workspace?
Nitesh Turaga (11:12:11): > So, I don’t think you can actually share your RStudio environment at all.
Nitesh Turaga (11:12:50): > You can save your analysis with an Rmd file - https://bioconductor.org/packages/release/bioc/vignettes/AnVILPublish/inst/doc/AnVILPublishIntro.html#from-collections-of-rmd-files And you can share this Rmd file with users. > > But you cannot share RStudio at all I believe.
Nitesh Turaga (11:13:08): > Maybe I’m misunderstanding your question.
Sehyun Oh (11:15:03): > If someone has a writer access to my workspace, then they can access my RStudio environment, right?
Nitesh Turaga (11:20:41): > No. Each RStudio environment is independent of the others. > > I cannot access your RStudio session even if you share your workspace with me (I’m mostly sure this is the case).
Nitesh Turaga (11:21:00): > You’ll get your own RStudio
Frederick Tan (11:21:01): > That’s my understanding as well
Nitesh Turaga (11:21:21): > and I’ll get my own RStudio. This is because there is no way to share the “container” across different Google accounts.
Frederick Tan (11:21:39): > … I’m not sure you can even share RStudio environments between Billing Projects
Nitesh Turaga (11:22:03): > Yep.
Nitesh Turaga (11:22:47): > The only way to share the analysis is to save it as an Rmd file, and then use the AnVILPublish R package to publish the Rmd file within the workspace.
Sehyun Oh (11:22:58): > Got it. I also confirmed it by testing. Even if I give writer access to the workspace, others cannot access my cloud environment.
Nitesh Turaga (11:23:12): > Yep.
Sehyun Oh (11:26:33): > Yeah… my vignette references other files through the file system, and once I convert Rmd to Jupyter notebook, the vignette cannot access any of them. I can put my data files in Data section, but then I need to rewrite a big chunk of my vignette again… ;(
Sehyun Oh (11:29:09): > It’s doable but seems to require more maintenance than I’m willing to put in. But good to clarify. Thanks!
Nitesh Turaga (11:29:52): > Yes, I understand your issue. It’ll be good to ask in the anvil meeting next week. Put it on the agenda maybe?
Sehyun Oh (11:35:54): > I have an idea what I’d like to do in Terra RStudio, but not sure whether it’s feasible. But I guess that’s not the part I should decide or worry about. :sweat_smile: I’ll leave a note in the agenda anyway.
Sean Davis (11:44:49): > @Sehyun Oh A different approach is to run the container outside AnVIL and then interact with AnVIL from the container. Not sure what the use case you have in mind is, though.
Sean Davis (11:50:05) (in thread): > Is it possible to read and write files to the workspace rather than the file system for your vignette?https://bioconductor.org/packages/release/bioc/vignettes/AnVIL/inst/doc/Introduction.html#using-avbucket-and-workspace-files
Sehyun Oh (11:56:34): > @Sean Davis I’m trying to put GenomicSuperSignature use examples in Terra. Because it requires a good number of packages and data files, I thought Terra might be useful.
Sehyun Oh (11:57:28): > I can probably build a container and guide users to create cloud environment from it.
Frederick Tan (12:00:02) (in thread): > Can you provide a little more detail on the files (e.g. what’s creating them, where they are, why they can’t be accessed)?
Sean Davis (12:11:35) (in thread): > If you all end up discussing this at an #anvil meeting, I’d love to attend. @Sehyun Oh, we could also get together with @Frederick Tan and/or @Nitesh Turaga separately to discuss a bit.
Sehyun Oh (12:18:35) (in thread): > Sounds good! I’ll make the agenda little more clear and share it with you guys soon. Depending on how it looks, we can discuss during our regular meeting or separate one. Will let you know. :)
Nitesh Turaga (12:19:18) (in thread): > I’m open, and happy to discuss.
Nitesh Turaga (12:21:04) (in thread): > The AnVIL meetings are on Tuesdays @Sean Davis, bi-weekly and from 11AM-12PM EST. The next one is on August 3rd I believe. > > But I’m happy to meet before that as well to talk about this.
2021-07-26
Martin Morgan (11:51:12) (in thread): > Similar to Sean’s suggestion, AnVIL::localize() synchronizes a path on the bucket to the local file system. This could be used as the first step of (any / all) vignettes, and files would not be copied unnecessarily.
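Martin's suggestion, written as a first chunk shared by every vignette, might look like the sketch below. Several pieces are assumptions: the helper name, the `data/` prefix in the workspace bucket, and the injected `localizer` argument, which exists only so the chunk can be exercised off AnVIL. `AnVIL::avbucket()` and `AnVIL::localize()` are the AnVIL functions referred to in this thread. As far as I know, `localize()` performs a dry run unless `dry = FALSE`.

```r
## Hypothetical vignette-setup helper: make sure `dest` exists, then let
## `localizer(source, dest)` synchronize the bucket path into it.
setup_vignette_data <- function(source, dest = "data", localizer) {
    dir.create(dest, recursive = TRUE, showWarnings = FALSE)
    localizer(source, dest)
    list.files(dest, full.names = TRUE)
}

## On AnVIL (assumes the files live under "data/" in the workspace bucket):
## setup_vignette_data(
##     paste0(AnVIL::avbucket(), "/data"), "data",
##     function(src, dst) AnVIL::localize(src, dst, dry = FALSE)
## )
```

Because `localize()` synchronizes rather than blindly copying, the same chunk can run at the top of each vignette without re-downloading unchanged files.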
Sehyun Oh (12:05:27) (in thread): > @Martin Morgan Maybe I didn’t structure my repository in a proper way. It’s not a package, but I’d like to make analyses here (https://shbrief.github.io/GenomicSuperSignaturePaper/) available in Terra. Any suggestion? - Attachment (shbrief.github.io): Reproduce GenomicSuperSignature paper > This package provides data, methods, results covered in GenomicSuperSignature manuscript.
Frederick Tan (13:05:54) (in thread): > Two questions > * Is it mainly RAVmodel_C2.rds and RAVmodel_PLIERpriors.rds, or are there other datasets you need? > * When you say “available in Terra” are you referring specifically to Jupyter Notebooks? i.e. does it work fine via RStudio?
Sehyun Oh (13:24:48) (in thread): > * There are other datasets I need to run vignettes in the demo. They are currently saved in GitHub (using LFS) and Google bucket. > * It works fine in RStudio, but I can’t share it with others because the cloud environment is not sharable.
Sean Davis (19:15:23) (in thread): > @Sehyun Oh if you have folks clone the Paper repo and then install, will that get you the “cloud environment” that you need?
Sean Davis (19:18:02) (in thread): > Is this the workshop/tutorial/workflow that you want to run? https://shbrief.github.io/GenomicSuperSignaturePaper/ - Attachment (shbrief.github.io): Reproduce GenomicSuperSignature paper > This package provides data, methods, results covered in GenomicSuperSignature manuscript.
Sean Davis (19:22:04) (in thread): > Cloning the repo will be a fast operation in the cloud. I’m not sure how quickly you’ll eat up github LFS credits, though, so another option is to simply tar up the repo and put that into the workspace. Then users localize the tar.gz file and install.
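Sean's tarball route can be sketched in base R as follows. The helper names are hypothetical. Copying the tarball to and from the workspace bucket (for example with `AnVIL::gsutil_cp()`) is left out so that the sketch runs anywhere.

```r
## Maintainer side: pack the repository directory into a gzipped tarball.
## The tarball path is made absolute first, because we change directory.
pack_repo <- function(repo_dir, tarfile) {
    tarfile <- file.path(normalizePath(dirname(tarfile)), basename(tarfile))
    old <- setwd(dirname(repo_dir))
    on.exit(setwd(old))
    utils::tar(tarfile, files = basename(repo_dir), compression = "gzip")
    invisible(tarfile)
}

## User side, after localizing the tarball from the workspace bucket:
## unpack it into `exdir` (created if necessary).
unpack_repo <- function(tarfile, exdir) {
    utils::untar(tarfile, exdir = exdir)
    invisible(exdir)
}
```

If the repository is an installable R package, `install.packages(tarfile, repos = NULL, type = "source")` could then install it directly from the tarball.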
Sehyun Oh (20:38:44) (in thread): > ‘Clone the Paper repo + download data + install packages’ creates the desired cloud environment. I just want to make it ‘easier’, especially with large data files, but that might be overkill…
Sean Davis (23:05:06) (in thread): > In a cloud environment, transferring largish files is not a huge challenge. And putting large files in the container doesn’t necessarily help a bunch since the container ALSO needs to be transferred before spinning up. I’d start with something that works and is documentable and reproducible. From there, it will be easier for folks to help iterate to improve.
Sean Davis (23:07:13) (in thread): > Without more information, I suspect that converting the workflows to better match up with an AnVIL workflow will mean putting the necessary files into a workspace and then using @Martin Morgan’s suggestion to localize() them as part of the vignette. That may not be a heavy lift, but it will mean maintaining both the package and the AnVILized version of the vignettes.
Sehyun Oh (23:23:56) (in thread): > Thanks for the suggestions,@Sean Davis! I’m trying to figure out how much additional maintenance will be required if I decide on ‘AnVILized’ version of vignettes. Maybe Terra is not a right platform for this…:thinking_face:
Sehyun Oh (23:25:43) (in thread): > Yeah, I’ve tried to build a container and it’s been somewhat annoying.
2021-08-02
Lori Shepherd (11:20:50): > AnVIL bi-weekly meeting tomorrow. Please add anything you would like to discuss to the agenda https://docs.google.com/document/d/1otCjFNvmXvJUwkgsrZHtwYWYjSCfiOseMuukRPWDcaA/edit#
2021-08-03
Martin Morgan (11:03:04): > I’m at the link waiting to be let in?
Nitesh Turaga (11:03:34): > You probably need to switch accounts Martin?
Nitesh Turaga (11:03:50): > Switch google accounts?
Marcel Ramos Pérez (11:04:22): > https://meet.google.com/ied-ouvi-sey
Vince Carey (14:19:25): > I think Lori was not getting the knock on the door because she attended with a different identity than the creator of the event.
Marcel Ramos Pérez (14:25:15): > set the channel topic: https://meet.google.com/ied-ouvi-sey
2021-08-04
Ayush Aggarwal (19:20:08): > @Ayush Aggarwal has joined the channel
2021-08-05
Nitesh Turaga (11:36:11): > Great session @Sehyun Oh. You did an incredible amount of work.
Sehyun Oh (11:40:24): > Thanks, Nitesh! Thanks for answering questions. It was super helpful! :pray:
Felix M (11:41:28): > @Felix M has joined the channel
Mikhail Dozmorov (12:49:04): > @Mikhail Dozmorov has joined the channel
2021-08-10
Lori Shepherd (07:41:13): > Any additional updates for the tech call this afternoon besides the anvil workshops from bioc2021?
Martin Morgan (09:10:34): > * AnVIL bug fix: shell quote file names in gsutil_cp() etc
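The snippet below shows why shell-quoting file names matters. It is a hypothetical illustration, not AnVIL's actual implementation. Base R's `system2()` pastes its `args` into a shell command line without quoting them, so a file name containing a space would otherwise be split into two arguments. The bucket name is made up.

```r
## Hypothetical illustration (not AnVIL's code): build the argument vector
## for `gsutil cp`, quoting each path for a POSIX shell.
gsutil_cp_args <- function(source, destination) {
    c("cp", shQuote(source, type = "sh"), shQuote(destination, type = "sh"))
}

args <- gsutil_cp_args("my results.csv", "gs://my-bucket/results/")
paste("gsutil", paste(args, collapse = " "))
## [1] "gsutil cp 'my results.csv' 'gs://my-bucket/results/'"

## With gsutil on the PATH, this could then be run as system2("gsutil", args)
```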
Nitesh Turaga (09:10:52): > The version mismatch should be taken care of.
Martin Morgan (09:11:31): > do you mean the naming convention of the images? Also, did AnVIL get added to bioconductor_docker?
Nitesh Turaga (09:11:58): > Yes, AnVIL got added to bioconductor_docker. > > Yes, I mean the 3.13.0 and 3.13.2 mismatch as shown in the image below. > > Also, the anvil-docker images will be moved to the terra-docker repo in the next month or so. Once that happens, > 1. We’ll get the lists of installed packages in R and Python populated. > 2. Test scripts will also run against the images.
Nitesh Turaga (09:13:43): - File (PNG): Screen Shot 2021-08-09 at 11.25.53 AM.png
Nitesh Turaga (09:15:22): > The location of the images will also change going forward, i.e., the registry they are hosted in. Right now, they are in anvil-gcr-public, and will move to terra-gcr (I’m not entirely sure of this bucket name).
2021-08-11
Kozo Nishida (22:52:01): > Does anyone know when we will be able to use AnVIL with Azure credit? > Japanese research budgets cannot be used “on demand” and can only be used “prepaid”. > Azure is preferred over Google Cloud in Japan because Azure has a “prepaid” credit purchase format (it’s called Azure in Open Licensing). > In order to spread AnVIL + Bioconductor in Japan, I would like to keep up with the latest status of Azure + AnVIL. - Attachment (Microsoft Azure): Azure in Open Licensing | Microsoft Azure > Get the benefits of the cloud with Azure in Open Licensing. Activate a new subscription or add Azure credits to an existing Azure in Open Licensing subscription.
2021-08-12
Martin Morgan (06:30:25): > I think this is likely to be six months or more into the future. Good to know that there is interest in Azure.
Frederick Tan (06:54:34): > Does your research budget allow you to buy “prepaid” credits through a third party like onixnet.com?
Kozo Nishida (07:09:39): > Due to accounting reasons, it is difficult to make a contract with a third party with the Japanese research budget (mostly from the government). I think there is a tendency in Japan to prefer contracts with Microsoft.
Kozo Nishida (07:12:25): > As far as I know, both AWS and Google Cloud need to be contracted through a third party (as you say) on a Japanese budget, and Azure, which does not need to do so, seems to have strengths for the Japanese research industry.
Sean Davis (09:39:02) (in thread): > You might want to consider NIH STRIDES. https://datascience.nih.gov/strides/preparing-use-strides-2 Since these contract mechanisms are supported by NIH, it may be easier for you to establish such a contract. If work is supported by an NIH grant, you will automatically receive NIH STRIDES discounts on cloud services through AWS and Google. Microsoft is coming soon. > If this seems promising, reach out to the STRIDES email on the website and you can go from there.
Martin Morgan (09:55:40): > I wrote a little R package to help aggregate the MixPanel (AnVIL use, from the AnVIL people) csv files, it’s at https://github.com/mtmorgan/mixpanel. It’ll require your Google credentials to retrieve the csv files (and of course you have to have access to the shared folder, which is not public). Here’s the vignette, showing the basics of how to use it and some summary information.
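For readers without access to the shared folder, the general aggregation step can be illustrated in base R. This is a generic sketch, not the mixpanel package's API. The function name and the example columns are hypothetical, and it assumes every CSV export has the same columns.

```r
## Generic sketch (not the mixpanel package): read several CSV exports with
## identical columns and stack them, recording which file each row came from.
aggregate_csv <- function(paths) {
    tables <- lapply(paths, function(path) {
        df <- utils::read.csv(path, stringsAsFactors = FALSE)
        df$source_file <- rep(basename(path), nrow(df))
        df
    })
    do.call(rbind, tables)
}
```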
Martin Morgan (10:03:54): > (here’s a sanitized version of the vignette…) - File (HTML): using_mixpanel.html
2021-08-16
Sean Davis (09:30:20): > Would be great to have AnVIL participate in at least one H3Africa session. - Attachment: Attachment > Dear all > Last Friday, I had a meeting with the H3Africa group about continuing to offer Bioconductor courses. The group agreed that it would be great to have follow-up Bioconductor workshops. They proposed to have a Bioconductor workshop in the first week of November 2021. I will try to set a second meeting for more discussions and it will be good if more CAB members can attend.
2021-08-17
Martin Morgan (09:02:40): > Here’s the agenda / link for today’s 11am meeting; feel free to add items…https://docs.google.com/document/d/1otCjFNvmXvJUwkgsrZHtwYWYjSCfiOseMuukRPWDcaA/edit?usp=sharing
Martin Morgan (09:03:18): > <!channel>^^
Martin Morgan (10:49:19): > Is @Lori Shepherd the host of the Google meeting in the document? If so I don’t think she’s here and so won’t be able to let us in. Alternatives?
Nitesh Turaga (10:50:40): > I don’t think Lori is in today, we can just use a different google meet
Nitesh Turaga (10:51:05): > To join the video meeting, click this link: https://meet.google.com/epv-yarp-jhj Otherwise, to join by phone, dial +1 720-477-1895 and enter this PIN: 436 040 611# > To view more phone numbers, click this link: https://tel.meet/epv-yarp-jhj?hs=5 - Attachment (meet.google.com): Meet > Real-time meetings by Google. Using your browser, share your video, desktop, and presentations with teammates and customers.
Vince Carey (22:03:12) (in thread): > I see there’s been some editing of the sequence of the tutorial materials. The Bioc2021 workshop video looks like - File (PNG): Screen Shot 2021-08-17 at 10.00.19 PM.png
Vince Carey (22:04:23) (in thread): > So much screen real estate lost to the airmeet platform? It would take some time, but it might be helpful to have more ‘indexing’ of the content on YouTube, so that the minute mark for each topic is present in the video description. I will try to watch through to get some timings of topics.
Vince Carey (22:51:18) (in thread): > Anyway I would say that the revised ordering on the PR is fine and we should go ahead @Andres Wokaty
2021-08-18
Lori Shepherd (06:56:39): > For future reference, the AnVIL meeting is scheduled through the bioconductorcoreteam Gmail account, which the core team has access to, so any of us could have started it as host.
2021-08-24
Vince Carey (12:08:50): > Any news for the 4pm meeting today?
Martin Morgan (12:24:08): > Not much, sorry… > * continue working on AnVIL package blog post for Geraldine
Nitesh Turaga (12:25:33): > Nothing on my end. If someone can check the AnVIL page for the version discrepancy issue on the UI and if it disappeared or not on their end, it would be great. (Can’t access AnVIL right now, and the UI should have updated yesterday).
2021-08-26
Levi Waldron (05:50:46): > I was googling for something for setting up my fall class on AnVIL, and found this - nice of them to format it all nicely! :joy: https://portal.anvilproject.org/learn/data-analysts/using-anvil-for-teaching-r-bioconductor - Attachment (The AnVIL): Using AnVIL for Teaching R / Bioconductor > A case study of using AnVIL to teach R for a Biostatistics course and provides essentials for using AnVIL for other instructional efforts.
Martin Morgan (12:35:43): > A first draft of the AnVIL package blog post is at https://docs.google.com/document/d/11BTvTg_hE-qaNB-ONRhd5vLGNiza8ekmkzJ3Zg2nAvg/edit?usp=sharing (should have edit permissions…) It partly follows a template from Geraldine, but maybe deviates too much into a vignette…
Vince Carey (17:25:29): > Will the final product include the outputs of the commands?
Vince Carey (17:35:12): > My gut reaction is that it might be more compact if you focused on the Optimus exploration. You have the LoomExperiment installation, which is probably fairly complex but fast thanks to the binaries. Then there’s the workspace concept to be fleshed out – unfortunately the ‘participant’ concept seems artificial for that workspace. But showing how the AnVIL package helps to investigate the outputs of the pipeline in the workspace … seems interesting?
Martin Morgan (17:50:30) (in thread): > I guess it could, in a cut-and-paste way, but it seems like that might detract from the ‘impact’ of a blog post?
Martin Morgan (17:51:25) (in thread): > Yes, I think I’ll revise so that everything is in the Optimus workspace…
2021-08-30
Vince Carey (15:04:38) (in thread): > A few choice output events could be useful. I agree it should be selective.
2021-08-31
Lori Shepherd (07:35:43): > https://docs.google.com/document/d/1otCjFNvmXvJUwkgsrZHtwYWYjSCfiOseMuukRPWDcaA/edit?usp=sharing Please add items to today’s meeting agenda
Vince Carey (07:57:20): > I am starting a curatedAnVILData package, emulating the curatedTCGAData paradigm for MultiAssayExperiments and metadata. Will give some details at 11.
2021-09-04
Vince Carey (10:12:24): > https://app.terra.bio/#workspaces/landmarkanvil2/curatedAnVILData-proto – the description component will be the basis of a walk-through for Sept 7. I will also do some interactive work in that meeting. However, given the truncation of Alessandro’s Terra talk last Tues, I have offered to give the Terra group the floor and proposed to move Bioc to Sept 14. Haven’t heard back.
Vince Carey (10:27:40): > I don’t think I’ll need the whole 30 minutes so any other items should just be noted here and we will set up time/slides.
2021-09-07
Vince Carey (10:34:34): > I have updated https://app.terra.bio/#workspaces/landmarkanvil2/curatedAnVILData-proto, which will be the basis for the Bioc discussion after the 30 min mark. I have offered the Terra group the floor to finish their material from last week, and Rob is willing but Alessandro is out, so we can decide as a group on the call how to proceed. I think curatedAnVILData could take 15-20 min. I don’t plan a live demo, but it is possible if desired. Other items for agenda entries?
Vince Carey (10:35:41): > The app uses GenomicFiles::reduceByFile for bigWig in the local bucket. It would be much more potent if the bigWig could be interrogated over HTTP.
Martin Morgan (11:28:09): > * Draft blog post on using AnVIL package in Geraldine’s hands > * Continue to develop Azure support in AnVIL package > * Application users (unique) and workflow / application launches from mixpanel
Martin Morgan (11:30:25): > Also, in rtracklayer? import.bw seems to imply that con can be a URL, and selection= can specify the region of interest. Does that (maybe with the resolved URL from drs_stat()) get you range-based selection?
Martin Morgan (15:21:45): - File (PNG): image.png
Martin Morgan (15:30:28): - File (PNG): image.png
Kasper D. Hansen (15:46:06): > There is a meeting today at 4pm eastern right? Vince has asked me to be there 4-4.30 to touch on recount3, but I think I need a zoom invite or whatever you’re using. Could someone just post the URL here? Assuming this is the right channel
Vince Carey (15:47:29): > https://zoom.us/j/94221630188?pwd=M1VEMlp2eFlxZDlYeW9obDhvc21LZz09
2021-09-13
Lori Shepherd (12:30:08): > Agenda for tomorrow’s meeting: https://docs.google.com/document/d/1otCjFNvmXvJUwkgsrZHtwYWYjSCfiOseMuukRPWDcaA/edit?usp=sharing Please feel free to add issues, comments, and updates
2021-09-14
Sehyun Oh (11:06:53): > Hmm… I can’t get into the meeting room. Is this the correct link? https://meet.google.com/ied-ouvi-sey
Nitesh Turaga (12:05:29): > Yep, the swagger billing log-in works again
Vince Carey (12:51:27): > OK, add what you know about the problem/solution to the tech call agenda, ok?
Nitesh Turaga (12:51:52): > Yes, will do now.
Vince Carey (12:53:01): > FWIW I do not see a problem with pip3 install firecloud on the rstudio AnVIL environment.
Nitesh Turaga (12:53:56): > I see, I hadn’t tried yet. But, I do think it makes sense to remove Python 2 from that image completely.
2021-09-16
Henry Miller (18:35:31): > @Henry Miller has joined the channel
2021-09-21
Martin Morgan (14:02:41): > Updates for the AnVIL meeting today?
Vince Carey (15:53:28): > No, Martin, thanks for filling in the agenda. Do you want to do the vocals or should I?
2021-09-25
Haichao Wang (07:20:27): > @Haichao Wang has joined the channel
2021-09-28
Sehyun Oh (11:05:10): > Do we have a meeting today?
Sehyun Oh (11:05:18): > I can’t get into the meeting room again…. ;(
Martin Morgan (11:05:35): > I think we’re waiting for @Vince Carey to let us all in…
Andres Wokaty (11:06:29): > vince is going to start a new meeting
Vince Carey (11:07:05): > new anvil > Tuesday, September 28 · 11:00am – 12:00pm > Google Meet joining info > Video call link: https://meet.google.com/cvt-tpyf-gcz > Or dial: (US) +1 617-675-4444 PIN: 776 352 570 4481# > More phone numbers: https://tel.meet/cvt-tpyf-gcz?pin=7763525704481
Vince Carey (11:08:46): > use new link @Martin Morgan
Nitesh Turaga (16:32:21): > crcmod
issue is fixed now inhttps://github.com/anvilproject/anvil-docker/pull/33
2021-10-05
Vince Carey (11:42:30): > <!here> Anything new here for 4pm today?
Martin Morgan (13:10:18) (in thread): > * continued DRS development — would greatly appreciate example URIs for resolution; it’s easy to generate code accessing DRS URIs that require authentication and ‘work for me’ (BiocManager::install("Bioconductor/AnVIL"); AnVIL::drs_cp(…)) but does it work in general?? Brian O’Connor suggested last week that a document with such URIs exists; the examples in Brian Hannafious’s terra-notebook-utils repos are out of date / require privileged access.
Vince Carey (14:58:28): > Thanks. I am trying to get to FHIR. The AnVIL_Devs group can apparently authenticate to the server. I may not be able to do the vocals on this call. I will try to be on the call by the time we are up.
Vince Carey (15:14:23): > I am still unable to authenticate to FHIR server
Vince Carey (15:14:34): > Details communicated to Valerie Reeves
2021-10-06
Vince Carey (16:56:32): > note the 3.13.2 … - File (PNG): anvilVers.png
Nitesh Turaga (19:54:50): > It should be updated when the new version comes out
Nitesh Turaga (19:55:12): > should become 3.13.3
2021-10-07
Martin Morgan (09:25:15): > The Terra blog on the AnVIL package is published at https://terra.bio/access-terra-anvil-resources-easily-with-bioconductor-and-the-anvil-package/ - Attachment (Terra.Bio): Access Terra / AnVIL resources easily with Bioconductor and the AnVIL package - Terra.Bio > Dr. Martin Morgan discusses the R / Bioconductor AnVIL package and shows how his group’s work empowers researchers to work more easily on the cloud.
Nitesh Turaga (16:40:54): > Great post!
2021-10-10
Vince Carey (12:42:06): > Some progress on FHIR: https://app.terra.bio/#workspaces/use-strides/biocfhir-new, based on https://github.com/vjcitn/AnvBiocFHIR with Dockerfile > > FROM us.gcr.io/anvil-gcr-public/anvil-rstudio-bioconductor-devel:3.14.2 > > RUN pip3 install pyAnVIL==0.0.9rc2 --upgrade > RUN pip3 install git+https://github.com/smart-on-fhir/client-py#egg=fhirclient >
Vince Carey (13:38:11): > It is somewhat mysterious to me why .local/lib/python3.7/site-packages/ is used to install packages coming into R via pip via basilisk. It is a different location from the conda-based installations, and when I perform the same tasks on a Linux system, all required packages wind up in $HOME/.cache/R/basilisk/1.5.0/AnvBiocFHIR/0.0.1/abfhirenv/lib/python3.7/site-packages
Vince Carey (14:37:32): > I think this issue of installing in $HOME/.local/lib/python3.7 is anvil specific?
2021-10-12
Vince Carey (08:38:20): > It is very doubtful I will make tech call today. Or internal call. I added note on FHIR in the tech call agenda.
2021-10-17
Vince Carey (08:04:32): > There are two issues with the OpenVINO cloud environment in AnVIL. First, the cuda infrastructure seems mismatched with the installed tensorflow, and I have filed a ticket on that. Attempts to check if tensorflow can communicate with GPU fail. Second, R 4.1.1 is installed but none of the environmental support for AnVIL package is present, so AnVIL::install does not get binaries. The underlying container image for any AnVIL cloud environment that provides R should align with and be tested for congruent behaviors with Nitesh’s images.
Martin Morgan (08:18:39): > That’s also not the case for the Jupyter notebook. The base image is different, so we would have to build a second set of binaries. Perhaps a third for Vino.
Martin Morgan (08:19:48): > Also I think the GPU processing capabilities are available in the Jupyter notebook image without using Vino.
Vince Carey (10:30:59): > Confirmed. Strangely one cannot reach the GPU directly with tensorflow as installed for python3 in the Vino environment, but one can in the stock Jupyter environment.
Sehyun Oh (14:10:02): > Is recount3 data available through Terra’s data model?
2021-10-18
Stephen Mosher (08:40:19): > @Sehyun Oh - We’re working with @Kasper D. Hansen and @Leonardo Collado Torres to get the recount3 data pulled into AnVIL. This is a work in progress at the moment.
2021-10-19
Vince Carey (15:39:17): > Any news for tech call?<!here>
Nitesh Turaga (15:51:32): > We see 3.13.3… and we plan to update the Docker images as soon as the release is done.
Nitesh Turaga (15:53:43): > Updated: Sep 28, 2021 > Version: 3.13.3
Martin Morgan (15:55:13): > Added a couple of small updates to the agenda
Nitesh Turaga (15:56:10): > A little hint on my login issue: it seems the specific IAM authentication needed for each account is not being given to me. I need to go to Swagger and authenticate each API call with that cloud-billing authorization. - File (PNG): Screen Shot 2021-10-19 at 9.54.06 AM.png
Vince Carey (15:59:16): > @Nitesh Mani should we mention? @Martin Morgan I will be happy to do the talking and give you a break…
Vince Carey (15:59:35): > @Nitesh Turaga should we mention?
Nitesh Turaga (15:59:39): > Yes, I think we could mention it.
Martin Morgan (17:00:29): > noticed that there’s an R jq package… https://CRAN.R-project.org/package=jqr - Attachment (cran.r-project.org): jqr: Client for ‘jq’, a ‘JSON’ Processor > Client for ‘jq’, a ‘JSON’ processor (https://stedolan.github.io/jq/), written in C. ‘jq’ allows the following with ‘JSON’ data: index into, parse, do calculations, cut up and filter, change key names and values, perform conditionals and comparisons, and more.
2021-10-25
Vince Carey (23:07:54): > For Friday’s meeting, mixpanel results may be of interest in the Analysis Tools discussion. Are the data on sessions hand-curated or retrieved by API? It would be nice to update the figures presented on 7 September in the tech agenda.
2021-10-26
Martin Morgan (08:44:03): > I think the data is extracted from some bigger set of reports and hand-curated. Valerie Reeves says that the data will be updated for September, but nothing so far: https://the-anvil.slack.com/archives/CGM728FJ4/p1634144200038400. These are the figures from the vignette at https://github.com/mtmorgan/mixpanel - File (PNG): image.png - File (PNG): image.png
Stephen Mosher (08:53:59): > FYI - metric data from APR2021 - AUG2021 can be found in this folder on the AnVIL shared drive: https://drive.google.com/drive/folders/1ejvQZBwwj5JkBQLtv4Skyr-5N0188KVK. > > I’m searching for the location of metrics from APR2020 - MAR2021.
Frederick Tan (09:37:16): > The original metrics that William Disman provided are available in the “Files” tab at https://anvil.terra.bio/#workspaces/anvil-outreach/metrics/data
Sehyun Oh (11:24:38): > Sorry, I’m not feeling well and will miss the meeting today.
Marcel Ramos Pérez (12:33:32) (in thread): > Feel better soon!
Martin Morgan (19:38:02): > @Vince Carey the problem you mentioned with AnVILPublish and workspace creation for DeepPINCS should be fixed in AnVILPublish v. 1.4.1 / 1.5.1
2021-11-01
Vince Carey (21:27:33): > Revisiting OSCA book for 3.13 (finalize) and 3.14 (when all binaries are available): https://github.com/vjcitn/osca4anvil – this is directed at installing all dependencies in advance of any work with the content, and verifying that all book Rmd can be run. How to package this best for an interactive user is not clear. Maybe chapters should be presented as Jupyter. Then this might be a supporting package/process for an AnVILPublish of book content.
Vince Carey (21:31:59): > It would be useful to have estimates of time and resources required for various tasks. This could be done with Rcollectl/instrumentation workspace.
2021-11-08
Paula Nieto García (03:18:38): > @Paula Nieto García has joined the channel
Paula Nieto García (03:19:02): > @Paula Nieto García has left the channel
2021-11-09
Lori Shepherd (07:29:03): > https://docs.google.com/document/d/1otCjFNvmXvJUwkgsrZHtwYWYjSCfiOseMuukRPWDcaA/edit#heading=h.tfpbk58ubtf0 agenda for today’s meeting. I’d like to review what we established as Q3/Q4 milestones, as the end of Q4 is fast approaching – please feel free to add to the agenda and update the Issues section and/or the Updates/Highlights and Questions sections
2021-11-11
Shilpa Garg (09:27:43): > @Shilpa Garg has joined the channel
2021-11-16
Vince Carey (09:02:18): > <!here> anything for standup today? I will discuss OSCA for Bioc 3.13 (looks like Bioc 3.14 is still not in the dropdown), with an illustration of “execution profile” for a code chunk, and a bit of FHIR progress.
Kasper D. Hansen (09:04:28): > I won’t join the meeting today, but I have a Q. Is it possible to programmatically detect that you’re running on AnVIL?
Kasper D. Hansen (09:05:04): > So you could have something in .onLoad which sets up arguments to functions in a package depending on whether you’re running outside or inside of AnVIL?
Vince Carey (09:07:39): > There are enough restrictions on the kind of container that can be run as a custom environment that I would suspect that there is a way, probably an environment variable to check. But I am not sure, and @Martin Morgan, @Nitesh Turaga, or Rob Title on the AnVIL slack would surely know.
Vince Carey (09:08:13): > @Kasper D. Hansen any comments on progress of recount3 in AnVIL? If I use the recount3 package to get data, it is still coming from AWS S3, right?
Kasper D. Hansen (09:28:47): > No comments. The ball right now is in our lap
Kasper D. Hansen (09:28:57): > Right now you’re getting it from AWS
Kasper D. Hansen (09:29:09): > Sorry, no, you’re getting it from JHU
Kasper D. Hansen (09:29:13): > never aws
Nitesh Turaga (10:34:42): > @Kasper D. Hansen, the easiest way to prepare a container for AnVIL is to inherit from us.gcr.io/anvil-gcr-public/anvil-rstudio-bioconductor:3.14.0 and build whatever else you want on it. > > If images are based on this container or any of the Bioconductor containers, you’ll have a few environment variables to work with > > > Sys.getenv() > BIOCONDUCTOR_DOCKER_VERSION 3.13.37 > BIOCONDUCTOR_VERSION 3.13 > CLUSTER_NAME saturn-d7999ee5-6ccc-4328-b56e-6ac955a7ddac > GOOGLE_PROJECT land2nitesh > WORKSPACE_BUCKET gs://fc-0c122436-e115-4629-b6e3-ac8e33fb498c > WORKSPACE_NAME test-workspace > WORKSPACE_NAMESPACE land2nitesh >
Nitesh Turaga (10:35:22): > Variables such as WORKSPACE_* are unique to AnVIL.
Nitesh Turaga (10:36:00): > The BIOCONDUCTOR_* variables help you recognize that the image is inherited from a Bioconductor Docker image.
Nitesh Turaga (10:36:25): > Since AnVIL currently runs on Google Cloud, you’ll have GOOGLE_PROJECT and CLUSTER_NAME as well.
Nitesh Turaga (10:37:43): > You can base your .onLoad() behavior on these variables, depending on your use case: whether you want your function to run on load in any Bioconductor Docker image, or only in cloud environments. Both options are possible.
Nitesh Turaga (10:41:04): > There are also helpful commands in the AnVIL package you could use to detect whether you are in the AnVIL cloud environment.
Nitesh Turaga (10:50:57): > I think you could just use AnVIL::avworkspace() as well and see what happens. There are two outcomes: > > It either returns a workspace > > # I'm running this on the AnVIL RStudio env > > AnVIL::avworkspace() > [1] "land2nitesh/nitesh-test-work" >
Nitesh Turaga (10:51:37): > Or it fails, > > # I'm running this locally on my mac > > AnVIL::avworkspace() > Warning message: > In .avworkspace("avworkspace_name", "NAME", name) : > 'WORKSPACE_NAME' undefined; use `avworkspace_name()` to set >
Kasper D. Hansen (10:51:58): > What I would like to do is to modify the recount3 package to detect that it is running on AnVIL. Seems like AnVIL::avworkspace() is the way to go
Kasper D. Hansen (10:52:58): > It could be useful to have this test, including error handling, return TRUE/FALSE in the AnVIL package, but perhaps we’re going to be the sole use case
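A minimal sketch of the TRUE/FALSE test Kasper describes, built only on the environment variables Nitesh listed above. `is_on_anvil()`, `choose_data_url()`, and the `recount3.on_anvil` option are hypothetical names, not part of the AnVIL or recount3 packages. The sketch also assumes that the presence of the WORKSPACE_* variables is a good enough signal. As Martin points out below, being "on" AnVIL is not the same as having access to a particular copy of the data, so a real implementation might test that capability instead.

```r
# Hypothetical helper (not part of the AnVIL package): TRUE when the
# AnVIL / Terra workspace environment variables are all set and non-empty.
is_on_anvil <- function(vars = c("WORKSPACE_NAME", "WORKSPACE_NAMESPACE",
                                 "WORKSPACE_BUCKET")) {
    values <- Sys.getenv(vars)   # unset variables come back as ""
    isTRUE(all(nzchar(values)))
}

# Hypothetical chooser between two copies of the same data; the URLs are
# supplied by the caller, no real recount3 locations are assumed here.
choose_data_url <- function(anvil_url, public_url) {
    if (is_on_anvil()) anvil_url else public_url
}

# Hypothetical .onLoad() hook recording the result in a package option
# (the option name is made up for illustration).
.onLoad <- function(libname, pkgname) {
    options(recount3.on_anvil = is_on_anvil())
    invisible(NULL)
}
```

Checking environment variables avoids calling AnVIL::avworkspace() and handling its warning, and it works the same way in any image that inherits from the AnVIL Bioconductor containers.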
Martin Morgan (12:53:22): > Was wondering a little about more of the use case? I don’t have to be ‘in’ AnVIL to use AnVIL, e.g., on my local mac where I have gcloud / gsutil installed I can do lots of things AnVIL related, with or without the environment variables set… maybe you’re looking for a particular capability, rather than being ‘on’ AnVIL?
Kasper D. Hansen (13:15:25): > We’re going to have two copies of the data we suck down. One copy is hosted on AnVIL and one is hosted on the internet. The use case is just to hide this from the user, who just gets it from the “obvious” source.
Martin Morgan (15:34:38): > Can we be more precise? AnVIL data is currently stored in the Google cloud, typically with prefix gs://. If these were public access, then I could drill down and eventually download the data from an https:// URL. So are you saying that you’d like one copy of the data to be ‘requester pays’ and billed to the user’s AnVIL account, and one copy to be publicly accessible ‘for free’? And hence you’d like to know whether the user currently has an active Google billing project that has ‘requester pays’ enabled? > > For instance, for the data in this workspace https://anvil.terra.bio/#workspaces/anvil-datastorage/1000G-high-coverage-2019/data I see that the AnVIL data are objects in a Google bucket, e.g., gs://fc-56ac46ea-efc4-4683-b6d5-6d95bed41c5e. I can discover & navigate the object at https://console.cloud.google.com/storage/browser/fc-56ac46ea-efc4-4683-b6d5-6d95bed41c5e. Continuing to drill down, I end up at a page like this. If this were not requester pays, then the public URL would just work… > > Probably I am speaking at the edge of my comprehension about several things, so take with a grain of salt…
2021-11-23
Lori Shepherd (07:49:58): > The updated meeting invitation has the new google link that nitesh has set up for the meeting going forward. Same time slot - just different meeting link
Nitesh Turaga (10:59:51): > AnVIL meeting: https://docs.google.com/document/d/1otCjFNvmXvJUwkgsrZHtwYWYjSCfiOseMuukRPWDcaA/edit#heading=h.9n5waqkr285e
Nitesh Turaga (11:00:19): > Meeting Link: https://meet.google.com/hqg-hysj-ouy
Nitesh Turaga (11:00:27): > Now at 11am Eastern time.
Martin Morgan (12:07:37): > The link Marcel (from Aedin) posted, https://developer.nvidia.com/blog/accelerating-single-cell-genomic-analysis-using-rapids/, on use of GPU for sc analysis is amazing! - Attachment (NVIDIA Developer Blog): Accelerating Single Cell Genomic Analysis using RAPIDS | NVIDIA Developer Blog > The human body is made up of nearly 40 trillion cells, of many different types. Recent advances in experimental biology have made it possible to explore the genetic material of single cells.
2021-11-29
Martin Morgan (15:00:48): > Running > > docker run -it --rm bioconductor/bioconductor_docker:RELEASE_3_14 R --quiet >
> (or launching the 3.14 RStudio container in AnVIL) and > > BiocManager::install("Bioconductor/AnVIL") >
> now gets a nicer, easily updated location for binary packages > > > AnVIL::repository() > [1] "https://bioconductor.org/packages/3.14/container-binaries" >
> This redirects to the Google Cloud Storage location via assets/.htaccess in the bioconductor.org repository / web site (thanks Lori!) > > Unfortunately, the binary packages are very out-of-date > > > AnVIL:::repository_stats() > Bioconductor software packages: 2071 > Binary packages: 2477 > Binary software packages: 1026 > Out-of-date binary software packages: 1013 >
> so only 13 (1026 - 1013) packages would result in binary installations!
Nitesh Turaga (17:50:31): > RedisParam got fixed just a day ahead of Thanksgiving.
Nitesh Turaga (17:50:36): > I’ll work on updating them this week.
Nitesh Turaga (17:51:03): > I’m out tomorrow and the day after as well, as I’m moving to a new apartment, but I should have it done by end of week.
2021-11-30
Vince Carey (15:17:42): > Any additional updates for 4pm today?
Martin Morgan (15:58:07) (in thread): > not from me
2021-12-07
Nitesh Turaga (10:56:00): > Link to call at 11am today - https://meet.google.com/hqg-hysj-ouy AnVIL meeting notes: https://docs.google.com/document/d/1otCjFNvmXvJUwkgsrZHtwYWYjSCfiOseMuukRPWDcaA/edit#
Nitesh Turaga (12:25:12): > @Sehyun Oh and @Vince Carey I can’t find the workspace ‘SRA in AnVIL via recount3/Bioc’?
Nitesh Turaga (12:25:16): > Can you help me find it?
Nitesh Turaga (12:25:23): > It’s not in the public or featured workspaces.
Sehyun Oh (12:41:23): > I see only this: https://anvil.terra.bio/#workspaces/landmarkanvil2/Bioconductor-Package-SraInAnVIL. If this is what you are looking for, I think you should have access through turaganitesh@gmail.com.
Nitesh Turaga (12:41:45): > can you please add nitesh@ds.dfci.harvard.edu?
Sehyun Oh (12:41:46): > If this is not what you’re looking for, I guess I don’t have access either.
Nitesh Turaga (12:41:56): > Also, isn’t this supposed to be a featured board ?
Nitesh Turaga (12:42:16): > workspace*
Sehyun Oh (12:44:14): > I added that email to Bioconductor_User@firecloud.org, which has access to the above workspace. Somehow, I couldn’t share the workspace directly with you through that email
Sehyun Oh (12:46:30) (in thread): > Not sure. It’s built from AnVILPublish and probably Terra team wants more refined version for the featured workspace I guess.
Vince Carey (12:56:41): > https://anvil.terra.bio/#workspaces/landmarkanvil2/Bioconductor-Package-SraInAnVIL has nitesh and shbrief shared. Please check again
Vince Carey (12:56:54): > it is not supposed to be featured
Nitesh Turaga (12:56:58): > got it.
Nitesh Turaga (12:57:25): > It was written as ‘featured’ on the q3/q4 milestones. I will change that to make sure expectations are set.
2021-12-09
Martin Morgan (07:04:14): > Vince mentioned in a separate slack > > Quitting from lines 101-102 (Seurat.Rmd) > Error in py_module_import(module, convert = convert) : > ImportError: Numba needs NumPy 1.20 or less >
> I guess Seurat won’t play nicely with basilisk? (Is this actually Seurat, or just the name of the Rmd file?) Is this a weakness of basilisk (the package author has to opt in)? > > So is the best alternative (and can it be done in a single R session) to set up a Python virtual environment, ‘pin’ the appropriate packages for Seurat (and other tools with Python dependencies for the Rmd file), and then launch the script? What would that look like (R commands to set up the virtual environment, discover(?) and install dependencies, etc.)?
Vince Carey (09:45:05) (in thread): > Sounds like a project.
Vince Carey (09:46:38) (in thread): > I wonder about Seurat/basilisk interplay – it should be a step forward for Seurat to adopt this approach, but I still am not sure what is leading to the error I identified.
2021-12-10
Alex Mahmoud (10:03:29): > @Alex Mahmoud has joined the channel
2021-12-14
Megha Lal (08:23:18): > @Megha Lal has left the channel
2021-12-15
Martin Morgan (11:38:01): > Looks like the AnVIL RStudio / bioconductor_docker:RELEASE and bioconductor_docker:devel binary repositories are back in business! E.g., > > docker run -it --rm bioconductor/bioconductor_docker:devel R -e "BiocManager::install('Bioconductor/AnVIL'); AnVIL:::repository_stats()" > > ## Bioconductor software packages: 2018 > ## Binary packages: 3505 > ## Binary software packages: 1980 > ## Out-of-date binary software packages: 0 >
> Thanks @Nitesh Turaga!
Nitesh Turaga (11:47:56): > Yep:slightly_smiling_face:
2021-12-21
Nitesh Turaga (09:59:33): > Link to call at 11am today - https://meet.google.com/hqg-hysj-ouy AnVIL meeting notes: https://docs.google.com/document/d/1otCjFNvmXvJUwkgsrZHtwYWYjSCfiOseMuukRPWDcaA/edit# Just wanted to figure out if there will be enough people on the call today, or whether it’s too close to the holidays. Thoughts? > > We could also do a shorter version of the meeting.
2022-01-04
Nitesh Turaga (10:55:11): > Hi <!here>, happy new year! First Bioc - AnVIL meeting of the year. > > Meeting Link: https://meet.google.com/hqg-hysj-ouy Notes are here: https://docs.google.com/document/d/1otCjFNvmXvJUwkgsrZHtwYWYjSCfiOseMuukRPWDcaA/edit#
2022-01-05
Martin Morgan (17:11:55): > The ‘legacy’ notebook in the UI should now point to Bioc 3.13, not Bioc 3.12 - File (PNG): image.png
2022-01-07
Nitesh Turaga (10:52:43): > Weird Jupyter notebook error in AnVIL. > > When I run ?some_function in the R kernel within Jupyter, it seems to give the help page AND an error message. - File (PNG): Screen Shot 2022-01-07 at 10.52.35 AM.png
Martin Morgan (11:15:30): > This seems like an error from https://github.com/IRkernel/repr and should be reported there (unless already fixed?? I’m not sure how to make this reproducible outside the notebook environment).
2022-01-12
Nitesh Turaga (08:26:31): > 3.14 jupyter binaries should work now.
2022-01-18
Nitesh Turaga (10:59:42): > AnVIL meeting today: Meeting Link: https://meet.google.com/hqg-hysj-ouy
Nitesh Turaga (11:01:43): > https://docs.google.com/document/d/1otCjFNvmXvJUwkgsrZHtwYWYjSCfiOseMuukRPWDcaA/edit#
Martin Morgan (12:40:32): > This vignette documents use of the AnVIL package to configure and run workflows, available in the ‘devel’ version of the package. Thanks @Kayla Interdonato for the implementation!
2022-02-01
Nitesh Turaga (10:37:39): > Bioc - AnVIL meeting today at 11:00am
Nitesh Turaga (10:37:43): > https://docs.google.com/document/d/1otCjFNvmXvJUwkgsrZHtwYWYjSCfiOseMuukRPWDcaA/edit#
2022-02-15
Nitesh Turaga (10:59:44): > Hi<!here>, AnVIL meeting today at 11:00 ET (now)
Nitesh Turaga (10:59:48): > https://docs.google.com/document/d/1otCjFNvmXvJUwkgsrZHtwYWYjSCfiOseMuukRPWDcaA/edit#
2022-03-01
Nitesh Turaga (10:59:47): > AnVIL call today at 11:00 ET (now)
Nitesh Turaga (10:59:56): > https://docs.google.com/document/d/1otCjFNvmXvJUwkgsrZHtwYWYjSCfiOseMuukRPWDcaA/edit#
Nitesh Turaga (11:00:09): > https://meet.google.com/hqg-hysj-ouy
2022-03-15
Nitesh Turaga (11:02:03): > Hi all, AnVIL meeting now
Nitesh Turaga (11:02:04): > https://meet.google.com/hqg-hysj-ouy
2022-03-17
Leonardo Collado Torres (16:52:55): > @Leonardo Collado Torres has joined the channel
Nick Eagles (16:53:06): > @Nick Eagles has joined the channel
Leonardo Collado Torres (16:55:41): > Hi everyone! :wave: Nick @Nick Eagles, from my team at LIBD, will help us upload the recount3 data to AnVIL in the near future. We need to get set up and talk to the Broad to get all things squared away. But well, you might see us around here :smiley: If I need to update the recount3 package once the data is at AnVIL, I’ll do that then. Anyways, I just wanted to give you a heads up.
Nitesh Turaga (17:08:36): > Welcome @Leonardo Collado Torres and @Nick Eagles
2022-03-22
Nitesh Turaga (15:11:02): > How does one add an image in the anvil workspace documentation?
Nitesh Turaga (15:12:33): - File (WebM): Screenshare - 2022-03-22 3:12:20 PM.webm
Nitesh Turaga (15:12:47): > I tried markdown and HTML embedding…still breaks.
Marcel Ramos Pérez (15:26:38): > I’ve looked at three workspaces with pictures and it looks like they’re linking images from the cloud, e.g., https://storage.googleapis.com/terra-featured-workspaces/encode-tutorial-2019/Binning_PBS.png
Nitesh Turaga (15:27:00): > hmmm…let me check
Martin Morgan (15:44:43) (in thread): > Yes, I think these are just markdown documents so any markdown-correct embedding will work?
Stephen Mosher (15:48:25): > I’ve done this just last week using this workspace: https://anvil.terra.bio/#workspaces/anvil-datastorage/AnVIL_PRIMED_Spring_Workshop_2022. > > This line captured the image from GitHub nicely. > - Attachment: Attachment
Stephen Mosher (15:49:05): > Grabbing an image from google drive was a bit more finicky
Nitesh Turaga (15:50:28): > Got it.
Nitesh Turaga (15:50:36): > needs a “raw” github image url
Nitesh Turaga (15:51:03): > > ) >
Nitesh Turaga (16:56:30): > This week’s builds: > > > AnVIL:::repository_stats() > Container: bioconductor_docker > Bioconductor version: 3.14 > Bioconductor binary repos: https://bioconductor.org/packages/3.14/container-binaries/bioconductor_docker > Bioconductor software packages: 2054 > Binary packages: 3621 > Binary software packages: 2046 > Missing binary software packages: 8 > canceR gpart MACPET mfa networkBMA PanVizGenerator Rgin SLGI > Out-of-date binary software packages: 0 >
> > > > AnVIL:::repository_stats() > Container: bioconductor_docker > Bioconductor version: 3.15 > Bioconductor binary repos: https://bioconductor.org/packages/3.15/container-binaries/bioconductor_docker > Bioconductor software packages: 2066 > Binary packages: 3610 > Binary software packages: 2052 > Missing binary software packages: 14 > canceR ChIPQC coMethDMR DiffBind EnMCB fishpond gpart MACPET mfa > networkBMA Rgin systemPipeR tricycle vulcan > Out-of-date binary software packages: 13 > atena benchdamic BiocParallel DEGreport MoonlightR msa MungeSumstats > mzR pengls rawrr recount3 splatter zellkonverter >
2022-03-23
Vince Carey (04:53:09) (in thread): > a) is this documented? It always seemed to me that one should be able to put an image resource in the workspace that could be referenced in the description markdown > b) whatever the case, it seems appropriate for AnVIL administrators to provide a globally maintained repository of images that we can add to via pull request, and refer to in our descriptions. @Stephen Mosher can you raise this as a PM concern?
Vince Carey (04:54:02) (in thread): > the broadinstitute/github.io path that you used in AnVIL_PRIMED seems a candidate for a global repository
Vince Carey (04:55:52) (in thread): > I have asked Steve Mosher to elevate this concern to PMs … it seems really important for workspace durability that authors not have to create their own cloud assets to manage images to which descriptions refer. but i have done exactly this. if the images cannot be bound to the workspace, some other durable solution must be available.
Vince Carey (04:56:43) (in thread): > https://console.cloud.google.com/storage/browser/bioc-anvil-images;tab=objects?authuser=0&project=biocbbs2020&prefix=&forceOnObjectsSortingFiltering=false
Vince Carey (04:57:55) (in thread): > if you can’t resolve that link i am sorry. but it points to storage with many pngs used in workspace descriptions relevant to bioconductor.
Stephen Mosher (10:09:45) (in thread): > a) I am unaware of any documentation. I simply pressed the “insert image” button in the “ABOUT THE WORKSPACE” editor and pasted a url, which worked. > b) Will look into this.
Vince Carey (12:24:58) (in thread): > apparently there is an “add image” control in the workspace details interface that may be helpful. i think i’ve seen it but forgotten about it. maybe you can upload a png to the workspace and reference it using that?
2022-04-12
Nitesh Turaga (11:04:05): > 11:00 AM - 12:00 PM Bioc/AnVIL Today!! Where: https://meet.google.com/hqg-hysj-ouy
2022-04-25
Nitesh Turaga (12:53:44): > Hi @Sehyun Oh and @Levi Waldron, I was hoping I could get some input from you guys for the AnVIL RPPR (“Research Performance Progress Report”). > > One of the sections of the report is reserved for training of students and postdocs. Are you able to share a statement of what you’ve been working on with regard to AnVIL training, @Sehyun Oh? (any workshops or workspaces you’ve made available?)
Sehyun Oh (12:55:39): > @Nitesh Turaga Is there any specific time window for the activities to be reported?
Nitesh Turaga (12:55:54): > yes, Wednesday this week.
Nitesh Turaga (12:56:01): > The RPPR goes out on Wednesday.
Sehyun Oh (13:08:32): > Oh… sorry, my question wasn’t clear. Any training activity that happened throughout the whole AnVIL project period? Or just last year? etc.
Nitesh Turaga (13:08:47): > Just the last year would be good.
Nitesh Turaga (13:40:59): > Just give me a sentence or two at a very high level.
Sean Davis (13:49:55): > It might be helpful to capture Bioc-based AnVIL training (and use in education) systematically and even publicly, beyond the RPPR.
Nitesh Turaga (13:54:36): > Yes, I agree. This is a good point, maybe a github repo will suffice.
Levi Waldron (14:26:07): > I added BioC2022, RunTerraWorkflow, and biobakeR, @Sehyun Oh
Sean Davis (14:27:49) (in thread): > There is also https://github.com/anvilproject/anvil-portal, which drives this (and other pages): https://anvilproject.org/learn. - Attachment (The AnVIL): Getting Started with AnVIL > A guided walk-through of the AnVIL / Terra documentation with a focus on onboarding and preparing new users to run genomic analyses in the cloud.
Sehyun Oh (14:28:51): > Thanks @Levi Waldron
2022-04-27
Kozo Nishida (00:12:01) (in thread): > Hi @Nitesh Turaga, is the Bioc/AnVIL meeting held every month? > If so, I would like to add it to the “Bioconductor Events Google calendar”. https://www.bioconductor.org/help/events/
Kozo Nishida (02:25:03): > Hi all, > Does anyone know this error? - File (PNG): image.png
Kozo Nishida (02:26:32): > I want to create a Terra/AnVIL DATA table with the Bioconductor AnVIL package.
Vince Carey (07:21:00): > Hi Kozo – @Martin Morgan is the AnVIL package creator. I think we would need to know more about the specific tasks and authentication processes you’ve used to produce this error.
Vince Carey (07:21:44) (in thread): > the meeting is every other week, we had one yesterday, but i think it is regarded as internal so far
Martin Morgan (07:30:29): > Use avtable() to create a table like the ones under TABLES in the attached screenshot. Use avdata() to access the ‘REFERENCE DATA’ and ‘OTHER DATA/Workspace Data’. There are examples in the vignette, e.g., Using avtable*()… - File (PNG): Screen Shot 2022-04-27 at 7.28.10 AM.png
Martin Morgan (07:34:15): > But the error and warnings seem unusual. I’m assuming you are using the standard RStudio runtime. You should have permission to create /home/rstudio/.config/R. I opened the RStudio ‘Terminal’ and did the following: > > rstudio@e43ebdeb967c:~$ pwd > /home/rstudio > rstudio@e43ebdeb967c:~$ ls -al ~/.config > total 16 > drwxr-xr-x 4 rstudio users 4096 Apr 26 18:38 . > drwxrwxrwx 10 root root 4096 Apr 26 20:14 .. > drwxr-xr-x 4 rstudio users 4096 Apr 26 19:50 gcloud > drwxr-xr-x 3 rstudio users 4096 Apr 19 18:19 rstudio > rstudio@e43ebdeb967c:~$ whoami > rstudio > > Do you see something similar? > > Also, does AnVIL::gcloud_account() report something that starts with pet-...@...?
Kozo Nishida (19:43:28) (in thread): > Thank you for the information. I got it. > Then I won’t try to add it to the calendar.
Kozo Nishida (19:52:02): > > Do you see something similar? > ~Yes.~ No, in my environment the owner of the rstudio directory is root instead of rstudio… - File (PNG): image.png
Kozo Nishida (19:57:17): > > Also, does AnVIL::gcloud_account() report something that starts with pet-...@...? > No, it also fails like this. > The reason is that the owner of ~/.config is not rstudio, the same problem as before. - File (PNG): image.png
Kozo Nishida (19:58:45): - File (PNG): image.png
Kozo Nishida (21:29:51): > @Martin Morgan Let me know if you have any ideas on what to check next.
2022-04-28
Martin Morgan (09:51:52): > @Nitesh Turaga were there any changes on our end that would cause the owner of /home/rstudio/.config to change from ‘rstudio’ to ‘root’?
Nitesh Turaga (09:54:02): > Hmm… Well, this came with the AnVIL early on: > > # add rstudio user to users group > RUN usermod -g users rstudio \ > && useradd -m -s /bin/bash -N -u 1001 welder-user > > rstudio seems to be only in the ‘users’ group.
Nitesh Turaga (09:54:13): > https://github.com/anvilproject/anvil-docker/blob/master/anvil-rstudio-bioconductor/Dockerfile#L18
Nitesh Turaga (09:54:38): > What is in that .config file?
Martin Morgan (10:00:11): > > > dir("~/.config/", recursive = TRUE) > [1] "gcloud/access_tokens.db" > [2] "gcloud/active_config" > [3] "gcloud/config_sentinel" > [4] "gcloud/configurations/config_default" > [5] "gcloud/credentials.db" > [6] "gcloud/gce" > ... > [26] "rstudio/rstudio-prefs.json" >
> but it’s the default location for ‘config’ files, probably used by many applications > > > tools::R_user_dir("my_package", "config") > [1] "/home/rstudio/.config/R/my_package" >
Nitesh Turaga (10:01:40): > let me see where that .config dir is coming from
Martin Morgan (10:13:58): > I’ll take this over to the AnVIL Slack… it’s either an unintentional change or breaks things in a big way…
Kozo Nishida (10:33:37): > Let me know if you would like me to provide some information as well.
Martin Morgan (10:36:38) (in thread): > Thanks, I can reproduce this if I create a new cloud environment so should be able to work this out; I’ll let you know when things are resolved.
2022-04-29
Martin Morgan (13:36:39) (in thread): > Hi @Kozo Nishida, this issue seems to have been resolved, but you need to re-create the cloud environment, including with a new persistent disk. > > This means that any work you have done will be lost. If it’s important, you can, from the command-line terminal in RStudio, create a temporary location for the google cloud configuration files > > $ export CLOUDSDK_CONFIG=/tmp/my_config > > and then use command-line tools to back up to, e.g., the workspace bucket (the bucket is available in the workspace dashboard); you would use something like gsutil cp <local files> gs://<bucket>/path. I can help you with this if it is important.
2022-05-02
Kozo Nishida (14:09:38) (in thread): > @Martin Morgan Sorry for my late reply… > > this issue seems to have been resolved, but you need to re-create the cloud environment, including with a new persistent disk. > Thank you for your help. > Now avtables() works well. > > This means that any work you have done will be lost. If it’s important, you can, from the command-line terminal in RStudio, create a temporary location for the google cloud configuration files > My old workspace didn’t have any important data. > Don’t be concerned about that. :slightly_smiling_face: - File (PNG): image.png
Kozo Nishida (14:16:10): > Is the bioconductor-rpci-anvil/1000G-high-coverage-2019 workspace still in Terra?
Kozo Nishida (14:17:23): > bioconductor-rpci-anvil/1000G-high-coverage-2019 is used by the AnVIL package vignette, but I don’t think I can see this workspace.
Nitesh Turaga (14:18:41): > https://anvil.terra.bio/#workspaces/anvil-datastorage/1000G-high-coverage-2019
Kozo Nishida (14:19:06): > Thank you for the information!
2022-05-10
Nitesh Turaga (11:01:03): > Bioc AnVIL meeting today at 11am.
Nitesh Turaga (11:01:12): > https://dfci.zoom.us/j/95268776262?pwd=MDdvbUdjNExIQW5qS3FSMWFxNzdOZz09 Meeting ID: 952 6877 6262 > Passcode: 845028
2022-07-19
Nitesh Turaga (11:04:35): > Bioc AnVIL meeting today at 11am. > > [11:01 AM] https://dfci.zoom.us/j/95268776262?pwd=MDdvbUdjNExIQW5qS3FSMWFxNzdOZz09 Meeting ID: 952 6877 6262 > Passcode: 845028
Nitesh Turaga (11:09:35): > https://docs.google.com/document/d/1iNHxmXLY1KB8VKekZK2WkdbwlsgnL_9DxHIxiEW6pnQ/edit#
2022-09-21
Levi Waldron (17:15:17): > Just FYI - I met folks at WCMC this week, where their hospital is planning to put ~1M digitized H&E pathology slides in Google buckets, and they’d like to figure out how to use Terra/AnVIL workflows for QC/segmentation/feature extraction, make the slides findable, and incorporate them into downstream Bioconductor analysis with matched genomic data. I put them in touch with @Sehyun Oh to talk about workflows, but it seems like a significant enough use case that it could be part of the renewal application either for AnVIL or U24 Cancer Genomics.
Vince Carey (20:34:11): > what is the size overall? private data?
2022-09-22
Marcel Ramos Pérez (11:52:19): > FWIW, in TCGA there are 11,766 SVS files at 12.95 TB
Ludwig Geistlinger (13:21:47) (in thread): > How do you interact with these SVS files? Which tools/libraries/packages do you use?
Marcel Ramos Pérez (13:30:04) (in thread): > I would use GenomicDataCommons, imagemagick, vips, and EBImage, but I don’t know what the latest packages / tools to use are
Ludwig Geistlinger (16:10:56) (in thread): > This vips looks promising as it is based on openslide - is there a vips R package? can’t find one from google …
Marcel Ramos Pérez (16:41:27) (in thread): > Not that I know of. I guess it’d be listed here if it were supported: https://github.com/libvips/libvips
2022-09-27
Sehyun Oh (11:04:57): > Do we have a meeting today?
Andres Wokaty (11:20:27) (in thread): > Yes, i’ll send the link
Andres Wokaty (11:20:45) (in thread): > https://meet.google.com/meq-gtxx-dth
2022-10-10
Sean Davis (12:59:54): > This is a little off the beaten path, but what is the going wisdom about estimating costs for AnVIL usage? I’m giving the advice to “figure out how much storage and compute you need,” but then we need the costs for compute. I know that AnVIL is at some point transitioning off Google to Azure.
Sean Davis (13:01:08): > Answering my own question: https://anvilproject.org/learn/investigators/budget-templates - Attachment (The AnVIL): Preparing a Cloud Cost Budget Justification > An overview of best practices for account setup in AnVIL to effectively track and control cloud costs.
2022-10-11
Stephen Mosher (08:42:38): > Glad you found it, was about to point this out!
2022-10-12
Sean Davis (14:01:41) (in thread): > Hi, @Stephen Mosher. Just a quick follow-up for you or others. Does the AnVIL cost calculator utilize STRIDES discounts? Does the application of STRIDES discounts depend on the researcher and their individual billing account setup, or are they automatically applied prior to billing the user?
2022-10-13
Stephen Mosher (11:14:13) (in thread): > Hi @Sean Davis, this is a great question. The AnVIL Cost Estimator does not include STRIDES discounts. My understanding is that STRIDES discounts are negotiated between STRIDES and Research Institutions. Also, STRIDES Billing Accounts do need to be set up to take advantage of these discounts (via STRIDES or a distributor like Carahsoft et al). Unfortunately, no discounts are applied automatically.
Sean Davis (16:57:19): > Thanks, @Stephen Mosher. That was my understanding, but I didn’t find it “written” anywhere so thought I would ask.
2022-11-08
Sehyun Oh (10:52:44): > I’m not feeling well and won’t make the meeting today. Sorry.
2022-11-22
Levi Waldron (11:14:48): > Sehyun and I are waiting to be let in. I was first using the links attached to the calendar invitation, https://us04web.zoom.us/j/72658474382?pwd=ja6MDTKxUdiNUag6jg8EHMOLTVnIIB.1 and https://meet.google.com/meq-gtxx-dth. Now I’m trying the one noted in this channel, https://meet.google.com/ied-ouvi-sey
Marcel Ramos Pérez (11:16:32) (in thread): > It was cancelled, see here: https://devteam-bioc.slack.com/archives/C024WH42AD7/p1669132935643379
Levi Waldron (11:16:56) (in thread): > Ah OK, thank you Marcel!
Levi Waldron (11:17:36) (in thread): > So many channels:exploding_head:
2022-12-06
Sehyun Oh (11:18:30): > FYI, I’m at the Carpentries instructor training workshop now, so can’t make the meeting today.
2022-12-12
Levi Waldron (11:09:25): > When I share access with several users, individually or through a group in AnVIL, is there any way to break down billing costs per user? As far as I know the answer is no, but I wanted to confirm. I am incurring ~$5/day from classroom usage that may be from one user creating an unnecessarily large disk, but as far as I know my only option if I wanted to control this cost, beyond generic pleas to delete workspaces with large disks or that are not in use, would be to disconnect billing for the billing group of all the students.
Alex Mahmoud (13:05:13) (in thread): > Might be too much hassle, but in theory, iirc, Terra labels resources with some workspace/user information at creation time. Assuming that includes something like the pet service account they create on behalf of a user in your billing project, you might be able to get some information on whether it’s an outlier or everyone adding up to the spend via something like https://cloud.google.com/blog/topics/cost-management/use-labels-to-gain-visibility-into-gcp-resource-usage-and-spending, and then maybe with help from Terra, assuming it is an outlier, you might be able to ask them to remap the service account to a user and give you their email. Requires a lot of cooperation with the Broad, but might be doable if my recollection/assumptions hold - Attachment (Google Cloud Blog): Use labels to gain visibility into GCP resource usage and spending | Google Cloud Blog
2022-12-16
Stephen Mosher (12:05:45) (in thread): > @Levi Waldron - This depends on whether you have shared funding with the students through shared workspaces or shared billing projects. > * If the users are all operating out of their own workspace (funding through shared billing project scenario), that should be discernible because each workspace has its very own GCP billing project. > * However, if students are working from a shared workspace (funding through shared workspace scenario), that’d be more tricky. According to this Terra support article: > > > it is not possible to get cost breakdowns per user for work done in a shared workspace. The most granular cost breakdown of each Google Cloud resource (storage, compute and egress) is per workspace (for workspaces created after September 27, 2021)
Stephen Mosher (12:06:44) (in thread): > Here is an analogy for why you can’t get per-user costs in a shared workspace: > > Billing in Terra works much the same as billing for electricity in a building. The person on the electric bill (Terra billing owner) pays for all the electricity (Google Cloud costs) used over the month by all roommates (collaborators). If one roommate turns up the thermostat and opens the windows (runs a huge analysis), it’s the owner who pays for the extra electricity (Google Cloud compute and storage costs).
Levi Waldron (12:31:09) (in thread): > Thank you for making that so clear Stephen! Unfortunately I did use a shared workspace. Next time I will instead add the class group to a billing project and make it only Reader on the workspace without “Can compute” access, then ask them to start by cloning the workspace.
Levi Waldron (12:56:47) (in thread): > While I am at it - I was trying to delete some old groups but am getting this error, and am not sure what other group it is referring to. I was doing this to make sure these groups no longer have compute or billing access, but in my poking around it looks like billing access can only be provided to individuals (not groups), and I’ve already deleted workspaces that shared compute with these groups, so I think it doesn’t matter anyways. > > Error deleting group. > > > > group AppStatBio2021 cannot be deleted because it is a member of at least 1 other group > > Full error: > > > { > "causes": [], > "message": "group AppStatBio2021 cannot be deleted because it is a member of at least 1 other group", > "source": "sam", > "stackTrace": [], > "statusCode": 409 > } >
Levi Waldron (12:58:22) (in thread): > Just informational I guess, I see this error on the “Spend report” tab of a billing project in current use (the one I was asking about at first) > > No spend data found for billing project BIOS2-F2022 between dates 2022-11-16 and 2022-12-16 > > > Full Error Detail > { > "causes": [], > "message": "no spend data found for billing project BIOS2-F2022 between dates 2022-11-16 and 2022-12-16", > "source": "rawls", > "stackTrace": [], > "statusCode": 404 > } >
2023-01-31
Selvi Guharaj (14:37:04): > @Selvi Guharaj has joined the channel
Selvi Guharaj (15:03:34): > @Selvi Guharaj has left the channel
2023-03-31
Monica Valecha (08:32:48): > @Monica Valecha has joined the channel
2023-05-04
Leopoldo Valiente (16:16:50): > @Leopoldo Valiente has joined the channel
2023-05-18
Oluwafemi Oyedele (05:54:24): > @Oluwafemi Oyedele has joined the channel
2023-05-30
Chiachun Chiu (08:43:54): > @Chiachun Chiu has joined the channel
2023-06-19
Pierre-Paul Axisa (05:08:31): > @Pierre-Paul Axisa has joined the channel
2024-04-11
Sean Davis (22:01:14): > Is anyone aware of whether AnVIL can support datasets that are covered by GDPR? A member of our cancer center wants to use these data: https://ega-archive.org/studies/EGAS00001006494 and our compliance office suggested that we cannot operate on the data locally. - Attachment (ega-archive.org): TRACERx NSCLC - Whole exome multiregion sequencing data - EGA European Genome-Phenome Archive
2024-04-12
Frederick Tan (08:18:49) (in thread): > Ok to cross-post at the-anvil.slack.com? Will report back on what we hear …
2024-04-15
Frederick Tan (13:50:20) (in thread): > @Sean Davis From http://anvil.terra.bio/terms-of-service: > > To the extent you Connect Content in Terra that is protected under GDPR, UK GDPR or the Data Protection Act 2018, you and we agree that the Data Protection Addendum describes how we are processing any applicable personal data included in your Content. In absence of a DPA, you may not use Terra to Connect Content that includes PHI or special category personal data without Deidentifying such data prior to Connecting it to Terra.
Sean Davis (16:26:50) (in thread): > @Frederick Tan that makes sense. Thanks. Should have thought to look there….
2024-07-29
JP Flores (17:07:47): > @JP Flores has joined the channel