GenomicSuperSignature

AnVILPublish

Package installation

If necessary, install the AnVILPublish library.

if (!"AnVILPublish" %in% rownames(installed.packages()))
    AnVIL::install("AnVILPublish")

There are only a small number of functions in the package; it is likely best practice to invoke these using AnVILPublish::...() rather than attaching the package to the search path.

The gcloud SDK

It is necessary to have the gcloud SDK available to copy notebook files to the workspace. Test availability with

AnVIL::gcloud_exists()
AnVIL::gcloud_account("shbrief@gmail.com")  # UPDATE WITH YOUR EMAIL!!!
AnVIL::gcloud_project("bioc2021-workshop")

and verify that the account and project are appropriate (consistent with AnVIL credentials) for use with AnVIL.

AnVIL::gcloud_account()
AnVIL::gcloud_project()

Note that these be used to set, as well as interrogate, the account and project.

notedown software

Conversion of .Rmd vignettes to .ipynb notebooks uses notedown python software. It must be available from within R, e.g.,

system2("notedown", "--version")

Workspace from package source

AnVILPublish::as_workspace(
    "path/to/package",
    "billing-project-name",     # i.e., billing account
    create = TRUE               # use update = TRUE for an existing workspace
)

For this vignette, I cloned GenomicSuperSignature repo through https.

git clone https://github.com/shbrief/GenomicSuperSignature.git

From the below code, I create this workspace.

AnVILPublish::as_workspace(
    path = "/home/rstudio/GenomicSuperSignature/",   # package source
    namespace = "bioc2021-workshop",   # billing account
    create = TRUE,
    use_readme = TRUE
)

Analysis in RStudio

Clone workspace

We will try GenomicSuperSignature analyses using RStudio in Terra.

First, start with cloning the template workspace.

  1. Login to Terra and navigate to “GenomicSuperSignaturePaper” workspace

  1. Clone this workspace:
  • [Step 5] Assign a unique workspace name
  • [Step 6] Use the billing project, “bioc2021-workshop-rstudio”

Choose cloud environment

You can choose the cloud environment of your workspace at the project level. Currently, to use RStudio in Terra, you should use one of the custom environments. Below screen captures show how to do it.

1 & 2. At the top right, click on “Cloud Environment” and click “Customize”
3 & 4. Under “Application Configuration” choose a community-maintained RStudio environment, “RStudio”. Select CPU 4 for our example. (You can reduce the number of CPUs to 2 and the persistent disk size to 20GB to reduce the costs, unless you need more. You can always increase them later at any time.) Click “Create”.


You should then see an R icon in the top-right hand corner, which starts RStudio in your browser. The first time you do this, or after you haven’t used it for some hours, it will take a minute or two to start up. You won’t have to repeat this unless you want to change your compute resources.

Terra will pause the computing environment automatically after a period of inactivity to avoid unnecessary costs, and you can tune the compute resources to what you need for your analysis. You can upload and download files through RStudio. Your work will remain saved on your persistent disk.

Install packages/ Download data

Once your RStudio is ready, run the start-up script from terminal, This start-up script is written under the DASHBOARD. This will take a few minutes.

R -e 'BiocManager::install("AnVIL");
      fpath <- AnVIL::avbucket(namespace = "bioc2021-workshop-rstudio", name = "GenomicSuperSignaturePaper");
      fnames <- AnVIL::avfiles_ls(namespace = "bioc2021-workshop-rstudio", name = "GenomicSuperSignaturePaper");
      AnVIL::gsutil_cp(file.path(fpath, fnames), ".")'
chmod 711 startup.sh
./startup.sh

rm /home/rstudio/install_R_pkgs.R
rm /home/rstudio/pkgs_to_install.rds
rm /home/rstudio/startup.sh

Microbiome analysis

Runnable workflow package

For R users with the limited computing resources, we introduce RunTerraWorkflow package. This package allows users to run workflows implemented in Terra without writing any workflow, installing software, or managing cloud resources. Terra’s computing resources rely on Google Cloud Platform (GCP) and to use RunTerraWorkflow, you only need to setup the Terra account once at the beginning.

Using AnVIL package, RunTerraWorkflow allows users to access both Terra and GCP through R session from a conventional laptop, greatly lowers the learning curve for high-performance, cloud-based genomics resources.

bioBakery

bioBakery workflows is a collection of workflows and tasks for executing common microbial community analyses using standardized, validated tools and parameters. bioBakery is built and maintained by Huttenhower lab.

You can find the usecase example of running Terra-implemented bioBakery workflow (Whole Metagenome Shotgun (wmgx) workflow version 3) using RunTerraWorkflow package HERE.