Chapter 7 Using programming platforms on AnVIL

Modules about opening, touring, and closing AnVIL platforms


7.1 Video overview on using Jupyter Notebooks

Here is a video tutorial that describes the basics of using Jupyter Notebook on AnVIL.

7.1.1 Objectives

  • Start compute for your Jupyter environment
  • Create notebook to perform analysis
  • Stop compute to minimize expenses

7.1.2 Slides

The slides for this tutorial are located here.

7.2 Launching Jupyter

AnVIL is versatile and can scale up to very powerful cloud computers. To avoid runaway costs, select a cloud computing environment appropriate to your needs. If you are uncertain, start with the default settings; it is fairly easy to increase your compute resources later, if needed, but harder to scale down.

Note that, in order to use Jupyter, you must have access to a Terra Workspace with permission to compute (i.e. you must be a “Writer” or “Owner” of the Workspace).

  1. Open Terra - use a web browser to go to anvil.terra.bio

  2. Click the triple bar (“hamburger”) icon in the top left corner to open the drop-down menu, then click “Workspaces”.

    Screenshot of Terra drop-down menu.  The "hamburger" button to extend the drop-down menu is highlighted, and the menu item "Workspaces" is highlighted.

  3. Click on the name of your Workspace. You should be routed to a link that looks like: https://anvil.terra.bio/#workspaces/<billing-project>/<workspace-name>.

  4. Click on the cloud icon on the far right to access your Cloud Environment options. If you don’t see this icon, you may need to scroll to the right.

    Screenshot of a Terra Workspace. The cloud icon to create a new cloud environment is highlighted.

  5. In the dialogue box, click the “Settings” button under Jupyter.

    Screenshot of the Cloud Environment Details dialogue box. The Settings button under Jupyter is highlighted.

  6. You will see some configuration options for the Jupyter cloud environment, along with a list of costs, because cloud computing costs a small amount of money to use.

    Screenshot of the Jupyter Cloud Environment dialogue box. The cost to run the environment is highlighted.

  7. Configure any settings you need for your cloud environment. If you are uncertain about what you need, the default configuration is a reasonable, cost-conservative choice. It is fairly easy to increase your compute resources later, if needed, but harder to scale down. Scroll down and click the “CREATE” button when you are satisfied with your setup.

    Screenshot of the Jupyter Cloud Environment dialogue box. The "CREATE" button is highlighted.

  8. The dialogue box will close and you will be returned to your Workspace. You can see the status of your cloud environment by hovering over the Jupyter icon. It will take a few minutes for Terra to request computers and install software.

    Screenshot of a Terra Workspace. The hovertext for the Jupyter icon is highlighted, and indicates that the status of the environment is "Creating".

  9. When your environment is ready, its status will change to “Running”. Click on the “ANALYSES” tab to create or open a Jupyter Notebook.

    Screenshot of a Terra Workspace. The hovertext for the Jupyter icon is highlighted, and indicates that the status of the environment is "Running".  The ANALYSES tab is also highlighted.

  10. From the ANALYSES tab, you can click on the name of an existing Jupyter Notebook to view and launch it, or click the “START” button to create a new Notebook.

    Screenshot of Terra Workspace with the "ANALYSES" tab selected and highlighted.  The page shows a list of Jupyter Notebooks.  The Notebook names and the START button are highlighted.

  11. Clicking on a Notebook name will open a static preview of the Notebook. To edit and run the Notebook, click the “OPEN” button.

    Screenshot of a preview of a Jupyter Notebook in a Terra Workspace.  The "OPEN" button is highlighted.
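The Workspace link pattern from step 3 can also be assembled in code, which may be handy when sharing links with collaborators. The following is a minimal sketch in base R. The helper name workspace_url() and the example billing project and Workspace names are hypothetical; only the URL pattern comes from the steps above.

```r
# Build a Terra Workspace link following the pattern
# https://anvil.terra.bio/#workspaces/<billing-project>/<workspace-name>.
# workspace_url() is a hypothetical helper, not part of any AnVIL package.
workspace_url <- function(billing_project, workspace_name) {
  # Percent-encode each component so spaces and special characters are URL-safe.
  parts <- vapply(
    c(billing_project, workspace_name),
    utils::URLencode,
    character(1),
    reserved = TRUE
  )
  paste0("https://anvil.terra.bio/#workspaces/", parts[[1]], "/", parts[[2]])
}

# Hypothetical billing project and Workspace name, for illustration only.
workspace_url("my-billing-project", "my workspace")
```

Encoding each component separately keeps the slashes of the pattern intact while still handling Workspace names that contain spaces.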

7.3 Video overview on using Galaxy

Here is a video tutorial that describes the basics of using Galaxy on AnVIL.

7.3.1 Objectives

  • Start compute for your Galaxy on AnVIL
  • Run tool to quality control sequencing reads
  • Stop compute to minimize expenses

7.3.2 Slides

The slides for this tutorial are located here.

7.4 Starting Galaxy

Note that, in order to use Galaxy, you must have access to a Terra Workspace with permission to compute (i.e. you must be a “Writer” or “Owner” of the Workspace).

Open your Workspace, and click on the “Environment configuration” button, a cloud icon on the right-hand side of the screen.

Screenshot of the Workspace that points to the Environment configuration button, an icon of a cloud with a lightning bolt.

Under Galaxy, click on “Create new Environment”. Click on “Next” and “Create” to keep all settings as-is. This will take 8-10 minutes.

The button that starts a cloud environment for Galaxy has been highlighted.

Click on “Open Galaxy” when the environment is ready.

The Open Galaxy button is highlighted in the ready environment popup.

7.6 Deleting Galaxy

Once you are done with your activity, you’ll need to shut down your Galaxy cloud environment. This frees up the cloud resources for others and minimizes computing cost. The following steps will delete your work, so make sure you are completely finished at this point. Otherwise, you will have to repeat your work from the previous steps.

Return to AnVIL, and find the Galaxy logo that shows your cloud environment is running. Click on this logo.

Screenshot of the Workspace menu. The currently running Galaxy cloud environment logo on the right sidebar is highlighted.

Next, click on “Settings”. Click on “Delete Environment”.

Screenshot of the cloud environment pop out menu. The "Delete Environment" button is highlighted.

Finally, select “Delete everything, including persistent disk”. Make sure you are done with the activity and then click “Delete”.

Screenshot of the cloud environment pop out menu. The “Delete everything, including persistent disk” radio button has been checked and is highlighted. The “Delete” button is highlighted.

7.7 Video overview on using RStudio

Here is a video tutorial that describes the basics of using RStudio on AnVIL.

7.7.1 Objectives

  • Start compute for your RStudio environment
  • Tour RStudio on AnVIL
  • Stop compute to minimize expenses

7.7.2 Slides

The slides for this tutorial are located here.

7.8 Launching RStudio

AnVIL is versatile and can scale up to very powerful cloud computers. To avoid runaway costs, select a cloud computing environment appropriate to your needs. If you are uncertain, start with the default settings; it is fairly easy to increase your compute resources later, if needed, but harder to scale down.

Note that, in order to use RStudio, you must have access to a Terra Workspace with permission to compute (i.e. you must be a “Writer” or “Owner” of the Workspace).

  1. Open Terra - use a web browser to go to anvil.terra.bio

  2. Click the triple bar (“hamburger”) icon in the top left corner to open the drop-down menu, then click “Workspaces”.

    Screenshot of Terra drop-down menu.  The "hamburger" button to extend the drop-down menu is highlighted, and the menu item "Workspaces" is highlighted.

  3. Click on the name of your Workspace. You should be routed to a link that looks like: https://anvil.terra.bio/#workspaces/<billing-project>/<workspace-name>.

  4. Click on the cloud icon on the far right to access your Cloud Environment options. If you don’t see this icon, you may need to scroll to the right.

    Screenshot of a Terra Workspace. The cloud icon to create a new cloud environment is highlighted.

  5. In the dialogue box, click the “Settings” button under RStudio.

    Screenshot of the Cloud Environment Details dialogue box. The Settings button under RStudio is highlighted.

  6. You will see some configuration options for the RStudio cloud environment, along with a list of costs, because cloud computing costs a small amount of money to use.

    Screenshot of the RStudio Cloud Environment dialogue box. The cost to run the environment is highlighted.

  7. Configure any settings you need for your cloud environment. If you are uncertain about what you need, the default configuration is a reasonable, cost-conservative choice. It is fairly easy to increase your compute resources later, if needed, but harder to scale down. Scroll down and click the “CREATE” button when you are satisfied with your setup.

    Screenshot of the RStudio Cloud Environment dialogue box. The "CREATE" button is highlighted.

  8. The dialogue box will close and you will be returned to your Workspace. You can see the status of your cloud environment by hovering over the RStudio icon. It will take a few minutes for Terra to request computers and install software.

    Screenshot of a Terra Workspace. The hovertext for the RStudio icon is highlighted, and indicates that the status of the environment is "Creating".

  9. When your environment is ready, its status will change to “Running”. Click on the RStudio logo to open a new dialogue box that will let you launch RStudio.

    Screenshot of a Terra Workspace. The hovertext for the RStudio icon is highlighted, and indicates that the status of the environment is "Running".

  10. Click the launch icon to open RStudio. This is also where you can pause, modify, or delete your environment when needed.

    Screenshot of the RStudio Environment Details dialogue box. The "Open" button is highlighted.

  11. You should now see the RStudio interface with information about the version printed to the console.

    Screenshot of the RStudio environment interface.

7.9 Touring RStudio

Next, we will be using RStudio and the package Glimma to create interactive plots. See this vignette for more information.

  1. The Bioconductor team has created a very useful package to programmatically interact with Terra and Google Cloud. Install the AnVIL package. It will make some steps easier as we go along.

    Screenshot of the RStudio environment interface. Code has been typed in the console and is highlighted.

  2. You can now quickly install precompiled binaries using the AnVIL package’s install() function. We will use it to install the Glimma package and the airway package. The airway package provides a SummarizedExperiment object. These data describe an RNA-Seq experiment on four human airway smooth muscle cell lines treated with dexamethasone.

    Note: some packages must instead be installed from the CRAN repository, using the install.packages() function. The examples will show you which install method to use.

    Screenshot of the RStudio environment interface. Code has been typed in the console and is highlighted.

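The console code for the two install steps above appears only in the screenshots. It likely resembles the sketch below, which rests on two assumptions: that BiocManager is used to obtain the AnVIL package, and that your installed AnVIL release still exports install() (newer Bioconductor releases may route this through BiocManager::install() instead). The variable names are illustrative, and the installs run only in an interactive session such as the RStudio console.

```r
# Packages used in this tour (named in the steps above).
tour_packages <- c("Glimma", "airway")

# Work out which of them are not yet installed.
is_installed <- vapply(tour_packages, requireNamespace, logical(1), quietly = TRUE)
missing_packages <- tour_packages[!is_installed]

# Only install from an interactive session (e.g. the RStudio console on AnVIL).
if (interactive() && length(missing_packages) > 0) {
  # Assumption: BiocManager is how the AnVIL package is obtained.
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  if (!requireNamespace("AnVIL", quietly = TRUE)) {
    BiocManager::install("AnVIL")
  }
  # AnVIL::install() fetches precompiled binaries where they are available.
  AnVIL::install(missing_packages)
}
```

Checking for missing packages first means that re-running the code does not reinstall anything that is already present.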
  1. Load the example data.

    Screenshot of the RStudio environment interface. Code has been typed in the console and is highlighted.

  2. The multidimensional scaling (MDS) plot is frequently used to explore differences in samples. When this data is MDS transformed, the first two dimensions explain the greatest variance between samples, and the amount of variance decreases monotonically with increasing dimension. The following code will launch a new window where you can interact with the MDS plot.

    Screenshot of the Glimma popout showing the data in an MDS plot. All data points are blue.

  3. Change the colour_by setting to “groups” so you can easily distinguish between groups. In this data, the “group” is the treatment.

    Screenshot of the Glimma popout showing the data in an MDS plot. Data points are colored blue and orange by group. The colour by dropdown menu on the interactive plot is highlighted.

  4. You can download the interactive html file by clicking on “Save As”.

    Screenshot of the Glimma popout showing the data in an MDS plot. The Save As menu is highlighted.

  5. You can also download plots and other files created directly in RStudio. To download the following plot, click on “Export” and save in your preferred format to the default directory. This saves the file in your cloud environment.

    Screenshot of the RStudio interface. A plot has been created. The Export menu has been highlighted.

  6. You should see the plot in the “Files” pane.

    Screenshot of the RStudio interface. A plot has been created. The saved pdf file is now visible under the "Files" pane.

  7. Select this file and click “More” > “Export”.

    Screenshot of the RStudio interface. A plot has been created. The saved pdf file is now visible under the "Files" pane. The "More" and "Export" menus have been highlighted.

  8. Select “Download” to save the file to your local machine.

    Screenshot of the RStudio interface. The popup to download the selected file has been highlighted.
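The loading and plotting code for the steps above appears only in the screenshots. The sketch below has two parts, both offered as assumptions rather than the tutorial's exact code. The first uses base R's cmdscale() (classical MDS, a stand-in for the method Glimma uses) on simulated data to illustrate the claim that the variance explained shrinks with each successive dimension. The second shows how the airway data might be passed to Glimma::glimmaMDS(); it runs only if Glimma, airway, and SummarizedExperiment are installed. All variable names are illustrative.

```r
# Part 1: base-R illustration on simulated data (not the airway data).
set.seed(1)
toy <- matrix(rnorm(8 * 50), nrow = 8)  # 8 hypothetical samples, 50 features
toy[5:8, ] <- toy[5:8, ] + 2            # shift samples 5-8 to mimic a treatment effect

# Classical MDS on Euclidean distances between samples; eig = TRUE keeps the eigenvalues.
fit <- cmdscale(dist(toy), k = 2, eig = TRUE)

# Each positive eigenvalue measures the variance captured by one dimension.
positive <- fit$eig[fit$eig > 0]
variance_share <- positive / sum(positive)
round(variance_share, 3)  # largest share first, decreasing afterwards

# Part 2: the interactive plot, assuming the tour packages are installed.
needed <- c("Glimma", "airway", "SummarizedExperiment")
if (all(vapply(needed, requireNamespace, logical(1), quietly = TRUE))) {
  data("airway", package = "airway")                       # load the example data
  counts <- SummarizedExperiment::assay(airway)            # gene-by-sample count matrix
  treatment <- SummarizedExperiment::colData(airway)$dex   # dexamethasone treatment labels
  mds_widget <- Glimma::glimmaMDS(counts, groups = treatment)
  if (interactive()) print(mds_widget)                     # opens the interactive plot
}
```

Passing the treatment labels as groups is what makes the “groups” option meaningful in the colour_by menu.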

7.10 Pausing RStudio

  1. You can view costs and make changes to your cloud environments from the panel on the far right of the page. If you don’t see this panel, you may need to scroll to the right. Running environments will have a green dot, and paused environments will have an orange dot.

    Screenshot of the RStudio interface. The cloud environment panel on the right is highlighted.

  2. Hovering over the RStudio icon will show you the costs associated with your RStudio environment. Click on the RStudio icon to open the cloud environment settings.

    Screenshot of the cloud environment panel.  The RStudio icon is highlighted.

  3. Click the Pause button to pause RStudio. This will take a few minutes.

    Screenshot of the RStudio cloud environment settings. The Pause button is highlighted.

  4. When the environment is paused, an orange dot will be displayed next to the RStudio icon. If you hover over the icon, you will see that it is paused, and has a small ongoing cost as long as it is paused. When you’re ready to resume working, you can do so by clicking the RStudio icon and clicking Resume.

    Screenshot of a Terra Workspace Dashboard. The RStudio icon in the far right panel is highlighted.  It has an orange dot next to it indicating the cloud environment is paused.

  5. The right-hand side icon reminds you that you are accruing cloud computing costs. If you don’t see this icon, you may need to scroll to the right.

    Screenshot of the RStudio interface. The icon on the right showing that the cloud environment is running is highlighted.

  6. You should minimize charges when you are not performing an analysis. You can do this by clicking on the RStudio icon and selecting “Pause”. This will release the CPU and memory resources for other people to use. Note that your work will be saved in the environment and continue to accrue a very small cost. This work will be lost if the cloud environment gets deleted. If there is anything you would like to save permanently, it’s a good idea to copy it from your compute environment to another location, such as the Workspace bucket, GitHub, or your local machine, depending on your needs.

    Screenshot of the RStudio menu. The pause button which stops the cloud environment is highlighted.

You can also pause your cloud environment(s) at https://anvil.terra.bio/#clusters.

7.11 Deleting RStudio

  1. Pausing your cloud environment only temporarily stops your work. When you are ready to delete the cloud environment, click on the RStudio icon on the right-hand side and select “Settings”. If you don’t see this icon, you may need to scroll to the right.

    Screenshot of the Workspace page. The RStudio icon associated with the cloud environment is highlighted. The Settings button is also highlighted.

  2. Click on “Delete Environment”.

    Screenshot of the cloud environment popout. "Delete environment" is highlighted.

  3. If you are certain that you do not need the data and configuration on your disk, you should select “Delete everything, including persistent disk”. If there is anything you would like to save, open the compute environment and copy the file(s) from your compute environment to another location, such as the Workspace bucket, GitHub, or your local machine, depending on your needs.

    Screenshot of the cloud environment popout. "Delete everything, including persistent disk" is highlighted.

  4. Select “DELETE”.

    Screenshot of the cloud environment popout. "Delete" is highlighted.

You can also delete your cloud environment(s) and disk storage at https://anvil.terra.bio/#clusters.
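Copying a file to the Workspace bucket before deleting an environment, as recommended in step 3, can be done from the R console. This is a sketch under assumptions: the helper bucket_destination(), the results folder, and the file name mds_plot.pdf are hypothetical, and the copy relies on avbucket() and gsutil_cp(), which older releases of the AnVIL package export but which newer Bioconductor versions may have moved to a companion package.

```r
# Hypothetical helper: build the destination URI for a file inside a bucket.
bucket_destination <- function(bucket, local_path, folder = "results") {
  bucket <- sub("/+$", "", bucket)  # drop any trailing slash
  paste(bucket, folder, basename(local_path), sep = "/")
}

# Illustration with a made-up bucket name.
bucket_destination("gs://fc-example-bucket", "mds_plot.pdf")

# On AnVIL, copy a real file only if it exists and the AnVIL package is installed.
local_file <- "mds_plot.pdf"  # hypothetical file saved from the Export menu
if (file.exists(local_file) && requireNamespace("AnVIL", quietly = TRUE)) {
  # avbucket() returns the gs:// URI of the current Workspace bucket.
  AnVIL::gsutil_cp(local_file, bucket_destination(AnVIL::avbucket(), local_file))
}
```

Files copied to the Workspace bucket are stored separately from the cloud environment, so they remain available after the environment and its persistent disk are deleted.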

7.12 Pausing vs. Deleting cloud environments

These instructions can be customized to a specific cloud environment by setting AnVIL_module_settings$cloud_environment before running cow::borrow_chapter(). If this variable has not been set, the text defaults to “your cloud environment”.

7.12.1 Generic cloud environment

Cloud computing costs are based on the amount of time you use the computing resources, so it’s important to clean up after yourself when you’re done, and not just leave the computers running.

There are two ways to “shut down” your cloud environment on AnVIL:

  • Pause the environment: This will save a copy of your work, and then release the computers for other people to use. Do this if you plan to continue working in your cloud environment.
    • It’s similar to turning off your computer or phone - when you start it back up, everything will be where you left it.
    • This still costs a small amount of money, but much less than leaving the computer running.
  • Delete the environment: This will delete everything and then release the computers for other people to use. Do this if you are completely finished working, or if your future work will be in a new environment.
    • It’s similar to throwing your computer or phone in the trash!
    • You will not be able to recover your work.
    • Make sure you have saved anything you need to another location (such as the Workspace bucket, GitHub, or your local machine) before you delete your environment.
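To make the trade-off concrete, the sketch below compares what a month of running, pausing, or deleting might cost. All rates are made-up placeholders, not AnVIL or Google Cloud prices; substitute the figures shown in your own Cloud Environment dialogue box. It also assumes that a paused environment is billed at a flat hourly rate.

```r
# Hypothetical hourly rates in US dollars; replace with the values Terra shows you.
running_rate <- 0.20  # compute and disk while the environment is running
paused_rate <- 0.01   # disk storage only while the environment is paused

hours_in_month <- 24 * 30
hours_worked <- 40  # hours you actually spend analysing data

monthly_cost <- c(
  always_running = running_rate * hours_in_month,
  pause_when_idle = running_rate * hours_worked +
    paused_rate * (hours_in_month - hours_worked),
  delete_when_idle = running_rate * hours_worked
)
round(monthly_cost, 2)
```

Under these placeholder rates, leaving the environment running costs far more than pausing it. Deleting is cheapest, but it discards anything not copied elsewhere.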

7.12.2 RStudio

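# Tell the borrowed text which cloud environment to name.
AnVIL_module_settings <- list(cloud_environment = "RStudio")
# Pull the shared pause-vs-delete text from the template repository.
cow::borrow_chapter(
  doc_path = "child/_child_cloud_environment_pause_vs_delete.Rmd",
  repo_name = "jhudsl/AnVIL_Template"
)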

Cloud computing costs are based on the amount of time you use the computing resources, so it’s important to clean up after yourself when you’re done, and not just leave the computers running.

There are two ways to “shut down” RStudio on AnVIL:

  • Pause the environment: This will save a copy of your work, and then release the computers for other people to use. Do this if you plan to continue working in RStudio.
    • It’s similar to turning off your computer or phone - when you start it back up, everything will be where you left it.
    • This still costs a small amount of money, but much less than leaving the computer running.
  • Delete the environment: This will delete everything and then release the computers for other people to use. Do this if you are completely finished working, or if your future work will be in a new environment.
    • It’s similar to throwing your computer or phone in the trash!
    • You will not be able to recover your work.
    • Make sure you have saved anything you need to another location (such as the Workspace bucket, GitHub, or your local machine) before you delete your environment.