Title

Chapter 7 Launching a Docker image

7.1 Learning Objectives

Learning objectives This chapter will demonstrate how to: Understand the goals and motivation of using Docker. Install Docker. Use basic docker commands as a part of a daily usage. Pull and use a docker image for writing code.


In the introductory part of this course, we discussed package managers like renv or conda. Recall that even if you have the same packages installed between two computers, you can still get different results!

Ruby has a particular computing environment she has developed her code from. This computing environment is represented as a bubble above her computer with various hexagons with version numbers as well as Rstudio and R installed on her computer. Her code ran just fine on her particular computing environment. Avi attempted to run Ruby’s code on his very different local computing environment and got an error. His computer runs the same code but came up with a different result!

This is because package versions do influence results as demonstrated by Beaulieu-Jones and Greene (2017). Package managers address part of this problem, however their limitation is that generally only can help with certain sets of packages. conda really only manages conda installed packages and renv doesn’t help with package management outside of R. Both of these have limited capabilities for cross platform shipping.

This is where Docker can help fill in the gaps.

I don’t even count anymore how many times did my code break when someone else run it. The strange part was — it worked on my machine. That’s where Docker saves the day. If it works on your machine, it will work on any.

Radečić (2020)

From Beaulieu-Jones and Casey S. Greene, 2017 A.) The status quo requires a reader or reviewer to find and install specific versions of dependencies. These dependencies can become difficult to find and may become incompatible with newer versions of other software packages. Different versions of packages identify different numbers of significantly differentially expressed genes from the same source code and data. B.) Containers define a computing environment that captures dependencies. In container-based systems, the results are the same regardless of the host system.

7.2 What’s Docker?

One way to ensure that her collaborators have the same computing environment is Ruby could ship her computer to each of her collaborators and have them run the analysis on her computer. But before you buy hundreds of laptops for your projects, we’ll show you how Docker will allow you to send your computing environment to your collaborators in a practical manner.

Ruby has a particular computing environment she has developed her code from but this time Ruby has this computing environment wrapped up in a Docker container. Avi the Associate is able to download the exact computing environment Ruby used for her analysis.

Ruby can create a Docker image that Avi can use to run the analysis. This way Ruby and Avi know they are using the same computing environment. Now if Ruby and Avi obtain different results, it won’t be because of version differences.

Ruby and Avi are using Docker and can ensure that they are using the exact same computing environment as they are working on Ruby’s analysis.

7.3 Install Docker

Go here to install Docker, following the instructions for your particular operating system.

If you don’t have a Docker account create an account when prompted, or go here. After you install Docker, start up Docker desktop by double clicking on the app. It may take some time to start up.

7.4 Getting started with Docker

  1. Open up your command line.
  2. First we need to get the Docker image. A Docker image is like a snapshot of your computing environment that you can move from place to place. We can download images from online and then use them to make a container. Containers are what we use to actually run analyses.

First we need to get the Docker image. A Docker image is like a snapshot of your computing environment that you can move from place to place. We can download images from online and then use them to make a container. Containers are what we use to actually run analyses.

From command line, run one of these commands depending on whether you’d like to use Python or Docker:

To obtain the python docker image
docker pull jhudsl/reproducible-python
To obtain the R docker image
docker pull jhudsl/reproducible-R
  1. Open up the Docker Desktop app. Click on ‘images’ on the left. This shows the images you currently have available on your computer.

Open up Docker desktop. Click on ‘images’ on the left. This shows the images you have available on your computer. If you hover your mouse over one of these images, you will see a Run button appear. Click the Run button to start a container with this image.

  1. Return to your command line. Using cd and ls navigate to your project repository (or whatever files you’d like to be accessible in your development environment) and we can start up a docker container using docker run.
To run the Python docker image
docker run --rm -v $PWD:/home/jovyan/work -e JUPYTER_ENABLE_LAB=yes -p 8787:8787 jhudsl/reproducible-python

Now in your internet browser, go to the address printed out. It should take you to Jupyter Lab. Now you are ready to develop inside a Docker container!

To run the R docker image

But you can change the password to whatever you’d like.

docker run --rm -v $PWD:/home/rstudio -e PASSWORD=password -p 8787:8787 jhudsl/reproducible-r

Now in your internet browser, go to localhost:8787. You should see an RStudio login page.

Login to RStudio. Your username will be rstudio and your password, will be whatever you set your password to be.

Now you are ready to develop inside a Docker container!

To see what containers you have running or to clear out old containers, in Docker Desktop you can go to the Containers/Apps page.

If you navigate to the Containers/Apps page in Docker Desktop, you should see a container in the list. It will be named something randomly (in this case ‘zen_merkle’). From this page you can stop or remove old containers as well as navigate to their browser page.

7.4.1 A Breakdown what these Docker run options are

Docker has super extensive documentation! But to get you started, here’s the highlights for this docker run command:

A breakdown of the docker run command. The remove option (`--rm`) automatically removes the container when docker run exits. Dash v, the volume option is how you specify what files you’d like available in the container and where to find them. In this instance we are using the output of the pwd command (print working directory) so that wherever you run this command, the files in that directory will be available in the container. The part after the colon specifies where these files will be found in the container. Dash e, the environment option is how you can specify any environment variables you will need. In this instance for the rocker image we need to specify a password. Dash p, the port option is how you specify What port you can connect to this on using your internet browser. The last part of the docker run command says what image to run, so in this instance, we are running a container using the jhudsl/reproducible-r image.

  • The remove option (--rm) Automatically removes the container when docker run exits.
  • The volume option (-v) is how you specify what files you’d like available in the container and where to find them. In this instance we are using the output of the pwd command (print working directory) so that wherever you run this command, the files in that directory will be available in the container. The part after the colon specifies where these files will be found in the container.
  • The environment option (-e) is how you can specify any environment variables you will need. In this instance for the rocker image we need to specify a password. but for the python image we needed to specify JUPYTER_ENABLE_LAB=yes so that we can use Jupyter Lab.
  • The port option (-p) is how you specify what port you can connect to this on using your internet browser.
  • The image to use is specified in the last part of the docker run command says what image to run, so in these instances, we are running a container using the jhudsl/reproducible-r or jhudsl/reproducible-python images.

7.5 More about Docker

7.5.0.1 Python specific:

7.5.0.2 R specific:

If you have any feedback on this chapter, please fill out this form, we’d love to hear your feedback!

References

Beaulieu-Jones, Brett K, and Casey S Greene. 2017. “Reproducibility of Computational Workflows Is Automated Using Continuous Analysis.” Nature Biotechnology 35 (4): 342–46. https://doi.org/10.1038/nbt.3780.
“Jupyter Docker Stacks — Docker-Stacks Latest Documentation.” 2018. https://jupyter-docker-stacks.readthedocs.io/en/latest/index.html.
Neuzerling, David. 2018. useR: Getting Started with R and Docker.” https://mdneuzerling.com/post/user-getting-started-with-r-and-docker/.
Okada, Shinichi. 2021. “How to Run Jupyter Notebook on Docker.” Medium. https://towardsdatascience.com/how-to-run-jupyter-notebook-on-docker-7c9748ed209f.
openscilabs. 2021. “Launching RStudio in Docker.” https://jsta.github.io/r-docker-tutorial/02-Launching-Docker.html.
Radečić, Dario. 2020. “Here’s Why You Should Learn Docker as a Data Scientist.” Medium. https://towardsdatascience.com/heres-why-you-should-learn-docker-as-a-data-scientist-c18faf96c946.
Srivastav, Prakhar. 2018. “A Docker Tutorial for Beginners.” A Docker Tutorial for Beginners. https://docker-curriculum.com/.