
Chapter 9 Automation as a reproducibility tool

9.1 Learning Objectives

This chapter will demonstrate how to:

  • Understand how automation aids in reproducibility and efficiency.
  • Learn the basics of GitHub Actions.


We’ve discussed that a reproducible analysis is one that someone else can run and obtain the same result. But what if, before you ask a colleague to spend their time re-running your analysis, a robot re-ran it for you? Robots don’t get tired or have other deadlines to respond to, and they can be set up to re-run your analysis at any time. This is why automation is a powerful tool for reproducibility.

Ruby wants to know if her analysis is reproducible, so she sets up an automation tool to re-run her analysis whenever she pushes changes to it. This robot has a computer for a body and says, 'I will re-run Ruby’s analysis consistently and instantly upon whatever trigger she sets up.'

There are a lot of applications for GitHub Actions (see links at the end of this chapter), but in the context of our R and Python examples, or for scientific notebooks in general, it can be useful to build a GitHub Action that re-runs the notebook every time a pull request is opened.

This is useful because if GitHub Actions cannot re-run the notebook successfully, that is a sign that something is amiss in the changes being made.

9.2 Build an example GitHub Action

9.2.1 Structure of a GitHub Actions file

GitHub Actions are written in a YAML file that you store in a folder called .github/workflows in your GitHub repository.

They have two main parts:

  • the trigger: on:
  • the action: jobs:

The trigger is specified by on: and the action that happens when the trigger is activated is specified by jobs:. Each job can be made up of multiple steps:.

on:
  # Some stuff that specifies when the action should run

jobs:
  # The action that should run

9.2.2 Setting up the trigger

There’s a list of things that happen in GitHub that can be used to trigger a GitHub Action. See the list here. In this case, we will set up a GitHub Action that runs whenever a pull request is opened against the main branch.

on:
  pull_request:
    branches:
      - main

jobs:
  # The action that should run
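Several events can be listed under the same on: key. The sketch below is an illustration and not part of this chapter's example: it assumes you also want the workflow to run on pushes to main, on a weekly schedule, and on demand from the Actions tab. The cron value is an arbitrary choice. Only the trigger part of the file is shown.

```yaml
on:
  # Run when a pull request targets the main branch
  pull_request:
    branches:
      - main
  # Run when commits are pushed to the main branch
  push:
    branches:
      - main
  # Run on a schedule; this cron value (Mondays at 06:00 UTC) is only an example
  schedule:
    - cron: "0 6 * * 1"
  # Allow the workflow to be started manually from the Actions tab
  workflow_dispatch:
```

By default, the pull_request event fires when a pull request is opened, reopened, or updated with new commits, so the check re-runs as the pull request changes.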

9.2.3 Setting up the action

The action part of the GitHub Action can be given a name (here we are calling it name-of-job), and we can use runs-on: to specify the type of machine the job runs on. For this we will use ubuntu-latest, a Linux virtual machine hosted by GitHub.

This simple action will run a bash command echo to say "GitHub action is run!".

on:
  pull_request:
    branches:
      - main

jobs:
  name-of-job:
    runs-on: ubuntu-latest

    steps:
      - name: Run message
        run: echo "GitHub action is run!"
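To connect this back to the notebook use case described at the start of this chapter, here is a minimal sketch of a job with several steps that re-runs a Jupyter notebook on each pull request. It assumes a Python project. The job name rerun-notebook and the file names analysis.ipynb and requirements.txt are hypothetical placeholders, and the action versions and Python version are choices you may need to update for your own repository.

```yaml
on:
  pull_request:
    branches:
      - main

jobs:
  rerun-notebook:
    runs-on: ubuntu-latest

    steps:
      # Copy the repository contents onto the runner
      - name: Check out repository
        uses: actions/checkout@v4

      # Install Python (the version here is an assumption)
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      # requirements.txt is a hypothetical file listing the notebook's packages
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install jupyter nbconvert -r requirements.txt

      # analysis.ipynb is a hypothetical notebook name
      - name: Re-run notebook
        run: jupyter nbconvert --to notebook --execute analysis.ipynb --output rerun-analysis.ipynb
```

If any notebook cell raises an error, the last step exits with a failure and the pull request shows a failed check, which signals that something in the proposed changes needs attention.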

9.3 Exercise: Set up a GitHub action

Use a GitHub Action Template by following these instructions.

You will need to navigate to your own repository to do this.

Tips for developing a GitHub Action:

9.3.1 Resources for setting up your GitHub Actions
