Chapter 9 Automation as a reproducibility tool
9.1 Learning Objectives
We’ve discussed that a reproducible analysis is one that someone else can run and obtain the same result. But what if, before you ask a colleague to spend their time re-running your analysis, you had a robot re-run it? Robots don’t get tired or have other deadlines to respond to, and they can be set up to re-run your analysis at any time. This is why automation is a powerful tool for reproducibility.
There are a lot of applications for GitHub Actions (see links at the end of this chapter), but in the context of our R and Python examples, or for scientific notebooks in general, it can be useful to build a GitHub Action that re-runs the notebook every time a pull request is opened.
This is useful because if the notebook does not re-run successfully in GitHub Actions, that failure can signal that something is amiss in the changes being made.
9.2 Build an example GitHub Actions
9.2.1 Structure of GitHub actions file
GitHub Actions are written in YAML files that you store in a folder called .github/workflows in your GitHub repository.
They have two main parts:
- the trigger:
- the action:
The trigger is specified by on: and the action that happens upon the trigger being activated is specified by jobs:. The job can be made up of multiple steps.
```yaml
on:
  # Some stuff that specifies when the action should run
jobs:
  # The action that should run
```
9.2.2 Setting up the trigger
Many of the things that happen in GitHub can be used to trigger a GitHub Action. See the list here. In this case, we will set up a GitHub Action that runs whenever a pull request is opened that is going to the main branch.
```yaml
on:
  pull_request:
    branches:
      - main
jobs:
  # The action that should run
```
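As a hedged aside on this trigger: to our understanding of the GitHub documentation, a pull_request trigger with no types listed responds not only when a pull request is opened, but also when it is reopened or updated with new commits. The sketch below spells those activity types out. It should behave the same as the shorter trigger and is only meant to make the behaviour visible.

```yaml
on:
  pull_request:
    # Only pull requests that target the main branch
    branches:
      - main
    # Assumption: these three are the default activity types, so listing
    # them changes nothing; remove entries to narrow when the action runs
    types:
      - opened
      - synchronize
      - reopened
```

Here synchronize is the activity type for new commits being pushed to an open pull request.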
9.2.3 Setting up the action
The action part of the GitHub Action can be given a name (here we are calling it name-of-job), and we can use runs-on: to specify the type of machine the job runs on. For this we will use ubuntu-latest, a GitHub-hosted Ubuntu runner.
This simple action will run a bash command, echo, to say "GitHub action is run!".
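```yaml
on:
  pull_request:
    branches:
      - main

jobs:
  name-of-job:
    # The type of machine the job runs on
    runs-on: ubuntu-latest
    # A job is a list of steps that run in order
    steps:
      - name: Run message
        run: echo "GitHub action is run!"
```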
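To connect this back to the notebook use case from the start of the chapter, here is a minimal sketch of a workflow that re-runs a Python notebook on each pull request. It is an illustration under stated assumptions, not the workflow used by the course examples: the file name rerun-notebook.yml, the job name rerun-notebook, the notebook name analysis.ipynb, and the Python version are all hypothetical, and a real notebook would likely need more packages installed.

```yaml
# Hypothetical file: .github/workflows/rerun-notebook.yml
on:
  pull_request:
    branches:
      - main

jobs:
  rerun-notebook:
    runs-on: ubuntu-latest
    steps:
      # Copy the repository contents onto the runner
      - name: Check out repository
        uses: actions/checkout@v4

      # Install Python (3.11 is an arbitrary choice for this sketch)
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      # Assumption: the notebook needs nothing beyond Jupyter itself
      - name: Install dependencies
        run: pip install jupyter nbconvert

      # analysis.ipynb is a placeholder name; the step fails if a cell errors
      - name: Re-run notebook
        run: jupyter nbconvert --to notebook --execute analysis.ipynb --output executed.ipynb
```

If any cell raises an error, the last step should exit with a failure, and the pull request will show a failed check.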
9.3 Exercise: Set up a GitHub action
You will need to navigate to your own repository to do this.
Tips for developing a GitHub Action:
- As you are adding your GitHub actions, consult the GitHub actions log.
- GitHub Actions has pretty great documentation, so as you are setting up your GitHub Actions template, you will want to reference it.
- Be careful with your spacing; YAML is sensitive to indentation, and incorrect spacing will break your GitHub Action.
- Take a look at other GitHub actions that are doing something similar to what you are trying to accomplish.
- For testing purposes, modify the trigger so you can run the action on demand. You may want to use a manual workflow trigger, or use a run: command to give a multi-line command.
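The last tip can be sketched as follows. This is one possible setup, assuming the same job name as the earlier example: workflow_dispatch is the manual trigger, and the | after run: starts a multi-line block in which each line is run as a separate shell command. The commands themselves are placeholders.

```yaml
on:
  # Manual trigger: adds a "Run workflow" button in the Actions tab
  workflow_dispatch:
  pull_request:
    branches:
      - main

jobs:
  name-of-job:
    runs-on: ubuntu-latest
    steps:
      - name: Run several commands
        # The | keeps the line breaks; lines must share the same indentation
        run: |
          echo "GitHub action is run!"
          echo "Running on ref: $GITHUB_REF_NAME"
          ls -la
```

To our understanding, the manual button only appears once the workflow file exists on the repository's default branch, so the pull_request trigger is kept here for testing before merging.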
9.3.1 Resources for setting up your GitHub Actions
- Python example GitHub Actions to re-run notebook
- R example GitHub Actions to re-run notebook
- Great course about GitHub actions
- Introduction to GitHub Actions for data scientists.
If you have any feedback on this chapter, please fill out this form. We’d love to hear it!