Chapter 9 Automation as a reproducibility tool
9.1 Learning Objectives
We’ve discussed that a reproducible analysis is one that someone else can re-run and obtain the same result. But what if, before you ask a colleague to spend their time re-running your analysis, you had a robot re-run it? Robots don’t get tired or have other deadlines to respond to, and they can be set up to re-run your analysis at any time. This is why automation is a powerful tool for reproducibility.
There are a lot of applications for GitHub Actions (see links at the end of this chapter), but in the context of our R and Python examples, or for scientific notebooks in general, it can be useful to build a GitHub Actions workflow that re-runs the notebook every time a pull request is opened.
This is useful because if GitHub Actions cannot re-run the notebook successfully, that is a sign that something is amiss in the changes being made.
9.2 Build an example GitHub Actions
9.2.1 Structure of GitHub actions file
GitHub Actions are written as YAML files that you store in a folder called .github/workflows in your GitHub repository.
They have two main parts:
- the trigger: on:
- the action: jobs:
The trigger is specified by on: and the action that happens when the trigger is activated is specified by jobs:. A job can be made up of multiple steps, listed under steps:.
on:
  # Some stuff that specifies when the action should run
jobs:
  # The action that should run
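As a rough sketch of how these pieces nest, the file below pairs a simple trigger with a job made up of two steps. The file name, the job name (example-job), and the step names are placeholders chosen for illustration; the push trigger and the ubuntu-latest runner are just one reasonable choice.

```yaml
# .github/workflows/example.yml (hypothetical file name)
on: push              # the trigger: run on every push

jobs:
  example-job:        # hypothetical job name
    runs-on: ubuntu-latest
    steps:            # steps run in order, top to bottom
      - name: First step
        run: echo "This runs first"
      - name: Second step
        run: echo "This runs second"
```

Note that each level is indented under the one it belongs to: steps sit inside a job, and jobs sit inside jobs:.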
9.2.2 Setting up the trigger
There’s a list of things that happen in GitHub that can be used to trigger a GitHub Action. See the list here. In this case, we will set up a GitHub Action that runs whenever a pull request targeting the main branch is opened.
on:
  pull_request:
    branches:
      - main
jobs:
  # The action that should run
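By default, a pull_request trigger fires not only when a pull request is opened but also when it is reopened or updated with new commits. If you prefer to be explicit about that, a trigger along these lines should work. The types: list is optional and here simply spells out what we understand to be the defaults, so treat it as an illustration rather than a requirement.

```yaml
on:
  pull_request:
    branches:
      - main
    # Optional: spell out which pull request activity starts a run.
    # These three are the default activity types for pull_request.
    types:
      - opened
      - synchronize   # new commits pushed to the pull request branch
      - reopened
```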
9.2.3 Setting up the action
The action part of the GitHub Action can be given a name (here we are calling it name-of-job), and we can use runs-on: to specify the type of machine (the runner) to run it on. For this we will use the GitHub-hosted runner ubuntu-latest.
This simple action will run the bash command echo to say "GitHub action is run!".
on:
  pull_request:
    branches:
      - main
jobs:
  name-of-job:
    runs-on: ubuntu-latest
    # Steps must be listed under steps: inside the job
    steps:
      - name: Run message
        run: echo "GitHub action is run!"
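To connect this back to the notebook use case from the start of the chapter, here is a minimal sketch of a job that re-runs a Jupyter notebook on each pull request to main. It assumes a Python project with a notebook named analysis.ipynb at the top of the repository (a hypothetical name), and it uses the actions/checkout and actions/setup-python actions at the versions shown. The job name and Python version are also assumptions, and the example workflows linked at the end of this chapter may be set up differently.

```yaml
on:
  pull_request:
    branches:
      - main

jobs:
  rerun-notebook:              # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      # Copy the repository contents onto the runner
      - name: Checkout repository
        uses: actions/checkout@v4

      # Install Python (the version here is an assumption)
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      # Install the tools needed to execute a notebook
      - name: Install dependencies
        run: pip install jupyter nbconvert

      # Execute every cell; the step fails if any cell raises an error
      - name: Re-run notebook
        run: jupyter nbconvert --to notebook --execute analysis.ipynb
```

If any cell raises an error, the last step fails and the pull request shows a failed check, which is the signal that something in the proposed changes broke the analysis.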
9.3 Exercise: Set up a GitHub action
Use a GitHub Action Template by following these instructions.
You will need to navigate to your own repository to do this.
Tips for developing a GitHub Action:
- As you are adding your GitHub Actions, consult the GitHub Actions log.
- GitHub Actions has pretty great documentation, so as you are setting up your GitHub Actions template, you will want to reference it.
- Be careful with your spacing; YAML is indentation-sensitive, so incorrect spacing will break your GitHub Action.
- Take a look at other GitHub Actions that are doing something similar to what you are trying to accomplish.
- For testing purposes, modify the trigger so you can test it. You may want to use a manual workflow trigger, or pull_request: and push:.
- Use | in your run: command to give a multi-line command.
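The last two tips can be sketched together as follows. This is only an illustration: the job and step names are hypothetical, workflow_dispatch: is the key for a manual trigger, and the commands are placeholders for whatever you want to test.

```yaml
on:
  workflow_dispatch:     # manual trigger, started from the Actions tab
  pull_request:
  push:

jobs:
  test-job:              # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      - name: Run several commands
        # The | keeps the line breaks, so each line runs as its own command
        run: |
          echo "First command"
          echo "Second command"
          ls -l
```

With both pull_request: and push: enabled, pushing to a branch that has an open pull request can start the workflow twice, so you may want to narrow the triggers again once testing is done.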
9.3.1 Resources for setting up your GitHub Actions
- Python example GitHub Actions to re-run notebook
- R example GitHub Actions to re-run notebook
- Great course about GitHub actions
- Introduction to GitHub Actions for data scientists.
If you have any feedback on this chapter, please fill out this form. We’d love to hear from you!