Chapter 3 Using version control with GitHub

3.1 Learning Objectives

This chapter will demonstrate how to:
  • Understand that git and GitHub are tools that help your analyses be conducted reproducibly and in an open source manner.
  • Create a GitHub account.
  • Set up a GitHub repository for your analyses.

In the introductory part of this course, we discussed some of the reasons for using GitHub but we didn’t get into version control (i.e. creating versions for managing changes over time) or GitHub’s capabilities much beyond its capacity to store code in a place where others can find it.

In this advanced course, we will dig deeper into Git and GitHub’s capabilities so you can use them to your advantage in your daily work. However, to gain the benefit of these deeper GitHub skills, you will have to form some new habits. Fully embracing the GitHub workflow will make your work more efficient and help you create more transparent and reproducible analyses!

In this chapter we’re going to introduce you to the basic git commands you’ll need, and guide you as we do them together one by one!

3.2 Prerequisites for this chapter

In order to complete this chapter you will need a GitHub account (it’s free). If you do not currently have a GitHub account, we recommend you go through our Intro to Github chapter from the Introduction to Reproducibility course first, then return to this chapter.

Before working on this chapter you should have a basic understanding of the motivation for using git and GitHub, and have a GitHub account. If you are lacking either of these, go to bit.ly/github-rationale.

3.3 Set up a Git Client (GitKraken)

Interaction with git and GitHub can be done completely from the command line, but sometimes this can be harder to keep track of. To help us navigate this, we recommend using a git client. There are a lot of different clients out there, and they are generally free for most situations you will need. In this course, we will take you through how to use GitKraken, one such git client.

GitKraken is a good choice because it has lots of helpful tutorials, it works well, and it’s free for most use cases. If you find GitKraken doesn’t work for you, you can explore other git clients, but this course will use GitKraken throughout.

3.3.1 Install GitKraken

Go here to install GitKraken.

Follow their instructions to sign in with your GitHub account. It will ask you to authorize your GitHub account to connect to GitKraken. Click Authorize.

You may find it helpful to watch GitKraken’s own tutorial (linked below) about how to “git” started, but we will also guide you through each step!

GitHub has a host of terms that can feel like a whole language at first, but we’ll introduce them one at a time. To start, a lot of the GitHub workflow centers around handling copies of your code that are either stored on the internet (are remote) or are stored on your computer (are local).

Remote = GitHub on the internet
Local = What’s on your own computer

A repository, in the case of a data science project, is mostly synonymous with the word “project”. Using GitHub, a project will exist both as a remote repository and a local repository (in other words, it will be on the internet on GitHub and on your computer).

A remote repository is a project that is stored on the internet, for example, at a URL to jhudsl/reproducible-R-example. A local repository is a copy of the project that lives on your computer, for example, at a file path to reproducible-R-example.

Repository = a set of project files that have a location on GitHub

3.4 Get the exercise project files

In this course, you can work on the exercises from your own GitHub repository, but first we will need to set that up. Below are the files you will want to upload to that repository.

Depending on whether you prefer to use R or Python, you can choose to follow this course using one or the other.

Get the Python project example files

Click this link to download.

Now double click your chapter zip file to unzip. (For Windows you may have to follow these instructions.)

Get the R project example files

Click this link to download.

Now double click your chapter zip file to unzip. (For Windows you may have to follow these instructions.)

3.5 Start a GitHub repository

  • Go to GitHub’s main page and sign in with your GitHub account.
  • Follow these instructions to create a repository. As a general, but not absolute, rule, you will want to keep one GitHub repository for one analysis project.
    • Name the repository something that reminds you what it’s related to. For these examples, we’re using repository-name as our placeholder.
    • Choose Public.
    • Choose add a README.
  • Follow these instructions to add all the files that are inside the reproducible-R-example.zip or reproducible-python-example.zip file you downloaded to this new repository.

This new repository you created should look something like this.

3.5.1 git clone

Now you have a repository on GitHub online!

In our daily grind, we will work on this code from our own computer. To set this up, we’ll need to clone the repository. Cloning makes a local copy of the remote project.

clone = to make a remote repository local. In other words, to download an online repository to your computer and keep it linked to the online copy.

To get started, you will need to clone the GitHub repository you created. We will be using this repository for the duration of this course.

It is simple to clone a GitHub repository using GitKraken. First, sign in to GitKraken; under Repository Management > Clone tab, click Clone a repo. Then, choose where you’d like the repository to be on your computer using the Browse button. You will need to Copy + Paste your new repository’s URL (web address) to where it says URL.

Navigate to your repository on GitHub to copy the URL. Copying and pasting is advisable because even a small typo will prevent cloning.

Now you are ready to click Clone the repository! GitKraken will ask if you’d like to Open Now; click that.
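If you are curious what this step looks like on the command line, here is a minimal sketch. So that it can run offline, it first builds a throwaway repository that stands in for the one on GitHub; the names workdir, seed, and remote_url, and the author identity, are assumptions made for illustration. With a real project you would skip the scaffolding and give git clone the URL you copied from GitHub.

```shell
set -eu

# Scaffolding (hypothetical): a throwaway repository that plays the role of
# the repository you created on GitHub, so this sketch needs no network.
workdir="$(mktemp -d)"
git init --quiet "$workdir/seed"
cd "$workdir/seed"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"   # placeholder identity
echo "# repository-name" > README.md
git add README.md
git commit --quiet -m "Add README"
git branch -M main
git clone --quiet --bare "$workdir/seed" "$workdir/repository-name.git"

# Stand-in for the URL you would copy from your repository page on GitHub.
remote_url="$workdir/repository-name.git"

# The actual step: copy the remote repository into a local folder.
cd "$workdir"
git clone --quiet "$remote_url" repository-name
cd repository-name

# The clone remembers where it came from under the name "origin".
git remote -v
```

The last command lists the remote that the local copy is linked to, which is the link GitKraken sets up for you when you clone.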

3.5.2 Create a branch

Handling branches is where you unleash the real benefit of GitHub, but it’s also the confusing part to get the hang of.

branch = a unique working copy of file changes of a GitHub repository. A branch can be local and remote.

Currently, the repository we just made has a main branch. The main branch is the default branch, and it is the one you want to keep most curated, working, and always ready for others to use!

The best way to get a grasp on what the branches represent is to create one and start using it.

In GitKraken we can create a new branch; this will be your working copy. First, click the Branch button. Next, type in a branch name in the box that the cursor is blinking in. In our example, we are calling it a-new-branch. Then click Enter! Now you have a new branch!

Now that we’ve created this new branch, we can do what we like with a-new-branch, knowing that main will remain safe.
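For reference, a rough command line equivalent of creating a branch is sketched below. The throwaway repository and the author identity are assumptions so that the sketch runs on its own; in practice you would already be inside your cloned repository and would only need the last two commands.

```shell
set -eu

# Scaffolding (hypothetical): a throwaway local repository with one commit
# on main. In the course you would already be inside your clone.
repo="$(mktemp -d)/repository-name"
git init --quiet "$repo"
cd "$repo"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"   # placeholder identity
echo "# repository-name" > README.md
git add README.md
git commit --quiet -m "Add README"
git branch -M main

# Create the new branch and switch to it in one step.
git checkout -b a-new-branch

# List local branches; the asterisk marks the branch you are currently on.
git branch
```

Because a-new-branch starts from the same commit as main, the two are identical until you commit something new on a-new-branch.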

Now we can edit our files and code however we normally would. Go ahead and make an edit to any file in your new repository.

Now that we are on our new branch, we can edit a file as we normally would. Make any change to the README.md file in your local repository, and save the changes you make to that file.

If you’ve made a change to any file in your repository, it will appear in GitKraken and you can click on it to see the differences.

To see how the file is different, click on the filename in the right side of the GitKraken screen. To stage the file so that its changes can be committed, click on the Stage File button.

If we want to add these file changes to our current branch, we need to commit them.

To commit files is to include your set of file changes in your current branch. Write a commit message that explains the changes. Now click on the button that says Commit changes to 1 file.

add = to stage your files to be committed to your current branch.
commit = include your set of file changes to your current branch.
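The add and commit steps defined above can also be sketched with git commands. This is an illustration under assumptions: the throwaway repository, the author identity, the line added to README.md, and the commit message are all made up so that the sketch is self-contained.

```shell
set -eu

# Scaffolding (hypothetical): a throwaway repository with a main branch
# and a new branch to work on.
repo="$(mktemp -d)/repository-name"
git init --quiet "$repo"
cd "$repo"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"   # placeholder identity
echo "# repository-name" > README.md
git add README.md
git commit --quiet -m "Add README"
git branch -M main
git checkout --quiet -b a-new-branch

# Edit a file as you normally would (here, appending a line to the README).
echo "A short description of the analysis." >> README.md

# See which files changed, and how they differ from the last commit.
git status --short
git --no-pager diff

# add = stage the change; commit = record it on the current branch.
git add README.md
git commit --quiet -m "Describe the analysis in the README"

# Show the most recent commit on the current branch.
git --no-pager log --oneline -1
```

Staging and committing are separate steps so that you can choose which of your changed files go into each commit.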

Now that we have changes committed to our branch we are ready to add them to the remote, internet copy! To do this, we will need to push our branch.

To push changes on GitHub means to add changes to the remote repository on GitHub.

origin = where your branch is stored on the internet (remotely).
push = to add changes from your branch to its remote counterpart. In other words, put your changes online.

To push means to add changes that are on your new branch to the remote branch (internet version). The word origin just refers to where your branch is stored on the internet. Choose your origin in the dropdown menu and click Submit.
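On the command line, pushing a new branch for the first time might look like the sketch below. To keep it runnable without a network connection, a local bare repository stands in for GitHub; that stand-in, the workdir name, and the author identity are assumptions for illustration only.

```shell
set -eu

# Scaffolding (hypothetical): a bare repository plays the role of GitHub,
# and a local repository with a main branch is linked to it as "origin".
workdir="$(mktemp -d)"
git init --quiet --bare "$workdir/remote.git"
git init --quiet "$workdir/repository-name"
cd "$workdir/repository-name"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"   # placeholder identity
echo "# repository-name" > README.md
git add README.md
git commit --quiet -m "Add README"
git branch -M main
git remote add origin "$workdir/remote.git"
git push --quiet origin main

# A branch with one committed change, ready to be pushed.
git checkout --quiet -b a-new-branch
echo "A short description of the analysis." >> README.md
git commit --quiet -am "Describe the analysis in the README"

# The actual step: send a-new-branch to origin and remember that link,
# so that later pushes from this branch can be a plain "git push".
git push --set-upstream origin a-new-branch

# List the branches that now exist on the remote.
git ls-remote --heads origin
```

After this, the remote holds both main and a-new-branch, and only a-new-branch contains the new commit.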

After a variable number of commits, your branch, called a-new-branch, is a different version of the original code base that may have a nifty improvement to it. But our main goal is to add that nifty improvement to the main branch. To start this process of bringing in new changes to the main curated repository, we will create a pull request.

pull request = a way to propose changes from a branch to be included into the main repository.

From GitHub: “Pull requests let you tell others about changes you’ve pushed to a GitHub repository. Once a pull request is sent, interested parties can review the set of changes, discuss potential modifications, and even push follow-up commits if necessary.”

Pull requests are the meat of how code changes and improvements get reviewed and incorporated! The vast majority of the benefits of incorporating GitHub into your workflow center on fully utilizing the power of pull requests!

A pull request will show us the difference between main and a-new-branch so you can scrutinize this feature before adding it to the main branch.
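If you would like to preview that difference locally before opening a pull request, git can compare the two branches. The sketch below is an approximation of what the pull request page displays, not a description of how GitHub computes it; the throwaway repository, author identity, and file contents are assumptions for illustration.

```shell
set -eu

# Scaffolding (hypothetical): a throwaway repository where a-new-branch
# has one commit that main does not have.
repo="$(mktemp -d)/repository-name"
git init --quiet "$repo"
cd "$repo"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"   # placeholder identity
echo "# repository-name" > README.md
git add README.md
git commit --quiet -m "Add README"
git branch -M main
git checkout --quiet -b a-new-branch
echo "A short description of the analysis." >> README.md
git commit --quiet -am "Describe the analysis in the README"

# Commits that are on a-new-branch but not on main.
git --no-pager log --oneline main..a-new-branch

# File changes made on a-new-branch since it split off from main.
git --no-pager diff main...a-new-branch
```

The two-dot form in git log lists commits, while the three-dot form in git diff compares a-new-branch against the point where it branched from main, so changes made to main in the meantime are left out.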

Now we can open up a pull request from our repository’s page on GitHub.

If we’ve recently pushed our changes, we can go to our repository on GitHub and a yellow banner will prompt us to start a new pull request.

After you click on Compare & pull request you’ll be taken to a screen where you can add information about your changes. After you are done writing your description, click Create Pull Request! (If your pull request description isn’t perfect, don’t worry about it; you can always edit it later.)

Congrats! You’ve just opened a pull request!

In an upcoming chapter we will discuss what information you should put in this pull request description to make it pertinent for yourself and whoever reviews your pull request.

To summarize, below is what this workflow looks like:

An overview of the GitHub workflow:
  • Upload a project to GitHub.
  • Clone that project to your computer. You will only have to do this cloning and set up step once per repository/project.
  • Now let’s say you have an update in mind. Make a new branch to work off of.
  • Edit code as you normally would.
  • Add and commit the file changes to your branch.
  • Push the changes to your remote branch.
  • Repeat these steps until you’ve addressed the update you had in mind.
  • Now, create a pull request.

One more note: if you do want to use the command line or if you want to know more about the specific git commands that GitKraken is doing for you (which might be handy for troubleshooting), the specific commands that can be used or Googled at each step are highlighted in red in the images - you just need to add git before them! For example, you would type git push in your command line in order to push your code. Or if you’d like to know more about pushing code, you can google git push.

A simple illustration of how you can update code and work together on GitHub. After you create and make updates to your new branch, you can push your changes to GitHub. Then, on GitHub, you create a pull request for others to come and review. Upon receiving the pull request, other users can review the changes and commit updates to the main branch.

3.6 More resources for learning GitHub

If you have any feedback on this chapter, please fill out this form. We’d love to hear from you!

References

Bryan, Jenny, and Jim Hester. 2021. “Happy Git and GitHub for the useR.” https://happygitwithr.com/.
“Creating a Pull Request.” 2021. GitHub Docs. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request.
“Introduction to GitHub.” 2021. GitHub Learning Lab. https://lab.github.com/githubtraining/introduction-to-github.
Radigan, Dan. 2021. “Why Code Reviews Matter (and Actually Save Time!).” Atlassian. https://www.atlassian.com/agile/software-development/code-reviews.
Team, The GitHub Training. 2021a. “First Day on GitHub.” GitHub Learning Lab. https://lab.github.com/githubtraining/first-day-on-github.
———. 2021b. “First Week on GitHub.” GitHub Learning Lab. https://lab.github.com/githubtraining/first-week-on-github.
Vickery, Rebecca. 2019. “Introduction to Github for Data Scientists.” Medium. https://towardsdatascience.com/introduction-to-github-for-data-scientists-2cf8b9b25fba.