Chapter 5 Using Notebooks

5.1 Learning Objectives

This chapter will demonstrate how to:

  • Use an Integrated Development Environment (IDE) to aid your development of code.
  • Understand how notebooks can increase the usability and readability of scientific code.
  • Set up code as a notebook.
  • Use the notebook’s interface to interactively develop code.

Notebooks are a handy way to have the code, output, and scientist’s thought process all documented in one place that is easy for others to read and follow.

The notebook environment is incredibly useful for reproducible data science for a variety of reasons:

5.1.0.1 Reason 1: Notebooks allow for tracking data exploration and encourage the scientist to narrate their thought process:

Ruby is looking at her computer that has a lovely notebook with a heatmap! Ruby says ‘Working from this notebook allows me to interactively develop on my data analysis and write down my thoughts about the process all in one place!’

Each executed code cell is an attempt by the researcher to achieve something and to tease out some insight from the data set. The result is displayed immediately below the code commands, and the researcher can pause and think about the outcome. As code cells can be executed in any order, modified and re-executed as desired, deleted and copied, the notebook is a convenient environment to iteratively explore a complex problem.

(Fangohr 2021)

5.1.0.2 Reason 2: Notebooks allow for easy sharing of results:

Ruby the researcher has her computer showing her notebook. Ruby says ‘Avi, here’s some output from this scientific notebook I’ve been developing from!’ Avi the associate says ‘This is so easy to follow and read, even though I didn’t write the code. Thanks for sharing your exciting results!’

Notebooks can be converted to html and pdf, and then shared as static read-only documents. This is useful to communicate and share a study with colleagues or managers. By adding sufficient explanation, the main story can be understood by the reader, even if they wouldn’t be able to write the code that is embedded in the document.

(Fangohr 2021)

5.1.0.3 Reason 3: Notebooks can be re-run as a script or developed interactively:

Ruby is looking at her computer that has a lovely notebook with a heatmap! Ruby says ‘Yay! I just got the data for 5 more samples. Because of my handy notebook set up, I can easily call one command and re-run the analysis so it is updated with the new samples included!’

A common pattern in science is that a computational recipe is iteratively developed in a notebook. Once this has been found and should be applied to further data sets (or other points in some parameter space), the notebook can be executed like a script, for example by submitting these scripts as batch jobs.

(Fangohr 2021)

This is especially handy if you use automation to enhance the reproducibility of your analyses (something we will talk about in the advanced part of this course).
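To make "executing a notebook like a script" concrete: an .ipynb file is a JSON document whose "cells" list holds markdown and code cells in order. The sketch below uses only Python's standard library to pull out the code cells and run them top to bottom in a single namespace. The function names `notebook_to_script` and `run_notebook_as_script` are hypothetical, and this is a simplified illustration, not how Jupyter itself runs notebooks. For real work, the `jupyter nbconvert` command offers `--to script` to export a notebook's code and `--to notebook --execute` to re-run a notebook and save its outputs.

```python
import json


def notebook_to_script(notebook: dict) -> str:
    """Join the code cells of a parsed .ipynb notebook, in order, into one script string."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells hold narration, not code
        source = cell.get("source", "")
        # In the .ipynb JSON, "source" may be one string or a list of line strings.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip("\n"))
    return "\n\n".join(chunks) + "\n"


def run_notebook_as_script(path: str) -> dict:
    """Run every code cell of the notebook at `path`, top to bottom, in one namespace."""
    with open(path, encoding="utf-8") as handle:
        notebook = json.load(handle)
    namespace = {"__name__": "__main__"}
    # Plain exec(): IPython magics such as %matplotlib would raise a SyntaxError here.
    exec(compile(notebook_to_script(notebook), path, "exec"), namespace)
    return namespace


if __name__ == "__main__":
    demo = {
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["# Count samples\n"]},
            {"cell_type": "code", "metadata": {}, "execution_count": None,
             "outputs": [], "source": ["samples = [3, 5, 8]\n"]},
            {"cell_type": "code", "metadata": {}, "execution_count": None,
             "outputs": [], "source": "total = sum(samples)"},
        ]
    }
    print(notebook_to_script(demo))
```

Because every code cell runs in order from a clean start, re-running this way (or with `nbconvert --execute`) also catches cells that only worked because they were run out of order during interactive development.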

Because of all of these reasons, we encourage the use of computational notebooks as a means of enhancing reproducibility. (This course itself is also written with the use of notebooks!)

5.2 Get the exercise project files (or continue with the files you used in the previous chapter)

Get the Python project example files

Click this link to download.

Now double click your chapter zip file to unzip. For Windows you may have to follow these instructions.

Get the R project example files

Click this link to download.

Now double click your chapter zip file to unzip. For Windows you may have to follow these instructions.

5.3 Exercise: Convert code into a notebook!

5.3.1 Set up your IDE

For this chapter, we will create notebooks from the code in our example files. Notebooks work best with the integrated development environment (IDE) they were created to work with. IDEs are sets of tools that help you develop your code. They are part point-and-click and part command line, and they include lots of visuals that will help guide you.

Set up a Python IDE

Install JupyterLab

  1. We advise installing JupyterLab with conda, because we will return to conda later in this course. If you don’t have conda, you will need to install it first; we advise going with Anaconda rather than Miniconda. To install Anaconda, download the installer from here and follow the installation prompts.

  2. Start up Anaconda Navigator. On the home page, choose JupyterLab and click Install. This may take a few minutes.

  3. Now you should be able to click Launch underneath JupyterLab. This will open a page in your browser with JupyterLab.

Getting familiar with JupyterLab’s interface

The JupyterLab interface consists of a main work area containing tabs of documents and activities, a collapsible left sidebar, and a menu bar. The left sidebar contains a file browser, the list of running kernels and terminals, the command palette, the notebook cell tools inspector, and the tabs list.

The menu bar at the top of JupyterLab has top-level menus that expose actions available in JupyterLab with their keyboard shortcuts. The default menus are:

File: actions related to files and directories
Edit: actions related to editing documents and other activities
View: actions that alter the appearance of JupyterLab
Run: actions for running code in different activities such as notebooks and code consoles
Kernel: actions for managing kernels, which are separate processes for running code
Tabs: a list of the open documents and activities in the dock panel
Settings: common settings and an advanced settings editor
Help: a list of JupyterLab and kernel help links

Set up an R IDE

Install RStudio

  1. Install RStudio (and install R first if you have not already).
  2. After you’ve downloaded the RStudio installation file, double click on it and follow along with the installation prompts.
  3. Open up the RStudio application by double clicking on it.

Getting familiar with RStudio’s interface

The RStudio environment has four main panes, each of which may have a number of tabs that display different information or functionality. (Their specific locations can be changed under Tools > Global Options > Pane Layout.)

  1. The Editor pane is where you can write R scripts and other documents. Each tab here is its own document. This is your text editor, which will allow you to save your R code for future use. Note that changes to code here do not take effect until you run the code.
  2. The Console pane is where you can interactively run R code.
     • There is also a Terminal tab here, which can be used for running programs outside R on your computer.
  3. The Environment pane primarily displays the variables, sometimes known as objects, that are defined during a given R session, and what data or values they might hold.
  4. The Help viewer pane has several tabs, all of which are pretty important:
     • The Files tab shows the structure and contents of files and folders (also known as directories) on your computer.
     • The Plots tab will reveal plots when you make them.
     • The Packages tab shows which installed packages have been loaded into your R session.
     • The Help tab will show the help page when you look up a function.
     • The Viewer tab will reveal compiled R Markdown documents.

From Shapiro et al. (2021)

More reading about RStudio’s interface:

5.3.2 Create a notebook!

Now, in your respective IDE, we’ll turn our unreproducible scripts into notebooks. In the next chapter we will begin to dive into the code itself, but for now, we’ll get the notebook ready to go.

Set up a Python notebook

  1. Start a new notebook by going to New > Notebook.
  2. Then open up this chapter’s example code folder and open the make-heatmap.py file.
  3. Create a new code chunk (Jupyter calls these cells) in your notebook.
     • The Jupyter interface has an ‘add a new chunk’ button, a delete chunk button, and a dropdown menu that allows you to choose the chunk type you’d like to add.
  4. Now copy and paste all of the code from make-heatmap.py into the new chunk. In the next chapter, we will break this large chunk of code into smaller, thematic chunks.
  5. Save your Untitled.ipynb file under a name that describes what it will end up doing, such as make-heatmap.ipynb.

For more about using Jupyter notebooks, see Driscoll (2021).
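If you are curious what the copy-and-paste steps above produce on disk, the sketch below builds the same kind of one-cell notebook programmatically with Python's standard library. It assumes the nbformat 4 JSON layout of .ipynb files and the default `python3` kernel that ships with JupyterLab; the function names `script_to_notebook` and `convert_script` are hypothetical. Opening the resulting file in JupyterLab is the real check that it works.

```python
import json
from pathlib import Path


def script_to_notebook(script_text: str) -> dict:
    """Wrap a whole script in a single code cell of a minimal nbformat 4 notebook."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,  # the cell has not been run yet
                "metadata": {},
                "outputs": [],
                # Keep line endings so each list entry is one line of the script.
                "source": script_text.splitlines(keepends=True),
            }
        ],
        "metadata": {
            # Assumes the default Python kernel that ships with JupyterLab.
            "kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"},
            "language_info": {"name": "python"},
        },
        "nbformat": 4,
        "nbformat_minor": 4,  # minor version 4 does not require per-cell "id" fields
    }


def convert_script(script_path: str, notebook_path: str) -> None:
    """Read a .py file and write it out as a one-cell .ipynb file."""
    text = Path(script_path).read_text(encoding="utf-8")
    notebook = script_to_notebook(text)
    Path(notebook_path).write_text(json.dumps(notebook, indent=1) + "\n", encoding="utf-8")


# Example, run from this chapter's example code folder:
# convert_script("make-heatmap.py", "make-heatmap.ipynb")
```

Putting the whole script in one cell mirrors the exercise; splitting that cell into smaller, thematic cells is the work of the next chapter.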

Set up an R notebook

  1. Start a new notebook by going to File > New File > R Notebook.
  2. Then open up this chapter’s example code folder and open the make_heatmap.R file.
  3. Practice creating a new chunk in your R notebook by choosing Code > Insert Chunk or by pressing Cmd+Option+I (on Mac) or Ctrl+Alt+I (on Windows). (You can also type out the backticks and {r} manually.)
  4. Delete all the default text in this notebook but keep the header, which is surrounded by --- and looks like:

         ---
         title: "R Notebook"
         output: html_notebook
         ---

     Feel free to change the title from R Notebook to something that better suits the contents of this notebook.
  5. Now copy and paste all of the code from make_heatmap.R into a new chunk. In the next chapter, we will break this large chunk of code into smaller, thematic chunks.
  6. Save your untitled.Rmd file under a name that describes what it will end up doing, such as make-heatmap.Rmd.
  7. Notice that when you save your .Rmd file, a new .nb.html file with the same name is created. Open that file and choose View in Web Browser. If RStudio asks you to choose a browser, choose your default browser.
  8. This shows a rendered version of your analysis, with a snapshot of whatever output existed when the .Rmd file was saved.

For more about using R notebooks, see Xie, Allaire, and Grolemund (2018).
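The R notebook you just assembled is plain text: a YAML header between --- lines, followed by chunks that open with three backticks and {r} and close with three backticks. To keep all added examples in one language, the sketch below writes that layout from Python using only the standard library. The function names `script_to_rmd` and `convert_r_script` are hypothetical, and the sketch produces only the .Rmd file; the .nb.html preview is still created by RStudio when you save the notebook there.

```python
from pathlib import Path

# Three backticks, built this way so the chunk fences in the output
# are not confused with the fence around this example.
FENCE = "`" * 3


def script_to_rmd(script_text: str, title: str = "R Notebook") -> str:
    """Return R Markdown notebook text: a YAML header and one R chunk holding the script."""
    header = f'---\ntitle: "{title}"\noutput: html_notebook\n---\n'
    # {{r}} in an f-string produces the literal chunk label {r}.
    chunk = f"{FENCE}{{r}}\n{script_text.rstrip()}\n{FENCE}\n"
    return header + "\n" + chunk


def convert_r_script(script_path: str, rmd_path: str, title: str = "R Notebook") -> None:
    """Read an .R file and write it out as a one-chunk .Rmd notebook."""
    text = Path(script_path).read_text(encoding="utf-8")
    Path(rmd_path).write_text(script_to_rmd(text, title), encoding="utf-8")


# Example, run from this chapter's example code folder:
# convert_r_script("make_heatmap.R", "make-heatmap.Rmd", title="Make heatmap")
```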

Now that you’ve created your notebook, you are ready to start polishing that code!

Any feedback you have regarding this exercise is greatly appreciated; you can fill out this form!

References

Driscoll, Mike. 2021. “Jupyter Notebook: An Introduction.” Real Python. https://realpython.com/jupyter-notebook-introduction/.
Fangohr, Hans. 2021. “Jupyter for Computational Science and Data Science.” Computational Science and Data Science. https://fangohr.github.io/blog/jupyter-for-computational-science-and-data-science.html.
Shapiro, Joshua A., Candace L. Savonen, Allegra G. Hawkins, Chante J. Bethell, Deepashree Venkatesh Prasad, Casey S. Greene, and Jaclyn N. Taroni. 2021. Childhood Cancer Data Lab Training Modules (version 2021-june).
Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman & Hall/CRC. https://bookdown.org/yihui/rmarkdown.