
Chapter 7 Writing durable code

7.1 Learning Objectives

This chapter will demonstrate how to:

- Convert common code pitfalls into more readable and durable alternatives.
- Write code that is more readable and durable.
- Write code that is more usable by others.
- Understand the importance of following a code style.

7.2 General principles

7.2.0.1 Work on your code iteratively

Getting your code to work the first time is the first step, but don’t stop there! Just as you wouldn’t consider the first draft of a manuscript a final draft, polishing your code works best in an iterative manner. You may need to set it aside for the day to give your brain a rest, but return to your code later with fresh eyes and look for ways to improve it!

7.2.0.2 Prioritize readability over cleverness

Some cleverness in code can be helpful, but too much can make it difficult for others (including your future self!) to understand. If cleverness compromises the readability of your code, it probably is not worth it. Clever but unreadable code won’t be re-used or trusted by others (AGAIN, including your future self!).

What does readable code look like? Orosz (2019) has some thoughts on writing readable code:

Readable code starts with code that you find easy to read. When you finish coding, take a break to clear your mind. Then try to re-read the code, putting yourself in the mindset that you know nothing about the changes and why you made them.

Can you follow along with your code? Do the variables and method names help understand what they do? Are there comments at places where just the code is not enough? Is the style of the code consistent across the changes?

Think about how you could make the code more readable. Perhaps you see some functions that do too many things and are too long. Perhaps you find that renaming a variable would make its purpose clearer. Make changes until you feel like the code is as expressive, concise, and pretty as it can be.

The real test of readable code is others reading it. So get feedback from others, via code reviews. Ask people to share feedback on how clear the code is. Encourage people to ask questions if something does not make sense. Code reviews - especially thorough code reviews - are the best way to get feedback on how good and readable your code is.

Readable code will attract little to no clarifying questions, and reviewers won’t misunderstand it. So pay careful attention to the cases when you realize someone misunderstood the intent of what you wrote or asked a clarifying question. Every question or misunderstanding hints to opportunities to make the code more readable.

A good way to get more feedback on the clarity of your code is to ask for feedback from someone who is not an expert on the codebase you are working on. Ask specifically for feedback on how easy to read your code is. Because this developer is not an expert on the codebase, they’ll focus on how much they can follow your code. Most of the comments they make will be about your code’s readability.

We’ll talk a bit more about code review in an upcoming chapter!

More reading:

7.2.0.3 DRY up your code

DRY is an acronym: “Don’t repeat yourself” (Smith 2013).

“I hate code, and I want as little of it as possible in our product.”

Diederich (2012)

If you find yourself writing something more than once, you might want to write a function or store something as a variable. An added benefit of writing a function is that you might be able to borrow it in another project. DRY code is easier to fix and maintain because if it breaks, it’s easier to fix something in one place than in 10 places.

DRY code is easier on the reviewer because they don’t have to review the same thing twice, but also because they don’t have to review the same thing twice. ;) DRYing code is something that takes some iterative passes and edits through your code, but in the end DRY code saves you and your collaborators time and can be something you reuse again in a future project!

Here’s a slightly modified example from Bernardo (2021) of what DRY vs non-DRY code might look like:

paste('Hello','John', 'welcome to this course')
paste('Hello','Susan', 'welcome to this course')
paste('Hello','Matt', 'welcome to this course')
paste('Hello','Anne', 'welcome to this course')
paste('Hello','Joe', 'welcome to this course')
paste('Hello','Tyson', 'welcome to this course')
paste('Hello','Julia', 'welcome to this course')
paste('Hello','Cathy', 'welcome to this course')

Could be functional-ized and rewritten as:

GreetStudent <- function(name) {
 greeting <- paste('Hello', name, 'welcome to this course')
 return(greeting)
}

class_names <- c('John', 'Susan', 'Matt', 'Anne', 'Joe', 'Tyson', 'Julia', 'Cathy')

lapply(class_names, GreetStudent)

Now, if you wanted to edit the greeting, you’d only need to edit it in the function, instead of in each instance.
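The same refactor carries over to Python. Here is a minimal sketch that mirrors the R example above; the function name greet_student is our own choice for illustration and does not come from Bernardo (2021).

```python
def greet_student(name):
    """Build the welcome greeting for one student."""
    return f"Hello {name} welcome to this course"


class_names = ["John", "Susan", "Matt", "Anne", "Joe", "Tyson", "Julia", "Cathy"]

# Apply the function to every name instead of repeating the greeting
greetings = [greet_student(name) for name in class_names]

for greeting in greetings:
    print(greeting)
```

As in the R version, the wording of the greeting now lives in exactly one place.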

More reading about this idea:

7.2.0.4 Don’t be afraid to delete and refresh a lot

Don’t be afraid to delete it all and re-run (multiple times). This includes refreshing your kernel/session in your IDE.

In essence, this is the data science version of “Have you tried turning it off and then on again?” Some bugs in your code exist or are not realized because old objects and libraries have overstayed their welcome in your environment.

To refresh your kernel in JupyterLab, go to Kernel and choose one of the Restart Kernel options. You can also use the keyboard shortcut of pressing Escape and then pressing 0 twice. In RStudio, go to the dropdown arrow next to Run and choose Restart R and Clear Output. Or you can press the broom, OR you can use the keyboard shortcut of Ctrl and Shift and F10 (for Windows) or Cmd and Shift and F10 (for Mac).

Why do you need to refresh your kernel/session?

As a quick example of why refreshing your kernel/session matters, let’s suppose you are troubleshooting something that centers around an object named some_obj, but then you rename this object to iris_df. When you rename this object, you may need to update the name in other places in the code. If you don’t refresh your environment while working on your code, some_obj will still be in your environment. This will make it more difficult for you to find where else the code needs to be updated.

Refreshing your kernel/session goes beyond objects defined in your environment, and also can affect packages and dependencies loaded or all kinds of other things attached to your kernel/session.

As a quick experiment, try this in your Python or R environment:

The dir() and ls() functions list your defined variables in your Python and R environments respectively.

In Python:

some_obj = []
dir()

Now refresh your Kernel and re-run dir()

dir()

You should see you no longer have some_obj listed as being defined in your environment.

In R:

some_obj <- c()
ls()

Now refresh your session and re-run ls()

ls()

You should see you no longer have some_obj listed as being defined in your environment.
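To see why a lingering object can hide a bug, here is a small Python sketch of the renaming scenario described earlier. Deleting the name with del stands in for what a restart does to your whole environment; the object values are made up for illustration.

```python
# Original object, later renamed to iris_df
some_obj = [5.1, 4.9, 4.7]
iris_df = some_obj

# This line was never updated after the rename, but it still runs
# because the old name is still defined in the environment
stale_total = sum(some_obj)

# Simulate a fresh session, where only the new name would exist
del some_obj

try:
    sum(some_obj)
    forgotten_rename_detected = False
except NameError:
    # In a fresh session the forgotten rename surfaces right away
    forgotten_rename_detected = True

print(forgotten_rename_detected)
```

Without the refresh, the stale line keeps working and the forgotten rename goes unnoticed until someone else runs the code from scratch.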

Keeping around old code and objects is generally more of a hindrance than a time saver. Sometimes it can be easy to get very attached to a chunk of code that took you a long time to troubleshoot but there are three reasons you don’t need to stress about deleting it:

  1. You might write better code on the second try (or third or n’th).
  2. Keeping around old code makes it harder for you to write and troubleshoot new better code – it’s easier to confuse yourself. Sometimes a fresh start can be what you need.
  3. With version control you can always return to that old code! (We’ll dive more into version control later on, but you’ve started the process by uploading your code to GitHub in chapter 4!)

This means you should not comment out old code. Just delete it! No code is so precious that you need to keep it commented out (particularly if you are using version control and you can retrieve it in other ways should you need it).

Related to this, if you want to be certain that your code is reproducible, it’s worth deleting all your output, and re-running everything with a fresh session. The first step to knowing if your analysis is reproducible is seeing if you can repeat it yourself!

7.2.0.5 Use code comments effectively

Good code comments are a part of writing good, readable code! Your code is more likely to stand the test of time for longer if others, including yourself in the future, can see what’s happening enough to trust it themselves. This will encourage others to use your code and help you maintain it!

‘Current You’ who is writing your code may know what is happening but ‘Future You’ will have no idea what ‘Current You’ was thinking (Spielman, n.d.):

‘Future You’ comes into existence about one second after you write code, and has no idea what on earth Past You was thinking. Help out ‘Future You’ by adding lots of comments! ‘Future You’ next week thinks Today You is an idiot, and the only way you can convince ‘Future You’ that Today You is reasonably competent is by adding comments in your code explaining why Today You is actually not so bad.

Your code and your understanding of it will fade soon after you write it, leaving your hard work to deprecate. Code that works is a start, but readable AND working code is best!

Comments can help clarify at points where your code might need further explanation. The best code comments explain the why of what you are doing. The act of writing them can also help you think out your thought process and perhaps identify a better solution to the odd parts of your code.

(From Savonen (2021a))
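As a rough illustration of a "what" comment versus a "why" comment, consider this Python sketch. The values and the log transformation are made up for the example and are not part of the exercise notebook.

```python
import math

expression_values = [0.0, 3.0, 15.0]

# Less helpful: restates what the code already says
# Add 1 and take the log
log_values = [math.log2(value + 1) for value in expression_values]

# More helpful: explains why the code does it
# Add a pseudocount of 1 before the log so that zero counts don't raise an error
log_values = [math.log2(value + 1) for value in expression_values]

print(log_values)
```

Both versions compute the same thing, but only the second comment tells a future reader why the + 1 should not be "cleaned up" later.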

More reading:

7.2.0.6 Use informative variable names

Try to avoid using variable names that have no meaning, like tmp, x, or i. Meaningful variable names make your code more readable! Additionally, variable names that are longer than one letter are much easier to search and replace if needed. One-letter variables are hard to replace and hard to read. Don’t be afraid of long variable names; they are very unlikely to be confused!

1 Write intention-revealing names.
2 Use consistent notation for naming convention.
3 Use standard terms.
4 Do not number a variable name.
5 When you find another way to name variable, refactor as fast as possible.

(Hobert 2018)
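Here is a small Python sketch of the difference informative names can make. Both halves do the same calculation; the names and numbers are invented for illustration.

```python
# Harder to read: the names say nothing about the data
x = [2.5, 3.1, 4.0]
tmp = sum(x) / len(x)

# Easier to read: the names reveal the intention
gene_expression_values = [2.5, 3.1, 4.0]
mean_expression = sum(gene_expression_values) / len(gene_expression_values)

print(tmp == mean_expression)
```

The longer names are also far easier to find with search and replace than a lone x.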

More reading:

7.2.0.7 Follow a code style

When writing doesN”t FoLLOW conv3nTi0Ns OR_sPAcinng 0r sp3llinG, it can be distracting, and the same goes for code. Your code may even work all the same, just like you understood what I wrote in that last sentence, but a lack of consistent style can require more brain power from your readers for them to understand. For reproducibility purposes, readability is important! The easier you can make it on your readers, the more likely they will be able to understand and reproduce the results.

There are different style guides out there that people adhere to. It doesn’t matter so much which one you choose as that you pick one and stick to it for a particular project.

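Python style guides: PEP 8 (“PEP 8 – Style Guide for Python Code” 2021) and Google’s Python style guide (“Styleguide” 2021).

R style guides: Google’s R style guide (“Google’s R Style Guide” 2021) and Hadley Wickham’s style guide (Wickham 2019).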

Although following a style as you write is good practice, we’re all human and that can be tricky to do, so we recommend using an automatic styler to fix up your code for you. For Python code you can use black, and for R, styler.
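To give a feel for what styling changes, here is a Python sketch with the same function written twice. The second version was tidied by hand in the spirit of PEP 8; an automatic styler may make slightly different choices, and the function names are made up for illustration.

```python
# Unstyled: valid Python, but cramped and inconsistent
def  calc_mean( values ):return sum(values)/len(values)
unstyled_result=calc_mean( [1,2,3] )


# Styled: same behavior, consistent spacing and layout
def calculate_mean(values):
    return sum(values) / len(values)


styled_result = calculate_mean([1, 2, 3])

print(unstyled_result == styled_result)
```

Styling does not change what the code does, only how much effort it takes to read.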

7.2.0.8 Organize the structure of your code

Readable code should follow an organized structure. Just like how outlines help the structure of manuscript writing, outlines can also help the organization of code writing.

A tentative outline for a notebook might look like this:

  1. A description of the purpose of the code (in Markdown).
  2. Import the libraries you will need (including sourcing any custom functions).
  3. List any hard-coded variables.
  4. Import data.
  5. Do any data cleaning needed.
  6. The main thing you need to do.
  7. Print out session info.

Note that if your notebook gets too long, you may want to separate out things in their own scripts. Additionally, it’s good practice to keep custom functions in their own file and import them. This allows you to use them elsewhere and also keeps the main part of the analysis cleaner.
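The outline above might translate into a skeleton like the following Python sketch. The data here is a hard-coded stand-in so the example runs on its own; a real notebook would read a file in the import step, and all names are hypothetical.

```python
# Purpose: summarize a small set of measurements (illustrative only)

# Import libraries
import statistics
import sys

# Hard-coded variables
missing_value = None

# Import data (a stand-in list; a real notebook would read a file here)
raw_values = [1.5, None, 2.5, 3.0]

# Data cleaning: drop missing measurements
cleaned_values = [value for value in raw_values if value is not missing_value]

# Main analysis
mean_value = statistics.mean(cleaned_values)
print(mean_value)

# Session info
print(sys.version)
```

Keeping the sections in the same order in every notebook makes it easier for readers to find their way around.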

7.2.0.9 Set the seed if your analysis has randomness involved

If any randomness is involved in your analysis, you will want to set the seed in order for your results to be reproducible.

In brief, computers don’t actually create numbers randomly; they create numbers pseudorandomly. If you want your results to be reproducible, you should give your computer a seed from which to generate its random numbers. This gives anyone who re-runs your analysis a positive control and eliminates randomness as a reason the results were not reproducible.

For more on how setting the seed works – a quick experiment

To illustrate how seeds work, we’ll run a quick experiment with setting the seed here:

First let’s set a seed (it doesn’t matter what number we use, just that we pick a number), so let’s use 1234 and then create a “random” number.

# Set the seed:
set.seed(1234)

# Now create a random number
runif(1)
## [1] 0.1137034

Now if we try a different seed, we will get a different “random” number.

# Set a different seed:
set.seed(4321)

# Now create a random number again
runif(1)
## [1] 0.334778

But, if we return to the original seed we used, 1234, we will get the original “random” number we got.

# Set this back to the original seed
set.seed(1234)

# Now we'll get the same "random" number we got when we set the seed to 1234 previously
runif(1)
## [1] 0.1137034
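The same experiment can be run in Python. This is a minimal sketch using the random module from the standard library; other libraries keep their own random number generators and need their own seeds.

```python
import random

# Set the seed and draw a "random" number
random.seed(1234)
first_draw = random.random()

# A different seed gives a different number
random.seed(4321)
second_draw = random.random()

# Returning to the original seed reproduces the first number
random.seed(1234)
third_draw = random.random()

print(first_draw == third_draw)
print(first_draw == second_draw)
```

The first and third draws should match, because they start from the same seed.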

More reading:

7.2.0.10 To review general principles:

General principles of writing reproducible code. Work on your code iteratively. Prioritize readability over cleverness. DRY up your code. Don't be afraid to delete and refresh. Use code comments effectively. Use informative variable names. Follow a code style.

7.3 More reading on best coding practices

There are so many opinions and strategies on best practices for code. Although a lot of these principles are generally applicable, they are not one size fits all. Some code practices are context-specific, so you may need to pick and choose what works for you, your team, and your particular project.

7.3.0.2 R specific:

7.4 Get the exercise project files (or continue with the files you used in the previous chapter)

Get the Python project example files

Click this link to download.

Now double click your chapter zip file to unzip. For Windows you may have to follow these instructions.

Get the R project example files

Click this link to download.

Now double click your chapter zip file to unzip. For Windows you may have to follow these instructions.

7.5 Exercise 1: Make code more durable!

7.5.1 Organize the big picture of the code

Before diving in line-by-line it can be helpful to make a code-outline of sorts.

What are the main steps you need to accomplish in this notebook? What are the starting and ending points for this particular notebook?

For example, for this make-heatmap notebook we want to:

  1. Set up analysis folders and declare file names.
  2. Install the libraries we need.
  3. Import the gene expression data and metadata.
  4. Filter down the gene expression data to genes of interest – in this instance the most variant ones.
  5. Clean the metadata.
  6. Create an annotated heatmap.
  7. Save the heatmap to a PNG.
  8. Print out the session info!

Python version of the exercise

The exercise: Polishing code

  1. Activate your conda environment using conda activate reproducible-python.
  2. Start up JupyterLab by running jupyter lab from your command line.
  3. Open up the notebook you made in the previous chapter, make-heatmap.ipynb.
  4. Work on organizing the code chunks and adding documentation to reflect the steps we’ve laid out in the previous section. You may want to work on this iteratively as we dive into the code.
  5. As you clean up the code, you should run and re-run chunks to see if they work as you expect. You will also want to refresh your environment to help you develop the code (sometimes older objects stuck in your environment can inhibit your ability to troubleshoot). In Jupyter, you refresh your environment by using the refresh icon in the toolbar or by going to Restart Kernel.

Set the seed

Rationale: The clustering in the analysis involves some randomness. We need to set the seed!

Before:
Nothing! We didn’t set the seed before!

After: You can pick any number; doesn’t have to be 1234.

import random
random.seed(1234)

Use a relative file path

Rationale:
Absolute file paths only work for the original writer of the code and no one else. But if we make the file path relative to the project setup, then it will work for whoever has the project repository (Mustafeez 2021).

Additionally, we can set up our file path names using f-strings so that we only need to change the project ID and the rest will be ready for a new dataset (Python 2021)!

Although this requires more lines of code, this setup is much more flexible and ready for others to use.

Before:

df1=pd.read_csv('~/a/file/path/only/I/have/SRP070849.tsv', sep='\t')
mdf=pd.read_csv('~/a/file/path/only/I/have/SRP070849_metadata.tsv', sep='\t')

After:

from pathlib import Path

import pandas as pd

# Declare project ID (named `project_id` so it doesn't shadow Python's built-in `id()`)
project_id = "SRP070849"

# Define the file path to the data directory
data_dir = Path(f"data/{project_id}")

# Declare the file path to the gene expression matrix file
data_file = data_dir.joinpath(f"{project_id}.tsv")

# Declare the file path to the metadata file
# inside the directory saved as `data_dir`
metadata_file = data_dir.joinpath(f"metadata_{project_id}.tsv")

# Read in metadata TSV file
metadata = pd.read_csv(metadata_file, sep="\t")

# Read in data TSV file
expression_df = pd.read_csv(data_file, sep="\t")
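To show how this setup makes it easy to switch datasets, here is a small sketch that wraps the path building in a function. The helper build_data_paths is hypothetical and not part of the exercise notebook; it only builds paths and does not read any files.

```python
from pathlib import Path


def build_data_paths(project_id):
    """Return relative paths to the expression and metadata files."""
    data_dir = Path("data") / project_id
    return {
        "expression": data_dir / f"{project_id}.tsv",
        "metadata": data_dir / f"metadata_{project_id}.tsv",
    }


paths = build_data_paths("SRP070849")

# The paths are relative, so they resolve from wherever the project lives
print(paths["expression"].as_posix())
print(paths["expression"].is_absolute())
```

Pointing the analysis at a new dataset then means changing only the project ID passed to the function.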

Related readings:

Avoid using mystery numbers

Rationale:
Avoid using numbers that don’t have context around them in the code. Include the calculations for the number, or if it needs to be hard-coded, explain the rationale for that number in the comments. Additionally, using variable and column names that tell you what is happening, helps clarify what the number represents.

Before:

df1['calc'] =df1.var(axis = 1, skipna = True)
df2=df1[df1.calc >float(10)]

After:

# Calculate the variance for each gene
expression_df["variance"] = expression_df.var(axis=1, skipna=True)

# Find the upper quartile (75th percentile) of the variances
upper_quartile = expression_df["variance"].quantile(0.75)

# Filter the data choosing only genes whose variances are in the upper quartile
df_by_var = expression_df[expression_df["variance"] > upper_quartile]

Related readings:
- Stop Using Magic Numbers and Variables in Your Code by Aaberge (2021).
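The same idea can be sketched with only the Python standard library. The gene names and variances below are invented for illustration; the point is that the cutoff is calculated and named rather than typed in as a bare number.

```python
import statistics

# Made-up variances for eight genes
gene_variances = {
    "gene_a": 0.5, "gene_b": 12.0, "gene_c": 3.0, "gene_d": 20.0,
    "gene_e": 1.0, "gene_f": 2.0, "gene_g": 5.0, "gene_h": 8.0,
}

# Calculate the cutoff from the data instead of hard-coding it:
# the last of the three quartile cut points is the upper quartile
upper_quartile_cutoff = statistics.quantiles(list(gene_variances.values()), n=4)[-1]

# Keep only genes whose variance is above the upper quartile
high_variance_genes = [
    gene for gene, variance in gene_variances.items()
    if variance > upper_quartile_cutoff
]

print(high_variance_genes)
```

A reader can now see where the threshold comes from and what it represents, which a bare 10 does not tell them.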

Add checks

Rationale:
Just because your script ran without an error that stopped the script doesn’t mean it is accurate and error free. Silent errors are the most tricky to solve, because you often won’t know that they happened!

A very common error is data that is in the wrong order. In this example we have two data frames that contain information about the same samples. But in the original script, we don’t ever check that the samples are in the same order in the metadata and the gene expression matrix! This is a really easy way to get incorrect results!

Before:
Nothing, we didn’t check for this before.

After:

print(metadata["refinebio_accession_code"].tolist() == expression_df.columns.tolist())
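Printing True or False is easy to overlook in a long notebook. One option is a check that stops the analysis when the order is wrong. Here is a minimal sketch using plain lists; the helper check_sample_order and the sample IDs are hypothetical, and in the notebook you would pass in the accession codes and column names instead.

```python
def check_sample_order(metadata_ids, expression_ids):
    """Raise an error if the two sets of sample IDs differ in content or order."""
    if list(metadata_ids) != list(expression_ids):
        raise ValueError(
            "Sample IDs in the metadata and expression data do not match."
        )


# Made-up sample IDs for illustration
metadata_ids = ["sample_1", "sample_2", "sample_3"]
expression_ids = ["sample_1", "sample_2", "sample_3"]

# Matching order: the check passes silently
check_sample_order(metadata_ids, expression_ids)

# Mismatched order: the check raises an error instead of failing silently
try:
    check_sample_order(metadata_ids, ["sample_2", "sample_1", "sample_3"])
    mismatch_caught = False
except ValueError:
    mismatch_caught = True

print(mismatch_caught)
```

A check that raises an error turns a silent mistake into a loud one, which is much easier to troubleshoot.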

Continue to try to apply the general advice we gave about code to your notebook! Then, when you are ready, take a look at what our “final” version looks like in the example Python repository. (Final here is in quotes because we may continue to make improvements to this notebook too – remember what we said about iterative?)

R version of the exercise

About the tidyverse:

Before we dive into the exercise, a word about the tidyverse. The tidyverse is a highly useful set of packages for creating readable and reproducible data science workflows in R. In general, we will opt for tidyverse approaches in this course, and strongly encourage you to familiarize yourself with the tidyverse if you have not. We will point out some instances where tidyverse functions can help you DRY up your code as well as make it more readable!

More reading on the tidyverse:

The exercise: Polishing code

  1. Open up RStudio.
  2. Open up the notebook you created in the previous chapter.
  3. Now we’ll work on applying the principles from this chapter to the code. We’ll cover some of the points here, but then we encourage you to dig into the fully transformed notebook we will link at the end of this section.
  4. Work on organizing the code chunks and adding documentation to reflect the steps we’ve laid out in the previous section. You may want to work on this iteratively as we dive into the code.
  5. As you clean up the code, you should run and re-run chunks to see if they work as you expect. You will also want to refresh your environment to help you develop the code (sometimes older objects stuck in your environment can inhibit your ability to troubleshoot). In RStudio, you refresh your environment by going to the Run menu and choosing Restart R and Clear Output.

Set the seed

Rationale: The clustering in the analysis involves some randomness. We need to set the seed!

Before:
Nothing! We didn’t set the seed before!

After: You can pick any number; doesn’t have to be 1234.

set.seed(1234)

Get rid of setwd

Rationale:
setwd() almost never works for anyone besides the one person who wrote it. And in a few days/weeks it may not work for them either.

Before:

setwd("Super specific/filepath/that/noone/else/has/")

After:
Now that we are working from a notebook, we know that the default current directory is wherever the notebook is placed (Xie, Dervieux, and Riederer 2020).

Related readings:

Give the variables more informative names

Rationale:
xx doesn’t tell us what is in the data here. Also, by using readr::read_tsv() from the tidyverse, we’ll get a cleaner, faster read and won’t have to specify the sep argument. Note we are also fixing some spacing and using <- so that we can stick to readability conventions.

You’ll notice later

Before:

xx=read.csv("metadata_SRP070849.tsv", sep = "\t")

After:

metadata <- readr::read_tsv("metadata_SRP070849.tsv")

Related readings:

DRYing up data frame manipulations

Rationale:
It can be very tricky to understand what this chunk of code is doing. What is happening with df1 and df2? What’s being filtered out? Code comments would certainly help understanding, but even better, we can DRY this code up and make the code clearer on its own.

Before:
It may be difficult to tell from looking at the before code because there are no comments and it’s a bit tricky to read, but the goal of this is to:

  1. Calculate variances for each row (each row is a gene).
  2. Filter the original gene expression matrix to only genes that have a larger variance (here we arbitrarily use 10 as a filter cutoff).

df=read.csv("SRP070849.tsv", sep="\t")
sums=matrix(nrow = nrow(df), ncol = ncol(df) - 1)
for(i in 1:nrow(sums)) { sums[i, ] <- sum(df[i, -1])
}
df2=df[which(df[, -1] >= 10), ]
variances=matrix(nrow = nrow(dds), ncol = ncol(dds) - 1)
  for(i in 1:nrow(dds)) {
    variances[i, ] <- var(dds[i, -1])
}

After:

Let’s see how we can do this in a DRY’er and clearer way.

We can:
1) Add comments to describe our goals.
2) Use variable names that are more informative.
3) Use the apply functions to do the loop for us – this will eliminate the need for unclear variable i as well.
4) Use the tidyverse to do the filtering for us so we don’t have to rename data frames or store extra versions of df.

Here’s what the above might look like after some refactoring. Hopefully you find this easier to follow, and in total there are fewer lines of code (even with the added comments!).

# Attach magrittr so the %>% pipe is available
library(magrittr)

# Read in data TSV file
expression_df <- readr::read_tsv(data_file) %>%
  # Here we are going to store the gene IDs as row names so that
  # we can have only numeric values to perform calculations on later
  tibble::column_to_rownames("Gene")

# Calculate the variance for each gene
variances <- apply(expression_df, 1, var)

# Determine the upper quartile variance cutoff value
upper_var <- quantile(variances, 0.75)

# Filter the data choosing only genes whose variances are in the upper quartile
df_by_var <- data.frame(expression_df) %>%
  dplyr::filter(variances > upper_var)

Add checks

Rationale: Just because your script ran without an error that stopped the script doesn’t mean it is accurate and error free. Silent errors are the most tricky to solve, because you often won’t know that they happened!

A very common error is data that is in the wrong order. In this example we have two data frames that contain information about the same samples. But in the original script, we don’t ever check that the samples are in the same order in the metadata and the gene expression matrix! This is a really easy way to get incorrect results!

Before:

Nothing... we didn't check for this :(

After:

# Make the data in the order of the metadata
expression_df <- expression_df %>%
  dplyr::select(metadata$refinebio_accession_code)

# Check if this is in the same order
all.equal(colnames(expression_df), metadata$refinebio_accession_code)

Continue to try to apply the general advice we gave about code to your notebook! Then, when you are ready, take a look at what our “final” version looks like in the example R repository. (Final here is in quotes because we may continue to make improvements to this notebook too – remember what we said about iterative?)

Now that we’ve made some nice updates to the code, we are ready to do a bit more polishing by adding more documentation! But before we head to the next chapter, we can style the code we wrote automatically by using automatic code stylers!

7.6 Exercise 2: Style code automatically!

Styling Python code automatically

Run your notebook through black. First you’ll need to install it by running this command in a Terminal window in your JupyterLab.

Make sure you are running this within your conda environment.

conda activate reproducible-python

Now install black.

pip install "black[jupyter]"

To record your conda environment run this command.

conda env export > environment-record.yml

Now you can automatically style your code by running this command from your Terminal (be sure to replace "make-heatmap.ipynb" with whatever you have named your notebook):

python -m black make-heatmap.ipynb

You should get a message that your notebook was styled!

Styling R code automatically

Let’s run your notebook through styler. First you’ll need to install it and add it to your renv.

install.packages("styler")

Then add it to your renv by running:

renv::snapshot()

Now you can automatically style your code by running this command from your Console (be sure to replace "make-heatmap.Rmd" with whatever you have named your notebook):

styler::style_file("make-heatmap.Rmd")

You should get a message that your notebook was styled!

Before you are done with this exercise, there’s one more thing we need to do: upload the latest version to GitHub. Follow these instructions to add the latest version of your notebook to your GitHub repository. Later, we will practice and discuss how to more fully utilize the features of GitHub but for now, just drag and drop it as the instructions linked describe.

Any feedback you have regarding this exercise is greatly appreciated; you can fill out this form!

References

Aaberge, Martin Andersson. 2021. “Stop Using Magic Numbers and Variables in Your Code.” Medium. https://betterprogramming.pub/stop-using-magic-numbers-and-variables-in-your-code-4e86f008b84c.
Bernardo, Ivo. 2021. “Best Practices for R Programming.” Medium. https://towardsdatascience.com/best-practices-for-r-programming-ec0754010b5a.
“Best Practices for Writing R Code – Programming with R.” 2021. https://swcarpentry.github.io/r-novice-inflammation/06-best-practices-R/.
Bryan, Jenny. 2017. “Project-Oriented Workflow.” Tidyverse Blog. https://www.tidyverse.org/blog/2017/12/workflow-vs-script/.
Cannell, Brad. 2021. 9 Coding Best Practices R for Epidemiology. https://brad-cannell.github.io/r4epi/.
Carrie Wright, Stephanie C. Hicks and Roger D. Peng, Shannon E. Ellis. n.d. Chapter 1 Introduction to the Tidyverse Tidyverse Skills for Data Science. Accessed November 2, 2021. http://jhudatascience.org/tidyversecourse/intro.html.
Chang, Winston. 2021. “Generating Random Numbers.” http://www.cookbook-r.com/Numbers/Generating_random_numbers/.
Cronin, Mike. 2019. “What Makes a Good Code Comment?” Medium. https://itnext.io/what-makes-a-good-code-comment-5267debd2c24.
Csendes, Gerold. 2020. “15 Common Coding Mistakes Data Scientist Make in Python (and How to Fix Them).” Medium. https://towardsdatascience.com/15-common-coding-mistakes-data-scientist-make-in-python-and-how-to-fix-them-7760467498af.
Diederich, Jack. 2012. “Stop Writing Classes.” PyVideo.org. https://pyvideo.org/pycon-us-2012/stop-writing-classes.html.
Dubel, Marcin. 2021. “5 Tips for Writing Clean R Code - Leave Your Code Reviewer Commentless.” Appsilon End to End Data Science Solutions. https://appsilon.com/write-clean-r-code/.
Frazee, Alyssa. 2014. “Some Internet Wisdom on R Documentation.” http://alyssafrazee.com/2014/04/20/rdocs.html.
Geeks, Geeks for. 2018. “F-Strings in Python.” GeeksforGeeks. https://www.geeksforgeeks.org/formatted-string-literals-f-strings-python/.
Good, and Nicholas, Rachel Severson. 2021. Chapter 5 Reproducible Research #1 R Programming for Research. https://geanders.github.io/RProgrammingForResearch/reproducible-research-1.html.
“Google’s R Style Guide.” 2021. Styleguide. https://google.github.io/styleguide/Rguide.html.
Heil, Benjamin J. 2020. “Reproducible Programming for Biologists Who Code - Part 2: Should Dos.” AutoBenCoding. https://autobencoder.com/2020-06-30-shoulddo/.
Héroux, Martin. 2018. “Don’t Repeat Yourself: Python Functions.” Scientifically Sound. https://scientificallysound.org/2018/07/19/python-functions/.
Hobert, Kevin. 2018. “Writing Variable – Informative, Descriptive & Elegant.” Medium. https://medium.datadriveninvestor.com/writing-variable-informative-descriptive-elegant-1dd6f3f15db3.
Keeton, BJ. 2019. “How to Comment Your Code Like a Pro: Best Practices and Good Habits.” Elegant Themes. https://www.elegantthemes.com/blog/wordpress/how-to-comment-your-code-like-a-pro-best-practices-and-good-habits.
Klinefelter, Sarah. 2016. “DRY Programming Practices – Metova.” https://metova.com/dry-programming-practices/.
Koehrsen, Will. 2019. “Data Scientists: Your Variable Names Are Awful. Here’s How to Fix Them.” Medium. https://towardsdatascience.com/data-scientists-your-variable-names-are-awful-heres-how-to-fix-them-89053d2855be.
Kostyuk, Victor. 2020. “Data Science Python Best Practices.” BCG GAMMA. https://medium.com/bcggamma/data-science-python-best-practices-fdb16fdedf82.
Leah Wasser, Jenny Palomino. 2019. “DRY Code and Modularity.” Earth Data Science - Earth Lab. https://www.earthdatascience.org/courses/intro-to-earth-data-science/write-efficient-python-code/intro-to-clean-code/dry-modular-code/.
Max Joseph, Software Carpentry, Leah Wasser. 2017. “Write Efficient Scientific Code - the DRY (Don’t Repeat Yourself) Principle.” Earth Data Science - Earth Lab. https://www.earthdatascience.org/courses/earth-analytics/automate-science-workflows/write-efficient-code-for-science-r/.
Meza, Frank. 2018. “The Value of Code Documentation.” Olio Apps. https://www.olioapps.com/blog/the-value-of-code-documentation/.
Mustafeez, Anusheh Zohair. 2021. “Absolute Vs. Relative Path.” Educative. https://www.educative.io/edpresso/absolute-vs-relative-path.
Orosz, Gergely. 2019. “Readable Code.” https://blog.pragmaticengineer.com/readable-code/.
“PEP 8 – Style Guide for Python Code.” 2021. Python.org. https://www.python.org/dev/peps/pep-0008/.
“Python Examples of Pathlib.Path.joinpath.” 2021. https://www.programcreek.com/python/example/114070/pathlib.Path.joinpath.
Python, Real. 2021. “Python 3’s f-Strings: An Improved String Formatting Syntax (Guide) – Real Python.” https://realpython.com/python-f-strings/.
“Read a Delimited File (Including CSV and TSV) into a Tibble — Read_delim.” n.d. Accessed November 2, 2021. https://readr.tidyverse.org/reference/read_delim.html.
Riffomonas Project. 2021. “Keeping R Code DRY with Functions: Don’t Repeat Yourself! (Cc096).” https://www.youtube.com/watch?v=XSRO4VKD-pc.
Savonen, Candace. 2021a. Chapter 8 Creating Clarifying Code Comments Documentation and Usability. https://jhudatascience.org/Documentation_and_Usability/creating-clarifying-code-comments.html#creating-clarifying-code-comments.
Saxena, Pranjal. 2021. “6 Mistakes Every Python Beginner Should Avoid While Coding.” Medium. https://towardsdatascience.com/6-mistakes-every-python-beginner-should-avoid-while-coding-e57e14917942.
Shapiro, Joshua A., Candace L. Savonen, Allegra G. Hawkins, Chante J. Bethell, Deepashree Venkatesh Prasad, Casey S. Greene, and Jaclyn N. Taroni. 2021. Childhood Cancer Data Lab Training Modules (version 2021-june).
Smith, Steve. 2013. “Don’t Repeat Yourself - Programmer 97-Things.” https://web.archive.org/web/20131204221336/http://programmer.97things.oreilly.com/wiki/index.php/Don't_Repeat_Yourself.
Soage, Jose Carlos. 2020. “SET SEED in R with Set.seed() Function ▷ [WITH EXAMPLES].” R CODER. https://r-coder.com/set-seed-r/.
Spertus, Ellen. 2021. “Best Practices for Writing Code Comments.” Stack Overflow Blog. https://stackoverflow.blog/2021/07/05/best-practices-for-writing-code-comments/.
Spielman, Stephanie. n.d. “Introduction to r - Cb2r Data Science Workshop, Summer 2020.” https://github.com/sjspielman/cb2r-ds-summer2020/blob/71cb11277e7383292bf727841ab5fa4ed43cfcbe/resources/introduction_to_R.Rmd#L92.
“Styleguide.” 2021. Styleguide. https://google.github.io/styleguide/pyguide.html.
Team, Analytics Vidhya. 2019. “What Is Tidyverse Tidyverse Package in R.” Analytics Vidhya. https://www.analyticsvidhya.com/blog/2019/05/beginner-guide-tidyverse-most-powerful-collection-r-packages-data-science/.
Tran, Khuyen. 2021. “Python Clean Code: 6 Best Practices to Make Your Python Functions More Readable.” Medium. https://towardsdatascience.com/python-clean-code-6-best-practices-to-make-your-python-functions-more-readable-7ea4c6171d60.
Wickham, Hadley. 2019. “Style Guide · Advanced R.” http://adv-r.had.co.nz/Style.html.
Xie, Yihui, Christophe Dervieux, and Emily Riederer. 2020. R Markdown Cookbook. Boca Raton, Florida: Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown-cookbook.