Chapter 6 Managing package versions

6.1 Learning Objectives

This chapter will demonstrate how to: understand that versions of software influence analysis outcomes; find which package versions you are using; and print session info in all of your analyses so that it is clearer which packages and versions you are using.

As we discussed previously, sometimes two different researchers can run the same code and same data and get different results!

Ruby the researcher and Avi the associate are both very confused and slightly horrified that they both ran the same code and data but received different results.

What Ruby and Avi may not realize is that although they have used the same code and data, the software packages that they have on each of their computers may be very different. Even if they have the same software packages, they likely don’t have the same package versions, and these versions can influence results! Different computing environments are not only a headache to untangle, they can also influence the reproducibility of your results (Beaulieu-Jones and Greene 2017).

There are multiple ways to deal with variations in computing environments so that your analyses will be reproducible, and we will discuss a few different strategies for tackling this problem in this course and its follow-up course. But for now, we will start with the least intensive to implement: session info.

There are two strategies for dealing with software versions that we will discuss in this chapter. Either of these strategies can be used alone or you can use both. They address different aspects of the computing environment discrepancy problem.

6.1.1 Strategy 1: Session Info - record a list of your packages

The easiest (though not the most comprehensive) way to handle differences in software versions is to have your code print its session info: a list of details about your computing environment, including the packages in use and their versions.

Session info can offer clues as to why results weren’t reproducible. For example, if both Avi and Ruby ran notebooks and included a session info printout, it may look like this:

Session info shows us that they have different R versions and different operating systems. The package they each have attached is rmarkdown, but they have different rmarkdown versions as well. If Avi and Ruby have discrepancies in their results, the session info printout gives them a record of items to look into when determining why the results didn’t reproduce as expected.
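To make the comparison concrete, here is a minimal sketch of how two session info records could be compared once they are written down as name-to-version tables. The function name diff_environments and the version numbers for Ruby and Avi are invented for illustration; they are not taken from the figure or from any real session.

```python
def diff_environments(env_a, env_b):
    """Return {name: (version_in_a, version_in_b)} for every entry that differs.

    An entry that is missing from one record is reported as None on that side.
    """
    differences = {}
    for name in sorted(set(env_a) | set(env_b)):
        version_a = env_a.get(name)
        version_b = env_b.get(name)
        if version_a != version_b:
            differences[name] = (version_a, version_b)
    return differences


# Hypothetical records, invented for illustration only.
ruby_env = {"R": "4.3.2", "rmarkdown": "2.25", "knitr": "1.50"}
avi_env = {"R": "4.2.0", "rmarkdown": "2.11", "knitr": "1.50"}

for name, (ruby_version, avi_version) in diff_environments(ruby_env, avi_env).items():
    print(f"{name}: Ruby has {ruby_version}, Avi has {avi_version}")
```

Entries that match (knitr in this made-up example) are left out, so the output is a short list of places to start investigating.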

6.2 Get the exercise project files

Get the Python project example files

Click this link to download.

Now double-click your chapter zip file to unzip it. For Windows you may have to follow these instructions.

Get the R project example files

Click this link to download.

Now double-click your chapter zip file to unzip it. For Windows you may have to follow these instructions.

6.3 Exercise 1: Print out session info

Python version of the exercise

In your scientific notebook, you’ll need to add two items and then save.
1. Add the line import session_info to a code chunk at the beginning of your notebook.
2. Add session_info.show() to a new code chunk at the very end of your notebook.
3. Save your notebook as is. Note it will not run correctly until we address the issues with the code in the next chapter.
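If you are curious what kind of information a session info printout gathers, the sketch below collects something similar using only the Python standard library. It is a rough stand-in under our own assumptions, not how the session_info package works internally, and the function name collect_session_info is hypothetical.

```python
import platform
from importlib import metadata


def collect_session_info(package_names):
    """Return a dict describing the interpreter, the OS, and package versions."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


# The package names here are only examples; list the ones your notebook imports.
info = collect_session_info(["pip", "a-package-that-is-not-installed"])
for key, value in info.items():
    print(f"{key}: {value}")
```

Unlike session_info.show(), this sketch only reports the packages you name explicitly, so for the exercise itself stick with the two steps above.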

R version of the exercise

In your Rmd file, add a chunk at the very end that looks like this:

sessionInfo()

The output will look something like this:

## R version 4.3.2 (2023-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] jsonlite_1.8.8   dplyr_1.1.4      compiler_4.3.2   gitcreds_0.1.2  
##  [5] promises_1.2.1   tidyselect_1.2.0 Rcpp_1.0.12      webshot2_0.1.2  
##  [9] xml2_1.3.6       stringr_1.5.1    tidyr_1.3.1      later_1.3.2     
## [13] jquerylib_0.1.4  yaml_2.3.10      fastmap_1.1.1    readr_2.1.5     
## [17] R6_2.5.1         generics_0.1.3   knitr_1.50       tibble_3.3.0    
## [21] bookdown_0.43    rprojroot_2.1.0  tzdb_0.4.0       bslib_0.6.1     
## [25] pillar_1.9.0     rlang_1.1.6      utf8_1.2.4       websocket_1.4.4 
## [29] stringi_1.8.3    cachem_1.0.8     xfun_0.52        sass_0.4.8      
## [33] cli_3.6.2        magrittr_2.0.3   ps_1.7.6         digest_0.6.34   
## [37] rvest_1.0.4      processx_3.8.3   hms_1.1.3        lifecycle_1.0.4 
## [41] chromote_0.5.1   vctrs_0.6.5      ottrpal_2.0.0    evaluate_1.0.4  
## [45] glue_1.7.0       spelling_2.3.1   fansi_1.0.6      purrr_1.0.2     
## [49] rmarkdown_2.25   httr_1.4.7       tools_4.3.2      pkgconfig_2.0.3 
## [53] htmltools_0.5.7

Save your notebook as is. Note it will not run correctly until we address the issues with the code in the next chapter.

6.4 Exercise 2: Package management

Python version of the exercise

Download this starter conda environment.yml file by clicking on the link and place it in your example project files directory.
Navigate to your example project files directory using the command line.
Create your conda environment from this file using the command below.

conda env create --file environment.yml

Activate your conda environment using this command.

conda activate reproducible-python

Now start up JupyterLab again using this command:

jupyter lab
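Once JupyterLab is running, it can be worth confirming that your notebook is really using the environment you just activated. The sketch below is one way to check, assuming that conda activate sets the CONDA_DEFAULT_ENV environment variable (it normally does, though for environments created by path the value may be a directory path rather than a name). The function name active_conda_env is hypothetical.

```python
import os


def active_conda_env(environ=None):
    """Return the value of CONDA_DEFAULT_ENV, or None if no conda env is active."""
    if environ is None:
        environ = os.environ
    return environ.get("CONDA_DEFAULT_ENV")


expected = "reproducible-python"  # the name used in the activate command above
current = active_conda_env()
if current == expected:
    print(f"Running inside the '{expected}' environment.")
else:
    print(f"Expected '{expected}' but found {current!r}; try 'conda activate {expected}'.")
```

You could run this in the first code chunk of your notebook; if it reports a different environment, close JupyterLab, activate the environment, and start JupyterLab again.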

Follow these instructions to add the environment.yml file to the GitHub repository you created in the previous chapter. Later we will practice and discuss how to more fully utilize the features of GitHub but for now, just drag and drop it as the instructions linked describe.

6.4.1 More resources on how to use conda

R version of the exercise

First install the renv package

Go to RStudio and find the Console pane.
Install renv using the command below (you should only need to do this once per computer or RStudio environment).

install.packages("renv")

Now set up renv to use in your project

Set your working directory to your project directory using setwd() in your console window (don’t put this in a script or notebook).
Use this command in your project:

renv::init()

This will start up renv in your project.

*What’s :: about? In brief, it allows you to use a function from a package without loading the entire package with library().

Now you can develop your project as you normally would, installing and removing packages in R as you see fit. For the purposes of this exercise, let’s install the styler package using the following command. (The styler package will come in handy for styling our code in the next chapter.)

install.packages("styler")

Now that we have installed styler we will want to add it to our renv snapshot.

To add any packages we’ve installed to our renv snapshot we will use this command:

renv::snapshot()

This will save whatever packages we are currently using to our environment snapshot file called renv.lock. This renv.lock file is what we can share with our collaborators so they can replicate our computing environment.
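The renv.lock file is plain JSON, so you or a collaborator can inspect it with any language. As a minimal sketch, the Python below lists the packages recorded in a lockfile, assuming the usual layout with a top-level "Packages" entry whose records each carry a "Version" field. The function name packages_in_lockfile is hypothetical, and the lockfile text is a trimmed, invented example (including its version numbers), not the file renv will write for your project.

```python
import json


def packages_in_lockfile(lockfile_text):
    """Return {package name: version} for every package recorded in a lockfile."""
    lock = json.loads(lockfile_text)
    packages = lock.get("Packages", {})
    return {name: record.get("Version") for name, record in sorted(packages.items())}


# Trimmed, invented lockfile contents for illustration only.
example_lockfile = """
{
  "R": {"Version": "4.3.2"},
  "Packages": {
    "styler": {"Package": "styler", "Version": "1.10.2", "Source": "Repository"},
    "renv": {"Package": "renv", "Version": "1.0.3", "Source": "Repository"}
  }
}
"""

for name, version in packages_in_lockfile(example_lockfile).items():
    print(name, version)
```

To try it on your own project, you could read the real file with open("renv.lock").read() and pass the text to the same function.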

If your package installation attempts are unsuccessful and you’d like to revert to the previous state of your environment, you can run renv::restore(). This will restore your project’s package library to the state recorded in your renv.lock file, that is, to what it was before you attempted to install styler or whatever packages you tried to install.

You should see that an renv.lock file has now been created or updated! You will want to always include this file with your project files. This means we will want to add it to our GitHub repository!
Follow these instructions to add your renv.lock file to the GitHub repository you created in the previous chapter. Later we will practice and discuss how to more fully utilize the features of GitHub but for now, just drag and drop it as the instructions linked describe.

After you’ve added your computing environment files to your GitHub, you’re ready to continue using them with your IDE to actually work on the code in your notebook!

Any feedback you have regarding this exercise is greatly appreciated; you can fill out this form!

References

Beaulieu-Jones, Brett K, and Casey S Greene. 2017. “Reproducibility of Computational Workflows Is Automated Using Continuous Analysis.” Nature Biotechnology 35 (4): 342–46. https://doi.org/10.1038/nbt.3780.