Chapter 4 Providing data

4.1 Learning Objectives

This chapter will demonstrate how to: (1) provide data in a way that allows your analysis to be reproducible, and (2) set up a data download script to include with your analysis repository.

The first part of any analysis should be getting all the data needed to run it. Data come in all kinds of formats and sizes, so while we can’t give specifics on how to share your data, we can provide these guidelines:

4.1.1 Overview of data sharing

4.1.2 A very general example of a data download bash script

How your data should be downloaded depends on where and how it’s stored online. The most general form of a data download script might look like this:

#!/bin/bash

# This is a template script for downloading data using the wget command
# See docs here: https://www.gnu.org/software/wget/manual/wget.html

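# Create the folder to save to (-p avoids an error if it already exists)
mkdir -p <FOLDER_TO_SAVE_TO>

# To see wget options, run `wget -h` (the help flag) in your terminal

# Download the file at <URL> and save it to the given path (-O sets the output file)
wget -O <FOLDER/FILE_TO_SAVE_TO> <URL>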

You can download this general template script here (Shapiro et al. 2021).
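
As one way the template might be filled in, here is a minimal sketch of a download script that can be re-run safely. It assumes wget is installed; the function name download_if_missing and the example URL and paths in the comments are hypothetical and are not part of the original template. The sketch skips the download when the file is already present, and it downloads to a temporary name first so that an interrupted download is not mistaken for a finished one.

```shell
#!/bin/bash
# Stop on errors, on unset variables, and on failures inside pipelines
set -euo pipefail

# Hypothetical helper (not part of the original template):
# download <url> to <dest>, skipping the download if <dest> already exists.
download_if_missing() {
  local url="$1"
  local dest="$2"
  local tmp="${dest}.part"

  # Create the destination folder if needed (-p: no error if it exists)
  mkdir -p "$(dirname "$dest")"

  # -s is true when the file exists and is not empty
  if [ -s "$dest" ]; then
    echo "Already have $dest, skipping download"
    return 0
  fi

  # Download to a temporary name, then rename only if wget succeeded
  wget -O "$tmp" "$url" && mv "$tmp" "$dest"
}

# Only download when a URL and a destination are supplied, for example:
#   bash download-data.sh https://example.org/data.tsv data/data.tsv
if [ "$#" -ge 2 ]; then
  download_if_missing "$1" "$2"
fi
```

Committing a script like this to the analysis repository lets others fetch the same files with a single command, and running it a second time leaves files that are already present untouched.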

References

Shapiro, Joshua A., Candace L. Savonen, Allegra G. Hawkins, Chante J. Bethell, Deepashree Venkatesh Prasad, Casey S. Greene, and Jaclyn N. Taroni. 2021. Childhood Cancer Data Lab Training Modules (version 2021-june).