Part 1

Helpful tips before we start

TROUBLESHOOTING: Common new user mistakes we have seen

Check the file path – is the file there?
Typos (R is case sensitive: x and X are different)
Unclosed quotes, parentheses, and brackets
Accidentally deleting part of a code chunk
For any function, run ?FUNCTION_NAME or help("FUNCTION_NAME") to open its help file
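The case-sensitivity and help-file tips can be made concrete with a minimal sketch. It uses base R only. mean() is just an example function, and the object name x is arbitrary.

```r
# R is case sensitive: x and X are different names
x <- 5
exists("x")  # TRUE
exists("X")  # FALSE in a fresh session, because no object named X was created

# Two equivalent ways to open the help file for mean()
?mean
help("mean")
```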

1.1

Set up your R Project.

File > New Project, or click the New Project button
New Directory
New Project
Type a name and choose a location
Check that the folder is there!
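Once the project opens, a quick check in the Console can confirm you are working inside the new folder. This sketch assumes RStudio has set the working directory to the project folder, which it does when a project is opened, and that the project's .Rproj file sits in that folder.

```r
# Run in the Console of the newly opened project.
# The printed path should end with the folder name you chose.
getwd()

# The folder should exist...
dir.exists(getwd())

# ...and contain the project's .Rproj file
list.files(pattern = "\\.Rproj$")
```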

Check out our resource here: https://jhudatascience.org/intro_to_r/resources/R_Projects.html

1.2

Load the tidyverse package by adding “library(tidyverse)” below and running the code.

library(tidyverse)
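library(tidyverse) attaches several core packages at once, including readr, which supplies the read_csv() and read_delim() functions used later in this lab. The sketch below loads readr alone and checks that it is attached. It assumes readr is installed, which it is whenever the tidyverse is.

```r
# Attach only readr, the tidyverse package that provides the import functions
library(readr)

"package:readr" %in% search()  # TRUE once readr is attached
exists("read_csv")             # TRUE: read_csv() is now available
```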

1.3

Use the manual import method (File > Import Dataset > From Text (readr)) to read in SARS-CoV-2 vaccination data from this URL:

https://jhudatascience.org/intro_to_r/data/vaccinations.csv

You can learn more about how the data was collected here: https://data.cdc.gov/Vaccinations/COVID-19-Vaccinations-in-the-United-States-Jurisdi/unsk-b7fc

1.4

What is the dataset object called? You can find this information in the Console or the Environment. Enter your answer as a comment using #.

# vaccinations
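The default object name appears to come from the file name. The sketch below reproduces that rule with basename() and tools::file_path_sans_ext(). Treat it as an assumption about how RStudio's import dialog picks names, not a description of its internals.

```r
# Assumption: the import dialog names the object after the file, minus its extension
url <- "https://jhudatascience.org/intro_to_r/data/vaccinations.csv"
file_name <- basename(url)                           # "vaccinations.csv"
object_name <- tools::file_path_sans_ext(file_name)  # "vaccinations"
object_name
```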

1.5

Preview the data by examining the Environment. How many observations and variables are there? Enter your answer as a comment using #.

# 37272 obs. of 103 variables
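The Environment numbers can also be checked with code. Because the real data has to be downloaded, this sketch uses a small hypothetical stand-in file with three of the vaccination columns. To check the real object, run the same functions on vaccinations.

```r
library(readr)

# Hypothetical three-row stand-in with the first three columns of the vaccination data
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Date,MMWR_week,Location",
  "12/28/22,52,AS",
  "12/28/22,52,PA",
  "12/28/22,52,WA"
), tmp)
dat <- read_csv(tmp, show_col_types = FALSE)

dim(dat)   # rows (observations), then columns (variables): 3 3
nrow(dat)  # 3
ncol(dat)  # 3

# On the real object, dim(vaccinations) should report 37272 rows and 103 columns,
# matching the Environment pane.
```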

Practice on Your Own!

P.1

Download the data from https://jhudatascience.org/intro_to_r/data/vaccinations.csv and move the file to your project folder. Import the data by browsing for the file on your computer.

Download the data
Put the data in the project folder
File > Import Dataset > From Text (readr)
Browse for the file
Click “Update”, then “Import”
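The point-and-click steps above end in a read_csv() call. A code version might look like the sketch below. It assumes the downloaded file kept the name vaccinations.csv and sits in your project folder. The file.exists() guard puts the "is the file there?" troubleshooting tip into practice.

```r
library(readr)

# Relative path: R looks for the file in the working directory (your project folder)
csv_path <- "vaccinations.csv"

if (file.exists(csv_path)) {
  vaccinations <- read_csv(csv_path)
} else {
  # Troubleshooting: check the file path, is the file there?
  message("Could not find ", csv_path, " in ", getwd())
}
```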

Part 2

2.1

Read in SARS-CoV-2 vaccination data from the URL https://jhudatascience.org/intro_to_r/data/vaccinations.csv. Assign it to an object named vacc. Use the code structure below.

# General format
# OBJECT <- read_csv(FILE)

vacc <- read_csv(file = "https://jhudatascience.org/intro_to_r/data/vaccinations.csv")

## Rows: 37272 Columns: 103
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (2): Date, Location
## dbl (101): MMWR_week, Distributed, Distributed_Janssen, Distributed_Moderna,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
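The message above suggests show_col_types = FALSE to quiet the column report. This sketch shows that argument on a hypothetical one-row stand-in file, then uses spec() to retrieve the guessed column types afterward.

```r
library(readr)

# Hypothetical one-row stand-in for the vaccination file
tmp <- tempfile(fileext = ".csv")
writeLines(c("Date,MMWR_week,Location", "12/28/22,52,AS"), tmp)

# show_col_types = FALSE quiets the column specification message
dat <- read_csv(tmp, show_col_types = FALSE)

# spec() still lets you look up the guessed column types afterward
spec(dat)

# The same argument works on the real URL:
# vacc <- read_csv(file = "https://jhudatascience.org/intro_to_r/data/vaccinations.csv",
#                  show_col_types = FALSE)
```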

2.2

Take a look at the data. Do these data objects (vaccinations and vacc) appear to be the same? Why or why not?

# Yes, when we look in the RStudio environment, the two objects have the same dimensions. If we use the View() or str() functions, we can also see in more detail that the data is the same.
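To go beyond eyeballing the Environment pane, you can compare the two objects in code. This sketch reads a small hypothetical stand-in file twice to mimic vaccinations and vacc. With the real data, substitute those two names for obj1 and obj2.

```r
library(readr)

# Hypothetical stand-in: read the same small file twice
tmp <- tempfile(fileext = ".csv")
writeLines(c("Date,MMWR_week,Location", "12/28/22,52,AS", "12/28/22,52,PA"), tmp)
obj1 <- read_csv(tmp, show_col_types = FALSE)
obj2 <- read_csv(tmp, show_col_types = FALSE)

identical(dim(obj1), dim(obj2))      # same number of rows and columns
identical(names(obj1), names(obj2))  # same column names
all(mapply(identical, obj1, obj2))   # every column holds the same values
```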

2.3

Find your working directory by running getwd(). This is where R will look for files unless you tell it otherwise.

getwd()

## [1] "/__w/intro_to_r/intro_to_r/modules/Data_Input/lab"
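A few base R calls show how the working directory is used to find files. The file name vaccinations.csv here is only an example, and the exact output depends on your computer.

```r
# Relative paths are resolved against the working directory
getwd()

# List the files R can read by name alone, without a full path
list.files()

# Build the full (absolute) path R would use for a relative file name
file.path(getwd(), "vaccinations.csv")
```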

Practice on Your Own!

P.2

Run the following code. Is there a problem? How do you know?

vacc2 <- read_delim("https://jhudatascience.org/intro_to_r/data/vaccinations.csv", delim = "\t")
vacc2

vacc2 <- read_delim("https://jhudatascience.org/intro_to_r/data/vaccinations.csv", delim = "\t")

## Rows: 37272 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (1): Date,MMWR_week,Location,Distributed,Distributed_Janssen,Distributed...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

vacc2

## # A tibble: 37,272 × 1
##    Date,MMWR_week,Location,Distributed,Distributed_Janssen,Distributed_Moderna…¹
##    <chr>                                                                        
##  1 12/28/22,52,AS,128480,600,25500,102380,0,0,271101,298028,348024,410021,39194…
##  2 12/28/22,52,PA,40583485,1564300,15149120,23816365,53700,0,317009,335288,3662…
##  3 12/28/22,52,WA,24157865,776000,8579700,14772865,29300,0,317245,337475,371607…
##  4 12/28/22,52,CO,16867685,500700,5939340,10405645,22000,0,292906,310837,341724…
##  5 12/28/22,52,ID,4537400,160000,1746180,2616020,15200,0,253902,271560,303422,3…
##  6 12/28/22,52,VI,168920,3200,46740,118780,200,0,158924,169558,188896,207246,82…
##  7 12/28/22,52,IL,36586985,1182100,12515840,22864245,24800,0,288727,306812,3375…
##  8 12/28/22,52,HI,4448760,124900,1698420,2621840,3600,0,314206,334329,367329,39…
##  9 12/28/22,52,CA,115820215,3778100,41357180,70555535,129400,0,293125,311944,34…
## 10 12/28/22,52,ND,1819400,53600,640920,1122680,2200,0,238747,256992,285510,3126…
## # ℹ 37,262 more rows
## # ℹ abbreviated name:
## #   ¹`Date,MMWR_week,Location,Distributed,Distributed_Janssen,Distributed_Moderna,Distributed_Pfizer,Distributed_Novavax,Distributed_Unk_Manuf,Dist_Per_100K,Distributed_Per_100k_5Plus,Distributed_Per_100k_12Plus,Distributed_Per_100k_18Plus,Distributed_Per_100k_65Plus,Administered,Administered_5Plus,Administered_12Plus,Administered_18Plus,Administered_65Plus,Administered_Janssen,Administered_Moderna,Administered_Pfizer,Administered_Novavax,Administered_Unk_Manuf,Admin_Per_100K,Admin_Per_100k_5Plus,Admin_Per_100k_12Plus,Admin_Per_100k_18Plus,Admin_Per_100k_65Plus,Recip_Administered,Administered_Dose1_Recip,Administered_Dose1_Pop_Pct,Administered_Dose1_Recip_5Plus,Administered_Dose1_Recip_5PlusPop_Pct,Administered_Dose1_Recip_12Plus,Administered_Dose1_Recip_12PlusPop_Pct,Administered_Dose1_Recip_18Plus,Administered_Dose1_Recip_18PlusPop_Pct,Administered_Dose1_Recip_65Plus,Administered_Dose1_Recip_65PlusPop_Pct,Series_Complete_Yes,Series_Complete_Pop_Pct,Series_Complete_5Plus,Series_Complete_5PlusPop_Pct,Series_Complete_12Plus,Series_Complete_12PlusPop_Pct,Series_Complete_18Plus,Series_Complete_18PlusPop_Pct,Series_Complete_65Plus,Series_Complete_65PlusPop_Pct,Series_Complete_Janssen,Series_Complete_Moderna,Series_Complete_Pfizer,Series_Complete_Novavax,Series_Complete_Unk_Manuf,Series_Complete_Janssen_5Plus,Series_Complete_Moderna_5Plus,Series_Complete_Pfizer_5Plus,Series_Complete_Unk_Manuf_5Plus,Series_Complete_Janssen_12Plus,Series_Complete_Moderna_12Plus,Series_Complete_Pfizer_12Plus,Series_Complete_Unk_Manuf_12Plus,Series_Complete_Janssen_18Plus,Series_Complete_Moderna_18Plus,Series_Complete_Pfizer_18Plus,Series_Complete_Unk_Manuf_18Plus,Series_Complete_Janssen_65Plus,Series_Complete_Moderna_65Plus,Series_Complete_Pfizer_65Plus,Series_Complete_Unk_Manuf_65Plus,Additional_Doses,Additional_Doses_Vax_Pct,Additional_Doses_5Plus,Additional_Doses_5Plus_Vax_Pct,Additional_Doses_12Plus,Additional_Doses_12Plus_Vax_Pct,Additional_Doses_18Plus,Additional_Doses_18Plus_Vax_Pct,Additional_Doses_50Plus,Additional_Doses_50Plus_Vax_Pct,Additional_Doses_65Plus,Additional_Doses_65Plus_Vax_Pct,Additional_Doses_Moderna,Additional_Doses_Pfizer,Additional_Doses_Janssen,Additional_Doses_Unk_Manuf,Second_Booster_50Plus,Second_Booster_50Plus_Vax_Pct,Second_Booster_65Plus,Second_Booster_65Plus_Vax_Pct,Second_Booster_Janssen,Second_Booster_Moderna,Second_Booster_Pfizer,Second_Booster_Unk_Manuf,Bivalent_Booster_5Plus,Bivalent_Booster_5Plus_Pop_Pct,Bivalent_Booster_12Plus,Bivalent_Booster_12Plus_Pop_Pct,Bivalent_Booster_18Plus,Bivalent_Booster_18Plus_Pop_Pct,Bivalent_Booster_65Plus,Bivalent_Booster_65Plus_Pop_Pct`

# It should be a red flag that there is only one column, whose name looks like: Date,MMWR_week,Location,Distributed,Distributed_Janssen,Distributed_Moderna ..
# The single column also shows up in the dimensions: A tibble: 37,272 × 1
# This file is comma delimited, not tab delimited!
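The fix is to tell R the real delimiter. This sketch reproduces both the problem and the fix on a small hypothetical stand-in with the same comma-separated layout, so it runs without downloading anything.

```r
library(readr)

# Hypothetical stand-in with the same comma-separated layout as vaccinations.csv
tmp <- tempfile(fileext = ".csv")
writeLines(c("Date,MMWR_week,Location", "12/28/22,52,AS"), tmp)

wrong <- read_delim(tmp, delim = "\t", show_col_types = FALSE)  # tab: everything lands in 1 column
right <- read_delim(tmp, delim = ",", show_col_types = FALSE)   # comma: 3 separate columns
ncol(wrong)  # 1
ncol(right)  # 3

# For the real file, either of these would fix the problem:
# vacc2 <- read_delim("https://jhudatascience.org/intro_to_r/data/vaccinations.csv", delim = ",")
# vacc2 <- read_csv("https://jhudatascience.org/intro_to_r/data/vaccinations.csv")
```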

P.3

Try reading in some data on your computer using any method we discussed!
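If your own file is tab-separated rather than comma-separated, read_delim() with delim = "\t" is the right tool, and readr's read_tsv() is a shortcut for the same thing. The sketch uses a hypothetical temporary file in place of data on your computer.

```r
library(readr)

# Hypothetical tab-separated file standing in for data on your computer
tmp <- tempfile(fileext = ".tsv")
writeLines(c("id\tvalue", "1\t10", "2\t20"), tmp)

# read_tsv() reads tab-separated files, like read_delim(delim = "\t")
dat_tsv <- read_tsv(tmp, show_col_types = FALSE)
dat_delim <- read_delim(tmp, delim = "\t", show_col_types = FALSE)
dat_tsv
```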

Data Input Lab - Key
