In this lab you can use the interactive console to explore, or Knit the document. Remember that anything you type in an R code chunk can be “sent” to the console with Cmd-Enter (macOS) or Ctrl-Enter (Windows/Linux).

library(readr)

Part 1

1.1

Set up your R Project. Once complete, confirm you have created a project folder.

  • First, save your files.
  • Click on the R Project button (blue box with a green plus sign on the top left).
  • If prompted, don’t save your .RData.
  • Click “New Directory”
  • Click “New Project”
  • Name the Directory (folder) and select an appropriate location (such as Desktop)
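
A quick console check can help confirm the project folder. This is a minimal sketch, assuming the new project is open in RStudio so that the working directory is the project folder; the object name rproj_files is only illustrative.

```r
# The working directory should be the project folder you just created
getwd()

# An RStudio project folder normally contains a file ending in .Rproj
rproj_files <- list.files(pattern = "\\.Rproj$")
rproj_files
```

If rproj_files comes back empty, the working directory is probably not the project folder.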

1.2

Use the manual import method (File > Import Dataset > From Text (readr)) to read in SARS-CoV-2 vaccination data from this URL:

https://jhudatascience.org/intro_to_r/data/vaccinations.csv

You can learn more about how the data was collected here: https://data.cdc.gov/Vaccinations/COVID-19-Vaccinations-in-the-United-States-Jurisdi/unsk-b7fc

1.3

What is the dataset object called? You can find this information in the Console or the Environment. Enter your answer as a comment using #.

# vaccinations

1.4

Preview the data by clicking the table button in the Environment. How many observations and variables are there? Enter your answer as a comment using #.

# 37272 obs. of 103 variables
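
The same counts can also be checked from the console instead of the Environment pane. The sketch below uses a small made-up data frame (example_data, a hypothetical stand-in) so that it runs without the import; with the real data you would pass the imported object instead.

```r
# Hypothetical stand-in for the imported dataset
example_data <- data.frame(
  Date = c("01/01/2021", "01/02/2021"),
  Location = c("MD", "VA"),
  Distributed = c(100, 200)
)

ls()                # names of the objects in the environment
dim(example_data)   # number of rows (observations), then columns (variables)
nrow(example_data)  # observations only
ncol(example_data)  # variables only
```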

Part 2

2.1

Read in SARS-CoV-2 vaccination data from URL https://jhudatascience.org/intro_to_r/data/vaccinations.csv and assign it to an object named vacc. Use the code structure below.

# General format
library(readr)
# OBJECT <- read_csv(FILE)
vacc <- read_csv(file = "https://jhudatascience.org/intro_to_r/data/vaccinations.csv")
## Rows: 37272 Columns: 103
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (2): Date, Location
## dbl (101): MMWR_week, Distributed, Distributed_Janssen, Distributed_Moderna,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
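
The message above mentions spec() and show_col_types. The sketch below shows both on a tiny temporary CSV file; the file contents and the object name small are made up for illustration and are not the vaccination data.

```r
library(readr)

# Write a small, made-up CSV to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(
  c("Date,Location,Distributed",
    "01/01/2021,MD,100",
    "01/02/2021,VA,200"),
  tmp
)

# show_col_types = FALSE quiets the column specification message
small <- read_csv(file = tmp, show_col_types = FALSE)

# spec() retrieves the full column specification on request
spec(small)
```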

2.2

Load the readxl package with the library() command.

If it is not installed, install it via RStudio > Tools > Install Packages, or run install.packages("readxl") in the console.

library(readxl)

2.3

Download the dataset of asthma prevalence in the USA from https://jhudatascience.org/intro_to_r/data/asthma.xlsx and save it as asthma.xlsx by running the following code chunk. This only downloads the file; it does NOT bring the data into R.

download.file(
  url = "https://jhudatascience.org/intro_to_r/data/asthma.xlsx",
  destfile = "asthma.xlsx", # saved in the working directory; an existing file is replaced
  mode = "wb"               # write in binary mode
)

Note: the “wb” option writes the file in binary mode, which makes sure the file can be read correctly on Windows machines.
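
To confirm that a download actually landed where you expect, you can check for the file from the console. The helper check_download() below is a hypothetical convenience function written for this sketch, not part of any package.

```r
# Hypothetical helper: report whether a file exists and how large it is
check_download <- function(path) {
  found <- file.exists(path)
  if (found) {
    message(path, " found (", file.size(path), " bytes)")
  } else {
    message(path, " not found; working directory is ", getwd())
  }
  invisible(found)
}

check_download("asthma.xlsx")
```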

2.4

Use the read_excel() function in the readxl package to read the asthma.xlsx file and call the output asthma.

asthma <- read_excel(path = "asthma.xlsx")

Practice on Your Own!

P.1

Run the following code - is there a problem? How do you know?

yts <- read_delim("https://jhudatascience.org/intro_to_r/data/Youth_Tobacco_Survey_YTS_Data.csv", delim = "\t")
## Rows: 9794 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (1): YEAR,LocationAbbr,LocationDesc,TopicType,TopicDesc,MeasureDesc,Data...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
yts
## # A tibble: 9,794 × 1
##    YEAR,LocationAbbr,LocationDesc,TopicType,TopicDesc,MeasureDesc,DataSource,R…¹
##    <chr>                                                                        
##  1 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cessation (Youth),Percent of Curr…
##  2 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cessation (Youth),Percent of Curr…
##  3 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cessation (Youth),Percent of Curr…
##  4 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cessation (Youth),Quit Attempt in…
##  5 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cessation (Youth),Quit Attempt in…
##  6 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cessation (Youth),Quit Attempt in…
##  7 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cigarette Use (Youth),Smoking Sta…
##  8 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cigarette Use (Youth),Smoking Sta…
##  9 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cigarette Use (Youth),Smoking Sta…
## 10 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cigarette Use (Youth),Smoking Sta…
## # ℹ 9,784 more rows
## # ℹ abbreviated name:
## #   ¹​`YEAR,LocationAbbr,LocationDesc,TopicType,TopicDesc,MeasureDesc,DataSource,Response,Data_Value_Unit,Data_Value_Type,Data_Value,Data_Value_Footnote_Symbol,Data_Value_Footnote,Data_Value_Std_Err,Low_Confidence_Limit,High_Confidence_Limit,Sample_Size,Gender,Race,Age,Education,GeoLocation,TopicTypeId,TopicId,MeasureId,StratificationID1,StratificationID2,StratificationID3,StratificationID4,SubMeasureID,DisplayOrder`
# It should be a red flag to see only one column, with a name that looks like: "YEAR,LocationAbbr,LocationDesc,TopicType,TopicDesc,MeasureDesc..."
# This file is comma delimited, not tab delimited!
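
The effect of the delimiter can be reproduced on a small scale. This sketch writes a made-up, comma-delimited file to a temporary location (a stand-in for the survey data) and reads it with each delimiter; the object names wrong and right are illustrative.

```r
library(readr)

# Made-up comma-delimited file
tmp <- tempfile(fileext = ".csv")
writeLines(
  c("YEAR,LocationAbbr,LocationDesc",
    "2015,AZ,Arizona",
    "2015,MD,Maryland"),
  tmp
)

# Wrong delimiter: each line is read as a single value
wrong <- read_delim(tmp, delim = "\t", show_col_types = FALSE)
ncol(wrong)

# Matching delimiter: the values split into separate columns
right <- read_delim(tmp, delim = ",", show_col_types = FALSE)
ncol(right)
```

For the lab file, the same idea would mean using delim = "," or switching to read_csv().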

P.2

By default, read_excel() reads the first sheet of an Excel file. Copy your code from question 2.4 and add the following argument: sheet = 2. Inspect the data using head().

asthma <- read_excel(path = "asthma.xlsx", sheet = 2)
head(asthma)
## # A tibble: 6 × 3
##   Characteristic      `Weighted Number With Current Asthma` `Percent (SE)`
##   <chr>                                               <dbl> <chr>         
## 1 0–4                                                394206 2.0 (0.43)    
## 2 5–11                                              1641279 5.9 (0.58)    
## 3 5–14                                              2699214 6.6 (0.55)    
## 4 5-17 (School Age)                                 3832453 7.2 (0.49)    
## 5 12-14 (Young Teens)                               1057935 8.1 (1.10)    
## 6 12–17                                             2191174 8.6 (0.77)

P.3

Install and load the haven package. Look at the help page for the read_dta() function, and scroll to the very bottom of the page. Try running some of the examples provided.

install.packages("haven")
library(haven)
?read_dta

path <- system.file("examples", "iris.dta", package = "haven")
read_dta(path)

P.4

Find your working directory by running getwd().

getwd()
## [1] "/__w/intro_to_r/intro_to_r/modules/Data_Input/lab"

Create a folder in your R project called data. Move the “asthma.xlsx” file there.

Modify the following code so that it finds “asthma.xlsx” in the “data” directory.

# Before the move (this path no longer works once the file is in "data"):
# asthma <- read_excel(path = "asthma.xlsx")
# After the move:
asthma <- read_excel(path = "data/asthma.xlsx")
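
Creating the folder and moving the file can also be done from R. This is a sketch only: it works inside a temporary directory with a placeholder text file (not a real Excel file) so nothing in your project is touched, and the names project_dir, old_path, and new_path are illustrative.

```r
# Temporary stand-in for a project folder
project_dir <- file.path(tempdir(), "demo_project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# Placeholder file standing in for the downloaded asthma.xlsx
old_path <- file.path(project_dir, "asthma.xlsx")
writeLines("placeholder", old_path)

# file.path() builds the path with the right separators
new_path <- file.path(project_dir, "data", "asthma.xlsx")

# Move the file into the data folder
moved <- file.rename(from = old_path, to = new_path)
file.exists(new_path)
```

In your own project the equivalent paths would be relative to the working directory, such as "asthma.xlsx" and "data/asthma.xlsx".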

P.5

Practice importing a dataset of your choice, give it an object name, and use head() to preview the first few rows.
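
One possible answer, as a sketch: this uses the mtcars.csv example file bundled with readr, chosen only because it is available without a download. The object name my_data is arbitrary, and any dataset of your own would work the same way.

```r
library(readr)

# Path to an example CSV file that ships with the readr package
example_path <- readr_example("mtcars.csv")

# Import the file and give it an object name
my_data <- read_csv(file = example_path, show_col_types = FALSE)

# Preview the first six rows
head(my_data)
```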