In this lab you can use the interactive console to explore or Knit the document. Remember that anything you type in an R code chunk can be “sent” to the console with Cmd-Enter (macOS) or Ctrl-Enter (Windows/Linux).
library(readr)
Set up your R Project. Once complete, confirm you have created a project folder.
Use the manual import method (File > Import Dataset > From Text (readr)) to read in SARS-CoV-2 vaccination data from this URL: https://jhudatascience.org/intro_to_r/data/vaccinations.csv
You can learn more about how the data was collected here: https://data.cdc.gov/Vaccinations/COVID-19-Vaccinations-in-the-United-States-Jurisdi/unsk-b7fc
What is the dataset object called? You can find this information in the Console or the Environment. Enter your answer as a comment using #.
# vaccinations
Preview the data by clicking the table button in the Environment. How many observations and variables are there? Enter your answer as a comment using #.
# 37272 obs. of 103 variables
Read in SARS-CoV-2 vaccination data from the URL https://jhudatascience.org/intro_to_r/data/vaccinations.csv and assign it to an object named vacc. Use the code structure below.
# General format
library(readr)
# OBJECT <- read_csv(FILE)
library(readr)
vacc <- read_csv(file = "https://jhudatascience.org/intro_to_r/data/vaccinations.csv")
## Rows: 37272 Columns: 103
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Date, Location
## dbl (101): MMWR_week, Distributed, Distributed_Janssen, Distributed_Moderna,...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
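The message above mentions two options: setting show_col_types = FALSE or specifying the column types. The following is a minimal sketch of both, assuming a tiny stand-in CSV written to a temporary file. The three columns and their values are made up for illustration and are not the real vaccination data.

```r
library(readr)

# Write a tiny stand-in CSV to a temporary file (made-up values, not the real data)
tmp <- tempfile(fileext = ".csv")
writeLines(
  c("Date,Location,Distributed",
    "01/01/2021,MD,100",
    "01/02/2021,VA,250"),
  tmp
)

# Option 1: show_col_types = FALSE suppresses the column specification message
vacc_small <- read_csv(tmp, show_col_types = FALSE)

# Option 2: state the column types up front with cols()
vacc_typed <- read_csv(
  tmp,
  col_types = cols(
    Date = col_character(),
    Location = col_character(),
    Distributed = col_double()
  )
)

dim(vacc_small)   # number of rows, then number of columns
spec(vacc_typed)  # the full column specification
```

Calling dim() on the imported object is also a way to check the number of observations and variables without opening the table viewer.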
Load the readxl package with the library() command. If it is not installed, install it via RStudio --> Tools --> Install Packages. You can also try install.packages("readxl").
library(readxl)
Download the dataset of asthma prevalence in the USA from https://jhudatascience.org/intro_to_r/data/asthma.xlsx and save it as asthma.xlsx by running the following code chunk.
This only downloads the file; it does NOT bring the file into R.
# download.file() has no overwrite argument; an existing destfile is replaced by default
download.file(
  url = "https://jhudatascience.org/intro_to_r/data/asthma.xlsx",
  destfile = "asthma.xlsx",
  mode = "wb"
)
Note: the “wb” option makes sure the file can be read correctly on Windows machines.
Use the read_excel() function in the readxl package to read the asthma.xlsx file and call the output asthma.
asthma <- read_excel(path = "asthma.xlsx")
Run the following code - is there a problem? How do you know?
yts <- read_delim("https://jhudatascience.org/intro_to_r/data/Youth_Tobacco_Survey_YTS_Data.csv", delim = "\t")
## Rows: 9794 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (1): YEAR,LocationAbbr,LocationDesc,TopicType,TopicDesc,MeasureDesc,Data...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
yts
## # A tibble: 9,794 × 1
## YEAR,LocationAbbr,LocationDesc,TopicType,TopicDesc,MeasureDesc,DataSource,R…¹
## <chr>
## 1 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cessation (Youth),Percent of Curr…
## 2 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cessation (Youth),Percent of Curr…
## 3 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cessation (Youth),Percent of Curr…
## 4 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cessation (Youth),Quit Attempt in…
## 5 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cessation (Youth),Quit Attempt in…
## 6 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cessation (Youth),Quit Attempt in…
## 7 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cigarette Use (Youth),Smoking Sta…
## 8 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cigarette Use (Youth),Smoking Sta…
## 9 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cigarette Use (Youth),Smoking Sta…
## 10 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cigarette Use (Youth),Smoking Sta…
## # ℹ 9,784 more rows
## # ℹ abbreviated name:
## # ¹`YEAR,LocationAbbr,LocationDesc,TopicType,TopicDesc,MeasureDesc,DataSource,Response,Data_Value_Unit,Data_Value_Type,Data_Value,Data_Value_Footnote_Symbol,Data_Value_Footnote,Data_Value_Std_Err,Low_Confidence_Limit,High_Confidence_Limit,Sample_Size,Gender,Race,Age,Education,GeoLocation,TopicTypeId,TopicId,MeasureId,StratificationID1,StratificationID2,StratificationID3,StratificationID4,SubMeasureID,DisplayOrder`
# It should be a red flag to see that there is only one column that looks like: "YEAR,LocationAbbr,LocationDesc,TopicType,TopicDesc,MeasureDesc..."
# This file is comma delimited, not tab delimited! Using delim = "," (or read_csv()) splits it into separate columns.
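To make the delimiter problem concrete, here is a small sketch that reads the same file twice, once with the wrong delimiter and once with the right one. It assumes a three-column stand-in file written to a temporary location; the rows are illustrative only and are not taken from the survey.

```r
library(readr)

# A small comma-delimited stand-in file (illustrative rows only)
tmp <- tempfile(fileext = ".csv")
writeLines(
  c("YEAR,LocationAbbr,LocationDesc",
    "2015,AZ,Arizona",
    "2015,AZ,Arizona"),
  tmp
)

# Wrong delimiter: no tabs are found, so each line stays in one column
wrong <- read_delim(tmp, delim = "\t", show_col_types = FALSE)
ncol(wrong)

# Right delimiter: each comma-separated field becomes its own column
right <- read_delim(tmp, delim = ",", show_col_types = FALSE)
ncol(right)
names(right)
```

Comparing ncol() before and after is a quick check that the delimiter matches the file. For the lab file, changing delim = "\t" to delim = "," in the call above should have the same effect.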
By default, read_excel() reads the first sheet of an Excel file. Copy your code from question 2.4 and add the following argument: sheet = 2. Inspect the data using head().
asthma <- read_excel(path = "asthma.xlsx", sheet = 2)
head(asthma)
## # A tibble: 6 × 3
##   Characteristic      `Weighted Number With Current Asthma` `Percent (SE)`
##   <chr>                                               <dbl> <chr>
## 1 0–4                                                394206 2.0 (0.43)
## 2 5–11                                              1641279 5.9 (0.58)
## 3 5–14                                              2699214 6.6 (0.55)
## 4 5-17 (School Age)                                 3832453 7.2 (0.49)
## 5 12-14 (Young Teens)                               1057935 8.1 (1.10)
## 6 12–17                                             2191174 8.6 (0.77)
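Before picking a sheet by number, it can help to see which sheets a workbook contains. The sketch below uses excel_sheets() from readxl for that. It assumes the example workbook datasets.xlsx that ships with readxl rather than asthma.xlsx, so that it runs without a download.

```r
library(readxl)

# readxl_example() returns the path to a workbook bundled with readxl
path <- readxl_example("datasets.xlsx")

# List the sheet names before choosing one
sheets <- excel_sheets(path)
sheets

# A sheet can be selected by position or by name
by_position <- read_excel(path, sheet = 2)
by_name <- read_excel(path, sheet = sheets[2])

# Both calls should return the same data
head(by_position)
head(by_name)
```

The same call on the lab file would be excel_sheets("asthma.xlsx"), assuming the file has already been downloaded to the working directory. Selecting by name tends to be more robust than selecting by position if the sheets are later reordered.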
Install and load the haven package. Look at the help page for the read_dta() function, and scroll to the very bottom of the page. Try running some of the examples provided.
install.packages("haven")
library(haven)
?read_dta
path <- system.file("examples", "iris.dta", package = "haven")
read_dta(path)
Find your working directory by running getwd().
getwd()
## [1] "/__w/intro_to_r/intro_to_r/modules/Data_Input/lab"
Create a folder in your R project called data. Move the “asthma.xlsx” file there.
Modify the following code so that it finds “asthma.xlsx” in the “data” directory.
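# Original: looks for the file in the working directory
asthma <- read_excel(path = "asthma.xlsx")
# Modified: looks for the file inside the data folder
asthma <- read_excel(path = "data/asthma.xlsx")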
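Creating the folder and moving the file can also be done from R instead of the file browser. This is a minimal sketch using base R functions only. It works inside a temporary folder, and the folder name demo_project and the empty placeholder file are assumptions made so that nothing in a real project is touched.

```r
# Work inside a temporary folder (hypothetical stand-in for the R project)
project <- file.path(tempdir(), "demo_project")
dir.create(project, showWarnings = FALSE)

# Stand-in for the downloaded file (an empty placeholder, not a real workbook)
old_path <- file.path(project, "asthma.xlsx")
file.create(old_path)

# Create the data folder and move the file into it
dir.create(file.path(project, "data"), showWarnings = FALSE)
new_path <- file.path(project, "data", "asthma.xlsx")
file.rename(from = old_path, to = new_path)

# Check where the file can be found now
file.exists(old_path)
file.exists(new_path)
```

In an actual project the paths would be relative to the working directory reported by getwd(), for example "data/asthma.xlsx". Running file.exists() on a path before reading it is a quick way to tell a wrong path apart from a problem with the file itself.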
Practice importing a dataset of your choice, give it an object name, and use head() to preview the first few lines.
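One possible answer, offered only as a sketch: it assumes the example file mtcars.csv bundled with readr, and the object name cars is an arbitrary choice. Any other CSV path or URL could take its place.

```r
library(readr)

# readr_example() returns the path to a sample file shipped with readr
path <- readr_example("mtcars.csv")

# Import the file and give the result an object name
cars <- read_csv(path, show_col_types = FALSE)

# Preview the first few rows
head(cars)
```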