x and X are different)
?FUNCTION_NAME, or help("FUNCTION_NAME") to look at the help file
R Projects can help you keep files organized and avoid issues with working directories. Check out our resource here: https://jhudatascience.org/intro_to_r/resources/R_Projects.html
In this lab you can use the interactive console to explore or Knit the document. Remember anything you type here can be “sent” to the console with Cmd-Enter (OS-X) or Ctrl-Enter (Windows/Linux) in an R code chunk.
# Load the necessary package
library(readr)
Use the manual import method (File > Import Dataset > From Text (readr)) to read in SARS-CoV-2 vaccination data from this URL: https://jhudatascience.org/intro_to_r/data/vaccinations.csv
You can learn more about how the data was collected here: https://data.cdc.gov/Vaccinations/COVID-19-Vaccinations-in-the-United-States-Jurisdi/unsk-b7fc
What is the dataset object called? You can find this information in the Console or the Environment. Enter your answer as a comment using #.
# vaccinations
Preview the data by clicking the table button in the Environment. How many observations and variables are there? Enter your answer as a comment using #.
# 37272 obs. of 103 variables
Read in SARS-CoV-2 vaccination data from the URL https://jhudatascience.org/intro_to_r/data/vaccinations.csv and assign it to an object named vacc. Use the code structure below.
# General format
library(readr)
# OBJECT <- read_csv(FILE)
library(readr)
vacc <- read_csv(file = "https://jhudatascience.org/intro_to_r/data/vaccinations.csv")
## Rows: 37272 Columns: 103
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Date, Location
## dbl (101): MMWR_week, Distributed, Distributed_Janssen, Distributed_Moderna,...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Take a look at the data. Do these data objects (vaccinations and vacc) appear to be the same? Why or why not?
# Yes, when we look in the RStudio environment, the two objects have the same dimensions. If we use the View() or str() functions, we can also see in more detail that the data is the same.
# If we wanted to get really in the weeds, we could do a logical test like all.equal(vacc, vaccinations)
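The comparison described in the answer above can be sketched on two small stand-in data frames, so that it runs without downloading anything. The names df_a, df_b, and df_c and their values are made up for illustration; in the lab you would compare vacc and vaccinations instead.

```r
# Two small stand-in data frames (hypothetical values, not the real vaccination data)
df_a <- data.frame(Location = c("AK", "AL", "AR"), Distributed = c(10, 20, 30))
df_b <- df_a

# Same number of rows and columns?
identical(dim(df_a), dim(df_b))

# all.equal() returns TRUE when the objects match, and otherwise a
# character vector describing the differences, so wrap it in isTRUE()
same_data <- isTRUE(all.equal(df_a, df_b))
same_data

# Change one value to see what a mismatch looks like
df_c <- df_a
df_c$Distributed[2] <- 99
all.equal(df_a, df_c)
differs <- !isTRUE(all.equal(df_a, df_c))
differs
```

Wrapping all.equal() in isTRUE() matters because a mismatch is reported as text rather than as FALSE.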
Learn your working directory by running getwd(). This is where R will look for files unless you tell it otherwise.
getwd()
## [1] "/__w/intro_to_r/intro_to_r/modules/Data_Input/lab"
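As a rough sketch of what "where R will look for files" means, the code below builds the full path that a relative file name would resolve to. The file name asthma.xlsx is used only as an example here; the file does not need to exist for the code to run.

```r
# The working directory is the starting point for relative paths
wd <- getwd()

# A relative file name (example only) is looked up inside the working directory
full_path <- file.path(wd, "asthma.xlsx")
full_path

# file.exists() reports whether R can currently find the file at that location
file.exists("asthma.xlsx")
```

If file.exists() returns FALSE, the file is either missing or stored somewhere other than the working directory.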
Load the readxl package with the library() command. If it is not installed, install it via: RStudio --> Tools --> Install Packages. You can also try install.packages("readxl").
library(readxl)
Download the dataset of asthma prevalence in the USA from https://jhudatascience.org/intro_to_r/data/asthma.xlsx to the file asthma.xlsx by running the following code chunk. This only downloads the file; it does NOT bring the file into R.
download.file(
url = "https://jhudatascience.org/intro_to_r/data/asthma.xlsx",
destfile = "asthma.xlsx",
mode = "wb"
)
Note: the mode = "wb" option writes the file in binary mode, which makes sure it can be read correctly on both Windows and Apple machines.
Use the read_excel() function in the readxl package to read the asthma.xlsx file and call the output asthma.
asthma <- read_excel(path = "asthma.xlsx")
Run the following code - is there a problem? How do you know?
yts <- read_delim("https://jhudatascience.org/intro_to_r/data/Youth_Tobacco_Survey_YTS_Data.csv", delim = "\t")
## Rows: 9794 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (1): YEAR,LocationAbbr,LocationDesc,TopicType,TopicDesc,MeasureDesc,Data...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
yts
## # A tibble: 9,794 × 1
## YEAR,LocationAbbr,LocationDesc,TopicType,TopicDesc,MeasureDesc,DataSource,R…¹
## <chr>
## 1 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cessation (Youth),Percent of Curr…
## 2 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cessation (Youth),Percent of Curr…
## 3 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cessation (Youth),Percent of Curr…
## 4 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cessation (Youth),Quit Attempt in…
## 5 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cessation (Youth),Quit Attempt in…
## 6 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cessation (Youth),Quit Attempt in…
## 7 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cigarette Use (Youth),Smoking Sta…
## 8 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cigarette Use (Youth),Smoking Sta…
## 9 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cigarette Use (Youth),Smoking Sta…
## 10 "2015,AZ,Arizona,Tobacco Use – Survey Data,Cigarette Use (Youth),Smoking Sta…
## # ℹ 9,784 more rows
## # ℹ abbreviated name:
## # ¹`YEAR,LocationAbbr,LocationDesc,TopicType,TopicDesc,MeasureDesc,DataSource,Response,Data_Value_Unit,Data_Value_Type,Data_Value,Data_Value_Footnote_Symbol,Data_Value_Footnote,Data_Value_Std_Err,Low_Confidence_Limit,High_Confidence_Limit,Sample_Size,Gender,Race,Age,Education,GeoLocation,TopicTypeId,TopicId,MeasureId,StratificationID1,StratificationID2,StratificationID3,StratificationID4,SubMeasureID,DisplayOrder`
# It should be a red flag to see that there is only one column that looks like: "YEAR,LocationAbbr,LocationDesc,TopicType,TopicDesc,MeasureDesc..."
# This file is comma delimited, not tab delimited!
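To see the effect of the delimiter without downloading the survey file, here is a minimal sketch that writes a tiny comma-delimited file to a temporary location and reads it both ways. The three rows of data and the object names tmp, wrong, and right are invented for illustration.

```r
library(readr)

# Write a small comma-delimited file to a temporary location (made-up rows)
tmp <- tempfile(fileext = ".csv")
writeLines(
  c("YEAR,LocationAbbr,LocationDesc",
    "2015,AZ,Arizona",
    "2016,AZ,Arizona"),
  tmp
)

# Wrong delimiter: no tabs are found, so each whole line lands in one column
wrong <- read_delim(tmp, delim = "\t", show_col_types = FALSE)
ncol(wrong)

# Correct delimiter: the commas split each line into three columns
right <- read_delim(tmp, delim = ",", show_col_types = FALSE)
ncol(right)
names(right)
```

For the lab's file, the same fix would be to pass delim = "," to read_delim(), or to use read_csv().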
By default, read_excel() reads the first sheet of an Excel file. Copy your code from question P.3 and add the following argument: sheet = 2. Inspect the data using head().
asthma <- read_excel(path = "asthma.xlsx", sheet = 2)
head(asthma)
## # A tibble: 6 × 3
## Characteristic `Weighted Number With Current Asthma` `Percent (SE)`
## <chr> <dbl> <chr>
## 1 0–4 394206 2.0 (0.43)
## 2 5–11 1641279 5.9 (0.58)
## 3 5–14 2699214 6.6 (0.55)
## 4 5-17 (School Age) 3832453 7.2 (0.49)
## 5 12-14 (Young Teens) 1057935 8.1 (1.10)
## 6 12–17 2191174 8.6 (0.77)
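If you are not sure which sheets a workbook contains, excel_sheets() from readxl lists their names. The sketch below uses the example workbook datasets.xlsx that ships with readxl (not the asthma file), so it runs without a download; the object names are illustrative.

```r
library(readxl)

# Path to an example workbook bundled with the readxl package
path <- readxl_example("datasets.xlsx")

# List the sheet names in the order they appear in the workbook
sheets <- excel_sheets(path)
sheets

# A sheet can be selected by position...
by_position <- read_excel(path, sheet = 2)

# ...or by name, which still works if the sheets are later reordered
by_name <- read_excel(path, sheet = sheets[2])

head(by_position)
```

The same call, excel_sheets("asthma.xlsx"), should show what sheet = 2 refers to in the asthma workbook once it has been downloaded.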
Install and load the haven package. Look at the help page for the read_dta() function, and scroll to the very bottom of the page. Try running some of the examples provided.
install.packages("haven")
library(haven)
?read_dta
path <- system.file("examples", "iris.dta", package = "haven")
read_dta(path)