```{r, echo = FALSE}
library(knitr)
library(readr)
opts_chunk$set(comment = "")
```
## Outline
* Part 0: A little bit of set up!
* Part 1: reading in manually (point and click)
* Part 2: reading in directly & working directories
* Part 3: checking data & multiple file formats
We will cover Output a bit later!
# Part 0: Setup - R Project
## New R Project
Let's make an R Project so we can stay organized in the next steps.
Click the new R Project button at the top left of RStudio:
```{r, fig.alt="The New R Project button is highlighted.", out.width = "40%", echo = FALSE, fig.align = "center"}
knitr::include_graphics("images/Data_Input_Rproject.png")
```
## New R Project
In the New Project Wizard, click "New Directory":
```{r, fig.alt="In the New Project Wizard, the 'New Directory' option is highlighted.", out.width = "60%", echo = FALSE, fig.align = "center"}
knitr::include_graphics("images/Data_Input_new_directory.png")
```
## New R Project
Click "New Project":
```{r, fig.alt="In the New Project Wizard, the 'New Project' option is highlighted.", out.width = "60%", echo = FALSE, fig.align = "center"}
knitr::include_graphics("images/Data_Input_new_project.png")
```
## New R Project
Type in a name for your new folder.
Store it somewhere easy to find, such as your Desktop:
```{r, fig.alt="In the New Project Wizard, the new project has been given a name and is going to be stored in the Desktop directory. The 'Create Project' button is highlighted.", out.width = "60%", echo = FALSE, fig.align = "center"}
knitr::include_graphics("images/Data_Input_new_project_details.png")
```
## New R Project
You now have a new R Project folder on your Desktop!
Make sure you add any scripts or data files to this folder as we go through today's lesson. This will make sure R is able to "find" your files.
**We will review this in lab.**
```{r, fig.alt="An arrow points to the newly created R Project folder.", out.width = "60%", echo = FALSE, fig.align = "center"}
knitr::include_graphics("images/Data_Input_new_desktop.png")
```
# Part 1: Getting data into R (manual/point and click)
## Data Input
* 'Reading in' data is the first step of any real project/analysis
* R can read almost any file format, especially via add-on packages
* We are going to focus on simple delimited files first
    * comma separated (e.g. '.csv')
    * tab delimited (e.g. '.txt')
    * Microsoft Excel (e.g. '.xlsx')
## Note: data for demonstration
* We have added functionality to load some datasets directly in the `jhur` package
## Data Input
Youth Tobacco Survey (YTS) dataset:
"The YTS was developed to provide states with comprehensive data on both middle school and high school students regarding tobacco use, exposure to environmental tobacco smoke, smoking cessation, school curriculum, minors' ability to purchase or otherwise obtain tobacco products, knowledge and attitudes about tobacco, and familiarity with pro-tobacco and anti-tobacco media messages."
* Check out the data at: https://catalog.data.gov/dataset/youth-tobacco-survey-yts-data
## Import Dataset
- `>` File
- `>` Import Dataset
- `>` From Text (`readr`)
- `>` paste the url (http://jhudatascience.org/intro_to_r/data/Youth_Tobacco_Survey_YTS_Data.csv)
- `>` click "Update" and "Import"
## What Just Happened?
You see a preview of the data on the top left pane.
```{r, fig.alt="Screenshot of RStudio after importing the data. A preview of the data appears in the top left pane.", out.width = "80%", echo = FALSE, fig.align = "center"}
knitr::include_graphics("images/Data_Input_data_imported.png")
```
## What Just Happened?
You see a new object called `Youth_Tobacco_Survey_YTS_Data` in your environment pane (top right). The table button opens the data for you to view.
```{r, fig.alt="Screenshot of RStudio after importing the data. The new data object appears in the Environment pane at the top right.", out.width = "80%", echo = FALSE, fig.align = "center"}
knitr::include_graphics("images/Data_Input_data_imported.png")
```
## What Just Happened?
R ran some code in the console (bottom left).
```{r, fig.alt="Screenshot of RStudio showing the import code that ran in the console at the bottom left.", out.width = "80%", echo = FALSE, fig.align = "center"}
knitr::include_graphics("images/Data_Input_code_ran.png")
```
## Browsing for Data on Your Machine
```{r, fig.alt="Screenshot showing how to browse for a data file on your machine when importing a dataset.", out.width = "80%", echo = FALSE, fig.align = "center"}
knitr::include_graphics("images/Data_Input_browse.png")
```
## Import Dataset
```{r, fig.alt="Gif showing the process of importing a dataset via readr.", out.width = "100%", echo = FALSE, fig.align = "center"}
knitr::include_graphics("images/Data_Input_import_dataset.gif")
```
## Manual Import: Pros and Cons
Pros: easy!!
Cons: obscures some of what's happening, and others will have difficulty reproducing your steps
## Summary & Lab Part 1
**R Projects** will make it easier to find files later.
Importing data:
- File `>` Import Dataset `>` From Text (`readr`)
- Paste the url (http://jhudatascience.org/intro_to_r/data/Youth_Tobacco_Survey_YTS_Data.csv)
- Click "Update" and "Import"
Review the process: [`https://youtu.be/LEkNfJgpunQ`](https://youtu.be/LEkNfJgpunQ)
🏠 [Class Website](https://jhudatascience.org/intro_to_r/)
💻 [Data Input Lab](https://jhudatascience.org/intro_to_r/modules/Data_Input/lab/Data_Input_Lab.Rmd)
# Part 2: Getting data into R (directly)
## Data Input: Read in Directly
```{r message = FALSE}
# load library `readr` that contains function `read_csv`
library(readr)
dat <- read_csv(
file = "http://jhudatascience.org/intro_to_r/data/Youth_Tobacco_Survey_YTS_Data.csv"
)
# `head()` displays the first few rows of a data frame; `tail()` shows the last few rows.
head(dat, n = 5)
```
## Data Input: Declaring Arguments
```{r message = FALSE}
dat <- read_csv(
file = "http://jhudatascience.org/intro_to_r/data/Youth_Tobacco_Survey_YTS_Data.csv"
)
# EQUIVALENT TO
dat <- read_csv(
"http://jhudatascience.org/intro_to_r/data/Youth_Tobacco_Survey_YTS_Data.csv"
)
```
## Data Input: Read in Directly
`read_csv()` needs an argument `file =`.
- `file` is the path to your file, **in quotation marks**
- can be a path to a file on a website (URL)
- can be a **path** on your local computer, either an absolute or a relative file path
```{r, eval = FALSE}
# Examples
dat <- read_csv(file = "https://www.someurl.com/table1.csv") # URLs need the http:// or https:// prefix
dat <- read_csv(file = "/Users/avahoffman/Downloads/Youth_Tobacco_Survey_YTS_Data.csv")
dat <- read_csv(file = "Youth_Tobacco_Survey_YTS_Data.csv")
```
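## Data Input: Read in Directly

To try a local path without downloading anything, the sketch below writes a tiny CSV to a temporary file and reads it back. The file contents and the names `tmp_csv` and `small` are made up for illustration; they are not part of the YTS data.

```{r, eval = FALSE}
library(readr)

# Hypothetical example data, written to a temporary file
tmp_csv <- tempfile(fileext = ".csv")
writeLines(c("state,year,rate", "AZ,2015,3.2", "OH,2016,4.1"), tmp_csv)

# `tmp_csv` holds an absolute path, so this works from any working directory
small <- read_csv(file = tmp_csv, show_col_types = FALSE)
small
```

The `show_col_types = FALSE` argument only silences the column type message; leaving it out does not change the data that is read.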
## Data Input: File paths
Reading from your computer... What is my "path"?
```{r, fig.alt="GIF with text. PC: *autosaves file* Me: Cool, so where did the file save? PC: shows image of Power Rangers shrugging.", out.width = "40%", echo = FALSE, fig.align = "center"}
knitr::include_graphics("images/Data_Input_where_are_the_files.gif")
```
## Data Input: File paths
When you set up an R Project, R looks for files in that folder.
Luckily, we already set up an R Project!
Move downloaded files into the R Project folder.
```{r, fig.alt="Image showing the csv dataset being moved to the R Project directory created earlier.", out.width = "60%", echo = FALSE, fig.align = "center"}
knitr::include_graphics("images/Data_Input_file_move.png")
```
## Data Input: File paths
Confirm the data is in the R Project folder.
```{r, fig.alt="Image showing the csv dataset inside the R Project directory created earlier.", out.width = "70%", echo = FALSE, fig.align = "center"}
knitr::include_graphics("images/Data_Input_project_directory.png")
```
## Data Input: File paths
If we add the Youth_Tobacco_Survey_YTS_Data.csv file to the R Project folder, we can use the file name for the `file` argument:
```{r, eval = FALSE}
dat <- read_csv(file = "Youth_Tobacco_Survey_YTS_Data.csv")
```
## Why does this work?
When we create an R Project, we establish the **working directory**.
The working directory is the folder (directory) that RStudio assumes "you are working in".
It's where R looks for files.
```{r, fig.alt="The files are in the computer text overlaid on still shot of the movie Zoolander.", out.width = "30%", echo = FALSE, fig.align = "center"}
knitr::include_graphics("images/files.jpg")
```
## The Working Directory
The working directory is wherever the `.Rproj` file is.
```{r, fig.alt="Image showing the RStudio console. There is an arrow pointing to the .Rproj file. The top right corner shows that the 'Intro_to_r' project has been selected.", out.width = "80%", echo = FALSE, fig.align = "center"}
knitr::include_graphics("images/Data_Input_Rproj_file.png")
```
## Data Input: Getting Organized!
If you move a file into a nested folder, **you must update the path!**
```{r, eval = FALSE}
# Notice "data/" has been added!
dat <- read_csv(file = "data/Youth_Tobacco_Survey_YTS_Data.csv")
```
Always confirm you read in the data by checking the "Environment" pane (top right).
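## Data Input: Getting Organized!

If R reports that it cannot find a file, `file.exists()` is a quick way to test a path before reading. This sketch builds a made-up project layout inside R's temporary folder, so `project_dir` and `example.csv` are placeholders rather than files from this lesson.

```{r, eval = FALSE}
# Build a throwaway "project" with a nested data/ folder (hypothetical layout)
project_dir <- file.path(tempdir(), "my_project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)
writeLines(c("id,value", "1,10"), file.path(project_dir, "data", "example.csv"))

# The file lives in data/, not at the top level of the project
at_top <- file.exists(file.path(project_dir, "example.csv"))
in_data <- file.exists(file.path(project_dir, "data", "example.csv"))

c(at_top = at_top, in_data = in_data)
```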
# Part 3: Checking data & Other formats
## Data Input: Checking the data
- the `View()` function shows your data in a new tab, in spreadsheet format
- be careful if your data is big!
```{r eval = FALSE}
View(dat)
```
```{r, fig.alt="Screenshot of the RStudio console. 'View(dat)' has been typed and the data appears in table format.", out.width = "80%", echo = FALSE, fig.align = "center"}
knitr::include_graphics("images/Data_Input_View_data.png")
```
## Data Input: Other delimiters with `read_delim()`
`read_csv()` is a special case of `read_delim()`, a general function for reading a delimited file into a data frame.
`read_delim()` needs the path to your file and the **file's delimiter**, and returns a tibble.
- `file` is the path to your file, in quotes
- `delim` is what separates the fields within a record
```{r, eval = FALSE}
## Examples
dat <- read_delim(file = "https://www.someurl.com/table1.tsv", delim = "\t")
dat <- read_delim(file = "data.txt", delim = "|")
```
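## Data Input: Other delimiters with `read_delim()`

A self-contained sketch of the `delim` argument: the chunk below writes a small pipe-delimited file to a temporary location and reads it back. The data and the names `tmp_txt` and `piped` are invented for illustration.

```{r, eval = FALSE}
library(readr)

# Hypothetical pipe-delimited file
tmp_txt <- tempfile(fileext = ".txt")
writeLines(c("name|age", "Ana|31", "Ben|27"), tmp_txt)

# `delim` tells read_delim() which character separates the fields
piped <- read_delim(file = tmp_txt, delim = "|", show_col_types = FALSE)
piped
```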
## Data Input: Excel files
- `read_excel()` **cannot** read an Excel file directly from a URL, so download the file first.
- You need to load the `readxl` package with `library()`.
- The argument is `path` (not `file`).
```{r}
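# Programmatically download the file.
# download.file() replaces an existing `destfile`; it has no `overwrite` argument.
download.file(
  url = "http://jhudatascience.org/intro_to_r/data/asthma.xlsx",
  destfile = "asthma.xlsx",
  mode = "wb" # binary mode keeps the .xlsx file intact (matters on Windows)
)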
```
## Data Input: Excel files
- `read_excel()` **cannot** read an Excel file directly from a URL, so download the file first.
- You need to load the `readxl` package with `library()`.
- The argument is `path` (not `file`).
```{r, eval=FALSE}
library(readxl)
read_excel(path = "asthma.xlsx")
```
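## Data Input: Excel files

Excel workbooks can hold several sheets. As a sketch, the chunk below uses an example workbook that ships with `readxl` (located with `readxl_example()`), lists its sheet names with `excel_sheets()`, and picks one with the `sheet` argument of `read_excel()`. This workbook is not the asthma file from the previous slide.

```{r, eval = FALSE}
library(readxl)

# Path to an example workbook bundled with the readxl package
example_path <- readxl_example("datasets.xlsx")

# List the sheet names, then read one sheet by name
sheets <- excel_sheets(path = example_path)
cars <- read_excel(path = example_path, sheet = "mtcars")
head(cars)
```

If `sheet` is left out, `read_excel()` reads the first sheet.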
## Data input: other file types
* `haven` package has functions to read SAS, SPSS, Stata formats
```{r, eval = FALSE}
library(haven)
# SAS
read_sas(file = "mtcars.sas7bdat")
# SPSS
read_sav(file = "mtcars.sav")
# Stata
read_dta(file = "mtcars.dta")
```
* There are also resources for REDCap : [`REDCapR`](https://cran.r-project.org/web/packages/REDCapR/vignettes/BasicREDCapROperations.html)
## `read.csv` is *base R*
There are also data importing functions provided in base R (rather than the `readr` package), like `read.delim()` and `read.csv()`.
These functions have slightly different syntax for reading in data (e.g. `header` argument).
However, while many online resources use the base R tools, recent versions of RStudio use the newer `readr` data import tools, so we will use them in the slides for this class. They can also be up to two times faster for reading in large datasets, and they show a progress bar, which is nice.
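## `read.csv` is *base R*

To see one difference in practice, the sketch below reads the same small temporary file with both functions and compares the class of the result. The file and the names `base_dat` and `readr_dat` are invented for this example.

```{r, eval = FALSE}
library(readr)

# Hypothetical example file
tmp_csv <- tempfile(fileext = ".csv")
writeLines(c("id,group", "1,a", "2,b"), tmp_csv)

base_dat <- read.csv(file = tmp_csv)                           # base R
readr_dat <- read_csv(file = tmp_csv, show_col_types = FALSE)  # readr

class(base_dat)   # a plain data frame
class(readr_dat)  # a tibble, which is also a data frame
```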
## TROUBLESHOOTING: Common new user mistakes we have seen
1. **Working directory problems: trying to read files that R "can't find"**
    - Path misspecification
    - more on this shortly!
2. Typos (R is **case sensitive**, `x` and `X` are different)
    - RStudio helps with "tab completion"
3. Unclosed quotes, parentheses, and brackets
4. Different versions of software
5. Deleting part of the code chunk
## TROUBLESHOOTING: Help
For any function, you can write `?FUNCTION_NAME`, or `help("FUNCTION_NAME")` to look at the help file:
```{r, eval = FALSE}
?read_delim
help("read_delim")
```
```{r, fig.alt="Screenshot of the RStudio console. '?read_delim' has been typed and the help page has appeared in the help pane on the right.", out.width = "60%", echo = FALSE, fig.align = "center"}
knitr::include_graphics("images/Data_Input_read_delim.png")
```
## TROUBLESHOOTING: Setting the working directory
If your R project directory and working directory do not match:
- Session > Set Working Directory > To Project Directory
```{r, fig.alt="Screenshot of the session menu, with set working directory selected, and To Project Directory selected.", out.width = "60%", echo = FALSE, fig.align = "center"}
knitr::include_graphics("images/Data_Input_session_directory.png")
```
## TROUBLESHOOTING: Setting the working directory
If you are trying to knit your work, it might help to set the knit directory to the "Current Working Directory":
```{r, fig.alt="Screenshot of the Knit menu, with Knit directory open, and Current Working Directory selected.", out.width = "60%", echo = FALSE, fig.align = "center"}
knitr::include_graphics("images/Data_Input_knit_directory.png")
```
## TROUBLESHOOTING: Setting the working directory
You can also run the `getwd()` function to determine your working directory.
```{r eval=FALSE}
# Get the working directory
getwd()
```
You can also set the working directory manually with the `setwd()` function:
```{r eval=FALSE}
# set the working directory
setwd("/Users/avahoffman/Desktop")
```
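## TROUBLESHOOTING: Setting the working directory

`setwd()` returns the previous working directory (invisibly), which makes it easy to switch back afterwards. A small sketch, using R's temporary folder as a stand-in destination:

```{r, eval = FALSE}
# Remember where we started; setwd() hands back the old directory
old_wd <- setwd(tempdir())

getwd()        # now the temporary folder

setwd(old_wd)  # return to the original working directory
getwd()
```

Inside an R Project you rarely need `setwd()`, because opening the project already sets the working directory for you.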
## Other Useful Functions
- The `str()` function can tell you about data/objects (different variables and their classes - more on this later).
- We will also discuss the `glimpse()` function later, which does something very similar.
- `head()` shows first few rows
- `tail()` shows the last few rows
- the `here` package: `here()` shows the project root folder that it uses to build file paths
```{r eval=FALSE}
library(here)
here()
```
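## Other Useful Functions

A quick sketch of `str()`, `head()`, and `tail()` on a tiny made-up data frame (`toy` is a placeholder, not lesson data):

```{r, eval = FALSE}
# Hypothetical data frame with 10 rows
toy <- data.frame(id = 1:10, score = seq(0.5, 5, by = 0.5))

str(toy)          # variables and their classes
head(toy, n = 3)  # first 3 rows
tail(toy, n = 3)  # last 3 rows
```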
## Summary - Part 2
`read_csv()` function from `readr` package:
- comma delimited data
- needs a file path to be provided
- returns a tibble (data frame)
R Projects are a good way to keep your files organized and reduce headaches
- Use `getwd()` to check your working directory, where R looks for your data files
## Summary - Part 2
Look at your data!
- Check the environment for a data object
- `View()` gives you a preview of the data in a new tab
Other file types
- `readr` package: `read_delim()` for general delimited files
- `readxl` package: `read_excel()` for Excel files
Don't forget to use `<-` to assign your data to an object!
## Lab Part 2
🏠 [Class Website](https://jhudatascience.org/intro_to_r/)
💻 [Data Input Lab](https://jhudatascience.org/intro_to_r/modules/Data_Input/lab/Data_Input_Lab.Rmd)
```{r, fig.alt="The End", out.width = "50%", echo = FALSE, fig.align='center'}
knitr::include_graphics(here::here("images/the-end-g23b994289_1280.jpg"))
```
Image by Gerd Altmann from Pixabay