- Load all the libraries we will use in this lab.
library(readr)
library(dplyr)
library(ggplot2)
- Create a function that takes one argument, a vector, and returns the square of the sum of the vector's elements. Call it “sum_squared”. Test your function on the vector c(2,7,21,30,90); you should get the answer 22500.
x <- c(2,7,21,30,90)
sum_squared <- function(x) sum(x) ^ 2
sum_squared(x)
## [1] 22500
sum_squared <- function(x) {
  out <- sum(x) ^ 2
  return(out)
}
sum_squared(x)
## [1] 22500
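The phrase "sum of the vector squared" can be read two ways, so here is a small sketch contrasting them. The names square_of_sum and sum_of_squares are illustrative and not part of the lab; only the first reading matches the expected answer of 22500.

```r
x <- c(2, 7, 21, 30, 90)

# Reading 1: add the elements first, then square the total (150^2).
square_of_sum <- sum(x)^2

# Reading 2: square each element first, then add (4 + 49 + 441 + 900 + 8100).
sum_of_squares <- sum(x^2)

square_of_sum   # 22500
sum_of_squares  # 9494
```

Because ^ binds more tightly than a function call wraps its argument, the position of the parenthesis decides which reading you get.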
- Create a function that takes two arguments, (1) a vector and (2) a numeric value. This function tests whether the number (2) is contained within the vector (1). Call it “has_n”. Test your function on the vector c(2,7,21,30,90) and number 21; you should get the answer TRUE.
x <- c(2,7,21,30,90)
n <- 21
has_n <- function(x, n) n %in% x
has_n(x, n)
## [1] TRUE
- Amend the function “has_n” from #2 so that it takes a default value of 21 for the numeric argument.
x <- c(2,7,21,30,90)
n <- 21
has_n <- function(x, n = 21) n %in% x
has_n(x)
## [1] TRUE
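A default value only applies when the argument is left out. The sketch below, using the same has_n definition, shows that a caller can still override it; the values 5 and 90 are arbitrary test inputs chosen for illustration.

```r
has_n <- function(x, n = 21) n %in% x

x <- c(2, 7, 21, 30, 90)

uses_default <- has_n(x)         # n falls back to 21
overridden   <- has_n(x, n = 5)  # 5 is not in x
positional   <- has_n(x, 90)     # second argument matched by position

uses_default  # TRUE
overridden    # FALSE
positional    # TRUE
```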
- Read in the SARS-CoV-2 Vaccination data from http://jhudatascience.org/intro_to_R_class/data/USA_covid19_vaccinations.csv. Assign the data the name “vacc”.
vacc <- read_csv("http://jhudatascience.org/intro_to_R_class/data/USA_covid19_vaccinations.csv")
# If downloaded
# vacc <- read_csv("USA_covid19_vaccinations.csv")
- We want to get some summary statistics on the Moderna vaccines. Use across inside summarize to get the total number of vaccine doses for every variable that contains the word “Moderna” and also starts with “Total”. Hint: combine matches() and starts_with() with & to select the right columns inside across. Keep in mind that the data include a row for the United States as a whole, so these sums are not totally accurate!
# Without na.rm, any column that contains a missing value sums to NA
vacc %>%
  summarize(across(.cols = matches("Moderna") & starts_with("Total"),
                   .fns = sum))
## # A tibble: 1 x 4
## `Total Number of M… `Total Number of … `Total Count Peopl… `Total Count Peopl…
## <dbl> <dbl> <dbl> <dbl>
## 1 482227080 403816391 NA NA
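# Drop missing values before summing. Passing na.rm through across() directly
# is deprecated as of dplyr 1.1.0, so wrap sum() in a function instead.
vacc %>%
  summarize(across(.cols = matches("Moderna") & starts_with("Total"),
                   .fns = function(x) sum(x, na.rm = TRUE)))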
## # A tibble: 1 x 4
## `Total Number of M… `Total Number of … `Total Count Peopl… `Total Count Peopl…
## <dbl> <dbl> <dbl> <dbl>
## 1 482227080 403816391 29736587 31851324
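To see why the selection uses & rather than |, here is a minimal sketch on a made-up three-column tibble. The data and column names are hypothetical and much simpler than the real vacc columns.

```r
library(dplyr)

# Hypothetical toy data; the column names are illustrative only.
toy <- tibble(
  `Total Moderna doses` = c(1, 2, NA),
  `Total Pfizer doses` = c(3, 4, 5),
  `People with Moderna booster` = c(6, 7, 8)
)

# `&` keeps columns that satisfy both helpers (intersection).
and_cols <- toy %>%
  select(matches("Moderna") & starts_with("Total")) %>%
  names()

# `|` keeps columns that satisfy either helper (union).
or_cols <- toy %>%
  select(matches("Moderna") | starts_with("Total")) %>%
  names()

and_cols         # "Total Moderna doses"
length(or_cols)  # 3

# The selected column holds an NA, so remove it before summing.
moderna_totals <- toy %>%
  summarize(across(matches("Moderna") & starts_with("Total"),
                   function(x) sum(x, na.rm = TRUE)))
```

With |, the Pfizer totals and the booster counts would also be summed, which is not what the exercise asks for.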
- Use across and mutate to convert all columns containing the word “Percent” into proportions (i.e., divide that value by 100). Hint: use matches() to select the right columns. Use a “function on the fly” to divide by 100. It will also be easier to check your work if you select() columns that match “Percent”.
vacc %>%
  mutate(across(.cols = matches("Percent"),
                .fns = function(x) x / 100)) %>%
  select(matches("Percent"))
## # A tibble: 64 x 34
## `Percent of Total … `Percent of 18+ Po… `Percent of Total… `Percent of 18+ P…
## <dbl> <dbl> <dbl> <dbl>
## 1 0.746 0.866 0.627 0.734
## 2 0.657 0.776 0.568 0.68
## 3 0.593 0.711 0.482 0.583
## 4 0.636 0.751 0.518 0.618
## 5 0.876 0.95 0.76 0.857
## 6 0.684 0.792 0.577 0.679
## 7 NA NA NA NA
## 8 0.843 0.95 0.67 0.776
## 9 0.754 0.869 0.668 0.776
## 10 0.905 0.95 0.755 0.852
## # … with 54 more rows, and 30 more variables:
## # Percent of Total Pop with 1+ Doses by State of Residence <dbl>,
## # Percent of 18+ Pop with 1+ Doses by State of Residence <dbl>,
## # Percent of Total Pop with 2 Doses by State of Residence <dbl>,
## # Percent of 18+ Pop with 2 Doses by State of Residence <dbl>,
## # Percent of 65+ Pop with at least One Dose by State of Residence <dbl>,
## # Percent of 65+ Pop Fully Vaccinated by State of Residence <dbl>,
## # Percent of 12+ Pop with at least One Dose by State of Residence <dbl>,
## # Percent of 12+ Pop Fully Vaccinated by State of Residence <dbl>,
## # Percent of 5+ Pop with at least One Dose by State of Residence <dbl>,
## # Percent of 5+ Pop Fully Vaccinated by State of Residence <dbl>,
## # Percent of fully vaccinated people with booster doses <dbl>,
## # Percent of fully vaccinated people 18+ with booster doses <dbl>,
## # Percent of fully vaccinated people 50+ with booster doses <dbl>,
## # Percent of fully vaccinated people 65+ with booster doses <dbl>,
## # Percent People Primary Pfizer Booster Pfizer <dbl>,
## # Percent People Primary Pfizer Booster Moderna <dbl>,
## # Percent People Primary Pfizer Booster J&J <dbl>,
## # Percent People Primary Pfizer Booster Other <dbl>,
## # Percent People Primary Moderna Booster Pfizer <dbl>,
## # Percent People Primary Moderna Booster Moderna <dbl>,
## # Percent People Primary Moderna Booster J&J <dbl>,
## # Percent People Primary Moderna Booster Other <dbl>,
## # Percent People Primary J&J Booster Pfizer <dbl>,
## # Percent People Primary J&J Booster Moderna <dbl>,
## # Percent People Primary J&J Booster J&J <dbl>,
## # Percent People Primary J&J Booster Other <dbl>,
## # Percent People Primary Other Booster Pfizer <dbl>,
## # Percent People Primary Other Booster Moderna <dbl>,
## # Percent People Primary Other Booster J&J <dbl>,
## # Percent People Primary Other Booster Uknown <dbl>
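A “function on the fly” can be written in more than one way. The sketch below compares three spellings on a hypothetical toy tibble (the column names are invented for illustration); the \(x) shorthand assumes R 4.1 or later.

```r
library(dplyr)

# Hypothetical toy data with two percent columns.
toy <- tibble(
  state = c("A", "B"),
  `Percent one dose` = c(74.6, 65.7),
  `Percent two doses` = c(62.7, 56.8)
)

# Full anonymous function, as used in the lab.
full_form <- toy %>%
  mutate(across(matches("Percent"), function(x) x / 100))

# Backslash shorthand for the same anonymous function (R >= 4.1).
short_form <- toy %>%
  mutate(across(matches("Percent"), \(x) x / 100))

# Formula-style lambda, where .x stands for the column.
formula_form <- toy %>%
  mutate(across(matches("Percent"), ~ .x / 100))

# All three should produce the same proportions.
same_short <- isTRUE(all.equal(full_form, short_form))
same_formula <- isTRUE(all.equal(full_form, formula_form))
```

Which form to use is mostly a matter of taste; the explicit function(x) version is the easiest to read when you are new to across.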
- Use across and mutate to convert all columns starting with the word “Total” into a binary variable: TRUE if the value is greater than 10,000,000 and FALSE if less than or equal to 10,000,000. Hint: use starts_with() to select the columns starting with “Total”. Use a “function on the fly” to do a logical test if the value is greater than 10,000,000.
vacc %>%
  mutate(across(.cols = starts_with("Total"),
                .fns = function(x) x > 10000000))
## # A tibble: 64 x 125
## `State/Territory/Fe… `Total Doses Deli… `Doses Delivered … `18+ Doses Delive…
## <chr> <lgl> <dbl> <dbl>
## 1 United States TRUE 194167 249614
## 2 Alaska FALSE 185553 246102
## 3 Alabama FALSE 175845 226010
## 4 Arkansas FALSE 177298 230859
## 5 American Samoa FALSE 179165 270975
## 6 Arizona TRUE 180044 232419
## 7 Bureau of Prisons FALSE NA NA
## 8 California TRUE 201694 260288
## 9 Colorado TRUE 194586 249059
## 10 Connecticut FALSE 218700 274760
## # … with 54 more rows, and 121 more variables:
## # Total Doses Administered by State where Administered <lgl>,
## # Doses Administered per 100k by State where Administered <dbl>,
## # 18+ Doses Administered by State where Administered <dbl>,
## # 18+ Doses Administered per 100K by State where Administered <dbl>,
## # People with at least One Dose by State of Residence <dbl>,
## # Percent of Total Pop with at least One Dose by State of Residence <dbl>,
## # People 18+ with at least One Dose by State of Residence <dbl>,
## # Percent of 18+ Pop with at least One Dose by State of Residence <dbl>,
## # People Fully Vaccinated by State of Residence <dbl>,
## # Percent of Total Pop Fully Vaccinated by State of Residence <dbl>,
## # People 18+ Fully Vaccinated by State of Residence <dbl>,
## # Percent of 18+ Pop Fully Vaccinated by State of Residence <dbl>,
## # Total Number of Pfizer doses delivered <lgl>,
## # Total Number of Moderna doses delivered <lgl>,
## # Total Number of Janssen doses delivered <lgl>,
## # Total Number of doses from Other manufacturer delivered <lgl>,
## # Total Number of Janssen doses administered <lgl>,
## # Total Number of Moderna doses administered <lgl>,
## # Total Number of Pfizer doses adminstered <lgl>,
## # Total Number of doses from Other manufacturer administered <lgl>,
## # People Fully Vaccinated Moderna Resident <dbl>,
## # People Fully Vaccinated Pfizer Resident <dbl>,
## # People Fully Vaccinated Janssen Resident <dbl>,
## # People Fully Vaccinated Other 2-dose manufacturer Resident <dbl>,
## # People 18+ Fully Vaccinated Moderna Resident <dbl>,
## # People 18+ Fully Vaccinated Pfizer Resident <dbl>,
## # People 18+ Fully Vaccinated Janssen Resident <dbl>,
## # People 18+ Fully Vaccinated Other 2-dose manufacturer Resident <dbl>,
## # People with 2 Doses by State of Residence <lgl>,
## # Percent of Total Pop with 1+ Doses by State of Residence <dbl>,
## # People 18+ with 1+ Doses by State of Residence <lgl>,
## # Percent of 18+ Pop with 1+ Doses by State of Residence <dbl>,
## # Percent of Total Pop with 2 Doses by State of Residence <dbl>,
## # People 18+ with 2 Doses by State of Residence <dbl>,
## # Percent of 18+ Pop with 2 Doses by State of Residence <dbl>,
## # People with 1+ Doses by State of Residence <dbl>,
## # People 65+ with at least One Dose by State of Residence <dbl>,
## # Percent of 65+ Pop with at least One Dose by State of Residence <dbl>,
## # People 65+ Fully Vaccinated by State of Residence <dbl>,
## # Percent of 65+ Pop Fully Vaccinated by State of Residence <dbl>,
## # People 65+ Fully Vaccinated_Moderna_Resident <dbl>,
## # People 65+ Fully Vaccinated_Pfizer_Resident <dbl>,
## # People 65+ Fully Vaccinated_Janssen_Resident <dbl>,
## # People 65+ Fully Vaccinated_Other 2-dose Manuf_Resident <dbl>,
## # 65+ Doses Administered by State where Administered <dbl>,
## # Doses Administered per 100k of 65+ pop by State where Administered <dbl>,
## # Doses Delivered per 100k of 65+ pop <dbl>,
## # People 12+ with at least One Dose by State of Residence <dbl>,
## # Percent of 12+ Pop with at least One Dose by State of Residence <dbl>,
## # People 12+ Fully Vaccinated by State of Residence <dbl>,
## # Percent of 12+ Pop Fully Vaccinated by State of Residence <dbl>,
## # People 12+ Fully Vaccinated_Moderna_Resident <dbl>,
## # People 12+ Fully Vaccinated_Pfizer_Resident <dbl>,
## # People 12+ Fully Vaccinated_Janssen_Resident <dbl>,
## # People 12+ Fully Vaccinated_Other 2-dose Manuf_Resident <dbl>,
## # 12+ Doses Administered by State where Administered <dbl>,
## # Doses Administered per 100k of 12+ pop by State where Administered <dbl>,
## # Doses Delivered per 100k of 12+ pop <dbl>,
## # People 5+ with at least One Dose by State of Residence <dbl>,
## # Percent of 5+ Pop with at least One Dose by State of Residence <dbl>,
## # People 5+ Fully Vaccinated by State of Residence <dbl>,
## # Percent of 5+ Pop Fully Vaccinated by State of Residence <dbl>,
## # People 5+ Fully Vaccinated_Moderna_Resident <dbl>,
## # People 5+ Fully Vaccinated_Pfizer_Resident <dbl>,
## # People 5+ Fully Vaccinated_Janssen_Resident <dbl>,
## # People 5+ Fully Vaccinated_Other 2-dose Manuf_Resident <dbl>,
## # 5+ Doses Administered by State where Administered <dbl>,
## # Doses Administered per 100k of 5+ pop by State where Administered <dbl>,
## # Doses Delivered per 100k of 5+ pop <dbl>,
## # People who have received a booster dose <dbl>,
## # Percent of fully vaccinated people with booster doses <dbl>,
## # People 18+ who have received a booster dose <dbl>,
## # Percent of fully vaccinated people 18+ with booster doses <dbl>,
## # People 50+ who have received a booster dose <dbl>,
## # Percent of fully vaccinated people 50+ with booster doses <dbl>,
## # People 65+ who have received a booster dose <dbl>,
## # Percent of fully vaccinated people 65+ with booster doses <dbl>,
## # People with Moderna booster dose <dbl>,
## # People with Pfizer booster dose <dbl>,
## # People with Janssen booster dose <dbl>,
## # People with booster dose of an Other manufacturer <dbl>,
## # Total Count People w/Booster Primary Pfizer Minus TX <lgl>,
## # Total Count People w/Booster Primary Moderna Minus TX <lgl>,
## # Total Count People w/Booster Primary J&J Minus TX <lgl>,
## # Total Count People w/Booster Primary Other Minus TX <lgl>,
## # Total Count People w/Booster Booster Pfizer Minus TX <lgl>,
## # Total Count People w/Booster Booster Moderna Minus TX <lgl>,
## # Total Count People w/Booster Booster J&J Minus TX <lgl>,
## # Total Count People w/Booster Booster Other Minus TX <lgl>,
## # Count People Primary Pfizer Booster Pfizer <dbl>,
## # Count People Primary Pfizer Booster Moderna <dbl>,
## # Count People Primary Pfizer Booster J&J <dbl>,
## # Count People Primary Pfizer Booster Uknown <dbl>,
## # Count People Primary Moderna Booster Pfizer <dbl>,
## # Count People Primary Moderna Booster Moderna <dbl>,
## # Count People Primary Moderna Booster J&J <dbl>,
## # Count People Primary Moderna Booster Uknown <dbl>,
## # Count People Primary J&J Booster Pfizer <dbl>,
## # Count People Primary J&J Booster Moderna <dbl>,
## # Count People Primary J&J Booster J&J <dbl>, …
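One detail worth keeping in mind: a logical test on a missing value is itself missing, so the converted columns can hold NA as well as TRUE and FALSE. A minimal sketch with a made-up vector (doses is a hypothetical name, not a column in vacc):

```r
# Hypothetical dose counts, including one missing value.
doses <- c(5e6, 2e7, NA)

# The comparison is applied element by element; NA stays NA.
over_10m <- doses > 10000000

over_10m         # FALSE  TRUE    NA
sum(is.na(over_10m))  # 1
```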
BONUS
- Take your code from #7 and assign it to the variable “vacc_dat”.
- Filter out the “United States” from State/Territory/Federal Entity. Make sure to reassign this to “vacc_dat”.
- Create a ggplot boxplot (geom_boxplot()) where (1) the x-axis is Total Doses Delivered and (2) the y-axis is Percent of fully vaccinated people with booster doses.
- Change the labs() layer so that the x-axis label is “Total Doses Delivered: Greater than 10,000,000”.
vacc_dat <-
  vacc %>%
  mutate(across(.cols = starts_with("Total"),
                .fns = function(x) x > 10000000)) %>%
  filter(`State/Territory/Federal Entity` != "United States")

ggplot(vacc_dat) +
  geom_boxplot(aes(x = `Total Doses Delivered`,
                   y = `Percent of fully vaccinated people with booster doses`)) +
  labs(x = "Total Doses Delivered: Greater than 10,000,000")
