Load all the libraries we will use in this lab.
library(readr)
library(dplyr)
library(ggplot2)
  1. Create a function that takes one argument, a vector, and returns the square of the sum of its elements (sum first, then square). Call it “sum_squared”. Test your function on the vector c(2,7,21,30,90); you should get the answer 22500.
x <- c(2,7,21,30,90)

sum_squared <- function(x) sum(x) ^ 2
sum_squared(x)
## [1] 22500
# Equivalent version, written with braces and an explicit return()
sum_squared <- function(x) {
  out <- sum(x) ^ 2
  return(out)
}
sum_squared(x)
## [1] 22500
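The name “sum_squared” can be read two ways, so it may help to check both readings side by side. The sketch below compares the square of the sum, which is what the exercise asks for, with the sum of the squares. `sum_of_squares` is a hypothetical helper added only for contrast and is not part of the lab.

```r
x <- c(2, 7, 21, 30, 90)

# Square of the sum: add the elements first, then square the total
sum_squared <- function(x) sum(x)^2

# Sum of the squares (hypothetical helper, not part of the lab):
# square each element first, then add the squares
sum_of_squares <- function(x) sum(x^2)

sum_squared(x)     # 150^2 = 22500
sum_of_squares(x)  # 4 + 49 + 441 + 900 + 8100 = 9494
```

Only the first function reproduces the expected answer of 22500.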
  2. Create a function that takes two arguments, (1) a vector and (2) a numeric value. This function tests whether the number (2) is contained within the vector (1). Call it “has_n”. Test your function on the vector c(2,7,21,30,90) and number 21; you should get the answer TRUE.
x <- c(2,7,21,30,90)
n <- 21

has_n <- function(x, n) n %in% x
has_n(x, n)
## [1] TRUE
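One reason to prefer `%in%` here is how it treats missing values. The sketch below contrasts it with a comparison-based version. `has_n_eq` and the vector `x_na` are hypothetical names used only for this illustration.

```r
# A vector with a missing value (made up for this example)
x_na <- c(2, 7, NA, 30, 90)

has_n <- function(x, n) n %in% x

# Hypothetical alternative built on ==, shown only for contrast
has_n_eq <- function(x, n) any(x == n)

has_n(x_na, 21)     # FALSE: %in% never returns NA
has_n_eq(x_na, 21)  # NA: the comparison with NA is unknown
has_n(x_na, 30)     # TRUE
has_n_eq(x_na, 30)  # TRUE: one definite match is enough for any()
```

With `%in%` the function always returns TRUE or FALSE, which is usually what a membership test should do.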
  3. Amend the function “has_n” from #2 so that it takes a default value of 21 for the numeric argument.
x <- c(2,7,21,30,90)
# No n is passed in the call below, so has_n() falls back to its default of 21

has_n <- function(x, n = 21) n %in% x
has_n(x)
## [1] TRUE
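To see that the default really comes from the function definition and not from a variable named `n` in the workspace, the sketch below deliberately sets a different global `n`. The value 5 is an arbitrary choice for illustration.

```r
x <- c(2, 7, 21, 30, 90)
n <- 5  # a global n, deliberately different from the default

has_n <- function(x, n = 21) n %in% x

has_n(x)         # TRUE: uses the default n = 21, not the global n
has_n(x, n = 5)  # FALSE: the default is overridden by name
has_n(x, 90)     # TRUE: the default is overridden by position
```

A default only applies when the caller leaves the argument out; any value supplied in the call takes precedence.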
  4. Read in the SARS-CoV-2 Vaccination data from http://jhudatascience.org/intro_to_R_class/data/USA_covid19_vaccinations.csv. Assign the data the name “vacc”.
vacc <- read_csv("http://jhudatascience.org/intro_to_R_class/data/USA_covid19_vaccinations.csv")
# If downloaded
# vacc <- read_csv("USA_covid19_vaccinations.csv")
  5. We want to get some summary statistics on the Moderna vaccines. Use across() inside summarize() to get the total number of vaccine doses for every variable that contains the word “Moderna” and starts with “Total”. Hint: combine matches() and starts_with() with & to select the right columns inside across(). Keep in mind that the data include a row for the United States as a whole, so these sums overstate the true totals!
vacc %>% 
  summarize(across(.cols = matches("Moderna") & starts_with("Total"), 
                   .fns = sum))
## # A tibble: 1 x 4
##   `Total Number of M… `Total Number of … `Total Count Peopl… `Total Count Peopl…
##                 <dbl>              <dbl>               <dbl>               <dbl>
## 1           482227080          403816391                  NA                  NA
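# The last two columns contain missing values, so their sums above are NA.
# Drop the missing values inside the summary function instead:
vacc %>% 
  summarize(across(.cols = matches("Moderna") & starts_with("Total"), 
                   .fns = function(x) sum(x, na.rm = TRUE)))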
## # A tibble: 1 x 4
##   `Total Number of M… `Total Number of … `Total Count Peopl… `Total Count Peopl…
##                 <dbl>              <dbl>               <dbl>               <dbl>
## 1           482227080          403816391            29736587            31851324
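The difference between combining selection helpers with `&` and with `|` is easier to see on a small table. The sketch below uses a toy tibble whose column names and values are invented for illustration; it is not the vaccination data.

```r
library(dplyr)

# Toy data with made-up column names and values
toy <- tibble(
  `Total Moderna doses` = c(10, 20, NA),
  `Total Pfizer doses`  = c(1, 2, 3),
  `People with Moderna` = c(5, 5, 5)
)

# & keeps only the columns that satisfy both helpers
both <- toy %>% select(matches("Moderna") & starts_with("Total"))

# | keeps the columns that satisfy at least one helper
either <- toy %>% select(matches("Moderna") | starts_with("Total"))

names(both)   # just "Total Moderna doses"
ncol(either)  # 3: every toy column matches at least one helper

# Sum the column picked by &, ignoring the missing value
toy_sum <- toy %>%
  summarize(across(.cols = matches("Moderna") & starts_with("Total"),
                   .fns = function(x) sum(x, na.rm = TRUE)))
toy_sum
```

In the lab, `&` is what narrows the selection to the four Moderna totals shown in the output above.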
  6. Use across() and mutate() to convert all columns containing the word “Percent” into proportions (i.e., divide that value by 100). Hint: use matches() to select the right columns. Use a “function on the fly” to divide by 100. It will also be easier to check your work if you select() columns that match “Percent”.
vacc %>% 
  mutate(across(.cols = matches("Percent"), 
                .fns = function(x) x / 100)) %>% 
  select(matches("Percent"))
## # A tibble: 64 x 34
##    `Percent of Total … `Percent of 18+ Po… `Percent of Total… `Percent of 18+ P…
##                  <dbl>               <dbl>              <dbl>              <dbl>
##  1               0.746               0.866              0.627              0.734
##  2               0.657               0.776              0.568              0.68 
##  3               0.593               0.711              0.482              0.583
##  4               0.636               0.751              0.518              0.618
##  5               0.876               0.95               0.76               0.857
##  6               0.684               0.792              0.577              0.679
##  7              NA                  NA                 NA                 NA    
##  8               0.843               0.95               0.67               0.776
##  9               0.754               0.869              0.668              0.776
## 10               0.905               0.95               0.755              0.852
## # … with 54 more rows, and 30 more variables:
## #   Percent of Total Pop with 1+ Doses by State of Residence <dbl>,
## #   Percent of 18+ Pop with 1+ Doses by State of Residence <dbl>,
## #   Percent of Total Pop with 2 Doses by State of Residence <dbl>,
## #   Percent of 18+ Pop with 2 Doses by State of Residence <dbl>,
## #   Percent of 65+ Pop with at least One Dose by State of Residence <dbl>,
## #   Percent of 65+ Pop Fully Vaccinated by State of Residence <dbl>,
## #   Percent of 12+ Pop with at least One Dose by State of Residence <dbl>,
## #   Percent of 12+ Pop Fully Vaccinated by State of Residence <dbl>,
## #   Percent of 5+ Pop with at least One Dose by State of Residence <dbl>,
## #   Percent of 5+ Pop Fully Vaccinated by State of Residence <dbl>,
## #   Percent of fully vaccinated people with booster doses <dbl>,
## #   Percent of fully vaccinated people 18+ with booster doses <dbl>,
## #   Percent of fully vaccinated people 50+ with booster doses <dbl>,
## #   Percent of fully vaccinated people 65+ with booster doses <dbl>,
## #   Percent People Primary Pfizer Booster Pfizer <dbl>,
## #   Percent People Primary Pfizer Booster Moderna <dbl>,
## #   Percent People Primary Pfizer Booster J&J <dbl>,
## #   Percent People Primary Pfizer Booster Other <dbl>,
## #   Percent People Primary Moderna Booster Pfizer <dbl>,
## #   Percent People Primary Moderna Booster Moderna <dbl>,
## #   Percent People Primary Moderna Booster J&J <dbl>,
## #   Percent People Primary Moderna Booster Other <dbl>,
## #   Percent People Primary J&J Booster Pfizer <dbl>,
## #   Percent People Primary J&J Booster Moderna <dbl>,
## #   Percent People Primary J&J Booster J&J <dbl>,
## #   Percent People Primary J&J Booster Other <dbl>,
## #   Percent People Primary Other Booster Pfizer <dbl>,
## #   Percent People Primary Other Booster Moderna <dbl>,
## #   Percent People Primary Other Booster J&J <dbl>,
## #   Percent People Primary Other Booster Uknown <dbl>
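The solution above overwrites the percent columns in place. If you would rather keep the originals for comparison, the `.names` argument of across() can write the results to new columns instead. The sketch below uses a toy tibble with invented names and values, and the `_prop` suffix is an arbitrary choice.

```r
library(dplyr)

# Toy data with made-up column names and values
toy <- tibble(
  State = c("A", "B"),
  `Percent vaccinated` = c(74.6, 65.7),
  `Percent boosted`    = c(40, 35)
)

# {.col} is replaced by each selected column's name,
# so the proportions land in new columns ending in "_prop"
toy_prop <- toy %>%
  mutate(across(.cols = matches("Percent"),
                .fns = function(x) x / 100,
                .names = "{.col}_prop"))

names(toy_prop)
toy_prop
```

The original percent columns stay untouched, which makes it easy to confirm that each proportion is its percent divided by 100.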
  7. Use across() and mutate() to convert all columns starting with the word “Total” into a binary variable: TRUE if the value is greater than 10,000,000 and FALSE if less than or equal to 10,000,000. Hint: use starts_with() to select the columns starting with “Total”. Use a “function on the fly” to do a logical test if the value is greater than 10,000,000.
vacc %>% 
  mutate(across(.cols = starts_with("Total"), 
                .fns = function(x) x > 10000000)) 
## # A tibble: 64 x 125
##    `State/Territory/Fe… `Total Doses Deli… `Doses Delivered … `18+ Doses Delive…
##    <chr>                <lgl>                           <dbl>              <dbl>
##  1 United States        TRUE                           194167             249614
##  2 Alaska               FALSE                          185553             246102
##  3 Alabama              FALSE                          175845             226010
##  4 Arkansas             FALSE                          177298             230859
##  5 American Samoa       FALSE                          179165             270975
##  6 Arizona              TRUE                           180044             232419
##  7 Bureau of Prisons    FALSE                              NA                 NA
##  8 California           TRUE                           201694             260288
##  9 Colorado             TRUE                           194586             249059
## 10 Connecticut          FALSE                          218700             274760
## # … with 54 more rows, and 121 more variables:
## #   Total Doses Administered by State where Administered <lgl>,
## #   Doses Administered per 100k by State where Administered <dbl>,
## #   18+ Doses Administered by State where Administered <dbl>,
## #   18+ Doses Administered per 100K by State where Administered <dbl>,
## #   People with at least One Dose by State of Residence <dbl>,
## #   Percent of Total Pop with at least One Dose by State of Residence <dbl>,
## #   People 18+ with at least One Dose by State of Residence <dbl>,
## #   Percent of 18+ Pop with at least One Dose by State of Residence <dbl>,
## #   People Fully Vaccinated by State of Residence <dbl>,
## #   Percent of Total Pop Fully Vaccinated by State of Residence <dbl>,
## #   People 18+ Fully Vaccinated by State of Residence <dbl>,
## #   Percent of 18+ Pop Fully Vaccinated by State of Residence <dbl>,
## #   Total Number of Pfizer doses delivered <lgl>,
## #   Total Number of Moderna doses delivered <lgl>,
## #   Total Number of Janssen doses delivered <lgl>,
## #   Total Number of doses from Other manufacturer delivered <lgl>,
## #   Total Number of Janssen doses administered <lgl>,
## #   Total Number of Moderna doses administered <lgl>,
## #   Total Number of Pfizer doses adminstered <lgl>,
## #   Total Number of doses from Other manufacturer administered <lgl>,
## #   People Fully Vaccinated Moderna Resident <dbl>,
## #   People Fully Vaccinated Pfizer Resident <dbl>,
## #   People Fully Vaccinated Janssen Resident <dbl>,
## #   People Fully Vaccinated Other 2-dose manufacturer Resident <dbl>,
## #   People 18+ Fully Vaccinated Moderna Resident <dbl>,
## #   People 18+ Fully Vaccinated Pfizer Resident <dbl>,
## #   People 18+ Fully Vaccinated Janssen Resident <dbl>,
## #   People 18+ Fully Vaccinated Other 2-dose manufacturer Resident <dbl>,
## #   People with 2 Doses by State of Residence <lgl>,
## #   Percent of Total Pop with 1+ Doses by State of Residence <dbl>,
## #   People 18+ with 1+ Doses by State of Residence <lgl>,
## #   Percent of 18+ Pop with 1+ Doses by State of Residence <dbl>,
## #   Percent of Total Pop with 2 Doses by State of Residence <dbl>,
## #   People 18+ with 2 Doses by State of Residence <dbl>,
## #   Percent of 18+ Pop with 2 Doses by State of Residence <dbl>,
## #   People with 1+ Doses by State of Residence <dbl>,
## #   People 65+ with at least One Dose by State of Residence <dbl>,
## #   Percent of 65+ Pop with at least One Dose by State of Residence <dbl>,
## #   People 65+ Fully Vaccinated by State of Residence <dbl>,
## #   Percent of 65+ Pop Fully Vaccinated by State of Residence <dbl>,
## #   People 65+ Fully Vaccinated_Moderna_Resident <dbl>,
## #   People 65+ Fully Vaccinated_Pfizer_Resident <dbl>,
## #   People 65+ Fully Vaccinated_Janssen_Resident <dbl>,
## #   People 65+ Fully Vaccinated_Other 2-dose Manuf_Resident <dbl>,
## #   65+ Doses Administered by State where Administered <dbl>,
## #   Doses Administered per 100k of 65+ pop by State where Administered <dbl>,
## #   Doses Delivered per 100k of 65+ pop <dbl>,
## #   People 12+ with at least One Dose by State of Residence <dbl>,
## #   Percent of 12+ Pop with at least One Dose by State of Residence <dbl>,
## #   People 12+ Fully Vaccinated by State of Residence <dbl>,
## #   Percent of 12+ Pop Fully Vaccinated by State of Residence <dbl>,
## #   People 12+ Fully Vaccinated_Moderna_Resident <dbl>,
## #   People 12+ Fully Vaccinated_Pfizer_Resident <dbl>,
## #   People 12+ Fully Vaccinated_Janssen_Resident <dbl>,
## #   People 12+ Fully Vaccinated_Other 2-dose Manuf_Resident <dbl>,
## #   12+ Doses Administered by State where Administered <dbl>,
## #   Doses Administered per 100k of 12+ pop by State where Administered <dbl>,
## #   Doses Delivered per 100k of 12+ pop <dbl>,
## #   People 5+ with at least One Dose by State of Residence <dbl>,
## #   Percent of 5+ Pop with at least One Dose by State of Residence <dbl>,
## #   People 5+ Fully Vaccinated by State of Residence <dbl>,
## #   Percent of 5+ Pop Fully Vaccinated by State of Residence <dbl>,
## #   People 5+ Fully Vaccinated_Moderna_Resident <dbl>,
## #   People 5+ Fully Vaccinated_Pfizer_Resident <dbl>,
## #   People 5+ Fully Vaccinated_Janssen_Resident <dbl>,
## #   People 5+ Fully Vaccinated_Other 2-dose Manuf_Resident <dbl>,
## #   5+ Doses Administered by State where Administered <dbl>,
## #   Doses Administered per 100k of 5+ pop  by State where Administered <dbl>,
## #   Doses Delivered per 100k of 5+ pop <dbl>,
## #   People who have received a booster dose <dbl>,
## #   Percent of fully vaccinated people with booster doses <dbl>,
## #   People 18+ who have received a booster dose <dbl>,
## #   Percent of fully vaccinated people 18+ with booster doses <dbl>,
## #   People 50+ who have received a booster dose <dbl>,
## #   Percent of fully vaccinated people 50+ with booster doses <dbl>,
## #   People 65+ who have received a booster dose <dbl>,
## #   Percent of fully vaccinated people 65+ with booster doses <dbl>,
## #   People with Moderna booster dose <dbl>,
## #   People with Pfizer booster dose <dbl>,
## #   People with Janssen booster dose <dbl>,
## #   People with booster dose of an Other manufacturer <dbl>,
## #   Total Count People w/Booster Primary Pfizer Minus TX <lgl>,
## #   Total Count People w/Booster Primary Moderna Minus TX <lgl>,
## #   Total Count People w/Booster Primary J&J Minus TX <lgl>,
## #   Total Count People w/Booster Primary Other Minus TX <lgl>,
## #   Total Count People w/Booster Booster Pfizer Minus TX <lgl>,
## #   Total Count People w/Booster Booster Moderna Minus TX <lgl>,
## #   Total Count People w/Booster Booster J&J Minus TX <lgl>,
## #   Total Count People w/Booster Booster Other Minus TX <lgl>,
## #   Count People Primary Pfizer Booster Pfizer <dbl>,
## #   Count People Primary Pfizer Booster Moderna <dbl>,
## #   Count People Primary Pfizer Booster J&J <dbl>,
## #   Count People Primary Pfizer Booster Uknown <dbl>,
## #   Count People Primary Moderna Booster Pfizer <dbl>,
## #   Count People Primary Moderna Booster Moderna <dbl>,
## #   Count People Primary Moderna Booster J&J <dbl>,
## #   Count People Primary Moderna Booster Uknown <dbl>,
## #   Count People Primary J&J Booster Pfizer <dbl>,
## #   Count People Primary J&J Booster Moderna <dbl>,
## #   Count People Primary J&J Booster J&J <dbl>, …
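Two details of the logical test are worth checking on a small example: a value of exactly 10,000,000 gives FALSE, and a missing value stays NA rather than becoming FALSE. The sketch below uses a toy tibble with invented names and values, and then counts the TRUE values per column as one possible follow-up.

```r
library(dplyr)

# Toy data with made-up column names and values
toy <- tibble(
  Entity = c("A", "B", "C", "D"),
  `Total doses delivered`    = c(25000000, 9000000, NA, 10000000),
  `Total doses administered` = c(12000000, 8000000, 500, 11000000)
)

toy_lgl <- toy %>%
  mutate(across(.cols = starts_with("Total"),
                .fns = function(x) x > 10000000))

toy_lgl$`Total doses delivered`  # TRUE FALSE NA FALSE

# Count how many entities exceed the threshold in each column;
# sum() treats TRUE as 1, and na.rm = TRUE skips the missing value
toy_counts <- toy_lgl %>%
  summarize(across(.cols = starts_with("Total"),
                   .fns = function(x) sum(x, na.rm = TRUE)))
toy_counts
```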

BONUS

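  8. Take your code from #7 and assign it to the variable “vacc_dat”. As the code below does, drop the “United States” row with filter(), then draw a boxplot with the logical `Total Doses Delivered` column on the x-axis and `Percent of fully vaccinated people with booster doses` on the y-axis, relabeling the x-axis with labs().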
vacc_dat <- 
  vacc %>% 
  mutate(across(.cols = starts_with("Total"), 
                .fns = function(x) x > 10000000)) %>% 
  filter(`State/Territory/Federal Entity` != "United States") 

ggplot(vacc_dat) +
  geom_boxplot(aes(x = `Total Doses Delivered`, 
                   y = `Percent of fully vaccinated people with booster doses`)) +
  labs(x = "Total Doses Delivered: Greater than 10,000,000")
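When a logical column is mapped to the x-axis, ggplot2 treats it as a discrete variable, and rows where it is NA are typically drawn as a separate box labelled NA. If that extra box is unwanted, one option is to filter those rows out before plotting. The sketch below shows the idea on a toy tibble; the column names `big` and `pct` and all values are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Toy data with made-up names and values; one row has a missing flag
toy <- tibble(
  big = c(TRUE, TRUE, FALSE, FALSE, NA),
  pct = c(40, 45, 30, 35, 50)
)

# Keep only the rows where the logical flag is known
toy_complete <- toy %>%
  filter(!is.na(big))

# One box per value of the flag (FALSE and TRUE)
p <- ggplot(toy_complete) +
  geom_boxplot(aes(x = big, y = pct)) +
  labs(x = "Flag is TRUE (hypothetical)", y = "Percent (hypothetical)")
p
```

Whether dropping those rows is appropriate depends on the question being asked, so treat this as one possible choice and not as part of the lab's required answer.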