We will use data on student dropouts in the State of California during the 2016-2017 school year. More information about this data is available here: https://www.cde.ca.gov/ds/ad/filesdropouts.asp
To preserve school anonymity, “CDS_CODE” is used in place of the individual school’s name.
You can download the data from the JHU website here: http://jhudatascience.org/intro_to_r/data/dropouts.txt
library(readr) # provides read_delim(); also loaded by library(tidyverse)
dropouts <- read_delim("http://jhudatascience.org/intro_to_r/data/dropouts.txt", delim = "\t")
## Rows: 59599 Columns: 20
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (2): CDS_CODE, GENDER
## dbl (18): ETHNIC, E7, E8, E9, E10, E11, E12, EUS, ETOT, D7, D8, D9, D10, D11...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
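The message above suggests either specifying the column types or setting `show_col_types = FALSE`. The following is a minimal sketch of the first option, run on a small inline sample rather than the real file. The sample rows, the `sample_txt` and `sample_dropouts` names, and the CDS_CODE values are hypothetical and only mimic the tab-delimited layout with a few of the 20 columns.

```r
library(readr)

# Hypothetical sample (made-up values) in the same tab-delimited layout,
# limited to a few of the columns in the real file.
sample_txt <- paste(
  "CDS_CODE\tETHNIC\tGENDER\tETOT\tDTOT",
  "00000000000001\t1\tM\t2\t0",
  "00000000000001\t1\tF\t3\t1",
  "00000000000002\t5\tM\t102\t4",
  sep = "\n"
)

# I() tells readr to treat the string as literal data, not a file path.
# Declaring the types up front means readr has nothing to guess, so the
# column specification message is not printed.
sample_dropouts <- read_delim(
  I(sample_txt),
  delim = "\t",
  col_types = cols(
    CDS_CODE = col_character(), # keep leading zeros in the school code
    GENDER = col_character(),
    .default = col_double()     # every other column is numeric
  )
)

sample_dropouts
```

The same `col_types` argument should work in the `read_delim()` call on the full file. If you are happy with the guessed types, adding `show_col_types = FALSE` to that call is the shorter way to quiet the message.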
dropouts
## # A tibble: 59,599 × 20
##    CDS_CODE  ETHNIC GENDER    E7    E8    E9   E10   E11   E12   EUS  ETOT    D7
##    <chr>      <dbl> <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 01100170…      1 M          0     0     0     0     1     1     0     2     0
##  2 01100170…      1 F          0     0     1     0     2     0     0     3     0
##  3 01100170…      2 M          0     0     0     0     0     1     0     1     0
##  4 01100170…      2 F          0     0     2     2     2     1     0     7     0
##  5 01100170…      3 M          0     0     0     1     0     0     0     1     0
##  6 01100170…      3 F          0     0     1     1     2     0     0     4     0
##  7 01100170…      4 M          0     0     0     1     0     0     0     1     0
##  8 01100170…      5 M          0     0    31    32    17    22     0   102     0
##  9 01100170…      5 F          0     0    26    34    30    20     0   110     0
## 10 01100170…      6 M          0     0    19    20    17    13     0    69     0
## # ℹ 59,589 more rows
## # ℹ 8 more variables: D8 <dbl>, D9 <dbl>, D10 <dbl>, D11 <dbl>, D12 <dbl>,
## # DUS <dbl>, DTOT <dbl>, YEAR <dbl>
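Printing the tibble shows only the first 10 rows and leaves 8 columns off the screen. As a rough sketch of how to see every column at once, the code below uses `dim()`, `colnames()`, and `glimpse()` on a small stand-in tibble. The `dropouts_small` object and its values are hypothetical; the same three calls can be run on `dropouts` itself.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in (made-up values) with a few of the real column names.
dropouts_small <- tibble(
  CDS_CODE = c("00000000000001", "00000000000001"),
  ETHNIC = c(1, 1),
  GENDER = c("M", "F"),
  ETOT = c(2, 3),
  DTOT = c(0, 1)
)

dim(dropouts_small)      # number of rows, then number of columns
colnames(dropouts_small) # all column names, including any hidden when printing
glimpse(dropouts_small)  # one line per column: name, type, first few values
```

`glimpse()` turns the data on its side, so wide data sets such as this one fit on the screen without dropping columns.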