```{r knit-setup, echo=FALSE, include = FALSE}
library(knitr)
opts_chunk$set(echo = TRUE,
               message = FALSE,
               warning = FALSE,
               fig.height = 4,
               fig.width = 7,
               comment = "")
library(jhur)
library(tidyverse)
library(tidyr)
library(emo)
```

## Recap

- `pivot_longer()` helps us take our data from wide to long format
    - `names_to =` gives a new name to the pivoted columns
    - `values_to =` gives a new name to the values that used to be in those columns
- `pivot_wider()` helps us take our data from long to wide format
    - `names_from` specifies the old column name that contains the new column names
    - `values_from` specifies the old column name that contains new cell values
- to merge/join data sets together, you need a variable in common - usually "id"

📃[Cheatsheet](https://jhudatascience.org/intro_to_r/modules/cheatsheets/Day-6.pdf)

## Recap continued

- to merge/join data sets together, you need a variable in common - usually "id"
- `?join` - see different types of joining for `dplyr`
- `inner_join(x, y)` - only rows that match for `x` and `y` are kept
- `full_join(x, y)` - all rows of `x` and `y` are kept
- `left_join(x, y)` - all rows of `x` are kept even if not merged with `y`
- `right_join(x, y)` - all rows of `y` are kept even if not merged with `x`
- `anti_join(x, y)` - all rows from `x` not in `y`, keeping just columns from `x`
- the `esquisser()` function of the `esquisse` package can help make plot sketches

📃[Cheatsheet](https://jhudatascience.org/intro_to_r/modules/cheatsheets/Day-6.pdf)

## `esquisse` and `ggplot2`

```{r, fig.alt="esquisse", out.width = "28%", echo = FALSE, fig.show='hold',fig.align='center'}
knitr::include_graphics("https://pbs.twimg.com/media/DoaBCAwWsAEaz-y.png")
```

```{r, fig.alt="ggplot2", out.width = "19%", echo = FALSE, fig.show='hold',fig.align='center'}
knitr::include_graphics("https://d33wubrfki0l68.cloudfront.net/2c6239d311be6d037c251c71c3902792f8c4ddd2/12f67/css/images/hex/ggplot2.png")
```

## Why learn ggplot2?
More customization:

- branding
- making plots interactive
- combining plots

Easier plot automation (creating plots in scripts)

Faster (eventually)

## ggplot2

- A package for producing graphics - gg = *Grammar of Graphics*
- Created by Hadley Wickham in 2005
- Belongs to "Tidyverse" family of packages
- *"Make a ggplot"* = Make a plot using the ggplot2 package

Resources:

-
-

## ggplot2

Based on the idea of:

::: {style="color: red;"}
layering
:::

plot objects are placed on top of each other with `+`

📉 +\
📈

## ggplot2

```{r, fig.alt="ggplotcake", out.width = "90%", echo = FALSE, fig.show='hold',fig.align='center'}
knitr::include_graphics("img/cake.png")
```

Slide Credit: Tanya Shapiro

## ggplot2

- Pros: extremely powerful/flexible -- allows combining multiple plot elements together, allows high customization of the look, many resources online
- Cons: ggplot2-specific "grammar of graphics" for constructing a plot
- [ggplot2 gallery](https://www.r-graph-gallery.com/ggplot2-package.html)

## Tidy data

To make graphics using `ggplot2`, our data needs to be in a **tidy** format

**Tidy data**:

1. Each variable forms a column.
2. Each observation forms a row.

Messy data:

- Column headers are values, not variable names.
- Multiple variables are stored in one column.
- Variables are stored in both rows and columns.

## Tidy data: example

Ideally we want each variable as a column and we want each observation in a row.

Column headers are values, not variable names:

```{r, echo = FALSE, fig.align='center', out.width="80%"}
include_graphics("img/messy_data_ex1.png")
```

## Now the data is "tidy" and in long format

```{r, echo = FALSE, fig.align='center'}
include_graphics("img/tidy_data_ex1.png")
```

Read more about tidy data and see other examples: [Tidy Data](https://vita.had.co.nz/papers/tidy-data.pdf) tutorial

## Data to plot

Type `?Orange` for more information. Is the data tidy? Is it in long format?
```{r}
head(Orange)
```

# First plot with `ggplot2` package

## First layer of code with `ggplot2` package

Will set up the plot - it will be empty!

```{r, echo = FALSE, fig.align='center'}
knitr::include_graphics("img/ing_and_base.png")
```

## First layer of code with `ggplot2` package

- **Aesthetic mapping** `aes(x= , y =)` describes how variables in our data are mapped to elements of the plot
- Note you don't need to use `mapping` but it is helpful to know what we are doing.

::: codeexample
```{r, eval = FALSE, class.source = "codeexample"}
library(ggplot2) # don't forget to load ggplot2
# This is not code but shows the general format
ggplot({data_to plot}, aes(x = {var in data to plot},
                           y = {var in data to plot}))
```
:::

```{r, fig.width=3, fig.height=2.5, fig.align='center', class.source = "codereg"}
ggplot(Orange, aes(x = circumference, y = age))
```

## Next layer code with `ggplot2` package

```{r, echo = FALSE, fig.align='center', out.width = "35%"}
knitr::include_graphics("img/the_geoms.png")
```

There are many to choose from, to list just a few:

- `geom_point()` -- points (we have seen)
- `geom_line()` -- lines to connect observations
- `geom_boxplot()` -- boxplots
- `geom_histogram()` -- histogram
- `geom_bar()` -- bar plot
- `geom_col()` -- column plot
- `geom_tile()` -- blocks filled with color

## Next layer code with `ggplot2` package

When to use what plot?
A few examples:

- a scatterplot (`geom_point()`): to examine the relationship between two sets of continuous numeric data
- a barplot (`geom_bar()`): to compare the distribution of a quantitative variable (numeric) between groups or categories
- a histogram (`geom_histogram()`): to observe the overall distribution of numeric data
- a boxplot (`geom_boxplot()`): to compare values between different factor levels or categories

## Next layer code with `ggplot2` package

Need the `+` sign to add the next layer to specify the type of plot

::: codeexample
```{r, eval = FALSE, class.source = "codeexample"}
ggplot({data_to plot}, aes(x = {var in data to plot},
                           y = {var in data to plot})) +
  geom_{type of plot}
```
:::

```{r, fig.width=4, fig.height=3, fig.align='center'}
ggplot(Orange, aes(x = circumference, y = age)) +
  geom_point()
```

Read as: *using Orange data, and provided aesthetic mapping, add points to the plot*

## Tip - plus sign `+` must come at end of line

Having the `+` sign at the beginning of a line will not work!

```{r, eval = FALSE}
ggplot(food, aes(x = item_ID, y = item_price_change, fill = item_categ))
 + geom_boxplot()
```

Pipes will also not work in place of `+`!
```{r,eval = FALSE}
ggplot(food, aes(x = item_ID, y = item_price_change, fill = item_categ)) %>%
  geom_boxplot()
```

## Plots can be assigned as an object {.small}

```{r, fig.width=4, fig.height=3, fig.align='center'}
plt1 <- ggplot(Orange, aes(x = circumference, y = age)) +
  geom_point()

plt1
```

## Examples of different geoms

```{r, fig.show="hold", out.width="40%"}
plt1 <- ggplot(Orange, aes(x = circumference, y = age)) +
  geom_point()

plt2 <- ggplot(Orange, aes(x = circumference, y = age)) +
  geom_line()

plt1 # fig.show = "hold" makes plots appear
plt2 # next to one another in the chunk settings
```

## Specifying plot layers: combining multiple layers

Layer a plot on top of another plot with `+`

```{r, fig.width=4, fig.height=3, fig.align='center'}
ggplot(Orange, aes(x = circumference, y = age)) +
  geom_point() +
  geom_line()
```

## Adding color - can map color to a variable

```{r, fig.width=4, fig.height=3, fig.align='center'}
ggplot(Orange, aes(x = circumference, y = age, color = Tree)) +
  geom_point() +
  geom_line()
```

## Adding color - or change the color of each plot layer

You can change the look of each layer separately. Note the arguments `alpha` and `linetype`, which change the opacity of the points and the style of the line, respectively.

```{r, fig.width=5, fig.height=3, fig.align='center'}
ggplot(Orange, aes(x = circumference, y = age)) +
  geom_point(size = 5, color = "red", alpha = 0.5) +
  geom_line(size = 0.8, color = "black", linetype = 2)
```

`linetype` can be given as a number. See the docs for which numbers correspond to which `linetype`!

# Customize the look of the plot

```{r, echo = FALSE, fig.align='center', out.width= "50%"}
knitr::include_graphics("img/frosting.png")
```

## Customize the look of the plot {.codesmall}

You can change the look of the whole plot using [`theme_*()` functions](https://ggplot2.tidyverse.org/reference/ggtheme.html).
```{r, fig.width=5, fig.height=3, fig.align='center'}
ggplot(Orange, aes(x = circumference, y = age)) +
  geom_point(size = 5, color = "red", alpha = 0.5) +
  geom_line(size = 0.8, color = "brown", linetype = 2) +
  theme_dark()
```

## More themes!

There are not only the built-in ggplot2 themes but also all kinds of themes from other packages!

- [ggthemes](https://jrnold.github.io/ggthemes/)
- [ThemePark package](https://github.com/MatthewBJane/ThemePark)
- [hrbr themes](https://github.com/hrbrmstr/hrbrthemes)

## Customize the look of the plot

You can change the look of the whole plot - **specific elements, too** - like changing [font](http://www.cookbook-r.com/Graphs/Fonts/) and font size - or even more [fonts](https://blog.revolutionanalytics.com/2012/09/how-to-use-your-favorite-fonts-in-r-charts.html)

```{r, fig.width=6, fig.height=3.5, fig.align='center'}
ggplot(Orange, aes(x = circumference, y = age)) +
  geom_point(size = 5, color = "red", alpha = 0.5) +
  geom_line(size = 0.8, color = "brown", linetype = 2) +
  theme_bw() +
  theme(text=element_text(size=16, family="Comic Sans MS"))
```

## Adding labels {.codesmall}

The `labs()` function can help you add or modify titles on your plot. The `title` argument specifies the title. The `x` argument specifies the x axis label. The `y` argument specifies the y axis label.

```{r, fig.width=4, fig.height=2.5, fig.align='center'}
ggplot(Orange, aes(x = circumference, y = age)) +
  geom_point(size = 5, color = "red", alpha = 0.5) +
  geom_line(size = 0.8, color = "brown", linetype = 2) +
  labs(title = "My plot of orange tree data",
       x = "Tree Circumference (mm)",
       y = "Tree Age (days since 12/31/1968)")
```

## Adding labels line break {.codesmall}

Line breaks can be specified using `\n` within the `labs()` function to have a label with multiple lines.
```{r, fig.width=4, fig.height=2.5, fig.align='center'}
ggplot(Orange, aes(x = circumference, y = age)) +
  geom_point(size = 5, color = "red", alpha = 0.5) +
  geom_line(size = 0.8, color = "brown", linetype = 2) +
  labs(title = "Plot of orange tree data from 1968: \n trunk circumference vs tree age",
       x = "Tree Circumference (mm)",
       y = "Tree Age (days since 12/31/1968)")
```

## Changing axis: specifying axis scale {.codesmall}

`scale_x_continuous()` and `scale_y_continuous()` can change how the axis is plotted. Can use the `breaks` argument to specify how you want the axis ticks to be.

```{r, fig.width=5, fig.height=3, fig.align='center'}
range(pull(Orange, circumference))
range(pull(Orange, age))

plot_scale <- ggplot(Orange, aes(x = circumference, y = age)) +
  geom_point(size = 5, color = "red", alpha = 0.5) +
  geom_line(size = 0.8, color = "brown", linetype = 2) +
  scale_x_continuous(breaks = seq(from = 20, to = 240, by = 20)) +
  scale_y_continuous(breaks = seq(from = 100, to = 1600, by = 200))
```

## Changing axis: specifying axis limits

`xlim()` and `ylim()` can specify the limits for each axis

```{r, fig.width=5, fig.height=3, fig.align='center'}
ggplot(Orange, aes(x = circumference, y = age)) +
  geom_point(size = 5, color = "red", alpha = 0.5) +
  geom_line(size = 0.8, color = "brown", linetype = 2) +
  labs(title = "My plot of orange tree circumference vs age") +
  xlim(100, max(pull(Orange, circumference)))
```

## Changing axis: specifying axis scale {.codesmall}

```{r, fig.width=5, fig.height=1.8, fig.align='center'}
plot_scale
```

```{r, fig.width=5, fig.height=1.8, fig.align='center', echo = TRUE}
ggplot(Orange, aes(x = circumference, y = age)) +
  geom_point(size = 5, color = "red", alpha = 0.5) +
  geom_line(size = 0.8, color = "brown", linetype = 2)
```

## Changing axis: specifying axis limits

`xlim()` and `ylim()` can specify the limits for each axis

```{r, fig.width=5, fig.height=3, fig.align='center'}
ggplot(Orange, mapping = aes(x = circumference, y = age)) +
  geom_point(size = 5, color = "red", alpha = 0.5) +
  geom_line(size = 0.8, color = "brown", linetype = 2) +
  labs(title = "My plot of orange tree circumference vs age") +
  xlim(100, max(pull(Orange, circumference)))
```

## Modifying plot objects

You can add to a plot object to make changes! Note that we can save our plots as an object like `plt1` below. And now if we reference `plt1` again our plot will print out!

```{r, fig.width=5, fig.height=3, fig.align='center'}
plt1 <- ggplot(Orange, aes(x = circumference, y = age)) +
  geom_point(size = 5, color = "red", alpha = 0.5) +
  geom_line(size = 0.8, color = "brown", linetype = 2) +
  labs(title = "My plot of orange tree circumference vs age") +
  xlim(100, max(pull(Orange, circumference)))

plt1 + theme_minimal()
```

## Overwriting specifications

It's possible to go in and change specifications with newer layers

```{r, fig.width=5, fig.height=3, fig.align='center'}
Orange %>% ggplot(aes(x = circumference, y = age, color = Tree)) +
  geom_line(size = 0.8)
```

## Removing the legend label

You can use `theme(legend.position = "none")` to remove the legend.
```{r, fig.width=5, fig.height=3, fig.align='center'}
Orange %>% ggplot(aes(x = circumference, y = age, color = Tree)) +
  geom_line(size = 0.8) +
  theme(legend.position = "none")
```

## Overwriting specifications

It's possible to go in and change specifications with newer layers

```{r, fig.width=5, fig.height=3, fig.align='center'}
Orange %>% ggplot(aes(x = circumference, y = age, color = Tree)) +
  geom_line(size = 0.8, color = "black")
```

## Summary

- `ggplot()` specifies what data to use and which variables are mapped where
- inside `ggplot()`, `aes(x = , y = , color =)` specifies which variables correspond to which aspects of the plot in general
- layers of plots can be combined using the `+` at the **end** of lines
- special [`theme_*()` functions](https://ggplot2.tidyverse.org/reference/ggtheme.html) can change the overall look
- individual layers can be customized using arguments like `size`, `color`, `alpha` (more transparent is closer to 0), and `linetype`
- labels can be added with the `labs()` function and `x`, `y`, `title` arguments - the `\n` can be used for line breaks
- `xlim()` and `ylim()` can limit or expand the plot area
- `scale_x_continuous()` and `scale_y_continuous()` can modify the scale of the axes
- by default, `ggplot()` removes points with missing values from plots.

## Lab 1

🏠 [Class Website](https://jhudatascience.org/intro_to_r/)\
💻 [Lab](https://jhudatascience.org/intro_to_r/modules//Data_Visualization/lab/Data_Visualization_Lab.Rmd)

## theme() function:

The `theme()` function can help you modify various elements of your plot. Here we will adjust the font size of the plot title.

```{r, fig.width=5, fig.height=3, fig.align='center'}
ggplot(Orange, aes(x = circumference, y = age)) +
  geom_point(size = 5, color = "red", alpha = 0.5) +
  geom_line(size = 0.8, color = "brown", linetype = 2) +
  labs(title = "Circumference vs age") +
  theme(plot.title = element_text(size = 20))
```

## theme() function

The `theme()` function always takes:

1. an object to change (use `?theme()` to see - `plot.title`, `axis.title`, `axis.ticks` etc.)
2. the aspect you are changing about this: `element_text()`, `element_line()`, `element_rect()`, `element_blank()`
3. what you are changing:
    - text: `size`, `color`, `fill`, `face`, `alpha`, `angle`
    - position: `"top"`, `"bottom"`, `"right"`, `"left"`, `"none"`
    - rectangle: `size`, `color`, `fill`, `linetype`
    - line: `size`, `color`, `linetype`

## theme() function: center title and change size

The `theme()` function can help you modify various elements of your plot. Here we will adjust the horizontal justification (`hjust`) of the plot title.

```{r, fig.width=5, fig.height=3, fig.align='center'}
ggplot(Orange, aes(x = circumference, y = age)) +
  geom_point(size = 5, color = "red", alpha = 0.5) +
  geom_line(size = 0.8, color = "brown", linetype = 2) +
  labs(title = "Circumference vs age") +
  theme(plot.title = element_text(hjust = 0.5, size = 20))
```

## theme() function: change title and axis format

```{r, fig.width=5, fig.height=3, fig.align='center'}
ggplot(Orange, aes(x = circumference, y = age)) +
  geom_point(size = 5, color = "red", alpha = 0.5) +
  geom_line(size = 0.8, color = "brown", linetype = 2) +
  labs(title = "Circumference vs age") +
  theme(plot.title = element_text(hjust = 0.5, size = 20),
        axis.title = element_text(size = 16))
```

## theme() function: moving (or removing) legend {.codesmall}

If specifying position - use: "top", "bottom", "right", "left", "none"

```{r, fig.show="hold", out.width="40%"}
ggplot(Orange, aes(x = circumference, y = age, color = Tree)) +
  geom_line()

ggplot(Orange, aes(x = circumference, y = age, color = Tree)) +
  geom_line() +
  theme(legend.position = "none")
```

## Cheatsheet about theme

```{r, echo = FALSE, fig.align='center', out.width= "55%"}
knitr::include_graphics("img/theme_sheet.png")
```

## Keys for specifications `linetype`

```{r, echo = FALSE, fig.align='center', out.width= "60%"}
knitr::include_graphics("img/line_types.png")
```
[source](http://www.cookbook-r.com/Graphs/Shapes_and_line_types/figure/line_types-1.png)

## Linetype key

- *geoms* that draw lines have a `linetype` parameter
- these include values that are strings like "blank", "solid", "dashed", "dotdash", "longdash", and "twodash"

```{r, fig.width=5, fig.height=3, fig.align='center'}
Orange %>% ggplot(aes(x = circumference, y = age, color = Tree)) +
  geom_line(size = 0.8, linetype = "twodash")
```

## Keys for specifications `shape`

```{r, echo = FALSE, fig.align='center', out.width= "50%"}
knitr::include_graphics("img/shape.png")
```

[source](http://www.cookbook-r.com/Graphs/Shapes_and_line_types/figure/unnamed-chunk-2-1.png)

## shape key

- *geoms* that draw points have a `shape` parameter
- these include numeric values (don't need quotes for these) and some character values (need quotes for these)

```{r, fig.width=5, fig.height=3, fig.align='center'}
Orange %>% ggplot(aes(x = circumference, y = age, color = Tree)) +
  geom_point(size = 2, shape = 12)
```

## Can make your own theme to use on plots!

Guide on how to:

## Group and/or color by variable's values

First, we will read in some data about food prices over time for the purpose of demonstration.

```{r}
food <- read_csv(
  file = "http://jhudatascience.org/intro_to_r/data/food_price.csv")
```

## Food data

- 2 different categories (e.g. pasta, rice)
- 4 different items (i.e. 2 of each category)
- 10 price change values collected over time for each item

```{r}
food
```

## Starting a plot

```{r, fig.width=5, fig.height=3, fig.align='center'}
ggplot(food, aes(x = observation_time, y = item_price_change)) +
  geom_line()
```

## If it looks confusing to you, try again

```{r, echo = FALSE, fig.align='center', out.width= "30%"}
knitr::include_graphics("https://media.giphy.com/media/xT0xeuOy2Fcl9vDGiA/giphy.gif")
```

## Using `group` in plots

You can use the `group` element in a mapping to indicate that each `item_ID` will have a separate price line.
```{r, fig.width=5, fig.height=3, fig.align='center'}
ggplot(food, aes(x = observation_time,
                 y = item_price_change,
                 group = item_ID)) +
  geom_line()
```

## Adding color will automatically group the data

```{r, fig.width=5, fig.height=3, fig.align='center'}
ggplot(food, aes(x = observation_time,
                 y = item_price_change,
                 color = item_ID)) +
  geom_line()
```

## Adding color will automatically group the data

```{r, fig.width=5, fig.height=3, fig.align='center'}
ggplot(food, aes(x = observation_time,
                 y = item_price_change,
                 color = item_categ)) +
  geom_line()
```

## Sometimes you need group and color

```{r, fig.width=5, fig.height=3, fig.align='center'}
ggplot(food, aes(x = observation_time,
                 y = item_price_change,
                 group = item_ID,
                 color = item_categ)) +
  geom_line()
```

## Adding a facet can help make it easier to see what is happening {.codesmall}

Two options:

- `facet_grid()` - creates a grid shape
- `facet_wrap()` - more flexible

Need to specify how you are faceting with the `~` sign.

```{r, fig.width=4, fig.height=3, fig.align='center'}
ggplot(food, aes(x = observation_time,
                 y = item_price_change,
                 color = item_ID)) +
  geom_line() +
  facet_grid( ~ item_categ)
```

## facet_wrap() {.codesmall}

- more flexible - arguments `ncol` and `nrow` can specify layout
- can have different scales for axes using `scales = "free_x"`, `scales = "free_y"`, or `scales = "free"`

```{r, fig.width=4, fig.height=2.7, fig.align='center'}
rp_fac_plot <- ggplot(food, aes(x = observation_time,
                                y = item_price_change,
                                color = item_ID)) +
  geom_line() +
  geom_point() +
  facet_wrap( ~ item_categ, ncol = 1, scales = "free")

rp_fac_plot
```

## Tips!

Let's talk additional tricks and tips for making ggplots!
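## Tip - facet layout and free scales {.codesmall}

The `facet_wrap()` arguments described earlier (`ncol` and `scales`) can be sketched with the built-in `Orange` data. This is a minimal example rather than part of the food price analysis: the object name `orange_facets` and the choice of `scales = "free_y"` are assumptions made for illustration.

```{r, fig.width=5, fig.height=3, fig.align='center'}
library(ggplot2)

# Sketch using the built-in Orange data (not the food data),
# so it runs without downloading anything.
orange_facets <- ggplot(Orange, aes(x = age, y = circumference)) +
  geom_line() +
  # one panel per tree, arranged in 2 columns;
  # "free_y" lets each panel pick its own y axis range
  facet_wrap(~ Tree, ncol = 2, scales = "free_y")

orange_facets
```

With `scales = "free_y"` each panel gets its own y axis range while the x axis stays shared. This can make the trend within each tree easier to see, but it makes heights harder to compare across panels.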
## Tips - Color vs Fill {.codesmall}

- `color` is needed for points and lines
- `fill` is generally needed for boxes and bars

```{r, out.width="30%", fig.show='hold'}
ggplot(food, aes(x = item_ID,
                 y = item_price_change,
                 color = item_categ)) + # color creates an outline
  geom_boxplot()

ggplot(food, aes(x = item_ID,
                 y = item_price_change,
                 fill = item_categ)) + # fills the boxplot
  geom_boxplot()
```

## Tip - Good idea to add jitter layer to top of box plots

Can add `width` argument to make the jitter more narrow.

```{r, fig.width=5 , fig.height=3, fig.align='center'}
ggplot(food, aes(x = item_ID,
                 y = item_price_change,
                 fill = item_categ)) +
  geom_boxplot() +
  geom_jitter(width = .06)
```

## Tip - be careful about colors for color vision deficiency

`scale_fill_viridis_d()` for discrete/categorical data

`scale_fill_viridis_c()` for continuous data

```{r, fig.width=5 , fig.height=3, fig.align='center'}
ggplot(food, aes(x = item_ID,
                 y = item_price_change,
                 fill = item_categ)) +
  geom_boxplot() +
  geom_jitter(width = .06) +
  scale_fill_viridis_d()
```

## Tip - can pipe data after wrangling into ggplot()

```{r, fig.width=5 , fig.height=2.5, fig.align='center'}
food_bar <- food %>%
  group_by(item_categ) %>%
  summarize("max_price_change" = max(item_price_change)) %>%
  ggplot(aes(x = item_categ,
             y = max_price_change,
             fill = item_categ)) +
  scale_fill_viridis_d() +
  geom_col() +
  theme(legend.position = "none")

food_bar
```

## Tip - color outside of `aes()`

Can be used to add an outline around column/bar plots.

```{r, fig.width=5 , fig.height=3, fig.align='center'}
food_bar +
  geom_col(color = "black")
```

## Tip - col vs bar {.codesmall}

`geom_bar(x =)` can only use one `aes` mapping

`geom_col(x = , y = )` can have two

## Tip - Check what you plot {.codesmall}

`r emo::ji("warning")` May not be plotting what you think you are!
`r emo::ji("warning")`

```{r, fig.width=5 , fig.height=3, fig.align='center'}
ggplot(food, aes(x = item_ID,
                 y = item_price_change,
                 fill = item_categ)) +
  geom_col()
```

## What did we plot? Always good to check it is correct! {.codesmall}

```{r}
head(food, n = 3)

food %>%
  group_by(item_ID) %>%
  summarize(sum = sum(item_price_change))
```

## Try that again {.codesmall}

```{r, fig.width=5 , fig.height=3, fig.align='center'}
food %>%
  group_by(item_categ, item_ID) %>%
  summarize(mean_change = mean(item_price_change))
```

## Try that again {.codesmall}

```{r}
food %>%
  group_by(item_categ, item_ID) %>%
  summarize(mean_change = mean(item_price_change)) %>%
  ggplot(aes(x = item_ID,
             y = mean_change,
             fill = item_categ)) +
  geom_col()
```

## Tip - make sure labels aren't too small

```{r, fig.width=5 , fig.height=3, fig.align='center'}
food_bar +
  theme(text = element_text(size = 20))
```

## Tip - if you need to, you can remove outliers

Set `outlier.shape = NA` to get rid of outliers. Be careful about whether you really should remove these! However, it can be helpful if your plot is getting stretched to accommodate plotting an outlier. You can always say in the figure legend what you removed.
```{r}
esoph1 <- ggplot(esoph, aes(y = ncases, x = agegp)) +
  geom_boxplot()

esoph2 <- ggplot(esoph, aes(y = ncases, x = agegp)) +
  geom_boxplot(outlier.shape = NA)
```

```{r, fig.show="hold", out.width="40%", echo = FALSE}
esoph1 # fig.show = "hold" makes plots appear
esoph2 # next to one another in the chunk settings
```

## NA Values {.codesmall}

- if it is a numeric value, it will just get dropped from the graph and you will see a warning
- if it is categorical, you will see it on the graph and will need to filter to remove the NA category

```{r}
icecream <- tibble(flavor =
                     rep(c("chocolate", "vanilla", NA, "chocolate", "vanilla"), 8))

icecream1 <- ggplot(icecream, aes(x = flavor)) +
  geom_bar() +
  theme(text=element_text(size=24))

icecream2 <- icecream %>%
  drop_na(flavor) %>%
  ggplot(aes(x = flavor)) +
  geom_bar() +
  theme(text=element_text(size=24))
```

```{r, fig.show="hold", out.width="30%", echo = FALSE}
icecream1 # fig.show = "hold" makes plots appear
icecream2 # next to one another in the chunk settings
```

# Extensions

## `patchwork` package

Great for combining plots together

Also check out the [`patchwork` package](https://patchwork.data-imaginist.com/)

```{r, out.width= "50%", fig.align='center'}
#install.packages("patchwork")
library(patchwork)
(plt1 + plt2)/plt2
```

## `directlabels` package

Great for adding labels directly onto plots

```{r, out.width="50%", fig.align="center"}
#install.packages("directlabels")
library(directlabels)
direct.label(rp_fac_plot, method = list("angled.boxes"))
```

## plotly

```{r}
#install.packages("plotly")
library("plotly") # creates interactive plots!
ggplotly(rp_fac_plot)
```

Also check out the [`ggiraph` package](https://www.rdocumentation.org/packages/ggiraph/versions/0.6.1)

# Saving plots

## Saving a ggplot to file

A few options:

- RStudio \> Plots \> Export \> Save as image / Save as PDF
- RStudio \> Plots \> Zoom \> [right mouse click on the plot] \> Save image as
- In the code

```{r, eval = FALSE}
ggsave(filename = "saved_plot.png",  # will save in working directory
       plot = rp_fac_plot,
       width = 6, height = 3.5)      # by default in inches
```

## Summary

- The `theme()` function helps you specify aspects about your plot
    - move or remove a legend with `theme(legend.position = "none")`
    - change font aspects of individual text elements `theme(plot.title = element_text(size = 20))`
    - center a title: `theme(plot.title = element_text(hjust = 0.5))`
- sometimes you need to add a `group` element to `aes()` if your plot looks strange
- make sure you are plotting what you think you are by checking the numbers!
- `facet_grid(~ variable)` and `facet_wrap(~variable)` can be helpful to quickly split up your plot
- `facet_wrap()` allows for a `scales = "free"` argument so that you can have a different axis scale for different plots
- use `fill` to fill in boxplots

## Good practices for plots

Check out this [guide](https://jhudatascience.org/tidyversecourse/dataviz.html#making-good-plots) for more information!

## Lab 2

🏠 [Class Website](https://jhudatascience.org/intro_to_r/)\
💻 [Lab](https://jhudatascience.org/intro_to_r/modules//Data_Visualization/lab/Data_Visualization_Lab.Rmd)

```{r, fig.alt="The End", out.width = "50%", echo = FALSE, fig.align='center'}
knitr::include_graphics(here::here("images/the-end-g23b994289_1280.jpg"))
```

Image by Gerd Altmann from Pixabay