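esquisse and ggplot2
Compared with building plots in the esquisse drag-and-drop interface, writing ggplot2 code gives you:
More customization
Easier plot automation (creating plots in scripts)
Faster plotting (eventually)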
ggplot2: a package for producing graphics (gg = Grammar of Graphics)
Created by Hadley Wickham in 2005
Belongs to the tidyverse family of packages
“Make a ggplot” = make a plot using the ggplot2 package
Based on the idea of layering: plot objects are placed on top of each other with +
📉 ➕ 📈
Pros: extremely powerful and flexible; allows combining multiple plot elements together and extensive customization of the plot's look; many resources online
Cons: requires learning ggplot2's specific “grammar of graphics” for constructing a plot
To make graphics using ggplot2, our data needs to be in a tidy format.
Tidy data: each variable forms a column, and each observation forms a row.
Messy data: for example, column headers are values, not variable names.
Read more about tidy data and see other examples: Tidy Data tutorial by Hadley Wickham
It’s also helpful to have data in long format!!!
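As a minimal sketch of turning messy (wide) data into tidy (long) data, the code below uses tidyr's pivot_longer(). The price_wide table, its year columns, and its values are hypothetical, made up for illustration; its column headers "2020" and "2021" are values, not variable names.

```r
library(tibble)
library(tidyr)

# Hypothetical messy data: the column headers "2020" and "2021" are values (years)
price_wide <- tibble(
  item = c("pasta", "rice"),
  `2020` = c(1.5, 0.5),
  `2021` = c(2.5, 1.0)
)

# Long/tidy format: one column per variable (item, year, price), one row per observation
price_long <- pivot_longer(
  price_wide,
  cols = c(`2020`, `2021`),
  names_to = "year",   # old column headers go into this new column
  values_to = "price"  # old cell values go into this new column
)
price_long
```

In the long version, year can be mapped directly to an axis or to color in aes(), which is not possible when the years are spread across column headers.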
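library(tibble) # provides tibble(); also loaded by library(tidyverse)
set.seed(3) # makes the random numbers reproducible
var_1 <- seq(from = 1, to = 30)
var_2 <- rnorm(30)
my_data <- tibble(var_1, var_2)
my_data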
# A tibble: 30 × 2
   var_1   var_2
   <int>   <dbl>
 1     1 -0.962
 2     2 -0.293
 3     3  0.259
 4     4 -1.15
 5     5  0.196
 6     6  0.0301
 7     7  0.0854
 8     8  1.12
 9     9 -1.22
10    10  1.27
# … with 20 more rows
ggplot2 package
The first layer of code, ggplot(), will set up the plot: it will be empty!
The aesthetic mapping, mapping = aes(x = , y = ), describes how variables in our data are mapped to elements of the plot.
library(ggplot2) # don't forget to load ggplot2
# This is not code but shows the general format
ggplot({data_to plot}, mapping = aes(x = {var in data to plot}, y = {var in data to plot}))
ggplot(my_data, mapping = aes(x = var_1, y = var_2))
There are many geom_*() functions (plot types) to choose from; to list just a few:
geom_point() – points (we have seen)
geom_line() – lines to connect observations
geom_boxplot()
geom_histogram()
geom_bar()
geom_col()
geom_errorbar()
geom_density()
geom_tile() – blocks filled with color
Use the + sign to add the next layer, which specifies the type of plot:
ggplot({data_to plot}, mapping = aes(x = {var in data to plot}, y = {var in data to plot})) +
  geom_{type of plot}
ggplot(my_data, mapping = aes(x = var_1, y = var_2)) + geom_point()
Read as: add points to the plot (use data as provided by the aesthetic mapping)
plt1 <- ggplot(my_data, aes(x = var_1, y = var_2)) + geom_point()
plt2 <- ggplot(my_data, aes(x = var_1, y = var_2)) + geom_line()
plt1; plt2 # prints both plots, one after the other
Also check out the patchwork package for combining plots.
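A minimal sketch of the patchwork approach, assuming the patchwork package is installed: once it is loaded, + places plots side by side and / stacks them. The object names side_by_side and stacked are just for illustration.

```r
library(tibble)
library(ggplot2)
library(patchwork) # assumed installed: install.packages("patchwork")

set.seed(3)
my_data <- tibble(var_1 = seq(from = 1, to = 30), var_2 = rnorm(30))

plt1 <- ggplot(my_data, aes(x = var_1, y = var_2)) + geom_point()
plt2 <- ggplot(my_data, aes(x = var_1, y = var_2)) + geom_line()

# With patchwork loaded, + combines plots side by side and / stacks them vertically
side_by_side <- plt1 + plt2
stacked <- plt1 / plt2
side_by_side
```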
Layer a plot on top of another plot with +
ggplot(my_data, aes(x = var_1, y = var_2)) + geom_point() + geom_line()
You can change the look of each layer separately.
ggplot(my_data, aes(x = var_1, y = var_2)) +
  geom_point(size = 5, color = "red", alpha = 0.5) +
  geom_line(linewidth = 0.8, color = "black", linetype = 2) # use size = in ggplot2 versions before 3.4.0
You can change the look of the whole plot using theme_*() functions.
ggplot(my_data, aes(x = var_1, y = var_2)) +
  geom_point(size = 5, color = "red", alpha = 0.5) +
  geom_line(linewidth = 0.8, color = "brown", linetype = 2) +
  theme_dark()
You can also change specific elements of the whole plot's look, like the font and font size, or even use other fonts.
ggplot(my_data, aes(x = var_1, y = var_2)) +
  geom_point(size = 5, color = "red", alpha = 0.5) +
  geom_line(linewidth = 0.8, color = "brown", linetype = 2) +
  theme_bw(base_size = 20, base_family = "Comic Sans MS") # the font must be available on your computer
The labs() function can help you add or modify titles on your plot.
ggplot(my_data, aes(x = var_1, y = var_2)) +
  geom_point(size = 5, color = "red", alpha = 0.5) +
  geom_line(linewidth = 0.8, color = "brown", linetype = 2) +
  labs(title = "My plot of var1 vs var2", x = "Variable 1", y = "Variable 2")
xlim() and ylim() can specify the limits for each axis.
ggplot(my_data, aes(x = var_1, y = var_2)) +
  geom_point(size = 5, color = "red", alpha = 0.5) +
  geom_line(linewidth = 0.8, color = "brown", linetype = 2) +
  labs(title = "My plot of var1 vs var2") +
  xlim(0, 40)
scale_x_continuous() and scale_y_continuous() can change how the axis is plotted. You can use the breaks argument to specify where you want the axis ticks to be.
seq(from = 0, to = 30, by = 5)
[1] 0 5 10 15 20 25 30
ggplot(my_data, aes(x = var_1, y = var_2)) +
  geom_point(size = 5, color = "red", alpha = 0.5) +
  geom_line(linewidth = 0.8, color = "brown", linetype = 2) +
  scale_x_continuous(breaks = seq(from = 0, to = 30, by = 5))
The theme() function can help you modify various elements of your plot. Here we will adjust the horizontal justification (hjust) of the plot title.
ggplot(my_data, aes(x = var_1, y = var_2)) +
  geom_point(size = 5, color = "red", alpha = 0.5) +
  geom_line(linewidth = 0.8, color = "brown", linetype = 2) +
  labs(title = "My plot of var1 vs var2") +
  theme(plot.title = element_text(hjust = 0.5, size = 20))
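The theme() function always takes:
1. the name of the plot element you want to modify (see ?theme() for the options, e.g., plot.title, axis.title, axis.ticks, etc.)
2. the type of element you are modifying: element_text(), element_line(), element_rect(), or element_blank() (to remove it)
3. the modification you are making, e.g., size, color, fill, face, alpha, angle
Common values and arguments:
legend position: "top", "bottom", "right", "left", "none"
element_rect(): size, color, fill, linetype
element_line(): size, color, linetype
(in ggplot2 3.4.0 and later, use linewidth instead of size for lines and rectangles)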
ggplot(my_data, aes(x = var_1, y = var_2)) + geom_point(size = 5, color = "red", alpha = 0.5) + labs(title = "My plot of var1 vs var2", x = "Variable 1") + theme(plot.title = element_text(hjust = 0.5, size = 20), axis.title.x = element_text(size = 16))
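The three parts that theme() takes can be combined in a single call. The sketch below uses R's built-in Orange data set; the particular elements and values chosen (angle = 45, "grey80", and so on) are arbitrary, for illustration only.

```r
library(ggplot2)

# Built-in Orange data set: tree circumference over age, for 5 trees
p <- ggplot(Orange, aes(x = age, y = circumference, color = Tree)) +
  geom_point() +
  theme(
    # element name: axis.text.x; element type: text; modifications: angle, face
    axis.text.x = element_text(angle = 45, face = "bold"),
    # element name: panel.background; element type: rectangle; modifications: fill, color
    panel.background = element_rect(fill = "white", color = "black"),
    # element name: panel.grid.major; element type: line; modifications: color, linetype
    panel.grid.major = element_line(color = "grey80", linetype = 2),
    # element_blank() removes an element entirely
    axis.ticks = element_blank(),
    # legend placement: "top", "bottom", "right", "left", or "none"
    legend.position = "bottom"
  )
p
```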
head(Orange, 3)
  Tree age circumference
1    1 118            30
2    1 484            58
3    1 664            87
To set the legend position with legend.position, use: “top”, “bottom”, “right”, “left”, or “none” (no legend)
ggplot(Orange, aes(x = Tree, y = circumference, fill = Tree)) + geom_boxplot() + theme(legend.position = "none")
Guide on how to create custom themes: https://rpubs.com/mclaire19/ggplot2-custom-themes
First, we will generate a data frame for demonstration purposes.
# create 4 vectors: 2x character class and 2x numeric class
item_categ <- rep(c("pasta", "rice"), each = 20)
item_ID <- rep(seq(from = 1, to = 4), each = 10)
item_ID <- paste0("ID_", item_ID)
observation_time <- rep(seq(from = 1, to = 10), times = 4)
# sample() is random, so your values may differ from the output below
item_price_change <- c(sample(0.5:2.5, size = 10, replace = TRUE),
                       sample(0:1, size = 10, replace = TRUE),
                       sample(2:5, size = 10, replace = TRUE),
                       sample(6:9, size = 10, replace = TRUE))
# use 4 vectors to create data frame with 4 columns
food <- tibble(item_ID, item_categ, observation_time, item_price_change)
food
# A tibble: 40 × 4
   item_ID item_categ observation_time item_price_change
   <chr>   <chr>                 <int>             <dbl>
 1 ID_1    pasta                     1               0.5
 2 ID_1    pasta                     2               2.5
 3 ID_1    pasta                     3               1.5
 4 ID_1    pasta                     4               0.5
 5 ID_1    pasta                     5               0.5
 6 ID_1    pasta                     6               0.5
 7 ID_1    pasta                     7               0.5
 8 ID_1    pasta                     8               0.5
 9 ID_1    pasta                     9               1.5
10 ID_1    pasta                    10               1.5
# … with 30 more rows
ggplot(food, aes(x = observation_time, y = item_price_change)) + geom_line()
Using group in plots
You can use the group element in a mapping to indicate that each item_ID will have a separate price line.
ggplot(food, aes(x = observation_time, y = item_price_change, group = item_ID)) + geom_line()
ggplot(food, aes(x = observation_time, y = item_price_change, color = item_ID)) + geom_line()
ggplot(food, aes(x = observation_time, y = item_price_change, color = item_categ)) + geom_line()
ggplot(food, aes(x = observation_time, y = item_price_change, group = item_ID, color = item_categ)) + geom_line()
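Why does the plot with only color = item_categ look tangled? ggplot2 groups lines by the discrete variables in the mapping, so color = item_categ alone gives one line per category that zigzags between that category's two items. A minimal sketch, using a small hypothetical food_small data set in place of food, counts the groups ggplot2 actually draws with layer_data():

```r
library(tibble)
library(ggplot2)

# Hypothetical small version of the food data: 2 items per category, 3 time points
food_small <- tibble(
  item_ID = rep(c("ID_1", "ID_2", "ID_3", "ID_4"), each = 3),
  item_categ = rep(c("pasta", "rice"), each = 6),
  observation_time = rep(1:3, times = 4),
  item_price_change = c(1, 2, 1, 0, 1, 0, 3, 4, 5, 7, 8, 6)
)

# color alone: lines are grouped by category only (2 groups)
p_categ <- ggplot(food_small, aes(x = observation_time, y = item_price_change,
                                  color = item_categ)) +
  geom_line()

# group + color: one line per item (4 groups), colored by category
p_item <- ggplot(food_small, aes(x = observation_time, y = item_price_change,
                                 group = item_ID, color = item_categ)) +
  geom_line()

# layer_data() returns the data ggplot2 draws for a layer, including a group column
length(unique(layer_data(p_categ)$group))
length(unique(layer_data(p_item)$group))
```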
Two options for faceting (splitting a plot into panels):
facet_grid() - creates a grid shape
facet_wrap() - more flexible
You need to specify how you are faceting with the ~ sign.
ggplot(food, aes(x = observation_time, y = item_price_change, color = item_ID)) + geom_line() + facet_grid( ~ item_categ)
In facet_wrap(), ncol and nrow can specify the layout, and scales = "free_x", scales = "free_y", or scales = "free" let the axis scales differ between panels.
rp_fac_plot <- ggplot(food, aes(x = observation_time, y = item_price_change, color = item_ID)) +
  geom_line() +
  geom_point() +
  facet_wrap( ~ item_categ, ncol = 1, scales = "free")
rp_fac_plot
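In the ~ formula, variables on the left are faceted into rows and variables on the right into columns ("." means "nothing here"). A minimal sketch with a hypothetical two-category data set, food_small, comparing the layouts; each plot ends up with one panel per category:

```r
library(tibble)
library(ggplot2)

# Hypothetical small data with two categories
food_small <- tibble(
  item_categ = rep(c("pasta", "rice"), each = 4),
  observation_time = rep(1:4, times = 2),
  item_price_change = c(1, 2, 1, 2, 5, 6, 7, 6)
)

base <- ggplot(food_small, aes(x = observation_time, y = item_price_change)) +
  geom_line()

p_cols <- base + facet_grid(~ item_categ)    # categories as columns: pasta | rice
p_rows <- base + facet_grid(item_categ ~ .)  # categories as rows: pasta above rice
p_wrap <- base + facet_wrap(~ item_categ, ncol = 1, scales = "free_y") # wrapped into one column
p_rows
```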
NOTE: color is needed for points and lines, fill generally needed for boxes and bars
ggplot(food, aes(x = item_ID, y = item_price_change, color = item_categ)) + geom_boxplot()
ggplot(food, aes(x = item_ID, y = item_price_change, fill = item_categ)) + geom_boxplot()
The + can’t come at the start of a new line. This will not work! Also, don’t use pipes (%>%) instead of +!
# Does not work: the + starts a new line
ggplot(food, aes(x = item_ID, y = item_price_change, fill = item_categ))
+ geom_boxplot()
You can add the width argument to geom_jitter() to make the jitter narrower.
ggplot(food, aes(x = item_ID, y = item_price_change, fill = item_categ)) + geom_boxplot() + geom_jitter(width = .06)
scale_fill_viridis_d() for discrete/categorical data
scale_fill_viridis_c() for continuous data
ggplot(food, aes(x = item_ID, y = item_price_change, fill = item_categ)) + geom_boxplot() + geom_jitter(width = .06) + scale_fill_viridis_d()
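scale_fill_viridis_c() is the counterpart for a numeric fill. A minimal sketch with a hypothetical grid of values (grid_data) drawn as colored blocks with geom_tile():

```r
library(ggplot2)

# Hypothetical 5 x 5 grid; fill is mapped to a continuous (numeric) variable
grid_data <- expand.grid(x = 1:5, y = 1:5)
grid_data$value <- grid_data$x * grid_data$y

p <- ggplot(grid_data, aes(x = x, y = y, fill = value)) +
  geom_tile() +          # blocks filled with color
  scale_fill_viridis_c() # continuous viridis palette for a numeric fill
p
```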
food_bar <- food %>%
  group_by(item_categ) %>%
  summarize(max_price_change = max(item_price_change)) %>%
  ggplot(aes(x = item_categ, y = max_price_change, fill = item_categ)) +
  scale_fill_viridis_d() +
  geom_col() +
  theme(legend.position = "none")
food_bar
Specifying color outside of aes() can be used to add an outline around column/bar plots.
food_bar + geom_col(color = "black")
geom_bar() can only take one aes mapping (it counts the rows for each x), while geom_col() can take two (x and y)
👀 May not be plotting what you think you are…
ggplot(food, aes(x = item_ID, y = item_price_change, fill = item_categ)) + geom_col()
head(food)
# A tibble: 6 × 4
  item_ID item_categ observation_time item_price_change
  <chr>   <chr>                 <int>             <dbl>
1 ID_1    pasta                     1               0.5
2 ID_1    pasta                     2               2.5
3 ID_1    pasta                     3               1.5
4 ID_1    pasta                     4               0.5
5 ID_1    pasta                     5               0.5
6 ID_1    pasta                     6               0.5
food %>% group_by(item_ID) %>% summarize(sum = sum(item_price_change))
# A tibble: 4 × 2
  item_ID   sum
  <chr>   <dbl>
1 ID_1       10
2 ID_2        5
3 ID_3       32
4 ID_4       75
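These sums are the bar heights geom_col() draws, because it stacks every row that shares an x value. A minimal sketch with hypothetical data (food_small) contrasting geom_bar() (counts rows), geom_col() on the raw rows (height = sum of y), and geom_col() after summarizing to one value per bar (here the mean, an arbitrary choice):

```r
library(tibble)
library(dplyr)
library(ggplot2)

# Hypothetical data: three rows per item_ID
food_small <- tibble(
  item_ID = rep(c("ID_1", "ID_2"), each = 3),
  item_price_change = c(1, 2, 3, 0, 1, 1)
)

# geom_bar(): only x is mapped; bar height = number of rows per x
p_bar <- ggplot(food_small, aes(x = item_ID)) + geom_bar()

# geom_col(): x and y are mapped; rows sharing an x are stacked, so height = sum of y
p_col <- ggplot(food_small, aes(x = item_ID, y = item_price_change)) + geom_col()

# To plot one value per bar, summarize first, then use geom_col()
food_mean <- food_small %>%
  group_by(item_ID) %>%
  summarize(mean_change = mean(item_price_change))
p_mean <- ggplot(food_mean, aes(x = item_ID, y = mean_change)) + geom_col()
p_mean
```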
food_bar + theme(text = element_text(size = 20))
directlabels package
Great for adding labels directly onto plots
https://www.opencasestudies.org/ocs-bp-co2-emissions/
# install.packages("directlabels")
library(directlabels)
direct.label(rp_fac_plot, method = list("angled.boxes"))
plotly package: turns a ggplot into an interactive plot
# install.packages("plotly")
library(plotly)
ggplotly(rp_fac_plot)
Also check out the ggiraph package
Saving plots: there are a few options, such as ggsave():
ggsave(filename = "saved_plot.png", # will save in working directory
       plot = rp_fac_plot,
       width = 6, height = 3.5) # in inches by default
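Two other ways to save a plot, sketched with R's built-in Orange data and hypothetical file names in a temporary folder: ggsave() with the units argument set explicitly, and the base R approach of opening a graphics device such as pdf(), printing the plot, and closing the device with dev.off().

```r
library(ggplot2)

p <- ggplot(Orange, aes(x = age, y = circumference, color = Tree)) + geom_line()

# Option: ggsave() with explicit units (the file type comes from the extension)
out_ggsave <- file.path(tempdir(), "orange_ggsave.pdf") # hypothetical file name
ggsave(filename = out_ggsave, plot = p, width = 15, height = 9, units = "cm")

# Option: open a graphics device, draw the plot, then close the device
out_device <- file.path(tempdir(), "orange_device.pdf") # hypothetical file name
pdf(out_device, width = 6, height = 3.5) # pdf() sizes are in inches
print(p) # print() draws the plot; required inside functions, loops, or source()d scripts
invisible(dev.off())
```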