

esquisse and ggplot2

More customization:
Easier plot automation (creating plots in scripts)
Faster (eventually)
A package for producing graphics - gg = Grammar of Graphics
Created by Hadley Wickham in 2005
Belongs to the “Tidyverse” family of packages
“Make a ggplot” = make a plot using the ggplot2 package
Resources:
Based on the idea of:
layering
plot objects are placed on top of each other with +
📉 ➕ 📈
Pros: extremely powerful/flexible – allows combining multiple plot elements together, allows extensive customization of a plot’s look, many resources online
Cons: you need to learn the ggplot2-specific “grammar of graphics” for constructing a plot
To make graphics using ggplot2, our data needs to be in a tidy format
Tidy data: Each variable forms a column. Each observation forms a row.
Messy data: Column headers are values, not variable names.
Read more about tidy data and see other examples: Tidy Data tutorial by Hadley Wickham
It’s also helpful to have data in long format!!!
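As a minimal sketch of what long format means, the example below reshapes a small wide table with pivot_longer() from the tidyr package. The `wide` data, its values, and the column names `year` and `price` are invented for illustration. The year values stored in the column headers become a regular column.

```r
library(tibble)
library(tidyr)

# Hypothetical wide ("messy") data: the headers 2020 and 2021 are values, not variable names
wide <- tibble(item = c("pasta", "rice"),
               `2020` = c(1.5, 2.0),
               `2021` = c(1.8, 2.4))

# Long format: one row per item-year combination
long <- pivot_longer(wide,
                     cols = c(`2020`, `2021`),
                     names_to = "year",
                     values_to = "price")
long
```

With the data in this shape, `year` and `price` can each be mapped to a plot element in aes().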
library(tibble) # tibble() comes from the tibble package (part of the tidyverse)
set.seed(3)
var_1 <- seq(from = 1, to = 30)
var_2 <- rnorm(30)
my_data <- tibble(var_1, var_2)
my_data
# A tibble: 30 × 2
   var_1   var_2
   <int>   <dbl>
 1     1 -0.962
 2     2 -0.293
 3     3  0.259
 4     4 -1.15
 5     5  0.196
 6     6  0.0301
 7     7  0.0854
 8     8  1.12
 9     9 -1.22
10    10  1.27
# … with 20 more rows
With the ggplot2 package, the first line of code will set up the plot - it will be empty!
The aesthetic mapping (mapping = aes(x = , y = )) describes how variables in our data are mapped to elements of the plot
library(ggplot2) # don't forget to load ggplot2
# This is not code but shows the general format
ggplot({data_to_plot}, mapping = aes(x = {var in data to plot},
                                     y = {var in data to plot}))
ggplot(my_data, mapping = aes(x = var_1, y = var_2))
There are many geoms to choose from, to list just a few:
geom_point() - points (we have seen)
geom_line() - lines to connect observations
geom_boxplot()
geom_histogram()
geom_bar()
geom_col()
geom_errorbar()
geom_density()
geom_tile() - blocks filled with color
Use the + sign to add the next layer to specify the type of plot
ggplot({data_to_plot}, mapping = aes(x = {var in data to plot},
                                     y = {var in data to plot})) +
  geom_{type of plot}
ggplot(my_data, mapping = aes(x = var_1, y = var_2)) + geom_point()
Read as: add points to the plot (use data as provided by the aesthetic mapping)
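To make the layering idea concrete, here is a small sketch that counts the layers stored in a plot object before and after adding a geom. `demo_data` is a stand-in for my_data so the code runs on its own, and looking inside the object with `$layers` relies on an internal detail of ggplot2 objects, so treat it as an illustration only.

```r
library(ggplot2)

# Stand-in data so the sketch is self-contained
demo_data <- data.frame(var_1 = 1:30, var_2 = rnorm(30))

# The first call only sets up the plot: data and mapping, no layers yet
base_plot <- ggplot(demo_data, mapping = aes(x = var_1, y = var_2))

# Adding a geom with + puts one layer on top of the empty plot
point_plot <- base_plot + geom_point()

length(base_plot$layers)   # layers before adding a geom
length(point_plot$layers)  # layers after adding geom_point()
```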
plt1 <- ggplot(my_data, aes(x = var_1, y = var_2)) + geom_point()
plt2 <- ggplot(my_data, aes(x = var_1, y = var_2)) + geom_line()
plt1; plt2 # to have 2 plots printed next to each other on a slide
Also check out the patchwork package
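A minimal sketch of how patchwork can combine two plots, assuming the package is installed. `demo_data` is a stand-in for my_data. With patchwork loaded, + places plots side by side and / stacks them.

```r
library(ggplot2)
library(patchwork) # install.packages("patchwork") if needed

# Stand-in data so the sketch is self-contained
demo_data <- data.frame(var_1 = 1:30, var_2 = rnorm(30))

plt1 <- ggplot(demo_data, aes(x = var_1, y = var_2)) + geom_point()
plt2 <- ggplot(demo_data, aes(x = var_1, y = var_2)) + geom_line()

side_by_side <- plt1 + plt2 # two plots next to each other
stacked <- plt1 / plt2      # one plot on top of the other
side_by_side
```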
Layer a plot on top of another plot with +
ggplot(my_data, aes(x = var_1, y = var_2)) + geom_point() + geom_line()
You can change look of each layer separately.
ggplot(my_data, aes(x = var_1, y = var_2)) + geom_point(size = 5, color = "red", alpha = 0.5) + geom_line(size = 0.8, color = "black", linetype = 2)
You can change the look of whole plot using theme_*() functions.
ggplot(my_data, aes(x = var_1, y = var_2)) + geom_point(size = 5, color = "red", alpha = 0.5) + geom_line(size = 0.8, color = "brown", linetype = 2) + theme_dark()
You can also change specific elements of the whole plot’s look, like the font and font size, or even use more fonts.
ggplot(my_data, aes(x = var_1, y = var_2)) + geom_point(size = 5, color = "red", alpha = 0.5) + geom_line(size = 0.8, color = "brown", linetype = 2) + theme_bw(base_size = 20, base_family = "Comic Sans MS")
The labs() function can help you add or modify titles on your plot.
ggplot(my_data, aes(x = var_1, y = var_2)) +
geom_point(size = 5, color = "red", alpha = 0.5) +
geom_line(size = 0.8, color = "brown", linetype = 2) +
labs(title = "My plot of var1 vs var2",
x = "Variable 1",
y = "Variable 2")
xlim() and ylim() can specify the limits for each axis
ggplot(my_data, aes(x = var_1, y = var_2)) + geom_point(size = 5, color = "red", alpha = 0.5) + geom_line(size = 0.8, color = "brown", linetype = 2) + labs(title = "My plot of var1 vs var2") + xlim(0,40)
scale_x_continuous() and scale_y_continuous() can change how the axis is plotted. Use the breaks argument to specify where you want the axis ticks to be.
seq(from = 0, to = 30, by = 5)
[1] 0 5 10 15 20 25 30
ggplot(my_data, aes(x = var_1, y = var_2)) + geom_point(size = 5, color = "red", alpha = 0.5) + geom_line(size = 0.8, color = "brown", linetype = 2) + scale_x_continuous(breaks = seq(from = 0, to = 30, by = 5))
The theme() function can help you modify various elements of your plot. Here we will adjust the horizontal justification (hjust) of the plot title.
ggplot(my_data, aes(x = var_1, y = var_2)) + geom_point(size = 5, color = "red", alpha = 0.5) + geom_line(size = 0.8, color = "brown", linetype = 2) + labs(title = "My plot of var1 vs var2") + theme(plot.title = element_text(hjust = 0.5, size = 20))
The theme() function always takes:
an object to change (use ?theme() to see - plot.title, axis.title, axis.ticks etc.)
the aspect you are changing about it: element_text(), element_line(), element_rect(), element_blank()
what you are changing:
  text: size, color, fill, face, alpha, angle
  position: "top", "bottom", "right", "left", "none"
  rectangle: size, color, fill, linetype
  line: size, color, linetype
ggplot(my_data, aes(x = var_1, y = var_2)) +
geom_point(size = 5, color = "red", alpha = 0.5) +
labs(title = "My plot of var1 vs var2", x = "Variable 1") +
theme(plot.title = element_text(hjust = 0.5, size = 20),
axis.title.x = element_text(size = 16))
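The example above only uses element_text(). As a sketch of the other element types, the plot below changes a rectangle, a line, and removes an element with element_blank(). `demo_data` and the specific colors are arbitrary choices for illustration.

```r
library(ggplot2)

# Stand-in data so the sketch is self-contained
demo_data <- data.frame(var_1 = 1:30, var_2 = rnorm(30))

themed_plot <- ggplot(demo_data, aes(x = var_1, y = var_2)) +
  geom_point() +
  labs(title = "Theme elements demo") +
  theme(plot.title = element_text(face = "bold", hjust = 0.5),            # text
        panel.background = element_rect(fill = "white", color = "black"), # rectangle
        panel.grid.major = element_line(color = "grey80", linetype = 2),  # line
        panel.grid.minor = element_blank())                               # removed
themed_plot
```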
head(Orange, 3)
  Tree age circumference
1    1 118            30
2    1 484            58
3    1 664            87
If specifying position - use: “top”, “bottom”, “right”, “left”, “none”
ggplot(Orange, aes(x = Tree, y = circumference, fill = Tree)) + geom_boxplot() + theme(legend.position = "none")
Guide on how to: https://rpubs.com/mclaire19/ggplot2-custom-themes
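One common way to build a reusable custom theme (a sketch, not necessarily the approach the linked guide takes) is to wrap an existing theme plus a few theme() changes in a function. The name `theme_workshop()` and its settings are hypothetical.

```r
library(ggplot2)

# Hypothetical custom theme: theme_bw() with a centered title and the legend at the bottom
theme_workshop <- function(base_size = 16) {
  theme_bw(base_size = base_size) +
    theme(plot.title = element_text(hjust = 0.5),
          legend.position = "bottom")
}

# Use it like any built-in theme_*() function
orange_plot <- ggplot(Orange, aes(x = age, y = circumference, color = Tree)) +
  geom_line() +
  labs(title = "Orange tree growth") +
  theme_workshop()
orange_plot
```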
First, we will generate a data frame for the purpose of demonstration.
# create 4 vectors: 2x character class and 2x numeric class
item_categ <- rep(c("pasta", "rice"),each = 20)
item_ID <- rep(seq(from = 1, to = 4), each = 10)
item_ID <- paste0("ID_", item_ID)
observation_time <- rep(seq(from = 1, to = 10), times = 4)
item_price_change <- c(sample(0.5:2.5, size = 10, replace = TRUE),
sample(0:1, size = 10, replace = TRUE),
sample(2:5, size = 10, replace = TRUE),
sample(6:9, size = 10, replace = TRUE))
# use 4 vectors to create data frame with 4 columns
food <- tibble(item_ID, item_categ, observation_time, item_price_change)
food
# A tibble: 40 × 4
   item_ID item_categ observation_time item_price_change
   <chr>   <chr>                 <int>             <dbl>
 1 ID_1    pasta                     1               0.5
 2 ID_1    pasta                     2               2.5
 3 ID_1    pasta                     3               1.5
 4 ID_1    pasta                     4               0.5
 5 ID_1    pasta                     5               0.5
 6 ID_1    pasta                     6               0.5
 7 ID_1    pasta                     7               0.5
 8 ID_1    pasta                     8               0.5
 9 ID_1    pasta                     9               1.5
10 ID_1    pasta                    10               1.5
# … with 30 more rows
ggplot(food, aes(x = observation_time,
y = item_price_change)) +
geom_line()
group in plots
You can use the group element in a mapping to indicate that each item_ID will have a separate price line.
ggplot(food, aes(x = observation_time,
y = item_price_change,
group = item_ID)) +
geom_line()
ggplot(food, aes(x = observation_time,
y = item_price_change,
color = item_ID)) +
geom_line()
ggplot(food, aes(x = observation_time,
y = item_price_change,
color = item_categ)) +
geom_line()
ggplot(food, aes(x = observation_time,
y = item_price_change,
group = item_ID,
color = item_categ)) +
geom_line()
Two options:
facet_grid() - creates a grid shape
facet_wrap() - more flexible
Need to specify how you are faceting with the ~ sign.
ggplot(food, aes(x = observation_time,
y = item_price_change,
color = item_ID)) +
geom_line() +
facet_grid( ~ item_categ)
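As a sketch of how the ~ sign controls the layout in facet_grid(): the variable on the right of ~ defines the columns and the variable on the left defines the rows (a dot means "nothing on this side"). `demo_food` is a small made-up stand-in for food.

```r
library(ggplot2)

# Small stand-in data set so the sketch is self-contained
demo_food <- data.frame(
  item_categ = rep(c("pasta", "rice"), each = 10),
  observation_time = rep(1:10, times = 2),
  item_price_change = c(seq(0.5, 5, by = 0.5), seq(10, 1, by = -1))
)

base_plot <- ggplot(demo_food, aes(x = observation_time, y = item_price_change)) +
  geom_line()

cols_plot <- base_plot + facet_grid(. ~ item_categ) # one column per category
rows_plot <- base_plot + facet_grid(item_categ ~ .) # one row per category
cols_plot
rows_plot
```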
ncol and nrow can specify the layout
Axes can have different scales with scales = "free_x", scales = "free_y", or scales = "free"
rp_fac_plot <- ggplot(food, aes(x = observation_time,
y = item_price_change,
color = item_ID)) +
geom_line() +
geom_point() +
facet_wrap( ~ item_categ, ncol = 1, scales = "free")
rp_fac_plot
NOTE: color is needed for points and lines; fill is generally needed for boxes and bars
ggplot(food, aes(x = item_ID,
y = item_price_change,
color = item_categ)) +
geom_boxplot()
ggplot(food, aes(x = item_ID,
y = item_price_change,
fill = item_categ)) +
geom_boxplot()
+ can’t come at the start of a new line
This will not work! Also, don’t use pipes instead of +!
ggplot(food, aes(x = item_ID,
y = item_price_change,
fill = item_categ))
+ geom_boxplot()
You can add the width argument to make the jitter narrower.
ggplot(food, aes(x = item_ID,
y = item_price_change,
fill = item_categ)) +
geom_boxplot() +
geom_jitter(width = .06)
scale_fill_viridis_d() for discrete/categorical data
scale_fill_viridis_c() for continuous data
ggplot(food, aes(x = item_ID,
y = item_price_change,
fill = item_categ)) +
geom_boxplot() +
geom_jitter(width = .06) +
scale_fill_viridis_d()
food_bar <- food %>%
group_by(item_categ) %>%
summarize("max_price_change" = max(item_price_change)) %>%
ggplot(aes(x = item_categ,
y = max_price_change,
fill = item_categ)) +
scale_fill_viridis_d() +
geom_col() +
theme(legend.position = "none")
food_bar
Setting color outside of aes() can be used to add an outline around column/bar plots.
food_bar + geom_col(color = "black")
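A short sketch contrasting the two placements: inside aes(), fill is mapped to a variable, so each category gets its own fill color; outside aes(), color is a fixed setting applied to every bar. `demo_bar` and its values are made up for illustration.

```r
library(ggplot2)

# Made-up summary data so the sketch is self-contained
demo_bar <- data.frame(item_categ = c("pasta", "rice"),
                       max_price_change = c(2.5, 9))

outlined_plot <- ggplot(demo_bar, aes(x = item_categ,
                                      y = max_price_change,
                                      fill = item_categ)) + # mapped: varies by category
  geom_col(color = "black")                                 # set: same outline for all bars
outlined_plot
```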
geom_bar() can have only one aes mapping and geom_col() can have two
👀 You may not be plotting what you think you are…
ggplot(food, aes(x = item_ID,
y = item_price_change,
fill = item_categ)) +
geom_col()
head(food)
# A tibble: 6 × 4
  item_ID item_categ observation_time item_price_change
  <chr>   <chr>                 <int>             <dbl>
1 ID_1    pasta                     1               0.5
2 ID_1    pasta                     2               2.5
3 ID_1    pasta                     3               1.5
4 ID_1    pasta                     4               0.5
5 ID_1    pasta                     5               0.5
6 ID_1    pasta                     6               0.5
food %>% group_by(item_ID) %>% summarize(sum = sum(item_price_change))
# A tibble: 4 × 2
  item_ID   sum
  <chr>   <dbl>
1 ID_1       10
2 ID_2        5
3 ID_3       32
4 ID_4       75
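Since geom_col() on the raw rows stacks every observation and so shows the sum per item_ID, one way to control what the bar height means is to summarize first. This is a sketch on a small made-up data set, `demo_food`, and the choice of the mean is only an example.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in for food with several rows per item_ID
demo_food <- tibble(
  item_ID = rep(c("ID_1", "ID_2"), each = 3),
  item_price_change = c(0.5, 2.5, 1.5, 1, 0, 1)
)

# Summarize to one row per item_ID so each bar shows exactly one value
food_means <- demo_food %>%
  group_by(item_ID) %>%
  summarize(mean_price_change = mean(item_price_change))

mean_plot <- ggplot(food_means, aes(x = item_ID, y = mean_price_change)) +
  geom_col()
mean_plot
```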
food_bar + theme(text = element_text(size = 20))
directlabels package
Great for adding labels directly onto plots
https://www.opencasestudies.org/ocs-bp-co2-emissions/
#install.packages("directlabels")
library(directlabels)
direct.label(rp_fac_plot, method = list("angled.boxes"))
#install.packages("plotly")
library("plotly")
ggplotly(rp_fac_plot)
Also check out the ggiraph package
A few options:
ggsave(filename = "saved_plot.png", # will save in working directory
plot = rp_fac_plot,
width = 6, height = 3.5) # in inches by default
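ggsave() also accepts units and dpi arguments. A minimal sketch, assuming you want centimeters and a 300 dpi PNG; the file name is hypothetical and the file is written to a temporary directory so nothing in the working directory is touched.

```r
library(ggplot2)

# Simple plot from a built-in data set so the sketch is self-contained
demo_plot <- ggplot(Orange, aes(x = age, y = circumference)) +
  geom_point()

# Hypothetical file name inside the session's temporary directory
out_file <- file.path(tempdir(), "saved_plot_demo.png")

ggsave(filename = out_file,
       plot = demo_plot,
       width = 15, height = 9, units = "cm", # size given in centimeters
       dpi = 300)                            # resolution for raster output

file.exists(out_file)
```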