This analysis has been adapted from this refine.bio-examples notebook
In this analysis, we will use this acute myeloid leukemia sample dataset from Shih et al., 2017 and pre-processed by refinebio.
The data that we downloaded from refine.bio for this analysis has 19 samples (obtained from 19 acute myeloid leukemia (AML) model mice), containing RNA-sequencing results for types of AML under controlled treatment conditions.
This dataset can be downloaded from this page on refine.bio. We will download it already processed and quantile normalized.
# Create the data folder if it doesn't exist
if (!dir.exists("data")) {
dir.create("data")
}
# Define the file path to the plots directory
plots_dir <- "plots"
# Create the plots folder if it doesn't exist
if (!dir.exists(plots_dir)) {
dir.create(plots_dir)
}
# Define the file path to the results directory
results_dir <- "results"
# Create the results folder if it doesn't exist
if (!dir.exists(results_dir)) {
dir.create(results_dir)
}
# Define the file path to the data directory
data_dir <- file.path("data", "SRP070849")
# Declare the file path to the gene expression matrix file
data_file <- file.path(data_dir, "SRP070849.tsv")
# Declare the file path to the metadata file
# inside the directory saved as `data_dir`
metadata_file <- file.path(data_dir, "metadata_SRP070849.tsv")
We will use []pheatmap
by Slowikowski et al., 2017](https://www.rdocumentation.org/packages/pheatmap/versions/1.0.12/topics/pheatmap)
for clustering and creating a heatmap.
if (!("pheatmap" %in% installed.packages())) {
# Install pheatmap
install.packages("pheatmap", update = FALSE)
}
Attach the pheatmap
and magrittr
libraries:
# Attach the `pheatmap` library
library(pheatmap)
# We will need this so we can use the pipe: %>%
library(magrittr)
# Set the seed so our results are reproducible:
set.seed(12345)
This chunk of code will read in both TSV files and add them as data frames to your environment.
# Read in metadata TSV file
metadata <- readr::read_tsv(metadata_file)
Rows: 19 Columns: 23
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (11): refinebio_accession_code, experiment_accession, refinebio_organism...
dbl (1): refinebio_processor_id
lgl (11): refinebio_age, refinebio_cell_line, refinebio_compound, refinebio_...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Read in data TSV file
expression_df <- readr::read_tsv(data_file) %>%
# Here we are going to store the gene IDs as row names so that
# we can have only numeric values to perform calculations on later
tibble::column_to_rownames("Gene")
Rows: 41249 Columns: 20
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): Gene
dbl (19): SRR3189679, SRR3189680, SRR3189681, SRR3189682, SRR3189683, SRR318...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Let’s take a look at the metadata object that we read into the R environment.
head(metadata)
Now let’s ensure that the metadata and data are in the same sample order.
# Make the data in the order of the metadata
expression_df <- expression_df %>%
dplyr::select(metadata$refinebio_accession_code)
# Check if this is in the same order
all.equal(colnames(expression_df), metadata$refinebio_accession_code)
[1] TRUE
Now we are going to use the pheatmap
package to look at
how are samples and genes are clustering.
For this example, we will sort genes by variance and select genes in the upper quartile, but there are many alternative criterion by which you may want to sort your genes, e.g. fold change, t-statistic, membership in a particular gene ontology, so on.
# Calculate the variance for each gene
variances <- apply(expression_df, 1, var)
# Determine the upper quartile variance cutoff value
upper_var <- quantile(variances, 0.75)
# Filter the data choosing only genes whose variances are in the upper quartile
df_by_var <- data.frame(expression_df) %>%
dplyr::filter(variances > upper_var)
Let’s save these to our results folder as a TSV file.
readr::write_tsv(df_by_var, file.path(results_dir, "top_90_var_genes.tsv"))
From the accompanying paper, we know that
the mice with IDH2
mutant AML were treated with vehicle or
AG-221 (the first small molecule in-vivo inhibitor of IDH2 to enter
clinical trials) and the mice with TET2
mutant AML were
treated with vehicle or 5-Azacytidine (Decitabine, hypomethylating
agent) Shih et al.,
2017.
# Let's prepare the annotation for the uncollapsed `DESeqData` set object
# which will be used to annotate the heatmap
annotation_df <- metadata %>%
# Create a variable to store the cancer type information
dplyr::mutate(
mutation = dplyr::case_when(
startsWith(refinebio_title, "TET2") ~ "TET2",
startsWith(refinebio_title, "IDH2") ~ "IDH2",
startsWith(refinebio_title, "WT") ~ "WT",
# If none of the above criteria are satisfied,
# we mark the `mutation` variable as "unknown"
TRUE ~ "unknown"
)
) %>%
# select only the columns we need for annotation
dplyr::select(
refinebio_accession_code,
mutation,
refinebio_treatment
) %>%
# The `pheatmap()` function requires that the row names of our annotation
# data frame match the column names of our `DESeaDataSet` object
tibble::column_to_rownames("refinebio_accession_code")
# Create and store the annotated heatmap object
heatmap_annotated <-
pheatmap(
df_by_var,
cluster_rows = TRUE,
cluster_cols = TRUE,
show_rownames = FALSE,
annotation_col = annotation_df, # Specify our annotation here
main = "Annotated Heatmap",
colorRampPalette(c(
"deepskyblue",
"black",
"yellow"
))(25
),
scale = "row" # Scale values in the direction of genes (rows)
)
You can switch this to save to a JPEG or TIFF by changing the function and file name within the function to the respective file suffix.
# Open a PNG file
png(file.path(
plots_dir,
"aml_heatmap.png" # Replace with a relevant file name
))
# Print the heatmap
heatmap_annotated
# Close the PNG file:
dev.off()
png
2
# Print session info
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.4.3 (2025-02-28)
os Ubuntu 24.04.1 LTS
system x86_64, linux-gnu
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Etc/UTC
date 2025-03-25
pandoc 3.6.3 @ /usr/bin/ (via rmarkdown)
quarto 1.5.57 @ /usr/local/bin/quarto
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
bit 4.5.0.1 2024-12-03 [1] RSPM (R 4.4.0)
bit64 4.6.0-1 2025-01-16 [1] RSPM (R 4.4.0)
bslib 0.9.0 2025-01-30 [1] RSPM (R 4.4.0)
cachem 1.1.0 2024-05-16 [1] RSPM (R 4.4.0)
cli 3.6.4 2025-02-13 [1] RSPM (R 4.4.0)
colorspace 2.1-1 2024-07-26 [1] RSPM (R 4.4.0)
crayon 1.5.3 2024-06-20 [1] RSPM (R 4.4.0)
digest 0.6.37 2024-08-19 [1] RSPM (R 4.4.0)
dplyr 1.1.4 2023-11-17 [1] RSPM (R 4.4.0)
evaluate 1.0.3 2025-01-10 [1] RSPM (R 4.4.0)
farver 2.1.2 2024-05-13 [1] RSPM (R 4.4.0)
fastmap 1.2.0 2024-05-15 [1] RSPM (R 4.4.0)
generics 0.1.3 2022-07-05 [1] RSPM (R 4.4.0)
glue 1.8.0 2024-09-30 [1] RSPM (R 4.4.0)
gtable 0.3.6 2024-10-25 [1] RSPM (R 4.4.0)
hms 1.1.3 2023-03-21 [1] RSPM (R 4.4.0)
htmltools 0.5.8.1 2024-04-04 [1] RSPM (R 4.4.0)
jquerylib 0.1.4 2021-04-26 [1] RSPM (R 4.4.0)
jsonlite 1.9.0 2025-02-19 [1] RSPM (R 4.4.0)
knitr 1.49 2024-11-08 [1] RSPM (R 4.4.0)
lifecycle 1.0.4 2023-11-07 [1] RSPM (R 4.4.0)
magrittr * 2.0.3 2022-03-30 [1] RSPM (R 4.4.0)
munsell 0.5.1 2024-04-01 [1] RSPM (R 4.4.0)
pheatmap * 1.0.12 2019-01-04 [1] RSPM (R 4.4.0)
pillar 1.10.1 2025-01-07 [1] RSPM (R 4.4.0)
pkgconfig 2.0.3 2019-09-22 [1] RSPM (R 4.4.0)
R6 2.6.1 2025-02-15 [1] RSPM (R 4.4.0)
RColorBrewer 1.1-3 2022-04-03 [1] RSPM (R 4.4.0)
readr 2.1.5 2024-01-10 [1] RSPM (R 4.4.0)
rlang 1.1.5 2025-01-17 [1] RSPM (R 4.4.0)
rmarkdown 2.29 2024-11-04 [1] RSPM (R 4.4.0)
sass 0.4.9 2024-03-15 [1] RSPM (R 4.4.0)
scales 1.3.0 2023-11-28 [1] RSPM (R 4.4.0)
sessioninfo 1.2.3 2025-02-05 [1] RSPM (R 4.4.0)
tibble 3.2.1 2023-03-20 [1] RSPM (R 4.4.0)
tidyselect 1.2.1 2024-03-11 [1] RSPM (R 4.4.0)
tzdb 0.4.0 2023-05-12 [1] RSPM (R 4.4.0)
vctrs 0.6.5 2023-12-01 [1] RSPM (R 4.4.0)
vroom 1.6.5 2023-12-05 [1] RSPM (R 4.4.0)
withr 3.0.2 2024-10-28 [1] RSPM (R 4.4.0)
xfun 0.51 2025-02-19 [1] RSPM (R 4.4.0)
yaml 2.3.10 2024-07-26 [1] RSPM (R 4.4.0)
[1] /usr/local/lib/R/site-library
[2] /usr/local/lib/R/library
* ── Packages attached to the search path.
──────────────────────────────────────────────────────────────────────────────