This script demonstrates how to generate commonly used genomics visualization plots (bar plot, density plot, scatter plot, violin plot, and box plot) using the ggplot2 R package.
We use the GSE183947 gene expression data set, converted into a long format to allow flexible plotting with ggplot2. These plots are commonly used in transcriptomics and cancer genomics studies to visualize gene expression differences across biological conditions.
First we clean the R environment and load the required packages.
# Clean environment
rm(list = ls())
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 610660 32.7 1393680 74.5 702125 37.5
## Vcells 1137722 8.7 8388608 64.0 1927959 14.8
# Load libraries
library(dplyr)
library(ggplot2)
library(readr)
library(tidyr)
library(RColorBrewer)
The dataset used in this tutorial is a long-format gene expression table. Read it with the read_delim() function from the readr package.
data.long <- read_delim("long_format_expression_data_GSE183947.txt")
## Rows: 1214760 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (5): gene, samples, title, breast_tissue, metastasis
## dbl (1): expression
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dat <- data.long
#View initial rows of the data
head(dat)
## # A tibble: 6 × 6
## gene samples expression title breast_tissue metastasis
## <chr> <chr> <dbl> <chr> <chr> <chr>
## 1 TSPAN6 CA.102548 0.93 tumor rep1 tumor yes
## 2 TNMD CA.102548 0 tumor rep1 tumor yes
## 3 DPM1 CA.102548 0 tumor rep1 tumor yes
## 4 SCYL3 CA.102548 5.78 tumor rep1 tumor yes
## 5 C1orf112 CA.102548 2.83 tumor rep1 tumor yes
## 6 FGR CA.102548 4.8 tumor rep1 tumor yes
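The table above is already in long format, with one row per gene and sample. For readers starting from a wide expression matrix, the following is a minimal sketch of the reshape using tidyr's pivot_longer(). The table `wide` and its column names `sample_A` and `sample_B` are hypothetical and are not part of GSE183947.

```r
library(tidyr)

# Hypothetical wide table: one row per gene, one column per sample
wide <- data.frame(
  gene     = c("TP53", "ATM"),
  sample_A = c(38.3, 5.1),
  sample_B = c(12.7, 4.4)
)

# Reshape to long format: every column except `gene` becomes
# a (samples, expression) pair on its own row
long <- pivot_longer(
  wide,
  cols      = -gene,
  names_to  = "samples",
  values_to = "expression"
)

print(long)
```

Each gene now appears once per sample, which is the layout ggplot2 expects when a column such as the sample or tissue type is mapped to an aesthetic.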
The basic structure of a ggplot figure to make all the plots is:
ggplot(data, aes(x = variable_x, y = variable_y)) +
geom_*()
Examples:
| Plot Type | Function |
|---|---|
| Bar plot | geom_col() |
| Scatter plot | geom_point() |
| Density plot | geom_density() |
| Violin plot | geom_violin() |
| Box plot | geom_boxplot() |
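As a small illustration of the template, the sketch below stores the shared data and aesthetic mappings once and then adds different geoms to them. The data frame `toy` holds made-up values and is only an assumption for demonstration.

```r
library(ggplot2)

# Toy data (hypothetical values, not taken from GSE183947)
toy <- data.frame(
  group = rep(c("normal", "tumor"), each = 5),
  value = c(1.2, 2.3, 1.8, 2.9, 2.1, 4.5, 5.2, 3.9, 6.1, 4.8)
)

# Shared base: data plus aesthetic mappings, no geom yet
base <- ggplot(toy, aes(x = group, y = value))

# The same base combined with three different geoms
p_box    <- base + geom_boxplot()
p_violin <- base + geom_violin()
p_point  <- base + geom_point()

# Printing a plot object draws it, for example:
# print(p_box)
```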
In the template above, replace the asterisk in geom_*() with the name of the geom for the plot you want to make, as listed in the table.
# Select Genes of Interest
We select a subset of genes to demonstrate visualization.
genes_of_interest <- c(
'TP53','ATM','PIK3CA','BRCA1','BRCA2','CD163','MRC1','IL10','CD3E')
filtered_dat <- dat %>% filter(gene %in% genes_of_interest)
head(filtered_dat)
## # A tibble: 6 × 6
## gene samples expression title breast_tissue metastasis
## <chr> <chr> <dbl> <chr> <chr> <chr>
## 1 BRCA1 CA.102548 30.4 tumor rep1 tumor yes
## 2 MRC1 CA.102548 0.37 tumor rep1 tumor yes
## 3 PIK3CA CA.102548 0.03 tumor rep1 tumor yes
## 4 IL10 CA.102548 38.9 tumor rep1 tumor yes
## 5 BRCA2 CA.102548 0.38 tumor rep1 tumor yes
## 6 TP53 CA.102548 38.3 tumor rep1 tumor yes
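Filtering with %in% silently skips any requested name that does not occur in the table, for example because of a typo or an alias. A minimal base R sketch of a sanity check follows. The vector `genes_wanted`, the table `toy`, and the name `NOT_A_GENE` are all hypothetical.

```r
# Hypothetical gene list and toy table (not the real dataset)
genes_wanted <- c("TP53", "ATM", "NOT_A_GENE")
toy <- data.frame(
  gene       = c("TP53", "ATM", "BRCA1"),
  expression = c(38.3, 5.1, 30.4)
)

# Requested genes that are absent from the table
missing_genes <- setdiff(genes_wanted, unique(toy$gene))
print(missing_genes)

# Rows retained by the filter, and how many rows each gene contributes
kept <- toy[toy$gene %in% genes_wanted, ]
print(table(kept$gene))
```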
Next, we create a reusable theme so that all plots share a consistent look.
my_theme <- theme(
axis.line = element_line(colour = "black", linewidth = 0.75),
axis.text = element_text(colour = 'black',size=10, face='bold'),
axis.title = element_text(size=14, face='bold'),
axis.ticks = element_line(color='black', linewidth=1),
legend.position = "right",
legend.title = element_text(face='bold'),
legend.background = element_blank(),
legend.box.background = element_rect(colour="black"),
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
plot.background = element_rect(fill=NULL, colour='white'),
panel.background = element_rect(fill='white')
)
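Because a theme object is added to a plot with `+`, it can be layered on top of a built-in theme and then adjusted for a single plot. The sketch below uses a shortened, hypothetical `toy_theme` in place of my_theme, together with made-up data.

```r
library(ggplot2)

# Hypothetical shortened theme, standing in for my_theme
toy_theme <- theme(
  axis.title      = element_text(size = 14, face = "bold"),
  legend.position = "right"
)

# Made-up data for illustration only
toy <- data.frame(
  x     = c(1, 2, 3),
  y     = c(2, 4, 3),
  group = c("a", "b", "a")
)

p_themed <- ggplot(toy, aes(x = x, y = y, colour = group)) +
  geom_point() +
  theme_bw() +                        # start from a built-in complete theme
  toy_theme +                         # apply the reusable settings
  theme(legend.position = "bottom")   # override one setting for this plot only
```

Theme elements added later take precedence, so the legend ends up at the bottom even though `toy_theme` asks for the right-hand side.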
A bar plot comparing gene expression levels across tissue types.
ggplot(filtered_dat,
aes(x=gene, y=expression, fill=breast_tissue)) +
geom_col()
Next, add axis labels, a legend title, and custom fill colors (set manually with scale_fill_manual()) to the plot.
ggplot(filtered_dat, aes(x=gene, y=expression, fill=breast_tissue)) +
geom_col(position="dodge") +
scale_fill_manual(values=c(normal="#00AEF3", tumor="#E81B23")) +
labs(x="Genes", y="Expression (FPKM)", fill="Breast Tissue")
Adding the customized theme to the plot to make it visually appealing.
ggplot(filtered_dat, aes(x= gene, y= expression, fill= breast_tissue)) +
geom_col() +
scale_fill_manual(values=c(normal = "#00AEF3",tumor= "#E81B23")) +
labs(x= "Genes", y = "Expression (FPKM)", fill = "Breast Tissue") +
my_theme
Side-by-side tumor/normal bar plot for each gene.
ggplot(filtered_dat, aes(x = gene, y = expression, fill = breast_tissue)) +
geom_col(position = "dodge") +
scale_fill_manual(values = c(normal = "#00AEF3", tumor = "#E81B23")) +
labs(x= "Genes", y = "Expression (FPKM)", fill = "Breast Tissue") +
my_theme
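geom_col() draws one rectangle per row of the input, so when each gene has many samples the bars are stacked or overlaid rather than summarised. One common alternative, sketched here under the assumption that a mean per gene and tissue type is the quantity of interest, is to summarise with dplyr first. The data frame `toy` contains made-up values.

```r
library(dplyr)
library(ggplot2)

# Made-up data: two genes, two samples per tissue type
toy <- data.frame(
  gene          = rep(c("TP53", "ATM"), each = 4),
  breast_tissue = rep(c("normal", "normal", "tumor", "tumor"), times = 2),
  expression    = c(10, 20, 30, 50, 2, 4, 6, 10)
)

# One row per gene and tissue type, holding the mean expression
mean_expr <- toy %>%
  group_by(gene, breast_tissue) %>%
  summarise(mean_expression = mean(expression), .groups = "drop")

# Each bar now represents exactly one summarised value
p_mean <- ggplot(mean_expr,
                 aes(x = gene, y = mean_expression, fill = breast_tissue)) +
  geom_col(position = "dodge") +
  labs(x = "Genes", y = "Mean expression (FPKM)", fill = "Breast Tissue")
```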
Density plots help visualize the expression distribution of a gene. The example below uses TP53 expression.
dat %>%
filter(gene == 'TP53') %>%
ggplot(aes(x = expression, fill = breast_tissue)) +
geom_density(alpha = 0.3) +
scale_fill_manual(values = c(normal="#00AEF3",
tumor="#E81B23")) +
labs(x="Expression",
y="Density",
fill="Breast Tissue") +
my_theme
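Expression values are often strongly right-skewed, which can squeeze most of a density curve against the left edge of the plot. A frequently used option, shown here only as a sketch and not as part of the original analysis, is to plot log2(expression + 1) instead. The values in `toy` are made up.

```r
library(ggplot2)

# Made-up, right-skewed expression values
toy <- data.frame(
  breast_tissue = rep(c("normal", "tumor"), each = 6),
  expression    = c(0, 1, 3, 7, 15, 31, 0, 3, 15, 63, 255, 1023)
)

# The +1 pseudocount keeps zero expression defined on the log scale
toy$log_expression <- log2(toy$expression + 1)

p_log <- ggplot(toy, aes(x = log_expression, fill = breast_tissue)) +
  geom_density(alpha = 0.3) +
  labs(x = "log2(Expression + 1)", y = "Density", fill = "Breast Tissue")
```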
# Scatter plot of two genes: pivot_wider() from tidyr (the successor to spread()) gives each gene its own column
dat %>%
filter(gene == 'CD3E' | gene == 'IL10') %>%
pivot_wider(names_from = gene, values_from = expression) %>%
ggplot(aes(x = CD3E,
y = IL10,
color = breast_tissue)) +
geom_point() +
labs(x="CD3E Expression",
y="IL10 Expression",
color="Breast Tissue") +
my_theme
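The reshape step in the scatter plot code turns one row per gene and sample into one row per sample, with a separate column for each gene. The sketch below shows that intermediate table on made-up data and adds a correlation coefficient as one possible numeric companion to the plot. It uses pivot_wider(), tidyr's current equivalent of spread(). The sample names `S1` to `S4` are hypothetical.

```r
library(dplyr)
library(tidyr)

# Made-up long-format data for two genes in four samples
toy <- data.frame(
  samples       = rep(c("S1", "S2", "S3", "S4"), times = 2),
  breast_tissue = rep(c("normal", "normal", "tumor", "tumor"), times = 2),
  gene          = rep(c("CD3E", "IL10"), each = 4),
  expression    = c(1, 2, 3, 4, 2, 4, 6, 8)
)

# One row per sample, one column per gene
wide <- toy %>%
  pivot_wider(names_from = gene, values_from = expression)

print(wide)

# Pearson correlation between the two genes across samples
r <- cor(wide$CD3E, wide$IL10)
```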
Violin plots show distribution and density of expression values.
dat %>%
filter(gene %in% c('TP53','ATM','CD163')) %>%
ggplot(aes(x = gene,
y = expression,
fill = breast_tissue)) +
geom_violin(trim = FALSE, linewidth = 0.9) +
scale_fill_brewer(palette = "Paired") +
labs(x="Genes",
y="Expression (FPKM)",
fill="Breast Tissue") +
my_theme
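A violin shows the shape of the distribution but not its median or quartiles. One common way to add them, sketched here on simulated data, is to overlay a narrow box plot that shares the same dodge position as the violins. The object `dodge` and the simulated values in `toy` are assumptions for illustration.

```r
library(ggplot2)

set.seed(1)  # make the simulated values reproducible

# Simulated expression values: two genes, two tissue types, 10 samples each
toy <- data.frame(
  gene          = rep(c("TP53", "ATM"), each = 20),
  breast_tissue = rep(rep(c("normal", "tumor"), each = 10), times = 2),
  expression    = c(rnorm(10, 10, 2), rnorm(10, 14, 2),
                    rnorm(10, 5, 1),  rnorm(10, 6, 1))
)

# Use one dodge specification for both layers so the boxes sit inside the violins
dodge <- position_dodge(width = 0.9)

p_violin_box <- ggplot(toy, aes(x = gene, y = expression, fill = breast_tissue)) +
  geom_violin(trim = FALSE, position = dodge) +
  geom_boxplot(width = 0.15, position = dodge, outlier.shape = NA) +
  labs(x = "Genes", y = "Expression (FPKM)", fill = "Breast Tissue")
```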
Box plots are widely used to show median expression differences between groups.
dat %>%
filter(gene == 'TP53' | gene == 'ATM' | gene == 'CD163') %>%
ggplot(., aes(x = gene, y = expression, fill = breast_tissue)) +
geom_boxplot() +
scale_fill_brewer(palette = "Dark2") +
labs (x= "Genes", y = "Expression (FPKM)", fill = "Breast Tissue") +
my_theme
Let’s make a box plot for many genes at once.
filtered_dat %>%
ggplot(., aes(x = gene, y = expression, fill = breast_tissue)) +
geom_boxplot() +
scale_fill_brewer(palette = "Spectral") +
labs (x= "Genes", y = "Expression (FPKM)", fill = "Breast Tissue") +
my_theme
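With many genes on the x-axis, the default alphabetical order can make the plot hard to scan. A minimal sketch of one option, ordering genes by their median expression with base R's reorder(), is shown below. The values in `toy` are made up and the column `gene_ordered` is a hypothetical helper.

```r
library(ggplot2)

# Made-up expression values for three genes
toy <- data.frame(
  gene       = rep(c("TP53", "ATM", "CD163"), each = 3),
  expression = c(5, 6, 7, 1, 2, 3, 9, 10, 11)
)

# Factor whose levels are sorted by increasing median expression
toy$gene_ordered <- reorder(toy$gene, toy$expression, FUN = median)
print(levels(toy$gene_ordered))

p_ordered <- ggplot(toy, aes(x = gene_ordered, y = expression)) +
  geom_boxplot() +
  labs(x = "Genes (ordered by median)", y = "Expression (FPKM)")
```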
Box plot using the built-in theme_bw() theme from ggplot2 as an alternative clean visualization.
filtered_dat %>%
ggplot(., aes(x = as.factor(gene), y = expression)) +
geom_boxplot(aes(fill = breast_tissue))+
labs (x= "Genes", y = "Expression (FPKM)", fill = "Breast Tissue") +
scale_fill_manual(values = c("#09E359","#E31009")) +
theme_bw() +
theme(axis.text = element_text(colour = 'black',size= 10, face= 'bold'),
axis.title = element_text(colour = 'black',size= 13, face= 'bold'),
legend.title = element_text(face = 'bold'))
# Box plot grouped by metastasis status, using the custom theme
filtered_dat %>%
ggplot(., aes(x = as.factor(gene), y = expression)) +
geom_boxplot(aes(fill = metastasis))+
labs (x= "Genes", y = "Expression (FPKM)", fill = "Metastasis") +
scale_fill_brewer(palette = "Accent") +
my_theme