Common Genomics Plots using ggplot2

This script demonstrates how to generate commonly used genomics plots (bar plot, density plot, scatter plot, violin plot, and box plot) with the ggplot2 R package.

We use the GSE183947 gene expression data set, converted into a long format to allow flexible plotting with ggplot2. These plots are commonly used in transcriptomics and cancer genomics studies to visualize gene expression differences across biological conditions.

Setup Environment

First, we clear the R environment and load the required packages.

# Clean environment
rm(list = ls())
gc()
##           used (Mb) gc trigger (Mb) max used (Mb)
## Ncells  610660 32.7    1393680 74.5   702125 37.5
## Vcells 1137722  8.7    8388608 64.0  1927959 14.8
# Load libraries
library(dplyr)
library(ggplot2)
library(readr)
library(tidyr)
library(RColorBrewer)

Load Gene Expression Dataset

The dataset used in this tutorial is a long-format gene expression table. Read it with the read_delim() function from the readr package.

data.long <- read_delim("long_format_expression_data_GSE183947.txt")
## Rows: 1214760 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (5): gene, samples, title, breast_tissue, metastasis
## dbl (1): expression
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dat <- data.long
# View the first rows of the data
head(dat)
## # A tibble: 6 × 6
##   gene     samples   expression title      breast_tissue metastasis
##   <chr>    <chr>          <dbl> <chr>      <chr>         <chr>     
## 1 TSPAN6   CA.102548       0.93 tumor rep1 tumor         yes       
## 2 TNMD     CA.102548       0    tumor rep1 tumor         yes       
## 3 DPM1     CA.102548       0    tumor rep1 tumor         yes       
## 4 SCYL3    CA.102548       5.78 tumor rep1 tumor         yes       
## 5 C1orf112 CA.102548       2.83 tumor rep1 tumor         yes       
## 6 FGR      CA.102548       4.8  tumor rep1 tumor         yes
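
The message printed by read_delim() above suggests specifying the column types. A minimal sketch of that, assuming a tab-delimited file with the six columns shown above; the temporary file and its two rows are made up purely for illustration:

```r
library(readr)

# Write a tiny tab-delimited file that mimics the columns shown above
# (hypothetical values, for illustration only).
toy_path <- tempfile(fileext = ".txt")
writeLines(c(
  "gene\tsamples\texpression\ttitle\tbreast_tissue\tmetastasis",
  "TP53\tS1\t38.3\ttumor rep1\ttumor\tyes",
  "TP53\tS2\t12.1\tnormal rep1\tnormal\tno"
), toy_path)

# Declare the delimiter and column types up front so readr does not
# have to guess them and does not print the column specification.
toy <- read_delim(
  toy_path,
  delim = "\t",
  col_types = cols(
    gene = col_character(),
    samples = col_character(),
    expression = col_double(),
    title = col_character(),
    breast_tissue = col_character(),
    metastasis = col_character()
  )
)

str(toy)
```

Stating the types explicitly also makes the read fail loudly if a column is missing or renamed, which can be useful for a file of this size.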

ggplot2 Basic Framework

Every plot in this tutorial follows the same basic ggplot2 structure:

ggplot(data, aes(x = variable_x, y = variable_y)) +
  geom_*()
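
As a concrete instance of this template, the sketch below builds two plots from the same data and aesthetic mapping, changing only the geom. The data frame `toy` and its gene names are hypothetical and not part of the GSE183947 data:

```r
library(ggplot2)

# Hypothetical toy data: one expression value per gene
toy <- data.frame(
  gene = c("GeneA", "GeneB", "GeneC"),
  expression = c(2.5, 7.1, 4.3)
)

# Same data and aesthetic mapping, two different geoms
p_bar <- ggplot(toy, aes(x = gene, y = expression)) + geom_col()
p_point <- ggplot(toy, aes(x = gene, y = expression)) + geom_point()

print(p_bar)
print(p_point)
```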

Common plot types and their geom functions:

Plot type       Function
Bar plot        geom_col()
Scatter plot    geom_point()
Density plot    geom_density()
Violin plot     geom_violin()
Box plot        geom_boxplot()

Replace the asterisk in geom_*() with the name of the geom for the plot you want to make.

Select Genes of Interest

We select a subset of genes to demonstrate visualization.

genes_of_interest <- c(
  'TP53', 'ATM', 'PIK3CA', 'BRCA1', 'BRCA2', 'CD163', 'MRC1', 'IL10', 'CD3E')

filtered_dat <- dat %>% filter(gene %in% genes_of_interest)

head(filtered_dat)
## # A tibble: 6 × 6
##   gene   samples   expression title      breast_tissue metastasis
##   <chr>  <chr>          <dbl> <chr>      <chr>         <chr>     
## 1 BRCA1  CA.102548      30.4  tumor rep1 tumor         yes       
## 2 MRC1   CA.102548       0.37 tumor rep1 tumor         yes       
## 3 PIK3CA CA.102548       0.03 tumor rep1 tumor         yes       
## 4 IL10   CA.102548      38.9  tumor rep1 tumor         yes       
## 5 BRCA2  CA.102548       0.38 tumor rep1 tumor         yes       
## 6 TP53   CA.102548      38.3  tumor rep1 tumor         yes
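
Because filter() silently drops symbols that are not in the table, it can help to check which requested genes were actually found. A minimal sketch, using a hypothetical toy table `toy_dat` and a deliberately absent symbol in place of the real data:

```r
library(dplyr)

# Hypothetical long-format table standing in for `dat`
toy_dat <- tibble(
  gene = c("TP53", "ATM", "TP53", "ATM"),
  samples = c("S1", "S1", "S2", "S2"),
  expression = c(38.3, 5.2, 12.1, 6.8)
)

# Hypothetical request list; "NOT_A_GENE" is absent on purpose
wanted <- c("TP53", "ATM", "NOT_A_GENE")

# Symbols requested but absent from the table
missing_genes <- setdiff(wanted, unique(toy_dat$gene))
missing_genes

# Number of rows retained per gene after filtering
toy_dat %>%
  filter(gene %in% wanted) %>%
  count(gene)
```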

Custom Plot Theme (Publication Style)

We create a reusable theme object so that every plot shares the same look.

my_theme <- theme(
  axis.line = element_line(colour = "black", linewidth = 0.75),
  axis.text = element_text(colour = 'black',size=10, face='bold'),
  axis.title = element_text(size=14, face='bold'),
  axis.ticks = element_line(color='black', linewidth=1),

  legend.position = "right",
  legend.title = element_text(face='bold'),
  legend.background = element_blank(),
  legend.box.background = element_rect(colour="black"),

  panel.grid.major.y = element_blank(),
  panel.grid.minor.y = element_blank(),
  panel.grid.major.x = element_blank(),
  panel.grid.minor.x = element_blank(),

  plot.background = element_rect(fill=NULL, colour='white'),
  panel.background = element_rect(fill='white')
)
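
A theme object like this can be added to each plot with `+`, as done in the sections below, or installed as the session default. The following is a rough sketch of the second option; `small_theme` is a shortened, hypothetical stand-in for the theme above, and the built-in `mtcars` data is used only to have something to plot:

```r
library(ggplot2)

# A shortened stand-in for the custom theme (hypothetical name)
small_theme <- theme(
  axis.title = element_text(size = 14, face = "bold"),
  panel.grid.major = element_blank(),
  panel.grid.minor = element_blank()
)

# theme_set() replaces the session default and returns the previous default
old_theme <- theme_set(theme_gray() + small_theme)

# This plot picks up the new default without adding the theme explicitly
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
print(p)

# Restore the previous default so later plots are unaffected
theme_set(old_theme)
```

Adding the theme explicitly keeps each plot self-describing, while a session default saves repetition in longer scripts; which one suits better depends on the workflow.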

1. Bar Plot

A bar plot comparing gene expression levels across tissue types.

ggplot(filtered_dat,
       aes(x=gene, y=expression, fill=breast_tissue)) +
  geom_col()

Next, add axis labels, a legend title, and manually chosen fill colors (for clearer contrast between tissue types).

ggplot(filtered_dat, aes(x=gene, y=expression, fill=breast_tissue)) +
  geom_col() + 
  scale_fill_manual(values=c(normal="#00AEF3", tumor="#E81B23")) + 
  labs(x="Genes", y="Expression (FPKM)", fill="Breast Tissue")

Then add the custom theme to give the plot a cleaner look.

ggplot(filtered_dat, aes(x= gene, y= expression, fill= breast_tissue)) + 
  geom_col() +
  scale_fill_manual(values=c(normal = "#00AEF3",tumor= "#E81B23")) + 
  labs(x= "Genes", y = "Expression (FPKM)", fill = "Breast Tissue") +
  my_theme 

Side-by-side tumor/normal bar plot for each gene:

ggplot(filtered_dat, aes(x = gene, y = expression, fill = breast_tissue)) + 
  geom_col(position = "dodge") +
  scale_fill_manual(values = c(normal = "#00AEF3", tumor = "#E81B23")) +
  labs(x= "Genes", y = "Expression (FPKM)", fill = "Breast Tissue") +
  my_theme 
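
One caveat for the bar plots above: geom_col() draws one bar segment per input row. In a long table with one row per gene and sample, stacked bars therefore add up expression across all samples, and with position = "dodge" the rows of a group are drawn on top of each other, so the visible height is that of the largest value. If a per-group average is what you want to show, one option is to summarise first. A minimal sketch, with a hypothetical toy table `toy_dat` and the mean chosen only as an example statistic:

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: two genes, two samples per tissue
toy_dat <- tibble(
  gene = rep(c("GeneA", "GeneB"), each = 4),
  breast_tissue = rep(c("normal", "normal", "tumor", "tumor"), times = 2),
  expression = c(2, 4, 10, 14, 5, 7, 1, 3)
)

# Collapse to one row per gene and tissue so each bar is a group mean
mean_dat <- toy_dat %>%
  group_by(gene, breast_tissue) %>%
  summarise(mean_expression = mean(expression), .groups = "drop")

mean_dat

p_mean <- ggplot(mean_dat,
                 aes(x = gene, y = mean_expression, fill = breast_tissue)) +
  geom_col(position = "dodge") +
  labs(x = "Genes", y = "Mean expression (FPKM)", fill = "Breast Tissue")
print(p_mean)
```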

2. Density Plot

Density plots help visualize the expression distribution of a gene. The example below uses TP53 expression.

dat %>%
  filter(gene == 'TP53') %>%
  ggplot(aes(x = expression, fill = breast_tissue)) +
  geom_density(alpha = 0.3) +
  scale_fill_manual(values = c(normal="#00AEF3",
                               tumor="#E81B23")) +
  labs(x="Expression",
       y="Density",
       fill="Breast Tissue") +
  my_theme
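
Expression values are often right-skewed, which can squeeze most of a density curve against the left edge of the plot. One common option is to plot log2(expression + 1) instead. The sketch below illustrates this on simulated values; the log-normal toy data, its parameters, and the pseudocount of 1 are all assumptions made for illustration, not properties of the GSE183947 data:

```r
library(ggplot2)

# Simulated, right-skewed toy values (hypothetical, not real expression data)
set.seed(1)
toy_dat <- data.frame(
  breast_tissue = rep(c("normal", "tumor"), each = 30),
  expression = c(rlnorm(30, meanlog = 2), rlnorm(30, meanlog = 3))
)

# Transform inside aes(); the +1 pseudocount keeps zero values defined
p_log <- ggplot(toy_dat,
                aes(x = log2(expression + 1), fill = breast_tissue)) +
  geom_density(alpha = 0.3) +
  labs(x = "log2(Expression + 1)", y = "Density", fill = "Breast Tissue")
print(p_log)
```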

3. Scatter Plot

# Use pivot_wider() from tidyr (the current replacement for spread())
# to get one expression column per gene
dat %>%
  filter(gene == 'CD3E' | gene == 'IL10') %>%
  pivot_wider(names_from = gene, values_from = expression) %>%
  ggplot(aes(x = CD3E,
             y = IL10,
             color = breast_tissue)) +
  geom_point() +
  labs(x="CD3E Expression",
       y="IL10 Expression",
       color="Breast Tissue") +
  my_theme
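
The reshape step in the scatter plot above turns one row per gene and sample into one row per sample, with a column for each gene. A minimal sketch of that step on a hypothetical toy table, using pivot_wider(), which tidyr provides as the current alternative to the older, superseded spread():

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format table: two genes measured in two samples
toy_long <- tibble(
  gene = c("CD3E", "IL10", "CD3E", "IL10"),
  samples = c("S1", "S1", "S2", "S2"),
  expression = c(1.5, 3.0, 2.5, 0.5),
  breast_tissue = c("tumor", "tumor", "normal", "normal")
)

# One row per sample; gene names become column names
toy_wide <- toy_long %>%
  pivot_wider(names_from = gene, values_from = expression)

toy_wide
```

Each sample now has its CD3E and IL10 values on the same row, which is what aes(x = CD3E, y = IL10) needs.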

4. Violin Plot

Violin plots show distribution and density of expression values.

dat %>%
  filter(gene %in% c('TP53','ATM','CD163')) %>%
  ggplot(aes(x = gene,
             y = expression,
             fill = breast_tissue)) +
  geom_violin(trim = FALSE, linewidth = 0.9) +
  scale_fill_brewer(palette = "Paired") +
  labs(x="Genes",
       y="Expression (FPKM)",
       fill="Breast Tissue") +
  my_theme

5. Box Plot

Box plots are widely used to show median expression differences between groups.

dat %>%
  filter(gene == 'TP53' | gene == 'ATM' | gene == 'CD163') %>%
  ggplot(., aes(x = gene, y = expression, fill = breast_tissue)) +
  geom_boxplot() +
  scale_fill_brewer(palette = "Dark2") + 
  labs (x= "Genes", y = "Expression (FPKM)", fill = "Breast Tissue") +
  my_theme
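
To read the medians off as numbers rather than from the plot, the same grouping can be summarised with dplyr. A minimal sketch on a hypothetical toy table `toy_dat`:

```r
library(dplyr)

# Hypothetical toy data: two genes, three samples per tissue
toy_dat <- tibble(
  gene = rep(c("GeneA", "GeneB"), each = 6),
  breast_tissue = rep(rep(c("normal", "tumor"), each = 3), times = 2),
  expression = c(1, 2, 3, 10, 20, 60, 5, 6, 7, 4, 5, 9)
)

# Median expression and sample count per gene and tissue
median_dat <- toy_dat %>%
  group_by(gene, breast_tissue) %>%
  summarise(
    median_expression = median(expression),
    n_samples = n(),
    .groups = "drop"
  )

median_dat
```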

Next, a box plot for all of the selected genes:

filtered_dat %>%
  ggplot(., aes(x = gene, y = expression, fill = breast_tissue)) +
  geom_boxplot() +
  scale_fill_brewer(palette = "Spectral") + 
  labs (x= "Genes", y = "Expression (FPKM)", fill = "Breast Tissue") +
  my_theme 

Box plot using the built-in theme_bw() theme as an alternative clean look:

filtered_dat %>%
  ggplot(., aes(x = as.factor(gene), y = expression)) +
  geom_boxplot(aes(fill = breast_tissue))+
  labs (x= "Genes", y = "Expression (FPKM)", fill = "Breast Tissue") +
  scale_fill_manual(values = c("#09E359","#E31009")) +
  theme_bw() +
  theme(axis.text = element_text(colour = 'black',size= 10, face= 'bold'),
        axis.title = element_text(colour = 'black',size= 13, face= 'bold'),
        legend.title = element_text(face = 'bold'))

Box plot to explore expression differences by metastasis status:

filtered_dat %>%
  ggplot(., aes(x = as.factor(gene), y = expression)) +
  geom_boxplot(aes(fill = metastasis))+
  labs (x= "Genes", y = "Expression (FPKM)", fill = "Metastasis") +
  scale_fill_brewer(palette = "Accent") + 
  my_theme