7.5 Volcano Plots
Volcano plots are used to summarize the results of differential analysis. They are scatter plots that show log\(_2\) fold-change vs statistical significance. The plot_volcano
function in the MSnSet.utils package is used to create volcano plots. For ANOVA results, volcano plots will not be useful, since the p-values are based on two or more contrasts; the volcano plots would not display the characteristic “V” shape.
7.5.1 Base plot
Unless the differential analysis results are from a one-sample t-test with a single comparison, volcano plots need to be made for each contrast, since there are multiple logFC values for each feature. For this example, we will make a volcano plot using the log\(_2\) fold-changes from the t_res1
table. We will set sig_threshold
to 0.05, which will add a dashed horizontal line indicating the cutoff for statistical significance. Normally, the adjusted p-values would be used, though they are all extremely high in these results, so we will use the unadjusted p-values for the examples.
plot_volcano(df = t_res1, logFC = "logFC",
pvals = "P.Value", sig_threshold = 0.05)
7.5.2 Label top features
plot_volcano
has an argument called label
to label the top most significant features. By default, the top 8 features will be labelled. The column used for labeling must be in the data frame supplied to the df
argument. In this case, we will need to create it using the row names.
# Create RefSeq column for labeling
$RefSeq <- rownames(t_res1)
t_res1
# Volcano plot with feature labels
plot_volcano(df = t_res1, logFC = "logFC",
pvals = "P.Value", sig_threshold = 0.05,
label = "RefSeq") # label by RefSeq - top 8
7.5.3 Label specific features
There may be cases when the features to label are not necessarily the top \(n\). To label specific features, we need a column where everything but the labels are NA
, and we need to set num_features
to the number of rows in the data so that nothing is discarded. For this example, we will select a random group of 5 features to label.
# 5 random RefSeqs to label
<- sample(t_res1$RefSeq, size = 5)
random_features # If RefSeq is not in the random group of 5, set it to NA
<- mutate(t_res1,
t_res1 custom_labels = ifelse(RefSeq %in% random_features,
NA))
RefSeq, # Volcano plot with feature labels
plot_volcano(df = t_res1, logFC = "logFC",
pvals = "P.Value", sig_threshold = 0.05,
# Custom labels
label = "custom_labels", num_features = nrow(t_res1))
7.5.4 Modify point colors
We will change the color of points to reflect their significance and the sign of the log\(_2\) fold-change. We start by creating a point_color column with three groups: “down”, “up” and “NS” (not-significant). Then, the point_args
argument is modified so that the color of points depends on these groups.
# Determine point colors based on significance and sign of the logFC
# We would normally use adj.P.Value instead of P.Value
<- t_res1 %>%
t_res1 mutate(point_color = case_when(
< 0.05 & logFC < 0 ~ "down", # significantly down
P.Value < 0.05 & logFC > 0 ~ "up", # significantly up
P.Value TRUE ~ "NS") # not significant
)
# Color points
<- plot_volcano(df = t_res1, logFC = "logFC",
v1 pvals = "P.Value", sig_threshold = 0.05,
# Change point color
point_args = list(mapping = aes(color = point_color)))
v1
We will change the default colors to be more informative. Points in the “down” group will be #5555ff (blue), points in the “up” group will be red3, and points in the “NS” group will be lightgrey. We will also remove the legend, since it doesn’t add much information.
# Change colors
+ scale_color_manual(values = c("#5555ff", "red3", "lightgrey"),
v1 breaks = c("down", "up", "NS")) +
theme(legend.position = "none") # do not show legend
7.5.5 Multiple volcano plots
For results generated by limma_contrasts
, we should make separate plots for each comparison with facet_wrap
or facet_grid
.
# Basic volcano plot
plot_volcano(df = t_res2, logFC = "logFC", pvals = "adj.P.Val",
sig_threshold = 0.05) +
facet_wrap(vars(contrast)) + # plot for each contrast
labs(title = "Volcano Plots of limma_contrasts Results")