Section 7 Differential Analysis

In this section, we will use wrappers around functions from the limma package to fit linear models (linear regression, t-tests, and ANOVA) to proteomics data. While limma was originally developed for microarray data, it works well for other data types, including proteomics. When working with limma, the limma User’s Guide is an invaluable resource.

limma uses empirical Bayes techniques to borrow information across all features being tested, which increases the degrees of freedom available for the test statistics. This results in so-called moderated test statistics and improved power to detect differential expression (Smyth, 2004).
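To make the idea of "borrowing information" concrete, the moderated t-statistic can be sketched as follows. The notation is taken from Smyth (2004) and is not otherwise used in this section: $s_g^2$ is the residual variance of feature $g$ with $d_g$ residual degrees of freedom, $s_0^2$ and $d_0$ are the prior variance and prior degrees of freedom estimated from all features, $\hat{\beta}_{gj}$ is the estimated coefficient or contrast $j$, and $v_{gj}$ is its unscaled variance.

```latex
% Posterior (shrunken) variance: a weighted average of the prior
% variance and the feature-wise residual variance
\tilde{s}_g^2 = \frac{d_0 s_0^2 + d_g s_g^2}{d_0 + d_g}

% Moderated t-statistic: the ordinary t-statistic with s_g replaced
% by the shrunken standard deviation
\tilde{t}_{gj} = \frac{\hat{\beta}_{gj}}{\tilde{s}_g \sqrt{v_{gj}}}

% Under the null hypothesis, the moderated statistic follows a
% t-distribution with augmented degrees of freedom
\tilde{t}_{gj} \sim t_{d_0 + d_g}
```

Compared with the ordinary t-statistic, which has $d_g$ degrees of freedom, the moderated statistic gains the $d_0$ prior degrees of freedom. This gain is the source of the improved power described above, and it matters most when the number of samples per feature is small.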

We will use the CPTAC ovarian cancer proteomics dataset for this section. The required packages are MSnSet.utils for the limma wrappers and volcano plots, dplyr for data frame manipulation, and ggplot2 for p-value histograms and further customization of the volcano plots. We load the cptac_oca data and assign oca.set to m, which will be used in the examples.

## Install missing packages
cran_packages <- c("remotes", "dplyr", "ggplot2")
for (pkg_i in cran_packages) {
  # require() returns FALSE (instead of raising an error) if the package is missing
  if (!require(pkg_i, quietly = TRUE, character.only = TRUE))
    install.packages(pkg_i)
}
# MSnSet.utils is not on CRAN, so it is installed from GitHub
if (!require("MSnSet.utils", quietly = TRUE))
  remotes::install_github("PNNL-Comp-Mass-Spec/MSnSet.utils")
## ------------------------
library(MSnSet.utils)
library(dplyr)
library(ggplot2)

# MSnSet for testing
data("cptac_oca")
m <- oca.set

References

Smyth, G. K., Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments, Statistical Applications in Genetics and Molecular Biology, vol. 3, Article 3, 2004. DOI: 10.2202/1544-6115.1027