Section 5 Linear Modeling

In this section, we use wrappers around functions from the limma package to fit linear models (linear regression, t-tests, and ANOVA) to proteomics data. Although limma was originally developed for microarray data, it is also useful for other data types, including proteomics. The limma User's Guide is an invaluable resource when working with the package.
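A t-test and a one-way ANOVA are both special cases of a linear model, which is why limma can handle all three analyses with one framework. The short base R sketch below fits each one with lm() to a single simulated, hypothetical feature and compares the result with the classical test: the equal-variance two-sample t-test, the one-way ANOVA F-test, and (for a continuous covariate) the Pearson correlation test, whose p-value equals that of the regression slope.

```r
# Sketch: t-test, one-way ANOVA, and simple regression are all linear models.
# All data are simulated and hypothetical (not from cptac_oca).
set.seed(1)

# Two-group comparison (t-test)
group2 <- factor(rep(c("A", "B"), each = 5))
y2 <- rnorm(10, mean = rep(c(0, 1), each = 5))
t_lm <- coef(summary(lm(y2 ~ group2)))["group2B", "t value"]
# t.test() reports mean(A) - mean(B); lm() reports B - A, hence the sign flip
t_tt <- unname(t.test(y2 ~ group2, var.equal = TRUE)$statistic)

# Three-group comparison (one-way ANOVA)
group3 <- factor(rep(c("A", "B", "C"), each = 4))
y3 <- rnorm(12, mean = rep(c(0, 1, 2), each = 4))
F_lm <- anova(lm(y3 ~ group3))["group3", "F value"]
F_ow <- unname(oneway.test(y3 ~ group3, var.equal = TRUE)$statistic)

# Continuous covariate (linear regression): slope test matches Pearson test
age <- runif(10, min = 40, max = 80)
y_age <- 0.05 * age + rnorm(10)
p_lm <- coef(summary(lm(y_age ~ age)))["age", "Pr(>|t|)"]
p_cor <- cor.test(age, y_age)$p.value

c(t_lm = t_lm, t_tt = t_tt, F_lm = F_lm, F_ow = F_ow, p_lm = p_lm, p_cor = p_cor)
```

In each pair the two numbers agree (the t-statistics up to sign), so choosing "t-test" or "ANOVA" amounts to choosing a design for the same linear model.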

limma uses empirical Bayes methods to borrow information across all features being tested: each feature's residual variance is shrunk toward a common value estimated from all features, which increases the effective degrees of freedom of the test statistics. The resulting "moderated" test statistics improve the ability to detect differential expression or abundance, especially when there are few samples per group.
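The moderation step can be sketched in base R. For a feature g with residual variance s_g^2 on d_g degrees of freedom, limma's posterior variance is (d0 * s0^2 + d_g * s_g^2) / (d0 + d_g), and the moderated t-statistic uses this posterior variance in place of s_g^2, with d0 + d_g degrees of freedom. In this sketch the prior values d0 and s0^2 are fixed by hand as an assumption; limma's eBayes() estimates them from all features instead. The data are simulated and hypothetical.

```r
# Sketch of limma-style variance moderation (hypothetical, simulated data).
# ASSUMPTION: the prior d0 and s0_sq are fixed by hand; limma's eBayes()
# estimates them from the data instead.
set.seed(2)
n_features <- 1000
n_per_group <- 3
group <- rep(c(0, 1), each = n_per_group)

# Features in rows, samples in columns
x <- matrix(rnorm(n_features * 2 * n_per_group), nrow = n_features)

# Ordinary equal-variance t-statistics, one per feature
mean_diff <- rowMeans(x[, group == 1]) - rowMeans(x[, group == 0])
s2 <- (apply(x[, group == 0], 1, var) + apply(x[, group == 1], 1, var)) / 2
d <- 2 * n_per_group - 2              # residual degrees of freedom per feature
v <- 2 / n_per_group                  # unscaled variance of the group difference
t_ordinary <- mean_diff / sqrt(s2 * v)

# Hypothetical prior: d0 extra degrees of freedom centred on s0_sq
d0 <- 4
s0_sq <- 1

# Posterior variance shrinks each s2 toward s0_sq
s2_post <- (d0 * s0_sq + d * s2) / (d0 + d)
t_moderated <- mean_diff / sqrt(s2_post * v)

# The moderated t uses d0 + d degrees of freedom instead of d
p_ordinary <- 2 * pt(-abs(t_ordinary), df = d)
p_moderated <- 2 * pt(-abs(t_moderated), df = d0 + d)

summary(s2)
summary(s2_post)
```

Features with unusually small sample variances, which would otherwise produce inflated ordinary t-statistics, are pulled toward s0_sq, and the extra d0 degrees of freedom make the reference t-distribution less heavy-tailed.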

We will use the CPTAC ovarian cancer proteomics dataset for this section. The required packages are MSnSet.utils for the limma wrappers and volcano plots, dplyr for data frame manipulation, and ggplot2 for p-value histograms and for further customizing the volcano plots. We load the cptac_oca data and assign oca.set to m, which is used in the examples that follow.

library(MSnSet.utils)
library(dplyr)
library(ggplot2)

# Load the CPTAC ovarian cancer data, which provides the MSnSet oca.set
data("cptac_oca")
m <- oca.set
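The MSnSet.utils wrappers build on limma, whose basic workflow is lmFit() to fit a linear model to every feature, eBayes() to moderate the statistics, and topTable() to extract a ranked results table. The sketch below runs those three calls directly on a small simulated matrix (hypothetical data, not oca.set), which can make the wrapper output easier to interpret. It assumes limma is installed from Bioconductor; the group labels and effect size are illustrative only.

```r
# Minimal limma workflow on a simulated matrix (hypothetical data, not oca.set).
# Requires the Bioconductor package limma.
library(limma)

set.seed(3)
n_features <- 200
group <- factor(rep(c("control", "treated"), each = 4))

# Simulated log2 abundances; the first 20 features get a true shift in "treated"
y <- matrix(rnorm(n_features * 8), nrow = n_features,
            dimnames = list(paste0("feature", seq_len(n_features)), NULL))
y[1:20, group == "treated"] <- y[1:20, group == "treated"] + 2

design <- model.matrix(~ group)   # columns: (Intercept), grouptreated
fit <- lmFit(y, design)           # per-feature least-squares fits
fit <- eBayes(fit)                # empirical Bayes moderation across features

# Ranked table for the treated-vs-control coefficient, all features
top <- topTable(fit, coef = "grouptreated", number = Inf, sort.by = "P")
head(top)
```

The logFC column is the estimated treated-minus-control difference for each feature, and adj.P.Val holds Benjamini-Hochberg adjusted p-values (topTable's default adjustment).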