7.3 One-Way ANOVA

A one-way ANOVA generalizes the two-sample t-test to determine whether there is a significant difference among the means of three or more groups. The null hypothesis is that all group means are equal, and the alternative is that at least one mean differs from the others. Equivalently, the null hypothesis is that the difference between every pair of means is zero, and the alternative is that the difference between at least one pair of means is not zero.

Note: A one-way ANOVA does not tell us which means are different; it only tells us that a difference exists.
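
As a minimal sketch of these hypotheses, independent of limma and of the MSnSet m, the base R code below fits a classical one-way ANOVA to simulated data. The group names, sample sizes, and means are made up for illustration. limma reports a moderated F-statistic, so this shows the idea of the test rather than the exact computation.

```r
# Simulated data (illustration only): three hypothetical groups, 10 samples each
set.seed(1)
group <- factor(rep(c("A", "B", "C"), each = 10))
y <- rnorm(30, mean = rep(c(0, 0, 1.5), each = 10))

# Classical one-way ANOVA: H0 is that all three group means are equal
fit <- lm(y ~ group)
aov_tab <- anova(fit)
print(aov_tab)

# One F-statistic and one p-value for the whole factor;
# neither says which groups differ from each other
f_stat <- aov_tab["group", "F value"]
p_val  <- aov_tab["group", "Pr(>F)"]
```

The single row for group in the table is the whole result of the test, which is why a follow-up comparison is needed to find out which means differ.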

MSnSet.utils::limma_gen is a wrapper around functions from the limma package that performs one-way ANOVA. We will use it to test if there is a significant difference between any two levels of SUBTYPE: “Immunoreactive”, “Proliferative”, “Mesenchymal”, and “Differentiated”. Since SUBTYPE is a factor, the first level (“Immunoreactive”) will be used as the reference. That is, we will be testing whether the means of the “Proliferative”, “Mesenchymal”, or “Differentiated” groups are different from the mean of the “Immunoreactive” group for each feature in the MSnSet m.
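
To see how the reference level determines the coefficient names, the sketch below builds a toy factor and its design matrix with base R. The data frame pheno is hypothetical, and the level order is assumed from the column order in the output shown in this section.

```r
# Toy phenotype data (illustration only): one sample per subtype
pheno <- data.frame(
  SUBTYPE = factor(
    c("Proliferative", "Immunoreactive", "Differentiated", "Mesenchymal"),
    levels = c("Immunoreactive", "Proliferative",
               "Mesenchymal", "Differentiated")
  )
)

# The first level is the reference and is absorbed into the intercept
ref_level <- levels(pheno$SUBTYPE)[1]

# Each remaining level gets a "<level> - reference" coefficient
design <- model.matrix(~ SUBTYPE, data = pheno)
coef_names <- colnames(design)
print(coef_names)

# relevel() moves a different level to the front, changing the reference
releveled <- relevel(pheno$SUBTYPE, ref = "Mesenchymal")
print(levels(releveled))
```

The non-intercept column names are the factor name followed by the level name, which is why coef.str = "SUBTYPE" matches all three of them.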

anova_res <- limma_gen(eset = m, model.str = "~ SUBTYPE", 
                       coef.str = "SUBTYPE")
head(arrange(anova_res, adj.P.Val)) # top 6 rows arranged by adjusted p-value
##             SUBTYPEProliferative SUBTYPEMesenchymal SUBTYPEDifferentiated
## NP_055140.1           -0.4979740         0.24131186            -0.3342889
## NP_000388.2           -1.2232098        -0.21980158            -0.7849428
## NP_009005.1           -1.0097220         0.04832193            -0.6224298
## NP_000878.2           -0.7633419         0.07176514            -0.5563074
## NP_001944.1           -1.3465807        -0.17808291            -0.9476618
## NP_115584.1           -0.2718495         0.93758021             0.1842301
##                   AveExpr        F      P.Value    adj.P.Val
## NP_055140.1  2.269399e-18 24.74128 3.642291e-11 2.951348e-07
## NP_000388.2 -3.421920e-18 23.63972 8.266856e-11 3.349317e-07
## NP_009005.1 -1.273715e-17 19.72001 1.784885e-09 4.820974e-06
## NP_000878.2 -1.710960e-18 18.89587 3.521123e-09 5.195885e-06
## NP_001944.1 -5.322987e-18 19.03216 3.144239e-09 5.195885e-06
## NP_115584.1  1.172771e-18 18.76318 4.488608e-09 5.195885e-06

The row names are the features that were tested, and the first three columns are the average log2 fold-changes for each contrast against the reference: “Proliferative - Immunoreactive”, “Mesenchymal - Immunoreactive”, and “Differentiated - Immunoreactive”. A positive value indicates that the mean of the “Immunoreactive” group is lower than the mean of the other group, and a negative value indicates that it is higher. Because the reference mean cancels when two coefficients are subtracted, the logFC between two non-reference groups is the difference of their columns. For example, the logFC between the “Proliferative” and “Mesenchymal” groups for protein NP_055140.1 is “SUBTYPEProliferative” minus “SUBTYPEMesenchymal”: -0.498 - 0.241 = -0.739. The other columns are:

  • AveExpr: overall mean (same as rowMeans(exprs(m), na.rm = TRUE))
  • F: moderated F-statistic
  • P.Value: p-value
  • adj.P.Val: Benjamini-Hochberg (BH) adjusted p-value
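
The logFC between two non-reference groups can be computed for every feature at once by subtracting coefficient columns. The sketch below uses a small data frame holding the coefficients of the first two rows of the output above; the names coefs and Prolif_vs_Mesen are illustrative. With the real results, the same subtraction could be applied to the columns of anova_res.

```r
# Coefficients copied from the first two rows of the output above
coefs <- data.frame(
  SUBTYPEProliferative = c(-0.4979740, -1.2232098),
  SUBTYPEMesenchymal   = c(0.24131186, -0.21980158),
  row.names = c("NP_055140.1", "NP_000388.2")
)

# (Proliferative - Immunoreactive) - (Mesenchymal - Immunoreactive)
#   = Proliferative - Mesenchymal
coefs$Prolif_vs_Mesen <- coefs$SUBTYPEProliferative - coefs$SUBTYPEMesenchymal

logfc <- round(coefs["NP_055140.1", "Prolif_vs_Mesen"], 3)
print(logfc)  # -0.739
```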

Below is a graphical representation of the results for a specific feature. This is not a required step; it is just a visual explanation of the results.

The next step would be to check the p-value histograms. If those look fine, we can tally the number of significant features.
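
A p-value histogram can be drawn with base R, as in the sketch below. To keep the example self-contained it uses simulated p-values (a made-up mixture of 80% null features and 20% features with signal), not the results from m. With the real results, p_sim would be replaced by anova_res$P.Value.

```r
set.seed(42)

# Simulated p-values (illustration only):
# null features are uniform on [0, 1]; features with signal pile up near 0
p_sim <- c(runif(8000), rbeta(2000, shape1 = 0.5, shape2 = 5))

# Bins of width 0.05, so the first bar counts p-values below 0.05
h <- hist(p_sim, breaks = seq(0, 1, by = 0.05),
          main = "Histogram of p-values", xlab = "p-value")
```

A well-behaved histogram is typically close to flat across most of the range, with a peak near zero when some features have real differences.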

table(anova_res$adj.P.Val < 0.05)
## 
## FALSE  TRUE 
##  7049  1054

Of the 8103 features tested, 1054 have adjusted p-values less than 0.05. Since the expected FDR is 0.05, we estimate that at most about 53 of these (0.05 × 1054 ≈ 52.7) are false positives.
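
The false-positive estimate is simple arithmetic on the tally above, sketched here with illustrative variable names:

```r
n_sig   <- 1054  # features with adj.P.Val < 0.05 (TRUE count in the table above)
n_nsig  <- 7049  # features with adj.P.Val >= 0.05 (FALSE count)
fdr_cut <- 0.05  # adjusted p-value cutoff, i.e. the expected FDR

n_total     <- n_sig + n_nsig   # total features tested
expected_fp <- fdr_cut * n_sig  # expected number of false positives

print(n_total)               # 8103
print(expected_fp)           # 52.7
print(ceiling(expected_fp))  # 53
```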