7.2 Two-Sample t-tests

Two-sample t-tests are used to determine whether there is a significant difference between the means of two groups. The null hypothesis is that the group means are equal, and the alternative is that they are not equal. Written another way, the null hypothesis is that the difference in means is zero, and the alternative is that the difference is not zero.

limma_a_b and limma_contrasts can perform moderated two-sample t-tests.

7.2.1 One comparison

If a factor only has two groups, we can use limma_a_b. We will test for differences between the “RESISTANT” and “TREATMENT” groups without accounting for any covariates.

# Test RESISTANT - SENSITIVE = 0
t_res1 <- limma_a_b(eset = m, model.str = "~ PLATINUM.STATUS", 
                    coef.str = "PLATINUM.STATUS")
head(arrange(t_res1, adj.P.Val)) # top 6 rows sorted by adjusted p-value
##                  logFC      AveExpr         t     P.Value adj.P.Val         B
## NP_000007.1  0.4924636  0.050292811  2.746579 0.008196829 0.9569777 -3.618784
## NP_000013.2 -0.2221522  0.062925069 -1.373152 0.175457942 0.9569777 -4.484676
## NP_000030.1 -0.3525116 -0.051132133 -1.251127 0.216355370 0.9569777 -4.538114
## NP_000031.1 -0.3487920  0.046000375 -1.248196 0.217418216 0.9569777 -4.539343
## NP_000032.1 -0.3564539  0.003432085 -1.445320 0.154225965 0.9569777 -4.451000
## NP_000037.2 -0.2507794  0.041070595 -1.770298 0.087859482 0.9569777 -4.386374

The logFC column is the difference in means between the “RESISTANT” and “SENSITIVE” groups (the first level is always the reference; use levels(m$PLATINUM.STATUS) to check). The other columns are

  • AveExpr overall mean (same as rowMeans(exprs(m), na.rm = TRUE))
  • t moderated t-statistic
  • P.Value p-values
  • adj.P.Val BH-adjusted p-values
  • B log-odds of differential expression/abundance

Below is a graphical representation of the results for a specific feature. This is not a required step. It is just to visually explain the results.

The next step would be to check the p-value histograms. If those look fine, we can tally the number of significant features.

# TRUE - significant, FALSE - not significant
table(t_res1$adj.P.Val < 0.05)
## 
## FALSE 
##  8101

None of the features are significantly different between the two PLATINUM.STATUS groups at the 0.05 FDR level.

7.2.2 Multiple comparisons

Now, we will move on to an example of how to use limma_contrasts, which is suited for comparing groups against a reference. We will treat “Immunoreactive” as the reference group for this example, though this does not really make sense in the context of this data. It would make more sense to do a one-way ANOVA with limma_gen (Section 7.3). This is just for example purposes.

We will test the following contrasts. Each level must begin with the variable name, or limma_contrasts will not work.

# Contrasts to test
contrasts <- paircomp(x = m$SUBTYPE, name = "SUBTYPE", 
                      ref = "Immunoreactive")
contrasts
## [1] "SUBTYPEProliferative-SUBTYPEImmunoreactive" 
## [2] "SUBTYPEMesenchymal-SUBTYPEImmunoreactive"   
## [3] "SUBTYPEDifferentiated-SUBTYPEImmunoreactive"

By default, limma_contrasts generates diagnostic plots. For now, we will not make these plots. We also need to specify a no-intercept model by including 0 in model.str.

# Test contrasts
t_res2 <- limma_contrasts(eset = m, model.str = "~ 0 + SUBTYPE", 
                          coef.str = "SUBTYPE", contrasts = contrasts,
                          trend = TRUE, robust = TRUE, # passed to eBayes
                          plot = FALSE)
head(arrange(t_res2, adj.P.Val)) # top 6 rows sorted by adjusted p-value
##            RefSeq      logFC       AveExpr         t      P.Value    adj.P.Val
## 1:    NP_000388.2 -1.2232098 -3.421920e-18 -7.703025 4.549213e-11 1.105868e-06
## 2:    NP_001944.1 -1.3465807 -5.322987e-18 -6.637345 4.520647e-09 3.191542e-05
## 3:    NP_112092.1 -1.0268282 -9.315227e-18 -6.602133 5.251623e-09 3.191542e-05
## 4:    NP_002323.2  0.6707465  3.564500e-18  6.698537 3.482606e-09 3.191542e-05
## 5: NP_001120963.1 -0.9267318 -2.281280e-18 -6.358267 1.475334e-08 7.172780e-05
## 6:    NP_009005.1 -1.0097220 -1.273715e-17 -6.243353 2.392300e-08 8.017389e-05
##            B        feature                     contrast
## 1: 14.750586    NP_000388.2 Proliferative-Immunoreactive
## 2: 10.463800    NP_001944.1 Proliferative-Immunoreactive
## 3: 10.323954    NP_112092.1 Proliferative-Immunoreactive
## 4: 10.602125    NP_002323.2   Mesenchymal-Immunoreactive
## 5:  9.360149 NP_001120963.1 Proliferative-Immunoreactive
## 6:  8.909146    NP_009005.1 Proliferative-Immunoreactive

In addition to the columns from the output of limma_a_b, limma_contrasts creates a column for the contrasts and includes all columns from fData. It is important to note that p-values in the adj.P.Val column have been adjusted across all features and contrasts, so testing more contrasts results in fewer significant features. It is best to test only a small number of related contrasts.

Below is a graphical representation of the results for a specific feature.

The next step would be to check the p-value histograms. If those look fine, we can tally the number of significant features.

# TRUE - significant, FALSE - not significant
table(t_res2$contrast, t_res2$adj.P.Val < 0.05)
##                                
##                                 FALSE TRUE
##   Differentiated-Immunoreactive  8052   51
##   Mesenchymal-Immunoreactive     7908  195
##   Proliferative-Immunoreactive   7848  255

If we take the 51, 195, and 255 features with the lowest adjusted p-values from the “Differentiated-Immunoreactive”, “Mesenchymal-Immunoreactive”, and “Proliferative-Immunoreactive” comparisons, respectively, the overall estimated FDR is at most 0.05. That is, we expect ~25 out of those 501 to be false positives.

More features are significantly different between the “Proliferative” and “Immunoreactive” groups than in the other comparisons.