Tidy T-Test in R With Cohen’s D

A dry-run of my blog - and a useful (if pedestrian) r function I’ve written to make t-test output more informative.

Doctoral Student at the CUNY Graduate Center and Quantitative Reasoning Fellow at the CUNY School for Labor and Urban Studies


Sunday, January 31, 2021


Monday, October 16, 2023

The independent-samples \(t\)-test is one of the most commonly used statistical tests and is taught in most introductory statistics courses. In broad strokes, the independent-samples \(t\)-test is used to compare the means of some continuous variable between two qualitatively different groups. Associated with this mean difference is a \(p\)-value. Although there’s more to the story, for our purposes we can define the \(p\)-value as the probability of observing a difference between the means this large or larger, just due to chance, if there were in reality no difference between the groups. Although arbitrary, researchers will adopt (for most applications) a cutoff \(p\)-value of .05, a value referred to as alpha. If the \(p\)-value falls below alpha, we can conclude that the difference between the means is large enough to warrant further scrutiny.

Where’s My Effect Size?

Effect sizes help contextualize the results of statistical tests, which consensus is more often beginning to recognize as the least interesting part of statistical output. As most introductory statistics courses will teach, you should use effect sizes like Cohen’s \(d\) to get a sense of how impressive the magnitude of the mean difference is. The suggested rules of thumb for interpreting Cohen’s \(d\) are (Cohen, 1988):

  • \(\le 0.2\) is “small”
  • \(0.5\) is “medium”
  • \(\ge 0.8\) is “large”

One would be forgiven for expecting to see this reported alongside the default output of standard statistical software. You almost always need it. I so far have been unable to find a function that provides this, and have become annoyed enough to finally create my own wrapper function that uses R’s built-in t.test() function and the cohen.d() function from the psych package. The output, unlike that of the default t.test() function, is in a tidy format.

Feel free to skip to the full function at the bottom; if you want a preview, keep reading.

Walk-Through of my.t()

Default T-Test in R

The t.test() function, which is built in to base-R along with other mainstream tests, has messy output. It also doesn’t provide the mean difference, the standard error of the difference, or Cohen’s \(d\). As an example of its use, I’ll conduct a \(t\)-test to see whether automatic or manual transmissions are associated with greater miles per gallon. This data can be found in the mtcars dataset (from the datasets package, also available in base-R):

# t.test(dv ~ iv, data)
# am = automatic or manual
# mpg = miles per gallon
t.test(mpg ~ am, mtcars)

    Welch Two Sample t-test

data:  mpg by am
t = -3.7671, df = 18.332, p-value = 0.001374
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 -11.280194  -3.209684
sample estimates:
mean in group 0 mean in group 1 
       17.14737        24.39231 

In this dataset, vehicles with automatic transmissions are coded as 0, and manuals as 1. The dependent variable is miles per gallon. The mean miles per gallon for vehicles with automatic transmissions (17.15) is quite a bit lower than for vehicles with manual transmissions (24.39). This difference is statistically significant, \(p = .001\) (that’s less than an alpha of .05), meaning the means are different enough that we might interested in investigating a bit further. At this point, you’d usually calculate Cohen’s \(d\).

T-Test Using my.t()

my.t(mtcars, iv = "am", dv = "mpg") %>% 
  huxtable_print() %>% 
  set_align(everywhere, value = "center") 
Table 1: T-Test With my.t() (IV = Transmission and DV = Miles Per Gallon)
M_Group_1 M_Group_2 SD_Group_1 SD_Group_2 Mean_Difference SE CI_Lower CI_Upper t df p Cohens_d d_CI_Lower d_CI_Upper
17.1 24.4 3.83 6.17 -7.24 1.92 -11.3 -3.21 -3.77 18.3 <0.001 -1.48 -2.27 -0.67

The output in Table 1 above provides all of the relevant statistics, clearly labeled, with Cohen’s \(d\) included, and in a tidy format that can easily be exported to a .csv file, copied-and-pasted into Word, or printed within an R Markdown as a table as I have done here (using the huxtable package).

Output Formatting

Currently the output is in wide format, meaning that each statistic is its own column with the values for those statistics contained in the table row. If you prefer a different format, I have also added the long = argument that can optionally display output in long format, with all of the values in a single column with the statistic indicated by the row:

my.t(mtcars, iv = "am", dv = "mpg", long = TRUE) %>% 
  huxtable_print() %>% 
  set_align(col = 1, value = "left") %>% 
  set_align(col = 2, value = "center")
Table 2: T-Test With my.t() in Long Format (IV = Transmission and DV = Miles Per Gallon)
Statistic Value
M_Group_1 17.15
M_Group_2 24.39
SD_Group_1 3.83
SD_Group_2 6.17
Mean_Difference -7.24
SE 1.92
CI_Lower -11.28
CI_Upper -3.21
t -3.77
df 18.33
p <0.001
Cohens_d -1.48
d_CI_Lower -2.27
d_CI_Upper -0.67

Personally, I like the long format, because usually I am running more than one \(t\)-test. For example, say that in my car-shopping, miles per gallon was one of just several criteria I were using to decide what kind of car to buy. In addition to miles per gallon, I might compare automatic vs manual transmissions in their level of displacement, which roughly describes “how fast a vehicle gets up and goes” (my dad built engines for 40 years, so I hope he will forgive me if this description is less than accurate). Displacement is measured in cubic inches. Below is a table with the output of both tests side-by-side:

# conduct the tests
mpg.test <- my.t(mtcars, iv = "am", dv = "mpg", long = TRUE)
disp.test <- my.t(mtcars, iv = "am", dv = "disp", long = TRUE)
# relabel the columns with the name of the dv
mpg.test <- mpg.test %>% rename(MPG = Value)
disp.test <- disp.test %>% rename("Disp." = Value)
# combine into one dataframe
my.results <- left_join(mpg.test, disp.test, by = "Statistic")
# print the table
my.results %>% 
  huxtable_print()  %>% 
  set_align(col = 1, value = "left") %>% 
  set_align(col = 2, value = "center")
Table 3: Comparing Group Differences in MPG and Displacement
Statistic MPG Disp.
M_Group_1 17.15 290.38
M_Group_2 24.39 143.53
SD_Group_1 3.83 110.17
SD_Group_2 6.17 87.2
Mean_Difference -7.24 146.85
SE 1.92 34.98
CI_Lower -11.28 75.33
CI_Upper -3.21 218.37
t -3.77 4.2
df 18.33 29.26
p <0.001 <0.001
Cohens_d -1.48 1.45
d_CI_Lower -2.27 0.64
d_CI_Upper -0.67 2.23

As shown in Table 3, the mean difference in displacement between automatics and manuals is 146.85, meaning that automatic transmissions are associated with an extra 146.85 cubic inches of displacement on average. So while you might lose ~7 miles to the gallon by going with an automatic transmission, you can also expect to gain ~147 cubic inches of displacement.

OK, but what the heck does that mean? Is that an impressive number? It’s hard to tell for those who don’t spend time dealing with vehicle displacement. At least with miles per gallon, I can readily discern about how much money I would save if my vehicle got an extra 7 miles to the gallon. Displacement is a black box to me - I can’t tell if that mean difference in displacement is meaningful or not, especially not compared to how meaningful the difference in miles per gallon was.

In Table 3, the Cohen’s \(d\) statistics for miles per gallon and displacement are conveniently side-by-side. Both are very large effect sizes - quite a bit larger than 0.8, the rule of thumb for a large. Although the sign for displacement is negative (indicating that manual transmissions are associated with lower displacement than automatics), the absolute value is very close to that of miles per gallon. It seems like whatever gas I’d save with a manual transmission, I’d lose an equally-meaningful amount of displacement per cubic inch. That is, with Cohen’s \(d\), I can see that the magnitude of the differences between between automatic and manual transmissions (effect sizes of 1.53 and -1.49, respectively) are pretty similar, even though the mean differences themselves (7.24 miles per gallon and -146.85 cubic inches of displacement, respectively) are not comparable numbers.

Of course, miles per gallon and displacement might be too conceptually different to be compared meaningfully without added context. These tests would be most helpful if you also knew what adding 146.85 of displacement per cubic inch would feel like in terms of pushing your back against the seat as you accelerate in your new car. Maybe you would need, say, twice as much displacement (or 293.7 cubic inches) in order for a gain in displacement to be noticeable. But in the absence of better information, it’s nice to know for the sake of car shopping that the magnitude of the difference between the groups is basically the same for miles per gallon and displacement.

Paired Samples T-Tests

anxiety <- ToothGrowth %>% 
  mutate(time.point = recode(supp, "OJ" = "before", "VC" = "after")) %>% 
  rename(anxiety = len)
levels(anxiety$time.point) <- c("before", "after")

my.t(anxiety, iv = "time.point", dv = "anxiety", paired = TRUE) %>% 
  huxtable_print() %>%  
  set_align(everywhere, value = "center") 
M_Before M_After SD_Before SD_After Mean_Difference SE CI_Lower CI_Upper t df p Cohens_d d_CI_Lower d_CI_Upper
20.7 17 6.61 8.27 3.7 1.93 -0.17 7.57 1.92 55.3 0.060 0.49 -0.02 1.01

The function can handle repeated-measures data. For example, say we had data from a study that looked to see whether anxiety decreases following a round of exercise. I have two columns in my dataset: anxiety score, and time.point (before or after exercise). This would call for a paired samples t-test.

All you have to do is tack on the paired = TRUE argument to my.t:

Comparing Anxiety Before and After Exercise
M_Before M_After SD_Before SD_After Mean_Difference SE CI_Lower CI_Upper t df p Cohens_d d_CI_Lower d_CI_Upper
20.7 17 6.61 8.27 3.7 1.93 -0.17 7.57 1.92 55.3 0.060 0.49 -0.02 1.01

Because the after group had a lower anxiety score than the before group, the mean difference and Cohen’s \(d\) are positive. The reason is that the default behavior of the function is to subtract in the order of the level of the dv, which in this case is:

[1] "before" "after" 

In this case, the “before” time point is the first level, so the mean difference and Cohen’s \(d\) is based on mean.before - mean.after. But this is an odd order if you’re trying to interpret the mean difference as the mean change in anxiety over time. If thinking that way, the mean difference is really the mean change, which was a decrease in anxiety, which means the mean difference should be negative. Just change the order of the levels:

# change order of levels
anxiety$time.point <- factor(anxiety$time.point, levels = c("after", "before"))
# re-run t-test
my.t(anxiety, iv = "time.point", dv = "anxiety", paired = TRUE) %>% 
  huxtable_print()  %>% 
  set_align(everywhere, value = "left") 
M_After M_Before SD_After SD_Before Mean_Difference SE CI_Lower CI_Upper t df p Cohens_d d_CI_Lower d_CI_Upper
17 20.7 8.27 6.61 -3.7 1.93 -7.57 0.17 -1.92 55.3 0.060 -0.49 -1.01 0.02
M_After M_Before SD_After SD_Before Mean_Difference SE CI_Lower CI_Upper t df p Cohens_d d_CI_Lower d_CI_Upper
17 20.7 8.27 6.61 -3.7 1.93 -7.57 0.17 -1.92 55.3 0.060 -0.49 -1.01 0.02

The mean difference and Cohen’s \(d\) are now negative to reflect that anxiety decreased after exercise.

Full Function

Copy-and-paste this function to use it in your own R projects. You will be prompted to install the tidyverse and psych packages if you do not already have them. Click the Code drop-down below to show the full function:

my.t <- function(data, iv, dv, long = FALSE, paired = FALSE, ...) {
    means.SDs <- suppressMessages(
      data %>%
        group_by_at(iv) %>%
          mean = mean(eval(parse(text = dv)), na.rm = TRUE),
          sd = sd(eval(parse(text = dv)), na.rm = TRUE)) %>%
        pivot_wider(names_from = iv, values_from =  c("mean", "sd"), names_sep = ".")
    iv.2 <- rlang::sym(iv)
    dv.2 <- rlang::sym(dv)
    form <- expr(!! dv.2 ~ !! iv.2)
    t.tests <- t.test(eval(form), data, var.equal = FALSE, ...) 
    Ds <- cohens_d(eval(form), data = data, ...) %>% as.tibble
    stats <- t.tests %>% tidy
    std.err <- t.tests$stderr
    cols <- c("conf.low", "conf.high", "statistic", "parameter", "p.value")
    means.table <- cbind(means.SDs, stats[, "estimate"], std.err, stats[, cols], select(Ds, -CI))
    means.table <- means.table %>%
        "Mean_Difference" = estimate,
        SE = std.err,
        "CI_Lower" = conf.low,
        "CI_Upper" = conf.high,
        "t" = statistic,
        df = parameter,
        p = p.value,
        "Cohens_d" = Cohens_d,
        "d_CI_Lower" = CI_low,
        "d_CI_Upper" = CI_high
      ) %>%
      mutate(across(.cols = where(is.numeric), round, 2), p = scales::pvalue(p))
    rownames(means.table) <- NULL
    if(paired == FALSE) {
      means.table <- means.table %>%  
          "M_Group_1" = mean.0,
          "M_Group_2" = mean.1,
          "SD_Group_1" = sd.0,
          "SD_Group_2" = sd.1
    } else {
      means.table <- means.table %>%
          "M_Before" = mean.before,
          "M_After" = mean.after,
          "SD_Before" = sd.before,
          "SD_After" = sd.after 
    if (long == TRUE) {
      means.table.long <- means.table %>% 
        t %>% 
        data.frame("Value" = .) %>% 
    } else{ return(means.table) }


Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Routledge.


BibTeX citation:
@online{e. vanaman2021,
  author = {E. Vanaman, Matthew},
  title = {Tidy {T-Test} in {R} {With} {Cohen’s} {D}},
  date = {2021-01-31},
  url = {https://matthewvanaman.com/posts/2021-02-05-tidy-t-test-in-r-with-cohen-s-d/},
  langid = {en}
For attribution, please cite this work as:
E. Vanaman, M. (2021, January 31). Tidy T-Test in R With Cohen’s D.