am
and vs
Variables
Observed | Expected | |||
vs = 0 | vs = 1 | vs = 0 | vs = 1 | |
am = 0 | 12 | 7 | 10.6875 | 8.3125 |
am = 1 | 6 | 7 | 7.3125 | 5.6875 |
Doctoral Student at the CUNY Graduate Center and Quantitative Reasoning Fellow at the CUNY School for Labor and Urban Studies
Saturday, February 6, 2021
Wednesday, July 19, 2023
If you are interested in some unsolicited advice about choosing tests and/or a walk-through of this function, keep reading. Otherwise, feel free to skip to the full function at the bottom.
Most students in the sciences learn about the \(\chi^{2}\) test of independence in their introductory statistics course. The idea is that if binary variables x and y are unrelated, then the observed values and expected values will be identical, in which cases \(\chi^{2}\) is equal to zero. The expected values are just the counts of each combination of x and y we would expect if x and y were unrelated in the population, while the observed counts are the counts we actually have.
The \(\chi^{2}\) value, then, is a function of the difference between the expected values and the observed values (the values in from the sample). The greater the difference between expected and observed values, the lower the probability that the difference between them (or a larger difference) is due to chance, assuming that there is actually no difference at the population level. Thus all the \(\chi^{2}\) “test” really means is that we check to see whether the \(p\)-value from the \(\chi^{2}\) we get is less than alpha, which is usually a cutoff \(p\)-value of .05.
One assumption of the \(\chi^{2}\) test violated fairly often is that no expected values are less than 5. This is sort of an arbitrary cutoff - the larger the expected counts, the more accurate the \(\chi^{2}\) value will be (in general). Unlike the \(\chi^{2}\) test, the Fisher’s exact test is accurate even in the presence of expected values less than 5. In cases where the expected values are all well above 5, the \(p\)-values from the tests will be close, if not identical. There isn’t really a drawback to using Fisher’s exact test, so it makes sense to just always use Fisher’s exact test. This argument is analogous to the argument that we should always use Welch’s instead of Student’s \(t\)-test by default .
As an illustration, let’s compare the results of a \(\chi^{2}\) test and Fisher’s exact test. We’re interested in seeing whether transmission type (am
, 0 = automatic, 1 = manual) and engine shape (vs
, 0 = V-shaped, 1 = straight) are related. These date come from the mtcars
dataset, which is built in to R. Here are the observed and expected values:
am
and vs
Variables
Observed | Expected | |||
vs = 0 | vs = 1 | vs = 0 | vs = 1 | |
am = 0 | 12 | 7 | 10.6875 | 8.3125 |
am = 1 | 6 | 7 | 7.3125 | 5.6875 |
In R, we can use the built-in stats package to get a \(\chi^{2}\) test, which prints the following output:
Pearson's Chi-squared test with Yates' continuity correction
data: mtcars$am and mtcars$vs
X-squared = 0.34754, df = 1, p-value = 0.5555
We can also use the stats package to get a Fisher’s exact test. Here is what the exact test gives us:
Fisher's Exact Test for Count Data
data: mtcars$am and mtcars$vs
p-value = 0.4727
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.3825342 10.5916087
sample estimates:
odds ratio
1.956055
Notice that the \(p\)-values are a bit different. By definition, the Fisher’s exact test is more accurate - it’s an “exact” test, while the \(\chi^{2}\) is an approximation test. In other words, the \(\chi^{2}\) test merely aims to be “close enough”, because back in the day, people had to do these tests by hand, and getting the exact answers would be nearly impossible by hand. Today, computers can solve for these answers for us. Assuming adequate sample size, the \(\chi^2\) test is usually fine, though it will become less accurate when sample sizes become smaller. Not so with Fisher’s exact test.
The stats
package, which comes preloaded in RStudio, provides chi.sq()
and fisher.test()
functions. Personally I don’t like the format of the output. I’ve written a function that will conduct Fisher’s exact test and provide output including the odds ratio, \(p\)-value, and confidence intervals in tidy wide or long output, along with pre-formatted cross-tabulation tables with counts, percentages, and totals. I usually want to look at these things to help put my Fisher test results in context. I wrote a function which I’ll call my.fisher()
that gives me some crucial information in the output, ready to be looked at without further effort. Here’s the first Fisher’s exact test, but this time using my.fisher()
:
am | vs = 0 | vs = 1 | Total |
0 | 37.50% (12) | 21.88% (7) | 59.38% (19) |
1 | 18.75% (6) | 21.88% (7) | 40.62% (13) |
Total | 56.25% (18) | 43.75% (14) | 100.00% (32) |
Odds Ratio | p | CI_Lower | CI_Higher | Test | Direction |
1.96 | 0.473 | 0.38 | 10.59 | Fisher's Exact Test for Count Data | two.sided |
Ah, that’s easier on the eyes. We can also get that in long format too:
Statistic | Value |
Odds Ratio | 1.96 |
p | 0.473 |
CI_Lower | 0.38 |
CI_Higher | 10.59 |
Test | Fisher's Exact Test for Count Data |
Direction | two.sided |
You can also pass arguments to janitor::fisher.test()
. For example, say you wanted to do a leff-sided single-tailed test. Just add alternative = "less"
to my.fisher()
(note that the default is a two-sided test):
Odds Ratio | p | CI_Lower | CI_Higher | Test | Direction |
1.96 | 0.906 | 0.00 | 8.34 | Fisher's Exact Test for Count Data | less |
Notice the \(p\)-value is smaller; that is because one-tailed tests are more powerful than two-tailed tests, assuming there is a real relationship at the population level, specifically in the direction of the test.
An added benefit of Fisher’s exact test is that unlike \(\chi^{2}\), it can give you a \(p\)-value in cases where x, y, or both have greater than 2 categories (resulting in a 2 \(\times\) 3 table, for example). This function can also accommodate nominal variables with greater than 2 categories. For example, a 2 \(\times\) 3 table between transmission type and number of cylinders (cyl
, 4, 6, or 8):
p.value | method | alternative |
0.009 | Fisher's Exact Test for Count Data | two.sided |
The odds ratio left out of the output in these cases, because it is not applicable to exact tests one or more variables has greater than 2 categories. In this case, you’d want to look at a crosstabulation of x and y to contextualize the \(p\)-value. This function provides them for you in a nice format with the help of the janitor
package.
cyl | ||||
am | 4 | 6 | 8 | Total |
0 | 9.38% (3) | 12.50% (4) | 37.50% (12) | 59.38% (19) |
1 | 25.00% (8) | 9.38% (3) | 6.25% (2) | 40.62% (13) |
Total | 34.38% (11) | 21.88% (7) | 43.75% (14) | 100.00% (32) |
Table 7 shows the cross-tabulation between transmission type and number of cylinders, with percentages and counts (the latter in parentheses). You also get row and column totals, along with the sample total in the right-most bottom corner. You also get two other cross-tabulation, one with percentages calculated relative the row total; the other with percentages calculated relative to the column total. Here’s an example with row totals:
cyl | ||||
am | 4 | 6 | 8 | Total |
0 | 15.79% (3) | 21.05% (4) | 63.16% (12) | 100.00% (19) |
1 | 61.54% (8) | 23.08% (3) | 15.38% (2) | 100.00% (13) |
Total | 34.38% (11) | 21.88% (7) | 43.75% (14) | 100.00% (32) |
Ah - with the row-wise percentages, you can see that the likely driver of statistical significance here is the inverse relationship between transmission type and cylinder. More than half of automatics have 8 cylinders, while more than half of manuals have only 4 cylinders. Looks like an interaction effect!
Copy-and-paste this function to use it in your own R
projects. You will be prompted to install the tidyverse
broom
, and janitor
packages if you do not already have them. Click the Code drop-down below to show the full function:
library(tidyverse)
library(janitor)
library(broom)
my.fisher <-
# ... allows you to pass arguments to functions where ... are present
function(data, ..., alternative = "two.sided") {
# load required packages
require(janitor)
require(dplyr)
require(broom)
# make.pretty is a function-within-a-function that
# adds totals and formatting using janitor package
make.pretty <- function(tab, ...){
tab %>%
janitor::adorn_totals(where = c("row", "col")) %>%
janitor::adorn_percentages(...) %>%
janitor::adorn_pct_formatting(digits = 2) %>%
janitor::adorn_ns() %>%
janitor::adorn_title()
}
# this gets the n x n crosstab
my.tab <- janitor::tabyl(
data,
...,
show_na = FALSE,
show_missing_levels = FALSE)
# use make.pretty to add totals and formatting
my.tab.all <- make.pretty(tab = my.tab, denominator = "all")
my.tab.row <- make.pretty(tab = my.tab, denominator = "row")
my.tab.col <- make.pretty(tab = my.tab, denominator = "col")
# get the fisher's exact test
# can pass arguments to stats::fisher.test via ...
# this determines which type of test to run, depending
# on alternative = argument
# default is "two.sided"
if(alternative == "greater"){
fish <- my.tab %>%
janitor::fisher.test(., alternative = "greater")
} else {
if (alternative == "less"){
fish <- my.tab %>%
janitor::fisher.test(., alternative = "less")
}
fish <- my.tab %>%
janitor::fisher.test(., alternative = "two.sided")
}
fish <- fish %>%
# organizes output into a table
broom::tidy() %>%
mutate(p.value = scales::pvalue(p.value)) %>%
# rounds values
dplyr::mutate_if(is.numeric, round, 2) %>%
# makes them into character format with exactly 2 decimal points
# so that numeric and character columns can be combined (for long format)
dplyr::mutate_if(is.numeric, format, nsmall = 2)
# if estimate column is present (in case of 2x2's), rename as "odds.ratio"
if ("estimate" %in% names(fish)) {
fish <- fish %>%
dplyr::rename(
"Odds Ratio" = estimate,
p = p.value,
CI_Lower = conf.low,
CI_Higher = conf.high,
Test = method,
Direction = alternative
)
fish.long <- fish %>%
# reshape to long format
tidyr::pivot_longer(
cols = everything(),
# name of column that gets the row values
values_to = "Value",
# name of column that gets the column names
names_to = "Statistic")
}
# this is the same thing as above, except this gets done when
# estimate column is not present
fish.long <- fish %>%
tidyr::pivot_longer(
cols = everything(),
values_to = "Value",
names_to = "Statistic")
# what does the function give as output?
# left side of = is what you'll see with $
# right side is object saved somewhere above
return(list(
crosstab.all = my.tab.all,
crosstab.row = my.tab.row,
crosstab.column = my.tab.col,
fish.test = fish,
fish.test.long = fish.long))
}
@online{e._vanaman2021,
author = {E. Vanaman, Matthew},
title = {Tidy {Fisher’s} {Exact} {Test} {In} {R} {(And} {Why} {To}
{Always} {Use} {It)}},
date = {2021-02-06},
url = {https://matthewvanaman.com/posts/2021-02-06-tidy-fisher-s-exact-test-in-r/},
langid = {en}
}