Create a three-way contingency table

Contingency tables are a common way to describe the relationship between two or three categorical variables. You want to create one from three categorical columns in a data frame.

Step 1 - Start with a tidy data frame. Each of the variables that you want to appear in the contingency table should be in its own column.

Step 2 - Call janitor::tabyl() on the three variables for the contingency table. Pass it the variables that contain row names, column names, and group names for the finished table, in that order.

data %>% 
  tabyl(row_var, col_var, group_var)
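To see what Step 2 returns, the sketch below runs tabyl() on a small made-up data frame. The toy data and its values are hypothetical and only serve as an illustration. With three variables, tabyl() returns a named list that holds one two-way table per level of the third variable.

```r
library(janitor)

# Hypothetical toy data: one row per observation, one column per variable
toy <- data.frame(
  row_var   = c("a", "a", "b", "b", "a", "b"),
  col_var   = c("x", "y", "x", "y", "x", "x"),
  group_var = c("g1", "g1", "g1", "g1", "g2", "g2")
)

tables <- tabyl(toy, row_var, col_var, group_var)

# One list element per level of group_var, named after that level
names(tables)

# Each element is a two-way table of row_var by col_var
tables$g1
```

Because the result is a list rather than a single data frame, it usually needs to be combined before it can be passed to a table-formatting function.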

Step 3 (Optional) - Use dplyr::bind_rows(.id = ) to combine the results into a single table. Set .id to a name for the grouping variable, as a character string. This is preparation for Step 4.
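As a minimal sketch of what .id does, the example below combines a hand-built named list of data frames. The list is a hypothetical stand-in for tabyl() output, so the example only needs dplyr.

```r
library(dplyr)

# Hypothetical stand-in for tabyl() output: a named list of two-way tables
tables <- list(
  g1 = data.frame(row_var = c("a", "b"), x = c(1, 1), y = c(1, 1)),
  g2 = data.frame(row_var = c("a", "b"), x = c(1, 1), y = c(0, 0))
)

# .id adds a leading column that records which list element each row came from
combined <- bind_rows(tables, .id = "group_var")
combined
```

The new group_var column is what gt() can later use to group the rows.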

Step 4 (Optional) - Use gt::gt() to polish the table for publication. In the call, identify the columns that contain the row names and the group names.

data %>% 
  tabyl(row_var, col_var, group_var) %>% 
  bind_rows(.id = "group_var") %>% 
  gt(
    rowname_col = "row_var",
    groupname_col = "group_var"
  ) 

Step 5 (Optional) - Further polish the table with functions from the gt package.
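Which gt functions to use depends on the table. As one possible sketch, the example below adds a title, a source note, and right-aligned count columns. The data frame, title, and note text are hypothetical placeholders.

```r
library(gt)

# Hypothetical combined table, shaped like the output of Step 3
combined <- data.frame(
  group_var = c("g1", "g1", "g2", "g2"),
  row_var   = c("a", "b", "a", "b"),
  x         = c(1, 1, 1, 1),
  y         = c(1, 1, 0, 0)
)

polished <- gt(combined, rowname_col = "row_var", groupname_col = "group_var")

# Add a title above the table
polished <- tab_header(polished, title = "Counts of row_var by col_var")

# Add a note below the table
polished <- tab_source_note(polished, source_note = "Toy data for illustration only")

# Right-align the count columns
polished <- cols_align(polished, align = "right", columns = c(x, y))

polished
```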

Step 6 (Optional) - Export your table with gt::gtsave(). Provide a filename that ends with the extension of a supported table format (.html, .tex, .ltx, .rtf). gtsave() infers the output format from the extension.
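A minimal sketch of an export, assuming a small hypothetical table and a temporary directory as the destination (the file name is a placeholder):

```r
library(gt)

# Hypothetical table to export
tbl <- gt(
  data.frame(row_var = c("a", "b"), x = c(1, 1), y = c(1, 0)),
  rowname_col = "row_var"
)

# The .html extension tells gtsave() to write an HTML file
out_dir <- tempdir()
gtsave(tbl, filename = "contingency-table.html", path = out_dir)

# Confirm that the file was written
file.exists(file.path(out_dir, "contingency-table.html"))
```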

Example

fda_adverse describes adverse events reported to the FDA for a variety of drugs. We want to see whether, within each country, there is a relationship between the sex of a respondent and whether the adverse event they reported was serious. (An adverse event is serious if it resulted in death, a life-threatening condition, hospitalization, disability, birth defects, or some other severe condition.)

To begin, we load the janitor, dplyr, and gt packages. dplyr provides bind_rows().

library(janitor)
library(dplyr)
library(gt)

Then we call tabyl() on the three variables of interest, sex, serious, and country, in fda_adverse. Since tabyl() returns a list of tables, we use bind_rows() to collapse them into a single table.

fda_adverse %>% 
  tabyl(sex, serious, country) %>% 
  bind_rows(.id = "country")
        country    sex FALSE TRUE
      Australia female     0   12
      Australia   male     0   33
          Japan female     8  425
          Japan   male     5  680
 United Kingdom female     2  319
 United Kingdom   male     0  232
  United States female   261  516
  United States   male    56  332

We then pass the table to gt(), taking care to specify which column contains the row names and which column contains the group names. We also use gt functions to label the column variable and to make the group names bold.

fda_adverse %>%
  tabyl(sex, serious, country) %>% 
  bind_rows(.id = "country") %>%
  gt(
    rowname_col = "sex",
    groupname_col = "country"
  ) %>% 
  tab_spanner(
    columns = 3:4,
    label = "Serious Event?"
  ) %>% 
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_row_groups()
  )
                 Serious Event?
                 FALSE   TRUE
Australia
  female             0     12
  male               0     33
Japan
  female             8    425
  male               5    680
United Kingdom
  female             2    319
  male               0    232
United States
  female           261    516
  male              56    332

Contingency tables and SAS

Contingency tables are obtained in SAS using the FREQ procedure with the TABLES statement.

In SAS:

PROC FREQ DATA = data;
  TABLES group_var * row_var * col_var / MISSING NOPERCENT NOCOL NOROW;
RUN;

In R:

data %>% 
  tabyl(row_var, col_var, group_var) %>% 
  bind_rows(.id = "group_var") %>% 
  gt(
    rowname_col = "row_var",
    groupname_col = "group_var"
  ) %>% 
  tab_spanner(
    columns = -(1:2),
    label = "Column Variable Name"
  )