Create a two-way contingency table

Contingency tables are a common way to describe the relationship between two categorical variables. You want to create one from two categorical columns in a data frame.

Step 1 - Start with a tidy data frame. Each of the variables that you want to appear in the contingency table should be in its own column.

Step 2 - Call janitor::tabyl() on the two variables for the contingency table. Pass it the variable that contains the row names for your new table, then the variable that contains the column names.

data %>% 
  tabyl(row_var, col_var) 

Step 3 (Optional) - Use gt::gt() to polish the table for publication. As you do, identify the column that contains the row names with the rowname_col argument. gt() uses this information to format the table as a contingency table.

data %>% 
  tabyl(row_var, col_var) %>% 
  gt(rowname_col = "row_var") 

Step 4 (Optional) - Further polish the table with functions from the gt package.

Step 5 (Optional) - Export your table with gt::gtsave(). Provide a filename that ends with a common table format (.html, .tex, .ltx, .rtf). gtsave() will automatically save your table to that format.
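
As a minimal sketch of Step 5, the code below saves a small table as HTML. The toy data frame, the file name contingency_table.html, and the use of a temporary directory are assumptions made for illustration; substitute your own data and path.

```r
library(janitor)
library(gt)

# Toy data frame (hypothetical; stands in for your own data)
toy <- data.frame(
  row_var = c("a", "a", "b", "b", "b"),
  col_var = c("x", "y", "x", "x", "y")
)

toy %>%
  tabyl(row_var, col_var) %>%
  gt(rowname_col = "row_var") %>%
  gtsave(
    filename = "contingency_table.html", # the extension selects the format
    path = tempdir()                     # hypothetical output directory
  )
```

Changing the extension in filename (for example to .rtf) should be enough to switch the output format.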

Label the column variable

By default, the values of your column variable will appear in the two-way table, but not its name. You can add the name of the column variable above the columns with gt::tab_spanner().

data %>% 
  tabyl(row_var, col_var) %>% 
  gt(rowname_col = "row_var") %>% 
  tab_spanner(
    columns = -row_var, # select all columns except the row variable
    label = "Column variable" # the name to display
  )

Or you can combine the row/column variable names in the first column name with janitor::adorn_title(placement = "combined"):

data %>% 
  tabyl(row_var, col_var) %>% 
  adorn_title(placement = "combined")

Example

fda_adverse describes adverse events reported to the FDA for a variety of drugs. We want to see whether there is a relationship between the sex of a respondent and whether or not the adverse event they reported was serious. (An adverse event is serious if it resulted in death, a life-threatening condition, hospitalization, disability, birth defects, or some other severe condition.)

To begin, we load the janitor and gt packages.

library(janitor)
library(gt)

Then we call tabyl() on the two variables of interest, sex and serious, in fda_adverse.

fda_adverse %>% 
  tabyl(sex, serious)
    sex FALSE TRUE
 female   271 2562
   male    61 2871

Next, we pass the table to gt() and tweak it with the gt package to add a column variable header, row variable header, and title.

fda_adverse %>% 
  tabyl(sex, serious) %>% 
  gt(
    rowname_col = "sex" # choose the column to use as row names
  ) %>% 
  tab_spanner(          # add variable name to the columns
    columns = 2:3,
    label = "Serious Event?"
  ) %>% 
  tab_stubhead(         # add row variable header
    label = "Sex of Respondent" 
  ) %>% 
  tab_header(           # add title
    title = "FDA Adverse Events by Sex and Event Seriousness"
  )
FDA Adverse Events by Sex and Event Seriousness
Sex of Respondent    Serious Event?
                     FALSE    TRUE
female                 271    2562
male                    61    2871

Marginal distributions

Add row and/or column totals with janitor::adorn_totals(). Set the where argument to "row" to add a totals row, "col" to add a totals column, or c("row", "col") to add both.

fda_adverse %>% 
  tabyl(sex, serious) %>% 
  adorn_totals(where = c("row", "col")) %>% 
  gt() %>% 
  tab_spanner(
    columns = 2:3, # choose columns to span
    label = "Serious Event?"
  )
sex       Serious Event?    Total
          FALSE    TRUE
female      271    2562      2833
male         61    2871      2932
Total       332    5433      5765

Percentages or counts?

To display the data in your contingency table as percentages instead of counts, add janitor::adorn_percentages(). The denominator argument in adorn_percentages() indicates the direction to use for calculating percentages. This can be set to "row", "col", or "all".

data %>% 
  tabyl(row_var, col_var) %>% 
  adorn_percentages(denominator = "all") %>% 
  gt() %>% 
  fmt_percent(
    columns = -1,
    decimals = 1
  ) %>% 
  tab_spanner(
    columns = -1,
    label = "Column Label"
  )

You can further format the table display with janitor's adorn_* functions (for example, adorn_ns() adds the underlying counts in parentheses beside the percentages) or with any gt function after the table is passed to gt().
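
As a rough sketch of the adorn_* approach, the code below shows row percentages with the counts beside them. The fda_counts data frame is a stand-in rebuilt from the counts in the example table above, not the real fda_adverse data, and the choice of row percentages with one decimal place is an assumption made for illustration.

```r
library(janitor)

# Stand-in data rebuilt from the counts in the example table above
# (the real fda_adverse data frame is not reproduced here)
fda_counts <- data.frame(
  sex = rep(c("female", "male"), times = c(2833, 2932)),
  serious = c(
    rep(c(FALSE, TRUE), times = c(271, 2562)), # female rows
    rep(c(FALSE, TRUE), times = c(61, 2871))   # male rows
  )
)

pct_table <- fda_counts %>%
  tabyl(sex, serious) %>%
  adorn_percentages(denominator = "row") %>% # percentages within each sex
  adorn_pct_formatting(digits = 1) %>%       # format values as percent strings
  adorn_ns()                                 # append the counts in parentheses

pct_table
```

After adorn_pct_formatting() and adorn_ns(), the cells are character strings, so apply any numeric gt formatters such as fmt_percent() before this step or skip them.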

What about tidyr::pivot_wider()?

We can also make two-way tables with pivot_wider(), the tidyverse's general-purpose function for pivoting long data into wide data. This approach needs count() from dplyr and pivot_wider() from tidyr, so load both packages first. janitor::tabyl() is an alternative to pivot_wider() that specializes in building and formatting contingency tables.

data %>% 
  count(row_var, col_var) %>% 
  pivot_wider(
    names_from = col_var, 
    values_from = n, 
    values_fill = 0  # fill in NAs with 0
  ) %>% 
  gt() %>% 
  tab_spanner(
    columns = -1,
    label = "Column Label"
  )
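
As a quick check that the two approaches agree, the sketch below builds the same table both ways and compares the counts. The toy data frame is hypothetical; it deliberately leaves one combination ("b", "y") empty so that values_fill has something to fill.

```r
library(dplyr)
library(tidyr)
library(janitor)

# Toy data frame (hypothetical); note there is no ("b", "y") row
toy <- data.frame(
  row_var = c("a", "a", "b", "b", "b"),
  col_var = c("x", "y", "x", "x", "x")
)

# Two-way table with count() + pivot_wider()
wide <- toy %>%
  count(row_var, col_var) %>%
  pivot_wider(
    names_from = col_var,
    values_from = n,
    values_fill = 0 # the missing ("b", "y") cell becomes 0 instead of NA
  )

# Two-way table with tabyl()
tab <- toy %>%
  tabyl(row_var, col_var)

# Compare the counts column by column
all(wide$x == tab$x) && all(wide$y == tab$y)
```

The two results differ in class (a tibble versus a tabyl), which matters only if you plan to pipe the result into janitor's adorn_* functions.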

Contingency tables and SAS

Contingency tables are obtained in SAS using the FREQ procedure with the TABLES statement.

In SAS:

PROC FREQ DATA = data;
  TABLES row_var * col_var / MISSING NOPERCENT NOCOL NOROW;
RUN;

In R:

data %>% 
  tabyl(row_var, col_var) %>% 
  gt() %>% 
  tab_spanner(
    columns = -1,
    label = "Column Label"
  )