data %>%
  tabyl(row_var, col_var)

Create a two-way contingency table
Contingency tables are a common way to describe the relationship between two categorical variables. You want to create one from two categorical columns in a data frame.
Step 1 - Start with a tidy data frame. Each of the variables that you want to appear in the contingency table should be in its own column.
Step 2 - Call janitor::tabyl() on the two variables for the contingency table. Pass it the variable that contains the row names for your new table, then the variable that contains the column names.
Step 3 (Optional) - Use gt::gt() to polish the table for publication. As you do, use the rowname_col argument to identify the column that contains the row names. gt() uses this information to format the table as a contingency table.
data %>%
  tabyl(row_var, col_var) %>%
  gt(rowname_col = "row_var")

Step 4 (Optional) - Further polish the table with functions from the gt package.
Step 5 (Optional) - Export your table with gt::gtsave(). Provide a filename that ends with a common table format (.html, .tex, .ltx, .rtf). gtsave() will automatically save your table to that format.
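The steps above can be sketched end to end as follows. This is a minimal sketch, not a definitive workflow: the data frame `toy`, its values, and the file name `contingency.html` are made up for illustration, and the file is written to a temporary directory so that nothing in your project is touched.

```r
library(janitor)
library(gt)

# Hypothetical tidy data: one column per categorical variable (Step 1)
toy <- data.frame(
  row_var = c("a", "a", "b", "b", "b"),
  col_var = c("x", "y", "x", "x", "y")
)

# Count the combinations (Step 2) and hand the result to gt (Step 3)
tbl <- toy %>%
  tabyl(row_var, col_var) %>%
  gt(rowname_col = "row_var")

# Export the table (Step 5); the .html extension selects the output format
gtsave(tbl, filename = "contingency.html", path = tempdir())
```

Swapping the extension for .tex or .rtf should select those formats instead.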
Label the column variable
By default, the values of your column variable will appear in the two-way table, but not its name. You can add the name of the column variable above the columns with gt::tab_spanner().
data %>%
  tabyl(row_var, col_var) %>%
  gt(rowname_col = "row_var") %>%
  tab_spanner(
    columns = -row_var, # select all columns except the row variable
    label = "Column variable" # the name to display
  )

Or you can combine the row/column variable names in the first column name with janitor::adorn_title(placement = "combined"):
data %>%
  tabyl(row_var, col_var) %>%
  adorn_title(placement = "combined")

Example
fda_adverse describes adverse events reported to the FDA for a variety of drugs. We want to see whether there is a relationship between the sex of a respondent and whether the adverse event they reported was serious. (An adverse event is serious if it resulted in death, a life-threatening condition, hospitalization, disability, birth defects, or some other severe condition.)
To begin, we first load the janitor and gt packages.
library(janitor)
library(gt)

Then we call tabyl() on the two variables of interest, sex and serious, in fda_adverse.
fda_adverse %>%
  tabyl(sex, serious)

    sex FALSE TRUE
 female   271 2562
   male    61 2871
Next, we pass the table to gt() and tweak it with the gt package to add a column variable header, row variable header, and title.
fda_adverse %>%
  tabyl(sex, serious) %>%
  gt(
    rowname_col = "sex" # choose the column that holds the row names
  ) %>%
  tab_spanner( # add variable name to the columns
    columns = 2:3,
    label = "Serious Event?"
  ) %>%
  tab_stubhead( # add row variable header
    label = "Sex of Respondent"
  ) %>%
  tab_header( # add title
    title = "FDA Adverse Events by Sex and Event Seriousness"
  )

FDA Adverse Events by Sex and Event Seriousness

| Sex of Respondent | Serious Event?: FALSE | Serious Event?: TRUE |
|---|---|---|
| female | 271 | 2562 |
| male | 61 | 2871 |
Marginal Distributions
Add row and/or column totals with janitor::adorn_totals(). The where argument can be set to "row", "col", or c("row", "col").
fda_adverse %>%
  tabyl(sex, serious) %>%
  adorn_totals(where = c("row", "col")) %>%
  gt() %>%
  tab_spanner(
    columns = 2:3, # choose columns to span
    label = "Serious Event?"
  )

| sex | Serious Event?: FALSE | Serious Event?: TRUE | Total |
|---|---|---|---|
| female | 271 | 2562 | 2833 |
| male | 61 | 2871 | 2932 |
| Total | 332 | 5433 | 5765 |
Percentages or counts?
To display the data in your contingency table as percentages instead of counts, add janitor::adorn_percentages(). The denominator argument in adorn_percentages() sets the total that each count is divided by: "row" uses the row total, "col" uses the column total, and "all" uses the grand total.
data %>%
  tabyl(row_var, col_var) %>%
  adorn_percentages(denominator = "all") %>%
  gt() %>%
  fmt_percent(
    columns = -1,
    decimals = 1
  ) %>%
  tab_spanner(
    columns = -1,
    label = "Column Label"
  )

You can further format the table with janitor's adorn_* functions before passing it to gt(), or with any gt function afterwards. For example, adorn_ns() adds the underlying counts in parentheses beside the percentages.
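As a rough illustration of row percentages combined with adorn_ns(), the sketch below rebuilds a stand-in for fda_adverse from the four counts shown in the example. The data frame `toy` is a hypothetical reconstruction, not the real data set, and adorn_pct_formatting() is a janitor function not covered in this recipe; it turns the proportions into percent strings so that adorn_ns() can append the counts.

```r
library(janitor)

# Cell counts copied from the sex-by-serious table in the example
counts <- data.frame(
  sex = c("female", "female", "male", "male"),
  serious = c(FALSE, TRUE, FALSE, TRUE),
  n = c(271, 2562, 61, 2871)
)

# Hypothetical stand-in for fda_adverse: repeat each combination n times
toy <- counts[rep(seq_len(nrow(counts)), counts$n), c("sex", "serious")]

pct_tbl <- toy %>%
  tabyl(sex, serious) %>%
  adorn_percentages(denominator = "row") %>% # each cell / its row total
  adorn_pct_formatting(digits = 1) %>%       # proportions -> "x.x%" strings
  adorn_ns()                                 # append the counts as (N)

pct_tbl
```

Because the denominator is "row", the two percentages in each row add up to 100%.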
What about tidyr::pivot_wider()?
We can also make two-way tables with pivot_wider(), the tidyverse’s general-purpose function for pivoting long data into wide data. janitor::tabyl() is an alternative to pivot_wider() that specializes in building and formatting contingency tables.
library(dplyr) # for count()
library(tidyr) # for pivot_wider()

data %>%
  count(row_var, col_var) %>%
  pivot_wider(
    names_from = col_var,
    values_from = n,
    values_fill = 0 # fill in NAs with 0
  ) %>%
  gt() %>%
  tab_spanner(
    columns = -1,
    label = "Column Label"
  )

Contingency tables and SAS
Contingency tables are obtained in SAS using the FREQ procedure with the TABLES statement.
In SAS:
PROC FREQ DATA = data;
  TABLES row_var * col_var / MISSING NOPERCENT NOCOL NOROW;
RUN;

In R:

data %>%
  tabyl(row_var, col_var) %>%
  gt() %>%
  tab_spanner(
    columns = -1,
    label = "Column Label"
  )
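The SAS call uses the MISSING option, which counts missing values as a level of their own. tabyl() behaves similarly by default through its show_na argument. The sketch below compares the two settings on a made-up data frame `toy`; the names and values are assumptions chosen only to include some NA values.

```r
library(janitor)

# Hypothetical data with a missing value in each variable
toy <- data.frame(
  row_var = c("a", "a", "b", NA),
  col_var = c("x", NA, "x", "y")
)

# Default (show_na = TRUE): NA gets its own row and column
with_na <- tabyl(toy, row_var, col_var)

# show_na = FALSE: observations with an NA in either variable are dropped
without_na <- tabyl(toy, row_var, col_var, show_na = FALSE)

with_na
without_na
```

Setting show_na = FALSE is the closer match to a PROC FREQ call without the MISSING option.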