Create a three-way contingency table
Contingency tables are a common way to describe the relationship between two or three categorical variables. You want to create one from three categorical columns in a data frame.
Step 1 - Start with a tidy data frame. Each of the variables that you want to appear in the contingency table should be in its own column.
Step 2 - Call janitor::tabyl() on the three variables for the contingency table. Pass it the variables that contain row names, column names, and group names for the finished table, in that order.
data %>%
  tabyl(row_var, col_var, group_var)
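As a minimal sketch of what the three-variable call returns, the block below builds a small data frame and passes it to tabyl(). The data frame `toy` and its values are hypothetical and exist only for illustration; the column names mirror the placeholders used in the steps.

```r
library(janitor)

# Hypothetical toy data; names and values are illustrative only
toy <- data.frame(
  row_var   = c("a", "a", "b", "b", "a", "b"),
  col_var   = c("x", "y", "x", "y", "x", "x"),
  group_var = c("g1", "g1", "g1", "g2", "g2", "g2")
)

# A three-way tabyl() returns a list with one two-way table per group
tables <- tabyl(toy, row_var, col_var, group_var)

names(tables)  # the levels of group_var
tables$g1      # counts of row_var by col_var within group "g1"
```

Because the result is a list rather than a single data frame, it usually needs one more step before it can be printed or styled as one table.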
Step 3 (Optional) - Use dplyr::bind_rows(.id = ) to combine the results into a single table. Set .id to a name for the grouping variable, as a character string. This is preparation for Step 4.
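The effect of the .id argument can be sketched with a plain named list of data frames. The list `tables` below is a hypothetical stand-in for the output of tabyl(); its names and counts are made up for illustration.

```r
library(dplyr)

# Hypothetical per-group count tables standing in for tabyl() output
tables <- list(
  g1 = data.frame(row_var = c("a", "b"), x = c(1, 1), y = c(1, 0)),
  g2 = data.frame(row_var = c("a", "b"), x = c(1, 1), y = c(0, 1))
)

# The list names become the values of the new "group_var" column
combined <- bind_rows(tables, .id = "group_var")
combined
```

The new column is placed first, which is why the group names end up in column 1 and the row names in column 2 of the combined table.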
Step 4 (Optional) - Use gt::gt() to polish the table for publication. Identify the columns that contain the row names and group names as you do.
data %>%
  tabyl(row_var, col_var, group_var) %>%
  bind_rows(.id = "group_var") %>%
  gt(
    rowname_col = "row_var",
    groupname_col = "group_var"
  )
Step 5 (Optional) - Further polish the table with functions from the gt package.
Step 6 (Optional) - Export your table with gt::gtsave(). Provide a filename that ends with the extension of a supported output format (such as .html, .tex, .ltx, or .rtf). gtsave() infers the format from the extension and saves the table accordingly.
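A minimal export sketch, assuming an HTML file is wanted. The data frame `counts`, the file name contingency.html, and the use of tempdir() as the output directory are all illustrative choices, not part of the recipe.

```r
library(gt)

# Hypothetical stand-in for the table built in the earlier steps
counts <- data.frame(
  group_var = c("g1", "g1", "g2", "g2"),
  row_var   = c("a", "b", "a", "b"),
  x         = c(1, 1, 1, 1),
  y         = c(1, 0, 0, 1)
)

tbl <- gt(counts, rowname_col = "row_var", groupname_col = "group_var")

# The ".html" extension tells gtsave() to write an HTML file;
# tempdir() is used here only to avoid cluttering the working directory
gtsave(tbl, filename = "contingency.html", path = tempdir())
```

Changing the extension in the filename (for example to .rtf) is enough to switch the output format.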
Example
fda_adverse describes adverse events reported to the FDA for a variety of drugs. We want to see whether, within each country, there is a relationship between the sex of a respondent and whether the adverse event they reported was serious. (An adverse event is serious if it resulted in death, a life-threatening condition, hospitalization, disability, birth defects, or some other severe condition.)
To begin, we load the dplyr, janitor, and gt packages. dplyr is needed for bind_rows().
library(dplyr)
library(janitor)
library(gt)
Then we call tabyl() on the three variables of interest, sex, serious, and country, in fda_adverse. Since tabyl() returns a list of tables, one per country, we use bind_rows() to collapse them into a single table.
fda_adverse %>%
  tabyl(sex, serious, country) %>%
  bind_rows(.id = "country")
country         sex     FALSE  TRUE
Australia       female      0    12
Australia       male        0    33
Japan           female      8   425
Japan           male        5   680
United Kingdom  female      2   319
United Kingdom  male        0   232
United States   female    261   516
United States   male       56   332
We then pass the table to gt(), taking care to specify which column contains the row names and which column contains the group names. We also use gt functions to label the column variable and to make the group names bold.
fda_adverse %>%
  tabyl(sex, serious, country) %>%
  bind_rows(.id = "country") %>%
  gt(
    rowname_col = "sex",
    groupname_col = "country"
  ) %>%
  tab_spanner(
    columns = 3:4,
    label = "Serious Event?"
  ) %>%
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_row_groups()
  )

| | Serious Event? | |
|---|---|---|
| | FALSE | TRUE |
| **Australia** | | |
| female | 0 | 12 |
| male | 0 | 33 |
| **Japan** | | |
| female | 8 | 425 |
| male | 5 | 680 |
| **United Kingdom** | | |
| female | 2 | 319 |
| male | 0 | 232 |
| **United States** | | |
| female | 261 | 516 |
| male | 56 | 332 |

Contingency tables and SAS
Contingency tables are obtained in SAS using the FREQ procedure with the TABLES statement.
In SAS:
PROC FREQ DATA = data;
  TABLES group_var * row_var * col_var / MISSING NOPERCENT NOCOL NOROW;
RUN;
In R:
data %>%
  tabyl(row_var, col_var, group_var) %>%
  bind_rows(.id = "group_var") %>%
  gt(
    rowname_col = "row_var",
    groupname_col = "group_var"
  ) %>%
  tab_spanner(
    # every column except the first two (group names and row names)
    columns = -(1:2),
    label = "Column Variable Name"
  )