data %>%
gt(groupname_col = "group_col") %>%
summary_rows(
groups = TRUE,
columns = everything(),
fns = list(
average = "mean",
total = "sum"
)
)Add summary rows for each group of rows in a table for publication
You are making a table for publication with rows organized into groups based on the values of one or more variables. You want to add summary rows after each group with statistics such as mean or sum.
Step 1 - Start with a data frame that contains the values that you want to display in the arrangement that you want to display them.
Step 2 - Use gt::gt() to turn your data frame into an editable, professional looking table.
Step 3 - Pass you group variable(s) to groupname_col as character strings. gt() will create a separate group of rows for each value or combination of values in these columns.
Step 4 - Add summary rows with gt::summary_rows(groups = TRUE). The groups = TRUE argument indicates that you want summary statistics for each group.
Step 5 - Specify which columns to summarize, and which functions (fns) to use to generate summaries. Supply these functions as a list. summary_rows() will add one row per function per group, using the list names as row names.
Grouped data
gt() automatically creates groups of rows for data grouped with dplyr::group_by(), e.g.
data %>%
group_by(col1, col2) %>%
gt()Include function arguments
You can use the anonymous function notation associated with purrr:map() to set arguments for your summary functions. Pass the function as a call prefixed with ~. Place a . where the column vector should go.
fns = list(
average = ~mean(., na.rm = TRUE),
total = ~sum(., na.rm = TRUE),
std_dev = ~sd(., na.rm = TRUE)
)Example
CA_vaccines describes the number of doses of COVID-19 vaccines allocated to California for three months in 2021.
CA_vaccines# A tibble: 9 × 3
month vaccine total_doses
<chr> <chr> <dbl>
1 March Janssen 632700
2 March Moderna 3803000
3 March Pfizer 5206500
4 April Janssen 640300
5 April Moderna 3304000
6 April Pfizer 4422600
7 May Janssen 155400
8 May Moderna 4342200
9 May Pfizer 5756400
When we publish this data, we’d like to group the rows into separate groups for each vaccine type and add summary rows after each group for the total and average number of doses.
To do this, we specify vaccine as a groupname column when we convert our table for publication with gt(). Then we add group summary rows with summary_rows().
library(gt)
CA_vaccines %>%
gt(
rowname_col = "month",
groupname_col = "vaccine"
) %>%
summary_rows(
group = TRUE,
columns = total_doses,
fns = list(
average = ~mean(., na.rm = TRUE),
sum = ~sum(., na.rm = TRUE)
)
)| total_doses | |
|---|---|
| Janssen | |
| March | 632700 |
| April | 640300 |
| May | 155400 |
| Moderna | |
| March | 3803000 |
| April | 3304000 |
| May | 4342200 |
| Pfizer | |
| March | 5206500 |
| April | 4422600 |
| May | 5756400 |
Adding group summaries in SAS
summary_rows() is similar to SAS’s REPORT procedure with the BREAK statement or calculating statistics in the SQL procedure with GROUP BY. However, the REPORT procedure only calculates the sum per group and the SQL procedure adds the summary values into new columns rather than rows.
In SAS:
PROC REPORT DATA=data;
COLUMN group_col row_col col_A;
DEFINE group_col / GROUP;
DEFINE row_col / DISPLAY;
BREAK AFTER group_col / SUMMARIZE;
RUN;or
PROC SQL;
SELECT *,
SUM(col_A) AS total_A
FROM data
GROUP BY group_col;
QUIT;In R:
data %>%
gt(
rowname_col = "row_col",
groupname_col = "group_col"
) %>%
summary_rows(
columns = col_A,
fns = "sum"
)