Add summary rows for each group of rows in a table for publication

You are making a table for publication with rows organized into groups based on the values of one or more variables. You want to add summary rows after each group with statistics such as mean or sum.

Step 1 - Start with a data frame that contains the values that you want to display in the arrangement that you want to display them.

Step 2 - Use gt::gt() to turn your data frame into an editable, professional looking table.

Step 3 - Pass you group variable(s) to groupname_col as character strings. gt() will create a separate group of rows for each value or combination of values in these columns.

Step 4 - Add summary rows with gt::summary_rows(groups = TRUE). The groups = TRUE argument indicates that you want summary statistics for each group.

Step 5 - Specify which columns to summarize, and which functions (fns) to use to generate summaries. Supply these functions as a list. summary_rows() will add one row per function per group, using the list names as row names.

data %>% 
  gt(groupname_col = "group_col") %>% 
  summary_rows(
    groups = TRUE,
    columns = everything(),
    fns = list(
      average = "mean",
      total = "sum"
    )
  )

Grouped data

gt() automatically creates groups of rows for data grouped with dplyr::group_by(), e.g.

data %>% 
  group_by(col1, col2) %>%
  gt()

Include function arguments

You can use the anonymous function notation associated with purrr:map() to set arguments for your summary functions. Pass the function as a call prefixed with ~. Place a . where the column vector should go.

fns = list(
  average = ~mean(., na.rm = TRUE),
  total = ~sum(., na.rm = TRUE),
  std_dev = ~sd(., na.rm = TRUE)
)

Example

CA_vaccines describes the number of doses of COVID-19 vaccines allocated to California for three months in 2021.

CA_vaccines
# A tibble: 9 × 3
  month vaccine total_doses
  <chr> <chr>         <dbl>
1 March Janssen      632700
2 March Moderna     3803000
3 March Pfizer      5206500
4 April Janssen      640300
5 April Moderna     3304000
6 April Pfizer      4422600
7 May   Janssen      155400
8 May   Moderna     4342200
9 May   Pfizer      5756400

When we publish this data, we’d like to group the rows into separate groups for each vaccine type and add summary rows after each group for the total and average number of doses.

To do this, we specify vaccine as a groupname column when we convert our table for publication with gt(). Then we add group summary rows with summary_rows().

library(gt)
CA_vaccines %>% 
  gt(
    rowname_col = "month",
    groupname_col = "vaccine"
  ) %>% 
  summary_rows(
    group = TRUE,
    columns = total_doses,
    fns = list(
      average = ~mean(., na.rm = TRUE),
      sum = ~sum(., na.rm = TRUE)
    )
  )
total_doses
Janssen
March 632700
April 640300
May 155400
Moderna
March 3803000
April 3304000
May 4342200
Pfizer
March 5206500
April 4422600
May 5756400

Adding group summaries in SAS

summary_rows() is similar to SAS’s REPORT procedure with the BREAK statement or calculating statistics in the SQL procedure with GROUP BY. However, the REPORT procedure only calculates the sum per group and the SQL procedure adds the summary values into new columns rather than rows.

In SAS:

PROC REPORT DATA=data;
  COLUMN group_col row_col col_A;
  DEFINE group_col / GROUP;
  DEFINE row_col / DISPLAY; 
  BREAK AFTER group_col / SUMMARIZE;
RUN;

or

PROC SQL;
  SELECT *,
    SUM(col_A) AS total_A
  FROM data
  GROUP BY group_col;
QUIT;

In R:

data %>% 
  gt(
    rowname_col = "row_col",
    groupname_col = "group_col"
  ) %>% 
  summary_rows(
    columns = col_A,
    fns = "sum"
  )