Combine tables by stacking them one on top of another

You have several tables that all share the same columns in the same order. You’d like to combine them into a single table by appending each new table as additional rows beneath the first.

Step 1 - Double check that each of your data frames has the same column names in the same order. Repair any mistakes with dplyr::rename() or dplyr::relocate().

Step 2 - Combine your data frames with dplyr::bind_rows(). bind_rows() returns a copy of the first data frame that has the second data frame appended as new rows at the bottom of the copy.

data_1 %>%
  bind_rows(data_2) %>%
  bind_rows(data_3)

Interleave rows

After combining your data, consider sorting your new data frame by row with dplyr::arrange().

Example

mid_atlantic and new_england contain the number of COVID-19 cases reported by different states in March 2021. We’d like to combine them into a single dataset to make them easier to work with.

mid_atlantic
# A tibble: 3 × 2
  state total_cases
  <fct>       <dbl>
1 NJ          23253
2 NY          50724
3 PA          17001
new_england
# A tibble: 6 × 2
  state total_cases
  <fct>       <dbl>
1 CT           5384
2 ME           1160
3 MA          10208
4 NH           1437
5 RI           2345
6 VT            885

We first load the dplyr package, which contains bind_rows(). Then we bind our data frames together.

library(dplyr)
mid_atlantic %>% 
  bind_rows(new_england)
# A tibble: 9 × 2
  state total_cases
  <fct>       <dbl>
1 NJ          23253
2 NY          50724
3 PA          17001
4 CT           5384
5 ME           1160
6 MA          10208
7 NH           1437
8 RI           2345
9 VT            885

Notice that the rows of mid_atlantic appear first in the new data frame because we passed mid_atlantic first to bind_rows().

Keep track of which data frame each row came from

If you pass bind_rows() a named list of data frames, bind_rows() will add a column to the output that records the name of the list element that each row came from. Choose a name for this column with .id.

This provides an excellent way to capture information stored in the names of the data frames, which would otherwise be lost.

list(
  "Middle Atlantic" = mid_atlantic, 
  "New England"  = new_england
) %>% 
  bind_rows(.id = "region")
# A tibble: 9 × 3
  region          state total_cases
  <chr>           <fct>       <dbl>
1 Middle Atlantic NJ          23253
2 Middle Atlantic NY          50724
3 Middle Atlantic PA          17001
4 New England     CT           5384
5 New England     ME           1160
6 New England     MA          10208
7 New England     NH           1437
8 New England     RI           2345
9 New England     VT            885

Stacking data sets in SAS

bind_rows() is the equivalent of SAS’s DATA step with the data sets to stack in the SET statement:

In SAS:

DATA data_stack;
  SET data_1 data_2;
  BY col_sort;
RUN;

In R:

data_stack <-
  bind_rows(data_1, data_2) %>%
  arrange(col_sort)