data_1 %>%
bind_rows(data_2) %>%
bind_rows(data_3)Combine tables by stacking them one on top of another
You have several tables that all share the same columns in the same order. You’d like to combine them into a single table by appending each new table as additional rows beneath the first.
Step 1 - Double check that each of your data frames has the same column names in the same order. Repair any mistakes with dplyr::rename() or dplyr::relocate().
Step 2 - Combine your data frames with dplyr::bind_rows(). bind_rows() returns a copy of the first data frame that has the second data frame appended as new rows at the bottom of the copy.
Interleave rows
After combining your data, consider sorting your new data frame by row with dplyr::arrange().
Example
mid_atlantic and new_england contain the number of COVID-19 cases reported by different states in March 2021. We’d like to combine them into a single dataset to make them easier to work with.
mid_atlantic# A tibble: 3 × 2
state total_cases
<fct> <dbl>
1 NJ 23253
2 NY 50724
3 PA 17001
new_england# A tibble: 6 × 2
state total_cases
<fct> <dbl>
1 CT 5384
2 ME 1160
3 MA 10208
4 NH 1437
5 RI 2345
6 VT 885
We first load the dplyr package, which contains bind_rows(). Then we bind our data frames together.
library(dplyr)
mid_atlantic %>%
bind_rows(new_england)# A tibble: 9 × 2
state total_cases
<fct> <dbl>
1 NJ 23253
2 NY 50724
3 PA 17001
4 CT 5384
5 ME 1160
6 MA 10208
7 NH 1437
8 RI 2345
9 VT 885
Notice that the rows of mid_atlantic appear first in the new data frame because we passed mid_atlantic first to bind_rows().
Keep track of which data frame each row came from
If you pass bind_rows() a named list of data frames, bind_rows() will add a column to the output that records the name of the list element that each row came from. Choose a name for this column with .id.
This provides an excellent way to capture information stored in the names of the data frames, which would otherwise be lost.
list(
"Middle Atlantic" = mid_atlantic,
"New England" = new_england
) %>%
bind_rows(.id = "region")# A tibble: 9 × 3
region state total_cases
<chr> <fct> <dbl>
1 Middle Atlantic NJ 23253
2 Middle Atlantic NY 50724
3 Middle Atlantic PA 17001
4 New England CT 5384
5 New England ME 1160
6 New England MA 10208
7 New England NH 1437
8 New England RI 2345
9 New England VT 885
Stacking data sets in SAS
bind_rows() is the equivalent of SAS’s DATA step with the data sets to stack in the SET statement:
In SAS:
DATA data_stack;
SET data_1 data_2;
BY col_sort;
RUN;In R:
data_stack <-
bind_rows(data_1, data_2) %>%
arrange(col_sort)