
Reshape a table from wide format to long
You want to reshape your data frame from a wide format to a long format. In other words, you want to reduce several columns into just two: a column of former column names, and a column of former column values.
Step 1 - Pass your data to tidyr::pivot_longer(). pivot_longer() will return a modified copy of the data frame.
Step 2 - Select the columns to reshape. pivot_longer() will combine these columns into a names column and a values column in the new format. Use ?dplyr::select helper functions to specify columns.
data %>%
pivot_longer(cols = c(time1, time2, time3))Step 3- Use names_to and values_to to pick new column names. What should pivot_longer() name the column it sends the current column names_to? And the column it sends the values_to?
data %>%
pivot_longer(
cols =c(time1, time2, time3),
names_to = "time",
values_to = "count_hrs"
)Example
vaccine_wide describes the number of vaccine doses allocated over March, April, and May of 2021 for each vaccine type.
vaccine_wide# A tibble: 3 × 4
# Groups: vaccine [3]
vaccine `202103` `202104` `202105`
<chr> <dbl> <dbl> <dbl>
1 Janssen 137900 139500 34000
2 Moderna 828000 719200 946080
3 Pfizer 1137240 968760 1262880
We’d like to plot doses vs. time, but to do that we need to reorganize our data to contain a doses column and a time column.
To begin, we load the tidyr package which contains pivot_longer(). Next, we pivot every column in the data except vaccine. By default, pivot_longer() will return the names column as character data. We don’t want that, so we take advantage of pivot_longer()’s names_transform argument to ask pivot_longer() to run as.numeric() on the names column, which turns the values into numeric data.
library(tidyr)
vaccine_wide %>%
pivot_longer(
cols = -vaccine,
names_to = "month",
values_to = "doses",
names_transform = as.numeric
)# A tibble: 9 × 3
# Groups: vaccine [3]
vaccine month doses
<chr> <dbl> <dbl>
1 Janssen 202103 137900
2 Janssen 202104 139500
3 Janssen 202105 34000
4 Moderna 202103 828000
5 Moderna 202104 719200
6 Moderna 202105 946080
7 Pfizer 202103 1137240
8 Pfizer 202104 968760
9 Pfizer 202105 1262880
Now we are set to make our plot.
library(ggplot2)
vaccine_wide %>%
pivot_longer(
cols = -vaccine,
names_to = "month",
values_to = "doses",
names_transform = as.numeric
) %>%
ggplot(aes(x = month, y = doses, linetype = vaccine)) +
geom_line()
Change the types of the new columns
By default, pivot_longer() will save the former column names as character strings and will make an educated guess about how to save the former column values.
To coerce either column to a different type, set the names_transform or values_transform argument to a conversion function, such as as.numeric().
Reshape table to long format in SAS
Pivoting a data frame from wide to long in SAS is either done with a DATA step and OUTPUT statements or with the TRANSPOSE procedure.
For example, we have a wide data frame with four columns: student_id, d1, d2, and d3 indicating the number of hours each student studied each day.
In SAS:
DATA data_long;
SET data_wide;
student_id = student_id;
day = "day_1";
count_hrs = d1;
OUTPUT;
student_id = student_id;
day = "day_2";
count_hrs = d2;
OUTPUT;
student_id = student_id;
day = "day_3";
count_hrs = d3;
OUTPUT;
DROP d1-d3;
RUN;OR
PROC TRANSPOSE DATA = data_wide OUT = data_long;
BY student_id;
RUN;In R:
data_long <-
data_wide %>%
pivot_longer(
cols = c(d1, d2, d3),
names_to = "day",
values_to = "count_hrs"
)