Sort a table by row

You want to sort the rows in your data frame according to the values of one or more columns.

Step 1 - Pass the data frame to dplyr::arrange(). arrange() will return a modified copy of this data frame.

Step 2 - Specify a column to sort the data frame by. Do not surround the name with quotation marks.

arrange(data, col_sort1)

Step 3 (Optional) - Specify additional column(s) to sort by. If you sort by more than one column, arrange() will each subsequent column to breaks ties in the preceding columns.

arrange(data, col_sort1, col_sort2)

Sort values in descending order

By default, arrange() sorts values in ascending order—numerically, alphabetically, logically, or chronologically. To sort a column in descending order, wrap it in dplyr::desc(), e.g.

arrange(data, desc(col_sort1), col_sort2)

R users often pass a data frame to arrange() with %>%, sometimes as part of a larger sequence of commands. For example,

data %>% 
  mutate(new_col = x + 5) %>%
  arrange(new_col)

Example

covid_jan2021 describes COVID-19 tests, hospitalizations, and deaths collected in the United States in January 2021. The data has been aggregated by state, and we want to find the state with the fewest COVID-19 deaths.

We begin by loading the dplyr package, which contains arrange(). We then pass covid_jan2021 to arrange() and add the column of interest, total_deaths, as an argument in arrange().

library(dplyr)
covid_jan2021 %>% 
  arrange(total_deaths)

# A tibble: 50 × 5
   state total_test total_cases total_hospitalizations total_deaths
   <fct>      <dbl>       <dbl>                  <dbl>        <dbl>
 1 VT        199646        4553                      0           38
 2 AK        224575        7137                    182           56
 3 HI        175137        4686                    299          122
 4 ND        101691        5135                    228          153
 5 WY        133201        7503                    182          158
 6 DE        252679       20615                      0          164
 7 ME        280838       15123                    325          243
 8 MT        162660       12394                    677          272
 9 SD         56237        9086                    618          290
10 NH        229228       21667                    135          298
# ℹ 40 more rows

We see here that Vermont was the state with the lowest number of total_deaths in January 2021, with 38 deaths.

Sort a table to bring missing values to the top

We can use arrange() to sort on functions of variables. To bring missing values to the top of a table, sort on the output of calling is.na() on a column:

data %>% 
  arrange(desc(is.na(col_1)))

arrange() and SAS

arrange() is the equivalent of SAS’s SORT procedure with the BY statement or PROC SQL with ORDER BY.

In SAS:

PROC SORT DATA = data OUT = data_sort;
  BY var_1 DESCENDING var_2;
RUN;

PROC SQL;
  CREATE TABLE data_sort AS
  SELECT *
  FROM DATA
  ORDER BY var_1 DESC, var_2;
QUIT;

In R:

data_sort <- 
  data %>% 
    arrange(desc(var_1), var_2)