Add a column to a table and compute its values

You want to modify a data frame by adding a new variable that is based on an existing variable.

Step 1 - Pass your data frame to dplyr::mutate(). mutate() will return a modified copy of this data frame.

mutate(data)

Step 2 - Choose a name for the new variable that will be added to the data frame. Write the name as an argument name in mutate().

mutate(data, new_col)

Step 3 - Pass the name an R expression that will create values for the new variable. The expression can use existing column(s) as input. For example, below new_col is based on x from data.

mutate(data, new_col = x + 5)

R users often pass a data frame to mutate() with %>%, sometimes as part of a larger sequence of commands. For example,

data %>% 
  mutate(new_col = x + 5) %>%
  select(x, new_col, old_col)

Example

weather_ contains hourly climate data collected near Monash University in Australia. The temp_cel column contains the temperature in degrees Celsius and wind_speeds contains the wind speed in m/s.

weather_
# A tibble: 24 × 3
   timestamp           temp_cel wind_speed
   <dttm>                 <dbl>      <dbl>
 1 2015-01-01 00:00:00     19.6       4.29
 2 2015-01-01 01:00:00     20.5       3.56
 3 2015-01-01 02:00:00     20.5       1.86
 4 2015-01-01 03:00:00     21.4       1.21
 5 2015-01-01 04:00:00     21.8       1.11
 6 2015-01-01 05:00:00     22.4       1.9 
 7 2015-01-01 06:00:00     22.7       2.52
 8 2015-01-01 07:00:00     22.7       3.36
 9 2015-01-01 08:00:00     22.4       4.13
10 2015-01-01 09:00:00     22.6       4.25
# ℹ 14 more rows

We want to create a new column, temp_fah, with the temperature in degrees Fahrenheit. The formula is \(Fahrenheit = (Celsius * 9 / 5) + 32\).

To begin, we first load the dplyr package which contains mutate(). Then, we create temp_fah based on temp_cel using the formula.

library(dplyr)
weather_ %>%
  mutate(temp_fah = (temp_cel * 9 / 5) + 32)
# A tibble: 24 × 4
   timestamp           temp_cel wind_speed temp_fah
   <dttm>                 <dbl>      <dbl>    <dbl>
 1 2015-01-01 00:00:00     19.6       4.29     67.2
 2 2015-01-01 01:00:00     20.5       3.56     68.8
 3 2015-01-01 02:00:00     20.5       1.86     68.8
 4 2015-01-01 03:00:00     21.4       1.21     70.5
 5 2015-01-01 04:00:00     21.8       1.11     71.3
 6 2015-01-01 05:00:00     22.4       1.9      72.4
 7 2015-01-01 06:00:00     22.7       2.52     72.9
 8 2015-01-01 07:00:00     22.7       3.36     72.8
 9 2015-01-01 08:00:00     22.4       4.13     72.2
10 2015-01-01 09:00:00     22.6       4.25     72.6
# ℹ 14 more rows

We see here temp_fah is added to the end of the data frame. This is because mutate() automatically adds the new column to the end. However, this can be altered with the mutate() arguments .before or .after. See ?mutate for more details.

Lastly, we save the dataset modified by mutate() to a new object so it can be used later.

weather_edits <- 
  weather_ %>%
    mutate(temp_fah = (temp_cel * 9/5) + 32)

mutate() and SAS

mutate() is the equivalent of SAS’s DATA step with SET:

In SAS:

DATA new_data;
  SET data;
  new_col = x + 5;
run;

In R:

new_data <- 
  data %>% 
    mutate(new_col = x + 5)