Visualize data with a scatterplot

You want to create a scatterplot to visualize the relationship between two continuous variables in your data frame.

Step 1 - Pass your data to ggplot2::ggplot(). ggplot() creates a blank canvas for your plot.

Step 2 - Set the \(x\) and \(y\) variables with mapping = aes(x = , y = ). ggplot() will use these variables to create a coordinate system.

Step 3 - Add a layer of points with ggplot2::geom_point(). geom_point() will draw a point for each row in your data frame.

Step 4 (Optional) - Use mapping = aes() inside geom_point() to add additional variables. Consider mapping these variables to the color, shape, size, or alpha (transparency) of your points.

ggplot(data, mapping = aes(x = col_A, y = col_B)) +
  geom_point(mapping = aes(color = col_C))

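Be sure to place a + at the end of each line except the last to connect ggplot2 plot elements. R treats a line that ends in + as unfinished, so it reads the next line as part of the same plot.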
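The four steps can be sketched one at a time. This is a minimal illustration, not the only way to build the plot. It assumes mpg, a sample data set bundled with ggplot2, as a stand-in for your own data frame; the object names p_axes, p_points, and p_color are made up for the example.

```r
library(ggplot2)

# Steps 1 and 2: pass the data and map two continuous variables to x and y.
# mpg is a sample data set that ships with ggplot2 (an assumption of this
# sketch; substitute your own data frame and column names).
p_axes <- ggplot(mpg, mapping = aes(x = displ, y = hwy))

# Step 3: add a layer of points, one per row of mpg.
p_points <- p_axes + geom_point()

# Step 4 (optional): map a third variable to the color of the points.
p_color <- p_axes + geom_point(mapping = aes(color = class))

# Printing a plot object draws it.
p_color
```

Saving the partial plot in an object is optional. It only makes it easier to see what each step contributes.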

Example

uber contains hourly summaries of Uber rideshare services in regions of Boston, Massachusetts.

uber
# A tibble: 864 × 5
   source_location   provider_service  hour price_mean distance_mean
   <chr>             <chr>            <int>      <dbl>         <dbl>
 1 Back Bay          UberPool             0       9.38          2.65
 2 Back Bay          UberX                0      10.9           2.53
 3 Back Bay          WAV                  0      10.6           2.48
 4 Beacon Hill       UberPool             0       8.47          2.03
 5 Beacon Hill       UberX                0      10.3           2.16
 6 Beacon Hill       WAV                  0       9.53          2.00
 7 Boston University UberPool             0       8.85          2.92
 8 Boston University UberX                0      10.2           2.71
 9 Boston University WAV                  0      12             3.35
10 Fenway            UberPool             0       9.95          2.81
# ℹ 854 more rows

We know that the price of a ride depends on its distance, but we also suspect that different services charge different rates. To explore this three-way relationship, we first make a scatterplot of price vs. distance. We then color the points by provider service.

library(ggplot2)
ggplot(uber, mapping = aes(x = distance_mean, y = price_mean)) +
  geom_point(mapping = aes(color = provider_service))
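If you do not have the uber data at hand, you can try the same code on a small stand-in. The sketch below rebuilds only the ten rows printed above, so the resulting plot shows ten points rather than all 864. The name uber_head is made up for this example.

```r
library(ggplot2)

# Rebuild the ten rows shown in the printout above (not the full data set).
uber_head <- data.frame(
  source_location = rep(
    c("Back Bay", "Beacon Hill", "Boston University", "Fenway"),
    times = c(3, 3, 3, 1)
  ),
  provider_service = c(rep(c("UberPool", "UberX", "WAV"), 3), "UberPool"),
  hour = 0L,
  price_mean = c(9.38, 10.9, 10.6, 8.47, 10.3, 9.53, 8.85, 10.2, 12, 9.95),
  distance_mean = c(2.65, 2.53, 2.48, 2.03, 2.16, 2.00, 2.92, 2.71, 3.35, 2.81)
)

# Same plot as above, drawn from the ten-row stand-in.
p_uber <- ggplot(uber_head, mapping = aes(x = distance_mean, y = price_mean)) +
  geom_point(mapping = aes(color = provider_service))

p_uber
```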

Add a trend line

Use geom_smooth() to overlay a trend line on a scatterplot, e.g.

ggplot(data, mapping = aes(x = col_A, y = col_B)) +
  geom_point() +
  geom_smooth()
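By default geom_smooth() fits a smooth curve and draws a confidence band around it. The sketch below shows two common variations: a straight line fitted with method = "lm" and no band (se = FALSE), and one line per group. It assumes mpg, a sample data set bundled with ggplot2, in place of your own data; p_lm and p_grouped are names made up for the example.

```r
library(ggplot2)

# A straight-line trend without the confidence band.
p_lm <- ggplot(mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# One trend line per group: a mapping set in ggplot() is inherited by every
# layer, so both the points and the lines are colored by drv.
p_grouped <- ggplot(mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

p_grouped
```

If you set color inside geom_point() instead, only the points are colored and geom_smooth() draws a single line through all of the data.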

Scatterplots in SAS

ggplot() with geom_point() is the equivalent of SAS’s SGPLOT procedure with the SCATTER statement:

In SAS:

PROC SGPLOT DATA = data_plot; 
  SCATTER X = col_1 Y = col_2;
RUN;

In R:

ggplot(data_plot, mapping = aes(x = col_1, y = col_2)) +
  geom_point()