read_excel("my/file.xlsx")Read an Excel file (.xls, .xlsx)
You want to read a Microsoft Excel file into your R session, where you can manipulate its contents. The file has the extension .xls or .xlsx.
Step 1 - Call readxl::read_excel(). read_excel() is designed to read in .xls or .xlsx files.
Step 2 - Give read_excel() the filepath to your file as a character string. For example:
Step 3 - Specify the sheet you want to read in with the sheet argument. By default, read_excel() reads in the first sheet of an Excel file. You can set sheet to the name of a different sheet (as a character string) or the location of a different sheet (as a number).
read_excel("my/file.xlsx", sheet = "Sheet_B")Step 4 - Save the output to an object, so you can access it later.
xl_table <- read_excel("my/file.xlsx", sheet = "Sheet_B")Example
We want to read in an ecology dataset which is an Excel file containing data about earthquakes. We have saved the dataset on our computer at /Users/bcullen/Desktop/rstudio/academySassyRecipes/recipes/basics/data/ecology.xlsx.
We begin by loading the readxl package which contains read_excel(). Next, we pass read_excel() the filepath for our Excel file in order to read in the dataset. The file is in our working directory, therefore we do not need the entire filepath. We only need the portion that goes from our working directory to the file. The output looks like this:
library(readxl)
read_excel("data/ecology.xlsx")# A tibble: 150 × 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <chr>
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
# ℹ 140 more rows
Looking at the beginning of the data frame, we realize read_excel() automatically read in the first sheet of the file. We are looking for data on earthquakes, not flowers. However, we cannot remember what the sheet was named. We can call readxl::excel_sheets() on the filepath to see the names of the sheets in the “ecology” file.
excel_sheets(path = "data/ecology.xlsx")[1] "iris" "mtcars" "chickwts" "quakes"
Now we can see that we wanted to read in “quakes” on Sheet 4 from the file. We can do this by specifying the sheet argument in read_excel().
read_excel(path = "data/ecology.xlsx", sheet = 4)# A tibble: 1,000 × 5
lat long depth mag stations
<dbl> <dbl> <dbl> <dbl> <dbl>
1 -20.4 182. 562 4.8 41
2 -20.6 181. 650 4.2 15
3 -26 184. 42 5.4 43
4 -18.0 182. 626 4.1 19
5 -20.4 182. 649 4 11
6 -19.7 184. 195 4 12
7 -11.7 166. 82 4.8 43
8 -28.1 182. 194 4.4 15
9 -28.7 182. 211 4.7 35
10 -17.5 180. 622 4.3 19
# ℹ 990 more rows
Lastly, we assign the earthquake data frame read in by read_excel() to an object so we can access it later.
earthquake <- read_excel(path = "data/ecology.xlsx", sheet = 4)read_excel() comes with many arguments that you can use to customize which parts of the spreadsheet will be read in and how. Here are a few of the most useful:
| Argument | Description |
|---|---|
col_names |
Should the first row be read in as column names? Defaults to TRUE. Can also be a character vector of column names. |
col_types |
Explicitly set the data type for each column. |
skip |
Number of rows to skip before reading any data. |
range |
Specify a subset of cells to read in. |
Read the help page at ?read_excel to learn more.
read_excel() and SAS
read_excel() is the equivalent of SAS’s IMPORT procedure with the DBMS = EXCEL.
In SAS, you must specify the complete path, filename, and extension for DATAFILE, whereas when using read_excel() the path can be specified based on the working directory.
In SAS:
PROC IMPORT DATAFILE = 'filename' OUT = xl_table
DBMS = EXCEL REPLACE SHEET = 'Sheet_B';
RUN;In R:
xl_table <- read_excel("my/file.xlsx", sheet = "Sheet_B")