In this section we introduce the main tidyverse data importing functions. We will use the murders.csv file provided by the dslabs package as an example. To simplify the illustration, we will copy the file to our working directory using the following code:
filename <- "murders.csv"
dir <- system.file("extdata", package = "dslabs")
fullpath <- file.path(dir, filename)
file.copy(fullpath, "murders.csv")
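To confirm that a copy like the one above succeeded, we can inspect the value returned by file.copy. The following is a minimal sketch that uses temporary stand-in files (the names src and dest are hypothetical) so it runs even without dslabs installed. It relies on file.copy returning TRUE on success and, by default, refusing to overwrite an existing file.

```r
# Hypothetical stand-in for the dslabs file, built from the header and
# first row of murders.csv, so the sketch does not depend on dslabs
src <- tempfile(fileext = ".csv")
writeLines(c("state,abb,region,population,total",
             "Alabama,AL,South,4779736,135"), src)

# A destination path that does not exist yet
dest <- tempfile(fileext = ".csv")

copied <- file.copy(src, dest)        # TRUE: the copy succeeded
copied_again <- file.copy(src, dest)  # FALSE: dest exists and overwrite = FALSE by default

file.exists(dest)
```

If a fresh copy is needed, passing overwrite = TRUE to file.copy replaces the existing file.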
The readr package includes functions for reading data stored in text file spreadsheets into R. readr is loaded as part of the tidyverse, or you can load it directly:
library(readr)
The following functions are available to read in spreadsheets:
Function | Format | Typical suffix |
---|---|---|
read_table | white space separated values | txt |
read_csv | comma separated values | csv |
read_csv2 | semicolon separated values | csv |
read_tsv | tab separated values | tsv |
read_delim | general text file format, must define delimiter | txt |
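Of the functions in the table, read_delim is the only one that requires us to state the delimiter. As a brief sketch, the code below writes a small semicolon separated file (a hypothetical sample built from the first rows of murders.csv) to a temporary location and reads it with read_delim. Note that read_csv2 would also split on semicolons, but it additionally treats the comma as the decimal mark.

```r
library(readr)

# Hypothetical semicolon separated sample based on the first rows of murders.csv
tmp <- tempfile(fileext = ".txt")
writeLines(c("state;abb;region;population;total",
             "Alabama;AL;South;4779736;135",
             "Alaska;AK;West;710231;19"), tmp)

# read_delim does not assume a delimiter, so we pass it explicitly
dat_delim <- read_delim(tmp, delim = ";")
dat_delim
```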
Although the suffix usually tells us what type of file it is, there is no guarantee that these always match. We can open the file to take a look or use the function read_lines to look at a few lines:
read_lines("murders.csv", n_max = 3)
#> [1] "state,abb,region,population,total" "Alabama,AL,South,4779736,135"
#> [3] "Alaska,AK,West,710231,19"
This also shows that there is a header. Now we are ready to read the data into R. From the .csv suffix and the peek at the file, we know to use read_csv:
dat <- read_csv(filename)
#> Parsed with column specification:
#> cols(
#> state = col_character(),
#> abb = col_character(),
#> region = col_character(),
#> population = col_integer(),
#> total = col_integer()
#> )
Note that we receive a message letting us know what data types were used for each column. Also note that dat is a tibble, not just a data frame. This is because read_csv is a tidyverse parser. We can confirm that the data has in fact been read in with the content in the file:
head(dat)
#> # A tibble: 6 x 5
#> state abb region population total
#> <chr> <chr> <chr> <int> <int>
#> 1 Alabama AL South 4779736 135
#> 2 Alaska AK West 710231 19
#> 3 Arizona AZ West 6392017 232
#> 4 Arkansas AR South 2915918 93
#> 5 California CA West 37253956 1257
#> 6 Colorado CO West 5029196 65
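The claim that read_csv returns a tibble and not just a data frame can be checked directly. This is a small sketch using a hypothetical temporary file with the first rows of murders.csv; the object name dat_small is ours.

```r
library(readr)

# Hypothetical sample file containing the header and first two rows of murders.csv
tmp <- tempfile(fileext = ".csv")
writeLines(c("state,abb,region,population,total",
             "Alabama,AL,South,4779736,135",
             "Alaska,AK,West,710231,19"), tmp)

dat_small <- read_csv(tmp)

class(dat_small)               # lists the tibble classes along with "data.frame"
is.data.frame(dat_small)       # a tibble is still a data frame
inherits(dat_small, "tbl_df")  # "tbl_df" is the class that marks a tibble
```

Because a tibble inherits from data.frame, functions that expect a data frame will generally accept it.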
Finally, note that we can also use the full path for the file:
dat <- read_csv(fullpath)
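The column specification message shown earlier reflects types that readr guessed. If we prefer not to rely on guessing, we can pass the specification ourselves through the col_types argument. The sketch below assumes a hypothetical temporary file with the first rows of murders.csv. Recent readr releases, to our knowledge, guess numeric columns as double rather than integer, so requesting col_integer() explicitly is one way to get the integer columns shown in the message above.

```r
library(readr)

# Hypothetical sample file containing the header and first two rows of murders.csv
tmp <- tempfile(fileext = ".csv")
writeLines(c("state,abb,region,population,total",
             "Alabama,AL,South,4779736,135",
             "Alaska,AK,West,710231,19"), tmp)

# Declare the type of every column instead of letting readr guess
dat_typed <- read_csv(tmp, col_types = cols(
  state = col_character(),
  abb = col_character(),
  region = col_character(),
  population = col_integer(),
  total = col_integer()
))

str(dat_typed$population)
```

Supplying col_types also means the column specification message is not printed.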
Copy and run the sample code in RStudio to see how the readr functions work.
# load readr package
library(readr)
# read three lines
read_lines("murders.csv", n_max = 3)
# read data file
dat <- read_csv("murders.csv")