read.csv()

In this section we introduce the main tidyverse data importing functions. We will use the murders.csv file provided by the dslabs package as an example. To simplify the illustration we will copy the file to our working directory using the following code:

filename <- "murders.csv"
dir <- system.file("extdata", package = "dslabs") 
fullpath <- file.path(dir, filename)
file.copy(fullpath, "murders.csv")
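
file.copy reports success by returning TRUE or FALSE rather than stopping with an error, and with its default settings it does not overwrite an existing file. The following is a minimal sketch, using only base R and assuming the dslabs package is installed, of how one might confirm that the file really is in the working directory. The variable name copied is introduced here for illustration.

```r
# Locate the example file shipped with dslabs (assumes dslabs is installed)
fullpath <- file.path(system.file("extdata", package = "dslabs"), "murders.csv")

# file.copy() returns TRUE on success and FALSE otherwise;
# with the default overwrite = FALSE it returns FALSE if the file already exists
copied <- file.copy(fullpath, "murders.csv")

# Either way, check whether the file is now in the working directory
file.exists("murders.csv")
list.files(pattern = "murders")
```

If file.exists returns FALSE, the copy did not happen and the later calls that read "murders.csv" will fail.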

6.2.1 readr

The readr package includes functions for reading data stored in text file spreadsheets into R. readr is part of the tidyverse, so it is loaded by library(tidyverse), or you can load it directly:

library(readr)

The following functions are available for reading in spreadsheets:

Function     Format                                            Typical suffix
read_table   white space separated values                      txt
read_csv     comma separated values                            csv
read_csv2    semicolon separated values                        csv
read_tsv     tab separated values                              tsv
read_delim   general text file format, must define delimiter   txt
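
To illustrate the difference between the general-purpose read_delim and the specialized readers, here is a small sketch. The temporary file and its semicolon-separated contents are made up for this illustration (they reuse the first rows of murders.csv), and the names tmp, dat_delim and dat_csv2 are hypothetical.

```r
library(readr)

# Hypothetical example file: the first rows of murders.csv,
# rewritten with ";" as the delimiter
tmp <- tempfile(fileext = ".txt")
writeLines(c("state;abb;region;population;total",
             "Alabama;AL;South;4779736;135",
             "Alaska;AK;West;710231;19"), tmp)

# read_delim() needs the delimiter stated explicitly
dat_delim <- read_delim(tmp, delim = ";")

# read_csv2() assumes ";" as the delimiter (and "," as the decimal mark)
dat_csv2 <- read_csv2(tmp)

dim(dat_delim)
names(dat_delim)
```

Both calls should produce a tibble with the same five columns; read_delim is simply the version in which the delimiter is left for us to specify.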

Although the suffix usually tells us what type of file it is, there is no guarantee that these always match. We can open the file to take a look or use the function read_lines to look at a few lines:

read_lines("murders.csv", n_max = 3)
#> [1] "state,abb,region,population,total" "Alabama,AL,South,4779736,135"     
#> [3] "Alaska,AK,West,710231,19"

This also shows that there is a header. Now we are ready to read the data into R. From the .csv suffix and the peek at the file, we know to use read_csv:

dat <- read_csv(filename)
#> Parsed with column specification:
#> cols(
#>   state = col_character(),
#>   abb = col_character(),
#>   region = col_character(),
#>   population = col_integer(),
#>   total = col_integer()
#> )

Note that we receive a message letting us know what data types were used for each column. Also note that dat is a tibble, not just a data frame. This is because read_csv is a tidyverse parser. We can confirm that the data has in fact been read in with the content of the file:

head(dat)
#> # A tibble: 6 x 5
#>   state      abb   region population total
#>   <chr>      <chr> <chr>       <int> <int>
#> 1 Alabama    AL    South     4779736   135
#> 2 Alaska     AK    West       710231    19
#> 3 Arizona    AZ    West      6392017   232
#> 4 Arkansas   AR    South     2915918    93
#> 5 California CA    West     37253956  1257
#> 6 Colorado   CO    West      5029196    65
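
The column types reported in the message are guesses made by read_csv. As a sketch of one way to state them ourselves, the col_types argument accepts a cols() specification like the one printed in the message. To keep the example self-contained, it reads a small temporary stand-in file built from the lines of murders.csv shown earlier; the names tmp and dat_small are hypothetical.

```r
library(readr)

# Small stand-in for murders.csv built from the lines shown earlier
tmp <- tempfile(fileext = ".csv")
writeLines(c("state,abb,region,population,total",
             "Alabama,AL,South,4779736,135",
             "Alaska,AK,West,710231,19"), tmp)

# Declare the column types up front instead of letting read_csv guess
dat_small <- read_csv(tmp, col_types = cols(
  state = col_character(),
  abb = col_character(),
  region = col_character(),
  population = col_integer(),
  total = col_integer()
))

class(dat_small)          # includes "tbl_df", i.e. a tibble
sapply(dat_small, class)  # column classes as declared
```

When col_types is supplied, read_csv has nothing to guess, so the column specification message is not printed.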

Finally, note that we can also use the full path for the file:

dat <- read_csv(fullpath)

Please download the file murders.csv to run the code.


Instruction

Copy and run the sample code in RStudio to see how the readr functions work.

DO NOT CLICK RUN

# load readr package
library(readr)
# read three lines
read_lines("murders.csv", n_max = 3)
# read data file
filename <- "murders.csv"
dat <- read_csv(filename)
