pull(): Accessing a column in a piped data frame

The us_murder_rate object defined above represents just one number. Yet we are storing it in a data frame:

class(us_murder_rate)
#> [1] "data.frame"

since, like most dplyr functions, summarize always returns a data frame.

This can be a problem if we want to pass the result to functions that require a numeric value. Here we show a useful trick for accessing values stored in a data frame when using pipes: when a data frame is piped, its columns can be extracted with the pull function. To see what we mean, take a look at this line of code:

us_murder_rate %>% pull(rate)
#> [1] 3.03

This returns the value in the rate column of us_murder_rate, making it equivalent to us_murder_rate$rate.
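The equivalence described above can be checked directly. The following is a minimal sketch that assumes a small made-up table named toy (hypothetical, not part of dslabs) in place of murders; the names toy_rate, via_pull, via_dollar and via_bracket are likewise only illustrative.

```r
library(dplyr)

# Hypothetical toy table standing in for the murders data (values are made up)
toy <- data.frame(
  state = c("A", "B", "C"),
  total = c(10, 20, 30),
  population = c(1e6, 2e6, 3e6)
)

# summarize() returns a one-row data frame with a single column named rate
toy_rate <- toy %>%
  summarize(rate = sum(total) / sum(population) * 100000)

# Three ways to extract the rate column as a plain vector
via_pull    <- toy_rate %>% pull(rate)
via_dollar  <- toy_rate$rate
via_bracket <- toy_rate[["rate"]]

# All three hold the same numeric vector of length one
identical(via_pull, via_dollar)
identical(via_pull, via_bracket)
```

The practical difference is stylistic: pull sits naturally at the end of a pipeline, while $ and [[ require the data frame to be stored under a name first.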

To get a number from the original data table with one line of code, we can type:

us_murder_rate <- murders %>% 
  summarize(rate = sum(total) / sum(population) * 100000) %>%
  pull(rate)

us_murder_rate
#> [1] 3.03

which is now a numeric:

class(us_murder_rate)
#> [1] "numeric"
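To illustrate why a plain numeric can be more convenient than a one-row data frame, here is a short sketch. It assumes a made-up table named toy (hypothetical values, not the dslabs murders data) and uses the pulled overall rate as a threshold inside filter; rate_df, rate_num and above are illustrative names.

```r
library(dplyr)

# Hypothetical toy table (values are made up for illustration)
toy <- data.frame(
  state = c("A", "B", "C"),
  total = c(10, 20, 30),
  population = c(1e6, 2e6, 4e6)
)

# Overall rate kept as a data frame, and the same value pulled out as a numeric
rate_df  <- toy %>% summarize(rate = sum(total) / sum(population) * 100000)
rate_num <- rate_df %>% pull(rate)

is.numeric(rate_df)   # FALSE: a data frame is not a numeric vector
is.numeric(rate_num)  # TRUE

# The numeric value can be used directly as a threshold
above <- toy %>%
  mutate(rate = total / population * 100000) %>%
  filter(rate > rate_num)

above
```

Because rate_num is an ordinary numeric vector of length one, it can be used in comparisons, arithmetic, and function arguments without any further unwrapping.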

Instruction

Run the sample code to see how the pull() function works.


library(dplyr)
library(dslabs)
data(heights)
data(murders)

murders <- murders %>% mutate(rate = total/population*100000)
us_murder_rate <- murders %>%
  summarize(rate = sum(total) / sum(population) * 100000)

# print us_murder_rate
us_murder_rate

# pull rate column from the us_murder_rate
us_murder_rate %>% pull(rate)
