In the code above, we used the function head to avoid having the page fill up with the entire dataset. If we want to see a larger proportion, we can use the top_n function. This function takes a data frame as its first argument, the number of rows to show as its second, and the variable to filter by as its third. Here is an example that shows the 10 rows with the highest rate:
murders %>% top_n(10, rate)
#> state abb region population total rate
#> 1 Arizona AZ West 6392017 232 3.63
#> 2 Delaware DE South 897934 38 4.23
#> 3 District of Columbia DC South 601723 99 16.45
#> 4 Georgia GA South 9920000 376 3.79
#> 5 Louisiana LA South 4533372 351 7.74
#> 6 Maryland MD South 5773552 293 5.07
#> 7 Michigan MI North Central 9883640 413 4.18
#> 8 Mississippi MS South 2967297 120 4.04
#> 9 Missouri MO North Central 5988927 321 5.36
#> 10 South Carolina SC South 4625364 207 4.48
Note that the rows are not sorted by rate, only filtered. If we want to sort, we need to use arrange. Note also that if the third argument is left blank, top_n filters by the last column.
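A minimal sketch of these two points, assuming only that dplyr is installed. The small data frame is illustrative scaffolding: its five rows are copied from the output shown earlier, and the names toy, top3, top3_sorted, and top3_default are made up for this example.

```r
library(dplyr)

# Toy data frame: five rows copied from the murders output shown earlier
toy <- data.frame(
  state = c("Delaware", "Georgia", "Louisiana", "Maryland", "Missouri"),
  total = c(38, 376, 351, 293, 321),
  rate  = c(4.23, 3.79, 7.74, 5.07, 5.36)
)

# top_n() keeps the 3 rows with the highest rate,
# but leaves them in their original (alphabetical) order
top3 <- toy %>% top_n(3, rate)

# arrange(desc(rate)) sorts those rows from highest to lowest rate
top3_sorted <- top3 %>% arrange(desc(rate))

# With the third argument left blank, top_n() filters by the last column (rate here)
top3_default <- toy %>% top_n(3)
```

Printing top3 and top3_sorted side by side should show the same three states in different orders, and top3_default should match top3 because rate is the last column of toy.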
Run the sample code to see how the top_n() function works.
library(dplyr)
library(dslabs)
data(heights)
data(murders)
murders <- murders %>% mutate(rate = total/population*100000)
# select the 10 rows with the highest rate
murders %>% top_n(10, rate)