top_n(): see the top n rows

In the code above, we have used the function head to avoid having the page fill up with the entire dataset. If we want to see a larger proportion, we can use the top_n function. This function takes a data frame as its first argument, the number of rows to show as its second, and the variable to filter by as its third. Here is an example of how to see the 10 rows with the highest rate:

murders %>% top_n(10, rate)
#>                   state abb        region population total  rate
#> 1               Arizona  AZ          West    6392017   232  3.63
#> 2              Delaware  DE         South     897934    38  4.23
#> 3  District of Columbia  DC         South     601723    99 16.45
#> 4               Georgia  GA         South    9920000   376  3.79
#> 5             Louisiana  LA         South    4533372   351  7.74
#> 6              Maryland  MD         South    5773552   293  5.07
#> 7              Michigan  MI North Central    9883640   413  4.18
#> 8           Mississippi  MS         South    2967297   120  4.04
#> 9              Missouri  MO North Central    5988927   321  5.36
#> 10       South Carolina  SC         South    4625364   207  4.48

Note that the rows are not sorted by rate, only filtered. If we want to sort them, we need to use arrange. Note also that if the third argument is left blank, top_n filters by the last column of the data frame.
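As a minimal sketch of the two notes above, the code below first pipes the result of top_n into arrange to sort the filtered rows, then calls top_n without a third argument. It assumes the murders data from dslabs with the rate column defined as elsewhere in this section; the names top_sorted and top_default are only illustrative.

```r
library(dplyr)
library(dslabs)
data(murders)

# Recreate the rate column used in this section (murders per 100,000 people)
murders <- murders %>% mutate(rate = total / population * 100000)

# Keep the 10 rows with the highest rate, then sort them from highest to lowest
top_sorted <- murders %>%
  top_n(10, rate) %>%
  arrange(desc(rate))
top_sorted

# With the third argument left out, top_n() falls back to the last column,
# which is rate here because mutate() appended it at the end
top_default <- murders %>% top_n(10)
top_default
```

Both calls should select the same ten states; only the first returns them in order of rate. In recent dplyr releases top_n is marked as superseded by slice_max, so it may be worth checking the documentation for the installed version.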

Instruction

Run the sample code to see how the top_n() function works.

library(dplyr)
library(dslabs)
data(heights)
data(murders)
murders <- murders %>% mutate(rate = total/population*100000)

# select the top 10 rows by rate
murders %>% top_n(10, rate)
