arrange(): Sorting data frames

When examining a dataset, it is often convenient to sort the table by different columns. We already know the order and sort functions, but for sorting entire tables, the dplyr function arrange is useful. For example, here we order the states by population size:

murders %>%
  arrange(population) %>%
  head()
#>                  state abb        region population total   rate
#> 1              Wyoming  WY          West     563626     5  0.887
#> 2 District of Columbia  DC         South     601723    99 16.453
#> 3              Vermont  VT     Northeast     625741     2  0.320
#> 4         North Dakota  ND North Central     672591     4  0.595
#> 5               Alaska  AK          West     710231    19  2.675
#> 6         South Dakota  SD North Central     814180     8  0.983
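For comparison with the base R order function mentioned above, here is a minimal sketch of the same sort done both ways. It assumes the dplyr and dslabs packages are installed. The names by_order and by_arrange are illustrative only and do not appear elsewhere in this tutorial.

```r
library(dplyr)
library(dslabs)
data(murders)

# Base R: order() returns the row indices that sort the population column,
# and we use those indices to reorder the rows of the table
by_order <- murders[order(murders$population), ]

# dplyr: arrange() sorts the rows of the data frame directly
by_arrange <- murders %>% arrange(population)

# Check that both approaches list the states in the same sequence
identical(by_order$state, by_arrange$state)
```

One visible difference is that base R indexing keeps the original row names, while arrange renumbers the rows, which is why the check above compares the state column rather than the whole table.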

With arrange we get to decide which column to sort by. To see the states ordered by murder rate, from lowest to highest, we arrange by rate instead:

murders %>% 
  arrange(rate) %>% 
  head()
#>           state abb        region population total  rate
#> 1       Vermont  VT     Northeast     625741     2 0.320
#> 2 New Hampshire  NH     Northeast    1316470     5 0.380
#> 3        Hawaii  HI          West    1360301     7 0.515
#> 4  North Dakota  ND North Central     672591     4 0.595
#> 5          Iowa  IA North Central    3046355    21 0.689
#> 6         Idaho  ID          West    1567582    12 0.766

Note that arrange sorts in ascending order by default. In dplyr, the function desc transforms a vector so that it sorts in descending order. To sort the table by murder rate in descending order, we can type:

murders %>% 
  arrange(desc(rate)) %>% 
  head()
#>                  state abb        region population total  rate
#> 1 District of Columbia  DC         South     601723    99 16.45
#> 2            Louisiana  LA         South    4533372   351  7.74
#> 3             Missouri  MO North Central    5988927   321  5.36
#> 4             Maryland  MD         South    5773552   293  5.07
#> 5       South Carolina  SC         South    4625364   207  4.48
#> 6             Delaware  DE         South     897934    38  4.23
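To see the effect of desc in isolation, the following sketch applies both sort directions to a small made-up data frame. The toy table and the names asc_sorted and desc_sorted are hypothetical and are not part of the murders dataset. Only dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical toy data, used only to illustrate the sort direction
toy <- data.frame(
  id = c("a", "b", "c"),
  value = c(2, 9, 5),
  stringsAsFactors = FALSE
)

# Default behavior: ascending by value, so ids come out as a, c, b
asc_sorted <- toy %>% arrange(value)

# Wrapping the column in desc() reverses the direction: b, c, a
desc_sorted <- toy %>% arrange(desc(value))

asc_sorted
desc_sorted
```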

Instruction

Run the sample code to see how the arrange() and desc() functions work.

library(dplyr)
library(dslabs)
data(murders)

murders <- murders %>%
  mutate(rate = total / population * 100000)

us_murder_rate <- murders %>%
  summarize(rate = sum(total) / sum(population) * 100000)

# Arrange by population
murders %>%
  arrange(population) %>%
  head()

# Arrange by rate
murders %>%
  arrange(rate) %>%
  head()

# Arrange by rate in the descending direction
murders %>%
  arrange(desc(rate)) %>%
  head()
