dplyr:: dplyr package

The dplyr package from the tidyverse introduces functions that perform some of the most common operations when working with data frames and uses names for these functions that are relatively easy to remember. For instance, to change the data table by adding a new column, we use mutate. To filter the data table to a subset of rows, we use filter. Finally, to subset the data by selecting specific columns, we use select.
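As a minimal sketch of these three verbs, the example below applies each one in turn to a small made-up data frame. The `scores` table, its column names, and the 0.7 cutoff are illustrative assumptions, not part of the course data.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
scores <- data.frame(
  name    = c("Ana", "Ben", "Cho", "Dev"),
  correct = c(18, 12, 20, 9),
  total   = c(20, 20, 25, 15)
)

# mutate(): add a new column computed from existing columns
scores <- mutate(scores, rate = correct / total)

# filter(): keep only the rows that satisfy a condition
high <- filter(scores, rate > 0.7)

# select(): keep only the named columns
high_names <- select(high, name, rate)

print(high_names)
```

Each verb takes a data frame as its first argument and returns a new data frame, so the result of one step can be passed directly to the next.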

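dplyr is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges: mutate() adds new variables that are functions of existing variables, select() picks variables based on their names, filter() picks cases based on their values, summarise() reduces multiple values down to a single summary, and arrange() changes the ordering of the rows.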

These all combine naturally with group_by() which allows you to perform any operation “by group”. You can learn more about them in vignette("dplyr"). As well as these single-table verbs, dplyr also provides a variety of two-table verbs, which you can learn about in vignette("two-table").
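To illustrate what "by group" means in practice, here is a short sketch that groups a made-up table, summarises each group, and then uses one two-table verb, left_join(). The `sales` and `managers` tables and all of their values are hypothetical and exist only for this example.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
sales <- data.frame(
  region = c("East", "East", "West", "West", "West"),
  amount = c(100, 150, 80, 120, 100)
)

# group_by() marks the groups; it does not change the rows themselves
grouped <- group_by(sales, region)

# summarise() then returns one row per group
by_region <- summarise(grouped, n = n(), mean_amount = mean(amount))

# arrange() sorts the summary, here from largest to smallest mean
by_region <- arrange(by_region, desc(mean_amount))

# A two-table verb: left_join() adds matching columns from a second table
managers <- data.frame(
  region  = c("East", "West"),
  manager = c("Kim", "Lee")
)
with_manager <- left_join(by_region, managers, by = "region")

print(with_manager)
```

The same summarise() call on the ungrouped `sales` table would return a single row for the whole table, which is why group_by() is applied first.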

dplyr is designed to abstract over how the data is stored. That means as well as working with local data frames, you can also work with remote database tables, using exactly the same R code. Install the dbplyr package then read vignette("databases", package = "dbplyr").
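The following sketch shows the idea with an in-memory SQLite database. It assumes the DBI, RSQLite, and dbplyr packages are installed in addition to dplyr. The `sales` table and its values are made up for illustration.

```r
library(dplyr)

# Assumption: the DBI, RSQLite, and dbplyr packages are installed
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical toy data copied into the database as a table named "sales"
sales <- data.frame(
  region = c("East", "East", "West"),
  amount = c(100, 150, 80)
)
copy_to(con, sales, "sales")

# tbl() creates a reference to the remote table; no rows are pulled yet
remote_sales <- tbl(con, "sales")

# The same verbs used on local data frames are translated to SQL
big_sales <- filter(remote_sales, amount > 90)
big_sales <- select(big_sales, region, amount)

# collect() runs the query and returns the result as a local data frame
local_result <- collect(big_sales)
print(local_result)

DBI::dbDisconnect(con)
```

The filter() and select() calls are written exactly as they would be for a local data frame. Only tbl() and collect() are specific to the database workflow.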

If you are new to dplyr, the best place to start is the data transformation chapter in R for Data Science.

Instruction

Copy and run the sample code in RStudio to install the dplyr package.

# NOT RUN
# PLEASE COPY THE CODE INTO RSTUDIO.

# Install the dplyr package
install.packages("dplyr")

# Look up the functions in the 'dplyr' package
library(help = dplyr)
