The dplyr package from the tidyverse provides functions for some of the most common operations on data frames, and gives these functions names that are relatively easy to remember. For instance, to change the data table by adding a new column, we use mutate. To filter the data table to a subset of rows, we use filter. Finally, to subset the data by selecting specific columns, we use select.
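As a minimal sketch of these three functions, the example below uses R's built-in mtcars data set. The column name kpl and the 10 km/L threshold are illustrative choices made for this example, not part of dplyr.

```r
library(dplyr)

# Start from the built-in mtcars data and keep the car names as a regular column.
cars <- mtcars
cars$model <- rownames(mtcars)

# mutate(): add a new column computed from an existing one
# (miles per US gallon converted to kilometres per litre).
with_kpl <- mutate(cars, kpl = mpg * 0.425144)

# filter(): keep only the rows that satisfy a condition.
efficient <- filter(with_kpl, kpl > 10)

# select(): keep only the named columns.
result <- select(efficient, model, mpg, kpl)

print(result)
```

Each function takes a data frame as its first argument and returns a new data frame, so the original cars object is left unchanged.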
dplyr is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges:
- mutate() adds new variables that are functions of existing variables.
- select() picks variables based on their names.
- filter() picks cases based on their values.
- summarise() reduces multiple values down to a single summary.
- arrange() changes the ordering of the rows.
These all combine naturally with group_by(), which allows you to perform any operation “by group”. You can learn more about them in vignette("dplyr"). As well as these single-table verbs, dplyr also provides a variety of two-table verbs, which you can learn about in vignette("two-table").
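To illustrate how the verbs combine with group_by(), here is a short sketch on the built-in mtcars data set. The summary column names n_cars and mean_mpg are chosen for this example only.

```r
library(dplyr)

# Group the cars by number of cylinders, summarise each group,
# then order the groups from most to least fuel-efficient.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    n_cars   = n(),        # number of rows in each group
    mean_mpg = mean(mpg)   # average miles per gallon in each group
  ) %>%
  arrange(desc(mean_mpg))

print(by_cyl)
```

Because summarise() reduces each group to a single row, the result has one row per distinct value of cyl.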
dplyr is designed to abstract over how the data is stored. That means as well as working with local data frames, you can also work with remote database tables, using exactly the same R code. Install the dbplyr package, then read vignette("databases", package = "dbplyr").
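The following sketch shows the same verbs running against a database table instead of a local data frame. It assumes the dbplyr and RSQLite packages are installed, and it uses a temporary in-memory SQLite database as a stand-in for a real remote database. The table contents and variable names are illustrative.

```r
library(dplyr)
library(dbplyr)

# memdb_frame() copies the data into a temporary in-memory SQLite table
# and returns a reference to it (assumes RSQLite is installed).
remote_cars <- memdb_frame(
  model = rownames(mtcars),
  mpg   = mtcars$mpg,
  cyl   = mtcars$cyl
)

# The same dplyr verbs as for a local data frame; nothing is computed yet.
query <- remote_cars %>%
  filter(cyl == 4) %>%
  select(model, mpg) %>%
  arrange(desc(mpg))

# Show the SQL that dplyr generated for the database.
show_query(query)

# collect() runs the query and brings the result back as a local data frame.
local_result <- collect(query)
print(local_result)
```

The pipeline is only translated to SQL and executed when the result is requested, which is why collect() is needed to obtain a local data frame.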
If you are new to dplyr, the best place to start is the data transformation chapter in R for Data Science.
Copy the sample code below and run it in RStudio to install the dplyr package.
# NOT RUN
# Copy this code into RStudio and run it there.
# Install the dplyr package (only needed once).
install.packages("dplyr")
# List the functions and help topics in the 'dplyr' package.
library(help = "dplyr")