When you first get a data set, you will often notice that it contains factors with specific factor levels. However, sometimes you will want to change the names of these levels for clarity or other reasons. R allows you to do this with the function levels():
levels(factor_vector) <- c("name1", "name2",...)
A good illustration is the raw data that is provided to you by a survey. A common question for every questionnaire is the sex of the respondent. Here, for simplicity, just two categories were recorded, "M" and "F". (You usually need more categories for survey data; either way, you use a factor to store the categorical data.)
survey_vector <- c("M", "F", "F", "M", "M")
Recording the sex with the abbreviations "M" and "F" can be convenient if you are collecting data with pen and paper, but it can introduce confusion when analyzing the data. At that point, you will often want to change the factor levels to "Male" and "Female" instead of "M" and "F" for clarity.
Watch out: the order in which you assign the levels is important. If you type levels(factor_survey_vector), you'll see that it outputs [1] "F" "M". If you don't specify the levels of the factor when creating the vector, R will automatically assign them alphabetically. The new names are matched to the existing levels by position, so to correctly map "F" to "Female" and "M" to "Male", the levels should be set to c("Female", "Male"), in this order.
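To make the effect of the ordering concrete, here is a small sketch that reuses the survey_vector from this exercise. The names wrong, right, and by_name are illustrative only and are not part of the exercise. The last step uses the named-list form of levels<-, which base R accepts for factors and which pairs each new name with the old label explicitly; it is shown as an alternative, not as the approach this exercise asks for.

```r
# Build the factor as in the exercise; the levels default to sorted order.
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)
levels(factor_survey_vector)   # "F" "M"

# Wrong order: the first level ("F") is renamed "Male", the second ("M") "Female".
wrong <- factor_survey_vector
levels(wrong) <- c("Male", "Female")
as.character(wrong)            # "Female" "Male" "Male" "Female" "Female"

# Right order: new names line up with the existing levels "F", "M".
right <- factor_survey_vector
levels(right) <- c("Female", "Male")
as.character(right)            # "Male" "Female" "Female" "Male" "Male"

# Alternative: a named list maps new names to old labels explicitly,
# so the result does not depend on remembering the alphabetical order.
by_name <- factor_survey_vector
levels(by_name) <- list(Female = "F", Male = "M")
as.character(by_name)          # "Male" "Female" "Female" "Male" "Male"
```

Comparing wrong and right shows that a swapped vector silently relabels every respondent, without any warning from R.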
Check out the code that builds a factor vector from survey_vector. You should use factor_survey_vector in the next instruction.
Change the factor levels of factor_survey_vector to c("Female", "Male"). Mind the order of the vector elements here.
# Code to build factor_survey_vector
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)
# Specify the levels of factor_survey_vector
levels(factor_survey_vector) <-
factor_survey_vector
# Code to build factor_survey_vector
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)
# Specify the levels of factor_survey_vector
levels(factor_survey_vector) <- c("Female", "Male")
factor_survey_vector
msg = "Do not change the definition of `survey_vector`!"
test_object("survey_vector", undefined_msg = msg, incorrect_msg = msg)
msg = "Do not change or remove the code to create the factor vector."
test_function("factor", "x", not_called_msg = msg, incorrect_msg = msg)
# MC-note: ideally would want to test assign operator `<-`, and have it highlight whole line.
# MC-note: or negate this test_student_typed, to highlight where they type this incorrect phrase
# test_student_typed('c("Male", "Female")')
test_object("factor_survey_vector", eq_condition = "equal",
incorrect_msg = paste("Did you assign the correct factor levels to `factor_survey_vector`? Use `levels(factor_survey_vector) <- c(\"Female\", \"Male\")`. Remember that R is case sensitive!"))
success_msg("Wonderful! Proceed to the next exercise.")
Mind the order in which you have to type in the factor levels. Hint: look at the order in which the levels are printed when typing levels(factor_survey_vector).