I want to compute the FIRM importance scores for a model made from a tidymodels workflow. For regex, I will use the iris dataset and try to predict whether an observation is setosa or not.
library(tidymodels)
library(readr)
library(vip)
#clean data
iris <- iris %>%
mutate(class = case_when(Species == 'setosa' ~ 'setosa',
TRUE ~ 'other'))
iris$class = as.factor(iris$class)
iris <- subset(iris, select = -c(Species))
#split data into training and testing
iris_split = initial_split(iris, prop = 0.8)
cv_splits = vfold_cv(training(iris_split), v = 5)
#preprocessing
iris_recipe = recipe(class ~., data = iris) %>%
step_center(Sepal.Length) %>%
prep()
#specify MARS model
model = rand_forest(
mode = "classification",
mtry = tune(),
trees = 50
) %>%
set_engine("ranger", importance = "impurity")
#tuning parameters
tuning_grid = grid_regular(mtry(range=c(1,4)), levels = 4)
iris_wkfl = workflow() %>%
add_recipe(iris_recipe) %>%
add_model(model)
iris_tune = tune_grid(iris_wkfl,
resamples = cv_splits,
grid = tuning_grid,
metrics = metric_set(accuracy))
best_params = iris_tune %>%
select_best(metric = "accuracy")
best_model = finalize_workflow(iris_wkfl, best_params) %>%
parsnip::fit(data = training(iris_split)) %>%
pull_workflow_fit()
vip(best_model, method = "firm")
The last line produces an error from the pdp package.
Error in get_training_data.default(object) :
The training data could not be extracted from object. Please supply the raw training data using the train argument in the call to partial.
Is the following line correct? Or do I need to supply for transformed training data using my recipe first? I want to make sure that vip is applying my recipe when computing the importance scores. I know the error says "raw training data" but I am unsure if pdp knows about my workflow.
vip(best_model, method = "firm", train = training(iris_split))
