Overview:
I built four models with the tidymodels package, using the data frame FID (see below):
- General linear model
- Bagged tree
- Random forest
- Boosted trees
The data frame contains three predictors:
- Year (numeric)
- Month (factor)
- Days (numeric)
The dependent variable is Frequency (numeric).
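For reference, a minimal base R sketch that checks the column types listed above. It rebuilds only the first three rows of FID, copied from the dput() output at the end of this post; the names fid_head and col_classes are made up for illustration and are not part of the original code.

```r
# First three rows of FID, copied from the dput() output in this post.
# fid_head is a hypothetical name used only for this check.
fid_head <- data.frame(
  Year = c(2015, 2015, 2015),
  Month = factor(c("January", "February", "March"), levels = month.name),
  Frequency = c(36, 28, 39),
  Days = c(31, 28, 31)
)

# Report the first class of every column: predictors and outcome alike.
col_classes <- vapply(fid_head, function(col) class(col)[1], character(1))
print(col_classes)
```

Running the same vapply() call on the full FID data frame should confirm that Month is the only factor and that the other three columns are numeric.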
Problem
I am trying to fit the bagged tree model, and fit_resamples() fails on every fold with the error message shown below.
Any ideas why I get this error when using bag_tree() together with fit_resamples()?
There is not much material online apart from this post that I found; however, that issue concerns logistic regression, not bagged tree models.
x Fold01: model: Error: Input must be a vector, not NULL.
x Fold02: model: Error: Input must be a vector, not NULL.
x Fold03: model: Error: Input must be a vector, not NULL.
x Fold04: model: Error: Input must be a vector, not NULL.
x Fold05: model: Error: Input must be a vector, not NULL.
x Fold06: model: Error: Input must be a vector, not NULL.
x Fold07: model: Error: Input must be a vector, not NULL.
x Fold08: model: Error: Input must be a vector, not NULL.
x Fold09: model: Error: Input must be a vector, not NULL.
x Fold10: model: Error: Input must be a vector, not NULL.
Warning message:
All models failed in [fit_resamples()]. See the `.notes` column.
If anyone can help me resolve this error, I would be very grateful for your advice.
Thanks in advance
R code
## Load tidymodels and the supporting packages
library(tidymodels)
library(glmnet)
library(parsnip)
library(rpart.plot)
library(rpart)
library(tidyverse) # manipulating data
library(skimr) # summary statistics
library(baguette) # bagged trees
library(future) # parallel processing & decrease computation time
library(xgboost) # boosted trees
library(ranger)
library(yardstick)
library(purrr)
library(forcats)
library(rlang)
library(poissonreg)
#split this single dataset into two: a training set and a testing set
data_split <- initial_split(FID)
# Create data frames for the two sets:
train_data <- training(data_split)
test_data <- testing(data_split)
# resample the data with 10-fold cross-validation (10-fold by default)
cv <- vfold_cv(train_data, v=10)
###########################################################
## Produce the recipe
rec <- recipe(Frequency ~ ., data = FID) %>%
  step_nzv(all_predictors(), freq_cut = 0, unique_cut = 0) %>% # remove zero-variance predictors
  step_novel(all_nominal()) %>% # prepares test data to handle previously unseen factor levels
  step_medianimpute(all_numeric(), -all_outcomes(), -has_role("id vars")) %>% # replaces missing numeric observations with the median
  step_dummy(all_nominal(), -has_role("id vars")) # dummy codes categorical variables
##### Bagged trees
mod_bag <- bag_tree() %>%
  set_mode("regression") %>%
  set_engine("rpart", times = 10) # 10 bootstrap resamples
## Update the model specification with the cost/complexity parameter,
## which must be a positive number
Updated_bag <- update(mod_bag, cost_complexity = 1)
## Create the workflow
wflow_bag <- workflow() %>%
  add_recipe(rec) %>%
  add_model(Updated_bag)
## Fit the bagged tree workflow on the training data
bag_fit_model <- fit(wflow_bag, data = train_data)
## The fitted model can be accessed with pull_workflow_fit()
bag_fit_model %>%
  pull_workflow_fit()
## Predict on the training data
bag_predict <- predict(bag_fit_model, train_data)
## Fit the workflow to the cross-validation folds (this is the call that fails)
plan(multisession)
fit_bag <- fit_resamples(
  wflow_bag,
  cv,
  metrics = metric_set(rmse, rsq),
  control = control_resamples(save_pred = TRUE,
                              extract = function(x) extract_model(x)))
Data frame - FID
structure(list(Year = c(2015, 2015, 2015, 2015, 2015, 2015, 2015,
2015, 2015, 2015, 2015, 2015, 2016, 2016, 2016, 2016, 2016, 2016,
2016, 2016, 2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017, 2017,
2017, 2017, 2017, 2017, 2017, 2017, 2017), Month = structure(c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L), .Label = c("January", "February", "March",
"April", "May", "June", "July", "August", "September", "October",
"November", "December"), class = "factor"), Frequency = c(36,
28, 39, 46, 5, 0, 0, 22, 10, 15, 8, 33, 33, 29, 31, 23, 8, 9,
7, 40, 41, 41, 30, 30, 44, 37, 41, 42, 20, 0, 7, 27, 35, 27,
43, 38), Days = c(31, 28, 31, 30, 6, 0, 0, 29, 15,
29, 29, 31, 31, 29, 30, 30, 7, 0, 7, 30, 30, 31, 30, 27, 31,
28, 30, 30, 21, 0, 7, 26, 29, 27, 29, 29)), row.names = c(NA,
-36L), class = "data.frame")