Skip to content

tune_cluster() computes a set of performance metrics (e.g. accuracy or RMSE) for a pre-defined set of tuning parameters that correspond to a model or recipe across one or more resamples of the data.

Usage

tune_cluster(object, ...)

# S3 method for cluster_spec
tune_cluster(
  object,
  preprocessor,
  resamples,
  ...,
  param_info = NULL,
  grid = 10,
  metrics = NULL,
  control = tune::control_grid()
)

# S3 method for workflow
tune_cluster(
  object,
  resamples,
  ...,
  param_info = NULL,
  grid = 10,
  metrics = NULL,
  control = tune::control_grid()
)

Arguments

object

A tidyclust model specification or a workflows::workflow().

...

Not currently used.

preprocessor

A traditional model formula or a recipe created using recipes::recipe().

resamples

An rset() object.

param_info

A dials::parameters() object or NULL. If none is given, a parameters set is derived from other arguments. Passing this argument can be useful when parameter ranges need to be customized.

grid

A data frame of tuning combinations or a positive integer. The data frame should have columns for each parameter being tuned and rows for tuning parameter candidates. An integer denotes the number of candidate parameter sets to be created automatically.

metrics

A cluster_metric_set() or NULL.

control

An object used to modify the tuning process. Defaults to tune::control_grid().

Value

An updated version of resamples with extra list columns for .metrics and .notes (optional columns are .predictions and .extracts). .notes contains warnings and errors that occur during execution.

Examples

library(recipes)
#> 
#> Attaching package: ‘recipes’
#> The following object is masked from ‘package:stats’:
#> 
#>     step
library(rsample)
library(workflows)
library(tune)

rec_spec <- recipe(~., data = mtcars) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_pca(all_numeric_predictors())

kmeans_spec <- k_means(num_clusters = tune())

wflow <- workflow() %>%
  add_recipe(rec_spec) %>%
  add_model(kmeans_spec)

grid <- tibble(num_clusters = 1:3)

set.seed(4400)
folds <- vfold_cv(mtcars, v = 2)

res <- tune_cluster(
  wflow,
  resamples = folds,
  grid = grid
)
res
#> # Tuning results
#> # 2-fold cross-validation 
#> # A tibble: 2 × 4
#>   splits          id    .metrics         .notes          
#>   <list>          <chr> <list>           <list>          
#> 1 <split [16/16]> Fold1 <tibble [6 × 5]> <tibble [0 × 3]>
#> 2 <split [16/16]> Fold2 <tibble [6 × 5]> <tibble [0 × 3]>

collect_metrics(res)
#> # A tibble: 6 × 7
#>   num_clusters .metric          .estimator  mean     n std_err .config    
#>          <int> <chr>            <chr>      <dbl> <int>   <dbl> <chr>      
#> 1            1 sse_total        standard   160.      2   0.346 Preprocess…
#> 2            1 sse_within_total standard   160.      2   0.346 Preprocess…
#> 3            2 sse_total        standard   160.      2   0.346 Preprocess…
#> 4            2 sse_within_total standard    80.3     2   3.63  Preprocess…
#> 5            3 sse_total        standard   160.      2   0.346 Preprocess…
#> 6            3 sse_within_total standard    54.3     2   7.15  Preprocess…