tidyclust 0.3.0
Deprecation
- finalize_model_tidyclust() and finalize_workflow_tidyclust() are deprecated. Use tune::finalize_model() and tune::finalize_workflow() instead, which now support cluster_spec objects natively. (#223)
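
A minimal sketch of the replacement call, assuming an installed tune release that already includes the cluster_spec support described in this entry. The `best_params` tibble is a hypothetical stand-in for the result of a tuning run.

```r
library(tidyclust)
library(tune)

# k-means specification whose number of clusters is left to be tuned
spec <- k_means(num_clusters = tune())

# Hypothetical winning parameters; in practice these would come from select_best()
best_params <- tibble::tibble(num_clusters = 3)

# Previously: finalize_model_tidyclust(spec, best_params)
final_spec <- finalize_model(spec, best_params)
final_spec
```

The same pattern should apply to workflows, with tune::finalize_workflow() in place of finalize_workflow_tidyclust().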
New Models and Engines
- New db_clust() clustering specification for fitting DBSCAN models, with engines "dbscan" and "hdbscan". (#209, #238)
- New gm_clust() clustering specification for fitting Gaussian mixture models, with engine "mclust". (#209)
- New mean_shift() clustering specification for fitting mean shift models, which iteratively shift observations toward regions of high density and determine the number of clusters automatically. Engines "LPCM" and "meanShiftR" are supported. (#240, #244)
Improvements
- Added dials parameter constructors radius(), min_points(), circular(), zero_covariance(), shared_orientation(), shared_shape(), and shared_size() so that tuning parameters for db_clust() and gm_clust() resolve to real parameter objects rather than erroring on unexported dials:: names.
- Added a “Getting started with tidyclust” vignette (vignette("tidyclust")). (#232)
- Added butcher support for cluster_fit objects. axe_data() removes the training data stored in the fit, and axe_env() clears the environment reference from the preprocessing terms. (#126)
- contr_one_hot() is now exported, fixing the indicators = "one_hot" code path in .convert_form_to_x_fit() and .convert_form_to_x_new(). (#218)
- extract_cluster_assignment(), extract_centroids(), and predict() gain a labels argument, a character vector of cluster labels that overrides the auto-generated prefix-based labels. (#148)
- hier_clust() gains a dist_fun argument for specifying a custom distance function. (#70)
- hier_clust() documentation now clarifies that predict() may not match extract_cluster_assignment() on training data: predict() uses a distance-based heuristic while extract_cluster_assignment() uses cutree() based on the dendrogram structure. (#208)
- The dist_fun argument accepted by cluster metrics is now documented, including how to use philentropy to supply custom distance methods. See vignette("tuning_and_metrics", package = "tidyclust") for examples. (#185)
- tune_cluster() now supports parallel processing via the mirai package in addition to future. (#220)
- tune_cluster() now warns when passed an apparent() resample. Metrics from apparent resamples are excluded by collect_metrics(summarize = TRUE) (the default) since tune 1.2.0, which caused unexpected NA values. Use collect_metrics(summarize = FALSE) to see per-resample metrics. (#193)
- The .notes column returned by tune_cluster() now includes a trace column containing backtraces for errors and warnings, making it easier to debug failures. (#220)
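
The labels argument described above might be used as in the following sketch. It assumes tidyclust 0.3.0 or later and that one label is supplied per cluster; the label values are hypothetical, and how labels map onto individual clusters is not specified in this entry.

```r
library(tidyclust)

set.seed(1234)

# Fit a three-cluster k-means model on a built-in dataset
kmeans_fit <- k_means(num_clusters = 3) |>
  fit(~ ., data = mtcars)

# Hypothetical labels, one per cluster (assumed requirement)
my_labels <- c("small", "medium", "large")

# Both extractors take the same labels argument in place of the
# auto-generated prefix-based names
assignments <- extract_cluster_assignment(kmeans_fit, labels = my_labels)
centroids <- extract_centroids(kmeans_fit, labels = my_labels)

assignments
```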
Bug Fixes
- Fixed bug when trying to tune the linkage_method argument. (#206, @lgaborini)
- silhouette_avg() now has direction = "maximize" instead of direction = "zero", so that show_best() and select_best() correctly return models with the highest silhouette values. (#212, @dnldelarosa)
- sse_within_total() now correctly applies a custom dist_fun when new_data is NULL by using training data stored in the model. (#184)
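
To illustrate the silhouette_avg() fix, here is a rough sketch of a tuning run that selects on that metric. The dataset, fold count, and grid values are chosen only for illustration, and the example assumes the recipes, rsample, workflows, and tune packages are installed alongside tidyclust.

```r
library(tidyclust)
library(recipes)
library(rsample)
library(workflows)
library(tune)

set.seed(1234)

# Resamples and a small candidate grid (illustrative values)
folds <- vfold_cv(mtcars, v = 2)
grid <- tibble::tibble(num_clusters = 2:4)

# Workflow combining a no-outcome recipe with a tunable k-means spec
wflow <- workflow(
  recipe(~ ., data = mtcars),
  k_means(num_clusters = tune())
)

res <- tune_cluster(
  wflow,
  resamples = folds,
  grid = grid,
  metrics = cluster_metric_set(silhouette_avg)
)

# With direction = "maximize", candidates are ranked from the highest
# mean silhouette downward
show_best(res, metric = "silhouette_avg")
best <- select_best(res, metric = "silhouette_avg")
```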
Breaking Changes
- The foreach package is no longer supported for parallel processing in tune_cluster(). Use the future or mirai packages instead. See ?tune::parallelism for details. (#220)
- The .config column produced by tune_cluster() has changed from the Preprocessor{num}_Model{num} pattern to pre{num}_mod{num}_post{num} to align with updates in the tune package. (#220)
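
For code that previously registered a foreach backend, switching to future could look like the sketch below. The worker count is arbitrary, and the details of how tune_cluster() uses the workers are covered in ?tune::parallelism rather than here.

```r
library(future)

# Register two background R sessions as workers (count chosen arbitrarily)
plan(multisession, workers = 2)

# ... calls to tune_cluster() made here can use the registered workers ...

# Return to sequential processing when finished
plan(sequential)
```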
tidyclust 0.2.4
CRAN release: 2025-01-27
- The philentropy package is now used to calculate distances rather than Rfast. (#199)
tidyclust 0.2.1
CRAN release: 2024-02-28
- Small change to let tune package have easy CRAN release. (#178)
tidyclust 0.2.0
CRAN release: 2023-09-25
Improvements
- Engine specific documentation has been added for all models and engines. (#159)
Bug Fixes
- Fixed bug where engine-specific arguments were passed along for k_means() when using the ClusterR engine. (#142)
- Fixed bug where the prefix argument wouldn’t be correctly passed through extract_cluster_assignment(), extract_centroids(), and predict(). (#145)
- Metric functions now error informatively if used with unfit cluster specifications. (#146)
- Fixed bug that caused incorrect cluster ordering in extract_fit_summary(). (#136)
- Using extract_cluster_assignment(), extract_centroids(), and predict() on a fitted hier_clust() model without specifying num_clust or cut_height now gives a more informative error message. (#147)
- k_means() now errors informatively if fit() is called without num_clust specified. (#134)
- Fixed bug where levels didn’t match the number of clusters when predicting on fewer observations. (#158)
- Fixed bug where tune_cluster() would error if used with a recipe that contained non-predictor variables such as id variables. (#124)
Breaking Changes
- Exported internal functions ClusterR_kmeans_fit(), stats_kmeans_fit(), and hclust_fit() have been renamed to .k_means_fit_ClusterR(), .k_means_fit_stats(), and .hier_clust_fit_stats() to reduce visibility for users.
- Cluster reordering is now done at the fitting time, not the extraction and prediction time. (#154)
tidyclust 0.1.2
CRAN release: 2023-02-23
- The cluster specification methods for generics::tune_args() and generics::tunable() are now registered unconditionally (#115).
tidyclust 0.1.1
CRAN release: 2022-12-20
- Fixed bug where extract_cluster_assignment() and predict() sometimes didn’t have agreement of clusters. (#94)
- silhouette() and silhouette_avg() now return NAs instead of erroring when applied to a clustering object with 1 cluster. (#104)
- Fixed bug where extract_cluster_assignment() doesn’t work for hier_clust() models in workflows where num_clusters is specified in extract_cluster_assignment().
