hier_clust() defines a model that fits clusters based on a distance-based dendrogram.
There are different ways to fit this model, and the method of estimation is chosen by setting the model engine. The engine-specific pages for this model are listed below.
Usage
hier_clust(
  mode = "partition",
  engine = "stats",
  num_clusters = NULL,
  cut_height = NULL,
  linkage_method = "complete"
)
Arguments
- mode
A single character string for the type of model. The only possible value for this model is "partition".
- engine
A single character string specifying what computational engine to use for fitting. Possible engines are listed below. The default for this model is "stats".
- num_clusters
Positive integer, number of clusters in model (optional).
- cut_height
Positive double, height at which to cut dendrogram to obtain cluster assignments (only used if num_clusters is NULL).
- linkage_method
The agglomeration method to be used. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC).
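As a rough illustration of how num_clusters and cut_height differ, the sketch below works in base R directly. It assumes the "stats" engine builds its dendrogram with stats::hclust() and cuts it with stats::cutree(); the toy data and variable names are made up for this example.

```r
# Toy data: two well-separated groups of three points (illustrative only)
x <- rbind(c(0, 0), c(0, 1), c(1, 0), c(10, 10), c(10, 11), c(11, 10))

# Build the dendrogram with complete linkage on Euclidean distances
tree <- stats::hclust(stats::dist(x), method = "complete")

# Cut by number of clusters (the role played by num_clusters)
by_k <- stats::cutree(tree, k = 2)

# Cut by height (the role played by cut_height)
by_h <- stats::cutree(tree, h = 5)

# For this data both cuts recover the same two groups
identical(by_k, by_h)
#> [1] TRUE
```

Cutting by height is useful when the number of clusters is not known in advance, since the number of groups then follows from the structure of the dendrogram.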
Details
What does it mean to predict?
To predict the cluster assignment for a new observation, we find the closest cluster. How we measure “closeness” is dependent on the specified type of linkage in the model:
single linkage: The new observation is assigned to the same cluster as its nearest observation from the training data.
complete linkage: The new observation is assigned to the cluster with the smallest maximum distance between its training observations and the new observation.
average linkage: The new observation is assigned to the cluster with the smallest average distance between its training observations and the new observation.
centroid method: The new observation is assigned to the cluster with the closest centroid, as in prediction for k_means.
Ward’s method: The new observation is assigned to the cluster with the smallest increase in error sum of squares (ESS) due to the new addition. The ESS of a cluster is the sum of squared distances between its observations and its centroid.
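The rules above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: assign_by_linkage() is a hypothetical helper, Euclidean distance is assumed, and the Ward branch uses the identity that adding a point x to a cluster of n points with centroid c increases the ESS by n / (n + 1) times the squared distance between x and c.

```r
# Hypothetical helper (not part of tidyclust): assign one new observation
# to a cluster according to the chosen linkage rule.
assign_by_linkage <- function(train, clusters, new_obs,
                              linkage = c("single", "complete", "average",
                                          "centroid", "ward")) {
  linkage <- match.arg(linkage)
  train <- as.matrix(train)
  new_obs <- as.numeric(new_obs)

  # Euclidean distance from the new observation to every training row
  d <- sqrt(rowSums(sweep(train, 2, new_obs)^2))

  # Cluster sizes and centroids, both ordered by cluster label
  n <- as.vector(table(clusters))
  centers <- rowsum(train, clusters) / n
  d_center_sq <- rowSums(sweep(centers, 2, new_obs)^2)

  score <- switch(
    linkage,
    single   = tapply(d, clusters, min),   # nearest training observation
    complete = tapply(d, clusters, max),   # smallest maximum distance
    average  = tapply(d, clusters, mean),  # smallest average distance
    centroid = d_center_sq,                # closest centroid
    ward     = n / (n + 1) * d_center_sq   # smallest increase in ESS
  )
  as.integer(names(score)[which.min(score)])
}

# Usage on toy data with two well-separated groups
train <- rbind(c(0, 0), c(0, 1), c(1, 0), c(10, 10), c(10, 11), c(11, 10))
tree <- stats::hclust(stats::dist(train), method = "average")
clusters <- stats::cutree(tree, k = 2)
assign_by_linkage(train, clusters, c(9, 9), linkage = "average")
#> [1] 2
```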
Examples
# Show all engines
modelenv::get_from_env("hier_clust")
#> # A tibble: 1 × 2
#> engine mode
#> <chr> <chr>
#> 1 stats partition
hier_clust()
#> Hierarchical Clustering Specification (partition)
#>
#> Main Arguments:
#> linkage_method = complete
#>
#> Computational engine: stats
#>
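A specification on its own only describes the model; nothing is estimated until data are supplied. The sketch below assumes the tidyclust package is installed and uses the built-in mtcars data purely for illustration. It fits a specification and then predicts cluster assignments for a few rows.

```r
library(tidyclust)

# Specify three clusters with average linkage
hc_spec <- hier_clust(num_clusters = 3, linkage_method = "average")

# Fit on all columns of mtcars (an arbitrary choice of example data)
hc_fit <- fit(hc_spec, ~ ., data = mtcars)

# Cluster assignments for the training rows
extract_cluster_assignment(hc_fit)

# Predicted clusters for new observations (here, the first five rows)
preds <- predict(hc_fit, new_data = mtcars[1:5, ])
```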