Agglomerative
Module: agglomerative.py
This module provides an Agglomerative Clustering implementation built on scikit-learn. The cluster count can be supplied by the user; otherwise the module computes the silhouette score for each candidate count from 2 to max_k and selects the best-scoring one.
Classes:
Name | Description |
---|---|
Agglomerative | Implements an agglomerative clustering approach with optional user-specified n_clusters or silhouette-based auto selection. |
Dependencies
- numpy
- sklearn.cluster.AgglomerativeClustering
- sklearn.metrics.silhouette_score
- base.py (Clustering)
Key Features
- Automatic scanning of possible cluster counts (2..max_k) if n_clusters is not provided
- Silhouette-based selection of the best cluster count
- Provides get_model_params() for retrieving final clustering info
Version Info
- 28/Dec/2024: Initial release
Agglomerative
Bases: Clustering
Agglomerative Clustering with optional user-defined 'n_clusters' or automatic silhouette-based selection.
Attributes:
Name | Type | Description |
---|---|---|
n_clusters | Optional[int] | User-specified cluster count. If not provided, the algorithm auto-selects. |
max_k | int | Maximum number of clusters for auto-selection if n_clusters is None. |
labels | Optional[ndarray] | Cluster labels for each data point after fitting. |
n_clusters_ | Optional[int] | The actual number of clusters used in the final model. |
Source code in scirex/core/ml/unsupervised/clustering/agglomerative.py
__init__(n_clusters=None, max_k=10)
Initialize the Agglomerative clustering class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n_clusters | Optional[int] | If provided, the class uses this cluster count directly. Otherwise, it scans 2..max_k using silhouette scores. | None |
max_k | int | Maximum number of clusters to try during auto selection when n_clusters is None. | 10 |
fit(X)
Fit the Agglomerative Clustering model to the data.
- If n_clusters is provided, skip auto selection and use that value.
- Else, compute silhouette scores for k in [2..max_k], pick the best k, and finalize the clustering.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | ndarray | Input data array of shape (n_samples, n_features). | required |
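The auto-selection path described for fit(X) can be sketched as follows. This is a minimal illustration built directly on scikit-learn, not the module's actual source: the helper name select_k_by_silhouette, the cap on k at n_samples - 1, and the rule that the lowest k wins a tie are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score


def select_k_by_silhouette(X, max_k=10):
    """Hypothetical helper: scan k = 2..max_k and keep the best silhouette."""
    # silhouette_score requires between 2 and n_samples - 1 distinct labels,
    # so the scan is capped accordingly (an assumption of this sketch).
    upper = min(max_k, X.shape[0] - 1)
    best_k, best_score, best_labels = None, float("-inf"), None
    scores = {}
    for k in range(2, upper + 1):
        labels = AgglomerativeClustering(n_clusters=k).fit_predict(X)
        score = silhouette_score(X, labels)
        scores[k] = score
        # Strict comparison: on a tie, the smaller k is kept.
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels, scores


# Three well-separated synthetic blobs, 30 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((30, 2)) for c in centers])

best_k, labels, scores = select_k_by_silhouette(X, max_k=6)
print(best_k, {k: round(float(s), 3) for k, s in scores.items()})
```

Each candidate k requires a full clustering pass plus a silhouette computation, so the scan cost grows with max_k; passing n_clusters explicitly avoids it.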
get_model_params()
Retrieve key parameters and results from the fitted AgglomerativeClustering model.
Returns:
Type | Description |
---|---|
Dict[str, Any] | Dictionary with the key n_clusters (int): the final number of clusters used. |
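For illustration, a dictionary of the documented shape can be built from a fitted scikit-learn model as shown below. The helper summarize_fit is hypothetical and stands in for get_model_params(); the real method may return additional keys not listed here.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def summarize_fit(model):
    """Hypothetical stand-in for get_model_params(): report the final k."""
    # n_clusters_ is set by scikit-learn once the model has been fitted.
    return {"n_clusters": int(model.n_clusters_)}


# Two small, clearly separated groups; n_clusters is given explicitly,
# which corresponds to the path that skips auto selection.
X = np.array(
    [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]]
)
model = AgglomerativeClustering(n_clusters=2).fit(X)
params = summarize_fit(model)
print(params)
```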