Hdbscan
Module: hdbscan.py
This module provides an HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) implementation. It optionally accepts user-defined min_cluster_size and min_samples, or applies heuristics to determine them.
Classes:
Name | Description |
---|---|
Hdbscan | Implements HDBSCAN with an optional user override or heuristic-based approach. |
Dependencies
- numpy
- hdbscan (pip install hdbscan)
- base.py (Clustering)
Key Features
- If a user-defined 'min_cluster_size' or 'min_samples' is given, skip the auto-heuristic
- Otherwise, compute simple heuristics
- Inherits from base Clustering for consistency with other clustering modules
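The override-or-heuristic rule above can be sketched as follows. This is a minimal illustration, not the module's code: the function name `resolve_params` and both heuristics are hypothetical placeholders, because the actual heuristics are not documented here. The sketch assumes each parameter is resolved independently, uses roughly 2% of the sample count (never below 2) for min_cluster_size, and falls back to min_cluster_size for min_samples, which mirrors the hdbscan package's own behavior when min_samples is None.

```python
from typing import Optional, Tuple


def resolve_params(
    n_samples: int,
    min_cluster_size: Optional[int] = None,
    min_samples: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (min_cluster_size, min_samples), preferring user-supplied values.

    Both heuristics are hypothetical placeholders, not the formulas in hdbscan.py.
    """
    if min_cluster_size is None:
        # Hypothetical heuristic: about 2% of the data, but never below 2.
        min_cluster_size = max(2, n_samples // 50)
    if min_samples is None:
        # Hypothetical heuristic: tie min_samples to min_cluster_size.
        min_samples = min_cluster_size
    return min_cluster_size, min_samples


print(resolve_params(1000))         # (20, 20)
print(resolve_params(1000, 15))     # (15, 15)
print(resolve_params(1000, 15, 5))  # (15, 5)
print(resolve_params(30))           # (2, 2)
```

User-supplied values always pass through unchanged; only the missing ones are estimated.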
Version Info
- 28/Dec/2024: Initial release
Hdbscan
Bases: Clustering
HDBSCAN clustering with optional user-defined 'min_cluster_size' and 'min_samples', or a heuristic-based approach if they are not provided.
Attributes:
Name | Type | Description |
---|---|---|
min_cluster_size | Optional[int] | User-specified or auto-calculated minimum cluster size. |
min_samples | Optional[int] | User-specified or auto-calculated number of samples in a neighborhood for a point to be considered a core point. |
cluster_selection_method | str | Method for extracting clusters from the condensed tree. Defaults to 'eom'. |
labels | Optional[ndarray] | Cluster labels for each data point after fitting (noise points are labeled -1). |
n_clusters_ | Optional[int] | Number of clusters discovered (excluding noise). |
n_noise_ | Optional[int] | Number of data points labeled as noise (-1). |
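Both n_clusters_ and n_noise_ can be derived from labels alone. A minimal numpy sketch, assuming the convention stated above that -1 marks noise; `summarize_labels` is a hypothetical helper name, not part of the module.

```python
from typing import Tuple

import numpy as np


def summarize_labels(labels) -> Tuple[int, int]:
    """Count clusters (excluding noise) and noise points in a label array."""
    labels = np.asarray(labels)
    noise_mask = labels == -1
    # Each distinct label other than -1 is one cluster.
    n_clusters = int(np.unique(labels[~noise_mask]).size)
    # Every -1 entry is a noise point.
    n_noise = int(noise_mask.sum())
    return n_clusters, n_noise


labels = np.array([0, 0, 1, -1, 1, 2, -1])
print(summarize_labels(labels))  # (3, 2)
```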
Source code in scirex/core/ml/unsupervised/clustering/hdbscan.py
__init__(min_cluster_size=None, min_samples=None, cluster_selection_method='eom')
Initialize the HDBSCAN clustering model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
min_cluster_size | Optional[int] | If provided, HDBSCAN uses this minimum cluster size directly. | None |
min_samples | Optional[int] | If provided, HDBSCAN uses this min_samples value directly. | None |
cluster_selection_method | str | The method used to extract clusters from the condensed tree: 'eom' (Excess of Mass) or 'leaf'. | 'eom' |
fit(X)
Fit HDBSCAN to the data.
- If min_cluster_size/min_samples are None, estimate them heuristically.
- Then create and fit an HDBSCAN model, storing labels, cluster count, and noise count.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | ndarray | Input data array of shape (n_samples, n_features). | required |
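The fit steps above can be sketched as a standalone function. This is an illustrative sketch, not the module's implementation: `fit_hdbscan`, its `model_factory` argument, and the parameter heuristics are hypothetical. By default it lazily imports `hdbscan.HDBSCAN`, which accepts the `min_cluster_size`, `min_samples`, and `cluster_selection_method` keyword arguments and exposes `labels_` after `fit`; injecting a factory lets the flow run without the package installed.

```python
from typing import Any, Callable, Dict, Optional

import numpy as np


def fit_hdbscan(
    X: np.ndarray,
    min_cluster_size: Optional[int] = None,
    min_samples: Optional[int] = None,
    cluster_selection_method: str = "eom",
    model_factory: Optional[Callable[..., Any]] = None,
) -> Dict[str, Any]:
    """Sketch of the fit() flow; returns labels plus summary values."""
    X = np.asarray(X)
    # Step 1: estimate missing parameters (hypothetical heuristics).
    if min_cluster_size is None:
        min_cluster_size = max(2, X.shape[0] // 50)
    if min_samples is None:
        min_samples = min_cluster_size
    # Step 2: create and fit the model; defaults to hdbscan.HDBSCAN.
    if model_factory is None:
        import hdbscan  # imported lazily so the sketch loads without it

        model_factory = hdbscan.HDBSCAN
    model = model_factory(
        min_cluster_size=min_cluster_size,
        min_samples=min_samples,
        cluster_selection_method=cluster_selection_method,
    )
    model.fit(X)
    # Step 3: store labels and derive counts (-1 marks noise).
    labels = np.asarray(model.labels_)
    noise = labels == -1
    return {
        "labels": labels,
        "min_cluster_size": min_cluster_size,
        "min_samples": min_samples,
        "n_clusters": int(np.unique(labels[~noise]).size),
        "n_noise": int(noise.sum()),
    }
```

With hdbscan installed, `fit_hdbscan(X)` alone would run the real clustering; passing a stub factory keeps the flow testable. Apart from `labels`, the returned keys match those that get_model_params reports.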
get_model_params()
Retrieve key parameters and results from the fitted HDBSCAN model.
Returns:
Type | Description |
---|---|
Dict[str, Any] | A dictionary with the keys min_cluster_size (int): the final min_cluster_size used; min_samples (int): the final min_samples used; n_clusters (int): the number of clusters discovered; n_noise (int): the number of noise points. |