GMM
Module: gmm.py
This module provides a Gaussian Mixture Model (GMM) clustering implementation using scikit-learn's GaussianMixture.
If n_components is given, the model uses that many components; otherwise it scans k in [2..max_k] and keeps the k with the best silhouette score.
Classes:
Name | Description |
---|---|
Gmm | Gaussian Mixture Model clustering with optional user-specified n_components or silhouette-based auto selection. |
Dependencies
- numpy
- sklearn.mixture.GaussianMixture
- sklearn.metrics.silhouette_score
- base.py (Clustering)
Key Features
- Automatic scanning of [2..max_k] for best silhouette score if n_components is None
- Final model is stored, along with predicted cluster labels
- Ties into the base Clustering class for plotting and metrics
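The automatic scan described above can be sketched directly with scikit-learn. This is a minimal illustration, not the module's actual source: the function name `select_gmm_by_silhouette`, the `random_state` argument, and the synthetic data are assumptions made for the example.

```python
import numpy as np
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture


def select_gmm_by_silhouette(X, max_k=10, random_state=0):
    """Hypothetical helper: scan k in [2..max_k], keep the best silhouette."""
    best_k, best_score, best_model, best_labels = None, -np.inf, None, None
    # silhouette_score needs between 2 and n_samples - 1 distinct labels.
    upper = min(max_k, X.shape[0] - 1)
    for k in range(2, upper + 1):
        model = GaussianMixture(n_components=k, random_state=random_state)
        labels = model.fit_predict(X)
        # A component can end up with no points; skip degenerate labelings.
        if np.unique(labels).size < 2:
            continue
        score = silhouette_score(X, labels)
        if score > best_score:
            best_k, best_score, best_model, best_labels = k, score, model, labels
    return best_k, best_score, best_model, best_labels


# Three well-separated synthetic blobs (illustrative data only).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(60, 2)) for c in centers])

best_k, best_score, best_model, best_labels = select_gmm_by_silhouette(X, max_k=6)
print(best_k, round(float(best_score), 3))
```

The sketch returns the fitted model and its labels together with the chosen k, which mirrors the documented behaviour of storing the final model alongside the predicted cluster labels.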
Version Info
- 28/Dec/2024: Initial release
Gmm
Bases: Clustering
Gaussian Mixture Model clustering with optional user-defined 'n_components' or automatic silhouette-based selection.
Attributes:
Name | Type | Description |
---|---|---|
n_components | Optional[int] | The actual number of components used in the final fitted model. If provided, the class will skip auto-selection and directly use this many mixture components. |
max_k | int | Maximum number of components to consider for auto selection if n_components is None. |
labels | Optional[ndarray] | Cluster/component labels for each data point after fitting. |
Source code in scirex/core/ml/unsupervised/clustering/gmm.py
__init__(n_components=None, max_k=10)
Initialize the Gmm clustering class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n_components | Optional[int] | If provided, the model will directly use this many Gaussian components. Otherwise, it scans [2..max_k] for the best silhouette score. Defaults to None. | None |
max_k | int | Maximum components to try for auto selection if n_components is None. Defaults to 10. | 10 |
fit(X)
Fit the GMM model to the data.
If n_components was set by the user, auto selection is skipped. Otherwise, silhouette scores are computed for each k in [2..max_k] and the best-scoring k is used.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | ndarray | Scaled feature matrix of shape (n_samples, n_features). | required |
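Since X is documented as a scaled feature matrix, the following sketch shows one common way to produce such input and then fit with a fixed component count (the path that skips auto selection). It calls scikit-learn directly rather than the Gmm class; the use of StandardScaler and the synthetic data are assumptions, as the page does not say which scaling is expected.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Illustrative raw data: two features on very different scales.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=[1.0, 1000.0], scale=[0.1, 50.0], size=(50, 2))
group_b = rng.normal(loc=[3.0, 5000.0], scale=[0.1, 50.0], size=(50, 2))
raw = np.vstack([group_a, group_b])

# Standardize each feature to zero mean and unit variance.
X = StandardScaler().fit_transform(raw)

# Fixed-component path: no scan, fit directly with the requested count.
model = GaussianMixture(n_components=2, random_state=0)
labels = model.fit_predict(X)

print(X.shape, labels.shape)
```

Without scaling, the second feature would dominate the distances used by the silhouette score, so standardizing first keeps both features comparable.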
get_model_params()
Get parameters/results of the fitted GMM model.
Returns:
Type | Description |
---|---|
Dict[str, Any] | A dictionary with the keys: n_components (int): the final number of components used; max_k (int): the maximum number of components considered during auto selection. |
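As a rough illustration of the return shape, the sketch below builds a dictionary with the two documented keys from a fitted scikit-learn model. The helper name `get_model_params_sketch` is hypothetical, and the real method may return additional entries not listed on this page.

```python
from typing import Any, Dict

import numpy as np
from sklearn.mixture import GaussianMixture


def get_model_params_sketch(model: GaussianMixture, max_k: int) -> Dict[str, Any]:
    """Hypothetical helper mirroring only the two documented keys."""
    return {
        "n_components": int(model.n_components),  # final component count
        "max_k": int(max_k),  # upper bound used when scanning
    }


# Two synthetic clusters (illustrative data only).
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(40, 2)),
    rng.normal(8.0, 1.0, size=(40, 2)),
])
model = GaussianMixture(n_components=2, random_state=0).fit(X)

params = get_model_params_sketch(model, max_k=10)
print(params)
```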