pygmi.clust.crisp_clust#

A set of crisp clustering routines. They use standard statistical methods that assign each sample to exactly one cluster, as opposed to fuzzy methods.

Classes#

CrispClust

Crisp cluster GUI class.

Functions#

gcentroids(data, index, no_clust, mindist)

G Centroids.

gdist(data, center, index, no_clust, cltype, cov_constr)

G Dist routine.

Module Contents#

class pygmi.clust.crisp_clust.CrispClust(parent=None)#

Bases: pygmi.misc.BasicModule

Crisp cluster GUI class.

Parameters:

parent (parent, optional) – Reference to the parent routine. The default is None.

setupui()#

Set up UI.

Return type:

None.

combo()#

Set up combo box to choose algorithm.

Return type:

None.

settings(nodialog=False)#

Entry point into item.

Parameters:

nodialog (bool, optional) – Run settings without a dialog. The default is False.

Returns:

True if successful, False otherwise.

Return type:

bool

saveproj()#

Save project data from class.

Return type:

None.

update_vars()#

Update the variables.

Return type:

None.

acceptall()#

Process the data.

Return type:

None.

crisp_means(data, no_clust, cent, centfix, maxit, term_thresh, cltype, cov_constr)#

Perform crisp clustering of complete multivariate datasets.

Parameters:
  • data (numpy array) – N x P matrix containing the data to be clustered, where N is the number of samples and P is the number of attributes available for each sample.

  • no_clust (int) – Number of clusters to be used.

  • cent (numpy array) – Cluster centre positions: either empty ([]), in which case randomly guessed centre positions are used for initialisation, or a no_clust x P matrix.

  • centfix (numpy array) – Constrains the position of the cluster centres. If centfix is empty, cluster centres can vary freely during cluster analysis. Otherwise centfix has the same size as cent and gives the absolute deviation from the initial centre positions that must not be exceeded during clustering. Note that centfix applies only if centre values are provided by the user.

  • maxit (int) – Maximum number of iterations allowed.

  • term_thresh (float) – Termination threshold: either empty ([]), in which case the maximum number of iterations maxit is run, or a scalar giving the minimum reduction of the objective function, in percent, between two consecutive iterations.

  • cltype (str) – Clustering type, one of: 'kmeans', k-means cluster analysis (spherical clusters); 'det', the determinant criterion of Spath, H., “Cluster-Formation and Analyse”, chapter 3 (ellipsoidal clusters, all clusters share the same ellipsoid); or 'vardet', Spath, H., chapter 4 (each cluster uses its own ellipsoid). Note: 'vardet' is the crisp version of the Gustafson-Kessel algorithm.

  • cov_constr (float) – Scalar in the range [0, 1]. Values > 0 trim the covariance matrix to avoid needle-like ellipsoids for the clusters. Applies only for cltype='vardet', but must always be provided.
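The centfix constraint described above can be pictured as a per-attribute clamp around the initial centre positions. The following is a minimal sketch under that reading, not the PyGMI implementation; the helper name `clamp_centres` and the sample values are hypothetical.

```python
import numpy as np


def clamp_centres(new_cent, init_cent, centfix):
    """Keep updated centres within +/- centfix of their initial positions.

    Hypothetical helper: an empty centfix means the centres are unconstrained.
    """
    new_cent = np.asarray(new_cent, dtype=float)
    init_cent = np.asarray(init_cent, dtype=float)
    centfix = np.asarray(centfix, dtype=float)
    if centfix.size == 0:
        return new_cent
    # Element-wise clip: each attribute of each centre has its own bound.
    return np.clip(new_cent, init_cent - centfix, init_cent + centfix)


# Two user-supplied centres with two attributes each (illustrative values).
init_cent = np.array([[0.0, 0.0], [10.0, 10.0]])
centfix = np.array([[1.0, 1.0], [0.5, 0.5]])
updated = np.array([[2.0, -0.3], [10.2, 8.0]])

clamped = clamp_centres(updated, init_cent, centfix)
unconstrained = clamp_centres(updated, init_cent, [])
```

Only the attributes that moved further than the allowed deviation are pulled back; the others are left unchanged.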

Returns:

  • idx (numpy array) – Column vector with the cluster index number of each sample after the last iteration.

  • cent (numpy array) – Matrix with the cluster centre positions after the last iteration, one cluster centre per row.

  • obj_fcn (numpy array) – Vector with the value of the objective function after each iteration.

  • vrc (numpy array) – Variance Ratio Criterion
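To make the inputs and outputs concrete, here is a minimal NumPy sketch of the 'kmeans' case only. It is an illustration under stated assumptions rather than the PyGMI code: the function names `crisp_kmeans_sketch` and `variance_ratio` are hypothetical, centfix and cov_constr are omitted, `None` stands in for an empty term_thresh, and the variance ratio criterion is taken to be the usual Calinski-Harabasz form, which may differ in detail from what crisp_means returns.

```python
import numpy as np


def crisp_kmeans_sketch(data, no_clust, cent, maxit, term_thresh):
    """Plain k-means loop mirroring the documented inputs and outputs."""
    data = np.asarray(data, dtype=float)
    if cent is None or len(cent) == 0:
        # Empty cent: start from randomly chosen samples (assumed behaviour).
        rng = np.random.default_rng(0)
        cent = data[rng.choice(len(data), no_clust, replace=False)]
    cent = np.array(cent, dtype=float)
    obj_fcn = []
    for _ in range(maxit):
        # Squared Euclidean distance of every sample to every centre (N x no_clust).
        dist = ((data[:, None, :] - cent[None, :, :]) ** 2).sum(axis=2)
        idx = dist.argmin(axis=1)
        obj_fcn.append(dist.min(axis=1).sum())
        for k in range(no_clust):
            members = data[idx == k]
            if len(members) > 0:
                cent[k] = members.mean(axis=0)
        if term_thresh is not None and len(obj_fcn) > 1:
            prev, cur = obj_fcn[-2], obj_fcn[-1]
            # Stop when the objective shrinks by less than term_thresh percent.
            if prev == 0 or 100.0 * (prev - cur) / prev < term_thresh:
                break
    return idx, cent, np.array(obj_fcn)


def variance_ratio(data, idx, cent):
    """Calinski-Harabasz style ratio of between- to within-cluster dispersion."""
    data = np.asarray(data, dtype=float)
    n, k = len(data), len(cent)
    counts = np.bincount(idx, minlength=k)
    between = (counts * ((cent - data.mean(axis=0)) ** 2).sum(axis=1)).sum()
    within = ((data - cent[idx]) ** 2).sum()
    return (between / (k - 1)) / (within / (n - k))


# Two well-separated groups of four samples each (illustrative data).
data = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                 [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
idx, cent_out, obj_fcn = crisp_kmeans_sketch(
    data, no_clust=2, cent=[[0, 0], [10, 10]], maxit=20, term_thresh=1.0)
vrc = variance_ratio(data, idx, cent_out)
```

The loop stops as soon as two consecutive objective values differ by less than term_thresh percent, so obj_fcn can be shorter than maxit.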

pygmi.clust.crisp_clust.gcentroids(data, index, no_clust, mindist)#

G Centroids.

Parameters:
  • data (numpy array) – Input data.

  • index (numpy array) – Cluster index number for each sample.

  • no_clust (int) – Number of clusters to be used.

  • mindist (numpy array) – Minimum distances.

Returns:

  • centroids (numpy array) – Centroids

  • index (numpy array) – Index
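The signature suggests that centroids are computed from the current assignment, with mindist available to deal with clusters that have no members. The sketch below shows one plausible reading and is not taken from the PyGMI source: the name `centroids_sketch` and the rule of reseeding an empty cluster with the sample farthest from its current centre are assumptions.

```python
import numpy as np


def centroids_sketch(data, index, no_clust, mindist):
    """Mean of each cluster's members, reseeding empty clusters first."""
    data = np.asarray(data, dtype=float)
    index = np.array(index, dtype=int)        # copy, so the caller's array is untouched
    mindist = np.array(mindist, dtype=float)  # copy
    for k in range(no_clust):
        if np.any(index == k):
            continue
        # Assumed rule: move the sample farthest from its own centre into the
        # empty cluster, but only take it from a cluster with other members.
        counts = np.bincount(index, minlength=no_clust)
        donor_ok = counts[index] > 1
        far = int(np.argmax(np.where(donor_ok, mindist, -np.inf)))
        index[far] = k
        mindist[far] = 0.0
    centroids = np.vstack([data[index == k].mean(axis=0)
                           for k in range(no_clust)])
    return centroids, index


# Three samples all assigned to cluster 0, so cluster 1 starts out empty.
data = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
centroids, new_index = centroids_sketch(
    data, index=[0, 0, 0], no_clust=2, mindist=[1.0, 1.0, 9.0])
```

Returning the updated index alongside the centroids matches the documented return values, since reseeding changes the assignment.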

pygmi.clust.crisp_clust.gdist(data, center, index, no_clust, cltype, cov_constr)#

G Dist routine.

Parameters:
  • data (numpy array) – Input data.

  • center (numpy array) – Center of each class.

  • index (numpy array) – Cluster index number for each sample.

  • no_clust (int) – Number of clusters to be used.

  • cltype (str) – Clustering type.

  • cov_constr (float) – Scalar in the range [0, 1].

Returns:

bigd – Output data.

Return type:

numpy array
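As a rough picture of what a distance routine of this kind computes, the sketch below returns the squared distance of every sample to every centre. It is an assumption-laden illustration, not the PyGMI implementation: the name `dist_sketch`, the N x no_clust output layout, and the optional shared covariance `cov` (standing in for the ellipsoidal 'det' case) are all hypothetical.

```python
import numpy as np


def dist_sketch(data, center, cov=None):
    """Squared distances from each sample to each centre (N x no_clust).

    With cov=None this is the squared Euclidean distance (spherical clusters).
    With a P x P covariance matrix it is the squared Mahalanobis distance,
    using the same ellipsoid for every cluster.
    """
    data = np.asarray(data, dtype=float)
    center = np.asarray(center, dtype=float)
    diff = data[:, None, :] - center[None, :, :]
    if cov is None:
        return (diff ** 2).sum(axis=2)
    inv = np.linalg.inv(np.asarray(cov, dtype=float))
    # Quadratic form diff^T * inv * diff for every (sample, centre) pair.
    return np.einsum('nkp,pq,nkq->nk', diff, inv, diff)


data = np.array([[0.0, 0.0], [3.0, 4.0]])
center = np.array([[0.0, 0.0], [3.0, 0.0]])

d_euclid = dist_sketch(data, center)
d_mahal = dist_sketch(data, center, cov=np.diag([1.0, 4.0]))
```

Stretching the second attribute's variance shortens distances along that axis, which is how an ellipsoidal criterion changes which centre a sample is closest to.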