Usage
Installation
To use this package, first install it using pip:
How to use this package
After installing the package, it can be imported using standard import syntax:
The main entry point to the package is vbvarsel.main()
This function requires at minimum Hyperparameters()
and one of either SimulationParameters()
or a user-supplied dataset.
The user-supplied dataset should be only numeric values. An optional parameter, cols_to_ignore
may be passed, which is a list of column name strings that are to be dropped.
If the data contains any non-numeric values, the process will fail.
Hyperparameters
The hyperparameters are a collection of parameters that control the clustering algorithm. These values can only be set once on initialisation. These parameters all have default values, but can be modified upon initialisation.
* threshold - The threshold for simulation convergence. (Default 1e-1)
* k1 - Maximum number of clusters to simulate for. (Default 5)
* alpha0 - Prior coefficient count, also known as the concentration parameter for
Dirichelet prior on the mixture proportions. This field is calculated
from 1/k1. (Default 0.2)
* a0 - Degrees of freedom for the Gamma prior on the cluster precision, which
controls the shape of the Gamma distribution. A higher number results
in a more peaked distribution. (Default 3)
* beta0 - Shrinkage parameter of the Gaussian conditional prior on the cluster
mean. This influences the tightness and spread of the cluster, smaller
shrinkage leads to tighter clusters. (Default 1e-3)
* d0 - Shape parameter of the Beta distribution on the probability. A value of
1 results in a uniform distribution. (Default 1)
* t_max - Maximum starting annealing temperature. Value of 1 has no annealing.
(Default 1)
* max_itr - Maximum number of iterations. (Default 25)
* max_annealed_itr - Maximum number of iterations for annealing, if applicable. (Default 10)
* max_models - Maximum number of models to run for averaging (Default 10)