PREDICT.classification package¶
Submodules¶
PREDICT.classification.RankedSVM module¶
-
PREDICT.classification.RankedSVM.
RankSVM_test
(test_data, num_class, Weights, Bias, SVs, svm='Poly', gamma=0.05, coefficient=0.05, degree=3)¶
-
PREDICT.classification.RankedSVM.
RankSVM_test_original
(test_data, test_target, Weights, Bias, SVs, svm='Poly', gamma=0.05, coefficient=0.05, degree=3)¶
-
PREDICT.classification.RankedSVM.
RankSVM_train
(train_data, train_target, cost=1, lambda_tol=1e-06, norm_tol=0.0001, max_iter=500, svm='Poly', gamma=0.05, coefficient=0.05, degree=3)¶
-
PREDICT.classification.RankedSVM.
RankSVM_train_old
(train_data, train_target, cost=1, lambda_tol=1e-06, norm_tol=0.0001, max_iter=500, svm='Poly', gamma=0.05, coefficient=0.05, degree=3)¶
Weights, Bias, SVs = RankSVM_train(train_data, train_target, cost, lambda_tol, norm_tol, max_iter, svm, gamma, coefficient, degree)
Description
- RankSVM_train takes,
- train_data - An MxN array; the ith training instance is stored in train_data[i,:]
- train_target - A QxM array; if the ith training instance belongs to the jth class, then train_target[j,i] equals +1, otherwise train_target[j,i] equals -1
- svm - The type of SVM kernel used in training, which can take the value ‘RBF’, ‘Poly’ or ‘Linear’; the corresponding kernel parameters are given by gamma, coefficient and degree:
- if svm is ‘RBF’, then gamma gives the value of gamma, where the kernel is exp(-gamma*|x[i]-x[j]|^2)
- if svm is ‘Poly’, then three values are used, gamma, coefficient and degree respectively, where the kernel is (gamma*<x[i],x[j]>+coefficient)^degree
- if svm is ‘Linear’, then no kernel parameters are used
- cost - The value of ‘C’ used in the SVM; default=1
- lambda_tol - The tolerance value for lambda described in the appendix of [1]; default value is 1e-6
- norm_tol - The tolerance value for the difference between alpha(p+1) and alpha(p) described in the appendix of [1]; default value is 1e-4
- max_iter - The maximum number of iterations for RankSVM; default=500
- and returns,
- Weights - The value for beta[ki] as described in the appendix of [1] is stored in Weights[k,i]
- Bias - The value for b[i] as described in the appendix of [1] is stored in Bias[1,i]
- SVs - The ith support vector is stored in SVs[:,i]
For more details, please refer to [1] and [2].
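The target layout and the kernel formulas described above can be illustrated with a small pure-Python sketch. It is not taken from the PREDICT source: `build_target_matrix` and `kernel_value` are hypothetical helper names, the per-instance label sets are an assumed input format, and treating the ‘Linear’ kernel as the plain inner product is an assumption.

```python
import math


def build_target_matrix(labels_per_instance, num_class):
    """Build a Q x M nested list with +1/-1 entries.

    labels_per_instance[i] is the collection of class indices that
    training instance i belongs to (hypothetical input format).
    """
    return [
        [1 if j in labels else -1 for labels in labels_per_instance]
        for j in range(num_class)
    ]


def kernel_value(x, z, svm="Poly", gamma=0.05, coefficient=0.05, degree=3):
    """Evaluate the documented kernel formula for two feature vectors."""
    dot = sum(a * b for a, b in zip(x, z))
    if svm == "RBF":
        # exp(-gamma * |x - z|^2)
        squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
        return math.exp(-gamma * squared_distance)
    if svm == "Poly":
        # (gamma * <x, z> + coefficient) ^ degree
        return (gamma * dot + coefficient) ** degree
    if svm == "Linear":
        # Assumption: the linear kernel is the plain inner product.
        return dot
    raise ValueError("svm must be 'RBF', 'Poly' or 'Linear'")


# Three instances (M=3), three classes (Q=3); instance 1 has two labels.
train_target = build_target_matrix([{0}, {1, 2}, {2}], num_class=3)
poly = kernel_value([1, 2], [3, 4], svm="Poly", gamma=1, coefficient=1, degree=2)
```

Here `train_target[j][i]` is +1 exactly when instance `i` carries class `j`, which mirrors the QxM convention in the description.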
-
PREDICT.classification.RankedSVM.
is_empty
(any_structure)¶
-
PREDICT.classification.RankedSVM.
neg_dual_func
(Lambda, Alpha_old, Alpha_new, c_value, kernel, num_training, num_class, Label, not_Label, Label_size, size_alpha)¶
PREDICT.classification.construct_classifier module¶
-
PREDICT.classification.construct_classifier.
construct_SVM
(config, image_features, regression=False)¶ Constructs a SVM classifier
- Args:
- config (dict): Dictionary of the required config settings
- mutation_data (dict): Mutation data that should be classified
- features (pandas dataframe): A pandas dataframe containing the features to be used for classification
- Returns:
- SVM/SVR classifier, parameter grid
-
PREDICT.classification.construct_classifier.
construct_classifier
(config, image_features)¶ Interface to create classification
Different classifications can be created using this common interface
- config: dict, mandatory
- Contains the required config settings. See the Github Wiki for all available fields.
- Returns:
- Constructed classifier
PREDICT.classification.crossval module¶
-
PREDICT.classification.crossval.
crossval
(config, label_data, image_features, classifier, param_grid={}, use_fastr=False, fastr_plugin=None, tempsave=False, fixedsplits=None, ensemble={'Use': False}, outputfolder=None, modus='singlelabel')¶ Constructs multiple individual classifiers based on the label settings
- config: dict, mandatory
- Dictionary with config settings. See the Github Wiki for the available fields and formatting.
- label_data: dict, mandatory
- Should contain the following:
- patient_IDs (list): IDs of the patients, used to keep track of test and training sets, and genetic data
- mutation_label (list): List of lists, where each list contains the mutation status for that patient for each mutation
- mutation_name (list): Contains the different mutations that are stored in the mutation_label
- image_features: numpy array, mandatory
- Consists of a tuple of two lists for each patient: (feature_values, feature_labels)
- classifier: sklearn classifier
- The untrained classifier used for training.
- param_grid: dictionary, optional
- Contains the parameters and their values which are used in the grid or randomized search hyperparameter optimization. See the construct_classifier function for some examples.
- use_fastr: boolean, default False
If False, parallel execution through Joblib is used for fast execution of the hyperparameter optimization. Especially suited for execution on multicore (H)PCs. The settings used are specified in the config.ini file in the IOparser folder, which you can adjust to your system.
If True, fastr is used to split the hyperparameter optimization in separate jobs. Parameters for the splitting can be specified in the config file. Especially suited for clusters.
- fastr_plugin: string, default None
- Determines which plugin is used for fastr executions. When None, uses the default plugin from the fastr config.
- tempsave: boolean, default False
- If True, create a .hdf5 file after each cross validation containing the classifier and results from that split. This is written to the GSOut folder in your fastr output mount. The result of all combined cross validations is saved to a .hdf5 file regardless of this setting.
- fixedsplits: string, optional
- By default, random split cross validation is used to train and evaluate the machine learning methods. Optionally, you can provide a .xlsx file containing fixed splits to be used. See the Github Wiki for the format.
- ensemble: dictionary, optional
- Contains the configuration for constructing an ensemble.
- modus: string, default ‘singlelabel’
- Determine whether one-vs-all classification (or regression) for each single label is used (‘singlelabel’) or if multilabel classification is performed (‘multilabel’).
- panda_data: pandas dataframe
- Contains all information on the trained classifier.
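The random split cross validation mentioned under fixedsplits can be pictured with the following minimal sketch, which tracks splits by patient ID as the label_data description suggests. It is an illustration only: `random_splits`, its parameters and the example IDs are hypothetical, and how PREDICT actually draws or stratifies its splits is not stated here.

```python
import random


def random_splits(patient_IDs, n_iterations=3, test_size=0.2, seed=0):
    """Return a list of (train_IDs, test_IDs) pairs from repeated random splits."""
    rng = random.Random(seed)
    # At least one patient always ends up in the test set.
    n_test = max(1, int(round(test_size * len(patient_IDs))))
    splits = []
    for _ in range(n_iterations):
        shuffled = list(patient_IDs)
        rng.shuffle(shuffled)
        test_IDs = sorted(shuffled[:n_test])
        train_IDs = sorted(shuffled[n_test:])
        splits.append((train_IDs, test_IDs))
    return splits


# Hypothetical patient identifiers.
patient_IDs = ["patient_%02d" % k for k in range(10)]
splits = random_splits(patient_IDs, n_iterations=3, test_size=0.2, seed=0)
```

Each iteration yields disjoint train and test ID lists, so features and labels can be selected per split by looking up the IDs.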
-
PREDICT.classification.crossval.
nocrossval
(config, label_data_train, label_data_test, image_features_train, image_features_test, classifier, param_grid, use_fastr=False, fastr_plugin=None, ensemble={'Use': False}, modus='singlelabel')¶ Constructs multiple individual classifiers based on the label settings
- Arguments:
- config (Dict): Dictionary with config settings
- label_data (Dict): should contain:
- patient_IDs (list): IDs of the patients, used to keep track of test and training sets, and genetic data
- mutation_label (list): List of lists, where each list contains the mutation status for that patient for each mutation
- mutation_name (list): Contains the different mutations that are stored in the mutation_label
- image_features (numpy array): Consists of a tuple of two lists for each patient: (feature_values, feature_labels)
- ensemble: dictionary, optional
- Contains the configuration for constructing an ensemble.
- modus: string, default ‘singlelabel’
- Determine whether one-vs-all classification (or regression) for each single label is used (‘singlelabel’) or if multilabel classification is performed (‘multilabel’).
- Returns:
- classifier_data (pandas dataframe)
-
PREDICT.classification.crossval.
singleiteration
(X_train, Y_train, PID_train, feature_labels, classifier, param_grid, config_hyperopt, use_SMOTE=False, SMOTE_ratio=1, SMOTE_neighbors=10, n_cores=4, N_jobs=4, random_state=None, use_fastr=False, fastr_plugin='LinearExecution', use_ensemble=False, use_oversampling=True)¶ Perform a single iteration of a cross validation.
PREDICT.classification.estimators module¶
-
class
PREDICT.classification.estimators.
RankedSVM
(cost=1, lambda_tol=1e-06, norm_tol=0.0001, max_iter=500, svm='Poly', gamma=0.05, coefficient=0.05, degree=3)¶ Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin
An example classifier which implements a 1-NN algorithm.
- demo_param : str, optional
- A parameter used for demonstration of how to pass and store parameters.
- X_ : array, shape = [n_samples, n_features]
- The input passed during fit()
- y_ : array, shape = [n_samples]
- The labels passed during fit()
-
fit
(X, y)¶ A reference implementation of a fitting function for a classifier.
- X : array-like, shape = [n_samples, n_features]
- The training input samples.
- y : array-like, shape = [n_samples]
- The target values. An array of int.
- self : object
- Returns self.
-
predict
(X, y=None)¶ A reference implementation of a prediction for a classifier.
- X : array-like of shape = [n_samples, n_features]
- The input samples.
- y : array of int of shape = [n_samples]
- The label for each sample is the label of the closest sample seen during fit.
-
predict_proba
(X, y)¶ A reference implementation of a prediction for a classifier.
- X : array-like of shape = [n_samples, n_features]
- The input samples.
- y : array of int of shape = [n_samples]
- The label for each sample is the label of the closest sample seen during fit.
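The fit/predict contract described in these docstrings (fit stores `X_` and `y_` and returns self; predict returns the label of the closest sample seen during fit) can be sketched in plain Python as follows. `NearestSampleClassifier` is a hypothetical stand-in written only to illustrate that documented contract; it does not reproduce the internals of the RankedSVM class and does not inherit from the scikit-learn base classes.

```python
class NearestSampleClassifier:
    """Hypothetical stand-in illustrating the documented fit/predict contract."""

    def fit(self, X, y):
        if len(X) != len(y):
            raise ValueError("X and y must contain the same number of samples")
        # Store the training data under the attribute names used in the docs.
        self.X_ = [list(row) for row in X]
        self.y_ = list(y)
        return self

    def predict(self, X):
        predictions = []
        for row in X:
            # Squared Euclidean distance to every sample seen during fit.
            distances = [
                sum((a - b) ** 2 for a, b in zip(row, seen)) for seen in self.X_
            ]
            closest = distances.index(min(distances))
            predictions.append(self.y_[closest])
        return predictions


clf = NearestSampleClassifier()
fitted = clf.fit([[0, 0], [10, 10]], [0, 1])
predicted = clf.predict([[1, 1], [9, 8]])
```

Returning self from fit is what allows the chained `Classifier().fit(X, y).predict(X_new)` style used throughout scikit-learn compatible code.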
PREDICT.classification.metrics module¶
-
PREDICT.classification.metrics.
multi_class_auc
(y_truth, y_score)¶
-
PREDICT.classification.metrics.
multi_class_auc_score
(y_truth, y_score)¶
-
PREDICT.classification.metrics.
pairwise_auc
(y_truth, y_score, class_i, class_j)¶
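The three AUC functions above are listed without descriptions. As a rough orientation, one common pairwise scheme for a multi-class AUC is sketched below; whether PREDICT (or the classpy code it refers to) follows exactly this scheme is an assumption. The function names carry a `_sketch` suffix to mark them as illustrative, and the sketch assumes that class labels are integer column indices into `y_score`.

```python
def pairwise_auc_sketch(y_truth, y_score, class_i, class_j):
    """Probability that a class_i sample gets a higher class_i score
    than a class_j sample; ties count as one half."""
    scores_i = [s[class_i] for t, s in zip(y_truth, y_score) if t == class_i]
    scores_j = [s[class_i] for t, s in zip(y_truth, y_score) if t == class_j]
    if not scores_i or not scores_j:
        raise ValueError("both classes need at least one sample")
    wins = 0.0
    for score_i in scores_i:
        for score_j in scores_j:
            if score_i > score_j:
                wins += 1.0
            elif score_i == score_j:
                wins += 0.5
    return wins / (len(scores_i) * len(scores_j))


def multi_class_auc_sketch(y_truth, y_score):
    """Average the symmetrised pairwise AUC over all unordered class pairs."""
    classes = sorted(set(y_truth))
    if len(classes) < 2:
        raise ValueError("at least two classes are required")
    pair_values = []
    for a in range(len(classes)):
        for b in range(a + 1, len(classes)):
            i, j = classes[a], classes[b]
            forward = pairwise_auc_sketch(y_truth, y_score, i, j)
            backward = pairwise_auc_sketch(y_truth, y_score, j, i)
            pair_values.append(0.5 * (forward + backward))
    return sum(pair_values) / len(pair_values)


# Toy scores in which every sample scores highest for its own class.
y_truth = [0, 0, 1, 1, 2, 2]
y_score = [
    [0.8, 0.1, 0.1], [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1], [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8], [0.1, 0.2, 0.7],
]
auc = multi_class_auc_sketch(y_truth, y_score)
```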
-
PREDICT.classification.metrics.
performance_multilabel
(y_truth, y_prediction, y_score=None, beta=1)¶ Multiclass performance metrics.
y_truth and y_prediction should both be lists with the multiclass label of each object, e.g.
y_truth = [0, 0, 0, 0, 0, 0, 2, 2, 1, 1, 2] ### Groundtruth
y_prediction = [0, 0, 0, 0, 0, 0, 1, 2, 1, 2, 2] ### Predicted labels
Calculation of accuracy according to the formula suggested in the CAD Dementia Grand Challenge: http://caddementia.grand-challenge.org
Calculation of Multi Class AUC according to classpy: https://bitbucket.org/bigr_erasmusmc/classpy/src/master/classpy/multi_class_auc.py
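For the example lists given above, a confusion matrix and the overall accuracy can be worked out as in the sketch below. It assumes that accuracy means the fraction of correctly labelled objects (the diagonal of the confusion matrix divided by the total); the exact formula used by the challenge and by performance_multilabel may differ. Both helper names are hypothetical.

```python
def confusion_matrix_sketch(y_truth, y_prediction):
    """Rows are true classes, columns are predicted classes."""
    classes = sorted(set(y_truth) | set(y_prediction))
    index = {label: k for k, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, prediction in zip(y_truth, y_prediction):
        matrix[index[truth]][index[prediction]] += 1
    return classes, matrix


def accuracy_sketch(matrix):
    """Fraction of objects on the diagonal of the confusion matrix."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


y_truth = [0, 0, 0, 0, 0, 0, 2, 2, 1, 1, 2]
y_prediction = [0, 0, 0, 0, 0, 0, 1, 2, 1, 2, 2]
classes, matrix = confusion_matrix_sketch(y_truth, y_prediction)
accuracy = accuracy_sketch(matrix)
```

With these lists, 9 of the 11 objects are labelled correctly: all six objects of class 0, one of the two objects of class 1, and two of the three objects of class 2.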
-
PREDICT.classification.metrics.
performance_singlelabel
(y_truth, y_prediction, y_score, regression=False)¶ Singleclass performance metrics
PREDICT.classification.parameter_optimization module¶
-
PREDICT.classification.parameter_optimization.
random_search_parameters
(features, labels, N_iter, test_size, classifier, param_grid, scoring_method, n_jobspercore=200, use_fastr=False, n_cores=1, fastr_plugin=None)¶ Train a classifier and simultaneously optimize hyperparameters using a randomized search.
- Arguments:
- features: numpy array containing the training features.
- labels: list containing the object labels to be trained on.
- N_iter: integer giving the number of iterations to be used in the hyperparameter optimization.
- test_size: float giving the test size percentage used in the cross validation.
- classifier: sklearn classifier to be tested.
- param_grid: dictionary containing all possible hyperparameters and their values or distributions.
- scoring_method: string defining the scoring method used in optimization, e.g. f1_weighted for a SVM.
- n_jobspercore: integer giving the number of jobs that are run on a single core when using the fastr randomized search.
- use_fastr: Boolean determining whether fastr or joblib should be used for the optimization.
- fastr_plugin: determines which plugin is used for fastr executions. When None, uses the default plugin from the fastr config.
- Returns:
- random_search: sklearn randomsearch object containing the results.
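The idea of a randomized search over param_grid can be sketched without any dependencies as follows. This is not the PREDICT or scikit-learn implementation: `sample_parameters`, `random_search_sketch` and `toy_score` are hypothetical names, distributions are modelled here as callables that take a random generator, and the cross-validated scoring of a real classifier is replaced by a toy score function.

```python
import random


def sample_parameters(param_grid, rng):
    """Draw one setting: pick from a list, or call a distribution-like callable."""
    sampled = {}
    for name, values in param_grid.items():
        if callable(values):
            sampled[name] = values(rng)
        else:
            sampled[name] = rng.choice(list(values))
    return sampled


def random_search_sketch(param_grid, score_fn, N_iter=10, seed=0):
    """Evaluate N_iter random settings and keep the best-scoring one."""
    if N_iter < 1:
        raise ValueError("N_iter must be at least 1")
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(N_iter):
        params = sample_parameters(param_grid, rng)
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid: a list of candidate values and a uniform distribution.
param_grid = {
    "C": [0.1, 1, 10],
    "gamma": lambda rng: rng.uniform(0.0, 1.0),
}


def toy_score(params):
    # Stand-in for a cross-validated score such as f1_weighted.
    return -abs(params["C"] - 1) - params["gamma"]


best_params, best_score = random_search_sketch(param_grid, toy_score, N_iter=20, seed=0)
```

In a real search the score function would train the classifier on a training split and evaluate it on the held-out part defined by test_size.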