The Random Forests Classifier in MRIQC
MRIQC ships with a random-forests classifier trained on the combined ABIDE and DS030 datasets.
To predict the quality labels (0 = "accept", 1 = "reject") for a features table computed by mriqc with the default classifier, run:
mriqc_clf --load-classifier -X aMRIQC.csv -o mypredictions.csv
where aMRIQC.csv is the file generated by the group-level run of mriqc.
Building your custom classifier
Custom classifiers can be fitted using the same mriqc_clf tool in fitting mode:
mriqc_clf --train aMRIQC_train.csv labels.csv --log-file
where aMRIQC_train.csv contains the IQMs calculated by mriqc and labels.csv contains the matching ratings assigned by an expert.
The labels must be numerical (-1 = exclude, 0 = doubtful, 1 = accept).
With the --multiclass flag, the labels are not binarized.
Otherwise, 0 and 1 are mapped to 0 (accept) and -1 is mapped to 1 (reject).
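The default mapping described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text, not mriqc_clf's own code; the helper name binarize_labels and its multiclass argument are hypothetical.

```python
# Hypothetical helper illustrating the label mapping described in the text.
# It is NOT part of MRIQC; the name and signature are made up for this sketch.

def binarize_labels(ratings, multiclass=False):
    """Map expert ratings (-1 exclude, 0 doubtful, 1 accept) to classifier targets."""
    unknown = sorted(set(ratings) - {-1, 0, 1})
    if unknown:
        raise ValueError(f"unexpected rating values: {unknown}")
    if multiclass:
        # Analogue of --multiclass: keep the three-level ratings untouched.
        return list(ratings)
    # Default: 0 and 1 become 0 (accept); -1 becomes 1 (reject).
    return [1 if rating == -1 else 0 for rating in ratings]


ratings = [1, 0, -1, 1, -1]  # made-up expert ratings
print(binarize_labels(ratings))                   # [0, 0, 1, 0, 1]
print(binarize_labels(ratings, multiclass=True))  # [1, 0, -1, 1, -1]
```

Note that the default mapping treats "doubtful" images as accepted, so only explicit exclusions end up in the reject class.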
When the --train flag is given without arguments, mriqc_clf runs cross-validation for model selection and trains the winning model on the ABIDE dataset:
mriqc_clf --train --log-file
Model selection can be followed by testing on a left-out dataset using the --test flag.
If --test is given without arguments (no paths to samples and labels), the default features and labels for DS030 are used:
mriqc_clf --train --test --log-file
The trained classifier can then be used for prediction on unseen data with the command shown at the top, now indicating which classifier to load:
mriqc_clf --load-classifier myclassifier.pklz -X aMRIQC.csv -o mypredictions.csv
Predictions are stored as a CSV file, with the BIDS identifiers as index columns and the predicted quality label under the prediction column.
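As a rough sketch of how such a predictions file might be inspected afterwards, the snippet below counts accepted and rejected rows using only the Python standard library. The identifier column names (subject_id, session_id, run_id) and the sample rows are assumptions for illustration; only the prediction column is named in the text above, and the helper summarize_predictions is hypothetical.

```python
import csv
import io
from collections import Counter

# Stand-in for mypredictions.csv. The identifier column names are assumed;
# the text above only guarantees a column called "prediction".
sample = io.StringIO(
    "subject_id,session_id,run_id,prediction\n"
    "01,1,1,0\n"
    "02,1,1,1\n"
    "03,1,1,0\n"
)


def summarize_predictions(handle):
    """Count accepted (0) and rejected (1) rows and collect the rejected identifiers."""
    counts = Counter()
    rejected = []
    for row in csv.DictReader(handle):
        # Tolerate labels written as "1" or "1.0".
        label = int(float(row["prediction"]))
        counts[label] += 1
        if label == 1:
            rejected.append({k: v for k, v in row.items() if k != "prediction"})
    return counts, rejected


counts, rejected = summarize_predictions(sample)
print(f"accept: {counts[0]}, reject: {counts[1]}")
for identifiers in rejected:
    print("rejected:", identifiers)
```

To read a real file, replace the in-memory sample with open("mypredictions.csv", newline="").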
Usage of mriqc_clf
MRIQC model selection and held-out evaluation
usage: mriqc [-h] [--train [TRAIN [TRAIN ...]] | --load-classifier
[LOAD_CLASSIFIER]] [--test [TEST [TEST ...]]]
[-X EVALUATION_DATA] [--train-balanced-leaveout] [--multiclass]
[-P PARAMETERS] [-M {rfc,xgb,svc_lin,svc_rbf}] [--nested_cv]
[--nested_cv_kfold] [--perm PERM] [-S SCORER]
[--cv {kfold,loso,balanced-kfold,batch}] [--debug]
[--log-file [LOG_FILE]] [-v] [--njobs NJOBS] [-t THRESHOLD]
Named Arguments
--train | training data tables, X and Y; leave empty for ABIDE |
--load-classifier | load a previously saved classifier |
--test | test data tables, X and Y; leave empty for DS030 |
-X, --evaluation-data | classify this CSV table of IQMs |
--train-balanced-leaveout | leave out a balanced, random sample of training examples |
--multiclass, --ms | do not binarize labels |
Options
-P, --parameters | |
-M, --model | model under test; possible choices: rfc, xgb, svc_lin, svc_rbf |
--nested_cv | run nested cross-validation before held-out evaluation |
--nested_cv_kfold | run nested cross-validation before held-out evaluation, using a 10-fold split in the outer loop |
--perm | permutation test: number of permutations |
-S, --scorer | |
--cv | possible choices: kfold, loso, balanced-kfold, batch |
--debug | |
--log-file | write log to this file; leave empty for a default log name |
-v, --verbose | increases log verbosity for each occurrence |
--njobs | number of jobs |
-t, --threshold | decision threshold of the classifier |
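To give an intuition for what a decision threshold does in a binary accept/reject setting, here is a minimal sketch. It assumes the threshold is compared against the predicted probability of the reject class (label 1) and that a probability equal to the threshold counts as a reject; this page does not state how mriqc_clf applies --threshold internally, so both points are assumptions. The helper apply_threshold and the probabilities are made up.

```python
# Illustration only. Assumes the threshold is applied to the predicted
# probability of the "reject" class (label 1) with a >= comparison;
# mriqc_clf's actual behaviour is not documented on this page.

def apply_threshold(reject_probabilities, threshold=0.5):
    """Return 1 (reject) where the probability reaches the threshold, else 0 (accept)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    return [1 if p >= threshold else 0 for p in reject_probabilities]


probabilities = [0.10, 0.45, 0.55, 0.90]  # made-up values
print(apply_threshold(probabilities, threshold=0.5))  # [0, 0, 1, 1]
print(apply_threshold(probabilities, threshold=0.4))  # [0, 1, 1, 1]
```

Under these assumptions, lowering the threshold flags more images as rejects, trading fewer missed bad images for more false alarms.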