Utility Scripts
In addition to the main script, run_experiment, SKLL comes with a number of helpful utility scripts that can be used to prepare feature files and perform other routine tasks. Each is described briefly below.
compute_eval_from_predictions
Compute evaluation metrics from prediction files after you have run an experiment.
filter_features
Filter a feature file to remove (or keep) any instances with the specified IDs or labels. It can also be used to remove (or keep) feature columns.
Positional Arguments
infile
    Input feature file (ends in .arff, .csv, .jsonlines, .megam, .ndj, or .tsv)
outfile
    Output feature file (must have the same extension as the input file)
Optional Arguments
-f <feature <feature ...>>, --feature <feature <feature ...>>
    One or more features in the feature file that you would like to keep. If unspecified, no features are removed.
-I <id <id ...>>, --id <id <id ...>>
    One or more instance IDs in the feature file that you would like to keep. If unspecified, no instances are removed based on their IDs.
-i, --inverse
    Instead of keeping the features and/or instances given in the lists above, remove them.
-L <label <label ...>>, --label <label <label ...>>
    One or more labels in the feature file that you would like to keep. If unspecified, no instances are removed based on their labels.
-l label_col, --label_col label_col
    Name of the column that contains the class labels in ARFF, CSV, or TSV files. For ARFF files, this must be the final column to count as the label. (default: y)
-q, --quiet
    Suppress printing of "Loading..." messages.
--version
    Show program’s version number and exit.
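The keep/remove behaviour of these options can be sketched in plain Python. This is a minimal illustration, not SKLL's implementation: the `(id, label, features)` tuple layout and the name `filter_instances` are hypothetical, and combining the ID and label lists with a logical AND (an instance must match every list that was given) is an assumption about how the options interact.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical in-memory layout: (instance_id, label, {feature_name: value})
Instance = Tuple[str, str, Dict[str, float]]


def filter_instances(
    instances: List[Instance],
    ids: Optional[Set[str]] = None,
    labels: Optional[Set[str]] = None,
    features: Optional[Set[str]] = None,
    inverse: bool = False,
) -> List[Instance]:
    """Sketch of keep/remove filtering (assumed semantics, not SKLL code)."""
    result: List[Instance] = []
    for inst_id, label, feats in instances:
        # An unspecified list never excludes anything; given lists are AND-combined.
        matches = (ids is None or inst_id in ids) and (
            labels is None or label in labels
        )
        # Inverting only affects instance selection if an ID or label list was given.
        if inverse and (ids is not None or labels is not None):
            matches = not matches
        if not matches:
            continue
        if features is not None:
            # Keep the listed feature columns, or drop them when inverse=True.
            feats = {
                name: value
                for name, value in feats.items()
                if (name in features) != inverse
            }
        result.append((inst_id, label, feats))
    return result


data = [
    ("a", "pos", {"f1": 1.0, "f2": 2.0}),
    ("b", "neg", {"f1": 3.0, "f2": 4.0}),
    ("c", "pos", {"f1": 5.0, "f2": 6.0}),
]
print(filter_instances(data, labels={"pos"}, features={"f1"}))
# [('a', 'pos', {'f1': 1.0}), ('c', 'pos', {'f1': 5.0})]
print(filter_instances(data, labels={"pos"}, inverse=True))
# [('b', 'neg', {'f1': 3.0, 'f2': 4.0})]
```

Because an unspecified list is treated as "match everything", passing only `features` leaves every instance in place and filters columns alone.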
generate_predictions
Loads a trained model and outputs predictions based on input feature files. Useful if you want to reuse a trained model as part of a larger system without creating configuration files.
Positional Arguments
model_file
    Model file to load and use for generating predictions.
input_file
    A CSV, JSON, or MegaM file (with or without the label column), with the appropriate suffix.
Optional Arguments
-l <label_col>, --label_col <label_col>
    Name of the column that contains the labels in ARFF, CSV, or TSV files. For ARFF files, this must be the final column to count as the label. (default: y)
-p <positive_label>, --positive_label <positive_label>
    If the model is only being used to predict the probability of a particular label, this specifies the index of that label. Indices start at 0, so the default of 1 selects the second label, which is the default positive label for binary classification. Keep in mind that labels are sorted lexicographically. (default: 1)
-q, --quiet
    Suppress printing of "Loading..." messages.
-t <threshold>, --threshold <threshold>
    If the model is generating probabilities of the positive label, output 1 when the probability meets or exceeds the given threshold, and 0 otherwise.
--version
    Show program’s version number and exit.
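The following sketch shows how a positive-label index and a threshold could turn per-label probabilities into a single output, following the description above: lexicographic label order, a 0-based index, and 1 when the probability meets or exceeds the threshold. The function `positive_class_output` and its dictionary-of-probabilities input are hypothetical; this is not the code that generate_predictions runs.

```python
from typing import Dict, Optional, Union


def positive_class_output(
    probabilities: Dict[str, float],
    positive_label: int = 1,
    threshold: Optional[float] = None,
) -> Union[float, int]:
    """Sketch of how --positive_label and --threshold interact (assumed, not SKLL code)."""
    # Labels are ordered lexicographically, so index 1 is the second label in sorted order.
    ordered_labels = sorted(probabilities)
    prob = probabilities[ordered_labels[positive_label]]
    if threshold is None:
        # Without a threshold, report the raw probability of the chosen label.
        return prob
    # With a threshold, output 1 if the probability meets or exceeds it, else 0.
    return 1 if prob >= threshold else 0


probs = {"yes": 0.7, "no": 0.3}
# sorted(["yes", "no"]) == ["no", "yes"], so index 1 selects "yes".
print(positive_class_output(probs))                                   # 0.7
print(positive_class_output(probs, threshold=0.7))                    # 1
print(positive_class_output(probs, positive_label=0, threshold=0.5))  # 0
```

Lexicographic ordering applies to string labels, so a label such as "10" sorts before "2"; check the sorted order before choosing an index.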
join_features
Combine multiple feature files into one larger file.
Positional Arguments
infile ...
    Input feature files (end in .arff, .csv, .jsonlines, .megam, .ndj, or .tsv)
outfile
    Output feature file (must have the same extension as the input files)
Optional Arguments
-l <label_col>, --label_col <label_col>
    Name of the column that contains the labels in ARFF, CSV, or TSV files. For ARFF files, this must be the final column to count as the label. (default: y)
-q, --quiet
    Suppress printing of "Loading..." messages.
--version
    Show program’s version number and exit.
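Joining feature files amounts to concatenating feature columns for each instance. The sketch below assumes, rather than documents, that all inputs contain the same instance IDs with matching labels and non-overlapping feature names, and raises ValueError otherwise; the nested-dictionary layout and the name `join_feature_files` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: instance_id -> (label, {feature_name: value})
FeatureFile = Dict[str, Tuple[str, Dict[str, float]]]


def join_feature_files(files: List[FeatureFile]) -> FeatureFile:
    """Sketch: combine feature files column-wise by instance ID (assumed semantics)."""
    if not files:
        raise ValueError("need at least one input file")
    # Copy the first file so the inputs are never mutated.
    joined: FeatureFile = {
        inst_id: (label, dict(feats)) for inst_id, (label, feats) in files[0].items()
    }
    for other in files[1:]:
        # Assumption: every file must describe exactly the same instances.
        if set(other) != set(joined):
            raise ValueError("input files must contain the same instance IDs")
        for inst_id, (label, feats) in other.items():
            base_label, base_feats = joined[inst_id]
            if label != base_label:
                raise ValueError(f"conflicting labels for instance {inst_id!r}")
            overlap = base_feats.keys() & feats.keys()
            if overlap:
                raise ValueError(f"duplicate features {sorted(overlap)} for {inst_id!r}")
            base_feats.update(feats)
    return joined


lexical = {"d1": ("pos", {"word=good": 1.0}), "d2": ("neg", {"word=bad": 1.0})}
length = {"d1": ("pos", {"length": 12.0}), "d2": ("neg", {"length": 7.0})}
print(join_feature_files([lexical, length]))
# {'d1': ('pos', {'word=good': 1.0, 'length': 12.0}), 'd2': ('neg', {'word=bad': 1.0, 'length': 7.0})}
```

Failing loudly on duplicate feature names avoids silently overwriting a value from one file with a value from another.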
plot_learning_curves
Generate learning curve plots from a learning curve output TSV file.
print_model_weights
Prints out the weights of a given trained model.
skll_convert
Convert between .arff, .csv, .jsonlines, .libsvm, .megam, .ndj, and .tsv formats.
Positional Arguments
infile
    Input feature file (ends in .arff, .csv, .jsonlines, .libsvm, .megam, .ndj, or .tsv)
outfile
    Output feature file (ends in .arff, .csv, .jsonlines, .libsvm, .megam, .ndj, or .tsv)
Optional Arguments
-l <label_col>, --label_col <label_col>
    Name of the column that contains the labels in ARFF, CSV, or TSV files. For ARFF files, this must be the final column to count as the label. (default: y)
-q, --quiet
    Suppress printing of "Loading..." messages.
--arff_regression
    Create ARFF files for regression, not classification.
--arff_relation ARFF_RELATION
    Relation name to use for the ARFF file. (default: skll_relation)
--reuse_libsvm_map REUSE_LIBSVM_MAP
    When writing multiple LibSVM files that should share the same mapping from labels and features to numbers, specify an existing .libsvm file whose mapping should be reused.
--version
    Show program’s version number and exit.
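LibSVM files store labels and features as numbers, so two files written with independently built mappings could assign different numbers to the same feature. The sketch below builds a mapping from one set of instances and reuses it for another so that the numbering stays consistent. The helper names, the 0-based label numbers, and the 1-based feature indices are assumptions for illustration, not SKLL's actual file layout or how it stores the mapping.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory layout: (label, {feature_name: value})
Instance = Tuple[str, Dict[str, float]]


def build_mapping(
    instances: List[Instance],
    label_map: Optional[Dict[str, int]] = None,
    feat_map: Optional[Dict[str, int]] = None,
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Create or extend label/feature number maps without renumbering existing entries."""
    label_map = dict(label_map or {})
    feat_map = dict(feat_map or {})
    for label, feats in instances:
        # Assumed convention: labels numbered from 0 in order of first appearance.
        label_map.setdefault(label, len(label_map))
        for name in sorted(feats):
            # Assumed convention: feature indices start at 1.
            feat_map.setdefault(name, len(feat_map) + 1)
    return label_map, feat_map


def to_libsvm_lines(
    instances: List[Instance], label_map: Dict[str, int], feat_map: Dict[str, int]
) -> List[str]:
    """Render instances as 'label index:value ...' lines using the given maps."""
    lines = []
    for label, feats in instances:
        # Feature indices are written in ascending order.
        pairs = sorted((feat_map[name], value) for name, value in feats.items())
        body = " ".join(f"{idx}:{value:g}" for idx, value in pairs)
        lines.append(f"{label_map[label]} {body}".rstrip())
    return lines


train = [("cat", {"fur": 1.0, "legs": 4.0}), ("bird", {"legs": 2.0, "wings": 2.0})]
test = [("bird", {"wings": 2.0}), ("cat", {"fur": 1.0})]

labels, features = build_mapping(train)
# Reusing the training mapping keeps every label and feature on the same number.
labels, features = build_mapping(test, labels, features)
print(to_libsvm_lines(train, labels, features))  # ['0 1:1 2:4', '1 2:2 3:2']
print(to_libsvm_lines(test, labels, features))   # ['1 3:2', '0 1:1']
```

Building the test file's mapping from scratch instead would number "wings" as 1, so a model trained on the first file would misread it.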
summarize_results
Creates an experiment summary TSV file from a list of JSON files generated by run_experiment.