skll Package
The most useful parts of our API are available at the package level in addition to the module level. They are documented in both places for convenience.
From data Package
class skll.FeatureSet(name, ids, labels=None, features=None, vectorizer=None)
Bases: object
Encapsulation of all of the features, values, and metadata about a given set of data. This replaces ExamplesTuple from older versions of SKLL.
Parameters:
- name (str) – The name of this feature set.
- ids (np.array) – Example IDs for this set.
- labels (np.array, optional) – Labels for this set. Defaults to None.
- features (list of dict or array-like, optional) – The features for each instance, represented as either a list of dictionaries or an array-like (if vectorizer is also specified). Defaults to None.
- vectorizer (DictVectorizer or FeatureHasher, optional) – Vectorizer which will be used to generate the feature matrix. Defaults to None.
Warning
FeatureSets can only be equal if the order of the instances is identical, because these are stored as lists/arrays. Since scikit-learn’s DictVectorizer automatically sorts the underlying feature matrix if it is sparse, we do not do any sorting before checking for equality. This is not a problem because we always use sparse matrices with DictVectorizer when creating FeatureSets.
Notes
If ids, labels, and/or features are not None, the number of rows in each array must be equal.
filter(ids=None, labels=None, features=None, inverse=False)
Removes or keeps features and/or examples from the FeatureSet depending on the parameters. Filtering is done in-place.
Parameters:
- ids (list of str/float, optional) – Examples to keep in the FeatureSet. If None, no ID filtering takes place. Defaults to None.
- labels (list of str/float, optional) – Labels that we want to retain examples for. If None, no label filtering takes place. Defaults to None.
- features (list of str, optional) – Features to keep in the FeatureSet. To help with filtering string-valued features that were converted to sequences of boolean features when read in, any features in the FeatureSet that contain a = will be split on the first occurrence and the prefix will be checked to see if it is in features. If None, no feature filtering takes place. Cannot be used if FeatureSet uses a FeatureHasher for vectorization. Defaults to None.
- inverse (bool, optional) – Instead of keeping features and/or examples in lists, remove them. Defaults to False.
Raises: ValueError – If attempting to use features to filter a FeatureSet that uses a FeatureHasher vectorizer.
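The prefix rule for features can be illustrated without SKLL itself. The following is a minimal sketch, not SKLL's implementation: the helper keep_feature is hypothetical and only mimics the documented behavior of splitting a feature name on its first = and checking the prefix against features, with inverse flipping keep into remove.

```python
def keep_feature(feature_name, features, inverse=False):
    """Hypothetical helper mimicking the documented filter rule for features."""
    # Split on the first "=" only; names without "=" are used unchanged.
    prefix = feature_name.split("=", 1)[0]
    selected = prefix in features
    # With inverse=True, the listed features are removed instead of kept.
    return selected != inverse


names = ["color=red", "color=blue", "size", "shape=a=b"]
kept = [n for n in names if keep_feature(n, ["color", "shape"])]
removed = [n for n in names if keep_feature(n, ["color", "shape"], inverse=True)]
print(kept)     # ['color=red', 'color=blue', 'shape=a=b']
print(removed)  # ['size']
```

Because only the prefix is compared, listing "color" is enough to cover every color=... boolean feature.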
filtered_iter(ids=None, labels=None, features=None, inverse=False)
A version of __iter__ that retains only the specified features and/or examples from the output.
Parameters:
- ids (list of str/float, optional) – Examples to keep in the FeatureSet. If None, no ID filtering takes place. Defaults to None.
- labels (list of str/float, optional) – Labels that we want to retain examples for. If None, no label filtering takes place. Defaults to None.
- features (list of str, optional) – Features to keep in the FeatureSet. To help with filtering string-valued features that were converted to sequences of boolean features when read in, any features in the FeatureSet that contain a = will be split on the first occurrence and the prefix will be checked to see if it is in features. If None, no feature filtering takes place. Cannot be used if FeatureSet uses a FeatureHasher for vectorization. Defaults to None.
- inverse (bool, optional) – Instead of keeping features and/or examples in lists, remove them. Defaults to False.
Yields:
- id_ (str) – The ID of the example.
- label_ (str) – The label of the example.
- feat_dict (dict) – The feature dictionary, with feature name as the key and example value as the value.
Raises: ValueError – If the vectorizer is not a DictVectorizer.
static from_data_frame(df, name, labels_column=None, vectorizer=None)
Helper function to create a FeatureSet instance from a pandas.DataFrame. Will raise an Exception if pandas is not installed in your environment. The ids in the FeatureSet will be the index from the given frame.
Parameters:
- df (pd.DataFrame) – The pandas.DataFrame object to use as a FeatureSet.
- name (str) – The name of the output FeatureSet instance.
- labels_column (str, optional) – The name of the column containing the labels (data to predict). Defaults to None.
- vectorizer (DictVectorizer or FeatureHasher, optional) – Vectorizer which will be used to generate the feature matrix. Defaults to None.
Returns: feature_set – A FeatureSet instance generated from the given data frame.
has_labels
Check if FeatureSet has finite labels.
Returns: has_labels – Whether or not this FeatureSet has any finite labels.
Return type: bool
static split_by_ids(fs, ids_for_split1, ids_for_split2=None)
Split the FeatureSet into two new FeatureSet instances based on the given IDs for the two splits.
Parameters:
- fs (skll.FeatureSet) – The FeatureSet instance to split.
- ids_for_split1 (list of int) – A list of example IDs which will be split out into the first FeatureSet instance. Note that the FeatureSet instance will respect the order of the specified IDs.
- ids_for_split2 (list of int, optional) – An optional list of example IDs which will be split out into the second FeatureSet instance. Note that the FeatureSet instance will respect the order of the specified IDs. If this is not specified, then the second FeatureSet instance will contain the complement of the first set of IDs sorted in ascending order. Defaults to None.
Returns:
- fs1 (skll.FeatureSet) – The first FeatureSet.
- fs2 (skll.FeatureSet) – The second FeatureSet.
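The ID bookkeeping described for split_by_ids can be sketched on plain lists. This is an illustration only: split_ids is a hypothetical helper that works on bare IDs rather than FeatureSet objects, and it assumes the IDs are mutually comparable so the complement can be sorted.

```python
def split_ids(all_ids, ids_for_split1, ids_for_split2=None):
    """Hypothetical helper mirroring the documented split_by_ids ID logic."""
    # The first split keeps the caller's order.
    split1 = list(ids_for_split1)
    if ids_for_split2 is None:
        # Default: the complement of the first split, sorted ascending.
        chosen = set(split1)
        split2 = sorted(i for i in all_ids if i not in chosen)
    else:
        # An explicit second split also keeps the caller's order.
        split2 = list(ids_for_split2)
    return split1, split2


print(split_ids([5, 3, 1, 4, 2], [4, 1]))  # ([4, 1], [2, 3, 5])
```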
class skll.Reader(path_or_list, quiet=True, ids_to_floats=False, label_col='y', id_col='id', class_map=None, sparse=True, feature_hasher=False, num_features=None, logger=None)
Bases: object
A helper class to make picklable iterators out of example dictionary generators.
Parameters:
- path_or_list (str or list of dict) – Path or a list of example dictionaries.
- quiet (bool, optional) – Do not print “Loading…” status message to stderr. Defaults to True.
- ids_to_floats (bool, optional) – Convert IDs to float to save memory. Will raise an error if we encounter a non-numeric ID. Defaults to False.
- label_col (str, optional) – Name of the column which contains the class labels for ARFF/CSV/TSV files. If no column with that name exists, or None is specified, the data is considered to be unlabelled. Defaults to 'y'.
- id_col (str, optional) – Name of the column which contains the instance IDs. If no column with that name exists, or None is specified, example IDs will be automatically generated. Defaults to 'id'.
- class_map (dict, optional) – Mapping from original class labels to new ones. This is mainly used for collapsing multiple labels into a single class. Anything not in the mapping will be kept the same. Defaults to None.
- sparse (bool, optional) – Whether or not to store the features in a SciPy CSR matrix when using a DictVectorizer to vectorize the features. Defaults to True.
- feature_hasher (bool, optional) – Whether or not a FeatureHasher should be used to vectorize the features. Defaults to False.
- num_features (int, optional) – If using a FeatureHasher, how many features should the resulting matrix have? You should set this to a power of 2 greater than the actual number of features to avoid collisions. Defaults to None.
- logger (logging.Logger, optional) – A logger instance to use to log messages instead of creating a new one by default. Defaults to None.
classmethod for_path(path_or_list, **kwargs)
Instantiate the appropriate Reader sub-class based on the file extension of the given path, or use a dictionary reader if the input is a list of dictionaries.
Parameters:
- path_or_list (str or list of dicts) – A path or list of example dictionaries.
- kwargs (dict, optional) – The arguments to the Reader object being instantiated.
Returns: reader – A new instance of the Reader sub-class that is appropriate for the given path.
Raises: ValueError – If the file does not have a valid extension.
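The dispatch idea behind for_path (choose a reader from the input type or the file suffix, and reject unknown suffixes) can be sketched as follows. This is not SKLL's code: reader_kind and the string labels it returns are hypothetical stand-ins for the actual Reader sub-classes, and the extension set is taken from the formats listed for read().

```python
import os

# Extensions listed for Reader.read(); the labels returned below are illustrative only.
SUPPORTED_EXTENSIONS = {".arff", ".csv", ".jsonlines", ".libsvm", ".megam", ".ndj", ".tsv"}


def reader_kind(path_or_list):
    """Hypothetical dispatcher: pick a reader label from the input type or file suffix."""
    if isinstance(path_or_list, list):
        # A list of example dictionaries gets a dictionary-based reader.
        return "dict-list"
    ext = os.path.splitext(path_or_list)[1]
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file extension: {ext!r}")
    return ext.lstrip(".")


print(reader_kind("train.jsonlines"))  # jsonlines
print(reader_kind([{"f1": 1.0}]))      # dict-list
```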
read()
Loads examples in the .arff, .csv, .jsonlines, .libsvm, .megam, .ndj, or .tsv formats.
Returns: feature_set – A FeatureSet instance representing the input file.
Raises:
- ValueError – If ids_to_floats is True, but IDs cannot be converted.
- ValueError – If no features are found.
- ValueError – If the example IDs are not unique.
class skll.Writer(path, feature_set, **kwargs)
Bases: object
Helper class for writing out FeatureSets to files on disk.
Parameters:
- path (str) – A path to the feature file we would like to create. The suffix to this filename must be .arff, .csv, .jsonlines, .libsvm, .megam, .ndj, or .tsv. If subsets is not None, when calling the write() method, path is assumed to be a string containing the path to the directory to write the feature files with an additional file extension specifying the file type. For example, /foo/.csv.
- feature_set (skll.FeatureSet) – The FeatureSet instance to dump to the file.
- quiet (bool) – Do not print “Writing…” status message to stderr. Defaults to True.
- requires_binary (bool) – Whether or not the Writer must open the file in binary mode for writing with Python 2. Defaults to False.
- subsets (dict (str to list of str)) – A mapping from subset names to lists of feature names that are included in those sets. If given, a feature file will be written for every subset (with the name containing the subset name as suffix to path). Note that since string-valued features are automatically converted into boolean features with names of the form FEATURE_NAME=STRING_VALUE, only the portion before the = is used for matching when filtering. Therefore, you do not need to enumerate all of these boolean feature names in your mapping. Defaults to None.
- logger (logging.Logger) – A logger instance to use to log messages instead of creating a new one by default. Defaults to None.
classmethod for_path(path, feature_set, **kwargs)
Retrieve an object of the Writer sub-class that is appropriate for the given path.
Parameters:
- path (str) – A path to the feature file we would like to create. The suffix to this filename must be .arff, .csv, .jsonlines, .libsvm, .megam, .ndj, or .tsv. If subsets is not None, when calling the write() method, path is assumed to be a string containing the path to the directory to write the feature files with an additional file extension specifying the file type. For example, /foo/.csv.
- feature_set (skll.FeatureSet) – The FeatureSet instance to dump to the output file.
- kwargs (dict) – The keyword arguments for for_path are the same as the initializer for the desired Writer subclass.
Returns: writer – New instance of the Writer sub-class that is appropriate for the given path.
From experiments Module
skll.run_configuration(config_file, local=False, overwrite=True, queue='all.q', hosts=None, write_summary=True, quiet=False, ablation=0, resume=False, log_level=20)
Takes a configuration file and runs the specified jobs on the grid.
Parameters:
- config_file (str) – Path to the configuration file we would like to use.
- local (bool, optional) – Should this be run locally instead of on the cluster? Defaults to False.
- overwrite (bool, optional) – If the model files already exist, should we overwrite them instead of re-using them? Defaults to True.
- queue (str, optional) – The DRMAA queue to use if we’re running on the cluster. Defaults to 'all.q'.
- hosts (list of str, optional) – If running on the cluster, these are the machines we should use. Defaults to None.
- write_summary (bool, optional) – Write a TSV file with a summary of the results. Defaults to True.
- quiet (bool, optional) – Suppress printing of “Loading…” messages. Defaults to False.
- ablation (int, optional) – Number of features to remove when doing an ablation experiment. If positive, we will perform repeated ablation runs for all combinations of features removing the specified number at a time. If None, we will use all combinations of all lengths. If 0, the default, no ablation is performed. If negative, a ValueError is raised. Defaults to 0.
- resume (bool, optional) – If result files already exist for an experiment, do not overwrite them. This is very useful when doing a large ablation experiment and part of it crashes. Defaults to False.
- log_level (str, optional) – The level for logging messages. Defaults to logging.INFO.
Returns: result_json_paths – A list of paths to .json results files for each variation in the experiment.
Return type: list of str
Raises:
- ValueError – If the value for ablation is not a non-negative int or None.
- OSError – If the length of the FeatureSet name is greater than 210.
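The ablation parameter can be made concrete by enumerating which feature subsets each setting implies. This is a hypothetical sketch of the combinatorics only, not SKLL's experiment code: ablation_feature_sets is an illustrative name, it treats the "features" as named feature sets, it assumes "all combinations of all lengths" means lengths 1 through the number of features, and it does not model whether SKLL also adds a baseline run with every feature.

```python
from itertools import combinations


def ablation_feature_sets(features, ablation=0):
    """Hypothetical sketch of which feature subsets an ablation run would use."""
    if ablation is not None and ablation < 0:
        raise ValueError("ablation must be a non-negative int or None")
    if ablation == 0:
        # No ablation: a single run with every feature.
        return [tuple(features)]
    if ablation is None:
        # All combinations of all lengths (assumed here: 1 through len(features)).
        return [c for r in range(1, len(features) + 1) for c in combinations(features, r)]
    # Remove `ablation` features at a time, in every possible way.
    return [
        tuple(f for f in features if f not in removed)
        for removed in combinations(features, ablation)
    ]


print(ablation_feature_sets(["a", "b", "c"], 1))  # [('b', 'c'), ('a', 'c'), ('a', 'b')]
```

The number of runs grows combinatorially, which is why resume is useful for large ablation experiments.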
From learner Module
class skll.Learner(model_type, probability=False, feature_scaling='none', model_kwargs=None, pos_label_str=None, min_feature_count=1, sampler=None, sampler_kwargs=None, custom_learner_path=None, logger=None)
Bases: object
A simpler learner interface around many scikit-learn classification and regression functions.
Parameters:
- model_type (str) – Name of estimator to create (e.g., 'LogisticRegression'). See the skll package documentation for valid options.
- probability (bool, optional) – Should learner return probabilities of all labels (instead of just label with highest probability)? Defaults to False.
- feature_scaling (str, optional) – How to scale the features, if at all. Options are:
  - 'with_std': scale features using the standard deviation
  - 'with_mean': center features using the mean
  - 'both': do both scaling as well as centering
  - 'none': do neither scaling nor centering
  Defaults to 'none'.
- model_kwargs (dict, optional) – A dictionary of keyword arguments to pass to the initializer for the specified model. Defaults to None.
- pos_label_str (str, optional) – The string for the positive label in the binary classification setting. Otherwise, an arbitrary label is picked. Defaults to None.
- min_feature_count (int, optional) – The minimum number of examples in which a feature must have a nonzero value to be included. Defaults to 1.
- sampler (str, optional) – The sampler to use for kernel approximation, if desired. Valid values are:
  - 'AdditiveChi2Sampler'
  - 'Nystroem'
  - 'RBFSampler'
  - 'SkewedChi2Sampler'
  Defaults to None.
- sampler_kwargs (dict, optional) – A dictionary of keyword arguments to pass to the initializer for the specified sampler. Defaults to None.
- custom_learner_path (str, optional) – Path to module where a custom classifier is defined. Defaults to None.
- logger (logging object, optional) – A logging object. If None is passed, get logger from __name__. Defaults to None.
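The four feature_scaling options can be read as two independent switches, centering and scaling. The sketch below assumes they correspond to the with_mean/with_std flags of scikit-learn's StandardScaler and applies them to a single column in pure Python. SCALING_FLAGS and scale_column are hypothetical names, not SKLL API.

```python
import statistics

# Documented feature_scaling options mapped to (center, scale) flags; the pairing with
# StandardScaler(with_mean=..., with_std=...) is an assumption of this sketch.
SCALING_FLAGS = {
    "with_std": (False, True),
    "with_mean": (True, False),
    "both": (True, True),
    "none": (False, False),
}


def scale_column(values, feature_scaling="none"):
    """Scale one feature column according to a feature_scaling option (illustrative)."""
    with_mean, with_std = SCALING_FLAGS[feature_scaling]
    center = statistics.mean(values) if with_mean else 0.0
    # Population standard deviation (ddof=0).
    std = statistics.pstdev(values) if with_std else 1.0
    if std == 0:
        std = 1.0  # leave constant columns unscaled instead of dividing by zero
    return [(v - center) / std for v in values]


print(scale_column([1.0, 2.0, 3.0], "with_mean"))  # [-1.0, 0.0, 1.0]
```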
cross_validate(examples, stratified=True, cv_folds=10, grid_search=False, grid_search_folds=3, grid_jobs=None, grid_objective='f1_score_micro', output_metrics=[], prediction_prefix=None, param_grid=None, shuffle=False, save_cv_folds=False, use_custom_folds_for_grid_search=True)
Cross-validates a given model on the training examples.
Parameters:
- examples (skll.FeatureSet) – The FeatureSet instance to cross-validate learner performance on.
- stratified (bool, optional) – Should we stratify the folds to ensure an even distribution of labels for each fold? Defaults to True.
- cv_folds (int or dict, optional) – The number of folds to use for cross-validation, or a mapping from example IDs to folds. Defaults to 10.
- grid_search (bool, optional) – Should we do grid search when training each fold? Note: this will make cross-validation take much longer. Defaults to False.
- grid_search_folds (int or dict, optional) – The number of folds to use when doing the grid search, or a mapping from example IDs to folds. Defaults to 3.
- grid_jobs (int, optional) – The number of jobs to run in parallel when doing the grid search. If None or 0, the number of grid search folds will be used. Defaults to None.
- grid_objective (str, optional) – The name of the objective function to use when doing the grid search. Defaults to 'f1_score_micro'.
- output_metrics (list of str, optional) – List of additional metric names to compute in addition to the metric used for grid search. Defaults to an empty list.
- prediction_prefix (str, optional) – If saving the predictions, this is the prefix that will be used for the filename. It will be followed by "_predictions.tsv". Defaults to None.
- param_grid (list of dicts, optional) – The parameter grid to traverse. Defaults to None.
- shuffle (bool, optional) – Shuffle examples before splitting into folds for CV. Defaults to False.
- save_cv_folds (bool, optional) – Whether to save the CV fold IDs. Defaults to False.
- use_custom_folds_for_grid_search (bool, optional) – If cv_folds is a custom dictionary, but grid_search_folds is not, perhaps due to user oversight, should the same custom dictionary automatically be used for the inner grid-search cross-validation? Defaults to True.
Returns:
- results (list of 6-tuples) – The confusion matrix, overall accuracy, per-label PRFs, model parameters, objective function score, and evaluation metrics (if any) for each fold.
- grid_search_scores (list of floats) – The grid search scores for each fold.
- skll_fold_ids (dict) – A dictionary containing the test-fold number for each ID if save_cv_folds is True, otherwise None.
Raises: ValueError – If labels are not encoded as strings.
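When cv_folds is a mapping from example IDs to folds, each distinct fold label is held out once as the test set. The helper below is a hypothetical sketch of that idea on plain dictionaries, not SKLL's fold-generation code; iterating fold labels in sorted order is an arbitrary choice made here.

```python
from collections import defaultdict


def splits_from_fold_mapping(fold_of_id):
    """Turn a {example_id: fold_label} mapping into (train_ids, test_ids) pairs."""
    members = defaultdict(list)
    for ex_id, fold in fold_of_id.items():
        members[fold].append(ex_id)
    splits = []
    for fold in sorted(members):
        # Each fold is held out once; every other example is used for training.
        test_ids = members[fold]
        train_ids = [ex_id for ex_id, f in fold_of_id.items() if f != fold]
        splits.append((train_ids, test_ids))
    return splits


folds = {"ex1": "A", "ex2": "B", "ex3": "A", "ex4": "B"}
for train_ids, test_ids in splits_from_fold_mapping(folds):
    print(train_ids, test_ids)
# ['ex2', 'ex4'] ['ex1', 'ex3']
# ['ex1', 'ex3'] ['ex2', 'ex4']
```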
evaluate(examples, prediction_prefix=None, append=False, grid_objective=None, output_metrics=[])
Evaluates a given model on a given dev or test FeatureSet.
Parameters:
- examples (skll.FeatureSet) – The FeatureSet instance to evaluate the performance of the model on.
- prediction_prefix (str, optional) – If saving the predictions, this is the prefix that will be used for the filename. It will be followed by "_predictions.tsv". Defaults to None.
- append (bool, optional) – Should we append the current predictions to the file if it exists? Defaults to False.
- grid_objective (function, optional) – The objective function that was used when doing the grid search. Defaults to None.
- output_metrics (list of str, optional) – List of additional metric names to compute in addition to the grid objective. Defaults to an empty list.
Returns: res – The confusion matrix, the overall accuracy, the per-label PRFs, the model parameters, the grid search objective function score, and the additional evaluation metrics, if any.
Return type: 6-tuple
classmethod from_file(learner_path)
Load a saved Learner instance from a file path.
Parameters: learner_path (str) – The path to a saved Learner instance file.
Returns: learner – The Learner instance loaded from the file.
Raises:
- ValueError – If the pickled object is not a Learner instance.
- ValueError – If the pickled version of the Learner instance is out of date.
learning_curve(examples, cv_folds=10, train_sizes=array([ 0.1, 0.325, 0.55, 0.775, 1. ]), metric='f1_score_micro')
Generates learning curves for a given model on the training examples via cross-validation. Adapted from the scikit-learn code for learning curve generation (cf. sklearn.model_selection.learning_curve).
Parameters:
- examples (skll.FeatureSet) – The FeatureSet instance to generate the learning curve on.
- cv_folds (int or dict, optional) – The number of folds to use for cross-validation, or a mapping from example IDs to folds. Defaults to 10.
- train_sizes (list of float or int, optional) – Relative or absolute numbers of training examples that will be used to generate the learning curve. If the type is float, it is regarded as a fraction of the maximum size of the training set (which is determined by the selected validation method), i.e., it has to be within (0, 1]. Otherwise, it is interpreted as absolute sizes of the training sets. Note that for classification the number of samples usually has to be big enough to contain at least one sample from each class. Defaults to np.linspace(0.1, 1.0, 5).
- metric (str, optional) – The name of the metric function to use when computing the train and test scores for the learning curve. Defaults to 'f1_score_micro'.
Returns:
- train_scores (list of float) – The scores for the training set.
- test_scores (list of float) – The scores on the test set.
- num_examples (list of int) – The numbers of training examples used to generate the curve.
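The float-versus-int rule for train_sizes amounts to a small conversion step. The sketch below is loosely modelled on how scikit-learn's learning_curve interprets train_sizes and is not SKLL's code. The name absolute_train_sizes is hypothetical, and truncating fractions to whole examples (minimum 1) and dropping duplicates are choices made for this illustration.

```python
def absolute_train_sizes(train_sizes, max_train_size):
    """Convert relative (float) or absolute (int) train sizes to example counts."""
    sizes = []
    for size in train_sizes:
        if isinstance(size, float):
            if not 0.0 < size <= 1.0:
                raise ValueError("fractional train sizes must be within (0, 1]")
            # Fractions are relative to the largest available training set.
            sizes.append(max(1, int(size * max_train_size)))
        else:
            sizes.append(int(size))
    # Duplicates would produce identical points on the curve, so drop them.
    return sorted(set(sizes))


# Same values as the documented default, np.linspace(0.1, 1.0, 5).
print(absolute_train_sizes([0.1, 0.325, 0.55, 0.775, 1.0], 100))  # [10, 32, 55, 77, 100]
```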
load(learner_path)
Replace the current learner instance with a saved learner.
Parameters: learner_path (str) – The path to a saved learner object file to load.
model
The underlying scikit-learn model.
model_kwargs
A dictionary of the underlying scikit-learn model’s keyword arguments.
model_params
Model parameters (i.e., weights) for a LinearModel (e.g., Ridge) regression and liblinear models.
Returns:
- res (dict) – A dictionary of labeled weights.
- intercept (dict) – A dictionary of intercept(s).
Raises: ValueError – If the instance does not support model parameters.
model_type
The model type (i.e., the class).
predict(examples, prediction_prefix=None, append=False, class_labels=False)
Uses a given model to generate predictions on a given FeatureSet.
Parameters:
- examples (skll.FeatureSet) – The FeatureSet instance to predict labels for.
- prediction_prefix (str, optional) – If saving the predictions, this is the prefix that will be used for the filename. It will be followed by "_predictions.tsv". Defaults to None.
- append (bool, optional) – Should we append the current predictions to the file if it exists? Defaults to False.
- class_labels (bool, optional) – For classifiers, should we convert class indices to their (str) labels? Defaults to False.
Returns: yhat – The predictions returned by the Learner instance.
Return type: array-like
Raises: MemoryError – If the process runs out of memory when converting to dense.
probability
Should learner return probabilities of all labels (instead of just label with highest probability)?
save(learner_path)
Save the Learner instance to a file.
Parameters: learner_path (str) – The path to save the Learner instance to.
train(examples, param_grid=None, grid_search_folds=3, grid_search=True, grid_objective='f1_score_micro', grid_jobs=None, shuffle=False, create_label_dict=True)
Train a classification model and return the model, score, feature vectorizer, scaler, label dictionary, and inverse label dictionary.
Parameters:
- examples (skll.FeatureSet) – The FeatureSet instance to use for training.
- param_grid (list of dicts, optional) – The parameter grid to search through for grid search. If None, a default parameter grid will be used. Defaults to None.
- grid_search_folds (int or dict, optional) – The number of folds to use when doing the grid search, or a mapping from example IDs to folds. Defaults to 3.
- grid_search (bool, optional) – Should we do grid search? Defaults to True.
- grid_objective (str, optional) – The name of the objective function to use when doing the grid search. Defaults to 'f1_score_micro'.
- grid_jobs (int, optional) – The number of jobs to run in parallel when doing the grid search. If None or 0, the number of grid search folds will be used. Defaults to None.
- shuffle (bool, optional) – Shuffle examples (e.g., for grid search CV). Defaults to False.
- create_label_dict (bool, optional) – Should we create the label dictionary? This dictionary is used to map between string labels and their corresponding numerical values. This should only be done once per experiment, so when cross_validate calls train, create_label_dict gets set to False. Defaults to True.
Returns: grid_score – The best grid search objective function score, or 0 if we’re not doing grid search.
Return type: float
Raises:
- ValueError – If grid_objective is not a valid grid objective.
- MemoryError – If the process runs out of memory converting training data to dense.
- ValueError – If FeatureHasher is used with MultinomialNB.
From metrics Module
skll.f1_score_least_frequent(y_true, y_pred)
Calculate the F1 score of the least frequent label/class in y_true for y_pred.
Parameters:
- y_true (array-like of float) – The true/actual/gold labels for the data.
- y_pred (array-like of float) – The predicted/observed labels for the data.
Returns: ret_score – F1 score of the least frequent label.
Return type: float
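What "F1 of the least frequent label" means can be spelled out with counts of true positives, false positives, and false negatives for that one label. This pure-Python sketch is illustrative only: f1_of_least_frequent is a hypothetical name, and breaking ties in favor of the label seen first in y_true is this sketch's choice, which SKLL may not share.

```python
from collections import Counter


def f1_of_least_frequent(y_true, y_pred):
    """F1 for whichever label occurs least often in y_true (illustrative sketch)."""
    counts = Counter(y_true)
    # Ties go to the label that appears first in y_true; SKLL may break ties differently.
    label = min(counts, key=counts.get)
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


print(f1_of_least_frequent([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 2, 2]))  # 0.6666666666666666
```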
skll.kappa(y_true, y_pred, weights=None, allow_off_by_one=False)
Calculates the kappa inter-rater agreement between the gold standard and the predicted ratings. Potential values range from -1 (representing complete disagreement) to 1 (representing complete agreement). A kappa value of 0 is expected if all agreement is due to chance.
In the course of calculating kappa, all items in y_true and y_pred will first be converted to floats and then rounded to integers.
It is assumed that y_true and y_pred contain the complete range of possible ratings.
This function contains a combination of code from yorchopolis’s kappa-stats and Ben Hamner’s Metrics projects on GitHub.
Parameters:
- y_true (array-like of float) – The true/actual/gold labels for the data.
- y_pred (array-like of float) – The predicted/observed labels for the data.
- weights (str or np.array, optional) – Specifies the weight matrix for the calculation. Options are:
  - None = unweighted kappa
  - 'quadratic' = quadratic-weighted kappa
  - 'linear' = linear-weighted kappa
  - two-dimensional numpy array = a custom matrix of weights. Each weight corresponds to the \(w_{ij}\) values in the Wikipedia description of how to calculate weighted Cohen’s kappa.
  Defaults to None.
- allow_off_by_one (bool, optional) – If true, ratings that are off by one are counted as equal, and all other differences are reduced by one. For example, 1 and 2 will be considered to be equal, whereas 1 and 3 will have a difference of 1 when building the weights matrix. Defaults to False.
Returns: k – The kappa score, or weighted kappa score.
Return type: float
Raises:
- AssertionError – If the lengths of y_true and y_pred differ.
- ValueError – If labels cannot be converted to int.
- ValueError – If the weight scheme is invalid.
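The calculation above follows the standard weighted Cohen's kappa, kappa = 1 - sum(w_ij * O_ij) / sum(w_ij * E_ij). Here O_ij counts items the gold standard rated i and the prediction rated j, and E_ij is the count expected by chance from the two marginal histograms. The pure-Python sketch below was written from that formula and the documented behavior (rounding, off-by-one relaxation, unweighted/linear/quadratic/custom weights) and is not copied from SKLL. The name kappa_sketch is hypothetical, and returning 1.0 when the weighted expected disagreement is zero is a guard added for this sketch.

```python
def kappa_sketch(y_true, y_pred, weights=None, allow_off_by_one=False):
    """Cohen's kappa with optional linear/quadratic/custom weights (illustrative sketch)."""
    assert len(y_true) == len(y_pred), "y_true and y_pred must have the same length"
    # Convert to float, then round to int (ValueError if a label is not numeric).
    y_true = [int(round(float(y))) for y in y_true]
    y_pred = [int(round(float(y))) for y in y_pred]

    # Shift ratings so the smallest observed rating maps to index 0.
    low = min(min(y_true), min(y_pred))
    n = max(max(y_true), max(y_pred)) - low + 1
    y_true = [y - low for y in y_true]
    y_pred = [y - low for y in y_pred]

    if weights is None or isinstance(weights, str):
        w = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                diff = abs(i - j)
                if allow_off_by_one:
                    diff = max(diff - 1, 0)
                if weights is None:
                    w[i][j] = float(diff != 0)  # unweighted: any disagreement costs 1
                elif weights == "linear":
                    w[i][j] = float(diff)
                elif weights == "quadratic":
                    w[i][j] = float(diff ** 2)
                else:
                    raise ValueError(f"Invalid weight scheme: {weights!r}")
    else:
        w = weights  # caller-supplied n x n matrix of w_ij values

    # Observed confusion counts and the marginal histogram of each rater.
    observed = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    total = len(y_true)

    cells = [(i, j) for i in range(n) for j in range(n)]
    weighted_observed = sum(w[i][j] * observed[i][j] for i, j in cells)
    weighted_expected = sum(w[i][j] * hist_true[i] * hist_pred[j] / total for i, j in cells)
    if weighted_expected == 0:
        return 1.0  # guard added in this sketch to avoid dividing by zero
    return 1.0 - weighted_observed / weighted_expected


print(kappa_sketch([0, 0, 1, 1], [0, 1, 1, 1]))  # 0.5
```

For the usage line, observed agreement is 3/4 and chance agreement is 1/2, so unweighted kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.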
skll.kendall_tau(y_true, y_pred)
Calculate Kendall’s tau between y_true and y_pred.
Parameters:
- y_true (array-like of float) – The true/actual/gold labels for the data.
- y_pred (array-like of float) – The predicted/observed labels for the data.
Returns: ret_score – Kendall’s tau if well-defined, else 0.0.
Return type: float
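To make "Kendall's tau if well-defined, else 0.0" concrete, the sketch below counts concordant and discordant pairs by brute force. It assumes the tie-adjusted tau-b variant (the default of scipy.stats.kendalltau); whether skll.kendall_tau uses exactly this variant is an assumption here, and kendall_tau_b is a hypothetical name. The cost is quadratic in the number of items, so it is meant for illustration, not large inputs.

```python
import math
from itertools import combinations


def kendall_tau_b(y_true, y_pred):
    """Kendall's tau-b computed by brute force over all pairs; 0.0 if undefined."""
    concordant = discordant = ties_true = ties_pred = 0
    for (t1, p1), (t2, p2) in combinations(zip(y_true, y_pred), 2):
        dt, dp = t1 - t2, p1 - p2
        if dt == 0:
            ties_true += 1
        if dp == 0:
            ties_pred += 1
        if dt != 0 and dp != 0:
            if (dt > 0) == (dp > 0):
                concordant += 1
            else:
                discordant += 1
    n_pairs = len(y_true) * (len(y_true) - 1) // 2
    denom = math.sqrt((n_pairs - ties_true) * (n_pairs - ties_pred))
    # A constant input makes tau undefined; mirror the documented 0.0 fallback.
    return (concordant - discordant) / denom if denom else 0.0


print(kendall_tau_b([1, 2, 2, 3], [1, 2, 3, 3]))  # 0.8
```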
skll.spearman(y_true, y_pred)
Calculate Spearman’s rank correlation coefficient between y_true and y_pred.
Parameters:
- y_true (array-like of float) – The true/actual/gold labels for the data.
- y_pred (array-like of float) – The predicted/observed labels for the data.
Returns: ret_score – Spearman’s rank correlation coefficient if well-defined, else 0.0.
Return type: float
skll.pearson(y_true, y_pred)
Calculate Pearson product-moment correlation coefficient between y_true and y_pred.
Parameters:
- y_true (array-like of float) – The true/actual/gold labels for the data.
- y_pred (array-like of float) – The predicted/observed labels for the data.
Returns: ret_score – Pearson product-moment correlation coefficient if well-defined, else 0.0.
Return type: float
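Both spearman and pearson fall back to 0.0 when the coefficient is undefined, which happens when either input has zero variance. The sketch below shows that behavior in pure Python, computing Spearman's rho as Pearson's r on average ranks (tied values share the mean of their ranks). The function names are hypothetical and this is not SKLL's implementation. The zero-variance test is an exact comparison, so nearly constant floating-point inputs are not caught.

```python
import math


def pearson_or_zero(x, y):
    """Pearson's r, or 0.0 when either input has zero variance (r is undefined)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_or_zero(x, y):
    """Spearman's rho as Pearson's r on average ranks; 0.0 when undefined."""
    return pearson_or_zero(average_ranks(x), average_ranks(y))


print(pearson_or_zero([1, 2, 3], [2, 4, 6]))          # 1.0
print(spearman_or_zero([1, 2, 3, 4], [1, 4, 9, 16]))  # 1.0
print(pearson_or_zero([1, 2, 3], [5, 5, 5]))          # 0.0
```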