Version 0.4.2#
Changelog#
Bug fixes#
Fix a bug in imblearn.over_sampling.SMOTENC in which the median of the standard deviation was used instead of half of the median of the standard deviation. By Guillaume Lemaitre in #491.
Raise an error when passing a target which is not supported, i.e. regression targets or multilabel targets. Imbalanced-learn does not support these cases. By Guillaume Lemaitre in #490.
Fix a bug in imblearn.over_sampling.SMOTENC in which sparse matrices were densified during inverse_transform. By Guillaume Lemaitre in #495.
Fix a bug in imblearn.over_sampling.SMOTENC in which the tie breaking was sampling incorrectly. By Guillaume Lemaitre in #497.
Version 0.4#
October, 2018
Warning
Version 0.4 is the last version of imbalanced-learn to support Python 2.7 and Python 3.4. Imbalanced-learn 0.5 will require Python 3.5 or higher.
Highlights#
This release brings its set of new features as well as some API changes to strengthen the foundation of imbalanced-learn.
As a new feature, two new modules, imblearn.keras and imblearn.tensorflow, have been added, in which imbalanced-learn samplers can be used to generate balanced mini-batches.
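For example, a balanced mini-batch generator can feed a Keras model. This is a minimal sketch, assuming keras is installed; the sampler, batch size, and toy data set below are illustrative choices, not part of the release notes:

```python
from sklearn.datasets import make_classification
from imblearn.keras import BalancedBatchGenerator
from imblearn.under_sampling import RandomUnderSampler

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Sequence-like generator: every mini-batch is rebalanced by the sampler
training_generator = BalancedBatchGenerator(
    X, y, sampler=RandomUnderSampler(), batch_size=64, random_state=42
)
X_batch, y_batch = training_generator[0]  # first balanced mini-batch
# a compiled keras model could then consume it, e.g. model.fit_generator(training_generator)
```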
The module imblearn.ensemble has been consolidated with new classifiers: imblearn.ensemble.BalancedRandomForestClassifier, imblearn.ensemble.EasyEnsembleClassifier, and imblearn.ensemble.RUSBoostClassifier.
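As a minimal sketch of the new ensemble API (the data set and n_estimators value are illustrative), BalancedRandomForestClassifier follows the usual scikit-learn fit/predict convention, as do the other two classifiers:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from imblearn.ensemble import BalancedRandomForestClassifier

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# each tree of the forest is fitted on a balanced bootstrap sample
clf = BalancedRandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```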
Support for string data has been added to imblearn.over_sampling.RandomOverSampler and imblearn.under_sampling.RandomUnderSampler. In addition, a new class, imblearn.over_sampling.SMOTENC, allows generating samples from data sets containing both continuous and categorical features.
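A minimal sketch of SMOTENC on a synthetic data set where the last column holds a categorical code; the categorical_features argument listing the categorical column indices is the key addition (the data itself is illustrative):

```python
from collections import Counter

import numpy as np
from imblearn.over_sampling import SMOTENC

rng = np.random.RandomState(0)
# columns 0-1 are continuous, column 2 is a categorical code (0, 1 or 2)
X = np.hstack([rng.randn(50, 2), rng.randint(0, 3, size=(50, 1))])
y = np.array([0] * 40 + [1] * 10)

# tell SMOTENC which column indices are categorical
smote_nc = SMOTENC(categorical_features=[2], random_state=0)
X_res, y_res = smote_nc.fit_resample(X, y)
print(X_res.shape, sorted(Counter(y_res).items()))
```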
imblearn.over_sampling.SMOTE has been simplified and broken down into two additional classes: imblearn.over_sampling.SVMSMOTE and imblearn.over_sampling.BorderlineSMOTE.
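A minimal sketch with BorderlineSMOTE (SVMSMOTE is used the same way; the data set is illustrative):

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import BorderlineSMOTE

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# over-sample the minority class around its borderline samples only
X_res, y_res = BorderlineSMOTE(random_state=0).fit_resample(X, y)
print(sorted(Counter(y_res).items()))
```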
There are also some changes regarding the API: the parameter sampling_strategy has been introduced to replace the ratio parameter. In addition, the return_indices argument has been deprecated, and all samplers will expose a sample_indices_ attribute whenever this is possible.
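A minimal sketch of both changes, using RandomUnderSampler with a float sampling_strategy on a binary target (0.5 here means a 1:2 minority-to-majority ratio after resampling; the data set is illustrative):

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.under_sampling import RandomUnderSampler

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

rus = RandomUnderSampler(sampling_strategy=0.5, random_state=0)
X_res, y_res = rus.fit_resample(X, y)
print(sorted(Counter(y_res).items()))
print(rus.sample_indices_[:5])  # indices of the samples kept from the original data
```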
Changelog#
API#
Replace the parameter ratio by sampling_strategy. #411 by Guillaume Lemaitre.
Enable the use of a float with binary classification for sampling_strategy. #411 by Guillaume Lemaitre.
Enable the use of a list for the cleaning methods to specify the classes to sample (see the sketch after this list). #411 by Guillaume Lemaitre.
Replace fit_sample by fit_resample. An alias is still available for backward compatibility. In addition, sample has been removed to avoid resampling on a different set of data. #462 by Guillaume Lemaitre.
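A minimal sketch of the renamed fit_resample method together with a list-valued sampling_strategy on a cleaning method (TomekLinks and the class labels in the list are illustrative choices):

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.under_sampling import TomekLinks

X, y = make_classification(n_classes=3, n_informative=4, weights=[0.1, 0.3, 0.6],
                           n_samples=1000, random_state=0)

# cleaning methods now accept a list of the classes to clean
tl = TomekLinks(sampling_strategy=[1, 2])
X_res, y_res = tl.fit_resample(X, y)  # fit_sample remains available as an alias
print(sorted(Counter(y_res).items()))
```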
New features#
Add keras and tensorflow modules to create balanced mini-batch generators. #409 by Guillaume Lemaitre.
Add imblearn.ensemble.EasyEnsembleClassifier which creates a bag of AdaBoost classifiers trained on balanced bootstrap samples (see the sketch after this list). #455 by Guillaume Lemaitre.
Add imblearn.ensemble.BalancedRandomForestClassifier which balances each bootstrap sample provided to each tree of the forest. #459 by Guillaume Lemaitre.
Add imblearn.ensemble.RUSBoostClassifier which applies a random under-sampling stage before each boosting iteration of AdaBoost. #469 by Guillaume Lemaitre.
Add imblearn.over_sampling.SMOTENC which generates synthetic samples on data sets with heterogeneous data types (continuous and categorical features). #412 by Denis Dudnik and Guillaume Lemaitre.
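A minimal sketch of EasyEnsembleClassifier (RUSBoostClassifier follows the same fit/predict pattern; the data set and n_estimators value are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from imblearn.ensemble import EasyEnsembleClassifier

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# a bag of AdaBoost learners, each fitted on a balanced bootstrap sample
eec = EasyEnsembleClassifier(n_estimators=10, random_state=0)
eec.fit(X_train, y_train)
print(eec.score(X_test, y_test))
```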
Enhancement#
Add a documentation note to create a balanced random forest from a balanced bagging classifier. #372 by Guillaume Lemaitre.
Document the metrics to evaluate models on imbalanced datasets. #367 by Guillaume Lemaitre.
Add support for one-vs-all encoded target to support keras. #409 by Guillaume Lemaitre.
Add specific classes for borderline and SVM SMOTE using BorderlineSMOTE and SVMSMOTE. #440 by Guillaume Lemaitre.
Allow imblearn.over_sampling.RandomOverSampler to return indices using the attribute return_indices. #439 by Hugo Gascon and Guillaume Lemaitre.
Allow imblearn.under_sampling.RandomUnderSampler and imblearn.over_sampling.RandomOverSampler to sample object arrays containing strings (see the sketch after this list). #451 by Guillaume Lemaitre.
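A minimal sketch of the random samplers operating on an object array of strings (the toy data is illustrative):

```python
from collections import Counter

import numpy as np
from imblearn.over_sampling import RandomOverSampler

X = np.array([["cat"], ["dog"], ["bird"], ["cat"], ["dog"], ["fish"]], dtype=object)
y = np.array([0, 0, 0, 0, 1, 1])

# random over-sampling only repeats existing rows, so string values are preserved
X_res, y_res = RandomOverSampler(random_state=0).fit_resample(X, y)
print(Counter(y_res))
```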
Bug fixes#
Fix a bug in metrics.classification_report_imbalanced for which y_pred and y_true were inverted. #394 by Ole Silvig.
Fix a bug in ADASYN to consider only samples from the current class when generating new samples. #354 by Guillaume Lemaitre.
Fix a bug to enforce the sorted behavior of the sampling_strategy dictionary and thus obtain deterministic results when using the same random state. #447 by Guillaume Lemaitre.
Force the cloning of scikit-learn estimators passed as attributes to samplers. #446 by Guillaume Lemaitre.
Fix a bug which was not preserving the dtype of X and y when generating samples. #450 by Guillaume Lemaitre.
Add the option to pass a Memory object to make_pipeline, as in the pipeline.Pipeline class (see the sketch after this list). #458 by Christos Aridas.
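A minimal sketch of caching a pipeline with a Memory object, assuming a recent joblib where the argument is named location; the steps chosen here are illustrative:

```python
from tempfile import mkdtemp

from joblib import Memory
from sklearn.linear_model import LogisticRegression
from imblearn.pipeline import make_pipeline
from imblearn.under_sampling import RandomUnderSampler

# cache the fitted intermediate steps of the pipeline in a temporary directory
memory = Memory(location=mkdtemp(), verbose=0)
pipe = make_pipeline(RandomUnderSampler(random_state=0),
                     LogisticRegression(),
                     memory=memory)
```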
Maintenance#
Remove the parameters deprecated in 0.2. #331 by Guillaume Lemaitre.
Make some modules private. #452 by Guillaume Lemaitre.
Upgrade requirements to scikit-learn 0.20. #379 by Guillaume Lemaitre.
Catch deprecation warning in testing. #441 by Guillaume Lemaitre.
Refactor and impose pytest style tests. #470 by Guillaume Lemaitre.
Documentation#
Remove some docstrings which are not necessary. #454 by Guillaume Lemaitre.
Fix the documentation of the sampling_strategy parameter when used as a float. #480 by Guillaume Lemaitre.
Deprecation#
Deprecate ratio in favor of sampling_strategy. #411 by Guillaume Lemaitre.
Deprecate the use of a dict for the cleaning methods; a list should be used instead. #411 by Guillaume Lemaitre.
Deprecate random_state in imblearn.under_sampling.NearMiss, imblearn.under_sampling.EditedNearestNeighbours, imblearn.under_sampling.RepeatedEditedNearestNeighbours, imblearn.under_sampling.AllKNN, imblearn.under_sampling.NeighbourhoodCleaningRule, imblearn.under_sampling.InstanceHardnessThreshold, and imblearn.under_sampling.CondensedNearestNeighbours.
Deprecate kind, out_step, svm_estimator, and m_neighbors in imblearn.over_sampling.SMOTE. Users should use imblearn.over_sampling.SVMSMOTE and imblearn.over_sampling.BorderlineSMOTE instead (see the sketch after this list). #440 by Guillaume Lemaitre.
Deprecate imblearn.ensemble.EasyEnsemble in favor of the meta-estimator imblearn.ensemble.EasyEnsembleClassifier which follows the exact algorithm described in the literature. #455 by Guillaume Lemaitre.
Deprecate imblearn.ensemble.BalanceCascade. #472 by Guillaume Lemaitre.
Deprecate return_indices in all samplers. Instead, an attribute sample_indices_ is created whenever the sampler is selecting a subset of the original samples. #474 by Guillaume Lemaitre.
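A minimal migration sketch for two of these deprecations, moving from SMOTE's kind parameter to the dedicated BorderlineSMOTE class and from return_indices to the sample_indices_ attribute (the data set is illustrative):

```python
from sklearn.datasets import make_classification
from imblearn.over_sampling import BorderlineSMOTE
from imblearn.under_sampling import RandomUnderSampler

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# before: SMOTE(kind='borderline1'); now a dedicated class (borderline-1 is the default variant)
X_res, y_res = BorderlineSMOTE(random_state=0).fit_resample(X, y)

# before: RandomUnderSampler(return_indices=True); now read the attribute after resampling
rus = RandomUnderSampler(random_state=0)
X_rus, y_rus = rus.fit_resample(X, y)
kept_indices = rus.sample_indices_
```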