mlxtend

A library of extension and helper modules for Python's data analysis and machine learning libraries.

Those tools are intentionally not (yet) submitted to the main projects to avoid cluttering up the core libraries.

Link to the mlxtend repository on GitHub: https://github.com/rasbt/mlxtend.



Overview





preprocessing

[back to top]

A collection of different functions for various data preprocessing procedures.

The preprocessing utilities can be imported via

from mxtend.preprocessing import ...



MeanCenterer

[back to top]

class MeanCenterer(TransformerObj):
"""
Class for column centering of vectors and matrices.

Keyword arguments:
    X: NumPy array object where each attribute/variable is
       stored in an individual column. 
       Also accepts 1-dimensional Python list objects.

Class methods:
    fit: Fits column means to MeanCenterer object.
    transform: Uses column means from `fit` for mean centering.
    fit_transform: Fits column means and performs mean centering.

The class methods `transform` and `fit_transform` return a new numpy array
object where the attributes are centered at the column means.

"""


Examples:

Use the fit method to fit the column means of a dataset (e.g., the training dataset) to a new MeanCenterer object. Then, call the transform method on the same dataset to center it at the sample mean.

>> X_train
array([[1, 2, 3],
   [4, 5, 6],
   [7, 8, 9]])
>> mc = MeanCenterer().fit(X_train)
>> mc.transform(X_train)
array([[-3, -3, -3],
   [ 0,  0,  0],
   [ 3,  3,  3]])


To use the same parameters that were used to center the training dataset, simply call the transform method of the MeanCenterer instance on a new dataset (e.g., test dataset).

>> X_test 
array([[1, 1, 1],
   [1, 1, 1],
   [1, 1, 1]])
>> mc.transform(X_test)  
array([[-3, -4, -5],
   [-3, -4, -5],
   [-3, -4, -5]])


The MeanCenterer also supports Python list objects, and the fit_transform method allows you to directly fit and center the dataset.

>> Z
[1, 2, 3]
>> MeanCenterer().fit_transform(Z)
array([-1,  0,  1])


import matplotlib.pyplot as plt
import numpy as np

X = 2 * np.random.randn(100,2) + 5

plt.scatter(X[:,0], X[:,1])
plt.grid()
plt.title('Random Gaussian data w. mean=5, sigma=2')
plt.show()

Y = mean_centering(X)
plt.scatter(Y[:,0], Y[:,1])
plt.grid()
plt.title('Data after mean centering')
plt.show()





scikit-learn utilities

[back to top]


The scikit-learn utilities can be imported via

from mxtend.scikit-learn import ...



ColumnSelector for custom feature selection

[back to top]

A feature selector for scikit-learn's Pipeline class that returns specified columns from a NumPy array; extremely useful in combination with scikit-learn's Pipeline in cross-validation.

Example in Pipeline:

from mlxtend.sklearn import ColumnSelector
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler

clf_2col = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('reduce_dim', ColumnSelector(cols=(1,3))),    # extracts column 2 and 4
    ('classifier', GaussianNB())   
    ]) 

ColumnSelector has a transform method that is used to select and return columns (features) from a NumPy array so that it can be used in the Pipeline like other transformation classes.

### original data

print('First 3 rows before:\n', X_train[:3,:])
First 3 rows before:
[[ 4.5  2.3  1.3  0.3]
[ 6.7  3.3  5.7  2.1]
[ 5.7  3.   4.2  1.2]]

### after selection

cols = ColumnExtractor(cols=(1,3)).transform(X_train)
print('First 3 rows:\n', cols[:3,:])

First 3 rows:
[[ 2.3  0.3]
[ 3.3  2.1]
[ 3.   1.2]]






matplotlib utilities

[back to top]


The matplotlib utilities can be imported via

from mxtend.matplotlib import ...



remove_borders

[back to top]

A function to remove borders from matplotlib plots.

def remove_borders(axes, left=False, bottom=False, right=True, top=True):
    """ 
    A function to remove chartchunk from matplotlib plots, such as axes
        spines, ticks, and labels.

        Keyword arguments:
            axes: An iterable containing plt.gca() or plt.subplot() objects, e.g. [plt.gca()].
            left, bottom, right, top: Boolean to specify which plot axes to hide.

    """

Example



Installation

[back to top]

You can use the following command to install mlxtend:
pip install mlxtend
or
easy_install mlxtend

Alternatively, you download the package manually from the Python Package Index https://pypi.python.org/pypi/mlxtend, unzip it, navigate into the package, and use the command:

python setup.py install