{% extends "layout.html" %} {% set title = 'skrub: Less wrangling, more machine learning' %} {%- block extrahead %} {{ super() }} {# Add here landing-page specific stuff that goes in the header (eg css) #} {%- endblock extrahead %} {% block docs_navbar %} {{ super() }} {# We add the full-width banner below the navbar, as the div there is still full-width (unlike the article) #}

skrub

  • scikit-learn compatible
  • Pandas and Polars dataframe inputs and outputs
  • Works on heterogeneous types (numeric, categorical, dates, text, missing values...)
{% endblock docs_navbar %} {% block docs_main %}

Less wrangling, more machine learning

skrub is a Python library to ease preprocessing and feature engineering for tabular machine learning.
Our long-term goal is to directly connect database tables to machine learning estimators.

Effortless Pipelines

Create strong scikit-learn pipeline baselines effortlessly with TableVectorizer and tabular_learner.

{% include "demo_tabular_learner.html" %}

Powerful Feature Engineering

Encode text and high-cardinality categorical data with the GapEncoder and MinHashEncoder, and extract features from dates with the DatetimeEncoder.

{% include "demo_gap_encoder.html" %}

Interactive Data Exploration

Explore your dataframes interactively with TableReport.

{% include "demo_table_report_code.html" %}

Try it on your dataset →

Click anywhere on the table

{% include "demo_table_report_generated.html" %}

Our Community

The skrub project is powered by a worldwide community of contributors. Here we display a randomly selected group of 30 contributors.

Try it yourself!

Ready to write less code and get more insights? Dive into skrub now and be part of an emerging community!

{% endblock docs_main %} {%- block footer %} {%- endblock footer %}