Release Notes
Note: These release notes cover only notable or major bug fixes, since most minor bug fixes tend to be esoteric and not generally interesting. Point releases (minor, e.g. 0.5.1) contain only bug fixes and will generally not be found here.
0.11.0 (June 28, 2017)
This release brings initial pandas backend support along with a number of bug fixes and reliability enhancements. We recommend that all users upgrade from earlier versions of Ibis.
New features
- Experimental pandas backend to allow execution of ibis expressions against pandas DataFrames
- Graphviz visualization of ibis expressions. Implements `_repr_png_` for Jupyter Notebook functionality
- Ability to create a partitioned table from an ibis expression
- Support for previously missing operations in the SQLite backend (sqrt, power, variance, standard deviation, and regular expression functions), plus previously missing power support for PostgreSQL
- Support for schemas inside databases with the PostgreSQL backend
- Appveyor testing on core ibis across all supported Python versions
- Add `year`/`month`/`day` methods to `date` types
- Ability to sort, group by and project columns according to positional index rather than only by name
- Added a `type` parameter to `ibis.literal` to allow user specification of literal types
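The positional-index feature can be pictured with a small plain-Python sketch. The `resolve_columns` helper and the schema below are hypothetical and written only for this illustration; these notes do not describe how Ibis resolves positions internally.

```python
def resolve_columns(column_names, keys):
    """Translate a mix of column names and positional indexes into names."""
    resolved = []
    for key in keys:
        if isinstance(key, int):
            # Positional reference: look the name up by index.
            resolved.append(column_names[key])
        elif key in column_names:
            # Name reference: keep it as-is.
            resolved.append(key)
        else:
            raise KeyError(f"unknown column: {key!r}")
    return resolved


columns = ["year", "month", "total"]  # hypothetical schema
selected = resolve_columns(columns, [0, "total", 1])
print(selected)
```

Mixing names and positions in one call is a choice made for this sketch; it simply shows that both kinds of reference end up as the same column names.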
Bug fixes
- Fix broken conda recipe
- Fix incorrectly typed fillna operation
- Fix postgres boolean summary operations
- Fix kudu support to reflect client API changes
- Fix equality of nested types and construction of nested types when the value type is specified as a string
API changes
- Deprecate passing integer values to the `ibis.timestamp` literal constructor; this will be removed in 0.12.0
- Added the `admin_timeout` parameter to the kudu client `connect` function
Contributors
$ git shortlog --summary --numbered v0.10.0..v0.11.0
58 Phillip Cloud
1 Greg Rahn
1 Marius van Niekerk
1 Tarun Gogineni
1 Wes McKinney
0.8 (May 19, 2016)
This release brings initial PostgreSQL backend support along with a number of critical bug fixes and usability improvements. As several correctness bugs with the SQL compiler were fixed, we recommend that all users upgrade from earlier versions of Ibis.
New features
- Initial PostgreSQL backend contributed by Phillip Cloud.
- Add `groupby` as an alias for `group_by` to table expressions
Bug fixes
- Fix an expression error when filtering based on a new field
- Fix Impala’s SQL compilation of using `OR` with compound filters
- Various fixes with the `having(...)` function in grouped table expressions
- Fix CTE (`WITH`) extraction inside `UNION ALL` expressions.
- Fix `ImportError` on Python 2 when `mock` library not installed
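For readers unfamiliar with the query shape behind the CTE fix, the sketch below shows a `WITH` clause declared once and shared by both branches of a `UNION ALL`. It uses Python's standard `sqlite3` module rather than Ibis, the CTE name `base` is made up, and the SQL is an illustration, not the compiler's actual output.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# The CTE is declared once at the top of the statement and referenced by
# both branches of the UNION ALL.
query = """
WITH base AS (
  SELECT 1 AS x
)
SELECT x FROM base
UNION ALL
SELECT x + 1 FROM base
ORDER BY 1
"""
union_rows = con.execute(query).fetchall()
print(union_rows)
```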
API changes
- The deprecated `ibis.impala_connect` and `ibis.make_client` APIs have been removed
0.7 (March 16, 2016)
This release brings initial Kudu-Impala integration and improved Impala and SQLite support, along with several critical bug fixes.
New features
- Apache Kudu (incubating) integration for Impala users. See the blog post for now. Will add some documentation here when possible.
- Add `use_https` option to `ibis.hdfs_connect` for WebHDFS connections in secure (Kerberized) clusters without SSL enabled.
- Correctly compile aggregate expressions involving multiple subqueries.
To explain this last point in more detail, suppose you had:
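```python
import ibis

# Unbound table named 'tbl' with a string column and a double column
table = ibis.table([('flag', 'string'),
                    ('value', 'double')],
                   'tbl')

# Two filtered views of the same table
flagged = table[table.flag == '1']
unflagged = table[table.flag == '0']
fv = flagged.value
uv = unflagged.value

# Each aggregate comes from a different subquery
expr = (fv.mean() / fv.sum()) - (uv.mean() / uv.sum())
```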
The last expression now generates the correct Impala or SQLite SQL:
```sql
SELECT t0.`tmp` - t1.`tmp` AS `tmp`
FROM (
  SELECT avg(`value`) / sum(`value`) AS `tmp`
  FROM tbl
  WHERE `flag` = '1'
) t0
  CROSS JOIN (
    SELECT avg(`value`) / sum(`value`) AS `tmp`
    FROM tbl
    WHERE `flag` = '0'
  ) t1
```
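One way to sanity-check a query of this shape is to run it against an in-memory SQLite database and compare it with a plain-Python calculation. The sketch below uses only the standard `sqlite3` module, not Ibis; the sample rows are made up, and the backtick quoting is dropped in favor of plain identifiers.

```python
import sqlite3

# Made-up sample data for the `tbl` table.
rows = [("1", 2.0), ("1", 4.0), ("0", 1.0), ("0", 3.0), ("0", 5.0)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (flag TEXT, value REAL)")
con.executemany("INSERT INTO tbl VALUES (?, ?)", rows)

# One aggregate subquery per filter, combined with a CROSS JOIN of the
# two single-row results.
query = """
SELECT t0.tmp - t1.tmp AS tmp
FROM (
  SELECT avg(value) / sum(value) AS tmp
  FROM tbl
  WHERE flag = '1'
) t0
CROSS JOIN (
  SELECT avg(value) / sum(value) AS tmp
  FROM tbl
  WHERE flag = '0'
) t1
"""
(sql_result,) = con.execute(query).fetchone()


def mean_over_sum(values):
    """Compute mean(values) / sum(values) in plain Python."""
    return (sum(values) / len(values)) / sum(values)


flagged_values = [v for f, v in rows if f == "1"]
unflagged_values = [v for f, v in rows if f == "0"]
expected = mean_over_sum(flagged_values) - mean_over_sum(unflagged_values)
print(sql_result, expected)
```

Because each subquery returns exactly one row, the cross join also yields exactly one row, which is why the two aggregates can be subtracted directly.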
Bug fixes
- `CHAR(n)` and `VARCHAR(n)` Impala types now correctly map to Ibis string expressions
- Fix inappropriate projection-join-filter expression rewrites resulting in incorrect generated SQL.
- `ImpalaClient.create_table` correctly passes `STORED AS PARQUET` for `format='parquet'`.
- Fixed several issues with Ibis dependencies (impyla, thriftpy, sasl, thrift_sasl), especially for secure clusters. Upgrading will pull in these new dependencies.
- Do not fail in `ibis.impala.connect` when trying to create the temporary Ibis database if no HDFS connection passed.
- Fix join predicate evaluation bug when column names overlap with table attributes.
- Fix handling of fully-materialized joins (aka `select *` joins) in SQLAlchemy / SQLite.
Contributors
Thank you to all who contributed patches to this release.
$ git log v0.6.0..v0.7.0 --pretty=format:%aN | sort | uniq -c | sort -rn
21 Wes McKinney
1 Uri Laserson
1 Kristopher Overholt
0.6 (December 1, 2015)
This release brings expanded pandas and Impala integration, including support for managing partitioned tables in Impala. See the new Ibis for Impala Users guide for more on using Ibis with Impala.
The Ibis for SQL Programmers guide was also written since the 0.5 release.
This release also includes bug fixes affecting generated SQL correctness. All users should upgrade as soon as possible.
New features
- New integrated Impala functionality. See Ibis for Impala Users for more details on these things.
  - Improved Impala-pandas integration. Create tables or insert into existing tables from pandas `DataFrame` objects.
  - Partitioned table metadata management API. Add, drop, alter, and insert into table partitions.
  - Add `is_partitioned` property to `ImpalaTable`.
  - Added support for `LOAD DATA` DDL using the `load_data` function, also supporting partitioned tables.
  - Modify table metadata (location, format, SerDe properties etc.) using `ImpalaTable.alter`
  - Interrupting Impala expression execution with Control-C will attempt to cancel the running query with the server.
  - Set the compression codec (e.g. snappy) used with `ImpalaClient.set_compression_codec`.
  - Get and set query options for a client session with `ImpalaClient.get_options` and `ImpalaClient.set_options`.
  - Add `ImpalaTable.metadata` method that parses the output of the `DESCRIBE FORMATTED` DDL to simplify table metadata inspection.
  - Add `ImpalaTable.stats` and `ImpalaTable.column_stats` to see computed table and partition statistics.
  - Add `CHAR` and `VARCHAR` handling
  - Add `refresh`, `invalidate_metadata` DDL options and add `incremental` option to `compute_stats` for `COMPUTE INCREMENTAL STATS`.
- Add `substitute` method for performing multiple value substitutions in an array or scalar expression.
- Division is by default true division like Python 3 for all numeric data. This means for SQL systems that use C-style division semantics, the appropriate `CAST` will be automatically inserted in the generated SQL.
- Easier joins on tables with overlapping column names. See Ibis for SQL Programmers.
- Expressions like `string_expr[:3]` now work as expected.
- Add `coalesce` instance method to all value expressions.
- Passing `limit=None` to the `execute` method on expressions disables any default row limits.
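The true-division item can be illustrated without Ibis. The sketch below uses Python's standard `sqlite3` module to contrast C-style integer division with division after an explicit `CAST`; the exact SQL that Ibis generates may differ, so the query text here is an assumption.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# C-style semantics: dividing two integers truncates the result.
(c_style,) = con.execute("SELECT 7 / 2").fetchone()

# Casting one operand to a floating-point type yields true division,
# which matches Python 3's `/` operator.
(true_div,) = con.execute("SELECT CAST(7 AS REAL) / 2").fetchone()

print(c_style, true_div)
```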
API Changes
- `ImpalaTable.rename` no longer mutates the calling table expression.
Contributors
$ git log v0.5.0..v0.6.0 --pretty=format:%aN | sort | uniq -c | sort -rn
46 Wes McKinney
3 Uri Laserson
1 Phillip Cloud
1 mariusvniekerk
1 Kristopher Overholt
0.5 (September 10, 2015)
Highlights in this release are SQLite, Python 3, and Impala UDA support, along with an asynchronous execution API. There are also many usability improvements, bug fixes, and other new features.
New features
- SQLite client and built-in function support
- Ibis now supports Python 3.4 as well as 2.6 and 2.7
- Ibis can utilize Impala user-defined aggregate (UDA) functions
- SQLAlchemy-based translation toolchain, enabling support for more SQL engines that have SQLAlchemy dialects
- Many window function usability improvements (nested analytic functions and deferred binding conveniences)
- More convenient aggregation with keyword arguments in `aggregate` functions
- Built preliminary wrapper API for MADLib-on-Impala
- Add `var` and `std` aggregation methods and support in Impala
- Add `nullifzero` numeric method for all SQL engines
- Add `rename` method to Impala tables (for renaming tables in the Hive metastore)
- Add `close` method to `ImpalaClient` for session cleanup (#533)
- Add `relabel` method to table expressions
- Add `insert` method to Impala tables
- Add `compile` and `verify` methods to all expressions to test compilation and ability to compile (since many operations are unavailable in SQLite, for example)
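A typical use for a method like `nullifzero` is guarding a division against zero denominators. The sketch below shows the underlying SQL idea with the standard `NULLIF` function in SQLite via Python's `sqlite3` module. It does not call Ibis, the mapping to `NULLIF(x, 0)` is an assumption, and the table and column names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ratios (num REAL, den REAL)")  # hypothetical table
con.executemany("INSERT INTO ratios VALUES (?, ?)", [(10.0, 4.0), (3.0, 0.0)])

# NULLIF(den, 0) turns a zero denominator into NULL, so the quotient for
# that row is NULL (None in Python) rather than depending on how the
# engine treats division by zero.
quotients = [
    row[0]
    for row in con.execute(
        "SELECT num / NULLIF(den, 0) FROM ratios ORDER BY rowid"
    )
]
print(quotients)
```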
API changes
- Impala Ibis client creation now uses only `ibis.impala.connect`, and `ibis.make_client` has been deprecated
Contributors
$ git log v0.4.0..v0.5.0 --pretty=format:%aN | sort | uniq -c | sort -rn
55 Wes McKinney
9 Uri Laserson
1 Kristopher Overholt
0.4 (August 14, 2015)
New features
- Add tooling to use Impala C++ scalar UDFs within Ibis (#262, #195)
- Support and testing for Kerberos-enabled secure HDFS clusters
- Many table functions can now accept functions as parameters (invoked on the calling table) to enhance composability and emulate late-binding semantics of languages (like R) that have non-standard evaluation (#460)
- Add `any`, `all`, `notany`, and `notall` reductions on boolean arrays, as well as `cumany` and `cumall`
- Using `topk` now produces an analytic expression that is executable (as an aggregation) but can also be used as a filter as before (#392, #91)
- Added experimental database object “usability layer”, see `ImpalaClient.database`.
- Add `TableExpr.info`
- Add `compute_stats` API to table expressions referencing physical Impala tables
- Add `explain` method to `ImpalaClient` to show query plan for an expression
- Add `chmod` and `chown` APIs to `HDFS` interface for superusers
- Add `convert_base` method to strings and integer types
- Add option to `ImpalaClient.create_table` to create empty partitioned tables
- `ibis.cross_join` can now join more than 2 tables at once
- Add `ImpalaClient.raw_sql` method for running naked SQL queries
- `ImpalaClient.insert` now validates schemas locally prior to sending query to cluster, for better usability.
- Add conda installation recipes
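The idea of passing functions that are invoked on the calling table can be sketched in plain Python. The `MiniTable` class below is a hypothetical stand-in written for this example and is not part of Ibis. It only shows how accepting a callable lets a method chain refer to an intermediate table that has no name yet.

```python
class MiniTable:
    """Hypothetical toy table: a list of dict rows (not an Ibis class)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def column(self, name):
        return [row[name] for row in self.rows]

    def filter(self, predicate):
        # A callable is invoked on the calling table to produce a boolean
        # mask; anything else is assumed to already be a mask.
        mask = predicate(self) if callable(predicate) else predicate
        return MiniTable(r for r, keep in zip(self.rows, mask) if keep)


t = MiniTable([{"x": 1}, {"x": 5}, {"x": 9}])

# The second lambda receives the already-filtered table, so it can refer
# to an intermediate result without naming it first.
result = (
    t.filter(lambda tbl: [x > 1 for x in tbl.column("x")])
     .filter(lambda tbl: [x < max(tbl.column("x")) for x in tbl.column("x")])
)
print(result.column("x"))
```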
Contributors
$ git log v0.3.0..v0.4.0 --pretty=format:%aN | sort | uniq -c | sort -rn
38 Wes McKinney
9 Uri Laserson
2 Meghana Vuyyuru
2 Kristopher Overholt
1 Marius van Niekerk
0.3 (July 20, 2015)
First public release. See http://ibis-project.org for more.
New features
- Implement window / analytic function support
- Enable non-equijoins (join clauses with operations other than `==`).
- Add remaining string functions supported by Impala.
- Add `pipe` method to tables (hat-tip to the pandas dev team).
- Add `mutate` convenience method to tables.
- Fleshed out `WebHDFS` implementations: get/put directories, move files, etc. See the full HDFS API.
- Add `truncate` method for timestamp values
- `ImpalaClient` can execute scalar expressions not involving any table.
- Can also create internal Impala tables with a specific HDFS path.
- Make Ibis’s temporary Impala database and HDFS paths configurable (see `ibis.options`).
- Add `truncate_table` function to client (if the user’s Impala cluster supports it).
- Python 2.6 compatibility
- Enable Ibis to execute concurrent queries in multithreaded applications (earlier versions were not thread-safe).
- Test data load script in `scripts/load_test_data.py`
- Add an internal operation type signature API to enhance developer productivity.
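A non-equijoin matches rows with a comparison other than equality, for example assigning each value to the range that contains it. The sketch below shows the SQL shape of such a join with Python's standard `sqlite3` module. The tables and columns are hypothetical and Ibis is not involved.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE events (id INTEGER, ts INTEGER);
    CREATE TABLE ranges (label TEXT, start_ts INTEGER, end_ts INTEGER);
    INSERT INTO events VALUES (1, 5), (2, 15), (3, 25);
    INSERT INTO ranges VALUES ('early', 0, 10), ('late', 10, 20);
    """
)

# The join condition uses range comparisons (>= and <) rather than equality.
matches = con.execute(
    """
    SELECT e.id, r.label
    FROM events e
    JOIN ranges r
      ON e.ts >= r.start_ts AND e.ts < r.end_ts
    ORDER BY e.id
    """
).fetchall()
print(matches)
```

Event 3 falls outside both ranges, so the inner join drops it.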
Contributors
$ git log v0.2.0..v0.3.0 --pretty=format:%aN | sort | uniq -c | sort -rn
59 Wes McKinney
29 Uri Laserson
4 Isaac Hodes
2 Meghana Vuyyuru
0.2 (June 16, 2015)
New features
- `insert` method on Ibis client for inserting data into existing tables.
- `parquet_file`, `delimited_file`, and `avro_file` client methods for querying datasets not yet available in Impala
- New `ibis.hdfs_connect` method and `HDFS` client API for WebHDFS for writing files and directories to HDFS
- New timedelta API and improved timestamp data support
- New `bucket` and `histogram` methods on numeric expressions
- New `category` logical datatype for handling bucketed data, among other things
- Add `summary` API to numeric expressions
- Add `value_counts` convenience API to array expressions
- New string methods `like`, `rlike`, and `contains` for fuzzy and regex searching
- Add `options.verbose` option and configurable `options.verbose_log` callback function for improved query logging and visibility
- Support for new SQL built-in functions
  - `ibis.coalesce`
  - `ibis.greatest` and `ibis.least`
  - `ibis.where` for conditional logic (see also `ibis.case` and `ibis.cases`)
  - `nullif` method on value expressions
  - `ibis.now`
- New aggregate functions: `approx_median`, `approx_nunique`, and `group_concat`
- `where` argument in aggregate functions
- Add `having` method to `group_by` intermediate object
- Added group-by convenience `table.group_by(exprs).COLUMN_NAME.agg_function()`
- Add default expression names to most aggregate functions
- New Impala database client helper methods
  - `create_database`
  - `drop_database`
  - `exists_database`
  - `list_databases`
  - `set_database`
- Client `list_tables` searching / listing method
- Add `add`, `sub`, and other explicit arithmetic methods to value expressions
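A `where` argument on an aggregate restricts which rows feed that one aggregate, without filtering the whole table. One common SQL spelling of this idea is a `CASE` expression inside the aggregate, shown below with Python's standard `sqlite3` module. Whether Ibis emits exactly this SQL is an assumption, and the table is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")  # hypothetical
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("west", 20.0), ("east", 5.0)],
)

# Rows that fail the condition become NULL inside the aggregate and are
# ignored by sum(), while the unconditional sum() still sees every row.
total, east_total = con.execute(
    """
    SELECT sum(amount),
           sum(CASE WHEN region = 'east' THEN amount END)
    FROM sales
    """
).fetchone()
print(total, east_total)
```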
API Changes
- New Ibis client and Impala connection workflow. The client is now composed from an Impala connection and an optional HDFS connection
Bug fixes
- Numerous expression API bugs and rough edges fixed
Contributors
$ git log v0.1.0..v0.2.0 --pretty=format:%aN | sort | uniq -c | sort -rn
71 Wes McKinney
1 Juliet Hougland
1 Isaac Hodes
0.1 (March 26, 2015)
First Ibis release.
- Expression DSL design and type system
- Expression to ImpalaSQL compiler toolchain
- Impala built-in function wrappers
$ git log 84d0435..v0.1.0 --pretty=format:%aN | sort | uniq -c | sort -rn
78 Wes McKinney
1 srus
1 Henry Robinson