API Reference¶
Creating connections¶
These methods are in the ibis module namespace and are your main point of entry to using Ibis.
hdfs_connect ([host, port, protocol, …]) |
Connect to HDFS |
Impala client¶
These methods are available on the Impala client object after connecting to your HDFS cluster (ibis.hdfs_connect) and connecting to Impala with ibis.impala.connect.
connect ([host, port, database, timeout, …]) |
Create an ImpalaClient for use with Ibis. |
ImpalaClient.close () |
Close Impala connection and drop any temporary objects |
ImpalaClient.database ([name]) |
Create a Database object for a given database name that can be used for |
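As a rough illustration of the connection flow described above, the sketch below wires an HDFS connection into an Impala client. The helper name connect_to_impala is hypothetical, the port numbers are assumed common defaults, and hdfs_client is an assumed keyword hidden behind the "…" in the connect signature; check all of them against your installed Ibis version.

```python
def connect_to_impala(hdfs_host, impala_host, database="default"):
    """Return an Impala client that can also reach HDFS (sketch only)."""
    import ibis  # imported lazily so this module loads without Ibis installed

    # Assumption: 50070 is the WebHDFS port and 21050 the Impala port.
    hdfs = ibis.hdfs_connect(host=hdfs_host, port=50070)
    return ibis.impala.connect(
        host=impala_host,
        port=21050,
        database=database,
        hdfs_client=hdfs,  # assumed keyword, not shown in the listing above
    )
```

Call close() on the returned client when you are finished so that the connection and any temporary objects are released.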
Database methods¶
ImpalaClient.set_database (name) |
Set the default database scope for client |
ImpalaClient.create_database (name[, path, force]) |
Create a new Impala database |
ImpalaClient.drop_database (name[, force]) |
Drop an Impala database |
ImpalaClient.list_databases ([like]) |
List databases in the Impala cluster. |
ImpalaClient.exists_database (name) |
Checks if a given database exists |
ImpalaDatabase.create_table (table_name[, obj]) |
Dispatch to ImpalaClient.create_table. |
ImpalaDatabase.drop ([force]) |
Drop the database |
ImpalaDatabase.namespace (ns) |
Creates a derived Database instance for collections of objects having a common prefix. |
ImpalaDatabase.table (name) |
Return a table expression referencing a table in this database |
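The database methods listed above can be combined into a small housekeeping routine. This is a minimal sketch, assuming client is an already connected ImpalaClient; ensure_database is a hypothetical helper name, and only methods from the listing are called.

```python
def ensure_database(client, name):
    """Create `name` if it is missing, then make it the default scope."""
    if not client.exists_database(name):
        client.create_database(name)
    client.set_database(name)
    # Return the current list of databases so the caller can confirm the result.
    return client.list_databases()
```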
Table methods¶
The ImpalaClient object itself has many helper utility methods, but most table-level methods are found on ImpalaTable.
ImpalaClient.database ([name]) |
Create a Database object for a given database name that can be used for |
ImpalaClient.table (name[, database]) |
Create a table expression that references a particular table in the |
ImpalaClient.sql (query) |
Convert a SQL query to an Ibis table expression |
ImpalaClient.raw_sql (query[, results]) |
Execute a given query string. |
ImpalaClient.list_tables ([like, database]) |
List tables in the current (or indicated) database. |
ImpalaClient.exists_table (name[, database]) |
Determine if the indicated table or view exists |
ImpalaClient.drop_table (table_name[, …]) |
Drop an Impala table |
ImpalaClient.create_table (table_name[, obj, …]) |
Create a new table in Impala using an Ibis table expression. |
ImpalaClient.insert (table_name[, obj, …]) |
Insert into existing table. |
ImpalaClient.truncate_table (table_name[, …]) |
Delete all rows from, but do not drop, an existing table |
ImpalaClient.get_schema (table_name[, database]) |
Return a Schema object for the indicated table and database |
ImpalaClient.cache_table (table_name[, …]) |
Caches a table in cluster memory in the given pool. |
ImpalaClient.load_data (table_name, path[, …]) |
Wraps the LOAD DATA DDL statement. |
ImpalaClient.get_options () |
Return current query options for the Impala session |
ImpalaClient.set_options (options) |
|
ImpalaClient.set_compression_codec (codec) |
|
The best way to interact with a single table is through the ImpalaTable object you get back from ImpalaClient.table.
ImpalaTable.add_partition (spec[, location]) |
Add a new table partition, creating any new directories in HDFS if necessary. |
ImpalaTable.alter ([location, format, …]) |
Change settings and parameters of the table. |
ImpalaTable.alter_partition (spec[, …]) |
Change settings and parameters of an existing partition |
ImpalaTable.column_stats () |
Return results of SHOW COLUMN STATS as a pandas DataFrame |
ImpalaTable.compute_stats ([incremental, async]) |
Invoke Impala COMPUTE STATS command to compute column, table, and partition statistics. |
ImpalaTable.describe_formatted () |
Return parsed results of DESCRIBE FORMATTED statement |
ImpalaTable.drop () |
Drop the table from the database |
ImpalaTable.drop_partition (spec) |
Drop an existing table partition |
ImpalaTable.files () |
Return results of SHOW FILES statement |
ImpalaTable.insert ([obj, overwrite, …]) |
Insert into Impala table. |
ImpalaTable.invalidate_metadata () |
|
ImpalaTable.load_data (path[, overwrite, …]) |
Wraps the LOAD DATA DDL statement. |
ImpalaTable.metadata () |
Return parsed results of DESCRIBE FORMATTED statement |
ImpalaTable.partition_schema () |
For partitioned tables, return the schema (names and types) for the |
ImpalaTable.partitions () |
Return a pandas.DataFrame giving information about this table’s partitions. |
ImpalaTable.refresh () |
|
ImpalaTable.rename (new_name[, database]) |
Rename table inside Impala. |
ImpalaTable.stats () |
Return results of SHOW TABLE STATS as a DataFrame. |
Creating views is also possible:
ImpalaClient.create_view (name, expr[, database]) |
Create an Impala view from a table expression |
ImpalaClient.drop_view (name[, database, force]) |
Drop an Impala view |
ImpalaClient.drop_table_or_view (name[, …]) |
Attempt to drop a relation that may be a view or table |
Accessing data formats in HDFS¶
ImpalaClient.avro_file (hdfs_dir, avro_schema) |
Create a (possibly temporary) table to read a collection of Avro data. |
ImpalaClient.delimited_file (hdfs_dir, schema) |
Interpret delimited text files (CSV / TSV / etc.) as an Ibis table. |
ImpalaClient.parquet_file (hdfs_dir[, …]) |
Make indicated parquet file in HDFS available as an Ibis table. |
Executing expressions¶
ImpalaClient.execute (expr[, params, limit, …]) |
Compile and execute Ibis expression using this backend client |
ImpalaClient.disable_codegen ([disabled]) |
Turn off or on LLVM codegen in Impala query execution |
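A minimal sketch of executing an expression through the client, assuming client is an already connected ImpalaClient. The helper name preview is hypothetical; it uses only table and execute with the limit argument from the listing above.

```python
def preview(client, table_name, n=5):
    """Fetch at most `n` rows of a table through the client."""
    table = client.table(table_name)       # table expression, nothing runs yet
    return client.execute(table, limit=n)  # compile and run on the backend
```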
PostgreSQL client¶
The PostgreSQL client is accessible through the ibis.postgres namespace.
Use ibis.postgres.connect with a SQLAlchemy-compatible connection string to create a client.
connect ([host, user, password, port, …]) |
Create an Ibis client located at user:password@host:port connected to a PostgreSQL database named database. |
PostgreSQLClient.database ([name]) |
Connect to a database called name. |
PostgreSQLClient.list_tables ([like, …]) |
|
PostgreSQLClient.list_databases () |
|
PostgreSQLClient.table (name[, database, schema]) |
Create a table expression that references a particular table called name in a PostgreSQL database called database. |
SQLite client¶
The SQLite client is accessible through the ibis.sqlite namespace.
Use ibis.sqlite.connect to create a SQLite client.
connect ([path, create]) |
Create an Ibis client connected to a SQLite database. |
SQLiteClient.attach (name, path[, create]) |
Connect another SQLite database file |
SQLiteClient.database ([name]) |
Create a Database object for a given database name that can be used for |
SQLiteClient.list_tables ([like, database, …]) |
|
SQLiteClient.table (name[, database]) |
Create a table expression that references a particular table in the |
HDFS¶
Client objects have an hdfs attribute you can use to interact directly with HDFS.
HDFS.ls (hdfs_path[, status]) |
Return contents of directory |
HDFS.chmod (hdfs_path, permissions) |
Change permissions of a file or directory |
HDFS.chown (hdfs_path[, owner, group]) |
Change owner (and/or group) of a file or directory |
HDFS.get (hdfs_path[, local_path, overwrite]) |
Download remote file or directory to the local filesystem |
HDFS.head (hdfs_path[, nbytes, offset]) |
Retrieve the requested number of bytes from a file |
HDFS.put (hdfs_path, resource[, overwrite, …]) |
Write file or directory to HDFS |
HDFS.put_tarfile (hdfs_path, local_path[, …]) |
Write contents of tar archive to HDFS directly without having to |
HDFS.rm (path) |
Delete a single file |
HDFS.rmdir (path) |
Delete a directory and all its contents |
HDFS.size (hdfs_path) |
Return total size of file or directory |
HDFS.status (path) |
Top-level expression APIs¶
These methods are available directly in the ibis module namespace.
case () |
Similar to the .case method on array expressions, create a case builder |
literal (value[, type]) |
Create a scalar expression from a Python value. |
schema ([pairs, names, types]) |
Validate and return an Ibis Schema object |
table (schema[, name]) |
Create an unbound Ibis table for creating expressions. |
timestamp (value) |
Returns a timestamp literal if value is likely coercible to a timestamp |
where (boolean_expr, true_expr, false_null_expr) |
Equivalent to the ternary expression: if X then Y else Z |
ifelse (arg, true_expr, false_expr) |
Shorthand for implementing ternary expressions |
coalesce (*args) |
Compute the first non-null value(s) from the passed arguments in left-to-right order. |
greatest (*args) |
Compute the largest value (row-wise, if any arrays are present) among the supplied arguments. |
least (*args) |
Compute the smallest value (row-wise, if any arrays are present) among the supplied arguments. |
negate (arg) |
Negate a numeric expression |
desc (expr) |
Create a sort key (when used in sort_by) by the passed array expression or column name. |
now () |
Compute the current timestamp |
NA |
A scalar value expression representing NULL |
null () |
Create a NULL/NA scalar |
expr_list (exprs) |
|
row_number () |
Analytic function for the current row number, starting at 0 |
window ([preceding, following, group_by, …]) |
Create a window clause for use with window (analytic and aggregate) functions. |
trailing_window (periods[, group_by, order_by]) |
Create a trailing window for use with aggregate window functions. |
cumulative_window ([group_by, order_by]) |
Create a cumulative window clause for use with aggregate window functions. |
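To make the difference between the trailing and cumulative windows listed above concrete, here is a plain-Python model of a windowed sum. It is not Ibis code, and it assumes that a trailing window of periods rows covers the current row plus that many preceding rows, while a cumulative window covers everything from the first row up to the current one.

```python
def trailing_sum(values, periods):
    """Sum of the current row and up to `periods` preceding rows."""
    result = []
    for i in range(len(values)):
        start = max(0, i - periods)  # clip the window at the first row
        result.append(sum(values[start:i + 1]))
    return result


def cumulative_sum(values):
    """Sum of all rows from the first row through the current row."""
    return [sum(values[:i + 1]) for i in range(len(values))]
```

With an ordered input of 1, 2, 3, 4 and periods set to 1, the trailing version adds each value to its predecessor, while the cumulative version keeps a running total.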
General expression methods¶
Expr.compile ([limit, params]) |
Compile expression to whatever execution target, to verify |
Expr.equals (other[, cache]) |
|
Expr.execute ([limit, async, params]) |
If this expression is based on physical tables in a database backend, execute it against that backend. |
Expr.pipe (f, *args, **kwargs) |
Generic composition function to enable expression pipelining. |
Expr.verify () |
Returns True if expression can be compiled to its attached client |
Table methods¶
TableExpr.add_column (expr[, name]) |
Add indicated column expression to table, producing a new table. |
TableExpr.aggregate (table[, metrics, by, having]) |
Aggregate a table with a given set of reductions, with grouping expressions, and post-aggregation filters. |
TableExpr.count () |
Returns the computed number of rows in the table expression |
TableExpr.distinct () |
Compute set of unique rows/tuples occurring in this table |
TableExpr.info ([buf]) |
Similar to pandas DataFrame.info. |
TableExpr.filter (table, predicates) |
Select rows from table based on boolean expressions |
TableExpr.get_column (name) |
Get a reference to a single column from the table |
TableExpr.get_columns (iterable) |
Get multiple columns from the table |
TableExpr.group_by ([by]) |
Create an intermediate grouped table expression, pending some group operation to be applied with it. |
TableExpr.groupby ([by]) |
Create an intermediate grouped table expression, pending some group operation to be applied with it. |
TableExpr.limit (table, n[, offset]) |
Select the first n rows at beginning of table (may not be deterministic depending on implementation and presence of a sorting). |
TableExpr.mutate (table[, exprs]) |
Convenience function for table projections involving adding columns |
TableExpr.projection (table, exprs) |
Compute new table expression with the indicated column expressions from this table. |
TableExpr.relabel (table, substitutions[, …]) |
Change table column names, otherwise leaving table unaltered |
TableExpr.schema () |
Get the schema for this table (if one is known) |
TableExpr.set_column (table, name, expr) |
Replace an existing column with a new expression |
TableExpr.sort_by (table, sort_exprs) |
Sort table by the indicated column expressions and sort orders |
TableExpr.union (left, right[, distinct]) |
Form the table set union of two table expressions having identical schemas. |
TableExpr.view () |
Create a new table expression that is semantically equivalent to the current one, but is considered a distinct relation for evaluation purposes (e.g. |
TableExpr.join (left, right[, predicates, how]) |
Perform a relational join between two tables. |
TableExpr.cross_join (*tables, **kwargs) |
Perform a cross join (cartesian product) amongst a list of tables, with |
TableExpr.inner_join (other[, predicates]) |
Perform a relational join between two tables. |
TableExpr.left_join (other[, predicates]) |
Perform a relational join between two tables. |
TableExpr.outer_join (other[, predicates]) |
Perform a relational join between two tables. |
TableExpr.semi_join (other[, predicates]) |
Perform a relational join between two tables. |
TableExpr.anti_join (other[, predicates]) |
Perform a relational join between two tables. |
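The table methods above compose into a query without touching a backend. The following is a sketch only: the table name t and the columns g and x are hypothetical, the import is deferred so the snippet loads without Ibis installed, and the exact calling conventions may differ between Ibis versions.

```python
def build_positive_totals():
    """Build (but do not execute) a filtered, grouped aggregation."""
    import ibis  # deferred import; requires Ibis when the function is called

    schema = ibis.schema([("g", "string"), ("x", "double")])
    t = ibis.table(schema, name="t")         # unbound table expression

    positive = t.filter([t.x > 0])           # keep rows with x > 0
    total = positive.x.sum().name("total")   # named reduction
    return positive.group_by("g").aggregate([total])
```

The returned expression can later be passed to a client's execute method, or compiled, once the table is bound to a real backend.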
Grouped table methods¶
GroupedTableExpr.aggregate ([metrics]) |
|
GroupedTableExpr.count ([metric_name]) |
Convenience function for computing the group sizes (number of rows per group) given a grouped table. |
GroupedTableExpr.having (expr) |
Add a post-aggregation result filter (like the having argument in |
GroupedTableExpr.mutate ([exprs]) |
Returns a table projection with analytic / window functions applied. |
GroupedTableExpr.order_by (expr) |
Expressions to use for ordering data for a window function computation. |
GroupedTableExpr.over (window) |
Add a window clause to be applied to downstream analytic expressions |
GroupedTableExpr.projection (exprs) |
Like mutate, but do not include existing table columns |
GroupedTableExpr.size ([metric_name]) |
Convenience function for computing the group sizes (number of rows per group) given a grouped table. |
Generic value methods¶
Scalar or column methods¶
ValueExpr.between (arg, lower, upper) |
Check if the input expr falls between the lower/upper bounds passed. |
ValueExpr.cast (arg, target_type) |
Cast value(s) to indicated data type. |
ValueExpr.coalesce (*args) |
Compute the first non-null value(s) from the passed arguments in left-to-right order. |
ValueExpr.fillna (arg, fill_value) |
Replace any null values with the indicated fill value |
ValueExpr.isin (arg, values) |
Check whether the value expression is contained within the indicated list of values. |
ValueExpr.notin (arg, values) |
Like isin, but checks whether this expression’s value(s) are not contained in the passed values. |
ValueExpr.nullif (value, null_if_expr) |
Set values to null if they match/equal a particular expression (scalar or array-valued). |
ValueExpr.hash (arg[, how]) |
Compute an integer hash value for the indicated value expression. |
ValueExpr.isnull (arg) |
Returns true if values are null |
ValueExpr.notnull (arg) |
Returns true if values are not null |
ValueExpr.over (expr, window) |
Turn an aggregation or full-sample analytic operation into a windowed operation. |
ValueExpr.typeof (arg) |
Return the data type of the argument according to the current backend |
ValueExpr.case (arg) |
Create a new SimpleCaseBuilder to chain multiple if-else statements. |
ValueExpr.cases (arg, case_result_pairs[, …]) |
Create a case expression in one shot. |
ValueExpr.substitute (arg, value[, …]) |
Substitute (replace) one or more values in a value expression |
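The null-handling methods above follow familiar SQL semantics. As a plain-Python model (not Ibis code), with None standing in for NULL and each function acting on a single scalar value:

```python
def coalesce(*args):
    """First non-null argument, scanning left to right; None if all are null."""
    for arg in args:
        if arg is not None:
            return arg
    return None


def nullif(value, null_if_value):
    """None when `value` equals `null_if_value`, otherwise `value`."""
    return None if value == null_if_value else value


def fillna(value, fill_value):
    """`fill_value` when `value` is null, otherwise `value`."""
    return fill_value if value is None else value
```

On a column, the same rule would be applied to every row independently.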
Column methods¶
ColumnExpr.distinct (arg) |
Compute set of unique values occurring in this array. |
ColumnExpr.count (expr[, where]) |
Compute cardinality / sequence size of expression. |
ColumnExpr.min ([where]) |
|
ColumnExpr.max ([where]) |
|
ColumnExpr.approx_median ([where]) |
|
ColumnExpr.approx_nunique ([where]) |
|
ColumnExpr.group_concat (arg[, sep]) |
Concatenate values using the indicated separator (comma by default) to |
ColumnExpr.nunique (arg) |
Shorthand for foo.distinct().count(); computing the number of unique values in an array. |
ColumnExpr.summary (arg[, exact_nunique, prefix]) |
Compute a set of summary metrics from the input value expression |
ColumnExpr.value_counts (arg[, metric_name]) |
Compute a frequency table for this value expression |
ColumnExpr.first (arg) |
|
ColumnExpr.last (arg) |
|
ColumnExpr.dense_rank (arg) |
Compute position of first element within each equal-value group in sorted order, ignoring duplicate values. |
ColumnExpr.rank (arg) |
Compute position of first element within each equal-value group in sorted order. |
ColumnExpr.lag (arg[, offset, default]) |
|
ColumnExpr.lead (arg[, offset, default]) |
|
ColumnExpr.cummin (arg) |
Cumulative min. |
ColumnExpr.cummax (arg) |
Cumulative max. |
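The distinction between rank and dense_rank above is easiest to see on a small input. The plain-Python model below is not Ibis code, and it assumes positions start at 0, mirroring the documented behaviour of row_number; verify the base index against your backend.

```python
def rank(values):
    """Position of the first element of each equal-value group in sorted order."""
    ordered = sorted(values)
    return [ordered.index(v) for v in values]


def dense_rank(values):
    """Like rank, but duplicates do not leave gaps in the numbering."""
    distinct = sorted(set(values))
    return [distinct.index(v) for v in values]
```

For the input 1, 1, 2, 2, 2, 3 the first function skips positions after each tie, while the second numbers the distinct values consecutively.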
General numeric methods¶
Scalar or column methods¶
NumericValue.abs (arg) |
Absolute value |
NumericValue.ceil (arg) |
Round up to the nearest integer value greater than or equal to this value |
NumericValue.floor (arg) |
Round down to the nearest integer value less than or equal to this value |
NumericValue.sign (arg) |
|
NumericValue.exp (arg) |
|
NumericValue.sqrt (arg) |
|
NumericValue.log (arg[, base]) |
Perform the logarithm using a specified base |
NumericValue.ln (arg) |
Natural logarithm |
NumericValue.log2 (arg) |
Logarithm base 2 |
NumericValue.log10 (arg) |
Logarithm base 10 |
NumericValue.round (arg[, digits]) |
Round values either to integer or indicated number of decimal places. |
NumericValue.nullifzero (arg) |
Set values to NULL if they are equal to zero. |
NumericValue.zeroifnull (arg) |
|
NumericValue.add (other) |
|
NumericValue.sub (other) |
|
NumericValue.mul (other) |
|
NumericValue.div (other) |
|
NumericValue.pow (other) |
|
NumericValue.rdiv (other) |
|
NumericValue.rsub (other) |
Column methods¶
NumericColumn.sum ([where]) |
|
NumericColumn.mean ([where]) |
|
NumericColumn.std (arg[, where, how]) |
Compute standard deviation of numeric array |
NumericColumn.var (arg[, where, how]) |
Compute variance of numeric array |
NumericColumn.cumsum (arg) |
Cumulative sum. |
NumericColumn.cummean (arg) |
Cumulative mean. |
NumericColumn.bottomk (arg, k[, by]) |
|
NumericColumn.topk (arg, k[, by]) |
Produces |
NumericColumn.bucket (arg, buckets[, closed, …]) |
Compute a discrete binning of a numeric array |
NumericColumn.histogram (arg[, nbins, …]) |
Compute a histogram with fixed width bins |
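To show how the std and var reductions relate, here is a plain-Python check using the standard library rather than Ibis. It assumes the how argument selects between the sample and population formulas, which is a common convention but is not spelled out in the listing above.

```python
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Population formulas divide by n.
pop_var = statistics.pvariance(values)
pop_std = statistics.pstdev(values)

# Sample formulas divide by n - 1.
sample_var = statistics.variance(values)
sample_std = statistics.stdev(values)
```

In both conventions the standard deviation is the square root of the variance, so the two reductions should never return the same number unless the variance is 0 or 1.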
Integer methods¶
Scalar or column methods¶
IntegerValue.convert_base (arg, from_base, …) |
Convert number (as integer or string) from one base to another |
IntegerValue.to_timestamp (arg[, unit]) |
Convert integer UNIX timestamp (at some resolution) to a timestamp type |
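As an illustration of what a base conversion like convert_base computes, here is a plain-Python model; it is not the Ibis implementation. Returning a string with upper-case digits is an assumption about the output format.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def convert_base(value, from_base, to_base):
    """Re-express `value` (int or str, written in `from_base`) in `to_base`."""
    number = int(str(value), from_base)  # parse the digits in the source base
    if number == 0:
        return "0"
    sign = "-" if number < 0 else ""
    number = abs(number)
    digits = []
    while number:
        number, remainder = divmod(number, to_base)
        digits.append(DIGITS[remainder])
    return sign + "".join(reversed(digits))
```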
String methods¶
All string operations are valid on either scalar or array values.
StringValue.convert_base (arg, from_base, to_base) |
Convert number (as integer or string) from one base to another |
StringValue.length (arg) |
Compute length of strings |
StringValue.lower (arg) |
Convert string to all lowercase |
StringValue.upper (arg) |
Convert string to all uppercase |
StringValue.reverse (arg) |
|
StringValue.ascii_str (arg) |
|
StringValue.strip (arg) |
Remove whitespace from left and right sides of string |
StringValue.lstrip (arg) |
Remove whitespace from left side of string |
StringValue.rstrip (arg) |
Remove whitespace from right side of string |
StringValue.capitalize (arg) |
|
StringValue.contains (arg, substr) |
Determine if indicated string is exactly contained in the calling string. |
StringValue.like (patterns) |
Wildcard fuzzy matching function equivalent to the SQL LIKE directive. |
StringValue.parse_url (arg, extract[, key]) |
Returns the portion of a URL corresponding to a part specified |
StringValue.substr (start[, length]) |
Pull substrings out of each string value by position and maximum length. |
StringValue.left (nchars) |
Return left-most up to N characters from each string. |
StringValue.right (nchars) |
Return right-most up to N characters from each string. |
StringValue.repeat (n) |
Returns the argument string repeated n times |
StringValue.find (substr[, start, end]) |
Returns position (0 indexed) of first occurrence of substring, |
StringValue.translate (from_str, to_str) |
Returns string with set of ‘from’ characters replaced by set of ‘to’ characters. |
StringValue.find_in_set (str_list) |
Returns position (0 indexed) of first occurrence of argument within a list of strings. |
StringValue.join (strings) |
Joins a list of strings together using the calling string as a separator |
StringValue.replace (arg, pattern, replacement) |
Replaces each exact occurrence of pattern with given replacement string. |
StringValue.lpad (length[, pad]) |
Returns string of given length by truncating (on right) |
StringValue.rpad (length[, pad]) |
Returns string of given length by truncating (on right) |
StringValue.rlike (arg, pattern) |
Search string values using a regular expression. |
StringValue.re_search (arg, pattern) |
Search string values using a regular expression. |
StringValue.re_extract (arg, pattern, index) |
Returns specified index, 0 indexed, from string based on regex pattern |
StringValue.re_replace (arg, pattern, replacement) |
Replaces match found by regex with replacement string. |
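Several of the string methods above have close analogues on Python's built-in str type. The snippet below uses those analogues on a single scalar value as a rough guide to the documented behaviour; it is not Ibis code, and backend results may differ in edge cases such as a missing substring.

```python
text = "ibis"

left_two = text[:2]                      # left(2): left-most two characters
right_two = text[-2:]                    # right(2): right-most two characters
repeated = text * 2                      # repeat(2)
position = text.find("is")               # find("is"): 0-indexed position
set_position = ["a", "b", "c"].index("b")  # find_in_set: 0-indexed position
joined = "-".join(["a", "b", "c"])       # join: calling string is the separator
translated = "abc".translate(str.maketrans("ab", "xy"))  # translate
```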
Timestamp methods¶
All timestamp operations are valid on either scalar or array values.
TimestampValue.truncate (arg, unit) |
Zero out smaller-size units beyond indicated unit. |
TimestampValue.year () |
|
TimestampValue.month () |
|
TimestampValue.day () |
|
TimestampValue.hour () |
|
TimestampValue.minute () |
|
TimestampValue.second () |
|
TimestampValue.millisecond () |
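The truncate method zeroes out every unit smaller than the one indicated. A plain-Python model using datetime is sketched below; it is not Ibis code, and the spelled-out unit names are hypothetical stand-ins for whatever unit codes your Ibis version accepts.

```python
from datetime import datetime

# Units from coarse to fine, with the value each is reset to when truncated.
_RESET = [("year", None), ("month", 1), ("day", 1), ("hour", 0),
          ("minute", 0), ("second", 0), ("microsecond", 0)]


def truncate(ts, unit):
    """Reset every field finer than `unit` to its smallest value."""
    names = [name for name, _ in _RESET]
    keep = names.index(unit)  # raises ValueError for an unknown unit
    finer = {name: value for name, value in _RESET[keep + 1:]}
    return ts.replace(**finer)
```

For example, truncating to the hour keeps the date and hour and clears minutes, seconds, and fractional seconds.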
Boolean methods¶
BooleanValue.ifelse (arg, true_expr, false_expr) |
Shorthand for implementing ternary expressions |
BooleanColumn.any (arg) |
|
BooleanColumn.all (arg) |
|
BooleanColumn.cumany (arg) |
Cumulative any |
BooleanColumn.cumall (arg) |
Cumulative all |
Category methods¶
Category is a logical type with either a known or unknown cardinality. Values are represented semantically as integers starting at 0.
CategoryValue.label (arg, labels[, nulls]) |
Format a known number of categories as strings |
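Because category values are integers starting at 0, labelling amounts to an index lookup. The plain-Python model below is not Ibis code; it assumes the nulls argument supplies the label used for missing values.

```python
def label(codes, labels, nulls=None):
    """Map integer category codes to strings; None codes become `nulls`."""
    return [nulls if code is None else labels[code] for code in codes]
```

A column with codes 0, 2, 1 and the labels low, mid, high would therefore read low, high, mid.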
Decimal methods¶
DecimalValue.precision (arg) |
|
DecimalValue.scale (arg) |