pdmongo

pdmongo.read_mongo(collection, query, db, index_col=None, extra=None, columns=None, chunksize=None)

Read MongoDB query into a DataFrame.

Returns a DataFrame corresponding to the result set of the query. Optionally, pass index_col to use one or more of the columns as the index; otherwise a default integer index is used.

Parameters
  • collection (str) – Mongo collection to select for querying

  • query (list) – Must be an aggregation pipeline (a list of stages). The input is passed unchanged to pymongo's .aggregate method.

  • db (pymongo.database.Database or database string URI) – The database to use

  • index_col (str or list of str, optional, default: None) – Column(s) to set as the index (a MultiIndex if several are given).

  • extra (list, tuple or dict, optional, default: None) – List of parameters to pass to find/aggregate method.

  • chunksize (int, default None) – If specified, return an iterator where chunksize is the number of docs to include in each chunk.

Returns

DataFrame, or an iterator of DataFrames if chunksize is specified
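
A minimal usage sketch of read_mongo. It assumes a MongoDB server reachable at mongodb://localhost:27017/ and a hypothetical database "shop" with an "orders" collection whose documents carry "status", "customer" and "total" fields; none of these names come from pdmongo itself. The helper names build_pipeline, load_orders and iter_orders are also illustrative. The pdmongo import is deferred into the functions so the pipeline can be built and inspected without a live connection.

```python
from typing import Any, Dict, List


def build_pipeline(status: str) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline (a list of stages) to pass as `query`."""
    return [
        {"$match": {"status": status}},                       # filter documents
        {"$project": {"_id": 0, "customer": 1, "total": 1}},  # keep two fields
    ]


def load_orders(db_uri: str = "mongodb://localhost:27017/shop"):
    """Read matching documents into one DataFrame indexed by customer."""
    import pdmongo as pdm  # deferred: requires pdmongo and a reachable server

    return pdm.read_mongo(
        "orders", build_pipeline("shipped"), db_uri, index_col="customer"
    )


def iter_orders(db_uri: str = "mongodb://localhost:27017/shop", chunksize: int = 500):
    """Yield DataFrames of up to `chunksize` documents each."""
    import pdmongo as pdm  # deferred for the same reason as above

    # With chunksize set, read_mongo is documented to return an iterator.
    for chunk in pdm.read_mongo(
        "orders", build_pipeline("shipped"), db_uri, chunksize=chunksize
    ):
        yield chunk
```

Calling load_orders() would return a single DataFrame, while iterating over iter_orders() processes the result set in batches, which may help when the result set is too large to hold in memory at once.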

pdmongo.to_mongo(frame, name, db, if_exists='fail', index=True, index_label=None, chunksize=None)

Write records stored in a DataFrame to a MongoDB collection.

Parameters
  • frame (DataFrame, Series) – The data to write.

  • name (str) – Name of collection.

  • db (pymongo.database.Database or database string URI) – The database to write to

  • if_exists ({‘fail’, ‘replace’, ‘append’}, default ‘fail’) –

    • fail: If the collection exists, do nothing.

    • replace: If the collection exists, drop it, recreate it, and insert data.

    • append: If the collection exists, insert data. Create it if it does not exist.

  • index (boolean, default True) – Write DataFrame index as a column.

  • index_label (str or sequence, optional) – Column label for the index column(s). If None is given (default) and index is True, the index names are used. A sequence should be given if the DataFrame uses a MultiIndex.

  • chunksize (int, optional) – Specify the number of rows in each batch to be written at a time. By default, all rows will be written at once.
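
A minimal usage sketch of to_mongo, under the same assumptions as any local test setup: a MongoDB server at mongodb://localhost:27017/ and a hypothetical "shop" database with an "orders" collection. The names SAMPLE_ROWS, VALID_MODES and write_orders are illustrative and not part of pdmongo. The mode check mirrors the three if_exists values listed above; pandas and pdmongo are imported inside the function so the check itself runs without either installed.

```python
from typing import Any, Dict, List

# The three values documented for the if_exists parameter.
VALID_MODES = ("fail", "replace", "append")

# Hypothetical sample data; field names are illustrative only.
SAMPLE_ROWS: List[Dict[str, Any]] = [
    {"customer": "alice", "total": 12.5},
    {"customer": "bob", "total": 7.0},
]


def write_orders(
    rows: List[Dict[str, Any]],
    db_uri: str = "mongodb://localhost:27017/shop",
    mode: str = "append",
) -> None:
    """Write rows to the 'orders' collection using the given if_exists mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"if_exists must be one of {VALID_MODES}, got {mode!r}")

    import pandas as pd    # deferred: only needed when actually writing
    import pdmongo as pdm  # deferred: requires a reachable MongoDB server

    frame = pd.DataFrame(rows).set_index("customer")
    pdm.to_mongo(
        frame,
        "orders",
        db_uri,
        if_exists=mode,
        index=True,                # write the index as a column
        index_label="customer",    # label for that column
        chunksize=1000,            # write in batches of 1000 rows
    )
```

With a server running, write_orders(SAMPLE_ROWS, mode="replace") would drop and recreate the collection before inserting, whereas the default mode="append" adds to whatever is already there.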