pyjstat

pyjstat is a Python module for manipulating JSON-stat formatted data.

This module reads and writes the JSON-stat [1] format with Python, using the data frame structures provided by the widely used pandas library [2]. JSON-stat is a simple, lightweight JSON format for data dissemination. pyjstat is inspired by rjstat [3], a library by ajschumacher for reading and writing JSON-stat with R.

pyjstat is written and maintained by Miguel Expósito Martín and is distributed under the Apache 2.0 License (see LICENSE file).

[1] http://json-stat.org/ for JSON-stat information
[2] http://pandas.pydata.org for Python Data Analysis Library information
[3] https://github.com/ajschumacher/rjstat for rjstat library information

Example

Importing a JSON-stat file into a pandas data frame can be done as follows:

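import json
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2
import pyjstat

# Fetch the sample dataset, deserialize it, and decode it into data frames.
url = 'http://json-stat.org/samples/oecd-canada.json'
results = pyjstat.from_json_stat(json.load(urlopen(url)))
print(results)

class pyjstat.NumpyEncoder(skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, encoding='utf-8', default=None)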

Custom JSON encoder class for Numpy data types.
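
The usual way to build such an encoder is to subclass json.JSONEncoder and override default(). The sketch below illustrates that pattern only; it is not pyjstat's actual code. ArrayFriendlyEncoder and FakeArray are hypothetical names, and the sketch assumes that the objects to convert expose a tolist() method, as NumPy arrays and scalars do.

```python
import json


class ArrayFriendlyEncoder(json.JSONEncoder):
    """Hypothetical stand-in, not pyjstat's actual NumpyEncoder."""

    def default(self, obj):
        # NumPy arrays and scalars expose tolist(), which returns plain
        # Python lists and numbers that the json module can serialize.
        if hasattr(obj, 'tolist'):
            return obj.tolist()
        # Anything else is left to the base class, which raises TypeError.
        return json.JSONEncoder.default(self, obj)


class FakeArray(object):
    """Tiny stand-in for a NumPy array so the sketch needs no NumPy."""

    def __init__(self, items):
        self.items = list(items)

    def tolist(self):
        return self.items


encoded = json.dumps({'value': FakeArray([1, 2, 3])}, cls=ArrayFriendlyEncoder)
print(encoded)
```

An encoder class of this kind is passed to json.dumps() through the cls argument, as in the last statement of the sketch.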

pyjstat.check_input(naming)

Check and validate input params.

Parameters: naming (string) – a string containing the naming type (‘label’ or ‘id’).
Returns: nothing.
Raises: ValueError – if the parameter is not in the allowed list.
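
The contract described above can be illustrated with a minimal sketch. It assumes the allowed list is exactly ‘label’ and ‘id’, as the parameter description suggests; check_naming and ALLOWED_NAMING are hypothetical names, not part of pyjstat.

```python
ALLOWED_NAMING = ('label', 'id')  # assumed allowed list


def check_naming(naming):
    """Hypothetical validator mirroring the documented contract."""
    if naming not in ALLOWED_NAMING:
        raise ValueError(
            "naming must be 'label' or 'id', got %r" % (naming,))


check_naming('label')  # valid input: returns None without raising

try:
    check_naming('name')
except ValueError as error:
    message = str(error)
print(message)
```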

pyjstat.from_json_stat(datasets, naming='label', value='value')

Decode JSON-stat formatted data into pandas.DataFrame object.

Parameters:
  • datasets (OrderedDict, list) – data in JSON-stat format, previously deserialized to a python object by json.load() or json.loads(), for example. Both List and OrderedDict are accepted as inputs.
  • naming (string, optional) – dimension naming. Possible values: ‘label’ or ‘id’. Defaults to ‘label’.
  • value (string, optional) – name of the value column. Defaults to ‘value’.
Returns:

results – list of pandas.DataFrame with imported data.

Return type:

list
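
Since the datasets parameter is documented as an OrderedDict, one way to obtain it is to deserialize with an object_pairs_hook. The sketch below uses only the standard library. The JSON text is a made-up miniature that shows key ordering only; it is not a valid JSON-stat document.

```python
import json
from collections import OrderedDict

# Made-up miniature document; the keys are illustrative, not real data.
raw = '{"second": {"value": [3]}, "first": {"value": [1, 2]}}'

# object_pairs_hook keeps keys in document order on every Python version.
datasets = json.loads(raw, object_pairs_hook=OrderedDict)
print(list(datasets))

# A real JSON-stat document deserialized this way could then be decoded
# with the call documented above, for example:
# frames = pyjstat.from_json_stat(datasets, naming='id')
```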

pyjstat.generate_df(js_dict, naming, value='value')

Decode JSON-stat dict into pandas.DataFrame object. Helper method that should be called inside from_json_stat().

Parameters:
  • js_dict (OrderedDict) – OrderedDict with data in JSON-stat format, previously deserialized into a python object by json.load() or json.loads(), for example.
  • naming (string) – dimension naming. Possible values: ‘label’ or ‘id’.
  • value (string, optional) – name of the value column. Defaults to ‘value’.
Returns:

output – pandas.DataFrame with converted data.

Return type:

DataFrame

pyjstat.get_df_row(dimensions, naming='label', i=0, record=None)

Generate row dimension values for a pandas dataframe.

Parameters:
  • dimensions (list) – list of pandas dataframes with dimension labels generated by get_dim_label or get_dim_index methods.
  • naming (string, optional) – dimension naming. Possible values: ‘label’ or ‘id’.
  • i (int, optional) – index into the dimension list. Defaults to 0; it is used by the recursive calls to the method.
  • record (list, optional) – list of values representing a pandas dataframe row, except for the value column. Defaults to empty; it is used by the recursive calls to the method.
Yields:

list – list with pandas dataframe column values except for value column
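
The recursion over i and record can be sketched with plain lists in place of the pandas dataframes the real method receives. iter_rows is a hypothetical name, and the category values are illustrative only.

```python
def iter_rows(dimensions, i=0, record=None):
    """Hypothetical plain-list analogue of the recursion described above."""
    if record is None:
        record = []
    if i == len(dimensions):
        # Every dimension has contributed one category: emit the row.
        yield list(record)
        return
    for category in dimensions[i]:
        # Recurse into the next dimension with the row built so far.
        for row in iter_rows(dimensions, i + 1, record + [category]):
            yield row


rows = list(iter_rows([['2011', '2012'], ['CA', 'US', 'MX']]))
print(rows)
```

In this sketch the last dimension varies fastest, so rows come out in row-major order.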

pyjstat.get_dim_index(js_dict, dim)

Get index from a given dimension.

Parameters:
  • js_dict (dict) – dictionary containing dataset data and metadata.
  • dim (string) – dimension name obtained from JSON file.
Returns:

dim_index – DataFrame with index-based dimension data.

Return type:

pandas.DataFrame

pyjstat.get_dim_label(js_dict, dim)

Get label from a given dimension.

Parameters:
  • js_dict (dict) – dictionary containing dataset data and metadata.
  • dim (string) – dimension name obtained from JSON file.
Returns:

dim_label – DataFrame with label-based dimension data.

Return type:

pandas.DataFrame
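
In JSON-stat, the categories of a dimension are described by an index (ordering) and a label (display name) under the dimension's category member. The sketch below reads both from a plain dictionary and returns lists rather than DataFrames. category_ids and category_labels are hypothetical helpers, the fragment is illustrative data, and the sketch assumes both index and label are present.

```python
def category_ids(js_dict, dim):
    """Hypothetical helper: category ids of one dimension, in order."""
    index = js_dict['dimension'][dim]['category']['index']
    if isinstance(index, dict):
        # Object form maps each id to its position.
        return sorted(index, key=index.get)
    return list(index)  # array form is already ordered


def category_labels(js_dict, dim):
    """Hypothetical helper: labels in the same order as the ids."""
    labels = js_dict['dimension'][dim]['category']['label']
    return [labels[cid] for cid in category_ids(js_dict, dim)]


# Illustrative fragment, not real data.
dataset = {'dimension': {'sex': {'category': {
    'index': {'F': 1, 'M': 0},
    'label': {'M': 'Male', 'F': 'Female'}}}}}

print(category_ids(dataset, 'sex'))
print(category_labels(dataset, 'sex'))
```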

pyjstat.get_dimensions(js_dict, naming)

Get dimensions from input data.

Parameters:
  • js_dict (dict) – dictionary containing dataset data and metadata.
  • naming (string) – dimension naming. Possible values: ‘label’ or ‘id’.
Returns:

dimensions – list of pandas data frames with dimension category data.
dim_names – list of strings with dimension names.

Return type:

list

pyjstat.get_values(js_dict, value='value')

Get values from input data.

Parameters:
  • js_dict (dict) – dictionary containing dataset data and metadata.
  • value (string, optional) – name of the value column. Defaults to ‘value’.
Returns:

values – list of dataset values.

Return type:

list

pyjstat.to_int(variable)

Convert variable to integer or string depending on the case.

Parameters: variable (string) – a string containing a real string or an integer.
Returns: variable – an integer or a string, depending on the content of variable.
Return type: int, string
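
The behaviour described above can be sketched as an attempted conversion that falls back to the original string. int_or_string is a hypothetical name; the sketch follows the documented contract and is not necessarily how pyjstat implements it.

```python
def int_or_string(variable):
    """Hypothetical helper following the documented behaviour."""
    try:
        return int(variable)
    except ValueError:
        # Not an integer literal: hand the string back unchanged.
        return variable


print(int_or_string('2014'))
print(int_or_string('Canada'))
```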

pyjstat.to_json_stat(input_df, value='value', output='list')

Encode pandas.DataFrame object into JSON-stat format. The DataFrames must have exactly one value column.

Parameters:
  • input_df – pandas data frame (or list of data frames) to encode.
  • value (string, optional) – name of the value column. Defaults to ‘value’.
  • output (string, optional) – accepts two values: ‘list’ or ‘dict’. Produces a list of dicts or a dict of dicts as output. Defaults to ‘list’.
Returns:

output – String with JSON-stat object.

Return type:

string

pyjstat.to_str(variable)

Convert variable to integer or string depending on the case.

Parameters: variable (string) – a string containing a real string or an integer.
Returns: variable – an integer or a string, depending on the content of variable.
Return type: int, string

pyjstat.uniquify(seq)

Return unique values in a list in the original order. See: http://www.peterbe.com/plog/uniqifiers-benchmark

Parameters: seq (list) – original list.
Returns: list without duplicates, preserving original order.
Return type: list
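
One common way to de-duplicate while keeping the first occurrence of each item is to track seen values in a set. The sketch below shows that approach under the assumption that the items are hashable. unique_in_order is a hypothetical name, and pyjstat may implement uniquify differently.

```python
def unique_in_order(seq):
    """Hypothetical order-preserving de-duplication."""
    seen = set()
    result = []
    for item in seq:
        if item not in seen:
            # First occurrence: remember it and keep its position.
            seen.add(item)
            result.append(item)
    return result


print(unique_in_order(['b', 'a', 'b', 'c', 'a']))
```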
