tensortrade.environments.trading_environment module

class tensortrade.environments.trading_environment.TradingEnvironment(exchange, action_strategy, reward_strategy, feature_pipeline=None, **kwargs)[source]

Bases: tensorforce.environments.environment.Environment, gym.core.Env

A trading environment made for use with Gym-compatible reinforcement learning algorithms.

__init__(exchange, action_strategy, reward_strategy, feature_pipeline=None, **kwargs)[source]
Parameters
  • exchange (InstrumentExchange) – The InstrumentExchange the environment will source observation data from and execute trades on.

  • action_strategy (ActionStrategy) – The strategy for transforming an action into a Trade at each timestep.

  • reward_strategy (RewardStrategy) – The strategy for determining the reward at each timestep.

  • feature_pipeline (optional) – The pipeline of features to pass the observations through.

  • kwargs (optional) – Additional arguments for tuning the environment, logging, etc.
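
A minimal construction sketch. The SimulatedExchange, DiscreteActionStrategy, and SimpleProfitStrategy classes are assumed to be available elsewhere in the library; exact names and signatures may vary by version:

    import pandas as pd

    from tensortrade.environments import TradingEnvironment
    from tensortrade.exchanges.simulated import SimulatedExchange
    from tensortrade.actions import DiscreteActionStrategy
    from tensortrade.rewards import SimpleProfitStrategy

    # Any pandas.DataFrame of OHLCV data will do; the path is illustrative.
    data_frame = pd.read_csv('ohlcv.csv')

    environment = TradingEnvironment(
        exchange=SimulatedExchange(data_frame=data_frame, base_instrument='USD'),
        action_strategy=DiscreteActionStrategy(),
        reward_strategy=SimpleProfitStrategy())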

property action_strategy

The strategy for transforming an action into a Trade at each timestep.

Return type

ActionStrategy

property actions

The action space specification, required for tensorforce agents.

The tuple contains the following attributes:
  • type: Either ‘bool’, ‘int’, or ‘float’.

  • shape: The shape of the space. An int or list/tuple of `int`s.

  • num_actions (required if type == ‘int’): The number of discrete actions.

  • min_value (optional if type == ‘float’): An int or float. Defaults to None.

  • max_value (optional if type == ‘float’): An int or float. Defaults to None.

Return type

Tuple[Union[bool, int, float], Union[int, List[int], Tuple[int, …]], int, Union[int, float], Union[int, float]]
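
For example, a discrete ActionStrategy with 20 possible actions might report a spec along these lines (illustrative values only):

    # (type, shape, num_actions, min_value, max_value)
    environment.actions  # e.g. ('int', 1, 20, None, None)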

property exchange

The InstrumentExchange the environment sources observation data from and executes trades on.

Return type

InstrumentExchange

execute(action)[source]

Run one timestep within the environment based on the specified action. Required for tensorforce agents.

Parameters

action – The trade action provided by the agent for this timestep.

Return type

Tuple[DataFrame, bool, float]

Returns

  • observation (pandas.DataFrame) – Provided by the environment’s exchange, often OHLCV or tick trade history data points.

  • terminal (bool) – If True, the environment is complete and should be restarted.

  • reward (float) – An amount corresponding to the benefit earned by the action taken this timestep.
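
A sketch of the tensorforce-style interaction loop this supports; the agent and its act/observe methods follow the standard tensorforce Agent interface and are assumed to be configured elsewhere:

    # agent: a configured tensorforce Agent (assumed)
    state = environment.reset()
    terminal = False

    while not terminal:
        action = agent.act(states=state)
        state, terminal, reward = environment.execute(action)
        agent.observe(terminal=terminal, reward=reward)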

property feature_pipeline

The feature pipeline to pass the observations through.

Return type

FeaturePipeline

render(mode='none')[source]

Renders the environment.

reset()[source]

Resets the state of the environment and returns an initial observation.

Return type

DataFrame

Returns

observation (pandas.DataFrame) – The initial observation of the environment.
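
For example (the frame’s columns depend on the exchange and any feature pipeline):

    observation = environment.reset()
    print(observation.columns)  # e.g. open, high, low, close, volume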

property reward_strategy

The strategy for determining the reward at each timestep.

Return type

RewardStrategy

property states

The state space specification, required for tensorforce agents.

The tuple contains the following attributes:
  • type: Either ‘bool’, ‘int’, or ‘float’.

  • shape: The shape of the space. An int or list/tuple of `int`s.

Return type

Tuple[Union[bool, int, float], Union[int, List[int], Tuple[int, …]]]
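
For example, an exchange observing windows of 5 OHLCV columns might report a spec like the following (illustrative values only):

    # (type, shape)
    environment.states  # e.g. ('float', (10, 5)): a 10-step window of 5 features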

step(action)[source]

Run one timestep within the environment based on the specified action.

Parameters

action – The trade action provided by the agent for this timestep.

Return type

Tuple[DataFrame, float, bool, dict]

Returns

  • observation (pandas.DataFrame) – Provided by the environment’s exchange, often OHLCV or tick trade history data points.

  • reward (float) – An amount corresponding to the benefit earned by the action taken this timestep.

  • done (bool) – If True, the environment is complete and should be restarted.

  • info (dict) – Any auxiliary, diagnostic, or debugging information to output.
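
A sketch of the standard Gym-style rollout loop this enables; select_action stands in for whatever policy the agent provides:

    observation = environment.reset()
    done = False

    while not done:
        action = select_action(observation)  # hypothetical policy call
        observation, reward, done, info = environment.step(action)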