This class implements the DQN with Proportional prioritization strategy. More...
Public Member Functions

def __init__ (self, pytorch.nn.Module target_model, pytorch.nn.Module policy_model, pytorch.optim.Optimizer optimizer, Union[LRScheduler, None] lr_scheduler, LossFunction loss_function, float gamma, float epsilon, float min_epsilon, float epsilon_decay_rate, int epsilon_decay_frequency, int memory_buffer_size, int target_model_update_rate, int policy_model_update_rate, int backup_frequency, float lr_threshold, int batch_size, int num_actions, str save_path, int bootstrap_rounds=1, str device="cpu", Optional[Dict[str, Any]] prioritization_params=None, float force_terminal_state_selection_prob=0.0, float tau=1.0, int apply_norm=-1, int apply_norm_to=-1, float eps_for_norm=5e-12, int p_for_norm=2, int dim_for_norm=0, Optional[float] max_grad_norm=None, float grad_norm_p=2.0)

Public Member Functions inherited from rlpack.dqn.dqn_agent.DqnAgent

def __init__ (self, pytorch.nn.Module target_model, pytorch.nn.Module policy_model, pytorch.optim.Optimizer optimizer, Union[LRScheduler, None] lr_scheduler, LossFunction loss_function, float gamma, float epsilon, float min_epsilon, float epsilon_decay_rate, int epsilon_decay_frequency, int memory_buffer_size, int target_model_update_rate, int policy_model_update_rate, int backup_frequency, float lr_threshold, int batch_size, int num_actions, str save_path, int bootstrap_rounds=1, str device="cpu", Optional[Dict[str, Any]] prioritization_params=None, float force_terminal_state_selection_prob=0.0, float tau=1.0, Union[int, str] apply_norm=-1, Union[int, List[str]] apply_norm_to=-1, float eps_for_norm=5e-12, int p_for_norm=2, int dim_for_norm=0, Optional[float] max_grad_norm=None, float grad_norm_p=2.0)

None load (self, Optional[str] custom_name_suffix=None)
This method loads the target_model, policy_model, optimizer, lr_scheduler and agent_states from the save_path supplied to the DQN Agent class' constructor (__init__). More...

int policy (self, Union[ndarray, pytorch.Tensor, List[float]] state_current)
The policy for the agent. More...

None save (self, Optional[str] custom_name_suffix=None)
This method saves the target_model, policy_model, optimizer, lr_scheduler and agent_states to the save_path supplied to the DQN Agent class' constructor (__init__). More...

int train (self, Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]] state_current, Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]] state_next, Union[int, float] reward, Union[int, float] action, Union[bool, int] done, Optional[Union[pytorch.Tensor, np.ndarray, float]] priority=1.0, Optional[Union[pytorch.Tensor, np.ndarray, float]] probability=1.0, Optional[Union[pytorch.Tensor, np.ndarray, float]] weight=1.0)

Public Member Functions inherited from the base Agent class

Dict[str, Any] __getstate__ (self)
To get the agent's current state (dict of attributes). More...

def __init__ (self)
The class initializer. More...

None __setstate__ (self, Dict[str, Any] state)
To load the agent's current state (dict of attributes). More...

None load (self, *args, **kwargs)
Load method for the agent. More...

Any policy (self, *args, **kwargs)
Policy method for the agent. More...

None save (self, *args, **kwargs)
Save method for the agent. More...

Any train (self, *args, **kwargs)
Training method for the agent. More...
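
Taken together, policy, train, save and load form the agent's public training interface: policy selects an action for the current state (using the agent's epsilon-controlled exploration), train stores the transition in the prioritized replay buffer and periodically updates the models, and save/load checkpoint everything under save_path. A minimal sketch of driving this interface from an environment loop follows; the gymnasium environment, the helper name run_episodes and the episode count are illustrative assumptions, not part of rlpack's documented API.

```python
# Illustrative training loop; `run_episodes` and the gymnasium environment are
# assumptions for this sketch, not part of rlpack's documented API.
import gymnasium as gym


def run_episodes(agent, env_name: str = "CartPole-v1", num_episodes: int = 10) -> None:
    env = gym.make(env_name)
    for _ in range(num_episodes):
        state_current, _ = env.reset()
        done = False
        while not done:
            # Action selection for the current state via the agent's policy.
            action = agent.policy(state_current)
            state_next, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # Stores the transition in the replay buffer and trains/updates the
            # policy and target models at their configured update rates.
            agent.train(state_current, state_next, reward, action, done)
            state_current = state_next
    # Writes target_model, policy_model, optimizer, lr_scheduler and agent
    # states to the save_path given in the constructor.
    agent.save()
```
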
Private Member Functions
None _apply_prioritization_strategy (self, pytorch.Tensor td_value, pytorch.Tensor random_indices)
Void private method that applies the relevant prioritization strategy for the DQN. More...
Private Attributes

__prioritization_strategy_code
Additional Inherited Members

Public Attributes inherited from rlpack.dqn.dqn_agent.DqnAgent

apply_norm
The input apply_norm argument; indicating the normalisation to be used. More...

apply_norm_to
The input apply_norm_to argument; indicating the quantity to normalise. More...

backup_frequency
The input model backup frequency in terms of timesteps. More...

batch_size
The batch size to be used when training the policy model. More...

bootstrap_rounds
The input bootstrap rounds. More...

device
The input device argument; indicating the device name. More...

dim_for_norm
The input dim_for_norm argument; indicating the dimension along which we wish to normalise. More...

eps_for_norm
The input eps_for_norm argument; indicating the epsilon to be used for normalisation. More...

epsilon
The input exploration factor. More...

epsilon_decay_frequency
The input epsilon decay frequency in terms of timesteps. More...

epsilon_decay_rate
The input epsilon decay rate. More...

force_terminal_state_selection_prob
The input force_terminal_state_selection_prob. More...

gamma
The input discounting factor. More...

grad_norm_p
The input grad_norm_p; indicating the p-value for p-normalisation for gradient clipping. More...

loss_function
The input loss function. More...

lr_scheduler
The input optional LR Scheduler (this can be None). More...
lr_threshold
The input LR Threshold. More...

max_grad_norm
The input max_grad_norm; indicating the maximum gradient norm for gradient clipping. More...

memory
The instance of rlpack._C.memory.Memory used for the replay buffer. More...

memory_buffer_size
The input argument memory_buffer_size; indicating the buffer size used. More...

min_epsilon
The input minimum exploration factor after decays. More...

num_actions
The input number of actions. More...

optimizer
The input optimizer wrapped with policy_model parameters. More...

p_for_norm
The input p_for_norm argument; indicating the p-value for p-normalisation. More...

policy_model
The input policy model. More...

policy_model_update_rate
The input argument policy_model_update_rate; indicating the update rate of the policy model. More...

prioritization_params
The input prioritization parameters. More...

save_path
The input save path for backing up agent models. More...

step_counter
The step counter; counting the total timesteps done so far, up to memory_buffer_size. More...

target_model
The input target model. More...

target_model_update_rate
The input argument target_model_update_rate; indicating the update rate of the target model. More...

tau
The input tau; indicating the soft update used to update target_model parameters. More...

Public Attributes inherited from the base Agent class

loss
The list of losses accumulated after each backward call. More...

save_path
The path to save agent states and models. More...

This class implements the DQN with Proportional prioritization strategy.

def rlpack.dqn.dqn_proportional_prioritization_agent.DqnProportionalPrioritizationAgent.__init__ (
        self,
        pytorch.nn.Module target_model,
        pytorch.nn.Module policy_model,
        pytorch.optim.Optimizer optimizer,
        Union[LRScheduler, None] lr_scheduler,
        LossFunction loss_function,
        float gamma,
        float epsilon,
        float min_epsilon,
        float epsilon_decay_rate,
        int epsilon_decay_frequency,
        int memory_buffer_size,
        int target_model_update_rate,
        int policy_model_update_rate,
        int backup_frequency,
        float lr_threshold,
        int batch_size,
        int num_actions,
        str save_path,
        int bootstrap_rounds = 1,
        str device = "cpu",
        Optional[Dict[str, Any]] prioritization_params = None,
        float force_terminal_state_selection_prob = 0.0,
        float tau = 1.0,
        int apply_norm = -1,
        int apply_norm_to = -1,
        float eps_for_norm = 5e-12,
        int p_for_norm = 2,
        int dim_for_norm = 0,
        Optional[float] max_grad_norm = None,
        float grad_norm_p = 2.0
)
Parameters

target_model
nn.Module: The target network for the DQN model. This is the network whose weights are frozen.

policy_model
nn.Module: The policy network for the DQN model. This is the network that is trained.

optimizer
optim.Optimizer: The optimizer wrapped with the policy model's parameters.

lr_scheduler
Union[LRScheduler, None]: The PyTorch LR Scheduler with the wrapped optimizer.

loss_function
LossFunction: The loss function from PyTorch's nn module. An initialized instance must be passed.

gamma
float: The gamma value for the agent.

epsilon
float: The initial epsilon for the agent.

min_epsilon
float: The minimum epsilon for the agent. Once this value is reached, it is maintained for all further episodes.

epsilon_decay_rate
float: The decay multiplier used to decay the epsilon.

epsilon_decay_frequency
int: The number of timesteps after which the epsilon is decayed.

memory_buffer_size
int: The buffer size of the memory (replay buffer) for DQN.

target_model_update_rate
int: The number of timesteps after which the target model's weights are updated with the policy model's weights (the weights are combined as per tau; see below).

policy_model_update_rate
int: The number of timesteps after which the policy model is trained. This involves backpropagation through the policy network.

backup_frequency
int: The number of timesteps after which the models are backed up. This also saves the optimizer, lr_scheduler and agent_states (the epsilon at the time of saving, and the memory).

lr_threshold
float: The threshold LR; once it is reached, the LR scheduler is not called further.

batch_size
int: The batch size used for inference through the target_model and training through the policy_model.

num_actions
int: The number of actions for the environment.

save_path
str: The save path for the models (target_model and policy_model), optimizer, lr_scheduler and agent_states.

bootstrap_rounds
int: The number of rounds for which gradients are accumulated before the optimizer step is performed. Gradients are mean-reduced for bootstrap_rounds > 1. Default: 1.

device
str: The device on which the models are run. Default: "cpu".

prioritization_params
Optional[Dict[str, Any]]: The parameters for prioritization in the prioritized memory (or replay buffer). Default: None.

force_terminal_state_selection_prob
float: The probability of forcefully selecting a terminal state in a batch. Default: 0.0.

tau
float: The weighted update of weights from policy_model to target_model. This is done by the formula target_weight = tau * policy_weight + (1 - tau) * target_weight. Default: 1.0.

apply_norm
Union[int, str]: The code to select the normalization procedure to be applied to the selected quantities (selected by apply_norm_to; see below). A string can also be passed directly, as per the accepted keys. Refer to the Notes below for the accepted values. Default: -1.

apply_norm_to
Union[int, List[str]]: The code to select the quantity to which normalization is to be applied. A list of quantities can also be passed directly, as per the accepted keys. Refer to the Notes below for the accepted values. Default: -1.

eps_for_norm
float: The epsilon value for normalization (for numeric stability); used for min-max normalization and standardization. Default: 5e-12.

p_for_norm
int: The p value for p-normalization. Default: 2 (L2 norm).

dim_for_norm
int: The dimension across which normalization is to be performed. Default: 0.

max_grad_norm
Optional[float]: The max norm of the gradients for gradient clipping. Default: None.

grad_norm_p
Optional[float]: The p-value for p-normalization of the gradients. Default: 2.0.
Notes

The accepted values for apply_norm, given either as the integer code or directly as the string key, are:
"none" (no normalization; the default, code -1)
"min_max" (min-max normalization)
"standardize" (standardization)
"p_norm" (p-normalization)

The accepted values for apply_norm_to, given either as the integer code or directly as the list of string keys, are:
["none"] (no quantity is normalized; the default, code -1)
["states"]
["rewards"]
["td"]
["states", "rewards"]
["states", "td"]

If a valid max_grad_norm is passed, gradient clipping takes place; otherwise the gradient clipping step is skipped. If the max_grad_norm value is invalid, an error will be raised from PyTorch.
Reimplemented from rlpack.dqn.dqn_agent.DqnAgent.
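
Below is a minimal construction sketch for this constructor. It assumes small torch.nn.Sequential networks, an Adam optimizer and illustrative hyperparameter values; the import path follows the module name shown above, and none of the concrete values are prescribed by this documentation. prioritization_params is left at its default because its expected keys are not covered on this page.

```python
# Illustrative construction; network sizes, hyperparameter values and the
# chosen loss are assumptions for this sketch only.
import torch
from torch import nn, optim

from rlpack.dqn.dqn_proportional_prioritization_agent import (
    DqnProportionalPrioritizationAgent,
)

num_actions = 2
policy_model = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, num_actions))
target_model = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, num_actions))
target_model.load_state_dict(policy_model.state_dict())  # start with identical weights

optimizer = optim.Adam(policy_model.parameters(), lr=1e-3)

agent = DqnProportionalPrioritizationAgent(
    target_model=target_model,
    policy_model=policy_model,
    optimizer=optimizer,
    lr_scheduler=None,
    loss_function=nn.MSELoss(),        # an initialized loss instance, as documented
    gamma=0.99,
    epsilon=1.0,
    min_epsilon=0.01,
    epsilon_decay_rate=0.995,
    epsilon_decay_frequency=1024,
    memory_buffer_size=16384,
    target_model_update_rate=256,
    policy_model_update_rate=4,
    backup_frequency=5000,
    lr_threshold=1e-5,
    batch_size=64,
    num_actions=num_actions,
    save_path="./dqn_checkpoints",
    device="cpu",
    # prioritization_params is left at its default (None) here; its expected
    # keys are not documented on this page.
    apply_norm="min_max",              # string key, as described in the Notes above
    apply_norm_to=["td"],
    max_grad_norm=1.0,
)
```
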
None rlpack.dqn.dqn_proportional_prioritization_agent.DqnProportionalPrioritizationAgent._apply_prioritization_strategy (self, pytorch.Tensor td_value, pytorch.Tensor random_indices)    [private]

Void private method that applies the relevant prioritization strategy for the DQN.

Parameters

td_value
pytorch.Tensor: The computed TD value.

random_indices
pytorch.Tensor: The indices of randomly sampled transitions.
Reimplemented from rlpack.dqn.dqn_agent.DqnAgent.
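
For background on what the proportional strategy computes: in proportional prioritized experience replay (Schaul et al., 2016), each transition receives a priority proportional to the magnitude of its TD error, transitions are sampled with probability proportional to that priority, and the resulting sampling bias is corrected with importance-sampling weights. The NumPy sketch below illustrates those formulas only; the names alpha, beta and eps and the exact update applied to rlpack's memory are assumptions, not this method's verbatim implementation.

```python
# Illustrative proportional prioritization math (not rlpack's exact code).
from typing import Tuple

import numpy as np


def proportional_priorities(
    td_errors: np.ndarray, alpha: float = 0.6, eps: float = 1e-6
) -> Tuple[np.ndarray, np.ndarray]:
    """Return per-transition priorities and sampling probabilities."""
    priorities = np.abs(td_errors) + eps      # p_i = |delta_i| + eps
    probabilities = priorities ** alpha
    probabilities /= probabilities.sum()      # P(i) = p_i^alpha / sum_k p_k^alpha
    return priorities, probabilities


def importance_weights(
    probabilities: np.ndarray, buffer_size: int, beta: float = 0.4
) -> np.ndarray:
    """Importance-sampling weights that correct the non-uniform sampling bias."""
    weights = (buffer_size * probabilities) ** (-beta)
    return weights / weights.max()            # normalize so the largest weight is 1


# Example: quantities recomputed from a batch of TD errors after a training step.
td = np.array([0.05, -1.2, 0.3, 0.0])
p, prob = proportional_priorities(td)
w = importance_weights(prob, buffer_size=len(td))
```
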

rlpack.dqn.dqn_proportional_prioritization_agent.DqnProportionalPrioritizationAgent.__prioritization_strategy_code    [private]