RLPack
 
rlpack.dqn.dqn.Dqn Class Reference

This is a helper class that selects the correct variant of the DQN implementation based on the prioritization strategy determined by the argument prioritization_params. More...

Public Member Functions

def __new__ (cls, pytorch.nn.Module target_model, pytorch.nn.Module policy_model, pytorch.optim.Optimizer optimizer, Union[LRScheduler, None] lr_scheduler, LossFunction loss_function, float gamma, float epsilon, float min_epsilon, float epsilon_decay_rate, int epsilon_decay_frequency, int memory_buffer_size, int target_model_update_rate, int policy_model_update_rate, int backup_frequency, float lr_threshold, int batch_size, int num_actions, str save_path, int bootstrap_rounds=1, str device="cpu", Optional[Dict[str, Any]] prioritization_params=None, float force_terminal_state_selection_prob=0.0, float tau=1.0, int apply_norm=-1, int apply_norm_to=-1, float eps_for_norm=5e-12, int p_for_norm=2, int dim_for_norm=0, Optional[float] max_grad_norm=None, float grad_norm_p=2.0)
 

Static Private Member Functions

float __anneal_alpha_default_fn (float alpha, float alpha_annealing_factor)
 Private method to anneal the alpha parameter for importance sampling weights. More...
 
float __anneal_beta_default_fn (float beta, float beta_annealing_factor)
 Private method to anneal the beta parameter for importance sampling weights. More...
 
Dict[str, Any] __process_prioritization_params (Dict[str, Any] prioritization_params, int prioritization_strategy_code, Callable[[float, float], float] anneal_alpha_default_fn, Callable[[float, float], float] anneal_beta_default_fn, int batch_size)
 Private method to process the prioritization parameters. More...
 

Detailed Description

This is a helper class that selects the correct variant of the DQN implementation based on the prioritization strategy determined by the argument prioritization_params.

Member Function Documentation

◆ __anneal_alpha_default_fn()

float rlpack.dqn.dqn.Dqn.__anneal_alpha_default_fn ( float  alpha,
float  alpha_annealing_factor 
)
static, private

Private method to anneal the alpha parameter for importance sampling weights.

This will be called every alpha_annealing_frequency timesteps. alpha_annealing_frequency is a key to be passed in the dictionary prioritization_params argument of the DqnAgent class' constructor. This method is called by default to anneal alpha.

If alpha_annealing_frequency is not passed in prioritization_params, the annealing of alpha will not take place. This method uses another value, alpha_annealing_factor, which must also be passed in prioritization_params. alpha_annealing_factor is typically below 1, so that alpha is slowly annealed towards 0 or min_alpha.

Parameters
alpha (float): The input alpha value to anneal.
alpha_annealing_factor (float): The annealing factor to be used to anneal alpha.
Returns
float: Annealed alpha.
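
A minimal sketch of what such a default annealing function could look like, assuming simple multiplicative annealing as described above (the actual rlpack implementation may differ):

def _anneal_alpha_default_fn(alpha: float, alpha_annealing_factor: float) -> float:
    # With alpha_annealing_factor < 1, repeated calls slowly drive alpha
    # towards 0 (or towards min_alpha, if the agent enforces such a floor).
    return alpha * alpha_annealing_factor

For example, calling alpha = _anneal_alpha_default_fn(alpha, 0.95) once every alpha_annealing_frequency timesteps shrinks alpha by 5% per annealing step.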

◆ __anneal_beta_default_fn()

float rlpack.dqn.dqn.Dqn.__anneal_beta_default_fn ( float  beta,
float  beta_annealing_factor 
)
static, private

Private method to anneal the beta parameter for importance sampling weights.

This will be called every beta_annealing_frequency timesteps. beta_annealing_frequency is a key to be passed in the dictionary prioritization_params argument of the DqnAgent class' constructor.

If beta_annealing_frequency is not passed in prioritization_params, the annealing of beta will not take place. This method uses another value, beta_annealing_factor, which must also be passed in prioritization_params. beta_annealing_factor is typically above 1, so that beta is slowly annealed towards 1 or max_beta.

Parameters
beta (float): The input beta value to anneal.
beta_annealing_factor (float): The annealing factor to be used to anneal beta.
Returns
float: Annealed beta.
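
The corresponding sketch for beta, again assuming multiplicative annealing as described above (illustrative only, not the library's actual implementation):

def _anneal_beta_default_fn(beta: float, beta_annealing_factor: float) -> float:
    # With beta_annealing_factor > 1, repeated calls slowly grow beta
    # towards 1 (or max_beta, if the agent enforces such a cap).
    return beta * beta_annealing_factor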

◆ __new__()

def rlpack.dqn.dqn.Dqn.__new__ (   cls,
pytorch.nn.Module  target_model,
pytorch.nn.Module  policy_model,
pytorch.optim.Optimizer  optimizer,
Union[LRScheduler, None]  lr_scheduler,
LossFunction  loss_function,
float  gamma,
float  epsilon,
float  min_epsilon,
float  epsilon_decay_rate,
int  epsilon_decay_frequency,
int  memory_buffer_size,
int  target_model_update_rate,
int  policy_model_update_rate,
int  backup_frequency,
float  lr_threshold,
int  batch_size,
int  num_actions,
str  save_path,
int   bootstrap_rounds = 1,
str   device = "cpu",
Optional[Dict[str, Any]]   prioritization_params = None,
float   force_terminal_state_selection_prob = 0.0,
float   tau = 1.0,
int   apply_norm = -1,
int   apply_norm_to = -1,
float   eps_for_norm = 5e-12,
int   p_for_norm = 2,
int   dim_for_norm = 0,
Optional[float]   max_grad_norm = None,
float   grad_norm_p = 2.0 
)
Parameters
target_model (nn.Module): The target network for the DQN model. This is the network whose weights are frozen.
policy_model (nn.Module): The policy network for the DQN model. This is the network that is trained.
optimizer (optim.Optimizer): The optimizer wrapped with the policy model's parameters.
lr_scheduler (Union[LRScheduler, None]): The PyTorch LR scheduler with the wrapped optimizer.
loss_function (LossFunction): The loss function from PyTorch's nn module. An initialized instance must be passed.
gamma (float): The gamma (discount) value for the agent.
epsilon (float): The initial epsilon for the agent.
min_epsilon (float): The minimum epsilon for the agent. Once this value is reached, it is maintained for all further episodes.
epsilon_decay_rate (float): The decay multiplier used to decay epsilon.
epsilon_decay_frequency (int): The number of timesteps after which epsilon is decayed.
memory_buffer_size (int): The buffer size of the memory, i.e. the replay buffer for DQN.
target_model_update_rate (int): The number of timesteps after which the target model's weights are updated with the policy model's weights (weights are weighted as per tau; see below).
policy_model_update_rate (int): The number of timesteps after which the policy model is trained. This involves backpropagation through the policy network.
backup_frequency (int): The number of timesteps after which models are backed up. This will also save the optimizer, lr_scheduler and agent_states (epsilon at the time of saving, and memory).
lr_threshold (float): The threshold LR; once it is reached, the LR scheduler is not called further.
batch_size (int): The batch size used for inference through target_model and training through policy_model.
num_actions (int): The number of actions for the environment.
save_path (str): The save path for models (target_model and policy_model), optimizer, lr_scheduler and agent_states.
bootstrap_rounds (int): The number of rounds over which gradients are accumulated before calling the optimizer step. Gradients are mean-reduced for bootstrap_rounds > 1. Default: 1.
device (str): The device on which models are run. Default: "cpu".
prioritization_params (Optional[Dict[str, Any]]): The parameters for prioritization in prioritized memory (or replay buffer). Default: None.
force_terminal_state_selection_prob (float): The probability of forcefully selecting a terminal state in a batch. Default: 0.0.
tau (float): The weight for the weighted update of weights from policy_model to target_model, computed as target_weight = tau * policy_weight + (1 - tau) * target_weight (see the sketch after this list). Default: 1.0.
apply_norm (Union[int, str]): The code to select the normalization procedure to be applied to the selected quantities (selected by apply_norm_to; see below). The corresponding string can also be passed directly as per the accepted keys. Refer to the Notes below for accepted values. Default: -1.
apply_norm_to (Union[int, List[str]]): The code to select the quantities to which normalization is to be applied. A list of quantities can also be passed directly as per the accepted keys. Refer to the Notes below for accepted values. Default: -1.
eps_for_norm (float): Epsilon value for normalization, for numeric stability. Used in min-max normalization and standardization. Default: 5e-12.
p_for_norm (int): The p value for p-normalization. Default: 2 (L2 norm).
dim_for_norm (int): The dimension across which normalization is to be performed. Default: 0.
max_grad_norm (Optional[float]): The max norm of gradients for gradient clipping. Default: None.
grad_norm_p (Optional[float]): The p value for p-normalization of gradients. Default: 2.0.
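
The tau-weighted target update described above can be sketched as follows; this is a generic soft-update helper illustrating the formula, not the library's actual implementation:

import torch
from torch import nn

def soft_update(target_model: nn.Module, policy_model: nn.Module, tau: float) -> None:
    # Weighted update of target weights from policy weights:
    #     target_weight = tau * policy_weight + (1 - tau) * target_weight
    # With tau = 1.0 (the default) this reduces to a hard copy of the policy weights.
    with torch.no_grad():
        for target_param, policy_param in zip(
            target_model.parameters(), policy_model.parameters()
        ):
            target_param.mul_(1.0 - tau).add_(policy_param, alpha=tau)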

Notes

For prioritization_params, when None (the default) is passed, prioritized memory is not used. To use prioritized memory, pass a dictionary with the keys alpha and beta. You can also pass alpha_decay_rate and beta_decay_rate additionally.
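
For example, a prioritization_params dictionary might look like the following; only alpha and beta are confirmed mandatory keys on this page, and the commented keys are the optional ones named in the annealing-function documentation above (check the rlpack source for the exact key set):

prioritization_params = {
    "alpha": 0.6,
    "beta": 0.4,
    # Optional annealing controls mentioned on this page; omit them to
    # disable annealing of alpha and beta:
    #   "alpha_annealing_frequency", "alpha_annealing_factor",
    #   "beta_annealing_frequency", "beta_annealing_factor",
}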

The codes for prioritization strategies are:

  • Uniform: 0; uniform
  • Proportional: 1; proportional
  • Rank-Based: 2; rank-based

The codes for apply_norm are given as follows:

  • No Normalization: -1; ("none")
  • Min-Max Normalization: 0; ("min_max")
  • Standardization: 1; ("standardize")
  • P-Normalization: 2; ("p_norm")

The codes for apply_norm_to are given as follows:

  • No Normalization: -1; (["none"])
  • On States only: 0; (["states"])
  • On Rewards only: 1; (["rewards"])
  • On TD value only: 2; (["td"])
  • On States and Rewards: 3; (["states", "rewards"])
  • On States and TD: 4; (["states", "td"])

If a valid max_grad_norm is passed, gradient clipping takes place; otherwise the gradient clipping step is skipped. If the max_grad_norm value is invalid, an error will be raised from PyTorch.
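
Putting the signature and codes together, an illustrative construction might look like the following. The models, hyperparameter values and the import path are placeholders chosen for the example (only the keyword names follow the signature above); adjust them to your environment.

import torch
from torch import nn
from rlpack.dqn.dqn import Dqn  # import path assumed from the class name on this page

# Tiny placeholder networks for a 4-dimensional observation, 2-action environment.
policy_model = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_model = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy_model.parameters(), lr=1e-3)

agent = Dqn(
    target_model=target_model,
    policy_model=policy_model,
    optimizer=optimizer,
    lr_scheduler=None,
    loss_function=nn.MSELoss(),
    gamma=0.99,
    epsilon=1.0,
    min_epsilon=0.01,
    epsilon_decay_rate=0.99,
    epsilon_decay_frequency=1024,
    memory_buffer_size=32768,
    target_model_update_rate=256,
    policy_model_update_rate=4,
    backup_frequency=10000,
    lr_threshold=1e-6,
    batch_size=64,
    num_actions=2,
    save_path="./models",
    apply_norm="none",        # or -1; see the apply_norm codes above
    apply_norm_to=["none"],   # or -1; see the apply_norm_to codes above
    max_grad_norm=1.0,        # a valid value enables gradient clipping
)

Since prioritization_params is left as None here, the agent falls back to uniform (non-prioritized) replay memory, as described in the Notes above.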

◆ __process_prioritization_params()

Dict[str, Any] rlpack.dqn.dqn.Dqn.__process_prioritization_params ( Dict[str, Any]  prioritization_params,
int  prioritization_strategy_code,
Callable[[float, float], float]  anneal_alpha_default_fn,
Callable[[float, float], float]  anneal_beta_default_fn,
int  batch_size 
)
static, private

Private method to process the prioritization parameters.

This includes sanity checks and the loading of default values for mandatory parameters.

Parameters
prioritization_params (Dict[str, Any]): The prioritization parameters for when prioritized memory is used.
prioritization_strategy_code (int): The prioritization code corresponding to the given prioritization strategy string.
anneal_alpha_default_fn (Callable[[float, float], float]): The default annealing function for alpha.
anneal_beta_default_fn (Callable[[float, float], float]): The default annealing function for beta.
batch_size (int): The requested batch size; used in rank-based prioritization to determine the number of segments.
Returns
Dict[str, Any]: The processed prioritization parameters with necessary parameters loaded.
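
The exact keys and defaults live in the rlpack source; the following sketch only illustrates the kind of processing described above (sanity checks plus default loading), with hypothetical key names:

from typing import Any, Callable, Dict, Optional

def process_prioritization_params_sketch(
    prioritization_params: Optional[Dict[str, Any]],
    prioritization_strategy_code: int,
    anneal_alpha_default_fn: Callable[[float, float], float],
    anneal_beta_default_fn: Callable[[float, float], float],
    batch_size: int,
) -> Dict[str, Any]:
    # Illustrative only: key names below are hypothetical.
    processed = dict(prioritization_params or {})
    if prioritization_strategy_code > 0:  # 1: proportional, 2: rank-based
        for key in ("alpha", "beta"):
            if key not in processed:
                raise ValueError(f"'{key}' must be set when using prioritized memory.")
        # Fall back to the default annealing functions when none are supplied.
        processed.setdefault("alpha_annealing_fn", anneal_alpha_default_fn)
        processed.setdefault("beta_annealing_fn", anneal_beta_default_fn)
    if prioritization_strategy_code == 2:
        # Rank-based prioritization uses the batch size to decide the number of segments.
        processed.setdefault("num_segments", batch_size)
    return processed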