RLPack
 
rlpack.dqn.dqn_agent.DqnAgent Class Reference

This class implements the basic DQN methodology, i.e. DQN without prioritization. More...


Public Member Functions

def __init__ (self, pytorch.nn.Module target_model, pytorch.nn.Module policy_model, pytorch.optim.Optimizer optimizer, Union[LRScheduler, None] lr_scheduler, LossFunction loss_function, float gamma, float epsilon, float min_epsilon, float epsilon_decay_rate, int epsilon_decay_frequency, int memory_buffer_size, int target_model_update_rate, int policy_model_update_rate, int backup_frequency, float lr_threshold, int batch_size, int num_actions, str save_path, int bootstrap_rounds=1, str device="cpu", Optional[Dict[str, Any]] prioritization_params=None, float force_terminal_state_selection_prob=0.0, float tau=1.0, Union[int, str] apply_norm=-1, Union[int, List[str]] apply_norm_to=-1, float eps_for_norm=5e-12, int p_for_norm=2, int dim_for_norm=0, Optional[float] max_grad_norm=None, float grad_norm_p=2.0)
 
None load (self, Optional[str] custom_name_suffix=None)
 This method loads the target_model, policy_model, optimizer, lr_scheduler and agent_states from the save_path supplied as an argument to the DqnAgent class' constructor (__init__). More...
 
int policy (self, Union[ndarray, pytorch.Tensor, List[float]] state_current)
 The policy for the agent. More...
 
None save (self, Optional[str] custom_name_suffix=None)
 This method saves the target_model, policy_model, optimizer, lr_scheduler and agent_states to the save_path supplied as an argument to the DqnAgent class' constructor (__init__). More...
 
int train (self, Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]] state_current, Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]] state_next, Union[int, float] reward, Union[int, float] action, Union[bool, int] done, Optional[Union[pytorch.Tensor, np.ndarray, float]] priority=1.0, Optional[Union[pytorch.Tensor, np.ndarray, float]] probability=1.0, Optional[Union[pytorch.Tensor, np.ndarray, float]] weight=1.0)
 
- Public Member Functions inherited from rlpack.utils.base.agent.Agent
Dict[str, Any] __getstate__ (self)
 To get the agent's current state (dict of attributes). More...
 
def __init__ (self)
 The class initializer. More...
 
None __setstate__ (self, Dict[str, Any] state)
 To load the agent's current state (dict of attributes). More...
 
None load (self, *args, **kwargs)
 Load method for the agent. More...
 
Any policy (self, *args, **kwargs)
 Policy method for the agent. More...
 
None save (self, *args, **kwargs)
 Save method for the agent. More...
 
Any train (self, *args, **kwargs)
 Training method for the agent. More...
 

Data Fields

 apply_norm
 The input apply_norm argument; indicating the normalisation to be used. More...
 
 apply_norm_to
 The input apply_norm_to argument; indicating the quantity to normalise. More...
 
 backup_frequency
 The input model backup frequency in terms of timesteps. More...
 
 batch_size
 The batch size to be used when training policy model. More...
 
 bootstrap_rounds
 The input bootstrap rounds. More...
 
 device
 The input device argument; indicating the device name. More...
 
 dim_for_norm
 The input dim_for_norm argument; indicating dimension along which we wish to normalise. More...
 
 eps_for_norm
 The input eps_for_norm argument; indicating epsilon to be used for normalisation. More...
 
 epsilon
 The input exploration factor. More...
 
 epsilon_decay_frequency
 The input epsilon decay frequency in terms of timesteps. More...
 
 epsilon_decay_rate
 The input epsilon decay rate. More...
 
 force_terminal_state_selection_prob
 The input force_terminal_state_selection_prob. More...
 
 gamma
 The input discounting factor. More...
 
 grad_norm_p
 The input grad_norm_p; indicating the p-value for p-normalisation for gradient clippings. More...
 
 loss_function
 The input loss function. More...
 
 lr_scheduler
 The input optional LR Scheduler (this can be None). More...
 
 lr_threshold
 The input LR Threshold. More...
 
 max_grad_norm
 The input max_grad_norm; indicating the maximum gradient norm for gradient clippings. More...
 
 memory
 The instance of rlpack._C.memory.Memory used for Replay buffer. More...
 
 memory_buffer_size
 The input argument memory_buffer_size; indicating the buffer size used. More...
 
 min_epsilon
 The input minimum exploration factor after decays. More...
 
 num_actions
 The input number of actions. More...
 
 optimizer
 The input optimizer wrapped with policy_model parameters. More...
 
 p_for_norm
 The input p_for_norm argument; indicating p-value for p-normalisation. More...
 
 policy_model
 The input policy model. More...
 
 policy_model_update_rate
 The input argument policy_model_update_rate; indicating the update rate of policy model. More...
 
 prioritization_params
 The input prioritization parameters. More...
 
 save_path
 The input save path for backing up agent models. More...
 
 step_counter
 The step counter; counting the total timesteps done so far up to memory_buffer_size. More...
 
 target_model
 The input target model. More...
 
 target_model_update_rate
 The input argument target_model_update_rate; indicating the update rate of target model. More...
 
 tau
 The input tau; indicating the soft update used to update target_model parameters. More...
 
- Data Fields inherited from rlpack.utils.base.agent.Agent
 loss
 The list of losses accumulated after each backward call. More...
 
 save_path
 The path to save agent states and models. More...
 

Private Member Functions

def _anneal_alpha (self)
 
def _anneal_beta (self)
 
None _apply_prioritization_strategy (self, pytorch.Tensor td_value, pytorch.Tensor random_indices)
 Void protected method that applies the relevant prioritization strategy for the DQN. More...
 
None _decay_epsilon (self)
 Protected method to decay epsilon. More...
 
None _grad_mean_reduction (self)
 Performs mean reduction and assigns the policy model's parameter the mean reduced gradients. More...
 
int _infer_action (self, pytorch.Tensor state_current, bool call_from_policy=True)
 Helper method to support action inference from the policy model. More...
 
Tuple[ pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor,] _load_random_experiences (self)
 This method loads random transitions from memory. More...
 
pytorch.Tensor _temporal_difference (self, pytorch.Tensor rewards, pytorch.Tensor q_values, pytorch.Tensor dones)
 This method computes the temporal difference for given transitions. More...
 
None _train_policy_model (self)
 Protected method of the class to train the policy model. More...
 
None _update_target_model (self)
 Protected method of the class to update the target model. More...
 

Private Attributes

 __prioritization_strategy_code
 The prioritization strategy code. More...
 
 _grad_accumulator
 The list of gradients from each backward call. More...
 
 _normalization
 The normalisation tool to be used for agent. More...
 

Detailed Description

This class implements the basic DQN methodology, i.e. DQN without prioritization.

This class also acts as a base class for the other DQN variants, all of which override the method _apply_prioritization_strategy to implement their prioritization strategy.

Constructor & Destructor Documentation

◆ __init__()

def rlpack.dqn.dqn_agent.DqnAgent.__init__ (   self,
pytorch.nn.Module  target_model,
pytorch.nn.Module  policy_model,
pytorch.optim.Optimizer  optimizer,
Union[LRScheduler, None]  lr_scheduler,
LossFunction  loss_function,
float  gamma,
float  epsilon,
float  min_epsilon,
float  epsilon_decay_rate,
int  epsilon_decay_frequency,
int  memory_buffer_size,
int  target_model_update_rate,
int  policy_model_update_rate,
int  backup_frequency,
float  lr_threshold,
int  batch_size,
int  num_actions,
str  save_path,
int   bootstrap_rounds = 1,
str   device = "cpu",
Optional[Dict[str, Any]]   prioritization_params = None,
float   force_terminal_state_selection_prob = 0.0,
float   tau = 1.0,
Union[int, str]   apply_norm = -1,
Union[int, List[str]]   apply_norm_to = -1,
float   eps_for_norm = 5e-12,
int   p_for_norm = 2,
int   dim_for_norm = 0,
Optional[float]   max_grad_norm = None,
float   grad_norm_p = 2.0 
)
Parameters
target_modelnn.Module: The target network for the DQN model. This is the network whose weights are frozen.
policy_modelnn.Module: The policy network for the DQN model. This is the network that is trained.
optimizeroptim.Optimizer: The optimizer wrapped with policy model's parameters.
lr_schedulerUnion[LRScheduler, None]: The PyTorch LR Scheduler with wrapped optimizer.
loss_functionLossFunction: The loss function from PyTorch's nn module. Initialized instance must be passed.
gammafloat: The gamma value for agent.
epsilonfloat: The initial epsilon for the agent.
min_epsilonfloat: The minimum epsilon for the agent. Once this value is reached, it is maintained for all further episodes.
epsilon_decay_ratefloat: The decay multiplier to decay the epsilon.
epsilon_decay_frequencyint: The number of timesteps after which the epsilon is decayed.
memory_buffer_sizeint: The buffer size of memory; or replay buffer for DQN.
target_model_update_rateint: The timesteps after which the target model's weights are updated with the policy model's weights (weights are blended as per tau; see below).
policy_model_update_rateint: The timesteps after which the policy model is trained. This involves backpropagation through the policy network.
backup_frequencyint: The timesteps after which the models are backed up. This will also save the optimizer, lr_scheduler and agent_states (epsilon at the time of saving, and memory).
lr_thresholdfloat: The threshold LR; once reached, the LR scheduler is not called further.
batch_sizeint: The batch size used for inference through target_model and training through policy_model.
num_actionsint: Number of actions for the environment.
save_pathstr: The save path for models: target_model and policy_model, optimizer, lr_scheduler and agent_states.
bootstrap_roundsint: The number of rounds for which gradients are accumulated before calling the optimizer step. Gradients are mean-reduced for bootstrap_rounds > 1. Default: 1.
devicestr: The device on which models are run. Default: "cpu".
prioritization_paramsOptional[Dict[str, Any]]: The parameters for prioritization in prioritized memory (or replay buffer). Default: None.
force_terminal_state_selection_probfloat: The probability for forcefully selecting a terminal state in a batch. Default: 0.0.
taufloat: The weighted update of weights from policy_model to target_model. This is done by the formula target_weight = tau * policy_weight + (1 - tau) * target_weight. Default: 1.0.
apply_normUnion[int, str]: The code to select the normalization procedure to be applied on selected quantities (selected by apply_norm_to; see below). A string can also be passed directly as per the accepted keys. Refer to the Notes below for the accepted values. Default: -1.
apply_norm_toUnion[int, List[str]]: The code to select the quantity to which normalization is to be applied. A list of quantities can also be passed directly as per the accepted keys. Refer to the Notes below for the accepted values. Default: -1.
eps_for_normfloat: Epsilon value for normalization (for numeric stability); used for min-max normalization and standardization. Default: 5e-12.
p_for_normint: The p value for p-normalization. Default: 2 (L2 norm).
dim_for_normint: The dimension across which normalization is to be performed. Default: 0.
max_grad_normOptional[float]: The max norm for gradients for gradient clipping. Default: None.
grad_norm_pfloat: The p-value for p-normalization of gradients. Default: 2.0.

Notes

The codes for apply_norm are given as follows:

  • No Normalization: -1; ("none")
  • Min-Max Normalization: 0; ("min_max")
  • Standardization: 1; ("standardize")
  • P-Normalization: 2; ("p_norm")

The codes for apply_norm_to are given as follows:

  • No Normalization: -1; (["none"])
  • On States only: 0; (["states"])
  • On Rewards only: 1; (["rewards"])
  • On TD value only: 2; (["td"])
  • On States and Rewards: 3; (["states", "rewards"])
  • On States and TD: 4; (["states", "td"])

If a valid max_grad_norm is passed, gradient clipping takes place; otherwise the gradient clipping step is skipped. If the max_grad_norm value is invalid, an error will be raised by PyTorch.
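
As an illustration of the kind of clipping these two parameters describe, the sketch below uses PyTorch's built-in clip_grad_norm_ utility on a toy model; the model and tensors are hypothetical and not part of rlpack.

import torch

# Illustrative only: clip gradients to max_grad_norm under the grad_norm_p norm,
# after backward() and before the optimizer step.
model = torch.nn.Linear(4, 2)                 # hypothetical toy model
loss = model(torch.randn(8, 4)).sum()         # hypothetical scalar loss
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0, norm_type=2.0)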

Reimplemented from rlpack.utils.base.agent.Agent.

Reimplemented in rlpack.dqn.dqn_proportional_prioritization_agent.DqnProportionalPrioritizationAgent, and rlpack.dqn.dqn_rank_based_prioritization_agent.DqnRankBasedPrioritizationAgent.
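
A minimal construction sketch follows. The Q-network, the hyperparameter values, and the normalization settings are illustrative choices rather than recommendations, and the import path simply mirrors the module path documented on this page.

import torch
from torch import nn

from rlpack.dqn.dqn_agent import DqnAgent

class QNetwork(nn.Module):
    """A tiny MLP mapping states to per-action Q-values (hypothetical)."""
    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, num_actions))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

target_model = QNetwork(state_dim=4, num_actions=2)
policy_model = QNetwork(state_dim=4, num_actions=2)
target_model.load_state_dict(policy_model.state_dict())  # start both networks from identical weights

agent = DqnAgent(
    target_model=target_model,
    policy_model=policy_model,
    optimizer=torch.optim.Adam(policy_model.parameters(), lr=1e-3),
    lr_scheduler=None,
    loss_function=nn.MSELoss(),
    gamma=0.99,
    epsilon=1.0,
    min_epsilon=0.05,
    epsilon_decay_rate=0.99,
    epsilon_decay_frequency=1024,
    memory_buffer_size=16384,
    target_model_update_rate=256,
    policy_model_update_rate=32,
    backup_frequency=4096,
    lr_threshold=1e-5,
    batch_size=64,
    num_actions=2,
    save_path="./models",
    apply_norm="min_max",            # normalize selected quantities (see Notes above)
    apply_norm_to=["states", "td"],  # normalize states and TD values
)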

Member Function Documentation

◆ _anneal_alpha()

def rlpack.dqn.dqn_agent.DqnAgent._anneal_alpha (   self)
private

◆ _anneal_beta()

def rlpack.dqn.dqn_agent.DqnAgent._anneal_beta (   self)
private

◆ _apply_prioritization_strategy()

None rlpack.dqn.dqn_agent.DqnAgent._apply_prioritization_strategy (   self,
pytorch.Tensor  td_value,
pytorch.Tensor  random_indices 
)
private

Void protected method that applies the relevant prioritization strategy for the DQN.

Parameters
td_valuepytorch.Tensor: The computed TD value.
random_indicespytorch.Tensor: The indices of the randomly sampled transitions.

Reimplemented in rlpack.dqn.dqn_proportional_prioritization_agent.DqnProportionalPrioritizationAgent, and rlpack.dqn.dqn_rank_based_prioritization_agent.DqnRankBasedPrioritizationAgent.

◆ _decay_epsilon()

None rlpack.dqn.dqn_agent.DqnAgent._decay_epsilon (   self)
private

Protected method to decay epsilon.

This method is called every epsilon_decay_frequency timesteps and decays the epsilon by epsilon_decay_rate, both supplied in DqnAgent class' constructor.
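
As a standalone sketch of the decay rule this implies (multiplicative decay clamped at min_epsilon; the exact rlpack implementation may differ):

def decay_epsilon(epsilon: float, epsilon_decay_rate: float, min_epsilon: float) -> float:
    """Multiply epsilon by the decay rate, never dropping below min_epsilon."""
    return max(min_epsilon, epsilon * epsilon_decay_rate)

# Example: 1.0 -> 0.99 -> 0.9801 -> ... until clamped at min_epsilon.
epsilon = 1.0
for _ in range(3):
    epsilon = decay_epsilon(epsilon, epsilon_decay_rate=0.99, min_epsilon=0.05)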

◆ _grad_mean_reduction()

None rlpack.dqn.dqn_agent.DqnAgent._grad_mean_reduction (   self)
private

Performs mean reduction and assigns the policy model's parameter the mean reduced gradients.
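
A sketch of the idea, assuming the gradients from each backward call were stashed per parameter over bootstrap_rounds rounds; the helper and its argument layout are hypothetical and do not reflect rlpack's internal GradAccumulator API.

import torch
from torch import nn

def assign_mean_reduced_grads(model: nn.Module, accumulated: list) -> None:
    """Average the stashed per-parameter gradients (one dict of name -> tensor per round)
    and write the mean back into each parameter's .grad before the optimizer step."""
    for name, param in model.named_parameters():
        grads = [snapshot[name] for snapshot in accumulated if name in snapshot]
        if grads:
            param.grad = torch.stack(grads, dim=0).mean(dim=0)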

◆ _infer_action()

int rlpack.dqn.dqn_agent.DqnAgent._infer_action (   self,
pytorch.Tensor  state_current,
bool   call_from_policy = True 
)
private

Helper method to support action inference from the policy model.

Parameters
state_currentpytorch.Tensor: The current state of the agent in the environment.
call_from_policybool: The flag indicating whether the method is being called from the DqnAgent.policy method. Default: True.
Returns
int: The discrete action.

◆ _load_random_experiences()

Tuple[ pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, ] rlpack.dqn.dqn_agent.DqnAgent._load_random_experiences (   self)
private

This method loads random transitions from memory.

This may also include forced terminal states for each batch if force_terminal_state_selection_prob > 0 was supplied in the DqnAgent constructor; e.g., if force_terminal_state_selection_prob = 0.1, approximately 1 in 10 batches will have at least one terminal state forced by the loader.

Returns
Tuple[ pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, ]: The tuple of tensors as (states_current, states_next, rewards, actions, dones, priorities, probabilities, weights, random_indices).

◆ _temporal_difference()

pytorch.Tensor rlpack.dqn.dqn_agent.DqnAgent._temporal_difference (   self,
pytorch.Tensor  rewards,
pytorch.Tensor  q_values,
pytorch.Tensor   dones 
)
private

This method computes the temporal difference for given transitions.

Parameters
rewardspytorch.Tensor: The sampled batch of rewards.
q_valuespytorch.Tensor: The q-values inferred from target_model.
donespytorch.Tensor: The done values for each transition in the batch.
Returns
pytorch.Tensor: The TD value for each sample in the batch.
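
For the one-step DQN target that these inputs suggest, the TD value is r + gamma * max_a Q_target(s', a), masked by the done flag. A sketch follows; the actual rlpack computation may differ, e.g. around normalization of the TD value.

import torch

def temporal_difference(rewards: torch.Tensor, q_values: torch.Tensor,
                        dones: torch.Tensor, gamma: float) -> torch.Tensor:
    """One-step target: r + gamma * max_a Q_target(s', a) * (1 - done)."""
    max_next_q = q_values.max(dim=-1).values      # best target-network Q-value per sample
    return rewards + gamma * max_next_q * (1.0 - dones.float())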

◆ _train_policy_model()

None rlpack.dqn.dqn_agent.DqnAgent._train_policy_model (   self)
private

Protected method of the class to train the policy model.

This method is called every policy_model_update_rate timesteps, as supplied in the DqnAgent class' constructor. It loads random samples from memory (the number of samples depends on the batch_size supplied in the DqnAgent constructor) and trains the policy_model.
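
A schematic of one such update under standard DQN assumptions (sample a batch, build the TD target from the frozen target network, regress the policy network's Q(s, a) toward it); the batch dictionary and identifiers are illustrative, not rlpack's internals.

import torch

def train_policy_model_step(policy_model, target_model, optimizer, loss_function,
                            batch: dict, gamma: float) -> torch.Tensor:
    """One illustrative DQN update on a sampled batch of transition tensors."""
    with torch.no_grad():                                        # target network stays frozen
        next_q = target_model(batch["states_next"]).max(dim=-1).values
        td_target = batch["rewards"] + gamma * next_q * (1.0 - batch["dones"].float())
    q_selected = policy_model(batch["states_current"]).gather(
        1, batch["actions"].long().view(-1, 1)).squeeze(1)       # Q(s, a) for the taken actions
    loss = loss_function(q_selected, td_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.detach()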

◆ _update_target_model()

None rlpack.dqn.dqn_agent.DqnAgent._update_target_model (   self)
private

Protected method of the class to update the target model.

This method is called every target_model_update_rate timesteps supplied in the DqnAgent class constructor.
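
A sketch of the tau-weighted soft update described under the tau parameter (target = tau * policy + (1 - tau) * target); with tau = 1.0 this reduces to a hard copy. The helper below is illustrative, not rlpack's implementation.

import torch
from torch import nn

@torch.no_grad()
def soft_update(target_model: nn.Module, policy_model: nn.Module, tau: float) -> None:
    """Blend policy-network weights into the target network with factor tau."""
    for t_param, p_param in zip(target_model.parameters(), policy_model.parameters()):
        t_param.mul_(1.0 - tau).add_(p_param, alpha=tau)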

◆ load()

None rlpack.dqn.dqn_agent.DqnAgent.load (   self,
Optional[str]   custom_name_suffix = None 
)

This method loads the target_model, policy_model, optimizer, lr_scheduler and agent_states from the save_path supplied as an argument to the DqnAgent class' constructor (__init__).

Parameters
custom_name_suffixOptional[str]: If supplied, an additional suffix is added to the names of target_model, policy_model, optimizer and lr_scheduler. Useful to load the best model by a custom suffix supplied for evaluation. Default: None.

Reimplemented from rlpack.utils.base.agent.Agent.

◆ policy()

int rlpack.dqn.dqn_agent.DqnAgent.policy (   self,
Union[ndarray, pytorch.Tensor, List[float]]  state_current 
)

The policy for the agent.

This runs the inference on policy model with state_current and uses q-values to obtain the best action.

Parameters
state_currentUnion[ndarray, pytorch.Tensor, List[float]]: The current state the agent is in.
Returns
int: The action to be taken.

Reimplemented from rlpack.utils.base.agent.Agent.
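
The description implies epsilon-greedy selection over the policy network's Q-values; below is a standalone sketch of that rule, not rlpack's exact code (which also handles state conversion and normalization).

import random
import torch
from torch import nn

@torch.no_grad()
def epsilon_greedy_action(policy_model: nn.Module, state: torch.Tensor,
                          epsilon: float, num_actions: int) -> int:
    """With probability epsilon explore uniformly; otherwise take argmax_a Q(s, a)."""
    if random.random() < epsilon:
        return random.randrange(num_actions)
    q_values = policy_model(state.unsqueeze(0))   # add a batch dimension
    return int(q_values.argmax(dim=-1).item())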

◆ save()

None rlpack.dqn.dqn_agent.DqnAgent.save (   self,
Optional[str]   custom_name_suffix = None 
)

This method saves the target_model, policy_model, optimizer, lr_scheduler and agent_states to the save_path supplied as an argument to the DqnAgent class' constructor (__init__).

agent_states includes current memory and epsilon values in a dictionary.

Parameters
custom_name_suffixOptional[str]: If supplied, an additional suffix is added to the names of target_model, policy_model, optimizer and lr_scheduler. Useful to save the best model by a custom suffix supplied during a training run. Default: None.

Reimplemented from rlpack.utils.base.agent.Agent.
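
A usage sketch for the save/load pair with a custom suffix, assuming an agent constructed as in the earlier example; the suffix value is arbitrary.

# Back up the current best agent under a distinguishable name, then restore it later.
agent.save(custom_name_suffix="_best")  # writes target_model, policy_model, optimizer,
                                        # lr_scheduler and agent_states under save_path
agent.load(custom_name_suffix="_best")  # restores the same set of objects, e.g. for evaluation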

◆ train()

int rlpack.dqn.dqn_agent.DqnAgent.train (   self,
Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]]  state_current,
Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]]  state_next,
Union[int, float]  reward,
Union[int, float]  action,
Union[bool, int]  done,
Optional[Union[pytorch.Tensor, np.ndarray, float]]   priority = 1.0,
Optional[Union[pytorch.Tensor, np.ndarray, float]]   probability = 1.0,
Optional[Union[pytorch.Tensor, np.ndarray, float]]   weight = 1.0 
)
  • The training method for the agent, which accepts a transition from the environment and returns an action for the next transition. Use this method when you intend to train the agent.
  • This method will also run the policy to yield the best action for the given state.
  • For each transition (or experience) being passed, associated priority, probability and weight can be passed.
Parameters
state_currentUnion[pytorch.Tensor, np.ndarray, List[Union[float, int]]]: The current state in the environment.
state_nextUnion[pytorch.Tensor, np.ndarray, List[Union[float, int]]]: The next state returned by the environment.
rewardUnion[int, float]: Reward obtained by performing the action for the transition.
actionUnion[int, float]: Action taken for the transition.
doneUnion[bool, int]: Indicates whether the episode has terminated or not.
priorityOptional[Union[pytorch.Tensor, np.ndarray, float]]: The priority of the transition (for prioritized replay memory). Default: 1.0.
probabilityOptional[Union[pytorch.Tensor, np.ndarray, float]]: The probability of the transition (for prioritized replay memory). Default: 1.0.
weightOptional[Union[pytorch.Tensor, np.ndarray, float]]: The importance sampling weight of the transition (for prioritized replay memory). Default: 1.0.
Returns
int: The next action to be taken from state_next.

Reimplemented from rlpack.utils.base.agent.Agent.
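
A minimal training-loop sketch, assuming a gymnasium environment with a discrete action space and an agent constructed as in the constructor example above; the episode bookkeeping is illustrative.

import gymnasium as gym

# DqnAgent.train consumes one transition per call and returns the action to take from state_next.
env = gym.make("CartPole-v1")
for episode in range(100):
    state_current, _ = env.reset()
    action = env.action_space.sample()            # seed the first step with a random action
    done = False
    while not done:
        state_next, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        action = agent.train(
            state_current=state_current,
            state_next=state_next,
            reward=float(reward),
            action=int(action),
            done=done,
        )
        state_current = state_next
env.close()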

Field Documentation

◆ __prioritization_strategy_code

rlpack.dqn.dqn_agent.DqnAgent.__prioritization_strategy_code
private

The prioritization strategy code.

◆ _grad_accumulator

rlpack.dqn.dqn_agent.DqnAgent._grad_accumulator
private

The list of gradients from each backward call.

This is only used when bootstrap_rounds > 1 and is cleared after each bootstrap round. It is an rlpack._C.grad_accumulator.GradAccumulator object used for gradient accumulation.

◆ _normalization

rlpack.dqn.dqn_agent.DqnAgent._normalization
private

The normalisation tool to be used for agent.

An instance of rlpack.utils.normalization.Normalization.

◆ apply_norm

rlpack.dqn.dqn_agent.DqnAgent.apply_norm

The input apply_norm argument; indicating the normalisation to be used.

◆ apply_norm_to

rlpack.dqn.dqn_agent.DqnAgent.apply_norm_to

The input apply_norm_to argument; indicating the quantity to normalise.

◆ backup_frequency

rlpack.dqn.dqn_agent.DqnAgent.backup_frequency

The input model backup frequency in terms of timesteps.

◆ batch_size

rlpack.dqn.dqn_agent.DqnAgent.batch_size

The batch size to be used when training policy model.

The corresponding number of samples is drawn from memory as per the prioritization strategy.

◆ bootstrap_rounds

rlpack.dqn.dqn_agent.DqnAgent.bootstrap_rounds

The input bootstrap rounds.

◆ device

rlpack.dqn.dqn_agent.DqnAgent.device

The input device argument; indicating the device name.

◆ dim_for_norm

rlpack.dqn.dqn_agent.DqnAgent.dim_for_norm

The input dim_for_norm argument; indicating dimension along which we wish to normalise.

◆ eps_for_norm

rlpack.dqn.dqn_agent.DqnAgent.eps_for_norm

The input eps_for_norm argument; indicating epsilon to be used for normalisation.

◆ epsilon

rlpack.dqn.dqn_agent.DqnAgent.epsilon

The input exploration factor.

◆ epsilon_decay_frequency

rlpack.dqn.dqn_agent.DqnAgent.epsilon_decay_frequency

The input epsilon decay frequency in terms of timesteps.

◆ epsilon_decay_rate

rlpack.dqn.dqn_agent.DqnAgent.epsilon_decay_rate

The input epsilon decay rate.

◆ force_terminal_state_selection_prob

rlpack.dqn.dqn_agent.DqnAgent.force_terminal_state_selection_prob

The input force_terminal_state_selection_prob.

This indicates the probability to force at least one terminal state sample in a batch.

◆ gamma

rlpack.dqn.dqn_agent.DqnAgent.gamma

The input discounting factor.

◆ grad_norm_p

rlpack.dqn.dqn_agent.DqnAgent.grad_norm_p

The input grad_norm_p; indicating the p-value for p-normalisation for gradient clippings.

◆ loss_function

rlpack.dqn.dqn_agent.DqnAgent.loss_function

The input loss function.

◆ lr_scheduler

rlpack.dqn.dqn_agent.DqnAgent.lr_scheduler

The input optional LR Scheduler (this can be None).

◆ lr_threshold

rlpack.dqn.dqn_agent.DqnAgent.lr_threshold

The input LR Threshold.

◆ max_grad_norm

rlpack.dqn.dqn_agent.DqnAgent.max_grad_norm

The input max_grad_norm; indicating the maximum gradient norm for gradient clippings.

◆ memory

rlpack.dqn.dqn_agent.DqnAgent.memory

The instance of rlpack._C.memory.Memory used for Replay buffer.

◆ memory_buffer_size

rlpack.dqn.dqn_agent.DqnAgent.memory_buffer_size

The input argument memory_buffer_size; indicating the buffer size used.

◆ min_epsilon

rlpack.dqn.dqn_agent.DqnAgent.min_epsilon

The input minimum exploration factor after decays.

◆ num_actions

rlpack.dqn.dqn_agent.DqnAgent.num_actions

The input number of actions.

◆ optimizer

rlpack.dqn.dqn_agent.DqnAgent.optimizer

The input optimizer wrapped with policy_model parameters.

◆ p_for_norm

rlpack.dqn.dqn_agent.DqnAgent.p_for_norm

The input p_for_norm argument; indicating p-value for p-normalisation.

◆ policy_model

rlpack.dqn.dqn_agent.DqnAgent.policy_model

The input policy model.

◆ policy_model_update_rate

rlpack.dqn.dqn_agent.DqnAgent.policy_model_update_rate

The input argument policy_model_update_rate; indicating the update rate of policy model.

The optimizer is called every policy_model_update_rate timesteps.

◆ prioritization_params

rlpack.dqn.dqn_agent.DqnAgent.prioritization_params

The input prioritization parameters.

◆ save_path

rlpack.dqn.dqn_agent.DqnAgent.save_path

The input save path for backing up agent models.

◆ step_counter

rlpack.dqn.dqn_agent.DqnAgent.step_counter

The step counter; counting the total timesteps done so far up to memory_buffer_size.

◆ target_model

rlpack.dqn.dqn_agent.DqnAgent.target_model

The input target model.

This model's parameters are frozen.

◆ target_model_update_rate

rlpack.dqn.dqn_agent.DqnAgent.target_model_update_rate

The input argument target_model_update_rate; indicating the update rate of target model.

A soft copy of parameters takes place from policy_model to target_model as per the update rate.

◆ tau

rlpack.dqn.dqn_agent.DqnAgent.tau

The input tau; indicating the soft update used to update target_model parameters.