This class provides the Python interface to C_Memory, the C++ class which performs the heavier workloads. More...
Public Member Functions | |
None | __delitem__ (self, int index) |
Deletion method for memory. More... | |
Any | __getattr__ (self, str item) |
Get attr method for memory. More... | |
List[pytorch.Tensor] | __getitem__ (self, int index) |
Indexing method for memory. More... | |
Dict[str, Any] | __getstate__ (self) |
Get state method for memory. More... | |
def | __init__ (self, Optional[int] buffer_size=32768, Optional[str] device="cpu", int prioritization_strategy_code=0, int batch_size=32) |
int | __len__ (self) |
Length method for memory. More... | |
str | __repr__ (self) |
Repr method for memory. More... | |
None | __setattr__ (self, str key, Any value) |
Set attr method for memory. More... | |
None | __setitem__ (self, int index, Tuple[Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]], Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]], Union[np.ndarray, float], Union[np.ndarray, float], Union[bool, int], Union[pytorch.Tensor, np.ndarray, float], Union[pytorch.Tensor, np.ndarray, float], Union[pytorch.Tensor, np.ndarray, float],] transition) |
Set item method for the memory. More... | |
None | __setstate__ (self, Dict[str, Any] state) |
Set state method for the memory. More... | |
str | __str__ (self) |
The str method for memory. More... | |
None | clear (self) |
This method clears the memory and renders it empty. More... | |
List[pytorch.Tensor] | get_actions (self) |
This retrieves all the actions from transitions accumulated so far. More... | |
List[pytorch.Tensor] | get_dones (self) |
This retrieves all the dones from transitions accumulated so far. More... | |
List[float] | get_priorities (self) |
This retrieves all the priorities for all the transitions, ordered by index. More... | |
List[pytorch.Tensor] | get_rewards (self) |
This retrieves all the rewards from transitions accumulated so far. More... | |
List[pytorch.Tensor] | get_states_current (self) |
This retrieves all the current states from transitions accumulated so far. More... | |
List[pytorch.Tensor] | get_states_next (self) |
This retrieves all the next states from transitions accumulated so far. More... | |
List[int] | get_terminal_state_indices (self) |
This retrieves the terminal state indices accumulated so far. More... | |
Dict[str, pytorch.Tensor] | get_transitions (self) |
This retrieves all the transitions accumulated so far. More... | |
None | initialize (self, C_Memory.C_MemoryData memory_data) |
This loads the memory from the provided C_MemoryData instance. More... | |
None | insert (self, Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]] state_current, Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]] state_next, Union[np.ndarray, float] reward, Union[np.ndarray, float] action, Union[bool, int] done, Optional[Union[pytorch.Tensor, np.ndarray, float]] priority=1.0, Optional[Union[pytorch.Tensor, np.ndarray, float]] probability=1.0, Optional[Union[pytorch.Tensor, np.ndarray, float]] weight=1.0) |
This method performs insertion to the memory. More... | |
int | num_terminal_states (self) |
Returns the number of terminal states. More... | |
Tuple[ pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor,] | sample (self, float force_terminal_state_probability=0.0, int parallelism_size_threshold=4096, float alpha=0.0, float beta=0.0, int num_segments=1) |
Load random samples from memory for a given batch. More... | |
int | tree_height (self) |
Returns the height of the Sum Tree when using prioritized memory. More... | |
None | update_priorities (self, pytorch.Tensor random_indices, pytorch.Tensor new_priorities) |
This method updates the priorities when prioritized memory is used. More... | |
C_Memory.C_MemoryData | view (self) |
This method returns the view of Memory, i.e. More... | |
Data Fields | |
buffer_size | |
The input buffer size. More... | |
c_memory | |
The instance of C_Memory; the C++ backend of Memory class. More... | |
device | |
The input device argument; indicating the device name. More... | |
prioritization_strategy_code | |
The input prioritization_strategy_code. More... | |
Static Private Member Functions | |
Tuple[ pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, bool,] | __prepare_inputs_c_memory_ (Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]] state_current, Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]] state_next, Union[pytorch.Tensor, np.ndarray, float] reward, Union[pytorch.Tensor, np.ndarray, float] action, Union[bool, int] done, Union[pytorch.Tensor, np.ndarray, float] priority, Union[pytorch.Tensor, np.ndarray, float] probability, Union[pytorch.Tensor, np.ndarray, float] weight) |
Prepares inputs to be sent to C++ backend. More... | |
This class provides the Python interface to C_Memory, the C++ class which performs the heavier workloads.
This class is used as a container to store tensors and sample from that container as per the desired strategy (for DQN). This is equivalent to an Experience Buffer, Replay Buffer, etc.
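The core container semantics (bounded storage that evicts the oldest transitions, uniform random sampling) can be sketched in pure Python. The class below is a hypothetical stand-in for illustration only, not rlpack's implementation; the real Memory stores PyTorch tensors and runs in C++:

```python
import random
from collections import deque


class ReplayBuffer:
    """Minimal pure-Python stand-in for the container semantics Memory
    implements in C++ (hypothetical sketch, not rlpack code)."""

    def __init__(self, buffer_size=32768, batch_size=32):
        # deque(maxlen=...) evicts the oldest transition once full.
        self.buffer = deque(maxlen=buffer_size)
        self.batch_size = batch_size

    def insert(self, state_current, state_next, reward, action, done):
        # Each element mirrors the transition tuple Memory stores.
        self.buffer.append((state_current, state_next, reward, action, done))

    def sample(self):
        # Uniform random sampling: the default, non-prioritized strategy.
        k = min(self.batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)
```

With `buffer_size=4`, inserting six transitions keeps only the four most recent, and `sample()` returns at most `batch_size` of them.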
def rlpack._C.memory.Memory.__init__(self, Optional[int] buffer_size=32768, Optional[str] device="cpu", int prioritization_strategy_code=0, int batch_size=32)
buffer_size | Optional[int]: The buffer size of the memory. No more than specified buffer elements are stored in the memory. Default: 32768 |
device | str: The device on which models are currently running. Default: "cpu". |
prioritization_strategy_code | int: Indicates code for prioritization strategy. Default: 0. |
batch_size | int: The batch size to be used for training cycle. Default: 32 |
None rlpack._C.memory.Memory.__delitem__(self, int index)
Deletion method for memory.
index | int: Index at which we want to delete an item. Note that this operation can be expensive depending on the size of memory; O(n). |
Any rlpack._C.memory.Memory.__getattr__(self, str item)
Get attr method for memory.
item | str: The attribute that has been set during runtime (through setattr). |
List[pytorch.Tensor] rlpack._C.memory.Memory.__getitem__(self, int index)
Indexing method for memory.
index | int: The index at which we want to obtain the memory data. |
Dict[str, Any] rlpack._C.memory.Memory.__getstate__(self)
Get state method for memory.
This makes this Memory class pickleable.
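Together with __setstate__, this follows the standard Python pickle protocol: the non-picklable C++ handle is dropped from the state dict on dump and rebuilt on load. A generic sketch of that pattern (hypothetical class, not rlpack code):

```python
import pickle


class Wrapper:
    """Illustrates the __getstate__/__setstate__ pattern that makes a class
    holding a non-picklable backend picklable (hypothetical stand-in)."""

    def __init__(self, size):
        self.size = size
        self.backend = object()  # pretend this is an unpicklable C++ handle

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["backend"]  # drop the handle; keep only plain data
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.backend = object()  # rebuild the backend on unpickle

w = pickle.loads(pickle.dumps(Wrapper(8)))
```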
int rlpack._C.memory.Memory.__len__(self)
Length method for memory.
Tuple[pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, bool] rlpack._C.memory.Memory.__prepare_inputs_c_memory_(state_current, state_next, reward, action, done, priority, probability, weight) [static private]
Prepares inputs to be sent to C++ backend.
state_current | Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]]: The current state agent is in. |
state_next | Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]]: The next state agent will go in for the specified action. |
reward | Union[pytorch.Tensor, np.ndarray, float]: The reward obtained in the transition. |
action | Union[pytorch.Tensor, np.ndarray, float]: The action taken for the transition. |
done | Union[bool, int]: Indicates whether the episode ended or not, i.e. if state_next is a terminal state or not. |
priority | Union[pytorch.Tensor, np.ndarray, float]: The priority of the transition (for prioritized replay memory). |
probability | Union[pytorch.Tensor, np.ndarray, float]: The probability of the transition (for prioritized replay memory). |
weight | Union[pytorch.Tensor, np.ndarray, float]: The importance sampling weight of the transition (for prioritized replay memory). |
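The mixed input types accepted above (tensors, NumPy arrays, lists, scalars) are normalized before being handed to the C++ backend. A minimal sketch of that kind of normalization, using a hypothetical helper and pure-Python stand-ins (the real casting targets PyTorch tensors):

```python
def to_float_list(value):
    """Normalize a tensor-like input (list, scalar, or any object exposing
    .tolist(), such as np.ndarray or pytorch.Tensor) to a flat list of
    floats. Hypothetical sketch of the casting __prepare_inputs_c_memory_
    performs before handing data to C++."""
    if hasattr(value, "tolist"):  # covers np.ndarray and pytorch.Tensor
        value = value.tolist()
    if not isinstance(value, list):
        value = [value]  # wrap bare scalars (float, int, bool)
    return [float(v) for v in value]
```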
is_terminal_state | Indicates whether the state is a terminal state or not (corresponds to done). All input values associated with the transition tuple are type-cast to PyTorch tensors.
str rlpack._C.memory.Memory.__repr__(self)
Repr method for memory.
None rlpack._C.memory.Memory.__setattr__(self, str key, Any value)
Set attr method for memory.
key | str: The desired attribute name. |
value | Any: The value for corresponding key. |
None rlpack._C.memory.Memory.__setitem__(self, int index, Tuple[Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]], Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]], Union[np.ndarray, float], Union[np.ndarray, float], Union[bool, int], Union[pytorch.Tensor, np.ndarray, float], Union[pytorch.Tensor, np.ndarray, float], Union[pytorch.Tensor, np.ndarray, float]] transition)
Set item method for the memory.
index | int: The index at which to insert the transition. |
transition | Tuple[Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]], Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]], Union[np.ndarray, float], Union[np.ndarray, float], Union[bool, int], Union[pytorch.Tensor, np.ndarray, float], Union[pytorch.Tensor, np.ndarray, float], Union[pytorch.Tensor, np.ndarray, float]]: The transition tuple in the order: (state_current, state_next, reward, action, done, priority, probability, weight). |
None rlpack._C.memory.Memory.__setstate__(self, Dict[str, Any] state)
Set state method for the memory.
state | Dict[str, Any]: The state dictionary to load back into the memory instance. This helps unpickle the Memory. |
str rlpack._C.memory.Memory.__str__(self)
The str method for memory.
Useful for printing the memory. Calling print(memory) prints the transition information.
None rlpack._C.memory.Memory.clear(self)
This method clears the memory and renders it empty.
List[pytorch.Tensor] rlpack._C.memory.Memory.get_actions(self)
This retrieves all the actions from transitions accumulated so far.
List[pytorch.Tensor] rlpack._C.memory.Memory.get_dones(self)
This retrieves all the dones from transitions accumulated so far.
List[float] rlpack._C.memory.Memory.get_priorities(self)
This retrieves all the priorities for all the transitions, ordered by index.
List[pytorch.Tensor] rlpack._C.memory.Memory.get_rewards(self)
This retrieves all the rewards from transitions accumulated so far.
List[pytorch.Tensor] rlpack._C.memory.Memory.get_states_current(self)
This retrieves all the current states from transitions accumulated so far.
List[pytorch.Tensor] rlpack._C.memory.Memory.get_states_next(self)
This retrieves all the next states from transitions accumulated so far.
List[int] rlpack._C.memory.Memory.get_terminal_state_indices(self)
This retrieves the terminal state indices accumulated so far.
Dict[str, pytorch.Tensor] rlpack._C.memory.Memory.get_transitions(self)
This retrieves all the transitions accumulated so far.
None rlpack._C.memory.Memory.initialize(self, C_Memory.C_MemoryData memory_data)
This loads the memory from the provided C_MemoryData instance.
memory_data | C_Memory.C_MemoryData: The C_MemoryData instance to load the memory from. |
None rlpack._C.memory.Memory.insert(self, Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]] state_current, Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]] state_next, Union[np.ndarray, float] reward, Union[np.ndarray, float] action, Union[bool, int] done, Optional[Union[pytorch.Tensor, np.ndarray, float]] priority=1.0, Optional[Union[pytorch.Tensor, np.ndarray, float]] probability=1.0, Optional[Union[pytorch.Tensor, np.ndarray, float]] weight=1.0)
This method performs insertion to the memory.
state_current | Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]]: The current state agent is in. |
state_next | Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]]: The next state agent will go in for the specified action. |
reward | Union[np.ndarray, float]: The reward obtained in the transition. |
action | Union[np.ndarray, float]: The action taken for the transition. |
done | Union[bool, int]: Indicates whether the episode ended or not, i.e. if state_next is a terminal state or not. |
priority | Optional[Union[pytorch.Tensor, np.ndarray, float]]: The priority of the transition (for prioritized replay memory). Default: 1.0. |
probability | Optional[Union[pytorch.Tensor, np.ndarray, float]]: The probability of the transition (for prioritized replay memory). Default: 1.0. |
weight | Optional[Union[pytorch.Tensor, np.ndarray, float]]: The importance sampling weight of the transition (for prioritized replay memory). Default: 1.0. |
int rlpack._C.memory.Memory.num_terminal_states(self)
Returns the number of terminal states.
Tuple[pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor] rlpack._C.memory.Memory.sample(self, float force_terminal_state_probability=0.0, int parallelism_size_threshold=4096, float alpha=0.0, float beta=0.0, int num_segments=1)
Load random samples from memory for a given batch.
force_terminal_state_probability | float: The probability for forcefully selecting a terminal state in a batch. Default: 0.0. |
parallelism_size_threshold | int: The minimum size of memory beyond which parallelism is used to shuffle and retrieve the batch of sample. Default: 4096. |
alpha | float: The alpha value for computation of probabilities. Default: 0.0. |
beta | float: The beta value for computation of importance sampling weights. Default: 0.0. |
num_segments | int: The number of segments to use to uniformly sample for rank-based prioritization. Default: 1. |
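The alpha and beta arguments follow the usual proportional prioritized-replay formulas: sampling probability P(i) = p_i^alpha / sum_k p_k^alpha, and importance sampling weight w_i = (N * P(i))^-beta, normalized by the maximum weight. A sketch of that math in pure Python (a hypothetical helper illustrating the formulas, not rlpack's C++ code):

```python
def prioritized_sample_stats(priorities, alpha=0.6, beta=0.4):
    """Compute sampling probabilities and normalized importance-sampling
    weights from transition priorities, as in proportional prioritized
    replay (hypothetical sketch of the role of `alpha` and `beta`)."""
    n = len(priorities)
    # P(i) = p_i^alpha / sum_k p_k^alpha
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    probabilities = [s / total for s in scaled]
    # w_i = (N * P(i))^-beta, normalized by max weight for stability
    weights = [(n * p) ** -beta for p in probabilities]
    max_w = max(weights)
    return probabilities, [w / max_w for w in weights]
```

With equal priorities every transition is sampled uniformly and all normalized weights equal 1; with alpha=0 the prioritization is disabled entirely, matching the defaults above.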
int rlpack._C.memory.Memory.tree_height(self)
Returns the height of the Sum Tree when using prioritized memory.
This is only relevant when using prioritized buffer. Note that tree height is given as per buffer size and not as per number of elements.
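Assuming a complete binary sum tree sized by buffer capacity, the height can be sketched as below (a hypothetical helper; the exact level-counting convention rlpack uses may differ):

```python
import math


def sum_tree_height(buffer_size):
    """Height of a complete binary sum tree with `buffer_size` leaves,
    counting levels from leaves to root inclusive. Sketch only: assumes
    the tree is sized by buffer capacity, not by stored element count."""
    return math.ceil(math.log2(buffer_size)) + 1
```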
None rlpack._C.memory.Memory.update_priorities(self, pytorch.Tensor random_indices, pytorch.Tensor new_priorities)
This method updates the priorities when prioritized memory is used.
It will also update the associated probabilities and importance sampling weights.
random_indices | pytorch.Tensor: The list of random indices which were sampled previously. These indices are used to update the corresponding values. Must be a 1-D PyTorch Tensor. |
new_priorities | pytorch.Tensor: The list of new priorities corresponding to random_indices passed. |
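The update can be sketched as overwriting priorities at the previously sampled indices and recomputing the sampling probabilities (a pure-Python sketch with hypothetical names; rlpack performs this in C++ over its sum tree):

```python
def update_priorities(priorities, indices, new_priorities, alpha=0.6):
    """Overwrite priorities at the sampled indices, then recompute the
    sampling probabilities P(i) = p_i^alpha / sum_k p_k^alpha.
    Hypothetical sketch, not rlpack's implementation."""
    for i, p in zip(indices, new_priorities):
        priorities[i] = p  # indices come from the previous sample() call
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return priorities, [s / total for s in scaled]
```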
C_Memory.C_MemoryData rlpack._C.memory.Memory.view(self)
This method returns the view of Memory, i.e. the data stored in the memory.
rlpack._C.memory.Memory.buffer_size |
The input buffer size.
rlpack._C.memory.Memory.device |
The input device argument, indicating the device name.
rlpack._C.memory.Memory.prioritization_strategy_code |
The input prioritization_strategy_code.
Refer to rlpack.dqn.dqn_agent.DqnAgent.__init__() for more details.