RLPack
 
rlpack._C.memory.Memory Class Reference

This class provides the Python interface to C_Memory, the C++ class which performs the heavier workloads. More...


Public Member Functions

None __delitem__ (self, int index)
 Deletion method for memory. More...
 
Any __getattr__ (self, str item)
 Get attr method for memory. More...
 
List[pytorch.Tensor] __getitem__ (self, int index)
 Indexing method for memory. More...
 
Dict[str, Any] __getstate__ (self)
 Get state method for memory. More...
 
def __init__ (self, Optional[int] buffer_size=32768, Optional[str] device="cpu", int prioritization_strategy_code=0, int batch_size=32)
 
int __len__ (self)
 Length method for memory. More...
 
str __repr__ (self)
 Repr method for memory. More...
 
None __setattr__ (self, str key, Any value)
 Set attr method for memory. More...
 
None __setitem__ (self, int index, Tuple[Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]], Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]], Union[np.ndarray, float], Union[np.ndarray, float], Union[bool, int], Union[pytorch.Tensor, np.ndarray, float], Union[pytorch.Tensor, np.ndarray, float], Union[pytorch.Tensor, np.ndarray, float],] transition)
 Set item method for the memory. More...
 
None __setstate__ (self, Dict[str, Any] state)
 Set state method for the memory. More...
 
str __str__ (self)
 The str method for memory. More...
 
None clear (self)
This method clears the memory and renders it empty. More...
 
List[pytorch.Tensor] get_actions (self)
 This retrieves all the actions from transitions accumulated so far. More...
 
List[pytorch.Tensor] get_dones (self)
 This retrieves all the dones from transitions accumulated so far. More...
 
List[float] get_priorities (self)
 This retrieves all the priorities for all the transitions, ordered by index. More...
 
List[pytorch.Tensor] get_rewards (self)
 This retrieves all the rewards from transitions accumulated so far. More...
 
List[pytorch.Tensor] get_states_current (self)
 This retrieves all the current states from transitions accumulated so far. More...
 
List[pytorch.Tensor] get_states_next (self)
 This retrieves all the next states from transitions accumulated so far. More...
 
List[int] get_terminal_state_indices (self)
 This retrieves the terminal state indices accumulated so far. More...
 
Dict[str, pytorch.Tensor] get_transitions (self)
 This retrieves all the transitions accumulated so far. More...
 
None initialize (self, C_Memory.C_MemoryData memory_data)
 This loads the memory from the provided C_MemoryData instance. More...
 
None insert (self, Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]] state_current, Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]] state_next, Union[np.ndarray, float] reward, Union[np.ndarray, float] action, Union[bool, int] done, Optional[Union[pytorch.Tensor, np.ndarray, float]] priority=1.0, Optional[Union[pytorch.Tensor, np.ndarray, float]] probability=1.0, Optional[Union[pytorch.Tensor, np.ndarray, float]] weight=1.0)
 This method performs insertion to the memory. More...
 
int num_terminal_states (self)
 Returns the number of terminal states. More...
 
Tuple[ pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor,] sample (self, float force_terminal_state_probability=0.0, int parallelism_size_threshold=4096, float alpha=0.0, float beta=0.0, int num_segments=1)
 Load random samples from memory for a given batch. More...
 
int tree_height (self)
 Returns the height of the Sum Tree when using prioritized memory. More...
 
None update_priorities (self, pytorch.Tensor random_indices, pytorch.Tensor new_priorities)
 This method updates the priorities when prioritized memory is used. More...
 
C_Memory.C_MemoryData view (self)
This method returns a view of the Memory, i.e. the data stored in the memory. More...
 

Data Fields

 buffer_size
 The input buffer size. More...
 
 c_memory
 The instance of C_Memory; the C++ backend of Memory class. More...
 
 device
 The input device argument; indicating the device name. More...
 
 prioritization_strategy_code
 The input prioritization_strategy_code. More...
 

Static Private Member Functions

Tuple[ pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, bool,] __prepare_inputs_c_memory_ (Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]] state_current, Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]] state_next, Union[pytorch.Tensor, np.ndarray, float] reward, Union[pytorch.Tensor, np.ndarray, float] action, Union[bool, int] done, Union[pytorch.Tensor, np.ndarray, float] priority, Union[pytorch.Tensor, np.ndarray, float] probability, Union[pytorch.Tensor, np.ndarray, float] weight)
 Prepares inputs to be sent to C++ backend. More...
 

Detailed Description

This class provides the Python interface to C_Memory, the C++ class which performs the heavier workloads.

This class is used as a container to store tensors and sample from that container as per the desired strategy (e.g., for DQN). It is equivalent to an Experience Buffer, Replay Buffer, etc.
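A minimal usage sketch is shown below (assuming a built rlpack C++ extension; the state shapes and values are illustrative):

    import numpy as np
    from rlpack._C.memory import Memory

    # Uniform (non-prioritized) replay buffer with default capacity.
    memory = Memory(buffer_size=32768, device="cpu", prioritization_strategy_code=0, batch_size=32)

    # States may be tensors, NumPy arrays or lists; rewards and actions may be floats.
    state_current = np.zeros(4, dtype=np.float32)
    state_next = np.ones(4, dtype=np.float32)
    memory.insert(state_current, state_next, 1.0, 0.0, False)

    # Once enough transitions are stored, sample a random batch of tensors.
    if len(memory) >= 32:
        batch = memory.sample()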

Constructor & Destructor Documentation

◆ __init__()

def rlpack._C.memory.Memory.__init__ (   self,
Optional[int]   buffer_size = 32768,
Optional[str]   device = "cpu",
int   prioritization_strategy_code = 0,
int   batch_size = 32 
)
Parameters
buffer_size Optional[int]: The buffer size of the memory. No more than the specified number of elements are stored in the memory. Default: 32768.
device Optional[str]: The device on which models are currently running. Default: "cpu".
prioritization_strategy_code int: Indicates the code for the prioritization strategy. Default: 0.
batch_size int: The batch size to be used for the training cycle. Default: 32.
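For example, a prioritized buffer might be configured as below; the mapping of prioritization_strategy_code values to strategies is defined in rlpack.dqn.dqn_agent.DqnAgent.__init__(), so the nonzero code here is an assumption for illustration:

    from rlpack._C.memory import Memory

    memory = Memory(
        buffer_size=16384,
        device="cpu",
        prioritization_strategy_code=1,  # assumed: a nonzero code selects a prioritized strategy
        batch_size=64,
    )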

Member Function Documentation

◆ __delitem__()

None rlpack._C.memory.Memory.__delitem__ (   self,
int  index 
)

Deletion method for memory.

Parameters
index int: The index at which we want to delete an item. Note that this operation can be expensive depending on the size of the memory; O(n).
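For instance (illustrative):

    del memory[0]  # removes the transition at index 0; O(n) in the memory size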

◆ __getattr__()

Any rlpack._C.memory.Memory.__getattr__ (   self,
str  item 
)

Get attr method for memory.

Parameters
item str: The attribute that has been set during runtime (through __setattr__).
Returns
Any: The value for the item passed.

◆ __getitem__()

List[pytorch.Tensor] rlpack._C.memory.Memory.__getitem__ (   self,
int  index 
)

Indexing method for memory.

Parameters
index int: The index at which we want to obtain the memory data.
Returns
List[pytorch.Tensor]: The transition as tensors from the memory.
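For instance (illustrative):

    transition = memory[5]  # list of tensors for the transition stored at index 5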

◆ __getstate__()

Dict[str, Any] rlpack._C.memory.Memory.__getstate__ (   self)

Get state method for memory.

This makes this Memory class pickleable.

Returns
Dict[str, Any]: The state of the memory.

◆ __len__()

int rlpack._C.memory.Memory.__len__ (   self)

Length method for memory.

Returns
int: The size of the memory.

◆ __prepare_inputs_c_memory_()

Tuple[ pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, bool, ] rlpack._C.memory.Memory.__prepare_inputs_c_memory_ ( Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]]  state_current,
Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]]  state_next,
Union[pytorch.Tensor, np.ndarray, float]  reward,
Union[pytorch.Tensor, np.ndarray, float]  action,
Union[bool, int]  done,
Union[pytorch.Tensor, np.ndarray, float]  priority,
Union[pytorch.Tensor, np.ndarray, float]  probability,
Union[pytorch.Tensor, np.ndarray, float]  weight 
)
static, private

Prepares inputs to be sent to C++ backend.

Parameters
state_current Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]]: The current state the agent is in.
state_next Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]]: The next state the agent will go to for the specified action.
reward Union[pytorch.Tensor, np.ndarray, float]: The reward obtained in the transition.
action Union[pytorch.Tensor, np.ndarray, float]: The action taken for the transition.
done Union[bool, int]: Indicates whether the episode has ended, i.e. if state_next is a terminal state or not.
priority Union[pytorch.Tensor, np.ndarray, float]: The priority of the transition (for prioritized replay memory). Default: None.
probability Union[pytorch.Tensor, np.ndarray, float]: The probability of the transition (for prioritized replay memory). Default: None.
weight Union[pytorch.Tensor, np.ndarray, float]: The importance sampling weight of the transition (for prioritized replay memory). Default: None.
Returns
Tuple[pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, bool]: The tuple in the order: (state_current, state_next, reward, action, done, priority, probability, weight, is_terminal_state). is_terminal_state indicates if the state is a terminal state or not (corresponds to done). All the input values associated with the transition tuple are type-cast to PyTorch tensors.

◆ __repr__()

str rlpack._C.memory.Memory.__repr__ (   self)

Repr method for memory.

Returns
str: String with object's memory location.

◆ __setattr__()

None rlpack._C.memory.Memory.__setattr__ (   self,
str  key,
Any  value 
)

Set attr method for memory.

Parameters
key str: The desired attribute name.
value Any: The value for the corresponding key.

◆ __setitem__()

None rlpack._C.memory.Memory.__setitem__ (   self,
int  index,
Tuple[ Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]], Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]], Union[np.ndarray, float], Union[np.ndarray, float], Union[bool, int], Union[pytorch.Tensor, np.ndarray, float], Union[pytorch.Tensor, np.ndarray, float], Union[pytorch.Tensor, np.ndarray, float], ]  transition 
)

Set item method for the memory.

Parameters
index int: The index at which to insert the transition.
transition Tuple[Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]], Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]], Union[np.ndarray, float], Union[np.ndarray, float], Union[bool, int], Union[pytorch.Tensor, np.ndarray, float], Union[pytorch.Tensor, np.ndarray, float], Union[pytorch.Tensor, np.ndarray, float]]: The transition tuple in the order: (state_current, state_next, reward, action, done, priority, probability, weight).
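A sketch of replacing a stored transition in place (values are illustrative and must follow the documented tuple order):

    memory[3] = (state_current, state_next, 0.5, 1.0, False, 1.0, 1.0, 1.0)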

◆ __setstate__()

None rlpack._C.memory.Memory.__setstate__ (   self,
Dict[str, Any]  state 
)

Set state method for the memory.

Parameters
state Dict[str, Any]: The state dictionary to load back into the memory instance. This enables unpickling of Memory.

◆ __str__()

str rlpack._C.memory.Memory.__str__ (   self)

The str method for memory.

Useful for printing the memory: calling print(memory) will print the transition information.

Returns
str: A string rendering of the dictionary that encapsulates the memory's data.

◆ clear()

None rlpack._C.memory.Memory.clear (   self)

This method clears the memory and renders it empty.

◆ get_actions()

List[pytorch.Tensor] rlpack._C.memory.Memory.get_actions (   self)

This retrieves all the actions from transitions accumulated so far.

Returns
List[pytorch.Tensor]: A list of tensors with action values.

◆ get_dones()

List[pytorch.Tensor] rlpack._C.memory.Memory.get_dones (   self)

This retrieves all the dones from transitions accumulated so far.

Returns
List[pytorch.Tensor]: A list of tensors with done values.

◆ get_priorities()

List[float] rlpack._C.memory.Memory.get_priorities (   self)

This retrieves all the priorities for all the transitions, ordered by index.

Returns
List[float]: A list of priorities ordered by index.

◆ get_rewards()

List[pytorch.Tensor] rlpack._C.memory.Memory.get_rewards (   self)

This retrieves all the rewards from transitions accumulated so far.

Returns
List[pytorch.Tensor]: A list of tensors with reward values.

◆ get_states_current()

List[pytorch.Tensor] rlpack._C.memory.Memory.get_states_current (   self)

This retrieves all the current states from transitions accumulated so far.

Returns
List[pytorch.Tensor]: A list of tensors with current state values.

◆ get_states_next()

List[pytorch.Tensor] rlpack._C.memory.Memory.get_states_next (   self)

This retrieves all the next states from transitions accumulated so far.

Returns
List[pytorch.Tensor]: A list of tensors with next state values.

◆ get_terminal_state_indices()

List[int] rlpack._C.memory.Memory.get_terminal_state_indices (   self)

This retrieves the terminal state indices accumulated so far.

Returns
List[int]: The list of terminal state indices.

◆ get_transitions()

Dict[str, pytorch.Tensor] rlpack._C.memory.Memory.get_transitions (   self)

This retrieves all the transitions accumulated so far.

Returns
Dict[str, pytorch.Tensor]: A dictionary with all transition information.

◆ initialize()

None rlpack._C.memory.Memory.initialize (   self,
C_Memory.C_MemoryData  memory_data 
)

This loads the memory from the provided C_MemoryData instance.

Parameters
memory_data C_Memory.C_MemoryData: The C_MemoryData instance from which to load the memory.

◆ insert()

None rlpack._C.memory.Memory.insert (   self,
Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]]  state_current,
Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]]  state_next,
Union[np.ndarray, float]  reward,
Union[np.ndarray, float]  action,
Union[bool, int]  done,
Optional[Union[pytorch.Tensor, np.ndarray, float]]   priority = 1.0,
Optional[Union[pytorch.Tensor, np.ndarray, float]]   probability = 1.0,
Optional[Union[pytorch.Tensor, np.ndarray, float]]   weight = 1.0 
)

This method performs insertion to the memory.

Parameters
state_current Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]]: The current state the agent is in.
state_next Union[pytorch.Tensor, np.ndarray, List[Union[float, int]]]: The next state the agent will go to for the specified action.
reward Union[np.ndarray, float]: The reward obtained in the transition.
action Union[np.ndarray, float]: The action taken for the transition.
done Union[bool, int]: Indicates whether the episode has ended, i.e. if state_next is a terminal state or not.
priority Optional[Union[pytorch.Tensor, np.ndarray, float]]: The priority of the transition (for prioritized replay memory). Default: 1.0.
probability Optional[Union[pytorch.Tensor, np.ndarray, float]]: The probability of the transition (for prioritized replay memory). Default: 1.0.
weight Optional[Union[pytorch.Tensor, np.ndarray, float]]: The importance sampling weight of the transition (for prioritized replay memory). Default: 1.0.
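A hedged sketch of filling the buffer from an environment loop (the Gym environment and its classic reset/step API are assumptions for illustration, not part of this API):

    import gym

    env = gym.make("CartPole-v1")
    state_current = env.reset()
    for _ in range(1000):
        action = env.action_space.sample()
        state_next, reward, done, _ = env.step(action)
        memory.insert(state_current, state_next, float(reward), float(action), done)
        state_current = env.reset() if done else state_next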

◆ num_terminal_states()

int rlpack._C.memory.Memory.num_terminal_states (   self)

Returns the number of terminal states.

Returns
int: The number of terminal states.

◆ sample()

Tuple[ pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, ] rlpack._C.memory.Memory.sample (   self,
float   force_terminal_state_probability = 0.0,
int   parallelism_size_threshold = 4096,
float   alpha = 0.0,
float   beta = 0.0,
int   num_segments = 1 
)

Load random samples from memory for a given batch.

Parameters
force_terminal_state_probability float: The probability of forcefully selecting a terminal state in a batch. Default: 0.0.
parallelism_size_threshold int: The minimum size of the memory beyond which parallelism is used to shuffle and retrieve the batch of samples. Default: 4096.
alpha float: The alpha value for computation of probabilities. Default: 0.0.
beta float: The beta value for computation of importance sampling weights. Default: 0.0.
num_segments int: The number of segments to use to uniformly sample for rank-based prioritization. Default: 1.
Returns
Tuple[pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor, pytorch.Tensor]: The tuple of tensors in the order: (states_current, states_next, rewards, actions, dones, priorities, probabilities, weights, random_indices).
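Unpacking the nine returned tensors (the names below simply mirror the documented order):

    (states_current, states_next, rewards, actions, dones,
     priorities, probabilities, weights, random_indices) = memory.sample(
        force_terminal_state_probability=0.1
    )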

◆ tree_height()

int rlpack._C.memory.Memory.tree_height (   self)

Returns the height of the Sum Tree when using prioritized memory.

This is only relevant when using a prioritized buffer. Note that the tree height is computed from the buffer size, not from the current number of elements.

Returns
int: The height of the tree.

◆ update_priorities()

None rlpack._C.memory.Memory.update_priorities (   self,
pytorch.Tensor  random_indices,
pytorch.Tensor  new_priorities 
)

This method updates the priorities when prioritized memory is used.

It will also update the associated probabilities and importance sampling weights.

Parameters
random_indices pytorch.Tensor: The random indices which were sampled previously. These indices are used to update the corresponding values. Must be a 1-D PyTorch tensor.
new_priorities pytorch.Tensor: The new priorities corresponding to the random_indices passed.
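A sketch of the prioritized-replay update step, assuming new priorities (e.g., absolute TD errors) have already been computed as a 1-D tensor aligned with random_indices:

    import torch

    # random_indices as returned by sample(); the random values here are a
    # placeholder for priorities computed from TD errors.
    new_priorities = torch.rand(random_indices.shape[0])
    memory.update_priorities(random_indices, new_priorities)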

◆ view()

C_Memory.C_MemoryData rlpack._C.memory.Memory.view (   self)

This method returns a view of the Memory, i.e. the data stored in the memory.

Returns
C_Memory.C_MemoryData: The C_MemoryData object which packages the current memory information. This object is pickleable and its data can also be accessed via attributes.
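Since the returned object is pickleable, one way to persist and restore a buffer (a sketch; the file path is illustrative) is:

    import pickle

    with open("memory.pkl", "wb") as f:
        pickle.dump(memory.view(), f)

    with open("memory.pkl", "rb") as f:
        memory_data = pickle.load(f)

    restored = Memory()  # construct with matching arguments, then load the data
    restored.initialize(memory_data)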

Field Documentation

◆ buffer_size

rlpack._C.memory.Memory.buffer_size

The input buffer size.

◆ c_memory

rlpack._C.memory.Memory.c_memory

The instance of C_Memory; the C++ backend of Memory class.

◆ device

rlpack._C.memory.Memory.device

The input device argument; indicating the device name.

◆ prioritization_strategy_code

rlpack._C.memory.Memory.prioritization_strategy_code

The input prioritization_strategy_code.

Refer to rlpack.dqn.dqn_agent.DqnAgent.__init__() for more details.