RLPack
 
C_Memory Class Reference

The class C_Memory is the C++ backend for the memory buffer used by algorithms that store transitions. This class contains optimized routines that support the Python front-end class rlpack._C.memory.Memory.

Data Structures

struct  C_MemoryData
 The class C_MemoryData keeps the references to data that is associated with C_Memory. This class implements the functions necessary to retrieve the data by de-referencing the data associated with C_Memory.
 

Public Member Functions

 C_Memory ()
 
 C_Memory (int64_t bufferSize, const std::string &device, const int32_t &prioritizationStrategyCode, const int32_t &batchSize)
 
void clear ()
 
void delete_item (int64_t index)
 
std::map< std::string, torch::Tensor > get_item (int64_t index)
 
void initialize (C_MemoryData &viewC_Memory)
 
void insert (torch::Tensor &stateCurrent, torch::Tensor &stateNext, torch::Tensor &reward, torch::Tensor &action, torch::Tensor &done, torch::Tensor &priority, torch::Tensor &probability, torch::Tensor &weight, bool isTerminalState)
 
int64_t num_terminal_states ()
 
std::map< std::string, torch::Tensor > sample (float_t forceTerminalStateProbability, int64_t parallelismSizeThreshold, float_t alpha=0.0, float_t beta=0.0, int64_t numSegments=0)
 
void set_item (int64_t index, torch::Tensor &stateCurrent, torch::Tensor &stateNext, torch::Tensor &reward, torch::Tensor &action, torch::Tensor &done, torch::Tensor &priority, torch::Tensor &probability, torch::Tensor &weight, bool isTerminalState)
 
size_t size ()
 
int64_t tree_height ()
 
void update_priorities (torch::Tensor &randomIndices, torch::Tensor &newPriorities)
 
C_MemoryData view () const
 
 ~C_Memory ()
 

Data Fields

std::shared_ptr< C_MemoryData > cMemoryData
 Shared Pointer to C_Memory::C_MemoryData.
 

Static Private Member Functions

static torch::Tensor compute_important_sampling_weights (torch::Tensor &probabilities, int64_t currentSize, float_t beta)
 
static torch::Tensor compute_probabilities (torch::Tensor &priorities, float_t alpha)
 

Private Attributes

std::deque< torch::Tensor > actions_
 Deque of torch tensors for actions.
 
int32_t batchSize_ = 32
 The batch size that is set during class initialisation. This many samples are selected during sampling.
 
int64_t bufferSize_ = 32768
 Buffer size passed during the class initialisation. Defaults to 32768.
 
torch::Device device_ = torch::kCPU
 Torch device passed during class initialisation. Defaults to CPU.
 
std::map< std::string, torch::DeviceType > deviceMap_
 The map between std::string and torch::DeviceType; mapping the device name in string to DeviceType.
 
std::deque< torch::Tensor > dones_
 Deque of torch tensors for dones.
 
std::vector< int64_t > loadedIndices_
 Vector of loaded indices. This indicates the indices that have been loaded out of total capacity of the memory.
 
std::vector< int64_t > loadedIndicesSlice_
 The loaded indices slice; the slice of indices that is sampled during the sampling process. In each sampling cycle its size is equal to C_Memory::batchSize_.
 
Offload< float_t > * offloadFloat_
 Offload class initialised with float template.
 
Offload< int64_t > * offloadInt64_
 Offload class initialised with int64 template.
 
std::deque< torch::Tensor > priorities_
 Deque of torch tensors for priorities.
 
std::deque< float_t > prioritiesFloat_
 Deque of floats holding the priorities as C++ floats. Values are obtained from C_Memory::priorities_.
 
int32_t prioritizationStrategyCode_ = 0
 The prioritization strategy code that is being used. This determines the sampling technique that is employed. Refer to rlpack.dqn.dqn.Dqn.get_prioritization_code.
 
std::deque< torch::Tensor > probabilities_
 Deque of torch tensors for probabilities.
 
std::deque< torch::Tensor > rewards_
 Deque of torch tensors for rewards.
 
std::vector< torch::Tensor > sampledActions_
 The sampled action tensors from C_Memory::actions_.
 
std::vector< torch::Tensor > sampledDones_
 The sampled done tensors from C_Memory::dones_.
 
std::vector< torch::Tensor > sampledIndices_
 The sampled indices as tensors from C_Memory::loadedIndices_.
 
std::vector< torch::Tensor > sampledPriorities_
 The sampled priority tensors from C_Memory::priorities_.
 
std::vector< torch::Tensor > sampledRewards_
 The sampled reward tensors from C_Memory::rewards_.
 
std::vector< torch::Tensor > sampledStateCurrent_
 The sampled current state tensors from C_Memory::statesCurrent_.
 
std::vector< torch::Tensor > sampledStateNext_
 The sampled next state tensors from C_Memory::statesNext_.
 
std::vector< float_t > seedValues_
 The seed values generated during each sampling cycle for proportional based prioritization.
 
std::vector< int64_t > segmentQuantileIndices_
 The Quantile segment indices sampled when rank-based prioritization is used.
 
std::deque< torch::Tensor > statesCurrent_
 Deque of torch tensors for current states.
 
std::deque< torch::Tensor > statesNext_
 Deque of torch tensors for next states.
 
int64_t stepCounter_ = 0
 The counter variable that tracks the loaded indices in sync with total timesteps. Once memory reaches the buffer size, this will not update.
 
std::shared_ptr< SumTree > sumTreeSharedPtr_
 Shared Pointer to SumTree class object.
 
std::deque< int64_t > terminalStateIndices_
 Deque of integers indicating the indices of terminal states.
 
std::deque< torch::Tensor > weights_
 Deque of torch tensors for weights.
 

Detailed Description

The class C_Memory is the C++ backend for the memory buffer used by algorithms that store transitions. This class contains optimized routines that support the Python front-end class rlpack._C.memory.Memory.

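A memory index refers to an index that yields a transition from C_Memory. This works by indexing each per-quantity container at the same position and grouping the results together; the keys returned by C_Memory::get_item name the grouped quantities.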

Constructor & Destructor Documentation

◆ C_Memory() [1/2]

C_Memory::C_Memory ( )

The default non-parameterised constructor. This constructor allocates memory according to the default-initialised member variables. It initialises rlpack._C.memory.Memory.c_memory and is equivalent to rlpack._C.memory.Memory.__init__.

◆ C_Memory() [2/2]

C_Memory::C_Memory ( int64_t  bufferSize,
const std::string &  device,
const int32_t &  prioritizationStrategyCode,
const int32_t &  batchSize 
)
explicit

The class constructor for C_Memory. This constructor initialises the C_Memory class and allocates the required memory as per the input arguments. It initialises rlpack._C.memory.Memory.c_memory and is equivalent to rlpack._C.memory.Memory.__init__.

Parameters
bufferSize: The buffer size to be used and allocated for the memory.
device: The device to transfer relevant tensors to.
prioritizationStrategyCode: The prioritization strategy code. Refer to rlpack.dqn.dqn.Dqn.get_prioritization_code.
batchSize: The batch size to be used for sampling.

◆ ~C_Memory()

C_Memory::~C_Memory ( )

The destructor for C_Memory.

Member Function Documentation

◆ clear()

void C_Memory::clear ( )

Clears the data in C_Memory. This will NOT free the memory, since it does not perform any memory de-allocation. This is the C++ backend of the rlpack._C.memory.Memory.clear method.

◆ compute_important_sampling_weights()

torch::Tensor C_Memory::compute_important_sampling_weights ( torch::Tensor &  probabilities,
int64_t  currentSize,
float_t  beta 
)
static private

Method to compute the importance sampling (IS) weight for each probability.

Parameters
probabilities: The input probabilities for which IS weights are to be computed.
currentSize: The current size of the C_Memory (see C_Memory::size).
beta: The beta value for prioritization. Refer to C_Memory::sample for more information.
Returns
The tensor with importance sampling weights corresponding to each probability.
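
The page does not state the formula, so the following is only a minimal sketch. It assumes the weighting commonly used in prioritized experience replay, w_i = (N * P(i))^(-beta), rescaled by the largest weight. The function name, the use of plain doubles instead of torch::Tensor, and the max-normalisation step are all assumptions for illustration, not the RLPack implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical stand-in working on plain doubles instead of torch::Tensor.
// Assumed formula: w_i = (currentSize * P(i))^(-beta), divided by max_j w_j.
std::vector<double> importanceSamplingWeights(const std::vector<double> &probabilities,
                                              std::int64_t currentSize,
                                              double beta) {
    assert(currentSize > 0);
    std::vector<double> weights;
    weights.reserve(probabilities.size());
    for (double p : probabilities) {
        assert(p > 0.0);  // a zero probability would give an infinite weight
        weights.push_back(std::pow(static_cast<double>(currentSize) * p, -beta));
    }
    if (weights.empty()) {
        return weights;
    }
    // Rescale so that the largest weight is 1 (assumed normalisation).
    const double maxWeight = *std::max_element(weights.begin(), weights.end());
    for (double &w : weights) {
        w /= maxWeight;
    }
    return weights;
}
```

With beta = 0 every weight is 1 (no bias correction); a larger beta shrinks the weights of transitions that are sampled more often.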

◆ compute_probabilities()

torch::Tensor C_Memory::compute_probabilities ( torch::Tensor &  priorities,
float_t  alpha 
)
static private

Method to compute probabilities when not using a uniform prioritization strategy.

Parameters
priorities: The sampled priorities for which probabilities are to be computed.
alpha: The alpha value for prioritization. Refer C_Memory::sample for more information.
Returns
The tensor with probabilities corresponding to each priority.
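
As a rough illustration of how alpha shapes the probabilities, the sketch below assumes the usual prioritized-replay form P(i) = p_i^alpha / sum_k p_k^alpha. The function name and the use of plain doubles instead of torch::Tensor are assumptions; the page itself does not give the formula.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical stand-in working on plain doubles instead of torch::Tensor.
// Assumed formula: P(i) = p_i^alpha / sum_k p_k^alpha.
std::vector<double> computeProbabilities(const std::vector<double> &priorities,
                                         double alpha) {
    assert(!priorities.empty());
    std::vector<double> scaled;
    scaled.reserve(priorities.size());
    for (double p : priorities) {
        scaled.push_back(std::pow(p, alpha));
    }
    const double total = std::accumulate(scaled.begin(), scaled.end(), 0.0);
    assert(total > 0.0);
    for (double &s : scaled) {
        s /= total;  // normalise so the probabilities sum to 1
    }
    return scaled;
}
```

With alpha = 0 the result is uniform regardless of the priorities; with alpha = 1 each probability is directly proportional to its priority.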

◆ delete_item()

void C_Memory::delete_item ( int64_t  index)

Deletion method for C_Memory. This is the C++ backend of rlpack._C.memory.Memory.__delitem__, so it can be invoked with a simple indexing operation (operator []; del memory[index]) from the Python side.

Deletion is fast if index refers to the first or last element; otherwise it takes O(n) time to move the items after index.

Parameters
index: The index of the transition we want to remove.
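
The cost difference follows from std::deque itself, which the private attributes use for storage. The helper below is a hypothetical sketch (eraseAt is not part of C_Memory) showing the operation on a single deque: erasing at either end is constant time, while erasing in the middle moves the elements on the shorter side of the erased position.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical helper: removes the element at `index` from one deque.
// std::deque::erase is O(1) at either end and linear in the distance to
// the nearer end otherwise.
template <typename T>
void eraseAt(std::deque<T> &values, std::size_t index) {
    assert(index < values.size());
    values.erase(values.begin() + static_cast<std::ptrdiff_t>(index));
}
```

A buffer that keeps one deque per quantity would apply the same erase to every deque so that the remaining indices stay aligned.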

◆ get_item()

std::map< std::string, torch::Tensor > C_Memory::get_item ( int64_t  index)

Getter method for C_Memory. This is the C++ backend of the rlpack._C.memory.Memory.__getitem__ method, so it can be invoked with a simple indexing operation (operator []; item = memory[index]) from the Python side.

Parameters
index: The index from which we want to obtain the transition.
Returns
A map of transition quantities. The map will contain the following keys:
  • states_current
  • states_next
  • rewards
  • actions
  • dones
  • priorities
  • probabilities
  • weights
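
The idea of a memory index can be sketched with a toy structure that keeps one deque per quantity and groups the values found at the same position into a map. ToyMemory, its three quantities, and the use of double instead of torch::Tensor are assumptions made to keep the example self-contained; only the key names are taken from the list above.

```cpp
#include <cstddef>
#include <deque>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical, simplified buffer: one deque per quantity, kept the same length.
struct ToyMemory {
    std::deque<double> rewards;
    std::deque<double> actions;
    std::deque<double> dones;

    void insert(double reward, double action, double done) {
        rewards.push_back(reward);
        actions.push_back(action);
        dones.push_back(done);
    }

    // Groups the values stored at `index` under their quantity names.
    // std::deque::at throws std::out_of_range for an invalid index.
    std::map<std::string, double> getItem(std::size_t index) const {
        return {
            {"rewards", rewards.at(index)},
            {"actions", actions.at(index)},
            {"dones", dones.at(index)},
        };
    }
};
```

Because every deque is indexed with the same position, a single integer is enough to recover the whole transition.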

◆ initialize()

void C_Memory::initialize ( C_Memory::C_MemoryData &  viewC_MemoryData)

Initialize method for C_Memory for initializing all the data from an object of C_Memory::C_MemoryData. This is the C++ backend of the rlpack._C.memory.Memory.initialize method.

Parameters
viewC_MemoryData: An object of C_Memory::C_MemoryData.

◆ insert()

void C_Memory::insert ( torch::Tensor &  stateCurrent,
torch::Tensor &  stateNext,
torch::Tensor &  reward,
torch::Tensor &  action,
torch::Tensor &  done,
torch::Tensor &  priority,
torch::Tensor &  probability,
torch::Tensor &  weight,
bool  isTerminalState 
)

Insertion method for C_Memory. This is the C++ backend of rlpack._C.memory.Memory.insert method.

Parameters
stateCurrent: Current state from transition.
stateNext: Next state from transition.
reward: Reward obtained during transition.
action: Action taken during transition.
done: Flag indicating if next state is terminal packaged in PyTorch Tensor.
priority: Priority value associated with the transition.
probability: Probability value associated with the transition.
weight: Weight value associated with the transition.
isTerminalState: Flag indicating if next state is terminal.

◆ num_terminal_states()

int64_t C_Memory::num_terminal_states ( )

Method to obtain the number of terminal states currently in C_Memory. This is the C++ backend of rlpack._C.memory.Memory.num_terminal_states method.

Returns
Number of terminal states so far.

◆ sample()

std::map< std::string, torch::Tensor > C_Memory::sample ( float_t  forceTerminalStateProbability,
int64_t  parallelismSizeThreshold,
float_t  alpha = 0.0,
float_t  beta = 0.0,
int64_t  numSegments = 0 
)

The sampling method for C_Memory. This is the C++ backend of rlpack._C.memory.Memory.sample. Sampling is done as per the prioritization strategy specified during initialisation of C_Memory.

Parameters
forceTerminalStateProbability: The probability to force a terminal state in final sample.
parallelismSizeThreshold: The threshold size of the buffer (from the C_Memory::size method) beyond which OpenMP-parallelized routines are used for sampling.
alpha: The alpha value for prioritization. This is used to compute probabilities, where higher alpha indicates more aggressive prioritization.
beta: The beta value for prioritization. This is used to compute importance sampling weights, where higher beta indicates more aggressive bias correction.
numSegments: The number of segments to be used for rank-based prioritization (in accordance with Zipf's law).
Returns
A map of sampled transitions separated by quantities. The map has the following keys with each key containing a tensor of shape (batchSize, ...):
  • states_current
  • states_next
  • rewards
  • actions
  • dones
  • priorities
  • probabilities
  • weights
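
The page does not describe how forceTerminalStateProbability is applied. One plausible reading, sketched below purely as an assumption, is that with the given probability the last sampled index is replaced by a randomly chosen terminal-state index. The function name and the replacement rule are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <random>
#include <vector>

// Hypothetical sketch: with probability `forceTerminalStateProbability`,
// overwrite the last sampled index with a random terminal-state index.
void maybeForceTerminalState(std::vector<std::int64_t> &sampledIndices,
                             const std::deque<std::int64_t> &terminalStateIndices,
                             double forceTerminalStateProbability,
                             std::mt19937 &rng) {
    assert(forceTerminalStateProbability >= 0.0 && forceTerminalStateProbability <= 1.0);
    if (sampledIndices.empty() || terminalStateIndices.empty()) {
        return;  // nothing to replace, or no terminal state stored yet
    }
    std::bernoulli_distribution force(forceTerminalStateProbability);
    if (!force(rng)) {
        return;
    }
    std::uniform_int_distribution<std::size_t> pick(0, terminalStateIndices.size() - 1);
    sampledIndices.back() = terminalStateIndices[pick(rng)];
}
```

A probability of 0 leaves the batch untouched, while a probability of 1 guarantees at least one terminal transition per batch once any terminal state has been stored.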

◆ set_item()

void C_Memory::set_item ( int64_t  index,
torch::Tensor &  stateCurrent,
torch::Tensor &  stateNext,
torch::Tensor &  reward,
torch::Tensor &  action,
torch::Tensor &  done,
torch::Tensor &  priority,
torch::Tensor &  probability,
torch::Tensor &  weight,
bool  isTerminalState 
)

Setter method for C_Memory. This is the C++ backend of the rlpack._C.memory.Memory.__setitem__ method, so it can be invoked with a simple indexing operation (operator []; memory[index] = transition) from the Python side. This method modifies the items at the given index.

Parameters
index: The index to which we want to set the transition.
stateCurrent: Current state from transition.
stateNext: Next state from transition.
reward: Reward obtained during transition.
action: Action taken during transition.
done: Flag indicating if next state is terminal packaged in PyTorch Tensor.
priority: Priority value associated with the transition.
probability: Probability value associated with the transition.
weight: Weight value associated with the transition.
isTerminalState: Flag indicating if next state is terminal.

◆ size()

size_t C_Memory::size ( )

This method obtains the current size of C_Memory. This is the C++ backend of the rlpack._C.memory.Memory.__len__ method, so the length can be obtained with the built-in Python function len(memory).

Returns
The size (or length) of C_Memory.

◆ tree_height()

int64_t C_Memory::tree_height ( )

Method to obtain the height of the sum tree when using a proportional prioritization strategy. This is the C++ backend of rlpack._C.memory.Memory.tree_height. If a proportional prioritization strategy is not in use, calling this method will throw an error.

Returns
The tree height of the tree built.
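
For readers unfamiliar with sum trees, the following is a minimal array-based sketch, not the SumTree class used by C_Memory. Each leaf holds one priority and each internal node holds the sum of its children, so drawing a value in [0, total) and walking down from the root selects a leaf with probability proportional to its priority. The class name, the rounding of the capacity to a power of two, and the convention that the height counts levels including the root are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical array-based sum tree. Node 1 is the root; the children of
// node i are 2i and 2i + 1; leaves occupy [leafCount_, 2 * leafCount_).
class ToySumTree {
public:
    explicit ToySumTree(std::size_t capacity) {
        assert(capacity > 0);
        while (leafCount_ < capacity) {  // round up to a power of two
            leafCount_ *= 2;
            ++height_;
        }
        nodes_.assign(2 * leafCount_, 0.0);
    }

    // Sets the priority of one leaf and refreshes the sums on the path to the root.
    void update(std::size_t leaf, double priority) {
        assert(leaf < leafCount_ && priority >= 0.0);
        std::size_t node = leaf + leafCount_;
        nodes_[node] = priority;
        for (node /= 2; node >= 1; node /= 2) {
            nodes_[node] = nodes_[2 * node] + nodes_[2 * node + 1];
        }
    }

    double total() const { return nodes_[1]; }

    // Number of levels, counting the root as level 1 (assumed convention).
    std::size_t height() const { return height_; }

    // Returns the leaf whose cumulative priority range contains `value`.
    std::size_t sample(double value) const {
        assert(value >= 0.0 && value < total());
        std::size_t node = 1;
        while (node < leafCount_) {
            const std::size_t left = 2 * node;
            if (value < nodes_[left]) {
                node = left;
            } else {
                value -= nodes_[left];
                node = left + 1;
            }
        }
        return node - leafCount_;
    }

private:
    std::size_t leafCount_ = 1;
    std::size_t height_ = 1;
    std::vector<double> nodes_;
};
```

Both update and sample visit one node per level, so their cost grows with the tree height rather than with the number of stored priorities.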

◆ update_priorities()

void C_Memory::update_priorities ( torch::Tensor &  randomIndices,
torch::Tensor &  newPriorities 
)

Method to update priorities with the new values computed by the agent according to the prioritization strategy. This is the C++ backend of the rlpack._C.memory.Memory.update_priorities method.

Parameters
randomIndices: The random indices at which priorities are to be updated. C_Memory::sample provides this information.
newPriorities: The new priorities computed by the agent as per the prioritization strategy.
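
The pairing between the two arguments can be sketched as follows: the i-th index receives the i-th new priority. The function below is a hypothetical stand-in that works on plain doubles and a single deque; the real method takes torch::Tensor arguments and may update further internal state, such as the sum tree.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical sketch: writes newPriorities[i] at position indices[i].
// std::deque::at throws std::out_of_range for an invalid index.
void updatePriorities(std::deque<double> &priorities,
                      const std::vector<std::int64_t> &indices,
                      const std::vector<double> &newPriorities) {
    assert(indices.size() == newPriorities.size());
    for (std::size_t i = 0; i < indices.size(); ++i) {
        assert(indices[i] >= 0);
        priorities.at(static_cast<std::size_t>(indices[i])) = newPriorities[i];
    }
}
```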

◆ view()

C_Memory::C_MemoryData C_Memory::view ( ) const

Method to obtain the C_Memory::C_MemoryData object. This contains references to the data in C_Memory and provides an easy data view. This is the C++ backend of the rlpack._C.memory.Memory.view method.

Field Documentation

◆ actions_

std::deque<torch::Tensor> C_Memory::actions_
private

Deque of torch tensors for actions.

◆ batchSize_

int32_t C_Memory::batchSize_ = 32
private

The batch size that is set during class initialisation. This many samples are selected during sampling.

◆ bufferSize_

int64_t C_Memory::bufferSize_ = 32768
private

Buffer size passed during the class initialisation. Defaults to 32768.

◆ cMemoryData

std::shared_ptr<C_MemoryData> C_Memory::cMemoryData

Shared Pointer to C_Memory::C_MemoryData.

◆ device_

torch::Device C_Memory::device_ = torch::kCPU
private

Torch device passed during class initialisation. Defaults to CPU.

◆ deviceMap_

std::map<std::string, torch::DeviceType> C_Memory::deviceMap_
private
Initial value:
{
    {"cpu", torch::kCPU},
    {"cuda", torch::kCUDA},
    {"mps", torch::kMPS}
}

The map between std::string and torch::DeviceType; mapping the device name in string to DeviceType.
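
A lookup against such a map might look like the sketch below. To keep it free of a libtorch dependency it uses a hypothetical ToyDeviceType enum in place of torch::DeviceType, and the choice to throw on an unknown name is an assumption rather than documented C_Memory behaviour.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for torch::DeviceType.
enum class ToyDeviceType { CPU, CUDA, MPS };

// Resolves a device name to its type; throws for names missing from the map.
ToyDeviceType resolveDevice(const std::string &name) {
    static const std::map<std::string, ToyDeviceType> deviceMap = {
        {"cpu", ToyDeviceType::CPU},
        {"cuda", ToyDeviceType::CUDA},
        {"mps", ToyDeviceType::MPS}};
    const auto it = deviceMap.find(name);
    if (it == deviceMap.end()) {
        throw std::invalid_argument("Unknown device: " + name);
    }
    return it->second;
}
```

Using find instead of operator[] avoids silently inserting a default entry for a misspelled device name.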

◆ dones_

std::deque<torch::Tensor> C_Memory::dones_
private

Deque of torch tensors for dones.

◆ loadedIndices_

std::vector<int64_t> C_Memory::loadedIndices_
private

Vector of loaded indices. This indicates the indices that have been loaded out of total capacity of the memory.

◆ loadedIndicesSlice_

std::vector<int64_t> C_Memory::loadedIndicesSlice_
private

The loaded indices slice; the slice of indices that is sampled during the sampling process. In each sampling cycle its size is equal to C_Memory::batchSize_.

◆ offloadFloat_

Offload<float_t>* C_Memory::offloadFloat_
private

Offload class initialised with float template.

◆ offloadInt64_

Offload<int64_t>* C_Memory::offloadInt64_
private

Offload class initialised with int64 template.

◆ priorities_

std::deque<torch::Tensor> C_Memory::priorities_
private

Deque of torch tensors for priorities.

◆ prioritiesFloat_

std::deque<float_t> C_Memory::prioritiesFloat_
private

Deque of floats holding the priorities as C++ floats. Values are obtained from C_Memory::priorities_.

◆ prioritizationStrategyCode_

int32_t C_Memory::prioritizationStrategyCode_ = 0
private

The prioritization strategy code that is being used. This determines the sampling technique that is employed. Refer to rlpack.dqn.dqn.Dqn.get_prioritization_code.

◆ probabilities_

std::deque<torch::Tensor> C_Memory::probabilities_
private

Deque of torch tensors for probabilities.

◆ rewards_

std::deque<torch::Tensor> C_Memory::rewards_
private

Deque of torch tensors for rewards.

◆ sampledActions_

std::vector<torch::Tensor> C_Memory::sampledActions_
private

The sampled action tensors from C_Memory::actions_.

◆ sampledDones_

std::vector<torch::Tensor> C_Memory::sampledDones_
private

The sampled done tensors from C_Memory::dones_.

◆ sampledIndices_

std::vector<torch::Tensor> C_Memory::sampledIndices_
private

The sampled indices as tensors from C_Memory::loadedIndices_.

◆ sampledPriorities_

std::vector<torch::Tensor> C_Memory::sampledPriorities_
private

The sampled priority tensors from C_Memory::priorities_.

◆ sampledRewards_

std::vector<torch::Tensor> C_Memory::sampledRewards_
private

The sampled reward tensors from C_Memory::rewards_.

◆ sampledStateCurrent_

std::vector<torch::Tensor> C_Memory::sampledStateCurrent_
private

The sampled current state tensors from C_Memory::statesCurrent_.

◆ sampledStateNext_

std::vector<torch::Tensor> C_Memory::sampledStateNext_
private

The sampled next state tensors from C_Memory::statesNext_.

◆ seedValues_

std::vector<float_t> C_Memory::seedValues_
private

The seed values generated during each sampling cycle for proportional based prioritization.

◆ segmentQuantileIndices_

std::vector<int64_t> C_Memory::segmentQuantileIndices_
private

The Quantile segment indices sampled when rank-based prioritization is used.
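
How the segments are derived is not described on this page. A common scheme for rank-based prioritization, assumed here, gives rank r (1-based, highest priority first) a weight of (1/r)^alpha and cuts the ranks into numSegments pieces of roughly equal total weight, after which one index can be drawn from each piece. The function below is a hypothetical sketch that only computes the segment boundaries.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: returns the exclusive end rank (0-based) of each segment,
// so segment s covers ranks [ends[s - 1], ends[s]) and segment 0 starts at rank 0.
// Assumed weight of the rank-th item (0-based): (1 / (rank + 1))^alpha.
// For very skewed weights a segment may come out empty.
std::vector<std::size_t> rankSegmentEnds(std::size_t size,
                                         std::size_t numSegments,
                                         double alpha) {
    assert(size > 0 && numSegments > 0);
    std::vector<double> cumulative(size);
    double total = 0.0;
    for (std::size_t rank = 0; rank < size; ++rank) {
        total += std::pow(1.0 / static_cast<double>(rank + 1), alpha);
        cumulative[rank] = total;
    }
    std::vector<std::size_t> ends;
    ends.reserve(numSegments);
    std::size_t rank = 0;
    for (std::size_t s = 1; s < numSegments; ++s) {
        // First rank whose cumulative weight reaches s / numSegments of the total.
        const double target =
            total * static_cast<double>(s) / static_cast<double>(numSegments);
        while (rank < size && cumulative[rank] < target) {
            ++rank;
        }
        ends.push_back(rank < size ? rank + 1 : size);
    }
    ends.push_back(size);  // the last segment always ends at the final rank
    return ends;
}
```

With alpha = 0 the segments have equal length; with a larger alpha the leading segments become shorter, so high-ranked transitions are drawn more often.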

◆ statesCurrent_

std::deque<torch::Tensor> C_Memory::statesCurrent_
private

Deque of torch tensors for current states.

◆ statesNext_

std::deque<torch::Tensor> C_Memory::statesNext_
private

Deque of torch tensors for next states.

◆ stepCounter_

int64_t C_Memory::stepCounter_ = 0
private

The counter variable that tracks the loaded indices in sync with total timesteps. Once memory reaches the buffer size, this will not update.

◆ sumTreeSharedPtr_

std::shared_ptr<SumTree> C_Memory::sumTreeSharedPtr_
private

Shared Pointer to SumTree class object.

◆ terminalStateIndices_

std::deque<int64_t> C_Memory::terminalStateIndices_
private

Deque of integers indicating the indices of terminal states.

◆ weights_

std::deque<torch::Tensor> C_Memory::weights_
private

Deque of torch tensors for weights.