Actor-Critic methods are provided in RLPack via the `rlpack.actor_critic` package. In-built models can be used with Actor-Critic agents to train an agent on the fly. Currently, the following variants have been implemented:
Method | Description | Keyword |
---|---|---|
A2C | Synchronous Actor-Critic Method | "a2c" |
A3C | Asynchronous Actor-Critic Method | "a3c" |
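For orientation, a hypothetical configuration sketch follows: the only details taken from the table above are the keywords themselves; the surrounding keys are illustrative placeholders and not necessarily RLPack's exact config schema.

```python
# Hypothetical config sketch: "a2c"/"a3c" are the documented keywords;
# the remaining keys are illustrative placeholders.
config = {
    "agent": "a2c",  # or "a3c" for the asynchronous variant
    # ... other agent, model, and environment arguments ...
}
```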
Actor-Critic methods implemented in RLPack support both continuous and discrete action spaces. To support both types of action spaces, RLPack provides an argument `distribution` for Actor-Critic methods to sample actions from. Currently, the following distributions are available, accessible by keyword when using simulators:
Distribution | Description | Keyword |
---|---|---|
Normal | The Normal distribution (for continuous action spaces). More info can be found [here](https://pytorch.org/docs/stable/distributions.html#normal). | "normal" |
LogNormal | The LogNormal distribution (for continuous action spaces). More info can be found [here](https://pytorch.org/docs/stable/distributions.html#lognormal). | "log_normal" |
MultivariateNormal | The Multivariate Normal distribution (for continuous action spaces). More info can be found [here](https://pytorch.org/docs/stable/distributions.html#multivariatenormal). | "multivariate_normal" |
Categorical | The Categorical distribution (for discrete action spaces). More info can be found [here](https://pytorch.org/docs/stable/distributions.html#categorical). | "categorical" |
Binomial | The Binomial distribution (for discrete action spaces). More info can be found [here](https://pytorch.org/docs/stable/distributions.html#binomial). | "binomial" |
Bernoulli | The Bernoulli distribution (for discrete action spaces). More info can be found [here](https://pytorch.org/docs/stable/distributions.html#bernoulli). | "bernoulli" |
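As an illustration of how these distributions behave for action sampling, here is a minimal PyTorch sketch, independent of RLPack's internals, contrasting a continuous and a discrete case:

```python
import torch
from torch.distributions import Categorical, Normal

# Continuous action space: the policy outputs loc (mean) and scale (std. dev.).
loc, scale = torch.tensor([0.0]), torch.tensor([1.0])
continuous_dist = Normal(loc=loc, scale=scale)
action = continuous_dist.sample()            # a real-valued action, shape (1,)
log_prob = continuous_dist.log_prob(action)  # used in the policy-gradient loss

# Discrete action space: the policy outputs one logit per action.
logits = torch.tensor([0.1, 0.5, 0.2, 0.2])
discrete_dist = Categorical(logits=logits)
action = discrete_dist.sample()              # an integer index in [0, 3]
log_prob = discrete_dist.log_prob(action)
```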
Since Actor-Critic implementations in RLPack support both continuous and discrete action spaces, all methods have an argument `action_space`. You must pass this argument in the following way:

- For discrete action spaces, pass the number of possible actions as an integer, e.g. `4`.
- For continuous action spaces, pass a list of two elements: the number of features the policy model must output, and the desired action shape.

The policy model's outputs are used as the statistics of the selected `distribution`, hence make sure to check the arguments for the distribution you are passing. Generally, the policy model in the continuous case must output the statistics for the probability distribution selected. For example, for a Normal distribution over an action of shape `(1,)`, the `action_space` would be `[2, [1]]`, where `2` represents the number of features in the output of the policy model (for `loc` (mean) and `scale` (standard deviation)), and `[1]` is the desired action shape.
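A minimal sketch of the continuous case above follows; the policy network here is a hypothetical placeholder, not RLPack's model. It outputs 2 features, which are split into `loc` and `scale` for a Normal distribution over an action of shape `[1]`, matching `action_space = [2, [1]]`:

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

# Hypothetical policy network: outputs 2 features, matching the "2" in
# action_space = [2, [1]] (one feature each for loc and scale).
policy = nn.Linear(in_features=8, out_features=2)

state = torch.randn(8)                        # a dummy observation
features = policy(state)                      # shape (2,): distribution statistics
loc, scale = features[0], features[1].exp()   # exp() keeps scale positive

dist = Normal(loc=loc, scale=scale)
action = dist.sample().reshape([1])           # action with the desired shape [1]
log_prob = dist.log_prob(action)              # used in the policy-gradient loss
```

The `exp()` on the scale feature is one common way to keep the standard deviation positive; a `softplus` would serve the same purpose.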