Deep Deterministic Policy Gradient (DDPG) is a model-free, off-policy algorithm for learning continuous actions. When the Bellman backup can be computed exactly only over a finite action set, one option is to reduce the continuous action space to a finite set, i.e., to work with a discretized MDP. However, in many tasks of interest, especially physical control tasks, the action space is continuous, and discretization involves a trade-off: if the continuous action space is discretized coarsely, the quality of the learned policy decreases significantly; if it is discretized finely, the discrete action space grows exponentially and becomes too large. Our algorithm combines the spirits of both DQN (dealing with discrete action spaces) and DDPG (dealing with continuous action spaces) by …
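The exponential blowup from fine discretization can be made concrete with a small sketch (function and example values are my own, not from the source): gridding each of d action dimensions into k bins yields k**d discrete actions.

```python
def discretized_action_count(bins_per_dim: int, action_dims: int) -> int:
    """Size of the discrete action set after gridding each dimension."""
    return bins_per_dim ** action_dims

# Even a modest 7-dimensional control task with 10 bins per dimension
# already produces ten million discrete actions:
print(discretized_action_count(10, 7))  # 10**7 = 10000000
```

This is why naive discretization is workable only for very low-dimensional action spaces.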
Recent work has shown that deep neural networks are capable of approximating both value functions and policies in reinforcement learning domains featuring continuous state and action spaces ("Deep Reinforcement Learning in Parameterized Action Space," Matthew Hausknecht et al., The University of Texas at Austin, 11/13/2015). Without care, though, discretizing such domains leads to action space explosion. DDPG combines ideas from DPG (Deterministic Policy Gradient) and DQN (Deep Q-Network). Recall that DQN stabilizes the learning of the Q-function with experience replay and a frozen target network; DDPG reuses both, taking experience replay and slow-moving target networks from DQN, while building on DPG, which can operate over continuous action spaces.
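The two stabilizers borrowed from DQN can be sketched as follows (a minimal toy version with my own names, not the authors' implementation): a replay buffer that breaks temporal correlations by sampling past transitions uniformly, and a Polyak-averaged target network that tracks the online weights slowly.

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s_next, done) transitions."""
    def __init__(self, capacity: int):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size: int):
        # Uniform sampling decorrelates consecutive transitions.
        return random.sample(self.buffer, batch_size)

def soft_update(target: np.ndarray, online: np.ndarray, tau: float = 0.005):
    """Polyak averaging: the target slowly tracks the online weights."""
    return (1.0 - tau) * target + tau * online

buf = ReplayBuffer(capacity=1000)
for t in range(10):
    buf.push((t, 0, 1.0, t + 1, False))
batch = buf.sample(4)
new_target = soft_update(np.zeros(3), np.ones(3), tau=0.1)
```

DDPG uses the soft (tau-averaged) variant of the target update rather than DQN's periodic hard copy, which is why the target networks are described as "slow-learning."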
The original MDP (S, A, T, R, H) can be discretized by gridding the state space: the vertices of the grid become the discrete states. The original DQN works in discrete action spaces; the Deep Deterministic Policy Gradient (DDPG) algorithm extends it to continuous spaces with an actor-critic framework that learns a deterministic policy.
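The actor-critic idea behind DDPG can be illustrated with a deliberately tiny linear model (my own toy construction, not the real algorithm, which uses neural networks): the actor mu(s) outputs a deterministic continuous action, the critic Q(s, a) scores it, and the actor ascends the critic's gradient through the action.

```python
import numpy as np

rng = np.random.default_rng(0)
w_actor = rng.normal(size=3)    # mu(s) = w_actor . s  (deterministic policy)
w_critic = rng.normal(size=4)   # Q(s, a) = w_critic . [s, a]

def mu(s):
    return float(w_actor @ s)

def q(s, a):
    return float(w_critic @ np.append(s, a))

def actor_gradient(s):
    # Chain rule of the deterministic policy gradient:
    # dQ/dtheta = dQ/da * da/dtheta; here dQ/da = w_critic[-1], da/dtheta = s.
    return w_critic[-1] * s

s = np.array([1.0, -0.5, 2.0])
q_before = q(s, mu(s))
w_actor = w_actor + 0.1 * actor_gradient(s)  # ascend Q w.r.t. actor params
q_after = q(s, mu(s))
```

For this linear toy, one gradient step provably cannot decrease Q(s, mu(s)); in the full algorithm the same chain rule is applied through two neural networks.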
Although DQN achieved huge success on high-dimensional problems, such as Atari games, its action space is still discrete.
In DDPG, we want the action currently believed to be best … In this paper, we propose a parametrized deep Q-network (P-DQN) framework for the hybrid action space without approximation or relaxation. There are many other aspects to consider when choosing an RL algorithm, such as how difficult the problem is, how many dimensions the action space has, and whether the state includes images.
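The hybrid-action selection at the heart of the P-DQN idea can be sketched as follows (the toy networks and names are my own assumptions, not the paper's architecture): each discrete action k carries a continuous parameter x_k proposed by a parameter network, a Q-network scores each pair (s, k, x_k), and the agent picks the discrete action with the highest score, so no discretization or relaxation of the continuous part is needed.

```python
import numpy as np

rng = np.random.default_rng(1)
NUM_DISCRETE = 3  # number of discrete action types

def param_net(s):
    """Propose one continuous parameter per discrete action (toy stand-in)."""
    return np.tanh(rng.normal(size=NUM_DISCRETE))  # each x_k in (-1, 1)

def q_net(s, k, x_k):
    """Toy score for the hybrid action (k, x_k); a placeholder, not learned."""
    return float(k - x_k ** 2)

s = np.zeros(4)                                     # dummy state
params = param_net(s)                               # x_0, ..., x_{K-1}
scores = [q_net(s, k, params[k]) for k in range(NUM_DISCRETE)]
best_k = int(np.argmax(scores))
action = (best_k, params[best_k])   # hybrid action: discrete id + parameter
```

The DQN-style argmax handles the discrete part, while the DDPG-style parameter network handles the continuous part, which is the sense in which the framework combines the two.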