Fantastic baselines: why add them in Actor-Critic

This problem has long been haunting me. Recently I have been trying to understand it and dig deeper with the help of JJ. I'm still working through the different ways of understanding it, so please feel free to tell me if anything here is wrong or misleading.


References:

  1. Going Deeper Into Reinforcement Learning: Fundamentals of Policy Gradients
  2. RL — Policy Gradient Explained
  3. Policy Gradient Algorithms