Policy Gradient Methods in Reinforcement Learning: The REINFORCE Algorithm
2017-03-26 15:57:56
I have recently been reading about policy gradient algorithms. The derivation of the gradient relies on the likelihood ratio trick. Here is one explanation I found online:
link: http://www.tuananhle.co.uk/notes/reinforce.html
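For quick reference, the trick itself fits in a few lines. The notation below (a density $p_\theta(x)$ and a function $f(x)$ that does not depend on $\theta$) is my own assumption for illustration and is not copied from the linked note; it also assumes the gradient and the integral can be exchanged.

```latex
\begin{aligned}
\nabla_\theta \, \mathbb{E}_{x \sim p_\theta}\big[f(x)\big]
  &= \nabla_\theta \int f(x)\, p_\theta(x)\, dx \\
  &= \int f(x)\, \nabla_\theta p_\theta(x)\, dx \\
  &= \int f(x)\, p_\theta(x)\, \frac{\nabla_\theta p_\theta(x)}{p_\theta(x)}\, dx \\
  &= \int f(x)\, p_\theta(x)\, \nabla_\theta \log p_\theta(x)\, dx \\
  &= \mathbb{E}_{x \sim p_\theta}\big[f(x)\, \nabla_\theta \log p_\theta(x)\big]
\end{aligned}
```

The key step is the identity $\nabla_\theta p_\theta(x) = p_\theta(x)\, \nabla_\theta \log p_\theta(x)$, which turns the gradient of an expectation back into an expectation that can be estimated from samples.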
Now let us go back and look at REINFORCE again.
This Scholarpedia article gives a short introduction to the algorithm: http://www.scholarpedia.org/article/Policy_gradient_methods
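To make the idea concrete, here is a minimal sketch of a REINFORCE-style update on a toy multi-armed bandit. It is not taken from either linked page. The softmax policy, the function names (`softmax`, `reinforce_bandit`), the Gaussian rewards, the running-average baseline, and all hyperparameters are assumptions chosen for illustration.

```python
import math
import random


def softmax(prefs):
    """Turn a list of preferences into action probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_means, steps=2000, lr=0.1, seed=0):
    """Run a REINFORCE-style update on a bandit (hypothetical toy setup).

    Each arm k pays a reward drawn from Normal(arm_means[k], 1).
    Returns the final action probabilities of the softmax policy.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(arm_means)  # policy parameters, one per arm
    baseline = 0.0                  # running average of past rewards

    for t in range(1, steps + 1):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        reward = rng.gauss(arm_means[action], 1.0)

        # For a softmax policy, d/d theta_k log pi(action) = 1[k == action] - probs[k].
        # The update follows (reward - baseline) * grad log pi(action).
        for k in range(len(theta)):
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * (reward - baseline) * grad_log

        # Update the baseline after the gradient step (assumed variance-reduction choice).
        baseline += (reward - baseline) / t

    return softmax(theta)


if __name__ == "__main__":
    print(reinforce_bandit([0.0, 1.0]))
```

The update inside the loop is the likelihood ratio estimator with a single sample: the reward (minus a baseline) multiplies the gradient of the log-probability of the action that was actually taken. The baseline only reduces variance; setting it to zero gives the plain estimator.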