Published: 2018
Key points: The paper argues that RL is prone to overfitting and proposes a way to check whether it is happening. The conclusion is that overfitting goes away once training is diverse enough (as soon as there is enough training data diversity). Concretely, the authors define a single metric: essentially the training reward minus the test reward, i.e. a generalization gap. The main experiment varies the number of random seeds used to generate the training environments, and the result is that the more seeds, the less overfitting. There are a few other experiments as well, such as adding noise to the rewards, which I won't go into; a sketch of the gap metric is given below.
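For concreteness, here is a minimal sketch of how such a generalization gap could be computed. This is my own illustration rather than the authors' code; the Gymnasium-style `reset()`/`step()` interface and the helper names are assumptions.

```python
import numpy as np

def average_return(policy, env, n_episodes=10):
    # Average undiscounted return of `policy` over n_episodes rollouts in `env`.
    returns = []
    for _ in range(n_episodes):
        obs, _ = env.reset()
        done, total = False, 0.0
        while not done:
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
            total += reward
            done = terminated or truncated
        returns.append(total)
    return float(np.mean(returns))

def generalization_gap(policy, train_envs, test_envs):
    # Gap = mean training return - mean test return.
    # train_envs / test_envs are environment instances built from disjoint random seeds;
    # a large positive gap suggests the policy has overfit to the training seeds.
    train_ret = np.mean([average_return(policy, e) for e in train_envs])
    test_ret = np.mean([average_return(policy, e) for e in test_envs])
    return float(train_ret - test_ret)
```

In this reading, "diversity" is just the number of seeds used to build `train_envs`; the paper's main experiment sweeps that number and observes the gap shrink.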
Summary: My guess is this paper started out as a course project, otherwise it would be a bit much. And yet it already has about 80 citations, which is... something.
Question: Honestly, can this metric really tell you whether there is overfitting? What if training simply never went well, or the environment is so hard that nothing trains up in the first place, so both train and test performance are poor and the gap comes out as 0? Does that count as having no generalization error?
Related articles
- Improving Generalization in Reinforcement Learning with Mixture Regularization
- Policy Iterations for Reinforcement Learning Problems in Continuous Time and Space—Fundamental Theor
- Reinforcement Learning in Continuous Time and Space
- Deep Reinforcement Learning with Population-Coded Spiking Neural Network for Continuous Control
- DDPG: CONTINUOUS CONTROL WITH DEEP REINFORCEMENT LEARNING
- Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning
- A Dissection of Overfitting and Generalization in Continuous Reinforcement Learning
- Reinforcement Learning in Continuous State and Action Spaces: A Brief Note