Published: 2020 (ICML 2020)
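Key points: In a typical RL problem, the dynamics are more complex than the value function and the policy, so learning a model is usually no better than learning the value and policy directly. The paper gives counterexamples in which the dynamics are simpler than the value function and the policy. In that setting, learning a model and then making decisions by planning has an advantage over the model-free approach. Along the way, the authors propose a simple planning algorithm, a simple multi-step model-based bootstrapping planner (BOOTS): roll out multiple trajectories forward with the model, then look back and pick the best action.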
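To make the "roll out, then pick the best action" idea concrete, here is a minimal sketch of a multi-step bootstrapping planner. It is one reading of the description above, not the authors' implementation: it assumes a deterministic learned model, a small discrete action set, and exhaustive enumeration of action sequences. All names (`boots_plan`, `toy_model`, `toy_reward`, `zero_q`) and the chain environment are hypothetical and used only for illustration.

```python
from itertools import product


def boots_plan(state, actions, model, reward, q_value, horizon=3, gamma=0.99):
    """Return the first action of the best `horizon`-step rollout.

    Assumptions (not from the paper): `model(s, a)` is a deterministic learned
    dynamics function, `reward(s, a)` is known, and `actions` is small enough
    to enumerate every action sequence.
    """
    best_action, best_return = None, float("-inf")
    for seq in product(actions, repeat=horizon):
        s, total, discount = state, 0.0, 1.0
        for a in seq:
            total += discount * reward(s, a)  # reward collected along the rollout
            s = model(s, a)                   # step the learned model forward
            discount *= gamma
        # Bootstrap: estimate the rest of the trajectory with the Q-function.
        total += discount * max(q_value(s, a) for a in actions)
        if total > best_return:
            best_action, best_return = seq[0], total
    return best_action


# Toy chain (hypothetical): states are integers, reward 1 for stepping onto state 3.
def toy_model(s, a):
    return s + a


def toy_reward(s, a):
    return 1.0 if s + a == 3 else 0.0


def zero_q(s, a):
    # Stand-in for a Q-function that carries no information at all.
    return 0.0


print(boots_plan(0, (-1, 1), toy_model, toy_reward, zero_q, horizon=3))
```

In this toy chain the dynamics are trivial (`s + a`), so even with an uninformative Q-function a three-step lookahead finds the rewarding direction, while a one-step lookahead sees no difference between the two actions. This mirrors the note's point that planning helps when the dynamics are simpler than the value function.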
Summary: The main contribution is the example itself, which gives a more direct feel for when a model plus planning helps and when model-free is better.
Open questions: I did not read the proofs.