How to Combine Tree-Search Methods in Reinforcement Learning

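Disclaimer: for the original paper, see the title. In case of any infringement, please contact the author and this post will be withdrawn.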

 

AAAI 2019 Best Paper

 

Abstract

 

1 Introduction

 

2 Preliminaries

 

3 The h-Greedy Policy and h-PI

 

4 h-Greedy Consistency

 

5 The h-Greedy Policy Alone is Not Sufficient For Partial Evaluation

 

6 Backup the Tree-Search Byproducts

 

7 Relation to Existing Work

 

8 Experiments

 

9 Summary and Future Work

 

A Proof of Lemma 1

 

B Affinity of T^π and Consequences

 

C Proof of Proposition 2

 

D Proof of Theorem 3

 

E h-Greedy Consistency in Each Iteration

 

F A Note on the Alternative λ-Return Operator

 

G More Experimental Results

 
