Sutton's publications page:
http://incompleteideas.net/publications.html
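I've recently been working on a reinforcement learning project, and it has become clear why Sutton is called the father of reinforcement learning: the TD algorithm and policy gradient methods are both credited to him. Although the modern framework of reinforcement learning was settled starting with Q-learning, Sutton is one of the earliest people to work in the field and is probably the person who has contributed the most to its classic ideas. His bachelor's degree was in psychology (from Stanford), and he moved to computer science only in graduate school, which makes his record all the more impressive. His PhD thesis, one of the most classic works in reinforcement learning, is well worth reading. I searched for it for a long time. Probably because it was published so long ago, none of the databases I checked carry it; the only place that holds it appears to be the university that granted his PhD, the University of Massachusetts, and access there is restricted to its own students. After several days of fruitless searching, it occurred to me today to look on the author's personal homepage, and sure enough, there it was. I'm marking it here for future reference.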
----------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------
Appendix: (contents of the Publications section of Sutton's homepage)
Rich Sutton's Publications
First, a quick guide to the highlights, roughly in order of the work's popularity or potential current interest:
- The 2nd edition of Reinforcement Learning: An Introduction
- Emphatic TD(λ); Yu's convergence proof
- Weighted importance sampling version of LSTD(λ), linear-complexity algorithms
- True online TD(λ)
- The predictive approach to knowledge representation; PEAK; Horde; nexting
- Fast gradient-based TD algorithms, nonlinear case, GQ(lambda), control, Maei's thesis
- RL book
- Temporal-difference learning; TD(lambda) details
- The TD model of Pavlovian conditioning; earlier Sutton-Barto model; more biological 1982 & 1986; and instrumental learning
- Dyna; as an integrated architecture; with FA 1996, 2008
- The options paper; UAV example; precursor not superseded;
- Policy gradient methods; Incremental Natural Actor-Critic Algorithms
- PhD thesis, introduced actor-critic architectures and "temporal credit assignment"
- PSRs; the predictive representations hypothesis; TD networks; with options
- RL for RoboCup soccer keepaway
- RL with continuous state and action spaces
- Step-size adaptation by meta-gradient descent; IDBD; improved; earliest pub; in classical conditioning; in human category learning, in tracking
- Random representations; representation search; feature discovery; more
- Pole-balancing; tracking nonstationarity
- Exponentiated-gradient RL; fuller TR
- A study in alpha and lambda
- Two problems with backprop
Also, some RL pubs that aren't mine, available for researchers:
- Chris Watkins's thesis
- Boyan's LSTD(lambda), 1999
- Barto and Bradtke LSTD, 1996
- Williams, 1992
- Lin, 1992
- Ross, 1983, chapter 2
- Minsky, 1960, Steps to AI
- Good, 1965, Speculations concerning the first ultraintelligent machine
- Selfridge, 1958, Pandemonium
- Samuel, 1959
- Dayan, 1992
- Tesauro, 1992, TD-Gammon
- Watkins and Dayan, 1992
- Hamid Maei's PhD thesis, 2011
- Masoud Shahamiri's MSc thesis, 2008
- Janey Yu's proof of convergence of Emphatic TD(λ)
- Adam White's PhD thesis
- David Silver's PhD thesis
- Brian Tanner's MSc thesis
- Kavosh Asadi's MSc thesis
- Travis Dick's MSc thesis
- Eddie Rafols' MSc thesis
- Anna Koop's MSc thesis
- Leah Hackman's MSc thesis
- Mike Delp's MSc thesis
- MahdiehSadat Mirian HosseinAbadi's MSc thesis
- Gurvitz, Lin, and Hanson, 1995
- Rupam Mahmood's PhD thesis, 2017
- An, Miller, and Parks (1991)
- Intro to Andreae (2017) and Andreae (2017)
For any broken links, please send email to rich@richsutton.com.