Sutton's publications page:
http://incompleteideas.net/publications.html
PhD thesis: Temporal Credit Assignment in Reinforcement Learning
http://incompleteideas.net/publications.html#PhDthesis
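I have recently been working on a reinforcement learning project, and along the way I came to appreciate just how impressive Sutton, often called the father of reinforcement learning, really is. He introduced TD learning and was central to the development of policy gradient methods. Although the modern framework of reinforcement learning arguably took shape starting with Q-learning, Sutton was one of the earliest people working in the field and is probably the person who contributed the most to its classic ideas. His undergraduate degree (at Stanford) was in psychology, and he only moved to computer science for his graduate studies, which makes his record all the more impressive.

His PhD thesis is one of the most classic works in reinforcement learning and well worth reading. I searched for it for a long time. Probably because it was published so long ago, none of the databases I tried had indexed it. The only place that seemed to hold a copy was the university that granted his PhD, the University of Massachusetts (Amherst), but the thesis is open only to that university's own students, so after several days I still had not found it. Today it suddenly occurred to me: why not look on the author's personal homepage? Sure enough, it was there, so I am bookmarking it here.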
----------------------------------------------------------------------------------------------------------------
Appendix: (contents of the Publications section of Sutton's homepage)
Rich Sutton's Publications
First, a quick guide to the highlights, roughly in order of the work's popularity or potential current interest:
- The 2nd edition of Reinforcement Learning: An Introduction
- Emphatic TD(λ); Yu's convergence proof
- Weighted importance sampling version of LSTD(λ), linear-complexity algorithms
- True online TD(λ)
- The predictive approach to knowledge representation; PEAK; Horde; nexting
- Fast gradient-based TD algorithms, nonlinear case, GQ(lambda), control, Maei's thesis
- RL book
- Temporal-difference learning; TD(lambda) details
- The TD model of Pavlovian conditioning; earlier Sutton-Barto model; more biological 1982 & 1986; and instrumental learning
- Dyna; as an integrated architecture; with FA 1996, 2008
- The options paper; UAV example; precursor not superseded
- Policy gradient methods; Incremental Natural Actor-Critic Algorithms
- PhD thesis, introduced actor-critic architectures and "temporal credit assignment"
- PSRs; the predictive representations hypothesis; TD networks; with options
- RL for RoboCup soccer keepaway
- RL with continuous state and action spaces
- Step-size adaptation by meta-gradient descent; IDBD; improved; earliest pub; in classical conditioning; in human category learning, in tracking
- Random representations; representation search; feature discovery; more
- Pole-balancing; tracking nonstationarity
- Exponentiated-gradient RL; fuller TR
- A study in alpha and lambda
- Two problems with backprop
Also, some RL pubs that aren't mine, available for researchers:
- Chris Watkins's thesis
- Boyan's LSTD(lambda), 1999
- Barto and Bradtke LSTD, 1996
- Williams, 1992
- Lin, 1992
- Ross, 1983, chapter 2
- Minsky, 1960, Steps to AI
- Good, 1965, Speculations concerning the first ultraintelligent machine
- Selfridge, 1958, Pandemonium
- Samuel, 1959
- Dayan, 1992
- Tesauro, 1992, TD-Gammon
- Watkins and Dayan, 1992
- Hamid Maei's PhD thesis, 2011
- Masoud Shahamiri's MSc thesis, 2008
- Janey Yu's proof of convergence of Emphatic TD(λ)
- Adam White's PhD thesis
- David Silver's PhD thesis
- Brian Tanner's MSc thesis
- Kavosh Asadi's MSc thesis
- Travis Dick's MSc thesis
- Eddie Rafols MSc thesis
- Anna Koop's MSc thesis
- Leah Hackman's MSc thesis
- Mike Delp's MSc thesis
- MahdiehSadat Mirian HosseinAbadi's MSc thesis
- Gurvitz, Lin, and Hanson, 1995
- Rupam Mahmood's PhD thesis, 2017
- An, Miller, and Parks (1991)
- Intro to Andreae (2017) and Andreae (2017)
For any broken links, please send email to rich@richsutton.com.