Open-Source Code for (Meta-)Reinforcement Learning

Local code: https://github.com/lucifer2859/meta-RL

Introduction to meta-reinforcement learning: https://www.cnblogs.com/lucifer1997/p/13603979.html

I. Meta-RL

1. Learning to Reinforcement Learn: CogSci 2017

https://github.com/awjuliani/Meta-RL
Environment: TensorFlow, CPU;
Tasks: Dependent (Easy, Medium, Hard, Uniform)/Independent/Restless Bandit, Contextual Bandit, GridWorld;
A3C-Meta-Bandit - Set of bandit tasks described in paper. Including: Independent, Dependent, and Restless bandits.
A3C-Meta-Context - Rainbow bandit task using randomized colors to indicate reward-giving arm in each episode.
A3C-Meta-Grid - Rainbow Gridworld task; a variation of gridworld in which goal colors are randomized each episode and must be learned "on the fly."
Model: one-layer LSTM A3C [Figure 1(a), without the encoder layer];
Experiments: runs successfully with no bugs; training converges; results roughly match the paper; performance does not reach the paper's reported numbers (with the current hyperparameters); the local code modifies it slightly, see https://github.com/lucifer2859/meta-RL/tree/master/Meta-RL;
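For reference, the recurrent core behind Figure 1(a) is small: the LSTM receives the one-hot previous action, the previous reward, and the timestep, and its hidden state carries the within-episode adaptation. The snippet below is a minimal PyTorch sketch of that idea; the class name, hidden size, and exact input layout are my assumptions, not the repo's code.

```python
import torch
import torch.nn as nn

class MetaRLCore(nn.Module):
    """Sketch of the Figure 1(a) recurrent core (no encoder layer):
    input = [one-hot previous action, previous reward, timestep]."""

    def __init__(self, n_actions, hidden_size=48):
        super().__init__()
        self.lstm = nn.LSTMCell(n_actions + 2, hidden_size)
        self.policy_head = nn.Linear(hidden_size, n_actions)  # actor logits
        self.value_head = nn.Linear(hidden_size, 1)            # critic value

    def forward(self, prev_action_onehot, prev_reward, timestep, hidden):
        # prev_reward and timestep are (batch, 1) tensors
        x = torch.cat([prev_action_onehot, prev_reward, timestep], dim=-1)
        h, c = self.lstm(x, hidden)
        return self.policy_head(h), self.value_head(h), (h, c)
```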

https://github.com/achao2013/Learning-To-Reinforcement-Learn
Environment: MXNet, CPU;
Tasks: Dependent (Easy, Medium, Hard, Uniform)/Independent/Restless Bandit;
Model: multi-layer LSTM A3C [without the encoder layer];
Experiments: not run;
https://github.com/lucifer2859/meta-RL/tree/master/L2RL-pytorch
Environment: PyTorch, CPU;
Tasks: Dependent (Easy, Medium, Hard, Uniform)/Independent/Restless Bandit;
Model: one-layer LSTM A3C [Figure 1(a), with GAE, without the encoder layer];
Experiments: runs successfully with no bugs; training converges; results roughly match the paper; performance does not reach the paper's reported numbers (with the current hyperparameters);

2. RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning (RL²): ICLR 2017

https://github.com/mwufi/meta-rl-bandits
Environment: PyTorch, CPU;
Task: Independent Bandit;
Model: two-layer LSTM REINFORCE;
Experiments: runs successfully with no bugs; the model does not match the paper, whose RNN is a GRU; training does not converge (with the current hyperparameters);
https://github.com/VashishtMadhavan/rl2
Environment: TensorFlow, CPU;
Task: Dependent Bandit;
Model: one-layer LSTM A3C [without the encoder layer];
Experiments: fails to run with gym.error.UnregisteredEnv: No registered env with id: MediumBandit-v0;
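That UnregisteredEnv error typically means the repo's custom bandit environment is never registered with gym before gym.make('MediumBandit-v0') is called. A workaround along the following lines usually helps; the entry_point module path and episode length here are hypothetical placeholders, not the repo's actual values.

```python
from gym.envs.registration import register

# Register the custom env before calling gym.make('MediumBandit-v0').
# The module:class path and episode length below are assumptions;
# point them at wherever the repo actually defines its bandit env.
register(
    id='MediumBandit-v0',
    entry_point='bandit_envs:MediumBandit',
    max_episode_steps=100,
)
```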

3. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (MAML): ICML 2017

https://github.com/tristandeleu/pytorch-maml-rl
Environment: PyTorch, GPU;
Tasks: Multi-armed Bandit, Tabular MDP, Continuous Control with MuJoCo, 2D Navigation Task;
Model: MAML TRPO (a simplified meta-update is sketched after this subsection);
Experiments: initially fails to run with terminate called after throwing an instance of 'c10::Error'; following https://github.com/tristandeleu/pytorch-maml-rl/issues/40#issuecomment-632598191 resolves it; a new problem then appears (AttributeError: Can't pickle local object 'make_env.<locals>._make_env'), which is resolved by following https://github.com/tristandeleu/pytorch-maml-rl/issues/51; train.py then runs successfully, but test.py fails; bandit-k5-n10 does not converge (with the current hyperparameters);
https://github.com/cbfinn/maml_rl
Environment: TensorFlow (rllab version), CPU;
Task: MuJoCo;
Model: MAML TRPO;
Experiments: not run;
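Both repos implement MAML's two-level loop: adapt a copy of the policy with a few inner gradient steps on each task, then meta-update the initial parameters through that adaptation. The sketch below swaps the TRPO outer step for plain gradient descent and assumes hypothetical helpers (policy_loss, task.sample_trajectories), so it illustrates the structure rather than either repo's implementation.

```python
import torch

def maml_meta_step(policy, tasks, meta_opt, inner_lr=0.1):
    """One MAML meta-update (illustrative). `policy_loss(policy, batch, params=None)`
    and `task.sample_trajectories(policy, params=None)` are assumed helpers: the
    former returns a differentiable policy-gradient loss, the latter collects
    rollouts, optionally using an adapted parameter list."""
    meta_loss = 0.0
    for task in tasks:
        # Inner loop: one gradient step on pre-adaptation (support) data,
        # keeping the graph so the outer update can differentiate through it.
        support = task.sample_trajectories(policy)
        loss = policy_loss(policy, support)
        grads = torch.autograd.grad(loss, policy.parameters(), create_graph=True)
        adapted = [p - inner_lr * g for p, g in zip(policy.parameters(), grads)]

        # Outer objective: performance of the adapted parameters on fresh (query) data.
        query = task.sample_trajectories(policy, params=adapted)
        meta_loss = meta_loss + policy_loss(policy, query, params=adapted)

    meta_opt.zero_grad()
    meta_loss.backward()  # backprop through the inner-loop adaptation
    meta_opt.step()
    return float(meta_loss)
```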

4. Evolved Policy Gradients (EPG): NeurIPS 2018

https://github.com/openai/EPG
Environment: Chainer, CPU;
Task: MuJoCo;
Model: EPG PPO;
Experiments: not run;

5. A Simple Neural Attentive Meta-Learner (SNAIL): ICLR 2018

https://github.com/chanb/metalearning_RL
Environment: PyTorch, GPU;
Tasks: Multi-armed Bandit, Tabular MDP;
Models: SNAIL, RL² (GRU) + PPO;
Experiments: runs successfully with no bugs;

6. Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables (PEARL): ICML 2019

https://github.com/katerakelly/oyster
Environment: PyTorch, GPU;
Task: MuJoCo;
Model: PEARL (SAC-based);
Experiments: docker build . -t pearl fails during the Docker setup; after abandoning Docker and installing the required packages locally, it runs successfully; when installing locally, first run conda config --set restore_free_channel true, otherwise most of the pinned package versions cannot be found and environment creation fails; related questions can be directed to the blog "Chains朱朱的主页 - 博客园 (cnblogs.com)";

7. Improving Generalization in Meta Reinforcement Learning using Learned Objectives (MetaGenRL): ICLR 2020

http://louiskirsch.com/code/metagenrl
Environment: TensorFlow, GPU;
Task: MuJoCo;
Model: MetaGenRL;
Experiments: running python ray_experiments.py train hits bugs under both tensorflow-gpu 1.14.0 and tensorflow 1.13.2;

II. RL-Adventure

1. Deep Q-Learning:

See the earlier blog post and repository:
https://www.cnblogs.com/lucifer1997/p/13458563.html;
https://github.com/lucifer2859/DQN;
https://github.com/Kaixhin/Rainbow
Environment: PyTorch, GPU;
Task: Atari;
Model: Rainbow;
Experiments: runs successfully;
https://github.com/TianhongDai/hindsight-experience-replay
Environment: PyTorch, GPU (not recommended; CPU works better);
Task: MuJoCo;
Model: HER;
Experiments: not run;
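HER's core mechanism is goal relabeling: transitions from failed episodes are stored again with a goal the agent actually achieved later in the same episode, so sparse-reward tasks still produce learning signal. Below is a minimal sketch of the "future" relabeling strategy; the transition layout and reward_fn are assumptions, not this repo's data structures.

```python
import numpy as np

def her_relabel(episode, reward_fn, k=4):
    """Hindsight relabeling ('future' strategy): for every transition, store k
    extra copies whose goal is replaced by an achieved goal from a later step
    of the same episode. `episode` is assumed to be a list of dicts with keys
    'obs', 'action', 'achieved_goal', 'goal'."""
    relabeled = []
    T = len(episode)
    for t, tr in enumerate(episode):
        relabeled.append(tr)  # keep the original transition
        for _ in range(k):
            future = episode[np.random.randint(t, T)]   # pick a later achieved goal
            new_goal = future['achieved_goal']
            relabeled.append(dict(tr, goal=new_goal,
                                  reward=reward_fn(tr['achieved_goal'], new_goal)))
    return relabeled
```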

2. Policy Gradients:

https://github.com/higgsfield/RL-Adventure-2
Environment: PyTorch, GPU;
Task: Gym;
Models: A2C, GAE, PPO, ACER, DDPG, TD3, SAC, GAIL, HER;
Experiments: runs successfully; the local code modifies it to fix bugs, address issues, and improve performance, see https://github.com/lucifer2859/Policy-Gradients; in the local code every model (except HER) converges and reaches reasonable performance; for the HER problem see https://github.com/higgsfield/RL-Adventure-2/issues/14; the SAC implementation seems to deviate from the original paper (see https://github.com/higgsfield/RL-Adventure-2/issues/11); the A2C experiments only converge on CartPole-v0;
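Since GAE appears both here and in the L2RL-pytorch entry above, a compact reminder of the estimator may be useful: the advantage is an exponentially weighted sum of one-step TD errors, computed backwards over a rollout. The function below is a generic sketch, not code taken from either repo.

```python
import torch

def gae_advantages(rewards, values, next_value, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout. `rewards`, `values`,
    and `dones` are 1-D tensors of length T; `next_value` is the bootstrap
    value of the state after the last step. Returns (advantages, returns)."""
    advantages = torch.zeros_like(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        mask = 1.0 - dones[t]                                   # cut bootstrap at episode ends
        delta = rewards[t] + gamma * next_value * mask - values[t]
        gae = delta + gamma * lam * mask * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages, advantages + values
```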

https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail;
Environment: PyTorch/TensorFlow, GPU;
Tasks: Atari, MuJoCo, PyBullet (including Racecar, Minitaur and Kuka), DeepMind Control Suite;
Models: A2C, PPO, ACKTR, GAIL;
Experiments: not run;

https://github.com/ikostrikov/pytorch-a3c
Environment: PyTorch, CPU;
Task: Atari;
Model: A3C;
Experiments: initially fails with NotImplementedError; modifying envs.py as described in https://github.com/ikostrikov/pytorch-a3c/issues/66#issuecomment-559785590 resolves it; afterwards it runs successfully;

https://github.com/haarnoja/sac
Environment: TensorFlow, GPU;
Task: Continuous Control Tasks (MuJoCo);
Model: Soft Actor-Critic (SAC, first version, with a separate state-value function V);
Experiments: not run;

https://github.com/denisyarats/pytorch_sac
Environment: PyTorch, GPU;
Task: Continuous Control Tasks (MuJoCo);
Model: Soft Actor-Critic (SAC, first version, with a separate state-value function V);
Experiments: not run;

http://github.com/rail-berkeley/softlearning/
Environment: TensorFlow, GPU;
Task: Continuous Control Tasks (MuJoCo);
Model: Soft Actor-Critic (SAC, second version, which drops the state-value function V);
Experiments: not run;
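The "with V"/"without V" distinction between the three SAC entries above comes down to how the critic targets are formed: the first version maintains a separate state-value network, while the second folds the entropy term directly into the Q target via target Q networks. The sketch below shows just that difference; the batch layout and network interfaces are assumptions, not any repo's API.

```python
import torch

def sac_v1_q_target(v_target_net, batch, gamma):
    """SAC v1: the Q target bootstraps through a separate state-value network V
    (V itself is regressed toward min(Q1, Q2) - alpha * log pi elsewhere)."""
    with torch.no_grad():
        return batch.reward + gamma * (1 - batch.done) * v_target_net(batch.next_obs)

def sac_v2_q_target(policy, q1_target, q2_target, batch, alpha, gamma):
    """SAC v2: no V network; the entropy bonus enters the Q target directly
    through the target Q networks and a fresh policy sample."""
    with torch.no_grad():
        next_action, log_prob = policy.sample(batch.next_obs)
        next_q = torch.min(q1_target(batch.next_obs, next_action),
                           q2_target(batch.next_obs, next_action))
        return batch.reward + gamma * (1 - batch.done) * (next_q - alpha * log_prob)
```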

https://github.com/ku2482/sac-discrete.pytorch
Environment: PyTorch, GPU;
Task: Atari;
Model: SAC-Discrete (a discrete-action variant adapted from the newer continuous-control SAC);
Experiments: runs successfully; the local code modifies it slightly, see https://github.com/lucifer2859/sac-discrete-pytorch; training converges, but performance differs from what the paper reports;
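What makes the discrete variant different is that the expectation over actions in the policy and entropy terms can be computed exactly from the action probabilities, with no reparameterized sampling. The loss below is a generic sketch of that idea rather than this repo's code; the tensor shapes are my assumption.

```python
import torch
import torch.nn.functional as F

def sac_discrete_policy_loss(q1, q2, policy_logits, alpha):
    """Discrete-action SAC actor loss. `q1`/`q2` are per-action Q-value tensors
    of shape (batch, n_actions); `policy_logits` has the same shape."""
    probs = F.softmax(policy_logits, dim=-1)
    log_probs = F.log_softmax(policy_logits, dim=-1)
    min_q = torch.min(q1, q2)
    # E_{a ~ pi}[alpha * log pi(a|s) - Q(s, a)], computed exactly per state
    return (probs * (alpha * log_probs - min_q)).sum(dim=-1).mean()
```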

3. Both (Deep Q-Learning and Policy Gradients):

https://github.com/ShangtongZhang/DeepRL
Environment: PyTorch, GPU;
Tasks: Atari, MuJoCo;
Models: (Double/Dueling/Prioritized) DQN, C51, QR-DQN, (Continuous/Discrete) Synchronous Advantage A2C, N-Step DQN, DDPG, PPO, OC, TD3, COF-PAC, GradientDICE, Bi-Res-DDPG, DAC, Geoff-PAC, QUOTA, ACE;
Experiments: not run;

https://github.com/astooke/rlpyt
Environment: PyTorch, GPU;
Task: Atari;
Models: modular, optimized implementations of common deep RL algorithms in PyTorch, with unified infrastructure supporting all three major families of model-free algorithms: policy gradient, deep Q-learning, and Q-function policy gradient.
Policy Gradient: A2C, PPO.
Replay Buffers (supporting both DQN + QPG): non-sequence and sequence (for recurrent) replay, n-step returns, uniform or prioritized replay, full-observation or frame-based buffer (e.g. for Atari, stores only unique frames to save memory, reconstructs multi-frame observations).
Deep Q-Learning: DQN + variants: Double, Dueling, Categorical (up to Rainbow minus Noisy Nets), Recurrent (R2D2-style).
Q-Function Policy Gradient: DDPG, TD3, SAC.
Experiments: runs successfully with no bugs;

https://github.com/vitchyr/rlkit
Environment: PyTorch, GPU;
Tasks: gym[all];
Models: Skew-Fit, RIG, TDM, HER, DQN, SAC (newer version), TD3, AWAC;
Experiments: not run;

https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch
Environment: PyTorch;
Tasks: CartPole, MountainCar, Bit Flipping, Four Rooms, Long Corridor, Ant-[Maze, Push, Fall];
Models: DQN, DQN with Fixed Q Target, DDQN, DDQN with Prioritised Experience Replay, Dueling DDQN, REINFORCE, DDPG, TD3, SAC, SAC-Discrete, A3C, A2C, PPO, DQN-HER, DDPG-HER, h-DQN, Stochastic NN-HRL, DIAYN;
Experiments: some models run successfully on some tasks (e.g., SAC-Discrete fails to run on Atari);

https://github.com/hill-a/stable-baselines
Environment: TensorFlow;

https://github.com/openai/baselines
Environment: TensorFlow;

https://github.com/openai/spinningup
Environment: TensorFlow/PyTorch;
Description: This is an educational resource produced by OpenAI that makes it easier to learn about deep reinforcement learning (deep RL). For the unfamiliar: reinforcement learning (RL) is a machine learning approach for teaching agents how to solve tasks by trial and error. Deep RL refers to the combination of RL with deep learning. This module contains a variety of helpful resources, including:
a short introduction to RL terminology, kinds of algorithms, and basic theory,
an essay about how to grow into an RL research role,
a curated list of important papers organized by topic,
a well-documented code repo of short, standalone implementations of key algorithms,
and a few exercises to serve as warm-ups.

III. Meta Learning (Learn to Learn)

1. Platform:

https://github.com/learnables/learn2learn
