- Behavior imitation of individual board game players
Consider listing "behavior imitation" as a keyword.
- by dividing the imitation process into two stages
→ "techniques" fits better than "stages" and carries more force:
through three techniques: generic policy learning, individual player behavior imitation, and board state feature extraction.
- They both learn from gameplay records by adversarial imitation learning and are unified within the meta-learning framework.
→ Give each term its own proper place; do not join them with "and":
Under the meta-learning framework, our approach learns player behavior using gameplay records through adversarial imitation learning.
→ Better organized:
On the one hand, gameplay records are used to build the basic data for adversarial imitation learning.
On the other hand, the meta-learning framework is adopted to construct the overall algorithm.
→ More logical:
From the data viewpoint, gameplay records form the basis for adversarial imitation learning.
From the architecture viewpoint, meta-learning is adopted to construct an adaptive framework.
- The three techniques should correspond to the problems raised earlier.
- Fig 1 depicts → Figure 1 depicts
- Our method takes demonstrations as input (including the state and actions of players’ records) and learns a policy for each player.
→ Firmly avoid "and":
The inputs are demonstrations, including the state and actions of players’ records.
The output is a personalized strategy for each player.
- In order to learn more diverse behaviors, we use adversarial imitation learning rather than supervised learning to understand the intentions of players.
→ "understand the intentions of players" is the near-term, direct goal
→ "learn more diverse behaviors" is the long-term, indirect goal
→ Revise to:
We use adversarial imitation learning rather than supervised learning to learn more diverse behaviors.
In this way, the intentions of players may be better captured.
- Describe each module separately:
The CNN-based … extractor module …
The xxx module …
Unit test: how was each module built?
Contribution: integration test, what have we achieved?