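Paper Walkthrough: ChatDev: Communicative Agents for Software Development
Code entry point
The entry point is run.py. It was run with the following arguments (the task reads: "I want a Gomoku game with a nicer-looking interface and fuller functionality, for example a countdown timer"):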
--task
"我想要一个五子棋的游戏,我希望界面能够好看一些,功能更够全一些,比如可以实现倒计时"
--name
"WuZi"
Environment variable (OpenAI key):
OPENAI_API_KEY=xxx
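Project execution flow
Under the hood, the code still calls CAMEL's core methods.
- Init ChatChain (loads the paths of all configuration files, then assigns the values from the JSON configuration to the chat_env attribute of the ChatChain object, so that each phase can later fetch its full configuration from chat_env and run)
- Init Log (sets up the logging configuration)
- Pre Processing (removes unneeded files and logs produced by some global configuration settings, and sets the task prompt on the chat environment)
- Personnel Recruitment (loads all personnel, i.e. the virtual company's staff list)
- Chat Chain (the core of the whole project; runs the development flow)
- Post Processing (summarizes the finished product, removes log files from the software directory, and reports the development time and total cost)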
# Start ChatDev
# ----------------------------------------
# Init ChatChain
# ----------------------------------------
config_path, config_phase_path, config_role_path = get_config(args.config)
args2type = {'GPT_3_5_TURBO': ModelType.GPT_3_5_TURBO, 'GPT_4': ModelType.GPT_4, 'GPT_4_32K': ModelType.GPT_4_32k}
chat_chain = ChatChain(config_path=config_path,
                       config_phase_path=config_phase_path,
                       config_role_path=config_role_path,
                       task_prompt=args.task,
                       project_name=args.name,
                       org_name=args.org,
                       model_type=args2type[args.model],
                       code_path=args.path)
# ----------------------------------------
# Init Log
# ----------------------------------------
logging.basicConfig(filename=chat_chain.log_filepath, level=logging.INFO,
                    format='[%(asctime)s %(levelname)s] %(message)s',
                    datefmt='%Y-%d-%m %H:%M:%S', encoding="utf-8")
# ----------------------------------------
# Pre Processing
# ----------------------------------------
chat_chain.pre_processing()
# ----------------------------------------
# Personnel Recruitment
# ----------------------------------------
chat_chain.make_recruitment()
# ----------------------------------------
# Chat Chain
# ----------------------------------------
chat_chain.execute_chain()
# ----------------------------------------
# Post Processing
# ----------------------------------------
chat_chain.post_processing()
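Executing the development flow (using the DemandAnalysis phase as an example)
Meaning of the configuration
First, it helps to know what a few settings in ChatChainConfig mean:
- phaseType: the phase type, either SimplePhase or ComposedPhase. A simple phase is a single phase with no sub-phases; a composed phase is a larger phase that may be built from several sub-phases.
- max_turn_step: the maximum number of dialogue turns. If it is less than zero, the chat defaults to 10 turns; otherwise the specified value is used.
- need_reflect: whether "reflection" is needed. Reflection means the phase result is checked once the phase ends; if no conclusion was reached, an extra conversation between the Counselor and the CEO takes place, reviewing all message records of the current phase to push the CEO toward a final conclusion.
Overall, phases are either SimplePhase or ComposedPhase, and in both cases the phase name is used to look up and run the corresponding phase class.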
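As a rough illustration of how these settings might be interpreted, the sketch below parses one phase entry and resolves max_turn_step and need_reflect. The names PhaseConfig and parse_phase are hypothetical and are not ChatDev's actual classes; the fallback of 10 turns follows the description above, and treating 0 like a negative value is an assumption.

```python
from dataclasses import dataclass, field
from typing import List

DEFAULT_TURNS = 10  # fallback described above for a negative max_turn_step


@dataclass
class PhaseConfig:
    """Hypothetical container for one entry of the chain configuration."""
    name: str
    phase_type: str
    max_turns: int = DEFAULT_TURNS
    need_reflect: bool = False
    cycle_num: int = 1
    composition: List["PhaseConfig"] = field(default_factory=list)


def parse_phase(item: dict) -> PhaseConfig:
    """Turn a raw JSON-style dict into a PhaseConfig."""
    raw_turns = int(item.get("max_turn_step", -1))
    return PhaseConfig(
        name=item["phase"],
        phase_type=item["phaseType"],
        # Assumption: 0 is treated the same as a negative value.
        max_turns=raw_turns if raw_turns > 0 else DEFAULT_TURNS,
        # The config stores booleans as the strings "True" / "False".
        need_reflect=str(item.get("need_reflect", "False")) == "True",
        cycle_num=int(item.get("cycleNum", 1)),
        composition=[parse_phase(sub) for sub in item.get("Composition", [])],
    )


demand = parse_phase({"phase": "DemandAnalysis", "phaseType": "SimplePhase",
                      "max_turn_step": -1, "need_reflect": "True"})
print(demand.max_turns, demand.need_reflect)  # prints: 10 True
```

A composed entry is handled by the same function: its "Composition" list is parsed recursively into nested PhaseConfig objects.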
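Execution steps:
Update the phase environment from the global chat environment
The configuration needed by the current phase, such as its prompt and its inputs (parameters), is read from chat_env.
Run the chat (the chatting function)
- First, role play is initialized (this happens in the CAMEL source code)
This initialization mainly defines the assistant message, the user message, and the class that manages the agents' dialogue (the important part here is stored_messages, which stores the messages)
- Initialize the chat (update the user_agent's stored_messages)
- Then run the chat
Start the chat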
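A chat ends under either of two conditions:
- A phase conclusion is reached, meaning either side outputs the conclusion on a line starting with <INFO> (info=True; the prompt states this requirement explicitly), or the maximum number of turns, chat_turn_limit, is reached.
- The tokens used by the conversation reach ChatGPT's limit (terminated=True).
For example, in the DemandAnalysis phase the CEO asks the questions and the CPO answers (ChatGPT produces the response in the CPO role).
ChatGPT's response was <INFO> Application.
Because the response starts with <INFO>, it satisfies the first condition and the chat ends immediately.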
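The first termination condition can be sketched as a small loop. This is a simplified illustration, not ChatDev's or CAMEL's implementation: run_chat and the scripted speakers are hypothetical, and the token-limit condition (terminated=True) is not modelled.

```python
from typing import Callable, List, Tuple


def run_chat(ask: Callable[[str], str],
             answer: Callable[[str], str],
             opening: str,
             chat_turn_limit: int = 10) -> Tuple[str, List[str]]:
    """Alternate two speakers until one emits an <INFO> line or turns run out."""
    transcript = [opening]
    message = opening
    for _ in range(chat_turn_limit):
        for speaker in (answer, ask):
            message = speaker(message)
            transcript.append(message)
            # A reply whose last line starts with <INFO> carries the conclusion.
            last_line = message.strip().split("\n")[-1]
            if last_line.startswith("<INFO>"):
                return last_line.replace("<INFO>", "", 1).strip(), transcript
    return "", transcript  # no conclusion within the turn limit


# Scripted stand-ins for the two agents (no model calls are made).
cpo_replies = iter(["An application seems best. What do you think?",
                    "<INFO> Application"])
ceo_replies = iter(["I agree, let's build an application."])
conclusion, log = run_chat(lambda m: next(ceo_replies),
                           lambda m: next(cpo_replies),
                           "Which product modality should we choose?")
print(conclusion)  # prints: Application
```

An empty conclusion signals that the turn limit was hit, which is the case where a reflection step would be needed.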
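- After the chat ends, the phase decides whether to reflect (need_reflect=True in ChatChainConfig)
If no conclusion was reached, reflection takes place.
In practice, reflection is another chatting session between a new role, the Counselor, and the CEO, which runs until a conclusion is reached.
- Continuing the dialogue
The above describes a single round. If the dialogue has not terminated and the number of rounds is below chat_turn_limit, the dialogue continues.
After ChatGPT responds, if the <INFO> end marker is not detected, a CPO->CEO exchange follows, and the step function is called again.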
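Update the global chat environment with the chat conclusion
The conclusion of the current phase is written to chat_env so that the next phase can read the configuration it needs.
Here the conclusion of the DemandAnalysis phase is stored in chat_env.env_dict['modality'].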
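A minimal sketch of this update step follows. ChatEnv here is a hypothetical stand-in for the real environment class, and the normalisation (lowercasing, dropping periods) is an assumption based on the later prompts in this walkthrough, which show Modality: "application".

```python
class ChatEnv:
    """Hypothetical stand-in for the global chat environment."""

    def __init__(self, task_prompt: str):
        self.env_dict = {"task_prompt": task_prompt, "modality": "", "language": ""}


def update_chat_env(chat_env: ChatEnv, key: str, seminar_conclusion: str) -> ChatEnv:
    """Store the text after the last <INFO> marker under the given key."""
    if "<INFO>" in seminar_conclusion:
        value = seminar_conclusion.split("<INFO>")[-1]
        # Assumed normalisation: lowercase, drop periods, trim whitespace.
        chat_env.env_dict[key] = value.lower().replace(".", "").strip()
    return chat_env


env = update_chat_env(ChatEnv("a Gomoku game"), "modality", "<INFO> Application")
print(env.env_dict["modality"])  # prints: application
```

Later phases can then read env.env_dict["modality"] when they fill in their own prompts.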
Walkthrough of each sub-phase
DemandAnalysis Phase
{
"phase": "DemandAnalysis",
"phaseType": "SimplePhase",
"max_turn_step": -1,
"need_reflect": "True"
}
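Demand analysis is a simple phase with a maximum turn count of -1 (treated as 10 in the code); it needs reflection, and the dialogue is CEO<->CPO.
This step decides the product's modality (think of it as the concrete product type). For my request, a Gomoku game with a nicer-looking interface, fuller functionality, and a countdown timer, the conclusion at the end of the DemandAnalysis phase was Application.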
<CEO>
[ChatDev is a software company powered by multiple intelligent agents, such as chief executive officer, chief human resources officer, chief product officer, chief technology officer, etc, with a multi-agent organizational structure and the mission of "changing the digital world through programming".
You are Chief Executive Officer. Now, we are both working at ChatDev and we share a common interest in collaborating to successfully complete a task assigned by a new customer.
Your main responsibilities include being an active decision-maker on users' demands and other key policy issues, leader, manager, and executor. Your decision-making role involves high-level decisions about policy and strategy; and your communicator role can involve speaking to the organization's management and employees.
Here is a new customer's task: 我想要一个五子棋的游戏,我希望界面能够好看一些,功能更够全一些,比如可以实现倒计时.
To complete the task, I will give you one or more instructions, and you must help me to write a specific solution that appropriately solves the requested instruction based on your expertise and my needs.]
ChatDev has made products in the following form before:
Image: can present information in line chart, bar chart, flow chart, cloud chart, Gantt chart, etc.
Document: can present information via .docx files.
PowerPoint: can present information via .pptx files.
Excel: can present information via .xlsx files.
PDF: can present information via .pdf files.
Website: can present personal resume, tutorial, products, or ideas, via .html files.
Application: can implement visualized game, software, tool, etc, via python.
Dashboard: can display a panel visualizing real-time information.
Mind Map: can represent ideas, with related concepts arranged around a core concept.
As the Chief Product Officer, to satisfy the new user's demand and the product should be realizable, you should keep discussing with me to decide which product modality do we want the product to be?
Note that we must ONLY discuss the product modality and do not discuss anything else! Once we all have expressed our opinion(s) and agree with the results of the discussion unanimously, any of us must actively terminate the discussion by replying with only one line, which starts with a single word <INFO>, followed by our final product modality without any other words, e.g., "<INFO> PowerPoint".
</CEO>
<CPO>
I think for a game like Five in a Row (Gomoku), an application would be the most suitable modality. It would allow for a visually appealing interface, full functionality, and the ability to implement features like countdown timers. What do you think?
</CPO>
<CEO>
I agree with you. An application would indeed be the most suitable modality for a game like Five in a Row (Gomoku). It provides the flexibility to create a visually appealing interface, incorporate all the required functionalities, and easily implement features like countdown timers. Let's proceed with developing an application for this game.
</CEO>
<CPO>
<INFO> Application
</CPO>
The chat log above shows that the two sides reached agreement (<INFO>) after two rounds of dialogue.
LanguageChoose Phase
{
"phase": "LanguageChoose",
"phaseType": "SimplePhase",
"max_turn_step": -1,
"need_reflect": "True"
}
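Choosing the development language is a simple phase with a maximum turn count of -1 (treated as 10 in the code); it needs reflection, and the dialogue is CEO<->CTO.
This step decides the programming language used for all later development. The CEO asks the CTO to propose a language for the Gomoku task and states explicitly that if Python can complete the task, the answer should simply be Python, so the CTO concludes right away that Python is the development language.
Finally, language is updated in chat_env: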
<CEO>
[ChatDev is a software company powered by multiple intelligent agents, such as chief executive officer, chief human resources officer, chief product officer, chief technology officer, etc, with a multi-agent organizational structure and the mission of "changing the digital world through programming".
You are Chief Technology Officer. we are both working at ChatDev. We share a common interest in collaborating to successfully complete a task assigned by a new customer.
You are very familiar to information technology. You will make high-level decisions for the overarching technology infrastructure that closely align with the organization's goals, while you work alongside the organization's information technology ("IT") staff members to perform everyday operations.
Here is a new customer's task: 我想要一个五子棋的游戏,我希望界面能够好看一些,功能更够全一些,比如可以实现倒计时.
To complete the task, You must write a response that appropriately solves the requested instruction based on your expertise and customer's needs.]
According to the new user's task and some creative brainstorm ideas listed below:
Task: "我想要一个五子棋的游戏,我希望界面能够好看一些,功能更够全一些,比如可以实现倒计时".
Modality: "application".
Ideas: "".
We have decided to complete the task through a executable software implemented via a programming language.
As the Chief Technology Officer, to satisfy the new user's demand and make the software realizable, you should propose a concrete programming language. If python can complete this task via Python, please answer Python; otherwise, answer another programming language (e.g., Java, C++, etc,).
Note that we must ONLY discuss the target programming language and do not discuss anything else! Once we all have expressed our opinion(s) and agree with the results of the discussion unanimously, any of us must actively terminate the discussion and conclude the best programming language we have discussed without any other words or reasons, return only one line using the format: "<INFO> *" where "*" represents a programming language.
</CEO>
<CTO>
<INFO> Python
</CTO>
The chat log above shows that the CTO agent answered the CEO agent's question with a conclusion right away.
Coding Phase
{
"phase": "Coding",
"phaseType": "SimplePhase",
"max_turn_step": 1,
"need_reflect": "False"
}
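Coding is a simple phase with a maximum turn count of 1; it does not need reflection, and the dialogue is CTO<->Programmer.
In this step the Programmer writes the code while following the prescribed code format. The end condition here is no longer that both sides reach consensus (<INFO>), but max_turn_step=1. After the coding conversation ends, while updating chat_env, ChatDev splits the conversation result into separate code files and saves them to local files.
If git management is enabled, ChatDev also creates a local git repository for version control. ("git_management": "False")
ChatGPT returned main.py, game.py, board.py, and player.py: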
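Splitting a reply into files can be sketched with a regular expression over the "FILENAME followed by a fenced block" format that the Coding prompt asks for. This is an illustrative sketch, not ChatDev's own parser; split_files, save_files, and the pattern are assumptions.

```python
import re
import tempfile
from pathlib import Path
from typing import Dict

FENCE = "`" * 3  # a literal triple backtick, built this way to keep the example readable

# A file name on its own line, then a fenced block whose body is the file content.
CODE_BLOCK = re.compile(
    r"([A-Za-z0-9_\-]+\.[A-Za-z0-9]+)\s*\n" + FENCE + r"[^\n]*\n(.*?)" + FENCE,
    re.DOTALL,
)


def split_files(response: str) -> Dict[str, str]:
    """Map each file name found in the reply to its code."""
    return {name: body for name, body in CODE_BLOCK.findall(response)}


def save_files(files: Dict[str, str], directory: str) -> None:
    """Write every extracted file into the given directory."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, body in files.items():
        (out_dir / name).write_text(body, encoding="utf-8")


reply = (
    "main.py\n" + FENCE + "python\nprint('hi')\n" + FENCE + "\n\n"
    "player.py\n" + FENCE + "python\nclass Player:\n    pass\n" + FENCE + "\n"
)
files = split_files(reply)
with tempfile.TemporaryDirectory() as tmp:
    save_files(files, tmp)
    written = sorted(p.name for p in Path(tmp).iterdir())
print(written)  # prints: ['main.py', 'player.py']
```

A reply that ignores the requested format yields an empty dict, so a caller would need to decide how to handle that case.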
'''
This is the main file of the Five in a Row game.
'''
import tkinter as tk
from game import Game

def main():
    root = tk.Tk()
    root.title("Five in a Row")
    game = Game(root)
    game.start()
    root.mainloop()

if __name__ == "__main__":
    main()
'''
This file contains the Game class which manages the game logic.
'''
import tkinter as tk
from board import Board
from player import Player

class Game:
    def __init__(self, root):
        self.root = root
        self.board = Board(root)
        self.player1 = Player("Player 1", "X")
        self.player2 = Player("Player 2", "O")
        self.current_player = self.player1

    def start(self):
        self.board.draw()
        self.board.canvas.bind("<Button-1>", self.on_click)

    def on_click(self, event):
        if not self.board.game_over:
            x = event.x // self.board.cell_size
            y = event.y // self.board.cell_size
            if self.board.is_valid_move(x, y):
                self.board.make_move(x, y, self.current_player.symbol)
                if self.board.check_win(x, y):
                    self.board.game_over = True
                    self.show_winner_message()
                else:
                    self.switch_player()

    def switch_player(self):
        if self.current_player == self.player1:
            self.current_player = self.player2
        else:
            self.current_player = self.player1

    def show_winner_message(self):
        winner = self.current_player.name
        message = f"{winner} wins!"
        tk.messagebox.showinfo("Game Over", message)
'''
This file contains the Board class which represents the game board.
'''
import tkinter as tk

class Board:
    def __init__(self, root):
        self.root = root
        self.size = 15
        self.cell_size = 40
        self.canvas_size = self.size * self.cell_size
        self.canvas = tk.Canvas(self.root, width=self.canvas_size, height=self.canvas_size)
        self.canvas.pack()
        self.game_over = False
        self.board = [[None] * self.size for _ in range(self.size)]

    def draw(self):
        for i in range(self.size):
            for j in range(self.size):
                x1 = i * self.cell_size
                y1 = j * self.cell_size
                x2 = x1 + self.cell_size
                y2 = y1 + self.cell_size
                self.canvas.create_rectangle(x1, y1, x2, y2, outline="black")

    def is_valid_move(self, x, y):
        return self.board[x][y] is None

    def make_move(self, x, y, symbol):
        self.board[x][y] = symbol
        x1 = x * self.cell_size
        y1 = y * self.cell_size
        x2 = x1 + self.cell_size
        y2 = y1 + self.cell_size
        self.canvas.create_text((x1 + x2) / 2, (y1 + y2) / 2, text=symbol, font=("Arial", 20))

    def check_win(self, x, y):
        symbol = self.board[x][y]
        directions = [(1, 0), (0, 1), (1, 1), (1, -1)]
        for dx, dy in directions:
            count = 1
            for i in range(1, 5):
                nx = x + dx * i
                ny = y + dy * i
                if 0 <= nx < self.size and 0 <= ny < self.size and self.board[nx][ny] == symbol:
                    count += 1
                else:
                    break
            if count == 5:
                return True
        return False
'''
This file contains the Player class which represents a player in the game.
'''
class Player:
    def __init__(self, name, symbol):
        self.name = name
        self.symbol = symbol
<CTO>
[ChatDev is a software company powered by multiple intelligent agents, such as chief executive officer, chief human resources officer, chief product officer, chief technology officer, etc, with a multi-agent organizational structure and the mission of "changing the digital world through programming".
You are Programmer. we are both working at ChatDev. We share a common interest in collaborating to successfully complete a task assigned by a new customer.
You can write/create computer software or applications by providing a specific programming language to the computer. You have extensive computing and coding experience in many varieties of programming languages and platforms, such as Python, Java, C, C++, HTML, CSS, JavaScript, XML, SQL, PHP, etc,.
Here is a new customer's task: 我想要一个五子棋的游戏,我希望界面能够好看一些,功能更够全一些,比如可以实现倒计时.
To complete the task, you must write a response that appropriately solves the requested instruction based on your expertise and customer's needs.]
According to the new user's task and our software designs listed below:
Task: "我想要一个五子棋的游戏,我希望界面能够好看一些,功能更够全一些,比如可以实现倒计时".
Modality: "application".
Programming Language: " Python"
Ideas:""
We have decided to complete the task through a executable software with multiple files implemented via Python. As the Programmer, to satisfy the new user's demands, you should write one or multiple files and make sure that every detail of the architecture is, in the end, implemented as code. The software should be equipped with graphical user interface (GUI) so that user can visually and graphically use it; so you must choose a GUI framework (e.g., in Python, you can implement GUI via tkinter, Pygame, Flexx, PyGUI, etc,).
Think step by step and reason yourself to the right decisions to make sure we get it right.
You will first lay out the names of the core classes, functions, methods that will be necessary, as well as a quick comment on their purpose.
Then you will output the content of each file including complete code. Each file must strictly follow a markdown code block format, where the following tokens must be replaced such that "FILENAME" is the lowercase file name including the file extension, "LANGUAGE" in the programming language, "DOCSTRING" is a string literal specified in source code that is used to document a specific segment of code, and "CODE" is the original code:
FILENAME
```LANGUAGE
'''
DOCSTRING
'''
CODE
```
You will start with the “main” file, then go to the ones that are imported by that file, and so on.
Please note that the code should be fully functional. Ensure to implement all functions. No placeholders (such as ‘pass’ in Python).
This is the basic structure of the Five in a Row game. The main file main.py creates the game window and starts the game. The game logic is handled by the Game class in the game.py file. The game board is represented by the Board class in the board.py file. The Player class in the player.py file represents a player in the game.
The game board is displayed using a tkinter canvas, and players can make moves by clicking on the board. The game checks for a win condition after each move and displays a message if a player wins.
You can run the game by executing the main.py file.
This phase did not reflect; the whole conversation consisted of a single CTO->Programmer exchange.
## CodeCompleteAll Phase
```json
{
  "phase": "CodeCompleteAll",
  "phaseType": "ComposedPhase",
  "cycleNum": 10,
  "Composition": [
    {
      "phase": "CodeComplete",
      "phaseType": "SimplePhase",
      "max_turn_step": 1,
      "need_reflect": "False"
    }
  ]
}
```
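CodeCompleteAll is a composed phase with a sub-phase cycle count of 10 and a single sub-phase, CodeComplete.
This phase first reads in the code files generated by the Coding phase and checks each one for unfinished parts (skipped with "pass"). Each time unfinished code is found, it enters the CodeComplete sub-phase, looping at most 5 times to complete the code.
My example is simple and had no unfinished code from the start, so update_phase_env triggered break_cycle and the phase ended immediately. (Whether to break the cycle is decided by unimplemented_file in phase_env: if there is no unimplemented file, the loop terminates right away.)
If there is unfinished code, CodeComplete asks the agent to fill in all unimplemented methods.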
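The detection step described above can be sketched as a scan for bare pass lines. This is a simplified illustration under stated assumptions: find_unimplemented, complete_all, and the fake agent are hypothetical, and the real phase keeps its state in phase_env instead of a plain dict.

```python
from typing import Callable, Dict, Optional


def find_unimplemented(files: Dict[str, str]) -> Optional[str]:
    """Return the first file containing a bare `pass` line, or None."""
    for name, source in files.items():
        if any(line.strip() == "pass" for line in source.splitlines()):
            return name
    return None


def complete_all(files: Dict[str, str],
                 complete: Callable[[str, str], str],
                 cycle_num: int = 10) -> Dict[str, str]:
    """Repeat the completion sub-phase until nothing is left or cycles run out."""
    for _ in range(cycle_num):
        target = find_unimplemented(files)
        if target is None:
            break  # nothing left to implement: leave the cycle early
        files[target] = complete(target, files[target])
    return files


project = {
    "player.py": "class Player:\n    def name(self):\n        pass\n",
    "main.py": "print('start')\n",
}
# Stand-in for the agent: replaces the placeholder with a real statement.
fake_agent = lambda name, src: src.replace("pass", "return 'Player 1'")
done = complete_all(project, fake_agent)
print(find_unimplemented(done))  # prints: None
```

When no file contains a placeholder, as in the Gomoku example, the loop exits on its first iteration without calling the agent.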
CodeReview Phase
{
  "phase": "CodeReview",
  "phaseType": "ComposedPhase",
  "cycleNum": 3,
  "Composition": [
    {
      "phase": "CodeReviewComment",
      "phaseType": "SimplePhase",
      "max_turn_step": 1,
      "need_reflect": "False"
    },
    {
      "phase": "CodeReviewModification",
      "phaseType": "SimplePhase",
      "max_turn_step": 1,
      "need_reflect": "False"
    }
  ]
}
Code review is a composed phase with a sub-phase cycle count of 3 and two sub-phases, CodeReviewComment and CodeReviewModification.
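The review cycle can be pictured as a comment step followed by a modification step, repeated up to cycleNum times. In the sketch below, code_review and both callables are hypothetical. The early exit on an "<INFO> Finished" reply mirrors the reviewer prompt in this walkthrough; whether ChatDev breaks the cycle in exactly this way is an assumption.

```python
from typing import Callable, List


def code_review(codes: str,
                review: Callable[[str], str],
                modify: Callable[[str, str], str],
                cycle_num: int = 3) -> str:
    """Run comment -> modification cycles until the reviewer is satisfied."""
    for _ in range(cycle_num):
        comment = review(codes)            # CodeReviewComment step
        if "<INFO> Finished" in comment:
            break                          # reviewer has nothing more to add
        codes = modify(codes, comment)     # CodeReviewModification step
    return codes


IMPORT_LINE = "from tkinter import messagebox"
calls: List[str] = []


def fake_reviewer(codes: str) -> str:
    """Stand-in reviewer: complains once about the missing import."""
    calls.append("review")
    return "<INFO> Finished" if IMPORT_LINE in codes else "<COMMENT> add " + IMPORT_LINE


def fake_programmer(codes: str, comment: str) -> str:
    """Stand-in programmer: applies the suggested import."""
    return IMPORT_LINE + "\n" + codes


result = code_review("import tkinter as tk\n", fake_reviewer, fake_programmer)
print(len(calls))  # prints: 2
```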
CodeReviewComment
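CodeReviewComment is a simple phase with a maximum turn count of 1; it does not need reflection, and the dialogue is Programmer<->CodeReviewer. In this step the programmer instructs the reviewer to check the code and report anything missing in the reply.
In the first cycle, the suggested change was: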
from tkinter import messagebox
<programmer>
[ChatDev is a software company powered by multiple intelligent agents, such as chief executive officer, chief human resources officer, chief product officer, chief technology officer, etc, with a multi-agent organizational structure and the mission of "changing the digital world through programming".
You are Code Reviewer. we are both working at ChatDev. We share a common interest in collaborating to successfully complete a task assigned by a new customer.
You can help programmers to assess source codes for software troubleshooting, fix bugs to increase code quality and robustness, and offer proposals to improve the source codes.
Here is a new customer's task: 我想要一个五子棋的游戏,我希望界面能够好看一些,功能更够全一些,比如可以实现倒计时.
To complete the task, you must write a response that appropriately solves the requested instruction based on your expertise and customer's needs.]
According to the new user's task and our software designs:
Task: "我想要一个五子棋的游戏,我希望界面能够好看一些,功能更够全一些,比如可以实现倒计时".
Modality: "application".
Programming Language: " Python"
Ideas: ""
Codes:
"
代码部分省略...
"
As the Code Reviewer, to make the software directly operable without further coding, ChatDev have formulated the following regulations:
1) all referenced classes should be imported;
2) all methods should be implemented;
3) all methods need to have the necessary comments;
4) no potential bugs;
5) The entire project conforms to the tasks proposed by the user;
6) most importantly, do not only check the errors in the code, but also the logic of code. Make sure that user can interact with generated software without losing any feature in the requirement;
Now, you should check the above regulations one by one and review the codes in detail, propose one comment with the highest priority about the codes, and give me instructions on how to fix. Tell me your comment with the highest priority and corresponding suggestions on revision. If the codes are perfect and you have no comment on them, return only one line like "<INFO> Finished".
</programmer>
<Code Reviewer>
[ChatDev is a software company powered by multiple intelligent agents, such as chief executive officer, chief human resources officer, chief product officer, chief technology officer, etc, with a multi-agent organizational structure and the mission of "changing the digital world through programming".
You are Programmer. we are both working at ChatDev. We share a common interest in collaborating to successfully complete a task assigned by a new customer.
You can write/create computer software or applications by providing a specific programming language to the computer. You have extensive computing and coding experience in many varieties of programming languages and platforms, such as Python, Java, C, C++, HTML, CSS, JavaScript, XML, SQL, PHP, etc,.
Here is a new customer's task: 我想要一个五子棋的游戏,我希望界面能够好看一些,功能更够全一些,比如可以实现倒计时.
To complete the task, you must write a response that appropriately solves the requested instruction based on your expertise and customer's needs.]
<COMMENT> The code is missing the import statement for the `messagebox` module in the `game.py` file. This will cause an error when trying to show the winner message. To fix this, add the following import statement at the top of the `game.py` file:
```python
from tkinter import messagebox
```
This will import the messagebox module from the tkinter package, allowing the winner message to be displayed correctly.
</Code Reviewer>
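The suggested change, from tkinter import messagebox, looks odd next to the tk.messagebox.showinfo call in game.py, but it does work: importing the submodule also binds it as an attribute of the tkinter package, so tk.messagebox resolves afterwards.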
### CodeReviewModification
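The code-fix phase is also a simple phase with a maximum turn count of 1; it does not need reflection, and the dialogue is CodeReviewer<->Programmer. In this step the reviewer passes the issue it just found, along with its suggestion, to the programmer, who regenerates the code.
> Answering questions the traditional way can yield inaccurate or irrelevant information; in code generation especially, naive instructions can cause unexpected hallucinations. To address this, a "thought instruction" mechanism is proposed, inspired by chain-of-thought prompting.
CodeReviewModification is essentially a role reversal of CodeReviewComment: CodeReviewComment goes programmer->reviewer, while CodeReviewModification goes reviewer->programmer.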
```python
'''
This is the main file of the Five in a Row game.
'''
import tkinter as tk
from game import Game

def main():
    root = tk.Tk()
    root.title("Five in a Row")
    game = Game(root)
    game.start()
    root.mainloop()

if __name__ == "__main__":
    main()

'''
This file contains the Game class which manages the game logic.
'''
import tkinter as tk
from tkinter import messagebox
from board import Board
from player import Player

class Game:
    def __init__(self, root):
        self.root = root
        self.board = Board(root)
        self.player1 = Player("Player 1", "X")
        self.player2 = Player("Player 2", "O")
        self.current_player = self.player1
def start(self