Versal AIE 上手尝鲜 -- Standalone例程
最近陆陆续续有工程师拿到了VCK190单板。 VCK190带Xilinx的7nm AIE,有很强的处理能力。 本文介绍怎么运行Xilinx AIE的例程,熟悉AIE开发流程。
本文先介绍Standalone(BareMetal)的例程, 它来自于Vitis-Tutorials 的 AIE a2z。
1. 准备工作
1.1. License
在上手之前,需要注意是VCK190 Production单板,还是VCK190 ES单板。如果是VCK190 Production单板,使用VCK190 Voucher,在Xilinx网站,可以申请到License。安装License后,License的状态窗口下,能看到下列项目。
AIEBuild
AIESim
MEBuild
MESim
如果是VCK190 ES单板,需要在Lounge里申请"Versal Tools Early Eacess"; "Versal Tools PDI Early Eacess"的License,并在Vivado里使能ES器件。在Vivado/2020.2/scripts/init.tcl的文件里,添加“enable_beta_device xcvc*”,可以自动使能ES器件。
1.2. Platform
在进行开发之前,需要准备Platform。 VCK190 Production单板和VCK190 ES单板使用的Platform不一样,可以从下面链接下载各自的Platform,再复制到目录“Xilinx/Vitis/2020.2/platforms/”下。
VCK190 Production Platform
VCK190 ES Platform
准备好后,目录结构与下面类似。
1.3. Common Images
Xilinx现在还提供了Common Images,包含对应单板的Linux启动文件,和编译器、sysroots(头文件、应用程序库)等。可以在Xilinx Download下载Versal common image。
1.4. 测试环境
Host OS: Ubuntu 18.04
Vitis 2020.2
PetaLinux 2020.2
VCK190 Production
2. AIE Standalone Flow
例程AIE a2z 是Standalone (BareMetal)的例程,Versal的A72不运行Linux。 它很全面,包含创建Platform、创建AIE Kernel、创建PL Kernel、创建A72应用程序、调试AIE Kernel。在Xilinx的文档中,AIE的程序,叫Kernel; 在Vitis里使用HLS开发的PL设计,也叫Kernel。
注意,2021年7月份,Vitis Tutorials的"master"分支,才包含例程AIE a2z 。
2.1. AIE a2z 分析
2.1.1. 文件列表
AIE a2z 包含下列文件。
aie_adder:
│ description.json
│ details.rst
│ Makefile
│ qor.json
│ README.rst
│ system.cfg
│ utils.mk
│ xrt.ini
│
├─data
│ golden.txt
│ input0.txt
│ input1.txt
│
└─src
aie_adder.cc
aie_graph.cpp
aie_graph.h
aie_kernel.h
host.cpp
pl_mm2s.cpp
pl_s2mm.cpp
2.1.2. aie_adder.cc
aie_adder.cc是定义AIE Kernel的文件,也是最重要的文件,仿真和实际运行都需要。
AIE Kernel也很简单,相当于是C语言编程的HelloWorld, 只是读取2个向量,做加法运算后,再写出去。
void aie_adder(input_stream_int32* in0, input_stream_int32* in1, output_stream_int32* out) {
v4int32 a = readincr_v4(in0);
v4int32 b = readincr_v4(in1);
v4int32 c = operator+(a, b);
writeincr_v4(out, c);
}
2.1.3. aie_graph.cpp
aie_graph.cpp定义和控制运算的graph,这个例子中,只用于仿真。
#include "aie_graph.h"
PLIO* in0 = new PLIO("DataIn0", adf::plio_32_bits, "data/input0.txt");
PLIO* in1 = new PLIO("DataIn1", adf::plio_32_bits, "data/input1.txt");
PLIO* out = new PLIO("DataOut", adf::plio_32_bits, "data/output.txt");
// Hank: only for simulation??
simulation::platform<2, 1> platform(in0, in1, out);
simpleGraph addergraph;
connect<> net0(platform.src[0], addergraph.in0);
connect<> net1(platform.src[1], addergraph.in1);
connect<> net2(addergraph.out, platform.sink[0]);
#ifdef __AIESIM__
int main(int argc, char** argv) {
addergraph.init();
addergraph.run(4);
addergraph.end();
return 0;
}
#endif
2.1.4. aie_graph.h
aie_graph.cpp定义了运算的graph,仿真和实际运行都需要。
#include <adf.h>
#include "aie_kernel.h"
using namespace adf;
class simpleGraph : public graph {
private:
kernel adder;
public:
port<input> in0, in1;
port<output> out;
simpleGraph() {
adder = kernel::create(aie_adder);
connect<stream>(in0, adder.in[0]);
connect<stream>(in1, adder.in[1]);
connect<stream>(adder.out[0], out);
source(adder) = "aie_adder.cc";
runtime<ratio>(adder) = 0.1;
};
};
2.1.5. aie_kernel.h
aie_kernel.h最简单,只声明了aie_adder的原型,仿真和实际运行都需要。
void aie_adder(input_stream_int32* in0, input_stream_int32* in1, output_stream_int32* out);
2.1.6. host.cpp
host.cpp会申请内存,加载数据, 加载xclbin, 运行AIE Kernel。
simpleGraph addergraph;
static std::vector<char> load_xclbin(xrtDeviceHandle device, const std::string& fnm) {
// load bit stream
std::ifstream stream(fnm);
stream.seekg(0, stream.end);
size_t size = stream.tellg();
stream.seekg(0, stream.beg);
std::vector<char> header(size);
stream.read(header.data(), size);
auto top = reinterpret_cast<const axlf*>(header.data());
xrtDeviceLoadXclbin(device, top);
return header;
}
int main(int argc, char** argv) {
// Open xclbin
auto dhdl = xrtDeviceOpen(0); // Open Device the local device
auto xclbin = load_xclbin(dhdl, "krnl_adder.xclbin");
auto top = reinterpret_cast<const axlf*>(xclbin.data());
adf::registerXRT(dhdl, top->m_header.uuid);
int DataInput0[sizeIn], DataInput1[sizeIn];
for (int i = 0; i < sizeIn; i++) {
DataInput0[i] = rand() % 100;
DataInput1[i] = rand() % 100;
}
// input memory
// Allocating the input size of sizeIn to MM2S
// This is using low-level XRT call xclAllocBO to allocate the memory
xrtBufferHandle in_bohdl0 = xrtBOAlloc(dhdl, sizeIn * sizeof(int), 0, 0);
auto in_bomapped0 = reinterpret_cast<uint32_t*>(xrtBOMap(in_bohdl0));
memcpy(in_bomapped0, DataInput0, sizeIn * sizeof(int));
printf("Input memory virtual addr 0x%px\n", in_bomapped0);
xrtBufferHandle in_bohdl1 = xrtBOAlloc(dhdl, sizeIn * sizeof(int), 0, 0);
auto in_bomapped1 = reinterpret_cast<uint32_t*>(xrtBOMap(in_bohdl1));
memcpy(in_bomapped1, DataInput1, sizeIn * sizeof(int));
printf("Input memory virtual addr 0x%px\n", in_bomapped1);
// output memory
// Allocating the output size of sizeOut to S2MM
// This is using low-level XRT call xclAllocBO to allocate the memory
xrtBufferHandle out_bohdl = xrtBOAlloc(dhdl, sizeOut * sizeof(int), 0, 0);
auto out_bomapped = reinterpret_cast<uint32_t*>(xrtBOMap(out_bohdl));
memset(out_bomapped, 0xABCDEF00, sizeOut * sizeof(int));
printf("Output memory virtual addr 0x%px\n", out_bomapped);
// mm2s ip
// Using the xrtPLKernelOpen function to manually control the PL Kernel
// that is outside of the AI Engine graph
xrtKernelHandle mm2s_khdl1 = xrtPLKernelOpen(dhdl, top->m_header.uuid, "pl_mm2s:{pl_mm2s_1}");
// Need to provide the kernel handle, and the argument order of the kernel arguments
// Here the in_bohdl is the input buffer, the nullptr is the streaming interface and must be null,
// lastly, the size of the data. This info can be found in the kernel definition.
xrtRunHandle mm2s_rhdl1 = xrtKernelRun(mm2s_khdl1, in_bohdl0, nullptr, sizeIn);
printf("run pl_mm2s_1\n");
xrtKernelHandle mm2s_khdl2 = xrtPLKernelOpen(dhdl, top->m_header.uuid, "pl_mm2s:{pl_mm2s_2}");
xrtRunHandle mm2s_rhdl2 = xrtKernelRun(mm2s_khdl2, in_bohdl1, nullptr, sizeIn);
printf("run pl_mm2s_2\n");
// s2mm ip
// Using the xrtPLKernelOpen function to manually control the PL Kernel
// that is outside of the AI Engine graph
xrtKernelHandle s2mm_khdl = xrtPLKernelOpen(dhdl, top->m_header.uuid, "pl_s2mm");
// Need to provide the kernel handle, and the argument order of the kernel arguments
// Here the out_bohdl is the output buffer, the nullptr is the streaming interface and must be null,
// lastly, the size of the data. This info can be found in the kernel definition.
xrtRunHandle s2mm_rhdl = xrtKernelRun(s2mm_khdl, out_bohdl, nullptr, sizeOut);
printf("run pl_s2mm\n");
// graph execution for AIE
printf("graph init. This does nothing because CDO in boot PDI already configures AIE.\n");
addergraph.init();
printf("graph run\n");
addergraph.run(N_ITER);
addergraph.end();
printf("graph end\n");
// wait for mm2s done
auto state = xrtRunWait(mm2s_rhdl1);
std::cout << "mm2s_1 completed with status(" << state << ")\n";
xrtRunClose(mm2s_rhdl1);
xrtKernelClose(mm2s_khdl1);
state = xrtRunWait(mm2s_rhdl2);
std::cout << "mm2s_2 completed with status(" << state << ")\n";
xrtRunClose(mm2s_rhdl2);
xrtKernelClose(mm2s_khdl2);
// wait for s2mm done
state = xrtRunWait(s2mm_rhdl);
std::cout << "s2mm completed with status(" << state << ")\n";
xrtRunClose(s2mm_rhdl);
xrtKernelClose(s2mm_khdl);
// Comparing the execution data to the golden data
// clean up XRT
std::cout << "Releasing remaining XRT objects...\n";
xrtBOFree(in_bohdl0);
xrtBOFree(in_bohdl1);
xrtBOFree(out_bohdl);
xrtDeviceClose(dhdl);
return errorCount;
}
2.1.7. pl_mm2s.cpp
pl_mm2s.cpp是利用HLS做的PL设计,用于从内存搬移数据到AIE Kernel。
void pl_mm2s(ap_int<32>* mem, hls::stream<qdma_axis<32, 0, 0, 0> >& s, int size) {
data_mover:
for (int i = 0; i < size; i++) {
qdma_axis<32, 0, 0, 0> x;
x.data = mem[i];
x.keep_all();
s.write(x);
}
}
2.1.8. pl_mm2s.cpp
pl_mm2s.cpp也是利用HLS做的PL设计,用于从AIE Kernel搬移数据到内存。
void pl_s2mm(ap_int<32>* mem, hls::stream<qdma_axis<32, 0, 0, 0> >& s, int size) {
data_mover:
for (int i = 0; i < size; i++) {
qdma_axis<32, 0, 0, 0> x = s.read();
mem[i] = x.data;
}
}
2.2. 经验
AIE a2z 做得相当完善,基本可以顺利完成。 在实验过程中,可能遇到下列问题。
2.2.1. AXI Interrupt
创建平台(Platform)时,AXI中断控制器(axi_intc)没有连接中断源。Vitis编译工程时,会连接HLS设计的IP模块的中断输出到AXI中断控制器(axi_intc)。 如果验证平台(Platform)的Block Design时,Vivado会报告下列关于中断控制器消息,提示没有中断源,可以忽略。
[BD 41-759] The input pins (listed below) are either not connected or do not have a source port, and they don‘t have a tie-off specified. These pins are tied-off to all 0‘s to avoid error in Implementation flow.
Please check your design and connect them as needed:
/axi_intc_0/intr
2.2.2. sys_clk0
Vivado也会对输入时钟报告下列时钟不匹配的消息。Vivado创建Block Design时,默认的时钟是100MHz。单板上的实际时钟是200MHz。选中sys_clk0_0,在属性中,把它更改为200MHz。
[xilinx.com:ip:axi_noc:1.0-1] /ps_nocClock frequency of the connected clock (/ps_noc/sys_clk0) is 100.000 MHz while "Input System Clock Frequency" is 200.000 MHz. Please either reconfigure the parameter "Input System Clock Period" of the axi_noc (in DDR Basic tab) or change frequency of the connected clock (CONFIG.FREQ_HZ) within the range of 199920031.987 to 200080032.013 Hz.
2.2.3. AIE license
如果Vitis编译工程时,报告“AIE license not found”,请申请license。
AIE license not found !
/opt/Xilinx/Vitis/2020.2/aietools/bin/aieir_be: line 96: kill: (-28000) - No such process
ERROR: [aiecompiler 77-753] This application has discovered an exceptional condition from which it cannot recover while executing the following command
>> aieir_be --time-passes=0 --trace-plio-width=64 --pl-freq=0 --use-real-noc=true --show-loggers=false --high-performance=false --kernel-address-location=false --target=x86sim --swfifo-threshold=40 --single-mm2s-channel=false --workdir=./Work --exit-after=complete --event-trace-config= --test-iterations=-1 --stacksize=1024 --platform=/proj/hankf/vck190/vck190_aie_a2z/vitis/base_pfm_vck190_aie_a2z/export/base_pfm_vck190_aie_a2z/base_pfm_vck190_aie_a2z.xpfm --event-trace-custom-config= --disable-dma-cmd-alignment=false --enable-ecc-scrubbing=false --write-partitioned-file=true --schemafile=AIEGraphSchema.json --include="/opt/Xilinx/Vitis/2020.2/aietools/include" --include="/opt/Xilinx/Vitis_HLS/2020.2/include" --include="../" --include="../src" --include="../data" --include="../src/kernels" --device= --write-unified-data=false --fastmath=false --event-trace-advanced-mapping=0 --log-level=1 --enable-reconfig=false --aiesim-xrt-api=false --gen-graph-cleanup=false --use-canonical-net-names=false --event-trace-port=plio --new-placer=true --use-phy-shim=true --xlopt=0 --pre-compile-kernels=false --validate-only=false --trace-aiesim-option=0 --aiearch=aie --mapped-soln-udm= --optimize-pktids=false --no-init=false --num-trace-streams=1 --aie-heat-map=false --phydevice= --exec-timed=0 --pl-auto-restart=false --routed-soln-udm= --enable-profiling=false --disable-transform-merge-broadcast=false --verbose=true --use-async-rtp-locks=true --repo-path= --genArchive=false --pl-axi-lite=false --new-router=true --aie-driver-v1=false --logcfg-file= --event-trace-bounding-box= --enable-reconfig-dma-autostart=false --heapsize=1024 --logical-arch= --nodot-graph=false --shim-constraints= --disable-dma-autostart=false --disable-transform-broadcast-split=true -json ./Work/temp/project.json -sdf-graph /proj/hankf/vck190/vck190_aie_a2z/vitis/simple_application_vck190_aie_a2z/src/project.cpp.
2.2.4. 安装dot
Vitis在编译过程中,会用到工具dot。如果没有安装sudo apt install graphviz,会得到错误"sh: 1: dot: not foun"。 在Ubuntu 18.04下,如果有管理员权限,使用命令“sudo apt install graphviz”能安装dot。
DEBUG:MapperPartitioner: Adding Edge : Name=D_net2 SrcPort=i1_po0 DstPort=i3_pi0 EdgeType=mem
DEBUG:MapperPartitioner:Done--Add Double Buffer Edge SrcPort=i1_po0 DstPort=i3_pi0 type=mem Edge=net2:i1-(buf2)->i3
DEBUG:MapperPartitioner:Graph After Adding Double Edges
sh: 1: dot: not found
ERROR: [aiecompiler 77-753] This application has discovered an exceptional condition from which it cannot recover while executing the following command
>> dot ./Work/reports/project.dot -Tpng -o ./Work/reports/project.png
.
Please check the output log for errors and fix those before you run the application.
/opt/Xilinx/Vitis/2020.2/aietools/bin/aieir_be: line 96: kill: (-44668) - No such process
ERROR: [aiecompiler 77-753] This application has discovered an exceptional condition from which it cannot recover while executing the following command
>> aieir_be --time-passes=0 --trace-plio-width=64 --pl-freq=0 --use-real-noc=true --show-loggers=false --high-performance=false --kernel-address-location=false --target=hw --swfifo-threshold=40 --single-mm2s-channel=false --workdir=./Work --exit-after=complete --event-trace-config= --test-iterations=-1 --stacksize=1024 --platform=/proj/hankf/vck190/vck190_aie_a2z/vitis/base_pfm_vck190_aie_a2z/export/base_pfm_vck190_aie_a2z/base_pfm_vck190_aie_a2z.xpfm --event-trace-custom-config= --disable-dma-cmd-alignment=false --enable-ecc-scrubbing=false --write-partitioned-file=true --schemafile=AIEGraphSchema.json --include="/opt/Xilinx/Vitis/2020.2/aietools/include" --include="/opt/Xilinx/Vitis_HLS/2020.2/include" --include="../" --include="../src" --include="../data" --include="../src/kernels" --device= --write-unified-data=false --fastmath=false --event-trace-advanced-mapping=0 --log-level=1 --enable-reconfig=false --aiesim-xrt-api=false --gen-graph-cleanup=false --use-canonical-net-names=false --event-trace-port=plio --new-placer=true --use-phy-shim=true --xlopt=0 --pre-compile-kernels=false --validate-only=false --trace-aiesim-option=0 --aiearch=aie --mapped-soln-udm= --optimize-pktids=false --no-init=false --num-trace-streams=1 --aie-heat-map=false --phydevice= --exec-timed=0 --pl-auto-restart=false --routed-soln-udm= --enable-profiling=false --disable-transform-merge-broadcast=false --verbose=true --use-async-rtp-locks=true --repo-path= --genArchive=false --pl-axi-lite=false --new-router=true --aie-driver-v1=false --logcfg-file= --event-trace-bounding-box= --enable-reconfig-dma-autostart=false --heapsize=1024 --logical-arch= --nodot-graph=false --shim-constraints= --disable-dma-autostart=false --disable-transform-broadcast-split=true -json ./Work/temp/project.json -sdf-graph /proj/hankf/vck190/vck190_aie_a2z/vitis/simple_application_vck190_aie_a2z/src/project.cpp.
2.2.5. 软件Emulation
运行软件Emulation的时候,要选择AIE工程,不选择system project。如果选择system project运行软件Emulation,会出现下列错误。
Error while launching program:
The selected system project ‘simple_application_system‘ contains applications (simple_application) that doesn‘t support launching software emulation.
The selected system project ‘simple_application_system‘ contains applications (simple_application) that doesn‘t support launching software emulation.
2.2.6. 硬件Emulation
先运行软件Emulation,再运行硬件Emulation。
如果直接运行硬件Emulation,会出现下列错误。
Failed to start emulator on the project ‘simple_application_system‘ using the build configuration ‘Emulation-HW‘.
Launch emulator script doesn‘t exist at location ‘/proj/hankf/vck190/vck190_aie_a2z_script_hw_prj/custom_pfm_vck190/vitis/simple_application_system/Emulation-HW/package/launch_hw_emu.sh‘.
另外Vitis里,先选择AIE工程,再编译AIE工程,然后去启动硬件Emulation,菜单里可能没有目标。编译后,要重新选择system project,再选择AIE工程,再去启动硬件Emulation,菜单里就会有目标。
2.2.7. A72软件没有ap_int.h
文件mm2s.cpp和s2mm.cpp时给HLS设计用的,不能添加到A72的软件工程里。如果把它们加到了A72的软件工程里,会遇到错误“ap_int.h: No such file or directory”。
aarch64-none-elf-g++ -Wall -O0 -g3 -I"/opt/Xilinx/Vitis/2020.2/aietools/include"
-I"/proj/hankf/vck190/vck190_aie_a2z/vitis/simple_application_vck190_aie_a2z/src"
-I"/../include" -c -fmessage-length=0 -MT"src/mm2s.o" -mcpu=cortex-a72 -I/proj/hankf/vck190/vck190_aie_a2z/vitis/vck190_aie_a2z_aie_output_platform/export/vck190_aie_a2z_aie_output_platform/sw/vck190_aie_a2z_aie_output_platform/standalone_domain/bspinclude/include -MMD -MP -MF"src/mm2s.d" -MT"src/mm2s.o" -o "src/mm2s.o" "../src/mm2s.cpp"
../src/mm2s.cpp:33:10: fatal error: ap_int.h: No such file or directory
33 | #include <ap_int.h>
| ^~~~~~~~~~
2.2.8. A72软件工程找不到simple(input_window, output_window )
A72软件要控制AIE Kernel,需要相关信息。因此预先把AIE工程编译后产生的文件“Hardware/Work/ps/c_rts/aie_control.cpp“,添加到 A72软件工程。
如果忘记添加,可能会得到错误信息,“undefined reference to `simple(input_window
aarch64-none-elf-g++ -L/opt/Xilinx/Vitis/2020.2/aietools/lib/aarchnone64.o -mcpu=cortex-a72 -Wl,-T -Wl,../src/lscript.ld -L/proj/hankf/vck190/vck190_aie_a2z/vitis/vck190_aie_a2z_aie_output_platform/export/vck190_aie_a2z_aie_output_platform/sw/vck190_aie_a2z_aie_output_platform/standalone_domain/bsplib/lib -o "aie_a2z_vck190_a72_ctrl_app.elf" ./src/main.o ./src/platform.o -ladf_api -Wl,--start-group,-lxil,-lgcc,-lc,-lstdc++,--end-group
/opt/Xilinx/Vitis/2020.2/gnu/aarch64/lin/aarch64-none/x86_64-oesdk-linux/usr/bin/aarch64-xilinx-elf/aarch64-xilinx-elf-ld.real: ./src/main.o: in function `simpleGraph::simpleGraph()‘:
/proj/hankf/vck190/vck190_aie_a2z/vitis/simple_application_vck190_aie_a2z/src/project.h:17: undefined reference to `simple(input_window<cint16>*, output_window<cint16>*)‘
/opt/Xilinx/Vitis/2020.2/gnu/aarch64/lin/aarch64-none/x86_64-oesdk-linux/usr/bin/aarch64-xilinx-elf/aarch64-xilinx-elf-ld.real: /proj/hankf/vck190/vck190_aie_a2z/vitis/simple_application_vck190_aie_a2z/src/project.h:17: undefined reference to `simple(input_window<cint16>*, output_window<cint16>*)‘
/opt/Xilinx/Vitis/2020.2/gnu/aarch64/lin/aarch64-none/x86_64-oesdk-linux/usr/bin/aarch64-xilinx-elf/aarch64-xilinx-elf-ld.real: /proj/hankf/vck190/vck190_aie_a2z/vitis/simple_application_vck190_aie_a2z/src/project.h:18: undefined reference to `simple(input_window<cint16>*, output_window<cint16>*)‘
makefile:48: recipe for target ‘aie_a2z_vck190_a72_ctrl_app.elf‘ failed
/opt/Xilinx/Vitis/2020.2/gnu/aarch64/lin/aarch64-none/x86_64-oesdk-linux/usr/bin/aarch64-xilinx-elf/aarch64-xilinx-elf-ld.real: /proj/hankf/vck190/vck190_aie_a2z/vitis/simple_application_vck190_aie_a2z/src/project.h:18: undefined reference to `simple(input_window<cint16>*, output_window<cint16>*)‘
collect2.real: error: ld returned 1 exit status
make: *** [aie_a2z_vck190_a72_ctrl_app.elf] Error 1
2.2.9. Package
编译A72程序后,要编译system project,将所有模块打包再一起。这时候,要根据04-ps_application_creation_run_all.md的Step 3. Build the Full System,添加打包选项,“--package.ps_elf ../../A-to-Z_app/Debug/A-to-Z_app.elf,a72-0 --package.defer_aie_run”。
如果没有添加,会报告错误“no xclbin input is found”。
Package step cannot be performed since the platform has a VPP link generated XSA and no xclbin input is found. Please provide a valid xclbin location in system project package options
11:21:21 Build Finished (took 646ms)