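Starting from a simple requirement
I recently built an app with Electron and ran into a requirement that sounds simple: take a PyTorch deep learning model trained in a Python environment and run it inside Electron.
My first idea was simple too. PyTorch officially provides libtorch, its C++ front end, so the model can be saved as a .pt file and loaded with libtorch. node-gyp can then compile that code into a dynamically linked .node file for Node.js to load.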
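Introducing libtorch
Official site: https://pytorch.org/cppdocs/frontend.html
libtorch is the C++ front end of PyTorch: a C++14 library (at the time of writing) for CPU and GPU tensor computation that provides automatic differentiation and various higher-level abstractions for machine learning and neural networks. In plain terms, it is PyTorch for C++, with an API similar to the Python one. In some situations performance and portability requirements rule out the Python interpreter, for example low-latency, high-performance, or multithreaded environments, or model deployment; that is where the C++ front end comes in.
The C++ API that libtorch provides mirrors the Python one, so it is fairly easy to pick up if you know PyTorch in Python. The main components are: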
Component | Description |
---|---|
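torch::Tensor | Efficient CPU/GPU tensor type with automatic differentiation |
torch::nn | Collection of composable modules for building neural networks |
torch::optim | Optimizers, i.e. algorithms such as SGD and Adam for training models |
torch::data | Datasets, data pipelines, and a multithreaded, asynchronous data loader |
torch::serialize | Serialization API for saving and loading model checkpoints |
torch::python | Glue for binding C++ models into Python |
torch::jit | Pure C++ access to the TorchScript JIT compiler |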
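After downloading libtorch you can see its layout: an include directory (headers), a lib directory (dynamic and static libraries), and a share directory that holds the CMake files.
The simple code
Following the plan above, the code is simple. First, write a libtorch function that loads a .pt model and runs it: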
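// torch_script.cpp
#include <iostream>
#include <vector>
#include "torch/script.h"
#include "torch_script.h"

std::vector<float> module_forward(const char *pathname, const std::vector<float> &input) {
  try {
    // Load the TorchScript model
    torch::jit::Module module = torch::jit::load(pathname);
    // Wrap the input as a single batch of shape [1, input.size()]
    std::vector<torch::jit::IValue> in_batch;
    at::Tensor in = torch::tensor(input);
    in_batch.emplace_back(torch::reshape(in, {1, int64_t(input.size())}));
    // Run the model; contiguous() makes the raw pointer below safe to walk linearly
    at::Tensor output = module.forward(in_batch).toTensor().contiguous();
    // Copy the [1, N] float output into a std::vector
    auto float_out = output.data_ptr<float>();
    return std::vector<float>(float_out, float_out + output.size(1));
  } catch (const c10::Error &e) {
    std::cerr << e.msg() << std::endl;
  }
  // An empty vector signals failure to the caller
  return std::vector<float>();
}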
Then use the node-addon-api library to convert the result into V8 (JavaScript) values and expose a moduleForward function for the Node.js side to call:
// node_script.cpp
#include "node_script.h"

Napi::Array ModuleForward(const Napi::CallbackInfo& info) {
  Napi::Env env = info.Env();
  Napi::Array result = Napi::Array::New(env);
  // Arguments: the model path, then a JS array of numbers
  Napi::String pathname = info[0].ToString();
  Napi::Array input = info[1].As<Napi::Array>();
  // Copy the JS array into a vector<float>
  vector<float> in;
  for (size_t i = 0; i < input.Length(); i++)
    in.push_back(input.Get(i).ToNumber());
  vector<float> r = module_forward(pathname.Utf8Value().c_str(), in);
  // Copy the result back into a JS array
  for (size_t i = 0; i < r.size(); i++)
    result.Set(i, Napi::Number::New(env, r[i]));
  return result;
}

// Register moduleForward on the module's exports
Napi::Object Init(Napi::Env env, Napi::Object exports) {
  exports.Set("moduleForward", Napi::Function::New(env, ModuleForward));
  return exports;
}

NODE_API_MODULE(torch_script, Init)
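The binding above trusts its arguments: it converts info[0] to a string and treats info[1] as an array without checking either. A minimal sketch of a JavaScript-side guard follows, assuming the addon exposes moduleForward as shown. The names makeForward and fakeNative are hypothetical, and the stub only stands in for the compiled addon so the snippet runs on its own.

```javascript
// Hypothetical wrapper; `native` stands in for require("./build/Release/torch_script").
function makeForward(native) {
  return function forward(modelPath, input) {
    if (typeof modelPath !== "string" || modelPath.length === 0) {
      throw new TypeError("modelPath must be a non-empty string");
    }
    if (!Array.isArray(input) || !input.every(Number.isFinite)) {
      throw new TypeError("input must be an array of finite numbers");
    }
    const output = native.moduleForward(modelPath, input);
    // The C++ side returns an empty vector when libtorch throws.
    if (output.length === 0) {
      throw new Error("model returned no output; check stderr for the libtorch error");
    }
    return output;
  };
}

// Stub used only to exercise the wrapper without the compiled addon.
const fakeNative = { moduleForward: (_path, input) => input.map((x) => x * 2) };
const forward = makeForward(fakeNative);
console.log(forward("./model.pt", [1, 2, 3]));
```

Checking on the JavaScript side keeps the C++ binding short and turns bad input into an ordinary exception rather than a crash inside native code.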
Hitting the pitfalls
Building with node-gyp
node-gyp: https://github.com/nodejs/node-gyp
Following the original plan, compile straight to a .node file with node-gyp. The corresponding binding.gyp is easy to write:
{
  "targets": [
    {
      "target_name": "torch_script",
      "include_dirs": [
        "<!@(node -p \"require('node-addon-api').include\")",
        "libtorch/include"
      ],
      # Add the dependency below; it is chosen according to the current Node.js version
      "dependencies": [
        "<!(node -p \"require('node-addon-api').gyp\")"
      ],
      "cflags!": ["-fno-exceptions"],
      "cflags_cc!": ["-fno-exceptions"],
      "defines": [
        "NAPI_DISABLE_CPP_EXCEPTIONS"  # remember to add this macro
      ],
      "sources": [
        "torch_script.cpp",
        "node_script.cpp",
      ]
    }
  ]
}
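Then run node-gyp configure && node-gyp build, and the first class of errors appears. The cause can be worked out: libtorch uses C++ exceptions, while node-gyp turns exceptions off by default. A careful reader may notice that the binding.gyp above already contains "cflags!": ["-fno-exceptions"], which removes the no-exceptions flag. In practice the result also depends on the C++ toolchain on your machine (on macOS, node-gyp takes its compiler options from xcode_settings rather than cflags), so exception support has to be switched on explicitly in binding.gyp.
Modify binding.gyp: add a conditions field that, when OS == "mac", sets xcode_settings directly and enables GCC_ENABLE_CPP_EXCEPTIONS: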
{
  "targets": [
    {
      ... ,
+     "cflags": ["-fexceptions"],
+     "cflags_cc": ["-fexceptions"],
+     "conditions": [
+       ['OS=="mac"', {  # enable C++ exceptions directly in the Xcode settings
+         'xcode_settings': {
+           'GCC_ENABLE_CPP_EXCEPTIONS': 'YES'
+         }
+       }]
+     ],
      "defines": [
-       "NAPI_DISABLE_CPP_EXCEPTIONS"
      ],
      ...,
    }
  ]
}
The next error is of the same kind as the first: libtorch uses dynamic_cast and typeid, which need the -frtti option in the C++ compiler.
Modify binding.gyp to add -frtti at compile time and to enable GCC_ENABLE_CPP_RTTI in xcode_settings:
{
  "targets": [
    ...,
+   "cflags!": ["-fno-exceptions", "-fno-rtti"],
+   "cflags_cc!": ["-fno-exceptions", "-fno-rtti"],
+   "cflags": ["-fexceptions", "-frtti"],
+   "cflags_cc": ["-fexceptions", "-frtti"],
    "conditions": [
      ['OS=="mac"', {  # enable C++ exceptions directly in the Xcode settings
        'xcode_settings': {
          'GCC_ENABLE_CPP_EXCEPTIONS': 'YES',
+         'GCC_ENABLE_CPP_RTTI': 'YES'
        }
      }]
    ],
    ...,
  ]
}
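After that it compiles.
Seeing no errors was a relief, so without thinking twice I wrote a small JS file to test it. The code is simple:
// Load the .node addon
const torchScript = require("./build/Release/torch_script");
// Run the model on a dummy input of 256 ones
const t = torchScript.moduleForward("./resnet24_se.pt", Array.from({length: 256}, v => 1));
console.log(t);
Of course it failed; nothing succeeds that easily. It was predictable, though: libtorch's dynamic and static libraries had not been linked in. The problem now becomes how to load static and dynamic libraries from a Node.js addon.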
First try binding.gyp, using its libraries and link_settings settings:
{
  'targets': [
    {
      ...,
+     'libraries': [
+       '<!@(ls /Users/dengpengfei/Documents/Project/C++/Node-addon-libtorch/libtorch/lib)'
+     ],
+     'link_settings': {
+       'library_dirs': [
+         '/Users/dengpengfei/Documents/Project/C++/Node-addon-libtorch/libtorch/lib'
+       ]
+     },
    }
    ...,
  ]
}
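The result was disappointing: the dynamic libraries apparently still were not being loaded. One possible contributor is that ls returns bare file names rather than full paths or -l flags.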
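When an addon builds but fails to load, the error thrown by require() often names the library that could not be found, so it is worth printing before trying further build changes. A minimal sketch, assuming the addon sits at build/Release/torch_script.node as in the test file above; tryLoadAddon is a hypothetical helper, not part of Node.js or node-gyp.

```javascript
const path = require("path");

// Try to load a native addon and report why loading failed instead of crashing.
function tryLoadAddon(addonPath) {
  try {
    return { addon: require(addonPath), error: null };
  } catch (err) {
    // Keep the code and message; for a missing shared library the message
    // usually comes from the system's dynamic loader.
    return { addon: null, error: { code: err.code, message: err.message } };
  }
}

const result = tryLoadAddon(path.resolve("build/Release/torch_script.node"));
if (result.error) {
  console.error(`addon failed to load (${result.error.code}): ${result.error.message}`);
}
```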
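Building directly with CMake
CMake website: https://cmake.org/
Time for another approach: build it by hand. A .node file is just a dynamic library, so why not build one with CMake myself? I then worked out how to write CMakeLists.txt, which raised plenty of problems of its own: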
# CMakeLists.txt
cmake_minimum_required(VERSION 3.19)
project(NodeScript)
# libtorch
set(CMAKE_PREFIX_PATH /Users/dengpengfei/Documents/Project/JavaScript/sei-app/lib/libtorch)
# Use C++14, because libtorch is written in C++14
set(CMAKE_CXX_STANDARD 14)
add_compile_options(-std=c++14)
# Header search paths (absolute): Node.js, node-addon-api, libtorch
include_directories(/Users/dengpengfei/.node-gyp/12.16.2/include/node)
include_directories(/Users/dengpengfei/Documents/Project/C++/Node-addon-libtorch/node_modules/node-addon-api)
include_directories(/Users/dengpengfei/Documents/Project/C++/Node-addon-libtorch/libtorch/include)
# Library search path for libtorch
link_directories(/Users/dengpengfei/Documents/Project/C++/Node-addon-libtorch/libtorch/lib)
file(GLOB SOURCE_FILES "./*.cpp" "./*.h")
find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")
# Add the build target, a shared library
add_library(${PROJECT_NAME} SHARED ${SOURCE_FILES})
# Use C++14 for this target
set_property(TARGET ${PROJECT_NAME} PROPERTY CXX_STANDARD 14)
set_property(TARGET ${PROJECT_NAME} PROPERTY LINKER_LANGUAGE CXX)
# Include paths for the target
target_include_directories(${PROJECT_NAME} PRIVATE /Users/dengpengfei/.node-gyp/12.16.2/include/node)
target_include_directories(${PROJECT_NAME} PRIVATE /Users/dengpengfei/Documents/Project/C++/Node-addon-libtorch/node_modules/node-addon-api)
# Output file name: no prefix, .node suffix
set_target_properties(${PROJECT_NAME} PROPERTIES PREFIX "" SUFFIX ".node")
# Link against the libtorch libraries
target_link_libraries(${PROJECT_NAME} ${TORCH_LIBRARIES})
add_definitions(-Wall -O2 -fexceptions)
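Then build with mkdir build && cd build && cmake .. && cmake --build . and, naturally, it fails as well, this time with undefined symbols. Thinking it through: the addon refers to symbols that Node.js itself provides, and since the compiled library runs inside the Node.js process those symbols are only available at load time, so the linker has to be told to skip this error.
So add set(CMAKE_SHARED_LINKER_FLAGS "-undefined dynamic_lookup") to CMakeLists.txt. CMAKE_SHARED_LINKER_FLAGS holds extra linker flags used when building shared libraries; setting it to -undefined dynamic_lookup (a macOS linker option) skips the errors for unresolved symbols, such as the undefined symbols above, and leaves them to be resolved when the library is loaded.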
...
find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")
+ set(CMAKE_SHARED_LINKER_FLAGS "-undefined dynamic_lookup")
add_library(${PROJECT_NAME} SHARED ${SOURCE_FILES})
set_property(TARGET ${PROJECT_NAME} PROPERTY CXX_STANDARD 14)
...
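After that it builds. If errors persist, clear the CMake cache by deleting the CMakeFiles directory, cmake_install.cmake, CMakeCache.txt, and similar files from the build directory.
Run test.js and, at long last, it works.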
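The cache cleanup described above is easy to forget, so it can be scripted. A minimal sketch, assuming the cache lives in a build directory as in the commands above; cleanCMakeCache is a hypothetical helper, and the demo runs against a throwaway temporary directory rather than a real build tree.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Entries CMake writes into the build directory; all are regenerated on the next configure.
const CACHE_ENTRIES = ["CMakeCache.txt", "CMakeFiles", "cmake_install.cmake"];

// Remove the cache entries that exist and return the names that were removed.
function cleanCMakeCache(buildDir) {
  const removed = [];
  for (const name of CACHE_ENTRIES) {
    const target = path.join(buildDir, name);
    if (fs.existsSync(target)) {
      fs.rmSync(target, { recursive: true, force: true });
      removed.push(name);
    }
  }
  return removed;
}

// Demo on a temporary directory that imitates a build tree.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "cmake-cache-demo-"));
fs.writeFileSync(path.join(demoDir, "CMakeCache.txt"), "");
fs.mkdirSync(path.join(demoDir, "CMakeFiles"));
const removedEntries = cleanCMakeCache(demoDir);
console.log(removedEntries);
```

In a real project the call would be cleanCMakeCache("build") before re-running cmake.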
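Building with cmake-js
cmake-js: https://github.com/cmake-js/cmake-js
cmake-js is also a build tool for Node.js addons. It works much like node-gyp, except that it is based on the CMake build system.
Building with plain CMake can run into portability problems: what builds on my Mac will not necessarily build on Windows. cmake-js helps with this, so CMakeLists.txt can be rewritten as follows: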
cmake_minimum_required(VERSION 3.19)
project(NodeScript)
set(CMAKE_PREFIX_PATH /Users/dengpengfei/Documents/Project/JavaScript/sei-app/lib/libtorch)
set(CMAKE_CXX_STANDARD 14)
add_compile_options(-std=c++14)
# Header paths: the idea is the same either way; one uses the Node.js headers installed under .node-gyp, the other those installed under .cmake-js
+ include_directories(${CMAKE_JS_INC})
- include_directories(/Users/dengpengfei/.node-gyp/12.16.2/include/node)
include_directories(/Users/dengpengfei/Documents/Project/C++/Node-addon-libtorch/node_modules/node-addon-api)
include_directories(/Users/dengpengfei/Documents/Project/C++/Node-addon-libtorch/libtorch/include)
link_directories(/Users/dengpengfei/Documents/Project/C++/Node-addon-libtorch/libtorch/lib)
file(GLOB SOURCE_FILES "./*.cpp" "./*.h")
find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")
# This line is optional; cmake-js adds the flag automatically
- set(CMAKE_SHARED_LINKER_FLAGS "-undefined dynamic_lookup")
add_library(${PROJECT_NAME} SHARED ${SOURCE_FILES})
set_property(TARGET ${PROJECT_NAME} PROPERTY CXX_STANDARD 14)
set_property(TARGET ${PROJECT_NAME} PROPERTY LINKER_LANGUAGE CXX)
# Same idea here
- target_include_directories(${PROJECT_NAME} PRIVATE /Users/dengpengfei/.node-gyp/12.16.2/include/node)
+ target_include_directories(${PROJECT_NAME} PRIVATE ${CMAKE_JS_INC})
target_include_directories(${PROJECT_NAME} PRIVATE "/Users/dengpengfei/Documents/Project/C++/Node-addon-libtorch/node_modules/node-addon-api")
set_target_properties(${PROJECT_NAME} PROPERTIES PREFIX "" SUFFIX ".node")
# Extra libraries to link; on macOS this should be an empty string
+ target_link_libraries(${PROJECT_NAME} ${CMAKE_JS_LIB})
target_link_libraries(${PROJECT_NAME} ${TORCH_LIBRARIES})
add_definitions(-Wall -O2 -fexceptions)
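Note the CMAKE_JS_INC and CMAKE_JS_LIB variables used above. Their definitions can be found in the cmake-js source: on Windows a few extra things are added, and cmake-js also passes some additional options when it invokes CMake, so it covers more cases than using CMake directly.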
// lib/cMake.js getConfigureCommand()
CMake.prototype.getConfigureCommand = async function () {
  // Create command:
  let command = [this.path, this.projectRoot, "--no-warn-unused-cli"];
  let D = [];
  // CMake.js watermark
  D.push({"CMAKE_JS_VERSION": environment.moduleVersion});
  // Build configuration:
  D.push({"CMAKE_BUILD_TYPE": this.config});
  if (environment.isWin) D.push({"CMAKE_RUNTIME_OUTPUT_DIRECTORY": this.workDir});
  else D.push({"CMAKE_LIBRARY_OUTPUT_DIRECTORY": this.buildDir});
  // Include and lib:
  let incPaths;
  if (this.dist.headerOnly) {
    incPaths = [path.join(this.dist.internalPath, "/include/node")];
  }
  else {
    let nodeH = path.join(this.dist.internalPath, "/src");
    let v8H = path.join(this.dist.internalPath, "/deps/v8/include");
    let uvH = path.join(this.dist.internalPath, "/deps/uv/include");
    incPaths = [nodeH, v8H, uvH];
  }
  // NAN
  let nanH = await locateNAN(this.projectRoot);
  if (nanH) incPaths.push(nanH);
  // Includes:
  D.push({"CMAKE_JS_INC": incPaths.join(";")});
  // Sources:
  let srcPaths = [];
  if (environment.isWin) {
    let delayHook = path.normalize(path.join(__dirname, 'cpp', 'win_delay_load_hook.cc'));
    srcPaths.push(delayHook.replace(/\\/gm, '/'));
  }
  D.push({"CMAKE_JS_SRC": srcPaths.join(";")}); // empty on non-Windows systems
  // Runtime:
  D.push({"NODE_RUNTIME": this.targetOptions.runtime});
  D.push({"NODE_RUNTIMEVERSION": this.targetOptions.runtimeVersion});
  D.push({"NODE_ARCH": this.targetOptions.arch});
  if (environment.isWin) {
    // Win
    let libs = this.dist.winLibs;
    if (libs.length) D.push({"CMAKE_JS_LIB": libs.join(";")});
  }
  // Custom options
  for (let k of _.keys(this.cMakeOptions)) D.push({[k]: this.cMakeOptions[k]});
  // Toolset:
  await this.toolset.initialize(false);
  if (this.toolset.generator) command.push("-G", this.toolset.generator);
  if (this.toolset.platform) command.push("-A", this.toolset.platform);
  if (this.toolset.toolset) command.push("-T", this.toolset.toolset);
  if (this.toolset.cppCompilerPath) D.push({"CMAKE_CXX_COMPILER": this.toolset.cppCompilerPath});
  if (this.toolset.cCompilerPath) D.push({"CMAKE_C_COMPILER": this.toolset.cCompilerPath});
  if (this.toolset.compilerFlags.length) D.push({"CMAKE_CXX_FLAGS": this.toolset.compilerFlags.join(" ")});
  if (this.toolset.linkerFlags.length) D.push({"CMAKE_SHARED_LINKER_FLAGS": this.toolset.linkerFlags.join(" ")});
  if (this.toolset.makePath) D.push({"CMAKE_MAKE_PROGRAM": this.toolset.makePath});
  // Load NPM config
  // ... omitted
  command = command.concat(D.map(function (p) {
    return "-D" + _.keys(p)[0] + "=" + _.values(p)[0];
  }));
  return command;
};
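The last step of the function above turns each collected definition into a -D flag on the CMake command line, which is how CMAKE_JS_INC and CMAKE_JS_LIB reach CMakeLists.txt. A stripped-down sketch of just that step, assuming a plain object of definitions in place of cmake-js's internal state. The name buildConfigureArgs and the paths are hypothetical, and this is not the cmake-js API.

```javascript
// Simplified stand-in for the definition handling in getConfigureCommand().
function buildConfigureArgs(cmakePath, projectRoot, definitions) {
  const args = [cmakePath, projectRoot, "--no-warn-unused-cli"];
  for (const [key, value] of Object.entries(definitions)) {
    // Lists such as include paths are joined with ";" (CMake's list separator).
    const text = Array.isArray(value) ? value.join(";") : String(value);
    args.push(`-D${key}=${text}`);
  }
  return args;
}

// Made-up values for illustration only.
const args = buildConfigureArgs("cmake", "/path/to/project", {
  CMAKE_BUILD_TYPE: "Release",
  CMAKE_JS_INC: ["/headers/include/node", "/project/node_modules/nan"],
});
console.log(args.join(" "));
```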
Then build directly with cmake-js compile. No errors, which felt great.
Finally, run test.js as a check; it should work without problems.
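Other problems
Beyond the problems above I ran into all sorts of odd ones. I also tried linking the .o files produced by node-gyp together with the libtorch libraries using gcc, but that did not seem to work. Another odd one was an error saying the libtorch library had no .so files: the libtorch directory only contained .dylib and .a files, and indeed no .so. I searched online for a long time; some said the libtorch path in the CMake file must not be wrapped in double quotes, others that the CMake files shipped with libtorch had a bug. After much fiddling I downloaded the latest libtorch, found that it did contain .so files, copied them over, and the error went away. But was that really the problem? Clearly not. CMake simply had a stale cache; clearing the cache and rebuilding makes the error disappear.
A few more points. The PyTorch model has to be saved with TorchScript so that it can be called from C++. Models are normally trained on a GPU but sometimes need to run inference on a CPU, so a CPU version of the model has to be saved. And when the model contains branches, torch.jit.trace cannot be used, because tracing records only the path taken for the example input; use torch.jit.script instead.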
train_loader, validate_loader, test_loader = loader("./preprocess_dataset/dataset-mixin.mat", batch_size=BATCH_SIZE)
model = attention_resnet(num_classes=4)
start = time.time()
losses, accuracy, confusion = train(model, train_loader, validate_loader, epoch=EPOCH)
draw_table("Train Time", sec2min(time.time() - start))
draw_table("Validate Accuracy", format(accuracy, ".4f"))
model = model.cpu()  # move the model to the CPU before saving
script_module = torch.jit.script(model)  # torch.jit.trace cannot be used here
script_module.save("model_saved/resnet24_cbam_k128_s_100_un_shift.pt")
APP Demo
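The finished app works roughly like this: it reads signal samples, classifies them with the trained model, and displays the classification accuracy.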
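The classification step can be sketched as follows, assuming each call to moduleForward returns one score per class and the predicted class is the one with the highest score. The helpers argmax and accuracy are hypothetical, and the scores and labels are made up for a 4-class model.

```javascript
// Index of the largest score; assumes the model output is one score per class.
function argmax(scores) {
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return best;
}

// Fraction of samples whose predicted class matches the label.
function accuracy(outputs, labels) {
  if (outputs.length !== labels.length || outputs.length === 0) {
    throw new RangeError("outputs and labels must be non-empty and the same length");
  }
  let correct = 0;
  outputs.forEach((scores, i) => {
    if (argmax(scores) === labels[i]) correct += 1;
  });
  return correct / outputs.length;
}

// Made-up scores standing in for moduleForward() results.
const outputs = [
  [0.1, 2.3, -1.0, 0.4],
  [1.5, 0.2, 0.3, 0.1],
  [0.0, 0.1, 0.2, 3.0],
  [0.9, 0.8, 0.7, 0.6],
];
const labels = [1, 0, 3, 2];
console.log(accuracy(outputs, labels)); // 0.75
```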
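Summary
Ten minutes writing code, ten hours fighting linking and compilation. Logic errors are manageable, but link and build errors are a real headache, and they exposed how weak my C++ fundamentals are. Still, it was a learning process, and I gradually came to understand CMake and the rest. The errors described above are only a small part of what I hit. When you hit an error, think it through yourself first; if that does not help, search online. Plenty of what you find online is misleading, though, so in the end you still have to work it out yourself.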
References
GitHub repository: https://github.com/sundial-dreams/nodejs_libtorch
libtorch official documentation: https://pytorch.org/cppdocs/frontend.html
node-gyp: https://github.com/nodejs/node-gyp
CMake documentation: https://cmake.org/cmake/help/v3.21/
cmake-js: https://github.com/cmake-js/cmake-js