1. Environment Setup
1.1 TensorRT Installation
Official manual: Installation Guide :: NVIDIA Deep Learning TensorRT Documentation
We use the tar installation.
- Install the following dependencies, if not already present:
- CUDA 10.2, 11.0 update 1, 11.1 update 1, 11.2 update 2, 11.3 update 1, or 11.4 update 2
- cuDNN 8.2.1
- Python 3 (Optional)
- Download the TensorRT tar file that matches the CPU architecture and CUDA version you are using.
- Choose where you want to install TensorRT. This tar file will install everything into a subdirectory called TensorRT-8.x.x.x.
- Unpack the tar file.
version="8.x.x.x"
arch=$(uname -m)
cuda="cuda-x.x"
cudnn="cudnn8.x"
tar xzvf TensorRT-${version}.Linux.${arch}-gnu.${cuda}.${cudnn}.tar.gz
Where:
- 8.x.x.x is your TensorRT version
- cuda-x.x is CUDA version 10.2 or 11.4
- cudnn8.x is cuDNN version 8.2
ls TensorRT-${version}
bin data doc graphsurgeon include lib onnx_graphsurgeon python samples targets TensorRT-Release-Notes.pdf uff
- Add the absolute path to the TensorRT lib directory to the environment variable LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<TensorRT-${version}/lib>
- Install the Python TensorRT wheel file.
cd TensorRT-${version}/python
python3 -m pip install tensorrt-*-cp3x-none-linux_x86_64.whl
- Install the Python UFF wheel file. This is only required if you plan to use TensorRT with TensorFlow.
cd TensorRT-${version}/uff
python3 -m pip install uff-0.6.9-py2.py3-none-any.whl
Check the installation with:
which convert-to-uff
- Install the Python graphsurgeon wheel file.
cd TensorRT-${version}/graphsurgeon
python3 -m pip install graphsurgeon-0.4.5-py2.py3-none-any.whl
- Install the Python onnx-graphsurgeon wheel file.
cd TensorRT-${version}/onnx_graphsurgeon
python3 -m pip install onnx_graphsurgeon-0.3.12-py2.py3-none-any.whl
- Verify the installation:
- Ensure that the installed files are located in the correct directories. For example, run the tree -d command to check whether all supported installed files are in place in the lib, include, data, etc… directories.
- Build and run one of the shipped samples, for example, sampleMNIST in the installed directory. You should be able to compile and execute the sample without additional settings. For more information, see the “Hello World” For TensorRT (sampleMNIST).
- The Python samples are in the samples/python directory. (A minimal sanity-check program is also sketched below.)
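Beyond the shipped samples, the following minimal program can confirm that the compiler and linker can see the TensorRT headers and libnvinfer. This is just a sketch for verification, not one of the official steps; the file name is arbitrary, and it can be built with something like g++ trt_check.cpp -I TensorRT-${version}/include -L TensorRT-${version}/lib -lnvinfer -o trt_check (adjust the paths to your install location).
// trt_check.cpp -- sanity check that the TensorRT headers and libnvinfer are visible
#include <cstdio>
#include <NvInfer.h> // transitively defines NV_TENSORRT_MAJOR/MINOR/PATCH

int main() {
    // Compile-time version from the headers
    std::printf("Header version: %d.%d.%d\n",
                NV_TENSORRT_MAJOR, NV_TENSORRT_MINOR, NV_TENSORRT_PATCH);
    // Runtime version of the linked library, encoded as one integer
    // (e.g. 8201 for TensorRT 8.2.1)
    std::printf("Library version: %d\n", (int)getInferLibVersion());
    return 0;
}
If both versions print and match your installed release, the headers, library path, and LD_LIBRARY_PATH export above are all in order.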
1.2 OpenCV Installation
2. C++ FastReID-TensorRT
2.1 Model Conversion
FastReID provides a script to convert from .pth to .wts:
python projects/FastRT/tools/gen_wts.py --config-file='config/you/use/in/fastreid/xxx.yml' \
--verify --show_model --wts_path='outputs/trt_model_file/xxx.wts' \
MODEL.WEIGHTS '/path/to/checkpoint_file/model_best.pth' MODEL.DEVICE "cuda:0"
Then move the .wts file into the FastRT directory.
2.2 Modify the Config File
Modify the config according to the backbone your model uses (sbs_R50-ibn, kd-r34-r101_ibn, etc.), following the project's documentation:
https://github.com/JDAI-CV/fast-reid/tree/master/projects/FastRT#ConfigSection
Below is the example for kd-r34-r101_ibn:
It configures the model paths, batch size, input size, output feature dimension, device ID, backbone, head, and so on.
static const std::string WEIGHTS_PATH = "../kd_r34_distill.wts";
static const std::string ENGINE_PATH = "./kd_r34_distill.engine";
static const int MAX_BATCH_SIZE = 4;
static const int INPUT_H = 384;
static const int INPUT_W = 128;
static const int OUTPUT_SIZE = 512;
static const int DEVICE_ID = 0;
static const FastreidBackboneType BACKBONE = FastreidBackboneType::r34_distill;
static const FastreidHeadType HEAD = FastreidHeadType::EmbeddingHead;
static const FastreidPoolingType HEAD_POOLING = FastreidPoolingType::gempoolP;
static const int LAST_STRIDE = 1;
static const bool WITH_IBNA = false;
static const bool WITH_NL = false;
static const int EMBEDDING_DIM = 0;
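For orientation, these constants also determine the host buffer sizes the demo works with: the engine accepts up to MAX_BATCH_SIZE images and produces one OUTPUT_SIZE-dimensional feature per image. A sketch, assuming the usual fp32 NCHW input layout (the demo's actual buffer handling lives in its own source):
#include <vector>

// Input: NCHW float32, up to MAX_BATCH_SIZE images of 3 x INPUT_H x INPUT_W
// (assumes the standard fp32 NCHW layout used by most TensorRT demos)
std::vector<float> input(MAX_BATCH_SIZE * 3 * INPUT_H * INPUT_W);
// Output: one OUTPUT_SIZE-dim feature vector per image (4 x 512 here)
std::vector<float> output(MAX_BATCH_SIZE * OUTPUT_SIZE);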
2.3 Build Third-Party Libraries
Mainly the cnpy library, for reading and writing NumPy files:
cd third_party/cnpy
cmake -DCMAKE_INSTALL_PREFIX=../../libs/cnpy -DENABLE_STATIC=OFF . && make -j4 && make install
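For reference, a minimal sketch of cnpy's read/write API (the build enables it via -DUSE_CNUMPY=ON below; the file and array names here are made up):
#include <cstdio>
#include <vector>
#include "cnpy.h"

int main() {
    // Save a 2x512 float32 array as feats.npy (the shape is passed explicitly)
    std::vector<float> feats(2 * 512, 0.5f);
    cnpy::npy_save("feats.npy", feats.data(), {2, 512}, "w");

    // Load it back; NpyArray carries the raw buffer plus its shape
    cnpy::NpyArray arr = cnpy::npy_load("feats.npy");
    const float* data = arr.data<float>();
    std::printf("loaded %zu x %zu, first value %f\n",
                arr.shape[0], arr.shape[1], data[0]);
    return 0;
}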
2.4 Build the fastrt Executable
mkdir build
cd build
cmake -DBUILD_FASTRT_ENGINE=ON \
-DBUILD_DEMO=ON \
-DUSE_CNUMPY=ON ..
make
If make fails at this step with:
fatal error: NvInfer.h: No such file or directory
#include "NvInfer.h"
^~~~~~~~~~~
compilation terminated.
then looking at the TensorRT library and header configuration in demo/CMakeLists.txt:
include_directories(/usr/include/x86_64-linux-gnu/)
link_directories(/usr/lib/x86_64-linux-gnu/)
we can see that the TensorRT libraries and headers were not added to the system paths. Either copy them in:
# run from inside the TensorRT directory
sudo cp -r ./lib/* /usr/lib
sudo cp -r ./include/* /usr/include
or add the absolute paths to the TensorRT headers and libraries directly in CMakeLists.txt:
include_directories(/.../TensorRT-7.2.3.4/include/)
link_directories(/.../TensorRT-7.2.3.4/lib/)
This resolves the problem.
2.5 Run
./demo/fastrt -s // serialize the model and generate the .engine file
./demo/fastrt -d // deserialize the .engine file and run inference
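Under the hood, -d maps onto the standard TensorRT deserialization flow. A minimal sketch, assuming the TensorRT 8 API (the logger class and the engine file name are illustrative, not the demo's actual code):
#include <cstdio>
#include <fstream>
#include <vector>
#include <NvInfer.h>

// Minimal logger required by the TensorRT runtime
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::printf("[TRT] %s\n", msg);
    }
} gLogger;

int main() {
    // Read the serialized engine produced by `./demo/fastrt -s`
    std::ifstream file("kd_r34_distill.engine", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                            std::istreambuf_iterator<char>());

    // Deserialize; this is the step that fails if the engine was built
    // on a GPU with a different compute capability
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(blob.data(), blob.size());
    // ... create an execution context and run inference ...
    return engine != nullptr ? 0 : 1;
}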
If deserializing the engine file fails with:
[E] [TRT] INVALID_CONFIG: The engine plan file is generated on an incompatible device, expecting compute 7.5 got compute 8.6, please rebuild.
this means the engine file was generated on a GPU model different from the one deserializing it: the compute capabilities do not match (here the engine expects compute 7.5 but the current GPU is compute 8.6). TensorRT engines are not portable across compute capabilities, so rebuild the engine with ./demo/fastrt -s on the same GPU model that will run it.
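To check which compute capability a GPU has (and so confirm whether an engine built elsewhere can run on it), you can query the CUDA runtime; a small sketch:
#include <cstdio>
#include <cuda_runtime_api.h>

int main() {
    cudaDeviceProp prop;
    // Query device 0; use DEVICE_ID from the config above if it differs
    cudaGetDeviceProperties(&prop, 0);
    // e.g. prints "compute 8.6" on an RTX 30-series GPU
    std::printf("%s: compute %d.%d\n", prop.name, prop.major, prop.minor);
    return 0;
}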