- PyTorch to ONNX
```python
import torch

torch_model = torch.load("save.pt")        # load the PyTorch model
batch_size = 1                             # batch size
input_shape = (3, 244, 244)                # input shape

# set the model to inference mode
torch_model.eval()

x = torch.randn(batch_size, *input_shape)  # dummy input tensor
export_onnx_file = "test.onnx"             # target ONNX file name
torch.onnx.export(torch_model,
                  x,
                  export_onnx_file,
                  opset_version=10,
                  do_constant_folding=True,   # whether to run constant-folding optimization
                  input_names=["input"],      # input name
                  output_names=["output"],    # output name
                  dynamic_axes={"input": {0: "batch_size"},    # variable batch dimension
                                "output": {0: "batch_size"}})
```
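To confirm the export is usable, the ONNX file can be checked structurally and compared numerically against the PyTorch model. A minimal sketch, assuming the onnx and onnxruntime packages are installed and reusing `torch_model` and `x` from the export snippet above:

```python
import numpy as np
import onnx
import onnxruntime as ort
import torch

# Structural check of the exported file.
onnx.checker.check_model(onnx.load("test.onnx"))

# Run the same dummy input through PyTorch and ONNX Runtime and compare.
with torch.no_grad():
    torch_out = torch_model(x).cpu().numpy()

sess = ort.InferenceSession("test.onnx", providers=["CPUExecutionProvider"])
ort_out = sess.run(None, {"input": x.cpu().numpy()})[0]

np.testing.assert_allclose(torch_out, ort_out, rtol=1e-3, atol=1e-5)
print("PyTorch and ONNX Runtime outputs match")
```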
- TensorFlow: pb model to UFF, then to engine
pb model to UFF
First install convert-to-uff:

```
apt install uff-converter-tf
```

Run the following to see its usage:

```
python3 /usr/local/bin/convert-to-uff --help
```

Output:

```
Converts TensorFlow models to Unified Framework Format (UFF).

positional arguments:
  input_file            path to input model (protobuf file of frozen GraphDef)

optional arguments:
  -h, --help            show this help message and exit
  -l, --list-nodes      show list of nodes contained in input file
  -t, --text            write a text version of the output in addition to the binary
  --write_preprocessed  write the preprocessed protobuf in addition to the binary
  -q, --quiet           disable log messages
  -d, --debug           Enables debug mode to provide helpful debugging output
  -o OUTPUT, --output OUTPUT
                        name of output uff file
  -O OUTPUT_NODE, --output-node OUTPUT_NODE
                        name of output nodes of the model
  -I INPUT_NODE, --input-node INPUT_NODE
                        name of a node to replace with an input to the model.
                        Must be specified as: "name,new_name,dtype,dim1,dim2,..."
  -p PREPROCESSOR, --preprocessor PREPROCESSOR
                        the preprocessing file to run before handling the graph.
                        This file must define a `preprocess` function that accepts
                        a GraphSurgeon DynamicGraph as it's input. All transformations
                        should happen in place on the graph, as return values are discarded
```

Conversion:

```
python3 /usr/local/bin/convert-to-uff model.pb -o model.uff -O softmax/Softmax -I input_1,input_1,float32,1,3,224,224
```
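The `-O`/`-I` flags above need the exact node names of the frozen graph. If they are unknown, the converter's own `-l, --list-nodes` option will print them; alternatively, here is a minimal Python sketch (assuming a TF1-style frozen GraphDef saved as model.pb) that lists every node:

```python
import tensorflow as tf  # TF2 works via the tf.compat.v1 GraphDef API

graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile("model.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# Placeholder ops are usually the inputs; the last nodes are usually the outputs.
for node in graph_def.node:
    print(node.op, node.name)
```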
UFF to engine
Run:

```
/usr/src/tensorrt/bin/trtexec --help
```

Output:

```
=== Model Options ===
  --uff=<file>                UFF model
  --onnx=<file>               ONNX model
  --model=<file>              Caffe model (default = no model, random weights used)
  --deploy=<file>             Caffe prototxt file
  --output=<name>[,<name>]*   Output names (it can be specified multiple times);
                              at least one output is required for UFF and Caffe
  --uffInput=<name>,X,Y,Z     Input blob name and its dimensions (X,Y,Z=C,H,W),
                              it can be specified multiple times; at least one is required for UFF models
  --uffNHWC                   Set if inputs are in the NHWC layout instead of NCHW
                              (use X,Y,Z=H,W,C order in --uffInput)

=== Build Options ===
  --maxBatch                  Set max batch size and build an implicit batch engine (default = 1)
  --explicitBatch             Use explicit batch sizes when building the engine (default = implicit)
  --minShapes=spec            Build with dynamic shapes using a profile with the min shapes provided
  --optShapes=spec            Build with dynamic shapes using a profile with the opt shapes provided
  --maxShapes=spec            Build with dynamic shapes using a profile with the max shapes provided
                              Note: if any of min/max/opt is missing, the profile will be completed using
                              the shapes provided and assuming that opt will be equal to max unless they
                              are both specified; partially specified shapes are applied starting from the
                              batch size; dynamic shapes imply explicit batch
                              input names can be wrapped with single quotes (ex: 'Input:0')
                              Input shapes spec ::= Ishp[","spec]
                                           Ishp ::= name":"shape
                                          shape ::= N[["x"N]*"*"]
  --inputIOFormats=spec       Type and formats of the input tensors (default = all inputs in fp32:chw)
  --outputIOFormats=spec      Type and formats of the output tensors (default = all outputs in fp32:chw)
                              IO Formats: spec ::= IOfmt[","spec]
                                         IOfmt ::= type:fmt
                                          type ::= "fp32"|"fp16"|"int32"|"int8"
                                           fmt ::= ("chw"|"chw2"|"chw4"|"hwc8"|"chw16"|"chw32")["+"fmt]
  --workspace=N               Set workspace size in megabytes (default = 16)
  --minTiming=M               Set the minimum number of iterations used in kernel selection (default = 1)
  --avgTiming=M               Set the number of times averaged in each iteration for kernel selection (default = 8)
  --fp16                      Enable fp16 algorithms, in addition to fp32 (default = disabled)
  --int8                      Enable int8 algorithms, in addition to fp32 (default = disabled)
  --calib=<file>              Read INT8 calibration cache file
  --safe                      Only test the functionality available in safety restricted flows
  --saveEngine=<file>         Save the serialized engine
  --loadEngine=<file>         Load a serialized engine

=== Inference Options ===
  --batch=N                   Set batch size for implicit batch engines (default = 1)
  --shapes=spec               Set input shapes for dynamic shapes inputs.
                              Input names can be wrapped with single quotes (ex: 'Input:0')
                              Input shapes spec ::= Ishp[","spec]
                                           Ishp ::= name":"shape
                                          shape ::= N[["x"N]*"*"]
  --loadInputs=spec           Load input values from files (default = generate random inputs).
                              Input names can be wrapped with single quotes (ex: 'Input:0')
                              Input values spec ::= Ival[","spec]
                                           Ival ::= name":"file
  --iterations=N              Run at least N inference iterations (default = 10)
  --warmUp=N                  Run for N milliseconds to warmup before measuring performance (default = 200)
  --duration=N                Run performance measurements for at least N seconds wallclock time (default = 3)
  --sleepTime=N               Delay inference start with a gap of N milliseconds between launch and compute (default = 0)
  --streams=N                 Instantiate N engines to use concurrently (default = 1)
  --exposeDMA                 Serialize DMA transfers to and from device. (default = disabled)
  --useSpinWait               Actively synchronize on GPU events. This option may decrease synchronization time
                              but increase CPU usage and power (default = disabled)
  --threads                   Enable multithreading to drive engines with independent threads (default = disabled)
  --useCudaGraph              Use cuda graph to capture engine execution and then launch inference (default = disabled)
  --buildOnly                 Skip inference perf measurement (default = disabled)

=== Build and Inference Batch Options ===
                              When using implicit batch, the max batch size of the engine, if not given,
                              is set to the inference batch size;
                              when using explicit batch, if shapes are specified only for inference, they
                              will be used also as min/opt/max in the build profile; if shapes are
                              specified only for the build, the opt shapes will be used also for inference;
                              if both are specified, they must be compatible; and if explicit batch is
                              enabled but neither is specified, the model must provide complete static
                              dimensions, including batch size, for all inputs

=== Reporting Options ===
  --verbose                   Use verbose logging (default = false)
  --avgRuns=N                 Report performance measurements averaged over N consecutive iterations (default = 10)
  --percentile=P              Report performance for the P percentage (0<=P<=100, 0 representing max perf,
                              and 100 representing min perf; (default = 99%)
  --dumpOutput                Print the output tensor(s) of the last inference iteration (default = disabled)
  --dumpProfile               Print profile information per layer (default = disabled)
  --exportTimes=<file>        Write the timing results in a json file (default = disabled)
  --exportOutput=<file>       Write the output tensors to a json file (default = disabled)
  --exportProfile=<file>      Write the profile information per layer in a json file (default = disabled)

=== System Options ===
  --device=N                  Select cuda device N (default = 0)
  --useDLACore=N              Select DLA core N for layers that support DLA (default = none)
  --allowGPUFallback          When DLA is enabled, allow GPU fallback for unsupported layers (default = disabled)
  --plugins                   Plugin library (.so) to load (can be specified multiple times)

=== Help ===
  --help                      Print this message
```

Conversion:

```
/usr/src/tensorrt/bin/trtexec --uff=/home/model/model.uff --uffInput=input_1,1,3,224,224 \
    --output=softmax/Softmax --saveEngine=/home/model/model.engine \
    --outputIOFormats=fp32:chw --buildOnly --useCudaGraph
```
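After `--saveEngine` writes model.engine, the file can be deserialized from Python to confirm it loads and to inspect its bindings. A minimal sketch, assuming the TensorRT Python bindings are installed and using the TensorRT 7.x-style binding API (newer releases move to name-based tensor APIs):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("/home/model/model.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# List every binding to confirm the input/output names, shapes, and dtypes
# that were baked into the engine during the trtexec build.
for i in range(engine.num_bindings):
    print(engine.get_binding_name(i),
          tuple(engine.get_binding_shape(i)),
          engine.get_binding_dtype(i),
          "input" if engine.binding_is_input(i) else "output")
```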
- TensorFlow: pb model to ONNX, then to engine
pb model to ONNX
First install tf2onnx:

```
pip install -U tf2onnx
```

Run the following to see its usage:

```
python3 -m tf2onnx.convert --help
```

Output:

```
usage: convert.py [-h] [--input INPUT] [--graphdef GRAPHDEF]
                  [--saved-model SAVED_MODEL] [--tag TAG]
                  [--signature_def SIGNATURE_DEF]
                  [--concrete_function CONCRETE_FUNCTION]
                  [--checkpoint CHECKPOINT] [--keras KERAS] [--large_model]
                  [--output OUTPUT] [--inputs INPUTS] [--outputs OUTPUTS]
                  [--opset OPSET] [--custom-ops CUSTOM_OPS]
                  [--extra_opset EXTRA_OPSET] [--target {rs4,rs5,rs6,caffe2}]
                  [--continue_on_error] [--verbose] [--debug]
                  [--output_frozen_graph OUTPUT_FROZEN_GRAPH] [--fold_const]
                  [--inputs-as-nchw INPUTS_AS_NCHW]

Convert tensorflow graphs to ONNX.

optional arguments:
  -h, --help            show this help message and exit
  --input INPUT         input from graphdef
  --graphdef GRAPHDEF   input from graphdef
  --saved-model SAVED_MODEL
                        input from saved model
  --tag TAG             tag to use for saved_model
  --signature_def SIGNATURE_DEF
                        signature_def from saved_model to use
  --concrete_function CONCRETE_FUNCTION
                        For TF2.x saved_model, index of func signature in
                        __call__ (--signature_def is ignored)
  --checkpoint CHECKPOINT
                        input from checkpoint
  --keras KERAS         input from keras model
  --large_model         use the large model format (for models > 2GB)
  --output OUTPUT       output model file
  --inputs INPUTS       model input_names
  --outputs OUTPUTS     model output_names
  --opset OPSET         opset version to use for onnx domain
  --custom-ops CUSTOM_OPS
                        list of custom ops
  --extra_opset EXTRA_OPSET
                        extra opset with format like domain:version, e.g. com.microsoft:1
  --target {rs4,rs5,rs6,caffe2}
                        target platform
  --continue_on_error   continue_on_error
  --verbose, -v         verbose output, option is additive
  --debug               debug mode
  --output_frozen_graph OUTPUT_FROZEN_GRAPH
                        output frozen tf graph to file
  --fold_const          Deprecated. Constant folding is always enabled.
  --inputs-as-nchw INPUTS_AS_NCHW
                        transpose inputs as from nhwc to nchw

Usage Examples:

python -m tf2onnx.convert --saved-model saved_model_dir --output model.onnx
python -m tf2onnx.convert --input frozen_graph.pb --inputs X:0 --outputs output:0 --output model.onnx
python -m tf2onnx.convert --checkpoint checkpoint.meta --inputs X:0 --outputs output:0 --output model.onnx

For help and additional information see:
    https://github.com/onnx/tensorflow-onnx

If you run into issues, open an issue here:
    https://github.com/onnx/tensorflow-onnx/issues
```

Convert to ONNX:

```
python3 -m tf2onnx.convert --input model.pb --inputs input_1:0 --outputs softmax/Softmax:0 \
    --inputs-as-nchw input_1:0 --output model.onnx --opset 13
```
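Before building the engine, it is worth confirming the input/output tensor names and shapes that actually ended up in model.onnx, because the trtexec shape flags in the next step must reference exactly those names. A small sketch using the onnx package:

```python
import onnx

model = onnx.load("model.onnx")
onnx.checker.check_model(model)

# Print graph-level inputs and outputs; dim_param marks a dynamic dimension.
for t in model.graph.input:
    dims = [d.dim_param or d.dim_value for d in t.type.tensor_type.shape.dim]
    print("input ", t.name, dims)
for t in model.graph.output:
    dims = [d.dim_param or d.dim_value for d in t.type.tensor_type.shape.dim]
    print("output", t.name, dims)
```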
ONNX to engine
Converting ONNX to an engine (this is the command for dynamic inputs):

```
/usr/src/tensorrt/bin/trtexec --onnx=/home/model/model.onnx --explicitBatch \
    --minShapes=\'input_1:0\':1x3x224x224,\'softmax/Softmax:0\':1x3 \
    --optShapes=\'input_1:0\':1x3x224x224,\'softmax/Softmax:0\':1x3 \
    --maxShapes=\'input_1:0\':1x3x224x224,\'softmax/Softmax:0\':1x3 \
    --shapes=\'input_1:0\':1x3x224x224,\'softmax/Softmax:0\':1x3 \
    --inputIOFormats=fp32:chw --outputIOFormats=fp32:chw \
    --saveEngine=/home/model/model.engine --buildOnly --useCudaGraph
```

For fixed-shape inputs, drop `--explicitBatch` and the `--minShapes`/`--optShapes`/`--maxShapes`/`--shapes` options, and add `--batch=batch_size` instead, where batch_size is the batch size of the TensorFlow model.
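Once model.engine has been built, it can be loaded and run from Python. A minimal sketch, assuming the TensorRT 7/8-era binding API plus PyCUDA, which fixes the dynamic batch dimension to 1 at runtime before allocating buffers (the binding order is assumed to be input first, output second; paths and shapes follow the example above):

```python
import numpy as np
import pycuda.autoinit          # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("/home/model/model.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# The engine has a dynamic batch dimension, so the input shape must be fixed
# for this execution context before buffer sizes can be computed.
input_idx, output_idx = 0, 1    # assumed binding order
context.set_binding_shape(input_idx, (1, 3, 224, 224))

inp = np.random.random((1, 3, 224, 224)).astype(np.float32)
out = np.empty(tuple(context.get_binding_shape(output_idx)),
               dtype=trt.nptype(engine.get_binding_dtype(output_idx)))

d_inp = cuda.mem_alloc(inp.nbytes)
d_out = cuda.mem_alloc(out.nbytes)
stream = cuda.Stream()

# Host -> device, execute, device -> host.
cuda.memcpy_htod_async(d_inp, np.ascontiguousarray(inp), stream)
context.execute_async_v2(bindings=[int(d_inp), int(d_out)],
                         stream_handle=stream.handle)
cuda.memcpy_dtoh_async(out, d_out, stream)
stream.synchronize()
print(out.shape, out[0])
```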