Accelerating EfficientDet Inference on Jetson TX2 (Part 2)

1. References

Accelerating EfficientDet Inference with TensorRT (Part 1)

2. Possible Problems

  • Error at inference time (infer)

    [TensorRT] ERROR: 2: [pluginV2DynamicExtRunner.cpp::execute::115] Error Code 2: Internal Error (Assertion status == kSTATUS_SUCCESS failed.)
    
  • Installing directly with pip install onnx_graphsurgeon fails

    Solution:
    pip install nvidia-pyindex
    pip install onnx-graphsurgeon
    
  • Unsupported-operator warnings while generating the ONNX model

    INFO:EfficientDetGraphSurgeon:Created NMS plugin 'EfficientNMS_TRT' with attributes: {'plugin_version': '1', 'background_class': -1, 'max_output_boxes': 100, 'score_threshold': 0.4000000059604645, 'iou_threshold': 0.5, 'score_activation': True, 'box_coding': 1}
    Warning: Unsupported operator EfficientNMS_TRT. No schema registered for this operator.
    Warning: Unsupported operator EfficientNMS_TRT. No schema registered for this operator.
    Warning: Unsupported operator EfficientNMS_TRT. No schema registered for this operator.
    
  • Installing dm-tree fails
    unable to execute ‘bazel’: No such file or directory #1089
    dm-tree installation method
    dm-tree source code

    Failed to build dm-tree
    Installing collected packages: dm-tree
        Running setup.py install for dm-tree ... error
    
    Building from source, method 1 (did not work):
    
    Key CMake-GUI settings:
    CMAKE_SOURCE_DIR = /media/mydisk/MyDocuments/PyProjects/automl/efficientdet/tree/tree
    CMAKE_BINARY_DIR = /media/mydisk/MyDocuments/PyProjects/automl/efficientdet/build_tree
    
    Output:
    Current build type is: RELEASE
    PROJECT_BINARY_DIR is: /media/mydisk/MyDocuments/PyProjects/automl/efficientdet/build_tree
    pybind11 v2.6.2 
    Configuring done
    Generating done
    
  • Method 1 (building from source) fails

    /usr/bin/ld: cannot open output file tree/_tree.cpython-37m-aarch64-linux-gnu.so: No such file or directory
    collect2: error: ld returned 1 exit status
    CMakeFiles/_tree.dir/build.make:101: recipe for target 'tree/_tree.cpython-37m-aarch64-linux-gnu.so' failed
    make[2]: *** [tree/_tree.cpython-37m-aarch64-linux-gnu.so] Error 1
    CMakeFiles/Makefile2:127: recipe for target 'CMakeFiles/_tree.dir/all' failed
    make[1]: *** [CMakeFiles/_tree.dir/all] Error 2
    Makefile:90: recipe for target 'all' failed
    make: *** [all] Error 2
    
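    Building from source, method 2 (successful):
    First install the dependencies listed in requirements.txt:
    pip install -r /media/mydisk/MyDocuments/PyProjects/automl/efficientdet/tree/docs/requirements.txt
    
    Then build and install from the dm-tree source directory:
    python setup.py install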
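    One detail in the method 1 linker error is worth checking: the output file is named _tree.cpython-37m-aarch64-linux-gnu.so, i.e. the CMake build was configured for Python 3.7, while the virtual environment used later in this post is Python 3.6 (its paths contain venv/lib/python3.6). A Python 3.6 interpreter only looks for its own extension suffixes, so a module built this way would not be picked up even if linking had succeeded. The following is a minimal stdlib-only sketch of such a check; the helper names abi_python_version and matches_current_interpreter are hypothetical and not part of dm-tree.

```python
import re
import sys
import sysconfig


def abi_python_version(filename):
    """Return (major, minor) from a CPython extension file name such as
    "_tree.cpython-37m-aarch64-linux-gnu.so", or None if it carries no tag."""
    match = re.search(r"\.cpython-(\d)(\d+)", filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def matches_current_interpreter(filename):
    """True if the extension's version tag matches the running interpreter."""
    return abi_python_version(filename) == tuple(sys.version_info[:2])


if __name__ == "__main__":
    built = "_tree.cpython-37m-aarch64-linux-gnu.so"  # file name from the ld error
    print("this interpreter expects:", sysconfig.get_config_var("EXT_SUFFIX"))
    print("extension built for Python", abi_python_version(built))
    print("version tag matches this interpreter:", matches_current_interpreter(built))
```

    Run it with the interpreter of the target environment (for example venv/bin/python), so that the comparison is made against the Python version that will actually import the module.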
    
  • Installing tensorflow-model-optimization fails

    Failed to build dm-tree
    Installing collected packages: dm-tree, tensorflow-model-optimization
        Running setup.py install for dm-tree ... error
    
    Once dm-tree is installed, tensorflow-model-optimization installs without problems.
    
  • Installing bazel fails
    Install Tensorflow Object Detection API for

    Solution:
    https://github.com/jkjung-avt/jetson_nano/blob/master/install_bazel-3.1.0.sh
    
    Build and install bazel from source.
    The error complains about a missing binary called bazel.
    You can install it by building it from source.
    
    #!/bin/bash
    #
    # Reference: https://docs.bazel.build/versions/master/install-ubuntu.html#install-with-installer-ubuntu
    
    set -e
    
    folder=${HOME}/src
    mkdir -p $folder
    
    echo "** Install requirements"
    sudo apt-get install -y pkg-config zip g++ zlib1g-dev unzip
    sudo apt-get install -y openjdk-8-jdk
    
    echo "** Download bazel-3.1.0 sources"
    pushd $folder
    if [ ! -f bazel-3.1.0-dist.zip ]; then
      wget https://github.com/bazelbuild/bazel/releases/download/3.1.0/bazel-3.1.0-dist.zip
    fi
    
    echo "** Build and install bazel-3.1.0"
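    Once bazel is built, it also has to be reachable on PATH: the dm-tree error "unable to execute 'bazel'" suggests that its setup.py launches bazel by name. A minimal pre-flight check to run before retrying pip install dm-tree might look like this; missing_tools is a hypothetical helper, and checking for cmake as well is an assumption based on the CMake build attempted above.

```python
import shutil


def missing_tools(tools, path=None):
    """Return the names in `tools` that shutil.which() cannot find.

    `path` defaults to the PATH environment variable, which is where a build
    script that launches "bazel" by name will look for it.
    """
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    # Assumption: the dm-tree build needs bazel (per the error) and possibly cmake.
    missing = missing_tools(["bazel", "cmake"])
    if missing:
        print("not found on PATH:", ", ".join(missing))
    else:
        print("bazel and cmake found")
```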
    
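  • Reusing the environment from the GTX 1650 server: installing it on the Jetson TX2 directly with pip fails, because some packages cannot be installed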

    pip install -r requirements-gpu.txt
    
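    Solution:
    Remove the version numbers of all packages in requirements-gpu.txt, so that pip installs the latest version of each package that is available for the Jetson TX2.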
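    Removing the pins by hand is tedious for a long file. The sketch below writes an unpinned copy instead; the script name and the output file name requirements-tx2.txt are hypothetical, and it deliberately leaves comments, pip options (lines starting with "-") and URL requirements untouched. Keep in mind that unpinned packages resolve to whatever pip finds for the TX2, so the resulting environment will not be identical to the GTX 1650 one.

```python
import re
import sys


def unpin(line):
    """Remove the version specifier from one requirements line.

    Blank lines, comments, pip options (lines starting with "-") and URL
    requirements are returned unchanged; an environment marker (";...") and
    an inline comment (" #...") are kept.
    """
    text = line.rstrip("\n")
    stripped = text.strip()
    if not stripped or stripped.startswith(("#", "-")) or "://" in stripped:
        return text
    body, has_comment, comment = text.partition(" #")
    requirement, has_marker, marker = body.partition(";")
    # The package name (including any [extras]) ends at the first comparison operator.
    name = re.split(r"[=<>!~]", requirement, maxsplit=1)[0].strip()
    if has_marker:
        name += "; " + marker.strip()
    if has_comment:
        name += " #" + comment
    return name


def unpin_file(src, dst):
    """Write an unpinned copy of the requirements file `src` to `dst`."""
    with open(src) as fin:
        lines = [unpin(line) for line in fin]
    with open(dst, "w") as fout:
        fout.write("\n".join(lines) + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python unpin_requirements.py requirements-gpu.txt requirements-tx2.txt
    unpin_file(sys.argv[1], sys.argv[2])
```

    The generated file is then installed as usual, e.g. pip install -r requirements-tx2.txt.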
    
  • Creating a virtualenv fails

    tx2@tx2:/media/mydisk/MyDocuments/PyProjects/automl/efficientdet$ virtualenv -p /usr/bin/python3 venv
    Already using interpreter /usr/bin/python3
    Using base prefix '/usr'
    New python executable in /media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/bin/python3
    Also creating executable in /media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/bin/python
    Installing setuptools, pkg_resources, pip, wheel...
      Complete output from command /media/mydisk/MyDocu...det/venv/bin/python3 - setuptools pkg_resources pip wheel:
      Exception:
    Traceback (most recent call last):
      File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/basecommand.py", line 215, in main
        status = self.run(options, args)
      File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/commands/install.py", line 290, in run
        with self._build_session(options) as session:
      File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/basecommand.py", line 69, in _build_session
        if options.cache_dir else None
      File "/media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/lib/python3.6/posixpath.py", line 80, in join
        a = os.fspath(a)
    TypeError: expected str, bytes or os.PathLike object, not int
    ----------------------------------------
    ...Installing setuptools, pkg_resources, pip, wheel...done.
    Traceback (most recent call last):
      File "/usr/bin/virtualenv", line 11, in <module>
        load_entry_point('virtualenv==15.1.0', 'console_scripts', 'virtualenv')()
      File "/usr/lib/python3/dist-packages/virtualenv.py", line 724, in main
        symlink=options.symlink)
      File "/usr/lib/python3/dist-packages/virtualenv.py", line 992, in create_environment
        download=download,
      File "/usr/lib/python3/dist-packages/virtualenv.py", line 922, in install_wheel
        call_subprocess(cmd, show_stdout=False, extra_env=env, stdin=SCRIPT)
      File "/usr/lib/python3/dist-packages/virtualenv.py", line 817, in call_subprocess
        % (cmd_desc, proc.returncode))
    OSError: Command /media/mydisk/MyDocu...det/venv/bin/python3 - setuptools pkg_resources pip wheel failed with error code 2
    
    Cause:
    Traceback (most recent call last):
      File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/basecommand.py", line 215, in main
        status = self.run(options, args)
      File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/commands/install.py", line 290, in run
        with self._build_session(options) as session:
      File "/usr/share/python-wheels/pip-9.0.1-py2.py3-none-any.whl/pip/basecommand.py", line 69, in _build_session
    
    virtualenv -p /usr/bin/python3 venv
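    python3 and pip do not match: the pip that virtualenv picks up when creating the environment is version 9.0.1 (the wheel under /usr/share/python-wheels).
    
    Solution:
    Let PyCharm create the virtualenv automatically.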
    
  • Building the FP32 engine succeeds, but building the FP16 engine fails

    [TensorRT] ERROR: 2: [pluginV2DynamicExtRunner.cpp::execute::115] Error Code 2: Internal Error (Assertion status == kSTATUS_SUCCESS failed.)
    Traceback (most recent call last):
      File "build_engine.py", line 240, in <module>
        main(args)
      File "build_engine.py", line 212, in main
        args.calib_batch_size)
      File "build_engine.py", line 203, in create_engine
        with self.builder.build_engine(self.network, self.config) as engine, open(engine_path, "wb") as f:
    AttributeError: __enter__
    
    [EfficientNMS_TRT not working on jetson nano (TensorRT 8.0.1) #1538](https://github.com/NVIDIA/TensorRT/issues/1538)
    Cause:
    This problem did not occur if BatchedNMS_TRT was used instead of EfficientNMS_TRT by giving the --legacy_plugins option when creating the onnx file in create_onnx.py.
    
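    What's even more strange is that it was built without any problems on Jetson Xavier NX (same JetPack and TensorRT versions).
    In short: the linked issue reports the problem on a Jetson Nano, and the same failure occurs here on the Jetson TX2, while the Jetson Xavier NX has no problem at all.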
    
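    Solution:
    When generating the ONNX model, add the `--legacy_plugins` argument so that create_onnx.py uses BatchedNMS_TRT instead of EfficientNMS_TRT: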
    
    python create_onnx.py \
        --input_shape '1,512,512,3' \
        --saved_model /media/mydisk/YOYOFile/saved_model \
        --onnx /media/mydisk/YOYOFile/saved_model_onnx/model.onnx \
        --legacy_plugins
    
  • If you cannot track down a TensorRT error

    [builder.build_engine throws AttributeError: __enter__ #234](https://github.com/NVIDIA/TensorRT/issues/234)
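    If you cannot find the cause of a TensorRT error, it may be an error inside TensorRT whose log output is not informative enough. In that case, lower TensorRT's logging threshold so that more detailed messages are printed.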
    
    Solution: change the log level from trt.Logger.ERROR to trt.Logger.VERBOSE, i.e. set the TRT_LOGGER's verbosity to VERBOSE:
    
    TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
    
  • Insufficient GPU memory

    [TensorRT] ERROR: Tactic Device request: 1686MB Available: 1536MB. Device memory is insufficient to use tactic.
    
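    The Jetson TX2 reports this insufficient-memory ERROR, but the program does not terminate. As far as we can tell, the message means TensorRT could not get enough device memory for one candidate tactic, so it skips that tactic and continues with the others.
    
    Running compare_tf.py is a different matter: the TensorFlow model runs out of GPU memory and aborts with a ResourceExhaustedError: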
    
    (venv) tx2@tx2:/media/mydisk/MyDocuments/PyProjects/TensorRT/samples/python/efficientdet$ time python compare_tf.py \
    >     --engine /media/mydisk/YOYOFile/saved_model_trt_fp16/engine.trt \
    >     --saved_model /media/mydisk/YOYOFile/saved_model \
    >     --input /media/mydisk/YOYOFile/coco_calib \
    >     --output /media/mydisk/YOYOFile/output_fp16
    2021-10-22 15:35:22.133357: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2
    2021-10-22 15:35:34.777079: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
    2021-10-22 15:35:34.777723: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
    2021-10-22 15:35:34.777983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties: 
    pciBusID: 0000:00:00.0 name: NVIDIA Tegra X2 computeCapability: 6.2
    coreClock: 1.3GHz coreCount: 2 deviceMemorySize: 7.67GiB deviceMemoryBandwidth: 38.74GiB/s
    2021-10-22 15:35:34.778194: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2
    2021-10-22 15:35:34.778427: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.10
    2021-10-22 15:35:34.778583: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.10
    2021-10-22 15:35:34.778749: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
    2021-10-22 15:35:34.779183: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
    2021-10-22 15:35:34.825369: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.10
    2021-10-22 15:35:34.861433: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.10
    2021-10-22 15:35:34.861805: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
    2021-10-22 15:35:34.862251: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
    2021-10-22 15:35:34.862703: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
    2021-10-22 15:35:34.862908: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1872] Adding visible gpu devices: 0
    2021-10-22 15:37:02.440933: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
    2021-10-22 15:37:02.441206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties: 
    pciBusID: 0000:00:00.0 name: NVIDIA Tegra X2 computeCapability: 6.2
    coreClock: 1.3GHz coreCount: 2 deviceMemorySize: 7.67GiB deviceMemoryBandwidth: 38.74GiB/s
    2021-10-22 15:37:02.441661: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
    2021-10-22 15:37:02.442112: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
    2021-10-22 15:37:02.442278: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1872] Adding visible gpu devices: 0
    2021-10-22 15:37:02.442651: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2
    2021-10-22 15:37:09.339992: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
    2021-10-22 15:37:09.340386: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
    2021-10-22 15:37:09.340484: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
    2021-10-22 15:37:09.341206: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
    2021-10-22 15:37:09.341823: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
    2021-10-22 15:37:09.342411: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
    2021-10-22 15:37:09.342745: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 80 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
    2021-10-22 15:40:55.306220: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
    2021-10-22 15:40:55.546753: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 31250000 Hz
    len(batch_images): ['/media/mydisk/YOYOFile/coco_calib/COCO_train2014_000000000009.jpg']
    2021-10-22 15:42:32.819948: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
    2021-10-22 15:42:33.464673: I tensorflow/stream_executor/cuda/cuda_dnn.cc:380] Loaded cuDNN version 8201
    2021-10-22 15:42:33.958547: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 24.00MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
    2021-10-22 15:42:33.983722: W tensorflow/core/kernels/gpu_utils.cc:49] Failed to allocate memory for convolution redzone checking; skipping this check. This is benign and only means that we won't check cudnn for out-of-bounds reads and writes. This message will only be printed once.
    2021-10-22 15:42:42.844197: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 22.75MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
    2021-10-22 15:42:43.485925: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.10
    2021-10-22 15:42:45.070240: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 16.00MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
    2021-10-22 15:42:45.094177: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 16.00MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
    ...
    ...
    2021-10-22 15:42:55.842108: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 2 Chunks of size 1474560 totalling 2.81MiB
    2021-10-22 15:42:55.842169: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 27442176 totalling 26.17MiB
    2021-10-22 15:42:55.842230: I tensorflow/core/common_runtime/bfc_allocator.cc:1058] Sum Total of in-use chunks: 58.26MiB
    2021-10-22 15:42:55.842290: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] total_region_allocated_bytes_: 84848640 memory_limit_: 84848640 available bytes: 0 curr_region_allocation_bytes_: 134217728
    2021-10-22 15:42:55.869607: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Stats: 
    Limit:                        84848640
    InUse:                        61095168
    MaxInUse:                     68186112
    NumAllocs:                        1583
    MaxAllocSize:                 27442176
    Reserved:                            0
    PeakReserved:                        0
    LargestFreeBlock:                    0
    
    2021-10-22 15:42:55.869973: W tensorflow/core/common_runtime/bfc_allocator.cc:467] ****************************************___*******************************xxx_______________________
    2021-10-22 15:42:55.936569: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at fused_batch_norm_op.cc:1360 : Resource exhausted: OOM when allocating tensor with shape[1,96,256,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
    Traceback (most recent call last):
      File "compare_tf.py", line 263, in <module>
        main(args)
      File "compare_tf.py", line 234, in main
        tf_images, tf_detections = run(tf_batcher, tf_infer, "TensorFlow", args.nms_threshold)
      File "compare_tf.py", line 124, in run
        res_detections += inferer.infer(batch, scales, nms_threshold)
      File "compare_tf.py", line 77, in infer
        output = self.pred_fn(**input)
      File "/media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1711, in __call__
        return self._call_impl(args, kwargs)
      File "/media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/lib/python3.6/site-packages/tensorflow/python/eager/wrap_function.py", line 247, in _call_impl
        args, kwargs, cancellation_manager)
      File "/media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1729, in _call_impl
        return self._call_with_flat_signature(args, kwargs, cancellation_manager)
      File "/media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1778, in _call_with_flat_signature
        return self._call_flat(args, self.captured_inputs, cancellation_manager)
      File "/media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1961, in _call_flat
        ctx, args, cancellation_manager=cancellation_manager))
      File "/media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 596, in call
        ctx=ctx)
      File "/media/mydisk/MyDocuments/PyProjects/automl/efficientdet/venv/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
        inputs, attrs, num_outputs)
    tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
      (0) Resource exhausted:  OOM when allocating tensor with shape[1,96,256,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
    	 [[node efficientnet-b0/blocks_1/tpu_batch_normalization/FusedBatchNormV3 (defined at compare_tf.py:42) ]]
    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
    
    	 [[strided_slice_18/_36]]
    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
    
      (1) Resource exhausted:  OOM when allocating tensor with shape[1,96,256,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
    	 [[node efficientnet-b0/blocks_1/tpu_batch_normalization/FusedBatchNormV3 (defined at compare_tf.py:42) ]]
    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
    
    0 successful operations.
    0 derived errors ignored. [Op:__inference_pruned_42115]
    
    Function call stack:
    pruned -> pruned
    
    
    real	7m57.829s
    user	6m34.100s
    sys	0m14.384s
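    The numbers in this log show why the allocation failed. TensorFlow's BFC allocator on GPU_0 had a hard limit (memory_limit_) of 84848640 bytes, which is the "80 MB memory" in the device-creation line, and the tensor that could not be allocated has shape [1,96,256,256] in float32, which is exactly the 24.00MiB allocation the allocator complains about first. The arithmetic below uses only values copied from the log; tensor_bytes is a helper introduced here for illustration.

```python
from functools import reduce
import operator

MIB = 1024 * 1024


def tensor_bytes(shape, bytes_per_element=4):
    """Size of a dense tensor in bytes (float32 = 4 bytes per element)."""
    return reduce(operator.mul, shape, 1) * bytes_per_element


# Values copied from the log above.
failed = tensor_bytes([1, 96, 256, 256])  # tensor that hit the OOM
limit = 84848640                          # memory_limit_ of GPU_0_bfc

print("failed tensor:   %.2f MiB" % (failed / MIB))          # 24.00 MiB
print("allocator limit: %.2f MiB" % (limit / MIB))           # 80.92 MiB
print("share of limit:  %.0f%%" % (100.0 * failed / limit))  # 30%
```

    With a single activation of this layer taking about 30% of the whole budget, the OOM is not surprising.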
    