PyTorch for Jetson Nano - version 1.4.0 now available

Below are pre-built PyTorch pip wheel installers for Python 2.7 and Python 3.6 on Jetson Nano, Jetson TX2, and Jetson Xavier with JetPack >= 4.2.1

note: these binaries are built for ARM aarch64 architecture, so run these commands on a Jetson (not on a host PC)
UPDATE: check out our new torch2trt tool for converting PyTorch models to TensorRT! https://github.com/NVIDIA-AI-IOT/torch2trt

PyTorch v1.4.0

> Python 2.7*
wget https://nvidia.box.com/shared/static/1v2cc4ro6zvsbu0p8h6qcuaqco1qcsif.whl -O torch-1.4.0-cp27-cp27mu-linux_aarch64.whl
sudo apt-get install libopenblas-base
pip install torch-1.4.0-cp27-cp27mu-linux_aarch64.whl

> Python 3.6*
wget https://nvidia.box.com/shared/static/ncgzus5o23uck9i5oth2n8n06k340l6k.whl -O torch-1.4.0-cp36-cp36m-linux_aarch64.whl
sudo apt-get install python3-pip libopenblas-base
pip3 install Cython
pip3 install numpy torch-1.4.0-cp36-cp36m-linux_aarch64.whl

* includes OpenBLAS support, USE_DISTRIBUTED=1 with OpenMPI backend, and resources patch from PyTorch issue #8103
* as per the PyTorch Release Notes, Python 2 support is deprecated and PyTorch v1.4 is the last version to support Python 2.

PyTorch v1.3.0

> Python 2.7*
wget https://nvidia.box.com/shared/static/6t52xry4x2i634h1cfqvc9oaoqfzrcnq.whl -O torch-1.3.0-cp27-cp27mu-linux_aarch64.whl
pip install torch-1.3.0-cp27-cp27mu-linux_aarch64.whl

> Python 3.6*
wget https://nvidia.box.com/shared/static/phqe92v26cbhqjohwtvxorrwnmrnfx1o.whl -O torch-1.3.0-cp36-cp36m-linux_aarch64.whl
pip3 install numpy torch-1.3.0-cp36-cp36m-linux_aarch64.whl

* includes resources patch from PyTorch issue #8103

PyTorch v1.2.0

> Python 2.7*
wget https://nvidia.box.com/shared/static/8gcxrmcc6q4oc7xsoybk5wb26rkwugme.whl -O torch-1.2.0a0+8554416-cp27-cp27mu-linux_aarch64.whl
pip install torch-1.2.0a0+8554416-cp27-cp27mu-linux_aarch64.whl

> Python 3.6*
wget https://nvidia.box.com/shared/static/06vlvedmqpqstu1dym49fo7aapgfyyu9.whl -O torch-1.2.0a0+8554416-cp36-cp36m-linux_aarch64.whl
pip3 install numpy torch-1.2.0a0+8554416-cp36-cp36m-linux_aarch64.whl

* includes resources patch from PyTorch issue #8103

PyTorch v1.1.0

> Python 2.7*
wget https://nvidia.box.com/shared/static/n9p7u0tem1hqe0kyhjspzz78xpka7f5e.whl -O torch-1.1.0-cp27-cp27mu-linux_aarch64.whl
pip install torch-1.1.0-cp27-cp27mu-linux_aarch64.whl

> Python 3.6*
wget https://nvidia.box.com/shared/static/mmu3xb3sp4o8qg9tji90kkxl1eijjfc6.whl -O torch-1.1.0-cp36-cp36m-linux_aarch64.whl
pip3 install numpy torch-1.1.0-cp36-cp36m-linux_aarch64.whl

* includes resources patch from PyTorch issue #8103

PyTorch v1.0.0

> Python 2.7
wget https://nvidia.box.com/shared/static/d5v4bngglqhdbr4g9ir4eeg6k6miwqnv.whl -O torch-1.0.0a0+bb15580-cp27-cp27mu-linux_aarch64.whl
pip install torch-1.0.0a0+bb15580-cp27-cp27mu-linux_aarch64.whl

> Python 3.6
wget https://nvidia.box.com/shared/static/2ls48wc6h0kp1e58fjk21zast96lpt70.whl -O torch-1.0.0a0+bb15580-cp36-cp36m-linux_aarch64.whl
pip3 install numpy torch-1.0.0a0+bb15580-cp36-cp36m-linux_aarch64.whl
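
The wheel filenames above encode the Python tag (cp27 or cp36) and the platform (linux_aarch64), which is why they only install on a Jetson with a matching interpreter. The following is a minimal sketch, not part of the original instructions, of checking a downloaded filename against the running interpreter before calling pip. The helper name wheel_matches is hypothetical, and it assumes the standard wheel naming layout (name-version-pythontag-abitag-platformtag.whl).

```python
import platform
import sys


def wheel_matches(filename, version_info=None, machine=None):
    """Return True if the wheel's Python tag and platform fit this interpreter.

    Hypothetical helper: it only inspects the filename, it does not open the wheel.
    """
    version_info = sys.version_info if version_info is None else version_info
    machine = platform.machine() if machine is None else machine

    # Drop the ".whl" suffix, then take the last three dash-separated fields:
    # python tag, ABI tag, platform tag.
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    python_tag, _abi_tag, platform_tag = stem.rsplit("-", 3)[-3:]

    expected_tag = "cp{}{}".format(version_info[0], version_info[1])
    return python_tag == expected_tag and platform_tag.endswith("_" + machine)


if __name__ == "__main__":
    name = "torch-1.4.0-cp36-cp36m-linux_aarch64.whl"
    print(name, "matches this interpreter:", wheel_matches(name))
```

On a host PC the check fails on the platform tag, which mirrors the note that these commands should be run on the Jetson itself.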


torchvision

$ sudo apt-get install libjpeg-dev zlib1g-dev
$ git clone --branch <version> https://github.com/pytorch/vision torchvision   # see below for version of torchvision to download
$ cd torchvision
$ sudo python setup.py install   # use python3 instead if you installed the Python 3.6 wheel
$ cd ../  # attempting to load torchvision from build dir will result in import error

Select the version of torchvision to download depending on the version of PyTorch that you have installed:
PyTorch v1.0 - torchvision v0.2.2
PyTorch v1.1 - torchvision v0.3.0
PyTorch v1.2 - torchvision v0.4.0
PyTorch v1.3 - torchvision v0.4.2
PyTorch v1.4 - torchvision v0.5.0
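
The pairing above can also be expressed as a small lookup, which may be handy when scripting the git clone step. This is only a sketch: the function name torchvision_tag is hypothetical, and the table contains nothing beyond the five pairs listed above.

```python
# PyTorch major.minor -> torchvision version, copied from the list above.
TORCHVISION_FOR_PYTORCH = {
    "1.0": "v0.2.2",
    "1.1": "v0.3.0",
    "1.2": "v0.4.0",
    "1.3": "v0.4.2",
    "1.4": "v0.5.0",
}


def torchvision_tag(torch_version):
    """Map a PyTorch version string (e.g. '1.4.0' or 'v1.3.0') to a torchvision version."""
    # Keep only major.minor, so '1.2.0a0+8554416' becomes '1.2'.
    major_minor = ".".join(torch_version.lstrip("v").split(".")[:2])
    if major_minor not in TORCHVISION_FOR_PYTORCH:
        raise ValueError("no torchvision version listed for PyTorch " + torch_version)
    return TORCHVISION_FOR_PYTORCH[major_minor]


if __name__ == "__main__":
    print(torchvision_tag("1.4.0"))
```

The returned value is what would be substituted for <version> in the git clone command for torchvision.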

Verification
To verify that PyTorch has been installed correctly on your system, launch an interactive Python interpreter from a terminal ('python' command for 2.7 or 'python3' for 3.6) and run the following:

>>> import torch
>>> print(torch.__version__)
>>> print('CUDA available: ' + str(torch.cuda.is_available()))
>>> print('cuDNN version: ' + str(torch.backends.cudnn.version()))
>>> a = torch.cuda.FloatTensor(2).zero_()
>>> print('Tensor a = ' + str(a))
>>> b = torch.randn(2).cuda()
>>> print('Tensor b = ' + str(b))
>>> c = a + b
>>> print('Tensor c = ' + str(c))

>>> import torchvision
>>> print(torchvision.__version__)
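
For a non-interactive check, the same steps can be wrapped in a script. The sketch below is an assumption about how one might package them, not part of the original post: the function name check_pytorch and the report keys are hypothetical, and it uses torch.zeros(..., device="cuda") in place of torch.cuda.FloatTensor. It returns early instead of raising when torch is missing or CUDA is unavailable.

```python
def check_pytorch():
    """Collect the same facts as the interactive session into a dict."""
    report = {
        "torch_version": None,
        "cuda_available": False,
        "cudnn_version": None,
        "gpu_add_ok": False,
    }
    try:
        import torch
    except ImportError:
        # torch is not installed for this interpreter; report stays at defaults.
        return report

    report["torch_version"] = torch.__version__
    report["cuda_available"] = bool(torch.cuda.is_available())
    report["cudnn_version"] = torch.backends.cudnn.version()

    if report["cuda_available"]:
        a = torch.zeros(2, device="cuda")   # zero tensor allocated on the GPU
        b = torch.randn(2).cuda()           # random tensor copied to the GPU
        c = a + b                           # addition runs on the GPU
        # Adding zeros must leave b unchanged.
        report["gpu_add_ok"] = bool(torch.equal(c.cpu(), b.cpu()))
    return report


if __name__ == "__main__":
    for key, value in sorted(check_pytorch().items()):
        print("{}: {}".format(key, value))
```

Run it with the interpreter that matches the installed wheel (python for 2.7, python3 for 3.6).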


Build Instructions
Below are the steps used to build the PyTorch wheels. They were compiled on a Xavier in a couple of hours, and the resulting wheels target Nano, TX2, and Xavier.

Note that if you are trying to build on Nano, you will need to mount a swap file.
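
Before starting a long build on a Nano, it can help to confirm that swap is actually active. The sketch below reads the SwapTotal field from /proc/meminfo. The function names are hypothetical, and the post does not state how much swap is needed, so the script only reports whether any is present.

```python
def swap_total_kib(meminfo_text):
    """Return the SwapTotal value (in kB) from the text of /proc/meminfo, or 0 if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Lines look like: "SwapTotal:       2097148 kB"
            return int(line.split()[1])
    return 0


def read_swap_total_kib(path="/proc/meminfo"):
    """Read the live value on a Linux system."""
    with open(path) as handle:
        return swap_total_kib(handle.read())


if __name__ == "__main__":
    total = read_swap_total_kib()
    if total == 0:
        print("No swap is active; mount a swap file before building on Nano.")
    else:
        print("Swap active: {} kB".format(total))
```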

Max Performance
$ sudo nvpmodel -m 0
$ sudo jetson_clocks   # on JetPack releases before 4.2, the script is ~/jetson_clocks.sh

Download PyTorch sources
$ git clone --recursive --branch <version> https://github.com/pytorch/pytorch
$ cd pytorch

Apply Patch
The patch below is to avoid the "too many CUDA resources requested for launch" error (PyTorch issue #8103)

diff --git a/aten/src/ATen/cuda/CUDAContext.cpp b/aten/src/ATen/cuda/CUDAContext.cpp
index e48c020b03..0ecc111c4b 100644
--- a/aten/src/ATen/cuda/CUDAContext.cpp
+++ b/aten/src/ATen/cuda/CUDAContext.cpp
@@ -24,6 +24,8 @@ void initCUDAContextVectors() {
 void initDeviceProperty(DeviceIndex device_index) {
   cudaDeviceProp device_prop;
   AT_CUDA_CHECK(cudaGetDeviceProperties(&device_prop, device_index));
+  // patch for "too many resources requested for launch"
+  device_prop.maxThreadsPerBlock = device_prop.maxThreadsPerBlock / 2;
   device_properties[device_index] = device_prop;
 }
 
diff --git a/aten/src/ATen/cuda/detail/KernelUtils.h b/aten/src/ATen/cuda/detail/KernelUtils.h
index af788ff8f8..fb27ab808c 100644
--- a/aten/src/ATen/cuda/detail/KernelUtils.h
+++ b/aten/src/ATen/cuda/detail/KernelUtils.h
@@ -19,7 +19,10 @@ namespace at { namespace cuda { namespace detail {
   for (int i=_i_n_d_e_x; _i_n_d_e_x < (n); _i_n_d_e_x+=blockDim.x * gridDim.x, i=_i_n_d_e_x)
 
 // Use 1024 threads per block, which requires cuda sm_2x or above
-constexpr int CUDA_NUM_THREADS = 1024;
+//constexpr int CUDA_NUM_THREADS = 1024;
+
+// patch for "too many resources requested for launch"
+constexpr int CUDA_NUM_THREADS = 512;
 
 // CUDA: number of blocks for threads.
 inline int GET_BLOCKS(const int N)
diff --git a/aten/src/THCUNN/common.h b/aten/src/THCUNN/common.h
index 61cd90cdd6..cec1fa2698 100644
--- a/aten/src/THCUNN/common.h
+++ b/aten/src/THCUNN/common.h
@@ -5,7 +5,10 @@
   "Some of weight/gradient/input tensors are located on different GPUs. Please move them to a single one.")
 
 // Use 1024 threads per block, which requires cuda sm_2x or above
-const int CUDA_NUM_THREADS = 1024;
+//const int CUDA_NUM_THREADS = 1024;
+
+// patch for "too many resources requested for launch"
+const int CUDA_NUM_THREADS = 512;
 
 // CUDA: number of blocks for threads.
 inline int GET_BLOCKS(const int N)

Note that this exact patch is for PyTorch 1.3/1.4. The source changes are the same for previous versions, but the line positions within the files may differ, so it is recommended to apply these changes by hand.
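
When the line offsets in the diff do not match an older source tree, the CUDA_NUM_THREADS part of the change can be made with a text substitution instead. The following is a rough sketch with a hypothetical function name. It assumes the declaration looks like the lines shown in the diff, and it covers only KernelUtils.h and THCUNN/common.h. The maxThreadsPerBlock line in CUDAContext.cpp still has to be added by hand.

```python
import re

# Matches an uncommented declaration such as
#   constexpr int CUDA_NUM_THREADS = 1024;
#   const int CUDA_NUM_THREADS = 1024;
_DECLARATION = re.compile(
    r"^([ \t]*(?:constexpr|const)\s+int\s+CUDA_NUM_THREADS\s*=\s*)1024(\s*;)",
    re.MULTILINE,
)


def patch_cuda_num_threads(source_text):
    """Replace 1024 with 512 in the declaration; return (new_text, replacements)."""
    return _DECLARATION.subn(r"\g<1>512\g<2>", source_text)


if __name__ == "__main__":
    original = (
        "// Use 1024 threads per block, which requires cuda sm_2x or above\n"
        "constexpr int CUDA_NUM_THREADS = 1024;\n"
    )
    patched, count = patch_cuda_num_threads(original)
    print(count)
    print(patched)
```

Unlike the diff, this does not keep the old declaration as a comment. A returned count of 0 means the file was already patched or the declaration has a different form, in which case it should be inspected manually.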

Set Build Options
$ export USE_NCCL=0
$ export USE_DISTRIBUTED=0                # skip setting this if you want to enable OpenMPI backend
$ export TORCH_CUDA_ARCH_LIST="5.3;6.2;7.2"

$ export PYTORCH_BUILD_VERSION=<version>  # without the leading 'v', e.g. 1.3.0 for PyTorch v1.3.0
$ export PYTORCH_BUILD_NUMBER=1

(remember to re-export these environment variables if you change terminal)
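
One way to avoid forgetting the re-export is to build the environment in a small script and pass it to the build explicitly. The sketch below mirrors the exports above. The function names jetson_build_env and build_wheel are hypothetical. The comment mapping compute capabilities to boards (5.3 Nano, 6.2 TX2, 7.2 Xavier) is an added note rather than something stated in the post.

```python
import os
import subprocess
import sys


def jetson_build_env(version, enable_openmpi=False, base_env=None):
    """Return a copy of the environment with the build options listed above."""
    env = dict(os.environ if base_env is None else base_env)
    env["USE_NCCL"] = "0"
    if not enable_openmpi:
        # Leave USE_DISTRIBUTED unset to enable the OpenMPI backend.
        env["USE_DISTRIBUTED"] = "0"
    # Added note: 5.3 = Nano, 6.2 = TX2, 7.2 = Xavier.
    env["TORCH_CUDA_ARCH_LIST"] = "5.3;6.2;7.2"
    # The build version is given without the leading 'v'.
    env["PYTORCH_BUILD_VERSION"] = version[1:] if version.startswith("v") else version
    env["PYTORCH_BUILD_NUMBER"] = "1"
    return env


def build_wheel(pytorch_dir, version, enable_openmpi=False):
    """Run 'setup.py bdist_wheel' in pytorch_dir with the environment above."""
    env = jetson_build_env(version, enable_openmpi=enable_openmpi)
    subprocess.check_call([sys.executable, "setup.py", "bdist_wheel"],
                          cwd=pytorch_dir, env=env)
```

Calling build_wheel("pytorch", "v1.3.0") with python or python3 would then produce the wheel for that interpreter, under the same assumptions as the manual steps.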

Build wheel for Python 2.7 (to pytorch/dist)
$ sudo apt-get install python-pip cmake libopenblas-dev
$ pip install -U pip

$ sudo pip install -U setuptools
$ sudo pip install -r requirements.txt

$ pip install scikit-build --user
$ pip install ninja --user

$ python setup.py bdist_wheel


Build wheel for Python 3.6 (to pytorch/dist)
$ sudo apt-get install python3-pip cmake libopenblas-dev

$ sudo pip3 install -U setuptools
$ sudo pip3 install -r requirements.txt

$ pip3 install scikit-build --user
$ pip3 install ninja --user

$ python3 setup.py bdist_wheel


Note on upgrading pip
If you get this error from pip/pip3 after upgrading pip with "pip install -U pip":

pip
Traceback (most recent call last):
  File "/usr/bin/pip", line 9, in <module>
    from pip import main
ImportError: cannot import name 'main'

You can either downgrade pip to its original version:
# Python 2.7
$ sudo python -m pip uninstall pip && sudo apt install python-pip --reinstall

# Python 3.6
$ sudo python3 -m pip uninstall pip && sudo apt install python3-pip --reinstall

-or- you can patch /usr/bin/pip (or /usr/bin/pip3)
diff --git a/pip b/pip
index 56bbb2b..62f26b9 100755
--- a/pip
+++ b/pip
@@ -6,6 +6,6 @@ import sys
 # Run the main entry point, similarly to how setuptools does it, but because
 # we didn't install the actual entry point from setup.py, don't use the
 # pkg_resources API.
-from pip import main
+from pip import __main__
 if __name__ == '__main__':
-    sys.exit(main())
+    sys.exit(__main__._main())

 
