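Over the past few days I finally got TensorFlow installed. I ran into quite a few problems along the way, so I am recording them here as a reference for anyone who wants to install from source.
Installation environment: POWER8 processor, Docker container running an Ubuntu 14.04 image.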
Build TensorFlow for IBM POWER8 CPU from Source Code
1. My OS environment
14.04.1-Ubuntu SMP
ppc64le
gcc 4.8.4
python 2.7.6
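The values above can be collected with a few standard commands. The sketch below is my assumption about how to gather them, since the original does not say which commands were used; `print_env` is a hypothetical helper name.

```shell
# print_env is a hypothetical helper: print the build environment details.
print_env() {
    echo "kernel:  $(uname -v)"    # kernel version string
    echo "arch:    $(uname -m)"    # ppc64le on little-endian POWER8
    # gcc and python may be missing in a minimal container, so guard each call.
    if command -v gcc >/dev/null 2>&1; then
        echo "gcc:     $(gcc -dumpversion)"
    fi
    if command -v python >/dev/null 2>&1; then
        # Python 2 prints its version to stderr, hence the redirect.
        echo "python:  $(python --version 2>&1)"
    fi
}

print_env
```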
2. Install bazel and protobuf
I only have openjdk-7, so I installed Bazel 0.1.0, which needs protobuf v3.0.0-alpha-3. For the installation, refer to "Build Bazel<v0.1.0> for IBM POWER8 CPU from Source Code".
3. Install other dependencies
sudo apt-get install python-pip python-dev python-numpy
sudo apt-get install swig
4. Get the source code
git clone --recurse-submodules https://github.com/tensorflow/tensorflow
5. Modify ~/.bazelrc
Add the following build options (see http://bazel.io/docs/bazel-user-manual.html for their descriptions):
to build in standalone mode: --spawn_strategy=standalone --genrule_strategy=standalone
to limit CPU and RAM usage: --jobs=20 --ram_utilization_factor=30
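Put together, the options for this step could be written to ~/.bazelrc as follows. This is a sketch under assumptions: the `build` prefix (which applies the options to `bazel build`) and the `--ram_utilization_factor=30` spelling are my reading of the Bazel user manual, where `percentage` is only the placeholder name for the value. The numbers 20 and 30 are the ones used in this post and should be tuned to the machine.

```shell
# Append the build options to the user's Bazel rc file.
# BAZELRC defaults to ~/.bazelrc; override it to try this on a scratch file.
BAZELRC="${BAZELRC:-$HOME/.bazelrc}"

cat >> "$BAZELRC" <<'EOF'
# Run build actions directly on the host, without sandboxing.
build --spawn_strategy=standalone --genrule_strategy=standalone
# Limit parallel jobs and the share of RAM Bazel assumes it may use.
build --jobs=20 --ram_utilization_factor=30
EOF
```

Running the snippet twice appends the lines twice, so check the file afterwards.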
6. Build the source code
./configure (select GPU or CPU)
bazel build -c opt //tensorflow/cc:tutorials_example_trainer
7. Create the pip package and install
7.1 Generate the TensorFlow whl package
To use TensorFlow from Python, a pip package must be created:
$ bazel build -c opt //tensorflow/tools/pip_package:build_pip_package
# or build with GPU support:
$ bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
After running overnight (about 9 hours), the build printed:
Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
bazel-bin/tensorflow/tools/pip_package/build_pip_package
INFO: Elapsed time: 32556.820s, Critical Path: 31793.39s
Then run the generated script to write the whl package to /tmp/tensorflow_pkg:
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
7.2 TensorFlow whl package path
opuser@nova:~/tensorflow/tensorflow$ ls /tmp/tensorflow_pkg/
tensorflow-0.5.0-cp27-none-linux_ppc64le.whl
7.3 Install the whl package using pip
opuser@nova:~/tensorflow/tensorflow$ sudo pip install /tmp/tensorflow_pkg/tensorflow-0.5.0-cp27-none-linux_ppc64le.whl
7.4 Installed TensorFlow package path
opuser@nova:~/tensorflow/tensorflow/tensorflow/models/image/mnist$ ls /usr/local/lib/python2.7/dist-packages
tensorflow tensorflow-0.5.0.dist-info
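Besides listing the directory, a quick way to confirm the package is usable is to try importing it. This check is my addition, not a step from the original procedure, and `check_tf` is a hypothetical helper name.

```shell
# check_tf is a hypothetical helper: report whether Python can import tensorflow.
check_tf() {
    if python -c 'import tensorflow' >/dev/null 2>&1; then
        echo "tensorflow import OK"
    else
        echo "tensorflow import FAILED"
    fi
}

check_tf
```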
7.5 Train on the MNIST dataset (sudo is needed)
# You can alternatively pass the path to the model program file to the python interpreter.
opuser@nova:~$ sudo python /usr/local/lib/python2.7/dist-packages/tensorflow/models/image/mnist/convolutional.py
Succesfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Succesfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Succesfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Succesfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
can't determine number of CPU cores: assuming 4
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 4
can't determine number of CPU cores: assuming 4
I tensorflow/core/common_runtime/direct_session.cc:60] Direct session inter op parallelism threads: 4
Initialized!
Epoch 0.00
Minibatch loss: 12.054, learning rate: 0.010000
Minibatch error: 90.6%
Validation error: 84.6%
Minibatch loss: 3.289, learning rate: 0.010000
......
8. Problems during compilation
Error: gcc: internal compiler error: Killed, com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 4.
This is caused by running out of RAM or swap. Lower the --jobs or --ram_utilization_factor value, or check whether another process is occupying a large amount of RAM and kill it. In my case two Bazel servers were sometimes running at the same time, so I had to kill one of them.
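The checks described above can be run roughly as follows. This is a sketch under assumptions: `free` and `ps` are the standard Linux tools, `bazel shutdown` stops the Bazel server of the current workspace, and `list_bazel_procs` is a hypothetical helper name.

```shell
# Show free RAM and swap in MB (free is part of procps on Linux).
if command -v free >/dev/null 2>&1; then
    free -m
fi

# list_bazel_procs is a hypothetical helper: print running Bazel processes.
# The [b] in the pattern keeps grep from matching its own command line.
list_bazel_procs() {
    ps aux | grep '[b]azel' || echo "no bazel process found"
}
list_bazel_procs

# Stop the Bazel server for the current workspace, if bazel is on PATH.
if command -v bazel >/dev/null 2>&1; then
    bazel shutdown
fi
```

If a stale server from another workspace is still listed, it can be ended with `kill` and the PID shown in the second column of the `ps` output.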
9. References
tensorflow/tensorflow/g3doc/get_started/os_setup.md
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md
bazel-user-manual.html
http://bazel.io/docs/bazel-user-manual.html
CUDA or cuDNN version mismatch
https://github.com/tensorflow/tensorflow/issues/125