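Environment: Ubuntu 12.04 (64-bit), hadoop-2.5.2-src.tar.gz, JDK 1.7 (JDK 1.8 does not work).
Preparation:
1. Install G++, CMake, and zlib1g-dev. G++ is required. CMake and zlib1g-dev (the Zlib devel package) are needed only to compile the native library. If you just build the source to import it into Eclipse, or build without the native library, you can skip both; if you do compile the native library, you must install them.
Native library: Hadoop is written in Java, but some tasks and operations are a poor fit for Java, so Hadoop implements them as native library functions. Whether the native library is used can be set in Hadoop's configuration files. With the native library, Hadoop can perform some operations more efficiently.
Install command: sudo apt-get install g++ cmake zlib1g-dev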
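Before moving on, it can help to confirm which of these tools are already present. The following is only a small sketch: missing_tools is a hypothetical helper name, and it checks command names only, so zlib1g-dev (a library package, not a command) is not covered by it.

```shell
# Print every command name from the arguments that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# g++ is always required; cmake is only needed for the native library.
missing=$(missing_tools g++ cmake)
if [ -n "$missing" ]; then
    echo "Not installed yet:" $missing
else
    echo "g++ and cmake are both available"
fi
```

For the library package itself, dpkg -s zlib1g-dev reports whether it is installed.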
Install Apache Forrest.
Download it from http://forrest.apache.org/mirrors.cgi
Install it and set FORREST_HOME in the profile.
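2. Install Maven. Download the binary archive from the Apache website and extract it to a directory of your choice. Then configure /etc/profile; the environment variable is M2_HOME, as follows (remember to run source /etc/profile afterwards):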
#Maven
export M2_HOME=/usr/local/apache-maven-3.3.1
export PATH=$PATH:$M2_HOME/bin
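Since several steps in this guide add lines to /etc/profile, re-running them can leave duplicate exports. One possible way to avoid that is sketched below; append_once is a hypothetical helper, and the demonstration writes to a scratch file rather than the real profile.

```shell
# Append a line to a file only if that exact line is not already there,
# so re-running the setup does not duplicate the exports.
append_once() {
    file=$1
    line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstrated on a scratch file; for real use, point it at /etc/profile
# and run with sufficient privileges.
profile=$(mktemp)
append_once "$profile" 'export M2_HOME=/usr/local/apache-maven-3.3.1'
append_once "$profile" 'export PATH=$PATH:$M2_HOME/bin'
append_once "$profile" 'export PATH=$PATH:$M2_HOME/bin'   # second call is a no-op
cat "$profile"
```

The single quotes keep $PATH and $M2_HOME unexpanded, so the profile receives the literal export lines.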
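Because the build downloads a lot of artifacts, you can edit conf/settings.xml under the Maven directory and switch the mirror to the Chinese open-source mirror, as follows: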
<mirrors>
<mirror>
<id>nexus-osc</id>
<mirrorOf>*</mirrorOf>
<name>Nexusosc</name>
<url>http://maven.oschina.net/content/groups/public/</url>
</mirror>
</mirrors>
and (be careful not to break this file when editing it):
<profiles>
<profile>
<id>jdk-1.7</id>
<activation>
<jdk>1.7</jdk>
</activation>
<repositories>
<repository>
<id>nexus</id>
<name>local private nexus</name>
<url>http://maven.oschina.net/content/groups/public/</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>nexus</id>
<name>local private nexus</name>
<url>http://maven.oschina.net/content/groups/public/</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</pluginRepository>
</pluginRepositories>
</profile>
</profiles>
Then copy this conf/settings.xml to the ~/.m2 folder, so that the configuration applies every time this user runs Maven (.m2 is a hidden folder).
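The copy step could be wrapped so that an existing ~/.m2/settings.xml is not silently overwritten. This is a sketch only; install_settings and the .bak suffix are illustrative choices, not something Maven requires.

```shell
# Copy a settings.xml into a Maven user directory, keeping a backup of any
# existing file. Both arguments are paths chosen by the caller.
install_settings() {
    src=$1
    m2_dir=$2
    mkdir -p "$m2_dir" || return 1
    if [ -f "$m2_dir/settings.xml" ]; then
        cp "$m2_dir/settings.xml" "$m2_dir/settings.xml.bak" || return 1
    fi
    cp "$src" "$m2_dir/settings.xml"
}

# Typical call, assuming the M2_HOME set earlier:
#   install_settings "$M2_HOME/conf/settings.xml" "$HOME/.m2"
```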
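3. Install protobuf 2.5.0 (it has to be exactly this version: as the official instructions quoted at the end say, protoc must match the version of the protobuf JAR):
Download protobuf 2.5.0 and extract it (the extracted folder is named protobuf-2.5.0), then run:
cd protobuf-2.5.0
./configure --prefix=/usr/local   # install under /usr/local
make
sudo make install
With this prefix, the protobuf libraries are installed in /usr/local/lib and the headers in /usr/local/include/google/protobuf.
Run protoc --version to check the installation. If it prints libprotoc 2.5.0, the installation succeeded.
If instead you see: protoc: error while loading shared libraries: libprotoc.so.8: cannot open shared object file: No such file or directory, then protoc cannot find its shared library.
The cause is that /usr/local/lib is not on the dynamic linker's search path. Edit /etc/profile and add
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
then run source /etc/profile.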
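The version check can also be scripted. The sketch below compares the output of protoc --version against the string quoted above; is_protoc_2_5_0 is a hypothetical helper name.

```shell
# Succeed only if the given `protoc --version` output reports exactly 2.5.0.
is_protoc_2_5_0() {
    [ "$1" = "libprotoc 2.5.0" ]
}

if command -v protoc >/dev/null 2>&1; then
    # Capture stderr too, so a shared-library error shows up in the message.
    out=$(protoc --version 2>&1) || true
    if is_protoc_2_5_0 "$out"; then
        echo "protoc OK: $out"
    else
        echo "unexpected protoc output: $out"
    fi
else
    echo "protoc is not on PATH"
fi
```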
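All three steps above are mandatory. If any step is skipped or any of the software is not installed properly, the build below will fail.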
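4. Install FindBugs:
FindBugs is needed for the native-library build used here (the one run with -Pdist,native,docs); skip it if you do not do that build. Judging from the (site) goal in the message below, it appears to be the docs part of that build that uses it. Without FindBugs, the build fails with:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (site) on project hadoop-common: An Ant BuildException has occured: stylesheet /home/qjj/hadoop/hadoop-2.5.2-src/hadoop-common-project/hadoop-common/${env.FINDBUGS_HOME}/src/xsl/default.xsl doesn't exist.
To install: unzip the FindBugs archive (findbugs-3.0.1.zip) downloaded from http://sourceforge.jp/projects/sfnet_findbugs/, move the folder to your install directory, and set the environment variables in /etc/profile: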
#FindBugs
export FINDBUGS_HOME=/usr/local/findbugs-3.0.1
export PATH=$PATH:$FINDBUGS_HOME/bin
Then run source /etc/profile and check the installation with findbugs -version; if it prints "3.0.1", the installation succeeded.
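Because the error above is about a stylesheet that is looked up under FINDBUGS_HOME, a quick check of that path may save a long failed build. A minimal sketch, with findbugs_home_ok as a hypothetical helper name:

```shell
# Succeed if the directory looks like a usable FINDBUGS_HOME, i.e. it contains
# the stylesheet named in the error message (src/xsl/default.xsl).
findbugs_home_ok() {
    [ -n "$1" ] && [ -f "$1/src/xsl/default.xsl" ]
}

if findbugs_home_ok "${FINDBUGS_HOME:-}"; then
    echo "FINDBUGS_HOME looks fine: $FINDBUGS_HOME"
else
    echo "FINDBUGS_HOME is unset or has no src/xsl/default.xsl"
fi
```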
5. Install openssl devel: likewise, install it only if you compile the native library. Without it, the native build fails with:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (make) on project hadoop-pipes: An Ant BuildException has occured: exec returned: 1 -> [Help 1]
To install: download the source from http://www.openssl.org/source/, extract it, and run the following from its root directory:
./config --prefix=/usr/local   # install under /usr/local
make
sudo make install
6. Configure DNS:
Edit /etc/resolv.conf (vim /etc/resolv.conf) and append the following lines, which can speed up DNS resolution:
nameserver 8.8.8.8
nameserver 8.8.4.4
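To see which name servers are currently configured before or after the edit, something like the following sketch can be used; list_nameservers is a hypothetical helper that simply reads the nameserver lines of a resolv.conf-style file.

```shell
# Print the address of every nameserver entry in a resolv.conf-style file.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "$1"
}

if [ -r /etc/resolv.conf ]; then
    list_nameservers /etc/resolv.conf
fi
```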
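All the software needed above is available here: link: http://pan.baidu.com/s/1bn8P6wv password: p5a3
Getting started: (1) Generate the Eclipse project files:
4. At this point many projects appear under the Hadoop project directory; they are the individual Hadoop modules, and the source has now been imported. Don't celebrate too early, though: once Eclipse finishes the build path, you will find that it reports a lot of errors (really a lot).
The successfully generated Hadoop source project files, which can be imported into Eclipse directly: link: http://pan.baidu.com/s/1qW8yY52 password: wied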
---------------------------------------------divider-----------------------------------------
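(2) Build the Hadoop source and generate the installation package
This is much simpler than the above. Extract the 2.5.2 source, enter hadoop-2.5.2-src, and run:
mvn package -Pdist -DskipTests -Dtar   # package is the goal that builds the JARs
Then comes a long wait until the build succeeds. Afterwards, the generated binary distribution is under hadoop-2.5.2-src/hadoop-dist/target/. This command does not compile the native library, so no lib folder is generated in the root directory.
The command mvn package -Pdist,native,docs -DskipTests -Dtar compiles the native library and also generates the documentation; it just takes longer.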
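The two commands differ only in the profile list after -P, which a small wrapper makes explicit. This is a sketch: hadoop_build_cmd is a hypothetical helper that only prints the command line and does not run Maven.

```shell
# Print the mvn command line for a distribution build. Arguments are extra
# profiles to enable on top of "dist" (for example: native docs).
hadoop_build_cmd() {
    profiles=dist
    for p in "$@"; do
        profiles="$profiles,$p"
    done
    echo "mvn package -P$profiles -DskipTests -Dtar"
}

hadoop_build_cmd               # plain distribution
hadoop_build_cmd native docs   # with native library and docs
```

Note that every hyphen in the printed command is a plain ASCII hyphen; a typographic dash pasted from a web page (as in "–Dtar") is not parsed by Maven as an option.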
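(Successfully built Hadoop files, without the native library: link: http://pan.baidu.com/s/1qWPuaAC password: 57fr)
Problems: the problems above, along with some others, are analyzed in detail in the links given; here I mainly cover what I ran into while building. I was not very familiar with the build, so I took some detours. One notable problem was:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.16:test (default-test) on project hadoop-common: There are test failures.
[ERROR] Please refer to /home/qjj/hadoop/hadoop-2.5.2-src/hadoop-common-project/hadoop-common/target/surefire-reports for the individual test results.
This is a failure in the test phase, and it happened because I had not passed -DskipTests, which tells Maven to skip the tests.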
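If you do want to run the tests and see which ones failed, the surefire-reports directory named in the message can be scanned. The sketch below assumes the usual surefire text summary line of the form "Tests run: N, Failures: N, Errors: N, ..."; failed_reports is a hypothetical helper name.

```shell
# List surefire text reports whose summary line shows failures or errors.
# Assumes the usual "Tests run: N, Failures: N, Errors: N, ..." summary line.
failed_reports() {
    grep -lE 'Tests run:.*(Failures|Errors): [1-9]' "$1"/*.txt 2>/dev/null
}

# Typical call from a module directory after a failed build:
#   failed_reports target/surefire-reports
```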
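Addendum: building Hadoop and importing it into Eclipse are in fact described clearly in the official documentation (BUILDING.txt in the source root). We just keep taking detours and prefer searching the web, when the most reliable method is right in front of us.
Official build and installation instructions: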
Build instructions for Hadoop
----------------------------------------------------------------------------------
Requirements:
* Unix System
* JDK 1.6+
* Maven 3.0 or later
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer (if compiling native code)
* Zlib devel (if compiling native code)
* openssl devel ( if compiling native hadoop-pipes )
* Internet connection for first build (to fetch all Maven and Hadoop dependencies)
----------------------------------------------------------------------------------
Maven main modules:
hadoop (Main Hadoop project)
- hadoop-project (Parent POM for all Hadoop Maven modules.)
(All plugins & dependencies versions are defined here.)
- hadoop-project-dist (Parent POM for modules that generate distributions.)
- hadoop-annotations (Generates the Hadoop doclet used to generate the Javadocs)
- hadoop-assemblies (Maven assemblies used by the different modules)
- hadoop-common-project (Hadoop Common)
- hadoop-hdfs-project (Hadoop HDFS)
- hadoop-mapreduce-project (Hadoop MapReduce)
- hadoop-tools (Hadoop tools like Streaming, Distcp, etc.)
- hadoop-dist (Hadoop distribution assembler)
----------------------------------------------------------------------------------
Where to run Maven from?
It can be run from any module. The only catch is that if not run from utrunk
all modules that are not part of the build run must be installed in the local
Maven cache or available in a Maven repository.
----------------------------------------------------------------------------------
Maven build goals:
* Clean : mvn clean
* Compile : mvn compile [-Pnative]
* Run tests : mvn test [-Pnative]
* Create JAR : mvn package
* Run findbugs : mvn compile findbugs:findbugs
* Run checkstyle : mvn compile checkstyle:checkstyle
* Install JAR in M2 cache : mvn install
* Deploy JAR to Maven repo : mvn deploy
* Run clover : mvn test -Pclover [-DcloverLicenseLocation=${user.name}/.clover.license]
* Run Rat : mvn apache-rat:check
* Build javadocs : mvn javadoc:javadoc
* Build distribution : mvn package [-Pdist][-Pdocs][-Psrc][-Pnative][-Dtar]
* Change Hadoop version : mvn versions:set -DnewVersion=NEWVERSION
Build options:
* Use -Pnative to compile/bundle native code
* Use -Pdocs to generate & bundle the documentation in the distribution (using -Pdist)
* Use -Psrc to create a project source TAR.GZ
* Use -Dtar to create a TAR with the distribution (using -Pdist)
Snappy build options:
Snappy is a compression library that can be utilized by the native code.
It is currently an optional component, meaning that Hadoop can be built with
or without this dependency.
* Use -Drequire.snappy to fail the build if libsnappy.so is not found.
If this option is not specified and the snappy library is missing,
we silently build a version of libhadoop.so that cannot make use of snappy.
This option is recommended if you plan on making use of snappy and want
to get more repeatable builds.
* Use -Dsnappy.prefix to specify a nonstandard location for the libsnappy
header files and library files. You do not need this option if you have
installed snappy using a package manager.
* Use -Dsnappy.lib to specify a nonstandard location for the libsnappy library
files. Similarly to snappy.prefix, you do not need this option if you have
installed snappy using a package manager.
* Use -Dbundle.snappy to copy the contents of the snappy.lib directory into
the final tar file. This option requires that -Dsnappy.lib is also given,
and it ignores the -Dsnappy.prefix option.
Tests options:
* Use -DskipTests to skip tests when running the following Maven goals:
'package', 'install', 'deploy' or 'verify'
* -Dtest=<TESTCLASSNAME>,<TESTCLASSNAME#METHODNAME>,....
* -Dtest.exclude=<TESTCLASSNAME>
* -Dtest.exclude.pattern=**/<TESTCLASSNAME1>.java,**/<TESTCLASSNAME2>.java
----------------------------------------------------------------------------------
Building components separately
If you are building a submodule directory, all the hadoop dependencies this
submodule has will be resolved as all other 3rd party dependencies. This is,
from the Maven cache or from a Maven repository (if not available in the cache
or the SNAPSHOT 'timed out').
An alternative is to run 'mvn install -DskipTests' from Hadoop source top
level once; and then work from the submodule. Keep in mind that SNAPSHOTs
time out after a while, using the Maven '-nsu' will stop Maven from trying
to update SNAPSHOTs from external repos.
----------------------------------------------------------------------------------
Protocol Buffer compiler
The version of Protocol Buffer compiler, protoc, must match the version of the
protobuf JAR.
If you have multiple versions of protoc in your system, you can set in your
build shell the HADOOP_PROTOC_PATH environment variable to point to the one you
want to use for the Hadoop build. If you don't define this environment variable,
protoc is looked up in the PATH.
----------------------------------------------------------------------------------
Importing projects to eclipse
When you import the project to eclipse, install hadoop-maven-plugins at first.
$ cd hadoop-maven-plugins
$ mvn install
Then, generate eclipse project files.
$ mvn eclipse:eclipse -DskipTests
At last, import to eclipse by specifying the root directory of the project via
[File] > [Import] > [Existing Projects into Workspace].
----------------------------------------------------------------------------------
Building distributions:
Create binary distribution without native code and without documentation:
$ mvn package -Pdist -DskipTests -Dtar
Create binary distribution with native code and with documentation:
$ mvn package -Pdist,native,docs -DskipTests -Dtar
Create source distribution:
$ mvn package -Psrc -DskipTests
Create source and binary distributions with native code and documentation:
$ mvn package -Pdist,native,docs,src -DskipTests -Dtar
Create a local staging version of the website (in /tmp/hadoop-site)
$ mvn clean site; mvn site:stage -DstagingDirectory=/tmp/hadoop-site
----------------------------------------------------------------------------------
Handling out of memory errors in builds
----------------------------------------------------------------------------------
If the build process fails with an out of memory error, you should be able to fix
it by increasing the memory used by maven - which can be done via the environment
variable MAVEN_OPTS.
Here is an example setting to allocate between 256 and 512 MB of heap space to
Maven
export MAVEN_OPTS="-Xms256m -Xmx512m"