I. System and Software Environment
1. Operating system
CentOS release 6.5 (Final)
Kernel version: 2.6.32-431.el6.x86_64
master.fansik.com:192.168.83.118
node1.fansik.com:192.168.83.119
node2.fansik.com:192.168.83.120
2. JDK version: 1.7.0_75
3. Hadoop version: 2.7.2
II. Pre-installation Preparation
1. Disable the firewall and SELinux
# setenforce 0
# service iptables stop
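These two commands only take effect until the next reboot. A minimal sketch for making the changes permanent on CentOS 6 (assuming the stock /etc/selinux/config layout):
# chkconfig iptables off
# sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config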
2. Configure the hosts file
192.168.83.118 master.fansik.com
192.168.83.119 node1.fansik.com
192.168.83.120 node2.fansik.com
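A quick sanity check that name resolution works (run from any of the three machines):
# ping -c 1 master.fansik.com
# ping -c 1 node1.fansik.com
# ping -c 1 node2.fansik.com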
3. Generate SSH keys
On master.fansik.com run # ssh-keygen and press Enter through all the prompts
# scp ~/.ssh/id_rsa.pub node1.fansik.com:/root/.ssh/authorized_keys
# scp ~/.ssh/id_rsa.pub node2.fansik.com:/root/.ssh/authorized_keys
Then on node1.fansik.com and node2.fansik.com:
# chmod 600 /root/.ssh/authorized_keys
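Passwordless login can be verified from master.fansik.com before continuing; each command should print the remote hostname without asking for a password:
# ssh node1.fansik.com hostname
# ssh node2.fansik.com hostname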
4. Install the JDK
# tar xf jdk-7u75-linux-x64.tar.gz
# mv jdk1.7.0_75 /usr/local/jdk1.7
# vim /etc/profile.d/java.sh and add the following:
export JAVA_HOME=/usr/local/jdk1.7
export JRE_HOME=/usr/local/jdk1.7/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
# source /etc/profile
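A quick check that the new JDK is the one being picked up:
# java -version
# echo $JAVA_HOME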
5. Synchronize the clocks (otherwise analyzing files later may run into problems)
# ntpdate 202.120.2.101 (an NTP server at Shanghai Jiao Tong University)
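ntpdate only synchronizes once; if the clocks drift, a cron entry can repeat the sync, for example every 30 minutes (a sketch, any reachable NTP server will do):
# (crontab -l 2>/dev/null; echo '*/30 * * * * /usr/sbin/ntpdate 202.120.2.101 >/dev/null 2>&1') | crontab -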
III. Install Hadoop
Download the appropriate release from the official Hadoop download page: http://hadoop.apache.org/releases.html
Run the following on all three machines:
# tar xf hadoop-2.7.2.tar.gz
# mv hadoop-2.7.2 /usr/local/hadoop
# cd /usr/local/hadoop/
# mkdir tmp dfs dfs/data dfs/name
IV. Configure Hadoop
Configuration on master.fansik.com
# vim /usr/local/hadoop/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.83.118:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/local/hadoop/tmp</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>
# vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>192.168.83.118:9001</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
# cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
# vim !$ (!$ expands to the last argument of the previous command, i.e. /usr/local/hadoop/etc/hadoop/mapred-site.xml)
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>192.168.83.118:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>192.168.83.118:19888</value>
  </property>
</configuration>
# vim /usr/local/hadoop/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>192.168.83.118:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>192.168.83.118:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>192.168.83.118:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>192.168.83.118:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>192.168.83.118:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
</configuration>
# vim /usr/local/hadoop/etc/hadoop/slaves
192.168.83.119
192.168.83.120
Sync the etc directory from master to node1 and node2:
# rsync -av /usr/local/hadoop/etc/ node1.fansik.com:/usr/local/hadoop/etc/
# rsync -av /usr/local/hadoop/etc/ node2.fansik.com:/usr/local/hadoop/etc/
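To confirm the configuration really is identical on all three machines, comparing checksums of any of the files is enough, for example:
# md5sum /usr/local/hadoop/etc/hadoop/core-site.xml
# ssh node1.fansik.com md5sum /usr/local/hadoop/etc/hadoop/core-site.xml
# ssh node2.fansik.com md5sum /usr/local/hadoop/etc/hadoop/core-site.xml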
The remaining steps are run on master.fansik.com only; the daemons on the two nodes will be started automatically.
Configure the Hadoop environment variables
# vim /etc/profile.d/hadoop.sh
export PATH=/usr/local/hadoop/bin:/usr/local/hadoop/sbin:$PATH
# source /etc/profile
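To confirm the Hadoop binaries (and the sbin scripts used below) are on PATH:
# hadoop version
# which start-all.sh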
Initialize (format) the NameNode
# hdfs namenode -format
Check the exit status to see whether it failed (0 means success)
# echo $?
Start the services
# start-all.sh
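Once start-all.sh finishes, jps should list NameNode, SecondaryNameNode and ResourceManager on master, and DataNode plus NodeManager on each node (the full jps path is used below because a non-interactive ssh session does not source /etc/profile):
# jps
# ssh node1.fansik.com /usr/local/jdk1.7/bin/jps
# ssh node2.fansik.com /usr/local/jdk1.7/bin/jps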
Stop the services
# stop-all.sh
Once the services are running, the web UIs are available at:
http://192.168.83.118:8088
http://192.168.83.118:50070
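If a browser cannot reach these pages, a quick check from master itself helps narrow things down; a sketch with curl (any 2xx or 3xx response code means the daemon is listening):
# curl -s -o /dev/null -w "%{http_code}\n" http://192.168.83.118:8088
# curl -s -o /dev/null -w "%{http_code}\n" http://192.168.83.118:50070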
V. Test Hadoop
Run the following on master.fansik.com
# hdfs dfs -mkdir /fansik
If the following warning appears when creating the directory, it can be safely ignored:
16/07/29 17:38:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
To get rid of the warning:
Download the matching native library build from the following site:
http://dl.bintray.com/sequenceiq/sequenceiq-bin/
# tar -xvf hadoop-native-64-2.7.0.tar -C /usr/local/hadoop/lib/native/
If you get the error: copyFromLocal: Cannot create directory /123/. Name node is in safe mode
it means HDFS safe mode is on; to leave it:
hdfs dfsadmin -safemode leave
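The current safe-mode state can be checked before and after with:
# hdfs dfsadmin -safemode get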
Copy myservicce.sh to the /fansik directory
# hdfs dfs -copyFromLocal ./myservicce.sh /fansik
Check that myservicce.sh is now present in /fansik
# hdfs dfs -ls /fansik
Analyze the file with the wordcount example
# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /fansik/myservicce.sh /zhangshan/
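Note that the job will refuse to start if the output directory already exists; when re-running it, delete /zhangshan first:
# hdfs dfs -rm -r /zhangshan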
List the output files:
# hdfs dfs -ls /zhangshan/
Found 2 items
-rw-r--r-- 2 root supergroup 0 2016-08-02 15:19 /zhangshan/_SUCCESS
-rw-r--r-- 2 root supergroup 415 2016-08-02 15:19 /zhangshan/part-r-00000
View the analysis results:
# hdfs dfs -cat /zhangshan/part-r-00000
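To keep a local copy of the word counts, the result file can be pulled out of HDFS (the local path here is arbitrary):
# hdfs dfs -get /zhangshan/part-r-00000 /root/wordcount-result.txt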