I. Overview
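The previous section showed that the HA architecture is up and running. However, switching a NameNode between active and standby can still only be done by hand.
Next, let's set up automatic failover with ZooKeeper.
After HA starts, both NameNodes are in standby state. ZooKeeper's election mechanism can pick one of them to become active. The monitoring is handled by ZKFC (ZKFailoverController), a process that runs on each NameNode host.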
II. Configuration
1. hdfs-site.xml
Enable automatic failover by adding the following property inside <configuration>:
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
2. core-site.xml
Set the ZooKeeper ensemble used for failover by adding the following property:
<property>
  <name>ha.zookeeper.quorum</name>
  <value>master:2181,slave1:2181,slave2:2181</value>
</property>
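Once ZooKeeper is running (step 5), you may want to confirm that every server listed in ha.zookeeper.quorum actually answers. The following is a minimal sketch in POSIX shell. It assumes nc (netcat) is installed, and parse_quorum and check_quorum are hypothetical helper names that are not part of Hadoop. The check uses ZooKeeper's ruok four-letter command, which 3.4.5 answers out of the box. Releases from 3.5.3 on only answer it if it is listed in 4lw.commands.whitelist.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Hadoop or ZooKeeper).

# Print one "host port" line for each entry of a comma-separated host:port list,
# e.g. the value of ha.zookeeper.quorum.
parse_quorum() {
  echo "$1" | tr ',' '\n' | while IFS= read -r entry; do
    [ -n "$entry" ] || continue
    case "$entry" in
      *:*) printf '%s %s\n' "${entry%%:*}" "${entry##*:}" ;;
      *)   printf '%s %s\n' "$entry" 2181 ;;   # no port given: assume the default client port
    esac
  done
}

# Send "ruok" to every quorum member; a healthy server replies "imok".
# Returns non-zero if any server does not answer within about 2 seconds.
check_quorum() {
  parse_quorum "$1" | (
    failed=0
    while read -r host port; do
      if [ "$(echo ruok | nc -w 2 "$host" "$port" 2>/dev/null)" = "imok" ]; then
        echo "$host:$port imok"
      else
        echo "$host:$port did not answer ruok" >&2
        failed=1
      fi
    done
    exit "$failed"   # exits only the ( ... ) subshell, i.e. sets the pipeline status
  )
}

# Example:
#   check_quorum master:2181,slave1:2181,slave2:2181
```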
3. Stop all cluster services
# on master
[root@master hadoop-2.5.0]# sbin/stop-dfs.sh
[root@master ~]# xcall jps
====== master jps ======
18719 Jps
====== slave1 jps ======
19150 Jps
====== slave2 jps ======
13595 Jps
# If any other services (ZooKeeper, etc.) are still running, stop them too.
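On a larger cluster, this check can be scripted. The sketch below assumes the xcall output format shown above. Note that xcall is the author's own helper for running a command on every node; it is not a standard tool. all_daemons_stopped is a hypothetical name. It skips the per-host header lines and jps's own entry, and reports anything else that is still running.

```shell
#!/bin/sh
# Hypothetical helper: read `xcall jps` output on stdin.
# Prints every Java process other than Jps itself and returns 1 if any is found;
# returns 0 when each host lists only Jps.
all_daemons_stopped() {
  local pid name status=0
  while read -r pid name; do
    case "$pid" in
      ''|'======') continue ;;   # blank lines and "====== host jps ======" headers
    esac
    [ "$name" = "Jps" ] && continue
    echo "still running: $name (pid $pid)"
    status=1
  done
  return "$status"
}

# Example:
#   xcall jps | all_daemons_stopped && echo "all daemons stopped"
```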
4. Sync the configuration files
[root@master hadoop]# pwd
/opt/app/hadoop-2.5.0/etc/hadoop
[root@master hadoop]# scp -r hdfs-site.xml core-site.xml root@slave1:/opt/app/hadoop-2.5.0/etc/hadoop/
[root@master hadoop]# scp -r hdfs-site.xml core-site.xml root@slave2:/opt/app/hadoop-2.5.0/etc/hadoop/
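With more than two slaves, a loop avoids repeating the scp line for each host. This is a minimal sketch. Like the commands above, it assumes root SSH access to every node and the same configuration directory on each of them. sync_conf and the DRY_RUN switch are hypothetical names. Setting DRY_RUN=1 only prints the commands, which is useful for checking the paths first.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}

# sync_conf CONF_DIR HOST...
# Copy hdfs-site.xml and core-site.xml from CONF_DIR to the same directory on each HOST.
sync_conf() {
  local conf_dir="$1" host
  shift
  for host in "$@"; do
    run scp "$conf_dir/hdfs-site.xml" "$conf_dir/core-site.xml" \
      "root@$host:$conf_dir/" || return 1
  done
}

# Example (on master):
#   sync_conf /opt/app/hadoop-2.5.0/etc/hadoop slave1 slave2
```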
5. Start ZooKeeper
# start ZooKeeper on every node
[root@master ~]# /opt/app/zookeeper-3.4.5/bin/zkServer.sh start
[root@slave1 ~]# /opt/app/zookeeper-3.4.5/bin/zkServer.sh start
[root@slave2 ~]# /opt/app/zookeeper-3.4.5/bin/zkServer.sh start
# check
[root@master ~]# xcall jps
====== master jps ======
18824 Jps
18765 QuorumPeerMain
====== slave1 jps ======
19201 QuorumPeerMain
19263 Jps
====== slave2 jps ======
13646 QuorumPeerMain
13702 Jps
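jps only proves that a QuorumPeerMain process exists on each node. It does not show whether the three servers actually formed an ensemble. zkServer.sh status reports each server's role on a "Mode:" line (leader or follower). The following hedged sketch checks those lines. zk_ensemble_ok is a hypothetical helper, and the expected server count (3) is specific to this article's layout.

```shell
#!/bin/sh
# zk_ensemble_ok EXPECTED_SERVERS
# Read the combined `zkServer.sh status` output of all nodes on stdin and
# return 0 only if there is exactly one leader and leaders + followers == EXPECTED_SERVERS.
zk_ensemble_ok() {
  local expected="$1" status leaders followers
  status=$(cat)
  # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps that harmless
  leaders=$(printf '%s\n' "$status" | grep -c '^Mode: leader' || true)
  followers=$(printf '%s\n' "$status" | grep -c '^Mode: follower' || true)
  echo "leaders=$leaders followers=$followers (expected $expected servers)"
  [ "$leaders" -eq 1 ] && [ $((leaders + followers)) -eq "$expected" ]
}

# Example (on master, assuming SSH access to each node as root):
#   for h in master slave1 slave2; do
#     ssh "root@$h" /opt/app/zookeeper-3.4.5/bin/zkServer.sh status 2>&1
#   done | zk_ensemble_ok 3
```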
6. Initialize the HA state in ZooKeeper
# on master
[root@master hadoop-2.5.0]# bin/hdfs zkfc -formatZK
# Connect to ZooKeeper with the client on slave1 to check:
[root@slave1 zookeeper-3.4.5]# bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 1] ls /
[zookeeper]
[zk: localhost:2181(CONNECTED) 2] ls /
[hadoop-ha, zookeeper]
# the hadoop-ha znode has been created
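You can also run this check without an interactive session. The sketch below assumes that your zkCli.sh executes a command passed on its command line. The 3.4 client generally does; if yours does not, the interactive session above does the same job. has_hadoop_ha_znode is a hypothetical helper. Under /hadoop-ha, formatZK also creates a child znode named after the nameservice ID from dfs.nameservices.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if any line on stdin is a zkCli "ls" listing
# such as "[hadoop-ha, zookeeper]" that contains the hadoop-ha znode.
# Log lines and prompts do not match because of the ^[ ... ]$ anchors.
has_hadoop_ha_znode() {
  grep -Eq '^\[(.*, )?hadoop-ha(, .*)?]$'
}

# Example (ZooKeeper 3.4.x client):
#   /opt/app/zookeeper-3.4.5/bin/zkCli.sh -server master:2181 ls / 2>&1 \
#     | has_hadoop_ha_znode && echo "hadoop-ha present"
```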
7. Start HDFS
# on master
# with automatic failover enabled, start-dfs.sh also starts a ZKFC on each NameNode host
[root@master hadoop-2.5.0]# sbin/start-dfs.sh
# check which daemons started
[root@master ~]# xcall jps
====== master jps ======
19588 DFSZKFailoverController    # ZKFC monitoring process
19087 NameNode
19193 DataNode
19393 JournalNode
18765 QuorumPeerMain
19662 Jps
====== slave1 jps ======
19743 DFSZKFailoverController    # ZKFC monitoring process
19201 QuorumPeerMain
19800 Jps
19613 JournalNode
19521 DataNode
19443 NameNode
====== slave2 jps ======
13646 QuorumPeerMain
13850 DataNode
14014 Jps
13942 JournalNode
# check the state of nn1 and nn2
[root@master hadoop-2.5.0]# bin/hdfs haadmin -getServiceState nn1
19/04/18 10:34:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
active
[root@master hadoop-2.5.0]# bin/hdfs haadmin -getServiceState nn2
19/04/18 10:34:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
standby
# nn1 has been elected active automatically and nn2 is standby; the web UI shows the same.
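Checking nn1 and nn2 one at a time works. A small loop can also assert that exactly one NameNode is active. This sketch assumes the nn1/nn2 IDs and the install path used in this article. HDFS_BIN is a hypothetical override variable, and check_one_active and get_state are hypothetical names. With Hadoop's default logging configuration, the NativeCodeLoader WARN line is written to stderr. The sketch discards it, so only the state word remains.

```shell
#!/bin/sh
# Print the HA state ("active" or "standby") of one NameNode ID.
# HDFS_BIN is a hypothetical override; the default is this article's install path.
# 2>/dev/null drops log lines such as the NativeCodeLoader warning.
get_state() {
  "${HDFS_BIN:-/opt/app/hadoop-2.5.0/bin/hdfs}" haadmin -getServiceState "$1" 2>/dev/null
}

# check_one_active ID...
# Print each NameNode's state and return 0 only if exactly one of them is active.
check_one_active() {
  local id state active=0
  for id in "$@"; do
    state=$(get_state "$id") || state="unreachable"
    echo "$id: $state"
    if [ "$state" = "active" ]; then
      active=$((active + 1))
    fi
  done
  [ "$active" -eq 1 ]
}

# Example (on master):
#   check_one_active nn1 nn2
```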
8. Test automatic failover
Kill the NameNode process that is currently active, then check whether the standby NameNode has automatically become active.
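Here is a sketch of a scripted version of this test. Run it on the node whose NameNode is currently active (master/nn1 in the output above). It kills that NameNode, then polls nn2 until nn2 reports active. The helper names, HDFS_BIN and the 60-second timeout are assumptions, not values from the original. If nn2 never becomes active, check the ZKFC log on the standby host first; a failing fencing method is a common cause.

```shell
#!/bin/sh
# Print the HA state of one NameNode ID (log lines on stderr are discarded).
# HDFS_BIN is a hypothetical override; the default is this article's install path.
get_state() {
  "${HDFS_BIN:-/opt/app/hadoop-2.5.0/bin/hdfs}" haadmin -getServiceState "$1" 2>/dev/null
}

# Print the PID of the NameNode on this host, as listed by jps.
# The exact match on column 2 skips DFSZKFailoverController, DataNode, etc.
namenode_pid() {
  jps | awk '$2 == "NameNode" { print $1 }'
}

# wait_for_state ID STATE TIMEOUT_SECONDS
# Poll once per second until ID reports STATE; return 1 after TIMEOUT_SECONDS.
wait_for_state() {
  local id="$1" want="$2" timeout="$3" waited=0
  until [ "$(get_state "$id")" = "$want" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "$id did not become $want within ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "$id is $want (after about ${waited}s)"
}

# Example (on master, where nn1 is active):
#   kill -9 "$(namenode_pid)"
#   wait_for_state nn2 active 60
```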