Hadoop HA Fully Distributed Cluster Setup
Overview: the cluster is built across multiple servers.
Common issues: whether the required ports are open, whether IPs are blocked, the JDK version, whether JAVA_HOME is set correctly, etc.
Check the log files when troubleshooting.
If you do not have enough servers, three machines also work, or you can simulate the cluster with VMware.
On a Huawei Cloud 1-core/1GB server, starting the cluster with jdk-8u211-linux-x64.tar.gz may fail because of a JDK version mismatch.
Solution:
- Install OpenJDK: sudo yum install -y java-1.8.0-openjdk
- Find the Java home directory and reset JAVA_HOME (see the sketch below)
1. which java
2. ls -lrt /usr/bin/java
3. ls -lrt /etc/alternatives/java
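When the JDK is installed via yum, the paths above are only symlinks. A minimal sketch, assuming an OpenJDK yum install, for finding the real JDK directory and pointing JAVA_HOME at it (the /usr/lib/jvm path below is an example and will differ between systems):
# Resolve the real java binary behind the symlink chain
readlink -f /usr/bin/java
# Example output: /usr/lib/jvm/java-1.8.0-openjdk-<version>/jre/bin/java
# In /etc/profile, set JAVA_HOME to the JDK root (strip the trailing /jre/bin/java), e.g.:
# export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-<version>
source /etc/profile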
Note:
Avoid formatting HDFS more often than necessary.
Contents:
1. Cluster planning
2. Server environment preparation
3. Install the JDK
4. Install ZooKeeper
5. Install Hadoop
1. Cluster Planning
Hostname | Public IP | User | Installed software | Processes | Notes |
---|---|---|---|---|---|
cluster01 | xxx.xxx.116.58 | hadoop | jdk, hadoop | NameNode, DFSZKFailoverController (zkfc) | Alibaba Cloud |
cluster02 | xxx.xxx.208.25 | hadoop | jdk, hadoop | NameNode, DFSZKFailoverController (zkfc) | Alibaba Cloud |
cluster03 | xxx.xxx.53.107 | hadoop | jdk, hadoop | ResourceManager | Huawei Cloud |
cluster04 | xxx.xxx.19.123 | hadoop | jdk, hadoop | ResourceManager | Huawei Cloud |
cluster05 | xxx.xxx.44.196 | hadoop | jdk, hadoop, zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain | Tencent Cloud |
cluster06 | xxx.xxx.64.132 | hadoop | jdk, hadoop, zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain | Baidu Cloud |
cluster07 | xxx.xxx.127.16 | hadoop | jdk, hadoop, zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain | Baidu Cloud |
Three-machine alternative:
Hostname | Processes |
---|---|
cluster01 | NameNode1, zkfc1, rm1, zk1, journalnode1, datanode1, nodemanager1 |
cluster02 | NameNode2, zkfc2, rm2, zk2, journalnode2, datanode2, nodemanager2 |
cluster03 | zk3, journalnode3, datanode3, nodemanager3 |
2. Server Environment Preparation
OS: CentOS 7, 64-bit
Hadoop version: Apache Hadoop hadoop-2.7.2.tar.gz
Link: https://pan.baidu.com/s/1v-CTr9jRqgTBQUbMGCP2Dg
Extraction code: inkq
JDK version: jdk-8u211-linux-x64.tar.gz
Link: https://pan.baidu.com/s/1Ytq8ecZzwRsGmoMw1c00-A
Extraction code: p2ii
ZooKeeper version: zookeeper-3.4.10.tar.gz
Link: https://pan.baidu.com/s/1V2MX4vN1Ycoq6hqCnZ8ZPA
Extraction code: rqq3
2.1 Set the hostname (as root)
2.1.1 Set the hostname: edit /etc/sysconfig/network on each machine with its own hostname
vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=cluster01
NETWORKING=yes
HOSTNAME=cluster02
NETWORKING=yes
HOSTNAME=cluster03
NETWORKING=yes
HOSTNAME=cluster04
NETWORKING=yes
HOSTNAME=cluster05
NETWORKING=yes
HOSTNAME=cluster06
NETWORKING=yes
HOSTNAME=cluster07
2.1.2 Make the hostname take effect
- Log out and back in; the hostname will have changed
hostname cluster05
exit # log out and back in; the hostname will have changed
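On CentOS 7 the /etc/sysconfig/network HOSTNAME entry is not always honored, so a hedged alternative is to set the hostname with hostnamectl (shown for cluster05 as an example; run the matching command on each machine):
# Persistently set the hostname (writes /etc/hostname)
sudo hostnamectl set-hostname cluster05
# Verify
hostnamectl status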
2.2 Set the IP mappings (as root)
2.2.1 Set the IP mappings
vi /etc/hosts
- Use the private (internal) IP for the local machine, e.g.: 172.17.16.17 cluster05
xxx.xxx.116.58 cluster01
xxx.xxx.208.25 cluster02
xxx.xxx.53.107 cluster03
xxx.xxx.19.123 cluster04
172.17.16.17 cluster05
xxx.xxx.64.132 cluster06
xxx.xxx.127.16 cluster07
xxx.xxx.2.171 cluster08
2.3 Create the user (as root)
2.3.1 Create the user on every machine and set its password (a command sketch follows the list)
cluster01---hadoop
cluster02---hadoop
cluster03---hadoop
cluster04---hadoop
cluster05---hadoop
cluster06---hadoop
cluster07---hadoop
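A minimal sketch of the commands, assuming the user is created the same way on every machine (run as root; passwd prompts for the password interactively):
# Create the hadoop user and set its password (repeat on each machine)
useradd hadoop
passwd hadoop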
2.3.2 Give the hadoop user root (sudo) privileges
This must be configured on every machine.
1. vi /etc/sudoers
# Find:
root ALL=(ALL) ALL
# Add below it: <username> ALL=(ALL) ALL
hadoop ALL=(ALL) ALL
2. :wq! to force-save and quit (the file is read-only)
2.3.3 My directory layout
- /home/hadoop: home directory of the hadoop user
  - opt: application install directory
    - installPage: where the install tarballs are kept
      - hadoop-2.7.2.tar.gz
      - jdk-8u211-linux-x64.tar.gz
      - zookeeper-3.4.10.tar.gz
    - hadoop: extraction directory for hadoop-2.7.2.tar.gz
      - hadoop-2.7.2
    - jdk: extraction directory for jdk-8u211-linux-x64.tar.gz
      - jdk1.8.0_211
    - zookeeper: extraction directory for zookeeper-3.4.10.tar.gz
      - zookeeper-3.4.10
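A sketch of the commands for creating this layout up front (the paths follow the listing above; adjust them if yours differ):
# Create the application directories under the hadoop user's home
mkdir -p /home/hadoop/opt/{installPage,hadoop,jdk,zookeeper}
# Put the three tarballs into /home/hadoop/opt/installPage before extracting them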
2.4 Configure passwordless SSH (as the hadoop user)
- Starting HDFS launches NameNode1, NameNode2, DataNode1, DataNode2, DataNode3 (5 processes); this requires SSH remote login
- Starting YARN launches NodeManager1, NodeManager2, NodeManager3 (3 processes); this also requires SSH remote login
- Starting the ResourceManagers does not require SSH; ResourceManager1 and ResourceManager2 are started manually
Notes:
- Starting HDFS on cluster01 requires SSH logins to cluster02, cluster05, cluster06, cluster07
- Starting the ResourceManager on cluster03 requires SSH logins to cluster05, cluster06, cluster07
So strictly only the following is needed:
- cluster01 ----passwordless----> cluster02, cluster05, cluster06, cluster07
- cluster03 ----passwordless----> cluster05, cluster06, cluster07
To make copying data between machines easier, all seven machines here are configured for passwordless login to each other.
2.4.1 Generate the key pair
ssh-keygen -t rsa   # press Enter through all prompts; do not set a passphrase
2.4.2 Distribute the public key
ssh-copy-id <hostname>
- You will be asked for the remote password when distributing the key
- Here the 7 machines distribute keys to each other, 49 copies in total (a loop sketch follows the example)
# ssh-copy-id user@host-ip-or-hostname
# After the key is copied, ssh logins no longer require a password
[hadoop@cluster01 hadoop]$ ssh-copy-id hadoop@cluster01
[hadoop@cluster01 hadoop]$ ssh-copy-id hadoop@cluster02
[hadoop@cluster01 hadoop]$ ssh-copy-id hadoop@cluster03
[hadoop@cluster01 hadoop]$ ssh-copy-id hadoop@cluster04
[hadoop@cluster01 hadoop]$ ssh-copy-id hadoop@cluster05
[hadoop@cluster01 hadoop]$ ssh-copy-id hadoop@cluster06
[hadoop@cluster01 hadoop]$ ssh-copy-id hadoop@cluster07
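A sketch for distributing the key in one loop instead of typing seven commands (run on each machine after ssh-keygen; each host still asks for its password once):
# Copy this machine's public key to every node in the cluster
for host in cluster01 cluster02 cluster03 cluster04 cluster05 cluster06 cluster07; do
    ssh-copy-id hadoop@$host
done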
3. Install the JDK (as the hadoop user)
The JDK must be installed on every machine.
1. Extract the tarball
- tar -zxvf jdk-8u211-linux-x64.tar.gz
2. Edit the profile
- sudo vi /etc/profile
- export JAVA_HOME=<JDK install directory>
- :wq to save and quit
3. Reload the profile
- source /etc/profile
# Taking cluster01 as an example
[hadoop@cluster01 jdk]$ sudo vi /etc/profile
export JAVA_HOME=/home/hadoop/opt/jdk/jdk1.8.0_211
export PATH=$PATH:$JAVA_HOME/bin
# Reload the profile
[hadoop@cluster01 jdk]$ source /etc/profile
# Test
[hadoop@cluster01 jdk]$ java -version
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)
4. Install the ZooKeeper Cluster (as the hadoop user)
Operate on: cluster05
4.1 Modify the configuration
4.1.1 Directory layout of zookeeper-3.4.10
4.1.2 Go into the conf directory
1) Rename the sample configuration file
- mv zoo_sample.cfg zoo.cfg
[hadoop@cluster05 conf]$ mv zoo_sample.cfg zoo.cfg
[hadoop@cluster05 conf]$ ll
total 12
-rw-rw-r-- 1 hadoop hadoop 535 Mar 23 2017 configuration.xsl
-rw-rw-r-- 1 hadoop hadoop 2161 Mar 23 2017 log4j.properties
-rw-rw-r-- 1 hadoop hadoop 922 Mar 23 2017 zoo.cfg
2) Edit the configuration file: set the data directory
- vi zoo.cfg
# The directory must be created in advance (see 4.1.3 below)
dataDir=/home/hadoop/opt/zookeeper/zookeeper-3.4.10/zkData
3) Edit the configuration file: add the cluster nodes
- Append at the end of the file
# server.<id>=<hostname>:2888:3888
# 2888 is used by followers to talk to the leader, 3888 for leader election; the id can be chosen freely but must be unique
server.1=cluster05:2888:3888
server.2=cluster06:2888:3888
server.3=cluster07:2888:3888
4) Save and quit
- :wq
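For reference, a minimal zoo.cfg sketch after these edits, assuming the remaining values are left at the defaults shipped in zoo_sample.cfg:
# zoo.cfg (defaults from zoo_sample.cfg plus the edits above)
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/home/hadoop/opt/zookeeper/zookeeper-3.4.10/zkData
server.1=cluster05:2888:3888
server.2=cluster06:2888:3888
server.3=cluster07:2888:3888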
4.1.3 Create the required files
1) Create the directory referenced by dataDir
- Create zkData in the ZooKeeper install root
- mkdir zkData
[hadoop@cluster05 zookeeper-3.4.10]$ mkdir zkData
2) Go into the zkData directory and create the id file (myid) for this cluster host
- Write the id that matches the server.<id> entries configured above
- cluster05 is 1, cluster06 is 2, cluster07 is 3
# Taking cluster05 as an example: server.1=cluster05:2888:3888 was configured above, so its id is 1
echo 1 > myid
# Check
[hadoop@cluster05 zkData]$ cat myid
1
4.1.4 Copy the configured ZooKeeper to the other machines
scp -r /home/hadoop/opt/zookeeper/zookeeper-3.4.10/ cluster06:/home/hadoop/opt/zookeeper/
scp -r /home/hadoop/opt/zookeeper/zookeeper-3.4.10/ cluster07:/home/hadoop/opt/zookeeper/
4.1.5 Edit the myid files on cluster06 and cluster07 (they must match the configuration file)
# cluster06
[hadoop@cluster06 zkData]$ echo 2 > myid
[hadoop@cluster06 zkData]$ cat myid
2
# cluster07
[hadoop@cluster07 zkData]$ echo 3 > myid
[hadoop@cluster07 zkData]$ cat myid
3
4.2 Start and test the ZooKeeper cluster
4.2.1 Start each node
- ./zkServer.sh start (in the bin directory, on cluster05, cluster06 and cluster07)
- If you do not want to cd into the bin directory every time, add ZooKeeper to the environment variables; a sketch follows
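A minimal sketch for putting the ZooKeeper bin directory on the PATH, mirroring how JAVA_HOME was set earlier (added to /etc/profile on the three ZooKeeper hosts):
# Append to /etc/profile, then run: source /etc/profile
export ZOOKEEPER_HOME=/home/hadoop/opt/zookeeper/zookeeper-3.4.10
export PATH=$PATH:$ZOOKEEPER_HOME/bin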
4.2.2 Check the cluster status
# cluster05
[hadoop@cluster05 zookeeper-3.4.10]$ ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/zookeeper01/opt/zookeeper/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower
# cluster06
[hadoop@cluster06 zookeeper-3.4.10]$ ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/zookeeper01/opt/zookeeper/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: leader
# cluster07
[hadoop@cluster07 zookeeper-3.4.10]$ ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/zookeeper01/opt/zookeeper/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower
Note:
The cluster cannot serve requests unless a majority of its servers are running (with 3 nodes, at least 2 must be up):
[hadoop@cluster05 zookeeper-3.4.10]$ bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/zookeeper01/opt/zookeeper/zookeeper-3.4.10/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
4.3 Client operations
[zookeeper01@cluster05 bin]$ ./zkCli.sh
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 4] help
ZooKeeper -server host:port cmd args
stat path [watch]
set path data [version]
ls path [watch]
delquota [-n|-b] path
ls2 path [watch]
setAcl path acl
setquota -n|-b val path
history
redo cmdno
printwatches on|off
delete path [version]
sync path
listquota path
rmr path
get path [watch]
create [-s] [-e] path data acl
addauth scheme auth
quit
getAcl path
close
connect host:port
[zk: localhost:2181(CONNECTED) 5] ls /
[zookeeper]
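A few basic zkCli commands as a usage sketch (the /test znode is just an example name):
# Create a znode with some data, read it back, then delete it
create /test "hello"
get /test
delete /test
quit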
5. Install the Hadoop HA Cluster (as the hadoop user)
Operate on: cluster01
5.1 Install
5.1.1 Extract
tar -zxvf /home/hadoop/opt/hadoop/hadoop-2.7.2.tar.gz -C /home/hadoop/opt/hadoop
[hadoop@cluster01 hadoop]$ ll
total 4
drwxr-xr-x 9 hadoop hadoop 4096 Jan 26 2016 hadoop-2.7.2
5.1.2 Configure environment variables
[hadoop@cluster01 jdk]$ sudo vi /etc/profile
export JAVA_HOME=/home/hadoop/opt/jdk/jdk1.8.0_211
export HADOOP_HOME=/home/hadoop/opt/hadoop/hadoop-2.7.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# Reload the profile
[hadoop@cluster01 jdk]$ source /etc/profile
5.2 Modify the configuration files
[hadoop@cluster01 hadoop]$ cd /home/hadoop/opt/hadoop/hadoop-2.7.2/etc/hadoop
Six configuration files need to be modified:
- hadoop-env.sh
- core-site.xml
- hdfs-site.xml
- mapred-site.xml
- yarn-site.xml
- slaves
5.2.1 Modify hadoop-env.sh
[hadoop@cluster01 hadoop]$ vi hadoop-env.sh
Set JAVA_HOME in hadoop-env.sh.
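For example, pointing it at the JDK path used earlier (hadoop-env.sh already contains an export JAVA_HOME line; replace its value):
# In hadoop-env.sh
export JAVA_HOME=/home/hadoop/opt/jdk/jdk1.8.0_211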
5.2.2 Modify core-site.xml
[hadoop@cluster01 hadoop]$ vi core-site.xml
core-site.xml
<configuration>
  <!-- Set the HDFS nameservice to ns1 (a nameservice can contain multiple NameNodes) -->
  <!-- There may be several nameservices; clients access HDFS through the nameservice name -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1/</value>
  </property>
  <!-- Hadoop working (tmp) directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/opt/hadoop/hadoop-2.7.2/tmp</value>
  </property>
  <!-- ZooKeeper quorum, as host:port -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>cluster05:2181,cluster06:2181,cluster07:2181</value>
  </property>
  <!-- Maximum number of client connection retries -->
  <property>
    <name>ipc.client.connect.max.retries</name>
    <value>100</value>
  </property>
  <!-- Interval between two connection attempts, in milliseconds -->
  <property>
    <name>ipc.client.connect.retry.interval</name>
    <value>10000</value>
  </property>
</configuration>
5.2.3 Modify hdfs-site.xml
[hadoop@cluster01 hadoop]$ vi hdfs-site.xml
hdfs-site.xml
<configuration>
  <!-- Number of replicas: there are three DataNodes here -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Set the HDFS nameservice to ns1; must match core-site.xml -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <!-- ns1 has two NameNodes: nn1 and nn2 -->
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>
  <!-- RPC address of nn1 -->
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>cluster01:9000</value>
  </property>
  <!-- HTTP address of nn1 -->
  <property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>cluster01:50070</value>
  </property>
  <!-- RPC address of nn2 -->
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>cluster02:9000</value>
  </property>
  <!-- HTTP address of nn2 -->
  <property>
    <name>dfs.namenode.http-address.ns1.nn2</name>
    <value>cluster02:50070</value>
  </property>
  <!-- The JournalNodes store the shared NameNode metadata (here they run on the same hosts as the ZooKeeper quorum) -->
  <!-- Where the NameNode edit log is shared on the JournalNodes -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://cluster05:8485;cluster06:8485;cluster07:8485/ns1</value>
  </property>
  <!-- Where each JournalNode stores its data on local disk -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/opt/hadoop/hadoop-2.7.2/journaldata</value>
  </property>
  <!-- Enable automatic NameNode failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- Failover proxy provider: the class Hadoop ships for this purpose -->
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Fencing methods; multiple methods are separated by newlines, one per line -->
  <!-- Fencing prevents split-brain (two NameNodes serving as active at the same time) -->
  <!-- sshfence: ssh into the failed NameNode host and kill the process -->
  <!-- If sshd does not listen on port 22, use sshfence(username:port) -->
  <!-- shell: fallback when the ssh login times out or gets no response (a custom script) -->
  <!-- shell(/bin/true): no script is defined here, so just return true and fail over -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(/bin/true)
    </value>
  </property>
  <!-- sshfence needs passwordless ssh; point it at the private key -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <!-- sshfence connect timeout, in ms (30000 ms = 30 s) -->
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
</configuration>
5.2.4 Modify mapred-site.xml.template
# Rename the configuration file first
[hadoop@cluster01 hadoop]$ mv mapred-site.xml.template mapred-site.xml
[hadoop@cluster01 hadoop]$ vi mapred-site.xml
mapred-site.xml
<configuration>
  <!-- Run the MapReduce framework on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
5.2.5 Modify yarn-site.xml
[hadoop@cluster01 hadoop]$ vi yarn-site.xml
yarn-site.xml
<configuration>
  <!-- Enable ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Cluster id for the RM pair: there are two ResourceManagers, grouped under one id (user-defined) -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yrc</value>
  </property>
  <!-- Logical names of the ResourceManagers in this group (user-defined) -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- Hostnames of the two ResourceManagers -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>cluster03</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>cluster04</value>
  </property>
  <!-- ZooKeeper quorum address -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>cluster05:2181,cluster06:2181,cluster07:2181</value>
  </property>
  <!-- Reducers fetch data via mapreduce_shuffle -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
5.2.6 Modify slaves
The slaves file lists where the worker nodes are: the NameNode reads slaves to find the DataNode hosts, and the ResourceManager reads slaves to find the NodeManager hosts.
- HDFS is started on cluster01, which in turn starts the DataNodes
  - The DataNodes are on cluster05, cluster06, cluster07
  - So the slaves file on cluster01 must list the DataNode hosts
- YARN is started on cluster03, which in turn starts the NodeManagers
  - The NodeManagers are on cluster05, cluster06, cluster07
  - So the slaves file on cluster03 must list the NodeManager hosts
Hence cluster01 and cluster03 have the same slaves content (adjust to your own layout); the other machines do not need it.
# Edit the slaves file
[hadoop@cluster01 hadoop]$ vi slaves
cluster05
cluster06
cluster07
5.3 Configure passwordless login
If this was already done in section 2.4, it can be skipped here.
- Configure passwordless login from cluster01 to cluster01, cluster02, cluster05, cluster06, cluster07
  - The NameNodes and DataNodes are started via remote (ssh) login
- Configure passwordless login from cluster03 to cluster05, cluster06, cluster07
  - The ResourceManager is started locally; the NodeManagers are started via remote (ssh) login
5.4 Copy the configured Hadoop to the other machines
This can take a while.
[hadoop@cluster01 opt]$ cd /home/hadoop/opt/
[hadoop@cluster01 opt]$ scp -r hadoop/ cluster02:/home/hadoop/opt/
[hadoop@cluster01 opt]$ scp -r hadoop/ cluster03:/home/hadoop/opt/
[hadoop@cluster01 opt]$ scp -r hadoop/ cluster04:/home/hadoop/opt/
[hadoop@cluster01 opt]$ scp -r hadoop/ cluster05:/home/hadoop/opt/
[hadoop@cluster01 opt]$ scp -r hadoop/ cluster06:/home/hadoop/opt/
[hadoop@cluster01 opt]$ scp -r hadoop/ cluster07:/home/hadoop/opt/
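If rsync is available on the machines, a hedged alternative that skips the bundled documentation to speed the copy up (an optional optimization, not part of the original steps):
# Same copy with rsync, excluding the large share/doc directory
for host in cluster02 cluster03 cluster04 cluster05 cluster06 cluster07; do
    rsync -a --exclude 'share/doc' hadoop/ $host:/home/hadoop/opt/hadoop/
done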
5.5 Start the cluster
Note:
Follow the startup order below strictly.
5.5.1 Start the ZooKeeper cluster
- Operate on: cluster05, cluster06, cluster07
- If it is already running (starting it again may report the port as in use), it is enough that the QuorumPeerMain process is up
[hadoop@cluster05 zookeeper-3.4.10]$ cd /home/hadoop/opt/zookeeper/zookeeper-3.4.10
# start cluster05
[hadoop@cluster05 zookeeper-3.4.10]$ bin/zkServer.sh start
[hadoop@cluster05 zookeeper-3.4.10]$ jps
26065 QuorumPeerMain
26091 Jps
# start cluster06
[hadoop@cluster06 zookeeper-3.4.10]$ bin/zkServer.sh start
[hadoop@cluster06 zookeeper-3.4.10]$ jps
26065 QuorumPeerMain
26091 Jps
# start cluster07
[hadoop@cluster07 zookeeper-3.4.10]$ bin/zkServer.sh start
[hadoop@cluster07 zookeeper-3.4.10]$ jps
26065 QuorumPeerMain
26091 Jps
5.5.2 Start the JournalNodes
- Operate on: cluster05, cluster06, cluster07
[hadoop@cluster05 zookeeper-3.4.10]$ cd /home/hadoop/opt/hadoop/hadoop-2.7.2
# cluster05
[hadoop@cluster05 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start journalnode
[hadoop@cluster05 hadoop-2.7.2]$ jps
20928 Jps
20822 JournalNode
3129 QuorumPeerMain
# cluster06
[hadoop@cluster06 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start journalnode
[hadoop@cluster06 hadoop-2.7.2]$ jps
935 JournalNode
1002 Jps
18222 QuorumPeerMain
# cluster07
[hadoop@cluster07 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start journalnode
[hadoop@cluster07 hadoop-2.7.2]$ jps
15392 Jps
15338 JournalNode
25131 QuorumPeerMain
5.5.3 Format HDFS
- Operate on: cluster01
[hadoop@cluster01 hadoop-2.7.2]$ cd /home/hadoop/opt/hadoop/hadoop-2.7.2
[hadoop@cluster01 hadoop-2.7.2]$ bin/hdfs namenode -format
5.5.4 Copy the files generated under hadoop.tmp.dir to the other NameNode
- Operate on: cluster01
- This ensures both NameNodes start from the same initial fsimage
[hadoop@cluster01 hadoop-2.7.2]$ pwd
/home/hadoop/opt/hadoop/hadoop-2.7.2
[hadoop@cluster01 hadoop-2.7.2]$ scp -r tmp cluster02:/home/hadoop/opt/hadoop/hadoop-2.7.2/
5.5.5 Format the ZKFC state in ZooKeeper
- Operate on: cluster01
[hadoop@cluster01 hadoop-2.7.2]$ bin/hdfs zkfc -formatZK
5.5.6 Start HDFS
- Operate on: cluster01
[hadoop@cluster01 hadoop-2.7.2]$ sbin/start-dfs.sh
# cluster01
Starting namenodes on [cluster01 cluster02]
cluster01: starting namenode, logging to /home/hadoop/opt/hadoop/hadoop-2.7.2/logs/hadoop-hadoop-namenode-cluster01.out
cluster02: starting namenode, logging to /home/hadoop/opt/hadoop/hadoop-2.7.2/logs/hadoop-hadoop-namenode-cluster02.out
cluster05: starting datanode, logging to /home/hadoop/opt/hadoop/hadoop-2.7.2/logs/hadoop-hadoop-datanode-cluster05.out
cluster07: starting datanode, logging to /home/hadoop/opt/hadoop/hadoop-2.7.2/logs/hadoop-hadoop-datanode-cluster07.out
cluster06: starting datanode, logging to /home/hadoop/opt/hadoop/hadoop-2.7.2/logs/hadoop-hadoop-datanode-cluster06.out
Starting journal nodes [cluster05 cluster06 cluster07]
cluster05: journalnode running as process 30338. Stop it first.
cluster07: journalnode running as process 22724. Stop it first.
cluster06: journalnode running as process 12382. Stop it first.
Starting ZK Failover Controllers on NN hosts [cluster01 cluster02]
cluster01: starting zkfc, logging to /home/hadoop/opt/hadoop/hadoop-2.7.2/logs/hadoop-hadoop-zkfc-cluster01.out
cluster02: starting zkfc, logging to /home/hadoop/opt/hadoop/hadoop-2.7.2/logs/hadoop-hadoop-zkfc-cluster02.out
[hadoop@cluster01 hadoop-2.7.2]$ jps
6567 NameNode
6956 Jps
6862 DFSZKFailoverController
# cluster02
[hadoop@cluster02 hadoop-2.7.2]$ jps
12274 NameNode
12371 DFSZKFailoverController
12491 Jps
# cluster05
[hadoop@cluster05 zookeeper-3.4.10]$ jps
4496 Jps
30338 JournalNode
3416 DataNode
10494 QuorumPeerMain
# cluster06
[hadoop@cluster06 hadoop-2.7.2]$ jps
21738 DataNode
7707 QuorumPeerMain
23805 Jps
12382 JournalNode
# cluster07
[hadoop@cluster07 hadoop-2.7.2]$ jps
22724 JournalNode
19094 QuorumPeerMain
23980 Jps
23695 DataNode
5.5.7 Start YARN (ResourceManager1)
- Operate on: cluster03
[hadoop@cluster03 hadoop-2.7.2]$ sbin/start-yarn.sh
5.5.8 Start YARN (ResourceManager2)
- Operate on: cluster04
[hadoop@cluster04 hadoop-2.7.2]$ sbin/yarn-daemon.sh start resourcemanager
5.5.9 Check that every process started correctly
# cluster01
[hadoop@cluster01 hadoop-2.7.2]$ jps
11440 DFSZKFailoverController
11138 NameNode
11604 Jps
# cluster02
[hadoop@cluster02 hadoop-2.7.2]$ jps
16112 NameNode
16209 DFSZKFailoverController
16389 Jps
# cluster03
[hadoop@cluster03 ~]$ jps
18345 Jps
17946 ResourceManager
# cluster04
[hadoop@cluster04 ~]$ jps
25953 ResourceManager
25992 Jps
# cluster05
[hadoop@cluster05 ~]$ jps
30176 JournalNode
31956 NodeManager
31417 DataNode
906 Jps
10494 QuorumPeerMain
# cluster06
[hadoop@cluster06 ~]$ jps
11216 DataNode
12086 NodeManager
14391 Jps
7707 QuorumPeerMain
9661 JournalNode
# cluster07
[hadoop@cluster07 ~]$ jps
21777 Jps
21458 NodeManager
19094 QuorumPeerMain
21255 DataNode
21032 JournalNode
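Besides jps, the HA state can be checked with the built-in admin commands (run from the Hadoop install directory; nn1/nn2 and rm1/rm2 are the ids configured earlier):
# Which NameNode is active / standby
bin/hdfs haadmin -getServiceState nn1
bin/hdfs haadmin -getServiceState nn2
# Which ResourceManager is active / standby
bin/yarn rmadmin -getServiceState rm1
bin/yarn rmadmin -getServiceState rm2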
5.5.10 Web access
The corresponding ports must be open (cloud security group / firewall rules).
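With those ports open, the web UIs are reachable at addresses like the following (50070 comes from the hdfs-site.xml settings above; 8088 is the default ResourceManager web port in Hadoop 2.x):
# NameNode UIs (one shows active, the other standby)
http://cluster01:50070
http://cluster02:50070
# ResourceManager UIs
http://cluster03:8088
http://cluster04:8088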