Building a Fully Distributed Hadoop HA Cluster Across Multiple Servers

Setting up Hadoop HA in fully distributed mode

Notes: the cluster is built across multiple servers.
  Common problems: ports not open, IPs blocked, wrong JDK version, JAVA_HOME not set correctly…
  Check the log files when troubleshooting.
  If you do not have enough servers, three machines will do, or you can simulate the cluster with VMware.
  On a Huawei Cloud 1-core / 1 GB server, starting the cluster with jdk-8u211-linux-x64.tar.gz may fail with a JDK version mismatch.
  Fix:

  • Install OpenJDK: sudo yum install -y java-1.8.0-openjdk
  • Find the Java home directory and set JAVA_HOME again (see the sketch below)
    1. which java
    2. ls -lrt /usr/bin/java
    3. ls -lrt /etc/alternatives/java
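
A minimal sketch of that lookup, assuming the OpenJDK package landed under /usr/lib/jvm (the exact directory name varies between builds):

# which java points at a chain of symlinks; readlink -f resolves it in one step
readlink -f $(which java)
# prints something like /usr/lib/jvm/java-1.8.0-openjdk-<version>/jre/bin/java
# JAVA_HOME is the directory above bin/, so set it in /etc/profile accordingly:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-<version>/jre   # substitute the real path
export PATH=$PATH:$JAVA_HOME/bin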

Note: avoid formatting HDFS more often than necessary.

Contents:

   1. Cluster plan

   2. Server environment preparation

   3. Install the JDK

   4. Install ZooKeeper

   5. Install Hadoop

1. Cluster Plan

Hostname | Public IP | User | Installed software | Processes started | Notes
cluster01 | xxx.xxx.116.58 | hadoop | jdk, hadoop | NameNode, DFSZKFailoverController (zkfc) | Alibaba Cloud
cluster02 | xxx.xxx.208.25 | hadoop | jdk, hadoop | NameNode, DFSZKFailoverController (zkfc) | Alibaba Cloud
cluster03 | xxx.xxx.53.107 | hadoop | jdk, hadoop | ResourceManager | Huawei Cloud
cluster04 | xxx.xxx.19.123 | hadoop | jdk, hadoop | ResourceManager | Huawei Cloud
cluster05 | xxx.xxx.44.196 | hadoop | jdk, hadoop, zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain | Tencent Cloud
cluster06 | xxx.xxx.64.132 | hadoop | jdk, hadoop, zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain | Baidu Cloud
cluster07 | xxx.xxx.127.16 | hadoop | jdk, hadoop, zookeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain | Baidu Cloud
Three-machine variant (if you only have three servers):
Hostname | Processes started
cluster01 | NameNode1, zkfc1, rm1, zk1, journalnode1, datanode1, nodemanager1
cluster02 | NameNode2, zkfc2, rm2, zk2, journalnode2, datanode2, nodemanager2
cluster03 | zk3, journalnode3, datanode3, nodemanager3

2. Server Environment Preparation

OS: CentOS 7, 64-bit

Hadoop version: Apache Hadoop hadoop-2.7.2.tar.gz
Link: https://pan.baidu.com/s/1v-CTr9jRqgTBQUbMGCP2Dg
Extraction code: inkq

JDK version: jdk-8u211-linux-x64.tar.gz
Link: https://pan.baidu.com/s/1Ytq8ecZzwRsGmoMw1c00-A
Extraction code: p2ii

ZooKeeper version: zookeeper-3.4.10.tar.gz
Link: https://pan.baidu.com/s/1V2MX4vN1Ycoq6hqCnZ8ZPA
Extraction code: rqq3

2.1 Set the hostname (log in as root)

2.1.1 Set the hostname
  vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=cluster01

NETWORKING=yes
HOSTNAME=cluster02

NETWORKING=yes
HOSTNAME=cluster03

NETWORKING=yes
HOSTNAME=cluster04

NETWORKING=yes
HOSTNAME=cluster05

NETWORKING=yes
HOSTNAME=cluster06

NETWORKING=yes
HOSTNAME=cluster07
2.1.2 Make the hostname take effect
  • Log out and back in; the hostname will have changed
  hostname cluster05
  exit   # log out and back in; the hostname will have changed
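
On CentOS 7 the hostname written to /etc/sysconfig/network may not survive a reboot; hostnamectl is the systemd way to persist it. A minimal sketch, run as root on each node with that node's own name:

hostnamectl set-hostname cluster05   # persists across reboots on CentOS 7
hostname                             # verify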

2.2 Set up IP mappings (log in as root)

2.2.1 Set up IP mappings

vi /etc/hosts

  • Use the private (internal) IP for the local machine, e.g. 172.17.16.17 cluster05
xxx.xxx.116.58     cluster01
xxx.xxx.208.25     cluster02
xxx.xxx.53.107     cluster03
xxx.xxx.19.123     cluster04
172.17.16.17       cluster05
xxx.xxx.64.132     cluster06
xxx.xxx.127.16     cluster07
xxx.xxx.2.171      cluster08
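
A quick hedged check that the mapping resolves as expected (run from any node; hostnames follow the plan above):

getent hosts cluster01      # should print the IP configured in /etc/hosts
ping -c 1 cluster01         # may be blocked between clouds; getent alone confirms local resolution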

2.3 Create the user (log in as root)

2.3.1 Create the user and set its password (a hadoop user on every host; see the sketch after the list)
cluster01---hadoop
cluster02---hadoop
cluster03---hadoop
cluster04---hadoop
cluster05---hadoop
cluster06---hadoop
cluster07---hadoop
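
A minimal sketch of the commands behind that list, run as root on each node (the username hadoop matches the cluster plan):

useradd hadoop      # create the user
passwd hadoop       # set its password interactively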
2.3.2 Give the hadoop user root (sudo) privileges

Configure this on every host.
1. vi /etc/sudoers

# find:
root   ALL=(ALL) ALL
# add below it:  <username>  ALL=(ALL) ALL
hadoop    ALL=(ALL) ALL

2. :wq! to force-save and quit (the file is read-only)

2.3.3 My directory layout


  • /home/hadoop: the hadoop user's home directory
    • opt: application install directory
      • installPage: where the installation tarballs are kept
        • hadoop-2.7.2.tar.gz
        • jdk-8u211-linux-x64.tar.gz
        • zookeeper-3.4.10.tar.gz
      • hadoop: extraction directory for hadoop-2.7.2.tar.gz
        • hadoop-2.7.2
      • jdk: extraction directory for jdk-8u211-linux-x64.tar.gz
        • jdk1.8.0_211
      • zookeeper: extraction directory for zookeeper-3.4.10.tar.gz
        • zookeeper-3.4.10
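
A minimal sketch to lay the tree out on each node (names follow the list above):

mkdir -p /home/hadoop/opt/{installPage,hadoop,jdk,zookeeper}
# upload the three tarballs into /home/hadoop/opt/installPage, then extract each into
# its matching directory, for example:
tar -zxvf /home/hadoop/opt/installPage/zookeeper-3.4.10.tar.gz -C /home/hadoop/opt/zookeeper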

2.4 Configure passwordless SSH (log in as the hadoop user)

  • Starting HDFS brings up 5 processes (NameNode1, NameNode2, DataNode1, DataNode2, DataNode3), which requires remote SSH logins
  • Starting YARN brings up the 3 NodeManager processes (NodeManager1, NodeManager2, NodeManager3), which also requires remote SSH logins
  • Starting the ResourceManagers does not need remote SSH; ResourceManager1 and ResourceManager2 are started by hand

Notes:

  • Starting HDFS on cluster01 needs SSH logins to cluster02, cluster05, cluster06, cluster07
  • Starting the ResourceManager on cluster03 needs SSH logins to cluster05, cluster06, cluster07

So strictly only the following needs to be configured:

  • cluster01 ----passwordless----> cluster02, cluster05, cluster06, cluster07
  • cluster03 ----passwordless----> cluster05, cluster06, cluster07

For convenience when moving data around, all seven machines here were configured for passwordless login to one another.

2.4.1 Generate a key pair

ssh-keygen -t rsa   # press Enter at every prompt; do not set a passphrase

2.4.2 Distribute the public key

ssh-copy-id <hostname>

  • You are prompted for the remote user's password while distributing the key
  • With 7 machines each copying to all 7, the key was distributed 49 times (a loop that automates this follows the block below)
# ssh-copy-id <user>@<host IP or hostname>
# once copied, ssh logs in without a password
[hadoop@cluster01 hadoop]$ ssh-copy-id hadoop@cluster01
[hadoop@cluster01 hadoop]$ ssh-copy-id hadoop@cluster02
[hadoop@cluster01 hadoop]$ ssh-copy-id hadoop@cluster03
[hadoop@cluster01 hadoop]$ ssh-copy-id hadoop@cluster04
[hadoop@cluster01 hadoop]$ ssh-copy-id hadoop@cluster05
[hadoop@cluster01 hadoop]$ ssh-copy-id hadoop@cluster06
[hadoop@cluster01 hadoop]$ ssh-copy-id hadoop@cluster07
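
A hedged sketch of that loop, to be run as hadoop on each of the 7 machines (it still asks for the password once per host):

for host in cluster01 cluster02 cluster03 cluster04 cluster05 cluster06 cluster07; do
    ssh-copy-id hadoop@$host
done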

3. Install the JDK (as the hadoop user)

Install it on every machine.

1. Extract the archive

  • tar -zxvf jdk-8u211-linux-x64.tar.gz

2. Edit the profile

  • sudo vi /etc/profile
  • export JAVA_HOME=<JDK install directory>
  • :wq to save and quit

3. Apply the changes

  • source /etc/profile
# example: cluster01
[hadoop@cluster01 jdk]$ sudo vi /etc/profile

export JAVA_HOME=/home/hadoop/opt/jdk/jdk1.8.0_211
export PATH=$PATH:$JAVA_HOME/bin

# apply the changes
[hadoop@cluster01 jdk]$  source /etc/profile

# test
[hadoop@cluster01 jdk]$ java -version
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)

4. Install the ZooKeeper Cluster (as the hadoop user)

Working host: cluster05

4.1 Edit the configuration

4.1.1 zookeeper-3.4.10 directory structure


4.1.2 Go into the conf directory

1) Rename the sample configuration file

  • mv zoo_sample.cfg zoo.cfg
[hadoop@cluster05 conf]$ mv zoo_sample.cfg  zoo.cfg
[hadoop@cluster05 conf]$ ll
total 12
-rw-rw-r-- 1 hadoop hadoop  535 Mar 23  2017 configuration.xsl
-rw-rw-r-- 1 hadoop hadoop 2161 Mar 23  2017 log4j.properties
-rw-rw-r-- 1 hadoop hadoop  922 Mar 23  2017 zoo.cfg

2) Edit the configuration: set the data directory

  • vi zoo.cfg
# create this directory beforehand
dataDir=/home/hadoop/opt/zookeeper/zookeeper-3.4.10/zkData

3) Edit the configuration: add the cluster nodes

  • append at the end of the file
# server.<id>=<hostname>:2888:3888
# 2888:3888 are the ports for peer communication and leader election; the id may be any number but must be unique

server.1=cluster05:2888:3888
server.2=cluster06:2888:3888
server.3=cluster07:2888:3888

4) Save and quit

  • :wq
4.1.3 Create the required files

1) Create the directory referenced by dataDir

  • create zkData in the ZooKeeper install root
  • mkdir zkData
[hadoop@cluster05 zookeeper-3.4.10]$ mkdir zkData

2) Go into zkData and create the myid file for this cluster member

  • create the myid file matching the server.<id> entries configured above
  • cluster05 is 1, cluster06 is 2, cluster07 is 3
# example: cluster05; server.1=cluster05:2888:3888 was configured above, so its id is 1
echo 1 > myid
# check
[hadoop@cluster05 zkData]$ cat myid
1
4.1.4 Copy the configured ZooKeeper to the other machines
scp -r /home/hadoop/opt/zookeeper/zookeeper-3.4.10/   cluster06:/home/hadoop/opt/zookeeper/

scp -r /home/hadoop/opt/zookeeper/zookeeper-3.4.10/    cluster07:/home/hadoop/opt/zookeeper/
4.1.5 Update the myid files on cluster06 and cluster07 (they must match the configuration)
# cluster06
[hadoop@cluster06 zkData]$ echo 2 > myid
[hadoop@cluster06 zkData]$ cat myid
2

# cluster07
[hadoop@cluster07 zkData]$ echo 3 > myid
[hadoop@cluster07 zkData]$ cat myid
3

4.2 Start and test the ZooKeeper cluster

4.2.1 Start each node
  • ./zkServer.sh start
  • If you'd rather not cd into bin each time, add ZooKeeper to the environment (see the sketch below)
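
A hedged sketch of that environment change, mirroring the JAVA_HOME entries added to /etc/profile earlier (ZOOKEEPER_HOME is just a convenient name, not something ZooKeeper requires):

# in /etc/profile, then: source /etc/profile
export ZOOKEEPER_HOME=/home/hadoop/opt/zookeeper/zookeeper-3.4.10
export PATH=$PATH:$ZOOKEEPER_HOME/bin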
4.2.2 Check the cluster status
# cluster05
[hadoop@cluster05 zookeeper-3.4.10]$ ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/opt/zookeeper/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower


# cluster06
[hadoop@cluster06 zookeeper-3.4.10]$ ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/opt/zookeeper/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: leader


# cluster07 
[hadoop@cluster07 zookeeper-3.4.10]$ ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/opt/zookeeper/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower

Note: the ensemble only works while a majority of its servers are running (for 3 nodes, at least 2).

[hadoop@cluster05 zookeeper-3.4.10]$ bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/opt/zookeeper/zookeeper-3.4.10/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.

4.3 Client operations

[hadoop@cluster05 bin]$ ./zkCli.sh


WATCHER::

WatchedEvent state:SyncConnected type:None path:null

[zk: localhost:2181(CONNECTED) 4] help

ZooKeeper -server host:port cmd args
        stat path [watch]
        set path data [version]
        ls path [watch]
        delquota [-n|-b] path
        ls2 path [watch]
        setAcl path acl
        setquota -n|-b val path
        history 
        redo cmdno
        printwatches on|off
        delete path [version]
        sync path
        listquota path
        rmr path
        get path [watch]
        create [-s] [-e] path data acl
        addauth scheme auth
        quit 
        getAcl path
        close 
        connect host:port
[zk: localhost:2181(CONNECTED) 5] ls /
[zookeeper]

5. Install the Hadoop HA Cluster (as the hadoop user)

Working host: cluster01

5.1 Installation

5.1.1 Extract
tar -zxvf /home/hadoop/opt/hadoop/hadoop-2.7.2.tar.gz -C /home/hadoop/opt/hadoop

[hadoop@cluster01 hadoop]$ ll
total 4
drwxr-xr-x 9 hadoop hadoop 4096 Jan 26  2016 hadoop-2.7.2
5.1.2 Configure environment variables
[hadoop@cluster01 jdk]$ sudo vi /etc/profile

export JAVA_HOME=/home/hadoop/opt/jdk/jdk1.8.0_211
export HADOOP_HOME=/home/hadoop/opt/hadoop/hadoop-2.7.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# apply the changes
[hadoop@cluster01 jdk]$  source /etc/profile

5.2 Edit the configuration files

[hadoop@cluster01 hadoop]$ cd /home/hadoop/opt/hadoop/hadoop-2.7.2/etc/hadoop

Six configuration files need to be edited:

  • hadoop-env.sh
  • core-site.xml
  • hdfs-site.xml
  • mapred-site.xml
  • yarn-site.xml
  • slaves
5.2.1 Edit hadoop-env.sh
[hadoop@cluster01 hadoop]$ vi hadoop-env.sh

hadoop-env.sh: set JAVA_HOME to the JDK install directory, e.g.
export JAVA_HOME=/home/hadoop/opt/jdk/jdk1.8.0_211

5.2.2 Edit core-site.xml
[hadoop@cluster01 hadoop]$ vi core-site.xml

core-site.xml

<configuration>

    <!-- HDFS nameservice ns1 (a nameservice groups multiple NameNodes) -->
    <!-- There can be several nameservices; clients address HDFS by the nameservice name -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns1/</value>
    </property>

    <!-- Hadoop working (tmp) directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/opt/hadoop/hadoop-2.7.2/tmp</value>
    </property>

    <!-- ZooKeeper quorum, as hostname:port -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>cluster05:2181,cluster06:2181,cluster07:2181</value>
    </property>

    <!-- Maximum number of client connection retries -->
    <property>
        <name>ipc.client.connect.max.retries</name>
        <value>100</value>
    </property>
    <!-- Interval between reconnection attempts, in milliseconds -->
    <property>
        <name>ipc.client.connect.retry.interval</name>
        <value>10000</value>
    </property>

</configuration>
5.2.3 Edit hdfs-site.xml
[hadoop@cluster01 hadoop]$ vi  hdfs-site.xml

hdfs-site.xml

<configuration>

    <!-- Number of replicas: there are three DataNodes here -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>

    <!-- HDFS nameservice ns1; must match core-site.xml -->
    <property>
        <name>dfs.nameservices</name>
        <value>ns1</value>
    </property>

    <!-- ns1 has two NameNodes, nn1 and nn2 -->
    <property>
        <name>dfs.ha.namenodes.ns1</name>
        <value>nn1,nn2</value>
    </property>

    <!-- RPC address of nn1 -->
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn1</name>
        <value>cluster01:9000</value>
    </property>
    <!-- HTTP address of nn1 -->
    <property>
        <name>dfs.namenode.http-address.ns1.nn1</name>
        <value>cluster01:50070</value>
    </property>

    <!-- RPC address of nn2 -->
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn2</name>
        <value>cluster02:9000</value>
    </property>
    <!-- HTTP address of nn2 -->
    <property>
        <name>dfs.namenode.http-address.ns1.nn2</name>
        <value>cluster02:50070</value>
    </property>

    <!-- The JournalNodes hold the shared NameNode metadata (here they run on the same hosts as ZooKeeper) -->
    <!-- Where the NameNode edit log is stored on the JournalNodes -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://cluster05:8485;cluster06:8485;cluster07:8485/ns1</value>
    </property>

    <!-- Where each JournalNode keeps its data on local disk -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/hadoop/opt/hadoop/hadoop-2.7.2/journaldata</value>
    </property>

    <!-- Enable automatic NameNode failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>

    <!-- Failover proxy provider: the class Hadoop ships for this purpose -->
    <property>
        <name>dfs.client.failover.proxy.provider.ns1</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <!-- Fencing methods, one per line; they prevent split-brain (two active NameNodes at once) -->
    <!-- sshfence: log in to the failed NameNode and kill it; use sshfence(username:port) if sshd is not on port 22 -->
    <!-- shell: fallback when the ssh login hangs or times out; shell(/bin/true) returns true so failover proceeds -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
    </property>

    <!-- sshfence needs passwordless ssh; point it at the private key -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>

    <!-- sshfence connection timeout in ms (30000 ms = 30 s) -->
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>

</configuration>
5.2.4 Edit mapred-site.xml.template
# rename the configuration file
[hadoop@cluster01 hadoop]$ mv mapred-site.xml.template mapred-site.xml
[hadoop@cluster01 hadoop]$ vi mapred-site.xml

mapred-site.xml

<configuration>
	<!-- Run MapReduce on the YARN framework -->
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
</configuration>
5.2.5 Edit yarn-site.xml
[hadoop@cluster01 hadoop]$ vi yarn-site.xml

yarn-site.xml

<configuration>

    <!-- Enable ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>

    <!-- RM cluster id: the two ResourceManagers share one (user-chosen) group id -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yrc</value>
    </property>

    <!-- Logical names (user-chosen) of the RMs in this group -->
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>

    <!-- Hostname of each RM -->
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>cluster03</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>cluster04</value>
    </property>

    <!-- ZooKeeper ensemble address -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>cluster05:2181,cluster06:2181,cluster07:2181</value>
    </property>

    <!-- Reducers fetch map output through mapreduce_shuffle -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

</configuration>
5.2.6 Edit slaves

The slaves file lists where the worker nodes live: the NameNode reads it to find the DataNode hosts, and the ResourceManager reads it to find the NodeManager hosts.

  • HDFS is started on cluster01, which in turn starts the DataNodes
    • the DataNodes run on cluster05, cluster06, cluster07
    • so slaves on cluster01 must list the DataNode hosts
  • YARN is started on cluster03, which in turn starts the NodeManagers
    • the NodeManagers run on cluster05, cluster06, cluster07
    • so slaves on cluster03 must list the NodeManager hosts

The slaves files on cluster01 and cluster03 are therefore identical (adjust to your own layout); the other machines do not need it.

[hadoop@cluster01 hadoop]$ vi slaves
cluster05
cluster06
cluster07

5.3 Configure passwordless login

If this was already configured earlier, skip this step.

  • Configure passwordless login from cluster01 to cluster01, cluster02, cluster05, cluster06, cluster07
    • the NameNodes and DataNodes are started through remote logins
  • Configure passwordless login from cluster03 to cluster05, cluster06, cluster07
    • the ResourceManager starts through a local login, while the NodeManagers start through remote logins

5.4 Copy the configured Hadoop to the other machines

This step can take quite a while.

[hadoop@cluster01 opt]$ cd /home/hadoop/opt/
[hadoop@cluster01 opt]$ scp -r  hadoop/ cluster02:/home/hadoop/opt/
[hadoop@cluster01 opt]$ scp -r  hadoop/ cluster03:/home/hadoop/opt/
[hadoop@cluster01 opt]$ scp -r  hadoop/ cluster04:/home/hadoop/opt/
[hadoop@cluster01 opt]$ scp -r  hadoop/ cluster05:/home/hadoop/opt/
[hadoop@cluster01 opt]$ scp -r  hadoop/ cluster06:/home/hadoop/opt/
[hadoop@cluster01 opt]$ scp -r  hadoop/ cluster07:/home/hadoop/opt/
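
If the full copy is too slow, a hedged alternative is to sync only what changed, assuming rsync is installed on both ends:

# transfers only files that differ, so repeat runs after config tweaks are fast
[hadoop@cluster01 opt]$ rsync -avz hadoop/ cluster02:/home/hadoop/opt/hadoop/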

5.5 Start the cluster

Note: start the components strictly in this order.

5.5.1 Start the ZooKeeper cluster
  • Working hosts: cluster05, cluster06, cluster07
  • If it is already running (starting again may report the port as in use), just make sure the QuorumPeerMain process is healthy
[hadoop@cluster05 zookeeper-3.4.10]$ cd /home/hadoop/opt/zookeeper/zookeeper-3.4.10
# start cluster05
[hadoop@cluster05 zookeeper-3.4.10]$ bin/zkServer.sh start

[hadoop@cluster05 zookeeper-3.4.10]$ jps
26065 QuorumPeerMain
26091 Jps

# start cluster06
[hadoop@cluster06 zookeeper-3.4.10]$ bin/zkServer.sh start

[hadoop@cluster06 zookeeper-3.4.10]$ jps
26065 QuorumPeerMain
26091 Jps

# start cluster07
[hadoop@cluster07 zookeeper-3.4.10]$ bin/zkServer.sh start

[hadoop@cluster07 zookeeper-3.4.10]$ jps
26065 QuorumPeerMain
26091 Jps
5.5.2 Start the JournalNodes
  • Working hosts: cluster05, cluster06, cluster07
[hadoop@cluster05 zookeeper-3.4.10]$ cd /home/hadoop/opt/hadoop/hadoop-2.7.2
# cluster05
[hadoop@cluster05 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start journalnode
[hadoop@cluster05 hadoop-2.7.2]$ jps
20928 Jps
20822 JournalNode
3129 QuorumPeerMain

# cluster06
[hadoop@cluster06 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start journalnode
[hadoop@cluster06 hadoop-2.7.2]$ jps
935 JournalNode
1002 Jps
18222 QuorumPeerMain

#cluster07
[hadoop@cluster07 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start journalnode
[hadoop@cluster07 hadoop-2.7.2]$ jps
15392 Jps
15338 JournalNode
25131 QuorumPeerMain
5.5.3 Format HDFS
  • Working host: cluster01
[hadoop@cluster01 ~]$ cd /home/hadoop/opt/hadoop/hadoop-2.7.2
[hadoop@cluster01 hadoop-2.7.2]$ bin/hdfs namenode -format
5.5.4 Copy the files generated under hadoop.tmp.dir to the other NameNode
  • Working host: cluster01
  • This ensures the two NameNodes start from the same initial fsimage
[hadoop@cluster01 hadoop-2.7.2]$ pwd
/home/hadoop/opt/hadoop/hadoop-2.7.2

[hadoop@cluster01 hadoop-2.7.2]$ scp -r tmp cluster02:/home/hadoop/opt/hadoop/hadoop-2.7.2/
5.5.5 Format the HA state in ZooKeeper (ZKFC)
  • Working host: cluster01
[hadoop@cluster01 hadoop-2.7.2]$ bin/hdfs zkfc -formatZK
5.5.6 Start HDFS
  • Working host: cluster01
[hadoop@cluster01 hadoop-2.7.2]$ sbin/start-dfs.sh
# cluster01
Starting namenodes on [cluster01 cluster02]
cluster01: starting namenode, logging to /home/hadoop/opt/hadoop/hadoop-2.7.2/logs/hadoop-hadoop-namenode-cluster01.out
cluster02: starting namenode, logging to /home/hadoop/opt/hadoop/hadoop-2.7.2/logs/hadoop-hadoop-namenode-cluster02.out
cluster05: starting datanode, logging to /home/hadoop/opt/hadoop/hadoop-2.7.2/logs/hadoop-hadoop-datanode-cluster05.out
cluster07: starting datanode, logging to /home/hadoop/opt/hadoop/hadoop-2.7.2/logs/hadoop-hadoop-datanode-cluster07.out
cluster06: starting datanode, logging to /home/hadoop/opt/hadoop/hadoop-2.7.2/logs/hadoop-hadoop-datanode-cluster06.out
Starting journal nodes [cluster05 cluster06 cluster07]
cluster05: journalnode running as process 30338. Stop it first.
cluster07: journalnode running as process 22724. Stop it first.
cluster06: journalnode running as process 12382. Stop it first.
Starting ZK Failover Controllers on NN hosts [cluster01 cluster02]
cluster01: starting zkfc, logging to /home/hadoop/opt/hadoop/hadoop-2.7.2/logs/hadoop-hadoop-zkfc-cluster01.out
cluster02: starting zkfc, logging to /home/hadoop/opt/hadoop/hadoop-2.7.2/logs/hadoop-hadoop-zkfc-cluster02.out

[hadoop@cluster01 hadoop-2.7.2]$ jps
6567 NameNode
6956 Jps
6862 DFSZKFailoverController

# cluster02
[hadoop@cluster02 hadoop-2.7.2]$ jps
12274 NameNode
12371 DFSZKFailoverController
12491 Jps

# cluster05
[hadoop@cluster05 zookeeper-3.4.10]$ jps
4496 Jps
30338 JournalNode
3416 DataNode
10494 QuorumPeerMain

# cluster06
[hadoop@cluster06 hadoop-2.7.2]$ jps
21738 DataNode
7707 QuorumPeerMain
23805 Jps
12382 JournalNode

# cluster07
[hadoop@cluster07 hadoop-2.7.2]$ jps
22724 JournalNode
19094 QuorumPeerMain
23980 Jps
23695 DataNode
5.5.7 Start YARN: ResourceManager1
  • Working host: cluster03
[hadoop@cluster03 hadoop-2.7.2]$ sbin/start-yarn.sh
5.5.8 Start YARN: ResourceManager2
  • Working host: cluster04
[hadoop@cluster04 hadoop-2.7.2]$ sbin/yarn-daemon.sh start resourcemanager
5.5.9 Check that the processes started correctly on every node
# cluster01
[hadoop@cluster01 hadoop-2.7.2]$ jps
11440 DFSZKFailoverController
11138 NameNode
11604 Jps

# cluster02
[hadoop@cluster02 hadoop-2.7.2]$ jps
16112 NameNode
16209 DFSZKFailoverController
16389 Jps


# cluster03
[hadoop@cluster03 ~]$ jps
18345 Jps
17946 ResourceManager


# cluster04
[hadoop@cluster04 ~]$ jps
25953 ResourceManager
25992 Jps


# cluster05
[hadoop@cluster05 ~]$ jps
30176 JournalNode
31956 NodeManager
31417 DataNode
906 Jps
10494 QuorumPeerMain


# cluster06
[hadoop@cluster06 ~]$ jps
11216 DataNode
12086 NodeManager
14391 Jps
7707 QuorumPeerMain
9661 JournalNode

# cluster07
[hadoop@cluster07 ~]$ jps
21777 Jps
21458 NodeManager
19094 QuorumPeerMain
21255 DataNode
21032 JournalNode
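
A hedged way to confirm the HA roles from the command line; nn1/nn2 and rm1/rm2 are the ids configured in hdfs-site.xml and yarn-site.xml above:

[hadoop@cluster01 hadoop-2.7.2]$ bin/hdfs haadmin -getServiceState nn1   # prints active or standby
[hadoop@cluster01 hadoop-2.7.2]$ bin/hdfs haadmin -getServiceState nn2
[hadoop@cluster03 hadoop-2.7.2]$ bin/yarn rmadmin -getServiceState rm1
[hadoop@cluster03 hadoop-2.7.2]$ bin/yarn rmadmin -getServiceState rm2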
5.5.10 Web access

Open the corresponding ports first (cloud security groups / firewall).
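
Assuming the default web ports (50070 comes from hdfs-site.xml above; 8088 is YARN's default ResourceManager web port), a quick hedged check from a node with curl installed:

curl -s http://cluster01:50070 | head -n 5          # NameNode UI (active or standby)
curl -s http://cluster02:50070 | head -n 5
curl -s http://cluster03:8088/cluster | head -n 5   # ResourceManager UI
# from a browser, use the machines' public IPs with the same ports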
