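Contents
Cluster Planning
Hadoop HA Deployment:
1)Software environment
2)System environment preparation
3)Configure SSH
4)Configure environment variables
5)Configure ZooKeeper
6)Configure Hadoop
7)Start the cluster
8)Startup and shutdown order
9)Hadoop HA deployment pitfalls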
Cluster Planning
Host | Installed software | Processes |
---|---|---|
hadoop001 | Hadoop, ZooKeeper | NameNode, DFSZKFailoverController, JournalNode, DataNode, ResourceManager, JobHistoryServer, NodeManager, QuorumPeerMain |
hadoop002 | Hadoop, ZooKeeper | NameNode, DFSZKFailoverController, JournalNode, DataNode, ResourceManager, NodeManager, QuorumPeerMain |
hadoop003 | Hadoop, ZooKeeper | JournalNode, DataNode, QuorumPeerMain, NodeManager |
Hadoop HA Deployment
Using Alibaba Cloud hosts as an example
1)Software environment
- jdk8u45
- Hadoop-2.6.0-CDH5.7.0
- ZooKeeper 3.4.6
2)System environment preparation
1.Add a user
[root@hadoop002 ~]# useradd hadoop
[root@hadoop002 ~]# passwd hadoop
2.Grant the user sudo privileges
[root@hadoop002 ~]# vi /etc/sudoers
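The change made to /etc/sudoers is not shown. A minimal sketch, assuming the goal is to give hadoop full sudo rights with a password (consistent with the "[sudo] password for hadoop" prompts later on): instead of editing /etc/sudoers in vi, write a drop-in under /etc/sudoers.d and check it with visudo. This assumes /etc/sudoers contains the usual #includedir /etc/sudoers.d line; write_sudoers_dropin and validate_sudoers are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: grant a user sudo rights through a drop-in file
# instead of editing /etc/sudoers directly.
# Assumed rule: "USER ALL=(ALL) ALL" (password still required).
write_sudoers_dropin() {
  local user="$1" target="$2"
  printf '%s ALL=(ALL) ALL\n' "$user" > "$target"
  chmod 0440 "$target"   # 0440 is the mode visudo uses for sudoers files
}

# Check the syntax of a sudoers file before relying on it.
validate_sudoers() {
  visudo -c -f "$1"
}
```

Usage, as root: write_sudoers_dropin hadoop /etc/sudoers.d/hadoop && validate_sudoers /etc/sudoers.d/hadoop. A syntax error in a sudoers file breaks sudo, so the check is worth running before logging out of root.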
3)Configure SSH
If the Alibaba Cloud CentOS image reports "Permission denied, please try again", /etc/ssh/sshd_config needs to be changed. Solution:
https://help.aliyun.com/knowledge_detail/41487.html?spm=a2c4e.11153987.0.0.6bcc4fbb6frbyn
a.Edit the hosts file
[root@hadoop001 .ssh]# vi /etc/hosts
Copy the hosts file to the other two hosts
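The hosts entries themselves are not shown, and the real IPs are masked in the output below. A sketch under that constraint: hosts_entries (a hypothetical helper) prints one tab-separated line per IP/hostname pair; the 10.0.0.x addresses in the usage comment are placeholders for each host's private IP.

```shell
#!/usr/bin/env bash
# Print /etc/hosts entries from "IP HOSTNAME" argument pairs.
hosts_entries() {
  while [ "$#" -ge 2 ]; do
    printf '%s\t%s\n' "$1" "$2"
    shift 2
  done
}

# Usage (as root on hadoop001; the IPs are placeholders):
#   hosts_entries 10.0.0.43 hadoop001 10.0.0.42 hadoop002 10.0.0.41 hadoop003 >> /etc/hosts
#   scp /etc/hosts root@hadoop002:/etc/hosts
#   scp /etc/hosts root@hadoop003:/etc/hosts
```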
b.Generate a key pair on each of the three hosts
[hadoop@hadoop001 ~]$ ssh-keygen
Generating public/private rsa key pair.
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
4b:fe:5c:6a:f5:df:80:28:02:96:6f:b8:7e:be:a1:0a hadoop@hadoop001
The key's randomart image is:
+--[ RSA 2048]----+
| |
| |
| |
| . |
| + S |
| . + o . ... |
|E . = + ..o.. |
| . +.o +.o ...|
| .o+oo. .+ .o|
+-----------------+
c.Distribute the id_rsa.pub files: first collect every host's public key on hadoop001
Here id_rsa.pub2 is the key from hadoop002; the key from hadoop003 is id_rsa.pub3, and so on
hadoop002
[hadoop@hadoop002 .ssh]$ scp id_rsa.pub root@hadoop001:/home/hadoop/.ssh/id_rsa.pub2
The authenticity of host 'hadoop001 (*.*.*.43)' can't be established.
RSA key fingerprint is 7f:5b:5d:20:6e:f1:9c:18:01:1e:c4:97:ea:6f:2c:a2.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop001,*.*.*.43' (RSA) to the list of known hosts.
root@hadoop001's password:
id_rsa.pub 100% 398 0.4KB/s 00:00
hadoop003
[hadoop@hadoop003 .ssh]$ scp id_rsa.pub root@hadoop001:/home/hadoop/.ssh/id_rsa.pub3
The authenticity of host 'hadoop001 (*.*.*.43)' can't be established.
RSA key fingerprint is 7f:5b:5d:20:6e:f1:9c:18:01:1e:c4:97:ea:6f:2c:a2.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop001,*.*.*.43' (RSA) to the list of known hosts.
root@hadoop001's password:
id_rsa.pub 100% 398 0.4KB/s 00:00
On hadoop001, append all the public keys to authorized_keys, then distribute it to the other hosts
[hadoop@hadoop001 .ssh]$ cat id_rsa.pub >> authorized_keys
[hadoop@hadoop001 .ssh]$ ll
total 20
-rw-rw-r-- 1 hadoop hadoop 398 Feb 14 00:44 authorized_keys
-rw------- 1 hadoop hadoop 1675 Feb 14 00:32 id_rsa
-rw-r--r-- 1 hadoop hadoop 398 Feb 14 00:32 id_rsa.pub
-rw-r--r-- 1 root root 398 Feb 14 00:36 id_rsa.pub2
-rw-r--r-- 1 root root 398 Feb 14 00:42 id_rsa.pub3
[hadoop@hadoop001 .ssh]$ cat id_rsa.pub2 >> authorized_keys
[hadoop@hadoop001 .ssh]$ cat id_rsa.pub3 >> authorized_keys
[hadoop@hadoop001 .ssh]$ scp authorized_keys root@hadoop002:/home/hadoop/.ssh/
The authenticity of host 'hadoop002 (*.*.*.42)' can't be established.
RSA key fingerprint is a2:5c:9d:ee:67:0d:66:0d:df:1b:47:3d:f5:3c:2c:8d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop002,*.*.*.42' (RSA) to the list of known hosts.
root@hadoop002's password:
authorized_key 100% 1194 1.2KB/s 00:00
[hadoop@hadoop001 .ssh]$ scp authorized_keys root@hadoop003:/home/hadoop/.ssh/
The authenticity of host 'hadoop003 (*.*.*.41)' can't be established.
RSA key fingerprint is aa:43:c2:8b:31:09:b7:46:d5:e2:a3:79:69:94:0c:50.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop003,*.*.*.41' (RSA) to the list of known hosts.
root@hadoop003's password:
authorized_key 100% 1194 1.2KB/s 00:00
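Because the files were copied as root, they are owned by root. Exit back to the root shell (Xshell can send the same commands to all hosts at once) and give ownership back to hadoop: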
[hadoop@hadoop001 .ssh]$ exit
logout
[root@hadoop001 hadoop]# chown -R hadoop:hadoop /home/hadoop/.ssh/*
[root@hadoop001 hadoop]# chown -R hadoop:hadoop /home/hadoop/.ssh/
d.Fix the permissions (on all hosts at once)
[hadoop@hadoop001 .ssh]$ sudo chmod 700 -R ~/.ssh
[sudo] password for hadoop:
[hadoop@hadoop001 .ssh]$ sudo chmod 600 ~/.ssh/authorized_keys
[hadoop@hadoop001 .ssh]$ ll
total 12
-rw------- 1 hadoop hadoop 398 Feb 13 23:46 authorized_keys
-rwx------ 1 hadoop hadoop 1675 Feb 13 23:45 id_rsa
-rwx------ 1 hadoop hadoop 398 Feb 13 23:45 id_rsa.pub
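The merge and permission steps above can be sketched as one helper. merge_authorized_keys is a hypothetical name; it appends each key line only once, so re-running it does not duplicate keys, and then applies the same 700/600 modes used above. OpenSSH's ssh-copy-id is an alternative way to install a key on a remote host.

```shell
#!/usr/bin/env bash
# Merge public key files into SSH_DIR/authorized_keys, skipping lines that
# are already present, then set 700 on the directory and 600 on the file.
merge_authorized_keys() {
  local ssh_dir="$1"; shift
  local auth="$ssh_dir/authorized_keys"
  touch "$auth"
  local pub line
  for pub in "$@"; do
    # "|| [ -n "$line" ]" also handles a last line without a newline
    while IFS= read -r line || [ -n "$line" ]; do
      [ -z "$line" ] && continue
      grep -qxF -- "$line" "$auth" || printf '%s\n' "$line" >> "$auth"
    done < "$pub"
  done
  chmod 700 "$ssh_dir"
  chmod 600 "$auth"
}

# Usage on hadoop001:
#   merge_authorized_keys ~/.ssh ~/.ssh/id_rsa.pub ~/.ssh/id_rsa.pub2 ~/.ssh/id_rsa.pub3
```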
Test SSH connectivity
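The test itself is not shown. A sketch, assuming it is enough to run a trivial remote command on every host: check_ssh (hypothetical) uses BatchMode=yes so a missing key fails immediately instead of falling back to a password prompt. Run it as hadoop on each of the three hosts, including against the local host, because start-dfs.sh and sshfence both connect over SSH. BatchMode also refuses unknown host keys, so connect to each host once interactively first.

```shell
#!/usr/bin/env bash
# For each host, run `hostname` over SSH without any password prompt.
# Prints OK/FAIL per host; returns non-zero if any host failed.
check_ssh() {
  local host failed=0
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" hostname >/dev/null 2>&1; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      failed=1
    fi
  done
  return "$failed"
}

# Usage (on every host): check_ssh hadoop001 hadoop002 hadoop003
```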
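4)Configure environment variables
Java is installed under /usr/java and configured globally. Watch out for a pitfall here (see item 1 of the pitfalls guide)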
[root@hadoop001 jdk1.8.0_45]# vi /etc/profile
[root@hadoop001 jdk1.8.0_45]# source /etc/profile
[root@hadoop001 jdk1.8.0_45]# java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
[root@hadoop001 jdk1.8.0_45]# which java
/usr/java/jdk1.8.0_45/bin/java
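The lines added to /etc/profile are not shown. A minimal sketch of the usual content, assuming the JDK path reported by which java above; adjust it if the JDK lives elsewhere.

```shell
# Appended to /etc/profile (system-wide); JDK path as reported by `which java`.
export JAVA_HOME=/usr/java/jdk1.8.0_45
export PATH="$JAVA_HOME/bin:$PATH"
```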
Configure the Hadoop and ZooKeeper environment variables for the current user (hadoop)
Installation directories
Environment variables
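The directories and variables for this step are not reproduced. A sketch, assuming they go into the hadoop user's ~/.bash_profile: the install paths are the ones used throughout this guide, and putting Hadoop's sbin and ZooKeeper's bin on PATH is what lets later commands such as start-dfs.sh and zkServer.sh run without a path.

```shell
# Appended to ~/.bash_profile of the hadoop user (assumed location).
export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0
export ZOOKEEPER_HOME=/home/hadoop/app/zookeeper-3.4.6
# bin: hadoop/hdfs/yarn; sbin: start-*.sh and *-daemon.sh; zookeeper bin: zkServer.sh
export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$PATH"
```

Apply it with source ~/.bash_profile.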
Create the data, logs and tmp directories, and give tmp 777 permissions
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ mkdir data logs tmp
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ sudo chmod -R 777 tmp
[sudo] password for hadoop:
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ ll
total 68
drwxrwxr-x 2 hadoop hadoop 4096 Jan 12 22:15 bin
drwxrwxr-x 2 hadoop hadoop 4096 Feb 14 01:52 data
drwxrwxr-x 3 hadoop hadoop 4096 Jan 12 22:15 etc
drwxrwxr-x 2 hadoop hadoop 4096 Jan 12 22:15 include
drwxrwxr-x 3 hadoop hadoop 4096 Jan 12 22:15 lib
drwxrwxr-x 2 hadoop hadoop 4096 Jan 12 22:15 libexec
-rw-rw-r-- 1 hadoop hadoop 17087 Jan 12 22:15 LICENSE.txt
drwxrwxr-x 2 hadoop hadoop 4096 Feb 14 01:58 logs
-rw-rw-r-- 1 hadoop hadoop 101 Jan 12 22:15 NOTICE.txt
-rw-rw-r-- 1 hadoop hadoop 1366 Jan 12 22:15 README.txt
drwxrwxr-x 2 hadoop hadoop 4096 Jan 12 22:15 sbin
drwxrwxr-x 4 hadoop hadoop 4096 Jan 12 22:15 share
drwxrwxrwx 2 hadoop hadoop 4096 Feb 14 01:58 tmp
5)Configure ZooKeeper
[hadoop@hadoop001 conf]$ pwd
/home/hadoop/app/zookeeper-3.4.6/conf
[hadoop@hadoop001 conf]$ ls
configuration.xsl zoo_sample.cfg
log4j.properties
[hadoop@hadoop001 conf]$ cp zoo_sample.cfg zoo.cfg
[hadoop@hadoop001 conf]$ ls
configuration.xsl zoo.cfg
log4j.properties zoo_sample.cfg
[hadoop@hadoop001 conf]$ vi zoo.cfg
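The edited zoo.cfg is not shown. A sketch of a typical three-node configuration: tickTime, initLimit, syncLimit and clientPort keep the zoo_sample.cfg defaults (2181 also matches ha.zookeeper.quorum in core-site.xml), dataDir is the data directory created below for myid, and the peer/election ports 2888:3888 are conventional values assumed here. write_zoo_cfg is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Write a zoo.cfg for the three-node ensemble to the given path.
write_zoo_cfg() {
  local target="$1"
  cat > "$target" <<'EOF'
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hadoop/app/zookeeper-3.4.6/data
clientPort=2181
server.1=hadoop001:2888:3888
server.2=hadoop002:2888:3888
server.3=hadoop003:2888:3888
EOF
}

# Usage: write_zoo_cfg /home/hadoop/app/zookeeper-3.4.6/conf/zoo.cfg
```

The N in server.N is the value each host writes into its data/myid.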
Copy it to the other hosts
[hadoop@hadoop001 conf]$ scp zoo.cfg hadoop002:/home/hadoop/app/zookeeper-3.4.6/conf
zoo.cfg 100% 1033 1.0KB/s 00:00
[hadoop@hadoop001 conf]$ scp zoo.cfg hadoop003:/home/hadoop/app/zookeeper-3.4.6/conf
zoo.cfg 100% 1033 1.0KB/s 00:00
[hadoop@hadoop001 zookeeper-3.4.6]$ mkdir data
[hadoop@hadoop001 zookeeper-3.4.6]$ touch data/myid
[hadoop@hadoop001 zookeeper-3.4.6]$ echo 1 >data/myid
[hadoop@hadoop001 zookeeper-3.4.6]$ cat data/myid
1
On hadoop002 and hadoop003, create data/myid the same way, each with its own ID:
[hadoop@hadoop002 zookeeper-3.4.6]$ echo 2 >data/myid
[hadoop@hadoop003 zookeeper-3.4.6]$ echo 3 >data/myid
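Since the hosts follow the hadoopNNN naming scheme, the myid can be derived from the hostname instead of typed per host. A sketch assuming that naming scheme; myid_for_host is a hypothetical helper, and the number it prints must match the server.N entry for that host in zoo.cfg.

```shell
#!/usr/bin/env bash
# Print the ZooKeeper myid for a hostname of the form hadoopNNN (hadoop001 -> 1).
# Returns 1 if the hostname does not end in digits.
myid_for_host() {
  local digits="${1##*[!0-9]}"   # strip everything up to the last non-digit
  [ -n "$digits" ] || return 1
  echo $((10#$digits))           # base 10, so "008" is not read as octal
}

# Usage on each host:
#   myid_for_host "$(hostname -s)" > /home/hadoop/app/zookeeper-3.4.6/data/myid
```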
6)Configure Hadoop
Set JAVA_HOME in hadoop-env.sh, then copy the file to the other hosts
[hadoop@hadoop001 hadoop]$ vi hadoop-env.sh
[hadoop@hadoop001 hadoop]$ scp hadoop-env.sh hadoop002:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop
hadoop-env.sh 100% 4233 4.1KB/s 00:00
[hadoop@hadoop001 hadoop]$ scp hadoop-env.sh hadoop003:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop
hadoop-env.sh 100% 4233 4.1KB/s 00:00
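The change inside hadoop-env.sh is not shown. A sketch, assuming the stock line export JAVA_HOME=${JAVA_HOME} is replaced with the absolute JDK path: the daemons are launched over SSH in non-login shells that do not read /etc/profile, so an inherited JAVA_HOME is a common cause of "JAVA_HOME is not set" errors at startup. set_hadoop_java_home is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Replace the "export JAVA_HOME=..." line in hadoop-env.sh with an absolute path.
# Returns non-zero if no such line was found (nothing replaced).
set_hadoop_java_home() {
  local env_file="$1" java_home="$2"
  sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=${java_home}|" "$env_file"
  grep -q "^export JAVA_HOME=${java_home}\$" "$env_file"
}

# Usage:
#   set_hadoop_java_home /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop/hadoop-env.sh /usr/java/jdk1.8.0_45
```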
Edit slaves
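The contents of slaves are not shown. Based on the cluster plan, where all three hosts run a DataNode and a NodeManager, a sketch of the file (write_slaves is a hypothetical helper). In hdfs-site.xml, dfs.hosts also points at this file, so it doubles as the list of DataNodes allowed to connect.

```shell
#!/usr/bin/env bash
# Write etc/hadoop/slaves: one worker host per line.
# printf emits Unix (LF) line endings, unlike a file edited on Windows.
write_slaves() {
  local target="$1"; shift
  printf '%s\n' "$@" > "$target"
}

# Usage:
#   write_slaves /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop/slaves hadoop001 hadoop002 hadoop003
```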
Edit core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- fs.defaultFS specifies the NameNode URI used by YARN and other clients; with HA it names the logical nameservice -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://alterpan</value>
</property>
<!--==============================Trash======================================= -->
<property>
<!-- Minutes between trash checkpoints: the Checkpointer running on the NameNode creates a checkpoint from the Current directory. Default 0: use the value of fs.trash.interval -->
<name>fs.trash.checkpoint.interval</name>
<value>0</value>
</property>
<property>
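<!-- Minutes after which a checkpoint under .Trash is deleted. The server-side setting takes precedence over the client's. Default 0: the trash feature is disabled -->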
<name>fs.trash.interval</name>
<value>1440</value>
</property>
<!-- Hadoop temporary directory. hadoop.tmp.dir is a base setting that many other paths default to; if hdfs-site.xml does not set the NameNode and DataNode directories, they are stored under this path -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/tmp</value>
</property>
<!-- ZooKeeper quorum addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
</property>
<!-- ZooKeeper session timeout, in milliseconds -->
<property>
<name>ha.zookeeper.session-timeout.ms</name>
<value>2000</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.DefaultCodec,
org.apache.hadoop.io.compress.BZip2Codec,
org.apache.hadoop.io.compress.SnappyCodec
</value>
</property>
</configuration>
Edit hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- HDFS superuser group -->
<property>
<name>dfs.permissions.superusergroup</name>
<value>hadoop</value>
</property>
<!-- Enable WebHDFS -->
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/data/dfs/name</value>
<description>Local directory where the NameNode stores the name table (fsimage); adjust as needed</description>
</property>
<property>
<name>dfs.namenode.edits.dir</name>
<value>${dfs.namenode.name.dir}</value>
<description>Local directory where the NameNode stores the transaction log (edits); adjust as needed</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/data/dfs/data</value>
<description>Local directory where the DataNode stores blocks; adjust as needed</description>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- Block size 256 MB (default 128 MB) -->
<property>
<name>dfs.blocksize</name>
<value>268435456</value>
</property>
<!--======================================================================= -->
<!--HDFS高可用配置 -->
<!-- Set the HDFS nameservice to alterpan; it must match fs.defaultFS in core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>alterpan</value>
</property>
<property>
<!-- NameNode IDs; this version supports at most two NameNodes -->
<name>dfs.ha.namenodes.alterpan</name>
<value>nn1,nn2</value>
</property>
<!-- HDFS HA: dfs.namenode.rpc-address.[nameservice ID].[NameNode ID] is the RPC address -->
<property>
<name>dfs.namenode.rpc-address.alterpan.nn1</name>
<value>hadoop001:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.alterpan.nn2</name>
<value>hadoop002:8020</value>
</property>
<!-- HDFS HA: dfs.namenode.http-address.[nameservice ID].[NameNode ID] is the HTTP address -->
<property>
<name>dfs.namenode.http-address.alterpan.nn1</name>
<value>hadoop001:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.alterpan.nn2</name>
<value>hadoop002:50070</value>
</property>
<!--================== NameNode edit log synchronization ============================================ -->
<!-- Ensures data can be recovered -->
<property>
<name>dfs.journalnode.http-address</name>
<value>0.0.0.0:8480</value>
</property>
<property>
<name>dfs.journalnode.rpc-address</name>
<value>0.0.0.0:8485</value>
</property>
<property>
<!-- JournalNode addresses; the QuorumJournalManager stores the edit log on them -->
<!-- Format: qjournal://<host1:port1>;<host2:port2>;<host3:port3>/<journalId>; the port is the one in dfs.journalnode.rpc-address -->
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop001:8485;hadoop002:8485;hadoop003:8485/alterpan</value>
</property>
<property>
<!-- Local directory where JournalNodes store their data -->
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/data/dfs/jn</value>
</property>
<!--================== Client failover ============================================ -->
<property>
<!-- How HDFS clients identify the active NameNode -->
<!-- Failover proxy provider implementation -->
<name>dfs.client.failover.proxy.provider.alterpan</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!--================== NameNode fencing =============================================== -->
<!-- After a failover, prevents the old active NameNode from continuing to serve, which would leave two active NameNodes -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
<!-- SSH connect timeout in milliseconds, after which fencing is considered failed -->
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
<!--================== NameNode automatic failover based on ZKFC and ZooKeeper ====================== -->
<!-- Enable ZooKeeper-based automatic failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- File listing the DataNodes allowed to connect to the NameNode -->
<property>
<name>dfs.hosts</name>
<value>/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop/slaves</value>
</property>
</configuration>
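The comment on dfs.nameservices says it must match core-site.xml. A sketch of a check for that, assuming the one-tag-per-line layout used in these files (it is line-based text matching, not an XML parser); conf_value and check_nameservice are hypothetical helpers. Once Hadoop is on PATH, hdfs getconf -confKey fs.defaultFS shows the value Hadoop actually resolves.

```shell
#!/usr/bin/env bash
# Print the <value> that follows <name>KEY</name> in a Hadoop *-site.xml file.
conf_value() {
  local file="$1" key="$2"
  awk -v key="$key" '
    index($0, "<name>" key "</name>") { want = 1; next }
    want && /<value>/ { sub(/.*<value>/, ""); sub(/<\/value>.*/, ""); print; exit }
  ' "$file"
}

# Succeed only if fs.defaultFS is hdfs://<dfs.nameservices>.
check_nameservice() {
  local conf_dir="$1" fs ns
  fs=$(conf_value "$conf_dir/core-site.xml" fs.defaultFS)
  ns=$(conf_value "$conf_dir/hdfs-site.xml" dfs.nameservices)
  [ -n "$ns" ] && [ "$fs" = "hdfs://$ns" ]
}

# Usage: check_nameservice /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop && echo OK
```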
Edit mapred-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Run MapReduce applications on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- JobHistory Server ============================================================== -->
<!-- MapReduce JobHistory Server address, default port 10020 -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop001:10020</value>
</property>
<!-- MapReduce JobHistory Server web UI address, default port 19888 -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop001:19888</value>
</property>
<!-- Compress map output with Snappy -->
<property>
<name>mapreduce.map.output.compress</name>
<value>true</value>
</property>
<property>
<name>mapreduce.map.output.compress.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
</configuration>
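Map output compression with SnappyCodec (and the SnappyCodec entry in io.compression.codecs) only works if Hadoop can load the native snappy library on every NodeManager. hadoop checknative -a reports which native libraries load; snappy_loaded is a hypothetical helper that reads that report, assuming its usual "snappy:  true <path>" line format.

```shell
#!/usr/bin/env bash
# Read `hadoop checknative -a` output on stdin; succeed only if the
# "snappy:" line says true.
snappy_loaded() {
  awk '$1 == "snappy:" { found = ($2 == "true") } END { exit !found }'
}

# Usage on each NodeManager host:
#   hadoop checknative -a 2>/dev/null | snappy_loaded && echo "snappy OK"
```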
Edit yarn-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- NodeManager configuration ================================================= -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.localizer.address</name>
<value>0.0.0.0:23344</value>
<description>Address where the localizer IPC is.</description>
</property>
<property>
<name>yarn.nodemanager.webapp.address</name>
<value>0.0.0.0:23999</value>
<description>NM Webapp address.</description>
</property>
<!-- HA configuration =============================================================== -->
<!-- Resource Manager Configs -->
<property>
<name>yarn.resourcemanager.connect.retry-interval.ms</name>
<value>2000</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Use embedded automatic failover (leader election inside the RM). In an HA setup it works with ZKRMStateStore to handle fencing -->
<property>
<name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
<value>true</value>
</property>
<!-- Cluster ID, so that HA leader election applies to the right cluster -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-cluster</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!--The RM ID of each active/standby node can be set individually on that host (optional)
<property>
<name>yarn.resourcemanager.ha.id</name>
<value>rm2</value>
</property>
-->
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
<value>5000</value>
</property>
<!-- ZKRMStateStore 配置 -->
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
</property>
<property>
<name>yarn.resourcemanager.zk.state-store.address</name>
<value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
</property>
<!-- RPC address clients use to reach the RM (applications manager interface) -->
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>hadoop001:23140</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>hadoop002:23140</value>
</property>
<!-- RPC address ApplicationMasters use to reach the RM (scheduler interface) -->
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>hadoop001:23130</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>hadoop002:23130</value>
</property>
<!-- RM admin interface -->
<property>
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>hadoop001:23141</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>hadoop002:23141</value>
</property>
<!-- RPC port NodeManagers use to reach the RM -->
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>hadoop001:23125</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>hadoop002:23125</value>
</property>
<!-- RM web application address -->
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>hadoop001:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>hadoop002:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address.rm1</name>
<value>hadoop001:23189</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address.rm2</name>
<value>hadoop002:23189</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://hadoop001:19888/jobhistory/logs</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
<description>Minimum memory a single task (container) can request, default 1024 MB</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>2048</value>
<description>Maximum memory a single task (container) can request, default 8192 MB</description>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>2</value>
</property>
</configuration>
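The three memory settings above constrain each other. A hypothetical sanity check (check_yarn_memory is an illustrative name), shown with the values from this file: 2048 MB per NodeManager, 1024 MB minimum and 2048 MB maximum per container.

```shell
#!/usr/bin/env bash
# Arguments, in MB:
#   $1 yarn.nodemanager.resource.memory-mb
#   $2 yarn.scheduler.minimum-allocation-mb
#   $3 yarn.scheduler.maximum-allocation-mb
check_yarn_memory() {
  local nm_mb="$1" min_mb="$2" max_mb="$3"
  if [ "$min_mb" -le 0 ]; then
    echo "minimum-allocation-mb must be positive"; return 1
  fi
  if [ "$min_mb" -gt "$max_mb" ]; then
    echo "minimum-allocation-mb is larger than maximum-allocation-mb"; return 1
  fi
  if [ "$max_mb" -gt "$nm_mb" ]; then
    # All NodeManagers here share one setting, so such a container could never be placed.
    echo "maximum-allocation-mb exceeds the NodeManager's memory"; return 1
  fi
  echo "each NodeManager fits at most $((nm_mb / min_mb)) minimum-size containers"
}

# Usage: check_yarn_memory 2048 1024 2048
```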
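7)Start the cluster
a.Start ZooKeeper on all three hosts and check that one of them has been elected leader
b.Start Hadoop (HDFS + YARN)
Before formatting, first start a JournalNode on every JN host (hadoop001, hadoop002 and hadoop003); hadoop001 is shown: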
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ cd sbin
[hadoop@hadoop001 sbin]$ ./hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-journalnode-hadoop001.out
[hadoop@hadoop001 sbin]$ jps
1955 JournalNode
2006 Jps
1878 QuorumPeerMain
c.Format the NameNode on hadoop001
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ hadoop namenode -format
d.Copy the metadata to hadoop002 (the standby NameNode)
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ scp -r data hadoop002:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/
in_use.lock 100% 14 0.0KB/s 00:00
VERSION 100% 154 0.2KB/s 00:00
fsimage_0000000000000 100% 62 0.1KB/s 00:00
seen_txid 100% 2 0.0KB/s 00:00
VERSION 100% 204 0.2KB/s 00:00
fsimage_0000000000000 100% 338 0.3KB/s 00:00
e.Initialize the ZKFC state in ZooKeeper
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ hdfs zkfc -formatZK
f.Start HDFS, then YARN
[hadoop@hadoop001 sbin]$ ./start-dfs.sh
[hadoop@hadoop001 sbin]$ ./start-yarn.sh
g.Start the standby ResourceManager on hadoop002
[hadoop@hadoop002 ~]$ yarn-daemon.sh start resourcemanager
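Step a says to check that ZooKeeper elected a leader. zkServer.sh status prints a "Mode:" line on each node; a sketch that collects it from all three hosts, assuming passwordless SSH and the ZooKeeper path used in this guide (zk_mode and zk_roles are hypothetical helpers). A healthy ensemble shows one leader and two followers.

```shell
#!/usr/bin/env bash
# Extract the role ("leader", "follower" or "standalone") from
# `zkServer.sh status` output, e.g. the line "Mode: follower".
zk_mode() {
  sed -n 's/^Mode: *//p'
}

# Print "host role" for each host. bash -l reads /etc/profile so JAVA_HOME is set.
zk_roles() {
  local h
  for h in "$@"; do
    printf '%s %s\n' "$h" \
      "$(ssh "$h" bash -lc "'/home/hadoop/app/zookeeper-3.4.6/bin/zkServer.sh status'" 2>&1 | zk_mode)"
  done
}

# Usage: zk_roles hadoop001 hadoop002 hadoop003
```

Once everything is up, hdfs haadmin -getServiceState nn1 (and nn2) and yarn rmadmin -getServiceState rm1 (and rm2) should each report one active and one standby.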
8)Cluster startup and shutdown order
--------------------Startup-----------------------------
a.Start ZooKeeper on all three hosts
[hadoop@hadoop001 bin]$ zkServer.sh start
[hadoop@hadoop002 bin]$ zkServer.sh start
[hadoop@hadoop003 bin]$ zkServer.sh start
b.Start Hadoop (HDFS + YARN), the standby ResourceManager and the JobHistory Server
[hadoop@hadoop001 sbin]$ start-dfs.sh
[hadoop@hadoop001 sbin]$ start-yarn.sh
[hadoop@hadoop002 sbin]$ yarn-daemon.sh start resourcemanager
[hadoop@hadoop001 ~]$ $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
--------------------Shutdown-----------------------------
a.Stop YARN and HDFS
[hadoop@hadoop001 sbin]$ stop-yarn.sh
[hadoop@hadoop002 sbin]$ yarn-daemon.sh stop resourcemanager
[hadoop@hadoop001 sbin]$ stop-dfs.sh
b.Stop ZooKeeper
[hadoop@hadoop001 bin]$ zkServer.sh stop
[hadoop@hadoop002 bin]$ zkServer.sh stop
[hadoop@hadoop003 bin]$ zkServer.sh stop
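The startup and shutdown order above can be scripted from hadoop001. A sketch, assuming passwordless SSH as hadoop to all hosts and the install paths used in this guide; each command runs through bash -l on the target so /etc/profile and ~/.bash_profile are read. It also stops the JobHistory Server, which the manual shutdown list leaves running. run_on, start_cluster and stop_cluster are hypothetical names; set DRY_RUN=1 to print the commands instead of running them.

```shell
#!/usr/bin/env bash
ZK_BIN=/home/hadoop/app/zookeeper-3.4.6/bin
HADOOP_SBIN=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/sbin

# Run a command on a host through a login shell; with DRY_RUN set, only print it.
run_on() {
  local host="$1"; shift
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$host: $*"
  else
    ssh "$host" bash -lc "'$*'"
  fi
}

start_cluster() {
  local h
  for h in hadoop001 hadoop002 hadoop003; do run_on "$h" "$ZK_BIN/zkServer.sh start"; done
  run_on hadoop001 "$HADOOP_SBIN/start-dfs.sh"
  run_on hadoop001 "$HADOOP_SBIN/start-yarn.sh"
  run_on hadoop002 "$HADOOP_SBIN/yarn-daemon.sh start resourcemanager"
  run_on hadoop001 "$HADOOP_SBIN/mr-jobhistory-daemon.sh start historyserver"
}

stop_cluster() {
  local h
  run_on hadoop001 "$HADOOP_SBIN/mr-jobhistory-daemon.sh stop historyserver"
  run_on hadoop001 "$HADOOP_SBIN/stop-yarn.sh"
  run_on hadoop002 "$HADOOP_SBIN/yarn-daemon.sh stop resourcemanager"
  run_on hadoop001 "$HADOOP_SBIN/stop-dfs.sh"
  for h in hadoop001 hadoop002 hadoop003; do run_on "$h" "$ZK_BIN/zkServer.sh stop"; done
}

# Usage: DRY_RUN=1 start_cluster   (preview), then start_cluster
```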
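9)Hadoop HA deployment pitfalls
1.Java permission problem: fix it by running chown -R root:root on the directory where Java is installed (here /usr/java)
2.How to fix "Permission denied, please try again" when configuring SSH:
1)https://help.aliyun.com/knowledge_detail/41487.html?spm=a2c4e.11153987.0.0.6bcc4fbb6frbyn
2)Check the SSH configuration; the same error also appears if a key file was not copied to the right host
3.How to fix the error "Name or service not knownstname hadoop001" at cluster startup:
1)Check the logs
2)Confirm that the daemons start normally when launched manually
3)Check the slaves file. This was my problem: the slaves file had been edited on Windows and uploaded, so it had DOS line endings (vi shows [dos] in its status line) and had to be converted to Unix format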
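The conversion can also be done without opening vi. A sketch with two hypothetical helpers: has_crlf detects Windows line endings and strip_crlf removes them in place with GNU sed. dos2unix, or :set ff=unix followed by :wq in vi, do the same. A carriage return left after each hostname garbles the hostname lookup, which matches the mangled error message above.

```shell
#!/usr/bin/env bash
# Succeed if any line in the file ends with a carriage return (CRLF).
has_crlf() {
  grep -q $'\r$' "$1"
}

# Remove the trailing carriage return from every line, in place (GNU sed).
strip_crlf() {
  sed -i 's/\r$//' "$1"
}

# Usage:
#   f=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop/slaves
#   has_crlf "$f" && strip_crlf "$f"
```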