Preface
In the previous setup, all three containers were deployed on a single physical machine, so the CDH cluster ran entirely on one host. That is fine for testing, but in a real environment the resources of a single machine are nowhere near enough.
Next, building on those steps, we will use the installation packages to build a fully distributed CDH cluster across multiple physical machines.
The hard part of a fully distributed CDH cluster is communication between containers on different hosts; here a Docker Swarm overlay network is used to solve that problem.
1. Copy the installation packages
Copy the image packages to each node:
Master node: master-server.tar.gz and hadoop_CDH.zip
Worker nodes: agent-server.tar.gz
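A minimal sketch for copying them over with scp, assuming the packages sit in /root on the staging machine and the node hostnames resolve (adjust paths and hosts to your environment):
scp /root/master-server.tar.gz /root/hadoop_CDH.zip root@server001:/root/
scp /root/agent-server.tar.gz root@server002:/root/
scp /root/agent-server.tar.gz root@server003:/root/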
2. Uninstall Docker (all nodes)
systemctl stop docker
yum -y remove docker-ce docker-ce-cli containerd.io
rm -rf /var/lib/docker
Remove legacy versions:
yum -y remove docker docker-client docker-client-latest docker-common docker-latest \
docker-latest-logrotate docker-logrotate docker-engine
3. Install Docker (all nodes)
Install the prerequisite packages: yum install -y yum-utils
Add the Aliyun (domestic mirror) yum repository: yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
Refresh the yum cache: yum makecache fast
Install Docker: yum -y install docker-ce docker-ce-cli containerd.io
Test the installation:
systemctl start docker \
&& docker version
Result:
Client: Docker Engine - Community
Version: 20.10.8
API version: 1.41
# Configure a registry mirror (note: write, not append, so an existing daemon.json is not corrupted)
sudo mkdir -p /etc/docker \
&& ( cat <<EOF
{"registry-mirrors":["https://qiyb9988.mirror.aliyuncs.com"]}
EOF
) > /etc/docker/daemon.json \
&& sudo systemctl daemon-reload \
&& sudo systemctl restart docker \
&& systemctl status docker
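To confirm the mirror is active, docker info should list it under "Registry Mirrors" (standard Docker command):
docker info | grep -A 1 "Registry Mirrors"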
4. Initialize the swarm (master node asrserver001 acts as the manager)
# If a node has been part of a swarm before, force it to leave first (all nodes)
docker swarm leave --force
# Note: advertise-addr must be an internal (private) IP, and the nodes must be able to ping each other and reach each other via passwordless SSH;
# otherwise containers on different nodes will fail to reach one another (e.g. connection refused on port 22)
[root@server001 ~]# docker swarm init --advertise-addr 172.16.0.6
Result:
Swarm initialized: current node (iqs8gjyc6rbecu8isps4i5xv9) is now a manager.
To add a worker to this swarm, run the following command:
# Command a node must run to join as a worker
docker swarm join --token SWMTKN-1-66m3f30eafi307affyhjwp4954kuai9n5xb1lveetflg4u7bkb-cqzfkonnjxxtk7zqcl9omhs5b 172.16.0.6:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
Note: if the nodes are cloud servers, remember to allow port 2377 through the security group / firewall.
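If the hosts run firewalld, the ports Swarm relies on can be opened as in the sketch below: 2377/tcp for cluster management, 7946/tcp+udp for node discovery, 4789/udp for overlay VXLAN traffic, and the ESP protocol because the overlay network is later created with --opt encrypted. On cloud servers, put the same rules in the security group instead.
firewall-cmd --permanent --add-port=2377/tcp \
&& firewall-cmd --permanent --add-port=7946/tcp \
&& firewall-cmd --permanent --add-port=7946/udp \
&& firewall-cmd --permanent --add-port=4789/udp \
&& firewall-cmd --permanent --add-protocol=esp \
&& firewall-cmd --reload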
5. Join the worker nodes (server002 and server003 act as workers)
# Take the join command printed in the previous step and run it on each worker node
[root@server002 ~]# docker swarm join --token SWMTKN-1-66m3f30eafi307affyhjwp4954kuai9n5xb1lveetflg4u7bkb-cqzfkonnjxxtk7zqcl9omhs5b 172.16.0.6:2377
[root@server003 ~]# docker swarm join --token SWMTKN-1-66m3f30eafi307affyhjwp4954kuai9n5xb1lveetflg4u7bkb-cqzfkonnjxxtk7zqcl9omhs5b 172.16.0.6:2377
# As the command shows, swarm nodes join the cluster through port 2377, so make sure that port is open
Result:
This node joined a swarm as a worker.
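If the join command from the init output is no longer at hand, the manager can print it again at any time:
[root@server001 ~]# docker swarm join-token worker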
6. Check the cluster nodes (master node)
[root@server001 ~]# docker node ls
# Result
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
iqs8gjyc6rbecu8isps4i5xv9 * asrserver001 Ready Active Leader 20.10.8
rqrxol843ojajibfqbfcsk1l8 asrserver002 Ready Active 20.10.8
yu7udkwkul8nujgwdsx1tp5yo asrserver003 Ready Active 20.10.8
The output above shows the nodes have formed a cluster: server001 is the Leader (manager) and the other two nodes are workers.
7. Create the overlay network (master node)
[root@server001 ~]# docker network create --opt encrypted --driver overlay --attachable cdh-net && docker network ls
Result:
31qxzd9bs57my40deif8j3hsu
NETWORK ID NAME DRIVER SCOPE
d3b73b97d240 bridge bridge local
31qxzd9bs57m cdh-net overlay swarm
3be8470b3027 docker_gwbridge bridge local
f2fcf804158d host host local
1oaefqouo4sv ingress overlay swarm
e927f8141ece none null local
# ingress is the network Docker Swarm creates by default, used for communication between cluster nodes
# cdh-net is the network we just created; it will carry the traffic between containers on different hosts
# scope=local marks Docker's built-in network modes, which only handle communication between containers on a single node
# Inspect the cdh-net network in detail
[root@server001 ~]# docker network inspect cdh-net
Result:
[{"Name": "cdh-net",
"Id": "s3q5ldynr8riytkq3a4beyazc",
"Created": "2021-09-13T05:50:12.398783253Z",
"Scope": "swarm",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [{
"Subnet": "10.0.1.0/24",
"Gateway": "10.0.1.1"}]},
"Internal": false,
"Attachable": true,
"Ingress": false,
"ConfigFrom": {
"Network": ""},
"ConfigOnly": false,
"Containers": null,
"Options": {
"com.docker.network.driver.overlay.vxlanid_list": "4097",
"encrypted": ""},
"Labels": null}]
# The output shows the subnet and gateway of cdh-net; by starting the containers on each node with this network and addresses from this subnet, containers on different hosts can communicate normally
# A worker node cannot see cdh-net yet; the network only shows up there after a container on that node has joined it
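A quick way to observe this is to list the networks on a worker before and after its container starts (standard command, shown here for server002):
[root@server002 ~]# docker network ls | grep cdh-net   # no output before a container joins; one overlay entry afterwards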
8. Start the container on node server001
# Load the image into Docker:
[root@server001 ~]# docker load -i /root/master-server.tar.gz && docker images
Result:
Loaded image: master-server/cdh:6.3.2
REPOSITORY TAG IMAGE ID CREATED SIZE
master-server/cdh 6.3.2 d4f3e4ee3f9e 26 minutes ago 3.62GB
# Create and start the container, attaching it to the custom swarm network cdh-net
[root@server001 ~]# docker run \
--restart always \
-d --name server001 \
--hostname server001 \
--net cdh-net \
--ip 10.0.1.4 \
-p 8020:8020 \
-p 8088:8088 \
-p 19888:19888 \
-p 9870:9870 \
-p 9000:9000 \
-p 7180:7180 \
-p 2181:2181 \
--privileged=true \
-v /usr/local/src/host-config/hosts:/etc/hosts \
-v /etc/localtime:/etc/localtime:ro \
-e TZ="Asia/Shanghai" \
master-server/cdh:6.3.2 \
/usr/sbin/init \
&& docker ps
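An optional sanity check once the container is up, assuming the container name server001 used above (index is needed in the Go template because of the hyphen in the network name):
docker exec server001 hostname   # expected: server001
docker inspect -f '{{ (index .NetworkSettings.Networks "cdh-net").IPAddress }}' server001   # expected: 10.0.1.4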
9. Start the container on node server002
# Load the image into Docker:
[root@server002 ~]# docker load -i /root/agent-server.tar.gz && docker images
Result:
Loaded image: agent-server/cdh:6.3.2
REPOSITORY TAG IMAGE ID CREATED SIZE
agent-server/cdh 6.3.2 5d91a7f659a1 11 minutes ago 2.8GB
# Create and start the container:
# Attach it to the custom swarm network cdh-net; the container is pulled into the swarm overlay automatically
[root@server002 ~]# docker run -d \
--restart always \
--hostname server002 \
--name server002 \
--net cdh-net \
--ip 10.0.1.6 \
-p 10000:10000 \
-p 2181:2181 \
--privileged=true \
-v /usr/local/src/host-config/hosts:/etc/hosts \
-v /etc/localtime:/etc/localtime:ro \
-e TZ="Asia/Shanghai" \
agent-server/cdh:6.3.2 \
/usr/sbin/init \
&& docker ps
10. Start the container on node server003
# Load the image into Docker:
[root@server003 ~]# docker load -i agent-server.tar.gz && docker images
Result:
Loaded image: agent-server/cdh:6.3.2
REPOSITORY TAG IMAGE ID CREATED SIZE
agent-server/cdh 6.3.2 5d91a7f659a1 11 minutes ago 2.8GB
# Create and start the container:
# Attach it to the custom swarm network cdh-net; the container is pulled into the swarm overlay automatically
[root@server003 ~]# docker run -d \
--restart always \
--hostname server003 \
--name server003 \
--net cdh-net \
--ip 10.0.1.8 \
-p 12345:12345 \
-p 2181:2181 \
--privileged=true \
-v /usr/local/src/host-config/hosts:/etc/hosts \
-v /etc/localtime:/etc/localtime:ro \
-e TZ="Asia/Shanghai" \
agent-server/cdh:6.3.2 \
/usr/sbin/init \
&& docker ps
11. Check the cluster network (all nodes)
Verify that the overlay network spanning the nodes is configured correctly.
Normally, each time a container joins cdh-net, the IP address of its node is added under "Peers"; this is what keeps communication across the swarm working.
# All three nodes can now inspect cdh-net; only the container IP and name differ between them, the rest of the output is identical
[root@server001 ~]# docker network inspect cdh-net
Result:
[{"Name": "cdh-net",
"Id": "enzbj3sg1wg1s5vn5gextjc59",
"Created": "2021-09-13T14:27:55.559422055+08:00",
"Scope": "swarm",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [{
"Subnet": "10.0.1.0/24",
"Gateway": "10.0.1.1"}]},
"Internal": false,
"Attachable": true,
"Ingress": false,
"ConfigFrom": {
"Network": ""},
"ConfigOnly": false,
"Containers": {
"b8e1b1f987f1af38946018f77dfb8429a9d41ae503f4d42f4391fbfae53d0b46": {
"Name": "server003", # 容器名称
"EndpointID": "5da0812008ec5af9fac93ed7e8e4ceeb09a1ffb59e3d8b6be83c7bd319a1c3ea",
"MacAddress": "02:42:0a:00:01:06",
"IPv4Address": "10.0.1.8/24", # 容器ip地址 ,新增容器ip递增
"IPv6Address": ""
},
"lb-cdh-net": {
"Name": "cdh-net-endpoint",
"EndpointID": "48ec1b73e478b7c6475048229a8d803646d66b71a7e7f5b0719641a906d0e07b",
"MacAddress": "02:42:0a:00:01:07",
"IPv4Address": "10.0.1.9/24", #
"IPv6Address": ""}},
"Options": {
"com.docker.network.driver.overlay.vxlanid_list": "4097",
"encrypted": ""},
"Labels": {},
"Peers": [{
"Name": "a0f495c4d7a7",
"IP": "172.16.0.6" # 节点server001的内网ip地址
},{
"Name": "973c153cd191",
"IP": "172.16.0.16" # 节点server002的内网ip地址
},{
"Name": "d4f899e63511",
"IP": "172.16.0.2"}]}] # 节点server003的内网ip地址
# Swarm management traffic between nodes uses port 2377; the overlay container network relies on port 7946 (and 4789/udp for the VXLAN data path)
# If a node's IP does not show up under Peers after its container starts, the container is most likely not attached to cdh-net; detach and re-attach it:
# docker network disconnect cdh-net server001
# docker network connect cdh-net server001
12. Test network communication between containers across hosts (all nodes)
# Enter each container:
[root@server001 ~]# docker exec -ti --privileged=true server001 /bin/bash
[root@server002 ~]# docker exec -ti --privileged=true server002 /bin/bash
[root@server003 ~]# docker exec -ti --privileged=true server003 /bin/bash
# Alternatively, on any node, enter whichever container docker ps lists first on that node:
docker exec -ti --privileged=true $(docker ps | awk 'NR==2 {print $1}') /bin/bash
ping server001 -c 3 && ping server002 -c 3 && ping server003 -c 3
Result:
PING server001 (10.0.1.2) 56(84) bytes of data.
64 bytes from server001.cdh-net (10.0.1.2): icmp_seq=1 ttl=64 time=0.419 ms
64 bytes from server001.cdh-net (10.0.1.2): icmp_seq=2 ttl=64 time=0.342 ms
64 bytes from server001.cdh-net (10.0.1.2): icmp_seq=3 ttl=64 time=0.368 ms
--- server001 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.342/0.376/0.419/0.035 ms
PING server002 (10.0.1.4) 56(84) bytes of data.
64 bytes from server002 (10.0.1.4): icmp_seq=1 ttl=64 time=0.025 ms
64 bytes from server002 (10.0.1.4): icmp_seq=2 ttl=64 time=0.035 ms
64 bytes from server002 (10.0.1.4): icmp_seq=3 ttl=64 time=0.036 ms
--- server002 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.025/0.032/0.036/0.005 ms
PING server003 (10.0.1.8) 56(84) bytes of data.
64 bytes from server003.cdh-net (10.0.1.8): icmp_seq=1 ttl=64 time=0.230 ms
64 bytes from server003.cdh-net (10.0.1.8): icmp_seq=2 ttl=64 time=0.297 ms
64 bytes from server003.cdh-net (10.0.1.8): icmp_seq=3 ttl=64 time=0.319 ms
--- server003 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.230/0.282/0.319/0.037 ms
13. Set up SSH between the containers (all nodes)
# Initialize the root password in each container;
# set it uniformly to 12345678 or 123456. Because asr-admin uploads flinkx-json files over SSH, any password change must be coordinated with the Java side
passwd root
# Add the following entries to /etc/hosts (this file is bind-mounted from /usr/local/src/host-config/hosts on each host; a sketch for populating it follows below)
10.0.1.4 server001
10.0.1.6 server002
10.0.1.8 server003
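A minimal sketch for writing these entries into the mounted hosts file on each physical host, assuming the bind-mounted path /usr/local/src/host-config/hosts used in the docker run commands above:
cat >> /usr/local/src/host-config/hosts <<EOF
10.0.1.4 server001
10.0.1.6 server002
10.0.1.8 server003
EOF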
# In each container, generate an SSH key pair and distribute it:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa \
&& ssh-copy-id server001 \
&& ssh-copy-id server002 \
&& ssh-copy-id server003
# Test passwordless access to each container
ssh server001
ssh server002
ssh server003
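An optional batch check from inside any container; BatchMode makes ssh fail instead of prompting if the key was not copied correctly:
for h in server001 server002 server003; do
  ssh -o BatchMode=yes "$h" hostname || echo "passwordless ssh to $h failed"
done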
14. Copy the MySQL JDBC driver (inside the master-node container)
Copy the parcel packages and the MySQL driver into the server001 container as well; from here on, everything is done on server001.
The driver lets the CDH services connect to the MySQL database and persist their operational data.
First, on the host, the MySQL jar and the packages to be installed look like this before being copied into the container's /root directory:
[root@server001 ~]# tree /root/hadoop_CDH/
hadoop_CDH/
├── flink-csd
│ ├── FLINK-1.10.2.jar
│ └── FLINK_ON_YARN-1.10.2.jar
├── mysql-jdbc
│ └── mysql-connector-java.jar
└── parcel
├── CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel
├── FLINK-1.10.2-BIN-SCALA_2.12-el7.parcel
└── manifest.json
3 directories, 6 files
# Run on the host to copy the files in:
[root@server001 ~]# docker cp /root/hadoop_CDH/ server001:/root
# Re-enter the container
[root@server001 ~]# docker exec -ti --privileged=true server001 /bin/bash
# Back inside the server001 container
[root@server001 ~]# mkdir -p /usr/share/java/ \
&& cp /root/hadoop_CDH/mysql-jdbc/mysql-connector-java.jar /usr/share/java/ \
&& rm -rf /root/hadoop_CDH/mysql-jdbc/ \
&& ls /usr/share/java/
Result:
mysql-connector-java.jar
15. Set up the parcel packages (inside the master-node container)
cd /opt/cloudera/parcel-repo/;mv /root/hadoop_CDH/parcel/* ./ \
&& sha1sum CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel| awk '{ print $1 }' > CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha \
&& sha1sum FLINK-1.10.2-BIN-SCALA_2.12-el7.parcel | awk '{ print $1 }' > FLINK-1.10.2-BIN-SCALA_2.12-el7.parcel.sha \
&& rm -rf /root/hadoop_CDH/parcel/ \
&& chown -R cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo/* \
&& ll /opt/cloudera/parcel-repo/
Result:
total 2330372
-rw-r--r-- 1 cloudera-scm cloudera-scm 2082186246 6月 15 16:15 CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel
-rw-r--r-- 1 cloudera-scm cloudera-scm 41 9月 12 18:11 CDH-6.3.2-1.cdh6.3.2.p0.1605554-el7.parcel.sha
-rw-r--r-- 1 cloudera-scm cloudera-scm 304055379 12月 1 2020 FLINK-1.10.2-BIN-SCALA_2.12-el7.parcel
-rw-r--r-- 1 cloudera-scm cloudera-scm 41 9月 12 18:11 FLINK-1.10.2-BIN-SCALA_2.12-el7.parcel.sha
-rw-r--r-- 1 cloudera-scm cloudera-scm 34411 7月 9 09:53 manifest.json
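As a quick check, each .sha file should contain a single 40-character SHA-1 digest, which Cloudera Manager uses to validate the corresponding parcel during distribution:
cat /opt/cloudera/parcel-repo/*.sha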
16. Set up the Flink CSD packages (inside the master-node container)
# Next, copy the Flink CSD jars to /opt/cloudera/csd/:
cp /root/hadoop_CDH/flink-csd/* /opt/cloudera/csd/ \
&& ll /opt/cloudera/csd/ \
&& rm -rf /root/hadoop_CDH/flink-csd/
Result:
total 20
-rw-r--r-- 1 root root 7737 7月 9 10:01 FLINK-1.10.2.jar
-rw-r--r-- 1 root root 8260 7月 9 10:01 FLINK_ON_YARN-1.10.2.jar
17. Initialize the CDH scm database (inside the master-node container)
MySQL and CDH run inside the same container, which makes migration easier. Running MySQL as a separate container was tried before, but unexplained problems came up during migration, so that approach was shelved for now.
As scm_prepare_database.sh shows, CDH's operational data can also be stored in other databases such as Oracle;
the database connection details end up in /etc/cloudera-scm-server/db.properties.
# MySQL and CDH in the same container (arguments: database type, database name, database user, password):
/opt/cloudera/cm/schema/scm_prepare_database.sh mysql scm scm 123456
Result:
JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera
Verifying that we can write to /etc/cloudera-scm-server
Creating SCM configuration file in /etc/cloudera-scm-server
Executing: /usr/java/jdk1.8.0_181-cloudera/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/java/postgresql-connector-java.jar:/opt/cloudera/cm/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
Tue Jul 06 08:58:16 UTC 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
[ main] DbCommandExecutor INFO Successfully connected to database.
All done, your SCM database is configured correctly!
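After the script succeeds, the connection settings it wrote can be reviewed in the properties file mentioned above:
cat /etc/cloudera-scm-server/db.properties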
18. Start the agent service on all nodes (inside each container)
# Restart the agent service in every container so that nothing goes wrong when the parcels are distributed and unpacked
systemctl enable cloudera-scm-agent \
&& systemctl restart cloudera-scm-agent \
&& systemctl status cloudera-scm-agent
19. Start the Cloudera Manager server service (inside the master-node container)
systemctl enable cloudera-scm-server \
&& systemctl restart cloudera-scm-server \
&& sleep 2 \
&& tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log
# Result: wait for startup to finish
2021-07-06 09:01:33,685 INFO WebServerImpl:com.cloudera.server.cmf.WebServerImpl: Started Jetty server. # this line indicates a successful start
2021-07-06 09:02:23,792 INFO avro-servlet-hb-processor-2:com.cloudera.server.common.AgentAvroServlet: (5 skipped) AgentAvroServlet: heartbeat processing stats: average=46ms, min=11ms, max=192ms.
# Run inside the container: check whether the server started successfully
[root@server001 ~]# curl http://server001:7180
<head><meta http-equiv="refresh" content="0;url=/cmf/"></head>
# Run on the host machine: check whether port 7180 is mapped out of the container
[root@server001 ~]# curl http://server001:7180
<head><meta http-equiv="refresh" content="0;url=/cmf/"></head>
# SCM has started successfully and the CM console can be reached
With the CDH server running, proceed with the installation following the steps from the previous article.