Physical machine environment
CPU: Intel® Xeon® CPU D-1581 @ 1.80GHz
Memory: 64 GiB DDR3 Single-bit ECC
Disk: HS-SSD-C2000Pro_2048G_30037121557
Virtual machine layout
hostname | IP |
---|---|
CentOS-01 | 10.10.10.31 |
CentOS-02 | 10.10.10.32 |
CentOS-03 | 10.10.10.33 |
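Install the template virtual machine
Hostname CentOS-01, IP address 10.10.10.31, 4 GB of memory, 100 GB disk, CentOS 8
Installing packages with yum requires the VM to have Internet access, so test the VM's network connection first: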
ping www.baidu.com
PING baidu.com (220.181.38.251) 56(84) bytes of data.
64 bytes from 220.181.38.251 (220.181.38.251): icmp_seq=1 ttl=50 time=40.7 ms
64 bytes from 220.181.38.251 (220.181.38.251): icmp_seq=2 ttl=50 time=40.7 ms
64 bytes from 220.181.38.251 (220.181.38.251): icmp_seq=3 ttl=50 time=40.0 ms
Switch to a mirror in China (the Huawei Cloud mirror is used here as an example)
sudo wget -O /etc/yum.repos.d/CentOS-Base.repo https://repo.huaweicloud.com/repository/conf/CentOS-8-reg.repo
Install epel-release
sudo yum install -y epel-release
Stop the firewall and disable it at boot
sudo systemctl stop firewalld
sudo systemctl disable firewalld.service
Create the owenxuan user and set its password
sudo useradd owenxuan
sudo passwd owenxuan
Grant the owenxuan user root privileges, so that root-level commands can later be run with sudo
sudo vim /etc/sudoers
Edit /etc/sudoers and add one line below the %wheel line, as follows:
## Allow root to run any commands anywhere
root ALL=(ALL) ALL
## Allows people in group wheel to run all commands
%wheel ALL=(ALL) ALL
owenxuan ALL=(ALL) NOPASSWD:ALL
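As an alternative to editing /etc/sudoers directly, the same rule can live in a drop-in file under /etc/sudoers.d, which CentOS's default /etc/sudoers pulls in through its `#includedir /etc/sudoers.d` line. The sketch below assumes that include line is present. `write_sudoers_dropin` is a hypothetical helper: it writes the rule to a `.tmp` file first (sudo skips files in that directory whose names contain a dot), syntax-checks it with `visudo -cf` when visudo is installed, and only then moves it into place, so a typo cannot break sudo.

```shell
#!/bin/bash
# Hypothetical helper: grant one user passwordless sudo via a drop-in file.
# Usage (as root): write_sudoers_dropin owenxuan /etc/sudoers.d/owenxuan
write_sudoers_dropin() {
  local user="$1" target="$2" tmp="$2.tmp"
  # Same rule as the line added to /etc/sudoers above.
  printf '%s ALL=(ALL) NOPASSWD:ALL\n' "$user" > "$tmp" || return 1
  # 0440 is the conventional mode for sudoers files.
  chmod 0440 "$tmp" || return 1
  # Syntax-check the candidate when visudo exists; leave the target untouched on failure.
  if command -v visudo >/dev/null 2>&1 && ! visudo -cf "$tmp" >/dev/null; then
    rm -f -- "$tmp"
    return 1
  fi
  mv -f -- "$tmp" "$target"
}
```

Because the include line sits at the end of /etc/sudoers, the drop-in rule is evaluated after the %wheel rule, which matches the advice to put the owenxuan line below %wheel. One way to run it as root: `sudo bash -c "$(declare -f write_sudoers_dropin); write_sudoers_dropin owenxuan /etc/sudoers.d/owenxuan"`.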
Create directories under /opt and change their owner and group
Create the module and software directories under /opt
sudo mkdir /opt/module
sudo mkdir /opt/software
Change the owner and group of module and software to the owenxuan user
sudo chown owenxuan:owenxuan /opt/module
sudo chown owenxuan:owenxuan /opt/software
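The mkdir/chown pairs above can be folded into a single loop. A minimal sketch, assuming it runs as root; `prepare_opt_dirs` is a hypothetical helper. It uses chown's `user:` form, which sets the group to that user's login group; with CentOS's default per-user groups that is the same owenxuan:owenxuan ownership as above.

```shell
#!/bin/bash
# Hypothetical helper: create each named directory under a base path and
# give it to the owner and the owner's login group ("user:" form of chown).
# Usage (as root): prepare_opt_dirs owenxuan /opt module software
prepare_opt_dirs() {
  local owner="$1" base="$2" dir
  shift 2
  for dir in "$@"; do
    mkdir -p -- "$base/$dir" || return 1
    chown -- "$owner:" "$base/$dir" || return 1
  done
}
```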
Check the owner and group of module and software
ll /opt
total 4
drwxr-xr-x. 6 owenxuan owenxuan 80 Dec 22 16:19 module
drwxr-xr-x. 2 owenxuan owenxuan 4096 Dec 22 16:37 software
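Set a static IP address
sudo vim /etc/sysconfig/network-scripts/ifcfg-enp1s0
Add a MACADDR line to the file, then reserve a fixed IP for that MAC address in the router's DHCP settings (BOOTPROTO stays dhcp, so the address is pinned by the router rather than configured on the VM):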
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
NAME=enp1s0
UUID=f2728000-045f-43f1-b2c3-4884e6a545f3
DEVICE=enp1s0
ONBOOT=yes
MACADDR=66:18:C7:F8:95:B9
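Adding or changing the MACADDR line can also be scripted. A minimal sketch, assuming the ifcfg KEY=value format shown above and GNU sed; `set_ifcfg_mac` is a hypothetical helper that rewrites an existing MACADDR line or appends one, and rejects values that are not six colon-separated hex octets.

```shell
#!/bin/bash
# Hypothetical helper: set MACADDR in an ifcfg file, replacing any existing value.
# Usage (as root): set_ifcfg_mac /etc/sysconfig/network-scripts/ifcfg-enp1s0 66:18:C7:F8:95:B9
set_ifcfg_mac() {
  local file="$1" mac="$2"
  # Accept only six two-digit hex octets separated by colons.
  if ! [[ "$mac" =~ ^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$ ]]; then
    echo "invalid MAC address: $mac" >&2
    return 1
  fi
  if grep -q '^MACADDR=' "$file"; then
    # GNU sed in-place edit of the existing line.
    sed -i "s/^MACADDR=.*/MACADDR=${mac}/" "$file"
  else
    printf 'MACADDR=%s\n' "$mac" >> "$file"
  fi
}
```

After changing the file, reload and restart the connection (for example `sudo nmcli connection reload` followed by `sudo nmcli connection up enp1s0`) or reboot, and make sure the router's DHCP reservation uses the same MAC address.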
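Install Java 8
Use the latest Java 8 download provided on the official website:
- Go to the Java download page on the Oracle website
- Register an Oracle account and download the Linux x64 Compressed Archive
Use FinalShell to upload the JDK to the software directory under /opt
On the Linux system, check that the package was uploaded successfully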
ll /opt/software/
total 143360
-rw-rw-r--. 1 owenxuan owenxuan 146799982 Dec 22 11:16 jdk-8u311-linux-x64.tar.gz
Extract the JDK into /opt/module
cd /opt/software/
tar -zxvf jdk-8u311-linux-x64.tar.gz -C ../module/
Rename the JDK directory
cd ../module/
mv jdk1.8.0_311/ jdk
Configure the JDK environment variables
Create the file /etc/profile.d/my_env.sh
sudo vim /etc/profile.d/my_env.sh
Add the following:
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk
export PATH=$PATH:$JAVA_HOME/bin
Save and exit
Source /etc/profile so that the new PATH takes effect
source /etc/profile
Check that the JDK is installed correctly
java -version
If you see output like the following, Java is installed successfully.
java version "1.8.0_311"
Java(TM) SE Runtime Environment (build 1.8.0_311-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.311-b11, mixed mode)
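When checking several machines, it can help to extract just the major version from the banner above. A minimal sketch; `java_major_version` is a hypothetical helper that reads the first line of `java -version` output from stdin and maps the legacy `1.8.x` numbering to 8. `java -version` writes to stderr, hence the `2>&1` in the usage line.

```shell
#!/bin/bash
# Hypothetical helper: print the Java major version from a `java -version`
# banner read on stdin, e.g. 'java version "1.8.0_311"' -> 8.
java_major_version() {
  local line ver
  IFS= read -r line || [ -n "$line" ] || return 1
  # Pull out the quoted version string.
  ver=$(printf '%s\n' "$line" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  [ -n "$ver" ] || return 1
  # Legacy scheme: 1.8.0_311 means Java 8.
  [[ "$ver" == 1.* ]] && ver=${ver#1.}
  # Keep everything before the first '.', '_' or '-'.
  printf '%s\n' "${ver%%[._-]*}"
}
```

Usage: `java -version 2>&1 | java_major_version` prints 8 for the JDK installed above.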
Install Hadoop
Use FinalShell to upload Hadoop to the software directory under /opt
On the Linux system, check that the package was uploaded successfully
ll /opt/software
total 734364
-rw-rw-r--. 1 owenxuan owenxuan 605187279 Dec 24 10:51 hadoop-3.3.1.tar.gz
Extract Hadoop into /opt/module
cd /opt/software/
tar -zxvf hadoop-3.3.1.tar.gz -C ../module/
Rename the Hadoop directory
cd ../module/
mv hadoop-3.3.1/ hadoop
Configure the Hadoop environment variables
Edit /etc/profile.d/my_env.sh again (it was created in the JDK step)
sudo vim /etc/profile.d/my_env.sh
Append the following:
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
Save and exit
Source /etc/profile so that the new PATH takes effect
source /etc/profile
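Because /etc/profile.d/my_env.sh is edited twice (first for JAVA_HOME, then for HADOOP_HOME), repeating these steps by hand can leave duplicate lines. A minimal sketch of an idempotent append; `append_once` is a hypothetical helper, and it must run as root (for example inside `sudo -i`) to write under /etc/profile.d.

```shell
#!/bin/bash
# Hypothetical helper: append a line to a file unless that exact line is already there.
append_once() {
  local file="$1" line="$2"
  # -q quiet, -x whole-line match, -F fixed string (no regex).
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

Usage, with single quotes so `$PATH` is written literally: `append_once /etc/profile.d/my_env.sh 'export HADOOP_HOME=/opt/module/hadoop'`, then `append_once /etc/profile.d/my_env.sh 'export PATH=$PATH:$HADOOP_HOME/bin'`, followed by `source /etc/profile`.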
Check that Hadoop is installed correctly
hadoop version
If you see output like the following, Hadoop is installed successfully.
Hadoop 3.3.1
Source code repository https://github.com/apache/hadoop.git -r a3b9c37a397ad4188041dd80621bdeefc46885f2
Compiled by ubuntu on 2021-06-15T05:13Z
Compiled with protoc 3.7.1
From source with checksum 88a4ddb2299aca054416d6b7f81ca55
This command was run using /opt/module/hadoop/share/hadoop/common/hadoop-common-3.3.1.jar
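Cluster configuration
Cluster deployment plan
- Do not install the NameNode and the SecondaryNameNode on the same server.
- The ResourceManager also uses a lot of memory, so do not put it on the same machine as the NameNode or the SecondaryNameNode.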
Component | CentOS-01 | CentOS-02 | CentOS-03 |
---|---|---|---|
HDFS | NameNode DataNode | DataNode | SecondaryNameNode DataNode |
YARN | NodeManager | ResourceManager NodeManager | NodeManager |
Configure the cluster
Go to the etc/hadoop directory under the Hadoop home
cd $HADOOP_HOME/etc/hadoop
Configure core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- NameNode address -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://10.10.10.31:8020</value>
</property>
<!-- Directory where Hadoop stores its data -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop/data</value>
</property>
<!-- Static user for the HDFS web UI: owenxuan -->
<property>
<name>hadoop.http.staticuser.user</name>
<value>owenxuan</value>
</property>
<!-- Allow owenxuan to impersonate other users: from any host, for users in any group -->
<property>
<name>hadoop.proxyuser.owenxuan.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.owenxuan.groups</name>
<value>*</value>
</property>
</configuration>
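To double-check a value once the environment variables are in place, Hadoop's own `hdfs getconf -confKey fs.defaultFS` prints the effective setting. For a quick look at a single file without Hadoop on the PATH, the hypothetical helper below pulls a property's value out of a Hadoop *-site.xml with awk. It assumes the layout used here, where `<name>` and `<value>` each sit on their own line; it is not a general XML parser.

```shell
#!/bin/bash
# Hypothetical helper: print the <value> of a property in a Hadoop *-site.xml.
# Assumes <name> and <value> are on separate lines, as in the files above.
get_xml_property() {
  local file="$1" key="$2"
  awk -v key="$key" '
    # Remember that the requested <name> line has been seen.
    index($0, "<name>" key "</name>") { found = 1; next }
    # Print the text between <value> and </value> on the next value line.
    found && match($0, /<value>[^<]*<\/value>/) {
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$file"
}
```

Usage: `get_xml_property $HADOOP_HOME/etc/hadoop/core-site.xml hadoop.tmp.dir` should print /opt/module/hadoop/data for the file above.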
Configure hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- NameNode web UI address -->
<property>
<name>dfs.namenode.http-address</name>
<value>10.10.10.31:9870</value>
</property>
<!-- SecondaryNameNode web UI address -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>10.10.10.33:9868</value>
</property>
</configuration>
Configure yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Use the MapReduce shuffle auxiliary service -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- ResourceManager address -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>10.10.10.32</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>/opt/module/hadoop/etc/hadoop:/opt/module/hadoop/share/hadoop/common/lib/*:/opt/module/hadoop/share/hadoop/common/*:/opt/module/hadoop/share/hadoop/hdfs:/opt/module/hadoop/share/hadoop/hdfs/lib/*:/opt/module/hadoop/share/hadoop/hdfs/*:/opt/module/hadoop/share/hadoop/mapreduce/*:/opt/module/hadoop/share/hadoop/yarn:/opt/module/hadoop/share/hadoop/yarn/lib/*:/opt/module/hadoop/share/hadoop/yarn/*</value>
</property>
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Log aggregation server URL -->
<property>
<name>yarn.log.server.url</name>
<value>http://10.10.10.31:19888/jobhistory/logs</value>
</property>
<!-- Keep aggregated logs for 7 days -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
</configuration>
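The retention value is seven days expressed in seconds (7 × 24 × 60 × 60 = 604800). A tiny hypothetical helper, `days_to_seconds`, makes the conversion explicit if you want a different retention period.

```shell
#!/bin/bash
# Hypothetical helper: convert whole days to seconds for
# yarn.log-aggregation.retain-seconds.
days_to_seconds() {
  echo $(( $1 * 24 * 60 * 60 ))
}
```

For example, `days_to_seconds 7` prints 604800 and `days_to_seconds 30` prints 2592000.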
Configure mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Run MapReduce programs on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- JobHistory server address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>10.10.10.31:10020</value>
</property>
<!-- JobHistory server web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>10.10.10.31:19888</value>
</property>
</configuration>
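The configuration files above do not list the worker nodes. In Hadoop 3.x, `start-dfs.sh` and `start-yarn.sh` start DataNodes and NodeManagers on the hosts listed in `$HADOOP_HOME/etc/hadoop/workers`, which contains only `localhost` by default, so for the three-node plan that file most likely needs all three addresses. A minimal sketch, assuming the IPs from the deployment plan; `write_workers` is a hypothetical helper that writes one host per line with no trailing spaces or blank lines. Doing this on the template VM before cloning means the clones inherit the same file.

```shell
#!/bin/bash
# Hypothetical helper: write a Hadoop workers file, one host per line.
# Usage: write_workers "$HADOOP_HOME/etc/hadoop/workers" 10.10.10.31 10.10.10.32 10.10.10.33
write_workers() {
  local file="$1"
  shift
  [ "$#" -gt 0 ] || { echo "no hosts given" >&2; return 1; }
  # printf repeats its format for each argument: exactly one line per host.
  printf '%s\n' "$@" > "$file"
}
```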
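Clone the virtual machine
Clone the template VM twice, then change the hostname of each clone (run the first command on the second VM and the second command on the third VM)
sudo hostnamectl set-hostname CentOS-02
sudo hostnamectl set-hostname CentOS-03
Change the network interface MAC address on each clone
sudo vim /etc/sysconfig/network-scripts/ifcfg-enp1s0
Change the MACADDR value; any random value works as long as no two VMs share it
Then reserve a static IP for the new MAC address in the router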
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
NAME=enp1s0
UUID=f2728000-045f-43f1-b2c3-4884e6a545f3
DEVICE=enp1s0
ONBOOT=yes
MACADDR=66:18:C7:F8:95:B9
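Any unique value works for the clones, but a safe choice is a locally administered unicast address: the lowest bit of the first octet (multicast) cleared and the second-lowest bit (locally administered) set, as in the template's 66:18:… address. A minimal sketch; `random_la_mac` is a hypothetical helper that reads six random bytes from /dev/urandom and forces those two bits.

```shell
#!/bin/bash
# Hypothetical helper: print a random locally administered unicast MAC address.
random_la_mac() {
  local hex first
  # Six random bytes as 12 uppercase hex digits.
  hex=$(od -An -N6 -tx1 /dev/urandom | tr -d ' \n' | tr 'a-f' 'A-F')
  # Clear the multicast bit (0x01) and set the locally administered bit (0x02).
  first=$(( (16#${hex:0:2} & 0xFC) | 0x02 ))
  printf '%02X:%s:%s:%s:%s:%s\n' "$first" \
    "${hex:2:2}" "${hex:4:2}" "${hex:6:2}" "${hex:8:2}" "${hex:10:2}"
}
```

Run it once per clone and put the result in that clone's MACADDR line.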
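Reboot all virtual machines
Passwordless SSH login
Create the .ssh directory under /home/owenxuan (ssh-keygen also creates it if it is missing)
mkdir -p ~/.ssh && chmod 700 ~/.ssh
Generate the public and private keys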
cd ~/.ssh/
ssh-keygen -t rsa
Press Enter at every prompt until the key fingerprint and its randomart image (framed by +---[RSA 3072]----+ and +----[SHA256]-----+) are printed
Copy the public key to every machine you want to log in to without a password
ssh-copy-id 10.10.10.31
ssh-copy-id 10.10.10.32
ssh-copy-id 10.10.10.33
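The three ssh-copy-id calls can be wrapped in a loop over the cluster hosts. A minimal sketch, assuming the three IPs from the table; `copy_key_to_all` and the `SSH_COPY_ID_CMD` override (there only so the loop can be exercised without real hosts) are hypothetical and not part of OpenSSH.

```shell
#!/bin/bash
# Cluster hosts from the table at the top of this guide.
HOSTS=(10.10.10.31 10.10.10.32 10.10.10.33)

# Hypothetical helper: copy this user's public key to every host.
# SSH_COPY_ID_CMD is a hypothetical override (defaults to ssh-copy-id).
copy_key_to_all() {
  local copy_cmd="${SSH_COPY_ID_CMD:-ssh-copy-id}" host
  for host in "${HOSTS[@]}"; do
    "$copy_cmd" "$host" || { echo "ssh-copy-id failed for $host" >&2; return 1; }
  done
}
```

Afterwards, `ssh -o BatchMode=yes 10.10.10.32 true` should succeed without prompting for a password.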
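Repeat the same steps on CentOS-02 and CentOS-03
Start the cluster
If the cluster is being started for the first time, format the NameNode on CentOS-01. (Note: formatting the NameNode generates a new cluster ID. If the NameNode and DataNodes end up with different cluster IDs, the cluster can no longer find its existing data. If the cluster reports errors while running and you need to reformat the NameNode, first stop the NameNode and DataNode processes, then delete the data and logs directories on every machine, and only then format.)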
cd /opt/module/hadoop/
hdfs namenode -format
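The note above requires deleting the data and logs directories before a reformat. With hadoop.tmp.dir set to /opt/module/hadoop/data in core-site.xml and the logs going to the default $HADOOP_HOME/logs, both live under the Hadoop home, so a cautious sketch can refuse to delete anything unless the target looks like a Hadoop installation. `clean_hadoop_state` is a hypothetical helper; stop the daemons first (for example with sbin/stop-dfs.sh) and run it on every node.

```shell
#!/bin/bash
# Hypothetical helper: remove the data and logs directories of a Hadoop home
# before reformatting the NameNode. Stop all Hadoop daemons first.
clean_hadoop_state() {
  local home="$1"
  # Refuse to delete anything unless the path looks like a Hadoop install.
  if [ -z "$home" ] || [ ! -x "$home/bin/hdfs" ]; then
    echo "not a Hadoop home: '$home'" >&2
    return 1
  fi
  rm -rf -- "$home/data" "$home/logs"
}
```

Usage: `clean_hadoop_state /opt/module/hadoop` on CentOS-01, CentOS-02 and CentOS-03, then format again.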
Start HDFS (on CentOS-01)
sbin/start-dfs.sh
Start YARN on the node configured as the ResourceManager (CentOS-02)
sbin/start-yarn.sh
Start the HistoryServer (on CentOS-01)
mapred --daemon start historyserver
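View the HDFS NameNode in the web UI
- Open http://10.10.10.31:9870 in a browser
- Browse the data stored on HDFS
View the YARN ResourceManager in the web UI
- Open http://10.10.10.32:8088 in a browser
- Check the jobs running on YARN
Basic cluster test
Upload a file to the cluster (this assumes a small text file of words already exists at $HADOOP_HOME/wcinput/word.txt)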
hadoop fs -mkdir /input
hadoop fs -put $HADOOP_HOME/wcinput/word.txt /input
Run the wordcount program (from /opt/module/hadoop, since the jar path below is relative)
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar wordcount /input /output
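To sanity-check the job's result, the same counts can be computed locally. A minimal sketch, assuming plain-text input split on whitespace (the bundled WordCount example tokenizes with Java's StringTokenizer, whose default delimiters are space, tab, newline, carriage return and form feed). `local_wordcount` is a hypothetical helper that prints `word<TAB>count` lines in byte order, the same layout the example writes to its output file.

```shell
#!/bin/bash
# Hypothetical helper: count words in a local text file and print
# "word<TAB>count" lines sorted by byte order, for comparison with the job output.
local_wordcount() {
  # Split on StringTokenizer's default delimiters, one token per line.
  tr -s ' \t\n\r\f' '\n' < "$1" |
    grep -v '^$' |
    LC_ALL=C sort |
    uniq -c |
    awk '{ print $2 "\t" $1 }'
}
```

Compare `local_wordcount $HADOOP_HOME/wcinput/word.txt` with `hadoop fs -cat /output/part-r-00000`; with the default single reducer the two should match.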
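Common Hadoop cluster scripts
Hadoop cluster start/stop script (covers HDFS, YARN and the HistoryServer): myhadoop.sh. The scripts in this section go in ~/bin, which on CentOS is normally already on the user's PATH, so they can be run by name.
cd ~ && mkdir -p bin && cd bin
vim myhadoop.sh
Enter the following: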
#!/bin/bash
if [ $# -lt 1 ]
then
echo "No Args Input..."
exit 1
fi
case $1 in
"start")
echo " =================== Starting hadoop cluster ==================="
echo " --------------- Starting hdfs ---------------"
ssh 10.10.10.31 "/opt/module/hadoop/sbin/start-dfs.sh"
echo " --------------- Starting yarn ---------------"
ssh 10.10.10.32 "/opt/module/hadoop/sbin/start-yarn.sh"
echo " --------------- Starting historyserver ---------------"
ssh 10.10.10.31 "/opt/module/hadoop/bin/mapred --daemon start historyserver"
;;
"stop")
echo " =================== Stopping hadoop cluster ==================="
echo " --------------- Stopping historyserver ---------------"
ssh 10.10.10.31 "/opt/module/hadoop/bin/mapred --daemon stop historyserver"
echo " --------------- Stopping yarn ---------------"
ssh 10.10.10.32 "/opt/module/hadoop/sbin/stop-yarn.sh"
echo " --------------- Stopping hdfs ---------------"
ssh 10.10.10.31 "/opt/module/hadoop/sbin/stop-dfs.sh"
;;
*)
echo "Input Args Error..."
exit 1
;;
esac
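Save and exit, then make the script executable
chmod +x myhadoop.sh
Test the script
Start the cluster
myhadoop.sh start
Stop the cluster
myhadoop.sh stop
Script to show the Java processes on all three servers: jpsall
cd ~/bin
vim jpsall
Enter the following: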
#!/bin/bash
for host in 10.10.10.31 10.10.10.32 10.10.10.33
do
echo =============== $host ===============
ssh $host jps
done
Save and exit, then make the script executable
chmod +x jpsall
Test the script
jpsall
Cluster distribution script: xsync
cd ~/bin
vim xsync
Enter the following:
#!/bin/bash
#1. Check the number of arguments
if [ $# -lt 1 ]
then
echo Not Enough Argument!
exit 1
fi
#2. Loop over all machines in the cluster
for host in 10.10.10.31 10.10.10.32 10.10.10.33
do
echo ==================== $host ====================
#3. Loop over all given files/directories and send them one by one
for file in "$@"
do
#4. Check that the file exists
if [ -e "$file" ]
then
#5. Get the absolute parent directory (resolving symlinks)
pdir=$(cd -P "$(dirname "$file")"; pwd)
#6. Get the file name
fname=$(basename "$file")
ssh $host "mkdir -p $pdir"
rsync -av "$pdir/$fname" "$host:$pdir"
else
echo "$file does not exist!"
fi
done
done
Save and exit, then make the script executable
chmod +x xsync
Test the script
xsync /home/owenxuan/bin
On the other machines, check that a bin directory now exists under ~/