Hadoop Deployment

Physical host environment

CPU: Intel® Xeon® CPU D-1581 @ 1.80GHz

Memory: 64 GiB DDR3 Single-bit ECC

Disk: HS-SSD-C2000Pro_2048G_30037121557

Virtual machine layout

hostname IP
CentOS-01 10.10.10.31
CentOS-02 10.10.10.32
CentOS-03 10.10.10.33

Install the template VM

Hostname CentOS-01, IP address 10.10.10.31, 4 GB of memory, 100 GB disk, CentOS 8.

Installing packages with yum requires working Internet access on the VM, so test its connectivity first:

ping -c 3 www.baidu.com
PING baidu.com (220.181.38.251) 56(84) bytes of data.
64 bytes from 220.181.38.251 (220.181.38.251): icmp_seq=1 ttl=50 time=40.7 ms
64 bytes from 220.181.38.251 (220.181.38.251): icmp_seq=2 ttl=50 time=40.7 ms
64 bytes from 220.181.38.251 (220.181.38.251): icmp_seq=3 ttl=50 time=40.0 ms

Switch yum to a mirror in China (the Huawei Cloud mirror is used as the example)

sudo wget -O /etc/yum.repos.d/CentOS-Base.repo https://repo.huaweicloud.com/repository/conf/CentOS-8-reg.repo

Install epel-release

sudo yum install -y epel-release

Stop the firewall and keep it from starting at boot

sudo systemctl stop firewalld
sudo systemctl disable firewalld.service

Create the owenxuan user and set its password

sudo useradd owenxuan 
sudo passwd owenxuan 

Give the owenxuan user root privileges, so that commands needing root can later be run with sudo

sudo vim /etc/sudoers

Edit /etc/sudoers and add one line below the %wheel line, as shown:

## Allow root to run any commands anywhere
root ALL=(ALL) ALL
## Allows people in group wheel to run all commands
%wheel ALL=(ALL) ALL
owenxuan ALL=(ALL) NOPASSWD:ALL
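A minimal sketch of an alternative, assuming the stock CentOS 8 /etc/sudoers, which ends with an #includedir /etc/sudoers.d line: put the rule in its own drop-in file instead of editing /etc/sudoers by hand. The helper name write_sudoers_dropin is hypothetical. It only writes the file, so validate the result with visudo afterwards.

```shell
#!/usr/bin/env bash
# write_sudoers_dropin USER [DIR]: write "USER ALL=(ALL) NOPASSWD:ALL" to DIR/USER
# (DIR defaults to /etc/sudoers.d) and print the path of the file it wrote.
write_sudoers_dropin() {
    local user="$1" dir="${2:-/etc/sudoers.d}"
    local file="$dir/$user"
    printf '%s ALL=(ALL) NOPASSWD:ALL\n' "$user" > "$file" || return 1
    chmod 0440 "$file"    # 0440, the same mode as /etc/sudoers itself
    echo "$file"
}
```

For example, as root: write_sudoers_dropin owenxuan && visudo -c -f /etc/sudoers.d/owenxuan. The check catches syntax errors before they can lock you out of sudo.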

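Create directories under /opt and change their owner and group

Create the module and software directories under /opt (/opt belongs to root, so sudo is needed)

sudo mkdir /opt/module
sudo mkdir /opt/software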

Make owenxuan the owner and group of both module and software

sudo chown owenxuan:owenxuan /opt/module 
sudo chown owenxuan:owenxuan /opt/software

Check the owner and group of module and software

ll /opt
total 4
drwxr-xr-x. 6 owenxuan owenxuan   80 Dec 22 16:19 module
drwxr-xr-x. 2 owenxuan owenxuan 4096 Dec 22 16:37 software
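The mkdir and chown steps can also be done in a single pass that is safe to re-run. The following is a sketch. prepare_opt_dirs is a hypothetical helper, and the base directory is a parameter so you can try it on a scratch directory first. It assigns the owner's primary group, which for owenxuan is the owenxuan group that useradd created.

```shell
#!/usr/bin/env bash
# prepare_opt_dirs OWNER [BASE]: create BASE/module and BASE/software (BASE defaults
# to /opt) and hand both to OWNER and OWNER's primary group.
prepare_opt_dirs() {
    local owner="$1" base="${2:-/opt}"
    local group
    group=$(id -gn "$owner") || return 1               # e.g. owenxuan
    mkdir -p "$base/module" "$base/software" || return 1   # -p: no error on re-runs
    chown "$owner:$group" "$base/module" "$base/software"
}
```

On the VM, run it as root, for example prepare_opt_dirs owenxuan, then check the result with ls -ld /opt/module /opt/software.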

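Give the VM a fixed IP address

sudo vim /etc/sysconfig/network-scripts/ifcfg-enp1s0

Add a MACADDR line as shown below. Then, on the router, reserve a fixed IP for that MAC address (a static DHCP lease). BOOTPROTO stays dhcp, so the router pins the address, not the VM itself.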

TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
NAME=enp1s0
UUID=f2728000-045f-43f1-b2c3-4884e6a545f3
DEVICE=enp1s0
ONBOOT=yes
MACADDR=66:18:C7:F8:95:B9
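The setup above keeps BOOTPROTO=dhcp and relies on a reservation on the router. If you would rather fix the address on the VM itself, the sketch below prints an ifcfg file with BOOTPROTO=static. The gateway and DNS server 10.10.10.1 and the /24 prefix are assumptions about your network, not values from this guide. print_static_ifcfg is a hypothetical helper.

```shell
#!/usr/bin/env bash
# print_static_ifcfg IP [GATEWAY] [DNS]: print an ifcfg-enp1s0 with a static IPv4 address.
# ASSUMPTION: a /24 network whose gateway and DNS server default to 10.10.10.1.
print_static_ifcfg() {
    local ip="$1" gw="${2:-10.10.10.1}" dns="${3:-10.10.10.1}"
    cat <<EOF
TYPE=Ethernet
BOOTPROTO=static
NAME=enp1s0
DEVICE=enp1s0
ONBOOT=yes
IPADDR=$ip
PREFIX=24
GATEWAY=$gw
DNS1=$dns
EOF
}
```

For example: print_static_ifcfg 10.10.10.31 | sudo tee /etc/sysconfig/network-scripts/ifcfg-enp1s0, followed by sudo nmcli connection reload and sudo nmcli connection up enp1s0. This replaces the whole file, so keep a copy of the original.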

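Install Java 8

Use the latest Java 8 build from the official website

  • Open the Java download page on the Oracle website

  • Sign in with an Oracle account (register one if needed) and download the Linux x64 Compressed Archive

Use FinalShell to upload the JDK archive to /opt/software

Check on the Linux side that the package arrived

ll /opt/software/
total 143360
-rw-rw-r--. 1 owenxuan owenxuan 146799982 Dec 22 11:16 jdk-8u311-linux-x64.tar.gz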

Extract the JDK into /opt/module

cd /opt/software/
tar -zxvf jdk-8u311-linux-x64.tar.gz -C ../module/

Rename the JDK directory

cd ../module/
mv jdk1.8.0_311/ jdk

Configure the JDK environment variables

Create the file /etc/profile.d/my_env.sh

sudo vim /etc/profile.d/my_env.sh

Add the following:

#JAVA_HOME
export JAVA_HOME=/opt/module/jdk
export PATH=$PATH:$JAVA_HOME/bin

Save and exit

Source /etc/profile so that the new PATH takes effect in the current shell

source /etc/profile
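You can also create the same file without opening an editor. Here is a minimal sketch. write_java_env is a hypothetical helper. It overwrites its target file, which is acceptable here because this is the first section written to my_env.sh.

```shell
#!/usr/bin/env bash
# write_java_env [FILE]: write the JAVA_HOME section of my_env.sh (overwrites FILE).
# The quoted 'EOF' keeps $PATH and $JAVA_HOME literal in the file, so they are
# expanded each time the file is sourced, not when it is written.
write_java_env() {
    local file="${1:-/etc/profile.d/my_env.sh}"
    cat > "$file" <<'EOF'
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk
export PATH=$PATH:$JAVA_HOME/bin
EOF
}
```

Run it as root (write_java_env), then source /etc/profile as above. An unquoted EOF would instead bake the current value of PATH into the file.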

Check that the JDK is installed

java -version

If you see output like the following, Java is installed correctly.

java version "1.8.0_311"
Java(TM) SE Runtime Environment (build 1.8.0_311-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.311-b11, mixed mode)
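To run this check from a script instead of by eye, the sketch below extracts the quoted version string. java -version writes to stderr, so the usage below redirects it with 2>&1. parse_java_version and is_java8 are hypothetical names.

```shell
#!/usr/bin/env bash
# parse_java_version: read `java -version` output on stdin and print the quoted
# version string from the first line that contains "version".
parse_java_version() {
    awk -F'"' '/version/ { print $2; exit }'
}

# is_java8 VERSION: exit status 0 for a 1.8.x version string, 1 otherwise.
is_java8() {
    case "$1" in
        1.8.*) return 0 ;;
        *)     return 1 ;;
    esac
}
```

For example: v=$(java -version 2>&1 | parse_java_version) && is_java8 "$v" && echo "Java 8 found: $v".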

Install Hadoop

Hadoop download page

Use FinalShell to upload the Hadoop archive to /opt/software

Check on the Linux side that the package arrived

ll /opt/software
total 734364
-rw-rw-r--. 1 owenxuan owenxuan 605187279 Dec 24 10:51 hadoop-3.3.1.tar.gz

Extract Hadoop into /opt/module

cd /opt/software/
tar -zxvf hadoop-3.3.1.tar.gz -C ../module/

Rename the Hadoop directory

cd ../module/
mv hadoop-3.3.1/ hadoop

Configure the Hadoop environment variables

Open /etc/profile.d/my_env.sh again (it already holds the JAVA_HOME section)

sudo vim /etc/profile.d/my_env.sh

Append the following:

#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

Save and exit

Source /etc/profile so that the new PATH takes effect in the current shell

source /etc/profile
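my_env.sh already holds the JAVA_HOME section, so the Hadoop section must be appended rather than written over the file. The sketch below appends it only once, so running it twice does no harm. append_hadoop_env is a hypothetical name.

```shell
#!/usr/bin/env bash
# append_hadoop_env [FILE]: append the HADOOP_HOME section unless FILE already has it.
append_hadoop_env() {
    local file="${1:-/etc/profile.d/my_env.sh}"
    # The "#HADOOP_HOME" marker line doubles as the "already added" check
    if grep -qx '#HADOOP_HOME' "$file" 2>/dev/null; then
        return 0
    fi
    cat >> "$file" <<'EOF'
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
EOF
}
```

Run it as root (append_hadoop_env), then source /etc/profile again.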

Check that Hadoop is installed

hadoop version

If you see output like the following, Hadoop is installed correctly.

Hadoop 3.3.1
Source code repository https://github.com/apache/hadoop.git -r a3b9c37a397ad4188041dd80621bdeefc46885f2
Compiled by ubuntu on 2021-06-15T05:13Z
Compiled with protoc 3.7.1
From source with checksum 88a4ddb2299aca054416d6b7f81ca55
This command was run using /opt/module/hadoop/share/hadoop/common/hadoop-common-3.3.1.jar

Cluster configuration

Cluster deployment plan

  • Do not put the NameNode and the SecondaryNameNode on the same server.

  • The ResourceManager also uses a lot of memory, so do not put it on the same machine as the NameNode or the SecondaryNameNode.

        CentOS-01             CentOS-02                       CentOS-03
HDFS    NameNode, DataNode    DataNode                        SecondaryNameNode, DataNode
YARN    NodeManager           ResourceManager, NodeManager    NodeManager

Configure the cluster

Go to the Hadoop configuration directory, $HADOOP_HOME/etc/hadoop

cd $HADOOP_HOME/etc/hadoop

Configure core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- NameNode address -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://10.10.10.31:8020</value>
    </property>
    <!-- Directory where Hadoop stores its data -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop/data</value>
    </property>
    <!-- Static user for the HDFS web UI: owenxuan -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>owenxuan</value>
    </property>
    <!-- Let owenxuan impersonate users in any group, from any host -->
    <property>
        <name>hadoop.proxyuser.owenxuan.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.owenxuan.groups</name>
        <value>*</value>
    </property>
</configuration>

Configure hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <!-- NameNode web UI address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>10.10.10.31:9870</value>
    </property>
    <!-- SecondaryNameNode web UI address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>10.10.10.33:9868</value>
    </property>
</configuration>

Configure yarn-site.xml

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
    <!-- Run the MapReduce shuffle as a NodeManager auxiliary service -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- ResourceManager address -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>10.10.10.32</value>
    </property>
    <!-- Classpath made available to YARN applications -->
    <property>
        <name>yarn.application.classpath</name>
        <value>/opt/module/hadoop/etc/hadoop:/opt/module/hadoop/share/hadoop/common/lib/*:/opt/module/hadoop/share/hadoop/common/*:/opt/module/hadoop/share/hadoop/hdfs:/opt/module/hadoop/share/hadoop/hdfs/lib/*:/opt/module/hadoop/share/hadoop/hdfs/*:/opt/module/hadoop/share/hadoop/mapreduce/*:/opt/module/hadoop/share/hadoop/yarn:/opt/module/hadoop/share/hadoop/yarn/lib/*:/opt/module/hadoop/share/hadoop/yarn/*</value>
    </property>
    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- Log server URL (the JobHistory server web UI) -->
    <property>
        <name>yarn.log.server.url</name>
        <value>http://10.10.10.31:19888/jobhistory/logs</value>
    </property>
    <!-- Keep aggregated logs for 7 days (604800 seconds) -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>

Configure mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <!-- Run MapReduce jobs on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- JobHistory server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>10.10.10.31:10020</value>
    </property>
    <!-- JobHistory server web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>10.10.10.31:19888</value>
    </property>
</configuration>
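The steps above configure the four XML files but not etc/hadoop/workers. In Hadoop 3.x, start-dfs.sh and start-yarn.sh start DataNodes and NodeManagers on the hosts listed in that file, one per line. The default file contains only localhost, so only the machine running the start script would get those daemons. The sketch below lists all three nodes to match the deployment plan. write_workers is a hypothetical helper. If you write the file before cloning the VM, every node gets a copy. If you have already cloned, distribute it with the xsync script from the scripts section.

```shell
#!/usr/bin/env bash
# write_workers [CONF_DIR]: list every node that should run a DataNode and a NodeManager.
# One host per line; stray spaces or blank lines would be read as host names.
write_workers() {
    local conf_dir="${1:-/opt/module/hadoop/etc/hadoop}"
    printf '%s\n' 10.10.10.31 10.10.10.32 10.10.10.33 > "$conf_dir/workers"
}
```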

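Clone the VM

Clone the template VM twice to create the other two nodes.

Change the hostname of each new VM

sudo hostnamectl set-hostname CentOS-02    # on the second VM
sudo hostnamectl set-hostname CentOS-03    # on the third VM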
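The configuration files in this guide use IP addresses, so they do not need hostname resolution. If you also want the hostnames above to resolve between the nodes (for example, to run ssh CentOS-02), one option is to add them to /etc/hosts on every VM. The following is a sketch. print_cluster_hosts is a hypothetical helper.

```shell
#!/usr/bin/env bash
# print_cluster_hosts: print /etc/hosts lines for the three nodes of this guide.
print_cluster_hosts() {
    # printf reuses its format for each remaining pair of arguments
    printf '%s %s\n' \
        10.10.10.31 CentOS-01 \
        10.10.10.32 CentOS-02 \
        10.10.10.33 CentOS-03
}
```

For example, on each VM: print_cluster_hosts | sudo tee -a /etc/hosts.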

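Change the MAC address of each new VM's network interface

sudo vim /etc/sysconfig/network-scripts/ifcfg-enp1s0

Change the MACADDR value. Any value works as long as it is unique on the network.

Then reserve a fixed IP for the new MAC address on the router.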

TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
NAME=enp1s0
UUID=f2728000-045f-43f1-b2c3-4884e6a545f3
DEVICE=enp1s0
ONBOOT=yes
MACADDR=<a new MAC address, different from 66:18:C7:F8:95:B9 and from the other clone>
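A new address can be generated at random. The sketch below draws six random bytes, then sets the first octet to mark the address unicast and locally administered, so it cannot clash with a vendor-assigned address. The template's 66:18:C7:F8:95:B9 is of the same kind: 0x66 has bit 1 set and bit 0 clear. random_mac is a hypothetical name.

```shell
#!/usr/bin/env bash
# random_mac: print a random unicast, locally administered MAC address, e.g. for MACADDR=.
random_mac() {
    local b1 b2 b3 b4 b5 b6
    # Six random bytes as unsigned decimals, all on one line
    read -r b1 b2 b3 b4 b5 b6 < <(od -An -N6 -tu1 /dev/urandom)
    # First octet: set bit 1 (locally administered), clear bit 0 (unicast)
    printf '%02X:%02X:%02X:%02X:%02X:%02X\n' \
        $(( (b1 | 0x02) & 0xFE )) "$b2" "$b3" "$b4" "$b5" "$b6"
}
```

Run it once per clone and compare the two results. A repeat is very unlikely, but you can re-run if it happens.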

Reboot all VMs

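Passwordless SSH login

Create the .ssh directory in /home/owenxuan

mkdir -p ~/.ssh
chmod 700 ~/.ssh

Generate the key pair

cd ~/.ssh/
ssh-keygen -t rsa

Press Enter at every prompt until the key's randomart image is printed: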

+---[RSA 3072]----+
|BB+*++. .        |
|*+=+=. o .       |
|B.=o..  =        |
|.+ +.  + .       |
|. . o . S        |
|.E . * . .. .    |
|.   . B . .+     |
|       =  o..    |
|        .. .oo   |
+----[SHA256]-----+

Copy the public key to every machine that should accept passwordless logins, including this one

ssh-copy-id 10.10.10.31
ssh-copy-id 10.10.10.32
ssh-copy-id 10.10.10.33

Repeat the same steps on CentOS-02 and CentOS-03

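Start the cluster

If this is the first start of the cluster

Format the NameNode on CentOS-01. Note: formatting generates a new cluster ID. If the NameNode and the DataNodes end up with different cluster IDs, the cluster can no longer find its existing data. If the cluster fails while running and the NameNode has to be re-formatted, do the following first: stop the NameNode and DataNode processes, and delete the data and logs directories (under /opt/module/hadoop) on every machine. Then format.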

cd /opt/module/hadoop/
hdfs namenode -format
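To avoid re-formatting by accident, you can run a guard that checks whether NameNode metadata already exists. The sketch below assumes dfs.namenode.name.dir keeps its default, file://${hadoop.tmp.dir}/dfs/name. With the core-site.xml above, that is /opt/module/hadoop/data/dfs/name. already_formatted is a hypothetical name.

```shell
#!/usr/bin/env bash
# already_formatted [DATA_DIR]: exit status 0 if DATA_DIR/dfs/name/current/VERSION exists,
# i.e. the NameNode storage under hadoop.tmp.dir has been formatted before.
already_formatted() {
    local data_dir="${1:-/opt/module/hadoop/data}"
    [ -f "$data_dir/dfs/name/current/VERSION" ]
}
```

For example: if already_formatted; then echo "NameNode already formatted, skipping"; else hdfs namenode -format; fi.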

Start HDFS

sbin/start-dfs.sh

Start YARN on the node that hosts the ResourceManager (CentOS-02)

$HADOOP_HOME/sbin/start-yarn.sh

Start the HistoryServer on CentOS-01, where mapred-site.xml places it

mapred --daemon start historyserver

View the HDFS NameNode in a browser: http://10.10.10.31:9870

View the YARN ResourceManager in a browser: http://10.10.10.32:8088

Basic cluster test

Upload a file to the cluster

hadoop fs -mkdir /input
hadoop fs -put $HADOOP_HOME/wcinput/word.txt /input

Run the wordcount example

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar wordcount /input /output
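The upload step expects $HADOOP_HOME/wcinput/word.txt, which is not created earlier. The sketch below creates a small sample file, and also computes locally what the job should produce, so you can compare it with hadoop fs -cat /output/part-r-00000. The sample text and the helper names are made up for illustration. The local count splits on the same characters as Java's default StringTokenizer, which the example WordCount mapper uses, and sorts byte-wise. For plain ASCII text, the result should match the job's single output file.

```shell
#!/usr/bin/env bash
# make_sample_input DIR: create DIR/word.txt with a few made-up lines.
make_sample_input() {
    local dir="$1"
    mkdir -p "$dir"
    printf 'hadoop yarn\nhadoop hdfs\nmapreduce\n' > "$dir/word.txt"
}

# local_wordcount FILE: print "word<TAB>count" lines, byte-sorted by word.
local_wordcount() {
    # Split on space, tab, CR and form feed (newlines already separate lines)
    tr -s ' \t\r\f' '\n' < "$1" | grep -v '^$' | LC_ALL=C sort | uniq -c |
        awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

For example: make_sample_input "$HADOOP_HOME/wcinput", then local_wordcount "$HADOOP_HOME/wcinput/word.txt". The job fails if /output already exists, so remove it with hadoop fs -rm -r /output before re-running.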

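Handy scripts for the Hadoop cluster

Cluster start/stop script (HDFS, YARN, HistoryServer): myhadoop.sh. The scripts go in ~/bin, which CentOS puts on the user's PATH by default, so they can be run by name.

mkdir -p ~/bin && cd ~/bin
vim myhadoop.sh

Enter the following: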

#!/bin/bash
# Start or stop HDFS, YARN and the JobHistory server across the cluster
if [ $# -lt 1 ]
then
	echo "No Args Input..."
	exit 1
fi
case $1 in
"start")
echo " =================== Starting the Hadoop cluster ==================="
echo " --------------- Starting HDFS ---------------"
ssh 10.10.10.31 "/opt/module/hadoop/sbin/start-dfs.sh"
echo " --------------- Starting YARN ---------------"
ssh 10.10.10.32 "/opt/module/hadoop/sbin/start-yarn.sh"
echo " --------------- Starting historyserver ---------------"
ssh 10.10.10.31 "/opt/module/hadoop/bin/mapred --daemon start historyserver"
;;
"stop")
echo " =================== Stopping the Hadoop cluster ==================="
echo " --------------- Stopping historyserver ---------------"
ssh 10.10.10.31 "/opt/module/hadoop/bin/mapred --daemon stop historyserver"
echo " --------------- Stopping YARN ---------------"
ssh 10.10.10.32 "/opt/module/hadoop/sbin/stop-yarn.sh"
echo " --------------- Stopping HDFS ---------------"
ssh 10.10.10.31 "/opt/module/hadoop/sbin/stop-dfs.sh"
;;
*)
echo "Input Args Error..."
exit 1
;;
esac

Save, exit, and make the script executable

chmod +x myhadoop.sh

Test the script

Start the cluster

myhadoop.sh start

Stop the cluster

myhadoop.sh stop

Script that lists the Java processes on all three servers: jpsall

cd ~/bin
vim jpsall

Enter the following:

#!/bin/bash
# Run jps on every node to list its Java processes
for host in 10.10.10.31 10.10.10.32 10.10.10.33
do
	echo =============== $host ===============
	ssh "$host" jps
done

Save, exit, and make the script executable

chmod +x jpsall

Test the script

jpsall
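To check the jpsall output against the deployment plan, the sketch below lists which expected daemons are missing on a host. The expected sets come from the plan table, plus the JobHistoryServer that mapred-site.xml places on CentOS-01. expected_daemons and missing_daemons are hypothetical names.

```shell
#!/usr/bin/env bash
# expected_daemons HOST: Java daemons HOST should run under this guide's plan.
expected_daemons() {
    case "$1" in
        10.10.10.31) echo "NameNode DataNode NodeManager JobHistoryServer" ;;
        10.10.10.32) echo "ResourceManager DataNode NodeManager" ;;
        10.10.10.33) echo "SecondaryNameNode DataNode NodeManager" ;;
        *) return 1 ;;
    esac
}

# missing_daemons HOST: read `jps` output ("<pid> <MainClass>" lines) on stdin and
# print each expected daemon that does not appear in it.
missing_daemons() {
    local jps_out d
    jps_out=$(cat)
    for d in $(expected_daemons "$1"); do
        printf '%s\n' "$jps_out" | awk '{ print $2 }' | grep -qx "$d" || echo "$d"
    done
}
```

For example, ssh 10.10.10.31 jps | missing_daemons 10.10.10.31 prints nothing when every expected daemon is running.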

Cluster file distribution script: xsync

cd ~/bin
vim xsync

Enter the following:

#!/bin/bash
#1. Check the number of arguments
if [ $# -lt 1 ]
then
	echo Not enough arguments!
	exit 1
fi
#2. Loop over every machine in the cluster
for host in 10.10.10.31 10.10.10.32 10.10.10.33
do
	echo ==================== $host ====================
	#3. Loop over all given files/directories and send them one by one
	for file in "$@"
	do
		#4. Check that the file exists
		if [ -e "$file" ]
		then
			#5. Resolve the parent directory (physical path, symlinks resolved)
			pdir=$(cd -P "$(dirname "$file")"; pwd)
			#6. Get the file name
			fname=$(basename "$file")
			ssh "$host" "mkdir -p $pdir"
			rsync -av "$pdir/$fname" "$host:$pdir"
		else
			echo "$file does not exist!"
		fi
	done
done

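Save, exit, and make the script executable

chmod +x xsync

Test the script (rsync must be installed on every node; if it is missing, install it with sudo yum install -y rsync)

xsync /home/owenxuan/bin

On the other machines, check that ~/bin now exists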
