Hadoop Cluster Setup

@2021/6/18

1. Configure hostnames

Edit /etc/hosts and append the following lines to the end of the file:

192.168.21.10 master
192.168.21.20 slave1
192.168.21.30 slave2

Adjust the IP addresses to match your own environment. The final /etc/hosts looks like this:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.21.10 master
192.168.21.20 slave1
192.168.21.30 slave2

Copy the modified /etc/hosts to the other machines.
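Appending by hand makes it easy to add the same entry twice when the step is repeated. The following is a minimal sketch, not part of the original procedure, of a helper that appends an entry only if the hostname is not already mapped. The function name add_host_entry is hypothetical, and hostnames are assumed to contain no regular-expression metacharacters.

```shell
# add_host_entry FILE IP NAME
# Append "IP NAME" to FILE unless NAME already appears as a hostname
# on a non-comment line. (Hypothetical helper, for illustration only.)
add_host_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    # NAME must be preceded by whitespace and followed by whitespace or end of line.
    if grep -Eq "^[^#]*[[:space:]]${name}([[:space:]]|\$)" "$hosts_file"; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}

# Example (run as root on each node, using the addresses from this post):
#   add_host_entry /etc/hosts 192.168.21.10 master
#   add_host_entry /etc/hosts 192.168.21.20 slave1
#   add_host_entry /etc/hosts 192.168.21.30 slave2
```

Running the helper a second time leaves the file unchanged, so it is safe to rerun on every machine.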

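2. Configure SSH

SSH is configured so that programs can log in to the other machines and run commands automatically.
In practice this comes down to placing each machine's public key in the authorized_keys file of the machines it needs to reach.

  1. First, run ssh-keygen -t rsa on machine A. This creates a .ssh directory under the current user's home directory containing machine A's private and public keys, id_rsa and id_rsa.pub.
  2. Then append the contents of machine A's public key to .ssh/authorized_keys on machine B (create the file manually if it does not exist). Machine A can now log in to machine B over SSH.
  3. To log in from machine B to machine A, repeat the same process in the other direction: put machine B's public key into A's authorized_keys file.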
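Step 2 above can be sketched as a small helper that runs on the receiving machine. This is an illustration under stated assumptions: the function name install_pubkey and the /tmp/id_rsa.pub path are hypothetical. The permission bits (700 on the directory, 600 on the file) are what sshd normally expects under its default strict mode checks.

```shell
# install_pubkey PUBKEY_FILE SSH_DIR
# Append the public key to SSH_DIR/authorized_keys, creating the directory
# and file if needed. (Hypothetical helper, for illustration only.)
install_pubkey() {
    pubkey_file=$1
    ssh_dir=$2
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    touch "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
    # -x matches whole lines and -F treats the key as a fixed string,
    # so the key is appended only if it is not already present.
    grep -qxF -e "$(cat "$pubkey_file")" "$ssh_dir/authorized_keys" ||
        cat "$pubkey_file" >> "$ssh_dir/authorized_keys"
}

# On machine A:  ssh-keygen -t rsa
# On machine B, after copying A's id_rsa.pub to /tmp (assumed path):
#   install_pubkey /tmp/id_rsa.pub ~/.ssh
```

Where the ssh-copy-id tool is installed, running ssh-copy-id root@slave1 from machine A achieves the same result in one step.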

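3. Install the JDK

Download the JDK from the Oracle website.
At the time of writing, downloading JDK versions older than 1.8 from the official site requires logging in. Here is a shared account to make the download easier:
Account: 2696671285@qq.com
Password: Oracle123
Account provided by: https://blog.csdn.net/WNsshssm/article/details/84315519

After downloading, upload the archive to the Linux machine and extract it.
Then edit /etc/profile and add the following configuration: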

export JAVA_HOME=/root/bigdata/java8/jdk1.8.0_291
export JRE_HOME=/root/bigdata/java8/jdk1.8.0_291/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH

Set JAVA_HOME to match your actual installation path.

Then run:

source /etc/profile

Finally, run java -version to verify that the installation succeeded.

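4. Install Hadoop

Download the Hadoop package: https://archive.apache.org/dist/hadoop/common/

Upload it to the machine and extract it.

The dfs directory inside the extracted Hadoop directory is created manually. It holds data and is referenced in the configuration that follows.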
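Creating the directory can be sketched as follows. The subdirectory names (tmp, name, data, namesecondary) are taken from the paths used in core-site.xml and hdfs-site.xml in this post. The function name make_dfs_dirs is hypothetical, and creating the subdirectories in advance is an assumption rather than something the original procedure states.

```shell
# make_dfs_dirs HADOOP_DIR
# Create HADOOP_DIR/dfs and the subdirectories referenced by the
# configuration files. (Hypothetical helper, for illustration only.)
make_dfs_dirs() {
    hadoop_dir=$1
    for sub in tmp name data namesecondary; do
        mkdir -p "$hadoop_dir/dfs/$sub" || return 1
    done
}

# Example, using the install path from this post:
#   make_dfs_dirs /root/bigdata/hadoop-3.1.1
```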

Go to $HADOOP_HOME/etc/hadoop and edit the configuration files.

Configure core-site.xml


<configuration>
    <property>
       <name>fs.defaultFS</name>
       <value>hdfs://master:9000</value>
       <description>NameNode URI.</description>
    </property>
    <property>
       <name>io.file.buffer.size</name>
       <value>131072</value>
       <description>Size of read/write buffer used in SequenceFiles.</description>
    </property>
    <property>
       <name>hadoop.tmp.dir</name>
       <value>/root/bigdata/hadoop-3.1.1/dfs/tmp</value>
       <description>Local Hadoop temporary directory on the NameNode.</description>
    </property>

</configuration>

Configure hdfs-site.xml

<configuration>
<property>
 <name>dfs.namenode.secondary.http-address</name>
 <value>master:50090</value>
 <description>The secondary NameNode HTTP server address and port. Ideally this should not be on the same machine as the NameNode.</description>
</property>
<property>
 <name>dfs.namenode.name.dir</name>
 <value>file:///root/bigdata/hadoop-3.1.1/dfs/name</value>
 <description>Stores the fsimage and edit log. Multiple directories can be listed for redundancy.</description>
</property>
<property>
 <name>dfs.datanode.data.dir</name>
 <value>file:///root/bigdata/hadoop-3.1.1/dfs/data</value>
 <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
 <name>dfs.namenode.checkpoint.dir</name>
 <value>file:///root/bigdata/hadoop-3.1.1/dfs/namesecondary</value>
 <description>Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.</description>
</property>
<property>
 <name>dfs.replication</name>
 <value>2</value>
</property>
</configuration>

Configure mapred-site.xml

<configuration>

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.</description>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
<description>MapReduce JobHistoryServer IPC host:port</description>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
<description>MapReduce JobHistoryServer Web UI host:port</description>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>
/root/bigdata/hadoop-3.1.1/etc/hadoop,
/root/bigdata/hadoop-3.1.1/share/hadoop/common/*,
/root/bigdata/hadoop-3.1.1/share/hadoop/common/lib/*,
/root/bigdata/hadoop-3.1.1/share/hadoop/hdfs/*,
/root/bigdata/hadoop-3.1.1/share/hadoop/hdfs/lib/*,
/root/bigdata/hadoop-3.1.1/share/hadoop/mapreduce/*,
/root/bigdata/hadoop-3.1.1/share/hadoop/mapreduce/lib/*,
/root/bigdata/hadoop-3.1.1/share/hadoop/yarn/*,
/root/bigdata/hadoop-3.1.1/share/hadoop/yarn/lib/*
</value>
</property>
</configuration>

Configure yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->
<property>
 <name>yarn.resourcemanager.hostname</name>
 <value>master</value>
 <description>The hostname of the RM.</description>
</property>
<property>
 <name>yarn.nodemanager.aux-services</name>
 <value>mapreduce_shuffle</value>
 <description>Shuffle service that needs to be set for MapReduce applications.</description>
</property>

</configuration>

Set JAVA_HOME in hadoop-env.sh

export JAVA_HOME=/root/bigdata/java8/jdk1.8.0_291

List the worker nodes in the workers file. The master can also be listed here if it should act as a DataNode as well.

[root@master hadoop]# cat  workers
slave2
slave1
[root@master hadoop]#

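Some of the paths above need to be adjusted to match your own machines.

Next, copy the extracted and configured Hadoop directory to the other machines: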

 scp -r  /root/bigdata/hadoop-3.1.1  root@slave1:/root/bigdata
 scp -r  /root/bigdata/hadoop-3.1.1  root@slave2:/root/bigdata
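With more than a couple of workers, typing one scp line per host gets tedious. The sketch below generates the same commands from the workers file instead. The function name print_copy_commands is hypothetical, and it only prints the commands, so they can be reviewed before anything is copied.

```shell
# print_copy_commands WORKERS_FILE SRC_DIR DEST_DIR
# Print one "scp -r" command per host listed in WORKERS_FILE.
# (Hypothetical helper, for illustration only.)
print_copy_commands() {
    workers_file=$1
    src=$2
    dest=$3
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r host || [ -n "$host" ]; do
        # Skip blank lines and comment lines.
        case $host in ''|\#*) continue ;; esac
        printf 'scp -r %s root@%s:%s\n' "$src" "$host" "$dest"
    done < "$workers_file"
}

# Review the output first, then pipe it to sh to run the copies:
#   print_copy_commands workers /root/bigdata/hadoop-3.1.1 /root/bigdata
#   print_copy_commands workers /root/bigdata/hadoop-3.1.1 /root/bigdata | sh
```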

Add the $HADOOP_HOME variable on every machine.

Edit /etc/profile and append the following:

export HADOOP_HOME=/root/bigdata/hadoop-3.1.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Reload the profile with source /etc/profile

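Next, initialize the cluster by running hdfs namenode -format on the master. The NameNode only needs to be formatted once.
Once that succeeds, start the cluster by running sbin/start-all.sh.
The first attempt fails with the following errors: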

[root@master sbin]#  start-all.sh
Starting namenodes on [master]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [master]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
Starting resourcemanager
ERROR: Attempting to operate on yarn resourcemanager as root
ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting operation.
Starting nodemanagers
ERROR: Attempting to operate on yarn nodemanager as root
ERROR: but there is no YARN_NODEMANAGER_USER defined. Aborting operation.

Solution:
Add the following lines to sbin/start-dfs.sh and sbin/stop-dfs.sh:

HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

Add the following lines to sbin/start-yarn.sh and sbin/stop-yarn.sh:

YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

Run start-all.sh again; this time the cluster starts successfully.
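As an alternative to editing the four start and stop scripts, the same user variables can be exported once in $HADOOP_HOME/etc/hadoop/hadoop-env.sh, which the Hadoop 3 scripts source on startup. This is a sketch of that approach and not what the original post did. It assumes, as the post does, that every daemon runs as root.

```shell
# Appended to $HADOOP_HOME/etc/hadoop/hadoop-env.sh (alternative approach,
# assuming all daemons are run as root as in this post).
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
```

Because hadoop-env.sh sits inside the Hadoop directory, the change has to be copied to the worker nodes along with the rest of the configuration.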

[Screenshots of master, slave1, and slave2 after startup]

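Access the web UIs
Hadoop (HDFS): http://192.168.21.10:9870/ (in Hadoop 3 the port changed from 50070 to 9870)
YARN: http://192.168.21.10:8088/

If the pages cannot be reached from a Windows machine, the firewall may still be running.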

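Check the firewall status: systemctl status firewalld.service

[Screenshot: the firewall is shown as active]
Run systemctl stop firewalld.service to stop the firewall, then check the status again:
[Screenshot: the firewall is shown as stopped]

Then run systemctl disable firewalld.service so that the firewall stays off after a reboot.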

Finally, the web UIs:

[Two screenshots of the web UIs]
