2021/6/18
Hadoop Cluster Setup
1. Configure hostnames
Edit /etc/hosts and append the following lines at the end of the file:
192.168.21.10 master
192.168.21.20 slave1
192.168.21.30 slave2
Adjust the IP addresses to match your environment. The final /etc/hosts should look like this:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.21.10 master
192.168.21.20 slave1
192.168.21.30 slave2
Copy the modified /etc/hosts to the other machines.
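A minimal sketch of distributing the file, assuming root SSH access to the slave hostnames defined above:
scp /etc/hosts root@slave1:/etc/hosts
scp /etc/hosts root@slave2:/etc/hosts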
2. Configure SSH
SSH is configured so that the Hadoop scripts can log in to the other machines and run commands automatically.
The setup essentially just means placing your own public key in the authorized_keys file on the other machines (a command sketch follows the steps below):
- First, run ssh-keygen -t rsa on machine A; this creates a .ssh directory under the home directory containing the private key id_rsa and the public key id_rsa.pub.
- Then append the contents of machine A's public key to .ssh/authorized_keys on machine B (create the file manually if it does not exist). Machine A can then log in to machine B via ssh.
- To log in from machine B to machine A, the process is the same: put machine B's public key into A's authorized_keys file.
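As a sketch, assuming the ssh-copy-id helper is available, distributing the master's key to both slaves could look like this:
ssh-keygen -t rsa              # generates ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub (accept the defaults)
ssh-copy-id root@slave1        # appends the public key to slave1's ~/.ssh/authorized_keys
ssh-copy-id root@slave2        # same for slave2
ssh root@slave1 hostname       # verify passwordless login works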
3. Install the JDK
Download the JDK (download link).
At the moment, downloading older JDK releases from the official Oracle site requires logging in; a shared account is provided here for convenience:
Account: 2696671285@qq.com
Password: Oracle123
Account courtesy of: https://blog.csdn.net/WNsshssm/article/details/84315519
After downloading, upload the archive to the Linux machine and extract it:
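For example (the archive name jdk-8u291-linux-x64.tar.gz and the target directory are assumptions chosen to match the JAVA_HOME used below):
mkdir -p /root/bigdata/java8
tar -zxvf jdk-8u291-linux-x64.tar.gz -C /root/bigdata/java8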
Edit /etc/profile and add the following settings:
export JAVA_HOME=/root/bigdata/java8/jdk1.8.0_291
export JRE_HOME=/root/bigdata/java8/jdk1.8.0_291/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH
Adjust JAVA_HOME to your actual installation path.
Then run
source /etc/profile
Finally, run java -version to verify the installation.
4. Install Hadoop
Download the Hadoop package from https://archive.apache.org/dist/hadoop/common/
Upload it to the master node and extract it.
The dfs directory is created manually; it is used to store HDFS data and is referenced in the configuration below.
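A sketch of this step (the archive name hadoop-3.1.1.tar.gz is an assumption; the dfs subdirectories match the paths used in the configuration files below):
tar -zxvf hadoop-3.1.1.tar.gz -C /root/bigdata
mkdir -p /root/bigdata/hadoop-3.1.1/dfs/{name,data,tmp,namesecondary}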
Go to $HADOOP_HOME/etc/hadoop and edit the configuration files.
Configure core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
<description>NameNode URI.</description>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
<description>Size of read/write buffer used in SequenceFiles.</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/root/bigdata/hadoop-3.1.1/dfs/tmp</value>
<description>Local Hadoop temporary directory on the NameNode.</description>
</property>
</configuration>
Configure hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:50090</value>
<description>The Secondary NameNode HTTP server address and port; it should not run on the same machine as the NameNode.</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///root/bigdata/hadoop-3.1.1/dfs/name</value>
<description>Stores the fsimage and edit log; multiple directories can be configured for redundancy.</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///root/bigdata/hadoop-3.1.1/dfs/data</value>
<description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:///root/bigdata/hadoop-3.1.1/dfs/namesecondary</value>
<description>Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.</description>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
Configure mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.</description>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
<description>MapReduce JobHistoryServer IPC host:port</description>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
<description>MapReduce JobHistoryServer Web UI host:port</description>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>
/root/bigdata/hadoop-3.1.1/etc/hadoop,
/root/bigdata/hadoop-3.1.1/share/hadoop/common/*,
/root/bigdata/hadoop-3.1.1/share/hadoop/common/lib/*,
/root/bigdata/hadoop-3.1.1/share/hadoop/hdfs/*,
/root/bigdata/hadoop-3.1.1/share/hadoop/hdfs/lib/*,
/root/bigdata/hadoop-3.1.1/share/hadoop/mapreduce/*,
/root/bigdata/hadoop-3.1.1/share/hadoop/mapreduce/lib/*,
/root/bigdata/hadoop-3.1.1/share/hadoop/yarn/*,
/root/bigdata/hadoop-3.1.1/share/hadoop/yarn/lib/*
</value>
</property>
</configuration>
Configure yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
<description>The hostname of the RM.</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>Shuffle service that needs to be set for Map Reduce applications.</description>
</property>
</configuration>
Set JAVA_HOME in hadoop-env.sh:
export JAVA_HOME=/root/bigdata/java8/jdk1.8.0_291
Set the worker (slave) nodes in the workers file; master could also be listed here to double as a DataNode.
[root@master hadoop]# cat workers
slave2
slave1
[root@master hadoop]#
Some of the paths above need to be adjusted to match your own machine.
Then copy the extracted and configured Hadoop directory to the other machines:
scp -r /root/bigdata/hadoop-3.1.1 root@slave1:/root/bigdata
scp -r /root/bigdata/hadoop-3.1.1 root@slave2:/root/bigdata
Add the HADOOP_HOME variable on every machine.
Edit /etc/profile and append the following:
export HADOOP_HOME=/root/bigdata/hadoop-3.1.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Refresh the environment: source /etc/profile
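To confirm the variables take effect on each machine, you can optionally run:
hadoop version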
Then initialize the cluster by running hdfs namenode -format (formatting only needs to be done once).
Once that succeeds, start the cluster with sbin/start-all.sh.
This fails with the following errors:
[root@master sbin]# start-all.sh
Starting namenodes on [master]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [master]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
Starting resourcemanager
ERROR: Attempting to operate on yarn resourcemanager as root
ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting operation.
Starting nodemanagers
ERROR: Attempting to operate on yarn nodemanager as root
ERROR: but there is no YARN_NODEMANAGER_USER defined. Aborting operation.
Solution:
Add the following to sbin/start-dfs.sh and sbin/stop-dfs.sh:
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
Add the following to sbin/start-yarn.sh and sbin/stop-yarn.sh:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
Run start-all.sh again and the cluster starts successfully.
Process listings on master, slave1, and slave2 confirm the daemons are running (screenshots omitted).
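One quick way to verify this (assuming jps from the JDK is on the PATH) is to run jps on every node; with the configuration above, master should show NameNode, SecondaryNameNode, and ResourceManager, while slave1 and slave2 should show DataNode and NodeManager:
jps    # run on each node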
Access the web UIs
HDFS: http://192.168.21.10:9870/ (in Hadoop 3 the default port changed from 50070 to 9870)
YARN: http://192.168.21.10:8088/
If the pages cannot be reached from a Windows machine, the firewall on the nodes may still be enabled.
Check the firewall status: systemctl status firewalld.service
The status output will show whether the firewall is active (screenshot omitted).
Run systemctl stop firewalld.service to stop the firewall.
Check the status again; it should now show as inactive.
Then run systemctl disable firewalld.service to disable the firewall permanently.
Finally, the web UIs are accessible (screenshot omitted).