Win10 + CentOS 7 + Hadoop Cluster Setup

I. Preparation

1. VMware Workstation Pro 16

    Download from the official site: https://www.vmware.com/

    License key: ZF3R0-FHED2-M80TY-8QYGC-NPKYF (if it no longer works, search for another one)

2. Xshell and Xftp, downloaded from their official site (registration required)

3. Download CentOS (this guide uses CentOS 7) from a domestic mirror site, e.g. Huawei, Aliyun, or Tsinghua:

    https://mirrors.tuna.tsinghua.edu.cn , https://developer.aliyun.com/mirror/ , https://mirrors.huaweicloud.com/

4. Required packages

    hadoop 2.7.3      jdk-1.8.0

II. Install VMware Workstation Pro 16

III. Install Xshell and Xftp

IV. VMware Network Configuration

Rationale: putting the Windows host and the virtual machines on the same subnet lets you open the Hadoop cluster's web UI (the master:50070 page) in a browser on Windows, and makes it easier to connect to the cluster from IDEA later. Fixing the virtual machines' IP addresses also simplifies the steps that follow.

1. In VMware, open Edit -> Virtual Network Editor and select VMnet8 (the NAT network).

2. Click NAT Settings.

3. Click DHCP Settings in the window from step 1.

4. Set the address of the VMnet8 adapter.

5. Right-click VMnet8 -> Properties.

V. Create the Virtual Machine Instances

I created three virtual machines, with the hostnames master, slave1, and slave2.

During creation, choose NAT as the network type.

To change the hostname, see: https://www.cnblogs.com/HusterX/articles/13425074.html
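
As a minimal sketch (on CentOS 7 with systemd tooling), the hostname can be set on each node like this:

    # run as root on each machine, substituting master / slave1 / slave2
    hostnamectl set-hostname master
    # re-open the shell (or reboot) so the prompt reflects the new name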

Disable the firewall (or open only the required ports; see the sketch after the commands below).

firewall-cmd --state
systemctl stop firewalld.service
systemctl disable firewalld.service
Disable firewalld
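
If you would rather keep firewalld running, a sketch (my own addition, not part of the original steps) is to open only the ports the cluster uses:

    # open the NameNode RPC port, the HDFS web UI and the YARN web UI, then reload
    firewall-cmd --permanent --add-port=9000/tcp
    firewall-cmd --permanent --add-port=50070/tcp
    firewall-cmd --permanent --add-port=8088/tcp
    firewall-cmd --reload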

Add a user (a dedicated user for managing Hadoop; for a purely local setup you can skip this step and simply do everything as root).
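
A minimal sketch, assuming the new user is called hadoop:

    # create the hadoop user and give it a password
    useradd hadoop
    passwd hadoop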

Final layout:

IP address        Hostname   Main role
192.168.47.131    master     NameNode, ResourceManager
192.168.47.132    slave1     DataNode, NodeManager
192.168.47.130    slave2     DataNode, NodeManager
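
To keep the addresses above fixed (as suggested in section IV), a sketch of a static network configuration for master, assuming the interface is named ens33 and the VMnet8 NAT gateway is 192.168.47.2, looks like:

    # /etc/sysconfig/network-scripts/ifcfg-ens33 (adjust IPADDR on each node)
    TYPE=Ethernet
    BOOTPROTO=static
    NAME=ens33
    DEVICE=ens33
    ONBOOT=yes
    IPADDR=192.168.47.131
    NETMASK=255.255.255.0
    GATEWAY=192.168.47.2
    DNS1=192.168.47.2

    # apply the change
    systemctl restart network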

Edit the /etc/hosts file on master:

192.168.47.131 master
192.168.47.132 slave1
192.168.47.130 slave2
master's hosts

 

VI. CentOS 7 Environment Setup

Perform the following steps after connecting to master with Xshell, or directly on master.

PS: I did everything as root (if you use the hadoop user instead, watch out for permission issues).

1. Use Xftp to upload the JDK and Hadoop archives to a directory on master (this guide uses /opt).

2. Set up the Java environment

1. Check whether Java is already installed
  java -version

2. If an OpenJDK is present, remove it
   List the packages:
   rpm -qa | grep openjdk
   Remove them:
   rpm -e --nodeps [the packages found above]

3. Extract the JDK uploaded to /opt
   tar -zxvf jdk1.8*****
   Rename the directory:
   mv  jdk1.8*****  jdk8

4. Add the environment variables (as root)
   vim /etc/profile

   Append:
   export JAVA_HOME=/opt/jdk8
   export PATH=$PATH:$JAVA_HOME/bin

5. Apply the changes
   source /etc/profile

6. Verify
   java -version
Java environment setup on CentOS 7

3. Set up the Hadoop environment

1. Extract the Hadoop archive uploaded to /opt
  tar -zxf hadoop-2.7.3.tar.gz
  Rename the directory:
  mv  hadoop-2.7.3 hadoop

2. Configure the Hadoop environment variables (as root)
   vim /etc/profile

   Append:
   export HADOOP_HOME=/opt/hadoop

   Change the PATH line to:
   export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

3. Verify
   hadoop version

[root@master ~]# hadoop version
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /opt/hadoop/share/hadoop/common/hadoop-common-2.7.3.jar
Hadoop environment setup on CentOS 7

4. Passwordless SSH login

1. Generate a key pair
[root@master ~]# ssh-keygen -t rsa
The keys are written to /root/.ssh
[root@master ~]# ls /root/.ssh/
id_rsa  id_rsa.pub

2. Add the public key to the trusted list
[root@master ~]# cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
[root@master ~]# ls /root/.ssh/
authorized_keys  id_rsa  id_rsa.pub

3. Set permissions
[root@master ~]# chmod 600 /root/.ssh/authorized_keys

4. Repeat steps 1-3 on the other CentOS machines

5. Distribute the public key to the other hosts in the cluster
    Usage: ssh-copy-id [-i [identity_file]] [user@]machine
[root@master ~]# ssh-copy-id  [user]@[IP]
Passwordless SSH setup
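
As a concrete sketch, assuming root and the hostnames from the /etc/hosts file above, the distribution and a quick check from master would be:

    [root@master ~]# ssh-copy-id root@slave1
    [root@master ~]# ssh-copy-id root@slave2
    # each slave should now be reachable without a password prompt
    [root@master ~]# ssh slave1 hostname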

5. Create directories under /opt/hadoop/

   (1)  Create hdfs

   (2)  Under hdfs, create the name, data, and tmp directories (see the sketch below)
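
A minimal sketch of these two steps:

    mkdir -p /opt/hadoop/hdfs/name /opt/hadoop/hdfs/data /opt/hadoop/hdfs/tmp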

6. Hadoop configuration files

Add export JAVA_HOME=/opt/jdk8 to hadoop-env.sh (the configuration files below are under /opt/hadoop/etc/hadoop/).

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.
export JAVA_HOME=${JAVA_HOME}

# The jsvc implementation to use. Jsvc is required to run secure datanodes
# that bind to privileged ports to provide authentication of data transfer
# protocol.  Jsvc is not required if SASL is configured for authentication of
# data transfer protocol using non-privileged ports.
#export JSVC_HOME=${JSVC_HOME}

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}

# Extra Java CLASSPATH elements.  Automatically insert capacity-scheduler.
for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
  if [ "$HADOOP_CLASSPATH" ]; then
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
  else
    export HADOOP_CLASSPATH=$f
  fi
done

# The maximum amount of heap to use, in MB. Default is 1000.
#export HADOOP_HEAPSIZE=
#export HADOOP_NAMENODE_INIT_HEAPSIZE=""

# Extra Java runtime options.  Empty by default.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"

export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"

export HADOOP_NFS3_OPTS="$HADOOP_NFS3_OPTS"
export HADOOP_PORTMAP_OPTS="-Xmx512m $HADOOP_PORTMAP_OPTS"

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
#HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData $HADOOP_JAVA_PLATFORM_OPTS"

# On secure datanodes, user to run the datanode as after dropping privileges.
# This **MUST** be uncommented to enable secure HDFS if using privileged ports
# to provide authentication of data transfer protocol.  This **MUST NOT** be
# defined if SASL is configured for authentication of data transfer protocol
# using non-privileged ports.
export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER}

# Where log files are stored.  $HADOOP_HOME/logs by default.
#export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER

# Where log files are stored in the secure data environment.
export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER}

###
# HDFS Mover specific parameters
###
# Specify the JVM options to be used when starting the HDFS Mover.
# These options will be appended to the options specified as HADOOP_OPTS
# and therefore may override any similar flags set in HADOOP_OPTS
#
# export HADOOP_MOVER_OPTS=""

###
# Advanced Users Only!
###

# The directory where pid files are stored. /tmp by default.
# NOTE: this should be set to a directory that can only be written to by 
#       the user that will run the hadoop daemons.  Otherwise there is the
#       potential for a symlink attack.
export HADOOP_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}

# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER
export JAVA_HOME=/opt/jdk8
hadoop-env.sh
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/opt/hadoop/hdfs/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <!-- The IP address of master -->
    <!-- You can also write hdfs://master:9000 directly -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.47.131:9000</value>
    </property>
</configuration>
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <!-- dfs.replication is the number of block replicas; set it to 1 for pseudo-distributed mode -->
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <!-- dfs.namenode.name.dir is a local directory where the fsimage files are stored -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/opt/hadoop/hdfs/name</value>
    </property>
    <!-- dfs.datanode.data.dir is a local directory where HDFS stores its data blocks -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/opt/hadoop/hdfs/data</value>
    </property>

</configuration>
hdfs-site.xml
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

<!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
yarn-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
mapred-site.xml

slave1
slave2
slaves

PS: the contents of the slaves file depend on your own deployment; I deployed two slaves.

7. This completes the environment configuration on master

   Copy /opt/jdk8 and /opt/hadoop from master to the /opt directory on slave1 and slave2.

   Copy /etc/hosts and /etc/profile from master to the corresponding paths on slave1 and slave2 (a scp sketch is given below).
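
A sketch of the copy using scp from master (assuming root access and the passwordless SSH configured earlier):

    [root@master ~]# scp -r /opt/jdk8 /opt/hadoop root@slave1:/opt/
    [root@master ~]# scp -r /opt/jdk8 /opt/hadoop root@slave2:/opt/
    [root@master ~]# scp /etc/hosts /etc/profile root@slave1:/etc/
    [root@master ~]# scp /etc/hosts /etc/profile root@slave2:/etc/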

8. On slave1 and slave2, run

   source  /etc/profile

My /etc/profile file is attached for reference:

# /etc/profile

# System wide environment and startup programs, for login setup
# Functions and aliases go in /etc/bashrc

# It's NOT a good idea to change this file unless you know what you
# are doing. It's much better to create a custom.sh shell script in
# /etc/profile.d/ to make custom changes to your environment, as this
# will prevent the need for merging in future updates.

pathmunge () {
    case ":${PATH}:" in
        *:"$1":*)
            ;;
        *)
            if [ "$2" = "after" ] ; then
                PATH=$PATH:$1
            else
                PATH=$1:$PATH
            fi
    esac
}


if [ -x /usr/bin/id ]; then
    if [ -z "$EUID" ]; then
        # ksh workaround
        EUID=`/usr/bin/id -u`
        UID=`/usr/bin/id -ru`
    fi
    USER="`/usr/bin/id -un`"
    LOGNAME=$USER
    MAIL="/var/spool/mail/$USER"
fi

# Path manipulation
if [ "$EUID" = "0" ]; then
    pathmunge /usr/sbin
    pathmunge /usr/local/sbin
else
    pathmunge /usr/local/sbin after
    pathmunge /usr/sbin after
fi

HOSTNAME=`/usr/bin/hostname 2>/dev/null`
HISTSIZE=1000
if [ "$HISTCONTROL" = "ignorespace" ] ; then
    export HISTCONTROL=ignoreboth
else
    export HISTCONTROL=ignoredups
fi

export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL

# By default, we want umask to get set. This sets it for login shell
# Current threshold for system reserved uid/gids is 200
# You could check uidgid reservation validity in
# /usr/share/doc/setup-*/uidgid file
if [ $UID -gt 199 ] && [ "`/usr/bin/id -gn`" = "`/usr/bin/id -un`" ]; then
    umask 002
else
    umask 022
fi

for i in /etc/profile.d/*.sh /etc/profile.d/sh.local ; do
    if [ -r "$i" ]; then
        if [ "${-#*i}" != "$-" ]; then 
            . "$i"
        else
            . "$i" >/dev/null
        fi
    fi
done

unset i
unset -f pathmunge
export JAVA_HOME=/opt/jdk8
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
/etc/profile

 

 

VII. Start the Hadoop Cluster

1. Format the filesystem
hadoop namenode -format   (or: hdfs namenode -format)
(Note: this only needs to be done once; later startups do not
require another format unless the master/slave setup changes.)

2. Start Hadoop (from the /opt/hadoop directory)
   sbin/start-all.sh

3. Verify with jps; processes on master:
   [root@master hadoop]# jps
   28448 ResourceManager
   31777 Jps
   28293 SecondaryNameNode
   28105 NameNode
   Processes on slave1:
   [root@slave1 ~]# jps
   22950 Jps
   18665 NodeManager
   18558 DataNode

4. View in a browser
    http://master:50070
     If Windows and the virtual machines are on the same subnet, this page can be opened in a browser on Windows (see the hosts-file note below); otherwise it takes extra configuration. It can also be opened in a browser inside master.

5. Stop the Hadoop cluster (from /opt/hadoop)
    sbin/stop-all.sh
Startup commands
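
For http://master:50070 to resolve by name in a Windows browser, the cluster entries also need to be in the Windows hosts file; a sketch (my own addition) is to append them to C:\Windows\System32\drivers\etc\hosts:

    192.168.47.131 master
    192.168.47.132 slave1
    192.168.47.130 slave2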

 

VIII. Run a Test Program

1. In the /opt/hadoop directory
echo "this is a test case, loading, please wait a minute" >> test

2. Create the input directory with an HDFS command
   hadoop fs -mkdir /input

3. Put the test file into /input with an HDFS command
   hadoop fs -put test /input

4. Run the wordcount example that ships with Hadoop
   hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /input /output

5. View the output
   hadoop fs -ls /output
   hadoop fs -cat /output/part-r-00000
Test case

 

PS: these directories all live on HDFS, so you will not find them on the local disk. The output directory must not exist before the job runs. If the input file changes and you rerun the job, delete both /input and /output and regenerate them, or use two new directories. If you need to run hadoop namenode -format again, be sure to delete the old logs and temporary files first.

 

Delete a directory
hadoop fs -rmr [/targetDir]   (deprecated form; hadoop fs -rm -r works as well)

List the files in a directory
hadoop fs -ls [/targetDir]

Put a local file onto HDFS
hadoop fs -put localFile remoteFilePath
HDFS commands

 

IX. Connect to the Hadoop Cluster from IDEA and Run WordCount

See: https://www.cnblogs.com/HusterX/p/14162985.html

