
Hadoop, Day 1

1. Distributed Data Storage

Massive data sets are too large for any single machine, so the data is split up and distributed across many machines that together act as one storage system.

2. What is HDFS?

  1. Massive data is stored on a cluster (multiple machines pooled together as storage resources)
  2. The machines form an organized group (one master node, several slave nodes)
  3. When a slave node starts, it registers with the master and reports its own resources
  4. After receiving a slave's registration, the master maintains the cluster view (how many nodes there are and each node's storage capacity)
  5. When a client wants to store data, it first sends the request to the master
  6. The master validates the request and returns storage locations to the client
  7. The client then writes the data to the designated storage nodes
  8. Data is stored across the cluster with multiple replicas to keep it safe
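Once the cluster from section 4 is running, you can see this block-and-replica layout for yourself; fsck reports every block and which DataNodes hold its replicas:

[root@hadoop01 hadoop-2.7.2]# bin/hdfs fsck / -files -blocks -locations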

3. HDFS Architecture

HDFS uses a master/slave architecture: a single NameNode (the master, which holds the file-system metadata) plus multiple DataNodes (the slaves, which hold the actual data blocks).

4. Building a Hadoop HDFS Storage Cluster

4.1. Cluster Plan

Hostname    IP address         Roles
hadoop01    192.168.254.101    NameNode, DataNode
hadoop02    192.168.254.102    DataNode
hadoop03    192.168.254.103    DataNode

4.2. VM Cluster Environment Preparation (on every machine)

1) Install the JDK and configure its environment variables

2) Set the hostname

3) Map hostnames to IP addresses

4) Disable the firewall (steps 2 to 4 are sketched below)

5) Configure unified time synchronization
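A minimal sketch of steps 2) through 4), assuming CentOS 6 (which the service/chkconfig commands used throughout this tutorial imply); repeat on every node with its own hostname:

[root@hadoop01 ~]# vim /etc/sysconfig/network        # set HOSTNAME=hadoop01 (hadoop02/hadoop03 on the other nodes)

[root@hadoop01 ~]# vim /etc/hosts                    # add these mappings on every node:
192.168.254.101 hadoop01
192.168.254.102 hadoop02
192.168.254.103 hadoop03

[root@hadoop01 ~]# service iptables stop             # stop the firewall now

[root@hadoop01 ~]# chkconfig iptables off            # keep it off after reboot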

4.3. NTP Time Synchronization Server Configuration

4.3.1. How it works

hadoop01 acts as the NTP server for the cluster: it syncs to its own local clock (at stratum 10), and the other nodes sync to hadoop01, so all machines share one consistent time even without internet access.

4.3.2. Configure hadoop01 as the time server

Check the time-sync service status:

[root@hadoop01 ~]# service ntpd status

ntpd is stopped

4.3.3. The ntp configuration file

[root@hadoop01 ~]# vim /etc/ntp.conf

Add:

restrict 192.168.254.101 nomodify notrap nopeer noquery

Modify:

restrict 192.168.254.0 mask 255.255.255.0 nomodify notrap

Comment out the following:

#server 0.centos.pool.ntp.org iburst

#server 1.centos.pool.ntp.org iburst

#server 2.centos.pool.ntp.org iburst

#server 3.centos.pool.ntp.org iburst

Add:

server 127.127.1.0

fudge 127.127.1.0 stratum 10

Start the NTP service:

[root@hadoop01 ~]# service ntpd start

Enable it at boot:

[root@hadoop01 ~]# chkconfig ntpd on
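To verify that ntpd is up and serving from the local clock, query its peers (the local-clock driver shows up as LOCAL(0)):

[root@hadoop01 ~]# ntpq -p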

4.3.4. Time commands

Check the current time:

[root@hadoop01 ~]# date

Thu Jun 10 10:45:39 EDT 2021

Manually sync another node against hadoop01 (ntpd must be stopped first, because ntpdate cannot run while ntpd holds the NTP port):

[root@hadoop03 ~]# service ntpd stop

[root@hadoop03 ~]# ntpdate hadoop01

10 Jun 11:23:57 ntpdate[2362]: step time server 192.168.254.101 offset 115685179.275931 sec
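As an alternative to running ntpd on the slave nodes (section 4.4), a cron job that re-syncs periodically is common in setups like this; a sketch, assuming ntpdate lives at /usr/sbin/ntpdate:

[root@hadoop03 ~]# crontab -e
*/10 * * * * /usr/sbin/ntpdate hadoop01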

 

4.4. Client sync configuration (hadoop02 and hadoop03)

[root@hadoop02 ~]# vim /etc/ntp.conf

Modify:

# the administrative functions.

restrict 192.168.254.101 nomodify notrap nopeer noquery

# Hosts on local network are less restricted.

restrict 192.168.254.0 mask 255.255.255.0 nomodify notrap

 

Comment out:

#server 0.centos.pool.ntp.org iburst

#server 1.centos.pool.ntp.org iburst

#server 2.centos.pool.ntp.org iburst

#server 3.centos.pool.ntp.org iburst

Add:

server 192.168.254.101
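ntpd must also be started (and enabled at boot) on each client node for this configuration to take effect, same as on hadoop01:

[root@hadoop02 ~]# service ntpd start

[root@hadoop02 ~]# chkconfig ntpd on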

 

4.5. Configuring the HDFS Cluster

4.5.1. Upload the Hadoop package to the server

Upload hadoop-2.7.2.tar.gz to /opt/software on hadoop01 (with scp or any SFTP tool).

4.5.2. Unpack the archive

[root@hadoop01 software]# tar -zxvf hadoop-2.7.2.tar.gz -C /opt/module/
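Optionally, put Hadoop on the PATH so hadoop/hdfs can be run from any directory; this tutorial keeps using explicit bin/ and sbin/ paths, so this is just a convenience:

[root@hadoop01 software]# vim /etc/profile
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

[root@hadoop01 software]# source /etc/profile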

4.5.3. Configure core-site.xml

[root@hadoop01 hadoop]# vim /opt/module/hadoop-2.7.2/etc/hadoop/core-site.xml

<configuration>

        <!-- NameNode address -->

        <property>

                <name>fs.defaultFS</name>

                <value>hdfs://hadoop01:9000</value>

        </property>

        <!-- Directory for Hadoop's runtime temporary files -->

        <property>

          <name>hadoop.tmp.dir</name>

          <value>/opt/module/hadoop-2.7.2/data/tmp</value>

        </property>

</configuration>

 

4.5.4. Configure the Java environment variable

[root@hadoop01 hadoop]# vim /opt/module/hadoop-2.7.2/etc/hadoop/hadoop-env.sh

Modify:

export JAVA_HOME=/opt/module/jdk1.8.0_144

4.5.5. Configure hdfs-site.xml

[root@hadoop01 hadoop]# vim /opt/module/hadoop-2.7.2/etc/hadoop/hdfs-site.xml

<configuration>

        <!-- Number of HDFS replicas -->

        <property>

          <name>dfs.replication</name>

          <value>3</value>

        </property>

        <!-- NameNode web UI address -->

        <property>

                <name>dfs.namenode.http-address</name>

                <value>hadoop01:50070</value>

        </property>

</configuration>

4.6. Configure YARN

4.6.1. Configure the JDK path

[root@hadoop01 hadoop]# vim yarn-env.sh

export JAVA_HOME=/opt/module/jdk1.8.0_144

4.6.2. Configure yarn-site.xml

[root@hadoop01 hadoop]# vim /opt/module/hadoop-2.7.2/etc/hadoop/yarn-site.xml

<configuration>

        <!-- How reducers fetch data (the shuffle service) -->

        <property>

                <name>yarn.nodemanager.aux-services</name>

                <value>mapreduce_shuffle</value>

        </property>

        <!-- ResourceManager address -->

        <property>

                <name>yarn.resourcemanager.hostname</name>

                <value>hadoop01</value>

        </property>

</configuration>

4.6.3. Configure the JDK for MapReduce

[root@hadoop01 hadoop]# vim mapred-env.sh

export JAVA_HOME=/opt/module/jdk1.8.0_144

4.6.4. Configure mapred-site.xml

[root@hadoop01 hadoop]# mv mapred-site.xml.template mapred-site.xml

[root@hadoop01 hadoop]# vim mapred-site.xml

<configuration>

        <!-- Run MapReduce on YARN -->

        <property>

          <name>mapreduce.framework.name</name>

          <value>yarn</value>

        </property>

</configuration>
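Note that start-dfs.sh (section 4.7) starts only HDFS; with YARN and MapReduce configured as above, the ResourceManager and NodeManagers are started separately once HDFS is up:

[root@hadoop01 sbin]# ./start-yarn.sh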

 

4.6.5. Configure the DataNode list (slaves file)

[root@hadoop01 hadoop]# vim slaves

hadoop01

hadoop02

hadoop03

4.6.6. Sync the configuration to the whole cluster

[root@hadoop01 module]# xsync /opt/module/hadoop-2.7.2/
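xsync is a hand-rolled rsync wrapper often used alongside this kind of setup, not a standard tool; if it is not installed, plain scp does the same job:

[root@hadoop01 module]# scp -r /opt/module/hadoop-2.7.2 root@hadoop02:/opt/module/

[root@hadoop01 module]# scp -r /opt/module/hadoop-2.7.2 root@hadoop03:/opt/module/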

4.6.7. Ways to start the cluster

Method 1: start or stop HDFS daemons one node at a time

[root@hadoop01 sbin]# ./hadoop-daemon.sh start|stop namenode|datanode|secondarynamenode

Method 2: start or stop HDFS on all nodes at once (this relies on passwordless SSH from hadoop01 to every node listed in slaves)

[root@hadoop01 sbin]# ./stop-dfs.sh

or:

[root@hadoop01 sbin]# ./start-dfs.sh

Starting namenodes on [hadoop01]

hadoop01: starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-namenode-hadoop01.out

hadoop03: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-datanode-hadoop03.out

hadoop02: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-datanode-hadoop02.out

hadoop01: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-datanode-hadoop01.out

 

4.7. Starting the Cluster

The first start requires formatting the NameNode; do not run this command again on later starts (reformatting wipes the HDFS metadata and leaves existing DataNode data with a mismatched cluster ID):

[root@hadoop01 bin]# ./hadoop namenode -format

Start the HDFS cluster:

[root@hadoop01 sbin]# ./start-dfs.sh
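To confirm the daemons are running, jps (part of the JDK) on each node should show NameNode and DataNode on hadoop01, and DataNode on hadoop02/hadoop03:

[root@hadoop01 sbin]# jps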

4.8. View Cluster Storage

Open the NameNode web UI at http://hadoop01:50070 (the address configured in hdfs-site.xml above) to see the live DataNodes and the cluster's total and used capacity.

5. HDFS Shell Operations

5.1. Basic command format

bin/hadoop fs <command>

5.2. Full command list

Usage: hadoop fs [generic options]

        [-appendToFile <localsrc> ... <dst>]

        [-cat [-ignoreCrc] <src> ...]

        [-checksum <src> ...]

        [-chgrp [-R] GROUP PATH...]

        [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]

        [-chown [-R] [OWNER][:[GROUP]] PATH...]

        [-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]

        [-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]

        [-count [-q] [-h] <path> ...]

        [-cp [-f] [-p | -p[topax]] <src> ... <dst>]

        [-createSnapshot <snapshotDir> [<snapshotName>]]

        [-deleteSnapshot <snapshotDir> <snapshotName>]

        [-df [-h] [<path> ...]]

        [-du [-s] [-h] <path> ...]

        [-expunge]

        [-find <path> ... <expression> ...]

        [-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]

        [-getfacl [-R] <path>]

        [-getfattr [-R] {-n name | -d} [-e en] <path>]

        [-getmerge [-nl] <src> <localdst>]

        [-help [cmd ...]]

        [-ls [-d] [-h] [-R] [<path> ...]]

        [-mkdir [-p] <path> ...]

        [-moveFromLocal <localsrc> ... <dst>]

        [-moveToLocal <src> <localdst>]

        [-mv <src> ... <dst>]

        [-put [-f] [-p] [-l] <localsrc> ... <dst>]

        [-renameSnapshot <snapshotDir> <oldName> <newName>]

        [-rm [-f] [-r|-R] [-skipTrash] <src> ...]

        [-rmdir [--ignore-fail-on-non-empty] <dir> ...]

        [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]

        [-setfattr {-n name [-v value] | -x name} <path>]

        [-setrep [-R] [-w] <rep> <path> ...]

        [-stat [format] <path> ...]

        [-tail [-f] <file>]

        [-test -[defsz] <path>]

        [-text [-ignoreCrc] <src> ...]

        [-touchz <path> ...]

        [-truncate [-w] <length> <path> ...]

        [-usage [cmd ...]]

 

View the help for a command:

[root@hadoop01 hadoop-2.7.2]# bin/hadoop fs -help <command>

5.3. Common commands

5.3.1. List directory contents

[root@hadoop01 hadoop-2.7.2]# bin/hadoop fs -ls /

5.3.2. Upload a file to HDFS

[root@hadoop01 hadoop-2.7.2]# bin/hadoop fs -put /opt/software/hadoop-2.7.2.tar.gz hdfs://hadoop01:9000/
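Because fs.defaultFS is already set in core-site.xml, the scheme and host can be omitted; this shorter form is equivalent:

[root@hadoop01 hadoop-2.7.2]# bin/hadoop fs -put /opt/software/hadoop-2.7.2.tar.gz /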

5.3.3. -mkdir: create directories

[root@hadoop01 hadoop-2.7.2]# bin/hadoop fs -mkdir -p /abc/dbf/ccc

5.3.4. Delete a directory

[root@hadoop01 hadoop-2.7.2]# bin/hadoop fs -rm -r /abc
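A few other day-to-day commands from the list in 5.2 (the paths here are just examples):

[root@hadoop01 hadoop-2.7.2]# bin/hadoop fs -get /hadoop-2.7.2.tar.gz /opt/software/

[root@hadoop01 hadoop-2.7.2]# bin/hadoop fs -du -s -h /

[root@hadoop01 hadoop-2.7.2]# bin/hadoop fs -setrep 2 /hadoop-2.7.2.tar.gz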

 

6. HDFS Java Client Operations

6.1. Configure Windows 10 environment variables

On Windows, the Hadoop client needs a HADOOP_HOME environment variable pointing at an unpacked Hadoop directory, with %HADOOP_HOME%\bin (containing winutils.exe) added to Path; restart the IDE after setting it.

6.2. Create the project and import dependencies

<dependencies>
    <!-- unit testing -->
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
    <!-- logging -->
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>2.13.3</version>
    </dependency>
    <!-- hadoop-common: shared Hadoop classes -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.2</version>
    </dependency>
    <!-- hadoop-client: client-side API -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.2</version>
    </dependency>
    <!-- hadoop-hdfs -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.7.2</version>
    </dependency>
</dependencies>

6.3. Create the HdfsClient class


import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

public class HdfsClient {

    @Test
    public void testMkdir() throws Exception {
        // Build the client configuration
        Configuration configuration = new Configuration();
        // Point the client at the cluster's NameNode
        configuration.set("fs.defaultFS", "hdfs://hadoop01:9000");

        // Get a file-system handle, connecting as user "root"
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop01:9000"), configuration, "root");

        // Create the directory
        fs.mkdirs(new Path("/cc"));

        // Release the connection
        fs.close();
    }
}
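If the test passes, the new directory is visible from the cluster side:

[root@hadoop01 hadoop-2.7.2]# bin/hadoop fs -ls /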

 

Summary:

  1. Obtain a Configuration object
  2. Set the connection details (fs.defaultFS)
  3. Issue the operation (here, mkdirs)
  4. Close the FileSystem object

 
