Hadoop Day 1
1. Distributed Data Storage
2. What is HDFS?
- Massive data is stored on a cluster (multiple machines pooled as storage resources)
- The machines form an organized group (a master node and slave nodes)
- After a slave node starts, it reports its own resources to the master node
- After receiving a slave node's registration, the master maintains the cluster state (which nodes are in the list and each node's storage capacity)
- When a client wants to store data, it first sends the request to the master node
- The master validates the request and returns storage locations to the client
- The client then asks the corresponding storage nodes to store the data
- Data is stored across the cluster with multiple replicas to keep it safe
3. HDFS Architecture
In HDFS the master node is the NameNode, which keeps the cluster's metadata, and the slave nodes are DataNodes, which store the actual data blocks (see the cluster plan below).
4. Building a Hadoop HDFS Storage Cluster
4.1. Cluster Plan
Hostname | IP address      | Node roles
hadoop01 | 192.168.254.101 | NameNode, DataNode
hadoop02 | 192.168.254.102 | DataNode
hadoop03 | 192.168.254.103 | DataNode
4.2. VM Cluster Environment Preparation (required on every machine)
1) Install the JDK and configure its environment variables
2) Set the hostname
3) Map hostnames to IP addresses
4) Disable the firewall
5) Configure unified time synchronization
A command sketch of steps 2) through 4) follows the list.
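A minimal sketch for hadoop01, assuming CentOS 6 (suggested by the service/chkconfig commands used later); repeat with the matching values on hadoop02 and hadoop03:
[root@hadoop01 ~]# vim /etc/sysconfig/network     # set HOSTNAME=hadoop01
[root@hadoop01 ~]# vim /etc/hosts                 # add all three mappings:
192.168.254.101 hadoop01
192.168.254.102 hadoop02
192.168.254.103 hadoop03
[root@hadoop01 ~]# service iptables stop          # stop the firewall now
[root@hadoop01 ~]# chkconfig iptables off         # keep it off after reboot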
4.3. NTP Time Synchronization Server Configuration
4.3.1. How it works
hadoop01 acts as the cluster-local NTP server, using its own clock as the time source; hadoop02 and hadoop03 synchronize from hadoop01 so that all nodes agree on the time.
4.3.2. Configure hadoop01 as the time server
Check the NTP service status:
[root@hadoop01 ~]# service ntpd status
ntpd is stopped
4.3.3. The ntp configuration file
[root@hadoop01 ~]# vim /etc/ntp.conf
Add:
restrict 192.168.254.101 nomodify notrap nopeer noquery
Modify (use the subnet's network address together with the mask):
restrict 192.168.254.0 mask 255.255.255.0 nomodify notrap
Comment out the following lines:
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
Add (use the local clock as the time source):
server 127.127.1.0
fudge 127.127.1.0 stratum 10
Start the NTP service:
[root@hadoop01 ~]# service ntpd start
Enable it at boot:
[root@hadoop01 ~]# chkconfig ntpd on
4.3.4. Time commands
Check the current time:
[root@hadoop01 ~]# date
Thu Jun 10 10:45:39 EDT 2021
Manually synchronize another node from hadoop01 (stop ntpd first so ntpdate can use the NTP port):
[root@hadoop03 ~]# service ntpd stop
[root@hadoop03 ~]# ntpdate hadoop01
10 Jun 11:23:57 ntpdate[2362]: step time server 192.168.254.101 offset 115685179.275931 sec
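To avoid re-running ntpdate by hand, a common approach (an assumption here, not part of the original notes) is a periodic root cron job on each client:
[root@hadoop03 ~]# crontab -e
*/10 * * * * /usr/sbin/ntpdate hadoop01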
4.4. Client Synchronization Configuration
[root@hadoop02 ~]# vim /etc/ntp.conf
Modify:
# the administrative functions.
restrict 192.168.254.101 nomodify notrap nopeer noquery
# Hosts on local network are less restricted.
restrict 192.168.254.0 mask 255.255.255.0 nomodify notrap
Comment out:
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
Add (synchronize from hadoop01):
server 192.168.254.101
fudge 127.127.1.0 stratum 10
4.5. Configure the HDFS Cluster
4.5.1. Upload the Hadoop package to the server
4.5.2. Extract the package
[root@hadoop01 software]# tar -zxvf hadoop-2.7.2.tar.gz -C /opt/module/
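Optionally, add Hadoop to the PATH so its bin/ and sbin/ scripts can be run from any directory (an assumed convenience step, not in the original notes):
[root@hadoop01 software]# vim /etc/profile
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
[root@hadoop01 software]# source /etc/profile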
4.5.3. Configure the core-site.xml core configuration file
[root@hadoop01 hadoop]# vim /opt/module/hadoop-2.7.2/etc/hadoop/core-site.xml
<configuration>
    <!-- NameNode address -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop01:9000</value>
    </property>
    <!-- Directory for Hadoop's runtime temporary files -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-2.7.2/data/tmp</value>
    </property>
</configuration>
4.5.4. Configure the Java environment variable
[root@hadoop01 hadoop]# vim /opt/module/hadoop-2.7.2/etc/hadoop/hadoop-env.sh
Modify:
export JAVA_HOME=/opt/module/jdk1.8.0_144
4.5.5. Configure hdfs-site.xml
[root@hadoop01 hadoop]# vim /opt/module/hadoop-2.7.2/etc/hadoop/hdfs-site.xml
<configuration>
    <!-- Number of HDFS replicas -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- NameNode web UI (HTTP) address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop01:50070</value>
    </property>
</configuration>
4.6. Configure YARN
4.6.1. Configure the JDK path
[root@hadoop01 hadoop]# vim yarn-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
4.6.2. Configure yarn-site.xml
[root@hadoop01 hadoop]# vim /opt/module/hadoop-2.7.2/etc/hadoop/yarn-site.xml
<configuration>
    <!-- How reducers fetch data (shuffle service) -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- YARN ResourceManager address -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop01</value>
    </property>
</configuration>
4.6.3. Configure the JDK environment for MapReduce
[root@hadoop01 hadoop]# vim mapred-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
4.6.4. Configure mapred-site.xml
[root@hadoop01 hadoop]# mv mapred-site.xml.template mapred-site.xml
[root@hadoop01 hadoop]# vim mapred-site.xml
<configuration>
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
4.6.5. Configure the DataNode nodes (slaves file)
[root@hadoop01 hadoop]# vim slaves
hadoop01
hadoop02
hadoop03
4.6.6. Sync the configuration across the cluster
[root@hadoop01 module]# xsync /opt/module/hadoop-2.7.2/
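xsync is not a stock Hadoop script; it is a small helper usually written by hand. A minimal sketch, assuming passwordless SSH to hadoop02/hadoop03 and rsync installed on all nodes:
#!/bin/bash
# xsync: copy a file or directory to the same path on the other nodes
if [ $# -lt 1 ]; then
    echo "Usage: xsync <file-or-dir>"
    exit 1
fi
pdir=$(cd -P "$(dirname "$1")" && pwd)   # absolute parent directory
fname=$(basename "$1")                   # name of the file/dir to copy
for host in hadoop02 hadoop03; do
    rsync -av "$pdir/$fname" "$host:$pdir"
done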
4.6.7. Ways to start the cluster
Option 1: start or stop HDFS daemons on a single node
[root@hadoop01 sbin]# ./hadoop-daemon.sh start|stop namenode|datanode|secondarynamenode
(hadoop-daemon.sh acts on the local node only; hadoop-daemons.sh would run the same command on every host in the slaves file)
Option 2: start or stop HDFS on all nodes at once
[root@hadoop01 sbin]# ./stop-dfs.sh
or:
[root@hadoop01 sbin]# ./start-dfs.sh
Starting namenodes on [hadoop01]
hadoop01: starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-namenode-hadoop01.out
hadoop03: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-datanode-hadoop03.out
hadoop02: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-datanode-hadoop02.out
hadoop01: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-datanode-hadoop01.out
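The scripts above start HDFS only. The YARN daemons configured in section 4.6 are started by a separate script, run on hadoop01 since it hosts the ResourceManager:
[root@hadoop01 sbin]# ./start-yarn.sh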
4.7. Start the Cluster
Before the very first start the NameNode must be formatted; do not run this command again on later starts (bin/hdfs namenode -format is the equivalent, non-deprecated form):
[root@hadoop01 bin]# ./hadoop namenode -format
Start the HDFS cluster:
[root@hadoop01 sbin]# ./start-dfs.sh
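To verify the daemons came up, the JDK's jps command on each node should match the cluster plan: hadoop01 should show a NameNode and a DataNode, while hadoop02 and hadoop03 each show a DataNode.
[root@hadoop01 sbin]# jps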
4.8. View Cluster Storage
Open the NameNode web UI configured in hdfs-site.xml, http://hadoop01:50070, to see total capacity and the status of each DataNode.
5. HDFS Shell Operations
5.1. Basic command format
bin/hadoop fs <command>
5.2. Full command list
Usage: hadoop fs [generic options]
    [-appendToFile <localsrc> ... <dst>]
    [-cat [-ignoreCrc] <src> ...]
    [-checksum <src> ...]
    [-chgrp [-R] GROUP PATH...]
    [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
    [-chown [-R] [OWNER][:[GROUP]] PATH...]
    [-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
    [-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
    [-count [-q] [-h] <path> ...]
    [-cp [-f] [-p | -p[topax]] <src> ... <dst>]
    [-createSnapshot <snapshotDir> [<snapshotName>]]
    [-deleteSnapshot <snapshotDir> <snapshotName>]
    [-df [-h] [<path> ...]]
    [-du [-s] [-h] <path> ...]
    [-expunge]
    [-find <path> ... <expression> ...]
    [-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
    [-getfacl [-R] <path>]
    [-getfattr [-R] {-n name | -d} [-e en] <path>]
    [-getmerge [-nl] <src> <localdst>]
    [-help [cmd ...]]
    [-ls [-d] [-h] [-R] [<path> ...]]
    [-mkdir [-p] <path> ...]
    [-moveFromLocal <localsrc> ... <dst>]
    [-moveToLocal <src> <localdst>]
    [-mv <src> ... <dst>]
    [-put [-f] [-p] [-l] <localsrc> ... <dst>]
    [-renameSnapshot <snapshotDir> <oldName> <newName>]
    [-rm [-f] [-r|-R] [-skipTrash] <src> ...]
    [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
    [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
    [-setfattr {-n name [-v value] | -x name} <path>]
    [-setrep [-R] [-w] <rep> <path> ...]
    [-stat [format] <path> ...]
    [-tail [-f] <file>]
    [-test -[defsz] <path>]
    [-text [-ignoreCrc] <src> ...]
    [-touchz <path> ...]
    [-truncate [-w] <length> <path> ...]
    [-usage [cmd ...]]
View the help for a specific command:
[root@hadoop01 hadoop-2.7.2]# bin/hadoop fs -help <command>
5.3. Common commands
5.3.1. List directory contents
[root@hadoop01 hadoop-2.7.2]# bin/hadoop fs -ls /
5.3.2. Upload a file to HDFS
[root@hadoop01 hadoop-2.7.2]# bin/hadoop fs -put /opt/software/hadoop-2.7.2.tar.gz hdfs://hadoop01:9000/
5.3.3. -mkdir: create a directory
[root@hadoop01 hadoop-2.7.2]# bin/hadoop fs -mkdir -p /abc/dbf/ccc
5.3.4. Delete a directory
[root@hadoop01 hadoop-2.7.2]# bin/hadoop fs -rm -r /abc
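Two more everyday commands, with illustrative paths (the file names here are assumptions): -get downloads from HDFS and -cat prints a file's contents.
[root@hadoop01 hadoop-2.7.2]# bin/hadoop fs -get /hadoop-2.7.2.tar.gz /opt/software/
[root@hadoop01 hadoop-2.7.2]# bin/hadoop fs -cat /abc/test.txt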
6. HDFS Java Client Operations
6.1. Configure Windows 10 environment variables
(The Windows client needs a local HADOOP_HOME pointing at a directory that contains winutils.exe, with %HADOOP_HOME%\bin added to Path.)
6.2. Create the project and import dependencies
A minimal set of Maven dependencies, assuming versions that match the 2.7.2 cluster:
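<dependencies>
    <!-- assumed minimal set; versions chosen to match the 2.7.2 cluster -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.2</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
</dependencies>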
6.3. Create the HdfsClient class
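The class body is missing here; below is a minimal sketch that follows the summary steps (get a configuration, set it, issue a command, close), assuming the dependencies from 6.2. The URI matches fs.defaultFS from core-site.xml; the user name "root" and the directory path are illustrative assumptions.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClient {
    public static void main(String[] args) throws Exception {
        // 1. Get a configuration object
        Configuration conf = new Configuration();
        // 2. Set configuration / connect: NameNode address from core-site.xml
        //    (user "root" is an illustrative assumption)
        FileSystem fs = FileSystem.get(URI.create("hdfs://hadoop01:9000"), conf, "root");
        // 3. Issue a command, e.g. create a directory (path is illustrative)
        fs.mkdirs(new Path("/client/test"));
        // 4. Close the object
        fs.close();
    }
}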
Summary:
- Get a configuration object
- Set the configuration information
- Issue the command (the file system operation)
- Close the object