A Summary of Common HDFS Operations Commands for Big Data

一、Viewing the commands available under hdfs

[root@master ~]# hdfs
Usage: hdfs [--config confdir] COMMAND
       where COMMAND is one of:
  dfs                  run a filesystem command on the file systems supported in Hadoop.
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  journalnode          run the DFS journalnode
  zkfc                 run the ZK Failover Controller daemon
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  haadmin              run a DFS HA admin client
  fsck                 run a DFS filesystem checking utility
  balancer             run a cluster balancing utility
  jmxget               get JMX exported values from NameNode or DataNode.
  mover                run a utility to move block replicas across
                       storage types
  oiv                  apply the offline fsimage viewer to an fsimage
  oiv_legacy           apply the offline fsimage viewer to an legacy fsimage
  oev                  apply the offline edits viewer to an edits file
  fetchdt              fetch a delegation token from the NameNode
  getconf              get config values from configuration
  groups               get the groups which users belong to
  snapshotDiff         diff two snapshots of a directory or diff the
                       current directory contents with a snapshot
  lsSnapshottableDir   list all snapshottable dirs owned by the current user
                                                Use -help to see options
  portmap              run a portmap service
  nfs3                 run an NFS version 3 gateway
  cacheadmin           configure the HDFS cache
  crypto               configure HDFS encryption zones
  storagepolicies      get all the existing block storage policies
  version              print the version

Most commands print help when invoked w/o parameters.

二、Using hdfs with dfs: the file system command options

[root@master ~]# hdfs dfs
Usage: hadoop fs [generic options]
        [-appendToFile <localsrc> ... <dst>]
        [-cat [-ignoreCrc] <src> ...]
        [-checksum <src> ...]
        [-chgrp [-R] GROUP PATH...]
        [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
        [-chown [-R] [OWNER][:[GROUP]] PATH...]
        [-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
        [-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-count [-q] [-h] <path> ...]
        [-cp [-f] [-p | -p[topax]] <src> ... <dst>]
        [-createSnapshot <snapshotDir> [<snapshotName>]]
        [-deleteSnapshot <snapshotDir> <snapshotName>]
        [-df [-h] [<path> ...]]
        [-du [-s] [-h] <path> ...]
        [-expunge]
        [-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-getfacl [-R] <path>]
        [-getfattr [-R] {-n name | -d} [-e en] <path>]
        [-getmerge [-nl] <src> <localdst>]
        [-help [cmd ...]]
        [-ls [-d] [-h] [-R] [<path> ...]]
        [-mkdir [-p] <path> ...]
        [-moveFromLocal <localsrc> ... <dst>]
        [-moveToLocal <src> <localdst>]
        [-mv <src> ... <dst>]
        [-put [-f] [-p] [-l] <localsrc> ... <dst>]
        [-renameSnapshot <snapshotDir> <oldName> <newName>]
        [-rm [-f] [-r|-R] [-skipTrash] <src> ...]
        [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
        [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
        [-setfattr {-n name [-v value] | -x name} <path>]
        [-setrep [-R] [-w] <rep> <path> ...]
        [-stat [format] <path> ...]
        [-tail [-f] <file>]
        [-test -[defsz] <path>]
        [-text [-ignoreCrc] <src> ...]
        [-touchz <path> ...]
        [-usage [cmd ...]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
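
As an example of the generic options, -D overrides a configuration property for a single invocation. The following sketch (the file and target path are only illustrative) uploads a file with a replication factor of 2 instead of the configured default:

[root@master ~]# hdfs dfs -D dfs.replication=2 -put testc.sh /tmp/testc.sh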

Some other common operations

Note: only the commands used here for study purposes are recorded.

1、Appending the contents of a local file to a file in HDFS

[root@master ~]# hdfs dfs -appendToFile testc.sh /top.sh
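
To confirm the append took effect, you can print the target file back (same paths as the example above):

[root@master ~]# hdfs dfs -cat /top.sh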

2、Viewing the contents of a Hadoop SequenceFile

[root@master ~]# hdfs dfs -text /sparktest.squence
11 aa
22 bb
11 cc

3、Checking available space with the df command

[root@master ~]# hdfs dfs -df -h
Filesystem            Size     Used  Available  Use%
hdfs://master:9000  64.5 G  812.7 M     49.4 G    1%

4、Lowering the replication factor (the default is 3 replicas)

[root@master ~]# hdfs dfs -setrep -w 2 /sparktest.txt
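
-w makes the command block until the change has propagated. You can then verify the new factor with -stat, whose %r format specifier prints a file's replication; with the example above it should print 2:

[root@master ~]# hdfs dfs -stat %r /sparktest.txt
2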

5、Checking used space with the du command

[root@master ~]# hdfs dfs -du -s -h /hbase
240.3 K  /hbase


三、Using hdfs with getconf

[root@master ~]# hdfs getconf
hdfs getconf is utility for getting configuration information from the config file.

hadoop getconf 
        [-namenodes]                    gets list of namenodes in the cluster.
        [-secondaryNameNodes]                   gets list of secondary namenodes in the cluster.
        [-backupNodes]                  gets list of backup nodes in the cluster.
        [-includeFile]                  gets the include file path that defines the datanodes that can join the cluster.
        [-excludeFile]                  gets the exclude file path that defines the datanodes that need to decommissioned.
        [-nnRpcAddresses]                       gets the namenode rpc addresses
        [-confKey [key]]                        gets a specific key from the configuration

1、Getting the NameNode hostnames

[root@master ~]# hdfs getconf -namenodes
master

2、Getting the HDFS minimum block size (the default is 1 MB, i.e. 1048576 bytes; any new value must be a multiple of 512, because HDFS verifies data with a checksum every 512 bytes during transfer)

[root@master ~]# hdfs getconf -confKey dfs.namenode.fs-limits.min-block-size
1048576
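
-confKey works for any key in the effective configuration, not just this one. For example (the values shown are the stock Hadoop 2.x defaults, so your cluster may differ):

[root@master ~]# hdfs getconf -confKey dfs.blocksize
134217728
[root@master ~]# hdfs getconf -confKey dfs.replication
3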

3、Looking up the NameNode RPC addresses

[root@master ~]# hdfs getconf -nnRpcAddresses
master:9000


四、Using hdfs with dfsadmin

[root@master ~]# hdfs dfsadmin
Usage: hdfs dfsadmin
Note: Administrative commands can only be run as the HDFS superuser.
        [-report [-live] [-dead] [-decommissioning]]
        [-safemode <enter | leave | get | wait>]
        [-saveNamespace]
        [-rollEdits]
        [-restoreFailedStorage true|false|check]
        [-refreshNodes]
        [-setQuota <quota> <dirname>...<dirname>]
        [-clrQuota <dirname>...<dirname>]
        [-setSpaceQuota <quota> <dirname>...<dirname>]
        [-clrSpaceQuota <dirname>...<dirname>]
        [-finalizeUpgrade]
        [-rollingUpgrade [<query|prepare|finalize>]]
        [-refreshServiceAcl]
        [-refreshUserToGroupsMappings]
        [-refreshSuperUserGroupsConfiguration]
        [-refreshCallQueue]
        [-refresh <host:ipc_port> <key> [arg1..argn]
        [-reconfig <datanode|...> <host:ipc_port> <start|status>]
        [-printTopology]
        [-refreshNamenodes datanode_host:ipc_port]
        [-deleteBlockPool datanode_host:ipc_port blockpoolId [force]]
        [-setBalancerBandwidth <bandwidth in bytes per second>]
        [-fetchImage <local directory>]
        [-allowSnapshot <snapshotDir>]
        [-disallowSnapshot <snapshotDir>]
        [-shutdownDatanode <datanode_host:ipc_port> [upgrade]]
        [-getDatanodeInfo <datanode_host:ipc_port>]
        [-metasave filename]
        [-setStoragePolicy path policyName]
        [-getStoragePolicy path]
        [-triggerBlockReport [-incremental] <datanode_host:ipc_port>]
        [-help [cmd]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

1、Viewing the help for a specific command

[root@master ~]# hdfs dfsadmin -help safemode
-safemode <enter|leave|get|wait>:  Safe mode maintenance command.
                Safe mode is a Namenode state in which it
                        1.  does not accept changes to the name space (read-only)
                        2.  does not replicate or delete blocks.
                Safe mode is entered automatically at Namenode startup, and
                leaves safe mode automatically when the configured minimum
                percentage of blocks satisfies the minimum replication
                condition.  Safe mode can also be entered manually, but then
                it can only be turned off manually as well.

2、Checking the current safe mode status

[root@master ~]# hdfs dfsadmin -safemode get
Safe mode is OFF

3、Entering safe mode

[root@master ~]# hdfs dfsadmin -safemode enter

4、Leaving safe mode

[root@master ~]# hdfs dfsadmin -safemode leave

5、Waiting on safe mode (wait)

[root@master ~]# hdfs dfsadmin -safemode wait
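
wait blocks until the NameNode leaves safe mode on its own, which makes it a handy guard at the top of startup scripts: run it first, and only issue writes after it returns. A minimal sketch (the put paths are only illustrative):

[root@master ~]# hdfs dfsadmin -safemode wait
Safe mode is OFF
[root@master ~]# hdfs dfs -put /data/input.txt /input/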

6、Checking the status of the HDFS cluster

[root@master ~]# hdfs dfsadmin -report
Configured Capacity: 69209960448 (64.46 GB)    # total configured HDFS capacity of this cluster
Present Capacity: 53855645696 (50.16 GB)       # capacity currently available to HDFS
DFS Remaining: 53003517952 (49.36 GB)          # remaining HDFS capacity
DFS Used: 852127744 (812.65 MB)                # storage used by HDFS, measured by file size
DFS Used%: 1.58%                               # the same, as a percentage
Under replicated blocks: 156                   # blocks with fewer replicas than their target
Blocks with corrupt replicas: 0                # blocks that have corrupt replicas
Missing blocks: 0                              # missing blocks

-------------------------------------------------
Live datanodes (3):                  # number of DataNodes that are live and available
Name: 192.168.200.102:50010 (slave02)
Hostname: slave02
Decommission Status : Normal             # decommission status of this DataNode (Normal means in service)
Configured Capacity: 23069986816 (21.49 GB)    # configured and used capacity of this DataNode
DFS Used: 284041216 (270.88 MB)
Non DFS Used: 3754188800 (3.50 GB)
DFS Remaining: 19031756800 (17.72 GB)
DFS Used%: 1.23%
DFS Remaining%: 82.50%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)                    # cache usage statistics
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Aug 12 10:30:19 CST 2019


Name: 192.168.200.100:50010 (master)
Hostname: master
Decommission Status : Normal
Configured Capacity: 23069986816 (21.49 GB)
DFS Used: 284045312 (270.89 MB)
Non DFS Used: 7988813824 (7.44 GB)
DFS Remaining: 14797127680 (13.78 GB)
DFS Used%: 1.23%
DFS Remaining%: 64.14%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Aug 12 10:30:18 CST 2019


Name: 192.168.200.101:50010 (slave01)
Hostname: slave01
Decommission Status : Normal
Configured Capacity: 23069986816 (21.49 GB)
DFS Used: 284041216 (270.88 MB)
Non DFS Used: 3611312128 (3.36 GB)
DFS Remaining: 19174633472 (17.86 GB)
DFS Used%: 1.23%
DFS Remaining%: 83.12%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Aug 12 10:30:19 CST 2019
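
When you only care about a subset of nodes, -report accepts the filters listed in the usage above, e.g.:

[root@master ~]# hdfs dfsadmin -report -live
[root@master ~]# hdfs dfsadmin -report -dead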

7、Getting the HA service state of a NameNode

hdfs haadmin -getServiceState master
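
In a real HA cluster the argument is a NameNode service ID from dfs.ha.namenodes.&lt;nameservice&gt; rather than a hostname; nn1 and nn2 below are the conventional example IDs, not names taken from this cluster:

hdfs haadmin -getServiceState nn1
active
hdfs haadmin -getServiceState nn2
standby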

五、Using hdfs with fsck

1、Viewing HDFS file system information

[root@master hadoop]# hdfs fsck /
.........................................
........................................
 Total size:    279242984 B                                # total size of all files under the checked path
 Total dirs:    342                                        # number of directories under the checked path
 Total files:   460                                        # number of files under the checked path
 Total symlinks:                0                          # number of symbolic links under the checked path
 Total blocks (validated):      434 (avg. block size 643417 B)    # number of valid blocks under the checked path
 Minimally replicated blocks:   434 (100.0 %)              # blocks that satisfy the minimum replication requirement
 Over-replicated blocks:        0 (0.0 %)                  # blocks with more replicas than the target replication
 Under-replicated blocks:       156 (35.944702 %)          # blocks with fewer replicas than the target replication
 Mis-replicated blocks:         0 (0.0 %)                  # blocks that violate the replica placement policy
 Default replication factor:    3                          # default replication factor (the block itself plus two copies)
 Average block replication:     3.0                        # average number of replicas per block
 Corrupt blocks:                0                          # corrupt blocks; non-zero means unrecoverable blocks, i.e. data loss
 Missing replicas:              1092 (45.614037 %)         # number of missing replicas
 Number of data-nodes:          3                          # number of DataNodes
 Number of racks:               1                          # number of racks
FSCK ended at Mon Aug 12 10:44:46 CST 2019 in 217 milliseconds


The filesystem under path '/' is HEALTHY                   # overall check result

2、Displaying HDFS block information with fsck

[root@master hadoop]# hdfs fsck / -files -blocks
.............................................................
..................................................
Status: HEALTHY
 Total size:    279242984 B
 Total dirs:    342
 Total files:   460
 Total symlinks:                0
 Total blocks (validated):      434 (avg. block size 643417 B)
 Minimally replicated blocks:   434 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       156 (35.944702 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0
 Corrupt blocks:                0
 Missing replicas:              1092 (45.614037 %)
 Number of data-nodes:          3
 Number of racks:               1
FSCK ended at Mon Aug 12 10:54:27 CST 2019 in 415 milliseconds


The filesystem under path '/' is HEALTHY
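
fsck can also drill down into a single path; adding -locations to -files -blocks additionally prints which DataNodes hold each block. A sketch using the test file from earlier:

[root@master hadoop]# hdfs fsck /sparktest.txt -files -blocks -locations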
