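- Before working through this article you need a running HBase cluster. Once your cluster is up, you can start learning the HBase shell.
- This article covers only the most frequently used shell commands. None of them is difficult; it is just a matter of practice. My HBase version is 2.1.1.
- The command to enter the HBase command line is
hbase shell
- The command to view help for a command inside the HBase shell is
help 'xxx'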
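create: creating a table
- As the previous article said, you must not forget that creating a table requires a column family. For example, to create a table named test with one column family named cf:
hbase(main):029:0> create 'test','cf'
Created table test
Took 1.2710 seconds
=> Hbase::Table - test
- This matches what was said earlier: a column family must be specified, and leaving it out produces an error. Columns are attached to column families.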
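- The statement above specifies only the column family, not the columns inside it. Why is there no need to specify columns?
- Unlike an RDBMS, HBase does not require you to declare columns when you create a table. An RDBMS needs somewhere to put its data: without columns, what would a table even consist of, and where would the data go? In HBase, columns are far more flexible. If that does not make sense yet, don't worry; the shell operations below will make it clear. A column in HBase only comes into being when you insert data, although strictly speaking it is not "generated", because no column definition is ever created (when you create a table there is a create operation and a table definition, but there is nothing comparable for a column). You simply insert a cell into HBase, and that cell is located by table:column family:row:column. The column name becomes the attribute name of the cell, and that is what gives the row a column. Whether other rows have the same column is something HBase only finds out when it scans them. If this is still unclear, I will draw a diagram when introducing put below.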
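To make the "columns are never generated" idea concrete, here is a minimal Python sketch. It is a toy model, not how HBase stores data internally, and the names `families`, `cells`, `put`, and `columns_of` are made up for illustration. The only assumption carried over from the text is that the schema holds column families and nothing else, while a column exists for a row only once a cell has been written there.

```python
# Toy model of an HBase table: the schema knows column families only.
families = {"cf"}   # declared at create time
cells = {}          # (row, "family:qualifier") -> value, filled in by put

def put(row, column, value):
    """Write one cell; reject it if the column family was never declared."""
    family, _, _qualifier = column.partition(":")
    if family not in families:
        raise KeyError(f"unknown column family: {family}")
    cells[(row, column)] = value

def columns_of(row):
    """The only way to learn a row's columns is to look at its cells."""
    return sorted(col for (r, col) in cells if r == row)

put("row1", "cf:name", "wangziqiang")
put("row1", "cf:age", "20")
put("row2", "cf:phone", "1348888888")

print(columns_of("row1"))  # ['cf:age', 'cf:name']
print(columns_of("row2"))  # ['cf:phone']
```

Nothing in `families` changes when a new qualifier such as `cf:phone` shows up, which is the sense in which a column is an attribute of a cell rather than part of the table definition.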
list: listing the tables
hbase(main):031:0> list
TABLE
test #only the test table
1 row(s)
Took 0.0354 seconds
=> ["test"]
describe: viewing table attributes
- View the attributes of the test table:
hbase(main):032:0> desc 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
1 row(s)
Took 0.1347 seconds
- describe and desc have the same effect.
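- Look at the output above: NAME is the name of the column family, not of the table, and every attribute after it applies to the cf column family. To demonstrate this, let's add another column family:
hbase(main):034:0> alter 'test','cf2'
Updating all regions with the new schema...
1/1 regions updated. #regions are updated because a region stores rows, and the structure of a row has now changed
Done.
Took 2.6644 seconds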
- Look at the table attributes again:
hbase(main):035:0> desc 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
{NAME => 'cf2', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
2 row(s)
Took 0.0913 seconds
- Sure enough, it describes the column families.
put: adding data
- Let's add some columns inside the cf column family of our table:
hbase(main):036:0> put 'test','row1','cf:name','wangziqiang'
Took 0.2273 seconds
hbase(main):037:0> put 'test','row1','cf:age',20
Took 0.0156 seconds
hbase(main):038:0> put 'test','row1','cf:height',183
Took 0.0154 seconds
- What these commands do: in the test table, insert a row whose rowkey is row1. Within that row's cf column family, add cells with the column names name, age, and height, holding the values wangziqiang, 20, and 183.
- You can use scan to look at the data in the table:
hbase(main):039:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:age, timestamp=1543664259164, value=20
row1 column=cf:height, timestamp=1543664308514, value=183
row1 column=cf:name, timestamp=1543664222231, value=wangziqiang
1 row(s)
Took 0.0435 seconds
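- Now we can see what was meant earlier by columns being very flexible to add. Here is what each part of a line of output means:
# rowkey column family:column name timestamp value
row1 column=cf:name, timestamp=1543664222231, value=wangziqiang
- About the timestamp: if you do not specify one, as in the puts we just ran, the system uses the time of insertion. HBase also lets you supply your own timestamp, and any integer value will do (123, 321, ...). HBase prefers the new over the old: it shows the data with the newest timestamp.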
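The timestamp rule can be sketched in a few lines of Python. This is a toy model with made-up names (`versions`, `put`, `get`), not HBase client code; it assumes only what the text states, namely that a put without a timestamp uses the insertion time and that a read returns the version with the newest timestamp.

```python
import time

versions = {}  # (row, column) -> {timestamp: value}

def put(row, column, value, ts=None):
    """Store a value under an explicit timestamp, or the current time in ms."""
    if ts is None:
        ts = int(time.time() * 1000)
    versions.setdefault((row, column), {})[ts] = value

def get(row, column):
    """Return (timestamp, value) of the version with the largest timestamp."""
    cell = versions[(row, column)]
    newest = max(cell)
    return newest, cell[newest]

put("row1", "cf:age", "20", ts=123)
put("row1", "cf:age", "18", ts=321)
put("row1", "cf:age", "99", ts=200)  # written last, but not the newest timestamp

print(get("row1", "cf:age"))  # (321, '18')
```

The third put shows why hand-picked timestamps need care: the value written last is not the one a read returns, because ordering follows the timestamp rather than the order of the writes.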
- Earlier we said that if you put data for a rowkey and column that already exist, the value is updated. Is that true?
hbase(main):043:0> put 'test','row1','cf:age',18
Took 0.0139 seconds
hbase(main):044:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:age, timestamp=1543664971060, value=18 #changed
row1 column=cf:height, timestamp=1543664308514, value=183
row1 column=cf:name, timestamp=1543664222231, value=wangziqiang
1 row(s)
Took 0.0520 seconds
- It is exactly as we said, and the timestamp was updated too. So is it also true that the overwritten value was not actually deleted?
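- To check this we have to change an attribute of the table. In the desc output above there is an attribute named VERSIONS. It sets how many versions of each cell the column family keeps. The default is 1, which means only the latest value is kept, so to see any history we need to raise it:
hbase(main):047:0> alter 'test',{NAME=>'cf',VERSIONS=>5}
Updating all regions with the new schema...
1/1 regions updated.
Done.
Took 3.5151 seconds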
- Note that the symbol is =>, not =. If we run desc again, we see that the attribute has changed. Now let's put data a few more times and look at the effect:
hbase(main):059:0> put 'test','row1','cf:age',17
Took 0.0338 seconds
hbase(main):060:0> put 'test','row1','cf:age',16
Took 0.0082 seconds
hbase(main):061:0> put 'test','row1','cf:age',15
Took 0.0175 seconds
hbase(main):062:0> put 'test','row1','cf:age',20
Took 0.0152 seconds
hbase(main):063:0> get 'test','row1',{COLUMN=>'cf:age',VERSIONS=>5}
COLUMN CELL
cf:age timestamp=1543665590821, value=20
cf:age timestamp=1543665587740, value=15
cf:age timestamp=1543665582866, value=16
cf:age timestamp=1543665580576, value=17
cf:age timestamp=1543664971060, value=18
1 row(s)
Took 0.0570 seconds
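- So what was said earlier holds up, haha. The get command is covered later; here it simply fetches five versions of the age cell. If the number you ask for is larger than the VERSIONS attribute of the column family, the attribute is the limit.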
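The retention rule can be sketched as follows. This is a simplified Python model with hypothetical names (`MAX_VERSIONS`, `history`, `put`, `get`); it drops excess versions immediately on write, which is a simplification made for the sketch, and it uses small integers in place of real millisecond timestamps.

```python
MAX_VERSIONS = 5   # plays the role of the column family's VERSIONS attribute
history = []       # (timestamp, value) pairs for one cell, newest first

def put(ts, value):
    """Add a version and drop whatever falls outside the retention limit."""
    history.append((ts, value))
    history.sort(reverse=True)
    del history[MAX_VERSIONS:]

def get(versions=1):
    """Return up to `versions` entries, never more than MAX_VERSIONS."""
    return history[:min(versions, MAX_VERSIONS)]

# The same sequence of age values as in the session above.
for ts, value in enumerate(["20", "18", "17", "16", "15", "20"], start=1):
    put(ts, value)

print(get())         # [(6, '20')]
print(len(get(10)))  # 5
print(get(10)[-1])   # (2, '18')
```

Asking for ten versions still yields five, and the very first value has aged out, which mirrors the get output shown above where 18 is the oldest entry left.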
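- Now back to an earlier question: why can't we say a column is "generated"?
- A table has a structure definition in both HBase and an RDBMS. We have now learned the basic use of put to store data, so why is it not accurate to say that a column is generated? We know that a column family applies to the whole table, but the columns under a column family can differ from row to row, as shown in the figure.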
- The following shell session confirms that the figure is correct:
hbase(main):069:0> put 'test','row2','cf:phone',1348888888
Took 0.0445 seconds
hbase(main):071:0> put 'test','row2','cf:addr','beijing'
Took 0.0256 seconds
hbase(main):072:0> put 'test','row3','cf:id',132
Took 0.0269 seconds
hbase(main):073:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:age, timestamp=1543665757228, value=22
row1 column=cf:height, timestamp=1543664308514, value=183
row1 column=cf:name, timestamp=1543664222231, value=wangziqiang
row2 column=cf:addr, timestamp=1543669367525, value=beijing
row2 column=cf:phone, timestamp=1543669351162, value=1348888888
row3 column=cf:id, timestamp=1543669389010, value=132
3 row(s)
Took 0.0428 seconds
- This shows why a column cannot be called "generated": it has no structure definition in the table, and every row can be different. HBase does not know how the columns differ between rows; it only finds out when it scans the table. The column name is really just the name of the cell (that is my understanding).
scan: scanning a table
- We have already used scan in its simplest form to view a table's data. In real use an HBase table can be enormous, so you should not scan the whole table; specify a scan range instead.
- A scan can be restricted by start row and stop row, by timestamp range, or by column. See the HBase command help for the details. Below we restrict the scan first by start and stop row, then by timestamp range:
hbase(main):075:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:age, timestamp=1543665757228, value=22
row1 column=cf:height, timestamp=1543664308514, value=183
row1 column=cf:name, timestamp=1543664222231, value=wangziqiang
row2 column=cf:addr, timestamp=1543669367525, value=beijing
row2 column=cf:phone, timestamp=1543669351162, value=1348888888
row3 column=cf:id, timestamp=1543669389010, value=132
3 row(s)
Took 0.2155 seconds
# Scan by start row and stop row
hbase(main):076:0> scan 'test',{STARTROW=>'row1',ENDROW=>'row2'} #start inclusive, end exclusive
ROW COLUMN+CELL
row1 column=cf:age, timestamp=1543665757228, value=22
row1 column=cf:height, timestamp=1543664308514, value=183
row1 column=cf:name, timestamp=1543664222231, value=wangziqiang
1 row(s)
Took 0.0569 seconds
# Scan by timestamp range (also start inclusive, end exclusive)
hbase(main):085:0> scan 'test', {COLUMNS => 'cf', TIMERANGE => [1543665757228,1543669367525]}
ROW COLUMN+CELL
row1 column=cf:age, timestamp=1543665757228, value=22
row2 column=cf:phone, timestamp=1543669351162, value=1348888888
2 row(s)
Took 0.0287 seconds
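Both ranges in the session above include their lower bound and exclude their upper bound. A small Python sketch of that filtering logic, using the cells copied from the scan output, reproduces the same results. The function `scan` and its parameter names are illustrative only; this is not the HBase API.

```python
# (row, column, timestamp, value), copied from the full scan output
cells = [
    ("row1", "cf:age",    1543665757228, "22"),
    ("row1", "cf:height", 1543664308514, "183"),
    ("row1", "cf:name",   1543664222231, "wangziqiang"),
    ("row2", "cf:addr",   1543669367525, "beijing"),
    ("row2", "cf:phone",  1543669351162, "1348888888"),
    ("row3", "cf:id",     1543669389010, "132"),
]

def scan(start_row=None, end_row=None, time_range=None):
    """Filter cells; the row range and the time range both exclude the upper bound."""
    out = []
    for row, column, ts, value in cells:
        if start_row is not None and row < start_row:
            continue
        if end_row is not None and row >= end_row:
            continue
        if time_range is not None and not (time_range[0] <= ts < time_range[1]):
            continue
        out.append((row, column, value))
    return out

print(scan(start_row="row1", end_row="row2"))
print(scan(time_range=(1543665757228, 1543669367525)))
```

The second call returns cf:age (its timestamp equals the lower bound) and cf:phone, but not cf:addr, whose timestamp equals the upper bound.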
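get: fetching a value
- Take the simplest example, row1:cf:name:
hbase(main):086:0> get 'test','row1','cf:name'
COLUMN CELL
cf:name timestamp=1543664222231, value=wangziqiang
1 row(s)
Took 0.0441 seconds
- get can also filter. Fetching a given number of historical versions, as we did earlier, is one form of filtering. See the command help for the details.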
delete: deleting table data
- Let's delete one of the historical versions:
hbase(main):096:0> get 'test','row1',{COLUMN=>'cf:age',VERSIONS=>5}
COLUMN CELL
cf:age timestamp=1543665757228, value=22
cf:age timestamp=1543665590821, value=20
cf:age timestamp=1543665587740, value=15
cf:age timestamp=1543665582866, value=16
cf:age timestamp=1543664971060, value=18
1 row(s)
Took 0.0279 seconds
hbase(main):097:0> delete 'test','row1','cf:age',1543664971060
Took 0.0222 seconds
hbase(main):098:0> get 'test','row1',{COLUMN=>'cf:age',VERSIONS=>5}
COLUMN CELL
cf:age timestamp=1543665757228, value=22
cf:age timestamp=1543665590821, value=20
cf:age timestamp=1543665587740, value=15
cf:age timestamp=1543665582866, value=16
1 row(s)
Took 0.0252 seconds
- If you do not give a timestamp, delete removes the latest version. When we scan again the data looks as if it is really gone, but it is not: it has only been given a delete marker, and there is still a way to see the deleted data:
hbase(main):105:0> scan 'test',{RAW=>TRUE,VERSIONS=>5}
ROW COLUMN+CELL
row1 column=cf:age, timestamp=1543665757228, type=Delete
row1 column=cf:age, timestamp=1543665757228, value=22
row1 column=cf:age, timestamp=1543665590821, value=20
row1 column=cf:age, timestamp=1543665587740, type=Delete
row1 column=cf:age, timestamp=1543665587740, value=15
row1 column=cf:age, timestamp=1543665582866, value=16
row1 column=cf:age, timestamp=1543665580576, type=Delete
row1 column=cf:age, timestamp=1543665580576, value=17
row1 column=cf:age, timestamp=1543664971060, type=Delete
row1 column=cf:height, timestamp=1543664308514, value=183
row1 column=cf:name, timestamp=1543664222231, value=wangziqiang
row2 column=cf:addr, timestamp=1543669367525, value=beijing
row2 column=cf:phone, timestamp=1543669351162, value=1348888888
row3 column=cf:id, timestamp=1543669389010, value=132
3 row(s)
Took 0.0914 seconds
- Every deleted cell carries a type=Delete marker.
- As we have seen, delete only removes a specified column. What if we want to delete a whole row in one go? Use deleteall:
hbase(main):106:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:age, timestamp=1543665590821, value=20
row1 column=cf:height, timestamp=1543664308514, value=183
row1 column=cf:name, timestamp=1543664222231, value=wangziqiang
row2 column=cf:addr, timestamp=1543669367525, value=beijing
row2 column=cf:phone, timestamp=1543669351162, value=1348888888
row3 column=cf:id, timestamp=1543669389010, value=132
3 row(s)
Took 0.0820 seconds
hbase(main):107:0> deleteall 'test','row2'
Took 0.0622 seconds
hbase(main):108:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:age, timestamp=1543665590821, value=20
row1 column=cf:height, timestamp=1543664308514, value=183
row1 column=cf:name, timestamp=1543664222231, value=wangziqiang
row3 column=cf:id, timestamp=1543669389010, value=132
2 row(s)
Took 0.0210 seconds
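The delete-marker behaviour described earlier in this section can be sketched in Python. This toy model covers only the version-specific delete, not deleteall, and the names `log`, `delete`, and `scan` are made up for the sketch. The assumption taken from the raw scan output is that a delete appends a marker with the same coordinates as the cell, and that a normal read hides both the marker and the version it covers.

```python
# Each entry: (row, column, timestamp, kind, value); kind is "Put" or "Delete".
log = [
    ("row1", "cf:age", 300, "Put", "22"),
    ("row1", "cf:age", 200, "Put", "20"),
    ("row1", "cf:age", 100, "Put", "18"),
]

def delete(row, column, ts):
    """Deleting writes a marker; the original Put entry stays in the log."""
    log.append((row, column, ts, "Delete", None))

def scan(raw=False):
    """A normal scan hides markers and the versions they cover; raw shows all."""
    if raw:
        # Newest timestamp first; a marker sorts ahead of the Put it covers.
        return sorted(log, key=lambda e: (e[0], e[1], -e[2], e[3]))
    deleted = {(r, c, ts) for r, c, ts, kind, _ in log if kind == "Delete"}
    return [e for e in scan(raw=True)
            if e[3] == "Put" and (e[0], e[1], e[2]) not in deleted]

delete("row1", "cf:age", 300)
print([e[4] for e in scan()])  # ['20', '18']
print(len(scan(raw=True)))     # 4
print(scan(raw=True)[0][3])    # Delete
```

The value 22 is still in `log` after the delete, which is why a raw scan can show it next to its marker.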
drop: deleting a table
- Here HBase differs from an RDBMS. In an RDBMS you can simply drop a table, as long as no primary/foreign key relationship depends on it. An HBase table, however, is either enabled or disabled, and a newly created table is enabled by default. Dropping a table while it is enabled produces an error, so we have to disable the table first and then drop it. When HBase is already in production and many clients are connected to the table, disabling it can be slow, because every RegionServer serving the table has to be told to disable it.
#disable before dropping
hbase(main):109:0> disable 'test'
Took 1.3984 seconds
hbase(main):110:0> scan 'test'
ROW COLUMN+CELL
org.apache.hadoop.hbase.TableNotEnabledException: test is disabled.
at org.apache.hadoop.hbase.client.ConnectionImplementation.relocateRegion(ConnectionImplementation.java:736)
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:328)
at org.apache.hadoop.hbase.client.ScannerCallable.prepare(ScannerCallable.java:139)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.prepare(ScannerCallableWithReplicas.java:399)
at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
ERROR: Table test is disabled!
For usage try 'help "scan"'
Took 0.1362 seconds
hbase(main):111:0> drop 'test'
Took 0.7804 seconds
hbase(main):112:0> list
TABLE
0 row(s)
Took 0.0046 seconds
=> []
- As shown above, after disabling the table we can run a read command to check that it really is disabled. Once it is disabled, drop removes it directly.
- You can also check whether a table is disabled with is_disabled:
hbase(main):114:0> create 'test','cf'
Created table test
Took 1.3164 seconds
=> Hbase::Table - test
hbase(main):115:0> is_disabled 'test'
false
Took 0.0110 seconds
=> 1
hbase(main):116:0> disable 'test'
Took 0.7647 seconds
hbase(main):117:0> is_disabled 'test'
true
Took 0.0394 seconds
=> 1
hbase(main):118:0> drop 'test'
Took 0.4623 seconds
hbase(main):119:0> list
TABLE
0 row(s)
Took 0.0052 seconds
=> []
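The disable-then-drop rule amounts to a small state check. The Python sketch below models it with a hypothetical `ToyAdmin` class and a made-up `TableNotDisabledError`; neither is an HBase class, and the sketch says nothing about how HBase notifies RegionServers.

```python
class TableNotDisabledError(Exception):
    """Raised when drop is attempted on a table that is still enabled."""

class ToyAdmin:
    def __init__(self):
        self.tables = {}  # table name -> True if enabled

    def create(self, name):
        self.tables[name] = True   # new tables start out enabled

    def disable(self, name):
        self.tables[name] = False

    def is_disabled(self, name):
        return not self.tables[name]

    def drop(self, name):
        if self.tables[name]:
            raise TableNotDisabledError(name)
        del self.tables[name]

admin = ToyAdmin()
admin.create("test")
try:
    admin.drop("test")
except TableNotDisabledError:
    print("drop refused: table is still enabled")
admin.disable("test")
admin.drop("test")
print(list(admin.tables))  # []
```

Making drop fail on an enabled table, rather than disabling it silently, keeps the destructive step explicit, which matches the two-command sequence used in the shell session.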
- That is all the commands for now. HBase has many, many more, and I will cover them in detail when we need them, haha.