Hive Partitioned Tables and Bucketing

2023-08-14 22:51:28

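Partitioned Tables

A Hive SELECT query usually scans the entire table, which wastes a lot of time on unnecessary work.

A partitioned table is a table whose partition columns are specified when the table is created. Each partition is stored as its own directory, so a query that filters on the partition columns reads only the matching partitions.

Partition Syntax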

create table tablename (
  name string
)
partitioned by (key type, …)

create table if not exists employees (
  name string,
  salary string,
  subordinates array<string>,
  deductions map<string,float>,
  address struct<street:string,city:string,state:string,zip:int>
)
partitioned by (dt string, type string)
row format delimited fields terminated by '\t'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n'
stored as textfile;
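To illustrate why partitioning avoids the full-table scan described above, here is a minimal sketch of a query against the employees table. The partition values '20140715' and 'test' are illustrative assumptions; because both filters are on partition columns, Hive should only need to read the files under the matching partition directory.

```sql
-- Filter on the partition columns (dt, type) so that only one partition is read.
-- The values below are placeholders, not data known to exist.
select name, salary
from employees
where dt = '20140715'
  and type = 'test';
```

A query without a filter on dt or type would still have to read every partition.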

Partitioned Table Operations

Adding a Partition

Alter table employees add if not exists partition(country='xxx'[,state='yyyy'])

Alter table employees add if not exists partition(dt='20140715',type='test');
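After adding a partition, it can be useful to confirm that Hive has registered it. The following sketch uses the built-in show partitions statement; the dt value is taken from the example above.

```sql
-- List every partition currently registered for the table.
show partitions employees;

-- Restrict the listing to partitions with a given dt value.
show partitions employees partition(dt='20140715');
```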

Dropping a Partition

Alter table employees drop if exists partition(country='xxx'[,state='yyyy'])
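The template above uses placeholder keys (country, state). For the employees table defined earlier, which is partitioned by dt and type, a concrete statement might look like this sketch; the values mirror the add-partition example and are illustrative only.

```sql
-- Drop one partition; "if exists" avoids an error when the partition is absent.
alter table employees drop if exists partition (dt='20140715', type='test');
```

For a managed (non-external) table, dropping a partition also removes its data, so this is worth double-checking before running.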

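Hive Bucketing

For each table or partition, Hive can further organize the data into buckets; in other words, a bucket is a finer-grained division of the data.

Hive buckets data on a specific column.

Hive hashes the column value and takes the remainder after dividing by the number of buckets to decide which bucket a record is stored in.

Advantages

More efficient query processing; for example, joins between tables bucketed on the same column can be processed bucket by bucket.

More efficient sampling.

Bucketing Syntax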
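The hash-then-modulo rule can be sketched directly in HiveQL with the built-in hash() and pmod() functions. This only approximates the idea: the key 'user_42' and the bucket count of 4 are assumptions for illustration, and Hive's internal bucket computation may differ in detail.

```sql
-- Approximate the bucket index for a sample key, assuming 4 buckets.
-- hash() and pmod() are Hive built-in functions; 'user_42' is a made-up value.
select pmod(hash('user_42'), 4) as bucket_index;
```

The result is always in the range 0 to 3, and the same key always maps to the same bucket.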

create table bucketed_user (
  id string,
  name string
)
clustered by (id) sorted by (name) into 4 buckets
row format delimited fields terminated by '\t'
stored as textfile;

Settings

set hive.enforce.bucketing = true;

Inserting Data

insert overwrite table bucketed_user select addr ,name from testtable;
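Once the table is populated, the sampling advantage mentioned earlier can be sketched with the tablesample clause. This assumes the bucketed_user table has 4 buckets clustered on id, as defined above.

```sql
-- Read roughly one quarter of the rows: bucket 1 of the 4 buckets clustered on id.
select id, name
from bucketed_user
tablesample (bucket 1 out of 4 on id);
```

Because the sampling column matches the clustered by column, Hive can read a single bucket instead of scanning the whole table.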

Comparison of Hive Partitioning and Bucketing
