Hive 5、Hive 的数据类型 和 DDL Data Definition Language)

官方帮助文档:https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

Hive的数据类型

-- 扩展数据类型
data_type
: primitive_type
| array_type
| map_type
| struct_type
| union_type -- (Note: Available in Hive 0.7.0 and later)
array_type : ARRAY < data_type >
map_type : MAP < primitive_type, data_type >
struct_type : STRUCT < col_name : data_type [COMMENT col_comment], ...>
union_type : UNIONTYPE < data_type, data_type, ... > -- (Note: Available in Hive 0.7.0 and later)
-- 基本数据类型
primitive_type
: TINYINT
| SMALLINT
| INT
| BIGINT
| BOOLEAN
| FLOAT
| DOUBLE
| STRING
| BINARY -- (Note: Available in Hive 0.8.0 and later)
| TIMESTAMP -- (Note: Available in Hive 0.8.0 and later)
| DECIMAL -- (Note: Available in Hive 0.11.0 and later)
| DECIMAL(precision, scale) -- (Note: Available in Hive 0.13.0 and later)
| DATE -- (Note: Available in Hive 0.12.0 and later)
| VARCHAR -- (Note: Available in Hive 0.12.0 and later)
| CHAR -- (Note: Available in Hive 0.13.0 and later)

Hive  DDL

  Hive完整的DDL

  Hive DDL的语方法为类SQL语法,所以标准的SQL语法大多数在Hive中都可用;

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]
ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
[STORED AS DIRECTORIES]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)
[AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)

   Hive建表

-- Hive建表 语法
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
LIKE existing_table_or_view_name
[LOCATION hdfs_path]; create table person(
id int,
name string,
age int,
likes array<string>,
address map<string,string>
)
row format delimited
-- 指定导入数据的列与列之间的分隔符
fields terminated by ','
-- 指定Array类型的分隔符
collection ITEMS TERMINATED BY '-'
-- 指定map类型的分隔符
map keys terminated by ':'
-- 指定行与行之间的分隔符
lines terminated by '\n';

  表新建成功以后可以使用desc查询表结构

Hive 5、Hive 的数据类型 和 DDL  Data Definition Language)

  Hive导入数据

# 三条数据,列与列之间用,号隔开;array之间用-号隔开;map之间用:号隔开;行与行用换行符隔开
1,tom,28,game-music-book,stu:henan-home:henan-work:beijing
2,jack,21,money-meinv,stu:wuhan-home:wuhan
3,lusi,18,shopping-music,stu:shanghai-home:beijing

  导入数据:

hive> load data local inpath '/opt/data.txt' overwrite into table person;

  Hive 5、Hive 的数据类型 和 DDL  Data Definition Language)

  数据查询 

-- 查询所有
select * from person; -- 还可以这样查
select * from person where name='tom'; -- 或者这样
select * from person where likes[]='music'; -- 还有这样
select * from person where address['stu']='shanghai'; -- 还有这样
select avg(age) from person; -- ... 等标准的SQL语法大多都可以在Hive中使用包括一些函数,因为Hive是类SQL的;

  但在Hive中不推荐进行这些操作:Insert、Update、Delete等操作,因为Hive的特性是对数据仓库的数据进行提取,针对的数据是批量的,不适合行级的运算;

  

  清空表

-- 使truncate清空表
TRUNCATE TABLE person;
-- 通过覆盖的方式清空表
insert overwrite table person select * from person where 1=2;

  删除表

drop table person;
上一篇:SELECT INTO 和 INSERT INTO SELECT比较


下一篇:python之路--模块和包