大数据开发之Hive篇9-Hive宏(Macro)

2024-03-08 08:55:57

备注:
Hive 版本 2.1.1

文章目录

一.Hive宏介绍
二.Hive 宏案例
参考:

一.Hive宏介绍

在编写HQL的过程中，很多逻辑需要反复使用。在关系型函数中，可以通过自定义函数来实现，Hive中实现的方法有多种。
1） Hive 宏
2） Hive hpl/sql
3） Hive UDF

这里，我们介绍实现最简单的一种Hive 宏。

语法:

-- 创建宏
CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression;
-- 删除宏
DROP TEMPORARY MACRO [IF EXISTS] macro_name;

创建临时宏使用给定的可选列列表作为表达式的输入创建宏。宏在当前会话期间存在。
宏只能是临时宏，只在本次会话中可见、有效。因此你需要将宏脚本放在SQL脚本的头部。
如果函数不存在，DROP临时宏将返回错误，除非指定了if EXISTS。

二.Hive 宏案例

2.1 数据准备

degree_type
1  -- 小学
2  -- 初中
3  -- 职业高中
4  -- 中专
5  -- 高中
6  -- 大专
7  -- 本科
8  -- 硕士
8 -- 博士


create table user_info(id int,degree_type int);
insert into user_info values (1,3);
insert into user_info values (2,1);
insert into user_info values (3,6);
insert into user_info values (4,4);
insert into user_info values (5,5);
insert into user_info values (6,9);
insert into user_info values (7,8);
insert into user_info values (8,2);
insert into user_info values (9,7);


hive> 
    > select * from user_info;
OK
user_info.id    user_info.degree_type
1       3
2       1
3       6
4       4
5       5
6       9
7       8
8       2
9       7
Time taken: 0.088 seconds, Fetched: 9 row(s)

此时需要展示的是学历的中文信息，而非枚举类型的1、2、3这些数字。

2.2 case when实现

代码:

select id,
       case degree_type when 1 then '小学'
                        when 2 then '初中'
                        when 3 then '职业高中'
                        when 4 then '中专'
                        when 5 then '高中'
                        when 6 then '大专'
                        when 7 then '本科'
                        when 8 then '硕士'
                        when 9 then '博士'
                        else null
     end as degree 
  from user_info

测试记录:

hive> select id,
    >        case degree_type when 1 then '小学'
    >                         when 2 then '初中'
    >                         when 3 then '职业高中'
    >                         when 4 then '中专'
    >                         when 5 then '高中'
    >                         when 6 then '大专'
    >                         when 7 then '本科'
    >                         when 8 then '硕士'
    >                         when 9 then '博士'
    >                         else null
    >      end as degree 
    >   from user_info ;
Query ID = root_20201217103340_b5a1b58f-de3e-4ed8-98c4-dbd079412474
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0237, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0237/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0237
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2020-12-17 10:33:47,506 Stage-1 map = 0%,  reduce = 0%
2020-12-17 10:33:53,690 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 6.13 sec
MapReduce Total cumulative CPU time: 6 seconds 130 msec
Ended Job = job_1606698967173_0237
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 2   Cumulative CPU: 6.13 sec   HDFS Read: 9479 HDFS Write: 429 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 130 msec
OK
id      degree
1       职业高中
5       高中
6       博士
7       硕士
8       初中
2       小学
3       大专
4       中专
9       本科
Time taken: 14.239 seconds, Fetched: 9 row(s)
hive>

2.3 Hive宏实现

代码:

drop temporary macro if exists get_degree;

create temporary macro get_degree(degree_type string)
if (degree_type is not null,
   case degree_type when 1 then '小学'
                        when 2 then '初中'
                        when 3 then '职业高中'
                        when 4 then '中专'
                        when 5 then '高中'
                        when 6 then '大专'
                        when 7 then '本科'
                        when 8 then '硕士'
                        when 9 then '博士'
                        else null
     end,null);

select id,get_degree(degree_type) from user_info;

测试记录:

hive> 
    > create temporary macro get_degree(degree_type string)
    > if (degree_type is not null,
    >    case degree_type when 1 then '小学'
    >                         when 2 then '初中'
    >                         when 3 then '职业高中'
    >                         when 4 then '中专'
    >                         when 5 then '高中'
    >                         when 6 then '大专'
    >                         when 7 then '本科'
    >                         when 8 then '硕士'
    >                         when 9 then '博士'
    >                         else null
    >      end,null);
OK
Time taken: 0.032 seconds
hive> 
    > select id,get_degree(degree_type) from user_info;
Query ID = root_20201217104333_6ac59bfb-4915-48f8-965f-fae4642957c0
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0238, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0238/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0238
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2020-12-17 10:43:42,748 Stage-1 map = 0%,  reduce = 0%
2020-12-17 10:43:48,993 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 3.11 sec
2020-12-17 10:43:50,025 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 6.21 sec
MapReduce Total cumulative CPU time: 6 seconds 210 msec
Ended Job = job_1606698967173_0238
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 2   Cumulative CPU: 6.21 sec   HDFS Read: 10077 HDFS Write: 429 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 210 msec
OK
id      _c1
1       职业高中
5       高中
6       博士
7       硕士
8       初中
2       小学
3       大专
4       中专
9       本科
Time taken: 17.841 seconds, Fetched: 9 row(s)
hive> exit;
[root@hp1 ~]# hive
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/hive-common-2.1.1-cdh6.3.1.jar!/hive-log4j2.properties Async: false

WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> use test;
OK
Time taken: 1.204 seconds
hive> select id,get_degree(degree_type) from user_info;
FAILED: SemanticException [Error 10011]: Invalid function get_degree
hive>

测试记录可以看出，exit退出hive客户端后重新登陆，临时宏就不可用了，如果使用宏，需要在当前回话的最开始运行创建宏的语句。

参考:

1.https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/DropMacro

码农公寓