Hive函数
一、常用内置函数
数学运算函数
字符串函数
时间函数
条件控制函数
集和函数
分组聚合函数
表生成函数
JSON解析函数
窗口分析函数
二、自定义函数
为什么要自定义函数?
有时候 hive 自带的函数不能满足当前需要,需要自定义函数来解决问题
UDF、UDAF、UDTF比较
- UDF 操作作用于单个数据行,并且产生一个数据行作为输出。大多数函数都属于这一类(比如数学函数和字符串函数)。返回对应值,一对一。
- UDAF 接受多个输入数据行,并产生一个输出数据行。像 COUNT 和 MAX 这样的函数就是聚集函数。返回聚合值,多对一。
- UDTF 操作作用于单个数据行,并且产生多个数据行,一个表作为输出。lateral view explore(),返回拆分值,一对多。
自定义UDF
创建临时函数
编写UDF代码
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.example</groupId>
<artifactId>hive-udf</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<maven.compiler.source>1.7</maven.compiler.source>
<maven.compiler.target>1.7</maven.compiler.target>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<hadoop.version>2.7.3</hadoop.version>
<hive.version>2.3.6</hive.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.7.3</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.7.3</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>2.3.6</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>2.3.6</version>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
</dependency>
</dependencies>
<build>
<finalName>hive-udf</finalName>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<archive>
<manifest>
<mainClass></mainClass>
</manifest>
</archive>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<!-- 添加此项后,可直接使用mvn package | mvn install -->
<!-- 不添加此项,需直接使用mvn package assembly:single -->
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
自定义标准函数需要继承实现抽象类org.apache.hadoop.hive.ql.udf.generic.GenericUDF
BASE64加密UDF
package udf;
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.hive.ql.exec.UDF;
import sun.misc.BASE64Encoder;
import java.io.UnsupportedEncodingException;
/**
* @Author: H.w
* @Date: 2021/1/7 下午3:45
* @Description: Base64加密UDF
**/
public class Base64Encrypt extends UDF {
public String evaluate(String message) throws Exception {
//判断传进来的参数是否为空
if(StringUtils.isBlank(message)){
return "";
}
//base64 加密
byte[] bt = null;
String newMsg = null;
try {
bt = message.getBytes("utf-8");
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
if( bt != null){
newMsg = new BASE64Encoder().encode(bt);
}
if(newMsg.contains("\r\n")){
newMsg = newMsg.replace("\r\n","");
}else if(newMsg.contains("\r")){
newMsg = newMsg.replace("\r","");}else if(newMsg.contains("\n")){
newMsg = newMsg.replace("\n","");
}
return newMsg;
}
}
BASE64解密UDF
package udf;
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.hive.ql.exec.UDF;
import sun.misc.BASE64Decoder;
/**
* @Author: H.w
* @Date: 2021/1/7 下午3:56
* @Description: Base64解压UDF
**/
public class Base64Decrypt extends UDF {
public String evaluate(String msg) throws Exception {
//判断传进来的参数是否为空
if (StringUtils.isBlank(msg)) {
return "";
}
//base64 解密
byte[] bt = null;
String result = null;
if (msg != null) {
BASE64Decoder decoder = new BASE64Decoder();
try {
bt = decoder.decodeBuffer(msg);
result = new String(bt, "utf-8");
} catch (Exception e) {
e.printStackTrace();
}
}
return result;
}
}
打包
将jar添加到hive服务器
hive> add jar /home/huangwei/IdeaProjects/hive-udf/target/hive-udf-jar-with-dependencies.jar;
Added [/home/huangwei/IdeaProjects/hive-udf/target/hive-udf-jar-with-dependencies.jar] to class path
Added resources: [/home/huangwei/IdeaProjects/hive-udf/target/hive-udf-jar-with-dependencies.jar]
创建临时函数
hive> create temporary function base_en as 'udf.Base64Encrypt';
OK
Time taken: 0.281 seconds
hive> create temporary function base_de as 'udf.Base64Decrypt';
OK
Time taken: 0.004 seconds
测试
hive> select base_en('hello');
OK
aGVsbG8=
Time taken: 0.462 seconds, Fetched: 1 row(s)
hive> select base_de('aGVsbG8=');
OK
hello
Time taken: 0.078 seconds, Fetched: 1 row(s)
注意:这种只是创建临时函数,重启Hive就会时效
创建永久函数
将jar上传到指定位置
[root@localhost lib]# hdfs dfs -mkdir -p /hive/lib;
[root@localhost lib]# hdfs dfs -put /home/huangwei/IdeaProjects/hive-udf/target/hive-udf-jar-with-dependencies.jar /hive/lib
创建永久生效的UDF函数
hive> create function base64en as 'udf.Base64Encrypt' using jar 'hdfs://localhost:9000/hive/lib/hive-udf-jar-with-dependencies.jar';
Added [/opt/hive/tmp/217fe48a-a64c-42d4-8696-e07c57e6ee13_resources/hive-udf-jar-with-dependencies.jar] to class path
Added resources: [hdfs://localhost:9000/hive/lib/hive-udf-jar-with-dependencies.jar]
OK
Time taken: 0.371 seconds
hive> create function base64de as 'udf.Base64Decrypt' using jar 'hdfs://localhost:9000/hive/lib/hive-udf-jar-with-dependencies.jar';
Added [/opt/hive/tmp/689060e4-4c5e-4126-a685-c9d008ea75cb_resources/hive-udf-jar-with-dependencies.jar] to class path
Added resources: [hdfs://localhost:9000/hive/lib/hive-udf-jar-with-dependencies.jar]
OK
Time taken: 2.665 seconds
检测:
hive> select base64en('hello');
Added [/opt/hive/tmp/580da8d8-3cfa-4b2f-a862-03c37fc66ff4_resources/hive-udf-jar-with-dependencies.jar] to class path
Added resources: [hdfs://localhost:9000/hive/lib/hive-udf-jar-with-dependencies.jar]
OK
aGVsbG8=
Time taken: 2.86 seconds, Fetched: 1 row(s)
hive> select base64de('aGVsbG8=');
Added [/opt/hive/tmp/580da8d8-3cfa-4b2f-a862-03c37fc66ff4_resources/hive-udf-jar-with-dependencies.jar] to class path
Added resources: [hdfs://localhost:9000/hive/lib/hive-udf-jar-with-dependencies.jar]
OK
hello
Time taken: 0.141 seconds, Fetched: 1 row(s)
检查MySQL中元数据
mysql> SELECT * FROM FUNCS;
+---------+-------------------+-------------+-------+-----------+-----------+------------+------------+
| FUNC_ID | CLASS_NAME | CREATE_TIME | DB_ID | FUNC_NAME | FUNC_TYPE | OWNER_NAME | OWNER_TYPE |
+---------+-------------------+-------------+-------+-----------+-----------+------------+------------+
| 1 | udf.Base64Encrypt | 1610011833 | 1 | base64en | 1 | NULL | USER |
| 6 | udf.Base64Decrypt | 1610011914 | 1 | base64de | 1 | NULL | USER |
+---------+-------------------+-------------+-------+-----------+-----------+------------+------------+
2 rows in set (0.00 sec)
可以看到函数的信息已经注册到元数据中了。