HIVE JSON解析

        {"name":"zhangsan","age":"22","timeStamp":"978300760","id":"1"}{"name":"lisi","age":"21","timeStamp":"978300790","id":"2"}{"name":"wangwu","age":"22","timeStamp":"978300780","id":"3"}

1、使用函数get_json_object(string json_string, string path)返回值:String说明:解析json的字符串json_string,返回path指定的内容。如果输入的json字符串无效,那么返回NUll,函数每次只能返回一个数据项。hive (default)> select get_json_object('{"name":"zhangsan","age":"22","timeStamp":"978300760","id":"1"}','$.name');OK_c0zhangsanTime taken: 0.176 seconds, Fetched: 1 row(s)hive (default)> create database test;OKTime taken: 0.045 secondshive (default)> use test;OKTime taken: 0.021 secondshive (test)> create table json(data string);OKTime taken: 0.084 secondshive (test)> load data local inpath '/opt/module/hive/stu_json.txt' into table json; Loading data to table test.jsonTable test.json stats: [numFiles=1, totalSize=187]OKTime taken: 0.296 secondshive (test)> select * from json;OKjson.data{"name":"zhangsan","age":"22","timeStamp":"978300760","id":"1"}{"name":"lisi","age":"21","timeStamp":"978300790","id":"2"}{"name":"wangwu","age":"22","timeStamp":"978300780","id":"3"}Time taken: 0.055 seconds, Fetched: 3 row(s)hive (test)> select get_json_object(data,'$.name') as name from json;OKnamezhangsanlisiwangwuTime taken: 0.074 seconds, Fetched: 3 row(s)

2、json_tuple(jsonStr, k1, k2, ...)参数为一组键k1,k2。。。和json字符串。返回值的元组。可以在一次调用中输入多次键。hive (test)> select json_tuple(data,'name','age') from json;OKc0 c1zhangsan 22lisi 21wangwu 22Time taken: 0.08 seconds, Fetched: 3 row(s)

3、UDF解析JSON

查阅hive的UDF函数指南可知,虽然udf中的get_json_object和json_tuple能对json解析,但有时候没法实现复杂业务扩展,同时也没有UDF对JSON数组的解析,因此,很多时候需要自己实现解析JSON的UDF,写UDF中如果使用大量开源库(例如fastjson或gson)则会直接导致生成的jar依赖较大,并不是最理想的,本博文查阅了hive源码,发现hive内部用JSONObject和JSONArray解析json对象,则我们同样可以使用这两个对象解析即可,本文旨在提供一个可调用json对象和数组的解析udf,并提供复杂json的解析,而不是替代get_json_object等内置函数。 

上一篇:为了生活 1DAY


下一篇:日志分析工具Log Parser介绍