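In Hive, JSON data can be queried by first parsing it with get_json_object or json_tuple.
Alternatively, you can create a Hive table whose rows are read through a JSON SerDe and then query the fields directly. A walkthrough follows (Hive 2.3.0):
1. Prepare the data source
Save the following content as test.txt (one JSON object per line):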
{"student":{"name":"king","age":11,"sex":"M"},"class":{"book":"语文","level":2,"score":80},"teacher":{"name":"t1","class":"语文"}}
{"student":{"name":"wang","age":12,"sex":"M"},"class":{"book":"语文","level":2,"score":80},"teacher":{"name":"t1","class":"语文"}}
{"student":{"name":"test","age":13,"sex":"M"},"class":{"book":"语文","level":2,"score":80},"teacher":{"name":"t1","class":"语文"}}
{"student":{"name":"test2","age":14,"sex":"M"},"class":{"book":"语文","level":2,"score":80},"teacher":{"name":"t1","class":"语文"}}
{"student":{"name":"test3","age":15,"sex":"M"},"class":{"book":"语文","level":2,"score":80},"teacher":{"name":"t1","class":"语文"}}
{"student":{"name":"test4","age":16,"sex":"M"},"class":{"book":"语文","level":2,"score":80},"teacher":{"name":"t1","class":"语文"}}
2. Create the Hive table
Note that the SerDe class name is case-sensitive and must be written exactly as: org.apache.hive.hcatalog.data.JsonSerDe
create external table if not exists dw_stg.student(
student map<string,string> comment "student info",
class map<string,string> comment "course info",
teacher map<string,string> comment "teacher info"
)
comment "student course info"
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
stored as textfile;
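If Hive cannot load the SerDe class, the hive-hcatalog-core jar may be missing from the session classpath. The following is a minimal sketch of registering it and then checking the table definition; the jar path is an assumption and must be adjusted to your installation.

```sql
-- Assumed path: point this at the hive-hcatalog-core jar in your own Hive installation.
add jar /path/to/hive/hcatalog/share/hcatalog/hive-hcatalog-core-2.3.0.jar;

-- Show the SerDe library and the HDFS location Hive recorded for the table.
describe formatted dw_stg.student;
```

The Location field in the output is the directory the data file must be placed in during the next step.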
3. Upload the data
Upload test.txt to the directory of the student table created above:
hdfs dfs -put test.txt /user/hive/warehouse/dw_stg.db/student/
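As an alternative to copying the file with hdfs dfs -put, the same file can probably be loaded from inside Hive. This sketch assumes the Hive CLI was started from the local directory that contains test.txt; with other clients the meaning of "local" may differ.

```sql
-- Copies the local file into the table's storage directory.
-- Assumes test.txt is in the current working directory of the Hive CLI.
load data local inpath 'test.txt' into table dw_stg.student;
```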
4. Query with HQL
Query all records:
Query the student column
Query the class column
Query all records where the student name is test4
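The four queries listed above are not spelled out in the text. A plausible sketch of each, assuming the table is dw_stg.student as created in step 2:

```sql
-- All records, one row per JSON line.
select * from dw_stg.student;

-- Only the student column (returned as a map).
select student from dw_stg.student;

-- Only the class column (returned as a map).
select class from dw_stg.student;

-- All columns for the student named test4.
select * from dw_stg.student where student['name'] = 'test4';
```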
To read a single value from the JSON, index the map column by key, for example student['name'], as follows:
select
student['name'] as stuName,
class['book'] as cls_book,
class['score'] as cls_score,
teacher['name'] as tech_name
from dw_stg.student
where student['name'] = 'test4';
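Because the columns are declared as map<string,string>, every value comes back as a string. For numeric filters it is safer to cast explicitly. The sketch below assumes the age values are exposed through the map as numeric strings; the stu_age alias is only illustrative.

```sql
-- Map values are strings, so cast before comparing numerically.
select
student['name'] as stuName,
cast(student['age'] as int) as stu_age
from dw_stg.student
where cast(student['age'] as int) >= 14;
```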
Overall, this is much more convenient than parsing with get_json_object or json_tuple.
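For comparison, here is roughly what the same lookup looks like with get_json_object. The table dw_stg.student_raw and its column line are hypothetical: the sketch assumes each JSON line is stored unparsed in a single string column.

```sql
-- Hypothetical raw table: the whole JSON line lands in one string column.
create external table if not exists dw_stg.student_raw(
line string
)
stored as textfile;

-- Every field needs its own JSON path expression, including the one in the filter.
select
get_json_object(line, '$.student.name') as stuName,
get_json_object(line, '$.class.book') as cls_book,
get_json_object(line, '$.class.score') as cls_score,
get_json_object(line, '$.teacher.name') as tech_name
from dw_stg.student_raw
where get_json_object(line, '$.student.name') = 'test4';
```

With the SerDe-backed table the JSON is parsed once when each row is read, and the query only indexes into the resulting maps.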