Processing nested JSON with Spark

The JSON file contains the following data (one JSON object per line):
{"avg_orders_count": [{"count": 1.0, "days": 3}, {"count": 0.6, "days": 5}, {"count": 0.3, "days": 10}, {"count": 0.2, "days": 15}, {"count": 0.1, "days": 30}, {"count": 0.066, "days": 45}, {"count": 0.066, "days": 60}, {"count": 0.053, "days": 75}, {"count": 0.044, "days": 90}], "m_hotel_id": "92500636"}
{"avg_orders_count": [{"count": 0.666, "days": 3}, {"count": 0.4, "days": 5}, {"count": 0.4, "days": 10}, {"count": 0.266, "days": 15}, {"count": 0.33, "days": 30}, {"count": 0.466, "days": 45}, {"count": 0.583, "days": 60}, {"count": 0.68, "days": 75}, {"count": 0.6111, "days": 90}], "m_hotel_id": "92409831"}
Reading the JSON file with Spark:

from pyspark.sql import SparkSession, functions

session = SparkSession.builder.appName("sort").getOrCreate()
# Spark infers the schema automatically: avg_orders_count is read as an
# array of structs, m_hotel_id as a string.
data = session.read.json('test1')
data.head()


# explode turns each element of the avg_orders_count array into its own row,
# repeating m_hotel_id for every element.
mydata = data.select(functions.explode(data.avg_orders_count), data.m_hotel_id).toDF('my_count', 'id')
mydata.head()
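To make the semantics of the explode step explicit, here is a plain-Python sketch of the same flattening applied to (a subset of) the sample records above: each element of the nested avg_orders_count array becomes its own row, paired with the record's m_hotel_id. This is only an illustration of what explode does, not a replacement for the Spark call.

```python
import json

# Two sample records in the same shape as the JSON file above
# (arrays shortened here for brevity).
lines = [
    '{"avg_orders_count": [{"count": 1.0, "days": 3}, {"count": 0.2, "days": 15}], "m_hotel_id": "92500636"}',
    '{"avg_orders_count": [{"count": 0.666, "days": 3}], "m_hotel_id": "92409831"}',
]

rows = []
for line in lines:
    record = json.loads(line)
    # "explode": emit one output row per element of the nested array
    for item in record["avg_orders_count"]:
        rows.append({"my_count": item, "id": record["m_hotel_id"]})

for r in rows:
    print(r)
```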


mydata = mydata.select(mydata.id, 'my_count.days', 'my_count.count')
mydata.show()


That flattens the nested JSON into an ordinary DataFrame, and you can now run whatever operations you need on it.
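As one example of a follow-up operation (not from the original post), you could compute the average `count` per hotel; in Spark this would be roughly `mydata.groupBy('id').agg(functions.avg('count'))`. A pure-Python sketch of the same aggregation, over a small hand-picked subset of the flattened rows:

```python
from collections import defaultdict

# Flattened (id, days, count) rows, as produced by the select step above
# (a small subset of the sample data, for illustration).
rows = [
    ("92500636", 3, 1.0),
    ("92500636", 5, 0.6),
    ("92409831", 3, 0.666),
    ("92409831", 5, 0.4),
]

sums = defaultdict(float)
counts = defaultdict(int)
for hotel_id, days, count in rows:
    sums[hotel_id] += count
    counts[hotel_id] += 1

# Per-hotel average of `count`
averages = {h: sums[h] / counts[h] for h in sums}
print(averages)
```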

https://blog.csdn.net/u013215956/article/details/86232425?utm_medium=distribute.pc_aggpage_search_result.none-task-blog-2~all~first_rank_v2~rank_v25-1-86232425.nonecase&utm_term=spark%20%E8%AF%BB%E5%8F%96%E5%B5%8C%E5%A5%97%E7%9A%84json&spm=1000.2123.3001.4430
