Parsing JSON Files

2021-12-02 08:22:27

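Recently I used Baidu's text recognition (OCR) service to extract the text in an image together with its position.

Baidu returns the recognition result as JSON, which looks like this: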

{'words_result': [{'words': '勤道天', 'location': {'top': 190, 'left': 135, 'width': 499, 'height': 136}, 'chars': [{'char': '勤', 'location': {'top': 190, 'left': 135, 'width': 81, 'height': 136}}, {'char': '道', 'location': {'top': 190, 'left': 385, 'width': 125, 'height': 136}}, {'char': '天', 'location': {'top': 190, 'left': 509, 'width': 82, 'height': 136}}]}, {'words': '刚欲平川智海', 'location': {'top': 337, 'left': 161, 'width': 471, 'height': 113}, 'chars': [{'char': '刚', 'location': {'top': 337, 'left': 230, 'width': 51, 'height': 63}}, {'char': '欲', 'location': {'top': 337, 'left': 265, 'width': 56, 'height': 62}}, {'char': '平', 'location': {'top': 347, 'left': 335, 'width': 67, 'height': 76}}, {'char': '川', 'location': {'top': 384, 'left': 501, 'width': 41, 'height': 66}}, {'char': '智', 'location': {'top': 381, 'left': 541, 'width': 39, 'height': 68}}, {'char': '海', 'location': {'top': 378, 'left': 579, 'width': 39, 'height': 68}}]}, {'words': '政治家', 'location': {'top': 348, 'left': 186, 'width': 16, 'height': 70}, 'chars': [{'char': '政', 'location': {'top': 374, 'left': 186, 'width': 16, 'height': 10}}, {'char': '治', 'location': {'top': 388, 'left': 186, 'width': 16, 'height': 10}}, {'char': '家', 'location': {'top': 402, 'left': 186, 'width': 16, 'height': 10}}]}, {'words': '任意2套省20%', 'location': {'top': 704, 'left': 287, 'width': 468, 'height': 76}, 'chars': [{'char': '任', 'location': {'top': 704, 'left': 287, 'width': 51, 'height': 76}}, {'char': '意', 'location': {'top': 704, 'left': 363, 'width': 51, 'height': 76}}, {'char': '2', 'location': {'top': 704, 'left': 433, 'width': 42, 'height': 76}}, {'char': '套', 'location': {'top': 704, 'left': 466, 'width': 51, 'height': 76}}, {'char': '省', 'location': {'top': 704, 'left': 545, 'width': 50, 'height': 76}}, {'char': '2', 'location': {'top': 704, 'left': 614, 'width': 42, 'height': 76}}, {'char': '0', 'location': {'top': 704, 'left': 639, 'width': 42, 'height': 76}}, {'char': '%', 'location': {'top': 704, 'left': 690, 'width': 42, 'height': 76}}]}], 'log_id': 1380934582706110464, 'words_result_num': 4}

Looks pretty messy, right? If you don't know how JSON is structured, it can make your head spin.

I. What Is a JSON File

As the sample above shows, the content is really just text. But the text has structure, so it can be used to store data.
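To make the "text with structure" idea concrete, here is a minimal sketch using Python's standard json module. The sample string is a hypothetical, shortened stand-in shaped like the OCR result above, not the real response.

```python
import json

# A small JSON document held as plain text (hypothetical, shortened sample).
text = '{"words_result_num": 1, "words_result": [{"words": "勤道天"}]}'

# json.loads parses the text into ordinary Python objects.
data = json.loads(text)

print(type(text).__name__)       # str
print(type(data).__name__)       # dict
print(data["words_result_num"])  # 1
```

Before parsing, the content is only a string; after parsing, it is a dictionary that can be indexed.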

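II. Structure Analysis

Anyone who has learned Python will recognize {} as a dictionary (JSON calls it an object) and [] as a list (JSON calls it an array).

Look closely at the sample above: JSON stores all kinds of complex data by combining these two structures.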

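1. Dictionaries

A dictionary stores data in the form {'key': value}.

The key must be a string wrapped in quotes. JSON text itself requires double quotes; the sample above shows single quotes because it is a Python dictionary as printed after parsing, and Python accepts either quote style.

The value can be any JSON type (string, number, list, dictionary, as well as true, false, and null).

A key and its value are joined into a pair by a colon ":", as in {"key": value}

If a dictionary has several elements, they are separated by commas ",", as in {"key1": value, "key2": value, "key3": value}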
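The rules above can be checked with a short sketch. The key names here are made up for illustration; the only library used is Python's standard json module.

```python
import json

# Valid JSON: double-quoted keys, values of several different types.
valid = '{"key1": "text", "key2": 42, "key3": [1, 2], "key4": {"nested": true}}'
obj = json.loads(valid)
print(list(obj.keys()))

# Single-quoted keys are fine in a Python dict literal but not in JSON text.
try:
    json.loads("{'key1': 'text'}")
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False
print(single_quotes_ok)  # False
```

Note that JSON's true becomes Python's True after parsing.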

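2. Lists

A list has the form [xx,xx,xx], with elements separated by commas ","

A Python list (like a JSON array) can even hold elements of different types, such as ["a","b","c",1,2,3]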
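As a quick illustration of a mixed-type list, the following sketch parses the same array from JSON text and reports the type of each element.

```python
import json

# A JSON array mixing strings and numbers.
mixed = json.loads('["a", "b", "c", 1, 2, 3]')

print(len(mixed))                         # 6
print([type(x).__name__ for x in mixed])  # ['str', 'str', 'str', 'int', 'int', 'int']
```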

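3. Walking through the example:

Look at the sample at the top. The outermost layer is a dictionary: {'words_result': [XXX,...], 'log_id': 1380934582706110464, 'words_result_num': 4}

This dictionary has three elements (key-value pairs). Why three? Remember that elements are separated by commas ",". Count only the commas at the outermost level and ignore those nested inside the list. Done counting?

The first element is 'words_result': [XXX,...]. Its key is 'words_result' and its value is a list [XXX].

The second element is 'log_id': 1380934582706110464. Its key is 'log_id' and its value is a number.

The third element is 'words_result_num': 4. Its key is 'words_result_num' and its value is also a number.

All the data we need (the text and its position) is in the list held by the first element.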
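Instead of counting commas by hand, the top-level elements can be listed in code. This is a small sketch on a trimmed copy of the sample (only one entry kept in words_result, with its chars list emptied for brevity).

```python
# Trimmed copy of the sample: the real words_result has four entries.
data = {
    "words_result": [
        {
            "words": "勤道天",
            "location": {"top": 190, "left": 135, "width": 499, "height": 136},
            "chars": [],  # emptied here to keep the sketch short
        },
    ],
    "log_id": 1380934582706110464,
    "words_result_num": 4,
}

print(len(data))  # number of top-level key-value pairs
for key, value in data.items():
    # Show each key together with the type of its value.
    print(key, "->", type(value).__name__)
```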

Let's see what this [XXX] list contains:

[{'words': '勤道天', 'location': {'top': 190, 'left': 135, 'width': 499, 'height': 136}, 'chars': [{'char': '勤', 'location': {'top': 190, 'left': 135, 'width': 81, 'height': 136}}, {'char': '道', 'location': {'top': 190, 'left': 385, 'width': 125, 'height': 136}}, {'char': '天', 'location': {'top': 190, 'left': 509, 'width': 82, 'height': 136}}]}, {'words': '刚欲平川智海', 'location': {'top': 337, 'left': 161, 'width': 471, 'height': 113}, 'chars': [{'char': '刚', 'location': {'top': 337, 'left': 230, 'width': 51, 'height': 63}}, {'char': '欲', 'location': {'top': 337, 'left': 265, 'width': 56, 'height': 62}}, {'char': '平', 'location': {'top': 347, 'left': 335, 'width': 67, 'height': 76}}, {'char': '川', 'location': {'top': 384, 'left': 501, 'width': 41, 'height': 66}}, {'char': '智', 'location': {'top': 381, 'left': 541, 'width': 39, 'height': 68}}, {'char': '海', 'location': {'top': 378, 'left': 579, 'width': 39, 'height': 68}}]}, {'words': '政治家', 'location': {'top': 348, 'left': 186, 'width': 16, 'height': 70}, 'chars': [{'char': '政', 'location': {'top': 374, 'left': 186, 'width': 16, 'height': 10}}, {'char': '治', 'location': {'top': 388, 'left': 186, 'width': 16, 'height': 10}}, {'char': '家', 'location': {'top': 402, 'left': 186, 'width': 16, 'height': 10}}]}, {'words': '任意2套省20%', 'location': {'top': 704, 'left': 287, 'width': 468, 'height': 76}, 'chars': [{'char': '任', 'location': {'top': 704, 'left': 287, 'width': 51, 'height': 76}}, {'char': '意', 'location': {'top': 704, 'left': 363, 'width': 51, 'height': 76}}, {'char': '2', 'location': {'top': 704, 'left': 433, 'width': 42, 'height': 76}}, {'char': '套', 'location': {'top': 704, 'left': 466, 'width': 51, 'height': 76}}, {'char': '省', 'location': {'top': 704, 'left': 545, 'width': 50, 'height': 76}}, {'char': '2', 'location': {'top': 704, 'left': 614, 'width': 42, 'height': 76}}, {'char': '0', 'location': {'top': 704, 'left': 639, 'width': 42, 'height': 76}}, {'char': '%', 'location': {'top': 704, 'left': 690, 'width': 42, 'height': 76}}]}]

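It starts right away with a curly brace "{", so the list evidently holds dictionaries.

Here's a tip: open the JSON file in Notepad++ and click on the first curly brace "{". Its matching closing brace "}" is then highlighted in red.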
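As an alternative to matching braces by eye in an editor, the nesting can be made visible by pretty-printing. This sketch uses json.dumps with indent and ensure_ascii=False (so the Chinese characters are printed as-is) on the first entry of the list, with its chars list shortened to one item.

```python
import json

# First entry of the list, with 'chars' shortened to a single item.
first = {
    "words": "勤道天",
    "location": {"top": 190, "left": 135, "width": 499, "height": 136},
    "chars": [
        {"char": "勤", "location": {"top": 190, "left": 135, "width": 81, "height": 136}},
    ],
}

# indent=2 puts every nesting level on its own indented lines.
pretty = json.dumps(first, indent=2, ensure_ascii=False)
print(pretty)
```

Each extra level of indentation in the output corresponds to one more level of {} or [] nesting.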

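A quick look reveals the following structure: [{},{},{},...]

The list holds N dictionary elements, and every dictionary has the same structure.

Take out the first dictionary and look at its structure: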

{'words': '勤道天', 'location': {'top': 190, 'left': 135, 'width': 499, 'height': 136}, 'chars': [{'char': '勤', 'location': {'top': 190, 'left': 135, 'width': 81, 'height': 136}}, {'char': '道', 'location': {'top': 190, 'left': 385, 'width': 125, 'height': 136}}, {'char': '天', 'location': {'top': 190, 'left': 509, 'width': 82, 'height': 136}}]}

Its structure is: {'words': string, 'location': dictionary, 'chars': list}
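The claim that every entry shares this structure can be verified in code. The sketch below uses two entries copied from the sample, each with its chars list shortened to one item, and compares their key sets.

```python
# Two entries from the sample, with 'chars' shortened to one item each.
words_result = [
    {
        "words": "勤道天",
        "location": {"top": 190, "left": 135, "width": 499, "height": 136},
        "chars": [{"char": "勤", "location": {"top": 190, "left": 135, "width": 81, "height": 136}}],
    },
    {
        "words": "政治家",
        "location": {"top": 348, "left": 186, "width": 16, "height": 70},
        "chars": [{"char": "政", "location": {"top": 374, "left": 186, "width": 16, "height": 10}}],
    },
]

# Collect the distinct sets of keys; one set means all entries have the same shape.
key_sets = {frozenset(entry.keys()) for entry in words_result}
same_structure = len(key_sets) == 1
print(same_structure)

# Show the type behind each key of the first entry.
shape = {key: type(value).__name__ for key, value in words_result[0].items()}
print(shape)
```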

OK, can you see it? By now you should be able to analyze the structure of a JSON document.

III. Extracting the Data

This is very simple: every value is retrieved with an index or key in square brackets.

1. Getting a list element

Take a list items=["a","b","c"]

To get the first element "a", write items[0]

2. Getting a dictionary element

Take a dictionary dic={"words":"锅大侠","age":100,"sex":"unknown"}

To get the age 100, write dic['age']
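The two kinds of indexing can be chained to reach a nested value in one expression. A minimal sketch follows; the variable names items and data are chosen here for illustration, and data is a cut-down copy of the OCR sample.

```python
items = ["a", "b", "c"]
dic = {"words": "锅大侠", "age": 100, "sex": "unknown"}

print(items[0])    # list element by position
print(dic["age"])  # dictionary value by key

# Cut-down copy of the OCR sample: dictionary -> list -> dictionary -> list -> dictionary.
data = {
    "words_result": [
        {"chars": [{"char": "勤", "location": {"top": 190, "left": 135, "width": 81, "height": 136}}]},
    ],
}

# Chain key and index lookups from the outside in.
first_char = data["words_result"][0]["chars"][0]
print(first_char["char"], first_char["location"]["left"])

# dict.get returns None for a missing key instead of raising KeyError.
missing = dic.get("height")
print(missing)
```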

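3. Hands-on practice

Goal: extract every individual character with its position and print them to a txt file.

Note: the Baidu call returns an object, here named response. Its response.json() method returns the JSON content directly, already parsed.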

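① Get the result list (look up the key 'words_result' in the dictionary; the value is a list)

    resultList = response.json()['words_result']

② Loop over the list and take out each element. We only need its list of characters.

    for item in resultList:
        chars = item['chars']               # list of characters

③ Loop over the character list and read each character and its position

        for item2 in chars:
            char = item2['char']            # the character
            location = item2['location']
            top = location['top']           # top offset
            left = location['left']         # left offset
            width = location['width']       # width
            height = location['height']     # height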

④ Output

No.    Content

1      勤
       Width: 81     Height: 136
       Left: 135     Top: 190

2      道
       Width: 125    Height: 136
       Left: 385     Top: 190

3      天
       Width: 82     Height: 136
       Left: 509     Top: 190

4      刚
       Width: 51     Height: 63
       Left: 230     Top: 337

5      欲
       Width: 56     Height: 62
       Left: 265     Top: 337

......
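The steps above stop short of writing the file, so here is one way the whole task could be put together. This is a sketch under assumptions: sample stands in for response.json() (only the first entry with two characters is included), and the function name format_chars, the output file name chars.txt, and the exact column spacing are all hypothetical choices made for this example.

```python
import tempfile
from pathlib import Path


def format_chars(result):
    """Build the report text for every character in an OCR result dictionary."""
    lines = ["No.    Content", ""]
    number = 0
    for item in result["words_result"]:      # each recognized line of text
        for entry in item["chars"]:          # each character in that line
            number += 1
            loc = entry["location"]
            lines.append(f"{number:<6} {entry['char']}")
            lines.append(f"       Width: {loc['width']}   Height: {loc['height']}")
            lines.append(f"       Left: {loc['left']}   Top: {loc['top']}")
            lines.append("")
    return "\n".join(lines)


# Stand-in for response.json(): the first entry of the sample, two characters only.
sample = {
    "words_result": [
        {
            "words": "勤道天",
            "location": {"top": 190, "left": 135, "width": 499, "height": 136},
            "chars": [
                {"char": "勤", "location": {"top": 190, "left": 135, "width": 81, "height": 136}},
                {"char": "道", "location": {"top": 190, "left": 385, "width": 125, "height": 136}},
            ],
        },
    ],
}

report = format_chars(sample)

# Write the report as UTF-8 so the Chinese characters survive on any platform.
out_path = Path(tempfile.gettempdir()) / "chars.txt"
out_path.write_text(report, encoding="utf-8")
print(report)
```

Keeping the formatting in its own function separates the JSON traversal from the file output, so the same text can be printed, tested, or written elsewhere.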

码农公寓