Python编程:通过百度文字识别提取表格数据

百度文字识别文档:

https://ai.baidu.com/docs#/OCR-Python-SDK/top

安装sdk

pip install baidu-aip

先创建应用,得到appid

要识别的表格图片:

Python编程:通过百度文字识别提取表格数据

代码示例

from aip import AipOcr

""" 你的 APPID AK SK """
APP_ID = '你的 App ID'
API_KEY = '你的 Api Key'
SECRET_KEY = '你的 Secret Key'

client = AipOcr(APP_ID, API_KEY, SECRET_KEY)

with open("names.png", "rb") as f:
    image = f.read()

result = client.basicGeneral(image)
print(result)

识别结果:

{
    "log_id":3213553909522465362,
    "words_result_num":20,
    "words_result":[
        {
            "words":"表格1:"
        },
        {
            "words":"姓名"
        },
        {
            "words":"年龄"
        },
        {
            "words":"性别"
        },
        {
            "words":"李雷"
        },
        {
            "words":"20男"
        },
        {
            "words":"韩梅梅"
        },
        {
            "words":"23女"
        },
        {
            "words":"赵小三"
        },
        {
            "words":"25女"
        },
        {
            "words":"Table2."
        },
        {
            "words":"Name"
        },
        {
            "words":"ge"
        },
        {
            "words":"Gender"
        },
        {
            "words":"Tom"
        },
        {
            "words":"30 Male"
        },
        {
            "words":"Jack"
        },
        {
            "words":"33 Male"
        },
        {
            "words":"one"
        },
        {
            "words":"31Female"
        }
    ]
}

结果不太满意,年龄和性别被合在一起了

上一篇:不同设备查看端口光功率命令


下一篇:【百度地图API】建立全国银行位置查询系统(四)——如何利用百度地图的数据生成自己的标注