python - json_normalize on a JSON file where a list contains dictionaries (example included)

2023-08-10 15:21:58

This is a sample JSON file with 2 records that I am working with:

[{"Time":"2016-01-10",
"ID":13567,
"Content":{
    "Event":"UPDATE",
    "Id":{"EventID":"ABCDEFG"},
    "Story":[{
        "@ContentCat":"News",
        "Body":"Related Meeting Memo: Engagement with target firm for potential M&A.  Please be on call this weekend for news updates.",
        "BodyTextType":"PLAIN_TEXT",
        "DerivedId":{"Entity":[{"Id":"Amy","Score":70}, {"Id":"Jon","Score":70}]},
        "DerivedTopics":{"Topics":[
                            {"Id":"Meeting","Score":70},
                            {"Id":"Performance","Score":70},
                            {"Id":"Engagement","Score":100},
                            {"Id":"Salary","Score":70},
                            {"Id":"Career","Score":100}]
                        },
        "HotLevel":0,
        "LanguageString":"ENGLISH",
        "Metadata":{"ClassNum":50,
                    "Headline":"Attn: Weekend",
                    "WireId":2035,
                    "WireName":"IIS"},
        "Version":"Original"}
                ]},
"yyyymmdd":"20160110",
"month":201601},
{"Time":"2016-01-12",
"ID":13568,
"Content":{
    "Event":"DEAL",
    "Id":{"EventID":"ABCDEFG2"},
    "Story":[{
        "@ContentCat":"Details",
        "Body":"Test email contents",
        "BodyTextType":"PLAIN_TEXT",
        "DerivedId":{"Entity":[{"Id":"Bob","Score":100}, {"Id":"Jon","Score":70}, {"Id":"Jack","Score":60}]},
        "DerivedTopics":{"Topics":[
                            {"Id":"Meeting","Score":70},
                            {"Id":"Engagement","Score":100},
                            {"Id":"Salary","Score":70},
                            {"Id":"Career","Score":100}]
                        },
        "HotLevel":0,
        "LanguageString":"ENGLISH",
        "Metadata":{"ClassNum":70,
                    "Headline":"Attn: Weekend",
                    "WireId":2037,
                    "WireName":"IIS"},
        "Version":"Original"}
                ]},
"yyyymmdd":"20160112",
"month":201602}]

I am trying to get a DataFrame at the entity Id level (extracting Amy and Jon from record 1, and Bob, Jon and Jack from record 2).

But I ran into an error early on. Here is my code so far, assuming the sample JSON is saved as sample.json:

data = json.load(open('sample.json'))
test = json_normalize(data, record_path=['Content', 'Story'])

This results in the following error:

TypeError: string indices must be integers

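I suspect this is because Content.Story is actually a list containing a dictionary rather than a dictionary itself, but I am not clear on how to get past that.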
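The suspicion about the nesting can be checked directly with the standard library before involving pandas. The snippet below is a minimal sketch: `sample` is a trimmed-down copy of the first record (an assumption made for brevity; only the keys needed to inspect the structure are kept), and the variable names are illustrative.

```python
import json

# Trimmed-down copy of the first record from the sample above
# (assumption: only the keys needed to inspect the nesting are kept).
sample = json.loads("""
[{"ID": 13567,
  "Content": {"Event": "UPDATE",
              "Story": [{"Body": "Test",
                         "DerivedId": {"Entity": [{"Id": "Amy", "Score": 70},
                                                  {"Id": "Jon", "Score": 70}]}}]}}]
""")

story = sample[0]["Content"]["Story"]
print(type(story).__name__)      # container type of Content.Story
print(type(story[0]).__name__)   # type of each element inside it

# The entities sit one level deeper, inside the first story element.
entities = story[0]["DerivedId"]["Entity"]
print([e["Id"] for e in entities])
```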

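Edit: To clarify, I am ultimately trying to get down to the entity Id level (Content > Story > DerivedId > Entity > Id). I showed the Content.Story code sample only to illustrate where I currently am in figuring this out.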

Solution:

json_normalize(data, record_path=[['Content', 'Story']])

That should work.
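The call above stops at the Content.Story level, while the question ultimately asks for one row per entity. As an illustration only, here is a plain-Python walk down Content > Story > DerivedId > Entity that does not use json_normalize at all. The helper name `entity_rows`, the output column names, and the trimmed inline records are assumptions made for this sketch, not part of the original answer.

```python
import json

# Trimmed-down copies of the two sample records (assumption: only the keys
# used below are kept; the real file has more fields per story).
records = json.loads("""
[{"Time": "2016-01-10", "ID": 13567,
  "Content": {"Story": [{"DerivedId": {"Entity": [
      {"Id": "Amy", "Score": 70}, {"Id": "Jon", "Score": 70}]}}]}},
 {"Time": "2016-01-12", "ID": 13568,
  "Content": {"Story": [{"DerivedId": {"Entity": [
      {"Id": "Bob", "Score": 100}, {"Id": "Jon", "Score": 70},
      {"Id": "Jack", "Score": 60}]}}]}}]
""")


def entity_rows(records):
    """Return one flat dict per entity, carrying the parent record's ID and Time."""
    rows = []
    for rec in records:
        # Content.Story is a list of dicts, so it has to be iterated.
        for story in rec["Content"]["Story"]:
            for ent in story["DerivedId"]["Entity"]:
                rows.append({
                    "ID": rec["ID"],
                    "Time": rec["Time"],
                    "EntityId": ent["Id"],
                    "Score": ent["Score"],
                })
    return rows


rows = entity_rows(records)
for row in rows:
    print(row)
```

The resulting list of dicts can be passed straight to pandas.DataFrame(rows) to get the entity-level frame.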
