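I want to use functional and iterator tools to convert a long dataset to a wide one, and my understanding is that this is a job for groupby. I have asked a couple of questions about this before and thought I had it figured out, but not quite in this case, which should be simpler: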
> Python functional transformation of JSON list of dictionaries from long to wide
> Correct use of a fold or reduce function to long-to-wide data in python or javascript?
Here is my data:
from itertools import groupby
from operator import itemgetter
from pprint import pprint
>>> longdat=[
{"id":"cat", "name" : "best meower", "value": 10},
{"id":"cat", "name" : "cleanest paws", "value": 8},
{"id":"cat", "name" : "fanciest", "value": 9},
{"id":"dog", "name" : "smelly", "value": 9},
{"id":"dog", "name" : "dumb", "value": 9},
]
Here is the format I want:
>>> widedat=[
{"id":"cat", "best meower": 10, "cleanest paws": 8, "fanciest": 9},
{"id":"dog", "smelly": 9, "dumb": 9},
]
Here are my failed attempts:
# WRONG
>>> gh = groupby(sorted(longdat,key=id),itemgetter('id'))
>>> list(gh)
[('cat', <itertools._grouper object at 0x5d0b550>), ('dog', <itertools._grouper object at 0x5d0b210>)]
OK, I need to get the second item out of each pair in the iterator, fair enough.
#WRONG
>>> gh = groupby(sorted(longdat,key=id),itemgetter('id'))
>>> for g,v in gh:
... {"id":i["id"], i["name"]:i["value"] for i in v}
^
SyntaxError: invalid syntax
Strange, that looks valid to me. Let's unroll those loops to make sure.
#WRONG
gb = groupby(sorted(longdat,key=id),itemgetter('id'))
data = {}
for g,v in gb:
    data[g] = {}
    for i in v:
        data[g] = i
#WRONG
gb = groupby(sorted(longdat,key=id),itemgetter('id'))
data = []
for g,v in gb:
    for i in v:
        data[g] = i
Argh! OK, let's go back to the one-liner form.
#WRONG
>>> gb = groupby(sorted(longdat,key=id),itemgetter('id'))
>>> [{"id":g, i["name"]:i["value"]} for i in k for g,k in gb]
[]
What? Why is it empty?! Let's basically unroll this one again:
#WRONG
gb = groupby(sorted(longdat,key=id),itemgetter('id'))
for g,k in gb:
    for i in k:
        print(g, i["name"],i["value"])
cat best meower 10
cat fanciest 9
cat cleanest paws 8
dog smelly 9
dog dumb 9
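Now, that last one is clearly the worst: my data is basically right where it started, as if I had not grouped at all.
Why doesn't this work, and how can I get the data into the format I'm looking for?
Also, is it possible to express this entirely lazily, so that I can do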
>>> result[0]
{"id":"cat", "best meower": 10, "cleanest paws": 8, "fanciest": 9}
and get only the first result without processing the whole list (apart from having to look at /all/ the rows where id == 'cat')?
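Solution:
The key function passed to sorted is id, the built-in that returns an object's identity. It returns a different value for every list item, so the resulting order has nothing to do with the "id" key of each dictionary.
It should be itemgetter('id') or lambda x: x['id'].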
>>> id(longdat[0])
41859624L
>>> id(longdat[1])
41860488L
>>> id(longdat[2])
41860200L
>>> itemgetter('id')(longdat[1])
'cat'
>>> itemgetter('id')(longdat[2])
'cat'
>>> itemgetter('id')(longdat[3])
'dog'
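The sort key matters because groupby only merges consecutive items that share a key. A minimal sketch of that behaviour follows; the `rows` list is hypothetical data, deliberately interleaved, and is not taken from the question.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows, interleaved so that equal ids are not adjacent.
rows = [
    {"id": "cat", "name": "fanciest", "value": 9},
    {"id": "dog", "name": "smelly", "value": 9},
    {"id": "cat", "name": "best meower", "value": 10},
]
getid = itemgetter("id")

# Without sorting by the grouping key, "cat" appears as two separate groups.
unsorted_keys = [key for key, _ in groupby(rows, key=getid)]
print(unsorted_keys)  # ['cat', 'dog', 'cat']

# Sorting by the same key first makes equal ids adjacent, so each id forms one group.
sorted_keys = [key for key, _ in groupby(sorted(rows, key=getid), key=getid)]
print(sorted_keys)  # ['cat', 'dog']
```

Sorting with key=id orders the rows by memory address, which leaves them in an arbitrary order with respect to the "id" key and does not guarantee that equal ids end up adjacent.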
Using the same key function for both sorted and groupby gives the wide format:
from itertools import groupby
from operator import itemgetter
longdat = [
{"id":"cat", "name" : "best meower", "value": 10},
{"id":"cat", "name" : "cleanest paws", "value": 8},
{"id":"cat", "name" : "fanciest", "value": 9},
{"id":"dog", "name" : "smelly", "value": 9},
{"id":"dog", "name" : "dumb", "value": 9},
]
getid = itemgetter('id')
result = [
dict([['id', key]] + [[d['name'], d['value']] for d in grp])
for key, grp in groupby(sorted(longdat, key=getid), key=getid)
]
print(result)
Output:
[{'best meower': 10, 'fanciest': 9, 'id': 'cat', 'cleanest paws': 8},
{'dumb': 9, 'smelly': 9, 'id': 'dog'}]
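On the question of getting only the first result lazily: sorted has to read the whole list, so the version above is not lazy. If the rows are already ordered by id (an assumption, true for the sample data), a generator can yield one wide dict at a time. The sketch below is one possible way to do that; `widen`, `tracked` and `consumed` are hypothetical names introduced only for illustration.

```python
from itertools import groupby
from operator import itemgetter

def widen(rows, key="id"):
    """Lazily yield one wide dict per run of consecutive rows sharing `key`.

    Assumes `rows` is already ordered by `key`; nothing is sorted here.
    """
    for group_key, group in groupby(rows, key=itemgetter(key)):
        wide = {key: group_key}
        wide.update((row["name"], row["value"]) for row in group)
        yield wide

longdat = [
    {"id": "cat", "name": "best meower", "value": 10},
    {"id": "cat", "name": "cleanest paws", "value": 8},
    {"id": "cat", "name": "fanciest", "value": 9},
    {"id": "dog", "name": "smelly", "value": 9},
    {"id": "dog", "name": "dumb", "value": 9},
]

consumed = []  # records which input rows were actually read

def tracked(rows):
    """Pass rows through unchanged while recording each one that is pulled."""
    for row in rows:
        consumed.append(row)
        yield row

first = next(widen(tracked(longdat)))
print(first)
# Only the three cat rows plus the first dog row (needed to detect the
# end of the cat group) have been read at this point.
print(len(consumed))
```

If the input is not already ordered by id, it has to be sorted first, and at that point every row is read before the first group can be produced.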