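This question extends an earlier one, "rebuild python array based on common elements", but differs enough to warrant a new question:
I've been struggling with this for a while. My data is an array of dictionaries from an SQL query. Each element in the array represents a shipment, and several elements share the same values for certain keys.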
data = [
{"CustName":"customer1", "PartNum":"part1", "delKey":"0001", "qty":"10", "memo":"blah1"},
{"CustName":"customer1", "PartNum":"part1", "delKey":"0002", "qty":"10", "memo":"blah2"},
{"CustName":"customer1", "PartNum":"part1", "delKey":"0003", "qty":"10", "memo":"blah3"},
{"CustName":"customer2", "PartNum":"part3", "delKey":"0004", "qty":"20", "memo":"blah4"},
{"CustName":"customer2", "PartNum":"part3", "delKey":"0005", "qty":"20", "memo":"blah5"},
{"CustName":"customer3", "PartNum":"partXYZ", "delKey":"0006", "qty":"50", "memo":"blah6"},
{"CustName":"customer3", "PartNum":"partABC", "delKey":"0007", "qty":"100", "memo":"blah7"}]
The output I want is grouped by specific keys:
dataOut = [
{"CustName":"customer1", "Parts":[
{"PartNum":"part1", "deliveries":[
{"delKey":"0001", "qty":"10", "memo":"blah1"},
{"delKey":"0002", "qty":"10", "memo":"blah2"},
{"delKey":"0003", "qty":"10", "memo":"blah3"}]}]},
{"CustName":"customer2", "Parts":[
{"PartNum":"part3", "deliveries":[
{"delKey":"0004", "qty":"20", "memo":"blah4"},
{"delKey":"0005", "qty":"20", "memo":"blah5"}]}]},
{"CustName":"customer3", "Parts":[
{"PartNum":"partXYZ", "deliveries":[
{"delKey":"0006", "qty":"50", "memo":"blah6"}]},
{"PartNum":"partABC", "deliveries":[
{"delKey":"0007", "qty":"100", "memo":"blah7"}]}]}]
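I can get a single level of grouping using the defaultdict and list comprehension from the previous question, slightly modified:
from collections import defaultdict

d = defaultdict(list)
for item in data:
    d[item['CustName']].append(item)
print([{'CustName': key, 'parts': value} for key, value in d.items()])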
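But I can't seem to get the second level of the output array, the grouping by the PartNum key. From some research, I think what I need is a defaultdict as the type of the outer `defaultdict`, like so: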
d = defaultdict(defaultdict(list))
This raises an error because defaultdict expects a function (a callable) as its argument, so I believe I need to use a lambda (right?)
d = defaultdict(lambda: defaultdict(list))
for item in data:
    d[item['CustName']].append(item)  # <---- this?
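My question is how to "access" the second-level array inside the loop and tell the "inner" defaultdict what to group on (PartNum). The data comes to me from the database programmer, and the project keeps evolving to add more and more data (keys), so I'd like the solution to be as generic as possible in case more data gets thrown my way. I was hoping to be able to "chain" the defaultdicts depending on how many levels I need. I'm learning as I go, so I'm struggling to understand lambda, the basics of the defaultdict type, and where to go from here.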
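For reference, here is a minimal sketch of how the nested defaultdict from the snippet above could be filled: index the outer dict by CustName, then the inner dict by PartNum, and append to the resulting list. The shortened sample data and the names `delivery` and `nested` are illustrative assumptions, not part of the original code.

```python
from collections import defaultdict

# Shortened sample in the same shape as the question's data (illustrative only).
data = [
    {"CustName": "customer1", "PartNum": "part1", "delKey": "0001", "qty": "10", "memo": "blah1"},
    {"CustName": "customer1", "PartNum": "part1", "delKey": "0002", "qty": "10", "memo": "blah2"},
    {"CustName": "customer3", "PartNum": "partXYZ", "delKey": "0006", "qty": "50", "memo": "blah6"},
    {"CustName": "customer3", "PartNum": "partABC", "delKey": "0007", "qty": "100", "memo": "blah7"},
]

# defaultdict needs a callable factory; the lambda builds a fresh
# inner defaultdict(list) the first time each customer is seen.
d = defaultdict(lambda: defaultdict(list))

for item in data:
    # Keep only the fields that are not used as grouping keys.
    delivery = {k: v for k, v in item.items() if k not in ("CustName", "PartNum")}
    # First index selects the customer's inner dict, second index the part's list.
    d[item["CustName"]][item["PartNum"]].append(delivery)

# Convert the nested dicts into the list-of-dicts layout the question asks for.
nested = [
    {
        "CustName": cust,
        "Parts": [
            {"PartNum": part, "deliveries": deliveries}
            for part, deliveries in parts.items()
        ],
    }
    for cust, parts in d.items()
]
```

Because dicts keep insertion order (Python 3.7+), groups appear in the order their keys are first seen, so the input does not have to be sorted for this approach.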
Solution:
Using groupby as suggested by @Pynchia, and sorting the unordered data first as suggested by @hege_hegedus:
from itertools import groupby

dataOut = []
dataSorted = sorted(data, key=lambda x: (x["CustName"], x["PartNum"]))
for cust_name, cust_group in groupby(dataSorted, lambda x: x["CustName"]):
    dataOut.append({
        "CustName": cust_name,
        "Parts": [],
    })
    for part_num, part_group in groupby(cust_group, lambda x: x["PartNum"]):
        dataOut[-1]["Parts"].append({
            "PartNum": part_num,
            "deliveries": [{
                "delKey": delivery["delKey"],
                "memo": delivery["memo"],
                "qty": delivery["qty"],
            } for delivery in part_group]
        })
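If you look at the second for loop, it should hopefully answer your question about accessing the second-level array inside the loop: the inner groupby runs over the current customer's group only.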
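Since the question also asks for something generic that can be "chained" to more levels, the same groupby idea could be wrapped in a small recursive helper. This is only a sketch under assumptions: the function name `nest`, the `(group_key, children_label)` pairs, and the shortened sample data are all hypothetical and not taken from the original answer.

```python
from itertools import groupby
from operator import itemgetter


def nest(rows, levels):
    """Group rows by each (group_key, children_label) pair, outermost first."""
    if not levels:
        return list(rows)
    key, label = levels[0]
    # groupby only merges adjacent items, so sort by the current key first.
    rows = sorted(rows, key=itemgetter(key))
    out = []
    for value, group in groupby(rows, key=itemgetter(key)):
        # The grouping key moves up to the parent, so drop it from the children.
        children = [{k: v for k, v in row.items() if k != key} for row in group]
        out.append({key: value, label: nest(children, levels[1:])})
    return out


# Shortened sample in the same shape as the question's data (illustrative only).
data = [
    {"CustName": "customer1", "PartNum": "part1", "delKey": "0001", "qty": "10", "memo": "blah1"},
    {"CustName": "customer1", "PartNum": "part1", "delKey": "0002", "qty": "10", "memo": "blah2"},
    {"CustName": "customer3", "PartNum": "partXYZ", "delKey": "0006", "qty": "50", "memo": "blah6"},
    {"CustName": "customer3", "PartNum": "partABC", "delKey": "0007", "qty": "100", "memo": "blah7"},
]

result = nest(data, [("CustName", "Parts"), ("PartNum", "deliveries")])
```

Adding another level would then mean adding one more pair to the list. Note that, like the loop above, this sorts at every level, so groups come out in key order rather than in the original row order.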