This code:
from itertools import groupby, count
L = [38, 98, 110, 111, 112, 120, 121, 898]
groups = groupby(L, key=lambda item, c=count():item-next(c))
tmp = [list(g) for k, g in groups]
takes [38, 98, 110, 111, 112, 120, 121, 898], groups it into runs of consecutive numbers, and merges each run to give this final output:
['38', '98', '110,112', '120,121', '898']
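To see why that key works, it can help to print it: subtracting a running index from each item gives a value that stays constant within a run of consecutive numbers and changes whenever the run breaks. The sketch below shows those key values and then one possible way to produce the merged strings. The `format_run` helper is hypothetical (the snippet above stops at `tmp`), and it assumes each run is merged as "first,last".

```python
from itertools import count, groupby

L = [38, 98, 110, 111, 112, 120, 121, 898]

# Key for each item: its value minus its position in the list.
c = count()
keys = [item - next(c) for item in L]
print(keys)  # [38, 97, 108, 108, 108, 115, 115, 891]

# Group on that key, using a fresh counter.
c = count()
runs = [list(g) for _, g in groupby(L, key=lambda item: item - next(c))]
print(runs)  # [[38], [98], [110, 111, 112], [120, 121], [898]]

def format_run(run):
    # Hypothetical helper: a single item stays as is,
    # a longer run becomes "first,last".
    return str(run[0]) if len(run) == 1 else f"{run[0]},{run[-1]}"

merged = [format_run(run) for run in runs]
print(merged)  # ['38', '98', '110,112', '120,121', '898']
```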
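How can the same thing be done with a list of lists that has several columns, like the one below, where the rows are grouped by name, runs of consecutive values in the second column are found, and each run is then merged?
In other words, this data: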
L= [
['Italy','1','3'],
['Italy','2','1'],
['Spain','4','2'],
['Spain','5','8'],
['Italy','3','10'],
['Spain','6','4'],
['France','5','3'],
['Spain','20','2']]
should give the following output:
[['Italy','1-2-3','3-1-10'],
['France','5','3'],
['Spain','4-5-6','2-8-4'],
['Spain','20','2']]
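Would more-itertools be better suited to this task?
Group and merge items of a multi-column list using itertools / more-itertools in Python
Answer:
This is essentially the same grouping technique, but instead of itertools.count it uses enumerate to generate the indices.
First, we sort the data so that all rows for a given country are adjacent and ordered within that country. Then we use groupby to make one group per country. In the inner loop we use groupby again to group each country's consecutive rows together. Finally, we use zip and .join to rearrange the data into the desired output format.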
from itertools import groupby
from operator import itemgetter
lst = [
['Italy','1','3'],
['Italy','2','1'],
['Spain','4','2'],
['Spain','5','8'],
['Italy','3','10'],
['Spain','6','4'],
['France','5','3'],
['Spain','20','2'],
]
newlst = [[country] + ['-'.join(s) for s in zip(*[v[1][1:] for v in g])]
    for country, u in groupby(sorted(lst), itemgetter(0))
        for _, g in groupby(enumerate(u), lambda t: int(t[1][1]) - t[0])]
for row in newlst:
    print(row)
Output
['France', '5', '3']
['Italy', '1-2-3', '3-1-10']
['Spain', '20', '2']
['Spain', '4-5-6', '2-8-4']
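For readers who find the nested comprehension hard to follow, here is a rough sketch of the same steps written as explicit loops. It is meant to be equivalent to the comprehension above; the names `result`, `rows`, `run`, and `columns` are introduced here only for illustration.

```python
from itertools import groupby
from operator import itemgetter

lst = [
    ['Italy', '1', '3'],
    ['Italy', '2', '1'],
    ['Spain', '4', '2'],
    ['Spain', '5', '8'],
    ['Italy', '3', '10'],
    ['Spain', '6', '4'],
    ['France', '5', '3'],
    ['Spain', '20', '2'],
]

result = []
# Step 1: sort, then make one group per country.
for country, rows in groupby(sorted(lst), key=itemgetter(0)):
    # Step 2: within a country, split the rows into consecutive runs.
    # The key is (second column as int) - (position within the country).
    for _, run in groupby(enumerate(rows), key=lambda t: int(t[1][1]) - t[0]):
        # Step 3: drop the index and the country, then transpose so that
        # each remaining column becomes one tuple of strings.
        columns = zip(*[row[1:] for _, row in run])
        # Step 4: join each column with '-' and prepend the country.
        result.append([country] + ['-'.join(col) for col in columns])

for row in result:
    print(row)
```

For Italy, for example, `columns` yields `('1', '2', '3')` and `('3', '1', '10')`, which become `'1-2-3'` and `'3-1-10'`.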
I admit the lambda is a bit cryptic; a proper def function is probably a better fit.
Here is the same thing using a more readable key function.
def keyfunc(t):
    # Unpack the index and data
    i, data = t
    # Get the 2nd column from the data, as an integer
    val = int(data[1])
    # The difference between val & i is constant in a consecutive group
    return val - i
newlst = [[country] + ['-'.join(s) for s in zip(*[v[1][1:] for v in g])]
    for country, u in groupby(sorted(lst), itemgetter(0))
        for _, g in groupby(enumerate(u), keyfunc)]
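One caveat: sorted(lst) compares the number columns as strings, which is why 'Spain', '20' comes before 'Spain', '4' in the output, and why a run such as '9', '10', '11' would be split. A possible variation, sketched below, sorts on the country and then on the second column converted to an integer. The `group_runs` function is a hypothetical wrapper, and it assumes the second column always holds a valid integer string.

```python
from itertools import groupby

def group_runs(rows):
    # Hypothetical wrapper: sort by country, then by the 2nd column as a number.
    ordered = sorted(rows, key=lambda row: (row[0], int(row[1])))
    result = []
    for country, u in groupby(ordered, key=lambda row: row[0]):
        # Same consecutive-run key as before: value minus position.
        for _, g in groupby(enumerate(u), key=lambda t: int(t[1][1]) - t[0]):
            columns = zip(*[row[1:] for _, row in g])
            result.append([country] + ['-'.join(col) for col in columns])
    return result

rows = [['Italy', '9', 'a'], ['Italy', '10', 'b'], ['Italy', '11', 'c']]
print(group_runs(rows))  # [['Italy', '9-10-11', 'a-b-c']]
```

With a plain sorted(rows), the same three rows would be ordered '10', '11', '9' and come out as two groups. On the data from the question, this variant also puts the 'Spain', '4-5-6' row before 'Spain', '20', matching the order within Spain that the question asked for; the countries themselves are still returned in alphabetical order.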