I know that itertools.product can be used to iterate over every combination drawn from several lists of keywords. For example, if I have this:
categories = [
    ['A', 'B', 'C', 'D'],
    ['E', 'F', 'G', 'H'],
    ['I', 'J', 'K', 'L'],
]
and I use itertools.product(), I get something like this:
>>> [x for x in itertools.product(*categories)]
[('A', 'E', 'I'),
 ('A', 'E', 'J'),
 ('A', 'E', 'K'),
 ('A', 'E', 'L'),
 ('A', 'F', 'I'),
 ('A', 'F', 'J'),
 # and so on...
Is there an equivalent, straightforward way to do the same thing with numpy arrays?
Solution:
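This question has been asked several times:
Using numpy to build an array of all combinations of two arrays
The first link has a working numpy solution that claims to be several times faster than itertools, although it provides no benchmark. The code was written by a user named pv. If you find it useful, please follow the link and upvote his answer: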
import numpy as np

def cartesian(arrays, out=None):
    """
    Generate a cartesian product of input arrays.

    Parameters
    ----------
    arrays : list of array-like
        1-D arrays to form the cartesian product of.
    out : ndarray
        Array to place the cartesian product in.

    Returns
    -------
    out : ndarray
        2-D array of shape (M, len(arrays)) containing cartesian products
        formed of input arrays.

    Examples
    --------
    >>> cartesian(([1, 2, 3], [4, 5], [6, 7]))
    array([[1, 4, 6],
           [1, 4, 7],
           [1, 5, 6],
           [1, 5, 7],
           [2, 4, 6],
           [2, 4, 7],
           [2, 5, 6],
           [2, 5, 7],
           [3, 4, 6],
           [3, 4, 7],
           [3, 5, 6],
           [3, 5, 7]])
    """
    arrays = [np.asarray(x) for x in arrays]
    dtype = arrays[0].dtype

    n = np.prod([x.size for x in arrays])
    if out is None:
        out = np.zeros([n, len(arrays)], dtype=dtype)

    # Integer division: m is used as a repeat count and a slice bound.
    m = n // arrays[0].size
    out[:, 0] = np.repeat(arrays[0], m)
    if arrays[1:]:
        # Fill the first block recursively, then copy it for each remaining value.
        cartesian(arrays[1:], out=out[0:m, 1:])
        for j in range(1, arrays[0].size):  # range replaces Python 2's xrange
            out[j*m:(j+1)*m, 1:] = out[0:m, 1:]
    return out
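For comparison, the same rows can also be built without recursion. The following is only a sketch and is not taken from pv's answer: it swaps in np.meshgrid, assumes every input is 1-D, and uses a made-up helper name, cartesian_meshgrid.

```python
import itertools
import numpy as np

def cartesian_meshgrid(arrays):
    """Hypothetical helper: cartesian product built with np.meshgrid."""
    arrays = [np.asarray(a) for a in arrays]
    # indexing='ij' makes the first array the slowest-varying axis,
    # which matches the row order produced by itertools.product.
    grids = np.meshgrid(*arrays, indexing='ij')
    # Stack the grids along a new last axis, then flatten to (M, len(arrays)).
    return np.stack(grids, axis=-1).reshape(-1, len(arrays))

result = cartesian_meshgrid(([1, 2, 3], [4, 5], [6, 7]))
expected = np.array(list(itertools.product([1, 2, 3], [4, 5], [6, 7])))
print(np.array_equal(result, expected))
```

This version holds all the intermediate grids in memory at once, so it trades memory for brevity.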
However, in the same post, Alex Martelli, one of SO's great Python gurus, wrote that itertools is the fastest way to do this. So here is a quick benchmark that bears out Alex's words.
import numpy as np
import time
import itertools

def cartesian(arrays, out=None):
    ...  # body elided in the original; same function as above

def test_numpy(arrays):
    for res in cartesian(arrays):
        pass

def test_itertools(arrays):
    for res in itertools.product(*arrays):
        pass

def main():
    arrays = [np.fromiter(range(100), dtype=int), np.fromiter(range(100, 200), dtype=int)]
    # time.clock() was removed in Python 3.8; time.perf_counter() replaces it.
    start = time.perf_counter()
    for _ in range(100):
        test_numpy(arrays)
    print(time.perf_counter() - start)
    start = time.perf_counter()
    for _ in range(100):
        test_itertools(arrays)
    print(time.perf_counter() - start)

if __name__ == '__main__':
    main()
Output:
0.421036
0.06742
So, at least when you iterate over the results one by one as this benchmark does, you should definitely use itertools.
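If the result is still needed as a numpy array, one straightforward option is to collect the itertools output into a list and hand it to np.array. This is a minimal sketch using the categories list from the question, and it assumes the full product fits in memory:

```python
import itertools
import numpy as np

categories = [
    ['A', 'B', 'C', 'D'],
    ['E', 'F', 'G', 'H'],
    ['I', 'J', 'K', 'L'],
]

# Materialize the tuples first: np.array does not expand a bare
# iterator into rows, so it needs a list (or another sequence).
combos = np.array(list(itertools.product(*categories)))

print(combos.shape)  # (64, 3)
print(combos[:2])
```

Each row of combos is one combination, in the same order that itertools.product yields them.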