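I am using scipy.sparse in my application and want to run some performance tests. To do that, I need to create a large sparse matrix (which I will then use in my application). As long as the matrix is small, I can create it with the following commands: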
import scipy.sparse as sp
a = sp.rand(1000,1000,0.01)
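This produces a 1000 x 1000 matrix with 10,000 non-zero entries (a reasonable density, i.e. about 10 non-zero entries per row).
The problem appears when I try to create a larger matrix, for example a 100,000 x 100,000 matrix (I have worked with much larger matrices before). I run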
import scipy.sparse as sp
N = 100000
d = 0.0001
a = sp.rand(N, N, d)
This should give a 100,000 x 100,000 matrix with one million non-zero entries (well within what should be possible). Instead, I get an error message:
Traceback (most recent call last):
File "<pyshell#6>", line 1, in <module>
sp.rand(100000,100000,0.0000001)
File "C:\Python27\lib\site-packages\scipy\sparse\construct.py", line 723, in rand
j = random_state.randint(mn)
File "mtrand.pyx", line 935, in mtrand.RandomState.randint (numpy\random\mtrand\mtrand.c:10327)
OverflowError: Python int too large to convert to C long
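This looks like some annoying internal error that I cannot get rid of.
I know I could build a 10*n x 10*n matrix by creating one hundred n x n matrices and stacking them together, but I think scipy.sparse should be able to handle large sparse matrices directly. (Again, 100k x 100k is by no means large; scipy is perfectly comfortable with matrices that have several million rows.) Am I missing something?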
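For completeness, the stacking workaround mentioned above might look roughly like the sketch below. The helper name `stacked_random` and its parameters are made up for illustration, and it uses `sp.random` (the newer name for the same kind of generator as `sp.rand`) together with `sp.bmat`. The usage line builds only a small 10,000 x 10,000 example so that it runs quickly.

```python
import scipy.sparse as sp

def stacked_random(block_size, blocks_per_side, density, seed=0):
    """Hypothetical helper: assemble a large random sparse matrix
    from a grid of smaller random blocks."""
    grid = [
        [
            # Each block is small enough that block_size * block_size
            # stays far below the 32-bit integer limit.
            sp.random(block_size, block_size, density=density, format="csr",
                      random_state=seed + i * blocks_per_side + j)
            for j in range(blocks_per_side)
        ]
        for i in range(blocks_per_side)
    ]
    # bmat glues the 2-D grid of blocks into one sparse matrix.
    return sp.bmat(grid, format="csr")

# Small demonstration: 10 x 10 blocks of 1000 x 1000 each.
a = stacked_random(1000, 10, 0.0001)
print(a.shape, a.nnz)
```

This keeps every individual call small, at the cost of one generator call per block.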
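Solution:
Without digging deeper into the problem: you should make sure you are using a 64-bit build on a 64-bit architecture, on a Linux platform. There, the native "long" data type is 64 bits wide (as opposed to Windows, I believe).
For reference, see these tables:
> http://www.unix.org/whitepapers/64bit.html (-> long is 64 bits on LP64)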
> http://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models
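Edit:
Maybe I was not explicit enough before: on 64-bit Windows, the classic native "long" data type is 32 bits wide (see also this question). In your case this may be the problem. That is, your code might just work once you switch the platform to Linux. I cannot say this with absolute certainty, because it really depends on which native data types are used in the numpy/scipy C sources. (Of course there are 64-bit data types available on Windows; usually the platform case analysis is done with compiler directives and the appropriate type is then chosen via macros. I cannot really imagine that they used a 32-bit data type by accident.)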
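A quick way to check this on a given machine is to ask Python how wide the native C long is and compare its maximum with the product of the matrix dimensions. This is only a minimal sketch using the standard `ctypes` module; it assumes that the value handed to `randint` in the traceback is the product of the two dimensions, which the variable name `mn` suggests but which I have not verified in the source.

```python
import ctypes

# Width of the native C "long" on this platform:
# typically 32 bits on Windows (LLP64), 64 bits on 64-bit Linux (LP64).
c_long_bits = 8 * ctypes.sizeof(ctypes.c_long)
c_long_max = 2 ** (c_long_bits - 1) - 1

N = 100000
product = N * N  # Python ints have arbitrary precision, so no overflow here

print("C long is", c_long_bits, "bits wide, max value", c_long_max)
print("N * N =", product, "fits into a C long:", product <= c_long_max)
```

With a 32-bit long the maximum is 2147483647, and 100000 * 100000 = 10000000000 is well above that.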
Edit 2:
I can provide three data samples that support my hypothesis.
Debian 64 bit, Python 2.7.3, and SciPy 0.10.1 binaries from the Debian repository:
Python 2.7.3 (default, Mar 13 2014, 11:03:55)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import scipy; print scipy.__version__; import scipy.sparse as s; s.rand(100000, 100000, 0.0001).shape
0.10.1
(100000, 100000)
Windows 7 64 bit, 32-bit Python build, 32-bit SciPy 0.10.1 build, both from ActivePython:
ActivePython 2.7.5.6 (ActiveState Software Inc.) based on
Python 2.7.5 (default, Sep 16 2013, 23:16:52) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import scipy; print scipy.__version__; import scipy.sparse as s; s.rand(100000, 100000, 0.0001).shape
0.10.1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\user\AppData\Roaming\Python\Python27\site-packages\scipy\sparse\construct.py", line 426, in rand
raise ValueError(msg % np.iinfo(tp).max)
ValueError: Trying to generate a random sparse matrix such as the product of dimensions is
greater than 2147483647 - this is not supported on this machine
Windows 7 64 bit, 64-bit ActivePython build, 64-bit SciPy 0.15.1 build (from Gohlke, built against MKL):
ActivePython 3.4.1.0 (ActiveState Software Inc.) based on
Python 3.4.1 (default, Aug 7 2014, 13:09:27) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import scipy; scipy.__version__; import scipy.sparse as s; s.rand(100000, 100000, 0.0001).shape
'0.15.1'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python34\lib\site-packages\scipy\sparse\construct.py", line 723, in rand
j = random_state.randint(mn)
File "mtrand.pyx", line 935, in mtrand.RandomState.randint (numpy\random\mtrand\mtrand.c:10327)
OverflowError: Python int too large to convert to C long
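If switching platforms is not an option, one possible way around the failing `randint(mn)` call is to never form an index as large as the product of the dimensions: draw row and column indices separately and build the matrix in COO format. The sketch below is not how `scipy.sparse.rand` works internally. `random_sparse_coo` is a hypothetical helper, it assumes a NumPy version that provides `np.random.default_rng`, and duplicate (row, column) pairs are summed on conversion, so the result can hold slightly fewer non-zero entries than requested.

```python
import numpy as np
import scipy.sparse as sp

def random_sparse_coo(n_rows, n_cols, density, seed=None):
    """Hypothetical helper: random sparse matrix built from separately
    sampled row and column indices."""
    rng = np.random.default_rng(seed)
    nnz = int(round(density * n_rows * n_cols))
    # No sampled value ever exceeds n_rows or n_cols, so nothing close
    # to n_rows * n_cols has to fit into a C long.
    rows = rng.integers(0, n_rows, size=nnz, dtype=np.int64)
    cols = rng.integers(0, n_cols, size=nnz, dtype=np.int64)
    data = rng.random(nnz)
    coo = sp.coo_matrix((data, (rows, cols)), shape=(n_rows, n_cols))
    # Converting to CSR sums any duplicate (row, col) entries.
    return coo.tocsr()

a = random_sparse_coo(100000, 100000, 0.0001, seed=0)
print(a.shape, a.nnz)
```

Unlike `sp.rand`, this samples positions with replacement, which is why the number of stored entries is only approximately the requested one.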