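I am working with a 3-D numpy array on which I will eventually perform PCA. I first flatten the 3-D array to 2-D so that I can compute the covariance (and then the eigenvalues and eigenvectors).
When computing the covariance matrix, I get different results from numpy.cov and numpy.dot. If my 2-D array has shape (5,9), I want a 5x5 (i.e. NxN) covariance matrix. That is what I get with numpy.dot. With numpy.cov, I get a 9x9 covariance matrix. That does not match the shape I need, but honestly I don't know which one is correct. In the examples I have studied, I have seen both methods used to compute the covariance.
If I pass the numpy.dot and numpy.cov results through numpy.linalg.eig, I obviously get different answers (all printed in the sample output). So at this point I am quite confused about which approach is correct, or where I might be going wrong.
Here is the test code with its output. Thanks for your help.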
import numpy as np

a = np.random.random((5, 3, 3))  # example of what real input will look like

# create 2D flattened version of 3D input array
d1, d2, d3 = a.shape
b = np.zeros([d1, d2 * d3])
for i in range(len(a)):
    b[i] = a[i].flatten()

print("shape of 3D array: ", a.shape)
print("shape of flattened 2D array: ", b.shape, "\n")
print("flattened 2D array:\n", b, "\n")

# mean-center the flattened array (subtracts the mean of each column)
b -= np.mean(b, axis=0)

# calculate the covariance matrix of the flattened array
covar1 = np.cov(b, rowvar=0)  # this makes a 9x9 array
covar2 = np.dot(b, b.T)       # this makes a 5x5 array
print("covariance via numpy.cov:\n", covar1, "\n")
print("covariance via numpy.dot:\n", covar2, "\n")

# calculate eigenvalues and eigenvectors
eval1, evec1 = np.linalg.eig(covar1)
eval2, evec2 = np.linalg.eig(covar2)
print("eigenvalues via numpy.cov covariance matrix:\n", eval1, "\n")
print("eigenvectors via numpy.cov covariance matrix:\n", evec1, "\n")
print("eigenvalues via numpy.dot covariance matrix:\n", eval2, "\n")
print("eigenvectors via numpy.dot covariance matrix:\n", evec2, "\n")
======= Output =======
shape of 3D array: (5, 3, 3)
shape of flattened 2D array: (5, 9)
flattened 2D array:
[[ 0.94964127 0.71015973 0.80994774 0.49727821 0.38270025 0.89136202
0.19876615 0.72461047 0.43646456]
[ 0.00502329 0.70593521 0.44001479 0.97576486 0.37261107 0.6318449
0.86301405 0.21820704 0.91507706]
[ 0.75411747 0.98462782 0.65109776 0.1083943 0.12867679 0.63172813
0.85803498 0.89507165 0.62291308]
[ 0.88589874 0.02797773 0.6421045 0.17255432 0.5713524 0.28589519
0.55888288 0.7961657 0.4453764 ]
[ 0.85774793 0.19511453 0.92167001 0.27340606 0.41849435 0.98349776
0.19354437 0.2974041 0.52064868]]
covariance via numpy.cov():
[[ 0.15180806 -0.04977355 0.05733885 -0.11340765 0.00840097 0.01461576
-0.08596712 0.07512366 -0.07509614]
[-0.04977355 0.15853367 -0.02337953 0.0357429 -0.05604085 0.02600021
0.06158462 0.0229808 0.03506849]
[ 0.05733885 -0.02337953 0.0335786 -0.03485899 0.00294469 0.03209583
-0.05378417 0.00490397 -0.02751816]
[-0.11340765 0.0357429 -0.03485899 0.12340238 0.0052609 0.0144986
0.02494029 -0.07492008 0.05109007]
[ 0.00840097 -0.05604085 0.00294469 0.0052609 0.02529647 -0.01263607
-0.02327657 -0.01136774 -0.01037048]
[ 0.01461576 0.02600021 0.03209583 0.0144986 -0.01263607 0.07415853
-0.05387152 -0.0345835 -0.00342481]
[-0.08596712 0.06158462 -0.05378417 0.02494029 -0.02327657 -0.05387152
0.11053971 0.00903926 0.04727671]
[ 0.07512366 0.0229808 0.00490397 -0.07492008 -0.01136774 -0.0345835
0.00903926 0.09436665 -0.03526195]
[-0.07509614 0.03506849 -0.02751816 0.05109007 -0.01037048 -0.00342481
0.04727671 -0.03526195 0.03900974]]
covariance via numpy.dot():
[[ 0.3211555 -0.34304471 -0.01453859 -0.1071505 0.14357829]
[-0.34304471 1.24506647 -0.11174019 -0.43907983 -0.35120174]
[-0.01453859 -0.11174019 0.57018674 -0.10412646 -0.3397815 ]
[-0.1071505 -0.43907983 -0.10412646 0.60465919 0.0456976 ]
[ 0.14357829 -0.35120174 -0.3397815 0.0456976 0.50170735]]
eigenvalues via numpy.cov covariance matrix:
[ 3.34339027e-01 +0.00000000e+00j 1.98268985e-01 +0.00000000e+00j
5.71434551e-02 +0.00000000e+00j 1.13399310e-01 +0.00000000e+00j
3.38418299e-18 +1.46714498e-17j 3.38418299e-18 -1.46714498e-17j
1.20944017e-18 +0.00000000e+00j -8.89005842e-18 +0.00000000e+00j
-6.59244508e-18 +0.00000000e+00j]
eigenvectors via numpy.cov covariance matrix:
[[-0.33898927+0.j 0.01567746+0.j -0.32410513+0.j
0.01868249+0.j 0.03901578-0.09858459j 0.03901578+0.09858459j
-0.17596347+0.j 0.08294235+0.j 0.04883282+0.j ]
[ 0.03740184+0.j -0.01106985+0.j 0.11199662+0.j
-0.36257285+0.j 0.66513867+0.j 0.66513867+0.j
0.34810753+0.j -0.05174886+0.j -0.21147240+0.j ]
[ 0.42193056+0.j 0.10153367+0.j -0.52774125+0.j
-0.57292678+0.j -0.02584078-0.15425679j -0.02584078+0.15425679j
-0.02594397+0.j -0.23132722+0.j -0.33824532+0.j ]
[-0.08723679+0.j -0.17700647+0.j -0.04490487+0.j
0.14531440+0.j -0.08669754+0.21485879j -0.08669754-0.21485879j
-0.73208352+0.j 0.04474123+0.j -0.09159437+0.j ]
[-0.26991334+0.j 0.39182156+0.j 0.18023454+0.j
-0.14727224+0.j -0.21261400+0.1100362j -0.21261400-0.1100362j
0.15211635+0.j 0.54168898+0.j -0.36386803+0.j ]
[-0.39361702+0.j 0.48389127+0.j 0.12668909+0.j
0.07739853+0.j 0.31569702-0.34166187j 0.31569702+0.34166187j
0.11287735+0.j -0.74889136+0.j -0.42472067+0.j ]
[-0.29962418+0.j -0.01577641+0.j 0.35742257+0.j
-0.68969822+0.j -0.28182091+0.13998238j -0.28182091-0.13998238j
-0.40124817+0.j 0.06419507+0.j 0.47506061+0.j ]
[-0.57032501+0.j -0.60505095+0.j -0.30688172+0.j
-0.11823642+0.j 0.07618472-0.0915626j 0.07618472+0.0915626j
0.32272841+0.j -0.10872383+0.j -0.25867852+0.j ]
[-0.23498699+0.j 0.45164240+0.j -0.57569388+0.j
0.03856674+0.j -0.07478874+0.27512969j -0.07478874-0.27512969j
-0.10101603+0.j 0.25440413+0.j 0.47403650+0.j ]]
eigenvalues via numpy.dot covariance matrix:
[ 1.33735611e+00 7.93075942e-01 2.08276008e-16 4.53597239e-01
2.28573820e-01]
eigenvectors via numpy.dot covariance matrix:
[[ 0.1223889 -0.87441162 -0.4472136 -0.13172011 0.05545353]
[-0.54658696 0.08157704 -0.4472136 0.61361759 0.34360056]
[ 0.70163289 0.24699239 -0.4472136 0.41717057 -0.26958257]
[-0.41754523 0.17603863 -0.4472136 -0.33135976 -0.69632398]
[ 0.1401104 0.36980356 -0.4472136 -0.56770828 0.56685246]]
Solution:
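np.dot is just the matrix product of two matrices. On its own that is not the covariance: np.cov also subtracts the mean of each variable and divides by the number of observations minus one. Why are you using rowvar=0? That tells np.cov to treat each column as a variable, which is why you get a 9x9 matrix. If you simply call np.cov(b), each row is treated as a variable and you get a matrix of the size you wanted (5x5).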
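To make the difference concrete, here is a minimal sketch (not the asker's exact data) of how rowvar changes the shape of the result and how each np.cov result can be rebuilt from a dot product. The seed and the variable names (cov_rows, cov_cols, gram and so on) are my own choices for illustration. It also checks a relationship that is visible in the output above, where the non-zero eigenvalues of the np.dot matrix are 4 times those of the np.cov matrix (for example 1.337 vs. 0.334), 4 being the number of rows minus one.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, chosen only for reproducibility
a = rng.random((5, 3, 3))               # same shape as the array in the question
b = a.reshape(a.shape[0], -1)           # (5, 9): flatten each 3x3 slice into a row
n_rows, n_cols = b.shape

# np.cov centers the data itself and divides by (observations - 1).
cov_rows = np.cov(b)                    # rows are variables    -> (5, 5)
cov_cols = np.cov(b, rowvar=False)      # columns are variables -> (9, 9)

# The same two matrices built by hand from dot products.
bc_rows = b - b.mean(axis=1, keepdims=True)   # center each row
bc_cols = b - b.mean(axis=0, keepdims=True)   # center each column
manual_rows = bc_rows @ bc_rows.T / (n_cols - 1)
manual_cols = bc_cols.T @ bc_cols / (n_rows - 1)

# The matrix the question computes with np.dot(b, b.T) after column-centering.
gram = bc_cols @ bc_cols.T              # (5, 5)

# Both matrices are symmetric, so eigvalsh returns real eigenvalues
# in ascending order.
ev_cov = np.linalg.eigvalsh(cov_cols)
ev_gram = np.linalg.eigvalsh(gram) / (n_rows - 1)

print(cov_rows.shape, cov_cols.shape)
print(np.allclose(cov_rows, manual_rows), np.allclose(cov_cols, manual_cols))
print(np.allclose(ev_cov[-4:], ev_gram[-4:]))   # compare the 4 largest eigenvalues
```

I used np.linalg.eigvalsh here instead of np.linalg.eig because the matrices are symmetric. That keeps the eigenvalues real, so you avoid the tiny complex parts that show up in the output above. Only the four largest eigenvalues are compared because column-centering five rows leaves at most four non-zero eigenvalues; the rest are zero up to rounding error.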