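To compute a mean and variance in Python you can write the loop yourself or lean on numpy, but which one is actually faster?
I ran a small experiment. First, generate 9 million samples: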
nlist = range(0, 9000000)
nlist = [float(i) / 1000000 for i in nlist]
N = len(nlist)
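The second line scales the samples down to keep the sums small. Plain Python integers never overflow, but a fixed-width integer type such as numpy's 64-bit int would: the sum of squares of the integers below 9 million is roughly 2.4e20, well past the 64-bit limit of about 9.2e18.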
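To make the overflow concern concrete, here is a minimal sketch, assuming the unscaled integers were loaded into a numpy `int64` array (the post itself never does this). The exact value comes from the closed-form sum of squares, computed with Python's arbitrary-precision integers.

```python
import numpy

n = 9000000

# Exact sum of i*i for i in 0..n-1, using the closed form (n-1)*n*(2n-1)/6.
# Python ints have arbitrary precision, so this cannot overflow.
exact = (n - 1) * n * (2 * n - 1) // 6

# The same sum with 64-bit integers: each square fits, but the total does not,
# and numpy silently wraps around instead of raising an error.
a = numpy.arange(n, dtype=numpy.int64)
wrapped = int((a * a).sum())

INT64_MAX = 2 ** 63 - 1
print("exact   =", exact)
print("wrapped =", wrapped)
print("exceeds int64:", exact > INT64_MAX)
```

Dividing by 1000000 up front, as the post does, converts everything to floats and sidesteps the issue.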
The hand-written version loops over the list to compute the mean and variance:
sum1 = 0.0  # running sum of x
sum2 = 0.0  # running sum of x squared
for i in range(N):
    sum1 += nlist[i]
    sum2 += nlist[i] ** 2
mean = sum1 / N
var = sum2 / N - mean ** 2  # population variance: E[x^2] - mean^2
Elapsed time: 5.3 s
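The post does not say how the time was measured. One possible approach, shown here only as a sketch, is to wrap the loop in a function and time it with `time.perf_counter`. The function name `mean_var` and the smaller sample size are assumptions made for illustration; the result is also compared with the standard library's `statistics.pvariance` to confirm that `sum2 / N - mean ** 2` is the population variance.

```python
import statistics
import time


def mean_var(samples):
    """Return (mean, population variance) from running sums in one pass."""
    sum1 = 0.0
    sum2 = 0.0
    for x in samples:
        sum1 += x
        sum2 += x * x
    n = len(samples)
    mean = sum1 / n
    return mean, sum2 / n - mean ** 2


# A smaller, hypothetical sample so the check runs quickly.
samples = [float(i) / 1000000 for i in range(100000)]

start = time.perf_counter()
mean, var = mean_var(samples)
elapsed = time.perf_counter() - start

# Reference values from the standard library (population variance).
ref_mean = statistics.fmean(samples)
ref_var = statistics.pvariance(samples)

print("mean=%.8f var=%.10f elapsed=%.4fs" % (mean, var, elapsed))
```

Timings depend heavily on the machine and Python version, so the absolute numbers will differ from the 5.3 s reported here.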
The numpy version uses vectorized operations instead:
import numpy
narray = numpy.array(nlist)
sum1 = narray.sum()
narray2 = narray * narray
sum2 = narray2.sum()
mean = sum1 / N
var = sum2 / N - mean ** 2
Elapsed time: 1.0 s
Conclusion: just use numpy. Purpose-built optimization really does make a difference.
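As a side note, numpy also ships `mean()` and `var()` methods on arrays, which the experiment above does not use. The sketch below, on a small hypothetical sample, checks that they agree with the hand-rolled sums. `var()` defaults to `ddof=0`, the population variance, which matches the `sum2 / N - mean ** 2` formula.

```python
import numpy

# Small hypothetical sample, scaled the same way as in the experiment.
samples = [float(i) / 1000 for i in range(1000)]
narray = numpy.array(samples)
n = len(narray)

# Hand-rolled vectorized version, as in the post.
mean_manual = narray.sum() / n
var_manual = (narray * narray).sum() / n - mean_manual ** 2

# Built-in equivalents; var() uses ddof=0 (population variance) by default.
mean_builtin = narray.mean()
var_builtin = narray.var()

print(mean_manual, mean_builtin)
print(var_manual, var_builtin)
```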