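I want to plot a chart in my Python application, but the source NumPy array is too large to plot directly (about 1,000,000 elements). I want to average neighboring elements. My first idea was to do it C-style: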
import numpy as np
step = 19000 # every 19 seconds (for example), make a new point with the mean value
dt = <ordered array with time stamps>
value = <some random data that we want to draw>
index = dt - dt % step # start of the bin each time stamp falls into
cur = 0
res = []
while cur < len(index):
    nxt = cur
    # advance to the end of the run of samples that share the current bin
    while nxt < len(index) and index[nxt] == index[cur]:
        nxt += 1
    res.append(np.mean(value[cur:nxt]))
    cur = nxt
But this solution runs very slowly. I then tried this:
import numpy as np
step = 19000 # every 19 seconds (for example), make a new point with the mean value
dt = <ordered array with time stamps>
value = <some random data that we want to draw>
index = dt - dt % step # start of the bin each time stamp falls into
data = np.arange(index[0], index[-1] + 1, step) # every bin start in the covered range
res = [value[index == i].mean() for i in data] # one full-array comparison per bin
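This solution is even slower than the first one. What is the best way to solve this problem?
Solution:
np.histogram can compute sums over arbitrary bins. If you have a time series, for example: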
import numpy as np
data = np.random.rand(1000) # Random numbers between 0 and 1
t = np.cumsum(np.random.rand(1000)) # Increasing random time stamps, from about 0 to about 500
then you can use np.histogram to compute the binned sums over 5-second intervals:
t_bins = np.arange(0., 500., 5.) # Or whatever range you want
sums = np.histogram(t, t_bins, weights=data)[0]
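If you want the means instead of the sums, divide by the bin counts, which you get by calling np.histogram again without the weights: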
means = sums / np.histogram(t, t_bins)[0]
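One caveat with the division: a bin that contains no samples has a count of zero, so its mean comes out as NaN along with a runtime warning. Below is a minimal sketch of one way to guard against that, assuming NaN is an acceptable marker for empty bins. The helper name `binned_mean` and the sample values are illustrative and not part of the original answer.

```python
import numpy as np

def binned_mean(t, data, edges):
    """Mean of `data` within each bin of `t`; NaN where a bin is empty."""
    sums, _ = np.histogram(t, bins=edges, weights=data)
    counts, _ = np.histogram(t, bins=edges)
    means = np.full(sums.shape, np.nan)
    # Divide only where the bin holds at least one sample.
    np.divide(sums, counts, out=means, where=counts > 0)
    return means

# Small hand-checkable example (made-up values).
t = np.array([0.5, 1.0, 4.9, 5.0, 12.0])
data = np.array([1.0, 2.0, 3.0, 10.0, 7.0])
edges = np.array([0.0, 5.0, 10.0, 15.0, 20.0])
means = binned_mean(t, data, edges)
```

Here the first three bins receive three, one, and one samples respectively, and the last bin stays empty, so it is reported as NaN instead of triggering a divide-by-zero warning.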
This approach is similar to the one in this answer.
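Applied to the arrays from the question, the same idea might look like the sketch below. It assumes `dt` holds sorted integer time stamps in the same units as `step`; the synthetic `dt` and `value` arrays are stand-ins, not real data. The bin edges are aligned to multiples of `step` so that each bin matches one value of `index = dt - dt % step`, and empty bins are dropped to mirror what the loop in the question produces.

```python
import numpy as np

rng = np.random.default_rng(0)
step = 19000  # bin width, in the same units as dt

# Synthetic stand-ins for the question's arrays (assumptions).
dt = np.cumsum(rng.integers(1, 2000, size=1_000_000))  # sorted integer time stamps
value = rng.random(dt.size)

# Edges at multiples of step; the final edge lies strictly above dt[-1],
# so every sample falls into a half-open bin [edge, edge + step).
first = dt[0] - dt[0] % step
last = dt[-1] - dt[-1] % step
edges = np.arange(first, last + 2 * step, step)

sums, _ = np.histogram(dt, bins=edges, weights=value)
counts, _ = np.histogram(dt, bins=edges)

# Keep only bins that contain data, as the original loop does.
nonempty = counts > 0
res = sums[nonempty] / counts[nonempty]
bin_start = edges[:-1][nonempty]  # x coordinates for plotting
```

Both histogram calls are vectorized passes over the array, so the cost no longer grows with one Python-level iteration or one full-array comparison per bin. `bin_start` and `res` can then be passed to the plotting call in place of `dt` and `value`.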