快速方法减少python中自相关函数的噪声？

2022-07-12 07:52:28

我可以使用numpy的内置功能来计算自相关：
numpy.correlate(x,x,mode =’same’)

但是,由此产生的相关性自然是嘈杂的.我可以对数据进行分区,并在每个结果窗口上计算相关性,然后将它们平均在一起以计算出更清晰的自相关,类似于signal.welch所做的. numpy或scipy中是否有一个方便的函数可以做到这一点,这可能比我自己计算分区并遍历数据的速度快吗？

更新

这是由@kazemakase答案激发的.我试图用一些用于生成下图的代码来说明我的意思.

可以看到@kazemakase是正确的,因为AC函数自然会平均噪声.但是,平均AC的优点是速度更快！ np.correlate似乎缩放为慢O(n ^ 2)而不是我期望的O(nlogn),如果相关性是通过FFT使用循环卷积来计算的…

from statsmodels.tsa.arima_model import ARIMA
import statsmodels as sm
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(12345)
arparams = np.array([.75, -.25, 0.2, -0.15])
maparams = np.array([.65, .35])
ar = np.r_[1, -arparams] # add zero-lag and negate
ma = np.r_[1, maparams] # add zero-lag
x = sm.tsa.arima_process.arma_generate_sample(ar, ma, 10000)


def calc_rxx(x):
    x = x-x.mean()
    N = len(x)
    Rxx = np.correlate(x,x,mode="same")[N/2::]/N
    #Rxx = np.correlate(x,x,mode="same")[N/2::]/np.arange(N,N/2,-1)
    return  Rxx/x.var()

def avg_rxx(x,nperseg=1024):
    rxx_windows = []
    Nw = int(np.floor(len(x)/nperseg))
    print Nw
    first = True
    for i in range(Nw-1):
        xw = x[i*nperseg:nperseg*(i+1)]
        y = calc_rxx(xw)
        if i%1 == 0:
            if first:
                plt.semilogx(y,"k",alpha=0.2,label="Short AC")
                first = False
            else:
                plt.semilogx(y,"k",alpha=0.2)

        rxx_windows.append(y)

    print np.shape(rxx_windows)
    return np.mean(rxx_windows,axis=0)



plt.figure()
r_avg = avg_rxx(x,nperseg=300)
r = calc_rxx(x)
plt.semilogx(r_avg,label="Average AC")
plt.semilogx(r,label="Long AC")

plt.xlabel("Lag")
plt.ylabel("Auto-correlation")
plt.legend()
plt.xlim([0,150])
plt.show()

解决方法:

TL-DR：为减少自相关函数中的噪声,请增加信号x的长度.

像频谱估计一样对数据进行分区和平均是一个有趣的想法.我希望它能…

自相关定义为

假设我们将数据划分为两个窗口.他们的自相关变为

请注意它们仅在求和范围内有所不同.基本上,我们将自相关的总和分为两部分.当我们将这些加回去时,我们又回到了原始的自相关！所以我们什么也没得到.

结论是,在numpy / scipy中没有实现这样的事情,因为这样做没有任何意义.

备注：

>我希望很容易看到这扩展到任意数量的分区.
>为了简单起见,我省略了归一化.如果将Rxx除以n并将部分Rxx除以n / 2,则得到Rxx / n ==(Rxx1 * 2 / n Rxx2 * 2 / n)/ 2.归一化部分自相关的平均值等于完全归一化自相关.
>为了使其更简单,我假设信号x的索引可以超出0和n-1的范围.实际上,如果信号存储在阵列中,则通常是不可能的.在这种情况下,完全自相关和部分自相关之间存在很小的差异,该差异随滞后量l的增加而增加.不幸的是,这仅仅是精度的损失,并不能减少噪声.

Code heretic! I don’t belive your evil math!

当然,我们可以尝试一下,看看：

import matplotlib.pyplot as plt
import numpy as np

n = 2**16
n_segments = 8

x = np.random.randn(n)  # data

rx = np.correlate(x, x, mode='same') / n  # ACF
l1 = np.arange(-n//2, n//2)  # Lags

segments = x.reshape(n_segments, -1)
m = segments.shape[1]

rs = []
for y in segments:
    ry = np.correlate(y, y, mode='same') / m  # partial ACF
    rs.append(ry)

l2 = np.arange(-m//2, m//2)  # lags of partial ACFs

plt.plot(l1, rx, label='full ACF')
plt.plot(l2, np.mean(rs, axis=0), label='partial ACF')
plt.xlim(-m, m)
plt.legend()
plt.show()

尽管我们使用8个段对ACF进行平均,但噪声水平在视觉上保持不变.

Okay, so that’s why it does not work but what is the solution?

这是个好消息：自相关已经是一种降噪技术！好吧,至少在某种程度上：ACF的一种应用是寻找被噪声隐藏的周期性信号.

由于噪声(理想情况下)的均值为零,因此其影响会减少我们总结的更多元素.换句话说,您可以使用更长的信号来降低自相关中的噪声. (我想这可能不适用于每种类型的噪声,但应该适用于通常的高斯白噪声及其亲属.)

随着更多数据样本的出现,噪声会降低：

import matplotlib.pyplot as plt
import numpy as np

for n in [2**6, 2**8, 2**12]:
    x = np.random.randn(n)

    rx = np.correlate(x, x, mode='same') / n  # ACF
    l1 = np.arange(-n//2, n//2)  # Lags

    plt.plot(l1, rx, label='n={}'.format(n))

plt.legend()    
plt.xlim(-20, 20)
plt.show()

码农公寓

相关文章