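I would like to make a contour plot of a kernel density estimate (KDE) in which each filled region shows the KDE integrated over that region.
For example, suppose I compute the KDE of some 2D data: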
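import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt

# The covariance matrix must be symmetric positive semi-definite,
# otherwise NumPy emits a RuntimeWarning
data = np.random.multivariate_normal((0, 0), [[1, 0.7], [0.7, 1]], 100)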
x = data[:, 0]
y = data[:, 1]
xmin, xmax = min(x), max(x)
ymin, ymax = min(y), max(y)
xx, yy = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
positions = np.vstack([xx.ravel(), yy.ravel()])
values = np.vstack([x, y])
kernel = st.gaussian_kde(values)
f = np.reshape(kernel(positions).T, xx.shape)
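Because f is a density sampled on an evenly spaced grid, the probability of any set of cells is approximately the sum of f over those cells times the area of one cell. The sketch below checks this on a standard bivariate normal, used here as a hypothetical stand-in for kernel(positions) so that it needs only NumPy. With the real KDE the total will fall short of 1 by whatever mass lies outside [xmin, xmax] x [ymin, ymax], which can be noticeable when the window is cut at the data extremes as above.

```python
import numpy as np

# Stand-in for the KDE values: a standard bivariate normal density on a
# 100 x 100 grid built the same way as in the question.
xmin, xmax, ymin, ymax = -4.0, 4.0, -4.0, 4.0
xx, yy = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
f = np.exp(-0.5 * (xx**2 + yy**2)) / (2 * np.pi)

# np.mgrid with 100j includes both end points, so 100 samples span 99 steps.
dx = (xmax - xmin) / 99
dy = (ymax - ymin) / 99
cell_area = dx * dy

# Probability inside the plotted window, approximated by a Riemann sum.
window_mass = f.sum() * cell_area
print(f"mass inside the window: {window_mass:.4f}")
```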
Plotting contours of the KDE itself is straightforward:
fig = plt.figure()
ax = fig.gca()
ax.set_xlim(xmin, xmax)
ax.set_ylim(ymin, ymax)
cfset = ax.contourf(xx, yy, f, cmap='Blues')
cset = ax.contour(xx, yy, f, colors='k')
plt.show()
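However, this contour plot shows the probability density within each filled region. Instead, I would like the plot to show the total probability within each filled region.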
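The quantity asked for can also be computed directly, without sorting: assign every grid cell to the band between two consecutive levels and add up density times cell area per band. A minimal NumPy sketch follows; band_probabilities is a hypothetical helper, and the Gaussian grid again stands in for the KDE.

```python
import numpy as np

def band_probabilities(f, levels, cell_area):
    """Probability mass of each band between consecutive contour levels.

    Band i holds the cells with levels[i] <= f < levels[i + 1]; its mass is
    approximated by the summed density times the (constant) cell area.
    """
    values = np.ravel(f)
    # np.digitize gives 0 for f < levels[0], i for levels[i-1] <= f < levels[i],
    # and len(levels) for f >= levels[-1].
    band = np.digitize(values, levels)
    masses = np.bincount(band, weights=values, minlength=len(levels) + 1)
    # Keep only the len(levels) - 1 bands that lie between two levels.
    return masses[1:len(levels)] * cell_area


# Usage on a hypothetical standard bivariate normal grid.
xx, yy = np.mgrid[-4:4:100j, -4:4:100j]
f = np.exp(-0.5 * (xx**2 + yy**2)) / (2 * np.pi)
cell_area = (8 / 99) ** 2
levels = np.linspace(0, f.max() * 1.0001, 6)  # top level just above the peak
probs = band_probabilities(f, levels, cell_area)
print(np.round(probs, 3))
```

Cells lying exactly on a level are assigned slightly differently from matplotlib's contourf, which closes its filled intervals at the top, but for a continuous density that rarely matters.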
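Solution:
Note that the following is only correct if your contours are "monotonic", i.e. inside a contour line you only find pixel values above the corresponding contour level. Also note that if your density is multimodal, the corresponding regions in separate peaks are lumped together.
If that is true (or acceptable), the problem can be solved by sorting the pixels by value.
I don't know which heuristic your plotting program uses to choose its contour levels, but assuming you have them stored (in ascending order, say) in a variable called "levels", you could try something like this (with matplotlib, the levels chosen by contourf are available as cfset.levels):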
ff = f.ravel()
order = np.argsort(ff)
fsorted = ff[order]
F = np.cumsum(fsorted)
# depending on how your density is normalised next line may be superfluous
# also note that this is only correct for equal bins
# and, finally, to be unimpeachably rigorous, this disregards the probability
# mass outside the field of view, so it calculates probability conditional
# on being in the field of view
F /= F[-1]
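# searchsorted returns len(fsorted) for a level above the largest pixel value
# (contourf's automatically chosen top level can be), so clip to the last index
boundaries = np.minimum(fsorted.searchsorted(levels), len(F) - 1)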
new_levels = F[boundaries]
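The same sort-and-cumulate step can be packaged as a small function. This is a variant rather than a copy of the code above: it prepends a zero to the cumulative sum and reads it at searchsorted's insertion point, so each returned value is exactly the fraction of gridded mass in cells with f < level. Levels at or below the minimum then map to 0, levels above the maximum map to 1 without any out-of-range index, and consecutive differences are the band probabilities. mass_below_levels is a hypothetical name, and equal-area cells are assumed.

```python
import numpy as np

def mass_below_levels(f, levels):
    """Fraction of the gridded mass lying in cells with f < level.

    Assumes equal-area cells (the cell area cancels in the normalisation)
    and, like the answer, ignores any mass outside the plotted window.
    """
    fsorted = np.sort(np.ravel(f))
    # cum[k] is the mass of the k smallest cells; cum[0] = 0.
    cum = np.concatenate(([0.0], np.cumsum(fsorted)))
    cum /= cum[-1]
    # With side='left', searchsorted returns the number of cells with
    # f < level, which is always a valid index into cum (0 .. f.size).
    k = np.searchsorted(fsorted, levels, side="left")
    return cum[k]


# Usage on a hypothetical standard bivariate normal grid.
xx, yy = np.mgrid[-4:4:100j, -4:4:100j]
f = np.exp(-0.5 * (xx**2 + yy**2)) / (2 * np.pi)
levels = np.linspace(0, f.max() * 1.0001, 6)
new_levels = mass_below_levels(f, levels)
band_mass = np.diff(new_levels)  # probability of each filled band
print(np.round(band_mass, 3))
```

With matplotlib, the levels argument would typically be cfset.levels from the contourf call in the question.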
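Now, to be able to use this, your plotting program must let you choose the contour labels, or at least the levels at which contours are drawn. In the latter case, assuming there is a kwarg "levels":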
# make a copy to avoid problems with in-place shuffling
# i.e. overwriting positions whose original values are still to be read out
F[order] = F.copy()
F.shape = f.shape
cset = ax.contour(xx, yy, F, levels=new_levels, colors='k')
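Scattering the cumulative sum back onto the grid is what F[order] = F.copy() does. The self-contained sketch below writes into a fresh array instead, which sidesteps the aliasing concern; cumulative_mass_field is a hypothetical name. Because every cell adds a positive amount to the cumulative sum, F increases strictly along the sort order, so if m is the mass of the cells with f below a given level, the region F > m is exactly the region f >= level. That is why contouring F at the translated levels outlines the same regions as contouring f at the original ones.

```python
import numpy as np

def cumulative_mass_field(f):
    """For each cell, the fraction of gridded mass in cells sorted at or before it.

    Cells are sorted by density; ties are broken by argsort order.
    """
    values = np.ravel(np.asarray(f, dtype=float))
    order = np.argsort(values, kind="stable")
    cum = np.cumsum(values[order])
    cum /= cum[-1]
    F = np.empty_like(cum)
    F[order] = cum  # scatter back into a fresh array, so nothing is overwritten early
    return F.reshape(np.shape(f))


# Usage on a hypothetical standard bivariate normal grid.
xx, yy = np.mgrid[-4:4:100j, -4:4:100j]
f = np.exp(-0.5 * (xx**2 + yy**2)) / (2 * np.pi)
F = cumulative_mass_field(f)
# With matplotlib one would then call, e.g.,
# ax.contour(xx, yy, F, levels=new_levels, colors='k')
```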
I have copied the following from one of your comments to make it more visible:
Finally, if one really wants the probability within each filled area, this is a workaround that works:

cb = fig.colorbar(cfset, ax=ax)
values = cb.values.copy()
values[1:] -= values[:-1].copy()
cb.set_ticklabels(values)

– Laura
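A related way to label each band with its own probability can be written with NumPy alone: since each translated level is a cumulative probability, np.diff of the levels gives the probability held by each filled band. The helper below (a hypothetical name) returns one tick position per band, at the midpoint of its two boundary levels, together with a formatted label. It assumes the colorbar belongs to a contourf of the F field drawn at new_levels, so that the colorbar axis is on the cumulative-probability scale.

```python
import numpy as np

def band_tick_labels(new_levels, fmt="{:.2f}"):
    """Tick positions and labels showing the probability of each filled band.

    Assumes new_levels are increasing cumulative probabilities.
    """
    edges = np.asarray(new_levels, dtype=float)
    band_mass = np.diff(edges)                # probability inside each band
    centres = 0.5 * (edges[:-1] + edges[1:])  # one tick per band
    return centres, [fmt.format(p) for p in band_mass]


centres, labels = band_tick_labels([0.0, 0.2, 0.5, 1.0])
print(labels)
# With matplotlib (not run here):
# cb = fig.colorbar(cfset, ax=ax)
# cb.set_ticks(centres)
# cb.set_ticklabels(labels)
```

Putting the ticks at band midpoints rather than on the boundaries makes it clear that each number belongs to a band, not to a contour line.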