WGAN:Wasserstein GAN

Wasserstein GAN

Paper:https://arxiv.org/pdf/1701.07875.pdf
Code:https://github.com/igul222/improved_wgan_training
References:
https://lilianweng.github.io/lil-log/2017/08/20/from-GAN-to-WGAN.html
https://vincentherrmann.github.io/blog/wasserstein/
(Reading notes)

1.Intro

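  • A target probability density is usually learned by maximum likelihood estimation, and the discrepancy between two distributions is usually measured with a divergence.
  • The distribution produced by the model and the real data distribution are unlikely to overlap. Each one lives on its own support rather than a shared one, so a model density of this form is not a good target. It is then unlikely that…have a non-negligible intersection.
    • Many works therefore add noise to the model distribution so that it covers all examples, but this degrades the generated images.
    • A GAN instead uses a generator to map a low-dimensional latent variable into the high-dimensional data space; the results are currently not ideal either.
  • The main goal is to measure the distance between distributions. we direct our attention on the various ways to measure how close the model distribution and the real distribution are, or equivalently…
  • The paper studies the EM distance. we provide a comprehensive theoretical analysis of how the Earth Mover (EM) distance behaves in comparison to popular probability distances and divergences used in the context of learning distributions.
  • The paper defines WGAN. we define a form of GAN called Wasserstein-GAN that minimizes a reasonable and efficient approximation of the EM distance, and we theoretically show that the corresponding optimization problem is sound.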

2.Distances

  • Common distances (divergences) include $\mathbf{TV}$, $\mathbf{KL}$ and $\mathbf{JS}$ (see the $f$-GAN paper). The Earth-Mover (EM) distance is defined as:
    $$\begin{aligned} W(\mathbb{P}_{r},\mathbb{P}_{g})&=\inf_{\gamma \in \Pi(\mathbb{P}_{r},\mathbb{P}_{g})} \mathbb{E}_{(x,y) \sim \gamma} \left[\|x-y\| \right]\\ &=\inf_{\gamma \in \Pi(\mathbb{P}_{r},\mathbb{P}_{g})} \int \int \gamma(x,y)\,\|x-y\| \,\mathrm{d}x\,\mathrm{d}y \end{aligned} \tag{1}$$
    Here $\Pi(\mathbb{P}_{r},\mathbb{P}_{g})$ is the set of all joint distributions whose marginals are $\mathbb{P}_{r}$ and $\mathbb{P}_{g}$, and $\gamma$ is one such joint distribution. For a given $\gamma$, sample pairs $(x,y)$ from it, measure each pair's distance with the norm, and take the mean. The infimum of this expectation over all $\gamma \in \Pi$ is the Earth-Mover (EM) distance.
    In practice this works like moving earth: $\gamma(x,y)$ says how much mass is moved from $x$ to $y$, and the goal is a plan in which every sampled pair $(x,y)$ is as close as possible.
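As a small illustration of definition (1), the sketch below computes the EM distance between two equal-size 1-D samples with uniform weights. It relies on the standard fact that in one dimension the cheapest transport plan pairs the sorted samples in order; the function name `w1_empirical_1d` and the test data are made up for this example and do not come from the paper.

```python
import numpy as np

def w1_empirical_1d(xs, ys):
    """EM (Wasserstein-1) distance between two equal-size 1-D samples.

    Assumes every sample carries the same weight 1/n, in which case the
    optimal coupling gamma simply matches the i-th smallest x to the
    i-th smallest y.
    """
    xs = np.sort(np.asarray(xs, dtype=float))
    ys = np.sort(np.asarray(ys, dtype=float))
    if xs.shape != ys.shape:
        raise ValueError("this sketch assumes equally many samples")
    # Mean transport cost |x - y| under the sorted matching.
    return float(np.mean(np.abs(xs - ys)))

rng = np.random.default_rng(0)
a = rng.uniform(0.0, 1.0, size=1000)
shifted = a + 0.5  # same shape of distribution, moved 0.5 to the right

print(w1_empirical_1d(a, a))        # identical samples: no earth to move
print(w1_empirical_1d(a, shifted))  # every point travels 0.5
```

Shifting every point by 0.5 costs 0.5 per unit of mass, which is the "amount of earth times distance moved" reading of the definition.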



  • Let $Z \sim U[0,1]$ be uniformly distributed. The real distribution $\mathbb{P}_0$ is the distribution of $(0,Z)\in \mathbb{R}^2$, that is, points spread uniformly over the segment of the $y$-axis from $0$ to $1$. The goal is to fit $\mathbb{P}_0$ with $\mathbb{P}_\theta$, the distribution of $g_\theta(Z) = (\theta,Z)$. Writing $P$ for $\mathbb{P}_0$ and $Q$ for $\mathbb{P}_\theta$:
    $$\begin{aligned} &\forall (x, y) \in P,\ x = 0 \text{ and } y \sim U(0, 1) \\ &\forall (x, y) \in Q,\ x = \theta,\ 0 \leq \theta \leq 1 \text{ and } y \sim U(0, 1) \end{aligned} \tag{2}$$

    This gives the distances below. Each of them is minimal only at $\theta=0$, but except for $W$ they are all constant for $\theta \neq 0$ and jump at $\theta = 0$, so they provide no gradient that would lead to the minimum:
    $$\begin{aligned} W(\mathbb{P}_{0},\mathbb{P}_{\theta})&=|\theta|\\ \mathbf{JS}(\mathbb{P}_{0},\mathbb{P}_{\theta})&= \begin{cases} \log 2& \text{if } \theta \neq 0\\ 0& \text{if } \theta = 0 \end{cases} \\ \mathbf{KL}(\mathbb{P}_{0} \| \mathbb{P}_{\theta})&=\mathbf{KL}(\mathbb{P}_{\theta} \| \mathbb{P}_{0})= \begin{cases} \infty& \text{if } \theta \neq 0\\ 0& \text{if } \theta = 0 \end{cases} \\ \mathbf{TV}(\mathbb{P}_{0},\mathbb{P}_{\theta})&= \begin{cases} 1 & \text{if } \theta \neq 0\\ 0& \text{if } \theta = 0 \end{cases} \end{aligned} \tag{3}$$
    where, for $\theta \neq 0$:
    $$\begin{aligned} D_{KL}(P \| Q) &= \sum_{x=0,\, y \sim U(0, 1)} 1 \cdot \log\frac{1}{0} = +\infty \\ D_{JS}(P \| Q) &= \frac{1}{2}\left(\sum_{x=0,\, y \sim U(0, 1)} 1 \cdot \log\frac{1}{1/2} + \sum_{x=\theta,\, y \sim U(0, 1)} 1 \cdot \log\frac{1}{1/2}\right) = \log 2 \end{aligned}$$
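The values in the parallel-lines example can be checked numerically. The sketch below is a simplification, not the paper's setup: because both distributions share the same $y$ coordinate, only the $x$ position matters, so each distribution is modelled as a point mass on a 1-D grid of $x$ values. The grid and the helper `point_mass` are hypothetical names introduced for the illustration.

```python
import numpy as np

def tv(p, q):
    """Total variation distance between discrete distributions."""
    return 0.5 * float(np.sum(np.abs(p - q)))

def kl(p, q):
    """KL(p || q); infinite when p puts mass where q has none."""
    mask = p > 0
    if np.any(q[mask] == 0):
        return float("inf")
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js(p, q):
    """Jensen-Shannon divergence via the mixture m = (p + q) / 2."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def w1(p, q, grid):
    """1-D EM distance: area between the two cumulative distributions."""
    cdf_gap = np.abs(np.cumsum(p) - np.cumsum(q))[:-1]
    return float(np.sum(cdf_gap * np.diff(grid)))

grid = np.linspace(0.0, 1.0, 11)  # candidate x positions 0.0, 0.1, ..., 1.0

def point_mass(index):
    """All probability mass on grid[index]."""
    p = np.zeros(len(grid))
    p[index] = 1.0
    return p

p0 = point_mass(0)  # real distribution: x = 0
for k in (0, 1, 5, 10):
    q = point_mass(k)  # model distribution: x = theta = grid[k]
    print(f"theta={grid[k]:.1f}  W={w1(p0, q, grid):.2f}  "
          f"JS={js(p0, q):.4f}  KL={kl(p0, q)}  TV={tv(p0, q)}")
```

Only the `W` column changes with `theta`; the other three columns take one value at `theta = 0` and another everywhere else, which is the discontinuity described above.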

  • Why Wasserstein is indeed weak? (to be studied further and updated)
    The paper also explains in what sense the Wasserstein distance is weaker than the $\mathbf{JS}$ distance. "Weak" refers to the topology it induces: more sequences of distributions converge under it, which is why the authors still choose it. The proof uses some concepts from functional analysis. Let $\mathcal{X}$ be a compact set in $\mathbb{R}^2$, i.e. $\mathcal{X} \subseteq \mathbb{R}^2$, and let $C_b(\mathcal{X})$ be the space of functions mapping $\mathcal{X}$ to $\mathbb{R}$ (it is a set, and each of its elements is a function):
    $$C_b(\mathcal{X}) = \{ f:\mathcal{X} \rightarrow \mathbb{R},\ f \text{ is continuous and bounded} \} \tag{4}$$
    Given $f \in C_b(\mathcal{X})$, one loose way to read it is as a matrix acting on points. The infinity norm of $f$ is then the largest absolute value among the outputs it produces over $\mathcal{X}$:
    $$\begin{aligned} &\text{assume: } f_{m \times n} \cdot \mathcal{X}_{n \times 1}= \mathbb{R}_{m \times 1} \\ &\therefore f_{m \times n} \cdot \mathcal{X}_{n \times d}= \mathbb{R}_{m \times d} \\ &\therefore \|f\|_{\infty} = \max_{x \in \mathcal{X}}|f(x)| \end{aligned} \tag{5}$$
    Equipping the set $C_b(\mathcal{X})$ with this norm gives a normed vector space $(C_b(\mathcal{X}),\| \cdot \|_{\infty})$, whose natural topology is the one induced by the norm $\|f\|_{\infty}$ through the distance:
    $$\mathbb{E}\times \mathbb{E}\longrightarrow \mathbb{R}, \quad (x,y)\mapsto \| x-y \| \tag{6}$$
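To make (5) and (6) concrete, here is a rough numerical sketch that approximates the infinity norm by taking the maximum over a finite grid on $\mathcal{X} = [0,1]^2$. The grid resolution and the two example functions `f` and `g` are arbitrary choices for illustration, and a grid maximum is in general only a lower bound on the true supremum.

```python
import numpy as np

def sup_norm_on_grid(f, points):
    """Approximate ||f||_inf = max over X of |f(x)| using a finite set of points."""
    return float(np.max(np.abs(f(points))))

# Regular grid over the compact set X = [0, 1]^2.
side = np.linspace(0.0, 1.0, 101)
xx, yy = np.meshgrid(side, side)
points = np.stack([xx.ravel(), yy.ravel()], axis=1)  # shape (10201, 2)

def f(pts):
    """Example element of C_b(X): f(x1, x2) = x1 - x2."""
    return pts[:, 0] - pts[:, 1]

def g(pts):
    """Second example element: g(x1, x2) = x1 * x2."""
    return pts[:, 0] * pts[:, 1]

def induced_distance(f1, f2, pts):
    """Distance between two functions induced by the norm: ||f1 - f2||_inf."""
    return sup_norm_on_grid(lambda p: f1(p) - f2(p), pts)

print(sup_norm_on_grid(f, points))     # attained at the corner (1, 0)
print(induced_distance(f, g, points))  # distance between f and g
```

The second call is the map in (6) with $x$ and $y$ taken to be the functions `f` and `g`: two functions are close when their largest pointwise gap is small.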

3.WGAN

  • Using the Kantorovich-Rubinstein duality, the EM distance can be rewritten as below (but why? to be studied further and updated). Here $K$ refers to the $K$-Lipschitz condition $\lvert f(x_1) - f(x_2) \rvert \leq K \lvert x_1 - x_2 \rvert$, which keeps the function smooth by bounding its slope:
    $$W(\mathbb{P}_{r},\mathbb{P}_{\theta})= \frac{1}{K} \sup_{\| f \|_L \leq K} \mathbb{E}_{x \sim \mathbb{P}_{r}}[f(x)] - \mathbb{E}_{x \sim \mathbb{P}_{\theta}}[f(x)] \tag{7}$$
    So, given a family of $K$-Lipschitz functions $\{ f_w \}_{w \in W}$, the discriminator has to learn a good $f$, and training drives the following loss to convergence, where $p(z)$ is the prior over the latent variable $z$:
    $$L(\mathbb{P}_{r},\mathbb{P}_{\theta})=W(\mathbb{P}_{r},\mathbb{P}_{\theta})= \max_{w \in W} \mathbb{E}_{x \sim p_r}[f_w(x)] - \mathbb{E}_{z \sim p(z)}[f_w(g_\theta(z))] \tag{8}$$
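A short worked check of the dual form, written with the same symbols as (7). It is only a sanity check on the parallel-lines example from section 2, not a proof of the duality. First, the factor $\frac{1}{K}$: if $f$ is $K$-Lipschitz then $h = f/K$ is 1-Lipschitz, and the objective is linear in $f$, so

$$\sup_{\| f \|_L \leq K} \mathbb{E}_{x \sim \mathbb{P}_{r}}[f(x)] - \mathbb{E}_{x \sim \mathbb{P}_{\theta}}[f(x)] = K \cdot \sup_{\| h \|_L \leq 1} \mathbb{E}_{x \sim \mathbb{P}_{r}}[h(x)] - \mathbb{E}_{x \sim \mathbb{P}_{\theta}}[h(x)]$$

Second, take $\mathbb{P}_{r} = \mathbb{P}_0$ (points $(0, Z)$), $\mathbb{P}_{\theta}$ (points $(\theta, Z)$) and the 1-Lipschitz function $f(x, y) = -\operatorname{sign}(\theta)\, x$:

$$\mathbb{E}_{(x,y) \sim \mathbb{P}_{0}}[f(x,y)] - \mathbb{E}_{(x,y) \sim \mathbb{P}_{\theta}}[f(x,y)] = 0 - \left(-\operatorname{sign}(\theta)\,\theta\right) = |\theta|$$

Every 1-Lipschitz $f$ gives a lower bound on $W$, and the transport plan that moves each $(0, y)$ straight to $(\theta, y)$ costs exactly $|\theta|$, so both sides meet at $W(\mathbb{P}_0, \mathbb{P}_\theta) = |\theta|$, in agreement with (3).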


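As laid out in Algorithm 1 of the paper, training still relies on gradient descent, so the paper clamps the weights to a fixed range after every update. This keeps a weight update from changing $f_w$ drastically and keeps $f_w$ Lipschitz. Strictly speaking, clipping enforces $K\text{-Lipschitz}$ for some $K$ that depends on the clipping range, not exactly $1\text{-Lipschitz}$.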
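The critic update with weight clipping can be sketched in a few lines of NumPy. This is a toy under stated assumptions, not the paper's implementation: the critic is a hypothetical linear function $f_w(x) = x \cdot w$ so that the gradient can be written by hand, plain gradient ascent stands in for the RMSProp optimizer the paper uses, the generator update is left out, and the data is the parallel-lines example with a fixed $\theta$. The hyperparameter values follow the defaults listed in Algorithm 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Defaults from Algorithm 1 of the paper.
clip_c = 0.01   # clipping parameter c
n_critic = 5    # critic iterations per generator iteration
batch_m = 64    # batch size m
lr = 5e-5       # learning rate alpha

def critic(w, x):
    """Hypothetical linear critic f_w(x) = x @ w (an assumption of this sketch)."""
    return x @ w

def critic_objective(w, real, fake):
    """Batch estimate of E[f_w(x)] - E[f_w(g_theta(z))], as in (8)."""
    return float(np.mean(critic(w, real)) - np.mean(critic(w, fake)))

def critic_step(w, real, fake):
    """One ascent step on the critic objective, followed by weight clipping."""
    # For a linear critic the gradient w.r.t. w is the gap between batch means.
    grad = real.mean(axis=0) - fake.mean(axis=0)
    w = w + lr * grad                    # plain gradient ascent (paper: RMSProp)
    return np.clip(w, -clip_c, clip_c)   # clamp every weight to [-c, c]

theta = 0.5       # fixed generator parameter for this toy run
w = np.zeros(2)   # critic weights
for _ in range(n_critic):
    # Real samples (0, Z) and generated samples g_theta(z) = (theta, z).
    real = np.stack([np.zeros(batch_m), rng.uniform(0, 1, batch_m)], axis=1)
    fake = np.stack([np.full(batch_m, theta), rng.uniform(0, 1, batch_m)], axis=1)
    w = critic_step(w, real, fake)

print(w, critic_objective(w, real, fake))
```

After the loop the first weight is negative, so the critic scores points with smaller $x$ higher, which is the direction that separates the real line from the generated one. Clipping bounds the slope of this linear critic by a constant that depends on `clip_c`.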
