Explained: A Style-Based Generator Architecture for GANs (StyleGAN)



One of the main challenges in image generation is controlling the output, i.e. changing specific features such as pose, face shape, and hair style in an image of a face.

A Style-Based Generator Architecture for GANs (StyleGAN) proposes a novel model that addresses this problem.

StyleGAN generates the artificial image gradually, starting from a very low resolution and continuing to a high resolution (1024×1024). By modifying the input of each level separately, it controls the visual features that are expressed in that level, from coarse features (pose, face shape) to fine details (hair color), without affecting other levels.

Background

A GAN consists of two networks:

  • a generator that synthesizes new samples from scratch
  • a discriminator that takes samples from both the training data and the generator’s output and predicts if they are “real” or “fake”.

The generator:

  • input is a random vector (noise)

The discriminator:

  • also improves over time by comparing generated samples with real samples, making it harder for the generator to deceive it.

The key innovation of ProGAN is progressive training: it starts by training the generator and the discriminator on very low-resolution images (e.g. 4×4) and adds higher-resolution layers as training progresses.
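A minimal PyTorch sketch of that growth schedule, only to illustrate the idea: each training phase appends a block that doubles the spatial resolution, from 4×4 up to 1024×1024. The block contents and channel count are illustrative, and ProGAN's gradual fade-in of new layers is omitted.

```python
import torch
import torch.nn as nn

def upscale_block(channels):
    """One illustrative synthesis block: upsample 2x, then refine with a convolution."""
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="nearest"),
        nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        nn.LeakyReLU(0.2),
    )

channels = 16                                  # illustrative channel count
x = torch.randn(1, channels, 4, 4)             # the lowest-resolution (4x4) stage
for resolution in [8, 16, 32, 64, 128, 256, 512, 1024]:
    x = upscale_block(channels)(x)             # a new level added at each phase
    print(resolution, tuple(x.shape))          # ends at (1, 16, 1024, 1024)
```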

  • ProGAN generates high-quality images but, as in most models, its ability to control specific features of the generated image is very limited.

How StyleGAN works

  • The StyleGAN paper offers an upgraded version of ProGAN’s image generator, with a focus on the generator network.

The lower the layer (and the resolution), the coarser the features it affects.

Feature categories:

  1. Coarse - resolution of up to 8² - affects pose, general hair style, face shape, etc.
  2. Middle - resolution of 16² to 32² - affects finer facial features, hair style, eyes open/closed, etc.
  3. Fine - resolution of 64² to 1024² - affects the color scheme (eye, hair and skin) and micro features.

(Question: how were these dividing lines between levels found experimentally?)

New components in the generator

Mapping Network

The ability to control visual features through the input vector is limited, since it must follow the probability density of the training data; this phenomenon is called feature entanglement.
Solution: by using another neural network, the model can generate a vector that does not have to follow the training-data distribution, which reduces the correlation between features.
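As a rough illustration, here is a minimal PyTorch sketch of such a mapping network, assuming an 8-layer fully connected network (the depth used in the paper) that maps a 512-dimensional latent z to an intermediate latent w of the same size; the activation and the normalization of z are illustrative choices.

```python
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    """Maps a latent code z to an intermediate latent w that need not follow the z prior."""
    def __init__(self, latent_dim=512, num_layers=8):
        super().__init__()
        layers = []
        for _ in range(num_layers):
            layers += [nn.Linear(latent_dim, latent_dim), nn.LeakyReLU(0.2)]
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        # Normalize z to unit average magnitude before the MLP (an illustrative
        # preprocessing choice), then map it to the intermediate vector w.
        z = z / (z.pow(2).mean(dim=1, keepdim=True) + 1e-8).sqrt()
        return self.net(z)

# Usage: sample z from a normal distribution and obtain the intermediate vector w.
z = torch.randn(4, 512)
w = MappingNetwork()(z)   # shape (4, 512)
```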

Style Modules (AdaIN)

The AdaIN (Adaptive Instance Normalization) module transfers the encoded information w, created by the Mapping Network, into the generated image, in the following steps (a code sketch follows the list):

  • First, each channel of the convolution layer's output is normalized, so that the scaling and shifting in the last step have the expected effect.
  • The intermediate vector w is converted by another fully connected layer (labeled A) into a scale and a bias for each channel.
  • The scale and bias then shift each channel of the normalized convolution output, which determines the importance of each channel in that layer.
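A minimal PyTorch sketch of these steps, under the assumption that the per-channel scale and bias come from a single learned linear layer (the "A" block); the "1 + scale" form is just one common way to start the scale near 1.

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Adaptive Instance Normalization driven by the intermediate latent w."""
    def __init__(self, w_dim, num_channels):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_channels)        # per-channel normalization of the conv output
        self.affine = nn.Linear(w_dim, num_channels * 2)   # the "A" block: w -> per-channel (scale, bias)

    def forward(self, x, w):
        # x: feature maps (N, C, H, W); w: intermediate latent (N, w_dim)
        scale, bias = self.affine(w).chunk(2, dim=1)
        scale = scale[:, :, None, None]
        bias = bias[:, :, None, None]
        # Scale and shift each normalized channel.
        return self.norm(x) * (1 + scale) + bias

features = torch.randn(2, 64, 32, 32)
w = torch.randn(2, 512)
styled = AdaIN(w_dim=512, num_channels=64)(features, w)   # same shape as features
```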

Removing traditional input

Traditional models use a random input to create the initial image of the generator (i.e. the input of the 4×4 level).
However, since the image features are controlled by w and AdaIN, this initial input can be omitted and replaced by constant values.
This works, probably because it reduces feature entanglement: the network learns more easily using only w, without depending on an entangled input vector.
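A minimal sketch of that replacement, assuming a learned 512×4×4 constant tensor at the start of the synthesis network (the shape follows the paper's first layer; the module name is hypothetical).

```python
import torch
import torch.nn as nn

class ConstantInput(nn.Module):
    """Learned constant that replaces the traditional random input at the 4x4 level."""
    def __init__(self, channels=512, size=4):
        super().__init__()
        self.const = nn.Parameter(torch.randn(1, channels, size, size))

    def forward(self, batch_size):
        # The same learned tensor is used for every image; all variation comes
        # from w (via AdaIN) and from the per-layer noise.
        return self.const.expand(batch_size, -1, -1, -1)

x0 = ConstantInput()(batch_size=8)   # (8, 512, 4, 4) starting feature maps
```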

Stochastic Variation

Small features such as freckles, the exact placement of hairs, and wrinkles increase the variety of the output.
The common method to insert these small features into GAN images is adding random noise to the input vector.
The noise in StyleGAN is added in a similar way to the AdaIN mechanism: a scaled noise is added to each channel before the AdaIN module, slightly changing the visual expression of the features at the resolution level on which it operates.
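A minimal sketch of this noise injection, assuming one learned scaling factor per channel (the "B" block) applied to a single-channel noise image; names and initialization are illustrative.

```python
import torch
import torch.nn as nn

class NoiseInjection(nn.Module):
    """Adds per-channel scaled noise to the feature maps (applied before AdaIN)."""
    def __init__(self, num_channels):
        super().__init__()
        # One learned scaling factor per channel (the "B" block), starting at zero.
        self.weight = nn.Parameter(torch.zeros(1, num_channels, 1, 1))

    def forward(self, x, noise=None):
        if noise is None:
            # A single noise image per sample, broadcast across all channels.
            noise = torch.randn(x.shape[0], 1, x.shape[2], x.shape[3], device=x.device)
        return x + self.weight * noise

features = torch.randn(2, 64, 32, 32)
out = NoiseInjection(64)(features)   # same shape, with stochastic detail added
```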


Style Mixing

The model randomly selects two input vectors and generates an intermediate vector w for each of them.
It then generates two images, A and B, and combines them by taking the low-level features from A and the rest of the features from B.
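A minimal sketch of this mixing at generation time, assuming the MappingNetwork above and a hypothetical synthesis(w_per_layer) function that accepts one w per generator level; the crossover point between the two sources is chosen at random.

```python
import torch

def style_mix(mapping, synthesis, num_layers, latent_dim=512, batch=1):
    """Use the styles of source A for the coarse (low-resolution) layers and
    the styles of source B for the remaining layers."""
    z_a = torch.randn(batch, latent_dim)
    z_b = torch.randn(batch, latent_dim)
    w_a, w_b = mapping(z_a), mapping(z_b)

    crossover = torch.randint(1, num_layers, (1,)).item()
    # One w per layer: layers before the crossover take w_a, the rest take w_b.
    w_per_layer = [w_a if i < crossover else w_b for i in range(num_layers)]
    return synthesis(w_per_layer)
```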

Truncation trick in W

One of the challenges in generative models is dealing with areas that are poorly represented in the training data.
To avoid generating poor images, StyleGAN truncates the intermediate vector w, forcing it to stay close to the "average" intermediate vector.

This average, w_avg, is obtained by sampling many random inputs, generating their intermediate vectors with the Mapping Network, and averaging them.

When generating new images, instead of using the Mapping Network output directly, w is transformed into w_new = w_avg + ψ(w − w_avg), where ψ controls how far the new vector is allowed to deviate from the average.
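A minimal sketch of this truncation, assuming the MappingNetwork above; w_avg is estimated by averaging the intermediate vectors of many random inputs, and psi < 1 controls how far w may move away from it (0.7 here is just an illustrative value).

```python
import torch

def truncate_w(mapping, w, psi=0.7, num_samples=10_000, latent_dim=512):
    """Pull w toward the average intermediate vector: w_new = w_avg + psi * (w - w_avg)."""
    with torch.no_grad():
        # Estimate w_avg from many random inputs passed through the Mapping Network.
        w_avg = mapping(torch.randn(num_samples, latent_dim)).mean(dim=0, keepdim=True)
    return w_avg + psi * (w - w_avg)
```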
