GAN inversion: obtain the latent code of a real image, then perform subsequent image processing tasks
by manipulating that code in the latent space.
一、GAN models
- DCGAN
- WGAN
- BigGAN
- PGGAN
- StyleGAN
二、datasets
- ImageNet
- CelebA
- Flickr-Faces-HQ (FFHQ)
- LSUN
- DeepFashion, AnimeFaces, and StreetScapes
三、evaluation metrics
For evaluation, there are two important aspects of GAN inversion: how photorealistic the generated image is (image quality) and how faithful it is to the input (inversion accuracy). IS, FID, and LPIPS are widely used to assess the quality of GAN-generated images; recent studies have also used SWD.
IS and FID capture the quality and diversity of generated images, while LPIPS measures perceptual similarity. For inversion accuracy, most methods use a reconstruction distance, e.g. PSNR or SSIM. Some other methods [59] use cosine or Euclidean distance to evaluate different attributes between the input and output, while other approaches [95] use classification accuracy for assessment.
3.1 image quality
(1) The mean opinion score (MOS) and difference mean opinion score (DMOS) have been used for subjective image quality assessment, where human raters are asked to assign perceptual quality scores to images.
(Scores run from 1 (bad) to 5 (good), and the final MOS is the arithmetic mean of the ratings.)
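(2) The inception score (IS) is a widely used metric to measure the quality and diversity of images generated by GAN models.
(3) Fréchet inception distance (FID)
(4) Fréchet segmentation distance (FSD)
(5) Sliced Wasserstein distance (SWD)
(6) Learned perceptual image patch similarity (LPIPS)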
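As a small illustration of item (1) above, the MOS is just the arithmetic mean of the raters' scores on the 1 to 5 scale. The sketch below uses only the standard library; the function name `mean_opinion_score` and the example ratings are made up for illustration and do not come from the survey.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Arithmetic mean of ratings on the 1 (bad) to 5 (good) scale."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return mean(ratings)


# Hypothetical ratings from five human raters for one generated image.
example_ratings = [4, 5, 3, 4, 4]
mos = mean_opinion_score(example_ratings)
print(mos)
```

The range check is an assumption about how invalid ratings should be handled; a real study may instead discard or clip them.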
3.2 inversion accuracy
(1) Reconstructor classification accuracy (RCA) has been proposed to measure model interpretability by predicting the direction in the latent space from which a given image transformation was generated.
(2) Reconstruction distances. To evaluate the reconstruction, the most widely used metrics are peak signal-to-noise ratio (PSNR) and structural similarity (SSIM).
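For reference, PSNR follows from the mean squared error between the input image and its reconstruction: PSNR = 10 · log10(MAX² / MSE), where MAX is the largest possible pixel value. The sketch below is a minimal standard-library version that treats an image as a flat sequence of pixel values; the function name and the default `max_value=255.0` (8-bit images) are assumptions for illustration.

```python
import math


def psnr(original, reconstruction, max_value=255.0):
    """PSNR in dB between two equally sized flat sequences of pixel values."""
    if len(original) != len(reconstruction) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstruction)) / len(original)
    if mse == 0:
        return math.inf  # identical images: the distance is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 4-pixel "images": a small reconstruction error gives a high PSNR.
print(psnr([10, 20, 30, 40], [11, 19, 30, 41]))
```

A higher PSNR means the reconstruction is closer to the input. SSIM is not sketched here because it needs windowed local statistics.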
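四、GAN inversion methods
A good latent space should have a simple structure and be easy to embed into. The latent code in such a space should have the following two properties:
(1) it should reconstruct the input image faithfully and photorealistically;
(2) it should facilitate downstream tasks (e.g. image editing).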
4.1 which space to embed
(1) Z space (the original): The generative model in the GAN architecture learns to map values sampled from a simple distribution, e.g., a normal or uniform distribution, to generated images.
Because the Z space is constrained to follow a normal distribution, its capacity to represent semantic attributes is limited.
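(2) W and W+ space:
Recent work [16] further converts the native z into a mapped style vector w through a nonlinear mapping network implemented as an 8-layer MLP. This forms another intermediate latent space, the W space.
Owing to the mapping network and the affine transformations, the W space of StyleGAN contains more disentangled features than the Z space. Several studies have analyzed the separability and semantics of the two spaces. In [21], Shen et al. show that models using the W space perform better in terms of separability and representation than models based on the Z space. StyleGAN's generator tends to learn semantic information based on the W space, and it outperforms generators that use the Z space.
For semantics, the above work evaluates classification accuracy with respect to the latent separation boundaries of different attributes. Because embedding directly into the W or Z space is not easy, some works [24], [25] use another latent space, W+, in which a different intermediate latent vector w is fed into each layer of the generator via AdaIN [81]. For a 1024×1024 StyleGAN with 18 layers, w ∈ W has 512 dimensions, whereas w ∈ W+ has 18×512 dimensions.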
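The difference in size between W and W+ can be made concrete with plain Python lists. This is only a shape illustration under the numbers quoted above (18 layers, 512 dimensions); the helper name `broadcast_w_to_w_plus` is hypothetical and no real generator is involved.

```python
NUM_LAYERS = 18  # style inputs of a 1024x1024 StyleGAN generator
W_DIM = 512      # dimensionality of a single w vector


def broadcast_w_to_w_plus(w, num_layers=NUM_LAYERS):
    """Repeat one w for every layer, i.e. write a W code in W+ form."""
    if len(w) != W_DIM:
        raise ValueError("w must have 512 dimensions")
    return [list(w) for _ in range(num_layers)]


w = [0.0] * W_DIM
w_plus = broadcast_w_to_w_plus(w)

# In W+ each layer's vector is free to differ from the others.
w_plus[3][0] = 1.0

free_parameters_w = W_DIM                    # 512
free_parameters_w_plus = NUM_LAYERS * W_DIM  # 18 * 512 = 9216
print(len(w_plus), len(w_plus[0]), free_parameters_w_plus)
```

A code in W corresponds to the special case of W+ where all 18 rows are identical, which is why W+ is the easier space to embed into.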
(3) S space:
The S space is proposed to achieve disentanglement in the spatial dimension rather than at the semantic level.
Spatial entanglement arises from the intrinsic complexity of style-based generators and the spatial invariance of AdaIN.
Recent methods [63], [67] use learned affine transformations to turn z ∈ Z or w ∈ W into channelwise style parameters s for each layer of the generator. By directly intervening on the style codes s ∈ S, both methods [63], [67] achieve fine-grained control over local translations.
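The mapping from w to a layer's channelwise style parameters can be sketched as a single affine transformation, s = A·w + b. The tiny dimensions, the weights, and the function name `affine_style` below are all made up for illustration; in a real generator A and b are learned and there is one such transformation per layer.

```python
def affine_style(w, weight, bias):
    """Compute s = weight @ w + bias, one style value per channel."""
    return [
        sum(w_ij * w_j for w_ij, w_j in zip(row, w)) + b
        for row, b in zip(weight, bias)
    ]


# Hypothetical sizes: a 3-dimensional w and one layer with 2 channels.
w = [1.0, 2.0, 3.0]
weight = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 1.0]]
bias = [0.5, -1.0]

s = affine_style(w, weight, bias)

# Editing in S space: change one channel's style value directly,
# leaving w and the other channels untouched.
s_edited = list(s)
s_edited[1] += 2.0
print(s, s_edited)
```

Intervening on a single entry of s is what makes the control channelwise, as opposed to shifting the whole w vector.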
(4) P space: (I did not understand this part...)
4.2 inversion methods
(1) learning-based GAN inversion
The learning-based approach often achieves better performance than direct optimization and does not fall into local optima.
(2) optimization-based GAN inversion
(3) hybrid GAN inversion
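The optimization-based idea can be sketched with a toy example: start from some latent code and run gradient descent on the reconstruction error ||G(z) − x||². The "generator" below is a fixed linear map chosen purely for illustration, so the problem is convex; real GAN generators are nonlinear, which is where local optima come from. All names (`A`, `generate`, `invert`) are hypothetical.

```python
# Toy "generator": a fixed linear map from a 2-d latent code to a 3-pixel image.
A = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]


def generate(z):
    """G(z) = A @ z."""
    return [sum(a * z_j for a, z_j in zip(row, z)) for row in A]


def invert(x, z_init, steps=500, lr=0.1):
    """Gradient descent on ||G(z) - x||^2, starting from z_init."""
    z = list(z_init)
    for _ in range(steps):
        residual = [g - t for g, t in zip(generate(z), x)]
        # Gradient of the squared error: 2 * A^T @ residual.
        grad = [
            2.0 * sum(A[i][j] * residual[i] for i in range(len(A)))
            for j in range(len(z))
        ]
        z = [z_j - lr * g_j for z_j, g_j in zip(z, grad)]
    return z


z_true = [0.5, -1.0]
x = generate(z_true)            # the "real" image to invert
z_hat = invert(x, [0.0, 0.0])   # optimization-based: start from a fixed guess
print(z_hat)
```

In this picture a learning-based method would replace the loop with a single forward pass of a trained encoder, and a hybrid method would pass the encoder's prediction as `z_init` and then refine it with the same loop.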
4.3 characteristics of GAN inversion methods
1. supported resolution
2. semantic awareness
3. layerwise
4. out of distribution
4.4 latent space navigation
五、applications
5.1 image manipulation
5.2 image generation
5.3 image restoration
5.4 image interpolation
5.5 style transfer
5.6 compressive sensing
5.7 semantic diffusion
5.8 category transfer
5.9 adversarial defense
5.10 3D reconstruction
5.11 image understanding
5.12 multimodal learning
5.13 medical imaging
六、challenges and future directions