Paper reading notes: GAN Inversion: A Survey

GAN inversion aims to obtain the latent codes of 'real' images and to perform subsequent image processing tasks by manipulating those latent codes in the latent space.

1. GAN models

  • DCGAN
  • WGAN
  • BigGAN
  • PGGAN
  • StyleGAN

2. datasets

  • ImageNet
  • CelebA
  • Flickr-Faces-HQ (FFHQ)
  • LSUN
  • DeepFashion, AnimeFaces, and StreetScapes

3. evaluation metrics

For evaluation, there are two important aspects of GAN inversion: how photorealistic (image quality) and how faithful (inversion accuracy) the generated image is. IS, FID, and LPIPS are widely used to assess the quality of GAN-generated images; recent studies have also used SWD.

IS and FID are metrics for image quality and diversity, while LPIPS is a metric for perceptual similarity. For inversion accuracy, most methods use a reconstruction distance, e.g. PSNR or SSIM. Some other methods [59] use cosine or Euclidean distance to evaluate different attributes between the input and output, while other approaches [95] use classification accuracy for assessment.
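As a rough illustration of the cosine and Euclidean distances mentioned above, here is a minimal sketch in plain Python. The attribute vectors are made-up placeholders (in practice they would come from some attribute predictor, which is not shown here), and the function names are my own.

```python
import math

def euclidean_distance(a, b):
    """L2 distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical attribute vectors for the input image and its inversion.
input_attrs = [1.0, 0.0, 2.0]
output_attrs = [1.0, 0.0, 1.0]

print(euclidean_distance(input_attrs, output_attrs))
print(cosine_distance(input_attrs, output_attrs))
```

Smaller values on either distance mean the attributes of the reconstruction stay closer to those of the input.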

3.1 image quality

(1) The mean opinion score (MOS) and difference mean opinion score (DMOS) have been used for subjective image quality assessment, where human raters are asked to assign perceptual quality scores to images.

(Scores range from 1 to 5, bad to good, and the final MOS is calculated as the arithmetic mean of the ratings.)
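The MOS computation is just an average; a tiny sketch, with made-up ratings and a hypothetical helper name:

```python
from statistics import mean

def mean_opinion_score(ratings):
    """Arithmetic mean of human ratings, each expected to lie in 1..5."""
    for r in ratings:
        if not 1 <= r <= 5:
            raise ValueError(f"rating {r} is outside the 1-5 scale")
    return mean(ratings)

# Made-up ratings from five raters for one image.
ratings = [4, 5, 3, 4, 4]
mos = mean_opinion_score(ratings)
print(mos)
```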

(2) The inception score (IS) is a widely used metric that measures the quality and diversity of images generated by GAN models.

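(3) Fréchet inception distance (FID)

(4) Fréchet segmentation distance (FSD)

(5) Sliced Wasserstein discrepancy (SWD)

(6) Learned perceptual image patch similarity (LPIPS)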

3.2 inversion accuracy

(1) Reconstructor classification accuracy (RCA) has been proposed to measure model interpretability by predicting the direction in the latent space from which a given image transformation was generated.

(2) Reconstruction distances. To evaluate the reconstruction, the most widely used metrics are peak signal-to-noise ratio (PSNR) and structural similarity (SSIM).
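For reference, PSNR is 10·log10(MAX² / MSE). Below is a minimal plain-Python sketch, assuming 8-bit images flattened into lists of pixel values; the pixel values are made up and the function name is my own. SSIM is more involved and is not sketched here.

```python
import math

def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flat pixel sequences."""
    if not original or len(original) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Made-up 4-pixel "images".
original = [0, 128, 255, 64]
reconstructed = [0, 130, 250, 64]
print(psnr(original, reconstructed))
```

A higher PSNR means the inverted image is closer to the input pixel by pixel; it says nothing about perceptual quality, which is why LPIPS is often reported alongside it.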

 

4. GAN inversion methods

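A good latent space should have a simple structure and be easy to embed into. A latent code in such a space should have the following two properties:

(1) it should reconstruct the input image faithfully and photorealistically;

(2) it should facilitate downstream tasks (e.g. image editing).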

4.1 which space to embed

(1) Z space (the original): the generative model in the GAN architecture learns to map values sampled from a simple distribution, e.g. a normal or uniform distribution, to generated images.

Because the Z space is constrained to follow a normal distribution, its capacity to represent semantic attributes is limited.

(2) W and W+ space:

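Recent work [16] further converts the native z into a mapped style vector w through a nonlinear mapping network implemented as an 8-layer MLP. This forms another intermediate latent space, the W space.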

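Thanks to the mapping network and the affine transformations, the W space of StyleGAN contains more disentangled features than the Z space. Several studies have analyzed the separability and semantics of the two spaces. In [21], Shen et al. show that models using the W space perform better than those based on the Z space in terms of separability and representation. The StyleGAN generator tends to learn semantic information based on the W space, and it performs better than a generator that uses the Z space.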

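For semantics, the above works evaluate classification accuracy with respect to the latent separation boundaries of different attributes. Because embedding directly into the W or Z space is not easy, some works [24], [25] use another latent space, W+, in which a different intermediate latent vector w is fed into each layer of the generator via AdaIN [81]. For a 1024×1024 StyleGAN with 18 layers, w ∈ W has 512 dimensions, while w ∈ W+ has 18×512 dimensions.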
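To make the difference in shape between W and W+ concrete, here is a small sketch using nested Python lists as stand-ins for tensors. The helper `to_w_plus` is hypothetical; it simply repeats one w for every layer, after which each layer's copy can be changed on its own.

```python
NUM_LAYERS = 18   # style inputs of a 1024x1024 StyleGAN generator
LATENT_DIM = 512  # dimensionality of a single w

def to_w_plus(w, num_layers=NUM_LAYERS):
    """Lift a single W code to W+ by giving every layer its own copy."""
    return [list(w) for _ in range(num_layers)]

w = [0.0] * LATENT_DIM   # one code in W: 512 numbers
w_plus = to_w_plus(w)    # one code in W+: 18 x 512 numbers

# In W+ the per-layer vectors are free to differ from each other.
w_plus[3][0] = 1.0

print(len(w), len(w_plus), len(w_plus[0]))
```

The extra degrees of freedom are what make W+ easier to embed real images into, at the cost of codes that may leave the region the generator was trained on.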

(3) S space:

The S space is proposed to achieve disentanglement in the spatial dimension rather than at the semantic level.

Spatial entanglement arises from the intrinsic complexity of style-based generators and the spatial invariance of AdaIN.

Recent methods [63], [67] have used learned affine transformations to turn z ∈ Z or w ∈ W into channel-wise style parameters s for each layer of the generator. By directly intervening on the style code s ∈ S, both methods [63], [67] can achieve fine-grained control over local translations.
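A toy sketch of the idea: each layer has its own affine map that turns w into one style value per channel, and an edit in S touches a single channel of a single layer. The layer sizes, the random weights, and all names below are placeholders of mine, not the actual StyleGAN parameters or the procedure of [63], [67].

```python
import random

def affine(w, weight, bias):
    """s = weight @ w + bias for one layer (one output per channel)."""
    return [sum(wi * xi for wi, xi in zip(row, w)) + b
            for row, b in zip(weight, bias)]

def make_layer(in_dim, channels, rng):
    """Random stand-in for a learned affine transformation."""
    weight = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(channels)]
    bias = [1.0] * channels
    return weight, bias

rng = random.Random(0)
W_DIM = 8                       # toy size; the real w has 512 dimensions
CHANNELS_PER_LAYER = [6, 6, 4]  # toy channel counts for three layers

layers = [make_layer(W_DIM, c, rng) for c in CHANNELS_PER_LAYER]
w = [rng.gauss(0.0, 1.0) for _ in range(W_DIM)]

# The S-space code: one channel-wise style vector per layer.
style_codes = [affine(w, weight, bias) for weight, bias in layers]

# Editing in S: shift one channel of one layer, leave the rest untouched.
edited = [list(s) for s in style_codes]
edited[1][2] += 3.0

print([len(s) for s in style_codes])
```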

(4) P space: I did not follow this part...

 

4.2 inversion methods

(1) learning-based GAN inversion

The learning-based approach often achieves better performance than direct optimization and does not fall into local optima.


 

(2) optimization-based GAN inversion

(3) hybrid GAN inversion
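The optimization-based approach in (2) amounts to searching for a latent code z that minimizes a reconstruction loss such as ||G(z) - x||². The sketch below uses a tiny linear map as a stand-in for the generator G so that the gradient can be written by hand; a real GAN generator is nonlinear and would need automatic differentiation. All names and numbers are assumptions for illustration.

```python
def generate(A, z):
    """Toy 'generator': G(z) = A z."""
    return [sum(a * zi for a, zi in zip(row, z)) for row in A]

def reconstruction_loss(A, z, x):
    """Squared L2 distance between G(z) and the target x."""
    return sum((g - xi) ** 2 for g, xi in zip(generate(A, z), x))

def invert(A, x, z_init, lr=0.1, steps=200):
    """Gradient descent on the reconstruction loss with respect to z."""
    z = list(z_init)
    for _ in range(steps):
        residual = [g - xi for g, xi in zip(generate(A, z), x)]
        # Gradient of the squared error w.r.t. z is 2 * A^T * residual.
        grad = [2.0 * sum(A[i][j] * residual[i] for i in range(len(A)))
                for j in range(len(z))]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy generator weights
z_true = [0.5, -1.0]
x = generate(A, z_true)                   # the "real image" to invert

z_hat = invert(A, x, z_init=[0.0, 0.0])
print(z_hat, reconstruction_loss(A, z_hat, x))
```

In this convex toy case the search always recovers z; with a real generator the loss is non-convex, which is where the local optima come from. A learning-based method would instead train an encoder to predict z in one forward pass, and a hybrid method would pass that prediction as `z_init` before running the optimization.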

4.3 characteristics of GAN inversion methods

1. supported resolution

2. semantic awareness

3. layerwise

4. out of distribution

 

4.4 latent space navigation

5. applications

5.1 image manipulation

 

 

5.2 image generation

5.3 image restoration

5.4 image interpolation

5.5 style transfer

5.6 compressive sensing

5.7 semantic diffusion

5.8 category transfer

5.9 adversarial defense

5.10 3D reconstruction

5.11 image understanding

5.12 multimodal learning

5.13 medical imaging

 

6. challenges and future directions

 

 

 

 

 

 

 

 
