text-to-image

2024-03-11 22:28:55

1.Generative Adversarial Text to Image Synthesis

Introduction: reading notes on "Generative Adversarial Text to Image Synthesis" (Zhihu)

Paper: https://arxiv.org/abs/1605.05396

Code: https://github.com/reedscot/icml2016

2.Learning What and Where to Draw(2016)

GAWWN: Learning What and Where to Draw, paper walkthrough (迷途的CH's blog, CSDN)

Paper: https://arxiv.org/abs/1610.02454

Code: GitHub - reedscot/nips2016: Learning What and Where to Draw

I. Related work

This paper is a follow-up to "Generative Adversarial Text to Image Synthesis" and "Learning Deep Representations of Fine-Grained Visual Descriptions".

For background on GANs: Classic network reproduction series (3): GAN (zlrai5895's blog, CSDN)

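II. Core idea and results

The paper proposes a new model, the Generative Adversarial What-Where Network (GAWWN), which generates images from instructions describing what content to draw at which location.

Conditioned on text descriptions and object locations, it demonstrates high-quality 128×128 image generation on the Caltech-UCSD Birds dataset. The system can also be conditioned on parts (for example, only the beak and the tail).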
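To make the "what and where" conditioning concrete, here is a minimal sketch of one way a text embedding and a bounding box could be combined into a spatial conditioning map: the embedding is copied into every grid cell whose centre falls inside the box, and cells outside stay zero. The function name `bbox_conditioning_map`, the normalized `(x0, y0, x1, y1)` box format, and the toy sizes are assumptions for illustration. They are not taken from the paper's code, and the real model works on learned feature maps rather than nested lists.

```python
def bbox_conditioning_map(text_embedding, bbox, grid_size):
    """Copy the text embedding into every grid cell inside the bounding box.

    text_embedding: list of floats (length D), the "what"
    bbox: (x0, y0, x1, y1) in normalized [0, 1] coordinates, the "where"
    grid_size: side length of the square grid
    Returns a grid_size x grid_size x D nested list, zero outside the box.
    """
    x0, y0, x1, y1 = bbox
    dim = len(text_embedding)
    grid = []
    for row in range(grid_size):
        cells = []
        for col in range(grid_size):
            # Centre of this cell in normalized coordinates.
            cx = (col + 0.5) / grid_size
            cy = (row + 0.5) / grid_size
            inside = x0 <= cx <= x1 and y0 <= cy <= y1
            cells.append(list(text_embedding) if inside else [0.0] * dim)
        grid.append(cells)
    return grid


embedding = [0.2, -0.5, 0.9]  # toy 3-d text embedding (hypothetical values)
cond = bbox_conditioning_map(embedding, (0.25, 0.25, 0.75, 0.75), grid_size=4)
```

In this toy setup only the four central cells of the 4×4 grid carry the embedding, so a generator reading `cond` sees both the description and the region it applies to.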

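Note that the experiments used the Caltech-UCSD Birds labels (the bird classes) when training the text encoder. Most real-world images (the COCO dataset, for example) do not come with a single label describing the scene in each image. So the model is limited in that respect, and the resolution of the generated images is also fairly low (128×128).

III. Dataset

The dataset used in these experiments is the Caltech-UCSD Birds-200-2011 dataset (CUB_200_2011).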

3.StackGAN(2016)

text to image (4): "StackGAN" (zlrai5895's blog, CSDN)

Paper: https://arxiv.org/pdf/1612.03242v1.pdf

Code: GitHub - hanzhanggit/StackGAN-Pytorch

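StackGAN is essentially two conditional GANs stacked together. If a plausible high-resolution image cannot be generated in one pass, it can be generated in two. The first-stage conditional GAN uses the embedding vector extracted from the text description (the text embedding) to roughly sketch the main shape and colors of the object, producing a low-resolution image. The second-stage GAN takes the first-stage low-resolution image together with the text embedding as input and generates a high-resolution image with rich detail.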
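The two-stage data flow can be sketched as follows. This only illustrates the interfaces: both functions are hypothetical stand-ins that do simple arithmetic in place of trained networks, and the toy sizes (4×4 and 16×16) are assumptions chosen to keep the example small. The point is that the second stage receives both the low-resolution result and the text embedding.

```python
import random


def stage1_generator(noise, text_embedding, size=4):
    """Stand-in for stage one: a coarse size x size 'image' from noise + text."""
    base = sum(text_embedding) / len(text_embedding)
    return [[base + noise[(r * size + c) % len(noise)] for c in range(size)]
            for r in range(size)]


def stage2_generator(low_res, text_embedding, scale=4):
    """Stand-in for stage two: upsample the coarse image and add detail.

    It is conditioned on BOTH the low-resolution image and the text embedding.
    """
    detail = max(text_embedding) - min(text_embedding)
    size = len(low_res) * scale
    return [[low_res[r // scale][c // scale] + 0.01 * detail * ((r + c) % 2)
             for c in range(size)]
            for r in range(size)]


random.seed(0)
text_embedding = [0.1, 0.4, -0.3]                     # toy text embedding
noise = [random.gauss(0.0, 1.0) for _ in range(16)]   # z ~ N(0, 1)

coarse = stage1_generator(noise, text_embedding)      # 4 x 4 grid
fine = stage2_generator(coarse, text_embedding)       # 16 x 16 grid
```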

4.StackGAN++

StackGAN++ (Forlogenの解忧杂货铺, CSDN blog)

text to image (5): "StackGAN++" (zlrai5895's blog, CSDN)

Paper: https://arxiv.org/abs/1710.10916

Code: https://github.com/hanzhanggit/StackGAN-v2

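Compared with its predecessor StackGAN, StackGAN v2 makes three improvements:

It adopts a tree-like structure: multiple generators produce images at different scales, and each scale has its own discriminator, so fake images are generated at multiple scales.
In addition to the conditional loss, it introduces an unconditional loss, that is, the loss for fake images generated directly from noise z drawn from a standard normal distribution, without using the condition information.
It introduces color regularization, which constrains the color information of the generated fake images.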
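The last two points can be sketched in a few lines. The first function adds an unconditional real/fake term to a conditional one using binary cross-entropy. The second penalizes differences in the mean and covariance of pixel colors between two images, for example the same sample at two scales. All function names are hypothetical, and the weights `lambda_mean=1.0` and `lambda_cov=5.0` are assumed defaults for illustration; check the paper and the official code for the exact formulation.

```python
import math


def bce(prob, target, eps=1e-12):
    """Binary cross-entropy for a single predicted probability."""
    return -(target * math.log(prob + eps)
             + (1.0 - target) * math.log(1.0 - prob + eps))


def discriminator_loss(p_real_uncond, p_fake_uncond, p_real_cond, p_fake_cond):
    """Unconditional term (is it real?) plus conditional term (does it match the text?)."""
    uncond = bce(p_real_uncond, 1.0) + bce(p_fake_uncond, 0.0)
    cond = bce(p_real_cond, 1.0) + bce(p_fake_cond, 0.0)
    return uncond + cond


def color_stats(pixels):
    """Mean and 3x3 covariance of a list of (r, g, b) pixels."""
    n = len(pixels)
    mean = [sum(p[k] for p in pixels) / n for k in range(3)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in pixels) / n
            for j in range(3)]
           for i in range(3)]
    return mean, cov


def color_consistency(pixels_a, pixels_b, lambda_mean=1.0, lambda_cov=5.0):
    """Penalty on the gap between the color statistics of two images."""
    mean_a, cov_a = color_stats(pixels_a)
    mean_b, cov_b = color_stats(pixels_b)
    mean_term = sum((a - b) ** 2 for a, b in zip(mean_a, mean_b))
    cov_term = sum((cov_a[i][j] - cov_b[i][j]) ** 2
                   for i in range(3) for j in range(3))
    return lambda_mean * mean_term + lambda_cov * cov_term


small = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
large = small * 4                      # same colors at a "higher resolution"
tinted = [(1.0, 0.0, 0.0)] * 2         # a pure red image
same_colors = color_consistency(small, large)     # zero penalty
different_colors = color_consistency(small, tinted)
```

Because the penalty depends only on color statistics, images of different sizes can be compared directly, which is what makes it usable across scales.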

5.AttnGAN(StackGAN++)

text to image (6): "AttnGAN" (zlrai5895's blog, CSDN)

Paper: https://arxiv.org/abs/1711.10485

Code: https://github.com/taoxugit/AttnGAN

6.TAC-GAN

Dataset: Oxford-102

text to image (7): "TAC-GAN" (zlrai5895's blog, CSDN)

Paper: https://arxiv.org/abs/1703.06412v2

Code: GitHub - dashayushman/TAC-GAN: A Tensorflow implementation of the Text Conditioned Auxiliary Classifier Generative Adversarial Network for Generating Images from text descriptions (https://arxiv.org/abs/1703.06412)

7.Image Generation from Scene Graphs(2018)

text to image (8): "Image Generation from Scene Graphs" (zlrai5895's blog, CSDN)

Paper: https://arxiv.org/abs/1804.01622

Code: GitHub - google/sg2im: Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 2018

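Unlike earlier methods, Fei-Fei Li's group proposes using a scene graph as an intermediate representation. That is, the original pipeline

text --> image (a direct pairing of an RNN with a GAN)

becomes

text --> scene graph --> image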
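As a rough sketch of what the intermediate representation might look like, a scene graph can be stored as a list of objects plus (subject, predicate, object) relationships that index into that list. The `SceneGraph` class, its field names, and the example sentence are illustrative assumptions, not the sg2im API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SceneGraph:
    objects: List[str]                      # graph nodes
    relations: List[Tuple[int, str, int]]   # (subject index, predicate, object index)

    def describe(self) -> List[str]:
        """Render each relationship back into a short phrase."""
        return [f"{self.objects[s]} {p} {self.objects[o]}"
                for s, p, o in self.relations]


# Hypothetical graph for the text "a sheep standing on grass, with a tree behind it".
graph = SceneGraph(
    objects=["sheep", "grass", "tree"],
    relations=[(0, "standing on", 1), (2, "behind", 0)],
)
phrases = graph.describe()
```

Making objects and relationships explicit gives the image generator a structured input, instead of leaving it to recover that structure from a sentence encoding.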

8.Photographic Text-to-Image Synthesis with a Hierarchically-nested Adversarial Net

text to image (9): "Photographic Text-to-Image Synthesis with a Hierarchically-nested Adversarial Ne" (zlrai5895's blog, CSDN)

Paper: https://arxiv.org/pdf/1802.09178.pdf

Code: https://github.com/ypxie/HDGan

9.Inferring Semantic Layout for Hierarchical Text-to-image Synthesis

text to image (10): "Inferring Semantic Layout for Hierarchical Text-to-image Synthesis" (zlrai5895's blog, CSDN)

Paper: https://arxiv.org/abs/1801.05091