PROGRESSIVE GROWING OF GANS FOR IMPROVED QUALITY, STABILITY, AND VARIATION

Preface

1. Motivation: generating high-resolution images is hard, because at higher resolutions it becomes much easier to tell generated images apart from training images, which greatly amplifies the gradient problem.
2. Key idea (quoted from the paper): "The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses." In other words, training starts with easy low-resolution images, and new layers that introduce higher-resolution detail are added as training goes on.



I. PGGAN

PGGAN (Progressive Growing of GANs) trains the generator and discriminator progressively: training begins at a low resolution (4×4), and layers that model increasingly fine details are added until the target resolution is reached, which makes high-resolution GAN training markedly more stable.

II. Code walkthrough

1. Network structure

The network structure of the generator G:

def G_paper(
    latents_in,                         # First input: Latent vectors [minibatch, latent_size].
    labels_in,                          # Second input: Labels [minibatch, label_size].
    num_channels        = 1,            # Number of output color channels. Overridden based on dataset.
    resolution          = 32,           # Output resolution. Overridden based on dataset.
    label_size          = 0,            # Dimensionality of the labels, 0 if no labels. Overridden based on dataset.
    fmap_base           = 8192,         # Overall multiplier for the number of feature maps.
    fmap_decay          = 1.0,          # log2 feature map reduction when doubling the resolution.
    fmap_max            = 512,          # Maximum number of feature maps in any layer.
    latent_size         = None,         # Dimensionality of the latent vectors. None = min(fmap_base, fmap_max).
    normalize_latents   = True,         # Normalize latent vectors before feeding them to the network?
    use_wscale          = True,         # Enable equalized learning rate?
    use_pixelnorm       = True,         # Enable pixelwise feature vector normalization?
    pixelnorm_epsilon   = 1e-8,         # Constant epsilon for pixelwise feature vector normalization.
    use_leakyrelu       = True,         # True = leaky ReLU, False = ReLU.
    dtype               = 'float32',    # Data type to use for activations and outputs.
    fused_scale         = True,         # True = use fused upscale2d + conv2d, False = separate upscale2d layers.
    structure           = None,         # 'linear' = human-readable, 'recursive' = efficient, None = select automatically.
    is_template_graph   = False,        # True = template graph constructed by the Network class, False = actual evaluation.
    **kwargs):                          # Ignore unrecognized keyword args.
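fmap_base, fmap_decay and fmap_max together decide how many feature maps each block gets; the code below calls a helper as nf(res-1) for this. A minimal sketch of what that helper computes (the explicit defaults are ours for illustration; in G_paper it is a small nested function that closes over the arguments above):

import numpy as np

def nf(stage, fmap_base=8192, fmap_decay=1.0, fmap_max=512):
    # Feature-map count per stage: with fmap_decay=1.0 it halves each time the
    # resolution doubles, clamped to at most fmap_max.
    return min(int(fmap_base / (2.0 ** (stage * fmap_decay))), fmap_max)

# e.g. nf(1) = 512, nf(5) = 256, nf(8) = 32: higher-resolution blocks get fewer feature maps.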
    

The generator G is built mainly from block(x, res) and torgb(x, res).
At the very start of training, the image resolution is 4×4, and the network structure is:

block (res=2, lod=8): Input Nx512 → Dense → Nx(512×16) → Reshape → Nx512x4x4 → Conv2d → Nx512x4x4
def block(x, res): # res = 2..resolution_log2
    with tf.variable_scope('%dx%d' % (2**res, 2**res)):
        if res == 2: # 4x4
            if normalize_latents: x = pixel_norm(x, epsilon=pixelnorm_epsilon)
            with tf.variable_scope('Dense'):
                x = dense(x, fmaps=nf(res-1)*16, gain=np.sqrt(2)/4, use_wscale=use_wscale) # override gain to match the original Theano implementation
                x = tf.reshape(x, [-1, nf(res-1), 4, 4])
                x = PN(act(apply_bias(x)))
            with tf.variable_scope('Conv'):
                x = PN(act(apply_bias(conv2d(x, fmaps=nf(res-1), kernel=3, use_wscale=use_wscale))))
        ......
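In these blocks act is leaky ReLU (or ReLU, per use_leakyrelu), apply_bias adds a learned per-channel bias, and PN applies pixelwise feature-vector normalization when use_pixelnorm is enabled. A minimal sketch of pixel_norm for NCHW tensors, following the paper's formula b = a / sqrt(mean(a²) + ε) over the channel axis (illustrative, not a verbatim copy of the repository helper):

import tensorflow as tf

def pixel_norm(x, epsilon=1e-8):
    # Normalize each pixel's feature vector to unit length across channels (axis=1 in NCHW).
    return x * tf.rsqrt(tf.reduce_mean(tf.square(x), axis=1, keepdims=True) + epsilon)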

When the image resolution is at least 8×8 (e.g. resolution = 8×8):

block (res=3, lod=7): Nx512x4x4 → upscale2d_conv2d → Nx512x8x8 → conv2d → Nx512x8x8
def block(x, res): # res = 2..resolution_log2
    with tf.variable_scope('%dx%d' % (2**res, 2**res)):
        ......
        else: # 8x8 and up
            if fused_scale:
                with tf.variable_scope('Conv0_up'):
                    x = PN(act(apply_bias(upscale2d_conv2d(x, fmaps=nf(res-1), kernel=3, use_wscale=use_wscale))))
            else:
                x = upscale2d(x)
                with tf.variable_scope('Conv0'):
                    x = PN(act(apply_bias(conv2d(x, fmaps=nf(res-1), kernel=3, use_wscale=use_wscale))))
            with tf.variable_scope('Conv1'):
                x = PN(act(apply_bias(conv2d(x, fmaps=nf(res-1), kernel=3, use_wscale=use_wscale))))
......

The torgb layer is structured as follows (assuming the generated image resolution is 8×8 at this point):

torgb: Nx512x8x8 → Conv2d (1×1) → Output: Nx3x8x8
def torgb(x, res): # res = 2..resolution_log2
    lod = resolution_log2 - res
    with tf.variable_scope('ToRGB_lod%d' % lod):
        return apply_bias(conv2d(x, fmaps=num_channels, kernel=1, gain=1, use_wscale=use_wscale))

The main body of the network is implemented recursively; progressive growing during training is realized by changing lod_in, a placeholder that is fed with the sched.lod value computed by TrainingSchedule in train.py (see the training section below).

graph TD
subgraph ide0 ["cur_nimg = 2100k, lod_in = 6.5"]
A(Input: Nx512) --> B(Dense)
subgraph ide01 ["block: res=2, lod=8"]
B --> D(Reshape)
D --> F(Conv2d)
end
F --> G(Nx512x4x4)
G --> H(upscale2d_conv2d)
subgraph ide02 ["block: res=3, lod=7"]
H --> J(Conv2d)
end
J --> K(Nx512x8x8)
K --> L(upscale2d_conv2d)
subgraph ide03 ["block: res=4, lod=6"]
L --> M(Conv2d)
end
M --> N(Nx512x16x16)
N --> O(Conv2d)
subgraph ide04 ["torgb: res=4, lod=6"]
O --> P("Output: Nx3x16x16")
end
end

subgraph ide1 ["cur_nimg = 1500k, lod_in = 7"]
a(Input: Nx512) --> b(Dense)
subgraph ide11 ["block: res=2, lod=8"]
b --> d(Reshape)
d --> f(Conv2d)
end
f --> g(Nx512x4x4)
g --> h(upscale2d_conv2d)
subgraph ide12 ["block: res=3, lod=7"]
h --> i(Conv2d)
end
i --> j(Nx512x8x8)
end
    if structure == 'recursive':
        def grow(x, res, lod):
            y = block(x, res)
            img = lambda: upscale2d(torgb(y, res), 2**lod)
            if res > 2: img = cset(img, (lod_in > lod), lambda: upscale2d(lerp(torgb(y, res), upscale2d(torgb(x, res - 1)), lod_in - lod), 2**lod))
            if lod > 0: img = cset(img, (lod_in < lod), lambda: grow(y, res + 1, lod - 1))
            return img()
        images_out = grow(combo_in, 2, resolution_log2 - 2)
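The fade-in of a newly added resolution happens in the lerp branch above: while lod_in lies between two integer lod values, the RGB output of the new block is blended with the upscaled RGB output of the previous resolution. A small numeric sketch of that blend (lerp is assumed to be the usual a + (b − a)·t; the arrays are made up purely for illustration):

import numpy as np

def lerp(a, b, t):
    # t = 0 -> entirely the new layer's output, t = 1 -> entirely the old (upscaled) output.
    return a + (b - a) * t

# Hypothetical values matching the diagram above: lod_in = 6.5, lod = 6 at the 16x16 stage.
new_rgb    = np.zeros((1, 3, 16, 16))      # stands in for torgb(y, res=4)
old_rgb_up = np.ones((1, 3, 16, 16))       # stands in for upscale2d(torgb(x, res=3))
img = lerp(new_rgb, old_rgb_up, 6.5 - 6)   # 50/50 blend while the 16x16 layers fade in
print(img[0, 0, 0, 0])                     # 0.5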

2. Training procedure

PGGAN training is driven mainly by train_progressive_gan() in train.py.

# config.py
desc        = 'pgan'      
train       = EasyDict(func='train.train_progressive_gan')  # Options for main training func.
sched       = EasyDict()                                    # Options for train.TrainingSchedule.
desc += '-preset-v2-1gpu'
num_gpus = 1
sched.minibatch_base = 4
sched.minibatch_dict = {4: 128, 8: 128, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8, 512: 4}
sched.G_lrate_dict = {1024: 0.0015}
sched.D_lrate_dict = EasyDict(sched.G_lrate_dict)
train.total_kimg = 12000
desc += '-fp32'
sched.max_minibatch_per_gpu = {256: 16, 512: 8, 1024: 4}

First, load the dataset:

# config.py
desc += '-celebahq';            
dataset = EasyDict(tfrecord_dir='celebahq/XXX'); 
train.mirror_augment = True
training_set = dataset.load_dataset(data_dir=config.data_dir, verbose=True, **config.dataset)

Next, construct the networks:

# config.py
desc        = 'pgan'      
G           = EasyDict(func='networks.G_paper')             # Options for generator network.
D           = EasyDict(func='networks.D_paper')             # Options for discriminator network.
print('Constructing networks...')
with tf.device('/gpu:0'):
     G = tfutil.Network('G', num_channels=training_set.shape[0], resolution=training_set.shape[1], label_size=training_set.label_size, **config.G)
     D = tfutil.Network('D', num_channels=training_set.shape[0], resolution=training_set.shape[1], label_size=training_set.label_size, **config.D)
     Gs = G.clone('Gs')
     Gs_update_op = Gs.setup_as_moving_average_of(G, beta=G_smoothing)
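Gs is a shadow copy of G whose weights track an exponential moving average of G's weights; it is the network used later for the image snapshots. A minimal sketch of the update that Gs_update_op performs (assuming the Network objects expose their variables as a name-to-tf.Variable dict called vars and that beta = G_smoothing is close to 1, e.g. 0.999; illustrative, not the exact tfutil code):

import tensorflow as tf

def moving_average_update(src_vars, dst_vars, beta=0.999):
    # For every matching variable pair: dst <- beta * dst + (1 - beta) * src.
    ops = []
    for name, dst in dst_vars.items():
        src = src_vars[name]
        ops.append(tf.assign(dst, beta * dst + (1.0 - beta) * src))
    return tf.group(*ops)

# Hypothetical usage: Gs_update_op = moving_average_update(G.vars, Gs.vars, beta=G_smoothing)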

Then build the TensorFlow graph.
1. Model inputs (since this is TF1-era code, placeholders and feed dicts are used):

print('Building TensorFlow graph...')
with tf.name_scope('Inputs'):
     lod_in = tf.placeholder(tf.float32, name='lod_in', shape=[]) 
     lrate_in = tf.placeholder(tf.float32, name='lrate_in', shape=[])
     minibatch_in = tf.placeholder(tf.int32, name='minibatch_in', shape=[])
     minibatch_split = minibatch_in // config.num_gpus
     reals, labels   = training_set.get_minibatch_tf()
     reals_split     = tf.split(reals, config.num_gpus)
     labels_split    = tf.split(labels, config.num_gpus)

2. Set up the optimizers:

# config.py
G_opt       = EasyDict(beta1=0.0, beta2=0.99, epsilon=1e-8) # Options for generator optimizer.
D_opt       = EasyDict(beta1=0.0, beta2=0.99, epsilon=1e-8) # Options for discriminator optimizer.
G_opt = tfutil.Optimizer(name='TrainG', learning_rate=lrate_in, **config.G_opt)
D_opt = tfutil.Optimizer(name='TrainD', learning_rate=lrate_in, **config.D_opt)
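The beta1/beta2/epsilon values here match the Adam hyperparameters reported in the paper (β1 = 0, β2 = 0.99, ε = 1e-8). Assuming tfutil.Optimizer wraps a standard TF1 Adam optimizer, the plain-TF1 equivalent of these settings, fed by the lrate_in placeholder defined above, would be:

adam = tf.train.AdamOptimizer(learning_rate=lrate_in, beta1=0.0, beta2=0.99, epsilon=1e-8)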

Then, for each GPU, the networks are placed on that device, the lod variable inside G and D is assigned from the lod_in placeholder, and that GPU's slice of the real images and labels is prepared:

with tf.name_scope('GPU%d' % gpu), tf.device('/gpu:%d' % gpu):
     G_gpu = G 
     D_gpu = D 
     lod_assign_ops = [tf.assign(G_gpu.find_var('lod'), lod_in), 
     tf.assign(D_gpu.find_var('lod'), lod_in)]
     reals_gpu = process_reals(reals_split[gpu], lod_in, mirror_augment, training_set.dynamic_range, drange_net)
     labels_gpu = labels_split[gpu]
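process_reals adjusts the dynamic range of the real images, optionally mirror-augments them and, most importantly for progressive growing, crossfades the reals between two consecutive resolutions according to the fractional part of lod_in, so the discriminator never sees sharper detail than the generator can currently produce. A minimal sketch of just that crossfade step for NCHW images (the helper name is ours; illustrative, not the repository's exact code):

import tensorflow as tf

def fade_real_images(x, lod):
    # Box-downscale by 2x, upscale back by nearest neighbour, then blend with the
    # original by the fractional part of lod (0 = full detail, 1 = fully blurred).
    s = tf.shape(x)
    y = tf.reshape(x, [-1, s[1], s[2] // 2, 2, s[3] // 2, 2])
    y = tf.reduce_mean(y, axis=[3, 5], keepdims=True)
    y = tf.tile(y, [1, 1, 1, 2, 1, 2])
    y = tf.reshape(y, [-1, s[1], s[2], s[3]])
    return x + (y - x) * (lod - tf.floor(lod))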

3. Set up the loss functions of G and D:

# config.py
G_loss      = EasyDict(func='loss.G_wgan_acgan')            # Options for generator loss.
D_loss      = EasyDict(func='loss.D_wgangp_acgan')          # Options for discriminator loss.
with tf.name_scope('G_loss'), tf.control_dependencies(lod_assign_ops):
          G_loss = tfutil.call_func_by_name(G=G_gpu, D=D_gpu, opt=G_opt, training_set=training_set, minibatch_size=minibatch_split, **config.G_loss)
with tf.name_scope('D_loss'), tf.control_dependencies(lod_assign_ops):
          D_loss = tfutil.call_func_by_name(G=G_gpu, D=D_gpu, opt=D_opt, training_set=training_set, minibatch_size=minibatch_split, reals=reals_gpu, labels=labels_gpu, **config.D_loss)
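The losses named in config.py are the WGAN-GP losses used in the paper (gradient-penalty weight λ = 10 plus a small drift term ε_drift = 0.001 on the real scores), optionally combined with an AC-GAN label term. A minimal TF1 sketch of the discriminator/critic part, ignoring labels (illustrative; not the repository's loss.py):

import tensorflow as tf

def d_wgangp_loss(D, reals, fakes, wgan_lambda=10.0, wgan_epsilon=0.001):
    # D is assumed to return one scalar score per image, shape [minibatch].
    real_scores = D(reals)
    fake_scores = D(fakes)
    loss = fake_scores - real_scores                               # Wasserstein term
    # Gradient penalty on random interpolates between real and fake images.
    alpha = tf.random_uniform([tf.shape(reals)[0], 1, 1, 1], 0.0, 1.0)
    mixed = reals + alpha * (fakes - reals)
    grads = tf.gradients(tf.reduce_sum(D(mixed)), [mixed])[0]
    grad_norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]) + 1e-8)
    loss += wgan_lambda * tf.square(grad_norm - 1.0)               # gradient penalty
    loss += wgan_epsilon * tf.square(real_scores)                  # drift term
    return tf.reduce_mean(loss)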

4. Register the gradients for backpropagation; the weights are then updated via tf.train.Optimizer.apply_gradients:

G_opt.register_gradients(tf.reduce_mean(G_loss), G_gpu.trainables)
D_opt.register_gradients(tf.reduce_mean(D_loss), D_gpu.trainables)
G_train_op = G_opt.apply_updates()
D_train_op = D_opt.apply_updates()
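register_gradients() and apply_updates() on the tfutil.Optimizer wrapper correspond roughly to compute_gradients() and apply_gradients() on a plain TF1 optimizer (the wrapper also handles per-GPU bookkeeping internally). Assuming trainables is a name-to-tf.Variable dict, a rough plain-TF1 equivalent for G would be:

# Hypothetical equivalent, using the adam optimizer configured above.
grads_and_vars = adam.compute_gradients(tf.reduce_mean(G_loss), var_list=list(G_gpu.trainables.values()))
G_train_op = adam.apply_gradients(grads_and_vars)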

Now training begins. Here total_kimg is the total number of images (in thousands) to train on, cur_nimg is the number of real images shown so far, cur_tick counts maintenance ticks, tick_start_nimg records cur_nimg at the start of the current tick, resume_kimg is the image count (in kimg) to resume from when restarting a run, and prev_lod is the lod of the previous iteration, used to detect when new layers are introduced.

print('Training...')
cur_nimg = int(resume_kimg * 1000)
cur_tick = 0
tick_start_nimg = cur_nimg
tick_start_time = time.time()
train_start_time = tick_start_time - resume_time
prev_lod = -1.0
while cur_nimg < total_kimg * 1000:
        # Choose training parameters and configure training ops.
        sched = TrainingSchedule(cur_nimg, training_set, **config.sched)
        training_set.configure(sched.minibatch, sched.lod)
        # Compare prev_lod with sched.lod to check whether a new layer has been introduced;
        # if so, reset the optimizers' internal state (e.g. Adam moments) and update prev_lod.
        if reset_opt_for_new_lod:  
            if np.floor(sched.lod) != np.floor(prev_lod) or np.ceil(sched.lod) != np.ceil(prev_lod):
                G_opt.reset_optimizer_state()
                D_opt.reset_optimizer_state()
        prev_lod = sched.lod
        # Run training ops.
        # Update the discriminator D D_repeats times (default 1) for each generator G update, and advance cur_nimg.
        for repeat in range(minibatch_repeats):
            for _ in range(D_repeats):
                tfutil.run([D_train_op, Gs_update_op], {lod_in: sched.lod, lrate_in: sched.D_lrate, minibatch_in: sched.minibatch})
                cur_nimg += sched.minibatch
            tfutil.run([G_train_op], {lod_in: sched.lod, lrate_in: sched.G_lrate, minibatch_in: sched.minibatch})
        # Perform maintenance tasks once per tick.
        done = (cur_nimg >= total_kimg * 1000)
        if cur_nimg >= tick_start_nimg + sched.tick_kimg * 1000 or done:
            cur_tick += 1
            cur_time = time.time()
            tick_kimg = (cur_nimg - tick_start_nimg) / 1000.0
            tick_start_nimg = cur_nimg
            tick_time = cur_time - tick_start_time
            total_time = cur_time - train_start_time
            maintenance_time = tick_start_time - maintenance_start_time
            maintenance_start_time = cur_time
            print('tick %-5d kimg %-8.1f lod %-5.2f minibatch %-4d time %-12s sec/tick %-7.1f sec/kimg %-7.2f maintenance %.1f' % (
                tfutil.autosummary('Progress/tick', cur_tick),
                tfutil.autosummary('Progress/kimg', cur_nimg / 1000.0),
                tfutil.autosummary('Progress/lod', sched.lod),
                tfutil.autosummary('Progress/minibatch', sched.minibatch),
                misc.format_time(tfutil.autosummary('Timing/total_sec', total_time)),
                tfutil.autosummary('Timing/sec_per_tick', tick_time),
                tfutil.autosummary('Timing/sec_per_kimg', tick_time / tick_kimg),
                tfutil.autosummary('Timing/maintenance_sec', maintenance_time)))
            tfutil.autosummary('Timing/total_hours', total_time / (60.0 * 60.0))
            tfutil.autosummary('Timing/total_days', total_time / (24.0 * 60.0 * 60.0))
            tfutil.save_summaries(summary_log, cur_nimg)

            # Save snapshots.
            if cur_tick % image_snapshot_ticks == 0 or done:
                grid_fakes = Gs.run(grid_latents, grid_labels, minibatch_size=sched.minibatch//config.num_gpus)
                misc.save_image_grid(grid_fakes, os.path.join(result_subdir, 'fakes%06d.png' % (cur_nimg // 1000)), drange=drange_net, grid_size=grid_size)
            if cur_tick % network_snapshot_ticks == 0 or done:
                misc.save_pkl((G, D, Gs), os.path.join(result_subdir, 'network-snapshot-%06d.pkl' % (cur_nimg // 1000)))

            # Record start time of the next tick.
            tick_start_time = time.time()

When training completes, save the final model weights and close the log files:

misc.save_pkl((G, D, Gs), os.path.join(result_subdir, 'network-final.pkl'))
summary_log.close()
open(os.path.join(result_subdir, '_training-done.txt'), 'wt').close()



Progressive growing is scheduled by class TrainingSchedule in train.py through its parameters: lod_initial_resolution=4 sets the starting resolution, and lod_training_kimg=600 together with lod_transition_kimg=600 mean that each resolution is trained on 600k real images and the next resolution is then faded in over another 600k images. The learning rates of the generator G and discriminator D start from their base values and are otherwise taken from the per-resolution dicts in config.py.
# train.py
def train_progressive_gan():
    ...
    sched = TrainingSchedule(total_kimg * 1000, training_set, **config.sched)
    sched = TrainingSchedule(cur_nimg, training_set, **config.sched)
    ...
# config.py
desc += '-preset-v2-1gpu'; 
num_gpus = 1; 
sched.minibatch_base = 4; 
sched.minibatch_dict = {4: 128, 8: 128, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8, 512: 4}; 
sched.G_lrate_dict = {1024: 0.0015}; 
sched.D_lrate_dict = EasyDict(sched.G_lrate_dict); 
train.total_kimg = 12000
desc += '-fp32'; 
sched.max_minibatch_per_gpu = {256: 16, 512: 8, 1024: 4}

Similarly, tick_kimg (the tick interval) is determined by a per-resolution dict.

class TrainingSchedule:
   def __init__(
      self,
      cur_nimg,
      training_set,
      lod_initial_resolution  = 4,        # Image resolution used at the beginning.
      lod_training_kimg       = 600,      # Thousands of real images to show before doubling the resolution.
      lod_transition_kimg     = 600,      # Thousands of real images to show when fading in new layers.
      minibatch_base          = 16,       # Maximum minibatch size, divided evenly among GPUs.
      minibatch_dict          = {},       # Resolution-specific overrides.
      max_minibatch_per_gpu   = {},       # Resolution-specific maximum minibatch size per GPU.
      G_lrate_base            = 0.001,    # Learning rate for the generator.
      G_lrate_dict            = {},       # Resolution-specific overrides.
      D_lrate_base            = 0.001,    # Learning rate for the discriminator.
      D_lrate_dict            = {},       # Resolution-specific overrides.
      tick_kimg_base          = 160,      # Default interval of progress snapshots.
      tick_kimg_dict          = {4: 160, 8:140, 16:120, 32:100, 64:80, 128:60, 256:40, 512:20, 1024:10} # Resolution-specific overrides.
    ):

Suppose the model has so far been trained on 1500k images (cur_nimg = 1500×1000). Then kimg records that 1500k images have been seen; phase_dur is the number of images one full phase at the current resolution (8×8) requires (by default 600k + 600k = 1200k); phase_idx is the index of the current phase, here 1; and phase_kimg is the number of images already trained within this phase (300k).

       self.kimg = cur_nimg / 1000.0
       phase_dur = lod_training_kimg + lod_transition_kimg
       phase_idx = int(np.floor(self.kimg / phase_dur)) if phase_dur > 0 else 0
       phase_kimg = self.kimg - phase_idx * phase_dur       

Next lod is computed, starting from resolution_log2 = 10 (final output resolution 1024).
First, subtract 2, the log2 of the initial 4×4 training resolution: lod = 8.
Second, subtract the number of completed phases (phase_idx = 1): lod = 7.
Then, subtract the fraction of the transition phase that has been completed: (1) at cur_nimg = 1500k training has not yet entered the transition phase, so lod = 7; (2) if instead cur_nimg = 2100k, the 8×8 training phase is finished and the network is transitioning to 16×16, so lod is reduced by (900k − 600k)/600k = 0.5, giving lod = 6.5.
Finally, the resolution used for this step is computed: in case (1) training is still in the 8×8 stage; in case (2) it is already in the 16×16 stage.

       self.lod = training_set.resolution_log2
       self.lod -= np.floor(np.log2(lod_initial_resolution))
       self.lod -= phase_idx
       if lod_transition_kimg > 0:
          self.lod -= max(phase_kimg - lod_training_kimg, 0.0) / lod_transition_kimg
       self.lod = max(self.lod, 0.0)
       self.resolution = 2 ** (training_set.resolution_log2 - int(np.floor(self.lod)))
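The two cases discussed above can be checked numerically. The helper below just repeats the arithmetic of the snippet above for a 1024×1024 target (resolution_log2 = 10); the function name is ours, for illustration only:

import numpy as np

def compute_lod(cur_nimg, resolution_log2=10, lod_initial_resolution=4,
                lod_training_kimg=600, lod_transition_kimg=600):
    kimg = cur_nimg / 1000.0
    phase_dur = lod_training_kimg + lod_transition_kimg
    phase_idx = int(np.floor(kimg / phase_dur)) if phase_dur > 0 else 0
    phase_kimg = kimg - phase_idx * phase_dur
    lod = resolution_log2 - np.floor(np.log2(lod_initial_resolution)) - phase_idx
    if lod_transition_kimg > 0:
        lod -= max(phase_kimg - lod_training_kimg, 0.0) / lod_transition_kimg
    lod = max(lod, 0.0)
    resolution = 2 ** (resolution_log2 - int(np.floor(lod)))
    return lod, resolution

print(compute_lod(1500 * 1000))   # (7.0, 8)  -> still training the 8x8 stage
print(compute_lod(2100 * 1000))   # (6.5, 16) -> fading in the 16x16 layers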

The current resolution then determines the minibatch size and the learning rates of G and D; the minibatch size additionally depends on the settings in config.py.
For example, with resolution = 128 and 2 GPUs, minibatch = 16 (the batch must split evenly across GPUs, 8 per GPU);
with resolution = 256 and 4 GPUs, minibatch = 8 (2 per GPU).
The base learning rate of both G and D is 0.001, overridden to 0.0015 at resolution 1024×1024.

# config.py
sched.minibatch_dict = {4: 128, 8: 128, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8, 512: 4}
sched.max_minibatch_per_gpu = {256: 16, 512: 8, 1024: 4}
sched.G_lrate_dict = {1024: 0.0015}; 
sched.D_lrate_dict = EasyDict(sched.G_lrate_dict); 
       # Minibatch size.
       self.minibatch = minibatch_dict.get(self.resolution, minibatch_base)
       self.minibatch -= self.minibatch % config.num_gpus
       if self.resolution in max_minibatch_per_gpu:
          self.minibatch = min(self.minibatch, max_minibatch_per_gpu[self.resolution] * config.num_gpus)
       self.G_lrate = G_lrate_dict.get(self.resolution, G_lrate_base)
       self.D_lrate = D_lrate_dict.get(self.resolution, D_lrate_base)  
       self.tick_kimg = tick_kimg_dict.get(self.resolution, tick_kimg_base)
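The two minibatch examples above can be verified with the same logic (the helper name is ours; the GPU counts follow the examples above):

minibatch_dict = {4: 128, 8: 128, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8, 512: 4}
max_minibatch_per_gpu = {256: 16, 512: 8, 1024: 4}

def schedule_minibatch(resolution, num_gpus, minibatch_base=4):
    mb = minibatch_dict.get(resolution, minibatch_base)
    mb -= mb % num_gpus                                    # must split evenly across GPUs
    if resolution in max_minibatch_per_gpu:
        mb = min(mb, max_minibatch_per_gpu[resolution] * num_gpus)
    return mb

print(schedule_minibatch(128, num_gpus=2))   # 16 -> 8 images per GPU
print(schedule_minibatch(256, num_gpus=4))   # 8  -> 2 images per GPU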

Summary

PGGAN grows the generator and discriminator progressively from 4×4 up to the target resolution. In this implementation, TrainingSchedule computes lod from the number of images seen so far and feeds it into the networks through the lod_in placeholder; lod controls which blocks are active, how newly added layers are faded in (via the lerp between the new torgb output and the upscaled previous-resolution output), and the per-resolution minibatch size, learning rates and tick interval.
