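Preface
1. Motivation: Generating high-resolution images is difficult because higher resolution makes generated images easier to tell apart from training images, which greatly amplifies the gradient problem.
2. Key idea: PGGAN grows the generator and the discriminator progressively. Training starts with easier low-resolution images, and new layers that introduce higher-resolution detail are added as training proceeds. In the paper's words: "The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses."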
I. PGGAN
II. Usage steps
1. Network structure
Network structure of the generator G:
def G_paper(
    latents_in, # First input: Latent vectors [minibatch, latent_size].
    labels_in, # Second input: Labels [minibatch, label_size].
    num_channels = 1, # Number of output color channels. Overridden based on dataset.
    resolution = 32, # Output resolution. Overridden based on dataset.
    label_size = 0, # Dimensionality of the labels, 0 if no labels. Overridden based on dataset.
    fmap_base = 8192, # Overall multiplier for the number of feature maps.
    fmap_decay = 1.0, # log2 feature map reduction when doubling the resolution.
    fmap_max = 512, # Maximum number of feature maps in any layer.
    latent_size = None, # Dimensionality of the latent vectors. None = min(fmap_base, fmap_max).
    normalize_latents = True, # Normalize latent vectors before feeding them to the network?
    use_wscale = True, # Enable equalized learning rate?
    use_pixelnorm = True, # Enable pixelwise feature vector normalization?
    pixelnorm_epsilon = 1e-8, # Constant epsilon for pixelwise feature vector normalization.
    use_leakyrelu = True, # True = leaky ReLU, False = ReLU.
    dtype = 'float32', # Data type to use for activations and outputs.
    fused_scale = True, # True = use fused upscale2d + conv2d, False = separate upscale2d layers.
    structure = None, # 'linear' = human-readable, 'recursive' = efficient, None = select automatically.
    is_template_graph = False, # True = template graph constructed by the Network class, False = actual evaluation.
    **kwargs): # Ignore unrecognized keyword args.
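The use_wscale flag enables the equalized learning rate. As a rough NumPy sketch of the idea (the helper names he_std and init_weight are hypothetical, and the layout with the output channels on the last axis is an assumption): weights are stored as plain N(0, 1) samples, and the He-initialization constant is applied at run time instead of being baked into the initial values.

```python
import numpy as np

def he_std(shape, gain=np.sqrt(2)):
    """He-initialization std for a weight tensor whose last axis is the output."""
    fan_in = np.prod(shape[:-1])  # e.g. kernel_h * kernel_w * in_channels
    return gain / np.sqrt(fan_in)

def init_weight(shape, use_wscale, rng):
    """Return (stored_weight, runtime_scale); a layer would use stored_weight * runtime_scale."""
    std = he_std(shape)
    if use_wscale:
        # Equalized learning rate: store N(0, 1) weights, scale them at every forward pass.
        return rng.standard_normal(shape), std
    # Conventional He init: the constant is baked into the stored weights.
    return rng.standard_normal(shape) * std, 1.0

rng = np.random.default_rng(0)
shape = (3, 3, 512, 512)  # 3x3 convolution, 512 -> 512 feature maps
w, scale = init_weight(shape, use_wscale=True, rng=rng)
effective = w * scale  # what the convolution actually multiplies by
print(float(scale), float(effective.std()))
```

Both variants give the same effective weight distribution at initialization; the difference is that with use_wscale=True the optimizer sees parameters of the same dynamic range in every layer.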
The generator G is built mainly from block(x, res) and torgb(x, res).
At the start of training the image resolution is 4x4, and the network structure is:
def block(x, res): # res = 2..resolution_log2
    with tf.variable_scope('%dx%d' % (2**res, 2**res)):
        if res == 2: # 4x4
            if normalize_latents: x = pixel_norm(x, epsilon=pixelnorm_epsilon)
            with tf.variable_scope('Dense'):
                x = dense(x, fmaps=nf(res-1)*16, gain=np.sqrt(2)/4, use_wscale=use_wscale) # override gain to match the original Theano implementation
                x = tf.reshape(x, [-1, nf(res-1), 4, 4])
                x = PN(act(apply_bias(x)))
            with tf.variable_scope('Conv'):
                x = PN(act(apply_bias(conv2d(x, fmaps=nf(res-1), kernel=3, use_wscale=use_wscale))))
        ......
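The pixel_norm and PN calls in the block above apply pixelwise feature vector normalization. A minimal NumPy sketch, assuming NCHW layout (channels on axis 1) and the same epsilon as the pixelnorm_epsilon default:

```python
import numpy as np

def pixel_norm(x, epsilon=1e-8):
    """Scale each pixel's feature vector (axis 1 of an NCHW array) to unit RMS."""
    return x / np.sqrt(np.mean(np.square(x), axis=1, keepdims=True) + epsilon)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 512, 4, 4)) * 5.0  # deliberately large activations
y = pixel_norm(x)
rms = np.sqrt(np.mean(np.square(y), axis=1))   # one value per pixel, shape (2, 4, 4)
print(rms.min(), rms.max())
```

After normalization, the root mean square over the channel axis is approximately 1 at every spatial position, regardless of how large the incoming activations were.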
When the image resolution is 8x8 or higher (for example, 8x8):
def block(x, res): # res = 2..resolution_log2
    with tf.variable_scope('%dx%d' % (2**res, 2**res)):
        ......
        else: # 8x8 and up
            if fused_scale:
                with tf.variable_scope('Conv0_up'):
                    x = PN(act(apply_bias(upscale2d_conv2d(x, fmaps=nf(res-1), kernel=3, use_wscale=use_wscale))))
            else:
                x = upscale2d(x)
                with tf.variable_scope('Conv0'):
                    x = PN(act(apply_bias(conv2d(x, fmaps=nf(res-1), kernel=3, use_wscale=use_wscale))))
            with tf.variable_scope('Conv1'):
                x = PN(act(apply_bias(conv2d(x, fmaps=nf(res-1), kernel=3, use_wscale=use_wscale))))
        ......
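The number of feature maps nf(res-1) used in these blocks follows from fmap_base, fmap_decay and fmap_max. The helper itself is not shown in this article; the sketch below assumes the usual form (base divided by 2 to the power stage × decay, capped at fmap_max) and prints the resulting width for each resolution with the default arguments of G_paper.

```python
def nf(stage, fmap_base=8192, fmap_decay=1.0, fmap_max=512):
    """Assumed feature-map rule: halve per stage, capped at fmap_max."""
    return min(int(fmap_base / (2.0 ** (stage * fmap_decay))), fmap_max)

for res in range(2, 11):  # res = log2 of the resolution, 4x4 up to 1024x1024
    print('%dx%d: %d feature maps' % (2 ** res, 2 ** res, nf(res - 1)))
```

Under these assumptions the blocks up to 32x32 all use 512 feature maps, and the width then halves with every doubling of the resolution.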
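The torgb layer is structured as follows (assuming the generated image resolution is currently 8x8):
def torgb(x, res): # res = 2..resolution_log2
    lod = resolution_log2 - res
    with tf.variable_scope('ToRGB_lod%d' % lod):
        return apply_bias(conv2d(x, fmaps=num_channels, kernel=1, gain=1, use_wscale=use_wscale))
The main body of the network is built recursively. Progressive growing is realized by changing lod_in, which the training loop feeds with the lod value computed by TrainingSchedule (sched.lod).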
graph TD
subgraph ide0 ["cur_nimg = 2100k,lod_in = 6.5"]
A(Input: Nx512)
A-->B(Dense)
subgraph ide01["block: res=2,lod=8"]
B-->D(Reshape)
D-->F(Conv2d)
end
F-->G(Nx512x4x4)
G-->H(upscale2d_conv2d)
subgraph ide02["block: res=3,lod=7"]
H-->J(Conv2d)
end
J-->K(Nx512x8x8)
K-->Q(upscale2d_conv2d)
subgraph ide04["block: res=4,lod=6"]
Q-->R(Conv2d)
end
R-->S(Nx512x16x16)
K-->L(upscale2d_conv2d)
subgraph ide03["block: res=4,lod=6"]
L-->M(Conv2d)
end
M-->N(Nx512x16x16)
N -->O(Conv2d)
subgraph ide05 ["torgb: res=4,lod=6"]
O-->P("Output: Nx3x16x16")
end
style A fill:#FFFAFA, stroke:#FFFAFA
end
subgraph ide1 ["cur_nimg = 1500k,lod_in = 7"]
a(Input:Nx512)
a -->b(Dense)
subgraph ide11["block: res=2,lod=8"]
b-->d(Reshape)
d-->f(Conv2d)
end
f-->g(Nx512x4x4)
g-->h(upscale2d_conv2d)
subgraph ide12["block: res=3,lod=7"]
h-->i(conv2d)
end
i-->j(Nx512x8x8)
end
style a fill:#FFFAFA, stroke:#FFFAFA
style g fill:#FFFAFA, stroke:#FFFAFA
style j fill:#FFFAFA, stroke:#FFFAFA
if structure == 'recursive':
    def grow(x, res, lod):
        y = block(x, res)
        img = lambda: upscale2d(torgb(y, res), 2**lod)
        if res > 2: img = cset(img, (lod_in > lod), lambda: upscale2d(lerp(torgb(y, res), upscale2d(torgb(x, res - 1)), lod_in - lod), 2**lod))
        if lod > 0: img = cset(img, (lod_in < lod), lambda: grow(y, res + 1, lod - 1))
        return img()
    images_out = grow(combo_in, 2, resolution_log2 - 2)
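The lerp branch in grow() is what fades in a new block: while lod_in lies between lod and lod + 1, the RGB output of the new block is blended with the upsampled RGB output of the previous block. The NumPy sketch below illustrates only that blend; faded_output is a hypothetical helper, the constant arrays stand in for the two torgb outputs, and the clipping of the blend weight is added here for safety rather than taken from the code above.

```python
import numpy as np

def lerp(a, b, t):
    """Linear interpolation: t = 0 gives a, t = 1 gives b."""
    return a + (b - a) * t

def upscale2d(x, factor=2):
    """Nearest-neighbour upsampling of an NCHW array."""
    return x.repeat(factor, axis=2).repeat(factor, axis=3)

def faded_output(rgb_new, rgb_old, lod_in, lod):
    """Blend the new block's RGB image with the upsampled previous-resolution image."""
    t = float(np.clip(lod_in - lod, 0.0, 1.0))  # clipping is an assumption of this sketch
    return lerp(rgb_new, upscale2d(rgb_old), t)

rgb_old = np.zeros((1, 3, 8, 8))   # stand-in for torgb(x, res - 1) at 8x8
rgb_new = np.ones((1, 3, 16, 16))  # stand-in for torgb(y, res) at 16x16
halfway = faded_output(rgb_new, rgb_old, lod_in=6.5, lod=6)
finished = faded_output(rgb_new, rgb_old, lod_in=6.0, lod=6)
print(halfway.mean(), finished.mean())
```

At lod_in = 6.5 both images contribute equally; once lod_in reaches 6 the output comes entirely from the new 16x16 block.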
2. Training process
PGGAN training is implemented mainly by train_progressive_gan() in train.py.
# config.py
desc = 'pgan'
train = EasyDict(func='train.train_progressive_gan') # Options for main training func.
sched = EasyDict() # Options for train.TrainingSchedule.
desc += '-preset-v2-1gpu'
num_gpus = 1
sched.minibatch_base = 4
sched.minibatch_dict = {4: 128, 8: 128, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8, 512: 4}
sched.G_lrate_dict = {1024: 0.0015}
sched.D_lrate_dict = EasyDict(sched.G_lrate_dict)
train.total_kimg = 12000
desc += '-fp32'
sched.max_minibatch_per_gpu = {256: 16, 512: 8, 1024: 4}
First, load the dataset:
# config.py
desc += '-celebahq'
dataset = EasyDict(tfrecord_dir='celebahq/XXX')
train.mirror_augment = True
# train.py
training_set = dataset.load_dataset(data_dir=config.data_dir, verbose=True, **config.dataset)
Next, create the networks:
# config.py
desc = 'pgan'
G = EasyDict(func='networks.G_paper') # Options for generator network.
D = EasyDict(func='networks.D_paper') # Options for discriminator network.
# train.py
print('Constructing networks...')
with tf.device('/gpu:0'):
    G = tfutil.Network('G', num_channels=training_set.shape[0], resolution=training_set.shape[1], label_size=training_set.label_size, **config.G)
    D = tfutil.Network('D', num_channels=training_set.shape[0], resolution=training_set.shape[1], label_size=training_set.label_size, **config.D)
    Gs = G.clone('Gs')
Gs_update_op = Gs.setup_as_moving_average_of(G, beta=G_smoothing)
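Gs is a copy of G whose weights track an exponential moving average of G's weights; it is the network used later for the image snapshots. A minimal sketch of such an update, assuming plain dicts of NumPy arrays in place of the tfutil.Network variables (update_moving_average is a hypothetical helper, and beta=0.9 is chosen only to make the numbers easy to read):

```python
import numpy as np

def update_moving_average(avg_params, src_params, beta):
    """In-place EMA update: avg <- beta * avg + (1 - beta) * src."""
    for name, src in src_params.items():
        avg_params[name] = beta * avg_params[name] + (1.0 - beta) * src
    return avg_params

g_params = {'w': np.array([1.0, 2.0])}   # current generator weights
gs_params = {'w': np.array([0.0, 0.0])}  # smoothed copy
update_moving_average(gs_params, g_params, beta=0.9)
print(gs_params['w'])
```

With a beta close to 1, Gs changes slowly and averages out the step-to-step noise of G.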
Then build the TensorFlow graph:
1. Model inputs (placeholders are used because this is TF1 code):
print('Building TensorFlow graph...')
with tf.name_scope('Inputs'):
    lod_in = tf.placeholder(tf.float32, name='lod_in', shape=[])
    lrate_in = tf.placeholder(tf.float32, name='lrate_in', shape=[])
    minibatch_in = tf.placeholder(tf.int32, name='minibatch_in', shape=[])
    minibatch_split = minibatch_in // config.num_gpus
    reals, labels = training_set.get_minibatch_tf()
    reals_split = tf.split(reals, config.num_gpus)
    labels_split = tf.split(labels, config.num_gpus)
2. Set up the optimizers:
# config.py
G_opt = EasyDict(beta1=0.0, beta2=0.99, epsilon=1e-8) # Options for generator optimizer.
D_opt = EasyDict(beta1=0.0, beta2=0.99, epsilon=1e-8) # Options for discriminator optimizer.
G_opt = tfutil.Optimizer(name='TrainG', learning_rate=lrate_in, **config.G_opt)
D_opt = tfutil.Optimizer(name='TrainD', learning_rate=lrate_in, **config.D_opt)
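Here, for each GPU index gpu (the enclosing loop over GPUs is omitted in this excerpt), the lod variables of G and D are tied to lod_in and the real images are preprocessed:
with tf.name_scope('GPU%d' % gpu), tf.device('/gpu:%d' % gpu):
    G_gpu = G
    D_gpu = D
    lod_assign_ops = [tf.assign(G_gpu.find_var('lod'), lod_in),
                      tf.assign(D_gpu.find_var('lod'), lod_in)]
    reals_gpu = process_reals(reals_split[gpu], lod_in, mirror_augment, training_set.dynamic_range, drange_net)
    labels_gpu = labels_split[gpu]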
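process_reals receives lod_in so that the real images can be faded in the same way as the generated ones. Its body is not shown in this article; the sketch below assumes that the fade blends each image with a 2x box-downsampled and re-upsampled copy of itself, weighted by the fractional part of lod (fade_reals is a hypothetical name).

```python
import numpy as np

def fade_reals(x, lod):
    """Blend NCHW images with their 2x box-filtered version by the fractional part of lod."""
    n, c, h, w = x.shape
    # Average every 2x2 block, then copy the average back over the block.
    y = x.reshape(n, c, h // 2, 2, w // 2, 2).mean(axis=(3, 5), keepdims=True)
    y = np.tile(y, (1, 1, 1, 2, 1, 2)).reshape(n, c, h, w)
    t = lod - np.floor(lod)
    return x + (y - x) * t

x = np.arange(16, dtype=np.float64).reshape(1, 1, 4, 4)
sharp = fade_reals(x, lod=6.0)  # no transition in progress
mixed = fade_reals(x, lod=6.5)  # halfway through a transition
print(mixed[0, 0])
```

When lod is an integer the images pass through unchanged; during a transition the discriminator sees real images whose fine detail grows at the same pace as in the generator output.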
3. Set up the loss functions for G and D:
# config.py
G_loss = EasyDict(func='loss.G_wgan_acgan') # Options for generator loss.
D_loss = EasyDict(func='loss.D_wgangp_acgan') # Options for discriminator loss.
# train.py
with tf.name_scope('G_loss'), tf.control_dependencies(lod_assign_ops):
    G_loss = tfutil.call_func_by_name(G=G_gpu, D=D_gpu, opt=G_opt, training_set=training_set, minibatch_size=minibatch_split, **config.G_loss)
with tf.name_scope('D_loss'), tf.control_dependencies(lod_assign_ops):
    D_loss = tfutil.call_func_by_name(G=G_gpu, D=D_gpu, opt=D_opt, training_set=training_set, minibatch_size=minibatch_split, reals=reals_gpu, labels=labels_gpu, **config.D_loss)
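The two loss names refer to a WGAN generator loss and a WGAN-GP discriminator loss (the ACGAN label term is inactive when label_size is 0). The functions themselves are not reproduced in this article, so the following is only a toy NumPy sketch of the per-sample arithmetic. The constants wgan_lambda, wgan_target and wgan_epsilon are assumed values, and a linear critic D(x) = x · w is used because its gradient with respect to x is simply w, which avoids the need for automatic differentiation.

```python
import numpy as np

def wgan_gp_losses(real_scores, fake_scores, grad_norms,
                   wgan_lambda=10.0, wgan_target=1.0, wgan_epsilon=0.001):
    """Per-sample WGAN generator loss and WGAN-GP critic loss (constants are assumptions)."""
    g_loss = -fake_scores
    d_loss = fake_scores - real_scores
    # Gradient penalty: push the critic's gradient norm towards wgan_target.
    d_loss = d_loss + (wgan_lambda / wgan_target ** 2) * np.square(grad_norms - wgan_target)
    # Small drift term that keeps the real scores from wandering far from zero.
    d_loss = d_loss + wgan_epsilon * np.square(real_scores)
    return g_loss, d_loss

w = np.array([0.6, 0.8])                    # toy linear critic with unit-norm weights
reals = np.array([[1.0, 0.0], [0.0, 1.0]])
fakes = np.array([[0.0, 0.0], [1.0, 1.0]])
grad_norms = np.full(2, np.linalg.norm(w))  # the gradient of x . w is w for every x
g_loss, d_loss = wgan_gp_losses(reals @ w, fakes @ w, grad_norms)
print(g_loss, d_loss)
```

Because the toy critic has unit gradient norm, the penalty term vanishes here and only the score difference and the small drift term remain.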
4. Register the gradients for backpropagation; the weights are then updated through tf.train.Optimizer.apply_gradients:
G_opt.register_gradients(tf.reduce_mean(G_loss), G_gpu.trainables)
D_opt.register_gradients(tf.reduce_mean(D_loss), D_gpu.trainables)
G_train_op = G_opt.apply_updates()
D_train_op = D_opt.apply_updates()
Now, start training:
# total_kimg: total number of images to train on, in thousands.
# cur_nimg: number of images trained on so far.
# cur_tick: number of ticks completed so far.
# tick_start_nimg: value of cur_nimg when the current tick started.
# resume_kimg: image count (in thousands) from which training resumes.
# prev_lod: lod used in the previous iteration.
print('Training...')
cur_nimg = int(resume_kimg * 1000)
cur_tick = 0
tick_start_nimg = cur_nimg
tick_start_time = time.time()
train_start_time = tick_start_time - resume_time
prev_lod = -1.0
while cur_nimg < total_kimg * 1000:
    # Choose training parameters and configure training ops.
    sched = TrainingSchedule(cur_nimg, training_set, **config.sched)
    training_set.configure(sched.minibatch, sched.lod)
    # Compare prev_lod with sched.lod to detect whether a new layer is being introduced.
    # If so, reset the optimizers' internal state (e.g. Adam moments), then update prev_lod.
    if reset_opt_for_new_lod:
        if np.floor(sched.lod) != np.floor(prev_lod) or np.ceil(sched.lod) != np.ceil(prev_lod):
            G_opt.reset_optimizer_state()
            D_opt.reset_optimizer_state()
    prev_lod = sched.lod
    # Run training ops.
    # For every generator update, the discriminator D is updated D_repeats times (default 1);
    # cur_nimg advances by one minibatch with each D step.
    for repeat in range(minibatch_repeats):
        for _ in range(D_repeats):
            tfutil.run([D_train_op, Gs_update_op], {lod_in: sched.lod, lrate_in: sched.D_lrate, minibatch_in: sched.minibatch})
            cur_nimg += sched.minibatch
        tfutil.run([G_train_op], {lod_in: sched.lod, lrate_in: sched.G_lrate, minibatch_in: sched.minibatch})
    # Perform maintenance tasks once per tick.
    done = (cur_nimg >= total_kimg * 1000)
    if cur_nimg >= tick_start_nimg + sched.tick_kimg * 1000 or done:
        cur_tick += 1
        cur_time = time.time()
        tick_kimg = (cur_nimg - tick_start_nimg) / 1000.0
        tick_start_nimg = cur_nimg
        tick_time = cur_time - tick_start_time
        total_time = cur_time - train_start_time
        maintenance_time = tick_start_time - maintenance_start_time
        maintenance_start_time = cur_time
        print('tick %-5d kimg %-8.1f lod %-5.2f minibatch %-4d time %-12s sec/tick %-7.1f sec/kimg %-7.2f maintenance %.1f' % (
            tfutil.autosummary('Progress/tick', cur_tick),
            tfutil.autosummary('Progress/kimg', cur_nimg / 1000.0),
            tfutil.autosummary('Progress/lod', sched.lod),
            tfutil.autosummary('Progress/minibatch', sched.minibatch),
            misc.format_time(tfutil.autosummary('Timing/total_sec', total_time)),
            tfutil.autosummary('Timing/sec_per_tick', tick_time),
            tfutil.autosummary('Timing/sec_per_kimg', tick_time / tick_kimg),
            tfutil.autosummary('Timing/maintenance_sec', maintenance_time)))
        tfutil.autosummary('Timing/total_hours', total_time / (60.0 * 60.0))
        tfutil.autosummary('Timing/total_days', total_time / (24.0 * 60.0 * 60.0))
        tfutil.save_summaries(summary_log, cur_nimg)
        # Save snapshots.
        if cur_tick % image_snapshot_ticks == 0 or done:
            grid_fakes = Gs.run(grid_latents, grid_labels, minibatch_size=sched.minibatch//config.num_gpus)
            misc.save_image_grid(grid_fakes, os.path.join(result_subdir, 'fakes%06d.png' % (cur_nimg // 1000)), drange=drange_net, grid_size=grid_size)
        if cur_tick % network_snapshot_ticks == 0 or done:
            misc.save_pkl((G, D, Gs), os.path.join(result_subdir, 'network-snapshot-%06d.pkl' % (cur_nimg // 1000)))
        # Record start time of the next tick.
        tick_start_time = time.time()
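The optimizer-reset test inside the loop compares the floor and the ceiling of lod between consecutive iterations. The small pure-Python sketch below (crosses_lod_boundary is a hypothetical name) shows which transitions trigger a reset:

```python
import math

def crosses_lod_boundary(prev_lod, lod):
    """True when lod starts or finishes a fade-in relative to prev_lod."""
    return (math.floor(lod) != math.floor(prev_lod)
            or math.ceil(lod) != math.ceil(prev_lod))

cases = [(-1.0, 8.0),   # first iteration
         (8.0, 8.0),    # stable training at one resolution
         (7.0, 6.99),   # a fade-in begins
         (6.9, 6.5),    # in the middle of a fade-in
         (6.01, 6.0)]   # the fade-in completes
for prev_lod, lod in cases:
    print(prev_lod, lod, crosses_lod_boundary(prev_lod, lod))
```

The optimizer state is therefore reset at the very first step, when a new layer starts fading in, and again when the fade-in finishes, but not while lod moves within a transition.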
When training is complete, save the model weights and close the log file:
misc.save_pkl((G, D, Gs), os.path.join(result_subdir, 'network-final.pkl'))
summary_log.close()
open(os.path.join(result_subdir, '_training-done.txt'), 'wt').close()
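In class TrainingSchedule in train.py, progressive growing is driven by a few parameters: lod_initial_resolution=4 sets the starting resolution, and lod_training_kimg=600 together with lod_transition_kimg=600 means that each resolution is trained on 600k images, after which the next resolution is faded in over another 600k images. The generator G and the discriminator D use a base learning rate, with per-resolution overrides read from the dicts in the config file.
# train.py
def train_progressive_gan():
    ...
    # Before the loop: schedule as it will be at the end of training.
    sched = TrainingSchedule(total_kimg * 1000, training_set, **config.sched)
    ...
    # Inside the loop: schedule for the current number of images.
    sched = TrainingSchedule(cur_nimg, training_set, **config.sched)
    ...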
# config.py
desc += '-preset-v2-1gpu';
num_gpus = 1;
sched.minibatch_base = 4;
sched.minibatch_dict = {4: 128, 8: 128, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8, 512: 4};
sched.G_lrate_dict = {1024: 0.0015};
sched.D_lrate_dict = EasyDict(sched.G_lrate_dict);
train.total_kimg = 12000
desc += '-fp32';
sched.max_minibatch_per_gpu = {256: 16, 512: 8, 1024: 4}
Likewise, tick_kimg is determined by a dict.
class TrainingSchedule:
    def __init__(
        self,
        cur_nimg,
        training_set,
        lod_initial_resolution = 4, # Image resolution used at the beginning.
        lod_training_kimg = 600, # Thousands of real images to show before doubling the resolution.
        lod_transition_kimg = 600, # Thousands of real images to show when fading in new layers.
        minibatch_base = 16, # Maximum minibatch size, divided evenly among GPUs.
        minibatch_dict = {}, # Resolution-specific overrides.
        max_minibatch_per_gpu = {}, # Resolution-specific maximum minibatch size per GPU.
        G_lrate_base = 0.001, # Learning rate for the generator.
        G_lrate_dict = {}, # Resolution-specific overrides.
        D_lrate_base = 0.001, # Learning rate for the discriminator.
        D_lrate_dict = {}, # Resolution-specific overrides.
        tick_kimg_base = 160, # Default interval of progress snapshots.
        tick_kimg_dict = {4: 160, 8:140, 16:120, 32:100, 64:80, 128:60, 256:40, 512:20, 1024:10} # Resolution-specific overrides.
    ):
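Suppose the model has been trained on 1500k images so far (cur_nimg = 1500 × 1000). Then kimg = 1500 is the number of images trained on, in thousands; phase_dur is the number of images each training phase takes (by default 600k + 600k = 1200k); phase_idx = 1 is the index of the current phase (training resolution 8×8); and phase_kimg = 300 is the number of images already trained on within this phase (300k).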
self.kimg = cur_nimg / 1000.0
phase_dur = lod_training_kimg + lod_transition_kimg
phase_idx = int(np.floor(self.kimg / phase_dur)) if phase_dur > 0 else 0
phase_kimg = self.kimg - phase_idx * phase_dur
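Next, compute lod, starting from lod = 10 (final output resolution 1024).
First, subtract 2 (log2 of the initial 4×4 training resolution): lod = 8.
Second, subtract the number of completed phases (phase_idx = 1): lod = 7.
Then subtract the fraction of the current phase's transition that has been completed. (1) At cur_nimg = 1500k, training has not yet entered the transition, so lod = 7. (2) At cur_nimg = 2100k, phase_kimg = 900k, so the 8×8 stage is finished and the network is transitioning to 16×16; lod decreases by (900k - 600k) / 600k = 0.5, giving lod = 6.5.
Finally, compute the image resolution for this step: (1) still in the 8×8 training stage; (2) already in the 16×16 training stage.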
self.lod = training_set.resolution_log2
self.lod -= np.floor(np.log2(lod_initial_resolution))
self.lod -= phase_idx
if lod_transition_kimg > 0:
    self.lod -= max(phase_kimg - lod_training_kimg, 0.0) / lod_transition_kimg
self.lod = max(self.lod, 0.0)
self.resolution = 2 ** (training_set.resolution_log2 - int(np.floor(self.lod)))
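The lod computation can be checked outside the training code. The following pure-Python sketch repeats the same arithmetic as a standalone function (schedule_lod is a hypothetical name, and resolution_log2=10 assumes a 1024×1024 dataset) and evaluates it for the two cases discussed above.

```python
import math

def schedule_lod(cur_nimg, resolution_log2=10, lod_initial_resolution=4,
                 lod_training_kimg=600, lod_transition_kimg=600):
    """Return (lod, resolution) for a given number of images shown so far."""
    kimg = cur_nimg / 1000.0
    phase_dur = lod_training_kimg + lod_transition_kimg
    phase_idx = int(math.floor(kimg / phase_dur)) if phase_dur > 0 else 0
    phase_kimg = kimg - phase_idx * phase_dur
    lod = float(resolution_log2)
    lod -= math.floor(math.log2(lod_initial_resolution))  # start at 4x4
    lod -= phase_idx                                      # completed phases
    if lod_transition_kimg > 0:
        # Fraction of the current fade-in that is already done.
        lod -= max(phase_kimg - lod_training_kimg, 0.0) / lod_transition_kimg
    lod = max(lod, 0.0)
    resolution = 2 ** (resolution_log2 - int(math.floor(lod)))
    return lod, resolution

for nimg in (0, 1500000, 2100000, 12000000):
    print(nimg, schedule_lod(nimg))
```

The function gives lod = 7 at 8×8 for 1500k images and lod = 6.5 at 16×16 for 2100k images, matching the worked example.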
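The current resolution determines the minibatch size and the learning rates of the generator G and the discriminator D; the minibatch size also depends on the settings in config.py.
Suppose resolution=128 with 2 GPUs: minibatch = 16 (the size is rounded down to a multiple of the GPU count so that every GPU receives the same share).
Suppose resolution=256 with 4 GPUs: minibatch = 8 (again a multiple of the GPU count).
The base learning rate of G and D is 0.001; at 1024×1024 the learning rate is 0.0015.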
# config.py
sched.minibatch_dict = {4: 128, 8: 128, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8, 512: 4}
sched.max_minibatch_per_gpu = {256: 16, 512: 8, 1024: 4}
sched.G_lrate_dict = {1024: 0.0015};
sched.D_lrate_dict = EasyDict(sched.G_lrate_dict);
# Minibatch size.
self.minibatch = minibatch_dict.get(self.resolution, minibatch_base)
self.minibatch -= self.minibatch % config.num_gpus
if self.resolution in max_minibatch_per_gpu:
    self.minibatch = min(self.minibatch, max_minibatch_per_gpu[self.resolution] * config.num_gpus)
self.G_lrate = G_lrate_dict.get(self.resolution, G_lrate_base)
self.D_lrate = D_lrate_dict.get(self.resolution, D_lrate_base)
self.tick_kimg = tick_kimg_dict.get(self.resolution, tick_kimg_base)
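The same selection logic can be tried in isolation. The sketch below is a standalone pure-Python rewrite (pick_minibatch and pick_lrate are hypothetical names) that uses the preset dicts from config.py shown above and evaluates the two examples.

```python
MINIBATCH_DICT = {4: 128, 8: 128, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8, 512: 4}
MAX_MINIBATCH_PER_GPU = {256: 16, 512: 8, 1024: 4}
G_LRATE_DICT = {1024: 0.0015}

def pick_minibatch(resolution, num_gpus, minibatch_base=4):
    """Minibatch size for a resolution, rounded down to a multiple of num_gpus."""
    minibatch = MINIBATCH_DICT.get(resolution, minibatch_base)
    minibatch -= minibatch % num_gpus  # every GPU gets an equal share
    if resolution in MAX_MINIBATCH_PER_GPU:
        minibatch = min(minibatch, MAX_MINIBATCH_PER_GPU[resolution] * num_gpus)
    return minibatch

def pick_lrate(resolution, lrate_base=0.001):
    """Learning rate for a resolution, falling back to the base value."""
    return G_LRATE_DICT.get(resolution, lrate_base)

print(pick_minibatch(128, num_gpus=2), pick_minibatch(256, num_gpus=4))
print(pick_lrate(512), pick_lrate(1024))
```

Resolutions missing from a dict fall back to the base value, which is why 1024×1024 uses minibatch_base while still being capped by max_minibatch_per_gpu.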