Eager essentials
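TensorFlow's eager execution is an imperative programming environment that evaluates operations immediately: operations return concrete values instead of building a computational graph to run later. This makes it easy to get started with TensorFlow and to debug models, and it also reduces boilerplate.
Eager execution is a flexible platform for machine learning research and experimentation. It provides:
- An intuitive interface: structure your code naturally and use Python data structures. Iterate quickly on small models and small datasets.
- Easier debugging: call ops directly to inspect running models and test changes. Use standard Python debugging tools for immediate error reporting.
- Natural control flow: use Python control flow instead of graph control flow, which simplifies the specification of dynamic models.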
Setup and basic usage
from __future__ import absolute_import, division, print_function, unicode_literals

# Notebook shell command; outside a notebook, run the pip command in a terminal.
!pip install -q tensorflow-gpu==2.0.0-beta1

import tensorflow as tf
import cProfile
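In TensorFlow 2.0, eager execution is enabled by default.
tf.executing_eagerly()  # returns True when eager mode is on
With eager execution enabled, you can run TensorFlow operations and get the results back immediately:
x = [[2.]]
m = tf.matmul(x, x)
print("hello, {}".format(m))  # hello, [[4.]]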
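Enabling eager execution changes how TensorFlow operations behave: they now evaluate immediately and return their values to Python. tf.Tensor objects reference concrete values instead of symbolic handles to nodes in a computational graph. Since there is no computational graph to build and run later in a session, it is easy to inspect results using print() or a debugger. Evaluating, printing, and checking tensor values does not break the flow for computing gradients.
Eager execution works well with NumPy. NumPy operations accept tf.Tensor arguments. TensorFlow math operations convert Python objects and NumPy arrays to tf.Tensor objects. The tf.Tensor.numpy method returns the object's value as a NumPy ndarray.
In addition, eager execution supports broadcasting and operator overloading:
a = tf.constant([[1, 2],
                 [3, 4]])
print(a)  # a tensor holding a matrix: shape=(2, 2), dtype=int32

b = tf.add(a, 1)
print(b)  # broadcasting -> [[2, 3], [4, 5]]

print(a * b)  # operator overloading

import numpy as np

c = np.multiply(a, b)  # NumPy operates on the tensor values
print(c)

print(a.numpy())  # tensor -> NumPy array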
Dynamic control flow
A major benefit of eager execution is that all the functionality of the host language is available while your model is executing. For example:
def fizzbuzz(max_num):
    counter = tf.constant(0)
    max_num = tf.convert_to_tensor(max_num)
    for num in range(1, max_num.numpy() + 1):
        num = tf.constant(num)
        if int(num % 3) == 0 and int(num % 5) == 0:
            print('FizzBuzz')
        elif int(num % 3) == 0:
            print('Fizz')
        elif int(num % 5) == 0:
            print('Buzz')
        else:
            print(num.numpy())
        counter += 1

fizzbuzz(15)  # 1 2 Fizz ...
Eager training
Computing gradients
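Automatic differentiation is very useful for implementing machine learning algorithms, such as backpropagation for training neural networks. During eager execution, use tf.GradientTape to trace operations so that gradients can be computed later.
You can use tf.GradientTape to train a model or to compute gradients in eager mode. It is especially useful for complicated training loops.
Because different operations can occur during each call, all forward-pass operations are recorded to a "tape". To compute the gradient, the tape is played backwards and then discarded. A particular tf.GradientTape can compute only one gradient; subsequent calls raise a runtime error.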
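The single-use rule can be checked directly. The following is a minimal sketch, assuming TensorFlow 2.x; the variable `w` and the quadratic loss are illustrative and do not come from the original text. It also shows `persistent=True`, the `tf.GradientTape` argument that allows more than one gradient call on the same tape.

```python
import tensorflow as tf

w = tf.Variable(3.0)  # illustrative variable, not from the original text

# Forward pass: operations on `w` are recorded on the tape.
with tf.GradientTape() as tape:
    loss = w * w

# Playing the tape backwards yields d(loss)/dw = 2 * w.
grad = tape.gradient(loss, w)
print(grad.numpy())  # 6.0

# A non-persistent tape is released after its first gradient call.
try:
    tape.gradient(loss, w)
    second_call_failed = False
except RuntimeError:
    second_call_failed = True
print("second call raised RuntimeError:", second_call_failed)

# A persistent tape can be queried more than once.
with tf.GradientTape(persistent=True) as ptape:
    y = w * w
    z = y * y
dy_dw = ptape.gradient(y, w)  # 2 * w
dz_dw = ptape.gradient(z, w)  # 4 * w**3
del ptape  # release the tape's resources explicitly
```

A persistent tape holds on to its recorded operations until it is deleted, so it is usually worth dropping the reference once the gradients have been computed.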
Train a model
The following example creates a multi-layer model that classifies the standard MNIST handwritten digits. It demonstrates how the optimizer and layer APIs (such as convolution and pooling layers) build trainable graphs in an eager execution environment.
# Fetch and format the MNIST data
(mnist_images, mnist_labels), _ = tf.keras.datasets.mnist.load_data()

dataset = tf.data.Dataset.from_tensor_slices(
    (tf.cast(mnist_images[..., tf.newaxis] / 255, tf.float32),
     tf.cast(mnist_labels, tf.int64)))
dataset = dataset.shuffle(1000).batch(32)

# Build the model
mnist_model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, [3, 3], activation='relu',
                           input_shape=(None, None, 1)),
    tf.keras.layers.Conv2D(16, [3, 3], activation='relu'),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10)
])

# Even without training, call the model and inspect the output in eager execution:
for images, labels in dataset.take(1):
    print("Logits: ", mnist_model(images[0:1]).numpy())
Although Keras models have a built-in training loop (the fit method), sometimes you need more customization. Here is an example of a training loop implemented with eager execution:
optimizer = tf.keras.optimizers.Adam()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

loss_history = []

def train_step(images, labels):
    with tf.GradientTape() as tape:
        logits = mnist_model(images, training=True)

        # Add asserts to check the shape of the output.
        tf.debugging.assert_equal(logits.shape, (32, 10))

        loss_value = loss_object(labels, logits)

    loss_history.append(loss_value.numpy().mean())
    grads = tape.gradient(loss_value, mnist_model.trainable_variables)
    optimizer.apply_gradients(zip(grads, mnist_model.trainable_variables))

def train():
    for epoch in range(3):
        for (batch, (images, labels)) in enumerate(dataset):
            train_step(images, labels)
        print('Epoch {} finished'.format(epoch))

train()  # Epoch 0 finished; Epoch 1 finished ...

import matplotlib.pyplot as plt

# Plot the per-batch loss recorded during training
plt.plot(loss_history)
plt.xlabel('Batch #')
plt.ylabel('Loss [entropy]')
Variables and optimizers
tf.Variable objects store mutable tf.Tensor values that are accessed during training, which makes automatic differentiation easier. The parameters of a model can be encapsulated in classes as variables.
Using tf.Variable together with tf.GradientTape encapsulates model parameters better. For example, the automatic differentiation example can be rewritten as:
class Model(tf.keras.Model):
    def __init__(self):
        super(Model, self).__init__()
        self.W = tf.Variable(5., name='weight')
        self.B = tf.Variable(10., name='bias')

    def call(self, inputs):
        return inputs * self.W + self.B

# A toy dataset of points around 3 * x + 2
NUM_EXAMPLES = 2000
training_inputs = tf.random.normal([NUM_EXAMPLES])
noise = tf.random.normal([NUM_EXAMPLES])
training_outputs = training_inputs * 3 + 2 + noise

# The loss function to be optimized
def loss(model, inputs, targets):
    error = model(inputs) - targets
    return tf.reduce_mean(tf.square(error))

def grad(model, inputs, targets):
    with tf.GradientTape() as tape:
        loss_value = loss(model, inputs, targets)
    return tape.gradient(loss_value, [model.W, model.B])

# Define:
# 1. A model.
# 2. Derivatives of a loss function with respect to model parameters.
# 3. A strategy for updating the variables based on the derivatives.
model = Model()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

print("Initial loss: {:.3f}".format(loss(model, training_inputs, training_outputs)))

# Training loop
for i in range(300):
    grads = grad(model, training_inputs, training_outputs)
    optimizer.apply_gradients(zip(grads, [model.W, model.B]))
    if i % 20 == 0:
        print("Loss at step {:03d}: {:.3f}".format(
            i, loss(model, training_inputs, training_outputs)))

print("Final loss: {:.3f}".format(loss(model, training_inputs, training_outputs)))
print("W = {}, B = {}".format(model.W.numpy(), model.B.numpy()))
Use objects for state during eager execution
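With TF 1.x graph execution, program state (such as variables) is stored in global collections, and its lifetime is managed by the tf.Session object. In contrast, during eager execution the lifetime of a state object is determined by the lifetime of its corresponding Python object.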
Variables are objects
During eager execution, variables persist until the last reference to the object is removed, and are then deleted.
if tf.test.is_gpu_available():
    with tf.device("gpu:0"):
        print("GPU enabled")
        v = tf.Variable(tf.random.normal([1000, 1000]))
        v = None  # v no longer takes up GPU memory
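Object-based saving
This section is an abbreviated version of the training checkpoints guide.
tf.train.Checkpoint can save and restore tf.Variables to and from checkpoints:
(Saving and restoring variables)
# First create a variable and a checkpoint object that tracks it
x = tf.Variable(10.)
checkpoint = tf.train.Checkpoint(x=x)

x.assign(2.)  # Assign a new value to x, then save
checkpoint_path = './ckpt/'
checkpoint.save('./ckpt/')
# Note that the prefix is './ckpt/' rather than './ckpt', so the checkpoint
# is written inside the ./ckpt/ directory under the name '-1'.
# With the prefix './ckpt' it would be written to the current directory
# under the name 'ckpt-1' instead.

x.assign(11.)  # Change the variable after saving.

# Restore values from the checkpoint
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_path))
print(x)  # => <tf.Variable 'Variable:0' shape=() dtype=float32, numpy=2.0>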
To save and restore models, tf.train.Checkpoint stores the internal state of objects, without requiring hidden variables. To record the state of a model, an optimizer, and a global step, pass them to a tf.train.Checkpoint:
(Saving and restoring a model)
# Save and restore a model
import os

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, [3, 3], activation='relu'),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10)
])
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

checkpoint_dir = 'path/to/model_dir'
if not os.path.exists(checkpoint_dir):
    os.makedirs(checkpoint_dir)
checkpoint_prefix = os.path.join(checkpoint_dir, 'ckpt')
# print(checkpoint_prefix)  # path/to/model_dir/ckpt

root = tf.train.Checkpoint(optimizer=optimizer, model=model)
root.save(checkpoint_prefix)  # writes path/to/model_dir/ckpt-1.*
root.restore(tf.train.latest_checkpoint(checkpoint_dir))  # restore the variables
Note: in many training loops, variables are created after tf.train.Checkpoint.restore is called. These variables are restored as soon as they are created, and assertions are available to ensure that a checkpoint has been fully loaded. See the training checkpoints guide for details.
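The deferred restoration described in the note can be sketched as follows. This is an illustration under stated assumptions rather than the guide's own example: it assumes TensorFlow 2.x, uses a bare `tf.Module` as the tracked object, and the names `saved`, `fresh`, and `v` as well as the temporary directory are hypothetical.

```python
import os
import tempfile

import tensorflow as tf

checkpoint_dir = tempfile.mkdtemp()  # throwaway directory for this sketch
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")

# Save a module that owns one variable.
saved = tf.Module()
saved.v = tf.Variable(3.0)
save_path = tf.train.Checkpoint(module=saved).save(checkpoint_prefix)

# Restore into a module whose variable does not exist yet.
fresh = tf.Module()
restore_checkpoint = tf.train.Checkpoint(module=fresh)
status = restore_checkpoint.restore(save_path)

# The variable is created after restore(); it receives the saved value
# as soon as it is attached to the tracked module.
fresh.v = tf.Variable(0.0)
print(fresh.v.numpy())  # 3.0

# Raises an AssertionError if anything in the checkpoint was left unmatched.
status.assert_consumed()
```

Calling `assert_consumed()` only after every variable has been created is the design point here: called earlier, it would fail because the checkpoint still contains a value with no matching Python object.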
Advanced automatic differentiation topics
Recommended reading: https://www.cnblogs.com/richqian/p/4549590.html
https://www.cnblogs.com/richqian/p/4534356.html
https://www.jianshu.com/p/fe2e7f0e89e5
Dynamic models
tf.GradientTape can also be used in dynamic models. This example of a backtracking line search algorithm looks like normal NumPy code, except that gradients are available and the function is differentiable despite the complex control flow:
def line_search_step(fn, init_x, rate=1.0):
    with tf.GradientTape() as tape:
        # Variables are automatically recorded, but manually watch a tensor
        tape.watch(init_x)
        value = fn(init_x)
    grad = tape.gradient(value, init_x)
    grad_norm = tf.reduce_sum(grad * grad)
    init_value = value
    # Halve the step size until the objective has decreased enough
    while value > init_value - rate * grad_norm:
        x = init_x - rate * grad
        value = fn(x)
        rate /= 2.0
    return x, value
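To see the line search run, here is a small usage sketch, assuming TensorFlow 2.x. The function is repeated only so that the snippet runs on its own; the objective `sum_of_squares` and the starting point are hypothetical choices made for illustration.

```python
import tensorflow as tf

def line_search_step(fn, init_x, rate=1.0):
    with tf.GradientTape() as tape:
        # Tensors (unlike variables) must be watched explicitly.
        tape.watch(init_x)
        value = fn(init_x)
    grad = tape.gradient(value, init_x)
    grad_norm = tf.reduce_sum(grad * grad)
    init_value = value
    # Halve the step size until the objective has decreased enough.
    while value > init_value - rate * grad_norm:
        x = init_x - rate * grad
        value = fn(x)
        rate /= 2.0
    return x, value

# Hypothetical objective: f(x) = sum(x^2), minimised at x = 0.
def sum_of_squares(x):
    return tf.reduce_sum(x * x)

x, value = line_search_step(sum_of_squares, tf.constant([3.0]))
print(x.numpy(), value.numpy())  # [0.] 0.0
```

Starting from 3.0 the gradient is 6.0, so the full step overshoots to -3.0 and leaves the objective at 9.0; the halved step lands exactly on the minimum. Note that the loop body never runs when the gradient is zero, in which case `x` would be undefined, so the starting point should not already be a stationary point.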
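Custom gradients
Custom gradients are an easy way to override gradients. Within the forward function, define the gradient with respect to the inputs, outputs, or intermediate results. For example, here is an easy way to clip the norm of the gradients in the backward pass:
@tf.custom_gradient
def clip_gradient_by_norm(x, norm):
    y = tf.identity(x)

    def grad_fn(dresult):
        return [tf.clip_by_norm(dresult, norm), None]

    return y, grad_fn

# Custom gradients are commonly used to provide a numerically stable
# gradient for a sequence of operations:
def log1pexp(x):
    return tf.math.log(1 + tf.exp(x))

def grad_log1pexp(x):
    with tf.GradientTape() as tape:
        tape.watch(x)
        value = log1pexp(x)
    return tape.gradient(value, x)

# The gradient computation works fine at x = 0.
grad_log1pexp(tf.constant(0.)).numpy()  # 0.5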
Performance
During eager execution, computation is automatically offloaded to GPUs. To control where a computation runs, enclose it in a tf.device('/gpu:0') block (or the CPU equivalent):
import time

def measure(x, steps):
    # TensorFlow initializes a GPU the first time it's used, exclude from timing.
    tf.matmul(x, x)
    start = time.time()
    for i in range(steps):
        x = tf.matmul(x, x)
    # tf.matmul can return before completing the matrix multiplication
    # (e.g., can return after enqueueing the operation on a CUDA stream).
    # The x.numpy() call below will ensure that all enqueued operations
    # have completed (and will also copy the result to host memory,
    # so we're including a little more than just the matmul operation
    # time).
    _ = x.numpy()
    end = time.time()
    return end - start

# shape = (1000, 1000)
# Reduced to (50, 50): on the author's machine, sizes above 100 crashed
# the Jupyter notebook.
shape = (50, 50)
steps = 200
print("Time to multiply a {} matrix by itself {} times:".format(shape, steps))

# Run on CPU:
with tf.device("/cpu:0"):
    print("CPU: {} secs".format(measure(tf.random.normal(shape), steps)))

# Run on GPU, if available:
if tf.test.is_gpu_available():
    with tf.device("/gpu:0"):
        print("GPU: {} secs".format(measure(tf.random.normal(shape), steps)))
else:
    print("GPU: not found")
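A tf.Tensor object can be copied to a different device to execute its operations there:
if tf.test.is_gpu_available():
    x = tf.random.normal([10, 10])

    # Note: Tensor.gpu() and Tensor.cpu() are deprecated in TensorFlow 2.x.
    x_gpu0 = x.gpu()
    x_cpu = x.cpu()

    _ = tf.matmul(x_cpu, x_cpu)    # Runs on CPU
    _ = tf.matmul(x_gpu0, x_gpu0)  # Runs on GPU:0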
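Because `Tensor.gpu()` and `Tensor.cpu()` are deprecated, the same copy can be written with `tf.identity` inside a `tf.device` block. This is a minimal sketch that assumes TensorFlow 2.1 or later, where `tf.config.list_physical_devices` is available; it is not code from the original post.

```python
import tensorflow as tf

x = tf.random.normal([10, 10])

# Copy the tensor to the CPU explicitly and compute there.
with tf.device("/cpu:0"):
    x_cpu = tf.identity(x)
    product_cpu = tf.matmul(x_cpu, x_cpu)  # runs on CPU

# Copy to the first GPU only if one is visible to TensorFlow.
if tf.config.list_physical_devices("GPU"):
    with tf.device("/gpu:0"):
        x_gpu0 = tf.identity(x)
        product_gpu = tf.matmul(x_gpu0, x_gpu0)  # runs on GPU:0

# The device attribute reports where each tensor lives.
print(x_cpu.device)
```

Guarding the GPU branch with a device check keeps the snippet runnable on CPU-only machines, where requesting "/gpu:0" directly may fail or silently fall back depending on the soft placement setting.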