【PaddlePaddle系列】手写数字识别

2022-11-15 13:41:07

最近百度为了推广自家编写对深度学习框架PaddlePaddle不断推出各种比赛。百度声称PaddlePaddle是一个“易学、易用”的开源深度学习框架，然而网上的资料少之又少。虽然百度很用心地提供了许多文档，而且还是中英双语具备，但是最关键的是报错了很难在网上找到相应的解决办法。为了明年备战百度的比赛，便开始学习以下PaddlePaddle。

1、安装

PaddlePaddle同样支持CUDA加速运算，但是如果没有NVIDIA的显卡，那就还是装CPU版本。

CPU版本安装：pip install paddlepaddle

GPU版本根据所安装的CUDA版本以及cuDNN版本有所不同：

CUDA9 + cuDNN7.0：pip install paddlepaddle-gpu

CUDA8 + cuDNN7.0 : pip install paddlepaddle-gpu==0.14.0.post87

CUDA8 + cuDNN5.0 : pip install paddlepaddle-gpu==0.14.0.post85

2、手写数字识别

其实，Paddle的GitHub提供了这个例程。但是，个人感觉这个例程部分直接调用PaddlePaddle内部类使得读者阅读起来十分困难。特别是数据输入（Feed）中的reader，如果直接看程序，它直接一个函数就完成了图像输入，完全搞不懂它是如何操作。这里也就重点将这里，个人感觉这是和Tensorflow较大的区别。

2.1、网络构建

程序中提供了三种网络模型，代码很明显，这里应该不用太多说，直接贴出来了。需要注意的是，PaddlePaddle将图像的通道数放在最前面，即为[C H W]，区别于[H W C]。

(1)、单层全连接层+softmax

#a full-connect-layer network using softmax as activation function

def softmax_regression():

    img = fluid.layers.data(name='img',shape=[1,28,28],dtype='float32')

    predict = fluid.layers.fc(input=img,size=10,act='softmax')

    return predict

(2)、多层全连接层+softmax

#3 full-connect-layers network using softmax as activation function
def multilayer_perceptron():

    img = fluid.layers.data(name='img',shape=[1,28,28],dtype='float32')

    hidden = fluid.layers.fc(input = img,size=128,act='softmax')

    hidden = fluid.layers.fc(input = hidden,size=64,act='softmax')

    prediction = fluid.layers.fc(input = hidden,size=10,act='softmax')

    return prediction

(3)、卷积神经网络

#traditional converlutional neural network
def cnn():

    img = fluid.layers.data(name='img',shape=[1, 28, 28], dtype ='float32')

    # first conv pool

    conv_pool_1 = fluid.nets.simple_img_conv_pool(

        input = img,

        filter_size = 5,

        num_filters = 20,

        pool_size=2,

        pool_stride=2,

        act="relu")

    conv_pool_1 = fluid.layers.batch_norm(conv_pool_1)

    # second conv pool

    conv_pool_2 = fluid.nets.simple_img_conv_pool(

        input=conv_pool_1,

        filter_size=5,

        num_filters=50,

        pool_size=2,

        pool_stride=2,

        act="relu")

    # output layer with softmax activation function. size = 10 since there are only 10 possible digits.

    prediction = fluid.layers.fc(input=conv_pool_2, size=10, act='softmax')

    return prediction

2.2、构建损失函数

PaddlePaddle的损失函数的构建基本上与tensorflow没有太大的区别。但是需要指出的是：（1）在tensorflow中交叉熵的求解函数是使用[0 0 0 ... 1 ...]等长向量求解。但是在PaddlePaddle中，交叉熵是直接与一个整数求解；（2）标签(lable)的输入数据类型使用的是int64，尽管reader生成器返回的是int类型。笔者尝试将其改为int32类型，但是会出错。另外在其他实践过程中使用int32也是有相应的错误。

def train_program():

    #if using dtype='int64', it reports errors!

    label = fluid.layers.data(name='label', shape=[1], dtype='int64')

    # Here we can build the prediction network in different ways. Please

    predict = cnn()

    #predict = softmax_regression()

    #predict = multilayer_perssion()

    # Calculate the cost from the prediction and label.

    cost = fluid.layers.cross_entropy(input=predict, label=label)

    avg_cost = fluid.layers.mean(cost)

    acc = fluid.layers.accuracy(input=predict, label=label)

    return [avg_cost, acc]

PaddlePaddle使用Trainer进行训练，只需构建训练函数train_program作为Trainer参数（这个下面个再详细讲解）。这里要说一下，函数返回一个向量[arg_cost, acc]，其中第一个元素作为损失函数，而后面几个元素则是可选的，用于在迭代过程中print出来。所以，返回arg_cost是必要的，其他是可选的。特别说明：不要作死将一个常量放在里面，也就是里面的元素必须是会随着训练而变化，如果作死“acc=1”，则在训练中会报错。

2.3、训练

PaddlePaddle使用fulid.Trainer来创建训练器。这里则需要配备好训练器的train_program(损失函数)、place(是否使用GPU)以及optimizer_program(优化器)。然后调用train函数来进行训练。详细可见下面程序：

def optimizer_program():

    return fluid.optimizer.Adam(learning_rate=0.001)

if __name__ == "__main__":

    print("run minst train\n")

    minst_prefix = '/home/dzqiu/DataSet/minst/'

    train_image_path   = minst_prefix + 'train-images-idx3-ubyte.gz'

    train_label_path   = minst_prefix + 'train-labels-idx1-ubyte.gz'

    test_image_path    = minst_prefix + 't10k-images-idx3-ubyte.gz'

    test_label_path    = minst_prefix + 't10k-labels-idx1-ubyte.gz'

    #reader_creator在将在下面讲述

    train_reader = paddle.batch(paddle.reader.shuffle(#shuffle用于打乱buffer的循序

                    reader_creator(train_image_path,train_label_path,buffer_size=100),

                                        buf_size=500),

                                        batch_size=64)

    test_reader  = paddle.batch(
                    reader_creator(test_image_path,test_label_path,buffer_size=100),

                    batch_size=64)              #测试集就不用打乱了

    #if use GPU, use 'export FLAGS_fraction_of_gpu_memory_to_use=0' at first

    use_cuda = True

    place    = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()

    trainer  = fluid.Trainer(train_func=train_program,
                             place=place,
                             optimizer_func=optimizer_program)

    params_dirname = "recognize_digits_network.inference.model"

    lists = []

    #

    def event_handler(event):

        if isinstance(event,fluid.EndStepEvent):#每步触发事件

            if event.step % 100 == 0:

                print("Pass %d, Epoch %d, Cost %f, Acc %f"\

                       %(event.step, event.epoch,

                       event.metrics[0],#train_program返回的第一个参数arg_cost

                       event.metrics[1]))#train_program返回的第二个参数acc

        if isinstance(event,fluid.EndEpochEvent):#每次迭代触发事件

            trainer.save_params(params_dirname)

            #使用test的时候，返回值就是train_program的返回，所以赋值需要对应

            avg_cost, acc = trainer.test(reader=test_reader,
                                         feed_order=['img','label'])

            print("Test with Epoch %d, avg_cost: %s, acc: %s"

                  %(event.epoch, avg_cost, acc))

            lists.append((event.epoch, avg_cost, acc))

    # Train the model now

    trainer.train(num_epochs=5,event_handler=event_handler,
                  reader=train_reader,feed_order=['img', 'label'])

    # find the best pass

    best = sorted(lists, key=lambda list: float(list[1]))[0]

    print 'Best pass is %s, testing Avgcost is %s' % (best[0], best[1])

    print 'The classification accuracy is %.2f%%' % (float(best[2]) * 100)

2.4、训练数据的读取 Reader

PaddlePaddle的训练数据读取仅用一个paddle.dataset.mnist.train()解决，封装起来难以理解其操作，更不能看出如何读取自己的训练集。这里，我将这个段函数从源码中挖出来简化为reader_creator，实现对minst数据集的读取，首先让我们看看minst数据集的格式：

训练集中，标签集前8个字节是magic和数目，后面每个字节代表数字0-9的标签；图像集中前16字节是一些数据集信息，包括magic、图像数目、行数和列数，后面每个字节代表每个像素点，也就是说我们连续取出28*28个字节安顺序就可以组成28*28的图片。弄清楚文件如何读取，那么就可以编写reader：

def reader_creator(image_filename,label_filename,buffer_size):

    def reader():

    #调用命令读取文件，Linux下使用zcat

        if platform.system()=='Linux':

            zcat_cmd = 'zcat'

        elif paltform.system()=='Windows':

            zcat_cmd = 'gzcat'

        else:

            raise NotImplementedError("This program is suported on Windows or Linux,\

                                      but your platform is" + platform.system())

        #create a subprocess to read the images

        sub_img = subprocess.Popen([zcat_cmd, image_filename], stdout = subprocess.PIPE)

        sub_img.stdout.read(16) #skip some magic bytes 这里我们已经知道，所以我们不在需要前16字节

        #create a subprocess to read the labels

        sub_lab = subprocess.Popen([zcat_cmd, label_filename], stdout = subprocess.PIPE)

        sub_lab.stdout.read(8)  #skip some magic bytes 同理

    try:

            while True:         #前面使用try,故若再读取过程中遇到结束则会退出

        #label is a pixel repersented by a unsigned byte,so just read a byte

                labels = numpy.fromfile(

                            sub_lab.stdout,'ubyte',count=buffer_size).astype("int")

                if labels.size != buffer_size:

                    break

        #read 28*28 byte as array,and then resize it

                images = numpy.fromfile(

                            sub_img.stdout,'ubyte',
                            count=buffer_size * 28 * 28)
                            .reshape(buffer_size, 28, 28).astype("float32")

        #mapping each pixel into (-1,1)

                images = images / 255.0 * 2.0 - 1.0;

                for i in xrange(buffer_size):

                    yield images[i,:],int(labels[i]) #将图像与标签抛出，循序与feed_order对应！

        finally:

            try:

        #terminate the reader subprocess

                sub_img.terminate()

            except:

                pass

            try:

        #terminate the reader subprocess

                sub_lable.terminate()

            except:

                pass

    return reader

2.5、运行结果

训练集中有60000张图片，buffer_size为100，batch_size为64，所以应该Pass了900多次。

Pass , Batch , Cost 4.250958, Acc 0.062500

Pass , Batch , Cost 0.249865, Acc 0.953125

Pass , Batch , Cost 0.281933, Acc 0.906250

Pass , Batch , Cost 0.147851, Acc 0.953125

Pass , Batch , Cost 0.144059, Acc 0.968750

Pass , Batch , Cost 0.082035, Acc 0.953125

Pass , Batch , Cost 0.105593, Acc 0.984375

Pass , Batch , Cost 0.148170, Acc 0.968750

Pass , Batch , Cost 0.182150, Acc 0.937500

Pass , Batch , Cost 0.066323, Acc 0.968750

Test with Epoch , avg_cost: 0.07329441363440427, acc: 0.9762620192307693

Pass , Batch , Cost 0.157396, Acc 0.953125

Pass , Batch , Cost 0.050120, Acc 0.968750

Pass , Batch , Cost 0.086324, Acc 0.984375

Pass , Batch , Cost 0.002137, Acc 1.000000

Pass , Batch , Cost 0.173876, Acc 0.984375

Pass , Batch , Cost 0.059772, Acc 0.968750

Pass , Batch , Cost 0.035788, Acc 0.984375

Pass , Batch , Cost 0.008351, Acc 1.000000

Pass , Batch , Cost 0.022678, Acc 0.984375

Pass , Batch , Cost 0.021835, Acc 1.000000

Test with Epoch , avg_cost: 0.06836433922317389, acc: 0.9774639423076923

Pass , Batch , Cost 0.214221, Acc 0.937500

Pass , Batch , Cost 0.212448, Acc 0.953125

Pass , Batch , Cost 0.007266, Acc 1.000000

Pass , Batch , Cost 0.015241, Acc 1.000000

Pass , Batch , Cost 0.061948, Acc 0.984375

Pass , Batch , Cost 0.043950, Acc 0.984375

Pass , Batch , Cost 0.018946, Acc 0.984375

Pass , Batch , Cost 0.015527, Acc 0.984375

Pass , Batch , Cost 0.035185, Acc 0.984375

Pass , Batch , Cost 0.004890, Acc 1.000000

Test with Epoch , avg_cost: 0.05774364945361809, acc: 0.9822716346153846

Pass , Batch , Cost 0.031849, Acc 0.984375

Pass , Batch , Cost 0.059525, Acc 0.953125

Pass , Batch , Cost 0.022106, Acc 0.984375

Pass , Batch , Cost 0.006763, Acc 1.000000

Pass , Batch , Cost 0.056089, Acc 0.984375

Pass , Batch , Cost 0.018876, Acc 1.000000

Pass , Batch , Cost 0.010325, Acc 1.000000

Pass , Batch , Cost 0.010989, Acc 1.000000

Pass , Batch , Cost 0.026476, Acc 0.984375

Pass , Batch , Cost 0.007792, Acc 1.000000

Test with Epoch , avg_cost: 0.05476908334449968, acc: 0.9830729166666666

Pass , Batch , Cost 0.061547, Acc 0.984375

Pass , Batch , Cost 0.002315, Acc 1.000000

Pass , Batch , Cost 0.009715, Acc 1.000000

Pass , Batch , Cost 0.024202, Acc 0.984375

Pass , Batch , Cost 0.150663, Acc 0.968750

Pass , Batch , Cost 0.082586, Acc 0.984375

Pass , Batch , Cost 0.012232, Acc 1.000000

Pass , Batch , Cost 0.055258, Acc 0.984375

Pass , Batch , Cost 0.016068, Acc 1.000000

Pass , Batch , Cost 0.004945, Acc 1.000000

Test with Epoch , avg_cost: 0.041706092633705505, acc: 0.9865785256410257

Best pass is , testing Avgcost is 0.041706092633705505

The classification accuracy is 98.66%

2.6 测试接口

PaddlePaddle提供接口函数，调用接口即可。特别的是，图像需要转化为[N C H W]的张量，如果是一张图像，这里N当然是1，因为是灰度图C也便是1。具体看下面代码：

def load_image(file):

        im = Image.open(file).convert('L')

        im = im.resize((28, 28), Image.ANTIALIAS)

        im = numpy.array(im).reshape(1, 1, 28, 28).astype(np.float32) #[N C H W] 这里多了一个N

        im = im / 255.0 * 2.0 - 1.0

        return im

    cur_dir = os.path.dirname(os.path.realpath(__file__))

    img = load_image(cur_dir + '/infer_3.png')

    inferencer = fluid.Inferencer(

        # infer_func=softmax_regression, # uncomment for softmax regression

        # infer_func=multilayer_perceptron, # uncomment for MLP

        infer_func=cnn,  # uncomment for LeNet5

        param_path=params_dirname,

        place=place)

    results = inferencer.infer({'img': img})

    lab = numpy.argsort(results)  # probs and lab are the results of one batch data

    print "Label of infer_3.png is: %d" % lab[0][0][-1]

测试结果如下：

Label of infer_3.png is:

3、结语

PaddlePaddle与tensorflow还是有一定的区别，而且除了错误很难搜到解决方法，笔者会另外开一篇博客整理总结PaddlePaddle遇到的各种问题，这个对于例程的讲解也将会继续下去，坚持每周三更新，快开学了，还加把劲。

源码地址：Github

参考：Paddle/book /02.recognize_digits

码农公寓

相关文章