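Notes on 《Web安全之机器学习入门》 (Introduction to Machine Learning for Web Security): Chapter 7, Section 7.8, Recognizing MNIST CAPTCHA Digits with Naive Bayes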

        This section uses the Naive Bayes (NB) algorithm to recognize the digits in the MNIST dataset. The accuracy turns out to be mediocre.

        1. Fixing the source code

        The companion source code provided by the author fails at run time with the following error:

C:\ProgramData\Anaconda3\python.exe C:/Users/liujiannan/PycharmProjects/pythonProject/Web安全之机器学习入门/code/7-6.py
Traceback (most recent call last):
  File "C:/Users/liujiannan/PycharmProjects/pythonProject/Web安全之机器学习入门/code/7-6.py", line 25, in <module>
    training_data, valid_data, test_data=load_data()
  File "C:/Users/liujiannan/PycharmProjects/pythonProject/Web安全之机器学习入门/code/7-6.py", line 19, in load_data
    training_data, valid_data, test_data = pickle.load(fp)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 614: ordinal not in range(128)

        The part of the source that raises the error:

def load_data():
    with gzip.open('..') as fp:
        training_data, valid_data, test_data = pickle.load(fp)
    return training_data, valid_data, test_data

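        This kind of UnicodeDecodeError typically appears when a pickle written under Python 2 is loaded under Python 3, where pickle.load decodes legacy strings as ASCII by default. The fix is to pass encoding="bytes" to pickle.load: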

def load_data():
    with gzip.open('../data/MNIST/mnist.pkl.gz') as fp:
        training_data, valid_data, test_data = pickle.load(fp, encoding="bytes")
    return training_data, valid_data, test_data
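
The error pattern can be reproduced without the dataset. The sketch below hand-writes a tiny pickle in the style Python 2 produced for short `str` values and loads it three ways. The names `PY2_STYLE_PICKLE` and `try_load` are illustrative and not part of the book's code, and it is an assumption (the source does not confirm it) that mnist.pkl.gz was written by Python 2.

```python
import pickle

# Hand-written pickle: SHORT_BINSTRING opcode 'U', length 1, payload byte
# 0x90, then STOP '.'. Python 2 emitted this opcode for short `str` values
# under its binary protocols. The payload is illustrative, not MNIST data.
PY2_STYLE_PICKLE = b"U\x01\x90."


def try_load(raw, **kwargs):
    """Return the unpickled object, or the exception class name on failure."""
    try:
        return pickle.loads(raw, **kwargs)
    except UnicodeDecodeError as exc:
        return type(exc).__name__


# Default: legacy strings are decoded as ASCII, which fails on byte 0x90.
default_result = try_load(PY2_STYLE_PICKLE)
# encoding="bytes": legacy strings are returned as raw bytes objects.
bytes_result = try_load(PY2_STYLE_PICKLE, encoding="bytes")
# encoding="latin1": every byte maps to a code point, so decoding cannot fail.
latin1_result = try_load(PY2_STYLE_PICKLE, encoding="latin1")

print(default_result, bytes_result, repr(latin1_result))
```

Both `encoding="bytes"` and `encoding="latin1"` avoid the failure. The difference is only whether legacy strings come back as `bytes` or as `str`.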

        2. Dataset processing

def load_data():
    with gzip.open('../data/MNIST/mnist.pkl.gz') as fp:
        training_data, valid_data, test_data = pickle.load(fp, encoding="bytes")
    return training_data, valid_data, test_data


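if __name__ == '__main__':
    training_data, valid_data, test_data = load_data()
    # Each split is an (images, labels) pair. Only the training and test
    # splits are used here; the validation split is left unused.
    x1, y1 = training_data
    x2, y2 = test_data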
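
Before training, it can help to check what a split actually contains. The sketch below is a hypothetical helper (`describe_split` is not from the book) run on synthetic stand-in data, so it works without the dataset file. The 784-feature layout with pixel values in [0, 1] is an assumption about mnist.pkl.gz that the source does not state.

```python
import numpy as np


def describe_split(split):
    """Summarize an (images, labels) pair as a small dict."""
    x, y = split
    return {
        "n_samples": x.shape[0],
        "n_features": x.shape[1],
        "pixel_range": (float(x.min()), float(x.max())),
        "classes": sorted(int(c) for c in np.unique(y)),
    }


# Synthetic stand-in: 20 random "images" of 784 values each, labels 0-9.
rng = np.random.default_rng(0)
fake_split = (rng.random((20, 784), dtype=np.float32), np.arange(20) % 10)

summary = describe_split(fake_split)
print(summary)
```

With the real data, the same helper could be called as `describe_split(training_data)` or `describe_split(test_data)`.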

        3. Complete source code

# -*- coding:utf-8 -*-

import gzip
import pickle

from sklearn import model_selection
from sklearn.naive_bayes import GaussianNB


def load_data():
    # The pickle holds the (training, validation, test) splits.
    # encoding="bytes" lets Python 3 read the legacy strings stored in it.
    with gzip.open('../data/MNIST/mnist.pkl.gz') as fp:
        training_data, valid_data, test_data = pickle.load(fp, encoding="bytes")
    return training_data, valid_data, test_data


if __name__ == '__main__':
    training_data, valid_data, test_data = load_data()
    x1, y1 = training_data
    x2, y2 = test_data
    clf = GaussianNB()
    clf.fit(x1, y1)
    # cross_val_score clones clf and refits it on folds of (x2, y2), so the
    # reported scores come from cross-validation within the test split only.
    # cv=3 is set explicitly because the reported output has three scores.
    score = model_selection.cross_val_score(clf, x2, y2, scoring="accuracy", cv=3)
    print(score)
    print(score.mean())
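
The listing fits on the training split but then scores by cross-validating inside the test split. A more conventional evaluation is to fit on one split and score on a held-out one. The sketch below shows that pattern on scikit-learn's bundled 8x8 digits dataset, which is a stand-in chosen because it needs no download. It is not MNIST, so its accuracy says nothing about the numbers reported in these notes.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Stand-in data: scikit-learn's bundled 8x8 digits, NOT the MNIST pickle.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit once on the training part, then score on data the model never saw.
clf = GaussianNB()
clf.fit(X_train, y_train)
held_out_accuracy = clf.score(X_test, y_test)

print(f"held-out accuracy: {held_out_accuracy:.3f}")
```

With the variables from the listing, the equivalent calls would be `clf.fit(x1, y1)` followed by `clf.score(x2, y2)`.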





        4. Results

[0.53684841 0.58385839 0.6043857 ]
0.575030833157769

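Clearly the result is not good: the mean accuracy across the three folds is only about 57.5%. In this experiment NB does poorly on the 10-class problem, whereas on binary classification tasks it tends to do acceptably.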
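
One way to probe the binary versus multi-class observation is to run the same classifier on a two-digit subset and on all ten digits. The sketch below does this on scikit-learn's bundled 8x8 digits dataset as a stand-in (not MNIST). The helper `mean_cv_accuracy` and the choice of digits 0 and 1 are illustrative assumptions, and no particular outcome is claimed.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB


def mean_cv_accuracy(X, y, folds=3):
    """Mean accuracy of a fresh GaussianNB over `folds` cross-validation folds."""
    scores = cross_val_score(GaussianNB(), X, y, scoring="accuracy", cv=folds)
    return float(scores.mean())


# Stand-in data: scikit-learn's bundled 8x8 digits, NOT the MNIST pickle.
X, y = load_digits(return_X_y=True)

# All ten classes.
ten_class = mean_cv_accuracy(X, y)

# Binary subset: keep only the samples labelled 0 or 1.
mask = np.isin(y, [0, 1])
two_class = mean_cv_accuracy(X[mask], y[mask])

print(f"10-class mean accuracy: {ten_class:.3f}")
print(f"2-class mean accuracy:  {two_class:.3f}")
```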
