Reading the Letters Split of a Locally Downloaded EMNIST Dataset

2023-12-22 17:11:27

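This post shows how to read the Letters dataset after downloading it yourself from the official site. Before reading it, what is Letters? It is a handwritten-letter dataset covering a-z and A-Z, 52 letter forms in total. In the Letters split, the uppercase and lowercase forms of each letter are merged, which gives 26 classes. The grouping into 37 letter classes, where only {C, I, J, K, L, M, O, P, S, U, V, W, X, Y, Z} have their cases merged because the two forms are hard to tell apart, belongs to the EMNIST ByMerge and Balanced splits, not to Letters.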

With that background, the next step is to import the dataset. First, extract the downloaded archive.

Letters dataset:

Link: https://pan.baidu.com/s/1Uq82VExaCJ7Z94cwdX_VRw Extraction code: f8vp

Extracting the archive yields four files:

1. emnist-letters-test-images-idx3-ubyte.gz: test set images

2. emnist-letters-test-labels-idx1-ubyte.gz: test set labels

3. emnist-letters-train-images-idx3-ubyte.gz: training set images

4. emnist-letters-train-labels-idx1-ubyte.gz: training set labels

Next, decompress each of these four .gz files to get the corresponding IDX files.
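If you would rather not decompress the four files by hand, this step can also be done from Python with the standard library. The following is a minimal sketch, not part of the original workflow; the helper name `decompress_gz_files` is made up for illustration, and it assumes the .gz files sit in the directory you pass in.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz_files(directory="."):
    """Decompress every *.gz file in `directory`, writing each result next to its source.

    Returns the list of paths that were written.
    """
    written = []
    for gz_path in sorted(Path(directory).glob("*.gz")):
        # Drop the trailing ".gz", e.g.
        # emnist-letters-train-images-idx3-ubyte.gz -> emnist-letters-train-images-idx3-ubyte
        out_path = gz_path.with_suffix("")
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream the data instead of loading it all into memory
        written.append(out_path)
    return written
```

Calling `decompress_gz_files(".")` in the folder that holds the four archives should leave four files without the `.gz` suffix beside them.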

Then put the decompressed files in the same directory as test.py.
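Before loading, it can help to confirm that the files next to test.py really are decompressed IDX files. An IDX file starts with two zero bytes, one byte for the data type (0x08 means unsigned byte), and one byte for the number of dimensions, followed by each dimension size as a 4-byte big-endian integer. The sketch below reads only that header; `read_idx_header` is a hypothetical helper name, not something idx2numpy provides.

```python
import struct


def read_idx_header(path):
    """Return (dtype_code, shape) read from the header of an IDX file."""
    with open(path, "rb") as f:
        magic = f.read(4)
        # The first two bytes of a valid IDX file are always zero.
        if len(magic) != 4 or magic[0] != 0 or magic[1] != 0:
            raise ValueError(f"{path} does not look like an IDX file (still compressed?)")
        dtype_code, ndim = magic[2], magic[3]
        dims = f.read(4 * ndim)
        if len(dims) != 4 * ndim:
            raise ValueError(f"{path} has a truncated IDX header")
        # Each dimension size is a 4-byte big-endian unsigned integer.
        shape = struct.unpack(f">{ndim}I", dims)
    return dtype_code, shape
```

For an image file, the returned shape should have three entries (sample count, rows, columns); for a label file it should have one. A file that is still gzip-compressed fails the zero-byte check, which makes a forgotten decompression step easy to spot.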

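Before importing the data, check whether the third-party library "idx2numpy" is installed. If it is not, press Win+R to open the Run dialog and type cmd

to open the Command Prompt. Change to the Python installation directory (the author's is C:\Python37), then into its Scripts folder, and run: pip install idx2numpy to install it.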
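One quick way to do the "is it installed?" check is from Python itself. This is a small sketch using only the standard library; `is_installed` is an illustrative name chosen here.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed("idx2numpy"):
    print("idx2numpy is missing; install it with: pip install idx2numpy")
```

If changing into the Scripts folder is inconvenient, running `python -m pip install idx2numpy` from any directory is a common alternative, provided `python` on your PATH is the interpreter you intend to use.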

Once it is installed, idx2numpy can be used to load the Letters data:

import idx2numpy
# Load the training images and training labels
X_train = idx2numpy.convert_from_file('./emnist-letters-train-images-idx3-ubyte')
y_train = idx2numpy.convert_from_file('./emnist-letters-train-labels-idx1-ubyte')
# Load the test images and test labels
X_test = idx2numpy.convert_from_file('./emnist-letters-test-images-idx3-ubyte')
y_test = idx2numpy.convert_from_file('./emnist-letters-test-labels-idx1-ubyte')
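After loading, the labels are plain integers, so a small helper for turning a label back into a letter makes spot checks easier. The sketch below assumes the Letters labels run from 1 to 26 (1 for "a", 26 for "z") rather than starting at 0; it is worth confirming this against the distinct values in your own `y_train` before relying on it. `label_to_letter` is a hypothetical name.

```python
import string


def label_to_letter(label):
    """Map a Letters label (assumed to be 1..26) to its lowercase letter."""
    label = int(label)  # also accepts NumPy integer scalars such as y_train[0]
    if not 1 <= label <= 26:
        raise ValueError(f"expected a label between 1 and 26, got {label}")
    return string.ascii_lowercase[label - 1]
```

For example, `label_to_letter(y_train[0])` gives the letter for the first training sample. If the images look rotated or mirrored when plotted, a commonly reported cause is that EMNIST stores them transposed relative to MNIST; swapping the two image axes, for instance `X_train.transpose(0, 2, 1)`, is the usual thing to try.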
