Introduction to the Handwritten Digits dataset
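According to the official description, the complete set of handwritten digit images is split into two datasets: 3823 training samples and 1797 test samples. Each image is represented as an 8x8 pixel matrix, giving 64 pixel features, plus one target attribute that labels the digit class of the sample. The data has no missing feature values, and the digit classes are sampled almost uniformly in both the training and the test set, which makes it a very tidy dataset.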
We used preprocessing programs made available by NIST to extract normalized bitmaps of handwritten digits from a preprinted form. From a total of 43 people, 30 contributed to the training set and different 13 to the test set. 32x32 bitmaps are divided into nonoverlapping blocks of 4x4 and the number of on pixels are counted in each block. This generates an input matrix of 8x8 where each element is an integer in the range 0..16. This reduces dimensionality and gives invariance to small distortions.
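The block-counting step described above can be sketched in a few lines of NumPy. This is only an illustration of the idea, not the NIST preprocessing program itself: the function name `block_count` and the toy bitmap are made up for this example.

```python
import numpy as np


def block_count(bitmap: np.ndarray, block: int = 4) -> np.ndarray:
    """Count the 'on' pixels in each non-overlapping block x block tile."""
    h, w = bitmap.shape
    if h % block or w % block:
        raise ValueError("bitmap sides must be multiples of the block size")
    # Reshape to (tile rows, block, tile cols, block), then sum inside each tile.
    tiles = bitmap.reshape(h // block, block, w // block, block)
    return tiles.sum(axis=(1, 3))


# Hypothetical 32x32 binary bitmap: a filled 16x16 square in the top-left corner.
bitmap = np.zeros((32, 32), dtype=np.uint8)
bitmap[:16, :16] = 1

features = block_count(bitmap)
print(features.shape)  # (8, 8)
print(features.max())  # 16
```

Each 4x4 tile holds at most 16 on pixels, which is why every element of the 8x8 matrix falls in the range 0..16.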
Number of Instances: optdigits.tra (training) 3823, optdigits.tes (testing) 1797. The way we used the dataset was to use half of training for actual training, one-fourth for validation and one-fourth for writer-dependent testing. The test set was used for writer-independent testing and is the actual quality measure.
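The half / quarter / quarter use of the training file can be sketched as an index split. The description does not say whether the original authors split the rows randomly or in file order, so the random shuffle and the function name `split_half_quarter_quarter` below are assumptions made for illustration.

```python
import numpy as np


def split_half_quarter_quarter(n_rows: int, seed: int = 0):
    """Return index arrays for training, validation and writer-dependent test."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rows)  # assumption: rows are shuffled first
    n_train = n_rows // 2
    n_val = n_rows // 4
    train_idx = order[:n_train]
    val_idx = order[n_train:n_train + n_val]
    test_idx = order[n_train + n_val:]  # whatever remains, about one quarter
    return train_idx, val_idx, test_idx


train_idx, val_idx, wd_test_idx = split_half_quarter_quarter(3823)
print(len(train_idx), len(val_idx), len(wd_test_idx))  # 1911 955 957
```

The 1797 rows of optdigits.tes stay untouched in this scheme and serve as the writer-independent test set.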
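Number of Attributes: 64 input + 1 class attribute.
For Each Attribute: all input attributes are integers in the range 0..16. The last attribute is the class code 0..9.
Missing Attribute Values: None.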
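The attribute description above translates directly into a record check. The sketch below assumes each record is a single line of 65 comma-separated integers (64 inputs followed by the class code); the helper name `parse_record` and the sample line are made up for this example.

```python
def parse_record(line: str):
    """Parse one record and check it against the attribute description."""
    fields = [int(tok) for tok in line.strip().split(",")]
    if len(fields) != 65:
        raise ValueError(f"expected 65 fields, got {len(fields)}")
    *pixels, label = fields
    if any(not 0 <= p <= 16 for p in pixels):
        raise ValueError("input attributes must be in the range 0..16")
    if not 0 <= label <= 9:
        raise ValueError("class code must be in the range 0..9")
    return pixels, label


# Made-up record: 63 zeros, one block with all 16 pixels on, class 7.
sample = ",".join(["0"] * 63 + ["16", "7"])
pixels, label = parse_record(sample)
print(len(pixels), pixels[-1], label)  # 64 16 7
```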
Content reproduced from: Optical Recognition of Handwritten Digits
Class Distribution
Class: No of examples in training set
0: 376
1: 389
2: 380
3: 389
4: 387
5: 376
6: 377
7: 387
8: 380
9: 382
Class: No of examples in testing set
0: 178
1: 182
2: 177
3: 183
4: 181
5: 182
6: 181
7: 179
8: 174
9: 180
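A quick way to check how balanced the classes are is to add up the counts and look at the smallest and largest class share. In the sketch below, the test-set count for class 9 is taken as 180, the value needed for the counts to add up to the stated 1797 test samples; the helper name `summarize` is made up.

```python
train_counts = {0: 376, 1: 389, 2: 380, 3: 389, 4: 387,
                5: 376, 6: 377, 7: 387, 8: 380, 9: 382}
# Class 9 = 180 is the remainder implied by the stated total of 1797.
test_counts = {0: 178, 1: 182, 2: 177, 3: 183, 4: 181,
               5: 182, 6: 181, 7: 179, 8: 174, 9: 180}


def summarize(counts):
    """Return the total and the smallest and largest class share."""
    total = sum(counts.values())
    shares = [n / total for n in counts.values()]
    return total, min(shares), max(shares)


for name, counts in (("train", train_counts), ("test", test_counts)):
    total, lo, hi = summarize(counts)
    print(f"{name}: total={total}, share range={lo:.3f}..{hi:.3f}")
```

Every class share stays close to 10% in both sets, which is what "sampled almost uniformly" means here.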
Downloading the Handwritten Digits dataset
Click the corresponding data file to download it.
Dataset download: https://archive.ics.uci.edu/ml/machine-learning-databases/optdigits/
Training set URL: https://archive.ics.uci.edu/ml/machine-learning-databases/optdigits/optdigits.tra
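The files can also be fetched from a script instead of the browser. The sketch below uses only the Python standard library. The helper names `file_url` and `fetch` and the local `data` directory are made up, and it is an assumption that the directory URL above still serves the files at these paths.

```python
from pathlib import Path
from urllib.parse import urljoin
from urllib.request import urlretrieve

BASE_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/optdigits/"


def file_url(name: str) -> str:
    """Build the full URL of one file in the dataset directory."""
    return urljoin(BASE_URL, name)


def fetch(name: str, dest_dir: str = "data") -> Path:
    """Download one file unless a local copy already exists."""
    dest = Path(dest_dir) / name
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(file_url(name), dest)  # needs network access
    return dest


print(file_url("optdigits.tra"))
```

Calling `fetch("optdigits.tra")` would then store the training file under `data/`; the same call with `"optdigits.tes"` is assumed to work for the test file.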
How to use the Handwritten Digits dataset
Two versions of this database available.
1) Preprocessed data can be found in optdigits.tra and optdigits.tes
See optdigits.names for information regarding the preprocessing.
2) The original format of the data can be found in files prefixed with
optdigits-orig.
Cathy Blake
Sept 3,1998
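For the preprocessed version, a minimal loading sketch with NumPy might look like the following. It assumes the files are comma-separated text with 64 input columns followed by the class column. The function name `load_optdigits` is made up, and the two in-memory records stand in for a real file so the example runs without a download.

```python
import io

import numpy as np


def load_optdigits(source):
    """Load records into features X, labels y and 8x8 image arrays."""
    data = np.loadtxt(source, delimiter=",", dtype=np.int64, ndmin=2)
    X = data[:, :64]              # 64 block counts per sample
    y = data[:, 64]               # class code 0..9
    images = X.reshape(-1, 8, 8)  # same values viewed as 8x8 matrices
    return X, y, images


# Two made-up records: an all-zero image of class 3, an all-16 image of class 8.
fake_file = io.StringIO(
    ",".join(["0"] * 64 + ["3"]) + "\n" + ",".join(["16"] * 64 + ["8"]) + "\n"
)
X, y, images = load_optdigits(fake_file)
print(X.shape, y.tolist(), images.shape)  # (2, 64) [3, 8] (2, 8, 8)
```

With a downloaded copy, passing the path of optdigits.tra or optdigits.tes in place of `fake_file` should work the same way, provided the format assumption holds.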