Tesseract Training

2024-03-04 08:14:11

Target image

Recognition result before training

Recognition result after training

1. Download and install jTessBoxEditorFX

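It bundles its own copy of tesseract-ocr. Replace that copy with whichever version you actually plan to use, so that the two are identical.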

Alternatively, point the system PATH directly at that bundled tesseract-ocr directory.

What matters is which tesseract binary the command line resolves to in the steps that follow.
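
A quick way to see which binary the command line will actually pick up is to ask the PATH. The snippet below is only a sketch: `find_tesseract` is a name I made up for illustration, and it assumes the installed tesseract accepts `--version`.

```python
import shutil
import subprocess


def find_tesseract(name="tesseract"):
    """Return (path, version_banner) for the first match on PATH, or (None, None)."""
    path = shutil.which(name)
    if path is None:
        return None, None
    # The version banner goes to stdout or stderr depending on the release,
    # so capture both and keep the first non-empty line.
    proc = subprocess.run([path, "--version"], capture_output=True, text=True)
    lines = (proc.stdout + proc.stderr).strip().splitlines()
    return path, (lines[0] if lines else "")
```

Calling `find_tesseract()` and comparing the returned path with the directory bundled in jTessBoxEditorFX shows at a glance whether the two are the same installation.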

2. Generate the box file from the command line

tesseract xqchi.normal.exp0.tif xqchi.normal.exp0 -l chi_sim -psm 7 batch.nochop makebox

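3. Correct the boxes in jTessBoxEditorFX. (This is why the versions must match: otherwise the boxes written by the tesseract command and the ones jTessBoxEditorFX detects do not line up, because different versions do the low-level image processing differently and use different parameters.)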

After that it is patch-and-fix work, box by box, and it eats a lot of time.
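
Since the manual fixing is the slow part, it can help to scan the box file first and only look at the entries that are obviously off. A box file has one symbol per line in the form `symbol left bottom right top page`, with the origin at the bottom-left corner of the image. The sketch below parses that format; `Box`, `parse_box_line`, `suspicious_boxes` and the `min_size` threshold are my own illustrative names, not part of any Tesseract tool.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    # Split from the right so the symbol itself is kept intact.
    char, left, bottom, right, top, page = line.rstrip("\r\n").rsplit(" ", 5)
    return Box(char, int(left), int(bottom), int(right), int(top), int(page))


def suspicious_boxes(boxes: List[Box], min_size: int = 2) -> List[Box]:
    """Return boxes that are inverted, empty, or thinner than min_size pixels."""
    return [b for b in boxes
            if b.right - b.left < min_size or b.top - b.bottom < min_size]
```

Reading `xqchi.normal.exp0.box` line by line through `parse_box_line` and passing the result to `suspicious_boxes` gives a short list to check first in the editor.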

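4. Generate the traineddata from the command line. I put the commands in a BAT file; otherwise you end up typing them all out on every run.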

Run the following series of commands to extract the features:

echo normal 0 0 0 0 0 >font_properties

tesseract xqchi.normal.exp0.tif xqchi.normal.exp0 nobatch box.train

unicharset_extractor xqchi.normal.exp0.box

shapeclustering -F font_properties -U unicharset -O normal.unicharset xqchi.normal.exp0.tr

mftraining -F font_properties -U unicharset -O xqchi.normal xqchi.normal.exp0.tr

cntraining xqchi.normal.exp0.tr

rename normproto normal.normproto
rename inttemp normal.inttemp
rename pffmtable normal.pffmtable
rename shapetable normal.shapetable

combine_tessdata normal.
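
For anyone who would rather not keep a BAT file, the same sequence can be driven from Python. This is a rough sketch that only mirrors the commands listed above; the function names are hypothetical, and it assumes `font_properties` already exists and the training tools are on the PATH.

```python
import os
import subprocess


def training_commands(lang="xqchi", font="normal", exp=0):
    """Build the command lines for the legacy (Tesseract 3.x) training steps."""
    base = f"{lang}.{font}.exp{exp}"
    tr = f"{base}.tr"
    return [
        ["tesseract", f"{base}.tif", base, "nobatch", "box.train"],
        ["unicharset_extractor", f"{base}.box"],
        ["shapeclustering", "-F", "font_properties", "-U", "unicharset",
         "-O", f"{font}.unicharset", tr],
        ["mftraining", "-F", "font_properties", "-U", "unicharset",
         "-O", f"{lang}.{font}", tr],
        ["cntraining", tr],
    ]


def renames(prefix="normal"):
    """(old, new) pairs that give the classifier outputs the traineddata prefix."""
    return [(name, f"{prefix}.{name}")
            for name in ("normproto", "inttemp", "pffmtable", "shapetable")]


def run_all(lang="xqchi", font="normal", exp=0):
    for cmd in training_commands(lang, font, exp):
        subprocess.run(cmd, check=True)  # stop at the first failing tool
    for old, new in renames(font):
        os.replace(old, new)
    subprocess.run(["combine_tessdata", f"{font}."], check=True)
```

Using `check=True` means a failing step stops the run immediately, which a plain BAT file does not do unless you test the error level after every command.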

After that I added commands to delete the intermediate files; looking at all that clutter was giving me a headache.
del font_properties
del normal.normproto
del normal.pffmtable
del normal.shapetable
del normal.unicharset
del normal.inttemp
del unicharset
del xqchi.normal.exp0.box
del xqchi.normal
del xqchi.normal.exp0.tr
del inttemp
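
The cleanup can also be written so that files which are already gone are skipped quietly instead of producing a "file not found" message. A minimal sketch, assuming the same file names as the del list above; `clean` and `INTERMEDIATE` are names I picked for illustration.

```python
from pathlib import Path

# Same files as the del commands above. Note that the box file is in the
# list too, so any hand-corrected boxes are discarded.
INTERMEDIATE = [
    "font_properties",
    "normal.normproto",
    "normal.pffmtable",
    "normal.shapetable",
    "normal.unicharset",
    "normal.inttemp",
    "unicharset",
    "xqchi.normal.exp0.box",
    "xqchi.normal",
    "xqchi.normal.exp0.tr",
    "inttemp",
]


def clean(directory=".", names=INTERMEDIATE):
    """Delete the listed files if present and return the names actually removed."""
    removed = []
    for name in names:
        path = Path(directory) / name
        if path.is_file():
            path.unlink()
            removed.append(name)
    return removed
```

If the corrected box file is worth keeping for a later retraining run, drop `xqchi.normal.exp0.box` from the list before calling `clean()`.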

In the end this produces normal.traineddata.

Finally, copy it over to the tessdata directory and use it in addition to the existing language data.
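
My reading of "in addition" here is the `+` syntax of the `-l` option, which loads several traineddata files together. The sketch below just builds such a command line; `recognize_command` is a hypothetical helper, the `chi_sim+normal` combination is an assumption on my part, and `-psm` is the Tesseract 3.x spelling (4.x and later use `--psm`).

```python
def recognize_command(image, out_base, langs=("chi_sim", "normal"), psm=7):
    """Build a tesseract command that loads several traineddata files at once."""
    return ["tesseract", image, out_base,
            "-l", "+".join(langs),  # e.g. chi_sim+normal
            "-psm", str(psm)]       # 3.x flag; newer releases expect --psm


if __name__ == "__main__":
    print(" ".join(recognize_command("target.tif", "result")))
```

The file name `target.tif` is a placeholder; pass the resulting list to `subprocess.run` once normal.traineddata sits next to chi_sim.traineddata in tessdata.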

Done? Certainly not; there is still a mountain of source code to read.

