Tools: Homebrew (x86)
Environment: conda virtual environment, python=3.7
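Tips: on an M1 Mac, use Homebrew to install Miniconda and build a Python 3.7 virtual environment.
If you also have Miniforge installed, remember to init your shell as prompted once installation finishes (e.g. conda init zsh).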
At the time of writing, OCRmyPDF did not seem to work with Python 3.8 or later, so 3.7 is used here.
Official docs: https://ocrmypdf.readthedocs.io/en/latest/installation.html
Set up the virtual environment & install OCRmyPDF
brew install miniconda
conda install python=3.7
conda create -n py37 python=3.7
conda activate py37
conda info | grep env # check which environment is active
pip install ocrmypdf
ocrmypdf --version
# conda deactivate # leave the environment
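Before running pip install, it can help to confirm that the python on PATH really is the 3.7 interpreter from the py37 environment. A minimal sketch, assuming a bash-compatible shell; is_python37 is a hypothetical helper, not part of conda or OCRmyPDF:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the given "python --version"
# output reports Python 3.7.x.
is_python37() {
  case "$1" in
    "Python 3.7."*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage inside the activated py37 environment (2>&1 because Python 2
# prints its version to stderr):
# is_python37 "$(python --version 2>&1)" && pip install ocrmypdf
```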
Install the required dependencies
From the official docs:
As of ocrmypdf 7.2.1, the following versions are recommended:
Python 3.7 or 3.8
Ghostscript 9.23 or newer
qpdf 8.2.1
Tesseract 4.0.0 or newer
The following three are optional:
jbig2enc 0.29 or newer
pngquant 2.5 or newer
unpaper 6.1
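To see at a glance which of the programs above are still missing, a small PATH check can print the matching brew install line for each one. This is a sketch under stated assumptions: the executable names (gs for Ghostscript, jbig2 for jbig2enc) and the program-to-formula mapping in formula_for are assumptions to adjust for your setup, and both functions are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Assumed mapping from executable name to Homebrew formula name.
formula_for() {
  case "$1" in
    gs) echo ghostscript ;;
    jbig2) echo jbig2enc ;;
    *) echo "$1" ;;   # tesseract, unpaper, pngquant: same name assumed
  esac
}

# Print a "brew install <formula>" line for every program not on PATH.
missing_deps() {
  for prog in "$@"; do
    # command -v succeeds only if the program can be found
    if ! command -v "$prog" >/dev/null 2>&1; then
      echo "brew install $(formula_for "$prog")"
    fi
  done
}

# tesseract and gs are required; unpaper (for --clean/--clean-final),
# pngquant and jbig2 are optional.
missing_deps tesseract gs unpaper pngquant jbig2
```

An empty output means every listed program was found.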
Run the following command and install whatever it reports as missing:
ocrmypdf -l eng --clean-final input.pdf ocr.pdf
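Once the only error left is the one below (it just complains that input.pdf does not exist), everything is configured: InputFileError: File not found - input.pdf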
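The pass/fail rule above can be written as a tiny check on the probe's output. A minimal sketch; probe_status is a hypothetical helper, and it assumes the probe is run in a directory where no input.pdf exists:

```shell
#!/usr/bin/env bash
# Hypothetical helper: "File not found" in the probe output means
# OCRmyPDF got past its dependency checks; anything else means not yet.
probe_status() {
  case "$1" in
    *"File not found"*) echo ready ;;
    *) echo "not ready" ;;
  esac
}

# Usage (error messages go to stderr, hence 2>&1):
# probe_status "$(ocrmypdf -l eng --clean-final input.pdf ocr.pdf 2>&1)"
```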
- Install tesseract-lang
brew install tesseract-lang
- Install unpaper
Error message: The program 'unpaper' could not be executed or was not found on your system PATH. This program is required when you use the ['--clean', '--clean-final'] arguments. You could try omitting these arguments, or install the package.
brew install unpaper
- Install Ghostscript
Error message: Could not find program 'gs' on the PATH
brew install ghostscript
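Since the probe command uses -l eng, it may also be worth confirming that Tesseract actually sees the English language data after installing tesseract-lang. A minimal sketch, assuming the output format of tesseract --list-langs is a header line followed by one language code per line; has_tesseract_lang is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if language code $1 appears as a whole
# line in the "tesseract --list-langs" output passed as $2.
has_tesseract_lang() {
  local lang="$1"
  # -x: whole-line match, -F: treat the code as a fixed string
  printf '%s\n' "$2" | grep -qxF "$lang"
}

# Usage:
# has_tesseract_lang eng "$(tesseract --list-langs 2>&1)" && echo "eng is installed"
```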
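Run the test command again and it should no longer report any missing dependencies. If you need anything else (such as the optional packages above), install it with brew install as the official docs suggest.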