Reading Text from Images with tesseract-ocr (Repost)

1. Download and install tesseract:

  https://digi.bib.uni-mannheim.de/tesseract/

2. Configure the environment variable:

  Add the tesseract-ocr installation directory to the PATH variable.

3. Run the tesseract command in a terminal (for example, tesseract --version) to check that the installation succeeded.
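The same check can be made from Java before an application relies on tesseract. The following is a minimal sketch and not part of the original post: the class name TesseractCheck and the method firstVersionLine are hypothetical, and it assumes the installed tesseract accepts the --version option. It returns null when the command cannot be started, which usually means the PATH entry from step 2 is missing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not from the original post.
final class TesseractCheck {

    // Returns the first line printed by "<executable> --version",
    // or null if the command cannot be started or exits with an error.
    static String firstVersionLine(String executable) {
        try {
            Process p = new ProcessBuilder(executable, "--version")
                    .redirectErrorStream(true) // merge stderr in case the version is printed there
                    .start();
            String first;
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
                first = r.readLine();
                while (r.readLine() != null) {
                    // drain the remaining output so the process can exit
                }
            }
            return p.waitFor() == 0 ? first : null;
        } catch (IOException e) {
            return null; // typically: executable not found on PATH
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        String version = firstVersionLine("tesseract");
        System.out.println(version != null
                ? "Found: " + version
                : "tesseract is not available on PATH");
    }
}
```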

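4. Command-line usage:

  1. tesseract + image path + output base name + -l + language (the text is written to a file named after the output base name, with .txt appended)

  Example: tesseract 1606150081.png 1606150081 -l chi_sim

  2. tesseract + image path + stdout + -l + language (the text is printed to the console)

  Example: tesseract D:\company\ruigushop\spring-2s\test.png stdout -l chi_sim

  Here chi_sim selects the Simplified Chinese language data.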
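When these commands are launched from a program, it tends to be safer to keep each argument as a separate list element than to build one string, because image paths may contain spaces. The sketch below only assembles the two argument lists shown above. The class TesseractCommands and its method names are hypothetical and do not come from the original post.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not from the original post.
final class TesseractCommands {

    // Form 1: tesseract <image> <outputBase> -l <lang>
    static List<String> toFile(String imagePath, String outputBase, String lang) {
        return Arrays.asList("tesseract", imagePath, outputBase, "-l", lang);
    }

    // Form 2: tesseract <image> stdout -l <lang>
    static List<String> toStdout(String imagePath, String lang) {
        return Arrays.asList("tesseract", imagePath, "stdout", "-l", lang);
    }

    public static void main(String[] args) {
        // Print both example commands from the steps above.
        System.out.println(String.join(" ", toFile("1606150081.png", "1606150081", "chi_sim")));
        System.out.println(String.join(" ",
                toStdout("D:\\company\\ruigushop\\spring-2s\\test.png", "chi_sim")));
    }
}
```

Either list can be passed directly to the java.lang.ProcessBuilder constructor that takes a List<String>.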

5. Java code:

  

package com.lbh.web.controller;

/*
 * Copyright@lbhbinhao@163.com
 * Author:liubinhao
 * Date:2020/11/23
 * ++++ ______ @author       liubinhao   ______             ______
 * +++/     /|                         /     /|           /     /|
 * +/_____/  |                       /_____/  |         /_____/  |
 * |     |   |                      |     |   |        |     |   |
 * |     |   |                      |     |   |________|     |   |
 * |     |   |                      |     |  /         |     |   |
 * |     |   |                      |     |/___________|     |   |
 * |     |   |___________________   |     |____________|     |   |
 * |     |  /                  / |  |     |   |        |     |   |
 * |     |/ _________________/  /   |     |  /         |     |  /
 * |_________________________|/b    |_____|/           |_____|/
 */
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

@RestController
public class LiteralExtractController {

    // Accepts an uploaded image, saves it under the working directory,
    // and returns the text that tesseract recognizes in it.
    @PostMapping("/image/extract")
    public String reg(@RequestParam("file") MultipartFile file) throws IOException {
        String filename = file.getOriginalFilename();
        if (filename == null || filename.isEmpty()) {
            throw new IllegalArgumentException("uploaded file has no name");
        }
        // The name is supplied by the client, so keep only its last path component.
        String safeName = new File(filename.replace('\\', '/')).getName();
        File save = new File(System.getProperty("user.dir"), safeName);
        file.transferTo(save);
        // Pass each argument separately so that paths containing spaces are not split.
        return cmd("tesseract", save.getAbsolutePath(), "stdout", "-l", "chi_sim");
    }

    // Runs the given command and returns everything it wrote to stdout,
    // or null if the process could not be started or its output could not be read.
    public static String cmd(String... command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Forward stderr to this process so that unread diagnostics cannot block tesseract.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        try {
            Process p = pb.start();
            StringBuilder sb = new StringBuilder();
            // tesseract writes its text output as UTF-8.
            try (BufferedReader br = new BufferedReader(
                    new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = br.readLine()) != null) {
                    sb.append(line).append('\n');
                }
            }
            return sb.toString();
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }
}
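The cmd helper above reads the process output with no time limit and does not look at the exit code, so a hung or failed tesseract run is hard to tell apart from an empty result. One possible hardening is sketched below. It is an assumption-laden illustration, not the original author's code: the class TimedCommand, the method run, and the timeout parameter are all hypothetical. Standard output is redirected to a temporary file so that the caller can wait with a timeout without needing a separate reader thread.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not from the original post.
final class TimedCommand {

    // Runs the command, waits at most timeoutSeconds, and returns its stdout as UTF-8 text.
    // Throws IOException if the process cannot start, times out, or exits with a non-zero code.
    static String run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        File out = File.createTempFile("cmd-out-", ".txt");
        try {
            Process p = new ProcessBuilder(command)
                    .redirectOutput(out)                            // stdout goes to the temp file
                    .redirectError(ProcessBuilder.Redirect.INHERIT) // stderr goes to our stderr
                    .start();
            if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                p.destroyForcibly();
                throw new IOException("timed out after " + timeoutSeconds + " s: " + command);
            }
            if (p.exitValue() != 0) {
                throw new IOException("exit code " + p.exitValue() + ": " + command);
            }
            return new String(Files.readAllBytes(out.toPath()), StandardCharsets.UTF_8);
        } finally {
            out.delete(); // best-effort cleanup of the temp file
        }
    }
}
```

A call such as TimedCommand.run(Arrays.asList("tesseract", path, "stdout", "-l", "chi_sim"), 30) would then either return the recognized text or fail with an exception that the controller can report; the 30-second limit is only an example value.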

Reposted from: https://mp.weixin.qq.com/s/CvDF_AyxyOZftQvpub1A1Q
