A Python OCR Recommendation for MacBook Users

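I think I have tried just about every OCR engine there is in order to read students' medical visit confirmation forms.

I tried Tesseract, EasyOCR, PaddleOCR, and several others, but none of them performed well enough on Korean.

Recently, though, I found a Python library that wraps the Live Text feature built into the MacBook.

It uses the Vision Framework API published by Apple.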

1. Vision Framework

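Vision Framework is the machine vision framework that Apple has made available to developers since macOS High Sierra.

It provides a range of APIs, including image classification, image alignment, text recognition, and face detection.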

https://developer.apple.com/kr/machine-learning/api/


Because everything runs on-device, no internet connection is required.

Implemented in pure Swift, it can be used as follows.

import Foundation
import Vision
import AppKit

func loadNSImage(_ path: String) -> NSImage? {
    return NSImage(contentsOfFile: path)
}

func cgImage(from nsImage: NSImage) -> CGImage? {
    var rect = CGRect(origin: .zero, size: nsImage.size)
    return nsImage.cgImage(forProposedRect: &rect, context: nil, hints: nil)
}

let args = CommandLine.arguments
guard args.count >= 2 else {
    fputs("Usage: ocr <image_path> [lang1,lang2,...] [roi]\n", stderr)
    exit(1)
}
let imagePath = args[1]
let langs = args.count >= 3 ? args[2].split(separator: ",").map { String($0) } : ["ko-KR","en-US"]

// ROI: "x,y,w,h" in 0~1 (optional)
var roi: CGRect? = nil
if args.count >= 4 {
    let comps = args[3].split(separator: ",").compactMap { Double($0) }
    if comps.count == 4 {
        roi = CGRect(x: comps[0], y: comps[1], width: comps[2], height: comps[3])
    }
}

guard let nsImage = loadNSImage(imagePath), let cg = cgImage(from: nsImage) else {
    fputs("Failed to load image\n", stderr)
    exit(1)
}

let request = VNRecognizeTextRequest { request, error in
    if let error = error {
        fputs("Error: \(error.localizedDescription)\n", stderr)
        exit(1)
    }
    let observations = request.results as? [VNRecognizedTextObservation] ?? []
    for obs in observations {
        if let top = obs.topCandidates(1).first {
            print(top.string)
        }
    }
}

// Key options
request.recognitionLevel = .accurate
request.usesLanguageCorrection = true
request.recognitionLanguages = langs
request.customWords = [] // add domain-specific vocabulary here if needed

if let r = roi {
    request.regionOfInterest = r
}

// The latest supported revision is used automatically (uncomment the line below to pin one explicitly)
// request.revision = VNRecognizeTextRequestRevision3 // adjust to match your OS version

let handler = VNImageRequestHandler(cgImage: cg, options: [:])
do {
    try handler.perform([request])
} catch {
    fputs("Perform error: \(error.localizedDescription)\n", stderr)
    exit(1)
}

Then compile and run it from the terminal as follows.

xcrun swiftc ocr.swift -o ocr
./ocr sample.png "ko-KR,en-US"
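
The optional third argument of the tool above is a region of interest written as "x,y,w,h" in normalized 0 to 1 coordinates. Vision measures regionOfInterest from the lower-left corner of the image, while pixel coordinates from most image tools start at the top-left. Below is a minimal Python sketch for building that argument from a pixel rectangle; the helper name `roi_arg` is hypothetical and not part of any library.

```python
def roi_arg(x, y, w, h, img_w, img_h):
    """Convert a top-left-origin pixel rectangle to a normalized 'x,y,w,h' string.

    Assumes Vision's convention: values in 0..1, origin at the lower-left corner.
    """
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image size must be positive")
    nx = x / img_w
    nw = w / img_w
    nh = h / img_h
    # Flip the vertical axis: the rectangle's bottom edge becomes its y origin.
    ny = 1.0 - (y + h) / img_h
    eps = 1e-9
    if min(nx, ny, nw, nh) < -eps or nx + nw > 1.0 + eps or ny + nh > 1.0 + eps:
        raise ValueError("rectangle falls outside the image")
    return ",".join(f"{v:.4f}" for v in (nx, ny, nw, nh))


# Top half of a 1000 x 2000 px scan
print(roi_arg(0, 0, 1000, 1000, img_w=1000, img_h=2000))
```

The printed string can be passed as the third argument, for example ./ocr sample.png "ko-KR,en-US" "0.0000,0.5000,1.0000,0.5000".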

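Both the recognition accuracy and the speed are impressive.

With EasyOCR, reading a single page took more than 3 seconds, whereas Vision Framework processes about 2 pages per second.

After some searching, I found that someone had already built a Python library around it.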

2. ocrmac

GitHub - straussmaximilian/ocrmac: A python wrapper to extract text from images on a mac system. Uses the vision framework from Apple.


Sure enough, someone had kindly wrapped the API in Python.

Install it with pip.

pip install ocrmac

Now let's run it on a sample image.

At first the text recognition was poor, but once I specified the target languages the accuracy improved dramatically.

from ocrmac import ocrmac

img_path = './pdf2png/scan0.png'
result = ocrmac.OCR(img_path, language_preference=["ko-KR","en-US"]).recognize()
for line in result:
    print(line)

Each result is a tuple: the first value is the recognized text, the second is the confidence, and the third is the bounding box.
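
For cropping or drawing, the bounding boxes need to be turned into pixel coordinates. Here is a minimal sketch, assuming each box is `[x, y, w, h]` normalized to 0 to 1 with the origin at the lower-left corner (Vision's convention); it is worth checking this against the output for your own images. The helper `to_pixel_box` and the sample tuple are hypothetical, not actual ocrmac output.

```python
def to_pixel_box(bbox, img_w, img_h):
    """Map a normalized [x, y, w, h] box (lower-left origin) to
    (left, top, right, bottom) pixel coordinates with a top-left origin."""
    x, y, w, h = bbox
    left = x * img_w
    right = (x + w) * img_w
    # Flip the vertical axis: the top edge sits at y + h in lower-left coordinates.
    top = (1.0 - (y + h)) * img_h
    bottom = (1.0 - y) * img_h
    return tuple(round(v) for v in (left, top, right, bottom))


# Made-up result tuple in the (text, confidence, bbox) shape
sample = ("Medical certificate", 1.0, [0.25, 0.5, 0.5, 0.25])
text, confidence, bbox = sample
print(text, confidence, to_pixel_box(bbox, img_w=1000, img_h=2000))
```

The resulting tuple is in the order that PIL's `Image.crop` expects, so it can be used to cut out a single recognized line.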

Normally you would draw the bounding boxes yourself with cv2, but this library can draw them for you.

Just call annotate_PIL as shown below.

from ocrmac import ocrmac
import matplotlib.pyplot as plt

img_path = './pdf2png/scan0.png'
result = ocrmac.OCR(img_path, language_preference=["ko-KR","en-US"])
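img = result.annotate_PIL()  # PIL image with the boxes and text drawn on it
plt.figure(figsize=(12,12))
plt.imshow(img)
plt.show()  # needed when running as a script rather than in a notebook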

You can also switch the framework to livetext or change the recognition level.

See the notebook below for more detailed examples.
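
As a rough sketch of those options, the snippet below builds keyword arguments for `ocrmac.OCR`. It assumes the parameter names `framework` and `recognition_level`, and the values "vision"/"livetext" and "accurate"/"fast", as I understand them from the ocrmac README; please verify them against the version you have installed. `ocr_kwargs` and `run_ocr` are hypothetical helpers, and ocrmac is imported lazily because it only works on macOS.

```python
def ocr_kwargs(framework="vision", level="accurate", languages=("ko-KR", "en-US")):
    """Build keyword arguments for ocrmac.OCR (parameter names are assumptions)."""
    if framework not in ("vision", "livetext"):
        raise ValueError(f"unknown framework: {framework}")
    if level not in ("accurate", "fast"):
        raise ValueError(f"unknown recognition level: {level}")
    kwargs = {"framework": framework, "language_preference": list(languages)}
    if framework == "vision":
        # Assumption: the recognition level only matters for the Vision backend.
        kwargs["recognition_level"] = level
    return kwargs


def run_ocr(img_path, **options):
    """Run OCR on macOS; ocrmac is imported here so this file loads elsewhere too."""
    from ocrmac import ocrmac
    return ocrmac.OCR(img_path, **ocr_kwargs(**options)).recognize()


print(ocr_kwargs(framework="livetext"))
print(ocr_kwargs(level="fast"))
```

On a Mac, something like `run_ocr('./pdf2png/scan0.png', level="fast")` would then trade some accuracy for speed, if the assumptions above hold.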

ocrmac/ExampleNotebook.ipynb at main · straussmaximilian/ocrmac


3. Review

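With Apple's Vision Framework, my long-standing frustration with OCR was resolved in one go.

Processing 100 images at 200 dpi takes around 10 minutes with EasyOCR, but ocrmac finishes in about 1 minute.

I might even try building a web service with it.

If someone had told me about this earlier, I might have loved my MacBook a little more...

My only regret is that I found out so late.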
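
To reproduce this kind of comparison on your own pages, a small timing harness is enough. This is a minimal sketch: `benchmark` is a hypothetical helper that accepts any OCR callable, and `fake_ocr` is a stand-in that exists only so the snippet runs without an OCR engine installed.

```python
import time


def benchmark(ocr_fn, image_paths):
    """Return (total_seconds, pages_per_second) for calling ocr_fn on each path."""
    start = time.perf_counter()
    for path in image_paths:
        ocr_fn(path)
    elapsed = time.perf_counter() - start
    pages_per_second = len(image_paths) / elapsed if elapsed > 0 else float("inf")
    return elapsed, pages_per_second


def fake_ocr(path):
    """Stand-in for a real OCR call; sleeps briefly to simulate work."""
    time.sleep(0.01)
    return []


paths = [f"./pdf2png/scan{i}.png" for i in range(5)]
elapsed, rate = benchmark(fake_ocr, paths)
print(f"{len(paths)} pages in {elapsed:.2f} s ({rate:.1f} pages/s)")
```

On a Mac, `fake_ocr` can be swapped for a function that calls `ocrmac.OCR(path, language_preference=["ko-KR","en-US"]).recognize()`, and the same harness can wrap an EasyOCR call for comparison.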
