Question 1

Should I pick Simplified or Traditional?

Accepted Answer

Pick whichever character set your source document uses. If you don't know, go by origin: documents from Mainland China and Singapore generally use Simplified (fewer strokes per character), while documents from Taiwan, Hong Kong, and Macau generally use Traditional (more complex characters). If you still can't tell, pick both: Tesseract can handle mixed text, but accuracy is best when the selected character set matches the document.
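
If you run Tesseract yourself, the choice above maps onto its `-l` language flag. The following is a minimal sketch, not a description of how this service is implemented: `chi_sim` and `chi_tra` are Tesseract's traineddata names and `+` is how it joins several languages, but the helper names and the "simplified" / "traditional" / "both" keywords are hypothetical and exist only for this example.

```python
from typing import List

# Tesseract traineddata names for the two Chinese character sets.
SCRIPT_TO_LANG = {"simplified": "chi_sim", "traditional": "chi_tra"}


def tesseract_lang(script: str) -> str:
    """Return the value to pass to Tesseract's -l flag.

    `script` is "simplified", "traditional", or "both". These keywords
    are an assumption of this sketch, not something Tesseract defines.
    """
    if script == "both":
        # Tesseract accepts several languages joined with "+".
        return "+".join(SCRIPT_TO_LANG.values())
    return SCRIPT_TO_LANG[script]


def build_command(image_path: str, out_base: str, script: str) -> List[str]:
    """Build the argument list: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", image_path, out_base, "-l", tesseract_lang(script)]
```

For example, `build_command("page.png", "page", "both")` produces the arguments for `tesseract page.png page -l chi_sim+chi_tra`, which can be passed to `subprocess.run` on a machine where Tesseract and both language packs are installed.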

Question 2

How accurate is Chinese OCR?

Accepted Answer

On clean, modern scans at 300 DPI, expect 90-95% character-level accuracy. Accuracy drops sharply on older documents and on handwritten Chinese (handwriting recognition is a separate, harder problem that we don't solve here). For best results, scan at 300 DPI or higher and make sure pages are straight, not skewed or rotated.
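
To check accuracy on your own documents, you can compare the OCR output against a hand-corrected transcription of the same page. The sketch below assumes character-level accuracy means one minus the edit distance divided by the length of the reference text; that is a common definition, but it is an assumption here, since the figure above does not say how it was measured. The function names are illustrative.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_accuracy(ref: str, hyp: str) -> float:
    """Character-level accuracy of OCR output `hyp` against reference `ref`.

    Assumed definition: 1 - edit_distance / len(ref), floored at 0.
    """
    if not ref:
        raise ValueError("reference text must not be empty")
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))
```

Under this definition, one wrong character in a ten-character reference gives an accuracy of 0.9. Python strings are sequences of Unicode code points, so each Chinese character counts as one unit.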

Question 3

Does it handle vertical Chinese text?

Accepted Answer

Modern Tesseract handles vertical text in Chinese, Japanese, and Korean reasonably well. Old printed books with right-to-left page order may need additional pre-processing. Contact us if you have a large archive of vertical-text material to digitize.
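
For readers running Tesseract directly, vertical text is usually handled with the dedicated vertical language models plus page segmentation mode 5, which assumes a single uniform block of vertically aligned text. The sketch below only builds the command line. The traineddata names (`chi_sim_vert`, `chi_tra_vert`, `jpn_vert`, `kor_vert`) and `--psm 5` are Tesseract's own; the helper name and the dictionary keys are hypothetical, and whether this service uses these exact settings is not stated here.

```python
from typing import List

# Tesseract's vertical-text traineddata names, keyed by labels that are
# made up for this example.
VERTICAL_LANGS = {
    "simplified": "chi_sim_vert",
    "traditional": "chi_tra_vert",
    "japanese": "jpn_vert",
    "korean": "kor_vert",
}


def build_vertical_command(image_path: str, out_base: str, script: str) -> List[str]:
    """Build a Tesseract command for a page of vertical text.

    --psm 5 tells Tesseract to treat the page as one uniform block of
    vertically aligned text.
    """
    lang = VERTICAL_LANGS[script]
    return ["tesseract", image_path, out_base, "-l", lang, "--psm", "5"]
```

Calling `build_vertical_command("scan.png", "scan", "traditional")` yields the arguments for `tesseract scan.png scan -l chi_tra_vert --psm 5`. Pages that mix vertical and horizontal blocks may need to be cropped into separate regions first.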

OCR Chinese PDF — Simplified and Traditional

Frequently asked questions