python - Tesseract - recognizing indents

May 15, 2010

I'm trying to perform OCR with Tesseract on images such as this one.

In the preprocessing part I am cropping the 3 separate columns, applying thresholding to get rid of the watermark, and applying (Gaussian) blurring to smooth out rough pixels (I'm doing this programmatically in Python).
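For readers who want to reproduce the steps above, here is a minimal sketch of such a pipeline. It assumes OpenCV (`cv2`) is used, that the three columns are of equal width, and that a fixed global threshold is enough to remove the watermark. None of this is stated in the question; the function names and the values `threshold=150` and `blur_ksize=3` are hypothetical placeholders.

```python
def column_bounds(width, n_columns=3):
    """Return (start, end) pixel ranges that split `width` into equal columns."""
    step = width // n_columns
    bounds = [(i * step, (i + 1) * step) for i in range(n_columns)]
    # Let the last column absorb any remainder pixels.
    bounds[-1] = (bounds[-1][0], width)
    return bounds


def preprocess(path, n_columns=3, threshold=150, blur_ksize=3):
    """Crop columns, threshold away light watermark pixels, then blur.

    Assumption: equal-width columns and a fixed threshold; real scans
    may need per-image tuning or an adaptive threshold instead.
    """
    import cv2  # imported lazily so column_bounds works without OpenCV

    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(path)

    columns = []
    for x0, x1 in column_bounds(gray.shape[1], n_columns):
        crop = gray[:, x0:x1]
        # Pixels lighter than `threshold` (e.g. a faint watermark) become white,
        # darker pixels (the text) become black.
        _, binary = cv2.threshold(crop, threshold, 255, cv2.THRESH_BINARY)
        # A small Gaussian kernel softens jagged edges left by thresholding.
        columns.append(cv2.GaussianBlur(binary, (blur_ksize, blur_ksize), 0))
    return columns
```

Calling `preprocess("page.png")` (a hypothetical file name) would return a list of three single-channel images, one per column, ready to be passed to Tesseract.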

The preprocessed image ends up looking like this. The specific aim I have in mind is achieving the highest accuracy possible, while still being able to detect the spaces present in the text. In particular, I want to be able to recognize line-wrapped features such as "Xtronic CVT® (Continuously Variable Transmission) Sport & Eco modes". These are indicated by subtle indents from the left margin, and they are crucial features to capture.

With the preprocessing routines I have tried so far, including the one above, I achieve mediocre character recognition accuracy and bad indent detection. Note that I have experimented with various Tesseract options, such as the page segmentation modes (PSM) and the preserve_interword_spaces option, without success. I have 2 questions:

1. Given the quality of the original image, what kind of accuracy is theoretically achievable?

2. Is it possible to reliably capture the indents in the image, and what kind of preprocessing might get me that?
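
To make the options mentioned in the question concrete, the sketch below shows how the PSM and preserve_interword_spaces settings might be passed through pytesseract, and one possible way to look for indents: comparing each line's leftmost word position with the overall left margin, using the word boxes from `pytesseract.image_to_data`. This is only an illustration under stated assumptions, not what was tried in the question. The helper names, the choice of `--psm 6`, and the `min_indent=10` pixel cutoff are all hypothetical.

```python
from collections import defaultdict


def tesseract_config(psm=6, preserve_spaces=True):
    """Build a pytesseract config string from the two options discussed."""
    parts = [f"--psm {psm}"]
    if preserve_spaces:
        parts.append("-c preserve_interword_spaces=1")
    return " ".join(parts)


def line_indents(data, min_indent=10):
    """Return (line_text, is_indented) pairs in reading order.

    `data` is a dict shaped like the result of
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT).
    `min_indent` is an assumed pixel cutoff and needs tuning per image.
    """
    lines = defaultdict(list)
    for i, text in enumerate(data["text"]):
        if not text.strip():
            continue  # skip rows that carry no recognized word
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        lines[key].append((data["left"][i], text))
    if not lines:
        return []

    # Treat the leftmost word anywhere in the column as the margin.
    margin = min(left for words in lines.values() for left, _ in words)

    result = []
    for key in sorted(lines):
        words = sorted(lines[key])  # order words left to right
        offset = words[0][0] - margin
        result.append((" ".join(t for _, t in words), offset >= min_indent))
    return result


def ocr_with_indents(image):
    """Run Tesseract on one column image and flag indented lines."""
    import pytesseract  # imported lazily; requires the Tesseract binary

    data = pytesseract.image_to_data(
        image,
        config=tesseract_config(),
        output_type=pytesseract.Output.DICT,
    )
    return line_indents(data)
```

A line flagged as indented could then be joined to the line before it to rebuild a wrapped feature. Because this relies on pixel positions rather than on spaces in the output text, it does not depend on Tesseract reproducing the leading whitespace.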
