python - Tesseract - recognizing indents


I'm trying to perform OCR with Tesseract on images such as this one: [image]

In the preprocessing part I am cropping the 3 separate columns, applying thresholding to get rid of the watermark, and applying (Gaussian) blurring to smooth out rough pixels (I'm doing this programmatically with Python).
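A minimal sketch of a pipeline of this kind, to make the three steps concrete. It assumes NumPy, a grayscale page held as a 2-D `uint8` array, three equal-width columns, a fixed cutoff of 128, and `sigma=1.0`. The function names and all of those values are illustrative assumptions, not the exact routine used on the real images.

```python
import numpy as np


def crop_columns(gray, n_columns=3):
    """Split a grayscale page (H x W array) into equal-width vertical strips.

    Equal widths are an assumption; real column boundaries may differ.
    """
    width = gray.shape[1]
    edges = np.linspace(0, width, n_columns + 1).astype(int)
    return [gray[:, edges[i]:edges[i + 1]] for i in range(n_columns)]


def remove_watermark(gray, cutoff=128):
    """Global threshold: pixels at or above `cutoff` become white (255),
    darker pixels become black (0). A light watermark ends up white."""
    return np.where(gray >= cutoff, 255, 0).astype(np.uint8)


def gaussian_blur(gray, sigma=1.0):
    """Separable Gaussian blur built from two passes of 1-D convolution."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    # Replicate edge pixels so the output keeps the input size.
    padded = np.pad(gray.astype(float), radius, mode="edge")
    # Blur along rows, then along columns; "valid" trims the padding off.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    both = np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)
    return np.clip(np.rint(both), 0, 255).astype(np.uint8)


def preprocess(gray):
    """Crop into columns, threshold each one, then smooth it."""
    return [gaussian_blur(remove_watermark(col)) for col in crop_columns(gray)]
```

Each returned array is one column, ready to be passed to the OCR step on its own. Note that blurring after a hard threshold turns the binary image back into grayscale, which may or may not help recognition.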

The preprocessed image ends up looking like this: [image]

The specific aim I have in mind is achieving the highest accuracy possible, while being able to detect the spaces present in the text. In particular I want to be able to recognize line-wrapped features such as "Xtronic CVT® (Continuously Variable Transmission) Sport & Eco modes". These are indicated by subtle indents from the left margin, and they are crucial features to capture.

With the preprocessing routines I have tried so far, including the one above, I achieve mediocre character recognition accuracy and bad indent detection. Note that I have experimented with various Tesseract options such as page segmentation modes (psm) and the preserve_interword_spaces option, without success. I have 2 questions:
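To make the indent problem concrete, here is a hedged sketch of one way to look for indents from word bounding boxes instead of from spaces in the recognized text. It assumes the `pytesseract` wrapper and its `image_to_data` call with `Output.DICT`. The helper names `line_indents` and `ocr_with_indents`, the `--psm 6` choice, and the 5-pixel tolerance are illustrative assumptions, not settings known to work on these images.

```python
def line_indents(data, tolerance=5):
    """Group word boxes into lines and flag lines that start right of the margin.

    `data` is a dict shaped like the result of
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT).
    Returns a list of (line_text, is_indented) tuples in reading order.
    """
    lines = {}
    for i, text in enumerate(data["text"]):
        if not text.strip():
            continue  # non-word rows (page, block, line) carry empty text
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        entry = lines.setdefault(key, {"left": data["left"][i], "words": []})
        entry["left"] = min(entry["left"], data["left"][i])
        entry["words"].append(text)
    if not lines:
        return []
    # Treat the leftmost line start in the column as the margin.
    margin = min(entry["left"] for entry in lines.values())
    return [
        (" ".join(entry["words"]), entry["left"] - margin > tolerance)
        for _, entry in sorted(lines.items())
    ]


def ocr_with_indents(image, psm=6):
    """Run Tesseract on one column image and report per-line indentation."""
    import pytesseract  # imported here so line_indents works without it

    config = f"--psm {psm} -c preserve_interword_spaces=1"
    data = pytesseract.image_to_data(
        image, config=config, output_type=pytesseract.Output.DICT)
    return line_indents(data)
```

A line flagged as indented would then be treated as the continuation of the feature on the line above it. This only works if Tesseract segments the lines correctly in the first place, so it does not by itself address the recognition accuracy.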

1. Given the quality of the original image, what kind of accuracy is theoretically achievable?

2. Is it possible to reliably capture the indents in this image, and what kind of preprocessing might help me with that?

