python - Tesseract - recognizing indents
I'm trying to perform OCR with Tesseract on images such as this one.
In the preprocessing part I am cropping the 3 separate columns, applying thresholding to get rid of the watermark, and applying (Gaussian) blurring to smooth out rough pixels (I'm doing this programmatically with Python).
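To make the steps above concrete, here is a minimal sketch of such a pipeline. It assumes OpenCV (`cv2`) is the imaging library, that the three columns are of equal width, and that the watermark is lighter than the text; none of this is confirmed by the post. The names `column_bounds` and `preprocess_columns`, the threshold of 150 and the 3x3 kernel are illustrative placeholders, not the values actually used.

```python
def column_bounds(width, n_columns=3):
    """Return (start, end) pixel ranges that split `width` into equal columns.

    Assumption: the columns are equally wide. Real pages may need
    hand-picked or detected boundaries instead.
    """
    edges = [round(i * width / n_columns) for i in range(n_columns + 1)]
    return list(zip(edges[:-1], edges[1:]))


def preprocess_columns(path, n_columns=3, thresh=150, blur_ksize=3):
    """Crop each column, threshold away light pixels, then blur lightly."""
    import cv2  # imported here so column_bounds is usable without OpenCV

    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(path)

    columns = []
    for x0, x1 in column_bounds(gray.shape[1], n_columns):
        crop = gray[:, x0:x1]
        # Pixels brighter than `thresh` (e.g. a faint watermark) become white,
        # everything else (the darker text) becomes black.
        _, binary = cv2.threshold(crop, thresh, 255, cv2.THRESH_BINARY)
        # A small Gaussian kernel softens the jagged edges left by thresholding.
        # `blur_ksize` must be a positive odd number.
        columns.append(cv2.GaussianBlur(binary, (blur_ksize, blur_ksize), 0))
    return columns
```

A fixed global threshold is the simplest choice here; if the watermark's brightness varies across the page, the threshold value would have to be tuned per image.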
The preprocessed image ends up looking like this.
The specific aim I have in mind is achieving the highest accuracy possible, while being able to detect the spaces present in the text. In particular, I want to be able to recognize line-wrapped features such as "Xtronic CVT® (Continuously Variable Transmission) Sport & Eco modes". These are indicated by subtle indents from the left margin, and they are crucial features to capture.
With the preprocessing routines I have tried so far, including the one above, I achieve mediocre character recognition accuracy and bad indent detection. Note that I have experimented with various Tesseract options such as the page segmentation modes (psm) and the preserve_interword_spaces option, without success. I have 2 questions:
1. Given the quality of the original image, what kind of accuracy is theoretically achievable?
2. Is it possible to reliably capture the indents in this image, and what kind of preprocessing might help me do that?
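For the second question, one approach that might be worth trying is to read the indents from the word bounding boxes rather than from spaces in the recognized text. The sketch below assumes the `pytesseract` wrapper and its `image_to_data` call with dictionary output. The helper names `line_indents` and `ocr_with_indents`, the 10-pixel indent cutoff and the choice of `--psm 6` are hypothetical, and whether this works on these particular images is untested.

```python
from collections import defaultdict

# Options mentioned in the question; --psm 6 treats the crop as one text block.
TESS_CONFIG = "--psm 6 -c preserve_interword_spaces=1"


def line_indents(data, min_indent_px=10):
    """Return (line_text, indent_px, is_indented) for each recognized line.

    `data` is a dict of parallel lists with the keys "text", "left",
    "block_num", "par_num" and "line_num", as produced by
    pytesseract.image_to_data(..., output_type=Output.DICT).
    """
    lefts = defaultdict(list)
    words = defaultdict(list)
    for i, text in enumerate(data["text"]):
        if not text.strip():
            continue  # skip the empty entries used for blocks/paragraphs/lines
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        lefts[key].append(data["left"][i])
        words[key].append(text)
    if not lefts:
        return []

    # Assumption: the leftmost word on the page marks the true left margin.
    margin = min(min(xs) for xs in lefts.values())
    result = []
    for key in sorted(lefts):
        indent = min(lefts[key]) - margin
        result.append((" ".join(words[key]), indent, indent >= min_indent_px))
    return result


def ocr_with_indents(image):
    """Run Tesseract on one column image and report per-line indents."""
    import pytesseract  # imported here so line_indents works without it

    data = pytesseract.image_to_data(
        image, config=TESS_CONFIG, output_type=pytesseract.Output.DICT
    )
    return line_indents(data)
```

Lines flagged as indented could then be appended to the preceding line to rebuild a wrapped feature. The cutoff would need tuning against the actual indent width in the scans.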