Thursday, 9 April 2015

Problem with recognition of numbers 3 and 8

Hi, I'm having a problem with recognition of an invoice image: most of the 8 characters are being read as 3s.

Attached is the image I'm using.

I have tried different page segmentation modes (PSM) and some basic configuration options (resolution, disabling dawg loading).

Any help is appreciated.

Attachments (1)
test1.tif
177 KB
Dmitri Silaev
24 Feb
You need to upscale the image and then blur it slightly; after that it should work.
For upscaling, I tried Lanczos with a factor of 3x. This eliminates most of the "8 vs. 3" errors. Don't forget that your source TIFF is black and white (2 colors), so you have to save the upscaled result in a format with more color depth, e.g. as a 24-bit PNG.
For blurring, I used FastStone Image Viewer's Blur with a parameter of 14. If you want to use ImageMagick instead, I don't know exactly how that parameter relates to Gaussian blur sigma, so you will have to experiment.
After that, a standard Tesseract command line works well. At least there are no more "8 vs. 3" errors.
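
The two steps above could be sketched in Python roughly as follows. This is only an illustration, assuming the Pillow library is installed; the function name `preprocess_for_ocr` and the default `blur_radius` are made up for this example, and the radius is a starting guess, not an equivalent of FastStone's "14".

```python
from PIL import Image, ImageFilter


def preprocess_for_ocr(img, scale=3, blur_radius=1.0):
    """Upscale with Lanczos, then apply a light Gaussian blur."""
    # A bilevel (mode "1") image is converted to 24-bit RGB first: Pillow
    # falls back to nearest-neighbour resampling for mode "1", and the
    # upscaled result needs intermediate gray levels anyway.
    rgb = img.convert("RGB")
    width, height = rgb.size
    upscaled = rgb.resize((width * scale, height * scale), Image.LANCZOS)
    # Assumed value: tune blur_radius by experiment on your own images.
    return upscaled.filter(ImageFilter.GaussianBlur(radius=blur_radius))
```

For the attached file this would be used as `preprocess_for_ocr(Image.open("test1.tif")).save("test1_up.png")` (the output name is arbitrary), and the PNG is then passed to Tesseract as usual, e.g. `tesseract test1_up.png out`.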

Best regards,
Dmitri Silaev
www.CustomOCR.com



Andy Brandt
20:06 (3 hours ago)
I'm having a similar issue with a font that I've trained for numbers and a few symbols only. I've attached a sample of the numbers. In my case it is detecting 2s as 8s.

I tried using a Gaussian blur and it appears to help. The results also change depending on how much blur I apply. Do you know why this is?
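
One way to look at this dependence systematically is to run the same image through several blur radii and compare the output side by side. The sketch below assumes Pillow; `blur_variants`, `compare_ocr` and the list of radii are hypothetical names and values for illustration, and `ocr` is any callable you supply that takes an image and returns text (for example `pytesseract.image_to_string`, if you use that wrapper). A plausible factor is that each radius changes the gray levels along the stroke edges, so the shapes that survive binarization differ from one radius to the next.

```python
from PIL import Image, ImageFilter


def blur_variants(img, radii=(0.5, 1.0, 1.5, 2.0)):
    """Return {radius: blurred grayscale copy} for each requested radius."""
    gray = img.convert("L")
    return {r: gray.filter(ImageFilter.GaussianBlur(radius=r)) for r in radii}


def compare_ocr(img, ocr, radii=(0.5, 1.0, 1.5, 2.0)):
    """Run the supplied ocr callable on each variant; return {radius: result}."""
    return {r: ocr(variant) for r, variant in blur_variants(img, radii).items()}
```

Printing the returned dictionary shows at which radius the "2 vs. 8" confusion appears or disappears for a given sample.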

Do you know if it would also help to blur the images when training Tesseract?

Thanks!
Andy
Attachments (1)
txt.png
19 KB
