Hi, I'm having a problem with OCR on an invoice image: most of the 8 characters are being read as 3s.
Attached is the image I'm using.
I have tried different PSM (page segmentation mode) values and some basic configuration options (resolution, not loading the dawgs).
Any help is appreciated.
| Dmitri Silaev | Feb 24 |
You need upscaling followed by a bit of blurring, and it should work. For upscaling I personally tried Lanczos with a factor of 3x, which eliminates most of the "8 vs. 3" errors. Don't forget that your source TIFF is BW (2 colors), so you have to save the upscaled result as, e.g., a 24-bit PNG.
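A minimal sketch of that preprocessing in Python, assuming the Pillow library is available. The function name `prepare_for_ocr`, the file names, and the blur radius of 1.0 are illustrative assumptions, not values from this thread; only the 3x Lanczos factor and the 24-bit PNG output come from the advice above.

```python
from PIL import Image, ImageFilter


def prepare_for_ocr(img, scale=3, blur_radius=1.0):
    """Upscale with Lanczos, then apply a light Gaussian blur.

    `blur_radius` is an assumed starting value; tune it per image.
    """
    # A 1-bit (BW) image has no gray levels to interpolate into, and
    # Pillow resizes mode "1" images with nearest-neighbour regardless of
    # the requested filter, so convert to 24-bit RGB first.
    rgb = img.convert("RGB")
    new_size = (rgb.width * scale, rgb.height * scale)
    upscaled = rgb.resize(new_size, resample=Image.LANCZOS)
    return upscaled.filter(ImageFilter.GaussianBlur(radius=blur_radius))


# Example usage (file names are hypothetical):
#   src = Image.open("invoice.tif")
#   prepare_for_ocr(src).save("invoice_3x.png")  # saved as a 24-bit PNG
```

The resulting PNG can then be passed to tesseract in place of the original TIFF.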
| Andy Brandt | 20:06 (3 hours ago) |
I'm having a similar issue with a font that I've trained for numbers and a few symbols only. I've attached a sample of the numbers. In my case it is detecting 2s as 8s.
I tried using a Gaussian blur and it appears to help. It also appears that the results change depending on how much or how little blur is applied. Do you know why this is?
Do you know if it would help to blur the images when training tesseract too?
Thanks!
Andy