OCR: Second Try based on user feedback #26

Open
seandenigris opened this issue Oct 27, 2020 · 0 comments
seandenigris commented Oct 27, 2020

Use case:

  • When OCRing a receipt, the amount 12.43 is mistakenly read as 12-43.
  • We know from the domain that the amount won't be negative here, and there certainly wouldn't be a dash in the middle. The format must be something like: \$?\d+(\.\d+)?
  • The user indicates that this particular area on the receipt is an amount
  • We want to retry to OCR just that area as an amount
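As a rough illustration of the compatibility check implied by the format above, here is a minimal Python sketch. The names `AMOUNT_PATTERN` and `is_amount` are hypothetical and not part of the project; the pattern is the one from the use case, written with backslash escapes.

```python
import re

# Pattern from the use case: optional "$", digits, optional fractional part.
# (Hypothetical name; not taken from the project.)
AMOUNT_PATTERN = re.compile(r"\$?\d+(\.\d+)?")


def is_amount(text: str) -> bool:
    """Return True when the whole string, minus surrounding whitespace, matches the amount format."""
    return AMOUNT_PATTERN.fullmatch(text.strip()) is not None


print(is_amount("12.43"))  # True
print(is_amount("12-43"))  # False: the dash is not allowed by the pattern
```

Using `fullmatch` rather than `search` matters here: `search` would happily find "12" inside "12-43" and report a match.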

How to go about this? Two ways that pop out are: 1) give a pattern to the engine, or, if we can't do that, 2) restrict the allowed characters to digits and the decimal point (fairly straightforward with Tesseract, although there may have been a bug prior to 4.1)
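For option 2, one possible shape is to re-run the Tesseract command-line tool on the cropped area with the `tessedit_char_whitelist` config variable set. The sketch below only builds the argument list; the function name `amount_ocr_command` is hypothetical, and `--psm 7` (treat the image as a single text line) is an assumption about what the cropped amount area looks like.

```python
from typing import List


def amount_ocr_command(image_path: str, allowed: str = "0123456789.") -> List[str]:
    """Build a tesseract CLI invocation restricted to the characters in `allowed`.

    Assumes `image_path` is already cropped to the area the user marked as an amount.
    """
    return [
        "tesseract",
        image_path,
        "stdout",            # write recognized text to standard output
        "--psm", "7",        # assumption: the cropped area is a single line of text
        "-c", f"tessedit_char_whitelist={allowed}",
    ]


print(amount_ocr_command("amount_area.png"))
```

The resulting list could be passed to `subprocess.run(cmd, capture_output=True, text=True)` on a machine with Tesseract installed. Add "$" to `allowed` if the currency sign should be recognized as well.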

Next question: who needs to do/know about this? In our OCR element, we currently have the capability for the user to say "this area should be an amount". Now we have the text and location. I guess for now we can put it in the element. We want to:

  1. See if the existing text is compatible
    2a. If it is, use it
    2b. If it isn't, re-OCR using some rules and try again (i.e. go back to step 1, but don't get into an infinite loop)
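The steps above could be sketched roughly as follows, in Python. Everything here is hypothetical (`resolve_text`, `is_compatible`, `retries`): the idea is simply that each re-OCR rule is tried at most once from a finite list, which rules out an infinite loop by construction.

```python
from typing import Callable, Optional, Sequence


def resolve_text(
    existing: str,
    is_compatible: Callable[[str], bool],
    retries: Sequence[Callable[[], str]],
) -> Optional[str]:
    """Return the first compatible text, or None when every retry fails.

    `retries` is a finite sequence of re-OCR callables (for example, one per
    rule set), so the loop always terminates.
    """
    # Step 1 / 2a: use the existing text if it is already compatible.
    if is_compatible(existing):
        return existing
    # Step 2b: re-OCR with each rule in turn, validating every result.
    for reocr in retries:
        candidate = reocr()
        if is_compatible(candidate):
            return candidate
    return None


# Stand-in validator and re-OCR callable, for illustration only.
looks_numeric = lambda s: s.replace(".", "", 1).isdigit()
print(resolve_text("12-43", looks_numeric, [lambda: "12.43"]))  # 12.43
```

Returning `None` leaves it to the caller to decide what happens when no rule produces a compatible result, for example asking the user to type the amount.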

CURRENT: Validation of the number is embedded in the visitor/reader. Should we attempt to validate first?

seandenigris added a commit that referenced this issue Oct 29, 2020
It seems we have a good hook set up, now to implement the second pass