OCR: Second Try based on user feedback #26

seandenigris · 2020-10-27T17:58:08Z

Use case:

1. When OCRing a receipt, the amount 12.43 is mistakenly read as 12-43.
2. We know from the domain that there won't be a negative amount here, and there certainly wouldn't be a dash in the middle. The format must be something like `\$?\d+(\.\d+)?`.
3. The user indicates that this particular area on the receipt is an amount.
4. We want to retry OCRing just that area as an amount.
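A minimal sketch of the format check, written in Python for illustration only (the helper name `is_amount` is hypothetical, not part of the project). Note the escaped `\.`: an unescaped `.` matches any character, so `\d+(.\d+)?` would happily accept `12-43`. Using `fullmatch` also matters, because a partial match would find `12` inside `12-43`.

```python
import re

# Pattern from the issue: optional "$", digits, optional decimal part.
# The dot is escaped so it only matches a literal decimal point.
AMOUNT_PATTERN = re.compile(r"\$?\d+(\.\d+)?")


def is_amount(text: str) -> bool:
    """Hypothetical helper: True if the OCR'd text looks like a receipt amount."""
    # fullmatch anchors the pattern to the whole string.
    return AMOUNT_PATTERN.fullmatch(text.strip()) is not None


print(is_amount("12.43"))   # True
print(is_amount("$12.43"))  # True
print(is_amount("12-43"))   # False
```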

How do we go about this? Two approaches come to mind: 1) give a pattern to the engine, or, if we can't do that, 2) restrict the allowed characters to digits and the decimal point (fairly straightforward with Tesseract, although there may have been a bug prior to 4.1).
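Option 2 might look roughly like the sketch below, which shells out to the Tesseract CLI on an image already cropped to the marked area. The function names are hypothetical; `--psm 7` (treat the image as a single text line) and `-c tessedit_char_whitelist=...` are standard Tesseract options, but whether the whitelist is honoured depends on the Tesseract version (the issue suspects problems before 4.1). Including `$` in the whitelist is an assumption based on the expected format.

```python
import subprocess
from typing import List

# Assumed whitelist: digits and the decimal point from the issue, plus "$"
# because the expected amount format allows a leading dollar sign.
AMOUNT_WHITELIST = "0123456789.$"


def tesseract_amount_command(image_path: str) -> List[str]:
    """Build a Tesseract CLI call restricted to amount characters.

    --psm 7 treats the image as a single line of text, which suits a cropped
    amount field; -c sets tessedit_char_whitelist so only the listed
    characters can be recognised.
    """
    return [
        "tesseract", image_path, "stdout",
        "--psm", "7",
        "-c", "tessedit_char_whitelist=" + AMOUNT_WHITELIST,
    ]


def ocr_amount_region(image_path: str) -> str:
    """Run Tesseract on an image already cropped to the user-marked area.

    Requires the tesseract binary on PATH.
    """
    result = subprocess.run(
        tesseract_amount_command(image_path),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Passing the arguments as a list (no shell) avoids having to quote the `$` in the whitelist.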

Next question: who needs to do or know about this? Our OCR element already lets the user say "this area should be an amount", and at that point we have both the text and its location. For now, I guess we can put the logic in the element. We want to:

1. See if the existing text is compatible.
2a. If it is, use it.
2b. If it isn't, re-OCR using some rules and try again (i.e. go back to step 1, but don't get into an infinite loop).
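The steps above could be sketched as a bounded loop. This is a Python illustration under assumptions: `OcrElement`, `resolve_amount`, the `reocr` callback (standing in for "re-OCR this area using amount rules") and `max_attempts` are all hypothetical names, not the project's API. The attempt cap is what keeps step 2b from looping forever.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

AMOUNT = re.compile(r"\$?\d+(\.\d+)?")
Bounds = Tuple[int, int, int, int]  # assumed (left, top, width, height)


@dataclass
class OcrElement:
    """Hypothetical stand-in for the OCR element: recognised text plus location."""
    text: str
    bounds: Bounds


def resolve_amount(element: OcrElement,
                   reocr: Callable[[Bounds], str],
                   max_attempts: int = 2) -> Optional[str]:
    """For an element the user marked as an amount: validate, else retry a few times."""
    text = element.text
    attempts = 0
    # Step 1: is the current text compatible with the amount format?
    while not AMOUNT.fullmatch(text.strip()):
        if attempts == max_attempts:
            return None                # give up rather than loop forever
        text = reocr(element.bounds)   # step 2b: re-OCR just this area
        attempts += 1
    element.text = text.strip()        # step 2a: use the compatible text
    return element.text
```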

CURRENT: Validation of the number is embedded in the visitor/reader. Should we attempt to validate first?
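One way to pull validation out of the reader, sketched in Python with hypothetical names (`validate_amount`, `read_amount`): the element can call the cheap check first to decide whether a second pass is needed, and the reader only converts text that matches, returning `None` instead of failing partway through parsing. Using `Decimal` for money values is a design choice assumed here, not something the issue specifies.

```python
import re
from decimal import Decimal
from typing import Optional

# Group 1 captures the numeric part without the optional "$".
AMOUNT = re.compile(r"\$?(\d+(?:\.\d+)?)")


def validate_amount(text: str) -> bool:
    """Cheap check the element can run before handing text to the reader."""
    return AMOUNT.fullmatch(text.strip()) is not None


def read_amount(text: str) -> Optional[Decimal]:
    """Reader that only converts text matching the amount format."""
    match = AMOUNT.fullmatch(text.strip())
    if match is None:
        return None  # signals "needs a second pass" rather than raising
    return Decimal(match.group(1))
```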


It seems we have a good hook set up, now to implement the second pass

seandenigris added a commit that referenced this issue Oct 29, 2020

Issue #26: OCR 2nd Pass Guided by User (WIP)

2596686
