When OCRing a receipt, the amount 12.43 is mistakenly read as 12-43.
We know from the domain that there won't be a negative amount here, and there certainly wouldn't be a dash in the middle. The format must be something like: `\$?\d+(\.\d+)?`
The user indicates that this particular area on the receipt is an amount.
We want to re-OCR just that area as an amount.
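A minimal sketch of checking text against the amount format described above. The names `AMOUNT_RE` and `is_amount` are hypothetical and are not taken from the existing code; the pattern is the one from this issue, applied to the whole string.

```python
import re

# Assumed amount format from this issue: optional "$", digits, optional decimal part.
AMOUNT_RE = re.compile(r"\$?\d+(\.\d+)?")

def is_amount(text: str) -> bool:
    """Return True if the whole (whitespace-trimmed) string looks like an amount."""
    return AMOUNT_RE.fullmatch(text.strip()) is not None
```

With this check, `is_amount("12.43")` is true while `is_amount("12-43")` is false, which is the signal to retry the OCR on that area.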
How do we go about this? Two ways come to mind: 1) give a pattern to the engine, or, if we can't do that, 2) restrict the allowed characters to digits and the decimal point (fairly straightforward with Tesseract, although there may have been a bug prior to 4.1).
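Option 2 could look roughly like the sketch below, assuming the `pytesseract` wrapper is used and the area has already been cropped to a PIL image. `AMOUNT_CHARS`, `amount_config`, and `ocr_amount` are hypothetical names, and including `$` in the allowed characters is an assumption based on the format above. `tessedit_char_whitelist` and `--psm 7` (treat the image as a single text line) are standard Tesseract options; as noted, the whitelist may not behave as expected before 4.1.

```python
# Characters an amount may contain; allowing "$" is an assumption.
AMOUNT_CHARS = "0123456789.$"

def amount_config(chars: str = AMOUNT_CHARS) -> str:
    """Build Tesseract options: single text line, only `chars` allowed."""
    return f"--psm 7 -c tessedit_char_whitelist={chars}"

def ocr_amount(image) -> str:
    """Re-OCR a cropped region (a PIL image) with the character whitelist applied."""
    import pytesseract  # imported lazily so amount_config() works without it installed
    return pytesseract.image_to_string(image, config=amount_config()).strip()
```

A whitelist only constrains which characters can appear, not their order, so the result would still need to be checked against the amount format afterwards.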
Next question: who needs to do or know about this? In our OCR element, the user can already say "this area should be an amount". Now we have the text and location. I guess for now we can put it in the element. We want to:
1. See if the existing text is compatible
2a. If it is, use it
2b. If it isn't, re-OCR using some rules and try again (i.e. go to step 1, but don't get into an infinite loop)
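The steps above can be sketched as a bounded validate-then-retry loop. This is only an illustration: `resolve_amount`, the `re_ocr` callable, and `max_retries` are hypothetical names, and the real re-OCR call would come from the OCR element rather than being passed in like this.

```python
import re
from typing import Callable, Optional

# Assumed amount format from this issue.
AMOUNT_RE = re.compile(r"\$?\d+(\.\d+)?")

def resolve_amount(
    text: str,
    re_ocr: Callable[[], str],
    max_retries: int = 1,
) -> Optional[str]:
    """Return a valid amount string, re-OCRing at most `max_retries` times.

    Returns None if no attempt produced text that matches the amount format.
    """
    candidate = text.strip()
    for attempt in range(max_retries + 1):
        # Step 1: see if the current text is compatible.
        if AMOUNT_RE.fullmatch(candidate):
            return candidate  # Step 2a: it is, use it.
        # Step 2b: it isn't, so re-OCR with stricter rules, unless retries ran out.
        if attempt < max_retries:
            candidate = re_ocr().strip()
    return None
```

Capping the number of retries is what prevents the infinite loop; what the caller does when `None` comes back (keep the original text, flag the field for the user) is left open here.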
CURRENT: Validation of the number is embedded in the visitor/reader. Should we attempt to validate first?