I have provided the image from which I am trying to extract text using Tesseract OCR.
Along with it, I have also provided the text extracted from the image.
As can be seen from the images, the extracted text is not very accurate: negative signs have been omitted, and some unwanted characters appear in the output. (I have marked some of the incorrect results with blue boxes.)
I have tried to improve the results by preprocessing the images and by changing the model's parameters. Specifically, I have tried (a rough sketch follows the list):
binarizing the images
HDR processing of the images
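For reference, a minimal sketch of the binarization step and the kind of parameter changes I experimented with (the file name and exact values here are placeholders, not my actual ones):

```python
import cv2
import pytesseract

# Binarize the table image before handing it to Tesseract
# ("table.png" is a placeholder file name).
img = cv2.imread("table.png", cv2.IMREAD_GRAYSCALE)
_, binarized = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Parameter changes: LSTM engine (--oem 1) and page segmentation mode 6
# (treat the image as a single uniform block of text).
config = "--oem 1 --psm 6"
# A character whitelist can also be tried, though its support varies
# across Tesseract versions and engines:
# config += " -c tessedit_char_whitelist=0123456789.,-€"
text = pytesseract.image_to_string(binarized, config=config)
print(text)
```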
Even then, such inconsistencies remain.
How can I improve the detection and extraction of text in Tesseract? I have also tried PaddleOCR for the same task, but even there, symbols such as the euro sign and some negative signs are not detected.
@zdenop Thank you for your response. I tried every step mentioned in that documentation. Even then, some decimal points are omitted, e.g. 22.5 is read as 225. Moreover, some numbers are wrongly detected, e.g. -9 is extracted as "= )". Some negative signs are also omitted.
I have tried preprocessing the images and have implemented the following (a rough sketch of the pipeline is below the list):
noise removal
Canny edge detection
Hough line transform
binarization
HDR processing
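In case it helps to reproduce, this is roughly what that pipeline looks like (the file name and thresholds are placeholders, and the HDR step is omitted):

```python
import cv2
import numpy as np
import pytesseract

# "table.png" is a placeholder; thresholds below are illustrative values.
img = cv2.imread("table.png", cv2.IMREAD_GRAYSCALE)

# Noise removal
denoised = cv2.fastNlMeansDenoising(img, h=10)

# Binarization (Otsu)
_, binary = cv2.threshold(denoised, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Canny edge detection followed by a probabilistic Hough line transform
# to locate the long table rulings.
edges = cv2.Canny(binary, 50, 150)
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=100,
                        minLineLength=100, maxLineGap=10)

# One possible use of the detected lines: paint the rulings white so they
# are not misread as minus signs or stray characters.
cleaned = binary.copy()
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(cleaned, (int(x1), int(y1)), (int(x2), int(y2)), 255, 3)

text = pytesseract.image_to_string(cleaned, config="--psm 6")
print(text)
```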
Please provide your guidance and help me resolve this issue.
And what did you learn about table recognition?
Which forum posts about table recognition have you read, and what do other issues say about table recognition? You should check these sources BEFORE posting the issue.