Some words are not recognized correctly with the language file tessdata_best\eng.traineddata #4318

Open
ProgramacionDk opened this issue Sep 18, 2024 · 0 comments

Comments

@ProgramacionDk

Current Behavior

FGO073

FGO037

FGO037

FG101

FG114

FGO037
FG184

FG095
FG184

resultado.txt

Expected Behavior

FG073

FG037

FG037

FG101

FG114

FG037
FG184

FG095
FG184

Suggested Fix

No response

tesseract -v

tesseract v5.4.0.20240606
leptonica-1.84.1
libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 3.0.1) : libpng 1.6.43 : libtiff 4.6.0 : zlib 1.3 : libwebp 1.4.0 : libopenjp2 2.5.2
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found libarchive 3.7.4 zlib/1.3.1 liblzma/5.6.1 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.6

Operating System

Windows 10

Other Operating System

No response

uname -a

No response

Compiler

No response

CPU

No response

Virtualization / Containers

No response

Other Information

Some words are not recognized correctly. For example, FG073 is recognized as FGO073, with an extra letter O after FG.

I ran tesseract on the attached image with the English trained data tessdata_best\eng.traineddata, downloaded from https://github.com/tesseract-ocr/tessdata_best.
With the trained data tessdata_fast\eng.traineddata, it works fine.

tesseract v5.4.0.20240606 compiled by UB Mannheim
https://github.com/UB-Mannheim/tesseract/wiki
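
To make the comparison between the two models easier to repeat, here is a minimal sketch in Python. It builds a tesseract command line that selects a model directory with `--tessdata-dir`, and it diffs the recognized codes against the expected ones listed above. The image filename `ocr.png` and the helper names `build_command` and `diff_tokens` are assumptions for illustration only; the output base `resultado` matches the attached resultado.txt.

```python
import subprocess


def build_command(image, output_base, tessdata_dir, lang="eng"):
    """Return the tesseract argument list for one model directory."""
    return ["tesseract", image, output_base,
            "--tessdata-dir", tessdata_dir, "-l", lang]


def run_ocr(image, output_base, tessdata_dir):
    """Run tesseract; it writes <output_base>.txt. Not called automatically."""
    subprocess.run(build_command(image, output_base, tessdata_dir), check=True)


def diff_tokens(actual, expected):
    """Return (position, actual, expected) for every mismatching pair."""
    return [(i, a, e)
            for i, (a, e) in enumerate(zip(actual, expected))
            if a != e]


# Codes copied from "Current Behavior" and "Expected Behavior" in this report.
current = ["FGO073", "FGO037", "FGO037", "FG101", "FG114",
           "FGO037", "FG184", "FG095", "FG184"]
expected = ["FG073", "FG037", "FG037", "FG101", "FG114",
            "FG037", "FG184", "FG095", "FG184"]

for position, got, want in diff_tokens(current, expected):
    print(f"token {position}: got {got}, expected {want}")
```

Calling `run_ocr("ocr.png", "resultado", "tessdata_best")` and then again with `"tessdata_fast"` (hypothetical paths) would produce the two outputs to compare.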

ocr
