Determine available languages and provide a choice for them #8
Comments
Keep it simple. I think it would be sufficient to have a user option (similar to the tesseract path option) for the language / script which is preset to
Regarding 1: if this works fine, one could implement a dropdown menu to just select the language. I think that would be enough.
A simple solution, a free text box in the new preferences as @stweil suggested, is now implemented. I am aware of the Tesseract command that shows all available languages, but I don't see a way to call it from Zotero and save its output somewhere. But yes, we could create a file with something like this. Let us wait a little longer and see how well the simple solution already works in practice.
I had a related problem: I am used to typing "de" rather than "deu" in similar cases (which I should of course have verified by running "tesseract --list-langs"), and it took me quite a long time to find the solution, also because the system sadly doesn't show any error message in that case. A dropdown box (or simply more examples!) would have helped a lot!
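To illustrate the kind of error message that would have caught the "de" vs. "deu" mix-up, here is a minimal Python sketch. It is not the extension's code: the function name find_unknown_langs and the example list of installed models are assumptions made for illustration. It relies only on the fact that Tesseract accepts several models joined with "+" (for example "deu+eng").

```python
def find_unknown_langs(setting: str, available: list[str]) -> list[str]:
    """Return the parts of a language setting that are not installed.

    Tesseract accepts several models joined with '+', e.g. 'deu+eng',
    so each part is checked separately.
    """
    requested = [part.strip() for part in setting.split("+") if part.strip()]
    return [part for part in requested if part not in available]


# Assumed example list; in practice this would come from `tesseract --list-langs`.
available = ["deu", "eng", "script/Fraktur", "script/Latin"]

for setting in ("deu", "de", "deu+eng", "script/Fraktur+en"):
    unknown = find_unknown_langs(setting, available)
    if unknown:
        print(f"{setting!r}: unknown language(s) {unknown}; choose from {available}")
    else:
        print(f"{setting!r}: ok")
```

A check like this could run when the preference is saved, so a wrong code is reported right away instead of failing silently during OCR.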
Currently, we use a fixed language, deu or eng, for OCR with Tesseract. But in a lot of cases it is even better to choose script/Latin, or for old texts script/Fraktur. Other languages or scripts should also be available to choose from.

There are several things to consider here:

1. We can call tesseract --list-langs from the extension, but we cannot access the output or pipe the output somewhere from Zotero. Should we just ship a one-liner script (a shell script for Linux/macOS and a bat file for Windows) which calls the command above and pipes its output to a file, which we can then analyze? Other ideas?
2. One would pick the deu model for German texts and the eng model for English texts. However, this might not always be that simple. For example, for older German texts one should maybe use the script/Fraktur model instead, and even the script/Latin model is quite often better for texts that include names in foreign languages etc.

CC @stweil @luerhard
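The "pipe it to a file and analyze it" idea could look roughly like the following Python sketch. It only illustrates the parsing step and is not the extension's code: the function names and the sample text are assumptions, and the sketch assumes the usual output layout of tesseract --list-langs, that is, one header line ending in a colon followed by one model name per line.

```python
from pathlib import Path


def parse_list_langs(output: str) -> list[str]:
    """Extract model names from the text printed by `tesseract --list-langs`.

    Assumes one header line ending in ':' followed by one model name
    per line (e.g. 'deu', 'eng', 'script/Fraktur').
    """
    names = []
    for line in output.splitlines():
        name = line.strip()
        # Skip blank lines and the header line.
        if not name or name.endswith(":"):
            continue
        names.append(name)
    return sorted(names)


def read_langs_file(path: str) -> list[str]:
    """Parse the file that a (hypothetical) helper script wrote."""
    return parse_list_langs(Path(path).read_text(encoding="utf-8"))


# Sample text in the assumed layout, standing in for the piped file.
sample = """List of available languages (4):
deu
eng
script/Fraktur
script/Latin
"""
print(parse_list_langs(sample))
```

The resulting list could then fill a dropdown or be used to check the value typed into the free text box.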