This repository has been archived by the owner. It is now read-only.

Fixing the OCR on the server side.
For some reason, the behavior of ocrmypdf seems to have changed. Previously we expected it to produce the .txt file directly; it now generates a PDF with the OCR-ed text overlaid on it. This commit fixes the issue by overwriting the original scanned PDF with the text-overlaid PDF and then running the usual pdftotext on this new PDF.
christian-oreilly authored and pafonta committed Aug 21, 2018
1 parent 5961509 commit 710d283
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion nat/restServer.py
@@ -53,7 +53,8 @@ def runOCR(fileName):
 app.OCRLock.release()
 
 # Run OCR
-run_ocrmypdf(fileName + ".pdf", fileName + ".txt")
+run_ocrmypdf(fileName + ".pdf", fileName + ".pdf")
+check_call(['pdftotext', '-enc', 'UTF-8', fileName + ".pdf", fileName + ".txt"])
 
 acquireLockWithTimeout()
 del app.OCRFiles[app.OCRFiles.index(fileName)]
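
The two-step flow this change introduces can be sketched in isolation as follows. This is a minimal illustration, not the repository's code: `run_ocrmypdf` is a project helper whose definition is not shown in this diff, so the sketch assumes it is roughly equivalent to calling the `ocrmypdf` command-line tool with an input and an output path. The names `build_ocr_commands` and `ocr_to_text` and the `runner` parameter are hypothetical and exist only for this example.

```python
import subprocess


def build_ocr_commands(file_name):
    """Return the two commands, in order, for a scan stored at file_name + '.pdf'."""
    pdf_path = file_name + ".pdf"
    txt_path = file_name + ".txt"
    return [
        # Step 1 (assumed equivalent of run_ocrmypdf): overwrite the scan
        # with a PDF that carries the OCR text layer.
        ["ocrmypdf", pdf_path, pdf_path],
        # Step 2 (same arguments as in the diff): extract the text layer
        # as UTF-8 plain text.
        ["pdftotext", "-enc", "UTF-8", pdf_path, txt_path],
    ]


def ocr_to_text(file_name, runner=subprocess.check_call):
    """Run both steps in order and return the path of the resulting text file.

    With the default runner, subprocess.check_call raises CalledProcessError
    as soon as a step exits with a non-zero status, so pdftotext never runs
    on a PDF that failed OCR.
    """
    for command in build_ocr_commands(file_name):
        runner(command)
    return file_name + ".txt"
```

Passing a stand-in `runner` (for example a list's `append` method) makes it possible to inspect the commands without having `ocrmypdf` or `pdftotext` installed.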