This repository has been archived by the owner on Feb 14, 2022. It is now read-only.

Moved to codeberg.org - https://codeberg.org/DecaTec/OCRmyFiles - Bash script for adding a text layer to PDF files and converting images in PDFs (with OCR).


DecaTec/OCRmyFiles


⚠️ Archived, moved to Codeberg: https://codeberg.org/DecaTec/OCRmyFiles ⚠️

This GitHub repository is therefore outdated and no longer maintained. Please update your references.

OCRmyFiles

Bash script for adding a text layer to PDF files and converting images in PDFs (with OCR).

Adds an OCR text layer to all PDF files in the given input directory and saves the new PDF files to the output directory.

When the input directory also contains image files (e.g. jpg, png), these are converted to (OCR'ed) PDFs.

All other file types are just copied from the input directory to the output directory.
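The processing described above can be pictured as a per-file dispatch on the file extension. The following is a minimal sketch under stated assumptions, not the code of OCRmyFiles.sh: the function names file_kind and process_dir are hypothetical, the list of image extensions is an assumption, --skip-text is just one example OCRmyPDF option, and images are OCR'ed with Tesseract's built-in "pdf" output configuration.

```shell
#!/usr/bin/env bash
# Minimal sketch of the per-file processing described above.
# NOT the original OCRmyFiles.sh: function names and the image extension list are assumptions.

# Print "pdf", "image" or "other" for a file name, based on its (case-insensitive) extension.
file_kind() {
    local name ext
    name=${1##*/}                       # strip any directory part
    case "$name" in
        *.*) ;;                         # has an extension, keep going
        *) echo other; return ;;
    esac
    ext=$(printf '%s' "${name##*.}" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
        pdf) echo pdf ;;
        jpg|jpeg|png|tif|tiff) echo image ;;
        *) echo other ;;
    esac
}

# Process every regular file at the top level of $1 and write the results to $2.
process_dir() {
    local in_dir=$1 out_dir=$2 src name
    mkdir -p "$out_dir"
    for src in "$in_dir"/*; do
        [ -f "$src" ] || continue       # skips subdirectories and an unmatched glob
        name=${src##*/}
        case "$(file_kind "$name")" in
            pdf)
                # Add an OCR text layer; --skip-text leaves pages that already contain text as they are.
                ocrmypdf --skip-text "$src" "$out_dir/$name" ;;
            image)
                # Tesseract's "pdf" config writes <outputbase>.pdf, i.e. photo.png -> photo.pdf.
                tesseract "$src" "$out_dir/${name%.*}" pdf ;;
            *)
                cp "$src" "$out_dir/" ;;
        esac
    done
}
```

For example, process_dir ~/scans ~/scans-ocr would handle every file at the top level of ~/scans; subdirectories are skipped in this sketch. Note that a.pdf and a.png would both end up as a.pdf in the output directory, so one would overwrite the other.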

Requirements

Usage

  • Download the script or clone the repository
  • Make the script executable: sudo chmod +x OCRmyFiles.sh
  • Modify the script to fit your needs:
    • Set default input/output directories
    • Modify the OCRmyPDF command line arguments (an overview of the available arguments is in the OCRmyPDF documentation)
    • Modify the Tesseract command line arguments (an overview of the available arguments is in the Tesseract documentation)
  • Call the script:
    • OCRmyFiles.sh (no parameters): uses the default input/output directories defined in the script itself
    • OCRmyFiles.sh <inputDir> <outputDir>: uses the specified input/output directories
  • The script may print some warnings or errors from Tesseract. In most cases these can be ignored, as the OCR text layer is created anyway.
  • You can also call this script with a cronjob for automated processing of PDFs/images:
    • As the user who should execute the cronjob, call crontab -e
    • Add the following to run the script e.g. every 30 minutes: */30 * * * * /path/to/the/script/OCRmyFiles.sh > /dev/null 2>&1
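The two calling conventions (no parameters vs. <inputDir> <outputDir>) can be handled with a small argument check at the top of a script. This is a hypothetical sketch: the variable names, the placeholder default paths and the resolve_dirs function are illustrative assumptions, not taken from OCRmyFiles.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of input/output directory handling (not the original script).
# Placeholder defaults: edit these to fit your setup.
DEFAULT_INPUT_DIR="/path/to/input"
DEFAULT_OUTPUT_DIR="/path/to/output"

# Sets INPUT_DIR and OUTPUT_DIR from the arguments, falling back to the defaults.
# Returns 1 on a wrong argument count or a missing input directory.
resolve_dirs() {
    case $# in
        0) INPUT_DIR=$DEFAULT_INPUT_DIR
           OUTPUT_DIR=$DEFAULT_OUTPUT_DIR ;;
        2) INPUT_DIR=$1
           OUTPUT_DIR=$2 ;;
        *) echo "Usage: $0 [<inputDir> <outputDir>]" >&2
           return 1 ;;
    esac
    if [ ! -d "$INPUT_DIR" ]; then
        echo "Input directory not found: $INPUT_DIR" >&2
        return 1
    fi
}
```

In a script this would be called as resolve_dirs "$@" || exit 1 before any processing starts. Returning a non-zero status on bad arguments lets the script stop before any files are touched.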
