Docker container to automatically optimize, OCR and tag PDF files and convert them to PDF/A

trasrikgaldifei/ocr_sidekick

OCR Sidekick

Intention

OCR Sidekick is a private project to automatically process PDF files coming from a scanner.

There are two main reasons for the project:

Therefore, a solution was needed that processes the PDF files unattended and passes them on to the Paperless consumer. This includes:

  • rectifying the scan
  • applying OCR
  • doing some guesswork regarding
    • sender / correspondent
    • document date
    • document title
    • tagging
  • generating a filename that makes the best use of the Paperless consumer
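
The guesswork and filename steps above can be sketched roughly as follows. This is only an illustration, not the code OCR Sidekick actually uses: the function names `guess_document_date`, `sanitize` and `build_filename`, the date regex, and the `Date - Correspondent - Title - tags` layout are all assumptions, and the exact pattern and date format the Paperless consumer accepts depend on its version and configuration.

```python
import re
from datetime import date
from typing import Optional, Sequence

# Hypothetical pattern: numeric day-month-year dates such as 14.03.2015 or 14/03/2015.
DATE_PATTERN = re.compile(r"\b(\d{1,2})[./](\d{1,2})[./](\d{4})\b")


def guess_document_date(text: str) -> Optional[date]:
    """Return the first valid day-month-year date found in the OCR text, if any."""
    for match in DATE_PATTERN.finditer(text):
        day, month, year = (int(part) for part in match.groups())
        try:
            return date(year, month, day)
        except ValueError:
            # Not a real calendar date (e.g. 31.02.2015), try the next candidate.
            continue
    return None


def sanitize(part: str) -> str:
    """Drop characters that clash with the file system or the ' - ' separator."""
    cleaned = re.sub(r'[\\/:*?"<>|]', " ", part).replace(" - ", " ")
    return re.sub(r"\s+", " ", cleaned).strip()


def build_filename(
    doc_date: Optional[date],
    correspondent: str,
    title: str,
    tags: Sequence[str] = (),
) -> str:
    """Join the guessed metadata into an assumed 'Date - Correspondent - Title - tags' name."""
    parts = []
    if doc_date is not None:
        parts.append(f"{doc_date:%Y%m%d}")
    parts.append(sanitize(correspondent))
    parts.append(sanitize(title))
    if tags:
        parts.append(",".join(sanitize(tag) for tag in tags))
    return " - ".join(parts) + ".pdf"


if __name__ == "__main__":
    ocr_text = "Invoice dated 14.03.2015 from ACME Corp"
    print(build_filename(guess_document_date(ocr_text), "ACME Corp", "Invoice 42", ["invoice"]))
```

Keeping the guessing and the naming in separate functions means each guess (date, correspondent, title, tags) can fail or be refined independently while the filename layout stays in one place.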

The project

This project is mainly based on multiple open source projects:

The main focus was on easy, unattended use, which led to a Docker image controlled by config files. Most of the settings should be self-explanatory.
