Using scanners and OCR to grep dead trees the easy way (Linux only)

mjourdan/paperwork

 
 

Paperwork

Description

Paperwork is a personal document manager for scanned documents (and PDFs).

It's designed to be easy and fast to use. The idea behind Paperwork is "scan & forget": You should be able to just scan a new document and forget about it until the day you need it again.

In other words, let the machine do most of the work for you.

Screenshots

Main Window & Scan

Search Suggestions

Labels

Settings window

Details

Papers are organized into documents. Each document contains pages.

Paperwork relies mainly on four other pieces of software:

  • SANE: to scan the pages
  • Tesseract: to extract the words from the pages (OCR)
  • GTK/Glade: for the user interface
  • Whoosh: to index and search documents, and to provide keyword suggestions
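To make the indexing and suggestion role more concrete, here is a minimal pure-Python sketch of an inverted index with prefix-based keyword suggestions. It is neither Whoosh's API nor Paperwork's code: the class `TinyIndex` and its methods are hypothetical names chosen for illustration, and a real index does much more (ranking, stemming, persistence).

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps each word to the documents containing it.

    Hypothetical stand-in for a real search library such as Whoosh.
    """

    def __init__(self):
        self._postings = defaultdict(set)  # word -> set of document ids

    def add(self, doc_id, text):
        # Lowercase the text and keep runs of word characters as keywords.
        for word in re.findall(r"\w+", text.lower()):
            self._postings[word].add(doc_id)

    def search(self, query):
        """Return the ids of documents containing every word of the query."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return set()
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result

    def suggest(self, prefix):
        """Return the indexed words starting with the prefix, sorted."""
        prefix = prefix.lower()
        return sorted(w for w in self._postings if w.startswith(prefix))


index = TinyIndex()
index.add("doc1", "Electricity invoice, January")
index.add("doc2", "Insurance contract and invoice")
matches = index.search("invoice insurance")
suggestions = index.suggest("in")
```

Here `matches` contains only `doc2`, the one document holding both query words, and `suggestions` lists the indexed words that start with "in".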

Page orientation is automatically guessed using OCR.
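One plausible way to guess orientation with OCR is to run it on the page at each of the four right-angle rotations and keep the rotation that yields the most recognizable words. The sketch below illustrates that idea only; the exact heuristic Paperwork uses is not described here. The function `guess_orientation` and its `rotate`, `ocr` and `is_word` hooks are hypothetical, and the demo uses fake stand-ins instead of a real image library and Tesseract.

```python
def guess_orientation(image, rotate, ocr, is_word):
    """Return the rotation (in degrees) whose OCR output looks most like text.

    rotate(image, angle) -> image, ocr(image) -> str and is_word(token) -> bool
    are caller-supplied callables (hypothetical hooks, for example wrappers
    around an imaging library, an OCR engine and a dictionary lookup).
    """
    best_angle, best_score = 0, -1
    for angle in (0, 90, 180, 270):
        text = ocr(rotate(image, angle))
        # Score a rotation by the number of tokens that are known words.
        score = sum(1 for token in text.split() if is_word(token.lower()))
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


# Fake stand-ins for the demo: an "image" is just its current rotation in
# degrees, and the fake OCR only produces readable text when the page is upright.
def fake_rotate(image, angle):
    return (image + angle) % 360


def fake_ocr(image):
    return "invoice total amount" if image == 0 else "xq zvk"


known_words = {"invoice", "total", "amount"}
angle = guess_orientation(90, fake_rotate, fake_ocr, lambda w: w in known_words)
```

In the demo the page was "scanned" rotated by 90 degrees, so the rotation that makes it readable again is 270 degrees.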

Since OCR is not perfect, and since some documents don't contain useful keywords, Paperwork also lets you attach labels to each document.
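The document, page and label organization described in this section can be sketched as a small data model. This is an illustration under assumptions, not Paperwork's actual classes: `Page`, `Document` and `find` are hypothetical names. It shows why labels help, since a document with no usable OCR text can still be found by its label.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    text: str  # words extracted from the page by OCR (may be empty)


@dataclass
class Document:
    name: str
    pages: list = field(default_factory=list)
    labels: set = field(default_factory=set)

    def contains(self, keyword):
        """True if any page of the document contains the keyword."""
        keyword = keyword.lower()
        return any(keyword in page.text.lower() for page in self.pages)


def find(documents, keyword=None, label=None):
    """Return the names of documents matching the keyword and/or the label."""
    found = []
    for doc in documents:
        if keyword is not None and not doc.contains(keyword):
            continue
        if label is not None and label not in doc.labels:
            continue
        found.append(doc.name)
    return found


docs = [
    Document("2013-01 invoice", [Page("Invoice total 42 EUR")], {"bills"}),
    Document("holiday photo", [Page("")], {"family"}),  # OCR found no words
]
by_keyword = find(docs, keyword="invoice")
by_label = find(docs, label="family")
```

The second document has no OCR text, so a keyword search cannot reach it, but the `family` label can.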

Licence

GPLv3 or later. See COPYING.

Installation

Archives

GitHub can automatically provide .tar.gz and .zip archives if required. However, they are not needed to install Paperwork; they are mentioned here as a convenience for package maintainers.

Contact/Help

Development

All the information can be found on the wiki.
