Photo Date Project

This project extracts the date text from a series of scanned photos and stores it as EXIF data for easy classification. The photos were scanned by ScanMyPhotos.com at a resolution of 300 dpi. There are approximately 6000 photos in total to process.

Dependencies

The dependencies for this project are listed in requirements.txt, although that file isn't accurate because I haven't yet had a chance to really learn how virtualenvs work. Definitely required: PIL (Pillow) and tesseract.

Photos

Included are more than 300 photos that can be used for testing. The photos generally have a date in orange text in the bottom corner. The dates may not be in the same absolute location because some photos were scanned upside down. Handling that case should be fairly straightforward to implement once the OCR part is working.
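One possible way to handle the upside-down scans is to map the crop window from an upright photo onto the same region of a photo rotated by 180 degrees. The sketch below is only an illustration: `rotate_box_180` is a hypothetical helper, the box uses Pillow's `(left, top, right, bottom)` convention, and the pixel values are made up.

```python
def rotate_box_180(box, image_size):
    """Map a crop box on an upright photo to the same region on a copy rotated 180 degrees."""
    left, top, right, bottom = box
    width, height = image_size
    return (width - right, height - bottom, width - left, height - top)


# Hypothetical numbers: an 1800x1200 scan with the date near the bottom-right corner.
upright_box = (1300, 1050, 1750, 1150)
flipped_box = rotate_box_180(upright_box, (1800, 1200))
print(flipped_box)  # (50, 50, 500, 150)
```

The text inside the flipped window is still upside down, so the crop would need to be rotated (for example with Pillow's `Image.rotate(180)`) before OCR. Rotating the whole image first and reusing the upright box is an equally reasonable alternative.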

A sample of the photos is available here.

OCR

My overall strategy has been to load each file, crop the image to a window slightly larger than where the date is (found by trial and error), and then filter the crop using HSV values to pick out the bright orange that really stands out to the human eye. Who knows if this is the best way, but I've also tried contrast, brightness, and color filtering with RGB (where it is painfully difficult to pick a 'range of colors'). So far HSV works the best.
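The crop-then-filter idea can be sketched roughly as follows. This is not the code in this repository: the function names are hypothetical, and the hue, saturation, and value thresholds are assumptions that would need tuning against the real scans. `colorsys` works on 0 to 1 floats, so hue 0.03 to 0.12 corresponds to roughly 10 to 45 degrees on the color wheel.

```python
import colorsys

# Assumed thresholds for "bright orange"; tune them against real scans.
HUE_RANGE = (0.03, 0.12)
MIN_SATURATION = 0.5
MIN_VALUE = 0.6


def is_orange(rgb):
    """Return True if an (R, G, B) pixel with 0-255 channels falls in the orange band."""
    r, g, b = (channel / 255.0 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return HUE_RANGE[0] <= h <= HUE_RANGE[1] and s >= MIN_SATURATION and v >= MIN_VALUE


def orange_mask(path, box):
    """Crop `box` from the image at `path`; return black text pixels on a white background."""
    from PIL import Image  # imported here so is_orange() stays usable without Pillow

    crop = Image.open(path).convert("RGB").crop(box)
    raw = crop.tobytes()  # packed R, G, B bytes, row by row
    pixels = (raw[i:i + 3] for i in range(0, len(raw), 3))
    mask_bytes = bytes(0 if is_orange(pixel) else 255 for pixel in pixels)
    return Image.frombytes("L", crop.size, mask_bytes)
```

Picking a band in HSV only needs one narrow range (hue) plus two lower bounds, which is why it tends to be easier than bounding all three RGB channels at once. The per-pixel loop is slow in pure Python but keeps the logic easy to follow.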

Tesseract, while free and open source, is not programmatically friendly for some reason. I could get it to read bits and pieces of the numbers, but nothing complete or reliable.
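One way to drive Tesseract from Python is to call its command-line tool and then sanity-check the output. The sketch below is a guess at settings that may help, not a tested fix: `--psm 7` tells Tesseract to treat the image as a single line of text, and the `tessedit_char_whitelist` setting restricts it to digits. Both function names are hypothetical, and the check for exactly three number groups assumes the stamp holds a day, a month, and a year in some order.

```python
import re
import subprocess


def run_tesseract(image_path):
    """Run the tesseract CLI on one image and return its raw text output."""
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "--psm", "7",
         "-c", "tessedit_char_whitelist=0123456789"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_date_groups(ocr_text):
    """Return three integers if the OCR text looks like a date stamp, else None.

    Assumption: a valid stamp is exactly three groups of at most four digits.
    The order of day, month, and year is left to the caller.
    """
    groups = re.findall(r"\d+", ocr_text)
    if len(groups) != 3 or any(len(group) > 4 for group in groups):
        return None
    return tuple(int(group) for group in groups)
```

Rejecting partial reads with `parse_date_groups` at least separates usable results from the "bits and pieces" case, so failed photos can be queued for another attempt with different crop or filter settings.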
