A Python script that finds and removes duplicate images in a chosen directory and sorts images using machine learning. It compares images by image hashing and uses TensorFlow for sorting. The script offers both a command-line interface and a GUI.
- Choose Folder: Click the "Browse" button to select the target folder containing the images.
- Duplicate Detection Level: Use the Detection Level slider to adjust how sensitive duplicate detection is.
- Keep Non-Media Files: Check this option to retain non-media files during processing.
- Initiate Processing: Click the "Process" button to detect and remove duplicate images.
- Start Sorting: Click the "Sort" button to begin sorting all images into automatically generated categories.
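As a rough illustration of the sorting step, the sketch below moves each image into a subfolder named after a predicted label. It is not the script's actual implementation: `sort_images`, `IMAGE_EXTENSIONS`, and the `classify` callable are hypothetical names, and `classify` stands in for whatever TensorFlow model produces the category.

```python
import shutil
from pathlib import Path
from typing import Callable, Dict

# Assumption: the set of extensions treated as images is illustrative only.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}


def sort_images(folder: Path, classify: Callable[[Path], str]) -> Dict[str, str]:
    """Move each image in `folder` into a subfolder named after its label.

    `classify` is a placeholder for a model-backed function that maps an
    image path to a category name. Returns a mapping of file name to label.
    """
    moved = {}
    # sorted() snapshots the listing before any subfolders are created.
    for path in sorted(folder.iterdir()):
        if not path.is_file() or path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # non-image files and existing folders stay in place
        label = classify(path)
        target_dir = folder / label
        target_dir.mkdir(exist_ok=True)
        shutil.move(str(path), str(target_dir / path.name))
        moved[path.name] = label
    return moved
```

Passing the classifier in as a function keeps the file handling independent of the model, so it can be tried with a simple stand-in such as `lambda p: "cats"` before wiring in TensorFlow.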
- Clone the repository or download the script to your local machine.
- Install the required Python libraries with the following command:
pip install Flask imagehash tensorflow tqdm
- Run the script from the command line or launch the graphical user interface (GUI).
- The logic behind near-duplicate detection is complex. It has a dedicated repo here, developed as part of a thesis.
- The script uses image hashing to compare images and identifies duplicates according to the configured threshold (agro_threshold). Sorting is handled with TensorFlow.
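To make the threshold idea concrete, here is a minimal sketch of hash-based comparison. It assumes the hashes are hex strings such as those produced by `str(imagehash.average_hash(img))`, and that two images count as duplicates when their hashes differ in at most `threshold` bits. The function names are hypothetical, and how `agro_threshold` is actually applied in the script may differ.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def hash_image(path: str) -> str:
    """Return a hex-encoded average hash (requires Pillow and imagehash)."""
    # Imported lazily so the comparison helpers work without these packages.
    from PIL import Image
    import imagehash

    with Image.open(path) as img:
        return str(imagehash.average_hash(img))


def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the differing bits between two equal-length hex hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def find_duplicates(hashes: Dict[str, str], threshold: int) -> List[Tuple[str, str]]:
    """Return pairs of file names whose hashes differ by at most `threshold` bits."""
    pairs = []
    for (name_a, h_a), (name_b, h_b) in combinations(sorted(hashes.items()), 2):
        if hamming_distance(h_a, h_b) <= threshold:
            pairs.append((name_a, name_b))
    return pairs
```

A threshold of 0 matches only identical hashes, while larger values also catch resized or lightly edited copies at the cost of more false positives. The pairwise loop is quadratic in the number of images, which is fine for a sketch but slow for very large folders.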
We welcome your feedback and encourage you to report any issues you encounter.