- Git
- Anaconda
- Python 3.7.2
Type the commands below in the Anaconda Prompt, in order.
git clone https://gitlab.com/naoki.ohsugi/img_pdf_retrieval.git
cd img_pdf_retrieval
conda create -n img_pdf_retrieval python==3.7.2
conda activate img_pdf_retrieval
pip install -r requirements.txt
Download haarcascade_frontalface_default.xml and put it in the img_pdf_retrieval folder.
If you do not need to search PDF files, you can skip the following setting. Otherwise, download poppler-windows from @oschwartz10612's repo. Make sure to add the bin/ folder to PATH, or pass poppler_path = r"C:\path\to\poppler-xx\bin" as an argument to convert_from_path.
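The second option, passing poppler_path explicitly, can be sketched as follows. This is a minimal sketch that assumes convert_from_path comes from the pdf2image package. The helper names poppler_kwargs and pdf_to_images are hypothetical and not part of this repository, and the path is the same placeholder as above.

```python
# Placeholder path; replace it with the bin folder of your own poppler download.
POPPLER_BIN = r"C:\path\to\poppler-xx\bin"


def poppler_kwargs(poppler_bin=None):
    """Build the extra keyword arguments for convert_from_path (hypothetical helper)."""
    # When poppler's bin folder is already on PATH, no extra argument is needed.
    return {"poppler_path": poppler_bin} if poppler_bin else {}


def pdf_to_images(pdf_path, poppler_bin=None):
    """Convert each page of a PDF into a PIL image (assumes pdf2image is installed)."""
    # Imported lazily so poppler_kwargs can be used without pdf2image installed.
    from pdf2image import convert_from_path

    return convert_from_path(pdf_path, **poppler_kwargs(poppler_bin))
```

Calling pdf_to_images(path) relies on PATH, while pdf_to_images(path, POPPLER_BIN) points pdf2image at the poppler folder directly.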
Update the [FOLDERS] section in config.ini
Open config.ini, then change or add the target folders from which the image/PDF files are retrieved. You can specify multiple folders by numbering them serially, as follows.
[FOLDERS]
0 = C:\Users\<User Name>\img_pdf_retrieval\data\targets\
1 = C:\...
2 = ...
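To illustrate how a numbered [FOLDERS] section like this can be read, here is a minimal sketch using Python's standard configparser module. It is not necessarily how indexing.py reads the file. The function name read_target_folders and the sample paths are hypothetical.

```python
import configparser

# Hypothetical sample; the real config.ini lives in the img_pdf_retrieval folder.
SAMPLE = r"""
[FOLDERS]
0 = C:\example\img_pdf_retrieval\data\targets\
1 = C:\example\scans\
"""


def read_target_folders(text):
    """Return the [FOLDERS] values ordered by their serial number."""
    # interpolation=None keeps a '%' inside a Windows path from being expanded.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["FOLDERS"]
    # Keys are strings such as "0" and "1", so sort them numerically.
    return [section[key] for key in sorted(section, key=int)]


folders = read_target_folders(SAMPLE)
```

Sorting by the integer value of each key keeps folder 10 after folder 9, which a plain string sort would not.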
Index the target folders before starting the server.
python indexing.py
Indexing can take a few hours, depending on the registered folders and the number of files found in your environment.
python search_server.py
Open http://127.0.0.1:5000/ in your browser.
- Push the button and select an image to search.
- Push the Submit button.
- The input image is shown in area (3).
- Found images are shown in area (4).
- Record the time stamp of the source files, and skip files that are already indexed in the database and have not been updated.
- Run in the system tray and execute indexing periodically (e.g. every 12 hours).
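The time-stamp check in the first item could look roughly like the sketch below. It is only an illustration of the idea. The function names are hypothetical, and a plain dictionary stands in for the database, which this README does not describe.

```python
import os


def needs_indexing(path, indexed_mtimes):
    """Return True when `path` is new or was modified after it was last indexed."""
    current = os.path.getmtime(path)
    previous = indexed_mtimes.get(path)
    return previous is None or current > previous


def mark_indexed(path, indexed_mtimes):
    """Remember the modification time of the file that was just indexed."""
    indexed_mtimes[path] = os.path.getmtime(path)
```

An indexing loop would call needs_indexing for each file, skip the file when it returns False, and call mark_indexed after processing it.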