PDF Searcher

Overview

PDF Searcher is a Python project that exposes a gRPC API for uploading, searching, and summarizing documents. It includes a gRPC server that handles the document operations and can be deployed with Docker. When a document is uploaded through the gRPC API, the project converts the PDF file to text, embeds the text with all-MiniLM-L6-v2 (a sentence-transformers model), and stores the result in chromadb, a vector database.
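The embed-and-store step described above could look roughly like the sketch below. It is not the project's actual code: the function names, the `documents` collection name, the in-memory Chroma client, and the idea of splitting text into chunks with generated IDs are all assumptions made for illustration. Only the model name and the use of chromadb come from this README.

```python
from typing import List


def make_chunk_ids(document_name: str, count: int) -> List[str]:
    """Build one ID per text chunk, e.g. 'report.pdf-0' (hypothetical scheme)."""
    return [f"{document_name}-{i}" for i in range(count)]


def embed_and_store(document_name: str, chunks: List[str]) -> None:
    """Embed text chunks and add them to a Chroma collection."""
    # Imported lazily so make_chunk_ids works without these packages installed.
    import chromadb
    from sentence_transformers import SentenceTransformer

    # all-MiniLM-L6-v2 maps each text to a 384-dimensional vector.
    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(chunks).tolist()

    # Assumption: an in-memory client and a collection named "documents".
    client = chromadb.Client()
    collection = client.get_or_create_collection("documents")
    collection.add(
        ids=make_chunk_ids(document_name, len(chunks)),
        documents=chunks,
        embeddings=embeddings,
    )
```

Calling `embed_and_store("report.pdf", ["first passage", "second passage"])` would store two vectors under the IDs `report.pdf-0` and `report.pdf-1`. How the real server chunks text and names its collection may differ.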

Besides uploading documents and adding them to the database, you can call the API to search the database for a query and to summarize your texts.
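To give a feel for what searching a vector database means, here is a small standard-library sketch that ranks stored vectors by cosine similarity to a query vector. It is a toy illustration, not how chromadb or this project implements search. The function names and the sample vectors are made up, and the similarity metric the project actually configures is not stated in this README.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: List[float], stored: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k stored IDs most similar to the query, best match first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors; real all-MiniLM-L6-v2 embeddings have 384 dimensions.
stored_vectors = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.0, 1.0, 0.0],
    "doc-c": [0.7, 0.7, 0.0],
}
results = top_k([1.0, 0.1, 0.0], stored_vectors, k=2)
print([doc_id for doc_id, _ in results])
```

In a real query the vector would come from embedding the search text with the same model used at upload time, so that query and documents live in the same vector space.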

Prerequisites

Make sure you have the following prerequisites installed:

Python 3.8
Docker
Other dependencies (specified in requirements.txt)

Getting Started

Running Locally

Clone the repository:

git clone https://github.com/kian79/PDF-searcher.git

Navigate to the project directory:
```
cd PDF-searcher
```
Install dependencies:
```
pip install -r requirements.txt
```

Run the gRPC server:

PYTHONPATH=.:.. python grpc_api/server.py

The server should be running on localhost:50051.
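A quick way to confirm that something is listening on that port is a plain TCP connection attempt, sketched below with the standard library only. The helper name `port_is_open` is hypothetical, and the check only shows that the port accepts connections; it does not prove the process behind it is the gRPC server or that the server is healthy.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 50051, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    print("server reachable:", port_is_open())
```

The same check applies to the Docker setup, since the container publishes the server on the same port.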

Docker Setup

Build the Docker image:
```
docker build -t pdf_searcher .
```
Run the Docker container:
```
docker run -p 50051:50051 pdf_searcher
```

The gRPC server should be accessible on localhost:50051.

Usage

To interact with the Document Service, you can use the provided gRPC client script or integrate the service into your own Python applications.

Example usage in Python client:

import grpc
import document_service_pb2 as pb2
import document_service_pb2_grpc as pb2_grpc


def upload_document(file_content, document_name):
    """Send a PDF's raw bytes to the server and return the new document's ID."""
    # Open a plaintext channel to the local server; it closes when the block exits.
    with grpc.insecure_channel("localhost:50051") as channel:
        stub = pb2_grpc.DocumentServiceStub(channel)
        request = pb2.UploadRequest(file_content=file_content, document_name=document_name)
        response = stub.UploadDocument(request)
        return response.document_id

# Other client functions...

# Example usage:
with open("path/to/your/document.pdf", "rb") as file:
    pdf_content = file.read()

# upload_document needs both the file bytes and a name for the document.
document_id = upload_document(pdf_content, "document.pdf")
print(f"Uploaded document with ID: {document_id}")
