Update Class and release 0.2.2
Signed-off-by: samadpls <[email protected]>
samadpls committed Nov 21, 2024
1 parent 058cc8e commit 33d7a2f
Showing 4 changed files with 32 additions and 21 deletions.
17 changes: 8 additions & 9 deletions README.md
@@ -10,9 +10,14 @@



-Welcome to **BestRAG**! This Python library enables you to efficiently store and retrieve embeddings using a hybrid Retrieval-Augmented Generation (RAG) approach. It combines dense, sparse, and late interaction embeddings to provide a robust solution for handling large datasets.
+Introducing **BestRAG**! This Python library leverages a hybrid Retrieval-Augmented Generation (RAG) approach to efficiently store and retrieve embeddings. By combining dense, sparse, and late interaction embeddings, **BestRAG** offers a robust solution for managing large datasets.

+---
+
+## ✨ Features
+
+🚀 **Hybrid RAG**: Utilizes dense, sparse, and late interaction embeddings for enhanced performance.
+🔌 **Easy Integration**: Simple API for storing and searching embeddings.
+📄 **PDF Support**: Directly store embeddings from PDF documents.
+
## 🚀 Installation

@@ -43,13 +48,7 @@ results = rag.search(query="your search query", limit=10)
print(results)
```

-> **Note**: To generate your API key and endpoint, visit [Qdrant](https://qdrant.tech/).
-## ✨ Features
-
-- **Hybrid RAG**: Utilizes dense, sparse, and late interaction embeddings for enhanced performance.
-- **Easy Integration**: Simple API for storing and searching embeddings.
-- **PDF Support**: Directly store embeddings from PDF documents.
+> **Note**: Qdrant offers a free tier with 1GB of storage. To generate your API key and endpoint, visit [Qdrant](https://qdrant.tech/).
## 🤝 Contributing

23 changes: 20 additions & 3 deletions bestrag/best_rag.py
@@ -1,11 +1,11 @@
-"""B-RAG"""
+"""BestRAG"""
# Authors: Abdul Samad Siddiqui <[email protected]>

import re
import uuid
from typing import List, Optional
from qdrant_client import QdrantClient, models
-from qdrant_client.http.models import Distance, VectorParams
+from qdrant_client.http.models import Distance
from fastembed import TextEmbedding
from fastembed.sparse.bm25 import Bm25
import PyPDF2
@@ -35,6 +35,8 @@ def __init__(self,
late_interaction_model_name: Optional[str] = "BAAI/bge-small-en-v1.5"
):
self.collection_name = collection_name
+self.api_key = api_key
+self.url = url
self.client = QdrantClient(url=url, api_key=api_key)

self.dense_model = TextEmbedding()
@@ -216,7 +218,7 @@ def search(self, query: str, limit: int = 10):
models.Prefetch(
query=query_vector["dense-vector"],
using="dense-vector",
-limit=50,
+limit=20,
)
],
query=query_vector["output-token-embeddings"],
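
The hunk above lowers the dense prefetch limit from 50 to 20, so fewer candidates reach the second-stage query on `output-token-embeddings`. The following is a minimal pure-Python sketch of that prefetch-then-rerank idea, not the library's implementation: it assumes the second stage scores candidates with a MaxSim-style late-interaction sum, and the names `prefetch_then_rerank`, `max_sim`, and the toy `docs` data are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def max_sim(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late-interaction (MaxSim) score: for each query token, take its best
    match among the document tokens, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def prefetch_then_rerank(
    query_dense: Vector,
    query_tokens: List[Vector],
    docs: List[Dict],
    prefetch_limit: int = 20,
    limit: int = 10,
) -> List[str]:
    """Stage 1: keep the `prefetch_limit` best documents by dense score.
    Stage 2: rerank only those candidates by MaxSim and return `limit` ids."""
    candidates = sorted(
        docs, key=lambda d: dot(query_dense, d["dense"]), reverse=True
    )[:prefetch_limit]
    reranked = sorted(
        candidates, key=lambda d: max_sim(query_tokens, d["tokens"]), reverse=True
    )
    return [d["id"] for d in reranked[:limit]]


# Hypothetical toy corpus: 2-d dense vectors plus per-token vectors.
docs = [
    {"id": "a", "dense": [1.0, 0.0], "tokens": [[1.0, 0.0]]},
    {"id": "b", "dense": [0.9, 0.1], "tokens": [[1.0, 0.0], [0.0, 1.0]]},
    {"id": "c", "dense": [0.0, 1.0], "tokens": [[0.0, 1.0], [1.0, 0.0]]},
]
query_dense = [1.0, 0.0]
query_tokens = [[1.0, 0.0], [0.0, 1.0]]

print(prefetch_then_rerank(query_dense, query_tokens, docs, prefetch_limit=2))
# ['b', 'a']
```

With `prefetch_limit=2`, document `c` never reaches the rerank stage. That is the trade-off a smaller prefetch makes: less rerank work, but a candidate ranked low by the dense stage cannot be recovered later.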
@@ -225,3 +227,18 @@ def search(self, query: str, limit: int = 10):
)

return results

+def __str__(self):
+"""
+Return a string representation of the BestRAG object, including its parameters.
+"""
+info = (
+"**************************************************\n"
+"* BestRAG Object Information *\n"
+"**************************************************\n"
+f"* URL: {self.url}\n"
+f"* API Key: {self.api_key}\n"
+f"* Collection Name: {self.collection_name}\n"
+"**************************************************"
+)
+return info
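
The new `__str__` interpolates `self.api_key` verbatim, so printing or logging the object would expose the key. Below is a hedged sketch of one alternative that masks all but the last few characters. `mask_secret` and the stand-in class `RagInfo` are hypothetical names used only for illustration; they are not part of BestRAG.

```python
from typing import Optional


def mask_secret(secret: Optional[str], visible: int = 4) -> str:
    """Hide a secret except for its last `visible` characters."""
    if not secret:
        return "<not set>"
    if len(secret) <= visible:
        # Too short to reveal any part safely.
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


class RagInfo:
    """Hypothetical stand-in holding the same three attributes the diff
    stores on the BestRAG object (url, api_key, collection_name)."""

    def __init__(self, url: str, api_key: Optional[str], collection_name: str):
        self.url = url
        self.api_key = api_key
        self.collection_name = collection_name

    def __str__(self) -> str:
        border = "*" * 50
        return "\n".join(
            [
                border,
                f"* URL: {self.url}",
                f"* API Key: {mask_secret(self.api_key)}",
                f"* Collection Name: {self.collection_name}",
                border,
            ]
        )


info = RagInfo("https://example.invalid", "secret-key-1234", "docs")
print(info)
```

Only the final four characters of the key appear in the printed output, which is usually enough to tell keys apart without leaking them into logs.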
2 changes: 1 addition & 1 deletion requirements.txt
@@ -2,7 +2,7 @@ fastembed==0.4.1
streamlit
pytest
flake8
-PyPDF2
+PyPDF2==3.0.1
qdrant-client
onnxruntime==1.19.2
pytest
11 changes: 3 additions & 8 deletions setup.py
@@ -5,10 +5,8 @@

setup(
name="bestrag",
-version="0.2.0",
-description="BestRAG (Best Retrieval Augmented) is a library for storing and"
-" searching document embeddings in a Qdrant vector database. It uses a "
-"hybrid embedding technique combining dense, late interaction and sparse representations for better performance.",
+version="0.2.1",
+description="bestrag: Library for storing and searching document embeddings in a Qdrant vector database using hybrid embedding techniques.",
author="samadpls",
author_email="[email protected]",
long_description=long_description,
@@ -17,10 +15,7 @@
packages=find_packages(),
install_requires=[
"fastembed==0.4.1",
-"streamlit",
-"pytest",
-"flake8",
-"PyPDF2",
+"PyPDF2==3.0.1",
"qdrant-client",
"onnxruntime==1.19.2",
],