What's Changed
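This release adds a new SDK that supports module-wise optimization: data ingestion, retrieval, and generation can each be configured and optimized as separate modules.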
Basic Usage:
from ragbuilder import RAGBuilder
# Initialize and optimize
builder = RAGBuilder.from_source_with_defaults(input_source='data.pdf')
results = builder.optimize()
# Run a query through the complete pipeline
response = results.invoke("What is HNSW?")
# View optimization summary
print(results.summary())
Advanced Configuration
For fine-grained control, you can customize every aspect:
from ragbuilder.config import (
    DataIngestOptionsConfig,
    RetrievalOptionsConfig,
    GenerationOptionsConfig
)

# Configure data ingestion
data_ingest_config = DataIngestOptionsConfig(
    input_source="data.pdf",
    document_loaders=[
        {"type": "pymupdf"},
        {"type": "unstructured"}
    ],
    chunking_strategies=[{
        "type": "RecursiveCharacterTextSplitter",
        "chunker_kwargs": {"separators": ["\n\n", "\n", " ", ""]}
    }],
    chunk_size={"min": 500, "max": 2000, "stepsize": 500},
    embedding_models=[{
        "type": "openai",
        "model_kwargs": {"model": "text-embedding-3-large"}
    }]
)

# Configure retrieval
retrieval_config = RetrievalOptionsConfig(
    retrievers=[
        {
            "type": "vector_similarity",
            "retriever_k": [20],
            "weight": 0.5
        },
        {
            "type": "bm25",
            "retriever_k": [20],
            "weight": 0.5
        }
    ],
    rerankers=[{
        "type": "BAAI/bge-reranker-base"
    }],
    top_k=[3, 5]
)
# Initialize with custom configs
builder = RAGBuilder(
    data_ingest_config=data_ingest_config,
    retrieval_config=retrieval_config
)

# Run the optimization so that `results` reflects the custom configs
results = builder.optimize()

# Access individual components
vectorstore = results.data_ingest.get_vectorstore()
docs = results.retrieval.invoke("What is RAG?")
answer = results.generation.invoke("What is RAG?")
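The chunk_size setting in the advanced configuration describes a range rather than a single value. As a rough illustration, the sketch below expands such a min/max/stepsize range into candidate sizes and pairs them with the two document loaders. It assumes the bounds are inclusive; the helper expand_range is hypothetical and not part of ragbuilder, and how the optimizer actually explores these options is not stated in these notes.

```python
from itertools import product


def expand_range(spec):
    """Expand a {"min", "max", "stepsize"} dict into a list of ints.

    Hypothetical helper for illustration only; assumes "max" is inclusive.
    """
    return list(range(spec["min"], spec["max"] + 1, spec["stepsize"]))


# Same range as in the data ingestion config above
chunk_sizes = expand_range({"min": 500, "max": 2000, "stepsize": 500})

# Loader types taken from the document_loaders list above
loaders = ["pymupdf", "unstructured"]

# Every (loader, chunk size) pairing under the inclusive-bounds assumption
candidates = list(product(loaders, chunk_sizes))

print(chunk_sizes)
print(len(candidates))
```

Under these assumptions the range yields four chunk sizes, so the two loaders give eight loader and chunk-size pairings before retrieval options are considered.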
Full Changelog: 0.0.22...v0.1.4