Querying vectordb with AgentAI #20

Open
wants to merge 4 commits into
base: main

adivik2000
Contributor

This PR adds initial support for querying a vector database (Chroma DB for now) with agentai.

The query model for Chroma looks like this:

from typing import List, Optional

from chromadb.api.types import Embedding, Include  # assumed source of these two types
from pydantic import BaseModel, Field


class Query(BaseModel):
    """Query Model to search the vector database. If query_embeddings is provided, query_texts will be ignored."""

    query_embeddings: Optional[List[Embedding]] = Field(None, description="Embedding for the query to search")
    query_texts: Optional[List[str]] = Field(None, description="Simplified query from the user to search")
    k: int = Field(..., description="The number of results requested")
    include: Include = Field(
        ["documents", "embeddings", "metadatas", "distances"], description="Data to include in results"
    )
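To illustrate the precedence rule stated in the docstring, here is a minimal sketch that turns the query fields into keyword arguments in the shape Chroma's `collection.query` accepts (`query_embeddings`, `query_texts`, `n_results`, `include`). The helper name `to_chroma_kwargs` and the mapping of `k` to `n_results` are assumptions made for illustration; they are not taken from `vectordb.py` in this PR.

```python
from typing import Any, Dict, List, Optional, Sequence


def to_chroma_kwargs(
    k: int,
    query_texts: Optional[List[str]] = None,
    query_embeddings: Optional[List[Sequence[float]]] = None,
    include: Sequence[str] = ("documents", "embeddings", "metadatas", "distances"),
) -> Dict[str, Any]:
    """Hypothetical helper: build keyword arguments for a Chroma-style query call."""
    if query_embeddings is None and query_texts is None:
        raise ValueError("Provide query_embeddings or query_texts")

    # Assumption: the model's `k` is forwarded as Chroma's `n_results`.
    kwargs: Dict[str, Any] = {"n_results": k, "include": list(include)}
    if query_embeddings is not None:
        # Embeddings take precedence; query_texts is ignored, as the docstring states.
        kwargs["query_embeddings"] = [list(e) for e in query_embeddings]
    else:
        kwargs["query_texts"] = list(query_texts or [])
    return kwargs


kwargs = to_chroma_kwargs(
    k=3,
    query_texts=["where does food come from"],
    include=["documents", "distances"],
)
print(kwargs)
```

Keeping the translation in one small function would make it easy to unit test the "embeddings win over texts" behaviour without a running database.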

Here is an example of how it can be used:

@tool(registry=db_registry)
def query_vector_db(query: Query):
    """
    Ask the vector database a question
    """
    print(f"Querying vector database: {query}")
    results = client_db.get_docs(query=query)
    return results


# Plain string: there are no placeholders, so the f-prefix is unnecessary.
question = """Search for the content about where food comes from in the vector database.
    Get me three results from the vector database and include the documents and distances."""

conversation = Conversation()
conversation.add_message(
    "user",
    question,
)

chat_response = chat_complete_execute_fn(conversation, tool_registry=db_registry, model="gpt-3.5-turbo")
print(chat_response)

Output:

(
    {
        'ids': [['90834f80-0432-475e-af9b-9688215db92d', 'a3c0e748-0937-46b1-a167-5aa01a70bbac', '81bed12d-1a84-4b4b-bd09-9fa964240278']],
        'distances': [[0.7584866881370544, 1.0528839826583862, 1.372355341911316]],
        'metadatas': None,
        'embeddings': None,
        'documents': [["CHAPTER.... 1 Agricultural Practices", "In order to provide food for a large population- regular production.. patterns can be identified.", "Storage\n1.3 ......Preparation of Soil"]]
    },
    {'query': {'query_texts': ['where does food come from'], 'k': 3, 'include': ['documents', 'distances']}},
    <function query_vector_db at 0x168888400>
)
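Judging from the output above, the returned value appears to be a tuple of the query results, the arguments the model chose, and the function that was called. A minimal sketch of pairing each returned document with its distance could look like this. The helper name `rank_documents` and the sample values are placeholders for illustration, not part of this PR.

```python
from typing import Any, Dict, List, Tuple


def rank_documents(results: Dict[str, Any]) -> List[Tuple[float, str]]:
    """Pair each document of the first query with its distance, closest first."""
    # Chroma-style results are nested one level per query text, so take index 0.
    documents = results["documents"][0]
    distances = results["distances"][0]
    return sorted(zip(distances, documents))


# Placeholder data shaped like the output above (ids, distances, and text are made up).
sample_results = {
    "ids": [["id-1", "id-2", "id-3"]],
    "distances": [[1.05, 0.76, 1.37]],
    "metadatas": None,
    "embeddings": None,
    "documents": [["second chunk", "first chunk", "third chunk"]],
}

ranked = rank_documents(sample_results)
for distance, document in ranked:
    print(f"{distance:.2f}  {document}")
```

With the real return value, the same helper would be applied to the first element of the tuple, for example `rank_documents(chat_response[0])`.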

Document parsing is currently limited to PDFs, using Unstructured and Azure Document Intelligence (Form Recognizer). It can be expanded as needed.

Some of the code comes from other PRs that are currently open (it doesn't need to be reviewed in this PR). Please leave a comment after merging the earlier ones and I'll resolve the conflicts.

Files to review:

In the docs folder:

  • Two notebooks: one using Unstructured and the other using Azure Document Intelligence

In the agentai folder:

  • Two files: parsers.py and vectordb.py

