feat: adding an hierarchical document builder and a auto-merge retriever #56
I took a first pass.
An end-to-end raw example script would still be helpful.
I left a few comments.
Let's also add these components to the Experiments catalog in README.
…ever.py Co-authored-by: Stefano Fiorucci <[email protected]>
Co-authored-by: Stefano Fiorucci <[email protected]>
Let's also remember to add pydoc configs here.
Still work in progress: I'm testing this new Retriever against every possible DocumentStore in the integrations repo and collecting the issues.
Here is the current status regarding all the doc stores; I'm still investigating Weaviate.
One small change then good to merge for me.
Related Issues
Proposed Changes:
Adds a `HierarchicalDocumentBuilder`: it's used to split a `Document` into multiple `Document` objects of different block sizes, building a hierarchical tree structure where each smaller block is a child of a previous larger block.
Adds an `AutoMergingRetriever`, which leverages the hierarchical tree structure of documents, where the leaf nodes are indexed in a document store. During retrieval, if the number of matched leaf documents below the same parent is higher than a defined threshold, the retriever will return the parent document instead of the individual leaf documents.
Picture to help understand what's being implemented
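As a rough illustration of the two ideas described above, here is a minimal plain-Python sketch. It does not use the Haystack API: `Node`, `build_hierarchy`, and `auto_merge` are hypothetical names invented for this example, splitting is done naively by words, and the threshold is interpreted as the fraction of a parent's children that matched, which is an assumption rather than something stated in this PR.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a Document in the hierarchy."""
    id: str
    content: str
    level: int
    parent_id: Optional[str] = None
    children_ids: List[str] = field(default_factory=list)


def build_hierarchy(text: str, block_sizes: List[int]) -> Dict[str, Node]:
    """Split `text` into progressively smaller word blocks, largest size first."""
    root = Node(id="0", content=text, level=0)
    nodes = {root.id: root}
    frontier = [root]
    for level, size in enumerate(sorted(block_sizes, reverse=True), start=1):
        next_frontier = []
        for parent in frontier:
            words = parent.content.split()
            for start in range(0, len(words), size):
                child = Node(
                    id=f"{parent.id}.{len(parent.children_ids)}",
                    content=" ".join(words[start:start + size]),
                    level=level,
                    parent_id=parent.id,
                )
                parent.children_ids.append(child.id)
                nodes[child.id] = child
                next_frontier.append(child)
        frontier = next_frontier
    return nodes


def auto_merge(matched_leaf_ids: List[str], nodes: Dict[str, Node],
               threshold: float) -> List[Node]:
    """Return a parent in place of its matched leaves when enough of them matched."""
    by_parent: Dict[Optional[str], List[str]] = defaultdict(list)
    for leaf_id in matched_leaf_ids:
        by_parent[nodes[leaf_id].parent_id].append(leaf_id)

    results: List[Node] = []
    for parent_id, leaf_ids in by_parent.items():
        if parent_id is None:
            # A node without a parent cannot be merged upwards; keep it as is.
            results.extend(nodes[i] for i in leaf_ids)
            continue
        parent = nodes[parent_id]
        # Assumption: threshold is a fraction of the parent's children.
        if len(leaf_ids) / len(parent.children_ids) > threshold:
            results.append(parent)
        else:
            results.extend(nodes[i] for i in leaf_ids)
    return results


nodes = build_hierarchy("a b c d e f g h", block_sizes=[4, 2])
merged = auto_merge(["0.0.0", "0.0.1", "0.1.0"], nodes, threshold=0.5)
print([n.id for n in merged])  # ['0.0', '0.1.0']
```

In this toy run both leaves under `0.0` matched, so the parent is returned in their place, while only one of the two leaves under `0.1` matched, so that leaf is returned unchanged.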
How did you test it?
Notes for the reviewer
Checklist
fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.