feat: add page_number to metadata in DocumentSplitter #7599
…It should work as expected in issue deepset-ai#6705.
…red from the one it had on Haystack 1.x
…red from the one it had on Haystack 1.x Solve some minor bugs spotted by tests.
…into issue-6705
@CarlosFerLo thank you very much for your contribution, your PR looks good - I just left some small comments to improve things a bit - let me know if it's not clear or you need help with something.
Update docstring from suggestion Co-authored-by: David S. Batista <[email protected]>
Thanks, really appreciate it. Excited to be able to collaborate.
Co-authored-by: David S. Batista <[email protected]>
releasenotes/notes/add-page-number-to-document-splitter-162e9dc7443575f0.yaml
Pull Request Test Coverage Report for Build 8876963566
💛 - Coveralls
Related Issues
page_number to meta of Documents in DocumentSplitter #6705
Proposed Changes:
I updated the DocumentSplitter methods so that they add a "page_number" field to the metadata of the output documents. This field contains the page of the original document on which the output document is found. The implementation is the same as the one in v1.25.x.
How did you test it?
I added some new unit tests for this behaviour, but testing was mainly functional, since the change is based on previously working code.
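To make the "page_number" behaviour described under Proposed Changes concrete, here is a minimal, dependency-free sketch. It is not the code from this PR: `split_with_page_numbers` is a hypothetical helper, and it assumes that pages in the source text are separated by form feed characters ("\f") and that splitting is done by words.

```python
from typing import Dict, List


def split_with_page_numbers(text: str, split_length: int) -> List[Dict[str, object]]:
    """Hypothetical helper: split `text` into chunks of `split_length` words
    and tag each chunk with the page on which it starts.

    Assumption: pages are delimited by form feed characters ("\f").
    """
    # Split on single spaces so form feeds stay attached to neighbouring words.
    words = text.split(" ")
    chunks: List[Dict[str, object]] = []
    page = 1  # pages are 1-indexed
    for start in range(0, len(words), split_length):
        chunk = " ".join(words[start:start + split_length])
        chunks.append({"content": chunk, "meta": {"page_number": page}})
        # Each form feed inside this chunk moves later chunks one page forward.
        page += chunk.count("\f")
    return chunks


sample = "Page one text.\fPage two text. More here.\fPage three."
chunks = split_with_page_numbers(sample, split_length=3)
pages = [c["meta"]["page_number"] for c in chunks]
print(pages)
```

A unit test for this kind of behaviour would typically build a short text with known page breaks, as above, and assert on the "page_number" value of each output chunk.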
Notes for the reviewer
This is my first contribution! The .gitignore change is there to counter a VSCode extension I have; I was not able to remove it from the commit.
Checklist
fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:. ✅