-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
!feat: unify NLTKDocumentSplitter and DocumentSplitter #8617
!feat: unify NLTKDocumentSplitter and DocumentSplitter #8617
Conversation
Pull Request Test Coverage Report for Build 12298154884Details
💛 - Coveralls |
@davidsbatista do we have to update (de)serialization with these new init params? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me explain better what needs to be done to comply with our deprecation policy:
- in this PR, do not remove
NLTKDocumentSplitter
- leave the file untouched - deprecate
NLTKDocumentSplitter
- it can be done in the other PR that you created - these 2 PRs will be incorporated in 2.9.0
- only after the 2.9.0 release, remove
NLTKDocumentSplitter
Ping me if anything is unclear
@@ -494,3 +498,301 @@ def test_run_document_only_whitespaces(self): | |||
doc = Document(content=" ") | |||
results = splitter.run([doc]) | |||
assert results["documents"][0].content == " " | |||
|
|||
|
|||
class TestSplittingNLTKSentenceSplitter: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm did all tests from the test_nltk_document_splitter.py file make it into here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not, only the ones that over NLTK-specific code since there were also tests, in NLTKDocumentSplitter, to cover the same functionality as in DocumentSplitter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might have missed something, I will double check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please incorporate the suggested change and then feel free to merge.
Co-authored-by: Stefano Fiorucci <[email protected]>
can someone approve this one: #8628 |
Related Issues
DocumentSplitter
andNLTKDocumentSplitter
#8600Proposed Changes:
How did you test it?
Checklist
fix:
,feat:
,build:
,chore:
,ci:
,docs:
,style:
,refactor:
,perf:
,test:
and added!
in case the PR includes breaking changes.