feat: add keep_id parameter to DocumentCleaner #7618

Closed
wants to merge 7 commits into from

Conversation

CarlosFerLo
Contributor

@CarlosFerLo CarlosFerLo commented Apr 29, 2024

Related Issues

Proposed Changes:

The DocumentCleaner now has an optional keep_id parameter that preserves the original id of each input document.
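
A minimal sketch of the behaviour described above, not the code from this PR. It assumes document ids default to a hash of the content, so cleaning the text would otherwise produce a new id. `Doc` and `clean` are hypothetical stand-ins for illustration and are not Haystack classes or functions.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in: the id defaults to a hash of the content."""
    content: str
    id: str = ""

    def __post_init__(self):
        # Assumption: an empty id means "derive it from the content".
        if not self.id:
            self.id = hashlib.sha256(self.content.encode("utf-8")).hexdigest()


def clean(docs, keep_id=False):
    """Collapse runs of whitespace; optionally carry over each original id."""
    cleaned = []
    for doc in docs:
        text = " ".join(doc.content.split())
        # With keep_id=False the id is recomputed from the cleaned text.
        cleaned.append(Doc(content=text, id=doc.id if keep_id else ""))
    return cleaned


original = Doc(content="Hello   world \n")
regenerated = clean([original])[0]
preserved = clean([original], keep_id=True)[0]
print(regenerated.id == original.id)  # False: the content changed, so the hash changed
print(preserved.id == original.id)    # True: the original id was carried over
```

Keeping the flag off by default in this sketch leaves the existing behaviour untouched for callers who do not pass it.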

How did you test it?

Added one extra unit test and edited the one that checks correct initialisation of the object.

Notes for the reviewer

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes ✅
  • I added unit tests and updated the docstrings ✅
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:. ✅
  • I documented my code ✅
  • I ran pre-commit hooks and fixed any issue ✅

@github-actions github-actions bot added topic:tests 2.x Related to Haystack v2.0 type:documentation Improvements on the docs labels Apr 29, 2024
@CarlosFerLo CarlosFerLo changed the title from "Issue 7557" to "feat: add keep_id parameter to DocumentCleaner" Apr 29, 2024
@CarlosFerLo CarlosFerLo marked this pull request as ready for review April 29, 2024 18:39
@CarlosFerLo CarlosFerLo requested review from a team as code owners April 29, 2024 18:39
@CarlosFerLo CarlosFerLo requested review from dfokina and vblagoje and removed request for a team April 29, 2024 18:39
@coveralls
Collaborator

coveralls commented Apr 29, 2024

Pull Request Test Coverage Report for Build 9004490329

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.004%) to 90.41%

Files with Coverage Reduction | New Missed Lines | %
components/preprocessors/document_cleaner.py | 1 | 98.85%

Totals
Change from base Build 9004274756: 0.004%
Covered Lines: 6524
Relevant Lines: 7216

💛 - Coveralls

@vblagoje
Member

vblagoje commented May 8, 2024

Sorry for the delay @CarlosFerLo - I'm taking a look at this one now

Development

Successfully merging this pull request may close these issues.

Preprocessing should allow keeping Document ids unchanged
3 participants