Enhance find_outliers and identify_outliers performance by avoiding duplication and filtering columns
#140
Description
This PR improves the performance of find_outliers and identify_outliers by filtering to only the necessary columns and avoiding data duplication where possible. While working on this PR I also analyzed code coverage and found a number of legacy fixtures that are no longer used here; cleaning them up should raise the overall coverage percentage.
Closes #134
Closes #86
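For reviewers, the general idea of filtering columns before computing outliers can be sketched as follows. This is a minimal illustration and not the code in this PR: the helper name outlier_mask, the thresholds argument, and the example columns are all hypothetical, and a plain z-score is assumed as the outlier measure.

```python
import pandas as pd


def outlier_mask(df: pd.DataFrame, thresholds: dict[str, float]) -> pd.Series:
    """Hypothetical helper: flag rows whose absolute z-score exceeds a limit."""
    # Select only the columns named in the thresholds so that unrelated
    # (possibly large) columns are never copied or processed.
    needed = list(thresholds)
    subset = df[needed]

    # Compute population z-scores on the small subset only.
    zscores = (subset - subset.mean()) / subset.std(ddof=0)

    # Build one boolean mask instead of duplicating the full dataframe.
    mask = pd.Series(False, index=df.index)
    for col, limit in thresholds.items():
        mask |= zscores[col].abs() > limit
    return mask


df = pd.DataFrame(
    {
        "area": [1.0, 1.0, 1.0, 1.0, 10.0],
        "label": list("abcde"),
        "intensity": [5.0] * 5,
    }
)

mask = outlier_mask(df, {"area": 1.5})
# Return only the columns that were actually evaluated.
outliers = df.loc[mask, ["area"]]
```

The mask is the only full-length object created, and the original dataframe is left untouched, which is the kind of duplication avoidance described above.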
What kind of change(s) are included?
Checklist
Please ensure that all boxes are checked before indicating that this pull request is ready for review.