Enhance `find_outliers` and `identify_outliers` performance by avoiding duplication and filtering columns #140

d33bs · 2024-11-20T15:40:08Z

Description

This PR improves the performance of `find_outliers` and `identify_outliers` by filtering to only the necessary columns and avoiding data duplication where possible. While working on this PR I also analyzed code coverage and found a number of legacy fixtures that are no longer used here; cleaning them up should also raise the overall coverage percentage.
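For reviewers who want a picture of the general idea, here is a minimal sketch of column filtering before outlier detection. It is not the code in this PR: the function name `zscore_outlier_mask`, the `thresholds` mapping, and the z-score rule are all assumptions made for illustration. It only shows how statistics can be computed on a narrow selection of columns and returned as a boolean mask, so the full dataframe is never copied.

```python
import pandas as pd


def zscore_outlier_mask(df: pd.DataFrame, thresholds: dict) -> pd.Series:
    """Hypothetical helper: flag rows whose z-scores pass every threshold.

    thresholds maps column name -> z-score limit. A non-negative limit
    flags values above it; a negative limit flags values below it.
    """
    # Work only on the columns named in thresholds rather than
    # carrying every column of df through the computation.
    subset = df[list(thresholds)]

    # Population z-scores (ddof=0) for the selected columns only.
    zscores = (subset - subset.mean()) / subset.std(ddof=0)

    # Build a boolean mask instead of a filtered copy of df.
    mask = pd.Series(True, index=df.index)
    for column, limit in thresholds.items():
        if limit >= 0:
            mask &= zscores[column] > limit
        else:
            mask &= zscores[column] < limit
    return mask


# Example data: column "b" is never touched by the computation.
df = pd.DataFrame({"a": [1, 1, 1, 1, 10], "b": list("vwxyz")})
mask = zscore_outlier_mask(df, {"a": 1.5})
print(mask.tolist())  # [False, False, False, False, True]
```

Returning a mask leaves it to the caller to decide whether to materialize the outlier rows (for example with `df[mask]`), which is one way to avoid duplicating data until it is actually needed.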

Closes #134
Closes #86

What kind of change(s) are included?

Documentation (changes docs or other related content)
Bug fix (fixes an issue).
Enhancement (adds functionality).
Breaking change (these changes would cause existing functionality to not work as expected).

Checklist

Please ensure that all boxes are checked before indicating that this pull request is ready for review.

I have read and followed the CONTRIBUTING.md guidelines.
I have searched for existing content to ensure this is not a duplicate.
I have performed a self-review of these additions (including spelling, grammar, and related).
These changes pass all pre-commit checks.
I have added comments to my code to help provide understanding.
I have added a test which covers the code changes found within this PR.
I have deleted all non-relevant text in this pull request template.

d33bs added 4 commits November 20, 2024 08:06

isolate focus on specific columns

62ecf54

update for efficiencies in identify outliers

52d57ff

clean up tests and add new coverage

e2e274e

update code comment

fadbebc

d33bs marked this pull request as ready for review November 20, 2024 15:44

d33bs requested a review from jenna-tomkinson November 21, 2024 14:54
