Refactored modules related to input-output #194
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #194      +/-   ##
==========================================
- Coverage   99.68%   99.68%   -0.01%
==========================================
  Files          11       11
  Lines         638      634       -4
==========================================
- Hits          636      632       -4
  Misses          2        2
Nice refactoring ✨ !
Just some suggestions on function names and docstrings. Maybe it's not a lot to add to this PR but also happy to have it separately.
Thanks for the review @sfmig, I like all your suggestions and will implement them here. The scope of this PR will increase to "refactoring load_poses module" and I will edit the PR title and description accordingly.
from_numpy() function to the load_poses module
@sfmig I've updated this PR's description and title. I think there is no need to go line-by-line through the diff again, just let me know whether you agree with the changes as I've described them in the updated PR description.
Looks fantastic @niksirbi 🚀
Description
What is this PR
Why is this PR needed?
The public functions in the load_poses.py module currently assume that users are always loading data from a file (or from a DeepLabCut-style pandas dataframe). However, there are some use cases where the data are already in Python as numpy arrays, perhaps imported with custom loaders (this is not hypothetical: a potential user has already asked for it). There is a way to convert such data into a properly formatted movement dataset, but it is hard to find and undocumented.
What does this PR do?
Adds a from_numpy() function that explicitly accepts position (+ optional confidence) data in the form of numpy arrays and returns a movement dataset. Under the hood it calls the ValidPosesDataset validator and the existing _from_valid_data() utility.
The addition of this function enabled me to slightly refactor the load_poses.py module such that from_numpy() is the single point-of-entry into a movement dataset, i.e. every other loading function first reads data into numpy arrays before calling the new function. This was already de facto the case, but it's much more explicit now. Moreover, this refactoring also enabled me to get rid of a redundant validation call for LightningPose data.
Here's the schematic of the updated load_poses.py module. The previous version can be found here.
How has this PR been tested?
I added a simple unit test for the new function. The underlying ValidPosesDataset is already extensively tested, and so are all file loaders.
Is this a breaking change?
No.
Does this PR require an update to the documentation?
The API index has been updated accordingly. The new function's docstring also includes example usage.
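The single point-of-entry design described above can be sketched roughly as follows. This is a simplified stand-in and not movement's actual code: PosesData, from_arrays() and from_rows() are hypothetical names, and the array layout (time, individuals, keypoints, space) as well as the NaN fill for missing confidence are assumptions made for illustration only.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PosesData:
    """Hypothetical stand-in for a validated dataset (not movement's API)."""

    position: np.ndarray
    confidence: np.ndarray


def from_arrays(position, confidence=None) -> PosesData:
    """Single point of entry: validate the arrays once, then build the container."""
    position = np.asarray(position, dtype=float)
    # Assumed layout: (time, individuals, keypoints, space).
    if position.ndim != 4:
        raise ValueError(f"position must have 4 dimensions, got {position.ndim}")
    if confidence is None:
        # Assumption: missing confidence is filled with NaN.
        confidence = np.full(position.shape[:-1], np.nan)
    confidence = np.asarray(confidence, dtype=float)
    if confidence.shape != position.shape[:-1]:
        raise ValueError(
            f"confidence shape {confidence.shape} does not match "
            f"position shape {position.shape[:-1]} (without the space axis)"
        )
    return PosesData(position=position, confidence=confidence)


def from_rows(rows) -> PosesData:
    """Hypothetical loader: one individual, one keypoint, rows of (x, y, score).

    Like every other loader in this sketch, it only reshapes its input into
    arrays and delegates validation to from_arrays().
    """
    data = np.asarray(rows, dtype=float)  # shape (n_frames, 3)
    position = data[:, :2].reshape(-1, 1, 1, 2)
    confidence = data[:, 2].reshape(-1, 1, 1)
    return from_arrays(position, confidence)


ds = from_rows([[0.0, 1.0, 0.9], [2.0, 3.0, 0.8]])
```

Because every loader funnels through one function, validation lives in a single place, which is the kind of arrangement that makes a second, loader-specific validation call redundant.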
Checklist:
EDIT 2024-05-31
Following @sfmig's review, the scope of this PR expanded, resulting in a more thorough refactoring of IO-related modules: load_poses.py, save_poses.py, and validators.py. This mostly involved renaming functions and editing docstrings, to make the whole thing more logical and internally consistent.
These are the names of the updated public functions:
Note that we renamed from_dlc_df to from_dlc_style_df (and likewise for save), because LightningPose also uses "DeepLabCut-style" dataframes.
We also decided to rename private functions such that it's clear what is being converted to what, e.g. _ds_from_sleap_labels_file() instead of _load_from_sleap_labels_file(). There is one remaining inconsistency, namely the fact that public functions start with from_ while private functions start with _ds_from_ or _df_from_. That's because the way public functions are actually invoked is the following:
and load_poses.ds_from_file would be redundant. Perhaps there is scope for renaming load_poses to load_dataset (and save_poses to save_dataset accordingly), such that the syntax would be movement.io.load_dataset.from_file(). That could make more sense now, because "poses" is a bit ambiguous, while we've fully defined what a "dataset" is. I'll open an issue about that.
Here's the updated diagram for movement's I/O functionalities.
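The naming rationale above (short public from_* names reached through the module, private helpers that spell out what is converted into what) can be illustrated with a small stand-in. This is only a sketch: the load_poses object below is a SimpleNamespace imitating a module, _ds_from_other_file() is a hypothetical helper, the dictionaries are placeholders for real datasets, and none of it reproduces movement's implementation.

```python
from types import SimpleNamespace


def _ds_from_sleap_labels_file(file_path):
    """Private helper: the name states the source and the result (a dataset)."""
    return {"source_software": "SLEAP", "file_path": str(file_path)}


def _ds_from_other_file(file_path):
    """Hypothetical private helper for any other file type."""
    return {"source_software": "unknown", "file_path": str(file_path)}


def from_file(file_path):
    """Public function: a short name, because the module name supplies context."""
    if str(file_path).endswith(".slp"):
        return _ds_from_sleap_labels_file(file_path)
    return _ds_from_other_file(file_path)


# Stand-in for the module namespace, so the call site reads like module usage.
load_poses = SimpleNamespace(from_file=from_file)

ds = load_poses.from_file("session1.slp")
```

At the call site, load_poses.from_file(...) already says what is being loaded, whereas inside the module a bare _from_file would not, which is why only the private helpers carry the _ds_ or _df_ prefix.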