Need to re-design dataset validators? #210
Just read this thread. As I am trying things out locally to move forward on #197 (struggling with the test structure right now), I would like to make sure I do not end up finding solutions for the validator that are then superseded by a redefinition of those classes, what do you think @niksirbi? For what it's worth, I think it makes sense to start as early as possible to allow for dimension optionality like the
Hey @vigji, basically we have two design constraints right now, and both of them hinge on redefining the validators:
I think it would be ideal if the re-designed validators solved both problems in one sweep, especially because they are somewhat inter-related. I agree with you that it's better to tackle such issues early rather than when the project is more mature. This means that the validators + IO functions are about to undergo an unstable period until we settle on a new structure that works. Regarding your experiments in #197, I'd say feel free to continue experimenting on point 2, but don't worry about getting any of the code "camera-ready" just yet, because we'd likely have to alter it to match the ongoing changes. Regarding the structure of tests, is there anything we can do to help? I'd be open to hopping on a quick Zoom call some time next week if that'll help clarify things.
Ok!
I'll DM you on Zulip :)
@b-peri had a good idea that might help with this: When we load data, we know what type it is (poses or bboxes), so we could add a dataset attribute (e.g.
The more general validation steps (e.g. existence of
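One way the type-attribute idea could work in practice, as a minimal sketch: record the dataset type in the dataset's attributes at load time, then dispatch type-specific validation by looking the tag up. The attribute name `ds_type`, the values `"poses"`/`"bboxes"`, and both function names are assumptions for illustration, not movement's actual design.

```python
# Illustrative sketch only: `ds_type` is an assumed attribute name, and
# the validator names are placeholders for whichever validators the
# re-design settles on.

def tag_dataset(attrs: dict, ds_type: str) -> dict:
    """Return a copy of `attrs` with the dataset type recorded."""
    if ds_type not in {"poses", "bboxes"}:
        raise ValueError(f"unknown dataset type: {ds_type!r}")
    return {**attrs, "ds_type": ds_type}


def validator_name(attrs: dict) -> str:
    """Pick the type-specific validator based on the recorded tag."""
    dispatch = {"poses": "ValidPosesDataset", "bboxes": "ValidBboxesDataset"}
    try:
        return dispatch[attrs["ds_type"]]
    except KeyError as exc:
        raise ValueError("dataset has no recognised 'ds_type' attribute") from exc
```

With this scheme, code downstream of the loaders never has to guess the dataset type from its dimensions.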
As discussed in #322, this should also include moving
The problem
Prior to PR #201, a "movement dataset" was synonymous with a "poses dataset", because `movement` only supported pose tracking data. For this reason, we were using the `ValidPosesDataset` validator everywhere (in #206 we even relaxed the `compute_` accessor methods to only validate the required `position` data variable).

Going forward, `movement` will support bbox-tracking data as well (and perhaps even other types in the future). We would still like the accessor to work for both poses and bboxes, i.e. both of those should still be a movement dataset. But this means we have to fundamentally re-design our validation strategy (we can't keep using the same `ValidPosesDataset` validator for everything).

Potential solution
Probably we will end up defining several "entities":

1. A "base" movement dataset, whose `self.validate()` method should only check for the existence of the `position` and `confidence` data variables, and only require the `space` and `time` dimensions. Maybe these checks should be implemented as part of a `ValidMovementDataset` validator. If I'm not mistaken, these checks are sufficient for all our current filters and kinematic variable computations to work, i.e. all accessor methods will/should work for a valid movement dataset.
2. A bboxes dataset, which additionally contains a `shape` array. This should be validated via the `ValidBboxesDataset` validator. This validator should run when loading/saving bbox data from/to files, and before any operation that only works on bboxes. The `individuals` dimension should perhaps be optional, but that is a separate discussion (see below).
3. A poses dataset, which additionally requires a `keypoints` dimension. This is based on the fact that a "pose" is usually defined as a set of keypoints. The `ValidPosesDataset` validator could be used to validate such data during I/O and before any computation that only works on pose tracks.

The above arrangement has some kinks though. For example, what should we do in cases where only 1 keypoint is being tracked per individual (as in the Aeon dataset, for example)? That's not a "pose" strictly speaking, but it can very well be accommodated within a poses dataset with a single keypoint. However, this raises the question of whether a singleton keypoints dimension should exist in such cases (as @vigji has raised, see this issue and this Zulip thread). As an alternative, we could agree that all point tracking data is "poses", and make the `keypoints` dimension optional (i.e. a poses dataset would be essentially the same as the "base" movement dataset).

Related to the above, I think the `individuals` dimension, plus any other extra dimensions (like `views`), should always be optional, i.e. their presence/absence should not be validated by the dataset validators; they should only be created and validated when and as needed (basically agreeing with what was expressed in the Zulip thread).

The question is: can we restructure dataset validation in a way that accommodates something like the above scheme, with the kinks ironed out? I'm fully open to better ideas on this.
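To make the proposed hierarchy concrete, here is a minimal sketch of how the three entities could relate as validator classes. The class names mirror the ones discussed above, but the checks and the duck-typed `ds` interface are illustrative assumptions; real movement validators operate on `xarray` datasets and check much more (shapes, dtypes, etc.).

```python
# Hypothetical sketch of the proposed validator hierarchy. `ds` is
# anything exposing `dims` and `data_vars` collections (e.g. an
# xarray.Dataset); the checks shown are only the existence checks
# discussed above.

class ValidMovementDataset:
    """Base checks shared by all movement datasets."""

    REQUIRED_DIMS = {"time", "space"}
    REQUIRED_VARS = {"position", "confidence"}

    def __init__(self, ds):
        missing_dims = self.REQUIRED_DIMS - set(ds.dims)
        missing_vars = self.REQUIRED_VARS - set(ds.data_vars)
        if missing_dims:
            raise ValueError(f"missing required dimensions: {sorted(missing_dims)}")
        if missing_vars:
            raise ValueError(f"missing required data variables: {sorted(missing_vars)}")
        self.ds = ds


class ValidBboxesDataset(ValidMovementDataset):
    """Bboxes additionally carry a `shape` data variable."""

    REQUIRED_VARS = ValidMovementDataset.REQUIRED_VARS | {"shape"}


class ValidPosesDataset(ValidMovementDataset):
    """Poses additionally require a `keypoints` dimension."""

    REQUIRED_DIMS = ValidMovementDataset.REQUIRED_DIMS | {"keypoints"}
```

With this shape, the accessor would only ever invoke `ValidMovementDataset`, while the IO functions would invoke the type-specific subclass; making `keypoints` or `individuals` optional then amounts to moving a name out of a `REQUIRED_DIMS` set.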