Ability to save multi-animal pose tracks to single-animal files #71

DhruvSkyy · 2023-10-23T13:12:09Z

Description

What is this PR

Bug fix
Addition of a new feature
Other

What does this PR do?

This PR adds the ability to save xarray datasets per individual: to_dlc_df can return a dictionary of single-animal pandas DataFrames (one per individual), and to_dlc_file can write one file per individual. This is controlled by a new parameter, split_individuals: bool. If True, the data are split by individual; if False, they are saved as a single multi-animal file.

This feature was described in issue #39.

References

Please reference any existing issues/PRs that relate to this PR.

How has this PR been tested?

The PR was tested in a Jupyter notebook against the following four scenarios (credit to @niksirbi):

1. The xarray.Dataset has multiple (>1) individuals and the user wants to save it to a multi-animal DeepLabCut DataFrame. This already works with the existing functions.
2. The xarray.Dataset has only 1 individual, but the user still wants to save it to a multi-animal DeepLabCut DataFrame, including the redundant "individuals" level. This also already works.
3. The xarray.Dataset has a single individual and the user wants to save it to a single-animal DeepLabCut DataFrame (without the "individuals" level). This behaviour needs to be implemented; the output will be a single file.
4. The xarray.Dataset has multiple (>1) individuals and the user wants to split them and save each to its own single-animal file. This is the trickiest case to implement, as the dataset has to be split across individuals (behind the scenes) and each part then saved to a single-animal file as in case 3. The output should be as many files as there are individuals.
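To make the difference between the scenarios concrete, here is a sketch of the two DeepLabCut column layouts built with pandas.MultiIndex.from_product. The scorer, individual and body part names are made up for illustration. Scenarios 1 and 2 use the four-level layout; scenario 3, and each file in scenario 4, use the three-level one.

```python
import pandas as pd

# Illustrative labels only; real ones come from the dataset.
scorer = ["DLC_scorer"]
bodyparts = ["nose", "tail"]
coords = ["x", "y", "likelihood"]

# Single-animal layout: no "individuals" level
# (scenario 3, and each file written in scenario 4).
single_animal_cols = pd.MultiIndex.from_product(
    [scorer, bodyparts, coords],
    names=["scorer", "bodyparts", "coords"],
)

# Multi-animal layout: "individuals" sits in second position
# (scenarios 1 and 2; in scenario 2 it holds a single entry).
multi_animal_cols = pd.MultiIndex.from_product(
    [scorer, ["mouse1"], bodyparts, coords],
    names=["scorer", "individuals", "bodyparts", "coords"],
)

print(single_animal_cols.nlevels)  # 3
print(multi_animal_cols.nlevels)  # 4
```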

Todo

Need to write tests.

Checklist:

The code has been tested locally
Tests have been added to cover all new functionality
The documentation has been updated to reflect any changes
The code has been formatted with pre-commit

sonarqubecloud · 2023-10-30T11:42:03Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
1 Code Smell

No Coverage information
0.0% Duplication

niksirbi

Hey @DhruvSkyy, thanks a lot for contributing!
I left several comments. The main two points to be addressed are:

The two functions are now quite big and hard to parse; I think we would benefit from refactoring repeated bits into separate functions. I have pointed this out with some concrete examples.
We don't really handle the case where there is one individual in the dataset but the user has chosen split_individuals=True. I think in this case we should save one single-animal DataFrame (without the "individuals" level), as there is nothing to split. But, we should raise a warning to tell the user that we are ignoring the split_individuals setting.

Let me know if you think these are sensible choices. Have a shot at implementing them and we can have another round of review then.

niksirbi · 2023-10-30T17:07:21Z

movement/io/save_poses.py

+        tracks_with_scores = np.concatenate(
+            (
+                ds.pose_tracks.data,
+                ds.confidence.data[..., np.newaxis],
+            ),
+            axis=-1,
+        )


This snippet appears twice, once with ds and once with individual_data. It is a good candidate for refactoring into a separate function that takes the xarray dataset as input and returns the concatenated numpy array.
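A minimal sketch of such a helper, under the assumption that it receives the underlying numpy arrays (e.g. ds.pose_tracks.data and ds.confidence.data, or the same attributes of individual_data) rather than the Dataset itself, which keeps the example free of an xarray dependency. The name concat_tracks_and_scores and the shape check are hypothetical additions.

```python
import numpy as np


def concat_tracks_and_scores(
    pose_tracks: np.ndarray, confidence: np.ndarray
) -> np.ndarray:
    """Append confidence scores as an extra entry on the last (space) axis.

    pose_tracks: shape (..., n_space), e.g. (time, individuals, keypoints, space)
    confidence:  shape pose_tracks.shape[:-1]
    returns:     shape (..., n_space + 1), i.e. x, y(, z), likelihood
    """
    if confidence.shape != pose_tracks.shape[:-1]:
        raise ValueError(
            f"confidence shape {confidence.shape} does not match "
            f"pose_tracks shape {pose_tracks.shape} minus its last axis"
        )
    return np.concatenate((pose_tracks, confidence[..., np.newaxis]), axis=-1)


# Usage, e.g. concat_tracks_and_scores(ds.pose_tracks.data, ds.confidence.data)
tracks = np.zeros((5, 2, 3, 2))  # 5 frames, 2 individuals, 3 keypoints, x/y
scores = np.ones((5, 2, 3))
combined = concat_tracks_and_scores(tracks, scores)
print(combined.shape)  # (5, 2, 3, 3)
```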

niksirbi · 2023-10-30T17:09:45Z

movement/io/save_poses.py

+        )
+
+        # Create the DLC-style multi-index columns
+        index_levels = ["scorer", "individuals", "bodyparts", "coords"]


There is also some repetition here. You can define `index_levels = ["scorer", "bodyparts", "coords"]` near the top (before the if statements), and then add the "individuals" level in the second position here, only when it's needed.
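A sketch of that pattern. The helper name dlc_index_levels and its include_individuals flag are hypothetical; in the PR this logic would likely sit inline in to_dlc_df.

```python
def dlc_index_levels(include_individuals: bool) -> list[str]:
    """Return the DLC column-level names, optionally with "individuals"."""
    # Levels shared by the single- and multi-animal layouts, defined once.
    index_levels = ["scorer", "bodyparts", "coords"]
    if include_individuals:
        # "individuals" goes second, right after "scorer".
        index_levels.insert(1, "individuals")
    return index_levels


print(dlc_index_levels(False))  # ['scorer', 'bodyparts', 'coords']
print(dlc_index_levels(True))  # ['scorer', 'individuals', 'bodyparts', 'coords']
```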

niksirbi · 2023-10-30T17:12:59Z

movement/io/save_poses.py

+
+        """ Create a single DataFrame with
+        multi-index columns for each individual """
+        df = pd.DataFrame(


This bit is also repeated and could be refactored into its own function that takes a numpy array and the columns as arguments.
Optionally, this can be combined with the other refactoring suggestion (the function that takes an xarray dataset and returns a numpy array). For example, you could have one function that takes an xarray dataset and the index levels, and returns a DataFrame.
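One possible shape for the DataFrame-building helper, sketched with a hypothetical name (array_to_dlc_df). It assumes the concatenated array has time as its first axis, followed by (individuals,) bodyparts and coords, so that a C-order reshape lines up with the column order produced by pd.MultiIndex.from_product.

```python
import numpy as np
import pandas as pd


def array_to_dlc_df(
    tracks_with_scores: np.ndarray, columns: pd.MultiIndex
) -> pd.DataFrame:
    """Flatten a (time, ..., coords) array into a DLC-style DataFrame.

    All axes after the first (time) are flattened in C order, which matches
    the ordering of pd.MultiIndex.from_product over the same levels.
    """
    n_frames = tracks_with_scores.shape[0]
    flat = tracks_with_scores.reshape(n_frames, -1)
    if flat.shape[1] != len(columns):
        raise ValueError(
            f"{flat.shape[1]} data columns but {len(columns)} column labels"
        )
    return pd.DataFrame(flat, columns=columns)


# Example: 2 frames, 1 individual, 2 body parts, (x, y, likelihood)
data = np.arange(2 * 1 * 2 * 3, dtype=float).reshape(2, 1, 2, 3)
cols = pd.MultiIndex.from_product(
    [["scorer"], ["mouse1"], ["nose", "tail"], ["x", "y", "likelihood"]],
    names=["scorer", "individuals", "bodyparts", "coords"],
)
df = array_to_dlc_df(data, cols)
print(df.loc[0, ("scorer", "mouse1", "tail", "x")])  # 3.0
```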

niksirbi · 2023-10-30T17:14:13Z

movement/io/save_poses.py

-    "likelihood", and stored in the "coords" level (as DeepLabCut expects).
+    The DataFrame(s) will have a multi-index column with the following levels:
+    "scorer", "individuals", "bodyparts", "coords"
+    (if multi_individual is True),


This bit is outdated. We no longer have a "multi_individual" argument, so it has to be rewritten to reflect the current arguments.

niksirbi · 2023-10-30T17:27:49Z

movement/io/save_poses.py

+def to_dlc_file(
+    ds: xr.Dataset,
+    file_path: Union[str, Path],
+    split_individuals: Union[bool, None] = None,


I would change this to match what the docstring describes:

Suggested change

-    split_individuals: Union[bool, None] = None,
+    split_individuals: Union[bool, Literal["auto"]] = "auto",

I think "auto" is more explicit and informative than None in this case. You would also have to modify the corresponding if statement of course.

niksirbi · 2023-10-30T17:30:04Z

movement/io/save_poses.py

+    # Sets default behaviour for the function
+    if split_individuals is None:
+        individuals = ds.coords["individuals"].data.tolist()
+        if len(individuals) == 1:


The splitting is needed when there is more than one individual (not when there is only one):

Suggested change

-        if len(individuals) == 1:
+        if len(individuals) > 1:

You could also write this as a one-liner, for example:

split_individuals = True if len(individuals) > 1 else False

We also may want to throw an error if the user passes an invalid type, something like:

if split_individuals == "auto":
    individuals = ds.coords["individuals"].data.tolist()
    split_individuals = True if len(individuals) > 1 else False
elif not isinstance(split_individuals, bool):
    error_msg = (
        f"Expected 'split_individuals' to be a boolean or 'auto', but got "
        f"{type(split_individuals)}."
    )
    log_error(ValueError, error_msg)

DhruvSkyy · 2023-11-11

For the auto function, would we want it to save a single individual xarray as a single individual dataframe and a multi-individual xarray as a multi-individual dataframe, or save both as single individual dataframes?

niksirbi · 2023-11-13

I would say this one:

save a single individual xarray as a single individual dataframe and a multi-individual xarray as a multi-individual dataframe

I think that's what the users would expect.
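A sketch of the agreed "auto" behaviour, assuming a hypothetical helper that receives the number of individuals (e.g. len(ds.coords["individuals"])). Because a single individual should give single-animal output and several individuals a multi-animal one, "auto" resolves to True only when there is exactly one individual, which is the opposite of the `> 1` check suggested earlier in this thread. To stay self-contained, it raises ValueError directly instead of going through movement's log_error.

```python
from typing import Literal, Union


def resolve_split_individuals(
    split_individuals: Union[bool, Literal["auto"]], n_individuals: int
) -> bool:
    """Turn the user's split_individuals choice into a plain boolean."""
    if split_individuals == "auto":
        # One individual -> single-animal output (split = True);
        # several individuals -> one multi-animal output (split = False).
        return n_individuals == 1
    if not isinstance(split_individuals, bool):
        raise ValueError(
            "Expected 'split_individuals' to be a boolean or 'auto', "
            f"but got {type(split_individuals)}."
        )
    return split_individuals


print(resolve_split_individuals("auto", 1))  # True
print(resolve_split_individuals("auto", 3))  # False
print(resolve_split_individuals(True, 3))  # True
```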

niksirbi · 2023-10-30T17:36:13Z

movement/io/save_poses.py

+    """If split_individuals is True then it will split the file into a
+    dictionary of pandas dataframes for each individual."""
+    if split_individuals:
+        dfdict = to_dlc_df(ds, True)


In general, I would always provide keyword arguments explicitly, so that readers don't have to look up the function signature to understand what this boolean means:

Suggested change

-        dfdict = to_dlc_df(ds, True)
+        df_dict = to_dlc_df(ds, split_individuals=True)

niksirbi · 2023-10-30T17:58:24Z

movement/io/save_poses.py

-
-
-def to_dlc_file(ds: xr.Dataset, file_path: Union[str, Path]) -> None:
+    if split_individuals:


You also need to check here whether there is actually more than one individual in the data.

I would say if there is only one individual, then the split_individuals argument should not matter at all, and we should always output one single-animal DataFrame (with "scorer", "bodyparts", "coords").

We should only care about split_individuals when there are actually several individuals to split. In that case:

if split_individuals == True, we should output multiple single-animal DataFrames in a dictionary (with the individual names as keys and with "scorer", "bodyparts", "coords" as index levels)

if split_individuals == False, we should output one combined DataFrame (with "scorer", "individuals", "bodyparts", "coords")

The docstring also has to be updated to reflect this behavior.

DhruvSkyy · 2023-11-11

Hi @niksirbi,

If split_individuals == True and we have only one individual, it will just output one single-animal DataFrame (with "scorer", "bodyparts", "coords"). If split_individuals == False and we have only one individual, it will output one multi-animal-format DataFrame (with "scorer", "individuals", "bodyparts", "coords").

Although in the second case (split_individuals == False) the "individuals" level will contain only one individual, it might still be important to offer this option. It would be useful if users wanted to merge the output with other multi-animal DataFrames, since pandas would expect the DataFrames to share the same column format.

It might also be surprising if a user ran the function with split_individuals == False on a mixture of single- and multi-individual xarrays and got back a mixture of single-animal and multi-animal DataFrames, because the single-individual xarrays would automatically become single-animal DataFrames, with no way to keep them in multi-animal format.

The "auto" option in to_dlc_file covers the case where the user wants single-individual xarrays saved as single-animal DataFrames and multi-individual xarrays saved as multi-animal DataFrames. I can move this "auto" logic into a separate function and also use it in to_dlc_df, if preferable.

niksirbi · 2023-11-13

If split_individuals == True and we have only one individual, it will just output one single-animal DataFrame (with "scorer", "bodyparts", "coords"). If split_individuals == False and we have only one individual, it will output one multi-animal-format DataFrame (with "scorer", "individuals", "bodyparts", "coords").

Hm, I actually like this suggestion and the flexibility it gives the user. Under your proposal, split_individuals == True will always produce an output csv in "single-animal" format, while split_individuals == False will always produce a csv in "multi-animal" format, regardless of how many animals are in the project.

The arguments you make for it are convincing, so let's go ahead and do this! We just have to be careful to write the docstrings in an understandable way.
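A minimal sketch of the semantics agreed here, operating on a plain numpy array with a hypothetical function name (tracks_to_dlc_dfs): split_individuals=True always returns a dict of single-animal DataFrames keyed by individual name (even with only one individual), and split_individuals=False always returns one multi-animal DataFrame (even when the "individuals" level has a single entry). The (time, individuals, bodyparts, coords) array layout is an assumption.

```python
from typing import Dict, Sequence, Union

import numpy as np
import pandas as pd


def tracks_to_dlc_dfs(
    tracks_with_scores: np.ndarray,
    scorer: str,
    individuals: Sequence[str],
    bodyparts: Sequence[str],
    split_individuals: bool,
    coords: Sequence[str] = ("x", "y", "likelihood"),
) -> Union[pd.DataFrame, Dict[str, pd.DataFrame]]:
    """Build DLC-style output from a (time, individuals, bodyparts, coords) array.

    split_individuals=True  -> dict of single-animal DataFrames, one per
                               individual, even if there is only one.
    split_individuals=False -> one multi-animal DataFrame, even if the
                               "individuals" level has a single entry.
    """
    n_frames = tracks_with_scores.shape[0]
    if split_individuals:
        columns = pd.MultiIndex.from_product(
            [[scorer], bodyparts, coords],
            names=["scorer", "bodyparts", "coords"],
        )
        return {
            name: pd.DataFrame(
                tracks_with_scores[:, i].reshape(n_frames, -1), columns=columns
            )
            for i, name in enumerate(individuals)
        }
    columns = pd.MultiIndex.from_product(
        [[scorer], individuals, bodyparts, coords],
        names=["scorer", "individuals", "bodyparts", "coords"],
    )
    return pd.DataFrame(tracks_with_scores.reshape(n_frames, -1), columns=columns)


# 4 frames, 2 individuals, 3 body parts, (x, y, likelihood)
arr = np.arange(4 * 2 * 3 * 3, dtype=float).reshape(4, 2, 3, 3)
per_animal = tracks_to_dlc_dfs(arr, "s", ["a", "b"], ["n0", "n1", "n2"], True)
combined = tracks_to_dlc_dfs(arr, "s", ["a", "b"], ["n0", "n1", "n2"], False)
print(sorted(per_animal))  # ['a', 'b']
print(combined.shape)  # (4, 18)
```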

niksirbi · 2023-10-30T18:06:23Z

movement/io/save_poses.py

+            ) in dfdict.items():
+                """Iterates over dictionary, the key is the name of the
+                individual and the value is the corresponding df."""
+                filepath = (


I find f-strings more readable, so I would rewrite this as:

filepath = f"{Path(file_path).with_suffix('')}_{key}.csv"
df.to_csv(Path(filepath), sep=",")
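A sketch of the per-individual naming and writing, with hypothetical helper names. file_path may be a str, so it is wrapped in Path before calling with_suffix; the ".csv" extension is hard-coded, as in the snippet above.

```python
from pathlib import Path
from typing import Dict, List, Union

import pandas as pd


def individual_csv_path(file_path: Union[str, Path], individual: str) -> Path:
    """E.g. "data/poses.csv" + "mouse1" -> "data/poses_mouse1.csv"."""
    # Path() first, because file_path may be a plain string.
    return Path(f"{Path(file_path).with_suffix('')}_{individual}.csv")


def save_split_csvs(
    df_dict: Dict[str, pd.DataFrame], file_path: Union[str, Path]
) -> List[Path]:
    """Write one CSV per individual and return the paths written."""
    written = []
    for individual, df in df_dict.items():
        out_path = individual_csv_path(file_path, individual)
        df.to_csv(out_path, sep=",")
        written.append(out_path)
    return written
```

Returning the written paths is a design choice for the sketch; it makes the output easy to check or log.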

niksirbi · 2023-10-30T18:10:03Z

movement/io/save_poses.py

+
+    """If split_individuals is True then it will split the file into a
+    dictionary of pandas dataframes for each individual."""
+    if split_individuals:


Again here, as in the function above, we have to check whether there is more than one individual to split; otherwise, output only one single-animal file.

If split_individuals is True and there is only one individual, it should automatically output just one single-animal file.

DhruvSkyy · 2023-11-12T12:58:19Z

This PR has been superseded by #83

Commit 0df6dd7: updated format argument to split individuals

niksirbi mentioned this pull request on Oct 23, 2023, in "Ability to save multi-animal pose tracks to single-animal files" #68 (closed)

Commit 7a469a6: Merge branch 'main' into individual_poses

niksirbi self-requested a review October 30, 2023 16:55

niksirbi requested changes Oct 30, 2023


DhruvSkyy closed this Nov 12, 2023
