Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only load template_metrics extension on compute if keeping some metrics #3478

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

chrishalcrow
Copy link
Collaborator

Might fix #3471
(Could you try this out please @jonpedros)

When template_metrics is computed and delete_existing_metrics = False, any old metrics which aren't being recomputed are kept. To do this, the _run method loads the old template metric extension (if it exists). However, if this is the first time it has run, it has already created the extension folder. In this case _run sees the newly-created folder and tries to load it. Bu there's not much in it, so a no run_info warning appears.

To avoid this, this PR only loads the extension folder if there are metrics to be kept.

I think the previous implementation of load_run_info was sending two warnings if a run_info file didn't exist. I think I've made the logic a bit simpler. Could one of the run_info experts take a look please (@jonahpearl @alejoe91 ) - thanks!

@jonahpearl
Copy link
Contributor

jonahpearl commented Oct 15, 2024

The run_info change looks fine. Incidentally, this is exactly the reason that I think it would be useful to save run_info before the run has occurred, and perhaps as soon as the extension is created (@alejoe91 and see #3451) — because then it's very simple to check if the extension has already been run. If it hasn't been, you can skip all the delete_existing_metrics logic, avoid loading an empty folder, etc.

As written here, I think it's this condition if set(self.params["metrics_to_compute"]) != set(self.params["metric_names"]) that's being used to check if the newly created folder is empty? I suspect it would be clearer to just check if not run_info["run_completed"] (if we can make run_info get saved before the run happens) but lmk if I'm misunderstanding.

Comment on lines -2042 to -2043
else:
warnings.warn(f"Found no run_info file for {self.extension_name}, extension should be re-computed.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The warning was here to be aware that the analyzer was computed before this run info.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take a look at the full current implementation here:


If the first warning is triggered, the second is always triggered (I think!). So the first warning isn’t needed, and this is deleted here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the second if block was meant to be indented under the zarr case, but I agree it works just as well to assign it to None in the first place and then do the warning at the end if it's still None.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I did wonder if it was a missing indentation. This solution's a tiny bit neater as we only need one copy of the warning message. I like the idea of the run_info in general: great suggestion @jonahpearl :)

@chrishalcrow
Copy link
Collaborator Author

Making the run_info.json file asap sounds reasonable to me.

As written here, I think it's this condition if set(self.params["metrics_to_compute"]) != set(self.params["metric_names"]) that's being used to check if the newly created folder is empty? I suspect it would be clearer to just check if not run_info["run_completed"] (if we can make run_info get saved before the run happens) but lmk if I'm misunderstanding.

To be honest, it's quite a good check to do anyway: it will now only try to load the template_metrics extension it needs propagate some already-existing metrics from an old run. So in this case I'm happy with the solution.

@alejoe91
Copy link
Member

Making the run_info.json file asap sounds reasonable to me.

As written here, I think it's this condition if set(self.params["metrics_to_compute"]) != set(self.params["metric_names"]) that's being used to check if the newly created folder is empty? I suspect it would be clearer to just check if not run_info["run_completed"] (if we can make run_info get saved before the run happens) but lmk if I'm misunderstanding.

To be honest, it's quite a good check to do anyway: it will now only try to load the template_metrics extension it needs propagate some already-existing metrics from an old run. So in this case I'm happy with the solution.

I think I removed it because it didn't play nice with loading old waveform extractor folders as an analyzer. This was part as a larger fix to improve back-compatibility, so I'm happy to test if this PR works on old data on my end!


# Check if we need to propogate any old metrics. If so, we'll do that.
# Otherwise, we'll avoid attempting to load an empty template_metrics.
if set(self.params["metrics_to_compute"]) != set(self.params["metric_names"]):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a cool idea to make the comparison :)

@jonpedros
Copy link

Tested it and now all the extensions I need to compute are outputted as expected. I do still get the warnings, though, but I guess that was not changed?

For template_metrics:

[c:\Users\systemses\Anaconda3\envs\si_env\Lib\site-packages\spikeinterface\core\sortinganalyzer.py:2043](file:///C:/Users/systemses/Anaconda3/envs/si_env/Lib/site-packages/spikeinterface/core/sortinganalyzer.py:2043): UserWarning: Found no run_info file for template_metrics, extension should be re-computed.
  warnings.warn(f"Found no run_info file for {self.extension_name}, extension should be re-computed.")
[c:\Users\systemses\Anaconda3\envs\si_env\Lib\site-packages\spikeinterface\core\sortinganalyzer.py:2050](file:///C:/Users/systemses/Anaconda3/envs/si_env/Lib/site-packages/spikeinterface/core/sortinganalyzer.py:2050): UserWarning: Found no run_info file for template_metrics, extension should be re-computed.
  warnings.warn(f"Found no run_info file for {self.extension_name}, extension should be re-computed.")
[c:\Users\systemses\Anaconda3\envs\si_env\Lib\site-packages\spikeinterface\core\sortinganalyzer.py:2125](file:///C:/Users/systemses/Anaconda3/envs/si_env/Lib/site-packages/spikeinterface/core/sortinganalyzer.py:2125): UserWarning: Found no data for template_metrics, extension should be re-computed.
  warnings.warn(f"Found no data for {self.extension_name}, extension should be re-computed.")

For spike_amplitudes:

[c:\Users\systemses\Anaconda3\envs\si_env\Lib\site-packages\numpy\core\_methods.py:206](file:///C:/Users/systemses/Anaconda3/envs/si_env/Lib/site-packages/numpy/core/_methods.py:206): RuntimeWarning: Degrees of freedom <= 0 for slice
  ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
[c:\Users\systemses\Anaconda3\envs\si_env\Lib\site-packages\numpy\core\_methods.py:163](file:///C:/Users/systemses/Anaconda3/envs/si_env/Lib/site-packages/numpy/core/_methods.py:163): RuntimeWarning: invalid value encountered in divide
  arrmean = um.true_divide(arrmean, div, out=arrmean,
[c:\Users\systemses\Anaconda3\envs\si_env\Lib\site-packages\numpy\core\_methods.py:198](file:///C:/Users/systemses/Anaconda3/envs/si_env/Lib/site-packages/numpy/core/_methods.py:198): RuntimeWarning: invalid value encountered in divide
  ret = ret.dtype.type(ret / rcount)

@chrishalcrow
Copy link
Collaborator Author

chrishalcrow commented Nov 13, 2024

Tested it and now all the extensions I need to compute are outputted as expected. I do still get the warnings, though, but I guess that was not changed?

Hey @jonahpearl @jonpedros- sorry, I completely forgot about this PR. The warnings were updated. I think that, based on the fact that your warnings about template_metrics happen at lines 2043 and 2050, you're using the "main" branch rather than this PRs branch. If you have the github CLI, you can check out this branch by running gh pr checkout 3478 in your spikeinterface folder. If you had time to do that, re-run and confirm that you don't get two identical "Found no run_info file for template_metrics, extension should be re-computed" warning; that'd be great.

@jonahpearl
Copy link
Contributor

Hey @jonahpearl

@jonpedros I think you meant :)

@jonpedros
Copy link

jonpedros commented Nov 21, 2024

If you had time to do that, re-run and confirm that you don't get two identical "Found no run_info file for template_metrics, extension should be re-computed" warning; that'd be great.

Tested and it's gone! Only the one from spike_amplitudes remains. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error when loading template_metrics extension
6 participants