Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose t_start in BaseRecording #3117

Closed

Conversation

JoeZiminski
Copy link
Collaborator

@JoeZiminski JoeZiminski commented Jul 1, 2024

This PR exposes t_start by allowing it to be set through set_times(). Now set_times times parameter can take a vector (np.ndarray) or int | float. If the format it is treated as a time vector and if the latter a t_start. Previously t_start were only available if set when loading from file.

The PR extends the tests to cover the t_start cases and another couple of cases such as save_to_memory(), which currently has a bug (#2515). In creating these tests another couple of small issues were found, I cherry-picked the fixes to different PRs to keep the diffs easier to review. For ease though if everyone is happy all can stay here. They will fail until this branch is rebased back on master once the other PRs are reviewed.

@JoeZiminski JoeZiminski force-pushed the add_t_start_to_set_times branch 7 times, most recently from f6d9e67 to acd57a6 Compare July 1, 2024 21:52


# TODO: deprecate original implementations ###
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: this was messing up the diff so left for the end.

@JoeZiminski JoeZiminski changed the title Expose t_start in BaseRecording, extend tests and a couple of fixes. Expose t_start in BaseRecording Jul 1, 2024
@alejoe91 alejoe91 added this to the 0.101.0 milestone Jul 2, 2024
@alejoe91 alejoe91 added the core Changes to core module label Jul 2, 2024
@h-mayorquin
Copy link
Collaborator

h-mayorquin commented Jul 2, 2024

We do use the time machiner of spikeinterface in neuroconv. I would like to participate on the review. Among the Prs that you have opened, where should I start?

@JoeZiminski
Copy link
Collaborator Author

Thanks @h-mayorquin! That would be great, I think:

  1. Round instead of int for time_to_sample_index. #3119 is a very quick one mostly unrelated to others
  2. Is also mostly standalone Add time vector case to get_durations. #3118 only related to the time_vector concept.
  3. I think then this PR (I will undraft it now) for the changes to t_start
  4. then Fix t_starts not propagated to save_to_memory. #3120 which fixes a failing test case here.

Cheers!

@JoeZiminski JoeZiminski marked this pull request as ready for review July 2, 2024 19:07
Copy link
Collaborator

@h-mayorquin h-mayorquin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of this? I think knowing about how you intend this to be used would help clarify the debate and inform the API.

I find the testing a bit complicated to read. I think there is too much indirection of the fixtures. I will have to do a second reading but I think that's a code smell.

Plus, it seems that many of the tests don't really correspond hee. Why are the tests of memory no in the memory PR. If you are iterating over the segments already on some of the fixtures you can just set their t_start attribute directly, you don't need a special method at the BaseRecording level to set t_start, it is a public attribute at the BaseRecordingSegment. I don't think that the tests for the interface should rely on the interface.

I am requesting changes here because I think some of the testing does not correspond to this PR.


Parameters
----------
times : 1d np.array
The time vector
times : int | float | 1d np.array
Copy link
Collaborator

@h-mayorquin h-mayorquin Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally would prefer to not overload this function and create a new one instead. set_t_start.

What are the advantages of overloading this? How are you thinking about it?

But I think what kind of API we should have will become clear once I understood how are you envisioning this to be used.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I couldn't decide on this vs. separate functions and @samuelgarcia suggested this approach. On balance I think I prefer it as t_start and times_vector are mutually exclusive ways of setting custom times, so it makes sense to change in one place. It would be slightly strange to call set_t_start() and this removes the time_vector attribute. But I'm not sure either way.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Thanks for sharing, I will add this as something to discuss at the meeting.

Copy link
Collaborator

@h-mayorquin h-mayorquin Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that my take is the following:

We use set_times because the sampling recording is slightly irregular when we want more precision. This is a good name for what the function does, it has a clear purpose and semantics.

Why we would need to set t_start?

The use case that I can think off is that we would like to shift all the recording to the right or the left on time. But if that is the use case I would be better to have a method that shift the recording times and works independently of whether times are handled internally with t_start and sampling frequency or a time vector.
In the first case, you shift t_start (in most of the cases from 0) and in the second you shift the time vector.

If it is not for shifting I can't think on other use of setting t_start

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the use case would be if you had separate sessions in a single day, for example a session, 10 minute break to change some equiptment, and another recording session. If these sessions are held as different segments on a recording (or, as separate recordings) the researcher may want to hold the true recording times (e.g. session 1 started at 1 pm, session two at 1.30 pm).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, change my answer below what I think this type of case is not well served with the current implementation.

spaced timeseries data. Return the original recording,
recoridng with time vectors added and list including the added time vectors.
"""
times_recording = copy.deepcopy(raw_recording)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have clone for this as an extractor method but if you really require this, why make the raw recording fixture per session?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benefit of the raw_recording fixture is that durations only needs to be defined once, then copied as set_times() is in place. But I agree it is a lot of indirection and it is probably more readable to incorporate into the individual fixtures, possibly with DURATIONS=[...] set at the top of the script?

@JoeZiminski
Copy link
Collaborator Author

JoeZiminski commented Jul 2, 2024

Thanks @h-mayorquin! The problem is that t_start and time_vecetor are mutually exclusive and so if you set time_vector the t_start is removed (and, it requires the same vice-versa). Therefore some kind of setter is required to remove time_vector if it is already set when setting t_start. Also, t_start and time_vector seem like core ways of handling custom times in SpikeInterface so it might seem unusual to have a setter for one but not the other.

Agree the tests are kind of messed up, in adding these for the t_start I found some other bugs and split them into separate PRs to ease review, but was too lazy to do the tests 😅 . The tests in the other PRs require some of the machinery in this PR, so happy to merge this PR first removing some of those tests, then add them in other PRs once this is merged.

Tomorrow I can try and remove some of the indrection in the tests and remove tests that do not correspond directly to this PR, or at least, are failing if included in this PR. WDYT?

@h-mayorquin
Copy link
Collaborator

h-mayorquin commented Jul 2, 2024

Thanks @h-mayorquin! The problme is that t_start and time_vecetor are mutually exclusive and so if you set time_vector the t_start is removed (and, it requires the save vice-versa). Therefore some kind of setter is required to remove time_vector if it is set when setting t_start. Also, t_start and time_vector are core ways of handling custom times in SpikeInterface so for me it seems unusual to have a setter for one but the other is set through a public attribute on a segment.

Yes, I think this is at the core of why I don't like this approach. It requires the final user to be aware of internal details of spikeinterface. I feel it mixes how spikeinterface handles time internally with what the user wants with its recording object. Let me elaborate:

I think that a good interface will abstract those implementations details away from the user and will allow them to express things that they want. Let me illustrate a different approach with an example:

  • My times are iregular so I will set the correct times that I got from a TTL process with set_times.
  • But also, this is the second segment of an experiment and it should start an hour after the first.

What I calim is that this should be as easy as

set_times(times_from_ttl)
set_segment_start_time(t=time_when_the_second_segment_started)

The interface proposed in this PR will say: sorry, those things are exclusive because of this internal spikeinterface reason, what you need to do is to compose on your own the time_vector, shift it, and then use set_times differently for each segment.

What I am proposing is that we have a set_segment_start_time that is agnostic to the internal implementation details of spikeinterface: It should do what the user wants

  • If the recording has a time_vector because the format is like that or the user set those times before, then when the user calls set_segment_start_time it should modify the time vector such that the time_vector[0] is t_start.
  • If the recording handles time with sampling_frequency and t_start then it should modify t_start to what the user wants.

Advance users or testing casses like the one in this PR can modify the attributes directly for stuff that is not covered.

@JoeZiminski
Copy link
Collaborator Author

JoeZiminski commented Jul 3, 2024

Thanks Herberto, I like the semantics of the set_segment_start_time() it is very clear. I also don't like the mutual excusivility of the t_start and time_vector mechanisms and its requirement for the user to understand the SpikeInterface internals. I have put some feedback on the proposed approach in the dropdown below (as not to bloat the response).

on `set_segment_start_time`

I like the semantics of this but I think this still requires the user to understand how SpikeInterface is representing time internally, and creates some hidden dependencies that could be even more confusing. At present there is hidden dependency but at least the dependency is mutual exclusivility so the two concepts can't interact and you only really need to track the one you want to use. In the proposed case there are some interactions going on under the hood which could end up in confusing behaviour. For example:

  • if you have times that start from zero but you want them relative, you need to pass some times that are 'wrong' and then use another function to make them correct
  • if you have times that do not start from zero, you need to remove the offset (?), pass them in an incorrect form, then make them correct again through a second function
  • Or, maybe set_times can take a time_vector that do not start at zero. But now there are two way to set the segment start times for time vector (directly through the set time vector or through this second approach).
  • It could lead to the case where somone sets set_segment_start_time then later adds the time_vector, but now their time vector is wrong because the previously set t_starts are interacting. This would be quite stange behaviour that you set_times() then get_times() and the get_times() is different.

I think this is a very important discussion to have but it would require a major reworking of how SpikeInterface handles time which I think is outside the scope of the PR. I think we should make an issue to discuss this, I'd be very keen as it is important for #3072.

For now, let me better motivate this PR with respect to the existing implementation of times:

  1. A present, custom times can be represented as t_start or time_vector. time_vector has it's own setter set_times(). However t_start is not possible to set except for directly on the segment, which is an unused pattern in the SpikeInterface API. The only proper way to set t_start through the API is through some extractors when loading certain file types. So t_start is in this weird state of sometimes turning up in the codebase but isn't actually settable generally.
  2. Therefore I think there are two possible approaches. Either t_start is not a useful concept and it should be removed from SpikeInterface (maybe if t_start is required in extractors it can be set to a full time_vector during load) (I am not averse to this). Alternatively, it should be fully exposed and documented in the SpikeInterface API (this PR takes the second approach).
  3. Tests are added to cover all uses of time_vector and t_start in the core machinery.

Let me know what you think of the above, I think there is definately room for further optimisation for the API generally, but I think this PR represents a digestible improvement on the current situation (but alternatively would agree with removing t_start from the SpikeInterface side and relying entirely on time_vector).

@h-mayorquin
Copy link
Collaborator

Or, maybe set_times can take a time_vector that do not start at zero. But now there are two way to set the segment start times for time vector (directly through the set time vector or through this second approach).

Yes, that's how I am thinking about it. This is the way that is now. I don't understand the first two examples but maybe they rely on this behavior not being available? Maybe not? If not, can you illustrate the examples?

It could lead to the case where somone sets set_segment_start_time then later adds the time_vector, but now their time vector is wrong because the previously set t_starts are interacting. This would be quite stange behaviour that you set_times() then get_times() and the get_times() is different.

This one I agree is confusing. The current behavior is that set_times is the source of true so I will keep that overwrite behavior. set_segment_start_time is a utility that works either if you have a time vector or not but if you set the time vector then that overides what set_segment_start_time did before.

A present, custom times can be represented as t_start or time_vector. time_vector has it's own setter set_times(). However t_start is not possible to set except for directly on the segment, which is an unused pattern in the SpikeInterface API. The only proper way to set t_start through the API is through some extractors when loading certain file types. So t_start is in this weird state of sometimes turning up in the codebase but isn't actually settable generally.

Yeah, mabye is an improvement, but I don't think is a big one, setting the t_start at the segment level is fine for the purposes of this PR for example which is testing but if we are gona discuss how to expose a simpler API to the end user then I want to have this more general discussion that we are having.

We should not try to remove the concept of handling the internal timings with t_start because that is there for memory efficiency reasons but I maybe the user does not need to interact with it. Another possibility to the one I am proposing is only setings times with a time vector but transforming them to t_start, sampling_frequency internally if they are regular enough. I general I think we should separate internal handing from how the user interacts with him and avoid a power user bias.

@JoeZiminski
Copy link
Collaborator Author

Thanks! Agree these can be split into two separate discussions. Would you agree with:

  1. Move the from tests here to Fix t_starts not propagated to save_to_memory. #3120 (with some refactoring of the tests) and close this PR
  2. Open an issue to discuss the time API and how this should be presented in Understanding how Spikeinterface handles recording times. #2627

@h-mayorquin
Copy link
Collaborator

Mmm this is just my opinion, I don't know if I have convinced you. I think that Sam will side with you if you want to make your life more convenient here

But I think not using this machinery for the tests is a good idea regardless of whether we make this a user interface or not.

@JoeZiminski
Copy link
Collaborator Author

I think it's worthwhile discussing further, if we pick this lane (i.e. what is introduced in this PR) and write the documentation it will be hard to undo and there may be better options. I definately agree there is room to make this cleaner and more intutiive, for me though it is safer to keep t_start and time_vector distinct concepts rather than allow them to interact. I'll move the tests to the other PR ASAP, and then we can keep this PR with bare minimum changes and discuss the API further here.

@JoeZiminski
Copy link
Collaborator Author

I will close this PR, the tests can be included in #3120 #3118 and we are converging on a much better time API in #3157

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Changes to core module