Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix eager dtype conversion of value column #1353

Merged
merged 10 commits into from
Dec 31, 2024
Merged

Conversation

drammock
Copy link
Member

@drammock drammock commented Dec 16, 2024

Follow-up to #1349, see https://github.com/mne-tools/mne-bids/pull/1349/files#r1885104885

PR Description

Basically: use defensive programming when trying to dtype-convert the value column from events.tsv. I was a bit worried that a column called value could easily exist in real datasets and not conform to the way that column gets written by MNE-BIDS, and it didn't take long for @larsoner to find one :)

The bug was already caught by a downstream test in MNE-BIDS-Pipeline, so I didn't bother adding a new test here, but LMK if you want me to add a simple one.

Merge checklist

Maintainer, please confirm the following before merging.
If applicable:

  • All comments are resolved
  • This is not your own PR
  • All CIs are happy
  • PR title starts with [MRG]
  • whats_new.rst is updated
  • New contributors have been added to CITATION.cff
  • PR description includes phrase "closes <#issue-number>"

@drammock
Copy link
Member Author

circleCI failure is a bit mysterious (failure to cross-ref pathlib.Path in an attribute docstring). I thought it was sphinx-doc/sphinx#12317 but CircleCI here is running Sphinx 8.1.3 so that fix should definitely be in place. I'll try to dig deeper tomorrow. Also haven't looked into the Py3.10 build failure yet, beyond noting that it's a metadata-missing on twine --check error. anyone else should feel free to investigate if you have a debugging craving and a few minutes.

@drammock
Copy link
Member Author

OK, the cross-ref failure looks like sphinx-doc/sphinx#13178; autodoc is the cause but so far I can't really understand the root of the problem. Probably we just need to add a warning ignore :(

sadly I can't replicate the twine check --strict dist/* failure locally using a py3.10 environment :(

Copy link

codecov bot commented Dec 17, 2024

Codecov Report

Attention: Patch coverage is 90.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 97.42%. Comparing base (46f284b) to head (fae0ead).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
mne_bids/read.py 84.61% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1353      +/-   ##
==========================================
- Coverage   97.43%   97.42%   -0.02%     
==========================================
  Files          40       40              
  Lines        8903     8918      +15     
==========================================
+ Hits         8675     8688      +13     
- Misses        228      230       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@drammock
Copy link
Member Author

ok was able to reproduce the twine failure, and found a workaround here: astral-sh/rye#1446 (comment)

TL;DR some kind of mismatch between twine and hatchling, related to metadata versions.
Installing twine>=6 didn't fix it though, whereas pinning hatchling==1.26.3 did (latest is 1.27.0). TBH IDK what the problem is, both name = "mne-bids" and dynamic = ["version"] are in pyproject.toml 🤷🏻

@hoechenberger
Copy link
Member

Thanks for looking into this, @drammock

@scott-huberty
Copy link
Contributor

scott-huberty commented Dec 22, 2024

Hi @drammock

To understand the original issue I actually had to create a test myself that would be caught by the try-catches added in this PR, so maybe it is indeed worth adding a test?

Something like this to mne_bids/tests/test_read/test_handle_events_reading:

# Test with only a `value` column, but where the values in this `value` column don't
# meet our assumption that they can be coerced to integers/floats.
events = {
        "onset": [10, 15],
        "duration": [1, 1],
        "value": ["Condition 1", "Condition 2"],
}
events_fname = tmp_path / "bids6" / "sub-01_task-test_events.tsv"
events_fname.parent.mkdir()
_to_tsv(events, events_fname)

raw, event_id = _handle_events_reading(events_fname, raw)
ev_arr, ev_dict = mne.events_from_annotations(raw, event_id=event_id)

# EDIT: test below will pass after https://github.com/mne-tools/mne-bids/pull/1353#discussion_r1894777877

# In this edge case, there were no "event IDs" in the `value` column of the
# events.tsv, so we created event IDs based on the row index of those values.
# And The values of this `value` column were used as annotation descripitons.
assert {"Condition 1": 0, "Condition 2": 1} == event_id == ev_dict
assert raw.annotations.description.tolist() == ["Condition 1", "Condition 2"]

This test creates an events TSV file like:

onset	duration	value
10	1	Condition 1
15	1	Condition 2

Where _, event_id = _handle_events_reading(events_fname, raw) returns

{'Condition 1': 'Condition 1', 'Condition 2': 'Condition 2'}

And AFAICT, from the User's perspective:

  1. This event_id would only get returned to the user if they do read_raw_bids(..., return_event_dict=True)
  2. The condition 1 and condition 2 events are assigned to annotations in the raw object.

EDIT: I updated the proposed test after my suggestions in

@scott-huberty
Copy link
Contributor

On second thought...

{'Condition 1': 'Condition 1', 'Condition 2': 'Condition 2'} is strange.

In this case, maybe we should either

  • return None (and warn the user)
  • or use np.arange to create a "proper" event dict.

try:
culled_vals = np.asarray(culled_vals, dtype=float)
except ValueError:
pass
Copy link
Contributor

@scott-huberty scott-huberty Dec 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pass
# If the values are not numeric, just use the row index as the event ID
culled_vals = np.arange(len(culled_vals))

@drammock drammock mentioned this pull request Dec 23, 2024
7 tasks
@sappelhoff
Copy link
Member

@scott-huberty could you please push your test to this PR? For now without changing the behavior as in your suggestions. I would rather merge this one sooner than later as it contains two important fixes / workarounds.

We can then discuss your suggested changes in a new issue if you like.

@scott-huberty
Copy link
Contributor

@scott-huberty could you please push your test to this PR? For now without changing the behavior as in your suggestions. I would rather merge this one sooner than later as it contains two important fixes / workarounds.

We can then discuss your suggested changes in a new issue if you like.

sorry @sappelhoff Since I am not an MNE maintainer I don't think that I have the push rights needed to do that..(?)

@drammock
Copy link
Member Author

I don't think that I have the push rights needed to do that

You can open a PR into my fork+branch.

@drammock drammock enabled auto-merge (squash) December 31, 2024 17:27
@drammock
Copy link
Member Author

I don't think that I have the push rights needed to do that

You can open a PR into my fork+branch.

never mind @scott-huberty, I pushed the (simplified) test. I gave some thought to your suggestion about setting the event_id to use row numbers instead of the (string) values in the values column. On one hand I think it makes sense (so that it would be possible to actually use that event_id dict in mne-python functions) but I also decided it's not crucial to get it in to this PR, because in the case where the events.tsv value column contains strings, the user won't be expecting to be able to recover an original mapping of eventID integers (because their value column wasn't written by MNE-BIDS in the first place).

The other reason I didn't try to add it in here, is that I think it's not ideal to just assign eventID integers from row numbers --- I think you'd want to take (sorted?) unique values and then assign sequential integers starting from 1 probably. Otherwise the event_id mapping changes depending on things like randomized trial presentation order, meaning different subjects could get different mappings. Or, to be really safe, somehow create integers from the hashes of the keys; that would be safest (though perhaps too annoying to be worth it --- especially given that I don't expect users who are in this particular corner case to even want the event_id back).

@drammock drammock merged commit 3492fa0 into mne-tools:main Dec 31, 2024
21 of 23 checks passed
@drammock drammock deleted the followup branch January 1, 2025 03:18
@sappelhoff
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Documentation Build Errors.
5 participants