fix: support python3.12 f-string tokenizing #213
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@ Coverage Diff @@
##           master     #213      +/-   ##
==========================================
+ Coverage   97.73%   97.76%   +0.02%
==========================================
  Files          12       12
  Lines        1017     1028      +11
  Branches      223      228       +5
==========================================
+ Hits          994     1005      +11
  Misses         12       12
  Partials      11       11

Flags with carried forward coverage won't be shown.
Thanks for fixing these v. important bugs @mbhall88
A bit unfortunate for us that python 3.12 is changing f-string tokenisation
I'm just going to try to summarise in words what I understand from this PR but also what I don't understand from snakefmt (:cry:) and what I think we might need to do specifically here:
- You're adding 'space addition' through the function add_token_space, by registering 'FSTRING_START' in the spacing_triggers dict; why can't we run f-strings through black again?
- If we can run f-strings through black, can't we try and run process_token from encountering 'FSTRING_START' to encountering 'FSTRING_END', as per the new f-string token sequence? I'm thinking of something like lines 394 and 396, where if we're in_brackets or in_lambda we keep parsing, instead of triggering snakefmt parameter resolution. Here we'd similarly ask if we're in_fstring. BUT it seems that you're having to parse 'FSTRING_MIDDLE's individually (lines 335-341) and add braces back because they get lost?

So in fact, IIUC, the tokenizer gets rid of '{' and '}' inside f-strings, so we have to re-add them manually? And thus we can't just try and 'parse through' an f-string from 'FSTRING_START' to 'FSTRING_END', because that way we lose the '{' and '}' in f-strings?
Second comment: echoing my review of #211, do you want to have a chat about a formal lexer/parser for snakefmt?
We can run them through black, but sadly they now get mangled by snakefmt's parser before they go into black.
That is an interesting idea. Not sure I completely understand how we would do this though, as I think this section of the code was your creation and I'm not as familiar with it. Feel free to have a go at adding an

Yes, I had to introduce them as
Essentially yes, this is the problem I ran into. Another option I had thought of was, once we hit
See my message on Slack.
Replying to your reply in #213 (comment): I think the solution you've implemented is currently the best - it's actually v. succinct. So this is good to merge, thanks @mbhall88!
Python 3.12 now formalises f-strings by parsing them as their own entities, rather than as plain strings. This creates some problems for us: where a whole f-string would previously produce a single tokenize.STRING, the tokenizer now breaks f-strings up into pieces and emits the new tokens FSTRING_START, FSTRING_MIDDLE and FSTRING_END.

The fixes in this PR relate to issues people have already come across; I suspect there will be more. My solution is somewhat hacky in places, but it was the best I could do given the tokens the tokenize library is giving us. #207 is particularly problematic, as it is the tokenizer itself that removes the extra braces, hence why I am manually adding them back. Fingers crossed this doesn't raise further issues.

I have also added Python 3.12 to the CI, so we are testing across <3.12 and >=3.12.

Closes #207 and closes #210
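For anyone following along, a quick sketch of the token-sequence change using only the stdlib tokenize module (the example f-string is made up; nothing snakefmt-specific is assumed):

```python
import io
import sys
import tokenize

SRC = 'f"hello {name}!"\n'

# Collect the token type names the tokenizer produces for an f-string.
names = [
    tokenize.tok_name[tok.type]
    for tok in tokenize.generate_tokens(io.StringIO(SRC).readline)
]

if sys.version_info >= (3, 12):
    # 3.12+ splits the f-string into dedicated tokens, with the
    # replacement field tokenized as ordinary OP/NAME tokens in between.
    assert "FSTRING_START" in names
    assert "FSTRING_MIDDLE" in names
    assert "FSTRING_END" in names
else:
    # Before 3.12 the whole f-string is a single STRING token.
    assert "STRING" in names
```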