fix: support python3.12 f-string tokenizing #213
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@ Coverage Diff @@
##           master     #213      +/-   ##
==========================================
+ Coverage   97.73%   97.76%   +0.02%
==========================================
  Files          12       12
  Lines        1017     1028      +11
  Branches      223      228       +5
==========================================
+ Hits          994     1005      +11
  Misses         12       12
  Partials      11       11

Flags with carried forward coverage won't be shown.
Thanks for fixing these v. important bugs @mbhall88
A bit unfortunate for us that python 3.12 is changing f-string tokenisation
I'm just going to try to summarise in words what I understand from this PR but also what I don't understand from snakefmt (:cry:) and what I think we might need to do specifically here:
- You're adding 'space addition' through the function add_token_space, by registering 'FSTRING_START' in the spacing_triggers dict; why can't we run f-strings through black again?
- If we can run f-strings through black, can't we try and run process_token from encountering 'FSTRING_START' to encountering 'FSTRING_END', as per the new f-string token sequence? I'm thinking of something like lines 394 and 396, where if we're in_brackets or in_lambda we keep parsing, instead of triggering snakefmt parameter resolution. Here we'd similarly ask if we're in_fstring. BUT it seems that you're having to parse 'FSTRING_MIDDLE's individually (lines 335-341) and add braces back because they get lost?

So in fact, IIUC, the tokenizer gets rid of '{' and '}' inside f-strings, so we have to re-add them manually? And thus we can't just try and 'parse through' an f-string from 'FSTRING_START' to 'FSTRING_END', because that way we lose the '{' and '}' in f-strings?
Second comment: echoing my review of #211, do you want to have a chat about a formal lexer/parser for snakefmt?
We can run them through black, but sadly they now get mangled by snakefmt's parser before they go into black.
That is an interesting idea. Not sure I completely understand how we would do this though, as I think this section of the code was your creation and I'm not as familiar with it. Feel free to have a go at adding an

Yes, I had to introduce them as
Essentially yes, this is the problem I ran into. Another option I had thought of was, once we hit
See my message on Slack.
Replying to your reply in #213 (comment): I think the solution you've implemented is currently the best - it's actually v. succinct. So this is good to merge, thanks @mbhall88!
Python 3.12 now formalises f-strings by parsing them as their own entities, rather than as plain strings. This creates some problems for us: where a whole f-string would previously produce a single tokenize.STRING, the tokenizer now breaks f-strings up into pieces and emits the new tokens FSTRING_START, FSTRING_MIDDLE and FSTRING_END.

The fixes in this PR relate to issues people have already come across; I suspect there will be more. My solution is somewhat hacky in places, but it was the best I could do given the tokens the tokenize library is giving us. #207 is particularly problematic, as it is the tokenizer itself that removes the extra braces, hence why I am manually adding them back. Fingers crossed this doesn't raise further issues.

I have also added Python 3.12 to the CI, so we are testing across <3.12 and >=3.12.

Closes #207 and closes #210
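For anyone following along, a quick sketch of the token-sequence change using only the stdlib tokenize module (the example f-string is made up; nothing snakefmt-specific is assumed):

```python
import io
import sys
import tokenize

SRC = 'f"hello {name}!"\n'

# Collect the token type names the tokenizer produces for an f-string.
names = [
    tokenize.tok_name[tok.type]
    for tok in tokenize.generate_tokens(io.StringIO(SRC).readline)
]

if sys.version_info >= (3, 12):
    # 3.12+ splits the f-string into dedicated tokens, with the
    # replacement field tokenized as ordinary OP/NAME tokens in between.
    assert "FSTRING_START" in names
    assert "FSTRING_MIDDLE" in names
    assert "FSTRING_END" in names
else:
    # Before 3.12 the whole f-string is a single STRING token.
    assert "STRING" in names
```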