-
-
Notifications
You must be signed in to change notification settings - Fork 319
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve C scanner conditional inclusion #4626
base: master
Are you sure you want to change the base?
Conversation
6b4efba
to
3ba79e8
Compare
Seems like this could have significant performance impact? And really it's only impacting the conditional C scanner? |
For the vast majority of cases, it's a single iteration over the dict made from |
3ba79e8
to
95f6dab
Compare
old_ns = mapping | ||
ns = {} | ||
while True: | ||
ns = {k: old_ns[v] if v in old_ns else v for k, v in old_ns.items()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will loop forever if one variable self references.
-DXYZ=XYZ
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not convinced... if you have {'XYZ': 'XYZ'}
, then k
gets the looked-up value old_ns['XYZ']
, which is 'XYZ'
, which is actually the same as it was before. Thus, the following check finds that old_ns
and ns
have the same values, so it breaks out of the loop. There might still be a way to get into a pathological state, but I don't think that example is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps if you have three defines that circularly point to each other? Is that worth worrying about? cpp does detect that as a pathological case and errors out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add this to line 581 in CTests.py and run..
('XYZ','XYZ')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm. it completed. Trying 3 way loop..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect two way (A: B, B: A) to work as well - first pass you'll have (A: A, B: B) and then there's no more work to do. Three-way i'm pretty sure will loop. We could decide that it's pathlogical to do more than two transformations, and not try to loop, or bail on a loop counter with an exception, if you think either is worthwhile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 way loop hangs infinitely... I think that'd be BAD...
subst() has an implementation to avoid this BTW.
Could use that to do the resolution.. of course that'd be slower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, of course it's bad, but who would do something like that? I think fairly simple measures should be enough, as noted above. It's not supposed to be a perfect tool, as the C scanner has a limited job - unlike subst, which has to come out just right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example - no-loop, just two-pass:
ns = {k: old_ns[v] if v in old_ns else v for k, v in old_ns.items()}
if old_ns == ns: # there were no substitutions
return ns
# do one more round
return {k: ns[v] if v in ns else v for k, v in ns.items()}
So to be able to wrap this one up, two questions (update: I've ticked the two boxes that seem to be preferred):
and:
|
I'd say 5 loops is sufficient, I'd definitely note this in the release/changes. |
Simplistic macro replacement is now done on the contents of CPPDEFINES, to improve accuracy of conditional inclusion as compared to the real preprocessor (ref: issue SCons#4623). Signed-off-by: Mats Wichmann <[email protected]>
Replacement is now limited to five passes, to avoid going into an endless loop in pathlogical cases where there are three or more macros that circularly refer to each other. No error is reported in this case. Replacement is now only done for the otional C Conditional scanner, not for the classical C scanner. Signed-off-by: Mats Wichmann <[email protected]>
ff8c116
to
7ff7aca
Compare
One solution is to keep a dictionary of the initial value and the processed values. If the processed value is in the dictionary before adding it, then a cycle is detected. If a cycle is detected, there is a question of which value should be returned. One school of thought is the "prior value" before the cycle (i.e., the most "forward" progress). This technique was used in my own work for multiple levels of environment variable substitution which is effectively the same problem. |
Yes, of course there are ways to detect it (and thanks for the note!!). Trying to keep it lightweight, at the moment... a cycle should be a super-uncommon case. I hope. |
A Thanksgiving gift [last revision 2024-12-06 11:50 EST]. Bugs possible. Source Codefrom typing import Dict
### PR IMPLEMENTATION
def _replace(mapping: Dict) -> Dict:
old_ns = mapping
loops = 0
while loops < 5: # don't recurse forever in case there's circular data
ns = {k: old_ns[v] if v in old_ns else v for k, v in old_ns.items()}
if old_ns == ns:
break
old_ns = ns
loops += 1
return ns
### MODIFIED PR IMPLEMENTATION
def _replace_mod(mapping: Dict) -> Dict:
old_ns = mapping
loops = 0
while loops < 5: # don't recurse forever in case there's circular data
again = False
ns = {}
for k, v in old_ns.items():
if v in old_ns:
ns[k] = old_ns[v]
if not again and ns[k] != v:
again = True
else:
ns[k] = v
if not again:
break
old_ns = ns
loops += 1
return ns
### ALTERNATE PR IMPLEMENTATION
def _replace_alt(mapping: Dict) -> Dict:
replace_list = []
resolved_map = {}
for key, val in mapping.items():
if val not in mapping:
resolved_map[key] = val
elif val != key:
replace_list.append((key,val))
else:
# cycle (self-referential)
resolved_map[key] = val
if not replace_list:
mapping = resolved_map
else:
mapping = dict(mapping)
for key, val in replace_list:
if val in resolved_map:
mapping[key] = resolved_map[val]
continue
replace_seen = set()
replace_trail = [val]
while val in mapping:
nextval = mapping[val]
if nextval in resolved_map:
val = resolved_map[nextval]
break
replace_seen.add(val)
if nextval in replace_seen:
# cycle detected
val = replace_trail[-1]
break
replace_trail.append(nextval)
val = nextval
for resolved_key in replace_trail:
resolved_map[resolved_key] = val
mapping[key] = val
return mapping
if __name__ == "__main__":
import gc
import platform
import time
TESTING = False
TESTING_EXPS = [10, 20, 21]
TESTING_GC = [False]
NINVOCATIONS = pow(10, 7) # exp >= 5 for elapsed time division
### TIMING
def profile(func, mapping, ncalls=NINVOCATIONS, gcenabled=True):
gc.collect()
if not gcenabled:
gc.disable()
rval = None
begtime = time.perf_counter()
while ncalls > 0:
ncalls -= 1
rval = func(mapping)
endtime = time.perf_counter()
if not gcenabled:
gc.enable()
gc.collect()
elapsed = endtime - begtime
return rval, elapsed
### TESTING
print(f"python version = {platform.python_version()!s}")
print(f"number of invocations = {NINVOCATIONS:,}")
for gcenabled in (True, False):
if TESTING and TESTING_GC and gcenabled not in TESTING_GC:
continue
gcstatus = "Y" if gcenabled else "N"
testnum = 0
for CPPDEFINES, expect in [
(
#01 no replacements
[("STRING1", "VALUE1"), ("STRING2", "VALUE2"), ("STRING3", "VALUE3")],
{"STRING1": "VALUE1", "STRING2": "VALUE2", "STRING3": "VALUE3"},
),
(
#02 no replacements
[("STRING1", "STRING1"), ("STRING2", "STRING2"), ("STRING3", "STRING3")],
{"STRING1": "STRING1", "STRING2": "STRING2", "STRING3": "STRING3"},
),
(
#03 single replacement
[("STRING", "VALUE"), ("REPLACEABLE", "RVALUE"), ("RVALUE", "AVALUE")],
# Alternate algorithm replacement steps:
# [REPLACEABLE] RVALUE = AVALUE
{"STRING": "VALUE", "REPLACEABLE": "AVALUE", "RVALUE": "AVALUE"},
),
(
#04 single replacement (first)
[("AAA", "BBB"), ("BBB", "VALUE1"), ("CCC", "VALUE2"), ("DDD", "VALUE3"), ("EEE", "VALUE4"), ("FFF", "VALUE5")],
# Alternate algorithm replacement steps:
# [AAA] BBB = VALUE1
{"AAA": "VALUE1", "BBB": "VALUE1", "CCC": "VALUE2", "DDD": "VALUE3", "EEE": "VALUE4", "FFF": "VALUE5"},
),
(
#05 single replacement (last)
[("AAA", "VALUE1"), ("BBB", "VALUE2"), ("CCC", "VALUE3"), ("DDD", "VALUE4"), ("EEE", "VALUE5"), ("FFF", "EEE")],
# Alternate algorithm replacement steps:
# [FFF] EEE = VALUE5
{"AAA": "VALUE1", "BBB": "VALUE2", "CCC": "VALUE3", "DDD": "VALUE4", "EEE": "VALUE5", "FFF": "VALUE5"},
),
(
#06 multiple replacements
[("AAA", "XXX"), ("BBB", "YYY"), ("CCC", "VALUE1"), ("DDD", "VALUE2"), ("XXX", "DDD"), ("YYY", "CCC")],
# Alternate algorithm replacement steps:
# [AAA] XXX = DDD = VALUE2
# [BBB] YYY = CCC = VALUE1
# [XXX] VALUE2
# [YYY] VALUE1
{"AAA": "VALUE2", "BBB": "VALUE1", "CCC": "VALUE1", "DDD": "VALUE2", "XXX": "VALUE2", "YYY": "VALUE1"},
),
(
#07 multiple replacements
[("AAA", "BBB"), ("BBB", "VALUE1"), ("CCC", "DDD"), ("DDD", "VALUE2"), ("EEE", "FFF"), ("FFF", "VALUE3")],
# Alternate algorithm replacement steps:
# [AAA] BBB = VALUE1
# [CCC] DDD = VALUE2
# [EEE] FFF = VALUE3
{"AAA": "VALUE1", "BBB": "VALUE1", "CCC": "VALUE2", "DDD": "VALUE2", "EEE": "VALUE3", "FFF": "VALUE3"},
),
(
#08 multiple replacements
[("AAA", "BBB"), ("BBB", "VALUE1"), ("CCC", "AAA"), ("DDD", "BBB"), ("EEE", "AAA"), ("FFF", "BBB")],
# Alternate algorithm replacement steps:
# [AAA] BBB = VALUE1
# [CCC] AAA = VALUE1
# [DDD] BBB = VALUE1
# [EEE] AAA = VALUE1
# [FFF] BBB = VALUE1
{"AAA": "VALUE1", "BBB": "VALUE1", "CCC": "VALUE1", "DDD": "VALUE1", "EEE": "VALUE1", "FFF": "VALUE1"},
),
(
#09 long trail, single value
[("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("F", "G"), ("G", "H"), ("H", "I"), ("I", "J"), ("J", "K"), ("K", "0")],
# Algorithm replacement steps:
# [A] B = C = D = E = F = G = H = I = J = K = 0
# [B] 0
# ...
# [J] 0
{"A": "0", "B": "0", "C": "0", "D": "0", "E": "0", "F": "0", "G": "0", "H": "0", "I": "0", "J": "0", "K": "0"},
),
(
#10 cycles
[("ABC", "DEF"), ("DEF", "GHI"), ("GHI", "ABC"), ("XXX", "YYY"), ("YYY", "ZZZ"), ("ZZZ", "DEF")],
# Algorithm replacement steps:
# [ABC] DEF = GHI = ABC
# [DEF] ABC
# [GHI] ABC
# [XXX] YYY = ZZZ = DEF = ABC
# [YYY] ABC
# [ZZZ] ABC
{"ABC": "ABC", "DEF": "ABC", "GHI": "ABC", "XXX": "ABC", "YYY": "ABC", "ZZZ": "ABC"},
),
(
#11 none
[("A", 0), ("B", 1), ("C", 2), ("D", 3), ("E", 4), ("F", 5), ("G", 6), ("H", 7)],
{"A": 0, "B": 1, "C": 2, "D": 3, "E": 4, "F": 5, "G": 6, "H": 7},
),
(
#12 1 trail: best case
[("A", 0), ("B", "A"), ("C", "B"), ("D", "C"), ("E", "D"), ("F", "E"), ("G", "F"), ("H", "G")],
{"A": 0, "B": 0, "C": 0, "D": 0, "E": 0, "F": 0, "G": 0, "H": 0},
),
(
#13 2 trails: best case, worst case
[("A", 0), ("B", "A"), ("C", "B"), ("D", "C"), ("E", "F"), ("F", "G"), ("G", "H"), ("H", 1)],
{"A": 0, "B": 0, "C": 0, "D": 0, "E": 1, "F": 1, "G": 1, "H": 1},
),
(
#14 1 trail: worst case
[("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("F", "G"), ("G", "H"), ("H", 0)],
{"A": 0, "B": 0, "C": 0, "D": 0, "E": 0, "F": 0, "G": 0, "H": 0},
),
(
#15 none
[("A", "0"), ("B", "1"), ("C", "2"), ("D", "3"), ("E", "4"), ("F", "5"), ("G", "6"), ("H", "7")],
{"A": "0", "B": "1", "C": "2", "D": "3", "E": "4", "F": "5", "G": "6", "H": "7"},
),
(
#16 1 trail: best case
[("A", "0"), ("B", "A"), ("C", "B"), ("D", "C"), ("E", "D"), ("F", "E"), ("G", "F"), ("H", "G")],
{"A": "0", "B": "0", "C": "0", "D": "0", "E": "0", "F": "0", "G": "0", "H": "0"},
),
(
#17 2 trails: best case, worst case
[("A", "0"), ("B", "A"), ("C", "B"), ("D", "C"), ("E", "F"), ("F", "G"), ("G", "H"), ("H", "1")],
{"A": "0", "B": "0", "C": "0", "D": "0", "E": "1", "F": "1", "G": "1", "H": "1"},
),
(
#18 1 trail: worst case
[("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("F", "G"), ("G", "H"), ("H", "0")],
{"A": "0", "B": "0", "C": "0", "D": "0", "E": "0", "F": "0", "G": "0", "H": "0"},
),
(
#19 trail length = 33
[
("A0", "B0"), ("B0", "C0"), ("C0", "D0"), ("D0", "E0"), ("E0", "F0"), ("F0", "G0"), ("G0", "H0"), ("H0", "I0"), ("I0", "J0"), ("J0", "K0"),
("K0", "L0"), ("L0", "M0"), ("M0", "N0"), ("N0", "O0"), ("O0", "P0"), ("P0", "Q0"), ("Q0", "R0"), ("R0", "S0"), ("S0", "T0"), ("T0", "U0"),
("U0", "V0"), ("V0", "W0"), ("W0", "X0"), ("X0", "Y0"), ("Y0", "Z0"), ("Z0", "A1"), ("A1", "B1"), ("B1", "C1"), ("C1", "D1"), ("D1", "E1"),
("E1", "F1"), ("F1", "G1"), ("G1", "2"),
],
{ "A0": "2", "B0": "2", "C0": "2", "D0": "2", "E0": "2", "F0": "2", "G0": "2", "H0": "2", "I0": "2", "J0": "2",
"K0": "2", "L0": "2", "M0": "2", "N0": "2", "O0": "2", "P0": "2", "Q0": "2", "R0": "2", "S0": "2", "T0": "2",
"U0": "2", "V0": "2", "W0": "2", "X0": "2", "Y0": "2", "Z0": "2", "A1": "2", "B1": "2", "C1": "2", "D1": "2",
"E1": "2", "F1": "2", "G1": "2",
},
),
(
#20: cycle
[("A", "B"), ("B", "A")],
{"A": "A", "B": "A"},
),
(
#21: cycle
[("A", "B"), ("B", "C"), ("C", "A")],
{"A": "A", "B": "A", "C": "A"},
),
]:
testnum += 1
if TESTING and TESTING_EXPS and testnum not in TESTING_EXPS:
continue
print("")
mapping = {k: v for k, v in CPPDEFINES}
print(f"{gcstatus} {testnum:02d} CPPDEFINES {mapping}")
rval_cur, elapsed_cur = profile(_replace, mapping, ncalls=NINVOCATIONS, gcenabled=gcenabled)
status_cur = 'pass' if rval_cur == expect else 'FAIL'
print(f"{gcstatus} {testnum:02d} cur {status_cur} {elapsed_cur:7.4f} {rval_cur}")
rval_mod, elapsed_mod = profile(_replace_mod, mapping, ncalls=NINVOCATIONS, gcenabled=gcenabled)
status_mod = 'pass' if rval_mod == expect else 'FAIL'
print(f"{gcstatus} {testnum:02d} mod {status_mod} {elapsed_mod:7.4f} {rval_mod}")
rval_alt, elapsed_alt = profile(_replace_alt, mapping, ncalls=NINVOCATIONS, gcenabled=gcenabled)
status_alt = 'pass' if rval_alt == expect else 'FAIL'
print(f"{gcstatus} {testnum:02d} alt {status_alt} {elapsed_alt:7.4f} {rval_alt}")
rel = elapsed_mod / elapsed_cur
which = "mod" if rel < 1.0 else "CUR"
rel_error = ((elapsed_cur - elapsed_mod) / elapsed_mod) * 100.0
delta = (elapsed_cur - elapsed_mod) / NINVOCATIONS
print(f"{gcstatus} {testnum:02d} {which} ---- cur/mod {rel:7.4f} (rel_error = {rel_error:6.2f}% avg_delta = {delta:11.4E})")
rel = elapsed_alt / elapsed_cur
which = "alt" if rel < 1.0 else "CUR"
rel_error = ((elapsed_cur - elapsed_alt) / elapsed_alt) * 100.0
delta = (elapsed_cur - elapsed_alt) / NINVOCATIONS
print(f"{gcstatus} {testnum:02d} {which} ---- cur/alt {rel:7.4f} (rel_error = {rel_error:6.2f}% avg_delta = {delta:11.4E})")
rel = elapsed_alt / elapsed_mod
which = "alt" if rel < 1.0 else "MOD"
rel_error = ((elapsed_mod - elapsed_alt) / elapsed_alt) * 100.0
delta = (elapsed_mod - elapsed_alt) / NINVOCATIONS
print(f"{gcstatus} {testnum:02d} {which} ---- mod/alt {rel:7.4f} (rel_error = {rel_error:6.2f}% avg_delta = {delta:11.4E})") Batch File
Raw OutputTest and profiling output [Python 3.7]:
Test and profiling output [Python 3.12]:
ChangesEdits:
|
ObservationsThere is an asymetric difference in behavior between backward references (i.e., a value refers to an earlier key) compared to forward references (i.e., a value refers to a later key). For example:
For every loop variable value N (i.e., the initial value of variable While exiting due to exceeding the loop count due to a cycle is the most likely reason, it is not guaranteed. In the post above,
The current and modified algorithms in the post above exceed the loop count for experiments AlgorithmsThe results for python versions 3.7 to 3.11 are very different from the results for python version 3.12. For python versions prior to 3.12, the modified algorithm appears to be faster than the current algorithm in most all cases. The three algorithms in the post above are:
|
<Earlier markdown table remove pending better solution> Temporary PDF tables of algorithm comparisons: Raw output files by python version:
LegendThe following 3 tables compare pairs of algorithms. The abbreviated name of the "faster" algorithm is shown in the table cells. The text colors of the table cells correspond to the relative error percentage in the raw output:
A - Current Algorithm Versus Modified AlgorithmRelative error percentage = ((total_elapsed_time_current - total_elapsed_time_modified) / total_elapsed_time_modified) * 100.0
B - Current Algorithm Versus Alternate AlgorithmRelative error percentage = ((total_elapsed_time_current - total_elapsed_time_alternate) / total_elapsed_time_alternate) * 100.0
C - Modified Algorithm Versus Alternate AlgorithmRelative error percentage = ((total_elapsed_time_modified - total_elapsed_time_alternate) / total_elapsed_time_alternate) * 100.0
|
Boy, that's a lot. Sadly, with the winter sun shining in my window causing glare, the colors don't really stand out, plus something didn't render in the last row of the last table. Is there a "short summary" version of all this? |
What do you mean by relative error? What metrics are you using for this calculation? time? other? |
All of this is much ado about nothing.
I'm going to delete the content for the previous post until I can figure out how to handle it correctly. Abusing inline latex for colors again. Somewhere, I have code to generate html tables. Sigh. Initially, the last row didn't render here but refreshing the page "fixed" it. Go figure.
Notes:
The modified version is simple and easy to understand. The alternate version detects cycles and does not have trail length limitations but is more involved. In the end, the timing differences per invocation are REALLY small. |
Total elapsed time. def profile(func, mapping, ncalls=NINVOCATIONS, gcenabled=True):
gc.collect()
if not gcenabled:
gc.disable()
rval = None
begtime = time.perf_counter()
while ncalls > 0:
ncalls -= 1
rval = func(mapping)
endtime = time.perf_counter()
if not gcenabled:
gc.enable()
gc.collect()
elapsed = endtime - begtime
return rval, elapsed The relative error percentage for comparing the current and modified algorithms using total elapsed time is:
|
Well, the project to substantially speed up Python (that brought Guido out of a brief retirment) landed its first bits for 3.12, 3.13 might make another difference. All depends on what code paths we hit... differences between versions aren't unusual when I've done other benchmarking - including sometimes things got slower for a release or two!
that might be worthwhile... although comprehensions shouldn't normally be slower than an unrolled loop... |
It is not the loop that is slower; it is the comparison for equality following the loop. I would think that the worst-case performance of the dictionary comparison is when they are equal (i.e., all elements compared). By unrolling the comprehension, a flag can be set if the old and new values are different indicating another iteration is necessary. Once the flag is set, there is no need to compare the values anymore. The comparisons only take place when writing to keys in which the values need to be resolved. |
right, makes sense. |
Simplistic macro replacement is now done on the contents of
CPPDEFINES
, to improve accuracy of conditional inclusion as compared to the real preprocessor (ref: issue #4623). The replacement will happen for both the Conditional and the original C scanner, but since the original does not attempt to process the logic around conditional includes, it will simply have no effect on it. EDITED: now only applied to Conditional scanner.This should have no doc impacts: we don't say how scanners determine implicit dependencies, so a small tweak to that process is more like a bugfix.
Contributor Checklist:
CHANGES.txt
andRELEASE.txt
(and read theREADME.rst
).