-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New 'FlexC' dataset and CbcPruneExtractor
#7
base: main
Are you sure you want to change the base?
Conversation
arXiv paper from which the dataset originates: https://arxiv.org/abs/2309.09112 |
This looks great!! I'm a little worried about the threshold in the extractor though, it will make it hard to benchmark across different datasets. Is there a way to autotune it somehow? Like perhaps start at some percentile of all node costs, and then move it up if extraction fails. |
Autotuning the 'infinity value' would be interesting to explore for datasets where the 'infinity value' is unknown, maybe it could be the 99th percentile of the entire dataset rather than of an individual e-graph. Another way to go would be to ask datasets/egraphs to encode their 'infinity value', and if there is none, to either avoid using the pruning extractor, use a dummy threshold above all costs, or use a percentile guess. Now if we take the same pruning strategy and forget about trying to find an 'infinity value', but simply use the threshold as a heuristic, I could imagine an extractor that, e.g. tries to ILP solve using nodes with cost <=50th percentile, then with cost <=75th percentile, <= 87.5th percentile, etc. as some kind of dichotomy that assumes expensive nodes are less likely to be useful. Maybe that is worth experimenting with. That strategy could also be compatible with an explicitly encoded 'infinity threshold': the heuristic threshold should not move up past the 'infinity value' and instead extraction should fail early. |
I would love to merge the dataset first if possible, especially since it seems the new extractor is maybe hard to generalize. Also, consider running the improved CBC extractor on this dataset to see how it compares. |
Sounds good: #18 |
This is a great idea. I've had a go at implementing something similar in #16 The optimal extractor runs an approximate extractor (say bottom-up) to get the an upper-bound on the dag-cost, then removes any node from the candidate extraction that have a cost greater than that upper-bound. In the data/flexc the largest dag-cost that the bottom-up extractor finds is 137, so all of the "infinity" nodes with a cost of 1000 are removed. I suspect there's lots further we can go with the idea. |
Where do we sit with this PR? The dataset was merged in a different PR, and there are a lot of other ILP prs flying around. Is this still looking like a distinct enough extractor to merge? |
As far as I understand, @TrevorHansen's #16 will supersede the I'd also like to see better benchmark statistics/plotting in main, which would also supersede my changes to |
Hi all,
This PR:
CbcPruneExtractor
extractor, that:LpExtractor
from egg 0.9.0, meaning that it does not use the fix from LpExtractor creates an infeasible problem for CBC egg#207 (comment) , which was counterproductive in our own experiments. From what I have seen,CbcPruneExtractor
also seems to use a more simplistic cycle pruning strategy than the currentCbcExtractor
.This is the output of
make FEATURES=ilp-cbc,ilp-cbc-prune
:I assume that the difference between
ilp-cbc
andilp-cbc-prune
for the other datasets than 'FlexC' is due to the difference between theLpExtractor
from egg 0.9.0 and theCbcExtractor
from this repo, since no e-nodes should be banned.ilp-cbc-prune
seems to take more time and find better costs thanilp-cbc
for the 'babble' and 'tensat' datasets, and the opposite for the 'egg' dataset.On the 'FlexC' dataset,
ilp-cbc-prune
is clearly better thanilp-cbc
both in time and cost, thanks to the ILP model pruning. A nice observation for us is thatbottom-up
is actually doing really well, almost finding the same costs in much less time.I know that the PR needs some cleaning up before being mergeable, so please let me know what you would like me to change, and I hope people will find these additions useful :)