Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADT reasoning on distinct literals is incomplete #1014

Closed
Halbaroth opened this issue Dec 14, 2023 · 9 comments
Closed

ADT reasoning on distinct literals is incomplete #1014

Halbaroth opened this issue Dec 14, 2023 · 9 comments
Labels
completeness This issue is about completeness of theories reasoning This issue is about improving reasoning capabilities.

Comments

@Halbaroth
Copy link
Collaborator

Halbaroth commented Dec 14, 2023

The current implementation of the ADT theory is incomplete. By incomplete I mean the reasoning isn't capable to notice inconsistency of some ground problems involving only ADT and boolean symbols. I notice this flaw while documenting the module Adt_rel, see #1013.

For instance, Alt-Ergo answers Unknown on this input file:

(set-logic ALL)
(declare-datatype Data (
   (cons1 (d1 Bool))
   (cons2)
))
(declare-const x Data)
(declare-const y Data)
(declare-const z Data)
(assert ((_ is cons1) x))
(assert ((_ is cons1) y))
(assert ((_ is cons1) z))
(assert (distinct x y z))
(check-sat)

The expected answer is unsat. If we remove the second constructor cons2, Alt-Ergo will promote the ADT Data to a record and we got the correct answer.

This issue comes from the fact the assume function in Adt_rel ignores silently disequations when the involved constructors have payloads. Let's imagine we have two constructor applications: x = cons1 u and y = cons2 v and we assume that x <> y. Then we have to propagate u <> v to CC(X).

But if we have payloads with at least two fields? If x = cons1 u1 u2 and y = cons1 v1 v2 and we assume x <> y, then we have to propagate u1 <> v1 \/ u2 <> v2. In this case we have to propagate a clause but AE relations can only propagate literals!

The current code of assume does nothing if one of the involved constructor has a payload. We can do better if the payload has only one field but if we want to treat the general case, we have to work more.

This issue resembles a lot to the issue with the distinct literal. When I tried to propagate this literal in CC(X), I couldn't do it because its negation isn't literal but a clause.

A possible solution would be to change our definition of facts. Currently a fact is just an annotated literal but we could consider annotated clauses instead.

Another one would be to use a kind of union-find structure in Adt_rel to keep in mind all these constraints until we can produce a literal.

@Halbaroth Halbaroth added reasoning This issue is about improving reasoning capabilities. completeness This issue is about completeness of theories labels Dec 14, 2023
@bclement-ocp
Copy link
Collaborator

The expected answer is unsat. If we remove the second constructor cons2, Alt-Ergo will promote the ADT Data to a record and we got the correct answer.

Is this true? I get unknown when I remove the cons2 case on current next. It seems to only be handled by the model-generation code (with --produce-models); and it should be handled in the same way once model generation for ADTs is implemented (this will use backtracking but that is fine for model gen).

On the more general point of "should we propagate the clause earlier than model generation" — the only mechanism we have to do that currently is the case split mechanism (splitting on u1 <> v1 which is different from the splits we need for model generation). It would not be very efficient to do so currently but #901 should improve on that. That said, in #901 I currently have the necessary ground work to allow propagating actual clauses using semantic literals to the solver, which would be a better solution.

@Halbaroth
Copy link
Collaborator Author

You're right. It does return unsat only if we turn model generation on. Why we shouldn't support this before model generation? Is it too complicated with the current implementation and we should wait your patch to push the casesplit mechanism in the SAT solver or is it a bad idea because it will generate too much casesplits?

@bclement-ocp
Copy link
Collaborator

Why we shouldn't support this before model generation?
is it a bad idea because it will generate too much casesplits

In the current architecture (where case splits use their own little backtracking algorithm), probably. With #901 not sure but could be worth trying.

In any case we could also always enable model generation. There is not much to loose — it would only impact problems that we currently return unknown on due to incompleteness.

@Halbaroth
Copy link
Collaborator Author

Halbaroth commented Dec 18, 2023

I agree, we can activate model generation by default, at least in presence of theory for which model generation extends their support as record and ADT theories. Not sure it's worth doing it for LIA but it shouldn't impact the performance.

@Halbaroth
Copy link
Collaborator Author

Halbaroth commented Apr 5, 2024

In fact the incompleteness has no relation with the ADT theory. If you run Alt-Ergo without model generation on the input file:

(set-logic ALL)
(declare-datatype Data (
   (cons1 (d1 Int))
   (cons2)
))
(declare-const x Data)
(declare-const y Data)
(declare-const z Data)
(assert ((_ is cons1) x))
(assert ((_ is cons1) y))
(assert ((_ is cons1) z))
(assert (and (<= 0 (d1 x)) (<= (d1 x) 1)))
(assert (and (<= 0 (d1 y)) (<= (d1 y) 1)))
(assert (and (<= 0 (d1 z)) (<= (d1 z) 1)))
(assert (distinct x y z))
(check-sat)

We got unsat with both AE v2.5.3 and next. But if we run them with

(set-logic QF_DT)
(declare-datatype Data (
   (cons1 (d1 Bool))
   (cons2)
))
(declare-const x Data)
(declare-const y Data)
(declare-const z Data)
(assert ((_ is cons1) x))
(assert ((_ is cons1) y))
(assert ((_ is cons1) z))
(assert (distinct x y z))
(check-sat)

and the option --dd uf, we obtain unknown but the disequations map of the union-find contains .k1 <> .k2 and .k2 <> .k3
where .k1, .k2, .k3are the fresh names of type Tbool generated by the ADT theory.

We don't get the best answer because we don't call assign_value if the model generation is disabled, so we don't try to assign a value to .k1, .k2 and .k3.

It's a bit sad that the boolean theory isn't complete if we don't turn model generation on but we have a complete decision procedure for the enum theory (assuming we have enough fuel for the case-split mechanism) and Bool is just a built-in sum type.

@bclement-ocp
Copy link
Collaborator

Ah yes of course nobody is doing case splits for the boolean theory. Hm. Not sure what the proper fix is but at least we understand the issue now :)

@Halbaroth
Copy link
Collaborator Author

We may add a special function in Relation to perform case-split for boolean values as we do for the solver in Shostak.

@bclement-ocp
Copy link
Collaborator

We need to do case-splits for booleans in a way that doesn't negatively impact the existing solver, that's the tricky part (in particular doing case-splits for just any boolean term doesn't sound like a good idea as it would degenerate into every problems being solved by the pure backtracking case-split mechanism instead of the actual SAT solver).

Now that I think about it, this will probably be easier to tackle once #901 is complete.

@bclement-ocp
Copy link
Collaborator

Closing in favor of #1156 now that we have identified the underlying cause.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
completeness This issue is about completeness of theories reasoning This issue is about improving reasoning capabilities.
Projects
None yet
Development

No branches or pull requests

2 participants