FDCC: a Combined Approach for Solving Constraints over Finite Domains and Arrays
Sébastien Bardin (1), Arnaud Gotlieb (2)
(1) CEA LIST (Paris, France)
(2) INRIA (Rennes, France) - Certus V&V Center, Simula (Oslo, Norway)
CPAIOR 2012
Bardin, S., Gotlieb, A.
1/ 19
Overview
Goal: an efficient CP(FD) approach for array+FD constraints
  go beyond standard filtering-based techniques (element)
  motivation = software verification
Approach: combine global symbolic deduction mechanisms with local filtering in order to achieve better deductive power than either technique taken in isolation
Results: an original “greybox” combination for array+FD constraints
  identify which information should be shared
  propose ways of taming communication cost
a prototype and encouraging experiments (random instances)
  greater solving power (beats perfect blackbox combination)
  low overhead
easy to adapt for any CP(FD) solver (small API)
Motivations int foo (int a, int b, int c) // precondition(a,b,c) int tmp, result; tmp = a+b; if (tmp c iff foo(a,b,c) goes through else-path
Motivations int foo (int a, int b, int c) // precondition(a,b,c) int tmp, result; tmp = a+b; if (tmp c ∧ ¬Ψpost (a, b, c, c)
Motivations (2)
Constraint resolution becomes prominent in formal verification, especially software verification
Underlies several approaches, either for test generation or invariant computation
  [abstract model checking, bounded model checking]
  [symbolic execution, weakest precondition calculus]
Verification reduces to solving Verification Conditions (VCs)
We consider quantifier-free conjunctive fragments
  interesting by themselves [symbolic execution, test data generation]
  basic block of solvers handling disjunctions and quantifications
CP(FD) and Verification
Most verification techniques are based on SMT
Yet, CP(FD) is a natural and interesting alternative, since basic data types naturally range over finite domains
Potentially interesting for
  bounded (non-linear) integer arithmetic
  modular arithmetic [Gotlieb-Leconte-Marre 10]
  bitvectors [Bardin-Herrmann-Perroud 10]
  floating-point arithmetic [Botella-Gotlieb-Michel 06]
A few CP-based verification tools exist [+ encouraging case-studies]
  Inka [Gotlieb-Botella-Rueher 00], GATeL [Marre-Blanc 05]
  Osmose [Bardin-Herrmann 08], Jaut [Charreteur-Botella-Gotlieb 09]
But CP(FD) lacks an efficient handling of array constraints
The theory of arrays
The standard theory of arrays is defined by
  three sorts: arrays A, elements of arrays E, indexes I
  function select(T, i) : A × I → E
  function store(T, i, e) : A × I × E → A
  = and ≠ over E and I
Semantics (read-over-write)
  (FC)    i = j → select(T, i) = select(T, j)
  (RoW-1) i = j → select(store(T, i, e), j) = e
  (RoW-2) i ≠ j → select(store(T, i, e), j) = select(T, j)
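The read-over-write semantics can be checked on concrete values. The following is a minimal sketch, assuming arrays are modelled as Python dictionaries from integer indexes to integer elements; the helper name check_axioms is hypothetical and only tests the three axioms (FC, RoW-1, RoW-2) by brute force on a small array.

```python
from typing import Dict, Iterable


def select(array: Dict[int, int], i: int) -> int:
    """Read the element stored at index i."""
    return array[i]


def store(array: Dict[int, int], i: int, e: int) -> Dict[int, int]:
    """Return a new array equal to `array` except that index i holds e."""
    updated = dict(array)  # copy, so the original array is left unchanged
    updated[i] = e
    return updated


def check_axioms(array: Dict[int, int],
                 indexes: Iterable[int],
                 elements: Iterable[int]) -> bool:
    """Brute-force check of FC, RoW-1 and RoW-2 on concrete values."""
    indexes = list(indexes)
    elements = list(elements)
    for i in indexes:
        for j in indexes:
            # (FC) equal indexes read equal elements
            if i == j and select(array, i) != select(array, j):
                return False
            for e in elements:
                read = select(store(array, i, e), j)
                # (RoW-1) reading the index just written yields e
                if i == j and read != e:
                    return False
                # (RoW-2) reading another index sees the old array
                if i != j and read != select(array, j):
                    return False
    return True


T = {0: 7, 1: 8, 2: 9}
axioms_hold = check_axioms(T, range(3), range(4))
```

The dictionary copy in store mirrors the functional reading of the theory: store builds a new array term and never mutates T.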
The theory of arrays (2)
Why does array theory matter so much in verification?
  for modelling arrays and vectors [of course!]
  basis for more advanced containers
    maps, hash tables
    memory heap
A few remarks about the theory
  no constraint on array size or domains of indexes / elements [need to combine with constraints on E and I]
  no equality / disequality between arrays
  yet, difficult to solve [NP-hard for the ∧-fragment]
CP and arrays: local filtering
arrays represented by pairs (index, element) [explicit arrays of logical variables]
constraints on domains of indexes / elements (and size)
select: well-known constraint element [Van Hentenryck-Carillon 88, Brand 01]
store: more recent work [Charreteur-Botella-Gotlieb 09]

Element(ARRAY,I,E) :
  ( integer(I) ? ARRAY[I] == E, success
  ; D(E) ← D(E) ∩ ⋃_{i∈D(I)} D(ARRAY[i]),
    D(I) ← {i ∈ D(I) | D(E) ∩ D(ARRAY[i]) ≠ ∅},
    wait(...) )

Update(A,I,E,A’) :
  ( integer(I) ? A’[I] == E, ∀ k ≠ I do A’[k] == A[k], success
  ; D(E) ← D(E) ∩ ⋃_{i∈D(I)} D(A’[i]),
    D(I) ← {i ∈ D(I) | D(E) ∩ D(A’[i]) ≠ ∅},
    ∀ k ∉ D(I) do A’[k] == A[k],
    ∀ k ∈ D(I) do D(A’[k]) ← D(A’[k]) ∩ (D(A[k]) ∪ D(E)),
    ... )
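The two pruning rules of Element for a non-instantiated index can be sketched on explicit domains. This is a minimal illustration, assuming domains are Python sets and the array is a list of cell domains indexed from 0; the function name filter_element is hypothetical, and dropping out-of-range indexes is an added assumption. It performs a single filtering pass, without the suspension (wait) or fixpoint machinery of a real CP(FD) solver.

```python
from typing import List, Set, Tuple


def filter_element(array_doms: List[Set[int]],
                   idx_dom: Set[int],
                   elt_dom: Set[int]) -> Tuple[Set[int], Set[int]]:
    """One pass of the Element pruning rules; returns (new D(I), new D(E))."""
    # Assumption: indexes outside the array bounds are simply discarded.
    valid_idx = {i for i in idx_dom if 0 <= i < len(array_doms)}

    # D(E) <- D(E) ∩ (union of D(ARRAY[i]) for i in D(I))
    reachable: Set[int] = set()
    for i in valid_idx:
        reachable |= array_doms[i]
    new_elt = elt_dom & reachable

    # D(I) <- { i in D(I) | D(E) ∩ D(ARRAY[i]) is not empty }
    new_idx = {i for i in valid_idx if array_doms[i] & new_elt}
    return new_idx, new_elt


cells = [{1, 2}, {3}, {4, 5}]
new_idx, new_elt = filter_element(cells, {0, 1, 2}, {3, 4, 9})
# new_elt == {3, 4}: value 9 appears in no cell
# new_idx == {1, 2}: cell 0 shares no value with D(E)
```

An empty result for either domain signals failure of the constraint under the current domains.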
CP and arrays: local filtering (2)
Fine for “simple” array constraints
  either small arrays or very few updates
  fixed-value indexes (or at least no wide-domain indexes)
Insufficient for many array constraints from program verification
  large arrays, many updates, (wide-range) variable indexes [see formulas from SMT-LIB]

e = select(T, i) ∧ f = select(T, j) ∧ e ≠ f ∧ i = j
T array of size 100, domains: 0..100
  × fd: needs labelling [no answer in 60 min in COMET]

i ∈ 1..5 ∧ j ∈ 6..10 ∧ a ≠ select(store(store(T, j, a), i, b), j)
  × fd: needs labelling, cannot establish select(store(T, j, a), j) = a
Our approach
Examples
e = select(T, i) ∧ f = select(T, j) ∧ e ≠ f ∧ i = j
T array of size 100, domains: 0..100
  ✓ cc: unsat quickly (axiom FC)
  × fd: needs labelling [no answer in 60 min in COMET]
  ✓ fdcc: unsat quickly through cc

i ∈ 1..5 ∧ j ∈ 6..10 ∧ a ≠ select(store(store(T, j, a), i, b), j)
  × cc: no deduction since i ≠ j cannot be inferred
  × fd: needs labelling, cannot establish select(store(T, j, a), j) = a
  ✓ fdcc: fd deduces i ≠ j (domain-check), cc can then deduce a ≠ select(store(T, j, a), j), then a ≠ a and unsat

e = select(T, i) ∧ f = select(T, j) ∧ g = select(T, k) ∧ e ≠ f ∧ e ≠ g ∧ f ≠ g
T array of size 2, domain of indexes 1..2
  × cc: deduces allDifferent(i,j,k), does not output unsat (domains not taken into account)
  × fd: needs labelling [labels indexes first!!]
  ✓ fdcc: cc deduces allDifferent(i,j,k), then fd deduces unsat
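In the last example, the fd side only needs a counting argument once cc has produced allDifferent(i,j,k): three variables cannot take pairwise distinct values in 1..2. A minimal sketch of that check follows, assuming domains are Python sets; the function name all_different_infeasible is hypothetical, and the test is only a sufficient condition for infeasibility, far weaker than the filtering of a real allDifferent propagator.

```python
from typing import List, Set


def all_different_infeasible(domains: List[Set[int]]) -> bool:
    """True when there are fewer available values than variables.

    This is a sufficient condition for unsat (pigeonhole principle),
    not a complete allDifferent consistency check.
    """
    values: Set[int] = set().union(*domains)
    return len(values) < len(domains)


# Index variables i, j, k all range over 1..2
index_domains = [{1, 2}, {1, 2}, {1, 2}]
unsat = all_different_infeasible(index_domains)  # True: 2 values, 3 variables
```

With only two index variables over 1..2 the same check returns False, which matches the fact that the corresponding two-select formula is satisfiable.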
Communication framework
Communication between fd and cc can be costly
  especially, checking (dis-)equalities of variables through their domains, |V|² pairs to be checked
How to tame communication costs?
  a communication policy allowing tight control over expensive communications
  a reduction of the number of pairs of variables to consider (critical pairs)
Other: labelling is only transmitted to fd
Communication framework (2)
Communication policy
  cheap communications (cc → fd) made asynchronously
  expensive ones (fd → cc) made on request (supervisor)
Critical pairs
  focus on pairs whose (dis-)equality will surely lead to new deductions in cc [see axioms]
  the set of critical pairs is defined by
    a. ∀ v ≙ select(T, i) and v′ ≙ select(T, j), pairs (v, v′) and (i, j)
    b. ∀ v ≙ select(store(T, i, e), j), pairs (i, j) and (e, v)
  still quadratic (in #select)
  another reduction: focus only on type (b.)
    linear in #select, captures the specificity of array axioms
    manageable in practice, still brings interesting deductions
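The two families of critical pairs can be enumerated directly from the select and store terms of a formula. The sketch below is an illustration under stated assumptions: terms are given as flat records with string names, and the classes Select and Store and the functions pairs_type_a and pairs_type_b are hypothetical names, not part of the prototype.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Set, Tuple

Pair = Tuple[str, str]


@dataclass(frozen=True)
class Select:
    result: str  # v in v = select(array, index)
    array: str   # name of the array term being read
    index: str


@dataclass(frozen=True)
class Store:
    name: str    # name given to the term store(base, index, value)
    base: str
    index: str
    value: str


def pairs_type_a(selects: List[Select]) -> Set[Pair]:
    """Type (a): two reads of the same array; quadratic in #select."""
    pairs: Set[Pair] = set()
    for s1, s2 in combinations(selects, 2):
        if s1.array == s2.array:
            pairs.add((s1.result, s2.result))
            pairs.add((s1.index, s2.index))
    return pairs


def pairs_type_b(selects: List[Select], stores: List[Store]) -> Set[Pair]:
    """Type (b): a read on top of a store; linear in #select."""
    by_name = {st.name: st for st in stores}
    pairs: Set[Pair] = set()
    for s in selects:
        st = by_name.get(s.array)
        if st is not None:
            pairs.add((st.index, s.index))
            pairs.add((st.value, s.result))
    return pairs


# e = select(T, i), f = select(T, j)
type_a = pairs_type_a([Select("e", "T", "i"), Select("f", "T", "j")])

# v = select(store(store(T, j, a), i, b), j)
stores = [Store("T1", "T", "j", "a"), Store("T2", "T1", "i", "b")]
type_b = pairs_type_b([Select("v", "T2", "j")], stores)
```

Restricting the supervisor to type (b) pairs means one dictionary lookup per select term, which is where the linear bound comes from in this sketch.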
Communication framework (3)
Implementation
implemented in SICStus Prolog (≈ 1.7 KLOC)
FD solver
  uses the SICStus clpfd library
  adds our own array select and store operations [Charreteur-Botella-Gotlieb 09]
  simple labelling heuristics such as first-fail
cc and supervisor
  we built our own [straightforward implementations]
cliques: reduce search to 3-clique
  clique detection launched when a new disequality is added
Experiments on random instances
Shape of random formulas
  40 variables, 3-6 arrays of size 20, domain = 0..50
  four kinds of formulas (easy / hard array, arith / no arith)
  length 10-60
Properties to be evaluated
  ability to solve as many formulas as possible
  comparison with fd and cc [including overhead]
  comparison with blackbox combinations (hybrid and best)
Experiment 1: evaluates on 369 formulas, balanced in the 4 classes and sat / unsat
Experiment 2: evaluates on 100 formulas for each length between 10 and 60 [performance w.r.t. complexity threshold]
Results
First experiment
  solving power: fdcc solves more formulas than fd, cc, or best
    22 formulas (out of 369) solved only by fdcc
    5x fewer TO than fd and 3x fewer TO than best
  affordable overhead over cc and fd [when they succeed]
    at worst 4x slower, on average 1.1x - 1.5x slower
  robustness: results hold for all 4 classes and sat / unsat
Second experiment
  fdcc again better than fd and cc
  maximal benefits on hard-to-solve formulas [close to complexity threshold]
Results (2)

Total (369)
          S     U    TO      T
cc       29   115   225  13545
fd      154   151    64   3995
fdcc    181   175    13    957
best    154   175    40   2492
hybrid  154   175    40   2609

AEUF-I (79)
          S     U    TO      T
cc       26    37    16    987
fd       39    26    14    875
fdcc     40    37     2    144
best     39    37     3    202
hybrid   39    37     3    242

AEUF+LIA-I (100)
          S     U    TO      T
cc        1    21    78   4689
fd       50    47     3    199
fdcc     52    48     0     24
best     50    48     2    139
hybrid   50    48     2    159

AEUF-II (90)
          S     U    TO      T
cc        2    30    58   3485
fd       35    18    37   2299
fdcc     51    30     9    635
best     35    30    25   1529
hybrid   35    30    25   1561

AEUF+LIA-II (100)
          S     U    TO      T
cc        0    27    73   4384
fd       30    60    10    622
fdcc     38    60     2    154
best     30    60    10    622
hybrid   30    60    10    647

S: # sat answers, U: # unsat answers, TO: # time-outs (60 sec), T: time in sec.
Results (2)
[Figure: #(solved formulas) for CC, FD and CCFD; #(unsolved formulas) for TO_CC, TO_FD and TO_CCFD; gain with FDCC (series Gain and Miracle); horizontal axis from 10 to 60]
Conclusion
Results
  an original decision procedure for arrays that combines ideas from symbolic reasoning and finite-domain constraint solving
    identify which information should be shared
    propose ways of taming communication cost
  a prototype and encouraging experiments (random instances)
    greater solving power (beats even best)
    low overhead
  easy to adapt for any CP(FD) solver
Future work
  experiments on real-life problems
  extend the approach to handle memory heaps (new, delete)
About VCs
Logical connectors
  ∧: to express paths
  ∨: to embed several paths in one formula [alternative: enumerate them in the verification tool]
  ∀, ∃: advanced preconditions / postconditions / contracts
First-order theories for data types
  basic data types: integers, bitvectors
  collections: arrays, maps
We consider here quantifier-free conjunctive fragments
  interesting by themselves [symbolic execution, test data generation]
  basic block of solvers handling disjunctions and quantifications
Why a dedicated combination framework?
Or: more direct approaches, and why we do not choose them
Standard combination scheme between arrays and CP(FD) [Nelson-Oppen (NO)]
  solving arrays is already NP-hard
  NO is heavy on non-convex theories like arrays or integers
  FD constraints do not fit well into NO assumptions [infinite model]
Remove all store functions by introducing ∨
  CP(FD) not well-adapted for handling case-splits
Simple concurrent black-box combination [first success wins]
  we want to outperform it in solving power
  while still allowing easy re-use of any CP(FD) engine
The two solvers cc and fd
A semi-decision procedure cc for (pure) arrays
  global symbolic reasoning
  polynomial-time (no case-split)
  correct but not complete [may output “maybe”]
  based on the standard congruence closure algorithm
    + rules for array axioms
A CP(FD) solver fd for arrays and FD-constraints
  local domain filtering
  correct [complete with a labelling]
  based on the existing CP(FD) constraints
  re-use of existing CP(FD) solvers through a small API
    is_fd_eq(x,y) and is_fd_diff(x,y)
    if store and select not available, give access to internal domains (set, get, ∪, ∩, ∈, ∅?)
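The two API predicates named on the slide (written here as is_fd_eq and is_fd_diff) can be approximated by pure domain checks. The sketch below is an assumption about their meaning, not the prototype's code: the class name FDDomains is hypothetical, equality is reported only when both domains are the same singleton, and disequality only when the domains are disjoint.

```python
from typing import Dict, Set


class FDDomains:
    """Toy stand-in for the domain store of a CP(FD) solver."""

    def __init__(self, domains: Dict[str, Set[int]]) -> None:
        self.domains = {var: set(dom) for var, dom in domains.items()}

    def is_fd_eq(self, x: str, y: str) -> bool:
        """x = y is certain when both variables are fixed to the same value."""
        dx, dy = self.domains[x], self.domains[y]
        return len(dx) == 1 and dx == dy

    def is_fd_diff(self, x: str, y: str) -> bool:
        """x ≠ y is certain when the two domains share no value."""
        return self.domains[x].isdisjoint(self.domains[y])


# Domains of the second example: i in 1..5, j in 6..10
fd = FDDomains({"i": set(range(1, 6)), "j": set(range(6, 11)), "k": {3}, "m": {3}})
i_differs_from_j = fd.is_fd_diff("i", "j")  # True: disjoint domains
k_equals_m = fd.is_fd_eq("k", "m")          # True: both fixed to 3
```

Both checks are sound but incomplete: a False answer means "not known from the domains", which is why the supervisor only queries them on request.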
Details: cc for arrays
Based on the standard congruence closure algorithm [union-find]
  each equivalence class has a witness
  each variable has a parent, the “higher parent” is the witness
  basic operations: witness(var) and union(var1,var2)
  clever implementations in O(n) [ranking, path compression]
Extensions for arrays
  (FC-1)  i = j → select(T, i) = select(T, j)
  (FC-2)  select(T, i) ≠ select(T, j) → i ≠ j
  (RoW-1) i = j → select(store(T, i, e), j) = e
  (RoW-2) i ≠ j → select(store(T, i, e), j) = select(T, j)
  (RoW-3) select(store(T, i, e), j) ≠ e → i ≠ j
FC handled by congruence closure [standard extension]
RoW-1 and RoW-3 handled through reduction to FC
  add select(store(T, i, e), i) = e for each store(T, i, e)
  then rely on FC
RoW-2: mechanism of delayed evaluation
  for each select(store(T, i, e), j), put (T, i, e, j) in a watch list
  when i ≠ j is proved, deduce select(store(T, i, e), j) = select(T, j)
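The union-find core and the RoW-2 watch list can be sketched together. This is a minimal illustration, assuming terms are plain string names and that each watch entry already names the two select terms involved; the class CongruenceClosure and its method names other than witness and union are hypothetical, and congruence propagation over function terms (the FC rules) is deliberately left out.

```python
from typing import Dict, List, Tuple


class CongruenceClosure:
    """Union-find over term names, with disequalities and a RoW-2 watch list."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}
        self.rank: Dict[str, int] = {}
        self.diseqs: List[Tuple[str, str]] = []
        # (i, j, read, fallback): read names select(store(T,i,e),j),
        # fallback names select(T,j)
        self.watch: List[Tuple[str, str, str, str]] = []

    def witness(self, var: str) -> str:
        self.parent.setdefault(var, var)
        self.rank.setdefault(var, 0)
        root = var
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[var] != root:  # path compression
            nxt = self.parent[var]
            self.parent[var] = root
            var = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.witness(a), self.witness(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:  # union by rank
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1

    def differ(self, a: str, b: str) -> bool:
        ra, rb = self.witness(a), self.witness(b)
        return ra != rb and any(
            {self.witness(x), self.witness(y)} == {ra, rb} for x, y in self.diseqs)

    def assert_eq(self, a: str, b: str) -> None:
        self.union(a, b)
        self._fire_watches()

    def assert_diff(self, a: str, b: str) -> None:
        self.diseqs.append((a, b))
        self._fire_watches()

    def watch_row2(self, i: str, j: str, read: str, fallback: str) -> None:
        self.watch.append((i, j, read, fallback))
        self._fire_watches()

    def _fire_watches(self) -> None:
        changed = True
        while changed:  # a merge may enable further watches
            changed = False
            for entry in list(self.watch):
                i, j, read, fallback = entry
                if self.differ(i, j):  # RoW-2 now applies
                    self.watch.remove(entry)
                    self.union(read, fallback)
                    changed = True

    def inconsistent(self) -> bool:
        return any(self.witness(x) == self.witness(y) for x, y in self.diseqs)


# a ≠ r with r = select(store(store(T,j,a),i,b),j) and s = select(store(T,j,a),j)
cc = CongruenceClosure()
cc.assert_diff("a", "r")
cc.assert_eq("s", "a")              # added fact: select(store(T,j,a),j) = a
cc.watch_row2("i", "j", "r", "s")   # delayed RoW-2 on the outer store
before = cc.inconsistent()          # False: cc alone cannot conclude
cc.assert_diff("i", "j")            # disequality reported by fd
after = cc.inconsistent()           # True: r = s = a contradicts a ≠ r
```

The usage lines replay the second example of the talk: nothing happens until the disequality i ≠ j arrives from the fd side, at which point the watched entry fires and the contradiction a ≠ a is detected.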