FDCC: a Combined Approach for Solving Constraints over Finite Domains and Arrays

Sébastien Bardin (1), Arnaud Gotlieb (2)
(1) CEA LIST (Paris, France)
(2) INRIA (Rennes, France) - Certus V&V Center, Simula (Oslo, Norway)

CPAIOR 2012

Bardin, S., Gotlieb, A.


Overview
- Goal: an efficient CP(FD) approach for array+FD constraints
  - go beyond standard filtering-based techniques (element)
  - motivation: software verification
- Approach: combine global symbolic deduction mechanisms with local filtering, in order to achieve better deductive power than either technique taken in isolation
- Results:
  - an original “greybox” combination for array+FD constraints
    - identify which information should be shared
    - propose ways of taming communication cost
  - a prototype and encouraging experiments (random instances)
    - greater solving power (beats a perfect blackbox combination)
    - low overhead
  - easy to adapt to any CP(FD) solver (small API)

Motivations
int foo (int a, int b, int c)
  // precondition(a,b,c)
  int tmp, result;
  tmp = a+b;
  if (tmp […]
[…] c iff foo(a,b,c) goes through else-path
[…] c ∧ ¬Ψpost(a, b, c, c)



Motivations (2)
- Constraint resolution has become prominent in formal verification, especially software verification
- It underlies several approaches, for test generation as well as invariant computation [abstract model checking, bounded model checking] [symbolic execution, weakest precondition calculus]
- Verification reduces to solving Verification Conditions (VCs)
- We consider quantifier-free conjunctive fragments
  - interesting by themselves [symbolic execution, test data generation]
  - a basic building block of solvers handling disjunctions and quantifiers


CP(FD) and Verification
- Most verification techniques are based on SMT
- Yet CP(FD) is a natural and interesting alternative, since basic data types naturally range over finite domains
- Potentially interesting for
  - bounded (non-linear) integer arithmetic
  - modular arithmetic [Gotlieb-Leconte-Marre 10]
  - bitvectors [Bardin-Herrmann-Perroud 10]
  - floating-point arithmetic [Botella-Gotlieb-Michel 06]
- A few CP-based verification tools exist [+ encouraging case studies]
  - Inka [Gotlieb-Botella-Rueher 00], GATeL [Marre-Blanc 05]
  - Osmose [Bardin-Herrmann 08], Jaut [Charreteur-Botella-Gotlieb 09]
- But CP(FD) lacks efficient handling of array constraints

The theory of arrays
The standard theory of arrays is defined by:
- three sorts: arrays A, elements of arrays E, indexes I
- function select(T, i): A × I → E
- function store(T, i, e): A × I × E → A
- = and ≠ over E and I
Semantics (read-over-write):
(FC)    i = j ⟶ select(T, i) = select(T, j)
(RoW-1) i = j ⟶ select(store(T, i, e), j) = e
(RoW-2) i ≠ j ⟶ select(store(T, i, e), j) = select(T, j)
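To make the read-over-write semantics concrete, here is a minimal Python sketch (an illustration, not part of FDCC) that models an array as an immutable mapping from indexes to elements; `select`, `store` and the `Array` alias are hypothetical names mirroring the theory's function symbols.

```python
from typing import Any, Dict

# Hypothetical sketch: an array is a mapping that store() never mutates.
Array = Dict[int, Any]

def select(t: Array, i: int) -> Any:
    """select(T, i): the element of T at index i."""
    return t[i]

def store(t: Array, i: int, e: Any) -> Array:
    """store(T, i, e): a new array equal to T except at index i, which holds e."""
    updated = dict(t)  # copy, so that T itself is left unchanged
    updated[i] = e
    return updated
```

With these definitions the axioms can be checked on concrete values: after `T2 = store(T, 1, "z")`, `select(T2, 1)` is `"z"` (RoW-1) and `select(T2, 2) == select(T, 2)` (RoW-2). The solvers discussed here reason over unknown indexes rather than concrete ones, which is where the difficulty lies.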

The theory of arrays (2)
Why does array theory matter so much in verification?
- for modelling arrays and vectors [of course!]
- as a basis for more advanced containers
  - maps, hash tables
  - memory heap
A few remarks about the theory
- no constraint on array size or on the domains of indexes / elements [need to combine with constraints on E and I]
- no equality / disequality between arrays
- yet difficult to solve [NP-hard for the ∧-fragment]

CP and arrays: local filtering
- arrays represented by pairs (index, element) [explicit arrays of logical variables]
- constraints on the domains of indexes / elements (and size)
- select: the well-known constraint element [Van Hentenryck-Carillon 88, Brand 01]
- store: more recent work [Charreteur-Botella-Gotlieb 09]

Element(ARRAY, I, E) :
  ( integer(I) ?
      ARRAY[I] == E, success
  ;
      D(E) ← D(E) ∩ ⋃_{i ∈ D(I)} D(ARRAY[i]),
      D(I) ← { i ∈ D(I) | D(E) ∩ D(ARRAY[i]) ≠ ∅ },
      wait(...)
  )
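The non-ground branch of Element can be sketched in Python, assuming domains are explicit sets of integers and array positions are 0-based list indexes (both assumptions; `filter_element` is a hypothetical name). It performs the two narrowing steps in the order shown; the range check on D(I) is an addition of this sketch.

```python
from typing import List, Set, Tuple

def filter_element(array_doms: List[Set[int]], dom_i: Set[int],
                   dom_e: Set[int]) -> Tuple[Set[int], Set[int]]:
    """One pass of the non-ground branch of Element(ARRAY, I, E).

    Hypothetical sketch: array_doms[k] = D(ARRAY[k]), dom_i = D(I), dom_e = D(E).
    Returns the narrowed (D(I), D(E)); an empty set means failure.
    """
    # Assumption of this sketch: only 0-based, in-range positions are valid.
    in_range = {i for i in dom_i if 0 <= i < len(array_doms)}
    # D(E) <- D(E) ∩ union of D(ARRAY[i]) over i in D(I)
    union: Set[int] = set()
    for i in in_range:
        union |= array_doms[i]
    new_e = dom_e & union
    # D(I) <- { i in D(I) | D(E) ∩ D(ARRAY[i]) != ∅ }
    new_i = {i for i in in_range if new_e & array_doms[i]}
    return new_i, new_e
```

For instance, with cell domains `[{1, 2}, {3}, {4, 5}]`, D(I) = {0, 1, 2} and D(E) = {2, 3, 9}, one pass yields D(I) = {0, 1} and D(E) = {2, 3}. A second pass changes nothing, because every value kept in D(E) lies in the domain of some surviving cell.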


Update(A, I, E, A') :
  ( integer(I) ?
      A'[I] == E,
      ∀ k ≠ I do A'[k] == A[k],
      success
  ;
      D(E) ← D(E) ∩ ⋃_{i ∈ D(I)} D(A'[i]),
      D(I) ← { i ∈ D(I) | D(E) ∩ D(A'[i]) ≠ ∅ },
      ∀ k ∉ D(I) do A'[k] == A[k],
      ∀ k ∈ D(I) do D(A'[k]) ← D(A'[k]) ∩ (D(A[k]) ∪ D(E)),
      ...
  )
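A similar sketch for the non-ground branch of Update, under the same assumptions (explicit set domains, 0-based positions, hypothetical name `filter_update`). The equality A'[k] == A[k] posted for k ∉ D(I) is approximated here by intersecting the two domains once; the rules elided as "..." are not reproduced.

```python
from typing import List, Set, Tuple

def filter_update(a: List[Set[int]], a2: List[Set[int]], dom_i: Set[int],
                  dom_e: Set[int]
                  ) -> Tuple[Set[int], Set[int], List[Set[int]], List[Set[int]]]:
    """One pass of the non-ground branch of Update(A, I, E, A'), A' = store(A, I, E).

    Hypothetical sketch: a[k] = D(A[k]), a2[k] = D(A'[k]).
    Returns the narrowed D(I), D(E), D(A[.]) and D(A'[.]).
    """
    n = len(a2)
    in_range = {i for i in dom_i if 0 <= i < n}
    # D(E) <- D(E) ∩ union of D(A'[i]) over i in D(I)
    union: Set[int] = set()
    for i in in_range:
        union |= a2[i]
    new_e = dom_e & union
    # D(I) <- { i in D(I) | D(E) ∩ D(A'[i]) != ∅ }
    new_i = {i for i in in_range if new_e & a2[i]}
    new_a, new_a2 = list(a), list(a2)
    for k in range(n):
        if k in new_i:
            # k may be the updated cell: A'[k] is either A[k] or E
            new_a2[k] = a2[k] & (a[k] | new_e)
        else:
            # k cannot be the updated cell: A'[k] == A[k], one shared domain
            both = a[k] & a2[k]
            new_a[k], new_a2[k] = both, set(both)
    return new_i, new_e, new_a, new_a2
```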

CP and arrays: local filtering (2)
- Fine for “simple” array constraints
  - either small arrays or very few updates
  - fixed-value indexes (or at least no wide-domain indexes)
- Insufficient for many array constraints from program verification
  - large arrays, many updates, (wide-range) variable indexes [see formulas from SMT-LIB]

e = select(T, i) ∧ f = select(T, j) ∧ e ≠ f ∧ i = j, with T an array of size 100 and domains 0..100
- fd ×: needs labelling [no answer in 60 min in COMET]


i ∈ 1..5 ∧ j ∈ 6..10 ∧ a ≠ select(store(store(T, j, a), i, b), j)
- fd ×: needs labelling; cannot establish select(store(T, j, a), j) = a


Our approach


Examples
e = select(T, i) ∧ f = select(T, j) ∧ e ≠ f ∧ i = j, with T an array of size 100 and domains 0..100
- cc ✓: unsat quickly (axiom FC)
- fd ×: needs labelling [no answer in 60 min in COMET]
- fdcc ✓: unsat quickly through cc


i ∈ 1..5 ∧ j ∈ 6..10 ∧ a ≠ select(store(store(T, j, a), i, b), j)
- cc ×: no deduction, since i ≠ j cannot be inferred
- fd ×: needs labelling; cannot establish select(store(T, j, a), j) = a
- fdcc ✓: fd deduces i ≠ j (domain check), so cc can deduce a ≠ select(store(T, j, a), j) (RoW-2), then a ≠ a (RoW-1), hence unsat
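The fdcc deduction chain for this example can be sketched in Python. The term encoding and the helpers `fd_entails_diff` and `simplify_read` are hypothetical; RoW-1 fires only when the two index names are identical, whereas a real cc would compare the witnesses of their equivalence classes.

```python
from typing import Dict, Set, Union

# Hypothetical encoding: a variable name, ("select", A, j) or ("store", A, i, e).
Term = Union[str, tuple]

def fd_entails_diff(doms: Dict[str, Set[int]], x: str, y: str) -> bool:
    """fd-side domain check: x != y holds when D(x) and D(y) are disjoint."""
    return not (doms[x] & doms[y])

def simplify_read(t: Term, doms: Dict[str, Set[int]]) -> Term:
    """cc-side rewriting of select(store(A, i, e), j) with RoW-1 / RoW-2."""
    if (isinstance(t, tuple) and t[0] == "select"
            and isinstance(t[1], tuple) and t[1][0] == "store"):
        _, (_, inner, i, e), j = t
        if i == j:                       # RoW-1 (syntactic equality only)
            return e
        if fd_entails_diff(doms, i, j):  # RoW-2, enabled by the fd deduction
            return simplify_read(("select", inner, j), doms)
    return t  # no rule applies: the read is left as-is
```

With D(i) = 1..5 and D(j) = 6..10, the read `("select", ("store", ("store", "T", "j", "a"), "i", "b"), "j")` simplifies to `"a"`, so the constraint becomes a ≠ a and is unsat. With overlapping domains the domain check fails and the term is returned unchanged, mirroring the cc-only situation.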


e = select(T, i) ∧ f = select(T, j) ∧ g = select(T, k) ∧ e ≠ f ∧ e ≠ g ∧ f ≠ g, with T an array of size 2 and index domain 1..2
- cc ×: deduces allDifferent(i, j, k), but does not output unsat (domains are not taken into account)
- fd ×: needs labelling [labels indexes first!]
- fdcc ✓: cc deduces allDifferent(i, j, k), then fd deduces unsat
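The two steps of the fdcc run can be sketched as follows: FC-2 turns the three element disequalities into index disequalities, and a pigeonhole test (three pairwise-distinct variables, only two values) refutes allDifferent(i, j, k). The names are hypothetical, and the pigeonhole test is a deliberately weak stand-in for the allDifferent filtering of a real FD solver.

```python
from typing import Dict, List, Set, Tuple

def fc2_index_diseqs(selects: Dict[str, Tuple[str, str]],
                     diseqs: List[Tuple[str, str]]) -> Set[frozenset]:
    """cc side, rule FC-2: select(T, i) != select(T, j) implies i != j.

    Hypothetical sketch: selects[v] = (T, i) encodes v = select(T, i);
    diseqs lists the known v != w facts.
    """
    out: Set[frozenset] = set()
    for v, w in diseqs:
        (t1, i), (t2, j) = selects[v], selects[w]
        if t1 == t2:  # FC-2 only relates reads of the same array
            out.add(frozenset((i, j)))
    return out

def alldiff_pigeonhole_fails(xs: List[str], doms: Dict[str, Set[int]]) -> bool:
    """fd side, weak check: pairwise-distinct variables need enough values."""
    values: Set[int] = set()
    for x in xs:
        values |= doms[x]
    return len(values) < len(xs)
```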


Communication framework
- Communication between fd and cc can be costly, especially checking (dis-)equalities of variables through their domains: |V|² pairs to be checked
- How to tame communication costs?
  - a communication policy allowing tight control over expensive communications
  - a reduction of the number of pairs of variables to consider (critical pairs)
- Other: labelling is only transmitted to fd

Communication framework (2)
- Communication policy
  - cheap communications (cc → fd) are made asynchronously
  - expensive ones (fd → cc) are made on request (supervisor)
- Critical pairs
  - focus on pairs whose (dis-)equality will surely lead to new deductions in cc [see axioms]
  - the set of critical pairs is defined by
    a. ∀ v = select(T, i) and v′ = select(T, j): pairs (v, v′) and (i, j)
    b. ∀ v = select(store(T, i, e), j): pairs (i, j) and (e, v)
  - still quadratic (in #select)
  - another reduction: focus only on type (b.)
    - linear in #select, captures the specificity of array axioms
    - manageable in practice, still brings interesting deductions
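The definition of critical pairs can be computed directly from the select definitions. In this hypothetical Python sketch, `defs` maps each variable v to (A, j) when v = select(A, j); type (a) compares every pair of reads on the same array, which is why it stays quadratic in #select, while type (b) adds at most two pairs per read over a store.

```python
from itertools import combinations
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: an array term is a name "T" or ("store", T, i, e).
ArrayTerm = Union[str, tuple]

def critical_pairs(defs: Dict[str, Tuple[ArrayTerm, str]],
                   type_b_only: bool = False) -> Set[frozenset]:
    """Critical pairs for defs, where defs[v] = (A, j) encodes v = select(A, j)."""
    pairs: Set[frozenset] = set()

    def add(x: str, y: str) -> None:
        if x != y:  # a pair of identical names carries no information
            pairs.add(frozenset((x, y)))

    if not type_b_only:
        # (a) v = select(T, i), v' = select(T, j): quadratic in #select
        for (v, (t1, i)), (w, (t2, j)) in combinations(defs.items(), 2):
            if t1 == t2:
                add(v, w)
                add(i, j)
    # (b) v = select(store(T, i, e), j): linear in #select
    for v, (arr, j) in defs.items():
        if isinstance(arr, tuple) and arr[0] == "store":
            _, _base, i, e = arr
            add(i, j)
            add(e, v)
    return pairs
```

On three reads of T plus one read of store(T, i, e), the full definition yields 8 pairs, of which only the 2 type-(b) pairs remain after the reduction; a naive scan of all variable pairs would instead grow with |V|².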

Communication framework (3)


Implementation
- implemented in SICStus Prolog (≈ 1.7 KLOC)
- FD solver
  - uses the SICStus clpfd library
  - adds our own array select and store operations [Charreteur-Botella-Gotlieb 09]
  - simple labelling heuristics such as first-fail
- cc and supervisor
  - built our own [straightforward implementations]
- cliques
  - search reduced to 3-cliques
  - clique detection launched whenever a new disequality is added
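Clique detection restricted to 3-cliques can be sketched as a triangle check performed each time a disequality is added. This assumes the cliques are taken in the graph of known disequalities and that each triangle {x, y, z} is handed to fd as allDifferent(x, y, z), as in the third example; `DiseqGraph` and `add_diseq` are hypothetical names.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

class DiseqGraph:
    """Hypothetical sketch: an edge x-y records that x != y is known."""

    def __init__(self) -> None:
        self.adj: DefaultDict[str, Set[str]] = defaultdict(set)

    def add_diseq(self, x: str, y: str) -> List[Tuple[str, ...]]:
        """Record x != y and return the new 3-cliques (triangles) it closes.

        Each triangle is a candidate allDifferent constraint for fd.
        """
        if x == y or y in self.adj[x]:
            # x != x is a conflict (not modelled here); a known edge adds nothing
            return []
        self.adj[x].add(y)
        self.adj[y].add(x)
        # z closes a triangle when x != z and y != z are both already known
        common = self.adj[x] & self.adj[y]
        return [tuple(sorted((x, y, z))) for z in sorted(common)]
```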

Experiments on random instances
- Shape of random formulas
  - 40 variables, 3-6 arrays of size 20, domain = 0..50
  - four kinds of formulas (easy / hard array, arith / no arith)
  - length 10-60
- Properties to be evaluated
  - ability to solve as many formulas as possible
  - comparison with fd and cc [including overhead]
  - comparison with blackbox combinations (hybrid and best)
- Experiment 1: evaluation on 369 formulas, balanced across the 4 classes and sat / unsat
- Experiment 2: evaluation on 100 formulas for each length between 10 and 60 [performance w.r.t. the complexity threshold]

Results
First experiment
- solving power: solves more formulas than fd, cc, or best
  - 22 formulas (out of 369) solved only by fdcc
  - 5x fewer TO than fd and 3x fewer TO than best
- affordable overhead over cc and fd [when they succeed]
  - at worst 4x slower, on average 1.1x - 1.5x slower
- robustness: results hold for all 4 classes and for sat / unsat
Second experiment
- fdcc again better than fd and cc
- maximal benefits on hard-to-solve formulas [close to the complexity threshold]

Results (2)

Total (369 formulas)
        cc      fd      fdcc    best    hybrid
S       29      154     181     154     154
U       115     151     175     175     175
TO      225     64      13      40      40
T       13545   3995    957     2492    2609

S: # sat answers, U: # unsat answers, TO: # time-outs (60 sec), T: time in sec.
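The time-out ratios quoted for the first experiment can be re-derived from this table with a few lines of Python; the numbers are copied from the table and nothing else is assumed (`TOTALS`, `solved` and `timeout_ratio` are illustrative names).

```python
# Totals copied from the table: (S, U, TO, T in seconds) per solver.
TOTALS = {
    "cc":     (29, 115, 225, 13545),
    "fd":     (154, 151, 64, 3995),
    "fdcc":   (181, 175, 13, 957),
    "best":   (154, 175, 40, 2492),
    "hybrid": (154, 175, 40, 2609),
}

def solved(name: str) -> int:
    """Number of formulas answered (sat or unsat) by a solver."""
    s, u, _to, _t = TOTALS[name]
    return s + u

def timeout_ratio(name: str, ref: str = "fdcc") -> float:
    """How many times more time-outs `name` has than `ref`."""
    return TOTALS[name][2] / TOTALS[ref][2]

if __name__ == "__main__":
    for s, u, to, _t in TOTALS.values():
        assert s + u + to == 369  # every formula is sat, unsat or a time-out
    print(f"fd/fdcc TO ratio:   {timeout_ratio('fd'):.1f}")    # 4.9
    print(f"best/fdcc TO ratio: {timeout_ratio('best'):.1f}")  # 3.1
```

The fd/fdcc ratio is 64/13 ≈ 4.9 and the best/fdcc ratio is 40/13 ≈ 3.1, consistent with the "5x" and "3x" fewer time-outs reported in the results summary.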

Per class
Class (#formulas)       cc     fd     fdcc   best   hybrid
AEUF-I (79)        S    26     39     40     39     39
                   U    37     26     37     37     37
                   TO   16     14     2      3      3
                   T    987    875    144    202    242
AEUF+LIA-I (100)   S    1      50     52     50     50
                   U    21     47     48     48     48
                   TO   78     3      0      2      2
                   T    4689   199    24     139    159
AEUF-II (90)       S    2      35     51     35     35
                   U    30     18     30     30     30
                   TO   58     37     9      25     25
                   T    3485   2299   635    1529   1561
AEUF+LIA-II (100)  S    0      30     38     30     30
                   U    27     60     60     60     60
                   TO   73     10     2      10     10
                   T    4384   622    154    622    647

[Figure: #(solved formulas) for CCFD, CC and FD and #(unsolved formulas) for TO_CCFD, TO_FD and TO_CC, plotted against formula length (10 to 60); a second panel compares "Gain with FDCC" with the "Miracle" gain]

Conclusion
Results
- an original decision procedure for arrays that combines ideas from symbolic reasoning and finite-domain constraint solving
  - identify which information should be shared
  - propose ways of taming communication cost
- a prototype and encouraging experiments (random instances)
  - greater solving power (beats even best)
  - low overhead
- easy to adapt to any CP(FD) solver
Future work
- experiments on real-life problems
- extend the approach to handle memory heaps (new, delete)

About VCs
- Logical connectors
  - ∧: to express paths
  - ∨: to embed several paths in one formula [alternative: enumerate them in the verification tool]
  - ∀, ∃: advanced preconditions / postconditions / contracts
- First-order theories for data types
  - basic data types: integers, bitvectors
  - collections: arrays, maps
- We consider here quantifier-free conjunctive fragments
  - interesting by themselves [symbolic execution, test data generation]
  - a basic building block of solvers handling disjunctions and quantifiers

Why a dedicated combination framework?
Or: more direct approaches, and why we do not choose them
- Standard combination scheme between arrays and CP(FD) [Nelson-Oppen (NO)]
  - solving arrays is already NP-hard
  - NO is heavy on non-convex theories like arrays or integers
  - FD constraints do not fit well into NO assumptions [infinite model]
- Remove all store functions by introducing ∨
  - CP(FD) is not well adapted to handling case splits
- Simple concurrent black-box combination [first success wins]
  - we want to outperform it in solving power while still allowing easy re-use of any CP(FD) engine
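To see why eliminating store by introducing ∨ is costly, the following Python sketch (hypothetical `expand_select`, with a hypothetical tuple encoding of terms) unfolds a read over a chain of stores into guarded cases using RoW-1 / RoW-2. Each store in the chain adds one case.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: an array term is a name "T" or ("store", A, i, e).
Term = Union[str, tuple]
Case = Tuple[List[str], str]  # (guard literals, value of the read)

def expand_select(arr: Term, j: str) -> List[Case]:
    """Eliminate stores from select(arr, j) by case analysis.

    select(store(A, i, e), j) = e              if i = j   (RoW-1)
                              = select(A, j)   if i != j  (RoW-2)
    """
    if isinstance(arr, tuple) and arr[0] == "store":
        _, inner, i, e = arr
        hit: Case = ([f"{i} = {j}"], e)
        misses = [([f"{i} != {j}"] + guard, val)
                  for guard, val in expand_select(inner, j)]
        return [hit] + misses
    return [([], f"select({arr}, {j})")]  # base array: no store left
```

For select(store(store(T, i1, e1), i2, e2), j) this gives 3 cases; a conjunction of m such reads over chains of n stores may require up to (n+1)^m case combinations, which is the kind of case splitting CP(FD) handles poorly.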

The two solvers cc and fd
- A semi-decision procedure cc for (pure) arrays
  - global symbolic reasoning
  - polynomial-time (no case split)
  - correct but not complete [may output “maybe”]
  - based on the standard congruence closure algorithm
    - + rules for array axioms
- A CP(FD) solver fd for arrays and FD constraints
  - local domain filtering
  - correct [complete with labelling]
  - based on existing CP(FD) constraints
  - re-use of existing CP(FD) solvers through a small API
    - is_fd_eq(x,y) and is_fd_diff(x,y)
    - if store and select are not available, access to internal domains (set, get, ∪, ∩, ∈, ∅?)
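The small API can be pictured as an abstract interface with the two entailment queries. The class names and the set-based toy implementation below are hypothetical; a real adapter would query the host solver's internal domains, e.g. through the set / get / ∪ / ∩ operations listed above.

```python
from abc import ABC, abstractmethod
from typing import Dict, Set

class FDSolverAPI(ABC):
    """Hypothetical interface: the two queries cc asks the FD side."""

    @abstractmethod
    def is_fd_eq(self, x: str, y: str) -> bool:
        """True when the FD solver entails x = y."""

    @abstractmethod
    def is_fd_diff(self, x: str, y: str) -> bool:
        """True when the FD solver entails x != y."""

class SetDomainSolver(FDSolverAPI):
    """Toy implementation over explicit finite domains (sets of ints)."""

    def __init__(self, doms: Dict[str, Set[int]]) -> None:
        self.doms = doms

    def is_fd_eq(self, x: str, y: str) -> bool:
        # Entailed only when both variables are fixed to the same value.
        dx, dy = self.doms[x], self.doms[y]
        return len(dx) == 1 and dx == dy

    def is_fd_diff(self, x: str, y: str) -> bool:
        # Disjoint domains: the two variables can never take the same value.
        return not (self.doms[x] & self.doms[y])
```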

Details: cc for arrays
Based on the standard congruence closure algorithm [union-find]:
- each equivalence class has a witness
- each variable has a parent; the topmost ancestor (root) is the witness
- basic operations: witness(var) and union(var1, var2)
- clever near-linear implementations, O(n·α(n)) for n operations [union by rank, path compression]
Extensions for arrays:
(FC-1)  i = j ⟶ select(T, i) = select(T, j)
(FC-2)  select(T, i) ≠ select(T, j) ⟶ i ≠ j
(RoW-1) i = j ⟶ select(store(T, i, e), j) = e
(RoW-2) i ≠ j ⟶ select(store(T, i, e), j) = select(T, j)
(RoW-3) select(store(T, i, e), j) ≠ e ⟶ i ≠ j


FC is handled by congruence closure [standard extension].
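A compact Python sketch of the union-find core with rule FC-1 follows (hypothetical class `CongruenceClosure`; FC-2 and the array rules are omitted, and FC-1 is re-checked by a naive quadratic scan where a real implementation would use signature tables). The test replays the first example: once i = j is merged, FC-1 merges e and f, which contradicts e ≠ f.

```python
from typing import Dict, List, Tuple

class CongruenceClosure:
    """Hypothetical sketch: union-find over names plus rule FC-1 on selects."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}
        self.rank: Dict[str, int] = {}
        self.selects: List[Tuple[str, str, str]] = []  # v = select(T, i) as (v, T, i)
        self.diseqs: List[Tuple[str, str]] = []

    def witness(self, x: str) -> str:
        """Representative of x's class, with path compression."""
        self.parent.setdefault(x, x)
        self.rank.setdefault(x, 0)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while x != root:  # hang every visited node directly on the root
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, x: str, y: str) -> None:
        """Merge the classes of x and y (union by rank), then re-apply FC-1."""
        rx, ry = self.witness(x), self.witness(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        self._propagate_fc()

    def add_select(self, v: str, t: str, i: str) -> None:
        self.selects.append((v, t, i))
        self._propagate_fc()

    def add_diseq(self, x: str, y: str) -> None:
        self.diseqs.append((x, y))

    def is_unsat(self) -> bool:
        """Conflict: an asserted disequality relates two merged terms."""
        return any(self.witness(x) == self.witness(y) for x, y in self.diseqs)

    def _propagate_fc(self) -> None:
        # FC-1: i = j -> select(T, i) = select(T, j). Each merge calls union,
        # which re-runs this scan until no new merge is found.
        for k, (v, t, i) in enumerate(self.selects):
            for w, t2, j in self.selects[k + 1:]:
                if (t == t2 and self.witness(i) == self.witness(j)
                        and self.witness(v) != self.witness(w)):
                    self.union(v, w)
                    return
```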

RoW-1 and RoW-3 are handled through a reduction to FC: add select(store(T, i, e), i) = e for each store(T, i, e), then rely on FC.
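The reduction amounts to a preprocessing pass that collects every store subterm and emits one extra equality per store. A hypothetical Python sketch, using a tuple encoding of terms (`store_subterms` and `row_reduction_axioms` are illustrative names):

```python
from typing import List, Set, Tuple, Union

# Hypothetical encoding: a variable, ("select", A, j) or ("store", A, i, e).
Term = Union[str, tuple]

def store_subterms(t: Term, acc: Set[tuple]) -> None:
    """Collect every store(...) subterm occurring in t."""
    if isinstance(t, tuple):
        if t[0] == "store":
            acc.add(t)
        for sub in t[1:]:
            store_subterms(sub, acc)

def row_reduction_axioms(terms: List[Term]) -> List[Tuple[Term, Term]]:
    """For each store(T, i, e), emit the equality select(store(T, i, e), i) = e.

    Once these facts are asserted, RoW-1 and RoW-3 follow from FC-1 / FC-2.
    """
    stores: Set[tuple] = set()
    for t in terms:
        store_subterms(t, stores)
    return [(("select", s, s[2]), s[3]) for s in sorted(stores, key=repr)]
```

On select(store(store(T, j, a), i, b), j) it emits select(store(T, j, a), j) = a and select(store(store(T, j, a), i, b), i) = b.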

RoW-2 is handled by a mechanism of delayed evaluation: for each select(store(T, i, e), j), put (T, i, e, j) in a watch list; when i ≠ j is proved, deduce select(store(T, i, e), j) = select(T, j).
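The watch list can be sketched as a map from index pairs {i, j} to pending reads, flushed when i ≠ j is proved. This hypothetical `RoW2Watcher` keys on variable names and plain array names; a real implementation would key on class witnesses and handle nested store terms.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# A watched read select(store(T, i, e), j), stored as (T, i, e, j).
Watch = Tuple[str, str, str, str]

class RoW2Watcher:
    """Hypothetical sketch of delayed RoW-2 evaluation."""

    def __init__(self) -> None:
        self.by_pair: DefaultDict[frozenset, List[Watch]] = defaultdict(list)

    def watch(self, t: str, i: str, e: str, j: str) -> None:
        # Index the pending read by its unordered index pair {i, j}.
        self.by_pair[frozenset((i, j))].append((t, i, e, j))

    def on_diseq(self, x: str, y: str) -> List[Tuple[str, str]]:
        """Called when x != y is proved; returns the equalities now deducible."""
        fired = self.by_pair.pop(frozenset((x, y)), [])
        return [(f"select(store({t}, {i}, {e}), {j})", f"select({t}, {j})")
                for t, i, e, j in fired]
```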
