Certifying cost annotations in compilers
Nicolas Ayache, post-doctoral researcher at PPS, Paris 7
18 November 2010
Presentation The CerCo project
3-year European project (FP7)
Laboratories
- University of Bologna: Claudio Sacerdoti Coen
- University of Edinburgh: Randy Pollack
- University of Paris Diderot (PPS): Roberto Amadio (site leader), Nicolas Ayache (post-doc), Yann Régis-Gianas, Ronan Saillard (Ph.D. student)
Presentation State of the art: Worst Case Execution Time
AbsInt (Abstract Interpretation)
√ Concrete time
X Binary analysis
X User interaction (loop iteration)
X Not formally proven
Linear Logic
√ Formal framework
X Complexity classes
X Verbose
Presentation Goal
Formally sound, cost-annotating compiler
C program −→ CerCo −→ Executable, C program + Cost annotations
√ Overapproximated concrete complexity
Presentation CerCo's approach: the problem
What is the cost of evaluating tab[i]+1? It depends on:
- The final locations of the variables
- The way memory accesses are compiled
- The way operations are compiled
⇒ It depends on the compilation process
Presentation CerCo's approach: solution
Bad solution: consider worst-case scenarios
X Too imprecise
X Optimizations lost
X Not modular
CerCo's solution
Source −→ Labelled source (Labelling) −→ Labelled assembly (Compilation)
ℓ : label, standing for a symbolic cost update
Concrete costs are deduced from the labelled assembly
Presentation CerCo's approach: common considerations
Operational semantics
- Without labels: a relation in P(Prog × State × State)
- With labels: a relation in P(Prog × State × label trace × State)
Annotation semantics
- Annotation = instruction of the source language
- Makes cost annotations explicit
√ Analysis tools for cost synthesis
Demo
Presentation Outline
1 Toy compiler: Languages, Compilation, Labelling, Annotation
2 Realistic C compiler: Architecture, Labelling, Experiments
3 Future work
Toy compiler Overview
Languages
Imp −→ VM −→ ASM
- Imp: simple while language
- VM: virtual machine (stack operations)
- ASM: assembly
In the following...
- How to label the languages?
- How to adapt the proofs?
- How to keep the proofs modular?
Languages
Toy compiler Syntax
Imp
e ::= id | n ∈ N | e + e
b ::= e < e
S ::= skip | id := e | S ; S | if b then S else S | while b do S
P ::= prog S
VM
I ::= cnst(n) | var(id) | setvar(id) | add | branch(k) | bge(k) | halt
P ::= I list
ASM
I ::= loadi R,n | load R,addr | store R,addr | add R,R,R | branch k | bge R,R,k | halt
P ::= I list
Languages
Toy compiler Labelled syntax
Imp_L
e ::= id | n ∈ N | e + e
b ::= e < e
S ::= skip | id := e | S ; S | if b then S else S | while b do S | ℓ : S
P ::= prog S
VM_L
I ::= cnst(n) | var(id) | setvar(id) | add | branch(k) | bge(k) | halt | emit(ℓ)
P ::= I list
ASM_L
I ::= loadi R,n | load R,addr | store R,addr | add R,R,R | branch k | bge R,R,k | halt | emit ℓ
P ::= I list
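The labelled semantics mentioned earlier (a relation over Prog × State × label trace × State) can be illustrated with a small interpreter for labelled Imp. This is only a sketch in Python: the tuple encoding of the syntax, the function names and the use of ASCII names such as "l1" for ℓ1 are assumptions made for illustration, not part of CerCo.

```python
# Hypothetical tuple encoding of labelled Imp (illustrative names only):
#   expressions: ("id", x) | ("num", n) | ("add", e1, e2)
#   conditions:  ("lt", e1, e2)
#   statements:  ("skip",) | ("assign", x, e) | ("seq", s1, s2)
#                | ("if", b, s1, s2) | ("while", b, s) | ("label", l, s)

def eval_expr(e, state):
    tag = e[0]
    if tag == "id":
        return state[e[1]]
    if tag == "num":
        return e[1]
    if tag == "add":
        return eval_expr(e[1], state) + eval_expr(e[2], state)
    raise ValueError(f"unknown expression {e!r}")

def eval_cond(b, state):
    # The grammar has a single form of condition: e < e.
    return eval_expr(b[1], state) < eval_expr(b[2], state)

def run(s, state, trace):
    """Execute s, updating state in place and appending emitted labels to trace."""
    tag = s[0]
    if tag == "skip":
        pass
    elif tag == "assign":
        state[s[1]] = eval_expr(s[2], state)
    elif tag == "seq":
        run(s[1], state, trace)
        run(s[2], state, trace)
    elif tag == "if":
        run(s[2] if eval_cond(s[1], state) else s[3], state, trace)
    elif tag == "while":
        while eval_cond(s[1], state):
            run(s[2], state, trace)
    elif tag == "label":
        trace.append(s[1])  # a label only records itself in the trace
        run(s[2], state, trace)
    else:
        raise ValueError(f"unknown statement {s!r}")

def run_prog(s, initial_state):
    state, trace = dict(initial_state), []
    run(s, state, trace)
    return state, trace

# prog l1 : if n < 1 then l2 : res := 0 else l3 : while res < 5 do l4 : res := res + n
EXAMPLE = ("label", "l1",
    ("if", ("lt", ("id", "n"), ("num", 1)),
        ("label", "l2", ("assign", "res", ("num", 0))),
        ("label", "l3",
            ("while", ("lt", ("id", "res"), ("num", 5)),
                ("label", "l4",
                    ("assign", "res", ("add", ("id", "res"), ("id", "n"))))))))

final_state, label_trace = run_prog(EXAMPLE, {"n": 2, "res": 0})
```

Dropping the `"label"` case and the `trace` argument gives the unlabelled semantics, so the two differ only in the observable trace.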
Compilation
Toy compiler Simulation
Imp −→ VM −→ ASM (Compilation)
Imp_L −→ VM_L −→ ASM_L (Labelled compilation)
Imp −→ Imp_L (Labelling); each labelled language −→ its unlabelled counterpart (Erasure)
Theorem (diagram commutativity)
E_Imp ◦ L_Imp = Id_Imp
C_Imp ◦ E_Imp = E_VM ◦ C^L_Imp
C_VM ◦ E_VM = E_ASM ◦ C^L_VM
⇒ E_ASM ◦ C^L ◦ L_Imp = C
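The first equation of the theorem, E_Imp ◦ L_Imp = Id_Imp, can be checked on examples with a few lines of Python. The sketch below assumes a hypothetical tuple encoding of Imp statements and a simplified labelling policy (one label at the start of the program, one per branch of a conditional, one per loop body); CerCo's actual labelling function may differ.

```python
import itertools

# Hypothetical encoding of statements (illustrative only):
#   ("skip",) | ("assign", x, e) | ("seq", s1, s2)
#   | ("if", b, s1, s2) | ("while", b, s) | ("label", l, s)

def erase(s):
    """E_Imp: remove every label and keep the rest of the statement unchanged."""
    tag = s[0]
    if tag == "label":
        return erase(s[2])
    if tag == "seq":
        return ("seq", erase(s[1]), erase(s[2]))
    if tag == "if":
        return ("if", s[1], erase(s[2]), erase(s[3]))
    if tag == "while":
        return ("while", s[1], erase(s[2]))
    return s  # skip and assignments contain no labels

def label(s, fresh):
    """Put a fresh label on each branch of a conditional and on each loop body."""
    tag = s[0]
    if tag == "seq":
        return ("seq", label(s[1], fresh), label(s[2], fresh))
    if tag == "if":
        l_then, l_else = next(fresh), next(fresh)
        return ("if", s[1],
                ("label", l_then, label(s[2], fresh)),
                ("label", l_else, label(s[3], fresh)))
    if tag == "while":
        return ("while", s[1], ("label", next(fresh), label(s[2], fresh)))
    return s

def label_prog(s):
    """L_Imp (simplified): label the program entry, then label the body."""
    fresh = (f"l{i}" for i in itertools.count(1))
    return ("label", next(fresh), label(s, fresh))

# if n < 1 then res := 0 else while res < 5 do res := res + n
PROGRAM = ("if", ("lt", ("id", "n"), ("num", 1)),
           ("assign", "res", ("num", 0)),
           ("while", ("lt", ("id", "res"), ("num", 5)),
            ("assign", "res", ("add", ("id", "res"), ("id", "n")))))

LABELLED = label_prog(PROGRAM)
ROUND_TRIP = erase(LABELLED)  # expected to be PROGRAM again
```

On this program the simplified policy produces the four labels used in the examples of the labelling slides; the remaining equations of the theorem additionally require the two compilers and are not sketched here.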
Labelling
Toy compiler Example: soundness
prog ℓ1 : if n < 1 then ℓ2 : res := 0 else ℓ3 : while res < 5 do ℓ4 : res := res + n
Compiled code as a control-flow graph (only ℓ1, ℓ2 and ℓ3 are emitted):
  entry:      emit ℓ1; load R0,ln; loadi R1,1; bge R1,R0
  then:       emit ℓ2; loadi R0,0; store R0,lres; branch (→ halt)
  else:       emit ℓ3
  loop test:  loadi R0,5; load R1,lres; bge R0,R1
  loop body:  load R0,lres; load R1,ln; add R0,R0,R1; store R0,lres; branch (→ loop test)
  exit:       halt
Loop with no label ⇒ the code does not have a constant cost
Labelling
Toy compiler Example: precision
prog ℓ1 : if n < 1 then ℓ2 : res := 0 else ℓ3 : while res < 5 do ℓ4 : res := res + n
Compiled code as a control-flow graph (only ℓ1, ℓ2 and ℓ4 are emitted):
  entry:      emit ℓ1; load R0,ln; loadi R1,1; bge R1,R0
  then:       emit ℓ2; loadi R0,0; store R0,lres; branch (→ halt)
  loop test:  loadi R0,5; load R1,lres; bge R0,R1
  loop body:  emit ℓ4; load R0,lres; load R1,ln; add R0,R0,R1; store R0,lres; branch (→ loop test)
  exit:       halt
From emit ℓ1: three paths with different costs ⇒ imprecision
Labelling
Toy compiler Labelling criteria
Soundness
- All reachable code is in the scope of a label.
- There is at least one label inside each loop.
Precision
- Two different paths to the next labels have the same cost.
Criteria
- The labelling must be sound and precise.
√ A nice plus is reasonable economy: not too many labels.
(This can be syntactically checked on the assembly code.)
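The soundness criterion lends itself to a syntactic check on the assembly code: once the emit instructions are removed from the control-flow graph, no cycle may remain. The Python sketch below makes several assumptions that are not in the slides: instructions are tuples, branch targets are absolute indices into the instruction list, execution starts at index 0, and the program is well formed (every fall-through lands on an existing instruction).

```python
def successors(prog, i):
    """Indices of the instructions that may execute right after instruction i."""
    op = prog[i][0]
    if op == "halt":
        return []
    if op == "branch":
        return [prog[i][1]]
    if op == "bge":
        return [i + 1, prog[i][3]]  # fall through, or jump to the target
    return [i + 1]

def is_sound(prog):
    """True if the entry point is labelled and every loop contains an emit."""
    if not prog or prog[0][0] != "emit":
        return False  # the code at the entry point would be in no label's scope
    WHITE, GREY, BLACK = 0, 1, 2
    colour = [WHITE] * len(prog)

    def has_unlabelled_cycle(i):
        # Depth-first search restricted to non-emit instructions.
        colour[i] = GREY
        for j in successors(prog, i):
            if prog[j][0] == "emit":
                continue  # this path is cut by a label
            if colour[j] == GREY:
                return True  # back edge: a loop without any emit
            if colour[j] == WHITE and has_unlabelled_cycle(j):
                return True
        colour[i] = BLACK
        return False

    return not any(
        colour[i] == WHITE and prog[i][0] != "emit" and has_unlabelled_cycle(i)
        for i in range(len(prog)))

# while res < 5 do res := res + n, with a label on the loop body ...
SOUND = [
    ("emit", "l1"),
    ("loadi", "R0", 5), ("load", "R1", "res"), ("bge", "R0", "R1", 10),
    ("emit", "l2"),
    ("load", "R0", "res"), ("load", "R1", "n"), ("add", "R0", "R0", "R1"),
    ("store", "R0", "res"), ("branch", 1),
    ("halt",),
]

# ... and the same loop without that label.
UNSOUND = [
    ("emit", "l1"),
    ("loadi", "R0", 5), ("load", "R1", "res"), ("bge", "R0", "R1", 9),
    ("load", "R0", "res"), ("load", "R1", "n"), ("add", "R0", "R0", "R1"),
    ("store", "R0", "res"), ("branch", 1),
    ("halt",),
]
```

Precision could be checked in the same style by comparing the costs of all paths between consecutive labels, which is omitted here.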
Annotation
Toy compiler Instrumentation
Cost deduction
Given: φ : ASM instruction list → N (on loop-free code; may overapproximate)
Deduced: κ : L → N
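One way to read the cost deduction: κ(ℓ) is the cost of the most expensive path from emit ℓ to the next emit or halt, which is finite as soon as the labelling is sound. The Python sketch below follows that reading under stated assumptions: instructions are tuples with absolute branch targets, labels are unique, and φ is modelled as a per-instruction cost that is summed along a path (a hypothetical unit cost here, not a real processor model).

```python
def successors(prog, i):
    """Indices of the instructions that may execute right after instruction i."""
    op = prog[i][0]
    if op == "halt":
        return []
    if op == "branch":
        return [prog[i][1]]
    if op == "bge":
        return [i + 1, prog[i][3]]
    return [i + 1]

def deduce_costs(prog, phi):
    """Return kappa as a dict: label -> worst-case cost of the code in its scope.

    Terminates only if every loop of prog contains an emit (soundness).
    """
    def cost_from(i):
        # Most expensive path from instruction i up to the next emit or halt.
        if prog[i][0] == "emit":
            return 0  # the scope of the previous label stops here
        following = [cost_from(j) for j in successors(prog, i)]
        return phi(prog[i]) + max(following, default=0)

    return {ins[1]: max((cost_from(j) for j in successors(prog, i)), default=0)
            for i, ins in enumerate(prog) if ins[0] == "emit"}

def unit_cost(instruction):
    # Hypothetical cost model: halt is free, everything else costs 1.
    return 0 if instruction[0] == "halt" else 1

# Labelled assembly for: prog l1 : while res < 5 do l2 : res := res + n
LABELLED_ASM = [
    ("emit", "l1"),
    ("loadi", "R0", 5), ("load", "R1", "res"), ("bge", "R0", "R1", 10),
    ("emit", "l2"),
    ("load", "R0", "res"), ("load", "R1", "n"), ("add", "R0", "R0", "R1"),
    ("store", "R0", "res"), ("branch", 1),
    ("halt",),
]

KAPPA = deduce_costs(LABELLED_ASM, unit_cost)
```

Taking the maximum over branches is where the overapproximation comes from when the labelling is not precise; with a precise labelling all paths in a scope cost the same and the maximum is exact.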
Instrumentation
- Use a fresh variable
- Initialize it to 0
- Replace labels with increments (following κ)
prog ℓ1 : while res < 5 do ℓ2 : res := res + n
−→
prog _cost := 0;
     _cost := _cost + 2;
     while res < 5 do
       _cost := _cost + 4; res := res + n
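The three instrumentation steps can be sketched as a source-to-source transformation on labelled Imp. The Python below is an illustration under assumptions: the tuple encoding and function names are hypothetical, `_cost` is assumed not to occur in the source program, and κ is passed as a dictionary using the values 2 and 4 from the example above.

```python
def instrument(s, kappa):
    """Replace each label l by the increment _cost := _cost + kappa[l]."""
    tag = s[0]
    if tag == "label":
        bump = ("assign", "_cost", ("add", ("id", "_cost"), ("num", kappa[s[1]])))
        return ("seq", bump, instrument(s[2], kappa))
    if tag == "seq":
        return ("seq", instrument(s[1], kappa), instrument(s[2], kappa))
    if tag == "if":
        return ("if", s[1], instrument(s[2], kappa), instrument(s[3], kappa))
    if tag == "while":
        return ("while", s[1], instrument(s[2], kappa))
    return s

def instrument_prog(s, kappa):
    # The fresh variable is initialised to 0 before the program starts.
    return ("seq", ("assign", "_cost", ("num", 0)), instrument(s, kappa))

def eval_expr(e, state):
    if e[0] == "id":
        return state[e[1]]
    if e[0] == "num":
        return e[1]
    return eval_expr(e[1], state) + eval_expr(e[2], state)  # ("add", e1, e2)

def execute(s, state):
    """Plain (unlabelled) Imp interpreter, used to read the cost after a run."""
    tag = s[0]
    if tag == "assign":
        state[s[1]] = eval_expr(s[2], state)
    elif tag == "seq":
        execute(s[1], state)
        execute(s[2], state)
    elif tag == "if":
        taken = eval_expr(s[1][1], state) < eval_expr(s[1][2], state)
        execute(s[2] if taken else s[3], state)
    elif tag == "while":
        while eval_expr(s[1][1], state) < eval_expr(s[1][2], state):
            execute(s[2], state)
    return state  # "skip" falls through unchanged

# prog l1 : while res < 5 do l2 : res := res + n
LABELLED = ("label", "l1",
            ("while", ("lt", ("id", "res"), ("num", 5)),
             ("label", "l2",
              ("assign", "res", ("add", ("id", "res"), ("id", "n"))))))

INSTRUMENTED = instrument_prog(LABELLED, {"l1": 2, "l2": 4})
RESULT = execute(INSTRUMENTED, {"n": 2, "res": 0})
```

With n = 2 and res = 0 the loop body runs three times, so the instrumented program ends with `_cost` equal to 2 + 3 × 4 = 14. Because the result is an ordinary Imp program, the cost can be reasoned about with the same tools as any other variable.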
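Annotation
Toy compiler Annotation
Annotation = Instrumentation ◦ Labelling
Imp −→ VM −→ ASM (Compilation)
Imp_L −→ VM_L −→ ASM_L (Labelled compilation)
Imp −→ Imp_L (Labelling); each labelled language −→ its unlabelled counterpart (Erasure)
ASM_L −→ (L → N) (Cost deduction)
Imp_L with L → N −→ Imp (Instrumentation)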
Architecture
Realistic C compiler
C −→ Clight −→ Cminor −→ RTLabs −→ RTL −→ ERTL −→ LTL −→ LIN −→ MIPS
Optimizations: CSE, Dead code, Compress, Coloring
Architecture
Realistic C compiler C to Cminor
- Inspired by CompCert
  - C to Clight: CIL
  - Clight to Cminor: manual port from Coq to OCaml (GNU GPL and INRIA Non-Commercial)
Architecture
Realistic C compiler RTLabs
- Home made
- Architecture independent
- Retargeting simplified (inspired by Gimple in GCC)
√ Some common optimizations
X Some optimizations lost
Architecture
Realistic C compiler RTL to MIPS
- Adapted from a pedagogical Pseudo-Pascal compiler (François Pottier, Creative Commons)
Architecture
Realistic C compiler Optimizations
- CSE: Common Subexpression Elimination
- Dead code elimination
- Coloring: variable locations
- Graph compression
Architecture
Realistic C compiler Restrictions
Clight
X long long and long double types
X longjmp and setjmp instructions
X Unreasonable forms of switch
X Unprototyped and variable-arity functions
RTL
X float
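Labelling
Realistic C compiler Overview
Clight −→ Cminor −→ ... −→ MIPS (Compilation)
Clight_L −→ Cminor_L −→ ... −→ MIPS_L (Labelled compilation)
Clight −→ Clight_L (Labelling); each labelled language −→ its unlabelled counterpart (Erasure)
MIPS_L −→ (L → N) (Cost deduction)
Clight_L with L → N −→ Clight (Instrumentation)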
Labelling
Realistic C compiler Differences Imp / C
Side-effect expressions (eliminated by CIL)
  y = x++;  →  _tmp = x; x = _tmp+1; y = _tmp;
Ternary expressions
  x ? y+2*z : z
Labels and Gotos
  lbl: ... goto lbl;
Function calls
  f(&x);
Labelling
Realistic C compiler Ternary expressions
x ? (y+2*z) : z
Branching ⇒ 1 label per branch for precision
Labelled expressions
x ? (ℓ1 : y+2*z) : (ℓ2 : z)
Instrumentation
1) Side effects inside expressions
2) Elimination by CIL
C −→ CIL −→ Clight −→ Instrument −→ C −→ CIL −→ Clight
Labelling
Realistic C compiler Labels and Gotos
lbl: ... goto lbl;
goto lbl; ... lbl:
Label ⇒ potential loop ⇒ 1 cost label needed
lbl: ℓ: ... goto lbl;
goto lbl; ... lbl: ℓ:
Labelling
Realistic C compiler Function calls
Function call: sequential instruction
Scope 1: x++; f(&x); y = x;
Scope 2: void f(int* x) { ... return; }
Function pointer: statically unresolvable destination
√ Each function handles its cost scope
Experiments
Realistic C compiler Benchmarks
            gcc -O0    acc       gcc -O1
badsort     55.93      34.51     12.96
fib         76.24      34.28     45.68
mat_det     163.42     156.20    54.76
min         12.21      16.25     3.95
quicksort   27.46      17.95     9.41
search      463.19     623.79    155.38
Future work
- Formal proofs in Matita
- Real-world example (Lustre)
- Frama-C plugin (demo)
- Retarget to 8051 (modularize the target architecture)
- Functional languages
Conclusion
CerCo
- Sound C compiler (untrusted for now...)
- Correct cost annotations (overapproximation)
- Symbolic cost labels
- Modular proofs
Conclusion
Thank you for your attention!
Questions?