Compiler Correctness
18 answers - 408 bytes -

Hi,
I have a simple question. Can we prove that a compiler is correct? I
mean we are given a language with it's syntax and semantics and a
compiler whose correctness needs to be proved. Is this theoretically
possible?
[I think it's been done on some toy compilers, but realistic compilers
are so large that the proof would be too long to be credible. -John]
No.1 | | 600 bytes |
| 
"pramod" <Pramod.Patangay@gmail.comwrites:
I have a simple question. Can we prove that a compiler is correct? I
mean we are given a language with it's syntax and semantics and a
compiler whose correctness needs to be proved. Is this theoretically
possible?
[I think it's been done on some toy compilers, but realistic compilers
are so large that the proof would be too long to be credible. -John]
Xavier Leroy: "Formal certification of a compiler back-end, or:
programming a compiler with a proof assistant." Proceedings of PPL
2006.
No.2 | | 561 bytes |
| 
See Xavier Leroy's recent work in PPL06.
His is not the first, attempt.
Note, both proofs are machine checked. So the credibility of the
proofs is not much in doubt. Personally, while I find these results
very interesting, there are simpler ways to achieve their goals.
pramod wrote:
I have a simple question. Can we prove that a compiler is correct? I
mean we are given a language with it's syntax and semantics and a
compiler whose correctness needs to be proved. Is this theoretically
possible?
No.3 | | 750 bytes |
| 
I have a simple question. Can we prove that a compiler is correct? I
mean we are given a language with it's syntax and semantics and a
compiler whose correctness needs to be proved. Is this theoretically
possible?
Yes, it is. In the PPL 2006 conference proceedings, Xavier Leroy
reported on his efforts implementing a provably-correct compiler for a
subset of C.
The proof is extremely large, however, and in order to ensure the
proof's correctness he had to develop the compiler in a theorem
proving system that verified the correctness of all his proofs.
"Formal certification of a compiler back-end, or: programming a
compiler with a proof assistant"
<http://pauillac.inria.fr/~>
No.4 | | 2919 bytes |
| 
Neelakantan Krishnaswami wrote:
>I have a simple question. Can we prove that a compiler is correct? I
>mean we are given a language with it's syntax and semantics and a
>compiler whose correctness needs to be proved. Is this theoretically
>possible?
>
Yes, it is. In the PPL 2006 conference proceedings, Xavier Leroy
reported on his efforts implementing a provably-correct compiler for a
subset of C.
The proof is extremely large, however, and in order to ensure the
proof's correctness he had to develop the compiler in a theorem
proving system that verified the correctness of all his proofs.
"Formal certification of a compiler back-end, or: programming a
compiler with a proof assistant"
<http://pauillac.inria.fr/~>
[Interesting paper. I wonder how likely it is that the tools, which
are not small, or the set of axioms have bugs that could affect the
validity of the proof. If it sounds like I'm channelling the famous
1979 proof paper by Perlis et al, it's not surprising since he was my
thesis advisor. -John]
A lot of the design of these tools has been shaped in specific
response to Perlis's paper.
Basically, the idea is that a theorem prover is inherently
untrustworthy. It's large, complex, and relies on very advanced math
for its correct operation -- it will have bugs with basically 100%
certainty.
So what these programs do is construct the actual proof of the theorem
as a data structure which you can print out. Then, you can write a
proof *checker* to verify the operations that the theorem prover has
carried out.
A proof checker is much smaller than the theorem prover; usually they
are a few hundred lines of code. Thus, they are small enough that
Perlis's criticisms don't apply to them -- the checkers are programs
that people can manually verify and easily write multiple competing
implementations to increase trust.
So, the process is:
1. You develop your proof interactively with the prover.
2. You dump out the proof and run your proof checkers on it.
3. If it says your proof is good, then you're done.
4. If it says your proof is bad, then you curse, and fix the
bug in the theorem prover and redo your proof.
Interestingly, this is a technique that Leroy used in his compiler.
For example, the register allocator would find a coloring, and then
he'd run a checker on it to ensure that it was okay before proceeding.
If the check failed, then his compiler would report that it failed to
compile that program rather than emitting incorrect code.
IIRC, the logic in Coq is equivalent to ZFC with countably many
inaccessible cardinals. That's a very rich foundation, but still
pretty conventional.
No.5 | | 948 bytes |
| 
our moderator writes:
>[Interesting paper. I wonder how likely it is that the tools, which
>are not small, or the set of axioms have bugs that could affect the
>validity of the proof
Also relevant -- although not quite the same thing -- is George Necula's
work on "certifying compilers" (e.g. his PLDI98 paper). Here the idea is
not to prove the compiler correct, but to prove its *output* correct, in
at least limited respects.
The idea is that the compiler annotates the output code with a proof that
the code satisfies certain constraints. That proof is then checked by an
independent checker -- *much* smaller and simpler than a theorem prover.
The result is that you can trust the compiler's output to have certain
properties even if you're not sure the compiler itself is correct. (He
also found it a very effective tool for compiler debugging.)
No.6 | | 2408 bytes |
| 
[Interesting paper. I wonder how likely it is that the tools, which
are not small, or the set of axioms have bugs that could affect the
validity of the proof. If it sounds like I'm channelling the famous
1979 proof paper by Perlis et al, it's not surprising since he was my
thesis advisor. -John]
Andrew Appel, has been doing related work in this area. He also gives a
talk where he explains point by point why Perlis was off the mark.
Unfortunately, I can't find a pointer to it now. The FPCC project tries
to establish the type-safety of SPARC machine code from a minimal TCB.
Some interesting facts that come from the FPCC project are.
A minimal proof verifier is 1-2 thousand lines of C code. This proof
verifier uses no library calls. It's designed so you can compile it to
assembly and can eyeball the output of the compiler. The basic logic
has around 10 axioms from which all of modern mathematics can be
derived. Then there is a 4000 line partial specification of the SPARC
machine semantics.
Proofs of compiler correctness are more involved. However, a minimal
proof verifier I suspect would be on order of Appel's verifier. The
statement of the theorem needs to include the machine semantics so
let's say another 4000 lines specification, plus the semantics for the
C subset. I'll guess this also is on the order of 2000 lines.
I'd say there is under 10 thousands lines of stuff you have to eye ball
very carefully, before you trust the result and believe the proof is true.
problem with the PPL06 result is that their correctness result only
applies to program that are known to terminate. This is a flaw in the
way their semantics is set up.
In any case. It's been demonstrated that simple proof verifiers can be
written in about 2000 lines of C. To trust any machine checked proof is
simply a matter of encoding a known sound logic, encoding the statement
of your theorem in that logic and then generating the proof to be checked.
For compiler correctness the statement of the theorem is relatively
complex. You need a semantics for the source and target languages. A
good deal of the challenge in a compiler correctness proof is the
statement of the theorem.
Dan (former student of Andrew W. Appel)
No.7 | | 1950 bytes |
| 
Sometimes, even large proofs contain some holes :-)
Here is a possible example with Xavier Leroy's certified compiler that
shows that it is very hard to really provide a full proof, even for
simple optimizing compilers.
I didn't read its last PPL article yet, but I remembered one of his
talks about this subject (certifying a simple compiler). At that time, I
noticed the following hole in their proof.
As explained by the author, an algorithm of constant propagation has
been formally proved. That is, if you have the following program :
y = 5 * 10
then constant propagation may replace this instruction by y = 50 for
instance. Generally, the compiler may statically compute some arithmetic
operations if the operands are known constants. This is an usual
optimization that avoids generating the code that will make the
computation at execution time if we can do it at compile time. Such code
optimization has been proved correct by coq, but an informal assumption
has been forgotten. Indeed, the machine executing the compiler may be
distinct from the machine executing the final generated code. This is
the case of cross compilers for example.
For this case, constant propagation cannot be stated as formally correct
unless you prove that the semantics of computation of these two distinct
machines are equivalent. This has not been done at the time of the talk
(maybe they did it after that time, but I am not sure).
As example, suppose that the compiler executes on a 32 bits machine and
generates a code for a 64 bits machine. If the compiler makes static
constant propagation, it makes arithmetic computation on the 32 bit
machine. This would not give you the same result if you generate the
code on the 64 bits machines.
We can imagine more complex cases with floating point computations
S
No.8 | | 4328 bytes |
| 
Neelakantan Krishnaswami <neelk@cs.cmu.eduwrote:
Basically, the idea is that a theorem prover is inherently
untrustworthy. It's large, complex, and relies on very advanced math
for its correct operation -- it will have bugs with basically 100%
certainty.
So what these programs do is construct the actual proof of the theorem
as a data structure which you can print out. Then, you can write a
proof *checker* to verify the operations that the theorem prover has
carried out.
This is interesting, because, in working on a theorem prover, I have
reached the same conclusions. Particularly, the question of bound
variables is tricky.
I think, though, one might combine a theorem prover and a proof
checker, so that the faulty proofs are detected right away. This is
what I am about to do, at least. Roughly, what is needed, is that
the*substitutions that the theorem prover engine (think of Prolog, but
more complicated) discovers are then verified for correctness.
A proof checker is much smaller than the theorem prover; usually they
are a few hundred lines of code. Thus, they are small enough that
Perlis's criticisms don't apply to them -- the checkers are programs
that people can manually verify and easily write multiple competing
implementations to increase trust.
example of a small, standalone one is MetaMath
<http://us.metamath.org/>. A mathematical proof consists essentially
of finding suitable*substitutions applied to a suitable sequence of
axioms which leads up to the statement one wants to prove. So all the
proof checker needs to do is that the found sequence of axioms (and
formerly proved statements, as they can be formally expanded) and
substitutions are applied correctly.
So, the process is:
1. You develop your proof interactively with the prover.
2. You dump out the proof and run your proof checkers on it.
3. If it says your proof is good, then you're done.
4. If it says your proof is bad, then you curse, and fix the
>* * bug in the theorem prover and redo your proof.
I think the point 4 here is very important. :-) It is very difficult to
develop a bug-free theorem prover.
Interestingly, this is a technique that Leroy used in his compiler.
For example, the register allocator would find a coloring, and then
he'd run a checker on it to ensure that it was okay before proceeding.
If the check failed, then his compiler would report that it failed to
compile that program rather than emitting incorrect code.
Therefore, if one would want to do a compiler verifier, one might be
better off to isolate a smaller, algorithmic language, which is easier to
implement a bug-freer one.
IIRC, the logic in Coq is equivalent to ZFC with countably many
inaccessible cardinals. That's a very rich foundation, but still
pretty conventional.
The consistency of ZFC and similar set theories cannot be proved (by
Goedel theorems) unless one invokes a more powerful metamathematics. And
if it is not consistent, all mathematics falls, due to the excluded third
(all statement true all false). then ends up on a belief that the
axioms are sufficient, because what should be used to prove
the*metamathematical theory one invokes to prove these theories? - To do
that, one has to invoke a still more powerful metametamathematics, and so
on. In addition, some mathematical fields require extensions of ZFC
(typically in category related fields). So, one should not assume that ZFC
is a blessing for "all known mathematics", as is a popular belief, it
seems.
Most working math is likely to be constructible within a much smaller
theory than ZFC and such similar axiomatic set theories, as one usually
has some rather basic*explicit objects at the very bottom, by which the
theoretical extensions just come in as tools to be able to handle this in
an efficient manner. And the*algorithmic parts used in a compiler should
be even smaller, just dealing with some rather explicit binary number
manipulations.
So, again, for a compiler verifier, perhaps ii helps focusing on a smaller
theory.
No.9 | | 1040 bytes |
| 
"Daniel C. Wang" <danwang74@gmail.comwrote:
A minimal proof verifier is 1-2 thousand lines of C code. This proof
verifier uses no library calls. It's designed so you can compile it to
assembly and can eyeball the output of the compiler. The basic logic
has around 10 axioms from which all of modern mathematics can be
derived.
I did not know that mathematicians had agreed on a set of axioms from
which all known mathematics can be derived. :-) Would you mind
specifying these axioms?
Further, a theorem prover as MetaMath <http://us.metamath.org/does
not use the Hilbert axioms straight off as written, but a
modification, and does not specify in exact what manners there results
a mathematical difference, if one. Perhaps it is not known.
problem in the context, is that the metamathematical axioms are often
written as*infinite sets of axioms, which in a computer context must
be properly rewritten as substitution axioms. The details here, seems
to be glossed over.
No.10 | | 1245 bytes |
| 
Sid Touati <Sid.Touati@inria.frwrote:
>As example, suppose that the compiler executes on a 32 bits machine and
generates a code for a 64 bits machine. If the compiler makes static
>constant propagation, it makes arithmetic computation on the 32 bit
>machine. This would not give you the same result if you generate the
>code on the 64 bits machines.
Any cross-compiler which doesn't emulate the *target* machine's arithmetic
when performing operations like constant propagation is simply broken.
course, if it does emulate, there is still the issue of whether it
emulates *correctly*. But this is just another case of "are there bugs in
the compiler?"; it's not a fundamental problem of missed assumptions.
>We can imagine more complex cases with floating point computations
Emulating another machine's floating point is definitely a lot of work
(the widespread consensus on IEEE-standard floating point has helped but
doesn't entirely solve the problem). But if you signed on to write a
cross-compiler, well, you knew the job was dangerous when you took it. :-)
No.11 | | 2170 bytes |
| 
Sid Touati <Sid.Touati@inria.frwrites:
I didn't read its last PPL article yet, but I remembered one of his
talks about this subject (certifying a simple compiler). At that time, I
noticed the following hole in their proof. []
Such code optimization has been proved correct by coq, but an
informal assumption has been forgotten. Indeed, the machine
executing the compiler may be distinct from the machine executing
the final generated code. This is the case of cross compilers for
example.
There is no hole of that kind in my compiler proof. The compiler
implements its own 32-bit integer arithmetic, as Coq functions that
are proved to satisfy all the arithmetic and logic properties that the
compiler optimizations rely on (associativity, distributivity,
multiplication by power of 2 is a left shift, etc). This arithmetic
is completely independent of that of the processor executing the
compiler.
These arithmetic operations are used both during compilation (e.g. the
constant propagation phase you mention) and in the formal semantics
for the target processor. That's what ensures that constant
propagation preserves semantics. If const.prop. were to use a
different arithmetic than the target processor, Coq would of course
reject the proof.
Now, you can have doubts that the arithmetic used by the compiler
matches that implemented by the target compiler. This comes back to
the issue raised by other posters in this thread, namely that the
formal semantics for the source language and the target processors
have not been formally proved correct, but only validated informally
by careful reading and by testing. (This is still much easier than
validating a whole compiler using these informal approaches.)
Formal validation of the semantics is possible, e.g. for the target
processor by relating it to the formal models used by processor
manufacturers to verify their chips. Unfortunately, such formal
models of hardware are generally trade secrets, so this is beyond the
scope of my compiler certification project.
- Xavier Leroy
No.12 | | 1592 bytes |
| 
2006-02-11, Sid Touati <Sid.Touati@inria.frwrote:
As explained by the author, an algorithm of constant propagation has
been formally proved. That is, if you have the following program :
y = 5 * 10
then constant propagation may replace this instruction by y = 50 for
instance. Generally, the compiler may statically compute some arithmetic
operations if the operands are known constants. This is an usual
optimization that avoids generating the code that will make the
computation at execution time if we can do it at compile time. Such code
optimization has been proved correct by coq, but an informal assumption
has been forgotten. Indeed, the machine executing the compiler may be
distinct from the machine executing the final generated code. This is
the case of cross compilers for example.
I'm not so deep into the matter, but since each deviating path between
compiler and verification tool could lead to differences, I'd say that
FPU operations are also a can of worms. This e.g. due to additional
precision in, at least x86, CPUs and the related rounding when saving
to mem, the outcome of FPU operations is also dependant on the path.
So result:=x*y*z is different from a1:=x*y result:=a1*z (if a1 is a memory
location)
This could probably be remedied with a tolerance (but even those could trip
in case of badly conditioned problems?)
[]
We can imagine more complex cases with floating point computations
(Hmm, seen to late that you already mentioned it)
No.13 | | 2240 bytes |
| 
Sid Touati wrote:
As example, suppose that the compiler executes on a 32 bits machine
and generates a code for a 64 bits machine. If the compiler makes
static constant propagation, it makes arithmetic computation on the
32 bit machine. This would not give you the same result if you
generate the code on the 64 bits machines.
When two cross compilers, on arbitrary machines, produce the same
instructions for a third machine, they also must produce the same
constant argument values.
Provided that your 32 bit cross compiler code will (deterministically)
produce the same instructions, regardless of it's host machine, it also
must produce the same constant values, regardless of (the arithmetic of)
the host machine.
a proof of the correctness of any program must take into
account the detailed behaviour of the host machine, and then is valid
only for that specific machine. IM it's much simpler to write
machine-independent compiler code, instead of proofing the behaviour of
machine-dependent code, for every new host machine ;-)
We can imagine more complex cases with floating point computations
That's a different thing, because in floating point computations the
least significant bits can be lost at any time, whereas a loss of the
most significant bits in integral values definitely are overflow
errors. Just the mapping of literal floating point constants, from
source code into binary values, can be affected by the arithmetic of
the host machine, and all these representations may be correct, within
the inevitable and acceptable deviations for floating point operations
and values. The numerical stability of floating point algorithms, as
implemented in the source code of the compiler input, is not of
concern in a proof of the correctness of the compiler itself.
DoDi
[Where I come from, with a given floating point arithmetic model,
there is only one correct result for each operation and one correct
representation for each input value. The ambiguities arise when a
compiler can use more than one possible sequence of operations to
evaluate an expression. -John]
No.14 | | 1432 bytes |
| 
The moderator nattered:
[Where I come from, with a given floating point arithmetic model,
there is only one correct result for each operation and one correct
representation for each input value. The ambiguities arise when a
compiler can use more than one possible sequence of operations to
evaluate an expression. -John]
Most languages allow the compiler some freedom in the evaluation of
expressions, including CSE and reordering of instructions and arguments.
What does this mean, with regards to the correctness of compilers for
such languages?
Ambiguities also can arise from different models or precision, as
implemented in specific FPU hardware. Doesn't this imply that a correct
compiler must emulate all floating point operations, so that the output
does not depend on the host machine of the compiler? The same for all
the target machines of an cross compiler
The operation of the Intel coprocessor (x87) is configurable, and this
configuration can be changed at runtime. In this case a "correct"
compiler has to assure that, after every call to external code, the FPU
configuration is restored to the expected state.
At least these points should be considered in a proof of the correctness
of any program, including compilers.
DoDi
[You're right, some languages permit multiple floating point models. -John]
No.15 | | 1850 bytes |
| 
Marco van de Voort <marcov@stack.nlwrote:
>FPU operations are also a can of worms. This e.g. due to additional
>precision in, at least x86, CPUs and the related rounding when saving
>to mem, the outcome of FPU operations is also dependant on the path.
>So result:=x*y*z is different from a1:=x*y result:=a1*z (if a1 is a memory
>location)
Not always. It *is* tricky to do floating-point constant folding, but
there are cases where it can be done, and it is possible to
automatically determine whether it's possible for a particular
expression.
Notably, if both the compiler and the target use IEEE arithmetic, and
you evaluate an expression in the compiler -- including format
conversions and roundings where appropriate -- and evaluation causes
*no* floating-point flags to be raised (not even the Inexact flag),
then it can be evaluated at compile time and moreover it is
independent of rounding mode(*). For example, folding `5.0 * 10.0' to
`50.0' is verifiably safe.
(* With one small exception that you have to watch for: for a finite
X, X-X is always zero, but whether that's +0 or -0 depends on the
rounding mode. There are only one or two situations where it makes a
visible difference. )
You may be able to go farther than that if the language spec permits,
e.g. by allowing inexact constant folding provided rounding is
correct.
Note that I said "both the *compiler* and the target", not "both the
*host* and the target". Compilers can quite reasonably use software
floating point to be sure that they match the exact behavior of the target
machine -- it's not like they do enough compile-time floating-point
arithmetic that its performance is likely to be a problem.
No.16 | | 2086 bytes |
| 
Hans-Peter Diettrich a :
a proof of the correctness of any program must take into
account the detailed behaviour of the host machine, and then is valid
only for that specific machine. IM it's much simpler to write
machine-independent compiler code, instead of proofing the behaviour of
machine-dependent code, for every new host machine ;-)
I am not expert in proving systems, but I am unsure that writing
machine-independent code would fundamentally help for certifying a
compiler: isn't like compiling for a virtual machine ?
Personally I think that, unless you compile for Turing machines (which
is impossible in practice because of memory space limitation), I doubt
that we can provide a full proof that is 100% mathematically
guaranteed with classical proofs (as we had all learn in school): you
would always have to make a set of assumptions that makes your life
easier. The question is how much and which assumptions are you ready
to admit. As said Xavier in his reply, you have for instance to assume
that the arithmetic of the target processor behaves as expected by the
compiler or by your prover. This is an acceptable assumption because
we generally trust processor vendors (do we have the choice ?),
Each real target machine (whether it is a virtual machine or not) is a
simplification of a Turing machine. Consequently the semantics of the
arithmetic computation on any of these real machines is not the
semantics of the computation used in mathematics. You have then to
mathematically define a new semantics for arithmetic computation. Unless
you use your brain (and you get a job for eternity), it's your own
machine that will implement this semantics and use it for certifying
your compiler This come back to trust the hardware implementations of
processor vendors. Finally, by following the famous nomenclature of
Xavier's proofs classifications, we can name it "proof by (commercial)
power" or "proof by resignation" :-)
S
No.17 | | 1632 bytes |
| 
Hans-Peter Diettrich <DrDiettrich@compuserve.dewrote:
>Ambiguities also can arise from different models or precision, as
>implemented in specific FPU hardware. Doesn't this imply that a correct
>compiler must emulate all floating point operations, so that the output
>does not depend on the host machine of the compiler?
Not *necessarily*, although (as I said in a previous message) that may be
a sensible thing to do on general principles.
IEEE floating point, in particular, is sufficiently tightly specified and
sufficiently widely used that a careful cross-compiler running on an IEEE
machine could mostly use its host floating-point hardware to emulate the
floating point of many target machines. (It might have to resort to
software emulation in some tricky areas.)
>The operation of the Intel coprocessor (x87) is configurable, and this
>configuration can be changed at runtime.
That's true of IEEE FPUs in general, not just the x86 one.
>In this case a "correct" compiler has to assure that, after every
>call to external code, the FPU configuration is restored to the
>expected state.
Indeed so. Modern language definitions increasingly are specifying the
details of where FPU state changes, which state changes propagate into and
out of a called routine, etc. With an older language spec, there's room
for implementer judgement about how things should work, but that's no
different from many other areas.
No.18 | | 1220 bytes |
| 
Henry Spencer wrote:
Hans-Peter Diettrich <DrDiettrich@compuserve.dewrote:
>>Ambiguities also can arise from different models or precision, as
>>implemented in specific FPU hardware. Doesn't this imply that a correct
>>compiler must emulate all floating point operations, so that the output
>>does not depend on the host machine of the compiler?
Not *necessarily*, although (as I said in a previous message) that may be
a sensible thing to do on general principles.
(snip)
I remember when the Pentium FDIV bug came out there was a C program to
compile and run to test for the bug. All the operations were done with
constants, which were evaluated by the compiler. Some compilers did it
in software, not using FDIV, and so didn't report the bug.
The IBM 360/91 supplies a rounded quotient in floating point divide,
instead of the truncated quotient specified by S/360. A program
compiled on one system may have different values for constant
expressions from the system it runs on.
-- glen
[The /91 didn't round. That was the /195. -John]