Compilers and Semantics of Programming Languages


Yanyan Jiang

Overview

Compilers

  • A minimal compiler for expressions
  • An optimized compiler and beyond

Semantics

  • Formal semantics of the $\textbf{while}$ language
  • Why formal semantics?

Compilers

Compiler

A program for translating programs to lower-level programs


A Compiler for Simple Expressions

Expression compiler ecc.py

  • Based on the lark library
  • Compiles expressions to x86-64 assembly using a stack-based scheme
    • Produces slow and verbose code
  • The output can be assembled and linked with a driver (e.g., test-main.c)

Example (foo.e):

foo: (x + 1) * (x + 1)
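The actual output of ecc.py is not shown on this slide, so here is a hypothetical sketch of what "stack-based compilation" means. It assumes expressions are already parsed into tuples, that the single parameter x arrives in %rdi (System V AMD64 calling convention), and AT&T syntax for the GNU assembler on Linux. Every leaf pushes a value and every operator pops two and pushes one, which is exactly why the code comes out slow and verbose.

```python
# Hypothetical AST: ("num", n) | ("var", name) | ("add", l, r) | ("mul", l, r)
ARGS = {"x": "%rdi"}  # assumption: parameter x is passed in %rdi (System V ABI)

def gen(e, out):
    """Emit code that leaves the value of e on top of the machine stack."""
    kind = e[0]
    if kind == "num":
        out.append(f"  pushq ${e[1]}")
    elif kind == "var":
        out.append(f"  pushq {ARGS[e[1]]}")
    else:
        gen(e[1], out)                    # left operand ends up below ...
        gen(e[2], out)                    # ... the right operand
        out.append("  popq %rcx")         # right operand
        out.append("  popq %rax")         # left operand
        op = {"add": "addq", "mul": "imulq"}[kind]
        out.append(f"  {op} %rcx, %rax")  # rax = rax op rcx
        out.append("  pushq %rax")
    return out

def compile_function(name, body):
    lines = [f".globl {name}", f"{name}:"]
    gen(body, lines)
    lines += ["  popq %rax", "  ret"]     # return value goes in %rax
    return "\n".join(lines)

# foo: (x + 1) * (x + 1)
foo = ("mul", ("add", ("var", "x"), ("num", 1)), ("add", ("var", "x"), ("num", 1)))
print(compile_function("foo", foo))
```

Note that x + 1 is computed twice and every intermediate value round-trips through memory; the optimized compiler on the following slides targets exactly this waste.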

Optimized Compilers

Static Single Assignment (SSA)

  • Each assignment receives a new name
    • An instruction is the value it computes
    • Try clang with -S -emit-llvm to see LLVM IR
      • (Many papers in the reading list are related to compilers)
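To make "each assignment receives a new name" concrete, the sketch below lowers an expression tree into SSA-style three-address code. The tuple AST and the printed syntax are assumptions: the output only loosely imitates LLVM IR (real IR also carries types, e.g. add i64). Each instruction defines a fresh %n, so the name and the value are the same thing.

```python
import itertools

def to_ssa(e):
    """Lower ("num"|"var"|"add"|"mul") tuples to SSA: one fresh name per instruction."""
    counter = itertools.count(1)
    code = []

    def go(e):
        kind = e[0]
        if kind == "num":
            return str(e[1])
        if kind == "var":
            return e[1]                   # parameters are already assigned only once
        lhs, rhs = go(e[1]), go(e[2])
        name = f"%{next(counter)}"        # never reused: static single assignment
        code.append(f"{name} = {kind} {lhs}, {rhs}")
        return name

    result = go(e)
    return code, result

tree = ("mul", ("add", ("var", "x"), ("num", 1)), ("add", ("var", "x"), ("num", 1)))
code, result = to_ssa(tree)
print("\n".join(code))   # %1 = add x, 1 / %2 = add x, 1 / %3 = mul %1, %2
```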

Apply rewriting rules and iterate until a fixed point

  • Inlining (ASPLOS'22 🏅)
  • Constant Propagation
  • Dead Code Elimination
  • (Problems remain open)
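A minimal sketch of two of these rewrites on a toy SSA program, repeated until nothing changes. The instruction format (dst, op, a, b), where operands are either ints or names, is an assumption for illustration; inlining is left out.

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def const_prop(prog, ret):
    """Constant propagation: substitute known constants and fold constant instructions."""
    consts, out = {}, []
    for dst, op, a, b in prog:
        a, b = consts.get(a, a), consts.get(b, b)   # replace names known to be constant
        if isinstance(a, int) and isinstance(b, int):
            consts[dst] = OPS[op](a, b)              # instruction folded away
        else:
            out.append((dst, op, a, b))
    return out, consts.get(ret, ret)

def dce(prog, ret):
    """Dead code elimination: keep only instructions the return value depends on."""
    live, out = {ret}, []
    for dst, op, a, b in reversed(prog):             # SSA: walk uses before definitions
        if dst in live:
            out.append((dst, op, a, b))
            live |= {a, b}
    return out[::-1], ret

def optimize(prog, ret):
    """Apply both rewrites until the program stops changing (a fixed point)."""
    while True:
        new = dce(*const_prop(prog, ret))
        if new == (prog, ret):
            return new
        prog, ret = new

prog = [("t1", "add", 2, 3), ("t2", "mul", "x", "t1"), ("t3", "sub", "x", "x")]
print(optimize(prog, "t2"))   # ([('t2', 'mul', 'x', 5)], 't2')
```

The dead t3 disappears even though x - x is never simplified to 0; recognizing that would need another (algebraic) rewrite rule.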

Playing with Compilers

Create an optimized compiler

  • ecc-opt.py
  • By “program synthesis”!
    • Constraint solving for equivalent code templates
    • A quick reference to the Z3 Python API

Try it

  • Optimize (x + 1) * (x + 1) - (x - 1) * (x - 1)
  • Optimize x * 9

Semantics

Semantics of Expressions

Can we “know better” about an expression's value?

?sum: product
    | sum "+" product   -> add
?product: atom
    | product "*" atom  -> mul
?atom: NUMBER           -> num
     | "(" sum ")"

Semantics: the “value” of expressions

  • eval('3 + 4 * 5') → 23 (expressions are trees!)
    • $\llbracket E + P \rrbracket = \llbracket E \rrbracket + \llbracket P \rrbracket$, $\llbracket E \times P \rrbracket = \llbracket E \rrbracket \times \llbracket P \rrbracket$, $\llbracket (E) \rrbracket = \llbracket E \rrbracket$
    • $\llbracket n \rrbracket = n$ (this is the base case!)

Semantics of Expressions, with Variables

eval('x + y * z') → ?

  • Introducing the program state
    • Example: $\sigma = \{ x \mapsto 1, y \mapsto 2, z\mapsto 3\}$
  • Semantics of expressions, with variables
    • “Expression value given variable assignments”
    • $\llbracket v \rrbracket_\sigma = \sigma(v)$
    • eval('x + y * z') under $\sigma$
      • eval('1 + 2 * 3') → 7
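The equations translate almost line by line into a recursive evaluator. A sketch, assuming the tuple trees used in the earlier examples plus ("var", name) leaves, and a Python dict as the state $\sigma$. There is no case for parentheses because $\llbracket (E) \rrbracket = \llbracket E \rrbracket$: they disappear during parsing.

```python
def denote(e, sigma):
    """[[e]]_sigma, one case per semantic equation."""
    kind = e[0]
    if kind == "num":
        return e[1]                                        # [[n]] = n
    if kind == "var":
        return sigma[e[1]]                                 # [[v]]_sigma = sigma(v)
    if kind == "add":
        return denote(e[1], sigma) + denote(e[2], sigma)   # [[E + P]] = [[E]] + [[P]]
    if kind == "mul":
        return denote(e[1], sigma) * denote(e[2], sigma)   # [[E * P]] = [[E]] * [[P]]
    raise ValueError(f"unknown node {kind!r}")

sigma = {"x": 1, "y": 2, "z": 3}
expr = ("add", ("var", "x"), ("mul", ("var", "y"), ("var", "z")))   # x + y * z
print(denote(expr, sigma))   # 7
```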

Denotational Semantics

Programs are simply state transformers!

  • $\sigma' = P(\sigma)$, i.e., $(\sigma, \sigma') \in P$
    • Variables may be assigned undefined values ($\bot$)
    • Program states can also be undefined ($\bot$)
  • Semantics of $P$ can be recursively defined (just like expressions)
    • $\llbracket \textbf{skip} \rrbracket_\sigma$
    • $\llbracket x := e \rrbracket_\sigma$
    • $\llbracket c_1;\ c_2 \rrbracket_\sigma$
    • $\llbracket \textbf{if}\ b\ {\bf then}\ c_1\ {\bf else}\ c_2 \rrbracket_\sigma $
    • $\llbracket \textbf{while}\ b\ {\bf do}\ c \rrbracket_\sigma = \llbracket \textbf{if}\ b\ \textbf{then}\ (c; \textbf{while}\ b\ {\bf do}\ c)\ \textbf{else skip} \rrbracket_\sigma $
      • Right?
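Reading these equations directly as code gives a recursive interpreter in which each command maps $\sigma$ to $\sigma'$. The sketch below uses the standard definitions $\llbracket \textbf{skip} \rrbracket_\sigma = \sigma$, $\llbracket x := e \rrbracket_\sigma = \sigma[x \mapsto \llbracket e \rrbracket_\sigma]$, and $\llbracket c_1;\ c_2 \rrbracket_\sigma = \llbracket c_2 \rrbracket_{\llbracket c_1 \rrbracket_\sigma}$, and takes the while equation literally. For brevity, expressions and conditions are plain Python functions of the state (an assumption, not part of the while language).

```python
def run(c, sigma):
    """Interpret a command as a state transformer: returns sigma'."""
    kind = c[0]
    if kind == "skip":                      # [[skip]]_sigma = sigma
        return sigma
    if kind == "assign":                    # [[x := e]]_sigma = sigma[x -> [[e]]_sigma]
        _, x, e = c
        return {**sigma, x: e(sigma)}
    if kind == "seq":                       # [[c1; c2]]_sigma = [[c2]] applied to [[c1]]_sigma
        _, c1, c2 = c
        return run(c2, run(c1, sigma))
    if kind == "if":
        _, b, c1, c2 = c
        return run(c1, sigma) if b(sigma) else run(c2, sigma)
    if kind == "while":                     # the equation, used literally: unfold once
        _, b, body = c
        return run(("if", b, ("seq", body, c), ("skip",)), sigma)
    raise ValueError(f"unknown command {kind!r}")

# s := 0; i := 1; while i <= 5 do (s := s + i; i := i + 1)
prog = ("seq", ("assign", "s", lambda st: 0),
        ("seq", ("assign", "i", lambda st: 1),
         ("while", lambda st: st["i"] <= 5,
          ("seq", ("assign", "s", lambda st: st["s"] + st["i"]),
                  ("assign", "i", lambda st: st["i"] + 1)))))
print(run(prog, {}))   # {'s': 15, 'i': 6}
```

For a terminating loop this works. For while true do skip, the interpreter unfolds forever and Python eventually raises RecursionError; the equation by itself never says which state the loop denotes.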

Loops May Not Terminate!

What is

$$ \llbracket \textbf{while}\ \textbf{true}\ \textbf{do}\ \textbf{skip} \rrbracket_\sigma? $$

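Unfolding the definition once only yields the trivial equation
$$ \llbracket \textbf{while}\ \textbf{true}\ \textbf{do}\ \textbf{skip} \rrbracket_\sigma = \llbracket \textbf{while}\ \textbf{true}\ \textbf{do}\ \textbf{skip} \rrbracket_\sigma $$
which every candidate value (even $\bot$) satisfies!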


Loop semantics should be the least fixed point satisfying $$\llbracket \textbf{while}\ b\ {\bf do}\ c \rrbracket_\sigma = \llbracket \textbf{if}\ b\ \textbf{then}\ (c; \textbf{while}\ b\ {\bf do}\ c)\ \textbf{else skip} \rrbracket_\sigma $$

  • Awkward: the fixed point is a limit that heads to infinity!
    • Requires non-trivial mathematical machinery
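One standard way to make "least fixed point" concrete is Kleene iteration: start from the everywhere-undefined approximation $W_0$ (every state maps to $\bot$) and unroll repeatedly, $W_{k+1} = F(W_k)$ with $F(w) = \llbracket \textbf{if}\ b\ \textbf{then}\ (c; w)\ \textbf{else skip} \rrbracket$. $W_k(\sigma)$ is defined exactly when the loop exits from $\sigma$ after fewer than $k$ iterations, and the least fixed point is the limit of this chain. In the sketch, $\bot$ is None and b, c are Python functions (assumptions for illustration).

```python
BOT = None  # the undefined state ⊥

def F(b, c, w):
    """One unrolling: F(w) = [[if b then (c; w) else skip]]."""
    def unrolled(sigma):
        if sigma is BOT:
            return BOT
        return w(c(sigma)) if b(sigma) else sigma
    return unrolled

def approx(b, c, k):
    """W_k = F^k(W_0), where W_0 maps every state to ⊥."""
    w = lambda sigma: BOT
    for _ in range(k):
        w = F(b, c, w)
    return w

# while i < 3 do i := i + 1
b = lambda st: st["i"] < 3
c = lambda st: {**st, "i": st["i"] + 1}
print(approx(b, c, 3)({"i": 0}))   # None: 3 unrollings are not enough
print(approx(b, c, 4)({"i": 0}))   # {'i': 3}

# while true do skip: every W_k is ⊥, so the least fixed point is ⊥ as well
print(approx(lambda st: True, lambda st: st, 100)({}))   # None
```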

Operational Semantics

We can define a small step of evaluation

  • Each step “reduces” the program a little bit
  • Inference rules: if all premises hold, the conclusion also holds
    • Each derivation forms a “proof tree”!

$$ \frac{ } { \quad \llbracket {\bf while}\ b\ {\bf do}\ c \rrbracket_\sigma = \llbracket \textbf{if}\ b\ \textbf{then}\ (c; {\bf while}\ b\ {\bf do}\ c)\ \textbf{else skip} \rrbracket_\sigma \quad } $$


$$ \frac{ \qquad \llbracket b \rrbracket_\sigma = {\rm true} \qquad \llbracket c; {\bf while}\ b\ {\bf do}\ c \rrbracket_\sigma = \sigma' \quad } { \quad \llbracket {\bf while}\ b\ {\bf do}\ c \rrbracket_\sigma = \sigma' \quad } $$


$$ \frac{ \quad \llbracket b \rrbracket_\sigma = {\rm false} \quad } { \quad \llbracket {\bf while}\ b\ {\bf do}\ c \rrbracket_\sigma = \sigma \quad } $$
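For comparison, here is a sketch of a small-step variant in the common structural operational style: each call rewrites a configuration $(c, \sigma)$ by one step, and a driver repeats steps until only skip is left. These particular rules (for example, reducing skip; c2 to c2) are standard textbook choices assumed here, not copied from the slide. Because the driver is an ordinary loop, non-termination shows up as running out of fuel instead of as an ill-defined equation.

```python
def step(c, sigma):
    """One small step (c, sigma) -> (c', sigma'); returns None once c is skip."""
    kind = c[0]
    if kind == "skip":
        return None                                  # terminal configuration
    if kind == "assign":
        _, x, e = c
        return ("skip",), {**sigma, x: e(sigma)}
    if kind == "seq":
        _, c1, c2 = c
        if c1 == ("skip",):
            return c2, sigma                         # (skip; c2, σ) -> (c2, σ)
        c1_next, sigma_next = step(c1, sigma)        # reduce inside the left part
        return ("seq", c1_next, c2), sigma_next
    if kind == "if":
        _, b, c1, c2 = c
        return (c1 if b(sigma) else c2), sigma
    if kind == "while":                              # unfold the loop once
        _, b, body = c
        return ("if", b, ("seq", body, c), ("skip",)), sigma
    raise ValueError(f"unknown command {kind!r}")

def run(c, sigma, fuel=10_000):
    """Repeat single steps; return None if the program is still running after `fuel` steps."""
    for _ in range(fuel):
        nxt = step(c, sigma)
        if nxt is None:
            return sigma
        c, sigma = nxt
    return None

# s := 0; i := 1; while i <= 5 do (s := s + i; i := i + 1)
prog = ("seq", ("assign", "s", lambda st: 0),
        ("seq", ("assign", "i", lambda st: 1),
         ("while", lambda st: st["i"] <= 5,
          ("seq", ("assign", "s", lambda st: st["s"] + st["i"]),
                  ("assign", "i", lambda st: st["i"] + 1)))))
print(run(prog, {}))                                             # {'s': 15, 'i': 6}
print(run(("while", lambda st: True, ("skip",)), {}, fuel=1000))  # None
```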

Operational Semantics (cont'd)

We can also add a program counter to the state!

  • State (variable assignments) $\sigma$
  • Current location (label) $\ell$
    • Just like single-step debugging: $$ (\sigma, \ell) \xRightarrow{\text{single-step}} (\sigma', \ell') $$
    • Operational semantics defines this transition (instead of the whole-program transformer $P$)
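A sketch of the $(\sigma, \ell) \Rightarrow (\sigma', \ell')$ view: the loop is flattened into a list of labeled instructions, the label is just an index, and a generator yields every configuration the way a debugger would when single-stepping. The instruction set (assign, branch_if_false, jump, halt) is made up for this example.

```python
def step(prog, sigma, pc):
    """One transition (sigma, pc) -> (sigma', pc') of a tiny labeled machine."""
    ins = prog[pc]
    op = ins[0]
    if op == "assign":
        _, x, e = ins
        return {**sigma, x: e(sigma)}, pc + 1
    if op == "branch_if_false":          # fall through when b holds, otherwise jump
        _, b, target = ins
        return sigma, (pc + 1 if b(sigma) else target)
    if op == "jump":
        return sigma, ins[1]
    raise ValueError(f"cannot step {op!r}")

def trace(prog, sigma, pc=0):
    """Yield every configuration (sigma, pc) until the halt instruction is reached."""
    while True:
        yield sigma, pc
        if prog[pc][0] == "halt":
            return
        sigma, pc = step(prog, sigma, pc)

# s := 0; i := 1; while i <= 5 do (s := s + i; i := i + 1), with explicit labels
prog = [
    ("assign", "s", lambda st: 0),                   # label 0
    ("assign", "i", lambda st: 1),                   # label 1
    ("branch_if_false", lambda st: st["i"] <= 5, 6), # label 2
    ("assign", "s", lambda st: st["s"] + st["i"]),   # label 3
    ("assign", "i", lambda st: st["i"] + 1),         # label 4
    ("jump", 2),                                     # label 5
    ("halt",),                                       # label 6
]
final_sigma, final_pc = list(trace(prog, {}))[-1]
print(final_sigma, final_pc)   # {'s': 15, 'i': 6} 6
```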

Why Bother?

Going Rigorous

Semantics enables studying programs as mathematical objects

  • Developing theorems (properties) of programs
    • Example: $\llbracket x + x \rrbracket_\sigma = \llbracket x \times 2 \rrbracket_\sigma$ for all $\sigma$
      • Obvious in this case, but the same reasoning applies to complicated programs
    • A rigorous foundation for (static and dynamic) program analyses
  • Theorems can be mechanically checked by a proof assistant
    • An interpreter can be derived automatically from the semantics
    • Verified systems
      • CompCert: all optimizations preserve semantics
      • seL4: the OS kernel is functionally correct and never crashes
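As a small taste of proof-assistant checking, here is a sketch in Lean 4 (core Lean only, no Mathlib assumed) that defines the expression semantics and proves the example theorem $\llbracket x + x \rrbracket_\sigma = \llbracket x \times 2 \rrbracket_\sigma$ for every $\sigma$. The names Expr, Expr.eval, and add_self_eq_mul_two are chosen for this example, and the state is modeled as a total function from names to integers.

```lean
inductive Expr where
  | num : Int → Expr
  | var : String → Expr
  | add : Expr → Expr → Expr
  | mul : Expr → Expr → Expr

-- ⟦e⟧_σ, with the state σ modeled as a function from names to integers
def Expr.eval (σ : String → Int) : Expr → Int
  | .num n   => n
  | .var v   => σ v
  | .add a b => Expr.eval σ a + Expr.eval σ b
  | .mul a b => Expr.eval σ a * Expr.eval σ b

-- ⟦x + x⟧_σ = ⟦x × 2⟧_σ for all σ
theorem add_self_eq_mul_two (σ : String → Int) :
    Expr.eval σ (.add (.var "x") (.var "x")) = Expr.eval σ (.mul (.var "x") (.num 2)) := by
  simp only [Expr.eval]   -- unfold the semantics: σ "x" + σ "x" = σ "x" * 2
  omega                   -- linear integer arithmetic closes the goal
```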

Going Rigorous (cont'd)

Need a non-recursive implementation of the Tower of Hanoi?

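#include <stdio.h>

// Print the moves for n >= 1 disks from peg `from` to peg `to`, using `via` as spare.
void hanoi(int n, char from, char to, char via) {
  if (n == 1) printf("%c -> %c\n", from, to);
  else {
    hanoi(n - 1, from, via, to);
    hanoi(    1, from, to, via);
    hanoi(n - 1, via, to, from);
  }
}

int main(void) {
  hanoi(3, 'A', 'C', 'B');
}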
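One way to answer the question is to apply the $(\sigma, \ell)$ idea to the recursion itself: keep an explicit stack of frames, where each frame holds the arguments plus a program counter saying which of the three recursive calls to make next. The sketch below is in Python and is an illustration, not the lecture's reference solution; like the C version, it assumes n >= 1.

```python
def hanoi_iterative(n, src, dst, via):
    """Non-recursive Tower of Hanoi: simulate call frames (args + pc) on an explicit stack."""
    moves = []
    stack = [(n, src, dst, via, 0)]              # frame = (n, from, to, via, pc)
    while stack:
        n, a, b, c, pc = stack.pop()
        if n == 1:
            moves.append((a, b))                 # base case: printf("%c -> %c\n", from, to)
        elif pc == 0:
            stack.append((n, a, b, c, 1))        # resume this frame at pc = 1 later
            stack.append((n - 1, a, c, b, 0))    # hanoi(n - 1, from, via, to)
        elif pc == 1:
            stack.append((n, a, b, c, 2))        # resume this frame at pc = 2 later
            stack.append((1, a, b, c, 0))        # hanoi(1, from, to, via)
        else:
            stack.append((n - 1, c, b, a, 0))    # hanoi(n - 1, via, to, from): last call, frame done
    return moves

for a, b in hanoi_iterative(3, "A", "C", "B"):
    print(f"{a} -> {b}")
```

Pushing the resumed frame before the callee makes the callee run first, which reproduces the order of the recursive calls in the C code.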

End.