aria.rip

Mechanisation of WebAssembly 1.0 (in Agda)

module Essay where
    

This essay formalises parts of the WebAssembly 1.0 standard, a safe and portable bytecode format.

WebAssembly (or Wasm) is a stack-machine based language that uses structured control flow. It was designed to allow certain memory safety guarantees about a program to be checked before the program runs, so that checks don’t slow down the program at runtime.

We formalise a limited subset of WebAssembly that demonstrates its most important aspects. Specifically, we exclude:

Many of these features, such as floating point operations, would be formalised almost identically to features we have implemented. For the others, there are some notes on how these could be implemented at the end.

We provide an instruction type that enforces the validation requirements described in section 3 of the specification, hence formalising the validation step. We then formalise the reduction rules found in section 4 of the specification, and prove that they preserve types.

We take inspiration from Watt, 2021 in how we formalise instructions and state configurations.

I originally wrote this essay for Philip Wadler’s TSPL class. I did a little extra work on it after, but am probably not going to finish it anytime soon, so hopefully it will be useful to other people.

Language

Imports:

module Language where
      open import Agda.Builtin.Nat
      open import Agda.Builtin.List
      open import Agda.Builtin.Bool

      infix 4 _⊢_⟩_
      infix 4 _⊢_⟫_
      infixr 4 _then_

      infix 5 _∋_

      infixl 6 _,_
      infixl 6 _+ˡ_
      infixl 6 _+ʳ_
    

Conventional assembly (such as MIPS or x86) uses a number of registers to store working values, with operations such as add working on those registers. Local variables are then stored somewhere in RAM, usually on the stack.

By contrast, WebAssembly (abbreviated to Wasm) has no registers. Instructions pop their non-constant arguments off of an operand stack, and push (return) their result back onto it. Instructions can only act on the top elements of the stack, therefore local variables are stored off of the operand stack and retrieved/set as needed.

For example, the following program calculates 2+2 and leaves the result on the operand stack:

           Stack: []
  const 2      -> [2]
  const 2      -> [2, 2]
  add          -> [4]

Control flow is structured, meaning you cannot jump to arbitrary locations. Instructions such as block and loop nest instruction sequences recursively. The inner instruction sequence acts on an empty stack, and optionally has a return type that it must have at the top of the stack when it is done.

br and br-if instructions can jump to any of the control structures they are inside. If the target of the jump is a block, execution skips to after that block, and if it is a loop, then back to the start.

In the following program, we use br to skip over some instructions. In this case, the block has a return value (Number), which must be at the top of the stack when we return or finish.

             → []
  const 2    → [2]
  block      → []  Saved: [2]
    const 1  → [1]
    br 0       -------\
    const 1           |
    add               ↓
  end          [2, 1] Restore: [2] and push return value (1)
  add        → [3]

loop is similar, however br jumps back to the start of the loop, and only reaching the end of the inner instruction sequence ends it. The following program loops forever.

  loop   ↓↖
    br 0 ↘↗
  end

As mentioned above, locals are stored outside of the stack, and can be retrieved and set from the stack. The following program makes use of a single local to loop 3 times, then exit.

  const 3
  local.set 0
  loop
    local.get 0
    const 1
    sub
    local.set 0

    local.get 0
    br-if 0
  end

When the program starts, our stack is empty and we have one local with value 0. We push 3 to the stack and execute local.set 0, which sets the local with index 0 to the value at the top of the stack. We then enter the loop, where we copy the local back to the stack, subtract 1 and put it back. We then retrieve it once again, and if it is > 0 we go back to the start of our loop. Otherwise, we exit the loop and the program is finished.

Like labels, the amount of locals and their types are static, allowing us to check instructions involving locals are valid before execution.

The restrictions on control flow and variables mean that we are able to know the shape of the current function call’s stack throughout. Therefore, we are able to verify before running the program that:

In the specification, checking that a program satisfies these constraints is referred to as the validation phase (section 3). We will bake these constraints into our instruction types, so that any program that type checks must be valid.

Stacks

There are several different lists/stacks that we want to express the types of. Therefore, we will define a generic way to store them, and a lookup judgement to target a specific element.

  data Stack (T : Set) : Set where
           : Stack T
        _,_ : Stack T  T  Stack T

      data _∋_ {T : Set} : Stack T  T  Set where
        Z :  {Γ A}
            -----------
             Γ , A  A

        S :  {Γ A B}
             Γ  A
            -----------
             Γ , B  A
    

Values

The type of a single value in Wasm can be an integer or float, of either 32 or 64 bits. As mentioned above, we have excluded floating point instructions, so we will only have integers. We will also use arbitrary sized signed integers, for brevity.

  data ValueTy : Set where
        Number : ValueTy
    

The type of the operand stack is just a list of types that are on the stack.

  OpStackTy : Set
      OpStackTy = Stack ValueTy
    

As noted above, Wasm uses structured control flow. When we br or finish with a block, we might expect that block to return something. We make a simple type to say if we need a return value or not.

  data ReturnTy : Set where
        some : ValueTy  ReturnTy
        none : ReturnTy
    

We also define a function to add a return type to the stack type. For none, nothing will be added to the stack, whereas for some we will add the type in question.

  _+ʳ_ : OpStackTy  ReturnTy  OpStackTy
      s  some x = s , x
      s  none   = s
    

Labels

We now need to keep track of the labels in scope for a given instruction. It is invalid for a program to jump out of a label that doesn’t exist - a br instruction outside of our block could not target anything. Similarly, when br instructions are executed the top of the stack must have the correct return type.

We track the labels we can return to as a stack of return types, where each element corresponds to a nested block, and specifies the return type expected by that block.

  CtrlStackTy : Set
      CtrlStackTy = Stack ReturnTy
    

Locals

We will also use indices for locals, which means we can define them the same as our operand stack. The only difference is we won’t be changing the length throughout, only the values.

  LocalsTy : Set
      LocalsTy = Stack ValueTy
    

Context

We now define our context, which tells us what labels and locals are available to use.

  record Context : Set where
        field
          labels : CtrlStackTy
          locals : LocalsTy
    

We also define a convenience function for modifying labels.

  -- Add a new label to the context's labels field
      _+ˡ_ : Context  ReturnTy  Context
      _+ˡ_ Γ ret = record Γ { labels = (Context.labels Γ) , ret }
    

Instructions

There are two things which constrain what instructions we can execute: the context we are in, and the type of our operand stack. Given these two parameters, we can determine what the stack will be at the end of a given instruction.

We say our instructions are valid in a certain context (defined above), and perform a transformation from one operand stack type to another. We will use Γ throughout to refer to the context of an instruction, and Δ to the stack type. An instruction of type Γ ⊢ Δ ⟩ Δ′ is valid in Γ, requires a stack of type Δ, and changes the stack to have type Δ′.

Similarly, sequences of instructions are all valid in a certain context, and may change the stack type. These are mutually recursive, as we will need instruction sequences for instructions like block and loop.

  data _⊢_⟫_ : Context  OpStackTy  OpStackTy  Set
      data _⊢_⟩_ : Context  OpStackTy  OpStackTy  Set
    

Instruction Sequences

Sequences of instructions will all have the same context (as in the specification), and each instruction must leave the stack in the correct state for the next one to execute.

  data _⊢_⟫_ where
        done   :  {Γ Δ}
                  Γ  Δ  Δ

        _then_ :  {Γ A B C}
                  Γ  A  B
                  Γ  B  C
                 -----------
                  Γ  A  C
    

The definitions in the specification all pretend the stack is empty other than the values each instruction acts on. Instruction sequences are then defined so that instructions can ignore the rest of the stack. We instead make all instructions over the bottom of the stack, which is equivalent and slightly simpler.

Parametric Instructions (spec section 3.3.2)

We start by defining some simple operations on the stack.

  data _⊢_⟩_ where
        -- Do nothing
        nop :  {Γ Δ}
                ---------
               Γ  Δ  Δ

        -- Discard the value at the top of the stack
        drop :  {Γ Δ T}
                 -------------
                Γ  Δ , T  Δ

        -- Return the 2nd element of the stack if the top element is 1, otherwise return the 3rd to top.
        select :  {Γ Δ T}
                   ----------------------------------
                  Γ  (Δ , T , T , Number)  (Δ , T)
    

These are all valid in any context, and the stack below the used argument can take any shape.

Numeric Instructions (spec section 3.3.1)

We group numeric operations on numbers by the number of arguments, and allow an agda function on Nat to be passed in. We could instead specify an instruction for each operation, but this would make the code much longer and more repetitive.

    -- Push a constant to a stack
        const :  {Γ Δ}
                 Nat
                  ------------------
                 Γ  Δ  Δ , Number

        -- Consume one operand, returning another number; e.g. clz, ctz, popcnt
        op-unary :  {Γ Δ}
                    (Nat  Nat)
                     ---------------------------
                    Γ  Δ , Number  Δ , Number

        -- Consume two operands, returning another number; e.g. add, sub, mul, ...
        op-binary :  {Γ Δ}
                    (Nat  Nat  Nat)
                     ---------------------------
                    Γ  Δ , Number , Number  Δ , Number

        -- Consume one operand, and return a boolean (represented as a number in Wasm); e.g. eqz
        op-test :  {Γ Δ}
                   (Nat  Bool)
                    ---------------------------
                   Γ  Δ , Number  Δ , Number

        -- Consume two operands, and return a boolean; e.g. lt, lte, gt, gte
        op-rel :  {Γ Δ}
                  (Nat  Nat  Bool)
                   ---------------------------
                  Γ  Δ , Number , Number  Δ , Number
    

A possible improvement would be to have f in op-test and op-rel return Dec T, representing a proof/negation of whatever test or relation is being examined.

Control Instructions (spec section 3.3.5)

These allow us to add and use labels.

The simplest control flow is the block, which adds a new label that we can branch to the end of if needed. The instruction (or sequence of instructions) inside starts with an empty stack, and should finish with only the return value on the stack (if there is one).

    -- Start a block, where `br` instructions inside the block can jump to the end of it
        block :  {Γ Δ r}
                 (Γ  r)      r
                  ---------------------
                 Γ  Δ  Δ  r
    

if is slightly more complex, as we need something at the top of the stack, and two instruction sequences that return the same result type.

    -- If the top of the stack is > 0, execute the first branch. Otherwise, execute the 2nd
        if_else_ :  {Γ Δ r}
                    (Γ  r)      r
                    (Γ  r)      r
                     -----------------------
                    Γ  Δ , Number  Δ  r
    

loop is different - when we branch out of it, we jump to the start instead. Since we’re jumping to the start, we don’t expect anything to be pushed to the stack when we do branch (since the inner instruction sequence starts with an empty stack). We can still return something out of the loop though, which will happen when we reach the end of the instructions without branching.

    -- Start a loop, where `br` instructions inside it can jump back to the start
        loop :  {Γ Δ r}
                 Γ  none      r
                  ------------------------
                 Γ  Δ  Δ  r
    

We now define our branching operators, starting with unconditional branching to any valid label. Note that the stack after this is unconstrained - instructions after br are unreachable so it doesn’t matter.

    -- Jump to the given label unconditionally
        br :  {Γ Δ Δ′ t}
              (Context.labels Γ)  t
               ----------------------
              Γ  Δ  t  Δ′
    

Conditional branching branches only if the top element on the stack is > 0. Since we may continue after this, we constrain the stack so that we just get rid of the condition and the rest remains the same.

    -- If the top element is > 0, jump to the given label
        br-if :  {Γ Δ t}
                 (Context.labels Γ)  t
                  ------------------------------
                 Γ  (Δ  t) , Number  Δ  t
    

Variable Instructions (spec section 3.3.3)

Finally, we need to be able to get and set our local variables. We use lookup judgements to specify what local we’re acting on and its type.

    -- Get the current value of the given local
        local-get :  {Γ Δ t}
                     (Context.locals Γ)  t
                      ----------------------
                     Γ  Δ  Δ , t

        -- Set a new value for the current local from the top of the stack
        local-set :  {Γ Δ t}
                     (Context.locals Γ)  t
                     Γ  Δ , t  Δ

        -- Set a new value for the current local from the top of the stack, and keep that value at the top.
        local-tee :  {Γ Δ t}
                     (Context.locals Γ)  t
                     Γ  Δ , t  Δ , t
    

Writing programs

Now that we have all of our instructions, we can write some example programs.

First we’ll write some aliases for our op-binary instruction.

  add :  {Γ Δ}  Γ  Δ , Number , Number  Δ , Number
      add = op-binary λ{ x y  x + y}

      sub :  {Γ Δ}  Γ  Δ , Number , Number  Δ , Number
      sub = op-binary λ{ x y  x - y}
    

Here’s an example program that calculates 2+2.

  ∅-ctx : Context
      ∅-ctx = record { labels = ; locals =  }

      test-2+2 : ∅-ctx     , Number
      test-2+2 = const 2 then
                 const 2 then
                 add then done
    

A simple if statement:

  test-if : ∅-ctx   , Number   , Number
      test-if = -- Returns 42 if the stack started with a 1, 84 otherwise
                if (
                  const 42 then done
                ) else (
                  const 84 then done
                ) then done
    

An infinite loop:

  test-infloop : ∅-ctx    
      test-infloop = loop (br Z then done) then done
    

And the program from above that loops 3 times:

  test-loop3 : record {labels = ; locals =  , Number}    
      test-loop3 = const 3 then
                   local-set Z then
                   loop (
                       local-get Z then
                       const 1 then
                       sub then
                       local-tee Z then -- equivalent to local-set Z then local-get Z
                       br-if Z then done
                   ) then done
    

It is also impossible to create programs that would fail to execute, for instance the below examples cannot be constructed:

  _ : ∅-ctx ⊢ ∅ ⟫ ∅ , Number
  _ = const 2 then
      add -- Needs two arguments, we only have one!
      then done

  -- _Δ_381 , Number != ∅ of type Stack ValueTy
  -- when checking that the expression add has type
  -- ∅-ctx ⊢ ∅ , Number ⟩ _B_378
  _ : ∅-ctx ⊢ ∅ ⟫ ∅ , Number
  _ = const 42 then
      -- This block must return a number for us to add...
      block (const 1 then
             br-if Z -- ...but it doesnt!
             then done) then
      add then done

  -- Failed to solve the following constraints:
  --   ∅ , Number +ʳ _r_382 = ∅ , Number , Number : Stack ValueTy
  --     (blocked on _r_382)
  --   _Δ_394 +ʳ _r_382 = ∅ +ʳ _r_382 : Stack ValueTy
  --     (blocked on any(_r_382, _Δ_394))
  --   _Δ_394 +ʳ _r_382 = ∅ : Stack ValueTy (blocked on _r_382)

Evaluation

Imports:

module Evaluation where
      open import Agda.Builtin.Nat
      open import Agda.Builtin.List
      open import Agda.Builtin.Bool
      open import Data.Bool.Properties using (_≟_)
      open import Agda.Builtin.Equality
      open import Relation.Nullary
      open import Agda.Builtin.Sigma
        using (Σ)
        renaming (_,_ to _Σ,_)

      open Language
        using (Context; ValueTy; ReturnTy; Stack; OpStackTy; CtrlStackTy; LocalsTy; add; sub; _⊢_⟩_; _⊢_⟫_; _∋_; _+ʳ_; _+ˡ_)

      open ValueTy
      open Stack
      open ReturnTy
      open _⊢_⟩_
      open _⊢_⟫_
      open _∋_

      infix  1 begin_
      infix  2 _—↠_
      infixr 2 _—→⟨_⟩_
      infix  3 _∎
      infixl 4 _—→_
      infixl 6 _,_
    

Now that we are able to write programs and know that they are valid, we want to execute them.

The main challenge here is dealing with control structures: We must ensure that we know where we go back to if we jump back to a label, and we must know that we will always have the correct return type.

To do this, we will seperate each nested instruction sequence into a ‘frame’. We’ll keep track of all the frames above us, so that we can return to them when needed. We will always require that our current list of instructions ends by returning whatever the next frame ‘up’ expects.

Whenever we enter a new control structure, we will save our current ‘frame’ to be continued later, and change to a new empty stack with the block’s inner instructions. When we finish with our current instructions, we know that we will have the correct return type on the stack, and so we will ‘ascend’ up our list of stack frames.

This differs from the method used in the specification, where both operands and labels share the same stack. Our method makes defining reduction rules slightly more difficult, however proving preservation of types is much easier.

First, we define our concrete values and stacks, both dependent on their type as defined above.

  data Value : ValueTy  Set where
        Number : Nat  Value Number
    
  data OpStack : OpStackTy  Set where
            : OpStack 

        _,_  :  {Δ : OpStackTy} {T}
                OpStack Δ
                Value T
                 ---------------
                OpStack (Δ , T)
    

We define the storage of our locals, and helpers for getting/setting them.

  data Locals : LocalsTy  Set where
         : Locals 

        _,_ :  {Δ : LocalsTy} {T}
               Locals Δ
               Value T
                ---------------
               Locals (Δ , T)

      -- Set the local targeted by the given lookup judgement
      _[_]:=_ :  {Δ T}  Locals Δ  Δ  T  Value T  Locals Δ
      (ls , _) [ Z ]:= v = ls , v
      (ls , x) [ S l ]:= v = (ls [ l ]:= v) , x

      -- Get the local targeted by the given lookup judgement
      _[_] :  {Δ T}  Locals Δ  Δ  T  Value T
      (ls , v) [ Z ] = v
      (ls , _) [ S l ] = ls [ l ]
    

We then define our list of previous stack frames, indexed by the type that we expect the current block to return, and our control stack.

Note that the type the current block ends with may not be the same as the top of the control stack. In the case of loops, a br instruction jumps back to the start, and returns/retains none of the stack. However, loop blocks may still return a value once they have executed (reached the end without calling br). Therefore, these need to be separate.

When a program starts, we have no previous frames or labels we can jump to:

  data Frames : ReturnTy  Context  Set where
         :  {r ls}
              -----------------------------------------
             Frames r record {labels = ; locals = ls}
    

A notable limitation here is that programs can end with at most one value on the stack, as we will require they end with the return type indicated here.

We will use this constructor when we enter a block with a return type:

    _,_⦂_ :  {Γ Δ r r′}
                  Frames r Γ
                  Γ  Δ  r′    r
                  OpStack Δ
                   -----------------------------
                  Frames r′ (Γ  r′)
    

Here, r is the return type of the block which we are saving, and r′ is the return type we need before we can return to it. We store the ‘continuation’, which gets the return value, and will finish with whatever its previous frame expects. We also save the current state of the operand stack.

Loop constructs must also store the inner body of the loop, since when we br to this label we will go back to the start of the inner body. When we finish the body normally, we will continue on with the second instruction list. Note that we need the next frame to return some r, but we add the label none, as we mentioned above.

    _,_↬_⦂_ :  {Γ Δ r r′}
               Frames r Γ
               Γ  none      r′ -- Loop body
               Γ  Δ  r′    r    -- Continuation
               OpStack Δ
                ---------------------------
               Frames r′ (Γ  none)
    

Now we define the current state of execution, referred to as a Configuration as in the specification. This includes the current instruction sequence, its operand stack, and our locals.

  data Configuration : Set where
        conf :  {Γ Δ r}
                Γ  Δ    r
                OpStack Δ
                Locals (Context.locals Γ)
                Frames r Γ
                 -------------------------
                Configuration
    

The last thing we need before we can define our reduction rules is the ability to go back up the stack. This will be used when we call br, and must ascend the amount of frames specified, resuming that frame with the given return value and the new locals.

  transplant-ret :  {Δ Δ′ r}  OpStack (Δ  r)  OpStack Δ′  OpStack (Δ′  r)
      transplant-ret {r = some _} (xs , x) m = m , x
      transplant-ret {r = none} s m = m

      br-unwind :  {Γ Δ r v}
                   (Context.labels Γ)  v -- Label/frame to unwind up to
                   OpStack (Δ  v)
                   Locals (Context.locals Γ)
                   Frames r Γ
                    -------------------------
                   Configuration

      -- Base cases - resume the top frame
      br-unwind Z s ls (ss , is  s′) = conf is (transplant-ret s s′) ls ss

      -- For loops - go back to the start and don't remove the frame
      br-unwind Z _ ls ss@(_ , start  cont  s) = conf start  ls ss

      -- Recursive cases
      br-unwind (S l) s ls (ss , _  _) = br-unwind l s ls ss
      br-unwind (S l) s ls (ss , _  _  _) = br-unwind l s ls ss
    

Note that r (the value the next frame up execpts) and v (the value we are returning) are distinct - we may be jumping over our most recent block or may be returning to the top of a loop.

Now, we define our small-step reduction rules, which take one configuration to the other.

  data _—→_ : Configuration  Configuration  Set where
    

Our parametric instructions are easy enough, however we do have to specify a lot of implicit arguments to make Agda happy.

    nop :  {Γ Δ r is s ls ss}
               ---------------------------------------
               conf {Γ} {Δ} {r} (nop then is) s ls ss
                  —→ conf is s ls ss

        drop :  {Γ Δ r is s ls ss X}
                 {x : Value X}
                 -------------------------------------------------
                conf {Γ} {Δ , X} {r} (drop then is) (s , x) ls ss
                   —→ conf is s ls ss

        select-0 :  {Γ Δ r s ls ss T}
                     {a b : Value T}
                     {is : Γ  Δ , T  _}
                     -----------------------------------------------------
                    conf {Γ} {Δ , T , T , Number} {r}
                          (select then is) (s , b , a , Number zero) ls ss
                       —→ conf is (s , a) ls ss

        select-1 :  {Γ Δ r s ls ss T c}
                     {a b : Value T}
                     {is : Γ  Δ , T  _}
                     --------------------------------------------------------
                    conf {Γ} {Δ , T , T , Number} {r}
                          (select then is) (s , b , a , Number (suc c)) ls ss
                       —→ conf is (s , b) ls ss
    

Our numeric instructions simply apply their functions and put the return values on the stack.

Similar to select, we have two cases for op-test and op-rel, which must be chosen based on the output of f x.

    const :  {Γ Δ r is s ls ss x}
                  ------------------------------------------
                 conf {Γ} {Δ} {r} (const x then is) s ls ss
                    —→ conf is (s , Number x) ls ss

        op-unary :  {Γ Δ r is s ls ss f x}
                     -------------------------------------------------------------------
                    conf {Γ} {Δ , Number} {r} (op-unary f then is) (s , Number x) ls ss
                       —→ conf is (s , Number (f x)) ls ss

        op-binary :  {Γ Δ r is s ls ss f x y}
                     ----------------------------------------------------------------------------------------
                    conf {Γ} {Δ , Number , Number} {r} (op-binary f then is) (s , Number x , Number y) ls ss
                       —→ conf is (s , Number (f x y)) ls ss

        op-testᵀ :  {Γ Δ r is s ls ss f x}
                    f x  true
                     ------------------------------------------------------------------
                    conf {Γ} {Δ , Number} {r} (op-test f then is) (s , Number x) ls ss
                       —→ conf is (s , Number 1) ls ss

        op-testꟳ :  {Γ Δ r is s ls ss f x}
                    ¬ (f x  true)
                     ------------------------------------------------------------------
                    conf {Γ} {Δ , Number} {r} (op-test f then is) (s , Number x) ls ss
                       —→ conf is (s , Number 0) ls ss

        op-relᵀ :  {Γ Δ r is s ls ss f x y}
                    f x y  true
                     -------------------------------------------------------------------------------------
                    conf {Γ} {Δ , Number , Number} {r} (op-rel f then is) (s , Number x , Number y) ls ss
                       —→ conf is (s , Number 1) ls ss

        op-relꟳ :  {Γ Δ r is s ls ss f x y}
                    ¬ (f x y  true)
                     -------------------------------------------------------------------------------------
                    conf {Γ} {Δ , Number , Number} {r} (op-rel f then is) (s , Number x , Number y) ls ss
                       —→ conf is (s , Number 0) ls ss
    

As described above, when we enter a block we save the current state in our list of previous frames, and start with an empty stack. We know that the inner instruction sequence will have the correct end state because of how we define our block instruction, but Agda needs us to spell it out anyway.

    block :  {Γ Δ r r′ s ls ss}
                  {is : Γ  Δ  r′    r}
                  {is′ : Γ  r′      r′}
                  -------------------------------------------
                 conf {Γ} {Δ} {r} (block is′ then is) s ls ss
                      —→ conf is′  ls (ss , is  s)

        loop  :  {Γ Δ r r′ s ls ss}
                  {is : Γ  Δ  r′    r}
                  {is′ : Γ  none      r′}
                  ------------------------------------------------------------
                 conf (loop is′ then is) s ls ss —→ conf is′  ls (ss , is′  is  s)

        ifᵀ :  {Γ Δ n r r′ s ls ss}
                {is : Γ  Δ  r′    r}
                {ts : Γ  r′      r′}
                {es : Γ  r′      r′}
                ----------------------------------------------------------------------------
               conf {Γ} {Δ , Number} {r} (if ts else es then is) (s , Number (suc n)) ls ss
                    —→ conf ts  ls (ss , is  s)

        ifꟳ :  {Γ Δ r r′ s ls ss}
                {is : Γ  Δ  r′    r}
                {ts : Γ  r′      r′}
                {es : Γ  r′      r′}
                -------------------------------------------------------------------------
               conf {Γ} {Δ , Number} {r} (if ts else es then is) (s , Number zero) ls ss
                    —→ conf es  ls (ss , is  s)
    

br uses the br-unwind function defined above.

    br    :  {Γ Δ Δ′ r ls ss v}
                  {lbl : (Context.labels Γ)  v}
                  {s : OpStack (Δ  v)}
                  {is : Γ  Δ′    r}
                  ------------------------------------------------------------------
                 conf {Γ} {Δ  v} {r} (br lbl then is) s ls ss
                       —→ (br-unwind lbl s ls ss)
    

br-if is similar, again with two cases for zero and non-zero.

    br-ifᵀ :  {Γ Δ r ls ss v x}
                   {lbl : (Context.labels Γ)  v}
                   {s : OpStack (Δ  v)}
                   {is : Γ  Δ  v    r}
                   -------------------------------------------------------------------------------
                  conf {Γ} {(Δ  v) , Number} {r} (br-if lbl then is) (s , Number (suc x)) ls ss
                       —→ (br-unwind lbl s ls ss)

        br-ifꟳ :  {Γ Δ r ls ss v}
                   {lbl : (Context.labels Γ)  v}
                   {s : OpStack (Δ  v)}
                   {is : Γ  Δ  v    r}
                   ----------------------------------------------------------------------------
                  conf {Γ} {(Δ  v) , Number} {r} (br-if lbl then is) (s , Number zero) ls ss
                       —→ conf is s ls ss
    

Now we define our local instructions, which only affect the locals part of our configuration.

    local-get :  {Γ Δ r T is s ls ss}
                      {l : (Context.locals Γ)  T}
                     conf {Γ} {Δ} {r} (local-get l then is) s ls ss
                          —→ conf is (s , ls [ l ]) ls ss

        local-set :  {Γ Δ r T is s ls ss}
                      {l : (Context.locals Γ)  T}
                      {v : Value T}
                     conf {Γ} {Δ , T} {r} (local-set l then is) (s , v) ls ss
                          —→ conf is s (ls [ l ]:= v) ss

        local-tee :  {Γ Δ r T is s ls ss}
                      {l : (Context.locals Γ)  T}
                      {v : Value T}
                     conf {Γ} {Δ , T} {r} (local-tee l then is) (s , v) ls ss
                          —→ conf is (s , v) (ls [ l ]:= v) ss
    

Finally, we need to be able to go back up to our previous frames once we’ve completed our current list of instructions.

    ascend :  {Γ r ls Δ′ r′ cont s s′ ss′}
                   --------------------------------------------------------------------
                  conf {Γ  r} {  r} {r} done s ls (ss′ , cont  s′)
                       —→ conf {Γ} {Δ′  r} {r′} cont (transplant-ret s s′) ls ss′

        ascendˡ :  {Γ r ls Δ′ r′ s s′ ss′ start cont}
                    --------------------------------------------------------------------------
                   conf {Γ  none} {  r} {r} done s ls (ss′ , start  cont  s′)
                        —→ conf {Γ} {Δ′  r} {r′} cont (transplant-ret s s′) ls ss′
    

Progress

Now that we have our reduction rules, we can prove progress: Either all of our instructions are finished, or we can reduce further. Proving progress is relatively straightforward, since we mostly only care about what instruction is up next.

  data Progress : Configuration  Set where
         done :  {r lsty}
                  {ls : Locals lsty}
                  {s : OpStack (  r)}
                  --------------------------
                 Progress (conf done s ls ( {r}))

         step :  {C C′}
                 C —→ C′
                  ----------
                 Progress C

      progress :  (c : Configuration)
                   -------------------
                  Progress c
    

Here are our parametric instructions, with different cases for the branch we take when selecting. To make things clearer, we will use _ for all the parts of the configuration that aren’t modified by the reduction step.

  progress (conf (nop then is) _ _ _) = step nop
      progress (conf (drop then is) (_ , v) _ _) = step drop
      progress (conf (select then is) (_ , _ , a , Number zero) _ _) = step select-0
      progress (conf (select then is) (_ , b , _ , Number (suc _)) _ _) = step select-1
    
  progress (conf (local-get i then _) _ ls _) = step local-get
      progress (conf (local-set i then _) (_ , v) ls _) = step local-set
      progress (conf (local-tee i then _) (_ , v) ls _) = step local-tee
    

For op-test and op-rel we must decide if f x is true ourselves.

  progress (conf (const x then is) s _ _) = step const
      progress (conf (op-unary f then is) (_ , Number n) _ _) = step op-unary
      progress (conf (op-binary f then is) (_ , Number x , Number y) _ _) = step op-binary
      progress (conf (op-test f then is) (_ , Number x) _ _) with (f x)  true
      ... | (yes pass) = step (op-testᵀ pass)
      ... | (no fail) = step (op-testꟳ fail)
      progress (conf (op-rel f then is) (_ , Number x , Number y) _ _) with (f x y)  true
      ... | (yes pass) = step (op-relᵀ pass)
      ... | (no fail) = step (op-relꟳ fail)
    

We may deliberately ascend up the frame stack:

  progress (conf (block is′ then is) s _ ss) = step block
      progress (conf (loop is′ then is) s _ ss) = step loop

      progress (conf (if_else_ ts _ then is) (_ , Number (suc n)) _ ss) = step ifᵀ
      progress (conf (if_else_ _ fs then is) (_ , Number zero) _ ss) = step ifꟳ

      progress (conf (br lbl then is) s _ ss) = step br
      progress (conf (br-if lbl then is) (_ , Number (suc _)) _ ss) = step br-ifᵀ
      progress (conf (br-if lbl then is) (_ , Number zero) _ ss) = step br-ifꟳ
    

Finally, if we run out of instructions in the current frame we must try to use the next one up. Again, we know that we will have the return type when we need it thanks to the constraints in Configuration.

  progress (conf done _ _ (ss′ , is′  s′)) = step ascend
      progress (conf done _ _ (ss′ , ls  is′  s′)) = step ascendˡ
    

We are only done when all instructions are executed, and we can’t go up any more stack frames.

  progress (conf done s _ ) = done
    

Thanks to the constraints on our instruction type and configuration types, we have also proved the type system’s soundness: Types are respected throughout the whole run of our program, and the stack will always be transformed as expected.

Evaluation

We now adapt our proof of progress to be used as an evaluator.

First, we define big-step reductions as the reflexive and transitive closure of reductions, with some syntax to make writing them nicer.

  data _—↠_ : Configuration  Configuration  Set where

        _∎ : (c : Configuration)
             -------------------
            c —↠ c

        step—→ :  {B C}
                  (A : Configuration)
                  B —↠ C
                  A —→ B
                   -------
                  A —↠ C

      pattern _—→⟨_⟩_ A A—→B B—↠C = step—→ A B—↠C A—→B

      begin_ :  {A B}
                A —↠ B
                 ------
                A —↠ B
      begin_ A—↠B = A—↠B
    

This allows us to write chains of reductions, similar to an execution trace. For example, here is how the 2+2 program from earlier might be executed.

  _ : conf Language.test-2+2    —↠ conf done ( , Number 4)  
      _ = begin
            conf (const 2 then
                  const 2 then
                  add then done)   ( {some Number})
          —→⟨ const 
            conf (const 2 then
                  add then done) ( , Number 2)  
          —→⟨ const 
            conf (add then done) ( , Number 2 , Number 2)  
          —→⟨ op-binary 
            conf done ( , Number 4)  
          
    

We run our evaluator with a set amount of gas, which keeps Agda’s termination checker happy.

  record Gas : Set where
        constructor gas
        field
          amount : Nat

      data Finished (c : Configuration) : Set where
         done :  {r s′ LS ls′}
            c —↠ conf {record {labels = ; locals = LS}} {  r} {r} done s′ ls′ 
             ----------
            Finished c

         out-of-gas :
             ----------
             Finished c

      eval : Gas  (C : Configuration)  Finished C
      eval (gas zero) C   = out-of-gas
      eval (gas (suc n)) C with progress C
      ... | done          = done (C )
      ... | step {_} {C′} C→C′ with eval (gas n) C′
      ...    | done C′↠D  = done (C —→⟨ C→C′  C′↠D)
      ...    | out-of-gas = out-of-gas
    

Now we can evaluate the programs we defined earlier.

  -- 2 + 2 = 4
      _ : eval (gas 100) (conf Language.test-2+2   )  done (
              conf (const 2 then const 2 then op-binary  x y  x + y) then done)   
            —→⟨ const 
              conf (const 2 then op-binary  x y  x + y) then done) ( , Number 2)  
            —→⟨ const 
              conf (op-binary  x y  x + y) then done) ( , Number 2 , Number 2)  
            —→⟨ op-binary 
              conf done ( , Number 4)  
            )
      _ = refl

      -- if false then 42 else 84 -> 84
      _ : eval (gas 100) (conf Language.test-if ( , Number 0)  )  done (
              conf ((if const 42 then done else (const 84 then done)) then done)
                   ( , Number 0)
                   
                   
            —→⟨ ifꟳ 
              conf (const 84 then done)
                   
                   
                   ( , done  )
            —→⟨ const 
              conf done
                   ( , Number 84)
                   
                   ( , done  )
            —→⟨ ascend 
              conf done
                   ( , Number 84)
                   
                   
            )
      _ = refl

      -- if true then 42 else 84 -> 42
      _ : eval (gas 100) (conf Language.test-if ( , Number 1)  )  done (
              conf ((if const 42 then done else (const 84 then done)) then done)
                   ( , Number 1)
                   
                   
            —→⟨ ifᵀ 
              conf (const 42 then done)
                   
                   
                   ( , done  )
            —→⟨ const 
              conf done
                   ( , Number 42)
                   
                   ( , done  )
            —→⟨ ascend 
              conf done
                   ( , Number 42)
                   
                   
            )
      _ = refl

      -- infinite loop will never finish
      _ : eval (gas 100) (conf Language.test-infloop   )  out-of-gas
      _ = refl

      -- loop 3 times
      _ : eval (gas 100) (conf Language.test-loop3  ( , Number 0) )  done (
              conf (const 3 then local-set Z then loop (local-get Z then const 1 then op-binary  x y  x - y) then local-tee Z then br-if Z then done) then done)
                   
                   ( , Number 0)
                   
            —→⟨ const 
              conf
              (local-set Z then
              loop
              (local-get Z then
              const 1 then
              op-binary  x y  x - y) then local-tee Z then br-if Z then done)
              then done)
              ( , Number 3) ( , Number 0) 
            —→⟨ local-set 
              conf
              (loop
              (local-get Z then
              const 1 then
              op-binary  x y  x - y) then local-tee Z then br-if Z then done)
              then done)
               ( , Number 3) 
            —→⟨ loop  -- Loop 1
              conf
              (local-get Z then
              const 1 then
              op-binary  x y  x - y) then local-tee Z then br-if Z then done)
               ( , Number 3)
              ( ,
              local-get Z then
              const 1 then
              op-binary  x y  x - y) then local-tee Z then br-if Z then done
               done  )
            —→⟨ local-get 
              conf
              (const 1 then
              op-binary  x y  x - y) then local-tee Z then br-if Z then done)
              ( , Number 3) ( , Number 3)
              ( ,
              local-get Z then
              const 1 then
              op-binary  x y  x - y) then local-tee Z then br-if Z then done
               done  )
            —→⟨ const 
              conf
              (op-binary  x y  x - y) then local-tee Z then br-if Z then done)
              ( , Number 3 , Number 1) ( , Number 3)
              ( ,
              local-get Z then
              const 1 then
              op-binary  x y  x - y) then local-tee Z then br-if Z then done
               done  )
            —→⟨ op-binary 
              conf (local-tee Z then br-if Z then done) ( , Number 2)
              ( , Number 3)
              ( ,
              local-get Z then
              const 1 then
              op-binary  x y  x - y) then local-tee Z then br-if Z then done
               done  )
            —→⟨ local-tee 
              conf (br-if Z then done) ( , Number 2) ( , Number 2)
              ( ,
              local-get Z then
              const 1 then
              op-binary  x y  x - y) then local-tee Z then br-if Z then done
               done  )
            —→⟨ br-ifᵀ  -- Loop 2
              conf
              (local-get Z then
              const 1 then
              op-binary  x y  x - y) then local-tee Z then br-if Z then done)
               ( , Number 2)
              ( ,
              local-get Z then
              const 1 then
              op-binary  x y  x - y) then local-tee Z then br-if Z then done
               done  )
            —→⟨ local-get 
              conf
              (const 1 then
              op-binary  x y  x - y) then local-tee Z then br-if Z then done)
              ( , Number 2) ( , Number 2)
              ( ,
              local-get Z then
              const 1 then
              op-binary  x y  x - y) then local-tee Z then br-if Z then done
               done  )
            —→⟨ const 
              conf
              (op-binary  x y  x - y) then local-tee Z then br-if Z then done)
              ( , Number 2 , Number 1) ( , Number 2)
              ( ,
              local-get Z then
              const 1 then
              op-binary  x y  x - y) then local-tee Z then br-if Z then done
               done  )
            —→⟨ op-binary 
              conf (local-tee Z then br-if Z then done) ( , Number 1)
              ( , Number 2)
              ( ,
              local-get Z then
              const 1 then
              op-binary  x y  x - y) then local-tee Z then br-if Z then done
               done  )
            —→⟨ local-tee 
              conf (br-if Z then done) ( , Number 1) ( , Number 1)
              ( ,
              local-get Z then
              const 1 then
              op-binary  x y  x - y) then local-tee Z then br-if Z then done
               done  )
            —→⟨ br-ifᵀ  -- Loop 3
              conf
              (local-get Z then
              const 1 then
              op-binary  x y  x - y) then local-tee Z then br-if Z then done)
               ( , Number 1)
              ( ,
              local-get Z then
              const 1 then
              op-binary  x y  x - y) then local-tee Z then br-if Z then done
               done  )
            —→⟨ local-get 
              conf
              (const 1 then
              op-binary  x y  x - y) then local-tee Z then br-if Z then done)
              ( , Number 1) ( , Number 1)
              ( ,
              local-get Z then
              const 1 then
              op-binary  x y  x - y) then local-tee Z then br-if Z then done
               done  )
            —→⟨ const 
              conf
              (op-binary  x y  x - y) then local-tee Z then br-if Z then done)
              ( , Number 1 , Number 1) ( , Number 1)
              ( ,
              local-get Z then
              const 1 then
              op-binary  x y  x - y) then local-tee Z then br-if Z then done
               done  )
            —→⟨ op-binary 
              conf (local-tee Z then br-if Z then done) ( , Number 0)
              ( , Number 1)
              ( ,
              local-get Z then
              const 1 then
              op-binary  x y  x - y) then local-tee Z then br-if Z then done
               done  )
            —→⟨ local-tee 
              conf (br-if Z then done) ( , Number 0) ( , Number 0)
              ( ,
              local-get Z then
              const 1 then
              op-binary  x y  x - y) then local-tee Z then br-if Z then done
               done  )
            —→⟨ br-ifꟳ 
              conf done  ( , Number 0)
              ( ,
              local-get Z then
              const 1 then
              op-binary  x y  x - y) then local-tee Z then br-if Z then done
               done  )
            —→⟨ ascendˡ 
              conf done  ( , Number 0) 
            )
      _ = refl
    

As a final test, here is a program for calculating the fibonacci sequence.

  fib : record {labels = ; locals =  , Number , Number , Number}   , Number   , Number
      fib = local-set (S (S Z)) then -- n
            const 0 then
            local-set (S Z) then -- previous
            const 1 then
            local-set Z then -- acc
            loop (
              -- load previous and acc
              local-get (S Z) then
              local-get Z then

              -- previous = acc
              local-tee (S Z) then

              -- acc = acc + previous
              add then
              local-set Z then

              -- subtract from loop variable
              local-get (S (S Z)) then
              const 1 then
              sub then
              local-tee (S (S Z)) then
              br-if Z then done
            ) then
            local-get Z then done
    

We avoid giving the full reduction sequence for length reasons.

  -- 7th fibonacci number = 21
      _ : eval (gas 100) (conf fib ( , Number 7) ( , Number 0 , Number 0 , Number 0) )  done {s′ =  , Number 21} _
      _ = refl
    

Future Work

This section outlines how you could extend this base to cover the rest of Wasm 1.0. Some of the work has been started in this repository, but it’s a bit of a mess. This is more of a notes section than actual exposition.

Modules

In actual WebAssembly, all code is executed in the context of a Module (spec version 2.5). To facilitate this, we can create a seperate type (ModuleTy) and parameterise our context and configuration based on that. We’ll refer to this as μ below.

Modules also require keeping track of some state that is persistent between function calls (memories, globals, etc). This means you’d probably end up with the following things to keep track of during execution/checking:

  1. ModuleTy - what functions/globals/memories/etc are available
  2. GlobalStore - values of globals, memories, etc, in the shape specified by ModuleTy
  3. Context - types of locals, brpoints, and return type
  4. Frames - actual data for previous control flow blocks, in the shape specified by Context
  5. LocalStore - locals and other function-specific values, in the shape specified by Context
  6. OpStack - the operand stack
  7. is - remaining instructions in the current control flow block

Some of these could probably be compacted together to make things less unweildy.

In our repository, these are specified in src/Wasm/Types.agda and src/Wasm/Configuration.agda.

Function calls

Function calls can be added in a similar way to frames, by adding a new constructor to Frames, like this:

data Frames where
  _,save_⨾_⨾_ : ∀ {μ Δ ε r r′ ls} {Γ : Context μ}
                → Frames μ r Γ
                → Locals Γ
                → OpStack Δ
                → Γ ⊢ Δ +ʳ r′ ⟫ ∅ +ʳ r
                  -------------------
                → Frames r′ record {labels = ∅; locals = ls; ret = r′} ε

We also need to add a ret field to our Context, to keep track of how we can use the return instruction.

We extend our instructions like this:

data _⊢_⟩_ where
  return : ∀ {μ Δ Δ′ t} {Γ : Context μ}
           → (Context.ret Γ) ≡ just t
             -------------------
           → Γ ⊢ Δ +ʳ t ⟩ Δ′

  call : ∀ {μ Δ A L R} {Γ : Context μ}
         → (f : (ModuleTy.functions μ) ∋v record { args = A; locals = L; ret = R } )
         → (args : Δ ∋* A)
         → Γ ⊢ Δ ⟩ (discard args) +ʳ R

Where discard x removes length x elements from the top of the stack. Our return and ascend-call reduction steps will be pretty much the same as the existing br and ascend steps.

Validation & Instantiation

Validation works pretty much like the spec sets out: Mostly this is just a bit tedious and based on applying the same few checks in different ways.

Instantiation is more difficult, as we essentially need to execute with a partially initialised module when instantiating our globals. Probably it would look something like this:

  1. Infer our module type by looking at the serialised types, etc.
  2. Validate and execute the global initialisation code under a mostly-empty module as described by the spec.
  3. Validate the rest of the functions instructions.
  4. Compile this into a valid configuration, currently on the start function.
  5. Run the start function, and return both the module type and the persistent state (memories, mutable globals, etc)

There’s a vague start at this in our repository at src/Wasm/Validation.agda, using the existing haskell-wasm library and some FFI bindings in src/Wasm/Untyped.agda.

Bibliography

W3C. (2019, December 5). WebAssembly Core Specification. https://www.w3.org/TR/wasm-core-1/ (Accessed 11th Aug 2023)

Watt, C., Rao, X., Pichon-Pharabod, J., Bodin, M., Gardner, P. (2021). Two Mechanisations of WebAssembly 1.0. In: Huisman, M., Păsăreanu, C., Zhan, N. (eds) Formal Methods. FM 2021. Lecture Notes in Computer Science(), vol 13047. Springer, Cham. https://doi.org/10.1007/978-3-030-90870-6_4 (Accessed 11th Aug 2023)