Mechanisation of WebAssembly 1.0 (in Agda)
module Essay where
This essay formalises parts of the WebAssembly 1.0 standard, a safe and portable bytecode format.
WebAssembly (or Wasm) is a stack-machine based language that uses structured control flow. It was designed to allow certain memory safety guarantees about a program to be checked before the program runs, so that checks don’t slow down the program at runtime.
We formalise a limited subset of WebAssembly that demonstrates its most important aspects. Specifically, we exclude:
- Floating point types and operations
- Functions
- Globals
- Memories
- Tables
- Data segments
- Imports, Exports, and the module system
- Traps
Many of these features, such as floating point operations, would be formalised almost identically to features we have implemented. For the others, there are some notes on how these could be implemented at the end.
We provide an instruction type that enforces the validation requirements described in section 3 of the specification, hence formalising the validation step. We then formalise the reduction rules found in section 4 of the specification, and prove that they preserve types.
We take inspiration from Watt, 2021 in how we formalise instructions and state configurations.
I originally wrote this essay for Philip Wadler’s TSPL class. I did a little extra work on it after, but am probably not going to finish it anytime soon, so hopefully it will be useful to other people.
Language
Imports:
module Language where open import Agda.Builtin.Nat open import Agda.Builtin.List open import Agda.Builtin.Bool infix 4 _⊢_⟩_ infix 4 _⊢_⟫_ infixr 4 _then_ infix 5 _∋_ infixl 6 _,_ infixl 6 _+ˡ_ infixl 6 _+ʳ_
Conventional assembly (such as MIPS or x86) uses a number of
registers to store working values, with operations such as
add
working on those registers. Local variables are
then stored somewhere in RAM, usually on the stack.
By contrast, WebAssembly (abbreviated to Wasm) has no registers. Instructions pop their non-constant arguments off of an operand stack, and push (return) their result back onto it. Instructions can only act on the top elements of the stack, therefore local variables are stored off of the operand stack and retrieved/set as needed.
For example, the following program calculates 2+2 and leaves the result on the operand stack:
Stack: []
const 2 -> [2]
const 2 -> [2, 2]
add -> [4]
Control flow is structured, meaning you cannot jump to arbitrary
locations. Instructions such as block
and
loop
nest instruction sequences recursively. The inner
instruction sequence acts on an empty stack, and optionally has a
return type that it must have at the top of the stack when it is
done.
br
and br-if
instructions can jump to
any of the control structures they are inside. If the target of the
jump is a block
, execution skips to after that block,
and if it is a loop
, then back to the start.
In the following program, we use br
to skip over
some instructions. In this case, the block has a return value
(Number), which must be at the top of the stack when we return or
finish.
→ []
const 2 → [2]
block → [] Saved: [2]
const 1 → [1]
br 0 -------\
const 1 |
add ↓
end [2, 1] Restore: [2] and push return value (1)
add → [3]
loop
is similar, however br
jumps back
to the start of the loop, and only reaching the end of the inner
instruction sequence ends it. The following program loops
forever.
loop ↓↖
br 0 ↘↗
end
As mentioned above, locals are stored outside of the stack, and can be retrieved and set from the stack. The following program makes use of a single local to loop 3 times, then exit.
const 3
local.set 0
loop
local.get 0
const 1
sub
local.set 0
local.get 0
br-if 0
end
When the program starts, our stack is empty and we have one local
with value 0
. We push 3
to the stack and
execute local.set 0
, which sets the local with index
0
to the value at the top of the stack. We then enter
the loop, where we copy the local back to the stack, subtract 1 and
put it back. We then retrieve it once again, and if it is
> 0
we go back to the start of our loop. Otherwise,
we exit the loop and the program is finished.
Like labels, the amount of locals and their types are static, allowing us to check instructions involving locals are valid before execution.
The restrictions on control flow and variables mean that we are able to know the shape of the current function call’s stack throughout. Therefore, we are able to verify before running the program that:
- For every instruction, the stack must have the correct number of operands and the correct operand types at the top
- For
br
instructions, the targeted label must exist, and we must be returning the correct type - Blocks must always return the correct type (if there is one)
- Local stores and loads must always reference a local that exists, and take/return the correct type.
In the specification, checking that a program satisfies these constraints is referred to as the validation phase (section 3). We will bake these constraints into our instruction types, so that any program that type checks must be valid.
Stacks
There are several different lists/stacks that we want to express the types of. Therefore, we will define a generic way to store them, and a lookup judgement to target a specific element.
data Stack (T : Set) : Set where ∅ : Stack T _,_ : Stack T → T → Stack T data _∋_ {T : Set} : Stack T → T → Set where Z : ∀ {Γ A} ----------- → Γ , A ∋ A S : ∀ {Γ A B} → Γ ∋ A ----------- → Γ , B ∋ A
Values
The type of a single value in Wasm can be an integer or float, of either 32 or 64 bits. As mentioned above, we have excluded floating point instructions, so we will only have integers. We will also use arbitrary sized signed integers, for brevity.
data ValueTy : Set where Number : ValueTy
The type of the operand stack is just a list of types that are on the stack.
OpStackTy : Set OpStackTy = Stack ValueTy
As noted above, Wasm uses structured control flow. When we
br
or finish with a block, we might expect that block
to return something. We make a simple type to say if we need a
return value or not.
data ReturnTy : Set where some : ValueTy → ReturnTy none : ReturnTy
We also define a function to add a return type to the stack type.
For none
, nothing will be added to the stack, whereas
for some
we will add the type in question.
_+ʳ_ : OpStackTy → ReturnTy → OpStackTy s +ʳ some x = s , x s +ʳ none = s
Labels
We now need to keep track of the labels in scope for a given
instruction. It is invalid for a program to jump out of a label that
doesn’t exist - a br
instruction outside of our
block
could not target anything. Similarly, when
br
instructions are executed the top of the stack must
have the correct return type.
We track the labels we can return to as a stack of return types, where each element corresponds to a nested block, and specifies the return type expected by that block.
CtrlStackTy : Set CtrlStackTy = Stack ReturnTy
Locals
We will also use indices for locals, which means we can define them the same as our operand stack. The only difference is we won’t be changing the length throughout, only the values.
LocalsTy : Set LocalsTy = Stack ValueTy
Context
We now define our context, which tells us what labels and locals are available to use.
record Context : Set where field labels : CtrlStackTy locals : LocalsTy
We also define a convenience function for modifying labels.
-- Add a new label to the context's labels field _+ˡ_ : Context → ReturnTy → Context _+ˡ_ Γ ret = record Γ { labels = (Context.labels Γ) , ret }
Instructions
There are two things which constrain what instructions we can execute: the context we are in, and the type of our operand stack. Given these two parameters, we can determine what the stack will be at the end of a given instruction.
We say our instructions are valid in a certain context (defined
above), and perform a transformation from one operand stack type to
another. We will use Γ
throughout to refer to the
context of an instruction, and Δ
to the stack type. An
instruction of type Γ ⊢ Δ ⟩ Δ′
is valid in
Γ
, requires a stack of type Δ
, and changes
the stack to have type Δ′
.
Similarly, sequences of instructions are all valid in a certain
context, and may change the stack type. These are mutually
recursive, as we will need instruction sequences for instructions
like block
and loop
.
data _⊢_⟫_ : Context → OpStackTy → OpStackTy → Set data _⊢_⟩_ : Context → OpStackTy → OpStackTy → Set
Instruction Sequences
Sequences of instructions will all have the same context (as in the specification), and each instruction must leave the stack in the correct state for the next one to execute.
data _⊢_⟫_ where done : ∀ {Γ Δ} → Γ ⊢ Δ ⟫ Δ _then_ : ∀ {Γ A B C} → Γ ⊢ A ⟩ B → Γ ⊢ B ⟫ C ----------- → Γ ⊢ A ⟫ C
The definitions in the specification all pretend the stack is empty other than the values each instruction acts on. Instruction sequences are then defined so that instructions can ignore the rest of the stack. We instead make all instructions over the bottom of the stack, which is equivalent and slightly simpler.
Parametric
Instructions (spec section 3.3.2
)
We start by defining some simple operations on the stack.
data _⊢_⟩_ where -- Do nothing nop : ∀ {Γ Δ} --------- → Γ ⊢ Δ ⟩ Δ -- Discard the value at the top of the stack drop : ∀ {Γ Δ T} ------------- → Γ ⊢ Δ , T ⟩ Δ -- Return the 2nd element of the stack if the top element is 1, otherwise return the 3rd to top. select : ∀ {Γ Δ T} ---------------------------------- → Γ ⊢ (Δ , T , T , Number) ⟩ (Δ , T)
These are all valid in any context, and the stack below the used argument can take any shape.
Numeric
Instructions (spec section 3.3.1
)
We group numeric operations on numbers by the number of
arguments, and allow an agda function on Nat
to be
passed in. We could instead specify an instruction for each
operation, but this would make the code much longer and more
repetitive.
-- Push a constant to a stack const : ∀ {Γ Δ} → Nat ------------------ → Γ ⊢ Δ ⟩ Δ , Number -- Consume one operand, returning another number; e.g. clz, ctz, popcnt op-unary : ∀ {Γ Δ} → (Nat → Nat) --------------------------- → Γ ⊢ Δ , Number ⟩ Δ , Number -- Consume two operands, returning another number; e.g. add, sub, mul, ... op-binary : ∀ {Γ Δ} → (Nat → Nat → Nat) --------------------------- → Γ ⊢ Δ , Number , Number ⟩ Δ , Number -- Consume one operand, and return a boolean (represented as a number in Wasm); e.g. eqz op-test : ∀ {Γ Δ} → (Nat → Bool) --------------------------- → Γ ⊢ Δ , Number ⟩ Δ , Number -- Consume two operands, and return a boolean; e.g. lt, lte, gt, gte op-rel : ∀ {Γ Δ} → (Nat → Nat → Bool) --------------------------- → Γ ⊢ Δ , Number , Number ⟩ Δ , Number
A possible improvement would be to have f
in
op-test
and op-rel
return
Dec T
, representing a proof/negation of whatever test
or relation is being examined.
Control Instructions (spec section 3.3.5)
These allow us to add and use labels.
The simplest control flow is the block, which adds a new label that we can branch to the end of if needed. The instruction (or sequence of instructions) inside starts with an empty stack, and should finish with only the return value on the stack (if there is one).
-- Start a block, where `br` instructions inside the block can jump to the end of it block : ∀ {Γ Δ r} → (Γ +ˡ r) ⊢ ∅ ⟫ ∅ +ʳ r --------------------- → Γ ⊢ Δ ⟩ Δ +ʳ r
if
is slightly more complex, as we need something at
the top of the stack, and two instruction sequences that return the
same result type.
-- If the top of the stack is > 0, execute the first branch. Otherwise, execute the 2nd if_else_ : ∀ {Γ Δ r} → (Γ +ˡ r) ⊢ ∅ ⟫ ∅ +ʳ r → (Γ +ˡ r) ⊢ ∅ ⟫ ∅ +ʳ r ----------------------- → Γ ⊢ Δ , Number ⟩ Δ +ʳ r
loop
is different - when we branch out of it, we
jump to the start instead. Since we’re jumping to the start, we
don’t expect anything to be pushed to the stack when we do branch
(since the inner instruction sequence starts with an empty stack).
We can still return something out of the loop though, which will
happen when we reach the end of the instructions without
branching.
-- Start a loop, where `br` instructions inside it can jump back to the start loop : ∀ {Γ Δ r} → Γ +ˡ none ⊢ ∅ ⟫ ∅ +ʳ r ------------------------ → Γ ⊢ Δ ⟩ Δ +ʳ r
We now define our branching operators, starting with
unconditional branching to any valid label. Note that the stack
after this is unconstrained - instructions after br
are
unreachable so it doesn’t matter.
-- Jump to the given label unconditionally br : ∀ {Γ Δ Δ′ t} → (Context.labels Γ) ∋ t ---------------------- → Γ ⊢ Δ +ʳ t ⟩ Δ′
Conditional branching branches only if the top element on the stack is > 0. Since we may continue after this, we constrain the stack so that we just get rid of the condition and the rest remains the same.
-- If the top element is > 0, jump to the given label br-if : ∀ {Γ Δ t} → (Context.labels Γ) ∋ t ------------------------------ → Γ ⊢ (Δ +ʳ t) , Number ⟩ Δ +ʳ t
Variable Instructions (spec section 3.3.3)
Finally, we need to be able to get and set our local variables. We use lookup judgements to specify what local we’re acting on and its type.
-- Get the current value of the given local local-get : ∀ {Γ Δ t} → (Context.locals Γ) ∋ t ---------------------- → Γ ⊢ Δ ⟩ Δ , t -- Set a new value for the current local from the top of the stack local-set : ∀ {Γ Δ t} → (Context.locals Γ) ∋ t → Γ ⊢ Δ , t ⟩ Δ -- Set a new value for the current local from the top of the stack, and keep that value at the top. local-tee : ∀ {Γ Δ t} → (Context.locals Γ) ∋ t → Γ ⊢ Δ , t ⟩ Δ , t
Writing programs
Now that we have all of our instructions, we can write some example programs.
First we’ll write some aliases for our op-binary
instruction.
add : ∀ {Γ Δ} → Γ ⊢ Δ , Number , Number ⟩ Δ , Number add = op-binary λ{ x y → x + y} sub : ∀ {Γ Δ} → Γ ⊢ Δ , Number , Number ⟩ Δ , Number sub = op-binary λ{ x y → x - y}
Here’s an example program that calculates 2+2.
∅-ctx : Context ∅-ctx = record { labels = ∅; locals = ∅ } test-2+2 : ∅-ctx ⊢ ∅ ⟫ ∅ , Number test-2+2 = const 2 then const 2 then add then done
A simple if statement:
test-if : ∅-ctx ⊢ ∅ , Number ⟫ ∅ , Number test-if = -- Returns 42 if the stack started with a 1, 84 otherwise if ( const 42 then done ) else ( const 84 then done ) then done
An infinite loop:
test-infloop : ∅-ctx ⊢ ∅ ⟫ ∅ test-infloop = loop (br Z then done) then done
And the program from above that loops 3 times:
test-loop3 : record {labels = ∅; locals = ∅ , Number} ⊢ ∅ ⟫ ∅ test-loop3 = const 3 then local-set Z then loop ( local-get Z then const 1 then sub then local-tee Z then -- equivalent to local-set Z then local-get Z br-if Z then done ) then done
It is also impossible to create programs that would fail to execute, for instance the below examples cannot be constructed:
_ : ∅-ctx ⊢ ∅ ⟫ ∅ , Number
_ = const 2 then
add -- Needs two arguments, we only have one!
then done
-- _Δ_381 , Number != ∅ of type Stack ValueTy
-- when checking that the expression add has type
-- ∅-ctx ⊢ ∅ , Number ⟩ _B_378
_ : ∅-ctx ⊢ ∅ ⟫ ∅ , Number
_ = const 42 then
-- This block must return a number for us to add...
block (const 1 then
br-if Z -- ...but it doesnt!
then done) then
add then done
-- Failed to solve the following constraints:
-- ∅ , Number +ʳ _r_382 = ∅ , Number , Number : Stack ValueTy
-- (blocked on _r_382)
-- _Δ_394 +ʳ _r_382 = ∅ +ʳ _r_382 : Stack ValueTy
-- (blocked on any(_r_382, _Δ_394))
-- _Δ_394 +ʳ _r_382 = ∅ : Stack ValueTy (blocked on _r_382)
Evaluation
Imports:
module Evaluation where open import Agda.Builtin.Nat open import Agda.Builtin.List open import Agda.Builtin.Bool open import Data.Bool.Properties using (_≟_) open import Agda.Builtin.Equality open import Relation.Nullary open import Agda.Builtin.Sigma using (Σ) renaming (_,_ to _Σ,_) open Language using (Context; ValueTy; ReturnTy; Stack; OpStackTy; CtrlStackTy; LocalsTy; add; sub; _⊢_⟩_; _⊢_⟫_; _∋_; _+ʳ_; _+ˡ_) open ValueTy open Stack open ReturnTy open _⊢_⟩_ open _⊢_⟫_ open _∋_ infix 1 begin_ infix 2 _—↠_ infixr 2 _—→⟨_⟩_ infix 3 _∎ infixl 4 _—→_ infixl 6 _,_
Now that we are able to write programs and know that they are valid, we want to execute them.
The main challenge here is dealing with control structures: We must ensure that we know where we go back to if we jump back to a label, and we must know that we will always have the correct return type.
To do this, we will seperate each nested instruction sequence into a ‘frame’. We’ll keep track of all the frames above us, so that we can return to them when needed. We will always require that our current list of instructions ends by returning whatever the next frame ‘up’ expects.
Whenever we enter a new control structure, we will save our current ‘frame’ to be continued later, and change to a new empty stack with the block’s inner instructions. When we finish with our current instructions, we know that we will have the correct return type on the stack, and so we will ‘ascend’ up our list of stack frames.
This differs from the method used in the specification, where both operands and labels share the same stack. Our method makes defining reduction rules slightly more difficult, however proving preservation of types is much easier.
First, we define our concrete values and stacks, both dependent on their type as defined above.
data Value : ValueTy → Set where Number : Nat → Value Number
data OpStack : OpStackTy → Set where ∅ : OpStack ∅ _,_ : ∀ {Δ : OpStackTy} {T} → OpStack Δ → Value T --------------- → OpStack (Δ , T)
We define the storage of our locals, and helpers for getting/setting them.
data Locals : LocalsTy → Set where ∅ : Locals ∅ _,_ : ∀ {Δ : LocalsTy} {T} → Locals Δ → Value T --------------- → Locals (Δ , T) -- Set the local targeted by the given lookup judgement _[_]:=_ : ∀ {Δ T} → Locals Δ → Δ ∋ T → Value T → Locals Δ (ls , _) [ Z ]:= v = ls , v (ls , x) [ S l ]:= v = (ls [ l ]:= v) , x -- Get the local targeted by the given lookup judgement _[_] : ∀ {Δ T} → Locals Δ → Δ ∋ T → Value T (ls , v) [ Z ] = v (ls , _) [ S l ] = ls [ l ]
We then define our list of previous stack frames, indexed by the type that we expect the current block to return, and our control stack.
Note that the type the current block ends with may not be the
same as the top of the control stack. In the case of loops, a
br
instruction jumps back to the start, and
returns/retains none of the stack. However, loop blocks may still
return a value once they have executed (reached the end without
calling br
). Therefore, these need to be separate.
When a program starts, we have no previous frames or labels we can jump to:
data Frames : ReturnTy → Context → Set where ∅ : ∀ {r ls} ----------------------------------------- → Frames r record {labels = ∅; locals = ls}
A notable limitation here is that programs can end with at most one value on the stack, as we will require they end with the return type indicated here.
We will use this constructor when we enter a block with a return type:
_,_⦂_ : ∀ {Γ Δ r r′} → Frames r Γ → Γ ⊢ Δ +ʳ r′ ⟫ ∅ +ʳ r → OpStack Δ ----------------------------- → Frames r′ (Γ +ˡ r′)
Here, r
is the return type of the block which we are
saving, and r′
is the return type we need before we can
return to it. We store the ‘continuation’, which gets the return
value, and will finish with whatever its previous frame expects. We
also save the current state of the operand stack.
Loop constructs must also store the inner body of the loop, since
when we br
to this label we will go back to the start
of the inner body. When we finish the body normally, we will
continue on with the second instruction list. Note that we need the
next frame to return some r
, but we add the label
none
, as we mentioned above.
_,_↬_⦂_ : ∀ {Γ Δ r r′} → Frames r Γ → Γ +ˡ none ⊢ ∅ ⟫ ∅ +ʳ r′ -- Loop body → Γ ⊢ Δ +ʳ r′ ⟫ ∅ +ʳ r -- Continuation → OpStack Δ --------------------------- → Frames r′ (Γ +ˡ none)
Now we define the current state of execution, referred to as a
Configuration
as in the specification. This includes
the current instruction sequence, its operand stack, and our
locals.
data Configuration : Set where conf : ∀ {Γ Δ r} → Γ ⊢ Δ ⟫ ∅ +ʳ r → OpStack Δ → Locals (Context.locals Γ) → Frames r Γ ------------------------- → Configuration
The last thing we need before we can define our reduction rules
is the ability to go back up the stack. This will be used when we
call br
, and must ascend the amount of frames
specified, resuming that frame with the given return value and the
new locals.
transplant-ret : ∀ {Δ Δ′ r} → OpStack (Δ +ʳ r) → OpStack Δ′ → OpStack (Δ′ +ʳ r) transplant-ret {r = some _} (xs , x) m = m , x transplant-ret {r = none} s m = m br-unwind : ∀ {Γ Δ r v} → (Context.labels Γ) ∋ v -- Label/frame to unwind up to → OpStack (Δ +ʳ v) → Locals (Context.locals Γ) → Frames r Γ ------------------------- → Configuration -- Base cases - resume the top frame br-unwind Z s ls (ss , is ⦂ s′) = conf is (transplant-ret s s′) ls ss -- For loops - go back to the start and don't remove the frame br-unwind Z _ ls ss@(_ , start ↬ cont ⦂ s) = conf start ∅ ls ss -- Recursive cases br-unwind (S l) s ls (ss , _ ⦂ _) = br-unwind l s ls ss br-unwind (S l) s ls (ss , _ ↬ _ ⦂ _) = br-unwind l s ls ss
Note that r
(the value the next frame up execpts)
and v
(the value we are returning) are distinct - we
may be jumping over our most recent block or may be returning to the
top of a loop.
Now, we define our small-step reduction rules, which take one configuration to the other.
data _—→_ : Configuration → Configuration → Set where
Our parametric instructions are easy enough, however we do have to specify a lot of implicit arguments to make Agda happy.
nop : ∀ {Γ Δ r is s ls ss} --------------------------------------- → conf {Γ} {Δ} {r} (nop then is) s ls ss —→ conf is s ls ss drop : ∀ {Γ Δ r is s ls ss X} {x : Value X} ------------------------------------------------- → conf {Γ} {Δ , X} {r} (drop then is) (s , x) ls ss —→ conf is s ls ss select-0 : ∀ {Γ Δ r s ls ss T} {a b : Value T} {is : Γ ⊢ Δ , T ⟫ _} ----------------------------------------------------- → conf {Γ} {Δ , T , T , Number} {r} (select then is) (s , b , a , Number zero) ls ss —→ conf is (s , a) ls ss select-1 : ∀ {Γ Δ r s ls ss T c} {a b : Value T} {is : Γ ⊢ Δ , T ⟫ _} -------------------------------------------------------- → conf {Γ} {Δ , T , T , Number} {r} (select then is) (s , b , a , Number (suc c)) ls ss —→ conf is (s , b) ls ss
Our numeric instructions simply apply their functions and put the return values on the stack.
Similar to select, we have two cases for op-test
and
op-rel
, which must be chosen based on the output of
f x
.
const : ∀ {Γ Δ r is s ls ss x} ------------------------------------------ → conf {Γ} {Δ} {r} (const x then is) s ls ss —→ conf is (s , Number x) ls ss op-unary : ∀ {Γ Δ r is s ls ss f x} ------------------------------------------------------------------- → conf {Γ} {Δ , Number} {r} (op-unary f then is) (s , Number x) ls ss —→ conf is (s , Number (f x)) ls ss op-binary : ∀ {Γ Δ r is s ls ss f x y} ---------------------------------------------------------------------------------------- → conf {Γ} {Δ , Number , Number} {r} (op-binary f then is) (s , Number x , Number y) ls ss —→ conf is (s , Number (f x y)) ls ss op-testᵀ : ∀ {Γ Δ r is s ls ss f x} → f x ≡ true ------------------------------------------------------------------ → conf {Γ} {Δ , Number} {r} (op-test f then is) (s , Number x) ls ss —→ conf is (s , Number 1) ls ss op-testꟳ : ∀ {Γ Δ r is s ls ss f x} → ¬ (f x ≡ true) ------------------------------------------------------------------ → conf {Γ} {Δ , Number} {r} (op-test f then is) (s , Number x) ls ss —→ conf is (s , Number 0) ls ss op-relᵀ : ∀ {Γ Δ r is s ls ss f x y} → f x y ≡ true ------------------------------------------------------------------------------------- → conf {Γ} {Δ , Number , Number} {r} (op-rel f then is) (s , Number x , Number y) ls ss —→ conf is (s , Number 1) ls ss op-relꟳ : ∀ {Γ Δ r is s ls ss f x y} → ¬ (f x y ≡ true) ------------------------------------------------------------------------------------- → conf {Γ} {Δ , Number , Number} {r} (op-rel f then is) (s , Number x , Number y) ls ss —→ conf is (s , Number 0) ls ss
As described above, when we enter a block we save the current
state in our list of previous frames, and start with an empty stack.
We know that the inner instruction sequence will have the correct
end state because of how we define our block
instruction, but Agda needs us to spell it out anyway.
block : ∀ {Γ Δ r r′ s ls ss} {is : Γ ⊢ Δ +ʳ r′ ⟫ ∅ +ʳ r} {is′ : Γ +ˡ r′ ⊢ ∅ ⟫ ∅ +ʳ r′} ------------------------------------------- → conf {Γ} {Δ} {r} (block is′ then is) s ls ss —→ conf is′ ∅ ls (ss , is ⦂ s) loop : ∀ {Γ Δ r r′ s ls ss} {is : Γ ⊢ Δ +ʳ r′ ⟫ ∅ +ʳ r} {is′ : Γ +ˡ none ⊢ ∅ ⟫ ∅ +ʳ r′} ------------------------------------------------------------ → conf (loop is′ then is) s ls ss —→ conf is′ ∅ ls (ss , is′ ↬ is ⦂ s) ifᵀ : ∀ {Γ Δ n r r′ s ls ss} {is : Γ ⊢ Δ +ʳ r′ ⟫ ∅ +ʳ r} {ts : Γ +ˡ r′ ⊢ ∅ ⟫ ∅ +ʳ r′} {es : Γ +ˡ r′ ⊢ ∅ ⟫ ∅ +ʳ r′} ---------------------------------------------------------------------------- → conf {Γ} {Δ , Number} {r} (if ts else es then is) (s , Number (suc n)) ls ss —→ conf ts ∅ ls (ss , is ⦂ s) ifꟳ : ∀ {Γ Δ r r′ s ls ss} {is : Γ ⊢ Δ +ʳ r′ ⟫ ∅ +ʳ r} {ts : Γ +ˡ r′ ⊢ ∅ ⟫ ∅ +ʳ r′} {es : Γ +ˡ r′ ⊢ ∅ ⟫ ∅ +ʳ r′} ------------------------------------------------------------------------- → conf {Γ} {Δ , Number} {r} (if ts else es then is) (s , Number zero) ls ss —→ conf es ∅ ls (ss , is ⦂ s)
br
uses the br-unwind
function defined
above.
br : ∀ {Γ Δ Δ′ r ls ss v} {lbl : (Context.labels Γ) ∋ v} {s : OpStack (Δ +ʳ v)} {is : Γ ⊢ Δ′ ⟫ ∅ +ʳ r} ------------------------------------------------------------------ → conf {Γ} {Δ +ʳ v} {r} (br lbl then is) s ls ss —→ (br-unwind lbl s ls ss)
br-if
is similar, again with two cases for zero and
non-zero.
br-ifᵀ : ∀ {Γ Δ r ls ss v x} {lbl : (Context.labels Γ) ∋ v} {s : OpStack (Δ +ʳ v)} {is : Γ ⊢ Δ +ʳ v ⟫ ∅ +ʳ r} ------------------------------------------------------------------------------- → conf {Γ} {(Δ +ʳ v) , Number} {r} (br-if lbl then is) (s , Number (suc x)) ls ss —→ (br-unwind lbl s ls ss) br-ifꟳ : ∀ {Γ Δ r ls ss v} {lbl : (Context.labels Γ) ∋ v} {s : OpStack (Δ +ʳ v)} {is : Γ ⊢ Δ +ʳ v ⟫ ∅ +ʳ r} ---------------------------------------------------------------------------- → conf {Γ} {(Δ +ʳ v) , Number} {r} (br-if lbl then is) (s , Number zero) ls ss —→ conf is s ls ss
Now we define our local instructions, which only affect the locals part of our configuration.
local-get : ∀ {Γ Δ r T is s ls ss} {l : (Context.locals Γ) ∋ T} → conf {Γ} {Δ} {r} (local-get l then is) s ls ss —→ conf is (s , ls [ l ]) ls ss local-set : ∀ {Γ Δ r T is s ls ss} {l : (Context.locals Γ) ∋ T} {v : Value T} → conf {Γ} {Δ , T} {r} (local-set l then is) (s , v) ls ss —→ conf is s (ls [ l ]:= v) ss local-tee : ∀ {Γ Δ r T is s ls ss} {l : (Context.locals Γ) ∋ T} {v : Value T} → conf {Γ} {Δ , T} {r} (local-tee l then is) (s , v) ls ss —→ conf is (s , v) (ls [ l ]:= v) ss
Finally, we need to be able to go back up to our previous frames once we’ve completed our current list of instructions.
ascend : ∀ {Γ r ls Δ′ r′ cont s s′ ss′} -------------------------------------------------------------------- → conf {Γ +ˡ r} {∅ +ʳ r} {r} done s ls (ss′ , cont ⦂ s′) —→ conf {Γ} {Δ′ +ʳ r} {r′} cont (transplant-ret s s′) ls ss′ ascendˡ : ∀ {Γ r ls Δ′ r′ s s′ ss′ start cont} -------------------------------------------------------------------------- → conf {Γ +ˡ none} {∅ +ʳ r} {r} done s ls (ss′ , start ↬ cont ⦂ s′) —→ conf {Γ} {Δ′ +ʳ r} {r′} cont (transplant-ret s s′) ls ss′
Progress
Now that we have our reduction rules, we can prove progress: Either all of our instructions are finished, or we can reduce further. Proving progress is relatively straightforward, since we mostly only care about what instruction is up next.
data Progress : Configuration → Set where done : ∀ {r lsty} {ls : Locals lsty} {s : OpStack (∅ +ʳ r)} -------------------------- → Progress (conf done s ls (∅ {r})) step : ∀ {C C′} → C —→ C′ ---------- → Progress C progress : ∀ (c : Configuration) ------------------- → Progress c
Here are our parametric instructions, with different cases for
the branch we take when selecting. To make things clearer, we will
use _
for all the parts of the configuration that
aren’t modified by the reduction step.
progress (conf (nop then is) _ _ _) = step nop progress (conf (drop then is) (_ , v) _ _) = step drop progress (conf (select then is) (_ , _ , a , Number zero) _ _) = step select-0 progress (conf (select then is) (_ , b , _ , Number (suc _)) _ _) = step select-1
progress (conf (local-get i then _) _ ls _) = step local-get progress (conf (local-set i then _) (_ , v) ls _) = step local-set progress (conf (local-tee i then _) (_ , v) ls _) = step local-tee
For op-test
and op-rel
we must decide
if f x
is true ourselves.
progress (conf (const x then is) s _ _) = step const progress (conf (op-unary f then is) (_ , Number n) _ _) = step op-unary progress (conf (op-binary f then is) (_ , Number x , Number y) _ _) = step op-binary progress (conf (op-test f then is) (_ , Number x) _ _) with (f x) ≟ true ... | (yes pass) = step (op-testᵀ pass) ... | (no fail) = step (op-testꟳ fail) progress (conf (op-rel f then is) (_ , Number x , Number y) _ _) with (f x y) ≟ true ... | (yes pass) = step (op-relᵀ pass) ... | (no fail) = step (op-relꟳ fail)
We may deliberately ascend up the frame stack:
progress (conf (block is′ then is) s _ ss) = step block progress (conf (loop is′ then is) s _ ss) = step loop progress (conf (if_else_ ts _ then is) (_ , Number (suc n)) _ ss) = step ifᵀ progress (conf (if_else_ _ fs then is) (_ , Number zero) _ ss) = step ifꟳ progress (conf (br lbl then is) s _ ss) = step br progress (conf (br-if lbl then is) (_ , Number (suc _)) _ ss) = step br-ifᵀ progress (conf (br-if lbl then is) (_ , Number zero) _ ss) = step br-ifꟳ
Finally, if we run out of instructions in the current frame we
must try to use the next one up. Again, we know that we will have
the return type when we need it thanks to the constraints in
Configuration
.
progress (conf done _ _ (ss′ , is′ ⦂ s′)) = step ascend progress (conf done _ _ (ss′ , ls ↬ is′ ⦂ s′)) = step ascendˡ
We are only done when all instructions are executed, and we can’t go up any more stack frames.
progress (conf done s _ ∅) = done
Thanks to the constraints on our instruction type and configuration types, we have also proved the type system’s soundness: Types are respected throughout the whole run of our program, and the stack will always be transformed as expected.
Evaluation
We now adapt our proof of progress to be used as an evaluator.
First, we define big-step reductions as the reflexive and transitive closure of reductions, with some syntax to make writing them nicer.
data _—↠_ : Configuration → Configuration → Set where _∎ : (c : Configuration) ------------------- → c —↠ c step—→ : ∀ {B C} → (A : Configuration) → B —↠ C → A —→ B ------- → A —↠ C pattern _—→⟨_⟩_ A A—→B B—↠C = step—→ A B—↠C A—→B begin_ : ∀ {A B} → A —↠ B ------ → A —↠ B begin_ A—↠B = A—↠B
This allows us to write chains of reductions, similar to an
execution trace. For example, here is how the 2+2
program from earlier might be executed.
_ : conf Language.test-2+2 ∅ ∅ ∅ —↠ conf done (∅ , Number 4) ∅ ∅ _ = begin conf (const 2 then const 2 then add then done) ∅ ∅ (∅ {some Number}) —→⟨ const ⟩ conf (const 2 then add then done) (∅ , Number 2) ∅ ∅ —→⟨ const ⟩ conf (add then done) (∅ , Number 2 , Number 2) ∅ ∅ —→⟨ op-binary ⟩ conf done (∅ , Number 4) ∅ ∅ ∎
We run our evaluator with a set amount of gas, which keeps Agda’s termination checker happy.
record Gas : Set where constructor gas field amount : Nat data Finished (c : Configuration) : Set where done : ∀ {r s′ LS ls′} → c —↠ conf {record {labels = ∅; locals = LS}} {∅ +ʳ r} {r} done s′ ls′ ∅ ---------- → Finished c out-of-gas : ---------- Finished c eval : Gas → (C : Configuration) → Finished C eval (gas zero) C = out-of-gas eval (gas (suc n)) C with progress C ... | done = done (C ∎) ... | step {_} {C′} C→C′ with eval (gas n) C′ ... | done C′↠D = done (C —→⟨ C→C′ ⟩ C′↠D) ... | out-of-gas = out-of-gas
Now we can evaluate the programs we defined earlier.
-- 2 + 2 = 4 _ : eval (gas 100) (conf Language.test-2+2 ∅ ∅ ∅) ≡ done ( conf (const 2 then const 2 then op-binary (λ x y → x + y) then done) ∅ ∅ ∅ —→⟨ const ⟩ conf (const 2 then op-binary (λ x y → x + y) then done) (∅ , Number 2) ∅ ∅ —→⟨ const ⟩ conf (op-binary (λ x y → x + y) then done) (∅ , Number 2 , Number 2) ∅ ∅ —→⟨ op-binary ⟩ conf done (∅ , Number 4) ∅ ∅ ∎) _ = refl -- if false then 42 else 84 -> 84 _ : eval (gas 100) (conf Language.test-if (∅ , Number 0) ∅ ∅) ≡ done ( conf ((if const 42 then done else (const 84 then done)) then done) (∅ , Number 0) ∅ ∅ —→⟨ ifꟳ ⟩ conf (const 84 then done) ∅ ∅ (∅ , done ⦂ ∅) —→⟨ const ⟩ conf done (∅ , Number 84) ∅ (∅ , done ⦂ ∅) —→⟨ ascend ⟩ conf done (∅ , Number 84) ∅ ∅ ∎) _ = refl -- if true then 42 else 84 -> 42 _ : eval (gas 100) (conf Language.test-if (∅ , Number 1) ∅ ∅) ≡ done ( conf ((if const 42 then done else (const 84 then done)) then done) (∅ , Number 1) ∅ ∅ —→⟨ ifᵀ ⟩ conf (const 42 then done) ∅ ∅ (∅ , done ⦂ ∅) —→⟨ const ⟩ conf done (∅ , Number 42) ∅ (∅ , done ⦂ ∅) —→⟨ ascend ⟩ conf done (∅ , Number 42) ∅ ∅ ∎) _ = refl -- infinite loop will never finish _ : eval (gas 100) (conf Language.test-infloop ∅ ∅ ∅) ≡ out-of-gas _ = refl -- loop 3 times _ : eval (gas 100) (conf Language.test-loop3 ∅ (∅ , Number 0) ∅) ≡ done ( conf (const 3 then local-set Z then loop (local-get Z then const 1 then op-binary (λ x y → x - y) then local-tee Z then br-if Z then done) then done) ∅ (∅ , Number 0) ∅ —→⟨ const ⟩ conf (local-set Z then loop (local-get Z then const 1 then op-binary (λ x y → x - y) then local-tee Z then br-if Z then done) then done) (∅ , Number 3) (∅ , Number 0) ∅ —→⟨ local-set ⟩ conf (loop (local-get Z then const 1 then op-binary (λ x y → x - y) then local-tee Z then br-if Z then done) then done) ∅ (∅ , Number 3) ∅ —→⟨ loop ⟩ -- Loop 1 conf (local-get Z then const 1 then op-binary (λ x y → x - y) then local-tee Z then br-if Z then done) ∅ (∅ , Number 3) (∅ , local-get Z then const 1 then op-binary (λ x y → x - y) then local-tee Z then br-if Z then done ↬ done ⦂ ∅) —→⟨ local-get ⟩ conf (const 1 then op-binary (λ x y → x - y) then local-tee Z then br-if Z then done) (∅ , Number 3) (∅ , Number 3) (∅ , local-get Z then const 1 then op-binary (λ x y → x - y) then local-tee Z then br-if Z then done ↬ done ⦂ ∅) —→⟨ const ⟩ conf (op-binary (λ x y → x - y) then local-tee Z then br-if Z then done) (∅ , Number 3 , Number 1) (∅ , Number 3) (∅ , local-get Z then const 1 then op-binary (λ x y → x - y) then local-tee Z then br-if Z then done ↬ done ⦂ ∅) —→⟨ op-binary ⟩ conf (local-tee Z then br-if Z then done) (∅ , Number 2) (∅ , Number 3) (∅ , local-get Z then const 1 then op-binary (λ x y → x - y) then local-tee Z then br-if Z then done ↬ done ⦂ ∅) —→⟨ local-tee ⟩ conf (br-if Z then done) (∅ , Number 2) (∅ , Number 2) (∅ , local-get Z then const 1 then op-binary (λ x y → x - y) then local-tee Z then br-if Z then done ↬ done ⦂ ∅) —→⟨ br-ifᵀ ⟩ -- Loop 2 conf (local-get Z then const 1 then op-binary (λ x y → x - y) then local-tee Z then br-if Z then done) ∅ (∅ , Number 2) (∅ , local-get Z then const 1 then op-binary (λ x y → x - y) then local-tee Z then br-if Z then done ↬ done ⦂ ∅) —→⟨ local-get ⟩ conf (const 1 then op-binary (λ x y → x - y) then local-tee Z then br-if Z then done) (∅ , Number 2) (∅ , Number 2) (∅ , local-get Z then const 1 then op-binary (λ x y → x - y) then local-tee Z then br-if Z then done ↬ done ⦂ ∅) —→⟨ const ⟩ conf (op-binary (λ x y → x - y) then local-tee Z then br-if Z then done) (∅ , Number 2 , Number 1) (∅ , Number 2) (∅ , local-get Z then const 1 then op-binary (λ x y → x - y) then local-tee Z then br-if Z then done ↬ done ⦂ ∅) —→⟨ op-binary ⟩ conf (local-tee Z then br-if Z then done) (∅ , Number 1) (∅ , Number 2) (∅ , local-get Z then const 1 then op-binary (λ x y → x - y) then local-tee Z then br-if Z then done ↬ done ⦂ ∅) —→⟨ local-tee ⟩ conf (br-if Z then done) (∅ , Number 1) (∅ , Number 1) (∅ , local-get Z then const 1 then op-binary (λ x y → x - y) then local-tee Z then br-if Z then done ↬ done ⦂ ∅) —→⟨ br-ifᵀ ⟩ -- Loop 3 conf (local-get Z then const 1 then op-binary (λ x y → x - y) then local-tee Z then br-if Z then done) ∅ (∅ , Number 1) (∅ , local-get Z then const 1 then op-binary (λ x y → x - y) then local-tee Z then br-if Z then done ↬ done ⦂ ∅) —→⟨ local-get ⟩ conf (const 1 then op-binary (λ x y → x - y) then local-tee Z then br-if Z then done) (∅ , Number 1) (∅ , Number 1) (∅ , local-get Z then const 1 then op-binary (λ x y → x - y) then local-tee Z then br-if Z then done ↬ done ⦂ ∅) —→⟨ const ⟩ conf (op-binary (λ x y → x - y) then local-tee Z then br-if Z then done) (∅ , Number 1 , Number 1) (∅ , Number 1) (∅ , local-get Z then const 1 then op-binary (λ x y → x - y) then local-tee Z then br-if Z then done ↬ done ⦂ ∅) —→⟨ op-binary ⟩ conf (local-tee Z then br-if Z then done) (∅ , Number 0) (∅ , Number 1) (∅ , local-get Z then const 1 then op-binary (λ x y → x - y) then local-tee Z then br-if Z then done ↬ done ⦂ ∅) —→⟨ local-tee ⟩ conf (br-if Z then done) (∅ , Number 0) (∅ , Number 0) (∅ , local-get Z then const 1 then op-binary (λ x y → x - y) then local-tee Z then br-if Z then done ↬ done ⦂ ∅) —→⟨ br-ifꟳ ⟩ conf done ∅ (∅ , Number 0) (∅ , local-get Z then const 1 then op-binary (λ x y → x - y) then local-tee Z then br-if Z then done ↬ done ⦂ ∅) —→⟨ ascendˡ ⟩ conf done ∅ (∅ , Number 0) ∅ ∎) _ = refl
As a final test, here is a program for calculating the fibonacci sequence.
fib : record {labels = ∅; locals = ∅ , Number , Number , Number} ⊢ ∅ , Number ⟫ ∅ , Number fib = local-set (S (S Z)) then -- n const 0 then local-set (S Z) then -- previous const 1 then local-set Z then -- acc loop ( -- load previous and acc local-get (S Z) then local-get Z then -- previous = acc local-tee (S Z) then -- acc = acc + previous add then local-set Z then -- subtract from loop variable local-get (S (S Z)) then const 1 then sub then local-tee (S (S Z)) then br-if Z then done ) then local-get Z then done
We avoid giving the full reduction sequence for length reasons.
-- 7th fibonacci number = 21 _ : eval (gas 100) (conf fib (∅ , Number 7) (∅ , Number 0 , Number 0 , Number 0) ∅) ≡ done {s′ = ∅ , Number 21} _ _ = refl
Future Work
This section outlines how you could extend this base to cover the rest of Wasm 1.0. Some of the work has been started in this repository, but it’s a bit of a mess. This is more of a notes section than actual exposition.
Modules
In actual WebAssembly, all code is executed in the context of a
Module (spec version 2.5). To facilitate this, we can create a
seperate type (ModuleTy
) and parameterise our context
and configuration based on that. We’ll refer to this as μ below.
Modules also require keeping track of some state that is persistent between function calls (memories, globals, etc). This means you’d probably end up with the following things to keep track of during execution/checking:
ModuleTy
- what functions/globals/memories/etc are availableGlobalStore
- values of globals, memories, etc, in the shape specified byModuleTy
Context
- types of locals,br
points, and return typeFrames
- actual data for previous control flow blocks, in the shape specified byContext
LocalStore
- locals and other function-specific values, in the shape specified byContext
OpStack
- the operand stackis
- remaining instructions in the current control flow block
Some of these could probably be compacted together to make things less unweildy.
In our repository, these are specified in
src/Wasm/Types.agda
and
src/Wasm/Configuration.agda
.
Function calls
Function calls can be added in a similar way to frames, by adding
a new constructor to Frames
, like this:
data Frames where
_,save_⨾_⨾_ : ∀ {μ Δ ε r r′ ls} {Γ : Context μ}
→ Frames μ r Γ
→ Locals Γ
→ OpStack Δ
→ Γ ⊢ Δ +ʳ r′ ⟫ ∅ +ʳ r
-------------------
→ Frames r′ record {labels = ∅; locals = ls; ret = r′} ε
We also need to add a ret
field to our
Context
, to keep track of how we can use the
return
instruction.
We extend our instructions like this:
data _⊢_⟩_ where
return : ∀ {μ Δ Δ′ t} {Γ : Context μ}
→ (Context.ret Γ) ≡ just t
-------------------
→ Γ ⊢ Δ +ʳ t ⟩ Δ′
call : ∀ {μ Δ A L R} {Γ : Context μ}
→ (f : (ModuleTy.functions μ) ∋v record { args = A; locals = L; ret = R } )
→ (args : Δ ∋* A)
→ Γ ⊢ Δ ⟩ (discard args) +ʳ R
Where discard x
removes length x
elements from the top of the stack. Our return
and
ascend-call
reduction steps will be pretty much the
same as the existing br
and ascend
steps.
Validation & Instantiation
Validation works pretty much like the spec sets out: Mostly this is just a bit tedious and based on applying the same few checks in different ways.
Instantiation is more difficult, as we essentially need to execute with a partially initialised module when instantiating our globals. Probably it would look something like this:
- Infer our module type by looking at the serialised types, etc.
- Validate and execute the global initialisation code under a mostly-empty module as described by the spec.
- Validate the rest of the functions instructions.
- Compile this into a valid configuration, currently on the start function.
- Run the start function, and return both the module type and the persistent state (memories, mutable globals, etc)
There’s a vague start at this in our repository at
src/Wasm/Validation.agda
, using the existing
haskell-wasm
library and some FFI bindings in
src/Wasm/Untyped.agda
.
Bibliography
W3C. (2019, December 5). WebAssembly Core Specification. https://www.w3.org/TR/wasm-core-1/ (Accessed 11th Aug 2023)
Watt, C., Rao, X., Pichon-Pharabod, J., Bodin, M., Gardner, P. (2021). Two Mechanisations of WebAssembly 1.0. In: Huisman, M., Păsăreanu, C., Zhan, N. (eds) Formal Methods. FM 2021. Lecture Notes in Computer Science(), vol 13047. Springer, Cham. https://doi.org/10.1007/978-3-030-90870-6_4 (Accessed 11th Aug 2023)