Mechanisation of WebAssembly 1.0 (in Agda)
module Essay where
This essay formalises parts of the WebAssembly 1.0 standard, a safe and portable bytecode format.
WebAssembly (or Wasm) is a stack-machine based language that uses structured control flow. It was designed to allow certain memory safety guarantees about a program to be checked before the program runs, so that checks don’t slow down the program at runtime.
We formalise a limited subset of WebAssembly that demonstrates its most important aspects. Specifically, we exclude:
- Floating point types and operations
- Functions
- Globals
- Memories
- Tables
- Data segments
- Imports, Exports, and the module system
- Traps
Many of these features, such as floating point operations, would be formalised almost identically to features we have implemented. For the others, there are some notes on how these could be implemented at the end.
We provide an instruction type that enforces the validation requirements described in section 3 of the specification, hence formalising the validation step. We then formalise the reduction rules found in section 4 of the specification, and prove that they preserve types.
We take inspiration from Watt, 2021 in how we formalise instructions and state configurations.
I originally wrote this essay for Philip Wadler’s TSPL class. I did a little extra work on it after, but am probably not going to finish it anytime soon, so hopefully it will be useful to other people.
Language
Imports:
module Language where
open import Agda.Builtin.Nat
open import Agda.Builtin.List
open import Agda.Builtin.Bool
infix 4 _⊢_⟩_
infix 4 _⊢_⟫_
infixr 4 _then_
infix 5 _∋_
infixl 6 _,_
infixl 6 _+ˡ_
infixl 6 _+ʳ_
Conventional assembly (such as MIPS or x86) uses a number of
registers to store working values, with operations such as
add working on those registers. Local variables are
then stored somewhere in RAM, usually on the stack.
By contrast, WebAssembly (abbreviated to Wasm) has no registers. Instructions pop their non-constant arguments off of an operand stack, and push (return) their result back onto it. Instructions can only act on the top elements of the stack, therefore local variables are stored off of the operand stack and retrieved/set as needed.
For example, the following program calculates 2+2 and leaves the result on the operand stack:
Stack: []
const 2 -> [2]
const 2 -> [2, 2]
add -> [4]
Control flow is structured, meaning you cannot jump to arbitrary
locations. Instructions such as block and
loop nest instruction sequences recursively. The inner
instruction sequence acts on an empty stack, and optionally has a
return type that it must have at the top of the stack when it is
done.
br and br-if instructions can jump to
any of the control structures they are inside. If the target of the
jump is a block, execution skips to after that block,
and if it is a loop, then back to the start.
In the following program, we use br to skip over
some instructions. In this case, the block has a return value
(Number), which must be at the top of the stack when we return or
finish.
→ []
const 2 → [2]
block → [] Saved: [2]
const 1 → [1]
br 0 -------\
const 1 |
add ↓
end [2, 1] Restore: [2] and push return value (1)
add → [3]
loop is similar, however br jumps back
to the start of the loop, and only reaching the end of the inner
instruction sequence ends it. The following program loops
forever.
loop ↓↖
br 0 ↘↗
end
As mentioned above, locals are stored outside of the stack, and can be retrieved and set from the stack. The following program makes use of a single local to loop 3 times, then exit.
const 3
local.set 0
loop
local.get 0
const 1
sub
local.set 0
local.get 0
br-if 0
end
When the program starts, our stack is empty and we have one local
with value 0. We push 3 to the stack and
execute local.set 0, which sets the local with index
0 to the value at the top of the stack. We then enter
the loop, where we copy the local back to the stack, subtract 1 and
put it back. We then retrieve it once again, and if it is
> 0 we go back to the start of our loop. Otherwise,
we exit the loop and the program is finished.
Like labels, the amount of locals and their types are static, allowing us to check instructions involving locals are valid before execution.
The restrictions on control flow and variables mean that we are able to know the shape of the current function call’s stack throughout. Therefore, we are able to verify before running the program that:
- For every instruction, the stack must have the correct number of operands and the correct operand types at the top
-
For
brinstructions, the targeted label must exist, and we must be returning the correct type - Blocks must always return the correct type (if there is one)
- Local stores and loads must always reference a local that exists, and take/return the correct type.
In the specification, checking that a program satisfies these constraints is referred to as the validation phase (section 3). We will bake these constraints into our instruction types, so that any program that type checks must be valid.
Stacks
There are several different lists/stacks that we want to express the types of. Therefore, we will define a generic way to store them, and a lookup judgement to target a specific element.
data Stack (T : Set) : Set where
∅ : Stack T
_,_ : Stack T → T → Stack T
data _∋_ {T : Set} : Stack T → T → Set where
Z : ∀ {Γ A}
-----------
→ Γ , A ∋ A
S : ∀ {Γ A B}
→ Γ ∋ A
-----------
→ Γ , B ∋ A
Values
The type of a single value in Wasm can be an integer or float, of either 32 or 64 bits. As mentioned above, we have excluded floating point instructions, so we will only have integers. We will also use arbitrary sized signed integers, for brevity.
data ValueTy : Set where
Number : ValueTy
The type of the operand stack is just a list of types that are on the stack.
OpStackTy : Set
OpStackTy = Stack ValueTy
As noted above, Wasm uses structured control flow. When we
br or finish with a block, we might expect that block
to return something. We make a simple type to say if we need a
return value or not.
data ReturnTy : Set where
some : ValueTy → ReturnTy
none : ReturnTy
We also define a function to add a return type to the stack type.
For none, nothing will be added to the stack, whereas
for some we will add the type in question.
_+ʳ_ : OpStackTy → ReturnTy → OpStackTy
s +ʳ some x = s , x
s +ʳ none = s
Labels
We now need to keep track of the labels in scope for a given
instruction. It is invalid for a program to jump out of a label that
doesn’t exist - a br instruction outside of our
block could not target anything. Similarly, when
br instructions are executed the top of the stack must
have the correct return type.
We track the labels we can return to as a stack of return types, where each element corresponds to a nested block, and specifies the return type expected by that block.
CtrlStackTy : Set
CtrlStackTy = Stack ReturnTy
Locals
We will also use indices for locals, which means we can define them the same as our operand stack. The only difference is we won’t be changing the length throughout, only the values.
LocalsTy : Set
LocalsTy = Stack ValueTy
Context
We now define our context, which tells us what labels and locals are available to use.
record Context : Set where
field
labels : CtrlStackTy
locals : LocalsTy
We also define a convenience function for modifying labels.
-- Add a new label to the context's labels field
_+ˡ_ : Context → ReturnTy → Context
_+ˡ_ Γ ret = record Γ { labels = (Context.labels Γ) , ret }
Instructions
There are two things which constrain what instructions we can execute: the context we are in, and the type of our operand stack. Given these two parameters, we can determine what the stack will be at the end of a given instruction.
We say our instructions are valid in a certain context (defined
above), and perform a transformation from one operand stack type to
another. We will use Γ throughout to refer to the
context of an instruction, and Δ to the stack type. An
instruction of type Γ ⊢ Δ ⟩ Δ′ is valid in
Γ, requires a stack of type Δ, and changes
the stack to have type Δ′.
Similarly, sequences of instructions are all valid in a certain
context, and may change the stack type. These are mutually
recursive, as we will need instruction sequences for instructions
like block and loop.
data _⊢_⟫_ : Context → OpStackTy → OpStackTy → Set
data _⊢_⟩_ : Context → OpStackTy → OpStackTy → Set
Instruction Sequences
Sequences of instructions will all have the same context (as in the specification), and each instruction must leave the stack in the correct state for the next one to execute.
data _⊢_⟫_ where
done : ∀ {Γ Δ}
→ Γ ⊢ Δ ⟫ Δ
_then_ : ∀ {Γ A B C}
→ Γ ⊢ A ⟩ B
→ Γ ⊢ B ⟫ C
-----------
→ Γ ⊢ A ⟫ C
The definitions in the specification all pretend the stack is empty other than the values each instruction acts on. Instruction sequences are then defined so that instructions can ignore the rest of the stack. We instead make all instructions over the bottom of the stack, which is equivalent and slightly simpler.
Parametric
Instructions (spec section 3.3.2)
We start by defining some simple operations on the stack.
data _⊢_⟩_ where
-- Do nothing
nop : ∀ {Γ Δ}
---------
→ Γ ⊢ Δ ⟩ Δ
-- Discard the value at the top of the stack
drop : ∀ {Γ Δ T}
-------------
→ Γ ⊢ Δ , T ⟩ Δ
-- Return the 2nd element of the stack if the top element is 1, otherwise return the 3rd to top.
select : ∀ {Γ Δ T}
----------------------------------
→ Γ ⊢ (Δ , T , T , Number) ⟩ (Δ , T)
These are all valid in any context, and the stack below the used argument can take any shape.
Numeric
Instructions (spec section 3.3.1)
We group numeric operations on numbers by the number of
arguments, and allow an agda function on Nat to be
passed in. We could instead specify an instruction for each
operation, but this would make the code much longer and more
repetitive.
-- Push a constant to a stack
const : ∀ {Γ Δ}
→ Nat
------------------
→ Γ ⊢ Δ ⟩ Δ , Number
-- Consume one operand, returning another number; e.g. clz, ctz, popcnt
op-unary : ∀ {Γ Δ}
→ (Nat → Nat)
---------------------------
→ Γ ⊢ Δ , Number ⟩ Δ , Number
-- Consume two operands, returning another number; e.g. add, sub, mul, ...
op-binary : ∀ {Γ Δ}
→ (Nat → Nat → Nat)
---------------------------
→ Γ ⊢ Δ , Number , Number ⟩ Δ , Number
-- Consume one operand, and return a boolean (represented as a number in Wasm); e.g. eqz
op-test : ∀ {Γ Δ}
→ (Nat → Bool)
---------------------------
→ Γ ⊢ Δ , Number ⟩ Δ , Number
-- Consume two operands, and return a boolean; e.g. lt, lte, gt, gte
op-rel : ∀ {Γ Δ}
→ (Nat → Nat → Bool)
---------------------------
→ Γ ⊢ Δ , Number , Number ⟩ Δ , Number
A possible improvement would be to have f in
op-test and op-rel return
Dec T, representing a proof/negation of whatever test
or relation is being examined.
Control Instructions (spec section 3.3.5)
These allow us to add and use labels.
The simplest control flow is the block, which adds a new label that we can branch to the end of if needed. The instruction (or sequence of instructions) inside starts with an empty stack, and should finish with only the return value on the stack (if there is one).
-- Start a block, where `br` instructions inside the block can jump to the end of it
block : ∀ {Γ Δ r}
→ (Γ +ˡ r) ⊢ ∅ ⟫ ∅ +ʳ r
---------------------
→ Γ ⊢ Δ ⟩ Δ +ʳ r
if is slightly more complex, as we need something at
the top of the stack, and two instruction sequences that return the
same result type.
-- If the top of the stack is > 0, execute the first branch. Otherwise, execute the 2nd
if_else_ : ∀ {Γ Δ r}
→ (Γ +ˡ r) ⊢ ∅ ⟫ ∅ +ʳ r
→ (Γ +ˡ r) ⊢ ∅ ⟫ ∅ +ʳ r
-----------------------
→ Γ ⊢ Δ , Number ⟩ Δ +ʳ r
loop is different - when we branch out of it, we
jump to the start instead. Since we’re jumping to the start, we
don’t expect anything to be pushed to the stack when we do branch
(since the inner instruction sequence starts with an empty stack).
We can still return something out of the loop though, which will
happen when we reach the end of the instructions without
branching.
-- Start a loop, where `br` instructions inside it can jump back to the start
loop : ∀ {Γ Δ r}
→ Γ +ˡ none ⊢ ∅ ⟫ ∅ +ʳ r
------------------------
→ Γ ⊢ Δ ⟩ Δ +ʳ r
We now define our branching operators, starting with
unconditional branching to any valid label. Note that the stack
after this is unconstrained - instructions after br are
unreachable so it doesn’t matter.
-- Jump to the given label unconditionally
br : ∀ {Γ Δ Δ′ t}
→ (Context.labels Γ) ∋ t
----------------------
→ Γ ⊢ Δ +ʳ t ⟩ Δ′
Conditional branching branches only if the top element on the stack is > 0. Since we may continue after this, we constrain the stack so that we just get rid of the condition and the rest remains the same.
-- If the top element is > 0, jump to the given label
br-if : ∀ {Γ Δ t}
→ (Context.labels Γ) ∋ t
------------------------------
→ Γ ⊢ (Δ +ʳ t) , Number ⟩ Δ +ʳ t
Variable Instructions (spec section 3.3.3)
Finally, we need to be able to get and set our local variables. We use lookup judgements to specify what local we’re acting on and its type.
-- Get the current value of the given local
local-get : ∀ {Γ Δ t}
→ (Context.locals Γ) ∋ t
----------------------
→ Γ ⊢ Δ ⟩ Δ , t
-- Set a new value for the current local from the top of the stack
local-set : ∀ {Γ Δ t}
→ (Context.locals Γ) ∋ t
→ Γ ⊢ Δ , t ⟩ Δ
-- Set a new value for the current local from the top of the stack, and keep that value at the top.
local-tee : ∀ {Γ Δ t}
→ (Context.locals Γ) ∋ t
→ Γ ⊢ Δ , t ⟩ Δ , t
Writing programs
Now that we have all of our instructions, we can write some example programs.
First we’ll write some aliases for our op-binary
instruction.
add : ∀ {Γ Δ} → Γ ⊢ Δ , Number , Number ⟩ Δ , Number
add = op-binary λ{ x y → x + y}
sub : ∀ {Γ Δ} → Γ ⊢ Δ , Number , Number ⟩ Δ , Number
sub = op-binary λ{ x y → x - y}
Here’s an example program that calculates 2+2.
∅-ctx : Context
∅-ctx = record { labels = ∅; locals = ∅ }
test-2+2 : ∅-ctx ⊢ ∅ ⟫ ∅ , Number
test-2+2 = const 2 then
const 2 then
add then done
A simple if statement:
test-if : ∅-ctx ⊢ ∅ , Number ⟫ ∅ , Number
test-if = -- Returns 42 if the stack started with a 1, 84 otherwise
if (
const 42 then done
) else (
const 84 then done
) then done
An infinite loop:
test-infloop : ∅-ctx ⊢ ∅ ⟫ ∅
test-infloop = loop (br Z then done) then done
And the program from above that loops 3 times:
test-loop3 : record {labels = ∅; locals = ∅ , Number} ⊢ ∅ ⟫ ∅
test-loop3 = const 3 then
local-set Z then
loop (
local-get Z then
const 1 then
sub then
local-tee Z then -- equivalent to local-set Z then local-get Z
br-if Z then done
) then done
It is also impossible to create programs that would fail to execute, for instance the below examples cannot be constructed:
_ : ∅-ctx ⊢ ∅ ⟫ ∅ , Number
_ = const 2 then
add -- Needs two arguments, we only have one!
then done
-- _Δ_381 , Number != ∅ of type Stack ValueTy
-- when checking that the expression add has type
-- ∅-ctx ⊢ ∅ , Number ⟩ _B_378
_ : ∅-ctx ⊢ ∅ ⟫ ∅ , Number
_ = const 42 then
-- This block must return a number for us to add...
block (const 1 then
br-if Z -- ...but it doesnt!
then done) then
add then done
-- Failed to solve the following constraints:
-- ∅ , Number +ʳ _r_382 = ∅ , Number , Number : Stack ValueTy
-- (blocked on _r_382)
-- _Δ_394 +ʳ _r_382 = ∅ +ʳ _r_382 : Stack ValueTy
-- (blocked on any(_r_382, _Δ_394))
-- _Δ_394 +ʳ _r_382 = ∅ : Stack ValueTy (blocked on _r_382)
Evaluation
Imports:
module Evaluation where
open import Agda.Builtin.Nat
open import Agda.Builtin.List
open import Agda.Builtin.Bool
open import Data.Bool.Properties using (_≟_)
open import Agda.Builtin.Equality
open import Relation.Nullary
open import Agda.Builtin.Sigma
using (Σ)
renaming (_,_ to _Σ,_)
open Language
using (Context; ValueTy; ReturnTy; Stack; OpStackTy; CtrlStackTy; LocalsTy; add; sub; _⊢_⟩_; _⊢_⟫_; _∋_; _+ʳ_; _+ˡ_)
open ValueTy
open Stack
open ReturnTy
open _⊢_⟩_
open _⊢_⟫_
open _∋_
infix 1 begin_
infix 2 _—↠_
infixr 2 _—→⟨_⟩_
infix 3 _∎
infixl 4 _—→_
infixl 6 _,_
Now that we are able to write programs and know that they are valid, we want to execute them.
The main challenge here is dealing with control structures: We must ensure that we know where we go back to if we jump back to a label, and we must know that we will always have the correct return type.
To do this, we will seperate each nested instruction sequence into a ‘frame’. We’ll keep track of all the frames above us, so that we can return to them when needed. We will always require that our current list of instructions ends by returning whatever the next frame ‘up’ expects.
Whenever we enter a new control structure, we will save our current ‘frame’ to be continued later, and change to a new empty stack with the block’s inner instructions. When we finish with our current instructions, we know that we will have the correct return type on the stack, and so we will ‘ascend’ up our list of stack frames.
This differs from the method used in the specification, where both operands and labels share the same stack. Our method makes defining reduction rules slightly more difficult, however proving preservation of types is much easier.
First, we define our concrete values and stacks, both dependent on their type as defined above.
data Value : ValueTy → Set where
Number : Nat → Value Number
data OpStack : OpStackTy → Set where
∅ : OpStack ∅
_,_ : ∀ {Δ : OpStackTy} {T}
→ OpStack Δ
→ Value T
---------------
→ OpStack (Δ , T)
We define the storage of our locals, and helpers for getting/setting them.
data Locals : LocalsTy → Set where
∅ : Locals ∅
_,_ : ∀ {Δ : LocalsTy} {T}
→ Locals Δ
→ Value T
---------------
→ Locals (Δ , T)
-- Set the local targeted by the given lookup judgement
_[_]:=_ : ∀ {Δ T} → Locals Δ → Δ ∋ T → Value T → Locals Δ
(ls , _) [ Z ]:= v = ls , v
(ls , x) [ S l ]:= v = (ls [ l ]:= v) , x
-- Get the local targeted by the given lookup judgement
_[_] : ∀ {Δ T} → Locals Δ → Δ ∋ T → Value T
(ls , v) [ Z ] = v
(ls , _) [ S l ] = ls [ l ]
We then define our list of previous stack frames, indexed by the type that we expect the current block to return, and our control stack.
Note that the type the current block ends with may not be the
same as the top of the control stack. In the case of loops, a
br instruction jumps back to the start, and
returns/retains none of the stack. However, loop blocks may still
return a value once they have executed (reached the end without
calling br). Therefore, these need to be separate.
When a program starts, we have no previous frames or labels we can jump to:
data Frames : ReturnTy → Context → Set where
∅ : ∀ {r ls}
-----------------------------------------
→ Frames r record {labels = ∅; locals = ls}
A notable limitation here is that programs can end with at most one value on the stack, as we will require they end with the return type indicated here.
We will use this constructor when we enter a block with a return type:
_,_⦂_ : ∀ {Γ Δ r r′}
→ Frames r Γ
→ Γ ⊢ Δ +ʳ r′ ⟫ ∅ +ʳ r
→ OpStack Δ
-----------------------------
→ Frames r′ (Γ +ˡ r′)
Here, r is the return type of the block which we are
saving, and r′ is the return type we need before we can
return to it. We store the ‘continuation’, which gets the return
value, and will finish with whatever its previous frame expects. We
also save the current state of the operand stack.
Loop constructs must also store the inner body of the loop, since
when we br to this label we will go back to the start
of the inner body. When we finish the body normally, we will
continue on with the second instruction list. Note that we need the
next frame to return some r, but we add the label
none, as we mentioned above.
_,_↬_⦂_ : ∀ {Γ Δ r r′}
→ Frames r Γ
→ Γ +ˡ none ⊢ ∅ ⟫ ∅ +ʳ r′ -- Loop body
→ Γ ⊢ Δ +ʳ r′ ⟫ ∅ +ʳ r -- Continuation
→ OpStack Δ
---------------------------
→ Frames r′ (Γ +ˡ none)
Now we define the current state of execution, referred to as a
Configuration as in the specification. This includes
the current instruction sequence, its operand stack, and our
locals.
data Configuration : Set where
conf : ∀ {Γ Δ r}
→ Γ ⊢ Δ ⟫ ∅ +ʳ r
→ OpStack Δ
→ Locals (Context.locals Γ)
→ Frames r Γ
-------------------------
→ Configuration
The last thing we need before we can define our reduction rules
is the ability to go back up the stack. This will be used when we
call br, and must ascend the amount of frames
specified, resuming that frame with the given return value and the
new locals.
transplant-ret : ∀ {Δ Δ′ r} → OpStack (Δ +ʳ r) → OpStack Δ′ → OpStack (Δ′ +ʳ r)
transplant-ret {r = some _} (xs , x) m = m , x
transplant-ret {r = none} s m = m
br-unwind : ∀ {Γ Δ r v}
→ (Context.labels Γ) ∋ v -- Label/frame to unwind up to
→ OpStack (Δ +ʳ v)
→ Locals (Context.locals Γ)
→ Frames r Γ
-------------------------
→ Configuration
-- Base cases - resume the top frame
br-unwind Z s ls (ss , is ⦂ s′) = conf is (transplant-ret s s′) ls ss
-- For loops - go back to the start and don't remove the frame
br-unwind Z _ ls ss@(_ , start ↬ cont ⦂ s) = conf start ∅ ls ss
-- Recursive cases
br-unwind (S l) s ls (ss , _ ⦂ _) = br-unwind l s ls ss
br-unwind (S l) s ls (ss , _ ↬ _ ⦂ _) = br-unwind l s ls ss
Note that r (the value the next frame up execpts)
and v (the value we are returning) are distinct - we
may be jumping over our most recent block or may be returning to the
top of a loop.
Now, we define our small-step reduction rules, which take one configuration to the other.
data _—→_ : Configuration → Configuration → Set where
Our parametric instructions are easy enough, however we do have to specify a lot of implicit arguments to make Agda happy.
nop : ∀ {Γ Δ r is s ls ss}
---------------------------------------
→ conf {Γ} {Δ} {r} (nop then is) s ls ss
—→ conf is s ls ss
drop : ∀ {Γ Δ r is s ls ss X}
{x : Value X}
-------------------------------------------------
→ conf {Γ} {Δ , X} {r} (drop then is) (s , x) ls ss
—→ conf is s ls ss
select-0 : ∀ {Γ Δ r s ls ss T}
{a b : Value T}
{is : Γ ⊢ Δ , T ⟫ _}
-----------------------------------------------------
→ conf {Γ} {Δ , T , T , Number} {r}
(select then is) (s , b , a , Number zero) ls ss
—→ conf is (s , a) ls ss
select-1 : ∀ {Γ Δ r s ls ss T c}
{a b : Value T}
{is : Γ ⊢ Δ , T ⟫ _}
--------------------------------------------------------
→ conf {Γ} {Δ , T , T , Number} {r}
(select then is) (s , b , a , Number (suc c)) ls ss
—→ conf is (s , b) ls ss
Our numeric instructions simply apply their functions and put the return values on the stack.
Similar to select, we have two cases for op-test and
op-rel, which must be chosen based on the output of
f x.
const : ∀ {Γ Δ r is s ls ss x}
------------------------------------------
→ conf {Γ} {Δ} {r} (const x then is) s ls ss
—→ conf is (s , Number x) ls ss
op-unary : ∀ {Γ Δ r is s ls ss f x}
-------------------------------------------------------------------
→ conf {Γ} {Δ , Number} {r} (op-unary f then is) (s , Number x) ls ss
—→ conf is (s , Number (f x)) ls ss
op-binary : ∀ {Γ Δ r is s ls ss f x y}
----------------------------------------------------------------------------------------
→ conf {Γ} {Δ , Number , Number} {r} (op-binary f then is) (s , Number x , Number y) ls ss
—→ conf is (s , Number (f x y)) ls ss
op-testᵀ : ∀ {Γ Δ r is s ls ss f x}
→ f x ≡ true
------------------------------------------------------------------
→ conf {Γ} {Δ , Number} {r} (op-test f then is) (s , Number x) ls ss
—→ conf is (s , Number 1) ls ss
op-testꟳ : ∀ {Γ Δ r is s ls ss f x}
→ ¬ (f x ≡ true)
------------------------------------------------------------------
→ conf {Γ} {Δ , Number} {r} (op-test f then is) (s , Number x) ls ss
—→ conf is (s , Number 0) ls ss
op-relᵀ : ∀ {Γ Δ r is s ls ss f x y}
→ f x y ≡ true
-------------------------------------------------------------------------------------
→ conf {Γ} {Δ , Number , Number} {r} (op-rel f then is) (s , Number x , Number y) ls ss
—→ conf is (s , Number 1) ls ss
op-relꟳ : ∀ {Γ Δ r is s ls ss f x y}
→ ¬ (f x y ≡ true)
-------------------------------------------------------------------------------------
→ conf {Γ} {Δ , Number , Number} {r} (op-rel f then is) (s , Number x , Number y) ls ss
—→ conf is (s , Number 0) ls ss
As described above, when we enter a block we save the current
state in our list of previous frames, and start with an empty stack.
We know that the inner instruction sequence will have the correct
end state because of how we define our block
instruction, but Agda needs us to spell it out anyway.
block : ∀ {Γ Δ r r′ s ls ss}
{is : Γ ⊢ Δ +ʳ r′ ⟫ ∅ +ʳ r}
{is′ : Γ +ˡ r′ ⊢ ∅ ⟫ ∅ +ʳ r′}
-------------------------------------------
→ conf {Γ} {Δ} {r} (block is′ then is) s ls ss
—→ conf is′ ∅ ls (ss , is ⦂ s)
loop : ∀ {Γ Δ r r′ s ls ss}
{is : Γ ⊢ Δ +ʳ r′ ⟫ ∅ +ʳ r}
{is′ : Γ +ˡ none ⊢ ∅ ⟫ ∅ +ʳ r′}
------------------------------------------------------------
→ conf (loop is′ then is) s ls ss —→ conf is′ ∅ ls (ss , is′ ↬ is ⦂ s)
ifᵀ : ∀ {Γ Δ n r r′ s ls ss}
{is : Γ ⊢ Δ +ʳ r′ ⟫ ∅ +ʳ r}
{ts : Γ +ˡ r′ ⊢ ∅ ⟫ ∅ +ʳ r′}
{es : Γ +ˡ r′ ⊢ ∅ ⟫ ∅ +ʳ r′}
----------------------------------------------------------------------------
→ conf {Γ} {Δ , Number} {r} (if ts else es then is) (s , Number (suc n)) ls ss
—→ conf ts ∅ ls (ss , is ⦂ s)
ifꟳ : ∀ {Γ Δ r r′ s ls ss}
{is : Γ ⊢ Δ +ʳ r′ ⟫ ∅ +ʳ r}
{ts : Γ +ˡ r′ ⊢ ∅ ⟫ ∅ +ʳ r′}
{es : Γ +ˡ r′ ⊢ ∅ ⟫ ∅ +ʳ r′}
-------------------------------------------------------------------------
→ conf {Γ} {Δ , Number} {r} (if ts else es then is) (s , Number zero) ls ss
—→ conf es ∅ ls (ss , is ⦂ s)
br uses the br-unwind function defined
above.
br : ∀ {Γ Δ Δ′ r ls ss v}
{lbl : (Context.labels Γ) ∋ v}
{s : OpStack (Δ +ʳ v)}
{is : Γ ⊢ Δ′ ⟫ ∅ +ʳ r}
------------------------------------------------------------------
→ conf {Γ} {Δ +ʳ v} {r} (br lbl then is) s ls ss
—→ (br-unwind lbl s ls ss)
br-if is similar, again with two cases for zero and
non-zero.
br-ifᵀ : ∀ {Γ Δ r ls ss v x}
{lbl : (Context.labels Γ) ∋ v}
{s : OpStack (Δ +ʳ v)}
{is : Γ ⊢ Δ +ʳ v ⟫ ∅ +ʳ r}
-------------------------------------------------------------------------------
→ conf {Γ} {(Δ +ʳ v) , Number} {r} (br-if lbl then is) (s , Number (suc x)) ls ss
—→ (br-unwind lbl s ls ss)
br-ifꟳ : ∀ {Γ Δ r ls ss v}
{lbl : (Context.labels Γ) ∋ v}
{s : OpStack (Δ +ʳ v)}
{is : Γ ⊢ Δ +ʳ v ⟫ ∅ +ʳ r}
----------------------------------------------------------------------------
→ conf {Γ} {(Δ +ʳ v) , Number} {r} (br-if lbl then is) (s , Number zero) ls ss
—→ conf is s ls ss
Now we define our local instructions, which only affect the locals part of our configuration.
local-get : ∀ {Γ Δ r T is s ls ss}
{l : (Context.locals Γ) ∋ T}
→ conf {Γ} {Δ} {r} (local-get l then is) s ls ss
—→ conf is (s , ls [ l ]) ls ss
local-set : ∀ {Γ Δ r T is s ls ss}
{l : (Context.locals Γ) ∋ T}
{v : Value T}
→ conf {Γ} {Δ , T} {r} (local-set l then is) (s , v) ls ss
—→ conf is s (ls [ l ]:= v) ss
local-tee : ∀ {Γ Δ r T is s ls ss}
{l : (Context.locals Γ) ∋ T}
{v : Value T}
→ conf {Γ} {Δ , T} {r} (local-tee l then is) (s , v) ls ss
—→ conf is (s , v) (ls [ l ]:= v) ss
Finally, we need to be able to go back up to our previous frames once we’ve completed our current list of instructions.
ascend : ∀ {Γ r ls Δ′ r′ cont s s′ ss′}
--------------------------------------------------------------------
→ conf {Γ +ˡ r} {∅ +ʳ r} {r} done s ls (ss′ , cont ⦂ s′)
—→ conf {Γ} {Δ′ +ʳ r} {r′} cont (transplant-ret s s′) ls ss′
ascendˡ : ∀ {Γ r ls Δ′ r′ s s′ ss′ start cont}
--------------------------------------------------------------------------
→ conf {Γ +ˡ none} {∅ +ʳ r} {r} done s ls (ss′ , start ↬ cont ⦂ s′)
—→ conf {Γ} {Δ′ +ʳ r} {r′} cont (transplant-ret s s′) ls ss′
Progress
Now that we have our reduction rules, we can prove progress: Either all of our instructions are finished, or we can reduce further. Proving progress is relatively straightforward, since we mostly only care about what instruction is up next.
data Progress : Configuration → Set where
done : ∀ {r lsty}
{ls : Locals lsty}
{s : OpStack (∅ +ʳ r)}
--------------------------
→ Progress (conf done s ls (∅ {r}))
step : ∀ {C C′}
→ C —→ C′
----------
→ Progress C
progress : ∀ (c : Configuration)
-------------------
→ Progress c
Here are our parametric instructions, with different cases for
the branch we take when selecting. To make things clearer, we will
use _ for all the parts of the configuration that
aren’t modified by the reduction step.
progress (conf (nop then is) _ _ _) = step nop
progress (conf (drop then is) (_ , v) _ _) = step drop
progress (conf (select then is) (_ , _ , a , Number zero) _ _) = step select-0
progress (conf (select then is) (_ , b , _ , Number (suc _)) _ _) = step select-1
progress (conf (local-get i then _) _ ls _) = step local-get
progress (conf (local-set i then _) (_ , v) ls _) = step local-set
progress (conf (local-tee i then _) (_ , v) ls _) = step local-tee
For op-test and op-rel we must decide
if f x is true ourselves.
progress (conf (const x then is) s _ _) = step const
progress (conf (op-unary f then is) (_ , Number n) _ _) = step op-unary
progress (conf (op-binary f then is) (_ , Number x , Number y) _ _) = step op-binary
progress (conf (op-test f then is) (_ , Number x) _ _) with (f x) ≟ true
... | (yes pass) = step (op-testᵀ pass)
... | (no fail) = step (op-testꟳ fail)
progress (conf (op-rel f then is) (_ , Number x , Number y) _ _) with (f x y) ≟ true
... | (yes pass) = step (op-relᵀ pass)
... | (no fail) = step (op-relꟳ fail)
We may deliberately ascend up the frame stack:
progress (conf (block is′ then is) s _ ss) = step block
progress (conf (loop is′ then is) s _ ss) = step loop
progress (conf (if_else_ ts _ then is) (_ , Number (suc n)) _ ss) = step ifᵀ
progress (conf (if_else_ _ fs then is) (_ , Number zero) _ ss) = step ifꟳ
progress (conf (br lbl then is) s _ ss) = step br
progress (conf (br-if lbl then is) (_ , Number (suc _)) _ ss) = step br-ifᵀ
progress (conf (br-if lbl then is) (_ , Number zero) _ ss) = step br-ifꟳ
Finally, if we run out of instructions in the current frame we
must try to use the next one up. Again, we know that we will have
the return type when we need it thanks to the constraints in
Configuration.
progress (conf done _ _ (ss′ , is′ ⦂ s′)) = step ascend
progress (conf done _ _ (ss′ , ls ↬ is′ ⦂ s′)) = step ascendˡ
We are only done when all instructions are executed, and we can’t go up any more stack frames.
progress (conf done s _ ∅) = done
Thanks to the constraints on our instruction type and configuration types, we have also proved the type system’s soundness: Types are respected throughout the whole run of our program, and the stack will always be transformed as expected.
Evaluation
We now adapt our proof of progress to be used as an evaluator.
First, we define big-step reductions as the reflexive and transitive closure of reductions, with some syntax to make writing them nicer.
data _—↠_ : Configuration → Configuration → Set where
_∎ : (c : Configuration)
-------------------
→ c —↠ c
step—→ : ∀ {B C}
→ (A : Configuration)
→ B —↠ C
→ A —→ B
-------
→ A —↠ C
pattern _—→⟨_⟩_ A A—→B B—↠C = step—→ A B—↠C A—→B
begin_ : ∀ {A B}
→ A —↠ B
------
→ A —↠ B
begin_ A—↠B = A—↠B
This allows us to write chains of reductions, similar to an
execution trace. For example, here is how the 2+2
program from earlier might be executed.
_ : conf Language.test-2+2 ∅ ∅ ∅ —↠ conf done (∅ , Number 4) ∅ ∅
_ = begin
conf (const 2 then
const 2 then
add then done) ∅ ∅ (∅ {some Number})
—→⟨ const ⟩
conf (const 2 then
add then done) (∅ , Number 2) ∅ ∅
—→⟨ const ⟩
conf (add then done) (∅ , Number 2 , Number 2) ∅ ∅
—→⟨ op-binary ⟩
conf done (∅ , Number 4) ∅ ∅
∎
We run our evaluator with a set amount of gas, which keeps Agda’s termination checker happy.
record Gas : Set where
constructor gas
field
amount : Nat
data Finished (c : Configuration) : Set where
done : ∀ {r s′ LS ls′}
→ c —↠ conf {record {labels = ∅; locals = LS}} {∅ +ʳ r} {r} done s′ ls′ ∅
----------
→ Finished c
out-of-gas :
----------
Finished c
eval : Gas → (C : Configuration) → Finished C
eval (gas zero) C = out-of-gas
eval (gas (suc n)) C with progress C
... | done = done (C ∎)
... | step {_} {C′} C→C′ with eval (gas n) C′
... | done C′↠D = done (C —→⟨ C→C′ ⟩ C′↠D)
... | out-of-gas = out-of-gas
Now we can evaluate the programs we defined earlier.
-- 2 + 2 = 4
_ : eval (gas 100) (conf Language.test-2+2 ∅ ∅ ∅) ≡ done (
conf (const 2 then const 2 then op-binary (λ x y → x + y) then done) ∅ ∅ ∅
—→⟨ const ⟩
conf (const 2 then op-binary (λ x y → x + y) then done) (∅ , Number 2) ∅ ∅
—→⟨ const ⟩
conf (op-binary (λ x y → x + y) then done) (∅ , Number 2 , Number 2) ∅ ∅
—→⟨ op-binary ⟩
conf done (∅ , Number 4) ∅ ∅
∎)
_ = refl
-- if false then 42 else 84 -> 84
_ : eval (gas 100) (conf Language.test-if (∅ , Number 0) ∅ ∅) ≡ done (
conf ((if const 42 then done else (const 84 then done)) then done)
(∅ , Number 0)
∅
∅
—→⟨ ifꟳ ⟩
conf (const 84 then done)
∅
∅
(∅ , done ⦂ ∅)
—→⟨ const ⟩
conf done
(∅ , Number 84)
∅
(∅ , done ⦂ ∅)
—→⟨ ascend ⟩
conf done
(∅ , Number 84)
∅
∅
∎)
_ = refl
-- if true then 42 else 84 -> 42
_ : eval (gas 100) (conf Language.test-if (∅ , Number 1) ∅ ∅) ≡ done (
conf ((if const 42 then done else (const 84 then done)) then done)
(∅ , Number 1)
∅
∅
—→⟨ ifᵀ ⟩
conf (const 42 then done)
∅
∅
(∅ , done ⦂ ∅)
—→⟨ const ⟩
conf done
(∅ , Number 42)
∅
(∅ , done ⦂ ∅)
—→⟨ ascend ⟩
conf done
(∅ , Number 42)
∅
∅
∎)
_ = refl
-- infinite loop will never finish
_ : eval (gas 100) (conf Language.test-infloop ∅ ∅ ∅) ≡ out-of-gas
_ = refl
-- loop 3 times
_ : eval (gas 100) (conf Language.test-loop3 ∅ (∅ , Number 0) ∅) ≡ done (
conf (const 3 then local-set Z then loop (local-get Z then const 1 then op-binary (λ x y → x - y) then local-tee Z then br-if Z then done) then done)
∅
(∅ , Number 0)
∅
—→⟨ const ⟩
conf
(local-set Z then
loop
(local-get Z then
const 1 then
op-binary (λ x y → x - y) then local-tee Z then br-if Z then done)
then done)
(∅ , Number 3) (∅ , Number 0) ∅
—→⟨ local-set ⟩
conf
(loop
(local-get Z then
const 1 then
op-binary (λ x y → x - y) then local-tee Z then br-if Z then done)
then done)
∅ (∅ , Number 3) ∅
—→⟨ loop ⟩ -- Loop 1
conf
(local-get Z then
const 1 then
op-binary (λ x y → x - y) then local-tee Z then br-if Z then done)
∅ (∅ , Number 3)
(∅ ,
local-get Z then
const 1 then
op-binary (λ x y → x - y) then local-tee Z then br-if Z then done
↬ done ⦂ ∅)
—→⟨ local-get ⟩
conf
(const 1 then
op-binary (λ x y → x - y) then local-tee Z then br-if Z then done)
(∅ , Number 3) (∅ , Number 3)
(∅ ,
local-get Z then
const 1 then
op-binary (λ x y → x - y) then local-tee Z then br-if Z then done
↬ done ⦂ ∅)
—→⟨ const ⟩
conf
(op-binary (λ x y → x - y) then local-tee Z then br-if Z then done)
(∅ , Number 3 , Number 1) (∅ , Number 3)
(∅ ,
local-get Z then
const 1 then
op-binary (λ x y → x - y) then local-tee Z then br-if Z then done
↬ done ⦂ ∅)
—→⟨ op-binary ⟩
conf (local-tee Z then br-if Z then done) (∅ , Number 2)
(∅ , Number 3)
(∅ ,
local-get Z then
const 1 then
op-binary (λ x y → x - y) then local-tee Z then br-if Z then done
↬ done ⦂ ∅)
—→⟨ local-tee ⟩
conf (br-if Z then done) (∅ , Number 2) (∅ , Number 2)
(∅ ,
local-get Z then
const 1 then
op-binary (λ x y → x - y) then local-tee Z then br-if Z then done
↬ done ⦂ ∅)
—→⟨ br-ifᵀ ⟩ -- Loop 2
conf
(local-get Z then
const 1 then
op-binary (λ x y → x - y) then local-tee Z then br-if Z then done)
∅ (∅ , Number 2)
(∅ ,
local-get Z then
const 1 then
op-binary (λ x y → x - y) then local-tee Z then br-if Z then done
↬ done ⦂ ∅)
—→⟨ local-get ⟩
conf
(const 1 then
op-binary (λ x y → x - y) then local-tee Z then br-if Z then done)
(∅ , Number 2) (∅ , Number 2)
(∅ ,
local-get Z then
const 1 then
op-binary (λ x y → x - y) then local-tee Z then br-if Z then done
↬ done ⦂ ∅)
—→⟨ const ⟩
conf
(op-binary (λ x y → x - y) then local-tee Z then br-if Z then done)
(∅ , Number 2 , Number 1) (∅ , Number 2)
(∅ ,
local-get Z then
const 1 then
op-binary (λ x y → x - y) then local-tee Z then br-if Z then done
↬ done ⦂ ∅)
—→⟨ op-binary ⟩
conf (local-tee Z then br-if Z then done) (∅ , Number 1)
(∅ , Number 2)
(∅ ,
local-get Z then
const 1 then
op-binary (λ x y → x - y) then local-tee Z then br-if Z then done
↬ done ⦂ ∅)
—→⟨ local-tee ⟩
conf (br-if Z then done) (∅ , Number 1) (∅ , Number 1)
(∅ ,
local-get Z then
const 1 then
op-binary (λ x y → x - y) then local-tee Z then br-if Z then done
↬ done ⦂ ∅)
—→⟨ br-ifᵀ ⟩ -- Loop 3
conf
(local-get Z then
const 1 then
op-binary (λ x y → x - y) then local-tee Z then br-if Z then done)
∅ (∅ , Number 1)
(∅ ,
local-get Z then
const 1 then
op-binary (λ x y → x - y) then local-tee Z then br-if Z then done
↬ done ⦂ ∅)
—→⟨ local-get ⟩
conf
(const 1 then
op-binary (λ x y → x - y) then local-tee Z then br-if Z then done)
(∅ , Number 1) (∅ , Number 1)
(∅ ,
local-get Z then
const 1 then
op-binary (λ x y → x - y) then local-tee Z then br-if Z then done
↬ done ⦂ ∅)
—→⟨ const ⟩
conf
(op-binary (λ x y → x - y) then local-tee Z then br-if Z then done)
(∅ , Number 1 , Number 1) (∅ , Number 1)
(∅ ,
local-get Z then
const 1 then
op-binary (λ x y → x - y) then local-tee Z then br-if Z then done
↬ done ⦂ ∅)
—→⟨ op-binary ⟩
conf (local-tee Z then br-if Z then done) (∅ , Number 0)
(∅ , Number 1)
(∅ ,
local-get Z then
const 1 then
op-binary (λ x y → x - y) then local-tee Z then br-if Z then done
↬ done ⦂ ∅)
—→⟨ local-tee ⟩
conf (br-if Z then done) (∅ , Number 0) (∅ , Number 0)
(∅ ,
local-get Z then
const 1 then
op-binary (λ x y → x - y) then local-tee Z then br-if Z then done
↬ done ⦂ ∅)
—→⟨ br-ifꟳ ⟩
conf done ∅ (∅ , Number 0)
(∅ ,
local-get Z then
const 1 then
op-binary (λ x y → x - y) then local-tee Z then br-if Z then done
↬ done ⦂ ∅)
—→⟨ ascendˡ ⟩
conf done ∅ (∅ , Number 0) ∅
∎)
_ = refl
As a final test, here is a program for calculating the fibonacci sequence.
fib : record {labels = ∅; locals = ∅ , Number , Number , Number} ⊢ ∅ , Number ⟫ ∅ , Number
fib = local-set (S (S Z)) then -- n
const 0 then
local-set (S Z) then -- previous
const 1 then
local-set Z then -- acc
loop (
-- load previous and acc
local-get (S Z) then
local-get Z then
-- previous = acc
local-tee (S Z) then
-- acc = acc + previous
add then
local-set Z then
-- subtract from loop variable
local-get (S (S Z)) then
const 1 then
sub then
local-tee (S (S Z)) then
br-if Z then done
) then
local-get Z then done
We avoid giving the full reduction sequence for length reasons.
-- 7th fibonacci number = 21
_ : eval (gas 100) (conf fib (∅ , Number 7) (∅ , Number 0 , Number 0 , Number 0) ∅) ≡ done {s′ = ∅ , Number 21} _
_ = refl
Future Work
This section outlines how you could extend this base to cover the rest of Wasm 1.0. Some of the work has been started in this repository, but it’s a bit of a mess. This is more of a notes section than actual exposition.
Modules
In actual WebAssembly, all code is executed in the context of a
Module (spec version 2.5). To facilitate this, we can create a
seperate type (ModuleTy) and parameterise our context
and configuration based on that. We’ll refer to this as μ below.
Modules also require keeping track of some state that is persistent between function calls (memories, globals, etc). This means you’d probably end up with the following things to keep track of during execution/checking:
-
ModuleTy- what functions/globals/memories/etc are available -
GlobalStore- values of globals, memories, etc, in the shape specified byModuleTy -
Context- types of locals,brpoints, and return type -
Frames- actual data for previous control flow blocks, in the shape specified byContext -
LocalStore- locals and other function-specific values, in the shape specified byContext -
OpStack- the operand stack -
is- remaining instructions in the current control flow block
Some of these could probably be compacted together to make things less unweildy.
In our repository, these are specified in
src/Wasm/Types.agda and
src/Wasm/Configuration.agda.
Function calls
Function calls can be added in a similar way to frames, by adding
a new constructor to Frames, like this:
data Frames where
_,save_⨾_⨾_ : ∀ {μ Δ ε r r′ ls} {Γ : Context μ}
→ Frames μ r Γ
→ Locals Γ
→ OpStack Δ
→ Γ ⊢ Δ +ʳ r′ ⟫ ∅ +ʳ r
-------------------
→ Frames r′ record {labels = ∅; locals = ls; ret = r′} ε
We also need to add a ret field to our
Context, to keep track of how we can use the
return instruction.
We extend our instructions like this:
data _⊢_⟩_ where
return : ∀ {μ Δ Δ′ t} {Γ : Context μ}
→ (Context.ret Γ) ≡ just t
-------------------
→ Γ ⊢ Δ +ʳ t ⟩ Δ′
call : ∀ {μ Δ A L R} {Γ : Context μ}
→ (f : (ModuleTy.functions μ) ∋v record { args = A; locals = L; ret = R } )
→ (args : Δ ∋* A)
→ Γ ⊢ Δ ⟩ (discard args) +ʳ R
Where discard x removes length x
elements from the top of the stack. Our return and
ascend-call reduction steps will be pretty much the
same as the existing br and ascend
steps.
Validation & Instantiation
Validation works pretty much like the spec sets out: Mostly this is just a bit tedious and based on applying the same few checks in different ways.
Instantiation is more difficult, as we essentially need to execute with a partially initialised module when instantiating our globals. Probably it would look something like this:
- Infer our module type by looking at the serialised types, etc.
- Validate and execute the global initialisation code under a mostly-empty module as described by the spec.
- Validate the rest of the functions instructions.
- Compile this into a valid configuration, currently on the start function.
- Run the start function, and return both the module type and the persistent state (memories, mutable globals, etc)
There’s a vague start at this in our repository at
src/Wasm/Validation.agda, using the existing
haskell-wasm library and some FFI bindings in
src/Wasm/Untyped.agda.
Bibliography
W3C. (2019, December 5). WebAssembly Core Specification. https://www.w3.org/TR/wasm-core-1/ (Accessed 11th Aug 2023)
Watt, C., Rao, X., Pichon-Pharabod, J., Bodin, M., Gardner, P. (2021). Two Mechanisations of WebAssembly 1.0. In: Huisman, M., Păsăreanu, C., Zhan, N. (eds) Formal Methods. FM 2021. Lecture Notes in Computer Science(), vol 13047. Springer, Cham. https://doi.org/10.1007/978-3-030-90870-6_4 (Accessed 11th Aug 2023)