By Nish Tahir in WebAssembly — 14 Feb 2022

The WebAssembly Text Format

While WebAssembly's primary distribution format is a binary file, the spec also describes a textual representation called the WebAssembly Text Format^[1], or WAT for short. WASM binaries are designed to be as small and compact as possible, which makes them ideal for distribution and transmission over the internet; WAT, by contrast, is designed with human readability in mind. Since every WASM binary can be expressed in WAT, it makes a great decompilation target. We can already see this in action by viewing a WASM module in a browser debugger: Firefox, for example, renders a loaded *.wasm file in its WAT representation.

This means that having some understanding of the format should make it easier to debug WASM modules. So with that motivation in mind, let's explore the format.

The WebAssembly Binary Toolkit^[2] (WABT) comes with a couple of handy tools for converting between WASM and WAT: wasm2wat and wat2wasm. Let's see them in action on the smallest WASM module possible. Running the module through hexdump shows its binary content:

$ hexdump sample.wasm

0000000 00 61 73 6d 01 00 00 00                        
0000008

Breaking this down, the first 4 bytes of the module are the preamble (or magic number) 00 61 73 6d, which reads as \0asm in ASCII and identifies the file as a WASM binary. The next 4 bytes, 01 00 00 00, are the version number: 1, stored as a little-endian 32-bit integer.
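To make the byte layout concrete, here is a small JavaScript sketch (runnable in Node.js or a browser console) that decodes those 8 bytes. The `readHeader` helper and `EMPTY_MODULE` constant are hypothetical names for illustration; `WebAssembly.validate` is the standard JavaScript API for checking that a byte buffer is a valid module.

```javascript
// The 8 bytes printed by hexdump above: magic number followed by version.
const EMPTY_MODULE = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

// Decode the preamble of a WASM binary (hypothetical helper).
function readHeader(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // Bytes 0-3: the magic number, which spells "\0asm" in ASCII.
  const magic = String.fromCharCode(...bytes.subarray(0, 4));
  // Bytes 4-7: the version, stored as a little-endian 32-bit integer.
  const version = view.getUint32(4, /* littleEndian */ true);
  return { magic, version };
}

const header = readHeader(EMPTY_MODULE);
console.log(JSON.stringify(header.magic), header.version); // "\u0000asm" 1
console.log(WebAssembly.validate(EMPTY_MODULE));          // true
```

Reading the version with `getUint32(4, true)` treats bytes 4 to 7 as little-endian, which is why `01 00 00 00` decodes to 1 rather than 16777216.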

Running the binary through the wasm2wat tool renders the module's WAT representation:

$ wasm2wat sample.wasm 

(module)

This may not be much to look at, but it is the smallest valid WASM module.

S-Expression representation

WAT modules are written as one big Symbolic Expression^[3], or S-Expression. This is a notation for nested, tree-like data structures, which makes it particularly well suited to representing Abstract Syntax Trees. To illustrate this, let's build an S-Expression representation of the following expression:

A + B * C

We can resolve ambiguities and make the order of operations explicit by using parentheses:

(A + (B * C))

And represent the expression as a tree:

    +
   / \
  A   *
     / \
    B   C

We can textually represent this tree in JSON pretty easily.

{
  "add": {
    "lhs": "A",
    "rhs": {
      "mult": {
        "lhs": "B",
        "rhs": "C"
      }
    }
  }
}

At this point, we have something fairly close to an S-Expression. We can get closer by replacing the braces with parentheses:

(
  "add": (
    "lhs": "A",
    "rhs": (
      "mult": (
        "lhs": "B",
        "rhs": "C"
      )
    )
  )
)

Next, let's remove the tokens in our JSON that don't contribute any information about the tree structure: the quotes, commas and colons, as well as the lhs and rhs keys (we know that add and mult operations always take two operands, in order).

(
  add (
    A
    (
      mult (
        B
        C
      )
    )
  )
)

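Finally, let's drop the parentheses that wrapped each operator's operands, so that every operator sits at the head of its own list, and flatten the result onto one line. That gives us our S-Expression: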

(add A (mult B C))
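The same transformation can be sketched in JavaScript as a short recursive function over the JSON tree. `toSExpression` is a hypothetical helper, and it assumes every operator node has exactly one key holding `lhs` and `rhs` operands, as in our example (with `B` and `C` written as strings).

```javascript
// The tree from the JSON example, with every leaf written as a string.
const tree = { add: { lhs: "A", rhs: { mult: { lhs: "B", rhs: "C" } } } };

// Turn a tree node into an S-expression string.
// Leaves are plain strings; an operator node has a single key holding { lhs, rhs }.
function toSExpression(node) {
  if (typeof node === "string") return node;
  const [op] = Object.keys(node);
  const { lhs, rhs } = node[op];
  // The operator goes first, followed by its operands, all inside one pair of parentheses.
  return `(${op} ${toSExpression(lhs)} ${toSExpression(rhs)})`;
}

console.log(toSExpression(tree)); // (add A (mult B C))
```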

Expressions in WAT are written the same way, except that the operations (instructions) we can use are defined by the specification^[4]. Let's rewrite our example from earlier as a valid WAT expression:

(i32.add (local.get 0) (i32.mul (local.get 1) (local.get 2)))

The instructions invoked here are i32.add and i32.mul. Numeric instructions are prefixed with the value type they operate on, in this case the 32-bit integer type i32. The local.get instruction reads the value of a local variable or parameter of the current function. Here we're referencing locals by index, but they can also be given identifiers for convenience. Identifiers are always prefixed with the $ token:

(i32.add (local.get $A) (i32.mul (local.get $B) (local.get $C)))

Before compiling this to WASM, we need to wrap it in a function, which is also expressed as an S-Expression.

(func $test (param $A i32) (param $B i32) (param $C i32) (result i32)
 (i32.add
  (local.get $A)
  (i32.mul
   (local.get $B)
   (local.get $C)
  )
 )
)

Functions are also indexed, but can be given identifiers for convenience. Here we're defining a function with the identifier $test that accepts three i32 parameters and returns an i32. As mentioned earlier, locals (including parameters) are indexed too, which makes parameter names optional. Without these identifiers, our function is still usable, but a lot more difficult to read.

(func $test (param i32) (param i32) (param i32) (result i32)
 (i32.add
  (local.get 0)
  (i32.mul
   (local.get 1)
   (local.get 2)
  )
 )
)

Notice that we reference our parameters just by using their numeric index. They are not prefixed with a $ because they are not identifiers.

There is no explicit return statement: the function body is evaluated, and the value produced by its final expression becomes the function's return value.

To verify that the code we've written so far is valid, we can wrap it in our module from earlier, serialize it with the wat2wasm tool, and inspect the binary output with hexdump.

(module
 (func $test (param $A i32) (param $B i32) (param $C i32) (result i32)
  (i32.add
   (local.get $A)
   (i32.mul
    (local.get $B)
    (local.get $C)
   )
  )
 )
)

$ wat2wasm sample.wat
$ hexdump sample.wasm

0000000 00 61 73 6d 01 00 00 00 01 08 01 60 03 7f 7f 7f
0000010 01 7f 03 02 01 00 0a 0c 01 0a 00 20 00 20 01 20
0000020 02 6c 6a 0b                                    
0000024

We can see that our module is now a lot bigger than when we started: 36 bytes (0x24) instead of 8. We're not going to go into the details of the binary format here, but I encourage you to explore the different parts of the module interactively using the WebAssembly Code Explorer.
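For a rough feel of how those bytes are laid out without a separate tool, here is a JavaScript sketch that walks the module's top-level sections. It relies on the binary format's section layout (a one-byte section id followed by the section size as an unsigned LEB128 integer); the helper names are hypothetical, and the walker does not decode section contents.

```javascript
// The 36 bytes produced by wat2wasm, copied row by row from the hexdump above.
const ARITH_MODULE = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, 0x01, 0x08, 0x01, 0x60, 0x03, 0x7f, 0x7f, 0x7f,
  0x01, 0x7f, 0x03, 0x02, 0x01, 0x00, 0x0a, 0x0c, 0x01, 0x0a, 0x00, 0x20, 0x00, 0x20, 0x01, 0x20,
  0x02, 0x6c, 0x6a, 0x0b,
]);

// Section ids 0-12 as defined by the binary format.
const SECTION_NAMES = ["custom", "type", "import", "function", "table", "memory",
  "global", "export", "start", "element", "code", "data", "datacount"];

// Read an unsigned LEB128 integer; returns [value, offset just past it].
function readU32(bytes, offset) {
  let result = 0;
  let shift = 0;
  let byte;
  do {
    byte = bytes[offset++];
    result |= (byte & 0x7f) << shift; // low 7 bits carry the payload
    shift += 7;
  } while (byte & 0x80);              // high bit set means more bytes follow
  return [result >>> 0, offset];
}

// List each top-level section with its payload size in bytes.
function listSections(bytes) {
  const sections = [];
  let offset = 8; // skip the magic number and version
  while (offset < bytes.length) {
    const id = bytes[offset++];
    let size;
    [size, offset] = readU32(bytes, offset);
    sections.push({ name: SECTION_NAMES[id] ?? `unknown(${id})`, size });
    offset += size; // jump over the section payload
  }
  return sections;
}

console.log(listSections(ARITH_MODULE).map((s) => `${s.name}: ${s.size} bytes`).join(", "));
// type: 8 bytes, function: 2 bytes, code: 12 bytes
```

Adding the 8-byte header and the 2 bytes of id and size that each section spends on itself gives 8 + 10 + 4 + 14 = 36 bytes, matching the hexdump.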

Decompiling our Module

Interestingly, if we run the generated WASM module back through wasm2wat, the output differs from the WAT code we serialized.

$ wasm2wat sample.wasm 

(module
  (type (;0;) (func (param i32 i32 i32) (result i32)))
  (func (;0;) (type 0) (param i32 i32 i32) (result i32)
    local.get 0
    local.get 1
    local.get 2
    i32.mul
    i32.add))

Block comments are written as (; content here ;), which is how wasm2wat labels the index of each type and function; line comments start with ;;.

We can observe that parameter, function, and variable names were not preserved during serialization, because wat2wasm does not write them to the binary by default (its --debug-names option keeps them in a custom "name" section). Only the names of imported and exported items survive, since the host environment uses them to link against the module.

We're also introduced to type declarations, which give us a glimpse into how modules are organized under the hood. Type declarations are written with the type keyword, and each one becomes an indexed entry in the type section of the WASM module. Each entry describes a function signature: a sequence of parameter types followed by a sequence of result types.
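To see how a signature looks on the wire, here is a JavaScript sketch that decodes the single type entry from the binary we produced earlier (the bytes `60 03 7f 7f 7f 01 7f`). The helper name is hypothetical; the byte codes (`0x60` for a function type, `0x7f`/`0x7e`/`0x7d`/`0x7c` for i32/i64/f32/f64) come from the binary format, and the sketch assumes each count fits in one byte, which holds for small signatures like ours.

```javascript
// Value type codes from the binary format.
const VALUE_TYPES = { 0x7f: "i32", 0x7e: "i64", 0x7d: "f32", 0x7c: "f64" };

// Decode one function type entry starting at `offset` (hypothetical helper).
// Assumes parameter and result counts are below 128, so each fits in one LEB128 byte.
function decodeFuncType(bytes, offset) {
  if (bytes[offset] !== 0x60) throw new Error("expected a function type (0x60)");
  let i = offset + 1;
  const params = [];
  for (let n = bytes[i++]; n > 0; n--) params.push(VALUE_TYPES[bytes[i++]]);
  const results = [];
  for (let n = bytes[i++]; n > 0; n--) results.push(VALUE_TYPES[bytes[i++]]);
  return { params, results };
}

// Payload of our module's type section: an entry count of 1, then 60 03 7f 7f 7f 01 7f.
const typeSection = new Uint8Array([0x01, 0x60, 0x03, 0x7f, 0x7f, 0x7f, 0x01, 0x7f]);
const sig = decodeFuncType(typeSection, 1); // index 0 holds the entry count
console.log(`(func (param ${sig.params.join(" ")}) (result ${sig.results.join(" ")}))`);
// (func (param i32 i32 i32) (result i32))
```

The printed signature matches the (type (;0;) ...) line that wasm2wat emitted.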

Declaring a function with the func keyword creates an entry in the function section that references the corresponding entry in the type section (the function's body is stored separately, in the code section). This means that it's perfectly valid to declare a function by referencing a type declaration by its index.

(module
 (type (func (param i32) (param i32) (param i32) (result i32)))
 (func $test (type 0)
  (i32.add
   (local.get 0)
   (i32.mul
    (local.get 1)
    (local.get 2)
   )
  )
 )
)

And as with everything else that we've been able to reference by index, we can give types identifiers and reference them by name instead.

(module
 (type $someInterface (func (param i32) (param i32) (param i32) (result i32)))
 (func $test (type $someInterface)
  (i32.add
   (local.get 0)
   (i32.mul
    (local.get 1)
    (local.get 2)
   )
  )
 )
)

The Stack Machine Representation

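What may be even more surprising is that our addition and multiplication expressions were no longer represented as a nested S-expression after decompilation. This is because WAT has two textual representations for function bodies: the nested "folded" form we wrote by hand, and the flat, linear instruction sequence that wasm2wat emits by default (its --fold-exprs option produces the folded form instead).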

  (func (;0;) (type 0) (param i32 i32 i32) (result i32)
    local.get 0
    local.get 1
    local.get 2
    i32.mul
    i32.add)

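Here each instruction is an operation on WASM's operand stack: an implicit stack of values that instructions pop their inputs from and push their results onto (this is separate from the module's linear memory). If we were to illustrate the add instruction in JavaScript, it might look something like this: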

const stack = [];

function add() {
  // The top of the stack holds the second (right-hand) operand
  const right = stack.pop();
  const left = stack.pop();
  stack.push(left + right);
}

Notice that the add instruction expects two operands on the stack, and rather than returning a value, it pushes the result of the operation back onto the stack.

With this in mind, let's take a look at the instructions in a bit more detail (the stack is shown with its top on the left):

;; Push A (parameter 0) onto the stack
local.get 0  ;; [A]

;; Push B (parameter 1) onto the stack
local.get 1  ;; [B, A]

;; Push C (parameter 2) onto the stack
local.get 2  ;; [C, B, A]

;; Pop the top 2 values (C, then B),
;; multiply them and push the result onto the stack
i32.mul      ;; [B * C, A]

;; Pop the top 2 values (B * C, then A),
;; add them and push the result onto the stack
i32.add      ;; [A + (B * C)]

;; The value left on the stack is the result of the function call
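We can mimic this trace with a tiny JavaScript interpreter. It is a hypothetical sketch that supports only the three instructions used above and represents each instruction as an `[opcode, immediate]` pair; it uses `Math.imul` and `| 0` to approximate i32 wrap-around rather than plain JavaScript arithmetic.

```javascript
// Run a flat list of instructions against an operand stack.
// Only local.get, i32.mul and i32.add are supported in this sketch.
function run(instructions, locals) {
  const stack = [];
  for (const [op, arg] of instructions) {
    switch (op) {
      case "local.get":
        stack.push(locals[arg]); // push the indexed parameter
        break;
      case "i32.mul": {
        const right = stack.pop(); // top of the stack is the second operand
        const left = stack.pop();
        stack.push(Math.imul(left, right)); // 32-bit wrapping multiply
        break;
      }
      case "i32.add": {
        const right = stack.pop();
        const left = stack.pop();
        stack.push((left + right) | 0); // wrap to a signed 32-bit integer
        break;
      }
      default:
        throw new Error(`unsupported instruction: ${op}`);
    }
  }
  return stack.pop(); // the value left on the stack is the result
}

const body = [["local.get", 0], ["local.get", 1], ["local.get", 2], ["i32.mul"], ["i32.add"]];
console.log(run(body, [1, 2, 3])); // 7
```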

Although this flat form looks different from the nested S-expression we wrote earlier, both describe exactly the same sequence of stack machine operations: the folded form is shorthand that lists each instruction's operands inside its parentheses. This works because, at its core, WASM is a stack machine^[5][6].
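One way to see why the two forms are equivalent is to unfold the nested form mechanically: emit each operand expression first, then the instruction that consumes those operands. The JavaScript sketch below does this for expressions written as nested arrays (a hypothetical encoding of the S-expression, not a real WAT parser); it ignores structured control instructions such as block, loop and if, whose folded syntax is more involved.

```javascript
// Unfold a nested expression into a flat instruction list.
// Nested arrays are operand expressions; anything else is an immediate (e.g. an index).
function unfold(expr, out = []) {
  const [op, ...rest] = expr;
  const immediates = [];
  for (const item of rest) {
    if (Array.isArray(item)) unfold(item, out); // operands are emitted first
    else immediates.push(item);                 // e.g. the 0 in (local.get 0)
  }
  out.push([op, ...immediates].join(" "));      // then the instruction itself
  return out;
}

// (i32.add (local.get 0) (i32.mul (local.get 1) (local.get 2)))
const folded = ["i32.add", ["local.get", 0], ["i32.mul", ["local.get", 1], ["local.get", 2]]];
console.log(unfold(folded).join("\n"));
```

This prints the same five instructions that wasm2wat produced, and the same rule turns (call $log (i32.const 0)) into i32.const 0 followed by call $log.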

Imports and Exports

The main way for a WASM module to interact with the host environment is through imports and exports. We can import a function that we expect the host environment to provide by adding an import declaration to our module.

(import "console" "log" (func $log (param i32)))

Here we're importing a log function from the console module and binding it to the $log identifier. The host environment (in this case, the browser) supplies the imported function through an import object passed to the WASM instantiation API:

const importObject = {
  console: {
    log: function(arg) {
      console.log(arg);
    }
  }
};

WebAssembly.instantiateStreaming(fetch('sample.wasm'), importObject)

We can then call the imported function from within our module using the call instruction.

(module
  (import "console" "log" (func $log (param i32)))
  (func $test (param i32) (param i32) (param i32)
    (call $log (i32.const 0))
  )
)

Since imported functions also have a function type, they can reference a type declaration by its index as well:

(module
  (type (func (param i32)))
  (import "console" "log" (func $log (type 0)))
  (func $test (param i32) (param i32) (param i32)
    (call $log (i32.const 0))
  )
)

Exports are declared using the export keyword and require a name that the host environment can use to look them up. An export declaration references a function by its index, but as elsewhere, we can use an identifier instead.

(export "test" (func $test))

The host environment can then call the exported function. In JavaScript, exported functions are available on the instance's exports object:

// The module imports console.log, so the import object is still required
WebAssembly.instantiateStreaming(fetch('sample.wasm'), importObject)
  .then(({ instance }) => {
    instance.exports.test(1, 2, 3);
  });
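To try the whole import/export round trip without a web server, the sketch below hand-assembles the binary for the module above (with the export added) and instantiates it in Node.js using the synchronous `WebAssembly.Module` and `WebAssembly.Instance` constructors instead of `instantiateStreaming`. The byte array is our own encoding, not wat2wasm output, so treat it as an illustration; `WebAssembly.validate(LOG_MODULE)` is a quick way to check it. Imported functions come first in the function index space, which is why the export refers to $test as function index 1.

```javascript
// Hand-assembled binary (our own encoding) for:
// (module
//   (import "console" "log" (func $log (param i32)))
//   (func $test (param i32) (param i32) (param i32)
//     (call $log (i32.const 0)))
//   (export "test" (func $test)))
const LOG_MODULE = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,                   // magic + version
  0x01, 0x0b, 0x02, 0x60, 0x01, 0x7f, 0x00,                         // type section: type 0 = (i32) -> ()
  0x60, 0x03, 0x7f, 0x7f, 0x7f, 0x00,                               //               type 1 = (i32 i32 i32) -> ()
  0x02, 0x0f, 0x01, 0x07, 0x63, 0x6f, 0x6e, 0x73, 0x6f, 0x6c, 0x65, // import section: "console"
  0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00,                               //   "log", a function of type 0
  0x03, 0x02, 0x01, 0x01,                                           // function section: $test has type 1
  0x07, 0x08, 0x01, 0x04, 0x74, 0x65, 0x73, 0x74, 0x00, 0x01,       // export section: "test" = function 1
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x41, 0x00, 0x10, 0x00, 0x0b,       // code: i32.const 0, call 0, end
]);

// Values the module passes to the imported log function.
const logged = [];

const importObject = {
  console: {
    log: (arg) => {
      logged.push(arg);
      console.log(arg);
    },
  },
};

// Synchronous compile and instantiate; acceptable for a module this small.
const instance = new WebAssembly.Instance(new WebAssembly.Module(LOG_MODULE), importObject);
instance.exports.test(1, 2, 3); // calls $log with 0
```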

Conclusion

What we've looked at here only scratches the surface of the WebAssembly Text Format. It's a very powerful tool that makes WebAssembly much more approachable than a lot of other standards out there, and having some understanding of how it works can make a big difference when you encounter it in the wild.

1: MDN Web Docs. (n.d.). Understanding WebAssembly text format. [online] Available at: https://developer.mozilla.org/en-US/docs/WebAssembly/Understanding_the_text_format [Accessed 15 Feb. 2022].

2: GitHub. (2020). WebAssembly/wabt. [online] Available at: https://github.com/WebAssembly/wabt [Accessed 15 Feb. 2022].

3: Mit.edu. (2022). [online] Available at: http://people.csail.mit.edu/rivest/Sexp.txt [Accessed 15 Feb. 2022].

4: webassembly.github.io. (n.d.). Instructions — WebAssembly 1.1 (Draft 2022-01-28). [online] Available at: https://webassembly.github.io/spec/core/text/instructions.html [Accessed 15 Feb. 2022].

5: users.ece.cmu.edu. (n.d.). Stack Computers: 3.2 A GENERIC STACK MACHINE. [online] Available at: https://users.ece.cmu.edu/~koopman/stack_computers/sec3_2.html [Accessed 15 Feb. 2022].

6: stanford-cs242.github.io. (n.d.). CS 242: Stack machines and assembly. [online] Available at: https://stanford-cs242.github.io/f18/lectures/04-1-webassembly-practice.html [Accessed 15 Feb. 2022].


Subscribe to Another Dev's Two Cents