Interacting with WebAssembly memory
One of the most interesting features of WebAssembly is its memory model. Despite providing direct access to and control over raw bytes, it does so in a way that offers more safety than one would typically expect from low-level environments like C/C++. WASM memory is exposed as a linear memory buffer that is sandboxed and managed by the host runtime. JavaScript runtimes, for example, allocate an ArrayBuffer with defined memory bounds. Since this memory is just a JavaScript object, it's subject to the same rules and eventual garbage collection as every other JavaScript object.
The content of the WASM memory, however, may be freely manipulated by the host runtime as well as any runtime used by the module. Linear memory indexes may be treated as memory addresses, a familiar model for low-level systems, and can be used as a means to send complex data between the WASM module and the host runtime. This is usually abstracted away by your toolchain: Rust has the excellent wasm-bindgen tool, and Emscripten provides interoperability APIs. However, there's a lot we can learn by manipulating WASM memory directly, so I won't be relying on those in this exercise.
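The module-side Rust source isn't shown here, so as a working assumption, here's a hypothetical sketch of an exported raw_memory_access function that writes the value 99 into linear memory at address 4, matching what the JavaScript that follows reads back. The write_value helper is my own addition for illustration.

```rust
// Hypothetical module-side code; the original implementation isn't
// shown in the article. Inside the WASM sandbox, a small linear
// memory index like 4 is a valid address to write through.
unsafe fn write_value(ptr: *mut u8) {
    *ptr = 99;
}

#[no_mangle]
pub unsafe extern "C" fn raw_memory_access() {
    // Treat linear memory index 4 as a raw pointer and store 99 there.
    write_value(4 as *mut u8);
}
```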
We can access this value in memory from JavaScript by acquiring a reference to the module's memory.
const fs = require('fs');
const wasmBuffer = fs.readFileSync(
  '../wasm/target/wasm32-unknown-unknown/release/module.wasm');

(async function execute() {
  const module = await WebAssembly.instantiate(wasmBuffer);
  // Get a reference to the WASM memory buffer and the functions
  // that have been exported.
  const { memory, raw_memory_access } = module.instance.exports;
  // Use a let binding here; this will become relevant later.
  let buffer = new Uint8Array(memory.buffer);
  raw_memory_access();
  const value = buffer[4];
  console.log(value);
})();
// 99
I'm grabbing the exported memory using the WebAssembly API and constructing a TypedArray view, a Uint8Array in this case, for safer access into the underlying data. Since we know the address of the stored data ahead of time, we can read it from the memory buffer with the index access operator.
Setting a value at a memory address is equally simple:
buffer[5] = 44;
On the Rust side, we can treat the memory address as a pointer and obtain the referenced value by dereferencing it.
#[no_mangle]
pub unsafe extern "C" fn mem_get(ptr: *const u8) -> u8 {
    *ptr
}
We can confirm this in JS by using the function to get the referenced value.
console.log(mem_get(5));
// 44
The examples so far have focused on small integer values, but the same approach can be extended to more complex data structures like strings.
let encodedStr = new TextEncoder().encode('Hello World!');
buffer.set(encodedStr, 6);
I'm using TextEncoder to UTF-8 encode a JS string. This returns a Uint8Array that we can conveniently copy into the WASM memory.
On the Rust side, we can decode the string from the raw memory segment and operate on it. I'm running a replace operation on it, which generates a new string that I copy into a new memory segment to avoid altering the old one. To make things a bit easier to manage, replace_str accepts a destination memory address for the new string and returns its length.
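The replace_str source isn't included at this point, so here's a sketch of what it might look like, assuming a (ptr, len, dst) signature and a replace of "World" with "WASM" to match the "Hello WASM!" output later on:

```rust
// Hypothetical sketch; the article doesn't show the original.
#[no_mangle]
pub unsafe extern "C" fn replace_str(ptr: *const u8, len: usize, dst: *mut u8) -> usize {
    // Decode the string straight out of raw linear memory.
    let bytes = std::slice::from_raw_parts(ptr, len);
    let original = std::str::from_utf8(bytes).expect("expected valid UTF-8");
    // replace allocates a brand-new String, leaving the original intact.
    let replaced = original.replace("World", "WASM");
    // Copy the new string to the destination address and report its length.
    std::ptr::copy_nonoverlapping(replaced.as_ptr(), dst, replaced.len());
    replaced.len()
}
```

Note that the String allocation inside replace goes through the module's allocator, which is exactly what can cause the memory growth discussed next.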
Attempting to access the array buffer at this point will throw an exception.
console.log(buffer.slice(20, 20 + len));
^
TypeError: Cannot perform %TypedArray%.prototype.slice on a detached ArrayBuffer
at Uint8Array.slice (<anonymous>)
at execute ([Redacted Path]/index.js:27:24)
Node.js v17.5.0
This is likely because the default allocator grew the WASM memory behind the scenes. Growing the memory detaches the old ArrayBuffer, so our existing view into it is no longer valid. Fortunately, this is easily fixed by obtaining a new reference to the memory buffer.
buffer = new Uint8Array(memory.buffer);
The memory content can now be decoded using TextDecoder.
let str = new TextDecoder().decode(buffer.slice(20, 31));
console.log(str);
// Hello WASM!
So far, I've been keeping track of memory locations manually by index, but in a program that could have thousands if not millions of allocated values, attempting to track all of them this way is an exercise in futility. There's also the added problem of remembering how big each value is, to prevent overlaps when storing new data. We may also want to reclaim memory that's no longer needed so that we can store more relevant data.
This task is usually reserved for a memory allocator. Readers familiar with C/C++ may recognize this as the system behind malloc and free. The allocator automates the task of dividing available memory into blocks that a user may request on demand, based on the size of the item they wish to store. They may also mark a memory block as available for reallocation once it's no longer relevant.
To allocate arbitrary data structures, you'd ideally want to rely on your toolchain and whatever solutions it has in place for this. However, in the spirit of exploration, we can build a makeshift allocator using existing Rust vectors since they are generally allocated onto the heap.
malloc allocates a vector and hands it off to std::mem::forget, which takes ownership and prevents the vector from being dropped once the scope ends. Looking at the documentation, we can see how vectors are laid out:
A Vec has three parts: a pointer to the allocation, the size of the allocation, and the number of elements that have been initialized
pub struct Vec<T> {
    ptr: *mut T,
    cap: usize,
    len: usize,
}
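The malloc implementation itself isn't shown; a sketch consistent with the description above might box the vector so that its (ptr, cap, len) header lives at a stable heap address, then forget it so nothing is dropped:

```rust
// Assumed sketch of malloc; the article doesn't show the original.
// no_mangle is applied only on the wasm target so that a native test
// build doesn't shadow the C allocator's own malloc symbol.
#[cfg_attr(target_arch = "wasm32", no_mangle)]
pub unsafe extern "C" fn malloc(size: usize) -> *mut Vec<u8> {
    // Box the vector so the Vec header itself has a stable heap
    // address that we can hand back to the host as a pointer.
    let boxed = Box::new(Vec::<u8>::with_capacity(size));
    let ptr: *const Vec<u8> = &*boxed;
    // Forget the box so neither the header nor the data is freed here.
    std::mem::forget(boxed);
    ptr as *mut Vec<u8>
}
```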
We can see the Vec struct layout in JS by examining the pointer that was returned:
const ptr = malloc(10);
buffer = new Uint8Array(memory.buffer);
console.log(buffer.slice(ptr, ptr + 12));
// Uint8Array(12) [
// 8, 0, 17, 0, 10,
// 0, 0, 0, 0, 0,
// 0, 0
// ]
The first 4 bytes, 8, 0, 17, 0, or 1114120 in little-endian, are a pointer to the data. The next 4 bytes, 10, 0, 0, 0, are its capacity: the 10 bytes we requested from malloc. The last 4, 0, 0, 0, 0, are its length. The length would usually be updated through the Vec API, but since we're using the vector as a makeshift allocation we can ignore that for now. We can take a look at the data itself by following the data pointer into the memory buffer.
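We can sanity-check this layout natively as well by viewing a Vec's header as raw words. This is a sketch only: the field order is an internal implementation detail of the standard library, not a guarantee, so treat the result as some permutation of (ptr, cap, len).

```rust
/// Return the three pointer-sized words that make up a Vec<u8> header.
/// On wasm32 each word is 4 bytes; on a 64-bit host it's 8, but the
/// overall shape of the header is the same.
fn vec_header_words(v: &Vec<u8>) -> [usize; 3] {
    // Reinterpret the Vec header as three usize words.
    unsafe { *(v as *const Vec<u8> as *const [usize; 3]) }
}
```

For a Vec::with_capacity(10), the words will contain the data pointer, the capacity 10, and the length 0, mirroring what we just saw in the JS output.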
Freeing the memory we've allocated requires dropping the vector; the process is the inverse of allocation.
#[no_mangle]
pub unsafe extern "C" fn free(ptr: *mut Vec<u8>) {
    // Reclaim ownership of the forgotten vector so that it is
    // dropped, and its allocation released, when it goes out of scope.
    let vec = Box::from_raw(ptr);
    std::mem::drop(vec);
}
Conclusion
Having direct memory access in any system is a blessing and a curse. The raw power afforded by the ability to manage every byte your system can access gives you extreme flexibility to write incredibly efficient programs, but it comes at the cost of the time and diligence required to do so safely and correctly. The overwhelming majority of what we've explored here is really just to understand how WASM works. Under most normal circumstances, you should rely on the tools at your disposal to build out your projects safely and efficiently. For those incredibly rare occasions where it's absolutely necessary to get your hands dirty: godspeed, and I hope this has been helpful to you.