Interacting with WebAssembly memory
One of the most interesting features of WebAssembly is its memory model. Despite providing direct access to and control over raw bytes, it does so in a way that offers more safety than one would typically expect from low-level environments like C/C++. WASM memory is exposed as a linear memory buffer that is sandboxed and managed by the host runtime. JavaScript runtimes, for example, allocate an ArrayBuffer with defined memory bounds. Since this memory is just a JavaScript object, it's subject to the same rules and eventual garbage collection as every other JavaScript object.
The content of the WASM memory, however, may be freely manipulated by the host runtime as well as any runtime used by the module. Linear memory indexes may be treated as memory addresses, a familiar model for low-level systems, and can be used as a means to send complex data between the WASM module and the host runtime. This is usually abstracted away by your toolchain: Rust has the excellent wasm-bindgen tool, and Emscripten provides interoperability APIs. However, there's a lot we can learn by manipulating WASM memory directly, so I won't be relying on those in this exercise.
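The module-side Rust source isn't shown here, so as a working assumption, here's a hypothetical sketch of an exported raw_memory_access function that writes the value 99 into linear memory at address 4, matching what the JavaScript that follows reads back. The write_value helper is my own addition for illustration.

```rust
// Hypothetical module-side code; the original implementation isn't
// shown in the article. Inside the WASM sandbox, a small linear
// memory index like 4 is a valid address to write through.
unsafe fn write_value(ptr: *mut u8) {
    *ptr = 99;
}

#[no_mangle]
pub unsafe extern "C" fn raw_memory_access() {
    // Treat linear memory index 4 as a raw pointer and store 99 there.
    write_value(4 as *mut u8);
}
```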
We can access this value in memory from JavaScript by acquiring a reference to the module's memory.
const fs = require('fs');
const wasmBuffer = fs.readFileSync(
  '../wasm/target/wasm32-unknown-unknown/release/module.wasm');

(async function execute() {
  const module = await WebAssembly.instantiate(wasmBuffer);
  // Get a reference to the WASM memory buffer and the functions
  // that have been exported.
  const { memory, raw_memory_access } = module.instance.exports;
  // Use a let binding here; this will become relevant later.
  let buffer = new Uint8Array(memory.buffer);
  raw_memory_access();
  const value = buffer[4];
  console.log(value);
})();
// 99
I'm grabbing the exported memory using the WebAssembly API and constructing a TypedArray view, a Uint8Array in this case, for safer access into the underlying data. Since we know the address of the stored data ahead of time, we can read it from the memory buffer with the index access operator.
Setting a value at a memory address is equally simple:
buffer[5] = 44;
On the Rust side, we can treat the memory address as a pointer and obtain the referenced value by dereferencing it.
#[no_mangle]
pub unsafe extern "C" fn mem_get(ptr: *const u8) -> u8 {
    *ptr
}
We can confirm this in JS by using the function to get the referenced value.
console.log(mem_get(5));
// 44
The examples so far have focused on small integer values, but the same approach can be extended to more complex data structures like strings.
let encodedStr = new TextEncoder().encode('Hello World!');
buffer.set(encodedStr, 6);
I'm using TextEncoder to UTF-8 encode a JS string. This returns a Uint8Array that we can conveniently copy into the WASM memory.
On the Rust side, we can decode the string from the raw memory segment and operate on it. I'm running a replace operation on it, which generates a new string that I copy into a new memory segment to avoid altering the old one. To make things a bit easier to manage, replace_str accepts a destination memory address for the new string and returns its length.
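The replace_str source isn't included at this point, so here's a sketch of what it might look like, assuming a (ptr, len, dst) signature and a replace of "World" with "WASM" to match the "Hello WASM!" output later on:

```rust
// Hypothetical sketch; the article doesn't show the original.
#[no_mangle]
pub unsafe extern "C" fn replace_str(ptr: *const u8, len: usize, dst: *mut u8) -> usize {
    // Decode the string straight out of raw linear memory.
    let bytes = std::slice::from_raw_parts(ptr, len);
    let original = std::str::from_utf8(bytes).expect("expected valid UTF-8");
    // replace allocates a brand-new String, leaving the original intact.
    let replaced = original.replace("World", "WASM");
    // Copy the new string to the destination address and report its length.
    std::ptr::copy_nonoverlapping(replaced.as_ptr(), dst, replaced.len());
    replaced.len()
}
```

Note that the String allocation inside replace goes through the module's allocator, which is exactly what can cause the memory growth discussed next.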
Attempting to access the array buffer at this point will throw an exception.
console.log(buffer.slice(20, 20 + len));
^
TypeError: Cannot perform %TypedArray%.prototype.slice on a detached ArrayBuffer
at Uint8Array.slice (<anonymous>)
at execute ([Redacted Path]/index.js:27:24)
Node.js v17.5.0
This is likely because the default allocator grew the WASM memory behind the scenes. Growing the memory detaches the old ArrayBuffer, so our existing view into it is no longer valid. Fortunately, this is easily fixed by obtaining a new reference to the memory buffer.
buffer = new Uint8Array(memory.buffer);
The memory content can now be decoded using TextDecoder.
let str = new TextDecoder().decode(buffer.slice(20, 31));
console.log(str);
// Hello WASM!
So far, I've been keeping track of memory locations manually by index, but in a program that could have thousands if not millions of allocated values, attempting to track all of them this way is an exercise in futility. There's also the added problem of remembering how big each value is, to prevent overlaps when storing new data. We may also want to reclaim memory that's no longer needed so that we can store more relevant data.
This task is usually reserved for a memory allocator. Readers familiar with C/C++ may recognize this as the system behind malloc and free. The allocator automates the task of dividing available memory into blocks that a user may request on demand, based on the size of the item they wish to store. They may also mark a memory block as available for reallocation once it's no longer relevant.
To allocate arbitrary data structures, you'd ideally want to rely on your toolchain and whatever solutions it has in place for this. However, in the spirit of exploration, we can build a makeshift allocator using existing Rust vectors since they are generally allocated onto the heap.
malloc allocates a vector and hands it off to std::mem::forget, which takes ownership and prevents the vector from being dropped once the scope ends. Looking at the documentation, we can see how vectors are laid out:
A Vec has three parts: a pointer to the allocation, the size of the allocation, and the number of elements that have been initialized
pub struct Vec<T> {
    ptr: *mut T,
    cap: usize,
    len: usize,
}
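The malloc implementation itself isn't shown; a sketch consistent with the description above might box the vector so that its (ptr, cap, len) header lives at a stable heap address, then forget it so nothing is dropped:

```rust
// Assumed sketch of malloc; the article doesn't show the original.
// no_mangle is applied only on the wasm target so that a native test
// build doesn't shadow the C allocator's own malloc symbol.
#[cfg_attr(target_arch = "wasm32", no_mangle)]
pub unsafe extern "C" fn malloc(size: usize) -> *mut Vec<u8> {
    // Box the vector so the Vec header itself has a stable heap
    // address that we can hand back to the host as a pointer.
    let boxed = Box::new(Vec::<u8>::with_capacity(size));
    let ptr: *const Vec<u8> = &*boxed;
    // Forget the box so neither the header nor the data is freed here.
    std::mem::forget(boxed);
    ptr as *mut Vec<u8>
}
```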
We can see the Vec struct layout in JS by examining the pointer that was returned:
const ptr = malloc(10);
buffer = new Uint8Array(memory.buffer);
console.log(buffer.slice(ptr, ptr + 12));
// Uint8Array(12) [
// 8, 0, 17, 0, 10,
// 0, 0, 0, 0, 0,
// 0, 0
// ]
The first 4 bytes, 8, 0, 17, 0, or 1114120 in little-endian, are a pointer to the data. The next 4 bytes, 10, 0, 0, 0, are its capacity: the 10 bytes we requested from malloc. The last 4, 0, 0, 0, 0, are its length. The length would usually be updated through the Vec API, but since we're using the vector as a makeshift allocation we can ignore that for now. We can take a look at the data itself by following the data pointer into the memory buffer.
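We can sanity-check this layout natively as well by viewing a Vec's header as raw words. This is a sketch only: the field order is an internal implementation detail of the standard library, not a guarantee, so treat the result as some permutation of (ptr, cap, len).

```rust
/// Return the three pointer-sized words that make up a Vec<u8> header.
/// On wasm32 each word is 4 bytes; on a 64-bit host it's 8, but the
/// overall shape of the header is the same.
fn vec_header_words(v: &Vec<u8>) -> [usize; 3] {
    // Reinterpret the Vec header as three usize words.
    unsafe { *(v as *const Vec<u8> as *const [usize; 3]) }
}
```

For a Vec::with_capacity(10), the words will contain the data pointer, the capacity 10, and the length 0, mirroring what we just saw in the JS output.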
Freeing the memory we've allocated requires dropping the vector; the process is the inverse of allocation.
#[no_mangle]
pub unsafe extern "C" fn free(ptr: *mut Vec<u8>) {
    // Reclaim ownership of the forgotten vector so that it is
    // dropped, and its allocation released, when it goes out of scope.
    let vec = Box::from_raw(ptr);
    std::mem::drop(vec);
}
Conclusion
Having direct memory access in any system is a blessing and a curse. The raw power afforded by the ability to manage every byte your system can access gives you extreme flexibility to write incredibly efficient programs, but it comes at the cost of the time and diligence required to do so safely and correctly. The overwhelming majority of what we've explored here is really just to understand how WASM works. Under most normal circumstances, you should rely on the tools at your disposal to build out your projects safely and efficiently. For those incredibly rare occasions where it's absolutely necessary to get your hands dirty: godspeed, and I hope this has been helpful to you.