The Odyssey of Porting the BEAM JIT to ARM32
A blog series recounting our adventures in the quest to port the BEAM JIT to the 32-bit ARM architecture (ARM32).
This work is made possible thanks to funding from the Erlang Ecosystem Foundation and the ongoing support of its Embedded Working Group.
What is the JIT?
The BEAM JIT (BeamAsm) was introduced in 2020 as part of Erlang/OTP 24. The JIT allows the BEAM to emit native machine code that implements BEAM instructions. It runs at runtime: during boot of the Erlang VM and every time a new Erlang module is loaded.
BeamAsm currently supports the x86-64 and ARM64 instruction sets. ARM32 is missing, and in this blog series we are going to share with you the details and adventures of this difficult expedition to port the current implementation to ARM32.
But wait, why do we need a JIT?
Without the JIT, the BEAM uses the "emulator" (emu in the codebase). Each instruction "simply" translates to a call into the C code that implements it. Conceptually this is very simple; summing it up, this is what happens at runtime (see the sketch after the list):
- Fetch each instruction
- Decode the opcode
- Look up the function pointer in a jump table
- Execute the function
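To make the dispatch cost concrete, here is a minimal, hypothetical interpreter loop in C++. It is not the actual BEAM emulator (which uses more advanced techniques), and every opcode, struct, and function name below is made up for the example; it only illustrates the fetch/decode/dispatch pattern described above.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical opcodes; the real BEAM has hundreds of (often fused) instructions.
enum Opcode : uint8_t { OP_MOVE, OP_ADD, OP_RETURN };

struct VmState {
    int64_t x[16];  // stand-in for the BEAM X registers
};

using Handler = void (*)(VmState&, const uint8_t*&);

static void op_move(VmState& vm, const uint8_t*& ip) { vm.x[ip[1]] = vm.x[ip[2]]; ip += 3; }
static void op_add(VmState& vm, const uint8_t*& ip)  { vm.x[ip[1]] = vm.x[ip[2]] + vm.x[ip[3]]; ip += 4; }
static void op_return(VmState&, const uint8_t*& ip)  { ip = nullptr; }

// Jump table: opcode -> handler function pointer.
static const Handler dispatch_table[] = {op_move, op_add, op_return};

void run(VmState& vm, const uint8_t* ip) {
    while (ip) {
        // 1. Fetch, 2. decode the opcode, 3. look up the handler, 4. call it.
        // Every iteration pays for an indirect call: frame setup/teardown,
        // possible branch mispredictions, and a cache miss on the pointer.
        dispatch_table[ip[0]](vm, ip);
    }
}

int main() {
    VmState vm{};
    vm.x[1] = 20; vm.x[2] = 22;
    const uint8_t program[] = {OP_ADD, 0, 1, 2, OP_RETURN};
    run(vm, program);
    std::printf("x0 = %lld\n", static_cast<long long>(vm.x[0]));
}
```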
The emulator requires a C function call for every BEAM instruction, which involves:
- Stack frame setup/teardown
- Register saving/restoring
- Branch prediction misses
- Cache misses for function pointers
Problem:
Each operation takes dozens to hundreds of host CPU instructions to execute, every single time it is called!
The emulator is heavily optimized, but the intrinsic overhead of this approach cannot be removed.
JIT-ting is cooler!
Whenever the BEAM loads a module and the JIT is enabled, the bytecode is processed and translated into machine code on the spot. In other words, all compiled Erlang code becomes native machine code, written directly into the memory of the virtual machine.
Upon module loading:
- Each Erlang function or loop becomes a compiled block of real machine code.
- The CPU runs this code directly; no interpretation is needed.
Result:
- Zero dispatch overhead (no switch-case or indirect branches)
- Better CPU branch prediction
- Faster register and stack access
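As an illustration of what "translating into machine code on the spot" looks like, here is a tiny, self-contained sketch using asmjit, the code-generation library BeamAsm is built on. It emits a trivial native function at runtime and then calls it. The snippet targets x86-64, is not taken from the OTP sources, and only shows the general mechanism.

```cpp
#include <asmjit/x86.h>
#include <cstdio>

using namespace asmjit;

// Signature of the function we are about to generate at runtime.
using GeneratedFn = int (*)();

int main() {
    JitRuntime rt;                   // Owns the executable memory for JIT-ed code.
    CodeHolder code;
    code.init(rt.environment());     // Match the runtime's target environment.

    x86::Assembler a(&code);         // Emit raw x86-64 instructions.
    a.mov(x86::eax, 42);             // Return value goes in eax.
    a.ret();

    GeneratedFn fn;
    if (rt.add(&fn, &code) != kErrorOk)  // Copy the code into executable memory.
        return 1;

    std::printf("generated code returned %d\n", fn());  // Run it directly, no dispatch.
    rt.release(fn);
    return 0;
}
```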
Why bother with ARM32?
While ARM64 has become the dominant architecture for modern devices, ARM32 remains highly relevant for several important reasons:
Embedded Systems and IoT
- Power Efficiency: Many ARM32 designs are optimized for ultra-low power consumption
- Real-time Systems: ARM32 is widely used in real-time embedded applications
- Cost Constraints: ARM32 chips are often cheaper than ARM64 equivalents
- Legacy Devices: Millions of ARM32 devices are still in production and active use
Industrial and Automotive Applications
- Industrial Control Systems: Many factory automation systems run on ARM32
- Automotive ECUs: Engine control units and vehicle systems often use ARM32
- Medical Devices: Critical medical equipment frequently uses ARM32 for reliability
- Aerospace: Flight control systems and avionics often rely on ARM32
An ARM32 JIT implementation ensures that Erlang applications can achieve the same performance benefits on ARM32 devices that they currently enjoy on x86-64 and ARM64 platforms, making the BEAM more accessible to a wider range of applications and deployment scenarios.
Porting the JIT to ARM32
This should be easy, right? I mean, we already have two available implementations. We could just copy everything from ARM64 and make sure we use 32-bit integers instead of 64-bit!
Well... I wish it was that easy... 💀
We have an ARM64 implementation, cool... What's not cool for a lazy developer like me is that ARM64 is not just a wider ARM32: it is a complete redesign of the architecture.
ARM32 has:
- Way fewer registers:
  - 31 general-purpose registers on ARM64 😎
  - 16 on ARM32, including SP, LR, and the PC 😩
- An older, different instruction set that is generally less powerful and more limited
- Different memory alignment requirements and a much smaller addressable space (32-bit pointers)
The biggest ordeal here is the constrained number of registers. We cannot mirror the ARM64 code because it is designed around 31 64-bit registers, while we only have 16 32-bit ones!
This requires us to make some creative decisions to chart our own course with the few registers we've got; there's not much room to manoeuvre. The sketch below illustrates the kind of trade-off we face.
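To give a feel for the problem, here is a purely illustrative C++ sketch. It is not OTP code; the struct fields, names, and register counts are assumptions made for the example. It models why a code generator with plenty of spare registers can keep hot VM state pinned in hardware registers, while one with very few must keep loading and storing it through memory.

```cpp
#include <cstdint>
#include <cstdio>

// Illustrative slice of per-process VM state that JIT-generated code touches
// on nearly every BEAM instruction (names are assumptions, not the real layout).
struct ProcessState {
    uint64_t* htop;        // top of the process heap
    uint64_t* stop;        // top of the process stack
    int64_t   fcalls;      // reduction counter
    uint64_t  xregs[1024]; // X registers
};

// "Many registers" style: with ~31 general-purpose registers there is room to
// keep the hot state permanently in hardware registers. Modelled here by
// passing values in and out, i.e. no memory traffic inside the hot path.
struct Pinned { uint64_t x0, x1; int64_t fcalls; };
static Pinned add_with_pinned_state(Pinned s) {
    s.x0 = s.x0 + s.x1;   // the whole "add" stays in registers
    s.fcalls -= 1;
    return s;
}

// "Few registers" style: with 16 registers, several already claimed by SP, LR,
// the PC, and the calling convention, little can be pinned beyond a pointer to
// the process state, so every operand is a load and every result a store.
static void add_with_spilled_state(ProcessState* p) {
    p->xregs[0] = p->xregs[0] + p->xregs[1];  // two loads + one store
    p->fcalls -= 1;                           // read-modify-write in memory
}

int main() {
    ProcessState p{};
    p.xregs[0] = 20; p.xregs[1] = 22; p.fcalls = 4000;

    Pinned s{p.xregs[0], p.xregs[1], p.fcalls};
    s = add_with_pinned_state(s);
    add_with_spilled_state(&p);

    std::printf("pinned: %llu, spilled: %llu\n",
                (unsigned long long)s.x0, (unsigned long long)p.xregs[0]);
}
```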
What now?
We knew our job was not going to be easy. But how challenging could this be?
If you have never taken an interest in the JIT and the BEAM internals, as I hadn't before starting this project, then you probably wouldn't know where to start.
That's OKAY. In this blog series, I will be guiding the reader through all of our adventures and misfortunes in the process. If you choose to follow along, you'll be stuck with me as we crawl towards the light.
In the next episode, we'll start from absolute scratch, think stone-age basics: diving into the OTP codebase, exploring the available documentation, and retracing how we figured out the path to tackle this huge challenge.
I promise not to drag you through every minor detour I stumbled upon, but I'll make sure you see enough of the process to understand why the optimal path makes sense.
The objective for my next post: understand how to get our hands dirty! 💩
Want to know more about the JIT?
Check out the Erlang Blog!
It holds precious insights that greatly help in understanding the JIT: