A blog series recounting our adventures in the quest to port the BEAM JIT to the 32-bit ARM (ARM32) architecture.

This work is made possible thanks to funding from the Erlang Ecosystem Foundation and the ongoing support of its Embedded Working Group.

Cross-Compiling Erlang OTP with JIT Support for ARM32 Architecture

Let's start from step 0!

Before writing any code, we need to understand how the existing build works and which tools the job requires. In other words, we need to define our development workflow and toolbox. I'm not going through every detail here, but I will link all key documentation and references so that you can look them up yourself.

How to build with the JIT

A good starting point is the HOWTO directory. This folder contains guides for building and developing inside the OTP repository.

We are mainly interested in INSTALL.md. It explains the various ways to build OTP.

For example, we could call configure and make manually with desired parameters, or use the automation built into the otp_build script.

We see that the JIT can be enabled or disabled with the configure flag --{enable,disable}-jit. Other than that, we want to build the bare minimum of the Erlang runtime to keep the build simple. otp_build looks good enough for now; we will use these three sub-commands:

configure
boot
release
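As a rough sketch, the three sub-commands could be chained as shown below, run from the OTP source root. The DRY_RUN switch, the run helper and the release directory are assumptions made for illustration; by default the script only prints what it would execute.

```shell
#!/bin/sh
# Sketch: chain the three otp_build sub-commands from the OTP source root.
# DRY_RUN, RELEASE_DIR and run() are illustrative names, not part of OTP.
DRY_RUN="${DRY_RUN:-1}"
RELEASE_DIR="${RELEASE_DIR:-/tmp/otp-release}"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

build_otp() {
    # Stop at the first failing step.
    run ./otp_build configure --enable-jit &&
    run ./otp_build boot &&
    run ./otp_build release "$RELEASE_DIR"
}

build_otp
```

Setting DRY_RUN=0 turns the printed plan into a real build.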

We are also interested in INSTALL-CROSS.md, which explains how to cross-compile for a foreign OS or architecture. Note the xcomp directory in the OTP project root: it holds a catalogue of configuration files targeting specific operating systems and architectures. The otp_build script accepts such a file as its configuration, so we can define a custom one for our needs.

For example:

./otp_build configure --xcomp-conf=xcomp/erl-xcomp-arm-linux-custom.conf
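To give an idea of what such a file might contain, here is a minimal sketch. The variable names follow the existing files in xcomp, but every value (target triplet, toolchain names, flags) is an assumption and not the configuration we ended up with.

```shell
# xcomp/erl-xcomp-arm-linux-custom.conf (illustrative contents only)
# Variable names mirror the existing xcomp files; all values are assumptions.

erl_xcomp_build=guess                      # let configure detect the build machine
erl_xcomp_host=arm-linux-gnueabihf         # assumed ARM32 hard-float Linux triplet
erl_xcomp_configure_flags="--enable-jit"   # extra configure flags; upstream configure
                                           # currently rejects the JIT for ARM32

# Assumed cross toolchain, named after the target triplet.
CC=arm-linux-gnueabihf-gcc
CXX=arm-linux-gnueabihf-g++
LD=arm-linux-gnueabihf-ld
```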

Now that we roughly know how to customize the OTP build, we need to decide on the best workflow for developing the ARM32 JIT.

Is cross-building really necessary?

It is obviously simpler to develop and debug software on the machine it is meant to run on. That would be nice, but we have no ARM32 development machines to work on, so we need to emulate an ARM32 CPU in some way. We chose QEMU for this.

With QEMU we can work in two different ways:

emulate a whole system, with qemu-system
emulate the user space of a single process, with qemu-user

Emulating an entire system would let us run a full ARM32 virtual machine as a development environment. Unfortunately, qemu-system is not available on macOS, so we would have to nest it inside a Linux VM.

This already sounds bad but I was stubborn enough to try it.

SPOILER ALERT: bad idea.

I managed to run such a nested VM but, as anyone could expect, build performance was so outrageously bad that after many attempts I dropped the idea.

The bright side is that qemu-user exists. This mode emulates a single process, eliminating system-level overhead. As a result, it’s much lighter and fast enough to run inside a VM. However, this approach requires cross-compilation and has one significant limitation: with qemu-user our ARM32 code runs in user mode, so it cannot execute privileged instructions. This isn’t an immediate problem, but keep it in mind because we will need to address it in the future.
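A minimal sketch of what running a cross-built binary in user mode could look like. The sysroot path and the binary name are assumptions; the helper only prints the command line so it can be inspected before use.

```shell
#!/bin/sh
# Sketch: build the command line that runs an ARM32 binary under QEMU user mode.
# SYSROOT is an assumed location for the ARM32 dynamic loader and libraries.
SYSROOT="${SYSROOT:-/usr/arm-linux-gnueabihf}"

qemu_user_cmd() {
    # -L tells qemu-arm where to look for the target's loader and shared libraries.
    echo "qemu-arm -L $SYSROOT $*"
}

# Example: print the command for a hypothetical cross-built binary.
qemu_user_cmd ./hello-arm32
```

A statically linked binary would not need the -L option at all.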

How to debug ARM32?

Of course, we need a way to debug our cross-built OTP. Plain GDB won’t work on ARM32 binaries, so we turn to gdb-multiarch, which lets us load and step through binaries for any supported architecture.
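One way this can fit together with QEMU user mode is through QEMU's built-in GDB stub. The sketch below only prints the two command lines involved; the port number and the binary name are assumptions.

```shell
#!/bin/sh
# Sketch: debug an ARM32 binary running under qemu-arm from gdb-multiarch.
# GDB_PORT and the binary path are illustrative assumptions.
GDB_PORT="${GDB_PORT:-1234}"

qemu_gdbserver_cmd() {
    # -g makes qemu-arm wait for a GDB connection on the given port.
    echo "qemu-arm -g $GDB_PORT $*"
}

gdb_attach_cmd() {
    # Load symbols from the binary, then connect to QEMU's GDB stub.
    echo "gdb-multiarch -ex 'target remote localhost:$GDB_PORT' $1"
}

qemu_gdbserver_cmd ./hello-arm32   # run this one in a first terminal
gdb_attach_cmd ./hello-arm32       # and this one in a second terminal
```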

Defining the Workflow:

Our process consisted of three main steps:

Cross-compile OTP for ARM32
Run OTP under QEMU user-mode emulation (qemu-arm)
Debug with gdb-multiarch

These decisions shaped our development environment, but getting it up and running was a lot of work in itself. To ensure consistency across different dev machines, we decided to work on a VM managed with Vagrant. Thanks to Vagrant, we could codify the VM installation and setup so that any of us could reproduce an identical copy of the working environment. We then created a dedicated, version-controlled repository for the Vagrantfile and provisioning scripts, making it trivial for anyone to share, replicate and evolve our ARM32 JIT development setup.

Check out the repository at: https://github.com/stritzinger/arm32-jit
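To make the idea of a provisioning script concrete, here is a small sketch for a Debian or Ubuntu guest. It is not taken from our repository: the distribution and the package list are assumptions, and the script prints the install command rather than running it.

```shell
#!/bin/sh
# Sketch of a provisioning step for a Debian/Ubuntu guest (assumed distro).
# Not the repository's actual script; package names are the Debian ones.
PACKAGES="build-essential gcc-arm-linux-gnueabihf g++-arm-linux-gnueabihf qemu-user gdb-multiarch"

provision_cmd() {
    # Print the install command instead of running it, so it can be reviewed first.
    echo "apt-get install -y $PACKAGES"
}

provision_cmd
```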

It includes version scripts for:

We will go through each of these scripts in the upcoming blog posts.

Now, what about the JIT code?

Let’s locate it and see how it’s wired into the build.

We need to understand:

where to write our ARM32 JIT code
how the JIT code is selected and added to the compilation

ARM and x86 JIT implementations

The erts directory contains all the components needed to build the Erlang runtime system. Under erts/emulator is the C++ codebase for the BEAM VM, the main executable. Inside its internal_doc folder you’ll find the BeamAsm documentation, which explains how the Erlang JIT is designed and points to its implementation. It also shows that the JIT uses the asmjit library to emit assembly and generate machine code.

Although asmjit does not yet officially support ARM32, there's an a32_port branch in development that we’ll use. Note that the asmjit directory sits next to beam and is not a git submodule; the code has been copied directly into the repo.

If you dive into erts/emulator/beam you will find the core implementation of the VM.

Here you will find two subdirectories:

emu: implements the emulator; it holds many tab files and not many C files
jit: what we came here for

Under jit you will see:

arm: for arm64
x86: for x86

Both directories contain tab files, which are special files that are used to generate C/C++ source code. We will cover these files later.

You will notice that both directories hold the same filenames. This suggests that, when building the JIT, the target CPU determines which of these two directories gets compiled. We can take advantage of this arrangement: in principle we just need to add another directory with source code for ARM32 and we should be good to go.
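The selection described above can be pictured with a small shell function. This is only an illustration of the idea and not the logic in the OTP configure scripts; the function name and the arm32 directory name are hypothetical.

```shell
#!/bin/sh
# Illustration only: map a target CPU name to a JIT source directory.
# The real selection lives in the OTP build configuration; jit_dir_for_cpu
# and the "arm32" directory are hypothetical names.
jit_dir_for_cpu() {
    case "$1" in
        x86_64|amd64)   echo "x86" ;;
        aarch64|arm64)  echo "arm" ;;
        arm|armv7*)     echo "arm32" ;;   # hypothetical directory for the new port
        *)              echo "none" ;;    # no JIT directory for this CPU
    esac
}

# Print the mapping for a few example CPU names.
for cpu in x86_64 aarch64 armv7l riscv64; do
    echo "$cpu -> $(jit_dir_for_cpu "$cpu")"
done
```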

Currently, the configure scripts will reject an ARM32 JIT target. In the next episode we’ll walk through adding that new architecture and updating the build configuration so we can start writing our ARM32 JIT implementation.
