A blog series recounting our adventures in the quest to port the BEAM JIT to the 32-bit ARM architecture.
This work is made possible thanks to funding from the Erlang Ecosystem Foundation and the ongoing support of its Embedded Working Group.
The Erlang ARM32 JIT is born!
This week we finally achieved our first milestone in developing the ARM32 JIT. We executed our first Erlang function through JITted ARM32 machine code!
~/arm32-jit$ qemu-arm -L /usr/arm-linux-gnueabihf ./otp/RELEASE/erts-15.0/bin/beam.smp -S 1:1 -SDcpu 1:1 -SDio 1 -JDdump true -JMsingle true -- -root /home/arm32-jit/otp/RELEASE -progname erl -home /home
~/arm32-jit$ echo $?
42
The BEAM runs successfully and terminates with exit code 42! That 42 comes from an Erlang function, just-in-time compiled by our ARM32 JIT!
Announcement is done! All code is available at https://github.com/stritzinger/otp/tree/arm32-jit
Keep reading for a lot of interesting details!
The first piece of Erlang code
-module(hello).
-export([start/2]).

start(_BootMod, _BootArgs) ->
    halt(42, [{flush, false}]).
This is hello.erl, which contains a start/2 function. The function head mimics erl_init:start/2, the entry point of the first Erlang process. We replaced erl_init:start/2 with hello:start/2 in the erl_init.c module of the BEAM VM, which forces the runtime to execute our Erlang function.
hello:start/2 is very simple: it just calls erlang:halt/2. This function is a BIF (Built-in Function), meaning it executes C code that is part of the BEAM VM. That code performs an orderly shutdown of the BEAM and lets us choose the exit code, in this case 42.
(Why {flush, false}? At the time of writing, leaving it as true causes a segmentation fault, hehe.)
Obviously, we need to compile this Erlang module, but I will also generate the BEAM assembly so we can have a look at what we will have to deal with.
{module, hello}. %% version = 0
{exports, [{module_info,0},{module_info,1},{start,2}]}.
{attributes, []}.
{labels, 7}.
{function, start, 2, 2}.
{label,1}.
{line,[{location,"erts/preloaded/src/hello.erl",74}]}.
{func_info,{atom,hello},{atom,start},2}.
{label,2}.
{move,{literal,[{flush,false}]},{x,1}}.
{move,{integer,42},{x,0}}.
{line,[{location,"erts/preloaded/src/hello.erl",76}]}.
{call_ext_only,2,{extfunc,erlang,halt,2}}.
{function, module_info, 0, 4}.
{label,3}.
{line,[]}.
{func_info,{atom,hello},{atom,module_info},0}.
{label,4}.
{move,{atom,hello},{x,0}}.
{call_ext_only,1,{extfunc,erlang,get_module_info,1}}.
{function, module_info, 1, 6}.
{label,5}.
{line,[]}.
{func_info,{atom,hello},{atom,module_info},1}.
{label,6}.
{move,{x,0},{x,1}}.
{move,{atom,hello},{x,0}}.
{call_ext_only,2,{extfunc,erlang,get_module_info,2}}.
You can spot the start function and the two standard module_info functions that every Erlang module has. We do not care much about those right now: we found that they are never executed at this stage, so they do not need to work yet.
We can see that the core of the start function is just two move operations and one call_ext_only. But bear in mind that the BEAM loader will transmute these generic BEAM operations into specific operations. More complexity will pop up!
Execution
We are using qemu-arm to emulate ARM32, and we run the BEAM by invoking beam.smp directly.
~/arm32-jit$ qemu-arm -L /usr/arm-linux-gnueabihf ./otp/RELEASE/erts-15.0/bin/beam.smp -S 1:1 -SDcpu 1:1 -SDio 1 -JDdump true -JMsingle true -- -root /home/vagrant/arm32-jit/otp/RELEASE -progname erl -home /home/vagrant
JIT initialization
At boot, the BEAM initializes the JIT if enabled. The JIT leverages the AsmJit library to emit all machine code instructions.
Emission of all global shared fragments
There are 90+ code snippets that are shared among all modules. The JIT emits them a single time and sets up jumps to them from every module. It is like a global library for all modules.
We skipped most of these because only the shared fragments involved in the hello:start/2 execution were needed.
Emission of the erts_beamasm module
As part of the JIT initialization, erts_beamasm is emitted. This is an internal, hardcoded module that exists only when the BEAM is using the JIT. It holds 7 fundamental instructions used to manage the execution of Erlang processes:
- run_process - The main process execution entry point
- normal_exit - Normal process termination
- continue_exit - Continue after exit handling
- exception_trace - Exception tracing functionality
- return_trace - Return value tracing
- return_to_trace - Return to tracing state
- call_trace_return - Call tracing return handling
Preloaded modules
The hello.erl module has been compiled and placed as the first and only Erlang module in the list of preloaded modules. Preloaded modules are fundamental Erlang modules that the BEAM always loads before the first Erlang process can start. They implement, in Erlang, the core features of the Erlang Runtime System (ERTS). The OTP build scripts group all ebin files into a single C header that is then compiled into the executable. This makes the Erlang binaries available as a static C array in the BEAM source code. These are then loaded one by one after the BEAM VM is initialized.
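To make the "static C array" idea concrete, here is a minimal sketch of what such a table could look like and how a loader might walk it. The type PreloadedModule, the array names, and the helper functions are hypothetical and are not the identifiers used in ERTS; the four bytes are only a placeholder for a real hello.beam image.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry type: one record per preloaded module. */
typedef struct {
    const char          *name;  /* module name, NULL marks the end of the table */
    size_t               size;  /* size of the .beam image in bytes */
    const unsigned char *code;  /* raw bytes of the compiled module */
} PreloadedModule;

/* Placeholder bytes only: a real image is the whole hello.beam file. */
const unsigned char hello_beam[] = { 'F', 'O', 'R', '1' };

/* The table a build step could generate, reduced to our single module. */
const PreloadedModule preloaded[] = {
    { "hello", sizeof hello_beam, hello_beam },
    { NULL, 0, NULL }
};

/* Count the entries by walking them one by one, as a loader would. */
size_t count_preloaded(const PreloadedModule *table) {
    assert(table != NULL);
    size_t n = 0;
    while (table[n].name != NULL) {
        n++;
    }
    return n;
}

/* Look a module up by name; returns NULL when it is not preloaded. */
const PreloadedModule *find_preloaded(const PreloadedModule *table,
                                      const char *name) {
    assert(table != NULL && name != NULL);
    for (size_t i = 0; table[i].name != NULL; i++) {
        if (strcmp(table[i].name, name) == 0) {
            return &table[i];
        }
    }
    return NULL;
}
```

With only hello in the table, count_preloaded(preloaded) yields 1 and a lookup for any other module name returns NULL, which mirrors the stripped-down setup used in this post.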
Cool, let's nuke all these modules and leave just our hello.erl. It does not need many BEAM instructions and we can easily verify that it executes. To do the substitution we just need to change this build variable in otp/erts/emulator/Makefile.in
We are running BEAMASM with -JDdump true so AsmJit will dump all ARM32 assembly for each module! This is incredibly useful when watched alongside a debugger, as we can see the assembly being printed line by line as our code emits it.
~/arm32-jit$ cat hello.asm
L6:
.byte 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
# i_flush_stubs
# func_line_I
# aligned_label_Lt
label_1:
# i_func_info_IaaI
# hello:start/2
blx L8
.byte 0x00, 0x00, 0x00, 0x00
.byte 0x0B, 0x4F, 0x00, 0x00, 0x0B, 0xA4, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00
# aligned_label_Lt
start/2:
# i_breakpoint_trampoline
str lr, [r7, -4]!
b L9
bl L11
L9:
# i_test_yield
adr r2, start/2
subs r9, r9, 1
b.le L13
# i_move_sd
ldr r12, [L14]
str r12, [r4, 68]
# i_move_sd
movw r12, 687
str r12, [r4, 64]
# line_I
# allocate_tt
# call_light_bif_be
L15:
ldr r3, [L16]
movw r1, 10188
movt r1, 16432
adr r2, L15
# BIF: erlang:halt/2
sub r12, r7, 4
cmp r10, r12
b.ls L17
udf 48879
L17:
movw r12, 12424
add r12, r4, r12
ldr r12, [r12]
cmp sp, r12
b.eq L18
udf 57005
L18:
bl L20
# deallocate_t
movw r0, 64676
movt r0, 16480
blx L22
# return
movw r0, 61636
movt r0, 16480
blx L22
# i_flush_stubs
# func_line_I
# aligned_label_Lt
label_3:
# i_func_info_IaaI
# hello:module_info/0
blx L8
.byte 0x00, 0x00, 0x00, 0x00
.byte 0x0B, 0x4F, 0x00, 0x00, 0x4B, 0x6B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
# aligned_label_Lt
module_info/0:
# i_breakpoint_trampoline
str lr, [r7, -4]!
b L23
bl L11
L23:
# i_test_yield
adr r2, module_info/0
subs r9, r9, 1
b.le L13
# i_move_sd
movw r12, 20235
str r12, [r4, 64]
# allocate_tt
# call_light_bif_be
L24:
ldr r3, [L25]
movw r1, 4772
movt r1, 16425
adr r2, L24
# BIF: erlang:get_module_info/1
sub r12, r7, 4
cmp r10, r12
b.ls L26
udf 48879
L26:
movw r12, 12424
add r12, r4, r12
ldr r12, [r12]
cmp sp, r12
b.eq L27
udf 57005
L27:
bl L20
# deallocate_t
movw r0, 64676
movt r0, 16480
blx L22
# return
movw r0, 61636
movt r0, 16480
blx L22
# i_flush_stubs
# func_line_I
# aligned_label_Lt
label_5:
# i_func_info_IaaI
# hello:module_info/1
blx L8
.byte 0x00, 0x00, 0x00, 0x00
.byte 0x0B, 0x4F, 0x00, 0x00, 0x4B, 0x6B, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00
# aligned_label_Lt
module_info/1:
# i_breakpoint_trampoline
str lr, [r7, -4]!
b L28
bl L11
L28:
# i_test_yield
adr r2, module_info/1
subs r9, r9, 1
b.le L13
# i_move_sd
ldr r12, [r4, 64]
str r12, [r4, 68]
# i_move_sd
movw r12, 20235
str r12, [r4, 64]
# allocate_tt
# call_light_bif_be
L29:
ldr r3, [L30]
movw r1, 4868
movt r1, 16425
adr r2, L29
# BIF: erlang:get_module_info/2
sub r12, r7, 4
cmp r10, r12
b.ls L31
udf 48879
L31:
movw r12, 12424
add r12, r4, r12
ldr r12, [r12]
cmp sp, r12
b.eq L32
udf 57005
L32:
bl L20
# deallocate_t
movw r0, 64676
movt r0, 16480
blx L22
# return
movw r0, 61636
movt r0, 16480
blx L22
# int_code_end
L33:
movw r0, 18576
movt r0, 16480
blx L22
L13:
L12:
movw r12, 1968
movt r12, 14656
blx r12
L22:
L21:
movw r12, 29192
movt r12, 16399
blx r12
L11:
L10:
movw r12, 1752
movt r12, 14656
blx r12
L20:
L19:
movw r12, 680
movt r12, 14656
blx r12
L8:
L7:
movw r12, 1824
movt r12, 14656
blx r12
# Begin stub section
L14:
.xword 0x000000007FFFFFFF
L16:
.xword 0x000000007FFFFFFF
L25:
.xword 0x000000007FFFFFFF
L30:
.xword 0x000000007FFFFFFF
# End stub section
L34:
.section .rodata {#1}
md5:
.byte 0x6D, 0xC4, 0x1E, 0xF1, 0x13, 0x1E, 0xBF, 0xF2, 0x4B, 0xF5, 0xC0, 0x41, 0x57, 0x86, 0xDF, 0xD5
.section .text {#0}
; CODE_SIZE: 632
Bear in mind, this assembly is not what hello should look like in the end. We are still missing a lot of things.
You can spot many sequences like:
movw r0, 64676
movt r0, 16480
blx L22 # <---- branch to NYI
This is a call to the nyi (Not Yet Implemented) function, and the argument loaded into r0 is a pointer to a string containing the name of the BEAM instruction that should have been emitted instead. You can spot many of these since we are only emitting the code needed to reach halt. Everything after that is not important for now, as halt will never return!
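Since the dump prints immediates in decimal, it is not obvious which pointer a movw/movt pair builds. The sketch below models the two instructions in C: movw writes the low 16 bits and clears the upper half, movt then overwrites only the upper 16 bits. The function names are made up for illustration; the operands in the usage note are the ones from the sequence above.

```c
#include <assert.h>
#include <stdint.h>

/* movw rd, imm16: rd = imm16, upper half cleared. */
uint32_t movw_value(uint16_t imm16) {
    return imm16;
}

/* movt rd, imm16: replace the upper half of rd, keep the lower half. */
uint32_t movt_value(uint32_t reg, uint16_t imm16) {
    return (reg & 0xFFFFu) | ((uint32_t)imm16 << 16);
}

/* The two-instruction idiom used to load a full 32-bit constant. */
uint32_t load_const(uint16_t low, uint16_t high) {
    return movt_value(movw_value(low), high);
}
```

For the pair movw r0, 64676 / movt r0, 16480, load_const(64676, 16480) gives 0x4060FCA4, which is the address of the instruction-name string handed to nyi in that run.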
There are many more comments we could make around all the details in this assembler dump, but let's move on.
Jumping into JITted code!
Later in the BEAM initialization, the first Erlang process is allocated and started.
We swap the module and function with hello in erts/emulator/beam/erl_init.c
erl_spawn_system_process(&parent, am_hello, am_start, args, &so);
One BEAM scheduler thread will jump to the process_main function. You can find it here in the source code. This is emitted by our JIT and is the first emitted code that will run.
Here we need to handle the scheduling of Erlang processes by calling the BEAM routines that implement the algorithms of Erlang concurrency, like erts_schedule.
erts_schedule returns a pointer to the Process C structure, which holds all the information about the process that is going to execute. We then load all the necessary data into registers and branch to the exact point where the execution of that process stopped.
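The control flow described above can be pictured with a toy loop. This is only a sketch under heavy assumptions: the struct fields, toy_schedule, and toy_process_main are invented here, the real erts_schedule has a different signature and does not hand back NULL when the queue is empty, and the real process_main is emitted machine code, not C.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Process Process;
struct Process {
    int   id;
    int   runs;                 /* how many times it was scheduled in */
    void (*resume)(Process *);  /* stand-in for the saved resume address */
};

/* Toy run queue: hands out each process once, then NULL. */
typedef struct {
    Process **items;
    size_t    count;
    size_t    next;
} RunQueue;

/* Stand-in for the scheduler call: pick the next process to run. */
Process *toy_schedule(RunQueue *queue) {
    assert(queue != NULL);
    return queue->next < queue->count ? queue->items[queue->next++] : NULL;
}

/* Stand-in for process_main: schedule, restore state, branch. */
void toy_process_main(RunQueue *queue) {
    Process *p;
    while ((p = toy_schedule(queue)) != NULL) {
        /* The JIT would load the process state into machine registers here. */
        p->resume(p);  /* branch to where this process last stopped */
    }
}

/* Example "process body": just records that it ran. */
void bump_runs(Process *p) {
    p->runs++;
}
```

Queueing a single process whose resume pointer is bump_runs and calling toy_process_main leaves its runs counter at 1: one trip through schedule, restore, and branch.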
The first Erlang function call
In this case we are calling hello:start/2, so the first instruction to execute is apply_only, which does a few things but ends up calling the C apply routine.
The routine processes the Module-Function-Arity information to get the address where the function code resides in memory.
What follows is the Erlang function prologue. You can see it in the assembly dump above. For example, all functions have these instructions in their prologue:
- i_breakpoint_trampoline: handles breakpoints for the debugger app
- i_test_yield: checks if the function should yield and go back to the scheduler
We have minimal or partial implementations of these since we do not really need them yet. We have to emit them though, because the C++ loader functions generated for the BEAM expand the Erlang function call operation into a more specific and complex function prologue sequence.
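In the dump, i_test_yield boils down to subs r9, r9, 1 followed by b.le L13. A plausible reading, and it is only an assumption based on the dump, is that r9 holds the remaining budget of the running process and that L13 leads to the code that hands control back to the scheduler. Modelled in C with a hypothetical helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* `budget` plays the role of r9 (assumed meaning, see the text above).
 * Returns true when the function should yield to the scheduler. */
bool test_yield(int32_t *budget) {
    assert(budget != NULL);
    *budget -= 1;         /* subs r9, r9, 1 */
    return *budget <= 0;  /* b.le L13 */
}
```

Starting from a budget of 2, the first call returns false and execution continues into the function body; the second call returns true, which corresponds to taking the branch.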
After that, we added support for the call_light_bif operation that precedes the call to the halt_2 BIF routine. This implementation is also minimal.
Question for later: did you notice that we put 42 as a number in the code? Numeric constants are printed as decimals in the dump, but we cannot spot any 42!?
After the call, we see two other operations:
- deallocate
- return
These are just calls to NYI as we will never reach this code! So for now, we can skip them...
Let's roll the JIT!
~/arm32-jit$ qemu-arm -L /usr/arm-linux-gnueabihf ./otp/RELEASE/erts-15.0/bin/beam.smp -S 1:1 -SDcpu 1:1 -SDio 1 -JDdump true -JMsingle true -- -root /home/arm32-jit/otp/RELEASE -progname erl -home /home
~/arm32-jit$
Impressive, the program returns immediately without even saying "Hi" ... and without Segmentation Fault!!
But let's check the program return code!
~/arm32-jit$ echo $?
42
We can safely say that number is not there by accident! This is a great achievement, as from now on we will be able to add Erlang instructions incrementally.
Every Erlang line we add will trigger new opcodes. By emitting them and running the code we will get immediate feedback on everything.
The next goal is to grow the hello module until it hosts all possible BEAM instructions!
Hey where is 42???
One interesting thing I spotted looking at the assembly: you cannot find the number 42 in there. Or actually, you can, it is just hidden in plain sight. To understand, you need to know how we are using the ARM32 registers.
In particular the register r4, a callee-saved register. We use it to store the pointer to the ErtsSchedulerRegisters struct, which contains the X register array. When a function is called, the X registers hold the arguments of the call.
This becomes more obvious if we compare the Erlang assembly to the ARM32 assembly.
# i_move_sd <---- {move,{literal,[{flush,false}]},{x,1}}. % List at X[1]
ldr r12, [L14]
str r12, [r4, 68]
# i_move_sd <---- {move,{integer,42},{x,0}}. % 42 at X[0]
movw r12, 687
str r12, [r4, 64]
# line_I
# allocate_tt
# call_light_bif_be
L15:
ldr r3, [L16]
movw r1, 10188
movt r1, 16432
adr r2, L15
# BIF: erlang:halt/2
# ...
42 is stored at r4+64.
- r4: pointer to the ErtsSchedulerRegisters struct
- 64: base offset from the beginning of the struct to the beginning of the x_reg_array
The list is stored at r4+68.
- 68: the base offset + the size of one Eterm (4 bytes on ARM32)
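The same offset arithmetic can be written down in C. The struct below is a stand-in, not the real ErtsSchedulerRegisters: it only reserves 64 opaque bytes in front of the array so that the offsets match the ones seen in the dump, and the array is far shorter than the real one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Eterm;  /* one term, 4 bytes on ARM32 */

/* Hypothetical layout: 64 bytes of other fields, then the X registers. */
typedef struct {
    uint8_t other_fields[64];
    Eterm   x_reg_array[8];
} FakeSchedulerRegisters;

/* Byte offset of X[index] relative to the struct pointer kept in r4. */
size_t x_reg_offset(size_t index) {
    return offsetof(FakeSchedulerRegisters, x_reg_array)
         + index * sizeof(Eterm);
}

/* C equivalent of: str r12, [r4, 64 + 4 * index] */
void store_x(FakeSchedulerRegisters *regs, size_t index, Eterm value) {
    assert(regs != NULL && index < 8);
    regs->x_reg_array[index] = value;
}
```

x_reg_offset(0) is 64 and x_reg_offset(1) is 68, the two immediates used by the str instructions that fill X[0] and X[1].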
But why in assembly do we see 687 and not 42?
Converting both numbers to hex we get:
- 42 -> 2A
- 687 -> 2AF !!
Yep, this is an example of a Tagged Value. If we consult the BEAM book we can learn about the Tagging Scheme:
- 00 11 Pid
- 01 11 Port
- 10 11 Immediate 2
- 11 11 Small integer
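42 is tagged with 1111 at the low end: shifting 0x2A left by four bits and filling them with the tag gives 0x2AF, which is 687. This way the BEAM can quickly recognize, for example during a pattern match, that this Erlang term is a small integer!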
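The tagging above fits in a few lines of C. This is a simplified sketch that only covers non-negative small integers on a 32-bit word; the macro and function names are chosen for this example and are not the ones from the ERTS headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Eterm;  /* one tagged term, 4 bytes on ARM32 */

#define TAG_BITS  4u    /* number of tag bits at the low end */
#define TAG_MASK  0xFu  /* mask selecting those bits */
#define TAG_SMALL 0xFu  /* 11 11: small integer */

/* Shift the value up by four bits and fill them with the tag. */
Eterm make_small(uint32_t value) {
    assert(value <= (UINT32_MAX >> TAG_BITS));  /* must fit in 28 bits */
    return (value << TAG_BITS) | TAG_SMALL;
}

/* A term is a small integer when its low four bits are 1111. */
bool is_small(Eterm term) {
    return (term & TAG_MASK) == TAG_SMALL;
}

/* Drop the tag to recover the (non-negative) integer. */
uint32_t small_value(Eterm term) {
    assert(is_small(term));
    return term >> TAG_BITS;
}
```

make_small(42) produces 687, exactly the immediate in movw r12, 687, and small_value(687) gives 42 back.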