🪸 GRiSP Nano: A First Descent into Internal SRAM
Back in June 2025, GRiSP Nano had finally reached the Erlang shell. The next milestone was relocating selected code into internal static random-access memory (SRAM).

Reaching that point meant descending further into the lower levels of the platform: the bootloader, linker scripts, memory protection unit (MPU) configuration, and board support package (BSP) startup hooks. This post follows that path through the deeper parts of the system and shows what was required to make that happen.
🏝️ The surface
🤿 Hardware
To follow the rest of this post, we first need a brief view of the GRiSP Nano memory layout. Two memory blocks matter here: the 16 MB dynamic random-access memory (DRAM) block, which GRiSP Nano has so far used to run the application, and the STM32U5's internal 3 MB static random-access memory (SRAM), where part of the application is to be relocated.
Using internal SRAM for part of the application also changes how memory is used on the system: it frees the corresponding space in DRAM for other parts of the user application. Since internal SRAM is also faster than DRAM, it might improve execution speed and possibly startup time as well.
🌊 Descent
🐟 First stop: Linker script files
The first place to look is the linker script files, where the memory layout is defined. These files map sections such as .text, .bss, and .data to specific addresses.
For the purposes of this post, the memory map of the STM32U5G7VJT can be simplified as follows:
(high addresses)
*-----------------------------* 0x9100_0000
| OCTOSPI_1 |
*-----------------------------* 0x9000_0000
| ... |
*-----------------------------* 0x8000_0000
| OCTOSPI_2 |
*-----------------------------* 0x7000_0000
| ... |
*-----------------------------* 0x2800_4000
| INT_SRAM_LPBAM |
*-----------------------------* 0x2800_0000
| ... |
*-----------------------------* 0x202F_0000
| INT_SRAM |
*-----------------------------* 0x2000_0000
| ... |
*-----------------------------* 0x0840_0000
| FLASH |
*-----------------------------* 0x0800_0000
(low addresses)
At first glance, the task looks simple: map the right sections to the right regions. In practice, it is more complex. RTEMS documentation for these linkcmds files is sparse, and linker-script misconfigurations can be tedious to debug.
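As a quick sanity check on the simplified map, the region sizes can be derived from the boundary addresses. The sketch below is only illustrative: the `region_t` type and the helper name are hypothetical, and the addresses are copied from the diagram above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: one row of the simplified memory map. */
typedef struct {
    const char *name;
    uint32_t    begin; /* first address of the region */
    uint32_t    end;   /* first address after the region */
} region_t;

/* Boundaries copied from the simplified diagram. */
static const region_t memory_map[] = {
    { "FLASH",          0x08000000u, 0x08400000u },
    { "INT_SRAM",       0x20000000u, 0x202F0000u },
    { "INT_SRAM_LPBAM", 0x28000000u, 0x28004000u },
    { "OCTOSPI_2",      0x70000000u, 0x80000000u },
    { "OCTOSPI_1",      0x90000000u, 0x91000000u },
};

/* Size of a region in KiB. */
static uint32_t region_size_kib(const region_t *r)
{
    return (r->end - r->begin) / 1024u;
}
```

With these boundaries, `region_size_kib` gives 4096 KiB for FLASH, 3008 KiB for INT_SRAM, and 16384 KiB for OCTOSPI_1.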
🐠 Bootloader memory mapping
The starting point is the bootloader linker file (linkcmds.bootloader):
INCLUDE linkcmds.memory
MEMORY {
  APP : ORIGIN = 0x90000000, LENGTH = 0x01000000 - 5M
  BL : ORIGIN = 0x91000000 - 5M, LENGTH = 5M
}
stm32u5_memory_app_begin = ORIGIN (APP);
stm32u5_memory_app_end = ORIGIN (APP) + LENGTH (APP);
stm32u5_memory_app_size = LENGTH (APP);
stm32u5_memory_bl_begin = ORIGIN (BL);
stm32u5_memory_bl_end = ORIGIN (BL) + LENGTH (BL);
stm32u5_memory_bl_size = LENGTH (BL);
REGION_ALIAS ("REGION_START", FLASH);
REGION_ALIAS ("REGION_VECTOR", FLASH);
REGION_ALIAS ("REGION_TEXT", FLASH);
REGION_ALIAS ("REGION_TEXT_LOAD", FLASH);
REGION_ALIAS ("REGION_RODATA", FLASH);
REGION_ALIAS ("REGION_RODATA_LOAD", FLASH);
REGION_ALIAS ("REGION_DATA", BL);
REGION_ALIAS ("REGION_DATA_LOAD", FLASH);
REGION_ALIAS ("REGION_FAST_TEXT", FLASH);
REGION_ALIAS ("REGION_FAST_TEXT_LOAD", FLASH);
REGION_ALIAS ("REGION_FAST_DATA", INT_SRAM);
REGION_ALIAS ("REGION_FAST_DATA_LOAD", FLASH);
REGION_ALIAS ("REGION_BSS", BL);
REGION_ALIAS ("REGION_WORK", BL);
REGION_ALIAS ("REGION_STACK", INT_SRAM);
REGION_ALIAS ("REGION_NOCACHE", INT_SRAM);
REGION_ALIAS ("REGION_NOCACHE_LOAD", FLASH);
bsp_vector_table_in_start_section = 1;
INCLUDE linkcmds.armv7m
This file does three things:
- It defines two zones in OCTOSPI_1: APP, where the application is stored, and BL, where the bootloader is stored.
- It creates symbols for the start, end, and size of each of these zones. These symbols are embedded in the ELF and used by the bootloader C code.
- It maps the different sections (REGION_START, ...) to physical memory regions.
To move APP into INT_SRAM, several constraints had to be respected:
- There must be no overlap between bootloader and application regions.
- There must be enough space in the internal memory to fit the sections that are mapped into it.
- The current GRiSP Nano bootloader is minimal and only loads one contiguous blob, so APP cannot be split across regions.
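The first two constraints are easy to check mechanically before touching the linker script. The following is a minimal sketch with hypothetical helper names, assuming regions of non-zero length; the example values in the note after the code are the original APP and BL zones inside the 16 MB OCTOSPI_1 block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a linker MEMORY region (non-zero length). */
typedef struct {
    uint32_t origin;
    uint32_t length;
} mem_region_t;

/* True when the two regions share at least one address. */
static bool regions_overlap(mem_region_t a, mem_region_t b)
{
    uint64_t a_end = (uint64_t)a.origin + a.length; /* exclusive end */
    uint64_t b_end = (uint64_t)b.origin + b.length; /* exclusive end */
    return a.origin < b_end && b.origin < a_end;
}

/* True when `inner` lies entirely inside `outer`. */
static bool region_fits(mem_region_t inner, mem_region_t outer)
{
    uint64_t inner_end = (uint64_t)inner.origin + inner.length;
    uint64_t outer_end = (uint64_t)outer.origin + outer.length;
    return inner.origin >= outer.origin && inner_end <= outer_end;
}
```

For the original layout, APP (origin 0x90000000, length 11 MB) and BL (origin 0x91000000 - 5 MB, length 5 MB) do not overlap, and both fit in OCTOSPI_1.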
To satisfy these constraints, a new region is added: BL_SRAM. This divides SRAM between the application and the bootloader: 2900 KB of SRAM is allocated to APP and the next 100 KB to BL_SRAM:
MEMORY {
  APP : ORIGIN = 0x20000000, LENGTH = 2900K
  BL_SRAM : ORIGIN = 0x20000000 + 2900K, LENGTH = 100K
  BL : ORIGIN = 0x90000000, LENGTH = 5M
}
New symbols are also defined for this new memory region:
stm32u5_memory_bl_sram_begin = ORIGIN (BL_SRAM);
stm32u5_memory_bl_sram_end = ORIGIN (BL_SRAM) + LENGTH (BL_SRAM);
stm32u5_memory_bl_sram_size = LENGTH (BL_SRAM);
Finally, the bootloader regions that were in INT_SRAM are remapped to BL_SRAM:
...
REGION_ALIAS ("REGION_FAST_DATA", BL_SRAM);
...
REGION_ALIAS ("REGION_STACK", BL_SRAM);
REGION_ALIAS ("REGION_NOCACHE", BL_SRAM);
...
🐡 Application memory mapping
The application linker file was more straightforward. All sections were mapped to INT_SRAM except REGION_BSS and REGION_WORK, which remained in external memory (OCTOSPI_1). These two regions are generally large and do not fit in SRAM.
Another point that emerged during this remapping is that moving a region with a corresponding *_LOAD section to external memory would not reduce binary size. The binary would still need to contain a copy of the initialized contents to load that region during startup.
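The reason is the copy step performed at startup: initialized sections are copied from their load address to their run address when the two differ, so the image has to carry the initial contents wherever the section ends up running. A simplified sketch of that step follows. `copy_section` is a hypothetical stand-in, not the RTEMS `bsp_start_copy_sections()` implementation, and it assumes the load and run areas do not partially overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy an initialized section from its load address to its run address.
 * Returns the number of bytes copied. Nothing is copied when the section
 * already runs from the place it was loaded to. */
static size_t copy_section(void *run_begin, const void *load_begin, size_t size)
{
    if (run_begin == load_begin || size == 0) {
        return 0;
    }
    memcpy(run_begin, load_begin, size);
    return size;
}
```

When load and run addresses are identical, as for the application sections that are both loaded into and executed from INT_SRAM, the call is a no-op. When they differ, the source bytes still have to exist somewhere in the image.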
With all this knowledge in mind, the resulting linkcmds file is the following:
INCLUDE linkcmds.memory
REGION_ALIAS ("REGION_START", INT_SRAM);
REGION_ALIAS ("REGION_VECTOR", INT_SRAM);
REGION_ALIAS ("REGION_TEXT", INT_SRAM);
REGION_ALIAS ("REGION_TEXT_LOAD", INT_SRAM);
REGION_ALIAS ("REGION_RODATA", INT_SRAM);
REGION_ALIAS ("REGION_RODATA_LOAD", INT_SRAM);
REGION_ALIAS ("REGION_DATA", INT_SRAM);
REGION_ALIAS ("REGION_DATA_LOAD", INT_SRAM);
REGION_ALIAS ("REGION_FAST_TEXT", INT_SRAM);
REGION_ALIAS ("REGION_FAST_TEXT_LOAD", INT_SRAM);
REGION_ALIAS ("REGION_FAST_DATA", INT_SRAM);
REGION_ALIAS ("REGION_FAST_DATA_LOAD", INT_SRAM);
REGION_ALIAS ("REGION_BSS", OCTOSPI_1);
REGION_ALIAS ("REGION_WORK", OCTOSPI_1);
REGION_ALIAS ("REGION_STACK", INT_SRAM);
REGION_ALIAS ("REGION_NOCACHE", INT_SRAM);
REGION_ALIAS ("REGION_NOCACHE_LOAD", INT_SRAM);
bsp_vector_table_in_start_section = 1;
INCLUDE linkcmds.armv7m
🦑 Second stop: Memory protection unit (MPU)
After setting up the memory layout, the Erlang shell was expected to appear. Instead, the console hung right after the bootloader handed control to the application.
The next lead involved the MPU. Since the MPU restricts access to memory regions, it seemed possible that execution from internal SRAM was not yet permitted. A correct MPU setup is also useful later for catching certain classes of software faults.
To reach the shell quickly, the MPU was configured with two regions: one for the bootloader and one for the application. An example from STMicroelectronics on GitHub served as a useful guideline.
First, attribute 0 of the Memory Attribute Indirection Registers (MAIR) is configured with value 0xFF. This sets the region's inner and outer memory attributes to normal memory in write-back non-transient mode.
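The 0xFF value is easier to read once split into its two 4-bit halves: in the Armv8-M MAIR encoding for normal memory, the upper nibble describes the outer attributes and the lower nibble the inner attributes. The macro and helper names below are hypothetical; only the bit layout comes from the architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Normal-memory cache policy nibble in the Armv8-M MAIR encoding:
 * bits [3:2] = 0b11 select write-back non-transient,
 * bit 1 = read-allocate, bit 0 = write-allocate. */
#define MAIR_WB_NON_TRANSIENT 0xCu
#define MAIR_READ_ALLOCATE    0x2u
#define MAIR_WRITE_ALLOCATE   0x1u

/* Combine an outer and an inner nibble into one 8-bit MAIR attribute. */
static uint8_t mair_attr(uint8_t outer, uint8_t inner)
{
    return (uint8_t)(((outer & 0xFu) << 4) | (inner & 0xFu));
}
```

Write-back non-transient with read and write allocation gives the nibble 0xF, and using it for both halves yields 0xFF.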
The MPU regions then need to be configured. The bootloader is assigned region 0, while the application is assigned region 1. Both regions are configured in the same way: non-shareable, read/write, and executable. Both also use MAIR attribute number 0.
Finally, the MPU is enabled with privileged default mapping. This was needed because RTEMS does not provide strict OS/application separation.
The region boundaries are taken from the symbols defined earlier in the linker file, such as stm32u5_memory_bl_sram_begin and stm32u5_memory_app_end.
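One detail is worth keeping in mind when feeding linker symbols to the MPU: on Armv8-M, region base and limit addresses have a 32-byte granularity and the limit is inclusive, whereas an `_end` symbol computed as `ORIGIN + LENGTH` points one byte past the region. The following sketch, with hypothetical helper names, shows the conversion and the alignment check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MPU_GRANULE 32u /* Armv8-M MPU base/limit granularity in bytes */

/* A region can be described exactly only if both bounds are 32-byte aligned. */
static bool mpu_bounds_aligned(uint32_t begin, uint32_t end_exclusive)
{
    return (begin % MPU_GRANULE) == 0u && (end_exclusive % MPU_GRANULE) == 0u;
}

/* Last byte covered by the region, given an exclusive end address. */
static uint32_t mpu_inclusive_limit(uint32_t end_exclusive)
{
    return end_exclusive - 1u;
}
```

For an APP region starting at 0x20000000 with a length of 2900K, the exclusive end is 0x202D5000 and the inclusive limit is 0x202D4FFF, which keeps the region from reaching into the first 32 bytes of BL_SRAM.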
static void MPU_Config(void)
{
  MPU_Region_InitTypeDef app_region;
  MPU_Region_InitTypeDef bl_region;
  /* Disable MPU before preloading and config update */
  HAL_MPU_Disable();
  /* MAIR attribute configuration */
  MPU_Attributes_InitTypeDef attr_region0;
  attr_region0.Number = MPU_ATTRIBUTES_NUMBER0;
  attr_region0.Attributes = 0xFFU;
  HAL_MPU_ConfigMemoryAttributes(&attr_region0);
  /* Bootloader region configuration */
  bl_region.Enable = MPU_REGION_ENABLE;
  bl_region.Number = MPU_REGION_NUMBER0;
  bl_region.AttributesIndex = MPU_ATTRIBUTES_NUMBER0;
  bl_region.BaseAddress = (uint32_t) stm32u5_memory_bl_sram_begin;
  /* The limit address is inclusive, the _end symbol is one past the region */
  bl_region.LimitAddress = (uint32_t) stm32u5_memory_bl_sram_end - 1U;
  bl_region.AccessPermission = MPU_REGION_ALL_RW;
  bl_region.DisableExec = MPU_INSTRUCTION_ACCESS_ENABLE;
  bl_region.IsShareable = MPU_ACCESS_NOT_SHAREABLE;
  HAL_MPU_ConfigRegion(&bl_region);
  /* App region configuration */
  app_region.Enable = MPU_REGION_ENABLE;
  app_region.Number = MPU_REGION_NUMBER1;
  app_region.AttributesIndex = MPU_ATTRIBUTES_NUMBER0;
  app_region.BaseAddress = (uint32_t) stm32u5_memory_app_begin;
  /* Same here: without the - 1 this region would reach into BL_SRAM */
  app_region.LimitAddress = (uint32_t) stm32u5_memory_app_end - 1U;
  app_region.AccessPermission = MPU_REGION_ALL_RW;
  app_region.DisableExec = MPU_INSTRUCTION_ACCESS_ENABLE;
  app_region.IsShareable = MPU_ACCESS_NOT_SHAREABLE;
  HAL_MPU_ConfigRegion(&app_region);
  /* Enable the MPU, keeping the default memory map for privileged code */
  HAL_MPU_Enable(MPU_PRIVILEGED_DEFAULT);
}
Although this MPU setup was useful in the longer term, it did not solve the missing shell. The application was still hanging, which led to the next stop: the BSP.
🪼 Third stop: RTEMS board support package (BSP)
After MPU initialization, attention turned to two startup hooks in the BSP: bsp_start_hook_0 and bsp_start_hook_1. These hooks perform board-specific initialization:
void BSP_START_TEXT_SECTION bsp_start_hook_0(void)
{
  HAL_GetTick_ptr = Startup_HAL_GetTick;
  startup_delay_call_counter = 0;
  /* If we are running from OctoSPI, we must not touch the clocks and pins.
   * Otherwise the OSPI RAM won't work any more. */
  if (stm32u5_init_octospi < stm32u5_memory_octospi_1_begin ||
      stm32u5_init_octospi > stm32u5_memory_octospi_1_end) {
    SystemInit();
    SystemCoreClockUpdate();
    stm32u5_rcc_power_clock_enable();
    stm32u5_init_oscillator();
    stm32u5_init_clocks();
    stm32u5_init_power();
    stm32u5_init_peripheral_clocks();
  }
  HAL_Init();
}
void BSP_START_TEXT_SECTION bsp_start_hook_1(void)
{
  /* Init OctoSPI only if we are not running from it */
  if (stm32u5_init_octospi < stm32u5_memory_octospi_1_begin ||
      stm32u5_init_octospi > stm32u5_memory_octospi_1_end) {
    stm32u5_init_octospi();
  }
  bsp_start_copy_sections();
  bsp_start_clear_bss();
  HAL_GetTick_ptr = Booted_HAL_GetTick;
}
In the original code, the guards check whether execution is taking place from OCTOSPI by testing where stm32u5_init_octospi itself is located. In this layout, however, the application is split: its code runs from INT_SRAM, while REGION_BSS and REGION_WORK remain in OCTOSPI. The guards therefore concluded that the application was not running from OctoSPI, and they triggered clock and OctoSPI re-initialization when they should not have.
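The effect is easy to reproduce with plain addresses. In the sketch below, `outside_octospi_1` is a hypothetical helper that mirrors the guard condition, and the sample addresses in the note after the code are made-up locations for `stm32u5_init_octospi` in each layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* OCTOSPI_1 boundaries from the simplified memory map. */
#define OCTOSPI_1_BEGIN 0x90000000u
#define OCTOSPI_1_END   0x91000000u

/* Mirrors the guard: true when the function address lies outside OCTOSPI_1,
 * which the BSP interprets as "not running from OctoSPI". */
static bool outside_octospi_1(uint32_t function_address)
{
    return function_address < OCTOSPI_1_BEGIN ||
           function_address > OCTOSPI_1_END;
}
```

An address such as 0x90001000 (code in external memory) yields false, so initialization is skipped. An address such as 0x20001000 (code relocated to INT_SRAM) yields true, so the clocks and OctoSPI are re-initialized even though part of the application still lives in OCTOSPI_1.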
Once this was identified, a quick fix was applied: the guarded initialization code was disabled and the BSP was rebuilt for the application only, while the bootloader kept the original BSP behavior.
To support this in the toolchain build, two BSP variants were generated: one for the bootloader and one for the application. With C conditionals such as #ifdef, the unwanted initialization can be removed when building for the application:
void BSP_START_TEXT_SECTION bsp_start_hook_0(void)
{
  HAL_GetTick_ptr = Startup_HAL_GetTick;
  startup_delay_call_counter = 0;
  /* If we are running from OctoSPI, we must not touch the clocks and pins.
   * Otherwise the OSPI RAM won't work any more. */
  if (stm32u5_init_octospi < stm32u5_memory_octospi_1_begin ||
      stm32u5_init_octospi > stm32u5_memory_octospi_1_end) {
#ifdef GRISP_NANO_BOOTLOADER
    SystemInit();
    SystemCoreClockUpdate();
    stm32u5_rcc_power_clock_enable();
    stm32u5_init_oscillator();
    stm32u5_init_clocks();
    stm32u5_init_power();
    stm32u5_init_peripheral_clocks();
#endif
  }
  HAL_Init();
}
void BSP_START_TEXT_SECTION bsp_start_hook_1(void)
{
  /* Init OctoSPI only if we are not running from it */
  if (stm32u5_init_octospi < stm32u5_memory_octospi_1_begin ||
      stm32u5_init_octospi > stm32u5_memory_octospi_1_end) {
#ifdef GRISP_NANO_BOOTLOADER
    stm32u5_init_octospi();
#endif
  }
  bsp_start_copy_sections();
  bsp_start_clear_bss();
  HAL_GetTick_ptr = Booted_HAL_GetTick;
}
🪸 Last stop: The cache issue
After fixing and rebuilding the BSP for the application, the expected boot output appeared. The application, however, still crashed later in the boot sequence. The failure occurred after the following log messages:
[ERL] ERROR: SD card could not be mounted after timeout
[ERL] Reading /media/mmcsd-0-0/grisp.ini
[ERL] WARNING: /media/mmcsd-0-0/grisp.ini not found, using defaults
[ERL] Booting with arg: erl.rtems -- -root otp -home home -boot start_sasl -pa .
This issue turned out to be related to the cache configuration introduced alongside the MPU configuration. In the STMicroelectronics demo code, the additional D-cache and I-cache are enabled immediately after MPU configuration, so that approach was initially mirrored.
This caused the MMC timeout.
I-cache and D-cache are additional caches that ST places around the Cortex-M33 core. The cache functions that the bootloader uses from RTEMS (rtems_cache_flush_entire_data(), ...) are still based on ARMv7 cache-maintenance operations. These operations target the core's own cache-maintenance interface and do not reach ST's additional I-cache and D-cache. As a result, when the bootloader starts the application, those caches are neither invalidated nor flushed.
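Why an unmaintained write-back cache is a problem at a handover can be illustrated with a toy model. The sketch below is not ST or RTEMS code: it models a single cached word, with hypothetical names throughout, and only shows that data written through a write-back cache stays invisible to the backing memory until the cache is cleaned, and that stale data is served until the cache is invalidated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one word of backing memory fronted by one write-back cache line. */
typedef struct {
    uint32_t memory; /* what a reader that bypasses the cache sees */
    uint32_t cached; /* value held in the cache line */
    bool     valid;  /* the line holds a copy of the word */
    bool     dirty;  /* the line is newer than memory */
} toy_cache_t;

/* CPU write: lands in the cache only (write-back policy). */
static void cpu_write(toy_cache_t *c, uint32_t value)
{
    c->cached = value;
    c->valid = true;
    c->dirty = true;
}

/* CPU read: served from the cache when valid, otherwise the line is filled. */
static uint32_t cpu_read(toy_cache_t *c)
{
    if (!c->valid) {
        c->cached = c->memory;
        c->valid = true;
        c->dirty = false;
    }
    return c->cached;
}

/* Clean (flush): write a dirty line back to memory. */
static void cache_clean(toy_cache_t *c)
{
    if (c->valid && c->dirty) {
        c->memory = c->cached;
        c->dirty = false;
    }
}

/* Invalidate: drop the line without writing it back. */
static void cache_invalidate(toy_cache_t *c)
{
    c->valid = false;
    c->dirty = false;
}
```

In this model, a value written with `cpu_write` only reaches `memory` after `cache_clean`, and a value changed directly in `memory` is only seen by `cpu_read` after `cache_invalidate`.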
Once activation of these two caches was removed, the Erlang shell appeared.
🚤 Conclusion
This first descent into internal SRAM on GRiSP Nano reached further than a linker script change alone. It required coordinated work across the bootloader memory layout, linker scripts, memory protection unit (MPU) configuration, board support package (BSP) startup code, and cache setup. With those pieces in place, the Erlang shell could finally surface from this new layout.
One remaining issue is how to add support in the RTEMS code base for the additional STM32U5 instruction and data caches, and then re-activate them properly in GRiSP Nano without reintroducing the startup issues described above.
Thanks
Special thanks to Christian Mauderer from Embedded Brains, who answered my questions and helped me investigate RTEMS crashes throughout this work.
