Stephen Smith's Blog

Musings on Machine Learning…

Calling Main in Assembly Language on the RP2040


Introduction

In last week’s article, I presented my first Assembly Language program on the Raspberry Pi Pico. The program worked, but it included some C code that I wasn’t happy with. In this article, I’ll explain why I needed to have the main entry point in C, what I missed and how to correct this problem.

The entry point is a function main() with no parameters or return code, called by the RP2040 initialization code after it initializes the RP2040 hardware. In C this worked without a problem, but in Assembly Language it resulted in a hardware fault on executing the first instruction in my main() routine. This was a bit of a head scratcher and it took a couple of days before I realized what the problem was. My first thought was that it was alignment, but no, it wasn’t that. Perhaps I needed to duplicate the first few instructions in the Assembly Language generated by the C compiler, but no, that still caused a hardware fault. Rather mystifying and annoying.

Use the Source

The program you run on the Pico contains pretty much everything in a single executable that initializes the CPU and peripheral hardware and then runs in an endless loop. There is no operating system, just your program. The Raspberry Pi Pico contains a bit of firmware which is activated when you power on with the BOOTSEL button pressed. This lets the Pico connect as a shareable flash drive to a USB host, so you can copy files into the writable part of the Pico’s flash memory. After that it reboots to let the program run.

One of the good things about the Pico is that the SDK contains the source code for this whole thing, and when you build your program, it actually compiles all this source code alongside your code (there are no libraries in this environment). This means you can build a debug build where everything is debuggable, both your code and the SDK code, so you can set a breakpoint before your code and single step through the SDK into your code. You can’t start debugging at the very first instruction, since you need to let the first bit of the SDK initialize the processor before starting, but you can set a breakpoint fairly early. I found a good place was the platform_entry routine, which is an Assembly Language function in crt0.S. This is the function that initializes the SDK environment and then calls your main() starting point. The code for this routine is fairly innocuous:

platform_entry: // symbol for stack traces
    // Use 32-bit jumps, in case these symbols are moved out of branch range
    // (e.g. if main is in SRAM and crt0 in flash)
    ldr r1, =runtime_init
    blx r1
    ldr r1, =main
    blx r1
    ldr r1, =exit
    blx r1

Nothing special, it just loads the address of our main routine and calls it. Stepping through with the C main, it works; stepping through with the Assembly Language main, hardware fault.

At some point I thought to look at the documentation for the BLX instruction: why were they calling this rather than BL? This turned out to be the root of the problem.

A full ARM A-series CPU, like those in a full Raspberry Pi or in your cell phone, can execute a rich set of instructions, the regular ARM 32-bit instruction set, as well as the more compact so-called “thumb” instructions. A microcontroller M-series CPU like the one in the Pico executes only the thumb instructions. On the A-series CPU you switch back and forth between regular and thumb modes using the BLX instruction. Thumb instructions are mostly 16 bits long and regular instructions are 32 bits long, and both have to be aligned: thumb instructions on 2-byte boundaries, regular instructions on 4-byte boundaries. Either way the true address of any instruction is even, which means the low order bit isn’t really used (it has to be zero). When BLX branches to an address held in a register, it uses this low order bit to specify whether to switch to thumb mode or not. If the bit is one, then thumb mode; if it is zero, then regular instruction mode. Let’s look at the disassembly for this routine:

1000021a <platform_entry>:
1000021a: 4919      ldr r1, [pc, #100] ; (10000280 <__get_current_exception+0x1a>)
1000021c: 4788      blx r1
1000021e: 4919      ldr r1, [pc, #100] ; (10000284 <__get_current_exception+0x1e>)
10000220: 4788      blx r1
10000222: 4919      ldr r1, [pc, #100] ; (10000288 <__get_current_exception+0x22>)
10000224: 4788      blx r1

10000280: 100012bd .word 0x100012bd   ; runtime_init
10000284: 10000360 .word 0x10000360   ; main
10000288: 100013a9 .word 0x100013a9   ; exit

Notice the address for my main routine is even whereas the other two routines are odd. When main is compiled from C, it gets an odd address as well. I didn’t think of this because the RP2040’s M-series CPU only executes thumb instructions, so why have any functionality to switch between modes? I don’t know, but if you do tell it to switch to regular instructions then you get a hardware fault.
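To make the check concrete, here is a small C sketch meant to run on a desktop machine, not on the Pico. It applies the low order bit rule to the three literal-pool words from the disassembly above. The function names are made up for this illustration and are not part of the SDK.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bit 0 of the BLX register operand selects the mode: 1 = thumb, 0 = regular. */
int is_thumb_target(uint32_t word) {
    return (int)(word & 1u);
}

/* The instruction itself is fetched from the address with bit 0 cleared. */
uint32_t instruction_address(uint32_t word) {
    return word & ~(uint32_t)1;
}

/* Print how BLX would treat one literal-pool word. */
void describe_target(const char *name, uint32_t word) {
    assert(name != NULL);
    printf("%-12s word=0x%08lx address=0x%08lx %s\n",
           name, (unsigned long)word,
           (unsigned long)instruction_address(word),
           is_thumb_target(word) ? "thumb" : "regular (faults on the RP2040)");
}

/* The three words copied from the literal pool in the disassembly. */
void describe_literal_pool(void) {
    describe_target("runtime_init", 0x100012bdu);
    describe_target("main", 0x10000360u);
    describe_target("exit", 0x100013a9u);
}
```

Calling describe_literal_pool() reports runtime_init and exit as thumb targets and main as a regular-mode target, which is exactly the case that faults.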

The other question is why the author of crt0.S in the SDK calls routines with BLX rather than BL. After all, the Pico doesn’t support regular instructions, so you are always in thumb mode. If platform_entry used BL instead, then I wouldn’t have had any problem. The comment in the source hints at one reason: BL encodes a limited-range offset from the current instruction, so a routine placed far away (main in SRAM, crt0 in flash) can only be reached by loading its full address into a register, and the register form of the call on this CPU is BLX. I still wonder if this indicates they developed the SDK on an A-series CPU, perhaps before they obtained real RP2040s. Or perhaps there is a way to emulate the RP2040 on a full A-series CPU and this is how the developers at the Raspberry Pi Foundation operate.

To correct the problem, we just need to indicate our main() routine is a thumb routine. We do this by placing a .thumb_func directive in front of the .global directive.

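.thumb_func              @ Mark the next label defined (main) as a thumb function
.global main             @ Provide program starting address to linker

.align  4                @ On ARM, .align n aligns to 2^n bytes, so this is 16-byte alignment

main: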

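The .thumb_func directive applies to the next label the assembler defines, so it has to appear before the main: label with no other label in between; here I put it in front of the .global. The assembler marks main as a thumb function, and it is really the linker that uses this to set up the correct (odd) address when it links in crt0.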
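In terms of the literal-pool word that crt0 loads, the effect of marking main as a thumb function can be sketched in a few lines of C for a desktop machine. The helper name thumb_symbol_value is hypothetical, and the address is the one from the disassembly earlier in this article.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: a thumb function symbol contributes its code
   address with bit 0 set, so BLX stays in thumb mode. */
uint32_t thumb_symbol_value(uint32_t code_address) {
    assert((code_address & 1u) == 0);  /* instructions start on even addresses */
    return code_address | 1u;
}
```

With the address from the disassembly, thumb_symbol_value(0x10000360) gives 0x10000361, an odd word like the ones for runtime_init and exit.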

Summary

This eliminates the need for the C main() function we had last week. Next time we’ll eliminate the two other C routines we had and explore how the Raspberry Pi Pico’s GPIO control registers work. As with most problems, working through the solution teaches us a bit more about how the RP2040 works and reminds us that there are consequences to using a subset of the full ARM instruction set.

If you are using this SDK, you can program in 32-bit ARM Assembly Language, and you might want to consider my book “Raspberry Pi Assembly Language Programming”.

Written by smist08

April 23, 2021 at 9:11 am
