Stephen Smith's Blog

Musings on Machine Learning…

Archive for the ‘assembly language’ Category

Assembly Language Tutorial Six for MagPi Magazine

I’m in the process of writing a series of Assembly Language tutorials for MagPi Magazine. The sixth and final one appeared in issue #121 on page 58.

The PDF version of the magazine is free to download, but MagPi always appreciates any donations.

This article doesn’t look at ARM Assembly Language; instead it looks at the special Assembly Language used by the Raspberry Pi Pico’s Programmable I/O (PIO) processors. If a CPU has to handle every aspect of I/O itself, this can take a significant percentage of its processing power. To offload this work, the RP2040 chip includes a set of special PIO coprocessors that perform I/O processing independently of the CPU. This special Assembly Language is simpler than ARM Assembly Language and there is only room for 32 instructions in the coprocessor’s instruction memory, but even so the RP2040’s PIO processors are powerful and leave the RP2040’s main ARM CPU free to perform more application-oriented processing.

Unfortunately this article was written before the Raspberry Pi Pico W was released. The Pico W adds Wi-Fi and Bluetooth to the Raspberry Pi Pico. To do this, Raspberry took over the GPIO pin that connected the on-board LED to the CPU. As a result, the program in this article won’t work on a Pico W, only the regular Pico. On the Pico W, the on-board LED is connected to the wireless chip and you have to go through the device driver for this chip to access the LED. There is an example program to do this in the Pico W’s SDK samples.

This tutorial can only give so much detail. If you want more detail, you can always consider my book RP2040 Assembly Language Programming.

Written by smist08

August 25, 2022 at 10:05 am

Assembly Language Tutorial Four for MagPi Magazine

I’m in the process of writing a series of Assembly Language tutorials for MagPi Magazine. The fourth one appeared in issue #119 on page 50.

The PDF version of the magazine is free to download, but MagPi always appreciates any donations.

This article leads readers through using the Raspberry Pi’s floating point unit (FPU) to perform a calculation. It uses the 64-bit version of Raspberry Pi OS and shows how to write an Assembly Language routine to calculate the distance between two points in two dimensions. This shows how to use the FPU to add, subtract, multiply and perform square roots. There is a C program that uses this Assembly Language distance() routine to calculate the distance between a couple of sets of points. The tutorial shows how to use the gdb debugger to step through the program and examine the data as it is calculated.
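
For reference, the calculation that routine performs can be sketched in a few lines of Python. This is only a host-side model of the arithmetic (subtract, multiply, add, square root), not the tutorial’s Assembly Language code, and the parameter names are illustrative.

```python
import math

def distance(x1, y1, x2, y2):
    """Euclidean distance between (x1, y1) and (x2, y2)."""
    dx = x2 - x1                          # subtract
    dy = y2 - y1
    return math.sqrt(dx * dx + dy * dy)   # multiply, add, square root

print(distance(0.0, 0.0, 3.0, 4.0))
```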

If you want more detail, you can always consider my book Programming with 64-Bit ARM Assembly Language.

Written by smist08

June 30, 2022 at 1:17 pm

Assembly Language Tutorial Three for MagPi Magazine

I’m in the process of writing a series of Assembly Language tutorials for MagPi Magazine. The third one appeared in issue #118 on page 52.

The PDF version of the magazine is free to download, but MagPi always appreciates any donations.

This article looks at writing Assembly Language code for the Raspberry Pi Pico. The Pico is Raspberry’s entry into the microcontroller market and includes a dual-core ARM Cortex-M0+ CPU. This CPU runs ARM’s Thumb instruction set, a subset of the full ARM 32-bit instruction set. This article shows how to create a project, include Assembly Language source code and then run or debug the program on the Raspberry Pi Pico.

This tutorial can only give so much detail. If you want more detail, you can always consider my book RP2040 Assembly Language Programming.

Written by smist08

May 26, 2022 at 1:51 pm

Assembly Language Tutorial Two for MagPi Magazine

I’m in the process of writing a series of Assembly Language tutorials for MagPi Magazine. The second one appeared in issue #117 on page 50.

The PDF version of the magazine is free to download, but MagPi always appreciates any donations.

This second article looks at writing Assembly Language for the 64-bit version of Raspberry Pi OS. This tutorial shows how to access memory, use 64-bit registers, use loops, conditional logic and call functions. The completed program prints out the value of a register in both decimal and hexadecimal showing how to convert a binary value to ASCII.
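
As a rough illustration of the binary-to-ASCII conversion mentioned above, here is one common approach sketched in Python: hexadecimal digits come from shifting and masking four bits at a time, and decimal digits come from repeated division by 10. The function names are made up for this sketch, and the tutorial’s Assembly Language routine may well be organized differently.

```python
def to_hex_ascii(value, bits=64):
    """Convert an unsigned integer to hex ASCII, one 4-bit nibble at a time."""
    digits = []
    for shift in range(bits - 4, -1, -4):        # most significant nibble first
        nibble = (value >> shift) & 0xF          # isolate four bits
        if nibble < 10:
            digits.append(chr(ord("0") + nibble))        # digits 0-9
        else:
            digits.append(chr(ord("A") + nibble - 10))   # digits A-F
    return "".join(digits)

def to_decimal_ascii(value):
    """Convert an unsigned integer to decimal ASCII by repeated division by 10."""
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        value, remainder = divmod(value, 10)
        digits.append(chr(ord("0") + remainder))  # least significant digit first
    return "".join(reversed(digits))

print(to_hex_ascii(0x1234, bits=16), to_decimal_ascii(4660))
```

The decimal routine produces its digits in reverse order, which is why the result is reversed at the end; an Assembly Language version typically handles this by filling its output buffer from the end.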

This was a fairly long tutorial but I can only give so much detail. If you want more detail, you can always consider my book Programming with 64-Bit ARM Assembly Language.

Written by smist08

April 29, 2022 at 9:58 am

Assembly Language Tutorials for MagPi Magazine

I’m in the process of writing a series of Assembly Language tutorials for MagPi Magazine. The first one appeared in issue #116 on page 46.

The PDF version of the magazine is free to download, but MagPi always appreciates any donations.

This first article was written before Raspberry released the 64-bit version of their operating system and so uses 32-bit ARM Assembly Language. This still has many applications and is a good springboard to using ARM-based microcontrollers like the Raspberry Pi Pico.

It was a challenge to write in a tutorial format, but I think it ended up working well. With a four page article, I can only give so much detail, but of course if you want more detail, you can always consider my book Raspberry Pi Assembly Language Programming.

Written by smist08

April 20, 2022 at 10:21 am

Adding Assembly Language to MicroPython

My book “RP2040 Assembly Language Programming” covers how to interact with modules written in compiled languages like C. However, there isn’t a chapter on adding Assembly Language to MicroPython, a popular programming environment in the Raspberry Pi Pico world. In this article we’ll look at adding a simple Assembly Language function to the MicroPython program we presented in “Playing with the Seeed Studio Grove Starter Kit for the Raspberry Pi Pico”. We’ll place the temperature and humidity together on the first line and then calculate their sum and output that on the second line.

Assembly Language in MicroPython

MicroPython has a syntax to add ARM “Thumb” Assembly Language instructions directly into your Python source code. This is great if you are using an ARM M-series based microcontroller such as Raspberry’s RP2040 or the Seeed Studio Wio Terminal, which is based on an ARM M4 CPU. You specify that a function will be in Assembly Language by placing the directive:

@micropython.asm_thumb

before the function definition. The parameters to the Assembly Language function must be r0, r1, r2 and r3, allowing up to four arguments that are all 32-bit integers; other data types aren’t supported. The return value for the function will be whatever is in r0. Our simple Assembly Language function is:

@micropython.asm_thumb
def sum(r0, r1):
    add(r0, r0, r1)

Notice how the Assembly Language instructions are entered as functions to conform to Python syntax, though MicroPython compiles these directly to Assembly Language. There is a function for each instruction in the ARM M-series Thumb instruction set, including the floating point instructions, in case your microcontroller has a floating point unit such as in M4 based systems like the Wio Terminal. The MicroPython documentation for this is quite good and available here.

There is a separate label() function to define labels so you can create loops. You can access constants defined in the main Python program and use the << operator to specify shifts where they are allowed. There is a data() function to define data and an align() function to define alignment.

Inside our Assembly Language function, we can create additional functions and BL to them, and BX to return.

Next is the complete MicroPython program for this project.

from lcd1602 import LCD1602
from dht11 import *
from machine import Pin, I2C
from time import sleep
import math
i2c = I2C(1,scl=Pin(7), sda=Pin(6), freq=400000)
d = LCD1602(i2c, 2, 16)
dht2 = DHT(18) #temperature and humidity sensor connect to D18 port
@micropython.asm_thumb
def sum(r0, r1):
    add(r0, r0, r1)
while True:
    temp,humid = dht2.readTempHumid()
    d.print("T,H:  " + str(temp) +" " + str(humid))
    d.setCursor(0, 1)
    d.print("Sum: " + str(sum(temp, humid)))

Notice that we didn’t need to build a custom version of MicroPython to include our routine; our program is strictly a MicroPython source file. This allows you to add Assembly Language functions to any build of MicroPython without rebuilding it or worrying about conflicting versions.

Limitations & Uses

The main complaint I’ve seen about this is that you don’t have access to the RP2040 SDK; it is compiled into MicroPython, and MicroPython doesn’t expose it for use. Similarly, you can’t build other libraries into MicroPython and then call them directly from Assembly Language; you have to have MicroPython mediate access. The best use for this is smaller functions that specifically optimize some aspect of Python that is slow, or that access hardware registers that don’t have MicroPython libraries. Another use is to execute specific Assembly Language instructions that have no MicroPython equivalent. There are tools like the Zerynth/Viper Emitter to generate your initial code, which can be helpful.

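Another limitation is that this inline assembler can’t be used to write PIO Assembly Language for the RP2040’s programmable I/O; it only handles ARM Thumb instructions. On the RP2040, MicroPython handles PIO programs separately, through the @rp2.asm_pio decorator in its rp2 module.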


Most people looking to use MicroPython don’t want to mess with C or Assembly Language. They are using MicroPython to make their microcontroller projects easier. However, if you are stuck, this is a great way out of nearly any problem. Similarly, if you are writing a MicroPython interface for a custom piece of hardware, this could be necessary to hit the hardware registers correctly. The MicroPython inline Assembly Language is a nice extension to the language and gives programmers great power from this system, but it is limited to ARM M-series type microcontrollers.

Assembly Language is Number 8

Tiobe regularly produces a list of the most popular programming languages and their recently published list has Assembly Language at number 8, moving up from number 16 last year. The top eight languages are:

  1. Python
  2. C
  3. Java
  4. C++
  5. C#
  6. Visual Basic
  7. JavaScript
  8. Assembly Language

The top spots are all well established and well used, C shows remarkable resilience and Java remains popular, in spite of Oracle. In the early days of the PC, all major applications and games were written in Assembly Language, but with the availability of high quality C compilers, this waned and application development switched to C and then other high level languages. Let’s look at why Assembly Language is having a bit of a renaissance.

Assembly Language is Accessible

In the early days, you needed to buy a macro assembler from the chip manufacturer or some other vendor, such as Microsoft’s MASM. Now, all the chip vendors add their Assembly Language support directly into the open source GNU Assembler and/or the LLVM Assembler. Both of these are excellent macro assemblers, run on any hardware, support cross compiling and best of all are completely free.

In my first job out of university, I did some Assembly Language programming on an Intel 80186 board and to debug it, I needed to use an in-circuit emulator (I2ICE) which was a big expensive piece of hardware that replaced the CPU with a debugging probe. Now, all the CPUs and boards have excellent debug probes and you can debug them using open source tools like GNU’s gdb.

Another big help are all the great books on Assembly Language that are available such as: “Raspberry Pi Assembly Language Programming”, “Programming with 64-Bit ARM Assembly Language” and “RP2040 Assembly Language Programming”.

Microcontrollers are Everywhere

The Arduino microcontroller has created a giant community of DIY electronics hobbyists. There is a huge proliferation of inexpensive microcontrollers. In the Arduino world, you program these in Arduino C, but often to get the performance you need, you have to drop down to Assembly Language. Similarly, the memory on these boards is limited, and Assembly Language is the only way to make use of every single bit available to you. The newer microcontrollers like Raspberry’s RP2040, which are based on ARM 32-bit M-series CPUs, are much more powerful and have more memory. However, with the extra power, people are attempting more ambitious projects, often involving machine learning applications or other compute-intensive applications. Again, they hit the wall with C or MicroPython programming and have to delve into Assembly Language to solve their problems.

When people program these microcontrollers, they are connecting to all sorts of imaginative hardware devices, and they have to create their own libraries to interface to these and often the best way to do this is via Assembly Language.

Competition in the Phone App Market

The App markets for both iOS and Android have matured to the point where each new version brings fewer changes. The competition between various Apps in a given category is intense and one key way for vendors to differentiate themselves from their competition is via improved performance. Beyond re-writing code to use more efficient algorithms, programmers are turning to hand-crafting the core routines of their Apps in Assembly Language.

Machine Learning

Machine Learning (ML) or AI is extremely compute intensive. There has been a proliferation of coprocessor boards for performing ML computations. All these coprocessors need to be programmed in their own native Assembly Language. Similarly, although you can program nVidia GPUs in CUDA C, to get the absolute most out of a board, you need to delve into the board’s native Assembly Language. Most of the ML libraries are built over top of older Linear Algebra mathematical libraries written in Fortran. As people take on harder and harder problems and need to get useful work done out of every CPU cycle, many routines are being re-written in Assembly Language.


Modern applications are usually written with a number of modules, each module written in the best programming language for the module’s function. Perhaps C for a back end process, JavaScript for a web page and then Assembly Language for important performance critical routines. I don’t think anyone is taking on large applications in 100% Assembly Language, but enough Assembly Language is making its way into applications to move it up the Tiobe index.

Assembly Language is a great way to learn about how computers work and you might want to take a look at one of my books on the subject.

Written by smist08

November 13, 2021 at 4:47 pm

Posted in assembly language

RP2040 Assembly Language Programming

My third book on ARM Assembly Language programming has recently started shipping from Apress/Springer, just in time for Christmas. This one is “RP2040 Assembly Language Programming” and goes into detail on how to program Raspberry’s RP2040 SoC. This chip is used in the Raspberry Pi Pico along with boards from several other manufacturers such as Seeed Studio, Adafruit, Arduino and Pimoroni.

Flavours of ARM Assembly Language

ARM has ambitions to provide CPUs from the cheapest microcontrollers costing less than a dollar all the way up to supercomputers costing millions of dollars. Along the road to this, there are now three distinct flavours of ARM Assembly Language:

  1. A Series 32-bit
  2. M Series 32-bit
  3. 64-bit

Let’s look at each of these in turn.

A Series 32-bit

For A Series, each instruction is 32-bits in length and as the processors have evolved they added features to support virtual memory, advanced security and other features to support advanced operating systems like Linux, iOS and Android. This is the Assembly Language used in 32-bit phones, tablets and the Raspberry Pi OS. This is covered in my book “Raspberry Pi Assembly Language Programming”.

M Series 32-bit

The full A series instruction set didn’t work well in microcontroller environments. Using 32 bits for each instruction was considered wasteful, and supporting all the features for advanced operating systems made the CPUs too expensive. To solve the memory problem, ARM introduced a mode to A series 32-bit where each instruction was 16 bits (the Thumb instruction set); this saved memory, but the processors were still too expensive. When ARM introduced their M series, or microcontroller processors, they made this 16-bit instruction format the native format and removed most of the advanced operating system features. The RP2040 SoC used in the Raspberry Pi Pico is one of these M Series CPUs, using dual-core ARM Cortex-M0+ CPUs. This is the subject of my current book “RP2040 Assembly Language Programming”.


64-bit

Like Intel and AMD, ARM made the transition from 32-bit to 64-bit processors. As part of this they cleaned up the instruction set, added registers and created a third variant of ARM Assembly Language. iOS and Android are now fully 64-bit and you can run 64-bit versions of Linux on newer Raspberry Pis. The ARM 64-bit instruction set is the topic of my book: “Programming with 64-Bit ARM Assembly Language”.

ARM 64-bit CPUs can run the 32-bit instruction set, and then the M series instruction set is a subset of the A series 32-bit instruction set. Each one is a full featured rich instruction set and deserves a book of its own. If you want to learn all three, I recommend buying all three of my books.

More Than ARM CPUs

The RP2040 is a System on a Chip (SoC). It includes the two M-series ARM CPU cores, but it also includes many built-in hardware interfaces, memory and other components. RP2040 boards don’t need much beyond the RP2040 chip besides a method to interface other components.

“RP2040 Assembly Language Programming” includes coverage of how to use the various hardware registers to control the built-in hardware controllers, as well as the innovative Programmable I/O (PIO) hardware coprocessors. These PIO coprocessors have their own Assembly Language and are capable of some very sophisticated communications protocols, even VGA.

Where to Buy

“RP2040 Assembly Language Programming” is available from most booksellers including:

Currently if you search for “RP2040” in books on any of these sites, my book comes up first.


The Raspberry Pi Pico and the RP2040 chip aren’t the first ARM M-series based microcontrollers, but with their release, suddenly the popularity and acceptance of ARM processors in the microcontroller space has exploded. The instruction set for ARM’s M-series processors is simple, clean and a great example of a RISC instruction set. Whether you are into more advanced microcontroller applications or learning Assembly Language for the first time, this is a great place to start.

Written by smist08

November 5, 2021 at 10:42 am

I/O Co-processing on the Raspberry Pi Pico

Last time we looked at how to access the RP2040’s GPIO registers directly from the CPU in Assembly Language. This is a common technique to access and control hardware wired up to a microcontroller’s GPIO pins; however, the RP2040 contains a number of programmable I/O (PIO) coprocessors that can be used to offload this work from the main ARM CPUs. In this article we’ll give a quick overview of the PIO coprocessors and present an example that moves the LED blinking logic from the CPU over to the coprocessors, freeing the CPU to perform other work. There is a PIO blink program in the SDK samples, which blinks three LEDs at different frequencies. We’ll take that program and modify it to blink the LEDs in turn so that it works the same as the examples we’ve been working with.

PIO Overview

There are eight PIO coprocessors divided into two banks of four. Each bank has a single 32-word instruction memory that contains the program(s) that run on the coprocessors. 32 instructions aren’t very many, but you can do quite a bit with these. The SDK contains samples that implement quite a few communication protocols as well as showing how to do video output.

Each PIO has an input and output FIFO buffer for exchanging data with the main CPUs.

The PIO coprocessors execute their own Assembly Language which the Raspberry folks call a state machine, though they also say they think it is Turing-complete. Below is a diagram showing one of the banks of four. This block is then duplicated twice in the RP2040 package.

Each processor has an X and Y 32-bit general purpose register, input and output shift registers for transferring data to and from the FIFOs, a clock divider register to help control timing, a program counter and then the register to hold the executing instruction as shown in the following diagram.

Each instruction can contain a few bits that specify a delay value, so for many protocols you can control the timing just by adding a timing delay to each instruction. Combine this with the clock divider register to slow down processing and you have a lot of control of timing without using extra instructions.
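
To get a feel for the range of timings this gives, here is a small Python sketch of the arithmetic. The constants are assumptions that aren’t stated in this article: a 125 MHz system clock, a 5-bit delay field (so up to 31 extra cycles when no bits are given over to side-set), and a largest integer clock divider of 65536. The function name is made up for the sketch.

```python
SYS_CLK_HZ = 125_000_000   # assumed system clock (the Pico's usual default)
MAX_DELAY = 31             # assumed: 5-bit delay field, no side-set bits in use
MAX_DIVIDER = 65536        # assumed: largest integer clock divider

def instruction_time_s(divider, delay, sys_clk_hz=SYS_CLK_HZ):
    """Seconds one instruction occupies: one cycle to execute plus `delay` idle cycles."""
    return divider * (1 + delay) / sys_clk_hz

fastest = instruction_time_s(1, 0)
slowest = instruction_time_s(MAX_DIVIDER, MAX_DELAY)
print(f"fastest: {fastest * 1e9:.0f} ns, slowest: {slowest * 1e3:.2f} ms")
```

Under these assumptions a single instruction can be stretched from a few nanoseconds to somewhere in the region of 17 ms, which covers a wide range of protocol timings but not delays that a person can see.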

Sample LED Blinking Program

You write the Assembly Language PIO part of the program into a .pio file which is then compiled by the PIO Assembler into a .h file to include into your program. You can also include C helper functions here and the Pico SDK recommends including an initialization function. The various RP2040 SDK functions to support this are pretty standard and you tend to copy/paste these from the SDK samples.

We are blinking the LEDs using a 200ms delay time, which by computer speeds is very slow, but for humans is quite quick. This means we can’t use the clock divider functionality and instruction delays as they don’t go this slow. Instead we have to rely on an old-fashioned delay loop. We calculate the delay value in the main function from the frequency of the processor and pass it to the coprocessor, which counts it down in a loop. After turning the LED off, we do this delay loop twice because we need to wait for two other LEDs to flash before it’s our turn again. The pull instruction pulls the delay from the FIFO that the CPU writes to (the TX FIFO), then out transfers it to the y register. We move y to x, turn on the pin and then do the delay loop, decrementing x until it’s zero. Then we turn the pin off and do the delay loop twice.

.program blink
    pull block
    out y, 32
.wrap_target
    mov x, y
    set pins, 1   ; Turn LED on
lp1:
    jmp x-- lp1   ; Delay for (x + 1) cycles, x is a 32 bit number
    mov x, y
    set pins, 0   ; Turn LED off
lp2:
    jmp x-- lp2   ; Delay for the same number of cycles again
    mov x, y
lp3:   ; Do it twice since need to wait for 2 other leds to blink
    jmp x-- lp3   ; Delay for the same number of cycles again
.wrap             ; Blink forever!

% c-sdk {
// this is a raw helper function for use by the user which sets up the GPIO output, and configures the SM to output on a particular pin

void blink_program_init(PIO pio, uint sm, uint offset, uint pin) {
   pio_gpio_init(pio, pin);
   pio_sm_set_consecutive_pindirs(pio, sm, pin, 1, true);
   pio_sm_config c = blink_program_get_default_config(offset);
   sm_config_set_set_pins(&c, pin, 1);
   pio_sm_init(pio, sm, offset, &c);
}
%}
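
The delay value that the PIO program pulls from the FIFO can be checked with a little arithmetic. The Python sketch below mirrors the calculation the main program does (system clock divided by the frequency argument) and converts the resulting loop count back into time. It assumes a 125 MHz system clock and a clock divider of 1; the function names are illustrative.

```python
SYS_CLK_HZ = 125_000_000   # assumed system clock; the C code reads it with clock_get_hz(clk_sys)
BLINK_FREQ = 5             # frequency argument passed to blink_pin_forever()

def delay_count(sys_clk_hz, freq):
    """Value pushed into the TX FIFO, mirroring clock_get_hz(clk_sys) / freq."""
    return sys_clk_hz // freq

def loop_seconds(count, sys_clk_hz):
    """A `jmp x--` loop started with x = count runs for count + 1 cycles."""
    return (count + 1) / sys_clk_hz

count = delay_count(SYS_CLK_HZ, BLINK_FREQ)
on_time = loop_seconds(count, SYS_CLK_HZ)
print(count, round(on_time * 1000), "ms on,", round(2 * on_time * 1000), "ms off")
```

With these numbers the count comfortably fits in the 32-bit x register, and each LED is on for about 200ms and off for about 400ms, ignoring the handful of cycles spent on the mov and set instructions.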

Now the main C program. In this one we configure the pins to use. Note that we will use a coprocessor for each pin, so three coprocessors but each one executing the same program. We start a pin flashing, sleep 200ms and then start the next one. This way we achieve the same effect as we did in our previous programs.

After we get the LED flashing running on the coprocessors, we have an infinite loop that just prints a counter out to the serial port. This is to demonstrate that the CPU can go on and do anything it wants and the LEDs will keep flashing independently without any of the CPU’s attention.

#include <stdio.h>

#include "pico/stdlib.h"
#include "hardware/pio.h"
#include "hardware/clocks.h"
#include "blink.pio.h"

const uint LED_PIN1 = 18;
const uint LED_PIN2 = 19;
const uint LED_PIN3 = 20;
#define SLEEP_TIME 200

void blink_pin_forever(PIO pio, uint sm, uint offset, uint pin, uint freq);

int main() {
    int i = 0;

    stdio_init_all();    // set up stdio so printf reaches the serial port

    PIO pio = pio0;
    uint offset = pio_add_program(pio, &blink_program);
    printf("Loaded program at %d\n", offset);

    // Start each state machine 200ms after the previous one
    blink_pin_forever(pio, 0, offset, LED_PIN1, 5);
    sleep_ms(SLEEP_TIME);
    blink_pin_forever(pio, 1, offset, LED_PIN2, 5);
    sleep_ms(SLEEP_TIME);
    blink_pin_forever(pio, 2, offset, LED_PIN3, 5);

    // The CPU is now free to do other work
    while (1) {
        i++;
        printf("Busy counting away i = %d\n", i);
    }
}

void blink_pin_forever(PIO pio, uint sm, uint offset, uint pin, uint freq) {
    blink_program_init(pio, sm, offset, pin);
    pio_sm_set_enabled(pio, sm, true);
    printf("Blinking pin %d at %d Hz\n", pin, freq);
    // Send the delay count to the state machine through its TX FIFO
    pio->txf[sm] = clock_get_hz(clk_sys) / freq;
}
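
To see why starting the three coprocessors 200ms apart makes the LEDs light in turn, here is an idealized timeline model in Python. It assumes each state machine is on for exactly 200ms and off for exactly 400ms, and it ignores instruction overhead and any jitter in sleep_ms; the function name is made up for the sketch.

```python
DELAY_MS = 200             # SLEEP_TIME in the main program
PERIOD_MS = 3 * DELAY_MS   # one on phase followed by two off phases

def led_is_on(sm, t_ms):
    """True if state machine `sm` (0, 1 or 2) has its LED lit at time t_ms."""
    start = sm * DELAY_MS              # each one starts 200ms after the previous one
    if t_ms < start:
        return False                   # this state machine hasn't been started yet
    return (t_ms - start) % PERIOD_MS < DELAY_MS

for t in range(0, 1200, 200):
    print(t, [led_is_on(sm, t) for sm in range(3)])
```

In this model exactly one LED is lit at any moment, which is the same pattern the CPU-driven versions produce.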


This was a quick introduction to the RP2040’s PIO coprocessors. The goal of any microcontroller is to control other interfaced hardware, whether measurement sensors or communications devices (like Wi-Fi). The PIO coprocessors give the RP2040 programmer a powerful weapon to develop sophisticated integration projects without requiring a lot of specialized hardware to make things easier. It might be nice to have a larger instruction memory, but then in a $4 USD device, you can’t really complain.

For people playing with the Raspberry Pi Pico or another RP2040 based board, you can program in 32-bit ARM Assembly Language and might want to consider my book “Raspberry Pi Assembly Language Programming”.

Written by smist08

April 30, 2021 at 10:02 am

Bit-Banging the Raspberry Pi Pico’s GPIO Registers

Last week, I introduced my first Assembly Language program for the Raspberry Pi Pico. This was a version of my flashing LED program that I implemented in a number of programming languages for the regular Raspberry Pi. In the original article, I required three routines written in C to make things work. Yesterday, I showed how to remove one of these C routines, namely to have the main routine written in Assembly Language. Today, I’ll show how to remove the two remaining C routines, which were wrappers for two SDK routines which are implemented as inline C functions and as a consequence only usable from C code.

In this article, we’ll look at the structure for the GPIO registers on the RP2040 and how to access these. The procedure we are using is called bit-banging because we are using one of the two M0+ ARM CPU cores to loop banging the bits in the GPIO registers to turn them on and off. This isn’t the recommended way to do this on the RP2040. The RP2040 implements eight programmable I/O (PIO) co-processors that you can program to offload this sort of thing from the CPU. We’ll look at how to do that in a future article, but as a first step we are going to explore bit-banging mostly to understand the RP2040 hardware better.

The RP2040 GPIO Hardware Registers

The RP2040 has 30 GPIO pins, of which the Pico brings 26 out to its header. The Pico has 40 pins, but the others are ground, power and a couple of specialized pins (see the diagram below).

This means that we can assign each one to a bit in a 32-bit hardware register, which is mapped to 32 bits of memory in the RP2040’s address space. The GPIO functions are controlled by writing a 1 bit to the correct position in the GPIO register. There is one register to turn on a GPIO pin and a different register to turn it off, so you don’t need to read the register, change one bit and then write it back. It’s quite easy to program these, since you just place a 1 in a CPU register, shift it left by the pin number and then write it to the correct memory location. These registers start at memory location 0xd0000000 and are defined in sio.h. Note there are two sio.h files: one in hardware_regs, which contains the offsets and is better for Assembly Language usage, and one in hardware_structs, which contains a C structure to map over the registers. Following are the GPIO registers. There are a few other non-GPIO related registers at this location and a few unused gaps, in case you are wondering why the addresses aren’t contiguous.
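
The three registers that the program in this article writes to, along with the shift that builds the value to write, can be sketched in Python. This is only a model of the address and mask arithmetic and doesn’t touch any hardware. The constant and function names are illustrative, the offsets are taken from the addresses used in the program further down, and pin 18 is just an example.

```python
SIO_BASE = 0xD0000000          # start of the registers, as given in the text

# Offsets implied by the addresses used in the program in this article
GPIO_OUT_SET_OFFSET = 0x14     # write 1 bits here to turn pins on
GPIO_OUT_CLR_OFFSET = 0x18     # write 1 bits here to turn pins off
GPIO_OE_SET_OFFSET = 0x24      # write 1 bits here to make pins outputs

def pin_mask(pin):
    """A 32-bit word with only the bit for `pin` set (1 shifted left by the pin number)."""
    if not 0 <= pin < 32:
        raise ValueError("pin must fit in a 32-bit register")
    return 1 << pin

def register_address(offset):
    """Absolute address of a register, given its offset from the base."""
    return SIO_BASE + offset

print(hex(pin_mask(18)), hex(register_address(GPIO_OUT_SET_OFFSET)))
```

Because setting and clearing use separate registers, the value written never needs to include the current state of the other pins.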


Notice that there are a number of _hi_ registers, perhaps indicating that Raspberry plans to come out with a future version with more than 32 GPIO pins.

In the SDK and my code below we just write one bit at a time, I don’t know if the RP2040’s circuitry can handle writing more bits at once, for instance can we set all three pins to output in one write instruction? Remember hardware registers tend to have minimal functionality to simplify the electronics circuitry behind them so often you can’t get too complicated in what you expect of them.

Bit-Banging the Registers in Assembly

Below is the new updated program that doesn’t require the C file. In our routines to control the GPIO pins, we pass the pin number as parameter 1, which means it is in R0. We place 1 in R3 and then shift it left by the value in R0 (the pin number). This gives the value we need to write. We then load the address of the register we need, which we specified in the .data section, and write the value. Note that we need two LDR instructions: the first loads the address of the memory location holding the register address, and the second loads the register address itself.

@ Assembler program to flash three LEDs connected to the
@ Raspberry Pi GPIO port using the Pico SDK.

@ GPIO pins the LEDs are wired to (assumed here: 18, 19 and 20)
.EQU LED_PIN1, 18
.EQU LED_PIN2, 19
.EQU LED_PIN3, 20
.EQU sleep_time, 200

.global main             @ Provide program starting address to linker

.align  4 @ necessary alignment

.thumb_func
main:

@ Init each of the three pins and set them to output

MOVS R0, #LED_PIN1
BL gpio_init
MOVS R0, #LED_PIN1
BL gpiosetout
MOVS R0, #LED_PIN2
BL gpio_init
MOVS R0, #LED_PIN2
BL gpiosetout
MOVS R0, #LED_PIN3
BL gpio_init
MOVS R0, #LED_PIN3
BL gpiosetout

loop:

@ Turn each pin on, sleep and then turn the pin off

MOVS R0, #LED_PIN1
BL gpio_on
LDR R0, =sleep_time
BL sleep_ms
MOVS R0, #LED_PIN1
BL gpio_off
MOVS R0, #LED_PIN2
BL gpio_on
LDR R0, =sleep_time
BL sleep_ms
MOVS R0, #LED_PIN2
BL gpio_off
MOVS R0, #LED_PIN3
BL gpio_on
LDR R0, =sleep_time
BL sleep_ms
MOVS R0, #LED_PIN3
BL gpio_off

B       loop @ loop forever

gpiosetout:
@ write a 1 bit to the pin position in the output set register
movs r3, #1
lsl r3, r0 @ shift over to pin position
ldr r2, =gpiosetdiroutreg @ address we want
ldr r2, [r2]
str r3, [r2]
bx lr

gpio_on:
movs r3, #1
lsl r3, r0 @ shift over to pin position
ldr r2, =gpiosetonreg @ address we want
ldr r2, [r2]
str r3, [r2]
bx lr

gpio_off:
movs r3, #1
lsl r3, r0 @ shift over to pin position
ldr r2, =gpiosetoffreg @ address we want
ldr r2, [r2]
str r3, [r2]
bx lr

.data
      .align  4 @ necessary alignment
gpiosetdiroutreg: .word   0xd0000024 @ mem address of gpio registers
gpiosetonreg: .word   0xd0000014 @ mem address of gpio registers
gpiosetoffreg: .word   0xd0000018 @ mem address of gpio registers

Having separate functions for gpio_on and gpio_off simplifies our code since we don’t need any conditional logic to load the correct register address.

We loaded the actual address from a shared location. We could have loaded the base address of 0xd0000000 and then stored things via an offset, but I did this to be a little clearer. If you look at the disassembly of the SDK routine, it does something rather clever to get the base address. It does:

movs r2, #208 @ 0xd0
lsl r2, r2, #24 @ becomes 0xd0000000

And then uses something like:

str r3, [r2, #40] @ 0x28

To store the value using an index which is the offset to the correct register. I thought this was rather clever on the C compiler’s part and represents the optimizations that the ARM engineers have been adding to the GCC generation of ARM code. This technique takes the same time to execute, but doesn’t require saving any values in memory, saving a few bytes which may be crucial in a larger program.
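
The arithmetic behind that trick is easy to check in Python. This sketch just follows the two instructions step by step, masking to 32 bits to model the register width; the function name is made up.

```python
def build_base():
    r2 = 208                        # movs r2, #208   (0xd0 fits in an 8-bit immediate)
    r2 = (r2 << 24) & 0xFFFFFFFF    # lsl r2, r2, #24
    return r2

base = build_base()
target = base + 40                  # str r3, [r2, #40] stores to base + 0x28
print(hex(base), hex(target))
```

The full 32-bit address never has to be stored as data, because it is built from a small immediate and a shift.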


Writing to the hardware registers directly on the Raspberry Pi Pico is a bit simpler than the Broadcom implementation in the full Raspberry Pi. With these routines we wrote our entire program in Assembly Language. There is still C code in the SDK which will be linked into our program and we are still calling both gpio_init and sleep_ms in the SDK. We could look at the source code in the SDK and reimplement these in Assembly Language, but I don’t think there is any need. Between the RP2040 documentation and the SDK’s source code it is possible to figure out a lot about how the Raspberry Pi Pico works.

For people playing with the Raspberry Pi Pico or another RP2040 based board, you can program in 32-bit ARM Assembly Language and might want to consider my book “Raspberry Pi Assembly Language Programming”.

Written by smist08

April 24, 2021 at 11:50 am