Stephen Smith's Blog

Musings on Machine Learning…

Posts Tagged ‘arm assembler

Flashing LEDs in Assembler

with 2 comments


Previously I wrote an article on an introduction to Assembler programming on the Raspberry Pi. This was quite a long article without much of a coding example, so I wanted to produce an Assembler  language version of the little program I did in Python, Scratch, Fortran and C to flash three LEDs attached to the Raspberry Pi’s GPIO port on a breadboard. So in this article I’ll introduce that program.

This program is fairly minimal. It doesn’t do any error checking, but it does work. I don’t use any external libraries, and only make calls to Linux (Raspbian) via software interrupts (SVC 0). I implemented a minimal GPIO library using Assembler Macros along with the necessary file I/O and sleep Linux system calls. There probably aren’t enough comments in the code, but at this point it is fairly small and the macros help to modularize and explain things.

Main Program

Here is the main program, that probably doesn’t look structurally that different than the C code, since the macro names roughly match up to those in the GPIO library the C function called. The main bit of Assembler code here is to do the loop through flashing the lights 10 times. This is pretty straight forward, just load 10 into register r6 and then decrement it until it hits zero.


@ Assembler program to flash three LEDs connected to the
@ Raspberry Pi GPIO port.
@ r6 - loop variable to flash lights 10 times

.include "gpiomacros.s"

.global _start             @ Provide program starting address to linker

_start: GPIOExport  pin17
        GPIOExport  pin27
        GPIOExport  pin22


        GPIODirectionOut pin17
        GPIODirectionOut pin27
        GPIODirectionOut pin22

        @ setup a loop counter for 10 iterations
        mov         r6, #10

loop:   GPIOWrite   pin17, high
        GPIOWrite   pin17, low
        GPIOWrite   pin27, high
        GPIOWrite   pin27, low
        GPIOWrite   pin22, high
        GPIOWrite   pin22, low

        @decrement loop counter and see if we loop
        subs    r6, #1      @ Subtract 1 from loop register setting status register
        bne     loop        @ If we haven't counted down to 0 then loop

_end:   mov     R0, #0      @ Use 0 return code
        lsl     R0, #2      @ Shift R0 left by 2 bits (ie multiply by 4)
        mov     R7, #1      @ Service command code 1 terminates this program
        svc     0           @ Linus command to terminate program

pin17:      .asciz  "17"
pin27:      .asciz  "27"
pin22:      .asciz  "22"
low:        .asciz  "0"
high:       .asciz  "1"


GPIO and Linux Macros

Now the real guts of the program are in the Assembler macros. Again it isn’t too bad. We use the Linux service calls to open, write, flush and close the GPIO device files in /sys/class/gpio. Similarly nanosleep is also a Linux service call for a high resolution timer. Note that ARM doesn’t have memory to memory or operations on memory type instructions, so to do anything we need to load it into a register, process it and write it back out. Hence to copy the pin number to the file name we load the two pin characters and store them to the file name memory area. Hard coding the offset for this as 20 isn’t great, we could have used a .equ directive, or better yet implemented a string scan, but for quick and dirty this is fine. Similarly we only implemented the parameters we really needed and ignored anything else. We’ll leave it as an exercise to the reader to flush these out more. Note that when we copy the first byte of the pin number, we include a #1 on the end of the ldrb and strb instructions, this will do a post increment by one on the index register that holds the memory location. This means the ARM is really very efficient in accessing arrays (even without using Neon) we combine the array read/write with the index increment all in one instruction.

If you are wondering how you find the Linux service calls, you look in /usr/include/arm-linux-gnueabihf/asm/unistd.h. This C include file has all the function numbers for the Linux system calls. Then you Google the call for its parameters and they go in order in registers r0, r1, …, r6, with the return code coming back in r0.


@ Various macros to access the GPIO pins
@ on the Raspberry Pi.

@ R5 is used for the file descriptor

.macro  openFile    fileName
        ldr         r0, =\fileName
        mov         r1, #01     @ O_WRONLY
        mov r7,     #5          @ 5 is system call number for open
        svc         0

.macro  writeFile   buffer, length
        mov         r0, r5      @ file descriptor
        ldr         r1, =\buffer
        mov         r2, #\length
        mov         r7, #4 @ 4 is write
        svc         0

.macro  flushClose
@fsync syscall
        mov         r0, r5
        mov         r7, #118    @ 118 is flush
        svc         0

@close syscall
        mov         r0, r5
        mov         r7, #6      @ 6 is close
        svc         0

@ Macro nanoSleep to sleep .1 second
@ Calls Linux nanosleep entry point which is function 162.
@ Pass a reference to a timespec in both r0 and r1
@ First is input time to sleep in seconds and nanoseconds.
@ Second is time left to sleep if interrupted (which we ignore)

.macro  nanoSleep
        ldr         r0, =timespecsec
        ldr         r1, =timespecsec
        mov         r7, #162    @ 162 is nanosleep
        svc         0

.macro  GPIOExport  pin
        openFile    gpioexp
        mov         r5, r0      @ save the file descriptor
        writeFile   \pin, 2

.macro  GPIODirectionOut   pin
        @ copy pin into filename pattern
        ldr         r1, =\pin
        ldr         r2, =gpiopinfile
        add         r2, #20
        ldrb        r3, [r1], #1 @ load pin and post increment
        strb        r3, [r2], #1 @ store to filename and post increment
        ldrb        r3, [r1]
        strb        r3, [r2]
        openFile    gpiopinfile
        writeFile   outstr, 3

.macro  GPIOWrite   pin, value
        @ copy pin into filename pattern
        ldr         r1, =\pin
        ldr         r2, =gpiovaluefile
        add         r2, #20
        ldrb        r3, [r1], #1    @ load pin and post increment
        strb        r3, [r2], #1    @ store to filename and post increment
        ldrb        r3, [r1]
        strb        r3, [r2]
        openFile    gpiovaluefile
        writeFile   \value, 1

timespecsec:   .word   0
timespecnano:  .word   100000000
gpioexp:    .asciz  "/sys/class/gpio/export"
gpiopinfile: .asciz "/sys/class/gpio/gpioxx/direction"
gpiovaluefile: .asciz "/sys/class/gpio/gpioxx/value"
outstr:     .asciz  "out"
            .align  2          @ save users of this file having to do this.


Here is a simple makefile for the project if you name the files as indicated. Again note that WordPress and Google Docs may mess up white space and quote characters so these might need to be fixed if you copy/paste.

model: model.o
    ld -o model model.o

model.o: model.s gpiomacros.s
    as -ggdb3 -o model.o model.s

    rm model model.o


IDE or Not to IDE

People often do Assembler language development in an IDE like Code::Blocks. Code::Blocks doesn’t support Assembler language projects, but you can add Assembler language files to C projects. This is a pretty common way to do development since you want to do more programming in a higher level language like C. This way you also get full use of the C runtime. I didn’t do this, I just used a text editor, make and gdb (command line). This way the above program has no extra overhead the executable is quite small since there is no C runtime or any other library linked to it. The debug version of the executable is only 2904 bytes long and non debug is 2376 bytes. Of course if I really wanted to reduce executable size, I could have used function calls rather than Assembler macros as the macros duplicate the code everywhere they are used.


Assembler language programming is kind of fun. But I don’t think I would want to do too large a project this way. Hats off to the early personal computer programmers who wrote spreadsheet programs, word processors and games entirely in Assembler. Certainly writing a few Assembler programs gives you a really good understanding of how the underlying computer hardware works and what sort of things your computer can do really efficiently. You could even consider adding compiler optimizations for your processor to GCC, after all compiler code generation has a huge effect on your computer’s performance.

Written by smist08

January 7, 2018 at 7:08 pm

Raspberry Pi Assembly Programming

with 5 comments


Most University Computer Science programs now do most of their instruction in quite high level languages like Java which don’t require you to need to know much about the underlying computer architecture. I think this tends to create quite large gaps in understanding which can lead to quite bad program design mistakes. Learning to program C puts you closer to the metal since you need to know more about memory and low level storage, but nothing really teaches you how the underlying computer works than doing some Assembler Language Programming, The Raspbian operating system comes with the GNU Assembler pre-installed, so you have everything you need to try Assembler programming right out of the gate. Learning a bit of Assembler teaches you how the Raspberry Pi’s ARM processor works, how a modern RISC architecture processes instructions and how work is divided by the ARM CPU and the various co-processors which are included on the same chip.

A Bit of History

The ARM processor was originally developed to be a low cost processor for the British educational computer, the Acorn. The developers of the ARM felt they had something and negotiated a split into a separate company selling CPU designs to hardware manufacturers. Their first big sale was to Apple to provide a CPU for Apple’s first PDA the Newton. Their first big success was the inclusion of their chip design in Apple’s iPods. Along the way many chip makers like TI which had given up competing on CPUs built single chip computers around the ARM. These ended up being included in about every cell phone including those from Nokia and Apple. Nowadays pretty up every Android phone is also built around ARM designs.

ARM Assembler Instructions

There have been a lot of ARM designs from early simple 16 bit processors up through 32 bit processors to the current 64bit designs. In this article I’m just going to consider the ARM processor in the Raspberry Pi, this processor is a 64 bit processor, but Raspbian is still a 32 bit operating system, so I’ll just be talking about the 32 bit processing used here (and I’ll ignore the 16 bit “thumb” instructions for now).

ARM is a RISC processor which means the idea is that it executes very simple instructions very quickly. The idea is to keep the main processor simple to reduce complexity and power usage. Each instruction is 32 bits long, so the processor doesn’t need to think about how much to increment the program counter for each instruction. Interestingly nearly all the instructions are in the same format, where you can control whether it sets the status bits, can test the status bits on whether to do anything and can have four register parameters (or 2 registers and an immediate constant). One of the registers can also have a shift operation applied. So how is all this packed into on 32 bit instruction? There are 16 registers in the ARM CPU (these include the program counter, link return register and the stack pointer. There is also the status register that can’t be used as a general purpose register. This means it takes 4 bits to specify a register, so specifying 4 registers takes 16 bits out of the instruction.

Below are the formats for the main types of ARM instructions.

Now we break out the data processing instructions in more detail since these comprise quite a large set of instructions and are the ones we use the most.

Although RISC means a small set of simple instructions, we see that by cleverly using every bit in those 32 bits for an instruction that we can pack quite a bit of information.

Since the ARM is a 32-Bit processor meaning among other things that it can address a 32-Bit address space this does lead to some interesting questions:

  1. The processor is 32-Bits but immediate constants are 16-Bits. How do we load arbitrary 32-Bit quantities? Actually the immediate instruction is 12-Bits of data and 4 Bits of a shift amount. So we can load 12 Bits and then shift it into position. If this works great. If not we have to be tricky by loading and adding two quantities.
  2. Since the total instruction is 32-Bits how do we load a 32-Bit memory address? The answer is that we can’t. However we can use the same tricks indicated in number 1. Plus you can use the program counter. You can add an immediate constant to the program counter. The assembler often does this when assembling load instructions. Note that since the ARM is pipelined the program counter tends to be a couple of instructions ahead of the current one, so this has to properly be taken into account.
  3. Why is there the ability to conditionally execute all the data processing instructions? Since the ARM processor is pipelines, doing a branch instruction is quite expensive since the pipeline needs to be discarded and then reloaded at the new execution point. Having individual instructions conditionally execute can save quite a few branch instructions leading to faster execution. (As long as your compiler is good at generating RISC type code or you are hand coding in Assembler).
  4. The ARM processor has a multiply instruction, but no divide instruction? This seems strange and unbalanced. Multiply (and multiply and add) are recent additions to the ARM processor. Divide is still considered too complicated and slow, plus is it really used that much? You can do divisions on either the floating point coprocessor or the Neon SIMD coprocessor.

A Very Small Example

Assembler listings tend to be quite long, so as a minimal set let’s start with a program that just exits when run returning 17 to the command line (you can see this if you type echo $@ after running it). Basically we have an assembler directive to define the entry point as a global. We move the immediate value 17 to register R0. Shift it left by 2 bits (Linux expects this). Move 1 to register R7 which is the Linux command code to exit the program and then call service zero which is the Linux operating system. All calls to Linux use this pattern. Set some parameters in registers and then call “svc 0” to do the call. You can open files, read user input, write to the console, all sorts of things this way.


.global _start              @ Provide program starting address to linker
_start: mov     R0, #17     @ Use 17 for test example
        lsl     R0, #2      @ Shift R0 left by 2 bits (ie multiply by 4)
        mov     R7, #1      @ Service command code 1 terminates this program
        svc     0           @ Linux command to terminate program


Here is a simple makefile to compile and link the preceding program. If you save the above as model.s and then the makefile as makefile then you can compile the program by typing “make” and run it by typing “./model”. as is the GNU-Assembler and ld is the linker/loader.


model: model.o
     ld -o model model.o

model.o: model.s
     as -g -o model.o model.s

     rm model model.o


Note that make is very particular about whitespace. It must be a true “tab” character before the commands to execute. If Google Docs or WordPress changes this, then you will need to change it back. Unfortunately word processors have a bad habit of introducing syntax errors to program listings by changing whitespace, quotes and underlines to typographic characters that compilers (and assemblers) don’t recognize.


Although these days you can do most things with C or a higher level language, Assembler still has its place in certain applications that require performance or detailed hardware interaction. For instance perhaps tuning the numpy Python library to use the Neon coprocessor for faster operation of vector type operations. I do still think that every programmer should spend some time playing with Assembler language so they understand better how the underlying processor and architecture they use day to day really works. The Raspberry Pi offers and excellent environment to do this with the good GNU Macro Assembler, the modern ARM RISC architecture and the various on-chip co-processors.

Just the Assembler introductory material has gotten fairly long, so we won’t get to an assembler version of our flashing LED program. But perhaps in a future article.


Written by smist08

January 4, 2018 at 10:45 pm