Stephen Smith's Blog

Musings on Machine Learning…

Posts Tagged ‘assembly language’

Assembly Language is Number 8

with 2 comments

Introduction

Tiobe regularly produces a list of the most popular programming languages and their recently published list has Assembly Language at number 8, moving up from number 16 last year. The top eight languages are:

  1. Python
  2. C
  3. Java
  4. C++
  5. C#
  6. Visual Basic
  7. JavaScript
  8. Assembly Language

The top spots are all well established and heavily used; C shows remarkable resilience, and Java remains popular in spite of Oracle. In the early days of the PC, all major applications and games were written in Assembly Language, but with the availability of high quality C compilers this waned, and application development switched to C and then to other high level languages. Let’s look at why Assembly Language is having a bit of a renaissance.

Assembly Language is Accessible

In the early days, you needed to buy a macro assembler from the chip manufacturer or some other vendor, such as Microsoft with its MASM. Now, all the chip vendors add their Assembly Language support directly to the open source GNU Assembler and/or the LLVM Assembler. Both of these are excellent macro assemblers; they run on any hardware, support cross compiling and, best of all, are completely free.
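
As a small illustration of how accessible the tools have become, here is a sketch of a minimal 64-bit ARM program for Linux, with the commands to cross assemble and link it using the GNU tools shown in the comments (this assumes the aarch64 GNU cross toolchain is installed; the file name is illustrative):

// tiny.s - exit immediately with return code 0
// Cross assemble and link on any host:
//   aarch64-linux-gnu-as -o tiny.o tiny.s
//   aarch64-linux-gnu-ld -o tiny tiny.o
.global _start
_start:
    mov x0, #0      // return code 0
    mov x8, #93     // Linux exit system call
    svc 0           // call Linux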

In my first job out of university, I did some Assembly Language programming on an Intel 80186 board and to debug it, I needed to use an in-circuit emulator (I2ICE) which was a big expensive piece of hardware that replaced the CPU with a debugging probe. Now, all the CPUs and boards have excellent debug probes and you can debug them using open source tools like GNU’s gdb.

Another big help is all the great books on Assembly Language that are available, such as: “Raspberry Pi Assembly Language Programming”, “Programming with 64-Bit ARM Assembly Language” and “RP2040 Assembly Language Programming”.

Microcontrollers are Everywhere

The Arduino microcontroller has created a giant community of DIY electronics hobbyists, and there is now a huge proliferation of inexpensive microcontrollers. In the Arduino world, you program these in Arduino C, but to get the performance you need, you often have to drop down to Assembly Language. Similarly, the memory on these boards is limited, and Assembly Language is the only way to make use of every single bit available to you. Newer microcontrollers like Raspberry’s RP2040, based on 32-bit ARM M-series CPUs, are much more powerful and have more memory. However, with the extra power, people are attempting more ambitious projects, often involving machine learning or other compute intensive applications. Again, they hit the wall with C or MicroPython programming and have to delve into Assembly Language to solve their problems.

When people program these microcontrollers, they are connecting to all sorts of imaginative hardware devices; they have to create their own libraries to interface with these devices, and often the best way to do this is via Assembly Language.

Competition in the Phone App Market

The App markets for both iOS and Android have matured: as new versions come out, there are fewer changes. The competition between the various Apps in a given category is intense, and one key way for vendors to differentiate themselves from their competition is improved performance. Beyond rewriting code to use more efficient algorithms, programmers are turning to hand-crafting the core routines of their Apps in Assembly Language.

Machine Learning

Machine Learning (ML) or AI is extremely compute intensive. There has been a proliferation of coprocessor boards for performing ML computations, and each of these coprocessors needs to be programmed in its own native Assembly Language. Similarly, although you can program nVidia GPUs in CUDA C, to get the absolute most out of a board you need to delve into the board’s native Assembly Language. Most of the ML libraries are built on top of older Linear Algebra mathematical libraries written in Fortran. As people take on harder and harder problems and need to get useful work out of every CPU cycle, many routines are being rewritten in Assembly Language.

Summary

Modern applications are usually written with a number of modules, each module written in the best programming language for the module’s function. Perhaps C for a back end process, JavaScript for a web page and then Assembly Language for important performance critical routines. I don’t think anyone is taking on large applications in 100% Assembly Language, but enough Assembly Language is making its way into applications to move it up the Tiobe index.

Assembly Language is a great way to learn about how computers work and you might want to take a look at one of my books on the subject.

Written by smist08

November 13, 2021 at 4:47 pm

Posted in assembly language


RP2040 Assembly Language Programming

with 6 comments

Introduction

My third book on ARM Assembly Language programming has recently started shipping from Apress/Springer, just in time for Christmas. This one is “RP2040 Assembly Language Programming” and goes into detail on how to program Raspberry’s RP2040 SoC. This chip is used in the Raspberry Pi Pico along with boards from several other manufacturers such as Seeed Studios, AdaFruit, Arduino and Pimoroni.

Flavours of ARM Assembly Language

ARM has ambitions to provide CPUs from the cheapest microcontrollers costing less than a dollar all the way up to supercomputers costing millions of dollars. Along the road to this, there are now three distinct flavours of ARM Assembly Language:

  1. A Series 32-bit
  2. M Series 32-bit
  3. 64-bit

Let’s look at each of these in turn.

A Series 32-bit

For the A Series, each instruction is 32 bits in length, and as the processors evolved, they added support for virtual memory, advanced security and other features needed by advanced operating systems like Linux, iOS and Android. This is the Assembly Language used in 32-bit phones, tablets and the Raspberry Pi OS, and it is covered in my book “Raspberry Pi Assembly Language Programming”.

M Series 32-bit

The full A Series instruction set didn’t work well in microcontroller environments: using 32 bits for each instruction was considered wasteful, and supporting all the features needed by advanced operating systems made the CPUs too expensive. To solve the memory problem, ARM added a mode to the A Series 32-bit processors where each instruction was 16 bits; this saved memory, but the processors were still too expensive. When ARM introduced their M Series, or microcontroller, processors, they made this 16-bit instruction format the native format and removed most of the advanced operating system features. The RP2040 SoC used in the Raspberry Pi Pico is one of these M Series designs, using dual ARM Cortex M0+ CPU cores. This is the subject of my current book “RP2040 Assembly Language Programming”.

64-bit

Like Intel and AMD, ARM made the transition from 32-bit to 64-bit processors. As part of this they cleaned up the instruction set, added registers and created a third variant of ARM Assembly Language. iOS and Android are now fully 64-bit and you can run 64-bit versions of Linux on newer Raspberry Pis. The ARM 64-bit instruction set is the topic of my book: “Programming with 64-Bit ARM Assembly Language”.

ARM 64-bit CPUs can also run the 32-bit instruction set, and the M Series instruction set is in turn a subset of the A Series 32-bit instruction set. Each one is a full featured, rich instruction set that deserves a book of its own. If you want to learn all three, I recommend buying all three of my books.
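
To make the three flavours concrete, here is a simple register add as it might be written in each (GNU assembler syntax; each fragment belongs in its own source file, which is why the 64-bit comment character differs):

@ A Series 32-bit: 32-bit instructions, almost all of them conditional
ADDNE r0, r1, r2        @ add r1 and r2 only if the Z flag is clear

@ M Series 32-bit "thumb": 16-bit instructions, low registers only
ADDS  r0, r1, r2        @ always executes and always sets the flags

// 64-bit: X registers hold 64-bit values, W names their 32-bit halves
ADD   x0, x1, x2        // 64-bit add; flags untouched unless you use ADDS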

More Than ARM CPUs

The RP2040 is a System on a Chip (SoC): it includes the two M-series ARM CPU cores, but it also includes many built-in hardware interfaces, memory and other components. As a result, RP2040 boards don’t need much beyond the RP2040 chip itself, other than a way to connect the other components.

“RP2040 Assembly Language Programming” includes coverage of how to use the various hardware registers to control the built-in hardware controllers, as well as the innovative Programmable I/O (PIO) hardware coprocessors. These PIO coprocessors have their own Assembly Language and are capable of some very sophisticated communications protocols, even VGA.

Where to Buy

“RP2040 Assembly Language Programming” is available from most major booksellers.

Currently, if you search for “RP2040” in the books section of any of their sites, my book comes up first.

Summary

The Raspberry Pi Pico and the RP2040 chip aren’t the first ARM M-series based microcontrollers, but with their release, suddenly the popularity and acceptance of ARM processors in the microcontroller space has exploded. The instruction set for ARM’s M-series processors is simple, clean and a great example of a RISC instruction set. Whether you are into more advanced microcontroller applications or learning Assembly Language for the first time, this is a great place to start.

Written by smist08

November 5, 2021 at 10:42 am

ARM’s True RISC Processors

leave a comment »

Introduction

I recently completed my book, “RP2040 Assembly Language Programming” and was thinking about the differences in the three main instruction sets available on ARM Processors:

  1. The “thumb” instructions used in ARM’s 32-bit microcontrollers, covered in “RP2040 Assembly Language Programming”.
  2. The full 32-bit A-series instruction set used by the Raspberry Pi OS, covered in my book “Raspberry Pi Assembly Language Programming”.
  3. The 64-bit instruction set used on all smartphones and tablets, covered in my book “Programming with 64-Bit ARM Assembly Language”.

ARM is advertised as a Reduced Instruction Set Computer (RISC), as opposed to Intel x86 chips, which are Complex Instruction Set Computers (CISC). However, as ARM introduces v9 of their full chip architecture, the instruction set has gotten pretty complex. Writing the RP2040 book and its included source code was a pleasure, because the microcontroller version of the instruction set really is reduced and much simpler than the other two full versions. In this article, we’ll look at a bit of the history of the various ARM instruction sets and why ARM is still considered a RISC processor.

A Bit of History

Originally, the ARM processor was developed by Acorn as a replacement for the 6502 used in the BBC Microcomputer. The early versions were specialty chips, and it wasn’t until Apple selected ARM for their Newton PDAs that ARM was spun off as a separate company, starting with their 32-bit RISC CPUs. They reached the next level of success as Apple continued to use them in iPods, then hit it big when they were used in the iPhone, and after that in pretty much every smartphone and tablet that reached any level of success.

The original 32-bit instruction set used 32 bits to contain each machine instruction, which worked great as long as you had sufficient memory. In the microcontroller world, there were complaints that for devices with only 4k of memory, these instructions were too big. To answer this, ARM added “thumb” instructions, which were 16 bits in length, using half the memory of the full instructions. The processor was still 32-bit, since the registers were 32-bits in size and all integer arithmetic was 32-bit. The “thumb” instruction set is a subset of the full 32-bit instruction set, and the processor can switch between regular and thumb mode on select branch instructions. This allowed the microcontroller people to use the “thumb” subset to develop compact applications. Even on computers with larger memory, “thumb” instructions can be useful: loading 16-bit instructions means two instructions arrive with each memory read, reducing contention on the memory bus, and twice as many instructions fit in the instruction cache, improving performance.

The first “thumb” instruction set wasn’t complete, which meant programs had to revert to full instructions to perform a number of functions. To address this, ARM developed “thumb-2” to allow complete functionality without switching back. The various “thumb” instruction sets are all 32-bit; the 64-bit version of the ARM instruction set has no “thumb” subset.

Enter Microcontrollers

ARM has always had the ambition to provide CPU chips covering the whole market, from inexpensive small microcontrollers all the way up to the most powerful datacenter server chips. The full 32-bit ARM processors were a bit too expensive and complicated for the microcontroller market. To address this market, ARM developed the M-series CPUs, choosing to make the “thumb” instruction set the full instruction set of these devices. This made the CPUs far simpler and required fewer transistors to create, paving the way for powerful 32-bit ARM CPUs for the microcontroller market costing under $1 each.

For instance, the ARM Cortex-M0+ used in the Raspberry Pi Pico has 85 instructions. This sounds like a lot, but the count treats things like adding a register to a register as different from adding an immediate operand to a register. It is far fewer instructions than in a full ARM A-series processor, which in turn is far fewer than in an x86 processor.
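
For example, these two adds count as separate instructions in that total because they use different encodings (a small “thumb” illustration):

ADDS r0, r0, r1    @ add register to register: one encoding
ADDS r0, #5        @ add immediate to register: a different encoding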

Some of the features that are dropped from the M-series processors are:

  • Virtual memory
  • Hardware memory protection
  • Virtualization
  • Conditional instructions
  • Not all instructions can address all the registers
  • Immediate operands are much smaller and shifting isn’t supported
  • The addressing modes are far simpler
  • Instructions either set or don’t set the conditional flags; there is no extra bit to control this
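
Two of these differences are easy to see in code. On an A-series CPU, a conditional add is the single instruction ADDEQ r0, r0, r1, and the S suffix controls whether the flags are set. On the M0+, the same logic needs an explicit branch, and the 16-bit add always sets the flags (a small sketch):

    BNE skip          @ branch around the add if the Z flag is clear
    ADDS r0, r0, r1   @ this 16-bit add always sets the flags
skip: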

Most microcontrollers run a single program that has access to all the memory, so these omissions aren’t an issue. However, the lack of hardware support hasn’t stopped people from adding software support and getting Linux and other OSes running on these microcontrollers.

Are ARM Processors Still RISC?

A full ARM A-Series processor, like those found in the Raspberry Pi, Apple’s iPhones and iPads, and dozens of Android and ChromeOS devices, runs the full 64-bit instruction set as well as the full 32-bit instruction set, including the “thumb” instructions. They support virtual memory, virtualization, FPUs, vector processors, advanced security and everything else you would expect in a modern processor. That is a lot for something billed as “reduced”. Basically, an ARM CPU has the same transistor budget as an x86 processor, so they use every transistor to do something useful. So why are ARM processors still considered RISC? The parts of RISC that all ARM processors retain are:

  • The instructions are a fixed length.
  • They are a load/store architecture (no instructions like add memory to register). An instruction either loads/stores from memory or performs an arithmetic operation on the registers.
  • Most instructions execute in a single clock cycle.
  • They have a large set of registers, though Intel processors now also have a large set of registers.

Even with all this functionality, ARM processors use far less power than x86 processors, mainly due to the simplifications that fixed length instructions and a load/store architecture provide. Intel processors now execute RISC-style operations at their core, but then have to add another layer to translate each x86 instruction into these internal RISC instructions, and all of that uses transistors and power when executing.
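
As a small illustration of the load/store point: where a CISC processor can add a register directly into a memory location with one instruction, ARM always splits the work into three (A-series 32-bit syntax):

LDR r1, [r0]       @ load the value from the address in r0
ADD r1, r1, r2     @ do the arithmetic in registers
STR r1, [r0]       @ store the result back to memory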

So yes, even though the number of instructions in an ARM CPU has multiplied greatly over the nine generations of the chips, the core ideas are still RISC.

Summary

The M-series line of ARM CPUs is far simpler to program than the full A-Series. There is no virtual memory support, so you can access hardware addresses directly, reading and writing anywhere without worries about security or memory protection. The instruction set is simpler and nothing is wasted. Having written three books on ARM Assembly Language Programming, I think learning Assembly Language on a microcontroller is a great way to start. You have full control of the hardware and don’t have to worry about interacting with an operating system. I think you get a much better feel for how the hardware works, as well as a real feel for programming RISC based processors. If you are interested in this, I hope you check out my forthcoming book: “RP2040 Assembly Language Programming”.

Written by smist08

October 2, 2021 at 10:31 am

I/O Co-processing on the Raspberry Pi Pico

with 4 comments

Introduction

Last time, we looked at how to access the RP2040’s GPIO registers directly from the CPU in Assembly Language. This is a common technique to access and control hardware wired up to a microcontroller’s GPIO pins; however, the RP2040 contains a number of programmable I/O (PIO) coprocessors that can be used to offload this work from the main ARM CPUs. In this article, we’ll give a quick overview of the PIO coprocessors and present an example that moves the LED blinking logic from the CPU over to the coprocessors, freeing the CPU to perform other work. There is a PIO blink program in the SDK samples which blinks three LEDs at different frequencies; we’ll take that program and modify it to blink the LEDs in turn, so that it works the same as the examples we’ve been working with.

PIO Overview

There are eight PIO coprocessors divided into two banks of four. Each bank has a single 32-word instruction memory that contains the program(s) that run on its coprocessors. 32 instructions aren’t very many, but you can do quite a bit with them. The SDK contains samples that implement quite a few communication protocols, as well as showing how to do video output.

Each PIO has an input and output FIFO buffer for exchanging data with the main CPUs.

The PIO coprocessors execute their own Assembly Language, which the Raspberry folks describe as programming a state machine, though they also say they think it is Turing-complete. Each bank of four coprocessors is a single block, and this block appears twice in the RP2040 package.

Each processor has 32-bit X and Y general purpose registers, input and output shift registers for transferring data to and from the FIFOs, a clock divider register to help control timing, a program counter, and a register to hold the executing instruction.

Each instruction can contain a few bits that specify a delay value, so for many protocols you can control the timing just by adding a timing delay to each instruction. Combine this with the clock divider register to slow down processing and you have a lot of control of timing without using extra instructions.
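
For example, a square wave generator that paces itself entirely with these per-instruction delays might look like the following sketch in PIO assembly (pin setup omitted and the delay counts are illustrative; the number in brackets adds idle cycles after an instruction):

.program square
.wrap_target
    set pins, 1 [7]  ; drive the pin high, then idle for 7 extra cycles
    set pins, 0 [7]  ; drive the pin low, then idle for 7 extra cycles
.wrap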

Sample LED Blinking Program

You write the Assembly Language PIO part of the program in a .pio file, which is then compiled by the PIO Assembler into a .h file to include in your program. You can also include C helper functions in this file, and the Pico SDK recommends including an initialization function. The various RP2040 SDK functions to support this are pretty standard, and you tend to copy/paste them from the SDK samples.

We blink the LEDs using a 200ms delay time, which by computer speeds is very slow, but for humans is quite quick. This means we can’t use the clock divider functionality and instruction delays, as they don’t go this slow. Instead, we have to rely on an old fashioned delay loop; we calculate the delay value in the main function using the frequency of the processor. The pull instruction pulls the delay from the input FIFO, then out transfers it to the y register. We move y to x, turn on the pin, and run the delay loop, decrementing x until it’s zero. Then we turn the pin off and run the delay loop twice, because we need to wait for the two other LEDs to flash before it’s our turn again.

.program blink
    pull block
    out y, 32
.wrap_target
    mov x, y
    set pins, 1   ; Turn LED on
lp1:
    jmp x-- lp1   ; Delay for (x + 1) cycles, x is a 32 bit number
    mov x, y
    set pins, 0   ; Turn LED off
lp2:
    jmp x-- lp2   ; Delay for the same number of cycles again
    mov x, y
lp3:   ; Run the delay again since we need to wait for the 2 other LEDs to blink
    jmp x-- lp3   ; Delay for the same number of cycles again
.wrap             ; Blink forever!

% c-sdk {
// this is a raw helper function for use by the user which sets up the GPIO output, and configures the SM to output on a particular pin

void blink_program_init(PIO pio, uint sm, uint offset, uint pin) {
   pio_gpio_init(pio, pin);
   pio_sm_set_consecutive_pindirs(pio, sm, pin, 1, true);
   pio_sm_config c = blink_program_get_default_config(offset);
   sm_config_set_set_pins(&c, pin, 1);
   pio_sm_init(pio, sm, offset, &c);
}
%}

Now the main C program. In this one, we configure the pins to use. Note that we use a coprocessor for each pin, so three coprocessors, but each one executes the same program. We start a pin flashing, sleep 200ms, and then start the next one. This way we achieve the same effect as in our previous programs.

After we get the LED flashing running on the coprocessors, we have an infinite loop that just prints a counter out to the serial port. This is to demonstrate that the CPU can go on and do anything it wants and the LEDs will keep flashing independently without any of the CPU’s attention.

#include <stdio.h>

#include "pico/stdlib.h"
#include "hardware/pio.h"
#include "hardware/clocks.h"
#include "blink.pio.h"

const uint LED_PIN1 = 18;
const uint LED_PIN2 = 19;
const uint LED_PIN3 = 20;
#define SLEEP_TIME 200

void blink_pin_forever(PIO pio, uint sm, uint offset, uint pin, uint freq);

int main() {
    int i = 0;

    setup_default_uart();

    PIO pio = pio0;
    uint offset = pio_add_program(pio, &blink_program);
    printf("Loaded program at %d\n", offset);
    blink_pin_forever(pio, 0, offset, LED_PIN1, 5);
    sleep_ms(SLEEP_TIME);
    blink_pin_forever(pio, 1, offset, LED_PIN2, 5);
    sleep_ms(SLEEP_TIME);
    blink_pin_forever(pio, 2, offset, LED_PIN3, 5);

    while(1)
    {
        i++;
        printf("Busy counting away i = %d\n", i);
    }
}

void blink_pin_forever(PIO pio, uint sm, uint offset, uint pin, uint freq) {
    blink_program_init(pio, sm, offset, pin);
    pio_sm_set_enabled(pio, sm, true);
    printf("Blinking pin %d at %d Hz\n", pin, freq);
    pio->txf[sm] = clock_get_hz(clk_sys) / freq;
}

Summary

This was a quick introduction to the RP2040’s PIO coprocessors. The goal of any microcontroller is to control other interfaced hardware, whether measurement sensors or communications devices (like WiFi). The PIO coprocessors give the RP2040 programmer a powerful weapon for developing sophisticated integration projects, without requiring a lot of specialized hardware to make things easier. It might be nice to have a larger instruction memory, but in a $4 USD device, you can’t really complain.

For people playing with the Raspberry Pi Pico or another RP2040 based board, you can program in 32-bit ARM Assembly Language and might want to consider my book “Raspberry Pi Assembly Language Programming”.

Written by smist08

April 30, 2021 at 10:02 am

Bit-Banging the Raspberry Pi Pico’s GPIO Registers

with 4 comments

Introduction

Last week, I introduced my first Assembly Language program for the Raspberry Pi Pico. This was a version of my flashing LED program that I implemented in a number of programming languages for the regular Raspberry Pi. In the original article, I required three routines written in C to make things work. Yesterday, I showed how to remove one of these C routines, namely by having the main routine written in Assembly Language. Today, I’ll show how to remove the two remaining C routines, which were wrappers for two SDK routines that are implemented as inline C functions and as a consequence are only usable from C code.

In this article, we’ll look at the structure for the GPIO registers on the RP2040 and how to access these. The procedure we are using is called bit-banging because we are using one of the two M0+ ARM CPU cores to loop banging the bits in the GPIO registers to turn them on and off. This isn’t the recommended way to do this on the RP2040. The RP2040 implements eight programmable I/O (PIO) co-processors that you can program to offload this sort of thing from the CPU. We’ll look at how to do that in a future article, but as a first step we are going to explore bit-banging mostly to understand the RP2040 hardware better.

The RP2040 GPIO Hardware Registers

There are 26 programmable GPIO pins on the Pico. The board has 40 pins, but the others are ground, power and a couple of specialized pins.

This means each pin can be assigned to a bit in a 32-bit hardware register, which is mapped to 32 bits of memory in the RP2040’s address space. The GPIO functions are controlled by writing a 1 bit to the correct position in the appropriate GPIO register. There is one register to turn on a GPIO pin and a different register to turn it off, which means you don’t need to read the register, change one bit and then write it back. It’s quite easy to program these: you just place a 1 in a CPU register, shift it over by the pin number, and then write it to the correct memory location. These registers start at memory location 0xd0000000 and are defined in sio.h. Note there are two sio.h files: one in hardware_regs, which contains the offsets and is better for Assembly Language usage, and one in hardware_structs, which contains a C structure that maps over the registers. The GPIO registers follow; note that there are a few other non-GPIO related registers at this location and a few unused gaps, in case you are wondering why the addresses aren’t contiguous.

Register          Address
gpio_in           0xd0000004
gpio_hi_in        0xd0000008
gpio_out          0xd0000010
gpio_set          0xd0000014
gpio_clr          0xd0000018
gpio_togl         0xd000001c
gpio_oe           0xd0000020
gpio_oe_set       0xd0000024
gpio_oe_clr       0xd0000028
gpio_oe_togl      0xd000002c
gpio_hi_out       0xd0000030
gpio_hi_set       0xd0000034
gpio_hi_clr       0xd0000038
gpio_hi_togl      0xd000003c
gpio_hi_oe        0xd0000040
gpio_hi_oe_set    0xd0000044
gpio_hi_oe_clr    0xd0000048
gpio_hi_oe_togl   0xd000004c

Notice that there are a number of _hi_ registers, perhaps indicating that Raspberry plans to come out with a future version with more than 32 GPIO pins.

In the SDK and in my code below, we just write one bit at a time. I don’t know if the RP2040’s circuitry can handle writing more bits at once; for instance, can we set all three pins to output in one write instruction? Remember, hardware registers tend to have minimal functionality to simplify the electronic circuitry behind them, so often you can’t get too complicated in what you expect of them.
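
That said, the RP2040 datasheet describes these set/clear registers as taking a mask, so in principle a single store should affect several pins at once. Here is an untested sketch (the mask and register address are taken from the table above) that would set all three LED pins to output in one write, if masks do work as documented:

movs r3, #7          @ mask with bits 0-2 set
lsls r3, r3, #18     @ shift the mask up to pins 18, 19 and 20
ldr  r2, =0xd0000024 @ gpio_oe_set register from the table above
str  r3, [r2]        @ one store sets all three pins to output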

Bit-Banging the Registers in Assembly

Below is the new updated program that doesn’t require the C file. In our routines to control the GPIO pins, we pass the pin number as parameter 1, which means it is in R0. We place 1 in R3 and then shift it left by the value in R0 (the pin number). This gives the value we need to write. We then load the address of the register we need, which we specified in the .data section, and write the value. Note that we need two LDR instructions: one to load the address of the memory location holding the register address, and a second to load the register address itself.

@
@ Assembler program to flash three LEDs connected to the
@ Raspberry Pi Pico GPIO port using the Pico SDK.
@
@

.EQU LED_PIN1, 18
.EQU LED_PIN2, 19
.EQU LED_PIN3, 20
.EQU sleep_time, 200

.thumb_func
.global main             @ Provide program starting address to linker

.align  4 @ necessary alignment

main:

@ Init each of the three pins and set them to output

MOV R0, #LED_PIN1
BL gpio_init
MOV R0, #LED_PIN1
BL gpiosetout
MOV R0, #LED_PIN2
BL gpio_init
MOV R0, #LED_PIN2
BL gpiosetout
MOV R0, #LED_PIN3
BL gpio_init
MOV R0, #LED_PIN3
BL gpiosetout

loop:

@ Turn each pin on, sleep and then turn the pin off

MOV R0, #LED_PIN1
BL gpio_on
LDR R0, =sleep_time
BL sleep_ms
MOV R0, #LED_PIN1
BL gpio_off
MOV R0, #LED_PIN2
BL gpio_on
LDR R0, =sleep_time
BL sleep_ms
MOV R0, #LED_PIN2
BL gpio_off
MOV R0, #LED_PIN3
BL gpio_on
LDR R0, =sleep_time
BL sleep_ms
MOV R0, #LED_PIN3
BL gpio_off

B       loop @ loop forever

gpiosetout:
@ write a 1 bit to the pin position in the output set register
movs r3, #1
lsl r3, r0 @ shift over to pin position
ldr r2, =gpiosetdiroutreg @ address we want
ldr r2, [r2]
str r3, [r2]
bx lr

gpio_on:
movs r3, #1
lsl r3, r0 @ shift over to pin position
ldr r2, =gpiosetonreg @ address we want
ldr r2, [r2]
str r3, [r2]
bx lr

gpio_off:
movs r3, #1
lsl r3, r0 @ shift over to pin position
ldr r2, =gpiosetoffreg @ address we want
ldr r2, [r2]
str r3, [r2]
bx lr

.data
      .align  4 @ necessary alignment
gpiosetdiroutreg: .word   0xd0000024 @ mem address of gpio registers
gpiosetonreg: .word   0xd0000014 @ mem address of gpio registers
gpiosetoffreg: .word   0xd0000018 @ mem address of gpio registers

Having separate gpio_on and gpio_off functions, matching the separate set and clear registers, simplifies our code since we don’t need any conditional logic to load the correct register address.

We loaded the actual address from a shared location. We could have loaded the base address of 0xd0000000 and then stored things via an offset, but I did it this way to be a little clearer. If you look at the disassembly of the SDK routine, it does something rather clever to get the base address. It does:

movs r2, #208 @ 0xd0
lsl r2, r2, #24 @ becomes 0xd0000000

And then uses something like:

str r3, [r2, #40] @ 0x28

To store the value using an index which is the offset to the correct register. I thought this was rather clever on the C compiler’s part; it represents the optimizations that the ARM engineers have been adding to GCC’s generation of ARM code. This technique takes the same time to execute, but doesn’t require saving any values in memory, saving a few bytes, which may be crucial in a larger program.

Summary

Writing to the hardware registers directly on the Raspberry Pi Pico is a bit simpler than the Broadcom implementation in the full Raspberry Pi. With these routines we wrote our entire program in Assembly Language. There is still C code in the SDK which will be linked into our program and we are still calling both gpio_init and sleep_ms in the SDK. We could look at the source code in the SDK and reimplement these in Assembly Language, but I don’t think there is any need. Between the RP2040 documentation and the SDK’s source code it is possible to figure out a lot about how the Raspberry Pi Pico works.

For people playing with the Raspberry Pi Pico or another RP2040 based board, you can program in 32-bit ARM Assembly Language and might want to consider my book “Raspberry Pi Assembly Language Programming”.

Written by smist08

April 24, 2021 at 11:50 am

Calling Main in Assembly Language on the RP2040

with 2 comments

Introduction

In last week’s article, I presented my first Assembly Language program on the Raspberry Pi Pico. The program worked, but it included some C code that I wasn’t happy with. In this article, I’ll explain why I needed to have the main entry point in C, what I missed and how to correct this problem.

The entry point is a function main(), with no parameters or return code, called by the RP2040 initialization code after it initializes the RP2040 hardware. In C this worked no problem, but in Assembly Language it resulted in a hardware fault on executing the first instruction of my main() routine. This was a bit of a head scratcher, and it took a couple of days before I realized what the problem was. My first thought was that it was alignment, but no, it wasn’t that. Perhaps I needed to duplicate the first few instructions in the Assembly Language generated by the C compiler? But no, that still caused a hardware fault. Rather mystifying and annoying.

Use the Source

The program you run on the Pico contains pretty much everything in a single executable that initializes the CPU and peripheral hardware and then runs in an endless loop forever. There is no operating system, just your program. The Raspberry Pi Pico contains a bit of firmware which is activated when you power on with the bootsel button pressed; this makes the Pico appear as a shareable flash drive to a USB host, allowing you to copy files into the writable part of the Pico’s flash memory. After that, it reboots to let the program run.

One of the good things about the Pico is that the SDK contains the source code for this whole thing, and when you build your program, it actually compiles all this source code alongside your code (there are no libraries in this environment). As a result, you can make a debug build where everything is debuggable, including both your code and the SDK code, and you can set a breakpoint before your code and single step through the SDK into your code. You can’t start debugging at the very first instruction; you need to let the first bit of the SDK initialize the processor before starting, but you can set a breakpoint fairly early. I found a good place was the platform_entry routine, which is an Assembly Language function in crt0.S. This is the function that initializes the SDK environment and then calls your main() starting point. The code for this routine is fairly innocuous:

platform_entry: // symbol for stack traces
    // Use 32-bit jumps, in case these symbols are moved out of branch range
    // (e.g. if main is in SRAM and crt0 in flash)
    ldr r1, =runtime_init
    blx r1
    ldr r1, =main
    blx r1
    ldr r1, =exit
    blx r1

Nothing special, it just loads the address of our main routine and calls it. Stepping through the C code, it works; stepping through the Assembly Language code, hardware fault.

At some point I thought to look at the documentation for the BLX instruction: why were they calling this rather than BL? This turned out to be the root of the problem.

On a full ARM A-series CPU, like those in a full Raspberry Pi or in your cell phone, the processor executes a rich set of instructions, the regular ARM 32-bit instruction set; but a microcontroller M-series CPU like the one in the Pico only executes the so-called “thumb” instructions. On an A-series CPU, you switch back and forth between regular and thumb modes using the BLX instruction. Thumb instructions are 16 bits in length and regular instructions are 32 bits, and both have to be aligned: thumb instructions on even bytes, regular instructions on 4-byte boundaries. Either way, the true address of any instruction is even, which means the low order bit isn’t really used (it has to be zero). The BLX instruction uses this low order bit to specify whether to switch to thumb mode: if it is one, thumb mode; if zero, regular instruction mode. Let’s look at the disassembly for this routine:

1000021a <platform_entry>:
1000021a: 4919      ldr r1, [pc, #100] ; (10000280 <__get_current_exception+0x1a>)
1000021c: 4788      blx r1
1000021e: 4919      ldr r1, [pc, #100] ; (10000284 <__get_current_exception+0x1e>)
10000220: 4788      blx r1
10000222: 4919      ldr r1, [pc, #100] ; (10000288 <__get_current_exception+0x22>)
10000224: 4788      blx r1

10000280: 100012bd .word 0x100012bd   ; runtime_init
10000284: 10000360 .word 0x10000360   ; main
10000288: 100013a9 .word 0x100013a9   ; exit

Notice that the address for my main routine is even, whereas the other two routines are odd. If I compile with the C routine, then main has an odd address as well. I didn’t think of this because the RP2040’s M-series CPU only executes thumb instructions, so why have any functionality to switch between modes? I don’t know, but if you do tell it to switch to regular instructions, you get a hardware fault.
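
A hedged sketch of the difference (illustrative, not code from crt0.S):

    ldr r1, =main      @ even address: BLX tries to switch to regular
    blx r1             @ 32-bit ARM mode and the M-series CPU faults
    ldr r1, =main+1    @ with bit 0 set, BLX stays in thumb mode and
    blx r1             @ the call works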

The other question is why the author of crt0.S in the SDK calls routines with BLX rather than BL. After all, the Pico doesn’t support regular instructions, so you are always in thumb mode. If platform_entry used BL instead, I wouldn’t have had any problem. I wonder if this indicates they developed the SDK on an A-series CPU, perhaps before they obtained real RP2040s? Or perhaps there is a way to emulate the RP2040 on a full A-series CPU and this is how the developers at the Raspberry Pi Foundation operate.

To correct the problem, we just need to indicate our main() routine is a thumb routine. We do this by placing a .thumb_func directive in front of the .global directive.

.thumb_func
.global main             @ Provide program starting address to linker

.align  4 @ necessary alignment

main:

The key point is that this is in front of the .global, since it is really just the linker that needs to process this to set up the correct address when it links in crt0.

Summary

This eliminates the need for the C main() function we had last week. Next time, we’ll eliminate the two other C routines we had and explore how the Raspberry Pi Pico’s GPIO control registers work. As with most problems, working through the solution teaches us a bit more about how the RP2040 works and reminds us that there are consequences to using a subset of the full ARM instruction set.

For people using this SDK, you can program in 32-bit ARM Assembly Language and might want to consider my book “Raspberry Pi Assembly Language Programming”.

Written by smist08

April 23, 2021 at 9:11 am

Assembly Language on the Raspberry Pi Pico

with 12 comments

Introduction

The Raspberry Pi Pico is the Raspberry Pi Foundation’s first entry into the domain of Arduino style microcontrollers. The board contains Raspberry’s own designed SoC (System on a Chip), containing a dual core ARM Cortex-M0+ CPU along with memory and a collection of I/O circuitry. There are no keyboard, mouse or monitor ports on the board, only a micro-USB port to connect to a host computer, a number of GPIO pins and three debug pins. This SoC is called the RP2040 and is licensed to other companies for use in their own boards. Raspberry supports programming this board in either C/C++ or MicroPython. The C/C++ SDK also supports Assembly Language programming to some degree, and this article is a look at my first attempt to write an Assembly Language program for this board. I ran into a few problems and still have a few things to figure out, and we’ll explain those in the article. We’ll write an Assembly Language version of the program we wrote in C last time to flash three connected LEDs.

ARM Cortex-M0+ Assembly Language

I blogged about 32-bit ARM Assembly Language here, and then presented the flashing LED Assembly Language program for the Raspberry Pi here. Further, I wrote a whole book on 32-bit ARM Assembly Language Programming: “Raspberry Pi Assembly Language Programming”. These are all oriented to ARM’s full A-series processors, which include floating point units (FPU), vector processors, virtual memory support and much more. The ARM M-series processors are a subset of these, designed to be low cost, use little memory and be very power efficient. The M-series processors only execute what are called the ARM “thumb” instructions. Normally, on an A-series processor, each instruction takes 32 bits, but for some applications this uses too much memory, so ARM came up with “thumb” instructions: if the processor is operating in “thumb” mode, each instruction is only 16 bits in length, thus using half the memory. The original set of “thumb” instructions was too limited, so ARM added a way to mix some 32-bit instructions in with the 16-bit ones, and that makes up the modern “thumb” instruction set used by the M-series processors. One consequence of the 16-bit “thumb” encodings is that most instructions can only address the low registers R0 to R7; the high registers R8 to R12 are only reachable by a few instructions. The registers you do have are all 32-bit, and the Raspberry RP2040 has special multiplication and division circuitry to perform these operations quickly.
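
Here is a short sketch of what does and doesn’t encode in the 16-bit “thumb” instructions (illustrative fragments):

movs r0, #1        @ most 16-bit instructions only name r0-r7
mov  r8, r0        @ MOV is one of the few that reaches r8-r12
adds r2, r0, r1    @ fine: all low registers
@ adds r8, r0, r1  @ won't assemble: ADDS can't encode a high register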

Code

This program uses the C/C++ SDK to access the GPIO pins, which means this Assembly Language program is quite similar to last week’s C program. To call a routine in Assembly, you put the first parameter in R0, the second in R1 and then call Branch with Link (BL). BL places the address of the next instruction into the LR register, so the called routine returns by branching to the address contained in the LR register. When calling functions, there is a convention on who has to save which registers on the stack, but we don’t keep anything in registers across the function calls, so we don’t need to do this. This program is set up as an infinite loop, since there is nothing for the main routine to return to, and if it does return, the processor halts.

Assembly Language code:

@
@ Assembler program to flash three LEDs connected to the
@ Raspberry Pi Pico GPIO port using the Pico SDK.
@
@

.EQU LED_PIN1, 18
.EQU LED_PIN2, 19
.EQU LED_PIN3, 20
.EQU GPIO_OUT, 1
.EQU sleep_time, 200

.global main_asm             @ Provide program starting address to linker
main_asm:

MOV R0, #LED_PIN1
BL gpio_init
MOV R0, #LED_PIN1
MOV R1, #GPIO_OUT
BL link_gpio_set_dir
MOV R0, #LED_PIN2
BL gpio_init
MOV R0, #LED_PIN2
MOV R1, #GPIO_OUT
BL link_gpio_set_dir
MOV R0, #LED_PIN3
BL gpio_init
MOV R0, #LED_PIN3
MOV R1, #GPIO_OUT
BL link_gpio_set_dir
loop:   MOV R0, #LED_PIN1
MOV R1, #1
BL link_gpio_put
LDR R0, =sleep_time
BL sleep_ms
MOV R0, #LED_PIN1
MOV R1, #0
BL link_gpio_put
MOV R0, #LED_PIN2
MOV R1, #1
BL link_gpio_put
LDR R0, =sleep_time
BL sleep_ms
MOV R0, #LED_PIN2
MOV R1, #0
BL link_gpio_put
MOV R0, #LED_PIN3
MOV R1, #1
BL link_gpio_put
LDR R0, =sleep_time
BL sleep_ms
MOV R0, #LED_PIN3
MOV R1, #0
BL link_gpio_put
B       loop

.data

      .align  4 @ necessary alignment

I didn’t intend to include any C code, but I ran into a couple of problems that require it. One is that a large number of SDK functions are inline C functions which means they can’t be called from outside of C. In our case two functions gpio_set_dir and gpio_put are inline and required wrapping. The other problem is that if the main program is Assembly Language then the code to initialize the board doesn’t seem to be called. I think this is a matter of setting the correct CMake options, but I haven’t had a chance to figure it out yet. For now we have main in the C code and then call the Assembly Language main routine.

C code:

#include "hardware/gpio.h"

void link_gpio_set_dir(int pin, int dir)
{
gpio_set_dir(pin, dir);
}

void link_gpio_put(int pin, int value)
{
gpio_put(pin, value);
}

void main()
{
main_asm();
}

The Raspberry Pi Pico SDK uses the CMake system to manage builds. The SDK provides a large set of build rules. You run CMake and then it creates a makefile that compiles your program.

CMake file:

cmake_minimum_required(VERSION 3.13)

include(pico_sdk_import.cmake)

project(test_project C CXX ASM)

set(CMAKE_C_STANDARD 11)
set(CMAKE_CXX_STANDARD 17)

pico_sdk_init()

include_directories(${CMAKE_SOURCE_DIR})

add_executable(flashledsasm
  mainmem.S
  sdklink.c
)

pico_enable_stdio_uart(flashledsasm 1)
pico_add_extra_outputs(flashledsasm)
target_link_libraries(flashledsasm pico_stdlib)

Still To-Do

The program works, but there are a few things I’m not happy about. The Raspberry Pi Pico SDK is pretty new, so there aren’t a lot of answers on StackOverflow yet. The good thing is that it is all open source, so it is just a matter of time to figure out the code. Here is what I’ll be working on:

  1. How to have main be in Assembly Language and have the board properly initialized. Match the C startup sequence.
  2. Figure out the details of the GPIO registers and have Assembly Language versions of the inline C code that accesses these. They are similar to those on the full Raspberry Pi, but different.
  3. How to get constants from the C include files. On first try this didn’t work and gave syntax errors, but the SDK says they should be usable from Assembly Language; they might need a couple of fixes.

Summary

I planned to write a 100% Assembly Language program, but didn’t quite make it. At least the program works, showing you can include Assembly Language in your RP2040 projects. The support to build using the GCC macro assembler is all there and besides some interactions with the SDK all seems to work well. Of course the Raspberry Pi Pico SDK is pretty new so there will be a lot of updates and there are still a number of undocumented holes to investigate.

Written by smist08

April 16, 2021 at 9:43 am

Programming an Apple Watch

with 4 comments

Introduction

A cool thing about the Apple Watch is that it’s really a full ARM based computer running a Unix derived operating system that is fully programmable. Although most Apple Watch owners will never write programs for their Apple Watch, as they never write programs for their iPad or iPhone, it is entirely possible to do so using Apple’s Xcode development environment running on a newer Mac. In this article we’ll look a little at the powerful computer that is the Apple Watch and give an idea of how programs or Apps are developed.

The Platform

The Apple Watch contains a whole lot of processing power, combined with a ton of sensors and a nice retina display, all packed into a very small package. The processor is a dual core 64-bit ARM CPU with 1Gig of RAM; in the Series 6 watch, these cores are the low energy cores from the iPhone 11’s CPU. There is also 32Gig of storage for Apps and data, and even a mini PowerVR GPU. The touch sensitive display is only 1.78”, but still has a resolution of 448 x 368 pixels. 1Gig may not sound like much RAM, but remember that all the Raspberry Pis up to the 3B only had 1Gig of RAM and ran a full version of Linux quite nicely. For connectivity there is WiFi, Cell, Bluetooth and ultra wideband.

The sensors include: accelerometer, gyro, heart rate, barometer, always-on altimeter, compass, GPS, microphone, SpO2 and VO2max.

That’s quite a bit of computer packed into a small package weighing only 48 grams.

Programming

The Apple Watch’s operating system is WatchOS, which is based on iOS. Programming for WatchOS is pretty similar to programming for iOS, and in fact you use the same tools. There is a WatchKit API for watch specific functions, and you should keep the Watch’s UI limitations in mind when creating Apps. For instance, even though you can do text entry on the watch, you have to draw each character or use the built-in speech to text interface; there is no keyboard.

Typically you develop a WatchApp in parallel with an iPhone App, where the iPhone App provides configuration, setup and does much of the work allowing you to minimize the interface required on the watch. Xcode makes creating these dual Apps easy and in fact you can have separate heads for the Watch version, the Apple TV version, the iPhone version and the iPad version.

In my book “Programming with 64-Bit ARM Assembly Language”, I create a simple iOS App that has a text box, where you enter some text and then it calls an Assembly Language routine to convert the text to uppercase. Alex vonBelow took this example and added support for both the Apple Watch and AppleTV. The Github for this is available here and this program is in Chapter 10.

For most work, you debug by running the application in the iOS/WatchOS simulator. The nice thing about my new ARM based Mac is that the simulator is quite fast, since it doesn’t have to simulate an ARM CPU on an Intel processor; instead, everything is ARM and works quickly. Below is a screenshot of running this uppercase app for the Apple Watch.

The cool thing is that if you know how to write iOS Apps, then you already know how to write Apple Watch Apps (as well as AppleTV Apps). Besides writing code in Objective-C or Swift, you can even write code in 64-bit ARM Assembly Language. Xcode makes it easy to provide separate appropriate screens for each device.

There are tons of books on how to write iOS Apps and all that knowledge works across all the Apple mobile products. The key thing for the Watch is that the UI should be mostly informational and any UI should be limited to just a couple of buttons.

Programming with Objective-C or Swift using the iOS frameworks is fairly complex; it would be nice if there were something simpler, like a version of Scratch for WatchOS or a command prompt App like the one for the iPhone. But at least Xcode creates a reasonable skeleton working App when you create a new project.

Summary

The Apple Watch is quite a powerful little computer in its own right. You can program it from Xcode and use nearly all the tools you use for iOS development for the iPhone or iPad. It’s really amazing how much computing power, connectivity and sensors are stuffed into the small watch package.

Written by smist08

February 26, 2021 at 10:40 am

Porting Linux to Apple Silicon

with 2 comments

Introduction

When Apple announced they were switching from Intel to ARM CPUs, there was a worry that Apple would lock out installing non-Apple operating systems such as Linux. There is a new security processor that people worried would only allow MacOS to boot on these new chips. Fortunately, this proved to be false and the new ARM based Macintoshes fully support booting homebrew operating systems either from the SSD or from USB storage. However, the new Apple M1 chips present a number of problems that we’ll discuss in this article as well as why so many people are so interested in doing this.

Linus Torvalds, the father of Linux, recently said that he wished the new MacBooks ran Linux and that he would consider this the ultimate laptop and really want one. Linus said he saw porting Linux as possible, but personally he didn’t have the time to commit.

Last week’s article on an Assembly Language “Hello World” program hit number 1 on Hacker News and based on the comments, the interest was largely generated by the challenge of porting Linux to these new Apple systems. As we’ll see, doing this is going to require both reverse engineering and then writing ARM 64-bit Assembly Language code.

Asahi Linux

Last week we saw the announcement of the Asahi Linux project. Asahi means “rising sun” in Japanese and “asahi ringo” is Japanese for Macintosh Apple. The goal of this project is to develop a version of Linux that fully supports all the new hardware features of the Apple M1 chip including the GPU and USB-C ports. This won’t be easy because even though Apple doesn’t block you from doing this, they don’t help and they don’t provide any documentation on how the hardware works. People already have character based Linux booting and running on the Apple M1 Macs, and you can run the regular ARM version of Linux under virtualization on these new Macs, but the real goal is to understand the new hardware and have a version of Linux talking directly to the hardware that uses all the capabilities, like the GPU, to run as well as or better than MacOS.

GPUs and Linux

GPUs have always been a sore point with the Linux community. None of the GPU vendors properly document the hardware APIs to their products, and they believe the best way to support various operating systems is to provide precompiled binaries with no source code. Obviously this roils the open source community. GPUs are complicated and change a lot with each hardware generation. Newer Intel and AMD CPUs all have integrated graphics with good open source drivers that at least work, but at the disadvantage of not using all the fancy hardware you paid for in your expensive gaming PC. Even the Raspberry Pi versions of Linux use a binary Broadcom driver for the integrated GPU, rather than something open source.

Over the years, intrepid Linux developers have reverse engineered how all these GPUs work, so there are open source drivers for most nVidia and AMD GPUs. In fact, since neither nVidia nor AMD supports their hardware for all that long, if you have a graphics card that is more than 10 years old and run Linux, you are pretty much forced to use the open source driver, switch to Intel integrated graphics (if available), or just stop upgrading the operating system and drivers.

The good news is that the open source community has a lot of experience figuring out how GPUs work, including those from nVidia, AMD, ARM and Broadcom. The bad news is that it takes time: first you work out a disassembler for the GPU instructions, starting from the binary form and working out what each bit means, to produce a mnemonic Assembly Language source form. Once this is known, you write an assembler for it and then use these tools to create the graphics driver. The Apple GPU isn’t entirely new: it was originally based on an Imagination Technologies GPU design and then went through several iterations in iPads and iPhones before the current newest version ended up in the M1. Hopefully this history will be some help in developing the new Linux drivers.

Leveraging Existing Drivers

All the CPU vendors, including ARM Holdings, are motivated to contribute to the Linux kernel to ensure it runs well on their hardware; Linux is big enough that a solid Linux offering greatly benefits a vendor’s adoption. There is already really good ARM support in the Linux kernel and its tool chain, such as GNU GCC. This is a solid first step in producing a working version of Linux for Apple Silicon.

Further, Apple doesn’t do everything themselves. There is hope that even if components are integrated into the M1 SoC, they still use standard designs. After all, Apple didn’t want to write all new drivers for MacOS. Hopefully a lot of the hardware drivers for the Intel Macs will just need to be recompiled for ARM and work (or require very little work).

I haven’t mentioned the Apple integrated AI processor, but the hope here is that once the GPU is understood, the AI processor will turn out to be fairly similar, just missing the graphics specific parts and containing the same core vector processor.

There are quite a few other components in the SoC, including sound processing and video decoding; hopefully these are known entities and not entirely new.

Why Do All This Work?

It’s hard enough writing device drivers when you have complete hardware documentation and can call a vendor’s support line. Having to reverse engineer how everything works first is a monumental task, so why are all these open source developers flocking to it? Quite a few people like the challenge; if Apple provided lots of good documentation, it would just be too easy. There is an attraction to connecting hardware diagnostic equipment to your computer and iteratively writing Assembly Language to figure out how to control things. None of this work is paid; besides the odd bit of gofundme money, these are mostly volunteers doing this in their spare time, separate from their day jobs.

Humans are very curious creatures. Apple, by not providing any details, has piqued everyone’s curiosity. We don’t like being told no, you’re not allowed to know something. This just irritates us and perhaps we think there is something good being withheld from us.

There is also some fame to be had in hacker circles, as the people who solve the big problems are going to become legends in the Linux world.

Whatever the reason, we will all benefit from their hard work and determination. A well running Linux on Apple Silicon will be a great way to get full control of your hardware and escape App store restrictions and Apple’s policies on what you can and cannot do with your computer. It might even be a first step to producing Linux for iPhones and iPads which would be cool.

Summary

Apple has set a mythic challenge to hackers everywhere. By not providing any hardware documentation, Apple has created an epic contest for hackers to crack this nut and figure out how all the nitty gritty details of Apple Silicon work. This is a fun and difficult problem to work on. The kind of thing hackers love. I bet we are going to see prototype drivers and hardware details much faster than we think.

All of this requires a good knowledge of ARM 64-bit Assembly Language, so consider my book as a great way to learn all the details on how it works. I even have a chapter on reverse engineering which is hopefully helpful.

Written by smist08

January 15, 2021 at 10:59 am

Apple M1 Assembly Language Hello World

with 15 comments

Introduction

Last week, we talked about using a new Apple M1 based Macintosh as a development workstation and how installing Apple’s development system XCode also installed a large number of open source development tools including LLVM and make. This week, we’ll cover how to compile and run a simple command line ARM Assembly Language Hello World program.

Thanks to Alex vonBelow

My book “Programming with 64-Bit ARM Assembly Language” contains lots of self-contained sample Assembly Language programs and a number of iOS and Android samples. The command line utilities are compiled for Linux using the GNU tool set. Alex vonBelow took all of these and modified them to work with the LLVM tool chain and within Apple’s development environment, dealing with all the differences between Linux and MacOS/iOS as well. His version of the source code for my book, modified for Apple M1, is available here:

https://github.com/below/HelloSilicon.

Differences Between MacOS and Linux

Both MacOS and Linux are based on Unix and are more similar than different. However there are a few differences of note:

  • MacOS uses LLVM by default whereas Linux uses GNU GCC. This really just affects the command line arguments in the makefile for the purposes of this article. You can use LLVM on Linux and GCC should be available for Apple M1 shortly.
  • The MacOS linker/loader doesn’t like doing relocations, so you need to use the ADR rather than LDR instruction to load addresses. You could use ADR in Linux and if you do this it will work in both.
  • The Unix API calls are nearly the same. The difference is that Linux renumbered the functions when they went to 64-bit, while MacOS kept the function numbers the same; in the 32-bit world they matched, but now they are all different.
  • When calling a MacOS system service, the function number goes in X16 rather than in Linux’s X8 (see the sketch after this list).
  • Linux installs the various libraries and include files under /usr/lib and /usr/include, so they are easy to find and use. When you install XCode, it installs SDKs for MacOS, iOS, iPadOS, iWatchOS, etc., with the option of installing lots of versions. The paths to the libs and includes are rather complicated, and you need a tool to find them.
  • In MacOS the program must start on a 4-byte boundary, hence the listing has an “.align 2” directive near the top.
  • In MacOS you need to link in the System library even if you don’t make a system call from it, or you get a linker error. This sample Hello World program uses software interrupts to make its system calls rather than the API in the System library, so it shouldn’t really need to link to it.
  • In MacOS the default entry point is _main whereas in Linux it is _start. This is changed via a command line argument to the linker.
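
To make the system call differences concrete, here is a sketch of the write call from the program below as it would look on 64-bit Linux (write is Linux function number 64, and the number goes in X8):

mov X0, #1          // 1 = StdOut
adr X1, helloworld  // string to print
mov X2, #13         // length of our string
mov X8, #64         // Linux write function number goes in X8
svc 0               // call Linux to output the string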

Hello World Assembly File

Below is the simple Assembly Language program to print out “Hello World” in a terminal window. For all the gory details on these instructions and the architecture of the ARM processor, check out my book.

//
// Assembler program to print "Hello World!"
// to stdout.
//
// X0-X2 - parameters to MacOS function services
// X16 - MacOS function number
//
.global _start             // Provide program starting address to linker
.align 2

// Setup the parameters to print hello world
// and then call Linux to do it.

_start: mov X0, #1     // 1 = StdOut
adr X1, helloworld // string to print
mov X2, #13     // length of our string
mov X16, #4     // MacOS write system call
svc 0     // Call MacOS to output the string

// Setup the parameters to exit the program
// and then call Linux to do it.

mov     X0, #0      // Use 0 return code
       mov     X16, #1     // Service command code 1 terminates this program
       svc     0           // Call MacOS to terminate the program

helloworld:      .ascii  "Hello World!\n"

Makefile

Here is the makefile. The command to assemble the source code is simple; the command to link is a bit more complicated.

HelloWorld: HelloWorld.o
ld -macosx_version_min 11.0.0 -o HelloWorld HelloWorld.o -lSystem -syslibroot \
`xcrun -sdk macosx --show-sdk-path` -e _start -arch arm64

HelloWorld.o: HelloWorld.s
as -o HelloWorld.o HelloWorld.s

The xcrun command is Apple’s command to run or find the various SDKs. Here is a sample of running it:

stephensmith@Stephens-MacBook-Air-2 ~ % xcrun -sdk macosx --show-sdk-path
objc[42012]: Class AMSupportURLConnectionDelegate is implemented in both ?? (0x1edb5b8f0) and ?? (0x122dd02b8). One of the two will be used. Which one is undefined.
objc[42012]: Class AMSupportURLSession is implemented in both ?? (0x1edb5b940) and ?? (0x122dd0308). One of the two will be used. Which one is undefined.
objc[42013]: Class AMSupportURLConnectionDelegate is implemented in both ?? (0x1edb5b8f0) and ?? (0x1141942b8). One of the two will be used. Which one is undefined.
objc[42013]: Class AMSupportURLSession is implemented in both ?? (0x1edb5b940) and ?? (0x114194308). One of the two will be used. Which one is undefined.
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk
stephensmith@Stephens-MacBook-Air-2 ~ %

After the ugly warnings from Objective-C, the path to the MacOS SDK is displayed.

Now we can compile and run our program.

stephensmith@Stephens-MacBook-Air-2 Chapter 1 % make -B
as -o HelloWorld.o HelloWorld.s
objc[42104]: Class AMSupportURLConnectionDelegate is implemented in both ?? (0x1edb5b8f0) and ?? (0x1145342b8). One of the two will be used. Which one is undefined.
objc[42104]: Class AMSupportURLSession is implemented in both ?? (0x1edb5b940) and ?? (0x114534308). One of the two will be used. Which one is undefined.
ld -macosx_version_min 11.0.0 -o HelloWorld HelloWorld.o -lSystem -syslibroot `xcrun -sdk macosx --show-sdk-path` -e _start -arch arm64
stephensmith@Stephens-MacBook-Air-2 Chapter 1 % ./HelloWorld 
Hello World!
stephensmith@Stephens-MacBook-Air-2 Chapter 1 %

Summary

The new Apple M1 Macintoshes are running ARM processors as part of all that Apple Silicon and you can run standard ARM 64-bit Assembly Language. LLVM is a standard open source development tool which contains an Assembler that is similar to the GNU Assembler. Programming MacOS is similar to Linux since both are based on Unix and if you are familiar with Linux, most of your knowledge is directly applicable.

Written by smist08

January 8, 2021 at 10:31 am