Stephen Smith's Blog

Musings on Machine Learning…

Archive for the ‘assembly language’ Category

Assembly Language is Number 8

Introduction

Tiobe regularly produces a list of the most popular programming languages and their recently published list has Assembly Language at number 8, moving up from number 16 last year. The top eight languages are:

  1. Python
  2. C
  3. Java
  4. C++
  5. C#
  6. Visual Basic
  7. JavaScript
  8. Assembly Language

The top spots are all well established and well used; C shows remarkable resilience and Java remains popular, in spite of Oracle. In the early days of the PC, all major applications and games were written in Assembly Language, but with the availability of high quality C compilers, this waned and application development switched to C and then to other high level languages. Let’s look at why Assembly Language is having a bit of a renaissance.

Assembly Language is Accessible

In the early days, you needed to buy a macro assembler from the chip manufacturer or some other vendor, such as Microsoft’s MASM. Now, all the chip vendors add their Assembly Language support directly into the open source GNU Assembler and/or the LLVM Assembler. Both of these are excellent macro assemblers, run on any hardware, support cross compiling and best of all are completely free.

In my first job out of university, I did some Assembly Language programming on an Intel 80186 board and to debug it, I needed to use an in-circuit emulator (I2ICE) which was a big expensive piece of hardware that replaced the CPU with a debugging probe. Now, all the CPUs and boards have excellent debug probes and you can debug them using open source tools like GNU’s gdb.

Another big help is the number of great books on Assembly Language that are available, such as: “Raspberry Pi Assembly Language Programming”, “Programming with 64-Bit ARM Assembly Language” and “RP2040 Assembly Language Programming”.

Microcontrollers are Everywhere

The Arduino microcontroller has created a giant community of DIY electronics hobbyists, and there is a huge proliferation of inexpensive microcontrollers. In the Arduino world, you program these in Arduino C, but often to get the performance you need, you have to drop down to Assembly Language. Similarly, the memory on these boards is limited, and Assembly Language is the only way to make use of every single bit available to you. Newer microcontrollers like Raspberry’s RP2040, which are based on ARM 32-bit M-series CPUs, are much more powerful and have more memory. However, with the extra power, people are attempting more ambitious projects, often involving machine learning or other compute intensive applications. Again, they hit the wall with C or MicroPython programming and have to delve into Assembly Language to solve their problems.

When people program these microcontrollers, they are connecting to all sorts of imaginative hardware devices, and they have to create their own libraries to interface to these and often the best way to do this is via Assembly Language.

Competition in the Phone App Market

The App markets for both iOS and Android have matured to the point where new versions bring fewer changes. The competition between various Apps in a given category is intense and one key way for vendors to differentiate themselves from their competition is via improved performance. Beyond re-writing code to use more efficient algorithms, programmers are turning to hand-crafting the core routines of their Apps in Assembly Language.

Machine Learning

Machine Learning (ML) or AI is extremely compute intensive. There has been a proliferation of coprocessor boards for performing ML computations. All these coprocessors need to be programmed in their own native Assembly Language. Similarly, although you can program nVidia GPUs in CUDA C, to get the absolute most out of a board, you need to delve into the board’s native Assembly Language. Most of the ML libraries are built over top of older Linear Algebra mathematical libraries written in Fortran. As people take on harder and harder problems and need to get useful work done out of every CPU cycle, many routines are being re-written in Assembly Language.

Summary

Modern applications are usually written with a number of modules, each module written in the best programming language for the module’s function. Perhaps C for a back end process, JavaScript for a web page and then Assembly Language for important performance critical routines. I don’t think anyone is taking on large applications in 100% Assembly Language, but enough Assembly Language is making its way into applications to move it up the Tiobe index.

Assembly Language is a great way to learn about how computers work and you might want to take a look at one of my books on the subject.

Written by smist08

November 13, 2021 at 4:47 pm

Posted in assembly language

RP2040 Assembly Language Programming

Introduction

My third book on ARM Assembly Language programming has recently started shipping from Apress/Springer, just in time for Christmas. This one is “RP2040 Assembly Language Programming” and goes into detail on how to program Raspberry’s RP2040 SoC. This chip is used in the Raspberry Pi Pico along with boards from several other manufacturers such as Seeed Studio, Adafruit, Arduino and Pimoroni.

Flavours of ARM Assembly Language

ARM has ambitions to provide CPUs from the cheapest microcontrollers costing less than a dollar all the way up to supercomputers costing millions of dollars. Along the road to this, there are now three distinct flavours of ARM Assembly Language:

  1. A Series 32-bit
  2. M Series 32-bit
  3. 64-bit

Let’s look at each of these in turn.

A Series 32-bit

For A Series, each instruction is 32-bits in length and as the processors have evolved they added features to support virtual memory, advanced security and other features to support advanced operating systems like Linux, iOS and Android. This is the Assembly Language used in 32-bit phones, tablets and the Raspberry Pi OS. This is covered in my book “Raspberry Pi Assembly Language Programming”.

M Series 32-bit

The full A series instruction set didn’t work well in microcontroller environments. Using 32-bits for each instruction was considered wasteful, and supporting all the features for advanced operating systems made the CPUs too expensive. To solve the memory problem, ARM introduced a mode to A series 32-bit where each instruction was 16-bits. This saved memory, but the processors were still too expensive. When ARM introduced their M series, or microcontroller processors, they made this 16-bit instruction format the native format and removed most of the advanced operating system features. The RP2040 SoC used in the Raspberry Pi Pico is one of these M Series CPUs using dual core ARM Cortex M0+ CPUs. This is the subject of my current book “RP2040 Assembly Language Programming”.

64-bit

Like Intel and AMD, ARM made the transition from 32-bit to 64-bit processors. As part of this they cleaned up the instruction set, added registers and created a third variant of ARM Assembly Language. iOS and Android are now fully 64-bit and you can run 64-bit versions of Linux on newer Raspberry Pis. The ARM 64-bit instruction set is the topic of my book: “Programming with 64-Bit ARM Assembly Language”.

ARM 64-bit CPUs can run the 32-bit instruction set, and then the M series instruction set is a subset of the A series 32-bit instruction set. Each one is a full featured rich instruction set and deserves a book of its own. If you want to learn all three, I recommend buying all three of my books.

More Than ARM CPUs

The RP2040 is a System on a Chip (SoC). It includes the two M-series ARM CPU cores, but it also includes many built-in hardware interfaces, memory and other components. RP2040 boards don’t need much beyond the RP2040 chip besides a method to interface other components.

“RP2040 Assembly Language Programming” includes coverage of how to use the various hardware registers to control the built-in hardware controllers, as well as the innovative Programmable I/O (PIO) hardware coprocessors. These PIO coprocessors have their own Assembly Language and are capable of some very sophisticated communications protocols, even VGA.

Where to Buy

“RP2040 Assembly Language Programming” is available from most booksellers.

Currently if you search for “RP2040” in books on any of these sites, my book comes up first.

Summary

The Raspberry Pi Pico and the RP2040 chip aren’t the first ARM M-series based microcontrollers, but with their release, suddenly the popularity and acceptance of ARM processors in the microcontroller space has exploded. The instruction set for ARM’s M-series processors is simple, clean and a great example of a RISC instruction set. Whether you are into more advanced microcontroller applications or learning Assembly Language for the first time, this is a great place to start.

Written by smist08

November 5, 2021 at 10:42 am

I/O Co-processing on the Raspberry Pi Pico

Introduction

Last time we looked at how to access the RP2040’s GPIO registers directly from the CPU in Assembly Language. This is a common technique to access and control hardware wired up to a microcontroller’s GPIO pins; however, the RP2040 contains a number of programmable I/O (PIO) coprocessors that can be used to offload this work from the main ARM CPUs. In this article we’ll give a quick overview of the PIO coprocessors and present an example that moves the LED blinking logic from the CPU over to the coprocessors, freeing the CPU to perform other work. There is a PIO blink program in the SDK samples, which blinks three LEDs at different frequencies. We’ll take that program and modify it to blink the LEDs in turn so that it works the same as the examples we’ve been working with.

PIO Overview

There are eight PIO coprocessors divided into two banks of four. Each bank has a single 32-word instruction memory that contains the program(s) that run on the coprocessors. 32 instructions aren’t very many, but you can do quite a bit with these. The SDK contains samples that implement quite a few communication protocols as well as showing how to do video output.

Each PIO has an input and output FIFO buffer for exchanging data with the main CPUs.

The PIO coprocessors execute their own Assembly Language. The Raspberry folks call each coprocessor a state machine, though they also say they think it is Turing-complete. Below is a diagram showing one of the banks of four. This block appears twice in the RP2040 package.

Each processor has an X and Y 32-bit general purpose register, input and output shift registers for transferring data to and from the FIFOs, a clock divider register to help control timing, a program counter and then the register to hold the executing instruction as shown in the following diagram.

Each instruction can contain a few bits that specify a delay value, so for many protocols you can control the timing just by adding a timing delay to each instruction. Combine this with the clock divider register to slow down processing and you have a lot of control of timing without using extra instructions.
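To get a feel for how far the per-instruction delay and the clock divider can stretch timing, here is a small host-side C sketch. It relies on figures that are not stated above and should be treated as assumptions: a 5-bit delay field (so up to 31 extra cycles when none of those bits are given to side-set), a largest integer clock divisor of 65536, and a 125 MHz system clock. The function name is made up for illustration and is not part of the Pico SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed limits, based on my reading of the RP2040 datasheet. */
#define PIO_MAX_DELAY_CYCLES 31u      /* 5-bit delay field, no side-set bits */
#define PIO_MAX_CLKDIV       65536u   /* largest integer clock divisor       */

/* Longest time, in nanoseconds, that one PIO instruction can occupy
 * (hypothetical helper): one cycle to execute plus the delay cycles,
 * each stretched by the clock divisor. */
uint64_t pio_max_instr_time_ns(uint32_t sys_clk_hz, uint32_t clkdiv, uint32_t delay)
{
    assert(sys_clk_hz > 0);
    assert(clkdiv >= 1 && clkdiv <= PIO_MAX_CLKDIV);
    assert(delay <= PIO_MAX_DELAY_CYCLES);

    uint64_t sys_cycles = (uint64_t)(delay + 1u) * clkdiv;
    return sys_cycles * 1000000000ull / sys_clk_hz;
}
```

Under those assumptions, pio_max_instr_time_ns(125000000, 65536, 31) comes to about 16.8 ms for a single instruction, which gives a sense of the upper limit of this technique.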

Sample LED Blinking Program

You write the Assembly Language PIO part of the program into a .pio file which is then compiled by the PIO Assembler into a .h file to include into your program. You can also include C helper functions here and the Pico SDK recommends including an initialization function. The various RP2040 SDK functions to support this are pretty standard and you tend to copy/paste these from the SDK samples.

We are blinking the LEDs using a 200ms delay time, which by computer speeds is very slow, but for humans is quite quick. This means we can’t use the clock divider functionality and instruction delays as they don’t go this slow. Instead we have to rely on an old-fashioned delay loop. We calculate the delay value in the main function from the frequency of the processor and pass it to the coprocessor as a loop count. The pull instruction pulls the delay from the TX FIFO into the output shift register, then out transfers it to the y register. We move y to x, turn on the pin and then do the delay loop, decrementing x until it’s zero. Then we turn the pin off and do the delay loop twice, because we need to wait for two other LEDs to flash before it’s our turn again.

.program blink
    pull block
    out y, 32
.wrap_target
    mov x, y
    set pins, 1   ; Turn LED on
lp1:
    jmp x-- lp1   ; Delay for (x + 1) cycles, x is a 32 bit number
    mov x, y
    set pins, 0   ; Turn LED off
lp2:
    jmp x-- lp2   ; Delay for the same number of cycles again
    mov x, y
lp3:   ; Do it twice since need to wait for 2 other leds to blink
    jmp x-- lp3   ; Delay for the same number of cycles again
.wrap             ; Blink forever!

% c-sdk {
// this is a raw helper function for use by the user which sets up the GPIO output, and configures the SM to output on a particular pin

void blink_program_init(PIO pio, uint sm, uint offset, uint pin) {
   pio_gpio_init(pio, pin);
   pio_sm_set_consecutive_pindirs(pio, sm, pin, 1, true);
   pio_sm_config c = blink_program_get_default_config(offset);
   sm_config_set_set_pins(&c, pin, 1);
   pio_sm_init(pio, sm, offset, &c);
}
%}

Now the main C program. In this one we configure the pins to use. Note that we will use a coprocessor for each pin, so three coprocessors but each one executing the same program. We start a pin flashing, sleep 200ms and then start the next one. This way we achieve the same effect as we did in our previous programs.

After we get the LED flashing running on the coprocessors, we have an infinite loop that just prints a counter out to the serial port. This is to demonstrate that the CPU can go on and do anything it wants and the LEDs will keep flashing independently without any of the CPU’s attention.

#include <stdio.h>

#include "pico/stdlib.h"
#include "hardware/pio.h"
#include "hardware/clocks.h"
#include "blink.pio.h"

const uint LED_PIN1 = 18;
const uint LED_PIN2 = 19;
const uint LED_PIN3 = 20;
#define SLEEP_TIME 200

void blink_pin_forever(PIO pio, uint sm, uint offset, uint pin, uint freq);

int main() {
    int i = 0;

    setup_default_uart();

    PIO pio = pio0;
    uint offset = pio_add_program(pio, &blink_program);
    printf("Loaded program at %d\n", offset);
    blink_pin_forever(pio, 0, offset, LED_PIN1, 5);
    sleep_ms(SLEEP_TIME);
    blink_pin_forever(pio, 1, offset, LED_PIN2, 5);
    sleep_ms(SLEEP_TIME);
    blink_pin_forever(pio, 2, offset, LED_PIN3, 5);

    while(1)
    {
        i++;
        printf("Busy counting away i = %d\n", i);
    }
}

void blink_pin_forever(PIO pio, uint sm, uint offset, uint pin, uint freq) {
    blink_program_init(pio, sm, offset, pin);
    pio_sm_set_enabled(pio, sm, true);
    printf("Blinking pin %d at %d Hz\n", pin, freq);
    pio->txf[sm] = clock_get_hz(clk_sys) / freq;
}
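
The value written to the FIFO in blink_pin_forever is easiest to check with a little arithmetic. The sketch below runs on a host machine rather than the Pico and assumes a 125 MHz system clock and a state machine running at one delay-loop iteration per clock cycle; both function names are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Loop count pushed to the state machine: one count per system clock
 * cycle, so clk_hz / freq cycles last 1/freq seconds. */
uint32_t blink_delay_count(uint32_t clk_hz, uint32_t freq)
{
    assert(freq > 0);
    return clk_hz / freq;
}

/* Time one "jmp x--" delay loop takes, in milliseconds: the loop runs
 * count + 1 times at one cycle per iteration. */
uint32_t blink_delay_ms(uint32_t clk_hz, uint32_t count)
{
    assert(clk_hz > 0);
    return (uint32_t)(((uint64_t)count + 1u) * 1000u / clk_hz);
}
```

With a 125 MHz clock and freq set to 5, the count is 25,000,000 and each pass through a delay loop lasts about 200ms, which matches SLEEP_TIME in the main program.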

Summary

This was a quick introduction to the RP2040’s PIO coprocessors. The goal of any microcontroller is to control other interfaced hardware, whether measurement sensors or communications devices (like Wi-Fi). The PIO coprocessors give the RP2040 programmer a powerful weapon to develop sophisticated integration projects without requiring a lot of specialized hardware to make things easier. It might be nice to have a larger instruction memory, but then in a $4 USD device, you can’t really complain.

For people playing with the Raspberry Pi Pico or another RP2040 based board, you can program in 32-bit ARM Assembly Language and might want to consider my book “Raspberry Pi Assembly Language Programming”.

Written by smist08

April 30, 2021 at 10:02 am

Bit-Banging the Raspberry Pi Pico’s GPIO Registers

Introduction

Last week, I introduced my first Assembly Language program for the Raspberry Pi Pico. This was a version of my flashing LED program that I implemented in a number of programming languages for the regular Raspberry Pi. In the original article, I required three routines written in C to make things work. Yesterday, I showed how to remove one of these C routines, namely to have the main routine written in Assembly Language. Today, I’ll show how to remove the two remaining C routines, which were wrappers for two SDK routines which are implemented as inline C functions and as a consequence only usable from C code.

In this article, we’ll look at the structure for the GPIO registers on the RP2040 and how to access these. The procedure we are using is called bit-banging because we are using one of the two M0+ ARM CPU cores to loop banging the bits in the GPIO registers to turn them on and off. This isn’t the recommended way to do this on the RP2040. The RP2040 implements eight programmable I/O (PIO) co-processors that you can program to offload this sort of thing from the CPU. We’ll look at how to do that in a future article, but as a first step we are going to explore bit-banging mostly to understand the RP2040 hardware better.

The RP2040 GPIO Hardware Registers

The Pico exposes 26 programmable GPIO pins (the RP2040 chip itself has 30). There are 40 pins on the board, but the others are ground, power and a couple of specialized pins (see the diagram below).

This means that we can assign each one to a bit in a 32-bit hardware register which is mapped to 32-bits of memory in the RP2040’s address space. The GPIO functions are controlled by writing a 1 bit to the correct position in the GPIO register. There is one register to turn on a GPIO pin and a different register to turn it off, this means you don’t need to read the register, change one bit and then write it back. It’s quite easy to program these since you just place one in a CPU register, shift it over by the pin number and then write it to the correct memory location. These registers start at memory location 0xd0000000 and are defined in sio.h. Note there are two sio.h files, one in hardware_regs which contains the offsets and is better for Assembly Language usage and then one in hardware_structs which contains a C structure to map over the registers. Following are the GPIO registers, note that there are a few other non-GPIO related registers at this location and a few unused gaps in case you are wondering why the addresses aren’t contiguous.

Register          Address
gpio_in           0xd0000004
gpio_hi_in        0xd0000008
gpio_out          0xd0000010
gpio_set          0xd0000014
gpio_clr          0xd0000018
gpio_togl         0xd000001c
gpio_oe           0xd0000020
gpio_oe_set       0xd0000024
gpio_oe_clr       0xd0000028
gpio_oe_togl      0xd000002c
gpio_hi_out       0xd0000030
gpio_hi_set       0xd0000034
gpio_hi_clr       0xd0000038
gpio_hi_togl      0xd000003c
gpio_hi_oe        0xd0000040
gpio_hi_oe_set    0xd0000044
gpio_hi_oe_clr    0xd0000048
gpio_hi_oe_togl   0xd000004c

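Notice that there are a number of _hi_ registers. As far as I can tell from the RP2040 datasheet, these control the QSPI pins used for the flash interface rather than the regular GPIO pins, though the layout would also leave room for a future version with more than 32 GPIO pins.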

In the SDK and my code below we just write one bit at a time, I don’t know if the RP2040’s circuitry can handle writing more bits at once, for instance can we set all three pins to output in one write instruction? Remember hardware registers tend to have minimal functionality to simplify the electronics circuitry behind them so often you can’t get too complicated in what you expect of them.
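
To make the separate set and clear registers more concrete, here is a host-side C model of the idea. It is only a sketch: it does not touch real hardware, the type and function names are hypothetical, and it assumes the datasheet’s description that a write to the set register ORs the written bits into the output value while a write to the clear register removes them.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model of the GPIO output latch (hypothetical, not SDK code). */
typedef struct {
    uint32_t out;   /* models the value seen in gpio_out */
} gpio_model_t;

/* Mask with a single 1 bit at the pin position, as built with MOVS/LSL. */
uint32_t gpio_pin_mask(unsigned pin)
{
    assert(pin < 32);
    return (uint32_t)1 << pin;
}

/* Models a write to gpio_set: bits that are 1 turn pins on, others are untouched. */
void model_write_set(gpio_model_t *g, uint32_t mask) { g->out |= mask; }

/* Models a write to gpio_clr: bits that are 1 turn pins off, others are untouched. */
void model_write_clr(gpio_model_t *g, uint32_t mask) { g->out &= ~mask; }
```

The point of the model is that neither write needs to know the current state of the other pins, which is why no read, modify, write sequence is required.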

Bit-Banging the Registers in Assembly

Below is the updated program that doesn’t require the C file. In our routines to control the GPIO pins, we pass the pin number as parameter 1, which means it is in R0. We place 1 in R3 and then shift it left by the value in R0 (the pin number). This gives the value we need to write. We then load the address of the register we need, which we specified in the .data section, and write the value. Note that we need two LDR instructions: the first loads the address of the memory location that holds the register address, and the second loads the register address itself.

@
@ Assembler program to flash three LEDs connected to the
@ Raspberry Pi GPIO port using the Pico SDK.
@
@

.EQU LED_PIN1, 18
.EQU LED_PIN2, 19
.EQU LED_PIN3, 20
.EQU sleep_time, 200

.thumb_func
.global main             @ Provide program starting address to linker

.align  4 @ necessary alignment

main:

@ Init each of the three pins and set them to output

MOV R0, #LED_PIN1
BL gpio_init
MOV R0, #LED_PIN1
BL gpiosetout
MOV R0, #LED_PIN2
BL gpio_init
MOV R0, #LED_PIN2
BL gpiosetout
MOV R0, #LED_PIN3
BL gpio_init
MOV R0, #LED_PIN3
BL gpiosetout

loop:

@ Turn each pin on, sleep and then turn the pin off

MOV R0, #LED_PIN1
BL gpio_on
LDR R0, =sleep_time
BL sleep_ms
MOV R0, #LED_PIN1
BL gpio_off
MOV R0, #LED_PIN2
BL gpio_on
LDR R0, =sleep_time
BL sleep_ms
MOV R0, #LED_PIN2
BL gpio_off
MOV R0, #LED_PIN3
BL gpio_on
LDR R0, =sleep_time
BL sleep_ms
MOV R0, #LED_PIN3
BL gpio_off

B       loop @ loop forever

gpiosetout:
@ write a 1 bit to the pin position in the output enable set register
movs r3, #1
lsl r3, r0 @ shift over to pin position
ldr r2, =gpiosetdiroutreg @ address we want
ldr r2, [r2]
str r3, [r2]
bx lr

gpio_on:
movs r3, #1
lsl r3, r0 @ shift over to pin position
ldr r2, =gpiosetonreg @ address we want
ldr r2, [r2]
str r3, [r2]
bx lr

gpio_off:
movs r3, #1
lsl r3, r0 @ shift over to pin position
ldr r2, =gpiosetoffreg @ address we want
ldr r2, [r2]
str r3, [r2]
bx lr

.data
      .align  4 @ necessary alignment
gpiosetdiroutreg: .word   0xd0000024 @ mem address of gpio registers
gpiosetonreg: .word   0xd0000014 @ mem address of gpio registers
gpiosetoffreg: .word   0xd0000018 @ mem address of gpio registers

Having separate functions for gpio_on and gpio_off simplifies our code since we don’t need any conditional logic to load the correct register address.

We loaded the actual address from a shared location. We could have loaded the base address of 0xd0000000 and then stored things via an offset, but I did this to be a little clearer. If you look at the disassembly of the SDK routine, it does something rather clever to get the base address. It does:

movs r2, #208 @ 0xd0
lsl r2, r2, #24 @ becomes 0xd0000000

And then uses something like:

str r3, [r2, #40] @ 0x28

To store the value using an index which is the offset to the correct register. I thought this was rather clever on the C compiler’s part and represents the optimizations that the ARM engineers have been adding to the GCC generation of ARM code. This technique takes the same time to execute, but doesn’t require saving any values in memory, saving a few bytes which may be crucial in a larger program.
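
The same arithmetic can be checked in a few lines of host-side C. This is just an illustration of the two steps the compiler emits; the function names are made up, and the 8-bit parameter reflects my understanding that the Thumb MOVS instruction only takes an 8-bit immediate, which is why the shift is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors "movs r2, #208" followed by "lsl r2, r2, #24". */
uint32_t sio_base_from_imm8(uint8_t imm8)
{
    return (uint32_t)imm8 << 24;
}

/* Mirrors the "[r2, #offset]" addressing used by the STR instruction. */
uint32_t sio_reg_addr(uint32_t base, uint32_t offset)
{
    assert(offset % 4 == 0);   /* word registers sit on 4-byte offsets */
    return base + offset;
}
```

Here sio_base_from_imm8(208) gives 0xd0000000, and adding the offset 40 (0x28) gives 0xd0000028, so the full register address is formed without reading anything from memory.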

Summary

Writing to the hardware registers directly on the Raspberry Pi Pico is a bit simpler than the Broadcom implementation in the full Raspberry Pi. With these routines we wrote our entire program in Assembly Language. There is still C code in the SDK which will be linked into our program and we are still calling both gpio_init and sleep_ms in the SDK. We could look at the source code in the SDK and reimplement these in Assembly Language, but I don’t think there is any need. Between the RP2040 documentation and the SDK’s source code it is possible to figure out a lot about how the Raspberry Pi Pico works.

For people playing with the Raspberry Pi Pico or another RP2040 based board, you can program in 32-bit ARM Assembly Language and might want to consider my book “Raspberry Pi Assembly Language Programming”.

Written by smist08

April 24, 2021 at 11:50 am

Calling Main in Assembly Language on the RP2040

Introduction

In last week’s article, I presented my first Assembly Language program on the Raspberry Pi Pico. The program worked, but it included some C code that I wasn’t happy with. In this article, I’ll explain why I needed to have the main entry point in C, what I missed and how to correct this problem.

The entry point is a function main() with no parameters or return code called by the RP2040 initialization code after it initializes the RP2040 hardware. In C this worked no problem, but in Assembly Language it resulted in a hardware fault on executing the first instruction in my main() routine. This was a bit of a head scratcher and it took a couple of days before I realized what the problem was. My first thought was that it was alignment, but no it wasn’t that. Perhaps I needed to duplicate the first few instructions in the Assembly Language generated by the C compiler, but no that still caused a hardware fault. Rather mystifying and annoying.

Use the Source

The program you run on the Pico contains pretty much everything in a single executable, that initializes the CPU, peripheral hardware and then runs in an endless loop forever. There is no operating system, just your program. The Raspberry Pi Pico contains a bit of firmware which is activated when you power on with the bootsel button pressed, this allows the Pico to connect as a shareable flash drive to a USB host, and will allow you to copy files into the writable part of the Pico’s flash memory. After that it reboots to let the program run.

One of the good things about the Pico is that the SDK contains the source code for this whole thing, and when you build your program, it actually compiles all this source code alongside your code (there are no libraries in this environment). This means you can build a debug build where everything is debuggable including both your code and the SDK code. This means you can set a breakpoint before your code and single step through the SDK into your code. You can’t start debugging at the very first instruction, you need to let the first bit of the SDK initialize the processor before starting, but you can set a breakpoint fairly early. I found a good place was the platform_entry routine, which is an Assembly Language function in crt0.S. This is the function that initializes the SDK environment and then calls your main() starting point. The code for this routine is fairly innocuous:

platform_entry: // symbol for stack traces
    // Use 32-bit jumps, in case these symbols are moved out of branch range
    // (e.g. if main is in SRAM and crt0 in flash)
    ldr r1, =runtime_init
    blx r1
    ldr r1, =main
    blx r1
    ldr r1, =exit
    blx r1

Nothing special, it just loads the address of our main routine and calls it. Stepping through the C code, it works, stepping through the Assembly Language code, hardware fault.

At some point I thought to look at the documentation for the BLX instruction, why were they calling this rather than BL? This turned out to be the root of the problem.

A full ARM A-series CPU, like those in a full Raspberry Pi or in your cell phone, can execute a rich set of instructions, the regular ARM 32-bit instruction set, but a microcontroller M-series CPU like the one in the Pico only executes the so-called “thumb” instructions. On the A-series CPU you switch back and forth between regular and thumb modes using the BLX instruction. Thumb instructions are 16 bits in length and regular instructions are 32 bits; both have to be aligned, one on 2-byte boundaries and the other on 4-byte boundaries. Both of these are even addresses, so the true address of any instruction is even, which means the low order bit isn’t really used (it has to be zero). The BLX instruction uses this low order bit to specify whether to switch to thumb mode or not. If it is one, then thumb mode; if it is zero, then regular instruction mode. Let’s look at the disassembly for this routine:

1000021a <platform_entry>:
1000021a: 4919      ldr r1, [pc, #100] ; (10000280 <__get_current_exception+0x1a>)
1000021c: 4788      blx r1
1000021e: 4919      ldr r1, [pc, #100] ; (10000284 <__get_current_exception+0x1e>)
10000220: 4788      blx r1
10000222: 4919      ldr r1, [pc, #100] ; (10000288 <__get_current_exception+0x22>)
10000224: 4788      blx r1

10000280: 100012bd .word 0x100012bd   ; runtime_init
10000284: 10000360 .word 0x10000360   ; main
10000288: 100013a9 .word 0x100013a9   ; exit

Notice the address for my main routine is even whereas the other two routines are odd. If I compile with the C routine then main has an odd address as well. I didn’t think of this because the RP2040’s M-series CPU only executes thumb instructions, so why have any functionality to switch between modes? I don’t know but if you do tell it to switch to regular instructions then you get a hardware fault.
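
The low-bit convention is easy to check in a small host-side C sketch using the addresses from the disassembly above. The function names are hypothetical; the last helper shows what I understand marking a symbol as a thumb function to do to its address, namely setting bit 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when a BLX target asks for thumb mode (bit 0 is one). */
bool blx_target_is_thumb(uint32_t target)
{
    return (target & 1u) != 0;
}

/* Address the instruction is actually fetched from (bit 0 cleared). */
uint32_t blx_fetch_address(uint32_t target)
{
    return target & ~(uint32_t)1;
}

/* Value a thumb function's symbol should carry: code address with bit 0 set. */
uint32_t thumb_function_pointer(uint32_t code_addr)
{
    assert((code_addr & 1u) == 0);   /* instructions sit on even addresses */
    return code_addr | 1u;
}
```

Run against the table above, 0x100012bd and 0x100013a9 are thumb targets while 0x10000360 is not, and thumb_function_pointer(0x10000360) gives 0x10000361, the value BLX needs to stay in thumb mode.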

The other question is why the author of crt0.S in the SDK calls routines with BLX rather than BL. After all, the Pico doesn’t support regular instructions, so you are always in thumb mode. If platform_entry used BL instead, then I wouldn’t have had any problem. The comment in the source gives one reason: loading the address into a register lets the call reach symbols that are outside BL’s branch range. I also wonder if this indicates they developed the SDK on an A-series CPU, perhaps before they obtained real RP2040s. Or perhaps there is a way to emulate the RP2040 on a full A-series CPU and this is how the developers at the Raspberry Pi Foundation operate.

To correct the problem, we just need to indicate our main() routine is a thumb routine. We do this by placing a .thumb_func directive in front of the .global directive.

.thumb_func
.global main             @ Provide program starting address to linker

.align  4 @ necessary alignment

main:

The key point is that this is in front of the .global, since it is really just the linker that needs to process this to set up the correct address when it links in crt0.

Summary

This eliminates the need for the C main() function we had last week. Next time we’ll eliminate the two other C routines we had and explore how the Raspberry Pi Pico’s GPIO control registers work. As with most problems, working through the solution teaches us a bit more about how the RP2040 works and reminds us that there are consequences of using a subset of the full ARM instruction set.

For people using this SDK, you can program in 32-bit ARM Assembly Language and might want to consider my book “Raspberry Pi Assembly Language Programming”.

Written by smist08

April 23, 2021 at 9:11 am

Assembly Language on the Raspberry Pi Pico

Introduction

The Raspberry Pi Pico is the Raspberry Pi Foundation’s first entry into the domain of Arduino style microcontrollers. The board contains Raspberry’s own designed SoC (System on a Chip) containing a dual core ARM Cortex-M0+ CPU along with memory and a collection of I/O circuitry. There are no keyboard, mouse or monitor ports on the board, only a micro-USB to connect to a host computer, a number of GPIO pins and three debug pins. This SoC is called the RP2040 and is licensed to other companies to use in their own boards. Raspberry supports programming this board in either C/C++ or MicroPython. The C/C++ SDK also supports Assembly Language programming to some degree and this article is a look at my first attempt to write an Assembly Language program for this board. I ran into a few problems and still have a few things to figure out, and I’ll explain those in the article. We’ll write an Assembly Language version of the program we wrote in C last time to flash three connected LEDs.

ARM Cortex-M0+ Assembly Language

I blogged about 32-bit ARM Assembly Language here, and then presented the flashing LED Assembly Language program for the Raspberry Pi here. Further, I wrote a whole book on 32-bit ARM Assembly Language Programming: “Raspberry Pi Assembly Language Programming”. These are all oriented to ARM’s full A-series processors which include floating point units (FPU), vector processors, virtual memory support and much more. The ARM M-series processors are a subset of these, designed to be low cost, use little memory and be very power efficient. The ARM M-series processors only contain what are called the ARM “thumb” instructions. Normally, on an A-series processor, each instruction takes 32-bits, but for some applications this uses too much memory, so ARM came up with “thumb” instructions where if the processor is operating in “thumb” mode then each instruction is only 16-bits in length, thus only using half the memory. The original set of “thumb” instructions was too limited, so ARM added a way to run some 32-bit instructions in with the 16-bit instructions and that makes the modern “thumb” instruction set used by the M-series processors. One consequence of using the “thumb” instructions is that registers R8 to R12 are only reachable from a handful of instructions (such as MOV, ADD and CMP), so most code works with R0 to R7. The registers you do have are all 32-bit and the Raspberry RP2040 has special multiplication and division circuitry to perform these operations quickly.

Code

This program uses the C/C++ SDK to access the GPIO pins, which means this Assembly Language program is quite similar to last week’s C program. To call a routine in Assembly, you put the first parameter in R0, the second in R1 and then call Branch with Link (BL). BL places the address of the next instruction into the LR register, so the called routine returns by branching to the address contained in the LR register. When calling functions there is a convention on who has to save which register on the stack, but we don’t rely on any register keeping its value across the function calls, so we don’t need to do this. This program is set up as an infinite loop, since there is nothing for the main routine to return to and if it does return the processor halts.

Assembly Language code:

@
@ Assembler program to flash three LEDs connected to the
@ Raspberry Pi Pico GPIO port using the Pico SDK.
@

.EQU LED_PIN1, 18
.EQU LED_PIN2, 19
.EQU LED_PIN3, 20
.EQU GPIO_OUT, 1
.EQU sleep_time, 200

.global main_asm             @ Provide program starting address to linker
main_asm:

@ Initialize each pin and set its direction to output.
        MOV R0, #LED_PIN1
        BL gpio_init
        MOV R0, #LED_PIN1
        MOV R1, #GPIO_OUT
        BL link_gpio_set_dir
        MOV R0, #LED_PIN2
        BL gpio_init
        MOV R0, #LED_PIN2
        MOV R1, #GPIO_OUT
        BL link_gpio_set_dir
        MOV R0, #LED_PIN3
        BL gpio_init
        MOV R0, #LED_PIN3
        MOV R1, #GPIO_OUT
        BL link_gpio_set_dir

@ Turn each LED on, sleep, then turn it off again, forever.
loop:   MOV R0, #LED_PIN1
        MOV R1, #1
        BL link_gpio_put        @ LED 1 on
        LDR R0, =sleep_time
        BL sleep_ms
        MOV R0, #LED_PIN1
        MOV R1, #0
        BL link_gpio_put        @ LED 1 off
        MOV R0, #LED_PIN2
        MOV R1, #1
        BL link_gpio_put        @ LED 2 on
        LDR R0, =sleep_time
        BL sleep_ms
        MOV R0, #LED_PIN2
        MOV R1, #0
        BL link_gpio_put        @ LED 2 off
        MOV R0, #LED_PIN3
        MOV R1, #1
        BL link_gpio_put        @ LED 3 on
        LDR R0, =sleep_time
        BL sleep_ms
        MOV R0, #LED_PIN3
        MOV R1, #0
        BL link_gpio_put        @ LED 3 off
        B       loop

.data

        .align  4 @ necessary alignment
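
For comparison, one pass of the loop in the listing can be sketched in C. This is only a host-side illustration: stub_gpio_put and stub_sleep_ms are hypothetical stand-ins that record what they are asked to do, so the sketch runs on a PC without the Pico SDK. Each C call corresponds to loading R0 (and R1) and then doing a BL in the Assembly Language version.

```c
#include <stddef.h>

enum { LED_PIN1 = 18, LED_PIN2 = 19, LED_PIN3 = 20, SLEEP_TIME_MS = 200 };

/* Hypothetical stand-ins for the SDK calls: they only record what was
   requested, so the sketch can run without any Pico hardware. */
typedef struct { int pin; int value; } put_event;

static put_event put_log[16];
static size_t put_count = 0;
static unsigned slept_ms = 0;

static void stub_gpio_put(int pin, int value)
{
    if (put_count < sizeof put_log / sizeof put_log[0]) {
        put_log[put_count].pin = pin;
        put_log[put_count].value = value;
        put_count++;
    }
}

static void stub_sleep_ms(unsigned ms)
{
    slept_ms += ms;
}

/* One pass of the loop: each LED is switched on, held for the sleep
   time and then switched off again. */
void flash_once(void)
{
    static const int pins[] = { LED_PIN1, LED_PIN2, LED_PIN3 };

    for (size_t i = 0; i < sizeof pins / sizeof pins[0]; i++) {
        stub_gpio_put(pins[i], 1);     /* MOV R0, #pin / MOV R1, #1 / BL */
        stub_sleep_ms(SLEEP_TIME_MS);  /* LDR R0, =sleep_time / BL sleep_ms */
        stub_gpio_put(pins[i], 0);     /* MOV R0, #pin / MOV R1, #0 / BL */
    }
}
```

Calling flash_once() records six put calls and 600 ms of sleeping, which matches the six BL link_gpio_put and three BL sleep_ms instructions in the loop.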

I didn’t intend to include any C code, but I ran into a couple of problems that require it. One is that a large number of SDK functions are inline C functions which means they can’t be called from outside of C. In our case two functions gpio_set_dir and gpio_put are inline and required wrapping. The other problem is that if the main program is Assembly Language then the code to initialize the board doesn’t seem to be called. I think this is a matter of setting the correct CMake options, but I haven’t had a chance to figure it out yet. For now we have main in the C code and then call the Assembly Language main routine.

C code:

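#include "hardware/gpio.h"

/* Implemented in the Assembly Language source file. */
void main_asm(void);

/* gpio_set_dir and gpio_put are inline in the SDK, so wrap them in
   real functions that the Assembly Language code can call with BL. */
void link_gpio_set_dir(int pin, int dir)
{
    gpio_set_dir(pin, dir);
}

void link_gpio_put(int pin, int value)
{
    gpio_put(pin, value);
}

int main(void)
{
    main_asm();    /* loops forever, so this never returns */
    return 0;
}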

The Raspberry Pi Pico SDK uses the CMake system to manage builds. The SDK provides a large set of build rules. You run CMake and then it creates a makefile that compiles your program.

CMake file:

cmake_minimum_required(VERSION 3.13)

include(pico_sdk_import.cmake)

project(test_project C CXX ASM)

set(CMAKE_C_STANDARD 11)
set(CMAKE_CXX_STANDARD 17)

pico_sdk_init()

include_directories(${CMAKE_SOURCE_DIR})

add_executable(flashledsasm
  mainmem.S
  sdklink.c
)

pico_enable_stdio_uart(flashledsasm 1)
pico_add_extra_outputs(flashledsasm)
target_link_libraries(flashledsasm pico_stdlib)

Still To-Do

The program works, but there are a few things I’m not happy about. The Raspberry Pi Pico SDK is pretty new, so there aren’t a lot of answers on StackOverflow yet. The good thing is that it is all open source, so it is just a matter of time to figure out the code. Here is what I’ll be working on:

  1. How to have main be in Assembly Language and have the board properly initialized. Match the C startup sequence.
  2. Figure out the details of the GPIO registers and have Assembly Language versions of the inline C code that accesses these. They are similar to those on the full Raspberry Pi, but different.
  3. How to get constants from the C include file, on first try this didn’t work and gave syntax errors, but the SDK says they should be usable from Assembly Language. They might need a couple of fixes.
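
As a starting point for item 2, here is a host-side C model of what I understand the inline functions to boil down to: building a one-bit mask from the pin number and using it to set or clear that pin’s bit. The struct fake_sio and the model_ function names are hypothetical, and the two fields are a simplified stand-in for the real memory-mapped registers, whose exact names and addresses need to be checked against the RP2040 datasheet.

```c
#include <stdint.h>

/* Simplified, hypothetical model of the GPIO state. On the real chip
   these are memory-mapped registers, not fields of a C struct. */
struct fake_sio {
    uint32_t out;   /* one bit per pin: current output level */
    uint32_t oe;    /* one bit per pin: 1 means the pin is an output */
};

/* Each pin is controlled by a single bit at the pin's index. */
static uint32_t pin_mask(unsigned pin)
{
    return (uint32_t)1 << pin;
}

void model_gpio_set_dir(struct fake_sio *sio, unsigned pin, int is_output)
{
    if (is_output)
        sio->oe |= pin_mask(pin);
    else
        sio->oe &= ~pin_mask(pin);
}

void model_gpio_put(struct fake_sio *sio, unsigned pin, int value)
{
    if (value)
        sio->out |= pin_mask(pin);
    else
        sio->out &= ~pin_mask(pin);
}
```

In Assembly Language the mask is just a 1 shifted left by the pin number, so for pin 18 the value written would be 0x40000.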

Summary

I planned to write a 100% Assembly Language program, but didn’t quite make it. At least the program works, showing you can include Assembly Language in your RP2040 projects. The support to build using the GCC macro assembler is all there and besides some interactions with the SDK all seems to work well. Of course the Raspberry Pi Pico SDK is pretty new so there will be a lot of updates and there are still a number of undocumented holes to investigate.

Written by smist08

April 16, 2021 at 9:43 am

More Linux for Apple Silicon

Introduction

Last week, I covered Asahi Linux and their drive to port Linux to Apple’s new ARM based Macintoshes. This week, a new contender appeared: Corellium, a virtualization and security company, has successfully gotten Ubuntu Linux running on Apple M1 ARM based systems. Corellium created a system that lets security researchers run iOS in virtual machines, allowing more rapid testing of apps for security problems. No one had heard of Corellium until Apple sued them for copyright infringement for doing this. The lawsuit has mostly been thrown out and Corellium was able to use the knowledge they gained virtualizing iOS to produce Linux device drivers for the new Apple M1 chips.

Corellium Linux

Corellium starts with the Raspberry Pi version of Ubuntu Linux. This is a complete 64-bit version of Linux that runs on the Raspberry Pi’s ARM processor and has all the development tools and applications bundled. They then add their Apple M1 Linux drivers to the kernel, rebuild it and replace the Raspberry Pi kernel. Voilà, Ubuntu Linux on the new Apple Silicon Macs. All the source code is available on GitHub and the install instructions are available here.

To virtualize iOS, Corellium had to figure out all the hardware register accesses made by iOS, intercept them and translate them into matching calls in the operating system hosting the virtualized iOS. Accomplishing this was an impressive feat. We are lucky that the M1 SoC used in the new Macs is really just the next generation of the processor chips Apple has been using for all their iPhones and iPads (even Apple TVs and Apple Watches). As a consequence, the directly integrated devices, like the USB support, are all the same. Corellium could then use all this hard-won knowledge to modify various Linux device drivers to work properly with Apple devices. It is still impressive that they were able to accomplish this in such a short time.

This version of Ubuntu Linux has a full GUI, but the graphics aren’t accelerated and the M1’s fancy GPU cores aren’t used. Basically they figured out how to get an area of memory that represents the screen and then use Linux’s built-in ability to deal with this simple sort of graphics (almost like going back to the days of VGA).

Corellium recommends creating your Linux image on a USB storage device and then gives instructions on how to get your Mac to boot from this. Then you are running Linux. We’re probably still a distance away from dual booting Linux and MacOS and you probably don’t want to replace MacOS entirely on your new Mac.

This is a great starting point for getting Linux fully supported on the new Macs, and progress is moving really fast. Asahi Linux is making good progress in understanding and using the M1’s GPU. With such a full featured working system, progress is accelerating.

What Next?

When new hardware appears, Linux support starts in local specialty source code repositories, which is the case now with Corellium and Asahi. The source code is all new, rough and needs cleaning up. Once this is done it is submitted to upstream source code repositories where it is reviewed and eventually accepted. Eventually, this will all make it into the main Linux kernel source code repository. When this happens, all the myriad Linux distributions will get it for free as they incorporate a newer kernel into their downstream repos. This may sound like a long process, but typically it happens quite quickly. Then we can look forward to Apple Silicon versions of all our favorite Linux distributions.

Summary

Apple Silicon Macs have only been in people’s hands for a very short time. It’s amazing that we already have a working version of Ubuntu Linux for these devices. We have the Raspberry Pi to thank for taking ARM based Linux mainstream so quickly and groups like Corellium and Asahi to thank for figuring out the hardware nitty-gritty details of these new Macs. All this just makes the new products from Apple more exciting and a nice alternative to the Intel/AMD world.

All of this requires a good knowledge of ARM 64-bit Assembly Language, so consider my book as a great way to learn all the details on how it works. I even have a chapter on reverse engineering which is hopefully helpful.

Written by smist08

January 22, 2021 at 12:56 pm

Porting Linux to Apple Silicon

Introduction

When Apple announced they were switching from Intel to ARM CPUs, there was a worry that Apple would lock out installing non-Apple operating systems such as Linux. There is a new security processor that people worried would only allow MacOS to boot on these new chips. Fortunately, this proved to be false and the new ARM based Macintoshes fully support booting homebrew operating systems either from the SSD or from USB storage. However, the new Apple M1 chips present a number of problems that we’ll discuss in this article as well as why so many people are so interested in doing this.

Linus Torvalds, the father of Linux, recently said that he wished the new MacBooks ran Linux and that he would consider this the ultimate laptop and really want one. Linus said he saw porting Linux as possible, but personally he didn’t have the time to commit.

Last week’s article on an Assembly Language “Hello World” program hit number 1 on Hacker News and based on the comments, the interest was largely generated by the challenge of porting Linux to these new Apple systems. As we’ll see, doing this is going to require both reverse engineering and then writing ARM 64-bit Assembly Language code.

Asahi Linux

Last week we saw the announcement of the Asahi Linux project. Asahi means “rising sun” in Japanese and “asahi ringo” is Japanese for the McIntosh apple. The goal of this project is to develop a version of Linux that fully supports all the new hardware features of the Apple M1 chip including the GPU and USB-C ports. This won’t be easy because even though Apple doesn’t block you from doing this, they don’t help and they don’t provide any documentation on how the hardware works. People already have character based Linux booting and running on the Apple M1 Macs, and you can run the regular ARM version of Linux under virtualization on these new Macs, but the real goal is to understand the new hardware and have a version of Linux talking directly to the hardware that uses all the capabilities, like the GPU, to run as well as or better than MacOS.

GPUs and Linux

GPUs have always been a sore point with the Linux community. None of the GPU vendors properly document the hardware APIs to their products and they believe the best way to support various operating systems is to provide precompiled binaries with no source code. Obviously this roils the open source community. GPUs are complicated and change a lot with each hardware generation. Newer Intel and AMD CPUs all have integrated graphics with good open source drivers that at least will work, but with the disadvantage of not using all the fancy hardware you paid for in your expensive gaming PC. Even the Raspberry Pi versions of Linux use a binary Broadcom driver for the integrated GPU, rather than something open source.

Over the years, intrepid Linux developers have reverse engineered how all these GPUs work, so there are open source drivers for most nVidia and AMD GPUs. In fact, since neither nVidia nor AMD supports their hardware for all that long, if you have a more than 10 year old graphics card and run Linux, then you are pretty much forced to use the open source driver or switch to Intel integrated graphics (if available) or just stop upgrading the operating system and drivers.

The good news is that the open source community has a lot of experience figuring out how GPUs work, including those from nVidia, AMD, ARM and Broadcom. The bad news is that it takes time. First you work out a disassembler for the GPU instructions, going from the binary form and working out what each bit means to produce a mnemonic Assembly Language source form. Then, once this is known, you write an Assembler for it and use these tools to create the graphics driver. The Apple GPU isn’t entirely new: originally it was based on Imagination Technologies’ GPU design and then went through several iterations in iPads and iPhones before the current newest version ended up in the M1. Hopefully this history will be some help in developing the new Linux drivers.
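
To give a feel for what “working out what each bit means” looks like once it is done, here is a toy disassembler in C. Since the Apple GPU instruction set is undocumented, this sketch uses a tiny, well documented slice of the ARM Thumb encoding as a stand-in: the four 16-bit instructions that take a register and an 8-bit immediate. The function name disasm_thumb_imm8 is hypothetical and the sketch handles nothing beyond those four instructions.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy disassembler for one small group of 16-bit Thumb instructions,
   encoded as 001 op(2) Rd(3) imm8(8). Writes the mnemonic form to out
   and returns 1 if the halfword is recognised, otherwise writes the raw
   value as a .hword directive and returns 0. */
int disasm_thumb_imm8(uint16_t insn, char *out, size_t out_size)
{
    static const char *const names[4] = { "MOVS", "CMP", "ADDS", "SUBS" };

    if ((insn & 0xE000u) != 0x2000u) {        /* top three bits must be 001 */
        snprintf(out, out_size, ".hword 0x%04X", (unsigned)insn);
        return 0;
    }

    unsigned op  = (insn >> 11) & 0x3u;       /* which of the four operations */
    unsigned rd  = (insn >> 8) & 0x7u;        /* register R0 to R7 */
    unsigned imm = insn & 0xFFu;              /* 8-bit immediate value */

    snprintf(out, out_size, "%s R%u, #%u", names[op], rd, imm);
    return 1;
}
```

Reverse engineering a GPU means discovering tables like names and bit fields like op, rd and imm by experiment, one instruction group at a time, rather than reading them out of a manual.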

Leveraging Existing Drivers

All the CPU vendors including ARM Holdings are motivated to contribute to the Linux kernel to ensure it runs well on their hardware. Linux is big enough that a vendor’s adoption greatly benefits from having a solid Linux offering. There is already really good ARM support in the Linux kernel and its tool chain such as GNU GCC. This is a solid first step in producing a working version of Linux for Apple Silicon.

Further, Apple doesn’t do everything themselves. There is hope that even if components are integrated into the M1 SoC that they still used standard designs. After all, Apple didn’t want to write all new drivers for MacOS. Hopefully a lot of the hardware drivers for the Intel Macs will just need to be recompiled for ARM and just work (or require very little work).

I haven’t mentioned the Apple integrated AI processor, but the hope here is that once the GPU is understood, the AI processor will turn out to be fairly similar, just missing the graphics specific parts and containing the same core vector processor.

There are quite a few other components in the SoC including sound processing and video decoding. Hopefully these are known entities and not entirely new.

Why Do All This Work?

It’s hard enough writing device drivers when you have complete hardware documentation and can call a vendor’s support line. Having to reverse engineer how everything works first is a monumental task, so why are all these open source developers flocking to this task? Quite a few people like the challenge. If Apple provided lots of good documentation, then it would just be too easy. There is an attraction to having to connect hardware diagnostic equipment to your computer and iteratively write Assembly Language to figure out how to control things. None of this work is paid. Besides the odd bit of gofundme money, these are mostly volunteers doing this in their spare time separate from their day jobs.

Humans are very curious creatures. Apple, by not providing any details, has piqued everyone’s curiosity. We don’t like being told no, you’re not allowed to know something. This just irritates us and perhaps we think there is something good being withheld from us.

There is also some fame to be had in hacker circles, as the people who solve the big problems are going to become legends in the Linux world.

Whatever the reason, we will all benefit from their hard work and determination. A well running Linux on Apple Silicon will be a great way to get full control of your hardware and escape App store restrictions and Apple’s policies on what you can and cannot do with your computer. It might even be a first step to producing Linux for iPhones and iPads which would be cool.

Summary

Apple has set a mythic challenge to hackers everywhere. By not providing any hardware documentation, Apple has created an epic contest for hackers to crack this nut and figure out how all the nitty gritty details of Apple Silicon work. This is a fun and difficult problem to work on. The kind of thing hackers love. I bet we are going to see prototype drivers and hardware details much faster than we think.

All of this requires a good knowledge of ARM 64-bit Assembly Language, so consider my book as a great way to learn all the details on how it works. I even have a chapter on reverse engineering which is hopefully helpful.

Written by smist08

January 15, 2021 at 10:59 am

Apple M1 Assembly Language Hello World

Introduction

Last week, we talked about using a new Apple M1 based Macintosh as a development workstation and how installing Apple’s development system Xcode also installed a large number of open source development tools including LLVM and make. This week, we’ll cover how to compile and run a simple command line ARM Assembly Language Hello World program.

Thanks to Alex vonBelow

My book “Programming with 64-Bit ARM Assembly Language” contains lots of sample self contained Assembly Language programs and a number of iOS and Android samples. The command line utilities are compiled for Linux using the GNU tool set. Alex vonBelow took all of these and modified them to work with the LLVM tool chain and to work within Apple’s development environment. He dealt with all the differences between Linux and MacOS/iOS as well. His version of the source code for my book, but modified for Apple M1 is available here:

https://github.com/below/HelloSilicon.

Differences Between MacOS and Linux

Both MacOS and Linux are based on Unix and are more similar than different. However there are a few differences of note:

  • MacOS uses LLVM by default whereas Linux uses GNU GCC. This really just affects the command line arguments in the makefile for the purposes of this article. You can use LLVM on Linux and GCC should be available for Apple M1 shortly.
  • The MacOS linker/loader doesn’t like doing relocations, so you need to use the ADR rather than LDR instruction to load addresses. You could use ADR in Linux and if you do this it will work in both.
  • The Unix API calls are nearly the same, the difference is that Linux redid the function numbers when they went to 64-bit, but MacOS kept the function numbers the same. In the 32-bit world they were the same, but now they are all different.
  • When calling a MacOS service the function number goes in X16 rather than X8.
  • Linux installs the various libraries and include files under /usr/lib and /usr/include, so they are easy to find and use. When you install Xcode, it installs SDKs for MacOS, iOS, iPadOS, watchOS, etc. with the option of installing lots of versions. The paths to the libs and includes are rather complicated and you need a tool to find them.
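  • In MacOS the program must start on a properly aligned boundary, hence the listing has an “.align 2” directive near the top. The argument is a power of two, so this aligns the code on a 4-byte (32-bit) boundary, the size of one ARM 64-bit instruction.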
  • In MacOS you need to link in the System library even if you don’t make a system call from it or you get a linker error. This sample Hello World program uses software interrupts to make the system calls rather than the API in the System library and so shouldn’t need to link to it.
  • In MacOS the default entry point is _main whereas in Linux it is _start. This is changed via a command line argument to the linker.
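
The system call differences in the list above can be summarized in a small lookup table. This is just an illustrative C sketch; the struct and the function find_convention are hypothetical names. The numbers are the write and exit function numbers for 64-bit ARM Linux (64 and 93) and for MacOS (4 and 1), along with the register each system expects the number in.

```c
#include <string.h>

/* Hypothetical summary of how each OS expects an SVC system call. */
struct svc_convention {
    const char *os;
    const char *number_register;   /* register holding the function number */
    int write_number;              /* function number for write */
    int exit_number;               /* function number for exit */
};

static const struct svc_convention conventions[] = {
    { "linux", "X8",  64, 93 },    /* 64-bit ARM Linux */
    { "macos", "X16",  4,  1 },    /* MacOS on Apple Silicon */
};

/* Returns the convention for the named OS, or NULL if it is unknown. */
const struct svc_convention *find_convention(const char *os)
{
    for (size_t i = 0; i < sizeof conventions / sizeof conventions[0]; i++) {
        if (strcmp(conventions[i].os, os) == 0)
            return &conventions[i];
    }
    return NULL;
}
```

Porting a small program like Hello World from Linux to MacOS mostly comes down to swapping the values from one row of this table for the other.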

Hello World Assembly File

Below is the simple Assembly Language program to print out “Hello World” in a terminal window. For all the gory details on these instructions and the architecture of the ARM processor, check out my book.

//
// Assembler program to print "Hello World!"
// to stdout.
//
// X0-X2 - parameters to MacOS function services
// X16 - MacOS function number
//
.global _start             // Provide program starting address to linker
.align 2

// Setup the parameters to print hello world
// and then call MacOS to do it.

_start: mov     X0, #1          // 1 = StdOut
        adr     X1, helloworld  // string to print
        mov     X2, #13         // length of our string
        mov     X16, #4         // MacOS write system call
        svc     0               // Call MacOS to output the string

// Setup the parameters to exit the program
// and then call MacOS to do it.

        mov     X0, #0          // Use 0 return code
        mov     X16, #1         // Service command code 1 terminates this program
        svc     0               // Call MacOS to terminate the program

helloworld:      .ascii  "Hello World!\n"
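
For comparison, here is a minimal C sketch of the same write going through the C library wrapper instead of a direct svc instruction. The function name say_hello is hypothetical. The write function is the standard Unix API, so this version works unchanged on both MacOS and Linux, since the library hides the function numbers and registers.

```c
#include <unistd.h>

static const char hello[] = "Hello World!\n";

/* Writes the greeting to the given file descriptor using the C library
   wrapper for the write system call. Returns the number of bytes
   written, or -1 on error. */
long say_hello(int fd)
{
    /* sizeof hello includes the terminating NUL, which is not written. */
    return (long)write(fd, hello, sizeof hello - 1);
}
```

Calling say_hello(1) writes the 13 bytes of the string to stdout, the same length that the Assembly Language version loads into X2.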

Makefile

Here is the makefile, the command to assemble the source code is simple, then the command to link is a bit more complicated.

HelloWorld: HelloWorld.o
	ld -macosx_version_min 11.0.0 -o HelloWorld HelloWorld.o -lSystem -syslibroot `xcrun -sdk macosx --show-sdk-path` -e _start -arch arm64

HelloWorld.o: HelloWorld.s
	as -o HelloWorld.o HelloWorld.s

The xcrun command is Apple’s command to run or find the various SDKs. Here is a sample of running it:

stephensmith@Stephens-MacBook-Air-2 ~ % xcrun -sdk macosx --show-sdk-path
objc[42012]: Class AMSupportURLConnectionDelegate is implemented in both ?? (0x1edb5b8f0) and ?? (0x122dd02b8). One of the two will be used. Which one is undefined.
objc[42012]: Class AMSupportURLSession is implemented in both ?? (0x1edb5b940) and ?? (0x122dd0308). One of the two will be used. Which one is undefined.
objc[42013]: Class AMSupportURLConnectionDelegate is implemented in both ?? (0x1edb5b8f0) and ?? (0x1141942b8). One of the two will be used. Which one is undefined.
objc[42013]: Class AMSupportURLSession is implemented in both ?? (0x1edb5b940) and ?? (0x114194308). One of the two will be used. Which one is undefined.
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk
stephensmith@Stephens-MacBook-Air-2 ~ %

After the ugly warnings from Objective-C, the path to the MacOS SDK is displayed.

Now we can compile and run our program.

stephensmith@Stephens-MacBook-Air-2 Chapter 1 % make -B
as -o HelloWorld.o HelloWorld.s
objc[42104]: Class AMSupportURLConnectionDelegate is implemented in both ?? (0x1edb5b8f0) and ?? (0x1145342b8). One of the two will be used. Which one is undefined.
objc[42104]: Class AMSupportURLSession is implemented in both ?? (0x1edb5b940) and ?? (0x114534308). One of the two will be used. Which one is undefined.
ld -macosx_version_min 11.0.0 -o HelloWorld HelloWorld.o -lSystem -syslibroot `xcrun -sdk macosx --show-sdk-path` -e _start -arch arm64
stephensmith@Stephens-MacBook-Air-2 Chapter 1 % ./HelloWorld 
Hello World!
stephensmith@Stephens-MacBook-Air-2 Chapter 1 %

Summary

The new Apple M1 Macintoshes are running ARM processors as part of all that Apple Silicon and you can run standard ARM 64-bit Assembly Language. LLVM is a standard open source development tool which contains an Assembler that is similar to the GNU Assembler. Programming MacOS is similar to Linux since both are based on Unix and if you are familiar with Linux, most of your knowledge is directly applicable.

Written by smist08

January 8, 2021 at 10:31 am

Apple Macs Move to ARM Processors

Introduction

I watched Apple’s introduction of their new Mac computers based on Apple Silicon which contain ARM CPUs. Of course I was excited about this since I wrote two books on ARM Assembly Language Programming. ARM processors are used in nearly all cell phones and tablets. They are used in single board computers like the Raspberry Pi as well as many IoT devices. Finally it looks like we are getting a good line of computers based on ARM processors. In this article we’ll look at why this is a good thing, as well as some of the hurdles that Apple will need to jump for this to be a success.

A Bit of History

The first Macs contained Motorola 68000 series CPUs, then Apple moved to IBM’s PowerPC chips and then on to using Intel CPUs like all other PCs. The Motorola 68000 was a CISC CPU that competed with Intel in the early days to be the heart of the PC. Intel won the race and Motorola lost interest in spending the billions that were required to keep this line of processors competitive. Apple made the decision to jump to IBM’s new RISC based PowerPC platform. Initially this was quite successful, but again IBM didn’t think it was worth investing the money required to keep up with Intel. Intel was competing fiercely against AMD to maintain a lead in processor technology and this battle between Intel and AMD left IBM in their dust. Apple saw the writing on the wall and moved the Mac line of computers to Intel processors.

Advance a few years, and the battle has moved to cell phones. Cell phones all use ARM processors mainly due to their lower power requirements. Now there is a furious battle between the various ARM chip makers to have the faster cell phone. Now the tables have turned and Intel is being left in the dust as its chips are getting older and it is having trouble competing. This gives Apple the chance to move to faster ARM processors that use less power (hence longer battery life) with the added advantage that all their devices from watches to phones to tablets to laptops to desktops all use variations of the same ARM processor.

The Apple M1 Processor

With these new ARM based Apple Macs, Apple introduced their new Apple M1 System on a Chip (SoC). This SoC contains eight ARM CPU cores, 4 are high power units, and 4 are lower power. The new MacOS dispatches threads based on whether they need to save power or maximize performance. This chip incorporates the CPU, GPU and memory all into one chip. The main downside of this is that this will be the least upgradeable Mac yet. I would recommend getting a higher configuration since you won’t be able to add to it down the road.

This is an impressive chip that Apple claims will be competitive with Intel i9 processors. It will be interesting to see the real benchmarks when these computers actually ship next week.

Unified Software

Now that iOS and MacOS programs use the same processor, it makes writing applications that run on everything from watches and phones to laptops and desktops easier. If you need some Assembly Language optimizations, now you only need to include the same ARM code for all of them. It’s really cool that you can now run iPhone and iPad apps on your Mac.

Downsides

There are a couple of downsides to this approach. One is the lack of upgradeability due to the memory being included on the CPU chip. Another is that all software needs to be recompiled for the ARM processor. Apple has made this as easy as possible, so hopefully all the main software packages will be updated with ARM versions.

Even if a software vendor doesn’t do this (perhaps they went out of business), these new Macs claim they can run the software anyway by using an Intel emulator called Rosetta. We’ll have to get some real feedback on how well this works, but Apple claims it runs Intel programs better than only slightly older Intel processors.

The other headwind with Apple products is the price. These are higher end products that compete with Microsoft Surface and higher end Dell models. However there are a lot of much cheaper laptops from vendors like Acer or HP. I purchased a MacBook Air in 2012 and it is still going strong, a very solid laptop. The Sunshine Coast Tech Hub maintains half a dozen 2008 MacBook Pros that we use for an Arduino kids coding camp and all these laptops are going strong (admittedly upgraded to SSD drives and running Linux Mint). The price of these new ARM laptops is the same as the previous equivalent Intel models and my experience with Apple products is that they do last.

Will Microsoft and Others Follow?

ARM has released their Cortex A78C CPU that is an 8 core CPU for laptops and desktops where all 8 cores are high performance. How many other hardware vendors will try releasing laptops and desktops based on this chip? Linux runs fine on ARM CPUs, just look at the Raspberry Pi or nVidia Jetson Nano. Microsoft has a simplified version of Windows, similar to ChromeOS for ARM laptops. Will Microsoft support the full Windows Home and Pro on ARM? It will be interesting to see what new devices get released in 2021.

Summary

I’m excited about the new ARM based Apple Macs. If you want to learn more about the ARM CPU, check out one of my books on ARM Assembly Language programming. It will be interesting to see how these sell compared to Intel/AMD computers and how many other vendors choose to support ARM CPUs in laptops and desktops in 2021.

Written by smist08

November 10, 2020 at 3:12 pm