Stephen Smith's Blog

Musings on Machine Learning…

Posts Tagged ‘arm’

RP2040 Assembly Language Programming

with 6 comments


My third book on ARM Assembly Language programming has recently started shipping from Apress/Springer, just in time for Christmas. This one is “RP2040 Assembly Language Programming” and goes into detail on how to program Raspberry Pi’s RP2040 SoC. This chip is used in the Raspberry Pi Pico along with boards from several other manufacturers such as Seeed Studio, Adafruit, Arduino and Pimoroni.

Flavours of ARM Assembly Language

ARM has ambitions to provide CPUs for everything from the cheapest microcontrollers costing less than a dollar all the way up to supercomputers costing millions of dollars. Along the way, three distinct flavours of ARM Assembly Language have emerged:

  1. A Series 32-bit
  2. M Series 32-bit
  3. 64-bit

Let’s look at each of these in turn.

A Series 32-bit

For the A series, each instruction is 32 bits in length, and as the processors evolved, ARM added virtual memory, advanced security and other features to support advanced operating systems like Linux, iOS and Android. This is the Assembly Language used in 32-bit phones and tablets and by the Raspberry Pi OS. It is covered in my book “Raspberry Pi Assembly Language Programming”.

M Series 32-bit

The full A series instruction set didn’t work well in microcontroller environments. Using 32 bits for each instruction was considered wasteful, and supporting all the features needed by advanced operating systems made the CPUs too expensive. To solve the memory problem, ARM added a mode to the A series 32-bit processors in which each instruction was 16 bits; this saved memory, but the processors were still too expensive. When ARM introduced their M series, or microcontroller, processors, they made this 16-bit instruction format the native format and removed most of the advanced operating system features. The RP2040 SoC used in the Raspberry Pi Pico is one of these M series designs, with dual ARM Cortex-M0+ CPU cores. This is the subject of my current book “RP2040 Assembly Language Programming”.


64-bit

Like Intel and AMD, ARM made the transition from 32-bit to 64-bit processors. As part of this they cleaned up the instruction set, added registers and created a third variant of ARM Assembly Language. iOS and Android are now fully 64-bit and you can run 64-bit versions of Linux on newer Raspberry Pis. The ARM 64-bit instruction set is the topic of my book “Programming with 64-Bit ARM Assembly Language”.

ARM 64-bit CPUs can run the 32-bit instruction set, and the M series instruction set is a subset of the A series 32-bit instruction set. Each one is a full, rich instruction set that deserves a book of its own. If you want to learn all three, I recommend buying all three of my books.

More Than ARM CPUs

The RP2040 is a System on a Chip (SoC): along with the two M-series ARM CPU cores, it includes memory, many built-in hardware interfaces and other components. RP2040 boards don’t need much beyond the RP2040 chip itself besides a way to connect other components.

“RP2040 Assembly Language Programming” includes coverage of how to use the various hardware registers to control the built-in hardware controllers, as well as the innovative Programmable I/O (PIO) hardware coprocessors. These PIO coprocessors have their own Assembly Language and are capable of some very sophisticated communications protocols, even VGA.

Where to Buy

“RP2040 Assembly Language Programming” is available from most booksellers.

Currently, if you search for “RP2040” in the books section of most of their sites, my book comes up first.


The Raspberry Pi Pico and the RP2040 chip aren’t the first ARM M-series based microcontrollers, but with their release, the popularity and acceptance of ARM processors in the microcontroller space has suddenly exploded. The instruction set of ARM’s M-series processors is simple and clean, a great example of a RISC instruction set. Whether you are into more advanced microcontroller applications or learning Assembly Language for the first time, this is a great place to start.

Written by smist08

November 5, 2021 at 10:42 am

ARM’s True RISC Processors

leave a comment »


I recently completed my book, “RP2040 Assembly Language Programming” and was thinking about the differences in the three main instruction sets available on ARM Processors:

  1. The “thumb” instructions used in ARM’s 32-bit microcontrollers, covered in “RP2040 Assembly Language Programming”.
  2. The full 32-bit A-series instruction set used by the Raspberry Pi OS, covered in my book “Raspberry Pi Assembly Language Programming”.
  3. The 64-bit instruction set used on all smartphones and tablets, covered in my book “Programming with 64-Bit ARM Assembly Language”.

ARM is advertised as a Reduced Instruction Set Computer (RISC), as opposed to Intel x86 chips, which are Complex Instruction Set Computers (CISC). However, as ARM introduces v9 of their full chip architecture, the instruction set has gotten pretty complex. Writing the RP2040 book and its included source code was a pleasure, because the microcontroller version of the instruction set really is reduced and much simpler than the other two full versions. In this article, we’ll look at a bit of the history of the various ARM instruction sets and why ARM is still considered a RISC processor.

A Bit of History

Originally, ARM was developed by Acorn as a replacement for the 6502 processor used in the BBC Microcomputer. The early versions were specialty chips, and it wasn’t until Apple selected ARM for their Newton PDAs that ARM was spun off as a separate company, starting with their 32-bit RISC CPUs. They reached the next level of success as Apple continued to use them in iPods, then hit it big when they were used in the iPhone and, after that, pretty much every smartphone and tablet that reached any level of success.

The original 32-bit instruction set used 32 bits to contain each machine instruction, which worked great as long as you had sufficient memory. In the microcontroller world there were complaints that for devices with only 4k of memory, these instructions were too big. To answer this, ARM added “thumb” instructions, which were 16 bits in length, using half the memory of the full instructions. The processor was still 32-bit, since the registers were 32 bits in size and all integer arithmetic was 32-bit. The “thumb” instruction set is a subset of the full 32-bit instruction set, and the processor can switch between regular and thumb mode on select branch instructions. This allowed the microcontroller people to use the “thumb” subset to develop compact applications. Even on computers with larger memory, “thumb” instructions can be useful: since instructions are 16-bit, each memory read loads two of them, reducing contention on the memory bus, and twice as many instructions fit in the instruction cache, improving performance.

The first “thumb” instruction set wasn’t complete, which meant programs had to revert to full instructions for a number of functions. To address this, ARM developed “thumb-2”, which allows complete functionality without switching back. The various “thumb” instruction sets are all 32-bit; the 64-bit version of the ARM instruction set has no “thumb” subset.

Enter Microcontrollers

ARM has always had the ambition to provide CPU chips covering the whole market, from inexpensive small microcontrollers all the way up to the most powerful datacenter server chips. The full 32-bit ARM processors were a bit too expensive and complicated for the microcontroller market. To address this market, ARM developed the M-series CPUs, making the “thumb” instruction set the full instruction set of these devices. This made the CPUs far simpler, requiring far fewer transistors to create, and paved the way for powerful 32-bit ARM CPUs for the microcontroller market costing under $1 each.

For instance, the ARM Cortex-M0+ used in the Raspberry Pi Pico has 85 instructions. This sounds like a lot, but the count treats adding a register to a register as a different instruction from adding an immediate operand to a register. This is far fewer instructions than in a full ARM A-series processor, which in turn has far fewer than an x86 processor.

Some of the features that are dropped from the M-series processors are:

  • Virtual memory
  • Hardware memory protection
  • Virtualization
  • Conditional instructions
  • Not all instructions can address all the registers
  • Immediate operands are much smaller and shifting isn’t supported
  • The addressing modes are far simpler
  • Instructions either set or don’t set the condition flags; there is no extra bit to control this

Most microcontrollers run a single program that has access to all the memory, so these omissions aren’t an issue. However, the lack of hardware support hasn’t stopped people from adding software support and getting Linux and other OS’s running on these microcontrollers.

Are ARM Processors Still RISC?

Full ARM A-series processors, like those found in the Raspberry Pi, Apple’s iPhones and iPads, and dozens of Android and ChromeOS devices, all run the full 64-bit instruction set as well as the full 32-bit instruction set, including the “thumb” instructions. They support virtual memory, virtualization, FPUs, vector processors, advanced security and everything else you would expect in a modern processor. That is a lot for something billed as “reduced”. Basically, an ARM CPU has the same transistor budget as an x86 processor, so they use every transistor to do something useful. So why are ARM processors still considered RISC? The parts of RISC that all ARM processors retain are:

  • The instructions are a fixed length.
  • They are a load/store architecture (no instructions like add memory to register). An instruction either loads/stores from memory or performs an arithmetic operation on the registers.
  • Most instructions execute in a single clock cycle.
  • They have a large set of registers, though Intel processors now also have a large set of registers.

Even with all this functionality, ARM processors use far less power than x86 processors, mainly due to the simplifications that fixed-length instructions and a load/store architecture provide. Intel processors now have a RISC core, but have to add another layer to translate each x86 instruction into their internal RISC instructions, and that layer uses transistors and power when executing.

So yes, even though the number of instructions in an ARM CPU has multiplied greatly over the nine generations of the chips, the core ideas are still RISC.


The M-series line of ARM CPUs is far simpler to program than the full A-series. There is no virtual memory support, so you can access hardware addresses directly, reading and writing anywhere without worrying about security or memory protection. The instruction set is simpler and nothing is wasted. Having written three books on ARM Assembly Language programming, I think learning Assembly Language on a microcontroller is a great way to start. You have full control of the hardware and don’t have to worry about interacting with an operating system. I think you get a much better feel for how the hardware works, as well as a real feel for programming RISC-based processors. If you are interested, I hope you check out my forthcoming book: “RP2040 Assembly Language Programming”.

Written by smist08

October 2, 2021 at 10:31 am

Programming an Apple Watch

with 4 comments


A cool thing about the Apple Watch is that it’s really a full ARM based computer running a Unix derived operating system that is fully programmable. Although most Apple Watch owners will never write programs for their Apple Watch, as they never write programs for their iPad or iPhone, it is entirely possible to do so using Apple’s Xcode development environment running on a newer Mac. In this article we’ll look a little at the powerful computer that is the Apple Watch and give an idea of how programs or Apps are developed.

The Platform

The Apple Watch contains a whole lot of processing power, combined with a ton of sensors and a nice retina display, all packed into a very small package. The processor is a dual-core 64-bit ARM CPU with 1Gig of RAM. In the Series 6 watch, these cores are the low-energy cores from the iPhone 11’s CPU. There is also 32Gig of storage for Apps and data, and even a mini PowerVR GPU. The touch-sensitive display is only 1.78”, but still has a resolution of 448 x 368 pixels. 1Gig may not sound like much RAM, but remember that all the Raspberry Pis up to the 3B only had 1Gig of RAM and ran a full version of Linux quite nicely. For connectivity there is WiFi, cellular, Bluetooth and ultra wideband.

The sensors include: accelerometer, gyro, heart rate, barometer, always-on altimeter, compass, GPS, microphone, SpO2 and VO2max.

That’s quite a bit of computer packed into a small package weighing only 48 grams.


WatchOS

The Apple Watch’s operating system is WatchOS, which is based on iOS. Programming for WatchOS is pretty similar to programming for iOS, and in fact you use the same tools. There is a WatchKit API for watch-specific functions, and you should keep the Watch’s UI limitations in mind when creating Apps. For instance, even though you can do text entry on the watch, there is no keyboard; you have to draw each character or use the built-in speech-to-text interface.

Typically you develop a Watch App in parallel with an iPhone App, where the iPhone App provides configuration and setup and does much of the work, allowing you to minimize the interface required on the watch. Xcode makes creating these dual Apps easy, and in fact you can have separate front ends for the Watch version, the Apple TV version, the iPhone version and the iPad version.

In my book “Programming with 64-Bit ARM Assembly Language”, I create a simple iOS App with a text box, where you enter some text and it calls an Assembly Language routine to convert the text to uppercase. Alex vonBelow took this example and added support for both the Apple Watch and Apple TV. The GitHub repository for this is available here, and this program is in Chapter 10.

For most work, you debug by running the application in the iOS/WatchOS simulator. The nice thing about my new ARM-based Mac is that the simulator is quite fast, since it doesn’t have to emulate an ARM CPU on an Intel processor; everything is ARM and runs quickly. Below is a screenshot of this uppercase app running on the Apple Watch simulator.

The cool thing is that if you know how to write iOS Apps, then you already know how to write Apple Watch Apps (as well as AppleTV Apps). Besides writing code in Objective-C or Swift, you can even write code in 64-bit ARM Assembly Language. Xcode makes it easy to provide separate appropriate screens for each device.

There are tons of books on how to write iOS Apps and all that knowledge works across all the Apple mobile products. The key thing for the Watch is that the UI should be mostly informational and any UI should be limited to just a couple of buttons.

Programming with Objective-C or Swift using the iOS frameworks is fairly complex; it would be nice if there were something simpler, like a version of Scratch for WatchOS or a command prompt App like the one for the iPhone. But at least Xcode creates a reasonable working skeleton App when you create a new project.


The Apple Watch is quite a powerful little computer in its own right. You can program it from Xcode and use nearly all the tools you use for iOS development for the iPhone or iPad. It’s really amazing how much computing power, connectivity and sensors are stuffed into the small watch package.

Written by smist08

February 26, 2021 at 10:40 am

State of Software on Apple Silicon

leave a comment »


Apple Silicon ARM-based Macintoshes first shipped back on Nov. 20, 2020. Now, three months later, I thought I’d review the state of software on these new Macs. Whenever a vendor changes the CPUs in their computers, there is typically a lag between the hardware shipping and software becoming available for the new platform. Apple took a number of extraordinary measures to try to eliminate this lag and have a lot of software available when the first real hardware shipped. In this article we’ll look at what is available now, what is missing and how the journey seems to be going.

Rosetta and the Early Adopter Program

Knowing that software would be a problem when Apple switched from Intel to ARM for their Macintoshes, they worked hard to produce an excellent Intel emulation layer called Rosetta. This allows most Intel-based MacOS programs to run as-is on the new hardware. I found this to work really well, with a couple of exceptions. One problem is battery life: when I first received my new MacBook Air, there wasn’t a native general media player yet, and running the Intel version of VLC would drain the battery in a couple of hours. Since then a native ARM version of VLC has been released, and I’m able to play videos all day without recharging. Another problem is that Rosetta doesn’t emulate Intel AVX vector processing instructions, which prevents some machine learning libraries from running. However, I found most things did work, and hence I could get everything I needed done.

The other thing Apple did was seed developers with an early prototype Mac Mini using an iPad Pro’s ARM CPU and a beta version of MacOS Big Sur. This let developers get a jump on porting their applications to the new platform. The downside was that this hardware cost US$600, and you had to sign a heavy nondisclosure agreement along with agreeing to return the hardware when the real thing shipped. This limited the program mostly to larger companies who could easily afford it. In the end, Apple provided a $500 Apple Store credit when you returned the unit; if this had been known at the beginning, more people might have participated.

Native Applications

There are now a large collection of native Apple Silicon applications available including Microsoft Office, Google Chrome, VLC Player and Zoom. Others such as Adobe Lightroom or Parallels virtualization software are available in beta. There is a list of native applications here.

Of course all Apple’s applications are available for the M1 chip including Xcode which gives a pretty robust starting point.

Porting applications from Intel to ARM appears to be fairly straightforward, so expect any missing applications (as long as they are still actively developed) to show up soon.

MacOS is based on Unix, as is Linux, and Mac users are used to the various Linux applications having MacOS versions as well. A great place to get these is one of the MacOS package managers like Homebrew, which has ARM-native versions of most open source programs, including GCC and lots of Python libraries.

Xcode installs the LLVM compiler collection, and with GCC you can download and build most open source projects yourself if you really need to, but chances are the folks at Homebrew have beaten you to it.

Another source of native Apps is iOS iPad and iPhone apps; many of these will install on the new Macs and, for a lot of things, are the best way to go.


Virtualization

If you want to run Linux or Windows on your new Apple Silicon Mac, one way to go is virtualization. Parallels has a beta version of their software for Apple Silicon; I gave it a try and got a virtualized copy of Ubuntu Linux up and running without any problems. The target operating system has to be 64-bit ARM based, which means you can choose from quite a few flavours of Linux.

Although Boot Camp isn’t supported on the new Macs, you can run virtualized ARM-based Windows using Parallels, though this version of Windows is only available through the Windows Insider program.

Other Operating Systems

There is a large contingent of Linux developers working hard to get Linux running natively on this new Apple hardware. Corellium has Ubuntu Linux running natively, booting either from the SSD or from a USB drive. This version supports much of the new Apple hardware, but it doesn’t use the GPU at all. Asahi Linux is working hard on GPU support, but that is a big job.

Apple is making this job hard by not providing documentation on the low-level hardware interfaces, but people are making good progress reverse engineering MacOS.

The good news is that it is possible to run other OS’s like Linux, and over time we’ll hopefully see more operating systems ported as well.

What’s Missing

The big thing missing is support for the GPU and TPU. Libraries that use GPUs for acceleration won’t use the Apple GPU. This is a huge advantage for nVidia, as nearly every machine learning library supports acceleration on nVidia GPUs; next, but still lagging, is support for AMD GPUs.

Apple wants everyone to use their Metal API to access the GPU, but the problem is getting people to do this. Apple has taken a branch of Google’s TensorFlow and is adding Metal support themselves, but they have a tough road ahead to get good support across the board. They could help projects implement open standards like OpenCL, but I think this will take time.

Another thing to sort out is the installation of systems like Python, where there are hundreds of add-on libraries. There are lots of conda virtual installs for the new M1s, but most mix-and-match installs are still a pain, nowhere near as painless as they are on Intel-based MacOS or on Linux.


There has been huge progress in porting software to the new ARM based Macs. There is still work to do, but it appears the new Macs are selling well and there seem to be lots of developers acquiring these new Macs. Anyone serious about iOS development wants these as it is so much easier having your development system running the same processor as your target. The missing pieces are becoming more and more obscure and most of the holes can be filled with programs running under Rosetta. I use my new MacBook for pretty much all my work and find everything I need already there and working really well.

Written by smist08

February 19, 2021 at 11:13 am

Posted in Business


Can Intel Turn Things Around?

with 2 comments


Last week, Intel reported record revenue of $20 billion. Leading up to that, their CEO, Bob Swan, resigned and they hired back Pat Gelsinger. Pat was most recently the CEO of VMWare, but is well known within Intel from his time as chief architect of the 80486 processor. The hope is that Gelsinger, with his engineering background, can turn Intel around. In this article, we’ll look at why, despite record revenue, people are saying Intel is in trouble. Then we’ll discuss what their prospects are.

Intel is all About the Process

My first job out of University was at Epic Data, and the first thing they did was send me on a two-week course at Intel down in Santa Clara on the then-new 80186 processor, which they were using as an embedded controller. In the course’s introduction, there was a bit of corporate propaganda about how the process technology used to build their chips was the core of the company and their competitive advantage: having a better process technology for building chips was their number one priority and the foundation of everything they did.

Building chip foundries is expensive, and few companies have the billions required to build new foundries for each new generation of chip technology. Intel had the resources to do this, and many of their competitors fell by the wayside. There are only a few companies creating CPU chips now, but the model has changed: you can now design a chip and have a third party manufacture it for you. AMD designs Intel-compatible chips but doesn’t manufacture them, contracting TSMC or Samsung to do that instead. Similarly for ARM chips and even nVidia GPUs.

This is where Intel has fallen down. Their transition from 14nm to 10nm technology (the size of the transistors on the chip) ran into problems. Most Intel CPUs are still 14nm, with a few of the newer ones finally coming out at 10nm. This in turn delays their next generation, 7nm, several years into the future.

Meanwhile, nearly all AMD CPUs are produced by TSMC at 7nm today. There are some differences between these processes, but Intel is only maintaining a performance lead by running hotter and using much larger amounts of power. At the same time, Apple is the first adopter of TSMC’s 5nm technology, used in the new Apple Silicon M1 chip and the newer iPad Air. Once TSMC meets Apple’s need for chips and brings more production capacity online, AMD will move to 5nm while Intel is still struggling with 10nm. By next year, TSMC will be on to 3nm technology, and this will be used by AMD and all the ARM chips before Intel makes its transition to 7nm. Samsung is a bit behind TSMC, but well ahead of Intel and spending like mad to catch up and overtake TSMC. Suddenly, instead of being the process leader, Intel finds itself several generations behind, even looking to TSMC to manufacture some of its chips. Meanwhile, AMD is taking more and more market share from Intel.

The smaller the transistor size, the more transistors you can put on a chip. For CPUs this means more CPU cores, better integrated GPUs, integrated AI cores, etc. Often it also reduces power utilization and increases speed due to shorter signal paths.

Brutal Competition

When IBM selected Intel’s CPU for the IBM PC, they insisted on a second source for the CPUs, so Intel was forced to grant AMD a license to produce Intel-compatible chips. This decision created a huge competition between Intel and AMD to create the best x86-compatible chips; each company kept leapfrogging the other with better designs and better processes. During this competition, there was a lot of interest in RISC processors, which were simpler and believed capable of better performance than the Intel x86 CISC chips: Sun’s Sparc processors, IBM’s Power CPUs, MIPS and several others. In spite of all the theoretical advantages of RISC, the ferocious battle between Intel and AMD left them all in the dust. The PC market was large and profitable enough to provide both Intel and AMD with lots of money for R&D, and the rewards of producing the best x86 chip were huge. By the time Intel launched the Core 2 architecture, the x86 world was way ahead of the RISC world in computing power, at far lower cost. The old Power, Sparc and MIPS processors are mostly memories now.

Power is the Game Changer

When Apple was looking to produce the iPod music player, they would have been happy to use familiar Intel chips, except that those drained the battery too fast, and Steve Jobs felt the device would be useless as a result. Apple instead chose a small RISC chip that had been developed for a new version of Acorn’s BBC computer. This chip wasn’t particularly powerful, but it didn’t use much power and was sufficient for a single-use music player.

Acorn then spun off ARM as a separate entity to design new versions of the chip. The resulting ARM Holdings never manufactured chips, or even contracted others to manufacture them. Instead, it licensed its designs to companies like Apple to modify for their needs and manufacture any way they liked. The success of the iPod funded the R&D to produce more powerful ARM chips for the iPod Touch, then the iPhone and then the iPad. Once the iPhone forced all cell phones to become smartphones, everyone adopted the ARM processor for their mobile designs.

The main companies that produce ARM chips are Apple, Samsung, Qualcomm, MediaTek and Broadcom. Most of these are manufactured either by Samsung or TSMC.

Competition in the cell phone market is far more intense than the battle between Intel and AMD, and history is repeating itself: the cell phone manufacturers have made huge investments to produce the best ARM chip for their flagship phones. We are now at a point where ARM chips are just as powerful as high-end Intel or AMD chips while using only one tenth the power.

Using less power is a huge competitive advantage since:

  • Saving electricity saves money, especially in data centers
  • Using less power generates less heat and requires less cooling hardware
  • Systems don’t have to slow down because they are overheating
  • Laptops don’t require annoying fans

All of a sudden we see ARM chips entering the laptop, desktop and data center worlds where Intel is king. This is a huge threat to Intel and their market dominance in these spaces.

Follow the Money

Intel just reported revenue of $20 billion last quarter, which is pretty good and should provide lots of money for R&D. However, Apple just reported $111 billion last quarter, Samsung $59 billion and Qualcomm $6.5 billion. Intel may be huge, but the phone market is so large that the industry Intel dominates is suddenly small by comparison. Can Intel invest in the R&D necessary to compete? Meanwhile, TSMC and Samsung aren’t even looking at Intel as they compete with each other for the best process technology. This is Intel’s big problem: suddenly they find themselves in the same boat Sun and IBM were in with their Sparc and Power processors. Will they be able to overcome this challenge?

Intel’s Strategy

Pat Gelsinger hasn’t even officially started at Intel yet, and he is already enticing various ex-Intel engineers who have retired or moved on to other companies to return. He feels that Intel has gotten away from their engineering roots and has let key talent get away. The other thing Intel has been doing is going on a media blitz, announcing future products that probably won’t appear for years, basically saying “just wait a bit and we’ll have something for you” rather than letting customers buy AMD now (or switch to ARM). Hopefully, once Pat officially assumes his new role next month, we’ll see more concrete action plans for how Intel is going to proceed.


Intel is a powerful company that is making lots of money, however its products are getting old and it isn’t keeping up with the competition. At some point that is going to start affecting profits as loyal customers are forced to consider alternatives to stay competitive. Will reuniting the old team be enough to turn Intel around? I think more is required and everyone will be watching Intel closely over the next six months to see how they respond.

Written by smist08

January 29, 2021 at 12:43 pm

Posted in Business


Porting Linux to Apple Silicon

with 2 comments


When Apple announced they were switching from Intel to ARM CPUs, there was a worry that Apple would lock out installing non-Apple operating systems such as Linux. There is a new security processor that people worried would only allow MacOS to boot on these new machines. Fortunately, this proved false: the new ARM-based Macintoshes fully support booting homebrew operating systems, either from the SSD or from USB storage. However, the new Apple M1 chips present a number of problems, which we’ll discuss in this article, along with why so many people are interested in doing this.

Linus Torvalds, the father of Linux, recently said that he wished the new MacBooks ran Linux, that he would consider such a machine the ultimate laptop, and that he would really want one. Linus sees porting Linux as possible, but personally doesn’t have the time to commit to it.

Last week’s article on an Assembly Language “Hello World” program hit number 1 on Hacker News and based on the comments, the interest was largely generated by the challenge of porting Linux to these new Apple systems. As we’ll see, doing this is going to require both reverse engineering and then writing ARM 64-bit Assembly Language code.

Asahi Linux

Last week we saw the announcement of the Asahi Linux project. Asahi means “rising sun” in Japanese and “asahi ringo” is Japanese for the McIntosh apple. The goal of this project is to develop a version of Linux that fully supports all the new hardware features of the Apple M1 chip, including the GPU and USB-C ports. This won’t be easy, because even though Apple doesn’t block you from doing this, they don’t help and they don’t provide any documentation on how the hardware works. People already have character based Linux booting and running on the Apple M1 Macs, and you can run the regular ARM version of Linux under virtualization on these new Macs. The real goal, though, is to understand the new hardware and have a version of Linux talking directly to it, using all the capabilities, like the GPU, to run as well as or better than MacOS.

GPUs and Linux

GPUs have always been a sore point with the Linux community. None of the GPU vendors properly document the hardware APIs to their products, and they believe the best way to support various operating systems is to provide precompiled binaries with no source code. Obviously this roils the open source community. GPUs are complicated and change a lot with each hardware generation. Newer Intel and AMD CPUs all have integrated graphics with good open source drivers that at least work, but at the disadvantage of not using all the fancy hardware you paid for in your expensive gaming PC. Even the Raspberry Pi versions of Linux use a binary Broadcom driver for the integrated GPU, rather than something open source.

Over the years, intrepid Linux developers have reverse engineered how all these GPUs work, so there are open source drivers for most nVidia and AMD GPUs. In fact, since neither nVidia nor AMD support their hardware for all that long, if you have a graphics card more than 10 years old and run Linux, then you are pretty much forced to use the open source driver, switch to Intel integrated graphics (if available), or just stop upgrading the operating system and drivers.

The good news is that the open source community has a lot of experience figuring out how GPUs work, including those from nVidia, AMD, ARM and Broadcom. The bad news is that this takes time: first you work out a disassembler for the GPU instructions, figuring out what each bit in the binary form means so it can be written as mnemonic Assembly Language source. Once this is known, you write an Assembler, and then use these tools to create the graphics driver. The Apple GPU isn’t entirely new: it was originally based on an Imagination Technologies GPU design and then went through several iterations in iPads and iPhones before the newest version ended up in the M1. Hopefully this history will be some help in developing the new Linux drivers.

Leveraging Existing Drivers

All the CPU vendors, including ARM Holdings, are motivated to contribute to the Linux kernel to ensure it runs well on their hardware. Linux is big enough that a solid Linux offering greatly benefits a vendor’s adoption. There is already really good ARM support in the Linux kernel and its tool chain, such as GNU GCC. This is a solid first step in producing a working version of Linux for Apple Silicon.

Further, Apple doesn’t do everything themselves. There is hope that even if components are integrated into the M1 SoC, they still use standard designs. After all, Apple didn’t want to write all new drivers for MacOS. Hopefully a lot of the hardware drivers for the Intel Macs will just need to be recompiled for ARM and will just work (or require very little work).

I haven’t mentioned the Apple integrated AI processor, but the hope here is that once the GPU is understood, the AI processor will prove fairly similar, just missing the graphics specific parts and containing the same core vector processor.

There are quite a few other components in the SoC, including sound processing and video decoding; hopefully these are known entities and not entirely new.

Why Do All This Work?

It’s hard enough writing device drivers when you have complete hardware documentation and can call a vendor’s support line. Having to reverse engineer how everything works first is a monumental task, so why are all these open source developers flocking to it? Quite a few people like the challenge; if Apple provided lots of good documentation, then it would just be too easy. There is an attraction to connecting hardware diagnostic equipment to your computer and iteratively writing Assembly Language to figure out how to control things. None of this work is paid; besides the odd bit of gofundme money, these are mostly volunteers doing this in their spare time, separate from their day jobs.

Humans are very curious creatures. Apple, by not providing any details, has piqued everyone’s curiosity. We don’t like being told no, you’re not allowed to know something. This just irritates us and perhaps we think there is something good being withheld from us.

There is also some fame to be had in hacker circles, as the people who solve the big problems are going to become legends in the Linux world.

Whatever the reason, we will all benefit from their hard work and determination. A well running Linux on Apple Silicon will be a great way to get full control of your hardware and escape App store restrictions and Apple’s policies on what you can and cannot do with your computer. It might even be a first step to producing Linux for iPhones and iPads which would be cool.


Apple has set a mythic challenge to hackers everywhere. By not providing any hardware documentation, Apple has created an epic contest for hackers to crack this nut and figure out how all the nitty gritty details of Apple Silicon work. This is a fun and difficult problem to work on. The kind of thing hackers love. I bet we are going to see prototype drivers and hardware details much faster than we think.

All of this requires a good knowledge of ARM 64-bit Assembly Language, so consider my book as a great way to learn all the details on how it works. I even have a chapter on reverse engineering which is hopefully helpful.

Written by smist08

January 15, 2021 at 10:59 am

Apple M1 Assembly Language Hello World

with 15 comments


Last week, we talked about using a new Apple M1 based Macintosh as a development workstation and how installing Apple’s development system XCode also installed a large number of open source development tools including LLVM and make. This week, we’ll cover how to compile and run a simple command line ARM Assembly Language Hello World program.

Thanks to Alex vonBelow

My book “Programming with 64-Bit ARM Assembly Language” contains lots of self contained sample Assembly Language programs and a number of iOS and Android samples. The command line utilities are compiled for Linux using the GNU tool set. Alex vonBelow took all of these and modified them to work with the LLVM tool chain and within Apple’s development environment, dealing with all the differences between Linux and MacOS/iOS as well. His version of the source code for my book, modified for the Apple M1, is available here:

Differences Between MacOS and Linux

Both MacOS and Linux are based on Unix and are more similar than different. However, there are a few differences of note:

  • MacOS uses LLVM by default whereas Linux uses GNU GCC. This really just affects the command line arguments in the makefile for the purposes of this article. You can use LLVM on Linux and GCC should be available for Apple M1 shortly.
  • The MacOS linker/loader doesn’t like doing relocations, so you need to use the ADR instruction rather than LDR to load addresses. You can also use ADR on Linux, in which case the same code works in both.
  • The Unix API calls are nearly the same; the difference is that Linux renumbered the functions when it went to 64-bit, while MacOS kept the function numbers the same. In the 32-bit world they were the same, but now they are all different.
  • When calling a MacOS service, the function number goes in X16 rather than X8.
  • Linux installs the various libraries and include files under /usr/lib and /usr/include, so they are easy to find and use. When you install XCode, it installs SDKs for MacOS, iOS, iPadOS, watchOS, etc., with the option of installing lots of versions. The paths to the libs and includes are rather complicated, and you need a tool to find them.
  • In MacOS the program must start on a 4-byte boundary, hence the listing has an “.align 2” directive near the top.
  • In MacOS you need to link in the System library even if you don’t make a call into it, or you get a linker error. This sample Hello World program uses software interrupts to make the system calls rather than the API in the System library, and so shouldn’t need to link to it.
  • In MacOS the default entry point is _main whereas in Linux it is _start. This is changed via a command line argument to the linker.
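
To make the calling-convention differences concrete, here is a minimal sketch of how the same two system calls look on 64-bit ARM Linux, where the function number goes in X8 and write and exit are functions 64 and 93. It uses ADR so the addressing style matches the MacOS listing below:

```asm
// Linux AArch64 version for comparison:
// the function number goes in X8, write is 64, exit is 93
.global _start

_start: mov     X0, #1          // 1 = StdOut
        adr     X1, helloworld  // string to print (ADR works here too)
        mov     X2, #13         // length of our string
        mov     X8, #64         // Linux write system call
        svc     0               // Call Linux to output the string

        mov     X0, #0          // Use 0 return code
        mov     X8, #93         // Linux exit system call
        svc     0               // Call Linux to terminate the program

helloworld:     .ascii  "Hello World!\n"
```

Assemble and link this with the GNU tools on a 64-bit ARM Linux system (as -o HelloWorld.o HelloWorld.s, then ld -o HelloWorld HelloWorld.o); no library needs to be linked in, since the program talks to the kernel directly.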

Hello World Assembly File

Below is the simple Assembly Language program to print out “Hello World” in a terminal window. For all the gory details on these instructions and the architecture of the ARM processor, check out my book.

// Assembler program to print "Hello World!"
// to stdout.
// X0-X2 - parameters to MacOS function services
// X16   - MacOS function number
.global _start             // Provide program starting address to linker
.align 2                   // Make sure the entry point is properly aligned

// Setup the parameters to print hello world
// and then call MacOS to do it.

_start: mov     X0, #1          // 1 = StdOut
        adr     X1, helloworld  // string to print
        mov     X2, #13         // length of our string
        mov     X16, #4         // MacOS write system call
        svc     0               // Call MacOS to output the string

// Setup the parameters to exit the program
// and then call MacOS to do it.

        mov     X0, #0          // Use 0 return code
        mov     X16, #1         // Service code 1 terminates this program
        svc     0               // Call MacOS to terminate the program

helloworld:     .ascii  "Hello World!\n"


Here is the makefile. The command to assemble the source code is simple; the command to link is a bit more complicated.

HelloWorld: HelloWorld.o
	ld -macosx_version_min 11.0.0 -o HelloWorld HelloWorld.o -lSystem \
		-syslibroot `xcrun -sdk macosx --show-sdk-path` -e _start -arch arm64

HelloWorld.o: HelloWorld.s
	as -o HelloWorld.o HelloWorld.s

The xcrun command is Apple’s command to run or find the various SDKs. Here is a sample of running it:

stephensmith@Stephens-MacBook-Air-2 ~ % xcrun -sdk macosx --show-sdk-path
objc[42012]: Class AMSupportURLConnectionDelegate is implemented in both ?? (0x1edb5b8f0) and ?? (0x122dd02b8). One of the two will be used. Which one is undefined.
objc[42012]: Class AMSupportURLSession is implemented in both ?? (0x1edb5b940) and ?? (0x122dd0308). One of the two will be used. Which one is undefined.
objc[42013]: Class AMSupportURLConnectionDelegate is implemented in both ?? (0x1edb5b8f0) and ?? (0x1141942b8). One of the two will be used. Which one is undefined.
objc[42013]: Class AMSupportURLSession is implemented in both ?? (0x1edb5b940) and ?? (0x114194308). One of the two will be used. Which one is undefined.
stephensmith@Stephens-MacBook-Air-2 ~ %

After the ugly warnings from Objective-C, the path to the MacOS SDK is displayed.

Now we can compile and run our program.

stephensmith@Stephens-MacBook-Air-2 Chapter 1 % make -B
as -o HelloWorld.o HelloWorld.s
objc[42104]: Class AMSupportURLConnectionDelegate is implemented in both ?? (0x1edb5b8f0) and ?? (0x1145342b8). One of the two will be used. Which one is undefined.
objc[42104]: Class AMSupportURLSession is implemented in both ?? (0x1edb5b940) and ?? (0x114534308). One of the two will be used. Which one is undefined.
ld -macosx_version_min 11.0.0 -o HelloWorld HelloWorld.o -lSystem -syslibroot `xcrun -sdk macosx --show-sdk-path` -e _start -arch arm64
stephensmith@Stephens-MacBook-Air-2 Chapter 1 % ./HelloWorld 
Hello World!
stephensmith@Stephens-MacBook-Air-2 Chapter 1 %


The new Apple M1 Macintoshes are running ARM processors as part of all that Apple Silicon and you can run standard ARM 64-bit Assembly Language. LLVM is a standard open source development tool which contains an Assembler that is similar to the GNU Assembler. Programming MacOS is similar to Linux since both are based on Unix and if you are familiar with Linux, most of your knowledge is directly applicable.

Written by smist08

January 8, 2021 at 10:31 am

Technology Predictions for 2021

with one comment


Last week, we discussed some of the remarkable technologies from 2020. This time, we are going to look at some of the technology trends that will influence life in 2021. Everyone is hopeful that now that vaccines are starting to roll out, we can put the pandemic behind us. The pandemic will certainly continue to dominate life well into 2021, but hopefully as more people become vaccinated, the numbers will finally start to go down and the level of community infection will drop. As always, predictions are highly inaccurate and you should expect the unexpected; nevertheless, it is fun to speculate.

Continuing CPU Wars

Intel will continue to have trouble with their new chip process technologies, and as a result the wolves are circling. Both AMD and ARM will aggressively target Intel in all CPU categories, exploiting newer process technologies from TSMC and Samsung to produce faster chips with higher transistor counts that use less power. Apple will accelerate their transition from Intel to ARM, and it is likely they will finish it in 2021, a year ahead of schedule. Apple has proven the ARM processor is fully capable of competing in the laptop and desktop markets; expect other manufacturers to produce similar products running the ARM version of Windows, which Microsoft will put more effort into as it gains market share. Similarly, the Apple M1 brought the high bandwidth memory bus down from the expensive server market into the mainstream; expect better memory buses and technologies to start appearing in the consumer market. RISC-V will still struggle for relevance unless someone can produce a Raspberry Pi class board with similar performance for a similar price.

Self Driving Cars

There are dozens of companies working on self-driving cars. Honda plans to release a level 3 autonomous car by summer that will allow the driver to take their eyes off the road in a number of conditions. Already we are seeing self-driving taxis in some Asian cities. Expect to see more autonomous trucks on US highways, and expect the driver assist features in regular cars to become far more advanced this year, going beyond simple lane control on highways.

We won’t see mainstream flying cars or jetpacks; these will remain dangerous toys. But expect cars to drive themselves more and more, and suddenly we’ll wonder why we didn’t always let cars drive themselves.

Hackers and Virtual War

2020 is finishing with a major attack by a Russian state sponsored hacking group on the US government, infrastructure and corporations. We blogged on the ransomware attack against our own TransLink, which is just one of dozens of successful attacks on various companies. IT departments everywhere are now under constant attack from private freelance hacking groups, as in the TransLink case, but worse, they now face well funded state backed attacks from foreign countries, especially Russia, China, Iran and North Korea. All the labs developing COVID vaccines have been attacked as other countries try to steal their secrets for their own vaccine development.

In 2021, this is only going to get worse. The COVID pandemic left most IT departments short staffed, and this will continue. The US government hasn’t made any investments in strengthening cyber security, as any attempts to do so are stalled in a deadlocked Congress. Similarly, corporations have gone cheap on their defenses, as they see these as just a cost to be cut to increase profits.

There have been no real penalties for carrying out these attacks and this failure of law enforcement is going to embolden others to jump in and try to profit.

In 2021, things are going to get worse before they get better. Companies and governments will start to respond. Hopefully, the US can get past the dysfunction of the past four years and start to show some leadership again. Sadly, so far governments have been more interested in weakening encryption standards and providing backdoors for law enforcement. These moves make communications easier to hack: keeping backdoors secret from hackers never works; they always figure them out. Letting law enforcement eavesdrop on drug dealers sounds good, but the flip side is that it lets criminals eavesdrop on our banking transactions.

In many ways, warfare between nations has gone virtual. While the US continues to buy more planes and tanks, Russia and China are investing in cyberwarfare and so far they are winning.

Computers Get Faster and More Powerful

This is an easy prediction, as it is always the case. Expect more RAM, bigger SSDs and faster CPUs. Towards the end of 2021, I would expect a standard mid-range laptop or desktop to have 32Gig of DDR5 RAM, a 2TB SSD and a 10+ core CPU. Further, it will be standard to have GPU functionality built into the CPU chip. You can buy such systems now, but they are quite high end; just expect that as prices come down, what used to be high end becomes mainstream. There is currently a RAM surplus, so look for less expensive RAM and SSDs this spring.


These are a few of the technology trends to watch in 2021. Of course AI will continue to improve, but only look for incremental improvements. The Raspberry Pi will go fully 64-bit, but don’t look for a new Raspberry Pi 5 until 2022. Will there be something unexpected? Probably. As always, participating in the technology world is exciting.

Written by smist08

December 18, 2020 at 10:04 am

What’s Next for the Apple M2 ARM CPU

with 3 comments


Last week, Apple started shipping their new ARM M1 based Macintosh computers. I ordered a new MacBook Air and maybe I’ll get it before XMas. The demand for these new ARM based computers is high and they are selling like mad. The big attraction is that they have the computing power of top end Intel chips, but use a tenth of the power, leading to a new class of powerful laptops with battery life now measured in days rather than hours. With all the hype around the M1, people are starting to ask where Apple will go next? When will there be an M2 chip and what will it contain? Apple is a secretive company so all this speculation is mostly rumours. This article will look at my wish list and where I think things will go.

First, there will be more M1 based Macs early next year. Expect higher end MacBook Pros; these won’t have a completely new M2, more likely an M1X with either more CPU or GPU cores and higher memory options. I expect the real M2 will come out towards next XMas as Apple prepares all their new products for the next holiday shopping season.

The Chip Manufacturing Process

The current M1 CPU is manufactured using TSMC’s 5nm process. TSMC recently completed their 3nm fabrication facility (at least the building). The expectation is that the next generation of Apple’s iPhone, iPad and Mac chips will be created using this process. With this size reduction, Apple will be able to fit 1.67 times as many transistors on a chip with the same form factor and power. Compare this to Intel, which has been having trouble making the transition from 14nm to 10nm over the last few years. Of course AMD also uses TSMC to manufacture their chips, so there could be competitive AMD chips, but reaching the power efficiency of an ARM CPU is extremely difficult.

Samsung manufactures most of its chips using 8nm technology and is investing heavily trying to catch up to TSMC, hoping to get some of Apple and AMD’s business back. I don’t think Samsung will catch up in 2021 but beyond 2021, the competition could heat up and we’ll see even faster progress.

More Cores

The most obvious place to make use of all these extra transistors is in placing more CPU, GPU or AI processor cores on the chip. The M1 has 8 CPU cores, 8 GPU cores and 16 AI Processor cores. Apple could add to any of these. If they want a more powerful gaming computer, then adding GPU cores is the obvious place. I suspect 8 CPU cores is sufficient for most laptop workloads, but with more GPU cores, they could start being competitive with top of the line nVidia and AMD GPUs. The AI processing cores are interesting and are being used more and more, so they are another candidate for additional cores.

Apple is continually profiling how their processor components are used by applications and will be monitoring which parts of the system are maxed out and which remain mostly idle. Using this information they can allocate more processing cores to the areas that need it most.

More Memory

The current M1 chips come with either 8 or 16 GB of RAM. I suspect this is only a limitation of trying to get some systems shipping in 2020, and that there will be higher memory M1 chips sooner rather than later. For the M2 chip, I don’t think we really need an 8GB model anymore; if there are two sizes, they should be 16 and 32 GB. Further, with high end graphics using a lot of memory, a good case can be made for 64 GB even in a laptop.

More and Faster Ports

The first few ARM based Mac computers have 2 USB 4 ports and one video port. There has been a lot of complaining about this, but it is a bit misleading because you can add hubs to these ports. It has been demonstrated that you can actually connect 6 monitors to the video out using a hub. Similarly, you can connect a hub and have any number of ports. I’m not sure if Apple will add more ports back, and either way I’m not too worried about it.

The good thing is that USB 4 is fast and it makes connecting an external drive (whether SSD or mechanical) more practical for general use. Of course making the ports even faster next time around would be great.

General Optimizations

Each year, ARM improves their CPU cores and Apple incorporates these improvements. The optimizations could be to the pipeline processing, improved algorithms for longer running operations, better out of order execution, security improvements, etc. There are also newer instructions and functionality incorporated. Apple takes all these and adds their own improvements as well. We’ve seen this year over year as the performance of the ARM processors has improved so much in the iPhones and iPads. This will continue, and this alone should yield a 30% or so performance improvement.

More Co-processors

The M1 chip is more than a multi-core ARM CPU. It includes all sorts of co-processors like the GPU cores and AI processing. It includes the main RAM, memory controller, a security processor and support for specialty things like video decoding. We don’t know what Apple is working on, but they could easily use some fraction of their transistor budget to add new specialty co-processors. Hopefully whatever they do add is open for programmers to take advantage of and not proprietary and only used by the operating system.


The Apple M1 Silicon is a significant first milestone. Everyone is excited to see where Apple will go with this. Apple has huge resources to develop these chips going forwards. The R&D Apple puts into Apple Silicon benefits all their businesses from the Apple Watch to the iPad, so they are fully committed to this. I’m excited to see what the next generation chips will be able to do, though I’m hoping to use my M1 based MacBook for 8 years, like I did with my last MacBook.

If you are interested in the M1 ARM processor and want to learn more about how it works internally, then consider my book: Programming with 64-Bit ARM Assembly Language.

Written by smist08

November 27, 2020 at 10:56 am

Apple M1 Unified Memory

leave a comment »


I recently upgraded three 2008 MacBook Pros from 1Gig to 4Gig of RAM. It was super-easy: you remove the battery (accessible via a coin slot), remove a small cover over the RAM and hard drive, then pop out the old RAM and push in the new. Upgrading the hard drive or RAM on these old laptops is straightforward and anyone can do it. Newer MacBooks require partial disassembly, which makes the process harder. For the newest ARM based MacBooks, upgrading is impossible. So, do we gain anything for this lack of upgradeability?

This article looks at Apple’s new unified memory architecture, which they claim gives large performance gains. Apple hasn’t released a lot of in depth technical details on the M1 chip, but from what they have released, and now that people have received these units and performed real benchmarks, we can see that Apple really does have something here.

Why Would We Want To Upgrade?

In the case of the 2008 MacBook Pro, when it was new, 4Gig was expensive. Now 4Gig of DDR2 memory is $10, so it makes total sense to upgrade to the maximum memory. Similarly, the MacBook came with a mechanical hard drive which is quite small and slow by modern standards. It was easy to upgrade these to larger, faster SSD drives for around $40 each. It is often the case that the maximum configuration is too expensive at the time of the original purchase, but becomes much cheaper a few years later. Performing these upgrades then lets you get quite a few more years of service out of your computer. The 2008 MacBook Pros upgraded to maximum configuration are still quite usable computers (of course you have to run Linux on them, since Apple software no longer supports them).

Enter the New Apple ARM Based Macintoshes

The newly released MacBooks based on ARM System on a Chips (SoCs) have their RAM integrated into their CPU chips. This means that unless you can replace the entire CPU, you can’t upgrade the RAM. Apple claims integrating the memory into the CPU gives them a number of performance gains, since the memory is high speed, very close to all the devices and shared by all the devices. A major bottleneck in modern computer systems is moving data between memory and the CPU or copying data from the CPU’s memory to the GPU’s memory.

AMD and nVidia graphics cards contain their own memory, separate from the memory used by the CPU. So a modern gaming computer might have 16Gig of RAM for the CPU and then 8Gig of RAM for the GPU. If you want the GPU to perform a matrix multiplication, you need to transfer the matrices to the GPU, tell it to multiply them and then transfer the resulting matrix back to the CPU’s memory. nVidia and AMD claim this is necessary since they incorporate newer, faster memory in their GPUs than is typically installed on the CPU motherboard: most CPUs currently use DDR4 memory, whereas GPUs typically incorporate faster GDDR6 memory. There are GPUs (like the Raspberry Pi’s) that share CPU memory, however these tend to be lower end (cheaper, since they don’t have their own memory) and slower, since there is more contention for the CPU memory.

The Apple M1 tries to address these problems by incorporating the fastest memory and providing much wider bandwidth between the memory and the various processors on the M1 chip. For the M1 there isn’t just the GPU, but also a Neural Engine for AI processing (which is similar to a GPU), as well as other units for specialized functions like data encryption and video decoding. Most newer computers have a 64-bit memory controller that can move 64 bits of data between the CPU and RAM at the speed of the RAM; sometimes the RAM is as fast as the CPU, sometimes it’s a bit slower. Newer CPUs have large caches to try to save on some of this transfer, but the caches are measured in MegaBytes whereas main memory is in GigaBytes. Separate GPU memory helps by having a completely separate memory controller; expensive servers help by having multiple memory controllers. Apple’s block diagrams seem to indicate they have two 64-bit memory controllers, or parallel pathways to main memory, but this is a bit hypothetical. As people benchmark these new computers, it does appear that Apple has made some significant performance improvements.


If Apple has greatly reduced the memory bottleneck, and if having the GPU, Neural Engine and CPU all access the same memory doesn’t cause too much contention, then avoiding the copying of data between the processing units will be a big advantage. On the downside, you should overbuy on memory now, since you can’t upgrade it later.

If you are interested in the M1 ARM processor and want to learn more about how it works internally, then consider my book: Programming with 64-Bit ARM Assembly Language.

Written by smist08

November 20, 2020 at 1:09 pm