Stephen Smith's Blog

Musings on Machine Learning…

Archive for July 2020

Is Apple Silicon Really ARM?

with one comment

Introduction

Apple recently announced they were transitioning their line of Mac computers from using Intel CPUs to what they call Apple Silicon. This led to a lot of confusion, since most people expected them to announce a transition to the ARM CPU. The confusion arises from Apple’s market-speak, where everything has to be Apple this or Apple that. Make no mistake: the heart of Apple Silicon is the ARM CPU, but Apple Silicon refers to Apple’s System on a Chip (SoC), which includes a number of additional components along with the ARM CPUs. In this blog we’ll look at what Apple Silicon really is and discuss some of the current misconceptions around it.

ARM at the Core

My book “Programming with 64-Bit ARM Assembly Language” includes examples of adding ARM Assembly Language to Apple iOS apps for the iPhone and iPad. The main processors in iPhones and iPads have now been redesignated as Apple Silicon chips, and the SoC used in the latest iPad is being used in Apple’s developer preview hardware, a modified Mac Mini.

In fact, one of my readers, Alex vonBelow, converted all the source code for my book to run on the prototype Apple hardware. The Github repository of his work is available here. Again, this all demonstrates that ARM is at the center and is the brains behind Apple Silicon.

Other bloggers have claimed that Apple Silicon does not in fact use ARM processors, since Apple customizes their ARM CPUs rather than using off-the-shelf designs from ARM Holdings. This is true, but in fact most ARM licensees do the same thing, adding their own optimizations to gain competitive advantage. The key point is that Apple licenses the ARM CPU ISA (Instruction Set Architecture), which is the format and syntax of all the Assembly Language instructions the CPU processes, and then Apple uses ARM’s software/hardware verification suite to ensure their designs produce correct results.

These other bloggers claim that Apple may stray from the ARM ISA in the future, which of course is possible, but highly unlikely. One of the keys to Apple’s success is the way they leverage open source software for much of both their operating system and development tools. Both macOS and iOS are built on a kernel combining the open source Mach kernel from Carnegie Mellon University with components from BSD Unix. The Xcode development environment looks Apple proprietary, but to do the actual compiling it uses open source tools, originally GCC and now LLVM. Apple has been smart to concentrate their programming resources on areas where they can differentiate themselves from the competition, for instance making everything easier to use, and to use open source components for everything else. If Apple strays from the ARM ISA, then they have to take all this on themselves and will end up like Microsoft with the Windows 10 mess, where they can’t compete effectively with the much larger open source community.

Apple Silicon is an SoC

Modern integrated circuits can contain billions of transistors on a single small chip. This means a single IC has room to hold multiple CPU cores along with all sorts of other components. Early PCs contained hundreds of ICs; nowadays they contain a handful, with most of the work being done by a single chip. This has been key to cell phones, allowing them to be such powerful computers in such a small package. Similarly, the $35 Raspberry Pi is a credit card sized computer where most of the work is done by a single SoC.

Let’s look at what Apple is cramming into their Apple Silicon SoC.

  1. A number of high performance/high power usage ARM CPU cores.
  2. A number of low power usage/lower performance ARM CPU cores.
  3. A Graphics Processing Unit (GPU).
  4. An Artificial Intelligence Processing Unit.
  5. An audio DSP processor.
  6. Power/performance management.
  7. A cryptographic acceleration unit.
  8. Secure boot controller.

For some reason, Apple marketing doesn’t like to mention ARM, which is too bad. But any programmer knows that to program for Apple Silicon, your compiler has to generate ARM 64-bit instructions. Similarly, if you want to debug a program for iPads, iPhones or the new Macs, then you have to use an ARM capable debugger such as GDB and you have to be able to interpret ARM instructions, ARM registers and ARM memory addressing modes.

It isn’t unusual for marketing departments to try to present technical topics to the general population in a way that doesn’t make sense to programmers. Programmers have to stick to reality, or their programs won’t work, so they learn to ignore most of what comes out of marketing departments. If you attended Apple’s WWDC a few months ago, you could see the programmers struggling to stick to the marketing message and every now and then having to mention ARM processors.

Summary

The transition of Macs from Intel to Apple Silicon is an exciting one, but don’t be fooled by the marketing spin: this is a transition from Intel to ARM. This is Apple going all-in on the ARM processor, using the same technology to power all their devices, including iPhones, iPads and Mac computers.

If you want to learn more about the ARM processor and programming the new Apple devices, check out my book: Programming with 64-Bit ARM Assembly Language. It is available directly from Apress, as well as from all the main booksellers.

Written by smist08

July 31, 2020 at 11:31 am

The Ups and Downs of Chip Manufacturing

with 4 comments

Introduction

Previously, we blogged about a number of successes in the ARM world: being adopted for the next generation Macs, moving into the server and supercomputer markets and of course continuing to dominate the mobile world. ARM is owned by the Japanese conglomerate holding company Softbank. ARM is Softbank’s big success, balanced with a number of major failures such as WeWork. With ARM’s current success, Softbank is considering whether this is a good time to sell off ARM at a big profit.

Meanwhile, Intel just reported their quarterly earnings and along with those, announced their 7nm next generation manufacturing process has been delayed until 2022. As a result Intel’s stock price took a major haircut and it seems Intel is even considering outsourcing their manufacturing.

In this article, we’ll look at the ramifications that we might see over the next year or so.

Who Might Buy ARM?

Softbank paid $32 billion to acquire ARM in 2016 and it is suspected that Softbank is trying to get in excess of $40 billion from the sale. Softbank is considering spinning ARM off as a separate company via an IPO, or selling it to either nVidia or Apple.

It is reported that Apple has already said that it isn’t interested in buying ARM. The main reason is that Apple only licenses the ARM ISA (Instruction Set Architecture) and not the CPU circuitry designs, so it doesn’t gain that much, and buying the company that designs the processors for all its competitors would attract anti-trust attention. Even if Apple had good intentions toward its competitors, it doesn’t want to deal with them, and owning ARM would force it to.

nVidia is the main interested party. nVidia develops graphics cards that are used for AI applications and playing video games. The explosion in AI has greatly benefited nVidia, but nVidia faces a struggle as its main competitors in the graphics/AI world, Intel and AMD, both make CPUs and have been including more and more graphics processing on the core CPU chips. nVidia would dearly like to enter the CPU market and be able to compete with the newer AMD CPU/GPU chips. Intel has struggled of late, but even their built-in Intel graphics are good enough for most people, meaning those people don’t feel they need an nVidia product. nVidia already has a lot of experience with ARM; we’ve blogged about their nVidia Jetson Nano, which is just one in a line of processor boards with integrated nVidia SIMD cores.

nVidia is large enough that they can afford to buy ARM, but the question is whether nVidia would be a good steward of the ARM technology portfolio. At least Softbank, being a holding company, has largely left ARM alone to do its thing. One question is how hands-on nVidia would be if they do acquire ARM, or whether they would allow it independence. Will they try to milk more revenue from ARM, raising prices for everyone? Will they force the inclusion of nVidia technology? For instance, ARM designs its own GPU, called Mali; will nVidia mandate this be replaced by nVidia GPU technology? Personally, I feel that the GPU is a weak spot for ARM and that a migration to the excellent nVidia graphics cores would be a big benefit. But this will take time, since most software expects Mali or one of its competitors.

If this deal does pass all the complicated regulatory and financial hurdles, only time will tell if this turns out to be a good thing for ARM.

Intel’s Manufacturing Problems

Intel has been struggling to release its 10nm based chips and they are just starting to come out. Meanwhile, AMD, Apple and others have been having their chips manufactured by TSMC using 7nm technology for almost a year now. Intel claims their 10nm technology is as good as TSMC’s 7nm process, but independent analysis shows that TSMC is beating Intel in the number of transistors per square millimeter (hence denser chips with more transistors), while using less power and generating less heat. Add to that Intel’s announcement that their 7nm technology has been delayed to 2022, while TSMC is starting to produce 5nm based chips now.

These problems caused Apple to accelerate their switch from Intel to ARM for their Mac computers. They have also resulted in tremendous growth for AMD, which produces Intel compatible chips using TSMC technology.

In my first job after university, I was sent on a two-week course at Intel in Santa Clara on using their embedded processors (at that time the 80186). The course started with an Intel marketing video on how Intel was centered on their chip manufacturing process technology, how this was their crown jewel and the core of everything they did. Everything else was based on their excellence in manufacturing chips. This certainly remained true for many years, but recently a chink has appeared in Intel’s armour as TSMC has passed Intel in process technology.

At Intel’s earnings call, CEO Bob Swan said the unthinkable: Intel is considering outsourcing some of its manufacturing to TSMC as well. Would this work for Intel? Will their chip designs, which are optimized for their own process technology, work on TSMC’s? Is it in TSMC’s interest to ramp up for Intel, when Intel is likely to take the manufacturing back in-house down the road? None of these questions were answered by Bob Swan on the call. Another question is whether Intel would exit the chip manufacturing business for good, and how that would affect Intel long term. Is Intel’s chip design capability competitive without a process technology advantage to help it? These are all hard questions, and Intel is going to have to find answers to all of them or face an accelerating decline over the next few years.

Summary

Just when ARM is on a roll, Softbank has thrown it a curveball by offering it up for sale. Whether anything comes of this remains to be seen; if it does happen, I think nVidia would be the best choice to acquire ARM and that long term it would be a good thing. nVidia has shown how to build excellent SoCs based on ARM processors and nVidia GPU cores with products like the Jetson Nano, and hopefully nVidia can be a good steward for ARM.

Intel certainly faces some challenges in the coming months. AMD is eating away at their market share and getting their new chips to market seems to be getting more and more challenging. Hopefully Intel can find a solution to their problems, but these things can take several years and billions in investment to turn around.

To learn more about the internal architecture of the ARM Processor, consider my book: Programming with 64-Bit ARM Assembly Language.

Written by smist08

July 24, 2020 at 11:35 am

Challenges for Many Core Processors

leave a comment »

Introduction

CPU designers and manufacturers such as Intel, AMD and ARM are relying on adding more and more CPU cores to each chip they manufacture. Each CPU core is in itself a complete CPU that can execute programs independently. AMD has 64-core Threadripper CPUs, there are now 128-core ARM CPUs, and Intel can go as high as 18 cores. In this article we’ll look at some of the challenges of getting the full benefit from all these processors.

Memory Bandwidth

Good DDR4 memory runs at 3.6GHz these days. Let’s consider our processor cores running at 2GHz. An ARM CPU core on average executes one instruction per clock cycle, hence at 2GHz it can execute 2 billion instructions per second. On an ARM processor, each instruction is 32-bits wide, so if you have a 64-bit memory bus, then 3.6GHz memory can deliver 7.2 billion instructions per second. Note that some lower end systems only have a 32-bit memory bus and hence have half this bandwidth. This is plenty for one core, but can only keep 3 or so cores busy, and this only counts fetching the instructions themselves, not the data they operate on.

All these processors are 64-bit, with 64-bit registers. Whenever they load or store a memory address, that requires moving a 64-bit quantity through the memory bus. The CPU can perform arithmetic on various size quantities, whether 8, 16, 32 or 64 bits, and all of these must be moved to and from memory. Further, all these processors have floating point coprocessors and some sort of SIMD unit, whether Intel’s AVX or ARM’s NEON. These can operate in parallel with the integer processing unit, if you can keep everything busy. All of this makes loading and saving data a huge bottleneck.

If you have 128 cores, each running at 2GHz, you need your memory running at 128GHz to keep all the cores busy, just running code. Obviously this is impossible so what do system designers do to make more than 2 or 3 cores useful?

  1. Each CPU core has a local cache of a few megabytes of data. This data can be accessed immediately and once loaded doesn’t require much interaction with the memory controller. If the cores can keep their working set within the cache then they are very efficient.
  2. The manufacturer of the System on a Chip or the PC motherboard designer can include more than one memory bus, many server systems have 5 or 6 independent channels to main memory.
  3. Programming discipline. When writing C or Assembly Language code, it is tempting to use all 64-bit quantities; after all, these are 64-bit processors and can perform 64-bit arithmetic in a single instruction cycle. The problem with this is the use of memory bandwidth. If you keep your integers to smaller sizes, this reduces the contention on the memory bus. This is why both Intel and ARM CPUs keep supporting instructions for smaller arithmetic sizes in their instruction sets. Good optimizing compilers are excellent at keeping data in registers and minimizing saving intermediate results to memory, so make sure you have all these options turned on, except while debugging.

Cache Consistency Protocol

Having a large CPU cache is touted as the solution to memory bandwidth problems. However, these introduce their own bottlenecks. When you write a value to memory and that value is in the cache, the individual CPU core updates its local cached value, but the cache isn’t necessarily written through to main memory right away. That is fine for an individual core, but the rule is that all cores have to see the same view of memory, you can’t have different CPU cores seeing different values at a given memory address.

There are quite a few different memory cache architectures as well as different protocols for maintaining cache consistency across all cores. A typical way of maintaining cache consistency is as follows:

  1. One core writes a new value to a memory address contained in its cache.
  2. The cache controller now checks to see if any other CPU has that memory address in its cache. If it isn’t in another cache, then the write is complete and the core continues processing.
  3. If the value is in another core’s cache then the CPU must first write the value to main memory and then send a notification to the other affected CPUs to invalidate that memory address in their cache, so next time it is read, it is read from main memory.
  4. Now the CPU core continues processing.

The advent of security vulnerabilities like Meltdown and Spectre, which use the caches as a side channel to leak data across security boundaries, has greatly reduced the performance of some of these schemes. Sometimes a new security problem can require some of the cache mechanisms to be disabled, badly affecting performance.

At some point cache contention becomes a problem and the circuitry to handle this is expensive.

Controlling Heat

Packing in 128 CPU cores, each with its own floating point unit and SIMD processor, puts a lot of circuitry on a single chip. Every active element on this chip generates heat that has to be dissipated. Chips control heat by slowing down when they get too hot, or by shutting down some of the CPU cores. Having 128 cores doesn’t help you if half of them are shut down to cool off, or if they are all running at quarter speed. One of the bottlenecks in the Raspberry Pi is that if you keep all four CPU cores busy, the system overheats and slows down. Heat is the big enemy of modern CPU design and an important reason why ARM has been so successful, but even though ARM does better than Intel or AMD, it still runs into heat dissipation problems. This is partly why server farms have huge air conditioning bills and why liquid cooling is often incorporated into the design.

ARM CPUs allow a combination of different ARM CPU cores on a single chip, a scheme ARM calls big.LITTLE. In cell phones, half the cores are high performance models that generate more heat and use more battery power, and half are slower but more power efficient models. It is then up to the operating system to manage process/thread scheduling to get a good balance of power and performance.

Operating Systems

Even if you solve the memory bandwidth problem, if you are running general processes like databases or web servers on all the cores, you are going to hit other bottlenecks accessing operating system services. For instance, when you access Linux services, there is going to be more contention on operating system memory and on resources such as SSD drives. Typically you only have one channel to these devices, and they will become a bottleneck very quickly. Interrupts are another problem, as these may lock things and are typically tied to a single core. There are experimental extensions to the Linux kernel to dynamically allocate interrupt processing to less busy cores, but these are still a way off from being incorporated into the mainline.

The most efficient use of a large number of cores is via specialized programs, typically supercomputing modelling systems. These are carefully crafted to avoid bottlenecks. Using a 128 core processor as a general purpose server may not give you the same boost.

Alternative Strategies

A few alternative strategies, like those you commonly see in high end graphics cards, include much more use of SIMD processing. If a group of cores is executing the same instruction, they can share the instruction fetch and reduce memory bandwidth. Including RAM as part of the CPU package is another approach to make memory access faster. There are a lot of innovative application specific solutions appearing in various AI processing and graphics chips. Slowly, some of these will be incorporated into mainstream CPUs.

Summary

The prospect of having a workstation with 128 cores is exciting, but to fully utilize this power you will need expensive liquid cooling and expensive memory with multiple buses, along with a number of other high performance components. This is why these systems are expensive and why discount systems, even with a lot of cores, typically don’t perform well in real benchmarks. AMD and ARM have processors on the market now with 64+ cores, and the challenge for system designers over the next year is to solve all these bottlenecks while maintaining the security of each core’s data.

To learn more about the internal architecture of the ARM Processor, consider my book: Programming with 64-Bit ARM Assembly Language.

Written by smist08

July 17, 2020 at 11:35 am

Posted in Business


5 Best Word Processors for Writers

leave a comment »

The Write Cup

By Jeff Hortobagyi, Cathalynn Cindy Labonte-Smith, Elizabeth Rains & Stephen Smith

Introduction

The market for word processors has remained fairly static since MS Word was released as Multi-Tool Word Version 1.0 in 1983; it has dominated the word-processing market ever since. Its main competitor was WordPerfect, but that program soon fizzled out and became a minor player. There are still loyal WordPerfect users out there, and there’s a WordPerfect Office Professional 2020 suite available, but at over $500 it has priced itself out of the market. MS Word remains the heavy hitter in the word-processing world and it’s affordable at $6.40/month for the entire MS Office package, but an increasing number of free apps keep driving down its price.

In 2006, Google Docs came along and changed the way people worked. No longer did authors need to print out and make paper copies to make redlines. No longer did they need to attach large…

View original post 4,039 more words

Written by smist08

July 10, 2020 at 9:10 pm

Posted in Uncategorized

Fallout From ARM’s Success

with 2 comments

Introduction

Last time, we talked about a number of ARM’s recent successes. This time we’ll discuss a few of the consequences for the rest of the industry. Many people are discussing the effect on Intel and AMD, but probably a bigger victim of the ARM steamroller is RISC-V, the open source processor.

Trouble for Intel’s Profits

This past year wasn’t a good one for Intel. They’ve been having trouble keeping up in chip manufacturing technology. Most other vendors outsource their chip manufacturing to TSMC, Samsung and a couple of others. What has happened is that TSMC is so large that it is out-spending Intel on R&D by orders of magnitude and as a result is years ahead of Intel in chip technology. The big winners in this are AMD and ARM, whose chips are now denser, faster and more power efficient than Intel’s. AMD gave up manufacturing its own chips some years ago and ARM never manufactured chips itself.

Better chip manufacturing technology allows AMD and ARM to fit more processing cores on each chip or produce products in smaller form factors.

Intel’s main problem this past year has been AMD which has been chipping away at their market share. Now with Apple switching to ARM processors, this could be the start of a migration away from Intel. Microsoft already has an ARM version of their Surface notebook running a limited version of Windows, but they could easily produce something more powerful running a full version of Windows. Similarly, other manufacturers, such as Dell or HP could start producing ARM based laptops and workstations running Linux.

Although AMD doesn’t have Intel’s manufacturing problems, it does carry the burden of requiring all its chips to support every instruction introduced into the x86/x64 architecture over the many years of its existence. Modern x86 chips run RISC cores internally, but have to translate the old CISC instructions into RISC instructions as they run. This extra layer is required to keep all those old DOS and Windows programs running, many of which are no longer supported but are still used by many people. Both Intel and AMD are at a competitive disadvantage to ARM and RISC-V, who don’t need to waste circuitry doing this, and extra circuitry means higher power consumption and heat production.

Today Intel’s most profitable chips are its data center focused Xeon processors. These are powerful multi-core chips, but with more and more cores being added to ARM processors, even here ARM is starting to chip away at Intel.

RISC-V is having Trouble Launching

I’ve blogged on RISC-V processors a couple of times; RISC-V is an open source hardware specification, so you can develop a processor without paying royalties or fees to any other company. Anyone can manufacture an ARM-compatible processor, but if they use the ARM instruction set, they need to pay royalties to ARM Holdings. The hope of the RISC-V folks was to stimulate competitive innovation and produce lower cost, more powerful processors.

The reality has been that companies designing RISC-V chips can’t get orders to manufacture in the volume they need to be price competitive.

RISC-V is still ticking along, but it is limited to the following applications:

  • Providing low cost processors for the Arduino market, usually 32-bit processors with a few megabytes of memory.
  • Producing specialty chips for things like AI processors. Again this is having trouble getting going due to low volumes.
  • Manufacturers like Western Digital using them as embedded processors in their products, like WD’s disk controllers.

What RISC-V really needs is a Single Board Computer (SBC) like the Raspberry Pi, meaning one with comparable performance and price. It also needs to run Linux in a stable, supported way. Without this there won’t be any software development and RISC-V won’t be able to gain any sort of foothold. Doing this will be extremely difficult given how powerful and cheap the current crop of ARM based SBCs is. The level of software support for ARM in the Linux world is phenomenal.

Summary

ARM certainly isn’t going to eradicate Intel and AMD anytime soon. But even a small dent in their sales can send their stock price into a tailspin. Investors are going to have to watch the trends very closely, in case they need to bail. RISC-V will continue to have difficulty gaining acceptance, and manufacturing a competitive chip. More companies will adopt ARM and this will increase its competitive advantage. Here ARM’s strategy of licensing designs rather than chips is really paying off in fielding more and more competition for its rivals. Next year will be a very good one for ARM and likely an even tougher year for Intel.

The main conclusion here is that if you are a programmer, you should have a look at ARM and a good way to learn about it is to study its Assembly Language, perhaps by reading my book: “Programming with 64-Bit ARM Assembly Language”.

Written by smist08

July 3, 2020 at 11:23 am

Posted in Business
