Stephen Smith's Blog

Musings on Machine Learning…

Archive for November 2020

What’s Next for the Apple M2 ARM CPU

Introduction

Last week, Apple started shipping their new ARM M1 based Macintosh computers. I ordered a new MacBook Air and maybe I’ll get it before XMas. The demand for these new ARM based computers is high and they are selling like mad. The big attraction is that they have the computing power of top end Intel chips, but use a tenth of the power, leading to a new class of powerful laptops with battery life now measured in days rather than hours. With all the hype around the M1, people are starting to ask where Apple will go next. When will there be an M2 chip and what will it contain? Apple is a secretive company, so most of this speculation is rumour. This article will look at my wish list and where I think things will go.

First, there will be more M1 based Macs early next year. Expect higher end MacBook Pros. These won’t have a completely new M2, but more likely an M1X with more CPU or GPU cores and higher memory options. I expect the real M2 will come out towards next XMas as Apple prepares all their new products for the next holiday shopping season.

The Chip Manufacturing Process

The current M1 CPU is manufactured using TSMC’s 5nm process. TSMC recently completed their 3nm fabrication facility (at least the building). The expectation is that the next generation of Apple’s iPhone, iPad and Mac chips will be created using this process. With this size reduction, Apple will be able to fit 1.67 times as many transistors on the chip using the same form factor and power. Compare this to Intel which has been having trouble making the transition from 14nm to 10nm over the last few years. Of course AMD also uses TSMC to manufacture their chips, so there could be competitive AMD chips, but reaching the same power utilization as an ARM CPU is extremely difficult.
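To put the 1.67 figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes Apple’s publicly stated count of 16 billion transistors for the M1 and simply applies the density multiplier mentioned above. The function name is made up for illustration, and a real next-generation chip could of course use a different die size.

```python
# Back-of-the-envelope transistor budget after a process shrink.
M1_TRANSISTORS = 16_000_000_000  # Apple's stated transistor count for the M1
DENSITY_GAIN = 1.67              # density multiplier quoted in the text above


def projected_transistors(current, density_gain):
    """Transistors that fit in the same die area after the shrink."""
    return round(current * density_gain)


if __name__ == "__main__":
    budget = projected_transistors(M1_TRANSISTORS, DENSITY_GAIN)
    extra = budget - M1_TRANSISTORS
    print(f"Projected budget: {budget / 1e9:.1f} billion transistors")
    print(f"Extra to spend:   {extra / 1e9:.1f} billion transistors")
```

The "extra to spend" number is the budget the following sections are really about: more cores, more co-processors, or bigger caches.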

Samsung manufactures most of its chips using 8nm technology and is investing heavily trying to catch up to TSMC, hoping to get some of Apple and AMD’s business back. I don’t think Samsung will catch up in 2021 but beyond 2021, the competition could heat up and we’ll see even faster progress.

More Cores

The most obvious place to make use of all these extra transistors is in placing more CPU, GPU or AI processor cores on the chip. The M1 has 8 CPU cores, 8 GPU cores and 16 AI Processor cores. Apple could add to any of these. If they want a more powerful gaming computer, then adding GPU cores is the obvious place. I suspect 8 CPU cores is sufficient for most laptop workloads, but with more GPU cores, they could start being competitive with top of the line nVidia and AMD GPUs. The AI processing cores are interesting and are being used more and more.

Apple is continually profiling how their processor components are used by applications and will be monitoring which parts of the system are maxed out and which remain mostly idle. Using this information they can allocate more processing cores to the areas that need it most.

More Memory

The current M1 chips come with either 8 or 16 GB of RAM. I suspect this is only a limitation of trying to get some systems shipping in 2020 and that there will be higher memory M1 chips sooner rather than later. For the M2 chip, I don’t think we really need an 8GB model anymore and if there are two sizes it should be 16 or 32 GB. Further, with high end graphics using a lot of memory, a good case for 64 GB can be made even for a laptop.

More and Faster Ports

The first few M1 Mac computers have 2 USB 4 ports and one video port. There has been a lot of complaining about this, but it is a bit misleading because you can add hubs to these ports. It has been demonstrated that you can actually connect 6 monitors to the video out using a hub. Similarly you can connect a hub and have any number of ports. I’m not sure if Apple will add more ports back and either way I’m not too worried about it.

The good thing is that USB 4 is fast and it makes connecting an external drive (whether SSD or mechanical) more practical for general use. Of course making the ports even faster next time around would be great.

General Optimizations

Each year, ARM improves their CPU cores and Apple incorporates these improvements. The optimizations could be to the pipeline processing, improved algorithms for longer running operations, better out of order execution, security improvements, etc. There are also newer instructions and functionality incorporated. Apple takes all these and adds their own improvements as well. We’ve seen this year over year as the performance of the ARM processors has improved so much in the iPhones and iPads. This will continue and this alone will yield a 30% or so performance improvement.

More Co-processors

The M1 chip is more than a multi-core ARM CPU. It includes all sorts of co-processors like the GPU cores and AI processing. It includes the main RAM, memory controller, a security processor and support for specialty things like video decoding. We don’t know what Apple is working on, but they could easily use some fraction of their transistor budget to add new specialty co-processors. Hopefully whatever they do add is open for programmers to take advantage of and not proprietary and only used by the operating system.

Summary

The Apple M1 Silicon is a significant first milestone. Everyone is excited to see where Apple will go with this. Apple has huge resources to develop these chips going forwards. The R&D Apple puts into Apple Silicon benefits all their businesses from the Apple Watch to the iPad, so they are fully committed to this. I’m excited to see what the next generation chips will be able to do, though I’m hoping to use my M1 based MacBook for 8 years, like I did with my last MacBook.

If you are interested in the M1 ARM processor and want to learn more about how it works internally, then consider my book: Programming with 64-Bit ARM Assembly Language.

Written by smist08

November 27, 2020 at 10:56 am

Apple M1 Unified Memory

Introduction

I recently upgraded three 2008 MacBook Pros from 1Gig to 4Gig of RAM. It was super-easy: you remove the battery (accessible via a coin), remove a small cover over the RAM and hard drive, then pop out the old RAM and push in the new modules. Upgrading the hard drive or RAM on these old laptops is straightforward and anyone can do it. Newer MacBooks require partial disassembly, which makes the process harder. For the newest ARM based MacBooks, upgrading is impossible. So, do we gain anything for this lack of upgradeability?

This article looks at Apple’s new unified memory architecture, which they claim gives large performance gains. Apple hasn’t released a lot of in-depth technical details on the M1 chip, but from what they have released, and now that people have received these units and performed real benchmarks, we can see that Apple really does have something here.

Why Would We Want To Upgrade?

In the case of the 2008 MacBook Pro, when it was new, 4Gig was expensive. Now 4Gig of DDR2 memory is $10. It makes total sense to upgrade to maximum memory. Similarly, the MacBook came with a mechanical hard drive which is quite small and slow by modern standards. It was easy to upgrade these to larger, faster SSD drives for around $40 each. It is often the case that the maximum configuration is too expensive at the time of the original purchase, but becomes much cheaper a few years later. Performing these upgrades then lets you get quite a few years more service out of your computer. The 2008 MacBook Pros upgraded to maximum configuration are still quite usable computers (of course you have to run Linux on them, since Apple software no longer supports them).

Enter the New Apple ARM Based Macintoshes

The newly released MacBooks based on ARM Systems on a Chip (SoCs) have their RAM integrated into their CPU chips. This means that unless you can replace the entire CPU, you can’t upgrade the RAM. Apple claims integrating the memory into the CPU gives them a number of performance gains, since the memory is high speed, very close to all the devices and shared by all the devices. A major bottleneck in modern computer systems is moving data between memory and the CPU or copying data from the CPU’s memory to the GPU’s memory.

AMD and nVidia graphics cards contain their own memory separate from the memory used by the CPU. So a modern gaming computer might have 16Gig of RAM for the CPU and then 8Gig of RAM for the GPU. If you want the GPU to perform a matrix multiplication you need to transfer the matrices to the GPU, tell it to multiply them and then transfer the resulting matrix back to the CPU’s memory. nVidia and AMD claim this is necessary since they incorporate newer faster memory in their GPUs than is typically installed on the CPU motherboard. Most CPUs currently use DDR4 memory whereas GPUs typically incorporate faster GDDR6 memory. There are GPUs (like the Raspberry Pi’s) that share CPU memory, however these tend to be lower end (cheaper since they don’t have their own memory) and slower since there is more contention for the CPU memory.
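To get a feel for the copying overhead in the matrix multiplication case, here is a small Python sketch that just counts the bytes that cross the bus. It assumes square matrices of 4-byte floats, and the function names are invented for this illustration. It says nothing about how long the copies take, only how much data has to move.

```python
BYTES_PER_FLOAT32 = 4


def discrete_gpu_copy_bytes(n, bytes_per_element=BYTES_PER_FLOAT32):
    """Bytes copied to multiply two n x n matrices on a GPU with its own memory."""
    matrix_bytes = n * n * bytes_per_element
    upload = 2 * matrix_bytes   # both input matrices go to GPU memory
    download = matrix_bytes     # the result comes back to CPU memory
    return upload + download


def shared_memory_copy_bytes(n, bytes_per_element=BYTES_PER_FLOAT32):
    """With memory shared by CPU and GPU, the matrices are used in place."""
    return 0


if __name__ == "__main__":
    for n in (1024, 4096):
        mib = discrete_gpu_copy_bytes(n) / 2**20
        print(f"{n} x {n}: {mib:.0f} MiB copied with separate GPU memory, "
              f"{shared_memory_copy_bytes(n)} with shared memory")
```

For small matrices the copy is negligible, but it grows with the square of the matrix size, which is why avoiding it is attractive for graphics and AI workloads.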

The Apple M1 tries to address these problems by incorporating the fastest memory and then providing a much wider bandwidth between the memory and the various processors on the M1 chip. For the M1 there isn’t just the GPU, but also a Neural Engine for AI processing (which is similar to a GPU) as well as other units for specialized functions like data encryption and video decoding. Most newer computers have a 64-bit memory controller that can move 64 bits of data between the CPU and RAM at the speed of the RAM. Sometimes the RAM is as fast as the CPU; sometimes it’s a bit slower. Newer CPUs have large caches to try to save on some of this transfer, but the caches are measured in megabytes whereas main memory is measured in gigabytes. Separate GPU memory helps by having a completely separate memory controller, and expensive servers help by having multiple memory controllers. Apple’s block diagrams seem to indicate they have two 64-bit memory controllers or parallel pathways to main memory, but this is a bit hypothetical. As people are benchmarking these new computers, it does appear that Apple has made some significant performance improvements.
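The effect of a second memory controller is easy to estimate. The Python sketch below computes theoretical peak bandwidth from the bus width and the transfer rate. The rate of 3,200 million transfers per second is the rate of common DDR4-3200 memory and is chosen only for illustration; it is not a claim about the M1’s actual memory, and the function name is hypothetical.

```python
def peak_bandwidth_gb_per_s(bus_width_bits, transfers_per_second, controllers=1):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * transfers_per_second * controllers / 1e9


if __name__ == "__main__":
    rate = 3_200_000_000  # assumed transfer rate, for illustration only
    one = peak_bandwidth_gb_per_s(64, rate)
    two = peak_bandwidth_gb_per_s(64, rate, controllers=2)
    print(f"One 64-bit controller:  {one:.1f} GB/s")
    print(f"Two 64-bit controllers: {two:.1f} GB/s")
```

Real sustained bandwidth is always lower than this peak, but the doubling from a second controller carries over.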

Summary

If Apple has greatly reduced the memory bottleneck and having the GPU, Neural Engine and CPU all accessing the same memory doesn’t cause too much contention, then saving the copying of data between the processing units will be a big advantage. On the downside, you should overbuy on the memory now, since you can’t upgrade it later.

If you are interested in the M1 ARM processor and want to learn more about how it works internally, then consider my book: Programming with 64-Bit ARM Assembly Language.

Written by smist08

November 20, 2020 at 1:09 pm

Apple Mac ARM M1 Competitive Trade-offs

Introduction

Earlier this week, Apple started their transition from using Intel CPUs to ARM CPUs in their Macintosh laptop and desktop computers. Although the actual computers don’t ship for a week or two, there has been a lot of press coverage comparing these to various Intel/AMD offerings. In this article, we’ll look at what Apple is trying to accomplish with this transition. Note that Apple pushed out three computers to fulfill their promise of shipping ARM based Macs before the end of the year, so there is lots of room for gaps in the current offerings to be filled early in 2021.

It’s All About Power and Heat

iPhones and iPads are powerful computers in their own right, and they need to run for days without recharging. Compare that to top of the line gaming laptops that have trouble running for more than an hour or two without recharging. Apple realizes that we live in a mobile world and don’t like being a slave to power cords. With the ARM processor, Apple realized that they now have the performance of Intel/AMD chips, but at a fraction of the power. Hence, the new M1/ARM based Apple Laptops can run for possibly 24 hours of use between charges. This means, if you are ever allowed to fly to Australia again, you can work for the entire 16 hour flight without a worry.

The M1 ARM CPU used in these new computers utilizes the ARM big.LITTLE architecture, which allows the mixing of high power/low efficiency cores with low power/high efficiency cores on the same chip. MacOS Big Sur has been tuned to schedule threads to the appropriate type of core for the job that needs to be done. The M1 has four high power cores and four low power cores. Intel is considering introducing big.LITTLE chips, but if they do, they will need to get both Windows and Linux to add support for this. Until then, it will do more harm than good.
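As a toy illustration of the idea, the Python sketch below places tasks on one of two pools of four cores. This is not how MacOS actually schedules threads. The `latency_sensitive` flag and the round-robin placement are assumptions made up for this example; a real scheduler uses much richer signals.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    latency_sensitive: bool  # hypothetical hint supplied by the application


def pick_core_type(task):
    """Demanding work goes to the big cores, background work to the small ones."""
    return "performance" if task.latency_sensitive else "efficiency"


def schedule(tasks, performance_cores=4, efficiency_cores=4):
    """Place each task round-robin on a core of the chosen type."""
    counts = {"performance": performance_cores, "efficiency": efficiency_cores}
    next_index = {"performance": 0, "efficiency": 0}
    placement = {}
    for task in tasks:
        kind = pick_core_type(task)
        placement[task.name] = (kind, next_index[kind] % counts[kind])
        next_index[kind] += 1
    return placement


if __name__ == "__main__":
    jobs = [Task("video export", True), Task("mail sync", False), Task("game", True)]
    for name, (kind, core) in schedule(jobs).items():
        print(f"{name}: {kind} core {core}")
```

The point is only that the operating system has to know about both kinds of core, which is why an unmodified Windows or Linux scheduler would not get the benefit.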

The M1 ARM CPU is manufactured using TSMC’s 5nm process technology which allows record numbers of transistors on a chip. As a consequence Apple has combined the CPUs, GPUs, memory and other co-processors all onto a single chip. This is a potential problem as the more functions of the chip in use at once, the more heat will be produced. The MacBook Air doesn’t even have a fan, so it will be interesting to see how it manages under heavy load. If you run a workload that utilizes all eight cores, is graphics intensive and performs AI computations, then that will be a lot of transistors in use at once.

All modern chips monitor their temperature and if it gets too high then they throttle down the speed. The higher the clock rate of a CPU, the higher the power consumed and the more heat produced. Thus throttling down the CPU is a good method to let a CPU cool off.
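A minimal sketch of that feedback loop in Python might look like the following. Every number here (temperature limit, clock step, minimum and maximum clock) is a made-up placeholder, not a value from any real chip, and real throttling logic lives in firmware and is far more sophisticated.

```python
def next_clock_mhz(temp_c, clock_mhz, limit_c=95.0,
                   step_mhz=200, min_mhz=600, max_mhz=3200):
    """Step the clock down when too hot, otherwise step back up towards the maximum."""
    if temp_c > limit_c:
        return max(min_mhz, clock_mhz - step_mhz)
    return min(max_mhz, clock_mhz + step_mhz)


if __name__ == "__main__":
    clock = 3200
    # Hypothetical temperature readings taken at regular intervals.
    for temp in (90.0, 97.0, 99.0, 96.0, 93.0, 88.0):
        clock = next_clock_mhz(temp, clock)
        print(f"{temp:5.1f} C -> {clock} MHz")
```

A fanless design like the MacBook Air has less headroom before this kind of loop kicks in, which is why sustained heavy workloads are the interesting test.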

Apple has chosen their design to maximize battery life, which means reducing the power used. Reducing the power used in turn reduces the cooling required, which saves the energy and space used by fans. Apple is making big claims about how good the performance of these new computers will be, while maintaining low power usage and not overheating. If they are successful at this, it will be a huge competitive advantage.

Look at the top performing Intel/AMD gaming systems. They require liquid cooling on the CPU chip and there are typically six fans installed in the case. Then an nVidia RTX3090 has a further three fans installed on the GPU board. With all this, people still complain about overheating and the throttling of the games they are playing. Are the new Apple systems going to turn these into dinosaurs?

Software Compatibility

The downside of switching CPU types is that you require all your software to be re-compiled. Admittedly for Java, you just need the Java VM recompiled, and programs written in interpreted languages like Python, Julia or JavaScript shouldn’t require reworking. The truth is that people have been running Intel programs for a long time and they probably have a number of programs written in Objective-C (the Mac’s previous favorite programming language), and these will need to be recompiled to work best on the new Macs.

MacOS Big Sur includes an Intel emulation layer called Rosetta that can efficiently translate Intel Assembly language into ARM Assembly language and the claim is that it works quite well. If you received a new M1 MacBook Air today, it would run Microsoft Office this way. Microsoft has a beta of a native ARM version of Office, but the released version could be a few weeks or months away.

Note that the Rosetta emulation layer can’t run everything. If the program uses newer Intel AVX instructions, then it will fail. The emulation only covers the core instruction set and not any extensions like the AVX vector processing instructions.

Another problem is virtual machines (VMs). Many Mac users occasionally have a need for a Windows program. With Intel based Macs you can run Windows in a VM. With the new M1 Macs, new VM software is going to be required to run Intel VMs on the ARM processors and I don’t know how well this will work.

On the other hand, now that the Macs are ARM based, they can natively run all iPhone and iPad apps, so although you lose a few compiled programs and VMs, you gain all this mobile software.

The Raspberry Pi is also ARM based and has a huge amount of software available for it. The Raspberry Pi has already driven a lot of software to ARM, perhaps helping ease the road for Apple.

Summary

Apple made the choice to switch to the ARM processor for their Mac computers to greatly lower power usage, extending their laptops’ battery life. The cost of switching CPUs is software compatibility, which Apple is mitigating with Rosetta and the availability of iOS apps. I think long term this strategy is a winning one. If you mostly run Apple software like Pages or Final Cut then you can get a new Mac now, if you need it. If you are a developer, developing for iOS using Xcode, then these M1 Macs become the development platform of choice since you can run your software directly without emulation.

Otherwise, you might want to wait till the software you use is re-released with ARM native versions. Also, down the road there will be more powerful models, for instance with memory beyond 16GB.

If you are interested in the M1 ARM processor and want to learn more about how it works internally, then consider my book: Programming with 64-Bit ARM Assembly Language.

Written by smist08

November 13, 2020 at 10:51 am

Posted in Business, Mobility

Apple Macs Move to ARM Processors

Introduction

I watched Apple’s introduction of their new Mac computers based on Apple Silicon which contain ARM CPUs. Of course I was excited about this since I wrote two books on ARM Assembly Language Programming. ARM processors are used in nearly all cell phones and tablets. They are used in single board computers like the Raspberry Pi as well as many IoT devices. Finally it looks like we are getting a good line of computers based on ARM processors. In this article we’ll look at why this is a good thing, as well as some of the hurdles that Apple will need to jump for this to be a success.

A Bit of History

The first Macs contained Motorola 68000 series CPUs, then Apple moved to IBM’s PowerPC chips and then on to using Intel CPUs like all other PCs. The Motorola 68000 was a CISC CPU that competed with Intel in the early days to be the heart of the PC. Intel won the race and Motorola lost interest in spending the billions that were required to keep this line of processors competitive. Apple made the decision to jump to IBM’s new RISC based PowerPC platform. Initially this was quite successful, but again IBM didn’t think it was worth investing the money required to keep up with Intel. Intel was competing fiercely against AMD to maintain a lead in processor technology and this battle between Intel and AMD left IBM in their dust. Apple saw the writing on the wall and moved the Mac line of computers to Intel processors.

Advance a few years, and the battle has moved to cell phones. Cell phones all use ARM processors mainly due to their lower power requirements. Now there is a furious battle between the various ARM chip makers to have the faster cell phone. Now the tables have turned and Intel is being left in the dust as its chips are getting older and it is having trouble competing. This gives Apple the chance to move to faster ARM processors that use less power (hence longer battery life) with the added advantage that all their devices from watches to phones to tablets to laptops to desktops all use variations of the same ARM processor.

The Apple M1 Processor

With these new ARM based Apple Macs, Apple introduced their new Apple M1 System on a Chip (SoC). This SoC contains eight ARM CPU cores: 4 are high power units and 4 are lower power. The new MacOS dispatches threads based on whether they need to save power or maximize performance. This chip incorporates the CPU, GPU and memory all into one chip. The main downside of this is that this will be the least upgradeable Mac yet. I would recommend getting a higher configuration since you won’t be able to add to it down the road.

This is an impressive chip that Apple claims will be competitive with Intel i9 processors. It will be interesting to see the real benchmarks when these computers actually ship next week.

Unified Software

Now that iOS and MacOS programs use the same processor, it makes writing applications that run on everything from watches and phones to laptops and desktops easier. If you need some Assembly language optimizations, now you only need to include the same ARM code for all of them. It’s really cool that you can now run iPhone and iPad apps on your Mac.

Downsides

There are a couple of downsides to this approach. One is the lack of upgradeability due to the memory being included on the CPU chip. Another is that all software needs to be recompiled for the ARM processor. Apple has made this as easy as possible, so hopefully all the main software packages will be updated with ARM versions.

Even if a software vendor doesn’t do this (perhaps they went out of business), these new Macs claim they can run the software anyway by using an Intel emulator called Rosetta. We’ll have to get some real feedback on how well this works, but Apple claims it runs Intel programs better than only slightly older Intel processors.

The other headwind with Apple products is the price. These are higher end products that compete with Microsoft Surface and higher end Dell models. However there are a lot of much cheaper laptops from vendors like Acer or HP. I purchased a MacBook Air in 2012 and it is still going strong, a very solid laptop. The Sunshine Coast Tech Hub maintains half a dozen 2008 MacBook Pros that we use for an Arduino kids coding camp and all these laptops are going strong (admittedly upgraded to SSD drives and running Linux Mint). The price of these new ARM laptops is the same as the previous equivalent Intel models and my experience with Apple products is that they do last.

Will Microsoft and Others Follow?

ARM has released their Cortex-A78C CPU, an 8 core CPU for laptops and desktops where all 8 cores are high performance. How many other hardware vendors will try releasing laptops and desktops based on this chip? Linux runs fine on ARM CPUs, just look at the Raspberry Pi or nVidia Jetson Nano. Microsoft has a simplified version of Windows, similar to ChromeOS, for ARM laptops. Will Microsoft support the full Windows Home and Pro on ARM? It will be interesting to see what new devices get released in 2021.

Summary

I’m excited about the new ARM based Apple Macs. If you want to learn more about the ARM CPU, check out one of my books on ARM Assembly Language programming. It will be interesting to see how these sell compared to Intel/AMD computers and how many other vendors choose to support ARM CPUs in laptops and desktops in 2021.

Written by smist08

November 10, 2020 at 3:12 pm