Stephen Smith's Blog

Musings on Machine Learning…

Apple M1 Unified Memory



Introduction

I recently upgraded three 2008 MacBook Pros from 1Gig to 4Gig of RAM. It was super-easy: you remove the battery (accessible via a coin slot), remove a small cover over the RAM and hard drive, then pop out the old RAM and push in the new modules. Upgrading the hard drive or RAM on these old laptops is straightforward and anyone can do it. Newer MacBooks require partial disassembly, which makes the process harder, and for the newest ARM based MacBooks upgrading is impossible. So, do we gain anything in exchange for this lack of upgradeability?

This article looks at Apple’s new unified memory architecture, which they claim gives large performance gains. Apple hasn’t released many in-depth technical details on the M1 chip, but from what they have released, and now that people have received these units and run real benchmarks, we can see that Apple really does have something here.

Why Would We Want To Upgrade?

In the case of the 2008 MacBook Pro, 4Gig of RAM was expensive when the machine was new. Now 4Gig of DDR2 memory is $10, so it makes total sense to upgrade to the maximum. Similarly, the MacBook came with a mechanical hard drive which is quite small and slow by modern standards; it was easy to upgrade these to larger, faster SSDs for around $40 each. This is often the case: the maximum configuration is too expensive at the time of the original purchase, but becomes much cheaper a few years later. Performing these upgrades then lets you get quite a few more years of service out of your computer. The 2008 MacBook Pros upgraded to their maximum configuration are still quite usable computers (though you have to run Linux on them, since Apple software no longer supports them).

Enter the New Apple ARM Based Macintoshes

The newly released MacBooks based on ARM Systems on a Chip (SoCs) have their RAM integrated into the same package as the CPU. This means that unless you can replace the entire SoC, you can’t upgrade the RAM. Apple claims integrating the memory this way gives them a number of performance gains, since the memory is high speed, physically very close to all the processing units, and shared by all of them. A major bottleneck in modern computer systems is moving data between memory and the CPU, or copying data from the CPU’s memory to the GPU’s memory.

AMD and nVidia graphics cards contain their own memory, separate from the memory used by the CPU. So a modern gaming computer might have 16Gig of RAM for the CPU and then 8Gig of RAM for the GPU. If you want the GPU to perform a matrix multiplication, you need to transfer the matrices to the GPU, tell it to multiply them, and then transfer the resulting matrix back to the CPU’s memory. nVidia and AMD claim this is necessary since they incorporate newer, faster memory in their GPUs than is typically installed on the CPU motherboard: most CPUs currently use DDR4 memory, whereas GPUs typically incorporate faster GDDR6 memory. There are GPUs (like the Raspberry Pi’s) that share the CPU’s memory, however these tend to be lower end (cheaper, since they don’t have their own memory) and slower, since there is more contention for the CPU’s memory.
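To make that round trip concrete, here is a minimal sketch in Python, assuming a discrete nVidia GPU and a PyTorch build with CUDA support (an illustrative setup of my own, not anything specific to the M1). The two explicit copies across the PCIe bus are exactly the traffic a unified memory design tries to avoid.

import torch

a = torch.randn(4096, 4096)   # matrices start out in the CPU's (host) RAM
b = torch.randn(4096, 4096)

a_gpu = a.to("cuda")          # copy A across the PCIe bus into GPU memory
b_gpu = b.to("cuda")          # copy B across the PCIe bus into GPU memory
c_gpu = a_gpu @ b_gpu         # multiply entirely within the GPU's GDDR6 memory
c = c_gpu.cpu()               # copy the result back into the CPU's RAM

print(c.shape)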

The Apple M1 tries to address these problems by incorporating fast memory and providing a much wider bandwidth between that memory and the various processors on the M1 chip. The M1 doesn’t just contain a GPU; there is also a Neural Engine for AI processing (which is similar to a GPU), as well as other units for specialized functions like data encryption and video decoding. Most current computers have a 64-bit memory controller that can move 64 bits of data between the CPU and RAM at the speed of the RAM; sometimes the RAM is as fast as the CPU, sometimes it’s a bit slower. Newer CPUs have large caches to try to avoid some of this transfer, but the caches are measured in MegaBytes whereas main memory is measured in GigaBytes. Separate GPU memory helps by having a completely separate memory controller, and expensive servers help by having multiple memory controllers. Apple’s block diagrams seem to indicate they have two 64-bit memory controllers, or parallel pathways to main memory, but this is somewhat speculative. As people benchmark these new computers, it does appear that Apple has made some significant performance improvements.
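As a back-of-the-envelope check on what that extra width buys, here is a quick calculation assuming (as teardowns suggest, not an official Apple specification) two 64-bit channels of LPDDR4X-4266:

channels = 2                   # assumed: two 64-bit memory controllers
bits_per_channel = 64
transfers_per_sec = 4266e6     # assumed: LPDDR4X-4266, i.e. 4266 mega-transfers/s

bytes_per_transfer = channels * bits_per_channel / 8
bandwidth_gb = bytes_per_transfer * transfers_per_sec / 1e9
print(f"Theoretical peak bandwidth: {bandwidth_gb:.1f} GB/s")   # roughly 68 GB/s

Whether or not those exact figures are right, the bigger win Apple is claiming comes from every processing unit sharing that one pool of memory, rather than from raw bandwidth alone.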

Summary

If Apple has greatly reduced the memory bottleneck, and having the GPU, Neural Engine and CPU all accessing the same memory doesn’t cause too much contention, then eliminating the copying of data between the processing units will be a big advantage. On the downside, you should overbuy on memory now, since you can’t upgrade it later.

If you are interested in the M1 ARM processor and want to learn more about how it works internally, then consider my book: Programming with 64-Bit ARM Assembly Language.

Written by smist08

November 20, 2020 at 1:09 pm

