Stephen Smith's Blog

Musings on Machine Learning…

RISC Instruction Encoding



Introduction

Modern microprocessors execute programs from memory that are formatted specifically for the processor and the instructions it is capable of executing. This machine code is generated by tools, either fairly directly from Assembly Language source code or via a compiler that translates a high level language to machine code. There are two popular philosophies on how machine code is structured. One is Reduced Instruction Set Computers (RISC), exemplified by ARM, RISC-V, PowerPC and MIPS processors, and the other is Complex Instruction Set Computers (CISC), exemplified by the x86 processors from Intel and AMD. In RISC computers, each instruction is quite small and does a tiny bit of work; in CISC computers, the instructions tend to be larger and each one does more work. The advantage of RISC processors is that the circuitry is simpler, which means they use less power. This is why nearly all mobile devices use RISC processors. In this article we will be looking at some of the tricks RISC computers use to keep their instructions small and quick.

32-Bit Instructions

Most RISC processors use 32-bit machine code instructions. It doesn’t matter if the processor is 32-bit or 64-bit; that only refers to the size of pointers for memory addressing and the size of the registers. In both cases the instructions stay at 32 bits in length. As with all rules there are exceptions. For instance, in RISC-V processors most instructions are 32-bit, but there is a facility to allow longer instructions where necessary, and ARM processors in 32-bit mode can switch to the Thumb instruction set, where instructions are 16 bits in length. Modern processors are very powerful and have a lot of functionality, so how do they encode all the information needed for an instruction into 32 bits? This restriction imposes a lot of discipline on the instruction set designers, but the solutions they have come up with are quite interesting. In comparison, Intel x86 instructions are variable length, ranging from 1 byte up to 15 bytes (120 bits).

Having all the instructions 32 bits in length makes it much easier to build an efficient execution pipeline, since you can load and start working on a set of instructions in parallel. You don’t need to decode one instruction to learn where the next one starts. You know there is a new instruction every 4 bytes in memory. This uniformity saves a lot of complexity and greatly enhances instruction execution throughput.
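To make the difference concrete, here is a small Python sketch that finds where each instruction starts in a block of machine code. The variable-length format is made up for illustration (the first byte of each instruction holds its total length); it is not how x86 actually encodes length, but it shows the same dependency: you cannot find instruction N+1 without first looking at instruction N.

```python
def fixed_width_starts(code: bytes, width: int = 4) -> list[int]:
    """Start offsets when every instruction is exactly `width` bytes.

    No decoding is needed: the answer depends only on the buffer length.
    """
    return list(range(0, len(code), width))


def variable_width_starts(code: bytes) -> list[int]:
    """Start offsets in a hypothetical variable-length format.

    Assumption for this sketch: the first byte of each instruction
    holds that instruction's total length in bytes.
    """
    starts = []
    offset = 0
    while offset < len(code):
        length = code[offset]
        if length == 0:
            raise ValueError(f"zero-length instruction at offset {offset}")
        starts.append(offset)
        offset += length  # we had to read this instruction to find the next
    return starts


print(fixed_width_starts(bytes(16)))  # [0, 4, 8, 12]
print(variable_width_starts(bytes([2, 0, 3, 0, 0, 1, 4, 0, 0, 0])))  # [0, 2, 5, 6]
```

In the fixed-width case all four start offsets could be computed, and the instructions fetched, independently of one another. In the variable-length case the loop is inherently sequential.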

Where Do the Bits Go?

What needs to be encoded in a machine language instruction? Here are some of the possible components:

  1. The opcode. This tells the processor what the instruction does, whether it’s adding two numbers, loading data from memory or jumping to another program location. If the opcode takes 8 bits then there are 256 possible instructions. To really save space, some opcodes can use fewer bits. For example, if an instruction starts with 011, then the remaining bits can all go to the immediate value.
  2. Registers. Microprocessors load data into registers and then process the data in the registers. Often two or three registers need to be specified in an instruction, like the two numbers to add and then where to put the result. If there are 32 registers, then each register field will take 5 bits.
  3. Immediate data. Most processors have a way to encode some data in an instruction. For example, “LOAD R1, 5” might mean load the value 5 into register R1. Here 5 is data encoded in the instruction, and is called an immediate value. The size of these varies based on the instruction and use cases.
  4. Memory addresses. Data has to be loaded from memory, or program execution has to jump to a different memory location. Note that in a modern computer memory addresses are either 32 or 64 bits. These are both too big to fit in a 32-bit instruction (we need at least an opcode as well). In RISC, how do we specify memory addresses?
  5. Bits for additional parameters. Perhaps there are several addressing modes, or perhaps other options for an instruction that need to be encoded. Often there are a few bits in each instruction for this purpose.
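
As a rough sketch of how these fields compete for space, the following Python packs an instruction into a 32-bit word. The layout is hypothetical and does not match any real processor: it simply takes the numbers from the list above (an 8-bit opcode and three 5-bit register fields) and gives whatever is left over, 9 bits, to an immediate value. The field names `rd`, `rs1` and `rs2` are likewise just labels chosen for this example.

```python
OPCODE_BITS = 8
REG_BITS = 5
IMM_BITS = 32 - OPCODE_BITS - 3 * REG_BITS  # only 9 bits remain


def encode(opcode: int, rd: int, rs1: int, rs2: int, imm: int = 0) -> int:
    """Pack the fields into one 32-bit word (hypothetical layout)."""
    fields = (
        ("opcode", opcode, OPCODE_BITS),
        ("rd", rd, REG_BITS),
        ("rs1", rs1, REG_BITS),
        ("rs2", rs2, REG_BITS),
        ("imm", imm, IMM_BITS),
    )
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return (opcode << 24) | (rd << 19) | (rs1 << 14) | (rs2 << 9) | imm


def decode(word: int) -> dict:
    """Unpack a 32-bit word produced by encode()."""
    return {
        "opcode": (word >> 24) & 0xFF,
        "rd": (word >> 19) & 0x1F,
        "rs1": (word >> 14) & 0x1F,
        "rs2": (word >> 9) & 0x1F,
        "imm": word & 0x1FF,
    }


word = encode(opcode=0x01, rd=3, rs1=1, rs2=2)
print(hex(word))     # 0x1184400
print(decode(word))
```

With this layout the largest immediate is 511, which shows why real instruction sets use different formats for different instructions: one that needs a large immediate gives up register fields to make room for it.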

 

That’s a lot of information to pack into a 32-bit instruction. How do they do it? My introduction to Raspberry Pi Assembly Language shows how this is done for ARM processors in 32-bit mode.

How to Load a Register

Let’s look at how to load a 32-bit register with data. We can’t fit a full 32-bit value inside a 32-bit instruction, so what do we do? You might suggest that we load the value from memory rather than encode the value in the instruction. This is a legitimate thing to do, but it just moves the problem, since we now need to load the 32 or 64-bit memory address into a register first.

First, we could do it in two steps. Perhaps we can fit a 16-bit value in an instruction and then use two instructions to load the full value. In an ARM processor, there is a MOV instruction that can load a 16-bit immediate value, clearing the top 16 bits of the register, and a MOVT instruction that loads a 16-bit immediate value into the top 16 bits of a register while leaving the bottom 16 bits alone. Because MOV clears the top half, it has to come first. Suppose we want to load 0x12345678 into register R1, then in ARM 32-bit Assembly we would write:

MOV  R1, #0x5678    @ R1 = 0x00005678
MOVT R1, #0x1234    @ R1 = 0x12345678

This works, and we do expect that working in RISC is going to take lots of small instructions to perform the work we need to get done. However, this is somehow not satisfying, since loading a register is something we do a lot and it seems wasteful to take two instructions. The other problem is that if we are running in 64-bit mode and want to load a 64-bit register, then this will take four instructions.
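
The following Python sketch models the two instructions on a single register, to show why the count grows with register size and why the order matters. It assumes the ARMv7 behaviour that MOV with a 16-bit immediate zeroes the upper half of the register and that MOVT changes only the upper half. The function names are just labels for this model, not real APIs.

```python
def mov16(reg: int, imm16: int) -> int:
    """Model of MOV with a 16-bit immediate: the old value is discarded
    and the upper 16 bits end up as zero."""
    return imm16 & 0xFFFF


def movt(reg: int, imm16: int) -> int:
    """Model of MOVT: replace the upper 16 bits, keep the lower 16."""
    return ((imm16 & 0xFFFF) << 16) | (reg & 0xFFFF)


def chunks16(value: int, width_bits: int) -> list[int]:
    """Split a value into 16-bit pieces, lowest piece first.
    One piece corresponds to one move instruction."""
    return [(value >> shift) & 0xFFFF for shift in range(0, width_bits, 16)]


# MOV first, then MOVT, builds the full value.
r1 = mov16(0, 0x5678)
r1 = movt(r1, 0x1234)
print(hex(r1))  # 0x12345678

# The other way round, MOV wipes out what MOVT stored.
r1 = movt(0, 0x1234)
r1 = mov16(r1, 0x5678)
print(hex(r1))  # 0x5678

print(len(chunks16(0x12345678, 32)))          # 2
print(len(chunks16(0x123456789ABCDEF0, 64)))  # 4
```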

Another trick is to make use of the Program Counter (PC) register. This register points to the instruction currently being executed. If we can place the value in memory near the code that uses it, then we can load it by dereferencing the PC plus a small offset. As long as the offset fits in the room we have for an immediate value, this works. In the ARM world, the Assembler helps us generate this code. We write something like the following (note there is no = sign in front of the label, since LDR R1, =mydata would load the address of mydata rather than its contents):

LDR R1, mydata

...

mydata: .WORD 0x12345678

Then the Assembler will convert the LDR instruction to something like:

LDR R1, [PC, #20]

This means load the data at address PC + 20 into R1. Now it only takes one instruction to load the data. This technique has the advantage that it still takes a single instruction when dealing with 64-bit data.
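
Here is a minimal Python sketch of the arithmetic the Assembler does to come up with that offset. It assumes 32-bit ARM mode, where reading the PC gives the address of the current instruction plus 8, and where the LDR offset is limited to 12 bits (up to 4095 bytes in either direction). The addresses are made up for the example.

```python
PC_READ_AHEAD = 8   # in 32-bit ARM mode the PC reads as instruction address + 8
MAX_OFFSET = 4095   # the LDR immediate offset is 12 bits, plus a direction bit


def pc_relative_offset(instr_addr: int, data_addr: int) -> int:
    """Offset to place in LDR Rd, [PC, #offset] so that it reads data_addr."""
    offset = data_addr - (instr_addr + PC_READ_AHEAD)
    if abs(offset) > MAX_OFFSET:
        raise ValueError(f"offset {offset} does not fit in the instruction")
    return offset


# Hypothetical addresses: the LDR sits at 0x8000 and mydata at 0x801C.
print(pc_relative_offset(0x8000, 0x801C))  # 20
```

If the data is too far away for the offset to fit, the instruction cannot be encoded, which is why the data has to be positioned near the code that uses it.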

Summary

This was a quick discussion of how RISC processors encode each machine code instruction as a 32-bit value. This is one of the key things that keeps RISC processors simple, which allows them to be quick and at the same time power efficient.

If you are interested in machine code or Assembly Language programming, be sure to check out my book: “Raspberry Pi Assembly Language Programming” from Apress. It is available from all major booksellers or directly from Apress.

Written by smist08

November 8, 2019 at 11:55 am

One Response


  1. There seems to be a dearth of good material on ARM assembly programming, since most assembly language books focus on x86. I commend you for working to close this gap. A knowledge of ARM assembly language is definitely an asset.

    psychocod3r

    November 8, 2019 at 3:44 pm

