Stephen Smith's Blog

Musings on Machine Learning…

Posts Tagged ‘risc-v’

RISC Instruction Encoding

with one comment

Introduction

Modern microprocessors execute programs from memory that are formatted specifically for the processor and the instructions it is capable of executing. This machine code is generated by tools, either fairly directly from Assembly Language source code or via a compiler that translates a high level language to machine code. There are two popular philosophies on how machine code is structured. One is Reduced Instruction Set Computers (RISC), exemplified by ARM, RISC-V, PowerPC and MIPS processors, and the other is Complex Instruction Set Computers (CISC), exemplified by Intel and AMD processors. In RISC computers, each instruction is quite small and does a tiny bit of work; in CISC computers the instructions tend to be larger and each one does more work. The advantage of RISC processors is that the circuitry is simpler, which means they use less power. This is why nearly all mobile devices use RISC processors. In this article we will be looking at some of the tricks RISC computers use to keep their instructions small and quick.

32-Bit Instructions

Most RISC processors use 32-bit machine code instructions. It doesn’t matter if the processor is 32-bit or 64-bit; this only refers to the size of pointers for memory addressing and the size of the registers. In both cases the instructions stay at 32-bits in length. As with all rules there are exceptions. For instance, in RISC-V processors most instructions are 32-bit, but there is a facility to allow longer instructions where necessary, and in ARM processors, in 32-bit mode, there is the ability to limit instructions to 16-bits in length. Modern processors are very powerful and have a lot of functionality, so how do they encode all the information needed for an instruction into 32-bits? This restriction imposes a lot of discipline on the instruction set designers, but the solutions they have come up with are quite interesting. In comparison, Intel x86 instructions are variable length and can be anywhere from 1 to 15 bytes (up to 120 bits) long.

Having all the instructions 32-bits in length makes it much easier to build an efficient execution pipeline, since you can load and start working on a set of instructions in parallel. You don’t need to decode one instruction to learn where the next one starts. You know there is a new instruction every 4 bytes in memory. This uniformity saves a lot of complexity and greatly enhances instruction execution throughput.
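To make the "new instruction every 4 bytes" point concrete, here is a minimal Python sketch. The function name, the base address and the sample bytes are just illustrative assumptions, and it assumes little-endian code. The point is that finding instruction boundaries is pure arithmetic, with no decoding needed:

```python
import struct

def split_instructions(code: bytes, base_address: int = 0):
    """Yield (address, 32-bit word) pairs from little-endian machine code."""
    if len(code) % 4 != 0:
        raise ValueError("fixed-width code must be a multiple of 4 bytes")
    # Every instruction starts exactly 4 bytes after the previous one,
    # so the boundaries are known without looking at the contents.
    for offset in range(0, len(code), 4):
        (word,) = struct.unpack_from("<I", code, offset)
        yield base_address + offset, word

# Eight sample bytes standing in for two 32-bit instructions.
code = bytes.fromhex("9300500013000000")
for address, word in split_instructions(code, base_address=0x1000):
    print(f"{address:#06x}: {word:#010x}")
```

With variable length instructions this loop would not work, since the position of each instruction would depend on decoding the one before it.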

Where Do the Bits Go?

What needs to be encoded in a machine language instruction? Here are some of the possible components:

  1. The opcode. This tells the processor what the instruction does, whether it’s adding two numbers, loading data from memory or jumping to another program location. If the opcode takes 8-bits then there are 256 possible instructions. To really save space some opcodes can use fewer bits. For instance, if an instruction starts with 011 then the remaining bits can go to the immediate value.
  2. Registers. Microprocessors load data into registers and then process the data in the registers. Often two or three registers need to be specified in an instruction, like the two numbers to add and then where to put the result. If there are 32 registers, then each register field will take 5-bits.
  3. Immediate data. Most processors have a way to encode some data in an instruction. Like “LOAD R1, 5” might mean load the value 5 into register R1. Here 5 is data encoded in the instruction, and called an immediate value. The size of these varies based on the instruction and use cases.
  4. Memory Addresses. Data has to be loaded from memory, or program execution has to jump to a different memory location. Note that in a modern computer memory addresses are either 32-bit or 64-bits. These are both too big to fit in a 32-bit instruction (we need at least an opcode as well). In RISC, how do we specify memory addresses?
  5. Bits for additional parameters. Perhaps there are several addressing modes, or perhaps other options for an instruction that need to be encoded. Often there are a few bits in each instruction for this purpose.
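As a rough illustration of how these pieces share 32 bits, here is a Python sketch that packs and unpacks one real layout, the RISC-V I-type format (7-bit opcode, 5-bit destination register, 3-bit function code, 5-bit source register and a 12-bit immediate). The function names are my own, and other instruction formats and other processors divide the bits differently:

```python
def encode_i_type(opcode, rd, funct3, rs1, imm):
    """Pack the fields of a RISC-V I-type instruction into a 32-bit word."""
    if not -2048 <= imm <= 2047:
        raise ValueError("immediate does not fit in 12 signed bits")
    if not (0 <= rd <= 31 and 0 <= rs1 <= 31):
        raise ValueError("register numbers must be 0..31 (5 bits each)")
    # 12 + 5 + 3 + 5 + 7 = 32 bits in total.
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

def decode_i_type(word):
    """Split a 32-bit I-type word back into its fields."""
    imm = word >> 20
    if imm & 0x800:          # sign-extend the 12-bit immediate
        imm -= 0x1000
    return {
        "opcode": word & 0x7F,
        "rd": (word >> 7) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rs1": (word >> 15) & 0x1F,
        "imm": imm,
    }

# ADDI x1, x0, 5: opcode 0x13 with funct3 0 selects ADDI.
word = encode_i_type(opcode=0x13, rd=1, funct3=0, rs1=0, imm=5)
print(f"{word:#010x}")
print(decode_i_type(word))
```

Notice that the immediate only gets 12 bits here, which is exactly why loading a full 32-bit value needs the tricks discussed in this article.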

 

That’s a lot of information to pack into a 32-bit instruction. How do they do it? My introduction to Raspberry Pi Assembly Language shows how this is done for ARM processors in 32-bit mode.

How to Load a Register

Let’s look at how to load a 32-bit register with data. We can’t fit a full 32-bit value inside a 32-bit instruction, so what do we do? You might suggest that we load the value from memory rather than encode the value in the instruction. This is a legitimate thing to do, but it just moves the problem since we now need to load the 32 or 64-bit memory address into a register first.

First we could do it in two steps. Perhaps we can fit a 16-bit value in an instruction and then perform two load instructions to load the value. In an ARM processor, there is a MOV instruction that can load a 16-bit immediate value (clearing the top 16-bits of the register) and a MOVT instruction that loads a 16-bit immediate value into the top 16-bits of a register, leaving the bottom 16-bits alone. Suppose we want to load 0x12345678 into register R1, then in ARM 32-Bit Assembly we would code the MOV first, so that it doesn’t wipe out the top half:

MOV  R1, #0x5678
MOVT R1, #0x1234

This works and we do expect that working in RISC is going to take lots of small instructions to perform the work we need to get done. However this is somehow not satisfying, since this is something we do a lot and it seems wasteful to take two instructions. The other thing is that if we are running 64-bit mode and want to load a 64-bit register then this will take 4 instructions.
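The two-step load can be modelled in a few lines of Python. This is only a sketch of the idea: the helper names are made up, and the model assumes the plain 16-bit move zero-extends its value while the move-top replaces only the upper half, which is why the order of the two steps matters here:

```python
def mov16(value16):
    """Model a 16-bit immediate move: the upper bits end up as zero."""
    return value16 & 0xFFFF

def movt(reg, value16):
    """Model move-top: replace the upper 16 bits, keep the lower 16."""
    return ((value16 & 0xFFFF) << 16) | (reg & 0xFFFF)

def split_into_halfwords(value, width_bits):
    """Split a value into 16-bit chunks, lowest chunk first."""
    return [(value >> shift) & 0xFFFF for shift in range(0, width_bits, 16)]

low, high = split_into_halfwords(0x12345678, 32)
r1 = mov16(low)        # r1 = 0x00005678
r1 = movt(r1, high)    # r1 = 0x12345678
print(f"{r1:#010x}")

# A 64-bit value splits into four chunks, one instruction per chunk.
print(len(split_into_halfwords(0x1122334455667788, 64)))
```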

Another trick is to make use of the Program Counter (PC) register. This register points to the instruction currently being executed. So if we can position the value in memory near the code that uses it, then we could load it by dereferencing the PC (plus a small offset). As long as the offset fits in the amount of room we have for an immediate value then this could work. In the ARM world, the Assembler helps us generate this code. We write something like:

LDR R1, mydata

...

mydata: .WORD 0x12345678

Then the Assembler will convert the LDR instruction to something like:

LDR R1, [PC, #20]

This means load the data pointed to by PC + 20 into R1. Now it only takes one instruction to load the data. This technique has the advantage that it will remain one instruction to execute when dealing with 64-bit data.
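Here is a small Python sketch of the arithmetic the Assembler does for us. The addresses are made up for the example. The defaults assume ARM 32-bit mode, where reading the PC gives the address of the current instruction plus 8 and the load offset has a 12-bit magnitude. Treat both as parameters to adjust for other processors:

```python
def pc_relative_offset(instr_address, data_address, pc_bias=8, max_offset=4095):
    """Return the offset to use in a PC-relative load, or raise if out of range."""
    pc_value = instr_address + pc_bias      # what the PC reads as during the load
    offset = data_address - pc_value
    if abs(offset) > max_offset:
        raise ValueError("data is too far from the instruction for one load")
    return offset

# Hypothetical layout: the LDR sits at 0x1000 and mydata at 0x101C.
print(pc_relative_offset(0x1000, 0x101C))
```

If the data is too far away for the offset to fit, the Assembler has to place the value somewhere closer to the code that uses it.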

Summary

This was a quick discussion of how RISC processors encode each machine code instruction as a 32-bit value. This is one of the key things that keeps RISC processors simple, allowing them to be quick and, at the same time, more power efficient.

If you are interested in machine code or Assembly Language programming, be sure to check out my book: “Raspberry Pi Assembly Language Programming” from Apress. It is available from all major booksellers or directly from Apress here.

Written by smist08

November 8, 2019 at 11:55 am

Introducing Risc-V

with 3 comments

Introduction

Risc-V (pronounced Risc Five) is an open source hardware Instruction Set Architecture (ISA) for Reduced Instruction Set Computers (RISC) developed by UC Berkeley. The Five is because this is Berkeley’s fifth RISC ISA design. This is a fully open standard, meaning that any chip manufacturer can create CPUs that use this instruction set without having to pay royalties. Currently the lion’s share of the CPU market is dominated by two camps. One is the CISC based x86 architecture from Intel, with AMD as an alternate source. The other is the ARM camp, where the designs come from ARM Holdings and chip manufacturers license the designs under royalty agreements.

The x86 architecture dominates server, workstation and laptop computers. These are quite powerful CPUs, but at the expense of using more power. The ARM architecture dominates cell phones, tablets and Single Board Computers (SBCs) like the Raspberry Pi. These are usually a bit less powerful, but use far less power and are typically much cheaper.

Why do we need a third camp? What are the advantages and what are some of the features of Risc-V? This blog article will start to explore the Risc-V architecture and why people are excited about it.

Economies of Scale

The computer hardware business is competitive. For instance Western Digital hard drives each contain an ARM CPU to manage the controller functions and handle the caching. Saving a few dollars for each drive by saving the ARM royalty is a big deal. With Risc-V, Western Digital can make or buy a specialized Risc-V processor and then save the ARM royalty, either improving their profits or making their drives more price competitive.

The difficulty with introducing a new CPU architecture is that to be price competitive you have to manufacture in huge quantities or your product will be very expensive. This means for there to be inexpensive Risc-V processors on the market, there have to be some large orders, and that’s why adoption by large companies like Western Digital is so important.

Another giant boost to the Risc-V world is a direct result of Trump’s trade war with China. With the US restricting trade in ARM and x86 technology to China, Chinese computer manufacturers are madly investing in Risc-V, since it is open source and trade restrictions can’t be applied. If a major Chinese cell phone manufacturer can no longer get access to the latest ARM chips, then switching to Risc-V will be attractive. This is a big risk that Trump is taking, because if the rest of the world invests in Risc-V, then it might greatly reduce Intel, AMD and ARM’s influence and leadership, having the opposite effect to what Trump wants.

The Software Chicken & Egg Problem

If you create a wonderful new CPU, no matter how good it is, you still need software. To start, you need operating systems, compilers and debuggers. Developing these can be as expensive as developing the CPU chip itself. This is where open source comes to the rescue. UC Berkeley along with many other contributors added Risc-V support to the GNU Compiler Collection (GCC) and worked with Debian Linux to produce a Risc-V version of Linux.

Another big help is the availability of open source emulator technology. You are very limited in your choices of actual Risc-V hardware right now, but you can easily set up an emulator to play with. If you’ve ever played with RetroPie, you know the open source world can emulate pretty much any computer ever made. There are several emulator environments available for Risc-V so you can get going on learning the architecture and writing software as the hardware slowly starts to emerge.

Risc-V Basics

The Risc-V architecture is modular. You start with a simple core arithmetic unit that can load/store registers, add, subtract, perform logical operations, compare and branch. There are 32 registers labeled x0 to x31, although x0 is a dedicated zero register. There is also a program counter (PC). The hardware doesn’t assign any other functionality to the registers; the rest is by software convention, such as which register is the stack pointer, which registers are used for passing function parameters, etc. Base instructions are 32-bits, but an extension module allows for 16-bit compressed instructions and extension modules can define longer instructions. The specification supports three different address sizes: 32-bit, 64-bit and 128-bit. This is quite forward thinking as we don’t expect even the largest, most powerful computers in the world to need more than 64-bit addresses until 2030 or so.
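Two of these details are easy to sketch in Python. The first function follows the length encoding described in the Risc-V specification: the low bits of the first 16 bits of an instruction tell you whether it is a 16-bit compressed instruction or a standard 32-bit one. I return None for the longer encodings rather than trying to handle them. The small class models the x0 zero register. The names are my own, and the 32-bit register width is an assumption for the example:

```python
def instruction_length_bits(first_halfword):
    """Return 16 or 32 for standard encodings, None for longer ones."""
    if first_halfword & 0b11 != 0b11:
        return 16                       # compressed instruction
    if (first_halfword >> 2) & 0b111 != 0b111:
        return 32                       # standard base instruction
    return None                         # reserved for longer instructions

class RegisterFile:
    """32 registers where x0 always reads as zero."""
    def __init__(self):
        self._regs = [0] * 32

    def read(self, index):
        return self._regs[index]

    def write(self, index, value):
        if index != 0:                  # writes to x0 are silently discarded
            self._regs[index] = value & 0xFFFFFFFF

regs = RegisterFile()
regs.write(0, 123)
regs.write(1, 5)
print(regs.read(0), regs.read(1))
print(instruction_length_bits(0x0093), instruction_length_bits(0x0001))
```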

Then you start adding modules like the multiply/divide module, atomic instruction module, various floating point modules, the compressed instruction module, and quite a few others. Some of these have their specifications frozen, others are still being worked on. The goal is to allow chip manufacturers to produce silicon that exactly meets their needs and keeps power utilization to a minimum.
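The modules a chip supports are usually written as a short name such as RV32IMAC or RV64GC, with one letter per standard extension after the address size. Here is a hedged Python sketch that expands such a name. The function name and the descriptions are mine, it only knows the handful of single-letter extensions mentioned above, and it treats G as shorthand for IMAFD, ignoring the smaller pieces G also implies:

```python
import re

# Only the single-letter extensions discussed in this article.
EXTENSIONS = {
    "I": "base integer instructions",
    "M": "integer multiply/divide",
    "A": "atomic instructions",
    "F": "single-precision floating point",
    "D": "double-precision floating point",
    "C": "compressed 16-bit instructions",
}

def describe_isa(name):
    """Return (address size in bits, list of module descriptions)."""
    match = re.fullmatch(r"RV(32|64|128)([A-Z]+)", name.upper())
    if match is None:
        raise ValueError(f"not a recognizable ISA name: {name!r}")
    letters = match.group(2).replace("G", "IMAFD")
    modules = [EXTENSIONS.get(letter, f"unknown extension {letter}") for letter in letters]
    return int(match.group(1)), modules

width, modules = describe_isa("RV64IMAC")
print(width)
for module in modules:
    print(" ", module)
```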

Getting Started

Most of the current Risc-V hardware available for DIYers is small low power/low memory microcontrollers similar to Arduinos. I’m more interested in getting a Risc-V SBC similar to a Raspberry Pi or NVidia Jetson. As a result I don’t have a physical Risc-V computer to play with, but I can still learn about Risc-V and play with Risc-V Assembly language programming in an emulator environment.

I’ll list the resources I found useful and the environment I’m using. Then in future blog articles, I’ll go into more detail.

  • The Risc-V Specifications. These are the documents on the ISA. I found them readable, and they give the rationale for the decisions they took along with the reasons for a number of roads they didn’t go down. The only thing missing is practical examples.
  • The Debian Risc-V Wiki Page. There is a lot of useful information here.  A very big help was how to install the Risc-V cross compilation tools on any Debian release. I used these instructions to install the Risc-V GCC tools on my Ubuntu laptop.
  • TinyEMU, a Risc-V Emulator. There are several Risc-V emulators. This is the first one I tried and it’s worked fine for me so far.
  • RV8 a Risc-V Emulator. This emulator looks good, but I haven’t had time to try it out yet. They have a good Risc-V instruction set summary.
  • SiFive Hardware. SiFive have produced a number of limited run Risc-V microcontrollers. Their website has lots of useful information and their employees are major contributors to various Risc-V open source projects. They have started a Risc-V Assembly Programmers Guide.

Summary

The Risc-V architecture is very interesting. It is always nice to start with a clean slate and learn from all that has gone before it. If this ISA gains enough steam to achieve volumes where it can compete with ARM, it is going to allow very powerful low cost computers. I’m very hopeful that perhaps next year we’ll see a $25 Risc-V based Raspberry Pi 4B competitor with 4GB of RAM and an M.2 SSD slot.

Written by smist08

September 6, 2019 at 6:07 pm

Posted in Business
