Stephen Smith's Blog

Musings on Machine Learning…

Archive for September 2019

The Race for 64-Bit Raspberry Pi 4 Linux

leave a comment »


When the Raspberry Pi 4 was announced and shipped this past June, it caught everyone by surprise. No one was expecting a new Pi until next year sometime, if we were lucky. The Raspberry Pi 4 has updated faster components, including an updated ARM processor and USB 3.0. Raspbian, the official version version of Linux for the Pi was updated to be based on Debian Buster and shipped before the official Debian Buster actually shipped. However, Raspbian is still 32-bit, where the Raspberry foundation say this is so they only have to support one version of Linux for all Raspberry Pi devices.

Others in the Linux community, have then figured out how to run 64-bit Linux’s on the Raspberry Pi. For instance there are 64-bit versions of Ubuntu Mate, Ubuntu Server and Kali Linux. These work on the Raspberry Pi 3, but due to changes in the Raspberry architecture, didn’t work on the Raspberry Pi 4 when it shipped. We still don’t have official 64-bit releases, but we are reaching the point where the test builds are starting to work quite well.

Why 64-Bit?

To be honest, 64-bit Linux never ran very well on the Raspberry Pi 3. 64-bit Linux and 64-bit programs requires quite a bit more memory than their 32-bit equivalents. Each memory address is now 64-bits instead of 32-bits and there is a tendency to use 64-bit integers rather than 32-bit integers. The ARM processor instructions are 32-bits in both 32-bit and 64-bit mode, so programs tend to be about the same size, though 64-bit doesn’t have use of the 16-bit ARM thumb instructions. The Raspberry Pi 3 is limited to 1Gig of memory, that can just barely run a 64-bit Linux, and tends to run out of memory quickly as you run programs, like web browsers. The Raspberry Pi 4 now supports up to 4Gig of memory and that is sufficient to run 64-bit Linux along with a respectable number of programs. Plus the Raspberry Pi 4 has faster access to the SDCard and USB 3, so you can attach an even faster external drive, so if you do get swapping, it isn’t as painful.

In spite of these limitations, there are reasons to run 64-bit. The main one is that you can get better performance, especially if you actually need to work with 64-bit integers. Further the 64-bit instruction set has been optimised to work better with the execution pipeline, so you don’t get as many stalls when you perform jumps. For instance in 32-bit ARM, there is no function return instructions, so people use regular branches, pop the return address from the stack directly into the program counter or do a number of other tricks. As a result, function returns causes the execution pipeline to be flushed. In 64-bit, the pipeline knows about return instruction and knows where to get the next few instructions.

If 64-Bit Worked on the Pi 3, What’s the Problem?

If we had 64-bit working for the Pi 3, why doesn’t it just work on the Pi 4? There are a few reasons for this. The first obstacle was that Raspberry changed the whole Pi boot process. The Raspberry Pi 3 booted using the GPU. When it started the Pi 3’s GPU runs a program that knows how to read the boot folder on an SDCard and will load this into memory and then start the ARM CPU to run what it loaded into memory. The Raspberry Pi 4 now has a slightly larger EEPROM, this contains ARM code that executes on startup and then loads a further step from the SDCard. The volunteers with the other Linux distributions had to figure out this new process and adapt their code to fit into it. Sadly, the original EEPROM program didn’t provide a good way to do this, so the Linux volunteers have been working with Raspberry to get the support they need in newer versions of the EEPROM software. The most recent version seems to be working reliably finally.

The Raspberry Pi 4 then has all new hardware, so new drivers are required for bluetooth, wifi and everything else. To keep the price down, Raspberry uses older standard components, so there are drivers already written for all these devices. It’s just a matter of including the correct drivers and providing default configurations that work and settings dialogs if anything might need user input. This is all being worked on in parallel, and the consensus is that we are already in a better place than we were for the Pi 3.

It’s All Open Source so Why not Copy from Rasbian?

The Raspbian kernel is open source so anyone can look at that source code, but the EEPROM firmware is not open source. This can be reverse engineered, but that takes time. The Raspberry Pi foundation has been quite helpful in supporting people, but that is no substitute for reading the source code. This again shows the importance of open source BIOS.

Development got off to a slow start, because the Raspberry Pi foundation didn’t give anyone a heads up that this was coming. The developers of Ubuntu Mate had to order their Raspberry Pi 4’s just like everyone else when the announcement happened. This meant no one really got started until into July.

In spite of claiming up and down that they will never produce a 64-bit version of Raspbian, the Raspberry Pi foundation has produced a test Raspbian 64-bit Linux kernel. This then tests out that the Raspberry Pi firmware will support 64-bits and that all the device drivers are available. I couldn’t get this kernel to work, but it is proving very helpful for other developers. It also makes people excited that maybe Raspbian will go 64-bit sooner than later.

How Are We Doing?

The first distribution to get all this going is Gentoo Linux. They have a very smart developer Sakaki who provided the first image that actually worked. This then led to Arch and Majaro Linux releases based on Gentoo. This was a good first step, though these distributions are more for the DIY crowd due to their preference to always installing software from source code.

Next James Chambers put together a guide and images to allow you to install Ubuntu Server 64-bit on the Pi 4. Ubuntu Server is character based, but installing a desktop is no problem. The main limitation of this release is that you need a hardwired Internet connection to start. You can’t start with Wifi as the Wifi software isn’t installed with the base image. If you do have a wired Internet connection, getting it installed and installing the desktop is quite straightforward and works well. Once you have the desktop installed, then you can configure Wifi and ditch the ethernet cable.

The changes required for the Raspberry Pi 4 are being submitted to the standard Linux kernel for version 5.4. When this ships it will have available drivers for the Pi 4 hardware and official support for the Broadcom chips used in the Pi. Version 5.3 of the Linux kernel just shipped and added support for the NVidia Jetson Nano. Hopefully the wait for Linux 5.4 won’t be too long.


I’ve been running the 64-bit version of Ubuntu Linux Server, with the Xubuntu desktop for a few days now and it works really well on my Raspberry Pi 4 with 4Gig of RAM. Performance is great and everything is working. I’ve installed various software, including CubicSDR which works great. This is the first time I’ve been happy with Software Defined Radio running on a Pi.

I look forward to the official releases, and given the state of the current builds, think we are getting quite close.


Written by smist08

September 20, 2019 at 6:38 pm

Risc-V Assembly Language Hello World

leave a comment »


Last time, we started talking about the Risc-V CPU. We looked at some background and now we are going to start to look at its Assembly Language. We’ll write a program to print “Hello World!” to the terminal window, cross-compile it with GCC and run it in a Risc-V emulator. This program lets us start discussing some features of the core Risc-V instruction set. Risc-V supports 32-bit, 64-bit or 128-bit implementations, here we’ll run using 64-bits.

We’ll start with the program, then discuss various aspects of the Assembly instructions it uses and finally discuss how to build and run the program.

Hello World

First let’s present the program and then we’ll discuss it. This program works by making Linux system calls and like all Linux programs starts execution at the globally exported _start label. The program uses the Assembly directives specified in the GCC documentation.

# Risc-V Assembler program to print "Hello World!"
# to stdout.
# a0-a2 - parameters to linux function services
# a7 - linux function number

.global _start      # Provide program starting address to linker

# Setup the parameters to print hello world
# and then call Linux to do it.

_start: addi  a0, x0, 1      # 1 = StdOut
        la    a1, helloworld # load address of helloworld
        addi  a2, x0, 13     # length of our string
        addi  a7, x0, 64     # linux write system call
        ecall                # Call linux to output the string

# Setup the parameters to exit the program
# and then call Linux to do it.

        addi    a0, x0, 0   # Use 0 return code
        addi    a7, x0, 93  # Service command code 93 terminates
        ecall               # Call linux to terminate the program

helloworld:      .ascii "Hello World!\n"

The ‘#’ character is the comment character and anything after it on a line is a comment.


The Risc-V processor has 32 registers labeled x0 to x31 and a program counter (PC). x0 is a zero register, and x1-x31 can be used by programs as they wish. If you look at our listing for Hello World, you will notice that we are using registers a0, a1, a2 and a7. What are these? Since the Risc-V architecture provides no standards for register usage, and typical Assembly language programming requires a stack pointer, subroutine return register and some sort of function calling convention, these are defined in an Application Binary Interface (ABI). This is a software standard that the operating system defines so that programs and libraries can work together properly. Here GCC knows about the Risc-V Linux ABI where register usage is defined as:


Register ABI Use by convention Preserved?
x0 zero hardwired to 0, ignores writes n/a
x1 ra return address for jumps no
x2 sp stack pointer yes
x3 gp global pointer n/a
x4 tp thread pointer n/a
x5 t0 temporary register 0 no
x6 t1 temporary register 1 no
x7 t2 temporary register 2 no
x8 s0 or fp saved register 0 or frame pointer yes
x9 s1 saved register 1 yes
x10 a0 return value or function argument 0 no
x11 a1 return value or function argument 1 no
x12 a2 function argument 2 no
x13 a3 function argument 3 no
x14 a4 function argument 4 no
x15 a5 function argument 5 no
x16 a6 function argument 6 no
x17 a7 function argument 7 no
x18 s2 saved register 2 yes
x19 s3 saved register 3 yes
x20 s4 saved register 4 yes
x21 s5 saved register 5 yes
x22 s6 saved register 6 yes
x23 s7 saved register 7 yes
x24 s8 saved register 8 yes
x25 s9 saved register 9 yes
x26 s10 saved register 10 yes
x27 s11 saved register 11 yes
x28 t3 temporary register 3 no
x29 t4 temporary register 4 no
x30 t5 temporary register 5 no
x31 t6 temporary register 6 no
pc (none) program counter n/a


Which was taken from here. A0 to a7 are the registers used to pass function parameters (arguments), and a7 is used for Linux system calls where you specify the Linux function number from unistd.h.


We only use three Assembly instructions in this program: LA, ADDI and ECALL. Risc-V works hard to define as few instructions as possible. As a result some instructions have multiple uses. For instance ADDI is add an intermediate to a register, which is of the form:

      ADDI RD, RS, imm

Where RD is the destination register, RS the source register and imm is a 12-bit immediate value. Instructions are 32-bits in length so the size of the immediate value tends to be whatever is leftover after setting the opcode and any required registers.

You can define a NOP instruction with:

      ADDI x0, x0, 0

Or load immediate with:

      ADDI RD, X0, imm

The Assembler will take opcodes like NOP or LI (Load Immediate) and translate them into the correct underlying instruction. Here we used ADDI, but when we decompile the compiled program we’ll see the decompiler uses these aliases. These do make your program more readable. All our ADDI instructions use the LI pattern.

Risc-V provides a separate opcode to call the operating system. This is the ECALL instruction. When calling Linux, A7 is the Linux service number and A0 to A6 contain any parameters. When calling write, we need the file descriptor (1 for stdout), the string to write and the length in bytes to write, which we put in registers A0, A1 and A2. The return code which we don’t check will be in A0. This differs from most other architectures that use the interrupt mechanism for this purpose. The Risc-V designers feel it is cleaner to separate operating system calls from interrupts, even though both cause kernel privileged instructions to execute.

The remaining instruction is LA, which isn’t a Risc-V instruction, but rather it tells the Assembler that we want to load an address into a register. Then we leave it up to the Assembler to figure out how to do this. If we are running with 64-bit addressing then this address will be 64-bits. We can’t load this with a single load immediate instruction since the biggest immediate value is 20-bits, with most smaller. This means to load the address we either need to do many instructions to load this address piece by piece using load immediates, shifts, logical operations and/or arithmetic operations. The Assembler has inside knowledge of the value of this address, so it can, say use PC relative addressing to load this address. There are a lot of tricks to deal with 64-bit values from 32-bit instructions, that we don’t have room to go into now, but perhaps in a future blog article.


I don’t have a Risc-V processor, so I built the program using cross-compilation. The instructions on installing the GCC tools for this on a Debian based Linux are here. Then to build you run:

riscv64-linux-gnu-as -march=rv64imac -o HelloWorld.o HelloWorld.s
riscv64-linux-gnu-ld -o HelloWorld HelloWorld.o

We can run a Risc-V objdump to see what was produced with:

riscv64-linux-gnu-objdump -d HelloWorld

And get:

HelloWorld:     file format elf64-littleriscv

Disassembly of section .text:

00000000000100b0 <_start>:
   100b0: 00100513           li a0,1
   100b4: 00001597           auipc a1,0x1
   100b8: 02058593           addi a1,a1,32 # 110d4 <__DATA_BEGIN__>
   100bc: 00d00613           li a2,13
   100c0: 04000893           li a7,64
   100c4: 00000073           ecall
   100c8: 00000513           li a0,0
   100cc: 05d00893           li a7,93
   100d0: 00000073           ecall

We see it has interpreted the ADDI instructions that are just loading an immediate as LI. 

The “LA a1, helloworld” directive has been compiled to:

   100b4: 00001597           auipc a1,0x1
   100b8: 02058593           addi a1,a1,32 # 110d4 <__DATA_BEGIN__>

AUIPC is add immediate to PC, so it put PC+1 into A1 then the ADDI adds the offset to the beginning of the data section. Actually the Assembler set these as needing relocation and then the constants were filled in by the linker in the LD command. The good thing is that the Assembler and Linker took care of these details so we didn’t need to. Loading addresses and large integers is always a challenge in RISC processors.


Now I have our HelloWorld executable on my Intel i3 laptop running Ubuntu Linux. To run it, I use the TinyEMU Risc-V emulator. There are instructions on running a mini version of Linux under the emulator, you can then mount your /tmp folder and copy the executable over. Then it runs.

The whole process is:

stephen@stephenubuntu:~/riscV/HelloWorld$ bash -x ./build
+ riscv64-linux-gnu-as -march=rv64imac -o HelloWorld.o HelloWorld.s
+ riscv64-linux-gnu-ld -o HelloWorld HelloWorld.o
stephen@stephenubuntu:~/riscV/HelloWorld$ cp HelloWorld /tmp
stephen@stephenubuntu:~/riscV/HelloWorld$ cd ../../Downloads/diskimage-linux-riscv-2018-09-23/
stephen@stephenubuntu:~/Downloads/diskimage-linux-riscv-2018-09-23$ temu root_9p-riscv64.cfg 
[    0.307640] NET: Registered protocol family 17
[    0.308079] 9pnet: Installing 9P2000 support
[    0.311914] EXT4-fs (vda): couldn't mount as ext3 due to feature incompatibilities
[    0.312757] EXT4-fs (vda): mounting ext2 file system using the ext4 subsystem
[    0.325269] EXT4-fs (vda): mounted filesystem without journal. Opts: (null)
[    0.325552] VFS: Mounted root (ext2 filesystem) on device 254:0.
[    0.326420] devtmpfs: mounted
[    0.326785] Freeing unused kernel memory: 80K
[    0.326949] This architecture does not have kernel memory protection.
~ # mount -t 9p /dev/root /mnt
~ # cp /mnt/HelloWorld .
~ # ./HelloWorld 
Hello World!
~ # 

Note: I had to add:

      kernel: "kernel-riscv64.bin",

To root_9p-riscv64.cfg in order for it to start properly.


This simple Hello World program showed us a basic Risc-V Assembly Language program that loads some registers and calls Linux to print a string and then exit. This was still a long blog posting since we needed to explain all the Assembly elements and then how to build and run the program without requiring any Risc-V hardware.

Written by smist08

September 7, 2019 at 10:38 pm

Posted in RiscV

Tagged with , , , ,

Introducing Risc-V

with one comment


Risc-V (pronounced Risc Five) is an open source hardware Instruction Set Architecture (ISA) for Reduced Instruction Set Computers (RISC) developed by UC Berkeley. The Five is because this is Berkeley’s fifth RISC ISA design. This is a fully open standard, meaning that any chip manufacturer can create CPUs that use this instruction set without having to pay royalties. Currently the lion’s share of the CPU market is dominated by two camps, one is the CISC based x86 architecture from Intel with AMD as an alternate source, the other is the ARM camp where the designs come from ARM Holdings and then chip manufacturers can license the designs with royalty agreements.

The x86 architecture dominates server, workstation and laptop computers. These are quite powerful CPUs, but at the expense of using more power. The ARM architecture dominates cell phones, tables and Single Board Computers (SBCs) like the Raspberry Pi, these are usually a bit less powerful, but use far less power and are typically much cheaper.

Why do we need a third camp? What are the advantages and what are some of the features of Risc-V? This blog article will start to explore the Risc-V architecture and why people are excited about it.

Economies of Scale

The computer hardware business is competitive. For instance Western Digital harddrives each contain an ARM CPU to manage the controller functions and handle the caching. Saving a few dollars for each drive by saving the ARM royalty is a big deal. With Risc-V, Western Digital can make or buy a specialized Risc-V processor and then save the ARM royalty, either improving their profits or making their drives more price competitive.

The difficulty with introducing a new CPU architecture is to be price competitive you have to manufacture in huge quantities or your product will be very expensive. This means for there to be inexpensive Risc-V processors on the market, there has to be some large orders and that’s why adoption by large companies like Western Digital is so important.

Another giant boost to the Risc-V world is a direct result of Trump’s trade was with China. With the US restricting trade in ARM and x86 technology to China, Chinese computer manufacturers are madly investing in Risc-V, since it is open source and trade restrictions can’t be applied. If a major Chinese cell phone manufacturer can no longer get access to the latest ARM chips, then switching to Risc-V will be attractive. This is a big risk that Trump is taking, because if the rest of the world invests in Risc-V, then it might greatly reduce Intel, AMD and ARM’s influence and leadership, having the opposite effect to what Trump wants.

The Software Chicken & Egg Problem

If you create a wonderful new CPU, no matter how good it is, you still need software. At a start you need operating systems, compilers and debuggers. Developing these can be as expensive as developing the CPU chip itself. This is where open source comes to the rescue. UC Berkeley along with many other contributors added Risc-V support to the GNU Compiler Collection (GCC) and worked with Debian Linux to produce a Risc-V version of Linux.

Another big help is the availability of open source emulator technology. You are very limited in your choices of actual Risc-V hardware right now, but you can easily set up an emulator to play with. If you’ve ever played with RetroPie, you know the open source world can emulate pretty much any computer ever made. There are several emulator environments available for Risc-V so you can get going on learning the architecture and writing software as the hardware slowly starts to emerge.

Risc-V Basics

The Risc-V architecture is modular, where you start with a core simple arithmetic unit that can load/store registers, add, subtract, perform logical operations, compare and branch. There are 32 registers labeled x0 to x31. However x0 is a dedicated zero register. There is also a program counter (PC). The hardware doesn’t specify any other functionality to the registers, the rest is by software convention, such as which register is the stack pointer, which registers are used for passing function parameters, etc. Base instructions are 32-bits, but an extension module allows for 16-bit compressed instructions and extension modules can define longer instructions. The specification supports three different address sizes: 32-bit, 64-bit and 128-bit. This is quite forward thinking as we don’t expect the largest most powerful computer in the world to exceed 64-bits until 2030 or so.

Then you start adding modules like the multiply/divide module, atomic instruction module, various floating point modules, the compressed instruction module, and quite a few others. Some of these have their specifications frozen, others are still being worked on. The goal is to allow chip manufacturers to produce silicon that exactly meets their needs and keeps power utilization to a minimum.

Getting Started

Most of the current Risc-V hardware available for DIYers are small low power/low memory microcontrollers similar to Arduinos. I’m more interested in getting a Risc-V SBC similar to a Raspberry Pi or NVidia Jetson. As a result I don’t have a physical Risc-V computer to play with, but can still learn about Risc-V and play with Risc-V Assembly language programming in an emulator environment.

I’ll list the resources I found useful and the environment I’m using. Then in future blog articles, I’ll go into more detail.

  • The Risc-V Specifications. These are the documents on the ISA. I found them readable, and they give the rationale for the decisions they took along with the reasons for a number of roads they didn’t go down. The only thing missing are practical examples.
  • The Debian Risc-V Wiki Page. There is a lot of useful information here.  A very big help was how to install the Risc-V cross compilation tools on any Debian release. I used these instructions to install the Risc-V GCC tools on my Ubuntu laptop.
  • TinyEMU, a Risc-V Emulator. There are several Risc-V emulators, this is the first one I tried and its worked fine for me so far.
  • RV8 a Risc-V Emulator. This emulator looks good, but I haven’t had time to try it out yet. They have a good Risc-V instruction set summary.
  • SiFive Hardware. SiFive have produced a number of limited run Risc-V microcontrollers. Their website has lots of useful information and their employees are major contributors to various Risc-V open source projects. They have started a Risc-V Assembly Programmers Guide.


The Risc-V architecture is very interesting. It is always nice to start with a clean slate and learn from all that has gone before it. If this ISA gains enough steam to achieve volumes where it can compete with ARM, it is going to allow very powerful low cost computers. I’m very hopeful that perhaps next year we’ll see a $25 Risc-V based Raspberry Pi 4B competitor with 4Gig RAM and an M.2 SSD slot.

Written by smist08

September 6, 2019 at 6:07 pm

Posted in Business

Tagged with , , , ,