Stephen Smith's Blog

Musings on Machine Learning…

Archive for the ‘Business’ Category

Fallout From ARM’s Success


Introduction

Last time, we talked about a number of ARM’s recent successes. This time we’ll discuss a few of the consequences for the rest of the industry. Many people are discussing the effect on Intel and AMD, but probably a bigger victim of the ARM steamroller is RISC-V, the open source processor.

Trouble for Intel’s Profits

This past year wasn’t a good one for Intel. They’ve been having trouble keeping up in chip manufacturing technology. Most other vendors outsource their chip manufacturing to TSMC, Samsung and a couple of others. TSMC is now so large that it is out-spending Intel on R&D by orders of magnitude and, as a result, is years ahead of Intel in chip technology. The big winners here are AMD and ARM, whose chips are now denser, faster and more power efficient than Intel’s. AMD gave up manufacturing its own chips some years ago and ARM has never manufactured chips itself.

Better chip manufacturing technology allows AMD and ARM to fit more processing cores on each chip or produce products in smaller form factors.

Intel’s main problem this past year has been AMD, which has been chipping away at its market share. Now, with Apple switching to ARM processors, this could be the start of a migration away from Intel. Microsoft already has an ARM version of their Surface notebook running a limited version of Windows, but they could easily produce something more powerful running a full version of Windows. Similarly, other manufacturers such as Dell or HP could start producing ARM based laptops and workstations running Linux.

Although AMD doesn’t have Intel’s manufacturing problems, it does carry the burden of supporting every instruction introduced into the x86/x64 architecture over the many years of its existence. Modern x86 chips run RISC cores internally, but have to translate the old CISC instructions into RISC instructions as they run. This extra layer is required to keep all those old DOS and Windows programs running, many of which are no longer supported but are still used by many people. Both Intel and AMD are at a competitive disadvantage to ARM and RISC-V, which don’t need to waste circuitry doing this, and extra circuitry means higher power consumption and heat production.

Today Intel’s most profitable chips are its data center focused Xeon processors. These are powerful multi-core chips, but with more and more cores being added to ARM processors, even here ARM is starting to chip away at Intel.

RISC-V is having Trouble Launching

I’ve blogged on RISC-V processors a couple of times. RISC-V is an open source hardware specification, so you can develop a processor without paying royalties or fees to any other company. Anyone can manufacture an ARM processor, but if they use the ARM instruction set they need to pay royalties to ARM Holdings. The hope of the RISC-V folks was to stimulate competitive innovation and produce lower cost, more powerful processors.

The reality has been that companies designing RISC-V chips can’t get orders to manufacture in the volume they need to be price competitive.

RISC-V is still ticking along, but it is limited to the following applications:

  • Providing low cost processors for the Arduino market, usually 32-bit processors with a few meg of memory.
  • Producing specialty chips for things like AI processors. Again this is having trouble getting going due to low volumes.
  • Manufacturers like Western Digital using them as embedded processors in their products, like WD’s disk controllers.

What RISC-V really needs is a Single Board Computer (SBC) like the Raspberry Pi, with comparable performance and price. It also needs to run Linux in a stable, supported way. Without this there won’t be any software development and RISC-V won’t be able to gain any sort of foothold. Doing this will be extremely difficult given how powerful and cheap the current crop of ARM based SBCs are. The level of software support for ARM in the Linux world is phenomenal.

Summary

ARM certainly isn’t going to eradicate Intel and AMD anytime soon. But even a small dent in their sales can send their stock prices into a tailspin. Investors are going to have to watch the trends very closely, in case they need to bail. RISC-V will continue to have difficulty gaining acceptance and manufacturing a competitive chip. More companies will adopt ARM and this will increase its competitive advantage. Here, ARM’s strategy of licensing designs rather than selling chips is really paying off, fielding more and more competition for its rivals. Next year will be a very good one for ARM and likely an even tougher year for Intel.

The main conclusion here is that if you are a programmer, you should have a look at ARM and a good way to learn about it is to study its Assembly Language, perhaps by reading my book: “Programming with 64-Bit ARM Assembly Language”.

Written by smist08

July 3, 2020 at 11:23 am

Posted in Business


Exciting Days for ARM Processors


Introduction

ARM CPUs have long dominated the mobile world; nearly all Apple and Android phones and tablets use some model of ARM processor. However, Intel and AMD still dominate the laptop, desktop, server and supercomputer markets. This week we saw a number of announcements that suggest this is likely to change:

  1. Apple announced they are going to transition all Mac computers to the ARM processor over two years.
  2. Ampere announced a 128-core server ARM processor.
  3. Japan now has the world’s most powerful supercomputer and it is based on 158,976 ARM Processors.

In this blog post, we’ll look at some of the consequences of these moves.

Apple Macs Move to ARM

The big announcement at this year’s Apple WorldWide Developers Conference is that Apple will be phasing out Intel processors in their Mac desktop and laptop computers. You wouldn’t know they are switching to ARM processors from their marketing speak, which exclusively talks about the switch from Intel to Apple Silicon. But at the heart of Apple Silicon are ARM CPU cores. The name Apple Silicon refers to the System on a Chip (SoC) that they are building around the ARM processors. These SoCs will include a number of ARM cores, a GPU, an AI processor, a memory controller and other support functions.

Developers can pay $500 to get a Mac mini running the same ARM CPU as the latest iPad Pro; the downside is that you need to give this hardware back when the real systems ship at the end of this year. It is impressive that you can already get a working ARM Mac running MacOS along with a lot of software, including the Xcode development system. One cool feature is that you can run any iPad or iPhone app on your Mac, now that all Apple devices share the same CPU architecture.

The new version of MacOS for ARM (or Apple Silicon) will run Intel-compiled programs in an emulator, but Apple’s hope is that developers will recompile their programs for ARM fairly quickly, so this won’t be needed much. The emulation has some limitations: it doesn’t support Intel AVX SIMD instructions or instructions related to virtualization.

For developers converting their applications, any Assembly Language code will have to be converted from Intel Assembly to ARM Assembly, and of course a great resource for this is my book, “Programming with 64-Bit ARM Assembly Language”.

I’m excited to see what these new models of ARM based Apple computers look like. We should see them announced as we approach the Christmas shopping season. Incorporating all the circuitry onto a single chip will make these new computers even slimmer, lighter and more compact. Battery life should be far longer but still with great performance.

I think Apple should be thanking the Raspberry Pi world for showing what you can do with SoCs, and for driving so much software to already be ported to the ARM processor.

One possible downside of the new Macs is that Apple keeps talking about the new secure boot feature only allowing Apple-signed operating systems to boot, as a security feature. Does this mean we won’t be able to run Linux on these new Macs, except using virtualization? That would be a big downside, especially down the road when Apple drops support for them. Apple makes great hardware that keeps on working long after Apple no longer supports it. You can get a lot of extra life out of your Apple hardware by installing Linux and keeping on trucking with new updates.

New Ampere Server ARM Chips

Intel and AMD have long dominated the server and data center markets, but that is beginning to change. Amazon has been designing their own ARM chips for AWS and Ampere has been providing extremely powerful ARM based server chips for everyone else. Last year they announced an 80-core ARM based server chip which is now in production. Just this week they announced the next generation which is a 128-core ARM server chip.

If you aren’t interested in a server, but would like a workstation containing one of these chips then you could consider a computer from Avantek such as this one.

This is just one of several powerful ARM based server chips coming to market. It will be interesting to see if there is a lot of uptake of ARM in this space.

Japan’s Fugaku ARM Based Supercomputer is Number One

Japan just took the number one spot in the list of the world’s most powerful supercomputers. The Fugaku supercomputer is located in Kobe and uses 158,976 Fujitsu 48-core ARM SoCs. Of course this computer runs Linux and currently is being used to solve protein folding problems around developing a cure for COVID-19, similar to folding@home. This is a truly impressive warehouse of technology and shows where you can go with the ARM CPU and the open source Linux operating system.

Summary

ARM conquered the mobile world some years ago, and now it looks like ARM is ready to take on the rest of the computer industry. Expect to see more ARM based desktop and laptop computers than just Macs. Only time will tell whether this is a true threat to Intel and AMD, but the advantage ARM has over previous attempts to unseat Intel as king is that they already have more volume production than Intel and AMD combined. The Intel world has stagnated in recent years, and I look forward to seeing the CPU market jump ahead again.

Written by smist08

June 24, 2020 at 11:28 am

Raspberry Pi Gets 8Gig and 64-Bits


Introduction

The Raspberry Pi Foundation recently announced the availability of the Raspberry Pi 4 with 8-Gig of RAM along with the start of a beta for a 64-bit version of the Raspberry Pi OS (renamed from Raspbian). This blog post will look into these announcements, where the Raspberry Pi is today and where it might go tomorrow.

I’ve written two books on ARM Assembly Language programming, one for 32-bits and one for 64-bits. All Raspberry Pis have an ARM CPU as their brains. If you are interested in learning Assembly Language, the Raspberry Pi is the ideal place to do so.

32- Versus 64-Bits

Previously the Raspberry Pi Foundation had been singing the virtues of their 32-bit operating system. It uses less memory than a 64-bit operating system and runs on every Raspberry Pi ever made. Further, if you really want 64-bits you can run alternative versions of Linux from Ubuntu, Gentoo or Kali. The limitation of 32-bits is that a process can only address 4 Gig of memory, which sounds like a problem for an 8-gig device, but 32-bit Raspbian handles this: each process can have up to 4-gig of RAM, so all 8-gig will get used if needed, just spread across multiple processes.
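If you are curious what your own system reports, here is a small C sketch (assuming GCC or Clang on a Linux system such as Raspbian; it is only illustrative) that prints whether the program was compiled as 32-bit or 64-bit and what the kernel says the machine is:

#include <stdio.h>
#include <sys/utsname.h>

int main(void)
{
    struct utsname u;

    /* The pointer size shows how this program itself was compiled. */
    printf("Pointer size: %zu bits\n", sizeof(void *) * 8);

    /* uname() reports what the running kernel says the machine is,
       e.g. armv7l for 32-bit Raspbian, aarch64 for a 64-bit kernel. */
    if (uname(&u) == 0)
        printf("Kernel machine type: %s\n", u.machine);

    return 0;
}

On 32-bit Raspbian this should report 32-bit pointers, even on a Pi whose CPU is perfectly capable of running 64-bit code.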

The downside of staying with 32-bits is that the ARM 64-bit instruction set is faster, memory addressing is simpler without the extra memory management the 32-bit kernel needs, and modern ARM processors are optimised around 64-bits, keeping 32-bit support only for compatibility. There are no new improvements to the 32-bit instruction set and typically it can’t take advantage of newer features and optimizations in the processor.

The Raspberry Pi Foundation has released a beta version of the Raspberry Pi OS where the kernel is compiled for 64-bits. Many of the applications are still 32-bits but run fine in compatibility mode; this is just a band-aid until everything is compiled 64-bit. I’ve been running the 64-bit version of Kali Linux on my Raspberry Pi 4 with 4-gig for a year now and it is excellent. I think the transition to 64-bits is a good one and there will be many benefits down the road.

New Hardware

The new Raspberry Pi 4 model with 8-gig of RAM is similar to the older model. The change was facilitated by the recent availability of an 8-gig RAM chip in a compatible form factor. They made some small adjustments to the power supply circuitry to handle the slightly higher power requirements of this model. Otherwise, everything else is the same. If a 16-gig part becomes available, they would be able to offer such a model as well. The current Raspberry Pi memory controller can only handle up to 16-gig, so to go higher this would need to be upgraded as well.

The new model costs $75 USD with 8-gig of RAM. The 2-gig model is still only $35 USD. This is incredibly inexpensive for a computer, especially given the power of the Pi. Remember, this is the price for the core unit; you still need to provide a monitor, cables, power supply, keyboard and mouse.

Raspberry Pi Limitations

For most daily computer usage the Raspberry Pi is fine. But what is the difference between the Raspberry Pi and computers costing thousands of dollars? Here are the main ones:

  1. No fast SSD interface. You can connect an SSD or mechanical hard drive to a Raspberry Pi USB port, but this isn’t as fast as an M.2 or SATA interface would be. M.2 would be ideal for a Raspberry Pi given its compact size, and adding an M.2 slot shouldn’t greatly increase the price of a Pi.
  2. Poor GPU. On most computers the GPU can be expensive. For $75 or less you get an older, less powerful GPU. A better GPU, like ARM’s Mali GPU or some nVidia CUDA cores, would be terrific, but would probably double or triple the price of the Pi. Even with the poor GPU, the RetroPie game system is terrific.
  3. Slow memory interface. The Raspberry Pi 4 has DDR4 memory, but it doesn’t compare well to other computers with DDR4. This probably indicates a bottleneck in either the PCI bus or the Pi’s memory controller. I suspect this keeps the price low, but limits CPU performance by restricting the data flow to and from memory.

If the Raspberry Pi addressed these issues, it would be competitive with most computers costing hundreds of dollars more.

Summary

The 8-gig version of the Raspberry Pi is a powerful computer for only $75. Having 8-gig of RAM allows you to run more programs at once, have more browser windows open and generally have more work in progress at one time. Each year the Raspberry Pi hardware gets more powerful. Combine this with the forthcoming 64-bit version of the Raspberry Pi OS and you have a powerful system that is ideal for the DIY hobbyist, for people learning about programming, and even people using it as a general purpose desktop computer.

Written by smist08

June 5, 2020 at 4:36 pm

Browsing MSDOS and GW-Basic Source Code


Introduction

These days I mostly play around with ARM Assembly Language and have written two books on it.

But long ago, my first job out of university involved some Intel 80186 Assembly Language programming, so I was interested when Microsoft recently posted the source code to GW-Basic, which is entirely written in 8086 Assembly Language. Microsoft posted the source code to MS-DOS versions 1 and 2 a few years ago, also entirely written in 8086 Assembly Language.

This takes us back to the days when C compilers weren’t as good at optimizing code as they are today, processors weren’t nearly as fast and memory was at a far greater premium. If you wanted your program to be useful, you had to write it entirely in Assembly Language. It’s interesting to scroll through this classic code and observe the level of documentation (low) and the programming styles used by the various programmers.

Nowadays, programs are almost entirely written in high-level programming languages and any Assembly Language is contained in a small set of routines that provide some sort of highly optimized functionality usually involving a coprocessor. But not too long ago, often the bulk of many programs consisted entirely of Assembly Language.

Why Release the Source Code?

Why did Microsoft release the source code for these? One reason is that they are a part of computer history now and there are historians that want to study this code. It provides insight into why the computer industry progressed in the manner it did. It is educational for programmers to learn from. It is a nice gesture and offering from Microsoft to the DIY and open source communities as well.

The other people who greatly benefit from this are those that are working on the emulators that are used in systems like RetroPie. Here they have emulators for dozens of old computer systems that allow vintage games and programs to be run on modern hardware. Having the source code for the original is a great way to ensure their emulations are accurate and a great help to fixing bugs correctly.

Example

Here is an example routine from find.asm in MS-DOS 2.0 that converts a binary number into an ASCII string. The code in this routine is typical of the code throughout MS-DOS. Remember that back then MS-DOS was 16-bits, so AX is 16-bits wide. Memory addresses are built using two 16-bit registers, one that provides a segment and the other an offset into that 64K segment. Remember that MS-DOS can only address memory up to 640K (ten such segments).

;--------------------------------------------------------------------
;       Binary to Ascii conversion routine
;
; Entry:
;       DI      Points to one past the last char in the
;               result buffer.
;       AX      Binary number
;
; Exit:
;       Result in the buffer MSD first
;       CX      Digit count
;
; Modifies:
;       AX,BX,CX,DX and DI
;
;--------------------------------------------------------------------
bin2asc:
        mov     bx,0ah
        xor     cx,cx
go_div:
        inc     cx
        cmp     ax,bx
        jb      div_done
        xor     dx,dx
        div     bx
        add     dl,'0'          ;convert to ASCII
        push    dx
        jmp     short go_div
div_done:
        add     al,'0'
        push    ax
        mov     bx,cx
deposit:
        pop     ax
        stosb
        loop    deposit
        mov     cx,bx
        ret

For an 8086 Assembly Language programmer of the day, this would be fairly self-evident code, and they would laugh at us if we complained there wasn’t enough documentation. But we’re 40 or so years on, so I’ll give the code again with an explanation of what is going on added in the comments.

bin2asc:
        mov     bx,0ah          ; we will divide by 0ah = 10 to get each digit
        xor     cx,cx           ; cx will be the length of the string, initialize it to 0
go_div:
        inc     cx              ; increment the count for the current digit
        cmp     ax,bx           ; is the number < 10 (last digit)?
        jb      div_done        ; if so goto div_done to process the last digit
        xor     dx,dx           ; DX = 0
        div     bx              ; AX = AX/BX, DX = remainder
        add     dl,'0'          ; convert to ASCII. Remainder is < 10 so we can use DL
        push    dx              ; push the digit onto the stack
        jmp     short go_div    ; loop for the next digit
div_done:
        add     al,'0'          ; convert last digit to ASCII
        push    ax              ; push it on the stack
        mov     bx,cx           ; move string length to BX
deposit:
        pop     ax              ; get the next significant digit off the stack
        stosb                   ; store AL at ES:DI and increment DI
        loop    deposit         ; loop decrements CX and branches if CX is not zero,
                                ; falling through when CX = 0
        mov     cx,bx           ; put the count back in CX
        ret                     ; return from routine

A bit different than a C routine. The routine assumes the direction flag (DF) is cleared, so stosb increments the memory address; perhaps this is a standard convention across MS-DOS or perhaps it’s just local to this module. I think the header comment is incorrect and that the start of the output buffer is what gets passed in. The routine uses the stack to reverse the digits, since the divide-by-10 algorithm peels off the least significant digit first and we want the most significant digit first in the buffer. The resulting string isn’t NULL terminated, so perhaps MS-DOS treats strings as a length and buffer everywhere.
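For comparison, here is roughly what the same algorithm looks like in C. This is just an illustrative sketch of the divide-by-10-and-reverse approach, not code from MS-DOS:

#include <stdio.h>

/* Convert an unsigned 16-bit value to ASCII digits in buf, most
   significant digit first, returning the digit count (no NULL added,
   matching the Assembly Language routine above). */
int bin2asc(unsigned int value, char *buf)
{
    char stack[5];              /* a 16-bit value has at most 5 digits */
    int count = 0;

    /* Peel off the least significant digit first, like the DIV loop. */
    do {
        stack[count++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    /* Pop the digits so the most significant one lands first in buf. */
    for (int i = 0; i < count; i++)
        buf[i] = stack[count - 1 - i];

    return count;
}

int main(void)
{
    char buf[5];
    int n = bin2asc(1984, buf);

    printf("%.*s (%d digits)\n", n, buf, n);    /* prints: 1984 (4 digits) */
    return 0;
}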

Comparison to ARM

This code is representative of CISC type processors. The 8086 has few registers and their usage is predetermined. For instance the DIV instruction is only passed one parameter, the divisor. The dividend, quotient and remainder are set in hard-wired registers. RISC type processors like the ARM have a larger set of registers and tend to have three operands per instruction, namely two input registers and an output register.

This code could be assembled for a modern Intel 64-bit processor with little alteration, since Intel has worked hard to maintain a good level of compatibility as it has gone from 16-bits to 32-bits to 64-bits. ARM, by contrast, redesigned their instruction set when they went from 32-bits to 64-bits. This was a great improvement for ARM, and it was only possible because the amount of Assembly Language code in use is now so much smaller.

Summary

Kudos to Microsoft for releasing this 8086 Assembly Language source code. It is interesting to read and gives insight into how programming was done in the early 80s. I hope more classic programs have their source code released for educational and historical purposes.

Written by smist08

May 25, 2020 at 6:56 pm

Benchmarking with Folding@Home



Introduction

There are a lot of different benchmarks to rate the performance of computers, but I’m finding it interesting watching the performance of the various computers around the house running Folding@Home which I blogged about last time. Folding@Home provides all sorts of interesting statistics that I’ll talk about as well. The main upshot is how much processing power a GPU has compared to a CPU.

My Benchmarks

Here are some of the specs and the Folding@Home statistics for a number of the computers lying around my house:

Brand   CPU               # CPUs   Memory   Points per Day
MSI     Intel i7 9750H        10    16Gig           42,337
MSI     nVidia GTX 1650      896     4Gig          262,389
HP      Intel i3 6300U         4     4Gig           16,178
Apple   Core 2 Duo             2     4Gig            2,073
Dell    Celeron N2840          2     4Gig              565

These are all laptop computers. The Dell is a Chromebook that is running GalliumOS. Here are a few takeaways from these benchmarks:

  • Intel Celerons that are used in many cheap Chromebooks are terribly underpowered.
  • The Core 2 Duo does not contain the newer AVX SIMD instructions added with the Intel Core line of processors, which is why the 2008 era MacBooks do so poorly (a quick way to check a CPU for AVX is sketched after this list).
  • GPUs are far more powerful than CPUs for this sort of calculation.
  • On the MSI gaming laptop there are 12 logical CPUs, but one is controlling the GPU and one is being used by me to write this article.
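As a side note, if you want to check whether a particular Intel or AMD machine supports AVX, GCC and Clang provide a builtin for querying the CPU’s feature flags. This is just a quick sketch, assuming an x86 machine:

#include <stdio.h>

int main(void)
{
    /* Query the running x86 CPU's feature flags (GCC/Clang builtin). */
    if (__builtin_cpu_supports("avx"))
        printf("This CPU supports AVX\n");
    else
        printf("No AVX here, e.g. a Core 2 Duo\n");

    return 0;
}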

Folding@Home Operating System Statistics

Here are the operating system statistics as of May 1, 2020, taken from the Folding@Home statistics page.

OS        AMD GPUs   NVidia GPUs        CPUs    CPU cores      TFLOPS   x86 TFLOPS
Windows    113,145       399,642   1,018,463    6,415,885     911,141    1,832,063
Linux       12,045       127,665   2,001,996   14,315,265     425,332      681,347
macOSX         136             0      96,503      477,058       5,537        5,753
Totals     125,326       527,307   3,116,962   21,208,208   1,342,010    2,519,163

Here are some takeaways from these numbers:

  • There are twice as many Linux CPUs as Windows CPUs. I think this is partly because the technically advanced users who are more likely to run Folding@Home prefer Linux.
  • NVidia GPUs outnumber AMD GPUs by 3 to 1.
  • Most people with GPUs run Windows. This shows the draw of most gaming being on Windows.
  • It’s sad that Apple doesn’t offer GPUs on very many models.

No ARM Support

It’s sad that Folding@Home doesn’t support the ARM processor. I’d love to add my various Raspberry Pis and nVidia Jetsons to this table. I think this would also greatly increase the number of Linux CPUs shown. Newer ARM CPUs all have NEON SIMD coprocessors, and there could also be support for common ARM GPUs.
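For the curious, NEON code on ARM looks much like AVX/SSE code on Intel. Here is a minimal sketch using the standard NEON intrinsics (assuming GCC or Clang on an ARM board such as a Raspberry Pi), adding four pairs of floats in one SIMD operation, which is the kind of arithmetic this sort of number crunching does in bulk:

#include <stdio.h>
#include <arm_neon.h>

int main(void)
{
    /* Two vectors of four 32-bit floats each. */
    float32x4_t a = {1.0f, 2.0f, 3.0f, 4.0f};
    float32x4_t b = {10.0f, 20.0f, 30.0f, 40.0f};

    /* One NEON instruction adds all four lanes at once. */
    float32x4_t sum = vaddq_f32(a, b);

    float out[4];
    vst1q_f32(out, sum);
    printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);

    return 0;
}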

Summary

It would be nice if computer reviews started adding Folding@Home benchmarks to all the other benchmarks they publish. I find it interesting to compare how various CPUs and GPUs do when running this algorithm.

Written by smist08

May 1, 2020 at 11:30 am

Posted in Business


Restoring an Old MacBook Pro


Introduction

A number of volunteers with the Sunshine Coast Tech Hub work with the Gibsons Public Library to offer kids coding classes. These include programming Sphero robots, programming Lego Mindstorms and basic Arduino programming. Originally the library used six 2008 MacBook Pros for these classes. However, when these MacBooks went out of support the library liquidated them and the Tech Hub took them over. Meanwhile, the library bought a number of Chromebooks which have been fine for the Sphero and Lego Mindstorm classes, but don’t work for the Arduino classes.

Since MacOS is no longer supported on these laptops, we installed Linux Mint on them all and they have been working fine. However, they are old and only have 1Gig of RAM and 120Gig mechanical hard drives. Now that they are twelve years old, we figured we could upgrade them to see if we can get some more life out of them. This blog post is about the upgrade process, how it worked out and how these laptops from 2008 fare today in 2020.

These MacBooks Are Upgradeable!

Lately, I’ve become accustomed to laptops that are very hard to upgrade. The memory is usually soldered to the motherboard or incorporated into other components and hence not upgradeable. Hard drives are usually replaceable, but doing so tends to be very hard, requiring much disassembly and fiddling.

I was pleasantly surprised at how easy it is to upgrade the RAM and hard drives in these old MacBooks. The battery is interchangeable; you can release and remove it using a quarter. Behind the battery compartment are the hard drive and RAM. You need to remove a guard, which is held in place by 3 regular phillips screws, and then you can pop out the hard drive and RAM.

I ordered an SSD and 4Gig of RAM from eBay. The 120Gig SSD was $30 and the RAM was $14. I popped these in, booted from a USB key and copied the old hard drive to the new one, all using GParted. It was a surprisingly easy and straightforward procedure and everything worked fine. I borrowed a USB SATA hard drive caddy to connect the old drive.

Now I wish modern laptops had the same upgradeability as these old MacBooks.

Benchmarks

Here are some sysbench benchmarks on the MacBook Pro upgrade:

Computer              CPU      Memory    Disk Read   Disk Write
HP i3 laptop          312.91   4906.62   13.06        8.71
MacBook Pro Original  313.31    343.76    0.72        0.48
MacBook Pro Updated   313.51    343.19   27.75       18.5
Pi 64-Bit Kali        585.62   3731.93    0.85        0.57

 

The difference before and after the upgrade is like night and day. The MacBook boots orders of magnitude faster and now feels like a decent laptop.

The HP laptop is circa 2015 or so. It’s interesting that a 2015 i3 gets about the same CPU score as a 2008 Core2Duo. I was surprised how well the Raspberry Pi did on the CPU test. I think it shows ARM based laptops can cut it, especially since the Pi uses ARM processors that are a couple of generations old.

Notice the SSD improves disk performance by about 40 times. Pretty good. The HP laptop also has an SSD, but a couple of years old now. Chances are the HP disk controller isn’t as good as the Apple one. The Pi is running off an SD card, which appears to have about the same performance as an old mechanical drive. The MacBook uses a 2.5” SATA hard drive, which is the same form factor largely in use still today; the newer SATA standard is backwards compatible and new drives seem to work fine in quite old computers.

The MacBooks use DDR2, the HP DDR3 and the Pi DDR4. I was surprised the Pi didn’t do better on the memory benchmark. I suspect the Broadcom memory controller it uses isn’t very good, since it should outperform the HP laptop.

I upped the swap size on the upgraded MacBook from 1Gig to 13Gig, so now it has piles of virtual memory and seems to run pretty well. One improvement is that the wifi seems reliable now (it definitely wasn’t before).

Intel Shooting Itself in the Foot

One thing this exercise showed me is how Intel shoots itself in the foot by producing crippled chips for the consumer market to keep prices high on its top end chips. Intel’s chips are all basically the same and cost about the same to produce, but Intel cripples functionality to provide a range of price/performance points. When the HP laptop was made, the Intel i7 was the top end chip and the i3 was a crippled version offered at a lower price. So an i3 from 2015 has the same processing power as a Core2Duo from 2008. I think this is part of the reason Intel is in so much trouble. They spend so much effort crippling their own products so they don’t compete with each other that they leave the market open to competitors who don’t play these games. Hence the rise of AMD and the emergence of ARM processors taking over from Intel. Intel is going to have a tough time recovering from this mess that they created for themselves.

Summary

Upgrading the 2008 era MacBook Pros to 4Gig of RAM and SSDs keeps them chugging along as productive, useful computers. Linux Mint works great on them. The displays are still better than modern cheap laptops and, surprisingly, the batteries still last between 1 and 2 hours. There aren’t many technology products that you can get this much life out of: 12 years of productivity and counting. It would sure be nice if modern laptops allowed upgradeability over shaving another millimeter off the thickness.

Written by smist08

March 28, 2020 at 11:54 am

Posted in Business


On ARM Based MacBooks


Introduction

There are a lot of rumours circulating that Apple will start releasing ARM based MacBooks towards the end of this year or early next year. There has been a lot of negativity about this possibility, but I feel it will be an overall positive for Apple and Mac computers. In this blog post, I’m going to look at the commonly mentioned criticisms and debunk them as the myths they are.

Performance of ARM vs Intel or AMD

Many claim that Intel and AMD chips are faster than ARM chips. This only applies at the high end. Both Intel and AMD produce excellent chips at the high end, and charge a fortune for them. Then they release lower powered versions for general consumer release. Further, the most powerful AMD and Intel chips are power hungry and don’t appear in laptops due to the heat they produce and how fast they drain batteries.

In most laptops you run less powerful, less power hungry chips. For ARM chips to be competitive, they only need to compare well to Intel i3 or i5 chips, because these are what most people really run. I’m writing this on an older i3 based laptop; when I run CPU benchmarks on it, it scores half of what the ARM CPU in my Raspberry Pi 4 does. And the ARM CPU in a Raspberry Pi is an older ARM design, chosen to keep the price so low.
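If you want to do a similar comparison yourself, a tiny compute-bound program is enough to get a feel for relative CPU speed. This is just an illustrative sketch, not the actual benchmark I used; compile it with the same optimization flags on each machine and compare the times:

#include <stdio.h>
#include <time.h>

#define LIMIT 200000

int main(void)
{
    clock_t start = clock();
    int count = 0;

    /* Count primes below LIMIT by trial division -- a crude,
       portable stand-in for a CPU benchmark. */
    for (int n = 2; n < LIMIT; n++) {
        int prime = 1;
        for (int d = 2; d * d <= n; d++) {
            if (n % d == 0) { prime = 0; break; }
        }
        count += prime;
    }

    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    printf("%d primes below %d in %.2f seconds\n", count, LIMIT, secs);

    return 0;
}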

The processing power per watt of ARM processors is far superior to Intel or AMD chips. Further, ARM processors typically have more cores and coprocessors than Intel or AMD chips. This is because Intel and AMD want to maintain such a wide product line, and strip so much out of the cheaper parts, in order to keep up demand for their more expensive products.

Availability of Software

There is a claim that no one will compile ARM versions of their software or bother to port their programs from the Intel instruction set to the ARM instruction set. This problem was solved by the Raspberry Pi. When I started with the Raspberry Pi, many common Linux programs either wouldn’t work on the Pi or you needed to build them yourself from source. Now every major Linux open source project produces ARM 32 and 64-bit binaries as part of its regular build process.

Further, the Apple ecosystem has familiarity with ARM since the iPhone and iPad market is far larger than the MacOS market.

Sure, Microsoft has trouble getting software for their ARM based Surface laptops, but that is unique to the Windows world and doesn’t apply to Linux or Apple.

Apple has the experience to move their ecosystem across CPU architectures. They moved the MacOS world from PowerPC to Intel. This transition looks far easier.

Problems in the Windows World Apply to Apple

There have been a number of attempts to move Microsoft Windows to a non-Intel platform. So far these have all failed. The problem in the Windows world is that there is tons of software out there, much of it from legacy companies that have gone out of business and the source code isn’t available to re-compile. The legacy of rejecting open source software and promoting vendor lock-in has now tied Microsoft’s hands, preventing them from moving forwards.

The other problem is that Microsoft has tried to use this transition as a mechanism of locking customers in. For instance only allowing software to be installed from the Microsoft store. Or limiting the functionality in the ARM version, to promote demand for their more expensive products.

Advantages of ARM Based Laptops

There are several advantages for Apple going with ARM processors in their MacBooks:

  1. The ARM processor uses less power, so battery life will be far longer.
  2. It provides differentiation between Apple products and the millions of Intel/AMD Windows laptops on the market.
  3. It reduces Apple’s cost for CPUs by 60%.
  4. It allows Apple more room to innovate in their laptop line.
  5. The ARM Processors are produced by multiple manufacturers, so Apple can use the best of breed rather than relying on Intel’s lagging process manufacturing technology. 

Summary

I’m looking forward to a wave of ARM based laptops whether from Apple or from the various Linux vendors. I think this is the Intel/AMD duopoly’s last stand. Competition is only good. I’m tired of using crippled chips like the i3 or Celeron and look forward to much greater processing power at a lower cost with longer lasting batteries.

Written by smist08

March 27, 2020 at 2:01 pm

Posted in Business


Introducing Risc-V


Introduction

Risc-V (pronounced Risc Five) is an open source hardware Instruction Set Architecture (ISA) for Reduced Instruction Set Computers (RISC) developed by UC Berkeley. The Five is because this is Berkeley’s fifth RISC ISA design. This is a fully open standard, meaning that any chip manufacturer can create CPUs that use this instruction set without having to pay royalties. Currently the lion’s share of the CPU market is dominated by two camps: one is the CISC based x86 architecture from Intel, with AMD as an alternate source; the other is the ARM camp, where the designs come from ARM Holdings and chip manufacturers license those designs under royalty agreements.

The x86 architecture dominates server, workstation and laptop computers. These are quite powerful CPUs, but at the expense of using more power. The ARM architecture dominates cell phones, tablets and Single Board Computers (SBCs) like the Raspberry Pi; these are usually a bit less powerful, but use far less power and are typically much cheaper.

Why do we need a third camp? What are the advantages and what are some of the features of Risc-V? This blog article will start to explore the Risc-V architecture and why people are excited about it.

Economies of Scale

The computer hardware business is competitive. For instance, Western Digital hard drives each contain an ARM CPU to manage the controller functions and handle the caching. Saving a few dollars per drive by avoiding the ARM royalty is a big deal. With Risc-V, Western Digital can make or buy a specialized Risc-V processor and save the ARM royalty, either improving their profits or making their drives more price competitive.

The difficulty with introducing a new CPU architecture is that to be price competitive you have to manufacture in huge quantities, or your product will be very expensive. This means that for there to be inexpensive Risc-V processors on the market, there have to be some large orders, and that’s why adoption by large companies like Western Digital is so important.

Another giant boost to the Risc-V world is a direct result of Trump’s trade war with China. With the US restricting trade in ARM and x86 technology to China, Chinese computer manufacturers are madly investing in Risc-V, since it is open source and trade restrictions can’t be applied. If a major Chinese cell phone manufacturer can no longer get access to the latest ARM chips, then switching to Risc-V becomes attractive. This is a big risk that Trump is taking, because if the rest of the world invests in Risc-V, it might greatly reduce Intel, AMD and ARM’s influence and leadership, having the opposite effect to what Trump wants.

The Software Chicken & Egg Problem

If you create a wonderful new CPU, no matter how good it is, you still need software. As a start you need operating systems, compilers and debuggers. Developing these can be as expensive as developing the CPU chip itself. This is where open source comes to the rescue. UC Berkeley, along with many other contributors, added Risc-V support to the GNU Compiler Collection (GCC) and worked with Debian Linux to produce a Risc-V version of Linux.

Another big help is the availability of open source emulator technology. You are very limited in your choices of actual Risc-V hardware right now, but you can easily set up an emulator to play with. If you’ve ever played with RetroPie, you know the open source world can emulate pretty much any computer ever made. There are several emulator environments available for Risc-V so you can get going on learning the architecture and writing software as the hardware slowly starts to emerge.

Risc-V Basics

The Risc-V architecture is modular. You start with a core simple arithmetic unit that can load/store registers, add, subtract, perform logical operations, compare and branch. There are 32 registers labeled x0 to x31, where x0 is a dedicated zero register, plus a program counter (PC). The hardware doesn’t assign any other functionality to the registers; the rest is software convention, such as which register is the stack pointer, which registers are used for passing function parameters, etc. Base instructions are 32-bits, but an extension module allows for 16-bit compressed instructions and extension modules can define longer instructions. The specification supports three different address sizes: 32-bit, 64-bit and 128-bit. This is quite forward thinking, as we don’t expect the largest, most powerful computers in the world to need more than 64-bits until 2030 or so.

Then you start adding modules like the multiply/divide module, atomic instruction module, various floating point modules, the compressed instruction module, and quite a few others. Some of these have their specifications frozen, others are still being worked on. The goal is to allow chip manufacturers to produce silicon that exactly meets their needs and keeps power utilization to a minimum.

Getting Started

Most of the current Risc-V hardware available to DIYers consists of small low power/low memory microcontrollers similar to Arduinos. I’m more interested in getting a Risc-V SBC similar to a Raspberry Pi or NVidia Jetson. As a result I don’t have a physical Risc-V computer to play with, but I can still learn about Risc-V and play with Risc-V Assembly Language programming in an emulator environment.

I’ll list the resources I found useful and the environment I’m using. Then in future blog articles, I’ll go into more detail.

  • The Risc-V Specifications. These are the documents on the ISA. I found them readable, and they give the rationale for the decisions taken along with the reasons for a number of roads they didn’t go down. The only thing missing is practical examples.
  • The Debian Risc-V Wiki Page. There is a lot of useful information here. A very big help was how to install the Risc-V cross compilation tools on any Debian release. I used these instructions to install the Risc-V GCC tools on my Ubuntu laptop.
  • TinyEMU, a Risc-V Emulator. There are several Risc-V emulators; this is the first one I tried and it’s worked fine for me so far.
  • RV8, a Risc-V Emulator. This emulator looks good, but I haven’t had time to try it out yet. They have a good Risc-V instruction set summary.
  • SiFive Hardware. SiFive have produced a number of limited run Risc-V microcontrollers. Their website has lots of useful information and their employees are major contributors to various Risc-V open source projects. They have started a Risc-V Assembly Programmers Guide.

Summary

The Risc-V architecture is very interesting. It is always nice to start with a clean slate and learn from all that has gone before it. If this ISA gains enough steam to achieve volumes where it can compete with ARM, it is going to allow very powerful low cost computers. I’m very hopeful that perhaps next year we’ll see a $25 Risc-V based Raspberry Pi 4B competitor with 4Gig RAM and an M.2 SSD slot.

Written by smist08

September 6, 2019 at 6:07 pm

Posted in Business


Low Cost Linux Notebooks


Introduction

Theoretically, a notebook running Linux should be inexpensive, since you don’t need a Windows license and Linux runs well without premium hardware. In reality, notebooks sold with Linux tend to be expensive, premium hardware. There are companies like Purism and System76 that produce Linux-only laptops, but these are high-end and expensive. Similarly, companies like Dell seem to charge extra if you want Linux. In this article we’ll look at some options for running Linux inexpensively, along with the tradeoffs, including privacy and security.

Used, Refurbished or Discounted Windows Notebooks

Windows Notebooks have the advantage of mass-production and competition. There are tons of companies producing Windows notebooks. You can find great deals on sale, plus there is a huge market of refurbished lease returns that offer great deals. Also, companies take returns from retailers like Amazon, make sure they are ok and then sell them at a big discount. You then need to install your favorite Linux distribution and then you are up and running. You can even set it up so you can dual boot either Linux or Windows.

If you are concerned about privacy and security, then the downside of Windows notebooks is that they run the UEFI BIOS. This BIOS has backdoors built in so the NSA, and probably other governments, can remotely take control of your computer.

All that being said, if a notebook runs Windows well, it will run Linux better. A great way to bring an old slow laptop or notebook back to life, is to wipe Windows and replace it with Linux. I’m writing this on an old HP laptop which became slower and slower running Windows 10. Now with Ubuntu Linux, it runs great. No more Windows bitrot and it has a whole new life.

Chromebooks

Even cheaper than Windows notebooks are Chromebooks. These are notebooks designed to run Google’s ChromeOS. They are cheaper because they don’t require a Windows license and they usually don’t include a hard drive; instead they have a small amount of built-in flash storage, usually 16Gig or 32Gig. Chrome OS is based on a Linux kernel, but restricts you in a few ways. You need to sign on using a Google ID, then you install apps (basically Android apps) via the Google Play store.

Earlier versions couldn’t run regular Linux apps; however, Google has been relaxing this and now allows you to install and run many Linux apps and run a terminal window. Over time Chrome OS has been slowly morphing into full Linux. From being just a portal to Google’s web apps to being a full client operating system. However, I find Chrome OS is still too limiting and there is the issue of having to sign on with Google.

Out of the box, you can’t just install Linux on a Chromebook. The BIOS is locked to only running Chrome OS. The BIOS in Chromebooks is based on Coreboot, the open source BIOS, which is good; however, Google modified it without providing the source code, so we don’t know if they added hooks for the NSA to spy on you. The Google BIOS does provide a developer mode, which gives you a root access terminal session and allows you to install and run flavours of Linux from inside Chrome OS using a set of shell scripts called crouton. Many people prefer this method as they get both Linux and Chrome OS at the same time.

Upgrade the BIOS

If you want to boot directly into an alternate OS, you usually need to upgrade the Chromebook’s BIOS to allow this. I bought an inexpensive refurbished Dell Chromebook 11 off Amazon for $100 (CAD). There are two ways to do this: one is reversible, the other isn’t and you run the risk of bricking your device. The Dell’s BIOS is divided into two parts; one is upgradable and can be reversed using a recovery USB stick. The other requires disassembling the notebook, removing a BIOS write protect tab and then reflashing the whole BIOS.

I went the reversible route. I made a recovery USB stick and upgraded the BIOS to support booting other operating systems. This isn’t perfect, as you are still using Google’s unknown BIOS and you have to hit control-L every time you boot to run your alternate operating system.

The reason people will risk replacing their whole BIOS is to get a pure version of Coreboot that hasn’t been tampered with by Google. You then have full control of your computer, no developer mode and no control-L to boot. Perhaps one day I’ll give this a try.

Once you have your BIOS updated, you can install Linux from a USB stick. I chose to install GalliumOS, which is tailored for Chromebooks. It installs a minimal Linux, since it knows Chromebooks don’t have much disk space. It also includes all the drivers needed for typical Chromebook trackpads, bluetooth and Wifi. The Gallium OS website has great information, with links to how to upgrade your BIOS and otherwise prepare and complete a successful upgrade.

Another choice is Lubuntu (Light Ubuntu), which is Ubuntu Linux optimized for low memory hardware. I didn’t like this distro as much, probably because it is so optimized for low memory, whereas I have 4GB of memory and it is disk space I’m short of (only 16GB). So I didn’t really need the low memory desktop, and would have preferred LibreOffice being left out.

A great source of info on updating Chromebook BIOSes is MrChromebox. It’s interesting because they also have lots of information on how to install a UEFI BIOS on a Chromebook, so you can use it as a cheap Windows notebook. You could install UEFI and then run Linux, but why would you want to, unless you want to be helpful to the NSA and other government spy agencies?

Impressions/Summary

Sadly, running Linux on a converted Windows notebook gives the better experience. At this point, despite the privacy concerns, the UEFI BIOS works better with Linux than Coreboot. On the Chromebook, besides the nuisance of having to hit control-L every time it boots, I found some things just didn’t work well. The main problem I had was with closing and opening the lid: Linux’s suspend function didn’t work properly. Often when I opened the lid, Linux didn’t resume and I’d have to do a hard power off/power on, which then resulted in a disk corruption scan. Otherwise bluetooth, wifi and the trackpad work fine.

I also think the small built-in flash storage is a problem. I think you’re better off booting from a regular SSD. These are inexpensive and give you way more space with better performance. I wish there was a cheap Chromebook with an M.2 interface, or even one where the storage isn’t glued to the motherboard and is in an accessible location.

I really want an inexpensive notebook with privacy and security. The best option right now is to convert a Chromebook over to full Coreboot and then run a privacy oriented version of Linux like PureOS, but at the moment this is quite a DIY project.

 

Written by smist08

August 9, 2019 at 6:46 pm

Posted in Business


Spectre Attacks on ARM Devices


Introduction

I predicted that 2018 would be a very bad year for data breaches and security problems, and we have already started the year with the Intel x86 specific Meltdown exploit and the Spectre exploit that works on all sorts of processors and even on some JavaScript systems (like Chrome). Since my last article was on Assembler programming, and most of these types of exploits are created in Assembler, I thought it might be fun to look at how Spectre works and get a feel for how hackers can retrieve useful data out of what seems like nowhere. Spectre is actually a large new family of exploits, so patching them all is going to take quite a bit of time and, like the older buffer overrun exploits, they are going to keep reappearing.

I’ve been writing quite a bit about the Raspberry Pi recently, so is the Raspberry Pi affected by Spectre? After all, it affects all Android and Apple devices based on ARM processors. The main Raspberry Pi operating system is Raspbian, which is a variant of Debian Linux optimized for the Pi. A recent criticism of Raspbian is that it is still 32-bit. It turns out that running the ARM in 32-bit mode eliminates a lot of the Spectre attack scenarios; we’ll discuss why in this article. If you are running 64-bit software on the Pi (like running Android) then you are susceptible. You are also susceptible to the software versions of this attack, like those in JavaScript interpreters that support branch prediction (like Chromium).

The Spectre hacks work by exploiting how processor branch prediction works, coupled with how data is cached. The exploits use branch prediction to speculatively access data they shouldn’t and then use the processor cache to retrieve that data. The original article by the security researchers is really quite good and worth a read. It’s available here. It has an appendix at the back with C code for Intel processors that is quite interesting.

Branch Prediction

In our last blog post we mentioned that all the ARM 32-bit data processing assembler instructions can be conditionally executed. This is because if you perform a branch instruction, the instruction pipeline needs to be cleared and restarted, which really stalls the processor. The ARM 32-bit solution was good as long as compilers were good at generating code that efficiently utilized these conditional instructions. Remember that most code for ARM processors is compiled using GCC, and GCC is a general purpose compiler that works on all sorts of processors; its optimizations tend to be general purpose rather than processor specific.

When ARM designed their 64-bit instructions, they wanted to keep the instructions 32-bits in length, but they also wanted to add a bunch of instructions (like integer divide), so they made the decision to eliminate the bits used for conditionally executing instructions and have a bigger opcode instead (and hence lots more instructions). I think they also considered that their conditional instructions weren’t being used as much as they should be and weren’t earning their keep. Plus they now had more transistors to play with, so they could do a couple of other things instead. One was to lengthen the instruction pipeline to be much longer than the previous three instructions, and the other was to implement branch prediction. Here the processor keeps a table of 128 branches and the route each took last time through. The processor then executes the most commonly chosen branch, assuming that once the conditional is figured out it will very rarely need to throw away the work and start over. Generally this longer pipeline with branch prediction leads to much better performance. So what could go wrong?

Consider the branch statement:

 

if (x < array1_size)
    y = array2[array1[x] * 256];


This looks like a good bit of C code: test that an index is in range before accessing the array. If it didn’t do this check then we could get a buffer overrun vulnerability by making x larger than the array size and accessing memory beyond the array. Hackers are very good at exploiting buffer overruns. But sadly (for hackers) programmers are getting better at putting these sorts of checks in (or having automated tools or higher level languages do it for them).

Now consider branch prediction. Suppose we execute this code hundreds of times with legitimate values of x. The processor will see that the conditional is usually true and the second line is usually executed. So branch prediction kicks in: when this code is executed the processor will start executing the second line right away and work out the first line in a second execution unit at the same time. But what if we enter a large value of x? Branch prediction will still speculatively execute the second line, and y will get a piece of memory it shouldn’t. But so what? Eventually the conditional in the first line is evaluated and that value of y is discarded; some processors will even zero it out (after all, they do security review these things). So how does that help the hacker? The trick turns out to be exploiting processor caching.

Processor Caching

No matter how fast memory companies claim their super fast DDR4 memory is, it really isn’t, at least compared to CPU registers. To get a bit of extra speed out of memory access, all CPUs implement some sort of memory cache where recently used parts of main memory are cached in the CPU for faster access. Often CPUs have multiple levels of cache: a super fast one, a fast one and a not quite as fast one. The trick to getting at the incorrectly calculated value of y above is to somehow figure out how to access it from the cache. No CPU has a read-from-cache assembler instruction; this would cause havoc and definitely be a security problem. This is really the CPU vulnerability: the incorrectly calculated buffer overrun value of y is in the cache. Hackers figured out how to infer this value, not by reading it directly, but by timing memory accesses. They could clear the cache (this is generally supported, and even if it isn’t you could read lots of other data to flush it), then time how long it takes to read various bytes. Basically, a byte in cache reads much faster than a byte from main memory, and this reveals what the value of y was. Very tricky.
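To make the timing side channel concrete, here is a minimal sketch of just the flush-and-time part, assuming an x86 compiler (GCC or Clang) that provides <x86intrin.h>. It does not perform the speculative out-of-bounds read; in the real attack the victim’s leaked byte determines which cache line gets touched, and working exploit code is in the appendix of the original paper:

#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>          /* _mm_clflush and __rdtscp */

#define LINES 16

static uint8_t probe[LINES * 64];       /* one byte per 64-byte cache line */

/* Time a single read in CPU cycles: short means cached, long means RAM. */
static uint64_t time_read(volatile uint8_t *addr)
{
    unsigned int aux;
    uint64_t start = __rdtscp(&aux);
    (void)*addr;
    uint64_t end = __rdtscp(&aux);
    return end - start;
}

int main(void)
{
    /* Step 1: flush every probe line out of the cache. */
    for (int i = 0; i < LINES; i++)
        _mm_clflush(&probe[i * 64]);

    /* Step 2: touch one line -- this stands in for the speculative access. */
    volatile uint8_t sink = probe[7 * 64];
    (void)sink;

    /* Step 3: time each line; the one touched above should read much faster. */
    for (int i = 0; i < LINES; i++)
        printf("line %2d: %llu cycles\n", i,
               (unsigned long long)time_read(&probe[i * 64]));

    return 0;
}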

Recap

So to recap, the Spectre exploit works by:

  1. Clear the cache
  2. Execute the target branch code repeatedly with correct values
  3. Execute the target with an incorrect value
  4. Loop through possible values timing the read access to find the one in cache

This can then be put in a loop to read large portions of a program’s private memory.

Summary

The Spectre attack is a very serious new technique for hackers to get at our data. Like buffer overruns, there won’t be one quick fix; people are going to be patching systems for a long time on this one. As more hackers understand this attack, there will be all sorts of creative offshoots that wreak further havoc.

Some of the remedies like turning off branch prediction or memory caching will cause huge performance problems. Generally the real fixes need to be in the CPUs. Beyond this, systems like JavaScript interpreters, or even systems like the .Net runtime or Java VMs could have this vulnerability in their optimization systems. These can be fixed in software, but now you require a huge number of systems to be patched and we know from experience that this will take a very long time with all sorts of bad things happening along the way.

The good news for Raspberry Pi Raspbian users is that the ARM in the older 32-bit mode isn’t susceptible. It is only susceptible through software uses like JavaScript. Also, as hackers develop these techniques going forward, perhaps they can find a combination that works on the Raspberry Pi, so you can never be complacent.

 

Written by smist08

January 5, 2018 at 10:42 pm