Stephen Smith's Blog

Musings on Machine Learning…

LinuxFest Northwest 2019


Introduction

2019 is the 50th anniversary of Unix and the 25th anniversary of Linux. Last weekend, I attended the 20th LinuxFest Northwest show in Bellingham, held at Bellingham Technical College. It was a great celebration with over 1200 attendees and 84 speakers. Most of the main Linux distributions were represented, along with many hardware, software and service companies associated with Linux.

I attended many great presentations and learned quite a lot. In this article, I’ll give a quick survey of what I got out of the conference. In each time slot there were typically ten talks to choose from, and I chose the one that interested me the most, tending towards the security and overview presentations.

Computers are Broken

The first presentation I went to was “Computers are Broken (and we’re all going to die)” by Bryan Lunduke. This presentation laid out the problems caused by the continued increase in the complexity of all software, and how this is slowing down development, since programming teams need to be much larger and understanding what is already there is so difficult. He gave his presentation running Windows for Workgroups 3.11 and PowerPoint 4. His point was that he can do everything he needs with these, but with way less RAM, disk space and processing power. He made many arguments about how software gets into everything and how hard it is to test; it is getting quite dangerous. Just look at Boeing’s problems with the 737 Max.

50 Years of Unix

Next I went to Maddog’s presentation on “50 Years of Unix, the Internet and more”. Maddog has been around Unix the whole time and had a lot of great stories from the history of Unix, Linux and computers. He spent most of his career at DEC, but has done many other things along the way.

Freedom, Security and Privacy

Then I went to Kyle Rankin’s talk, which started with a slide on Oxford commas and why there is only one comma in the title of his presentation. The Linux community has some very paranoid people, and maintaining security and privacy were major themes of the conference. One of the items most hated by the Linux community is the UEFI BIOS and how it gives corporations and governments backdoors into everyone’s computers. If you can, get a computer with a coreboot BIOS, which is open source and lacks these security problems. One claim is that security in Linux is better because there are so many eyes on it, but he made the point that unless they are the right eyes, you don’t really gain anything. Getting the best security researchers to test and analyse Linux remains a challenge. People also tend to be a bit complacent about where they get their software; even if it’s open source, they don’t build it themselves, leaving room for bad things to be inserted.

Early Technology and Ideas for the Future

Jeff Fitzmaurice gave a presentation that looked at examples from the history of science of how various theoretical breakthroughs led to technological developments, then speculated on which scientific developments happening now will lead to future technologies. We discussed AI, materials science and quantum computing, among others.

Ubuntu 19.04+

I went to Simon Quigley’s presentation on Ubuntu, mostly because I use Ubuntu, both on this laptop and on my NVidia Jetson Nano. This talk covered what is new in 19.04 (Disco Dingo) and how work is going towards 19.10 (note the version numbers are the year.month of the release target). I’ve been running the LTS (long term support) version, and I was surprised to find out they only do an LTS every two years, so when I got home I changed my configuration to install each new released version. It was interesting to hear how they need to get open source contributors to commit to the five-year support commitment of an LTS.
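
For anyone wanting to do the same, the change (as I understand Ubuntu’s update-manager configuration) is the Prompt setting in /etc/update-manager/release-upgrades:

# /etc/update-manager/release-upgrades
# Prompt=lts    only offers upgrades to the next LTS (every two years)
# Prompt=normal offers every six-month release
[DEFAULT]
Prompt=normal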

People who work on all the derivatives like Kubuntu and Lubuntu were also present. Most of the work they do actually goes into the upstream Debian release, which benefits even more people.

The Fight for a Secure Linux Bios

David Spring gave this presentation on all the evils of UEFI and why we need coreboot so badly. He had a lot of stories about the evils done by the NSA, including causing the Deepwater Horizon disaster. When the NSA released the second version of Stuxnet to attack the Iranian nuclear program, it got away from them. The oil industry uses a lot of the same Siemens equipment and got infected. Before the disaster, Deepwater Horizon’s monitoring computers were all down because of the NSA and Stuxnet; if not for that, the crew would have detected the problem and resolved it without the disaster. For all the propaganda about Chinese and Russian hacking, his claim was that the NSA employs 100 hackers for every single Chinese one. Their budget is huge.

Past, Present and Future of Blockchain

My friend Clive Boulton (from the GWT days) gave this presentation on the commercial uses of blockchain. It had nothing to do with cryptocurrencies; it was about using the algorithms to secure and enable commercial transactions without third party intermediaries. The presentation covered a number of frameworks, like Hyperledger and Openchain, that enable blockchain for application developers.

Zero Knowledge Architecture

M4dz’s presentation showed how to limit access to application data, for instance to stop insurance companies seeing your medical records. Zero knowledge protocols find ways to tell that someone has knowledge without obtaining that knowledge. For instance, if you want to know whether someone can access a room, you can watch them open the door; you don’t need a copy of the key. Similarly, you can watch a service use a password without being given the password. These protocols are quite difficult to implement, especially when you get into key recovery procedures, but ultimately if they gain traction we will all get better privacy.
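
None of this was shown in the talk, but the flavour of proving you know something without revealing it can be sketched with a toy Schnorr-style identification protocol. This uses tiny, wildly insecure numbers, purely for illustration:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Modular exponentiation: (base^exp) mod m */
static unsigned long modpow(unsigned long base, unsigned long exp, unsigned long m)
{
    unsigned long result = 1;
    base %= m;
    while (exp > 0)
    {
        if (exp & 1)
            result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

int main(void)
{
    /* Toy public parameters: a tiny prime p and base g (real systems
     * use enormous numbers; these are hopelessly insecure) */
    const unsigned long p = 467, g = 2;
    const unsigned long x = 153;             /* prover's secret key */
    const unsigned long y = modpow(g, x, p); /* public key: y = g^x mod p */

    srand((unsigned)time(NULL));

    /* Prover commits: picks a random r and sends t = g^r mod p */
    unsigned long r = rand() % (p - 1);
    unsigned long t = modpow(g, r, p);

    /* Verifier replies with a random challenge c */
    unsigned long c = rand() % (p - 1);

    /* Prover responds with s = r + c*x mod (p-1), never revealing x */
    unsigned long s = (r + c * x) % (p - 1);

    /* Verifier checks g^s == t * y^c mod p; only someone knowing x
     * could have produced a consistent s */
    unsigned long lhs = modpow(g, s, p);
    unsigned long rhs = (t * modpow(y, c, p)) % p;

    printf("Verifier accepts: %s\n", lhs == rhs ? "yes" : "no");
    return 0;
}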

Linux Gaming – the Dark Ages, Today and Beyond…

Ray Shimko’s presentation covered the state of Linux gaming, from the old console emulators, to native ports of games where the source code has been released, to better packaging of all the layers required to run Windows games (the right version of Wine, etc.). There are a lot of games on Linux now, but sadly the newest hot releases lag quite a while before showing up.

One interesting story is how the emulator contributors are trying to deal with games like “Duck Hunt”. Duck Hunt came with a gun you pointed at the TV to shoot the ducks. The way this worked was that when you pressed the trigger, the game would flash the screen white. On a CRT this meant the refresh would scan down the screen in 1/60th of a second. A sensor in the gun would record when it saw white, and by measuring the time difference, the software would know where the gun was pointing. The problem is that modern screens don’t refresh that way, so this whole aiming technique doesn’t work. Evidently a workaround is forthcoming.
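
Just to make the timing idea concrete, here is a rough calculation using approximate NTSC numbers (my assumptions, not from the talk):

#include <stdio.h>

/* Approximate NTSC timing: ~63.6 microseconds per scanline, of which
 * ~52.6 us is the visible part of the sweep. */
#define LINE_US    63.6
#define VISIBLE_US 52.6

/* Given the microseconds between the start of the white frame and the
 * gun's sensor seeing white, estimate where the gun was pointing. */
static void gun_position(double elapsed_us, int *row, double *col_fraction)
{
    *row = (int)(elapsed_us / LINE_US);
    double into_line = elapsed_us - (*row) * LINE_US;
    *col_fraction = into_line / VISIBLE_US; /* 0.0 = left edge, 1.0 = right edge */
}

int main(void)
{
    int row;
    double col;
    gun_position(7000.0, &row, &col); /* sensor fired 7ms into the frame */
    printf("Gun pointing near scanline %d, about %.0f%% across the screen\n",
           row, col * 100.0);
    return 0;
}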

Q&A

The conference ended with a Q&A session hosted by Maddog, Kyle Rankin and Simon Quigley. The audience could ask whatever they wanted, and perhaps got an answer, or perhaps got a story. There were lots of “why doesn’t Linux do X” and “how can I contribute to Y” questions.

Summary

Hard to believe Linux is 25 years old already. This is a great show, and in the spirit of free software, it is also free to attend. There was lots of interesting discussion, and it’s refreshing to see software developing in the directions users really want, rather than following various corporate agendas.

When you buy a new computer, make sure it uses a coreboot BIOS and not UEFI.

 


Written by smist08

April 30, 2019 at 7:01 pm

Posted in Life


Playing with Software Defined Radio


Introduction

Most ham radios these days receive signals through an antenna, convert the signal to digital, process the signal with a built-in computer, and then convert the result back to analog for the speaker. This trend of doing all the radio signal processing in software instead of with electronic components is called Software Defined Radio (SDR). The ICOM 7300 is built around SDR, as are all the expensive Flex radios.

Inexpensive SDR

Some clever hackers figured out that an inexpensive chip used in boards that feed TV into a computer (the Realtek RTL2832U) could actually be tuned across a much wider range of frequencies. From this discovery, many inexpensive USB dongles have been produced that use this “TV tuner” chip to tune radio instead of TV. This is possible because all the chip does is receive a signal from an antenna and convert it to digital for the computer to process. I purchased the RTL-SDR dongle for around $30, which included a small VHF/UHF antenna.

I run Linux, both on my laptop and on a Raspberry Pi. I looked around for software to use with this device and found several candidates. I chose CubicSDR because it installs easily from the Ubuntu App store, on both my laptop and my Raspberry Pi.

I tried it first on the Pi, but it just didn’t work well: it would keep hanging and the sound was never good. I then tried it on my laptop and it worked great. This led me to believe that the Raspberry Pi just doesn’t have the horsepower to run this sort of system, either due to lack of memory (only having 1Gig) or because the ARM processor isn’t quite powerful enough. Doing some reading online, the consensus seemed to be that you can’t run both the radio software and a GUI on the same Pi; you need to either have two Pi’s or use a command line version of the software. I was disappointed the Pi wasn’t up to the challenge, but got along just fine using my laptop.
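
For the command line route, the rtl-sdr package itself ships a simple demodulator, rtl_fm. If I remember the flags right, tuning a broadcast FM station and piping the audio to the ALSA player looks something like this:

# Wideband FM at 99.9 MHz, demodulated at 200k samples/sec, resampled
# to 48k and piped to aplay (the station frequency is just an example)
rtl_fm -f 99.9M -M wbfm -s 200000 -r 48000 | aplay -r 48000 -f S16_LE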

Enter the NVidia Jetson Nano

I recently acquired an NVidia Jetson Nano Developer Kit. This is similar to a Raspberry Pi, but with a more powerful quad-core ARM processor, 4Gig of RAM and 128 NVidia Maxwell GPU cores (it also costs $99 rather than $35).

I installed CubicSDR on this, and it worked like a charm right away. I was impressed; getting software for the Nano can sometimes be difficult since it runs true 64-bit Ubuntu Linux on ARM, so packages need to be built for that. But CubicSDR was in the App Store and installed with no problem. I fired it up and it recognized the RTL-SDR as well as the NVidia Tegra GPU cores, taking over ten of them for its signal processing, and worked really well.

Below is a screenshot of CubicSDR playing an FM radio station.

CubicSDR

CubicSDR is open source and free; it uses GNURadio under the covers (low level open source radio processing). CubicSDR has quite an impressive display. Like fancy high-end radios, you can see what is happening on the frequencies around where you are tuned. The interface can be a bit cryptic and you need to refer to the documentation to do some things. For instance, the volume doesn’t honor the system setting; you have to use the green slider in the upper right. Knowing what the various sliders do is quite helpful. Tuning frequencies is a bit tricky at first, but once you check the manual and play with it, it becomes easy. Using CubicSDR really is like using a high-end radio, at a fraction of the cost.

It is certainly helpful to know ham terminology and which radio protocol is used where. For instance, most VHF communications use narrow band FM, most longer wavelength ham communications are either upper or lower sideband, aeronautical traffic uses AM, and commercial FM stations use wide band FM.

Antennas

Although the RTL-SDR supports a very wide range of frequencies, you need the correct antenna for what you are doing. The ham bands that bounce off the ionosphere, allowing you to talk to people halfway around the world, use quite long wavelengths. The longer the wavelength, the larger the antenna you need to receive them. Don’t expect to receive anything from the 20 meter band without a good sized antenna. That doesn’t mean it has to be expensive; you can get good results using a dipole or end-fed antenna, both of which are just made out of wire, but you do have to string them up high and facing the right direction.
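
For a sense of scale, the usual ham rule of thumb is that a half-wave dipole is about 468/f feet, or roughly 143/f metres, end to end, with f in MHz. A tiny C calculation for the 20 meter band:

#include <stdio.h>

/* Rule of thumb: half-wave dipole length is about 143/f metres
 * (468/f feet) for a frequency f in MHz. */
static double dipole_metres(double freq_mhz)
{
    return 143.0 / freq_mhz;
}

int main(void)
{
    /* The 20 meter band sits around 14.2 MHz */
    printf("20m dipole: about %.1f metres end to end\n", dipole_metres(14.2));
    return 0;
}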

What About Transmitting?

The RTL-SDR only receives signals. If you want to transmit as well, you need a more expensive model. These sorts of SDR transmitters are very low power, so if you want to be heard, you will need a good linear amplifier rated for the frequencies you want to use. You will also need a better antenna.

If you transmit, you also require a ham radio license and call sign. You are responsible for not causing interference and for making sure your signal doesn’t bleed into adjacent channels. Since you are assembling all this yourself, an advanced license is required.

Summary

SDR is great fun to play with and there are lots of great projects you can create with this and an inexpensive single board computer. It’s too bad the Raspberry Pi isn’t quite up to the task. However, more powerful Pi competitors like the Jetson Nano run SDR just fine.

Written by smist08

April 16, 2019 at 2:08 am

Playing with CUDA on My NVIDIA Jetson Nano


Introduction

I reported last time about my new toy, an NVIDIA Jetson Nano Developer Kit. I’m pretty familiar with Linux and ARM processors; I even wrote a couple of articles on Assembler programming, here and here. The thing that intrigued me about the Jetson Nano is its 128 Maxwell GPU cores. What can I do with these? Sure, I can speed up TensorFlow since it uses them automatically, and I could probably do the same with OpenGL programs. But what can I do with them directly?

So I downloaded the CUDA C Programming Guide from NVIDIA’s website to have a look at what is involved.

Setup

The claim is that the microSD image of 64-bit Ubuntu Linux that NVIDIA provides for this computer has all the NVIDIA libraries and utilities you need pre-installed. The programming guide made it clear that you need to use the NVIDIA C compiler, nvcc, to compile your work. But when I typed nvcc at a command prompt, I just got an error that the command wasn’t found. A bit of Googling revealed that everything is installed, but the installation happened before your user account was created, so you need to add the locations to some paths. Adding:

export PATH=${PATH}:/usr/local/cuda/bin
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib64

to my .bashrc file got everything working. It also shows where CUDA is installed, which is handy since the install includes a large collection of samples.
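
With the paths set, a quick sanity check is to ask nvcc for its version and then build a sample. On my image the samples were under /usr/local/cuda/samples; adjust the path if yours differ:

nvcc --version

# Build and run the deviceQuery sample (sudo because the samples sit in
# a system directory; alternatively copy them to your home directory first)
cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery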

Compiling the deviceQuery sample produced the following output on my Nano:

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA Tegra X1"
  CUDA Driver Version / Runtime Version          10.0 / 10.0
  CUDA Capability Major/Minor version number:    5.3
  Total amount of global memory:                 3957 MBytes (4148756480 bytes)
  ( 1) Multiprocessors, (128) CUDA Cores/MP:     128 CUDA Cores
  GPU Max Clock rate:                            922 MHz (0.92 GHz)
  Memory Clock rate:                             13 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 262144 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 10.0, NumDevs = 1

Result = PASS

This is all good information, and what all this data means is explained in NVIDIA’s developer documentation (which is actually pretty good). The deviceQuery sample exercises various information APIs in the CUDA library to tell you all it can about what you are running. If you can compile and run deviceQuery from the samples/1_Utilities folder, then you should be good to go.

CUDA Hello World

The 128 NVidia Maxwell cores basically form a SIMD (Single Instruction Multiple Data) computer. This means you have one instruction that they all execute, but on different data. For instance, if you want to add two arrays of 128 floating point numbers, you have one instruction, add, and each core adds a different element of the array. NVidia actually calls their architecture SIMT (Single Instruction Multiple Threads), since you can partition the cores among different threads and have each thread, with its own collection of cores, doing its SIMD computation at the same time.

When you write a CUDA program, you have two parts: one that runs on the host CPU and one that runs on the NVidia GPUs. The NVidia C compiler, nvcc, adds a number of extensions to the C language to specify what runs where and to provide more convenient syntax for the common things you need to do. For the host parts, nvcc translates its custom syntax into CUDA library calls and then passes the result on to GCC to compile normally. For the GPU parts, nvcc compiles to an intermediate format called PTX. The reason it does this is to support all the various NVidia GPU models. When the NVidia device driver loads this code, it does a just-in-time compile (which it then caches), where the PTX code is compiled to the correct binary code for your particular set of GPUs.
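
You don’t normally see the PTX, but nvcc can emit it directly if you’re curious. A quick sketch (vectorAdd.cu standing in for whatever source file you’re building):

# Normal build: nvcc sends host code to GCC and compiles device code
# to PTX/binary behind the scenes (add -I flags if your source needs
# the sample helper headers)
nvcc vectorAdd.cu -o vectorAdd

# Emit just the PTX intermediate code so you can read it
nvcc -ptx vectorAdd.cu -o vectorAdd.ptx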

Here is the skeleton of a simple CUDA program:

// Kernel definition
__global__ void VecAdd(float* A, float* B, float* C)
{
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
}

int main()
{
    ...
    // Kernel invocation with N threads
    VecAdd<<<1, N>>>(A, B, C);
    ...
}

 

The __global__ identifier specifies that the VecAdd routine runs on the GPU. One instance of this routine will be downloaded to run on N cores. Notice there is no loop to add these vectors; each core runs a different thread, and the thread’s threadIdx.x member selects which array element to add.

Then in the main program we call VecAdd using the VecAdd<<<1, N>>> syntax, which indicates we are launching this routine on the GPU with N threads, passing it the three arrays.

This little example skips the extra steps of copying the arrays into GPU memory and copying the result out of GPU memory. There are quite a few different memory types, with various trade-offs for using them.

The complete program for adding two vectors from the samples is at the end of this article.

This example also doesn’t show how to handle larger arrays or how to do error processing. For these extra levels of complexity, refer to the CUDA C Programming Guide.

The CUDA program here is very short, just doing an addition. If you wanted to, say, multiply two 10×10 matrices, you would have your CUDA code compute the dot product of a row of the first matrix with a column of the second. Then you would have 100 cores execute this code, so the multiplication could run up to 100 times faster than using the host processor alone. There are a lot of examples of matrix multiplication in the samples and documentation.
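
The samples include carefully tuned versions, but a rough sketch of the naive kernel (my code, not NVIDIA’s) uses one thread per output element, with N as the matrix dimension:

// Naive matrix multiply: each thread computes one element of C as the
// dot product of a row of A and a column of B (all N x N, row major).
__global__ void MatMul(const float *A, const float *B, float *C, int N)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;

    if (row < N && col < N)
    {
        float sum = 0.0f;
        for (int k = 0; k < N; ++k)
            sum += A[row * N + k] * B[k * N + col];
        C[row * N + col] = sum;
    }
}

// For 10x10 matrices, one 10x10 block of threads covers the whole result:
//     dim3 threads(10, 10);
//     MatMul<<<1, threads>>>(d_A, d_B, d_C, 10);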

Newer CUDA Technologies

The Maxwell GPUs in the Jetson Nano are a bit old, and reading and playing with the CUDA libraries revealed a few interesting tidbits on things they are missing. We all know how NVidia has been enhancing their products for gaming and graphics with the introduction of things like real-time ray tracing, but of more interest to me is how they’ve been adding features specific to machine learning and AI. Even though Google produces their own hardware for accelerating TensorFlow in their data centers, NVidia has added specific features that greatly help TensorFlow and other neural network programs.

One thing the Maxwell GPU lacks is direct matrix multiplication support; newer GPUs (starting with the Tensor Cores introduced in Volta) can do A * B + C as a single instruction, where A, B and C are all small matrices.

Another thing NVidia just added is direct support for executing computation graphs. If you worked with the early versions of TensorFlow, then you know that you construct your model by building a computational graph and then training and executing it. The newest NVidia GPUs can now execute these graphs directly. NVidia has a TensorRT library to move parts of TensorFlow to the GPU; this library does work with the Maxwell GPUs in the Jetson Nano, but is probably far more efficient on the newest, bright and shiny GPUs. Even just using TensorFlow without TensorRT is a great improvement and moves the matrix calculations to the GPUs, even on the Nano; it just means the libraries have more work to do.

Summary

The GPU cores in a product like the Jetson Nano can be easily utilized using products that support them like TensorFlow or OpenGL, but it’s fun to explore the lower level programming models to see how things are working under the covers. If you are interested in parallel programming on a SIMD type machine, then this is a good way to go.

 

/**
 * Copyright 1993-2015 NVIDIA Corporation.  All rights reserved.
 *
 * Please refer to the NVIDIA end user license agreement (EULA) associated
 * with this source code for terms and conditions that govern your use of
 * this software. Any use, reproduction, disclosure, or distribution of
 * this software and related documentation outside the terms of the EULA
 * is strictly prohibited.
 *
 */

/**
 * Vector addition: C = A + B.
 *
 * This sample is a very basic sample that implements element by element
 * vector addition. It is the same as the sample illustrating Chapter 2
 * of the programming guide with some additions like error checking.
 */

#include <stdio.h>

// For the CUDA runtime routines (prefixed with "cuda_")
#include <cuda_runtime.h>

#include <helper_cuda.h>

/**
 * CUDA Kernel Device code
 *
 * Computes the vector addition of A and B into C. The 3 vectors have the same
 * number of elements numElements.
 */

__global__ void
vectorAdd(const float *A, const float *B, float *C, int numElements)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;

    if (i < numElements)
    {
        C[i] = A[i] + B[i];
    }
}

/**
 * Host main routine
 */

int
main(void)
{
    // Error code to check return values for CUDA calls
    cudaError_t err = cudaSuccess;

    // Print the vector length to be used, and compute its size
    int numElements = 50000;
    size_t size = numElements * sizeof(float);
    printf("[Vector addition of %d elements]\n", numElements);

    // Allocate the host input vector A
    float *h_A = (float *)malloc(size);

    // Allocate the host input vector B
    float *h_B = (float *)malloc(size);

    // Allocate the host output vector C
    float *h_C = (float *)malloc(size);

    // Verify that allocations succeeded
    if (h_A == NULL || h_B == NULL || h_C == NULL)
    {
        fprintf(stderr, "Failed to allocate host vectors!\n");
        exit(EXIT_FAILURE);
    }

    // Initialize the host input vectors
    for (int i = 0; i < numElements; ++i)
    {
        h_A[i] = rand()/(float)RAND_MAX;
        h_B[i] = rand()/(float)RAND_MAX;
    }

    // Allocate the device input vector A
    float *d_A = NULL;
    err = cudaMalloc((void **)&d_A, size);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to allocate device vector A (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Allocate the device input vector B
    float *d_B = NULL;
    err = cudaMalloc((void **)&d_B, size);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to allocate device vector B (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Allocate the device output vector C
    float *d_C = NULL;
    err = cudaMalloc((void **)&d_C, size);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to allocate device vector C (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Copy the host input vectors A and B in host memory to the device input vectors in
    // device memory
    printf("Copy input data from the host memory to the CUDA device\n");
    err = cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to copy vector A from host to device (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    err = cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to copy vector B from host to device (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Launch the Vector Add CUDA Kernel
    int threadsPerBlock = 256;
    int blocksPerGrid =(numElements + threadsPerBlock - 1) / threadsPerBlock;
    printf("CUDA kernel launch with %d blocks of %d threads\n", blocksPerGrid, threadsPerBlock);
    vectorAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, numElements);
    err = cudaGetLastError();

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to launch vectorAdd kernel (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Copy the device result vector in device memory to the host result vector
    // in host memory.
    printf("Copy output data from the CUDA device to the host memory\n");
    err = cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to copy vector C from device to host (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Verify that the result vector is correct
    for (int i = 0; i < numElements; ++i)
    {
        if (fabs(h_A[i] + h_B[i] - h_C[i]) > 1e-5)
        {
            fprintf(stderr, "Result verification failed at element %d!\n", i);
            exit(EXIT_FAILURE);
        }
    }

    printf("Test PASSED\n");

    // Free device global memory
    err = cudaFree(d_A);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to free device vector A (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    err = cudaFree(d_B);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to free device vector B (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    err = cudaFree(d_C);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to free device vector C (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Free host memory
    free(h_A);
    free(h_B);
    free(h_C);

    printf("Done\n");
    return 0;
}

Written by smist08

April 3, 2019 at 6:01 pm

Can NVidia Bake a Better Pi Than Raspberry?


Introduction

I love my Raspberry Pi, but I find its limited 1Gig of RAM can be quite restricting. It is still pretty amazing what you can do with these $35 computers. I was disappointed when the Raspberry Pi Foundation announced that the Raspberry Pi 4 is still over a year away, so I started to look at Raspberry Pi alternatives. I wanted something with 4Gig of RAM and a faster ARM processor. I was considering purchasing an Odroid N2 when I saw the press release from NVidia’s Developer Conference that they had just released the NVidia Jetson Nano Developer Kit. This board has a faster quad-core ARM A57 processor, 4Gig of RAM, plus the bonus of a 128-core Maxwell GPU. The claim is that this is an ideal DIY computer for those interested in AI and machine learning (i.e. me). It showed up for sale on arrow.com, so I bought one and received it via FedEx in two days.

Setup

If you already have a Raspberry Pi, setup is easy, since you can unplug things from the Pi and plug them into the Nano, namely the power supply, keyboard, monitor and mouse. Like the Pi, the Nano runs from a microSD card, so I reformatted one of my Pi cards with a download of the variant of Ubuntu Linux that NVidia provides for these. Once the operating system was burned to the microSD card, I plugged it into the Nano and away I went.

One difference from the Pi is that the Nano does not have built-in Wifi or Bluetooth. Fortunately the room I’m setting this up in has a wired Internet port, so I went into the garage, found a long Ethernet cable in my box of random cables, plugged it in, and was connected to the Internet. You can plug in a USB Wifi dongle if you need Wifi, or there is an M.2 E slot (which is hard to access) for an M.2 Wifi card. Just be careful about compatibility, since the drivers need to be compiled for ARM64 Linux.

The board doesn’t come with a case, but the box folds into a stand to hold the board, and for now that is how I’m running it. If they sell enough of these, I’m sure cases will appear, but you will need to ensure there is enough ventilation for the huge heat sink.

Initial Impressions

The Jetson Nano certainly feels faster than the Raspberry Pi. This is helped by the faster ARM processor, the quadrupled memory, using the GPU cores for graphics acceleration, and the fact that this version of Linux is 64-bit (unlike Raspbian, which is 32-bit). It ran the pre-installed Chromium browser quite well.

As I installed more software, I found that writing large amounts of data to the microSD card can be a real bottleneck, and I would often have to wait for it to catch up. This is more pronounced than on the Pi, probably because on the Pi everything else is slow as well. It would be nice if there were an M.2 M interface for an NVMe SSD drive, but there isn’t. I ordered a faster microSD card (over three times faster than the one I have) and hope that helps. I can also try putting some things on a USB SSD, but again this isn’t the fastest.

I tried running the TensorFlow MNIST tutorial program. The version of TensorFlow for this board is 1.11; if I want to try TensorFlow 2.0, I’ll have to compile it myself for ARM64, which I haven’t attempted yet. Anyway, TensorFlow automatically used the GPU and executed the tutorial orders of magnitude faster than the Pi (a few minutes versus several hours). So I was impressed with that.

This turned up another gotcha: the GPU cores and CPU share the same memory, so when TensorFlow used the GPU, that took a lot of memory away from the CPU. I was running the tutorial in a Jupyter notebook running locally, which meant I was running a web server, Chromium, Python, and then TensorFlow, with bits on the CPU and GPU. This tended to use up all the memory, and then things would grind to a halt until garbage collection sorted things out. Running from scratch was fine, but running iteratively felt like it kept hitting a wall. I think the lesson here is that to do machine learning training on this board, I really have to use a lighter Python environment than Jupyter.

The documentation mentions a utility to control the processor speeds of the ARM cores and GPU cores, so you can tune the heat produced. I think this is more for when you embed the board inside something, but beware: this sucker can run hot if you keep all the various processors busy.
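
From the documentation, I believe the utility is nvpmodel, with a companion jetson_clocks script; roughly like this, though check the docs for your image:

# Query the current power mode (nvpmodel usage as I understand it)
sudo nvpmodel -q

# Switch to the low power 5W profile (mode 1 on the Nano; mode 0 is the 10W mode)
sudo nvpmodel -m 1

# Pin the clocks at maximum for the current mode
sudo jetson_clocks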

How is it so Cheap?

The NVidia Jetson Nano costs $99 USD. The Odroid is $79, so the Nano is fairly competitive with the other boards trying to be super-Pis. However, it is cheaper than pretty much any NVidia graphics card, and even than their Jetson Nano compute module (which has no ports and costs $129 in quantities of 1000).

The obvious cost savings are no Wifi and no Bluetooth. Another is the lack of a SATA or M.2 M interface. It does have a camera interface, a serial interface and a Pi-like GPIO block.

The Nano has 128 Maxwell GPU cores. That sounds impressive, but remember most graphics cards have 700 to 4000 cores. Further, Maxwell is the oldest supported GPU architecture (compute capability 5), whereas the newest is the compute capability 7 Volta core.

I think NVidia is keeping the cost low to get the DIY crowd using their technologies; they’ve seen the success of the Raspberry Pi community and want to duplicate it for their various processor boards. I also think they want to be in the ARM board game, so as better ARM processors come out, they might hope to supplant Intel in producing motherboards for desktop and laptop computers.

Summary

If the Raspberry Pi 4 team can produce something like this for $35, they will have a real winner. I’m enjoying playing with the board and learning what it can do. So far I’ve been pretty impressed. There are some limitations, but given the $100 price tag, I don’t think you can lose. You can play with parallel processing on the GPU cores, interface to robots with the GPIO pins, or play with object recognition via the camera interface.

For a DIY board, there are a lot of projects you can take on.

 

Ghidra


Introduction

In my novel “Influence”, the lead character J@ck Tr@de searches for an easter egg in a server operating system. To do this he uses a disassembler, which converts machine code back into source code. Normally you write computer programs in a programming language, which a compiler (another program) reads through and converts to the bits and bytes that computers actually execute. In my novel, the disassembler uses AI to do an especially good job. I don’t know of any disassembler that uses AI yet, but a really powerful new disassembler has just been released by the NSA as open source. So I thought I’d spend a bit of time blogging about it; since it’s open source, perhaps someone will add some AI to it, so it becomes as powerful as the tool J@ck uses.

Most disassemblers either aren’t very good or are quite expensive. This just changed when the NSA released their internally developed tool Ghidra as open source. I don’t know where the name Ghidra comes from, but the icon for the program is a dragon. I downloaded it and gave it a run. It was easy to install, ran well and looks really powerful. Why did the NSA do this? Don’t they usually guard their internal tools with their lives? They claim it’s to help security researchers at universities and elsewhere do a better job discovering vulnerabilities in software, making us all safer as a result. I wonder if it’s the NSA trying to get some good publicity, since they are generally distrusted; most Americans got upset when it was revealed that the NSA could access any photo on any cell phone, including dick-pics. This really upset a lot of people, probably the only good thing a dick-pic has ever done.

For anyone interested, my novel, Influence, is available either as a paperback or as a Kindle download on Amazon.com:

Paperback – https://www.amazon.com/dp/1730927661
Kindle – https://www.amazon.com/dp/B07L477CF6

Installation

Ghidra is a Java program, so you need to have the Java 11 JDK installed first. I’m doing this on Ubuntu and didn’t already have Java installed. The Java that apt-get installs by default is Java 10, so that didn’t work. Installing Java 11 took a bit of Googling, but the following commands worked:

sudo add-apt-repository ppa:linuxuprising/java
sudo apt-get update
sudo apt-get install oracle-java11-installer

This adds an additional repository and then installs Java 11 from it. Then download Ghidra, uncompress it somewhere and run the shell script to start it.

Running

To play around with it, I created a new project and imported the executable file “head” from /usr/bin. This gave me some basic information on the executable.

It then took a second to analyse the file, and then I could launch the code browser and look through a split screen with the assembler code on the left and the generated C code on the right.

I can view a function call graph of the current function (the functions that call it and the functions that it calls).

I can view a function graph of the entire program that I can zoom in and out and browse around in.

I can annotate the program and add any insights I have. I can patch the program. All very powerful. Ghidra has full scripting support; the built-in scripting language is Java (after all, it is a Java program), but the API has support for adding other scripting languages. There is a plug-in architecture so you can write extensions. It supports many executable formats and knows about many processor instruction sets.

Trust the NSA?

After I downloaded Ghidra, I watched a couple of YouTube videos on it. One of the presenters ran Wireshark so he could see if Ghidra was making any network connections back to the NSA. After all, could this be a trojan horse that the NSA will use to find out what hackers are up to? At least this presenter didn’t see any network calls while he was running it, but to a real hacker this could be a major concern. As of this writing, all the Java code has been open sourced, but some of the extra addons written in other languages still need to be posted, so right now you can’t build Ghidra exactly as it’s distributed; the NSA says this should be remedied in a few weeks.

Summary

Although perhaps not as powerful as what J@ck was using, this is a really powerful tool to reverse engineer programs and even operating systems. The generated source code isn’t great, but it’s a big help compared to raw assembler. I think the expensive commercial disassembler vendors must be pretty upset, as I really don’t see much reason for them to exist now. I think this will be a big boon to black and white hat hackers, as well as to anyone who needs to reverse engineer some code (perhaps to figure out an undocumented API). Happy Hacking.

Written by smist08

March 6, 2019 at 9:19 pm

Social Media Bots


Introduction

In my novel “Influence”, the lead character J@ck Tr@de spends a lot of time creating and improving Social Media Bots. I thought in this article I’d spend a bit of time providing some background on these. Social Media Bots weren’t made up by me; they’ve been around for a while. It is estimated that 15-20% of all social media accounts are really Bots, and that 15-20% of all posts on social media sites like Twitter and Facebook are created by these Bots.

For anyone interested, my book is available either as a paperback or as a Kindle download on Amazon.com:

Paperback – https://www.amazon.com/dp/1730927661
Kindle – https://www.amazon.com/dp/B07L477CF6

What is a Bot?

Bot is short for robot, and here really means a computer program that is pretending to be a real human being. Early Bots were easy to identify and rather simple, but over time they’ve become more and more sophisticated, even to the extent of being credited with getting Donald Trump elected as president of the USA.

The original Bots were spambots, programs that just send out spam emails. Basically, hackers would take over people’s computers and install a program on them (the spambot) which would then send out spam to all the contacts in the poor victim’s email. The programmers behind them found these quite effective and took the same idea to social media.

Most of these Bots are quite simple and just work to advocate some idea by posting from a collection of human-created messages. They can be trying to influence political views, direct people to dubious websites, or perhaps just make people mad for the fun of it.

There is an interesting website, Botometer, that will analyse a Twitter account and score how likely it is to be a Bot. I ran it on all my Twitter followers, and quite a few got a score indicating they were Bots.

Bots Get More Sophisticated

Like any computer programs, Bots keep coming out in new versions, getting more and more sophisticated. They now create quite realistic Internet personas with photos and a history. If you look at such a Bot’s Facebook page, you might be hard-pressed to tell that it doesn’t belong to a real person. Creating social media accounts is pretty easy, with very little verification: you just need a valid email account and the ability to respond to the confirmation email that is sent, plus perhaps fool a simple captcha tool.

Another newer kind of Bot is the so-called ChatBot. These are programs that can carry on a conversation. They can use modern, sophisticated machine learning algorithms to converse on a topic, like providing movie reviews. Many companies are trying to deploy ChatBots to automate their customer service, and can purchase ChatBot kits that they customize for their own customer service needs. Often companies use ChatBots to handle their social media accounts: a major company can’t answer all the Tweets and Facebook posts it receives, so they automate this with a ChatBot. Sometimes this is effective; sometimes it just pisses people off. The feeling is that getting some sort of answer is better than no answer.

The developers that create Social Media Bots took this same technology and incorporated it into their Bots. Now these Bots don’t just post canned messages, but can also carry on limited conversations on their topics. Political campaigns often employ these to give the impression they have far more support than they really do. If you post a comment on a news article on Facebook, you often get responses almost right away, and often most of these responses are actually from Social Media Bots using ChatBot technology. The Russians really spearheaded this in the American 2016 election campaign.

As machine learning and AI technology gets more and more powerful, these Social Media Bots get harder and harder to distinguish from real people, especially given the low quality of posts from actual real people. When a corporation uses a ChatBot for technical support, it will identify itself as such and often has an option to switch to a real person (perhaps with quite a long wait time), but when you are on social media, how do you really know who you are talking to?

In my book, Influence, the main character, J@ck, programs his Bots to both network and modify their own code. As it is, Bots behave like viruses and spread maliciously from computer to computer. The current Bots tend to rely on volume to do their damage, but as in Influence, perhaps the Bots will start to coordinate their actions and work together to accomplish their goals. Given the number of computing devices connected to the Internet, a successfully spread Bot could harness tremendous computing power to spread its influence. Applying new algorithms for reinforcement and adaptive learning, the programs can get more and more effective out in the wild without requiring additional coding from their creators. Is it really that far-fetched that such a network of Bots could become aware or intelligent in some sort of sense?

Summary

Twenty percent of users and twenty percent of posts on social media are from automated Bots and not created by real people. Should you believe what you see on Facebook? Should you be influenced by all the tweets you see going by on Twitter? Are your thought processes critical enough to filter out all the automated noise being targeted at you? Are your consumer decisions on what you buy, or your political decisions on how you vote, being controlled by all these Bots? This is definitely something people should be aware of: don’t just believe it all.

Written by smist08

February 5, 2019 at 9:38 pm

The Technology of “Influence” – Part 5 VHF Radio Modems


Introduction

In my novel “Influence”, the lead character J@ck Tr@de performs various hacking tasks. In the book he spends a lot of time securing his connections, hiding his identity and hiding his location. In this series of blog posts, I’m talking about the various technologies mentioned in the book, like VPNs, the Onion Browser, Kali Linux and using VHF radios. I’ve covered HTTPS, VPNs, the Onion Browser and Kali Linux so far; now we’re going to discuss VHF radio modems.

Very High Frequency (VHF) is a radio band used by both commercial and amateur radio operators (on different frequencies). If you see people using small handheld radios, chances are they are using VHF. This frequency band works line of sight and doesn’t require a very large antenna to work quite well. As with any radio frequency, you can transmit and receive digital data over the air, just like a cell phone does. You can buy fairly inexpensive VHF radio modems which can be used to connect a computer to the Internet via a VHF radio.

In this article we’ll look at these in a bit more detail and discuss why J@ck finds these useful.

For anyone interested, my book is available either as a paperback or as a Kindle download on Amazon.com:

Paperback – https://www.amazon.com/dp/1730927661
Kindle – https://www.amazon.com/dp/B07L477CF6

Why Does J@ck Use These?

In the previous articles we had J@ck accessing the Internet from a coffee shop’s Wifi using HTTPS, a VPN and the Onion Browser. With all this security, why doesn’t J@ck feel secure? As mentioned before, you want to think of security as an onion, where the more layers you have protecting you, the more secure you can feel. However, good hackers always feel paranoid and worry about being traced. In this case J@ck is worried: what if the NSA, FBI or some other agency can track his Internet usage back to the coffee shop’s Wifi?

J@ck doesn’t know if anyone can do this, or if anyone is actually looking for him. By having a homeless person plant a Raspberry Pi with a VHF radio outside the coffee shop, which J@ck then accesses via a VHF radio modem attached to his laptop, J@ck can be up to 2km away from the coffee shop, as long as he has line of sight.

This way, if the people in the black SUVs show up, J@ck can see them, be warned, and escape. Most importantly, he will then know someone is looking for him.

The downside for J@ck is that each layer of the security onion adds overhead and latency that slows down his Internet access. With all this security in place, J@ck can only access the Internet very slowly.

Strictly speaking, to use these frequencies you should have either a ham or commercial radio license. But if you follow the license rules, you need to identify yourself every 30 minutes, and J@ck is certainly not going to do that. In the scheme of things, J@ck considers the penalties for illegally operating a radio the least of his problems. There are radio modems for UHF and 900MHz as well; J@ck could use these too, as long as the radio is cheap enough to be disposable.

Can the NSA Catch J@ck?

If the NSA can trace J@ck’s Internet traffic back to the coffee shop, perhaps via a compromised Tor exit node and a compromised VPN, then what can they do?

If the NSA suspects J@ck is using a VHF modem, then rather than sending a SWAT team into the coffee shop, they could quietly move three vehicles with radio direction finding equipment into the area and triangulate J@ck’s true location from the emissions of the VHF radio attached to his laptop.

J@ck’s hope is that they wouldn’t do this the first time. If the G-men do show up at the coffee shop, he assumes they would either find his Raspberry Pi and radio modem, or guess what he was doing, and then use the radio vans the second time.

J@ck also limits his time at each coffee shop, so that the Feds have less time to set this all up and trap him.

Summary

Catching hackers is a game of cat and mouse. Since J@ck is the mouse, he wants to be as elusive as possible. VHF modems are just another tool to make it harder to trace back to J@ck’s location and catch him.

Written by smist08

January 24, 2019 at 9:12 pm