Stephen Smith's Blog

Musings on Machine Learning…

Posts Tagged ‘Python’

Playing with my Raspberry Pi

Introduction

I do most of my work (like writing this blog posting) on my MacBook Air laptop. I used to have a good desktop computer for longer running processes and for playing games. Last year the desktop packed it in (it was getting old anyway), so since then I’ve just been using my laptop. I wondered if I should get another desktop and run Ubuntu on it, since that is good for machine learning, but I wasn’t sure it was worth the price. Meanwhile I was intrigued by everything I see people doing with Raspberry Pis. So I figured why not just get a Raspberry Pi and see if I can do the same things with it as I did with my desktop. Plus I thought it would be fun to learn about the Pi and that it would be a good toy to play with.

Setup

Since I’m new to the Raspberry Pi, I figured the best way to get started was to order one of the starter kits. This way I’d be able to get up and running quicker and get everything I needed in one shot. I had a credit with Amazon, so I ordered one of the CanaKit starter kits from there. It included the Raspberry Pi 3, a microSD card with Raspbian Linux, a case, a power supply, an electronics breadboard, some LEDs and resistors, heat sinks and an HDMI cable. I then needed to supply a monitor, a USB keyboard and a USB mouse (which I had lying around).

Setting up was quite easy, though the quick setup instructions were missing a few steps, like what to do with the heat sinks (which was obvious) or how to connect the breadboard. Setup really just meant installing the Raspberry Pi board in the case, adding the heat sinks, inserting the microSD card and then connecting the various cables.

As soon as I powered it on, it displayed an operating system selection and installation menu (with only one choice), so I clicked install and 10 minutes later I was logged in and running Raspbian.

The quick setup guide then recommends you set your locale and change the default password, but it doesn’t tell you the existing password, which a quick Google reveals as “raspberry”. Then I connected to our Wi-Fi network and I was up and running. I could browse the Internet using Chromium, run Mathematica (a free Raspberry Pi version comes pre-installed) and run a Linux terminal session. All rather painless and fairly straightforward.

I was quite impressed by how quickly it went, how easy the installation and setup process was, and how powerful a computer I had up and running for less than $100 (for everything).

Software

I was extremely pleased with how much software the Raspberry Pi came with pre-installed. This was all on the provided 32 GB card, which still has 28 GB free even with a few extra things installed. Amazingly compact. Some of the pre-installed software includes:

  • Mathematica. Great for math students and to promote Mathematica. It is built on the Wolfram Language, which is interesting in itself.
  • Python 2 and 3 (more on the pain of having Python 2 later).
  • LibreOffice. A full MS Office like suite of programs.
  • Lots of accessories like file manager, calculator, image viewer, etc.
  • Chromium web browser.
  • Two Java IDEs.
  • Sonic Pi music synthesizer.
  • Terminal command prompt.
  • Minecraft and some Python games.
  • Scratch programming environment.

Plus there is an add/remove software program where you can easily add many more open source Pi programs. You can also use the Linux apt-get command to get many other pre-compiled packages.

Generally I would say this is a very complete set of software for any student, hobbyist or even office worker.

Python

I use Python as my main go-to programming language these days and I generally use a number of scientific and machine learning libraries, so I tried installing these. Usually I just use pip3 and away things go (at least on my Mac). However, on the Pi this caused pip3 to download the C++/Fortran source code and try to compile it, which failed. I then Googled around for how best to install these packages.

Unfortunately most of the Google results were how to do this for Python 2, which I didn’t want. It will be so nice when Python 2 finally is discontinued and stops confusing everything. I wanted these for Python 3. Before you start you should update apt-get’s list of available software and upgrade all the packages on your machine. You can do this with:

sudo apt-get update        # Fetches the list of available updates
sudo apt-get upgrade       # Strictly upgrades the current packages

What I found is that I could get most of what I wanted using apt-get:

sudo apt-get install python3-numpy
sudo apt-get install python3-scipy
sudo apt-get install python3-matplotlib
sudo apt-get install python3-pandas

However, I couldn’t find an apt-get package for scikit-learn, the machine learning library. So I tried pip3 and it did work, even though it downloaded the source code and compiled it.

pip3 install --upgrade scikit-learn
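
A quick way to confirm what actually got installed is to ask Python 3 itself. This is only a sketch: the list holds the import names I would expect for the packages above (scikit-learn is imported as sklearn), and installed_modules is just a helper name made up for this example.

```python
import importlib.util

# Import names for the packages installed above (scikit-learn imports as "sklearn").
MODULES = ["numpy", "scipy", "matplotlib", "pandas", "sklearn"]


def installed_modules(names):
    """Return a dict mapping each module name to True if Python 3 can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in installed_modules(MODULES).items():
    print(name, "OK" if found else "MISSING")
```

Run it with python3 rather than python, otherwise you are checking the Python 2 installation instead.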

Now I had all the scientific programming power of the standard Python libraries. Note that since the Raspberry Pi only has 1 GB of RAM and the SD card only has twenty-something GB free, you can’t really run large machine learning tasks. However, if they do fit within the Pi then it is a very inexpensive way to do these computations. What a lot of people do is build clusters of Raspberry Pis that work together. I’ve seen articles on how university labs have built supercomputers out of hundreds of Pis all put together in a cluster. Further, they run quite sophisticated software like Hadoop, Docker and Kubernetes to orchestrate the whole thing.

Summary

I now have the Raspberry Pi up and running and I’m enjoying playing with Mathematica and Sonic Pi. I’m doing a bit of Python programming and browsing the Internet. Quite an amazing little device. I’m also impressed with how much it can do for such a low cost. As other vendors like Apple, Microsoft, HP and Dell try to push people into more and more expensive desktops and laptops, it will be interesting to see how many people revolt and switch to the far more inexpensive DIY type solutions. Note that there are vendors that make things like Raspberry Pi complete desktop computers at quite a low cost as well.

Written by smist08

November 11, 2017 at 9:35 pm

The Road to TensorFlow – Part 9: TensorBoard

Introduction

We’ve spent some time developing a Neural Network model for predicting the stock market. TensorFlow has produced a fairly black box implementation that is trained by historical data and then can output predictions for tomorrow’s prices.

But what confidence do we have that this model is really doing what we want? Last time we discussed some of the meta-parameters that configure the model. How do we know these are vaguely correct? How do we know if the weights we are training are converging? If we want to step through the model, how do we do that?

TensorFlow comes with a tool called TensorBoard which you can use to get some insight into what is happening. You can’t easily just print variables, since they are all internal to the TensorFlow engine and only have values when required as a session is running. There is also the problem of how to visualize the variables. The weights matrix is very large and is constantly changing as you train it, so you certainly don’t want to print it out repeatedly, let alone try to read through it.

To use TensorBoard you instrument your program. You tell it what you want to track and assign useful names to those items. This data is then written to log files as your model runs. You then run the TensorBoard program to process these log files and view the results in your Web Browser.

Something Went Wrong

Due to household logistics I moved my TensorFlow work over to my MacBook Air from running in an Ubuntu VM image on our Windows 10 laptop. Installing Python 3, TensorFlow and the various other libraries I’m using was quite simple and straightforward. Just install Python from Python.org and then use pip3 to install any other libraries. That all worked fine. But when I started running the program from last time, I was getting NaN results quite often. I wondered if TensorFlow wasn’t working right on my Mac. Anyway, I went to debug the program and that led me to TensorBoard. As it turns out, there was quite a bad bug in the program presented last time due to uninitialized variables.

You tend to get complacent about uninitialized variables (and array subscript range errors) when programming in Python, because usually Python will raise an exception if you try to use a variable that hasn’t been initialized. The problem is NumPy, which is a library written in C for efficiency. When you create a NumPy array, it is returned to Python, telling Python it’s good to go. But since it’s managed by C code you don’t get the usual Python error checking. So when I changed the program to add the volumes to the price changes, I had a bug that left some of the data arrays uninitialized. I suspect that on the Windows 10 laptop these were initialized to zero, but that all depends on which exact C runtime is being used. On the Mac these values were just random memory and that immediately led to program errors.
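
To illustrate the kind of trap this is, here is a small sketch. I’m assuming the arrays were allocated with something like np.empty; this is not the code from the stock program, and all_finite is just a helper name for this example.

```python
import numpy as np

# np.empty only allocates memory; the contents are whatever was already there.
uninitialized = np.empty(5)

# np.zeros allocates and fills the array, so the contents are well defined.
initialized = np.zeros(5)

print(uninitialized)  # arbitrary values, can differ from run to run
print(initialized)


def all_finite(array):
    """Cheap sanity check before feeding data to a model.

    This only catches NaN and infinity, not garbage that happens to look
    like an ordinary number.
    """
    return bool(np.isfinite(array).all())


print(all_finite(initialized))
```

Allocating with np.zeros costs a little extra time, but it makes a bug like this one behave the same way on every machine.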

Adding the TensorBoard instrumentation showed the problem was originating with the data, and then it was fairly straightforward to zero in on the problem and fix it.

As a result, for this article, I’m just going to overwrite the Python file from last time with a newer one (tfstocksdiff2.py) which is posted here. This version includes TensorBoard instrumentation and a couple of other improvements that I’ll talk about next time.

TensorBoard

First we’ll start with some of the things that TensorBoard shows you. If you read an overview of TensorFlow it’s a bit confusing about what are Tensors and what flows. If you’ve looked at the program so far, it shows quite a few algebraic matrix equations, but where are the Tensors? What TensorFlow does is break these equations down into nodes where each node is a function execution and the data flows along the edges. This is a fairly common way to evaluate algebraic expressions and not unique to TensorFlow. TensorFlow then supports executing these on GPUs and in distributed environments as well as providing all the node types you need to create Neural Networks. TensorBoard gives you a way to visualize these graphs. The names of the nodes are from the program instrumentation.
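
To make the idea of nodes and edges concrete, here is a toy dataflow graph in plain Python. It is only a sketch of the general technique, not how TensorFlow is implemented, and the Node class and the example values are invented for this illustration.

```python
import operator


class Node:
    """One node in a toy dataflow graph: an operation plus its input nodes."""

    def __init__(self, name, op=None, inputs=(), value=None):
        self.name = name
        self.op = op
        self.inputs = inputs
        self.value = value

    def evaluate(self):
        # Leaf nodes just hold a value; other nodes pull values along their edges.
        if self.op is None:
            return self.value
        return self.op(*(node.evaluate() for node in self.inputs))


# Build the graph for y = w * x + b
w = Node("w", value=3.0)
x = Node("x", value=2.0)
b = Node("b", value=1.0)
product = Node("product", op=operator.mul, inputs=(w, x))
y = Node("y", op=operator.add, inputs=(product, b))

print(y.evaluate())
```

Nothing is computed while the graph is being built; the values only flow when evaluate is called, which is roughly the role a session run plays in TensorFlow.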

screen-shot-2016-10-01-at-8-30-35-am

When the program was instrumented it grouped things together. Here is an expansion of the trainingmodel box where you can see the operations that make up our model.

screen-shot-2016-10-01-at-8-31-56-am

This gives us some confidence that we have constructed our TensorFlow graph correctly, but doesn’t show any data.

We can track various statistics of all our TensorFlow variables over time. This graph is showing a track of the means of the various weight and bias matrices.

screen-shot-2016-10-01-at-8-32-50-am

TensorBoard also lets us look at the distribution of the matrix values over time.

screen-shot-2016-10-01-at-8-33-44-am

TensorBoard also lets us look at histograms of the data and how those histograms evolve over time.

screen-shot-2016-10-01-at-8-34-36-am

You can see how the layer 1 weights start as their seeded normal distribution of random numbers and then progress to their new values as training progresses. If you look at all these graphs you can see that the values are still progressing when training stops. This is because TensorBoard instrumentation really slows down processing, so I shortened the training steps while using TensorBoard. I could let it run much longer overnight to ensure that I am providing sufficient training for all the values to settle down.

Program Instrumentation

Rather than include all the code here, check out the Google Drive for the Python source file. In brief, we added a function to get all the statistics on a variable:

import tensorflow as tf

def variable_summaries(var, name):
    """Attach a lot of summaries to a Tensor."""
    # These are the TensorFlow 0.x names; from 1.0 on they became
    # tf.summary.scalar and tf.summary.histogram.
    with tf.name_scope('summaries'):
        mean = tf.reduce_mean(var)
        tf.scalar_summary('mean/' + name, mean)
    with tf.name_scope('stddev'):
        stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
        tf.scalar_summary('stddev/' + name, stddev)
    tf.scalar_summary('max/' + name, tf.reduce_max(var))
    tf.scalar_summary('min/' + name, tf.reduce_min(var))
    tf.histogram_summary(name, var)
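
To make it concrete what these scalar summaries record, here is a plain-Python sketch of the same calculations on an ordinary list of numbers. The summarize function and the sample values are made up for illustration; TensorFlow does the equivalent on whole tensors.

```python
import math


def summarize(values):
    """Compute the same scalar statistics as variable_summaries, for a flat list."""
    mean = sum(values) / len(values)
    # Same formula as the TensorFlow code: sqrt(mean((x - mean)^2)).
    stddev = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {"mean": mean, "stddev": stddev, "max": max(values), "min": min(values)}


print(summarize([1.0, 2.0, 3.0, 4.0]))
```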

We define names in the various sections and indicate the data we want to collect:

with tf.name_scope('Layer1'):
    with tf.name_scope('weights'):
        layer1_weights = tf.Variable(tf.truncated_normal(
            [NHistData * num_stocks * 2, num_hidden], stddev=0.1))
        variable_summaries(layer1_weights, 'Layer1' + '/weights')
    with tf.name_scope('biases'):
        layer1_biases = tf.Variable(tf.zeros([num_hidden]))
        variable_summaries(layer1_biases, 'Layer1' + '/biases')

Before the call to initialize_all_variables we need to call:

merged = tf.merge_all_summaries()
test_writer = tf.train.SummaryWriter('/tmp/tf/test', session.graph)

And then during training:

summary, _, l, predictions = session.run(
    [merged, optimizer, loss, train_prediction], feed_dict=feed_dict)
test_writer.add_summary(summary, i)

Summary

TensorBoard is quite a good tool for getting insight into what is going on in your model: whether the program is correctly doing what you think and whether there is any sanity to the data. It also helps you tune the various parameters to ensure you are getting the best results.

Written by smist08

October 1, 2016 at 4:49 pm

The Road to TensorFlow – Part 2: Python

Introduction

This is part 2 of my blog series on playing with TensorFlow. Last time I blogged on getting Linux going in a VM. This time we will be talking about the Python programming language. The API for TensorFlow is primarily aimed at Python and in fact much of the research in AI, scientific computing, numerical computing and data research takes place in Python. There is a C++ API as well, but it seems like a good chance to give Python a try.

Python is an interpreted language that is very rich in supporting various programming paradigms like object oriented, procedural and functional. Python is open source and runs on many platforms. Most Linux distributions and macOS come with some version of Python pre-installed. Python is very interoperable and can work with most other programming systems, and there are a huge number of libraries of functionality available to the Python programmer. Python is oriented to getting things done quickly with a minimum of code and a minimum of fuss. The name Python is a tribute to the comedy troupe Monty Python and there are many references to Monty Python throughout the documentation.

Monty_Python's_Flying_Circus_Title_Card

Installation and Versions

Although I generally like Python it has one really big problem that is generally a pain in the ass when setting up new systems and browsing documentation. The newest version of Python as of this writing is 3.5.2, which is the one I wanted to use along with all the attendant libraries. However, if you type python in a terminal window you get 2.7.12. This is because when Python went to version 3 it broke source code compatibility. So they made the decision to maintain version 2 going forwards while everyone updated their programs and scripts to version 3. Version 3.0 was released in 2008 and this mess is still going on eight years later. The latest Python 2.x, namely 2.7.12, was just released in June 2016 and seems to be quite actively developed by a good sized community. So generally to get anything Python 3.x you need to add a 3 to the end. To run Python 3.5.2 in a terminal window you type python3. Similarly, the IDE is IDLE3 and the package installer is pip3. It makes it very easy to make a mistake and get the wrong thing. Worse, the naming isn’t entirely consistent across all packages; there are several that I’ve run into where you add a 2 for the 2.x version and the version 3 one is just the name. As a result, I always get a certain amount of Python 2.x stuff installed by mistake (which doesn’t hurt anything, it just wastes time and disk space). This also leads to a bit of confusion when you Google for information, in that you have to be careful to get 3.x info rather than 2.x info, as the wrong one may or may not work and may or may not be a best practice.
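
One way to guard against accidentally running a script with the wrong interpreter is to check the version at the top of the file. This is just a sketch; require_python3 is a name I made up, and the error message is only a suggestion.

```python
import sys


def require_python3(version_info=sys.version_info):
    """Raise an error if the interpreter is older than Python 3."""
    if version_info[0] < 3:
        raise RuntimeError("This script needs Python 3; try running it with python3")
    return "%d.%d.%d" % tuple(version_info[:3])


print("Running on Python", require_python3())
```

Started with plain python on a machine where that still means 2.7, the script stops right away with a clear message instead of failing somewhere later in a confusing way.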

On Ubuntu Linux I just used apt-get to install the various packages I needed. I’ll talk about these a bit more in the next posting. Another option for installing Python and all the scientific libraries is to use the Anaconda distribution, which is quite a good way to get everything in Python installed all at once. I used Anaconda to install Python on Windows 10 and it worked really well. You just don’t get fine control over what it does, and it creates a separate installation to keep everything apart from anything already installed.

Python the Language

Python is a very large language; it has everything from object orientation to functional programming to huge built in libraries. It does have a number of quirks though. For instance, the way you define blocks is via indentation rather than using curly brackets or perhaps end block statements. So indentation isn’t just a style guideline, it’s fundamental to how the program works. In the following bit of code:

for i in range(10):
    a = i * 8
    print( i, a )
a = 8

the two indented statements are part of the for loop and the out-dented assignment is outside the loop. You don’t declare variables; they are defined when first assigned to, and you can’t use a variable without assigning it first (or an exception will be raised). There are a lot of built in types including dictionaries and lists, but no built in array type (though the NumPy library does add these). Notice how the for loop uses in to iterate over a range, rather than counting from one number to another.
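
Here is a short sketch of those points in action. The variable names and the prices are made up purely for illustration.

```python
# Using a name before assigning it raises NameError.
try:
    print(total)
except NameError as error:
    print("Caught:", error)

# Lists and dictionaries are built in; no import is needed.
squares = [n * n for n in range(5)]
prices = {"AAPL": 112.5, "GOOG": 775.0}  # made-up numbers

# "for ... in" walks over any collection, not just a range of integers.
for symbol in prices:
    print(symbol, prices[symbol])
```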

I don’t want to get too much into the language since it is quite large. If you are interested there are many good sites on the web to teach Python and the O’Reilly book “Learning Python” is recommended (but quite long).

Since Python is interpreted, you don’t need to wait for any compile steps, so the coding, testing, debugging cycle is quite quick. Writing tight loops in Python will be slower than C, but generally Python gives you quite good libraries to do most of what you want, and the libraries tend to be written in C or Fortran and are very fast. So far I haven’t found speed to be an issue. The core of TensorFlow is also written in C++ for speed, plus it has the ability to run on NVIDIA graphics cards for an extra boost.
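
You can get a feel for the difference with the standard timeit module. This is a rough sketch rather than a proper benchmark; the function names and the problem size are arbitrary, and the actual timings will depend on your machine.

```python
import timeit

N = 100000


def loop_sum(n):
    """Add up 0..n-1 with an explicit Python loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def builtin_sum(n):
    """Same result, but the loop runs inside the C implementation of sum()."""
    return sum(range(n))


print("loop:   ", timeit.timeit(lambda: loop_sum(N), number=20))
print("builtin:", timeit.timeit(lambda: builtin_sum(N), number=20))
```

The same idea applies on a larger scale to NumPy and TensorFlow: the more of the work you can hand to the library in one call, the less time is spent in the interpreter.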

Summary

This was my quick intro to Python. I’ll talk more about relevant parts of Python as I go along in this series. I generally like Python and so far my only big complaint is the confusion between the version 2 world and the version 3 world.

 

Written by smist08

August 26, 2016 at 11:10 pm

Posted in Artificial Intelligence

The Road to TensorFlow – Part 1 Linux

Introduction

There have been some remarkable advancements in Artificial Intelligence type algorithms lately. I blogged on this a little while ago here. Whether it’s computers reading hand-writing, understanding speech, driving cars or winning at games like Go, there seems to be a continual flood of stories of new amazing accomplishments. I thought I’d spend a bit of time getting to know how this was all coming about by doing a bit of reading and playing with the various technologies.

I wanted to play with Neural Network technology, so thought the Google TensorFlow open source toolkit would be a good place to start. This led me down the road to quite a few new (to me) technologies. So I thought I’d write a few blog posts on my road to getting some working TensorFlow programs. This might take quite a few articles covering Linux, Python, Python libraries like Pandas, Stock Market technical analysis, and then TensorFlow.

Linux

The first obstacle I ran into was that TensorFlow had no install image for Windows. After a bit of Googling, I found you need to run it on MacOS or Linux. I hadn’t played with Linux in a few years and I’d been meaning to give it a try.

I happened to have just read about a web site, osboxes.org, that provides VirtualBox and VMWare images of all sorts of versions of Linux, all ready to go. So I thought I’d give this a try. I downloaded and installed VirtualBox and downloaded a copy of 64-bit Ubuntu Linux. Since I didn’t choose anything special I got Canonical’s Unity Desktop. Since I was trying new things, I figured oh well, let’s get going.

Things went pretty well at first. I figured out how to install things on Ubuntu using APT (Advanced Package Tool), its command line package manager. This worked pretty well and the only problems I had were particular to installing Python, which I’ll talk about when I get to Python. I got TensorFlow installed and was able to complete the tutorial, I got the IDLE3 IDE for Python going, and I felt I was making good progress.

Then Ubuntu installed an update for me (which, like Windows, is run automatically by default). This updated many packages on my virtual image and in the process broke the Unity desktop. Now the desktop wouldn’t come up and all I could do was run a single terminal window. So at least I could get my work off the machine. I Googled the problem and many people had it, but none of the solutions worked for me and I couldn’t resolve the problem. I don’t know if it’s just that Unity is finicky and buggy or if it’s a problem with running in a VirtualBox VM. Perhaps something with video drivers, who knows.

Anyway I figured to heck with Ubuntu and switched to Red Hat’s Fedora Linux. I chose a standard simple Gnome desktop and swore to never touch Unity again. I also realized that now I’m retired, I’m not a commercial user, so I can freely use VMWare, so I also switched to VMWare since I wondered if my previous problem was caused by VirtualBox. Anyway, installing TensorFlow on Fedora seemed to be quite difficult. The dependencies in the TensorFlow install assume the packages that Ubuntu installs by default and apparently these are quite different from Fedora’s. So after madly installing things that I didn’t really think were necessary (like the Gnu Fortran compiler), I gave up on Fedora.

So I went back to osboxes.org and downloaded an Ubuntu image with the Gnome desktop. This has been working great. I got everything re-installed quite quickly and was back to being productive. I like Gnome much better than Unity and I haven’t had any problems. Similarly, I think VMWare works a bit better than VirtualBox and I think I get a bit better performance in this configuration.

I have Python along with all the Python scientific and numerical computing libraries working. I have TensorFlow working. I spend most of my time in Terminal windows and the IDLE3 IDE, but occasionally use Firefox and some of the other programs pre-installed with the distribution.

gnome

I’m greatly enjoying working with Linux again, and I’m considering replacing my currently broken desktop computer with something inexpensive natively running Linux. I haven’t really enjoyed the direction Windows has taken after Windows 7 and I’m thinking of perhaps doing most of my computing on Linux and MacOS.

Summary

I am enjoying using Linux again, in spite of my initial problems with Ubuntu’s Unity Desktop and then with Fedora (running TensorFlow). Now that I have a good system that seems to be stable and working well, I’m pretty happy with it. I’m also glad to be free of things like app stores, and it’s nice to feel in control of my environment when running Linux. Anyway this was the small first step to TensorFlow.

Written by smist08

August 23, 2016 at 11:40 pm