Stephen Smith's Blog

Musings on Machine Learning…

Posts Tagged ‘numpy’

Playing with my Raspberry Pi

with 5 comments

Introduction

I do most of my work (like writing this blog posting) on my MacBook Air laptop. I used to have a good desktop computer for running various longer running processes or playing games. Last year the desktop packed it in (it was getting old anyway), so since then I’ve just been using my laptop. I wondered if I should get another desktop and run Ubuntu on it, since that is good for machine learning, but I wasn’t sure it was worth the price. Meanwhile I was intrigued by everything I see people doing with Raspberry Pis. So I figured why not just get a Raspberry Pi and see if I can do the same things with it as I did with my desktop. Plus I thought it would be fun to learn about the Pi and that it would be a good toy to play with.

Setup

Since I’m new to the Raspberry Pi, I figured the best way to get started was to order one of the starter kits. This way I’d be able to get up and running quicker and get everything I needed in one shot. I had a credit with Amazon, so I ordered one of the CanaKit kits from there. It included the Raspberry Pi 3, a microSD card with Raspbian Linux, a case, a power supply, an electronics breadboard, some LEDs and resistors, heat sinks and an HDMI cable. I then needed to supply a monitor, a USB keyboard and a USB mouse (which I had lying around).

Setting up was quite easy, though the quick setup instructions were missing a few steps, like what to do with the heat sinks (which was obvious) or how to connect the breadboard. Setup was really just a matter of installing the Raspberry Pi motherboard in the case, adding the heat sinks, inserting the microSD card and then connecting the various cables.

As soon as I powered it on, it displayed an operating system selection and installation menu (with only one choice), so I clicked install and 10 minutes later I was logged in and running Raspbian.

The quick setup guide then recommends you set your locale and change the default password, but it doesn’t tell you the existing password, which a quick Google reveals as “raspberry” (all lowercase, for the default user pi). Then I connected to our Wifi network and I was up and running. I could browse the Internet using Chromium, run Mathematica (a free Raspberry Pi version comes pre-installed) and run a Linux terminal session. All rather painless and fairly straightforward.

I was quite impressed with how quickly it went, how easy the installation and setup process was, and how powerful a computer I had up and running for less than $100 (for everything).

Software

I was extremely pleased with how much software the Raspberry Pi came with pre-installed. This was all on the provided 32Gig card, and even with a few extra things installed I still have 28Gig free. Amazingly compact. Some of the pre-installed software includes:

  • Mathematica. Great for Math students and to promote Mathematica. Runs from the Wolfram Language which is interesting in itself.
  • Python 2 and 3 (more on the pain of having Python 2 later).
  • LibreOffice. A full MS Office like suite of programs.
  • Lots of accessories like file manager, calculator, image viewer, etc.
  • Chromium web browser.
  • Two Java IDEs.
  • Sonic Pi music synthesizer.
  • Terminal command prompt.
  • Minecraft and some Python games.
  • Scratch programming environment.

Plus there is an add/remove software program where you can easily add many more open source Pi programs. You can also use the Linux apt-get command to get many other pre-compiled packages.

Generally I would say this is a very complete set of software for any student, hobbyist or even office worker.

Python

I use Python as my main go-to programming language these days and generally I use a number of scientific and machine learning libraries. So I tried installing these. Usually I just use pip3 and away things go (at least on my Mac). However, doing this on the Pi caused pip3 to download the C++/Fortran source code and try to compile it, which failed. I then Googled around on how best to install these packages.

Unfortunately most of the Google results were how to do this for Python 2, which I didn’t want. It will be so nice when Python 2 finally is discontinued and stops confusing everything. I wanted these for Python 3. Before you start you should update apt-get’s list of available software and upgrade all the packages on your machine. You can do this with:

sudo apt-get update        # Fetches the list of available updates
sudo apt-get upgrade       # Strictly upgrades the current packages

What I found is that I could get most of what I wanted using apt-get:

sudo apt-get install python3-numpy
sudo apt-get install python3-scipy
sudo apt-get install python3-matplotlib
sudo apt-get install python3-pandas

However I couldn’t find an apt-get package for scikit-learn, the machine learning library. So I tried pip3 and it did work, even though it downloaded the source code and compiled it.

pip3 install --upgrade scikit-learn
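A quick way to confirm that these installs landed in Python 3 rather than Python 2 is to import each library and print its version. The sketch below is only a convenience check; the helper name check_libraries is made up for illustration, and the list assumes the packages named above. Run it with python3.

```python
import importlib

# Import names of the libraries installed above. Note that scikit-learn is
# imported as "sklearn" even though its package name on PyPI is scikit-learn.
LIBRARIES = ["numpy", "scipy", "matplotlib", "pandas", "sklearn"]


def check_libraries(names):
    """Map each module name to its version string, or None if it fails to import."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Not every module defines __version__, so fall back to "unknown".
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in check_libraries(LIBRARIES).items():
        print(name, "->", version if version is not None else "NOT INSTALLED")
```

Any library reported as NOT INSTALLED was probably installed for Python 2 instead, or its compile step failed.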

Now I had all the scientific programming power of the standard Python libraries. Note that since the Raspberry Pi only has 1Gig of RAM and the SD card only has twenty something Gig free, you can’t really run large machine learning tasks. However, if your tasks do fit within the Pi then it is a very inexpensive way to do these computations. What a lot of people do is build clusters of Raspberry Pis that work together. I’ve seen articles on how University labs have built supercomputers out of hundreds of Pis all put together in a cluster. Further, they run quite sophisticated software like Hadoop, Docker and Kubernetes to orchestrate the whole thing.
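To get a rough feel for what "fits within the Pi" means, here is a back-of-the-envelope sketch that estimates the size of a dense array of 8-byte floats. The 512 MB budget is purely an assumption (leaving about half of the 1Gig for the operating system and Python itself), and the function name is hypothetical.

```python
def array_megabytes(rows, cols, bytes_per_value=8):
    """Approximate size in MB of a dense rows x cols array (8 bytes = one float64)."""
    return rows * cols * bytes_per_value / (1024 ** 2)


# Assumed budget: roughly half of the Pi's 1 GB of RAM.
BUDGET_MB = 512

for rows in (100_000, 1_000_000):
    size = array_megabytes(rows, cols=100)
    verdict = "fits" if size <= BUDGET_MB else "too big"
    print(f"{rows:>9,} rows x 100 cols: {size:7.1f} MB ({verdict})")
```

A hundred thousand rows of 100 values comes to about 76 MB, while a million rows comes to about 763 MB, which is already more than this budget before any working copies are made.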

Summary

I now have the Raspberry Pi up and running and I’m enjoying playing with Mathematica and Sonic Pi. I’m doing a bit of Python programming and browsing the Internet. Quite an amazing little device. I’m also impressed with how much it can do for such a low cost. As other vendors like Apple, Microsoft, HP and Dell try to push people into more and more expensive desktops and laptops, it will be interesting to see how many people revolt and switch to far less expensive DIY type solutions. Note that there are also vendors that sell complete desktop computers built around the Raspberry Pi at quite a low cost.

Written by smist08

November 11, 2017 at 9:35 pm

The Road to TensorFlow – Part 3: Python Libraries

with 3 comments

Introduction

Continuing on with my long and winding journey to learn TensorFlow, we started with Linux then went on to Python. Today we will be looking at a number of necessary Python libraries.

My background is Mathematics and I’ve always had an interest in Numerical Analysis and Scientific Computing. But I mostly left these behind when I left University. As I learned Python and started to play with it and its attendant libraries, I was very pleasantly surprised to find that all my favorite numerical algorithms (and many more) were now part of the fairly standard Python libraries. Many of these core libraries are still written in their original Fortran or C code, but are tailored to fit very well into the Python ecosystem. All of this is open source software and to a certain degree made possible by the good work of the GNU Fortran and C compilers.

These libraries led to quite a few diversions from my primary task of learning TensorFlow, but I found this to be quite a wonderful world to become conversant in.

As I completed the TensorFlow tutorials and a Udacity course, I wanted a different problem to play with rather than the usual image recognition and speech analysis projects. Any such project needs quite a bit of data to train your algorithms with, so I thought why not do something with stock market data? After all, you can get gobs of stock market data via web service calls fairly easily (and freely).

Some Useful Libraries

Here are a few of the libraries that I found useful to help with machine learning and TensorFlow.

Numpy – this is the fundamental Python numerical package that most other libraries are built over. It includes a powerful N dimensional array object, useful linear algebra, Fourier transform, random number capabilities and much more.

Scipy – is built on numpy and includes most numerical algorithms you’ve ever heard of including numerical integration, ODE solvers, optimization, interpolation, special functions and signal processing.

Matplotlib – is a very powerful 2D plotting library that is very useful for visualizing your results.

Pandas – was originally written as a library to manipulate stock market data and perform the standard things market technical analysts like to do, but now it markets itself as a general purpose data analysis library.

Sympy – is a library for performing symbolic mathematics. Although I’m not using this in relation to TensorFlow (currently), it is a fascinating tool for performing symbolic algebra and calculus.

IPython – is an interactive Python environment where you program in interactive web based notebooks. A useful tool to play with, but I tend to do my real programming in an IDE. Still, if you want to quickly play with something, this is lots of fun.

Pickle – although this is a standard library, I thought I’d highlight it since we are about to use it. This library lets you easily save Python objects to disk files and load them back.

Scikit-learn – is a collection of machine learning algorithms for things like clustering, classification and regression. I.e. neural networks aren’t the only way to accomplish these tasks.
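To make the Numpy entry above a little more concrete, here is a minimal sketch touching each capability it lists: an N dimensional array, a linear algebra solve, a Fourier transform and random numbers. The matrices, the test signal and the seed are made up purely for illustration.

```python
import numpy as np

# N-dimensional array: a 2 x 3 matrix of floats.
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
print("shape:", a.shape)

# Linear algebra: solve the system A x = b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)
print("solution:", x)

# Fourier transform: a sine wave with 5 cycles over 64 samples
# shows up as a peak in frequency bin 5.
t = np.arange(64) / 64.0
signal = np.sin(2 * np.pi * 5 * t)
spectrum = np.abs(np.fft.rfft(signal))
peak = int(np.argmax(spectrum))
print("dominant frequency bin:", peak)

# Random numbers: 1000 draws from a standard normal distribution.
rng = np.random.default_rng(0)
samples = rng.normal(size=1000)
print("sample mean:", samples.mean())
```

Scipy, Pandas and scikit-learn all accept and return arrays like these, which is why Numpy sits underneath most of the other libraries.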

There are many more Python libraries for things like writing GUI programs, performing web requests, processing web data, accessing databases, etc. We’ll talk about those as we need them. Since Python has such a large community of users and contributors, there are tons of good web pages, blogs, books, courses and forums on all of these. Google is your friend.

Some Code Finally

So let’s use all of this to load some stock market data which will then be ready for our TensorFlow model. We are going to use Pandas to load some recent prices for the Dow 30 stocks and we’ll use matplotlib to display a graph of their values. This graph is a bit too busy, since 30 stocks is really too many to display at once. Also we haven’t normalized the data at all, so the graph doesn’t give any real way to compare the stocks. It really only shows we’ve loaded a bunch of data which is hopefully correct.

In this snippet we only load a small bit of history, so it’s reasonably quick, but when we want large amounts of data we will want to cache this. So when we do the web services call to get the data, we pickle it to a file (Python speak for serializing our data object and saving it to a file). If the file exists we just read it from the file and skip the web service call. To refresh the data from the web service, just delete the stocks.pickle file.
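The caching pattern described above can be sketched on its own, separate from the stock data. The helper name load_or_fetch and the fetch callback are hypothetical names chosen for illustration; only the pickle and os calls come from the standard library.

```python
import os
import pickle
import tempfile


def load_or_fetch(cache_path, fetch):
    """Return the pickled data at cache_path, or call fetch() and cache its result."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    data = fetch()
    with open(cache_path, "wb") as f:
        pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)
    return data


if __name__ == "__main__":
    def slow_fetch():
        # Stand-in for a web service call; the values are made up.
        print("fetching...")
        return {"AAA": [100.0, 101.5]}

    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "demo.pickle")
        print(load_or_fetch(path, slow_fetch))  # first call fetches and caches
        print(load_or_fetch(path, slow_fetch))  # second call reads the cache
```

Deleting the cache file forces the next call to fetch again, which is the same refresh trick as deleting stocks.pickle.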

We get the data from Yahoo Finance. We could use Yahoo’s Python library directly, but I thought I might use the Pandas DataReader general purpose API to make it easy to switch to Google if Verizon shuts down (or strangles) this service now that they own Yahoo. The Web Services call returns the open, high, low, volume, close and adjusted close which is why we have the couple of lines to clean up the data and only keep the adjusted close. I’ll talk more about the stock market and what the adjusted close is next time.

The program wants to get TrainDataSetSize prices for each stock which is set to 50 below. But due to weekends and holidays, you can’t just subtract 50 from today’s date to get that. So I use a simple heuristic to ensure I get more data than that (which massively overestimates).
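To see how much that heuristic overestimates, here is a small standalone sketch that counts the weekdays inside the requested window. It ignores market holidays, and the fixed end date is just an example chosen so the result is repeatable; the function name is hypothetical.

```python
from datetime import date, timedelta

TRAIN_DATA_SET_SIZE = 50  # same value as TrainDataSetSize in the program


def weekdays_in_window(end, calendar_days):
    """Count Monday-to-Friday dates in the calendar_days days before end (end excluded)."""
    start = end - timedelta(days=calendar_days)
    return sum(
        1
        for offset in range(calendar_days)
        if (start + timedelta(days=offset)).weekday() < 5
    )


window = TRAIN_DATA_SET_SIZE * 2 + 5   # 105 calendar days
end = date(2016, 8, 30)                # fixed example date
print(window, weekdays_in_window(end, window))
```

Since 105 days is exactly 15 weeks, the window always contains 75 weekdays, so even after a handful of holidays it comfortably covers the 50 prices needed.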

import os
import pickle
from datetime import date, timedelta

import matplotlib.pyplot as plt
import pandas as pd
import pandas_datareader as pdr

TrainDataSetSize = 50

# Load the Dow 30 stocks from Yahoo into a Pandas DataFrame

dow30 = ['AXP', 'AAPL', 'BA', 'CAT', 'CSCO', 'CVX', 'DD', 'XOM',
         'GE', 'GS', 'HD', 'IBM', 'INTC', 'JNJ', 'KO', 'JPM',
         'MCD', 'MMM', 'MRK', 'MSFT', 'NKE', 'PFE', 'PG',
         'TRV', 'UNH', 'UTX', 'VZ', 'V', 'WMT', 'DIS']

stock_filename = 'stocks.pickle'
if os.path.exists(stock_filename):
    # Reuse the cached prices and skip the web service call
    try:
        with open(stock_filename, 'rb') as f:
            trainData = pickle.load(f)
    except Exception as e:
        print('Unable to process data from', stock_filename, ':', e)
        raise
    print('%s already present - Skipping requesting/pickling.' %
          stock_filename)
else:
    # Ask for about twice as many calendar days as prices needed,
    # since weekends and holidays have no prices
    end = date.today()
    start = end - timedelta(days=TrainDataSetSize * 2 + 5)
    rawData = pdr.data.DataReader(dow30, 'yahoo', start, end)
    # Keep only the adjusted close (plain indexing instead of .ix,
    # which has been removed from Pandas)
    cleanData = rawData['Adj Close']
    trainData = pd.DataFrame(cleanData)
    print('Pickling %s.' % stock_filename)
    try:
        with open(stock_filename, 'wb') as f:
            pickle.dump(trainData, f, pickle.HIGHEST_PROTOCOL)
    except Exception as e:
        print('Unable to save data to', stock_filename, ':', e)

print(trainData)

trainData.plot()
plt.show()

Generally, I think this is a fairly short bit of code for everything it accomplishes. That compactness is one of the beauties of Python.

[Figure: graph of the recent adjusted closing prices of the Dow 30 stocks]
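As noted earlier, the raw prices are not normalized, so the lines on the graph are hard to compare. One simple option, which is not necessarily what the TensorFlow model will end up using, is to divide each stock by its first price so every series starts at 1.0. The sketch below uses made-up prices for two hypothetical tickers standing in for trainData.

```python
import pandas as pd

# Made-up prices for two hypothetical tickers, standing in for trainData.
prices = pd.DataFrame(
    {"AAA": [100.0, 102.0, 101.0], "BBB": [20.0, 21.0, 22.0]},
    index=pd.to_datetime(["2016-08-01", "2016-08-02", "2016-08-03"]),
)

# Divide every column by its first value so each series starts at 1.0.
normalized = prices / prices.iloc[0]
print(normalized)
```

Applied to the real data, trainData / trainData.iloc[0] followed by .plot() would put all 30 stocks on the same relative scale.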

Summary

This was a quick introduction to the Python libraries we’ll be using in addition to TensorFlow. Hopefully the quick sample program gave a taste of how we will be using them; it is in fact how we will be getting training data for our TensorFlow model.

Written by smist08

August 30, 2016 at 10:49 pm