Posts Tagged ‘Neural Networks’
I’ve been using Google’s TensorFlow machine learning platform for some time now starting with version 0.8, going onto 0.9 and now playing with 1.0 which was released last week. There are some really good videos from the release summit posted on YouTube here. This blog article looks at the evolution of TensorFlow and what 1.0 brings to the table.
Installing the new TensorFlow 1.0 on MacOS was fairly painless, I chose to install it natively rather than using a VM type solution since I don’t try to run multiple versions of Python, just stick to the latest. They recommend using Docker or other VM technology to avoid having to install at all, but I didn’t have any problems.
More Than Neural Networks
TensorFlow has always been built on a low level compute engine that executes graphs of operations on matrices and vectors (tensors). However the main tutorials and higher level functions were always oriented to performing Neural Network calculations. It contains very good algorithms for training Neural Networks and had all the supporting functions you needed to create very powerful Neural Network models. It contained a Linear Regression function, but this was mainly used as a simple tutorial rather than anything real.
With 1.0 TensorFlow is adding a large number of other popular machine learning algorithms out of the box so you can use Random Forests, Support Vector Machines, and many other standard libraries that you find in more complete libraries like scikit-learn. The list of standard algorithms isn’t as full as scikit-learn yet, and a very notable omission is the ensemble method of gradient boosting (which is promised sometime soon).
I’ve been entering some Kaggle competitions where penalized regression, random forests and gradient boosting are often the algorithms that produce the best results. However TensorFlow under Keras has been doing quite well. Often the winning solution is a combination of several of these, since an average of independent techniques will give better results.
The good thing about this is that TensorFlow provides very good GPU and other hardware accelerator support, so now all these algorithms can benefit from this. In addition Google is now offering (in beta) a machine learning cloud service which runs TensorFlow on optimized accelerated hardware. In the past if this only had TensorFlow the usage would have been limited since most full applications use a combination of algorithms in the final deployment.
As TensorFlow went through the 0.x versions, there were quite a few API changes that caused you to be frequently updating your programs. With version 1.0 the claim is that for the part of TensorFlow that is in the core library, API compatibility will now be maintained.
A lot of the changes for 1.0 were to make the naming conventions more standard, including following the lead of Python’s Numpy library (so the same function didn’t have a different name in NumPy vs TensorFlow). All this should make coding a bit more straightforward and reduce always having to look everything up continuously.
However beware that a lot of the new advertised features in TensorFlow 1.0 are not in the core library yet, and so their API may change until they are moved there.
The good thing is that Google provided a Python script to convert previous TensorFlow Python programs up to the new API level. This worked fine for my programs, so as to make the process rather painless.
Higher Level APIs
A criticism of TensorFlow was that although it was a great low level framework, it was difficult or tedious to do a number of standard operations, like for instance setting up a simple multi-level neural network. Due to this omission sevel developers created competing high level abstractions to run on various lower level libraries. Probably the most successful of these is Keras which runs on top of both TensorFlow and Theano.
With 1.0 TensorFlow is adding a higher level API which works with all the various algorithms it contains as well as adding a Keras compatible library as a nod to the heavy adoption that Keras has enjoyed.
The non-neural network algorithms follow the API conventions in scikit-learn, which are very efficient. The whole thing is also oriented so you can feed one component into another so you can easily build a compound model consisting of several algorithms and then easily train and deploy the whole thing.
Generally this is a good thing for people looking to just use TensorFlow since the amount of code you need to write becomes much smaller and it embodies all the TensorFlow best practices so it works properly with TensorBoard, deploys flexibly, etc.
The TensorFlow documentation has been greatly improved. The tutorials are way better and it’s much easier to get a basic understanding of TensorFlow from the introductory material. There are also many more videos available as well as training courses.
Although this is all a huge step forward, one annoying side effect is that all the external links, say from Stack Overflow articles (or even Google searches) are now broken.
Some of the other notable additions include a new experimental TensorFlow compiler XLA, APIs for Go and Java, addition of a command line debugger, improvements to TensorBoard for better visualizations and lots of additional hardware support.
Windows support was added in version 0.10 which is new since my original blogs. There is support to use Qualcomm DSP chips for co-processing which should greatly enhance the capabilities of Android phones containing this chip.
TensorFlow has come a long way over the last year from a rather specialized Neural Network tool, evolving into a complete machine learning platform. The open source community around TensorFlow is extremely vibrant and extends quite far beyond just Google employees. Looking at what is scheduled for the next couple of point releases looks very exciting and I’m finding this tool becoming more powerful in leaps and bounds.
We’ve now quickly covered a number of preliminary topics including Linux, Python, Python Libraries and some Stock Market theory. Now we are ready to start talking about Neural Networks and TensorFlow.
TensorFlow is Google’s open source platform for performing the types of numerical computations required by Neural Networks. It isn’t specific to Neural Networks, but has a lot of supporting functions to help with their development. If you had another application that required lots of matrix algebra, then perhaps TensorFlow would also work for you. TensorFlow supports optimized mathematical operations that can either run on your native CPU or be offloaded to a GPU. Google has even developed a custom processor chip to run TensorFlow operations in their data centers.
TensorFlow now powers quite a few Google products for things like speech recognition, photo recognition, and is even giving back some Google search results.
Biological Versus the Mechanical
A lot of AI researchers like to distance themselves from taking how biological neurons exactly work and rather to just take certain ideas. They point out that to achieve manned flight required taking ideas from birds like wing design while throwing away other ideas like wings flapping. Similarly, for neural networks they take some ideas and throw others away.
If you are interested in a more precise simulation of the brain, check out Waterloo University’s Nengo project. This is a very interesting simulation of the brain that has been able to solve a number of problems. In this discussion we’ll be looking at what is more typically done these days in neural networks which tend to take the ideas where the math works easiest and skipping the rest.
From Neurons to Matrix Equations
Consider a bunch of neurons in the brain as depicted in the following diagram.
Inputs come into each neuron and then if a weighted sum of the signals it receives is high enough then its outputs will fire (with a certain strength) which will then feed into another layer of neurons. This rather simplistic model of neurons and the brain is what we will model for our initial neural networks.
We will take some sort of vector of inputs and feed them into an input layer of neurons which based on the weighted sums of these inputs will fire with some strength into the next layer of neurons. In neural networks any layers of neurons that aren’t externally connected to inputs or outputs are called hidden layers. The following diagram shows this model.
Notice that all the inputs connect to all the next layer of neurons. In a biological brain, there won’t be that many connections, but here when we train this model to determine the weights, some weights will be zero (or very small) corresponding to there not really being a connection. But having a fixed complete set of connections really is just convenience to make the math easier and more uniform.
If you work out the math of doing all these weighted sums you quickly realize, you are just doing matrix algebra and you can get the input to the next layer by multiplying the inputs to this layer by a matrix. So:
Output of Layer = A x (Input of Layer)
Where A is the matrix of weights. That’s simple and easy to calculate (just ignoring for now where the elements of the matrix A come from).
If you remember your matrix algebra you will realize that if you do this to each layer, since this is just linear, you can multiply all the matrixes together and reduce the multiple layer problem to a single layer problem. So in this simple view there is no value in multiple layers. Additionally, linear models are overly simple and can be constructed and solved quite easily. Also with this the output is unbounded, it can come out at any magnitude, which clearly real neurons can’t.
What most neural networks do is add a non-linear activation function to this equation. The activation function maps the output value back into a valid range, adds a non-linearity so the whole equation doesn’t just transform back to one layer as well as adds flexibility in how the model can produce values. The new form of the equation then becomes:
Output of Layer = ActivationFunction( A x (Input of Layer) + b )
Where b is a scalar vector that allows the output to be shifted into range of the activation function. The simplest activation function is the rectifier function defined as f(x) = max( 0, x ). This basically returns x if x is positive and 0 if x is negative. This is good if we only want positive values as output, it is really simple and it does behave like some biological networks. On the downside, it isn’t invertible so we can’t run the network backwards (useful for sanity checking), it isn’t differentiable everywhere (helps with solving for the weights) and it doesn’t provide an upper bound on the output. All that being said, ReLU (Rectified Linear Unit) neural networks are currently the most popular. A smooth version of ReLU is the softplus function f(x) = ln(1+ex). Other choices of activation function include logistic sigmoid (from probability theory) and hyperbolic tangent (tanh) which we will use.
We’re still a bit theoretical at this point, but once we consider what the inputs look like and what we want for an output then we can start to solve for the bits in the middle. If we have good values for the various A matrixes and b vectors then we can see that with some matrix multiplication, addition and simple function evaluation we can get solutions and as it turns out both modern CPUs and especially GPUs are really good at this.
Stock Market Example
We’ll now start looking at this with a simple stock market example to get an idea how this all works. Suppose we want to feed in the last 30 adjusted closing prices for the 30 stocks that compose the Dow Jones index and we want our neural network to output the next day closing prices for these 30 stocks. We will be starting simple to give the basic ideas then we’ll look at making this model more sophisticated. Let’s see how we can go about this.
Our Input Vector
For any Neural Network we have to feed a vector of floating point numbers. So let’s consider feeding in a vector consisting of the last 30 adjusted closing prices of the first Dow component followed by the last 30 adjusted closes of the next component and so on. This means out input vector will contain 900 elements containing the last 30 adjusted closes of each of the 30 Dow stocks.
You can do this but it causes problems because the activation function we are going to use returns values between -1 and 1. Typically neural networks work best with values in this range (or maybe 0 to 1 if only positive values are required). So to make this work you need to normalize the input data to something that works better. We are going to do three things:
- Divide each stocks price by the first price we have in its history so it starts at 1.
- Rather than use the actual stock price, we’ll use the stock price change (of the price normalized by #1).
- If NaN is returned in the historical data, we will back fill it from the next good value. Fortuneately Pandas provides a function to do this:
This then puts all the values nicely in range and makes them fairly uniform. The reason for step 3 is that when we go to train the neural network we want to train it with lots of historical data and if we don’t do this we can’t go back very far. Visa, in its current corporate incarnation, only went public in 2008 and then was added to the Dow in 2013 (replacing Bank of America). So there is no Visa historical data from before 2008. Actually I chose tanh as the activation function after switching to price changes, originally I used ReLU with real prices but it tended to be rather unstable.
Our Output Vector
Out output vector will be the next price changes for the 30 Dow component stocks. Then we just need to undo the first normalization above in order to use them.
This article was a quick introduction to the equations we are going to solve with TensorFlow and what motivates them. We started to look at how we input data into the model and we will continue next time with finding all the various matrix components by framing it as an optimization problem.