Stephen Smith's Blog

Musings on Machine Learning…

Posts Tagged ‘TensorBoard

A Crack in the TensorFlow Platform

leave a comment »

Introduction

Last time we looked at how some tunable parameters through off a TensorFlow solution of a linear regression problem. This time we are going to look at a few more topics around TensorFlow and linear regression. Then we’ll look at how Google is implementing Linear Regression and some problems with their approach.

TensorFlow Graphs

Last time we looked at calculating the solution to a linear regression problem directly using TensorFlow. That bit of code was:

# Now lets calculated the least squares fit exactly using TensorFlow
X = tf.constant(data[:,0], name="X")
Y = tf.constant(data[:,1], name="Y")

Xavg = tf.reduce_mean(X, name="Xavg")
Yavg = tf.reduce_mean(Y, name="Yavg")
num = (X - Xavg) * (Y - Yavg)
denom = (X - Xavg) ** 2
rednum = tf.reduce_sum(num, name="numerator")
reddenom = tf.reduce_sum(denom, name="denominator")
m = rednum / reddenom
b = Yavg - m * Xavg
with tf.Session() as sess:
    writer = tf.summary.FileWriter('./graphs', sess.graph)
    mm, bb = sess.run([m, b])

 

TensorFlow does all its calculations based on a graph where the various operators and constants are nodes that then get connected together to show dependencies. We can use TensorBoard to show the graph for the snippet of code we just reviewed here:

Notice that TensorFlow overloads the standard Python numerical operators, so when we get a line of code like: “denom = (X – Xavg) ** 2”, since X and Xavg are Tensors then we actually generate TensorFlow nodes as if we had called things like tf.subtract and tf.pow. This is much easier code to write, the only downside being that there isn’t a name parameter to label the nodes to get a better graph out of TensorBoard.

With TensorFlow you perform calculations in two steps, first you build the graph (everything before the with statement) and then you execute a calculation by specifying what you want. To do this you create a session and call run. In run we specify the variables we want calculated. TensorFlow then goes through the graph calculating anything it needs to, to get the variables we asked for. This means it may not calculate everything in the graph.

So why does TensorFlow follow this model? It seems overly complicated to perform numerical calculations. The reason is that there are algorithms to separate graphs into separate independent components that can be calculated in parallel. Then TensorFlow can delegate separate parts of the graph to separate GPUs to perform the calculation and then combine the results. In this example this power isn’t needed, but once you are calculating a very complicated large Neural Network then this becomes a real selling point. However since TensorFlow is a general tool, you can use it to do any calculation you wish on a set of GPUs.

TensorFlow’s New LinearRegressor Estimator

Google has been trying to turn TensorFlow into a platform for all sorts of Machine Learning algorithms, not just Neural Networks. They have added estimators for Random Forests and for Linear Regression. However they did this by using the optimizers they created for Neural Nets rather than using the standard algorithms used in other libraries, like those implemented in SciKit Learn. The reasoning behind this is that they have a lot of support for really really big models with lots of support for one-hot encoding, sparse matrices and so on. However the algorithms that solve the problem seem to be exceedingly slow and resource hungry. Anything implemented in TensorFlow will run on a GPU, and similarly any Machine Learning algorithm can be implemented in TensorFlow. The goal here is to have TensorFlow running the Google AI Cloud where all the virtual machines have Google designed GPU like AI accelerator hardware. But I think unless they implement the standard algorithms, so they can solve things like a simple least squares regression quickly hand accurately then its usefulness will be limited.

Here is how you solve our fire versus theft linear regression this way in TensorFlow:

 

features = [tf.contrib.layers.real_valued_column("x", dimension=1)]
estimator = tf.contrib.learn.LinearRegressor(feature_columns=features,
     model_dir='./linear_estimator')
# Input builders
input_fn = tf.contrib.learn.io.numpy_input_fn({"x":x}, y,
     num_epochs=10000)

estimator.fit(input_fn=input_fn, steps=2000)

mm = estimator.get_variable_value('linear/x/weight')
bb = estimator.get_variable_value('linear/bias_weight')
print(mm, bb)

 

This solves the problem and returns a slope of 1.50674927 and intercept of 13.47268105 (the correct numbers from last post are 1.31345600492 and 16.9951572327). By increasing the steps in the fit statement I can get closer to the correct answer, but it is very time consuming.

The documentation for these new estimators is very limited, so I’m not 100% sure it’s solving least squares, but I tried getting the L1 solution using SciKit Learn and it was very close to least squares, so whatever this new estimator is estimating (which might be least squares), it is very slow and quite inaccurate. It is also strange that we now have a couple of tunable parameters added to make a fairly simple calculation problematic. The graph for this solution isn’t too bad, but still since we know the exact solution it is a bit disappointing.

Incidentally I was planning to compare the new TensorFlow RandomForest estimator to the Scikit Learn implementation. Although the SciKit Learn one is quite fast, it uses a huge amount of memory so I kind of would like a better solution. But when I compared the two I found the TensorFlow one so bad (both slow and resource intensive) that I didn’t bother blogging it. I hope that by the time this solution becomes more mainstream in TensorFlow that it improves a lot.

Summary

TensorFlow is a very powerful engine for performing calculations that can be automatically parallelized and distributed over multiple GPUs for amazing computational speeds. This really does make it possible to spend a few thousand dollars and build quite a powerful supercomputer.

The downside is that Google appears to have the hammer of their neural network optimizers that they really want to use. As a result they are treating everything else as a nail and hitting it with this hammer. The results are quite sub-optimal. I think they do need to spend the time to implement a few of the standard non-Neural Network algorithms properly in TensorFlow if they really want to unleash the power of this platform.

Advertisements

Written by smist08

August 8, 2017 at 10:09 pm

TensorFlow Goes 1.0

leave a comment »

Introduction

I’ve been using Google’s TensorFlow machine learning platform for some time now starting with version 0.8, going onto 0.9 and now playing with 1.0 which was released last week. There are some really good videos from the release summit posted on YouTube here. This blog article looks at the evolution of TensorFlow and what 1.0 brings to the table.

Installing the new TensorFlow 1.0 on MacOS was fairly painless, I chose to install it natively rather than using a VM type solution since I don’t try to run multiple versions of Python, just stick to the latest. They recommend using Docker or other VM technology to avoid having to install at all, but I didn’t have any problems.

tensorflow-1-0-google-open-source

More Than Neural Networks

TensorFlow has always been built on a low level compute engine that executes graphs of operations on matrices and vectors (tensors). However the main tutorials and higher level functions were always oriented to performing Neural Network calculations. It contains very good algorithms for training Neural Networks and had all the supporting functions you needed to create very powerful Neural Network models. It contained a Linear Regression function, but this was mainly used as a simple tutorial rather than anything real.

With 1.0 TensorFlow is adding a large number of other popular machine learning algorithms out of the box so you can use Random Forests, Support Vector Machines, and many other standard libraries that you find in more complete libraries like scikit-learn. The list of standard algorithms isn’t as full as scikit-learn yet, and a very notable omission is the ensemble method of gradient boosting (which is promised sometime soon).

I’ve been entering some Kaggle competitions where penalized regression, random forests and gradient boosting are often the algorithms that produce the best results. However TensorFlow under Keras has been doing quite well. Often the winning solution is a combination of several of these, since an average of independent techniques will give better results.

The good thing about this is that TensorFlow provides very good GPU and other hardware accelerator support, so now all these algorithms can benefit from this. In addition Google is now offering (in beta) a machine learning cloud service which runs TensorFlow on optimized accelerated hardware. In the past if this only had TensorFlow the usage would have been limited since most full applications use a combination of algorithms in the final deployment.

API Stability

As TensorFlow went through the 0.x versions, there were quite a few API changes that caused you to be frequently updating your programs. With version 1.0 the claim is that for the part of TensorFlow that is in the core library, API compatibility will now be maintained.

A lot of the changes for 1.0 were to make the naming conventions more standard, including following the lead of Python’s Numpy library (so the same function didn’t have a different name in NumPy vs TensorFlow). All this should make coding a bit more straightforward and reduce always having to look everything up continuously.

However beware that a lot of the new advertised features in TensorFlow 1.0 are not in the core library yet, and so their API may change until they are moved there.

The good thing is that Google provided a Python script to convert previous TensorFlow Python programs up to the new API level. This worked fine for my programs, so as to make the process rather painless.

Higher Level APIs

A criticism of TensorFlow was that although it was a great low level framework, it was difficult or tedious to do a number of standard operations, like for instance setting up a simple multi-level neural network. Due to this omission sevel developers created competing high level abstractions to run on various lower level libraries. Probably the most successful of these is Keras which runs on top of both TensorFlow and Theano.

With 1.0 TensorFlow is adding a higher level API which works with all the various algorithms it contains as well as adding a Keras compatible library as a nod to the heavy adoption that Keras has enjoyed.

The non-neural network algorithms follow the API conventions in scikit-learn, which are very efficient. The whole thing is also oriented so you can feed one component into another so you can easily build a compound model consisting of several algorithms and then easily train and deploy the whole thing.

Generally this is a good thing for people looking to just use TensorFlow since the amount of code you need to write becomes much smaller and it embodies all the TensorFlow best practices so it works properly with TensorBoard, deploys flexibly, etc.

Documentation

The TensorFlow documentation has been greatly improved. The tutorials are way better and it’s much easier to get a basic understanding of TensorFlow from the introductory material. There are also many more videos available as well as training courses.

Although this is all a huge step forward, one annoying side effect is that all the external links, say from Stack Overflow articles (or even Google searches) are now broken.

Lots More

Some of the other notable additions include a new experimental TensorFlow compiler XLA, APIs for Go and Java, addition of a command line debugger, improvements to TensorBoard for better visualizations and lots of additional hardware support.

Windows support was added in version 0.10 which is new since my original blogs. There is support to use Qualcomm DSP chips for co-processing which should greatly enhance the capabilities of Android phones containing this chip.

tensorboard

Summary

TensorFlow has come a long way over the last year from a rather specialized Neural Network tool, evolving into a complete machine learning platform. The open source community around TensorFlow is extremely vibrant and extends quite far beyond just Google employees. Looking at what is scheduled for the next couple of point releases looks very exciting and I’m finding this tool becoming more powerful in leaps and bounds.

Written by smist08

February 19, 2017 at 9:32 pm

The Road to TensorFlow – Part 9: TensorBoard

with one comment

Introduction

We’ve spent some time developing a Neural Network model for predicting the stock market. TensorFlow has produced a fairly black box implementation that is trained by historical data and then can output predictions for tomorrow’s prices.

But what confidence do we have that this model is really doing what we want? Last time we discussed some of the meta-parameters that configure the model. How do we know these are vaguely correct? How do we know if the weights we are training are converging? If we want to step through the model, how do we do that?

TensorFlow comes with a tool called TensorBoard which you can use to get some insight into what is happening. You can’t easily just print variables since they are all internal to the TensorFlow engine and only have values when required as a session is running. There is also the problem with how to visualize the variables. The weights matrix is very large and is constantly changing as you train it, you certainly don’t want to print this out repeatedly, let alone try to read through it.

To use TensorBoard you instrument your program. You tell it what you want to track and assign useful names to those items. This data is then written to log files as your model runs. You then run the TensorBoard program to process these log files and view the results in your Web Browser.

Something Went Wrong

Due to household logistics I moved my TensorFlow work over to my MacBook Air from running in an Ubuntu VM image on our Windows 10 laptop. Installing Python 3, TensorFlow and the various other libraries I’m using was quite simple and straight forward. Just install Python from Python.org and then use pip3 to install any other libraries. That all worked fine. But when I started running the program from last time, I was getting NaN results quite often. I wondered if TensorFlow wasn’t working right on my Mac? Anyway I went to debug the program and that led me to TensorBoard. As it turns out there was quite a bad bug in the program presented last time due to un-initialized variables.

You tend to get complacent programming in Python about un-initialized variables (and array subscript range errors) because usually Python will raise and exception if you try to use a variable that hasn’t been initialized. The problem is NumPy which is a library written in C for efficiency. When you create a NumPy array, it is returned to Python, telling Python its good to go. But since its managed by C code you don’t get the usual Python error checking. So when I changed the program to add the volumes to the price changes, I had a bug that left some of the data arrays uninitialized. I suspect on the Windows 10 laptop that these were initialized to zero, but that all depends on which exact C runtime is being used. On the Mac these values were just random memory and that immediately led to program errors.

Adding the TensorBoard initialization showed the problem was originating with the data and then it was fairly straight forward to zero in on the problem and fix it.

As a result, for this article, I’m just going to overwrite the Python file from last time with a newer one (tfstocksdiff2.py) which is posted here. This version includes TensorBoard instrumentation and a couple of other improvements that I’ll talk about next time.

TensorBoard

First we’ll start with some of the things that TensorBoard shows you. If you read an overview of TensorFlow it’s a bit confusing about what are Tensors and what flows. If you’ve looked at the program so far, it shows quite a few algebraic matrix equations, but where are the Tensors? What TensorFlow does is break these equations down into nodes where each node is a function execution and the data flows along the edges. This is a fairly common way to evaluate algebraic expressions and not unique to TensorFlow. TensorFlow then supports executing these on GPUs and in distributed environments as well as providing all the node types you need to create Neural Networks. TensorBoard gives you a way to visualize these graphs. The names of the nodes are from the program instrumentation.

screen-shot-2016-10-01-at-8-30-35-am

When the program was instrumented it grouped things together. Here is an expansion of the trainingmodel box where you can see the operations that make up our model.

screen-shot-2016-10-01-at-8-31-56-am

This gives us some confidence that we have constructed our TensorFlow graph correctly, but doesn’t show any data.

We can track various statistics of all our TensorFlow variables over time. This graph is showing a track of the means of the various weight and bias matrixes.

screen-shot-2016-10-01-at-8-32-50-am

TensorBoard also lets us look at the distribution of the matrix values over time.

screen-shot-2016-10-01-at-8-33-44-am

TensorBoard also lets us look at histograms of the data and how those histograms evolve over time.

screen-shot-2016-10-01-at-8-34-36-am

You can see how the layer 1 weights start as their seeded normal distribution of random numbers and then progress to their new values as training progresses. If you look at all these graphs you can see that the values are still progressing when training stops. This is because TensorBoard instrumentation really slows down processing, so I shortened the training steps while using TensorBoard. I could let it run much longer over night to ensure that I am providing sufficient training for all the values to settle down.

Program Instrumentation

Rather than include all the code here, check out the Google Drive for the Python source file. But quickly we added a function to get all the statistics on a variable:

def variable_summaries(var, name):
  """Attach a lot of summaries to a Tensor."""
  with tf.name_scope('summaries'):
   mean = tf.reduce_mean(var)
   tf.scalar_summary('mean/' + name, mean)
  with tf.name_scope('stddev'):
   stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
   tf.scalar_summary('stddev/' + name, stddev)
  tf.scalar_summary('max/' + name, tf.reduce_max(var))
  tf.scalar_summary('min/' + name, tf.reduce_min(var))
  tf.histogram_summary(name, var)

We define names in the various section and indicate the data we want to collect:

with tf.name_scope('Layer1'):
    with tf.name_scope('weights'):
        layer1_weights = tf.Variable(tf.truncated_normal(
            [NHistData * num_stocks * 2, num_hidden], stddev=0.1))
        variable_summaries(layer1_weights, 'Layer1' + '/weights')
     with tf.name_scope('biases'):
         layer1_biases = tf.Variable(tf.zeros([num_hidden]))
         variable_summaries(layer1_biases, 'Layer1' + '/biases')

Before the call to initialize_all_variables we need to call:

merged = tf.merge_all_summaries()
test_writer = tf.train.SummaryWriter('/tmp/tf/test',
    session.graph )

And then during training:

summary, _, l, predictions = session.run(
    [merged, optimizer, loss, train_prediction], feed_dict=feed_dict)
test_writer.add_summary(summary, i)

Summary

TensorBoard is quite a good tool to give you insight into what is going on in your model. Whether the program is correctly doing what you think and whether there is any sanity to the data. It also lets you tune the various parameters to ensure you are getting best results.

Written by smist08

October 1, 2016 at 4:49 pm