Stephen Smith's Blog

Musings on Machine Learning…

An Introduction to Image Style Transfer


Introduction

Image Style Transfer is an AI technique that is becoming quite popular for enhancing or stylizing photos. It takes one picture (often a classical painting) and then applies the style of that picture to another picture. For example I could take this photo of the Queen of Surrey passing Hopkins Landing:

Combined with the style of Vincent van Gogh’s Starry Night:

We then feed these through the AI algorithm to get:

In this article, we’ll look at some of the ways you can accomplish this yourself, either by using online services or by running your own Neural Network with TensorFlow.

Playing with Image Style Transfer

There are lots of services that let you play with this. Generally to apply a canned style to your own picture is quite fast (a few seconds). To provide your own photo as the style photo is more involved, since it involves “training” the style and this can take 30 minutes (or more).

Probably the most popular program is the Prisma app for either iPhone or Android. This app has a large number of pre-trained styles and can apply any of them to any photo on your phone. This app works quite well and gives plenty of variety to play with. Plus it’s free. Here is the ferry in Prisma’s comic theme:

If you want to provide your own photo as the style reference then deepart.io is a good choice. This is available as a web app as well as either an iPhone or Android app. The good part about this for photographers is that you can copy photos from your good camera to your computer and then use this program’s website, no phone required. This site has some pre-programmed styles based on Vincent van Gogh which work really quickly and produce good results. Then it has the ability to upload a style photo. Processing a style is more work and typically takes 25 minutes (you can pay to have it processed quicker, but not that much quicker). If you don’t mind the wait this site is free and works quite well. Here is an example of the ferry picture above van Gogh’ized by deepart.io (sorry they don’t label the styles so I don’t know which painting this is styled from):

Playing More Directly

These programs are great fun, but I like to tinker with things myself on my computer. So can I run these programs myself? Can I get the source code? Fortunately the answer to both is yes. This turns out to be a bit easier than you first might think, largely due to a project out of the Visual Geometry Group (VGG) at the University of Oxford. They created an exceptional image recognition neural network that they trained and won several competitions with. It turns out that the backbone to doing Image Style Transfer is to have a good image recognition Neural Network. This Neural Net is 19 layers deep and Oxford released the fully trained network for anyone to use. Several people have then taken this network, figured out how to load it into TensorFlow and created some really good Image Style Transfer programs based on this. The first program I played with was Anish Athalye’s program posted on GitHub here. This program uses VGG and can train a neural network for a given style picture. Anish has quite a good write up on his blog here.
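To give a feel for what these programs are actually optimizing, here is a rough sketch (my own simplification, not Anish’s or Shafeen’s actual code) of the two losses that drive style transfer. It assumes you have already pulled activations out of the pre-trained VGG network for the content photo, the style painting and the generated image (the variable names here are made up for illustration):

import tensorflow as tf

def gram_matrix(activations):
    # Correlations between VGG feature channels; this is what captures "style".
    flat = tf.reshape(activations, [-1, tf.shape(activations)[-1]])
    return tf.matmul(flat, flat, transpose_a=True)

def style_transfer_loss(content_layer, generated_content_layer,
                        style_layer, generated_style_layer,
                        content_weight=1.0, style_weight=100.0):
    # Content loss keeps the overall structure of the photo.
    content_loss = tf.reduce_mean(tf.square(generated_content_layer - content_layer))
    # Style loss matches the Gram matrices (texture and colour statistics) of the painting.
    style_loss = tf.reduce_mean(tf.square(gram_matrix(generated_style_layer) -
                                          gram_matrix(style_layer)))
    return content_weight * content_loss + style_weight * style_loss

The optimizer then adjusts the pixels of the generated image (or, in the fast variants, the weights of a separate transformation network) to minimize this combined loss.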

Then I played with a program that expanded on Anish’s by Shafeen Tejani which is on GitHub here along with a blog post here. This program lets you keep the trained network so you can perform the transformation quickly on any picture you like. This is similar to how Prisma works. The example up in the introduction was created with this program. To train the network you require a training set of images like the Microsoft COCO collection.

Running these programs isn’t for everyone. You have to be used to running Python programs and have TensorFlow installed and working on your system. You need a few other dependent Python libraries and of course you need the VGG saved Neural Network. But if you already have Python and TensorFlow, I found both of these programs just ran and I could play with them quite easily.

The writeups on all these programs highly recommend having a good GPU to speed up the calculations. I’m playing on an older MacBook Air with no GPU and was able to get quite good results. One trick I found that helped is to play with reduced resolution images to help speed up the process, then run the algorithm on a higher resolution version when you have things right. I found I couldn’t use the full resolution from my DSLR (12 megapixels), but had to use Apple’s “large” size (286KB).

Summary

This was a quick introduction to Image Style Transfer. We are seeing this in more and more places. There are applications that can apply this same technique to videos. I expect this will become a standard part of all image processing software like Photoshop or GIMP. It also might remain the domain of specialty programs, as HDR has, since it is quite technical and resource intensive. In the meantime projects like VGG have made this technology quite accessible for anyone to play with.

Written by smist08

August 14, 2017 at 6:48 pm

A Crack in the TensorFlow Platform


Introduction

Last time we looked at how some tunable parameters threw off a TensorFlow solution of a linear regression problem. This time we are going to look at a few more topics around TensorFlow and linear regression. Then we’ll look at how Google is implementing Linear Regression and some problems with their approach.

TensorFlow Graphs

Last time we looked at calculating the solution to a linear regression problem directly using TensorFlow. That bit of code was:

# Now let's calculate the least squares fit exactly using TensorFlow
X = tf.constant(data[:,0], name="X")
Y = tf.constant(data[:,1], name="Y")

Xavg = tf.reduce_mean(X, name="Xavg")
Yavg = tf.reduce_mean(Y, name="Yavg")
# Build the closed form least squares formula as graph nodes:
# m = sum((X - Xavg)*(Y - Yavg)) / sum((X - Xavg)**2), b = Yavg - m * Xavg
num = (X - Xavg) * (Y - Yavg)
denom = (X - Xavg) ** 2
rednum = tf.reduce_sum(num, name="numerator")
reddenom = tf.reduce_sum(denom, name="denominator")
m = rednum / reddenom
b = Yavg - m * Xavg
with tf.Session() as sess:
    writer = tf.summary.FileWriter('./graphs', sess.graph)
    mm, bb = sess.run([m, b])

 

TensorFlow does all its calculations based on a graph where the various operators and constants are nodes that then get connected together to show dependencies. We can use TensorBoard to show the graph for the snippet of code we just reviewed here:

Notice that TensorFlow overloads the standard Python numerical operators, so when we write a line of code like "denom = (X - Xavg) ** 2", since X and Xavg are Tensors we actually generate TensorFlow nodes as if we had called things like tf.subtract and tf.pow. This is much easier code to write, the only downside being that there isn’t a name parameter to label the nodes to get a better graph out of TensorBoard.
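As a quick illustration (a little sketch in the same TensorFlow 1.x style as the code above), the overloaded operators and the explicit tf calls build equivalent graph nodes; the only difference is the chance to pass name= labels:

import tensorflow as tf

X = tf.constant([1.0, 2.0, 3.0])
Xavg = tf.reduce_mean(X)

denom_overloaded = (X - Xavg) ** 2      # quietly builds subtract and pow nodes
denom_explicit = tf.pow(tf.subtract(X, Xavg, name="diff"), 2.0, name="denom")

with tf.Session() as sess:
    print(sess.run([denom_overloaded, denom_explicit]))   # identical values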

With TensorFlow you perform calculations in two steps, first you build the graph (everything before the with statement) and then you execute a calculation by specifying what you want. To do this you create a session and call run. In run we specify the variables we want calculated. TensorFlow then goes through the graph calculating anything it needs to, to get the variables we asked for. This means it may not calculate everything in the graph.

So why does TensorFlow follow this model? It seems overly complicated for performing numerical calculations. The reason is that there are algorithms to partition graphs into independent components that can be calculated in parallel. TensorFlow can then delegate separate parts of the graph to separate GPUs to perform the calculation and then combine the results. In this example this power isn’t needed, but once you are calculating a very complicated large Neural Network then this becomes a real selling point. And since TensorFlow is a general tool, you can use it to do any calculation you wish on a set of GPUs.
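If you want to control the placement yourself, TensorFlow lets you pin parts of the graph to specific devices with tf.device. A minimal sketch (which assumes the machine actually has a GPU; allow_soft_placement falls back to the CPU if it doesn’t):

import tensorflow as tf

with tf.device('/cpu:0'):
    a = tf.constant([1.0, 2.0, 3.0], name="a")
with tf.device('/gpu:0'):
    b = a * 2.0          # this node is assigned to the GPU if one is available

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    print(sess.run(b))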

TensorFlow’s New LinearRegressor Estimator

Google has been trying to turn TensorFlow into a platform for all sorts of Machine Learning algorithms, not just Neural Networks. They have added estimators for Random Forests and for Linear Regression. However they did this by using the optimizers they created for Neural Nets rather than using the standard algorithms used in other libraries, like those implemented in SciKit Learn. The reasoning behind this is that they have a lot of support for really, really big models with lots of support for one-hot encoding, sparse matrices and so on. However the algorithms that solve the problem seem to be exceedingly slow and resource hungry. Anything implemented in TensorFlow will run on a GPU, and similarly any Machine Learning algorithm can be implemented in TensorFlow. The goal here is to have TensorFlow running on the Google AI Cloud where all the virtual machines have Google-designed, GPU-like AI accelerator hardware. But I think unless they implement the standard algorithms, so they can solve things like a simple least squares regression quickly and accurately, its usefulness will be limited.

Here is how you solve our fire versus theft linear regression this way in TensorFlow:

 

# Fit the fire/theft data with the high level LinearRegressor estimator
# (x and y are the columns loaded from the spreadsheet in the last post).
features = [tf.contrib.layers.real_valued_column("x", dimension=1)]
estimator = tf.contrib.learn.LinearRegressor(feature_columns=features,
     model_dir='./linear_estimator')
# Input builders
input_fn = tf.contrib.learn.io.numpy_input_fn({"x":x}, y,
     num_epochs=10000)

estimator.fit(input_fn=input_fn, steps=2000)

# Read the fitted slope and intercept back out of the trained estimator
mm = estimator.get_variable_value('linear/x/weight')
bb = estimator.get_variable_value('linear/bias_weight')
print(mm, bb)

 

This solves the problem and returns a slope of 1.50674927 and intercept of 13.47268105 (the correct numbers from last post are 1.31345600492 and 16.9951572327). By increasing the steps in the fit statement I can get closer to the correct answer, but it is very time consuming.

The documentation for these new estimators is very limited, so I’m not 100% sure it’s solving least squares, but I tried getting the L1 solution using SciKit Learn and it was very close to least squares. So whatever this new estimator is estimating (which might be least squares), it is very slow and quite inaccurate. It is also strange that we now have a couple of tunable parameters added that make a fairly simple calculation problematic. The graph for this solution isn’t too bad, but since we know the exact solution it is still a bit disappointing.
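For what it’s worth, a quick independent cross check is easy with SciKit Learn’s ordinary least squares (this is just a plain least squares fit for reference, not the L1 comparison mentioned above; x and y are the fire and theft columns from the last post):

from sklearn.linear_model import LinearRegression

reg = LinearRegression().fit(x.reshape(-1, 1), y)
print(reg.coef_[0], reg.intercept_)   # comes out at 1.3134... and 16.995...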

Incidentally I was planning to compare the new TensorFlow RandomForest estimator to the Scikit Learn implementation. Although the SciKit Learn one is quite fast, it uses a huge amount of memory so I kind of would like a better solution. But when I compared the two I found the TensorFlow one so bad (both slow and resource intensive) that I didn’t bother blogging it. I hope that by the time this solution becomes more mainstream in TensorFlow it improves a lot.

Summary

TensorFlow is a very powerful engine for performing calculations that can be automatically parallelized and distributed over multiple GPUs for amazing computational speeds. This really does make it possible to spend a few thousand dollars and build quite a powerful supercomputer.

The downside is that Google appears to have the hammer of their neural network optimizers that they really want to use. As a result they are treating everything else as a nail and hitting it with this hammer. The results are quite sub-optimal. I think they do need to spend the time to implement a few of the standard non-Neural Network algorithms properly in TensorFlow if they really want to unleash the power of this platform.

Written by smist08

August 8, 2017 at 10:09 pm

Dangers of Tunable Parameters in TensorFlow


Introduction

One of the great benefits of the Internet era has been the democratization of knowledge. A great contributor to this is the number of great Universities releasing a large number of high quality online courses that anyone can access for free. I was going through one of these, namely Stanford’s CS 20SI: Tensorflow for Deep Learning Research, and playing with TensorFlow to follow along. This is an excellent course and the course notes could be put together into a nice book on TensorFlow. I was going through “Lecture note 3: Linear and Logistic Regression in TensorFlow”, which starts with a simple example of using TensorFlow to perform a linear regression. This example demonstrates how to use TensorFlow to solve this problem iteratively using Gradient Descent. This approach will then be turned to much harder problems where it is necessary; however, for linear regression we can actually solve the problem exactly. I did this and got very different results from the lesson. So I investigated and figured I’d blog a bit on why this is the case as well as provide some code for different approaches to this problem. Note that a lot of the code in this article comes directly from the Stanford course notes.

The Example Problem

The sample data they used was fire and theft data in Chicago to see if there is a relation between the number of fires in a neighborhood and the number of thefts. The data is available here. If we download the Excel version of the file then we can read it with the Python xlrd package.

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import xlrd

DATA_FILE = "data/fire_theft.xls"

# Step 1: read in data from the .xls file
book = xlrd.open_workbook(DATA_FILE, encoding_override="utf-8")
sheet = book.sheet_by_index(0)
data = np.asarray([sheet.row_values(i) for i in range(1, sheet.nrows)])
n_samples = sheet.nrows - 1

With the data loaded in we can now try linear regression on it.

Solving With Gradient Descent

This is the code from the course notes which solves the problem by minimizing the loss function, which is defined as the square of the difference (i.e. least squares). I’ve blogged a bit about using TensorFlow this way in my Road to TensorFlow series of posts like this one. It uses the GradientDescentOptimizer and iterates through the data a few times to arrive at a solution.

# Step 2: create placeholders for input X (number of fire) and label Y (number of theft)
X = tf.placeholder(tf.float32, name="X")
Y = tf.placeholder(tf.float32, name="Y")

# Step 3: create weight and bias, initialized to 0
w = tf.Variable(0.0, name="weights")
b = tf.Variable(0.0, name="bias")

# Step 4: construct model to predict Y (number of theft) from the number of fire
Y_predicted = X * w + b

# Step 5: use the square error as the loss function
loss = tf.square(Y - Y_predicted, name="loss")

# Step 6: using gradient descent with learning rate of 0.001 to minimize loss
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss)

with tf.Session() as sess:

    # Step 7: initialize the necessary variables, in this case, w and b
    sess.run(tf.global_variables_initializer())

    # Step 8: train the model
    for i in range(100): # run 100 epochs
        for xx, yy in data:

            # Session runs train_op to minimize loss
            sess.run(optimizer, feed_dict={X: xx, Y:yy})

    # Step 9: output the values of w and b
    w_value, b_value = sess.run([w, b])

Running this results in w (the slope) as 1.71838 and b (the intercept) as 15.7892.

Solving Exactly with TensorFlow

We can solve the problem exactly with TensorFlow. You can find the formula for this here, or a complete derivation of the formula here.
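For reference, the formula the code below implements is m = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and b = ȳ − m·x̄, where x̄ and ȳ are the means of the two columns.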

# Now let's calculate the least squares fit exactly using TensorFlow
X = tf.constant(data[:,0], name="X")
Y = tf.constant(data[:,1], name="Y")

Xavg = tf.reduce_mean(X, name="Xavg")
Yavg = tf.reduce_mean(Y, name="Yavg")
num = (X - Xavg) * (Y - Yavg)
denom = (X - Xavg) ** 2
rednum = tf.reduce_sum(num, name="numerator")
reddenom = tf.reduce_sum(denom, name="denominator")
m = rednum / reddenom
b = Yavg - m * Xavg
with tf.Session() as sess:
    writer = tf.summary.FileWriter('./graphs', sess.graph)
    mm, bb = sess.run([m, b])

This results in a slope of 1.31345600492 and intercept of 16.9951572327.

Solving with NumPy

My first thought was that I did something wrong in TensorFlow, so I thought why not just solve it with NumPy. NumPy has a linear algebra subpackage which easily solves this.

# Calculate least squares fit exactly using numpy's linear algebra package.
x = data[:, 0]
y = data[:, 1]
m, c = np.linalg.lstsq(np.vstack([x, np.ones(len(x))]).T, y)[0]

There is a little extra complexity since it handles n dimensions, so you need to reformulate the data from a vector to a matrix for it to be happy. This then returns the same result as the exact TensorFlow calculation, so I guess my code was somewhat correct.

Visualize the Results

You can easily visualize the results with matplotlib.

# Plot the calculated line against the data to see how it looks.
plt.plot(x, y, "o")
plt.plot([0, 40], [bb, mm * 40 + bb], 'k-', lw=2)
plt.show()

This leads to the following pictures. First we have the plot of the bad result from Gradient Descent.

The course instructor looked at this and decided it wasn’t very good (which it isn’t) and that the solution was to fit the data with a parabola instead. The parabola gives a better result as far as the least squares error goes because it nearly goes through the point on the upper right. But I don’t think that leads to a better predictor, because if you remove that one point the picture is completely different. My feeling is that the parabola is already overfitting the problem.

Here is the result with the exact correct solution:

To me this is a better solution because it represents the lower right data better. Looking at this gives much less impetus to replace it with a concave up parabola. The course then looks at some correct solutions, but built on the parabola model rather than a linear model.

What Went Wrong?

So what went wrong with the Gradient Descent solution? My first thought was that it didn’t iterate over the data enough, that 100 epochs just wasn’t sufficient. So I increased the number of iterations but this didn’t greatly improve the result. I know that theoretically Gradient Descent should converge for least squares since the derivatives are easy and well behaved. Next I tried making the learning rate smaller; this improved the result, and then also doing more iterations solved the problem. I found that to get a reasonable result I needed to reduce the learning rate by a factor of 100 to 0.00001 and increase the iterations by a factor of 100 to 10,000. This then took about 5 minutes to solve on my computer, as opposed to the exact solution which was instantaneous.
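Concretely, these were the two changes to the training code above (just a sketch of the adjustments described, replacing the corresponding lines in the earlier program):

# Learning rate reduced by a factor of 100, from 0.001 to 0.00001
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.00001).minimize(loss)

# Epochs increased by a factor of 100, from 100 to 10,000
for i in range(10000):
    for xx, yy in data:
        sess.run(optimizer, feed_dict={X: xx, Y: yy})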

The lesson here is that too high a learning rate leads to the result circling the solution without being able to converge to it. Once the learning rate is made that small, it takes a long time for the solution to move from the initial guess to the correct answer, which is why we need so many iterations.

This highlights why many algorithms build in adaptive learning rates that are higher when the solution is moving quickly and then dynamically reduce to zero in on a solution.
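In TensorFlow that can be as simple as swapping the optimizer in the training code; for example (a sketch with an arbitrary starting rate):

# Adam adapts its effective step size for each weight as training progresses
optimizer = tf.train.AdamOptimizer(learning_rate=0.01).minimize(loss)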

Summary

Most Machine Learning algorithms can’t be double checked by comparing them to the exact solution. But this example highlights how a simple algorithm can return a wrong result, yet a result that is close enough to fool a Stanford researcher and make them (in my opinion) go in a wrong direction. It shows the danger lurking in all these tunable parameters to Machine Learning algorithms: getting things like the learning rate or the number of iterations wrong can lead to quite misleading results.

 

Written by smist08

August 4, 2017 at 6:25 pm

Your New AI Accountant


Introduction

We live in a complex world where we’ve accumulated huge amounts of knowledge. Knowing everything for a profession is getting harder and harder. We have better and better knowledge retrieval programs that let us look up information at our fingertips using natural language queries and Google-like searches. But in fields like medicine and law, sifting through all the results and sorting out what is relevant and important is getting harder and harder. Especially in medicine there is a lot of bogus and misleading information that can lead to disastrous results. This is a prime application area where Artificial Intelligence and Machine Learning are starting to show some real promise. We have applications like IBM’s Watson successfully diagnosing some quite rare conditions that stumped doctors. We have systems like ROSS that provide AI solutions for law firms.

How about AIs supplementing Accountants? Accountants are very busy and in demand. All the baby boomers are retiring now, and far more Accountants are retiring than are being replaced by young people entering the profession. For many businesses getting professional business advice from Accountants is getting to be a major problem. This affects them properly meeting financial reporting requirements, legal regulatory compliance and generally having a firm, complete understanding of how their business is doing. This article is going to look at how AI can help with this problem. We’ll look at the sort of things that AIs can be trained to do to help perform some of these functions. Of course you will still need an Accountant to provide human oversight and a sanity check, but if things are set up correctly to start with, it will save you a lot of time and money.

Interfaces

If you have an AI with Accounting knowledge, how can it help you? In this section we’ll look at a few ways that the AI system could interact with both the employees of the business as well as the Business Applications the business uses, like their Accounting or CRM systems.

Chatbots

Chatbots are becoming more common. Here you either type natural language queries to the AI, or it has a voice recognition component that you can talk to. The query processor is connected to the AI and the AI is then connected to your company’s databases as well as a wealth of professional information on the Internet. These AIs usually have multiple components for voice input, natural language processing, various business areas of expertise, and multiple ways of presenting results.

There have been some notable chatbot failures like Microsoft’s Twitter Chatbot which quickly became a racist asshole. But we are starting to see some more successful implementations like Sage’s Pegg or KLM’s Messenger Bot. Plus the general purpose bots like Alexa, Siri and Allo are getting rather good. There are also some really good toolkits, like Amazon Lex, available to develop chatbots so this becomes easier for more and more developers.

In-program Advice

There have been some terrible examples of in-product advice such as the best forgotten Microsoft Clippy. But with advances in User Centered Design, much less intrusive and more subtle ways of helping users have emerged. Generally these require good content so what they present is actually useful, plus they have to be unobtrusive so they never interfere with someone doing their work unless they want to pay attention to them. Then when they are used they can offer to make changes automatically, provide more information or keep things to a simple one line tip.

If these help technologies are combined with an AI engine then they can monitor what the user is doing and present application and context based help. For instance suggesting that perhaps a different G/L account should be used here for better Financial Reporting. Perhaps suggesting that the sales taxes on an invoice should be different due to some local regulations. Making suggestions on additional items that should be added to an Accounting document.

These technologies allow the system to learn from how a company uses the product and to make more useful suggestions, as well as drawing on industry standards that can be incorporated to assist.

Offline Monitoring

In most larger businesses, the person using the Business Application isn’t the one that needs or can use an Accountant’s advice. Most data entry personnel have to follow corporate procedures and would get fired if they changed what they’ve been told to do, even if it’s wrong. Usually this has to be the CFO or someone senior in the Accounting department. In these cases an AI can monitor what is going on in the business and make recommendations to the right person. Perhaps seeing how G/L Accounts are being used and sending a recommendation for some changes to facilitate better Financial Reporting or regulatory compliance.

Care has to be taken to keep this functionality clear of other unpopular productivity monitoring software that does things like record people’s keystrokes to monitor when they are working and how fast. Generally this functionality has to stick to improving the business rather than be perceived as big brother snitching on everyone.

Summary

Most small business owners consider Accounting as a necessary evil that they are required to do to submit their corporate income tax. They do the minimum required and don’t pay much attention to the results. But as their company grows their Accounting data can give them great insights to how their business is running. Managing Inventory, A/R and A/P make huge differences to a company’s cash flow and profitability. Correctly and proactively handling regulatory compliance can be a huge time saver and huge cost saver in fines and lawsuits.

It used to be that sophisticated programs to handle these things required huge IT departments, millions of dollars invested in software and really was only available to large corporations. With the current advances in AI and Machine Learning, many of these sophisticated functionalities can be integrated into the Business Applications used by all small and medium sized businesses. In fact in a few years this will be a mandatory feature that users expect in all the software they use.

Written by smist08

July 29, 2017 at 8:42 pm

Making Business Applications Intelligent


Introduction

Today Business Applications tend to be rather boring programs which present the user with rather complicated forms that need to be filled in with a lot of detail. Accuracy is paramount and there are a lot of security measures to prevent fraud and theft. Companies need to hire large numbers of people to enter data very repetitively into these forms. With modern User Centered Design these forms have become a bit easier to work with and have progressed quite a bit since the original Business Apps on 3270 terminals connected to IBM Mainframes, but I don’t think anyone really considers these applications fun. Necessary and important yes, but still not many people’s favorite programs.

We’ve been talking a lot about the road to strong AI and we’ve looked at a number of AI tools like TensorFlow, but what about more practical applications that are possible right now? My background is working on ERP software, namely Sage 300/Accpac. In this article I’ll be looking at how we’ll be seeing machine learning/AI algorithms start to be incorporated into standard business applications. A lot of what we will talk about here will be integrated into many applications including things like CRM and Business Analytics.

Many of the ideas I talk about in this article are available today, just not all in the same place. Over the coming years I think we’ll see most of these become standard expected features in every Business Application. Just like we expect modern User Centered Design, tomorrow we will expect intelligent algorithms supporting us behind the scenes in everything we do.

Very High Level Diagram of the Main Components of an Intelligent Business Application

Some Quick Ideas

With Machine Learning and AI algorithms there could be many small improvements made to Business Applications, major changes in the way things work, all the way up to automating many of the processes that we currently perform manually. Often small improvements can make a huge difference to the lives of current users and are the easiest to implement, so I don’t want to ignore these possibilities on the way to pursuing larger, more ambitious, longer term goals. Perhaps these AI applications aren’t as exciting as self-driving cars or real time speech translation, but they will make a huge difference to business productivity and lead to large cost savings for millions of companies. They will provide real business benefit with better accuracy, better productivity and automated business processes that lead to real cost savings and real revenue boosts.

Better Defaulting of Fields

Currently fields tend to be defaulted based on configuration screens set up by administrators. These might change based on an object like a customer or customer group, but tend to be fairly static. An algorithm could watch what a user (or all the users at a company) tends to use and make much more intelligent defaults. These could be based on the context of other fields, time/date, current promotions, even news feed items. If defaults are provided more intelligently, it will save users huge amounts of time in data entry.
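As a toy illustration of the idea (with purely made up field and account names), even a simple frequency count per context goes a long way before you bring in fancier models:

from collections import Counter, defaultdict

class FieldDefaulter:
    def __init__(self):
        # For each context, count which values the users actually picked.
        self.history = defaultdict(Counter)

    def record(self, context, value):
        self.history[context][value] += 1

    def suggest(self, context, fallback=None):
        counts = self.history.get(context)
        return counts.most_common(1)[0][0] if counts else fallback

defaulter = FieldDefaulter()
defaulter.record(("customer_group", "RETAIL"), "GL-4000")
defaulter.record(("customer_group", "RETAIL"), "GL-4000")
defaulter.record(("customer_group", "RETAIL"), "GL-4010")
print(defaulter.suggest(("customer_group", "RETAIL")))   # suggests GL-4000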

Better Auto-Suggestions

Currently auto-suggestions on fields tend to be based on a combination of previous values entered and performing a “Google-like” search on what has been typed so far. Like defaulting, this could be greatly improved by adding more sophisticated algorithms to improve the suggestions. The real Google search already does this, but most “Google-like” searches integrated into Business Apps do not. Like defaulting, having auto-suggestions give better, more intelligent recommendations will greatly improve productivity. Just as Google Search uses all your previous searches, trending topics, social media feeds and many other sources, so could your Business Application.

Fraud Detection

Credit card companies already use AI to scan people’s credit card purchasing patterns as well as the patterns of people using stolen credit cards to flag when they think a credit card has been stolen or compromised. Similarly Business Applications can monitor various company procedures and expenses to detect theft (perhaps strangeness in Inventory Adjustments) or unusual payments. Here there could be regulatory restrictions on what data could be used; for instance HR data is probably protected from being incorporated in this sort of analysis. Currently theft and fraud are a huge cost to businesses and AI could help reduce them. Sometimes just knowing that tools like this are being used can act as a major deterrent.
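As a small sketch of what this can look like (with made up numbers, not a real integration), an off-the-shelf anomaly detector will flag an unusual inventory adjustment:

import numpy as np
from sklearn.ensemble import IsolationForest

adjustments = np.array([[5.0], [3.0], [4.0], [6.0], [250.0], [4.5]])
detector = IsolationForest(contamination=0.2, random_state=0).fit(adjustments)
print(detector.predict(adjustments))   # -1 marks suspected anomalies, like the 250.0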

Purchasing

Algorithms could be used to better detect when items are needed, helping reduce inventory levels. Further, the algorithms can continuously search vendor prices looking for deals and consider whether it’s worth buying now at a cheaper price and incurring the inventory expense, or waiting. When you regularly purchase thousands or more items, a dynamic algorithm keeping track of things can really help.

Customer Data

When you get a new customer you need all sorts of information such as their address, phone number, contacts, etc. Perhaps an algorithm could also search the web and fill in this information automatically (perhaps this is a specific example of better defaulting). Plus the AI could scan various web sources (some perhaps pay services for credit ratings and such) to suggest a good credit rating/limit for this new customer. The algorithm could also run in the background and update existing customers as this data changes, since keeping customer data up to date is a major challenge for companies with many customers. Knowing and keeping up to date with your customers is a major challenge for many companies and much of this work can be automated.

Chasing Accounts Receivables

Collecting money is always a major challenge for every company. Much of this work could be automated. Plus algorithms can watch the paying habits of customers to know if, say, they always pay at the end of the quarter, and not to worry so much when they go over 30 days. But if a customer suddenly gets credit rating problems or their stock tanks or there is negative news on the company then you better get collecting. Again this is all a lot of work and algorithms can greatly reduce the manual workload and make the whole process more efficient.

Setting Prices

Setting prices is an art and a science. You need to lower prices to move slow moving items out of inventory and try to keep prices high to maximize return. You need to be aware of competitors’ prices and watch for these items going on sale. Algorithms can greatly help with this. Amazon is a master of this, maintaining millions of prices with AI all over their web site. Algorithms can scan the web for competitive pricing, watch inventory levels and item costs, and know where we are in a quarter and how much we need to stimulate sales to meet targets. These algorithms can make all the trade-offs of knowing our customer loyalty versus having to be low cost, etc. Similarly this can affect customer and volume discounts. Once you have a lot of items for sale, maintaining prices is a lot of work, especially in the world of online shopping where everything is changing so dynamically. With the big guys like Amazon and Walmart using these algorithms so effectively, you need to as well to be competitive.

Summary

This article just gave a few examples of the many places we’ll be seeing AI and Machine Learning algorithms becoming integrated into all our Business Applications. The examples in this article are all possible today and in use individually by large corporations. The cost of all these technologies is coming down and we are seeing these become integrated into lower cost Business Applications for small and medium sized businesses.

As these become adopted by more and more companies, it will become a competitive necessity to adopt them or risk becoming uncompetitive in the fast paced online world. There will still be a human element to monitor and provide policies, but humans can’t perform many of these tasks at the speed and scale that today’s world requires.

For the users of Business Applications, the addition of AI to the user interactions should make these applications much more pleasant to operate. Instead of gotchas there will be helpful suggestions and reminders. Instead of needing to memorize and look up all sorts of codes, these will be usefully provided wherever necessary. I think this transition will be as big as the transition we made from text based applications to GUI applications, but in this case I think the real ROI will be much higher.

 

Written by smist08

July 26, 2017 at 2:03 am

Learning in Brains and Computers


Introduction

In the last couple of articles we were considering whether the brain is a computer and then what its operating system looks like. In this article we’ll be looking at how the brain learns and comparing that to how learning in a modern AI system works. As we noted before, our DNA doesn’t contain a lot of seed data for the brain; nearly everything we know needs to be learned. This tends to be why as animals become more and more advanced, their childhoods become longer and longer. Besides growing to our full size, we also require that time to learn what we will need to survive on our own as adults once we leave our parents. Similarly AI systems start without any knowledge, just a seed of random data; then we train them so that they can do the job we desire, like driving a car. However, how we train an AI system is quite different from how we train a child, though there are similarities.

How the Brain Learns

For our purposes we are looking at what happens at the neuron level during learning rather than considering higher level theories on educational methods. As we are trained two things happen: one is that when something is reinforced, the neural connections are strengthened. Similarly if a pathway isn’t used then it weakens over time. This is controlled by the number of chemical transmitters at the junctions between neurons. The other thing that happens is that the neurons grow new connections. To some degree the brain is always re-wiring itself by growing new connections. As mentioned before, thousands of neurons die each day and not all of them are replaced, so as we age we have fewer neurons, but this is counterbalanced by a lifetime of learning where we have continuously grown new neural connections. So as we age perhaps we have fewer neurons, but far more neural connections. This is partly why staying mentally active and pursuing lifetime learning is so important to maintaining mental health into older age.

Interestingly this is also how memory works. This same neural strength adjustment and connection growth is how we encode memories. The system is a bit more complex since we have a short term memory system from which some data is later encoded into long term memory, but the basic mechanisms are the same. This is why we forget things: if we don’t access a memory then the neural connection will weaken over time and eventually the memory will be forgotten.

A further feature of biological learning is how the feedback loop works. We get information through our senses and can use that for learning, but it’s been shown that the learning is much more effective if it leads to action and then the action provides feedback. For instance if you are shown a picture of a dog and told it’s a dog, this is far less effective than being provided a dog that you can interact with, by touching and petting. It appears that having exploratory action attached to learning is far more effective in how we learn, especially at young ages. We say this is the input – learn – action loop with feedback rather than just the input – learn loop with feedback.

How AIs Learn

Let’s look specifically at Neural Networks, which have a lot of similarities with the brain. In this case we represent all the connections between neurons as weights in a matrix where zero represents no connection and a non-zero weight represents a connection that we can strengthen by making larger or weaken by making smaller.

To train a Neural Network we need a set of data where we know the answers. Suppose we want to train a Neural Network to recognize handwritten numbers. What we need is a large database of images of handwritten numbers along with the number each image represents. We then train the Neural Network by seeding it with random weights, feeding each image through the network and comparing how it does to the correct answer. We have sophisticated algorithms like Stochastic Gradient Descent that adjust the weights in the matrix to produce better results. If we do this enough then we can get very good results from our Neural Network. We often apply some other adjustments, such as setting small weights to zero so they really don’t represent a connection, or penalizing large weights since these lead to overfitting.
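Here is a stripped down, self-contained sketch of that training loop, using a toy linear model in place of a full image network so it fits in a few lines:

import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(100, 3)                      # 100 tiny "images" with 3 features each
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.randn(100)     # the known correct answers

w = rng.randn(3)                          # seed with random weights
learning_rate = 0.1
for epoch in range(200):
    for xi, yi in zip(X, y):              # stochastic: one example at a time
        error = xi @ w - yi
        w -= learning_rate * error * xi   # step downhill on the squared error
print(w)                                  # ends up close to true_w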

This may seem like a lot of work, and it is, but it can be done in a few hours or days on a fast modern computer, using GPUs if necessary to speed things up. This relies on the fact that we can adjust weights instantly since they are just floating point numbers in a matrix, unlike the brain, which needs to make structural changes to its neurons.

A Comparison

To effectively train a Neural Network to recognize handwritten decimal digits (0-9) requires a training database of around 100,000 images. One of the reasons AI has become so successful in recent years has been the creation of many such huge databases that can be used for training.

Although it might feel like it to a parent, it doesn’t require showing a toddler 100,000 images for them to learn their basic numbers. What it does take is more time and a certain amount of repetition. Also the effectiveness is increased if the child can handle the digits (like with blocks) or draw the digits with crayons.

It does take longer to train a toddler than an AI, but this is largely because growing neural connections is a slower process than executing an algorithm on a fast computer which doesn’t have any other distractions. But the toddler will quickly become more effective at performing the task than the AI.

Comparing learning to recognize digits like this may not be accurate, since in the case of the toddler, they are first learning to distinguish objects in their visual field and then recognize objects when they are rotated and seen from different angles. So the input into learning digits for a brain probably isn’t a set of pixels directly off the optic nerve. The brain will already have applied a number of algorithms it learned previously to present a higher level representation of the digit before being asked to identify each digit. In the same way perhaps our AI algorithm for identifying digits in isolation from pixelated images is useful for AI applications, but isn’t useful on the road to true intelligence, and perhaps we shouldn’t be using these algorithms in so much isolation. We won’t start approaching strong AI till we get many more of these systems working together. For instance, for self driving cars, the system has to break a scene up into separate objects before trying to identify them; creating such a system requires several Neural Networks working together to do this work.

Is AI Learning Wrong?

It would appear that the learning algorithm used by the toddler is far superior to the learning algorithm used in the computer. The toddler learns quite quickly based on just a few examples and the quality of the result often beats the quality of a Neural Network. The algorithms used in AI like Stochastic Gradient Descent tend to be very brute force: find new values of the weights that reduce the error and then keep iterating, reducing the error till you get a good enough result. If you don’t get a good enough result then fiddle with the model and try again (we now have meta-algorithms to fiddle with the model for us as well). But is this really correct? It is certainly effective, but seems to lack elegance. It also doesn’t seem to work in as varied circumstances as biological learning does. Is there a more elegant and efficient learning algorithm that is just waiting to be discovered?

Some argue that a passive AI will never work, that the AI needs a way to manipulate its world in order to add that action feedback loop to the learning process. This could well be the case. After all we are training our AI to recognize a bunch of pixels all out of context and independently. If you add the action feedback then you can handle and manipulate a digit to see it from different angles and orientations. Doing this you get far more benefit from each individual training case rather than just relying on brute force and millions of separate samples.

Summary

There are a lot of similarities in how the brain learns versus how we train AIs, but there are also a lot of fundamental differences. AIs rely much more on brute force and volume of training data, whereas the brain requires fewer examples but can make much more out of each one. For AI to advance we really need to be building systems of multiple Neural Networks rather than focusing so much on individual applications. We are seeing this start to take shape in applications like self-driving cars. In AIs we also need to provide a way to manipulate their environment, even if this just means manipulating the images that they are provided as training data and incorporating that manipulation into the training algorithms to make them much more effective and not so reliant on big data volume. I also think that biological brains are hiding some algorithmic tricks that we still need to learn, and that these learning improvements will let progress advance in leaps and bounds.

 

Written by smist08

June 23, 2017 at 6:49 pm

The Brain’s Operating System


Introduction

Last time we posited that the human brain and in fact any biological brain is really a type of computer. If that is so, then how is it programmed? What is its operating system? What are the algorithms that facilitate learning and intelligent action? In this article we’ll start to look at some of the properties of this Biological operating system and how it compares to a modern computer operating system like Windows.

Installation

You can install Windows from a DVD that contains about 3.8 gigabytes of information. You install this on a computer that required a huge specification for its construction, including the microprocessor, BIOS, memory and all the other components. So the amount of information required for the combined computer and operating system is far greater than 3.8 gigabytes.

The human genome, which is like the DVD for a human, is used to construct both the physical human and provide any initial programming for the brain. The human genome contains only 3.2 gigabytes of information. So most of this information is used to build the heart, liver, kidneys, legs, eyes and ears as well as the brain, plus any initial information to store in the brain.

Compared to modern computer operating systems, video games and ERP systems, this is an amazingly compact specification for something as complex as a human body.

This is partly why higher mammals like humans require so much learning as children. The amount of initial information we are born with is very small and limited to the basest survival requirements like knowing how to breathe and eat. Plus perhaps a few primal reflexes like a fear of snakes.

Reliability

Windows runs on very reliable hardware and yet still crashes. The equivalent of a Blue Screen of Death (BSOD) in a human or animal would likely be fatal in the wild. Sure in diseases like epilepsy, an epileptic fit could be considered a biological BSOD, but still in most healthy humans this doesn’t happen. You could also argue that if you reboot your computer every night, your chance of a BSOD is much lower and similarly if a human sleeps every night then they work much more reliably than if they don’t.

The brain has a further challenge in that neural cells are dying all the time. It’s estimated that about 9000 neurons die a day. Yet the brain keeps functioning quite well in spite of this. It was originally thought that we were born with all our neurons and that after that they just died off, but more modern research has shown that new neuron cells are in fact produced, though not uniformly in the brain. Imagine how well Windows would run if 9000 transistors stopped working in your CPU every day and a few random new transistors were added every now and then. How many BSODs would this cause? It’s a huge testament to the brain’s operating system that it can run so reliably for so long under these conditions.

It is believed that the algorithms must be quite simple to operate under these conditions and able to easily adapt to changing configurations. Interestingly, in Neural Networks it’s found that removing and adding neurons during the training process reduces overfitting and produces better results, even in a reliable system. We talked about this “dropout” in this article about TensorFlow. So to some degree perhaps the “unreliability” actually led to better intelligence in biological systems.
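As a reminder of what that looks like in practice, here is a minimal TensorFlow 1.x style sketch of dropout, where hidden units are randomly “killed” during training much like unreliable neurons:

import tensorflow as tf

inputs = tf.placeholder(tf.float32, [None, 784])       # e.g. flattened images
hidden = tf.layers.dense(inputs, 128, activation=tf.nn.relu)
hidden = tf.nn.dropout(hidden, keep_prob=0.5)          # randomly drop half the units while training
logits = tf.layers.dense(hidden, 10)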

Parallelism

Most modern computers are roughly based on the von Neumann architecture, and if they do support parallelism it’s through multiple von Neumann architecture computers synchronized together (i.e. with multiple cores). The neurons in the brain don’t follow this architecture and operate much more independently and in parallel than the logic gates in a computer do. This is more to do with the skill of the programmer than a constraint on how computer hardware is put together. As a programmer it’s hard enough to program and debug a computer that executes one instruction at a time in a simple linear fashion with simple flow of control statements. Programming a system where everything happens at once is beyond my ability as a programmer. Part of this comes down to economics. Biology programmed the brain’s algorithms over millions of years using trial and error driven by natural selection. I’m required to produce a new program in usually less than one year (now three months in Internet time). Obviously if I put in a project plan to produce a new highly parallel super ERP system in even ten years, it would be rejected as too long, never mind taking a million years.

The brain does have synchronization mechanisms. It isn’t totally an uncontrolled environment. Usually as we study biological systems, at first they just look mushy, but as we study in more detail we find there is an elegant design to them and that they do tend to build on modular building blocks. With the brain we are just starting to understand all these building blocks and how they fit together to make the whole.

In studying more primitive nervous systems (without brains), there are two typical simple neural connections: one is from a sensor connecting directly to a single action, and the other is from a sensor connected to a coordinator that does something like activate two flippers to swim straight. These simple coordinators are the start of the brain’s synchronization mechanisms that lead to much more sophisticated coordinated behaviour.

Recurrent and Memory

The brain is also recurrent: it feeds its output back in as input, iterating in a sense to reach a solution. There is also memory, though the brain’s memory is a bit different than a computer’s. Computers have a memory bank that is separate from the executing program. The executing program accesses the memory but these are two separate systems. In the brain there is only one system and the neurons act as both logic gates (instruction executors) and memory. To some degree you can do this in a computer, but it’s considered bad programming practice. As programmers we are trained to never hard code data in our programs and this will be pointed out to us at any code review. However the brain does this as a matter of course. But the brain can dynamically change this data whenever it likes, unlike our computer programs that need a programmer to be assigned to do a maintenance update.

This leads to another fundamental difference between computers and humans: computers we update by re-installing software. The brain is never reinstalled; what you get initially is what you live with. It can learn and adapt, but never be reinstalled. This eliminates one of tech support’s main solutions to problems, namely reinstalling the operating system. The other tech support solution, turning the computer off and on again, is quite difficult with biological brains and involves a defibrillator.

Summary

This article was an initial look at the similarities and differences between the brain and a modern computer. We looked at a few properties of the brain’s operating system and how they compare to something like Windows. Next time we’ll start to look at how the brain is programmed, namely how we learn.

 

Written by smist08

June 16, 2017 at 6:52 pm