Stephen Smith's Blog

Musings on Machine Learning…

Playing with Image Style Transfer


Introduction

Last time we introduced Image Style Transfer, an AI algorithm that combines the contents of one image with the style of another. In this article we are going to look at some more advanced ways to play with the process. We'll be using Anish Athalye's implementation, which is on GitHub here; this implementation is really good at allowing lots of tuning and experimentation.

Playing around this way is quite time consuming, since you have to run Gradient Descent to find each solution rather than just applying a canned style. Since I ran all of these on an older MacBook Air with no GPU, I had to use a lower resolution in the interest of time. At low resolution (macOS's "small" size) it took about an hour per image; at medium resolution it took about six hours. That is fine for running overnight, but doesn't allow a lot of play. It makes me wonder if I should get a beefy desktop computer with a good NVidia GPU.

I found a really good YouTube video explaining Image Style Transfer here, which is well worth a watch.

Playing with Algorithms

We’ve seen in previous articles how we can play with the tunable parameters in AI algorithms to get quite different results. Here we’ll look at the effects of playing with some parameters as well as fiddling with the algorithm itself.

The basic observation that led to Image Style Transfer was that a deep image recognition neural network separates features related to content from features related to style: the network's deeper layers capture the content of an image at a more abstract level, while correlations between features across its layers capture the style. Interestingly, the human brain's image recognition neurons appear to be structured in a similar way, and it is believed there is a fair bit of similarity between how an advanced image recognition algorithm works and how the brain works. This separation of content from style is the basis for merging and manipulating the two.

The Image Style Transfer algorithm works by starting with an image of white noise and then iterating it using gradient descent to minimize the difference between its content and the content of the first image, and between its style and the style of the second. This difference is the loss function we often talk about in AI. The interesting part of the algorithm is that we aren't training the neural network's matrix weights, since those were pre-trained by the VGG group; instead we are training the input image itself. So we have a loss function like:

Total Loss = Loss of content from first image + Loss of style from the second image

We can then play with this Loss function in various ways which we’ll experiment with in the rest of this article.
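To make this concrete, here's a rough sketch of what this looks like in TensorFlow 1.x style code. Note the content and style losses here are crude stand-ins (the real ones compare VGG feature maps and Gram matrices), and the weight values are just assumptions for illustration; we'll play with the weights properly below.

import tensorflow as tf

# The image we are training, initialized to white noise.
image = tf.Variable(tf.random_normal([1, 256, 256, 3]))
content_image = tf.placeholder(tf.float32, [1, 256, 256, 3])
style_image = tf.placeholder(tf.float32, [1, 256, 256, 3])

# Stand-ins for the real losses, which compare VGG feature maps
# (content) and Gram matrices of feature maps (style).
content_loss = tf.reduce_sum((image - content_image) ** 2)
style_loss = tf.reduce_sum((image - style_image) ** 2)

content_weight = 5.0    # the default content weight mentioned below
style_weight = 100.0    # an assumed value, purely for illustration

total_loss = content_weight * content_loss + style_weight * style_loss

# The key point: we minimize with respect to the *image*,
# not the network weights.
train_step = tf.train.AdamOptimizer(0.1).minimize(total_loss, var_list=[image])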

Apply Some Weights

Usually in Machine Learning algorithms we apply weights everywhere we can, and use them for tuning. The same applies here. We can weight the contributions from content versus style in the total loss formula to achieve more of a contribution from one or the other.

First we take a picture of Tetrahedron Peak and combine it with Vincent van Gogh’s Starry Night using the default settings of the algorithm:

Now we can try playing with the weight of the content contribution. Lower means more style, higher means more content. In the image above the content weight was the default of 5.

Notice the image on the left is much more abstract with the large stars appearing all over.

Using Multiple Styles

Last time we used one style at a time to get our result. But you can actually use the algorithm to incorporate multiple styles at once. In this case we just generalize the Loss function above to:

Total Loss = Loss of content from first image + Loss of style from style image 1 +
                 Loss of style from style image 2

Of course we can then further generalize this to any number of style images.
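As a sketch of the bookkeeping involved (plain Python with made-up loss values, since the real per-style losses come from the VGG Gram matrices):

# Hypothetical per-style losses; in practice these are computed from
# the VGG Gram matrices for each style image.
style_loss_starry_night = 2.0e6
style_loss_dora_maar = 3.5e6
content_loss = 1.0e6
content_weight = 5.0
style_weight = 100.0

# Blend weights: here 75% of one style and 25% of the other.
blend_weights = [0.75, 0.25]
style_losses = [style_loss_starry_night, style_loss_dora_maar]

total_style_loss = sum(w * l for w, l in zip(blend_weights, style_losses))
total_loss = content_weight * content_loss + style_weight * total_style_loss

Setting both blend weights to 0.5 gives the equal mix we'll try first below.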

We’ll use our Starry Night combination and also use Picasso’s Dora Maar:

Now we will use both pictures for the style and see what we get:

This weights the styles of Starry Night and Dora Maar equally. However, you can see from the Loss formula that we can easily weight the components and get, say, 75% Starry Night and 25% Dora Maar:


Now if we reverse the weights and do Starry Night at 25% and Dora Maar at 75%:

Playing with the Neural Network

We can also play with the Neural Network used. We can change a number of parameters in the Neural Network, as well as introduce various scaling and weight factors.

Pooling Type

For instance, there is something called a Pooling Layer in the network. Pooling layers reduce the resolution of the image and help move from fine level details to higher level abstractions. There are two commonly used types of pooling layer, namely average pooling and max pooling. We can try either of these to see what effect it has on the image style transfer.
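To see the difference between the two, here's a tiny TensorFlow example using the standard pooling ops (the implementation we're using lets you choose between the two variants):

import tensorflow as tf

# A single-channel 4x4 "image" to pool over.
x = tf.constant([[[[1.], [2.], [3.], [4.]],
                  [[5.], [6.], [7.], [8.]],
                  [[9.], [10.], [11.], [12.]],
                  [[13.], [14.], [15.], [16.]]]])

# 2x2 pooling with stride 2 halves the resolution.
max_pooled = tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                            strides=[1, 2, 2, 1], padding='SAME')
avg_pooled = tf.nn.avg_pool(x, ksize=[1, 2, 2, 1],
                            strides=[1, 2, 2, 1], padding='SAME')

with tf.Session() as sess:
    print(sess.run(max_pooled))  # 6, 8, 14, 16: keeps the strongest response
    print(sess.run(avg_pooled))  # 3.5, 5.5, 11.5, 13.5: smooths the responses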

Here we see that average pooling favoured fine details and preserved more of the content image, whereas max pooling used more of the style image and is a bit more abstract.

Exponential Style Layer Weight

Another thing we can do is magnify some layers over others. For instance, we can magnify each style layer relative to the previous one as follows:

weight(layer<n+1>) = weight_exp*weight(layer<n>)

The default is 1 (i.e. no magnification). Here is Tetrahedron Peak using 0.2 and 2.0.

A factor less than one means more original content, since some style layers are suppressed, while a factor greater than one magnifies some style layer contributions. Since the style layers aren't all weighted the same, this is a bit different than just changing the weighting factor between content and style.
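A sketch of how such a weight schedule might be computed (the layer names follow the VGG layers typically used for style; the normalization is my assumption about keeping the total style contribution comparable across runs):

# Style layers, named after the VGG layers typically used for style.
style_layers = ['relu1_1', 'relu2_1', 'relu3_1', 'relu4_1', 'relu5_1']

def layer_weights(weight_exp):
    """weight(layer n+1) = weight_exp * weight(layer n), normalized to sum to 1."""
    raw = [weight_exp ** i for i in range(len(style_layers))]
    total = sum(raw)
    return {layer: w / total for layer, w in zip(style_layers, raw)}

print(layer_weights(0.2))  # earlier (finer detail) layers dominate
print(layer_weights(2.0))  # later (more abstract) layers dominate
print(layer_weights(1.0))  # the default: all style layers equal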

Iterations

Another parameter that is fun to play with is the number of iterations that Gradient Descent runs for. Below we can see a sequence of images as the number of iterations is increased. We can see the content and style of the image forming out of the initial white noise.

At this resolution we are pretty much converged at 500 iterations; for higher resolutions and more complicated images, more iterations might be necessary. We could also use a stopping criterion, such as stopping when the loss function changes by less than some delta, rather than running a fixed number of iterations.
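A convergence based stopping test might look something like this sketch, where step_fn is a hypothetical helper that performs one gradient descent step and returns the loss (here exercised with a toy decaying loss):

def run_until_converged(step_fn, tolerance=1e-3, max_iterations=2000):
    """Iterate until the loss's relative change drops below tolerance."""
    previous_loss = float('inf')
    for i in range(max_iterations):
        loss = step_fn()
        if abs(previous_loss - loss) < tolerance * previous_loss:
            return i, loss  # converged
        previous_loss = loss
    return max_iterations, previous_loss

# Exercise it with a toy loss that decays like 1/n.
losses = (100.0 / (n + 1) for n in range(100000))
print(run_until_converged(lambda: next(losses)))  # stops around iteration 1000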

This problem converges quite well since it is mathematically well defined. Often in AI we don't get such good behaviour, because the training data has lots of errors and/or lots of noise. Here we are just training against a content picture and one or more style pictures, so by definition there isn't any erroneous data. Those challenges were faced and solved by the team that developed the VGG image recognition neural network, which we get to use without having to worry about training it.

Summary

As we can see, we can get quite a few different effects by tuning the algorithm while using the same style picture as a reference. Simple tools like Prisma or deepart.io don't let you play with all these parameters. As a photographer trying to achieve a specific effect, you want the power and flexibility to tune your style transfer exactly. Right now the only way to do that is to run the AI algorithms on your own computer and play with them, which is very time consuming. I suspect once this technology is incorporated into more advanced tools, various degrees of tuning will be possible. Adobe has been demonstrating Image Style Transfer in their labs, and it will be interesting to see if they incorporate it into Photoshop, and then how much tuning is possible. Also, if it runs in the Adobe Creative Cloud, it will be interesting to see whether it's quicker running that way than on your own computer.



Written by smist08

August 21, 2017 at 4:29 pm

An Introduction to Image Style Transfer


Introduction

Image Style Transfer is an AI technique that is becoming quite popular for enhancing or stylizing photos. It takes one picture (often a classical painting) and then applies the style of that picture to another picture. For example I could take this photo of the Queen of Surrey passing Hopkins Landing:

Combined with the style of Vincent van Gogh’s Starry Night:

To then feed these through the AI algorithm to get:

In this article, we’ll be look at some of the ways you can accomplish this yourself either through using online services or running your own Neural Network with TensorFlow.

Playing with Image Style Transfer

There are lots of services that let you play with this. Generally, applying a canned style to your own picture is quite fast (a few seconds). Providing your own photo as the style photo is more involved, since it requires "training" the style, and that can take 30 minutes or more.

Probably the most popular program is the Prisma app for either iPhone or Android. This app has a large number of pre-trained styles and can apply any of them to any photo on your phone. It works quite well and gives plenty of variety to play with. Plus it's free. Here is the ferry in Prisma's comic theme:

If you want to provide your own photo as the style reference then deepart.io is a good choice. It is available as a web app as well as an iPhone or Android app. The good part for photographers is that you can copy photos from your good camera to your computer and then use the program's website; no phone required. The site has some pre-programmed styles based on Vincent van Gogh which work really quickly and produce good results. It also lets you upload a style photo of your own. Processing a style is more work and typically takes 25 minutes (you can pay to have it processed quicker, but not that much quicker). If you don't mind the wait, this site is free and works quite well. Here is an example of the ferry picture above van Gogh'ized by deepart.io (sorry, they don't label the styles, so I don't know which painting this is styled from):

Playing More Directly

These programs are great fun, but I like to tinker with things myself on my computer. So can I run these programs myself? Can I get the source code? Fortunately the answer to both is yes. This turns out to be a bit easier than you might first think, largely due to a project out of the Visual Geometry Group (VGG) at the University of Oxford. They created an exceptional image recognition neural network that they trained and won several competitions with. It turns out that the backbone of Image Style Transfer is having a good image recognition Neural Network. This Neural Net is 19 layers deep, and Oxford released the fully trained network for anyone to use. Several people have taken this network, figured out how to load it into TensorFlow, and created some really good Image Style Transfer programs based on it. The first program I played with was Anish Athalye's program posted on GitHub here. This program uses VGG and can train a neural network for a given style picture. Anish has quite a good write up on his blog here.

Then I played with a program by Shafeen Tejani that expands on Anish's; it is on GitHub here along with a blog post here. This program lets you keep the trained network so you can perform the transformation quickly on any picture you like, which is similar to how Prisma works. The example up in the introduction was created this way. To train the network you need a training set of images like the Microsoft COCO collection.

Running these programs isn't for everyone. You have to be used to running Python programs, and you need TensorFlow installed and working on your system, along with a few other dependent Python libraries and, of course, the saved VGG Neural Network. But if you already have Python and TensorFlow, I found both of these programs just ran, and I could play with them quite easily.

The writeups for all these programs highly recommend a good GPU to speed up the calculations. I'm playing on an older MacBook Air with no GPU and was still able to get quite good results. One trick that helps is to experiment at a reduced resolution to speed up the process, then run the algorithm on a higher resolution version once you have things right. I found I couldn't use the full resolution from my DSLR (12 megapixels), but had to use Apple's "large" size (286KB).

Summary

This was a quick introduction to Image Style Transfer. We are seeing it in more and more places, and there are applications that apply the same technique to videos. I expect this will become a standard part of image processing software like Photoshop or GIMP. Or it might remain the domain of specialty programs, as HDR largely has, since it is quite technical and resource intensive. In the meantime, projects like VGG have made this technology quite accessible for anyone to play with.

Written by smist08

August 14, 2017 at 6:48 pm

A Crack in the TensorFlow Platform


Introduction

Last time we looked at how some tunable parameters threw off a TensorFlow solution of a linear regression problem. This time we are going to look at a few more topics around TensorFlow and linear regression, and then at how Google is implementing Linear Regression and some problems with their approach.

TensorFlow Graphs

Last time we looked at calculating the solution to a linear regression problem directly using TensorFlow. That bit of code was:

# Now let's calculate the least squares fit exactly using TensorFlow
X = tf.constant(data[:,0], name="X")
Y = tf.constant(data[:,1], name="Y")

Xavg = tf.reduce_mean(X, name="Xavg")
Yavg = tf.reduce_mean(Y, name="Yavg")
num = (X - Xavg) * (Y - Yavg)
denom = (X - Xavg) ** 2
rednum = tf.reduce_sum(num, name="numerator")
reddenom = tf.reduce_sum(denom, name="denominator")
m = rednum / reddenom    # slope of the least squares fit
b = Yavg - m * Xavg      # intercept follows from the means and the slope
with tf.Session() as sess:
    writer = tf.summary.FileWriter('./graphs', sess.graph)
    mm, bb = sess.run([m, b])


TensorFlow does all its calculations based on a graph where the various operators and constants are nodes that then get connected together to show dependencies. We can use TensorBoard to show the graph for the snippet of code we just reviewed here:

Notice that TensorFlow overloads the standard Python numerical operators, so in a line of code like "denom = (X - Xavg) ** 2", since X and Xavg are Tensors, we actually generate TensorFlow nodes as if we had called things like tf.subtract and tf.pow. This makes the code much easier to write; the only downside is that the overloaded operators don't take a name parameter to label the nodes, so the TensorBoard graph is harder to read.
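For example, that line could be written out with explicit TensorFlow ops when you do want named nodes in TensorBoard; both forms build the same graph (a sketch):

import tensorflow as tf

X = tf.constant([1.0, 2.0, 3.0], name="X")
Xavg = tf.reduce_mean(X, name="Xavg")

# Overloaded operators: concise, but the nodes get auto-generated names.
denom = (X - Xavg) ** 2

# Equivalent explicit ops: more verbose, but TensorBoard shows our names.
diff = tf.subtract(X, Xavg, name="Xdiff")
denom_named = tf.pow(diff, 2, name="denom")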

With TensorFlow you perform calculations in two steps: first you build the graph (everything before the with statement), and then you execute a calculation by specifying what you want. To do this you create a session and call run, specifying the variables we want calculated. TensorFlow then goes through the graph calculating anything it needs to in order to produce the requested values, which means it may not calculate everything in the graph.
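You can see this partial evaluation in a little experiment using tf.Print, which logs a message only when its node actually executes (a sketch):

import tensorflow as tf

a = tf.constant(2.0, name="a")
b = tf.Print(a * 3.0, [a], message="b was computed: ")
c = tf.Print(a + 1.0, [a], message="c was computed: ")

with tf.Session() as sess:
    # Only "c was computed" is logged; the b branch of the graph
    # isn't needed for c, so TensorFlow never evaluates it.
    sess.run(c)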

So why does TensorFlow follow this model? It seems overly complicated for performing numerical calculations. The reason is that there are algorithms to separate a graph into independent components that can be calculated in parallel. TensorFlow can then delegate separate parts of the graph to separate GPUs to perform the calculation and combine the results. In this example that power isn't needed, but once you are calculating a very complicated large Neural Network it becomes a real selling point. And since TensorFlow is a general tool, you can use it to run any calculation you wish on a set of GPUs.

TensorFlow’s New LinearRegressor Estimator

Google has been trying to turn TensorFlow into a platform for all sorts of Machine Learning algorithms, not just Neural Networks. They have added estimators for Random Forests and for Linear Regression. However, they did this using the optimizers they created for Neural Nets, rather than the standard algorithms used in other libraries, like those implemented in SciKit Learn. The reasoning is that they have a lot of support for really big models, with one-hot encoding, sparse matrices and so on. However, the resulting algorithms seem to be exceedingly slow and resource hungry. Anything implemented in TensorFlow will run on a GPU, and similarly any Machine Learning algorithm can be implemented in TensorFlow. The goal here is to have TensorFlow running in the Google AI Cloud, where the virtual machines have Google-designed GPU-like AI accelerator hardware. But I think unless they implement the standard algorithms, so they can solve something like a simple least squares regression quickly and accurately, its usefulness will be limited.

Here is how you solve our fire versus theft linear regression this way in TensorFlow:


# Define a single real valued feature column named "x".
features = [tf.contrib.layers.real_valued_column("x", dimension=1)]
estimator = tf.contrib.learn.LinearRegressor(feature_columns=features,
     model_dir='./linear_estimator')
# Input builder: feeds the x and y NumPy arrays loaded earlier.
input_fn = tf.contrib.learn.io.numpy_input_fn({"x":x}, y,
     num_epochs=10000)

estimator.fit(input_fn=input_fn, steps=2000)

# Pull the fitted slope and intercept back out of the trained model.
mm = estimator.get_variable_value('linear/x/weight')
bb = estimator.get_variable_value('linear/bias_weight')
print(mm, bb)


This solves the problem and returns a slope of 1.50674927 and an intercept of 13.47268105 (the correct numbers from the last post are 1.31345600492 and 16.9951572327). By increasing the steps in the fit statement I can get closer to the correct answer, but it is very time consuming.

The documentation for these new estimators is very limited, so I'm not 100% sure it's solving least squares, but I tried getting the L1 solution using SciKit Learn and it was very close to the least squares one; so whatever this new estimator is estimating (which might be least squares), it is very slow and quite inaccurate. It is also strange that we now have a couple of tunable parameters added that make a fairly simple calculation problematic. The graph for this solution isn't too bad, but since we know the exact solution it is still a bit disappointing.
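For reference, here's roughly how I'd cross-check against SciKit Learn's ordinary least squares (the x and y values below are stand-ins; in the real check they are the fire and theft columns loaded earlier):

import numpy as np
from sklearn.linear_model import LinearRegression

# Stand-in data; the real check uses the fire (x) and theft (y) columns.
x = np.array([6.2, 9.5, 10.5, 7.7, 8.6])
y = np.array([29.0, 44.0, 36.0, 37.0, 53.0])

model = LinearRegression().fit(x.reshape(-1, 1), y)
print(model.coef_[0], model.intercept_)  # should match the exact solution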

Incidentally, I was planning to compare the new TensorFlow RandomForest estimator to the SciKit Learn implementation. Although the SciKit Learn one is quite fast, it uses a huge amount of memory, so I would like a better solution. But when I compared the two, I found the TensorFlow one so bad (both slow and resource intensive) that I didn't bother blogging it. I hope that by the time this becomes more mainstream in TensorFlow it improves a lot.

Summary

TensorFlow is a very powerful engine for performing calculations that can be automatically parallelized and distributed over multiple GPUs for amazing computational speeds. This really does make it possible to spend a few thousand dollars and build quite a powerful supercomputer.

The downside is that Google appears to have the hammer of their neural network optimizers that they really want to use. As a result they are treating everything else as a nail and hitting it with this hammer. The results are quite sub-optimal. I think they do need to spend the time to implement a few of the standard non-Neural Network algorithms properly in TensorFlow if they really want to unleash the power of this platform.

Written by smist08

August 8, 2017 at 10:09 pm

Dangers of Tunable Parameters in TensorFlow


Introduction

One of the great benefits of the Internet era has been the democratization of knowledge, helped along by the many top universities releasing high quality online courses that anyone can access for free. I was going through one of these, namely Stanford's CS 20SI: TensorFlow for Deep Learning Research, and playing with TensorFlow to follow along. This is an excellent course, and the course notes could be put together into a nice book on TensorFlow. I was going through "Lecture note 3: Linear and Logistic Regression in TensorFlow", which starts with a simple example of using TensorFlow to perform a linear regression, solving the problem iteratively with Gradient Descent. That approach is later applied to much harder problems where it is necessary, but for linear regression we can actually solve the problem exactly. I did this and got very different results than the lesson. So I investigated, and figured I'd blog a bit on why that is, as well as provide some code for different approaches to the problem. Note that a lot of the code in this article comes directly from the Stanford course notes.

The Example Problem

The sample data they used was fire and theft data in Chicago, to see if there is a relation between the number of fires in a neighborhood and the number of thefts. The data is available here. If we download the Excel version of the file, we can read it with the Python xlrd package.

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import xlrd

DATA_FILE = "data/fire_theft.xls"

# Step 1: read in data from the .xls file
book = xlrd.open_workbook(DATA_FILE, encoding_override="utf-8")
sheet = book.sheet_by_index(0)
data = np.asarray([sheet.row_values(i) for i in range(1, sheet.nrows)])
n_samples = sheet.nrows - 1

With the data loaded in we can now try linear regression on it.

Solving With Gradient Descent

This is the code from the course notes, which solves the problem by minimizing the loss function, defined as the square of the difference (i.e. least squares). I've blogged a bit about using TensorFlow this way in my Road to TensorFlow series of posts, like this one. It uses the GradientDescentOptimizer and iterates through the data a few times to arrive at a solution.

# Step 2: create placeholders for input X (number of fire) and label Y (number of theft)
X = tf.placeholder(tf.float32, name="X")
Y = tf.placeholder(tf.float32, name="Y")

# Step 3: create weight and bias, initialized to 0
w = tf.Variable(0.0, name="weights")
b = tf.Variable(0.0, name="bias")

# Step 4: construct model to predict Y (number of theft) from the number of fire
Y_predicted = X * w + b

# Step 5: use the square error as the loss function
loss = tf.square(Y - Y_predicted, name="loss")

# Step 6: using gradient descent with learning rate of 0.001 to minimize loss
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss)

with tf.Session() as sess:

    # Step 7: initialize the necessary variables, in this case, w and b
    sess.run(tf.global_variables_initializer())

    # Step 8: train the model
    for i in range(100): # run 100 epochs
        for xx, yy in data:

            # Session runs train_op to minimize loss
            sess.run(optimizer, feed_dict={X: xx, Y:yy})

    # Step 9: output the values of w and b
    w_value, b_value = sess.run([w, b])

Running this results in w (the slope) as 1.71838 and b (the intercept) as 15.7892.

Solving Exactly with TensorFlow

We can solve the problem exactly with TensorFlow. You can find the formula for this here, or a complete derivation of the formula here.

# Now let's calculate the least squares fit exactly using TensorFlow
X = tf.constant(data[:,0], name="X")
Y = tf.constant(data[:,1], name="Y")

Xavg = tf.reduce_mean(X, name="Xavg")
Yavg = tf.reduce_mean(Y, name="Yavg")
num = (X - Xavg) * (Y - Yavg)
denom = (X - Xavg) ** 2
rednum = tf.reduce_sum(num, name="numerator")
reddenom = tf.reduce_sum(denom, name="denominator")
m = rednum / reddenom    # slope of the least squares fit
b = Yavg - m * Xavg      # intercept follows from the means and the slope
with tf.Session() as sess:
    writer = tf.summary.FileWriter('./graphs', sess.graph)
    mm, bb = sess.run([m, b])

This results in a slope of 1.31345600492 and intercept of 16.9951572327.

Solving with NumPy

My first thought was that I did something wrong in TensorFlow, so I thought why not just solve it with NumPy. NumPy has a linear algebra subpackage which easily solves this.

# Calculate least squares fit exactly using numpy's linear algebra package.
x = data[:, 0]
y = data[:, 1]
m, c = np.linalg.lstsq(np.vstack([x, np.ones(len(x))]).T, y)[0]

There is a little extra complexity since lstsq handles n dimensions, so you need to reformulate the data from a vector into a matrix (a design matrix with a column of ones for the intercept) for it to be happy. This returns the same result as the exact TensorFlow calculation, so I guess my code was correct.

Visualize the Results

You can easily visualize the results with matplotlib.

# Plot the calculated line against the data to see how it looks.
plt.plot(x, y, "o")
plt.plot([0, 40], [bb, mm * 40 + bb], 'k-', lw=2)
plt.show()

This leads to the following pictures. First we have the plot of the bad result from Gradient Descent.

The course instructor looked at this, decided it wasn't very good (which it isn't), and concluded that the solution was to fit the data with a parabola instead. The parabola gives a better result as far as least squares error goes, because it nearly passes through the point in the upper right. But I don't think it makes a better predictor, because if you remove that one point the picture is completely different. My feeling is that the parabola is already overfitting the problem.

Here is the result with the exact correct solution:

To me this is a better solution because it represents the lower right data better. Looking at it gives much less impetus to replace it with a concave-up parabola. The course then looks at some correct solutions, but built on the parabola model rather than a linear model.

What Went Wrong?

So what went wrong with the Gradient Descent solution? My first thought was that it hadn't iterated over the data enough, and that 100 epochs wasn't sufficient. So I increased the number of iterations, but this didn't greatly improve the result. I know that theoretically Gradient Descent should converge for least squares, since the derivatives are easy and well behaved. Next I tried making the learning rate smaller; this improved the result, and combined with more iterations it solved the problem. To get a reasonable result I needed to reduce the learning rate by a factor of 100 to 0.00001 and increase the iterations by a factor of 100 to 10,000. This took about 5 minutes to solve on my computer, as opposed to the exact solution, which was instantaneous.

The lesson here is that too high a learning rate causes the result to circle the solution without being able to converge on it. And once the learning rate is made that small, it takes a long time for the solution to move from the initial guess to the correct answer, which is why we need so many iterations.

This highlights why many algorithms build in adaptive learning rates: the rate starts higher so the solution moves quickly, then dynamically decreases to zero in on the answer.
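In TensorFlow 1.x one simple form of this is an exponential decay schedule; here's a sketch of how the optimizer above might be set up with one (the schedule numbers are assumptions for illustration, not tuned values):

import tensorflow as tf

w = tf.Variable(0.0, name="weights")      # toy parameter standing in for w, b
loss = tf.square(1.0 - w, name="toy_loss")

global_step = tf.Variable(0, trainable=False)

# Start at 0.001 and halve the learning rate every 1000 steps.
learning_rate = tf.train.exponential_decay(
    0.001, global_step, decay_steps=1000, decay_rate=0.5, staircase=True)

# Passing global_step makes each training step advance the schedule.
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(
    loss, global_step=global_step)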

Summary

Most Machine Learning algorithms can't be double checked by comparing them to an exact solution. But this example highlights how a simple algorithm can return a wrong result, yet a result close enough to fool a Stanford researcher and make them (in my opinion) go in a wrong direction. It shows the danger in all the tunable parameters of Machine Learning algorithms, and how getting things like the learning rate or the number of iterations wrong can lead to quite misleading results.


Written by smist08

August 4, 2017 at 6:25 pm

Your New AI Accountant


Introduction

We live in a complex world where we've accumulated huge amounts of knowledge. Knowing everything for a profession is getting harder and harder. We have better and better knowledge retrieval programs that let us look up information at our fingertips using natural language queries and Google-like searches. But in fields like medicine and law, sifting through all the results and sorting out what is relevant and important is getting harder and harder. Especially in medicine, there is a lot of bogus and misleading information that can lead to disastrous results. This is a prime application area where Artificial Intelligence and Machine Learning are starting to show some real promise. We have applications like IBM's Watson successfully diagnosing some quite rare conditions that stumped doctors. We have systems like ROSS that provide AI solutions for law firms.

How about AIs supplementing Accountants? Accountants are very busy and in demand. All the baby boomers are retiring now, and far more Accountants are retiring than are being replaced by young people entering the profession. For many businesses, getting professional business advice from Accountants is becoming a major problem. This affects their ability to properly meet financial reporting requirements, comply with legal regulations, and generally have a firm understanding of how their business is doing. This article looks at how AI can help with this problem, and at the sorts of things AIs can be trained to do to perform some of these functions. Of course, you will still need an Accountant to provide human oversight and a sanity check, but if things are set up correctly to start with, it will save you a lot of time and money.

Interfaces

If you have an AI with Accounting knowledge, how can it help you? In this section we'll look at a few ways the AI system could interact with both the employees of the business and the Business Applications the business uses, like their Accounting or CRM systems.

Chatbots

Chatbots are becoming more common. Here you either type natural language queries to the AI, or it has a voice recognition component you can talk to. The query processor is connected to the AI, and the AI is in turn connected to your company's databases as well as a wealth of professional information on the Internet. These AIs usually have multiple components for voice input, natural language processing, various business areas of expertise, and multiple ways of presenting results.

There have been some notable chatbot failures, like Microsoft's Twitter Chatbot, which quickly became a racist asshole. But we are starting to see some more successful implementations, like Sage's Pegg or KLM's Messenger Bot. Plus the general purpose bots like Alexa, Siri and Allo are getting rather good. There are also some really good toolkits available, like Amazon Lex, which make developing chatbots easier for more and more developers.

In-program Advice

There have been some terrible examples of in-product advice, such as the best-forgotten Microsoft Clippy. But with advances in User Centered Design, much less intrusive and subtler ways of helping users have emerged. Generally these require good content, so what they present is actually useful, and they have to be unobtrusive, never interfering with someone's work unless the user chooses to pay attention to them. When they are used, they can offer to make changes automatically, provide more information, or keep things to a simple one-line tip.

If these help technologies are combined with an AI engine, they can monitor what the user is doing and present application and context based help: for instance, suggesting that a different G/L account be used for better Financial Reporting, suggesting that the sales taxes on an invoice should be different due to some local regulation, or recommending additional items that should be added to an Accounting document.

These technologies allow the system to learn from how a company uses the product and make more useful suggestions, and they can draw on industry standards to assist as well.

Offline Monitoring

In most larger businesses, the person using the Business Application isn't the one who needs, or can act on, an Accountant's advice. Most data entry personnel have to follow corporate procedures and would get fired if they changed what they've been told to do, even if it's wrong. Usually the advice needs to go to the CFO or someone senior in the Accounting department. In these cases an AI can monitor what is going on in the business and make recommendations to the right person, perhaps seeing how G/L Accounts are being used and sending a recommendation for changes to facilitate better Financial Reporting or regulatory compliance.

Care has to be taken to keep this functionality clear of other unpopular productivity monitoring software that does things like record people’s keystrokes to monitor when they are working and how fast. Generally this functionality has to stick to improving the business rather than be perceived as big brother snitching on everyone.

Summary

Most small business owners consider Accounting a necessary evil that they are required to do to submit their corporate income tax. They do the minimum required and don't pay much attention to the results. But as their company grows, their Accounting data can give them great insights into how their business is running. Managing Inventory, A/R and A/P well makes a huge difference to a company's cash flow and profitability. Correctly and proactively handling regulatory compliance can be a huge time saver and can avoid large costs in fines and lawsuits.

It used to be that sophisticated programs to handle these things required huge IT departments and millions of dollars invested in software, and were really only available to large corporations. With the current advances in AI and Machine Learning, many of these sophisticated functions can be integrated into the Business Applications used by small and medium sized businesses. In fact, in a few years this will be a mandatory feature that users expect in all the software they use.

Written by smist08

July 29, 2017 at 8:42 pm

Making Business Applications Intelligent


Introduction

Today Business Applications tend to be rather boring programs which present the user with rather complicated forms that need to be filled in with a lot of detail. Accuracy is paramount, and there are a lot of security measures to prevent fraud and theft. Companies need to hire large numbers of people to enter data very repetitively into these forms. With modern User Centered Design these forms have become a bit easier to work with, and they have progressed quite a bit since the original Business Apps on 3270 terminals connected to IBM Mainframes, but I don't think anyone really considers these applications fun. Necessary and important, yes, but still not many people's favorite programs.

We’ve been talking a lot about the road to strong AI and we’ve looked at a number of AI tools like TensorFlow, but what about more practical applications that are possible right now? My background is working on ERP software, namely Sage 300/Accpac. In this article I’ll be looking at how we’ll be seeing machine learning/AI algorithms start to be incorporated into standard business applications. A lot of what we will talk about here will be integrated into many applications including things like CRM and Business Analytics.

Many of the ideas I talk about in this article are available today, just not all in the same place. Over the coming years I think we’ll see most of these become standard expected features in every Business Application. Just like we expect modern User Centered Design, tomorrow we will expect intelligent algorithms supporting us behind the scenes in everything we do.

Very High Level Diagram of the Main Components of an Intelligent Business Application

Some Quick Ideas

With Machine Learning and AI algorithms there are many possibilities, from small improvements to Business Applications, through major changes in the way things work, all the way up to automating many of the processes we currently perform manually. Often small improvements make a huge difference to the lives of current users and are the easiest to implement, so I don't want to ignore those possibilities on the way to pursuing larger, more ambitious, longer term goals. Perhaps these AI applications aren't as exciting as self-driving cars or real time speech translation, but they will make a huge difference to business productivity and lead to large cost savings for millions of companies. They will provide real business benefit through better accuracy, better productivity and automated business processes that lead to real cost savings and real revenue boosts.

Better Defaulting of Fields

Currently, fields tend to be defaulted based on configuration screens set up by administrators. These defaults might change based on an object like a customer or customer group, but tend to be fairly static. An algorithm could watch what a user (or all the users at a company) tends to enter and provide much more intelligent defaults, based on the values of other fields, time and date, current promotions, even news feed items. If defaults are provided more intelligently, users will save huge amounts of data entry time.

Better Auto-Suggestions

Currently, auto-suggestions on fields tend to be based on a combination of previously entered values and a "Google-like" search on what has been typed so far. Like defaulting, this could be greatly improved with more sophisticated algorithms. The real Google Search already does this, but most "Google-like" searches integrated into Business Apps do not. As with defaulting, more intelligent auto-suggestions will greatly improve productivity. Just as Google Search uses all your previous searches, trending topics, social media feeds and many other sources, so could your Business Application.

Fraud Detection

Credit card companies already use AI to scan people's credit card purchasing patterns, as well as the patterns of people using stolen credit cards, to flag when they think a credit card has been stolen or compromised. Similarly, Business Applications can monitor various company procedures and expenses to detect theft (perhaps strangeness in Inventory Adjustments) or unusual payments. There could be regulatory restrictions on what data can be used here; for instance, HR data is probably protected from being incorporated in this sort of analysis. Theft and fraud are currently a huge cost to businesses, and AI could help reduce them. Sometimes just knowing that tools like this are being used can act as a major deterrent.

Purchasing

Algorithms could be used to better detect when items need to be reordered while keeping inventory levels down. Further, the algorithms can continuously search vendor prices looking for deals, and weigh whether it's worth buying now at a cheaper price and incurring the inventory expense, or waiting. When you regularly purchase thousands or more items, a dynamic algorithm keeping track of all this can really help.

Customer Data

When you get a new customer you need all sorts of information, such as their address, phone number, contacts, etc. Perhaps an algorithm could search the web and fill in this information automatically (a specific example of better defaulting). Plus the AI could scan various web sources (some perhaps pay services for credit ratings and such) to suggest a good credit rating and limit for the new customer. The algorithm could also run in the background and update existing customers as this data changes, since keeping customer data up to date is a major challenge for companies with many customers. Knowing and keeping up with your customers is a major challenge for many companies, and much of this work can be automated.

Chasing Accounts Receivables

Collecting money is always a major challenge for every company, and much of this work could be automated. Plus, algorithms can watch the paying habits of customers: if a customer always pays at the end of the quarter, there's no need to worry when they go over 30 days; but if a customer suddenly develops credit rating problems, their stock tanks, or there is negative news about the company, then you had better get collecting. Again this is a lot of work, and algorithms can greatly reduce the manual workload and make the whole process more efficient.

Setting Prices

Setting prices is an art and a science. You need to lower prices to move slow moving items out of inventory, and try to keep prices high to maximize return. You need to be aware of competitors' prices and watch for their items going on sale. Algorithms can greatly help with this; Amazon is a master of it, maintaining millions of prices with AI all over their web site. Algorithms can scan the web for competitive pricing, watch inventory levels and item costs, and know where we are in a quarter and how much we need to stimulate sales to meet targets. These algorithms can make all the trade-offs between keeping customer loyalty and having to be the low cost option, and the same applies to customer and volume discounts. Once you have a lot of items for sale, maintaining prices is a lot of work, especially in the world of online shopping where everything changes so dynamically. With the big guys like Amazon and Walmart using these algorithms so effectively, you need to as well to stay competitive.

Summary

This article just gave a few examples of the many places we’ll be seeing AI and Machine Learning algorithms becoming integrated into all our Business Applications. The examples in this article are all possible today and in use individually by large corporations. The cost of all these technologies is coming down and we are seeing these become integrated into lower cost Business Applications for small and medium sized businesses.

As these become adopted by more and more companies, it will become a competitive necessity to adopt them or risk becoming uncompetitive in the fast paced online world. There will still be a human element to monitor and set policies, but humans can't perform many of these tasks at the speed and scale that today's world requires.

For the users of Business Applications, the addition of AI to user interactions should make these applications much more pleasant to operate. Instead of gotchas there will be helpful suggestions and reminders. Instead of needing to memorize and look up all sorts of codes, these will be helpfully provided wherever necessary. I think this transition will be as big as the transition from text based applications to GUI applications, but in this case I think the real ROI will be much higher.


Written by smist08

July 26, 2017 at 2:03 am

Learning in Brains and Computers


Introduction

In the last couple of articles we considered whether the brain is a computer, and then what its operating system looks like. In this article we'll look at how the brain learns, and compare that to how learning works in a modern AI system. As we noted before, our DNA doesn't contain a lot of seed data for the brain; nearly everything we know needs to be learned. This tends to be why, as animals become more advanced, their childhoods become longer and longer: besides growing to our full size, we also need that time to learn what we will require to survive on our own as adults when we leave our parents. Similarly, AI systems start without any knowledge, just a seed of random data; we then train them so they can do the job we desire, like driving a car. However, how we train an AI system is quite different from how we train a child, though there are similarities.

How the Brain Learns

For our purposes we are looking at what happens at the neuron level during learning, rather than considering higher level theories of educational methods. As we are trained, two things happen. One is that when something is reinforced, the neural connections involved are strengthened; similarly, if a pathway isn't used, it weakens over time. This is controlled by the amount of chemical transmitter at the junctions between neurons. The other thing that happens is that neurons grow new connections. To some degree the brain is always re-wiring itself by growing new connections. As mentioned before, thousands of neurons die each day and not all of them are replaced, so as we age we have fewer neurons; but this is counterbalanced by a lifetime of learning in which we have continuously grown new neural connections. So as we age we have fewer neurons but far more neural connections. This is partly why staying mentally active and pursuing lifetime learning is so important for maintaining mental health into older age.

Interestingly, this is also how memory works. The same neural strength adjustment and connection growth is how we encode memories. The system is a bit more complex, since we have a short term memory system from which some data is later encoded into long term memory, but the basic mechanisms are the same. This is why we forget things: if we don't access a memory, the neural connections weaken over time and eventually the memory is forgotten.

A further feature of biological learning is how the feedback loop works. We get information through our senses and can use that for learning, but it's been shown that learning is much more effective if it leads to action, and the action then provides feedback. For instance, being shown a picture of a dog and told it's a dog is far less effective than being given a dog you can interact with by touching and petting. It appears that having exploratory action attached to learning makes a big difference in how we learn, especially at young ages. We say this is an input - learn - action loop with feedback, rather than just an input - learn loop with feedback.

How AIs Learn

Let’s look specifically at Neural Networks, which have a lot of similarities with the brain. In this case we represent all the connections between neurons as weights in a matrix where zero represents no connection and a non-zero weight represents a connection that we can strengthen by making larger or weaken by making smaller.

To train a Neural Network we need a set of data where we know the answers. Suppose we want to train a Neural Network to recognize handwritten numbers. What we need is a large database of images of handwritten numbers along with the number each image represents. We then train the Neural Network by seeding it with random weights, feeding each image through the network, and comparing the result to the correct answer. We have sophisticated algorithms, like Stochastic Gradient Descent, that adjust the weights in the matrix to produce better results. If we do this enough, we can get very good results from our Neural Network. We often apply some further adjustments, such as setting small weights to zero so they really don't represent a connection, or penalizing large weights since those lead to overfitting.
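For instance, penalizing large weights usually means adding an L2 term to the loss, and dropping near-zero weights is a simple pruning step; a minimal NumPy sketch of both ideas:

import numpy as np

def total_loss(predictions, targets, weights, l2_penalty=0.01):
    """Data loss plus an L2 penalty that discourages large weights."""
    data_loss = np.mean((predictions - targets) ** 2)
    return data_loss + l2_penalty * np.sum(weights ** 2)

def prune(weights, threshold=1e-4):
    """Zero out tiny weights so they no longer represent a connection."""
    return np.where(np.abs(weights) < threshold, 0.0, weights)

w = np.array([0.5, -0.00003, 1.2, 0.00001])
print(prune(w))  # [0.5, 0.0, 1.2, 0.0]
print(total_loss(np.array([0.9, 2.1]), np.array([1.0, 2.0]), w))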

This may seem like a lot of work, and it is, but it can be done in a few hours or days on a fast modern computer, using GPUs if necessary to speed things up. It relies on the fact that we can adjust weights instantly, since they are just floating point numbers in a matrix, unlike the brain, which needs to make structural changes to its neurons.

A Comparison

To effectively train a Neural Network to recognize handwritten decimal digits (0-9) requires a training database of around 100,000 images. One of the reasons AI has become so successful in recent years has been the creation of many such huge databases that can be used for training.

Although it might feel like it to a parent, it doesn’t require showing a toddler 100,000 images for them to learn their basic numbers. What it does take is more time and a certain amount of repetition. Also the effectiveness is increased if the child can handle the digits (like with blocks) or draw the digits with crayons.

It does take longer to train a toddler than an AI, but this is largely because growing neural connections is a slower process than executing an algorithm on a fast computer which doesn’t have any other distractions. But the toddler will quickly become more effective at performing the task than the AI.

Comparing learning to recognize digits like this may not be fair, since the toddler is first learning to distinguish objects in their visual field and to recognize objects when rotated and seen from different angles. So the input into learning digits for a brain probably isn't a set of pixels straight off the optic nerve; the brain has already applied a number of previously learned algorithms to present a higher level representation of each digit before being asked to identify it. In the same way, perhaps our AI algorithms for identifying digits in isolation from pixelated images are useful for AI applications, but aren't useful on the road to true intelligence, and perhaps we shouldn't be using these algorithms in so much isolation. We won't start approaching strong AI till we get many more of these systems working together. For instance, in a self driving car the system has to break a scene up into separate objects before trying to identify them, and creating such a system requires several Neural Networks working together.

Is AI Learning Wrong?

It would appear that the learning algorithm used by the toddler is far superior to the one used in the computer. The toddler learns quite quickly from just a few examples, and the quality of the result often beats that of a Neural Network. The algorithms used in AI, like Stochastic Gradient Descent, tend to be very brute force: find new values of the weights that reduce the error, then keep iterating until you get a good enough result. If you don't get a good enough result, fiddle with the model and try again (we now have meta-algorithms to fiddle with the model for us as well). But is this really right? It is certainly effective, but it seems to lack elegance, and it doesn't work in as varied circumstances as biological learning does. Is there a more elegant and efficient learning algorithm just waiting to be discovered?

Some argue that a passive AI will never work: the AI needs a way to manipulate its world in order to add that action feedback loop to the learning process. This could well be the case. After all, we are training our AI to recognize a bunch of pixels out of context and independently. If you add the action feedback, then you can handle and manipulate a digit to see it from different angles and orientations. Doing this, you get far more benefit from each individual training case, rather than relying on brute force and millions of separate samples.

Summary

There are a lot of similarities in how the brain learns versus how we train AIs, but there are also fundamental differences. AIs rely much more on brute force and sheer volume of training data, whereas the brain requires fewer examples but makes much more out of each one. For AI to advance, we really need to be building systems of multiple Neural Networks rather than focusing so much on individual applications; we are seeing this start to take shape in applications like self-driving cars. We also need to give AIs a way to manipulate their environment, even if this just means manipulating the images they are given as training data, and to incorporate that manipulation into the training algorithms so they are much more effective and not so reliant on big data volume. I also think biological brains are hiding some algorithmic tricks that we still need to learn, and that those learning improvements will make progress advance in leaps and bounds.


Written by smist08

June 23, 2017 at 6:49 pm