Stephen Smith's Blog

Musings on Machine Learning…

Breadboarding with my Raspberry Pi


Introduction

One of the cool things about the Raspberry Pi is that it has a set of general purpose input/output (GPIO) pins. Many Raspberry Pi starter kits (including my Canakit) come with a breadboard and a few simple electronic components you can play with. In this article I’ll talk about connecting up some LED lights and controlling them from both Python and Scratch. Below is a somewhat hazy picture of my Raspberry Pi hooked up to the breadboard with a Scratch program running.

Here is a closer look at the breadboard with a few LEDs and resistors connected.

You don’t need to do this for most standard tasks; after all, the Pi has four USB ports, Wifi, Bluetooth, HDMI, sound/composite video and Ethernet. But the GPIO port is great for electronics enthusiasts, hobbyists and educators who want to get their hands dirty playing with electronic components.

Hooking up an LED

Each GPIO pin can be individually controlled and provides 3.3V when activated. The specification says to keep the current under 16mA or you can damage the circuits. My kit came with a number of 220 Ohm resistors, and by Ohm’s law these would cause the current to be 3.3V/220Ω = 15mA, so just right. You need a resistor in series with the LED since the LED’s own resistance is quite low (typically around 13 Ohms and variable). I connected 3 LEDs; for each LED you connect a wire from a GPIO pin (in this case I used 17, 27 and 22) to the positive lead of the LED, then connect the negative lead to a resistor and the other side of the resistor to the ground rail on the breadboard. Really quite simple.

Python

It’s quite simple to control the GPIO pins via a Python package. You just need to import RPi.GPIO and you can get going. This package came pre-installed, so all I had to do was write some lines of code and away it went. Basically I just need to set the pin-numbering mode, since the package supports a few different boards and numbering schemes, then configure the pins I’m using for output. Then I just need to turn the LEDs on and off. You need to add some sleep statements or the whole thing executes faster than you can see.
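Here is a minimal sketch of that sort of program, assuming the three LEDs are wired to BCM pins 17, 27 and 22 as described above (the pin numbers and timings are just from my setup):

import time
import RPi.GPIO as GPIO

LED_PINS = [17, 27, 22]           # BCM numbers of the pins wired to LEDs

GPIO.setmode(GPIO.BCM)            # use the Broadcom chip's pin numbering
for pin in LED_PINS:
    GPIO.setup(pin, GPIO.OUT)     # configure each LED pin for output

try:
    for _ in range(10):           # blink all three LEDs ten times
        for pin in LED_PINS:
            GPIO.output(pin, GPIO.HIGH)   # LED on
        time.sleep(0.5)                   # pause so the blink is visible
        for pin in LED_PINS:
            GPIO.output(pin, GPIO.LOW)    # LED off
        time.sleep(0.5)
finally:
    GPIO.cleanup()                # release the pins when done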

Scratch

Scratch is a very simple and visual programming language/environment developed by the Lifelong Kindergarten group at the MIT Media Lab. It is used to teach programming to students as young as kindergarten age. It is really amazing the animations and games that kids can produce with this system. It comes pre-installed on the Raspberry Pi, and you can control the GPIO pins with it just like you can in Python. You have to start the GPIO server from the Edit menu, and then you use the broadcast statement to control the GPIO functions. Here is the Scratch version of the simple Python program displayed above.
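To give the flavour of this, the broadcasts look something like the following sketch (assuming the message names used by the GPIO server in Raspbian’s bundled Scratch; check the documentation for your version):

broadcast [config17out]     turn GPIO 17 into an output
broadcast [gpio17on]        turn the LED on pin 17 on
broadcast [gpio17off]       turn it off again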

More on the GPIO

The original Pi’s 26-pin GPIO header has two +3.3V pins, two +5V pins and five grounds, which leaves 17 as general GPIO pins. The Raspberry Pi 3 extends this to a 40-pin header with additional grounds and GPIO lines, but the first 26 pins keep the same layout.

In the same way we configured the pins for output to control LEDs, you can configure them for input and then, for instance, read the setting of a switch.
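A minimal input sketch, assuming a hypothetical push button wired between GPIO pin 18 and ground, using the Pi’s internal pull-up resistor:

import RPi.GPIO as GPIO

BUTTON_PIN = 18    # hypothetical pin with a switch wired to ground

GPIO.setmode(GPIO.BCM)
# Enable the internal pull-up so the pin reads high until the switch
# closes the circuit to ground.
GPIO.setup(BUTTON_PIN, GPIO.IN, pull_up_down=GPIO.PUD_UP)

if GPIO.input(BUTTON_PIN):
    print("Switch is open")
else:
    print("Switch is pressed")

GPIO.cleanup()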

However this isn’t all there is to GPIO. Besides the functions we’ve talked about so far, which are rather limited, a number of the pins have “alternate” functions that you can select programmatically. For instance pins 3 and 5 can support the I2C standard that allows two microchips to talk to each other. There are pins that can support two serial ports, which are handy for connecting to radios or printers. There are pins that can support PWM and PPM, which are handy for controlling electrical motors.
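For example, once I2C is enabled, you can talk to an I2C device from Python using the smbus package. This is just a sketch, assuming a hypothetical device at address 0x48:

import smbus

bus = smbus.SMBus(1)     # I2C bus 1 is the one on pins 3 and 5
DEVICE_ADDR = 0x48       # hypothetical I2C device address

# Read one byte from register 0 of the device.
value = bus.read_byte_data(DEVICE_ADDR, 0)
print("Register 0 =", value)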

Summary

The Raspberry Pi 3 is a very versatile device. It runs most Linux software and has a very flexible architecture allowing it to interface to a great many devices. It has four USB ports, Wifi, Ethernet and Bluetooth. Plus there is the general purpose GPIO bus that allows a great deal of flexibility to interface the Pi to almost anything. That is why you see Raspberry Pis built in as the brains of drones, robots, home security systems, information kiosks and so much more.


Written by smist08

November 15, 2017 at 9:14 pm

Posted in raspberry pi


Playing with my Raspberry Pi


Introduction

I do most of my work (like writing this blog post) on my MacBook Air laptop. I used to have a good desktop computer for running longer processes or playing games. Last year the desktop packed it in (it was getting old anyway), so since then I’ve just been using my laptop. I wondered if I should get another desktop and run Ubuntu on it, since that is good for machine learning, but I wasn’t sure it was worth the price. Meanwhile I was intrigued by everything I see people doing with Raspberry Pis. So I figured why not just get a Raspberry Pi and see if I can do the same things with it as I did with my desktop. Plus I thought it would be fun to learn about the Pi, and that it would be a good toy to play with.

Setup

Since I’m new to the Raspberry Pi, I figured the best way to get started was to order one of the starter kits. This way I’d be able to get up and running quicker and get everything I needed in one shot. I had a credit with Amazon, so I ordered one of the Canakits from there. It included the Raspberry Pi 3, a microSD card with Raspbian Linux, a case, a power supply, an electronics breadboard, some LEDs and resistors, heat sinks and an HDMI cable. Then I needed to supply a monitor, a USB keyboard and a USB mouse (which I had lying around).

Setting up was quite easy, though the quick setup instructions were missing a few steps, like what to do with the heat sinks (which was obvious) or how to connect the breadboard. Setup was really just installing the Raspberry Pi motherboard in the case, adding the heat sinks, inserting the microSD card and then connecting the various cables.

As soon as I powered it on, it displayed an operating system selection and installation menu (with only one choice), so I clicked install and 10 minutes later I was logged in and running Raspbian.

The quick setup guide then recommends you set your locale and change the default password, but they don’t tell you the existing password, which a quick Google reveals as “raspberry”. Then I connected to our Wifi network and I was up and running. I could browse the Internet using Chromium, run Mathematica (a free Raspberry Pi version comes pre-installed), and run a Linux terminal session. All rather painless and fairly straightforward.

I was quite impressed by how quickly it went, how easy the installation and setup process was, and how powerful a computer I had up and running for less than $100 (for everything).

Software

I was extremely pleased with how much software the Raspberry Pi came with pre-installed. This was all on the provided 32Gig card which, even with a few extra things installed, still has 28Gig free. Amazingly compact. Some of the pre-installed software includes:

  • Mathematica. Great for Math students and to promote Mathematica. It runs the Wolfram Language, which is interesting in itself.
  • Python 2 and 3 (more on the pain of having Python 2 later).
  • LibreOffice. A full MS Office like suite of programs.
  • Lots of accessories like file manager, calculator, image viewer, etc.
  • Chromium web browser.
  • Two Java IDEs.
  • Sonic Pi music synthesizer.
  • Terminal command prompt.
  • Minecraft and some Python games.
  • Scratch programming environment.

Plus there is an add/remove software program where you can easily add many more open source Pi programs. You can also use the Linux apt-get command to get many other pre-compiled packages.

Generally I would say this is a very complete set of software for any student, hobbyist or even office worker.

Python

I use Python as my main go-to programming language these days, and I generally use a number of scientific and machine learning libraries. So I tried installing these. Usually I just use pip3 and away things go (at least on my Mac). However doing this caused pip3 to download the C++/Fortran source code and try to compile it, which failed. I then Googled around on how best to install these packages.

Unfortunately most of the Google results were about how to do this for Python 2, which I didn’t want. It will be so nice when Python 2 is finally discontinued and stops confusing everything. I wanted these for Python 3. Before you start, you should update apt-get’s list of available software and upgrade all the packages on your machine. You can do this with:

sudo apt-get update        # Fetches the list of available updates
sudo apt-get upgrade       # Strictly upgrades the current packages

What I found is that I could get most of what I wanted using apt-get:

sudo apt-get install python3-numpy
sudo apt-get install python3-scipy
sudo apt-get install python3-matplotlib
sudo apt-get install python3-pandas

However I couldn’t find an apt-get package for SciKit Learn, the machine learning library. So I tried pip3, and it did work, even though it downloaded the source code and compiled it.

pip3 install sklearn --upgrade

Now I had all the scientific programming power of the standard Python libraries. Note that since the Raspberry Pi only has 1Gig of RAM and the SD card only has twenty-something Gig free, you can’t really run large machine learning tasks. However if they do fit within the Pi then it is a very inexpensive way to do these computations. What a lot of people do is build clusters of Raspberry Pis that work together. I’ve seen articles on how university labs have built supercomputers out of hundreds of Pis put together in a cluster. Further, they run quite sophisticated software like Hadoop, Docker and Kubernetes to orchestrate the whole thing.

Summary

I now have the Raspberry Pi up and running and I’m enjoying playing with Mathematica and Sonic Pi. I’m doing a bit of Python programming and browsing the Internet. Quite an amazing little device. I’m also impressed with how much it can do for such a low cost. As other vendors like Apple, Microsoft, HP and Dell try to push people into more and more expensive desktops and laptops, it will be interesting to see how many people revolt and switch to far more inexpensive DIY-type solutions. Note that there are vendors that package the Raspberry Pi into complete desktop computers at quite a low cost as well.

Written by smist08

November 11, 2017 at 9:35 pm

Getting My Amateur Radio License


Introduction

I recently passed the exam to get my amateur (ham) radio license. This article is a bit about my experience taking the course and writing the exam, as well as some thoughts on ham radio. People spend a lot of money on cell phones, but most of what they do is done routinely for free by dedicated radio hobbyists. After all, a cell phone is just a radio running on frequencies bought up by the cell phone providers.

Why?

There is a perception that amateur radio is an old, obsolete, dying hobby. That it consists of people using old tube radios trying to talk to people far away through lots of interference and poor sound quality.

Why would anyone use ham radio when you can talk to anyone reliably on Skype or FaceTime? Didn’t ham radio only make sense back in the days when long distance calls were prohibitively expensive? Now even most phone plans are quite inexpensive and use the Internet to communicate. So why is the number of amateur radio licenses growing year over year? In the US the number of licenses issued each year is growing at a rate of about 1%; not Internet growth, but pretty steady for something that requires preparing for and writing a written exam.

Here are some of the main reasons for the continuing interest in amateur radio:

  • Disaster preparedness. Most amateur radios can operate off batteries and don’t require cell phone towers, Internet connections or even A/C power.
  • Better radios. There is a proliferation of new radios on the market at much lower price points (under $100), with a great many features including computer connectivity, digital channels and more efficient power use.
  • Volunteer infrastructure. To use your cell phone you typically pay $75/month to someone for your ability to use their cell phone towers and such. In the amateur radio world many clubs operate repeaters (think amateur cell phone towers) and Internet connections allowing long distance communications with voice or even e-mail from low powered portable radios.
  • Cell phone coverage often isn’t as good as advertised. I do a lot of hiking and mountain biking. Most of the places I go, there is no cell phone coverage. However here on the Sunshine Coast if I get line of sight to the ocean I can probably hit one of the Nanaimo repeaters and communicate. The only other alternative is satellite phones and these make cell phones seem inexpensive.
  • Ability to build your own equipment and experiment. The world of cellphones is very locked down and DIY is not allowed at all. In the world of amateur radio if you have the correct license, you can build your own radio, you can heavily customize your equipment, you can build your own antennas and generally do a lot of experimentation and customization. If you have an interest in understanding how things work and playing with electronics, then this is a great playground.

Licensing

To get my license I took a course offered by VECTOR in Vancouver to prepare for and write the basic amateur radio exam. This was a great success and I greatly enjoyed it. VECTOR is an emergency radio non-profit society for the city of Vancouver. The course was taught by a number of VECTOR members, usually someone different for each section. This gave a good mix of perspectives and provided good variety over the five morning sessions. The details of this process differ a bit by country, so the details here apply specifically to Canada. The basic exam is 100 multiple choice questions; you can take practice exams here. The course covers the main areas of the exam, which are: Rules and Policies, Basic Electronics, Antenna Systems, Radio Wave Propagation, Interference and Suppression, Operating Procedures, and Station Assembly Practices and Safety. The whole course and exam are theoretical and don’t involve touching a radio whatsoever.

I did pass the exam. If you get 70% then you get access to all frequencies above 30MHz, and if you get 80% then you don’t have any frequency limitations. I got over 80%, so I can use any amateur frequency band. My call sign is VA6SMI; I have to use this to identify myself in any communications.

To actually build your own radio you need the advanced certification, which I might try to get next year. The other qualification you can get is Morse Code. Up until ten years ago this was part of getting a basic license, but it isn’t required anymore.

Future

I think a lot of the current popularity of ham radio is driven by a wish to be independent of cell phone providers and a wish for much more DIY in our communications. As recent fires, hurricanes and earthquakes have demonstrated, the cell phone infrastructure is rather fragile and can take quite some time to get up and running again. During all these emergencies, ham radio was the main form of communications. With global warming, fires and hurricanes will become more common and the need for emergency communications becomes more and more important.

Separately, amateurs are adding more and more functionality to ham communications with better digital protocols and speeds. They are adding better volunteer infrastructure to support longer distance communications and Internet connectivity. The radio hardware market has gotten much more competitive recently with the addition of several Chinese manufacturers that have driven down prices and driven new innovation.

Will we reach a point where we have “smart” ham radios like we have “smart” phones? Will we be able to run apps like Facebook on our ham radios? Will we be able to use them as ubiquitously as we use our phones? Will we one day be able to break the hold of the cell phone providers and eliminate all those expensive subscription plans? It might take some time, but we are slowly moving in that direction. Certainly the ham radio frequencies can’t handle that volume of traffic, but perhaps for those willing to pass the basic radio exam we will start to get some freedom from the current cell phone hegemony.

Summary

I enjoyed the process of getting my ham radio license. I’ve ordered a radio and look forward to getting on the air when it arrives. It was fun reviewing basic electronics and learning more about antennas and radio wave propagation. The rules and regulations part wasn’t bad; once you get the hang of what they are looking for, the questions are quite easy.

Now I’ll see if I’m better connected once I’m out in the woods. Plus I’ll be able to communicate during power outages (we had one today that also took out local cell phone coverage).


Written by smist08

October 21, 2017 at 12:47 am

Updates to the TensorFlow API


Introduction

Last year I published a series of posts on getting up and running with TensorFlow and creating a simple model to make stock market predictions. The series starts here; the coding articles are here, here and here. We are now a year later and TensorFlow has advanced by quite a few versions (1.3 as of this writing). In this article I’m going to rework that original Python code to use some simpler, more powerful APIs from TensorFlow, as well as adopt some best practices that weren’t well known last year (at least by me).

This is the same basic model we used last year, which I plan to improve on going forward. I changed the data set to record the actual stock prices rather than differences. This doesn’t work so well, since most of these stocks increase over time and we loop around and around on the training data, which tends to make the predictions quite low. I plan to fix this in a future article where I handle this time series data correctly. But first I wanted to address a few other things before proceeding.

I’ve placed the updated source code tfstocksdiff13.py on my Google Drive here.

Higher Level API

In the original code to create a layer in our Neural Network, we needed to define the weight and bias Tensors:

layer1_weights = tf.Variable(tf.truncated_normal(
      [NHistData * num_stocks * 2, num_hidden], stddev=0.1))
layer1_biases = tf.Variable(tf.zeros([num_hidden]))

And then define the layer with a complicated mathematical expression:

hidden = tf.tanh(tf.matmul(data, layer1_weights) + layer1_biases)

This code is then repeated with mild variations for every layer in the Neural Network. In the original code this was quite a large block of code.

In TensorFlow 1.3 there is now an API to do this:

hidden = tf.layers.dense(data, num_hidden, activation=tf.nn.elu,
        kernel_initializer=he_init,
        kernel_regularizer=tf.contrib.layers.l1_l2_regularizer(),
        name=name + "model" + "hidden1")

This eliminates a lot of repetitive variable definitions and error prone mathematics.

Also notice the kernel_regularizer=tf.contrib.layers.l1_l2_regularizer() parameter. Previously we had to process the weights ourselves to add regularization penalties to the loss function; now TensorFlow will do this for you, but you still need to extract the values and add them to your loss function.

reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
loss = tf.add_n([tf.nn.l2_loss(tf.subtract(logits, tf_train_labels))] + reg_losses)

You can get at the actual weights and biases if you need them in a similar manner as well.

Better Initialization

Previously we initialized the weights using a truncated normal distribution. Back then the recommendation was to use random values to get the initial weights away from zero. However since 2010 (quite a long time ago in this field) there have been better suggestions, and the new tf.layers.dense() API supports these. The original paper was “Understanding the difficulty of training deep feedforward neural networks” by Xavier Glorot and Yoshua Bengio. If you ran the previous example you would have gotten an uninitialized variable error on he_init. Here is its definition:

he_init = tf.contrib.layers.variance_scaling_initializer(mode="FAN_AVG")

The idea is that these initializers vary based on the number of inputs and outputs for the neuron layer. There is also tf.contrib.layers.xavier_initializer() and tf.contrib.layers.xavier_initializer_conv2d(). For this example with only two hidden layers it doesn’t matter so much, but if you have a much deeper Neural Network, using these initializers can greatly speed up training and avoid having the gradients either go to zero or explode early on.

Vanishing Gradients and Activation Functions

You might also notice I changed the activation function from tanh to elu. This is due to the problem of vanishing gradients. Since we are using Gradient Descent to train our system, any zero gradients will stop training improvement in that dimension. If you get large values out of a neuron then the gradient of the tanh function will be near zero, and this causes training to stall. The relu function has a similar problem: if the value ever goes negative then the gradient is zero, and again training will likely stall and get stuck there. One solution to this is to use the elu function or a “leaky” relu function. Below are the graphs of elu, leaky relu and relu.

Leaky relu has a low-sloped linear function for negative values. Elu uses an exponential type function to flatten out a bit to the left of zero, so if things go a bit negative they can recover, although if things go more negative with elu, they will get stuck again. Elu has the advantage that it is arranged to be differentiable at 0 to avoid special cases. Practically speaking, both of these activation functions have given very good results in very deep Neural Networks which would otherwise get stuck during training with tanh, sigmoid or relu.
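For reference, here is a minimal sketch of these three activation functions in NumPy (the slope alpha for leaky relu is a typical choice, not a fixed standard):

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # A small linear slope for negative values keeps the gradient non-zero.
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # Exponential curve left of zero; with alpha=1 the function is
    # differentiable at 0.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))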

Scaling the Input Data

Neural networks work best if all the data is between zero and one. Previously we didn’t scale our data properly and just did an approximation by dividing by the first value. All that code has been deleted and we now use SciKit Learn’s MinMaxScaler object instead. You fit the scaler using the training data and then transform any data you process with it. The code for us is:

# Scale all the training data to the range [0,1].
scaler = MinMaxScaler(copy=False)
scaler.fit(train_dataset)
scaler.transform(train_dataset)
scaler.transform(valid_dataset)
scaler.transform(test_dataset)
scaler.transform(final_row)

The copy=False parameter basically says to do the conversion in place rather than producing a new copy of the data.

SciKit Learn has a lot of useful utility functions that can greatly help with using TensorFlow, and it is well worth looking at even if you aren’t using a SciKit Learn Machine Learning function.

Summary

The field of Neural Networks is evolving rapidly and the best practices keep getting better. TensorFlow is a very dynamic and quickly evolving tool set which can sometimes be a challenge to keep up with.

The main learnings I wanted to share here are:

  • TensorFlow’s high level APIs
  • More sophisticated initialization like He Initialization
  • Avoiding vanishing gradients with elu or leaky ReLU
  • Scaling the input data to between zero and one

These are just a few of the new things that I could incorporate. In the future I’ll address how to handle time series data in a better manner.

Written by smist08

October 16, 2017 at 9:42 pm

Components Leading to Strong AI


Introduction

There have been a lot of advances in AI in the past couple of years. A lot of these advances come from better simulating the various functions of the brain. These include convolutional neural networks, which are very good at image recognition, and new techniques to incorporate memory into neural networks.

Very Deep Neural Networks

In the early days of Neural Networks, finding the weights for the connections was very difficult and often performed by hand. Then the gradient descent algorithm came along and allowed bigger Neural Networks to be trained. Then in 1986 a groundbreaking paper by D. E. Rumelhart showed how to use back propagation to train a multi-level Neural Network with Gradient Descent. However the shape of the surface being optimized is often very ill suited to this algorithm, containing many local minima, or more usually being very flat and not indicating the direction to take. Plus, depending on the problem, the training data may contain lots of errors that can mislead the training process.

With recent tweaks to the training algorithms, researchers have managed to train very deep Neural Networks. For instance the Oxford Visual Geometry Group (VGG) has released a pre-trained 19 layer Neural Network for image recognition.

This is a great building block for other image manipulation projects like Image Style Transfer that we looked at previously.

Now these Neural Networks are starting to resemble the architecture and structure of biological Neurons in the human brain such as the following from the human cortex.

This shows that we are starting to accurately simulate the computational engine in our brains.

The Road to Memory

Although the deep neural networks in the last section are very large and powerful at some problems, they fail at others, primarily due to a lack of memory or context. For instance if you are translating text word by word, you need to remember the previous words in the sentence to get a correct translation based on the context. Or you need to do a first pass word by word and then, knowing the whole, correct mistakes based on a more general understanding of what is being said. Similarly, as an algorithm deals with the world, it should learn about the world as it explores and gathers more information. Just retraining the whole Neural Network for each bit of new information is very inefficient.

For language translation and speech recognition, Recurrent Neural Networks (RNNs) have proven quite effective. In these, the outputs from Neurons can feed into the inputs of the same layer or into the inputs of previous layers. The networks of the previous section were all feed forward Neural Networks, since the output of a layer only feeds the input of the next layer. RNNs aren’t true non-feed-forward networks since they don’t iterate until everything stabilizes to find a solution; rather the outputs from usage n go into the inputs of usage n+1. In this way they act as a sort of memory from usage to usage, allowing the network to preserve some context from, say, word to word in a translation.
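As a sketch of the idea, the state carried from usage to usage is just the previous step’s output fed back in as an extra input (here W, U and b stand for the trained weights and bias):

import numpy as np

def rnn_step(x, h_prev, W, U, b):
    # The new hidden state depends on the current input x and on the
    # previous step's hidden state h_prev, which acts as the memory.
    return np.tanh(W @ x + U @ h_prev + b)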

More recent research has led to Neural Networks that can actually have memory banks. These include Long Short-Term Memory Cells (LSTM Cells) and Gated Recurrent Unit (GRU) Cells.

These artificial neurons have the ability to store memory values (as well as forget them). The key difficulty in adding memory to Neural Networks was in how to train them. Gradient Descent and all its variations require that the function being optimized is differentiable, or very nearly so. Putting things in memory, reading memory and erasing memory are very discrete functions. These sorts of functions are not differentiable and can’t be patched up, since away from the jumps they are flat with zero derivative. Something with a zero derivative doesn’t give Gradient Descent any information as to which direction to go. The solution was to replace the discrete functions with probability distributions that are differentiable. So rather than saying put something in memory, the function gives you a probability that you should put the value in memory, and then you do so if, say, the probability is greater than 50%.
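A sketch of this idea, using a sigmoid as the differentiable stand-in for the discrete “store it or not” decision (the same trick LSTM and GRU gates use):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_write(memory, candidate, gate_logit):
    # Instead of a hard 0/1 write decision, blend the old memory and the
    # candidate value by a differentiable gate value in (0, 1).
    g = sigmoid(gate_logit)
    return g * candidate + (1.0 - g) * memory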

Learning

I think the current tools for training Neural Networks work quite well for deep feedforward Neural Networks. I think they do a good job of training the weights used in the various network layers. However I don’t think they provide a good solution for training systems with memory. The brain probably uses some process like Hebbian Learning to train the input weights and outputs of biological Neurons. However I don’t think this is what is used to decide whether to remember something or not. I think we still have a long way to go before effectively using memory in our Neural Networks, even though just a little bit of memory is greatly improving our translators and speech and text recognition programs.

Summary

The field of Neural Networks is making great progress. This is due to advances in refining the training process of deep Neural Networks, along with advances in making artificial Neurons more sophisticated by adding elements like memory banks. Combine this with the fast pace of GPU development, which provides essentially low cost supercomputers for training and running these networks, and the large amount of venture capital flowing into anything AI related, and we are seeing a true renaissance in the AI field.

Does someone have a true deep AI running in their lab already? Perhaps; but if they don’t, I think we are starting to get quite close.

Written by smist08

September 29, 2017 at 9:12 pm

Playing with Image Style Transfer


Introduction

Last time we introduced Image Style Transfer, an AI algorithm that combines the contents of one image with the style of another image. In this article we are going to look at some more advanced ways to play with this process. We are going to use Anish Athalye’s implementation, which is on GitHub here; this implementation is really good at allowing lots of tuning and playing.

Playing around this way is quite time consuming, since you have to run Gradient Descent to find the solution rather than just applying canned solutions. Since I ran all of these on an older MacBook Air with no GPU, I had to use a lower resolution in the interest of time. At lower resolution (MacOS’s small size) it took about an hour for each image. At medium resolution it took about six hours to generate an image. This is ok for running overnight but doesn’t allow a lot of play. It makes me wonder if I should get a beefy desktop computer with a good NVidia GPU.

I found a really good YouTube video explaining Image Style Transfer here which is well worth a watch.

Playing with Algorithms

We’ve seen in previous articles how we can play with the tunable parameters in AI algorithms to get quite different results. Here we’ll look at the effects of playing with some parameters as well as fiddling with the algorithm itself.

The basic observation that led to Image Style Transfer was that a deep image recognition neural network extracts the features related to content in the lower layers and the features related to style in the higher layers. Interestingly, the human brain’s image recognition neurons appear to be structured in the same sort of way, and it is believed there is a fair bit of similarity between how an advanced image recognition algorithm works and how the brain works. This separation of content from style is then the basis for merging and manipulating these.

The Image Style Transfer algorithm works by starting with an image of white noise and then iterating it using stochastic gradient descent to minimize the difference between the content of one image and the style of the other. This is the loss function we often talk about in AI. The interesting part of the algorithm is that we aren’t training the neural network’s matrix weights, since these are pre-done by the VGG group; we are training the input image. So we have a loss function like:

Total Loss = Loss of content from first image + Loss of style from the second image

We can then play with this Loss function in various ways which we’ll experiment with in the rest of this article.
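In code the combined objective looks something like this sketch (content_loss and style_losses stand in for whatever the implementation computes from the VGG layer activations; the weight values are just the defaults discussed below):

# Hypothetical weights; the real implementation derives the loss terms
# from activations of the pre-trained VGG network.
content_weight = 5.0
style_weights = [1.0]          # one weight per style image

def total_loss(content_loss, style_losses):
    # Weighted sum of the content term and each style term.
    loss = content_weight * content_loss
    for w, s_loss in zip(style_weights, style_losses):
        loss += w * s_loss
    return loss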

Apply Some Weights

Usually in Machine Learning algorithms we apply weights everywhere we can, and use them to tune things. The same applies here. We can weight the contributions from content versus style in the total loss formula to achieve more of a contribution from style or content.

First we take a picture of Tetrahedron Peak and combine it with Vincent van Gogh’s Starry Night using the default settings of the algorithm:

Now we can try playing with the weight of the content contribution. Lower means more style, higher means more content. In the image above the content weight was the default of 5.

Notice the image on the left is much more abstract with the large stars appearing all over.

Using Multiple Styles

Last time we used one style at a time to get our result. But you can actually use the algorithm to incorporate multiple styles at once. In this case we just generalize the Loss function above as:

Total Loss = Loss of content from first image + Loss of style from style image 1 +
                 Loss of style from style image 2

Of course we can then further generalize this to any number of style images.

We’ll use our Starry Night combination and also use Picasso’s Dora Maar:

Now we will use both pictures for the style and see what we get:

This weights the styles of Starry Night and Dora Maar equally. However you can see from the Loss formula that we can easily weight the components and get, say, 75% Starry Night and 25% Dora Maar:


Now if we reverse the weights and do Starry Night at 25% and Dora Maar at 75%:

Playing with the Neural Network

We can also play with the Neural Network used. We can change a number of parameters in the Neural Network as well as introduce various scaling and weight factors.

Pooling Type

For instance there are things called Pooling Layers in the network. These reduce the resolution of the image and help with moving from fine level details to higher level abstractions. There are two commonly used types of pooling layers, namely average pooling and max pooling. We can try either of these to see what effect they have on the image style transfer.

Here we see that average pooling favoured fine details and preserved more of the content image, whereas max pooling used more of the style image and is a bit more abstract.

Exponential Style Layer Weight

Another thing we can do is magnify some layers over others. For instance we can magnify each style layer over the last one as follows:

weight(layer<n+1>) = weight_exp*weight(layer<n>)

The default is 1 (i.e. none). Here is Tetrahedron Peak using 0.2 and 2.0.

A factor less than one means more original content, since some style layers are suppressed, and a factor greater than one magnifies some style layer contributions. Since the style layers aren’t all weighted the same, this is a bit different than just changing the weighting factor between content and style.
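As a sketch, if the style layer weights start at 1.0, this scheme just scales each successive layer’s weight by the exponent factor:

weight_exp = 2.0        # or 0.2 to suppress the later style layers
weights = [1.0]         # weight of the first style layer
for n in range(4):      # e.g. for five style layers in total
    weights.append(weight_exp * weights[-1])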

Iterations

Another parameter that is fun to play with is the number of iterations that Gradient Descent runs for. Below we can see a sequence of images as the number of iterations is increased. We can see the content and style of the image forming out of the initial white noise.

At this resolution we are pretty much converged at 500 iterations; however for higher resolution and more complicated images, more iterations might be necessary. We could also use a stopping criterion, like when the loss function stops changing by some delta, rather than using a fixed number of iterations.

This problem converges quite well since it is mathematically well defined. Often in AI we don’t get this good behaviour, because the training data has lots of errors and/or lots of noise. Here we are just training against a content picture and one or more style pictures, so by definition there isn’t any erroneous data. Those challenges were faced and solved by the team that developed the VGG image recognition neural network, which we get to just use without worrying about training.

Summary

As we can see, we can get quite a few different effects by tuning the algorithm while using the same style picture as a reference. Simple tools like Prisma or deepart.io don’t let you play with all these parameters. As a photographer trying to get a specific effect, you want the power and flexibility to tune your style transfer exactly. Right now the only way to do this is to run the AI algorithms on your own computer and play with them, which is very time consuming. I suspect once this technology is incorporated into more advanced tools, various degrees of tuning will be possible. Adobe has been demonstrating Image Style Transfer in their labs, and it will be interesting to see if they incorporate it into Photoshop and then how much tuning is possible. Also, if it runs in the Adobe Creative Cloud, it will be interesting to see whether it’s quicker running that way than on your own computer.


Written by smist08

August 21, 2017 at 4:29 pm

An Introduction to Image Style Transfer


Introduction

Image Style Transfer is an AI technique that is becoming quite popular for enhancing or stylizing photos. It takes one picture (often a classical painting) and applies the style of that picture to another picture. For example I could take this photo of the Queen of Surrey passing Hopkins Landing:

Combined with the style of Vincent van Gogh’s Starry Night:

To then feed these through the AI algorithm to get:

In this article, we’ll look at some of the ways you can accomplish this yourself, either by using online services or by running your own Neural Network with TensorFlow.

Playing with Image Style Transfer

There are lots of services that let you play with this. Generally, applying a canned style to your own picture is quite fast (a few seconds). Providing your own photo as the style photo is more involved, since it involves “training” the style, and this can take 30 minutes (or more).

Probably the most popular program is the Prisma app for either iPhone or Android. This app has a large number of pre-trained styles and can apply any of them to any photo on your phone. This app works quite well and gives plenty of variety to play with. Plus it’s free. Here is the ferry in Prisma’s comic theme:

If you want to provide your own photo as the style reference then deepart.io is a good choice. This is available as a web app as well as an iPhone or Android app. The good part about this for photographers is that you can copy photos from your good camera to your computer and then use the program’s website; no phone required. This site has some pre-programmed styles based on Vincent van Gogh which work really quickly and produce good results. Then it has the ability to upload a style photo. Processing a style is more work and typically takes 25 minutes (you can pay to have it processed quicker, but not that much quicker). If you don’t mind the wait, this site is free and works quite well. Here is an example of the ferry picture above van Gogh’ized by deepart.io (sorry, they don’t label the styles so I don’t know which painting this is styled from):

Playing More Directly

These programs are great fun, but I like to tinker with things myself on my computer. So can I run these programs myself? Can I get the source code? Fortunately the answer to both is yes. This turns out to be a bit easier than you might first think, largely due to a project out of the Visual Geometry Group (VGG) at the University of Oxford. They created an exceptional image recognition neural network that they trained and won several competitions with. It turns out that the backbone of doing Image Style Transfer is having a good image recognition Neural Network. This Neural Net is 19 layers deep, and Oxford released the fully trained network for anyone to use. Several people have taken this network, figured out how to load it into TensorFlow and created some really good Image Style Transfer programs based on it. The first program I played with was Anish Athalye’s program posted on GitHub here. This program uses VGG and can train a neural network for a given style picture. Anish has quite a good write up on his blog here.

Then I played with a program by Shafeen Tejani that expands on Anish’s, which is on GitHub here along with a blog post here. This program lets you keep the trained network so you can perform the transformation quickly on any picture you like. This is similar to how Prisma works. The example in the introduction was created with this program. To train the network you require a training set of images, like the Microsoft COCO collection.

Running these programs isn’t for everyone. You have to be used to running Python programs and have TensorFlow installed and working on your system. You need a few other dependent Python libraries and of course you need the VGG saved Neural Network. But if you already have Python and TensorFlow, I found both of these programs just ran and I could play with them quite easily.

The writeups for all these programs highly recommend having a good GPU to speed up the calculations. I’m playing on an older MacBook Air with no GPU and was able to get quite good results. One trick I found that helped is to play with reduced resolution images to speed up the process, then run the algorithm on a higher resolution version when you have things right. I found I couldn’t use the full resolution from my DSLR (12 megapixels), but had to use Apple’s “large” size (286KB).

Summary

This was a quick introduction to Image Style Transfer. We are seeing this in more and more places. There are applications that can apply this same technique to videos. I expect this will become a standard part of image processing software like Photoshop or Gimp. It also might remain the domain of specialty programs, as HDR has, since it is quite technical and resource intensive. In the meantime, projects like VGG have made this technology quite accessible for anyone to play with.

Written by smist08

August 14, 2017 at 6:48 pm