Stephen Smith's Blog

Musings on Machine Learning…

Playing with Image Style Transfer

leave a comment »


Last time we introduced Image Style Transfer, an AI algorithm that combines the contents of one image with the style of another image. In this article we are going to look at some ways to play with this process in more advanced ways. We are going to play with Anish Athalye’s implementation which is on GitHub here, this implementation is really good at allowing lots of tuning and playing.

Playing around this way is quite time consuming since you have to run Gradient Descent to find the solution, rather than just applying canned solutions. Since I ran all these on an older MacBook Air with no GPU, I had to use a lower resolution in the interest of time. At lower resolution (the MacOS’s small size) it took about an hour for each image. At medium resolution, it took about six hours to generate an image. This is ok for running over night but doesn’t allow a lot of play. Makes me wonder if I should get a beefy desktop computer with a good NVidia GPU?

I found a really good YouTube video explaining Image Style Transfer here which is well worth a watch.

Playing with Algorithms

We’ve seen in previous articles how we can play with the tunable parameters in AI algorithms to get quite different results. Here we’ll look at the effects of playing with some parameters as well as fiddling with the algorithm itself.

The basic observation that lead to Image Style Transfer was that a deep image recognition neural network extracts the features related to content in the lower layers and the features related to style in the higher layers. Interestingly the human brain’s image recognition neurons appear to be structured in the same sort of way and it is believed there is a fair bit of similarity between how an advanced image recognition algorithm works and how the brain works. This separation of content from style is then the basis for merging and manipulating these.

The Image Style Transfer algorithm works by starting with an image of white noise and then iterating it using stochastic gradient descent to minimize the difference between the content in one image and the style in the other. This is the loss function we often talk about in AI. The interesting part of the algorithm is that we aren’t training the neural network matrix weights, since these are pre-done by the VGG group, but we are training the input image. So we have a loss function like:

Total Loss = Loss of content from first image + Loss of style from the second image

We can then play with this Loss function in various ways which we’ll experiment with in the rest of this article.

Apply Some Weights

Usually in Machine Learning algorithms we apply weights everywhere that we can use to tune things. The same applies here. We can weight the contributions from content versus style in the total loss formula to achieve more of a contribution from style or content.

First we take a picture of Tetrahedron Peak and combine it with Vincent van Gogh’s Starry Night using the default settings of the algorithm:

Now we can try playing with the weight of the content contribution. Lower means more style, higher means more content. In the image above the content weight was the default of 5.

Notice the image on the left is much more abstract with the large stars appearing all over.

Using Multiple Styles

Last time we used one style at a time to get our result. But you can actually use the algorithm to incorporate multiple styles at once. In this case we just generalize the Loss function above as:

Total Loss = Loss of content form first image + Loss of style from style image 1 +
                 Loss of style form style image 2

Of course we can then further generalize this to any number of style images.

We’ll use our Starry Night combination and also use Picasso’s Dora Maar:

Now we will use both pictures for the style and see what we get:

This weights the styles of Starry Night and Dora Maar equally. However you can see from the Loss formula that we can easily weight the components and get say 75% Starry Night and 25% Dora Maar:


Now if we reverse the weights and do Starry Night at 25% and Dora Maar at 75%:

Playing with the Neural Network

We can also play with the Neural Network used. We  can change a number of parameters in the Neural Network as well as introduce various scaling and weight factors.

Pooling Type

For instance there are something called Pooling Layers in the network. These reduce the resolution of the image and help with reducing the abstraction from fine level details to higher level abstractions. There are two commonly used types of pooling layers namely average pooling and max pooling. We can try either of these to see what affect that might have on the image style transfer.

Here we see that average pooling favoured fine details and preserved more of the content image. Whereas max pooling used more of the style image and is a bit more abstract.

Exponential Style Layer Weight

Another thing we can do is magnify some layers over others. For instance we can magnify each style layer over the last one as follows:

weight(layer<n+1>) = weight_exp*weight(layer<n>)

The default is 1 (ie none). Here is Tetrahedron Peak using 0.2 and 2.0.

A factor less than one means more original content since some style layers are suppressed, and a factor greater than one magnifies some style layer contributions. Since the style layers aren’t all weighted the same  this is a bit different than just changing the weighting factor between content and style.


Another parameter that is fun to play with is the number of iterations that Gradient Descent runs for. Below we can see a sequence of images as the number of iterations is increased. We can see the content and style of the image forming out of the initial white noise.

At this resolution we are pretty much converged at 500 iterations, however for higher resolution and more complicated images more iterations might be necessary. We could also use a stopping criterion like when the loss function stops changing by some delta, rather than using a fixed number of iterations.

This problem converges quite well since it is mathematically well defined. Often in AI, we don’t get this good behaviour because the training data has lots of errors and/or lots of noise. Here we are just training against a content picture and one or more style pictures, so by definition there isn’t any erroneous data. These challenges would have been faced and solved by the team developing the VGG image recognition neural network that we get to just use and don’t have to worry about training.


As we can see we can get quite a few different effects by tuning the algorithm using the same style picture as a reference. Simple tools like Prisma or don’t let you play with all these parameters. As a photographer who is trying to get a specific effect, you want the power and flexibility to tune your style transfer exactly. Right now the only way to do this is to run the AI algorithms on your computer and play with them which is very time consuming. I suspect once this technology is incorporated in more advanced tools then various degrees of tuning will be possible. Adobe has been demonstrating Image Transfer Style in their labs, and it will be interesting to see if they incorporate it into Photoshop and then how much tuning is possible. Also if it runs in the Adobe Creative Cloud, it will be interesting to see whether it’s quicker running that way than running on your own computer.



Written by smist08

August 21, 2017 at 4:29 pm

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: