Stephen Smith's Blog

Musings on Machine Learning…

Julia Flux for Machine Learning



Introduction

Flux is a Neural Network Machine Learning library for the Julia programming language. It is written entirely in Julia and builds on Julia's support for running on GPUs and for distributed processing. It makes writing Neural Networks easy and leverages the power and expressiveness of the Julia language, so creating a Neural Network feels the same as writing any other Julia code.

My last article pointed out some problems with using TensorFlow from Julia, due to many of the newer features being implemented in Python rather than in the core shared library. One recommendation from the TensorFlow folks is that if you want eager execution then use Flux rather than TensorFlow. The Flux folks claim a real benefit of Flux over TensorFlow is that you only need to know one language to do ML, whereas for TensorFlow you need to know TensorFlow (its graph language) plus the host language like Python. That gets confusing because there is a lot of duplication and it isn't always clear which system to do things in, or whether to use a TensorFlow or a Python data type. Flux simplifies all of this.
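As a small illustration (my own sketch, not taken from the Flux documentation), a Flux layer is just a callable Julia object and a loss is just an ordinary Julia function, evaluated eagerly like any other expression:

using Flux

m = Dense(10, 2)                    # a layer is just a Julia struct holding weights
x = rand(10)                        # a plain Julia vector
m(x)                                # calling the layer runs the forward pass eagerly
loss(x, y) = sum((m(x) .- y).^2)    # the loss is a normal Julia function
loss(x, [1.0, 0.0])                 # no separate graph language involved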

Although this all sounds wonderful, remember that Julia just hit version 1.0 and Flux just hit version 0.6.7. The main problem I found was excessive memory usage, which I'll benchmark and discuss later on.

Also note that Flux isn't a giant collection of algorithms like SciKit Learn; it is specific to Neural Networks. There are other Julia libraries for things like Random Forests, but you need to find the correct package and install it, and each of these may or may not fully support Julia 1.0 yet.
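For instance (the package choice beyond Flux is just my own example), installing additional packages is done through Julia's built-in package manager:

using Pkg
Pkg.add("Flux")           # the Neural Network library used below
Pkg.add("DecisionTree")   # one example of a separate package providing Random Forests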

MNIST in Flux

To give a flavour of using Julia and Flux, here are a couple of examples from the FluxML model zoo. You can see it's very simple to set up the Neural Network layers, perform the training and test the accuracy.

using Flux, Flux.Data.MNIST, Statistics
using Flux: onehotbatch, onecold, crossentropy, throttle
using Base.Iterators: repeated
# using CuArrays

# Classify MNIST digits with a simple multi-layer-perceptron
imgs = MNIST.images()

# Stack images into one large batch
X = hcat(float.(reshape.(imgs, :))...) |> gpu

labels = MNIST.labels()
# One-hot-encode the labels
Y = onehotbatch(labels, 0:9) |> gpu

m = Chain(
  Dense(28^2, 32, relu),
  Dense(32, 10),
  softmax) |> gpu

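# Cross entropy loss between the model's predictions and the one-hot labels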
loss(x, y) = crossentropy(m(x), y)

accuracy(x, y) = mean(onecold(m(x)) .== onecold(y))

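# Train on the same full batch repeated 200 times; the optimiser is built from the model's parameters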
dataset = repeated((X, Y), 200)
evalcb = () -> @show(loss(X, Y))
opt = ADAM(params(m))

Flux.train!(loss, dataset, opt, cb = throttle(evalcb, 10))

println("acc X,Y ", accuracy(X, Y))

# Test set accuracy
tX = hcat(float.(reshape.(MNIST.images(:test), :))...) |> gpu
tY = onehotbatch(MNIST.labels(:test), 0:9) |> gpu

println("acc tX, tY ", accuracy(tX, tY))

Here is a more sophisticated model which uses a convolutional Neural Network.

using Flux, Flux.Data.MNIST, Statistics
using Flux: onehotbatch, onecold, crossentropy, throttle
using Base.Iterators: repeated, partition
# using CuArrays

# Classify MNIST digits with a convolutional network
imgs = MNIST.images()

labels = onehotbatch(MNIST.labels(), 0:9)

# Partition into batches of size 1,000
train = [(cat(float.(imgs[i])..., dims = 4), labels[:,i])
         for i in partition(1:60_000, 1000)]

train = gpu.(train)

# Prepare test set (first 1,000 images)
tX = cat(float.(MNIST.images(:test)[1:1000])..., dims = 4) |> gpu
tY = onehotbatch(MNIST.labels(:test)[1:1000], 0:9) |> gpu

m = Chain(
  Conv((2,2), 1=>16, relu),
  x -> maxpool(x, (2,2)),
  Conv((2,2), 16=>8, relu),
  x -> maxpool(x, (2,2)),
  x -> reshape(x, :, size(x, 4)),
  Dense(288, 10), softmax) |> gpu

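# Run the model once on the first batch as a quick sanity check that the dimensions line up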
m(train[1][1])

loss(x, y) = crossentropy(m(x), y)

accuracy(x, y) = mean(onecold(m(x)) .== onecold(y))

evalcb = throttle(() -> @show(accuracy(tX, tY)), 10)
opt = ADAM(params(m))

Flux.train!(loss, train, opt, cb = evalcb)

Performance

One of Julia’s promises is the ease of use of a scripting language like Python with the speed of a compiled language like C. As it stands, Flux isn’t there yet. There are points where Flux goes away for a long time; these might be the garbage collector kicking in, or something else. I find the speed is about the same order of magnitude as other systems (modulo the pauses), but the big problem is memory usage.

Solving MNIST with a convolutional Neural Network from Python using the TensorFlow tutorial runs quite well and uses about 400Meg of memory. Running a similar model using Julia and TensorFlow uses 600Meg of memory. Running the simple model above using Julia and Flux takes 2Gig of memory, and running the convolutional model above uses 2.6Gig. The laptop I'm using has 4Gig of RAM and runs Ubuntu Linux, which is why I think the big stalls in performance are garbage collection.
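One rough way to see where the time and memory go (my own sketch, not part of the benchmarks above, reusing the loss, dataset, opt and evalcb from the first example) is Julia's built-in @time macro, which reports total allocation and the percentage of time spent in garbage collection:

@time Flux.train!(loss, dataset, opt, cb = throttle(evalcb, 10))
GC.gc()    # force a collection between runs so the pauses are easier to attribute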

The problem with these numbers is that MNIST is a nice small dataset and the model used to solve it isn't very large as Neural Networks go. If Flux uses six times as much memory as the Python version, that really diminishes its usefulness as an ML toolkit.

I spent a bit of time looking at the Julia Differential Equations tutorial. It points out that naive use of matrix operations in Julia can lead to lots of unnecessary temporary storage, for instance to evaluate:

D = A + B + C

Where A, B and C are all large matrices, Julia has to create a temporary matrix to hold the sum A + B, which is then added to C. This temporary has to be allocated from the heap and later garbage collected, and that process seems to be rather inefficient in Julia, at least going by all the workarounds provided to avoid this situation. There are SVectors, small fixed-size vectors that can be allocated on the stack rather than the heap. They also recommend the dotted broadcast operators like .+, which work element by element and fuse operations so they don't create lots of temporary values on the heap. I wonder if Flux needs some of the optimisations that went into the Differential Equations library.
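To make that concrete, here is a small sketch (my own illustration, not taken from the Differential Equations tutorial) of the dotted, fused forms and of a stack-allocated SVector (which needs the StaticArrays package):

A = rand(1000, 1000); B = rand(1000, 1000); C = rand(1000, 1000)

D = A + B + C       # allocates a temporary for A + B, then another matrix for the result
D = A .+ B .+ C     # dotted operators fuse into a single loop with one allocation
D .= A .+ B .+ C    # with D preallocated, .= writes the result in place
@. D = A + B + C    # the @. macro adds the dots for you

using StaticArrays
v = SVector(1.0, 2.0, 3.0)    # a small fixed-size vector that lives on the stack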

Summary

Julia and Flux make a nice system for Machine Learning in theory. I think that until the technology matures a bit and problems like memory usage are better addressed, using it for large projects is problematic. A lot of the current ML work with Flux is being done by PhD candidates who are developing Flux as part of their thesis work. Hopefully they improve the memory usage and allow Flux and Julia to live up to their full potential.

 


Written by smist08

September 24, 2018 at 9:02 pm
