Stephen Smith's Blog

Musings on Machine Learning…

Archive for the ‘programming’ Category

Getting Productive with Julia


Introduction

Julia is a programming language that is used quite extensively by the scientific community. It is open source, it just reached its version 1.0 milestone after quite a few years of development, and it is nearly as fast as C while offering many of the features associated with interpreted languages like R or Python.

There don’t seem to be many articles on getting up and running with Julia, so I thought I’d write about some things that I found useful. This is all based on playing with Julia on my laptop running Ubuntu Linux.

Run in the Cloud

One way to avoid any installation hassles is to just run in the cloud. You can do this with JuliaBox. JuliaBox gives you a Jupyter Notebook interface where you can either play with the various tutorials or do your own programming. Just beware that the resources you get for free are quite limited and JuliaBox makes its money by charging you for additional time and computing power.

Sadly at this point, there aren’t very many options for running Julia in the cloud since the big AI clouds seem to only offer Python and R. I’m hoping that Google’s Kaggle will add it as an option, since the better performance will open up some intriguing possibilities in their competitions.

JuliaBox gives you easy direct access to all the tutorials offered on Julia’s learning site. Running through the YouTube videos and playing with these notebooks is a great way to get up to speed with Julia.

Installing Julia

Julia’s website documents how to install Julia on various operating systems. Generally the Julia installation is just copying files to the right places and adding the Julia executable to the PATH. On Ubuntu you can search for Julia in the Ubuntu Software app and install it from there. Either way this is usually pretty straightforward. This gives you the ability to run Julia programs by typing “julia sourcefile.jl” at a command prompt. If you just type “julia” you get the REPL environment for entering commands.
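For example, a minimal program (hello.jl is just an illustrative name) could look like this:

# hello.jl - a trivial Julia program
name = "Julia"
println("Hello from ", name)
println("2 + 2 = ", 2 + 2)

Then “julia hello.jl” runs it and prints the two lines.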

You can do quite a lot in the REPL, but I don’t find it very useful myself except for doing things like package management.

If you like programming by coding in your favorite text editor and then just running the program, then this is all you need. For many purposes this works out great.

The Juno IDE

If you enjoy working in full IDEs then there is Juno, which is based on the open source Atom IDE. There are commercial variants of this IDE with full support, but I find the free version works just fine.

To install Juno you follow the instructions on the Juno website. Basically this involves installing the Atom IDE by downloading and running a .deb installation package, then, from within Atom, adding Julia support to the IDE.

Atom has integration with Julia’s debugger Gallium, and provides a plot pane and access to watch variables. The editor is fairly good with syntax highlighting. Generally it is not a bad way to develop in Julia.

Jupyter

JuliaBox, mentioned above, uses Jupyter and runs it in the cloud. However, you can install it locally as well. Jupyter is a very popular and powerful notebook metaphor for developing programs where you see the results of each incremental bit of code as you write it. It is really good at displaying all sorts of fancy formats like graphs. It is web based and runs a local web server that uses your local Julia installation to execute the code. If you develop in Python or R, then you’ve probably already played with Jupyter.

To use it locally you first have to install Jupyter itself. The best way to do this on Ubuntu is “sudo apt install jupyter”, which installs the full Jupyter environment with full Python support. To add Julia support to Jupyter, you need to run Julia another way (like just entering julia to get the REPL) and type “Pkg.add(“IJulia”)”. The next time you start Jupyter (usually by typing “jupyter notebook”), you can create a new notebook based on Julia rather than Python.
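In Julia 1.0 the package manager has to be loaded before you can use it, so the full sequence from the REPL looks roughly like this sketch:

# From the Julia REPL: install the Julia kernel for Jupyter
using Pkg
Pkg.add("IJulia")

# Optionally, IJulia can also launch the notebook server for you
using IJulia
notebook()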

Julia Packages

Once you have the core Julia programming environment up and running, you will probably want to install a number of add-on packages. The package manager is called Pkg and you need to type “using Pkg” before using it. Packages are installed with the Pkg.add(“PackageName”) command. You only need to add a package once. You will probably want to run “Pkg.update()” now and again to see if the packages you are using have been updated.
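As a sketch, a typical package management session in the REPL looks something like the following (Flux and DataFrames are just example package names):

using Pkg

Pkg.add("Flux")          # install a package (only needed once)
Pkg.add("DataFrames")
Pkg.status()             # list the installed packages and their versions
Pkg.update()             # update installed packages now and again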

There are currently about 1900 registered Julia packages. Not all of them have been updated to Julia version 1.0 yet, so check the compatibility first. There are a lot of useful packages for things like machine learning, scientific calculations, data frames, plotting, etc. Certainly have a look at the package library before embarking on writing something from scratch.

Summary

These are currently the main ways to play with Julia. Since Julia is a very open, community driven system, I’m sure these will proliferate. I don’t miss using the giant IDEs Visual Studio or Eclipse; these have become far too heavy and slow in my opinion. I find I evenly distribute my time between using Jupyter, Juno and just edit/run. Compared to Python it may appear there aren’t nearly as many tools for Julia, but with the current set, I don’t feel deprived.

 


Written by smist08

October 10, 2018 at 3:55 am

Julia Flux for Machine Learning


Introduction

Flux is a Neural Network Machine Learning library for the Julia programming language. It is entirely written in Julia and relies on Julia’s built-in support for running on GPUs and providing distributed processing. It makes writing Neural Networks easy and leverages the power and expressiveness of the Julia language to make creating your Neural Network just like writing any other Julia code.

My last article pointed out some problems with using TensorFlow from Julia, due to many of the newer features being implemented in Python rather than in the core shared library. One recommendation from the TensorFlow folks is that if you want eager execution then use Flux rather than TensorFlow. The Flux folks claim a real benefit of Flux over TensorFlow is that you only need to know one language to do ML, whereas for TensorFlow you need to know TensorFlow (its graph language) plus the host language like Python. Then it’s confusing because there is a lot of duplication and it isn’t always clear in which system to do things or whether to use a TensorFlow or Python data type. Flux simplifies all this.

Although this all sounds wonderful remember that Julia just hit version 1.0 and Flux just hit version 0.67. The main problem I found was excessive memory usage, which I’ll benchmark and discuss later on.

Also note that Flux isn’t a giant compilation of algorithms like SciKit Learn. It is rather specific to Neural Networks. There are other libraries available in Julia for things like Random Forests, but you need to find the correct package and install it. Then each of these may or may not fully support Julia 1.0 yet.

MNIST in Flux

To give a flavour for using Julia and Flux, here are a couple of examples from the FluxML model zoo. You can see it’s very simple to set up the Neural Network layers, perform the training and test the accuracy.

using Flux, Flux.Data.MNIST, Statistics
using Flux: onehotbatch, onecold, crossentropy, throttle
using Base.Iterators: repeated
# using CuArrays

# Classify MNIST digits with a simple multi-layer-perceptron
imgs = MNIST.images()

# Stack images into one large batch
X = hcat(float.(reshape.(imgs, :))...) |> gpu

labels = MNIST.labels()
# One-hot-encode the labels
Y = onehotbatch(labels, 0:9) |> gpu

m = Chain(
  Dense(28^2, 32, relu),
  Dense(32, 10),
  softmax) |> gpu

loss(x, y) = crossentropy(m(x), y)

accuracy(x, y) = mean(onecold(m(x)) .== onecold(y))

dataset = repeated((X, Y), 200)     # train on the full batch 200 times
evalcb = () -> @show(loss(X, Y))    # callback that prints the loss during training
opt = ADAM(params(m))               # ADAM optimiser over the model's parameters

Flux.train!(loss, dataset, opt, cb = throttle(evalcb, 10))

println("acc X,Y ", accuracy(X, Y))

# Test set accuracy
tX = hcat(float.(reshape.(MNIST.images(:test), :))...) |> gpu
tY = onehotbatch(MNIST.labels(:test), 0:9) |> gpu

println("acc tX, tY ", accuracy(tX, tY))

Here is a more sophisticated model which uses a convolutional Neural Network.

using Flux, Flux.Data.MNIST, Statistics
using Flux: onehotbatch, onecold, crossentropy, throttle
using Base.Iterators: repeated, partition
# using CuArrays

# Classify MNIST digits with a convolutional network
imgs = MNIST.images()

labels = onehotbatch(MNIST.labels(), 0:9)

# Partition into batches of size 1,000
train = [(cat(float.(imgs[i])..., dims = 4), labels[:,i])
         for i in partition(1:60_000, 1000)]

train = gpu.(train)

# Prepare test set (first 1,000 images)
tX = cat(float.(MNIST.images(:test)[1:1000])..., dims = 4) |> gpu
tY = onehotbatch(MNIST.labels(:test)[1:1000], 0:9) |> gpu

m = Chain(
  Conv((2,2), 1=>16, relu),          # 2x2 convolution, 1 input channel -> 16 feature maps
  x -> maxpool(x, (2,2)),
  Conv((2,2), 16=>8, relu),          # 2x2 convolution, 16 -> 8 feature maps
  x -> maxpool(x, (2,2)),
  x -> reshape(x, :, size(x, 4)),    # flatten to (features, batch) for the dense layer
  Dense(288, 10), softmax) |> gpu

m(train[1][1])

loss(x, y) = crossentropy(m(x), y)

accuracy(x, y) = mean(onecold(m(x)) .== onecold(y))

evalcb = throttle(() -> @show(accuracy(tX, tY)), 10)
opt = ADAM(params(m))

Flux.train!(loss, train, opt, cb = evalcb)

Performance

One of Julia’s promises is the ease of use of a scripting language like Python with the speed of a compiled language like C. As it stands Flux isn’t there yet. There seem to be some points where Flux goes away for a long time. These might be the garbage collector kicking in, or something else. I find the speed is about the same order of magnitude as other systems (modulo the pauses), but the big problem is memory usage.
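One quick way to check whether the pauses are garbage collection is Julia’s built-in @time macro, which reports total allocations and the percentage of time spent in GC. A minimal sketch, using the loss, X and Y from the first example above:

# Prints something like: 0.45 seconds (1.2 M allocations: 310 MiB, 15% gc time)
@time loss(X, Y)

# For more careful measurements, the BenchmarkTools package (Pkg.add("BenchmarkTools"))
# provides @btime and @benchmark, which run the code repeatedly.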

To solve MNIST using a convolutional Neural Network from Python using the TensorFlow tutorial runs quite well and uses 400Meg of memory. Running the similar model using Julia and TensorFlow uses 600Meg of memory. Running the simple model above using Julia and Flux takes 2Gig of memory. Running the convolutional model above uses 2.6Gig. The laptop I’m using has 4Gig of RAM and is running Ubuntu Linux. This is why I think the big stalls in performance are garbage collection.

The problem with this is that MNIST is a nice small dataset and the model used to solve it isn’t very large as Neural Networks go. If Flux is using six times as much memory as Python then it really diminishes its usefulness as an ML toolkit.

I spent a bit of time looking at the Julia Differential Equations tutorial. It points out that using matrix operations in the Julia expression evaluator leads to lots of unnecessary temporary storage, for instance to evaluate:

D = A + B + C

Where these are all large matrices, Julia has to create a temporary matrix to hold the sum A + B, which is then added to C. This temporary matrix has to be allocated from the heap and then later garbage collected. This process seems to be rather inefficient in Julia, at least going by all the workarounds they provide to avoid this situation. They have SVectors which are for small vectors that can be allocated on the stack rather than the heap. They recommend using the dotted .+ operator, which works element by element and, when the whole expression is dotted, fuses into a single loop rather than creating lots of temporary values on the heap. I wonder if Flux needs some of the optimisations they spent so much time putting into the Differential Equations library.
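As a rough illustration of the difference (A, B and C here are just arbitrary example matrices), the undotted form builds an intermediate matrix while the fused, dotted forms do not:

A = rand(1000, 1000)
B = rand(1000, 1000)
C = rand(1000, 1000)

# Allocates a temporary for A + B, then another array for the final result
D = A + B + C

# Fused broadcast: a single loop and a single result array, no intermediate temporary
D = A .+ B .+ C

# In-place fused broadcast into the existing array, no new allocations at all
D .= A .+ B .+ C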

Summary

Julia and Flux make a nice system for Machine Learning in theory. I think that until the technology matures a bit and some problems like memory management are better addressed, using this for large projects is a bit problematic. A lot of the current ML systems being worked on with Flux are by PhD candidates who are developing Flux as part of their thesis work. Hopefully they improve the memory usage and allow Flux and Julia to live up to their full potential.

 

Written by smist08

September 24, 2018 at 9:02 pm

TensorFlow from Julia


Introduction

Last time, I gave a quick introduction to the Julia programming language, which has just reached the 1.0 release mark after ten years of development. Julia is touted as the next great thing for scientific computing, machine learning, data science and artificial intelligence. Its hope is to supplant Python, which is currently the go-to language in these fields. The goal is a more unified language; since it was developed well after Python, it learned from a lot of Python’s mistakes. It also claims to have the flexibility of Python but with the speed of a true compiled language like C.

I saw that in the list of packages there was support for using Google’s TensorFlow AI system natively from Julia so I thought I would give this a try. Although it worked, it did reveal some challenges that Julia is going to face in its battle to become a true equal with Python.

Using TensorFlow in Julia

The TensorFlow wrapper/interface for Julia is in a package created by a PhD candidate at MIT, Jon Malmaud. You can add it to Julia using Pkg.add(“TensorFlow”) as well as view the source code on GitHub. Since I wrote an article recently comparing TensorFlow running on a Raspberry Pi to running on my laptop, I thought I’d use the same example and compare Julia to those cases. I cut/pasted the code into the Julia IDE Juno and made some code syntax changes and gave it a go. It came back that the Keras object was undefined.

I then noticed that in the Tensorflow.jl GitHub repository there were a couple of examples doing predictions on the MNIST dataset, so at least these were solving the same problem as my article, just using different models. I fired these up, but they failed with syntax errors in the code to load the MNIST dataset. Right now it is a bit of a problem in Julia that not all libraries have been updated to the Julia 1.0 syntax. I had a look at the library used to load MNIST and noticed that no one had contributed to it in three years. It appeared to be abandoned with no plans to continue it. After a bit more research I found another Julia package called MLDatasets that was maintained and would load MNIST along with several other popular datasets.

I logged an issue with the Tensorflow.jl repository that they should fix this. They replied that they didn’t have time but if I wanted to fix it, to go ahead. So I fixed this and checked it in to the Tensorflow.jl Github. So now these MNIST examples work with Julia 1.0. I was then happy to have given my small contribution back to this community.

I then thought, why not be ambitious and add the Keras layer to Tensorflow.jl? Well this led to some interesting revelations to how Tensorflow is architected.

Problems with the Tensorflow Architecture

Looking at some of the issues in the Tensorflow.jl library there were requests for things like TensorFlow’s eager execution and the TensorFlow layers interface. The answer to these issues was that the Julia interface only talked to the DLL/SO interface to Tensorflow and that these modules didn’t exist there and were in fact written in Python rather than C++. I had a look inside the TensorFlow Github and found that their Keras layer is also written in Python.

Originally Tensorflow.jl talked to the Tensorflow Python interface. Julia is really good at interoperability and can easily talk to both Python libraries and C/C++ DLLs/SOs. The problem with talking to Python libraries is that it involves running a Python interpreter and passing every call through it to execute the code, which tends to be way slower than calling DLLs or SOs directly. So early on the TensorFlow.jl library was changed to just talk to the DLL/SO interface for Tensorflow and eliminated all Python dependencies. This then lets Julia use the really performant part of TensorFlow and perform all the core operations very quickly.
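As a sketch of the two interop styles (assuming the PyCall package has been installed with Pkg.add(“PyCall”)):

# Calling straight into C code already loaded in the process - no Python involved
ticks = ccall(:clock, Int32, ())
println("clock ticks: ", ticks)

# Calling into a Python module via the PyCall package
using PyCall
pymath = pyimport("math")
println(pymath[:sqrt](2.0))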

Now the problem seems to be that Google is doing a lot of the new Tensorflow development in Python and not putting the code into the core shared library. Google is also spending a lot of time promoting these new interfaces as the way to go. This means if you aren’t programming in Python you are definitely a second class citizen.

OK, so is this just bad for the newbie language Julia? Should Julia programmers just use the Julia native Flux AI library? Well, the other thing Google is promoting is running TensorFlow on things like mobile devices, but then you are accessing TensorFlow from Swift on iOS or from Java on Android. Now you have the same problems as the Julia programmer. You only have efficient access to the core low level APIs for TensorFlow and all the new fancy high level access is denied to you. Google’s API block diagram highlights this.

To me this is a big architectural problem with TensorFlow. It’s great to use from Python, but is really limited in other environments. The videos and blogs starting to surface on TensorFlow 2.0 indicate that eager execution and the Keras layer will be the default and primary ways to program with TensorFlow. This then begs the question of whether these will be moved into the core shared library or will remain as Python code. At this point I haven’t seen this explained, but as we get closer to the 2.0 preview later this year, I’ll be watching this keenly.

It would certainly be nice if they move this Python code into C++ in the shared library so everyone can use it. At that point I think TensorFlow would be much more usable from Julia, Swift, Java, C++, etc. Here’s hoping that is a major upgrade in the 2.0 release.

Julia TensorFlow Code

Just for interest, here is the simplest Julia MNIST example to give a flavour for the code. This is a simple linear model, so it doesn’t give great results. There is a more complicated example that uses a convolutional neural network and gives far superior results.

using TensorFlow
include("mnist_loader.jl")

loader = DataLoader()

sess = Session(Graph())

# Placeholders for the input images and the one-hot encoded labels
x = placeholder(Float32)
y_ = placeholder(Float32)

# Weights and biases for a single linear layer (784 pixels -> 10 classes)
W = Variable(zeros(Float32, 784, 10))
b = Variable(zeros(Float32, 10))

run(sess, global_variables_initializer())

# Linear model with a softmax output
y = nn.softmax(x*W + b)

# Cross entropy loss and a gradient descent training step
cross_entropy = reduce_mean(-reduce_sum(y_ .* log(y), axis=[2]))
train_step = train.minimize(train.GradientDescentOptimizer(.00001), cross_entropy)

# Fraction of predictions that match the labels
correct_prediction = argmax(y, 2) .== argmax(y_, 2)
accuracy = reduce_mean(cast(correct_prediction, Float32))

for i in 1:1000
    batch = next_batch(loader, 100)
    run(sess, train_step, Dict(x=>batch[1], y_=>batch[2]))
end

testx, testy = load_test_set()

println(run(sess, accuracy, Dict(x=>testx, y_=>testy)))

Summary

You can certainly use TensorFlow from Julia. Just beware that you are limited to the lower level APIs, so anything TensorFlow has implemented in Python isn’t available to you. This means you set up the graph and then execute it, really like you always did in the earlier versions of TensorFlow. It would certainly be nice if Google fixes this problem for TensorFlow 2.0.

Written by smist08

September 22, 2018 at 5:43 pm

Playing with Julia 1.0 on the Raspberry Pi


Introduction

A couple of weeks ago I saw the press release about the release of version 1.0 of the Julia programming language and thought I’d check it out. I saw it was available for the Raspberry Pi, so I booted up my Pi and installed it. Julia has been in development since 2012; it was created by four MIT professors as an open source project for mathematical computing.

Why Julia?

Most people doing data science and numerical computing use the Python or R languages. Both of these are open source languages with huge followings. All new machine learning projects need to integrate with these to get anywhere. Both are very productive environments, so why do we need a new one? The main complaint about Python and R is that these are interpreted languages and as a result are very slow when compared to compiled languages like C. They both get around this by supporting large libraries of optimized code written in C, C++, Assembler and Fortran to provide highly optimized off the shelf algorithms. These work great, but if none of them applies and you need to write Python loops to process a large data set then it can get really frustrating. Another frustration with Python is that it doesn’t have a built-in array data type and relies on the numpy and pandas libraries. Between these you can do a lot, but there are holes and strange differences between the two systems.

Julia has a powerful built-in array type and most of the array manipulation features of numpy and pandas are built into the core language. Further, Julia was created from scratch around powerful new just-in-time (JIT) compiler technology to provide the speed of development of an interpreted language combined with the speed of a compiled language. You don’t get the full speed of C, but it’s close and a lot better than Python.

The Julia language borrows a lot of features from Python and I find programming in it quite similar. There are tuples, sets, dictionaries and comprehensions. Functions can return multiple values. For loops work very similarly to Python with ranges (using the : built into the language rather than the range() function).
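A small sketch of these features side by side:

# Tuples, dictionaries and comprehensions look much like Python
point = (3, 4)
ages = Dict("Alice" => 30, "Bob" => 25)
squares = [x^2 for x in 1:10]

# Functions can return multiple values as a tuple
function lowhigh(a, b)
    return min(a, b), max(a, b)
end
lo, hi = lowhigh(7, 3)

# For loops use ranges built with the : syntax; blocks end with the end keyword
for i in 1:3
    println("iteration ", i)
end

println(point, " ", ages["Alice"], " ", squares[3], " ", lo, " ", hi)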

Julia can call C functions directly (meaning you can get pointers to objects), and this has allowed wrappers to be created for other systems such as TensorFlow. This is why Julia is very precise about the physical representation of data types and about the ability to get a pointer to any data.
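A small sketch of what that looks like in practice, using functions from the C runtime that is already loaded in the process:

# Julia arrays have a defined memory layout, so C code can work on them directly
v = Float64[1.0, 2.0, 3.0]
p = pointer(v)                 # raw Ptr{Float64} to the array's data
println(unsafe_load(p, 2))     # reads v[2] through the raw pointer

# Calling a C function directly: memcmp comparing two byte buffers
a = UInt8[1, 2, 3]
b = UInt8[1, 2, 4]
r = ccall(:memcmp, Cint, (Ptr{UInt8}, Ptr{UInt8}, Csize_t), a, b, 3)
println(r < 0)                 # true: a compares less than b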

Julia uses the end keyword to terminate blocks of code, rather than Python’s forced indentation or C’s curly braces. You can use semicolons to put multiple statements on one line, but you don’t need one at the end of a line unless you want to suppress the display of the result in the REPL.

Julia has native built-in support for most numeric data types including complex numbers and rational numbers. It has types for all the common hardware supported ints and floats. Then it also has arbitrary precision types built around GNU’s bignum libraries.
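A brief sketch of these numeric types:

# Hardware sized ints and floats
a = Int8(100)
b = Float32(1.5)

# Complex and rational numbers are built in
z = (1 + 2im) * (3 - im)     # Complex{Int64}: 5 + 5im
q = 3//4 + 1//6              # Rational{Int64}, kept exact: 11//12

# Arbitrary precision integers and floats (GMP/MPFR underneath)
println(factorial(big(50)))
println(big(pi))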

There are currently 1906 registered Julia packages and you can see the emphasis on scientific computing, along with machine learning and data science.

The creators of Julia always keep performance top of mind. As a result the parallelization support is exceptional, along with the ability to run Julia code on NVidia CUDA graphics cards and to easily set up clusters.
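For example, a minimal sketch of the multi-threading support (start Julia with the JULIA_NUM_THREADS environment variable set, e.g. JULIA_NUM_THREADS=4):

# Each thread fills in its own slice of the result array
results = zeros(1000)
Threads.@threads for i in 1:1000
    results[i] = sum(sin(j) for j in 1:i)    # some arbitrary per-element work
end
println(Threads.nthreads(), " threads, total = ", sum(results))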

Is Julia Ready for Prime Time?

As of the time of this writing, the core Julia 1.0 language has been released and looks quite good. Many companies have produced impressive working systems with the 0.x versions of Julia. However right now there are a few problems.

  • Although Julia 1.0 has been released, most of the add-on packages haven’t been upgraded to this version yet. In the 1.0 release you need to load the Pkg package (with “using Pkg”) before you can add other packages, perhaps to discourage people from using them yet. For instance the library with GPIO support for the Pi is still at version 0.6 and if you add it to 1.0 you get a syntax error in the include file.
  • They have released the binaries for all the versions of Julia, but these haven’t made it into the various package management systems yet. So for instance if you do “sudo apt install julia” on a Raspberry Pi, you still get version 0.6.

Hopefully these problems will be sorted out fairly quickly and are just a result of being too close to the bleeding edge.

I was able to get Julia 1.0 going on my Raspberry Pi by downloading the ARM32 files from Julia’s website and then manually copying them over the 0.6 release. Certainly 1.0 works much better than 0.6 (which segmentation faults pretty much every time you have a syntax error). Hopefully they update Raspbian’s apt repository shortly.

Julia for Machine Learning

There is a TensorFlow.jl wrapper to use Google’s TensorFlow. However the Julia group put out a white paper dissing the TensorFlow approach. Essentially TensorFlow is a separate programming language that you use from another programming language like Python. This results in a lot of duplication and forces the programmer to operate in two different paradigms at once. To solve this problem, Julia has the Flux machine learning system built natively in Julia. This is a fairly powerful machine learning system that is really easy to use, reducing the learning curve to getting working models. Hopefully I’ll write a bit more about Flux in a future article.

Summary

Julia 1.0 looks really promising. I think in a month or so all the add-on packages should be updated to the 1.0 level and all the binaries should make it out to the various package distribution repositories. In the meantime, it’s a good time to learn Julia and you can accomplish a lot with the core language.

I was planning to publish a version of my LED flashing light program in Julia, but with the PiGPIO package not updated to 1.0 yet, this will have to wait for a future article.

 

Written by smist08

August 31, 2018 at 7:34 pm

UI Testing in Swift


Introduction

To round out my blog series on an introduction to Swift, this posting will be covering UI Testing. Previously we created a simple Swift program to draw a Koch Snowflake, adding some unit tests and then added some performance tests.

The source code for the project is on Google Drive here.

UI Testing actually runs the program like an end user would and, if you switch to the simulator while the test is running, you can watch these actions take place. Unlike many other UI testing frameworks, this one just interacts with the screen controls; if done properly there is no code that does things at specific (x,y) coordinates. The magic that makes this work is the iOS accessibility layer that was created to help people with disabilities use Apple products. For instance, the VoiceOver feature that reads the screen needs to interact with the controls in the same way as our UI Tests.

This then means that UI Tests also provide a good means for testing some of the accessibility aspects of our iOS applications. Fully supporting accessibility is an often neglected area and really deserves more consideration. The great thing here is that by making your UI Tests thorough you are also validating that many accessibility technologies will also work.

UI Testing in XCode

When you create a new Swift project in XCode and select unit testing, you also get a skeletal group for UI Tests with some setup and a dummy test. You create your test by selecting an empty (or not) test, pressing record and then manually performing the test steps. When you close the simulator a bunch of recorded code will be pasted into your project. This is a great starting point for writing more thorough tests. You then use all the same XCTAssert type functions as in the unit testing framework to check for problems.

[Screenshot: recording a UI test in XCode]

Gotchas

Not Having Accessibility Setup Correctly

If you haven’t set an accessibility identifier for your control, you won’t get the correct code recorded. Recording will try its best, but it will give you something that probably won’t work. This happened to me; I kept the bad code from the first attempt in the file, commented out, so you can see it. Generally, if the accessibility is set up right, the code is simple, complete and will work. If not, you will find actions missing from the recording and other things failing with hardware or synchronization problems (strange errors which have workarounds if you google them, but it all becomes quite complicated).

[Screenshot: recorded UI test code]

Keyboards and other Hardware

I performed my tests on my MacBook which of course has a fixed keyboard. When recording tests, make sure you use the iOS keyboard (the one on the screen). Generally, you want the tests to use all the iOS stuff and not the macOS stuff that makes using the simulator easier. Another approach is to access text fields via the clipboard using cut/paste so as to avoid the keyboard entirely. I tend to think for a good UI test you should test all the cases, but perhaps not on every text field. Also beware of text already in text boxes that may need to be cleared first. One way to do this (probably the best way) is to add a clear button in the text box’s properties and then press it. In the recorded sample I hit the delete key a couple of times. Note that tapping a field usually doesn’t select all the text.

Synchronization

Beware that if you cause something to pop up or be created, chances are your test code will run faster than that and start using things before they are created. You will need to add wait loops to wait for controls to exist before using them. This case doesn’t happen in the Koch snowflake program. Generally, you don’t want to insert sleep type statements to wait a couple of seconds; this slows down your UI tests, can prove unreliable and leads to investigating a lot of falsely failed tests. It’s always better to wait for specific events and then proceed quickly.

The Test

The code for the test is below. The setUp and tearDown methods were generated by XCode and I didn’t change them. The code for the testExample routine was generated by recording, then I just cleaned up a bit of noise. The intent is that it sets the fractal level to 3 and then to 4. If you watch the simulator while it is running, you can see this happen. Unfortunately, there isn’t really a good way to validate that it works correctly, so this is really only a “runs without crashing” sort of test, unless you observe it manually.

//
//  KochSnowFlakeUITests.swift
//  KochSnowFlakeUITests
//
//  Created by Stephen Smith on 2016-05-13.
//  Copyright © 2016 Stephen Smith. All rights reserved.
//

import XCTest

class KochSnowFlakeUITests: XCTestCase {

    override func setUp() {
        super.setUp()

        // Put setup code here. This method is called before the invocation of each
        // test method in the class.
        // In UI tests it is usually best to stop immediately when a failure occurs.

        continueAfterFailure = false

        // UI tests must launch the application that they test.
        //Doing this in setup will make sure it happens for each test method.
        XCUIApplication().launch()

        // In UI tests it’s important to set the initial state -
        // such as interface orientation - required for your tests before they run.
        // The setUp method is a good place to do this.
    }

    override func tearDown() {
        // Put teardown code here. This method is called after the invocation of
        // each test method in the class.
        super.tearDown()
    }

    func testExample() {

        let app = XCUIApplication()
        app.textFields["textField"].tap()

        let deleteKey = app.keys["delete"]
        deleteKey.tap()
        deleteKey.tap()
        app.textFields["textField"].typeText("3")

        let returnButton = app.buttons["Return"]
        returnButton.tap()

        deleteKey.tap()
        deleteKey.tap()
        app.textFields["textField"].typeText("4")
        returnButton.tap()
    }
}

Summary

The UI testing support built into XCode and Swift is quite nice, certainly comparable to some quite expensive packages available in the Windows world. Since iOS and macOS are quite controlled environments and the accessibility support is good, this framework works well. The main thing to watch out for is the proliferation of Apple hardware to check. It appears that going forward Apple is spending quite a bit of time ensuring automated testing works well for their development platform.

Written by smist08

June 15, 2016 at 4:09 pm

Posted in Mobility, programming


Performance Testing in Swift


Introduction

A couple of blog posts ago I covered writing my first Swift program for iOS so that I could draw a Koch Snowflake on an iPad or an iPhone. Then last time I covered adding unit tests to that project. This time I’m going to add performance tests.

In the process of adding performance tests, I had to refactor the test project and we’ll also look at why that was and how it makes it better going forwards as more tests are added. I’ll also mention a few things that should be done if this project gets a bit bigger.

I put an updated version of the Koch Snowflake project on Google Drive here.

Performance Tests in XCode

Of course you could instrument your program yourself and perhaps write the performance results out to a file or something; for that matter, you can drill down into the Swift test case class and have a look at its implementation. But XCode gives you enough support that you generally don’t need to. If you add self.measureBlock {} around code in a unit test, then the time taken by the code inside the measureBlock will be recorded and reported inside XCode, as shown in the following screenshot.

[Screenshot: XCode reporting the measured time for a performance test]

Actually it does a bit more than that. When you add measureBlock to a unit test, then when you run that unit test it won’t just be run once, but will be run ten times, so that the average and standard deviation can be recorded. Due to this it is crucial that any performance tests are idempotent. You can also set a baseline, so the percentage deviation from the baseline gets recorded. This is shown in the following screenshot, which is a drill-down from the previous one.

[Screenshot: drill-down showing the individual runs, standard deviation and baseline comparison]

Hence XCode gives a fairly painless way to add some performance metrics to your unit tests.

Test Case Organization

Generally, you want your unit tests to run against every build of your product, so you want them to run in a second or two. Once the performance tests get longer, you will probably want to separate them off into a separate test group and then run this test group perhaps once over night. I haven’t done that, but if the project gets any bigger then I will.

In fact, the test framework inside XCode is quite good for performing integration tests (which would run against real databases and real servers), but since these may require some setup or be quite time consuming, you could also set these to run once per night.

There is also a separate framework for UI testing, which again is too slow to run against every build, but makes sense to run every night.

Refactoring the Unit Tests

For the performance test, I wanted to record the time it takes to draw the Koch Snowflake at various fractal levels. To do this I wanted to do something like the previous testInitialViewController routine, but it contained a lot of setup code. The unit test framework includes a setUp function that is called before each test method is run and a tearDown routine that is called after it finishes. So I moved the creation of the graphics context to setUp, along with the code to get the view controller started. Then it was fairly easy to add tests for fractal levels 3 through 7.

Last time I just had 2 unit tests; each was quite large and performed multiple things. Now we’ve split things up into more unit tests that do less, which is generally a better practice. This was actually forced on me since you can only have one measureBlock in any unit test, so I couldn’t performance test the different fractal levels in the same unit test (at least with separate timings). Really I should break up the turtle graphics tests into multiple unit tests, perhaps next time.

The reason I went all the way to fractal level 7 was that the performance reports in XCode only show 2 (or sometimes 3) decimal places of the number of seconds a test takes. For my fractal, the drawing is quite quick, so I needed to go this high to get some longer test times recorded (kind of a good problem to have). I could have gone higher or put them in an additional loop, but thought this was sufficient.

//
//  KochSnowFlakeTests.swift
//  KochSnowFlakeTests
//
//  Created by Stephen Smith on 2016-05-13.
//  Copyright © 2016 Stephen Smith. All rights reserved.
//

import XCTest

@testable import KochSnowFlake

class KochSnowFlakeTests: XCTestCase {
    var storyboard:UIStoryboard!
    var viewController:ViewController!

    override func setUp() {
        super.setUp()
        // Put setup code here. This method is called before the invocation of each test method in the class.

        UIGraphicsBeginImageContextWithOptions(CGSize(width: 50, height: 50), false, 20);

        self.storyboard = UIStoryboard(name: "Main", bundle: nil)

        self.viewController = storyboard.instantiateInitialViewController() as! ViewController
        _ = viewController.view
        viewController.viewDidLoad()
    }

    override func tearDown() {
        // Put teardown code here. This method is called after the invocation of each test method in the class.
        UIGraphicsEndImageContext();
        super.tearDown()
    }

    func testTurtleGraphics() {
        // Test the turtle graphics library.
        // Note we need a valid graphics context to do this.

        let context = UIGraphicsGetCurrentContext();
        let tg = TurtleGraphics(inContext: context!);
        XCTAssert(tg.x == 50, "Initial X value should be 50");
        XCTAssertEqual(tg.y, 150, "Initial Y value should be 150");
        XCTAssertEqual(tg.angle, 0, "Initial angle should be 0");
        tg.move(10);
        XCTAssertEqual(tg.x, 60, "X should be incremented to 60");
        XCTAssertEqual(tg.y, 150, "Initial Y value should be 150");
        XCTAssertEqual(tg.angle, 0, "Initial angle should be 0");
        tg.turn(90);
        tg.move(10);
        XCTAssertEqualWithAccuracy(tg.x, 60, accuracy: 0.0001, "X should be o 60");
        XCTAssertEqualWithAccuracy(tg.y, 160, accuracy: 0.0001, "Initial Y value should be 160");
        XCTAssertEqual(tg.angle, 90, "Initial angle should be 90");
        tg.turn(-45);
        tg.move(10);
        XCTAssertEqualWithAccuracy(tg.x, 60 + 10 * sqrt(2) / 2, accuracy: 0.0001, "X should be o 60+10*sqrt(2)/2");
        XCTAssertEqualWithAccuracy(tg.y, 160 + 10 * sqrt(2) / 2, accuracy: 0.0001, "Initial Y value should be 160+10*sqrt(2)/2");
        XCTAssertEqual(tg.angle, 45, "Initial angle should be 45");
    }

    func testPerformanceLevel3()
    {
        // Test that the storyboard is connected to the view controller and
        // that we can create and use the view and controls.

        viewController.fractalLevelTextField.text = "3"
        self.measureBlock {
            self.viewController.textChangeNot("dummy")
            self.viewController.fracView.drawRect(CGRect(x:0, y:0, width: 50, height: 50))
        }

        XCTAssertTrue(viewController.fracView.level == 3)
        // This next line is just to get 100% code coverage.
        viewController.didReceiveMemoryWarning()
    }

    func testPerformanceLevel4() {
        // This is an example of a performance test case.
        viewController.fractalLevelTextField.text = "4"
        self.measureBlock {
            self.viewController.textChangeNot("dummy")
            self.viewController.fracView.drawRect(CGRect(x:0, y:0, width: 50, height: 50))
        }
    }

    func testPerformanceLevel5() {
        // This is an example of a performance test case.
        viewController.fractalLevelTextField.text = "5"
        self.measureBlock {
            self.viewController.textChangeNot("dummy")
            self.viewController.fracView.drawRect(CGRect(x:0, y:0, width: 50, height: 50))
        }
    }

    func testPerformanceLevel6() {
        // This is an example of a performance test case.
        viewController.fractalLevelTextField.text = "6"
        self.measureBlock {
            self.viewController.textChangeNot("dummy")
            self.viewController.fracView.drawRect(CGRect(x:0, y:0, width: 50, height: 50))
        }
    }

    func testPerformanceLevel7() {
        // This is an example of a performance test case.
        viewController.fractalLevelTextField.text = "7"
        self.measureBlock {
            self.viewController.textChangeNot("dummy")
            self.viewController.fracView.drawRect(CGRect(x:0, y:0, width: 50, height: 50))
        }
    }
}

 

Summary

I found adding performance tests to my fractal iOS application quite easy. XCode gives quite nice support to perform these tests painlessly, hopefully motivating more programmers to include them.

At this point I’m not going to optimize the code as it is running fast enough. But if I ever take on drawing more sophisticated or complicated fractals, then drawing speed becomes really important. Some things to consider would be how efficient the recursive algorithm is, and whether I’m using floating point and integer arithmetic efficiently (or whether there are unnecessary conversions or perhaps too much precision being used).

Written by smist08

June 7, 2016 at 2:15 am

Adding Unit Tests to My Swift Application


Introduction

Last time I blogged on creating my first Swift application that would display a Koch Snowflake fractal on an iPad or an iPhone. This time I’m going to look at adding some unit tests to that program to see how that works in XCode and Swift. It appears that Apple has built some nice unit testing support into XCode, so doing this was actually quite easy.

I put the Koch Snowflake Swift project on Google Drive here. Feel free to download it and play with it. Refer to this project or the previous blog post for the code we are actually testing in this article.

Test Driven Development

If I was following proper Test Driven Development (TDD) then I should have written the unit tests before I wrote the actual code. This would force me to think about the design of the APIs and the testability of the program first, and then would force quick cycles of writing tests, writing code so the tests pass and then refactoring the code to make it all better. However, in this case I ported an old Objective-C program to Swift to play with Swift and now that I’m getting more familiar with Swift, I’m finally looking at the unit testing framework. So what I’m doing is perhaps not a good practice, but adding the unit tests later is better than not adding them at all.

The danger I could have run into is that the program could have turned out to be quite hard to test due to some bad design decisions and then required quite a bit of refactoring to make it properly unit testable. Fortunately, in this case the program is fairly simple and adding a good set of unit tests was fairly straightforward and didn’t require any changes to the program.

One criticism of TDD is that there is a perception that it interferes with good program design and architecture. This isn’t true. The misconception is that the first tests need to be truly useful, whereas the first tests usually just call the API and establish how the API to the various classes is used. The classes are just skeletons and the unit tests pass as long as everything compiles. Getting everything to compile is the first step, and thinking about all the APIs before writing the real code that does stuff is the path to a good modular design.

Apple has adopted TDD for a lot of their own projects, for instance their Core Data module was developed this way. Since Apple developers use XCode and now many are using TDD this means that a lot of good support for TDD is being baked into XCode. It also means that newer Apple technologies are much easier to use in a TDD environment than some of their older ones. But even for the older ones there are lots of clever examples on the web of how to work around the various challenges they introduce.

[Screenshot: unit testing in XCode]

Testing Turtle Graphics

When you create a project with XCode you have the option to include unit tests (which I did) and this gives you a test project that you can run with some dummy tests in it. So I figured I’d first add some unit tests to my TurtleGraphics class. This is pretty simple: you move the turtle and turn the turtle and it leaves a trail behind it. For drawing fractals, turtle graphics makes the algorithms very easy since it fits in well with the recursive algorithms typically used.

The turtle graphics class uses Apple’s Quartz2D library to draw on the screen. Perhaps the first possible design flaw is that when I instantiate the turtle graphics library I need to pass its constructor a valid graphics context. The underlying Quartz2D library routines certainly don’t like this being nil and throw an exception if you try that (i.e. my first try). So I thought I would just create a graphics context object, but after battling with Swift a bit, I found I couldn’t do this because the underlying class doesn’t have any constructors. This also prevented me from mocking the class (at least using the real class and replacing just a few methods). This is because Apple wants to make sure you get a valid graphics context from one of their factory methods. If you are actually in the drawing routine this is passed to you by the system, but as a unit test, we aren’t running in a screen drawing context. Fortunately there is a way to get a graphics context that draws on an in-memory image, so I used that.

To solve this, I could have refactored the drawing routines into another class and had this class accept nil for the graphics context and just return when this happens. This is basically just making a new class that passes through to the real class when the program runs, or acts as a stub class when running unit tests. I didn’t like this so much since then if the real program did pass nil, it wouldn’t get caught and the program just wouldn’t work properly. Even though I’m not testing the actual placement of pixels on the screen, it makes me feel good to know the real drawing routines are being exercised during the unit tests, so if they do get an invalid argument or something the problem will be caught.

The unit test framework lets unit tests get at variables in a class to help you set up the asserts to prove correctness (you import the module via @testable import KochSnowFlake rather than just import), so to test it, I just did some simple moves and turns and then tested that the internal state of the turtle graphics module was correct. Notice when I turned 45 degrees I used trigonometry to calculate the new position; for testing it’s usually good to test against a calculation, rather than running the program to see what the value is and then using that. Notice the use of XCTAssertEqualWithAccuracy: since we are using floating point arithmetic, rounding errors creep in, so the equals will never be exact. You can configure XCode to either stop on each error or run to completion, just recording all the errors; it’s up to you. For this I usually just stopped on an error and then fixed the problem.

func testTurtleGraphics() {
 // Test the turtle graphics library.
 // Note we need a valid graphics context to do this.
 
 UIGraphicsBeginImageContextWithOptions(CGSize(width: 50, height: 50), false, 20);
 let context = UIGraphicsGetCurrentContext();
 let tg = TurtleGraphics(inContext: context!);
 XCTAssert(tg.x == 50, "Initial X value should be 50");
 XCTAssertEqual(tg.y, 150, "Initial Y value should be 150");
 XCTAssertEqual(tg.angle, 0, "Initial angle should be 0");
 tg.move(10);
 XCTAssertEqual(tg.x, 60, "X should be incremented to 60");
 XCTAssertEqual(tg.y, 150, "Initial Y value should be 150");
 XCTAssertEqual(tg.angle, 0, "Initial angle should be 0");
 tg.turn(90);
 tg.move(10);
 XCTAssertEqualWithAccuracy(tg.x, 60, accuracy: 0.0001, "X should be o 60");
 XCTAssertEqualWithAccuracy(tg.y, 160, accuracy: 0.0001, "Initial Y value should be 160");
 XCTAssertEqual(tg.angle, 90, "Initial angle should be 90");
 tg.turn(-45);
 tg.move(10);
 XCTAssertEqualWithAccuracy(tg.x, 60 + 10 * sqrt(2) / 2, accuracy: 0.0001, "X should be o 60+10*sqrt(2)/2");
 XCTAssertEqualWithAccuracy(tg.y, 160 + 10 * sqrt(2) / 2, accuracy: 0.0001, "Initial Y value should be 160+10*sqrt(2)/2");
 XCTAssertEqual(tg.angle, 45, "Initial angle should be 45");
 
 UIGraphicsEndImageContext();
 
 }

Testing the View Controller

Now let’s test things at a higher level. Generally, we don’t do UI testing in unit tests, but we can do some basic tests to ensure everything is created properly and is more or less hooked up correctly. XCode does have built-in support for UI testing and perhaps we’ll have a look at that in a later blog post. In this case we are going to create the storyboard (where the UI is defined) and from that instantiate our View Controller. Then we do a dummy access of the view to get the lazily loaded elements loaded and start calling the View Controller’s methods, starting with viewDidLoad(). For our test we set the fractal level to 3 in the text box and then call the notification methods and see if it is then updated properly in our View.

func testInitialViewController()
 {
 // Test that the storyboard is connected to the view controller and
 // that we can create and use the view and controls.
 
 let storyboard = UIStoryboard(name: "Main", bundle: nil)
 
 let viewController = storyboard.instantiateInitialViewController() as! ViewController
 _ = viewController.view
 viewController.viewDidLoad()
 
 viewController.fractalLevelTextField.text = "3"
 viewController.textChangeNot("dummy")
 XCTAssertTrue(viewController.fracView.level == 3)
 // This next line is just to get 100% code coverage.
 viewController.didReceiveMemoryWarning()
 }

Test Coverage

The big question that is usually asked about unit tests is how much code do they cover (or exercise). XCode can produce this report, which you can see in the screenshot:

[Screenshot: XCode code coverage report for the project]

Amazingly we get nearly 100% coverage. The only module with poor coverage is AppDelegate.swift, which is code generated by XCode that I never use. The reason we get such good coverage is that this is a fairly simple program and it draws the snowflake when it initializes. But the key point here is that I am testing a lot of things fairly easily and fairly quickly. It tests the functionality of the turtle graphics library and it tests that the UI is all hooked up properly and working.

A true TDD adherent would delete all the code in AppDelegate.swift, since code shouldn’t be written unless it is to make a unit test pass. But I tend to give an exception to code generated by the tools like XCode, to me this code is Apple’s responsibility and not mine. Plus, if I ever do add code to this class then I would need to start adding unit tests and take it a bit more seriously.

Harder Cases

This program is pretty simple, so I didn’t need to do anything too fancy to unit test it. This is mainly because there is no database access, no hardware access (beyond drawing and one text field) and no communication with a server. If you add these then you would need to add mock, stub and/or fake classes to assist in testing. You want your unit tests to be fast and run on every compile. You don’t want to require databases be setup or that servers are running somewhere in a certain state. Fortunately, Swift is a very powerful object oriented language and creating all of these is fairly easy (extensions are especially helpful). The main problems you run into are in using the Objective-C libraries like UIKit, but again you aren’t the first to run into these and there are tons of sample code, blogs and lessons on how to deal with these.

Another tricky area is asynchronous calls. When you make a call to the server, the response is returned asynchronously. What this means is that as soon as your unit test sends the server request, it will complete. Since the asynchronous response comes back on a separate thread later, you need to ensure your main thread doesn’t just end, and instead waits for the results, or the tests in the callback will never be executed. There are lots of examples of how to do this, but it definitely adds some complexity to your tests.

Continuous Build Server

If you have a team, then you will probably want a continuous build server that monitors your source code repository (probably Git) and then does a build and test each time something new is checked in. Apple has an XCode Server that you can purchase if you are a registered developer. Or you could use the Jenkins build server, which is open source and free. Jenkins also has an XCode plugin to make life a bit easier, though XCode is quite good at letting you control it from the command line. This way you can be notified as soon as something breaks and ensure problems get fixed right when they are caused, which is the easiest time to fix them.

Summary

That turned out to be quite a long posting on unit tests. XCode and Swift have a very powerful and easy-to-use unit testing framework built in. If you are developing a new application, certainly consider using TDD for your development methodology. If you are porting or doing a new version of an app, then certainly try adding some unit tests. Unit tests will help you greatly over the lifetime of your product, making it more reliable, less buggy and easier to maintain.

Written by smist08

May 31, 2016 at 8:17 pm