The Road to TensorFlow – Part 2: Python
This is part 2 of my blog series on playing with TensorFlow. Last time I blogged about getting Linux going in a VM. This time we will be talking about the Python programming language. The API for TensorFlow is primarily aimed at Python, and in fact much of the research in AI, scientific computing, numerical computing and data research takes place in Python. There is a C++ API as well, but this seemed like a good chance to give Python a try.
Python is an interpreted language that supports several programming paradigms, including object-oriented, procedural and functional. Python is open source and runs on many platforms. Most Linux distributions and macOS come with some version of Python pre-installed. Python is very interoperable and can work with most other programming systems, and there are a huge number of libraries available to the Python programmer. Python is oriented to getting things done quickly with a minimum of code and a minimum of fuss. The name Python is a tribute to the comedy troupe Monty Python and there are many references to Monty Python throughout the documentation.
Installation and Versions
Although I generally like Python, it has one really big problem that is a pain in the ass when setting up new systems and browsing documentation. The newest version of Python as of this writing is 3.5.2, which is the one I wanted to use along with all the attendant libraries. However, if you type python in a terminal window you get 2.7.12. This is because Python 3 broke source code compatibility with Python 2, so the decision was made to keep maintaining version 2 while everyone updated their programs and scripts to version 3. Version 3.0 was released in 2008 and this mess is still going on eight years later. The latest Python 2.x, namely 2.7.12, was released in June 2016 and still seems to be quite actively developed by a good-sized community.
So generally, to get anything for Python 3.x you need to add a 3 to the end of the name. To run Python 3.5.2 in a terminal window you type python3. Similarly, the IDE is idle3 and the package installer is pip3. This makes it very easy to make a mistake and get the wrong thing. Worse, the naming isn’t entirely consistent across all packages; I’ve run into several where you add a 2 for the 2.x version and the version 3 one is just the plain name. As a result, I always end up with a certain amount of Python 2.x stuff installed by mistake (which doesn’t hurt anything, it just wastes time and disk space). This also leads to a bit of confusion when you Google for information: you have to be careful to get 3.x info rather than 2.x info, as the wrong one may or may not work and may or may not be a best practice.
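One way to catch the wrong interpreter early is to have a script check its own version when it starts. The following is only a minimal sketch of that idea: sys.version_info comes from the standard library, while the helper name is_python3 is a hypothetical name made up for this illustration.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple belongs to Python 3.x.

    is_python3 is a hypothetical helper name used only for this example.
    """
    return version_info[0] == 3


# Report which interpreter is actually running this script.
print("Running Python %d.%d.%d" % tuple(sys.version_info[:3]))

if not is_python3():
    # Under Python 2, stop right away with a clear message.
    sys.exit("This script needs Python 3; try running it with python3.")
```

Running the same file with python and then with python3 shows quickly which interpreter each command actually starts.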
On Ubuntu Linux I just used apt-get to install the various packages I needed. I’ll talk about these a bit more in the next posting. Another option for installing Python and all the scientific libraries is the Anaconda distribution, which is quite a good way to get everything installed at once. I used Anaconda to install Python on Windows 10 and it worked really well; you just don’t get fine control over what it does, and it creates a separate installation to keep everything apart from anything already installed.
Python the Language
Python is a very large language; it has everything from object orientation to functional programming to huge built-in libraries. It does have a number of quirks though. For instance, you define blocks by indentation rather than with curly brackets or end-of-block statements. So indentation isn’t just a style guideline, it’s fundamental to how the program works. In the following bit of code:
for i in range(10):
    a = i * 8
    print( i, a )
a = 8
the two indented statements are part of the for loop and the out-dented assignment is outside the loop. You don’t declare variables; they are created when first assigned to, and you can’t use a variable before assigning it (a NameError exception is raised if you try). There are a lot of built-in types including dictionaries and lists, but no built-in array type (the numpy library adds one). Notice how a basic counting loop is written as for ... in over a range rather than with a from/to counter.
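To make these points a bit more concrete, here is a small sketch that uses only built-in features. All the names and values in it (count, squares, ages and so on) are made up for illustration.

```python
# Variables come into existence on first assignment; no declaration needed.
count = 3

# Using a name that was never assigned raises NameError.
try:
    print(undefined_name)
except NameError as err:
    message = str(err)
    print("Caught:", message)

# Lists and dictionaries are built in.
squares = [i * i for i in range(5)]
ages = {"alice": 30, "bob": 25}  # made-up sample data
ages["carol"] = 41               # add a new key by assigning to it

# "for ... in" walks over any sequence, not just ranges of numbers.
for name in sorted(ages):
    print(name, ages[name])
```

The same for ... in form works for ranges, lists, dictionary keys and strings, which is why Python has no separate counting-loop syntax.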
I don’t want to get too much into the language since it is quite large. If you are interested there are many good sites on the web to teach Python and the O’Reilly book “Learning Python” is recommended (but quite long).
Since Python is interpreted, you don’t need to wait for any compile steps, so the code, test, debug cycle is quite quick. Writing tight loops in Python will be slower than in C, but generally Python gives you quite good libraries to do most of what you want, and the libraries tend to be written in C or Fortran and are very fast. So far I haven’t found speed to be an issue. The core of TensorFlow is likewise written in a compiled language (C++) for speed, plus it can run on NVIDIA graphics cards for an extra boost.
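A rough way to see the difference between a tight Python loop and a library routine is to time both with the standard timeit module. This is just a sketch: the function names loop_sum and builtin_sum and the sizes are arbitrary choices for illustration, and the timings will vary from machine to machine. It relies on the built-in sum() being implemented in C in the standard CPython interpreter.

```python
import timeit


def loop_sum(n):
    """Add up 0..n-1 with an explicit Python-level loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def builtin_sum(n):
    """Do the same addition with the built-in sum()."""
    return sum(range(n))


n = 100000  # arbitrary problem size for the comparison

# Time 20 calls of each version.
loop_time = timeit.timeit(lambda: loop_sum(n), number=20)
builtin_time = timeit.timeit(lambda: builtin_sum(n), number=20)

print("explicit loop: %.4f s" % loop_time)
print("built-in sum:  %.4f s" % builtin_time)
```

On CPython the built-in version typically comes out ahead, and the same idea is what makes libraries like numpy fast: the loop still happens, just not in interpreted Python code.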
This was my quick intro to Python. I’ll talk more about relevant parts of Python as I go along in this series. I generally like Python and so far my only big complaint is the confusion between the version 2 world and the version 3 world.