I’ve talked a bit about SaaSifying Accpac and SaaSifying Sage. Today I’m going to talk a bit more about continuous deployment which is one of the end goals of these themes. This is the notion that there is an automated process that will take code from the programmer, build a product, test it and then deploy it so that customers are running it immediately. If you’ve worked in software development, this can be a scary notion, since you’re lucky if programmers check in something that compiles, let alone something you want customers running without a lot of checks and balances. The key to continuous deployment is automating many manual processes that used to be very repetitive, time consuming and error prone.
A key to successful continuous deployment is to always be challenging the status quo. Don’t accept that something absolutely requires manual intervention. Work hard to eliminate all gate-keepers. In a way, continuous deployment is a mindset and process where you are always looking to improve your automated processes, eliminate manual processes and get features to customers as soon as the programmer commits their changes. Part and parcel of this is breaking major functionality down into many small features that are each deployed as soon as they are done. Breaking things down into very small pieces is the key to success and the key to speeding up development. Implementing things as one big piece tends to be hard and complex, and leads to customer problems.
Building the Product
Generally with modern source control systems, programmers should only check things into the main trunk once they have completed something including unit testing, automated tests and manual testing. If they need intermediate check-ins then they should use a code branch and merge that in when they are finished. However with large development projects, you can still get conflicts and unexpected results. You want to find these as quickly as possible and fix them right away while things are fresh in your mind.
The programmer checking in code is the beginning of the continuous deployment process. If they have been successful in coding and testing a useful feature for your customers, then you would like this to be available to your customers in a very short order. I blogged on the desirability of doing this here.
Now you have a build server running somewhere, perhaps controlled by a continuous integration system such as Jenkins. The system on this server scans source control for committed changes. As soon as it sees one, it gets the change, builds it and runs all unit tests. Further, it has a tree of all the dependencies in the system (usually using Ivy) and will build all dependent modules and run their unit tests. If anything fails, a number of designated people will receive e-mails saying that the build is now failing. The person who checked in the changes will also be notified. It is now up to them to quickly fix things (the system can be configured to automatically roll back whatever they checked in, but usually the build system just keeps using the last good build).
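To make the dependency tree idea concrete, here is a minimal sketch of how a build server might work out which modules to rebuild, and in what order, after a check-in. The module names and the `DEPENDS_ON` map are made up for illustration; a real setup would get this information from a tool such as Ivy rather than a hand-written dictionary.

```python
from collections import deque

# Hypothetical dependency map: module -> modules it depends on.
DEPENDS_ON = {
    "core": [],
    "db": ["core"],
    "web": ["core", "db"],
    "reports": ["db"],
}


def rebuild_order(changed, depends_on):
    """Return the changed module plus everything downstream of it, in build order."""
    # Invert the map: module -> modules that depend on it.
    dependents = {module: [] for module in depends_on}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(module)

    # Breadth-first walk to collect every module affected by the change.
    affected = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)

    # Kahn's algorithm over the affected modules only, so each module is
    # built after every affected module it depends on.
    remaining = {
        m: sum(1 for dep in depends_on[m] if dep in affected) for m in affected
    }
    ready = sorted(m for m, count in remaining.items() if count == 0)
    order = []
    while ready:
        module = ready.pop(0)
        order.append(module)
        for dependent in sorted(dependents[module]):
            remaining[dependent] -= 1
            if remaining[dependent] == 0:
                ready.append(dependent)
    return order


if __name__ == "__main__":
    print(rebuild_order("db", DEPENDS_ON))
```

In this sketch a change to `db` triggers builds of `db`, `reports` and `web`, while `core` is left alone because nothing in it changed.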
The key point here is that you find out right away that you broke the build. This is in contrast to say doing nightly builds, where you don’t know until the next day, and there may have been many check-ins so it can be time consuming finding the real culprit. Plus it now interrupts whatever you were meant to do the next day.
Continuous Test Server
Usually, beyond unit tests, you have a continuous test server that may be using FitNesse to run Selenium UI-based tests. These tests take too long to be unit tests and would slow the build server down. However, you want builds to be regularly deployed to this server and the full set of UI-based automated tests to be run. Perhaps after it runs the full suite it would re-deploy the latest build (if newer), run all the tests again and just keep doing this continuously. If a test fails, it would immediately notify the appropriate people and demand prompt action to resolve the problem, whether by fixing the problem, fixing the test or rolling something back. Again, the goal here is to detect problems as quickly as possible and to force them to be resolved quickly, not allowing problems to pile up. Generally you want to keep your product in a releasable state as I blogged about here.
Automated Processes to Deploy
Next we want to start deploying so that various users can start playing with the system. We have a build that has passed the unit tests and the longer-running automated tests, so now we start deploying for review by various departments. Usually this will mean automatic deployment to a server that various interested internal parties know how to access, perhaps with a common URL. Perhaps the build is also deployed to various manual testing computers. At this point the system can be played with by anyone who needs to review things: User Centered Design Analysts, Business Analysts, Product Managers, other interested developers, executives, etc.
This step is actually optional. If your unit tests and other automated tests are good enough, and if your development organization is confident enough and has a good track record, then this step can be skipped and you can deploy directly to customers. Eliminating this step is a primary goal of many continuous deployment teams. Sometimes it’s just a matter of keeping one last human checkpoint in place, but there are many organizations that deploy to the production servers several times a day without needing this step. This is the fastest way to get immediate customer feedback on what you are doing.
Deploy to Customers
Now the fun part: deploying to the production servers for customers. You want this to be the same process as described above so that you know it will work properly. Sometimes you might deploy to one server and have it start serving a set of customers to ensure all is well before allowing the build to go to all servers. Key to this is the ability to seamlessly update the database and to roll back the changes, including the database changes, if things go wrong.
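One way to picture serving only some customers from the new build is to assign each customer to a stable bucket and send a configurable percentage of buckets to the new build. This is only a sketch under assumptions: the build names, the `pick_build` function and the percentage knob are all hypothetical, and real systems often do this at the load balancer rather than in application code.

```python
import hashlib


def bucket(customer_id: str) -> int:
    """Map a customer to a stable bucket from 0 to 99."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def pick_build(customer_id: str, new_build_percent: int,
               stable: str = "build-41", new: str = "build-42") -> str:
    """Send roughly new_build_percent of customers to the new build."""
    # The build names here are placeholders, not a real naming scheme.
    return new if bucket(customer_id) < new_build_percent else stable


if __name__ == "__main__":
    for customer in ["acme", "globex", "initech"]:
        print(customer, pick_build(customer, new_build_percent=10))
```

Because the bucket is derived from the customer id, a given customer keeps seeing the same build between requests, and raising the percentage only ever moves more customers onto the new build.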
Usually when you deploy new features, there have to be database schema changes to support them. Depending on the type of database system you are using, whether SQL or NoSQL, the types of changes could be quite different. In the SQL world you might add some fields to some tables or add some new tables. Generally when you design your schema changes you need to be cognizant of the demands of continuous deployment. For a web site that needs to be up 24×7, you can’t deploy something that requires hours of data conversion. Generally you want a very simple database design that allows changes without a great deal of fuss. If you end up really needing a major conversion, then this will have to be planned, and you need to measure the ROI of such a feature given the disruption to customers of even small outages. Another approach is to make the application smarter so that it can handle both old data and new data, letting the system keep running while the conversion is in progress. This of course takes more work than simply shutting things down, but becomes the cost of doing business in the 24×7 SaaS world.
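As a small illustration of an application that copes with old and new data at the same time, suppose a conversion is splitting a single `name` field into `first_name` and `last_name`. The record layout and the `read_customer` function are invented for this example; the point is only that the reader accepts either shape while the conversion is still running.

```python
def read_customer(record: dict) -> dict:
    """Return a customer in the new shape, whichever shape was stored."""
    if "first_name" in record:
        # Already converted: use the new fields directly.
        return {
            "first_name": record["first_name"],
            "last_name": record["last_name"],
        }
    # Not yet converted: derive the new fields from the old single field.
    first, _, last = record["name"].partition(" ")
    return {"first_name": first, "last_name": last}


if __name__ == "__main__":
    print(read_customer({"name": "Ada Lovelace"}))
    print(read_customer({"first_name": "Grace", "last_name": "Hopper"}))
```

Once every record has been converted, the fallback branch can be deleted in a later small deployment.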
Besides providing scripts and/or programs to convert the database, you have to provide scripts and/or programs to un-convert the database. By the same token if something goes wrong, say users start having problems, then you need to roll back the database changes (and the newly deployed build), again without causing a huge service blackout. This side of things can be very challenging. These processes need to be a heavy focus for unit and other automated testing.
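A common way to keep conversion and un-conversion together is to pair every upgrade step with its matching downgrade step and record which version the database is at. The sketch below uses Python’s built-in `sqlite3` module purely for illustration; the table, the index and the `migrate` function are made-up examples, and a production system would more likely use an established migration tool.

```python
import sqlite3

# Each entry: (version, upgrade SQL, downgrade SQL). Illustrative only.
MIGRATIONS = [
    (1,
     "CREATE TABLE customer_notes ("
     "id INTEGER PRIMARY KEY, customer_id INTEGER, note TEXT)",
     "DROP TABLE customer_notes"),
    (2,
     "CREATE INDEX idx_notes_customer ON customer_notes (customer_id)",
     "DROP INDEX idx_notes_customer"),
]


def get_version(conn):
    """Read the schema version stored in the database file."""
    return conn.execute("PRAGMA user_version").fetchone()[0]


def set_version(conn, version):
    # PRAGMA does not accept bound parameters, so format a checked int.
    conn.execute("PRAGMA user_version = %d" % int(version))


def migrate(conn, target):
    """Move the schema up or down to the target version, one step at a time."""
    current = get_version(conn)
    if target > current:
        for version, upgrade, _ in MIGRATIONS:
            if current < version <= target:
                conn.execute(upgrade)
                set_version(conn, version)
    else:
        # Undo the newest changes first.
        for version, _, downgrade in reversed(MIGRATIONS):
            if target < version <= current:
                conn.execute(downgrade)
                set_version(conn, version - 1)
    conn.commit()


def table_names(conn):
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    migrate(conn, 2)
    print(get_version(conn), table_names(conn))
    migrate(conn, 0)
    print(get_version(conn), table_names(conn))
```

Running the upgrade and then the downgrade against a scratch database, as the last few lines do, is exactly the kind of check that belongs in the automated tests.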
Not all changes that pass all the testing and have been approved by all departments end up delighting customers. Often some unforeseen usage causes the new features to greatly irritate your customers, who then tell you and the world all about their annoyance on Twitter, LinkedIn, Facebook, etc. At this point you may need to make the hard decision to roll back the newly deployed build along with any artifacts like database changes.
Even though we hate doing this, we need to remember that customer satisfaction is job one and that if we keep annoying them, they will go to another service that won’t annoy them so much. All this is a part of being Agile, customer focused and applying validated learning. Basically learn everything you can about what happened so you can do better next time and actually delight your customers. Over time you will probably become comfortable with rolling back at the first sign of trouble and just chalk it up to a learning experience.
As part of continuous deployment, you need continuous monitoring of your systems. You need to know what features people are using and how. From this you can learn where your users are running into problems and use this information to improve the situation for future builds.
You can also save yourself the headache of a Twitter storm if you detect a problem users are having before too many get mad about it, and either fix or roll back the annoyance.
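As a rough sketch of what such detection might look like, the function below compares the error rate seen on the new build against the rate seen on the previous one and says whether a roll back looks warranted. The thresholds and the function name are assumptions chosen for illustration; real monitoring would look at far more than error counts.

```python
def should_roll_back(baseline_errors, baseline_requests,
                     new_errors, new_requests,
                     tolerance=2.0, floor=0.01, min_requests=100):
    """Suggest a roll back when the new build errors far more than the old one."""
    if new_requests < min_requests:
        # Too little traffic on the new build to judge either way.
        return False
    baseline_rate = baseline_errors / max(baseline_requests, 1)
    new_rate = new_errors / new_requests
    # Require the new rate to exceed both a multiple of the old rate and
    # an absolute floor, so a near-zero baseline does not trigger on noise.
    return new_rate > max(baseline_rate * tolerance, floor)


if __name__ == "__main__":
    print(should_roll_back(10, 1000, 50, 1000))
    print(should_roll_back(10, 1000, 12, 1000))
```

The same comparison could be run on any usage metric you track, such as how often users abandon a screen that the new build changed.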
The nice thing about SaaS services is that you have complete control of the service and can track all sorts of usage metrics and information (subject, of course, to privacy policies).
A rather good blog post on the experience WordPress (the site this blog is hosted on) has had with continuous deployment is here. I mention it just to emphasize that many companies are really doing this; in WordPress’ case, they deploy around 16 builds per day to customers. As companies perfect this process it is becoming a competitive weapon, where whoever does this best innovates faster and leaves their competition in the dust.