Stephen Smith's Blog

Musings on Machine Learning…

Posts Tagged ‘software development

The Road to DevOps Part 1

with 5 comments

Introduction

Historically Development and Operations have been two separate departments that rarely talk to each other. Development produces a product using some sort of development methodology and when they’ve finished it, they release it and hand it off to Operations. Operations then installs the product, upgrades the users from the old version and then maintains the system, fixing hardware problems, logging software defect complaints and keeping backups. This worked fine when Development produced a release every 18 months or so, but doesn’t really work in the modern Web world where software can often be released hourly.

In the new world Development and Operations need to be combined into one team and one department. The goal being to remove any organizational or bureaucratic barriers between the two. It also recognizes that the goal of the company isn’t just to produce the software but to run it and to keep it available, up to date and healthy for all its customers. Operations has to provide feedback immediately to Development and Development has to quickly address any issues and provide updates quickly back to Operations.

In this posting I’m covering the aspect of DevOps concerned with frequently rolling out new versions to the cloud. In a future posting I’ll cover the aspects of DevOps concerned with the normal running of a cloud based service, like provisioning new users, monitoring the system, scaling the system and such.

Agile to DevOps

We have transitioned our software development methodology from Waterfall to Agile. A key idea of Agile is that you break work into small stories that you complete in a single short sprint (in our case two weeks). A key goal is that during the sprint each story is done completely including testing, documentation and anything else required. This then leads to the product being in a releasable state at the end of each sprint. In an ideal world you could then release (or deploy) the product at the end of each sprint.

This is certainly an improvement over Waterfall where the product is only releasable every year or 18 months or so. But generally with Agile this is the ideal, but not really what is quite achieved. Generally a release consists of the outcome of a number of sprints (usually around 10) followed by a short regression, held concurrently with a beta test followed by release.

So the question is how do we remove this extra overhead and release continuously (or much more frequently)? This is where DevOps comes in. It is a methodology that has been developed to extend Agile Development and principles straight through to actually encompassing the deployment and maintenance of the product. DevOps does require Development change the way it does things as it requires Operations to change. DevOps requires a much more team bases approach to doing things and requires organizational boundaries be removed. DevOps requires a razor focus on automating all manual processes in the build to deploy to monitor to get feedback cycle.

Development

Most development processes have an automated build system, usually built on something like Jenkins. The idea is that when you check in source code, the build system sees you checked it in, then it has rules that tell it what module it is part of, and rebuilds those modules, then it sees what modules depend on those modules and rebuilds those and so on. As part of the build process, unit tests are run and various automated tests are set off. The idea is that if anything goes wrong, it is very clear which check-in caused it and things can be quickly fixed and/or rolled back.

This is a good starting point but for effective DevOps it needs to be refined. Most modern source control systems support branching (most famously git). For DevOps it becomes even more crucial that the master branch of the product is always in releasable state and can be deployed at any time. The way this is achieved is by developing each feature as a separate branch. Then when a feature is completely ready for release it can be pulled into the master branch, which means it can be deployed at any time. Below is a diagram of how this process typically works:

Automated Testing

Obviously in this environment, it isn’t going to work if for every frequent release, you need to run a complete thorough manual test of the entire product. In an ideal world you have very complete test coverage using various levels of automated testing. I covered some aspects of this here. You still want to do some manual testing to ensure that things still feel right, but you don’t want to have to be repeating a lot of functional testing.

Operations

Operations can then take any release and in consultation with the various stakeholders release it to production. Operations is then in charge of this part of things and ensures the new versions in installed, data is converted and everything is running smoothly.

Some organizations release everything that is built which means their web site can be updated hourly. Github is a good example of this. But generally for ERP or CRM type software we want to be a bit more controlled. Generally there is a schedule of releases (perhaps one release every two weeks) and then a schedule of when things need to be done to be included in that release, which then controls which branches get pulled into the master branch. This is to ensure that there aren’t any disruptions to business customers. Again you can see that this process is blending elements of QA along with Operations which is why the DevOps team approach works so well.

A key idea that has emerged is the concept: “Infrastructure as Code”. In this world all Operations tasks are performed by code. This way things become much more repeatable and much more automated. It’s really this whole idea that you build your infrastructure by writing programs that then do the real work, that has largely led to movement of Developers into Operations.

DevOps

This is where Development and Operations must be working as a team. Operations has to let Developers know what tools (scripts) they require to deploy the new version. They need automated procedures to roll out the new version, convert data and such. They have to be working together very closely to develop all these scripts in the various tools like Jenkins, PowerShell, Maven, Ant, Chef, Puppet or Nexus.

Performing all this work takes a lot of effort. It has to be people’s full time job, or it just won’t get done properly. If people aren’t fully applied to this, manual processes will start to creep in, productivity and quality will suffer.

Beyond successfully deploying the software, this team has to handle things when they go wrong. They need to be able to rollback a version. Reverse the database schema changes and return to a known stable good state.

Summary

DevOps is a whole new profession. Combining many of the skills of Development with those of Operations. People with these skill sets will be in high demand as this is becoming an area that is making or breaking companies on the Web and in the Cloud. No one likes to have outages; no one likes to roll out bad upgrades. In today’s fast paced world, these can put huge pressures on a company. DevOps as a profession and set of operating procedures is a good way to alleviate this pressure while keep up to the fast pace.

Releasable State

with 5 comments

First an advertisement for a new session just added for Sage Summit:

Session: Sage ERP Technology Roadmap on Sunday, July 10 at 1pm – 2:30pm.

Session Description: Join Sage ERP Product leaders who will be discussing the technology evolution of Sage ERP product lines. This session will cover our product journey to the cloud; technologies being used to deliver rich web experience; developments on the Connected Services front(joining On-premise applications with the Cloud enabling us for web and mobile services); what kind of tools/technology/skills are needed to integrate, customize our products in the web world; and collaboration occurring on common components. There will also be time set aside for open dialogue. This session is for Sage ERP Development Partners only.

Now back to our regularly scheduled programming…

Releasable State

One of the key goals of Agile Software Development is keeping software in a releasable state. There are quite a few reasons that this is a good thing which we are going to explore in this blog posting. This is very much related to test driven development which I blogged about here.

Problems with Waterfall

In traditional Waterfall Software Development you have a number of key phases namely the setting requirements phase then the design phase then the implementation (or coding) phase then the QA phase then the maintenance phase. The diagram below gives an indication why this is called Waterfall.

There are a number of problems with this; one of the key ones is that bugs and defects are found quite late and that these tend to accumulate to quite large numbers. There are other problems with the requirements gathering and design phases but those would be a topic for another day. Generally it’s far cheaper to fix bugs right when they occur and everything about the code is fresh in people’s minds. As the time between coding and discovering a bug increases, the cost (or time) to fix it increases quite quickly. Fixing a bug from two or more stages in the development process ago is especially expensive. These bugs also cause havoc to the scheduling process. Estimating how long it takes to code something is usually marginally accurate. Estimating how long it will take to fix say a hundred or more bugs is really a crap shoot. Usually when Waterfall projects start to fall behind, there is a feeling that you can make it up by squeezing the verification phase, but practically, what happens is that as a project falls behind then the verification phase grows faster, since more bugs are found and they take longer than expected to fix.

Besides making scheduling un-predictable, what often happens in this environment is that you miss a couple of deadlines; the product is feature complete and has a couple of successful beta sites. But QA are still finding quite a high number of bugs and the outstanding bug count is high. What tends to happen next is a rationalization that the bugs that QA are finding won’t affect customers or that they are things like adherence to standards or in obscure corners of the product and that given the two beta sites are happy then why not go ahead and release. The product is then released; many bugs are found in the field, customers are very unhappy. Then there is a huge panic to address these bugs in a first service pack. This then delays the start of development of the next version and guarantees the process will repeat itself. This then lead to quite a dangerous downward spiral.

Generally this is a gloomy prospect that leads to the various software development death marches that you hear about.

Agile to the Rescue

Agile Software Development wants to address this problem in a couple of ways:

  1. Fix bugs right away, don’t let the time between coding and bug fixing happen.
  2. Don’t let bugs accumulate. Fix bugs right away, it is never ok to let bugs accumulate for later.

One of the main tenants of Agile Development is that you want to keep the software in a releasable state. You never want to get yourself into a situation where you know you have to fix a lot of bugs before releasing. The philosophy has to be that if you know about something in the product that would prevent it from being released right now, then that must be addressed immediately. This statement refers to quality and not feature completeness. Features are added one at a time off a priority queue and then it’s up to Product Management to decide the commercial readiness of the product.

Sure you could try to apply these two points to Waterfall, but the process and philosophy of Waterfall works against you. The goal to finish each phase on time tends to drive away good intentions to not let this happen yet again.

The Agile Development process operates by building a product backlog of user stories that you want to implement into the product. These are in a priority order (by business value) so the teams take stories from the top of this queue to implement. These are then implemented in two or three week “sprints”. At the end of each sprint you produce a working increment of your product. This is shown in the diagram below:

The stories in the product backlog are small enough that they can be fully implemented, tested and documented inside of a single sprint. When done, stories need to be “accepted”. Stories are not done or accepted unless fully tested, code reviewed, documented and complete. If a story is not accepted then it isn’t made part of the product, the team doesn’t get any credit for the work and it will need to be completed next sprint. The key point is that at the end of each sprint there isn’t a bunch of half-finished work checked in causing bugs and general havoc. From the diagram you might want to think of each sprint as a mini-Waterfall, however that would be wrong, a sprint is quite different, but again that’s another topic, perhaps for a future post.

If you are a SaaS product, potentially you could deploy the output of each sprint to your live web server and have all your customers immediately benefit from the work performed last sprint. Within Sage we are trying to operate with a SaaS mindset and I previously blogged about that here. Once you have working software, you have a lot of options as to what to do with it. You have far more opportunities to distribute your software, if not live, then to get more customer feedback to provide more improvements in later sprints.

A key to staying in a Releasable State is having a good set of automated testing tools that can be run continuously on the product so that bugs are found immediately and not allowed to creep in. I blogged on our efforts to become a more test driven organization here.

Generally with-in Agile you are trying to reduce accumulated in quality debt and technical debt. These are defects or parts of the program that badly need updating. The more of these that you accumulate, the harder a program is to maintain. With-in Agile you track these and work very hard to ensure you are always reducing these.

The key thing missing from the diagram above on Agile Development is the lack of any sort of regression or final quality validation phase. This is the desired end state; it takes a bit of time to get to a place when you are really confident you can drop these. We are endeavoring to shorten them with each release with the goal of removing them entirely one day. This speeds up our releases, by eliminating this work and makes things more predictable, since otherwise going into regression you don’t know what you will find, and then the time to fix anything found is also unpredictable.

The further you let a project get from releasable state, the less predictable the completion date. Worse you don’t know how much of this accumulated quality debt you are going to have to pay to release, how negatively this will affect customers or how badly this will hamper future product maintenance. To work optimially you need to really keep a hard line on these. The idea of being able to release at the end of every sprint is a good rallying cry to keep these all under control.

Summary

The Agile Development Process’s goal of fixing bugs right away as they are introduced, performing continuous automated testing and removing the regression testing tail end processes provided software development organizations far better predictability in their development efforts.

This also makes your development process far more adaptable in changing direction as business and competitive conditions change. Since it is far easier to change direction when going from releasable state to releasable state as you go from short sprint to sprint.

Written by smist08

May 28, 2011 at 4:17 pm