Stephen Smith's Blog

Musings on Machine Learning…

Posts Tagged ‘programmer

Troubleshooting Problems in the Cloud

with one comment

Introduction

Imagine that you are managing a large cloud based solution with lots of moving parts, thousands of users and fairly complicated business logic going on all the time. Now an irate customer phones into support complaining that something bad happened. Sometimes from the explanation you can figure out what happened quite easily. Sometimes what the user described sounds completely inexplicable and impossible. Now what do you do to troubleshoot the problem?

Easy Problems

First let’s review how easy problems are solved and can be dealt with quickly:

  • From the description of the problem, the programmer goes, oh of course we should have thought of that and on reviewing the code finds and easy fix which can be deployed easily. (Too bad this doesn’t happen more often).
  • The programmer scratches his head, but carefully goes through the customer’s steps and reproduces the bug on his local system, so he can easily investigate in the debugger and solve the problem.
  • The problem is a bit of a mystery so a programmer and support analyst using GoToMeeting to watch the customer work. On watching them work, they realize what this customer is doing differently to cause the problem and then the programmer can reproduce and debug the problem.
  • On examining the standard application logging, an unhandled exception is found in the log around the time the customer reported the problem and from this the code can be examined and the cause of the exception can be determined and fixed.

These are the standard and generally easy problems to fix. But what happens when these methods don’t yield any results?

Harder Problems

Here are some examples of harder problems that can drive people crazy.

  • The problem happens infrequently and can’t be consistently reproduced. But it happens frequently enough that customers are getting rather annoyed.
  • The system works fine most of the time, but every now and again, the system goes berserk for some reason. Then it just magically goes back to normal, or you need to restart the servers to get things back on track.
  • The problem only happens to certain users, perhaps at certain times. You can watch it happening to them via GoToMeeting, but can’t for the life of you get it to happen to you or to happen in your test environment and many other users never experience this.
  • Often problems aren’t outright bugs, but other strange behaviors, like the whole system suddenly slowing down for no apparent reason and then some time later it goes back to normal, again for no apparent reason.

A lot of time these sorts of things are due to interactions between what people are doing in different parts of the system. Predicting and testing all these situations is very difficult and often a result of emergent phenomena which we talk about a bit later.

Preventative Measures

Generally the only way to solve the hard problems is to instrument your web application so you know exactly what is going on. Not only does this help with solving hard problems, but it also helps with making using your web site a better experience for users, since you can log what they do, how long it takes and where they are getting stuck. Often usability problems are more serious to users than program failures or other bugs.

Instrumenting you application generally means having good logging and making sure you log a lot of metrics that you can monitor. This can also means having extra APIs to the running application where you can inquire on the state of various components, find out how many of one type of object is currently in use for instance. Often your infrastructure like your web server will do a lot of logging for you, so be aware of this as there is no need to log things twice.

Having a dashboard to track these metrics is very helpful.  The dashboard can both make reports and graphs from the application logs as well as use the application’s diagnostic API to provide useful information about what is happening. Often you can integrate with third party vendors like New Relic or Microsoft Application Insights, but one way or another you need this.

appinsights

One objection to logging a lot of information is that the process of doing this can slow things down so much it becomes unacceptable. This is certainly true if you are logging synchronously to a file (i.e. not continuing on until the data is written). But modern systems get around this problems by logging asynchronously. The system doing the logging just fires the log message at a listener and goes on without waiting for or needing a reply. This causes logging to become much less a burden on the application. It then moves the problem to the listener application to log the messages, and if it gets too far behind it usually starts spilling messages. Most common logging infrastructures like Microsoft’s ETW already have this ability to take advantage of.

Some things you must track via APIs or logging:

  • Any resource used in the system. Java and C# programmers often think they can’t have leaks because of garbage collection, but garbage collection isn’t very smart and often important resources will leak due to unexpected references or a circular dependency. In a 24×7 web application that needs to essentially run forever in a busy state, this is incredibly important.
  • Generally what users are doing, so you know what the possible interactions might be and what are the concurrent things you may need to do to replicate the problem.
  • Any exceptions or errors thrown by the program this tends to be a given, but make sure you include programmatically handled errors since these can be useful as well.
  • Make sure you log performance metrics, if something takes longer than expected, log it.
  • Any calls to external systems and the results (with performance stats).
  • Assert type conditions where the program finds itself in an unexpected state (is make these get logged and not compiled out for production).
  • Anything the programmer considers a sensitive area of the program that they are worried about.

There are also some things that must not be logged, these include any sort of passwords, decryption keys or sensitive customer data (ie nearly all customer data). Generally a lot of people need to work with the logs and there can’t be any sensitive information there as this will be considered a major security problem.

Make sure you have some good software (either home grown, open source or commercial) to search and analyze your logs. These logs will get incredibly big and will need to be carefully managed (usually archiving them each day). There are many good packages to make this chore far easier.

Emergent Behaviors

Emergent behavior refers to complex unexpected behaviors arising from large sets of much simpler processes. Modern web applications are getting bigger and bigger with many moving parts and many interactions between different systems and subsystems. We are already starting to see emergent behavior arising. Nothing on the scale of the system suddenly becoming intelligent, but on the scale where predicting what will happen in all cases has become impossible. It doesn’t matter how much QA you apply, chaos theory and mathematics quite clearly state that the system is beyond simple prediction. That doesn’t mean it is unstable. If done properly your system should still be quite stable, meaning small changes won’t cause radically different things to happen.

The key point is to keep this mind when building your system, and to make sure you have plenty of instrumentation in place so that even if you can’t predict what will happen, you can still see what is happening and act on it.

Summary

Diagnosing and solving hard problems in a large web application with thousands of concurrent users can be a lot of fun and very challenging. Having good preventative measures in place can make life a lot easier. You don’t want to be continually pushing out newer versions of your application just to add more logging because you can’t figure out what is going on.

Advertisements

10 Questions for Sage Uncle Steve

with 3 comments

This is a guest blog posting by my wife, Cathalynn Labonté-Smith, though I’m the one answering the questions.

***

It may seem odd to readers to interview the man I’ve looked across the dinner table at for 29 plus years in his own blog, but we’ve had a recent addition to our household, Ian. Steve’s nephew is an enthusiastic young man who is in a programmer’s boot camp (see Steve’s Blog entry The Times They Are a Changin) and as an educator this has brought to my mind new questions for my darling husband beyond, “How was your day?” and “Will you be able to fit in a vacation around your business travel this year?” Also, he didn’t like my alternate idea of a Valentine to Computing.

We got out of the habit of talking about the details of Steve’s work since the time I worked as a technical writer in the field of wireless technology nearly a decade ago. For couples out there who both work in the same or related fields, you will know what I mean when I say it’s just best to unwind and avoid topics to do with work in the off hours.

When I left tech writing and became a teacher, occasionally I’d walk into a business class that was learning Accpac for Windows or Simply Accounting. Trained as an English teacher I’d do what all on-call teachers do when outside their subject area: stick to the lesson plan, get help from the brightest students in the class and muddle through as best I could. So it was fun to share those experiences with Steve and I actually learned a bit about the Sage products.

It’s been many years since I’ve been in the classroom, but having taught career preparation I want to know the following from Steve for programmers coming on stream. I know that Steve’s blog audience is unlikely to be junior programmers but I thought this might get his more senior executive readers thinking about what legacy they can pass along to new programmers.

Whoa, I can hear you say, what makes you think they can hear us with their ears jammed with ear buds and if they could we don’t speak their lingo? I’m not saying they’re going to sit through a PowerPoint of your ruminations and really the best example is modelling, after all, and as a teacher I found that it was an equal exchange. You can learn as much from your novice employees as they can learn from you–just about different things.

When I met Steve he was a Teacher’s Assistant in the Math Department at the University of British Columbia working on his Master’s Degree. His Math 100 class was just him, the blackboard, a huge lecture hall packed full of nervous first-years and a piece of chalk. I was never his student, no; I was on the other side of campus in Creative Writing workshops in poetry, fiction and children’s writing.

After his degree, he worked at various software companies in many different fields as a contractor, consultant or employee before finding his long-time home at Sage. Aside from having over twenty years at Sage now in his current role as Chief Architect, I’m curious as what Uncle Steve would say to Ian if he were around longer than it takes for him to gulp down his dinner and head upstairs for more studying?

1. Steve, what kind of guidance can you offer for formal programs a would-be programmer should choose for the best future employment and advancement? Can you compare it to your formal programming education?

A. I learned to program originally in Grade 11. Nowadays people have lots of opportunity to learn how to program at a young age. There are quite a few exceptional online programs where you can learn program, for example, Khan Academy. Khan Academy teaches you to program in JavaScript while creating fun drawings and animations. Programming like most skills requires practice to master. In the book Outliers, Malcolm Gladwell maintains that it takes 10,000 hours of practice to really master something, so starting early really helps.

My undergraduate and master’s degrees are in Mathematics and not Computer Science. However’ I took a few CS courses along the way (in things like Numerical Analysis and Operations Research), so strictly speaking I don’t have a formal CS background.

I was in the Co-op program at the University of Victoria so when I did graduate I had four work terms of job experience. Plus, I was always working on some sort of programming project on my trusty Apple II Plus computer (usually involving Fractals).

It doesn’t really matter so much which programming languages you learn, just learn a variety. After all, things are changing so fast these days that you need to expect to keep learning these as you progress through your career.

To summarize, you need something that will give you lots of practice programming, a few formal courses to give you credibility and you need to be a voracious reader.

2. In your undergraduate degree, you went through a co-op program. Is this something that you recommend and why? For example, does it make a programmer more desirable as a future employee?

A. Yes, absolutely. I think intern type programs are terrific ways to get job experience and references ready for that first real job. I did four co-op work terms and learned an awful lot about how various companies operate and what is involved. It is a great chance to get some experience with a variety of companies, perhaps a large one, a small one and a government one. I certainly give credit for co-op work terms when I’m hiring.

3. What kind of summer, part-time or volunteer work might add to and develop their skills?

A. I would look for something where you are giving back to the community, such as donating your time to a charity and if you have the chance to travel when you do this then even better. Again do something that interests you and you are passionate about.

4. What kind of advice can you give new programmers about how to pick their first employer?

A. Chances are you are going to have several jobs throughout your career. More than likely the pay will be similar, so go for something interesting. Do some research on the companies you are applying to and look beyond the initial job you will have there. Also, consider travelling to a new location for your first job to get a bit more experience of the world as well.

5. Just like some doctors are better at staying current on the latest treatments and research, how do programmers stay current when there seems to be so many new technologies and programming languages to learn. How do you manage to filter through all of it to get what will last and have future value? Or is it even critical that programmers do stay current or is there enough maintenance work to go around forever?

A.  I think the number one rule is to not rely on your employer for this. This is really your own professional responsibility. Employers will train you for what you need immediately but usually not for much else and not for things that they aren’t interested in.

One of the great things about the profession today is that most of the programming tools that are important are either open source or have free versions available (like Visual Studio Express). So you can dabble with all sorts of things in your spare time. All you really need is a computer and an Internet connection. I really believe in learning by doing. So pick something new and interesting and do a small project in it to see if you want to go deeper.

6. What are some common pitfalls new programmers could avoid in their early careers?

A. I think the most common pitfalls are either being too loyal to a company or giving up on a company too easily.

Often people in their career have very high and probably unrealistic expectations on how well a company is run. Often this gives rise to a lot of changing jobs after quick stints. This can be a mistake if you don’t get ahead and develop a resume with lots of short stays.

The reverse is the other common mistake—being in a job that doesn’t work, but trying to stick it out too long rather than cutting the cord. Leaving is often a hard decision to make, but is often easier earlier in your career. Finding the right compromise between these two extremes can be very difficult.

7. What is the most valuable lesson or lessons that you’ve learned throughout your career that you could share with a new programmer?

A. That things are often darkest before the dawn. On any project at some point things are going to look bad, problems look unsolvable, bugs are piling up and deadlines are being missed. The lesson here is not to take the whole world’s problems on your shoulders, but to just work through the problems one by one. Often these are difficult problems that take much more time than you would have thought, but sticking to this eventually yields the light at the end of the tunnel.

Another take on this is to remain optimistic in the face of adversity. Or follow the Hitchhiker’s Guide to the Galaxy’s main advice: Don’t Panic! (Their other advice of always carry a towel, I’m not so sure about).

don__t_panic_and_carry_a_towel

8. Who were your early role models?

A. Bill Gates and Steve Wozniak for what they did to start their companies. Steve Jobs for what he did when he returned to Apple.

9. Is there anything you would have done differently in your early career knowing what you know now?

A. There are always so many shoulda coulda wouldas. Now I know which companies back then paid the big bucks in stock options, but it’s hard to predict when looking forwards. I sometimes wonder if I should have moved from Vancouver, but then you get a beautiful day like today and just say “Nah”.

10. Is there a question that I didn’t ask that you wished I did?

A. No, this blog is already getting quite long J.

Point taken, Steve, this is a good place to wrap it up. Oh and, Happy Valentine’s Day, to you and to all your readers.

Written by smist08

February 15, 2014 at 4:34 pm