Archive for January 2012
NoSQL databases are a class of database management systems that grew up with the Internet. Generally these refer to any database system that doesn’t use SQL as its query language (though some call them Not only SQL, meaning they support SQL and something else as well). This would make them any database system other than the standard databases. However these are usually associated with a class of database system that is intended to server large Internet companies. In this article we are only considering the large distributed databases and not the specialized in-memory databases. The goal of these is to be highly distributed and highly redundant. As a database scales you would be able to scale usage by just adding servers and have them add to the capacity of the system in a linear manner. You would like to have your database distributed all over the world, so it can be accessed from anywhere with a very low latency.
The main difference between traditional SQL databases and NoSQL databases comes down to the CAP Theorem which says out of the three things:
- Consistency – roughly meaning that all clients of a data store get responses to requests that ‘make sense’. For example, if Client A writes 1 then 2 to location X, Client B cannot read 2 followed by 1.
- Availability – all operations on a data store eventually return successfully. We say that a data store is ‘available’ for, e.g. write operations.
- Partition tolerance – if the network stops delivering messages between two sets of servers, will the system continue to work correctly?
You have to choose two, you can’t have all three.
Then SQL databases choose consistency and partition tolerance and NoSQL databases choose availability and partition tolerance. The CAP Theorem is shown graphically below:
Generally then there has been a proliferation of NoSQL databases that all make a slightly different tradeoffs for the CAP theorem. If they can’t have all three does it make sense to have 2/3 of each perhaps? Generally there has been some great research that has produced amazing results, but this has led to a bit of a tower of Babel of database access languages.
The classic example is Google’s BigData database used to power the Google search engine. How is it that when you type a search into Google, you immediately get back a page of search results instantly? How is it translating what you typed into some sort of database query and how is that query operating so quickly? How is Google handling millions of these queries coming from all over the world so quickly? What is the database schema that allows this magic?
The structure of the database is a really simple key-value store. The key is every possible search term that anyone could type in. The value is the table of web sites (and descriptions) to return for this search term. The database is called BigData because this value consisting of all the search results could have millions of hits for you to scroll through. There is no limit to the size of the value and there is no limit to the number of keys. Even so an indexed binary tree read of such a table would still be quite slow and this is Google’s big advancement with an algorithm called MapReduce that lets these simple key reads happen so quickly. The BigTable database spreads this huge table over many servers to operate and it has redundancy of having the database in many places at once. The MapReduce algorithm is a parallel algorithm in that parts of the search can be done by multiple servers in parallel.
Sound wonderful? Google published the MapReduce algorithm and there is an open source implementation of it called Apache HBase, so why doesn’t everyone use it? All the NoSQL databases provide a huge optimization on one of the things a SQL database does, but at a cost of making something else very slow, or eliminating something else entirely. In the case of BigTable, inserting and updating records in this table is very slow. But for Google search this is ok. Google operates a giant number of “web spiders” that are always out there traversing the web looking for new sites and updating the Google database of search results. These are running all the time and updating all the servers all the time. But these updates and inserts might take hours or days to happen, especially if you count the amount of time to find their way over all the thousands of servers involved. But Google doesn’t mind as long as it happens and is eventually consistent.
ERP Database Usage
Generally ERP packages rely very heavily on the tenants of a SQL type relational database. They require database transactioning and consistency (often referred to as ACID). ERP packages require general SQL type queries. They need good performance writing transactions to the database as well as reading and reporting from the database. I blogged a bit about Sage 300 ERP’s database usage here.
The NoSQL camp would contest these statements. For instance if I insert an Order, is it necessary that if I immediately list the Orders that I see it, or is it ok if it shows up in that list, say ten minutes later? Or do ERP packages believe they need all the benefits of SQL without weighing them against the other benefits of alternative systems? Often customers will prefer speed over customizability or functionality.
I like the way in SQL you can hit post and get back a response that a transaction is either all committed or all rolled back. However from working with SQL, I also know this is a huge cost, especially in our ability to scale out the database and is a definite limit on multi-user contention. For a large scale system, I’m really questioning the locking requirements that large transactions like entering an order place on the system. Remember the “availability” part of NoSQL databases will ensure the Order is entered eventually.
Another problem is adhoc SQL queries. ERP packages tend to like to do reporting and data inquiries using adhoc SQL queries. Usually this is a no no in the NoSQL world. Often in the NoSQL world, all queries must be designed ahead of time using very complicated MapReduce formula. The question is whether this is really necessary. If you have a really powerful Google like search ability built into the database, then do you really need to search the data with SQL queries? Is it that we just do this because we know how and that in fact real users would much rather use the Google method than the SQL method?
Suppose you want the data query ability of a NoSQL database, but still want the SQL transaction integrity and consistency? One approach has been hybrid systems. Think of these as two databases in one where both a SQL and NoSQL database engine operate on your data. Perhaps a cleaner solution is to take a data warehousing approach and create your data in a SQL database, but then archive it to a NoSQL database for various querying and analysis. This is the foundation of many cloud based BI solutions that are out there. With this approach you get some of the benefits of both. What you lack is the high availability and scalability of the database for data entry and regular SQL type queries. But still this can be a way to get a bit of the best of both worlds.
I think it will be a while longer before NoSQL databases gain acceptance in the ERP world. However the amount of innovation going on in the NoSQL world is huge and great strides are being taken. I suspect we will start to see NoSQL begin to be used for non-traditional BI via data warehouses. Even adding Google like search to ERP usually involves adding a NoSQL database on the side. I think once this starts to happen then we will see more and more functionality move to the NoSQL side of things and then perhaps down the road somewhere we will be able to remove the SQL database. But the jury is still our as to whether this is a good long term goal.
I put a questionnaire on how people liked Crystal Reports up on the “Sage Partners, Employees & Alumni Networking Group” group on LinkedIn and received a lot of good discussion. I’ve previously blogged on customizing Crystal Reports for Sage 300 ERP (Accpac) here. This blog is more about the history of how ERP to Crystal integrations have gone in the past and the various challenges faced.
Accpac (now Sage 300 ERP) was originally developed by the Basic Software Group in the eighties in Vancouver. They started a project to have a new WYSIWYG report writer as part of the product rather than the codes based approach being used. This project was cancelled and some of the people involved quit and started their own company to develop Crystal Reports (then Quick Reports). This was originally developed as an add-in product for the MS-DOS based Accpac Plus. Then they generalized the product to dynamically look at any database and that became the Crystal Reports of today. The original founders then struck it rich when they sold their company to Seagate in 1994, which was trying to branch out from hard disks to software. Eventually Seagate gave up on this and Crystal went private and independent as Crystal Decisions. Shortly after in 2003 it was sold again to Business Objects. Then in 2008 Business Objects was bought by SAP where it lives today.
All of Sage 100, 300, 500 and X3 ERP use Crystal Reports as their main reporting engine. Sage HRMS and Sage CRM also use Crystal. We all adopted Crystal long ago at some stage or another. Sage 300 ERP adopted Crystal when we switched from using CARET (the Computer Associates report writer) back at version 3.0A.
Most products started integrating with Crystal in the 16-Bit world of Windows 3.1. We all integrated via the DLL interface that was to the DLL: crpe.dll. This interface let us print reports to a printer, preview or file. It let us set parameters that were passed from application’s entry forms to Crystal for things like to/from ranges. The Crystal runtime that included crpe.dll was installed to a directory c:\windows\crystal. This meant that any product that installed the Crystal runtime would usually overwrite what was already there, meaning the last product using Crystal to be installed would usually work, whereas most other products using Crystal would then stop working. This was the famous DLL hell.
In these days we had to use native drivers for accessing the database, Sage ERP 300 shipped two versions of its reports, one set using the native Btrieve driver and the other using ODBC for SQL Server. This tended to be a maintenance headache. Having multiple reports for different databases, or even different page sizes like letter vs. A4 vs. legal. Having different reports depending whether G/L is used or not. Over the years as Crystal functionality has improved and ODBC drivers have improved, we’ve been able to reduce the number of report versions. We now ship one version of reports using ODBC that is just configured to look at the correct database regardless of type.
Then we all started to move to 32 bit Windows with the advent of Windows 95. Crystal produced a 32-bit version and the new interface DLL was crpe32.dll. This was a close match to the 16-bit version and programs could work with Crystal in the 32 bit world very similarly to how they did in the 16 bit world. However DLL hell still remained. Plus we started to see the appearance of Terminal Server and Citrix. To keep users separated on TS, anything that a user installs to the Windows directory actually goes to a local Windows directory. The problem now is that the Crystal directory would go to the local user’s private windows directory and hence Crystal would only work for the user that installed the product and not for any other users on the Terminal Server. This then led to a manual procedure of copying the Crystal subdirectory either to the main Windows directory or each user’s local Windows directory.
With Crystal Reports 9, Crystal moved to eliminate the DLL hell problem. They dropped support for calling the crpe32.dll directly and moved the Crystal runtimes file over under Program Files under a version labeled directory. This allowed multiple version of the Crystal runtime to be installed at the same time and fixed the problems with Terminal Server and Citrix. However applications that used Crystal had to switch from using the crpe32.dll to using Crystal’s COM interface.
Crystal X and XI then followed and these worked very similarly to Crystal 9. Another nice thing was at this point Crystal had adopted an extensible file format, so the file format stopped changing between versions making upgrades easier.
Crystal Reports 2008 then dropped their COM API and now only supports integration via .Net and Java interfaces. This presented a number of problems, namely since these interfaces were quite different than the ones that went before them. Plus Crystal Reports was a fairly major upgrade to XI and seemed to introduce quite a few bugs. At this point ERP packages integrating to Crystal either had trouble with the new interfaces or had trouble with existing reports not running properly. This led to a bit of a low adoption of CR 2008. However the file format remained the same, so you could use CR 2008 to edit reports that would later be rendered in the ERP package via an earlier Crystal runtime, and again co-existence of multiple versions isn’t a problem due to the side-by-side DLLs.
Crystal has now released Crystal Reports 2011 and this fixes most of the problem in CR 2008. So we should start to see ERP packages moving forwards again. This version still only supports .Net and Java interfaces but has fixed a number of bugs and deficiencies that were causing adoption problems.
For setting parameters and running reports, most ERP packages can do this fairly handily. However when you start to push the integration, what are some of the problems you can run into?
Sage ERP 500 (previously MAS 500) tries to simplify report customization by actually generating reports based on a higher level report designer built into the Sage ERP 500 product. This then eliminates a lot of the challenges in customizing reports inside the Crystal Designer. However this means that Sage ERP 500 uses the Crystal Design API and this API changes a lot from version to version making upgrades much harder. Sage ERP 500 also controls access to its database via a customer ODBC driver that then adds the Sage ERP 500 security model to the generic ODBC access.
Sage ERP 300 (previously Accpac) has a concept called “datapipes”. These were originally created for CARET as a performance enhancement and then taken over to Crystal. The concept is that you use these in place of ODBC to connect to the database and then these datapipes use the Sage 300 ERP database ERP to access the database. This allows the datapipes to use special knowledge of the Sage 300 ERP database schema to provide faster access. We use them to “flatten” out subreports, since accessing subreports in Crystal can be slow. We also use these for controlling access to things like G/L accounts with our G/L security module so users don’t see data they aren’t entitled to.
Using the Crystal Reports runtime from an ERP package is fairly straight forwards and easy, it can be installed with the ERP package and usually works quite well, so the user doesn’t have any extra hardware, installation or management problems. If you need to customize reports you need to buy and install Crystal Reports, but to just run reports there is no extra overhead. However much new development by Crystal is requiring the Crystal Reports Server.
The Crystal Reports Server is a central server that manages the printing and distribution of reports. You can use it to schedule when reports run and who gets the results. Most new Crystal development is now requiring a Reports Server to operate. For instance if you want to use the new Crystal Data Universe to make creating reports easier then you need a Reports Server to hold these. Similarly several new technologies like Crystal’s dash-boarding technology require the Crystal Server.
For us ERP vendors, we now have to evaluate the ROI for our customers. Is requiring this new server worth it? Is it worth the hardware, setup and licensing costs? Long term, is this tying the ERP package too closely to Crystal making adopting other technologies harder? At this point I don’t have a clear answer to these questions, but we are working with the new things in Reports Server to see their costs and benefits.
Crystal Reports has a long and interesting history and has become the de facto standard in report writers across all ERP packages. It is very full functioned and works with any database server. It takes a bit to learn how to use it effectively and is a very powerful tool for Business Partners that use it. The main complaint about Crystal is that it’s hard for end users to customize their own reports, due to some of its complexities.
The “Five Whys” is a method for performing a “Root Cause Analysis” to detect the true cause of problems in a product or process. The Five Whys was developed by Taiichi Ohno, the father of the Toyota Production System and is a cornerstone for how Toyota achieves the quality it does.
Software development is a complex process and modern business software is quite large and complicated. If something goes wrong, we have a tendency to just blame the first named culprit, for instance if a bug appears in the field then blame the QA person that tested that feature. Then this is usually accompanied with a lot of talk about people being “accountable” for the work they do. But if you step back a bit you realize that actually a much larger system has failed. The QA person wasn’t the only person to look at that feature. The programmer was responsible for testing it (both manually and providing unit tests) before giving it to QA. A software architect reviewed what the programmer was doing. The entire product went through a regression test before release. This feature was reviewed by a Business Analyst and a Usability Analyst. The product and feature had gone through a customer Beta Testing phase. And then in-spite of all this process and review by all these responsible and accountable people, the bug still made it out into the field. So to just superficially blame one person doesn’t give a good perspective on what went wrong and doesn’t help preventing the problem happening in the future.
The real goal of these methods is to allow companies to learn from their mistakes rather than to assign blame. Generally the process works way better when everyone is confident that this is the real intent. If everyone believes the goal is to assign blame, then the whole process goes political and getting to the truth is usually impossible.
Learning from mistakes has its own pitfalls. One common theory for why successful companies eventually fail is that they dutifully analyze problems and learn from their mistakes. For every problem that happens they develop a process or procedure to prevent that problem ever happening again. The downfall here is that as time passes, the number of processes and procedures created from this process becomes huge. This makes the company very bureaucratic and getting things done very hard. Great care has to be taken that the outcome of these studies doesn’t just bury a company in procedures that get harder and harder to follow. The best result is if the current process can be simplified, perhaps eliminating the step that created the mistake. After all simple processes and procedures are much easier to follow and less error prone. When developing remedies to problems analyzed, careful attention has to be paid to the cost of the solution, so that the solution isn’t worse than the original problem.
Performing a Root Cause Analysis study takes time and attention. So when you want to get started with this, start with the worst and most easily defined problems first. Run a few pilots to get used to the methodology. One good place to start is problems found in the field by customers, as mentioned before these indicate a fairly serious systemic failure which might be useful to fix. Problems found in the field are usually categorized by severity, so you could start by looking at the most severe problem first and then working down the list.
The Five Whys
The basic concept behind the Five Whys is really simple. Basically keep asking why for five times to get past the superficial reasons to the underlying reason. It’s something that we all did as three year olds when we kept annoying our parents by asking why over and over again. Basically as children we perceived that this was a good way to get a deeper understanding of the world, but then we abandoned this technique as we got older (or were scolded enough times). Basically the insight here is that we had it right as three year olds and shouldn’t have given up.
As an example perhaps of an interviewer asking a programmer:
- Why did this defect make it into the product?
- Because the code review and unit tests didn’t catch it.
- Why didn’t the code review catch it?
- Because we substituted a reviewer from another team.
- Why did you substitute who was doing the reviews?
- Because our team was frantically coding and couldn’t spare the time.
- Why were they so busy coding?
- Because the stories they committed to were larger than expected.
- Why was that?
- Two key dependencies were underestimated and forced a lot of extra work.
This is a fairly simplified and abbreviated example, but shows the basic idea. Also notice that each question can often lead to several avenues of pursuit for the next level. It’s up to the interviewer to decide whether to pursue several or follow the main thread. The diagram below shows how this might go.
Root Cause Analysis
When you perform a Root Cause Analysis (RCA), you usually follow the following phases:
The Five Whys are usually performed in the “Data Analysis and Assessment” step. The “Data Collection” step before is used to identify the correct people, so you know who to ask the Five Whys to. Typically you want to have a separate Five Whys interview with each of the people involved with the problem and then perhaps additional people depending on the outcome of the interviews.
Generally you do each interview separately and document all the questions and answers. We document everything to do with the process on a Wiki, so everything is kept transparent and visible. After all the interviews you then need perform the assessment to identify the root causes and choose the items you want to pursue. This is usually done in a couple of brainstorming sessions. First everyone studies the documented Five Why interviews then they get together to discuss what happened and make suggestions of how to change processes or make other changes. All these ideas are documented. Then some time is left to think about them and another meeting happens to decide which ideas will be implemented, keeping in mind that we don’t want solutions worse than the problem and don’t want to introduce unnecessary bureaucracy. Then we apply the solutions and inform all affected departments. Follow up is scheduled to ensure that the solution is working and we are getting the desired results, if they aren’t working they should be scrapped or changed.
Usually when people first read about RCA they take it as a very heavy handed procedure to address simple problems. However when you’ve run a few of these, you quickly realize that most problems aren’t that simple and that solving systemic problems can be quite difficult. I’ve found Five Whys an extremely effective method to get to the bottom of things, where one of the key benefits is that it forces people to take the time to really think about what happened. Then the documentation produced provides a lot of visibility into what happened which usually makes implementing the solution face less resistance.
Recently Software Development has been borrowing a lot of ideas from Japanese Lean Manufacturing. In a way this mirrors the progression from the Detroit way of building cars pioneered by Henry Ford to the modern way pioneered by Toyota.
The original Ford production line approach was that a car was assembled on an assembly line by going through a series of specialized stations that were optimized to do one things really well and efficiently, so the car would go from perhaps the attach doors stations to the add Window glass station. Each station was optimized to do its job really well and the priority was to keep the assembly line moving no matter what. If the assembly line ever stopped then the hundreds of people working on it would be idled and this was considered a disaster. If problems occurred then they were noted and fixed post-assembly line. This process was then improved by optimizing each station to be quicker and more automated. Generally this produced a very good system of producing a very high volume of identical cars.
From the outside a Toyota assembly line looks nearly identical. But it operates very differently. Toyota didn’t have the same large market as Detroit and didn’t need the volume. They needed to produce more specialized cars and needed to switch their assembly lines quickly from one run to another. Their emphasis wasn’t on making very specialized machines to do one task quickly; it was on producing general machines that could do a number of different things and could be switched quickly from doing one job to another. Toyota then does small runs on their assembly line rather than one massive long run. The big benefit to doing things in small batches was that quality problems are found much sooner. Hence they can stop the assembly line on the first car and fix a problem, rather than producing thousands of cars with the problem and then possibly not fixing the problem because it’s too expensive at that point. This also re-enforces the practice of fixing problems right away rather than letting them pile up and that it’s more cost effective to have the production line idle while the problem is fixed than having to fix it later.
Another example from the book “Lean Thinking” by James Womack and Daniel Jones recounts a story of one of the authors stuffing envelopes with their two young children. The children wanted to take the approach of folding all the letters, then putting all the folded letters in the envelopes then sealing all the envelopes then putting stamps on all the envelopes. This seems like an efficient and intuitive way to approach the problem. However the author insisted they do it the less intuitive way of one complete envelope at a time, i.e. folding one letter, stuffing it in an envelope, sealing it and putting a stamp on and then going on to the next one. Which way is faster? It turns out that researchers have timed groups of people performing this task and it turns out the one at a time approach is faster! It has a couple of other advantage as well. One is that if there is a problem, you find out right away; if the envelope is the wrong size, you find out on the first letter, not after you have folded all the letters, this could save you then having to refold all the letters to fit the envelopes. Similarly if you take a break you know how far along you are, if you’ve completed 40 out of 100 letters then you are 40% done and you can estimate your remaining time easily. You can’t do this when you do each step completely. This process is called “single piece flow” and is an example of doing a task as lots of small batches rather than one large batch. This is a general principle you want to apply in everything you do, resisting the old Ford type assembly line type mentality. You also receive the first finished product off the assembly line quicker, so you can validate it with your customer before too many more are produced to ensure you are producing the right thing.
Kanban vs Scrum
To some degree batch size is at the heart of the difference between the Kanban and Scrum methods of developing software. Scrum tends to batch things into sprints (usually two or three weeks long) which is much smaller than traditional waterfall development, but Kanban wants to take it further. In Kanban you only work on one or two small things at a time and finish one before starting the next things. You operate on lots of small batches, but you also have to limit how many batches (or tasks) you can have in progress at a time. This is called limiting work in progress (WIP). In Kanban you don’t worry about batching things up into sprints, just keeping the jobs small and limiting WIP. Kanban requires a lot of discipline and often if a development team hasn’t successfully progressed from waterfall to scrum, then adopting Kanban is a disaster.
The goal here is to produce small high quality increments that are immediately validated by customers. This way if you are going in the wrong direction, you won’t get too far before discovering it and not too much time will have been wasted (only a couple of small batches).
How do you get customer feedback, one small increment at a time? Obviously you can’t have dozens of customers download, install and play with a new version of your product every time you add a small increment and then interview them about it. But there are techniques you can use to do just this. The first is to produce your software via “continuous deployment”. This means that every build of your software is automatically deployed to customers whether that means pushing it to a web server or loading it on an automatic software updater service. You need to be able to control which customers get it and you need to be able to roll it back if something goes wrong. To be confident with this sort of procedure you need really good automated testing in place to ensure you haven’t accidentally broken something. Then to find out whether your increments (or batches) provide customer value, you need really good instrumentation in your product to report back on customer usage, whether they use the feature, how long it takes to perform tasks and any other useful statistics.
Software products are large and complicated systems. Managing that complexity is a major challenge. Breaking large tasks down into many small tasks is a powerful tool to reducing complexity. Combining many small but high quality parts results in emergent complexity, but again it is much easier to manage by adding one small thing at a time rather than suddenly throwing several large items into the mix at once.
Most of the ideas in this blog posting come from my reading the “Lean Startup” by Eric Ries over the holidays. Quite a good book with many useful ideas on product development and innovation. Reducing Batch size and limiting WIP appeal to me to help make Sage more agile and innovative as we move forward.
Traditionally with ERP systems, a new version is released every 18 months or so. Typically customers will only upgrade every few versions for a number of reasons discussed below. This leads to a major problem, namely if a customer requests a new feature and we jump on it, then it still could be four years before they see this feature. How this usually works is shown in the following timeline.
In an agile on-line world, this seems like rather a slow process to provide value to customers. Most Sage products have an on-line suggestion web site, such as https://www11.v1ideas.com/SageERPAccpac/Accpac for Sage 300 ERP. However customers tend to give up on making suggestions to these once they enter a number of suggestions and don’t see any movement in the next few months. As more and more people are part of the MTV Generation (or later) then instant gratification becomes the expectation.
Traditionally this is all part of a fairly large and involved Product Development Life Cycle. Product Managers spend their time talking to customers and partners as well as evaluating various industry trends to come up with a list of features for the next version. Usually a number of themes are developed and implemented as major new features. R&D then takes as many such themes and features that they think this can develop in 18 months. If all goes well, these are implemented, extensive testing is performed and an updated version of the product ships after these 18 months. Then if Product Management has done their job properly and R&D has faithfully implemented what was laid out, then customers should be fairly happy with the new version.
Of course things aren’t quite that rigid anymore. Most development groups (including Sage ERP) have adopted the Agile Software Development Lifecycle where the development team is implementing features in small batches taken from a product backlog. The product backlog is in priority order so the higher priority features are implemented first. A side effect of this is that for lower priority items in the backlog, Product Management can swap these out with other higher priority features as business needs or understanding changes. This then leads to greater flexibility and does allow some newer customer suggestions to make it into a product sooner. I blogged on some aspects of this here and here.
Often we are tasked with the goal of shortening our release cycles; say from 18 months to 12 months or even 9 months. However we get a lot of resistance from customers and partners on this. The main reason is that traditionally upgrades are fairly costly and time consuming activities that can often be quite disruptive to a business. As a result customers typically don’t want to upgrade more than every three years, so there is no increase in customer value by providing more frequent releases. Even then the main reason customer’s site that they upgrade is to stay in support, since we only support the two most recent versions or to stay compatible with upgrades from other vendors, notably Microsoft. If you need to know you work well on Windows 7, then you need a more recent version of your products. A large part of this problem has been caused by us software vendors, when presented with a choice to add a new feature to a release or to make upgrading easier, we consistently choose to add the new feature and leave any difficulty with upgrading to the Business Partners to deal with; who, of course, pass this cost on to our customers. Right now 18 months works well because by supporting two versions, customers only need to upgrade every 3 years.
So out of this we have two significant problems to address:
- Making upgrades easier so it isn’t a huge inconvenience and cost to upgrade.
- Make the features and functionality we add to our products more compelling so customers cite these as the reason for upgrading and not just keeping up with Microsoft compatibility problems.
To address the first point, we’ve been working hard to make the upgrade process easier. This is Sage’s Frictionless Upgrade initiative. We’ve just begun on this journey, but in a number of products we are already seeing results. Such as in Sage 300 ERP we’ve combined all the separate module installations into one master product installations, allow you to activate multiple applications to the new version in one go and to minimize database changes so Crystal Reports and Macros don’t need modifications.
Going Online with services like www.accpaconline.com can help, since now you don’t need to install the software yourself, but you still have the work of making sure custom reports still work and training staff on any changes to their workflow. Generally the process of upgrading can be done well or badly whether you are online or on-premise. Certainly we’ve seen cases where overnight changes to web sites like Facebook have been very disruptive.
Many products such as Windows or Apple iTunes have an auto-update feature, so even though these are on-premise installed on the local computer type applications, they can still be upgraded frequently with little fuss. Expect to see this sort of functionality come to more and more business applications allowing the software to be upgraded frictionlessly without requiring a major database conversion or requiring changes to customizations.
Making Features Compelling
The intent with all the customer visits, customer interviews and other research that Product Management performs is that the features we add to the product are compelling either to existing users or to attract new users. A lot of work goes into researching and collecting all the data that goes into this decision making. Generally this yields better results than being driven by technology trends or by what competitors are doing; however, we tend to not being seeing the results we would like. What is missing?
With the scientific method, like what you see in Physics, Chemistry or Biology, people come up with theories, but these aren’t really believed until they are backed by experimental evidence. In software development we tend to do a lot of work doing research and creating theories and spend a lot of time discussing these theories with customers, partners and analysts. However we don’t take the next step and scientifically test these theories in real customer situations.
We tend to rely on doing studies on customers using prototypes or mockups. But when customers aren’t in their real work environment, doing real work, the results tend to be suspect. Is the customer just telling you they like your new feature to make you feel good, even though they will probably never use it? Are the questions being asked at interviews slanted to get the answers the researcher wants?
I recently read “The Lean Startup” by Eric Ries. This book, among other things, heavily recommends real scientific experiments with real customers in their real environment. No more separate studies with proof of concepts or mock ups. In a way we are doing this now. The only problem is that we don’t know the result until after four years and then it’s very hard to remove something that really didn’t work.
So how do we perform these experiments on real users in their real workplaces doing real work? Sure it’s easy for Facebook to try a new feature in say one town to see what the reaction is, but here we are talking about real business software that people rely on to run their businesses.
The real trick to this is to keep features small and only try small changes at a time. This is often referred to as a Minimum Viable Product or MVP. Only the absolute minimum functionality to do the job is produced. Any additional functionality is only developed (and tested) due to customer demand. This tends to greatly reduce complexity and unnecessary bells and whistles that we see in so much software. This then allows features to be developed and tested quicker. In manufacturing this is referred to as keeping batch size small.
This requires a great deal of good instrumentation in the product, so we know if a feature is being used and if it is helping (are users really more productive entering invoices?). Creating a full release with all the changes from the work produced by a large team over a year is a major undertaking. But adding small changes that have been fully QA’ed is fairly easy. In fact we already do this with hotfixes. In a way each hotfix is an experiment as to whether we have fixed a defect (hopefully mostly we have). So why are small new features any different than hotfixes?
You might argue that SaaS vendors can do this easier, since they can introduced the change for just a few customers by routing them to a different application server. However with auto-update we could do the same for on-premise customers. Give them the new feature and then based on the results, if they are good, roll it out to all customers, but if they are bad, then roll it back for anyone that received it.
These techniques have produced great results in a number of well documented cases, but will they be accepted for business software? I think right now there will be a fair bit of resistance to these sort of practices, but I think it is really a PR exercise to prove to customers that by participating in these sort of experiments, that they will get the features they are asking for sooner and that the new features they receive will be far more valuable. One of the main advantages of these techniques is reducing waste, by detecting things that won’t work quickly and discarding them quickly. This way some really dumb idea can’t make its way all the way through development using all those resources, to just annoy users in the end. Chances are this will start with new connected services or new on-line features, but I think eventually this could become more mainstream for even large ERP and CRM systems.
New techniques are being developed to develop customer value quickly and to get it into customer’s hands much faster than we have traditionally. However some of these techniques like experimenting on customers running live systems are going to take a bit of good PR to prove their value to get the necessary participation. If this all sounds crazy, give “The Lean Startup” a read, it paints a fairly compelling picture.