Fault Tolerance and Self-Healing Processes
Often in large software systems, when an error is encountered the process stops dead with an error message. This can be really annoying if the program is controlling the operation of your car. With ERP packages this can be equally annoying, though not quite as deadly. Consider a long-running process that affects item costing and posts to A/R, G/L, etc. If that process doesn't complete, it means your A/R, G/L and costs are not up to date. Even if it stopped because of bad data, the client is still stuck. Worse, if it involves posted transactions that cannot be edited anymore, what do you do?
It would be better if the process made note of the bad data, perhaps writing it to a log, and then proceeded past it. Then at least everything is up to date, with that small bit of data set aside. This makes dealing with the bad data a much lower-priority task.
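The log-and-continue idea can be sketched in a few lines. This is a minimal illustration, not real ERP code; the transaction shape, the `post_one` callback, and its use of ValueError for bad data are all assumptions made for the example.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("posting")

def post_batch(transactions, post_one):
    """Post every transaction that can be posted; log and skip the rest.

    post_one is a hypothetical callback that raises ValueError on bad
    data. Instead of stopping the whole run on the first error, we note
    the failure and carry on, returning the exceptions for later review.
    """
    failed = []
    for txn in transactions:
        try:
            post_one(txn)
        except ValueError as err:
            # Record the bad record and keep going.
            log.warning("Skipping transaction %s: %s", txn.get("id"), err)
            failed.append(txn)
    return failed  # the low-priority pile to deal with separately
```

With this shape, an overnight run finishes everything it can, and the returned list becomes the next morning's exception report rather than a stalled batch.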
The next level is that rather than just logging that something is wrong, can the program take the initiative to fix it? Is there a way to make some reasonable assumptions and proceed? Even if the assumption turns out to be wrong, correcting the data often only requires an adjusting G/L entry. Compare that to someone editing the data directly in the database and perhaps making worse assumptions than the program would.
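One way to picture this self-healing step: when a required value is missing, fall back to a documented assumption and record an adjusting entry so the guess can be reviewed and reversed later. Everything here is hypothetical illustration; the field names, the standard-cost fallback, and the adjustments list are assumptions for the sketch, not any particular ERP's API.

```python
def heal_missing_cost(txn, standard_costs, adjustments):
    """If a transaction's cost is missing, assume the item's standard
    cost and record an adjusting entry documenting the assumption.

    txn            -- dict with "item" and possibly-missing "cost"
    standard_costs -- dict mapping item codes to fallback costs
    adjustments    -- list to append adjusting entries to, so a wrong
                      guess can later be fixed with a G/L adjustment
    """
    if txn.get("cost") is None:
        assumed = standard_costs.get(txn["item"], 0.0)
        txn["cost"] = assumed
        adjustments.append({
            "item": txn["item"],
            "assumed_cost": assumed,
            "reason": "cost missing; used standard cost",
        })
    return txn
```

The key design point is that the program never heals data silently: every assumption leaves an audit trail, which is what makes a later adjusting entry possible.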
Certainly stopping long-running processes dead with an error is annoying, especially if the process was meant to run overnight and only a small part of the work got done. At a minimum a program should do as much work as it can without stopping, leaving only the exceptions to be dealt with separately. Being able to heal the data so the user doesn't have to would be a huge benefit, and certainly something to strive for. There are established processes for dealing with these sorts of problems, like FMEA (Failure Mode and Effects Analysis) and FTA (Fault Tree Analysis). These are usually used in aircraft and automotive software engineering, but applying them to ERP and CRM systems should be a good thing as well.