Accpac and Business Continuity
I remember long ago, when hard disks in PCs were new, that I never trusted what I saved until I had backed it up to some floppy disks. These days, a lot of people don’t seem to worry about the hard drive in their PC failing. In fact, I think more people back up because they are worried they will get a virus or Trojan and have to re-install their computer from scratch. But even with better reliability, hard drives can still fail, and even with virus scanning software, you can still get infected.
It used to be that you did a backup every night and, if something failed, you said oh well and restored from backup. Often you would lose the day’s work when things failed, and then get behind another day (if you were lucky) while the system was restored. Generally you backed up to tape, perhaps doing a full backup on the weekend and incremental backups on weekdays; and you could back up your entire hard drive to one tape. Tapes were never that reliable, and restoring a backup was always a nail-biting experience. People started backing up to CDs and later DVDs, as these were more reliable, longer lasting and far easier to restore from (especially individual files).
However, hard disk technology quickly outpaced all these backup technologies. You can now get a 2 terabyte hard drive for around $100; that is the equivalent of 500 4GB DVDs. So as it stands, really, the only way to back up a large hard drive is to another large hard drive. Along these lines there are many solutions, from various RAID and disk mirroring configurations to replicated database servers.
Further, today’s businesses can’t afford to have their computer systems down for any length of time. Businesses are now heavily reliant on their computerized systems and can’t run manually for a few days while the automated system is fixed. Today’s businesses require that their systems be available and working properly 24x7x365. Most businesses now develop a formal Business Continuity Plan (http://en.wikipedia.org/wiki/Business_continuity_planning): a formal plan that documents how the business will recover from all sorts of common and uncommon disasters.
All these backup and redundant systems cost money, so there is a business decision to make. What does it cost you, when your systems are down? Do you run with many disparate data centers or one centralized one? What are the environmental risks to your data centers (earthquakes, hurricanes, tornadoes, etc.)? Backup generators can be quite expensive and require quite a bit of maintenance. Some redundant systems impose a performance penalty while you are running normally; is it worth living with this penalty compared to how often you need the redundancy?
At the hardware level, first there are all sorts of redundant drive solutions. There are lots of ways to mirror hard drives so that if one fails, the other will be used. These can be either multiple drives in a server made redundant through RAID (http://en.wikipedia.org/wiki/RAID) or various NAS (http://en.wikipedia.org/wiki/Network-attached_storage) and SAN (http://en.wikipedia.org/wiki/Storage_Area_Network) solutions. The thing to remember here is that although the disk drives are redundant, the enclosure holding the drives isn’t. This means if something in the enclosure fails, then the drives are inaccessible until they are placed in a new enclosure (server, NAS, etc.). There is also the problem that, being in the same box, they could all be damaged by a flood (bathroom above overflows), a power surge (ESCOM), a fire, or some other local event.
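The mirroring idea can be sketched in a few lines of code. This is only a toy illustration of the principle, not how real RAID works; notice that both copies still live in one object, which mirrors the enclosure problem: redundant drives, but a single box around them.

```python
class MirroredStore:
    """Toy sketch of RAID 1 style mirroring: every write goes to two
    independent "drives", so a read still succeeds after either single
    drive fails. Both copies sit in one object, just as mirrored drives
    sit in one enclosure."""

    def __init__(self):
        self.drives = [{}, {}]     # two independent copies of the data

    def write(self, key, value):
        for drive in self.drives:  # mirror every write to both drives
            if drive is not None:
                drive[key] = value

    def fail_drive(self, index):
        self.drives[index] = None  # simulate a dead drive

    def read(self, key):
        for drive in self.drives:  # serve the read from any survivor
            if drive is not None:
                return drive[key]
        raise IOError("both drives failed: the enclosure itself "
                      "was the single point of failure")


store = MirroredStore()
store.write("ledger", "general ledger data")
store.fail_drive(0)
print(store.read("ledger"))  # still readable from the surviving drive
```

Losing one drive is survivable; losing the enclosure (flood, surge, fire) takes out both copies at once, which is why off-site redundancy is discussed below.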
Most companies have uninterruptible power supplies (http://en.wikipedia.org/wiki/Uninterruptible_power_supply) for individual computers, the data center or the whole building. Most servers these days have dual redundant power supplies (http://en.wikipedia.org/wiki/Blade_server), since power supplies can be damaged by power surges and seem to fail about as often as disk drives. Redundant fans help, along with temperature monitoring software. Check your fans to ensure they are clean and working; many BIOSes will run a diagnostic on the fans at startup and let you know if one has failed.
Most companies like to have their data in multiple geographic locations. At a minimum this means off-site backups, where backups are regularly moved to and stored at a separate location. Then, if your local data center is damaged, once you get your hardware up and running again you can restore your backups and continue business. For many businesses this isn’t sufficient: they can’t be down while the data center is restored. Generally businesses would like a hot standby at an alternate location that can take over, if not automatically then very quickly.
Most modern databases, including SQL Server, provide quite a bit of functionality to help solve these problems. This ranges from mirroring or replicating databases to multiple locations to being able to back up the database server without shutting it down. There is quite an array of options, which exist to allow tradeoffs between redundancy and performance. For instance, if you completely mirror a database to a different geographic region, then a user will need to wait for each transaction to be committed to both databases before proceeding; this can be slow, since the link to the remote server will be slower than the link to a local one. So the database vendors introduced ideas like lazy replication (http://en.wikipedia.org/wiki/Optimistic_replication). In this case the user proceeds as soon as the transaction is saved on the primary server, and the primary lazily performs the updates to the backup server. This means that if the primary server fails, whatever is in the lazy queue at the time will be lost. The tradeoff is that, hopefully, the productivity gained while the primary server is working outweighs the few lost transactions in the rare instances when it fails.
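The lazy replication idea above can be sketched as follows. This is a toy model, not how SQL Server actually implements replication: the commit returns as soon as the primary has the data, while a background thread drains a queue of pending updates to the replica. Anything still in that queue when the primary dies is exactly the "lost transactions" described above.

```python
import queue
import threading


class LazyReplicatedDB:
    """Toy sketch of lazy (asynchronous) replication. A commit waits only
    for the primary; a background worker ships updates to the replica
    later. Pending queue entries are lost if the primary fails."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = queue.Queue()
        worker = threading.Thread(target=self._replicate, daemon=True)
        worker.start()

    def commit(self, key, value):
        self.primary[key] = value       # user waits only for the local write
        self._pending.put((key, value)) # replica update happens lazily

    def _replicate(self):
        while True:                     # background worker: drain the queue
            key, value = self._pending.get()
            self.replica[key] = value
            self._pending.task_done()

    def flush(self):
        self._pending.join()            # wait for the replica to catch up


db = LazyReplicatedDB()
db.commit("INV-1001", 250.0)            # returns immediately
db.flush()                              # replica now has the transaction too
```

A synchronous mirror would be the same code with the replica written inside `commit()`, which is exactly the latency cost the user pays on every transaction.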
Everything discussed so far can be used by Accpac, but doesn’t really have much to do with Accpac: these are all database and server features, not Accpac features. But Accpac is a very flexible product that allows many configurations, so let’s spend a little time discussing Accpac configurations that fit nicely into these scenarios. Remember that to get up and running again, you need working Accpac programs, registry settings and site configuration database, as well as the company database. You want to ensure that none of these is a single point of failure: each needs either a way to be restored or a hot standby to switch to. Generally the program files, registry settings and site configuration don’t change often, whereas the company databases are being written to continuously. Ideally you would have hot standbys to switch in when anything fails, but practically this presents problems, like not being able to have two computers on your network with the same name.
If you are using SQL Server and configure Database Setup with the server name, then since this is shared by everyone, you only need to change the server name in one place in Database Setup to get everyone using a new server.
In this configuration, the Accpac programs are stored on a central file server and the database server is either this file server or, more usually, a separate database server. Each workstation is configured to point to this file server, and the orgs.ism file on the file server points to the database server. If an individual workstation fails, then that one person can’t do any work until the workstation is replaced. This isn’t so serious, because only one person is idle rather than the entire enterprise. Usually the IS department will have a standard “ghost” image that they can restore on a new workstation to get that person going again in a few hours.
The database server contains the most valuable data. It needs to be backed up frequently. Using redundant drives (RAID 1 mirroring, or RAID 5 striping with parity) or a mirrored server is a big help.
The file server can fail too. Usually you would want it running on redundant RAID drives, with a good backup and ghost image so you can restore things quickly in case of a catastrophic failure. Windows Server has a feature called DFS (Distributed File System) (http://www.informit.com/articles/article.aspx?p=174367&seqNum=2) (http://en.wikipedia.org/wiki/Distributed_File_System_(Microsoft)); I haven’t tried DFS, but it looks helpful. Another option is to have a restoration contract with your vendor; for instance, Dell offers a service where they will restore your server to running in less than 4 hours (quite a long time, but often acceptable).
In this configuration, all the users access Accpac by logging into a Terminal Server session. All the Accpac programs and configuration files are installed on this Terminal Server, which then accesses a separate database server. This configuration has some of the same issues as the previous one, in that the site directory lives on only one server and needs to be made redundant or easily restorable. For Terminal Server you would usually set up at least two terminal servers and, since you have them, split your users between them (either via load balancing or by just assigning them). This way you get maximum performance when everything is working, but if one fails you switch those users to the other and away they go. Just make sure you have enough memory to run with all the users on one server (perhaps a bit slower) for the day or two until the second one is fixed.
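The split-with-failover scheme above amounts to a very small routing rule. The sketch below is illustrative only (server names are made up, and real deployments would use a load balancer or session broker rather than code like this): spread users across the live servers round-robin, and when a server is marked failed, everyone lands on the survivor.

```python
def assign_server(user_index, servers, failed=frozenset()):
    """Toy sketch: round-robin users across whichever terminal servers
    are still alive. With one server failed, all users go to the other."""
    alive = [s for s in servers if s not in failed]
    if not alive:
        raise RuntimeError("no terminal servers available")
    return alive[user_index % len(alive)]


servers = ["TS1", "TS2"]                    # hypothetical server names
assign_server(0, servers)                   # -> "TS1" (normal operation)
assign_server(1, servers)                   # -> "TS2"
assign_server(1, servers, failed={"TS1"})   # -> "TS2" (failover)
```

The same rule covers both the happy path (load split for performance) and the failure path (all load on one box), which is why you size each server's memory for the full user count.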
Once we have migrated to a true web-based application, then wherever this article mentions Terminal Server it would become Web Server; all the concepts would be the same.
In either the workstation setup or the Terminal Server case, you probably now have a mirrored SQL Server that isn’t doing anything when things are working well. One thing many customers do is identify users that only do reporting and point them at the replicated server. This moves some of the reporting burden off the primary server and gives better performance all around.
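That read/write split is simple to express. The sketch below is hypothetical (the server names and user list are invented, and in practice you would set this per user in Database Setup rather than in code): known reporting-only users get pointed at the replica, everyone else at the primary.

```python
# Hypothetical server names and users, for illustration only.
PRIMARY_SERVER = "sql-primary"      # takes all transactional work
REPORTING_SERVER = "sql-replica"    # otherwise-idle mirror, used for reports

REPORTING_USERS = {"rptclerk", "controller"}  # users who only run reports


def server_for(user):
    """Route read-only reporting users to the replica; everyone who
    writes data stays on the primary."""
    if user in REPORTING_USERS:
        return REPORTING_SERVER
    return PRIMARY_SERVER


server_for("rptclerk")    # -> "sql-replica"
server_for("orderentry")  # -> "sql-primary"
```

The benefit is that the standby hardware earns its keep during normal operation instead of sitting idle waiting for a failure.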
The company would setup a configuration like that below.
They would have a well documented business continuity plan. All the machines would be ghosted (http://en.wikipedia.org/wiki/Ghost_(software)) so that any failed machine can be restored onto new hardware quickly. Redundant components would be used wherever possible, such as redundant RAID disks within each unit, in addition to redundant hardware. The backup terminal server and secondary SQL Server would be located in a different geographic location from the primary equipment. The company would test switching over from one computer to another to ensure the documentation in the business continuity plan is correct and that they can actually do it. They would purchase the equipment from a vendor that can quickly replace any failed component (perhaps with a less-than-one-day service level agreement (SLA)).
Update 2013/06/08: Note that Windows Server 2012 now supports highly available file shares via SMB3 – http://www.infoworld.com/d/microsoft-windows/windows-server-2012-brings-high-availability-file-shares-214851. This is a good way to keep your file servers HA.