Accpac and Its Databases
Sage ERP Accpac supports running on either Pervasive.SQL (http://www.pervasivedb.com/Database/Pages/home.aspx), Microsoft SQL Server (http://www.microsoft.com/sqlserver/2008/en/us/) or Oracle (http://www.oracle.com/us/products/database/index.html) database servers. This blog posting talks a bit about how Accpac accounting applications access the database, gives some rationale for the way we do things and looks at where we may go in the future.
Accpac Plus, the original character-based product for first CP/M (http://en.wikipedia.org/wiki/CP/M) and then MS-DOS (http://en.wikipedia.org/wiki/MS-DOS), used a proprietary built-in database. When that product was developed there weren’t standard off-the-shelf packages to use. This was a simple ISAM-based package that was fast and easy to install and configure. Its limitations were that:
- It didn’t support database transactions so if something failed the database was left corrupted.
- It used record locking as its multi-user model, so once too many people were doing things, too many records would be locked and things would grind to a halt.
- It didn’t have a server version, so it relied on file system locks, which were hard to support across all the different network operating systems.
- It didn’t support SQL, and hence didn’t fully support ODBC or any other standard interface mechanism, making it hard to use external tools for things like reporting.
So when Accpac for Windows was developed, we made the decision to use standard off-the-shelf databases. We decided to base our new database support on transactions and to define a set of requirements that a database would need to support to work with Accpac.
CA-Accpac/2000 originally shipped supporting Btrieve 6.15, and then with version 2.5A we added support for Microsoft SQL Server 6. Later we added support for IBM DB2 and Oracle Databases. We can easily move Accpac data from one database to another via our Database Dump and Load utilities.
Along the way we’ve evolved our API and easily moved from database version to database version. Many third party tools integrate with Accpac through open database interfaces. We currently have a very robust and scalable solution.
Accpac Database API
Accpac is a three-tier application with the business logic separated from the database. Our business logic objects (the real core of our Accounting Applications) communicate with the database via a common database API. Then we have database drivers that translate this API into calls to the database server currently being used. How the database driver accomplishes this is entirely up to the driver. Our SQL Server driver uses ODBC to communicate with the SQL Server Native driver. Our Oracle driver uses the Oracle ODBC driver. For Pervasive we communicate with both the relational layer using SQL calls and with the transactional layer using direct Pervasive API calls. The key point is that all this detail is invisible to the business logic. The application programmer creating our business logic objects just needs to know our Accpac database API to do his job. All the differences between the databases are isolated in the Accpac database driver. If a new version of a database server is released and Accpac needs to be updated to support it, then it’s just a matter of providing a new database driver, without requiring any other application changes.
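To make the isolation idea concrete, here is a minimal sketch of the pattern in Python. The interface, method names and the `ARCUS` table name are invented for illustration; the real Accpac API is different, but the principle is the same: business logic depends only on an abstract driver, and each database gets its own concrete implementation.

```python
from abc import ABC, abstractmethod

# Hypothetical driver interface; names are invented for this sketch.
class DatabaseDriver(ABC):
    @abstractmethod
    def read(self, table, key):
        """Read one record by key; return a dict or None."""

    @abstractmethod
    def update(self, table, key, values):
        """Update one record identified by key."""

class SqlServerDriver(DatabaseDriver):
    def read(self, table, key):
        # A real driver would run this through ODBC; here we just
        # record the SQL it would issue and return no row.
        where = " AND ".join(f"{k} = ?" for k in key)
        self.last_sql = f"SELECT * FROM {table} WHERE {where}"
        return None

    def update(self, table, key, values):
        pass  # stubbed for the sketch

# Business logic sees only the abstract interface, never the database.
def fetch_customer(driver, number):
    return driver.read("ARCUS", {"IDCUST": number})
```

Swapping in a driver for another database requires no change to `fetch_customer` or any other business logic.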
The Accpac database API includes a number of record-oriented API calls, a number of meta-data calls and a number of set-based operations. The record-based APIs operate on one record at a time: for instance, read a record based on a supplied key, or update that record. The meta-data calls return information about the structure of the database: get a list of all the tables in the database, or get information on all the fields in a table. The set-oriented API calls operate on a set of records at a time: read all records where the description starts with A, update all records changing any with customer number 1200 to 1400, or delete all records with status “deleted”.
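The three call styles can be sketched with a toy in-memory table. All method names here are invented; the point is only to contrast record-at-a-time, metadata and set-oriented operations.

```python
# Toy in-memory table illustrating the three API styles described above.
class Table:
    def __init__(self, name, fields):
        self.name = name
        self.fields = fields     # metadata about the table's structure
        self.rows = []           # list of dicts, one per record

    # Record-oriented: operate on one record at a time.
    def read(self, key_field, key_value):
        return next((r for r in self.rows if r[key_field] == key_value), None)

    # Metadata: describe the table rather than its contents.
    def field_names(self):
        return list(self.fields)

    # Set-oriented: act on every record matching a condition.
    def update_where(self, predicate, changes):
        count = 0
        for r in self.rows:
            if predicate(r):
                r.update(changes)
                count += 1
        return count

    def delete_where(self, predicate):
        before = len(self.rows)
        self.rows = [r for r in self.rows if not predicate(r)]
        return before - len(self.rows)
```

For example, `update_where(lambda r: r["IDCUST"] == "1200", {"IDCUST": "1400"})` renumbers a customer in one set operation, while `read` touches a single record.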
Accpac Architecture Diagram
Reporting works a bit differently. We use Crystal Reports as our reporting engine, and Crystal provides the mechanisms to have database-independent reports. From Accpac the same Crystal report is used for Pervasive.SQL, Oracle and SQL Server. We just point Crystal at the correct database, provide the necessary connection information and let Crystal do the rest. We provide an Accpac reporting API which handles the details of talking to Crystal. Originally this API talked to CA-RET, the Computer Associates reporting tool; then it was converted to talk to Crystal’s crpe32.dll interface, and later to Crystal’s COM interface. With Accpac 6.0A we are upgrading to Crystal Reports 2008 and will use the .Net interface for the VB UIs and the Java interface when running Crystal from the Web. Even though the way we do reporting has changed quite a bit over the years, this has been largely invisible to the applications, which just talk to Accpac System Manager’s API and let System Manager take care of the details.
The Accpac Financial Reporter accesses the database through the Accpac Business Logic and hence the regular Accpac Database API. Accpac Intelligence and Accpac Insight work in a similar manner to Crystal and provide their own database independent mechanisms to access the various databases. Most other reporting solutions also work this way.
Accpac does have a number of minimum requirements for any database it supports. These include things like the number of fields per table, the number of keys supported, record length and so on. One key requirement is that the database supports database transactions. As long as a database meets these minimum requirements, it is possible to write a database driver for Accpac to use that database. Note that supporting SQL is not a requirement.
At the core of Accpac’s database support are database transactions. Any changes made to the database have to be done inside a transaction. A transaction is a group of database operations that are either all applied to the database, or none are applied to the database. So the application never has to worry about half of them being committed to the database. A key differentiator of database servers is how well they handle transactions. The application rule for Accpac is that when the database is changed, operations must be grouped together into a transaction so that the database always has full application integrity. This includes things like the General Ledger never being out of balance. When G/L entries are posted, each entry is a transaction and the database goes from one in balance state to another. This means that if something crashes on either the server or workstation, or an unexpected error occurs then the database will not end up in a corrupt state. Database integrity is always maintained via database transactions.
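As a minimal illustration of the G/L example above (using SQLite for brevity; Accpac itself runs on Pervasive.SQL, SQL Server or Oracle, and the table name is invented), a posting grouped into one transaction either commits whole or rolls back whole, so the ledger can never be left out of balance:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE glamf (account TEXT, amount REAL)")

def post_entry(lines):
    """lines: list of (account, amount); debits positive, credits negative."""
    with conn:  # commits on success, rolls back on any exception
        total = 0.0
        for account, amount in lines:
            conn.execute("INSERT INTO glamf VALUES (?, ?)", (account, amount))
            total += amount
        if abs(total) > 1e-9:
            # Raising inside the transaction undoes every insert above,
            # so an unbalanced entry never reaches the ledger.
            raise ValueError("entry does not balance")

# A balanced entry moves the ledger from one in-balance state to another.
post_entry([("1000-Cash", 500.0), ("4000-Sales", -500.0)])
```

If the process crashed partway through the inserts, the database server would roll the open transaction back in exactly the same way, preserving application integrity.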
Our multi-user model then builds on the transactioning foundation. Within Accpac we use optimistic concurrency (http://en.wikipedia.org/wiki/Optimistic_concurrency_control) as our multi-user model. Suppose two users go to edit the same record: both bring the record up on their screens at the same time, make changes to various fields, and then both hit Save. Both users started with the same record and made independent changes, and both want to save, so what should the system do? In the most basic optimistic concurrency model, the first person to save wins: their changes are committed to the database and the second user gets the error message “Record was modified by another user”. With optimistic concurrency we can be a little more clever than this: for the second user, we could re-read the record, see what the first user changed, and if they are unrelated fields then combine our changes and re-save the record. We do this in a number of cases, but it requires a bit of application support to define what “unrelated fields” means.

The alternative to optimistic concurrency is to lock the record when the first user reads it, and not let the second user read the record at all. The second user would get a message saying that someone is editing the record and they can’t access it. This saves the second user from possibly losing some work. However it imposes a huge multi-user penalty, since if a lot of records are locked, it is highly likely people are going to trip over the locked records all over the place and be blocked from doing work. They won’t even be able to load a record just to view it. Accpac Plus used this model of multi-user processing, and it restricted Accpac Plus to pretty much 25 users, since after that the number of locks would just cause too many problems. Most systems with high numbers of users use optimistic concurrency.
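The version-stamp flavour of optimistic concurrency can be sketched as follows. The store, field names and merge rule here are invented for illustration; the "merge unrelated fields" behaviour is the clever case described above.

```python
# Sketch of optimistic concurrency using a version stamp (invented schema).
class ConcurrentUpdateError(Exception):
    pass

store = {}  # key -> record dict, including a "version" counter

def read(key):
    return dict(store[key])        # each user edits a private snapshot

def save(key, snapshot, changes):
    current = store[key]
    if current["version"] != snapshot["version"]:
        # Someone saved first. Find which fields they changed; if our
        # changes don't overlap theirs, the two edits can be merged.
        theirs = {f for f in current
                  if f != "version" and current[f] != snapshot[f]}
        if theirs & set(changes):
            raise ConcurrentUpdateError("Record was modified by another user")
    current.update(changes)
    current["version"] += 1
```

Two users who edit unrelated fields both succeed; a second save that touches a field the first user already changed gets the "modified by another user" error. Note that nothing is locked while users are editing.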
So what does optimistic concurrency have to do with database transactions? Most document entry in Accpac involves updating and inserting into many different database tables. What happens if you are saving an O/E order, you have updated 5 tables, and then on the sixth you get an error? You can either re-read the record and reconcile the two sets of changes, or you have to abort saving the order. If you have to abort, you need a transaction rollback to undo the changes to the first 5 tables. Fortunately with database transactions this is easy, and you never have to worry about corrupting the database if you encounter an error (multi-user or otherwise). So for complicated header/detail entry, you can only really do optimistic concurrency if you are built on a transactioning model.
Stored Procedures and Database Scale-out
A common question is why we don’t use stored procedures. Besides the problems with portability across databases and the attendant lock-in to one solution, there are definite problems with scalability. Since Accpac is aimed at the mid-market, like most such products we don’t support arbitrary database scale-out. This means that if you have trouble with the performance of the SQL Server, you can’t just add another SQL Server to solve the bottleneck. We do of course support many configurations with multiple database servers, but there are restrictions. However you can have as many application servers as you like. If you run workstation-setup-style client/server, then every workstation is an application server and takes a share in running the business logic. If you deploy via Terminal Server or Citrix, then you can add as many of these as you like (either dedicating users to specific servers or adding a load balancer). So we want to put as little load on the database server as possible, to allow customers to use one database server for as long as possible. If we ran stored procedures on the database server, this would put considerable load on the server and make the database server become a bottleneck quite quickly. People often point out that they are getting slow but SQL Server is only, say, 15% utilized, and we say add another application server. They ask why not shift more work to the database server. The problem is that once the database server is full, you’re done. Adding application servers is easy.
We support a couple of configurations with multiple database servers. One is to split a big company into multiple divisional companies within Accpac, then run each divisional company on a separate database server. You then combine the data for financial reporting using G/L Consolidations. Another configuration is to add a second database server that is a replicated copy of the primary. If you have a number of users that only do reporting, then you point them at the replicated server (and you can have as many of these as you like). However you can only have one primary server where database updates occur. But reporting is often quite a load on the database server, and moving it to another server can really help.
True database scale-out means you can add database servers at will, but this requires the application to be structured quite differently. For instance, for Order Entry you might have one server designated the primary; it would contain a small order header table with the order number and a link to the database that actually contains the order. As people perform Order Entry, each order only requires one simple insert on the primary server, and the actual order is placed on one of the other servers in round-robin fashion. This allows scale-out of the Order Entry process, since you can add as many of these slave database servers as you like. The complexity comes in for reporting, since to generate a report on a range of orders you now have to consolidate the data from a number of database servers.
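The primary-header-plus-shards scheme just described can be sketched in a few lines. The class and its dictionaries stand in for real database servers; all names are invented.

```python
import itertools

# Sketch of the scale-out scheme above: a small header row on the
# primary records which shard holds each full order.
class OrderStore:
    def __init__(self, shard_names):
        self.header = {}                         # primary: order_no -> shard
        self.shards = {s: {} for s in shard_names}
        self._next = itertools.cycle(shard_names)

    def insert_order(self, order_no, order):
        shard = next(self._next)                 # round-robin placement
        self.header[order_no] = shard            # one cheap insert on primary
        self.shards[shard][order_no] = order     # full order on the shard

    def get_order(self, order_no):
        # Single-order lookup is cheap: one header read, one shard read.
        return self.shards[self.header[order_no]][order_no]

    def orders_in_range(self, lo, hi):
        # Reporting must gather from every shard - this is the added
        # complexity the scheme trades for write scalability.
        return sorted(no for shard in self.shards.values()
                      for no in shard if lo <= no <= hi)
```

Adding a shard just extends the round-robin cycle; the primary only ever sees one tiny insert per order.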
Originally our next database target was going to be MySQL. One reason was to reduce the TCO for our customers, as Sage has a very good deal for distributing MySQL with our applications. Another reason is to implement SaaS down the road, where, to be economical, we have to ensure that we don’t have to pay all the subscription revenue to other software companies. However since Oracle acquired MySQL (through its purchase of Sun), it’s become very questionable whether MySQL will maintain its TCO advantages or whether Oracle will invest in it sufficiently to keep it competitive. So right now we are taking a wait-and-see approach to MySQL.
Another approach that is becoming popular is the so-called “NoSQL” database (http://en.wikipedia.org/wiki/NoSQL), where NoSQL stands for “Not only SQL”. These databases are all about database scale-out, but since they are designed for large-scale scalability from the start, they tend to be very different from relational (SQL) databases. Often they load the entire database into memory and naturally partition the database over multiple servers in a redundant manner. Often these databases remove the “hard” parts of SQL to make data access faster and more scalable. One example is Google Fusion Tables (http://googleresearch.blogspot.com/2009/06/google-fusion-tables.html).
Along these lines the Apache foundation has been developing the Hadoop platform (http://hadoop.apache.org/) along with the HBase database (http://hbase.apache.org/). This is a very ambitious project to provide the entire infrastructure necessary to build fully scalable large scale web based applications like a Facebook or Twitter.
Some of the ideas from the Web have a place in Accounting Applications, for instance quick, natural-language, Google-like searches. If the database is partitioned over 5 servers, then these servers can each produce their part of the results in parallel before the parts are combined and returned. The key point is the parallelism employed to answer possibly difficult queries quickly.
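The scatter-gather pattern behind such searches looks roughly like this. The partitions and their contents are invented, and threads stand in for separate database servers.

```python
from concurrent.futures import ThreadPoolExecutor

# Each list stands in for one partition of the data on its own server.
partitions = [
    ["Apex Ltd", "Borland"],
    ["Acme Corp", "Zenith"],
    ["Nadir Inc", "Atlas"],
]

def search(term):
    """Scatter the query to every partition in parallel, then gather."""
    def search_partition(rows):
        return [r for r in rows if r.lower().startswith(term.lower())]

    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        results = pool.map(search_partition, partitions)
    # Gather: merge the per-partition result lists into one answer.
    return sorted(sum(results, []))
```

Because each partition scans only its own slice, the wall-clock time is roughly that of the slowest partition rather than the sum of all of them.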
Some ideas are perhaps not going to be so popular. For instance when you refresh Facebook you often get different results each time you do it. Facebook sends out the query to a bunch of servers and then combines all the results returned in a short fixed amount of time and displays your page, but any results coming late from a slow server are ignored. Will accounting users appreciate getting most of the results quicker rather than all the results slower?
Database technology is the foundation for Accounting Applications like Sage ERP Accpac. Most Accounting Applications started life using flat-file ISAM packages or home-grown solutions, then at some point made the jump to standard SQL databases. Certainly we’ve gotten a lot of mileage out of SQL, but as the amount of information in data warehouses and other corporate databases grows, and as we join to data out on the internet, are we at the point where we are going to see the start of a transition to the next big thing in databases?