Stephen Smith's Blog

Musings on Machine Learning…

Posts Tagged ‘database

NoSQL for ERP?

with 11 comments


NoSQL databases are a class of database management systems that grew up with the Internet. Generally these refer to any database system that doesn’t use SQL as its query language (though some call them Not only SQL, meaning they support SQL and something else as well). This would make them any database system other than the standard databases. However these are usually associated with a class of database system that is intended to server large Internet companies. In this article we are only considering the large distributed databases and not the specialized in-memory databases. The goal of these is to be highly distributed and highly redundant. As a database scales you would be able to scale usage by just adding servers and have them add to the capacity of the system in a linear manner. You would like to have your database distributed all over the world, so it can be accessed from anywhere with a very low latency.

The main difference between traditional SQL databases and NoSQL databases comes down to the CAP Theorem which says out of the three things:

  • Consistency – roughly meaning that all clients of a data store get responses to requests that ‘make sense’. For example, if Client A writes 1 then 2 to location X, Client B cannot read 2 followed by 1.
  • Availability – all operations on a data store eventually return successfully. We say that a data store is ‘available’ for, e.g. write operations.
  • Partition tolerance – if the network stops delivering messages between two sets of servers, will the system continue to work correctly?

You have to choose two, you can’t have all three.

Then SQL databases choose consistency and partition tolerance and NoSQL databases choose availability and partition tolerance. The CAP Theorem is shown graphically below:

Generally then there has been a proliferation of NoSQL databases that all make a slightly different tradeoffs for the CAP theorem. If they can’t have all three does it make sense to have 2/3 of each perhaps? Generally there has been some great research that has produced amazing results, but this has led to a bit of a tower of Babel of database access languages.

Search Example

The classic example is Google’s BigData database used to power the Google search engine. How is it that when you type a search into Google, you immediately get back a page of search results instantly? How is it translating what you typed into some sort of database query and how is that query operating so quickly? How is Google handling millions of these queries coming from all over the world so quickly? What is the database schema that allows this magic?

The structure of the database is a really simple key-value store. The key is every possible search term that anyone could type in. The value is the table of web sites (and descriptions) to return for this search term. The database is called BigData because this value consisting of all the search results could have millions of hits for you to scroll through. There is no limit to the size of the value and there is no limit to the number of keys. Even so an indexed binary tree read of such a table would still be quite slow and this is Google’s big advancement with an algorithm called MapReduce that lets these simple key reads happen so quickly. The BigTable database spreads this huge table over many servers to operate and it has redundancy of having the database in many places at once. The MapReduce algorithm is a parallel algorithm in that parts of the search can be done by multiple servers in parallel.

Sound wonderful? Google published the MapReduce algorithm and there is an open source implementation of it called Apache HBase, so why doesn’t everyone use it? All the NoSQL databases provide a huge optimization on one of the things a SQL database does, but at a cost of making something else very slow, or eliminating something else entirely. In the case of BigTable, inserting and updating records in this table is very slow. But for Google search this is ok. Google operates a giant number of “web spiders” that are always out there traversing the web looking for new sites and updating the Google database of search results. These are running all the time and updating all the servers all the time. But these updates and inserts might take hours or days to happen, especially if you count the amount of time to find their way over all the thousands of servers involved. But Google doesn’t mind as long as it happens and is eventually consistent.

ERP Database Usage

Generally ERP packages rely very heavily on the tenants of a SQL type relational database. They require database transactioning and consistency (often referred to as ACID). ERP packages require general SQL type queries. They need good performance writing transactions to the database as well as reading and reporting from the database. I blogged a bit about Sage 300 ERP’s database usage here.

The NoSQL camp would contest these statements. For instance if I insert an Order, is it necessary that if I immediately list the Orders that I see it, or is it ok if it shows up in that list, say ten minutes later? Or do ERP packages believe they need all the benefits of SQL without weighing them against the other benefits of alternative systems? Often customers will prefer speed over customizability or functionality.

I like the way in SQL you can hit post and get back a response that a transaction is either all committed or all rolled back. However from working with SQL, I also know this is a huge cost, especially in our ability to scale out the database and is a definite limit on multi-user contention. For a large scale system, I’m really questioning the locking requirements that large transactions like entering an order place on the system.  Remember the “availability” part of NoSQL databases will ensure the Order is entered eventually.

Another problem is adhoc SQL queries. ERP packages tend to like to do reporting and data inquiries using adhoc SQL queries. Usually this is a no no in the NoSQL world. Often in the NoSQL world, all queries must be designed ahead of time using very complicated MapReduce formula. The question is whether this is really necessary. If you have a really powerful Google like search ability built into the database, then do you really need to search the data with SQL queries? Is it that we just do this because we know how and that in fact real users would much rather use the Google method than the SQL method?

Hybrid Approach

Suppose you want the data query ability of a NoSQL database, but still want the SQL transaction integrity and consistency? One approach has been hybrid systems. Think of these as two databases in one where both a SQL and NoSQL database engine operate on your data. Perhaps a cleaner solution is to take a data warehousing approach and create your data in a SQL database, but then archive it to a NoSQL database for various querying and analysis. This is the foundation of many cloud based BI solutions that are out there. With this approach you get some of the benefits of both. What you lack is the high availability and scalability of the database for data entry and regular SQL type queries. But still this can be a way to get a bit of the best of both worlds.


I think it will be a while longer before NoSQL databases gain acceptance in the ERP world. However the amount of innovation going on in the NoSQL world is huge and great strides are being taken. I suspect we will start to see NoSQL begin to be used for non-traditional BI via data warehouses. Even adding Google like search to ERP usually involves adding a NoSQL database on the side. I think once this starts to happen then we will see more and more functionality move to the NoSQL side of things and then perhaps down the road somewhere we will be able to remove the SQL database. But the jury is still our as to whether this is a good long term goal.

Written by smist08

January 28, 2012 at 5:20 pm

Accpac and SQL Server

with 6 comments


I blogged about Accpac and its databases here.  This was a general article about how Accpac works with the various databases it supports. This article is more specific to Microsoft SQL Server and covers various topics that affect Accpac using SQL Server for its databases. I’ll be specifically talking about SQL Server 2008, but most of what I say applies equally to SQL Server 2005. Generally SQL Server is quite a reliable and low maintenance database server that doesn’t require much in the way of configuration and administration. Accpac uses SQL Server in a fairly straight forward manner issuing standard SQL statements without using SQL Server specific modules. This means Accpac runs on all editions of SQL Server, since we don’t use features only available in higher editions, and it means we can work with most of SQL Server’s built in services and configurations. We store the database in an open manner, following standard practices, which allows you to access your Accpac data using standard SQL aware tools including most reporting solutions. I also blogged on business continuity here, which is related to the topics discussed in this article.

Setup – Collation

One of the first decisions you need to make is a database collation sequence. This defines the character set used in the database as well as the sort order for the database. Generally choose the one that is best for your location and the language you use. However there are a couple of notes to consider:

Most western character set names either end in _BIN or _CI_AS such as Latin1_General_BIN or SQL_Latin1_General_CP1_CI_AS. The _BIN means that this is a binary sort order where strings will be sorted using a computer binary order, the upshot is that A-Z are before a-z and things like accented characters will be at the end. The CI_AS ones are case insensitive and will be sorted in dictionary order meaning AaBb… and accented characters are inserted in with their base letter. A few consequences of this for Accpac are:

  1. If you are migrating from Pervasive.SQL, then since Pervasive uses a binary sort order, if you choose a _BIN sort order, there shouldn’t be any problems. If you choose a _CI_ one in SQL Server then since this is case insensitive, you will get record already exists errors if you load a database that has key fields that only vary by case. Typically this happens in unit of measures ilke Ea, EA, ea. If you want to make this conversion you will need to fix up your Pervasive database before DB-Dumping.
  2. If you chose a _CI_ sort order then since strings are case insensitive, when you do searches, say from the finder, then it doesn’t matter the case you enter, so searching for steve would return Steve, STEVE or steve. This is the main reason people want to use this sort order.
  3. If you want to change the sort order, you will need to DBDump create a new database in the new sort order and DBLoad, however you could get problems with the keys and need to make some manual adjustments based on the errors returned.

Setup – Configuration

At a minimum, you want to have TCP/IP and Shared Memory enabled as protocols. TCP/IP is required for the new Sage ERP Accpac 6.0A Portal, and is the network protocol that works best for the rest of Accpac. Shared Memory is a very high performance protocol that will be used if the program using SQL Server is running on the same computer as the SQL Server service, basically bypassing the whole network stack. The main gotcha here is SQL Server express which will have TCP/IP disabled and will further be configured to not receive external requests (connections tab of the server properties in the management studio). This is configured from the SQL Server Configuration Manager:

If you right click on the TCP/IP protocol, choose properties and then select the second tab, you can see the network port being used by SQL Server.

Database Setup

If you are running multiple instances of SQL Server, then you can enter the servername\instancename in Database Setup, however you can’t do this in the portal configuration. In the portal configuration you must you the server name and the correct port number. Each instance of SQL Server will be using a unique port number.

When Accpac first supported SQL Server we only supported access by a pre-setup ODBC data source. This led to problems with people setting these up incorrectly, since there are quite a few settings that were possible. Then later on we allowed you to connect by just specifying the server name (or servername\instancename) in database setup. Then we would correctly setup an ODBC data source for you (usually servername32).  This then worked far more reliably. We left the ODBC option there so people already setup this way wouldn’t be affected, or if there was some reason that special configuration values were required, you could still set them. I don’t know of any such special configuration values, not do I know of any good reason to use an ODBC data source rather than the server name. So I would strongly recommend you always use the server name in Database Setup to configure your SQL Server companies.

Database Dump and Load

DBDump and DBLoad are two utilities that come with Accpac that allow you to easily dump databases into a database independent format and reload them. These allow you to easily move Accpac companies between locations or between different database servers. I blogged on these previously here. These are very handy utilities and have some interesting side effects like re-indexing and compacting databases (since they completely re-create and reload the database). However they aren’t as fast as the built in SQL Server backup and restore facilities.

64 Bit SQL Server

Although Sage ERP Accpac is a 32-bit application, it works just fine with the 64-bit version of SQL Server. You cannot directly load 32-bit DLLs into a 64-bit application, but this isn’t how SQL Server works. Accpac (or any other program) is never loaded into SQL Server’s process space. All communications is between two processes using a standard well defined protocol, so it doesn’t matter whether either is 32 or 64 bits. Generally you want to run the 64-version of SQL Server on a 64-Bit version of Windows Server 2008. This way you can get better performance by adding more memory (the 32-Bit versions don’t really use memory beyond 4gig).

One caveat is that the ODBC data sources used by Accpac are 32 bit ones and stored in the 32-bit part of the registry. To see them you have to use the 32-bit version of odbcad32.exe which you need to run from the strange location of: C:\Windows\SysWOW64\ODBCAD32.exe.

Network Bandwidth

If you are running in a standard workstation setup client/server configuration then each workstation communicates with SQL Server over the network. For best performance you want as good network connectivity as you can get, but generally a 100 Megabit connection is fine.

If you are running using Citrix or Terminal Server then if the SQL Server is also on the Terminal/Citrix server then it can use shared memory access which provides good performance. However this is tempered by how much memory and processor resources are required by the Accpac Business Logic, UIs and SQL Server. Generally once you get past a few users then it works better to separate the SQL Server over to a separate machine with a high speed connection to the Citrix/Terminal server, ideally having them both on a gigabit backbone. As we move to the Web, a Web Server will replace the Citrix/Terminal server and then the same advice would apply for the Web Server.

Another recommendation I’ve heard is to ensure you only use switches and not hubs and to beware that if you are using VOIP that you aren’t saturating your network bandwidth with phone traffic.

Disk Storage

I know a lot of people that swear by various DAS (Direct Attached Storage), NAS (Network Attached Storage) and SAN (Storage Area Network) solutions. But my own experience has been a bit negative with these. I find them very useful for storing large amounts of files, like virtual images, photos and such. But I find them un-reliable and often with performance problems when using them to store databases being directly accessed by a database server.  There are also now cloud services to have your database in the cloud and your application server on premise, but I tend to be skeptical of the performance and uptime of such services. However these all make great backup devices.

My own preference is to use RAID storage on the server. RAID 0 is the fastest, but doesn’t have redundancy (and so never recommend this). Then the other RAID levels offer various levels of redundancy, so if a drive fails, it can be replaced without losing any data. Have a look at the link: The decision of which to go with depends a bit on how much redundancy you want to how much you are willing to pay. RAID 5 is very redundant, but tends to suffer a bit in performance.

A recommendation we often give is to locate your LDF files to RAID 1 storage and your MDF and other IDX files to RAID 1+0 storage.

Even with the reliability of RAID, make sure you still backup your company databases whether mirroring to offsite servers, backing up to rotating external hard drives, DBDumping, etc. Also remember it’s a good idea to test that you can restore your backups. Remember that the value of your company database far exceeds the value of the equipment it is stored on.

Disk Mirroring

SQL Server supports a number of database mirroring options; meaning that you can have your SQL Server stream database changes to a mirror SQL Server. Besides acting as a hot standby in case of failure, you can use this as a reporting server. Often reporting through Crystal can put a fairly big strain on the SQL Server, and one way offload this is to configure users that only do read-only access via reporting to connect to the mirror server. This way a department that runs many reports won’t impact the main data entry that is going on.


These are just a few tips and tricks for dealing with Accpac and SQL Server. Generally as your database becomes larger and the number of users increases, so does the amount of care you need to take of your database and network infrastructure; but, for smaller installations generally you just need to ensure you take regular backups. Of course if you don’t want to deal with database administration at all, then you might want to consider using a cloud service like AccpacOnline.

Written by smist08

August 13, 2011 at 5:44 pm

Posted in sage 300

Tagged with , ,

Accpac and It’s Databases

with 15 comments

Sage ERP Accpac supports running using either Pervasive.SQL (, Microsoft SQL Server ( or Oracle ( Database Servers. This blog posting is going to talk a bit about how Accpac Accounting applications access the database, give some rationale for the way we do things and look at where we may go in the future.

Accpac Plus, the original character based product for first CP/M ( and then MS-DOS ( used a proprietary built-in database. When this product was developed there weren’t standard off the shelf packages to use. This was a simple ISAM based package that was fast and easy to install and configure. The limitations were that:

  • It didn’t support database transactions so if something failed the database was left corrupted.
  • It used record locking as its multi-user model, so once too many people were doing things, too many records would be locked and things would grind to a halt.
  • It didn’t have a server version, so relied on file system locks which were hard to support for all the different network operating system.
  • It didn’t support SQL and hence didn’t fully support ODBC or any other standard interface mechanism making it hard to use external tools like for reporting.

So when Accpac for Windows was developed, we made the decision to use standard off-the-shelf databases. We decided to base our new database support on transactions and to define a set of requirements that a database would need to support to work with Accpac.

CA-Accpac/2000 originally shipped supporting Btrieve 6.15, and then with version 2.5A we added support for Microsoft SQL Server 6. Later we added support for IBM DB2 and Oracle Databases. We can easily move Accpac data from one database to another via our Database Dump and Load utilities.

Along the way we’ve evolved our API and easily moved from database version to database version. Many third party tools integrate with Accpac through open database interfaces. We currently have a very robust and scalable solution.

Accpac Database API

Accpac is a three tier application with the business logic separated from the database. Our business logic objects (the real core of our Accounting Applications) communicate with the database via a common database API. Then we have database drivers that translate this API into calls to the database server currently being used. How the database driver accomplishes this is entirely up to the driver. Our SQL Server driver uses ODBC to communicate with the SQL Sever Native driver. Oracle uses the Oracle ODBC driver. For Pervasive we communicate with both the relational layer using SQL calls, and with the transactional layer making direct Pervasive API calls. The key point is that all this detail is invisible to the Business Logic. The application programmer creating our business logic objects, just needs to know our Accpac database API to do his job. Then all the differences in the databases are isolated to the Accpac database driver. If a new version of a database server is released and Accpac needs to be updated to support it, then its just a matter of providing a new database driver, without requiring any other application changes.

The Accpac database API includes a number of record oriented API calls, a number of meta-data calls and a number of set based operations. The record based APIs operate on one record at a time, such as read a record based on a supplied key, update that record. The meta-data calls return information about the structure of the database, things like get a list of all the tables in the database, get information on all the fields in a table. The set oriented API calls operate on a set of records at a time. Such as read all records where the description starts with A, or update all records changing any with customer number 1200 to 1400, or delete all records with status “deleted”.

Accpac Architecture Diagram


Reporting works a bit differently. We use Crystal Reports as our reporting engine and then Crystal provides the mechanisms to have database independent reports. From Accpac the same Crystal Report is used for Pervasive.SQL, Oracle and SQL Server. We just point Crystal at the correct database and provide the necessary connection information and then let Crystal do the rest. We provide an Accpac reporting API which then handles the details of talking to Crystal. Originally this API was used to talk to CA-RET the Computer Associates Reporting tools, then it was converted to talk to Crystal’s crpe32.dll interface, later to Crystal COM’s interface. With Accpac 6.0A we are upgrading to Crystal Reports 2008 and will use the .Net interface for the VB UIs and the Java interface when running Crystal from the Web. Even though the way we do reporting has changed quite a bit over the years, this has been largely invisible to the applications which just talk to Accpac System Manager’s API and then let System Manager take care of the details.

The Accpac Financial Reporter accesses the database through the Accpac Business Logic and hence the regular Accpac Database API. Accpac Intelligence and Accpac Insight work in a similar manner to Crystal and provide their own database independent mechanisms to access the various databases. Most other reporting solutions also work this way.

Database Requirements

Accpac does have a number of minimum requirements for any database it supports. This includes things like number of fields per table, number of keys supported, record length and such. One key requirement is that the database supports database transactions. As long as a database meets these minimum requirements then it is possible to write a database driver for Accpac to use that database. Note that supporting SQL is not a requirement.

Database Transactions

At the core of Accpac’s database support are database transactions. Any changes made to the database have to be done inside a transaction. A transaction is a group of database operations that are either all applied to the database, or none are applied to the database. So the application never has to worry about half of them being committed to the database. A key differentiator of database servers is how well they handle transactions. The application rule for Accpac is that when the database is changed, operations must be grouped together into a transaction so that the database always has full application integrity. This includes things like the General Ledger never being out of balance. When G/L entries are posted, each entry is a transaction and the database goes from one in balance state to another. This means that if something crashes on either the server or workstation, or an unexpected error occurs then the database will not end up in a corrupt state. Database integrity is always maintained via database transactions.

Multi-User Model

Then our multi-user model builds on the transactioning foundation. Within Accpac we use optimistic concurrency ( as our multi-user model. If two users go to edit the same record, both users bring the record up on their screen at the same time, then they make some changes to various fields, then both hit save. Now both users started with the same record and made independent changes and both want to save, what should the system do? In the most basic optimistic concurrency model, the first person to save wins, their changes are committed to the database and the second user gets an error message the “Record was modified by another user”. With optimistic concurrency we can be a little more clever than this, and for the second user, we could re-read the record and see what the first user changed and if they are unrelated fields then combine our changes and re-save the record. We do this in a number of cases, but it requires a bit of application support to define what “un-related fields” means. The alternative to optimistic concurrency is to lock the record when the first user reads it, and not let the second user read the record at all. The second user would get a message saying that someone is editing the record and they can’t access it. This saves the second user possibly losing some work. However it imposes a huge multiuser penalty since if a lot of records are locked, then it is highly likely people are going to trip over the locked records all over the place and be blocked from doing work. They won’t be able to load the record just to view it. Accpac Plus used this model of multi-user processing and it restricted Accpac Plus to pretty much only having 25 users, since after that, the number of locks would just cause too many problems. Most systems with high numbers of users use optimistic concurrency.

So what does optimistic concurrency have to do with database transactions? Most document entry in Accpac involves updating and inserting many different database tables. What happens if you are save an O/E Order and you have updated 5 tables and then on the sixth you get an error? You can either re-read the record and reconcile the two sets of changes or you have to abort saving the order. If you have to abort saving the order, you need to do a transaction rollback to undo the changes to the first 5 tables. Fortunately with database transactions this is easy, and you never have to worry about corrupting the database if you encounter an error (multiuser or otherwise). So for complicated header detail entry, you can only really do optimistic concurrency if you are based on a transactioning model.

Stored Procedures and Database Scale-out

A common question is why we don’t use stored procedures? Besides the problems with portability across databases and the attendant lock-in to one solution, there are definite problems with scalability. Since Accpac is aimed at the mid-market, like most such products we don’t support arbitrary database scale-out. This means if you have trouble with the performance of the SQL Server, you just can’t add another SQL Server to solve the bottleneck. We of course do support many configurations with multiple database servers, but there are restrictions. However you can have as many application servers as you like. If you run workstation setup type client server then every workstation is an application server and takes a share in running the business logic. If you deploy via Terminal Server or Citrix, they you can add as many of these as you like (either dedicating users to specific servers or adding a load balancer). So we want to put as little load on the database server as possible, to allow users to use one database server for as long as possible. If we ran stored procedures on the database server then this puts considerable load on the server and makes the database server become a bottleneck quite quickly. People often point out that they are getting slow, but SQL Server is only say 15% utilized, and we say add another applications server. They say why not shift more work to the database server. The problem being that once the database server is full, you’re done. Adding application servers is easy.

We support a couple of configurations with multiple database servers. One is to split a big company into multiple divisional companies within Accpac then run each divisional company on a separate database server. You then combine the data for financial reporting using G/L Consolidations. Another configuration is to add a second database server that is a replicated copy of the primary. Then if you have a number of users that only do reporting then you point them to the replicated server (and you can have as many of these as you like). However you can only have one primary server where database updates occur. But often reporting is quite a load on the database server and moving this to another server can really help.

True database scale-out means you can add database servers at will, but this requires the application be structured quite differently. For instance for Order Entry you might have one server designated the primary, it would contain a small Order header table with the order number and a link to the database that actually contains the order. So as people perform Order Entry, each order only requires one simple insert on the primary server and the actual order is placed on one of the other servers in a round robin fashion. This then allows scale-out of the Order Entry process, since you can add as many of these slave database servers as you like. The complexity comes in for reporting, since now to generate a report on a range of orders, you have to consolidate the data from a number of database servers, greatly adding to the complexity.

The Future

Originally our next database target was going to be MySQL. One reason being to reduce the TCO for our customers, as Sage as a very good deal for distributing MySQL with our applications. Another reason is to implement SaaS down the road, where to be economical, we have to ensure that we don’t have to pay all the subscription revenue to other software companies. However since Oracle bought MySQL, it’s become very questionable whether MySQL will maintain its TCO advantages or whether Oracle will invest in it sufficiently to keep it competitive. So right now we are taking a wait and see approach to MySQL.

Another approach that is becoming popular are the so-called “NoSQL” databases ( which stands for ‘Not only SQL’. These databases are all about database scale-out, but since they think about large scale scalability from the start, they tend to be very different from relational (SQL) databases. Often they load the entire database into memory, and will naturally partition the database over multiple servers in a redundant manner. Often these databases remove the “hard” parts of SQL to make data access faster and more scalable. One example of these is Google Fusion (

Along these lines the Apache foundation has been developing the Hadoop platform ( along with the HBase database ( This is a very ambitious project to provide the entire infrastructure necessary to build fully scalable large scale web based applications like a Facebook or Twitter.

Some of the ideas from the Web have a place in Accounting Applications. For instance quick natural language Google-like searches. If the database is partitioned over 5 servers then these server can each produce their part of the results in parallel and then they can be combined and returned. The key point being the parallelism employed to return possibly difficult queries quickly.

Some ideas are perhaps not going to be so popular. For instance when you refresh Facebook you often get different results each time you do it. Facebook sends out the query to a bunch of servers and then combines all the results returned in a short fixed amount of time and displays your page, but any results coming late from a slow server are ignored. Will accounting users appreciate getting most of the results quicker rather than all the results slower?

Database technology is the foundation for Accounting Applications like Sage ERP Accpac. Most Accounting Applications started life using flat file ISAM packages or home grown solutions. Then at some point made the jump to standard SQL databases. Certainly we’ve gotten a lot of mileage out of SQL, but as the amount of information in data warehouses and other corporate databases grows, where we join to data out on the internet, are we at the point where we are going to see the start of a transition to the next big thing in databases?

Written by smist08

July 10, 2010 at 4:23 pm