Stephen Smith's Blog

Musings on Machine Learning…


On Calculating Dashboards


Introduction

Most modern business applications have some sort of dashboard that displays a number of KPIs when you first sign in. For instance, here are a couple of KPIs from Sage 300 ERP:

[Screenshot: Sage 300 Portal KPIs]

To be useful, these KPIs can involve quite sophisticated calculations to display relevant information. However, users need their home page to start extremely quickly so they can get on with their work. This article describes various techniques to calculate and present this information quickly, starting with easy, straightforward approaches and progressing to more sophisticated methods that utilize the power of the cloud.

Simple Approach

The simplest way to program such a KPI is to leverage any existing calculations (or business logic) in the application and use that to retrieve the data. In the case of Sage 300 ERP this involves using the business logic Views which we’ve discussed in quite a few blog posts.

This usually gives a quick way to get something working, but often doesn’t exactly match what is required or is a bit slow to display.

Optimized Approach

Last week, we looked a bit at using the Sage 300 ERP .Net API to do a general SQL query, which could be used to optimize calculating a KPI. In this case you can construct a SQL statement that does exactly what you need and tune it nicely in SQL Management Studio. In some cases this will be much faster than the Sage 300 Views; in other cases it won't be, if the business logic already does the same thing.
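As a concrete illustration, here is a minimal sketch (Python with pyodbc) of calculating a KPI with a single hand-tuned SQL statement. The table and column names are invented placeholders rather than the real Sage 300 schema, and the connection string would need to point at your own server.

```python
# Minimal sketch: compute one dashboard KPI with a single direct SQL query.
# Table/column names (ARDocuments, AmountDue, ...) are hypothetical.
import pyodbc

def overdue_receivables_total(conn_str: str) -> float:
    """Return the total amount of overdue A/R documents as one scalar."""
    sql = """
        SELECT COALESCE(SUM(AmountDue), 0)
        FROM ARDocuments               -- hypothetical table name
        WHERE Paid = 0
          AND DueDate < CAST(GETDATE() AS date)
    """
    conn = pyodbc.connect(conn_str)
    try:
        row = conn.cursor().execute(sql).fetchone()
        return float(row[0])
    finally:
        conn.close()

if __name__ == "__main__":
    conn_str = ("DRIVER={ODBC Driver 17 for SQL Server};"
                "SERVER=MYSERVER;DATABASE=SAMLTD;Trusted_Connection=yes")
    print(overdue_receivables_total(conn_str))
```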

Incremental Approach

Often KPIs are just sums or consolidations of lots of data. You could maintain the KPIs as you generate the data: for each new batch posted, the KPI values are stored in another table and incrementally updated. Often KPIs are generated from statistics that are maintained as other operations run. This is a good optimization approach but lacks flexibility, since to customize it you need to change the business logic. Plus, the more data that needs to be updated during posting, the slower the posting process becomes, annoying the person doing the posting.
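Here is a rough sketch of what the incremental idea looks like in code, assuming a hypothetical KPISummary table that is rolled forward inside the same transaction as the posting itself; none of these table names come from the actual Sage 300 business logic.

```python
# Sketch: fold a batch total into a small KPI summary row at posting time,
# instead of recomputing the KPI from scratch later. Names are invented.
import pyodbc

def post_batch_and_update_kpi(conn: pyodbc.Connection, batch_id: int) -> None:
    cur = conn.cursor()
    try:
        # ... the normal posting work for the batch would happen here ...

        # Incrementally roll the batch total into the cached KPI row,
        # inside the same transaction as the posting itself.
        cur.execute("""
            UPDATE KPISummary
            SET Value = Value + (SELECT COALESCE(SUM(Amount), 0)
                                 FROM SalesBatches WHERE BatchId = ?)
            WHERE KpiName = 'TotalSalesYTD'
        """, batch_id)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
```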

Caching

As a next step you could cache the calculated values. If the user has already executed a KPI once today, cache the value, so that if they exit the program and then re-enter it, the KPIs can be quickly drawn by retrieving the values from the cache with no SQL or other calculations required.

For a web application like the Sage 300 Portal, rather than caching the raw data retrieved from the database or the calculated values, usually you would cache the JSON or XML returned from the web service call that asked for the data. So when the web page for the KPI makes a request to the server, the cache just hands back the data to return to the browser, with no formatting, calculation or anything else required.

Often a cache that lasts one day is good enough. There can be a manual refresh button to get it recalculated, but mostly the user just needs to wait for the calculation once a day and then things are instant.
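A minimal sketch of such a once-a-day cache, keyed by user and KPI and holding the already-formatted JSON, might look like the following (purely illustrative, not the portal's actual caching code):

```python
# Sketch of a daily KPI cache: a cache hit returns the stored JSON with no
# SQL and no calculation. All names here are illustrative only.
import json
import time

CACHE_TTL_SECONDS = 24 * 60 * 60            # "good enough for a day"
_cache: dict[tuple[str, str], tuple[float, str]] = {}

def get_kpi_json(user: str, kpi: str, calculate, force_refresh: bool = False) -> str:
    """Return cached JSON for (user, kpi); recalculate only when stale or forced."""
    key = (user, kpi)
    now = time.time()
    if not force_refresh and key in _cache:
        stored_at, payload = _cache[key]
        if now - stored_at < CACHE_TTL_SECONDS:
            return payload                   # instant: no SQL, no formatting
    payload = json.dumps(calculate(user, kpi))   # the slow part
    _cache[key] = (now, payload)
    return payload

# Usage: "calculate" is whatever runs the real SQL or business logic.
print(get_kpi_json("alice", "OverdueAR", lambda u, k: {"value": 12345.67}))
```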

The Cloud

In the cloud, it’s quite easy to create virtual machines to run computations for you. It’s also quite easy to take advantage of various Big Data databases for storing large amounts of data (these are often referred to as NoSQL databases).

Cloud Approach

Cloud applications usually don't calculate things when you ask for them. For instance, when you do a Google search it doesn't really search anything; it looks up your search in a Big Data database, basically doing a database read that returns the HTML to display. The searching is actually done in the background by agents (or spiders) that are always running, searching the web and adding the data to the Big Data database.

In the cloud it's pretty common to have lots of running processes that are just calculating things on the off chance someone will ask for them.

So in the above example there could be a process running at night that checks each user's KPI settings and performs the calculations, putting the data in the cache, so that the user gets the data instantly first thing in the morning and, unless they hit the manual refresh button, never waits for any calculations to be performed.

That helps things quite a bit, but the user still needs to wait for a SQL query or calculation if they change the settings for their KPI or hit the manual refresh button. A sample KPI configuration screen from Sage 300 is:

[Screenshot: Sage 300 KPI configuration screen]

As you can see from this example there are quite a few different configuration options, but in some sense not a truly ridiculous number.

I’ve mentioned “Big Data” a few times in this article, but so far all we’ve talked about is caching a few bits of data, and the number of items being cached won’t be very large. Now suppose we calculate all possible values for this setup screen, use the distributed computing power of the cloud to do the calculations, and then store all the possibilities in a “Big Data” database. This is much larger than what we talked about previously, but we are barely scratching the surface of what these databases are meant to handle.
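To make the idea concrete, here is a toy sketch of enumerating every combination of a hypothetical set of configuration options and storing the results keyed by those options; in a real deployment the dictionary would be a NoSQL store and the loop would be fanned out over many cloud workers. The option lists and the calculation are invented for illustration.

```python
# Sketch: pre-calculate every combination of KPI configuration options and
# store the results keyed by those options, Big-Data style.
from itertools import product

YEARS      = [2013, 2014, 2015]
PERIODS    = ["Month", "Quarter", "Year"]
CURRENCIES = ["Functional", "Source"]
TOP_N      = [5, 10, 20]

def kpi_calc(year, period, currency, top_n):
    # Placeholder for the real (expensive) calculation.
    return {"year": year, "period": period, "currency": currency, "topN": top_n}

# In a real deployment this dict would be a NoSQL store (HBase, MongoDB, ...)
# and the loop would be distributed across many cloud workers.
precalculated = {}
for combo in product(YEARS, PERIODS, CURRENCIES, TOP_N):
    precalculated[combo] = kpi_calc(*combo)

# Serving a widget request is now just a key lookup, never a calculation.
print(precalculated[(2015, "Month", "Functional", 10)])
```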

We are using the core functionality of the Big Data database: we do reads based on the inputs and return the JSON (or XML or HTML) to display in the widget. As our cloud grows and we add more and more customers, the number of these will increase greatly, but the Big Data database will just scale out, using more and more servers to perform the work based on the current workload.

Then you can let these run all the time, so the values keep getting updated, and even the refresh button (if you bother to keep it) will just get a new value from the Big Data cache. So a SQL query or other calculation is never triggered by a user action.

This is the spider/read model. Another would be to sync the application’s SQL database to a NoSQL database that then calculates the KPIs using MapReduce calculations. This approach tends to be quite inflexible; however, it can work if the syncing and transformation of the database solves a large number of queries at once. Creating such a database in a manner that makes the MapReduce queries all run fast is a rather nontrivial undertaking, and runs the risk that in the end the MapReduce jobs take too long to calculate. The two methods could also be combined: phase one would be to sync into the NoSQL database, then the spider processes calculate the caches, doing the KPI calculations as MapReduce jobs.
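For the MapReduce flavor, a toy map and reduce pass over synced document records might look like this; a real NoSQL engine would distribute both phases over its own workers, and the record layout here is invented.

```python
# Toy map/reduce pass producing a per-customer "outstanding amount" KPI.
from collections import defaultdict
from functools import reduce

documents = [
    {"customer": "1200", "amount": 100.0, "paid": False},
    {"customer": "1200", "amount": 250.0, "paid": True},
    {"customer": "1400", "amount": 75.0,  "paid": False},
]

def map_phase(doc):
    # Emit (key, value) pairs: outstanding amount per customer.
    if not doc["paid"]:
        yield doc["customer"], doc["amount"]

def reduce_phase(values):
    return reduce(lambda a, b: a + b, values, 0.0)

grouped = defaultdict(list)
for doc in documents:
    for key, value in map_phase(doc):
        grouped[key].append(value)

kpi = {customer: reduce_phase(vals) for customer, vals in grouped.items()}
print(kpi)   # {'1200': 100.0, '1400': 75.0}
```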

This is all a lot of work and a lot of setup, but once in the cloud the customer doesn't need to worry about any of it, just the vendor, and with modern PaaS deployments this can all be automated and scaled easily once it's set up correctly (which is a fair amount of work).

Summary

There are lots of techniques to produce and calculate business KPIs quickly. All these techniques are great, but if you have a cloud solution and you want its opening page to display in less than a second, you need more. This is where the power of the cloud can come in, to pre-calculate everything so you never need to wait.


Written by smist08

February 14, 2015 at 7:25 pm

Sage 300 ERP 2016 SQL Server Only


Introduction

If you were able to attend the Sage 300 ERP roadmap sessions at Sage Summit, you would have heard that the next major release of Sage 300 ERP (named 2016 but released in 2015) will be dropping support for Pervasive.SQL and Oracle as database servers. This means the only supported database will be Microsoft SQL Server, although we will support several versions of SQL Server along with the Azure SQL flavor.

The intent of this article is to make sure everyone has plenty of advance warning about this change, to explain the rationale behind the decision, and to help you formulate a migration plan if you aren't already running SQL Server.

Pervasive.SQL


The first Windows version of Sage 300 ERP (then called CA-Accpac/2000) was released supporting one database, which was good old Btrieve 6.15. We all have fond memories of those days when the world was much simpler; we just needed a simple, robust database manager without any other real concerns. At that time we had a good bundling deal with Btrieve, so we could include a database engine with every System Manager. Btrieve was then owned by Novell, and it was a good low-cost database manager that supported transactioning, was used by many ERPs, and was relatively easy to install and administer. Novell later sold Btrieve back to its original developers, it evolved into Pervasive.SQL, and last year that company was acquired by Actian.

Pervasive.SQL still has the same qualities that Btrieve 6.15 had, but it hasn't really kept up with its competitors. SQL Server now has a free edition and Microsoft is much more favorable to doing bundling deals. Plus there are now many better low-cost database alternatives such as SQLite and MySQL.

Over the past years the higher-end databases have become much easier to install and manage. Long gone are all the configurable parameters that used to plague SQL Server installations (i.e. the defaults now work for most cases). So Pervasive.SQL no longer has an ease-of-use advantage.

Anyway, Btrieve was the first database that Sage 300 ERP supported, and I think a lot of people have fond memories of it, but today it doesn't seem to have a place anymore.

A lot of Sage 300 ERP installations require integrations with many other products, and nearly none of these support Pervasive.SQL. Hence if you want integration with BI tools or other ERP-related software, you are almost always forced to use SQL Server anyway. In the early days of Sage 300, SQL Server was very expensive and most products supported Btrieve as a low-cost alternative, but today the need for that has disappeared and we are one of the last vendors still supporting Pervasive.SQL.

Oracle


We’ve had Oracle support for a while now; however, the sales numbers have never really justified the resources required to support this platform. Oracle tends to be the database of choice for larger companies, which tend to be bigger than Sage 300’s target market. We’ve made a few government and large-company sales because we support Oracle, but generally these customers would have been just as well served by SQL Server.

Our perspective is that the demand for Oracle has waned and that they are really pursuing larger and larger companies and moving further and further away from our market segment.

Multiple Product Integrations

Most Sage 300 ERP sites these days involve multiple products working together, such as Sage CRM and Sage HRMS. Generally people only want to work with one database system, and the common one across the various products is SQL Server. Most products support a choice of databases; for example, Sage CRM supports Oracle and SQL Server, and Sage HRMS supports FoxPro and SQL Server. Getting a more uniform experience across all these products really only works well if you choose SQL Server. It's also generally nicer to have just one set of database operations for things like backup.

Further when you start to use more advanced cross product reporting tools, these can only do their job if all the products are based on the same database engine (so that SQL joins can work properly, etc.).

Database Architecture

The Sage 300 ERP architecture is still the same and supports multiple databases; whether we support another database besides SQL Server in the future will depend on the future of the database market. It might be that a lighter-weight SQL engine like SQLite is best, or one of the new NoSQL databases that are becoming popular like HBase or MongoDB. Certainly the NoSQL databases support capabilities that SQL Server can only dream of. Similarly, products like SQLite also run on all the mobile and alternative operating systems, opening up all sorts of other possibilities. Chances are these will be introduced in a hybrid manner combined with SQL Server, rather than as solutions that handle 100% of our system's needs.


For the short term we will be concentrating on SQL Server, which means we can use some features that are more specific to SQL Server. However, one of our keys to success has been sticking to the core SQL engine functionality, so that we work fine with SQL Express and Azure SQL (unlike a number of competitors). So we will be careful to ensure anything we do doesn't break our database independence or our flexibility in supporting all flavors of SQL Server.

Moving to SQL

If you are running an unsupported database and want to move to Sage 300 ERP 2016 then you will need to convert the database. To convert from an unsupported database like Pervasive.SQL, DB2 or Oracle, you need to run Database Dump on your databases, create SQL databases for these in SQL Management Studio, create the entries in Database Setup and then run Database Load. Make sure that you update and test your backup/restore and disaster recovery plans to ensure that you are still protected.

The conversion must be done before upgrading, since the 2016 version doesn’t include the unsupported database drivers and can’t access these databases and hence can’t do a Database Dump on them.

If you leave Pervasive, DB2 or Oracle databases in Database Setup then these won't show up in any sign-on dialogs. We've changed the message shown when you run the desktop, so that if you don't have any databases because they are unsupported, it explains why this is the case and lets you run Database Setup.

If you don’t want to switch to SQL Server, it just means the last version you can upgrade to is Sage 300 ERP 2014. This will be supported for its normal lifecycle. When it goes out of support your software will of course still operate, but you won’t be able to get any new Service Packs or Hotfixes. This should present quite a large window for deciding when to switch. These days nearly all new sales are SQL Server, SQL installs already make up the largest share, and of course everyone already running SQL Server won’t be affected.

Summary

The database world is changing and Sage 300 ERP needs to change with it. That’s why we are making these changes. We hope that converting your Pervasive or Oracle databases to SQL Server won’t be too painful and that you will get quite a few long term benefits from this move.

 

Written by smist08

September 2, 2014 at 12:23 am

Accpac and SQL Server


Introduction

I blogged about Accpac and its databases here. That was a general article about how Accpac works with the various databases it supports; this article is more specific to Microsoft SQL Server and covers various topics that affect Accpac using SQL Server for its databases. I'll be specifically talking about SQL Server 2008, but most of what I say applies equally to SQL Server 2005. Generally SQL Server is quite a reliable and low-maintenance database server that doesn't require much in the way of configuration and administration. Accpac uses SQL Server in a fairly straightforward manner, issuing standard SQL statements without using SQL Server specific modules. This means Accpac runs on all editions of SQL Server, since we don't use features only available in higher editions, and it means we can work with most of SQL Server's built-in services and configurations. We store the database in an open manner, following standard practices, which allows you to access your Accpac data using standard SQL-aware tools, including most reporting solutions. I also blogged on business continuity here, which is related to the topics discussed in this article.

Setup – Collation

One of the first decisions you need to make is a database collation sequence. This defines the character set used in the database as well as the sort order for the database. Generally choose the one that is best for your location and the language you use. However there are a couple of notes to consider:

Most western collation names end in either _BIN or _CI_AS, such as Latin1_General_BIN or SQL_Latin1_General_CP1_CI_AS. _BIN means a binary sort order, where strings are sorted in binary order; the upshot is that A-Z come before a-z and things like accented characters end up at the end. The _CI_AS ones are case insensitive and sorted in dictionary order, meaning AaBb…, with accented characters collated alongside their base letter. A few consequences of this for Accpac are:

  1. If you are migrating from Pervasive.SQL, then since Pervasive uses a binary sort order, if you choose a _BIN sort order there shouldn't be any problems. If you choose a _CI_ one in SQL Server, then since this is case insensitive, you will get "record already exists" errors if you load a database that has key fields that vary only by case. Typically this happens with units of measure like Ea, EA and ea (the sketch after this list shows the difference). If you want to make this conversion you will need to fix up your Pervasive database before DB-Dumping.
  2. If you chose a _CI_ sort order then since strings are case insensitive, when you do searches, say from the finder, then it doesn’t matter the case you enter, so searching for steve would return Steve, STEVE or steve. This is the main reason people want to use this sort order.
  3. If you want to change the sort order, you will need to DBDump, create a new database in the new sort order, and DBLoad. However, you could get problems with the keys and need to make some manual adjustments based on the errors returned.
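Here is a small sketch (Python with pyodbc, placeholder connection string) that demonstrates the case-sensitivity difference by creating throwaway tables with a _BIN and a _CI_AS collation and inserting unit-of-measure codes that differ only by case:

```python
# Sketch: show how a binary vs case-insensitive collation treats keys that
# differ only by case. Connection details are placeholders.
import pyodbc

conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=MYSERVER;DATABASE=tempdb;Trusted_Connection=yes",
                      autocommit=True)
cur = conn.cursor()

for collation in ("Latin1_General_BIN", "SQL_Latin1_General_CP1_CI_AS"):
    cur.execute("IF OBJECT_ID('tempdb..#uom') IS NOT NULL DROP TABLE #uom")
    cur.execute(f"CREATE TABLE #uom (code VARCHAR(6) COLLATE {collation} PRIMARY KEY)")
    try:
        for code in ("Ea", "EA", "ea"):
            cur.execute("INSERT INTO #uom (code) VALUES (?)", code)
        print(collation, "accepted Ea, EA and ea as distinct keys")
    except pyodbc.IntegrityError:
        print(collation, "treats Ea, EA and ea as duplicates (record already exists)")
```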

Setup – Configuration

At a minimum, you want to have TCP/IP and Shared Memory enabled as protocols. TCP/IP is required for the new Sage ERP Accpac 6.0A Portal, and is the network protocol that works best for the rest of Accpac. Shared Memory is a very high-performance protocol that will be used if the program using SQL Server is running on the same computer as the SQL Server service, basically bypassing the whole network stack. The main gotcha here is SQL Server Express, which will have TCP/IP disabled and will further be configured not to receive external requests (Connections tab of the server properties in Management Studio). This is configured from the SQL Server Configuration Manager:

If you right click on the TCP/IP protocol, choose properties and then select the second tab, you can see the network port being used by SQL Server.

Database Setup

If you are running multiple instances of SQL Server, then you can enter servername\instancename in Database Setup; however, you can't do this in the portal configuration. In the portal configuration you must use the server name and the correct port number. Each instance of SQL Server will be using a unique port number.

When Accpac first supported SQL Server we only supported access via a pre-configured ODBC data source. This led to problems with people setting these up incorrectly, since there are quite a few possible settings. Later on we allowed you to connect by just specifying the server name (or servername\instancename) in Database Setup, and we would then correctly set up an ODBC data source for you (usually servername32). This worked far more reliably. We left the ODBC option there so people already set up this way wouldn't be affected, or if there was some reason that special configuration values were required, you could still set them. I don't know of any such special configuration values, nor do I know of any good reason to use an ODBC data source rather than the server name. So I would strongly recommend you always use the server name in Database Setup to configure your SQL Server companies.
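For illustration, here are the connection styles discussed above expressed as connection strings (shown with pyodbc); the server, instance, port and DSN names are all placeholders, and the DSN variant corresponds to the kind of data source Database Setup creates for you:

```python
# Sketch: different ways a client can point at SQL Server.
import pyodbc

common = "DRIVER={ODBC Driver 17 for SQL Server};DATABASE=SAMLTD;Trusted_Connection=yes;"

# 1. Default instance: just the server name.
conn1 = pyodbc.connect(common + "SERVER=MYSERVER")

# 2. A named instance: servername\instancename (works in Database Setup).
conn2 = pyodbc.connect(common + r"SERVER=MYSERVER\SAGE300")

# 3. Server name plus the TCP port of the instance, as the portal
#    configuration requires (port found in SQL Server Configuration Manager).
conn3 = pyodbc.connect(common + "SERVER=MYSERVER,1433")

# 4. A pre-configured ODBC data source (e.g. the servername32 style DSN
#    Database Setup creates); still works, but rarely needed.
conn4 = pyodbc.connect("DSN=MYSERVER32;Trusted_Connection=yes")
```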

Database Dump and Load

DBDump and DBLoad are two utilities that come with Accpac that allow you to easily dump databases into a database independent format and reload them. These allow you to easily move Accpac companies between locations or between different database servers. I blogged on these previously here. These are very handy utilities and have some interesting side effects like re-indexing and compacting databases (since they completely re-create and reload the database). However they aren’t as fast as the built in SQL Server backup and restore facilities.

64 Bit SQL Server

Although Sage ERP Accpac is a 32-bit application, it works just fine with the 64-bit version of SQL Server. You cannot directly load 32-bit DLLs into a 64-bit application, but this isn't how SQL Server works. Accpac (or any other program) is never loaded into SQL Server's process space. All communication is between two processes using a standard, well-defined protocol, so it doesn't matter whether either is 32 or 64 bits. Generally you want to run the 64-bit version of SQL Server on a 64-bit version of Windows Server 2008. This way you can get better performance by adding more memory (the 32-bit versions don't really use memory beyond 4 GB).

One caveat is that the ODBC data sources used by Accpac are 32-bit ones and stored in the 32-bit part of the registry. To see them you have to use the 32-bit version of odbcad32.exe, which you need to run from the strange location of C:\Windows\SysWOW64\ODBCAD32.exe.

Network Bandwidth

If you are running in a standard workstation setup client/server configuration then each workstation communicates with SQL Server over the network. For best performance you want as good network connectivity as you can get, but generally a 100 Megabit connection is fine.

If you are running using Citrix or Terminal Server, then if SQL Server is also on the Terminal/Citrix server it can use shared memory access, which provides good performance. However, this is tempered by how much memory and processor resources are required by the Accpac business logic, UIs and SQL Server. Generally, once you get past a few users it works better to separate SQL Server onto a separate machine with a high-speed connection to the Citrix/Terminal server, ideally having them both on a gigabit backbone. As we move to the Web, a Web server will replace the Citrix/Terminal server, and the same advice will apply to the Web server.

Another recommendation I've heard is to ensure you only use switches and not hubs, and to beware that if you are using VoIP you aren't saturating your network bandwidth with phone traffic.

Disk Storage

I know a lot of people who swear by various DAS (Direct Attached Storage), NAS (Network Attached Storage) and SAN (Storage Area Network) solutions, but my own experience with these has been a bit negative. I find them very useful for storing large amounts of files, like virtual images, photos and such, but I find them unreliable and often plagued by performance problems when used to store databases that are directly accessed by a database server. There are also now cloud services that let you have your database in the cloud and your application server on premise, but I tend to be skeptical of the performance and uptime of such services. However, these all make great backup devices.

My own preference is to use RAID storage on the server. RAID 0 is the fastest, but doesn't have redundancy (so I never recommend it). The other RAID levels offer various levels of redundancy, so if a drive fails it can be replaced without losing any data. Have a look at http://en.wikipedia.org/wiki/RAID. The decision of which to go with depends a bit on how much redundancy you want versus how much you are willing to pay. RAID 5 is very redundant, but tends to suffer a bit in performance.

A recommendation we often give is to locate your LDF (transaction log) files on RAID 1 storage and your MDF and NDF data files on RAID 1+0 storage.

Even with the reliability of RAID, make sure you still back up your company databases, whether by mirroring to offsite servers, backing up to rotating external hard drives, DBDumping, etc. Also remember it's a good idea to test that you can restore your backups. Remember that the value of your company database far exceeds the value of the equipment it is stored on.

Disk Mirroring

SQL Server supports a number of database mirroring options, meaning that you can have your SQL Server stream database changes to a mirror SQL Server. Besides acting as a hot standby in case of failure, you can use this as a reporting server. Often reporting through Crystal can put a fairly big strain on the SQL Server, and one way to offload this is to configure users who only do read-only access via reporting to connect to the mirror server. This way a department that runs many reports won't impact the main data entry that is going on.

Summary

These are just a few tips and tricks for dealing with Accpac and SQL Server. Generally, as your database becomes larger and the number of users increases, so does the amount of care you need to take of your database and network infrastructure; but for smaller installations you generally just need to ensure you take regular backups. Of course, if you don't want to deal with database administration at all, then you might want to consider using a cloud service like AccpacOnline.

Written by smist08

August 13, 2011 at 5:44 pm

Posted in sage 300


Accpac and Its Databases


Sage ERP Accpac supports running using either Pervasive.SQL (http://www.pervasivedb.com/Database/Pages/home.aspx), Microsoft SQL Server (http://www.microsoft.com/sqlserver/2008/en/us/) or Oracle (http://www.oracle.com/us/products/database/index.html) Database Servers. This blog posting is going to talk a bit about how Accpac Accounting applications access the database, give some rationale for the way we do things and look at where we may go in the future.

Accpac Plus, the original character-based product for first CP/M (http://en.wikipedia.org/wiki/CP/M) and then MS-DOS (http://en.wikipedia.org/wiki/MS-DOS), used a proprietary built-in database. When this product was developed there weren't standard off-the-shelf packages to use. It was a simple ISAM-based package that was fast and easy to install and configure. The limitations were that:

  • It didn’t support database transactions so if something failed the database was left corrupted.
  • It used record locking as its multi-user model, so once too many people were doing things, too many records would be locked and things would grind to a halt.
  • It didn’t have a server version, so it relied on file system locks, which were hard to support across all the different network operating systems.
  • It didn’t support SQL and hence didn’t fully support ODBC or any other standard interface mechanism, making it hard to use external tools such as reporting packages.

So when Accpac for Windows was developed, we made the decision to use standard off-the-shelf databases. We decided to base our new database support on transactions and to define a set of requirements that a database would need to support to work with Accpac.

CA-Accpac/2000 originally shipped supporting Btrieve 6.15, and then with version 2.5A we added support for Microsoft SQL Server 6. Later we added support for IBM DB2 and Oracle Databases. We can easily move Accpac data from one database to another via our Database Dump and Load utilities.

Along the way we’ve evolved our API and easily moved from database version to database version. Many third party tools integrate with Accpac through open database interfaces. We currently have a very robust and scalable solution.

Accpac Database API

Accpac is a three-tier application with the business logic separated from the database. Our business logic objects (the real core of our accounting applications) communicate with the database via a common database API. Then we have database drivers that translate this API into calls to the database server currently being used. How the database driver accomplishes this is entirely up to the driver. Our SQL Server driver uses ODBC to communicate with the SQL Server native driver; Oracle uses the Oracle ODBC driver. For Pervasive we communicate with both the relational layer using SQL calls, and with the transactional layer making direct Pervasive API calls. The key point is that all this detail is invisible to the business logic. The application programmer creating our business logic objects just needs to know our Accpac database API to do their job; all the differences between the databases are isolated in the Accpac database driver. If a new version of a database server is released and Accpac needs to be updated to support it, then it's just a matter of providing a new database driver, without requiring any other application changes.

The Accpac database API includes a number of record-oriented calls, a number of meta-data calls and a number of set-based operations. The record-oriented APIs operate on one record at a time: for example, read a record based on a supplied key, then update that record. The meta-data calls return information about the structure of the database, things like getting a list of all the tables in the database or information on all the fields in a table. The set-oriented calls operate on a set of records at a time: for example, read all records where the description starts with A, update all records changing any with customer number 1200 to 1400, or delete all records with status "deleted".
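The following is not the actual Accpac database API, just a hypothetical Python-flavored sketch to make the three kinds of calls concrete; the real API is a lower-level interface with its own names and signatures.

```python
# Hypothetical sketch of the three categories of database-API calls.
from abc import ABC, abstractmethod
from typing import Iterable

class DatabaseDriver(ABC):
    # --- record-oriented calls: operate on one record at a time ---
    @abstractmethod
    def read(self, table: str, key: dict) -> dict: ...
    @abstractmethod
    def update(self, table: str, record: dict) -> None: ...

    # --- meta-data calls: describe the structure of the database ---
    @abstractmethod
    def list_tables(self) -> list[str]: ...
    @abstractmethod
    def list_fields(self, table: str) -> list[dict]: ...

    # --- set-oriented calls: operate on a whole set of records at once ---
    @abstractmethod
    def read_where(self, table: str, filter: str) -> Iterable[dict]: ...
    @abstractmethod
    def update_where(self, table: str, filter: str, changes: dict) -> int: ...
    @abstractmethod
    def delete_where(self, table: str, filter: str) -> int: ...

# Business logic only ever talks to an interface like this; a SQL Server,
# Oracle or Pervasive driver supplies the concrete behaviour behind it.
```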

Accpac Architecture Diagram

Reporting

Reporting works a bit differently. We use Crystal Reports as our reporting engine, and Crystal provides the mechanisms to have database-independent reports. From Accpac, the same Crystal report is used for Pervasive.SQL, Oracle and SQL Server; we just point Crystal at the correct database, provide the necessary connection information, and let Crystal do the rest. We provide an Accpac reporting API which handles the details of talking to Crystal. Originally this API was used to talk to CA-RET, the Computer Associates reporting tool; then it was converted to talk to Crystal's crpe32.dll interface, and later to Crystal's COM interface. With Accpac 6.0A we are upgrading to Crystal Reports 2008 and will use the .Net interface for the VB UIs and the Java interface when running Crystal from the Web. Even though the way we do reporting has changed quite a bit over the years, this has been largely invisible to the applications, which just talk to Accpac System Manager's API and let System Manager take care of the details.

The Accpac Financial Reporter accesses the database through the Accpac Business Logic and hence the regular Accpac Database API. Accpac Intelligence and Accpac Insight work in a similar manner to Crystal and provide their own database independent mechanisms to access the various databases. Most other reporting solutions also work this way.

Database Requirements

Accpac does have a number of minimum requirements for any database it supports. This includes things like number of fields per table, number of keys supported, record length and such. One key requirement is that the database supports database transactions. As long as a database meets these minimum requirements then it is possible to write a database driver for Accpac to use that database. Note that supporting SQL is not a requirement.

Database Transactions

At the core of Accpac's database support are database transactions. Any changes made to the database have to be done inside a transaction. A transaction is a group of database operations that are either all applied to the database or none are applied, so the application never has to worry about only half of them being committed. A key differentiator of database servers is how well they handle transactions. The application rule for Accpac is that when the database is changed, operations must be grouped together into a transaction so that the database always has full application integrity. This includes things like the General Ledger never being out of balance: when G/L entries are posted, each entry is a transaction and the database goes from one in-balance state to another. This means that if something crashes on either the server or the workstation, or an unexpected error occurs, the database will not end up in a corrupt state. Database integrity is always maintained via database transactions.
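A minimal sketch of the all-or-nothing rule, using invented table names and pyodbc, might look like this: every line of the posting either commits together or rolls back together.

```python
# Sketch: post a G/L entry inside one transaction so the ledger is never
# left out of balance. Table and column names are invented for illustration.
import pyodbc

def post_gl_entry(conn_str: str, batch: int, entry: int, lines: list[dict]) -> None:
    conn = pyodbc.connect(conn_str, autocommit=False)
    cur = conn.cursor()
    try:
        for line in lines:
            cur.execute(
                "INSERT INTO GLPostedLines (Batch, Entry, Account, Debit, Credit) "
                "VALUES (?, ?, ?, ?, ?)",
                batch, entry, line["account"], line["debit"], line["credit"])
        cur.execute("UPDATE GLBatches SET Status = 'Posted' WHERE Batch = ?", batch)
        conn.commit()        # every line lands, or...
    except Exception:
        conn.rollback()      # ...none of them do; the database stays in balance
        raise
    finally:
        conn.close()
```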

Multi-User Model

Then our multi-user model builds on the transactioning foundation. Within Accpac we use optimistic concurrency (http://en.wikipedia.org/wiki/Optimistic_concurrency_control) as our multi-user model. Suppose two users go to edit the same record: both bring the record up on their screens at the same time, make changes to various fields, and then both hit save. Both users started with the same record and made independent changes, and both want to save; what should the system do? In the most basic optimistic concurrency model, the first person to save wins, their changes are committed to the database, and the second user gets an error message that the record was modified by another user. With optimistic concurrency we can be a little more clever than this: for the second user, we can re-read the record, see what the first user changed, and if they are unrelated fields then combine our changes and re-save the record. We do this in a number of cases, but it requires a bit of application support to define what "unrelated fields" means. The alternative to optimistic concurrency is to lock the record when the first user reads it, and not let the second user read the record at all. The second user would get a message saying that someone is editing the record and they can't access it. This saves the second user from possibly losing some work. However, it imposes a huge multi-user penalty, since if a lot of records are locked, it is highly likely people are going to trip over the locked records all over the place and be blocked from doing work; they won't even be able to load a record just to view it. Accpac Plus used this model of multi-user processing and it restricted Accpac Plus to pretty much only 25 users, since after that the number of locks would just cause too many problems. Most systems with high numbers of users use optimistic concurrency.

So what does optimistic concurrency have to do with database transactions? Most document entry in Accpac involves updating and inserting into many different database tables. What happens if you are saving an O/E Order and you have updated 5 tables and then on the sixth you get an error? You can either re-read the record and reconcile the two sets of changes, or you have to abort saving the order. If you have to abort saving the order, you need to do a transaction rollback to undo the changes to the first 5 tables. Fortunately with database transactions this is easy, and you never have to worry about corrupting the database if you encounter an error (multi-user or otherwise). So for complicated header/detail entry, you can only really do optimistic concurrency if you are based on a transactioning model.
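Here is a toy, in-memory illustration of the optimistic concurrency behaviour described above; it is not Accpac code, just the first-saver-wins rule plus the merge of unrelated fields.

```python
# Toy optimistic concurrency: no locks; the second writer either merges
# unrelated changes or is told the record was modified by another user.
class RecordModifiedError(Exception):
    pass

store = {"cust1200": {"version": 1, "phone": "555-1111", "credit_limit": 5000}}

def read(key):
    return dict(store[key])              # each user works on their own copy

def save(key, original, edited):
    current = store[key]
    if current["version"] == original["version"]:
        saved = dict(edited)
        saved["version"] = current["version"] + 1
        store[key] = saved               # first saver wins outright
        return
    # Someone else saved in between: merge only if the changed fields differ.
    my_changes = {f for f in edited if edited[f] != original[f]}
    their_changes = {f for f in current if current[f] != original[f]} - {"version"}
    if my_changes & their_changes:
        raise RecordModifiedError("Record was modified by another user")
    merged = dict(current)
    merged.update({f: edited[f] for f in my_changes})
    merged["version"] = current["version"] + 1
    store[key] = merged

# Two users edit the same record, change unrelated fields, and both saves land.
a_orig = read("cust1200"); a_edit = dict(a_orig); a_edit["phone"] = "555-2222"
b_orig = read("cust1200"); b_edit = dict(b_orig); b_edit["credit_limit"] = 10000
save("cust1200", a_orig, a_edit)   # first save wins
save("cust1200", b_orig, b_edit)   # second save merges the unrelated change
print(store["cust1200"])           # both the new phone and the new credit limit
```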

Stored Procedures and Database Scale-out

A common question is why we don't use stored procedures. Besides the problems with portability across databases and the attendant lock-in to one solution, there are definite problems with scalability. Since Accpac is aimed at the mid-market, like most such products we don't support arbitrary database scale-out. This means if you have trouble with the performance of the SQL Server, you can't just add another SQL Server to solve the bottleneck. We do of course support many configurations with multiple database servers, but there are restrictions. However, you can have as many application servers as you like. If you run workstation-setup style client/server, then every workstation is an application server and takes a share in running the business logic. If you deploy via Terminal Server or Citrix, then you can add as many of these as you like (either dedicating users to specific servers or adding a load balancer). So we want to put as little load on the database server as possible, to allow users to use one database server for as long as possible. If we ran stored procedures on the database server, this would put considerable load on the server and make the database server become a bottleneck quite quickly. People often point out that they are getting slow, but SQL Server is only, say, 15% utilized, and we say add another application server. They say why not shift more work to the database server. The problem is that once the database server is full, you're done; adding application servers is easy.

We support a couple of configurations with multiple database servers. One is to split a big company into multiple divisional companies within Accpac then run each divisional company on a separate database server. You then combine the data for financial reporting using G/L Consolidations. Another configuration is to add a second database server that is a replicated copy of the primary. Then if you have a number of users that only do reporting then you point them to the replicated server (and you can have as many of these as you like). However you can only have one primary server where database updates occur. But often reporting is quite a load on the database server and moving this to another server can really help.

True database scale-out means you can add database servers at will, but this requires the application to be structured quite differently. For instance, for Order Entry you might have one server designated the primary; it would contain a small order header table with the order number and a link to the database that actually contains the order. As people perform Order Entry, each order only requires one simple insert on the primary server and the actual order is placed on one of the other servers in a round-robin fashion. This allows scale-out of the Order Entry process, since you can add as many of these slave database servers as you like. The complexity comes in for reporting, since now to generate a report on a range of orders you have to consolidate the data from a number of database servers, greatly adding to the complexity.
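A toy sketch of that routing scheme: a primary server keeps only a thin header stub recording which database holds the full order, and new orders are dealt out round robin. All the names are invented.

```python
# Toy order-entry scale-out: thin stubs on the primary, full orders spread
# round robin across slave databases.
from itertools import cycle

slave_databases = ["ORDERS_DB_1", "ORDERS_DB_2", "ORDERS_DB_3"]
next_slave = cycle(slave_databases)

primary_header_stub = {}      # order number -> database holding the full order
slave_contents = {db: {} for db in slave_databases}

def enter_order(order_number: str, order: dict) -> None:
    db = next(next_slave)                      # round-robin choice of slave
    slave_contents[db][order_number] = order   # full order goes to the slave
    primary_header_stub[order_number] = db     # one cheap insert on the primary

def read_order(order_number: str) -> dict:
    db = primary_header_stub[order_number]     # look up where it lives...
    return slave_contents[db][order_number]    # ...then fetch it from that server

enter_order("ORD-0001", {"customer": "1200", "total": 199.99})
enter_order("ORD-0002", {"customer": "1400", "total": 42.50})
print(read_order("ORD-0002"))
```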

The Future

Originally our next database target was going to be MySQL. One reason was to reduce the TCO for our customers, as Sage has a very good deal for distributing MySQL with our applications. Another reason is to prepare for SaaS down the road, where to be economical we have to ensure that we don't have to pay all the subscription revenue to other software companies. However, since Oracle acquired MySQL (through its purchase of Sun), it's become very questionable whether MySQL will maintain its TCO advantages or whether Oracle will invest in it sufficiently to keep it competitive. So right now we are taking a wait-and-see approach to MySQL.

Another approach that is becoming popular is the so-called "NoSQL" database (http://en.wikipedia.org/wiki/NoSQL), where NoSQL stands for "Not only SQL". These databases are all about database scale-out, but since they think about large-scale scalability from the start, they tend to be very different from relational (SQL) databases. Often they load the entire database into memory, and will naturally partition the database over multiple servers in a redundant manner. Often these databases remove the "hard" parts of SQL to make data access faster and more scalable. One example is Google Fusion Tables (http://googleresearch.blogspot.com/2009/06/google-fusion-tables.html).

Along these lines the Apache foundation has been developing the Hadoop platform (http://hadoop.apache.org/) along with the HBase database (http://hbase.apache.org/). This is a very ambitious project to provide the entire infrastructure necessary to build fully scalable large scale web based applications like a Facebook or Twitter.

Some of the ideas from the Web have a place in accounting applications, for instance quick, natural-language, Google-like searches. If the database is partitioned over 5 servers, then these servers can each produce their part of the results in parallel and the results can then be combined and returned. The key point is the parallelism employed to return possibly difficult queries quickly.
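A toy scatter-gather sketch of that parallel search, with the five "servers" simulated as threads each scanning its own partition of the data:

```python
# Toy scatter-gather: each partition is searched in parallel and the partial
# results are combined when they all return.
from concurrent.futures import ThreadPoolExecutor

partitions = [
    ["invoice 1001 paid", "invoice 1002 overdue"],
    ["invoice 2001 overdue"],
    ["invoice 3001 paid"],
    ["invoice 4001 overdue", "invoice 4002 paid"],
    ["invoice 5001 paid"],
]

def search_partition(rows, term):
    # Each "server" scans only its own slice of the data.
    return [r for r in rows if term in r]

with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    futures = [pool.submit(search_partition, rows, "overdue") for rows in partitions]
    combined = [hit for f in futures for hit in f.result()]

print(combined)
```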

Some ideas are perhaps not going to be so popular. For instance, when you refresh Facebook you often get different results each time you do it. Facebook sends the query to a bunch of servers and then combines all the results returned within a short, fixed amount of time and displays your page, but any results coming late from a slow server are ignored. Will accounting users appreciate getting most of the results quicker rather than all of the results slower?

Database technology is the foundation for accounting applications like Sage ERP Accpac. Most accounting applications started life using flat-file ISAM packages or home-grown solutions, then at some point made the jump to standard SQL databases. Certainly we've gotten a lot of mileage out of SQL, but as the amount of information in data warehouses and other corporate databases grows, and we join to data out on the internet, are we at the point where we are going to see the start of a transition to the next big thing in databases?

Written by smist08

July 10, 2010 at 4:23 pm

SQL Server, Performance and Clustered Indexes


One of our engineers, Ben Lu, was doing some analysis of why SQL Server was taking a long time to compute some dashboard-type totals for statistical display. There was an index on the data, and there weren't that many records that met the search criteria and needed adding up. It turned out that even though SQL Server was following an index to add these up, the records were fairly evenly distributed through the whole table. This meant that to read the 1% or so of records required for the sum, SQL Server was actually having to read pretty much every data page in the table, usually to pick up just one qualifying record from each. This problem is somewhat unique to getting total statistics or sums; if the data were fed to a report writer, the index would return enough records quickly to start the report going, and then SQL Server can feed the records faster than a printer or report preview window can handle. However, in this case nothing displays until all the records are processed.

It turns out that SQL Server will physically store records on disk in the order of the index that is specified as the clustered index (and there can only be one per table). In Accpac we designate the primary index as the clustered index, as we found long ago that this speeds up most processing (since most processing is based on the primary index). However, in this case the primary index was customer number, document number; and for the query we were interested in processing all unpaid documents, which are spread fairly evenly over the customers (at least in the customer database we were testing with).

We found that we could speed up this calculation quite a bit by adding an additional segment to the beginning of the primary key. For instance, without any changes the query was taking 24 seconds on a large customer database. Putting the document date as the first segment reduced the query to 17 seconds (since hopefully unpaid invoices are near the end). However, it turned out there were quite a few old unpaid invoices. Adding an "archive" segment to the beginning, where we mark older paid invoices as archived, reduced the query time to 4 seconds. We could also add the unpaid flag as the first index segment to get the same speed-up. But we are thinking that perhaps the concept of virtual archiving would have applications beyond this one query.
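As a hedged sketch of the archive-segment idea (with invented table, column and index names, not the real Accpac schema), the clustered index could be rebuilt with the archive flag as its leading segment, and the dashboard sum would then touch only the "live" pages:

```python
# Sketch: put an ARCHIVED flag in front of the clustered key so current
# documents are physically grouped together, then sum only those.
import pyodbc

conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=MYSERVER;DATABASE=SAMPLE;Trusted_Connection=yes",
                      autocommit=True)
cur = conn.cursor()

# Rebuild the clustered index with ARCHIVED as the leading segment, so the
# small "live" portion of the table sits together on disk.
cur.execute("""
    CREATE UNIQUE CLUSTERED INDEX IX_ARDOC_PRIMARY
    ON ARDocuments (ARCHIVED, CUSTOMER, DOCNUMBER)
    WITH (DROP_EXISTING = ON)
""")

# The dashboard sum now reads only the non-archived pages.
total = cur.execute("""
    SELECT COALESCE(SUM(AMTDUE), 0)
    FROM ARDocuments
    WHERE ARCHIVED = 0 AND PAID = 0
""").fetchone()[0]
print(total)
```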

Anyway, now that we understand what is going on, we can decide how best to address the problem, whether by adding the paid flag to the record, adding the date, or adding an archive flag. It's interesting to see how SQL Server's performance characteristics actually affect the schema design of your database if you want to get the best performance, and that often these choices go beyond just adding another index.

Written by smist08

June 7, 2009 at 9:05 pm