NoSQL for ERP?

Introduction

NoSQL databases are a class of database management systems that grew up with the Internet. Generally these refer to any database system that doesn’t use SQL as its query language (though some call them Not only SQL, meaning they support SQL and something else as well). This would make them any database system other than the standard databases. However these are usually associated with a class of database system that is intended to server large Internet companies. In this article we are only considering the large distributed databases and not the specialized in-memory databases. The goal of these is to be highly distributed and highly redundant. As a database scales you would be able to scale usage by just adding servers and have them add to the capacity of the system in a linear manner. You would like to have your database distributed all over the world, so it can be accessed from anywhere with a very low latency.

The main difference between traditional SQL databases and NoSQL databases comes down to the CAP Theorem which says out of the three things:

Consistency – roughly meaning that all clients of a data store get responses to requests that ‘make sense’. For example, if Client A writes 1 then 2 to location X, Client B cannot read 2 followed by 1.
Availability – all operations on a data store eventually return successfully. We say that a data store is ‘available’ for, e.g. write operations.
Partition tolerance – if the network stops delivering messages between two sets of servers, will the system continue to work correctly?

You have to choose two, you can’t have all three.

Then SQL databases choose consistency and partition tolerance and NoSQL databases choose availability and partition tolerance. The CAP Theorem is shown graphically below:

Generally then there has been a proliferation of NoSQL databases that all make a slightly different tradeoffs for the CAP theorem. If they can’t have all three does it make sense to have 2/3 of each perhaps? Generally there has been some great research that has produced amazing results, but this has led to a bit of a tower of Babel of database access languages.

Search Example

The classic example is Google’s BigData database used to power the Google search engine. How is it that when you type a search into Google, you immediately get back a page of search results instantly? How is it translating what you typed into some sort of database query and how is that query operating so quickly? How is Google handling millions of these queries coming from all over the world so quickly? What is the database schema that allows this magic?

The structure of the database is a really simple key-value store. The key is every possible search term that anyone could type in. The value is the table of web sites (and descriptions) to return for this search term. The database is called BigData because this value consisting of all the search results could have millions of hits for you to scroll through. There is no limit to the size of the value and there is no limit to the number of keys. Even so an indexed binary tree read of such a table would still be quite slow and this is Google’s big advancement with an algorithm called MapReduce that lets these simple key reads happen so quickly. The BigTable database spreads this huge table over many servers to operate and it has redundancy of having the database in many places at once. The MapReduce algorithm is a parallel algorithm in that parts of the search can be done by multiple servers in parallel.

Sound wonderful? Google published the MapReduce algorithm and there is an open source implementation of it called Apache HBase, so why doesn’t everyone use it? All the NoSQL databases provide a huge optimization on one of the things a SQL database does, but at a cost of making something else very slow, or eliminating something else entirely. In the case of BigTable, inserting and updating records in this table is very slow. But for Google search this is ok. Google operates a giant number of “web spiders” that are always out there traversing the web looking for new sites and updating the Google database of search results. These are running all the time and updating all the servers all the time. But these updates and inserts might take hours or days to happen, especially if you count the amount of time to find their way over all the thousands of servers involved. But Google doesn’t mind as long as it happens and is eventually consistent.

ERP Database Usage

Generally ERP packages rely very heavily on the tenants of a SQL type relational database. They require database transactioning and consistency (often referred to as ACID). ERP packages require general SQL type queries. They need good performance writing transactions to the database as well as reading and reporting from the database. I blogged a bit about Sage 300 ERP’s database usage here.

The NoSQL camp would contest these statements. For instance if I insert an Order, is it necessary that if I immediately list the Orders that I see it, or is it ok if it shows up in that list, say ten minutes later? Or do ERP packages believe they need all the benefits of SQL without weighing them against the other benefits of alternative systems? Often customers will prefer speed over customizability or functionality.

I like the way in SQL you can hit post and get back a response that a transaction is either all committed or all rolled back. However from working with SQL, I also know this is a huge cost, especially in our ability to scale out the database and is a definite limit on multi-user contention. For a large scale system, I’m really questioning the locking requirements that large transactions like entering an order place on the system. Remember the “availability” part of NoSQL databases will ensure the Order is entered eventually.

Another problem is adhoc SQL queries. ERP packages tend to like to do reporting and data inquiries using adhoc SQL queries. Usually this is a no no in the NoSQL world. Often in the NoSQL world, all queries must be designed ahead of time using very complicated MapReduce formula. The question is whether this is really necessary. If you have a really powerful Google like search ability built into the database, then do you really need to search the data with SQL queries? Is it that we just do this because we know how and that in fact real users would much rather use the Google method than the SQL method?

Hybrid Approach

Suppose you want the data query ability of a NoSQL database, but still want the SQL transaction integrity and consistency? One approach has been hybrid systems. Think of these as two databases in one where both a SQL and NoSQL database engine operate on your data. Perhaps a cleaner solution is to take a data warehousing approach and create your data in a SQL database, but then archive it to a NoSQL database for various querying and analysis. This is the foundation of many cloud based BI solutions that are out there. With this approach you get some of the benefits of both. What you lack is the high availability and scalability of the database for data entry and regular SQL type queries. But still this can be a way to get a bit of the best of both worlds.

Summary

I think it will be a while longer before NoSQL databases gain acceptance in the ERP world. However the amount of innovation going on in the NoSQL world is huge and great strides are being taken. I suspect we will start to see NoSQL begin to be used for non-traditional BI via data warehouses. Even adding Google like search to ERP usually involves adding a NoSQL database on the side. I think once this starts to happen then we will see more and more functionality move to the NoSQL side of things and then perhaps down the road somewhere we will be able to remove the SQL database. But the jury is still our as to whether this is a good long term goal.

Written by smist08

January 28, 2012 at 5:20 pm

Posted in Software Architecture

Tagged with accpac, cap theorem, database, nosql, relational, Sage, sage 300, sage 300 erp, sage accpac, sage erp accpac, sql

11 Responses

Subscribe to comments with RSS.

Great MaybeNoSQL post touching on many considerations for ERP.

Diving further into adhoc SQL queries and ERP vs using Google search. The productivity is too rich to not pursue. Consider how Google (and Bing) have moved to incremental indexing (use the search on forecast inquiries on part #1000 to seed the indexing) via to Percolator and Dremel cite @mlmilleratmit http://bit.ly/wvxIks

This approach ties back to the Hybrid Approach “thinking of two databases in one” with “both a SQL and NoSQL database engine”. This pattern is essential the architecture of Force.com. Designed for business developers by using inserts to trigger reindexing of ERP important data such as order totals and financial balances.

Bringing this back to innovation, I’ve encountered very entrepreneurial companies using Sage for small factories internationally, then using some ETL process to build a consolidated planning forecast and financials. This approach usually leads to purchasing a larger system, often after the IT architect leaves.

Michael Rys @sqlservermike illustrates a different architecture approach using SQL Azure to gain scalability by relaxing the cap theorem http://littlebigpass

Common to the new approaches seems to be moving the on-premise databases into the Cloud to remove the latency necessary to relax the cap theorem. Perhaps if we had NoSQL ERP databases on premise we could percolate updates to the BI layer in the Cloud, or its simpler to architect ERP for the Cloud.

clive boulton

February 1, 2012 at 7:18 pm

Reply
- Thanks, those are all really good points.
  
  Cheers
  
  smist08
  
  February 1, 2012 at 8:24 pm
  
  Reply
[…] starts to put us into the world of “Big Data”. I blogged about Big Data and ERP here. Currently we are gathering all the data into SQL Server, but this is already strained with only a […]

Sage Advisor PEP « Stephen Smith's Blog

March 25, 2012 at 12:53 am

Reply
Please check out Bangdb (http://www.iqlect.com) the new key value store which seems to be very fast in terms of IOPS for both read and write. The Bangdb will be in many flavors for ex; embedded, network, elastic cache/imdg. Being crash proof, with many configuration parameters, it can be tuned to operate in most suitable fashion for a given requirement.

cloud-dev

June 22, 2012 at 7:22 pm

Reply
Stephen — Bangdb brings to mind was / is Btrieve NoSQL for ERP?

clive boulton

June 22, 2012 at 10:55 pm

Reply
- I guess it is in that it is not SQL. But Btrieve is like SQL in that it gives you consistency but not availability and partition tolerance.
  
  smist08
  
  June 22, 2012 at 11:29 pm
  
  Reply
  - Update the sands are shifting:
    
    SAP HANA in memory database continues on with a SQL, yet the default storage and query optimizer is columnar, though row based is also supported. Columnar tables for ERP make lots of sense as we shift into smaller mobile devices or wearable computing like Google Glass.
    
    Perhaps not for SMB, but for mid-market ERP and really BigERP with multisite architectures NoSQL ERP properly using distributed databases could really simplify and add performance.
    
    This blog post from Adron with Basho on the Riak team gets into the use case. http://compositecode.com/2012/12/19/cloud-computing-and-distributed-computing-something-is-broken/#comments
    
    boulton clive
    
    December 21, 2012 at 6:59 am
  - Thanks for the link, that’s a good blog post. Certainly interesting now that innovation in databases is exploding again and we aren’t just doing slight minor tweaks to SQL databases anymore.
    
    smist08
    
    December 21, 2012 at 3:59 pm
You might want to check out the JugglingDB ORM http://jsdoc.info/1602/jugglingdb/ for both SQL & noSQL adapter capabilities (Built for RailwayJS but could be used readily with any node.js setup).

For example, you could do:

schema ‘redis’, url: process.env.REDISTOGO_URL, ->
define ‘Lead’
# other definitions for redis schema

schema ‘sqlite3’, url: process.env.SQLLITE3_URL, ->
define ‘Transaction’
# other definitions for mongoose schema

I love the idea, by the way!

Richard

July 16, 2012 at 4:51 pm

Reply
- oh thats embarrassing, I meant to suggest mongo instead of redis! redis would not be the best of places to store leads
  
  Richard
  
  July 16, 2012 at 4:52 pm
  
  Reply
[…] In the cloud, it’s quite easy to create virtual machines to run computations for you. It’s also quite easy to take advantage of various Big Data databases for storing large amounts of data (these are often referred to as NoSQL databases). […]

On Calculating Dashboards | Stephen Smith's Blog

February 14, 2015 at 7:25 pm

Reply

Stephen Smith's Blog

NoSQL for ERP?

11 Responses

Leave a comment Cancel reply

Pages

Search

Blogroll

Archives

Categories

Meta

Stephen Smith's Blog

NoSQL for ERP?

Share this:

Related

11 Responses

Leave a comment Cancel reply

Pages

Search

Blogroll

Archives

Categories

Meta