Stephen Smith's Blog

Musings on Machine Learning…


Synchronizing Data with Sage 300


Introduction

Often there is a need to synchronize data from an external source with Sage 300 data. For instance, with Sage CRM we want to keep the list of customers synchronized between the two systems. That way, if you enter a new customer in one, it gets automatically added to the other; similarly, if you update, say, an address in one, it is updated in the other. Along the way there have been various techniques to accomplish this. In this blog post I’ll cover how this has been done in the past and present a new idea on how to do it now.

In the past there were a number of constraints, such as supporting multiple databases like Pervasive.SQL, SQL Server, IBM DB2 and Oracle; but today Sage 300 only supports SQL Server. Similarly, some suggested approaches would be quite expensive to implement inside Sage 300. Tied closely to synchronization is the desire by some customers for database auditing, which we will also touch upon.

Sage CRM Synchronization

The first two-way data synchronization we did was the original Sage 300 to Sage CRM integration. That integration used sub-classed Views to capture changes to the data and then used the Sage CRM API to make the matching change in Sage CRM. Sage CRM did something similar and would write changes back to Sage 300 via one of its APIs.

The main problem with this integration technique is that it’s fairly brittle. You can configure the integration to either fail, warn or ignore when an error occurs in the other system. If you select fail, then both systems need to be running for anyone to use either; so if Sage CRM is offline, then so is Sage 300. If you select warn or ignore, then the record will be updated in one system and not the other. This puts the databases out of sync, and a manual full re-sync has to be performed.

For the most part this system works pretty well, but it isn’t ideal due to the trade-off of either requiring that both systems always be up, or having to run manual re-syncs every now and then. The integration is now built into the Sage 300 business logic, so sub-classed Views are no longer used.

The Sage Data Cloud

The intent of the Sage Data Cloud was to synchronize data with the cloud without requiring that the on-premises accounting system always be online. As a consequence, it couldn’t use the same approach as the original Sage CRM integration. In the meantime, Sage CRM added support for vector clock synchronization via SData. The problem with SData synchronization was that it was too expensive to retrofit into all the accounting packages that needed to work with the Sage Data Cloud.

The approach the Sage Data Cloud connector took was to keep a table that mirrored the accounting data in a separate database. This table held just each record’s key and a checksum. The connector could tell what had changed by scanning the accounting data and re-computing the checksums; if a checksum didn’t match, the record had been modified and needed syncing.
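As a rough sketch of the idea, the shadow table and change scan might look something like the following T-SQL. The table and column names here are purely illustrative (assuming IDCUST as the customer key), not the connector’s actual schema:

    -- Hypothetical shadow table: just the record key and a checksum.
    CREATE TABLE SyncShadow (
        IDCUST  CHAR(12) NOT NULL PRIMARY KEY,
        RowHash INT      NOT NULL
    );

    -- Find modified customers by re-computing checksums and comparing.
    SELECT C.IDCUST
    FROM ARCUS AS C
    JOIN SyncShadow AS S ON S.IDCUST = C.IDCUST
    WHERE BINARY_CHECKSUM(C.NAMECUST /* ...plus the other columns of interest */) <> S.RowHash;

Inserts and deletes fall out of the same scan: keys present in ARCUS but not in the shadow table are new records, and keys present in the shadow table but not in ARCUS have been deleted.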

This approach didn’t require manual re-syncs or require that both systems be online to work. However, it was expensive to keep scanning the database looking for changes, so either changes weren’t reflected terribly quickly, or the scans added unnecessary load to the database server.

What Does Synchronization Need?

The question then is: what does a modern synchronization algorithm like vector clock sync require to operate efficiently? It requires the ability to ask the system what has changed since the algorithm last ran, and this query has to be efficient and reliable.

You could do a CSQRY call and select records whose audit stamps are newer than our last clock tick (sync time). However, the audit stamp isn’t an index, so this query will be slow on larger tables. Further, it doesn’t easily give you inserted or deleted records.
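For illustration, such a query might look something like this sketch, assuming Sage 300’s usual AUDTDATE/AUDTTIME audit stamp columns (the date and time literals are made up):

    -- Customers whose audit stamp is newer than the last sync point.
    -- AUDTDATE is yyyymmdd and AUDTTIME is hhmmsshh; since neither is
    -- indexed, this forces a full table scan.
    SELECT IDCUST
    FROM ARCUS
    WHERE AUDTDATE > 20160418
       OR (AUDTDATE = 20160418 AND AUDTTIME > 22090000);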

Another suggested approach would be to implement database auditing on the Sage 300 application’s tables. Then you get a database audit feature and, if done right, you can query it for changed records and base a synchronization algorithm on that. However, this has never been done, since it’s a fairly large job and the ROI was never deemed worthwhile.

Another approach, specific to SQL Server, would be to query the database transaction logs, which record everything that happened in the database. This has a couple of problems: queries on the transaction logs aren’t oriented around “changes since the last sync”, so they are either slow or return too much information. Further, SQL Server manages these logs fairly aggressively, so if your synchronization app were offline for too long, SQL Server would recycle the logs and the data wouldn’t be available anymore. Plus, this would force everyone to manage logs, rather than just having them truncated on checkpoint.

SQL Server 2008 to the Rescue

Fortunately, SQL Server 2008 added some nice change tracking/auditing functionality that does what we need. And since Sage 300 only supports SQL Server, we can use this functionality. There is a very good article on MSDN about this feature and how it applies to synchronization. Basically, the SQL Server team recognized that both data synchronization and auditing were important and quite time consuming to add at the application level.

Using this functionality is quite simple: you turn on change tracking for the database, and then turn it on for each table whose changes you want to track.

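As a minimal sketch, using the SAMLTD sample company database and the A/R customer table (adjust the database name and retention settings for your own site), the two steps are:

    -- Step 1: turn on change tracking at the database level.
    ALTER DATABASE SAMLTD
    SET CHANGE_TRACKING = ON
    (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON);

    -- Step 2: turn on change tracking for each table to be synchronized.
    ALTER TABLE ARCUS
    ENABLE CHANGE_TRACKING;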

Then there is a SQL function, CHANGETABLE, that you can select from to get the changes. For instance, I updated a couple of records in ARCUS and then inserted a new one, and querying the change table returned exactly those three changes.

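A sketch of that query follows (again assuming IDCUST as the key). The 0 here just means “everything since tracking was enabled”; a real synchronization app would save the current version on each run and pass it in the next time:

    -- Everything that changed in ARCUS since the last sync version.
    DECLARE @last_sync_version BIGINT = 0;

    SELECT CT.IDCUST,                -- key of the changed customer record
           CT.SYS_CHANGE_OPERATION,  -- I = insert, U = update, D = delete
           CT.SYS_CHANGE_VERSION
    FROM CHANGETABLE(CHANGES ARCUS, @last_sync_version) AS CT;

    -- Save this value to use as @last_sync_version on the next run.
    SELECT CHANGE_TRACKING_CURRENT_VERSION();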

This gives me the minimal information, which is all I require for synchronization, since I really only need to know which records to synchronize and can then get the full information from the main database table.

If you want to use this to audit all the changes in your database, then there are more options you can set to give you more complete information on what happened.

Summary

If you are writing an application that needs to synchronize data with Sage 300 (either one way or two way), consider using these features of SQL Server since you can add them externally to Sage 300 without affecting the application.

Similarly, if you are writing a database logging/auditing application you might want to look at what Microsoft has been adding to SQL Server starting with version 2008.

 

Written by smist08

April 18, 2016 at 10:09 pm

Some Thoughts on Artificial Intelligence


Introduction

A few years ago I posted a blog post on the Singularity. This is the point where machine intelligence surpasses human intelligence and all predictions past that point are out the window. Just recently we’ve seen a number of notable advances in AI, as well as a number of instances where it has gone wrong. On the notable side we have Google DeepMind’s AlphaGo program beating the world champion at Go. This is remarkable since IBM’s Deep Blue program beat the world champion at Chess only relatively recently, and the prevailing wisdom was that Go would be much harder than Chess. On the downside we have Microsoft’s recent Tay chat bot, which quickly turned racist, rather tainting the vision Microsoft presented at their Build conference.

So this begs the question: are computers getting smarter? Or are they just becoming computationally more capable without any real intelligence? For instance, you can imagine that Chess or Go just require sufficient computational resources to overcome a poor old human. Are chat bots like Tay really learning? Or are they just blindly mimicking back what is fed to them? On the mimicking side they are getting lots of help from big data, which now provides huge storehouses of all our accumulated knowledge to draw on.

In this article I’ll look at a few comparisons between brains and computers, then at some of the stumbling blocks and where true intelligence might emerge.


Hardware

Let’s compare a few interesting statistics of humans to computers. Let’s start with initialization: the human genome contains about 3.2 gigabytes of information. This is all the information required to build a human body, including the heart, liver, skin and the brain. That means there is very little information in the genome that could be dedicated to providing, say, an operating system for the brain. An ISO image of Windows 10 is about 3.8 gigabytes, so clearly the brain doesn’t have something like Windows running at its core.

The human brain contains about 86 billion neurons, while an Intel i7 processor contains about 1.7 billion transistors; Intel predicts that its processors will have as many transistors as the brain has neurons by 2026. The neuron is the basic computational logic gate in the brain, and the transistor is the basic computational logic gate in a computer. There are differences: a neuron is quite a bit more complicated than a transistor, has many more interconnections, and works in a somewhat analog fashion rather than being purely digital. However, these differences probably only account for an order of magnitude in size (so perhaps the computer needs 860 billion transistors to be comparable). Ultimately, though, both are Turing machines, and hence can solve the same problems, as proved by Alan Turing.

Comparing memory is a bit more difficult, since the brain doesn’t separate memory from computation the way a computer does; the neurons hold memories as well as performing computations. Estimates of the brain’s memory capacity range from a few gigabytes to 2.5 petabytes, though I suspect it’s unlikely to be anywhere close to a petabyte. Regardless, it seems that computers can already exceed the memory of the brain (especially when networked together).

From a speed point of view, computers are much faster than the brain. A neuron can fire about 200 times per second, which is glacial compared to a 3GHz processor. However, the brain makes up for this through parallel processing. Modern computers are limited by the von Neumann architecture, in which the computer does one thing at a time, unlike the brain where all (or many) neurons are doing things at the same time. Computers stick with von Neumann architectures because they make programming easier; it’s hard enough to program a computer today, let alone one without the structure this architecture imposes. Generally, computer parallel processing is very simple, achieved either through multiple cores or through very specific algorithms.


Learning Versus Inherited Intelligence

From the previous comparisons, one striking data point is the size of the human genome. The genome is quite small and doesn’t have enough information to seed the brain with so-called inherited intelligence. Plus, if we did have inherited intelligence, it would be aligned to what humans needed to survive hundreds of thousands of years ago and wouldn’t, say, tell you how to work your mobile phone. It appears that the genome defines the structure of the brain and the formulae for neurons, but doesn’t pre-program them with knowledge, beyond perhaps some really basic things like eating when you feel hungry and being afraid of snakes. This means nearly all our intelligence is learned in our earliest years.

This means a brain is programmed quite differently from a computer. The brain has a number of sensory inputs, namely touch, sight, hearing, smell and taste, and with the help of adult humans it learns everything through these senses. A computer, by contrast, is mostly pre-programmed, and the amount of learning it is capable of is very limited.

It takes many years for a human to develop: learning language, basic education, physical co-ordination, visual recognition, geography and so on. Say we want a computer with the intelligence of a ten-year-old human; do we then need to train the computer for ten years to make it comparable? If so, this would be very hard on AI researchers, who would need ten years to test each AI to see whether it works.

Complexity Theory

It seems that both computers and brains are Turing machines. All Turing machines can solve the same problems, though this says nothing about how long they may take. A computer’s logic elements are far faster than neurons, but suffer from being organized in a von Neumann architecture and thus operate very serially, as opposed to the brain, which does everything in parallel. Both, though, are built from very simple logic elements with a small amount of initial programming. So where does self-aware intelligence arise from?

I believe the answer comes from complexity and chaos theory. When you study dynamic systems of increasing complexity, such as transitions to turbulence in fluid mechanics, or more and more complicated cellular automata or fractals, you find emergent stable solutions (sometimes called strange attractors) that couldn’t be predicted from the initial conditions. A brain with billions of neurons, all performing simple logic operations in parallel, is a very complex system. There are guaranteed to be emergent stable behaviours, which evolution has tuned into our intelligence.

What’s Needed

Our computers aren’t at a truly self-aware, intelligent state yet (at least none that I know of; who knows what might be happening in a secret research lab somewhere). So what is needed to get over the hump and create a true artificial intelligence? I believe we need two things: one on the software side and the other on the hardware side.

First, we need the software algorithm the brain uses to learn from its environment. This must be fairly simple, and it must apply to a wide range of inputs; there isn’t enough data in the human genome for anything else. I think we are getting closer to this with algorithms like the Hidden Markov Models currently used in machine learning. One key part of such an algorithm will be how it can be scaled up by running millions of copies in parallel.

Second, we need the hardware to run it. This is a bit controversial, since one school of thought holds that once we have the correct algorithm we can run it on standard hardware, because raw processing speed will overcome the lack of parallelism. Even hardware like GPUs, with hundreds of cores, isn’t anywhere near as parallel as the brain. Until we figure out this ideal learning algorithm, we won’t know the exact computer architecture to build. Some people are building very parallel computer hardware that more precisely models neurons, but others feel this is like building an aeroplane by exactly simulating birds flapping their wings.

Summary

We’ve solved a lot of difficult problems with artificial intelligence algorithms. We now have self-driving cars, robots that can walk over rugged terrain, computer world Chess and Go champions, and really good voice and picture recognition systems. As these come together, we just need a couple more breakthroughs to achieve true intelligence. Every now and then we predict this is just around the corner, and then we get stuck for a decade or so. Right now, though, we are making great progress, and hopefully we won’t hit another major roadblock; we are certainly seeing a lot of exciting advances.

Written by smist08

April 1, 2016 at 9:09 pm

Posted in Artificial Intelligence
