Archive for the ‘Performance’ Category
While investigating performance problems reported on some systems running Sage 300 ERP, I was led down the road to investigating Windows Bit-Rot. Bit-Rot generally refers to the degradation of a system over time. Windows has a very bad reputation for Bit-Rot, but what is it? And what can we do about it? Some people go so far as to reformat their hard disk and re-install the operating system every year as a rather severe answer to Bit-Rot.
Windows Bit-Rot is the tendency for a Windows system to get slower and slower over time: slower to boot, slower to log in, and slower to start programs, along with other symptoms like excessive and continuous hard disk activity when nothing is running.
This blog posting is going to look at a few things that I’ve run into as well as some other background from around the web.
I needed to investigate why printing Crystal reports was quite slow on some systems. This involved software we have written as well as a lot of software from third parties. On my laptop Crystal would print quite slowly the first time and then print quickly on subsequent runs. My computer is used for development and is full of development tools, so the things I found here might be more relevant to me than to real customers. So how to see what is going on? A really useful program for this is Process Monitor (procmon) from Microsoft (from their SysInternals acquisition). This program will show you every access of the registry, the file system and the network. You can filter the display; in particular you can filter to monitor only a single program to see what it’s doing.
ProcMon yielded some very interesting results.
My first surprise was to see that every entry in HKEY_CLASSES_ROOT was read. On my computer, which has had many pieces of software installed, including several versions of Visual Studio, several versions of Crystal Reports and several versions of Sage 300 ERP, the number of classes registered here was huge. OK, but did it take much time? Well, the first time a program that does this is run, it seems to take several seconds; after that it’s fast, probably because the registry ends up cached in memory. Several .Net programs I tried appear to do this. I’m not sure why; perhaps .Net just wants to know all the classes in the system.
But this does mean that as your system gets older and you install more and more programs (after all why bother un-installing when you have a multi-terabyte hard drive?), starting these programs will get slightly slower and slower. So to me this counts as Bit-Rot.
So what can we do about this? Un-installing unused programs should help, especially if they use a lot of COM classes. Visual Studio being the big one on my system, followed by Crystal and Sage 300. This helps a bit. But there are still a lot of classes there.
Generally I think uninstall programs leave lots of bits and pieces in the registry. So what to do? Fortunately this is a good stomping ground for utility programs. Microsoft used to have RegClean.exe; they discontinued support for this program, but you can still find it around the web. A newer and better utility is CCleaner from Piriform. Fortunately the free version includes a registry cleaner. I ran RegClean.exe first, which helped a bit, but then ran CCleaner and it found quite a bit more to clean up.
Of course there is danger in cleaning your registry, so it’s a use at your own risk type thing (backing up the registry first is a good bet).
At the end of the day all this reduced the first-time startup time of a number of programs by about 10 seconds.
My second surprise was the number of calls to check Windows Group Policy settings. Group Policy is a rather ad-hoc mechanism added to Windows to allow administrators to control networked computers on their domain. Each group policy is stored in a registry key, and when Windows goes to do an operation controlled by group policy, it reads that registry key to see what it should do. I was surprised at the amount of registry activity that goes on reading and checking group policy settings. Besides annoying users by restricting what they can do on their computer, it appears group policy causes a general high overhead of excessive registry reading in almost every aspect of Windows operation. There is nothing you can do about this, but it appears as Windows goes from version to version, that more and more gets added to this and the overhead gets higher and higher.
You may not think that you install that many programs on your computer, so you shouldn’t have these sorts of problems, but remember that many programs, including Windows/Microsoft Update, Adobe Updater and such, are regularly installing new programs on your computer. Chances are these programs are leaving behind unused bits of older versions that are cluttering up your file system and your registry.
Related to auto-updates, it appears that many programs now run as icons in the task bar, install Windows services or install programs to run when you log in. All of these increase the time it takes to boot Windows and to sign in. Further, many of these programs, like Dropbox, frequently poll their server to see if there are any updates. Microsoft has a good tool, Autoruns for Windows, which helps you see all the things that are automatically run and helps you remove them. Again this can be a bit dangerous as some of them are necessary (perhaps like a trackpad utility).
Similarly it seems that everyone and their mother wants to install browser toolbars. Each one of these will slow down the startup of your browser and use up memory and possibly keep polling a server. Removing/disabling these isn’t hard, but it is a nuisance to have to keep doing this.
Hard Disk Fragmentation
Another common problem is hard drive fragmentation. As your system operates, the hard disk becomes more and more fragmented. Windows has a de-frag program, but often either it is scheduled to run at a time when your computer is turned off, or you never bother to run it by hand. It is worth de-fragging your hard drive from time to time to speed up access. There are third party de-frag programs, but generally I just use the one that comes built into Windows.
Related to the above problems, often un-installation programs leave odds and ends files around and sometimes it’s worth going into explorer (or a cmd prompt) and deleting folders for un-installed programs. Generally it reduces clutter and speeds up operations like reading all the folders under program files.
Dying Hard Drives
Another common cause of slowness is that as hard drives age, rather than just failing outright, they often start having to retry reading sectors more. Windows can mark sectors bad and move things around. Hard drives seem to be able to limp along for a while this way before completely failing. I tend to think that if you hear your hard drive resetting itself fairly often then you should replace it. Or if you see the number of bad sectors growing when you defrag, then replace it.
After going through this, I wonder if the people that just reformat their hard drive each year have the right idea? Does the time spent un-installing, registry cleaning, de-fragging just add up to too much? Are you better off just starting clean each year and not worrying about all these maintenance tasks? Especially now that it seems like we replace our computers far less frequently, is Bit-Rot becoming a much worse problem?
Sage ERP Accpac Day End Processing is a rather simple form in I/C Periodic Processing. It only has Process and Cancel buttons and you are expected to run it after the close of business every day. But what does it do? The online help states:
Use this dialog box to:
- Update costing data for all transactions (unless you chose the option to update costing during posting).
- Produce general ledger journal entries from the transactions that were posted during the day (unless you do item costing during posting or create G/L transactions using the Create G/L Batch icon).
- Produce a posting journal for each type of transaction that was posted.
- Update Inventory Control statistics and transaction history.
Day End Processing also performs processing tasks for the Order Entry and Purchase Orders modules, if you have them:
- Processing transactions that were posted during the day in Order Entry and Purchase Orders.
- Activating and posting future sales orders and purchase orders that have reached their order date, and updating quantities on sales order and on purchase order.
- Removing quotes and purchase requisitions with expiration dates up to and including the session date for day-end processing.
- Updating sales commissions.
- Creating batches of Accounts Receivable summary invoices and credit notes from posted Order Entry transactions.
- Deleting completed transaction details if you do not keep transaction history.
- Updating statistics and history in Order Entry and Purchase Orders.
Below is the flow in and out of Inventory. All of these transactions affect costing and generate sub-ledger transactions.
The main purpose of Day End is to move a lot of processing away from data entry. Generally in a large Accpac installation you will have hundreds of people entering Orders, Invoices, POs, etc. They need to get their work done in the most efficient way possible. A person entering Orders from the CRM system doesn’t want to have to wait for A/R and G/L transactions to be created every time. What they care about is entering their Orders as quickly as possible. As a side benefit, Day End can batch all the transactions together, so rather than each Order creating a single G/L Batch, these can all be combined reducing the number of documents downstream.
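The batching side benefit can be illustrated with a small sketch. This is not how Day End is actually implemented; the function name, account codes and amounts below are made up for illustration. It only shows the idea of collapsing many per-order G/L lines into one line per account.

```python
from collections import defaultdict
from decimal import Decimal


def combine_gl_entries(entries):
    """Merge many (account, amount) lines into a single line per account.

    Hypothetical helper: the real Day End logic is far more involved.
    """
    totals = defaultdict(Decimal)  # a missing account starts at Decimal("0")
    for account, amount in entries:
        totals[account] += Decimal(amount)
    return dict(sorted(totals.items()))


# Two orders, each of which would otherwise produce its own G/L lines.
# Account codes are invented for the example.
order_lines = [
    ("1200-AR", "100.00"),
    ("4000-Sales", "-100.00"),
    ("1200-AR", "250.50"),
    ("4000-Sales", "-250.50"),
]

batch = combine_gl_entries(order_lines)
for account, amount in batch.items():
    print(account, amount)
```

Four lines go in and two come out, and the combined batch still balances to zero.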
However there are a number of misunderstandings and confusion about Day End. This blog posting is looking to cover a few topics around Day End to hopefully make things a bit clearer.
Over the years we have also changed the way Day End operates and added additional options to let you choose when things happen. Prior to version 5.2A, all the functions mentioned above had to be done during Day End and there were no alternatives. People became imaginative and ran macros to run Day End on a frequent basis. Why were people doing this? The main reason was that for many businesses, updating the costing in inventory only once a day is not sufficient, if costs are changing quickly you want this reflected right away. Another reason is that if you operate your business 24×7 then you don’t have an after-hours time period when you can run this. Plus for some people Day End was taking longer than the overnight period to complete.
In version 5.2A we introduced the feature of “Day End at Posting Time”. With this mode, essentially whenever you posted a document in I/C, O/E or P/O, we would run Day End. Then you never had to run the original stand-alone Day End screen and you could operate 24×7 without running a separate Day End process and your I/C Costing was always up-to-date. This worked fine for some people (usually people with lower volume), but it caused problems for others. First, it slowed down posting time of documents too much and impeded the productivity of people posting Orders and such. Second, when you have longer transactions, you now run a larger risk of multi-user conflicts (which are really quite annoying). Third, this resulted in a large number of G/L, A/R and A/P batches being produced. The usual workaround for people that really need this was to turn off other features that slow down posting such as “Keep Statistics” or “Keep History”. You can speed up posting quite a bit by turning off various options in the various Accounting Module’s Options screens. However, you then lose the use of these features, and deciding which ones to give up can be a difficult trade-off.
In version 5.5A we introduced the feature of “Costing during Posting”. Here when you post an I/C, O/E or P/O document we would run the Costing part of Day End, but not all the other parts. This turned out to be a good compromise. It didn’t noticeably slow down document posting and hence didn’t introduce more multi-user conflicts. So people could now keep their Costing data up-to-date without frequently running Day End. However you still need to run the Day End processing function at night to create all the sub-ledger documents, create audit history and other miscellaneous functions.
Now let’s go through each of the Day End functions in a bit more detail.
Costing is based on the Item Costing Method; this is the step where the costing buckets are updated for the affected items in I/C. What happens depends on whether we are buying or selling:
- Incoming Transactions: Increases Total Quantity/Cost in Location Details/Costing Buckets
- Outgoing Transactions: Calculates/Removes Quantity/Cost from Location Details/Costing Buckets
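As a rough sketch of the incoming/outgoing rule above, here is a tiny costing bucket. It assumes a moving average costing method purely for illustration (the actual behavior depends on the Item Costing Method chosen), and the class and method names are hypothetical, not Accpac's.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class CostingBucket:
    """Hypothetical bucket for one item at one location."""
    quantity: Decimal = Decimal("0")
    total_cost: Decimal = Decimal("0")

    def receive(self, quantity, unit_cost):
        # Incoming transaction: increase total quantity and total cost.
        self.quantity += quantity
        self.total_cost += quantity * unit_cost

    def ship(self, quantity):
        # Outgoing transaction: calculate the cost at the current average,
        # then remove that quantity and cost from the bucket.
        if quantity <= 0 or quantity > self.quantity:
            raise ValueError("invalid quantity for this bucket")
        cost_removed = self.total_cost * quantity / self.quantity
        self.quantity -= quantity
        self.total_cost -= cost_removed
        return cost_removed


bucket = CostingBucket()
bucket.receive(Decimal("10"), Decimal("2.00"))
bucket.receive(Decimal("10"), Decimal("4.00"))
shipped_cost = bucket.ship(Decimal("5"))
print(shipped_cost, bucket.quantity, bucket.total_cost)
```

Other costing methods such as FIFO or LIFO would keep a list of buckets rather than a single running total, but the incoming and outgoing directions stay the same.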
Create Audit Information
Day End is responsible for populating all the various I/C, O/E and P/O audit history tables including:
- Posting Journals (all transactions)
- Item Valuation (all transactions)
- IC Transaction History (all transactions)
- IC Statistics (all transactions)
- IC Serial Numbers Audit (IC and OE Shipments)
- IC Sales Statistics (IC and OE Shipments)
- OE Sales History (OE transactions)
- OE Sales Statistics (OE transactions)
- OE Commissions Audit (OE Invoices/Credit Notes/Debit Notes)
- PO Purchase History (PO transactions)
- PO Payables Clearing Audit (PO transactions)
Generate GL/AR/AP/PM/FA Entries
Create all the various batches in the sub-ledgers. These include:
- GL Entries (all transactions)
- AR Entries (OE Invoices, Credit Notes, and Debit Notes)
- AP Entries (PO Invoices, Credit Notes, and Debit Notes)
- PM Entries (IC Shipments, OE Shipments, PO Purchase Orders / Receipts / Invoices / Returns / Credit Notes / Debit Notes)
- FA Entries (IC Internal Usages, PO Receipts)
Then there are a collection of miscellaneous functions that include:
- Activates future orders (OE)
- Deletes expired quotes that have not been activated (OE)
- Deletes completed orders if “Keep History” is OFF (OE)
- Activates future purchase orders (PO)
- Clears completed transactions if “Keep History” is OFF (PO)
Day End Processing Structure
Day End Processing (DEP) is structured as follows:
Note that DEP doesn’t process all transactions in chronological order.
Your best bet is to use “Costing during Posting”. This will give you real-time costing without badly affecting performance. As ERP packages address larger organizations there tend to be more and more of these types of operations. The more people doing data entry, the less you want them burdening the application and database servers, so as to maximize productivity. Many tier one ERP packages split this processing into more parts; the advantage is that several parts (that aren’t adjacent) can be run at once without causing multi-user conflicts. Sage ERP Accpac has always been under pressure to combine things down into all-in-one operations that work well for smaller businesses; however, if we are to move the operations suite into larger Enterprises then we will have to slice these processes up finer.
All modern products rely heavily on automated testing to maintain quality during development. Accpac follows an Agile development methodology where development happens in short three week sprints where at the end of every sprint you want the product at a shippable level of quality. This doesn’t mean you would ship, that would depend on Product Management which determines the level of new features required, but it means that quality and/or bugs wouldn’t be a factor. When performing these short development sprints, the development team wants to know about problems as quickly as possible so any problems can be resolved quickly and a backlog of bugs doesn’t accumulate.
The goal is that as soon as a developer checks in changes, these are immediately built into the full product on a build server and a sequence of automated tests is run on that build to catch any introduced problems. There are other longer running automated tests that are run less frequently to also catch problems.
To perform the continuous builds and to run much of our automated tests we use “Hudson” (http://wiki.hudson-ci.org/display/HUDSON/Meet+Hudson). Hudson is an extremely powerful and configurable system to continuously build the complete product. It knows all the project dependencies and knows what to build when things change. Hudson builds on many other tools like Ant (similar to make or nmake) and Ivy among others to get things done (http://java-source.net/open-source/build-systems). The key thing is that it builds the complete Accpac system, not just a single module. Hudson also has the ability to track metrics so we get a graph of how the tests are running, performance metrics, lines of code metrics, etc. And this is updated on every build. Invaluable information for us to review.
The first level of automated testing is the “unit tests” (http://en.wikipedia.org/wiki/Unit_testing). These are tests that are included with the source code. They are short tests where each test should run in under 1 second. They are also self-contained and don’t rely on the rest of the system being present. If other components are required, they are “mocked” with tools like EasyMock (http://easymock.org/). Mocking is a process of simulating the presence of other system components. One nice thing about “mocking” is that it makes it easy to introduce error conditions, since the mocking component can easily just return error codes. The unit tests are run as part of building each and every module. They provide a good level of confidence that a programmer hasn’t completely broken a module with the changes they are checking in to source control.
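To make the point about mocking and error conditions concrete, here is a minimal sketch. It uses Python's standard unittest.mock rather than EasyMock (which is a Java library), and the post_order function and ledger object are invented for the example; they are not part of Accpac.

```python
from unittest import mock


def post_order(order, ledger):
    """Hypothetical code under test: post an order to a ledger component.

    Returns a status string instead of crashing if the ledger fails.
    """
    try:
        ledger.post(order)
    except IOError:
        return "retry-later"
    return "posted"


def run_checks():
    # The real ledger is replaced by a mock, so no database is needed.
    ok_ledger = mock.Mock()
    assert post_order({"id": 1}, ok_ledger) == "posted"
    ok_ledger.post.assert_called_once_with({"id": 1})

    # Simulating an error condition is one line: make the mock raise.
    failing_ledger = mock.Mock()
    failing_ledger.post.side_effect = IOError("database unavailable")
    assert post_order({"id": 2}, failing_ledger) == "retry-later"
    return True


print(run_checks())
```

Both checks run in well under a second and need nothing else installed, which is the property that lets tests like this run on every build.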
The next level of automated tests are longer running. When the Quality Assurance (QA) department first tests a new module, they write a series of test plans; the automation team takes these and transfers as many of them as possible to automated test scripts. We use Selenium (http://en.wikipedia.org/wiki/Selenium_(software)), which is a powerful scripting engine that drives an Internet Browser, simulating actual users. These tests are run overnight to ensure everything is fine. We have a subset of these called the BVT (Build Validation Test) that runs against every build as a further smoke test to ensure things are ok.
All the tests so far are functional. They test whether the program is functioning properly. But further testing is required to ensure performance, scalability, reliability and multi-user operation are fine. We record the time taken for all the previous tests, so sometimes they can detect performance problems, but they aren’t the main line of defense. We use the tool JMeter (http://jakarta.apache.org/jmeter/) to test multi-user scalability. JMeter is a powerful tool that simulates any number of clients accessing a server. It tests the server by generating SData HTTP requests from a number of workstations and bombarding the server with them. For straight performance we use VBA macros that access the Accpac Business Logic Layer to write large numbers of transactions or very large transactions, which are all timed to make sure performance is fine.
We still rely heavily on manual QA testers to devise new tests and to find unique ways to break the software, but once they develop the tests we look to automate them so they can be performed over and over without boring a QA tester to death. Automated testing is a vibrant area of computer science where new techniques are always being devised. For instance we are looking at incorporating “fuzz testing” (http://en.wikipedia.org/wiki/Fuzz_testing) into our automated test suites. This technique is often associated with security testing, but it is proving quite useful for generalized testing as well. Basically fuzz testing takes the more standard automated test suites and adds variation, either by knowing how to vary input to cause problems, or by just running for a long time trying many possible combinations.
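A very small fuzzing loop might look like the sketch below. Everything here is hypothetical: parse_quantity stands in for whatever code is under test, and the rule being checked is simply "either return a non-negative integer or reject the input with ValueError, never fail any other way". Real fuzzers are considerably smarter about choosing inputs.

```python
import random
import string


def parse_quantity(text):
    """Hypothetical function under test: parse a non-negative integer."""
    value = int(text.strip())
    if value < 0:
        raise ValueError("quantity must not be negative")
    return value


def fuzz(target, runs=1000, seed=0):
    """Feed random short strings to `target` and collect unexpected outcomes."""
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    alphabet = string.digits + string.ascii_letters + " -+.\t"
    failures = []
    for _ in range(runs):
        length = rng.randint(0, 8)
        text = "".join(rng.choice(alphabet) for _ in range(length))
        try:
            result = target(text)
        except ValueError:
            continue  # the documented way of rejecting bad input
        except Exception as exc:  # any other exception is a finding
            failures.append((text, exc))
            continue
        if not isinstance(result, int) or result < 0:
            failures.append((text, result))
    return failures


print(len(fuzz(parse_quantity)))
```

Seeding the random generator matters in practice: when a run does turn up a failure, the same seed replays the exact input that caused it.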
As we incorporate all these testing strategies into our SDLC (Software Development Life Cycle) we hope to make Accpac more reliable and to make each release more trouble free than the previous.
Web based applications can be quite complex and good tools are required to simulate heavy user loads, automate functional testing and diagnose/troubleshoot problems once they are discovered. We have to ensure that Accpac will keep running reliably under heavy user loads as users are performing quite a diverse set of activities. With Accpac 6 there are quite a few new components like the dashboard data servlets that need to have good performance and not adversely affect people performing other tasks. We want to ensure that sales people entering sales orders inside CRM are as productive as possible.
We are fundamentally using SData for all our Web Service communications. This involves a lot of building and parsing XML. How efficient is this? Fortunately all web based development systems are highly optimized for performing these functions. So far SData has been a general performance boost and the overhead of the XML has been surprisingly light.
Towards ensuring this goal we employ a number of open source or free testing tools. Some of these are used to repeatedly run automated tests to ensure our performance goals are being met. Some are used to diagnose problems when they are discovered. Here is a quick list of some of the tools we are using and to what end.
Selenium (http://seleniumhq.org/) is a User Interface scripting/testing engine for performing automated testing of Web Based applications. It drives the Browser as an end user would to allow automated testing. We run a battery of tests using Selenium and as well as looking for functionality breakages, we record the times all the tests take to run to watch for performance changes.
JMeter (http://jakarta.apache.org/jmeter/) is a load testing tool for Web Based applications. It basically simulates a large number of Browsers sending HTTP requests to a server. Since SData just uses HTTP, we can use JMeter to test our SData services. This is a very effective test tool for ensuring we run under heavy multiuser loads. You just type in the number of users you want JMeter to simulate and let it go.
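The core idea of simulating a number of users at once can be sketched in a few lines. This is not how JMeter works internally; it is only an illustration using threads. The fake_request function is a stand-in that sleeps briefly, and a real test would issue an HTTP request to the SData service at that point.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(user_id):
    """Stand-in for one HTTP request; returns a pretend status code."""
    time.sleep(0.01)
    return 200


def run_load(users, requests_per_user, request=fake_request):
    """Run `users` simulated users concurrently and time every request."""

    def one_user(user_id):
        timings = []
        for _ in range(requests_per_user):
            start = time.perf_counter()
            status = request(user_id)
            timings.append((status, time.perf_counter() - start))
        return timings

    # One thread per simulated user, all started together.
    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = list(pool.map(one_user, range(users)))
    return [t for timings in per_user for t in timings]


results = run_load(users=5, requests_per_user=3)
slowest = max(elapsed for _, elapsed in results)
print(len(results), "requests, slowest took", round(slowest, 3), "seconds")
```

Raising the users argument is the equivalent of typing a bigger number into the load tool; the interesting output is how the slowest and average timings change as it grows.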
Fiddler2 (http://www.fiddler2.com/fiddler2/) is a Web Debugging Proxy that records all the HTTP traffic between your computer and the Internet. We can use this to record the number of network calls we make and the time each one takes. One cool thing Fiddler does is record your network performance and then calculate the time the same traffic would have taken for users in other locations or on other bandwidths. So we get the time for our network, but also estimates of the time someone using DSL in California or a modem in China would have taken to load our web page.
Firebug (http://getfirebug.com/) is a great general purpose profiling, measuring and debugging tool for the Firefox browser. Although our main target for release is Internet Explorer, Firebug is such a useful tool that we find we are often drawn back to doing our testing in Firefox for the great tools like Firebug that are available as add-ins.
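A back-of-the-envelope version of this kind of estimate is easy to write down. The formula below is an assumption for illustration only and is not Fiddler's actual model: it charges one latency delay per round trip plus the raw transfer time for the bytes. The link figures in the example are also invented.

```python
def estimate_load_seconds(total_bytes, round_trips, latency_s, bandwidth_bps):
    """Rough page load estimate: latency per round trip plus transfer time.

    Simplified assumption: ignores parallel connections, caching,
    compression and TCP slow start.
    """
    transfer_time = (total_bytes * 8) / bandwidth_bps
    return round_trips * latency_s + transfer_time


# Hypothetical page: 500 KB over 20 round trips, on two invented links.
page_bytes = 500_000
trips = 20
links = {
    "DSL (1.5 Mbit/s, 50 ms)": (0.05, 1_500_000),
    "modem (56 kbit/s, 150 ms)": (0.15, 56_000),
}
for name, (latency, bandwidth) in links.items():
    seconds = estimate_load_seconds(page_bytes, trips, latency, bandwidth)
    print(f"{name}: about {seconds:.1f} seconds")
```

Even this crude model shows why both the number of network calls and the payload size matter: on a slow link the transfer term dominates, while on a fast but distant link the round trips do.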
Using these tools we’ve been able to successfully stress the Accpac system and to find and fix many bugs that have caused the system to crash, lockup or drastically slow down. Our automation group runs performance tests on our nightly builds and posts the results to an internal web site for all our developers to track. As we develop out all the accounting applications in the new SWT (Sage Web Toolkit), we will aggressively be developing new automated tests to ensure performance is acceptable and look to keep expanding the performance of Accpac.
I remember when SQL Server 7 was nearly ready for release, Microsoft Research had a project to make a 1 terabyte database. Their project was to serve up satellite images of anywhere on Earth. Basically they had 1 terabyte of images, so that was their terabyte database. I didn’t think this was a realistic terabyte database since there weren’t that many records; each one was just an image and quite large. Anyway they put the thing up on the web as a beta, and as soon as a few people tried it, the whole thing collapsed under the load.
A year or two later Google launched Google Earth, which basically did the same thing, only more detailed. But Google Earth can easily handle the load of all the people around the world accessing it. Why the difference? Why could Google do this and Microsoft couldn’t? I think the main difference is that Microsoft hosted it on one single SQL Server and had no way to scale it besides beefing up the hardware at great expense. Whereas Google uses a massively distributed database running on many many servers all coordinated and all sharing and balancing the load. Google uses many low cost Linux based servers keeping costs down and performance high.
This week Google released Google StreetView for major Canadian cities including Vancouver, where I live. So I can virtually cruise around Vancouver streets with very good resolution including seeing my house and neighborhood. This is really amazing technology. Rather than just panning around a patchwork of satellite images, we are actually navigating in 3D around the world. Suddenly Google has produced a virtual model of the entire world at quite good photographic quality.
Think about the size of this distributed database with all these photos, plus all the data to allow them to be stitched together into 3D Views that you can navigate through. This is so far beyond Google Earth, it’s really amazing. Is this the first step to having a completely virtual alternate Earth? If you are wearing 3D goggles, will you be able to tell if they are transparent or viewing these images?
I think we are just seeing the first applications of what is possible with these giant distributed databases. I’m really looking forward to seeing some really amazing and mind blowing applications in the future. The neat thing is that Google is starting to open source this database technology so others can use it. Are SQL databases just dinosaurs waiting to be replaced? What will we be able to accomplish in our business/enterprise databases and data warehouses once we start applying and using this technology?
One of our engineers, Ben Lu, was doing some analysis of why SQL Server was taking a long time to find some dashboard type totals for statistical display. There was an index on the data, and there weren’t that many records that met the search criteria that needed adding up. It turned out that even though it was following an index to add these up, the records were fairly evenly distributed through the whole table. This meant that to read the 1% or so of records that were required for the sum, SQL Server was actually having to read pretty much every data page in the table, usually to get just one record. This problem is a bit unique to getting total statistics or sums; if the data was fed to a report writer, the index would return enough records quickly to start the report going and then SQL Server can feed the records faster than a printer or report preview window can handle. However in this case nothing displays till all the records are processed.
It turns out that SQL Server physically stores records on disk in the order of the index that is specified as the clustered index (and only one is allowed per table). In Accpac we designate the primary index as the clustered index as we found long ago that this speeds up most processing (since most processing is based off the primary index). However in this case the primary index was customer number, document number; and for the query we were interested in processing all unpaid documents, which are spread fairly evenly over the customers (at least in the customer database we were testing with).
We found that we could speed up this calculation quite a bit by adding an additional segment to the beginning of the primary key. For instance without any changes, the query was taking 24 seconds on a large customer database. Putting the document date as the first segment reduced the query to 17 seconds (since hopefully unpaid invoices are near the end). However it turned out there were quite a few old unpaid invoices. Adding an “archive” segment to the beginning, where we mark older paid invoices as archived, reduced the query time to 4 seconds. We could also add the unpaid flag as the first segment to get the same speed up. But we are thinking that perhaps the concept of virtual archiving would have applications beyond this one query.
Anyway now that we understand what is going on, we can decide how best to address the problem, whether by adding the paid flag to the record, adding the date or adding an archive flag. It’s interesting to see how SQL Server’s performance characteristics actually affect the schema design of your database if you want to get the best performance, and that often these choices go beyond just adding another index.
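The effect of physical ordering can be imitated with a toy model. The sketch below is not SQL Server and makes loud simplifying assumptions: 100 rows per page, 100,000 documents, exactly one unpaid document per customer, and a "page read" counted whenever a page holds at least one wanted row. It compares storing rows in customer/document order against putting an unpaid flag first.

```python
def pages_read(records, sort_key, wanted, rows_per_page=100):
    """Count distinct pages touched to read every record matching `wanted`.

    Toy model: rows are laid out in `sort_key` order, rows_per_page per page.
    """
    ordered = sorted(records, key=sort_key)
    touched = {
        position // rows_per_page
        for position, record in enumerate(ordered)
        if wanted(record)
    }
    return len(touched)


# Invented data: 1,000 customers with 100 documents each, 1% of them unpaid,
# spread evenly so that every customer has exactly one unpaid document.
records = [
    {"customer": n // 100, "doc": n, "unpaid": n % 100 == 0}
    for n in range(100_000)
]


def is_unpaid(record):
    return record["unpaid"]


def by_customer(record):
    return (record["customer"], record["doc"])


def unpaid_first(record):
    # False sorts before True, so unpaid rows come first.
    return (not record["unpaid"], record["customer"], record["doc"])


print("clustered on customer, doc:", pages_read(records, by_customer, is_unpaid))
print("clustered on unpaid flag first:", pages_read(records, unpaid_first, is_unpaid))
```

In this model the first layout touches every page in the table to collect 1% of the rows, while the second packs the same rows into a handful of adjacent pages. Real timings depend on caching, row sizes and the rest of the query plan, so the ratio here should not be read as a prediction of the 24 seconds versus 4 seconds measured above.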