Stephen Smith's Blog

Musings on Machine Learning…

Posts Tagged ‘Clustered Indexes

SQL Server, Performance and Clustered Indexes

with 2 comments

One of our engineers Ben Lu, was doing some analysis of why SQL Server was taking along time to find some dashboard type totals for statistical display. There was an index on the data, and their weren’t that many records that met the search criteria that needed adding up. It turned out that even though it was following an index to add these up, that the records were fairly evenly distributed through the whole table. This meant that to read the 1% or so of records that were required for the sum, SQL Server was actually having to read pretty much every data table in the table usually to get one record. This problem is a bit unique to getting total statistics or sums; if the data was fed to a report writer, the index would return enough records quickly to start the report going and then SQL Server can feed the records faster than a printer or report preview windows can handle. However in this case nothing displays till all the records are processed.

It turns out that SQL Server physically will store records in the order on disk or the index that is specified as a clustered index (and only 1 is required). In Accpac we designate the primary index as a clustered index as we found long ago that this speeds up most processing (since most processing is based off the primary index). However in this case the primary index was customer number, document number; and for the query were interested in processing all unpaid documents which are spread fairly evenly over the customers (at least in the customer database we were testing with).

We found that we could speed up this calculation quite a bit by added an additional segment to the beginning of the primary key. For instance without any changes, the query was taking 24 seconds on a large customer database. Putting the document date as the first segment reduced the query to 17 seconds (since hopefully unpaid invoices are near the end). However it turned out there were quite a few old unpaind invoices. Adding an “archive” segment to the beginning where we mark older paid invoices as archived reduced the query time to 4 seconds. We could also add the unpaid flag as the first index to get the same speed up. But were are thinking that perhaps the concept of a virtual archiving would have other applications other than this one query.

Anyway now that we understand what is going on, we can decide how to best address the problem. Whether adding the paid flag to the record, adding the date or adding an archive flag. Its interesting to see how SQL Server’s performance characteristics actually affect the schema design of your database, if you want to get best perforamance. And that often these choices go beyond just adding another index.

Written by smist08

June 7, 2009 at 9:05 pm