Friday, April 13, 2007

Storage – SQL Server I/O

SQL Server and the Storage Subsystem

So how does all this fit in with SQL Server? Well, you need to make sure your servers have the fastest drives possible, and have the proper RAID configuration.

At a basic level, data is stored in files. There are two types of files used in SQL Server. The first is the database file type, which normally has an extension of .mdf. There may be one or more of these for your database, as we’ll discuss in a moment. The second type of file is for the logs, normally with an extension of .ldf. Data is written first to the log file; then SQL Server writes the data to the database file. This is called a write-ahead log.

The log is written to in a sequential fashion, since all writes, deletes, or edits normally go through the log. All reads normally go through the database. For that reason, it’s best to separate the physical drives used for the logs and the database. Your server will then be able to write log entries and read data at the same time.

You can have multiples of these two types of files. That is, you can have more than one data file, and more than one log file. You first create the extra files, and then you create (or move) objects such as tables or indexes to those files. The speed increase becomes really clear when you use separate drives. Let’s look at a concrete example.

Suppose you have a database with a heavily used table, which also contains a fairly wide (many columns) index. That makes the table and index compete for the same access on the hard drive, slowing down the total access time. In this case, it makes sense to separate out the physical files that the table and the index use, and to place them on separate drives.

Optimization goes a bit further than separate files. SQL Server also supports Filegroups, which is a logical grouping of files into a single name. Filegroups are normally used for backups and allocations.

So how does SQL Server allocate and store that data? SQL Server’s storage is arranged in pages and extents. A page is 8KB of data (128 pages per megabyte), and an extent is 8 pages (16 extents per megabyte). A row can’t be larger than 8060 bytes. If you do the math, you’ll notice that 8060 isn’t 8KB. That’s because there is some overhead involved in storing the header.

The database can be set to grow automatically, by either a percentage of the size of the database or a set number of megabytes. I normally choose the percentage, since as the database grows larger it has to reallocate space less often. Your mileage will vary.

Databases can also shrink automatically. I don’t use this setting, because I’ve noticed a DBCC command running throughout the day to accomplish the task. I normally just let my maintenance plans take care of shrinking the files during maintenance. Also, shrinking the file almost guarantees that the indexes will become fragmented, which is a bad thing.

As another consideration, it’s best to have as many spindles in a database drive setup as possible. Microsoft Windows and SQL Server both allocate threads from the operating system based on the number of physical drives you have, so the more the better. Also, separating your files (based on the read/write patterns they have) allows the drives to be spinning at the same time for a write as a read. That way the drive doesn’t have to wait until a write completes to do a read, or visa-versa.

So to recap, the way you want to arrange your storage is highly dependent on the type of attachment and drives that you have. At a minimum, I recommend that you split the operating system and the program files for SQL Server from the page file onto separate physical drives (not just drive letters), place the database files on another physical drive, and use yet another physical drive for the log files. Even better is to use another physical drive for the indexes that you set up, so that the indexes can update their values as the table data is written.

No comments: