Fundamentals of Storage Systems – Capturing IO Patterns

We often take the advice given to us on forums or in articles at face value. Even though the authors almost always say things like “your mileage may vary” or “may not apply to your situation” people still assume it is the gospel. Sometimes it is lack of experience. Other times it is just lack of knowledge on how to verify these things on your own. In this article I’m going to give you a tool to look at what SQL Server is doing at the disk level and allow you to make better decisions on how to configure your underlying disks.

The Basics

There are several things you need to know about how SQL Server accesses the database files and the implications of that before you can construct a proper testing methodology.

http://technet.microsoft.com/en-us/library/cc966500.aspx covers the basics. There are a few things I will highlight here.

ACID and WAL

ACID (Atomicity, Consistency, Isolation, and Durability) is what makes our database reliable. The ability to recover from a catastrophic failure is key to protecting your data.

WAL (Write-Ahead Logging) is how ACID is achieved. Basically, the log record must be flushed to disk before the data file is modified.

Stable Media

Stable media isn’t just the disk drive. A controller with a battery backed cache is also considered stable. Since SQL Server can request blocks as big as 64KB make sure your controller can handle that block size request in cache. Some older controllers only do a 16KB block or smaller.

FUA (Forced Unit Access)

With the requirement of stable media SQL Server creates and opens all files with a specific set of flags. FILE_FLAG_WRITETHROUGH tells the underlying OS not to use write caching that isn’t considered stable media. So, the local disk cache is normally bypassed. Not all hard drives honor the flag though, Some SATA/IDE drives ignore it. Usually, the drive manufacturer provides a tool to turn off write caching. If you are using desktop drives in a mission critical situation be aware of the potential for data loss. FILE_FLAG_NO_BUFFERING tells the OS not to buffer the file ether. At this point the only cache available will be the battery backed or other durable cached on the controller.

File Access

SQL Server uses asynchronous access for data and log files. This allows IO request to queue up and use the IO system as efficiently as possible. The main difference between the two are SQL Server will try and gather writes to the data file into bigger blocks but the log is always written to sequentially.

All of these rules apply to everything but tempdb. Since tempdb is recreated at restart every time recoverability isn’t an issue.

SQL Server data access patterns

Searching around you will find these generalities about SQL Server’s IO patterns

Log Writes

Sequential 512 bytes to 64KB

Data File Read/Writes

8KB

Read ahead – more important to Enterprise Edition

8KB to 125KB

Bulk Insert

8KB to 128KB

Create Database

512 byte – full initialize on log file only.

Backup Sequential Read/Write

1 MB

Restore Sequential Read/Write

64K

DBCC – CHECKDB

Sequential Read 8K – 64K

DBCC – DBREINDEX

(Read Phase) Sequential Read (see Read Ahead)

DBCC – DBREINDEX

(Write Phase) Sequential Write

Any multiple of 8K up to 128K

DBCC – SHOWCONTIG

Sequential Read 8K – 64K

Now that we have an idea of what SQL Server is suppose to be doing its time to verify our assumptions.

Capturing IO activity

There are a few tools that will allow you to capture the file activity at the system level. Process Monitor is a free tool from Microsoft that I will use to collect some base line information. In it’s standard configuration Process Monitor captures a ton of stuff and uses the page file to spool the info to. So, before we begin we need to change the default configuration.

ProcessMon1

Capturing IO data using process monitor.

Filter to apply

process is sqlservr.exe
Operation is Read
Operation is Write

ProcessMon2

Columns to choose.

ProcsessMon5

Process Name
PID
PATH
Detail
Date & Time
Time of Day
Relative Time
Duration
TID
Category

Change Backing File.

ProcessMon3

The maximum number of events it will capture is 199 million. This is enough on my system to capture 12 hours of activity easily. Once we have a good sample you can save it off as an XML file or CSV. Choosing CSV it is pretty easy to import the data into SQL Server using SSIS or your tool of choice.

ProcessMon4

I import the CSV into a raw table first.

Raw table to import into.

CREATE TABLE [SQLIO].[dbo].[pm_imp] (
  [Process Name]  VARCHAR(12),
  [PID]           SMALLINT,
  [Path]          VARCHAR(255),
  [Detail]        VARCHAR(255),
  [Date & Time]   DATETIME,
  [Time of Day]   VARCHAR(20),
  [Relative Time] VARCHAR(50),
  [Duration]      REAL,
  [TID]           SMALLINT,
  [Category]      VARCHAR(6)
)

Next I create a cleaner structure with some additional information separated from the detail provided.

SELECT
[Process Name]       AS ProcessName,
PID                  AS ProcessID,
PATH                 AS DatabaseFilePath,
Detail,
[Date & Time]        AS EventTimeStamp,
[Time of Day]        AS TimeOfDay,
[Relative Time]      AS RelativeTime,
[Duration],
TID                  AS ThreadID,
Category             AS IOType,
substring(detail,charindex('Length: ',detail,0) + 8
,(charindex(', I/O',detail,0) - charindex('Length:',detail,0) - 8)) AS IOLength,
CASE reverse(left(reverse(PATH),3))
    WHEN 'mdf'
    THEN 'Data'
    WHEN 'ndf'
    THEN 'Data'
    WHEN 'ldf'
    THEN 'Log'
  END AS FileType
INTO   SQLIOData
FROM
  dbo.pm_imp
WHERE  reverse(left(reverse(PATH),3)) IN ('mdf','ndf','ldf')

Once we have the data cleaned up a bit we can now start doing some analysis on it.
Queries for interesting patterns.

This query gives us our read and write counts.

SELECT
count(*) IOCount
,IOType
FROM
SQLIOData
GROUP BY IOType
ORDER BY count(*) DESC
 

This one shows us the size of the IO and what type of operation it is.

SELECT
count(*) IOCount
,IOLength
,IOType
FROM
SQLIOData
GROUP BY IOLength,IOType
ORDER BY count(*) DESC

This is a look at activity by file type data or log.

SELECT
count(*) IOCount,
FileType
FROM
SQLIOData
GROUP BY
FileType
ORDER BY
count(*) DESC

Since we are capturing the thread id we can see how many IO’s by thread.

SELECT
count(*) IOCount,
ThreadID
FROM
SQLIOData
GROUP BY
ThreadID
ORDER BY
count(*) DESC

We can also look at IO types, sizes and count by file helping you see which ones are hot.

SELECT
count(*) IOCount,
databasefilepath,
iotype,
iolength
FROM
SQLIOData
WHERE
databasefilepath LIKE '%filename%'
GROUP BY
databasefilepath,
iotype,
iolength
HAVING   count(*) > 10000
ORDER BY databasefilepath,
count(*) DESC

Now that we see exactly what our IO patterns are we can make adjustments to the disk subsystem to help scale it up or tune it for a particular pattern.

This is just another tool in your tool belt. This is a supplement to using fn_virtualfilestats to track file usage. I use it to get a better idea of the size of the IO’s being issued.Using these two tools I can see the size of the IO’s in a window of time that is reported by my fn_virtualfilestats capture routine.

Always verify your assumptions, or advice from others.

Series To Date:
  1. Introduction
  2. The Basics of Spinning Disks
  3. The System Bus
  4. Disk Controllers, Host Bus Adapters and Interfaces
  5. RAID, An Introduction
  6. RAID and Hard Disk Reliability, Under The Covers
  7. Stripe Size, Block Size, and IO Patterns
  8. Capturing IO Patterns – You are here!
  9. Testing IO Systems