
Fundamentals of Storage Systems – Testing IO Systems

12/03/2009 – UPDATE! There were a couple of bugs in SQLIOCommandGenerator; the SQLIOTools.zip download has been updated.

I often tell people that one of the greatest things about SQL Server is that anyone can install it. I also tell people that one of the worst things about SQL Server is that anyone can install it. Microsoft fostered a “black-box” approach to SQL Server in 7.0 and 2000. Thankfully, they are reversing this course. As a follow-on to my last article, Capturing IO Patterns, we will take a quick look at building some synthetic tests based on those results. There are several tools on the market to test I/O systems, some of them free and some of them not. SQLIO has been around for several years, and there are lots of good articles already on the web describing various uses for this tool. SQLIO was specifically designed to test the limits of your I/O system at different workloads. The problem is that people tend to run this tool, look at the best results, and assume that they will see the same results when the server goes live. Without understanding your current workloads, that is an unreasonable expectation at best. What ends up happening is a misconfigured I/O system and lots of headaches, with no idea why the system performs so poorly.

I always advocate testing new systems before they go into production. I also understand that it isn’t always an option. Having found myself in that exact situation recently, I’ve decided to take my own advice and pull the new storage off-line to do the proper testing. I’m also taking this opportunity to refine my testing methodology and gather as many data points as possible before the system goes live.

The Test Scripts

With my IO patterns in hand I set out to build a couple of little tools to help me generate all the test scripts and manage the data. As usual, I built these as command line tools since I have no skill at all with GUIs. It is all in C# and I will be posting them up to Codeplex. You can download the tools here: SQLIOTools.zip. The zip has the two tools; they are beta and don’t have a ton of error checking built into them yet. The first tool, SQLIOCommandGenerator, does just that: it generates the batch file that has all the commands. It does depend on SQLIO.exe being in the same directory, as well as on a parameter file already being defined for it to use.

params.txt

X:\SQLIO_testfile0.dat 8 0x0 150240

The first parameter is the test file name, which SQLIO will create on start up or reuse if it already exists. Second is the number of threads that will access that file. Third is the affinity mask. Fourth is the file size in megabytes. Make sure to size the file large enough to be representative of a real database you would be housing on the system. If it is too small it will simply fit in the RAID controller’s cache and give you inflated results. I also tend to use one thread per physical CPU core. Be careful though: if you are using a lot of files, having too many threads can cause SQLIO to run out of memory.
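To make the four fields concrete, here is a minimal Python sketch that splits one parameter-file line into named values. The `ParamEntry` type and `parse_param_line` function are hypothetical helpers written for this illustration, not part of SQLIO or SQLIOTools, and the sketch assumes the fields are whitespace-separated and the file path contains no spaces.

```python
from typing import NamedTuple


class ParamEntry(NamedTuple):
    """One line of a SQLIO parameter file (hypothetical helper type)."""
    path: str           # test file SQLIO creates or reuses
    threads: int        # threads that will access the file
    affinity_mask: int  # CPU affinity mask
    size_mb: int        # test file size in megabytes


def parse_param_line(line: str) -> ParamEntry:
    # Assumes exactly four whitespace-separated fields.
    path, threads, mask, size_mb = line.split()
    # int(mask, 0) accepts both "0x0" style hex and plain decimal.
    return ParamEntry(path, int(threads), int(mask, 0), int(size_mb))


# Example line; the path is an assumption for illustration.
entry = parse_param_line(r"X:\SQLIO_testfile0.dat 8 0x0 150240")
print(f"{entry.threads} threads against a {entry.size_mb / 1024:.1f} GB test file")
```

A quick check like this makes it easy to spot a test file that is too small compared to the controller cache before starting a long run.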

Calling SQLIOCommandGenerator:

SQLIOCommandGenerator 0.10
We assume -F<paramfile> -LS -d,-R,-f,-p,-a,-i,-m,-u,-S,-v, -t not implemented

Usage: SQLIOCommandGenerator [OPTIONS]

Generates the command line syntax for the SQLIO.exe program output into a batch file.
Options:
-f, --iopattern[=VALUE] Random, Sequential or Both
-k, --iotype[=VALUE] Read,Write or Both
-s, --seconds[=VALUE] Number of seconds to run each test 1(60) to 10(600) minutes is normal
-c, --cooldown[=VALUE] Number of seconds pause between tests suggested minimum is 5 seconds.
--os, --outstandingiostart[=VALUE]  Starting number of outstanding IOs 1
--oi, --outstandingioincrament[=VALUE] Multiply Outstanding IO start by X i.e 2
--oe, --outstandingioend[=VALUE] Ending Number of outstanding IOs i.e. 64
--ol, --outstandingiolist[=VALUE] Specific Outstanding IO List i.e. 1,2,4,8,16,32,64,128,256,512,1024
--oss, --iosizestart[=VALUE] Starting Size of the IO request in kilobytes i.e. 1
--osi, --iosizeincrament[=VALUE] Multiply IO size by X in kilobytes i.e. 2
--ose, --iosizeend[=VALUE]  Ending number of outstanding IOs in kilobytes - i.e. 1024
--osl, --iosizeList[=VALUE]  Specific IO Sizes in kilobytes i.e. 1,2,4,8,16,32,64,128,256,512,1024
-b, --buffering[=VALUE] Set the type of buffering None, All, Hardware, Software. None is the default for SQL Server
--bat, --sqliobatchfilename[=VALUE]  The name of the output batch file that will be  created
-?, -h, --help show this message and exit

So I passed it this command:

SQLIOCommandGenerator.exe -k=Both -s=600 -c=5 --os=1 --oi=2 --oe=256 --oss=1 --osi=2 --ose=1024 -b=all --bat=c:\wes_sqlio_bat.txt -f=both

That generates this sample:

:: Generated by SQLIOCommandGenerator
:: This relies on SQLIO.exe being in the same directory.
:: c:\wes_sqlio_bat.txt c:\paramfile.txt c:\outputfile.csv "description of the tests"
:: param1 sqlio parameter file, param2 output of each test to single csv file, param3 test description
SET paramfile=%1
SET outfile=%2
SET runtime=600
SET cooloff=5
SET desc=%3
@ECHO OFF
ECHO ComputerName: %COMPUTERNAME% > %OUTFILE%
ECHO Date: %DATE% %TIME% >> %OUTFILE%
ECHO Runtime: %RUNTIME% >> %OUTFILE%
ECHO Cool Off: %COOLOFF% >> %OUTFILE%
ECHO Parameters File: %PARAMFILE% >> %OUTFILE%
ECHO Description: %DESC% >> %OUTFILE%
ECHO Test Start >> %OUTFILE%
ECHO Command Line: sqlio -kW -s%RUNTIME% -frandom -b1 -o1 -LS -BY -F%PARAMFILE% >> %OUTFILE%
sqlio -kW -s%RUNTIME% -frandom -b1 -o1 -LS -BY -F%PARAMFILE% >> %OUTFILE%
timeout /T %COOLOFF%
ECHO End Date: %DATE% %TIME% >> %OUTFILE%
:: This batch will take approximately 264.0014 Hours to Execute.

The batch file has the instructions for calling it and what parameters you can pass into it. You can omit seconds and cool down if you want to generate a more generic batch file.

This tool is flexible enough for my needs. I can generate specific targeted tests when I have data to back that up, or I can generate more general tests to feel out the performance edges.

You may have noticed the estimated run time; it is pretty accurate. This is a worst case scenario where you have chosen pretty much every possible test to run. I wouldn’t recommend this. With the data we already have, we can narrow down our testing to just a few IO sizes and queue depths to keep the test well within reason.

SQLIOCommandGenerator.exe -k=Both -s=600 -c=5 --ol=2 --osl=8,64 -b=None --bat=c:\wes_sqlio_bat.txt -f=both

This batch will take approximately 80.08334 Minutes to Execute.

Much better! By focusing on our IO targets we now have a test that is meaningful and repeatable.
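The two run-time estimates can be reproduced with a little arithmetic. The Python sketch below is an inference from the quoted numbers, not the tool's actual source: it assumes every combination of queue depth, IO size, IO pattern, IO type and buffering mode is one test, that "all" buffering expands to the four modes listed in the help text, and that the cool down is counted once. Under those assumptions both figures come out the same as the generator's.

```python
def geometric_steps(start, factor, end):
    """Values start, start*factor, ... up to and including end."""
    values = []
    value = start
    while value <= end:
        values.append(value)
        value *= factor
    return values


def estimate_seconds(queue_depths, io_sizes_kb, patterns, io_types,
                     buffer_modes, seconds_per_test, cooldown):
    # One test per combination; cooldown added once (assumption that
    # matches the figures printed by the generator).
    tests = (len(queue_depths) * len(io_sizes_kb)
             * patterns * io_types * buffer_modes)
    return tests * seconds_per_test + cooldown


# Everything: depths 1..256, sizes 1..1024 KB, both patterns, both types,
# four buffering modes, 600 second tests.
full = estimate_seconds(geometric_steps(1, 2, 256),
                        geometric_steps(1, 2, 1024), 2, 2, 4, 600, 5)

# Targeted: depth 2, sizes 8 and 64 KB, both patterns, both types, no buffering.
targeted = estimate_seconds([2], [8, 64], 2, 2, 1, 600, 5)

print(f"full matrix: {full / 3600:.4f} hours")
print(f"targeted:    {targeted / 60:.5f} minutes")
```

The takeaway is that each extra dimension multiplies the run time, so trimming the IO size and queue depth lists is where the savings come from.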

Why would you want to repeat this test over and over? Simple: not all RAID controllers are created equal. You may need to adjust several options before you hit the optimal configuration.

Running The Tests

Now that I have my tests defined I need to start running them and gathering information. There are some constants I always stay with. One, use diskpart.exe to sector align your disks. Two, format NTFS with a 64k block size. Since I’m doing these tests over and over I wrote a little batch file for that too. Diskpart can take a command file to do its work. Once the RAID controller is in, I create an array and look at what disk number is assigned to it. As long as you don’t make multiple arrays you will always get the same disk number. After that I format the volume accordingly. WARNING: I do use the /Y switch, so the format happens without prompting for permission!

diskpart.txt

select disk 2

create partition primary align = 64

assign letter = X

testvol.bat

diskpart /S z:\diskpart.txt

format x: /q /FS:NTFS /V:TEMP /A:64K /Y

I also use the RAID controller’s command line interface, if it has one, to make it easier to construct the tests and just let them run using a batch file as a control file. If that isn’t possible don’t worry, the bulk of your time will be spent waiting for the test to complete anyway.

Gathering The Data

As you have guessed, I have a tool to parse the output of the tests and import them into SQL Server or export it as a CSV file for easy access in Excel. SQLIOParser is also pretty simple to use.

SQLIOParser 0.20

Usage: SQLIOParser [OPTIONS]

Process output of the SQLIO.exe program piped to a text file.

Options:

-c, --computername[=VALUE] The comptuer name that the test was executed on.
-s, --sqlserver[=VALUE] The SQL Server you want to import the data into.
-u, --sqluser[=VALUE] If using SQL Server authentication specify a user
-p, --sqlpass[=VALUE] If using SQL Server authentication specify a password
-t, --tablename[=VALUE] The table you want to import the data into.
-d, --databasename[=VALUE] The database you want to import the data into.
-f, --sqliofilename[=VALUE]  The file name you want to import the data from.
-a, --sqliofiledirectory[=VALUE] The directory containing the files you want to import the data from.
-o, --csvoutputfilename[=VALUE]  The file name you want to export the data to.
-?, -h, --help show this message and exit

It will work with a single file or import a set of files in a single directory. If you are importing to SQL Server you need to have the table already created.

CREATE TABLE [dbo].[SQLIOResults](
[ComputerName] [varchar](255) NULL,
[TestDescription] [varchar](255) NULL,
[SQLIOCommandLine] [varchar](255) NULL,
[SQLIOFileName] [varchar](255) NULL,
[ParameterFile] [varchar](255) NULL,
[TestDate] [datetime] NULL,
[RunTime] [int] NULL,
[CoolOff] [int] NULL,
[NumberOfFiles] [int] NULL,
[FileSize] [int] NULL,
[NumberOfThreads] [int] NULL,
[IOOperation] [varchar](255) NULL,
[IOSize] [varchar](255) NULL,
[IOOutstanding] [int] NULL,
[IOType] [varchar](255) NULL,
[IOSec] [decimal](18, 2) NULL,
[MBSec] [decimal](18, 2) NULL,
[MinLatency] [int] NULL,
[AvgLatency] [int] NULL,
[MaxLatency] [int] NULL
)

This is the same structure the CSV is in as well.

Analyzing The Results

I will warn you that the results you get will not match your performance 100% once the server is in production. This shows you the potential of the system. If you have horrible queries hitting your SQL Server those queries are still just as bad as before. Generally, I ignore max latency and min latency and focus on the average. What I am most worried about is how the system will respond as the IO load changes or the queue depth increases. Remember, raw megabytes a second isn’t always king. The number of IOs at a given IO block size is also very important. I will go into great detail in the next article as I walk you through analyzing the results from my own system, so stay tuned for that.
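The relationship between IOs per second and megabytes per second is just a multiplication by the IO size, which is why the two numbers can tell different stories. The short Python sketch below shows this. The IOPS figures in it are made up purely for illustration and are not results from any test in this article.

```python
def mb_per_sec(iops, io_size_kb):
    """Throughput implied by an IO rate at a fixed IO size."""
    return iops * io_size_kb / 1024


# Hypothetical results for two IO sizes (illustrative numbers only).
samples = [
    (8, 5000),   # 8 KB IOs, 5000 IOs/sec
    (64, 2000),  # 64 KB IOs, 2000 IOs/sec
]

for io_size_kb, iops in samples:
    print(f"{io_size_kb:>3} KB x {iops} IO/sec = "
          f"{mb_per_sec(iops, io_size_kb):.1f} MB/sec")
```

In this made-up case the 64 KB run moves more megabytes a second while completing far fewer IOs, so which one is "better" depends on the IO sizes your real workload issues.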

Final Thoughts

These tests aren’t the end of your road. I still advocate playing back traces and seeing how the system responds with your exact workload whenever possible. If you can’t do that then using tools like SQLIO is better than nothing at all. We are also working under the assumption that we are upgrading or replacing an existing production server. If that isn’t the case and this is a brand new deployment using SQLIO will help you know what your I/O system is capable of before you have a problem with bad queries or other issues that always crop up on new systems.

You can always do more testing. It is almost a never ending process. My goal isn’t to give you the end solution, just to give you another tool to pull out when you need it. As always, I look forward to your feedback!

Series To Date:
  1. Introduction
  2. The Basics of Spinning Disks
  3. The System Bus
  4. Disk Controllers, Host Bus Adapters and Interfaces
  5. RAID, An Introduction
  6. RAID and Hard Disk Reliability, Under The Covers
  7. Stripe Size, Block Size, and IO Patterns
  8. Capturing IO Patterns
  9. Testing IO Systems – You are here!

Fundamentals of Storage Systems – Capturing IO Patterns

We often take the advice given to us on forums or in articles at face value. Even though the authors almost always say things like “your mileage may vary” or “may not apply to your situation” people still assume it is the gospel. Sometimes it is lack of experience. Other times it is just lack of knowledge on how to verify these things on your own. In this article I’m going to give you a tool to look at what SQL Server is doing at the disk level and allow you to make better decisions on how to configure your underlying disks.

The Basics

There are several things you need to know about how SQL Server accesses the database files and the implications of that before you can construct a proper testing methodology.

http://technet.microsoft.com/en-us/library/cc966500.aspx covers the basics. There are a few things I will highlight here.

ACID and WAL

ACID (Atomicity, Consistency, Isolation, and Durability) is what makes our database reliable. The ability to recover from a catastrophic failure is key to protecting your data.

WAL (Write-Ahead Logging) is how ACID is achieved. Basically, the log record must be flushed to disk before the data file is modified.

Stable Media

Stable media isn’t just the disk drive. A controller with a battery backed cache is also considered stable. Since SQL Server can request blocks as big as 64KB, make sure your controller can handle that block size request in cache. Some older controllers only do a 16KB block or smaller.

FUA (Forced Unit Access)

With the requirement of stable media, SQL Server creates and opens all files with a specific set of flags. FILE_FLAG_WRITE_THROUGH tells the underlying OS not to use write caching that isn’t considered stable media. So, the local disk cache is normally bypassed. Not all hard drives honor the flag though; some SATA/IDE drives ignore it. Usually, the drive manufacturer provides a tool to turn off write caching. If you are using desktop drives in a mission critical situation be aware of the potential for data loss. FILE_FLAG_NO_BUFFERING tells the OS not to buffer the file either. At this point the only cache available will be the battery backed or other durable cache on the controller.

File Access

SQL Server uses asynchronous access for data and log files. This allows IO requests to queue up and use the IO system as efficiently as possible. The main difference between the two is that SQL Server will try to gather writes to the data file into bigger blocks, but the log is always written to sequentially.

All of these rules apply to everything but tempdb. Since tempdb is recreated at every restart, recoverability isn’t an issue.

SQL Server data access patterns

Searching around you will find these generalities about SQL Server’s IO patterns:

Log Writes: Sequential, 512 bytes to 64KB
Data File Read/Writes: 8KB
Read Ahead (more important to Enterprise Edition): 8KB to 125KB
Bulk Insert: 8KB to 128KB
Create Database: 512 byte, full initialize on log file only
Backup: Sequential Read/Write, 1 MB
Restore: Sequential Read/Write, 64K
DBCC CHECKDB: Sequential Read, 8K to 64K
DBCC DBREINDEX (Read Phase): Sequential Read (see Read Ahead)
DBCC DBREINDEX (Write Phase): Sequential Write, any multiple of 8K up to 128K
DBCC SHOWCONTIG: Sequential Read, 8K to 64K

Now that we have an idea of what SQL Server is supposed to be doing, it’s time to verify our assumptions.

Capturing IO activity

There are a few tools that will allow you to capture the file activity at the system level. Process Monitor is a free tool from Microsoft that I will use to collect some baseline information. In its standard configuration Process Monitor captures a ton of stuff and uses the page file to spool the info to. So, before we begin we need to change the default configuration.


Capturing IO data using process monitor.

Filter to apply

process is sqlservr.exe
Operation is Read
Operation is Write


Columns to choose.


Process Name
PID
PATH
Detail
Date & Time
Time of Day
Relative Time
Duration
TID
Category

Change Backing File.


The maximum number of events it will capture is 199 million. This is enough on my system to capture 12 hours of activity easily. Once we have a good sample you can save it off as an XML file or CSV. If you choose CSV, it is pretty easy to import the data into SQL Server using SSIS or your tool of choice.


I import the CSV into a raw table first.

Raw table to import into.

CREATE TABLE [SQLIO].[dbo].[pm_imp] (
  [Process Name]  VARCHAR(12),
  [PID]           INT,          -- Windows process IDs can exceed the SMALLINT range
  [Path]          VARCHAR(255),
  [Detail]        VARCHAR(255),
  [Date & Time]   DATETIME,
  [Time of Day]   VARCHAR(20),
  [Relative Time] VARCHAR(50),
  [Duration]      REAL,
  [TID]           INT,          -- thread IDs can exceed the SMALLINT range too
  [Category]      VARCHAR(6)
)

Next I create a cleaner structure with some additional information separated from the detail provided.

SELECT
[Process Name]       AS ProcessName,
PID                  AS ProcessID,
PATH                 AS DatabaseFilePath,
Detail,
[Date & Time]        AS EventTimeStamp,
[Time of Day]        AS TimeOfDay,
[Relative Time]      AS RelativeTime,
[Duration],
TID                  AS ThreadID,
Category             AS IOType,
-- IOLength: the text between 'Length: ' and ', I/O' in the Detail column
SUBSTRING(Detail, CHARINDEX('Length: ', Detail) + 8,
          CHARINDEX(', I/O', Detail) - CHARINDEX('Length: ', Detail) - 8) AS IOLength,
-- FileType: classify by the last three characters of the file path
CASE RIGHT(PATH, 3)
    WHEN 'mdf' THEN 'Data'
    WHEN 'ndf' THEN 'Data'
    WHEN 'ldf' THEN 'Log'
  END AS FileType
INTO   SQLIOData
FROM
  dbo.pm_imp
-- keep only SQL Server data and log files
WHERE  RIGHT(PATH, 3) IN ('mdf','ndf','ldf')
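If you would rather sanity check the export before loading it into SQL Server, the same extraction can be sketched in Python. This is not part of the original workflow. The sample rows are hypothetical and only modeled on the shape the query above expects (a 'Length: ' value followed by ', I/O'), and unlike the T-SQL, this version strips the thousands separators so the length becomes a number.

```python
import re
from collections import Counter

# Matches the number after "Length: ", thousands separators included.
LENGTH_RE = re.compile(r"Length: ([\d,]+)")

FILE_TYPES = {"mdf": "Data", "ndf": "Data", "ldf": "Log"}


def io_length(detail):
    """Return the IO size in bytes from a Detail string, or None."""
    match = LENGTH_RE.search(detail)
    return int(match.group(1).replace(",", "")) if match else None


def file_type(path):
    """Classify a file by its last three characters, like the CASE above."""
    return FILE_TYPES.get(path[-3:].lower())


# Hypothetical rows: (Path, Detail, Category).
rows = [
    (r"D:\data\sales.mdf", "Offset: 1,048,576, Length: 8,192, I/O Flags: Non-cached", "Read"),
    (r"D:\data\sales.mdf", "Offset: 2,097,152, Length: 65,536, I/O Flags: Non-cached", "Read"),
    (r"E:\logs\sales.ldf", "Offset: 4,096, Length: 512, I/O Flags: Non-cached", "Write"),
    (r"C:\temp\notes.txt", "Offset: 0, Length: 100, I/O Flags: Non-cached", "Read"),
]

# Count IOs by (file type, category, size), skipping non-database files.
counts = Counter(
    (file_type(path), category, io_length(detail))
    for path, detail, category in rows
    if file_type(path) is not None
)

for key, count in counts.most_common():
    print(count, key)
```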

Once we have the data cleaned up a bit we can now start doing some analysis on it.
Queries for interesting patterns.

This query gives us our read and write counts.

SELECT
count(*) IOCount
,IOType
FROM
SQLIOData
GROUP BY IOType
ORDER BY count(*) DESC
 

This one shows us the size of the IO and what type of operation it is.

SELECT
count(*) IOCount
,IOLength
,IOType
FROM
SQLIOData
GROUP BY IOLength,IOType
ORDER BY count(*) DESC

This is a look at activity by file type data or log.

SELECT
count(*) IOCount,
FileType
FROM
SQLIOData
GROUP BY
FileType
ORDER BY
count(*) DESC

Since we are capturing the thread id we can see how many IO’s by thread.

SELECT
count(*) IOCount,
ThreadID
FROM
SQLIOData
GROUP BY
ThreadID
ORDER BY
count(*) DESC

We can also look at IO types, sizes and counts by file, helping you see which files are hot.

SELECT
count(*) IOCount,
databasefilepath,
iotype,
iolength
FROM
SQLIOData
WHERE
databasefilepath LIKE '%filename%'
GROUP BY
databasefilepath,
iotype,
iolength
HAVING   count(*) > 10000
ORDER BY databasefilepath,
count(*) DESC

Now that we see exactly what our IO patterns are we can make adjustments to the disk subsystem to help scale it up or tune it for a particular pattern.

This is just another tool in your tool belt. It is a supplement to using fn_virtualfilestats to track file usage. I use it to get a better idea of the size of the IOs being issued. Using these two tools I can see the size of the IOs in a window of time that is reported by my fn_virtualfilestats capture routine.

Always verify your assumptions, or advice from others.

Series To Date:
  1. Introduction
  2. The Basics of Spinning Disks
  3. The System Bus
  4. Disk Controllers, Host Bus Adapters and Interfaces
  5. RAID, An Introduction
  6. RAID and Hard Disk Reliability, Under The Covers
  7. Stripe Size, Block Size, and IO Patterns
  8. Capturing IO Patterns – You are here!
  9. Testing IO Systems

When Technical Support Fails You – UPDATE and Answers!

As promised, an update on what has happened so far. A correction needs to be made: the P800 is a PCIe 1.0 card, so the bandwidth is cut in half from 4GB/sec to 2GB/sec.

My CDW rep did get me in contact with an HP technical rep who actually knew something about the hardware in question and its capabilities. It was one of those good news, bad news situations. We will start with the bad news. The performance isn’t off. My worst fears were confirmed.

The Hard Disks

The HP Guy (changing the names to protect the innocent) told me their rule of thumb for the performance of the 2.5” 73GB 15K drives is 10MB/Sec. I know what you are thinking, NO WAY! But, I’m not surprised at all. What I was told is the drives ship with the on board write cache disabled. They do this for data integrity reasons. Since the cache on the drive isn’t battery backed, any kind of failure carries the potential for data loss. There are three measurements of hard disk throughput: disk to cache, cache to system, and disk to system. Disk to cache is how fast data can be transferred from the internal data cache to the disk, usually sequentially. On our 15k drive this should be on average 80MB/sec. Cache to system, also referred to as burst speed, is almost always as fast as our connection type. Since we are using SAS that will be close to 250MB/sec. Disk to system is no caching at all. Without the cache, several IO reordering schemes aren’t used and there is no buffer between you and the system, so you are effectively limited by the areal density and the rotational speed of the disk. This gets us down to 10 to 15 megabytes a second. Write caching has a huge impact on performance. I hear you saying the controller has a battery backed cache on it, and you would be right.

The Disk Controller

The P800 controller was the top of the line that HP had for quite a while. It is showing its age now though. The most cache you can get at the moment is 512MB. It is battery backed, so if there is a sudden loss of power the data in cache will stay there for as long as the battery holds out. When the system comes back on the controller will attempt a flush to disk. The problem with this scheme is twofold. First, the cache is effectively shared across all your drives. Since I have 50 drives total attached to the system, that is around 10 megabytes per drive. Comparable drives normally ship with 16 to 32 megabytes of cache on them. The second problem is that the controller can’t offload the IO sorting algorithms to the disk drive, effectively limiting its throughput. It does support native command queuing and elevator sorting, but applying them at the controller level just isn’t as fast as at the disk level. If I had configured this array as a RAID 6 stripe, the loss of performance from that would have masked the other bottlenecks in the controller. Since I’ve got this in a RAID 10, the bottleneck is hit much sooner and with fewer drives. On the P800 this limit appears to be between 16 and 32 disks. I won’t know until I do some additional testing.

It’s All My Fault

If you have been following my blog or coming to the CACTUSS meetings you know I tell you to test before you go into production. With the lack of documentation I went with a set of assumptions that weren’t valid in this situation. At that point I should have stopped and done the testing myself. In a perfect world I would have set up the system in a test lab, run a series of controlled IO workloads, and come up with the optimal configuration. I didn’t do as much testing as normal and now I’m paying the price for that. I will have to bring a system out of production while I run benchmarks to find the performance bottlenecks.

The Good News

I have two P800’s in the system and will try moving one of the MSA70’s to the other controller. This will also allow me to test overall system performance across multiple PCIe busses. I have another system that is an exact duplicate of this one and originally had the storage configured in this way but ran into some odd issues with performance as well.

HP has a faster external-only controller out right now, the P411. This controller supports the new SAS II 6G protocols, has faster cache memory, and is PCIe 2.0 compliant. I am told it also has a faster IO processor. We will be testing these newer controllers out soon. Also, there is a replacement for the P800 coming out next year. Since we are only using external chassis with this card, the P411 may be a better fit.

We are also exploring a Fusion-io option for our tempdb space. We have an odd workload and tempdb accounts for half of our write operations on disk. By speeding up this aspect of the system and moving tempdb completely away from the data, we should see a marked improvement overall.

Lessons Learned or Relearned

Faced with the lack of documentation, don’t make assumptions based on past experiences. Test your setup thoroughly. If you aren’t getting the information you need, try different avenues early. Don’t assume your hardware vendor has all the information. In my case, HP doesn’t tell you that the disks come with the write cache disabled. They also don’t give you the full performance specifications for their disk controllers. Not even my HP Guy had that information. We talked about how there was much more detailed information on the EVA SAN than there was on the P800.

Now What?

Again, I can’t tell you how awesome CDW was in this case. My rep, Dustin Wood, went above and beyond to get me as much help as he could, and in the end was a great help. It saddens me I couldn’t get this level of support directly from HP technical support. You can rest assured I will be giving HP feedback to that effect. Not giving the customer, or even their own people, all the information sets everyone up for failure.

I’m not done yet. There is a lot of work ahead of me, but at least I have some answers. You can bet I’ll be over at booth #414 next week at PASS asking HP some hard questions!

When Technical Support Fails You

I have had the pleasure of being a vendor, and technical support for both hardware and software products. I know it isn’t easy. I know it isn’t always possible to fix everything. The level of support I’ve received from HP on my current issue is just unacceptable. This is made more frustrating by the lack of documentation. The technical documents show capacity, such as how many drives you can have in an array and the maximum volume size, but nothing on throughput. Every benchmark they have seems to be relative to another product with no hard numbers. For example, the P800 is 30% faster than the previous generation.

I’m not working with a complicated system. It’s a DL380 G5 with a P800 and two MSA70’s fully populated with 15k 73GB hard drives. 46 of them are in a RAID 10 array with a 128k stripe. I formatted it NTFS with a 64k block size and sector aligned the partition. Read/Write cache is set at 25%/75%. This server originally just had one MSA70. We added the second for capacity expansion and expected to see a boost in performance as well. As you can probably guess, there wasn’t any increase in performance at all.

Here is what I have as far as numbers. Some of these are guesses based on similar products.

P800 using two external miniSAS 4x connectors: maximum throughput of 2400 MB/sec (2400Mbit per link x 4 links per connector x 2 connectors).
The P800 uses a PCIe x8 connection to the system at 4,000 MB/Sec (PCIe 2.0 2.5GHz 4GB/sec each direction).
Attached to the controller are 46 15k 73GB 2.5” hard drives, for a raw sequential read speed of 3680 MB/Sec (46 x 80MB/sec) or a raw sequential write speed of 1840 MB/Sec in RAID 10 (23 x 80MB/sec). This is for the 46 drives in the two MSA70’s, based on Seagate 2.5 73GB SAS 15.1k.

Expected write speed should be around 1200 megabytes a second.

We get around 320 MB/Sec sequential write speed and 750MB/sec in reads.

Ouch.

Did I mention I also have a MSA60 with 8 7.2k 500GB SATA drives that burst to 600MB/sec and sustain 160MB/Sec writes in a RAID 10 array? Yeah, something is rotten in the state of Denmark.
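To put the gap in perspective, here is a small Python sketch of the arithmetic. It uses only the figures quoted above, including my guesses, and it assumes the simplest possible model: the slowest of the SAS links, the PCIe slot and the drives sets the ceiling. The variable names are just for this illustration.

```python
# Figures quoted in this post (MB/sec unless noted).
SAS_LINK_MBIT = 2400        # per link
LINKS_PER_CONNECTOR = 4
CONNECTORS = 2
PCIE_X8_MB = 4000           # as originally assumed for the slot
DRIVE_SEQ_MB = 80           # per 15k drive, sequential
DRIVES = 46                 # RAID 10, so 23 mirrored pairs
EXPECTED_WRITE_MB = 1200
OBSERVED = {"read": 750, "write": 320}

# 2400 Mbit is 300 MB/sec per link, times 8 links.
sas_mb = SAS_LINK_MBIT / 8 * LINKS_PER_CONNECTOR * CONNECTORS
drives_read_mb = DRIVES * DRIVE_SEQ_MB
drives_write_mb = DRIVES // 2 * DRIVE_SEQ_MB

# Simplistic model: the slowest component is the ceiling.
read_ceiling = min(sas_mb, PCIE_X8_MB, drives_read_mb)
write_ceiling = min(sas_mb, PCIE_X8_MB, drives_write_mb)

print(f"read:  {OBSERVED['read']} of {read_ceiling:.0f} MB/sec "
      f"({OBSERVED['read'] / read_ceiling:.0%})")
print(f"write: {OBSERVED['write']} of {write_ceiling:.0f} MB/sec "
      f"({OBSERVED['write'] / write_ceiling:.0%})")
print(f"write vs. expected: {OBSERVED['write'] / EXPECTED_WRITE_MB:.0%}")
```

Even measured against my more conservative 1200 MB/sec expectation, the observed write speed is only about a quarter of what the parts should deliver on paper.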

With no other options before me I picked up the phone and called.

I go through HP’s automated phone system, which isn’t that painful at all, to get to storage support. Hold times in queue were very acceptable. A level one technician picked up the call and started the normal run of questions. It only took about 2 minutes to realize the L1 didn’t understand my issue, and he quickly told me that they don’t fix performance issues, period. He told me to update the driver, firmware, and reboot. Of course none of that had worked the first time but what the heck, I’ll give it the old college try. Since this is a production system I am limited on when I can just do these kinds of things. This imposed lag makes it very difficult to keep an L1 just sitting on the phone for five or so hours on hold while they wait for me to complete the assigned tasks. I let him go with the initial action plan in place with an agreement that he would follow up. Twice I got automated emails that the L1 had tried to call and left voicemails for me. Twice, there were no voicemails. I sent him my numbers again just to be on the safe side. Next, I was told to run the standard Array Diagnostic Utility and a separate utility that they send you to gather all the system information and logs, think a PSSDiag or SQLDiag. After reviewing the logs he didn’t see anything wrong and had me update the array configuration utility. I was then told they would do a deeper examination of the logs I had sent and get back to me. Three days later I got another email saying the L1 had tried to call and left me a message. Again there was no voicemail on my cell or my desk phone. I sent a note back to the automated system only to find the case had been closed!

I called back in to the queue and gave the L1 who answered my case number, he of course told me it was closed. He read the case notes to me, the previous L1 had logged it as a network issue and closed the case. If I had been copying files over the network and not to another local array I can see why it had been logged that way. I asked to open a new case and to speak to a manager. I was then told the manager was in a meeting. No problem, I’ll stay on the line. After 45 minutes I was disconnected. Not one to be deterred, I called back again. The L1 that answered was professional and understanding. Again, I was put on hold while I waited for the manager to come out of his meeting. About 10 minutes later I was talking to him. He apologized and told me my issues would be addressed.

I now had a new case number and a new L1. Again, we dumped the diagnostic logs and started from the beginning. This time he saw things that weren’t right. There was a new firmware for the hard drives, a new driver for the P800, and a drive that was showing some errors. Finally, I felt like I was getting somewhere! At this point it had been ten days since I opened the previous case. We did another round of updates. A new drive was dispatched and installed. The L1 did call back and actually managed to either talk to me or leave a message. When nothing had made any improvement he went silent. I added another note to the case requesting escalation.

That was eight days ago. At this point I have sent seven sets of diagnostic logs. Spent several hours on the phone. And worked after hours for several days. The last time I talked to my L1, the L2’s were refusing to accept the escalation. It was clearly a performance problem and they don’t cover that. The problem is, I agree. Through this whole process I have begged for additional documentation on configuration and setup options, something that would help me configure the array for maximum performance.

They do offer a higher level of support that covers performance issues, for a fee of course. This isn’t a cluster or a SAN. It is a basic setup in every way. The GUI walks you through the setup, click, click, click, monster RAID 10 array done. What would this next level of paid support tell me?

My last hope is CDW will be able to come through with documentation or someone I can talk to. They have been very understanding and responsive through this whole ordeal.

Thirty-one days later, I’ve still got the same issue. I now have ordered enough drives to fill up the MSA60. The plan is to transfer enough data to free up one of the MSA70’s. Through trial and error, I will figure out what the optimum configuration is. Once I do I’ll post up my findings here.

If any of you out there in internet-land have any suggestions I’m all ears.

Fundamentals of Storage Systems – The System Bus

In this installment we will cover what connects the controller to the computer.

Disk controllers use a system bus to talk to your CPU and memory. The bus also determines the maximum speed at which your disks can talk to the computer. There may be as many as six different system buses in your computer. We are only interested in the ones that directly connect your disk controllers.

The oldest bus still in general use is PCI. You can still find them in your desktop and in servers though it is really on the way out. We are only covering PCI 2.0 32 bits wide running at 33 MHz. This allows for a theoretical top speed of 133.33 MB/Sec. In reality after overhead and other limitations you end up around 86 MB/Sec throughput. A single modern disk can achieve this speed. You generally don’t see PCI disk controllers with more than 4 ports. Adding more disk controllers to a system may not yield a direct increase in performance. Even if you have multiple PCI slots, they may only actually run through a single PCI bus. Limiting your bandwidth to the system to 133.33 MB/Sec.
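The 133.33 MB/Sec figure is just bus width multiplied by clock speed. Here is a minimal Python sketch of that arithmetic. The function names are made up for illustration, and the even split between controllers is a simplifying assumption of mine, not something a specification promises.

```python
def parallel_bus_mb_per_sec(width_bits, clock_mhz):
    """Theoretical peak of a parallel bus moving one word per clock tick."""
    return width_bits / 8 * clock_mhz


def per_controller_share(bus_mb_per_sec, controllers):
    # Assumption for illustration: busy controllers behind one bus
    # split its bandwidth evenly.
    return bus_mb_per_sec / controllers


# PCI 2.0: 32 bits wide. The nominal "33 MHz" clock is really 33.33 MHz.
pci_peak = parallel_bus_mb_per_sec(32, 100 / 3)
print(f"PCI theoretical peak: {pci_peak:.2f} MB/Sec")

# The real-world figure quoted above, as a fraction of the theoretical peak.
observed = 86.0
print(f"Fraction of peak at {observed:.0f} MB/Sec: {observed / pci_peak:.2f}")

# Several controllers in slots that all hang off the same PCI bus.
for count in (1, 2, 4):
    share = per_controller_share(pci_peak, count)
    print(f"{count} controller(s) on one bus: {share:.2f} MB/Sec each")
```

The last loop is the point of the warning about multiple slots: a second controller on the same bus does not add bandwidth, it divides what is already there.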

File:PCI Slots Digon3.JPG

Credit: Jonathan Zander

IBM, HP, and Compaq came together to standardize a faster bus for servers, specifically for disk controllers and network interface cards. PCI-X built on the PCI standard and was backwards compatible. It extended the PCI bus to 64 bits wide and a speed of 66 MHz in its initial launch. We go from 133.33 MB/Sec to 533.3 MB/Sec, a 4x improvement. The next generation brought us two more implementations, PCI-X 64 bit/100 MHz and PCI-X 64 bit/133 MHz, at 800 MB/Sec and 1067 MB/Sec respectively. This was a major step up but had several flaws. The physical size of the connector was huge. It also carried over the shortcomings of PCI. There was signal noise across slots, so errors could be caused by having several cards next to each other. Communication was half-duplex: it couldn’t send and receive data at the same time. You were only as fast as the slowest card on the bus: if you had a 66 MHz card, your 133 MHz card was reduced to match.
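The PCI-X numbers fall out of the same width-times-clock arithmetic as PCI, and the "slowest card" rule is easy to model. The Python sketch below is only an illustration: the function names are hypothetical, and it assumes the whole shared bus simply drops to the clock of its slowest card, as described above.

```python
def parallel_bus_mb_per_sec(width_bits, clock_mhz):
    """Theoretical peak of a parallel bus moving one word per clock tick."""
    return width_bits / 8 * clock_mhz


def effective_bus_clock_mhz(card_clocks_mhz):
    # Assumption for illustration: a shared bus runs every card at the
    # clock of the slowest card plugged into it.
    return min(card_clocks_mhz)


# (name, width in bits, clock in MHz). Nominal 33/66/133 MHz clocks are
# really 33.33/66.67/133.33 MHz.
buses = [
    ("PCI 32 bit/33 MHz", 32, 100 / 3),
    ("PCI-X 64 bit/66 MHz", 64, 200 / 3),
    ("PCI-X 64 bit/100 MHz", 64, 100.0),
    ("PCI-X 64 bit/133 MHz", 64, 400 / 3),
]
for name, width, clock in buses:
    print(f"{name:<22} {parallel_bus_mb_per_sec(width, clock):8.1f} MB/Sec")

# A 133 MHz card sharing a bus with a 66 MHz card.
mixed_clock = effective_bus_clock_mhz([400 / 3, 200 / 3])
mixed_peak = parallel_bus_mb_per_sec(64, mixed_clock)
print(f"Mixed 133/66 MHz bus: {mixed_peak:.1f} MB/Sec")
```

In the mixed case the fast card gives up half of its theoretical bandwidth just by sitting next to a slower neighbor, which is why card placement matters on PCI-X systems.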

Yikes!

File:64bitpci.jpg

Credit: Snickerdo

We have moved on to a completely new standard, PCI Express (PCIe). Some people confuse PCI-X with PCIe, but they are completely different. The new PCIe standard was introduced in 2004 and was quickly adopted in mainstream computers for video cards, but it is a general system bus. There are several key differences between PCIe and the buses that came before it. It is a fully serial, bidirectional bus: you can have multiple cards at multiple speeds reading and writing data at the same time. It also introduced the concept of lanes. A PCIe card will use between 1 and 16 lanes. Each lane in the 1.0 specification was rated at 250 MB/Sec. The 2.0 specification, introduced in 2007, doubled that to 500 MB/Sec. In 2011 the 3.0 specification will double that again to 1 GB/Sec. PCIe is also rated by how many transfers a second it can handle. Measuring transfers in Gigatransfers or Megatransfers has been around for a while, though it is not commonly used. See Gigatransfers at Wikipedia for a better explanation. One thing to be aware of: the 1.0 and 2.0 standards lose about 20% of their raw signaling rate to the way data is encoded on the bus. The 250 MB/Sec and 500 MB/Sec per-lane figures already include that loss. A 1.0 lane signals at 2.5 Gigatransfers a second, which leaves 2 Gbit/Sec, or 250 MB/Sec, for data. Like the PCI bus, even that is a maximum you won’t see in the real world once protocol overhead is added. The 3.0 specification reduces the encoding loss to around 1.5%. The most common sizes of PCIe slots are 1x, 4x, 8x, and 16x. PCIe is also downwards compatible, so a 1x, 4x, or 8x card will work in a 16x slot. Just because a card is physically 16x doesn’t mean it runs at 16x; it may be 8x or slower internally. That applies to the slot as well: it may be a 16x slot but only operate at 8x speeds.
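To see where the per-lane numbers come from, here is a rough Python sketch that works from the transfer rate and the line encoding. The rates and encodings in the table (8b/10b for 1.0 and 2.0, 128b/130b for 3.0) are taken from the PCIe specifications rather than from this article, and the function names are made up for illustration. It treats the usable link width as the smaller of the card width and the slot width, and it ignores protocol overhead entirely, so the results are per-direction ceilings and not what you will measure.

```python
# (transfer rate in GT/Sec, payload bits, bits on the wire) per generation.
# Assumed values from the PCIe specifications, not from this article.
GENERATIONS = {
    "1.0": (2.5, 8, 10),     # 8b/10b encoding
    "2.0": (5.0, 8, 10),     # 8b/10b encoding
    "3.0": (8.0, 128, 130),  # 128b/130b encoding
}


def encoding_loss(gen):
    """Fraction of the raw signaling rate spent on line encoding."""
    _, payload_bits, wire_bits = GENERATIONS[gen]
    return 1 - payload_bits / wire_bits


def lane_mb_per_sec(gen):
    """Per-lane, per-direction data rate after encoding, in MB/Sec."""
    gt_per_sec, payload_bits, wire_bits = GENERATIONS[gen]
    # One transfer moves one bit on a lane: GT/Sec -> Mbit/Sec -> MB/Sec.
    return gt_per_sec * 1000 * payload_bits / wire_bits / 8


def link_mb_per_sec(gen, card_lanes, slot_lanes):
    # The link can only use as many lanes as both sides actually wire up.
    return lane_mb_per_sec(gen) * min(card_lanes, slot_lanes)


for gen in GENERATIONS:
    print(f"PCIe {gen}: {lane_mb_per_sec(gen):7.1f} MB/Sec per lane, "
          f"{encoding_loss(gen):.1%} lost to encoding")

# A physically 16x card in a 16x slot that is only wired for 8x.
print(f"2.0 card, 16x in an 8x-wired slot: "
      f"{link_mb_per_sec('2.0', 16, 8):.0f} MB/Sec")
```

The last line is the case to watch for when reading a motherboard manual: the slot length tells you what fits, not how many lanes are connected.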

File:PCIExpress.jpg

Credit: Snickerdo

PCI Express slots (from top to bottom: x4, x16, x1 and x16), compared to a traditional 32-bit PCI slot (bottom).

So, what does all this mean? If you have an older server, don’t use the PCI slots. Be careful with PCI-X cards and their placement. If you have PCIe, you need to know whether it is 1.0 or 2.0 capable and what speed the physical connectors actually operate at.

Series To Date:

  1. Introduction
  2. The Basics of Spinning Disks
  3. The System Bus – You are here!
  4. Disk Controllers, Host Bus Adapters and Interfaces
  5. RAID, An Introduction
  6. RAID and Hard Disk Reliability, Under The Covers
  7. Stripe Size, Block Size, and IO Patterns
  8. Capturing IO Patterns
  9. Testing IO Systems