Category Archives: Storage Systems

Fundamentals of Storage Systems, IO Latency and SQL Server

A Thousand Men Marching Still Only March As Fast As One Man.

la·ten·cy – Computers. The time required to locate the first bit or character in a storage location, expressed as access time minus word time.

Often when talking to people about performance, they get wrapped around the MB/Sec number and ignore a critical factor, latency. SQL Server is one of those few applications that is very sensitive to disk and network latency. Latency is what the end user sees. If your SQL Server is waiting around for disk or network, they will start to complain. In an OLTP environment SQL Server accesses data on disk in a nonlinear fashion, aka random IOs. The size of these IO requests can be pretty small. In a good application you really try to limit the amount of data returned to keep things speedy. The downside of small random IOs is that the system will never be faster than a single seek operation on your disk. So, if you have a 15k SAS drive, that is around 2.5ms. Caching and buffering schemes aside for now, 2.5ms is your floor. You will never be faster than that. Depending on the size of the IO request, you spend more time waiting for the seek operation than you do actually transferring the data from the disk. We group disks together in larger arrays to give us more aggregate throughput and higher operations per second, but you are only ever as fast as your slowest disk. You can always get more aggregate throughput, up to several gigabytes a second, but you still have to wait on that first bit to travel to you.
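
To put some rough numbers on that, here is a quick back-of-the-napkin calculation in T-SQL. The 2.5ms seek comes from the 15k SAS example above; the 120 MB/Sec sequential rate is just an illustrative figure, not any particular drive's spec sheet:

```sql
-- How much of a small random read is spent seeking vs. actually moving data?
DECLARE @seek_ms     float = 2.5;   -- 15k SAS average seek from above
DECLARE @io_kb       float = 8;     -- one SQL Server page
DECLARE @mb_per_sec  float = 120;   -- illustrative sequential transfer rate

DECLARE @transfer_ms float = @io_kb / (@mb_per_sec * 1024.0) * 1000.0;

SELECT  @transfer_ms                                  AS transfer_ms,        -- ~0.07ms
        @seek_ms + @transfer_ms                       AS total_ms,           -- ~2.57ms
        @seek_ms / (@seek_ms + @transfer_ms) * 100.0  AS pct_spent_seeking;  -- ~97%
```

Roughly 97% of that 8KB random read is spent positioning the head rather than moving data, which is why adding drives helps IOs per second far more than it helps the latency of any single request.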

To get around these shortcomings, SQL Server buffers as much data as it can in memory. We also add large amounts of cache to our SANs and disk controllers. This can mask some of the problem, but at some point the data still needs to come from your disk drives.

On the network side things are actually better. With latency under a millisecond on a LAN, you are usually waiting on disk to deliver the data. Other factors, like the speed of the network equipment and the number of hops across interfaces you have to make, can be more significant than the actual transmission rate. TCP/IP can be a factor as well. Out of the box SQL Server is configured with a 4KB (4096 byte) network packet size. This is a good general setting for most workloads. If you are working on a highly tuned OLTP system you may want to set it smaller or align it with the TCP packet size for your network, usually 1500 bytes. If it is an OLAP system with lots of streaming throughput, latency makes up a very small part of the overall transmission time, and going with a larger packet size, possibly aligned to the 8KB page size, will increase throughput and decrease the overall time to transmit. If you do set a large packet size you should consider enabling jumbo frames on your network card. Make sure that your network equipment can support the jumbo frame from end to end.
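
If you do decide to change it, the packet size is a regular server-level setting. A minimal sketch, assuming you have the ALTER SETTINGS permission and have tested the change against your own workload first:

```sql
-- 'network packet size (B)' is an advanced option, so expose it first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Check the current value (the default is 4096 bytes)
EXEC sp_configure 'network packet size (B)';

-- Example only: align the packet size with the 8KB page size for a streaming/OLAP workload
EXEC sp_configure 'network packet size (B)', 8192;
RECONFIGURE;
```

Keep in mind clients can also override this per connection with the Packet Size connection string keyword, so measure at both ends before standardizing on a value.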

Another place where we are starting to see more latency issues is with database mirroring. In synchronous mode, the default setting, you are now adding network latency plus the disk latency of the other server to the overall transaction time.

Mirroring isn’t the only game in town. We have had SAN-level replication for quite a while as well. In most of the scenarios where we were using SAN-level replication it was site to site across several miles. To give you an idea of how much latency can be added in this situation, go ping yahoo.com or google.com, I’ll wait….. Ok, from my location either of them is 45ms~75ms, or 18 times slower than your spinning disks. All of a sudden, your network is the major player in delaying transactions. I’ve used fibre optics to connect sites and the latency can still be a killer for OLTP systems. The general rule of thumb is about 7.5 microseconds of latency for every mile of fibre, each way. If our target SAN is 125 miles away, the round trip just added roughly 2ms of latency to the 4ms of latency the two sets of disks are providing. In reality, it is worse than that when you again figure in network equipment. I personally haven’t seen synchronous setups more than 50 miles apart.
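
The arithmetic behind that, again as a rough T-SQL sketch. This counts propagation delay only, at roughly 7.5 microseconds per mile of fibre, with no allowance for switches, routers, or the SAN itself:

```sql
-- Propagation delay only: distance * ~7.5us per mile, doubled for the round trip
DECLARE @miles       float = 125;
DECLARE @us_per_mile float = 7.5;   -- light in fibre, no equipment hops counted

SELECT  (@miles * @us_per_mile) / 1000.0        AS one_way_ms,    -- ~0.9ms
        (@miles * @us_per_mile * 2.0) / 1000.0  AS round_trip_ms; -- ~1.9ms
```

A synchronous write has to make that round trip before the transaction can commit, which is where the extra 2ms comes from.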

Just something to keep in mind as you plan your SQL Server infrastructure. Latency in its myriad forms is king.

Series To Date:
  1. Introduction
  2. The Basics of Spinning Disks
  3. The System Bus
  4. Disk Controllers, Host Bus Adapters and Interfaces
  5. RAID, An Introduction
  6. RAID and Hard Disk Reliability, Under The Covers
  7. Stripe Size, Block Size, and IO Patterns
  8. Capturing IO Patterns
  9. Testing IO Systems
  10. Latency – You are here!

SQLSaturday #57 Houston, Smashing Success!

I attended and spoke at SQLSaturday #57, and it was an awesome event! Here are my notes and observations on the trip as a whole.

As always, I try to be in town on Friday night for the speaker dinner. It’s always worth it. Even if you hate the food, the restaurant, or the part of town, the PEOPLE make it so worthwhile. I always meet someone new and get to cultivate relationships that normally would only get some TLC at the Summit. To me SQLSaturday is a cheap way to keep my speaking skills sharp, educate some folks, and get to spend quality time with a great group of people.

Friday night I got to do one of the things on my “bucket list”, guest on DBAs@Midnight with my friends Sean, Jen and Patrick. I’ve worked in broadcasting but it’s never easy to do. Sean and Jen put quite a bit of work into these weekly shows and being on the other side of the camera reminded me of that!

I was surprised at how nice the facilities were. It wasn’t what I had in mind when I saw it was at a church. It is always cool to see people looking to other communities besides the traditional venues for this kind of event. The food was awesome. Being a meat eater, having hot brisket was a big plus for me! There were two things that will be corrected the next time around: signage and room numbering. It wasn’t a huge deal, just a rough spot in an otherwise flawless event.

You can download my slide deck here.

Again, thanks for putting on such a great event and allowing me to come speak!

#SQLRally is coming, Go vote!

 

We are in the final stages of selecting the speakers for SQLRally, May 11th through the 13th in sunny Orlando, Florida. The program selection is a little different than what we have done with the Summit. The committee narrowed the number of selections and is putting the rest up to a public vote. This is your opportunity to voice your opinion on what you would like to hear at this inaugural event! I’ve been fortunate enough to have two of my sessions put up for a vote. If you follow my blog you know I have a passion for moving bits of data around as fast as possible. Both my sessions focus on storage. As much as I would love to have your votes to see my sessions at SQLRally, I would like it even more if you voted on what YOU want to learn about the most. Having served on the program committee for the Summit last year, I know just how hard it can be choosing what I think people would like to learn about. Having the opportunity to make your choice known directly is just awesome. I am very excited to see PASS expand and have training events that cover the gamut. Starting with local user groups and SQLSaturdays, now growing with SQLRally, and finishing it off with the Summit, there is something for every budget.

With that said, here are my abstracts so you can get a better idea of what I’m speaking on. GO VOTE!

Title:
Solid State Storage Deep Dive
Speaker:
Wesley Brown
Category:
Storage
Level:
100

Abstract:
If you have ever wanted to know how SSDs and Flash memory work, this talk is for you. We will cover the fundamentals of Flash in detail. I will also highlight some of the specific vendor implementations and what makes a particular SSD enterprise-ready vs. consumer grade. We will also cover SQL Server usage patterns, what is a good fit for SSDs, and when it may be better to go with hard disks. Solid State Storage isn’t a cure-all for every situation; this presentation will give you the tools you need to make the right choice for your SQL Server environment.

Session Goals

  • Understand the fundamental building block of Flash memory.
  • Get a clear explanation of what makes some SSDs robust enough for enterprise use.
  • Learn where SSDs will and won’t make a real difference in your SQL Server environment.

Title:
Understanding Storage Systems and SQL Server
Speaker:
Wesley Brown
Category:
Storage
Level:
100

Abstract:
The most important part of your SQL Server is also the slowest: storage. This talk will take you through the fundamentals of your server’s disk I/O system, from how hard drives work, through RAID configurations, to how to configure the file system. This session should give you a solid foundation in storage systems and help you understand why they are slow and how to overcome some of their limitations.

Session Goals

  • Understand the physical characteristics of IO hardware.
  • Understand the fundamentals of RAID.
  • Understand how to configure the file system.

SATA, SAS or Neither? SSD’s Get A Third Option

I recently wrote about solid state storage and its different form factors. Well, several major manufacturers have realized that solid state needs all the bandwidth it can get. Dell, IBM, EMC, Fujitsu and Intel have formed the SSD Form Factor Working Group, bringing PCIe 3 to the same form factor that SATA and SAS use, focusing on the same connector types and a 2.5” drive housing. I’m not sure how quickly it will make its way into the enterprise space, but that is clearly its target. Reusing the physical form factor cuts down on manufacturing and R&D costs for all involved. They have an aggressive time scale for something like this. The specification hasn’t been published yet, and I’ll take a deeper look into it when it becomes available. There are some key players missing though, HP and Seagate being the two in the enterprise space that give me pause. Both control a large segment of the storage space. On the controller side, LSI is also absent. This could be a direct threat to their current market domination of the RAID controller chipset space if they aren’t on the ball.

Fusion-io got that early on and took a different route sticking with just PCIe to bypass the limitations of SAS/SATA and intermediate controllers. By going that route they opened up a whole other level of performance.

I asked David Flynn what he thought about the new standard. Fusion-io is a contributor to the working group.

It is quite validating that folks would be routing PCIe to the drive bays. For us it’s just another form factor that we can easily support.

Two things, though…  First is that I believe it’s a hangover from the mechanical drive era to put such emphasis on form factors that allow easy servicing access.  Solid state should not need to be serviced.  It should be much more reliable than HDD’s.  But, outside of Fusion-io failure rates for solid state is actually much worse than for mechanical disk drives.

The second point is that form-factor and even PCIe attachment isn’t really the key thing to higher performing, more reliable solid state.  What makes the real difference is eliminating the embedded CPU bottleneck in the access path to the flash.

Fusion-io uses a memory controller approach to integrating flash.  You don’t find CPU’s on  DRAM modules.  SSD’s (SATA or PCIe) from everyone else use embedded CPU’s and attach using storage controller methodologies.

In an upcoming post in my solid state storage series I will explore failure rates in detail. I do find it interesting that Fusion-io is one of the very few companies with significantly better error detection rates than a standard hard drive or other SSDs, even enterprise-branded SSDs. Fusion-io claims an uncorrectable but detectable error rate of 1 in 10^20 and an uncorrectable, undetectable error rate of 1 in 10^30. I have yet to see any hard disk or SSD with a rate better than 1 in 10^17. So, I agree with David: you don’t actually need a form factor built for ease of service if you build the device with enough error correction, which clearly you can with solid state.

Fundamentals of Storage Systems – Stripe Size, Block Size, and IO Patterns

If you have been following this series, we have covered system buses, hard disks, host bus adapters and RAID. Along the way we also covered how to capture your IO patterns and the SQLIO tool. Now we will pull it all together. We move up the stack even further to the actual layout of the RAID stripe and the file system. How the stripe and file system are laid out on your disks has a huge impact on performance. One of the things that has really gotten some traction over the last few years is sector alignment. This one thing, if not done, could cost you 30% to 40% of your IO potential. Jimmy May has covered sector alignment in depth, so I won’t rehash it here. Kendal Van Dyke also has a good series that covers offset, stripe size, and allocation units with different RAID levels.

It Don’t Add Up…

Something I’ve seen, and been guilty of, is taking a drive’s base specifications and just multiplying out. Say the manufacturer says the drive will do 79MB/Sec minimum throughput; we have 10 drives, so that is 790MB/Sec of throughput! We all know from experience that this isn’t so. What eats us up is how much slower it really can be. As we have seen throughout this series, there is overhead associated with everything. Before we just throw a bunch of disks in an enclosure and press it into service, it would be nice to have an idea of what the performance should be. It’s also recommended to do some of this work before you actually buy anything, so you don’t have to go back to your boss, beg for more money, and explain that your wild guess was wrong.

Always add a pinch of salt to whatever the disk manufacturer puts in the specifications. Most of the time they will be close enough. The problem lies in the fact they don’t always disclose the methods for achieving those numbers. For instance, when they report minimum and maximum throughput they are usually talking about a scan of the entire disk, including all the metadata stored between tracks, the best throughput possible. You won’t see those results in everyday life. They also give you numbers that can be completely irrelevant, like single sector read rates. Very rarely do you read a single sector at a time. Personally, I would love it if the drive makers gave the engineering specifications. I know that won’t happen; it would make my life easier though. The disk characteristics that are important are sector size, spindle speed, read and write seek times, and read and write sequential times. To a lesser extent, sequential throughput in megabytes per second matters as well. With the single disk numbers we can move on to the RAID configuration.
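
As a rough illustration of how those spec-sheet numbers turn into an estimate, here is the usual single-disk calculation in T-SQL. The seek time and throughput figures below are illustrative stand-ins, not any particular drive's specification:

```sql
-- Estimated random IOPS for one drive: seek + half a rotation + transfer time per IO
DECLARE @avg_seek_ms    float = 2.9;    -- average seek from the spec sheet
DECLARE @rpm            int   = 15000;  -- spindle speed
DECLARE @io_kb          float = 8;      -- IO request size
DECLARE @seq_mb_sec     float = 122;    -- rated sequential throughput

DECLARE @rot_latency_ms float = 0.5 * 60000.0 / @rpm;                    -- 2.0ms at 15k RPM
DECLARE @transfer_ms    float = @io_kb / (@seq_mb_sec * 1024.0) * 1000.0;

SELECT 1000.0 / (@avg_seek_ms + @rot_latency_ms + @transfer_ms) AS est_random_iops; -- ~200
```

Around 200 random IOs per second is all a single 15k drive can realistically deliver, no matter what the sequential throughput number on the box says.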

Configuring your RAID Array

There are several factors that impact a RAID array’s ability to perform: the RAID level, the size of the IO request, and the stripe size. RAID level is the easy one, what kind of hit do you take on writes vs. the capacity of the array. On the stripe size there is a direct correlation with the size of the IO request. If the IO request is bigger than the stripe size it will have to seek across another disk to satisfy the data request. If the IO request size is very small and random, you may lose some IO performance if the requests pile up on one disk causing a hot spot. There are established calculations that you can perform to get an idea of how to configure your array. I’ve built a web page that you can use to do all the basic calculations, the Disk Drive RAID Configuration Tool. These equations are baseline estimates so you aren’t working completely in the dark. You can enter your own drive statistics or pick from one of 1100 hard drives in the database. This web calculator is based on Peter Chen’s equations for estimating RAID performance and best stripe size. I’ll add more to it as I get time.
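
If you just want a quick sanity check before reaching for the calculator, the common write-penalty shortcut gets you in the ballpark. This is the generic rule-of-thumb math, not the Chen equations the calculator uses, and the input numbers are illustrative:

```sql
-- RAID 10: every write lands on both sides of a mirror pair, so each write costs 2 IOs
DECLARE @disks         int   = 24;
DECLARE @disk_iops     float = 200;   -- single-disk random IOPS from the estimate above
DECLARE @write_pct     float = 0.25;  -- fraction of the workload that is writes
DECLARE @write_penalty float = 2.0;   -- would be 4 for RAID 5, 6 for RAID 6

SELECT @disks * @disk_iops /
       ((1.0 - @write_pct) + @write_pct * @write_penalty) AS est_raid10_random_iops; -- ~3840
```

Swap in the penalty for your RAID level and your own read/write mix and you get a first-pass number to compare against the calculator and, later, your benchmarks.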

SQL Server IO Patterns and Array Performance

SQL Server generally works with two specific IO request sizes, 8K and 64K. If you did your due diligence earlier you could also add any other request size that you saw come through. Focusing on the page size and extent size is a good place to start. Using the RAID calculator tool I chose a Seagate Savvio 15K.2 drive as my base. One of the things my calculator can’t take into consideration is your system and RAID HBA. This is where testing is essential. You will find there are anomalies in every card, physical limits on throughput and IOs. Since my RAID card won’t do a stripe bigger than 256k, that is my cap for size. Reading through several IO white papers on SQL Server, the general recommendation for 2000/2005 is a 64k or 128k stripe size, and for SQL Server 2008 a 256k stripe size. I’ve found, as general guidance, this is a good place to start as well. The calculator tells me that for a RAID 10 array with 24 drives at a 256k stripe size and an 8k IO request I should get 9825 IOs/Sec and 76.75 MB/Sec on average, across reads, writes, sequential and random IO requests. That’s right, 76 MB/Sec throughput for 24 drives rated at 122 MB/Sec minimum. That is barely over 3 MB/Sec per drive. The same array at a 64k IO request size yields 8102 IOs/Sec and 506 MB/Sec. A huge difference in throughput just based on the IO request size. Still, not anywhere near 122 MB/Sec per drive. As an estimate, I find that these numbers are “good enough” to start sizing my arrays. If I needed to figure out how big the array needs to be to support, say, 150 MB/Sec of throughput or 10000 IOs/Sec, you can do that with the calculator as well. Armed with our estimates it’s time to actually test our new RAID arrays. I use SQLIO to do synthetic benchmarking before running any actual data loads.
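
The relationship between those two numbers is just IOs/Sec multiplied by the request size, which makes a handy sanity check on any estimate or benchmark result. The figures below are the calculator outputs quoted above:

```sql
-- Throughput is nothing more than IOPS multiplied by the IO request size
DECLARE @iops_8k  float = 9825;
DECLARE @iops_64k float = 8102;

SELECT  @iops_8k  * 8.0  / 1024.0 AS mb_sec_at_8k,    -- ~76.8 MB/Sec
        @iops_64k * 64.0 / 1024.0 AS mb_sec_at_64k;   -- ~506 MB/Sec
```

Small random requests burn the array’s IOs on tiny transfers, which is why the MB/Sec number collapses even though the IOs/Sec stays high.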

After doing a round of testing I found that in some cases the numbers were a little high or a little low. Other factors that are hard to calculate are cache hit ratios. Enterprise RAID HBAs usually disable the write cache on the local disk controller and just use their own battery-backed cache for all write operations. This is safer, but with more and more disks on a single controller the amount of cache per disk can get pretty low. The HBA will also want you to split that between read and write operations. On my HP RAID HBAs the default is 25% read and 75% write. An older study I found on disk caches and cache size saw diminishing returns above 2 MB, gaining between 1 and 2 percent additional cache hits per megabyte of cache. I expect that to flatten out even more as the caches get larger; you simply can’t get 100% cache hit ratios, that would mean the whole drive fit in the RAM cache or your IO requests are the same over and over. Generally, if that is the case, you will find SQL Server won’t have to go to disk, it will have what it needs in the buffer pool for reads. I find that if you have less than 20 percent write activity, leaving the defaults is fine. If I do have a write-heavy load I will set the cache to 100% writes.

The Results

Having completed my benchmarking, I found that a 128k or 256k stripe size was fine on average. Just realize that if you optimize for one IO pattern the others will suffer. Latency is also important and I have included it here as well. You will find that the larger the IO request and the smaller the stripe size, the worse latency gets. Here are the results from my tests on a DL380 G5 with a P411 and 24 drives in an MSA 70 enclosure. I’ve included tests for 8k to 256k stripe sizes.

As a footnote I’d like to thank Joe Handley, Ben Poliakoff, David Gosslin and Dale Davis for helping me get the Disk Drive RAID Configuration Tool together. I’m not a web guy!

WARNING! Lots of charts below!

Read 8K IO Request: 24 73GB 15K Drives, RAID 10, 64K File System Cluster Size, 1 Outstanding IO, 8 Threads
[charts: Random and Sequential results]

Write 8K IO Request: 24 73GB 15K Drives, RAID 10, 64K File System Cluster Size, 1 Outstanding IO, 8 Threads
[charts: Random and Sequential results]

Read 64K IO Request: 24 73GB 15K Drives, RAID 10, 64K File System Cluster Size, 1 Outstanding IO, 8 Threads
[charts: Random and Sequential results]

Write 64K IO Request: 24 73GB 15K Drives, RAID 10, 64K File System Cluster Size, 1 Outstanding IO, 8 Threads
[charts: Random and Sequential results]

Series To Date:
  1. Introduction
  2. The Basics of Spinning Disks
  3. The System Bus
  4. Disk Controllers, Host Bus Adapters and Interfaces
  5. RAID, An Introduction
  6. RAID and Hard Disk Reliability, Under The Covers
  7. Stripe Size, Block Size, and IO Patterns – You are here!
  8. Capturing IO Patterns
  9. Testing IO Systems