Category Archives: IO

Solid State Storage: Enterprise State Of Affairs

Here In A Flash!

It’s been a crazy last few years in the flash storage space. Things really started taking off around 2006 when NAND flash and Moore’s Law got together. In 2010 it was clear that flash storage was going to be a major part of your storage makeup in the future. It may not be NAND flash specifically, though. It will be some kind of memory and not spinning disks.

Breaking The Cost Barrier.

For the last few years, I’ve always told people to price storage on the cost of IO, not the cost of capacity. Flash storage was mainly a niche product solving a niche problem, like speeding up random IO heavy tasks. With the cost of flash storage at or below standard disk based SAN storage, with all the same connectivity and software features, I think it’s time to put flash storage on the same playing field as our old stalwart SAN solutions.

Right now at the end of 2012, you can get a large amount of flash storage. There is still this perception that it is too expensive and too risky to build out all flash storage arrays. I am here to show that cost, at least, isn’t as limiting a factor as you may believe. Traditional SAN storage can run you from 5 dollars a Gigabyte to 30 dollars a Gigabyte for spinning disks. You can easily get into an all flash array in that same range.

Here’s Looking At You Flash.

This is a short list of flash vendors currently on the market. I’ve thrown in a couple of non-SAN types and a couple of traditional SANs that have integrated flash storage in them. Please, don’t email me complaining that X vendor didn’t make this list or that Y vendor has different pricing. All the pricing numbers were gathered from published sources on the internet. These sources include the vendors’ own websites, published costs from TPC executive summaries and official third party price listings. If you are a vendor and don’t like the prices listed here, then publicly publish your price list.

There are always two cost metrics I look at: dollars per Gigabyte in raw capacity and dollars per Gigabyte in usable capacity. The first number is pretty straightforward. The second metric can get tricky in a hurry. On a disk based SAN that pretty much comes down to what RAID or protection scheme you use. Flash storage almost always introduces deduplication and compression, which can muddy the waters a bit.
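To make the two metrics concrete, here is a minimal Python sketch. The raw figure uses the all-SSD IBM DS3524 numbers listed further down (112 drives of 200 GB for 702,908.00 dollars). The usable figures are purely hypothetical: the 50% protection efficiency (RAID 10) and the 2:1 data reduction ratio are assumptions chosen for illustration, not vendor numbers.

```python
def dollars_per_gb(total_cost, capacity_gb):
    """Cost per Gigabyte for a given capacity."""
    if capacity_gb <= 0:
        raise ValueError("capacity_gb must be positive")
    return total_cost / capacity_gb


def usable_capacity_gb(raw_gb, protection_efficiency, data_reduction_ratio=1.0):
    """Estimate usable capacity from raw capacity.

    protection_efficiency: fraction of raw space left after RAID or other
        protection (0.5 for mirroring such as RAID 10).
    data_reduction_ratio: assumed dedup/compression gain; 1.0 means none,
        which is the worst case of incompressible data.
    """
    return raw_gb * protection_efficiency * data_reduction_ratio


# Raw cost for the DS3524 entry: 112 drives of 200 GB each.
total_cost = 702908.00
raw_gb = 112 * 200
raw_cost = dollars_per_gb(total_cost, raw_gb)

# Hypothetical usable figures: RAID 10, without and with 2:1 data reduction.
usable_plain = dollars_per_gb(total_cost, usable_capacity_gb(raw_gb, 0.5))
usable_reduced = dollars_per_gb(total_cost, usable_capacity_gb(raw_gb, 0.5, 2.0))

print(f"raw: {raw_cost:.2f}")
print(f"usable, RAID 10: {usable_plain:.2f}")
print(f"usable, RAID 10 plus 2:1 reduction: {usable_reduced:.2f}")
```

Note how an assumed 2:1 reduction ratio exactly cancels the mirroring overhead. That is why a usable price that bakes in deduplication and compression deserves a second look when your data doesn’t compress.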

Fibre Channel/iSCSI vendor list

Nimbus Data

Appearing on the scene in 2006, they have two products currently on the market: the S-Class storage array and the E-Class storage array.

The S-Class seems to be their lower end entry but does come with an impressive software suite. It provides 10GbE and Fibre Channel connectivity. Looking around at the cost for the S-Class I found a 2.5TB model for 25,000 dollars. That comes out to 9.7 dollars per Gigabyte in raw space. The E-Class is their super scalable and totally redundant unit. I found a couple of quotes that put it at 10.00 dollars a Gigabyte of raw storage. Already we have a contender!

Pure Storage

In 2009 Pure Storage started selling their flash only storage solutions. They include deduplication and compression in all their arrays and include that in the cost per Gigabyte. I personally find this a bit fishy since I always like to test with incompressible data as a worst case for any array. Incompressible data would also drive up their cost per Gigabyte. They claim between 5.00 and 10.00 dollars per usable Gigabyte, and I haven’t found any solid source for public pricing on their array yet to dispute or confirm this number. They also have a generic “compare us” page on their website that at best is misleading and at worst plain lies. Since they don’t call out any specific vendor in their comparison page it’s hard to pin them for falsehoods, but you can read between the lines.

Violin Memory

Violin Memory started in earnest around 2005 selling not just flash based but memory based arrays. Very quickly they transitioned to all flash arrays. They have two solutions on the market today. The 3000 series allows some basic SAN style setups but also has direct attachments via external PCIe channels. It comes in at 10.50 dollars a Gigabyte raw and 12 dollars a Gigabyte usable. The 6000 series is their flagship product and the pricing reflects it. At 18.00 dollars per Gigabyte raw it is getting up there on the price scale. Again, not the cheapest, but they are well established and have been used by and are resold by HP.

Texas Memory Systems/IBM

If you haven’t heard, TMS was recently purchased by IBM. They are based in Houston, TX, and I’ve always had a soft spot for them. They were also the first non-disk based storage solution I ever used. The first time I put a RamSan in and got 200,000 IOs out of the little box I was sold. Of course it was only 64 Gigabytes of space and cost a small fortune. Today they have a solid flash based lineup, both fibre attached and iSCSI attached. I couldn’t find any pricing on the current flagship RamSan 820, but the 620 has been used in TPC benchmarks and is still in circulation. It is a heavyweight at 33.30 dollars a Gigabyte of raw storage.

Skyera

A new entrant into this space, they are boasting some serious cost savings. They claim 3.00 dollars per usable Gigabyte on their currently shipping product. The unit also includes options for deduplication and compression, which can drive the cost down even further. It is also a half depth 1U solution with a built-in 10GbE switch. They are working on a fault tolerant unit due out in the second half of next year that will up the price a bit but add Fibre Channel connectivity. They have a solid pedigree as they are made up of the guys that brought the SandForce controllers to market. They aren’t a proven company yet, and I haven’t seen a unit or been granted access to one either. Still, I’d keep an eye on them. At those price points and with the crazy small footprint it may be worth taking a risk on them.

IBM

I’m putting the DS3524 in a separate entry to give you some contrast. This is a traditional SAN frame that has been populated with all SSD drives. With 112 200 GB drives and a total cost of 702,908.00 dollars, it comes in at 31.00 dollars a Gigabyte of raw storage. That is on the higher end but still in the price range I generally look to stay in.

SUN/Oracle

I couldn’t resist putting a Sun F5100 in the mix. At 3,099,000.00 dollars it is the most expensive array I found listed. It has 38.4 Terabytes of raw capacity, giving us an 80.00 dollars per Gigabyte price tag. Yikes!

Dell EqualLogic

Dell picked up EqualLogic, a SAN manufacturer that focused on iSCSI solutions, back in 2008. This isn’t a flash array. I wanted to add it as contrast to the rest of the list. I found a 5.4 Terabyte array with a 7.00 dollar per Gigabyte raw storage price tag. Not horrible but still more expensive than some of our all flash solutions.

Fusion-io

What list would be complete without including the current king of the PCIe flash hill, Fusion-io? I found a retail price listing for their 640 Gigabyte Duo card at 19,000 dollars, giving us 29.00 dollars per usable Gigabyte. Looking at the next lowest card, the 320 Gigabyte Duo at 7495.00 dollars ups the price to 32.20 dollars per usable Gigabyte. They are wicked fast though 🙂

So Now What?

Armed with a bit of knowledge you can go forth and convince your boss and storage team that a SAN array fully based on flash is totally doable from a cost perspective. It may mean taking a bit of a risk but the rewards can be huge.

The Fundamentals of Storage Systems – Shared Consolidated Storage Systems

Shared Consolidated Storage Systems – A Brief History

Hey, “Shared Consolidated Storage Systems” did you just make that up? Why yes, yes I did.

For as long as we have had computers there has been a need to store and retrieve data. We have covered the basics of hard disks, RAID and solid state storage. We have looked at all of this from the perspective of storage that is directly attached to a single server. It’s time we expand to attaching storage pools to servers via some kind of network. The reason I chose to say shared and consolidated storage instead of just SAN or Storage Area Network was to help define, broaden and give focus to what we really mean when we say SAN, NAS, Fibre Channel or even iSCSI. To understand where we are today we need to take a look back at how we got here.

Once, There Were Mainframes…

Yep, I know you have heard of these behemoths. They still roam the IT Earth today. Most of us live in an x86 world though. We owe much to Mainframes. One of these debts is networked storage. Way back when, I’m talking like the 1980’s now, Mainframes would attach to their storage via a system bus. This storage wasn’t internal the way we think of direct attached storage though. They had massive cables running from the Mainframe to the storage pods. The good folks at IBM and other big iron builders wanted to simplify the standard for connecting storage and other peripherals.

Who doesn’t love working with these cables?

You could never lose this terminator!

Out With The 1960’s And In with the 1990’s!

Initially, IBM introduced its own standard in the late 80’s to replace the well aged bus & tag and other similar topologies with something that was more robust and could communicate over optical fiber. ESCON was born. The rest of the industry backed Fibre Channel, a protocol that works over optical fiber or copper based networks. More importantly, it would be driven by a standards body and not a single vendor. Eventually, Fibre Channel won out. In 1994 Fibre Channel was ratified and became the de facto standard; even IBM got on board. Again, we are still talking about connecting storage to a single Mainframe, though longer connections were possible and the cabling got a lot cleaner. To put this in perspective, SQL Server 4.2 was shipping at that point with 6.0 right around the corner.

High Performance Computing and Editing Video.

One of the other drivers for Fibre Channel was the emerging field of High Performance Computing (HPC) and the need to connect multiple mainframes or other compute nodes to backend storage. Now we are really starting to see storage attached via a dedicated network that is shared among many computers. High end video editing and rendering farms also drove Fibre Channel adoption. Suddenly, those low end PC-based servers had the ability to connect to large amounts of storage just like the mainframes.

Commodity Servers, Enterprise Storage.

Things got interesting when Moore’s Law kicked into high gear. Suddenly you could buy a server from HP, Dell or even Gateway. With the flood of cheaper yet powerful servers containing either an Intel, MIPS, PPC or Alpha chip you didn’t need to rely on the mainframe so heavily. Coupled with Fibre Channel, you suddenly had the makings of a modern system. One of the biggest challenges in this emerging commodity server space was storage management. Can you deal with having hundreds of servers and thousands of disks without any real management tools? What if you needed to move some unused storage from server A to server B? People realized quickly that maintaining all these islands of storage was costly and also dangerous. Even if they had RAID systems, if someone didn’t notice the warnings you could lose whole systems, and the only people who knew something was up were the end users.

Simplify, Consolidate, Virtualize and Highly Available

Sound familiar? With the new age of networked storage we needed new tools and methodologies. We also gained some nifty new features. Network attached storage became much more than a huge hard drive. To me, if you are calling your storage solution a SAN it must have a few specific features.

Simplify

Your SAN solution must use standard interconnects. That means if it takes a special cable that only your vendor sells it doesn’t qualify. In this day and age, if a vendor is trying to lock you into specific interface cards and cables they are going to go the way of the dodo very quickly. Right now the two main flavors are Fiber Optics and copper twisted pair a.k.a. Ethernet. It must also reduce your management overhead. This usually means a robust software suite above and beyond your normal RAID card interface.

Consolidate

It must be able to bring all your storage needs together under one management system. I’m not just talking disks. Tape drives and other storage technologies like deduplication appliances are in that category. The other benefit to consolidation is generally much better utilization of these resources. Again, this falls back to how robust the software stack is that your SAN or NAS comes with.

Virtualize

It must be able to abstract low level storage objects away from the attached servers allowing things like storage pools. This plays heavily into the ability to manage the storage that is available to a server and maintain consistency and up time. How easily can I add a new volume? Is it possible to expand a volume at the SAN level without having to take the volume off-line? Can other resources share the same volumes enabling fun things like clustering?

Highly Available

If you are moving all your eggs into one HUGE basket it better be one heck of a basket. That means things like redundant controllers, where one controller head can fail but the SAN stays online without any interruption to the attached servers. It also means multiple paths into and out of the SAN so you can build out redundant network paths to the storage. Other aspects like SAN to SAN replication to move your data to a completely different storage network in the same room or across the country may be available for a small phenomenal add on fee.

If your SAN or NAS hardware doesn’t support these pillars then you may be dealing with something as simple as a box of disks in a server with a network card. Realize that most SANs and NAS’es are just that. Specialized computers with lots of ways to connect with them and some really kick-ass software to manage it all.

Until Next Time…

Now that we have a bit of history and a framework we will start digging deep into specific SAN and NAS implementations. Where they are strong and where they fall flat.

Speaking at PASS Summit 2012

It’s Not A Repeat

Speaking at the PASS Summit last year was one of the highlights of my career. I had a single regular session initially and picked up an additional session due to a drop in the schedule. Both talks were fun and I got some solid feedback.

The Boy Did Good

I won’t say great, there were some awesome sessions last year. I did do well enough to get an invite to submit for all the “invite only sessions”. I was stunned. I don’t have any material put together for a half day or a full day session yet and the window to submit sessions was a lot smaller this year. But I do have three new sessions and all of them could easily be extended from 75 minutes to 90 minutes. So, I submitted for both regular sessions and spotlight sessions and got one of both! WOO HOO!

The Lineup

I’ll be covering two topics near and dear to my heart.

How I Learned to Stop Worrying and Love My SAN [DBA-213-S]
Session Category: Spotlight Session (90 minutes)
Session Track: Enterprise Database Administration & Deployment

SANs and NASs have their challenges, but they also open up a whole new set of tools for disaster recovery and high availability. In this session, we’ll cover several different technologies that can make up a Storage Area Network. From Fibre Channel to iSCSI, there are similar technologies that every vendor implements. We’ll talk about the basics that apply to most SANs and strategies for setting up your storage. We’ll also cover SAN pitfalls as well as SQL Server-specific configuration optimizations that you can discuss with your storage teams. Don’t miss your chance to ask specific questions about your SAN problems.

I’ve built a career working with SAN and System Administrators. The goal of this session is to get you and your SAN Administrator speaking the same language, and to give you tools that BOTH of you can use to measure the health and performance of your IO system.

Integrating Solid State Storage with SQL Server [DBA-209]
Session Category: Regular Session (75 minutes)
Session Track: Enterprise Database Administration & Deployment

As solid state becomes more mainstream, there is a huge potential for performance gains in your environment. In this session, we will cover the basics of solid state storage, then look at specific designs and implementations of solid state storage from various vendors. Finally, we will look at different strategies for integrating solid state drives (SSDs) in your environment, both in new deployments and upgrades of existing systems. We will even talk about when you might want to skip SSDs and stay with traditional disk drives.

I’ve spoken quite a bit on solid state storage fundamentals. This time around I’ll be tackling how people like me and vendors are starting to mix SSDs into the storage environment, where it makes sense and where it can be a huge and costly mistake.

Finally

I hope to see you at the Summit again this year! Always feel free to come say hi and chat a bit. Networking is as important as the sessions and you will build friendships that last a lifetime.

Pliant Technology, Enterprise Flash Drives For Your SQL Server: Part 2

Adding In Others For Contrast

In our first part we introduced Pliant and the LS 300 drive. In part 2 we get down to the details. To give a better idea of where you stand with the setup described last time, I’m throwing in two other storage setups. The first is a RAID 10 array made up of 12 500GB 7200 RPM drives attached via SATA II controllers. In a RAID 0 configuration I was able to get 800MB/sec in sequential throughput, so it isn’t horrible, just not “enterprise” worthy. The second is a Patriot Torqx 128GB based on the Indilinx Barefoot SSD controller. It’s not the greatest SSD on the consumer market, but Indilinx was the king of the previous generation. I will be using the LSI controller just like I did for the Pliant LS 300.

Patriot Torqx Specifications:
Available in 64GB, 128GB and 256GB capacities
Interface: SATA I/II
Raid Support: 0, 1, 0+1
256GB and 128GB: Sequential Read: up to 260MB/s Sequential Write: up to 180MB/s
MTBF: >2,500,000 Hours
Data Retention: 5 years at 25°C
Data Reliability: Built in BCH 8, 12 and 16-bit ECC
10 Year Warranty

RAID support? I’m not sure what they are saying here other than don’t put this drive in a RAID 5 or RAID 6 setup at all. Mean time between failures (MTBF) is a pretty useless number. I would have rather seen a maximum write life or writes per day metric. It has ECC error checking; since this is an MLC based drive, that doesn’t surprise me at all. 10 year warranty, yep 10 YEARS! This was one of the reasons I bought this drive. And I’m glad I did, it has already been replaced once.
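One way to see why a bare MTBF figure tells you so little is to convert it into an annualized failure rate. The sketch below assumes a constant failure rate (an exponential model) and a drive powered on around the clock. Both are assumptions, not something from the spec sheet, and the model says nothing about flash wear-out, which is what a write life metric would cover.

```python
import math

HOURS_PER_YEAR = 24 * 365


def annualized_failure_rate(mtbf_hours):
    """Chance that a drive fails within one year of continuous use,
    assuming a constant failure rate (exponential model)."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


# MTBF from the Torqx spec sheet above, treated as exactly 2,500,000 hours.
afr = annualized_failure_rate(2_500_000)
print(f"Annualized failure rate: {afr:.2%}")
```

Under these assumptions the spec sheet number works out to well under one percent of drives failing per year, which is hard to square with a drive that has already been replaced once.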

The Setup

Since we are just testing storage systems I’m not as concerned with the host machine. It is more than up to the task of generating IOs. I used Iometer 2008.06.18-RC2 for testing and my trusty Iometer SQL Server IO Patterns File. After the test runs I used my other tool, the Iometer output parser and importer, to process the results and import them into a SQL Server table. The tests consisted of two different patterns. These two patterns are close to what I’ve seen in the real world and loosely based on the Intel database test pattern. I ran these tests at different queue depths with a single worker.

OLTP Heavy Read:
A mix of 8KB and 64KB size requests with 90% of them being read requests and 10% being write requests. This test is 100% random access.

OLTP Moderate Read:
A mix of 8KB and 64KB size requests with 65% of them being read requests and 35% being write requests. This test is 100% random access.
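If it helps to picture what these patterns look like as a stream of requests, here is a rough Python sketch that generates one. It is not the Iometer configuration file itself. The read/write split and the request sizes come from the descriptions above, but the even weighting between 8KB and 64KB requests is an assumption, as are the names AccessPattern and generate_requests.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPattern:
    name: str
    read_pct: int        # percentage of requests that are reads
    sizes_kb: tuple      # request sizes in the mix
    size_weights: tuple  # ASSUMPTION: relative weight of each size


# Read/write split and sizes are from the text; the 1:1 size weights are assumed.
OLTP_HEAVY_READ = AccessPattern("OLTP Heavy Read", 90, (8, 64), (1, 1))
OLTP_MODERATE_READ = AccessPattern("OLTP Moderate Read", 65, (8, 64), (1, 1))


def generate_requests(pattern, count, device_kb, seed=0):
    """Yield (operation, offset_kb, size_kb) tuples with 100% random access."""
    rng = random.Random(seed)
    for _ in range(count):
        size = rng.choices(pattern.sizes_kb, weights=pattern.size_weights)[0]
        op = "read" if rng.randrange(100) < pattern.read_pct else "write"
        # Pick a random offset anywhere on the device, aligned to the request size.
        offset = rng.randrange(0, device_kb // size) * size
        yield op, offset, size


# Simulate 10,000 requests against a hypothetical 1 GB test file.
requests = list(generate_requests(OLTP_HEAVY_READ, 10_000, device_kb=1024 * 1024))
reads = sum(1 for op, _, _ in requests if op == "read")
print(f"{OLTP_HEAVY_READ.name}: {reads / len(requests):.1%} reads")
```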

Lots And Lots of Graphs

This first set is OLTP Heavy Read at a queue depth of 1. Average Response Time is in milliseconds (ms).

Interesting to see the Torqx drive actually performing better than the Pliant drive. Since this is an extremely light load and mostly read only we can assume that the Torqx is tuned more towards that kind of workload. The hard disks put in a respectable showing, for hard disks.

OLTP Heavy Read at a queue depth of 4. Average Response Time is in milliseconds (ms).

As soon as we put some kind of load on it, the Pliant drive just walks away from the other two. The Torqx is still five times faster than the RAID 10 setup.

OLTP Heavy Read at a queue depth of 8. Average Response Time is in milliseconds (ms).

Again, as the workload ramps up the Pliant really just ends up in a category all its own. We are still in a decent zone for the RAID setup but the single Torqx drive still is four to five times faster.

OLTP Heavy Read at a queue depth of 32. Average Response Time is in milliseconds (ms).

Now we are pushing past the bounds of the SATA based Torqx and the SATA based RAID setup. The Pliant drive just keeps getting faster jumping from 13,000 IO/sec to 22,000 IO/sec. Response times are still very impressive as well.

OLTP Heavy Read at a queue depth of 128. Average Response Time is in milliseconds (ms).

This is what we would call a “worst case scenario” for the RAID setup. With only 12 drives we are at a queue length of more than 10 for each drive. Response times are showing it too, with the average being 110ms. Even the Torqx drive can’t shed the IO load at this point, while the Pliant drive drives past 26,000 IO/sec and inches up on 500MB/sec as well. That last statement is accurate. Even though it’s a SAS 300 drive, it is dual-ported and able to use both ports for reads and writes. I did run the test up to 256 outstanding IOs, but the Pliant drive was capped out and was starting to add some to the response time. The RAID array and the Torqx drive were getting so slow that the Pliant drive was hard to see on the average response time graph.
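Queue depth, response time and IO/sec are tied together by Little’s Law: outstanding IOs equal throughput multiplied by average response time. The sketch below applies it to the numbers quoted above. It is a back-of-the-envelope check that assumes the queue stays full for the whole run; the measured graphs remain the real source of truth, and the function names are my own.

```python
def iops_from_response_time(queue_depth, avg_response_ms):
    """Little's Law: outstanding IOs = IO/sec * response time in seconds."""
    return queue_depth / (avg_response_ms / 1000.0)


def response_ms_from_iops(queue_depth, iops):
    """The same relation, solved for average response time in milliseconds."""
    return queue_depth / iops * 1000.0


QUEUE_DEPTH = 128

# RAID array: a 110 ms average response time at 128 outstanding IOs.
raid_iops = iops_from_response_time(QUEUE_DEPTH, 110)

# Pliant LS 300: roughly 26,000 IO/sec at the same queue depth.
pliant_ms = response_ms_from_iops(QUEUE_DEPTH, 26_000)

# Outstanding IOs per spindle if the load spreads evenly over 12 drives.
per_drive_queue = QUEUE_DEPTH / 12

print(f"RAID array: about {raid_iops:.0f} IO/sec implied")
print(f"Pliant: about {pliant_ms:.2f} ms implied")
print(f"Queue per drive: about {per_drive_queue:.1f}")
```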

This second set is OLTP Moderate Read at a queue depth of 1. Average Response Time is in milliseconds (ms).

This workload is much more write intensive and the Pliant LS 300 jumps out in front very quickly. Even at a queue depth of 1 it is shaming the Torqx on write performance. The RAID array is performing pretty well with lower than expected response times.

OLTP Moderate Read at a queue depth of 4. Average Response Time is in milliseconds (ms).

Quickly the Pliant drive starts to walk away with this contest. It clearly has much more capacity for write workloads than the Torqx or RAID array.

OLTP Moderate Read at a queue depth of 8. Average Response Time is in milliseconds (ms).

Here we are again at the end of the road for the RAID array. The Torqx drive is holding on but response times are getting long. It is only managing to pull a twofold increase in performance over the RAID array.

OLTP Moderate Read at a queue depth of 32. Average Response Time is in milliseconds (ms).

Now things are just embarrassing for the RAID array and the Torqx drive. Both show that write heavy workloads aren’t the best fit for them. Again, the Pliant drive is starting to get response times in the millisecond range, but at 320MB/Sec and 18,000 IO/Sec I would have to call that a fair trade.

OLTP Moderate Read at a queue depth of 128. Average Response Time is in milliseconds (ms).

At last we have hit a wall with the RAID array and the Torqx drive. With the Torqx drive posting up numbers that are less than two times the RAID array it is starting to show its real weaknesses. The Pliant drive however is pulling a solid 22,000 IO/Sec and creeping up on 430MB/Sec of throughput. All of this from a single SAS 3.5″ drive.

Final Thoughts

I’ve had the Pliant LS 300 in my lab for quite a while now. I’ve also had the Patriot Torqx and this particular RAID array setup. All three have been running hard during the last three months. The Pliant drive did show some signs of slowing down as it settled into the workloads. The RAID array lost three drives total and, as I stated earlier, the first Torqx drive I had gave up the ghost in the first month. I’ve said it before, and I will say it again. If you need an enterprise drive then buy an enterprise drive! Don’t get a drive that has a SATA interface and is dressed up like it is ready for the big show. I can say without a doubt that the Pliant LS 300 is one of the finest solid state disks I’ve ever worked with.

Understanding Benchmarks

That Means What?

Vizzini: HE DIDN’T FALL? INCONCEIVABLE.

Inigo Montoya: You keep using that word. I do not think it means what you think it means.

– Princess Bride

If you are like me, you are constantly reading up on the latest hardware. Each site has its own spin on what makes up its review. All of them use some kind of synthetic benchmarking software. Some don’t rely too heavily on them because they can show the real world performance using playback tools. This method is used heavily on gaming hardware sites like [H]ard|OCP, where they decided long ago that using purely synthetic benchmarks was at best inaccurate and at worst flat misleading. In the graphics card and processor space this is especially so. Fortunately, on the storage side of the house things are a little simpler.

What’s In A Workload

In the processor space measuring performance is a complicated beast. Even though every processor may be able to run the same software they can vary wildly in how they do it. On the processor side of things I favor Geekbench right now since it uses known mathematical algorithms. John Poole is very open on how Geekbench works. Are the benchmarks relevant to database workloads? I’ll be exploring that in a future post.

In the storage space we have a pretty standard benchmarking tool in Iometer. This tool was initially developed by Intel and spread like wildfire throughout the industry. Intel quit working on it but did something very rare and turned it over to the Open Source Development Lab for continued development. You may ask why I favor Iometer over SQLIO. The answer is simple: complexity. Iometer allows me to simulate different read/write patterns in a very predictable manner. SQLIO doesn’t simulate complex patterns. It does reads or writes, random or sequential, for a fixed duration. This is fine for finding the peak performance of a specific IO size but doesn’t really tell you how your storage system might respond under varying workloads.

You may notice that the only sites that use SQLIO are SQL Server sites, while the rest of the world generally uses Iometer. The problem is none of the sites that I regularly visit publish the exact Iometer settings they used to get the results they publish. Tom’s Hardware, Anandtech, Ars Technica and Storage Review all use Iometer in some fashion. After doing some digging and testing similar hard drives, I think most of the sites are using a mix of 67% reads and 33% writes, 100% random, at a 2KB block, which was defined by Intel and represents an OLTP workload. Storage Review did a nice writeup a decade ago on what they use for I/O patterns and Iometer. This isn’t the best fit for a purely SQL Server workload but isn’t the worst either. By moving from a 2KB block to an 8KB block we are now squarely in SQL Server I/O land.
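Block size matters when you compare published results, because the same IO/sec figure means very different bandwidth at 2KB and at 8KB. Here is a small Python sketch of the conversion. The 10,000 IO/sec figure is hypothetical and chosen only to make the arithmetic easy to follow.

```python
def throughput_mb_per_sec(iops, block_kb):
    """Bandwidth implied by an IO/sec figure at a fixed block size."""
    return iops * block_kb / 1024.0


HYPOTHETICAL_IOPS = 10_000  # illustrative figure, not a measured result

for block_kb in (2, 8, 64):
    mb = throughput_mb_per_sec(HYPOTHETICAL_IOPS, block_kb)
    print(f"{HYPOTHETICAL_IOPS} IO/sec at {block_kb}KB blocks = {mb:.1f} MB/sec")
```

A drive that looks great at 2KB may be leaning on a bandwidth budget it simply doesn’t have at 8KB or 64KB, so always check which block size sits behind a headline number.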

SQL Server Specific

Now we are starting to get to the root of the problem. All the main hardware review sites don’t focus on us at all. If we are lucky there will be a single column marked “Database workload”. So what do we do? You read, research and put together your own test suite. SQL Server I/O access patterns are pretty well documented. So, I put those general patterns in an Iometer configuration file and keep it in my back pocket. I have posted a revised file in the My Tools section here on the site.

For the storage stuff that is fine but what about CPU and memory throughput? Things get a little murky here. Like Glenn Berry (blog|twitter) and me, you can use Geekbench to get a baseline on those two things but again, this isn’t a SQL Server specific tool. In most cases sampling a workload via trace, getting a baseline on performance, then replaying that same workload on different servers will help, but it only tells you about your application. If you are looking for general benchmarks I personally wouldn’t put much stock in the old TPC-C tests anymore. They aren’t a realistic assessment of database hardware at this point. It is pretty easy to stack a ton of memory and throw a bunch of CPUs at the test to get some ridiculous numbers. I personally look at TPC-E for OLTP tests, since there is a decent sampling of SQL Server based systems, and TPC-H for data warehouse style benchmarks. As always, don’t expect the exact same numbers on your system that you see on the TPC benchmark scores. Even TPC tells you to take the numbers with a grain of salt.

My Personal Reader List

I personally follow Joe Chang (blog) for hard core processor and storage stuff. He has a keen mind for detail. I also read Glenn Berry (blog|twitter), who has some deep experience with large SQL Server deployments. Also, Paul Randal (blog|twitter), because he has more hardware at his house than I do and puts it to good use. I would advise you to always try and find out how the benchmark was performed before assuming that the numbers will fit your own environment.

What’s On My Todo List

I wrote a TPC-C style benchmark quite a while back in C#. I’m currently building up instructions for TPC-E and TPC-H using the supplied code and writing the rest myself in hopes of building up a benchmark database. This will be in no way an official TPC database or be without bias. I’m also always updating my Iometer and SQLIO tools, with full instructions on how I run my tests so you can validate them yourself.

As always if you have any suggestions or questions just post them up and I’ll do my best to answer.