Category Archives: Testing

Why does change control for SQL Server have to be so hard?

I’ve been dealing with change control and source code repositories for most of my professional career. While I’ve seen change control and integration advance steadily on the application side, it feels like the database part of things is stuck in the stone age. For months now I’ve been researching solutions for source control, change management, and deployment of database objects. The conclusion I’ve come to is there is no solution. Well, no easy solution. I was very happy in the early days of SQL Server 2005 when they announced source control integration in Management Studio. It was a great pain for me personally to have Visual Studio and the solution architecture it offered, and not have that on the database side of things. Alas, it wasn’t meant to be. What they meant by source control was taking the previous generation of integration and then crippling it.

Really?

[screenshot: the source control options in SSMS]

This doesn’t look like much of a solution to me.

I know what most of you are thinking: if you have Visual Studio, use it. That works for me, but not for the people on my team who only have access to SSMS. It also means I have to jump between two tools to do one thing: work with SQL Server. I have been told that Microsoft is basically pushing you to Visual Studio for all of your development needs, leaving SSMS as a management tool only. If Visual Studio did everything SSMS did it wouldn’t be that big a deal for me personally.

 

Options Available

SQL Server Management Studio Hacks

I tried several things to work around the limitations SSMS has. I found you could manually edit the solution file to get extra folders. The only problem with that is they all show up as either Queries or Miscellaneous. Other than that one, and the old fix for sorting files by name, there aren’t any other hacks I can find.

Toad for SQL Server

[screenshot of Toad for SQL Server]

Toad generally has a nice look and feel. It has all the development and management features to be a true replacement for Management Studio. I tried all the normal things that I do in SSMS in Toad, and several things were better. The debugger was nice and the statement optimizer is also a nice addition. It does fall flat in some basic key areas, though. I never could get it to display an execution plan. As a T-SQL guy the plan is a must. I know it is a bug somewhere, but having something this fundamental broken during an evaluation is a big red light.

The other big downside is it doesn’t support SourceGear Vault/Fortress, which is a real shame. Lots of SMBs use Vault for source control since it is miles better than Visual SourceSafe and much cheaper than Team System.

ApexSQL Edit

[screenshot of ApexSQL Edit]

That left only one other contender in this fight. ApexSQL Edit has been around quite a while as well. It has a similar look and feel to Toad; I know there isn’t a lot you can do there since both look like Office. It is also missing the management features, but I can live with that. The goal is to get the developers a tool they can develop in and use our code repository easily. ApexSQL Edit did include support for Vault and it worked as expected. Again, I started using it daily like I would SSMS. Everything I tried worked, for the most part. 95% of the time it would generate an execution plan. Not as clean as SSMS, but it had more options on how to display the plan, which I liked. I did have a few crashes, but this was a beta build and I will let that go until I test the full release. Since this was a beta I did provide feedback, and initially the folks at ApexSQL were very responsive. Eventually, though, everything just went quiet except for the sales guys asking me how things were going. Right now they are a no-go until the stability issues are addressed and the RTM is out so I can do a full evaluation again.

 

Final Thoughts

What I hoped would be a pretty easy exercise turned out to be a real workout. For all of SSMS’s problems it is stable and familiar. I was really hoping that either Toad or ApexSQL Edit would solve my problems. I haven’t given up on ApexSQL Edit yet; we will just have to play the waiting game and keep using an inadequate solution until someone comes up with something better.

When Technical Support Fails You – UPDATE and Answers!

As promised, an update on what has happened so far. First, a correction: the P800 is a PCIe 1.0 card, so the bandwidth is cut in half from 4GB/sec to 2GB/sec.

My CDW rep did get me in contact with an HP technical rep who actually knew something about the hardware in question and its capabilities. It was one of those good news, bad news situations. We will start with the bad news: the performance isn’t off; this is really all the hardware will deliver. My worst fears were confirmed.

The Hard Disks

The HP Guy (changing the names to protect the innocent) told me their rule of thumb for the performance of the 2.5” 73GB 15K drives is 10MB/sec. I know what you are thinking: NO WAY! But I’m not surprised at all. What I was told is the drives ship with the on-board write cache disabled. They do this for data integrity reasons: since the cache on the drive isn’t battery backed, any kind of failure carries the potential for data loss. There are three measurements of hard disk throughput: disk to cache, cache to system, and disk to system. Disk to cache is how fast data can be transferred between the drive’s internal data cache and the platters, usually sequentially. On our 15k drive this should be on average 80MB/sec. Cache to system, also referred to as burst speed, is almost always as fast as our connection type. Since we are using SAS that will be close to 250MB/sec. Disk to system is no caching at all. Without the cache several IO reordering schemes aren’t used and there is no buffer between you and the system, so you are effectively limited by the areal density and the rotational speed of the disk. This gets us down to 10 to 15 megabytes a second. Write caching has a huge impact on performance. I hear you saying the controller has a battery-backed cache on it, and you would be right.

The Disk Controller

The P800 controller was the top of the line that HP had for quite a while. It is showing its age now, though. The most cache you can get at the moment is 512MB. It is battery backed, so if there is a sudden loss of power the data in cache will stay there for as long as the battery holds out; when the system comes back on, the controller will attempt a flush to disk. The problem with this scheme is twofold. First, the cache is effectively shared across all your drives; since I have 50 drives total attached to the system, that is around 10.5 megabytes per drive. Comparable drives normally ship with 16 to 32 megabytes of cache on them. Second, the controller can’t offload the IO sorting algorithms to the disk drive, effectively limiting its throughput. It does support native command queuing and elevator sorting, but applied at the controller level that just isn’t as fast as at the disk level. If I had configured this array as a RAID 6 stripe, the loss of performance from that would have masked the other bottlenecks in the controller. Since I’ve got this in a RAID 10, the bottleneck is hit much sooner, with fewer drives. On the P800 this limit appears to be between 16 and 32 disks. I won’t know until I do some additional testing.
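To put those figures side by side, here is a quick back-of-the-envelope sketch in PowerShell using the numbers above. Nothing in it is measured; it is just the arithmetic behind the rule of thumb.

$controllerCacheMB = 512
$driveCount        = 50
$perDriveShareMB   = $controllerCacheMB / $driveCount   # roughly 10 MB of shared controller cache per drive

$cachedSeqWriteMBs   = 80    # disk to cache: sequential write with the drive's own cache enabled
$burstMBs            = 250   # cache to system: burst speed over a SAS connection
$uncachedSeqWriteMBs = 12    # disk to system: middle of the 10-15 MB/sec range with cache disabled

"Per-drive share of controller cache : {0:N1} MB" -f $perDriveShareMB
"Sequential write, cache on vs off   : {0} vs {1} MB/sec" -f $cachedSeqWriteMBs, $uncachedSeqWriteMBs
"Burst (cache to system) over SAS    : {0} MB/sec" -f $burstMBs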

It’s All My Fault

If you have been following my blog or coming to the CACTUSS meetings, you know I tell you to test before you go into production. With the lack of documentation I went with a set of assumptions that weren’t valid in this situation. At that point I should have stopped and done the testing myself. In a perfect world I would have set up the system in a test lab, run a series of controlled IO workloads, and come up with the optimal configuration. I didn’t do as much testing as normal and now I’m paying the price for that. I will have to bring a system out of production as I run benchmarks to find the performance bottlenecks.
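Even a crude scripted test would have flagged the problem early. Here is a minimal PowerShell sketch of the idea; the path is a placeholder, and because it writes through the file system cache it is only a sanity check, not a substitute for a proper IO benchmark tool.

# Stream 4 GB to the array under test in 64 KB chunks and report MB/sec.
# Crude on purpose: it goes through the OS cache, so treat the result only
# as a "is this wildly off from what I expect" check.
$path   = 'E:\iotest.dat'              # placeholder path on the array being tested
$chunk  = New-Object byte[] (64KB)
$chunks = 65536                        # 65,536 x 64 KB = 4 GB

$sw = [System.Diagnostics.Stopwatch]::StartNew()
$fs = [System.IO.File]::Create($path)
for ($i = 0; $i -lt $chunks; $i++) { $fs.Write($chunk, 0, $chunk.Length) }
$fs.Flush()
$fs.Close()
$sw.Stop()

"{0:N0} MB/sec sequential write" -f (4096 / $sw.Elapsed.TotalSeconds)
Remove-Item $path                      # clean up the test file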

The Good News

I have two P800s in the system and will try moving one of the MSA70s to the other controller. This will also allow me to test overall system performance across multiple PCIe buses. I have another system that is an exact duplicate of this one; it originally had the storage configured that way but ran into some odd performance issues as well.

HP has a faster external-only controller out right now, the P411. This controller supports the new SAS II 6G protocols, has faster cache memory, and is PCIe 2.0 compliant. I am told it has a faster IO processor as well. We will be testing these newer controllers out soon. There is also a replacement for the P800 coming out next year. Since we are only using external chassis with this card, the P411 may be a better fit.

We are also exploring a Fusion-io option for our tempdb space. We have an odd workload, and tempdb accounts for half of our write operations on disk. By speeding up this aspect of the system and moving tempdb completely away from the data, we should see a marked improvement overall.
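Relocating tempdb itself is the easy part once the new storage is in. A hedged sketch of the move; the instance name and the F: path are placeholders, and it assumes the default tempdev/templog logical file names. The files only move when the SQL Server service restarts.

# Point tempdb at the new volume (placeholder F:) and restart SQL Server so
# the move takes effect. Assumes a default instance and default logical names.
sqlcmd -S MYSERVER -E -Q "ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'F:\tempdb\tempdb.mdf');"
sqlcmd -S MYSERVER -E -Q "ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'F:\tempdb\templog.ldf');"
Restart-Service -Name MSSQLSERVER -Force   # tempdb is recreated in the new location on startup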

Lessons Learned or Relearned

Faced with the lack of documentation, don’t make assumptions based on past experiences. Test your setup thoroughly. If you aren’t getting the information you need, try different avenues early. Don’t assume your hardware vendor has all the information. In my case, HP doesn’t tell you that the disks come with the write cache disabled. They also don’t give you the full performance specifications for their disk controllers. Not even my HP Guy had that information. We talked about how there was much more detailed information on the EVA SAN than there was on the P800.

Now What?

Again, I can’t tell you how awesome CDW was in this case. My rep, Dustin Wood, went above and beyond to get me as much help as he could, and in the end was a great help. It saddens me that I couldn’t get this level of support directly from HP technical support. You can rest assured I will be giving HP feedback to that effect. Not giving the customer, or even their own people, all the information sets everyone up for failure.

I’m not done yet. There is a lot of work ahead of me, but at least I have some answers. You can bet I’ll be over at booth #414 next week at PASS asking HP some hard questions!

When Technical Support Fails You

I have had the pleasure of being a vendor and doing technical support for both hardware and software products. I know it isn’t easy. I know it isn’t always possible to fix everything. But the level of support I’ve received from HP on my current issue is just unacceptable. This is made more frustrating by the lack of documentation. The technical documents show capacity: how many drives in an array, maximum volume size, but nothing on throughput. Every benchmark they have seems to be relative to another product, with no hard numbers. For example, the P800 is 30% faster than the previous generation.

I’m not working with a complicated system. It’s a DL380 G5 with a P800 and two MSA70s fully populated with 15k 73GB hard drives. 46 of them are in a RAID 10 array with a 128k stripe, formatted NTFS with a 64k block size, and the partition is sector aligned. Read/write cache is set at 25%/75%. This server originally had just one MSA70. We added the second for capacity expansion and expected to see a boost in performance as well. As you can probably guess, there wasn’t any increase in performance at all.
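If you want to double-check those formatting details on your own box, a couple of quick commands will confirm them. This assumes the array volume is E:; swap in your own drive letter.

# Bytes Per Cluster should come back as 65536 for a 64 KB NTFS allocation unit.
fsutil fsinfo ntfsinfo E:

# Partition starting offsets: for an aligned partition on a 128 KB stripe the
# offset should divide evenly by 65536 (the old default of 32256 does not).
Get-WmiObject Win32_DiskPartition |
    Select-Object Name, Index, StartingOffset |
    Format-Table -AutoSize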

Here is what I have as far as numbers. Some of these are guesses based on similar products.

P800 using two external miniSAS 4x connectors, maximum throughput of 2400 MB/sec (300MB/sec per 3Gbit link x 4 links per connector x 2 connectors).
The P800 uses a PCIe x8 connection to the system at 4,000 MB/sec (assuming PCIe 2.0 at 4GB/sec each direction).
Attached to the controller are 46 15k 73GB 2.5” hard drives, for a raw sequential read or write speed of 3680 MB/sec (46 x 80MB/sec; with the 46 drives across two MSA70s in RAID 10, writes see 23 effective spindles, based on the Seagate 2.5” 73GB 15K SAS drive).

Expected write speed should be around 1200 megabytes a second.

We get around 320 MB/Sec sequential write speed and 750MB/sec in reads.

Ouch.
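To put the gap in perspective, here is the same back-of-the-envelope arithmetic in PowerShell, using only the figures above. The PCIe number is the original (later corrected) assumption, and the expected ~1200 MB/sec allows for overhead below the raw spindle ceiling.

# Theoretical ceilings vs. measured throughput, using the post's own estimates.
$sasLinkMBs      = 300                  # one 3 Gbit/sec SAS link
$p800ExternalMBs = $sasLinkMBs * 4 * 2  # two miniSAS 4x connectors = 2,400 MB/sec
$pcieMBs         = 4000                 # assumed PCIe 2.0 x8 (the update corrects this to 2,000)

$spindleWriteMBs = 80                   # sequential write per 15K 2.5" drive
$raid10Spindles  = 23                   # 46 drives in RAID 10 = 23 effective data spindles
$rawWriteMBs     = $raid10Spindles * $spindleWriteMBs   # 1,840 MB/sec before any overhead

$measuredWrite   = 320
$measuredRead    = 750

$ceiling = [Math]::Min($rawWriteMBs, [Math]::Min($p800ExternalMBs, $pcieMBs))
"Lowest theoretical write ceiling : {0:N0} MB/sec" -f $ceiling
"Measured sequential write       : {0:N0} MB/sec" -f $measuredWrite
"Measured sequential read        : {0:N0} MB/sec" -f $measuredRead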

Did I mention I also have an MSA60 with eight 7.2k 500GB SATA drives in a RAID 10 array that burst to 600MB/sec and sustain 160MB/sec writes? Yeah, something is rotten in the state of Denmark.

With no other options before me I picked up the phone and called.

I went through HP’s automated phone system, which isn’t that painful at all, to get to storage support. Hold times in the queue were very acceptable. A level one technician picked up the call and started the normal run of questions. It only took about 2 minutes to realize the L1 didn’t understand my issue, and he quickly told me that they don’t fix performance issues, period. He told me to update the driver and firmware and reboot. Of course none of that had worked the first time, but what the heck, I’ll give it the old college try. Since this is a production system I am limited on when I can just do these kinds of things. That imposed lag makes it very difficult to keep an L1 sitting on hold for five or so hours while they wait for me to complete the assigned tasks, so I let him go with the initial action plan in place and an agreement that he would follow up. Twice I got automated emails that the L1 had tried to call and left voicemails for me. Twice, there were no voicemails. I sent him my numbers again just to be on the safe side. Next, I was told to run the standard Array Diagnostic Utility and a separate utility that they send you to gather all the system information and logs; think a PSSDiag or SQLDiag. After reviewing the logs he didn’t see anything wrong and had me update the array configuration utility. I was then told they would do a deeper examination of the logs I had sent and get back to me. Three days later I got another email saying the L1 had tried to call and left me a message. Again, there was no voicemail on my cell or my desk phone. I sent a note back to the automated system only to find the case had been closed!

I called back into the queue and gave the L1 who answered my case number; he of course told me it was closed. He read the case notes to me: the previous L1 had logged it as a network issue and closed the case. If I had been copying files over the network and not to another local array, I could see why it would have been logged that way. I asked to open a new case and to speak to a manager. I was told the manager was in a meeting. No problem, I’ll stay on the line. After 45 minutes I was disconnected. Not one to be deterred, I called back again. The L1 who answered was professional and understanding. Again, I was put on hold while I waited for the manager to come out of his meeting. About 10 minutes later I was talking to him. He apologized and told me my issues would be addressed.

I now had a new case number and a new L1. Again, we dumped the diagnostic logs and started from the beginning. This time he saw things that weren’t right. There was new firmware for the hard drives, a new driver for the P800, and a drive that was showing some errors. Finally, I felt like I was getting somewhere! At this point it had been ten days since I opened the previous case. We did another round of updates. A new drive was dispatched and installed. This L1 did call back and actually managed to either talk to me or leave a message. When nothing had made any improvement, he went silent. I added another note to the case requesting escalation.

That was eight days ago. At this point I have sent seven sets of diagnostic logs, spent several hours on the phone, and worked after hours for several days. The last time I talked to my L1, the L2s were refusing to accept the escalation: it was clearly a performance problem and they don’t cover that. The problem is, I agree. Through this whole process I have begged for additional documentation on configuration and setup options, something that would help me configure the array for maximum performance.

They do offer a higher level of support that covers performance issues, for a fee of course. This isn’t a cluster or a SAN. It is a basic setup in every way. The GUI walks you through the setup, click, click, click, monster RAID 10 array done. What would this next level of paid support tell me?

My last hope is CDW will be able to come through with documentation or someone I can talk to. They have been very understanding and responsive through this whole ordeal.

Thirty-one days later, I’ve still got the same issue. I have now ordered enough drives to fill up the MSA60. The plan is to transfer enough data to free up one of the MSA70s. Through trial and error, I will figure out what the optimum configuration is. Once I do, I’ll post my findings here.

If any of you out there in internet-land have any suggestions I’m all ears.

My Virtual Lab Setup: Hyper-V 2008 and Virtual Server 2005 R2

One of the things I’ve maintained over the years is a lab to do everything I need, from testing new software, proof-of-concept designs, and benchmarking to testing new builds of SQL Server.
This has always been a tall order, and now that I’m back developing software part-time I need a solid, yet cheap, lab to do mostly functional testing in.
To give you an idea of how tall an order testing software that runs on SQL Server has become, let’s take a small trip down memory lane.

Once upon a time… I was the product manager for Litespeed at Imceda, and life was simpler then. We had a small support matrix.

SQL Server 7
SQL Server 2000 32bit Itanium

Windows NT 4
Windows 2000 Itanium
Windows 2003 32bit, Itanium, 64

Clustering on all platforms except Itanium

Multiply that by the growing feature set and it added up to hundreds of tests that needed to be executed for any build that would see the light of day. So, we set up several machines: some virtual to handle the functional testing, some physical to do performance testing. We had automated test harnesses to do some of it, but there was still quite a bit of manual work to be done.

Fast forward to today and this is what my test matrix looks like:

SQL Server 2000 32bit Itanium
SQL Server 2005 32bit 64bit Itanium
SQL Server 2008 32bit 64bit Itanium
SQL Server 2008 R2 32bit 64bit Itanium

Windows 2000 32bit Itanium
Windows 2003 32bit 64bit Itanium
Windows 2003 R2 32bit 64bit Itanium
Windows 2008 32bit 64bit Itanium
Windows 2008 R2 32bit 64bit Itanium

Clustering on all platforms except Itanium

Yikes! This is getting huge. This is starting to feel like testing for Oracle and Unix platforms, with the exception that you can’t pick and choose what to test. If it is under support by Microsoft, you are pretty much stuck supporting it. To my great fortune, computers and virtualization have come a long way. Almost all desktop processors have multiple cores and support hardware acceleration, either Intel’s VT-x or AMD-V. Memory sizes have grown from 2GB to 8GB or better. Hard disks have gotten a LOT larger, which helps hold all the images needed. Oddly, network speeds have stayed at 1Gb/sec, but at least the gear has gotten cheaper. With all that in mind, I should be able to build a test environment to handle quite a bit on a shoestring budget.
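Just to make the growth concrete, a quick cross-product of the matrix above shows how fast the combinations pile up; not every pairing is real-world valid, so treat the count as an upper bound.

# Count the base combinations in the support matrix (before clustering or any
# feature-level tests). Some pairings don't exist in practice, so this is an
# upper bound rather than an exact test count.
$sql  = 'SQL 2000', 'SQL 2005', 'SQL 2008', 'SQL 2008 R2'
$os   = 'Win 2000', 'Win 2003', 'Win 2003 R2', 'Win 2008', 'Win 2008 R2'
$arch = '32bit', '64bit', 'Itanium'

"{0} base SQL/OS/architecture combinations" -f ($sql.Count * $os.Count * $arch.Count)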

Let’s take a look at the hardware mix I’ve got.

I started building out machines piecemeal, like all good home-grown labs go. My first cut was Intel Q6600-based machines. Hard to beat them for the quad-core goodness and all-around performance. All four of these machines have multiple terabytes of hard disk space on hardware RAID controllers and 8GB of RAM. My second round was also Intel, but smaller, using E6750 dual-core processors as the base. Fewer hard disks, but still 8GB of RAM on these two. On my third pass I switched gears and went with AMD 810 AM3-based machines. These two also have 8GB of RAM and enough space to handle whatever I need them to. Last in this stack is an older AMD 5600+ used for overflow and for testing Windows 2000 on Virtual Server 2005. It has 4GB of RAM, still has AMD-V, and two cores: enough to get some work done!

Rounding them out are a few oddballs and support machines.

A P4 3.6GHz is used as a client load generator.
An E6750 with 4GB of RAM is also used as a client load generator.
An AMD 4800+ acts as my controlling server, managing all the images and deployments, and doubles as a Virtual Server 2005 host for Windows 2000. 4GB and two cores means it’s up to the task of functional testing when needed.

You will notice I don’t have any Itanium in the mix. I haven’t had a need for it yet, since none of our customers or potential customers are running it. eBay is a quick solution for that, though.

That’s the basic hardware setup now moving on to the software stack.

As a startup business I’ve joined the BizSpark and Empower for ISVs programs. If you are starting a software or software-as-a-service business focused on Microsoft technologies and aren’t using these great programs, you are putting yourself at a serious disadvantage. That made the choice of VMware or Hyper-V a no-brainer for me. But, just to be fair, there were other reasons than cost involved in that decision. I can hear some of you now telling me that VMware ESXi 3.5 is F R E E!!!!! And you would be right. The problem is the hardware it will, or more importantly won’t, run on. Remember this thing is cobbled together and will have a wide mix of hardware that needs to be supported. They also offer VMware Server (GSX) for Windows for free as well. I agree that in general it is superior to Virtual Server 2005, but another piece of software comes into play: Microsoft System Center Virtual Machine Manager. It does manage ESX, but I couldn’t figure out if it supported GSX, and it does support VS2005 and Hyper-V out of the gate (surprise, surprise).

With the hardware stack and now the software stack in place, we have to get this hodgepodge group of servers installed and configured. As I mentioned before, Hyper-V was pretty much the only choice if I was going to manage everything centrally and support my hardware mix. That doesn’t mean it was all plug-n-play, so to speak.

Some pros and cons of Hyper-V Server 2008

Pros:

It’s free.
It supports VT-x and AMD-V.
It supports 32bit and 64bit clients.
You can load drivers for pretty much any hardware Windows Server 2008 x64 supports.

Cons:

It requires x64 support. (Some people really didn’t know.)
It requires the no-execute/disable execution bit to be turned on in the BIOS (not all motherboard manufacturers expose that setting).
You must have a processor with VT-x or AMD-V. If you don’t, the OS still installs, but the Hyper-V services simply won’t start.
It is a core installation, so there is no GUI to speak of. If you are a hard-core Windows person who has to have the GUI tools, you need to load the regular version of Windows 2008 and install the Hyper-V role.
Older clients can run really, really slow. Windows 2000 and Windows XP are prime examples and aren’t on the compatibility list, hence the need for Virtual Server 2005 R2.

A few things to know about Virtual Server 2005 R2: it is also free. Host OS support is 32bit only, on Windows 2000 or 2003 (no R2). Client-wise, only up to Windows 2003 32bit is supported, but it runs those well enough for what we need.

Installing Hyper-V Server 2008

The install is pretty simple; it is a Windows 2008 install. If you need to load drivers so the OS can see the disks, you can do that with ease, even from USB. After the install, configuring everything can get tricky simply because you don’t have the GUI to do all the stuff you normally would do post-install.

Installing Drivers

If you aren’t prompted to load your drivers by the hardware detection wizard, aka the hardware was already in the machine, there are some simple and not so simple steps that you can take.

The simplest way:

Locate the drivers you need for either Windows Server 2008 x64 or Vista x64.
Run the installer from the command prompt.
I found that the Intel INF updater runs out of the box just fine for example.

Slightly harder way:

If the installer won’t run, you will need to locate the driver files (.sys and .inf) and use the command-line pnputil tool to install them.
pnputil -i -a <path to the inf and driver>
Some drivers are wrapped in CAB files, and you may need to extract them before putting them on your server.
Run driverquery and locate the driver you just installed; this may not be so easy if there are a ton of drivers on the machine.
Reboot.
Check to see if the device is now working (NIC, SATA/SAS controller, etc.).
If the Hyper-V server is on the network, you can use compmgmt.msc to remotely view services on the server, if the firewall is set up to allow it.
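Putting those steps together, a hedged example of what the session looks like; the driver path and name are placeholders for whatever package you extracted.

# Placeholder path/name: substitute the .inf you actually extracted.
pnputil -i -a C:\Drivers\nic\netdriver.inf    # stage and install the driver package
driverquery /v | findstr /i netdriver          # confirm the driver shows up in the list
shutdown /r /t 0                               # reboot so the device comes up cleanly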

The HARDEST way:

I found this method on the internet and haven’t verified it but I thought it really showed what some people can do when they think outside the box.

Here’s how I did it, the hard way.  But maybe it will help someone else.
1. I built a Lab server, with Windows Deployment Services.
2. Installed Microsoft Deployment Toolkit 2008:
http://www.microsoft.com/downloads/details.aspx?familyid=3bd8561f-77ac-4400-a0c1-fe871c461a89&displaylang=en&tm
3. Installed the latest AIK Kit: Windows Automated Installation Kit (Windows AIK):
http://www.microsoft.com/downloads/details.aspx?familyid=94BB6E34-D890-4932-81A5-5B50C657DE08&displaylang=en
4. Loaded the Hyper-V 2008 Server ISO.
5. Installed all Intel Drivers I will need for old and current builds.
6. Used the Guidelines in Microsoft® Deployment Toolkit 2008 Getting Started Guide, from the Microsoft Deployment Toolkit.
7. Built x64bit Image with LAN Drivers installed.
8. Built a Bootable USB.
9. Booted Server with USB, did a network install from Windows Deployment Services.
10. Rebooted Server and was able to use NIC Card.

Some follow-on links for driver management.
Server Core and Drivers
Technet forums: Core NIC Driver installation
Managing Services and Hardware in Server Core

 

Some uncommon things specific to my setup

The next thing on my list: at least one of my installs isn’t joined to the domain. Why would you do that, you may ask? Easy: this is the machine that actually hosts the domain controller. I will warn you this isn’t a supported Microsoft configuration; they always recommend having at least one physical domain controller. But since this is a test lab I didn’t want to dedicate a machine to just a DC role. Use a fixed IP, no DHCP; if the DC guest hosts DHCP you will be stuck with a chicken-and-the-egg problem. I personally disable the firewall altogether instead of just managing openings, since all of these machines already sit behind a hardware firewall. Issuing this command accomplishes that: netsh advfirewall set allprofiles state off
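For reference, the host-side setup amounts to a couple of netsh commands. The interface name and addresses are placeholders for your own network; the firewall line is the command from above.

# Give the non-domain-joined host a fixed address (the DC/DHCP roles live in a
# guest, so the host can't rely on DHCP). Interface name and IPs are placeholders.
netsh interface ipv4 set address name="Local Area Connection" static 192.168.1.10 255.255.255.0 192.168.1.1

# Drop the Windows firewall entirely; the lab already sits behind a hardware firewall.
netsh advfirewall set allprofiles state off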

If the server isn’t part of the domain, managing it requires a few tweaks. John Howard has resolved pretty much all of them in these posts:
http://blogs.technet.com/jhoward/archive/2008/03/28/part-1-hyper-v-remote-management-you-do-not-have-the-requested-permission-to-complete-this-task-contact-the-administrator-of-the-authorization-policy-for-the-computer-computername.aspx
http://code.msdn.microsoft.com/HVRemote

I also ran into an issue where my Windows 7 RC build couldn’t access the Hyper-V server, and this post fixed that.
http://blog.mpecsinc.ca/2009/06/hyper-v-error-access-denied-unable-to.html

Lastly, since I am running both Virtual Server 2005 R2 and Hyper-V, I need to transfer images back and forth; these three posts cover that topic as well.
http://blogs.msdn.com/virtual_pc_guy/archive/2009/02/03/virtual-server-to-hyper-v-migration-guide.aspx
http://www.groovypost.com/howto/microsoft/windows-server/migrate-microsoft-virtual-server-2005-r2-vm-to-windows-server-2008-hyper-v/

Hyper-V to Virtual Server:
http://cosier.wordpress.com/2009/04/17/migrating-from-hyper-v-to-virtual-server/

As a side effect of setting up this lab and working with SQL Server 2008 and Exchange 2007 SP1, I’m learning how to use PowerShell.
As things settle in I’ll be talking more about actually managing the VMs with System Center and PowerShell.
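As a first taste, the Hyper-V 2008 WMI provider (the root\virtualization namespace) is already reachable from PowerShell; a minimal sketch that lists the VMs on a host, with the server name as a placeholder.

# List the virtual machines on a Hyper-V 2008 host via the v1 WMI provider.
# HYPERV01 is a placeholder; the Caption filter excludes the host itself.
Get-WmiObject -Namespace root\virtualization -Class Msvm_ComputerSystem -ComputerName HYPERV01 |
    Where-Object { $_.Caption -eq 'Virtual Machine' } |
    Select-Object ElementName, EnabledState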

Oh, just a quick shot of the magic closet. Yes, in the lower right hand corner is an AC unit. No, you don’t want to know what my electricity bill is.

 

[photo of the test lab closet]