Category Archives: Tools

Moving Files with Open Source Software, cURL and winSCP

Open Source, ON WINDOWS!

I’m still stunned in this day and age that people pay obscene amounts of money on tools that are easily replaced in most cases with quality and stable open source alternatives. Part of the problem is that a lot of OSS tools don’t work well on Windows or require some fiddling which can scare off some people. The two tools I’ll be chatting about today don’t fall under that category. In one or two steps you have them installed and ready to rock. I’m constantly working with clients that require file movement in a secure manor. Ether upload or download.

cURL, World Wide Web Command Line

curl is a command line tool for transferring data with URL syntax, supporting DICT, FILE, FTP, FTPS, Gopher, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMTP, SMTPS, Telnet and TFTP. curl supports SSL certificates, HTTP POST, HTTP PUT, FTP uploading, HTTP form based upload, proxies, cookies, user+password authentication (Basic, Digest, NTLM, Negotiate, kerberos…), file transfer resume, proxy tunneling and a busload of other useful tricks.

Holy cow, that’s a ton of stuff! I use cURL mostly for HTTP and HTTPS stuff. It comes as a command line tool and a library. To get all you can out of cURL you need to install a couple of bits on Windows. First of all our favorite redistributes in the form of vcredist_x86.exe. cURL website has a link to “The ones that work” but I ended up just using the latest from Microsoft.

Secondly, if you want the S part of HTTP you need to install the Win32 OpenSSL libraries. Once that is done you will need to go do some reading. cURL is very powerful and has very good documentation.

Here is an example of downloading a file from a secure HTTPS server.

curl https://<pass>:<user>@<mysecuresite.xxx>:443/ -o transfer.log -k -l -c CSX https://<mysecuresite.xxx>:443/<myfile.csv> -o “E:\<localfolder>\<myfile.csv>” -s -S

Ow, that’s a mouth full. Lets break it down.

curl – the command line tool

https://<pass>:<user>@<mysecuresite.xxx>:443/ – the secure website we are talking to

-o transfer.log – a log of what the server returns, in this case it just sends a “login successful” message

-k -This allows us to accept self-signed certificates I use this if we are dealing with site to site transfers inside the company

-l – This tells cURL to follow 3xx http redirects

-c CSX – This tells cURL to setup a cookie store some sites need them some don’t.

https://<mysecuresite.xxx>:443/<myfile.csv> – The path and name of the file we want to grab

-o “E:\<localfolder>\<myfile.csv>” – If you don’t tell cURL to save it to a file it will just echo it at the command prompt.

-s – This tells cURL to be silent. You can omit this and add the -v for verbose when you are troubleshooting issues.

-S – This tells cURL to NOT be silent on errors. If you are automating things trapping the error is important.

BAM!

Now I can grab the files I need secured via SSL over HTTP. Script them via batch file and call them from an SSIS package or SQL Server Agent job step.

WinSCP, Softer Than PuTTY

WinSCP is another tool for transferring files built specifically for Windows. If you wanted to do SFTP, SCP or FTPS you would usually fall back to using PuTTY’s psftp.exe command line tool. It does allow for robust automation but I’ve always had a headache dealing with importing SSH keys. Since the scripts will be executed under the service account that SQL Server is running under I would always have to log into the server as the service account, execute the script and accept the key. Not a huge deal just a PITA since the key is stored in the registry of he locally logged in profile. WinSCP allows you to add the SHA key in the script bypassing all that hassle.

Lets look at an example batch operation.

I’m calling this from a batch file

winscp.com /script=e:\<batchdir>\<wsftpcommands.txt> -resumesupport /parameter “%1”

winscp.com – This is the command line executable

/script – This is the command script we want to execute

-resumesupport – This bit of awesomeness will allow restarting of failed downloads so, if you have a large file you can resume instead of starting over if it fails

/parameter “%1” -This passes a single parameter to the command script. So, I’d call winscp.bat “<filename>” and it will pass the file name into the command script.

wsftpcommands.txt internals.

# Automatically abort script on errors
option batch abort
# Disable overwrite confirmations that conflict with the previous
option confirm off
open sftp://<user>:<pass>@<ftp.mysecureftpserver.xxx> -hostkey=”ssh-rsa 2048 xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx”
# Change remote directory
cd /<mydir>/<mysubdir>
# Force binary mode transfer
option transfer binary
get “%1%” “<mydir>\<mysubdir>\%1%”
# Disconnect
close
# Exit WinSCP
exit

Poof!

I’ve now got a bulletproof way to download files securely AND have the ability to restart failed downloads. Again, I can’t state how huge this is. I’ve got a backup file coming from France nightly and it is 1 terabyte. If it fails say at 999 gigabytes and I had to restart from the beginning it would be a real bummer, oh and miss our restore window completely.

Don’t Be Afraid To OSS

With these two tools I made these download processes automated and reliable for the client replacing a tool that costs thousands of dollars a year, and wasn’t as reliable as two command line tools and a batch file.

Sharing

Do you use OSS tools like these? I’d love to hear about them and if you can provide an example on how you use them and I’ll add them to this post!

Finding SQL Server Installs Using Powershell

Old Dog, New Tricks

I’ve been writing tools for SQL Server for a lot of years. Some of these tools were never completely released. Some of them were just for me. Some of them overlapped other tools already on the market and free for all. Recently, I started updating my bag of tricks and tools. This seemed like a great time to get back into PowerShell. I decided to pull out a bit of C# code I cobbled together nine years ago as part of a tool to find SQL Server instances on a network. I never really got around to making it a “production” ready tool since there was already a most excellent one on the scene in the form of SQLPing from Chip Andrews. SQLPing is a fantastic scan tool and can scan many more things than the method covered here.

Hello Operator?

When Microsoft implemented named instances with SQL Server 2000 they had to have a way to direct incoming traffic pointed to a single IP to the correct TCP port. So, they built in a traffic cop. When SQL Server 2005 came around it was pulled from the core network listener and put into its own service, the browser service. There was little documentation on how the browser worked. Luckily for poor sods like me using a network packet sniffing tool it was pretty easy to figure out how to talk to the browser. Once I figured out how to get the browser service to tell me what instances it knew about it was trivial to implement. These days Microsoft is being much more open about these kinds of things and actually have released documentation on how the browser service and the SQL Server Resolution Protocol works.

The Basic Mechanics.

As most of you know SQL Server’s default instance is on 1433 and the browser service is on 1434. Our goal is to send a UDP packet to port 1434. According to the docs we only need to send a single byte containing the number two. This prompts the listener to give us a list of instances and what port they are bound to. When I wrote my implementation it really was that simple. I dug around and figured out how to get PowerShell to send a UDP packet. I tested it and lo’ it worked, on my machine….

It’s Never So Easy.

When I tested it on my lab VM cluster with multiple nodes and multiple instances it would fail! I just didn’t get it. My C# code from the stone age worked just fine. My PowerShell code was a hit or miss. I started troubleshooting the problem just as I had in the beginning. I fired up my network sniffer and watched the traffic flow back and fourth. Again, I saw exactly what I expected, or more accurately what I wanted to see.  A single packet with the hex number two in the first byte position. I ran the test several times over the next hour or so. Eventually, I just had to walk away from it. The next day I started over again. I read the documentation, it still said the same thing. I ran the test, still a two in the first byte position. Then I spotted it. The packet wasn’t one byte long. I went back and read the document again. It gives an upper boundary of 32 bytes but no lower limit. The packet I saw come through wasn’t one byte long it was always more than that. Armed with that I started big and worked my way down until I got errors. Now I know that a packet of three bytes always triggers a response. a  two with two zeros.

Meet QuerySQLListener.

Here is the function I put together. It takes a server name and returns an array of strings with the fun bits in it.

function QuerySQLListener{
    [cmdletbinding(
        DefaultParameterSetName = '',
        ConfirmImpact = &quot;low&quot;
    )]
    Param(
        [Parameter(
            Mandatory = $True,
            Position = 0,
            ParameterSetName = '',
            ValueFromPipeline = $True)]
        [string]$Computer
    )
    Begin {
        $ErrorActionPreference = &quot;SilentlyContinue&quot;
        $Port = 1434
        $ConnectionTimeout = 1000
        $Responses  = @();
    }
    Process {
        $UDPClient = new-Object system.Net.Sockets.Udpclient
        $UDPClient.client.ReceiveTimeout = $ConnectionTimeout
        $IPAddress = [System.Net.Dns]::GetHostEntry($Computer).AddressList[0].IPAddressToString
        $UDPClient.Connect($IPAddress,$Port)
        $ToASCII = new-object system.text.asciiencoding
        $UDPPacket = 0x02,0x00,0x00
        Try {
            $UDPEndpoint = New-Object system.net.ipendpoint([system.net.ipaddress]::Any,0)
            $UDPClient.Client.Blocking = $True
            [void]$UDPClient.Send($UDPPacket,UDPPacket.length)
            $BytesRecived = $UDPClient.Receive([ref]$UDPEndpoint)
            [string]$Response = $ToASCII.GetString($BytesRecived)
            $res = &quot;&quot;
            If ($Response) {
                $Response = $Response.Substring(3,$Response.Length-3).Replace(&quot;;;&quot;,&quot;~&quot;)
                #$i = 0;
                $Response.Split(&quot;~&quot;) | ForEach {
                $Responses += $_
            }
            $socket = $null;
            $UDPClient.close()
        }
        }
        Catch {
            $Error[0].ToString()
            $UDPClient.Close()
        }
    }
    End {
        return ,$Responses
    }
}

 

It Isn’t Perfect But It Works.

I”m sure there is a cleaner way to implement it but I’m really just getting into PowerShell again after several months of tinkering with it last time. If you have any suggestions or improvements I’ll gladly take them!

Building A New Storage Test Server

We’re Gonna Need A Bigger Boat

Not to sound too obvious, I test IO systems. That means from time to time I have to refresh my environment if I want to test current hardware. Like you, I work for a living and can’t afford something like a Dell R910 Heck, I can’t afford to shell out for the stuff that Glenn Berry gets to play with these days. Yes, I work for the mighty Dell. No, they don’t give me loads of free hardware to just play with. That doesn’t mean I, or you, can’t have a solid test system that is expandable and a good platform for testing SQL Server.

The hardware choices, inexpensive doesn’t mean cheap

Well, most of the time. Realize I’m not building what I would consider a truly production ready server. Things like ECC memory and redundant power supplies are a must if you are building a “fire and forget” server to rack up. A good test server on the other hand doesn’t have the same up time requirements.

Case

A couple of years ago I would have bought something like a Aerocool Masstige. It will take a full size motherboard and has 10 5.25 bays. This allows me to then put something like this 3×5 5.25 to 3.5 mobile rack. with 10 bays I can put 15 hard drives in plus have one bay left over for something like a CD-Rom drive or another hard drive. The Aerocool Masstige does have two internal hard drive bays as well making for a total of 18 3.5″ drives in one case. The cost does add up though. The case has been discontinued but can still be found for around 110.00. The three drive cadges will run you another 100.oo. Oh, and you need a power supply that’s another 100.00. That brings the cost up to 510.00. Considering that a 3U Supermicro case with 15 bays will run you 700.00 easily. Not horrible for the amount of drive bays but there are better options now.

Norco RPC-4224 4U Server Case
This thing is big, I mean really big. It is deep and tall. It was designed to be a rack mount server but sits just fine on a shelf if you have clearance in the back. I was looking at another version of this same case that houses 20 drives but the price difference just made this hard pass up. This case isn’t a Supermicro case. It doesn’t have the build quality. To be honest though, I’m fine with that. What it does have is the ability to take a large range of ATX motherboards and a standard ATX power supply. Right now Newegg has this case on for 400.00. With a power supply that brings the total up to 500.00 still cheaper than the Supermicro with a ton of drive bays to boot. If you have worked with servers and had to cable them up you may notice that the RPC-4224 has a very different backplane layout. Every four drives has its own backplane and four lane SFF-8087 connector. Usually, most back planes have a single or maybe two connectors for 8 lanes shared via on board SAS expander. Since this doesn’t have that feature it actually makes it easier to build this thing for maximum speed. I can ether buy a very large RAID controller with 24 SAS ports or I can buy my own SAS expanders. The only down side to the backplanes on this server is the fact they are SAS 3Gb/s and not the newer 6Gb/s ports. For spinning drives it isn’t that big of an issue but if you are planning on stacking some SSD’s in those bays it can hurt you if the SSD’s support the newer protocol.

The one warning I’ll make is this thing is very front heavy. Oddly enough having 24 drives stuffed in the front doesn’t make for good weight distribution.  Pro tip, don’t put the hard drives in until the server is where you want it. It is a lot easier to move the case if it isn’t as heavy as two car batteries.

CPU

Just like Glenn, I think the Core i7 2600k is a very good choice for this build. At 314.00 you are only paying a slight premium over the 2600 for a lot more flexibility, *cough*overclocking*cough*.

Motherboard

I thought long and hard on this one and settled on a GIGABYTE GA-Z68A-D3H-B3. This is a very reasonably priced motherboard at 129.00 with some nice features. First, it is based off of the Intel Z68 chipset which means I have video built into the system and don’t have to give up a PCIe slot to video. Secondly, it has USB 3.0 which makes it easy to hook up an external USB 3.0 drive and get some livable speeds. Thirdly, it has SATA III 6Gb/s ports native. It only has two out of the six ports available at that speed but it does give me a few more drive options outside a add on RAID controller. Lastly, the PCIe slots on board are upgradeable to the new PCIe 3.0 standard. This means I don’t have to change my motherboard out to get a nice little bump in speed from newer PCIe RAID controllers or solid state cards.

Memory

Another perk of the Z68 chipset is that it will support up to 32GB of DDR3 RAM, when it becomes available that is. In the short to mid term I’ve got 16GB of Kingston HyperX 1600 DDR3 installed. That’s 115.00 in memory. I could have shaved a few dollars off but buying this as a four piece kit saves me from having to play the mix and match game with memory and hoping that it all works out.

IO System

This is where things get a little complicated. Since I need a lot of flexibility I need to have some additional hardware.

RAID Controller

I have an LSI MegaRAID 9260 6Gb/s card in the server now. At 530.00 it is a lot of card for the money. If you wanted to skip the SAS expanders and get a 24 port card you would be looking between 1100.00 to 1500.00. What’s worse, you really won’t see a huge jump in performance. Hard disks are a real limiting factor here.

SAS Expanders

SAS expanders are a must. There will be times where I will power all 24 drives from a single RAID card that has 24 lanes. There will also be times where I have smaller controllers installed and need to aggregate those drives together across or two connectors on a RAID controller. There are a couple of choices available to you. I opted for the Intel RES2Sv240 expander over the HP 468406-B21. The Intel expander supports the SAS 6Gb/s protocol and has one additional killer feature, it doesn’t require a PCIe slot to run. It was designed to work in cases that support the MD2 form factor. That means it could be mounted on a chassis wall and fed with a standard molex power connector. Why is such a big deal? It means I can stack these in my case and keep my very valuable PCIe slots free for RAID controllers and SSD cards. Newegg has them at 279.00 but you can find them cheaper. The HP expander is listed at 379.00 and requires a PCIe slot for power.

Hard Drives

I opted for smaller 73GB 15,000 RPM Fujitsu drives. They aren’t the fastest drives out since they are a generation behind. What they lack in speed they make up in price. Normally, these drives new cost 150.00 a pop. But, I’m a risk taker. You can find refurbished or pulls for as little as 22 bucks a drive. Make sure you are dealing with a seller that will take returns! I personally have had pretty good luck dealing with wholesale companies that specialize in buying older servers and then reselling the parts. Almost all of them will offer at least a 30 day return. That means you need to do a little more work on your end and validate the drives during your return window. Now I have 24 15k drives for under 600.00 bucks.

I’m using a 2.5″ 7200RPM drive as my boot drive mounted inside the case.

SSD’s

You didn’t think I’d put together a new system and not have some solid state in it did you? I’ve got a few SSD’s floating around but wanted to buy the latest in consumer grade drives and see if they have upped the game any. I opted for the Corsair Force GT 60GB drive, four of them. At 125.00 they are a solid buy for the performance you are getting. Based on the new Sanforce SF2280 controller and able to deliver 85k IOps and 500MB/sec in reads and writes they are a mighty contender. The other thing that pushed me to this drive was the fact it uses ONFI synchronous flash. I won’t hash out why it is better other than to say it produces more reliable results and is faster than its asynchronous or toggle NAND brothers.

Again, the case is so big on the inside I mounted two 1×2 3.5″ to 2.5″ drive bays to house them. That was an extra 50.00 a pop.

Lets Recap

Case 400.00
Powersupply 100.00
Motherboard 130.00
CPU 314.oo
Memory 115.00
RAID HBA 530.00
SAS Expanders 558.00
24 15K drives 558.00
4 SSD’s 500.00

Grand total: 3205.00

What does this buy me? A server that can do 2GB/s in reads or writes and 160k IOps or more. I’ll let you in on another little secret, shop around! Don’t think you have to buy everything at once. Don’t be afraid to wait a week for your parts if you get free shipping. By taking a month to put this machine together I paid about 2700.00. A huge discount over the listed price getting 30% or more off some stuff like the expanders, RAID controller, SSD’s, Case and CPU.

Just in case you were wondering what it looks like:

With the bonnet off (early test setup):

The SAS Backplanes cabled up:

Understanding Benchmarks

That Means What?

Vizzini: HE DIDN’T FALL? INCONCEIVABLE.
Inigo Montoya: You keep using that word. I do not think it means what you think it means.
– Princess Bride

If you are like me, you are constantly reading up on the latest hardware. Each site has it’s own spin on what makes up its review. All of them use some kind of synthetic benchmarking software. Some don’t rely to heavily on them because they can show the real world performance using playback tools. This method is used heavily on gaming hardware sites like [H]ard|OCP where they decided long ago that using purely synthetic benchmarks were at best inaccurate and at worst flat misleading. In the graphics card and processor space this is especially so. Fortunately, on the storage side of the house things are a little simpler.

 

 

What’s In A Workload

In the processor space measuring performance is a complicated beast. Even though every processor may be able to run the same software they can vary wildly in how they do it. On the processor side of things I favor Geekbench right now since it uses known mathematical algorithms. John Poole is very open on how Geekbench works Are the benchmarks relevant to database workloads? I’ll be exploring that in a future post.

In the storage space we have a pretty standard benchmarking tool in Iometer. This tool was initially developed by Intel and spread like wildfire throughout the industry. Intel quit working on it but did something very rare, turned it over to the Open Source Development Lab for continued development. You may ask why I favor Iometer over SQLIO? The answer is simple, complexity. Iometer allows me to simulate diffrent read/write patterns in a very predictable manor. SQLIO doesn’t simulate complex patterns. It does reads or writes, random or sequential for a fixed duration. This is fine for finding the peak performance of a specific IO size but doesn’t really tell you how your storage system might respond under varying workloads. You my notice that they only sites that use SQLIO are SQL Server sites. While the rest of the world generally uses Iometer. The problem is none of the sites that I regularly visit publish the exact Iometer settings they used to get the results they publish. Tom’s Hardware, Anandtech, Ars Technica and Storage Review all use Iometer in some fashion. Doing some digging and testing like hard drives I think most of the sites are using a mix 67% reads 33% writes 100% random at an 2KB block which was defined by Intel and represents an OLTP workload. Storage Review did a nice writeup a decade ago on what they use for I/O patterns and Iometer. This isn’t the best fit for a purely SQL Server workload but isn’t the worst ether. By moving from a 2KB block to an 8KB block we are now squarely in SQL Server I/O land.

SQL Server Specific

Now we are starting to get to the root of the problem. All the main hardware review sites don’t focus on us at all. If we are lucky there will be a single column marked “Database workload”. So what do we do? You read, research and put together your own test suite. SQL Server I/O access patterns are pretty well documented.  So, I put those general patterns in a Iometer configuration file and keep it in my back pocket. I have posted a revised file in the My Tools section here on the site.

For the storage stuff that is fine but what about CPU and memory throughput? Things get a little murky here. Like Glenn Berry(blog|twitter) and I you can use Geekbench to get a baseline on those two things but again, this isn’t a SQL Server specific tool. In most cases sampling a workload via trace getting a baseline on performance then replaying that same workload on different servers will help but only tells you about your application. If you are looking for general benchmarks I personally wouldn’t put much stock in the old TPC-C tests anymore. They aren’t a realistic assessment of database hardware at this point. It is pretty easy to stack a ton of memory and throw a bunch of CPU’s at the test to get some ridiculous numbers. I personally look at TPC-E for OLTP tests since there is a decent sampling of SQL Server based systems and TPC-H for data warehouse style benchmarks. As always don’t expect the exact same numbers on your system that you see on the TPC benchmark scores. Even TPC tells you to take the numbers with a grain of salt.

My Personal Reader List

I personally follow Joe Chang (blog) for hard core processor and storage stuff. He has a keen mind for detail. I also read Glenn Berry(blog|twitter) he has some deep experience with large SQL Server deployments. Also, Paul Randal (blog|twitter) because he has more hardware at his house than I do and puts it to good use. I would advise you to always try and find out how the benchmark was performed before assuming that the numbers will fit your own environment.

What’s On My Todo List

I wrote a TPC-C style benchmark quite a while back in C#. I’m currently building up instructions for TPC-E and TPC-H using the supplied code and writing the rest myself in hopes of building up a benchmark database. This will be in no way an official TPC database or be without bias. I’m also always updating my Iometer and SQLIO tools as well with full instructions on how I run my tests so you can validate them yourself.

As always if you have any suggestions or questions just post them up and I’ll do my best to answer.

SQLDIY: Tracking Wait Stats

One of the most important methodologies to date for troubleshooting problems with the SQL Server engine is examining the wait statistics. As with most of the DMV’s provided in SQL Server 2005/2008 sys.dm_os_wait_stats is a cumulative view. To add to my series SQLDIY: Manage and Monitor SQL Server Yourself  here is my stored procedure for capturing deltas on an ongoing basis. It is modeled the same way I capture sys.dm_io_virtual_file_stats. This procedure captures the delta and the raw data at a point in time. This allows you to do your own roll ups to look at the numbers in different ways. Another reason I do this is to capture what happened before the problem and right at the beginning as the customer started reporting the problems. Starting with the slide deck from Joe Sack (blog|twitter) on Performance Tuning With Wait Statistics as a foundation and working through the links provided, you will be able to get up to speed quickly.

Link to the script Gather Server Wait Stats

I’m hosting all my scripts from this series on GitHub as I do with all my open source projects.

As always, if you find any bugs please let me know and I will correct them!

Your Homework:
SQL Server 2005 Waits and Queues Tom Davidson
SQL Server Waits and Queues Robert Pearl (blog|twitter)
SQL Server 2008 DR, Testing, Waits and Queues Jose Barreto (blog|twitter)
Drum Roll, Please…The Debut of The SQL DMV All-Stars Dream Team! Jimmy May(blog|twitter)

Specific examples on using wait stats:
Diagnosing Transaction Log Performance Issues and Limits of the Log Manager Mike Ruthruff (blog|twitter)
How do you measure CPU pressure? Tom Davidson