Category Archives: Backups

Growing It At Home: Building Your Own Tools

Short On Money?

Sometimes that is the motivation. Maybe the tool you want doesn’t really exist yet? For me it was those things and the joy of building something useful. I’ve built dozens if not hundreds of tools in the form of scripts, batch files and executable code. Did you know some of the tools you are paying your employers hard earned money for came up from the same roots? Someone needed a tool to do X, and lo’ there were others who were willing to pay for that work.

Great, Someone Has Written The Tool…

They also want something called money for me to use it. That may be the case but some for profit companies are very open about what they do. One of my personal favorites is Confio’s Ignite. They have NEVER been dodgy about the data they gather or how they gather it. I’ve seen them in presentations say things like “you can gather this exact data from these DMV’s” and show people the query to pull that data. They will gladly show you their schema and how to write your own queries against it. It seems like they are giving away the secret sauce! Yet, they have a very awesome product that I and others are happy to pay for. Sometimes it isn’t just gathering the data it is everything else rapped around the data that is the real value.

I’m Still Broke, Now What?

If you read my last few blog posts you also now know there are a TON of opensource alternatives for the majority of the tools you would pay for. Some are of better quality, some aren’t. Some don’t do everything you want but maybe two of them together do meet your needs. Again, there isn’t a direct cost in dollars but there may be an indirect cost in your time. For me it isn’t just about the time spent but the enjoyment of building something new or building on someone else’s work to meet my specific needs. My overriding mantra for most opensource tools is “Good Enough”. In general, it may not have the polish of X commercial tool but it is good enough.

What I’m Working On Now?

Isn’t that the question of the day. I’ve currently got not one but two irons in the fire. One is an extension of an existing opensource project another is a project that was all my own and now I’ve started to incorporate some work done by others into it.

MSSQLCompressedBackup

This was a project I recently picked up based on a need I had at my work. As some of you know, I’ve got some experience with backup compression and encryption. We are a SQL Server shop. About 70% of my servers are still on 2005 at the moment. I also wanted to move our encryption to the backup level and not at our tape drive level. We really didn’t want to spend the extra money for one of the commercial products out there if we didn’t need to. Fortunately there was an opensource project that came close to meeting our needs. With a little work I was able to add some additional features like fast LZ compression and AES encryption. Is it as fast as all the commercial products out there? No, probably not. It is fast enough for us though. Does it have all the features of the commercial products out there? Again, that would be a big fat no. Right now though, we don’t need the extra bells and whistles that come with those products, if we do and I can’t find an opensource alternative and I can’t build the feature we need then we will look at the cost again.

Central Operations and Reporting Engine

I’ve been kicking this can around for the better part of 15 years. It really came it to focus at my time with The SCOOTER Store. At one point I came close to releasing a commercial version but the timing was off. That hasn’t stopped me from working on it and releasing bits of it in the form of my SQLDIY stuff. I’ve decided to quit just puttering around with it and go full bore again just to see what would happen. Previously everything was written in C# and T-SQL. It had it’s own scheduling engine, reporting engine and GUI. Most of it is woefully out of date now and I’ll be using powershell to do the heaving lifting where possible. I’m also working on a dashboard, web configuration tool and some other reporting bits to make it even more useful. When the project goes up on codeplex I’ll write a full post on it.

Demystifying SQL Server Differential Database Backups

Odd Man Out

SQL Server has three backup types. Two you have heard of and used. One, while useful, isn’t very well understood.

Let’s start with a technical recap of the three backup types for SQL Server.

1. Full Database Backup

When you request a full backup, SQL Server dumps all the data pages from your database, metadata about how your database is stored on disk and finally enough of the transaction log to bring the database back into a consistent state. There are a few things you need to know about the full backup semantics. When you take a full backup it makes a few changes. Those changes are tracked in two places in the database and one in MSDB. The changes tracked in the database allow us to then use transaction log backups and differential database backups. The data logged to MSDB isn’t critical for restoring your backups. It does make it much easier to do so. Full backups are considered our “base” backup type. Every other backup type can use a full database backup as its base. Even though a full backup does capture some of the transaction log it doesn’t clear the log. If you are in simple mode, the normal checkpoint process will clear the log. If you are in bulk load or full recovery mode, you will need to take a transaction log backup to clear the log.

2. Transaction Log Backup

Transaction log backups are a critical part of any recovery plan. Without them you can’t restore up to the minute. If your database is in anything other than simple recovery mode, your only supported option to clear the log is a transaction log backup. Transaction log backups are serial by nature. The log restored depends on either a full or differential for its base and any log backups done before the current log you wish to apply.

3. Differential Database Backup

Like a full database backup, the differential backs up data pages and enough of the transaction log to bring the database back into a consistent state. Unlike full or transaction log backups, the differential backup captures all changes since the last full backup occurred. The information on changed data pages is stored internally in the database and doesn’t require any information from MSDB. The map of changed data pages only gets reset on the next full backup. Transaction log backups or other differential database backups will not reset the changed data map. You can think of transaction log backups as incremental backups. People coming from a systems administration background can get tripped up and treat differential backups like incremental backups. This can cause a significant waste of time when restoring your database since you only need to apply the full backup and the most current differential, or the differential you are interested in to get your database back into a recovered state.

Understanding Differential Database Backups

Most people are put off by the nature of differential backups mainly due to the amount of space they can grow to and the extra complexity they can add to your recovery plans. If you don’t manage them, you can quickly run into a differential that is larger than the full it is based on. Also, any data page alterations are tracked. For example, if you take a full backup then perform full index reorganization on a heavily fragmented index you can end up with very large differential backups. File shrinks with full reorganizations also have the same effect. Even though the actual data hasn’t changed, you end up with differential database backups that are unwieldy and difficult to manage. If you miss a full backup in your schedule, your differentials again may grow larger than your full backup.

There are several cases where differential database backups are a pivotal key to recovering your database quickly and with as little data loss as possible. Let’s take a look at a few scenarios.

Shortening Recovery Time

This alone should be good enough reason for you to investigate differential backups. Every restore operation has a cost-in-time associated to it. Remember, even if a transaction log backup is virtually empty, there is a cost-in-time to spool up and tear down the restore session for each log backup you apply. Not to mention replaying the transactions in the logs. In many cases, it can be much faster to apply a differential backup than applying multiple transaction log backups. By skipping all the data manipulation and just replacing the altered pages you are reducing the amount of IO required to restore.

Database in Simple Recovery Mode

There may be situations where you aren’t concerned with up-to-the-minute recovery but still need something better than weekly full backups to meet your recovery goals. Differentials fit in well here. By leveraging differential backups, you can take a single full once a week and daily differentials to cut down on the space needed to store your backups. Also, since differential backups contain all the changes since the last full, to recover you only need the full backup and the differential backup of the time interval you want to restore to. I recommend keeping your differentials just like you would your transaction log backups so if you need to recover your database into another environment or if you suffer corruption in one of your differentials, you still have as much data as possible to restore.

Large Database with Minimal Data Change

With today’s large disk capacities, it isn’t unusual to see multi-terabyte databases with years of data in them. Moving our full backup schedule out to every two weeks or every month and supplementing with differentials is an excellent way to conserve backup space and shorten time to recovery. Again, we only need the last full, the last differential and any transaction logs after the differential was taken to get us back up to the minute.

Increasing Recoverability

if you only take a full database backup once a week and transaction log backups every 15 minutes, you could end up applying over 670 logs to get your database back on line if you have a failure at the end of the week. If you have any errors in one of the transaction log backups, everything after that is pretty much useless to you. If it dies at backup 599, it may not be the end of your business. If it is log 38, it could mean a week’s worth of data gone. Since differential backups don’t break the LSN chain and transaction log backups don’t reset the changed data map, you can use either backup type even when one or the other may have had an error. Differentials allow us to bridge gaps in our transaction log since we can apply any transaction logs taken after the differential backup. This is one of the real strengths of differential backups. So, if you are doing weekly full backups, daily differential backups and transaction log backups every 15 minutes, you are covered in two ways. Normally, you would restore the full then the latest differential backup followed by any additional transaction logs. If you had a differential backup corrupted but your transaction logs, were fine you could still restore fully.

Repairing Log Shipping

Another great use of differentials is to repair your log shipped databases. If anything happens to the LSN chain, in most cases the only way to repair your log ship target is to start over again from a full and then apply all the logs to get it back up to current. If this is a large database or if there are a lot of transaction log backups to recover this could leave you exposed for quite a while. You can always take a differential backup, apply that to the log ship target then restart your log shipping from that point. I have used this technique successfully over the years when there have been network outages causing our log ship targets to fall way behind cutting catch up time from hours to minutes.

Final Thoughts

Incorporating differential backups will add complexity to your backup strategy but the benefits can be staggering. Between the storage savings and reduction in recovery time it’s clear that differential backups should be in your tool belt. I would also encourage you to practice restoring using your differential backups. Try out different failure scenarios like failed transaction logs or differential backups. Make sure you understand how to restore up to the minute and stop at a specific time now that you have differentials in the mix.

This is a re-blog from an article I wrote for SWWUG on April 19th 2012

Software Review: Idera’s Virtual Database

Ever since the upgrade from SQL Server 6.5 to 7.0 one of the most requested features I’ve heard people complain about loosing was the ability to backup a single table. During my tenure as Product Manager for Quest’s SQL Litespeed extracting tables was one of the things we were constantly asked about. Eventually built it, and it worked. So, believe me when I tell you it is no easy feat to extract data from a backup file, let alone make it look like a normal database and attach it to a running SQL Server instance. The holy grail to me was always “select * from backup”. Virtual Database is as close to using a backup file as a real database that I have ever seen.

Setting Up Your Virtual Database

The main screen isn’t your traditional style of GUI. This is where everything happens. It has options to attach backup files, ether a single full or multiple files to a point in time. There is a run queries option, which is redundant since you can use management studio to access the database like you would any other database. The Tips @Tricks is a little pointless since this is installed on your server and if you are like me it is locked down pretty tight so cruising web sites is out. Finally, Help & More it a lot more important than you think.

To attach a full backup is dead simple. Give it the file name, SQL Server instance name and a new database name.

If you have used Idera’s SQL Safe to do the initial backup with and had it generate the content map file the attach time is fast. I didn’t test with native backup files since we don’t do any native backups where I work. If the content file isn’t there it will have to generate it and depending on the size of the backup file it can take a while.

Attaching multiple files is a little more difficult. The wizard steps you through it though.

If you accessing files from a network share it will prompt you for credentials so it can read the backup files. Virtual Database fully supports UNC mounting.

If you have encrypted your backups using SQL Safe, it will prompt you for the password as well. You are encrypting your backups right?

It looks through the backup files to pull out key information needed to do the point in time recovery. This is also very fast. I actually had to do it a few times to get this screen shot!

Once you have the files added it will show them sorted. Any files it thinks there is a problem with it will highlight in red for you. These were LSN out of sync errors. It doesn’t stop you from going to the next screen though.

Once you have settled on a file list you can now choose your point in time. You can have multiple full, differential, or transaction log backup files specified. When you choose a point in time it selects the files needed to do the recovery. It will choose the backup files that are needed. If you have done what I just did and selected everything in a folder it won’t apply three full backups plus all the transaction log files.

Like the single file attachment process, you still need to provide an instance name and database name.

Finally, it gives you a summary of what is going to happen. After that the database is ether attached, or fails. If it fails you have to start all over.

Configuration Options

There are very few configuration changes that you can make the product.

I’m not thrilled with this screen setup. It’s got a lot going on and some of the functions like most of the help stuff, doesn’t work on my locked down server.

Where you put your support files may an impact on performance. SQLvdb has some support files it needs to do its magic. This is also where it records changes you make. If you plan on making a lot of changes you may want to move the location from the C drive to another faster and larger drive.

Some files aren’t deleted when you drop the virtual database and require a manual cleanup step. You can do it here or through the command line tool.

The drive letter is basically a protection mechanism. SQLvdb is looking for a specific set of file extensions like .mxf or .lxf. So, don’t use those extensions for anything else. You can change it if you do have a file extension collision.

The service connection allows you to change the port but not the server name. This means I can’t have the GUI installed on my desktop and access my SQL Server where the services and driver is installed. I don’t know if this is supposed to be this way or if it is a bug.

The Good

What can I say, it is select * from backup. You mount it and like magic your backup file is a live database again.

You can also create objects in this new database. That’s right, you can create a new table and put data in it. That data and structure aren’t stored in the backup file though and if you remove the database all changes are lost. You can do a backup of the “new” database just like you would a normal backup though, saving your changes to a new backup file.

You can also drop tables, alter stored procedures you name it. I haven’t found a DDL statement I couldn’t execute yet. Just like before, if you need to reset the database just drop it and re-attach it, just like magic you have a clean slate again.

It works across the network. If you don’t have enough space on your machine for the backup file you can still mount it via UNC. This allows me to look at backup files from my development or test servers without having to consume the space to house the database. Don’t expect the performance to be great, you are accessing a file across the network.

Any tool that works with SQL Server works with this. If you have a favorite scripting tool you can use it. Written your own programs? No problem you can use them too.

Never having to restore to a test or development environment again. When you are working with large databases getting an exact duplicate for functional testing is a huge undertaking, not to mention expense. You still have to buy all the disk space. Before SQLvdb I would restore the database, set it to simple mode, issue a checkpoint and then shrink all the files to the bare minimum. On our largest database this would take eleven hours or more. Now, I copy the backup file and attach the virtual database. Copying takes about an hour an the attachment takes about 20 minutes. This is a 380GB backup file of a 1.2TB database. If I didn’t mind accessing it from a network share it would cut that down to 10 or 20 minutes tops.

Never having to restore the entire database to get a single object from a backup again. Same as above but now I don’t have the boss standing behind me for several hours asking “Is it done restoring yet?”

Doing it all from the command line. This means I can automate a ton of stuff and build processes around SQLvdb.

The Bad

The User Interface

I really don’t like the GUI. It is a far enough departure from what we are familiar with that it it makes it difficult to use. Why split up single file and multiple file attachment? It took me some time to find where to change the configuration options as well.

Do I need a Run Queries option? The whole point of the product is being able to use SSMS or whatever you need to access the database. What if I don’t have SSMS installed on my production server?

Tips & Tricks is also a waste of screen space. Why isn’t this in the help system? It doesn’t point out the more creative uses like offsetting restores to development boxes that I can find.

Can’t use the console remotely. It has the option built in but greyed out. With a locked down server getting remote access may be a problem, not to mention all the help is build into the GUI as well.

The wizard for multiple file attach will warn you in one stage about missing or invalid backup files but doesn’t in the second stage. It will let you attempt to attach the virtual database then fail. You can’t edit the virtual database ether. Your only choice is to drop it and start over.

The Command Line

Almost zero documentation on the command line options. I had to root around and play with the options to figure out that there is some switches that are order dependent.

The GUI allows you to kill all users out of a SQLvdb before dropping it, the CLI doesn’t seem to have that option. This is just an extra step I have to put into my automation scripts and hope I get it right.

Other Issues

You can’t run a DBCC CHECKDB on it. Since DBCC uses sparse files and snapshots to get a consistent look at the data it can’t run on a SQLvdb. SQLvdb is also using sparse files to do some of the things it does. What’s worse is the DBCC command just hangs out and looks like it is running OK. I let it run for a couple of hours before killing it.

Doing anything in Management Studio that requires it to pull a list of databases locks up SSMS while the attach is in progress. If you leave SSMS alone it will come back. If you don’t and keep clicking around it can lock it up or crash it all together. I’ve been told they are working on a fix for this.

I ran into some other odd issues that may be bugs or something wrong with my server I don’t know yet. I am working with Idera to identify the problem, when I do I’ll update this post.

Final Thoughts

I can honestly say that this is a great product. It is everything I had hoped it would be. Aside from the small issues it has been a solid purchase. I don’t think Idera sees all the uses for SQLvdb. Being able to get a table or stored procedure from a backup file is nice, but it is an insurance policy. Being able to offset eleven ours of restores to development every week and cut disk space usage by 70%, well that is money in the bank.