I was in Phoenix last week to visit a customer, and I saw an email come through regarding a customer who had contacted support because they discovered they were missing data from a table. The customer contacted support on Friday morning, and our team was engaged almost immediately; the data loss had been discovered on Thursday afternoon. When we asked about a database backup to restore from, so we could recover the data, we were told that they did have a database backup, because they back up nightly. Hooray! They back up nightly! However, the backup was from Tuesday night, taken after the data had already gone missing. Apparently, they overwrite the backup every night.
Ouch. I immediately asked if they had an off-site copy (they did not) and then I asked why they only have one day’s backup available. The answer? They have issues with disk space.
One of my next thoughts was, “How many DBAs ask for backup space when they’re asking for space for a database?” So to any DBAs who might be reading this: when you are asked how much disk space you need for a database, do you also include backup storage? Do you include test, development or QA environments?
Then I started thinking about the documentation that our team provides to customers. Do we ever mention disk space for backups? We do not. We have no specific recommendation regarding how much space is needed for backups, but…we do talk about backing up the database to separate storage. That’s probably not enough. Conveniently, we’re in the process of updating those guides, so I’m going to add a section covering disk space for backups. And space for testing restores. And space for test, development and QA environments. Am I greedy? Yes, I am. But can I afford not to be?
And in case you’re wondering about the customer and their lost data…a team member was able to recover it using Lumigent’s Log Explorer tool. Some of you may have heard of it. It has saved data for a few customers over the years. Sadly, it doesn’t work past SQL Server 2005 and you’d have to do some serious searching to find the files online. It has a few quirks when you’re working with it, but all in all, it’s been a very useful application for us. However, a third party tool is no replacement for a sound recovery strategy.
So when you have a minute today, go check the storage where backups are stored. How much space do you have? How many backups are you keeping online, and how often are backups going off site? And finally, if you had to recover, right now, from a disaster that occurred yesterday afternoon, could you get the backup that you needed, and how quickly?
Allan-
You are spot in, thanks for chiming in. The thing is, our team is not usually involved during the implementation phase. We come in after the fact to do a health check, or because performance is poor, or to do HA/DR consulting. I’m checking to make sure they have backup jobs running regularly, and making sure they’re taking backups off site, but unless I’m talking about HA/DR, I don’t start digging in too deep. I should start doing that as well.
My SAN guys always cringe when they see me coming, because they know they’re about to lose at least a couple of TB out of their storage pool.
Then that means you’re a good DBA 🙂
Great post! This is a very important point, and I have seen many situations where backup space is ignored and no capacity planning is done for it.
Thanks for your comment Anup! It’s so unfortunate to see this occur, because it’s so easy to prevent.
Luckily I’ve only ever had to have the argument for having backup space sitting around once. When I had to go to slow tape to recover from a production DB problem, the management soon realised that the cost of downtime was a lot higher than the cost of ‘spare’ storage.
Still use that as an example when I need to convince someone of this blindingly obvious fact.
One thing I’m seeing at the moment, though, is companies cutting costs with external storage providers. With some of the speeds they get for pulling files back, they might as well be restoring from tape. It’s going to bite them just when they need it.
That’s an excellent point, Stuart. I debated whether to mention what type of disks you want for backups vs for a QA/Dev/Test environment, but then realized that was a rabbit hole and could be an entirely separate post. For backups, fast disks can be critical. I wouldn’t skimp there. You can go with slower disks for those environments, and for a place to test restores…but not for the actual backups. I worked with a customer recently where their backup time, using a third party utility, was 12+ hours (for a database less than 500 GB). I shudder to think about how long it takes to restore…
“the cost of downtime was a lot higher than the cost of ‘spare’ storage. Still use that as an example when I need to convince someone of this blindingly obvious fact.”
Precisely because of the resistance to “spare storage”, I have been tempted to keep several large worthless files around for the express purpose of being able to delete them and free up space quickly in a pinch. A passive aggressive approach that just might work.
Greg, that’s a great line. I understand the temptation to keep large files around so you can delete them quickly if you need the space. I don’t love that approach, as I would hope someone on the business and/or storage side would see the value of your initial argument. However, I see why you would do it 🙂
That sounds like fun – I’m kidding 🙂 Always fun to restore a large database on a USB drive…
In a production environment, I usually ask for enough space to hold 3 full backups. This will generally allow enough room to accommodate a current backup, a restore, and any transaction logs or differentials. We back up our backup drives to tape nightly, so there’s not a big need to house multiple backups on disk.
In dev and QA, I’ll follow a similar rule of thumb with one exception. I have one application that is notorious for wanting multiple backups and multiple copies of the database at any given point in time. When planning out those servers I’ll always ask them a) how many copies of their database they want to be able to have, and b) how many backups they want to store online. Then I add 1 to each number. 🙂
I think the biggest oversight is that people don’t pay attention to how their database is growing and adjust their backup storage accordingly.
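That three-fulls rule of thumb is easy to turn into a back-of-the-envelope estimate. A hedged sketch in Python; the function name and the example sizes are made up for illustration, not from the comment:

```python
def backup_space_needed(full_backup_gb, daily_log_gb, full_copies=3):
    """Estimate backup-drive space: N full backups (current backup,
    room for a restore, headroom) plus a day's worth of transaction
    log or differential backups."""
    return full_copies * full_backup_gb + daily_log_gb

# Example: a 100 GB full backup plus ~10 GB of log backups per day
print(backup_space_needed(100, 10))  # 3 * 100 + 10 = 310 GB
```

The point isn't the exact multiplier; it's that the backup drive gets sized from a formula up front instead of from whatever disk happens to be left over.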
Thanks for providing some real world examples, I love it. And love that you’re adding 1 to the number of copies they tell you they need!
You are spot on in mentioning that people don’t pay attention to how their database is growing. It’s like all of a sudden one day they notice the database is 300 GB and wonder how the heck that happened, when it had been steadily growing all along and they would have seen it coming if they had just been monitoring it.
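Tracking that growth doesn't need anything fancy. A minimal sketch that fits a straight line to periodic size samples and projects forward, so the 300 GB surprise shows up in a forecast instead; the sample numbers are illustrative only:

```python
def projected_size_gb(samples, days_ahead):
    """Given (day, size_gb) samples, fit a linear trend via least
    squares and project the database size days_ahead beyond the
    last sample. Assumes roughly linear growth."""
    n = len(samples)
    sx = sum(d for d, _ in samples)
    sy = sum(s for _, s in samples)
    sxx = sum(d * d for d, _ in samples)
    sxy = sum(d * s for d, s in samples)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    last_day = samples[-1][0]
    return intercept + slope * (last_day + days_ahead)

# Example: ~1 GB/day growth, sampled every 10 days.
history = [(0, 60.0), (10, 70.0), (20, 80.0), (30, 90.0)]
# 90 days past the last sample: 90 GB + 90 more days of 1 GB/day
print(round(projected_size_gb(history, 90), 1))  # 180.0
```

Feed the projected size back into the backup space estimate and the storage request stays ahead of the database instead of behind it.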
While answering a question on recovering data from the tran log (http://bit.ly/yliY3l), I found an interesting blog post by Muhammad Imran, who has done the transforms required to regenerate deleted rows from the tran log records: http://bit.ly/yrSHSB
That’s quite a script that Muhammad developed, thanks for sharing! But I think you’d agree, it’s not a substitute for a good recovery strategy 🙂
We typically allocate 2x the space of the data files we’re allocating. One of the initiatives I’m currently working on is using a Data Domain backup appliance, which will allow us to keep a lot more copies.