We had a customer lose their entire database this week. The server failed, and when they tried to recover the database from the disks, they could not bring it online. They engaged our group and we tried everything we could, to no avail. Until now, the most data loss I had seen was when a customer had to restore a two-year-old backup. In this case, the customer had no backup.
Unfortunately, this means I have a new story for my “When Bad Things Happen to Good Databases” session, which I present at our user conference. We first did this session last year (props to my manager for the title), and the idea is to share real stories from the field with other customers so they realize that these things can happen to them. We cover maintenance and monitoring tasks, and I start with backups because I think they’re the most important thing you can do (with CHECKDB second). Haven’t updated stats? We can help you fix that. Have fragmentation all over the database? Yep, we can help you fix that, too. Run out of space on the drive where the transaction log is, and you’re not doing log backups? Well, you may have some downtime, but we can work with you to fix that. Need to restore from a backup but you don’t have one? Anywhere? I got nothin’.
For anyone who manages a database, I implore you: make sure you know when your backup jobs are scheduled to run, and check to make sure they are really running (with no errors). Maybe take a couple of minutes today to verify where the backups are copied and where they are stored off-site. And just for kicks, restore a backup. Pick any database, just one, and restore a backup. Please.
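If you don’t know where you stand, SQL Server keeps backup history in msdb, and a quick query tells you when each database was last backed up. A minimal sketch, assuming the history hasn’t been purged (sp_delete_backuphistory can clear it):

```sql
-- Most recent backup of each type, per database, from msdb history.
SELECT  d.name AS database_name,
        MAX(CASE WHEN b.type = 'D' THEN b.backup_finish_date END) AS last_full,
        MAX(CASE WHEN b.type = 'I' THEN b.backup_finish_date END) AS last_diff,
        MAX(CASE WHEN b.type = 'L' THEN b.backup_finish_date END) AS last_log
FROM sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS b
       ON b.database_name = d.name
WHERE d.name <> 'tempdb'   -- tempdb cannot be backed up
GROUP BY d.name
ORDER BY d.name;
```

A NULL in last_full, or a date much older than you expect, means it’s time to investigate.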
Was this a “lost job” scenario? That is, did the DBA lose their job over this? They should!
Good story. Thanks for sharing,
-Kev
Kevin – that’s a great question, and I don’t know if anyone lost their job (yet) over this. What’s even worse, which I didn’t mention, is that the system was down for over two weeks before we were even involved. I have no idea how that happens…
E
Righteous! Sing it sister!
It’s actually incredible how often this happens.
I agree, Grant, and what’s crazy is that I’m sure there are other cases where data is lost, and we’re never informed. It’s pretty scary.
And for them to be down for weeks before they do something… Makes you wonder.
But I’ve seen stuff like that happen in good shops. Someone accidentally removes a server from the monitoring program that checks the backups, and no one notices it’s missing until you need a backup and, whoops, those haven’t run for 3 weeks….
We set up monitors on our monitors and checks on our checks, and still occasionally had the whoopsy occur, but to just not ever take a backup… Words escape me.
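One cheap check on the checker is a query that returns rows only when something is wrong, wired into an Agent job or monitoring tool that alerts on any result. A sketch, with the 7-day threshold being an assumption to adjust to your own schedule:

```sql
-- Databases with no full backup in the last 7 days (or ever).
-- Alert whenever this returns rows: it catches the server or database
-- that quietly fell out of the backup routine.
SELECT  d.name AS database_name,
        MAX(b.backup_finish_date) AS last_full_backup
FROM sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS b
       ON b.database_name = d.name
      AND b.type = 'D'             -- full backups only
WHERE d.name <> 'tempdb'
GROUP BY d.name
HAVING MAX(b.backup_finish_date) IS NULL
    OR MAX(b.backup_finish_date) < DATEADD(DAY, -7, GETDATE());
```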
Aaron-
Noted. My restore “for kicks” was meant to be something like…”I assume you’re doing this regularly anyway, but just for fun, do another one today!” But a good point and thank you for mentioning it. Can you ever verify enough? Probably not…
E
I’m always amazed when I “hear” a story like this. It’s hard to imagine that stuff like this happens in the wild.
David-
Believe me, I’m always amazed when they tell me there’s no backup, or that the last one was taken weeks/months ago, or they don’t know where it is. I have yet to come up with a good, “professional” response… So unfortunate because it IS avoidable.
E
What’s even more difficult than coming up with a professional response to not having an up-to-date backup is a DBA trying to come up with a professional explanation for why you have nine months’ worth of good backups of an unchecked, corrupt database 🙂
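For anyone wondering whether they’re in that exact boat, a quick sketch (‘YourDatabase’ is a placeholder). DBCC DBINFO is undocumented, but the dbi_dbccLastKnownGood field in its output shows the last time CHECKDB completed without finding corruption:

```sql
-- When did CHECKDB last complete cleanly?
-- DBCC DBINFO is undocumented; look for dbi_dbccLastKnownGood in the output.
DBCC DBINFO ('YourDatabase') WITH TABLERESULTS;

-- And run a full consistency check now; errors here mean your "good"
-- backups may already contain the corruption.
DBCC CHECKDB ('YourDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```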
Even after all of this, will your customer bother to treat this as a critical system? If they could go 2 weeks without their data and not be bothered to do backups…
That’s a great question, Jeff, and I have no idea. Since they’ve been informed that we cannot recover the database from the files they had, we haven’t heard from them again. It’s very unfortunate.
Truly unfortunate. When you deliver/ship your solution today to new customers, does it have any maintenance plans in place? I know you can’t force them to ‘do it right’, but (a really big but!)…
That’s another good point, Jeff. We don’t ship the product with maintenance plans, but if we do the installation, we create them before we leave and review with the DBA (or get agreement from the DBA that they will handle it). I’m open to new ways to raise awareness for this…we will hopefully do a PSA video that goes out to the majority of customers soon on the importance of backups…
Great advice. I thought I was safe, as I was backing up a SQL Server 2005 instance with Backup Exec 10 and SQL Agent, and the jobs were successful.
Then it came to the point where I needed to restore a database and found that I had no data. After some investigation, I found that I needed a patch for the backup software to work!!
The problem is that time to do test restores is not always available.
Martin-
That’s an excellent example of why you should test your restores (to Aaron’s point above). Thank you for sharing!
Erin
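For anyone who, like Martin, wants proof the backup actually restores, a minimal sketch of a test restore to a copy, so production stays untouched. All names and paths here are placeholders; run RESTORE FILELISTONLY against your backup to get the real logical file names:

```sql
-- Quick sanity check: the backup file is readable and complete.
-- (This does NOT prove it restores; only a real restore proves that.)
RESTORE VERIFYONLY FROM DISK = N'X:\Backups\YourDatabase.bak';

-- Restore to a differently named copy so the source database is untouched.
-- Logical names ('YourDatabase', 'YourDatabase_log') are assumptions;
-- confirm them first with RESTORE FILELISTONLY.
RESTORE DATABASE YourDatabase_RestoreTest
FROM DISK = N'X:\Backups\YourDatabase.bak'
WITH MOVE 'YourDatabase'     TO N'X:\Data\YourDatabase_RestoreTest.mdf',
     MOVE 'YourDatabase_log' TO N'X:\Logs\YourDatabase_RestoreTest.ldf',
     STATS = 10;

-- Finish by checking the restored copy for corruption.
DBCC CHECKDB ('YourDatabase_RestoreTest') WITH NO_INFOMSGS;
```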