The Most Important Maintenance Task…

We had a customer lose their entire database this week. The server failed in some manner and they tried to recover the database from the disks, but they could not bring the database online. They engaged our group and we tried everything we could to no avail. Until now, the most data loss I had seen was when a customer had to restore a two-year-old backup. In this case, the customer had no backup.

Unfortunately, this means I have a new story for my “When Bad Things Happen to Good Databases” session, which I present at our user conference. We first did this session last year (props to my manager for the title) and the idea is to share real stories from the field with other customers so they realize that these things can happen to them. We cover maintenance and monitoring tasks, and I start with backups because I think they’re the most important thing you can do (with CHECKDB second). Haven’t updated stats? We can help you fix that. Have fragmentation all over the database? Yep, we can help you fix that, too. Run out of space on the drive where the transaction log is, and you’re not doing log backups? Well, you may have some downtime but we can work with you to fix that. Need to restore from a backup but you don’t have one? Anywhere? I got nothin’.

For anyone that manages a database, I implore you…make sure you know when your backup jobs are scheduled to run, and check to make sure they are really running (with no errors). Maybe take a couple minutes today to verify where the backups are copied and where they are stored off site. And just for kicks, restore a backup. Pick any database, just one, and restore a backup. Please.
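If you want a quick sanity check on the "are they really running" part, something like the following could help. This is only a minimal sketch in Python, and it rests on assumptions that are not from the story above: that your backups land as `.bak` files in a single folder, that file modification time is a fair proxy for when the backup was taken, and that 24 hours is your acceptable gap. The function names `newest_backup` and `backup_is_stale` are made up for illustration. It tells you nothing about whether the file can actually be restored, so it is no substitute for a test restore.

```python
from datetime import datetime, timedelta
from pathlib import Path


def newest_backup(backup_dir, pattern="*.bak"):
    """Return the most recently modified file matching pattern, or None."""
    files = [p for p in Path(backup_dir).glob(pattern) if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


def backup_is_stale(backup_dir, max_age=timedelta(hours=24), now=None):
    """True if no backup file exists or the newest one is older than max_age.

    The 24-hour default is an assumption; use whatever gap your
    recovery plan can actually tolerate.
    """
    latest = newest_backup(backup_dir)
    if latest is None:
        return True  # no backup at all is the worst case
    now = now or datetime.now()
    age = now - datetime.fromtimestamp(latest.stat().st_mtime)
    return age > max_age
```

Calling `backup_is_stale` on your backup folder from a scheduled task and alerting when it returns `True` would catch the "those haven't run for weeks" case, provided the check itself is also monitored.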

17 Responses to The Most Important Maintenance Task…
  1. Kevin Kline
    June 30, 2011 | 9:08 am

    Was this a “lost job” scenario? That is, did the DBA lose their job over this? They should!

    Good story. Thanks for sharing,

    -Kev

    • Erin Stellato
      June 30, 2011 | 9:22 am

      Kevin – that’s a great question, and I don’t know if anyone lost their job (yet) over this. What’s even worse, which I didn’t mention, is that the system was down for over two weeks before we were even involved. I have no idea how that happens…

      E

  2. Grant Fritchey
    June 30, 2011 | 9:09 am

    Righteous! Sing it sister!

    It’s actually incredible how often this happens.

    • Erin Stellato
      June 30, 2011 | 9:23 am

      I agree, Grant, and what’s crazy is that I’m sure there are other cases where data is lost, and we’re never informed. It’s pretty scary.

      • Grant Fritchey
        June 30, 2011 | 9:56 am

        And for them to be down for weeks before they do something… Makes you wonder.

        But I’ve seen stuff like that happen in good shops. Someone accidentally removes a server from the monitoring program that checks the backups and no one notices it’s missing until you need a backup and, Whoops, those haven’t run for 3 weeks….

        We set up monitors on our monitors and checks on our checks and still occasionally had the whoopsy occur, but to just never take a backup… Words escape me.

  3. Aaron Bertrand
    June 30, 2011 | 9:15 am

    Great post Erin! I would suggest that verifying the backups can be restored is at least as important as verifying that backups are running at all. So your restore advice, IMHO, is more than “just for kicks” – it should be a primary component in your recovery plan.

    After all, who cares that you have 20 backups all nice and organized in a folder somewhere, if you can’t restore any of them. I’ve seen multiple cases where a backup completed successfully with no errors or warnings, but the restore failed miserably for various reasons. Who knows what can happen to a static .BAK file that’s been sitting on disk since it was backed up, possibly moved around manually or via scheduled tasks, possibly compressed by native or 3rd party services, etc…

    • Erin Stellato
      June 30, 2011 | 9:24 am

      Aaron-

      Noted. My restore “for kicks” was meant to be something like…”I assume you’re doing this regularly anyway, but just for fun, do another one today!” But a good point and thank you for mentioning it. Can you ever verify enough? Probably not…

      E

  4. David Stein
    June 30, 2011 | 12:23 pm

    I’m always amazed when I “hear” a story like this. It’s hard to imagine that stuff like this happens in the wild.

    • Erin Stellato
      June 30, 2011 | 12:28 pm

      David-

      Believe me, I’m always amazed when they tell me there’s no backup, or that the last one was taken weeks/months ago, or they don’t know where it is. I have yet to come up with a good, “professional” response… So unfortunate because it IS avoidable.

      E

      • Andy Borgmann
        July 19, 2011 | 2:45 pm

        What’s even more difficult than coming up with a professional response to an out-of-date backup is a DBA trying to come up with a professional explanation for why you have nine months’ worth of good backups of an unchecked, corrupt database 🙂

  5. hillbillytoad
    June 30, 2011 | 12:31 pm

    Even after all of this, will your customer bother to treat this as a critical system? If they could go 2 weeks without their data and not be bothered to do backups…

    • Erin Stellato
      June 30, 2011 | 12:55 pm

      That’s a great question Jeff, and I have no idea. Since they’ve been informed that we cannot recover the database from the files they had, we haven’t heard from them again. It’s very unfortunate.

  6. hillbillytoad
    June 30, 2011 | 1:03 pm

    Truly unfortunate. When you deliver/ship your solution today to new customers, does it have any maintenance plans in place? I know you can’t force them to ‘do it right’, but (a really big but!)

    • Erin Stellato
      June 30, 2011 | 6:59 pm

      That’s another good point, Jeff. We don’t ship the product with maintenance plans, but if we do the installation, we create them before we leave and review with the DBA (or get agreement from the DBA that they will handle it). I’m open to new ways to raise awareness for this…we will hopefully do a PSA video that goes out to the majority of customers soon on the importance of backups…

  7. Martin Roberts
    June 30, 2011 | 6:36 pm

    Great advice. I thought I was safe, as I was backing up an instance of SQL 2005 with Backup Exec 10 & SQL Agent. Jobs were successful.

    Then it came to the point where I needed to restore a database and found that I had no data. After some investigation I found that I needed a patch for the backup software to work!!

    Problem is that time to do restores is not always available.

    • Erin Stellato
      June 30, 2011 | 7:00 pm

      Martin-

      That’s an excellent example of why you should test your restores (to Aaron’s point above). Thank you for sharing!

      Erin
