The Most Important Maintenance Task…

We had a customer lose their entire database this week. The server failed in some manner and they tried to recover the database from the disks, but they could not bring the database online. They engaged our group and we tried everything we could to no avail. Until now, the most data loss I had seen was when a customer had to restore a two-year-old backup. In this case, the customer had no backup.

Unfortunately, this means I have a new story for my “When Bad Things Happen to Good Databases” session, which I present at our user conference. We first did this session last year (props to my manager for the title), and the idea is to share real stories from the field with other customers so they realize that these things can happen to them. We cover maintenance and monitoring tasks, and I start with backups because I think they’re the most important thing you can do (with CHECKDB second). Haven’t updated stats? We can help you fix that. Have fragmentation all over the database? Yep, we can help you fix that, too. Run out of space on the drive where the transaction log is, and you’re not doing log backups? Well, you may have some downtime but we can work with you to fix that. Need to restore from a backup but you don’t have one? Anywhere? I got nothin’.

For anyone who manages a database, I implore you…make sure you know when your backup jobs are scheduled to run, and check to make sure they are really running (with no errors). Maybe take a couple of minutes today to verify where the backups are copied and where they are stored off site. And just for kicks, restore a backup. Pick any database, just one, and restore a backup. Please.
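
If you want a quick sanity check that backup files are actually landing where you think they are, something like the following might be a starting point. It is only a rough sketch: the function names, the `*.bak` pattern, and the 24-hour threshold are my assumptions for illustration, not anything from a particular product, and it only looks at file timestamps in a folder.

```python
import time
from pathlib import Path
from typing import Optional


def newest_backup_age_hours(backup_dir: str, pattern: str = "*.bak",
                            now: Optional[float] = None) -> Optional[float]:
    """Return the age in hours of the newest matching file, or None if none exist.

    The "*.bak" pattern is an assumption; adjust it to your own naming scheme.
    """
    now = time.time() if now is None else now
    # Collect modification times of every file in the folder that matches.
    mtimes = [p.stat().st_mtime
              for p in Path(backup_dir).glob(pattern) if p.is_file()]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 3600.0


def backup_is_fresh(backup_dir: str, max_age_hours: float = 24.0,
                    pattern: str = "*.bak",
                    now: Optional[float] = None) -> bool:
    """True only if at least one backup file exists and the newest is recent enough."""
    age = newest_backup_age_hours(backup_dir, pattern, now)
    return age is not None and age <= max_age_hours
```

A recent file on disk only tells you the job wrote something. It does not tell you the backup is usable, which is why restoring one is still the real test.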

16 Responses to The Most Important Maintenance Task…
  1. Kevin Kline
    June 30, 2011 | 9:08 am

    Was this a “lost job” scenario? That is, did the DBA lose their job over this? They should!

    Good story. Thanks for sharing,

    -Kev

    • Erin Stellato
      June 30, 2011 | 9:22 am

      Kevin – that’s a great question, and I don’t know if anyone lost their job (yet) over this. What’s even worse, which I didn’t mention, is that the system was down for over two weeks before we were even involved. I have no idea how that happens…

      E

  2. Grant Fritchey
    June 30, 2011 | 9:09 am

    Righteous! Sing it sister!

    It’s actually incredible how often this happens.

    • Erin Stellato
      June 30, 2011 | 9:23 am

      I agree, Grant, and what’s crazy is that I’m sure there are other cases where data is lost, and we’re never informed. It’s pretty scary.

      • Grant Fritchey
        June 30, 2011 | 9:56 am

        And for them to be down for weeks before they do something… Makes you wonder.

        But I’ve seen stuff like that happen in good shops. Someone accidentally removes a server from the monitoring program that checks the backups and no one notices it’s missing until you need a backup and, Whoops, those haven’t run for 3 weeks….

        We set up monitors on our monitors and checks on our checks and still occasionally had the whoopsy occur, but to just not, ever take a backup… Words escape me.

  3. Erin Stellato
    June 30, 2011 | 9:24 am

    Aaron-

    Noted. My restore “for kicks” was meant to be something like…“I assume you’re doing this regularly anyway, but just for fun, do another one today!” But a good point and thank you for mentioning it. Can you ever verify enough? Probably not…

    E

  4. David Stein
    June 30, 2011 | 12:23 pm

    I’m always amazed when I “hear” a story like this. It’s hard to imagine that stuff like this happens in the wild.

    • Erin Stellato
      June 30, 2011 | 12:28 pm

      David-

      Believe me, I’m always amazed when they tell me there’s no backup, or that the last one was taken weeks/months ago, or they don’t know where it is. I have yet to come up with a good, “professional” response… So unfortunate because it IS avoidable.

      E

      • Andy Borgmann
        July 19, 2011 | 2:45 pm

        What’s even more difficult than coming up with a professional response to having an out-of-date backup is a DBA trying to come up with a professional explanation for why you have nine months’ worth of good backups of an unchecked, corrupt database 🙂

  5. hillbillytoad
    June 30, 2011 | 12:31 pm

    Even after all of this, will your customer bother to treat this as a critical system? If they could go 2 weeks without their data and not be bothered to do backups…

    • Erin Stellato
      June 30, 2011 | 12:55 pm

      That’s a great question Jeff, and I have no idea. Since they’ve been informed that we cannot recover the database from the files they had, we haven’t heard from them again. It’s very unfortunate.

  6. hillbillytoad
    June 30, 2011 | 1:03 pm

    Truly unfortunate. When you deliver-ship your solution today to new customers, does it have any maintenance plans in place? I know you can’t force them to ‘do it right’, but (a really big but!)

    • Erin Stellato
      June 30, 2011 | 6:59 pm

      That’s another good point, Jeff. We don’t ship the product with maintenance plans, but if we do the installation, we create them before we leave and review with the DBA (or get agreement from the DBA that they will handle it). I’m open to new ways to raise awareness for this…we will hopefully do a PSA video that goes out to the majority of customers soon on the importance of backups…

  7. Martin Roberts
    June 30, 2011 | 6:36 pm

    Great advice. I thought I was safe, as I was backing up an instance of SQL 2005 with Backup Exec 10 & SQL Agent. Jobs were successful.

    Then it came to the point where I needed to restore a database and found that I had no data. After some investigation, I found that I needed a patch for the backup software to work!!

    Problem is that time to do restores is not always available.

    • Erin Stellato
      June 30, 2011 | 7:00 pm

      Martin-

      That’s an excellent example of why you should test your restores (to Aaron’s point above). Thank you for sharing!

      Erin
