A Successful Failure

Last week I dialed into a customer site to make some changes to their database.  The customer wanted to alter the configuration of the application, and the application interface did not permit the change.  Neither the dial-in nor the change request was unusual; what was unusual was that I had to ask the customer to restore their database from backup when their post-change testing didn’t go as expected.  That’s the first time I’ve ever had to ask a customer to do that.

Now, let me explain why I had to have them do something as drastic as a restore (again, that’s atypical in my world).  The changes required me to delete data from a few tables, and then update other rows in those same tables.  I had the customer complete a full backup, and then I backed up the rows I was deleting and the rows I was updating.  Then I started my deletes and updates.  My statements were based on a dial-in from the previous day, when we made the same changes in their test environment.  I was told that they had tested the changes and everything looked good.
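
For anyone curious, the row-level backups are just ad hoc copies made with SELECT INTO before touching anything.  The pattern looks roughly like the sketch below; the table, column, and value names are made up for illustration and aren’t from the actual change script.

    -- Copy the rows about to be deleted into an ad hoc backup table
    SELECT *
    INTO dbo.AppConfig_DeleteBackup_20111115
    FROM dbo.AppConfig
    WHERE ConfigID IN (101, 102, 103);

    -- Copy the rows about to be updated as well
    SELECT *
    INTO dbo.AppConfig_UpdateBackup_20111115
    FROM dbo.AppConfig
    WHERE ConfigGroup = 'Reporting';

    -- Only after both copies exist, run the actual delete and update
    DELETE FROM dbo.AppConfig
    WHERE ConfigID IN (101, 102, 103);

    UPDATE dbo.AppConfig
    SET ConfigValue = 'NewValue'
    WHERE ConfigGroup = 'Reporting';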

After I finished the script, the customer logged in to the application and began to check the changes.  At some point the customer requested that I make another modification.  The change was technically out of scope, and it was not something I had tested the previous day.  However, I felt confident in my ability to make the change.  I backed up the rows to be deleted and modified, then made the changes.  The customer continued testing, and eventually tried functionality that they had not tested the prior day.  It was not working as expected.  We spent some time troubleshooting the issue, and after about 30 minutes I knew I had to either manually undo the changes or go to a backup.

Here’s why I went to a backup:

  1. Processing had continued against the database while I was making changes.  That was not supposed to happen, but apparently not all the necessary services had been stopped.  I take partial responsibility here; there is probably more I could be doing to ensure that everything is stopped (trusting people is not always enough).  I don’t love the idea of putting the database in single user mode, but I suppose I could do that (a rough sketch of what that would look like follows this list).
  2. While the customer was testing my changes, they also made changes to the configuration.  They were minor, but if I were going to roll back, I would have to roll those back as well, and I didn’t have that scripted.
  3. The database was not that large; the restore took less than 10 minutes.
  4. When you roll back changes via scripting, there is always some concern that the database has not truly been “reverted” to its original state.  By going to a backup, I took that concern out of the equation.
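
If I did go the single user route, it would look something like the sketch below (the database name is made up); SINGLE_USER with ROLLBACK IMMEDIATE disconnects everyone else so nothing can process against the database while the changes are made.

    -- Disconnect all other sessions and take exclusive access before making changes
    ALTER DATABASE CustomerAppDB
    SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

    -- ... run the scripted deletes and updates here ...

    -- Return the database to normal access once the work is verified
    ALTER DATABASE CustomerAppDB
    SET MULTI_USER;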

Now, interestingly, after we restored the backup and verified the system was good to go, I had the customer test the functionality again (the functionality that led to the restore).  Funny thing: it wasn’t working as expected after the restore…at first.  Eventually it did.  I have my suspicions that the customer was trying to hurry things along and wasn’t being patient.

So while I had to have a customer restore a database for the first time, and I have to schedule another dial-in to make the changes again, I still view the incident as a success.  No data was lost, and I left the system as I found it.  What’s the phrase in the medical world, “First, do no harm”?  It’s sound advice for databases, too.
