It’s after midnight and I’ve missed the deadline for Adam Machanic’s ( b | t ) TSQLTuesday event this month. But would I miss the chance to talk about some crap code that I’ve written? Of course not!
When I read Adam’s post I went looking for a specific script I wrote back in 2006. Yes, it’s that bad that I remembered it immediately. We had a customer that needed data populated in a new table, and the update was based off of data in two other tables: one existing, one a temporary table that I would load from a flat file they provided. I wrote my script, tested it in my system, and logged in to the customer environment to run it.
Problem #1: I ran the script during the middle of the business day.
While the data that I was inserting wasn’t being accessed by the customer, it still wasn’t a good idea to run the script at three in the afternoon.
Problem #2: I didn’t pay attention to the log file.
The customer was in full recovery mode. I was working with about 20 million rows of data. All the inserts would be logged, and I had no plan for the log itself (e.g. maybe think about a log backup along the way so the file wouldn’t grow).
Problem#3: I didn’t pay attention to the indexes.
The table was originally empty and had two indexes on it. I could have dropped the indexes (this was SQL 2000) and rebuilt them when finished to save a lot of time and resources.
Problem #4: It was crappy code.
I could not find the original script from 2006, and believe me, I looked. I am pretty sure that I was so horrified about what I wrote that at some point I deleted it.
Here is what I did:
- Load data from flat file into a temp table.
- Insert the data from the temp table into the new table.
- Create a cursor and loop through every row in the new table, and update the row in the new table with the value from the existing table.
Yes, that’s right, I did this update row by agonizing row.
It was an awesome display of inefficiency.
When I asked my manager to look at my code and help me figure out why the inserts were taking so long, he looked at it and said, “Oh, you could have just done that in one statement.” We then had a lovely discussion about cursors and set-based code. Going forward, when I to had write a T-SQL script I would think about my approach before writing and I would start thinking cursor-based, then work my way to set-based. For a very long time, I had a post-it note with this text on my monitor:
Today, set-based is my initial approach. There are cases where you have to use a cursor-based approach, but I try to think of it as my last resort. What’s ironic is that when my two new team members started, their first approach when scripting was also cursor-based. So of course they now have the above post-it note on their monitors.
When I start writing T-SQL code these days, a couple times I found myself thinking, “I wonder what Adam Machanic’s code would look like if he wrote this?” I randomly tweeted this question the other day and Argenis Fernandez ( b | t ) replied…
Awesome. I don’t know Adam’s Ten Commandments of Coding, but I would guess that would what one of them would be, “Thou shall avoid cursors whenever possible.” Thoughts on the others?