There is one more attendee question from SQLSaturday #118 that I want to follow up on, and it is this: “Is there any way to create an index with a varchar(max) column as the leading column?”
This question came up during my Statistics presentation, and was at the end as we were winding down. My reply was, “I have no idea, I’ve never tried it.”
The attendee noted that he had tried to create the index but couldn’t – so it cannot be done – but he was looking to see if there was a creative way to work around it. During the ensuing conversation he wondered if maybe it couldn’t be the leading column because the index would be too large, and then mentioned that he had ended up making in an included column in the index. My question was, “Well if it’s an included column what does that do to the size of the index?”
Then I stopped and said something to the effect of, “Hang on, what is the problem you’re trying to solve? Meaning, I’m guessing you’re trying to put a varchar(max) column in an index because you want to search on it. Is making it an included column going to help that much? I suppose you’ll get an index scan instead of a table scan, but is that really the right solution?”
My new favorite question to ask is, “What problem are you trying to solve?” In this case, I assume that users need to search on the data in the varchar(max) field. And this can be done in SQL Server a couple different ways. My first thought was Full Text Search. Now, configuring FTS is going to take more work than creating an index with an included column, but I would bet that performance and management long-term would be a lot easier. You could also create a computed column on a substring of the varchar(max) column, and put that in an index. There are many assumptions you have to handle here, most notably, is the data you want to search always in the same location within the original column?
I’m sure there are other options, but the overall point is this: understand what problem you’re trying to solve, and then figure out what options you have for solving it. Once you have your options, understand the pros and cons of each, and then make an informed decision.
Circling back to the original question, one of the first things I tried when I was waiting at the Madison airport for my flight home was to create a table with a varchar(max) column and create an index with that as a leading column. It of course failed, and Books Online clearly states:
Columns that are of the large object (LOB) data types ntext, text, varchar(max), nvarchar(max), varbinary(max), xml, or image cannot be specified as key columns for an index.
So there you have it, two presentations at SQLSaturday in Madison, and three blog posts from questions. SQLSaturdays always reaffirm several things for me. First, I am truly grateful for the SQL Community – the people, the opportunities, the fun…all of it. Second, I love presenting, although I tend to think of it as teaching and even said during one of my sessions, “If you have questions please come ask me after class.” Apparently I was having flashbacks to grad school. And finally, I will never know enough; there is always so much more to learn.
I would think that the performance benefits of Full-Text would outweigh any setup work – but interesting point about people not installing it during setup. I have to think it’s not a ton of space to install the additional files.
And here I’ve been getting frustrated that it takes 2 hours to get to work on my cow…I should be riding a horse!
E
Nice post, and for more than just the technical review.
Grad school? That’s new information for me. What did you pursue?
I got a Master’s in Motor Control at Michigan, after I finished undergrad. It’s on my About page (http://erinstellato.com/about/). Comes in handy sometimes 🙂
Aaron – that might be my new favourite quote. Plus I REALLY want to see someone teach a cow traffic rules.
Overall, nice post Erin.
In those situations I like to ask “What does God need with a starship?” and see if the people I’m asking know what my point is (and if they’re Star Trek fans.)
[…] varchar(max) in an Index – A fundamental lesson in problem solving that is so often overlooked due to over emphasis on the solution, shared courtesy of Erin Stellato (Blog|Twitter). […]
Your blog post is an perfect example why I love SQL Server. There is always something new to learn. 🙂
Erin,
Would you shy away from adding varchar(max) columns as included columns in indexes in general? I am creating a covering index, but hesitant to ‘fully cover’ it because there are 3 varchar(max) columns being selected (along with the other columns). Would it make sense to create the covering index w/o the 3 varchar(max) columns and have key lookups get those, or is including them not going to incur as much overhead as I am thinking.
Thanks,
Jeremy
Jeremy-
A very belated reply here 🙂 But, I would add those columns only in specific circumstances. If I were searching on that column, I probably would not add it. If I were DISPLAYING it in a search result, consistently, then I might add it. You have to do the work to see if the benefit of adding it outweighs the cost. A query might run faster and be more efficient, but is it “faster enough” and efficient enough that it’s worth the cost of maintaining that column in the index, and the disk space (and space in memory) it consumes?
E
Adding nonkey columns to the index uses more disk space to store the index. In particular, adding varchar(max), nvarchar(max), varbinary(max), or xml data types as nonkey columns may significantly increase disk space requirements, because the column values are copied into the index leaf level and also remain in the table or clustered index.
https://technet.microsoft.com/en-us/library/ms189607(v=sql.105).aspx\
Correct, this is also something to consider.