It’s Wednesday which means it’s time for the second post in the Internals series for SQL University this week. On Monday we started off with a review of data structures for a database. Today we are keeping with the data structures topic, but looking way under the covers at what happens when you update data. This post is a continuation of one written by Paul Randal ( blog | @paulrandal ) back in February titled, Do changes to index keys really do in-place updates? The discussion started when I asked, on Twitter, about updates in SQL Server. I wanted to know if an update was truly an update. Does the Storage Engine just modify the data on the page, or does the engine actually delete the data, then insert the changed value(s). A fascinating discussed ensued and Paul was kind enough to write up a post detailing what happens when you change a key in a clustered index. If you haven’t read it, please do so before continuing here.
At the end of Paul’s post, he mentioned that the same behavior occurs with nonclustered indexes. The purpose of this post is to understand that an update isn’t always a simple modification of data on the page, with a secondary goal of realizing that the order of the data on the page doesn’t really matter.
Note: I apologize, I’m having some formatting issues with posts. I’m working to get this fixed.
Here we go! We will start with creating a database and a table, add the clustered index and a nonclustered index, then insert some data:
/* create database */ CREATE DATABASE Movies ON PRIMARY ( NAME = N'Movies', FILENAME = N'C:\Databases\Movies.mdf' , SIZE = 4069KB , FILEGROWTH = 1024KB ) LOG ON ( NAME = N'Movies_log', FILENAME = N'C:\Databases\Movies_log.ldf' , SIZE = 1024KB , FILEGROWTH = 1024KB ); USE Movies GO ALTER DATABASE Movies SET RECOVERY SIMPLE; /* create table for testing */ CREATE TABLE dbo.MovieInfo ( MovieName VARCHAR (800), ReleaseDate SMALLDATETIME, Rating VARCHAR(5), Genre VARCHAR(250) ); /* add indexes */ CREATE CLUSTERED INDEX CI_ReleaseDate ON dbo.MovieInfo (ReleaseDate); CREATE NONCLUSTERED INDEX NCI_Rating ON dbo.MovieInfo (Rating); CHECKPOINT; /* add some data */ INSERT INTO dbo.MovieInfo ( MovieName, ReleaseDate, Rating, Genre ) VALUES ('Apollo 13', '1995-05-30 00:00:00', 'PG', 'Adventure, Drama, History'), ('The Right Stuff', '1983-10-21 00:00:00', 'PG-13', 'Adventure, Drama, History'); CHECKPOINT;
Using Paul’s sp_AllocationMetadata stored procedure (which you can download from his blog here), we will check to see which pages have been assigned to the MovieInfo objects:
We can then take page information and use that in DBCC PAGE (see homework from Monday), starting with the clustered index, identified by Index ID 1:
/* clustered index */ DBCC TRACEON (3604); GO DBCC PAGE (Movies, 1, 145, 2); GO
PAGE: (1:145)
< output removed >
Memory Dump @0x000000000C83C000
000000000C83C000: 01010400 00420001 00000000 00000800 †.....B..........
000000000C83C010: 00000000 00000200 1c000000 211fdb00 †............!.Û.
000000000C83C020: 91000000 01000000 17000000 8d000000 †...............
000000000C83C030: 34000000 00000000 00000000 9209bfaf †4........... ¿¯
000000000C83C040: 00000000 00000000 00000000 00000000 †................
000000000C83C050: 00000000 00000000 00000000 00000000 †................
000000000C83C060: 30000800 00001f88 05000004 0015001e †0...............
000000000C83C070: 00200039 0041706f 6c6c6f20 31335047 †. .9.Apollo 13PG
000000000C83C080: 41647665 6e747572 652c2044 72616d61 †Adventure, Drama
000000000C83C090: 2c204869 73746f72 79300008 00000090 †, History0......
000000000C83C0A0: 77050000 04001500 24002900 42005468 †w.......$.).B.Th
000000000C83C0B0: 65205269 67687420 53747566 6650472d †e Right StuffPG-
000000000C83C0C0: 31334164 76656e74 7572652c 20447261 †13Adventure, Dra
000000000C83C0D0: 6d612c20 48697374 6f727900 00212121 †ma, History..!!!
000000000C83C0E0: 21212121 21212121 21212121 21212121 †!!!!!!!!!!!!!!!!
< output removed >
000000000C83DFF0: 21212121 21212121 21212121 60009900 †!!!!!!!!!!!!`...
OFFSET TABLE:
Row - Offset
1 (0x1) - 96 (0x60)
0 (0x0) - 153 (0x99)
The entry for The Right Stuff exists in slot 0, and the entry for Apollo 13 exists in slot 1. Note that the Apollo 13 row was inserted first (which is why the offset is lower), but based on the clustered index key of ReleaseDate, The Right Stuff is the first row in the clustered index, so it is slot 0.
Now let’s take a look at DBCC PAGE for the nonclustered index:
/* nonclustered index */ DBCC PAGE (Movies, 1, 147, 2); GO
PAGE: (1:147)
< output removed >
Memory Dump @0x000000000EB0A000
000000000EB0A000: 01020400 00820001 00000000 00000500 †...............
000000000EB0A010: 00000000 00000200 1d000000 7d1f7f00 †............}...
000000000EB0A020: 93000000 01000000 17000000 8d000000 †...............
000000000EB0A030: 35000000 00000000 00000000 670c169c †5...........g..
000000000EB0A040: 00000000 00000000 00000000 00000000 †................
000000000EB0A050: 00000000 00000000 00000000 00000000 †................
000000000EB0A060: 3600001f 88030000 01000e00 50473600 †6...........PG6.
000000000EB0A070: 00907703 00000100 11005047 2d313300 †..w.......PG-13.
000000000EB0A080: 00212121 21212121 21212121 21212121 †.!!!!!!!!!!!!!!!
< output removed >
000000000EB0BFF0: 21212121 21212121 21212121 6e006000 †!!!!!!!!!!!!n.`.
OFFSET TABLE:
Row - Offset
1 (0x1) - 110 (0x6e)
0 (0x0) - 96 (0x60)
In this output, the entry for The Right Stuff exists in slot 1, and the entry for Apollo 13 exists in slot 0. In our nonclustered index, the row for Apollo 13 came in first again, but this time it is the first row in the nonclustered index because it’s rating of PG comes before the PG-13 rating for The Right Stuff.
Now we’re going to update the release date for Apollo 13 – I purposely entered it incorrectly – which is part of the clustering key. We expect that both the clustered index and nonclustered index will be updated because the clustering key is stored in the leaf level of the nonclustered index, so if we change the clustering key, the nonclustered index(es) have to be updated as well.
/* update a column in the clustering key */ UPDATE dbo.MovieInfo SET ReleaseDate = '1995-06-30 00:00:00' WHERE MovieName = 'Apollo 13'
We immediately select from fn_dblog to see what’s in the transaction log (extra rows in the output that are not relevant were removed):
SELECT [Current LSN], [Operation], [Context], [Log Record Length], [Page ID], [Slot ID] FROM fn_dblog (NULL, NULL); GO
Fun!
- We see the row deleted from the clustered index: row 45, LOP_DELETE_ROWS, LCX_MARK_AS_GHOST
- We see the row re-inserted for the clustered index: row 48, LOP_INSERT_ROWS, LCX_CLUSTERED
- We see the row deleted for the nonclustered index: row 49, LOP_DELETE_ROWS, LCX_MARK_AS_GHOST
- We see the row re-inserted for the nonclustered index: row 51, LOP_INSERT_ROWS, LCX_INDEX_LEAF
*Note: changes to the slot array occur in the same fashion as described in Paul’s post because the change to the clustered key does not change the order of the data in the clustered index – the entry for The Right Stuff is still first, and still slot 0 ultimately. As such, I am not going to re-explain the changes here.
If you run DBCC PAGE again for both indexes, we can see that the data has moved on the page, even though the slot is the same:
/* clustered index */ DBCC PAGE (Movies, 1, 145, 2); GO
PAGE: (1:145)
< output removed >
Memory Dump @0x000000000C83C000
000000000C83C000: 01010400 00800001 00000000 00000800 †................
000000000C83C010: 00000000 00000200 1c000000 211f1401 †............!...
000000000C83C020: 91000000 01000000 17000000 b3000000 †...........³...
000000000C83C030: 01000000 4b020000 00000000 9209bfaf †....K....... ¿¯
000000000C83C040: 01000000 00000000 00000000 00000000 †................
000000000C83C050: 00000000 00000000 00000000 00000000 †................
000000000C83C060: 3c000800 00001f88 05000004 0015001e †<...............
000000000C83C070: 00200039 0041706f 6c6c6f20 31335047 †. .9.Apollo 13PG
000000000C83C080: 41647665 6e747572 652c2044 72616d61 †Adventure, Drama
000000000C83C090: 2c204869 73746f72 79300008 00000090 †, History0......
000000000C83C0A0: 77050000 04001500 24002900 42005468 †w.......$.).B.Th
000000000C83C0B0: 65205269 67687420 53747566 6650472d †e Right StuffPG-
000000000C83C0C0: 31334164 76656e74 7572652c 20447261 †13Adventure, Dra
000000000C83C0D0: 6d612c20 48697374 6f727930 00080000 †ma, History0....
000000000C83C0E0: 003e8805 00000400 15001e00 20003900 †.>.......... .9.
000000000C83C0F0: 41706f6c 6c6f2031 33504741 6476656e †Apollo 13PGAdven
000000000C83C100: 74757265 2c204472 616d612c 20486973 †ture, Drama, His
000000000C83C110: 746f7279 00002121 21212121 21212121 †tory..!!!!!!!!!!
000000000C83C120: 21212121 21212121 21212121 21212121 †!!!!!!!!!!!!!!!!
< output removed >
000000000C83DFF0: 21212121 21212121 2121db00 db009900 †!!!!!!!!!!Û.Û...
OFFSET TABLE:
Row - Offset
1 (0x1) - 219 (0xdb)
0 (0x0) - 153 (0x99)
For the clustered index, the offset for the Apollo 13 row changed from 96 to 219, but it is still slot 1. Let’s check the nonclustered index:
/* nonclustered index */ DBCC PAGE (Movies, 1, 147, 2); GO
PAGE: (1:147)
< output removed >
Memory Dump @0x000000000D1AA000
000000000D1AA000: 01020400 00000001 00000000 00000500 †................
000000000D1AA010: 00000000 00000200 1d000000 7d1f8d00 †............}...
000000000D1AA020: 93000000 01000000 17000000 b3000000 †...........³...
000000000D1AA030: 03000000 4b020000 00000000 670c169c †....K.......g..
000000000D1AA040: 01000000 00000000 00000000 00000000 †................
000000000D1AA050: 00000000 00000000 00000000 00000000 †................
000000000D1AA060: 3a00001f 88030000 01000e00 50473600 †:...........PG6.
000000000D1AA070: 00907703 00000100 11005047 2d313336 †..w.......PG-136
000000000D1AA080: 00003e88 03000001 000e0050 47000021 †..>........PG..!
000000000D1AA090: 21212121 21212121 21212121 21212121 †!!!!!!!!!!!!!!!!
< output removed >
000000000D1ABFF0: 21212121 21212121 21216e00 6e007f00 †!!!!!!!!!!n.n...
OFFSET TABLE:
Row - Offset
1 (0x1) - 110 (0x6e)
0 (0x0) - 127 (0x7f)
For the nonclustered index, the offset for the Apollo 13 row changed from 96 to 127, but again, it’s still slot 0 here.
Ok, we changed a value in the clustering key and we saw the update occur in the nonclustered index. Now let’s change a column in the nonclustered index and see what happens (for the sake of brevity, I will only show the OFFSET values in the DBCC PAGE output).
CHECKPOINT; /* update a column in the nonclustered index */ UPDATE dbo.MovieInfo SET rating = 'R' WHERE moviename = 'The Right Stuff'; SELECT [Current LSN], [Operation], [Context], [Log Record Length], [Page ID], [Slot ID] FROM fn_dblog (NULL, NULL); GO
Interesting…we see a LOP_MODIFY_ROW for the clustered index (row 4), and a delete and insert for the nonclustered index (rows 5 and 8).
Let’s check the output from DBCC PAGE:
/* clustered index */ DBCC PAGE (Movies, 1, 145, 2); GO
PAGE: (1:145)
< output removed >
Row - Offset
1 (0x1) - 219 (0xdb)
0 (0x0) - 153 (0x99)
/* nonclustered index */ DBCC PAGE (Movies, 1, 147, 2); GO
PAGE: (1:147)
< output removed >
OFFSET TABLE:
Row - Offset
1 (0x1) - 141 (0x8d)
0 (0x0) - 127 (0x7f)
The offset for the row in the clustered index did not change (153), but the offset for the row in the nonclustered index did, from 110 to 141.
I am explicitly calling this out: when key values that are in an index change, the in-place update does not occur.
- If I change any value in the clustering key, the in-place update does not occur for either the clustered index or nonclustered indexes.
- If I change any value in a nonclustered index key, the in-place update does not occur for the nonclustered index, but it could occur in-place for the clustered index because the updated column is not in the clustering key. In this update, I changed a date value, which is a fixed length field. If I had changed a variable length column, the data on the clustered index page might have moved, depending on whether I increased or decreased the length of the field (and this is a test for another day).
Just for fun, let’s see what happens with a heap.
/* create second table (heap) for testing */ CREATE TABLE dbo.MovieReview( MovieName VARCHAR (800), ReleaseDate SMALLDATETIME, Review VARCHAR (5000) ); /* add a nonclustered index */ CREATE NONCLUSTERED INDEX NCI_MovieName ON dbo.MovieReview (MovieName); /* add some data */ INSERT INTO dbo.MovieReview ( MovieName, ReleaseDate, Review ) VALUES ('Apollo13', '1995-05-30 00:00:00', 'Based on the autobiography written by Jim Lovell on the "successful failure" of the Apollo 13 mission in 1970.'), ('The Right Stuff', '1983-10-21 00:00:00', 'Based on the novel of the same name by Tom Wolfe, the film never feels like a documentary or history lesson.'); CHECKPOINT;
We again use sp_AllocationMetadata to get the page information, and then DBCC PAGE to check the offsets:
EXEC sp_AllocationMetadata 'MovieReview';
/* heap */ DBCC PAGE (Movies, 1, 155, 2); GO
PAGE: (1:155)
< output removed >
OFFSET TABLE:
Row - Offset
1 (0x1) - 231 (0xe7)
0 (0x0) - 96 (0x60)
/* nonclustered index */ DBCC PAGE (Movies, 1, 157, 2); GO
PAGE: (1:157)
< output removed >
OFFSET TABLE:
Row - Offset
1 (0x1) - 120 (0x78)
0 (0x0) - 96 (0x60)
Now we’ll update the column in the nonclustered index:
/* change value in the nonclustered index */ UPDATE dbo.MovieReview SET MovieName = 'Apollo 13' WHERE MovieName = 'Apollo13'; SELECT [Current LSN], [Operation], [Context], [Log Record Length], [Page ID], [Slot ID] FROM fn_dblog (NULL, NULL); GO
We see the same pattern we saw before, we see LOP_MODIFY_ROW for the heap, and then a delete and insert for the nonclustered index (rows 5 and 8).
Let’s look at the DBCC PAGE for the nonclustered index first:
/* nonclustered index */ DBCC PAGE (Movies, 1, 157, 2); GO
PAGE: (1:157)
< output removed >
OFFSET TABLE:
Row - Offset
1 (0x1) - 120 (0x78)
0 (0x0) - 151 (0x97)
We can see that the offset changed from 96 to 151, which is line with our delete and add.
Now we’ll check the heap page:
/* heap */ DBCC PAGE (Movies, 1, 155, 2); GO
PAGE: (1:155)
< output removed >
OFFSET TABLE:
Row - Offset
1 (0x1) - 231 (0xe7)
0 (0x0) - 371 (0x173)
Note: the offset changed. It went from 96 to 371. Notice that in the update of the MovieName column, we changed the value from Apollo13 to Apollo 13. We added a space, so we made the row longer. The row could no longer fit in the current space, so it had to move, hence the new offset.
There are other updates of variable length columns that we could explore (decrease length of the column, for example) I think the underlying point is clear: an update is not always an update of the row on the page, sometimes it’s a delete and an insert. Further, if the data is deleted and inserted to an entirely separate location the page, it doesn’t matter, because SQL Server uses the slot array to manage the order of the data.
Whew, that was a lot…is beer-thirty yet?! In Friday’s post, we will look at what happens when you delete from a table. See you then!