January’s Performance VC LiveMeeting: In the books!

Yesterday was the first LiveMeeting hosted by the Performance Virtual Chapter in 2012 and it went very well, in spite of a few glitches!  I am so proud of our team; it was a solid team effort.  Carlos ( b | t ) had asked Jason Strate ( b | t ) to be our speaker in January, and sorted out everything in good time to hand it over to Wil ( b | t ), Ryan ( b | t ), Phil ( t ), Paul  ( t ) and Yulia ( t ).

Wil worked out the details with Jason and made sure he was ready to run the LiveMeeting.  Ryan put together emails, a blog post and other notifications promoting the presentation, and Paul and Yulia updated the site and made some other tweaks along the way.

There are definitely things we can improve – we had an issue with the LiveMeeting link and I have no idea why, so I’m not sure how that affected our attendance.  But overall, I’m very happy with our first event and I am already looking forward to February when Jes Borland will present on Filegroups.  I had some questions from a customer recently about files and filegroups, so I can’t wait to see what I learn from Jes.

Jason’s presentation, slide deck and scripts will be up on the Archives page soon, as will the information for Jes’ presentation (feel free to add it to your calendars now: Thursday, February 23rd at 2 PM EST/11 AM PST).  The winner of this month’s $50 Amazon gift card sponsored by the wonderful Confio should get notified tomorrow or Monday so check your email if you attended!  And if you want to see who won, come back to the main page next week to find out, and get signed up for next month’s LiveMeeting.  See you there!

Statistics and Recompilations

A couple weeks ago I was asked to look at system performance for a customer, and as part of my investigation I found a high read query.  I had run a trace and filtered for anything over 10,000 reads and saw a query that required around 32,000 reads, and it was running at least once a second.  I examined it further.  It was a query against one table, for one presumably unique value in a column.  I was suspicious.  Based on the reads I guessed it was doing a table scan, even though I expected an index seek.

I ran the query in SSMS to see how many rows it returned.  Three.  I then ran the query with Include Actual Execution Plan enabled…it did an index seek.  Well, the software runs the query differently than SSMS, so I grabbed the plan handle from the DMV and looked up the query plan that way.  It was doing a full table scan against the table.  I checked statistics…they had been updated with 100% sample that morning, and the value in the query existed in the histogram.

Then I checked the database options…Auto Update Statistics was disabled.

<sidebar>

Yes, I know that Kimberly Tripp recommends leaving Auto Update Statistics and Auto Create Statistics enabled…but she does say “It depends” in her post.  So let me explain why we have traditionally recommended to customers that they disable the Auto Update Statistics option:

  • We do not want statistics updating during production hours if at all possible; we want the customer to manage statistics through a regularly scheduled job.
  • We want statistics updated with 100% sample because very often the distribution in critical tables is skewed.  With Auto Update Statistics enabled, the statistics do not update with 100% sample unless the table is less than 8 MB in size.

I know, there are many discussions we can have about whether this is the “best” thing to recommend.  I actually tried to go through the rationale, but it became too long, and will be another post entirely at some point.  Just think of this from the point of the vendor (me) for a minute.  We have to make configuration and maintenance recommendations for our customers.  And we have to make them to fit thousands of customers.  It is not an easy task to make general recommendations, because as you know, “It depends.”

</sidebar>

So why does it matter, in this case, that Auto Update Statistics is disabled?

Because when Auto Update Statistics is disabled, query plans will not recompile, even when statistics are updated.*

This was a shocker to me, and I have to thank Steve Smith ( b | t ) for pointing it out to me last fall during the SQLSaturday in Kalamazoo.  He sent me a link on Execution Plan Caching and Reuse, and in the pretty yellow Note at the bottom it mentions this fact.  Surprisingly, I haven’t found that same information in any of the other statistics-related documentation in BOL.

*To be clear, this is the relevant text, copied from the MSDN article:

When the AUTO_UPDATE_STATISTICS database option is SET to ON, queries are recompiled when they target tables or indexed views whose statistics have been updated or whose cardinalities have changed significantly since the last execution. This behavior applies to standard user-defined tables, temporary tables, and the inserted and deleted tables created by DML triggers. If query performance is affected by excessive recompilations, consider changing this setting to OFF. When the AUTO_UPDATE_STATISTICS database option is SET to OFF, no recompilations occur based on statistics or cardinality changes, with the exception of the inserted and deleted tables that are created by DML INSTEAD OF triggers.

Now, maybe everyone has Auto Update Statistics enabled and no one will ever have to worry about this problem, and this can become an inane fact that you know.  But just in case, here’s a demo so you can see it in action.  I commented the code along the way, and included screen shots where relevant.


USE master;
GO

/* create a new database for testing (optional) */
CREATE DATABASE [Stats]
ON PRIMARY
     ( NAME = N'Stats',
     FILENAME = N'C:\Databases\SQL2008R2\Stats.mdf' ,
     SIZE = 4096KB ,
     FILEGROWTH = 1024KB )
LOG ON
     ( NAME = N'Stats_log',
     FILENAME = N'C:\Databases\SQL2008R2\Stats_log.ldf' ,
     SIZE = 1024KB ,
     FILEGROWTH = 10% );

ALTER DATABASE [Stats] SET RECOVERY SIMPLE WITH NO_WAIT;

USE [Stats];
GO

/* turn off auto update statistics,
 as it is enabled by default for new databases
*/
sp_dboption [Stats], 'auto update statistics', 'off';
GO
/* create a table and nonclustered index for testing */
CREATE TABLE dbo.mytable (
     ID INT IDENTITY (1,1),
     SomeValue INT);

CREATE NONCLUSTERED INDEX mytable_ID ON dbo.mytable (ID);
/* verify that no statistics exist for the index */
DBCC SHOW_STATISTICS ("dbo.mytable", mytable_ID) WITH STAT_HEADER

 

This is an abbreviated output from DBCC SHOW_STATISTICS, but you can see that there’s no statistics information available.

No statistics exist

 

 

 


/* load some data */
SET NOCOUNT ON

DECLARE
     @count INT = 1,
     @random INT = RAND()*100000;

WHILE @count < 550
BEGIN
     INSERT INTO dbo.mytable (SomeValue) VALUES (@random);
     SET @count = @count + 1;
     SET @random = RAND()* 100000;
END

/*
 clear cache, just to make sure there are no
 query plans from previous testing hanging around
*/

DBCC FREEPROCCACHE

/* run a selective query */

SELECT * FROM dbo.mytable WHERE ID = 549;
/* get plan_handle */
SELECT p.usecounts, p.plan_handle, s.text
FROM sys.dm_exec_cached_plans p
OUTER APPLY sys.dm_exec_sql_text(p.plan_handle) s
WHERE s.text LIKE '%dbo.mytable%';

/* use the plan_handle from the above query */
SELECT *
FROM sys.dm_exec_query_plan(0x06000F0062117321B8A0FA07000000000000000000000000);

 

When we view the query plan, we see a full scan, which isn’t surprising since the table is so small. 

Initial query plan

 

 

 

 


/* load some more data */
SET NOCOUNT ON

DECLARE
     @count INT = 1,
     @random INT = RAND()*100000;

WHILE @count < 1000
BEGIN
     INSERT INTO dbo.mytable (SomeValue) VALUES (@random);
     SET @count = @count + 1;
     SET @random = RAND()* 100000;
END

/*
 run an update statement, which would would
 normally invoke an update of statistics,
 as 20% plus 500 rows are changing
*/

UPDATE dbo.mytable
SET SomeValue = 10000
WHERE ID IN (SELECT MAX(ID) FROM dbo.mytable);

/* verify that statistics were not updated */
DBCC SHOW_STATISTICS ("dbo.mytable", mytable_ID) WITH STAT_HEADER; 

Statistics were not updated, because Auto Update is disabled. 

Statistics haven't been updated

Statistics haven't been updated

 

 

 

 

Let’s update them manually and see what happens.


/* update statistics manually */
UPDATE STATISTICS dbo.mytable WITH FULLSCAN;

/* verify that statistics were updated */
DBCC SHOW_STATISTICS ("dbo.mytable", mytable_ID) WITH STAT_HEADER; 
Statistics have been updated

Statistics have been updated

 

 

 

/* run the same selective query */
SELECT * FROM dbo.mytable WHERE ID = 549;

/* check the plan_handle */
SELECT p.usecounts, p.plan_handle, s.text
FROM sys.dm_exec_cached_plans p
OUTER APPLY sys.dm_exec_sql_text(p.plan_handle) s
WHERE s.text LIKE '%dbo.mytable%'

/* the plan handle is the same, still a full scan */
SELECT *
FROM sys.dm_exec_query_plan(0x06000F0062117321B8A0FA07000000000000000000000000);

Even though statistics were updated with 100% sample, because Auto Update Statistics was disabled, the query plans for dbo.mytable did not recompile, and they won’t until we either flush procedure cache, restart the instance, or enable Auto Update Statistics.

If you repeat the same steps, but this time with Auto Update Statistics enabled, you will see the query plan get removed from cache after statistics are updated (either automatically or manually).


/* clean up code */
USE master;
GO

DROP DATABASE [Stats];

 

Backups and Disk Space

I was in Phoenix last week to visit a customer, and I saw an email come through regarding a customer who contacted support because they discovered they were missing data from a table.  The customer contacted support on Friday morning and our team was engaged almost immediately.  The data loss had been discovered on Thursday afternoon.  When we asked about a database backup to restore, so we could recover the data from it, we were told that they did have a database backup because they backup nightly.  Hooray!  They backup nightly!  However, the backup was from Tuesday night, after the data was found to be missing.  Apparently, they overwrite the backup every night.

Ouch.  I immediately asked if they had an off-site copy (they did not) and then I asked why they only have one day’s backup available.  The answer?  They have issues with disk space.

One of my next thoughts was, “How many DBAs ask for backup space when they’re asking for space for a database?”  So to any DBAs who might be reading this: when you are asked how much disk space you need for a database, do you also include backup storage?  Do you include test, development or QA environments?

Then I started thinking about the documentation that our team provides to customers.  Do we ever mention disk space for backups?  We do not.  We have no specific recommendation regarding how much space is needed for backups, but…we do talk about backing up the database to separate storage.  That’s probably not enough.  Conveniently, we’re in the process of updating those guides, so I’m going to add a section covering disk space for backups.  And space for testing restores.  And space for test, development and QA environments.  Am I greedy?  Yes, I am.  But can I afford not to be?

And in case you’re wondering about the customer and their lost data…a team member was able to recover it using Lumigent’s Log Explorer tool.  Some of you may have heard of it.  It has saved data for a few customers over the years.  Sadly, it doesn’t work past SQL Server 2005 and you’d have to do some serious searching to find the files online.  It has a few quirks when you’re working with it, but all in all, it’s been a very useful application for us.  However, a third party tool is no replacement for a sound recovery strategy.

So when you have a minute today, go check the storage where backups are stored.  How much space do you have?  How many backups are you keeping online, and how often are backups going off site?  And finally, if you had to recover, right now, from a disaster that occurred yesterday afternoon, could you get the backup that you needed, and how quickly?

Introducing the 2012 Performance Virtual Chapter Team

In late November I wrote a post  asking for volunteers for the Performance Virtual Chapter.  There was a great response, and many thanks to those who replied!  Without further ado, let me introduce the Performance VC team for 2012:

 Speaker Wrangler: Carlos Bossy ( b | t )

I briefly met Carlos at the SQLSaturday in Portland earlier this year, and he emailed me early on expressing interest in the Speaker Wrangler position.  Carlos has a lot of speaking experience…he spoke at the Summit and SQLRally last year and has presented at numerous SQLSaturdays.  Why is this so great?  He knows a lot of speakers!  He’s an independent BI consultant out in Colorado so I know he has no problem connecting with people.  The lineup for the first quarter is almost finished; we will have details on the site soon!

LiveMeeting Hosts: Wil Sisney ( b | t ) and Neeraj Jandwani ( b | t )

Wil is someone I’ve wanted to get to know for months.  As a Star Wars fan, how can I not like a guy with HanSQL for his Twitter handle?  I do wonder what he really looks like…  Anyway, Wil is SQL Server DBA who works for a large health insurance company, and has spent hours (as in almost 600 hours in 2011 alone) studying SQL Server.  I am quite certain that the time he’s spent attending sessions and listening to webinars will translate into an engaging host for the LiveMeetings.

Neeraj lives in San Francisco, CA and works as a Senior SQL Server DBA for an advertising community.  He is already on Twitter and looking to get more involved in the Community, and the VC is a great place to start!  He’s going to be Wil’s backup for hosting LiveMeetings because as we all know, redundancy is a good thing.

Web Master: Paul Hiles  ( t ) and Yulia Fuller ( t )

I first met Paul many years ago, as he used to work for one of our customers as their DBA (and supported SQL Server 2000 for them).  I was sad when Paul moved on to a different company, as he had made my job pretty easy.  Paul and I reconnected when I started attending User Group meetings here in Cleveland, and I look forward to seeing what he and Yulia can do with the website.

Yulia was recommended to me by one of her colleagues before I even wrote the blog post asking for volunteers, and when I emailed her she quickly replied and said she was definitely interested in volunteering for the VC.  Yulia is a DBA for a hospital in Connecticut, and has regularly attended the Summit so I’m hoping that I get to meet her in person this year.

Marketing Genius: Ryan Adams ( b | t ), assisted by Phil Franz ( t )

Ryan is a busy guy.  He is on the Board of Directors for the North Texas SQL Server User Group, a regional mentor and is helping with next spring’s SQLRally.  Oh yes, and he has a regular full time job and blogs.  Ryan originally inquired about being the Web Master, but I admit, I really wanted to see him leverage his experience and community access in the marketing position. Lucky for me, he agreed!  Ryan does have some help from Phil Franz.

Phil was the first person to respond to my post, and he is new to the community and is really looking forward to getting more involved.  Phil is coming off a stint in the Armed Services (thank you greatly, Phil, for your service to our country) and finishing up his college degree so I’m expecting emails at crazy hours and someone with a lot of energy.  Ah, to be young again :)

What’s Next

I am really looking forward to this year, as I think this team will do great things in the Virtual Chapter space.  We decided to change our regular LiveMeeting time to the fourth Thursday of the month at 2 PM EST (so block off that time on January 27th now!) and our January speaker will be the first person we had speak when the Performance VC was revived in 2010: Jason Strate ( b | t ).  I love that we’re kicking off the year with an “alum”, so to speak, and I remember being so nervous for that first LiveMeeting.  Thank you, Jason, for your patience then and for presenting for us again!  We will again have a sponsor this year, and it continues to be Confio.  Thank you Confio!  Did you know that after every LiveMeeting we have a drawing of attendees, and give away a gift card, courtesy of Confio?  We’ll do a better job of highlighting it this year.

We will have registration details on our site soon, and expect changes in the site in the coming months.  You can follow us on Twitter (@SQLPASS_PVC) and monthly LiveMeetings will also be listed on the PASS Events page.  Finally, if you are interested in presenting for our Virtual Chapter, feel free to email me (erin dot stellato at sqlpass dot org) or contact Carlos via Twitter or his site.  We look forward to seeing you at our next LiveMeeting!