Sorting Out the Error “Failed to perform data correlation”

While working with PerfMon data recently I ran into an interesting error when I tried to view it overlayed against a Trace file.  If you’re not familiar with that process, I recommend Brad McGehee’s article Correlating SQL Server Profiler with Performance Monitor.  I had opened a Trace file and selected a PerfMon file to import.  I selected about five counters and soon after the import started, then I got the error message:

Failed to perform data correlation

Profiler Error

And the PerfMon data looked like this in Profiler:

Truncated PerfMon data in Profiler

Truncated PerfMon data in Profiler (click to see larger)

I hadn’t seen that message before and the output was then somewhat useless since I couldn’t compare the PerfMon data with Trace in that format after 6:30 AM.  I took another set of PerfMon and Trace files and tried to the import again.  Same message.  I decided to dig in and see if I could figure it out.

I loaded a third set of PerfMon and Trace, same thing.  I checked to see if the data cut off at the same time in each trace and it didn’t.

Then I tried to load just one counter at a time.  I started with %Processor Time_Total, I have no idea why I picked that counter first, but when I went to click OK, it was grayed out:

Nope, cannot select "OK"

Nope, cannot select "OK"

Interesting.  I selected another counter and the OK button became available so I selected OK to import the data and got the error message again.  I started over.  After I selected the %Processor Time_Total counter, I selected another counter, then unselected it.  The OK button was now available:

OK is now available

OK is now available

I imported the data, and I got the error again.  Then I zoomed in on the PerfMon graph to see the timestamp for when it stopped:

Zoomed in to find the timestamp

Zoomed in to find the timestamp (click to see larger)

At that point, I knew the issue was with data somewhere between 6:40:05 and 6:40:20.  Next I read the data directly from the PerfMon trace, which I had loaded into a database table using relog.  I found this:

PerfMon data up close

PerfMon data up close (click to see larger)

There were two samples taken at 6:40:10, three milliseconds apart.  I have to conclude that for some reason, that causes the import to fail, because when I checked the other files, I found the same thing: multiple entries for the same second, taken milliseconds apart.

There is a fix for this.  I used the relog utility to split out the original file into two files, excluding the data sampled at 6:40:10, and then merged the files back into one.

 

separate the file:

relog PerfMon_20120309.blg -b 03/09/2012 03:35:00 -e 03/09/2012 06:40:10 -o PartI.blg

relog PerfMon_20120309.blg -b 03/09/2012 06:40:15 -e 03/09/2012 18:35:00 -o Part2.blg

merge files:

relog Part1.blg Part2.blg -o PerfMon_20120309_Fixed.blg

 

Not elegant, but possible.  Once I did that, I could import all the data for % Processor Time_Total:

Everything except 6:40:10 in the graph

Everything except 6:40:10 in the graph (click to see larger)

Even though I have a way to fix this problem, this is not a solution.  The solution is to not have bad data to begin with.  If I were consistently working with this customer’s data, I would dig in and try to figure out what was causing the bad sampling.  It’s not likely that I will have the opportunity to do that, but should I run into this problem again, at least I have a quick fix, and maybe I will have the time then to figure out what’s really going on.

There are no comments yet. Be the first and leave a response!

Leave a Reply

Wanting to leave an <em>phasis on your comment?

Trackback URL http://erinstellato.com/2012/03/failed-perfom-data-correlation/trackback/