Tuesday, August 21, 2012

Comparing Two Tables to Determine Differences

I recently needed to find the difference between the same table in two points of time in order to reconstruct data lost during a deployment where a database restore went wrong. I searched for tips on how to do this best and came across a few different posts and then used the techniques to compare for accuracy and speed.

Technique 1: Min w/ Union

Lee Everest proposed this method in his blog, Find Differences Between Two Tables in TSQL.
SELECT MIN(id) AS id, col1, col2, col3 FROM (SELECT id, col1, col2, col3 FROM tbl1 UNION ALL SELECT id, col1, col2, col3 FROM tbl2) X GROUP BY col1, col2, col3 HAVING COUNT(1)=1
Due to differences on my db instance, I only had one row different out of 50,000,000 instead of the 3 he had. It ran in 30 seconds.

Technique 2: Except

I then came across this version on MSSQLTips.com, Compare SQL Server Datasets with INTERSECT and EXCEPT which also correctly found the differences but took 1:37.

Technique 3: Use the SQL Server Data Comparison Tool

Built into Visual Studio is a menu called Data.  This has a comparison tool that compares selected data columns from two tables or views.  I found it slow and buggy but it spared me having to write a ton of comparison to get an overview of the differences in the data.  Sadly once it found the differences, to investigate them, I wrote a lot of SQL still.


Technique 1 is much faster. Might need more data to get conclusive results. Would be interesting to compare to CDC in terms of both performance and accuracy.