Merge Against SSIS

Yesterday I was lucky enough to get to do a presentation for Pragmatic Works on using the Merge statement and comparing it to SSIS and the performance differences between them. Some highlights of the session included:

  • Same server vs different server comparisons
  • SSIS ole db command and it’s limitations
  • Type 1 vs Type 2 changes
  • Merge statement living on the same server

Below are the answers to some of the questions I received, followed by the sample code. I hope everyone finds it useful!

 

What are the common types of dimension changes and what are the differences between them?

    1. While this probably qualifies as it’s own blog post (note to self :)), the major differences are outlined below:
      1. Type 1 attributes are attributes (columns) within your table that do NOT track history. For example, if the color of my product was orange yesterday but today it is green I simply overwrite the orange value in the table and move on.
      2. Type 2 attributes are attributes (columns) within your table that DO track history. Using the same example, if my color was orange yesterday but changed when the package ran last night I would expire the orange record(showing that it is no longer current), and then inserting a new row with the color of green with a new start date and a current row.

 

When the source and destination tables are on the same server, the merge statement is amazingly fast. In our sample file set we were running yesterday, we were running 2.2 million updates in 2-3 minutes. If you’re in a small shop, this may be the perfect solution for you in order to house both your data warehouse and your transactional database(s) on the same server. However, if you’re in a larger shop, we’ve already stated the biggest drawback to this: the source and destination have to be on the same server. Sometimes that’s just not possible, either physically or logically. In this case, what do we do? SSIS is absolutely great at moving data across networks from server A to server B, but the drawback with SSIS is how to get all of those updates in within a short amount of time. The OLE DB command in SSIS is RBAR (row by agonizing row), and will definitely not perform at scale. To accomplish tons of work at scale, we can still have SSIS do the heavy lifting across servers but have the merge statement do the actual updates server side to get the updates done quickly. For this example, I used the National Features File available for free download at http://geonames.usgs.gov/domestic/download_data.htm. Download the file, and unzip/extract it. To demonstrate, first create your dimension table:

 

CREATE TABLE [MergePres].[NationalFile](
    [FeatureID] [int] NOT NULL,
    [FeatureName] [varchar](120) NULL,
    [FeatureClass] [varchar](50) NULL,
    [StateCode] [char](2) NULL,
    [StateNumeric] [char](2) NULL,
    [CountyName] [varchar](100) NULL,
    [CountyNumeric] [char](3) NULL,
    [PrimaryLatitudeDMS] [varchar](7) NULL,
    [PrimaryLongitudeDMS] [varchar](8) NULL,
    [PrimaryLatitudeDEC] [numeric](11, 7) NULL,
    [PrimaryLongitudeDEDC] [numeric](12, 7) NULL,
    [SourceLatitudeDEC] [numeric](11, 7) NULL,
    [SourceLongitudeDEC] [numeric](12, 8) NULL,
    [ElevationMeters] [int] NULL,
    [ElevationFeet] [int] NULL,
    [MapName] [varchar](100) NULL,
    [DateCreated] [datetime] NULL,
    [DateEdited] [datetime] NULL,
    [Country] [varchar](50) NULL,
    [IsCurrent] [char](1) NULL,
    [BeginEffectiveDate] [datetime] NULL,
    [EndEffectiveDate] [datetime] NULL
) ON [PRIMARY]

GO

 

Notice that I added a Country column into the table to show as our example. Create a staging table, but for country we are going to change it so our updates will take affect:

 

CREATE TABLE [MergePres].[NationalFileMergeWithSSISUpdates](
    [FeatureID] [int] NOT NULL,
    [FeatureName] [varchar](120) NULL,
    [FeatureClass] [varchar](50) NULL,
    [StateCode] [char](2) NULL,
    [StateNumeric] [char](2) NULL,
    [CountyName] [varchar](100) NULL,
    [CountyNumeric] [char](3) NULL,
    [PrimaryLatitudeDMS] [varchar](7) NULL,
    [PrimaryLongitudeDMS] [varchar](8) NULL,
    [PrimaryLatitudeDEC] [numeric](11, 7) NULL,
    [PrimaryLongitudeDEDC] [numeric](12, 7) NULL,
    [SourceLatitudeDEC] [numeric](11, 7) NULL,
    [SourceLongitudeDEC] [numeric](12, 8) NULL,
    [ElevationMeters] [int] NULL,
    [ElevationFeet] [int] NULL,
    [MapName] [varchar](100) NULL,
    [DateCreated] [datetime] NULL,
    [DateEdited] [datetime] NULL,
    [Country] [varchar](50) NULL,
    [IsCurrent] [char](1) NULL,
    [BeginEffectiveDate] [datetime] NULL,
    [EndEffectiveDate] [datetime] NULL
) ON [PRIMARY]

GO

INSERT INTO [MergePres].[NationalFileMergeWithSSISUpdates]
([FeatureID]
      ,[FeatureName]
      ,[FeatureClass]
      ,[StateCode]
      ,[StateNumeric]
      ,[CountyName]
      ,[CountyNumeric]
      ,[PrimaryLatitudeDMS]
      ,[PrimaryLongitudeDMS]
      ,[PrimaryLatitudeDEC]
      ,[PrimaryLongitudeDEDC]
      ,[SourceLatitudeDEC]
      ,[SourceLongitudeDEC]
      ,[ElevationMeters]
      ,[ElevationFeet]
      ,[MapName]
      ,[DateCreated]
      ,[DateEdited]
      ,[IsCurrent]
      ,[BeginEffectiveDate]
      ,[Country]
      ,[EndEffectiveDate])
SELECT [FeatureID]
      ,[FeatureName]
      ,[FeatureClass]
      ,[StateCode]
      ,[StateNumeric]
      ,[CountyName]
      ,[CountyNumeric]
      ,[PrimaryLatitudeDMS]
      ,[PrimaryLongitudeDMS]
      ,[PrimaryLatitudeDEC]
      ,[PrimaryLongitudeDEDC]
      ,[SourceLatitudeDEC]
      ,[SourceLongitudeDEC]
      ,[ElevationMeters]
      ,[ElevationFeet]
      ,[MapName]
      ,[DateCreated]
      ,[DateEdited]
      ,[IsCurrent]
      ,[BeginEffectiveDate]
      ,’United States of America’ AS [Country]
      ,[EndEffectiveDate]
  FROM [MergePres].[NationalFile]

 

 

Next up we need to create our stored procedure for the merge statement:

 

CREATE PROCEDURE [MergePres].[MergeAgainstSSISType2AllSSISHybrid]
AS

INSERT INTO MergePres.NationalFile
    ([FeatureID]
              ,[FeatureName]
              ,[FeatureClass]
              ,[StateCode]
              ,[StateNumeric]
              ,[CountyName]
              ,[CountyNumeric]
              ,[PrimaryLatitudeDMS]
              ,[PrimaryLongitudeDMS]
              ,[PrimaryLatitudeDEC]
              ,[PrimaryLongitudeDEDC]
              ,[SourceLatitudeDEC]
              ,[SourceLongitudeDEC]
              ,[ElevationMeters]
              ,[ElevationFeet]
              ,[MapName]
              ,[DateCreated]
              ,[DateEdited]
              ,[IsCurrent]
              ,[BeginEffectiveDate]
              ,[Country])
SELECT
    [FeatureID]
              ,[FeatureName]
              ,[FeatureClass]
              ,[StateCode]
              ,[StateNumeric]
              ,[CountyName]
              ,[CountyNumeric]
              ,[PrimaryLatitudeDMS]
              ,[PrimaryLongitudeDMS]
              ,[PrimaryLatitudeDEC]
              ,[PrimaryLongitudeDEDC]
              ,[SourceLatitudeDEC]
              ,[SourceLongitudeDEC]
              ,[ElevationMeters]
              ,[ElevationFeet]
              ,[MapName]
              ,[DateCreated]
              ,[DateEdited]
              ,’Y’
              ,GETUTCDATE()
              ,[Country]
FROM
(
MERGE [MergePres].[NationalFile] AS Dest
USING [MergePres].[NationalFileMergeWithSSISUpdates] AS Src
    ON Src.FeatureID=Dest.FeatureID
WHEN NOT MATCHED THEN    –if no matching record found, insert it into the table
        INSERT ([FeatureID]
              ,[FeatureName]
              ,[FeatureClass]
              ,[StateCode]
              ,[StateNumeric]
              ,[CountyName]
              ,[CountyNumeric]
              ,[PrimaryLatitudeDMS]
              ,[PrimaryLongitudeDMS]
              ,[PrimaryLatitudeDEC]
              ,[PrimaryLongitudeDEDC]
              ,[SourceLatitudeDEC]
              ,[SourceLongitudeDEC]
              ,[ElevationMeters]
              ,[ElevationFeet]
              ,[MapName]
              ,[DateCreated]
              ,[DateEdited]
              ,[IsCurrent]
              ,[BeginEffectiveDate]
              ,[Country])
        VALUES (Src.[FeatureID]
              ,Src.[FeatureName]
              ,Src.[FeatureClass]
              ,Src.[StateCode]
              ,Src.[StateNumeric]
              ,Src.[CountyName]
              ,Src.[CountyNumeric]
              ,Src.[PrimaryLatitudeDMS]
              ,Src.[PrimaryLongitudeDMS]
              ,Src.[PrimaryLatitudeDEC]
              ,Src.[PrimaryLongitudeDEDC]
              ,Src.[SourceLatitudeDEC]
              ,Src.[SourceLongitudeDEC]
              ,Src.[ElevationMeters]
              ,Src.[ElevationFeet]
              ,Src.[MapName]
              ,Src.[DateCreated]
              ,Src.[DateEdited]
              ,’Y’
              ,GETUTCDATE()
              ,Src.[Country])
WHEN MATCHED
AND (
      ISNULL(Dest.[FeatureID], ”) <> ISNULL(Src.[FeatureID], ”) OR
      ISNULL(Dest.[FeatureName], ”) <> ISNULL(Src.[FeatureName], ”) OR
      ISNULL(Dest.[FeatureClass], ”) <> ISNULL(Src.[FeatureClass], ”) OR
      ISNULL(Dest.[StateCode], ”) <> ISNULL(Src.[StateCode], ”) OR
      ISNULL(Dest.[StateNumeric], ”) <> ISNULL(Src.[StateNumeric], ”) OR
      ISNULL(Dest.[CountyName], ”) <> ISNULL(Src.[CountyName], ”) OR
      ISNULL(Dest.[CountyNumeric], ”) <> ISNULL(Src.[CountyNumeric], ”) OR
      ISNULL(Dest.[PrimaryLatitudeDMS], ”) <> ISNULL(Src.[PrimaryLatitudeDMS], ”) OR
      ISNULL(Dest.[PrimaryLongitudeDMS], ”) <> ISNULL(Src.[PrimaryLongitudeDMS], ”) OR
      ISNULL(Dest.[PrimaryLatitudeDEC], ”) <> ISNULL(Src.[PrimaryLatitudeDEC], ”) OR
      ISNULL(Dest.[PrimaryLongitudeDEDC], ”) <> ISNULL(Src.[PrimaryLongitudeDEDC], ”) OR
      ISNULL(Dest.[ElevationMeters], ”) <> ISNULL(Src.[ElevationMeters], ”) OR
      ISNULL(Dest.[ElevationFeet], ”) <> ISNULL(Src.[ElevationFeet], ”) OR
      ISNULL(Dest.[MapName], ”) <> ISNULL(Src.[MapName], ”) OR
      ISNULL(Dest.[DateCreated], ”) <> ISNULL(Src.[DateCreated], ”) OR
      ISNULL(Dest.[DateEdited], ”) <> ISNULL(Src.[DateEdited], ”) OR
      ISNULL(Dest.[Country], ”) <> ISNULL(Src.[Country],”)
     )
THEN UPDATE
    SET
        Dest.IsCurrent=’N’,
        Dest.EndEffectiveDate=GETUTCDATE()
OUTPUT Src.[FeatureID]
      ,Src.[FeatureName]
      ,Src.[FeatureClass]
      ,Src.[StateCode]
      ,Src.[StateNumeric]
      ,Src.[CountyName]
      ,Src.[CountyNumeric]
      ,Src.[PrimaryLatitudeDMS]
      ,Src.[PrimaryLongitudeDMS]
      ,Src.[PrimaryLatitudeDEC]
      ,Src.[PrimaryLongitudeDEDC]
      ,Src.[SourceLatitudeDEC]
      ,Src.[SourceLongitudeDEC]
      ,Src.[ElevationMeters]
      ,Src.[ElevationFeet]
      ,Src.[MapName]
      ,Src.[DateCreated]
      ,Src.[DateEdited]
      ,Src.[Country]
      ,$Action AS [MergeAction]
) AS Mrg
;

 

Let’s examine the different pieces of the statement real quick. The first portion (from the initial SELECT statement down to the FROM is where we’ll do our insert. We use the merge statement as a subquery inside of the from statement so the only records that will be inserted are those that are identified in the merge statement.

 

The second part from the beginning of the Merge keyword to the end of the USING clause are where we’re identifying the business key between the source and destination so we can do our lookup.

 

The third part of the query is doing an insert if the record previously didn’t exist. Since we’re having SSIS do the inserts if no match is found, this shouldn’t really ever need to write it, but it’s in there just in case.

 

The fourth part of the query (the WHEN MATCHED part of the query onwards) is identifying if the business keys do match, what’s different? Anything? If it is different it will expire the current row and set the IsCurrent flag to N and the End Effective date into the table.

 

In my next post we’ll cover the SSIS portion of this package.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s