Merge Against SSIS Part Two

In my last post I went over the merge statement and some good uses for it when that’s all you want, but an extremely effective way to use it with SSIS is to have SSIS do all of your transformations, and the merge statement do the actual updates of the records, so SSIS isn’t slowed down by the OLE DB Command or other type of blocking transformation. In order to implement this, you’ll need to set up a staging table on the same server as your data warehouse to insert your changing records too where the merge statement can access them. It’s up to you if you want how you want to classify your staging tables (different name, different schema, or different db), but the key is that it just needs to be on the same server. For this example, I just appended staging to the name, although in actual implementation my preference is to have a different schema. But hey, that’s just me. 🙂 The easiest way to create it is to just script out your actual dimension table, but instead of your surrogate key column being an identity just make it a primary key. This is going to be the key that the staging table the dimension table join on. Leave it as an actual primary key, in order to prevent any duplicates. As an example, my staging script is below:

 

CREATE TABLE [MergePres].[NationalFileStagingUpdates](
    [FeatureID] [int] NOT NULL,
    [FeatureName] [varchar](120) NULL,
    [FeatureClass] [varchar](50) NULL,
    [StateCode] [char](2) NULL,
    [StateNumeric] [char](2) NULL,
    [CountyName] [varchar](100) NULL,
    [CountyNumeric] [char](3) NULL,
    [PrimaryLatitudeDMS] [varchar](7) NULL,
    [PrimaryLongitudeDMS] [varchar](8) NULL,
    [PrimaryLatitudeDEC] [numeric](11, 7) NULL,
    [PrimaryLongitudeDEDC] [numeric](12, 7) NULL,
    [SourceLatitudeDEC] [numeric](11, 7) NULL,
    [SourceLongitudeDEC] [numeric](12, 8) NULL,
    [ElevationMeters] [int] NULL,
    [ElevationFeet] [int] NULL,
    [MapName] [varchar](100) NULL,
    [DateCreated] [datetime] NULL,
    [DateEdited] [datetime] NULL,
    [Country] [varchar](25) NULL,
CONSTRAINT [PK_NationalFileStagingUpdates] PRIMARY KEY CLUSTERED
(
    [FeatureID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

 

Once this is created, we’re now ready to bring in SSIS. Create a new package and on the control flow tab, drag over an execute SQL task and point the connection to your database that contains your staging table. In the SQL Statement, truncate your existing Staging table: TRUNCATE TABLE [MergePres].[NationalFileMergeWithSSISUpdates]. The reason why we’re truncating staging here is to make sure we’re only keeping the most current changes to the dimension. Underneath it drag over a data flow task, and then make sure you drag in your precedence constraint from the execute SQL to the data flow. Should look similar to the below:

image

 

After you have your data flow task, open it  and create your dimension like you normally would, the only exception is at the bottom left (in my data flow), where you would normally drag over your ole db command and destination to expire the old record and insert the new. Just drag over your destination component, and point the destination at your staging table that you created earlier. In my simple package below, you can see there isn’t much actual transformation happening. Just reading from the source, adding an IsCurrent and BeginEffectiveDate to the data flow, looking up to the existing dimension, and then identifying new records and changes.

image

 

Now you have your changing records writing to a brand new table instead of trying to update your actual dimension. So the last step is to go back to your control flow, and add another execute SQL task to call the Merge statement that we created in our last blog post. Drag over the execute SQL task, create the connection to where your Merge statement stored procedure lives, and then write the execute statement:

 

EXEC MergePres.MergeAgainstSSISType2AllSSISHybrid

 

Your final control flow should look like:

image

 

And that’s it. It’s only a couple more steps than you would already have to do to create a dimension but now you have a highly scalable package that can handle thousands of updates a day without worrying about performance.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s