SSIS: Dynamically setting the Default Buffer Max Rows property

I’ve blogged in the past about the Default Buffer Max Rows property in SSIS, and how powerful it can be. As a recap, there are two properties in SSIS that control how much data your ETL package will hold in any given buffer: DefaultBufferMaxRows and DefaultBufferSize. When performance tuning SSIS packages, we can modify either one to adjust how much data SSIS carries through a buffer at once. By default, this is 10,000 rows or 10 MB. One thing I didn’t like about my “CalculateDefaultBufferSize” spreadsheet is that it assumed you were going to change your DefaultBufferSize to 100 MB. That’s 100 MB per component running in your data flow, which can become extremely costly. If you leave it at the default, however, you’re still using 10 MB per buffer no matter what. If we’re importing a really skinny data set (such as a fact table or a hub), it’s very possible we’re leaving a lot of empty space in each buffer by stopping every 10,000 rows. Conversely, if we have a really wide data set where only 6,000 or 7,000 rows fit in a buffer, we’re making extra hops to move the same data. It’s always a balancing act. I recently decided to see if I could automate setting this property, to maximize the efficiency of my SSIS packages without having to manually calculate it every time.

To start, we need to find our destination table and identify the maximum row length in that table. Fortunately, this can be obtained with a simple query (using DimCurrency in the AdventureWorksDW2012 database):

SELECT
    SUM(max_length) AS [row_length]
FROM sys.tables t
JOIN sys.columns c
    ON t.object_id = c.object_id
JOIN sys.schemas s
    ON t.schema_id = s.schema_id
WHERE t.name = 'DimCurrency'
    AND s.name = 'dbo'

You could easily turn this into a procedure that takes the table name and schema as parameters and returns the value (a sketch of that appears below); for this post, I’m going to leave it as a query. Once we have this, it becomes a simple exercise in math. The query above returns a max row length of 118 (in bytes), so we take this value and divide it into our default buffer size, which is conveniently stored in SSIS in bytes as well: 10485760. So our calculation becomes:

10485760/118

giving us a result of 88,862 rows. We’ll round down to the nearest 100, making 88,800 our default buffer max rows property. Now, how do we make this dynamic? To start, we open up our SSIS package and add an Execute SQL task to the control flow. A good way to improve on this would be to run it as a pre-execute SQL task and write the value out to an information table; then you can log it and do some reporting around what your buffers are. As an added bonus of this dynamic structure: as your data scales, your ETL package will too!
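As an aside, here’s what a minimal sketch of that procedure could look like (the procedure name is my own placeholder; the logic is the same query from above):

CREATE PROCEDURE dbo.GetMaxRowLength -- hypothetical name
    @table_name  sysname,
    @schema_name sysname
AS
BEGIN
    SET NOCOUNT ON;

    -- same logic as the ad hoc query: total the max byte length of every column
    SELECT
        SUM(c.max_length) AS [row_length]
    FROM sys.tables t
    JOIN sys.columns c
        ON t.object_id = c.object_id
    JOIN sys.schemas s
        ON t.schema_id = s.schema_id
    WHERE t.name = @table_name
        AND s.name = @schema_name;
END
GO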

[image: Execute SQL Task added to the control flow]

Next we want to create three variables: str_table_name, str_schema_name, and int_row_size. str_table_name and str_schema_name should be Strings, and int_row_size should be an Int32. For the int_row_size variable, set the Value to 1; this avoids an error in our expression later. These variables let us parameterize the inputs for our destination table and give us a placeholder for the incoming result. If your package names are the same as your destination tables, then instead of assigning the table name manually you could configure it from an expression, using the @[System::PackageName] system variable:

[image: package variables]

Let’s open up our Execute SQL Task and configure it. The connection will point to our destination database, and the SQL query will be the one we outlined above, substituting parameters for the table and schema names:
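With an OLE DB connection, the parameterized version of the query uses ? markers, which map by ordinal to the variables we just created:

SELECT
    SUM(max_length) AS [row_length]
FROM sys.tables t
JOIN sys.columns c
    ON t.object_id = c.object_id
JOIN sys.schemas s
    ON t.schema_id = s.schema_id
WHERE t.name = ?  -- Parameter Name 0, mapped to User::str_table_name
    AND s.name = ?  -- Parameter Name 1, mapped to User::str_schema_name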

[image: Execute SQL Task configuration]

Do the parameter mapping on the second tab to map the table and schema parameters to the variables you created above, and set the result set to Single row. The single-row result set should map to your int_row_size variable. Now for the fun part: create a data flow task, right-click on it, and go to Properties.

[image: Data Flow Task properties]

Under Expressions, select DefaultBufferMaxRows and click the expression ellipsis. For the expression, use the following logic:

@[User::int_row_size] == 1 ? 10000 : 10485760 / @[User::int_row_size]

The initial test of the value ensures that if the SQL fails to get the proper row size, your buffers remain at the default. And that’s it! Now you can dynamically set your default buffer max rows for SSIS!
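And if you take the logging suggestion from earlier, a small information table is all it needs. This is just a sketch; the table and column names are my own placeholders:

CREATE TABLE dbo.BufferSizeLog ( -- hypothetical logging table
    LogID         int IDENTITY(1,1) PRIMARY KEY,
    PackageName   varchar(255) NOT NULL,
    RowLength     int NOT NULL,
    -- computed here for convenience when reporting on buffer settings
    BufferMaxRows AS (10485760 / NULLIF(RowLength, 0)),
    LoggedAt      datetime NOT NULL DEFAULT GETUTCDATE()
);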

Now that’s some awesome-sauce.

Chris

SSIS: Removing Duplicates in a data flow

Recently I came across a post online that provided a solution to duplicates in a data flow. As I was reading it, there was something in it that made me cringe: a Sort transform. Those of you who know me know that the Sort transform in a data flow gives me nightmares, almost as much as the Slowly Changing Dimension wizard on medium and large dimensions. I kept thinking there had to be a better way than sorting the data inside the data flow with that component. You could write a script to do it, but my personal opinion is to do as much in the data and control flow as possible without scripting, as I feel script components make SSIS much harder to maintain. I came up with an alternate solution, and I’d like to use the rest of this blog post to present my findings.

For my sample tests here, I am going to use a sample customer file that I’ve created:

[image: sample customer file]

I’m also going to make an assumption that we don’t have DQS available in this environment. If you want more information on how DQS can help with this problem, see my earlier posts here.

The first option we have is the Fuzzy Grouping component. Let’s start by creating a simple SSIS package to pick up the customer csv file. I’ve created a package, the csv connection manager, and a flat file source in my data flow. We can tell by looking at the file that Bob and Robert are 99% duplicates of each other. SSN could work just as well here, or anything else, as long as we know it really identifies who a customer is and what makes them distinct (or product, or account, or facility, etc.). Moving on! Let’s drag over a Fuzzy Grouping component from the “Other Transforms” section of the toolbox, connect the data path to it, and open it up.

[image: data flow with Fuzzy Grouping]

Configure the OLE DB connection manager, and then go to the Columns tab. Here we tell the component the same key we identified in the last step: what can it use to group the duplicates together and resolve the distinct (in this case) customers? In our sample set, that’s the telephone number. Check the box next to it, and a table will appear. Under the minimum similarity column, input a value that tells the component how similar two values must be to count as a match. The right threshold depends on your environment and your specific data point. For phone number, we’re going to say that a 99% match is probably the same customer or household; groupings of products, for example, might justify lowering this to somewhere in the 75%-80% range. My completed column looks like the below:

[image: completed Columns tab with minimum similarity]

Let’s look at what we have so far. Drag over a Multicast, and enable the data viewer on the path into it so we can watch the rows pass through the pipeline. Execute the package, and our data viewer appears:

[image: data viewer output]

I included the data flow on the left of this screenshot, so you can also see what the package looks like. We can see that the _key_in and _key_out fields have been added. Let’s take a Conditional Split component and configure it with two outputs: one for the unique records and one for the duplicate records:
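The split keys off the _key_in and _key_out columns that Fuzzy Grouping adds: when they’re equal, the row is the canonical record for its group. A sketch of the two conditions (the output names are my own):

Unique     : [_key_in] == [_key_out]
Duplicates : [_key_in] != [_key_out]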

[image: Conditional Split configuration]

For the duplicate records, we don’t want to just delete them out of the data flow; we want to keep track of them somewhere so we can see what’s being removed. Let’s drag over a destination component for each of our good and bad records, and configure them. Our finished data flow looks like the below:

[image: finished data flow]
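The destination for the duplicate records can be a simple audit table. Here’s a minimal sketch, with column names assumed from my sample file plus the Fuzzy Grouping outputs:

CREATE TABLE dbo.CustomerDuplicates ( -- hypothetical audit table
    CustomerName varchar(100) NULL,
    Telephone    varchar(20)  NULL,
    _key_in      int  NULL,  -- the row's own group key from Fuzzy Grouping
    _key_out     int  NULL,  -- the key of the canonical row it was matched to
    _score       real NULL   -- the similarity score
);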

Run the package, and let’s check our tables:

[image: query results]

[image: query results]

[image: query results]

We can see that the bad records moved into the appropriate table for auditing, and our new clean records are now available for us to use!

SSIS Data Types

For all of its life, there have always been a lot of questions circulating about data types in SSIS. What does an int in SQL map to in SSIS? What about a varchar? nvarchar? The questions go on and on. But finally, we have an answer. In the all-new SSISDB in 2012, there is a view built specifically to answer all of these questions. It’s called data_type_mapping, and it’s backed by an actual table in the internal schema. It’s a pretty small table, but its usefulness is far-reaching. I’ve pasted the actual contents of the table here, for anyone who might not have SSIS 2012 or might not have the SSISDB set up.

mapping_id ssis_data_type sql_data_type
9 Boolean bigint
8 Boolean bit
10 Boolean int
11 Boolean smallint
12 Boolean tinyint
1 Byte bigint
13 Byte bit
25 Byte int
37 Byte smallint
44 Byte tinyint
22 DateTime datetime
21 DateTime datetime2
36 DateTime smalldatetime
55 Decimal decimal
58 Decimal float
57 Decimal numeric
56 Decimal real
23 Double decimal
24 Double float
33 Double numeric
35 Double real
2 Int16 bigint
14 Int16 bit
26 Int16 int
38 Int16 smallint
45 Int16 tinyint
3 Int32 bigint
15 Int32 bit
27 Int32 int
39 Int32 smallint
46 Int32 tinyint
4 Int64 bigint
16 Int64 bit
28 Int64 int
40 Int64 smallint
47 Int64 tinyint
5 SByte bigint
17 SByte bit
29 SByte int
41 SByte smallint
48 SByte tinyint
51 Single decimal
54 Single float
53 Single numeric
52 Single real
20 String char
32 String nchar
34 String nvarchar
6 UInt32 bigint
18 UInt32 bit
30 UInt32 int
42 UInt32 smallint
49 UInt32 tinyint
7 UInt64 bigint
19 UInt64 bit
31 UInt64 int
43 UInt64 smallint
50 UInt64 tinyint
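If you do have the 2012 catalog available, you can query the mapping directly instead of referring to the pasted list (assuming it’s deployed under the internal schema as described above):

SELECT ssis_data_type, sql_data_type
FROM SSISDB.internal.data_type_mapping
WHERE sql_data_type = 'datetime' -- e.g., which SSIS types map to datetime
ORDER BY ssis_data_type;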


Happy Developing!

Merge Against SSIS Part Two

In my last post I went over the merge statement and some good uses for it on its own, but an extremely effective way to use it with SSIS is to have SSIS do all of your transformations and the merge statement do the actual updates of the records, so SSIS isn’t slowed down by the OLE DB Command or other slow, row-by-row operations. To implement this, you’ll need to set up a staging table on the same server as your data warehouse to insert your changing records into, where the merge statement can access them. It’s up to you how you want to classify your staging tables (different name, different schema, or different db); the key is that they need to be on the same server. For this example, I just appended staging to the name, although in actual implementations my preference is to use a different schema. But hey, that’s just me. 🙂 The easiest way to create it is to script out your actual dimension table, but instead of making your surrogate key column an identity, just make it a primary key. This is the key that the staging table and the dimension table join on. Leave it as an actual primary key, in order to prevent any duplicates. As an example, my staging script is below:


CREATE TABLE [MergePres].[NationalFileStagingUpdates](
    [FeatureID] [int] NOT NULL,
    [FeatureName] [varchar](120) NULL,
    [FeatureClass] [varchar](50) NULL,
    [StateCode] [char](2) NULL,
    [StateNumeric] [char](2) NULL,
    [CountyName] [varchar](100) NULL,
    [CountyNumeric] [char](3) NULL,
    [PrimaryLatitudeDMS] [varchar](7) NULL,
    [PrimaryLongitudeDMS] [varchar](8) NULL,
    [PrimaryLatitudeDEC] [numeric](11, 7) NULL,
    [PrimaryLongitudeDEDC] [numeric](12, 7) NULL,
    [SourceLatitudeDEC] [numeric](11, 7) NULL,
    [SourceLongitudeDEC] [numeric](12, 8) NULL,
    [ElevationMeters] [int] NULL,
    [ElevationFeet] [int] NULL,
    [MapName] [varchar](100) NULL,
    [DateCreated] [datetime] NULL,
    [DateEdited] [datetime] NULL,
    [Country] [varchar](25) NULL,
CONSTRAINT [PK_NationalFileStagingUpdates] PRIMARY KEY CLUSTERED
(
    [FeatureID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO


Once this is created, we’re ready to bring in SSIS. Create a new package, and on the control flow tab drag over an Execute SQL task, pointing its connection at the database that contains your staging table. For the SQL statement, truncate your staging table: TRUNCATE TABLE [MergePres].[NationalFileMergeWithSSISUpdates]. The reason we truncate staging here is to make sure we only keep the most current changes to the dimension. Underneath it drag over a data flow task, and then make sure you connect the precedence constraint from the Execute SQL task to the data flow. It should look similar to the below:

[image: control flow with Execute SQL and Data Flow tasks]


After you have your data flow task, open it and build your dimension load like you normally would. The only exception is at the bottom left (in my data flow), where you would normally drag over your OLE DB Command and destination to expire the old record and insert the new one. Instead, just drag over a destination component and point it at the staging table you created earlier. In my simple package below, you can see there isn’t much actual transformation happening: just reading from the source, adding an IsCurrent and BeginEffectiveDate to the data flow, looking up against the existing dimension, and then identifying new records and changes.

[image: data flow writing changed records to the staging table]
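For reference, the derived column step only needs two expressions, which might look like this (SSIS expression syntax; column names match the dimension):

Derived Column Name      Expression
IsCurrent                "Y"
BeginEffectiveDate       GETUTCDATE()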


Now you have your changing records writing to a brand new table instead of trying to update your actual dimension. The last step is to go back to your control flow and add another Execute SQL task to call the merge statement we created in the last blog post. Drag over the Execute SQL task, create the connection to wherever your merge stored procedure lives, and then write the execute statement:


EXEC MergePres.MergeAgainstSSISType2AllSSISHybrid


Your final control flow should look like:

[image: final control flow]


And that’s it. It’s only a couple more steps than you would already take to create a dimension, but now you have a highly scalable package that can handle thousands of updates a day without you having to worry about performance.

Merge Against SSIS

Yesterday I was lucky enough to do a presentation for Pragmatic Works on using the merge statement, comparing it to SSIS, and looking at the performance differences between the two. Some highlights of the session included:

  • Same server vs different server comparisons
  • The SSIS OLE DB Command and its limitations
  • Type 1 vs Type 2 changes
  • Merge statement living on the same server

Below are the answers to some of the questions I received, followed by the sample code. I hope everyone finds it useful!


What are the common types of dimension changes and what are the differences between them?

    1. While this probably qualifies as its own blog post (note to self :)), the major differences are outlined below:
      1. Type 1 attributes are attributes (columns) within your table that do NOT track history. For example, if the color of my product was orange yesterday but today it is green, I simply overwrite the orange value in the table and move on.
      2. Type 2 attributes are attributes (columns) within your table that DO track history. Using the same example, if my color was orange yesterday but changed when the package ran last night, I would expire the orange record (showing that it is no longer current) and then insert a new row with the color green, a new start date, and the current-row flag set (see the sketch below).
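In SQL terms, the difference looks roughly like this; a sketch against a hypothetical DimProduct table, with illustrative column names:

-- Type 1: overwrite in place, no history kept
UPDATE dbo.DimProduct
SET Color = 'Green'
WHERE ProductAlternateKey = 'BK-1234';

-- Type 2: expire the current row, then insert the new version
UPDATE dbo.DimProduct
SET IsCurrent = 'N',
    EndEffectiveDate = GETUTCDATE()
WHERE ProductAlternateKey = 'BK-1234'
    AND IsCurrent = 'Y';

INSERT INTO dbo.DimProduct (ProductAlternateKey, Color, IsCurrent, BeginEffectiveDate)
VALUES ('BK-1234', 'Green', 'Y', GETUTCDATE());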


When the source and destination tables are on the same server, the merge statement is amazingly fast. With the sample file set we were running yesterday, we pushed 2.2 million updates through in 2-3 minutes. If you’re in a small shop, housing both your data warehouse and your transactional database(s) on the same server may make this the perfect solution for you. If you’re in a larger shop, however, we’ve already stated the biggest drawback: the source and destination have to be on the same server, and sometimes that’s just not possible, either physically or logically. In that case, what do we do? SSIS is absolutely great at moving data across networks from server A to server B, but the drawback with SSIS is how to get all of those updates in within a short amount of time. The OLE DB Command in SSIS is RBAR (row by agonizing row), and will definitely not perform at scale. To accomplish tons of work at scale, we can still have SSIS do the heavy lifting across servers but have the merge statement do the actual updates server side, getting the updates done quickly. For this example, I used the National Features File, available for free download at http://geonames.usgs.gov/domestic/download_data.htm. Download the file and unzip/extract it. To demonstrate, first create your dimension table:


CREATE TABLE [MergePres].[NationalFile](
    [FeatureID] [int] NOT NULL,
    [FeatureName] [varchar](120) NULL,
    [FeatureClass] [varchar](50) NULL,
    [StateCode] [char](2) NULL,
    [StateNumeric] [char](2) NULL,
    [CountyName] [varchar](100) NULL,
    [CountyNumeric] [char](3) NULL,
    [PrimaryLatitudeDMS] [varchar](7) NULL,
    [PrimaryLongitudeDMS] [varchar](8) NULL,
    [PrimaryLatitudeDEC] [numeric](11, 7) NULL,
    [PrimaryLongitudeDEDC] [numeric](12, 7) NULL,
    [SourceLatitudeDEC] [numeric](11, 7) NULL,
    [SourceLongitudeDEC] [numeric](12, 8) NULL,
    [ElevationMeters] [int] NULL,
    [ElevationFeet] [int] NULL,
    [MapName] [varchar](100) NULL,
    [DateCreated] [datetime] NULL,
    [DateEdited] [datetime] NULL,
    [Country] [varchar](50) NULL,
    [IsCurrent] [char](1) NULL,
    [BeginEffectiveDate] [datetime] NULL,
    [EndEffectiveDate] [datetime] NULL
) ON [PRIMARY]

GO


Notice that I added a Country column to the table to serve as our example. Create a staging table and populate it from the dimension, but for Country we’re going to change the value so that our updates will take effect:


CREATE TABLE [MergePres].[NationalFileMergeWithSSISUpdates](
    [FeatureID] [int] NOT NULL,
    [FeatureName] [varchar](120) NULL,
    [FeatureClass] [varchar](50) NULL,
    [StateCode] [char](2) NULL,
    [StateNumeric] [char](2) NULL,
    [CountyName] [varchar](100) NULL,
    [CountyNumeric] [char](3) NULL,
    [PrimaryLatitudeDMS] [varchar](7) NULL,
    [PrimaryLongitudeDMS] [varchar](8) NULL,
    [PrimaryLatitudeDEC] [numeric](11, 7) NULL,
    [PrimaryLongitudeDEDC] [numeric](12, 7) NULL,
    [SourceLatitudeDEC] [numeric](11, 7) NULL,
    [SourceLongitudeDEC] [numeric](12, 8) NULL,
    [ElevationMeters] [int] NULL,
    [ElevationFeet] [int] NULL,
    [MapName] [varchar](100) NULL,
    [DateCreated] [datetime] NULL,
    [DateEdited] [datetime] NULL,
    [Country] [varchar](50) NULL,
    [IsCurrent] [char](1) NULL,
    [BeginEffectiveDate] [datetime] NULL,
    [EndEffectiveDate] [datetime] NULL
) ON [PRIMARY]

GO

INSERT INTO [MergePres].[NationalFileMergeWithSSISUpdates]
([FeatureID]
      ,[FeatureName]
      ,[FeatureClass]
      ,[StateCode]
      ,[StateNumeric]
      ,[CountyName]
      ,[CountyNumeric]
      ,[PrimaryLatitudeDMS]
      ,[PrimaryLongitudeDMS]
      ,[PrimaryLatitudeDEC]
      ,[PrimaryLongitudeDEDC]
      ,[SourceLatitudeDEC]
      ,[SourceLongitudeDEC]
      ,[ElevationMeters]
      ,[ElevationFeet]
      ,[MapName]
      ,[DateCreated]
      ,[DateEdited]
      ,[IsCurrent]
      ,[BeginEffectiveDate]
      ,[Country]
      ,[EndEffectiveDate])
SELECT [FeatureID]
      ,[FeatureName]
      ,[FeatureClass]
      ,[StateCode]
      ,[StateNumeric]
      ,[CountyName]
      ,[CountyNumeric]
      ,[PrimaryLatitudeDMS]
      ,[PrimaryLongitudeDMS]
      ,[PrimaryLatitudeDEC]
      ,[PrimaryLongitudeDEDC]
      ,[SourceLatitudeDEC]
      ,[SourceLongitudeDEC]
      ,[ElevationMeters]
      ,[ElevationFeet]
      ,[MapName]
      ,[DateCreated]
      ,[DateEdited]
      ,[IsCurrent]
      ,[BeginEffectiveDate]
      ,'United States of America' AS [Country]
      ,[EndEffectiveDate]
  FROM [MergePres].[NationalFile]


Next up we need to create our stored procedure for the merge statement:


CREATE PROCEDURE [MergePres].[MergeAgainstSSISType2AllSSISHybrid]
AS

INSERT INTO MergePres.NationalFile
    ([FeatureID]
              ,[FeatureName]
              ,[FeatureClass]
              ,[StateCode]
              ,[StateNumeric]
              ,[CountyName]
              ,[CountyNumeric]
              ,[PrimaryLatitudeDMS]
              ,[PrimaryLongitudeDMS]
              ,[PrimaryLatitudeDEC]
              ,[PrimaryLongitudeDEDC]
              ,[SourceLatitudeDEC]
              ,[SourceLongitudeDEC]
              ,[ElevationMeters]
              ,[ElevationFeet]
              ,[MapName]
              ,[DateCreated]
              ,[DateEdited]
              ,[IsCurrent]
              ,[BeginEffectiveDate]
              ,[Country])
SELECT
    [FeatureID]
              ,[FeatureName]
              ,[FeatureClass]
              ,[StateCode]
              ,[StateNumeric]
              ,[CountyName]
              ,[CountyNumeric]
              ,[PrimaryLatitudeDMS]
              ,[PrimaryLongitudeDMS]
              ,[PrimaryLatitudeDEC]
              ,[PrimaryLongitudeDEDC]
              ,[SourceLatitudeDEC]
              ,[SourceLongitudeDEC]
              ,[ElevationMeters]
              ,[ElevationFeet]
              ,[MapName]
              ,[DateCreated]
              ,[DateEdited]
              ,'Y'
              ,GETUTCDATE()
              ,[Country]
FROM
(
MERGE [MergePres].[NationalFile] AS Dest
USING [MergePres].[NationalFileMergeWithSSISUpdates] AS Src
    ON Src.FeatureID=Dest.FeatureID
WHEN NOT MATCHED THEN    -- if no matching record found, insert it into the table
        INSERT ([FeatureID]
              ,[FeatureName]
              ,[FeatureClass]
              ,[StateCode]
              ,[StateNumeric]
              ,[CountyName]
              ,[CountyNumeric]
              ,[PrimaryLatitudeDMS]
              ,[PrimaryLongitudeDMS]
              ,[PrimaryLatitudeDEC]
              ,[PrimaryLongitudeDEDC]
              ,[SourceLatitudeDEC]
              ,[SourceLongitudeDEC]
              ,[ElevationMeters]
              ,[ElevationFeet]
              ,[MapName]
              ,[DateCreated]
              ,[DateEdited]
              ,[IsCurrent]
              ,[BeginEffectiveDate]
              ,[Country])
        VALUES (Src.[FeatureID]
              ,Src.[FeatureName]
              ,Src.[FeatureClass]
              ,Src.[StateCode]
              ,Src.[StateNumeric]
              ,Src.[CountyName]
              ,Src.[CountyNumeric]
              ,Src.[PrimaryLatitudeDMS]
              ,Src.[PrimaryLongitudeDMS]
              ,Src.[PrimaryLatitudeDEC]
              ,Src.[PrimaryLongitudeDEDC]
              ,Src.[SourceLatitudeDEC]
              ,Src.[SourceLongitudeDEC]
              ,Src.[ElevationMeters]
              ,Src.[ElevationFeet]
              ,Src.[MapName]
              ,Src.[DateCreated]
              ,Src.[DateEdited]
              ,'Y'
              ,GETUTCDATE()
              ,Src.[Country])
WHEN MATCHED
AND (
      ISNULL(Dest.[FeatureID], '') <> ISNULL(Src.[FeatureID], '') OR
      ISNULL(Dest.[FeatureName], '') <> ISNULL(Src.[FeatureName], '') OR
      ISNULL(Dest.[FeatureClass], '') <> ISNULL(Src.[FeatureClass], '') OR
      ISNULL(Dest.[StateCode], '') <> ISNULL(Src.[StateCode], '') OR
      ISNULL(Dest.[StateNumeric], '') <> ISNULL(Src.[StateNumeric], '') OR
      ISNULL(Dest.[CountyName], '') <> ISNULL(Src.[CountyName], '') OR
      ISNULL(Dest.[CountyNumeric], '') <> ISNULL(Src.[CountyNumeric], '') OR
      ISNULL(Dest.[PrimaryLatitudeDMS], '') <> ISNULL(Src.[PrimaryLatitudeDMS], '') OR
      ISNULL(Dest.[PrimaryLongitudeDMS], '') <> ISNULL(Src.[PrimaryLongitudeDMS], '') OR
      ISNULL(Dest.[PrimaryLatitudeDEC], '') <> ISNULL(Src.[PrimaryLatitudeDEC], '') OR
      ISNULL(Dest.[PrimaryLongitudeDEDC], '') <> ISNULL(Src.[PrimaryLongitudeDEDC], '') OR
      ISNULL(Dest.[ElevationMeters], '') <> ISNULL(Src.[ElevationMeters], '') OR
      ISNULL(Dest.[ElevationFeet], '') <> ISNULL(Src.[ElevationFeet], '') OR
      ISNULL(Dest.[MapName], '') <> ISNULL(Src.[MapName], '') OR
      ISNULL(Dest.[DateCreated], '') <> ISNULL(Src.[DateCreated], '') OR
      ISNULL(Dest.[DateEdited], '') <> ISNULL(Src.[DateEdited], '') OR
      ISNULL(Dest.[Country], '') <> ISNULL(Src.[Country], '')
     )
THEN UPDATE
    SET
        Dest.IsCurrent='N',
        Dest.EndEffectiveDate=GETUTCDATE()
OUTPUT Src.[FeatureID]
      ,Src.[FeatureName]
      ,Src.[FeatureClass]
      ,Src.[StateCode]
      ,Src.[StateNumeric]
      ,Src.[CountyName]
      ,Src.[CountyNumeric]
      ,Src.[PrimaryLatitudeDMS]
      ,Src.[PrimaryLongitudeDMS]
      ,Src.[PrimaryLatitudeDEC]
      ,Src.[PrimaryLongitudeDEDC]
      ,Src.[SourceLatitudeDEC]
      ,Src.[SourceLongitudeDEC]
      ,Src.[ElevationMeters]
      ,Src.[ElevationFeet]
      ,Src.[MapName]
      ,Src.[DateCreated]
      ,Src.[DateEdited]
      ,Src.[Country]
      ,$Action AS [MergeAction]
) AS Mrg
WHERE Mrg.MergeAction = 'UPDATE' -- only re-insert the new version of rows the merge expired
;


Let’s examine the different pieces of the statement real quick. The first portion (from the initial SELECT statement down to the FROM) is where we do our insert. We use the merge statement as a subquery inside the FROM clause so that the only records inserted are the ones the merge statement outputs.


The second part, from the MERGE keyword to the end of the USING clause, is where we identify the business key between the source and destination so we can do our lookup.


The third part of the query does an insert if the record didn’t previously exist. Since we’re having SSIS do the inserts when no match is found, this branch shouldn’t ever really fire, but it’s in there just in case.


The fourth part of the query (from WHEN MATCHED onwards) asks: if the business keys do match, is anything different? If so, it expires the current row, setting the IsCurrent flag to N and stamping the end effective date. The OUTPUT clause then hands the source row back out to the outer query so the new version can be inserted.


In my next post we’ll cover the SSIS portion of this package.