SSIS: Dynamically setting the Default Buffer Max Rows property

I’ve blogged in the past about the Default Buffer Max Rows property in SSIS, and how powerful it can be. As a recap, there are 2 properties in SSIS that control the amount of data that your ETL package will hold in any given buffer: The DefaultBufferMaxRows property, and the DefaultBufferSize. When performance tuning SSIS packages, we can modify either one of these to adjust how much data SSIS can carry through a buffer at once.  By default, this is 10,000 rows or 10 MB. One thing I didn’t like about my “CalculateDefaultBufferSize” spreadsheet is that it made an assumption that you were going to change your DefaultBufferSize to 100MB. So this is 100 MB per component that is running in our data flow, which can become extremely costly. If you leave it at the default however, you’re  still using 10 MB per component no matter what. If we have a really skinny data set we’re importing (such as a fact table or a hub), it’s very possible we could be leaving a lot of empty space in the data flow if we stop every 10,000 rows. Conversely,  if we have a really wide dataset that we can only fit 6,000 or 7,000 rows in a buffer, we’re having to make extra hops to move that same data. It’s always a balancing act. I recently decided to see if I could automate the setting of this property in order to maximize the efficiency of my SSIS packages without needing to manually calculate it out every time.

To start, we need to find our destination table and identify the maximum row length in that table. Fortunately, this can be obtained with a simple query (using DimCurrency in the AdventureWorksDW2012):

SELECT
    SUM (max_length)  [row_length]
FROM  sys.tables t
JOIN sys.columns c
    ON t.object_id=c.object_id
JOIN sys.schemas s
    ON t.schema_id=s.schema_id
WHERE t.name = ‘DimCurrency’
    AND s.name = ‘dbo’

You could easily turn this into a procedure to take the tablename and schema as a parameter, and return it. For this post, I’m going to leave it as a query. Once we have this, it becomes a simple exercise in math. The table above returns a max row length of 118 (in bytes), so we take this value and divide it into 1our default value (which fortunately is stored in SSIS in bytes as well – as 10485760). So our calculation becomes:

10485760/118

giving us a result of 88,862 rows. We’ll round down to the closest 100, and that will be our default buffer max rows property. Now how do we make this dynamic? To start, we open up our SSIS package and add an Execute SQL task into the Control Flow. A good item to improve upon this would be take this and put in a Pre-Execute SQL task, and write the value to an information table. Then you can log it and do some reporting around what your buffers are. As an added bonus with this dynamic structure: as your data scales, your ETL package will too!

image

Next we want to create 3 variables: str_table_name, str_schema_name, and int_row_size. str_table_name and str_schema_name should be strings, and int_row_size should be an Int32. For the int_row_size variable, set the Value to 1. This is going to avoid an error in our expression later. This way we can parameterize our inputs for our destination tables and the corresponding values, and create a placeholder for our incoming result.  If your package names are the same as the destination tables then instead of having to assign this value manually you could just configure it from an expression, using the @[System::PackageName] global variable:

image

Let’s open up our Execute SQL Task and configure it. The connection will point to our destination database, and the SQL query will be as we’ve outlined above, subbing parameters for the table and schema name:

image

Do the parameter mapping on the 2nd tab to parameterize your table and schema with the variables you created above, and set your result set to single row. The single row result set should map to your int_row_size variable. Now for the fun part. Create a data flow task, and right click on it and go to properties.

image

Under expressions, select DefaultBufferMaxRows and expand the expression ellipses. For the expression, use the following logic:

@[User::int_row_size]==1? 10000: 10485760/ @[User::int_row_size]

The initial test of the value is to ensure that if the sql fails to get the proper row size, your buffers remain at the default. And that’s it! Now you can dynamically set your default buffer max rows for ssis!

Now that’s some awesome-sauce.

Chris

Advertisements