SSIS: Dynamically setting the Default Buffer Max Rows property

I’ve blogged in the past about the Default Buffer Max Rows property in SSIS, and how powerful it can be. As a recap, there are two properties in SSIS that control the amount of data your ETL package will hold in any given buffer: DefaultBufferMaxRows and DefaultBufferSize. When performance tuning SSIS packages, we can modify either one of these to adjust how much data SSIS can carry through a buffer at once. By default, this is 10,000 rows or 10 MB.

One thing I didn’t like about my “CalculateDefaultBufferSize” spreadsheet is that it assumed you were going to change your DefaultBufferSize to 100 MB. That’s 100 MB per component running in our data flow, which can become extremely costly. If you leave it at the default, however, you’re still using 10 MB per component no matter what. If we’re importing a really skinny data set (such as a fact table or a hub), it’s very possible we’re leaving a lot of empty space in the data flow if we stop every 10,000 rows. Conversely, if we have a really wide data set where we can only fit 6,000 or 7,000 rows in a buffer, we’re having to make extra hops to move that same data. It’s always a balancing act. I recently decided to see if I could automate the setting of this property to maximize the efficiency of my SSIS packages without needing to manually calculate it every time.

To start, we need to find our destination table and identify the maximum row length in that table. Fortunately, this can be obtained with a simple query (using DimCurrency in AdventureWorksDW2012 as an example):

SELECT
    SUM(c.max_length) AS [row_length]
FROM sys.tables t
JOIN sys.columns c
    ON t.object_id = c.object_id
JOIN sys.schemas s
    ON t.schema_id = s.schema_id
WHERE t.name = 'DimCurrency'
    AND s.name = 'dbo'

You could easily turn this into a procedure that takes the table name and schema as parameters and returns the row length; a sketch of that follows the calculation below. For this post, I’m going to leave it as a query. Once we have this, it becomes a simple exercise in math. The query above returns a maximum row length of 118 bytes, so we divide our default buffer size by this value (fortunately, SSIS also stores that setting in bytes: 10485760). So our calculation becomes:

10485760 / 118

giving us a result of 88,862 rows. We’ll round down to the nearest 100, making 88,800 our Default Buffer Max Rows value. Now how do we make this dynamic? To start, we open up our SSIS package and add an Execute SQL Task to the Control Flow. A good improvement on this would be to run the query in a pre-execute SQL task and write the value to an information table; then you can log it and do some reporting around what your buffers are (a sketch of that follows below). As an added bonus with this dynamic structure: as your data scales, your ETL package will too!
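As an aside, here is roughly what the procedure version mentioned above might look like. This is a minimal sketch; the name usp_GetMaxRowLength is just a placeholder of my own:

CREATE PROCEDURE dbo.usp_GetMaxRowLength
    @table_name  sysname,
    @schema_name sysname
AS
BEGIN
    SET NOCOUNT ON;

    -- Same query as above, parameterized by table and schema
    SELECT
        SUM(c.max_length) AS [row_length]
    FROM sys.tables t
    JOIN sys.columns c
        ON t.object_id = c.object_id
    JOIN sys.schemas s
        ON t.schema_id = s.schema_id
    WHERE t.name = @table_name
        AND s.name = @schema_name;
END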

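If you go the logging route, a minimal sketch of the information table might look like this (the BufferSizeLog name and its columns are placeholders of my own invention):

CREATE TABLE dbo.BufferSizeLog (
    log_date     datetime     NOT NULL DEFAULT (GETDATE()),
    package_name varchar(100) NOT NULL,
    row_length   int          NOT NULL
);

-- In the pre-execute SQL task, after calculating the row length:
INSERT INTO dbo.BufferSizeLog (package_name, row_length)
VALUES (?, ?); -- mapped to System::PackageName and User::int_row_size

That gives you a history of your buffer math as your tables grow.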
Next we want to create three variables: str_table_name, str_schema_name, and int_row_size. The first two should be Strings, and int_row_size should be an Int32 with its Value set to 1; this avoids a divide-by-zero error in our expression later. This way we can parameterize the inputs for our destination tables and create a placeholder for our incoming result. If your package names are the same as your destination tables, then instead of assigning str_table_name manually you could configure it from an expression, using the @[System::PackageName] system variable:
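For example, you could set EvaluateAsExpression to True on str_table_name and use the following as the expression (this assumes the package really is named after its destination table):

@[System::PackageName]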

Let’s open up our Execute SQL Task and configure it. The connection will point to our destination database, and the SQL query will be the one we outlined above, substituting parameters for the table and schema names:
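With an OLE DB connection, that looks something like this; the ? markers are OLE DB parameter placeholders (an ADO.NET connection would use named parameters instead):

SELECT
    SUM(c.max_length) AS [row_length]
FROM sys.tables t
JOIN sys.columns c
    ON t.object_id = c.object_id
JOIN sys.schemas s
    ON t.schema_id = s.schema_id
WHERE t.name = ?
    AND s.name = ?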

Do the parameter mapping on the second tab to map the table and schema parameters to the variables you created above, and set your result set to Single row. The single-row result should map to your int_row_size variable.
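For reference, the mappings look roughly like this with an OLE DB connection (ordinal parameter names; your data types may vary):

Parameter Mapping:
    User::str_table_name  -> Input, NVARCHAR, Parameter Name 0
    User::str_schema_name -> Input, NVARCHAR, Parameter Name 1

Result Set:
    Result Name 0 ([row_length]) -> User::int_row_size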

Now for the fun part. Create a data flow task, right-click on it, and go to Properties.

Under Expressions, select DefaultBufferMaxRows and click the ellipsis to open the expression builder. For the expression, use the following logic:

@[User::int_row_size] == 1 ? 10000 : 10485760 / @[User::int_row_size]

The initial test of the value ensures that if the SQL fails to return a proper row size, your buffers stay at the 10,000-row default. And that’s it! Now you can dynamically set your Default Buffer Max Rows in SSIS!

Now that’s some awesome-sauce.

Chris

6 thoughts on “SSIS: Dynamically setting the Default Buffer Max Rows property”

  1. Hello, thanks for the article. I’m still having performance issues reading a csv file with 3.8 million rows into the table below using an ADO.NET Destination, with the following settings on the Data Flow Task:

    1. DefaultBufferMaxRows => 85948
    a. Using your row length calculation I get the following: 10485760 / 122
    2. DefaultBufferSize => 104857600

    Time to complete is taking 10 minutes. Also, the destination table resides in SQL Azure, yet I have a high-speed connection. Any suggestions for how I could speed this up?

    Thanks,

    Bill
    wrfleming@gmail.com

    CREATE TABLE [dbo].[AlphaStage](
    [RPT_REC_NUM] [numeric](7, 0) NOT NULL,
    [WKSHT_CD] [char](7) NOT NULL,
    [LINE_NUM] [char](5) NOT NULL,
    [CLMN_NUM] [char](5) NOT NULL,
    [ALPHNMRC_ITM_TXT] [varchar](100) NOT NULL,
    CONSTRAINT [PK_AlphaStage] PRIMARY KEY CLUSTERED
    (
    [RPT_REC_NUM] ASC,
    [WKSHT_CD] ASC,
    [LINE_NUM] ASC,
    [CLMN_NUM] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
    )

    1. Hi Bill,
      How many components are in the SSIS package? Not sure if it was a typo, but up above you stated the DefaultBufferSize was 104857600? If so, that means each component is using 100 MB of memory to run, and you may be taking up a lot of memory. If not, consider upping your DefaultBufferMaxRows to maximize that size (104857600 / 122 ≈ 859,488, or 859,400 rounded down to the nearest 100). Also, can you use an OLE DB connection instead of ADO.NET? Typically, OLE DB performs quicker than ADO.NET does. Since it’s a csv as well, consider setting the FastParse property when reading the rows in, as long as you trust that the data types will not change. There is a good article on setting it here: http://microsoft-ssis.blogspot.com/2013/04/performance-best-practice-flat-file.html

      ~Chris

      1. Thanks, Chris, for replying. Yes, that was a typo. There are only two components: one to read in the data from the csv file, and the destination to a table in the SQL Azure db. In regards to the connection type, I can certainly switch back, as I originally had it using the OLE DB connection. Thanks for the tips on the fast parser; I will check out the article.

        Bill

  2. Hello Chris, may I ask why we don’t maximize the DefaultBufferSize instead of keeping it at 10 MB? Or can’t we also dynamically set that? 🙂 I’m a bit confused there. Anyway, thanks for this post.

    Gokberk

    1. Hi Gokberk,
      Absolutely! Thanks for posting. You could definitely set the DefaultBufferSize as well if you’d prefer. I’m partial to maximizing the rows you push through, and just considering 10 MB a “constant” price you have to pay. Certainly, that’s not entirely accurate, but I find it’s an easier way to explain it than getting into the nuances. To do it, you’d just flip the calculation from the post: fix the rows at 10,000 and multiply by the row size to get the buffer size, instead of dividing the 10 MB default buffer size by the row size.
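      For example, the expression on DefaultBufferSize might look something like this, keeping the same guard in case the row-size query fails:

      @[User::int_row_size] == 1 ? 10485760 : 10000 * @[User::int_row_size]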

      Chris
