SSAS Tabular: Calculating the actual compression rate

We always hear about the power and compression ratio of tabular databases and Power Pivot, and how the Vertipaq compression engine can shrink data down to roughly a tenth of its source size (a reduction of up to 90%). But this opens up a lot of questions. For example, how can we properly size our tabular hardware solution in the first place? How do we make sure the hardware we’re setting aside is the right amount? Unfortunately, just saying “give me all the memory you can” isn’t going to make our virtualization/networking guys a) love us, or b) happy. Sizing is generally an iterative process with two steps:

Step 1: The initial estimate

Step 2: Refining the estimate and comparing the actual

 

Step 1 is relatively straightforward. The general size of a tabular database can be estimated by running a simple calculation against the source database. First, run sp_spaceused against the database you plan on bringing into your tabular model. If you are bringing in only a few tables, pass the table name parameter to sp_spaceused for each table and total the results for all of the tables you wish to bring in. For this example, I have a simple AirlinePerformance data mart I’ve created on my local machine. Running sp_spaceused against the whole database gives me the following:

[Screenshot: sp_spaceused output for the AirlinePerformance database, showing a database_size of 3,182.75 MB]
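For reference, the calls themselves look something like the below; AirlinePerformance is my sample database, and the fact table name is just a hypothetical placeholder for whatever tables you plan to import:

-- Size of the whole source database (run in the context of that database)
USE AirlinePerformance;
EXEC sp_spaceused;

-- Or, if you are only bringing in a few tables, size each one and total the results
-- (dbo.FactAirlinePerformance is a hypothetical table name)
EXEC sp_spaceused @objname = N'dbo.FactAirlinePerformance';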

The generally accepted calculation we can run looks like the below:

([Database Size]/10) * 2

Dividing by 10 was established after benchmarking this formula against a wide range of datasets. You multiply by 2 to account for the times when the database is not at rest, leaving headroom for query activity and processing operations (a full process in particular can briefly hold a second copy of the data in memory). There are other, more detailed formulas out there, but the one above is the generally accepted starting point. Running this against my sample database gives me the following calculation:

(3182.75/10) * 2 = 636.55 MB

So the result of the above gives me 636.55 MB, which equates to roughly 0.62 GB. I now know that I need at least 0.62 GB of RAM available just to load the model. This is the initial estimate, but how do I find the actual figure once the model is built? I’ve estimated 0.62 GB, but is that accurate?
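If you’d rather let SQL Server do the arithmetic, here is a minimal sketch of that same estimate; the 3182.75 figure is simply the database_size value from the sp_spaceused output above:

-- Initial memory estimate: (source database size / 10) * 2
DECLARE @SourceSizeMB decimal(18,2) = 3182.75;  -- database_size reported by sp_spaceused

SELECT (@SourceSizeMB / 10) * 2          AS InitialEstimateMB,  -- 636.55
       ((@SourceSizeMB / 10) * 2) / 1024 AS InitialEstimateGB;  -- ~0.62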

In step 2, we want to take our database after we’ve built the model and determine how much space it is actually using. To get the actual size of our tabular database, let’s go directly to the file system. Navigate to your SQL Server install directory ($\Microsoft SQL Server\MSAS11.TABULAR\OLAP\Data) and find the folder with the name of the database you’re measuring. Right click –> Properties, and get the size of the folder. Mine looks like the below:

 

[Screenshot: folder properties for the tabular database, showing 332 MB on disk]

 

So it’s 332 MB on disk. Let’s take the original uncompressed database size from our first statement (3182.75 MB) and divide it by the folder size: 3182.75/332 = 9.58. NOW we have a good number for the actual compression ratio the Vertipaq engine is able to achieve against this particular database. Remember, our original formula is ([Database Size]/10) * 2. We’re going to replace the 10 with our new actual compression rate, so the formula for my database looks like the below:

(3182.75/9.58) * 2 = 664.46 MB
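If you want to keep the whole refinement in T-SQL as well, here is a small sketch using my numbers; the 332 MB value is just the on-disk folder size from the screenshot above, so substitute your own measurements:

-- Refined estimate: swap the assumed 10x ratio for the measured compression ratio
DECLARE @SourceSizeMB  decimal(18,2) = 3182.75;  -- uncompressed source size from sp_spaceused
DECLARE @TabularSizeMB decimal(18,2) = 332;      -- tabular database folder size on disk
DECLARE @ActualRatio   decimal(18,2) = ROUND(@SourceSizeMB / @TabularSizeMB, 2, 1);  -- 9.58 (truncated)

SELECT @ActualRatio                       AS ActualCompressionRatio,
       (@SourceSizeMB / @ActualRatio) * 2 AS RefinedEstimateMB;  -- ~664.46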

So to handle a 3 GB source database, I need about 664 MB of memory on an actual production server. It’s very hard to give one blanket answer for everyone, since granularity, industries, data models, and data types all vary, but hopefully this will help other developers come up with decent size recommendations for their networking guys when they’re trying to get an idea of how much memory they need for their tabular database.
