Connecting SSAS Tabular to Excel Files in SharePoint

One of my absolute favorite features in tabular models is the ease with which we can do things like discretization and grouping. What used to take days in Multidimensional can now be accomplished in just minutes. Additionally, we can give business users full control of these groups, letting them dynamically adjust the groups in Excel and import them into our tabular models. I did a post a while back about setting up discretization buckets in tabular, but I was recently showing this to a client and the thought occurred to me that we could create a SharePoint library to house these workbooks instead of having to keep them on a fileshare. Placing them in SharePoint allows us to set security, adjust the workbook in the browser, and just get some other awesome sauce into our model. This should be a pretty short post, but it’s really cool, so bear with me. 🙂

 

First, create a SharePoint library to house your workbook. Truthfully, any library will do, but long term you’re going to want to restrict this library to just the few individuals who should have access to it. Once your library is created, upload the workbook. Now select the workbook and go up to the Library tab in the browser. About halfway across the top of the page, there is an option to “Open in Explorer”.

image

 

When you select that, your workbook will open up in Windows Explorer and you can then grab the connection string. Right-click on the file and go to Properties. You want to grab the Location value and then append the workbook name to the end of it. Notice that SharePoint files have an extra section in the filepath that must be there in order for the data to pull.

image
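For illustration, the full path you end up with will look something like the below. The server, library, and file names here are hypothetical; the extra WebDAV segment (DavWWWRoot in this example) is the extra section referred to above, and its exact shape can vary depending on your site structure:

\\sharepoint.contoso.com\DavWWWRoot\Group Workbooks\DiscretizationGroups.xlsx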

Copy it, and move over to tabular. In the tabular model, create a new connection to an Excel workbook, and then paste in the filepath you just copied. Click through and you’re good to go. Now when users want to modify groups they can do so easily from within SharePoint, and your tabular model can just pull that data through. Awesome-sauce!

What happens when the OLAP Query Log doesn’t turn on

Bill Anton recently had a really great post about the OLAP query log and how to turn it on and translate it into English (available here). I’ve been meaning to blog about this for a while but kept putting it off, and reading his post motivated me to actually go do it instead of just staring at the note I have sketched down (thanks, Bill!). One thing to keep in mind is that the query log does consume I/O while in use, so be careful about putting too much overhead on your server.

 

So let’s say you do all this great reading and want to turn on your OLAP query log. You go to Properties -> General, change the query log CreateQueryLogTable property to true, specify your connection string, give your table a name, and then go check your database. But… nothing happens. Even after you refresh. Oh, Analysis Services, why must you vex me so? Fortunately, everything Analysis Services does when trying to create this table is logged. We just need to know where.
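For reference, the specific settings involved live under Log \ QueryLog in the server properties (QueryLogSampling controls how often queries are captured and defaults to every 10th query):

Log \ QueryLog \ CreateQueryLogTable
Log \ QueryLog \ QueryLogConnectionString
Log \ QueryLog \ QueryLogTableName
Log \ QueryLog \ QueryLogSampling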

 

Deep in the depths of Analysis Services there is a file (well, lots of them actually, but we’re looking for something specific here :)). Everything the msmdsrv config file tries to do has a corresponding log entry, located under $InstallDrive\Microsoft SQL Server\MSAS12.MSSQLSERVER\OLAP\Log (note that the 12 is the version number, for 2014; yours might be 11 for 2012, 10.5 for 2008 R2, 10 for 2008, or 9 for 2005). In that folder is a simple text file called msmdsrv that shows all of the activity. It contains information like the start and stop times of the service, flight recorder start/stop info, some I/O information, and general configuration errors. If you scroll down to the bottom (assuming you just tried to create the OLAP query log table and it didn’t work), you should see an entry with the detailed error information:

 

image

 

Starting at the bottom, it says “An error occurred while starting the query log.” Now we at least know what happened! An error. OK, what error? Up two lines is our culprit: “Login failed for user ______________”. There we have it. I’d changed the service account on my VM to a dedicated SSAS MD service account (best practice and all), and hadn’t given it login and create table permissions on the source server. So I created a login and user in the SQL database engine for my new service account, went back into the properties, turned the create table property to false, and clicked OK. Then I went back in, changed it back to true, and clicked OK again. Now Analysis Services was able to create the log table without any issues:
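If it helps, the SQL side of that fix is just a standard login and database user with rights to create and write to the log table. A minimal sketch on SQL Server 2012 or later (the domain account and database names here are hypothetical, and the role assignments are my own choice rather than anything prescribed by Analysis Services):

CREATE LOGIN [CONTOSO\svc_SSASMD] FROM WINDOWS;            -- hypothetical SSAS service account
USE [OlapQueryLogDB];                                      -- hypothetical database hosting the query log table
CREATE USER [CONTOSO\svc_SSASMD] FOR LOGIN [CONTOSO\svc_SSASMD];
ALTER ROLE db_ddladmin ADD MEMBER [CONTOSO\svc_SSASMD];    -- lets it create the query log table
ALTER ROLE db_datawriter ADD MEMBER [CONTOSO\svc_SSASMD];  -- lets it insert query log rows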

 

image

 

Fortunately, it’s a relatively easy fix once you find the log file. Now my mind is beginning to wonder what else we can do with this log file… (hmm, definitely unstructured. Maybe FILESTREAM to get it into SQL or… A HADOOP CLUSTER!!!!)

DAX: Date Calculation Tables

Some time ago, I blogged about using a date calculation dimension in a multidimensional cube, and how easy it makes doing time intelligence in a multidimensional model. 2014 seems to have started out as “the year of Power Pivot,” as it is definitely gaining traction with my clients, and more and more of them are asking about it during engagements. One of the things I get absolutely giddy about in Power Pivot is the concept of disconnected slicers. As Rob Collie best said, disconnected slicers in Power Pivot are pure magic; I particularly enjoy the “mad scientist” reference he makes. For those unfamiliar, a disconnected slicer is a piece of data that has nothing to relate to in your model, but that you want to use as part of your analysis. For a great introduction, see Rob’s PowerPivotPro blog post on the topic. I was recently showing a client this mad-scientist-level stuff, and then took the natural progression and thought: “If I can do this to control the number of months back, can I take it a step further and create a date calculation dimension like in multidimensional too?” What, that wasn’t the first thing you thought too?

First off, let’s create some measures in Power Pivot that contain our base value and time intelligence analysis:

image

For this post, I’ve created four: Total Sales, Calendar YTD, Calendar Previous Period, and Calendar Previous Period Diff. These are typical time intelligence functions, nothing overly special or different about them. If you’re interested, these are their formulas:

Total Sales:=SUM([SalesAmount])

Calendar YTD:=TOTALYTD([Total Sales],'Date'[FullDateAlternateKey])

Calendar Previous Period:=CALCULATE([Total Sales],DATEADD('Date'[FullDateAlternateKey],-12,MONTH))

Calendar Previous Period Diff:=[Total Sales]-[Calendar Previous Period]

Not too bad, yet. Ok, let’s create a table to house our disconnected slicer. In Excel, create a table with the value you want to select and an “ID” column next to it:

image
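For reference, the table is simply the four calculation names, each paired with the ID number (the Value column) that the SWITCH cases later in the post key off of:

Total Sales = 1
Calendar YTD = 2
Calendar Previous Period = 3
Calendar Previous Period Diff = 4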

Then, add it into your data model, either by copying and pasting or by clicking the “Add to Data Model” button in the Power Pivot ribbon. Below is my Power Pivot diagram view after importing; notice how there is no relationship between the date calculation table and the fact table:

image

Getting back into the data view, we’ll navigate back to the fact table and take the same approach Rob did for his disconnected slicer to return the currently selected value:

Date Calculation Selected:=MIN('Date Calculation'[Value])

In the preceding formula, we’re taking the minimum of the Value column of the date calculation table. This allows us to pick up the currently active context inside of our pivot table. In the image below, I’m selecting the value from the disconnected slicer table I created, and the Date Calculation Selected measure from my fact table.

image

This is…AWESOME! However, when I go to drag on my time intelligence and performance measures, it doesn’t quite look right:

image

Whoa! That’s… not awesome. So let’s fix it. Back on the fact table, we can utilize a function called SWITCH, which works like a CASE statement in SQL. The syntax for SWITCH starts with the column or value you want to evaluate, followed by the cases, separated by commas. In this example, we want to swap in a different measure based on the [Date Calculation Selected] measure we created above:

Active Measure:=SWITCH([Date Calculation Selected],1,[Total Sales],2,[Calendar YTD],3,[Calendar Previous Period],4,[Calendar Previous Period Diff])

We’re evaluating [Date Calculation Selected] and returning a different measure based on the active context for that row. Let’s go back to our pivot table, take off all of those old values that didn’t look so good above, and replace them with our new “Active Measure” measure:

image

Oh. My. God. That is so AWESOME!!!! Even more amazing, we could take our date calculation table and put it in a slicer, freeing up the columns section of the pivot for something else and making this even more flexible for use on a dashboard. This way, we can actually have the MEASURES in a slicer!

image
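One caveat worth noting: with a slicer, a user can select more than one value (or clear the selection entirely), and the MIN-based harvester will then quietly fall back to the smallest ID. A variation I’d sketch, not part of the workbook above but using the same table and measure names, is to default to Total Sales unless exactly one calculation is selected:

Active Measure:=IF(HASONEVALUE('Date Calculation'[Value]),
    SWITCH([Date Calculation Selected],1,[Total Sales],2,[Calendar YTD],3,[Calendar Previous Period],4,[Calendar Previous Period Diff]),
    [Total Sales])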

That is some serious awesome-sauce! The complete workbook is available for download here.

DAX: Calculating different rates on user defined inputs

A few weeks ago a client asked me an interesting question. They had a table of data, and they wanted to pass in (in their words) a variable in order to change the calculated value. I’d played around with banding before (see my post here), but I hadn’t tackled something quite as specific as the scenario they presented. To outline the issue, let’s say we have a user-created table that consists of the different pricing tiers for our customers, similar to the below:

image

 

And we have a table of data that looks like this:

image

The requirement was to leave these two tables independent of each other. One challenge in DAX is that we cannot use a BETWEEN-style condition to join tables, so we’re going to need to come up with something else to get at the value of the parameter we’re looking for. In the next step, I’ve brought these two tables into Power Pivot and joined in a date dimension, as shown in the screenshot below:

 

image

We can see that the data table joins to the date dimension just fine, but we can’t join to the rates table, since what we really want is a between-style lookup rather than an exact key match. What we can do, however, is modify the data table to return the value from the parameter table in a denormalized pattern. By using CALCULATE with a FILTER, we can get that value repeated down our entire fact table.

 

=CALCULATE(VALUES(Rates[Value]),FILTER(Rates,Rates[Parameter]="Hour Limit"))

In the above formula, I’m asking DAX to go to the rates table and return the value for the specific “Hour Limit” parameter. I can then repeat this same pattern to return the rate for each of the different tiers, giving me the following values:

image
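For reference, the two tier-rate columns are the same expression with a different parameter filter; these are exactly the pieces that show up again in the combined formula at the end of the post:

Tier 1 Rate: =CALCULATE(VALUES(Rates[Value]),FILTER(Rates,Rates[Parameter]="Tier 1 Rate"))
Tier 2 Rate: =CALCULATE(VALUES(Rates[Value]),FILTER(Rates,Rates[Parameter]="Tier 2 Rate"))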

 

Now we can create a calculation to compute the total amount the customer should pay. For utility billing, it’s very common to have tiered rates: for example, the first 2,000 kWh are charged at one rate, but anything over that incurs an additional surcharge on top of the tier 1 rate. We can come up with the amount due by using a formula like the below:

=IF([HoursUsed]<[SpecifiedLimit],([HoursUsed]*[Tier 1 Rate]),((([HoursUsed]-[SpecifiedLimit])*[Tier 2 Rate]) + ([SpecifiedLimit]*[Tier 1 Rate])))

That gives me the following result:

image
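To make the logic concrete with purely hypothetical numbers: say the Hour Limit is 2,000, the Tier 1 Rate is 0.10, and the Tier 2 Rate is 0.15. A customer who used 2,500 hours would owe (2000 * 0.10) + ((2500 - 2000) * 0.15) = 275, while a customer at 1,500 hours would simply owe 1500 * 0.10 = 150.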

 

I can now either leave it this way and hide the additional fields that are just doing intermediate calculations, or roll them all into one. Rolling everything into a single calculation would result in a formula like the below:

 

=IF([HoursUsed]<(CALCULATE(VALUES(Rates[Value]),FILTER(Rates,Rates[Parameter]="Hour Limit"))),
([HoursUsed]*(CALCULATE(VALUES(Rates[Value]),FILTER(Rates,Rates[Parameter]="Tier 1 Rate")))),
((([HoursUsed]-(CALCULATE(VALUES(Rates[Value]),FILTER(Rates,Rates[Parameter]="Hour Limit"))))*(CALCULATE(VALUES(Rates[Value]),FILTER(Rates,Rates[Parameter]="Tier 2 Rate")))) +
((CALCULATE(VALUES(Rates[Value]),FILTER(Rates,Rates[Parameter]="Hour Limit")))*(CALCULATE(VALUES(Rates[Value]),FILTER(Rates,Rates[Parameter]="Tier 1 Rate"))))))

 

A very handy way to get around the lack of parameters in DAX is to use an Excel workbook and link the Power Pivot or tabular model to it, allowing users to change “run-time” values, although it does require the model to be refreshed to pick up the changes. This workbook can be downloaded here.

SSAS Tabular: Internal Physical Files

SSAS Tabular stores data mostly in memory. However, it still consists of two main engines that drive and respond to user queries: the formula engine and the storage engine. Conceptually, this is very similar to how queries are answered in multidimensional models. The overall query request follows a pattern that can be represented visually with the diagram below:

 

 

image

 

Just like in multidimensional, the formula engine takes the request, determines the optimal query plan, and then either answers the query or requests the data it needs from the storage engine. The storage engine runs against the data that the xVelocity engine holds in memory. However, one major question has bugged me about this from the beginning: in the case of a restart, or the memory being cleared in some way, how does the tabular model store its relationships between everything? The data itself may be stored in memory, but the relationships and the structure surely must have some reference files on disk that xVelocity and tabular can use. While playing with it tonight, I think I’ve figured it out.

 

All tabular databases live in the data directory we configured during installation, typically located at $InstallLocation\Microsoft SQL Server\MSAS11.TABULAR\OLAP\Data. Inside this folder, each tabular database gets a folder with a .db suffix, plus an XML file. The XML file contains relevant information about the database, and the folder contains all the “good stuff”. Opening the folder for the Adventure Works tabular database, for example, gives me something that looks like the below:

image

 

Similar to the database level, each table in tabular consists of an XML file ending in .dim, and each table also gets a folder. The .dim XML file contains relevant information about everything the table needs, along with the pointers between the different columns that store the relationships. Inside the folder, each column consists of a tbl XML file, an IDF file, and potentially an HIDX or DICTIONARY file, depending on the data type. Everything except the tbl XML file is for tabular’s internal use, but the tbl file gives us some good information about what tabular did during processing for each attribute. Opening up the tbl file shows us some really useful details about that column; we can see things like the min and max data IDs and the number of segments:

 

image

Min and max segment and data IDs are things that the storage engine uses at query time to identify which data exists in which segment and data file. This is quite useful information! I’ll have to remember this for when it comes to performance tuning tabular models.

SSAS Tabular: Transactional Deployment

SSAS has long performed deployment and processing as two very separate operations that happen independently of each other. However, there is a property that allows you to override this default behavior and tell SSAS to treat them as one transaction. For developers, this is great news, as we can now tell SSAS to roll back the deployment if processing fails. The default behavior, with Transactional Deployment set to false, is that deployment can succeed even if the subsequent processing fails. What this means is that if you deploy a project successfully but the processing then fails, you are left with a database that users cannot browse until you successfully process it again.

 

To turn this feature on, right-click on the project, go to Properties, and change the Transactional Deployment property from False to True. This option exists in both multidimensional and tabular solutions.

 

image

 

Awesome-sauce!

SSAS Tabular: Calculating the actual compression rate

We always hear about the power and compression ratio of tabular databases and Power Pivot, and how the VertiPaq compression engine can compress data by as much as 90% compared to its source. But this opens up a lot of questions. For example, how can we properly size our tabular hardware in the first place? As in, how do we ensure that the hardware we’re setting aside is the proper amount? Unfortunately, just saying “give me all the memory you can” isn’t going to make our virtualization/networking guys a) love us, or b) happy. There are generally two steps in the iterative process for sizing:

Step 1: The initial estimate

Step 2: Refining the estimate and comparing the actual

 

Step 1 is relatively straightforward. The general size of a tabular database can be computed by running a simple calculation against the source database. First, run sp_spaceused against the database that you plan on bringing into your tabular model. If you are bringing in only a few tables, include the table name parameter in sp_spaceused and total up all of the tables you wish to bring in. For this example, I have a simple AirlinePerformance data mart I’ve created on my local machine. Running sp_spaceused against the whole database gives me the following:

image
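For reference, the calls themselves are simply the below; the second form is the table-level variant, shown here with a hypothetical fact table name:

EXEC sp_spaceused;
EXEC sp_spaceused 'dbo.FactAirlinePerformance';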

The generally accepted calculation we can run looks like the below:

([Database Size]/10) * 2

Dividing by 10 was established after benchmarking this formula against a wide range of datasets. You multiply by 2 to account for when the database is not at rest, covering queries and processing operations. There are other, more detailed formulas out there, but the one above is the generally accepted rule of thumb. Running this against my sample database gives me the following calculation:

(3182.75/10) * 2 = 636.55 MB

So the result of the above gives me roughly 636 MB, which equates to 0.62 GB. I now know that I need at least 0.62 GB of RAM available just to load the model. This is the initial estimate, but how can I find the actual figure after I’ve built the model? I’ve estimated 0.62 GB, but is it accurate?

In step 2, we want to take our database after we’ve built the model and determine how much space it is really using. To get the actual size of our tabular database, let’s go directly to the file system. Navigate to your SQL install directory ($InstallLocation\Microsoft SQL Server\MSAS11.TABULAR\OLAP\Data) and find the folder with the name of the database you’re calculating. Right-click -> Properties, and get the size of the folder. Mine looks like the below:

 

image

 

So it’s 332 MB on disk. Let’s take the original uncompressed database size from our first statement (3182.75 MB) and divide it by the 332 MB on disk: 3182.75/332 = 9.58. NOW we have a good number for the actual compression ratio the VertiPaq engine is able to achieve. Remember, our original formula is ([Database Size]/10) * 2. We’re going to replace the 10 with our new actual compression rate, so the formula for my database looks like the below:

(3182.75/9.58) * 2 = 664.46 MB

So to handle a roughly 3 GB source database, I need about 664 MB of memory on an actual production server. Since it is very hard to give one blanket answer for everyone, due to varying granularity, industries, data models, and data types, hopefully this will help you come up with decent sizing recommendations for your networking guys when you’re trying to get an idea of how much memory you need for your tabular database.