SSAS Formula Engine

I’ve been reading the SSAS Multidimensional Operations and Performance Guide, which is honestly the best reading on Analysis Services I’ve ever done. A major area Thomas Kejser and Denny Lee focus on is performance requests and bottlenecks. When an MDX query is written and executed, it does a lot more than you’d tend to think. There are a lot of places it passes through before it just magically returns that result back to you on your screen. From a high level, the general flow from the client application to retrieval looks something like the below:

image 

 

In this post, I’m going to focus specifically on the formula engine. Over the next several days I plan on blogging about the others, so by the end of the series each piece will have it’s own post. The formula engine comprises of two parts: the query processor and the formula engine cache. The query processor is the piece that takes your request, generates an execution plan, finds the best place to find it, and returns the cell set. It’s very similar (ideologically, anyway) to the database engine query processor. However, unlike the database engine, the Analysis Services query processor has 2 types of execution plans that it uses: cell by cell or block mode computation. These are important as they have a determining factor on performance. As the storage engine doesn’t speak the same language, the query processor has to translate the request into something that it will recognize. If we think about how partitions and dimensions are actually stored on disk in the storage engine, this makes sense, as they’re organized into logical sub cube structures that the storage engine organizes.

 

As the query processor gets these responses back from the storage engine, it organizes and stores them in the query processor cache (also known as the formula engine cache). A handy tip that you can do to improve performance is do something called “warm the cache” where you warm this formula engine cache as the cube processes. See this post by the one and only Chris Webb for more information on that here. As Analysis Services technically has 2 locations, the formula engine and the storage engine, both caches could theoretically be warmed to improve performance, but that’s another post for another day.

 

Furthermore, the query processor cache has 3 distinct contexts that it uses to determine how often that value can be used in the cube, which are query, session, or global. It’s easiest to explain these via MDX examples.

 

Query context caches apply only to the duration of that query. This is very similar to a query scoped variable in a tsql query. Using an MDX function like WITH Member, where you are creating a calculation only for that specific query, is a query context example.

Session context applies to the length of your session. MDX queries like ALTER CUBE, CREATE MEMBER, or CREATE SET are session context scoped and apply for the duration of your connection.

Global context applies to your cube at a much larger level. Unlike the other two, which apply only to the one user who is writing the statement or executing the query, global caches can be used by multiple users, under the condition that the users share the same security role. Furthermore, the Global cache context is completely disabled for the entire cube if cell security, or functions like UserName, any of the “StrTo” functions, or LOOKUPCUBE functions exist anywhere in the cube.

 

Now, clearing up cell by cell versus block mode. Cell by cell mode (also called naive mode) are evaluated in a turn by turn style approach. What this means is that for each record that needs returned under your attribute hierarchy, the query processor looks at it, and returns every row, before it evaluates for what is not empty. If you have a cube that sparsely populated, this can lead to significant performance issues as it’s returning large amounts of values that are null (and will always be null). Additionally, if the data is populated in every intersection of the points, the engine is spending additional time identifying that same filter for every attribute, when it’s already a given what that filter is (year, for example)

 

Using subspace mode corrects this behavior by having the engine work its way down an execution tree identifying what needs to be filled in the first place. However, this can only be used in certain situations, especially given that Analysis Services, unlike SQL Server, will spend more time figuring out the execution plan than it will actually retrieving the data. 

Advertisements

SSAS Parent Child Dimensions Part 2: The Performance

Previously I’ve blogged about Parent Child dimensions in Analysis Services, and what exactly they are and how to configure them (here). However, parent-child dimensions have 1 huge issue: they don’t scale. Literally. The reason why is that in parent child dimensions, aggregations are only created for the key attribute. Everything in between is just calculated at run-time. This means that the larger your parent-child dimension gets, the bigger your performance issue gets along with it. There is a tool available on codeplex called the SSAS Parent Child Naturalizer (http://pcdimnaturalize.codeplex.com/#SQLView). This tool is super easy to use, and gives you the ability to “naturalize” out your parent child dimension (you can also naturalize this yourself if you don’t want to use the tool, but IMO this way is fast and efficient and gives you more time to focus on other development things as well). Instead of having 3 columns (ParentKey, ChildKey, and Name), this tool creates a separate column for each level that exists. There is a link on the site on how to use the tool, it’s pretty straightforward. Instead of walking through that here when it’s already readily available, I’m going to walk through the why you would want to use this tool. Specifically, how performance changes when the parent child dimension is naturalized.

 

I created an AS database that has 2 dimensions in it. The original Employee dimension is the straight employee dimension out of Adventure Works, with the parent child relationship. The second dimension is my modified employee dimension after using the SSAS Parent Child Naturalizer on CodePlex. Below is a screenshot after I initially just process them. Notice the difference in number of files (the regular parent child dimension is on the left, the naturalized hierarchy is on the right):

image

 

62 vs 242!? Wow, that’ is a BIG difference. Why the huge difference? By ordering the files by type, we can see how many of each type of file AS is creating as it’s processing. As an example, let’s look at just the DATA file type that exists. Prior to naturalizing the dimension, only 1 data file type existed, for the EmployeeKey (EmployeeKey.fact.data). After we naturalized it, we now have 8 different data files, one that exists for each level of our hierarchy. This proves my statement above how Analysis Services only creates the aggregation for the key level attribute. This same performance boost is carried through for every kind of AS file, for each level of the hierarchy.

 

Essentially, what we can learn from all of this is that natively parent child hierarchies perform very similarly on disk to those of unnatural hierarchies, which are terrible in terms of performance. By using the naturalizer tool we can turn this performance frown upside down (ooh, that’s super cheesy, even for me)!!

SSAS: There are currently no logon servers available to process the request

Today I went to process my Analysis Services dimension so that I could examine the differences between 2 things. When I went to process it, my processing failed with this error message:

 

image

 

The key part of this error seemed to be the section highlighted: “The following system error occurred:  There are currently no logon servers available to service the logon request.” That’s….odd. It’s the Adventure Works database on my machine, on a database with only 1 dimension. Then I realized that my project was in SSDT 2010 but the Analysis Services solution I was pointing to was 2008R2 (whoops). I went into the data source, changed the Provider from SQL Server Native Client 11.0 to SQL Server Native Client 10.0, and boom, it processes fine. After that I went back in and changed the provider back to SQL Server Native Client 11.0, and it still processed successfully. Seems like a bug, I’ve logged an issue on Connect today. The really interesting part about this error is that it actually has nothing to do with the server not being able to handle the logon request.

 

Also, instead of changing the provider I could have tried changing the source to point to a 2012 instance, but I didn’t have a 2012 multidimensional instance installed on the machine I was working on the time.

SSAS Parent Child Dimensions Part 1: The Properties

I’ve saved the best for last, so they say. Well, depending on how you look at it. Over the last week, I’ve blogged about all of the other sections in Analysis Services dimensions except for parent-child, so now it’s time to finish off our series. Parent-child dimensions in Analysis are equipped to handle the ragged and unbalanced hierarchies that the world gives us.You can get decent performance out of them, name your different levels, customize your calculations with unary operators, the works. Before we get into all that stuff, though, let’s look at just a basic parent-child hierarchy and it’s properties:

 

Sidebar: This is the employee dimension in AdventureWorks2012Multidimensional, if you want to follow along.

 

The parent child section of the dimension properties are shorter than some of the others (mainly advanced), and contain the following 5 properties:

image_thumb[3]

MembersWithData: There are 2 options for this property: NonLeafDataVisible and NonLeafDataHidden. NonLeafDataVisible means that if a member is also it’s parent, make it visible. NonLeafDataHidden is if it’s also it’s parent, make it invisible. A quick example shows it better, I think. Below is a snapshot of our employees dimension with the NonLeafDataVisible set. Notice that “Amy E. Alberts” is the manager of this department, but also appears under her employees:

image

 

However, if we set the property to “NonLeafDataHidden”, notice that she now only appears as she should, as the manager of the department:

image

 

MembersWithDataCaption: This setting will only have an affect if the MembersWithData property is set to NonLeafDataVisible. It tells Analysis Services what to name the member of the parent if it appears with its’ children in the hierarchy. It’s a free text field, and you can just type what you need. For example, typing in *(Parent) makes the following appear to our example above of “Amy E. Alberts”:

image

 

NamingTemplate: Now this property is pretty cool. Did you know that you can actually name the different levels of a parent-child hierarchy? Instead of the default “level 1, level 2”, etc., you can create your own naming template for the hierarchy. For example, if you click on the ellipses at the end of the property:

image

The default looks like:

image

However, you can rename each level of the hierarchy to call it what you want. This lets you write MDX against it easier, along with a host of other benefits. Below is the same AdventureWorks employee hierarchy, with new renamed levels:

image

 

RootMemberIf: This property tells AS when to consider a member a parent of it’s hierarchy. The choices are:

ParentIsBlank: Only keys in the dimension that have a parent key of null, zero, or empty string are the top most level in your parent-child hierarchy

ParentIsMissing: Only keys in the dimension where no key can be found are the top most level in your parent-child hierarchy

ParentIsSelf: Only keys in the dimension where the key is equal to itself are the top most level in your parent-child hierarchy

By default it’s set to “ParentIsBlankSelforMissing”,” which means essentially “any of the above”

UnaryOperatorColumn: This property tells AS what column in the dimension is the unary operator of the dimension. For those not familiar with what the unary column is/does, in your dimension you can specify a different mathematical calculation to be performed at each member of the hierarchy. It’s most commonly used in a budget/expense scenario. For example, I can have a column that says “if my attribute is income related, add it, otherwise do something else to it” (based on what you specify)

Hierarchies in SSAS Multidimensional: What’s really happening up (or down) in there?

  A few days ago I got an interesting question. The question was “When you build an hierarchy in SSAS, what’s actually happening to it on disk?” As in, how does the file system actually place it in files so it can easily identify the hierarchy and keep performance up? Sure, dragging all of those attribute relationships around in the dimension GUI is “fun”, but what does it do, and what do you gain? To answer these questions, let’s look at the process from beginning to end. For the below example, I’m just using the date dimension that’s available in Adventure Works.

 

Open up your date dimension and create your hierarchy. In this example, our hierarchy has 4 levels: Year, Semester, Quarter, and Month.

image

 

Make sure you set your key columns to use collections so your attributes stay unique, and set your attribute relationships properly. If you do not do this, you wind up creating what is called an “unnatural” hierarchy, which acts completely different behind the scenes than the natural hierarchy we’re creating here. An easy way to identify an unnatural hierarchy while you’re creating an SSAS dimension is the hierarchy has a yellow warning symbol above it. A quick definition of the 2:

  • An unnatural hierarchy is where the attributes don’t “really” belong with each other, except for reporting. For example, Marital Status –> Education. Therefore, your attribute relationships tab would have no relationship between these attributes, except at the key level.
  • A natural (user-defined) hierarchy is where each attribute is a property is a member property of the attribute above it. For example, Date (Year –> Quarter –> Month –> Day, or Country –> State –> City). Since each level is a member  property, we can use the attribute relationships to define cardinality of one or many appropriately, as in the below:

image

 

And process it.

 

image

 

Now, we’re ready to talk about the file structure. When building a natural hierarchy, SSAS creates an actual physical file “materializes the hierarchy” within the file system that it can use to reference the hierarchy level extremely quickly, in something called an AHSTORE file. For example, this is what our same calendar hierarchy above actually looks like in the file system:

 

image

 

You can in the screenshot where AS created a hash file for each level (English Month Name, Calendar Quarter, Calendar Semester, and Year) so it could traverse the hierarchy faster.

 

Conversely, with an unnatural hierarchy, no such hash ahstore files are created, leading to bad performance and difficulty navigating your dimension through MDX queries. Here is a screenshot of the customer dimension with an unnatural hierarchy between Marital Status and Education:

 

image

In conclusion of this post, the importance of creating natural hierarchies and avoiding unnatural hierarchies cannot be understated.

SSAS Dimension Properties : Misc and Source

Previously I’ve blogged about both Basic dimension properties (here) and Advanced dimension properties (here), so today I’d like to finish off this initial set by blogging about the misc and source sections of the properties of an attribute in a dimension. The source section, above everything else, is where you define your key for that attribute and the name and value properties, so it’s definitely one of, if not the, most used property sections.

 

Source

 

image

 

KeyColumns

Specifies what is the key of that attribute. This is the table in the source that binds to that attribute.

If you don’t put anything in the name column, AS uses the key column to as the name column as well.

NameColumn

This property determines the name that is displayed for the attribute. In cases where your key property is set to more than 1 column, you must put something in this field. Another nice thing about this field is that you can set say, your surrogate key attribute with the name column in your dimension and have it process on the key but display the name, so you can combine 2 attributes into 1 and improve performance while you’re doing it.

ValueColumn

This property determines what is actually displayed on the screen. If you have something in the name property, that value will be used by default, otherwise the key column is used. Or you can populate the value property with a value of your choosing if say, you want the key value to display or an entirely different attribute.

 

CustomRollupColumn & CustomRollupPropertiesColumn

These 2 properties are most commonly used in parent-child hierarchies and can be used if you want to apply a custom formula to your attribute as it’s being processed. You can also use this to track something like depreciation, as an example. There is an excellent post on how to set this up from Hennie located here.

 

Misc

image

 

AttributeHierarchyOrdered

Specifies whether or not to order this attribute. When you select “True” (also the default), Analysis Services creates an index during processing that it uses to ensure that members are displayed in order. (which is also defined by your key attribute property, see above). To save processing time, for certain attributes you can set this to “False” which tells Analysis Services it does not need to be displayed in a certain order. Ironically, the surrogate key of the dimension can be set to “False,” because you hardly, if ever, care in what order a key is displayed.

GroupingBehavior

This property only has 2 choices: EncourageGrouping or DiscourageGrouping. Essentially, it tells the client application if members of that attribute can be grouped. Something like Gender, or education, can be grouped as the same property could potentially apply to lots of different rows (there are a lot of individuals with Bachelor’s degrees, for example). Likewise, something that will always have a 1:1 relationship with the key attribute isn’t a good candidate for grouping (phone number, name, etc.)

InstanceSelection

You can use this property to tell client applications (namely, Excel), how to display the list of members in each attribute hierarchy. You have a couple of choices here:

  • None – This is the default. Like none in other cases, pretty much says do whatever you feel is best, Mr. Client Application
  • DropDown – there aren’t many records in here, so you can always display all of the items in a drop down list
  • List – too many for a drop-down list, but not so many that it’s unwieldy.
  • FilteredList – there are a lot of records, so it always needs to be filtered before it can be displayed.
  • MandatoryFilter – there are tons of records, the list must always be filtered before it can be displayed (think millions)

MemberNamesUnique

This property tells AS if each member in this attribute hierarchy must be unique.

 

 

By the way, I haven’t forgotten about the parent-child section, but in my opinion it deserves it’s own blog post, so that should be coming soon (hopefully later this week). 🙂

SSAS Dimension Properties: Advanced folder

Yesterday I blogged about the properties of a dimension in the basic folder, and what each property means. Today I’m going to cover the advanced folder. In the advanced folder, there are 13 different properties, all of which do (in my opinion) some pretty cool stuff.

 

Here’s screenshot of our properties window:

 

image

 

AttributeHierarchyDisplayFolder

This specifies what the folder is in the client tool that the attribute displays under. If you leave it blank it shows up directly under the dimension in SSMS, or under a folder called “More Fields” in Excel. Changing this setting to a folder you type in controls how you see it. For example, I have grouped all of the attributes in Spanish in the Adventure Works Customers dimension into a folder called “Spanish Fields”, this is what it looks like:

image

 

In Excel, it would look very similar, except the rest of the fields would show up under the “More Fields” folder mentioned above.

 

BTW, this same concept can apply to nesting folders. For example, if I wanted to nest “Spanish” under a folder called “Education Types”, in the property I would just specify my folder with a backslash, as Education Types\Spanish. My folder structure would then look like:

image

 

AttributeHierarchyEnabled

This attribute specifies whether or not SSAS even uses the attribute at all. Technically, every attribute in a dimension is an hierarchy, just at one level (Dimension level –> Attribute). If you set this to False, the attribute isn’t visible, can’t be used in a user-defined hierarchy, can’t be referenced by MDX statements, well you get the gist. It grays out in the dimension designer screen and you can’t use it at all.

 

AttributeHierarchyOptimizedState

This property has 2 options: FullyOptimized and NotOptimized. FullyOptimized is the default, and means that SSAS processes the attribute as normal. It builds indexes on disk for that attribute to keep query speeds fast. NotOptimized means that no such indexes are created, and therefore querying speeds may suffer. Only use NotOptimized if you are sure that a) it is not a queried field, and b) that attribute does not have a high cardinality in comparison to the rest of your attributes. For more information, see section 2.4.3.1.1 in the SSAS Multidimensional Performance and Operations Guide.

AttributeHierarchyVisible

This property is either True or False. True means that it is visible to all client applications, and can be sliced and diced on.

False means that the attribute isn’t visible to client applications, but can be referenced via MDX statement, actions, calculations, etc. This is handy if you have an attribute you’d like to use to order something, or just want to make available via a tooltip or drilldown or something. Think something like “Customer Address.” Do users really want a pivot table with 2 million addresses on the left of a pivot table report? Or do they want to double-click on city and see the list of addresses in that city in the detail level?

DefaultMember

This is the default member of the dimension. If this is left blank, the default member of the attribute is the all level, unless the “IsAggregatable” property below is set to false, in which case there is no All member. Then the default member is set to the first member of the attribute. SSAS uses the default member to evaluate an expression if that attribute isn’t used in a query.

DiscretizationBucketCount and DiscretizationMethod

Ok, I’m grouping these 2 together as I recently wrote a separate blog post about them indicating what they are and what they do. For more information, see here.

 

EstimatedCount

This field is used for aggregation designs. You can either enter the estimated count manually, or when using the aggregation design wizard in the cube, it will automatically count the attributes and enter it for you. This is especially handy in environments where development only contains a subset of your production data. You can enter in the actual production counts and when SSAS is aggregating, it will aggregate at the higher number you input for production, keeping your performance up.

IsAggregatable

This property specifies whether or not to create an <All> level for that attribute. There are certain situations where it doesn’t make sense to aggregate every member of the attribute to an All level, so you can turn this off. However, if you set this property equal to FALSE, it is highly recommended that you specify a default member. 

OrderBy

This property has 4 options: Name, Key, AttributeKey, and AttributeName. It sets how to order the leaf members that are contained within the attribute when they are displayed by a client application. The descriptions below are straight out of BOL:

  • NameColumn     Order by the value of the NameColumn property.

  • Key    Order by the value of the key column of the attribute member.

  • AttributeKey    Order by the value of the member key of a specified attribute, which must have an attribute relationship to the attribute.

  • AttributeName    Order by the value of the member name of a specified attribute, which must have an attribute relationship to the attribute.

OrderByAttribute

If you select AttributeKey or AttributeName, you must fill in this field. This tells the OrderBy field what to order this attribute by, if those fields were selected.