Previously I’ve blogged about Parent Child dimensions in Analysis Services, and what exactly they are and how to configure them (here). However, parent-child dimensions have 1 huge issue: they don’t scale. Literally. The reason why is that in parent child dimensions, aggregations are only created for the key attribute. Everything in between is just calculated at run-time. This means that the larger your parent-child dimension gets, the bigger your performance issue gets along with it. There is a tool available on codeplex called the SSAS Parent Child Naturalizer (http://pcdimnaturalize.codeplex.com/#SQLView). This tool is super easy to use, and gives you the ability to “naturalize” out your parent child dimension (you can also naturalize this yourself if you don’t want to use the tool, but IMO this way is fast and efficient and gives you more time to focus on other development things as well). Instead of having 3 columns (ParentKey, ChildKey, and Name), this tool creates a separate column for each level that exists. There is a link on the site on how to use the tool, it’s pretty straightforward. Instead of walking through that here when it’s already readily available, I’m going to walk through the why you would want to use this tool. Specifically, how performance changes when the parent child dimension is naturalized.
I created an AS database that has 2 dimensions in it. The original Employee dimension is the straight employee dimension out of Adventure Works, with the parent child relationship. The second dimension is my modified employee dimension after using the SSAS Parent Child Naturalizer on CodePlex. Below is a screenshot after I initially just process them. Notice the difference in number of files (the regular parent child dimension is on the left, the naturalized hierarchy is on the right):
62 vs 242!? Wow, that’ is a BIG difference. Why the huge difference? By ordering the files by type, we can see how many of each type of file AS is creating as it’s processing. As an example, let’s look at just the DATA file type that exists. Prior to naturalizing the dimension, only 1 data file type existed, for the EmployeeKey (EmployeeKey.fact.data). After we naturalized it, we now have 8 different data files, one that exists for each level of our hierarchy. This proves my statement above how Analysis Services only creates the aggregation for the key level attribute. This same performance boost is carried through for every kind of AS file, for each level of the hierarchy.
Essentially, what we can learn from all of this is that natively parent child hierarchies perform very similarly on disk to those of unnatural hierarchies, which are terrible in terms of performance. By using the naturalizer tool we can turn this performance frown upside down (ooh, that’s super cheesy, even for me)!!