SSAS: Currency Conversion

A sometimes challenging business requirement is the concept of global transactions and sales. As a company, if you have sales in, say, the United States, Canada, Germany, and Brazil, how do you report all of those sales together? There are 3 different ways the business could potentially want those reports. The first is that all reporting is done in 1 standardized currency; using the 4 countries above, let’s say that’s US Dollars. The second is that data is stored in the native currency and reporting is done in the native currency. The third is that everything is converted to one currency when it is stored, and reporting is done in the native currency. Fortunately, with SSAS it’s easy to implement a solution that provides reporting across these currency scenarios, and SQL Server even has a money data type to make it easier to store this data. Each of the 3 scenarios is very similar to implement from an Analysis Services wizard standpoint. As is pointed out in the book Expert Cube Development (by Marco Russo, Chris Webb, and Alberto Ferrari), currency conversion is a business decision, not a technological one. This means that the person determining the type of implementation (one-to-many, many-to-many, or many-to-one) should always be a business user. For the rest of this blog post I’ll be walking through implementing currency conversion in AS MD.

I am going to assume we are reporting off of a data model structured like the one below (shamelessly lifted out of Adventure Works), which lends itself well to currency conversion:

image

I’ve created my data source, my data source view, and 2 dimensions (currency and date). I’ve also created a small cube containing the sales amount from the FactInternetSales table and the rates from FactCurrencyRate. In this model, we are collecting data in multiple currencies and want to convert it to a base currency of USD for reporting purposes. Let’s create our date dimension first:

image

Once you’ve created the attributes and hierarchies for the date dimension, let’s tell AS MD that this is our date dimension by clicking on the Add Business Intelligence wizard at the top left of the dimension designer and selecting Define Dimension Intelligence. On the next screen, select Time, then map the attributes of your date dimension to the corresponding attribute types.

Once that’s complete, create a currency dimension. The currency dimension should consist of at least the currency key and name. For more efficient storage on disk, use the currency surrogate key as the key column and the currency name as the name column. Since there is a one-to-one relationship between the 2 in the dimension, this makes it much faster for AS MD to run through the indexes at query time. My completed currency dimension looks like the below. Make sure you go into the Define Dimension Intelligence wizard again, but this time select Currency, and map the currency name to the currency attribute. Also in the currency dimension, select the currency attribute you created, and set the default member to your base currency (USD). The expression should look like [Currency].[Currency].&[100]. You also want to set the IsAggregatable property to false; this dimension shouldn’t have an All member, as summing across currencies doesn’t make any sense.

image

Now we’re ready to create a cube. Create the cube, selecting SUM for the Internet Sales measures. In the cube, select the currency rate measure group and change the aggregation of the EndOfDayRate and AverageRate measures to LastNonEmpty. Then move over to the Dimension Usage tab and ensure that the currency rate measure group is related to the date dimension, along with the Internet Sales measure group. My dimension usage screen looks like:

image

Now we’ll go back to the Cube Structure tab and add some currency conversion. From the Add Business Intelligence icon at the top left, select Define Currency Conversion. On the next screen, we want to select our currency rate table, the currency we’re pivoting on (the one you want to be the reporting standard in a many-to-one implementation), and whether the rates are stored as foreign currency per dollar or dollars per foreign currency.

image

Click Next, and then select all of the measures that will be affected by the currency conversion:

image

On the next screen, select whether the conversion is one-to-many, many-to-many, or many-to-one. As many-to-one is the most common type of currency conversion requested, we’re going to select that here and click Next.

image

On the last screen, select Currency\Currency to specify that the fact table references the currency dimension directly, and click Finish.

image

Click Finish again, and the wizard will run through. If you go to the calculations tab, you’ll see the script that the wizard generated:

// <Currency conversion>
        // Currency conversion wizard generated script.   
        // Currency conversion type: ManyToOne   
        // Selected members to be converted: Sales Amount   
        // Please be aware that any changes that you decide to make to it may be overridden the next time you run the Currency Conversion wizard again.   
        // All currency conversion formulas are calculated for the pivot currency and at leaf of the time dimension   
        Scope ( { Measures.[Sales Amount]} );
            Scope( Leaves([Date]) ,
                [Reporting Currency].[US Dollar],   
                Leaves([Currency]));

              // Convert Local value into Pivot currency for selected Measures that must be converted with Measure rate [End Of Day Rate]
              Scope( { Measures.[Sales Amount]} );
               This = [Reporting Currency].[Local] / Measures.[End Of Day Rate];
              End Scope;

            End Scope; // Leaves of time, source currency and pivot currency   
        End Scope; // Measures
        // End of the currency conversion wizard generated script   
    // </Currency conversion>

Also, if you go back to the Dimension Usage tab, you’ll see that the wizard has created a new dimension for the reporting currency. This is the currency the user will select to specify which currency the results are reported in. Process the cube, and we’re good to go!
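To sanity check the conversion, you can compare the converted value against the local value with a quick MDX query like the sketch below. The Reporting Currency member names come from the wizard-generated script above; the cube name is an assumption, so substitute your own.

SELECT { [Measures].[Sales Amount] } ON COLUMNS,
       { [Reporting Currency].[US Dollar],      -- converted to the pivot currency
         [Reporting Currency].[Local] } ON ROWS -- left in the transaction currency
FROM [Adventure Works]                          -- substitute your cube name here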

Happy cubing!

SSAS: DistinctCount

As I’m sure all experienced SSAS developers have noticed, when you create distinct count measures in SSAS multidimensional, you are required to put them in a separate measure group so that Analysis Services can handle these calculations more efficiently. Distinct counts are one of the most demanding requests that can be placed on a MOLAP database, driven by the large number of dimensional attributes that typically exist in a cube. I was recently reading the Analysis Services Distinct Count Optimization whitepaper by Denny Lee, and wanted to see for myself how changing the partitioning strategy for a distinct count measure group could affect performance, both from a processing and a querying point of view. I enlarged the FactInternetSales table in AdventureWorks to ~1 million records, took the existing Adventure Works multidimensional solution, and added my new big internet sales table into it. In full disclosure, the machine I’m working on is an Intel i7, dual-core, with 16 GB of RAM and a 500 GB SSD. I decided to run 3 tests: the first with the initial default partition, the second with 4 partitions, one for each year, and the third with 4 partitions covering equal-sized ranges of the table. Onwards!

To start, I’ve turned Profiler on against my target server and created 2 measure groups against my FactInternetSalesBig table, one calculating the sum of the sales amount and one doing a distinct count on CustomerKey, representing the distinct number of customers. Below is a screenshot of the partition screen:

image

One partition results in the following query being sent against the database during processing, with an ORDER BY on the distinct count column, as we’d expect.

SELECT [dbo_FactInternetSalesBig].[CustomerKey] AS [dbo_FactInternetSalesBigCustomerKey0_0],
       [dbo_FactInternetSalesBig].[OrderDateKey] AS [dbo_FactInternetSalesBigOrderDateKey0_1],
       [dbo_FactInternetSalesBig].[DueDateKey] AS [dbo_FactInternetSalesBigDueDateKey0_2],
       [dbo_FactInternetSalesBig].[ShipDateKey] AS [dbo_FactInternetSalesBigShipDateKey0_3]
FROM [dbo].[FactInternetSalesBig] AS [dbo_FactInternetSalesBig]
ORDER BY [dbo_FactInternetSalesBig].[CustomerKey] ASC

Additionally, this took 11622 milliseconds to process. What happens when we partition by year?

image
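If you’re recreating the year-based scheme, the query bindings behind those 4 partitions would look something like the sketch below. The yyyymmdd-style OrderDateKey values follow the AdventureWorks convention, but the exact ranges here are illustrative.

-- Hypothetical query bindings for the year-based partitions
SELECT * FROM dbo.FactInternetSalesBig WHERE OrderDateKey BETWEEN 20050101 AND 20051231  -- 2005
SELECT * FROM dbo.FactInternetSalesBig WHERE OrderDateKey BETWEEN 20060101 AND 20061231  -- 2006
SELECT * FROM dbo.FactInternetSalesBig WHERE OrderDateKey BETWEEN 20070101 AND 20071231  -- 2007
SELECT * FROM dbo.FactInternetSalesBig WHERE OrderDateKey BETWEEN 20080101 AND 20081231  -- 2008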

This time, Profiler shows us the 4 separate queries being fired off against the relational database, each with its own ORDER BY clause. The longest partition, 2008, took 5111 milliseconds to process.

image

Our issue here is that the partitions Analysis Services is reading from are unequal in size, as illustrated by a graphic similar to the below:

image

pulled from the SSAS Distinct Count Optimizations Guide, written by Denny Lee

 

Grouping by year causes wide differences between the partition files stored on disk, causing Analysis Services to read many more records for, say, 2007 than 2005. We want to take this same data set and group it into equal-sized buckets, in order to make it easier for Analysis Services to do the calculations at query time. Our ideal layout should look like the below (adapted from the same source as the image above). Creating non-intersecting partitions also ensures that AS only needs to scan each partition once.

image

How can we ensure that partitions are of equal size? If we return to the original table that generates the fact, we can run a query like the below, which tells us the min and max of the distinct count key, as well as the size of the key range that should go into each group:

SELECT
COUNT(DISTINCT f.CustomerKey) as CustomerKeyCount,  -- distinct values of the distinct count key
MIN(f.CustomerKey) as MinCustomerKey,
MAX(f.CustomerKey) as MaxCustomerKey,
(MAX(f.CustomerKey) - MIN(f.CustomerKey)) / 4 as CustomerKeySlice  -- width of the key range for each partition
FROM FactInternetSalesBig f

This query yields the following results against my big internet sales table:

image

In the fourth column of the select, I’ve hardcoded the 4, stating that I want to create 4 partitions. I’m now going to create 4 equal-sized partitions in AS by defining each range, starting with the min value and incrementing by 4620 for each partition (a sketch of the resulting query bindings follows the graphic below). Doing this takes 4883 milliseconds to process. The results can be represented via the following graphic:

image
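For illustration, the 4 equal-range query bindings would look something like the following sketch. The starting value of 11000 is the MinCustomerKey from a stock AdventureWorks load and is an assumption here; substitute the values returned by the query above.

-- Hypothetical equal-range partition query bindings (MinCustomerKey = 11000, slice = 4620)
SELECT * FROM dbo.FactInternetSalesBig WHERE CustomerKey >= 11000 AND CustomerKey < 15620  -- partition 1
SELECT * FROM dbo.FactInternetSalesBig WHERE CustomerKey >= 15620 AND CustomerKey < 20240  -- partition 2
SELECT * FROM dbo.FactInternetSalesBig WHERE CustomerKey >= 20240 AND CustomerKey < 24860  -- partition 3
SELECT * FROM dbo.FactInternetSalesBig WHERE CustomerKey >= 24860                          -- partition 4, open-ended to catch the max key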

So, creating 4 equal partitions helps us scale our distinct count queries further and gives AS a really nice performance boost when these queries come into play. Thanks Denny!

SSAS: Synchronize

As SSAS databases get larger and larger, deployment becomes a bigger and bigger problem. How can we ensure that databases are easily and quickly deployed from development to production with minimal downtime? For some environments, especially at global companies, SLAs dictate that the Analysis Services database must be available for querying 24×7, with only seconds of downtime. When it comes to deployment, there are 4 options: backing up and restoring the development database, using the SSAS Deployment Wizard, scripting the database with an XMLA Alter command, or using Synchronize. Backup and restore is slow and cumbersome; using the wizard is, well, using the wizard; and scripting the database with an XMLA command is a decent deployment strategy, although once you’ve run your script, the database needs to be completely re-processed before it can be queried again. In comes Synchronize to the rescue! The beauty of Synchronize is that while Analysis Services is doing its sync, users can continue to query and use the destination database. When the synchronization is complete, Analysis Services will automatically switch users over to the newly copied database, putting downtime for deployment in the millisecond range. This can be represented via the following diagram:

image

This gives us a high-performing, scalable solution. Even better, we can use the Synchronize command to synchronize production databases across instances in a Network Load Balancing cluster, and increase the resources available to Analysis Services as needed. For anyone unfamiliar with Network Load Balancing, a switch distributes the workload across multiple smaller servers (scale out), similar to the below diagram:

 image

Additional information on scaling out an Analysis Services solution can be found here.

So, back to the reason for today’s post. Let’s say that we want to synchronize our Analysis Services database across from our development to our production machine. I have two Analysis Services instances on my machine, both 2012 SP1. In Management Studio, right click on the databases folder in the Analysis Services target instance and select Synchronize…

image

After the wizard introductory page, select the source and destination servers. Note that you also have the option to change the storage location on the destination server. (Remember, fast drives are better for heavily queried stuff, slow drives for older, less queried traffic). If the source server uses remote partitions, the destination server will point to the same remote partitions. For more information on remote partitions, see my previous post, Remote Partitions in SSAS.

image

On the next screen, select the partition locations and click Next. Then choose whether roles and members should be synchronized as well. If you are creating your roles initially you may want to copy your roles and members, but once you’ve been in production for a while there will be a disconnect, as users who have access to the production database were never granted access on the development server. We’re going to set it to Copy All for this example. Notice the last checkbox, “Use compression when synchronizing databases”. We want to make sure this is checked, in order to keep transfer time to a minimum. Click Next, and then select whether to synchronize now or to save the script to a file. If you save it as a file, you can take the XMLA command and make it part of your nightly job run. We’ll keep it as Synchronize now, and click Next.

image

The summary screen will appear and we’ll click Finish. That’s it! You’ve successfully synchronized your two instances! I wanted to go into a bit more detail, so I turned Profiler on against my target instance before clicking Finish so I could capture the commands being sent. Also, if you just want the synchronize script for your job, it looks like the below:

image
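If the screenshot is hard to read, the Synchronize command the wizard generates has roughly the following shape (a sketch: the server name, database ID, and provider string are placeholders, and the script is executed against the target instance):

<Synchronize xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Source>
    <ConnectionString>Provider=MSOLAP;Data Source=SourceServer\DEV;Integrated Security=SSPI</ConnectionString>
    <Object>
      <DatabaseID>YourDatabaseID</DatabaseID>
    </Object>
  </Source>
  <SynchronizeSecurity>CopyAll</SynchronizeSecurity>  <!-- or SkipMembership / IgnoreSecurity -->
  <ApplyCompression>true</ApplyCompression>           <!-- the compression checkbox from the wizard -->
</Synchronize>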

Once you click Finish, you’ll see the status bar tick across and do the synchronization between the two servers. My database was pretty small (93 MB), so it only took a few seconds. Below is a screenshot of the profiler events that I captured:

image

Selecting one of the Progress Report events that was logged, we can see that all Analysis Services did was copy the files from the source to the target. No other commands are issued (this line was repeated for every object throughout the database).

image

Keep in mind that this was an initial deployment. Running through the wizard again, this time the profiler shows us some profoundly different results:

image

It seems that when you run the Synchronize command, any subsequent runs against an existing database are compared to the existing metadata for differences (which makes sense, and is the behavior you would expect). As databases grow larger, the impact of this will magnify and slow performance down. For 90% of the solutions that I see on engagements, Synchronize works like a charm. For larger Analysis Services databases (over a hundred GB), performance may be improved by using the Robocopy method outlined in this best practices article from Microsoft. Keep in mind that the Robocopy method demands that the service be stopped/restarted, and writeback and ROLAP support can’t be used. If you need those, Synchronize will be your best option.

SSAS: Remote Partitions

UPDATE 5/29/2014: I noticed today on MSDN that Microsoft has posted that remote partitions will be deprecated in a future release of SQL Server. So while this is a great tip for performance tuning now, know that in the future Microsoft is planning on discontinuing this functionality (although personally I would disagree with that decision, unless we’re getting something better for performance tuning!). For more information, see here.

 

Last week I blogged about setting the partition slice in SSAS and how it can help with performance (see here). Sometimes, for scalability reasons, it may be helpful to store partitions on a different server or instance where you can take advantage of additional CPU and memory to gain faster performance. This functionality does exist, in a feature referred to as remote partitions. Remote partitions reside on a server different from the main cube. For simpler administration, remote partitions are still managed on the instance where the parent cube is defined. One key caveat to this awesome functionality is that both servers must be running the same version and service pack level (in order to ensure compatibility). To start, you will need at least 2 instances of SSAS in multidimensional mode. Remote partitions are much easier to set up than you might think, and if you’ve already created partitions you’ve already seen most of the configuration screens.

First, let’s start out with why you would want to create remote partitions. Remote partitions allow the remote instance (the secondary database) to take on most of the memory and CPU utilization, allowing the master (primary) database to keep its resources free. As per BOL:

When remote partitions are included in a measure group, the memory and CPU utilization of the cube is distributed across all the partitions in the measure group. For example, when a remote partition is processed, either alone or as part of parent cube processing, most of the memory and CPU utilization for that partition occurs on the remote instance of Analysis Services.

Additionally, this allows us to scale out Analysis Services across multiple instances. This can be represented via the below diagram:

image

Since “the memory and CPU utilization of the cube is distributed across all of the partitions”, we can have multiple AS servers working in parallel to answer a query as needed. Quite handy! For example, if I query the Adventure Works cube for Internet Gross Profit Amount by country by year, it takes approximately 96 milliseconds on my local SSD:

SELECT NON EMPTY Hierarchize({DrilldownLevel({[Date].[Calendar].[All Periods]},,,INCLUDE_CALC_MEMBERS)})
ON COLUMNS ,
NON EMPTY Hierarchize({DrilldownLevel({DrilldownLevel({[Customer].[Customer Geography].[All Customers]},,,INCLUDE_CALC_MEMBERS)},
[Customer].[Customer Geography].[Country],INCLUDE_CALC_MEMBERS)})
DIMENSION PROPERTIES PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME,[Customer].[Customer Geography].[State-Province].[Country] ON ROWS
FROM [Adventure Works]
WHERE ([Measures].[Internet Gross Profit])

Moving the 2005, 2006, and 2007 partitions to a remote instance, this same query runs in about 46 milliseconds from a cold cache. So we gained around a 50% improvement! Nice! So how do we configure these remote partitions? For the remainder of this post, I’m going to walk through how to configure them. Referring to the diagram above, before we can store anything on a secondary database, we first need to create it. Open up SSDT, and create a new Analysis Services project. Give it a name, and create a data source pointing to your master Analysis Services database. You are going to want to use the Native OLE DB provider for Analysis Services. Deploy it to the secondary instance that you’re going to be using for your remote partition storage. On my laptop, I have an instance called .\REMOTE, which is where I’ll be deploying this to:

image  image

Now there are 2 server-level properties we need to adjust, on both the master and the secondary server. Click on the server name, and go to Properties. On the General page, you are looking for the Feature \ LinkFromOtherInstanceEnabled and Feature \ LinkToOtherInstanceEnabled properties. By default these are set to false; we want to change them to true. Once you finish the first server, make sure you change them on the second server as well.

image

Now we want to go back to the remote database that we just created. Right click on it, and select SCRIPT DATABASE AS –> ALTER TO –> NEW QUERY WINDOW. After the data source impersonation info tag closes, insert the following line:

<MasterDataSourceID>YourDatabaseNameHere</MasterDataSourceID>

So my complete script executing this against an AdventureWorks database I have on my local machine looks like the following:

image
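In case the screenshot above is hard to read, the overall shape of that ALTER script is roughly as follows. This is only a sketch: the database IDs and names are placeholders, and a real scripted database contains many more elements than shown; the one line added by hand is the MasterDataSourceID element.

<Alter AllowCreate="true" ObjectExpansion="ExpandFull"
       xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>AdventureWorksRemote</DatabaseID>
  </Object>
  <ObjectDefinition>
    <Database>
      <ID>AdventureWorksRemote</ID>
      <Name>AdventureWorksRemote</Name>
      <!-- ...other scripted database properties... -->
      <DataSourceImpersonationInfo>
        <ImpersonationMode>ImpersonateServiceAccount</ImpersonationMode>
      </DataSourceImpersonationInfo>
      <MasterDataSourceID>YourDatabaseNameHere</MasterDataSourceID>  <!-- the line added by hand -->
    </Database>
  </ObjectDefinition>
</Alter>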

Run it, and you should get your success XML return. Finally, we’re ready to change our partition location in our primary database. Open up SSDT, and open the project for your original solution (with all your dimensions, cubes, and other objects). Open up the cube, and go to the Partitions tab. If you have an existing partition that you want to move, click on it, go to Properties, and under the Configurable section select Storage Location and click the ellipsis:

image

Select “Remote Analysis Services instance”, and then select the database name. The first time you do it you will need to click “NEW”, and create an Analysis Services connection to your remote instance. Select whether the storage location should be at the default or in another location. It’s generally a good rule of thumb that your partitions should be on faster drives the more current the data is. For example, if I have a 2013 partition, and then another partition for older data, it’s ok to have my older data partition on a slower, SATA drive, but my 2013 partition should be on something significantly quicker, as I’m accessing it more often. Remember that the goal here is to get speed & performance up as well. We are moving this to a separate instance in order to allow Analysis Services to compute millions of records over a different partition in addition to what the master server is doing. Moving all of this data and then sticking it on the old drive you found in the basement isn’t going to give you the lift you want. Click OK.

If you are creating a new partition, select New partition, and select the Measure group you wish to partition. Once you click Next, select whether the partition is query bound or table bound, and click Next. If you followed along in the last exercise the next screen will look quite familiar:

image

Click Next, and give your new partition a name. That’s all there is to it! Make sure you do a Process Full on your master database the first time after you initially configure this. Administration and management of the new remote partition is all done via the master database. You can scale this out to as many secondary databases as needed. Be careful with over-saturation, though; I didn’t test it, but I would imagine there is an ROI point similar to SSIS’s, where you reach a point of diminishing returns.

Happy tuning!

SSAS: Partition Slicing

In SSAS multidimensional there is a concept within creating partitions for your measure groups called slicing. Slicing can be thought of as the definition Analysis Services uses to find data within partitions at query time. We are all aware of the benefit that creating multiple partitions can have in improving Analysis Services multidimensional performance, but how can partition slices help us further? As per BOL, “Explicitly setting the Slice property can improve query performance by overriding default slices generated for MOLAP and HOLAP partitions”. This means that slicing partitions ourselves could have a profound impact on query performance.

Let’s take a look in more detail. Starting off, we’ll turn SQL Server Profiler on against our SSAS instance. For this post, I’m just going to use the Adventure Works 2012 cube. My partitions currently in my SSAS solution appear like the below for Internet Orders, partitioned by year:

image

 

Now let’s run an MDX query to show the problem:

SELECT [Measures].[Internet Order Count] ON 0,
    [Date].[Calendar].[Calendar Year].&[2008] ON 1 -- 2008
FROM [Adventure Works]

 

Running a profiler trace against this clearly shows our problem:

image

We are looking for data in one specific partition (2008), yet the storage engine thought that it needed to look through every partition to find the answer! In a large cube, the impact of this can be disastrous. Scanning every partition is very inefficient when we know from a logical standpoint that our partition strategy should answer the question.

At query time, SSAS internally reads the partition header files to determine which partition contains the data it is trying to read. Navigating in File Explorer to a cube deployed to an Analysis Services instance (the data folder is typically located at C:\Program Files\Microsoft SQL Server\MSAS11.MSSQLSERVER\OLAP\Data), under the database folder and then the <cube name>.cub folder, you’ll find all of the measure group partitions. Inside these folders, generally a few levels down, lives an info.*.xml file, which contains the information Analysis Services needs about what is in that partition. Frequently, it seems, the automatic slice detection available in MOLAP storage mode doesn’t narrow the query down to the partition where the data belongs, and ALL partitions are scanned. So how can we fix this issue? Fortunately, there is a property available in SSAS that tells the engine specifically what a partition is sliced by, giving us greater control over performance by targeting where query scans are sent during execution. Opening up our project in Data Tools, on the partition screen inside the cube, I am going to select each partition and set the partition Slice property:

image
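The Slice property takes an MDX expression identifying the member (or tuple) each partition contains. For the year-based Internet Orders partitions above, the slices would look something like the sketch below; the attribute and key names assume the standard Adventure Works date dimension.

-- Slice property for the 2005 partition:
[Date].[Calendar Year].&[2005]
-- ...and so on for each year, e.g. the 2008 partition:
[Date].[Calendar Year].&[2008]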

Re-processing the project and re-executing the query yields the following results in profiler:

 

image

Voila, exactly what we are looking for! Now our queries will be much more efficient when sent from client applications, only hitting the relevant partition data.

So we’ve learned that it is very important to set the Slice property when creating partitions. However, it is equally important that each slice corresponds to the WHERE clause (or table binding) of the partition’s source query.
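To make that correspondence concrete, here is a minimal sketch (using Adventure Works-style names) of a query-bound partition and its matching slice. If the slice doesn’t match the data actually loaded into the partition, processing will fail with an error about data falling outside the slice.

-- Partition source query (query binding) for the 2008 partition:
SELECT * FROM dbo.FactInternetSales WHERE OrderDateKey BETWEEN 20080101 AND 20081231

-- Matching Slice property on that same partition:
[Date].[Calendar Year].&[2008]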

BI Administration: The Business Intelligence Management Data Mart

Firstly, I’d like to thank everyone who attended my session this morning for Pragmatic Works Training on the T’s. I always enjoy presenting. Today’s topic was the management and administration of business intelligence solutions. The theme of the overall presentation was that although there are many ways to collect the metadata around our business intelligence solutions, that data still lends itself to a dimensional structure that we can model into various facts and dimensions. Below is a data model that represents performance across SSIS, SSAS, and SSRS, capturing the key performance metrics surrounding these 3 development tools. As developers/administrators, it is our responsibility to maintain and monitor this information to adhere to SLAs and ensure timely performance. An overall diagram of the architecture looks like the below:

 

image

 

A complete recording of the webcast is available here, and the complete solution and database backup is available here.

Power Pivot: Adding tooltips to your Power View report

In Excel 2013, did you know that you can make descriptions available to users as they’re browsing the model in Power View? I recently discovered this, and thought it was a really nice feature that you can use to add some extensibility to your Power View reports. In this blog post, I’d like to walk through and show how easy it is to get this set up for use in your environment.

 

Let’s start with a blank Excel workbook, and open up the Power Pivot window:

 

image

 

 

Let’s import some data into Power Pivot. For this example, I’m just going to import the Internet Sales table from Adventure Works:

 

image

 

Now that it is loaded, find the UnitPrice column and right click on it. Look for the Description field.

image

 

Select it, and the Description dialog window will appear. Type in “The price per unit that was charged.” and click OK. That’s it! Now let’s see it in action. Switch back over to the Excel view, and insert a Power View sheet.

 

image

 

Expand the Internet Sales table, and hover over the Unit Price column that you added the description to.

 

image

 

Your description now shows as a tooltip that report users can use to get additional information as they navigate. What’s even better is that this works exactly the same way in a tabular database when you’re building it. To demonstrate, open up the Adventure Works 2012 tabular project, and find the Internet Sales table again. Find the Unit Price column, right click, and go to Description.

 

image

 

In description, type in “The price per unit that was charged.” and click OK. Then build and deploy your solution. After it finishes deploying, open up Excel and connect to your tabular database.

 

image

 

After you click Finish, select Power View report under the Import Data dialog box and click Ok.

 

image

 

Once your Power View sheet opens, expand the table list and find the Unit Price column again. Notice that your description comes through as a tooltip this way too.

 

image

 

 

That’s all for now! Short blog post, but very useful!