THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Microsoft OLAP by Mosha Pasumansky

Get most out of partition slices

Setting partition slice has always been an important optimization technique in Analysis Services. Every presentation talked about it and every whitepaper mentioned it, for example the Microsoft SQL Server 2000 Analysis Services Performance Guide contains a chapter appropriately named “Define the Data Slice for Each Partition”, here is a quote from it:

“If the data slice value for a partition is set properly, Analysis Services can quickly eliminate irrelevant partitions from the query processing and significantly reduce the amount of physical I/O and processor time needed for many queries issued against MOLAP and HOLAP partitions <skip>

Caution: Creating a partition without setting the data slice is not a good practice, and can result in considerable overhead being added to the system (artificially increasing response times). Without the data slice, Analysis Services cannot limit a query to the appropriate partitions and must scan each partition even if zero cells will be returned.

The data slice enables Analysis Services to determine which partitions contain data relevant to the query”

However, after AS2005 was released, there was a new tip making it into presentations - “For MOLAP partitions no need to specify slice property, it is detected automatically”. Even though I briefly debunked it before here,  this rumor just kept popping up, and it even made it as far as official “OLAP Design Best Practices for Analysis Services 2005” document – here is the quote:

“For MOLAP partitions, you do not have to specify the slice because the server will be able to figure out the relevant partitions after the data has been processed”

Even worse, the official documentation on data slices goes as far as stating:

Data slices are applicable only to objects that use the ROLAP storage mode”

But this is completely not true, and many people already discovered it. This Jesse Orosz’s blog entry has great description of 4 reasons why setting partition slice in MOLAP is important. In my blog I want to focus on #3 from his list – automatic slice detection is not perfect.

Here is how automatic slice detection works – during building indexes for the partition, for every attribute it determines the minimum and maximum data id values in this attribute (for more detailed explanation see section “Partition Slice” in the Microsoft SQL Server 2005 Analysis Services book, page 355). It is possible to discover what values exactly Analysis Services detected, either doing it hard way as described here, or doing it easy way as described here. In either case, it is clear, that since slice autodetection operates with ranges of data ids, it is totally possible that partition which contains only two attribute members, but one with the lowest possible data id (2), and another one with highest possible data id, and as a result the range will cover all possible values, i.e. will be useless.

On the other hand, if for certain attribute the data in partition contains only single value of member’s data id, then the range will be very effective – since such a range will be equivalent to slice on a single member.

So this must be the root of all the confusion in documentation and in whitepapers. Prior to AS2005, partition slice could have only been a single member. And specifying partition slice of a single member is really not necessary in MOLAP, since it will always be automatically discovered by autoslice (well, except for other 3 reasons that Jesse lists in his blog). But starting with AS2005, it is possible to specify MDX sets for partition slice, and that’s exactly the case where setting partition slice manually makes sense.

Analysis Services Stored Procedures Project on Codeplex features PartitionHealthCheck function, which shows the overlaps in data id ranges between partitions, but it is no clear what action the user can take when he discovers that the autodetected ranges are not optimal. User doesn’t have direct control over assignments of data ids to dimension members. What this blog entry is going to explore is how to avoid relying on autoslice altogether, and put the precise control over partition slices into the hands of the user.

Almost related attributes

First scenario to consider is what I call “almost related attributes”. I.e. we have attributes, which are not truly related to each other, but, on the other hand are not completely unrelated. Examples of such attributes are Zip code to State – it is almost many to one, but sometimes there are Zip codes which cross State boundaries. Another example is Week to Year. Year mostly contains 52 weeks, but the weeks at the beginning and end of year can cross to previous or next year. 

Let’s use Adventure Works 2008 as our sample database. We have ‘Calendar Year’ and ‘Calendar Week’ attributes. Let’s first query for specific Calendar Year:

SELECT 
  [Measures].[Reseller Sales Amount] ON 0
FROM [Adventure Works]
WHERE 
  [Date].[Calendar Year].[Calendar Year].[CY 2004];

Running this query under Hierarchical Profiler of MDX Studio shows that only one partition – Reseller_Sales_2004 – was read for that query. This is because that partition contains data just for year 2004, and therefore the automatic slice detection worked just fine. But let’s query for a specific week in 2004:

SELECT 
  [Measures].[Reseller Sales Amount] ON 0
FROM [Adventure Works]
WHERE 
  [Date].[Calendar Week].&[17]&[2004];

Of course, we would like again to read just 2004 partition, but this time profiler shows that also partitions Reseller_Sales_2002 and Reseller_Sales_2003 were touched. This is pretty bad, this week 17 of 2004 is well inside year 2004 with no hope of crossing into 2003, let alone 2002 !

So why did automatic slice detection did such a poor job here. To answer this question, let’s look at data id values for the Calendar Week attribute:

WITH MEMBER [Measures].CalendarWeekDataID AS
  DataId([Date].[Calendar Week].CurrentMember)
 ,CAPTION = 'Data ID of Calendar Week'
SELECT [Measures].[CalendarWeekDataID] ON 0
, [Date].[Calendar Week].[Calendar Week] ON 1
FROM [Adventure Works];

This query shows the data id for every calendar week, and the problem is now obvious. The calendar weeks don’t have nice ordering by the years, instead first we get Week 1 of years 2002, 2003 and 2004 (this week didn’t exist in 2001), then Week 2 for these years etc. No wonder the range of data ids for calendar week, included week 17 of 2004 also in 2003 and 2002. We could fix the Calendar Week attribute, but we would still be at the mercy of Analysis Services about how it decides to assign data ids. Instead, we can explicitly set the slice on every partition in the ‘Reseller Sales’ measure group.

And what is better way to build this slice if not by using another MDX query !

 

WITH 
  SET WeeksIn2004 AS 
    Exists(
       [Date].[Calendar Week].[Calendar Week].MEMBERS
      ,[Date].[Calendar Year].[Calendar Year].[CY 2004])
  MEMBER [Measures].PartitionSlice AS
   "{" +
   Generate(
     WeeksIn2004 AS itr, 
         [Date].[Calendar Week].CurrentMember.UniqueName 
       + IIF(itr.CurrentOrdinal = WeeksIn2004.Count, "", ","))
   + "}"
SELECT [Measures].PartitionSlice ON 0
FROM [Adventure Works]

This returns long string representing MDX set with all the weeks in 2004. We just need to make sure that the resulting string is XML’ized, i.e. ‘&’ is replaced with ‘&amp;’ if it is to be pasted directly into XMLA script for ALTER partition, and we are good to go.

When we apply this procedure to every partition, rerunning the query

SELECT 
  [Measures].[Reseller Sales Amount] ON 0
FROM [Adventure Works]
WHERE 
  [Date].[Calendar Week].&[17]&[2004];

Now only touches 2004 partition.

Related attributes

So far we considered “almost related attributes”, as for truly related attributes one may think there is no problem. But in fact, in Adventure Works 2008, Calendar Week 2004 is defined as related to Calendar Year. So why when we sliced on [Date].[Calendar Week].&[17]&[2004] it didn’t decode the year ? If we run coordinate decoding in MDX Studio, it does properly show ‘CY 2004’ being decoded. The problem here lies in the fact that FE does coordinate decoding lazily, only if it really needs to. This makes sense – otherwise there will be a lot of time spent decoding attributes that nobody needs. So in our query, FE noticed that there were no calculations looking at Calendar Year’s CurrentMember, and it left it non decoded, as showed by verbose data in profiler in MDX Studio:

Dimension 9 [Date] (0 0 0 0 0 0 0 0 0 0 52 0 0 0 0 0 0 0 0 0 0)  

[Calendar Week]:[Week 17 CY 2004]  
[Calendar Year]:0  

But in this case, it looks like SE could benefit from attribute decoding on query subcube – this could be useful performance optimization.

What about multiple attributes

We now successfully dealt with Calendar Week attribute, but Date dimension contains plenty of other attributes. If their Data IDs are as messed up as with Calendar Week – we need to do something about them. Unfortunately here we run into unintentional, yet very unfortunate limitation of Analysis Services. It is impossible to set partition slice by multiple attributes if at least one of them is a set and not a single member (which is the only useful case for us !). This limitation stems from unfortunate combination of few different factors:

1. Sets used in partition slice have to comply with limitations of SetToStr(…, CONSTRAINED) (see documentation). There is a good reason for that – partition slices are computed when cube is not really ready for querying, so putting just any MDX set could trigger FE wanting to get data from the cube, which is not yet ready with unpredictable results.

2. SetToStr(…, CONSTRAINED) doesn’t allow crossjoin of sets using neither CrossJoin function, nor * operator, no tuple-like form – i.e. it is not possible to specify crossjoin

3. While it is possible to specify set of tuples for SetToStr(…, CONSTRAINED), it still doesn’t help, since FE is not capable of building a subcube out of set of tuples (instead it immediatelly resorts to dreaded arbitrary shaped form).

It would be useful if any one of these limitations was relaxed – because it would enable true great control over partition slices in non-trivial cubes.

Published Tuesday, October 14, 2008 12:36 AM by mosha
Filed under: , ,
Anonymous comments are disabled
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement