THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Microsoft OLAP by Mosha Pasumansky

Optimizing order of sets in MDX crossjoins

For scalar values a*b is the same as b*a. (although we saw that performance of these two could be different in MDX in “Performance of multiplication in MDX” article). But if a and b are sets, then obviously results are different – since the order of tuples in the resulting set will be different. But does it matter if we were to perform some other operation on top of result, which wouldn’t depend on order of the tuples ? Semantically – the answer is no, but there could be a big difference in performance. Let’s take the following example:

WITH MEMBER [Measures].x AS
 Count(
   (
     [Customer].[City].[City]
    ,[Customer].[Gender].[Gender]
    ,[Customer].[Education].[Education]
    ,[Product].[Subcategory].[Subcategory]
    ,[Product].[Color].[Color]
    ,[Product].[Size].[Size]
   )
 )
SELECT x ON 0
FROM [Adventure Works]

This query runs in 78 ms – no problem. But if we do innocently looking change – move one set to another position -

WITH MEMBER [Measures].x AS
 Count(
   (
     [Customer].[City].[City]
    ,[Product].[Subcategory].[Subcategory]
    ,[Customer].[Gender].[Gender]
    ,[Customer].[Education].[Education]
    ,[Product].[Color].[Color]
    ,[Product].[Size].[Size]
   )
 )
SELECT x ON 0
FROM [Adventure Works]

Now this query takes 13 seconds, and also grabs significant memory. So what happened ? To understand this better, let’s go back to the first query and pay closer attention to what MDX Studio told us about the execution. We notice, that in PerfMon tab, it reported that there were 2 autoexists, and also hierarchical profiler shows 2 NON EMPTY events (each one corresponding to autoexist operation). Why does it report 2 ? The query has just single crossjoin, so there should’ve been only 1 autoexist. What happens here is that crossjoin detects that the set that it needs to apply autoexists to, can be nicely split into two parts – first part related to Customer dimension, and second part to Product dimension. Then it can apply the following formula

JOIN( c1, c2, c3, p1, p2, p3 ) = CROSSJOIN( INNERJOIN(c1,c2,c3), INNERJOIN(p1,p2,p3) )

I.e. it can run inner join against each dimension table separately, and then do full cross join between results. Inner join against single dimension table is very efficient since even in worse case, we don’t need to go deeper than the dimension key to which all other attributes relate – so this isn’t going to take additional memory or much time.

However, in second case, the equation doesn’t hold anymore, since the order of dimensions inside crossjoin is mixed. So some other, much less efficient algorithm will be needed, probably loop join, with crossjoin now being pushed to the earlier stages. This is very inefficient and also going to take lots of additional memory.

Conclusion: Inside crossjoin, cluster all the sets from the same dimension together – this will result in great benefit to performance.

Published Friday, October 24, 2008 1:00 AM by mosha
Filed under: ,
Anonymous comments are disabled
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement