THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

SQLBI - Marco Russo

SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server.
You can follow me on Twitter: @marcorus

Possible SWITCH Optimization in DAX – #powerpivot #dax #tabular

In one of the Advanced DAX Workshop I taught this year, I had an interesting discussion about how to optimize a SWITCH statement (which could be frequently used checking a slicer, like in the Parameter Table pattern).

Let’s start with the problem. What happen when you have such a statement?

Sales :=
    SWITCH (
        VALUES ( Period[Period] ),
        "Current", [Internet Total Sales],
        "MTD", [MTD Sales],
        "QTD", [QTD Sales],
        "YTD", [YTD Sales],
         BLANK ()
    )

The SWITCH statement is in reality just syntax sugar for a nested IF statement. When you place such a measure in a pivot table, for every cell of the pivot table the IF options are evaluated. In order to optimize performance, the DAX engine usually does not compute cell-by-cell, but tries to compute the values in bulk-mode. However, if a measure contains an IF statement, every cell might have a different execution path, so the current implementation might evaluate all the possible IF branches in bulk-mode, so that for every cell the result from one of the branches will be already available in a pre-calculated dataset.

The price for that could be high. If you consider the previous Sales measure, the YTD Sales measure could be evaluated for all the cells where it’s not required, and also when YTD is not selected at all in a Pivot Table. The actual optimization made by the DAX engine could be different in every build, and I expect newer builds of Tabular and Power Pivot to be better than older ones. However, we still don’t live in an ideal world, so it could be better trying to help the engine finding a better execution plan.

One student (Niek de Wit) proposed this approach:

Selection :=
IF (
    HASONEVALUE ( Period[Period] ),
    VALUES ( Period[Period] )
)

Sales :=
CALCULATE (
    [Internet Total Sales],
    FILTER (
        VALUES ( 'Internet Sales'[Order Quantity] ),
        'Internet Sales'[Order Quantity]
            = IF (
                [Selection] = "Current",
                'Internet Sales'[Order Quantity],
                -1
            )
    )
)

    + CALCULATE (
        [MTD Sales],
        FILTER (
            VALUES ( 'Internet Sales'[Order Quantity] ),
            'Internet Sales'[Order Quantity]
                = IF (
                    [Selection] = "MTD",
                    'Internet Sales'[Order Quantity],
                    -1
                )
        )
    )
    + CALCULATE (
        [QTD Sales],
        FILTER (
            VALUES ( 'Internet Sales'[Order Quantity] ),
            'Internet Sales'[Order Quantity]
                = IF (
                    [Selection] = "QTD",
                    'Internet Sales'[Order Quantity],
                    -1
                )
        )
    )
    + CALCULATE (
        [YTD Sales],
        FILTER (
            VALUES ( 'Internet Sales'[Order Quantity] ),
            'Internet Sales'[Order Quantity]
                = IF (
                    [Selection] = "YTD",
                    'Internet Sales'[Order Quantity],
                    -1
                )
        )
    )

At first sight, you might think it’s impossible that this approach could be faster. However, if you examine with the profiler what happens, there is a different story. Every original IF’s execution branch is now a separate CALCULATE statement, which applies a filter that does not execute the required measure calculation if the result of the FILTER is empty. I used the ‘Internet Sales’[Order Quantity] column in this example just because in Adventure Works it has only one value (every row has 1): in the real world, you should use a column that has a very low number of distinct values, or use a column that has always the same value for every row (so it will be compressed very well!). Because the value –1 is never used in this column, the IF comparison in the filter discharge all the values iterated in the filter if the selection does not match with the desired value.

I hope to have time in the future to write a longer article about this optimization technique, but in the meantime I’ve seen this optimization has been useful in many other implementations. Please write your feedback if you find scenarios (in both Power Pivot and Tabular) where you obtain performance improvements using this technique!

Published Monday, August 18, 2014 1:54 PM by Marco Russo (SQLBI)
Filed under: , , ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

David Hager said:

Each CALCULATE formula appears to be a separate measure, so could be written as =IS_Current+IS_MTD+IS_QTD+IS_YTD. But I am sure you knew that already :)

August 18, 2014 3:24 PM
 

Marco Russo (SQLBI) said:

David, yes you're right, and it would be a more readable pattern with no differences in execution plan. Before writing a pattern, I want to examine in more detail pros and cons of this approach, even if we've seen many scenarios where it's faster, I want to evaluate other possible side effects. But splitting the formula in multiple measures should not affect the query plan.

Thanks!

August 19, 2014 1:58 AM
 

Oxenskiold said:

I wonder if converting the IFs into arithmetic expressions will optimize the whole measure even more. I don't have any practical knowledge of what goes and what does not go regarding the storage engine, but if the storage engine can handle the DAX functions FIND() and SIGN() you could try to convert the IF-functions and see if that makes any difference.

(a=s) is similar to FIND(a,s,1,0)

(a<>s) is similar to SIGN(1-FIND(a,s,1,0))

---

(0 equals FALSE; 1 equals TRUE)

So IF ([Selection] = "YTD", 'Internet Sales'[Order Quantity], -1)

Can be converted to:

(FIND([Selection], "YTD",1,0) * 'Internet Sales'[Order Quantity]) + (SIGN(1-FIND(a,s,1,0)) * -1)

As a matter of fact since the FIND() function gives you the option to choose what should be returned in case it doesn't find the string you are looking for you might be able to make do with the following:

(FIND([Selection],"YTD",1,BLANK()) * 'Internet Sales'[Order Quantity])

OR alternatively

(FIND([Selection], "YTD",1,0) * 'Internet Sales'[Order Quantity]) + (FIND([Selection], "YTD",1,-1)

BTW if the IF-condition is a numeric expression you can use these expressions:

(a <> b) is similar to (ABS(SIGN(a - b)))

(a = b) is similar to (1 -  ABS(SIGN(a - b)))

(a >= b) is similar to (SIGN(1 + SIGN(a – b)))

(a < b ) is similar to (1 - SIGN(1 + SIGN(a – b)))

(a <= b ) is similar to (SIGN(1 – SIGN(1 – b)))

(a > b ) is similar to (1 - SIGN(1 – SIGN(1 – b)))

---

(0s equals FALSE; 1s equals TRUE)

So

IF([Amountsomething] >= 12000, [SpecialAmount], [plainAmount])

Becomes

(SIGN(1 + SIGN([Amountsomething] – 12000)) * [SpecialAmount])   +   ((1 - (SIGN(1 + SIGN([Amountsomething]  – 12000)))) * [plainAmount])

Albeit the maintenance of the code might suffer slightly, but you know in the name of optimization. :-)

September 6, 2014 5:58 AM
 

Oxenskiold said:

I was a little bit too fast in the writing so here is a CORRECTED version of my former comment.

I wonder if converting the IFs into arithmetic expressions will optimize the whole measure even more. I don't have any practical knowledge of what goes and what does not go regarding the storage engine, but if the storage engine can handle the DAX functions FIND() and SIGN() you could try to convert the IF-functions and see if that makes any difference.

(a=s) is similar to FIND(a,s,1,0)

(a<>s) is similar to SIGN(1-FIND(a,s,1,0))

---

(0s equals FALSE; 1s equals TRUE)

So IF ([Selection] = "YTD", 'Internet Sales'[Order Quantity], -1)

Can be converted to:

(FIND([Selection], "YTD",1,0) * 'Internet Sales'[Order Quantity]) + (SIGN(1-FIND([Selection], "YTD",1,0)) * -1)

As a matter of fact since the FIND() function gives you the option to choose what should be returned in case it doesn't find the string you are looking for you might be able to make do with the following:

(FIND([Selection],"YTD",1,BLANK()) * 'Internet Sales'[Order Quantity])

BTW if the IF-condition is a numeric expression you can use these expressions:

(a <> b) is similar to (ABS(SIGN(a - b)))

(a = b) is similar to (1 -  ABS(SIGN(a - b)))

(a >= b) is similar to (SIGN(1 + SIGN(a – b)))

(a < b ) is similar to (1 - SIGN(1 + SIGN(a – b)))

(a <= b ) is similar to (SIGN(1 – SIGN(a – b)))

(a > b ) is similar to (1 - SIGN(1 – SIGN(a – b)))

---

(0 equals FALSE; 1 equals TRUE)

So

IF([Amountsomething] >= 12000, [SpecialAmount], [plainAmount])

Becomes

(SIGN(1 + SIGN([Amountsomething] – 12000)) * [SpecialAmount])   +   ((1 - (SIGN(1 + SIGN([Amountsomething]  – 12000)))) * [plainAmount])

Albeit the maintenance of the code might suffer slightly, but you know in the name of optimization.

September 6, 2014 6:32 AM
 

Marco Russo (SQLBI) said:

Unfortunately the storage engine today doesn't support complex operations. FIND, ABS and SIGN are all functions executed by the formula engine.

In case you execute an iteration function and you have a row context, you see a callback to formula engine made by storage engine. It's not as fast as a native storage engine operation, but at least it could be multi-threaded (useful only for table with 2 million rows or more in Power Pivot, 16 million rows or more in Tabular)

However, when applied to measures, this technique doesn't work in the same way and moves the entire calculation is moved to the formula engine (which is single-threaded in this case), using results obtained by a low number of storage engine queries. This is usually a good idea, but doesn't work well in all situations.

My suggestions is always to optimize DAX only when it's necessary because of performance issues. Trying to optimize in advance might result in a damage to possible optimizations made by future versions of the engine.

September 6, 2014 7:00 AM
 

Paul Cunningham said:

Hi Marco

Within the filter, would it make any difference to use:  

ALL ( 'Internet Sales'[Order Quantity] )

rather than:

VALUES ( 'Internet Sales'[Order Quantity] )

?

I am curious about whether this removed the need to evaluate the filter context on 'Internet Sales'[Order Quantity], or if it actually costs more to override the filter with ALL?

Thanks

September 26, 2014 6:48 AM
 

Marco Russo (SQLBI) said:

VALUES keep the existing filter context, whereas ALL would remove it.

There are not substancial differences, but as I said in the article it would be better using a dedicated column with a single value equal for all the rows - at that point, using VALUES or ALL would not make any difference.

September 26, 2014 7:56 AM
 

Geiber said:

The following two measures do exactly the same, but the nested IF solution is faster in a pivot table connected to a SSAS 2012 tabular model.  Nested IF statements better than Switch statements, is it possible?

>>USIGN NESTED IF<<

DiscountRate:=IF(ISFILTERED ('DimBuilding'[BuildingID]),

CALCULATE(SUM([DiscountRate])/100, LASTNONBLANK('DATE'[Date], CALCULATE(COUNTROWS('BUILDING_FACT_TABLE')))),

  IF(ISFILTERED ('DimBuilding'[Building Name]),

CALCULATE(SUM([DiscountRate])/100, LASTNONBLANK('DATE'[Date], CALCULATE(COUNTROWS('BUILDING_FACT_TABLE')))),

IF(ISFILTERED ('DimBuilding'[Seller]),

CALCULATE(SUM([DiscountRate])/100, LASTNONBLANK('DATE'[Date], CALCULATE(COUNTROWS('BUILDING_FACT_TABLE')))),

IF(ISFILTERED ('DimBuilding'[Status]),

CALCULATE(SUM([DiscountRate])/100, LASTNONBLANK('DATE'[Date], CALCULATE(COUNTROWS('BUILDING_FACT_TABLE'))))

)

)

  )

)

>>SING SWITCH STATEMENT<<

DiscountRate:=SWITCH(TRUE(),

ISFILTERED ('DimBuilding'[BuildingID]),

CALCULATE(SUM([DiscountRate])/100, LASTNONBLANK('DATE'[Date], CALCULATE(COUNTROWS('BUILDING_FACT_TABLE')))),

ISFILTERED ('DimBuilding'[Building Name]),

CALCULATE(SUM([DiscountRate])/100, LASTNONBLANK('DATE'[Date], CALCULATE(COUNTROWS('BUILDING_FACT_TABLE')))),

ISFILTERED ('DimBuilding'[Seller]),

CALCULATE(SUM([DiscountRate])/100, LASTNONBLANK('DATE'[Date], CALCULATE(COUNTROWS('BUILDING_FACT_TABLE')))),

ISFILTERED ('DimBuilding'[Status]),

CALCULATE(SUM([DiscountRate])/100, LASTNONBLANK('DATE'[Date], CALCULATE(COUNTROWS('BUILDING_FACT_TABLE'))))

)

November 13, 2014 4:04 PM
 

Marco Russo (SQLBI) said:

Geiber, this is strange.

Can you provide me the version of the SSAS Tabular you are running and the SQL Profiler trace of the two executions? (including DAX Query Plan, VertiPaq queries and QueryEnd event). Thanks!

November 13, 2014 6:17 PM
 

Geiber said:

Thanx Marco,

It took me a while to prepare dax query plan, vertipaq queries and query end events.

SSAS:  Microsoft Analysis Server 11.0.5532.0

To sum up, I use the measure, DiscountRate, with two dimensions, dimBuilding and DimCalendar filtered to 2014-9.  Measure duration with nested-if and switch were 297 and 375 respectively in this short scenario. However, duration differences get longer when I used more dimensions along with others measures.

Because profiler trace returned a lot of data, I've shared that here:

https://dl.dropboxusercontent.com/u/18401645/PROFILER.txt

November 17, 2014 4:10 PM
 

Marco Russo (SQLBI) said:

Sorry but in this way I cannot see the timing for VertiPaq queries and Duration of formula engine. Can you share the two .TRC files generated by the profiler?

November 17, 2014 5:17 PM
 

Geiber said:

Sure Marco!, Here you are: https://www.dropbox.com/sh/71x4dvjep79a0bc/AACEbOXO6wK1-dp15HSWqSLGa?dl=0

You'll find dimension and measure names in Spanish

For instance, DiscountRate is 'PO_TASA_DESCUENTO'.

DimBuilding is 'Inmueble', etc...

Thanx ahead.

November 17, 2014 6:06 PM

Leave a Comment

(required) 
(required) 
Submit

About Marco Russo (SQLBI)

Marco Russo is a consultant, writer and trainer specialized in Business Intelligence with Microsoft technologies. He runs the SQLBI.COM website, which is dedicated to distribute resources useful for BI developers, like Integration Services components, Analysis Services models, tools, technical information and so on. Marco is certified as MCT, MCDBA, MCSD.NET, MCSA, MCSE+I.

This Blog

Syndication

Archives

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement