|
|
|
|
-
As you might (or not) know, I travel a lot for work and, when possible, I always try to speak at local user groups. It is fun and I always learn a lot by speaking with other professionals like me. Being in Lisbon next week, I’ll deliver a speech at Microsoft Portugal with the NetPonto user group for a couple of hours of DAX, Vertipaq and many-to-many relationships. Here’s the link of the event, a free registration is required: http://netponto-lisboa-fevereiro-2012-powerpivot.eventbrite.com/ If you happen to be there, I’ll be really happy to meet you.
|
-
My 4yo son keeps asking me why I should leave home so often to attend SQL conferences around the world and today, after a couple of days of snow, he discovered a good reason, along with the best place for the SQLBits scarf.  
|
-
What is a circular dependency? If A depends on B and B depends on A, then you have a circular dependency. As programmers, we know that they should be avoided to obtain a working system. All this looks fine but, when authoring a simple DAX formula containing CALCULATE, SSDT (formerly known as BIDS, sometimes called Visual Studio… I love Microsoft attitude to find new names) warned me about circular dependency errors and, believe me, there seemed to be no loops inside my formulas. After some fight with the tool, it turned out that anytime you use CALCULATE in a formula for a calculated column you are in danger for a circular dependency and, to solve the issue, the easiest way is to add a key column to a table. Understanding why the problem exists is not trivial and it requires you to read the article I have published on the SQLBI website here: http://www.sqlbi.com/articles/understanding-circular-dependencies/. Clearly the scenario is the same for PowerPivot 2012 and for Tabular in SSAS 2012. Happy reading! And, as always, comments are very welcome. BTW: Happy new year to everybody! In case you forgot it, THIS is the year of SQL2012. 
|
-
Data modeling in Tabular is different from Multidimensional. I spend much time teaching people that they need to change their mind when approaching the Vertipaq engine, because it is a different tool with different capabilities. Nevertheless, I still have a lot to learn on this topic, as I am going to show in the next paragraphs. One customers of mine has a 4 Billion rows table containing some numbers he wants to work with. Among those, two columns contain a value and a weight and, when aggregating the value, it should be weighted by the weight. This is a very common pattern: in the banking environment, for example, you might have the rate and the amount and, to get the mean rate, you need to weight the detail rates by the amount. In my specific scenario, the value represents the age of a customer, ranging from 0 to 100, while the weight is an integer number ranging from 1 to 60M. Take note of the ranges, it will come handy later. I am used, in such a scenario, to store in the fact table the result of Age * Weight in a column called AgeMultipliedByWeight so that the aggregate can be computed as SUM (AgeMultipliedByWeight)/SUM(Weight). Following this pattern, I developed the solution and performances were pretty good. Nothing exceptional, but it worked fine. The reason to store the intermediate calculation is to use SUM instead of SUMX, getting better performances, as I normally teach during my classes. Then, during PASS 2011, I had the option to discuss this model with the dev team and it has been very funny to get their feedback, which sounded like: “why did you do something that stupid? You don’t need it, just compute the values runtime using SUMX, it is much faster and you consume a lot less memory”. Learning why this pattern is wrong in Tabular is definitely worth spending some time, because in Multidimensional this is the right pattern. So, let us think in DAX and try to figure out what’s wrong with storing intermediate calculations. First of all, we already know that the memory usage of Vertipaq depends on the number of distinct values of a column. It turns out that, while the age has only 100 distinct values, AgeMultipliedByWeight has 600 millions of different values. The difference is huge and, in fact, it turned out that the AgeMultipliedByWeight column was using a lot of memory (9Gb in my scenario with 4 billion rows). First lesson learned, increasing the number of distinct values of a column is easy as performing some computation over it and a lot of precious memory is wasted if you don’t pay attention to that. The second and most important consideration is that the DAX Formula Engine is capable to push simple calculations to the Vertipaq Engine, which runs in a multithreaded environment (while the Formula Engine is still single threaded). Thus, if a SUMX contains a simple calculation (i.e. simple math with no complex formulas), this formula will be pushed down to Vertipaq and will be executed in a parallel way directly at the leaf level. There’s a lot of work yet to be done in order to understand what “simple calculations” means, but I will speak about that in future posts, as soon as I discover more information about what can and cannot be pushed down to Vertipaq. Back home, I had the option to try this on my server and, as expected, performance is now much better. By changing the formula from SUM to SUMX, avoiding the intermediate calculated column, I saved 9Gb of RAM and the test query, which was running in 13 seconds, is now executed in 3 seconds, i.e. 4 times faster, scaling on all the available cores. Thus, the first lesson here is always the same: thinking in DAX is not easy, we need to get rid of many of the concepts we have learned with UDM/MDX and change the way we approach problems. The results, when following this practice, are awesome. There is another lesson, probably the most important one: speaking with people at conferences lets you learn a lot, much more than you will ever be able to learn alone in the office or at home. For the ones of you coming at SQL PASS Nordic… I am eager to speak with you too, see you there on 8-9 November.
|
-
This year I and Marco Russo will host a Birds of a Feather lunch about “DAX, Vertipaq and BISM Tabular” during PASS Summit in Seattle. If you want to spend some time speaking about DAX, the new Vertipaq engine of just have a good chat in front of some food to give a face to a name, just come there and sit down. The BOF lunch will be on Friday, there is a bunch of tables to choose from, as you can see in the complete list. Social networking is one of the main reasons to go to a conference, thus, don’t be shy, come to meet us in person: we’ll be really happy to share some time with you and provide a demonstration of true “Spaghetti English”.
|
-
Using Visual Studio to author SSAS Multidimensional projects, I was used to copy the project, deploy it with another name and then proceed with updates, until I was satisfied with the final result. Then, working with the two projects side-by-side I could check performance and numbers. I tried the same technique with Visual Studio 2010 with CTP3 Denali with a Tabular model. I created a copy of an existing project but, as soon as I opened it, I got the error “This model is already open in another location by the same user, please close the other project before working on this one”:  The reason for the error is that, copying the project folder, I copied the user settings file too, which contains the workspace database definition. Because the workspace is already in use by the previous instance of the project (which is still open, due to the way I work) and a workspace cannot be used by more than one active project. For more information about the user settings file and the behavior when opening a BIM project in Denali, refer to this very comprehensive post of Cathy Dumas. The solution to this problem is straightforward (once you learn it, thanks Cathy for explaining this!): just locate and delete the user settings file (which, by the way, is a hidden file, thus you will need to show hidden files before searching for it). As soon as Visual Studio opens a project with no user settings file, it will create a new workspace, detaching the copied project from the original one, and I can work with the two projects side-by-side. Tabular projects work in a different way, when compared with Multidimensional ones, we just need to learn new tricks, as always. 
|
-
Having read this question on the mdsn blogs, I investigated on the KEEPFILTERS function and, after having learned it, it is now time to write about it. Moreover, before start to write about it, I need to thank the dev team of SSAS and Marco Russo who helped me understanding this complex topic. This blog post is not an easy one, so let me start with some conclusions, in order to let you understand why you need to read the post up to the end and digest its content. KEEPFILTERS: - Is a new feature in the Denali version of DAX
- Is very useful, I would say necessary
- It solves a problem that is very common and very difficult to address
- It is complex. No, is is very complex. I think Rob will need to update his spicy scale of functions to make some place for KEEPFILTERS
- If you don’t use and understand it, you will incur in major problems with your formulas and debugging the wrong results will turn into a nightmare
These are the final considerations. Now, if you want to discover why these facts holds, roll up your sleeves and come with me in a travel in the land of filter contexts. We all know that filter contexts are the foundation of any DAX calculation. The CALCULATE function is used to create, alter, update filter contexts and to create any complex formula in DAX. We also know that iterators like FILTER, SUMX and AVERAGEX create a row context that gets translated into a row context when a measure is used as the formula to iterate. For example, to compute the average of yearly sales, we can define a couple of measures: TotalSales := SUM (FactInternetSales[SalesAmount])
AvgYear := AVERAGEX (VALUES (DimTime[CalendarYear]), [TotalSales])
The measure [TotalSales] inside AVERAGEX is called in a filter context that filters only one year. The values are then averaged to return the correct value. All this is well known.
For educational purposes, we are not going to use AVERAGEX but a modified version of the formula which is useless, but makes the concepts clearer. We simply substitute AVERAGEX with SUMX:
TotalSales := SUM (FactInternetSales[SalesAmount])
SumYear := SUMX (VALUES (DimTime[CalendarYear]), [TotalSales])
Thus, SumYear is equivalent to SUM (FactInternetSales[SalesAmount]). Or… it should be… as we are going to see in a few minutes, something weird will happen when we evaluate this formula over a user defined hierarchy.
In the following figure, I have put the calendar hierarchy on the rows, TotalSales and SumYear on the columns and the result is straightforward: the two results are identical.

Now, I can filter the time and decide that I want to see only July and August for 2001 and September and October for 2002. This can be easily accomplished filtering the hierarchy but, this time, the result is much more interesting:

If you look at the highlighted cells you should have the strong feeling that something is going wrong. From where do these strange numbers come? At the month level, everything is fine. At the year level and at the grand total the values shown make no sense at all. In order to understand what is going wrong here, we need, as always, to dive into the different filter contexts that exists during the steps of the computation.
First of all, we have filtered some months for 2001 and some other months for 2002, creating a complex filter that looks like this:
(DimTime[CalendarYear] = 2001 && (DimTime[MonthName] = "July" || DimTime[MonthName] = "August"))
|| (DimTime[CalendarYear] = 2002 && (DimTime[MonthName] = "September" || DimTime[MonthName] = "October"))
This filter contains two columns: MonthName and CalendarYear and the resulting filter is a mix of both columns, resulting in a relationship between the two columns. Please, read this sentence twice and keep this in mind: this filter contains two columns.
Now, what happens when the SumYear gets evaluated? This is the formula:
SumYear := SUMX (VALUES (DimTime[CalendarYear]), [TotalSales])
SUMX iterates over the values of CalendarYear and, for each value, it computes TotalSales, after having transformed the row context over CalendarYear in a filter context. Thus, if we unroll the iteration, the formula is equivalent to:
CALCULATE ([TotalSales], DimTime[CalendarYear] = 2001) +
CALCULATE ([TotalSales], DimTime[CalendarYear] = 2002)
We have two CALCULATE that set a filter on the year. Now it is useful to remember that when a column gets filtered inside CALCULATE, the new filter overrides any existing filter on the same column. Do we have any filter on CalendarYear? Yes, we do, because the previous filter context imposed by the hierarchy was filtering CalendarYear and MonthName. Thus, the engine will remove the filter on CalendarYear from that formula and then apply the new filter.
What happens to our original filter if we remove all the references to CalendarYear? It becomes:
((DimTime[MonthName] = "July" || DimTime[MonthName] = "August"))
|| ((DimTime[MonthName] = "September" || DimTime[MonthName] = "October"))
You see that removing CalendarYear from the filter it now become a completely different filter, that says: “any month from July to October is fine”. We are then going to add the new filter on CalendarYear and the resulting filter, under which [TotalSales] gets computed is:
DimTime[CalendarYear] = 2001 && (DimTime[MonthName] = "July" || DimTime[MonthName] = "August" || DimTime[MonthName] = "September" || DimTime[MonthName] = "October")
In other words, the cell for the year 2001 takes into account the months from July to October and, clearly, the same happens fro 2002. The final result is completely wrong because the original filter is lost. This is the reason for which numbers are wrong.
Luckily, the SSAS dev team addressed this problem in advance and gave us the magic function KEEPFILTERS. What KEEPFILTERS does is to modify the semantics of CALCULATE so that the new filter will not replace any existing filter but will be merged in AND with any previous filters.
If we rewrite our SumYear definition in this way:
SumYearWithKeepFilters := SUMX (KEEPFILTERS (VALUES (DimTime[CalendarYear])), [TotalSales])
We are asking DAX to take the values of DimTime[CalendarYear] but keep any existing filter on the same column in place, without removing them. With this definition, the filter context under which [TotalSales] gets evaluated is:
DimTime[CalendarYear] = 2001 && DimTime[CalendarYear] = 2001 && (DimTime[MonthName] = "July" || DimTime[MonthName] = "August"))
|| (DimTime[CalendarYear] = 2002 && (DimTime[MonthName] = "September" || DimTime[MonthName] = "October"))
And, this time, the formula will return the correct result because the original filter context is preserved and, with it, the relationship between months and years. You can see that in the following figure:

Now that we have understood the issue with SUM, it is easy to see that the same problem happens with AVERAGE, but it is harder to detect because numbers are not so easy to check.
Before to go to a conclusion, I would like to spend some more words on the topic and I will use DAX EVALUATE function to show how the same scenario can easily happen (and be verified) in DAX. The same model, deployed on SSAS Server in Vertipaq mode, can be queried with DAX and, to simulate the complex condition, we use CALCULATETABLE and put a filter on a CROSSJOIN. Take a look at this DAX query:
DEFINE
MEASURE FactInternetSales[SumYearNoKeep] = SUMX (VALUES (DimTime[CalendarYear]), [TotalSales])
MEASURE FactInternetSales[SumYearKeep] = SUMX (KEEPFILTERS (VALUES (DimTime[CalendarYear])), [TotalSales])
EVALUATE
CALCULATETABLE (
SUMMARIZE (
CROSSJOIN (
VALUES (DimTime[CalendarYear]),
VALUES (DimTime[MonthName])
),
ROLLUP (
DimTime[CalendarYear],
DimTime[MonthName]
),
"SumYeaNoKeep", [SumYearNoKeep],
"SumYeaKeep", [SumYearKeep]
),
FILTER (
CROSSJOIN (
VALUES (DimTime[CalendarYear]),
VALUES (DimTime[MonthName])
),
(DimTime[CalendarYear] = 2001 && (DimTime[MonthName] = "July" || DimTime[MonthName] = "August"))
|| (DimTime[CalendarYear] = 2002 && (DimTime[MonthName] = "September" || DimTime[MonthName] = "October"))
)
)
This query returns both [SumYearNoKeep] and [SumYearKeep] and, strangely, they return the same wrong value. This is because the damage of destroying the original filter context has already been done by the CROSSJOIN inside SUMMARIZE, which did not take into account the previous filters.
If we add KEEPFILTERS to the CROSSJOIN inside SUMMARIZE, the formula will be different:
DEFINE
MEASURE FactInternetSales[SumYearNoKeep] = SUMX (VALUES (DimTime[CalendarYear]), [TotalSales])
MEASURE FactInternetSales[SumYearKeep] = SUMX (KEEPFILTERS (VALUES (DimTime[CalendarYear])), [TotalSales])
EVALUATE
CALCULATETABLE (
SUMMARIZE (
KEEPFILTERS (
CROSSJOIN (
VALUES (DimTime[CalendarYear]),
VALUES (DimTime[MonthName])
)
),
ROLLUP (
DimTime[CalendarYear],
DimTime[MonthName]
),
"SumYeaNoKeep", [SumYearNoKeep],
"SumYeaKeep", [SumYearKeep]
),
FILTER (
CROSSJOIN (
VALUES (DimTime[CalendarYear]),
VALUES (DimTime[MonthName])
),
(DimTime[CalendarYear] = 2001 && (DimTime[MonthName] = "July" || DimTime[MonthName] = "August"))
|| (DimTime[CalendarYear] = 2002 && (DimTime[MonthName] = "September" || DimTime[MonthName] = "October"))
)
)
And, this time, the result will be the correct one. Take your time to understand well these two formulas, they are not easy ones but, hopefully, they will let you understand how KEEPFILTERS works and why it is needed.
Moreover, for the brave reader that has still wants to go deeper into the topic, it is worth to study this variation of the same query, where I have removed the FILTER on the CROSSJOIN and replaced it with a classical FILTER on the full table.
DEFINE
MEASURE FactInternetSales[SumYearNoKeep] = SUMX (VALUES (DimTime[CalendarYear]), [TotalSales])
MEASURE FactInternetSales[SumYearKeep] = SUMX (KEEPFILTERS (VALUES (DimTime[CalendarYear])), [TotalSales])
EVALUATE
CALCULATETABLE (
SUMMARIZE (
CROSSJOIN (
VALUES (DimTime[CalendarYear]),
VALUES (DimTime[MonthName])
),
ROLLUP (
DimTime[CalendarYear],
DimTime[MonthName]
),
"SumYeaNoKeep", [SumYearNoKeep],
"SumYeaKeep", [SumYearKeep]
),
FILTER (
DimTime,
(DimTime[CalendarYear] = 2001 && (DimTime[MonthName] = "July" || DimTime[MonthName] = "August"))
|| (DimTime[CalendarYear] = 2002 && (DimTime[MonthName] = "September" || DimTime[MonthName] = "October"))
)
)
This time the filter is not on the CROSSJOIN of the columns but on the full table, resulting in a table filter context. This formula will return the correct value for both [SumYearNoKeep] and [SumYearKeep] because the table filter context will impose its filter on the individual rows of the table and the new filter on CalendarYear will not touch the filter on the table. As always, in DAX there is a strong difference between a table filter context and a column one. In Denali we can now create a new kind of filter context that contains many columns from the same table and, for that kind of filter context, KEEPFILTER might be necessary to use to avoid inconsistent results in your formulas.
Moreover, even if it seems that KEEPFILTERS usage can be avoided by means of using table filters, keep in mind that we have provided this example using only one table. If you want to filter a CROSSJOIN that uses more than one table, then it will not be easy to create table filter contexts on that structure. Thus, KEEPFILTERS usage is much more convenient because it solves any scenario you will encounter.
|
-
September 29th I and Marco Russo will be delivering a full day of deep dive into the DAX programming language, during the training days of SqlBits 9. Later on, during the conference, I will deliver a session about many to many relationships in DAX, exploiting the power of the Tabular data model with DAX to solve complex problems with many to many relationships. Mine is just one among several great sessions delivered by many of the most talented SQL professional from all around the world. There are still a few seats available for the training day on DAX. Thus, if you are interested, hurry up and register yourself, I’m eager to meet you in Liverpool. You can find full information here: http://www.sqlbits.com/information/Event9/DAX_Deep_Dive/TrainingDetails.aspx. See you there
|
-
-
A new date is available for the PowerPivot workshop I and Marco are delivering all over Europe. On September 19-20, 2011 we will be in Lisbon (Portugal) to present the workshop to Excel users and BI professionals. Moreover, as Denali CTP3 of PowerPivot is now disclosed, we will present, for the first time, an enhanced version of the workshop with a new set of slides and demos aimed to understand what’s new in Denali for all the PowerPivot users. In the meantime, we are organizing a community event on the evening of the 19th, where we are going to speak about the roadmap from UDM to BISM and how to shape your data and skills for the incoming version of Analysis Services. We will publish all the details of the event as soon as we will know them. You can find more information and the registration page at www.powerpivotworkshop.com. I know from personal experience that Lisbon is a wonderful city and the food is simply great. Moreover, I have been assured that the weather will be nice too in that period. Thus, I look forward to see many of you at the workshop and at the community event, even from outside of Portugal!
|
-
Wow! Today I hit F5 on this interesting transaction: BEGIN TRANSACTION INSERT INTO World.Babies (Name, Surname, DateOfBirth, Gender) VALUES ('Arianna', 'Ferrari', '20110720', 'Female'); UPDATE World.Parents SET STATUS = 'Happy' WHERE NAME = 'Alberto' AND Surname = 'Ferrari'; COMMIT Thanks Caterina, I’m now the proud father of two babies, thanks to you! And… yes, I am going to lose some sleep, this time without thinkink at Denali. 
|
-
In my last post about Parent/Child hierarchies, there is a question, in the comments, that I found interesting. Nevertheless, the formula is a complex one and cannot be written in a simple comment. Thus, I am making a follow-up to that post. I am not repeating all the stuff of the previous post so, please, read that before reading this one, to have the necessary background. The question looks a simple one: “If you needed a measure (SumOfLeafAmount) that only shows childrens values, what would it be like?” I was about to answer that “I have a formula that is too large to fit in the margin” but… Fermat did it some years ago and caused a real mess. I don’t want to be responsible for such a complication. Thus, I am providing the answer in this new blog post. And… ehy, no, I don’t consider myself anyhow similar to that genius, I’m just joking! There two interesting points in this simple question: - There are two interpretation of the formula: one is “sum only the leaves”, the other ones is “sum only the children”. Both are valid and interesting formulas, but they will lead to completely different implementations, and I am going to show both.
- During the development of the “sum of the children” we are going to hit the CALCULATE wall with our head and… yes, it hurts.
Let us start with the set of data we are going to work on. The first figure is the hierarchy:  And, in the second one, I have put in the same figure a PivotTable containing the P/C structure with the SumOfAmount measure and the content of the Invoices table, which is the source of the Amount column.  Now, what are the desired results? - SumOfChildren, for Bill, is 1600, because his amount is coming from children only, i.e. Brad, Chris, Vincent, Julie. Bill himself is not producing any sale. SumOfChildren for Brad, on the other hand, is 900, because he sold 400 by himself with two invoices.
- SumOfLeaves, for Bill, is 1200 because the value produced by Brad is not on a leaf. Brad has children, thus he is not a leaf and his value should not be aggregated. Among its children, only Brad has a SumOfLeaves greater than zero, because he has children, while all others (Chris, Vincent and Julie) are leaves.
In the case of Annabel, both formulas will return the same value because all of Annabel’s children are leaves. Let us start with SumOfLeaves, which is pretty easy. Any row in the hierarchy can be a leaf or not. It is a leaf if it has no children, otherwise it is the parent of somebody, thus it is not a leaf. In other words, it is a leaf it there aren’t any rows in the hierarchy where the ParentNodeId has the value of the NodeId. We can define a new calculated column, called IsLeaf, with this formula: =COUNTROWS (
FILTER ('Hierarchy', 'Hierarchy'[ParentNodeID] = EARLIER ([NodeId]))
) = 0
Or, if you are a real fan of Denali and of the new Parent/Child functions, this one works too (and, these days, it makes you look cooler):
=COUNTROWS (
FILTER (
'Hierarchy',
PATHCONTAINS ('Hierarchy'[HierarchyPath], EARLIER ([NodeId]))
)
) = 1
Now, the hierarchy has a new column:

And the formula for the measure SumOfLeaves becomes straightforward, just a simple usage of CALCULATE:
SumOfLeaves=IF (
[BrowseDepth] > [MinNodeDepth],
BLANK(),
CALCULATE (
SUM (Invoices[Amount]),
'Hierarchy'[IsLeaf] = TRUE
)
)
The result, compared with SumOfAmount, is very clear:

The reason for which SumOfLeaves is easy is because each row in the hierarchy is either a leaf or not, its behavior is easy to define. In the case of SumOfChildren, this is no longer true. Each row can be aggregated to the total or not, depending on the level we are browsing.
Take, for example, the value of Brad. When we are summarizing Brad’s amount, his value should not be computed, because he is not a child of himself. Thus, SumOfLeaves(Brad)=900. But, when we summarize the value of Bill, Brad is a children of Bill and the complete amount of 1300 should be computed for Bill’s amount. Same row, different behaviors… things are becoming interesting.
If we want to compute the sum of all the children of a node in a P/C hierarchy, it is enough to create a filter context that isolates the current root of the tree. Once we have isolated that row, we can create a FILTER expression that computes all the nodes but that one. The problem is that a row can be the root at some point and a regular node at other points, the difference being the value of BrowseDepth.
In order to understand the incoming formula, we need to take some time to study our data set. I have created a new measure, to see the value of NodeDepth inside the PivotTable, which basically shows the value only when a single row of the hierarchy is selected:
[Node Depth] = IF (
COUNTROWS ('Hierarchy') = 1,
VALUES ('Hierarchy'[NodeDepth]),
BLANK ()
)
With this new measure, and after having modified slightly the definition of the Level1..3 calculated columns to always show the value of a node (it was previously blank), I got this very interesting figure:

Up to now, we have used the difference between BrowseDepth and NodeDepth to hide rows. Now we will use the same values to compute the SumOfChildren. By carefully looking at the PivotTable, it is easy to verify that the SumOfChildren of Brad will need to avoid computing the highlighted row, where BrowseDepth > NodeDepth. Clearly, when computing the value for Brad, we will be interested in the rows in the red box, all others are of no interest.
When the same row (the highlighted one) will be computed for Bill’s total, the value of BrowseDepth will be 1, while NodeDepth will still be 2. In that case, the value of Brad will be used to compute the total for Bill.
Thus, by using this simple test, SumOfChildren will compute the sum of the values of all the children of a node, avoiding the node itself, which is the only one where the condition is false. With all this in mind, the formula is straightforward:
[SumOfChildrenWrong]=
IF (
[BrowseDepth] > [MinNodeDepth],
BLANK(),
CALCULATE (
SUM (Invoices[Amount]),
FILTER (
'Hierarchy',
'Hierarchy'[NodeDepth] > [BrowseDepth]
)
)
)
We use FILTER to remove the nodes whose NodeDepth is not higher than BrowseDepth. Add this measure to the PivotTale and this is the result:

Et voilà, BLANK everywhere, not a single number shown. Something is wrong but… what? I strongly suggest you to check the formula by yourself at least twice before to continue reading because this is the CALCULATE wall and, hitting it, is always a pain.
Now, the solution. I already gave you the first hint: the problem has something to do with CALCULATE. Ok, how many CALCULATE do you see in the formula?
If you answered 1, which is the naïve answer, you are able to read a formula, but you hit the CALCULATE wall. If you answered zero, two or three, it means you need better glasses. But if you answered the only right answer, which is FOUR, then you have already touched the DAX karma: you already know what I am going to write.
Where are those four CALCULATE? We need to remember that whenever a MEASURE is used inside a formula, it is automatically surrounded by a hidden CALCULATE. And, in this formula, we have three measure calls:
- [BrowseDepth] in the first parameter of IF
- [MinNodeDepth] in the first parameter of IF
- [BrowseDepth] in the inner test of the FILTER
These three measures, plus the evident CALCULATE, make the total of four CALCULATE inside the formula.
Now, what happens for the [BrowseDepth] measure computed inside FILTER? We have a CALCULATE inside a loop, that means that the row context introduced by FILTER (yes, FILTER is an iterator, never forget it!) is converted into a filter context before evaluating the measure. Thus, the [BrowseDepth] is computed in a filter context where the only visible row of the Hierarchy table is the currently iterated row. Check it by yourself, the value of that [BrowseDepth] is always three, because only a single row of the hierarchy is visible. The original filter context is lost when that measure is computed.
The value of [BrowseDepth] inside the FILTER is NOT the value of [BrowseDepth] for the original filter context, it is the [BrowseDepth] in a new context introduced by the automatic CACLULATE added by the measure call.
If we want to compute our measure, we need to avoid that nasty CALCULATE created by the engine. The only way to do that is to remove the measure call and expand it to its measure definition (which you can find in my previous post):
[SumOfChildren]=
IF (
[BrowseDepth] > [MinNodeDepth],
BLANK(),
CALCULATE (
SUM (Invoices[Amount]),
FILTER (
'Hierarchy',
'Hierarchy'[NodeDepth] >
IF (ISFILTERED ('Hierarchy'[Level3]), 3,
IF (ISFILTERED ('Hierarchy'[Level2]), 2,
IF (ISFILTERED ('Hierarchy'[Level1]), 1
)))
)
)
)
The formula is identical to the previous one but, this time, we are not calling a measure, we are computing a formula. Thus, no automatic CALCULATE is added to the expression. Guess what? It now works as expected

Now, you see that all the leaf level nodes do not have any value and this is expected, since their value is provide by themselves only, not by their non-existent children.
I personally don’t believe that this formula is really useful but it might mean something to a customer who give me some money to compute it. Thus, it would be useful at least for me. The interesting point of this post is that DAX is simple, but not easy. This is the best definition of DAX I have ever heard. Until you fully understand CALCULATE and all its implications… DAX will be a black art. Take your time to study it, and DAX will become your best friend.
|
-
Some time ago, I wrote a post about how to handle Parent/Child hierarchies in PowerPivot 1.0. It was pretty intricate, because P/C are not handled natively in PowerPivot. Now, with the advent of Denali, Microsoft gave us all the functions needed to handle P/C in a convenient way. Kasper has already wrote a nice post about P/C here. His post is very informative and I suggest you reading it to look at all the P/C functions. In this post, I will need to repeat something he has already said but, hopefully, I will add some more insights into the DAX language solving a small issue that still exists with Parent/Child hierarchies, at least in CTP3. Let us start with the data model. We have this data model (wow, take a look at how wonderful is the the new graphical view of tables in PowerPivot… great job guys!):  where the Hierarchy table contains this unbalanced P/C hierarchy, using the classical ParentID data model:  In order to show this P/C data structure in a PivotTable, we need to understand how DAX implements P/C handling. It is based on some very powerful functions, the most important of which are: - PATH: computes the full path of a node following the ParentID column, starting from the root
- PATHITEM: extracts an item from a path, as returned by PATH
- LOOKUPVALUE: searches for a value inside a table, following a relationship that is valid only inside the formula
Thus, by means of defining some simple columns: - HierarchyPath = PATH([NodeId], [ParentNodeID])
- Level1= LOOKUPVALUE ([Node], [NodeId], PATHITEM ([HierarchyPath], 1))
You get this result, which is self-explaining:  In order to have the Parent/Child hierarchy work, we still need to create a user define hierarchy putting the various Levels, from 1 to 3, inside a single hierarchy:  Et voilà, the work is done. We can put this hierarchy in a PivotTable, expand all the levels and we get this result:  There are both good and bad news here. - The good news is that all the levels, once put into a hierarchy, behave exactly like a Multidimensional P/C.
- The bad news is that if the hierarchy is a ragged one, as it is often the case with P/C, Denali CTP3 still makes a load of empty nodes visible.
SSAS handles this scenario with the usage of the HideMemberIf property of hierarchies, which hides useless nodes from the final result shown by the PivotTable. In CTP3, there seem not to be a way to set the HideMemberIf property for hierarchies, leading to this unwanted behavior. Hopefully, in the final release of Tabular the HideMemberIf will be available (if you are interested in having it, please vote this connect item, to make Microsoft guys aware of how useful this feature would be). If the item will be solved, then the next part of this post will be useless and life much easier. Thus, you might want to follow that item to check for its solution. Nevertheless, if we want to mimic the HideMemberIf feature in CTP3, we still need to make some work with DAX. So, let us roll up our sleeves and start thinking at a viable solution. The key to hide the unwanted rows is to note that: - We can define, for each node, its depth. Annabel and Bill have a depth of one, being root nodes. Catherine, being a child of Annabel, has a depth of two. Vincent, being a nephew of Bill, has a depth of three. The depth can be defined at the row level of the Hierarchy table.
- We can define, for each cell, a Browse depth which is the depth up to which the P/C hierarchy has been unfolded, to make that cell visible.
With these definitions in mind, our PivotTable can be seen in this way:  If we hide, from this PivotTable, all the rows where NodeDepth is greater than BrowseDepth, we will reach the desired result. In PowerPivot 1.0 this was a big pain in the neck, as can be seen in my previous post. In Denali CTP3, the solution is much easier and elegant, because there is a very interesting function available in DAX: ISFILTERED. ISFILTERED gets a column reference and returns a Boolean value indicating whether the column is filtered in the current filter context or not. Now, it is easy to note that a cell with BrowseDepth of 1 has a filter on the column Level1 and no filter for Level2 and Level3. A cell with BrowseDepth of 2, will have a filter on Level1 and a filter on Level2, with no filter for Level3. Thus, we can define the BrowseDepth measure with this formula: BrowseDepth := IF (ISFILTERED ('Hierarchy'[Level3]), 3, IF (ISFILTERED ('Hierarchy'[Level2]), 2, IF (ISFILTERED ('Hierarchy'[Level1]), 1 ))) This measure will compute, for each cell, the browse depth. We are still missing the node depth of each row. Another useful DAX function will help us: PATHLENGTH, which computes the length of a path column. We can add a calculated column NodeDepth=PATHLENGTH([HierarchyPath]) and get this:  Clearly, the NodeDepth cannot be aggregated using SUM, we need to use MIN. Thus, we define a new measure MinNodeDepth which simply aggregates the NodeDepth column using MIN. The final result, with all these measures in a PivotTable, can be seen here:  The final touch is to make the Amount measure BLANK whenever BrowseDepth is greater than MinNodeDepth, with this definition: SumOfAmount :=
IF (
[BrowseDepth] > [MinNodeDepth],
BLANK(),
SUM (Invoices[Amount])
)
All the measures in a PivotTable:

And, after removing all the technical measures, we get the final result, which is nice as we wanted it to be:

Pretty easy, elegant and super fast, all of the requirements of DAX are met 
I am confident that the SSAS team will take the time to let us use Parent/Child hierarchies with the HideMemberIf option. In the meantime this is a viable solution and a good exercise to start getting acquainted with the new DAX functions.
Last, but not least, in Tabular P/C hierarchies are standard user hierarchies. Thus, you can have as many P/C for a single table as you want. I know of many customers who will be happy to have more than one P/C hierarchy on the same dimension, without having the need to define separate dimensions as it was the case with Multidimensional in SSAS.
|
-
Surfing on the web, here, I came into this intriguing question: How do we ask something like "Show me how many customers have an iPad but don't have a book?" We are speaking about a sales analysis where the canonical entities are Customers, Products and Sales. Moreover, because we have been used to speak about cycles since many years ago, due to the nature of AdventureWorks, the question can be posed in AdventureWorks’s language as: Of all the customers who have bought a mountain bike, how many have never bought at least one mountain tire tube? The marketing need to answer this question is evident, if my customers have bought a mountain bike, they need to buy tire tubes. If they don’t buy them from me, it means that they are taking them somewhere else, and this is not good. There are some interesting points in this simple question that turn the scenario into a challenging one: - We want to filter customers based on sales and then perform further analysis on the behavior of these customers
- The simple presence of “not” in the question makes it much more complicated because we will need to check the customer that did not buy tires and this is not very intuitive to compute because we need to search for lacking information
- The question has been posed for a single product but it is clear that the customer might want to perform the analysis on categories of products too. Moreover he probably want to be able to slice at the single product level.
- Finally, we need to find a generic solution which is easy to query with a PivotTable, because users like Excel
The scenario is an interesting one, definitely worth a post. What about the solution, is it interesting too? It is for one simple reason: in order to solve it we will need to search for a many-to-many relationship where we are not used to find one. Moreover, I love all the solutions that force me to think in DAX and, as we are going to see, the scenario is quite simple in DAX while it would be much more complicated using any other tool. Well… beware that “quite simple” does not mean “simple”, the DAX code is a bit intricate but, hopefully, I am going to help you reading it. A simple side note: this data model has been developed with PowerPivot but, as the new data modeling paradigm has been disclosed just a few days ago with the name “tabular” in the BISM architecture, I will refer to this as a “tabular data model” to distinguish it from the “multidimensional data model”. In fact, this scenario is very similar to the survey data model described in the “Many-to-Many revolution” by Marco Russo even if the data structure is not so evident at first glance. Let us start speaking about the data model. It is a canonical star schema as taken from the AdventureWorks data warehouse:  Importing these tables (and their relationships) inside PowerPivot is a good first step but soon we will discover that something is missing. We need to have two instances of the DimProduct table. One is needed to detect the customers who have bought one of the selected product, in our example the mountain bike. The other one will be needed to select the product which we want to check, in our example the tire tubes. Thus, we will need to load the DimProduct twice inside the tabular data model. Because the new instance of DimProduct will be used for filtering, we name it “ProductFilter”. Thus, the complete data model we are going to work with is the following:  This definitely looks like a strange diagram: the new ProductFilter table has no relationships with the fact table nor with any dimension. In fact, if you think at it carefully, you’ll discover that we have no way to create a relationship between the fact table and the ProductFilter table, because the only available ProductKey in the fact table has already been used for the relationship with DimProduct. Tabular, among its limitations, can use a column for one relationship only. Nevertheless, we don’t care about it: the point is that that relationship is not needed, we will write the DAX code without leveraging its existence. Before to dive into the DAX solution, let us spend some time to describe the desired result. We want to be able to produce a report like this:  In the report, we have selected four model types in the vertical slicer (mountain bikes from 100 to 500) which are the product we require the customer to have bought. Then we have selected two product models in the horizontal slicer (tire tubes and patch kits) which represent the products we want to check whether the customer have bought or not. On the rows we have put FilterModelName and the ProductName columns. On the columns, as usual, there is the time. The two measures shown are: - HavingProduct: it counts the number of customer who bought a mountain bike and some tire tube or patch kit.
- NotHavingProduct: it counts the number of customer who bought a mountain bike but no tire tube nor patch kit.
Due to the nature of the measures, we have decided to make these measures incremental over time. This means that, in 2003, we count all the customers that have bought a mountain bike before the end of 2003 and have bought tire tubes before the same time. In 2004, we perform the same computation, thus including the customers of 2003. It can be changed, but it makes sense to compute the value this way. Thus, the PivotTable can be read as (first line): 396 (43+353) people have bought a Mountain-100 bicycle before 2003 and, of those, 43 have bought either a mountain tire tube or a patch kit. 353 of them did not buy tires or tubes (at least from us). The same for 2004 and for all other cells. Ok we spent enough time describing the scenario, it is now time to write some DAX code. Because the final formulas turn out to be quite complicated, we will describe them step by step, trying to figure out the algorithm before giving the final formula. Let us start with the HavingProduct measure, which is somehow simpler since it does not contain the “not” part. First of all, we need to compute the set of the customers who have bought a mountain bike before 2003. Let us focus on the worksheet: there is a slicer filtering model names on the ProductFilter table and we want to use this filter to detect the customer who have bought these products. Moreover, the presence of the year on the columns creates a filter on the time table, showing only the sales for 2003. The steps we need to carry on are two: - We need to push the filter from the ProductFilter to the customers, using the fact table as a bridge
- We need to extend the filter context on time to show all the time before the end of 2003, since we want “before 2003”
We need to take into account the fact that the scenario is a bit more complicated than what we have spoken so far. In reality, we forgot to speak about the horizontal slicer. Its presence means that the fact table is filtered from the DimProduct too. That filter exists because there is a relationship between DimProducts and the fact table. Moreover, remember that our task is to push the filter on ProductFilter to the DimCustomer, passing through the fact table and there are no relationships between ProductFilter and the fact table. It comes out that the fact table, for our first task, is automatically filtered by the wrong product dimension. Now, with all these considerations in mind, the algorithm looks like: - For each customer
- For each product
- Check if there are sales
- Of at least one of the products selected in ProductFilter
- Anytime before the end of the currently selected year
The key of the solution is to think at the fact table as being a bridge table that implements a many-to-many relationship between customers and products. While sales are not usually intended to implement this relationship, it is clear that this relationship holds. Thus, for each customer, we can filter the fact table showing only the sales of that customer, of the products selected in ProductFilter and of the dates before the end of the current year. If there exists at least one row, this means that that customer has bought one of the ProductFilter products. But, remember that the fact (now bridge) table is already filtered by the wrong product dimension. Thus, we need to remove that filter and replace it with the one we want to use. Although non very easy, the formula that detects these customers should now be understandable: FILTER (
DimCustomer,
SUMX (
ProductFilter,
CALCULATE (
COUNTROWS (FactInternetSales),
ALL (FactInternetSales),
FactInternetSales[CustomerKey] = EARLIER (DimCustomer[CustomerKey]),
FactInternetSales[ProductKey] = EARLIER (ProductFilter[ProductKey]),
FILTER (
ALL (DimTime),
DimTime[TimeKey] <= MAX (DimTime[TimeKey])
)
)
) > 0
)
If we use this FILTER inside a CALCULATE, we will be sure that only the customers who have bought one of the ProductFilter products will be visible. Please note that the inner CALCULATE need to call ALL on the fact table to extend the filter context outside of the current year and remove the existing filter based on DimProduct (which is still active at that point). Having done that, we need to re-apply the filters on the customer, on the product (this time based on ProductFilter) and, finally, on the time (the fact table gets filtered due to the relationship with DimTime).
Before continuing to read, please focus on the SUMX (third row) and answer this simple question: “why is it there?”. I ‘m not going to give the answer, use it to check if you got a good insight of the formula behavior. If not, read it again. 
For what concerns the HavingProduct, the hard part ends here. The final formula of HavingProduct is simply a COUNTROWS(DISTINCT…)) using this complex filter context:
CALCULATE (
COUNTROWS (DISTINCT (FactInternetSales[CustomerKey])),
FILTER (
ALL (DimTime),
DimTime[TimeKey] <= MAX (DimTime[TimeKey])
),
FILTER (
DimCustomer,
SUMX (
ProductFilter,
CALCULATE (
COUNTROWS (FactInternetSales),
ALL (FactInternetSales),
FactInternetSales[CustomerKey] = EARLIER (DimCustomer[CustomerKey]),
FactInternetSales[ProductKey] = EARLIER (ProductFilter[ProductKey]),
FILTER (
ALL (DimTime),
DimTime[TimeKey] <= MAX (DimTime[TimeKey])
)
)
) > 0
)
)
Nevertheless, before continuing, try to clearly understand exactly the different filter contexts that are active in this formula, because we are using completely different filter contexts in different parts of the same formula, computing different values. In fact, the interesting part is understanding in which part of the formula is the DimProduct filter (horizontal slicer) still active.
Now, after you have digested this first formula, it is time to think at the other one: the one implying the “not” part. If you got the insights of the first one, this formula should be pretty easy. It is enough to note that the customers who did not buy any of the products selected in DimProducts, will simply have no sales for those products in the required period. Thus, using again the fact table as a bridge between customers and products, we can apply one more time the same pattern reversing the condition.
The formula for NotHavingProduct is very similar to the previous one, the only difference being in the first argument of the first CALCULATE:
=CALCULATE (
COUNTROWS (
FILTER (
DimCustomer,
CALCULATE (COUNTROWS (FactInternetSales)) = 0
)
),
FILTER (
ALL (DimTime),
DimTime[TimeKey] <= MAX (DimTime[TimeKey])
),
FILTER (
DimCustomer,
SUMX (
ProductFilter,
CALCULATE (
COUNTROWS (FactInternetSales),
ALL (FactInternetSales),
FactInternetSales[CustomerKey] = EARLIER (DimCustomer[CustomerKey]),
FactInternetSales[ProductKey] = EARLIER (ProductFilter[ProductKey]),
FILTER (
ALL (DimTime),
DimTime[TimeKey] <= MAX (DimTime[TimeKey])
)
)
) > 0
)
)
You should recognize the many-to-many pattern twice in this formula: once for ProductFilter (same as above) and another one in the formula computed by the first CALCULATE. Please note that this time (first part of the formula) we had no need to use SUMX and, again, understanding why is one of the key points of understanding the formula. Here we leverage automatic relationships, before now we had to manually implement a sort of “IN” operator and the SUMX came from that need.
Well, the work is done, the two formulas can be used as measures in a PivotTable and the user is free to slice using any attribute of the two product dimensions, going at the customer level, if he wants to, with the usual blazing speed of DAX. And… if you think that these are long and complex formulas, please think at the length of your average SQL query, is it really shorter than these two formulas? Mine are not.
The post should end here, or it might end with an invitation to our PowerPivot Workshop, as I often do. But, this time, I want to end it in a different way, sharing with you the considerations I made while I was writing the post.
Long before starting to think at this scenario, I already had a good knowledge of how to handle many-to-many relationships with DAX. Nevertheless, I never thought about using fact sales as a bridge table and the “not” part of the problem was completely new to me. I started to work at this data model and reached the final formulas in less than one hour. Looking at the final formula I am sure that no Excel power user will be able to write such a formula by his own, so I agree that this is not self-service BI.
But now we all know that DAX is the foundation of the tabular side of BISM. This means that BI consultants, with good data modeling skills and a good DAX knowledge, will be able to solve such a scenario for their customers in less than one hour, ending up with a working model the customer can play with. Not a prototype… a working model!
I don’t know what are your feelings about DAX, BISM, tabular and multidimensional, but there is a fact that simply cannot be denied: DAX and the tabular data model are cool. Once you get used to think in DAX, formulas can be written at an amazing speed and they end up working at a more amazing speed. I think there is no doubt that customers will be happy to see what DAX can do for their needs, even if they will not be able to write code by themselves. How many of your customers would be happy to pose you such a question and get answered: “ok, give me an hour and I’ll bring you a working Excel workbook”?
To come to an end, I think that the future of BI is brighter than ever, thanks to the effort of all the guys at Microsoft who spend their time figuring out what we will eventually need and who have decided to implement such a wonderful environment for BI, years before we even started thinking at it. I oftentimes blame MS guys when something does not work as expected but I am very happy to say that whenever I am able to solve complex problems so quickly I can think only one thing of them: “You guys rock, DAX rocks. Thank you and keep on with your great work”.
|
-
Browsing on the web I have seen an interesting question that is worth a post, since I think it is a very common pattern and a good example of “thinking in DAX”. Starting from a table which contains a people and two dates (ActiveFrom, ActiveTo), we want to compute how many days an individual has been active over some period of time, let us say at the month level. As always, a picture is worth a thousand words, here is the input and the desired result:  The solution is very easy by means of using a calendar table. To create one, it is enough to create an Excel table like the following one and use “Create Linked Table” from the PowerPivot tab of the Ribbon.  Once linked in PowerPivot, you end up with the two tables, that I called Activities and Calendar. Beware that there is no need at all to create relationships between the two tables, we will solve the scenario using DAX only. Moreover, there is no means to create these relationships, since the dates represent a time period, not singular dates. To reach the desired result, it is enough to note that, for each record in the Activity table, the number of rows in the calendar table that are between ActiveFrom and ActiveTo, represent the total number of activity days. Thus, the formula is straightforward: =SUMX (
Activities,
CALCULATE (
COUNTROWS (Calendar),
Calendar[Date] >= EARLIER (Activities[ActiveFrom]),
Calendar[Date] <= EARLIER (Activities[ActiveTo])
)
)
This formula seems to always compute, for each row in the activities table, the total number of active days for the individual, since it does not take into account years and months. Surprisingly, if you write this code in a measure and put it on a PivotTable, you will get the result shown at the beginning.
The reason for which the formula works is that the inner CALCULATE creates a filter on the Calendar[Date] but does not change the current filter context on year and month. Thus, for each cell, the filter on year and month is still active and produces the correct result. Moreover, the same formula work for any calendar period without any change.
I guess for PowerPivot newbies this behavior seems like magic, at least it was so for me a few months ago… in reality, understanding PowerPivot is just a matter of understanding row and filter contexts, once you get them formulas are really simple to write.
An interesting exercise, left to the reader, is to count the number of working days of activity. Easier than you might think and surely worth spending some time if you want to learn some DAX data modeling. 
|
|
|
|
|
|