THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

SQLBI - Marco Russo

SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server.
You can follow me on Twitter: @marcorus

Calculate New, Returning, Lost, and Recovered Customers in #dax

Calculating the number of new and returning customers is a recurring question. I would say this is a “classical” Business Intelligence problem, very common in marketing department. I worked on these problems with many customers, with small and large datasets, and I wrote a DAX Pattern “New and Returning Customers” showing how to calculate:

  • New Customers: customers who never made any purchase
  • Returning Customers: customers who bought something in the past
  • Lost Customers: customers who bought something but did not buy in the last N days days
  • Recovered Customers: previously “lost customers” who made a new purchase

This is not a brand new topic, you can find many other blog posts on this topic (Chris Webb, Javier Guillén, Gerhard Brueckl, David Hager, Rob Collie), so my goal was to show very generic formulas that were generally the best solution in term of performance. This make the formula less readable, such as the following:

[Returning Customers] :=
COUNTROWS (
    CALCULATETABLE (
        VALUES ( <customer_key_column> ),
        VALUES ( <customer_key_column> ),
        FILTER (
            ALL ( <date_column> ),
            <date_column> < MIN ( <date_column> )
        )
    )
)

As you see, using CALCULATETABLE ( VALUES ( table[column] ), VALUES ( table[column] ), … ) seems a useless thing. Why counting the rows returned by VALUES and passing it also as a filter argument? This is a not so intuitive behavior of CALCULATE. The first argument is an expression that will be evaluated in a modified filter context. The third argument is a FILTER on the date column, which extends the range of dates considered, including all the past sales transactions. At this point, the first VALUES would return any customers who made a purchase in the past, but the second argument will only considered those that made a purchase in the current selection of time. The final result is an AND condition between two sets of customers (the intersection of the two sets), which is faster than trying to calculate the number of past transactions of each customer who made a purchase in the current selection of time, filtering only those that results in zero transactions.

In general, I prefer using more readable DAX formulas, also in DAX patterns, optimizing them only when necessary. But in this case the performance might be important (visible to the user) also with a few thousands of customers. As usual, any feedback on the New and Returning Customers pattern will be very welcome!

Published Thursday, June 26, 2014 10:13 AM by Marco Russo (SQLBI)

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Ronald said:

Nice one. I wrote some t-sql the do the same, maybe also interesting to take a look at:

http://www.sqlblog.nl/2014/08/calculate-lost-customers-using-t-sql/

Can do the same trick for new/recurring customers.

grts. Ronald

August 20, 2014 5:32 AM
 

Marco Russo (SQLBI) said:

Hi Ronald, thanks for the link!

August 20, 2014 7:14 AM

Leave a Comment

(required) 
(required) 
Submit

About Marco Russo (SQLBI)

Marco Russo is a consultant, writer and trainer specialized in Business Intelligence with Microsoft technologies. He runs the SQLBI.COM website, which is dedicated to distribute resources useful for BI developers, like Integration Services components, Analysis Services models, tools, technical information and so on. Marco is certified as MCT, MCDBA, MCSD.NET, MCSA, MCSE+I.

This Blog

Syndication

Archives

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement