THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

SQLBI - Marco Russo

SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server.
You can follow me on Twitter: @marcorus

  • Power Query now connects to Analysis Services #ssas #mdx #dax

    The last update of Power Query finally supports Analysis Services Multidimensional and Tabular. I waited this version for a very long time, but now it's finally here!

    Chris Webb already wrote an interesting blog post with several useful link and informations. 

    You can connect to both Multidimensional and Tabular, but queries are generated in MDX. For this reason, I consider Multidimensional as a primary source by now. Many improvement can be done for multidimensional, whereas Tabular would benefit from DAX support at first.

    I want to share my feedback and I already look forward to future improvements: please use the comment to this blog post to share your priorities for these features I would like to see.

    Expose Keys

    Each Hierarchy in Multidimensional has one or more levels, corresponding to dimension attributes.
    Each attribute has up to three properties:

    • Key
      • Can be multipart, which means it has one or more columns, each column can have a different data type
    • Name
      • This is always a string. If not defined, it corresponds to the Key, which must have a single column. If the attribute has multipart key, the Name has to be defined in the Multidimensional model
    • Value
      • This property is optional and is not used often. However, it could represent the name in a numeric value and it’s used when the key cannot be used to represent the corresponding value. This property is not shown to the user but can be used in MDX calculations.

    The Name is the one shown in the user interface of power Query. However, when you extract data from a cube, most of the times you need the key column(s) in order to create relationships with other query. For example, a very common scenario is creating three tables, corresponding to two dimensions and a fact table. The keys used to connect the dimension tables to the fact table are mandatory, but are not exposed in the attribute name visible in the user interface of a cube.
    Thus, exposing the Key is very important. Please remember that the Key can be a multipart key, so you might have one or more columns to add.
    If a user request the Key, by default I would show all the columns of a multipart key so he can remove those columns that are not required (but usually you’ll need all of them). Since the Value is not requested so often, I would expose it as a separate menu.

    Surrogate Keys and Other Hidden Attributes

    Depending on the cube design, it would be nice exposing those attributes that are hidden in the cube.
    For example, in a well-designed cube, the model should not expose the surrogate keys in a visible attribute, because this would create a dependency in reports that would break a filter based on a surrogate key in case the relational tables are reprocessed and surrogate keys are regenerated. The general idea is that a surrogate key does not have a semantic meaning. Thus, it shouldn’t be exposed to an end user.
    However, if you are importing several tables from a Multidimensional model, trying to create one table per dimension and one table per fact table (maybe by changing the cardinality, reducing the number of dimensions used), then you should import the surrogate keys too, at least for SCD type 2. Unfortunately, there are no info in the cube model that could help you discriminating between SCD1 and SCD2, so I’m not sure about what could be the best practice in this case. Probably, hidden attributes should be accessible only by advanced users, exposing them by default could be really confusing and I would avoid that.
    This is an area where it’s hard to make a right choice, a compromise is required.

    Related Properties

    An attribute can have other properties related to it. By default, all browsable parent attributes can be considered attribute’s properties. However, when we talk about attribute’s properties we usually refer to the non-browsable attribute. Each non-browsable attribute is just another attribute. Non-browsable attributes are not shown in the list of attributes of a dimension, but they should be available to the user that want to import data for a certain column. The user interface could be designed in several ways for that:

    • Show attribute’s properties in a window from which the user can drag&drop – but maybe confusing – the UI should show only properties existing for a particular attribute and each attribute might have different properties. User interface might be a concern here.
    • Automatically import all the properties of an attribute (maybe by asking confirmation to the end user?) when adding that attribute to a query. Then the user can remove the columns that are not required in the Query.

    Show Formatted Measures

    Sometime it might be useful to import the formatted value of a measure. However, I would not replace the number imported today with the formatted value, because the latter could be a string that does not have any semantic meaning. Providing the option of importing the formatted measure as an additional column in the PowerQuery window would be very welcome, but don’t just replace one with the other.

    Invisible attributes

    Currently, Power Query shows all the attributes of a dimension, which in general is a good thing. However, I would put a checkbox that shows/hides invisible attributes. By default, I would show only visible attributes, because this is what user would be more familiar with. THe “show invisible attributes/columns” should be an advanced view.

    Multiple selection without measures

    if you select attributes from several dimensions without selecting a measure, you obtain as a result the Crossjoin between the tables you selected. In my opinion, this is counterintuitive and useless: I cannot imagine a use case where this would be meaningful. A much better solution would be importing every dimension as a single table, just as you do when you select many tables from SQL Server. It is the user that will handle joins between table, if necessary. My suggestion is to keep the existing behavior (import a single table) only when you import also a measure, even if I would like to be able to import all the dimensions and the set of measures as separate tables in the data model – creating one query for each dimension and one query for each measure group (or for each cube – not sure about the better approach here).
     

  • What are useful tools and resources for DAX developers? #dax #powerpivot #tabular

    At the last PASS Summit I received an interesting question: is there a list of all the useful (I would say necessary…) tools for DAX developers? My answer was… “no, but this is an interesting topic for a blog post”.

    In the meantime, I thought that a page to keep updated would have been better, and of course an easy-to-remember URL is equally important. So here is the URL where you will find an updated list of tools and resources useful to any DAX developer:

    http://sql.bi/daxtools

    Of course, feedback are welcome!

  • Don’t use SUMMARIZE to sum your data–or just be careful #dax #ssas #powerpivot

    During the last PASS Summit I and Alberto Ferrari had long discussions at SQL Clinic with some of the developers of our loved tools. Sometime you really have to dig in the internals of certain feature to understand why there are some “unexpected” behaviors for certain functions. One of the discussions was about SUMMARIZE. This function can be very powerful (after all, it’s a way to do a join between related tables…) but also very dangerous, because of the way it implements its logic (especially for the ROLLUP condition).

    The rule of thumb, that we already mentioned in the past, is to use SUMMARIZE only as a way to execute a sort of SELECT DISTINCT, and not to create column to aggregate values. Use ADDCOLUMNS for this other job. We previously mentioned mainly performance reasons for that, but now we have a more complete description of why you should avoid SUMMARIZE for computing aggregations: you might obtain a different result than the one expected. The complete discussion of the issue and of the workarounds is included in the new article All the secrets of SUMMARIZE written by Alberto Ferrari.

  • BI Announcements at PASS Summit 2014 #sqlpass #powerbi #powerpivot

    This morning the PASS Summit 2014 started in Seattle and during the keynote there was several announcements from Microsoft. I’m considering here only the ones about Business Intelligence (you will find other blogs around about SQL Server).

    • In the coming months, Azure SQL Database will get new features such as column-store indexes, which can be very interesting for creating data marts on the cloud
    • Another upcoming feature in SQL Server will be an updateable columns-store index on in-memory tables. Real-time analytics will like this feature.
    • For a store analysis, an interesting demo using Kinect capturing heatmap to display which areas of a shop store have been visited more using Power Map. Just a demo, but it’s an interesting idea and the best big data demo I’ve been so far (something you can implement in the real world using big data technologies without being Twitter or Facebook).
    • New Power BI dashboards: many new visualizations and a new user interface to place data visualizations on a dashboard (similar to the grid you have in DataZen if you know that product)
      • You can connect to your data source from the cloud, without creating a local data model and sending it to the cloud
      • Q&A is integrated in the new user interface – the web site is a powerbi.com domain, it seems not in SharePoint
      • Q&A generates the report in HTML5, no Silverlight signs here
      • The entire editing is done in a web browser – a preview of that was presented at PASS BA Analytics keynote, this seems a more refined version (still not available, however)
      • TreeMap is available as a new visualizations
      • You can upload an Excel file from your disk or from OneDrive – just Excel file, no Power Pivot data model required (it is created on the fly on the cloud?)
      • Combo chart combining line and bar chart visualization available
      • Private preview now, public preview available soon
      • Request access to public preview on http://solutions.powerbi.com
    • Azure ML is publicly available for free in trial mode

    The Power BI story seems the real big news. Combining this with the fact that you can query *existing* on-prem databases on Analysis Services without moving them on the cloud opens up interesting scenarios. Many questions now about when it will be available and how it will be deployed. Interesting times ahead.

  • Power Query support for Analysis Services (MDX)

    Today at TechEd Europe 2014 Miguel Llopis made the first public show of Power Query support for Analysis Services.

    This is still not available, but it should be released soon (hopefully it will be our Christmas gift!).

    Here is a list of features shown:

    • It should be able to query both Multidimensional and Tabular
    • Generates query in MDX (no DAX by now)
    • Load one table at a time (but a query can mix dimensions and measures)
    • Shows dimensions, measures, hierarchies and attributes in Navigator
    • Use the typical Power Query transformations working on a “table” result
    • You import one table at a time

    I think the last point deserves an explanation. When you write a query in Power Query, the result is a single table. If I want to build a Power Pivot data model getting data from an existing cube in Analysis Services, but with a different granularity, I have to run one query for each dimension and one query for the fact table. Depending on the definition of the cube, this could be easier or harder, because original columns could have been hidden because measures are exposed instead. Moreover, the result of a measure that is not aggregated with a sum (imagine just an average) could be impossible to aggregate in Power Pivot in the right way.

    Thus, if you want your user to take advantage of Power Query, make sure you expose in a model measures that can be aggregated to compute non-additive calculations (such as an average!)

    Now I look forward for receiving this Christmas gift!

  • The state of Multidimensional and Tabular adoption

    SQL Server Analysis Services is a product that does not have its own product code, being sold as part of SQL Server. I think that also Microsoft does not know (and cannot know) the exact number of users. For this reason, it’s also more difficult to analyze the effective usage of the two different model types: Multidimensional and Tabular. Thus, only indirect observations are possible: books sold, web page reads, web searches, support calls, consulting engagements, and so on.

    Accessing to public data is very hard. For example, it’s hard to discriminate between MDX and DAX web searches (Google Trends is an interesting tool for this kind of things). In fact, MDX is also the name of a car mode, and DAX is the main German Stock Index. As a book and blog author, I have some data that I can share.

    I co-authored books that cover SSAS Multidimensional and we recently released a refresh of the Expert Cube Development with SSAS Multidimensional Models. The main change we had to do compared to the first edition published in 2009 was changing the screenshot. The average “life” of a book covering a software product is 2-3 years. Well, this book kept the same level of sales for 3 years, it decreased sales after the release of Analysis Services Tabular, and it increased sales with the release of the new edition.

    expertcubesales

    This can be a predictable behavior, but what is interesting is that the lifespan of this book is much longer than average (of other software product books) and the technology we talk about is not going to disappear in the market.

    So what about SSAS Tabular? The book Microsoft SQL Server 2012 Analysis Services: The BISM Tabular Model has a shortest sales history, but it has a regular trend, with a slow decrease over time.

    ssastabular-sales

    The spike you see in the first quarter is because of the different distribution of the two publishers, so just consider the trend in the following quarters.

    A common question I receive also in comments to my blog posts is about the future of Multidimensional and MDX. Well, I’m pretty much convinced that MDX will be like C++. Nowadays, many people use C# (myself included, and I have been using C++ for many years). But for certain applications, C++ is the only option. Moreover, MDX is used by other products (Pentaho, SAP HANA, and others), whereas by now DAX is used only by Microsoft products (Power Pivot and SSAS Tabular).

    I think that we will not see a single language/technology replacing everything we are using now. There is an interesting movement towards “R”, but I see it as an additional tool for certain users and usages, and not as something that can replace existing technologies. It is interesting to consider the effects of integration of these technologies, at least from a Microsoft point of view. The recent article on SQLBI about AutoExist and Normalization is just an example of the work we have to do in order to understand how to get the best from this integration, hoping that Microsoft will continue investments on the server side.

    I do not expect major changes in MDX, but I would prioritize a better integration between MDX and DAX in both Multidimensional and Tabular. For example, I would love being able to write a DAX expression for an MDX measure in a Multidimensional model. Imagine the power of writing a leaf-level calculation in DAX, propagating its effects in attribute hierarchies defined in a cube. Very powerful, not easy to master, but really interesting for certain conditions (that I’m not inventing – I’ve actually seen customers that would have used it if it was possible!).

    What about DAX? It’s an interesting language, much easier to learn than MDX for people with a SQL language background. I think it will be improved, we’re still working with a version 1 (remember MDX evolved between 2009 and 2005, adding SCOPE and other features), the only concern is that it is a Microsoft language, whereas MDX has been adopted by a number of companies (and we have many dialects – so don’t expect to see all MDX formulas working the same in all MDX products…).

    So, when I say that I don’t see the need of migrating Multidimensional models to Tabular or other technologies, I have the support of the market adoption, which means there will be tools, support and knowledge. Can Microsoft do more and better for keeping this technology alive? Yes, definitely. Is the Multidimensional world lacking features that would be useful? Yes, but not in the measure new features are required by Tabular now. The next major release of Analysis Services will be important to answer to these questions.

  • A strange behavior of AutoExist for MDX on Tabular models #powerpivot #ssas #tabular

    Alberto Ferrari wrote an interesting article about a strange behavior of AutoExist in normalized data models. I always say that a star schema is the best practice in Power Pivot and Tabular data modeling. The issue described by Alberto is another good reason to avoid snowflake schemas.

    I think that an example is better than many words. Consider this simple measure working in a star schema where all product attributes (such as Category and Subcategory) are in the same denormalized DimProduct table:

    SalesOfBikes := CALCULATE ( [Sales], DimProduct[Category] = "Bikes" )

    If you have a snowflake schema with DimProduct, DimProductSubcategory and DimProductCategory tables, you have to write a much longer and complex DAX formula in order to obtain the same result:

    SalesOfBikes :=
    CALCULATE (
        [Sales],
        DimProductCategory[EnglishProductCategoryName] = "Bikes",
        CALCULATETABLE (
            DimProductCategory,
            FILTER (
                ALL ( DimProductSubcategory ),
                IF (
                    ISFILTERED ( DimProductSubcategory[EnglishProductSubcategoryName] ),
                    CONTAINS (
                        VALUES ( DimProductSubcategory ),
                        DimProductSubcategory[ProductSubcategoryKey], DimProductSubcategory[ProductSubcategoryKey]
                    ),
                    TRUE
                )
            )
        )
    )

    Which seems crazy, and actually it is…

    The reasons are interesting and well described in the AutoExist and Normalization article on SQLBI.

  • Questions and Answers about SSAS Tabular Models #ssas #tabular

    I recently delivered the online session Create your first SSAS Tabular Model at 24HOP Pass Summit, which recording is now available here. I received many questions and did not have enough time, so I answer now in this blog post.

    How do you prevent a user from aggregating certain measures where the result would be invalid (example: unit margin %)?
    In Tabular you do not have the notion of “aggregation”. Every measure evaluates a DAX expression in a particular filter context. Imagine this as a SELECT statement in SQL that consider only the rows filtered by a WHERE condition. You might “remove” the visualization of a value in a measure by using DAX functions such as ISFILTERED, ISCROSSFILTERED and HASONEVALUE. Since you do not have the ability to “intersect” expressions in different dimensions as you can do in MDX using Tool dimensions, you do not have the problem of invalidate certain combinations (such as “Unit” and “Margin %”.

    How do you incrementally refresh this new Tabular model that was created?
    You can add data to an existing partition, create new partitions or reprocess existing partitions.

    Can you connect to SSRS report from Power View to drill down to the actual data rows?
    Currently this is not available. In general, you cannot add link to external URLs in the current version Power View.

    Can you create those views within the model instead of in the database, like you can with cube creation?
    No, in Tabular you do not have the notion of “Data Source View” like the one you have in Multidimensional. By the way, I do not consider a “best practice” embedding queries in a DSV in Multidimensional. If you can create views on the relational database, you simplify troubleshooting in case of data quality issues reported by end users (any DBA can check the views used, even without any knowledge about BI development tools).

    Can many to many relationships work in tabular model?
    In this version, you cannot create a many-to-many relationship directly in the data model, but you can apply many-to-many relationships in DAX formulas. The good news is that performance are usually faster than equivalent models in Multidimensional.

    Any changes for Tabular model between 2012 and 2014 versions of SSAS?
    No, there are no changes in Analysis Services between 2012 and 2014. SQL Server 2014 is a release that added new features only to the relational engine of SQL Server.

    Can you give a few examples of benefits over using multi-dimensional cubes?
    Tabular is easier to use, it is usually faster and it requires no maintenance (Multidimensional requires maintenance of aggregations as data volume grows and data distribution changes). Multidimensional has features not available in Tabular (e.g. custom rollup formulas, MDX Script, dynamic formatting for measures).

    How about the role playing dimensions concept can work in Tabular model? Is there an equivalent of role-playing dimensions in Multidimensional in Tabular?
    Role-playing dimensions are a usually a bad idea in Multidimensional, because you cannot rename hierarchies, attributes and member names. This result in confusing pivot table when you browse the data, considering that the only difference is the dimension name, which is not directly visible in the pivot table itself.
    Tabular does not support role-playing dimensions, but you can overcome that limitation by using DAX, enabling inactive relationships for specific calculations (you can define multiple relationships between the same tables in Tabular). However, if you want to offer navigation in different role-playing dimensions to the user, the best practice for both Multidimensional and Tabular is to import the same table multiple times, renaming data and metadata.

    Can you install regular SSAS and the Tabular model on the same server?
    Yes, you can install several instances of SSAS on the same server. You run the setup multiple times and choose for each instance whether it has to run as Multidimensional or Tabular. You make this choice during the setup.

    Can you use a server based Tabular model for O365 Power BI sheets? (on premise data – SharePoint cloud spreadsheets)
    Not yet – at the moment (September 2014) you can only publish a Power Pivot workbook and then refresh it getting on-premise data through the Data Management Gateway.

    Can I create a Tabular Model using a query as the source?
    Yes, but remember that this is not a best practice. Creating SSAS Tabular or Multidimensional models, the best practice is getting data from SQL views, without modifying the query on SSAS side. In this way, the content of a Tabular table will match the result of a view in SQL. This simplifies the maintenance and the support operation. Any DBA can check the result of a SQL view without having to open an Analysis Services project just to figure out where actual data comes from.

    How I can manage access of the data by level (For example: first group have access to 2014 year, but second group have access to all periods)?
    Role-based security allows you to create row-level security for each table in the data model. Thus, you can filter the rows of the tables you want to hide to a certain group of user. You define filter conditions using logical DAX expressions that are evaluated when each user creates a connection to SSAS Tabular model.

    Does the language M has something to do with the SQL server or it is used only in Excel?
    The language “M” is used only by Power Query, which is used only in Excel and in Data Management Gateway today. The “M” language can produce transformations in SQL, but you cannot obtain an “M” version of a SQL query.

    What happens if the source data of a Tabular model is not a star schema?
    A star schema is the best data model for Tabular. However, you can have more complex data models, but keep in mind that more tables and relationships might cause slower performances at query time.

    How do you handle multiple dates in a fact table that you want to attach to a date table?
    This is a question similar to the role-dimension one. You can import the Date table multiple times (possibly by renaming columns and content, reflecting the “role” of the dimension in these names), or you can create multiple relationships between Date dimension and Fact table, activating one relationship for each measure through the USERELATIONSHIP function in a CALCULATE statement.

  • Advanced DAX Workshops in the upcoming months –#dax #tabular #powerpivot

    The adoption of DAX is slowly growing and we can see that from comments to article, blog posts, web site traffic (including DAX Formatter usage). For this reason, the demand for more advanced training about DAX is also increasing. We had great feedback from our last white papers (Using Tabular Models in a Large-scale Commercial Solution and SSAS Tabular as Analytical Engine) and we see that this reflect in a broader adoption. For this reason, also the demand for more advanced DAX content is growing, and we planned a number of courses in the next future.

    I and Alberto Ferrari will travel a lot in the upcoming months to deliver many Advanced DAX workshops, but we still have some more introductive training coming soon.

    This is the list of upcoming Advanced DAX Workshop (only in Europe by now, hopefully US coming next…):

    We will have two pre-conferences at PASS Summit (3-4 November, 2014 – Seattle, WA):

    If you want to learn SSAS Tabular starting from scratch, you have these other options to attend the SSAS Tabular Workshop:

    We are also working on other content for the next months, and remember that we can deliver onsite training if your team is too large and/or too far from one of the public classroom sites. I look forward to see new things announced in the next weeks/months, and I am pretty sure DAX will grow its popularity (despite the lack of a good editor!).

  • Simulate the SQL WHERE condition in DAX–#dax #powerpivot #ssas

    I recently wrote a short article about filtering data in DAX. In the last month I’ve seen a growing number of people with a SQL background going to Tabular, without having a previous experience of MDX. In my opinion, learning DAX is easier if you try to match its features to SQL rather than to MDX. The reality is that DAX is somewhere in the middle, because join between tables can be implicit in a query (like in MDX), but you can create filters that resembles existing “virtual relationships” (like in SQL with explicit JOIN).

    For this reason, I recently restarted writing articles of the series “From SQL to DAX”, and I already published

    I am already working on other articles to enhance the series – feedback and suggestions for the future topics are welcome!

  • MDX Studio refresh

    If you work with MDX, using MDX Studio is a must, but it has not been updated for years. But community is great and Ihor Bobak has been able to apply two fixes: performance counters work with named instance and there is a single version regardless of the client libraries, so you have only one executable working with all available versions of Analysis Services (2005, 2008, 2008 R2, 2012, and 2014).

    You can download this refreshed version of MDX Studio from http://sql.bi/mdxstudio (we also have a nice shortened path now!).

  • Embed Analysis Services Tabular in your service or application #ssas #tabular

    Since 2012 I have seen many companies adopting SQL Server Analysis Services Tabular as the analytical engine for their product or service. I think that this is still a fraction of the companies that might do the same choice. Why? Because…

    The features already existing in Tabular might be enough to justify its adoption as analytical engine.

    This short statements needs more explanations, so I wrote a longer article titles SSAS Tabular as Analytical Engine, which is available also as a downloadable PDF that can be read offline and shared by mail. There are many other companies that are looking for an analytical engine for their applications. I hope that the experiences I tried to share will help these companies to better evaluate whether Tabular could be a model good for their needs or not.

    Have you made this choice? Have you considered (and maybe adopted) other products in one of the described scenarios?
    Feedbacks are welcome!

  • Possible SWITCH Optimization in DAX – #powerpivot #dax #tabular

    In one of the Advanced DAX Workshop I taught this year, I had an interesting discussion about how to optimize a SWITCH statement (which could be frequently used checking a slicer, like in the Parameter Table pattern).

    Let’s start with the problem. What happen when you have such a statement?

    Sales :=
        SWITCH (
            VALUES ( Period[Period] ),
            "Current", [Internet Total Sales],
            "MTD", [MTD Sales],
            "QTD", [QTD Sales],
            "YTD", [YTD Sales],
             BLANK ()
        )

    The SWITCH statement is in reality just syntax sugar for a nested IF statement. When you place such a measure in a pivot table, for every cell of the pivot table the IF options are evaluated. In order to optimize performance, the DAX engine usually does not compute cell-by-cell, but tries to compute the values in bulk-mode. However, if a measure contains an IF statement, every cell might have a different execution path, so the current implementation might evaluate all the possible IF branches in bulk-mode, so that for every cell the result from one of the branches will be already available in a pre-calculated dataset.

    The price for that could be high. If you consider the previous Sales measure, the YTD Sales measure could be evaluated for all the cells where it’s not required, and also when YTD is not selected at all in a Pivot Table. The actual optimization made by the DAX engine could be different in every build, and I expect newer builds of Tabular and Power Pivot to be better than older ones. However, we still don’t live in an ideal world, so it could be better trying to help the engine finding a better execution plan.

    One student (Niek de Wit) proposed this approach:

    Selection :=
    IF (
        HASONEVALUE ( Period[Period] ),
        VALUES ( Period[Period] )
    )

    Sales :=
    CALCULATE (
        [Internet Total Sales],
        FILTER (
            VALUES ( 'Internet Sales'[Order Quantity] ),
            'Internet Sales'[Order Quantity]
                = IF (
                    [Selection] = "Current",
                    'Internet Sales'[Order Quantity],
                    -1
                )
        )
    )

        + CALCULATE (
            [MTD Sales],
            FILTER (
                VALUES ( 'Internet Sales'[Order Quantity] ),
                'Internet Sales'[Order Quantity]
                    = IF (
                        [Selection] = "MTD",
                        'Internet Sales'[Order Quantity],
                        -1
                    )
            )
        )
        + CALCULATE (
            [QTD Sales],
            FILTER (
                VALUES ( 'Internet Sales'[Order Quantity] ),
                'Internet Sales'[Order Quantity]
                    = IF (
                        [Selection] = "QTD",
                        'Internet Sales'[Order Quantity],
                        -1
                    )
            )
        )
        + CALCULATE (
            [YTD Sales],
            FILTER (
                VALUES ( 'Internet Sales'[Order Quantity] ),
                'Internet Sales'[Order Quantity]
                    = IF (
                        [Selection] = "YTD",
                        'Internet Sales'[Order Quantity],
                        -1
                    )
            )
        )

    At first sight, you might think it’s impossible that this approach could be faster. However, if you examine with the profiler what happens, there is a different story. Every original IF’s execution branch is now a separate CALCULATE statement, which applies a filter that does not execute the required measure calculation if the result of the FILTER is empty. I used the ‘Internet Sales’[Order Quantity] column in this example just because in Adventure Works it has only one value (every row has 1): in the real world, you should use a column that has a very low number of distinct values, or use a column that has always the same value for every row (so it will be compressed very well!). Because the value –1 is never used in this column, the IF comparison in the filter discharge all the values iterated in the filter if the selection does not match with the desired value.

    I hope to have time in the future to write a longer article about this optimization technique, but in the meantime I’ve seen this optimization has been useful in many other implementations. Please write your feedback if you find scenarios (in both Power Pivot and Tabular) where you obtain performance improvements using this technique!

  • New training on Power Pivot with recorded video courses

    I and Alberto Ferrari started delivering training on Power Pivot in 2010, initially in classrooms and then also online. We also recorded videos for Project Botticelli, where you can find content about Microsoft tools and services for Business Intelligence. In the last months, we produced a recorded video course for people that want to learn Power Pivot without attending a scheduled course.

    We split the entire Power Pivot course training in three editions, offering at a lower price the more introductive modules:

    • Beginner: introduces Power Pivot to any user who knows Excel and want to create reports with more complex and large data structures than a single table.
    • Intermediate: improves skills on Power Pivot for Excel, introducing the DAX language and important features such as CALCULATE and Time Intelligence functions.
    • Advanced: includes a depth coverage of the DAX language, which is required for writing complex calculations, and other advanced features of both Excel and Power Pivot.

    There are also two bundles, that includes two or three editions at a lower price.

    Most important, we have a special 40% launch discount on all published video courses using the coupon SQLBI-FRNDS-14 valid until August 31, 2014. Just follow the link to see a more complete description of the editions available and their discounted prices. Regular prices start at $29, which means that you can start a training with less than $18 using the special promotion.

    P.S.: we recently launched a new responsive version of the SQLBI web site, and now we also have a page dedicated to all videos available about our sessions in conferences around the world. You can find more than 30 hours of free videos here: http://www.sqlbi.com/tv.

  • DIVIDE vs division operator in #dax

    Alberto Ferrari wrote an interesting article about DIVIDE performance in DAX. This new function has been introduced in SQL Server Analysis Services 2012 SP1, so it is available also in Excel 2013 (which still doesn’t have other features/fixes introduced by following Cumulative Updates…). The idea that instead of writing:

    IF ( Sales[Quantity] <> 0, Sales[Amount] / Sales[Quantity], BLANK () )

    you can write:

    DIVIDE ( Sales[Amount], Sales[Quantity] )

    There is a third optional argument in DIVIDE that defines the result in case the denominator (second argument) is zero, and by default its value is BLANK, so I omitted the third argument in my example.

    Using DIVIDE is very important, especially when you use a measure in MDX (for example in an Excel PivotTable) because it raise the chance that the non empty evaluation for the result is evaluated in bulk mode instead of cell-by-cell. However, from a DAX point of view, you might find it’s better to use the standard division operator removing the IF statement. I suggest you to read Alberto’s article, because you will find that an expression applying a filter using FILTER is faster than using CALCULATE, which is against any rule of thumb you might have read until now!

    Again, this is not always true, and depends on many conditions – trying to simplify, we might say that for a simple calculation, the query plan generated by FILTER could be more efficient – but, as usual, it depends, and 90% of the times using FILTER instead of CALCULATE produces slower performance. Do not take anything for granted, and always check the query plan when performance are your first issue!

More Posts Next page »

This Blog

Syndication

Archives

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement