It's time for the fifth T-SQL Tuesday, managed this time by Aaron Nelson of SQLVariations. Once again, the deadline came up just too quickly, and I'm on the road this week, so my entry will not be too long. Aaron's topic is reporting and in keeping with my past posts, this contribution will include a history lesson.
Since I first learned SQL, I've always thought of aggregation as a way of producing simple reports. Summary information was frequently all that was needed on an ongoing basis to see what was going on in an organization. Clients would want straightforward summaries of sum of sales per region or average orders per day or per month, or number of products available per category. Once a TSQL programmer was fluent with the GROUP BY operator, these simple reports could be generated very easily.
But what if we wanted just a bit more? What if we to see a summarization of total sales amount by region and salesperson, and also just by region?
Way back in the earliest versions of Transact-SQL before we had any 3rd party report writing products, any results we wanted had to be generated purely through T-SQL. We could use a clause called COMPUTE BY that allowed us to get aggregated values for any column in the SELECT list, and we could have multiple COMPUTE BY clauses for a single SELECT. This functionality gave us some ability to have multiple aggregations in a single query, but it had some drawbacks. First, we couldn't just return the aggregate information… we always had to return the detail rows also. Second, the rows that contained the aggregates ONLY contained the requested aggregates, which were based on a single column in the SELECT list; they did not include a value for the nonaggregated columns. This meant that that the result set was non-relational… the detail rows contained different columns than the COMPUTE BY rows. In fact, it was by understanding COMPUTE BY all those years ago that I learned exactly what was meant by relational result set. [COMPUTE BY is actually still in the product, even thought it produces non-relational results. However, the documentation for SQL Server 2008 indicates that it will finally be removed in the next version. ]
Having an interest in using aggregation for reporting, I was very interested in two new options to GROUP BY that were introduced in SQL Server 6.5. There were the CUBE and ROLLUP extensions. Their addition to the SQL language were described in a paper by Jim Gray in 1997, which you can read here:
This paper was a real eye-opener when I first read it 13 years ago. And it does an awesome job of explaining why a CUBE is called a CUBE. I'm reproducing one figure from this paper here, that shows the genesis of the cube, when grouping by three different 'dimensions'. The example in the paper is a car dealership, and the grouping values are year, make and color of vehicle.
Although I have not begun to explore analysis services or reporting services in any depth at all, I find that a thorough understanding of what is a cube and why we should care allows me to at least understand why these services are such powerful additions to the SQL Server product line.