T-SQL is a declarative programming language. This means that a T-SQL developer defines what (s)he wants the query to do, not how to do it. The secret sauce that turns the “what” into the “how” is a piece of software called the query optimiser and there are tomes dedicated to how one can make the query optimiser dance to their will. Generally the existence of a query optimiser is a good thing as in most cases it will do a better job of figuring out the “how” than a human being could*.
SSIS dataflows on the other hand are an imperative programming language**, the data integration developer dataflow builds a data pipeline to move data exactly as (s)he desires it to happen. In SSIS, There is no equivalent of the query optimiser.
I’ve often pondered whether there is an opportunity for someone to build a declarative data pipeline, that is, a method for a data integration developer to define what data should be moved rather than how to move it. Over the last few months I’ve come to the realisation that Power Query (formerly known as Data Explorer) actually fits that description pretty well. If you don’t believe me go and read Matt Masson’s blog post Filtering in Data Explorer in which he talks about query folding:
[Power Query] will automatically push filters directly to the source query
Even though we selected the full table in the UI before hiding the columns certain columns, we can see the source query only contains the columns we want
“filters” aren’t the only type of query folding that Data Explorer can do. If you watch the queries, you’ll see that other operations, such as column removal, renaming, joins, and type conversions are pushed as close to the source as possible.
Let’s not idly dismiss this. Query folding is very impressive and is (I think) analogous to predicate pushdown that is done by the query optimiser (for more on predicate pushdown read Nested Views, Performance, and Predicate Pushdown by Dave Wentzel). Does Power Query then have the equivalent of a query optimiser? I would say yes, it does although its more accurate to say that its the underlying query language called M for which this query optimiser exists!
So, we have a declarative data integration language which is surfaced in Power Query and hence can only (currently) push data into an Excel spreadsheet. Imagine if M were extended were extended into a fully-fledged data integration tool that could move data between any heterogeneous source, would that constitute “declarative ETL” and is there a need for such a tool?
I’ll leave that one out there for you to ponder. Comments would be most welcome.
*OK, you might debate whether the query optimiser can do a better job than a human but let’s save that for another day.
**We can debate whether or not SSIS dataflows are a programming language or not but again, let’s save that for another day. For the purposes of this discussion just go with me, OK.