To my mind the most interesting piece of news to come out of the recent PASS conference was the unveiling of a new SQL Azure Labs project from the SQL Server organisation, codenamed "Data Explorer" (not a very imaginative codename, I'm sure you'll agree). Information is available at http://www.microsoft.com/en-us/sqlazurelabs/labs/dataexplorer.aspx (if you've surfed in here a few months after I originally wrote this blog post, you should expect that URI to have become a dead link).
My good buddy Chris Webb (blog | twitter) has already blogged about Data Explorer at Pass Summit 2011 - Day 1 Keynote and Self-Service ETL with Data Explorer in which he made a very telling observation:
It allows you to mash up data from various different sources then publish the result as an OData feed – very similar to Yahoo Pipes
I couldn't agree more with that assertion. I blogged about Yahoo Pipes over four years ago at Taking Yahoo Pipes for a test drive and I referred to it then as "ETL for RSS feeds". It interested me greatly because here was a tool that enabled non-developers to pull data from multiple sources and make it available as a single data source that could be easily consumed; moreover it ran as a cloud service, which has also long been an interest of mine. Granted, it only did this for RSS feeds but the premise was still really interesting to me. I believe that making data easily consumable is far more important than the tool chosen to consume it, which is why I'm such a massive advocate of iCalendar for BI and why you'll rarely find me talking about the likes of Business Objects, Cognos, QlikView, Tableau and Power View on this blog (no disrespect intended to those tools or the people that use them - they're just not what floats my boat).
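To make the "ETL for RSS feeds" idea a little more concrete, here is a minimal Python sketch of pulling items from more than one feed and presenting them as a single, date-ordered data source. It is only an illustration of the concept, not how Yahoo Pipes or Data Explorer actually work; the two inline feeds, the example.com links and the function names extract_items and mash_up are all made up for the purpose.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Two tiny, made-up RSS documents standing in for real feeds (assumption).
FEED_A = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>A1</title><link>http://example.com/a1</link>
<pubDate>Mon, 10 Oct 2011 09:00:00 GMT</pubDate></item>
<item><title>A2</title><link>http://example.com/a2</link>
<pubDate>Wed, 12 Oct 2011 09:00:00 GMT</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Feed B</title>
<item><title>B1</title><link>http://example.com/b1</link>
<pubDate>Tue, 11 Oct 2011 09:00:00 GMT</pubDate></item>
</channel></rss>"""


def extract_items(rss_text):
    """Extract: turn one RSS document into a list of plain dictionaries."""
    root = ET.fromstring(rss_text)
    source = root.findtext("channel/title", default="")
    items = []
    for item in root.iter("item"):
        items.append({
            "source": source,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # RSS dates use the RFC 822 format, which email.utils can parse.
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        })
    return items


def mash_up(*feeds):
    """Transform: merge the items of every feed, newest first."""
    merged = [row for feed in feeds for row in extract_items(feed)]
    return sorted(merged, key=lambda row: row["published"], reverse=True)


# Load: here we simply hold the combined rows in memory; a real service
# would republish them as a new feed.
combined = mash_up(FEED_A, FEED_B)
titles = [row["title"] for row in combined]
```

The consumer of combined neither knows nor cares that the rows came from two different places, which is really the whole point.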
Where Yahoo Pipes consumes RSS feeds and provides RSS feeds, Data Explorer consumes from loads of different places and provides OData feeds (something I've been banging on about for a while now), and if you're in the Microsoft ecosystem OData is increasingly looking like the lingua franca for platform- and device-independent data integration. Moreover, according to the recent blog post Creating a custom RSS reader in Montego (cloud) by project lead Tim Mallalieu, Data Explorer will also be able to pull data directly out of web pages, and that is stepping firmly into the territory of Kapow which, again, is a tool that Chris and I have blogged about before at Kapow – ETL for HTML and Kapow Technologies. Chris referred to Kapow as:
a cross between a screenscraper and an ETL tool
and again I wouldn't disagree. Data Explorer looks like filling the missing link that I was alluding to in the final paragraphs of my June 2009 blog post Enterprise Mashups.
Are you spotting a common theme here? Data Explorer is an ETL tool and given my obvious SSIS affiliations that makes it very interesting to me. That it runs as a cloud service and will be available to non-developers only makes it more intriguing and I can't wait until Data Explorer becomes available for us to tinker with later this year. No doubt Chris will be keeping a watching brief too.
@Jamiet
UPDATE: Some further thoughts...
It would be interesting to see what else could be done with this data once it's exposed as a feed. I'll wager that in the not too distant future you'll be able to (for example) sell the output from your Data Explorer mashup on Azure Datamarket or view geocoded feeds on Bing Maps (note that Geospatial support is coming to OData in the very near future). There are lots of possibilities I'm sure and I'm looking forward to seeing what ideas others have for using and sharing this data.
I'm also wondering whether there will be an option to host Data Explorer (and hence Data Explorer mashups) inside the enterprise. Today most enterprise data is contained within the corporate firewall and thus will not be accessible from a Data Explorer service provided via SQL Azure. It would be a shame if such data could not be accessed by Data Explorer, which is why I hope there will be an on-premises version available. I can think of many scenarios at my past clients where the ability to easily make data consumable over HTTP and behind the firewall would have been invaluable.