This post is not going to be about SQL Server. Recently I have been reading more and more about “Big Data”. It is a very catchy term that describes the untamed growth of the data mankind produces each day, and the struggle to capture the meaning of that data. Ten years ago, and perhaps even three years ago, this need was not widely recognized. The increasing number of smartphones, and the discernible trend of mainstream Internet traffic shifting to smartphones, mean that an ever bigger stream of information has to be stored, transformed, analysed and perhaps monetized. The nature of this traffic makes it very difficult to fit within the boundaries of relational database engines, and its sheer volume makes it nearly impossible to process in relational databases within a reasonable time. This is where ‘cloud’ technologies come into play.
I just read a good article about the growing pains of Hadoop, which has become one of the leading players in the distributed processing arena within the last year or two. Toby Baer concludes in it that the lack of enterprise-ready toolsets hinders Hadoop’s adoption in the enterprise world. While this is true, something else drew my attention. According to the article, there are already about half a dozen commercially supported distributions of Hadoop. For someone like me, who has not been involved in the intricacies of the open-source world, this is quite an interesting observation. On the one hand, competition is good, as it ultimately benefits the customer. On the other hand, the customer faces the difficulty of choosing the right distribution. In the future, as Hadoop distributions fork even further, this choice will become harder still. The distributions will have overlapping sets of features, yet will be largely incompatible with each other. I suppose it will take a few years before leaders emerge and the market begins to resemble what we see in the Linux world. There are myriad Linux distributions, but only a few are acknowledged by the industry as an enterprise standard. The others are honed by bearded individuals with too much time on their hands.
In any case, the third thing I can’t help but notice about the proliferation of Hadoop distributions is that IT professionals will not be short of jobs.