As soon as I had the possibility to test HDInsight on Azure I promptly started using it. Nothing really exciting, just creating the cluster and do some tests following the official tutorial you can find here:
I’m using my MSDN MVP subscription for which I have 1500 hours of “Compute Services”. Well, be careful to keep your HDInsight cluster turned on. Even if you don’t use it it will consume resources. The the resource created behind the scene is a “LargeSKU” VM, as the detailed usage report that you can download says.
And, wow, look! In just less then a week it has consumed ALL my available hours, and thus my CC started to be drained. Non that much, luckily, just a hundred of bucks (which is also not so few in this times of crisis) and even more luckily I’m very careful to monitor Azure expenses often since I still not trust the pay-per-use system so much to just leave it alone and without constant supervision. And as soon as I discovered that there was something strange going on I shout the HDInsight cluster down immediately.
Unfortunately the expenses are reported under the generic “Compute Hours – Cloud Services” summary so without downloading the detailed billing report and analyzing it with Excel was impossible to me to understand that the resources consumption came from the HDInsight cluster.
That’s why took me two days to understand the problem (at the beginning I thought the problem was my website) but on May 24th I finally understood what was happening.
To be honest this is the only case where I had such bad surprise (I’m also using VMs, Web Roles and SQL azure and I never consumed more then what I expected so far so I’m not blaming MS at all here), but this is a lesson learned that I want to share with the community, hoping to help someone to avoid even worse surprises.
The feedback to MS I’d like to give is that it would be very good if, at least for the “preview” features, there could be a specific settings to decide how much resources they could use before they are automatically shut down.
And the conclusion, on my side, is that, at least for now, playing with Hadoop on this area of the Big Data universe is better done on-premise on the spare machine I have in the office.
Of course what happened could also totally be my fault, even if I just did everything following the tutorial, but if someone with more experience on Azure HDInsight would like to leave a feedback I’ll be more than happy the ear it.
So…keep your eyes open and your money safe
Another feedback to MS: it would be very very very nice if the billing details can be accessed via OData. It would be a perfect match with PowerPivot capabilities!