THE SQL Server Blog Spot on the Web


Aaron Bertrand

Aaron is a senior consultant for SQL Sentry, Inc., makers of performance monitoring and event management software for SQL Server, Analysis Services, and Windows. He has been blogging here at sqlblog.com since 2006, focusing on manageability, performance, and new features; has been a Microsoft MVP since 1997; tweets as @AaronBertrand; and speaks frequently at user group meetings and SQL Saturday events.

Blogging from the PASS Summit Keynote : Day 3

This is a rough morning. Got in at 2:00 AM, then had no water in my hotel this morning. I doubt I will ever voluntarily stay at the Seattle Hilton again. Still, I'm very much looking forward to this keynote - the illustrious Dr. David DeWitt is going to make my brain hurt today!

First, though, Rick Heiges introduces Rob Farley and Buck Woody, who walk out playing a pretty funny acoustic song about query tuning. Then there is a touching tribute to Wayne Snyder, who is retiring after this year.

SQL Rally Dallas will be held May 11th-12th. PASS Summit 2012 will be held November 6-9 in Seattle. Check out the early registration prices valid until November 15th.

Dr. David DeWitt takes the stage and starts his keynote, entitled "Big Data: What's the Big Deal?" He says that Big Data is tens of petabytes. I take that as, "Stop saying 'huge' when talking about your 40 GB table." A zettabyte (ZB) can be thought of as a quadrillion megabytes, a trillion gigabytes, or a billion terabytes. No matter your interpretation, that's a lot of data. 35 ZB, the amount of data we should have by 2020, can be represented by a stack of DVDs reaching halfway to Mars. Some breakdowns of properties managing big data:

  • eBay: 10 PB, 256 nodes
  • Facebook: 20 PB, 2,700 nodes
  • Bing: 150 PB, 40,000 nodes

NoSQL does not mean "SQL should never be used" or that "SQL is dead." What it means is "Not Only SQL." He talks about the benefits of NoSQL and how it trades consistency for availability. Relational databases provide maturity and stability at the cost of flexibility. Look back at the comparison above: eBay, with half the data Facebook has, gets by with roughly 10% of the nodes. He explains that relational databases are not going away; we will all still have jobs regardless of how popular NoSQL gets. Dr. DeWitt promises this, and I'm going to hold him to it.
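To put numbers on that comparison, here is a quick back-of-the-envelope sketch in Python. It uses only the three data points quoted above; the variable and function names are mine, and node counts are of course only a rough stand-in for computing power, since the notes say nothing about what hardware each node is.

```python
# Figures exactly as quoted in the keynote notes: (petabytes, nodes).
clusters = {
    "eBay": (10, 256),
    "Facebook": (20, 2700),
    "Bing": (150, 40000),
}


def nodes_per_petabyte(petabytes, nodes):
    """How many nodes a property runs for each petabyte it manages."""
    return nodes / petabytes


for name, (petabytes, nodes) in clusters.items():
    ratio = nodes_per_petabyte(petabytes, nodes)
    print(f"{name}: {ratio:.1f} nodes per PB")

# eBay's node count as a fraction of Facebook's (the "10%" in the text).
ebay_share = clusters["eBay"][1] / clusters["Facebook"][1]
print(f"eBay runs {ebay_share:.1%} of Facebook's node count")
```

By this measure eBay runs about 26 nodes per petabyte against Facebook's 135, which is the gap the keynote was pointing at.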

Hadoop and MapReduce offer scalability, a high degree of fault tolerance, relatively easy programmability, efficient data analysis, and lower up-front software / hardware costs (but not necessarily lower TCO).

HDFS is the underlying file system for Hadoop, and it is designed on the assumption that failures are common. Files are written once and read many times. It actually sits on top of NTFS and uses 64 MB blocks. Each block is stored three times: the first copy on the node that creates the file, the second on another node under the same switch (same rack), and the third on a different rack. This maximizes fault tolerance at the lowest performance cost. He gives a great explanation of the Hadoop fault tolerance model, MapReduce, HiveQL, Hive vs. PDW, and Sqoop (a bridge between Hadoop and RDBMS), but I am not even going to try to reproduce that here. If you didn't see the keynote (live or streaming), you should definitely consider the DVDs.
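For anyone who wants to see the block and replica rules spelled out, here is a minimal Python sketch of the placement as I noted it down from the talk. This is a toy model, not Hadoop code: the function names, the rack dictionary, and the random choice of peer nodes are all my own assumptions for illustration.

```python
import math
import random

BLOCK_SIZE_MB = 64  # block size as quoted in the keynote


def block_count(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of blocks needed to hold a file of the given size."""
    return math.ceil(file_size_mb / block_size_mb)


def place_replicas(writer, racks, rng=random):
    """Pick three nodes for one block, following the rule in the notes.

    racks maps a (hypothetical) rack name to the list of node names in it.
    Copy 1 goes on the writing node, copy 2 on another node in the same
    rack, copy 3 on a node in a different rack.
    """
    writer_rack = None
    for rack, nodes in racks.items():
        if writer in nodes:
            writer_rack = rack
            break
    if writer_rack is None:
        raise ValueError(f"unknown node: {writer}")

    same_rack = [n for n in racks[writer_rack] if n != writer]
    other_racks = [
        n for rack, nodes in racks.items() if rack != writer_rack for n in nodes
    ]
    if not same_rack or not other_racks:
        raise ValueError("need a same-rack peer and at least one other rack")

    return [writer, rng.choice(same_rack), rng.choice(other_racks)]


racks = {"rack1": ["node-a", "node-b"], "rack2": ["node-c", "node-d"]}
print(block_count(1024), "blocks for a 1 GB file")
print(place_replicas("node-a", racks))
```

Losing a single node, or even a whole rack, still leaves at least one copy of every block, and only one of the three writes has to cross between racks.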

For next year's keynote, I voted for "Main Memory Database Systems." You can tell him what you want to hear about at dewitt@microsoft.com.
 

Published Friday, October 14, 2011 9:00 AM by AaronBertrand
