Relational?!? Move On, Geezer!
Maybe you're thinking that relational databases management systems (RDBMSs), like Microsoft SQL Server and Oracle, are going the way of punched cards and rotary phones. After all, there's been a lot of hype these days in the IT media about the rise of so-called NoSQL (Not Only SQL) databases. Many new and upcoming CS and MIS graduates who like working with data might think that relational databases are, at best, soon-to-be legacy systems and, at worst, are a career dead-end.
It's true that all the cool-cat computing services (Amazon, Facebook, Google, Pinterest, etc) are indeed making heavy use of NoSQL technology. They're also making heavy use of traditional RDBMS'es too. In fact, some of the world's biggest users of SQL databases are hand-in-hand the biggest users of NoSQL databases. The reason for that is that both types of data platforms are exceeding good at specific types of data storage and data processing. They also have their own unique weaknesses too. Meaning, each platform has a sweet spot and a weak spot, and that none are a 100% panacea for all imaginable data processing scenarios. Take a look at this article by my friend and former colleague, Guy Harrison - 10 Things You Should Know About NoSQL Databases, for a good discussion on the pros and cons of NoSQL in comparison to SQL data platforms.
Timewarp! Let's Take a Look Back at Why Relational Databases Were Needed.
These days, relational database management systems (RDBMSs) like Microsoft SQL Server and Oracle are the primary engines of information systems everywhere, particularly for enterprise computing systems and web applications. Though RDBMSs are now common enough to trip over, it wasn’t always that way. Not too long ago, you would probably trip over hierarchical database systems, or network database systems, or flat-file systems (heck, that still happens in many government IT shops who still use COBOL). A quick-and-dirty definition for a relation database might be: a system whose users view data as a collection of tables related to each other through common data values.
Perhaps you are interested in more than a quick-and-dirty definition for the term relational database? Here goes. The whole basis for the relational model follows this train of thought: data is stored in tables, which are composed of rows and columns. Tables of independent data can be linked, or related, to one another if they each have columns of data that represent the same data value, called keys. This concept is so common as to seem trivial; however, it was not so long ago that achieving and programming a system capable of sustaining the relational model was considered a longshot with limited usefulness.
Relational data theory was first proposed by E.F. Codd in his 1970 paper to the ACM entitled “A Relational Model of Data for Large Shared Data Banks”. Soon after, Codd clarified his position in the 1974 paper to the Texas Conference on Computing Systems entitled “The Relational Approach to Data Base Management: An Overview”. It was in this paper that Codd proposed the now legendary 12 Principles of Relational Databases.
If a vendor’s database product didn’t meet Codd’s 12 item litmus tests, then it was not a member of the club. Note that the rules do not apply to applications development. Instead, these rules determine whether the database engine itself can be considered truly “relational”. These rules were constructed to support a data model that would ensure the ACID properties of transactions and also eliminate a variety of data manipulation anomalies that frequently occurred on non-relation database platforms (and still do occur on non-relational database platforms). (As an aside, the transactional paradigm was conceived by my hero, Gray, Jim in 1981 while at Tandem Computer and presented in the paper "The Transaction Concept: Virtues and Limitations").
Codd’s 12 Rules for a Truly Relational Database System
Are you curious about Codd’s 12 Principles of Relational Databases? Don’t be ashamed that you don’t know them by heart; few technology professionals do, and no one on the marketing staff of technology companies do. However, the few folks who do know these principles by heart treat them like religious doctrine, and would likely be mortified by their “lightweight” treatment here. Nevertheless, I'll give them to you in my own paraphrasing:
- Information is represented logically in tables.
- Data must be logically accessible by table, primary key, and column.
- Null values must be uniformly treated as “missing information” not as empty strings, blanks, or zeros.
- Metadata (data about the database) must be stored in the database just as regular data is.
- A single language must be able to define data, views, integrity constraints, authorization, transactions, and data manipulation.
- Views must show the updates of their base tables and vice versa.
- A single operation must be able to retrieve, insert, update, or delete data.
- Batch and end-user operations are logically separate from physical storage and access methods.
- Batch and end-user operations can change the database schema without having to recreate it or applications built upon it.
- Integrity constraints must be available and stored in the metadata, not in an application program.
- The data manipulation language of the relational system should not care where or how the physical data is distributed and should not require alteration if the physical data is centralized or distributed.
- Any row-processing done in the system must obey the same integrity rules and constraints that set-processing operations do.
If you know much about SQL, then you probably recognize immediately that SQL ended up fulfilling rules #5, #7, #11 and possibly more. Others of the rule are manifest in the system tables of a relational database, such as DMVs in Microsoft SQL Server and V$ and X$ views in Oracle.
There is some debate about why relational database systems won out over hierarchical and network database systems back in the late 1980's and early 1990's, but a couple of reasons seem self-evident. First, the high-level language interface (SQL) is much simpler to learn and more intuitive than that mishmash of languages supporting non-relational databases. (In fact, the lack of something like SQL is a hindrance to adoption of many NoSQL database platforms). Second, relational databases provide efficient and intuitive data structures that easily accommodate ad-hoc queries and reporting. People just intuitively understand the value of storing data in tables. From phone books to hotel registries, relational databases (of the paper sort) are second nature to most people. Third, relational databases provide powerful integrity controls such as check constraints and referential integrity - thus providing higher quality data. And high quality data is near and dear to the heart of CFOs around the world.
In fact, the strength that relational databases demonstrate with data quality, consistency, and durability are the same reasons that they'll be with us - quite possibly - forever. So were NoSQL databases excel at storing data that is moderately important and requires eventual consistency, SQL database excel at storing data that is of paramount importance and requires immediate consistency. As long as we're exchanging money, there's a need for relational database technology and ACID transactions.
And, just my opinion here, but database administration is currently, and will continue for decades to be, an excellent career choice. Why? First, although databases are widespread, good databases are not. So there's always need for those who can tune, troubleshoot, and optimize what is currently in the marketplace. Second, just because database are widespread doesn't mean that they're everywhere they need to be. Some estimates gauge that only half of the enterprises that need SQL databases actually use SQL databases. Imagine if only half of the citizenry wore shoes, and of the half that wore shoes, only half of them wore both shoes and consistently tied them. It'd be a good time to be a maker of loafers! Well, that's where we're at today with relational databases.
So what do you think? Am I off the mark on the longevity of relational database? Do you think the sun has set on them? Will they be smashed, degraded, and humiliated by NoSQL database platforms? Or will they stand shoulder-to-shoulder with a variety of data platforms in the years to come?