This is the third part of a series (Part 1
, Part 2
) thinking out loud about the decision making around data access for applications. Once you've considered how tightly bound your application code can safely be to tables, I would like to put two related thoughts out there.
First, it's counterproductive, over the long term, to think of only the application as the focus of all development effort, and the database just as "back end" or "persistence." In the current environment where practically all applications are OO, and all storage is relational, you will have two models to design and maintain, one object oriented and one relational. Is that ideal? Perhaps not, but ignoring one or the other, or wishing the distinction away only causes future pain and suffering.
Second, getting to effective performance and scalability is 90% smart design and only 10% capable hardware.
In case the connection between these statements is not obvious, consider the following pattern of failure that's repeated in IT departments in businesses everywhere:
- Developers and SME's or Domain Experts in an ISV work out super duper code that exactly fits business needs and wows users with elegant interface design and indispensable features. Sweet!
- Caught up in the heady fun of design and coding C# (or pick a language), they grudgingly come to realize that it'll be necessary to store the data from their app. In a halfhearted way, they figure out how to stuff the data into a database. "Darn that antiquated SQL and those restrictive tables! I need polymorphism! I need inheritance!" Some fundamental blunder is made like choosing EAV, or hopelessly wide tables of nullable columns, or a schema that mixes unrelated rows in the same huge, index-hostile table, or something else that cripples an otherwise very capable RDBMS.
- As product development continues through version 2 and version 3, the schema of the database becomes hopelessly fixed, as the app code is glued tighter and tighter to every detail of the database schema. The product sells well and gains market (again, it's awesome-looking from the front end, and has great features) and with the increased market share come some large customers.
- The product then hits a wall, whose foundation was laid back in step 2. The problem is unfixable, short of a rewrite, and large customers are gravely unhappy.
- Some hapless DBA working for a large customer is then spec'ing an unbelievably overpowered SQL server, to overcome, by brute force, the unsolvable design issue. Said DBA is shaking his/her head in disbelief. Dev's on the project start yelling "Tokyo Cabinet! SQL Server doesn't scale!"
- Because scalability via good coding is cheap, and scalability by massive hardware is not, a whole pile of cash is wasted.
Here I'm afraid I have to resort to a cornball analogy, so please bear with me. Today's apps are always two-headed beasts, like two dancers. Think Apolo Anton Ohno & Julianne Hough doing the Paso Doble. They only work well when, at some level, both the database and the application know how to boogie. One might lead, maybe the other follows, but they both have to dance. And the failure illustrated above is simple: one partner can dance and the other one can't. Maybe when they audition doing the hokey-pokey with 10,000 rows of data, everything's fine, but when it comes time for the Big Time Ballroom Finals, it makes no difference how well one dancer can dance if the other one is constantly tripping and falling.
When examined closely, the critical mistake is actually two-fold: first, the design of one of the two required data models (relational and OO) was ignored in preference for the other. Further, the application was bound tightly to the database schema such that any correction to that schema becomes a large-scale, breaking change. The team is truly backed into a corner.
To avoid this common disaster scenario, it's important, then, to take two precautions:
- Don't neglect the database design. Modern RBDMS's work well, and they do scale when designed correctly; it's easy, however, to make a schema that circumvents the very features that make databases perform. The relational model in your two-model system might not be an exact match for the OO object graph, if the system is to perform well. If your team is dev-centric, and you don't have a database wonk in house, find one anyway. At least have someone capable advise about the structure of the DB, before the system is stuck with an unworkable schema.
- Even if you don't use the database itself to encapsulate data access (see the previous post), do encapsulate it somehow in a clean, separate layer of code, so that the database has the possibility to be re-factored. Use something like the repository pattern. If you use an object/relational mapper, make sure it can adequately abstract the details of the database schema so that changes are possible over time. Review your design with the phrase, "How would I change this code if the underlying table changes in the database..."