Tech Blog‎ > ‎Geek corner‎ > ‎

The C in ACID

posted Feb 16, 2016, 1:41 PM by Unknown user
Everyone who works with databases is familiar with the acronym ACID, which lists the attributes of a proper transaction. It should be:
  • Atomic
  • Consistent
  • Isolated
  • Durable
We all know about the A and the D -- they're relatively intuitive. Far fewer people truly understand the I, but that's for another article. Today, I'd like to focus on the C. What exactly does it mean for data to be consistent?

Consistent means that the data is in a valid state; in other words, it follows the definition of the schema. For instance, if the column is defined as NOT NULL, it shouldn't ever be null. If it's defined as a foreign key, then the referred object should always exist. The list goes on.

Many databases allow you to go further and define domains. For instance, perhaps the customer's status should be one of Bronze, Silver or Gold, or the customer's age should be between 0 and 125.

These definitions are good and useful because they are easy to declare, and once they are declared, you don't have to think about them. The database is going to do whatever it needs to do to make sure that these definitions remain true, no matter what happens to the data.

For anything more complicated, you typically have to use triggers and stored procedures -- not that there's anything wrong with that, mind you. Triggers and stored procedures participate in transactions, and therefore are part of consistency. In fact, they can be considered to be part of the schema, if you use the term loosely.

But of course, triggers and stored procedures are going to be vendor-dependent, and are often difficult to write and debug. In addition, they add to the database load, which can lead to scalability issues. So the non-trivial logic is often defined in the middle tier, using a language like C#, Java, Python, etc...

There is a big gap between declaring a schema, and writing procedural code. Defining a constraint as part of a schema is (comparatively) easy, and you don't have to explain what it means to the database. For instance, a foreign key definition will automatically cover inserts, updates and deletes. Not only that, but it's also self-documenting: everyone will know what it means.

As soon as you start writing procedural code (whether in triggers and stored procedures, or other languages), you're leaving all that behind, and taking responsibility for a lot of things. You have to make sure that your code does the right thing at the right time, and in particular, you're responsible for dealing with the various dependencies between the various bits of code that you may have. This problem is exacerbated when the logic governing the data is expressed in more than one place. It's not unusual to have some of that logic defined in triggers and stored procedures, some in the middle tier, and (shudder) even some in the presentation layer. Getting a global view of how all this logic works is daunting. Changing any of it can be a frightening proposition, since there may be a lot of non-obvious dependencies that might be tripped by a seemingly innocent change.

Wouldn't it be nice to be able to do more than trivial definitions as part of the schema? What if we could extend schema definition to include higher-level constructs, like complex derivations, aggregates, and multi-table validations? That wouldn't solve all of our problems, but it would allow us to work at a higher level of abstraction.

That's what database reactive programming aims for. We're pushing the declarative aspect of database schemas to a whole new level. By doing so, we want to capture more of the logic as declarations, and less as code.