Today, we go deeper into the rabbit hole. We will explore concepts to equip you to have your own opinions about which kinds of databases suit your specific needs. And this is important to stress up front: there is no right answer. Each database carries its own set of tradeoffs and advantages. If something looks like a “one-size-fits-all” solution, be careful: you might be missing something!

Before we begin, it’s important to highlight that what we loosely refer to as “databases” are actually “Database Management Systems (DBMS).” A DBMS is a piece of software that enables the user to more ergonomically write, read, delete, or update information in a given set of data. For this series, we will focus mainly on Relational and Non-Relational DBMSs. There are many other types, all categorized by their data structures, but relational and non-relational are the most common for web development by any measure.

Both Relational (R) and Non-Relational (NR) DBMS have different terms for the parts that compose them. Such components are almost interchangeable in definition, and that’s why you can commonly hear a developer referring to a Document (NR term) as “Table,” which is its Relational equivalent structure. Don’t be afraid of confusing them; they appear often enough for this cognitive overload to disappear quickly with usage. Additionally, once you get more familiar with the differences of each data structure, you will realize they probably shouldn’t be used interchangeably because there are differences among them. But for now, and for the sake of simplicity, let’s focus on the similarities:

To use a more familiar example, let’s take a JSON object as an example:

Given that JSON, a column would be represented by the key-value pair (column name has value “Atila”), and the key follows the same meaning as the JSON spec, key name will access the value “Atila.”

In addition to the above definition, there are special kinds of keys. Such keys play a special role in the schema of your database and how you will interact with your data. Define these wisely: any minimal change to them can be considered a breaking change to your data layer.

Example of Foreign Key: if each author in an authors table has posts (and posts is another table), post.title will be a FK at authors. And the junction between authors and posts is called a relation.

Once you look at the different data structures and choose your DBMS, you are ready to draw the first connection from your data layer to your application layer. And suddenly, you noticed it isn’t quite straightforward to bring data from the database to your client-side (or even to your server-side API in some cases).

In this part of our series, we are learning what to look for when choosing the stack for our data layers. It is essential to understand common concepts around maintaining and choosing a DBMS (from here on out, we’re back into referring to them as “databases” to remain consistent with the rest of the world).

The next sections will not go so deep that you jump out and find a job as Database Administrator (DBA), but hopefully, it will offer you enough ammo to engage in conversation with experts and identify the best solution for your use cases. These concepts are common for every kind of data layer, from a spreadsheet to a self-hosted database and even to a serverless database. What will vary is how each solution will balance the variables intertwined in these paradigms.

The most important concepts, to begin with, are Consistency, Availability, and Partition Tolerance. They’re better understood if presented together because the balance between them will guide how predictable your data is in different contexts.

This theorem describes the relationship between 3 components in a distributed system: Consistency, Availability, and Partition tolerance (C.A.P). The overall conclusion is easy to summarize: any system is only able to contemplate 2 of these components at the same time. Though just a simple sentence, this idea requires a bit of unpacking.

Within the CAP Theorem constraints, “consistency” refers directly to the data. When different clients make the same request, they will get the same response. When a written request is accepted and confirmed, all users will have access to this updated information at the same time.

Every request will receive a response with data. No errors. However, this comes with no promises on whether the data is up to date or not. Depending on whether this is paired with consistency (C) or partition tolerance (P), you will get different behavior.

A “partition” is when the connection between two nodes in a system is broken. The CAP Theorem essentially describes how a system will tolerate the partition: enforcing availability (AP system) or enforcing consistency (CP system). Regardless of how small a system is, partitions can always happen. Therefore there is no such thing as a CA system.

To provide a more graphical example of each system, we can consider:

Though popular and very often referred to, the CAP Theorem can be considered incomplete because it doesn’t consider situations beyond the network partition. Especially today, with high-availability content delivery networks (CDN), and extreme connectivity, it’s crucial to take into consideration latency and deeper aspects of consistency (linearizability and serializability).

It’s funny how “consistency” is a term that switches meanings based on the context. So, in the CAP Theorem, as we just saw, it’s all about data and how reliably a user will get the same response to the same request regardless of which partition they reach. Once we take a step away from our own system, a new perspective makes us ask: “how consistently will we handle concurrent operations?”. And just like that, the CAP Theorem does not sufficiently describe the intricacies of operating at scale.

The first three letters are the same as CAP, just reordered. The “consistency” in the PACELC Theorem goes deeper than in the CAP Theorem. It follows the Consistency Model, which determines the contract between a data store and a system. To make things less complicated, consider the PACELC an extension of the CAP Theorem.

Beyond asking the developer to strategize for the event of a network partition, the PACELC also considers what happens when the system has no partitions (healthy network).

ELC stands for Else: Latency or Consistency?

Because of this, I consider the PACELC carries a better mental model for this new era of Serverless Databases. When healthy, it’s not a scalar classification “AP” or “CP”; it accepts a spectrum because latency can be high or not, and consistency can have different levels as well.

Once we start talking about the types of databases and their data structures and which guarantees each can give, we will also talk about a bit of system architecture and how your architecture can reduce the tradeoffs when even by enforcing consistency, you can strive for a low latency scenario.

With that, I believe we are ready to start narrowing down our discussions on the differences between each database type. Moreover, it’s possible to discuss and level expectations on each solution taken and analyze architectures individually. From now on, we will focus on Relational and Non-Relational databases.

In a few days, on our season finale, we will cover the differences between NoSQL and SQL: what guarantees to expect, what that means to your data, and how that affects development workflow. Then we will be ready to jump right into Serverless Databases and what to expect going forward. I can’t wait!