Consistency Models | Distributed Systems Tutorial

Consistency Models in Distributed Systems

Consistency models define the guarantees a distributed system provides to its clients regarding the order and visibility of writes. They are crucial for building reliable and predictable applications. Stronger consistency models are easier to reason about but often come with performance trade-offs. Weaker consistency models offer higher availability and scalability but require applications to handle potential inconsistencies.

Here's a breakdown of common consistency models, ranging from strong to weak:

I. Strong Consistency Models

These models provide the strongest guarantees, making the system behave as if there's only a single copy of the data.

Strict Consistency: (Theoretical Ideal)
- Guarantee: Any read operation returns the result of the most recent write operation, regardless of which replica is accessed. This implies perfect synchronization between all replicas.
- Implementation: Extremely difficult and impractical to achieve in real-world distributed systems due to the speed of light and network latency.
- Example: None in practical systems.
Sequential Consistency:
- Guarantee: Operations appear to execute in some sequential order, and the operations of each individual process appear in the order specified by its program. This means there's a single, global total order of operations.
- Implementation: Requires global synchronization mechanisms. Can be achieved with techniques like two-phase commit (2PC).
- Example: Early versions of some database systems aimed for this, but it's often too restrictive.
Linearizability (Atomic Consistency):
- Guarantee: Every operation appears to take effect instantaneously at some point between its invocation and completion. It's as if there's a single, atomic operation. This is the strongest practical consistency model.
- Implementation: Requires strong coordination between replicas, often using consensus algorithms like Paxos or Raft.
- Example: ZooKeeper, etcd, some distributed locks. Often used for critical metadata.
- Analogy: Imagine a single, central register. All writes and reads go through this register, ensuring a single, consistent view.

II. Eventual Consistency Models

These models relax the strong consistency guarantees, prioritizing availability and scalability. Data will eventually become consistent across all replicas, but there's a period of inconsistency.

Causal Consistency:
- Guarantee: If operation A happened before operation B (causally related - e.g., B read the value written by A), then every replica must see A before B. Operations that are not causally related can be seen in different orders on different replicas.
- Implementation: Requires tracking causal dependencies between operations.
- Example: DynamoDB (with some caveats), some version control systems.
- Analogy: If Alice sends a message to Bob, all replicas must see Alice's message before seeing Bob's reply.
Read-Your-Writes Consistency:
- Guarantee: After a process performs a write operation, any subsequent read operation by that same process will return the value of the write.
- Implementation: Often achieved by routing reads from a process to the replica that handled its last write.
- Example: Amazon S3, many web applications.
- Analogy: If you update your profile, you'll immediately see the updated profile when you refresh the page.
Session Consistency:
- Guarantee: Within a single client session, reads will see the writes performed during that session. Different sessions may see different versions of the data.
- Implementation: Often implemented using sticky sessions or session IDs to route requests to the same replica.
- Example: Many e-commerce websites.
- Analogy: Items added to your shopping cart during a session will be visible to you throughout that session.
Weak Consistency:
- Guarantee: No guarantees are made about the order or visibility of writes. Data will eventually become consistent, but there's no bound on how long that takes.
- Implementation: Simple replication with asynchronous updates.
- Example: DNS, some content delivery networks (CDNs).
- Analogy: Changes to a website's DNS record may take time to propagate across the internet.

III. Other Important Considerations

Monotonic Reads: If a process reads a particular value, subsequent reads will never return an older value. (Guarantees that you won't "go back in time").
Monotonic Writes: Writes by a process are applied in the order they were issued. (Guarantees that your writes are applied in the order you intended).
Write Follows Read (WFR): A write operation will not be lost due to a subsequent read. (Useful for preventing lost updates).

Trade-offs & Choosing a Consistency Model:

Consistency Model	Strength	Availability	Performance	Complexity	Use Cases
Linearizability	Strongest	Lowest	Lowest	Highest	Financial transactions, distributed locks
Sequential Consistency	Strong	Low	Low	High	Rarely used in practice
Causal Consistency	Medium	Medium	Medium	Medium	Version control, social networks
Read-Your-Writes	Medium	High	High	Medium	User profiles, shopping carts
Session Consistency	Medium	High	High	Medium	E-commerce, web applications
Eventual Consistency	Weakest	Highest	Highest	Lowest	DNS, CDNs, large-scale data storage

Key Takeaways:

CAP Theorem: In a distributed system, you can only guarantee two out of three: Consistency, Availability, and Partition Tolerance.
Application Requirements: The choice of consistency model depends heavily on the specific requirements of the application.
Trade-offs: Stronger consistency models come with performance and availability trade-offs. Weaker consistency models offer better scalability but require applications to handle potential inconsistencies.
Understanding the Guarantees: It's crucial to understand the guarantees provided by each consistency model to build reliable and predictable distributed systems.

This provides a foundational understanding of consistency models. Further research into specific algorithms (Paxos, Raft, etc.) and implementation details is recommended for deeper understanding.