Reference
educative - Dynamo
AWS Official Docs
Introduction
Dynamo is a highly available key-value store.
Dynamo is used at Amazon to manage services that have very high-reliability requirements and need tight control over the trade-offs between availability, consistency, cost-effectiveness, and performance.
Key Words
Distributed Hash Table (DHT)
Consistent Hashing
Optimistic Replication
Eventual Consistency
MD5 hashing algorithm
Sloppy Quorum
Gossip Protocol
Hinted Handoff
Vector Clock
Merkle Tree
Requirements
Functional
- put(key, context, object)
- The put operation finds the nodes where the object associated with the given key should be stored and writes the given object to the disk.
- The context is a value that is returned with a get operation and then sent back with the put operation. The context is always stored along with the object and is used like a cookie to verify the validity of the object supplied in the put request.
- get(key)
- The get operation finds the nodes where the object associated with the given key is located and returns either a single object or a list of objects with conflicting versions along with a context.
- The context contains encoded metadata about the object that is meaningless to the caller and includes information such as the version of the object.
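The two operations above can be sketched as a minimal, single-node, in-memory model. This is only an illustration of the interface shape (the class name and storage layout are invented here, not the real Dynamo API), and it omits replication and version reconciliation:

```python
class DynamoStyleStore:
    """Hypothetical single-node sketch of the put/get interface.

    Each key maps to a list of (context, object) versions, so get can
    return multiple conflicting versions along with their contexts.
    """

    def __init__(self):
        self._store = {}  # key -> list of (context, object)

    def get(self, key):
        """Return (objects, contexts); contexts are opaque to the caller."""
        versions = self._store.get(key, [])
        objects = [obj for _, obj in versions]
        contexts = [ctx for ctx, _ in versions]
        return objects, contexts

    def put(self, key, context, obj):
        """Store obj under key; the context (from a prior get) identifies
        which version(s) this write supersedes."""
        self._store[key] = [(context, obj)]


store = DynamoStyleStore()
store.put("cart:1", None, {"items": ["book"]})
objects, contexts = store.get("cart:1")
print(objects)
```

A real client would send the context returned by `get` back with the next `put`, letting the system detect whether the write descends from the latest version.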
Non-Functional
- Highly available
- Highly scalable
- Completely decentralized
- Eventually consistent with low latency
High Level Design
Distributed Hash Table (DHT)
Dynamo is a Distributed Hash Table (DHT) that is replicated across the cluster for high availability and fault tolerance.
Data Partitioning
Challenges
- How do we know on which node a particular piece of data will be stored?
- When we add or remove nodes, how do we know what data will be moved from existing nodes to the new nodes?
- Furthermore, how can we minimize data movement when nodes join or leave?
Solution
- Consistent Hashing
- Optimistic Replication
- Preference List
- The list of nodes responsible for storing a particular key.
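The ideas above can be combined in a small sketch: MD5 places both nodes and keys on a ring, and the preference list is the first N distinct nodes found moving clockwise from the key's position. This is illustrative only (class and method names are invented; Dynamo additionally uses virtual nodes, omitted here for brevity):

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Minimal consistent-hashing sketch with a preference list of size N."""

    def __init__(self, nodes, n=3):
        self.n = n  # N: number of nodes responsible for each key
        # Each node is placed on the ring at the MD5 hash of its name.
        self.ring = sorted((self._hash(name), name) for name in nodes)

    @staticmethod
    def _hash(key):
        # Dynamo uses MD5 to map keys/nodes to positions on the ring.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def preference_list(self, key):
        """First N distinct nodes clockwise from the key's ring position."""
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, ""))
        chosen = []
        for i in range(len(self.ring)):
            _, node = self.ring[(idx + i) % len(self.ring)]
            if node not in chosen:
                chosen.append(node)
            if len(chosen) == self.n:
                break
        return chosen


ring = ConsistentHashRing(["server1", "server2", "server3", "server4"])
print(ring.preference_list("user:42"))
```

Because each node owns only the arc between its predecessor and itself, adding or removing a node moves only the keys on that arc, which answers the data-movement questions above.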
Follow-up
Handle temporary failures
Sloppy quorum
- All read/write operations are performed on the first N healthy nodes from the preference list, which may not always be the first N nodes encountered while moving clockwise on the consistent hashing ring.
- Consider the example of Dynamo configuration given in the figure below with N=3.
- In this example, if Server 1 is temporarily down or unreachable during a write operation, its data will now be stored on Server 4.
- Thus, Dynamo transfers the replica stored on the failing node (i.e., Server 1) to the next node of the consistent hash ring that does not have the replica (i.e., Server 4).
- This is done to avoid unavailability caused by a short-term machine or network failure and to maintain desired availability and durability guarantees.
- The replica sent to Server 4 will have a hint in its metadata that suggests which node was the intended recipient of the replica (in this case, Server 1).
- Nodes that receive hinted replicas will keep them in a separate local database that is scanned periodically.
- Upon detecting that Server 1 has recovered, Server 4 will attempt to deliver the replica to Server 1.
- Once the transfer succeeds, Server 4 may delete the object from its local store without decreasing the total number of replicas in the system.
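The node-selection step of a sloppy quorum can be reduced to a one-liner: walk the preference list and keep the first N healthy nodes, skipping failed ones. The helper name and health-check callback below are invented for illustration:

```python
def healthy_quorum(preference_list, is_healthy, n=3):
    """Sloppy quorum sketch: pick the first N *healthy* nodes from the
    preference list, which may not be the first N ring positions."""
    return [node for node in preference_list if is_healthy(node)][:n]


# server1 is down, so server4 (next on the ring) joins the write set.
pref = ["server1", "server2", "server3", "server4"]
print(healthy_quorum(pref, lambda node: node != "server1"))
```

Here `server4` takes `server1`'s place in the write set, which is exactly the situation where it would store a hinted replica for `server1`.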
Hinted handoff
- When a node is unreachable, another node can accept writes on its behalf.
- The write is then kept in a local buffer and sent out once the destination node is reachable again.
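A minimal model of this mechanism: hinted replicas go into a separate local table tagged with the intended recipient, and a periodic scan tries to hand them back. The `Node` class and its methods are a hypothetical simplification, not Dynamo's actual code:

```python
class Node:
    """Sketch of hinted handoff between nodes."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.store = {}   # regular replicas
        self.hinted = {}  # hinted replicas, kept separately and scanned periodically

    def write(self, key, value, hint=None):
        if hint is None:
            self.store[key] = value
        else:
            # hint names the intended recipient that was unreachable.
            self.hinted[key] = (value, hint)

    def handoff(self, nodes_by_name):
        """Attempt to deliver each hinted replica to its intended owner."""
        for key, (value, hint) in list(self.hinted.items()):
            owner = nodes_by_name.get(hint)
            if owner is not None and owner.alive:
                owner.write(key, value)
                # Safe to delete: the owner now holds the replica, so the
                # total replica count does not decrease.
                del self.hinted[key]


s1, s4 = Node("server1"), Node("server4")
s1.alive = False
s4.write("cart:1", "v1", hint="server1")  # server1 down: server4 accepts on its behalf
s1.alive = True
s4.handoff({"server1": s1})
print(s1.store)
```

Once `server1` recovers, the scan delivers the replica and `server4` drops its hinted copy.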
Handle conflicting data
Vector Clock
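The core of conflict detection with vector clocks is a pairwise comparison: if one clock dominates the other, the versions are causally ordered; if neither dominates, the versions are concurrent and both are returned to the client for reconciliation. A sketch under the usual dict-based representation (node name to counter), with invented function names:

```python
def vc_compare(a, b):
    """Compare two vector clocks (dicts: node -> counter).
    Returns 'equal', 'before' (a is an ancestor of b), 'after',
    or 'concurrent' (conflicting versions)."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"       # b supersedes a; a can be discarded
    if b_le_a:
        return "after"
    return "concurrent"       # neither dominates: surface both to the client


# Two writes that descended through different replicas conflict:
v1 = {"server1": 2, "server2": 1}
v2 = {"server1": 1, "server2": 2}
print(vc_compare(v1, v2))  # concurrent
```

When `get` finds concurrent versions, Dynamo returns all of them with their contexts; a subsequent `put` with the merged context collapses the branches back into one version.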