Design DynamoDB

Reference

educative - Dynamo
AWS Official Docs


Introduction

Dynamo is a highly available key-value store.
Dynamo is used at Amazon to manage services that have very high-reliability requirements and need tight control over the trade-offs between availability, consistency, cost-effectiveness, and performance.


Key Words

Distributed Hash Table (DHT)
Consistent Hashing
Optimistic Replication
Eventual Consistency
MD5 hashing algorithm
Sloppy Quorum
Gossip Protocol
Hinted Handoff
Vector Clock
Merkle Tree


Requirements

Functional

  • put(key, context, object)
    • The put operation finds the nodes where the object associated with the given key should be stored and writes the given object to the disk.
    • The context is a value that is returned with a get operation and then sent back with the put operation. The context is always stored along with the object and is used like a cookie to verify the validity of the object supplied in the put request.
  • get(key)
    • The get operation finds the nodes where the object associated with the given key is located and returns either a single object or a list of objects with conflicting versions along with a context.
    • The context contains encoded metadata about the object that is meaningless to the caller and includes information such as the version of the object.
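The put/get interface above can be sketched in a few lines. This is a minimal single-node, in-memory sketch with assumed names (`DynamoSketch`, a plain integer version as the context); the real system locates nodes via consistent hashing and returns an opaque encoded context.

```python
# Minimal sketch of Dynamo's put(key, context, object) / get(key) API.
# Hypothetical class; the context here is just a version counter that
# the caller reads with get() and passes back on the next put().
class DynamoSketch:
    def __init__(self):
        self.store = {}  # key -> (object, version)

    def put(self, key, context, obj):
        # The context is stored along with the object, like a cookie
        # that validates the object supplied in the put request.
        version = (context or 0) + 1
        self.store[key] = (obj, version)

    def get(self, key):
        # Returns the object plus a context (opaque to the caller)
        # that encodes metadata such as the object's version.
        obj, version = self.store[key]
        return obj, version
```

Usage follows the read-modify-write cycle: call `get` to obtain the context, then send that context back with `put`.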

Non-Functional

  • Highly available
  • Highly scalable
  • Completely decentralized
  • Eventually consistent and low latency

High Level Design

Distributed Hash Table (DHT)

Dynamo is a Distributed Hash Table (DHT) that is replicated across the cluster for high availability and fault tolerance.

Data Partitioning

Challenges

  • How do we know on which node a particular piece of data will be stored?
  • When we add or remove nodes, how do we know what data will be moved from existing nodes to the new nodes?
  • Furthermore, how can we minimize data movement when nodes join or leave?

Solution

  • Consistent Hashing
  • Optimistic Replication
  • Preference List
    • The list of nodes responsible for storing a particular key.
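Consistent hashing and the preference list can be sketched together: keys and nodes are hashed (Dynamo uses MD5) onto a ring, and the preference list is the next N distinct nodes encountered moving clockwise from the key's position. The class and method names below are assumptions for illustration; the real system also uses virtual nodes.

```python
import bisect
import hashlib

# Sketch of a consistent-hash ring with a preference list (assumed names).
class Ring:
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        # Each node is placed on the ring at the position of its hash.
        self.ring = sorted((self._hash(n), n) for n in nodes)

    def _hash(self, key):
        # MD5 maps keys onto a 128-bit ring, as in Dynamo.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def preference_list(self, key):
        # Walk clockwise from the key's hash, collecting N distinct nodes.
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, ""))
        result = []
        while len(result) < min(self.replicas, len(self.ring)):
            _, node = self.ring[idx % len(self.ring)]
            if node not in result:
                result.append(node)
            idx += 1
        return result
```

Because only neighbors on the ring are affected when a node joins or leaves, data movement is minimized, which answers the challenges above.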

Follow-up

Handle temporary failures

Sloppy quorum

  • All read/write operations are performed on the first N healthy nodes from the preference list, which may not always be the first N nodes encountered while moving clockwise on the consistent hashing ring.
    • Consider the example of Dynamo configuration given in the figure below with N=3.
    • In this example, if Server 1 is temporarily down or unreachable during a write operation, its data will now be stored on Server 4.
    • Thus, Dynamo transfers the replica stored on the failing node (i.e., Server 1) to the next node of the consistent hash ring that does not have the replica (i.e., Server 4).
    • This is done to avoid unavailability caused by a short-term machine or network failure and to maintain desired availability and durability guarantees.
    • The replica sent to Server 4 will have a hint in its metadata that suggests which node was the intended recipient of the replica (in this case, Server 1).
    • Nodes that receive hinted replicas will keep them in a separate local database that is scanned periodically.
    • Upon detecting that Server 1 has recovered, Server 4 will attempt to deliver the replica to Server 1.
    • Once the transfer succeeds, Server 4 may delete the object from its local store without decreasing the total number of replicas in the system.
  • Hinted handoff
    • When a node is unreachable, another node can accept writes on its behalf.
    • The write is then kept in a local buffer and sent out once the destination node is reachable again.
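The hinted-handoff flow described above can be sketched as follows. The `Node`, `write`, and `handoff` names are assumptions; the point is that hinted replicas live in a separate local buffer, are scanned periodically, and are deleted only after successful delivery to the intended node.

```python
# Sketch of hinted handoff (assumed names): when the intended node is
# down, a substitute node accepts the write, tags it with a hint naming
# the intended recipient, and delivers it once that node recovers.
class Node:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}    # regular replicas
        self.hints = []   # (intended_node, key, value), kept separately

    def write(self, key, value, intended=None):
        if intended is not None:
            # Hinted replica: buffered locally, not merged into data.
            self.hints.append((intended, key, value))
        else:
            self.data[key] = value

    def handoff(self):
        # Periodic scan: deliver hints whose intended recipient has
        # recovered, then drop the local copy of each delivered hint.
        remaining = []
        for node, key, value in self.hints:
            if node.alive:
                node.data[key] = value
            else:
                remaining.append((node, key, value))
        self.hints = remaining
```

In the Server 1 / Server 4 example above, Server 4 would hold the hinted replica until Server 1 comes back, then transfer it and delete its local copy.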

Handle conflicting data

Vector Clock
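A vector clock tags each object version with a per-node counter; version A supersedes version B only if every counter in A is at least as large as the one in B, and two versions conflict when neither supersedes the other. A minimal sketch (function names are assumptions):

```python
# Sketch of vector clocks for conflict detection (assumed names).
def increment(clock, node):
    # Record a write coordinated by `node`; clocks are {node: counter}.
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def descends(a, b):
    # True if clock a includes all events of clock b (a supersedes b).
    return all(a.get(node, 0) >= count for node, count in b.items())

def conflict(a, b):
    # Neither clock descends from the other: concurrent writes,
    # so Dynamo returns both versions for the client to reconcile.
    return not descends(a, b) and not descends(b, a)
```

This is what lets a `get` return a list of objects with conflicting versions, as described in the functional requirements.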
