CREATED 2021-12-03 21:26:05
System Design - Yelp or Proximity Service
Functional requirements
Given a location, a user can search all the nearby places within a give radius.
A user can add his/her favourite places.
A user can post feedback/review/comment to a place: rating(required) + text/pictures(optional)
Non-Functional requirements
Search with minimum latency.
A Read heavy & Write little system.
Scale
500M places and 100K queries per second (QPS)
APIs Design
1. Function1(Core): GET /search
parameters: keyword, radius, category(optional), max_return(optional), (optional)sortOrder, (optional)filter, (optional)next_token
return: A JSON containing information about a list of popular places matching the search query. Each result entry will have the place name, address, category, rating, and image(s).
Function2: POST/add DELETE/remove (to/from favorite list)
parameters: placeId
Function3: POST/add
parameters: placeId, rating, review text(optional), photo_urls(optional)
Database Schema
Database of Places
placeId 8 bytes (hash key)
name 256 bytes
latitude 4 bytes
longtitude 4 bytes
category 1 byte
description 512 bytes
Business Profile DB
PlaceDB
placeId (Primary Key) | placeName | Address | Category | Description | [ReviewInfo(ReviewText, [mediaId1,2,3,...])]
Business Media DB
mediaId (Primary Key) | placeId | mediaURL | userId
Review DB
reviewId (Primary Key) | placeId | text | userId | rating
Basic System Design and Algorithm
SQL
Select * from Places where Latitude between X-D and X+D and Longitude between Y-D and Y+D
Not completely accurate and not efficent.
Grids
GridID is a four bytes number. Based on a given location and radius, we can find all the neighboring grids and then query these grids to find nearby places.
20 million grids and 500 million places. Since LocationID is 8 bytes, we would need 4GB of memory (ignoring hash table overhead) to store the index in the memory:
4 20M + 8 500M ~= 4 GB
QuadTree
A tree in which each node has 4 children nodes. Starting from thre root node which represents the whole world, we keep spliting each child node until there are no nodes left with more than 500 locations.
All information of places are stored in leaf nodes. (A node represents a grid with no more than 500 places).
With double linked pointer to other leaf nodes and parent pointer to parent node, we can find neighboring grids of a given grid.
Workflow
First we find the node that contains the user’s location.
If that node has enough desired places, we can return them to the user.
If not, we will keep expanding to the neighboring nodes (either through the parent pointers or doubly linked list) until either we find enough required number of places within the maximum radius.
Storage 24 Bytes x 500M = 12 GB (potentially)
Partition
1. By region or zipcode
This could cause the issue of hot places. To avoid one of the server in the cluster receive too many requests.
2. By locationId
This could make a request querying too many shard servers. May not be efficient.
3. By Geohash/Google S2
To avoid the issue of hot shard, try to make that cells are continuous in one shard and adjust the number of cells in one shard by the number of places.
Geohash is widely adopted in the open source community (e.g. ElasticSearch, MongoDB, and others).
Cache
Between backend databases and application servers, we can store all data about hot places.
Least Recently Used (LRU) seems suitable in this case.
Load Balance(Other)
Round Robin (distributed equally)
More intelligent: also take traffic/load/server status as consideration by periodically querying backend server and then adjusts their traffic volumes
CDN(Other)
A CDN is a system of globally distributed servers that deliver web content to a user based on the geographic locations of the user, the origin of the web page and a content delivery server. CDNs replicate content in multiple places. User can get the content from the nearest CDN.
Push CDNs/Pull CDN
Disadvantage: expensive/read stale data
Reference
[1] https://medium.com/swlh/design-a-proximity-server-like-nearby-or-yelp-part-1-c8fe2951c534
[2] https://codeburst.io/design-a-proximity-server-like-yelp-part-2-d430879203a5
[3] https://www.educative.io/courses/grokking-the-system-design-interview/B8rpM8E16LQ