- Do not get into detail too fast.
- Avoid a silver-bullet (fixed) mindset
- KISS: keep it small and simple
- Justify why you are making a simple choice
- Be aware of current technologies
- Load Balancing (Nginx)
  - For web servers
- Caching (Memcached, Redis, CDN to cache static asset files)
  - A CDN makes content fast to load around the world; content reaches the CDN by a pull or push method
  - For image servers
  - For database servers (reads)
  - Eviction policies: Least Recently Used (LRU), Least Frequently Used (LFU), First In First Out (FIFO)
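
To make the Least Recently Used policy concrete, here is a minimal sketch built on Python's `OrderedDict`. The class name `LRUCache` and its methods are made up for illustration; real caches such as Redis or Memcached implement eviction internally and differently.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: OrderedDict = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used key
cache.put("c", 3)  # capacity exceeded, so "b" is evicted
```

LFU would track a hit counter per key instead of recency, and FIFO would evict in insertion order without reordering on reads.
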
- Master-Slave Replication
  - For database servers (reads)
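
A rough sketch of how replication helps reads: writes go to the master, while reads are spread across the slaves. The class `ReplicatedDatabase` and the server names are hypothetical, and the sketch ignores replication lag, which a real system has to account for.

```python
import itertools


class ReplicatedDatabase:
    """Routes writes to the master and spreads reads across the slaves."""

    def __init__(self, master: str, slaves: list[str]) -> None:
        self.master = master
        # Cycle through the slaves; fall back to the master if there are none.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, is_write: bool) -> str:
        if is_write or self._slaves is None:
            return self.master
        return next(self._slaves)


db = ReplicatedDatabase("master-db", ["slave-1", "slave-2"])
db.route(is_write=True)   # always the master
db.route(is_write=False)  # one of the slaves, in turn
```
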
- Databases + Indexes
  - RDBMS (MSSQL/MySQL): for relational data and fast access
  - NoSQL (MongoDB, HBase): for scalability; HBase supports small reads and writes
  - Cassandra (NoSQL): masterless, low downtime
  - HBase: minimum of five DataNodes and one NameNode; high maintenance; requires the JRuby shell and dependencies such as ZooKeeper; steep learning curve
- Database Sharding
  - For database servers (writes)
  - Horizontal sharding (consistent hashing or a hash on keys) or vertical sharding (each server holds different tables)
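
As an illustration of consistent hashing for horizontal sharding, the sketch below places each shard on a hash ring at several virtual positions and sends a key to the first shard clockwise from the key's hash. The names (`ConsistentHashRing`, `db-0`, the `vnodes` parameter) and the choice of SHA-256 are assumptions made for the example, not a specific product's design.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, nodes, vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node gets several virtual positions to even out the load.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def get_node(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First ring position at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["db-0", "db-1", "db-2"])
shard = ring.get_node("user-42")
```

The point of the ring over a plain `hash(key) % N` is that removing or adding a shard only remaps the keys that belonged to that shard, rather than almost all keys.
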
- Load Balancer
  - Decide which algorithm to use, e.g. Round Robin, Weighted Round Robin, Consistent Hashing
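
The first two algorithms can be sketched in a few lines. This is a simplified illustration with made-up server names; the weighted version simply repeats each server according to its weight, which is not the smoother interleaving that production load balancers typically use.

```python
import itertools


def round_robin(servers):
    """Yield servers one after another, forever."""
    return itertools.cycle(servers)


def weighted_round_robin(weights: dict[str, int]):
    """Yield each server `weight` times per cycle, forever."""
    expanded = [server for server, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


rr = round_robin(["web-1", "web-2", "web-3"])
wrr = weighted_round_robin({"big": 2, "small": 1})
first_three = [next(wrr) for _ in range(3)]  # "big" appears twice, "small" once
```
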
- Pagination
- Determine the functional requirements
- Determine the non-functional requirements (high availability, data should not be lost)
  - Data consistency (all devices see the same message)
  - Availability (replication)
  - Reliability (persistence)
  - Latency
- Determine extended requirements (accessible through REST APIs, analytics, push events, etc.)
- Limit on content/text (how much data or text a user can push)
- Traffic estimates
  - Users
  - Files per user
  - Storage per file
- Storage estimates
  - How long to keep the data
  - Rough field sizes: char (1 byte), int (4 bytes), date (4 bytes)
- Bandwidth estimates
- Memory estimates
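
A back-of-the-envelope calculation ties these estimates together. Every input below is a hypothetical number chosen for illustration (uploads per day, file size, retention, read/write ratio, cached share), and the metadata row layout is invented; only the field sizes come from the notes above.

```python
# All inputs are hypothetical, for illustration only.
DAILY_UPLOADS = 2_000_000         # new files per day
AVG_FILE_SIZE_BYTES = 200_000     # 200 KB per file
RETENTION_YEARS = 10              # how long to keep the data
READ_WRITE_RATIO = 10             # reads per write
CACHE_PERCENT = 20                # cache the hottest 20% of daily reads
SECONDS_PER_DAY = 24 * 60 * 60

# Invented metadata row: two ints, one date, one 256-char path.
ROW_BYTES = 4 + 4 + 4 + 256


def estimate() -> dict:
    daily_storage = DAILY_UPLOADS * AVG_FILE_SIZE_BYTES
    days_kept = 365 * RETENTION_YEARS
    write_bandwidth = daily_storage / SECONDS_PER_DAY
    return {
        "daily_storage_bytes": daily_storage,
        "total_storage_bytes": daily_storage * days_kept,
        "metadata_bytes": DAILY_UPLOADS * days_kept * ROW_BYTES,
        "write_bandwidth_bytes_per_s": write_bandwidth,
        "read_bandwidth_bytes_per_s": write_bandwidth * READ_WRITE_RATIO,
        "cache_memory_bytes": daily_storage * READ_WRITE_RATIO * CACHE_PERCENT // 100,
    }


for name, value in estimate().items():
    print(f"{name}: {value:,.0f}")
```

With these inputs the sketch gives 400 GB of new files per day, about 1.46 PB over ten years, roughly 4.6 MB/s of write bandwidth, and 800 GB of cache memory.
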
- System APIs (parameters/function calls)
- Database design (schema)
  - Determine the size of each row
  - Determine whether the data is relational
  - Determine whether the workload is read- or write-heavy
- Assume a web server can hold at most 500 connections at a time; beyond that, requests block
  - Split reads and writes into separate, dedicated services so that one does not block the other; this also lets each be scaled and optimized independently
- Reliability and redundancy
  - Keep multiple replicas, with at least one ready to take over if the primary has an issue
  - The number of replicas should be 3 if using a master-slave approach
- Data sharding
  - Determine the number of shards and assign users to them, e.g. UserID % 10 for 10 shards
  - For hot users, partition by photoID instead, using a dedicated, separate database instance to generate auto-incrementing IDs. To avoid a single point of failure, run two key generation services (KGS), one generating odd IDs and the other even IDs, with a round-robin load balancer in front.
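
The two ideas above can be sketched together: a modulo mapping from ID to shard, and two ID generators (odd and even) behind a round-robin front. The class names are hypothetical, and the in-memory counters stand in for the dedicated database instances described in the notes.

```python
import itertools

NUM_SHARDS = 10


def shard_for_user(user_id: int) -> int:
    return user_id % NUM_SHARDS


def shard_for_photo(photo_id: int) -> int:
    return photo_id % NUM_SHARDS


class KeyGenerator:
    """Stand-in for one ID-generating database instance."""

    def __init__(self, start: int, step: int = 2) -> None:
        self._counter = itertools.count(start, step)

    def next_id(self) -> int:
        return next(self._counter)


class KeyGenerationService:
    """Round-robins between an odd-ID generator and an even-ID generator."""

    def __init__(self) -> None:
        odd, even = KeyGenerator(start=1), KeyGenerator(start=2)
        self._generators = itertools.cycle([odd, even])

    def next_id(self) -> int:
        return next(self._generators).next_id()


kgs = KeyGenerationService()
photo_id = kgs.next_id()
shard = shard_for_photo(photo_id)
```

Because the two generators never produce the same ID, either one can keep serving requests if the other fails, at the cost of IDs no longer being strictly sequential.
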
- Sending data to users
  - Pull: clients pull content from the server on a regular basis
    - New data is not shown until the client issues a pull request
    - Most pull requests return an empty response when there is no new data
  - Push: the client keeps a long-poll request open with the server to receive updates
    - The server has to push updates frequently
  - Hybrid
    - Separate pools for different users (some served by push, others by pull)
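
The pull model and its two drawbacks can be seen in a small in-memory sketch. `FeedServer` and `PullClient` are hypothetical names, and a sequence number stands in for whatever cursor or timestamp a real API would use.

```python
class FeedServer:
    def __init__(self) -> None:
        self._items: list[tuple[int, str]] = []  # (sequence number, content)

    def publish(self, content: str) -> None:
        self._items.append((len(self._items) + 1, content))

    def pull(self, since: int) -> list[tuple[int, str]]:
        """Return only items newer than the client's last seen sequence number."""
        return [item for item in self._items if item[0] > since]


class PullClient:
    def __init__(self, server: FeedServer) -> None:
        self.server = server
        self.last_seen = 0
        self.feed: list[str] = []

    def poll(self) -> int:
        """Fetch new items; returns how many arrived (often zero)."""
        new_items = self.server.pull(self.last_seen)
        if new_items:
            self.feed.extend(content for _, content in new_items)
            self.last_seen = new_items[-1][0]
        return len(new_items)


server = FeedServer()
client = PullClient(server)
client.poll()              # empty response: nothing published yet
server.publish("hello")    # not visible to the client until the next poll
client.poll()              # now the client sees "hello"
```
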
- Reading data
  - Keep latency low by pushing content to cache servers/CDNs
  - Use caches between client and server, and between server and database
  - In addition, for hot users
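
A minimal sketch of a cache sitting between the server and the database, using the common cache-aside pattern (check the cache, fall back to the database on a miss, then populate the cache). The pattern is my choice for illustration; the class `CachedReader` is hypothetical and plain dictionaries stand in for both the cache and the database.

```python
class CachedReader:
    def __init__(self, database: dict) -> None:
        self._database = database  # stand-in for the real database
        self._cache: dict = {}     # stand-in for Memcached/Redis
        self.db_reads = 0          # counts how often the database is hit

    def get(self, key):
        if key in self._cache:
            return self._cache[key]  # cache hit: no database access
        self.db_reads += 1
        value = self._database.get(key)
        if value is not None:
            self._cache[key] = value  # populate the cache for later reads
        return value


reader = CachedReader({"photo:1": "cat.jpg"})
reader.get("photo:1")  # miss: reads the database and fills the cache
reader.get("photo:1")  # hit: served from the cache
```
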
My framework for system design: first clarify the domain in order to design the data model; from there design the CRUD API; from there work out the data flow, read/write frequency, and persistence requirements, which naturally lead to operations and scaling. In short:
- Data Model
- API
- Data flow
- Scaling
- https://www.youtube.com/watch?v=vvhC64hQZMk Designing WhatsApp
- https://www.youtube.com/watch?v=oUJbuFMyBDk Queue
- https://www.youtube.com/watch?v=FMhbR_kQeHw Publisher Subscriber Model; the drawback is consistency
- https://www.youtube.com/watch?v=GeGxgmPTe4c Distributed consensus
- https://www.youtube.com/watch?v=xrizarXJgC8 Avoid cascading failures in a distributed system
  - Caching
  - Gradual deployment
  - Coupling (e.g. saving authentication in server memory and assuming it stays valid for the next few hours)
- https://www.youtube.com/watch?v=K0Ta65OqQkY What is Load Balancing?