The inadequacy of learning resources

Existing materials on System Design Interviews (SDI) fall into two categories, and neither is sufficient for Staff+ level interviews. Not even close.

The first category is books, papers, and blogs from industry leaders, such as DDIA (Designing Data-Intensive Applications) and All Things Distributed. While they have academic merit, their contents are not aimed at the system design skills we apply in our daily jobs (and hence in SDI). For example, the famous CAP theorem that everyone teaches is useless, because in any distributed system, network partitions (or failures in general) are a given rather than a factor you can trade off. In reality, a much better model is the trade-off among the speed of eventual-consistency convergence, the speed of response, throughput, and availability.
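To make that model a little more concrete, here is a toy sketch of a single replicated write. It assumes a hypothetical setup where the client is answered once `w` of the replicas acknowledge; `WriteOutcome`, `simulate_write`, and the parameter `w` are illustrative names, not from any real system. Failures enter as an input (an unreachable replica is `None`), not as a knob, while `w` is the knob that trades response speed and availability against how far the replicas lag behind the response. Throughput is left out to keep it short.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class WriteOutcome:
    available: bool        # True if at least w replicas acknowledged the write
    response_ms: float     # latency the client sees; math.inf if the write failed
    convergence_ms: float  # time until every replica holds the write; math.inf while
                           # any replica is unreachable (it catches up only after recovery)


def simulate_write(replica_delays_ms: Sequence[Optional[float]], w: int) -> WriteOutcome:
    """Toy model of one replicated write (hypothetical, for illustration only).

    replica_delays_ms[i] is how long replica i takes to apply and acknowledge
    the write, or None if a failure (e.g. a partition) makes it unreachable.
    The client gets its response once w replicas have acknowledged.
    """
    if not 1 <= w <= len(replica_delays_ms):
        raise ValueError("w must be between 1 and the number of replicas")
    # Failures are an input to the model, not something we choose to trade away.
    acks = sorted(d for d in replica_delays_ms if d is not None)
    available = len(acks) >= w
    # The client waits for the w-th fastest acknowledgement.
    response_ms = acks[w - 1] if available else math.inf
    # Full convergence needs every replica, so it waits for the slowest one.
    all_reachable = len(acks) == len(replica_delays_ms)
    convergence_ms = acks[-1] if all_reachable else math.inf
    return WriteOutcome(available, response_ms, convergence_ms)
```

With replica delays of 5, 20, and 80 ms, `w=1` answers in 5 ms while the last replica only catches up at 80 ms, and `w=3` answers in 80 ms. If the third replica is unreachable, `w=2` still answers in 20 ms, but `w=3` becomes unavailable.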

The second category is online courses that claim to teach SDI, for example Grokking the System Design Interview, ByteByteGo and its book, Interviewing.io, and many others. These materials are beautifully presented and logically organized. I believe they are sufficient for entry-level jobs such as L3 and L4, and with some luck you can sometimes pass L5 interviews too. But a candidate who presents the solutions and understanding from those courses will fail L6+ interviews almost 100% of the time.

For example, one Ticketmaster solution suggests using a distributed lock such as ZooKeeper to solve the race condition when multiple users try to book the same seat. First, this is wrong because ZooKeeper is too slow and doesn't scale far enough. Second, if you do propose a distributed lock, the interviewer will ask how it is implemented, and no one can explain such complex consensus algorithms (Paxos or Raft) in a short time. It's like proposing to use a B-tree in a coding interview. Clearly, the author didn't understand what counts as an acceptable building block in SDI.
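For contrast, here is a minimal sketch of one widely used way to close that race without a lock service: a conditional update in the database that already stores the seats, so the "is it free?" check and the write happen in one atomic statement. This is our illustration, not a solution the text prescribes. The `seats` table, its columns, and `book_seat` are hypothetical, and SQLite (Python's built-in `sqlite3`) stands in for the relational DB.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per seat; holder is NULL while the seat is free.
    conn.execute("CREATE TABLE seats (seat_id TEXT PRIMARY KEY, holder TEXT)")


def book_seat(conn: sqlite3.Connection, seat_id: str, user_id: str) -> bool:
    """Try to claim a seat; return True only if this call won it."""
    # The WHERE clause folds the availability check into the write itself,
    # so two competing bookings cannot both see the seat as free and both succeed.
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE seats SET holder = ? WHERE seat_id = ? AND holder IS NULL",
            (user_id, seat_id),
        )
    # rowcount is 1 if we claimed the seat, 0 if it was taken or does not exist.
    return cur.rowcount == 1
```

After inserting a free seat `A1`, `book_seat(conn, "A1", "alice")` returns `True` and a second `book_seat(conn, "A1", "bob")` returns `False`. The design choice is to let the database's own row-level atomicity do the work, which needs no extra building block to explain.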

Another very common mistake is treating distributed KV engines as if they scale horizontally and run faster than relational DBs. In fact, those KV engines scale the same way relational DBs do, but they are slower to update (because of replicas) and much harder to maintain. The reason to choose them is availability, not speed.
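One way to read "scale the same way", offered here as an assumption rather than the author's stated argument, is that both kinds of engine scale out by partitioning keys across independent shards, and the router doing that does not care what engine each shard runs. A minimal sketch: `Store`, `DictStore`, and `ShardedStore` are hypothetical names, and `DictStore` is an in-memory stand-in for a shard that could equally be a relational database or a KV engine.

```python
import hashlib
from typing import Dict, List, Optional, Protocol


class Store(Protocol):
    """Anything that can serve one shard: a relational DB, a KV engine, etc."""

    def get(self, key: str) -> Optional[str]: ...
    def put(self, key: str, value: str) -> None: ...


class DictStore:
    """In-memory stand-in for a single shard (hypothetical)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


class ShardedStore:
    """Routes each key to exactly one shard by hashing it."""

    def __init__(self, shards: List[Store]) -> None:
        if not shards:
            raise ValueError("need at least one shard")
        self._shards = shards

    def _shard_for(self, key: str) -> Store:
        # A stable hash (unlike Python's per-process hash()) keeps routing fixed.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self._shards[int.from_bytes(digest[:8], "big") % len(self._shards)]

    def get(self, key: str) -> Optional[str]:
        return self._shard_for(key).get(key)

    def put(self, key: str, value: str) -> None:
        self._shard_for(key).put(key, value)
```

Plain modulo hashing keeps the sketch short but remaps most keys when the shard count changes. Consistent hashing or range partitioning avoids that, and both are just as available in front of relational shards as in front of KV shards.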
