I have little sde background, so apologies for noob question. I was recently asked in a System Design interview to design a system that will get hundreds of thousands of hits within a short time span (think multiplayer online short video game). The main challenge is to handle concurrency issue. I gave a standard answer, "have multiple application servers and sync server with replicated cache and write locks on db". But interviewer was disappointed and also mentioned it in the feedback. I'm curious to learn how to actually answer questions like this. Could some Blind ninja please spare a few minutes to answer? Edit: You're supposed to design a game where there is no limit to the number of users (open to make assumptions here). Each player is looking to uncover treasures on the game map. So each player should also get a consistent copy of the game map and current state. This is not a turn-based game. p.s. I understand that grokking course and that book and those YouTube channels are the best way to learn. But I'm looking for keywords for this answer so I can directly read those topics and be able to get this out of my head. Tc: 280k
Do you think consistency is important for a game?
Yes, very. Each player should get the latest copy of the game board. That's why I added a sync server.
@amazon, what if it's not a users turn, will it still require strong consistency model?
Just some ideas: - async/await (I.e. event loops) to handle incoming requests - message queues to decouple critical actions from non-critical actions - use distributed data layers (e.g. redis, cassandra, spanner) - consider alternatives to threads such as Java Fiber or Goroutines
Thank you! Do you any recommendations for resources to learn more about these?
Or Kotlin CoRoutine, green threads, etc...
How will you ensure members of one game are mapped to same instance? What if they disconnect for a while and join again? How will you check if someone have left the game?
For 1, there is only one instance of the game, so all users are playing on the same active map. 2 and 3 are open for making assumptions.
One node? no redundancy? sounds strange.
The interviewer isn’t interested in an “answer” at all. They’re more interested in a discussion. Plus your answer is hand wavy AF — as an interviewer I don’t even want to bother double clicking.
Thanks for the helpful suggestions!
I agree. My best system design was a great discussion where I proposed solution which will barely work on practice but the whole problem was a huge "what-if", while my worst system design was when I replied "easy-peasy, you do this, this and that and we get the most efficient solution".
What’s your role and level at Amazon? Curious about your TC and background
Science. L5
Data science? Applied science? Can you be a little more specific?
The game client maintains a local state of the environment and a persistent connection with a server. Depending on the game all players can fit into a single server, or can be partitioned appropriately. Clients update state changes at regular intervals which gets written to an event stream. Many player moves are non conflicting and can happen concurrently. Those that are conflicting, needs to be totally ordered and causal. The client side logic can either be pessimistic or optimistic when dealing with conflicting changes. The state changes are persisted as an event stream, that allows for fault tolerance. The server determines the actual state in presence of conflicts and writes the updates to another stream that needs to be fanned out to clients. (I have no idea how multiplayer games are actually built)
The same way how google sheets are built?
Now that you said, seems quite similar
Read about Google Spanner and FB Live commenting design.
Thanks! You mean on their respective engineering blogs?
Spanner white paper and FB live commenting blog and also any videos/articles you find on that as a system design quest
I highly recommend also reading "designing data-intensive applications" by Martin Keppelmann
Which chapter in that covers this?
roughly half of the chapters are relevant here. How much data does the game state represent? how is it encoded? How are replications handled? Partitions? Data warehousing? Streams?
here is what I would do: 1) use optimistic locking, most of times users probably pick up different treasures, we don't want to use pessimistic lock to create bottleneck 2) for two players trying to pick up same treasure or any conflict like this, we will check if the state of the data to be modified is changed (can use monotonic version number), if changed, abort the change. This is a first come first serve strategy. if a player picking up a treasure, and the request arrives on the server first, it will succeed. otherwise fails. 3) whenever there is change to the map, broadcast to all players. This can use a queue for publish changes, and then have workers/subscribers to pick up the changes to fan out to other players. depending on the performance/delay requirement, we could do batch publishing to players to reduce network bandwidth usage.
India
Yesterday
751
A list of ethnic slurs on Indians that should be banned on Blind
Tech Industry
13h
1311
Why doesn't OpenAI offshore and reduce expense by 80%
Software Engineering Career
6h
1714
L4 Google -> 45 interviews, 5 offers, AMA
Software Engineering Career
Yesterday
1446
Principal Software Engineer TC~300K at Microsoft vs 600K at Meta. Is 300k pretty low for Principal scope?
Software Engineering Career
Yesterday
744
If your team does daily standups, your manager is a micromanager
It’s hard to answer it without full context.
I apologize. What additional details should I provide?
What’s the requirement for consistency vs availability? Do you care if the db is slightly inconsistent for some time ?