The shape of the problem
Multiplayer rooms are stateful. Once four players join a room, the room has a lifetime measured in minutes; the socket connections are long-lived; and the pod backing it has to stay up and unchanged for the duration. Drop the pod and you drop the match.
That breaks every default Kubernetes scaling assumption. The Horizontal Pod Autoscaler typically scales on CPU, but a multiplayer pod's CPU is dominated by tick rate, which is the same whether it serves one room or four. A pod can't be restarted or rescheduled mid-match without killing the match, so a liveness-probe restart is not a safe recovery path. And a vanilla Deployment has no way to allocate a specific pod to a specific group of players.
Agones is the open-source project that fills this gap. It treats game servers as first-class Kubernetes resources, with lifecycle states (Ready, Allocated, Shutdown) and an allocation API that hands you a specific pod by name. On top of that we layered Open Match for matchmaking and a thin Game Server Manager service that owned the room lifecycle.
The architecture, end to end
Six-step room allocation flow. Yellow lines are the user-facing path; the dashed grey line is Agones' auto-scale callback.
The six-step room allocation flow
- Match Maker → Game Server Manager. Open Match (or Fleet IQ) batches users into a room and fires POST /allocate-room with the user list.
- Game Server Manager → Agones. The manager calls POST /allocate-gameserver on Agones. Agones is the source of truth for which pods exist and which are free.
- Re-use first, allocate second. Agones consults pod Labels. If an Allocated server is flagged as "has capacity for another room," that pod is returned. Only if no warm pod can take the room does Agones promote a Ready pod to Allocated.
- IP + Port + AuthToken. Agones responds with the pod's external address and a server-issued AuthToken (UUID). The manager persists the AuthToken so the game server can later verify joins.
- Relay to clients. The Game Server Manager sends each player the connection triple (IP, Port, AuthToken) through the Match Maker channel they already trust.
- Direct connection. Players open a WebSocket to https://IP:PORT presenting the AuthToken. The game server validates the token, places them into the correct RoomInstance, and from this moment forward the Manager and Agones are out of the hot path.
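The flow above can be sketched in a few lines. This is a minimal in-memory illustration, not the real services: FakeAgones, GameServerManager, allocate_gameserver, allocate_room, and ROOM_CEILING are hypothetical names, and the real steps are HTTP calls between separate processes.

```python
import uuid
from dataclasses import dataclass

ROOM_CEILING = 4  # hypothetical per-pod room limit


@dataclass
class GameServer:
    name: str
    ip: str
    port: int
    state: str = "Ready"  # "Ready" or "Allocated"
    rooms: int = 0


class FakeAgones:
    """In-memory stand-in for the allocation side (steps 2 to 4)."""

    def __init__(self, servers):
        self.servers = servers

    def allocate_gameserver(self):
        # Re-use first: an Allocated server that can still take a room.
        for gs in self.servers:
            if gs.state == "Allocated" and gs.rooms < ROOM_CEILING:
                gs.rooms += 1
                return gs, str(uuid.uuid4())
        # Allocate second: promote a Ready server to Allocated.
        for gs in self.servers:
            if gs.state == "Ready":
                gs.state, gs.rooms = "Allocated", 1
                return gs, str(uuid.uuid4())
        raise RuntimeError("fleet saturated: no Ready game server left")


class GameServerManager:
    """Stand-in for the manager (steps 1 and 5)."""

    def __init__(self, agones):
        self.agones = agones
        self.token_to_pod = {}  # persisted so joins can be verified later

    def allocate_room(self, players):
        gs, token = self.agones.allocate_gameserver()
        self.token_to_pod[token] = gs.name
        # Every player in the room receives the same connection triple.
        return {player: (gs.ip, gs.port, token) for player in players}
```

With two servers in the fleet, the first four rooms land on the same pod and only the fifth promotes the second Ready server.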
The reuse trick: one pod, many rooms
The naive design is one game server per room. That works for trivial loads and falls over the moment you try to scale: every match means a new pod, new IP, new cold start, new image pull.
We made every game server multi-room aware. Inside the pod, the process maintains a map of RoomInstance objects keyed by room ID. Players from different rooms connect to the same port; the dispatcher routes inbound socket frames to the appropriate room based on the AuthToken at handshake time.
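A rough sketch of that in-process dispatcher, assuming one AuthToken per room and an opaque connection ID per socket. The Dispatcher class and its method names are hypothetical, and the inbox list stands in for whatever the room actually does with a frame.

```python
from dataclasses import dataclass, field


@dataclass
class RoomInstance:
    room_id: str
    players: set = field(default_factory=set)
    inbox: list = field(default_factory=list)  # frames received, in arrival order


class Dispatcher:
    def __init__(self):
        self.rooms = {}          # room_id -> RoomInstance
        self.token_to_room = {}  # AuthToken -> room_id
        self.conn_to_room = {}   # connection id -> room_id, fixed at handshake

    def register_room(self, room_id, auth_token):
        self.rooms[room_id] = RoomInstance(room_id)
        self.token_to_room[auth_token] = room_id

    def handshake(self, conn_id, player, auth_token):
        """Bind a new connection to its room; reject unknown tokens."""
        room_id = self.token_to_room.get(auth_token)
        if room_id is None:
            return False
        self.rooms[room_id].players.add(player)
        self.conn_to_room[conn_id] = room_id
        return True

    def on_frame(self, conn_id, frame):
        """Route an inbound frame to the room chosen at handshake time."""
        room = self.rooms[self.conn_to_room[conn_id]]
        room.inbox.append(frame)
```

Resolving the room once at handshake keeps the per-frame path to a dictionary lookup, so sharing a port between rooms adds no token checks to the hot path.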
The contract between the game server and Agones is expressed through Kubernetes Labels. Every game server keeps a count of how many rooms it is hosting. When that count is below the configured ceiling, it sets a label like rooms-available: true on its own pod via the Agones SDK; once the count reaches the ceiling, it flips the label so the pod stops receiving new rooms. Agones' allocator filter then automatically prefers pods labelled as available over Ready pods.
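The game-server side of that contract might look like the sketch below. RoomCapacityReporter is a hypothetical name, and set_label is an injected callable standing in for the Agones SDK's label-setting call, so the sketch makes no claim about the SDK's actual signature.

```python
class RoomCapacityReporter:
    """Tracks hosted rooms and republishes the availability label on change."""

    def __init__(self, ceiling, set_label):
        self.ceiling = ceiling      # configured per-pod room ceiling
        self.set_label = set_label  # assumption: callable(key, value)
        self.rooms = 0
        self._publish()

    def _publish(self):
        available = self.rooms < self.ceiling
        self.set_label("rooms-available", "true" if available else "false")

    def room_opened(self):
        if self.rooms >= self.ceiling:
            raise RuntimeError("room ceiling reached")
        self.rooms += 1
        self._publish()

    def room_closed(self):
        if self.rooms == 0:
            raise RuntimeError("no rooms to close")
        self.rooms -= 1
        self._publish()
```

For a quick check, passing a plain dictionary's `__setitem__` as set_label shows the label flipping to "false" once the ceiling is reached and back to "true" when a room closes.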
Why this matters
- Cold starts disappear. Most matches reuse a warm pod. New Ready pods only get promoted when the fleet is genuinely saturated.
- Pod count tracks room ceiling, not room count. A pod hosting four rooms uses the same image pull as a pod hosting one. The number of pods is bounded by your per-pod ceiling, not by your traffic.
- Per-room state stays in-process. No cross-pod chatter for in-room events. The room is a struct on one pod, period.
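To put rough numbers on the pod-count point: assuming rooms pack fully onto pods, the pods required is the room count divided by the per-pod ceiling, rounded up. The function name and figures below are illustrative only.

```python
import math


def pods_needed(rooms, room_ceiling):
    """Pods required if every pod is filled to its room ceiling."""
    return math.ceil(rooms / room_ceiling)


print(pods_needed(100, 1))  # one room per pod: 100
print(pods_needed(100, 4))  # four rooms per pod: 25
print(pods_needed(101, 4))  # one extra room spills onto a new pod: 26
```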
The auto-scale callback
Agones doesn't autoscale by CPU. Instead, it periodically calls a URL you configure — in our case http://game-server-manager/auto-scale — and asks: "given the fleet state I'm about to show you, should I scale up, scale down, or hold?"
That callback is where the room-aware policy lives. The Manager looks at the room count across the fleet, the per-pod ceiling, and current allocations, and answers with a target replica count. The decision the policy actually makes is:
Keep enough Ready pods that the next batch of matchmaker results doesn't have to wait on a pod-spawn. Don't let Ready pods accumulate beyond what an honest traffic surge would need within the next minute.
The exact buffer is a constant per environment — bigger for live, smaller for staging. The shape of the policy is the same: scale the warm pool, not the allocated pool.
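One possible shape for that policy, as a sketch under stated assumptions: the function names, the parameters, and the idea of sizing the buffer from an expected burst of rooms are all hypothetical, and the wire format of the callback is left out.

```python
import math


def target_replicas(allocated_pods, open_rooms, room_ceiling,
                    expected_burst_rooms, max_replicas):
    """Allocated pods plus just enough Ready pods to absorb the expected burst."""
    # Rooms the already-Allocated pods can still take before a Ready pod is needed.
    spare_rooms = max(0, allocated_pods * room_ceiling - open_rooms)
    # Burst rooms that warm capacity cannot cover.
    uncovered = max(0, expected_burst_rooms - spare_rooms)
    ready_needed = math.ceil(uncovered / room_ceiling)
    return min(max_replicas, allocated_pods + ready_needed)


def decide(current_replicas, target):
    """Translate a target into the scale up / scale down / hold answer."""
    if target > current_replicas:
        return "scale up"
    if target < current_replicas:
        return "scale down"
    return "hold"
```

With 3 Allocated pods hosting 10 rooms at a ceiling of 4, an expected burst of 6 rooms leaves 4 uncovered, so the target is 4 replicas: one Ready pod on top of the allocated pool. Only the expected_burst_rooms constant would change between live and staging.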
The metadata store
The Game Server Manager keeps a few things hot: the room → pod mapping, the AuthToken → room mapping, and the per-pod room counts. We backed this with Redis for the PoC.
For a cost-conscious deployment, that store can be swapped for a Kubernetes Persistent Volume backing a small embedded KV. It's worth the trouble only when you've validated the rest — Redis is the right "doesn't get in the way" choice while you're still tuning the allocation policy.
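The three mappings are small enough to sketch against plain dictionaries. This is an in-memory stand-in, not the Redis-backed PoC; RoomMetadataStore and its method names are hypothetical.

```python
class RoomMetadataStore:
    """In-memory stand-in for the manager's hot metadata."""

    def __init__(self):
        self.room_to_pod = {}     # room id -> pod name
        self.token_to_room = {}   # AuthToken -> room id
        self.pod_room_count = {}  # pod name -> number of rooms hosted

    def add_room(self, room_id, pod, auth_token):
        self.room_to_pod[room_id] = pod
        self.token_to_room[auth_token] = room_id
        self.pod_room_count[pod] = self.pod_room_count.get(pod, 0) + 1

    def remove_room(self, room_id):
        pod = self.room_to_pod.pop(room_id)
        # Drop every token that pointed at the closed room.
        stale = [t for t, r in self.token_to_room.items() if r == room_id]
        for token in stale:
            del self.token_to_room[token]
        self.pod_room_count[pod] -= 1
        if self.pod_room_count[pod] == 0:
            del self.pod_room_count[pod]

    def pod_for_token(self, auth_token):
        """Resolve a join attempt to the pod hosting its room, or None."""
        room_id = self.token_to_room.get(auth_token)
        return None if room_id is None else self.room_to_pod.get(room_id)
```

Keeping the interface this narrow is what makes the backing store swappable: the same three operations could sit on top of Redis or an embedded KV without the allocation logic noticing.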