Scaling multiplayer game servers with Agones on Kubernetes

The shape of the problem

Multiplayer rooms are stateful. Once four players join a room, the room has a lifetime measured in minutes, the socket connections are long-lived, and the pod backing it has to remain the same pod for the duration. Drop the pod and you drop the match.

That breaks every default Kubernetes scaling assumption. Horizontal Pod Autoscaler watches CPU by default. A multiplayer pod's CPU is dominated by tick rate, which is the same whether it serves one room or four. A failed liveness probe restarts the container, which ends any match in progress. And a vanilla Deployment has no way to allocate a specific pod to a specific group of players.

Agones is the open-source project that fills this gap. It treats game servers as first-class Kubernetes resources, with lifecycle states (Ready, Allocated, Shutdown) and an allocation API that hands you a specific pod by name. On top of that we layered Open Match for matchmaking and a thin Game Server Manager service that owned the room lifecycle.

The architecture, end to end

Figure: six-step room allocation flow. Yellow lines are the user-facing path; the dashed grey line is Agones' auto-scale callback.

The six-step room allocation flow

1. Match Maker → Game Server Manager. Open Match (or Fleet IQ) batches users into a room and fires POST /allocate-room with the user list.
2. Game Server Manager → Agones. The Manager calls POST /allocate-gameserver on Agones. Agones is the source of truth for which pods exist and which are free.
3. Reuse first, allocate second. Agones consults pod labels. If an Allocated server is flagged as "has capacity for another room," that pod is returned. Only if no warm pod can take the room does Agones promote a Ready pod to Allocated.
4. IP + Port + AuthToken. Agones responds with the pod's external address and a server-issued AuthToken (UUID). The Manager persists the AuthToken so the game server can later verify joins.
5. Relay to clients. The Game Server Manager sends each player the connection triple (IP, Port, AuthToken) through the Match Maker channel they already trust.
6. Direct connection. Players open a WebSocket to wss://IP:PORT, presenting the AuthToken. The game server validates the token and places them into the correct RoomInstance. From this moment forward the Manager and Agones are out of the hot path.
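The middle of the flow (reuse first, allocate second, then hand back the connection triple) can be sketched in a few lines of Python. This is a minimal in-memory model, not the Agones API: the GameServer and Manager classes, the four-room ceiling, and issuing one UUID AuthToken per room inside the Manager are all assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class GameServer:
    """Hypothetical stand-in for one Agones-managed game server pod."""
    name: str
    ip: str
    port: int
    state: str = "Ready"   # only "Ready" and "Allocated" are modelled here
    rooms: int = 0         # rooms currently hosted on this pod
    ceiling: int = 4       # assumed per-pod room ceiling

    @property
    def has_capacity(self) -> bool:
        return self.rooms < self.ceiling


@dataclass
class Manager:
    """Hypothetical Game Server Manager holding the fleet view in memory."""
    fleet: List[GameServer]
    tokens: Dict[str, str] = field(default_factory=dict)  # AuthToken -> room ID

    def pick_server(self) -> GameServer:
        # Reuse first: an Allocated pod that still has room wins.
        for gs in self.fleet:
            if gs.state == "Allocated" and gs.has_capacity:
                return gs
        # Allocate second: promote a Ready pod only when no warm pod fits.
        for gs in self.fleet:
            if gs.state == "Ready":
                gs.state = "Allocated"
                return gs
        raise RuntimeError("no game server available")

    def allocate_room(self, room_id: str, users: List[str]) -> Dict[str, Tuple[str, int, str]]:
        gs = self.pick_server()
        gs.rooms += 1
        # Persist the token so joins can be verified later.
        token = str(uuid.uuid4())
        self.tokens[token] = room_id
        # Every player in the room receives the same (IP, Port, AuthToken) triple.
        return {user: (gs.ip, gs.port, token) for user in users}
```

Calling allocate_room twice against a fresh fleet promotes one Ready pod for the first room and places the second room on that same pod, leaving the rest of the fleet Ready.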

The reuse trick: one pod, many rooms

The naive design is one game server per room. That works for trivial loads and falls over the moment you try to scale: every match means a new pod, new IP, new cold start, new image pull.

We made every game server multi-room aware. Inside the pod, the process maintains a map of RoomInstance objects keyed by room ID. Players from different rooms connect to the same port; the dispatcher routes inbound socket frames to the appropriate room based on the AuthToken at handshake time.
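A rough sketch of that dispatcher, assuming the token is resolved once at handshake and the connection is then pinned to its room. The class and method names (Dispatcher, open_room, handshake, on_frame) and the connection IDs are hypothetical; a real server would sit behind a WebSocket library rather than plain method calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class RoomInstance:
    """One match's state; lives entirely inside a single process."""
    room_id: str
    players: Set[str] = field(default_factory=set)
    frames: List[Tuple[str, bytes]] = field(default_factory=list)

    def on_frame(self, player: str, payload: bytes) -> None:
        # A real room would advance game state; here we only record the frame.
        self.frames.append((player, payload))


class Dispatcher:
    """Routes frames from many connections on one port to the right room."""

    def __init__(self) -> None:
        self.rooms: Dict[str, RoomInstance] = {}           # room ID -> room
        self.token_to_room: Dict[str, str] = {}            # AuthToken -> room ID
        self.connections: Dict[str, Tuple[RoomInstance, str]] = {}

    def open_room(self, room_id: str, token: str) -> RoomInstance:
        room = RoomInstance(room_id)
        self.rooms[room_id] = room
        self.token_to_room[token] = room_id
        return room

    def handshake(self, conn_id: str, player: str, token: str) -> bool:
        # The token is checked once; unknown tokens are rejected.
        room_id = self.token_to_room.get(token)
        if room_id is None:
            return False
        room = self.rooms[room_id]
        room.players.add(player)
        self.connections[conn_id] = (room, player)
        return True

    def on_frame(self, conn_id: str, payload: bytes) -> None:
        # After the handshake, routing is a single dictionary lookup.
        room, player = self.connections[conn_id]
        room.on_frame(player, payload)
```

Pinning the connection at handshake keeps the per-frame path to one lookup, so the token never has to travel with every frame.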

The contract between game server and Agones is communicated via Kubernetes labels. Every game server keeps a count of how many rooms it is hosting. When that count is below the configured ceiling, it sets a label like rooms-available: true on itself via the Agones SDK. Agones' allocator then prefers pods labelled as available over Ready pods, automatically.
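The game-server side of that contract might look like the sketch below. The RoomCounter class is hypothetical, and set_label is an injected callable standing in for whatever label-setting call your Agones SDK binding exposes; nothing here talks to Agones. Publishing an explicit "false" when the pod is full, rather than removing the label, is also an assumption.

```python
from typing import Callable


class RoomCounter:
    """Tracks hosted rooms and republishes availability on every change."""

    LABEL_KEY = "rooms-available"

    def __init__(self, ceiling: int, set_label: Callable[[str, str], None]) -> None:
        if ceiling < 1:
            raise ValueError("ceiling must be at least 1")
        self.ceiling = ceiling
        self.count = 0
        self._set_label = set_label   # assumed wrapper around the SDK call
        self._publish()

    def _publish(self) -> None:
        # Label values are strings, so the boolean is spelled out.
        available = self.count < self.ceiling
        self._set_label(self.LABEL_KEY, "true" if available else "false")

    def room_opened(self) -> None:
        if self.count >= self.ceiling:
            raise RuntimeError("pod is already at its room ceiling")
        self.count += 1
        self._publish()

    def room_closed(self) -> None:
        if self.count == 0:
            raise RuntimeError("no rooms to close")
        self.count -= 1
        self._publish()
```

In a test, set_label can simply be the __setitem__ of a dict, which makes the label transitions easy to assert on.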

Why this matters

Cold starts disappear. Most matches reuse a warm pod. New Ready pods only get promoted when the fleet is genuinely saturated.
Pod count tracks rooms divided by the per-pod ceiling, not raw room count. A pod hosting four rooms uses the same image pull as a pod hosting one. With a ceiling of four, the same traffic needs roughly a quarter of the pods that a one-room-per-pod design would.
Per-room state stays in-process. No cross-pod chatter for in-room events. The room is a struct on one pod, period.
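The pod arithmetic behind these points is easy to check. The sketch below assumes rooms pack perfectly onto pods, which real traffic will only approximate; both function names are made up for the comparison.

```python
def pods_one_room_each(rooms: int) -> int:
    """Naive design: every room gets its own pod."""
    return rooms


def pods_multi_room(rooms: int, ceiling: int) -> int:
    """Multi-room design: rooms are packed up to the per-pod ceiling."""
    if ceiling < 1:
        raise ValueError("ceiling must be at least 1")
    # Integer ceiling division avoids floating-point rounding.
    return (rooms + ceiling - 1) // ceiling


if __name__ == "__main__":
    for rooms in (1, 4, 100, 101):
        print(rooms, pods_one_room_each(rooms), pods_multi_room(rooms, 4))
```

For 100 concurrent rooms and a ceiling of four, the naive design needs 100 pods and the multi-room design needs 25.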

The auto-scale callback

Agones doesn't autoscale by CPU. Instead, its fleet autoscaler can be given a webhook policy: it periodically calls a URL you configure (in our case http://game-server-manager/auto-scale) and asks: "given the fleet state I'm about to show you, should I scale up, scale down, or hold?"

That callback is where the room-aware policy lives. The Manager looks at the room count across the fleet, the per-pod ceiling, and current allocations, and answers with a target replica count. The decision the policy actually makes is:

Keep enough Ready pods that the next batch of matchmaker results doesn't have to wait on a pod-spawn. Don't let Ready pods accumulate beyond what an honest traffic surge would need within the next minute.

The exact buffer is a constant per environment — bigger for live, smaller for staging. The shape of the policy is the same: scale the warm pool, not the allocated pool.
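A minimal sketch of such a policy, assuming the buffer is a fixed number of Ready pods and the fleet has a hard replica cap. FleetState, decide, warm_buffer, and max_replicas are illustrative names, not Agones' webhook schema, and the real callback would wrap this in an HTTP handler.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FleetState:
    allocated: int   # pods currently hosting at least one room
    ready: int       # warm pods waiting for an allocation


def decide(state: FleetState, warm_buffer: int, max_replicas: int) -> Tuple[str, int]:
    """Return (action, target replica count) for the whole fleet."""
    current = state.allocated + state.ready
    # Scale the warm pool: keep warm_buffer Ready pods on top of what is
    # allocated, capped by max_replicas but never below the allocated pods.
    target = max(state.allocated, min(state.allocated + warm_buffer, max_replicas))
    if target > current:
        return "up", target
    if target < current:
        return "down", target
    return "hold", target
```

With a buffer of three, a fleet of ten Allocated and one Ready pod scales up to thirteen, while ten Allocated and eight Ready scales down to the same thirteen. The allocated pods are never part of what gets trimmed.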

The metadata store

The Game Server Manager keeps a few things hot: the room → pod mapping, the AuthToken → room mapping, and the per-pod room counts. We backed this with Redis for the PoC.
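The three mappings are small enough to model with plain dictionaries. The sketch below is an in-memory stand-in, not the Redis schema we used: the MetadataStore class and its method names are hypothetical, and in Redis each dictionary would presumably become a hash or a set of keys.

```python
from typing import Dict, Optional, Tuple


class MetadataStore:
    """In-memory stand-in for the Manager's hot metadata."""

    def __init__(self) -> None:
        self.room_to_pod: Dict[str, str] = {}      # room ID -> pod name
        self.token_to_room: Dict[str, str] = {}    # AuthToken -> room ID
        self.pod_room_count: Dict[str, int] = {}   # pod name -> hosted rooms

    def register_room(self, room_id: str, pod: str, token: str) -> None:
        self.room_to_pod[room_id] = pod
        self.token_to_room[token] = room_id
        self.pod_room_count[pod] = self.pod_room_count.get(pod, 0) + 1

    def resolve(self, token: str) -> Optional[Tuple[str, str]]:
        """Map an AuthToken to (room ID, pod name), or None if unknown."""
        room_id = self.token_to_room.get(token)
        if room_id is None:
            return None
        return room_id, self.room_to_pod[room_id]

    def close_room(self, room_id: str) -> None:
        pod = self.room_to_pod.pop(room_id)
        # Drop every token that pointed at the closed room.
        stale = [t for t, r in self.token_to_room.items() if r == room_id]
        for token in stale:
            del self.token_to_room[token]
        self.pod_room_count[pod] -= 1
```

Keeping the per-pod count alongside the two mappings means the auto-scale decision can read fleet occupancy without asking each game server.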

For a cost-conscious deployment, that store can be swapped for a Kubernetes Persistent Volume backing a small embedded KV. It's worth the trouble only when you've validated the rest — Redis is the right "doesn't get in the way" choice while you're still tuning the allocation policy.

The shape of the lesson: the hardest part of running multiplayer on Kubernetes is admitting that the room — not the pod and not the CPU — is the unit you actually want to scale. Once you commit to that, Agones gives you the primitives, and the rest is a couple of hundred lines of allocation policy.