
Scaling Dedicated Game Servers with Kubernetes: Part 4 - Scaling Down

Fourth and final part of a series on managing and scaling Dedicated Game Servers with open source projects Kubernetes and Docker. In this version we write a strategy to scale down our Kubernetes cluster, while making sure we don't interrupt active games.

Mark Mandel, Blogger

April 26, 2018


Originally posted on compoundtheory.com.

This is part four of a four-part series on scaling game servers with Kubernetes.

In the previous three posts, we hosted our game servers on Kubernetes, measured and limited their resource usage, and scaled up the nodes in our cluster based on that usage. Now we need to tackle the harder problem: scaling down the nodes in our cluster as resources are no longer being used, while ensuring that in-progress games are not interrupted when a node is deleted.

On the surface, scaling down nodes in our cluster may seem particularly complicated. Each game server has in-memory state of the current game and multiple game clients are connected to an individual game server playing a game. Deleting arbitrary nodes could potentially disconnect active players -- and that tends to make them angry! Therefore, we can only remove nodes from a cluster when a node is empty of dedicated game servers.

This means that if you are running on Google Kubernetes Engine (GKE), or similar, you can’t use a managed autoscaling system. To quote the documentation for the GKE autoscaler “Cluster autoscaler assumes that all replicated Pods can be restarted on some other node…” -- which in our case is definitely not going to work, since it could easily delete nodes that have active players on them.

That being said, when looking at this situation more closely, we discover that we can break this down into three separate strategies that when combined together make scaling down a manageable problem that we can implement ourselves:

  1. Group game servers together to avoid fragmentation across the cluster

  2. Cordon nodes when CPU capacity is above the configured buffer

  3. Delete a cordoned node from the cluster once all the games on the node have exited

Let’s look at each of these in detail.

Grouping Game Servers Together in the Cluster

We want to avoid fragmentation of game servers across the cluster so we don’t end up with a wayward small set of game servers still running across multiple nodes, which will prevent those nodes from being shut down and reclaiming their resources.

This means we don’t want a scheduling pattern that creates game server Pods on random nodes across our cluster like this:

Fragmented across the cluster

Instead, we want our game server Pods packed as tightly as possible, like this:

Packed together in the cluster

To group our game servers together, we can take advantage of the Kubernetes PodAffinity configuration with the PreferredDuringSchedulingIgnoredDuringExecution option. This lets us tell Pods that we prefer to group them by the hostname of the node they are on, which essentially means that Kubernetes will prefer to put a dedicated game server Pod on a node that already has a dedicated game server Pod on it.

In an ideal world, we would want a dedicated game server Pod to be scheduled on the node with the most dedicated game server Pods, as long as that node also has enough spare CPU resources. We could definitely do this if we wanted to write our own custom scheduler for Kubernetes, but to keep this demo simple, we will stick with the PodAffinity solution. That being said, when we consider the short length of our games, and that we will be adding (and explaining) cordoning nodes shortly, this combination of techniques is good enough for our requirements, and removes the need for us to write additional complex code.
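
To make the "ideal world" rule above concrete, here is a minimal sketch of the selection a custom scheduler might perform: pick the node with the most game server Pods that still has room for one more. This is not the Kubernetes scheduler API. The node type, its fields, and pickNode are hypothetical names used only for illustration.

```go
package main

import "fmt"

// node is a hypothetical, simplified view of a cluster node.
type node struct {
	name         string
	gamePods     int   // game server Pods currently on the node
	freeMilliCPU int64 // CPU not yet requested, in millicores
}

// pickNode returns the node with the most game server Pods that can still
// fit a Pod requesting requestMilliCPU. ok is false when nothing fits.
func pickNode(nodes []node, requestMilliCPU int64) (best node, ok bool) {
	for _, n := range nodes {
		if n.freeMilliCPU < requestMilliCPU {
			continue // not enough spare CPU on this node
		}
		if !ok || n.gamePods > best.gamePods {
			best, ok = n, true
		}
	}
	return best, ok
}

func main() {
	nodes := []node{
		{name: "node-a", gamePods: 9, freeMilliCPU: 50}, // fullest, but no room left
		{name: "node-b", gamePods: 6, freeMilliCPU: 400},
		{name: "node-c", gamePods: 1, freeMilliCPU: 900},
	}
	if n, ok := pickNode(nodes, 100); ok {
		fmt.Println("schedule on", n.name) // schedule on node-b
	}
}
```

PodAffinity only expresses a preference for nodes that already host game servers, so it approximates this rule rather than implementing it exactly.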

When we add the PodAffinity configuration to the previous post’s configuration, we end up with the following, which tells Kubernetes to put Pods with the label sessions: game on the same node as each other whenever possible.

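apiVersion: v1
kind: Pod
metadata:
  generateName: "game-"
spec:
  hostNetwork: true
  restartPolicy: Never
  nodeSelector:
    role: game-server
  containers:
    - name: soccer-server
      image: gcr.io/soccer/soccer-server:0.1
      env:
        - name: SESSION_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
      resources:
        limits:
          cpu: "0.1"
  affinity:
    podAffinity: # group game server Pods
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1 # required field (1-100); the value here is an assumption
        podAffinityTerm:
          labelSelector:
            matchLabels:
              sessions: game
          topologyKey: kubernetes.io/hostname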

Cordoning Nodes

Now that we have our game servers relatively well packed together in the cluster, we can discuss “cordoning nodes”. What does cordoning nodes really mean? Very simply, Kubernetes gives us the ability to tell the scheduler: “Hey scheduler, don't schedule anything new on this node here”. This ensures that no new Pods get scheduled on that node. In fact, in some places in the Kubernetes documentation, this is simply referred to as marking a node unschedulable.
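
As a rough illustration of what cordoning does, the sketch below models a node with an unschedulable flag (mirroring the real spec.unschedulable field on a Kubernetes Node) and a scheduler-side filter that skips flagged nodes. The node type and the cordon, uncordon and schedulable helpers are hypothetical stand-ins, not the Kubernetes client API.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a Kubernetes Node; unschedulable
// mirrors the spec.unschedulable field that cordoning sets.
type node struct {
	name          string
	unschedulable bool
	pods          []string
}

// cordon marks a node so that no new Pods are placed on it.
// Pods that are already running are left untouched.
func cordon(n *node) { n.unschedulable = true }

// uncordon makes the node available for scheduling again.
func uncordon(n *node) { n.unschedulable = false }

// schedulable returns the names of the nodes that may receive new Pods.
func schedulable(nodes []*node) []string {
	var names []string
	for _, n := range nodes {
		if !n.unschedulable {
			names = append(names, n.name)
		}
	}
	return names
}

func main() {
	a := &node{name: "node-a", pods: []string{"game-1"}}
	b := &node{name: "node-b", pods: []string{"game-2", "game-3"}}

	cordon(a)
	fmt.Println(schedulable([]*node{a, b})) // only node-b accepts new Pods
	fmt.Println(len(a.pods))                // game-1 keeps running on node-a
}
```

The important property for game servers is the second one: cordoning stops new sessions from landing on a node without touching the games already in progress.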

Cordoning nodes

In the code below, the s.bufferCount < available branch is where we request that nodes be cordoned: it runs when the CPU buffer we currently have is greater than the buffer we have configured. We’ve stripped some parts out for brevity, but you can see the original here.

// scaleNodes scales nodes up and down, depending on CPU constraints
// this includes adding nodes, cordoning them as well as deleting them
func (s Server) scaleNodes() error {
        nl, err := s.newNodeList()
        if err != nil {
                return err
        }

        available := nl.cpuRequestsAvailable()
        if available < s.bufferCount {
                finished, err := s.uncordonNodes(nl, s.bufferCount-available)
                // short circuit if uncordoning means we have enough buffer now
                if err != nil || finished {
                        return err
                }

                nl, err := s.newNodeList()
                if err != nil {
                        return err
                }
                // recalculate
                available = nl.cpuRequestsAvailable()
                err = s.increaseNodes(nl, s.bufferCount-available)
                if err != nil {
                        return err
                }

        } else if s.bufferCount < available {
                err := s.cordonNodes(nl, available-s.bufferCount)
                if err != nil {
                        return err
                }
        }

        return s.deleteCordonedNodes()
}

As you can also see from the code above, we uncordon any cordoned nodes in the cluster if we drop below the configured CPU buffer. This is faster than adding a whole new node, so it’s important to check for cordoned nodes before creating one from scratch. For the same reason, we also have a configured delay before a cordoned node is deleted (you can see the source here), which limits thrashing from creating and deleting nodes unnecessarily.
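
The "uncordon first, add second" ordering can be sketched as a small planning function. This is an illustration only: plan and its parameters are hypothetical names, and it assumes that every node, whether uncordoned or newly created, contributes the same number of free game server slots, which is a simplification.

```go
package main

import "fmt"

// plan covers a shortfall of game server slots by uncordoning nodes first
// and only adding new nodes for whatever is left over.
// Assumption: every node, cordoned or new, provides slotsPerNode free slots.
func plan(shortfall, slotsPerNode, cordoned int64) (uncordon, add int64) {
	if shortfall <= 0 || slotsPerNode <= 0 {
		return 0, 0
	}
	needed := (shortfall + slotsPerNode - 1) / slotsPerNode // round up
	uncordon = needed
	if uncordon > cordoned {
		uncordon = cordoned
	}
	return uncordon, needed - uncordon
}

func main() {
	// 25 slots short, 10 slots per node, 2 cordoned nodes waiting.
	u, a := plan(25, 10, 2)
	fmt.Printf("uncordon %d, add %d\n", u, a) // uncordon 2, add 1
}
```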

This is a pretty great start. However, when we cordon nodes, we want to cordon only those with the fewest game server Pods on them, since they are the most likely to empty first as game sessions come to an end.

Thanks to the Kubernetes API, it’s relatively straightforward to count the number of game server Pods on each Node, and sort them in ascending order.  From there we can do arithmetic to determine if we still remain above the desired CPU buffer if we cordon each of the available nodes. If so, we can safely cordon those nodes.

// cordonNodes decreases the number of available nodes by the given number of cpu blocks (but not over),
// by cordoning those nodes that have the least number of games currently on them
func (s Server) cordonNodes(nl *nodeList, gameNumber int64) error {
        // … removed some input validation ...

        // how many nodes (n) do we have to cordon such that we are cordoning no more
        // than the gameNumber
        capacity := nl.nodes.Items[0].Status.Capacity[v1.ResourceCPU] //assuming all nodes are the same
        cpuRequest := gameNumber * s.cpuRequest
        diff := int64(math.Floor(float64(cpuRequest) / float64(capacity.MilliValue())))

        if diff <= 0 {
                log.Print("[Info][CordonNodes] No nodes to be cordoned.")
                return nil
        }

        log.Printf("[Info][CordonNodes] Cordoning %v nodes", diff)

        // sort the nodes, such that the one with the least number of games are first
        nodes := nl.nodes.Items
        sort.Slice(nodes, func(i, j int) bool {
                return len(nl.nodePods(nodes[i]).Items) < len(nl.nodePods(nodes[j]).Items)
        })

        // grab the first n number of them
        cNodes := nodes[0:diff]

        // cordon them all
        for _, n := range cNodes {
                log.Printf("[Info][CordonNodes] Cordoning node: %v", n.Name)
                err := s.cordon(&n, true)
                if err != nil {
                        return err
                }
        }

        return nil
}
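
To see the arithmetic with concrete numbers, here is a small standalone sketch. It is not the project's code: the node type, nodesToCordon and emptiestFirst are hypothetical names, and the figures (100 millicores per game, 1000 millicores per node) are example values chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical summary of a node and its running games.
type node struct {
	name  string
	games int
}

// nodesToCordon returns how many whole nodes fit inside the surplus buffer.
// surplusGames is the number of game slots above the configured buffer;
// cpuPerGame and cpuPerNode are in millicores.
func nodesToCordon(surplusGames, cpuPerGame, cpuPerNode int64) int64 {
	if surplusGames <= 0 || cpuPerNode <= 0 {
		return 0
	}
	return (surplusGames * cpuPerGame) / cpuPerNode // integer division rounds down
}

// emptiestFirst returns a copy of nodes sorted by game count, ascending.
func emptiestFirst(nodes []node) []node {
	sorted := append([]node(nil), nodes...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].games < sorted[j].games
	})
	return sorted
}

func main() {
	// 25 surplus slots at 100m each is 2500m; with 1000m nodes that is 2 whole nodes.
	n := nodesToCordon(25, 100, 1000)
	nodes := emptiestFirst([]node{{"node-a", 7}, {"node-b", 0}, {"node-c", 3}})
	for _, c := range nodes[:n] {
		fmt.Println("cordon", c.name) // node-b, then node-c
	}
}
```

Rounding down matters here: cordoning a third node would remove 3000m of capacity and push the cluster below the configured buffer.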

Removing Nodes from the Cluster

Now that we have nodes in our cluster being cordoned, it is just a matter of waiting until a cordoned node is empty of game server Pods before deleting it. The code below also makes sure the node count never drops below a configured minimum, which gives us a nice baseline of capacity within our cluster.

You can see this in the code below, and in the original context:

// deleteCordonedNodes will delete a cordoned node if it has no game
// session pods and the time since it was cordoned has expired
func (s Server) deleteCordonedNodes() error {
        nl, err := s.newNodeList()
        if err != nil {
                return err
        }

        l := int64(len(nl.nodes.Items))
        if l <= s.minNodeNumber {
                log.Print("[Info][deleteCordonedNodes] Already at minimum node count. exiting")
                return nil
        }

        var dn []v1.Node
        for _, n := range nl.cordonedNodes() {
                ct, err := cordonTimestamp(n)
                if err != nil {
                        return err
                }

                pl := nl.nodePods(n)
                // if no game session pods && if they have passed expiry, then delete them
                if len(filterGameSessionPods(pl.Items)) == 0 && ct.Add(s.shutdown).Before(s.clock.Now()) {
                        err := s.cs.CoreV1().Nodes().Delete(n.Name, nil)
                        if err != nil {
                                return errors.Wrapf(err, "Error deleting cordoned node: %v", n.Name)
                        }
                        dn = append(dn, n)
                        // don't delete more nodes than the minimum number set
                        if l--; l <= s.minNodeNumber {
                                break
                        }
                }
        }

        return s.nodePool.DeleteNodes(dn)
}


We’ve successfully containerised our game servers, scaled them up as demand increases, and now scaled our Kubernetes cluster down, so we don’t have to pay for underutilised machinery -- all powered by the APIs and capabilities that Kubernetes makes available out of the box. While it would take more work to turn this into a production level system, you can already see how to take advantage of the many building blocks available to you.

Before we finish, I would like to apologise for the delay in producing the fourth part in this series. If you saw the announcement, you may have guessed that a lot of my time got taken up developing and releasing Agones, the open source, productised version of this series of posts on running game servers on Kubernetes.

On that note, this will also be the last installment in this series. I had already completed the work to implement scaling down before starting on Agones, and rather than build out new functionality for global cluster management on Paddle Soccer, I’m going to focus those efforts on building out awesome new features for Agones and bringing it up from its current 0.1 alpha release to a full 1.0, production-ready milestone.

I’m very excited about the future of Agones, and if my series of blog posts has inspired you, watch the GitHub repository, join the Slack, follow us on Twitter and get involved on the mailing list. We’re actively seeking more contributors, and would love to have you involved.

Lastly, I welcome questions and comments here, or you can reach out to me via Twitter. You can also see my presentations at GDC and GCAP from 2017 on this topic, as well as check out the code on GitHub.

