hashicorp / nomad-autoscaler

Nomad Autoscaler brings autoscaling to your Nomad workloads.

Autoscaler crashes with SIGSEGV if strategy is missing from check in policy

ryndaniels opened this issue

I've seen this happen with both 0.3.5 and 0.3.7. If I create a policy with a check that is missing a strategy block, the autoscaler crashes with a SIGSEGV:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xa1b168]

goroutine 19 [running]:
github.com/hashicorp/nomad-autoscaler/sdk.(*ScalingPolicy).Validate(0xc0000a7a20)
        /home/circleci/project/project/sdk/policy.go:74 +0x128
github.com/hashicorp/nomad-autoscaler/policy.(*Handler).handleTick(0xc000192fa0, {0x1a22e70, 0xc0004ebd40}, 0xc0004f4cb0)
        /home/circleci/project/project/policy/handler.go:205 +0x85
github.com/hashicorp/nomad-autoscaler/policy.(*Handler).Run(0xc000192fa0, {0x1a22e70, 0xc0004ebd40}, 0xc0005a1fd0)
        /home/circleci/project/project/policy/handler.go:156 +0x645
github.com/hashicorp/nomad-autoscaler/policy.(*Manager).Run.func1({0xc00003a150, 0x24})
        /home/circleci/project/project/policy/manager.go:116 +0x46
created by github.com/hashicorp/nomad-autoscaler/policy.(*Manager).Run
        /home/circleci/project/project/policy/manager.go:115 +0xed3

Example of a policy that will trigger this crash:

scaling "cluster_policy" {
  enabled = true
  min     = 1
  max     = 10
  policy {
    cooldown            = "2m"
    evaluation_interval = "2m"
    check "cpu_allocated_percentage" {
      source = "nomad-apm"
      query  = "avg_cpu"
    }
    target "aws-asg" {
      dry-run             = "false"
      aws_asg_name        = "test-asg"
      node_drain_deadline = "5m"
    }
  }
}

Adding a strategy block will fix the issue, so this policy does not crash:

scaling "cluster_policy" {
  enabled = true
  min     = 1
  max     = 10
  policy {
    cooldown            = "2m"
    evaluation_interval = "2m"
    check "cpu_allocated_percentage" {
      source = "nomad-apm"
      query  = "avg_cpu"
      strategy "target-value" {
        target = 10
      }
    }
    target "aws-asg" {
      dry-run             = "false"
      aws_asg_name        = "test-asg"
      node_drain_deadline = "5m"
    }
  }
}

If the required strategy block is missing from the config, I would expect to get some sort of error message in the logs indicating this, rather than a SIGSEGV crash.
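
To illustrate the behavior I'd expect, here is a minimal sketch of a validation step that reports a missing strategy as an error instead of dereferencing a nil pointer. The types below (Strategy, Check, ScalingPolicy) are simplified, hypothetical stand-ins written for this example; they are not the autoscaler's actual sdk definitions, and the real fix may look quite different.

```go
package main

import "fmt"

// Strategy is a hypothetical stand-in for a check's strategy block.
type Strategy struct {
	Name string
}

// Check is a hypothetical stand-in for a policy check.
// Strategy is nil when the strategy block is omitted from the policy.
type Check struct {
	Name     string
	Source   string
	Query    string
	Strategy *Strategy
}

// ScalingPolicy is a hypothetical, simplified policy holding only checks.
type ScalingPolicy struct {
	ID     string
	Checks []*Check
}

// Validate returns an error for the first malformed check it finds.
// Every pointer is tested for nil before any of its fields are read.
func (p *ScalingPolicy) Validate() error {
	if p == nil {
		return fmt.Errorf("policy is nil")
	}
	for _, c := range p.Checks {
		if c == nil {
			return fmt.Errorf("policy %q contains an empty check", p.ID)
		}
		if c.Strategy == nil {
			return fmt.Errorf("policy %q: check %q is missing a strategy block", p.ID, c.Name)
		}
		if c.Strategy.Name == "" {
			return fmt.Errorf("policy %q: check %q has a strategy with no name", p.ID, c.Name)
		}
	}
	return nil
}

func main() {
	// Mirrors the crashing policy above: a check with no strategy.
	policy := &ScalingPolicy{
		ID: "cluster_policy",
		Checks: []*Check{
			{Name: "cpu_allocated_percentage", Source: "nomad-apm", Query: "avg_cpu"},
		},
	}
	if err := policy.Validate(); err != nil {
		fmt.Println("invalid policy:", err)
	}
}
```

With a check like this, the handler could log the returned error and skip the policy rather than crash the whole process.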

Hi @ryndaniels and thanks for raising this issue with a great reproduction.

"If the required strategy block is missing from the config, I would expect to get some sort of error message in the logs indicating this, rather than a SIGSEGV crash."

Yep, I totally agree. This is something we will look into and fix.