neondatabase / autoscaling

Postgres vertical autoscaling in k8s

Bug: neonvm-controller (still) has log sampling enabled

sharnoff opened this issue

Environment

Prod (and local development, with a bit of effort)

Steps to reproduce

Modify neonvm-controller so that it emits a burst of log lines all at once. I used the following patch:

diff --git a/neonvm/controllers/vm_controller.go b/neonvm/controllers/vm_controller.go
index e9ebfcf..cfc81de 100644
--- a/neonvm/controllers/vm_controller.go
+++ b/neonvm/controllers/vm_controller.go
@@ -30,6 +30,7 @@ import (
 	"os"
 	"reflect"
 	"strconv"
+	"sync"
 	"time"
 
 	nadapiv1 "github.com/k8snetworkplumbingwg/network-attachment-definition-client/pkg/apis/k8s.cni.cncf.io/v1"
@@ -102,6 +103,8 @@ type VMReconciler struct {
 //+kubebuilder:rbac:groups=vm.neon.tech,resources=ippools/finalizers,verbs=update
 //+kubebuilder:rbac:groups=k8s.cni.cncf.io,resources=network-attachment-definitions,verbs=get;list;watch
 
+var logABunch = sync.Once{}
+
 // Reconcile is part of the main kubernetes reconciliation loop which aims to
 // move the current state of the cluster closer to the desired state.
 
@@ -117,6 +120,12 @@ type VMReconciler struct {
 func (r *VMReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
 	log := log.FromContext(ctx)
 
+	logABunch.Do(func() {
+		for i := 0; i < 300; i += 1 {
+			log.Info("Test log sampling", "i", i)
+		}
+	})
+
 	var vm vmv1.VirtualMachine
 	if err := r.Get(ctx, req.NamespacedName, &vm); err != nil {
 		// Error reading the object - requeue the request.

Expected result

There should be 300 log lines from the example above, with all values of i represented.

Actual result

We only get 102 log lines: the 100 initial entries plus (300-100)/100 = 2 sampled ones, exactly matching the sampling settings.

More specifically, we get the following sequence of log lines:

{"level":"info","ts":"2024-06-14T22:20:06.177Z","msg":"Test log sampling","controller":"virtualmachine","controllerGroup":"vm.neon.tech","controllerKind":"VirtualMachine","VirtualMachine":{"name":"postgres16-disk-test","namespace":"default"},"namespace":"default","name":"postgres16-disk-test","reconcileID":"5c8a2d5f-3e6a-4a98-9042-161cec15e23c","i":97}
{"level":"info","ts":"2024-06-14T22:20:06.177Z","msg":"Test log sampling","controller":"virtualmachine","controllerGroup":"vm.neon.tech","controllerKind":"VirtualMachine","VirtualMachine":{"name":"postgres16-disk-test","namespace":"default"},"namespace":"default","name":"postgres16-disk-test","reconcileID":"5c8a2d5f-3e6a-4a98-9042-161cec15e23c","i":98}
{"level":"info","ts":"2024-06-14T22:20:06.177Z","msg":"Test log sampling","controller":"virtualmachine","controllerGroup":"vm.neon.tech","controllerKind":"VirtualMachine","VirtualMachine":{"name":"postgres16-disk-test","namespace":"default"},"namespace":"default","name":"postgres16-disk-test","reconcileID":"5c8a2d5f-3e6a-4a98-9042-161cec15e23c","i":99}
{"level":"info","ts":"2024-06-14T22:20:06.177Z","msg":"Test log sampling","controller":"virtualmachine","controllerGroup":"vm.neon.tech","controllerKind":"VirtualMachine","VirtualMachine":{"name":"postgres16-disk-test","namespace":"default"},"namespace":"default","name":"postgres16-disk-test","reconcileID":"5c8a2d5f-3e6a-4a98-9042-161cec15e23c","i":199}
{"level":"info","ts":"2024-06-14T22:20:06.178Z","msg":"Test log sampling","controller":"virtualmachine","controllerGroup":"vm.neon.tech","controllerKind":"VirtualMachine","VirtualMachine":{"name":"postgres16-disk-test","namespace":"default"},"namespace":"default","name":"postgres16-disk-test","reconcileID":"5c8a2d5f-3e6a-4a98-9042-161cec15e23c","i":299}
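That cutoff pattern matches zap's sampling rule: for a given message, the first `Initial` entries pass through, and after that only every `Thereafter`-th entry is kept (zap's production defaults are 100/100, which lines up with what we see). A minimal stdlib sketch of just that counting logic, not zap's actual implementation:

```go
package main

import "fmt"

// sampler mimics zap's per-message sampling rule: the first `initial`
// entries with a given message are kept, then only every `thereafter`-th.
type sampler struct {
	initial, thereafter uint64
	counts              map[string]uint64
}

func (s *sampler) allow(msg string) bool {
	s.counts[msg]++
	n := s.counts[msg]
	if n <= s.initial {
		return true
	}
	// After the initial window, keep one entry per `thereafter` seen.
	return (n-s.initial)%s.thereafter == 0
}

func main() {
	s := &sampler{initial: 100, thereafter: 100, counts: map[string]uint64{}}
	var kept []int
	for i := 0; i < 300; i++ {
		if s.allow("Test log sampling") {
			kept = append(kept, i)
		}
	}
	fmt.Println(len(kept))                      // 102 entries survive
	fmt.Println(kept[98], kept[99], kept[100], kept[101]) // ends ... 98 99 199 299
}
```

Running this keeps i=0..99, then only i=199 and i=299, reproducing the observed sequence.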

Other logs, links

I thought this would be fixed by

Turns out that's not the case. See that PR for more detail.
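For reference, with plain uber-go/zap the sampler can be disabled by clearing the `Sampling` field of the production config before building the logger. This is only a sketch of the zap-level knob; neonvm-controller constructs its logger through controller-runtime, so the actual fix will likely need to go through that layer instead:

```go
package main

import "go.uber.org/zap"

func main() {
	cfg := zap.NewProductionConfig()
	// Production config defaults to sampling (Initial: 100, Thereafter: 100);
	// a nil Sampling disables zap's built-in sampling entirely.
	cfg.Sampling = nil

	logger, err := cfg.Build()
	if err != nil {
		panic(err)
	}
	defer logger.Sync()

	// With sampling off, all 300 lines below are emitted.
	for i := 0; i < 300; i++ {
		logger.Info("Test log sampling", zap.Int("i", i))
	}
}
```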