InvalidResourceReference - urlPathMaps referenced by requestRoutingRules was not found

Question

cwjoalder opened this issue 10 months ago · comments

Describe the bug
We are using AGIC as an AKS add-on, configured to run in a shared App Gateway setup between two clusters (DEV/TEST): we added the AzureIngressProhibitedTarget CRD and set APPGW_ENABLE_SHARED_APPGW: true in the config map. This had been working fine for the last ~2 years.

After a recent change to an ingress we noticed nothing got updated and found the following error in the logs of the controller on the DEV environment.

E0127 14:58:16.128358 1 worker.go:72] Error processing event.network.ApplicationGatewaysClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidResourceReference" Message="Resource /subscriptions/company-subscription/resourceGroups/company-rg-1/providers/Microsoft.Network/applicationGateways/nonprod-public-agw-01/urlPathMaps/url-9cd989d2ce94d8d5b665a2c10b238fba referenced by resource /subscriptions/company-subscription/resourceGroups/company-rg-1/providers/Microsoft.Network/applicationGateways/nonprod-public-agw-01/requestRoutingRules/rr-9cd989d2ce94d8d5b665a2c10b238fba was not found. Please make sure that the referenced resource exists, and that both resources are in the same region." Details=[]

We are using an AzureIngressProhibitedTarget based on hostname, like this:

apiVersion: appgw.ingress.k8s.io/v1
kind: AzureIngressProhibitedTarget
metadata:
  name: inthub-test-ingress-prohibited-target
spec:
  hostname: test.example.com

What is interesting about the error is that the referenced urlPathMap url-9cd989d2ce94d8d5b665a2c10b238fba belongs to the other environment (TEST) and thus should be ignored. It even seems to be ignored in some parts of the logic but not in others, because the path map definitely exists, as is evident when it is queried via the Azure management API.

This currently blocks us in rolling out any ingress changes in the affected environments. As a test I've manually changed the version in the deployment to 1.7.4, which seems to still have been able to apply the changes. However that is not a sustainable fix as the version is managed by Azure if AGIC is deployed as an add-on.

Current controller version is 1.7.6 as managed by Azure.
Redacted internal names and IDs.
After additional search it looks like #1671 has a very similar error and might be related.

To Reproduce
Steps to reproduce the behavior:

1. Create a shared App Gateway setup with a version <1.7.5 (see specifics above)
2. Upgrade to >=1.7.5 (observed with 1.7.6)
3. Wait for a resync or restart the ingress controller

Ingress Controller details

Output of kubectl describe pod <ingress controller> . The pod name can be obtained by running helm list.

Name:             ingress-appgw-deployment-786d5cb658-szp48
Namespace:        kube-system
Priority:         0
Service Account:  ingress-appgw-sa
Node:             aks-agentpool1-25902014-vmss000000/10.153.10.5
Start Time:       Mon, 27 Jan 2025 15:58:05 +0100
Labels:           app=ingress-appgw
                  kubernetes.azure.com/managedby=aks
                  pod-template-hash=786d5cb658
Annotations:      checksum/config: d6dd7d4cc4c7c004ac449609f0e28835d1e6fbcf805219999d0ca96a687b3394
                  cluster-autoscaler.kubernetes.io/safe-to-evict: true
                  kubernetes.azure.com/metrics-scrape: true
                  prometheus.io/path: /metrics
                  prometheus.io/port: 8123
                  prometheus.io/scrape: true
                  resource-id:
                    /subscriptions/company-subscription/resourceGroups/company-rg-1/providers/Microsoft.ContainerService/managedC...
Status:           Running
IP:               10.153.10.15
IPs:
  IP:           10.153.10.15
Controlled By:  ReplicaSet/ingress-appgw-deployment-786d5cb658
Containers:
  ingress-appgw-container:
    Container ID:   containerd://379387b828d58c17e2a9724be31e8239f24191ada7080630747d0e252a21adff
    Image:          mcr.microsoft.com/azure-application-gateway/kubernetes-ingress:1.7.6
    Image ID:       mcr.microsoft.com/azure-application-gateway/kubernetes-ingress@sha256:b1a4bc293ac673d29524f3340a3c76ba008b3ed60def578f57a7789b16f2ef0f
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Mon, 27 Jan 2025 15:58:06 +0100
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     700m
      memory:  600Mi
    Requests:
      cpu:      100m
      memory:   20Mi
    Liveness:   http-get http://:8123/health/alive delay=15s timeout=1s period=20s #success=1 #failure=3
    Readiness:  http-get http://:8123/health/ready delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      ingress-appgw-cm  ConfigMap  Optional: false
    Environment:
      KUBERNETES_SERVICE_HOST:        dev-xxx.blub.privatelink.switzerlandnorth.azmk8s.io
      KUBERNETES_PORT:                tcp://dev-xxx.blub.privatelink.switzerlandnorth.azmk8s.io:443
      KUBERNETES_PORT_443_TCP:        tcp://dev-xxx.blub.privatelink.switzerlandnorth.azmk8s.io:443
      KUBERNETES_PORT_443_TCP_ADDR:   dev-xxx.blub.privatelink.switzerlandnorth.azmk8s.io
      AZURE_CLOUD_PROVIDER_LOCATION:  /etc/kubernetes/azure.json
      AGIC_POD_NAME:                  ingress-appgw-deployment-786d5cb658-szp48 (v1:metadata.name)
      AGIC_POD_NAMESPACE:             kube-system (v1:metadata.namespace)
      AZURE_ENVIRONMENT:              AZUREPUBLICCLOUD
    Mounts:
      /etc/kubernetes/azure.json from cloud-provider-config (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2wvmn (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  cloud-provider-config:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/azure.json
    HostPathType:  File
  kube-api-access-2wvmn:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 CriticalAddonsOnly op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                     Age                From                       Message
  ----     ------                     ----               ----                       -------
  Normal   Scheduled                  32m                default-scheduler          Successfully assigned kube-system/ingress-appgw-deployment-786d5cb658-szp48 to aks-agentpool1-25902014-vmss000000
  Normal   Pulled                     32m                kubelet                    Container image "mcr.microsoft.com/azure-application-gateway/kubernetes-ingress:1.7.6" already present on machine
  Normal   Created                    32m                kubelet                    Created container ingress-appgw-container
  Normal   Started                    32m                kubelet                    Started container ingress-appgw-container
  Warning  FailedApplyingAppGwConfig  32m (x2 over 32m)  azure/application-gateway  network.ApplicationGatewaysClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidResourceReference" Message="Resource /subscriptions/company-subscription/resourceGroups/company-rg-1/providers/Microsoft.Network/applicationGateways/nonprod-public-agw-01/urlPathMaps/url-9cd989d2ce94d8d5b665a2c10b238fba referenced by resource /subscriptions/company-subscription/resourceGroups/company-rg-1/providers/Microsoft.Network/applicationGateways/nonprod-public-agw-01/requestRoutingRules/rr-9cd989d2ce94d8d5b665a2c10b238fba was not found. Please make sure that the referenced resource exists, and that both resources are in the same region." Details=[]

Output of `kubectl logs <ingress controller>`.
Any Azure support tickets associated with this issue.

cwjoalder · Answer 1 · Tue Jan 28 2025 20:46:48 GMT+0800 (China Standard Time)

Just confirmed that the issue starts appearing with version 1.7.5. Anything >1.7.4 produces the above issue in our setup; I also checked 1.7.7 manually.

With a (temporary) manual downgrade to 1.7.4, the controller is able to apply/process the configuration successfully.

I0128 12:43:45.990954       1 mutate_app_gateway.go:166] BEGIN AppGateway deployment
I0128 12:43:47.267850       1 client.go:220] OperationID='7eb2694d-d34b-4970-824c-d1d609667673'
I0128 12:43:47.267883       1 mutate_app_gateway.go:174] Applied generated Application Gateway configuration
I0128 12:43:47.267891       1 mutate_app_gateway.go:189] cache: Updated with latest applied config.
I0128 12:43:47.272154       1 mutate_app_gateway.go:193] END AppGateway deployment
I0128 12:43:47.272174       1 controller.go:152] Completed last event loop run in: 1.742905784s

...

I0128 12:43:48.550783       1 targets.go:45] [brownfield] Target {"Hostname":"test.examplecom"} is blacklisted
I0128 12:43:48.550786       1 routing_rules.go:39] [brownfield] Routing Rule rr-9cd989d2ce94d8d5b665a2c10b238fba is blacklisted
I0128 12:43:48.550792       1 routing_rules.go:95] [brownfield] Rules AGIC created: rr-3b1704f66692797cdff6e51bc6a35d31
I0128 12:43:48.550799       1 routing_rules.go:96] [brownfield] Existing Blacklisted Rules AGIC will retain: rr-9cd989d2ce94d8d5b665a2c10b238fba
I0128 12:43:48.550803       1 routing_rules.go:97] [brownfield] Existing Rules AGIC will remove: n/a
I0128 12:43:48.562949       1 mutate_app_gateway.go:153] cache: Config has NOT changed! No need to connect to ARM.
I0128 12:43:48.562969       1 controller.go:152] Completed last event loop run in: 290.060205ms

Chris Enck · Answer 2 · Fri Jan 31 2025 05:44:24 GMT+0800 (China Standard Time)

I encountered this same issue. Oddly enough, it was present in the South Central US Region but not in East US despite both running 1.7.6. Azure support recommended that we migrate to the helm based install since the shared app gateway feature isn't supported for the add-on.

https://learn.microsoft.com/en-us/azure/application-gateway/ingress-controller-overview#difference-between-helm-deployment-and-aks-add-on

cwjoalder · Answer 3 · Fri Jan 31 2025 16:07:19 GMT+0800 (China Standard Time)

Thanks for the addition. I assume I will be getting a similar recommendation from support (I created a ticket yesterday). However, from what I understand of the error, I don't think the deployment method will really have an impact on the issue. The only advantage would be that we could control which version is deployed and thus circumvent the issue by not upgrading.

Chris Enck · Answer 4 · Fri Jan 31 2025 19:14:07 GMT+0800 (China Standard Time)

You are correct. I migrated our test cluster to the helm install this morning with 1.7.6 and it's failing with the same error. Downgrading to 1.7.4 resolves the issue, same behavior as the add-on. They kept our ticket open so I'll report this finding to them.

halittiryaki · Answer 5 · Tue Feb 25 2025 01:31:49 GMT+0800 (China Standard Time)

I noticed that, in my case, this only happens for a multi-path ingress deployment in cluster-b.
With the next AGIC update in cluster-a, the reported "urlPathMaps not found" error suddenly shows up in cluster-a's AGIC, and endpoints mapped to cluster-a return a bad-gateway response. @cwjoalder @cenck09 is a multi-path ingress also involved in your use case?

cwjoalder · Answer 6 · Tue Feb 25 2025 15:34:06 GMT+0800 (China Standard Time)

@halittiryaki yes, in my case there are multiple paths on both gateways in use.

cwjoalder · Answer 7 · Tue Feb 25 2025 15:59:24 GMT+0800 (China Standard Time)

Discussion with Azure Support led me to a solution and also a workaround for our current setup.

The ideal solution is to switch to a Helm deployment and configure it with the appropriate values on deploy. At a minimum, this needs appgw.shared and appgw.subResourceNamePrefix.

This way all resources will be deployed with a prefix, which solves the issue in our environment.
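
For illustration, a minimal sketch of the relevant part of a Helm values file could look like the fragment below. Only the two keys appgw.shared and appgw.subResourceNamePrefix come from this thread. The prefix value "dev" is a hypothetical placeholder, and using a different prefix per cluster is an assumption based on the description above, not something Azure Support spelled out here. The other values a real deployment needs (gateway identification, authentication, and so on) are omitted.

```yaml
# Sketch only: fragment of a Helm values file for AGIC.
# Other required values (gateway identification, auth, ...) are omitted.
appgw:
  # Run AGIC in shared App Gateway mode.
  shared: true
  # Hypothetical prefix for the sub-resources this controller creates;
  # "dev" is a placeholder, assumed to differ per cluster.
  subResourceNamePrefix: dev
```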

If however Helm is not an option just yet, the behavior can also be replicated by setting the values APPGW_ENABLE_SHARED_APPGW and APPGW_CONFIG_NAME_PREFIX in the config map ingress-appgw-cm. Please note that this approach is not supported by Azure.
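
As a rough sketch of that workaround, the two entries would sit in the data section of ingress-appgw-cm as shown below. The namespace kube-system is taken from the pod description earlier in this issue. The prefix value "dev" is a hypothetical placeholder, and the keys the add-on already manages in this config map are left out, so this fragment is meant to be merged into the existing object rather than applied as-is.

```yaml
# Sketch only: the two keys to add to the existing config map.
# Keys already managed by the add-on are omitted here; merge these entries
# into the existing object instead of replacing it.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-appgw-cm
  namespace: kube-system
data:
  # ConfigMap values must be strings, hence the quotes.
  APPGW_ENABLE_SHARED_APPGW: "true"
  # Hypothetical prefix; "dev" is a placeholder, assumed to differ per cluster.
  APPGW_CONFIG_NAME_PREFIX: "dev"
```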

Hope this helps someone out there. I will close this, as it provides a solution as well as a workaround.