InvalidResourceReference - urlPathMaps referenced by requestRoutingRules was not found
cwjoalder opened this issue · comments
Describe the bug
We are using AGIC as an AKS addon but have it enabled to run in a shared App Gateway setup (added CRD AzureIngressProhibitedTarget and APPGW_ENABLE_SHARED_APPGW: true to the config map) between two clusters (DEV/TEST) which was working fine for the last ~2 years.
After a recent change to an ingress we noticed nothing got updated and found the following error in the logs of the controller on the DEV environment.
E0127 14:58:16.128358 1 worker.go:72] Error processing event.network.ApplicationGatewaysClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidResourceReference" Message="Resource /subscriptions/company-subscription/resourceGroups/company-rg-1/providers/Microsoft.Network/applicationGateways/nonprod-public-agw-01/urlPathMaps/url-9cd989d2ce94d8d5b665a2c10b238fba referenced by resource /subscriptions/company-subscription/resourceGroups/company-rg-1/providers/Microsoft.Network/applicationGateways/nonprod-public-agw-01/requestRoutingRules/rr-9cd989d2ce94d8d5b665a2c10b238fba was not found. Please make sure that the referenced resource exists, and that both resources are in the same region." Details=[]
We are using an AzureIngressProhibitedTarget based on hostname, like this:
apiVersion: appgw.ingress.k8s.io/v1
kind: AzureIngressProhibitedTarget
metadata:
  name: inthub-test-ingress-prohibited-target
spec:
  hostname: test.example.com
What is interesting about the error is that the referenced urlPathMap url-9cd989d2ce94d8d5b665a2c10b238fba belongs to the other environment (TEST) and thus should be ignored. It seems to be ignored in some parts of the logic but not in others, because the path map definitely exists, as is evident when querying the Azure management API.
This currently blocks us in rolling out any ingress changes in the affected environments. As a test I've manually changed the version in the deployment to 1.7.4, which seems to still have been able to apply the changes. However that is not a sustainable fix as the version is managed by Azure if AGIC is deployed as an add-on.
Current controller version is 1.7.6 as managed by Azure.
Redacted internal names and IDs.
After additional search it looks like #1671 has a very similar error and might be related.
To Reproduce
Steps to reproduce the behavior:
- Create shared App Gateway setup with version <1.7.5 (see specifics above)
- Upgrade to 1.7.5 or later
- Wait for resync or restart ingress controller
Ingress Controller details
- Output of `kubectl describe pod <ingress controller>`. The pod name can be obtained by running `helm list`.
Name: ingress-appgw-deployment-786d5cb658-szp48
Namespace: kube-system
Priority: 0
Service Account: ingress-appgw-sa
Node: aks-agentpool1-25902014-vmss000000/10.153.10.5
Start Time: Mon, 27 Jan 2025 15:58:05 +0100
Labels: app=ingress-appgw
kubernetes.azure.com/managedby=aks
pod-template-hash=786d5cb658
Annotations: checksum/config: d6dd7d4cc4c7c004ac449609f0e28835d1e6fbcf805219999d0ca96a687b3394
cluster-autoscaler.kubernetes.io/safe-to-evict: true
kubernetes.azure.com/metrics-scrape: true
prometheus.io/path: /metrics
prometheus.io/port: 8123
prometheus.io/scrape: true
resource-id: /subscriptions/company-subscription/resourceGroups/company-rg-1/providers/Microsoft.ContainerService/managedC...
Status: Running
IP: 10.153.10.15
IPs:
IP: 10.153.10.15
Controlled By: ReplicaSet/ingress-appgw-deployment-786d5cb658
Containers:
ingress-appgw-container:
Container ID: containerd://379387b828d58c17e2a9724be31e8239f24191ada7080630747d0e252a21adff
Image: mcr.microsoft.com/azure-application-gateway/kubernetes-ingress:1.7.6
Image ID: mcr.microsoft.com/azure-application-gateway/kubernetes-ingress@sha256:b1a4bc293ac673d29524f3340a3c76ba008b3ed60def578f57a7789b16f2ef0f
Port: <none>
Host Port: <none>
State: Running
Started: Mon, 27 Jan 2025 15:58:06 +0100
Ready: True
Restart Count: 0
Limits:
cpu: 700m
memory: 600Mi
Requests:
cpu: 100m
memory: 20Mi
Liveness: http-get http://:8123/health/alive delay=15s timeout=1s period=20s #success=1 #failure=3
Readiness: http-get http://:8123/health/ready delay=5s timeout=1s period=10s #success=1 #failure=3
Environment Variables from:
ingress-appgw-cm ConfigMap Optional: false
Environment:
KUBERNETES_SERVICE_HOST: dev-xxx.blub.privatelink.switzerlandnorth.azmk8s.io
KUBERNETES_PORT: tcp://dev-xxx.blub.privatelink.switzerlandnorth.azmk8s.io:443
KUBERNETES_PORT_443_TCP: tcp://dev-xxx.blub.privatelink.switzerlandnorth.azmk8s.io:443
KUBERNETES_PORT_443_TCP_ADDR: dev-xxx.blub.privatelink.switzerlandnorth.azmk8s.io
AZURE_CLOUD_PROVIDER_LOCATION: /etc/kubernetes/azure.json
AGIC_POD_NAME: ingress-appgw-deployment-786d5cb658-szp48 (v1:metadata.name)
AGIC_POD_NAMESPACE: kube-system (v1:metadata.namespace)
AZURE_ENVIRONMENT: AZUREPUBLICCLOUD
Mounts:
/etc/kubernetes/azure.json from cloud-provider-config (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2wvmn (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
cloud-provider-config:
Type: HostPath (bare host directory volume)
Path: /etc/kubernetes/azure.json
HostPathType: File
kube-api-access-2wvmn:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 32m default-scheduler Successfully assigned kube-system/ingress-appgw-deployment-786d5cb658-szp48 to aks-agentpool1-25902014-vmss000000
Normal Pulled 32m kubelet Container image "mcr.microsoft.com/azure-application-gateway/kubernetes-ingress:1.7.6" already present on machine
Normal Created 32m kubelet Created container ingress-appgw-container
Normal Started 32m kubelet Started container ingress-appgw-container
Warning FailedApplyingAppGwConfig 32m (x2 over 32m) azure/application-gateway network.ApplicationGatewaysClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidResourceReference" Message="Resource /subscriptions/company-subscription/resourceGroups/company-rg-1/providers/Microsoft.Network/applicationGateways/nonprod-public-agw-01/urlPathMaps/url-9cd989d2ce94d8d5b665a2c10b238fba referenced by resource /subscriptions/company-subscription/resourceGroups/company-rg-1/providers/Microsoft.Network/applicationGateways/nonprod-public-agw-01/requestRoutingRules/rr-9cd989d2ce94d8d5b665a2c10b238fba was not found. Please make sure that the referenced resource exists, and that both resources are in the same region." Details=[]
- Output of `kubectl logs <ingress controller>`.
- Any Azure support tickets associated with this issue.
Just confirmed that the issue starts appearing with version 1.7.5. Anything >1.7.4 produces the above issue in our setup; I also checked 1.7.7 manually.
With a (temporary) manual downgrade to 1.7.4, the controller is able to apply/process the configuration successfully.
I0128 12:43:45.990954 1 mutate_app_gateway.go:166] BEGIN AppGateway deployment
I0128 12:43:47.267850 1 client.go:220] OperationID='7eb2694d-d34b-4970-824c-d1d609667673'
I0128 12:43:47.267883 1 mutate_app_gateway.go:174] Applied generated Application Gateway configuration
I0128 12:43:47.267891 1 mutate_app_gateway.go:189] cache: Updated with latest applied config.
I0128 12:43:47.272154 1 mutate_app_gateway.go:193] END AppGateway deployment
I0128 12:43:47.272174 1 controller.go:152] Completed last event loop run in: 1.742905784s
...
I0128 12:43:48.550783 1 targets.go:45] [brownfield] Target {"Hostname":"test.example.com"} is blacklisted
I0128 12:43:48.550786 1 routing_rules.go:39] [brownfield] Routing Rule rr-9cd989d2ce94d8d5b665a2c10b238fba is blacklisted
I0128 12:43:48.550792 1 routing_rules.go:95] [brownfield] Rules AGIC created: rr-3b1704f66692797cdff6e51bc6a35d31
I0128 12:43:48.550799 1 routing_rules.go:96] [brownfield] Existing Blacklisted Rules AGIC will retain: rr-9cd989d2ce94d8d5b665a2c10b238fba
I0128 12:43:48.550803 1 routing_rules.go:97] [brownfield] Existing Rules AGIC will remove: n/a
I0128 12:43:48.562949 1 mutate_app_gateway.go:153] cache: Config has NOT changed! No need to connect to ARM.
I0128 12:43:48.562969 1 controller.go:152] Completed last event loop run in: 290.060205ms
I encountered this same issue. Oddly enough, it was present in the South Central US Region but not in East US despite both running 1.7.6. Azure support recommended that we migrate to the helm based install since the shared app gateway feature isn't supported for the add-on.
Thanks for the addition. I assume I will be getting a similar recommendation from support (I created a ticket yesterday). However, from what I understand of the error, I don't think the deployment method will really have an impact on the issue. The only advantage would be that we could control which version is deployed and thus circumvent the issue by not upgrading.
You are correct. I migrated our test cluster to the helm install this morning with 1.7.6 and it's failing with the same error. Downgrading to 1.7.4 resolves the issue, same behavior as the add-on. They kept our ticket open so I'll report this finding to them.
In my case, I noticed that this only happens for a multi-path ingress deployment in cluster-b.
With the next AGIC update in cluster-a, the reported "urlPathMaps not found" error suddenly shows up in cluster-a's AGIC, and endpoints mapped to cluster-a return a bad-gateway response. @cwjoalder @cenck09 is a multi-path ingress also involved in your use case?
@halittiryaki yes, in my case there are multiple paths on both gateways in use.
Discussion with Azure Support led me to a solution and also a workaround for our current setup.
The ideal solution is to go for a Helm deployment and configure it with the appropriate values on deploy. At a minimum, this needs appgw.shared and appgw.subResourceNamePrefix.
This way all resources will be deployed with a prefix, which solves the issue in our environment.
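As a rough sketch of the Helm values mentioned above (not a complete configuration): only appgw.shared and appgw.subResourceNamePrefix come from this thread. The prefix value `dev-` is a made-up placeholder, and the other values a Helm install needs (gateway identification, authentication) are omitted here.

```yaml
# Fragment of a Helm values file for AGIC (sketch only).
# Other required values (gateway identification, auth) are intentionally left out.
appgw:
  # Run AGIC in shared App Gateway mode so it respects AzureIngressProhibitedTarget.
  shared: true
  # Prefix for the sub-resources AGIC creates on the gateway.
  # "dev-" is a hypothetical value; pick a distinct prefix per cluster.
  subResourceNamePrefix: dev-
```

Presumably each cluster sharing the gateway would get its own prefix, so that the generated rule and path map names of DEV and TEST no longer collide.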
If, however, Helm is not an option just yet, the behavior can also be replicated by setting the values APPGW_ENABLE_SHARED_APPGW and APPGW_CONFIG_NAME_PREFIX in the config map ingress-appgw-cm. Please note that this approach is not supported by Azure.
Hope this helps some of you out there. I will close this, as it provides a solution as well as a workaround.
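For illustration, the relevant part of the config map might look roughly like the fragment below. The two keys are the ones named above and the namespace is taken from the pod description earlier in this issue. The prefix value `dev-` is a hypothetical placeholder, and any other keys already present in the add-on's config map are not shown and should be left as they are.

```yaml
# Sketch of the two relevant keys in the add-on's config map.
# Existing keys managed by the add-on are omitted here and should not be removed.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-appgw-cm
  namespace: kube-system
data:
  # ConfigMap values are strings, so the boolean is quoted.
  APPGW_ENABLE_SHARED_APPGW: "true"
  # "dev-" is a hypothetical value; use a distinct prefix per cluster.
  APPGW_CONFIG_NAME_PREFIX: "dev-"
```

Since the add-on is managed by Azure, manual edits like this may be reverted, which fits the caveat that this route is unsupported.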
