Changing awsName attribute in VirtualService requires Virtual Node backend to be recreated to work properly
aldredb opened this issue · comments
Describe the bug
Changing the awsName attribute in a VirtualService requires the Virtual Node backend to be recreated to work properly.
Steps to reproduce
I deployed the DJ App (https://github.com/aws/aws-app-mesh-examples/tree/main/examples/apps/djapp). I changed the awsName attribute from jazz.prod.svc.cluster.local to jazz2.prod.svc.cluster.local and changed the Service name to jazz2 so that the awsName can be resolved from within the cluster.
---
apiVersion: v1
kind: Service
metadata:
  name: jazz2 #<-- Original value = jazz
  namespace: prod
  labels:
    app: jazz
spec:
  ports:
  - port: 9080
    name: http
---
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualService
metadata:
  name: jazz
  namespace: prod
spec:
  awsName: jazz2.prod.svc.cluster.local #<-- Original value = jazz.prod.svc.cluster.local
  provider:
    virtualRouter:
      virtualRouterRef:
        name: jazz-router
Running curl against jazz2.prod.svc.cluster.local from the DJ pod results in a 404:
root@dj-59dcd54bb9-t6nw8:/usr/src/app# curl -v jazz2.prod.svc.cluster.local:9080
* Rebuilt URL to: jazz2.prod.svc.cluster.local:9080/
* Trying 172.20.78.225...
* TCP_NODELAY set
* Connected to jazz2.prod.svc.cluster.local (172.20.78.225) port 9080 (#0)
> GET / HTTP/1.1
> Host: jazz2.prod.svc.cluster.local:9080
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 404 Not Found
< date: Sun, 28 Nov 2021 12:11:50 GMT
< server: envoy
< content-length: 0
<
* Curl_http_done: called premature == 0
* Connection #0 to host jazz2.prod.svc.cluster.local left intact
To make it work, I need to comment out the jazz virtualServiceRef, apply, then uncomment it and apply again so that the backend is recreated:
---
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualNode
metadata:
  name: dj
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: dj
  ...
  backends:
    # - virtualService: # <-- comment, apply, uncomment and apply jazz virtualServiceRef
    #     virtualServiceRef:
    #       name: jazz
    - virtualService:
        virtualServiceRef:
          name: metal
  serviceDiscovery:
    dns:
      hostname: dj.prod.svc.cluster.local
After that, I am able to curl jazz2.prod.svc.cluster.local successfully:
root@dj-59dcd54bb9-t6nw8:/usr/src/app# curl -v jazz2.prod.svc.cluster.local:9080
* Rebuilt URL to: jazz2.prod.svc.cluster.local:9080/
* Trying 172.20.78.225...
* TCP_NODELAY set
* Connected to jazz2.prod.svc.cluster.local (172.20.78.225) port 9080 (#0)
> GET / HTTP/1.1
> Host: jazz2.prod.svc.cluster.local:9080
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 200 OK
< x-powered-by: Express
< content-type: text/html; charset=utf-8
< content-length: 33
< etag: W/"21-s+1WcpvM4djE33a7u06ogiSSvZg"
< date: Sun, 28 Nov 2021 12:15:57 GMT
< x-envoy-upstream-service-time: 1
< server: envoy
<
* Curl_http_done: called premature == 0
* Connection #0 to host jazz2.prod.svc.cluster.local left intact
["Astrud Gilberto","Miles Davis"]
Expected outcome
Changing the awsName attribute in a VirtualService should not require additional steps for the functionality to work properly.
Environment
- App Mesh controller version - v1.4.1
- Envoy version - v1.19.1.0-prod
- Are you using any integrations? X-ray, Jaeger etc. If so versions? No
- Kubernetes version - v1.21
- Using EKS (yes/no), if so version? Yes - v1.21
Additional Context:
Could you try with envoy version v1.20.0.1-prod? We will be releasing a controller with this envoy version soon. I tried this walkthrough and changed the color virtual service. It didn't require me to recreate the virtual router or the virtual node backends.
Assigning to @shaileshgupta2k to verify the generated envoy configs if the issue persists
@cgchinmay issue still persists when using envoy version v1.20.0.1-prod
@aldredb Okay, I will take a look and get back to you. Will try with the same walkthrough that you tried. Meanwhile you can also verify the config dump for the envoy and see if the backing service didn't get an update.
Hi, I was able to repro the issue. When you simply change the awsName, the envoy config for the dj app is not updated with the new jazz VirtualService spec, which is why you see this issue. However, if you delete and recreate the jazz VirtualService without changing anything, you won't see this issue.
So it appears to me that there is a stale virtual service reference which gets latched onto the VirtualService name. This requires more investigation. Until we find the root cause, you can keep the VirtualService metadata name in sync with the awsName prefix and also update the VirtualServiceRef for the dj VirtualNode so that the dj app's envoy config is always updated.
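A minimal sketch of that workaround, using the manifests from the report. It assumes the VirtualService is recreated under the metadata name jazz2 (a name chosen here to match the awsName prefix) and that all fields not shown stay unchanged:

```yaml
---
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualService
metadata:
  name: jazz2            # assumed new name, kept in sync with the awsName prefix
  namespace: prod
spec:
  awsName: jazz2.prod.svc.cluster.local
  provider:
    virtualRouter:
      virtualRouterRef:
        name: jazz-router
---
# Fragment of the dj VirtualNode; only the backends list is shown,
# the rest of the spec is as in the original report.
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualNode
metadata:
  name: dj
  namespace: prod
spec:
  backends:
    - virtualService:
        virtualServiceRef:
          name: jazz2    # points at the renamed VirtualService
    - virtualService:
        virtualServiceRef:
          name: metal
```

Because the backend reference changes together with the VirtualService name, the dj VirtualNode spec itself is updated, which is the step that seemed to refresh the envoy config in the report.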
We shouldn't allow the awsName to be changed after Create. @cgchinmay is there a way to use a CEL expression to prevent this? If you want to change awsName, you need to create a new CRD object.
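One way this could look is a Kubernetes CRD validation rule on the awsName field. The fragment below is a hypothetical excerpt of the VirtualService CRD schema, not the controller's actual manifest, and it assumes a cluster that supports CRD validation rules (beta from Kubernetes 1.25, so newer than the v1.21 cluster in this report):

```yaml
# Hypothetical excerpt from the VirtualService CRD's openAPIV3Schema.
spec:
  type: object
  properties:
    awsName:
      type: string
      x-kubernetes-validations:
        # Transition rule: oldSelf makes this run only on updates
        # where both the old and the new object set awsName.
        - rule: "self == oldSelf"
          message: "awsName is immutable; create a new VirtualService to use a different name"
```

A field-level rule like this would not cover adding or removing awsName on an existing object; that case would need an additional rule on the parent spec.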