aws / aws-app-mesh-controller-for-k8s

A controller to help manage App Mesh resources for a Kubernetes cluster.

Changing awsName attribute in VirtualService requires Virtual Node backend to be recreated to work properly

aldredb opened this issue

Describe the bug
Changing awsName attribute in VirtualService requires Virtual Node backend to be recreated to work properly

Steps to reproduce

I deployed the DJ App (https://github.com/aws/aws-app-mesh-examples/tree/main/examples/apps/djapp). I then changed the awsName attribute of the jazz VirtualService from jazz.prod.svc.cluster.local to jazz2.prod.svc.cluster.local, and renamed the Kubernetes Service to jazz2 so that the new awsName can be resolved from within the cluster.

---
apiVersion: v1
kind: Service
metadata:
  name: jazz2 #<-- Original value = jazz
  namespace: prod
  labels:
    app: jazz
spec:
  ports:
  - port: 9080
    name: http
---
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualService
metadata:
  name: jazz
  namespace: prod
spec:
  awsName: jazz2.prod.svc.cluster.local #<-- Original value = jazz.prod.svc.cluster.local 
  provider:
    virtualRouter:
      virtualRouterRef:
        name: jazz-router

Running curl against jazz2.prod.svc.cluster.local from the DJ pod returns a 404:

root@dj-59dcd54bb9-t6nw8:/usr/src/app# curl -v jazz2.prod.svc.cluster.local:9080
* Rebuilt URL to: jazz2.prod.svc.cluster.local:9080/
*   Trying 172.20.78.225...
* TCP_NODELAY set
* Connected to jazz2.prod.svc.cluster.local (172.20.78.225) port 9080 (#0)
> GET / HTTP/1.1
> Host: jazz2.prod.svc.cluster.local:9080
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 404 Not Found
< date: Sun, 28 Nov 2021 12:11:50 GMT
< server: envoy
< content-length: 0
<
* Curl_http_done: called premature == 0
* Connection #0 to host jazz2.prod.svc.cluster.local left intact

To make it work, I need to comment out the jazz virtualServiceRef in the dj VirtualNode, apply the manifest, then uncomment it and apply again so that the backend is recreated:

---
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualNode
metadata:
  name: dj
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: dj
...
  backends:
#    - virtualService: # <-- comment out the jazz virtualServiceRef, apply, then uncomment and apply again
#        virtualServiceRef:
#          name: jazz
    - virtualService:
        virtualServiceRef:
          name: metal
  serviceDiscovery:
    dns:
      hostname: dj.prod.svc.cluster.local

After that, curl to jazz2.prod.svc.cluster.local succeeds:

root@dj-59dcd54bb9-t6nw8:/usr/src/app# curl -v jazz2.prod.svc.cluster.local:9080
* Rebuilt URL to: jazz2.prod.svc.cluster.local:9080/
*   Trying 172.20.78.225...
* TCP_NODELAY set
* Connected to jazz2.prod.svc.cluster.local (172.20.78.225) port 9080 (#0)
> GET / HTTP/1.1
> Host: jazz2.prod.svc.cluster.local:9080
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 200 OK
< x-powered-by: Express
< content-type: text/html; charset=utf-8
< content-length: 33
< etag: W/"21-s+1WcpvM4djE33a7u06ogiSSvZg"
< date: Sun, 28 Nov 2021 12:15:57 GMT
< x-envoy-upstream-service-time: 1
< server: envoy
<
* Curl_http_done: called premature == 0
* Connection #0 to host jazz2.prod.svc.cluster.local left intact
["Astrud Gilberto","Miles Davis"]

Expected outcome
Changing the awsName attribute in a VirtualService should not require any additional steps for routing to keep working.

Environment

  • App Mesh controller version - v1.4.1
  • Envoy version - v1.19.1.0-prod
  • Are you using any integrations? X-ray, Jaeger etc. If so versions? No
  • Kubernetes version - v1.21
  • Using EKS (yes/no), if so version? Yes - v1.21

Additional Context:

Could you try with Envoy version v1.20.0.1-prod? We will soon be releasing a controller version that uses this Envoy version. I tried this walkthrough and changed the color virtual service; it didn't require me to recreate the virtual router or the virtual node backends.

Assigning to @shaileshgupta2k to verify the generated Envoy configs if the issue persists.

@cgchinmay The issue still persists when using Envoy version v1.20.0.1-prod.

@aldredb Okay, I will take a look and get back to you. I will try the same walkthrough that you used. Meanwhile, you can check the Envoy config dump for the dj pod to confirm whether the backing service entry was left without an update.

Hi, I was able to reproduce the issue. When you only change the awsName, the Envoy config for the dj app is not updated with the new jazz VirtualService spec, which is why you see the 404. However, if you delete and recreate the jazz VirtualService without changing anything else, the issue does not occur.
So it appears that a stale virtual service reference stays attached to the VirtualService name. This requires more investigation. Until we find the root cause, you can work around it by keeping the VirtualService metadata name in sync with the awsName prefix and by updating the virtualServiceRef in the dj VirtualNode accordingly, so that the dj app's Envoy config is always updated.
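As a rough illustration of that workaround, the manifests from the report could be adjusted as sketched below. This is only a sketch under assumptions: the name jazz2 for the VirtualService object is chosen here to match the awsName prefix, the VirtualNode fragment shows only the backends list (all other fields stay as they are), and the old jazz VirtualService object would presumably need to be deleted separately since a renamed manifest creates a new object.

```yaml
---
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualService
metadata:
  name: jazz2                 # assumed name, kept in sync with the awsName prefix
  namespace: prod
spec:
  awsName: jazz2.prod.svc.cluster.local
  provider:
    virtualRouter:
      virtualRouterRef:
        name: jazz-router
---
# Fragment of the dj VirtualNode: only the backends list is shown,
# the remaining fields are unchanged from the original manifest.
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualNode
metadata:
  name: dj
  namespace: prod
spec:
  backends:
    - virtualService:
        virtualServiceRef:
          name: jazz2         # updated to reference the renamed VirtualService
    - virtualService:
        virtualServiceRef:
          name: metal
```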

We shouldn't allow awsName to be changed after creation. @cgchinmay, is there a way to use a CEL expression to prevent this? If you want to change awsName, you should create a new custom resource object instead.
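For reference, Kubernetes CRD validation rules support this kind of immutability check through a transition rule that compares self with oldSelf. The fragment below is a hypothetical sketch of how such a rule might look in the VirtualService CRD schema; it is not taken from the controller's actual CRD, the message text is made up, and the feature needs a Kubernetes version newer than the v1.21 reported above.

```yaml
# Hypothetical excerpt of a CustomResourceDefinition version entry
# (spec.versions[].schema). Only the awsName property is shown.
schema:
  openAPIV3Schema:
    type: object
    properties:
      spec:
        type: object
        properties:
          awsName:
            type: string
            x-kubernetes-validations:
              # Transition rule: evaluated on update when both the old
              # and the new object contain this field.
              - rule: "self == oldSelf"
                message: "awsName is immutable; create a new VirtualService to change it"
```

Because a transition rule on an optional field is skipped when the field is added or removed, a rule placed on the parent spec object would likely be needed to also block those cases.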