Defining a failover policy in an Upstream pointing to an AWS Lambda function causes the data plane to go out of sync

Question

sadieleob opened this issue 2 months ago

Sadiel Ortega commented 2 months ago

Gloo Edge Product

Enterprise

Gloo Edge Version

1.16.8

Kubernetes Version

v1.27.13-eks-3af4770

Describe the bug

Defining a failover policy on an Upstream that uses an AWS Lambda function produces a configuration that Envoy rejects:

glooctl check
Checking deployments... OK
Checking pods... OK
Checking upstreams... OK
Checking upstream groups... OK
Checking auth configs... OK
Checking rate limit configs... OK
Checking VirtualHostOptions... OK
Checking RouteOptions... OK
Checking secrets... OK
Checking virtual services... OK
Checking gateways... OK
Checking proxies... 1 Errors!
Gloo has detected that the data plane is out of sync. The following types of resources have not been accepted: [{resource="type.googleapis.com/envoy.config.cluster.v3.Cluster"} {resource="type.googleapis.com/envoy.config.route.v3.RouteConfiguration"}]. Gloo will not be able to process any other configuration updates until these errors are resolved.
Problem while checking for gloo xds errors
Error: 2 errors occurred:
	* An update to your gateway-proxy deployment was rejected due to schema/validation errors. The envoy_cluster_manager_cds_update_rejected{} metric increased.
You may want to try using the `glooctl proxy logs` or `glooctl debug logs` commands.

	* Problem while checking for gloo xds errors

gateway-proxy-2024-06-03-1717435914.log

Expected Behavior

Gloo Edge should not generate an Upstream with a failover policy for AWS Lambda.

Steps to reproduce the bug

CRs to reproduce

kubectl apply -f - <<EOF
apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: aws-route
  namespace: gloo-system
spec:
  virtualHost:
    domains:
    - '<DOMAIN>'
    routes:
    - matchers:
      - exact: /
      routeAction:
        single:
          destinationSpec:
            aws:
              logicalName: echo
              invocationStyle: SYNC
              unwrapAsApiGateway: true
              wrapAsApiGateway: true
          upstream:
            name: aws-upstream
            namespace: aws-upstream
---
apiVersion: gloo.solo.io/v1
kind: Upstream
metadata:
  name: aws-upstream
  namespace: aws-upstream
spec:
  failover:
    prioritizedLocalities:
    - localityEndpoints:
      - lbEndpoints:
        - address: ab7vke2semfjsxp4ui7tvmd2r40xgost.lambda-url.us-west-2.on.aws
          port: 80
        locality:
          region: us
          zone: us-west-2
  aws:
    lambdaFunctions:
    - logicalName: echo
    region: us-west-2
    secretRef:
      name: aws-creds
      namespace: gloo-system
EOF

Additional Environment Detail

No response

Additional Context

No response

soloio-bot · Answer 1 · Tue Jun 04 2024 02:28:10 GMT+0800 (China Standard Time)

Zendesk ticket #3850 has been linked to this issue.

Eitan Yarmush · Answer 2 · Tue Jun 04 2024 21:58:39 GMT+0800 (China Standard Time)

The API isn't really intended to do this; as far as I know, failover is meant to work only with static Upstreams. I think we should probably pre-validate configurations like this. We should also consider revisiting this behavior by grouping Upstreams rather than inlining priorities like this, potentially via the aggregate cluster.
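
To make the pre-validation idea concrete, here is a minimal sketch of such a check in Python. It is not Gloo Edge's actual validation code: the function name `validate_failover`, the list of type keys, and the error wording are all hypothetical. It also takes the comment above at its word that only static Upstreams support failover. It works on an Upstream manifest already parsed into a dict, and flags any `failover` block that sits on a non-static Upstream.

```python
from typing import Any, Dict, List

# Assumption: a partial, illustrative list of Upstream spec keys that select
# the upstream type. This is not an exhaustive list from the Gloo Edge API.
UPSTREAM_TYPE_KEYS = ("static", "kube", "aws", "consul", "azure")

# Assumption (from the comment above): failover only works with static upstreams.
FAILOVER_SUPPORTED_TYPES = frozenset({"static"})


def validate_failover(upstream: Dict[str, Any]) -> List[str]:
    """Return validation errors for an Upstream manifest given as a dict.

    Hypothetical helper: reports an error for every upstream type present in
    the spec that is not in FAILOVER_SUPPORTED_TYPES when `failover` is set.
    """
    spec = upstream.get("spec", {})
    if "failover" not in spec:
        return []  # nothing to check without a failover policy

    meta = upstream.get("metadata", {})
    ref = "{}.{}".format(meta.get("namespace", "?"), meta.get("name", "?"))

    errors = []
    for type_key in UPSTREAM_TYPE_KEYS:
        if type_key in spec and type_key not in FAILOVER_SUPPORTED_TYPES:
            errors.append(
                "upstream {}: failover is not supported for '{}' upstreams".format(
                    ref, type_key
                )
            )
    return errors


# The Upstream from the reproduction steps, reduced to the relevant fields.
aws_upstream = {
    "metadata": {"name": "aws-upstream", "namespace": "aws-upstream"},
    "spec": {
        "failover": {"prioritizedLocalities": []},
        "aws": {"region": "us-west-2", "lambdaFunctions": [{"logicalName": "echo"}]},
    },
}

# A static upstream with failover (placeholder host) passes the check.
static_upstream = {
    "metadata": {"name": "static-upstream", "namespace": "gloo-system"},
    "spec": {
        "failover": {"prioritizedLocalities": []},
        "static": {"hosts": [{"addr": "primary.example.com", "port": 80}]},
    },
}

if __name__ == "__main__":
    for err in validate_failover(aws_upstream):
        print(err)
```

Rejecting the resource at this stage would surface the problem as a validation error on the Upstream itself, instead of as a rejected CDS update that blocks all later configuration. If other upstream types turn out to support failover, the `FAILOVER_SUPPORTED_TYPES` set is the only thing that would need to change in this sketch.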