splunk / splunk-operator

Splunk Operator for Kubernetes

Splunk Operator: Cannot retrieve apps from S3 when using IRSA

tnycum opened this issue

Please select the type of request

Bug

Tell us more

Describe the request

  • On Splunk Operator versions newer than 2.2.0, when using AWS IAM Roles for Service Accounts (IRSA), the operator controller Pod logs an error saying it cannot retrieve apps stored in S3
  • Sample error message:
ERROR   GetAppListFromRemoteBucket      Unable to get apps list
{{"controller"}, "namespace": "splunk", "name": "sh", "reconcileID": "1e31ce12-83a0-4862-b83b-cb3bc9c9353c", "name": "sh", "namespace": "splunk", "appSource": "esApps", "error": "WebIdentityErr: failed to retrieve credentials\n
caused by: SerializationError: failed to unmarshal error message\n\t
status code: 405, request id: \n
caused by: UnmarshalError: failed to unmarshal error message\n\t
00000000 3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 31 |<?xml version=\"1|\n
00000010 2e 30 22 20 65 6e 63 6f 64 69 6e 67 3d 22 55 54 |.0\" encoding=\"UT|\n
00000020 46 2d 38 22 3f 3e 0a 3c 45 72 72 6f 72 3e 3c 43 |F-8\"?>.<Error><C|\n
00000030 6f 64 65 3e 4d 65 74 68 6f 64 4e 6f 74 41 6c 6c |ode>MethodNotAll|\n
00000040 6f 77 65 64 3c 2f 43 6f 64 65 3e 3c 4d 65 73 73 |owed</Code><Mess|\n
00000050 61 67 65 3e 54 68 65 20 73 70 65 63 69 66 69 65 |age>The specifie|\n
00000060 64 20 6d 65 74 68 6f 64 20 69 73 20 6e 6f 74 20 |d method is not |\n
00000070 61 6c 6c 6f 77 65 64 20 61 67 61 69 6e 73 74 20 |allowed against |\n
00000080 74 68 69 73 20 72 65 73 6f 75 72 63 65 2e 3c 2f |this resource.</|\n
00000090 4d 65 73 73 61 67 65 3e 3c 4d 65 74 68 6f 64 3e |Message><Method>|\n
000000a0 50 4f 53 54 3c 2f 4d 65 74 68 6f 64 3e 3c 52 65 |POST</Method><Re|\n
000000b0 73 6f 75 72 63 65 54 79 70 65 3e 53 45 52 56 49 |sourceType>SERVI|\n
000000c0 43 45 3c 2f 52 65 73 6f 75 72 63 65 54 79 70 65 |CE</ResourceType|\n
000000d0 3e 3c 52 65 71 75 65 73 74 49 64 3e 4a 4d 57 51 |><RequestId>JMWQ|\n
000000e0 54 56 34 45 44 51 58 35 47 35 52 32 3c 2f 52 65 |TV4EDQX5G5R2</Re|\n
000000f0 71 75 65 73 74 49 64 3e 3c 48 6f 73 74 49 64 3e |questId><HostId>|\n
00000100 66 61 67 39 69 33 67 69 4e 35 38 32 74 6f 46 48 |fag9i3giN582toFH|\n
00000110 53 36 76 43 2b 38 34 45 4c 4a 4a 58 57 7a 68 6a |S6vC+84ELJJXWzhj|\n
00000120 39 6d 6b 79 4f 46 70 2b 69 56 4f 44 34 70 53 71 |9mkyOFp+iVOD4pSq|\n
00000130 42 76 49 66 72 50 65 59 41 36 51 6a 36 74 6b 4e |BvIfrPeYA6Qj6tkN|\n
00000140 33 48 76 76 75 34 6a 65 63 47 59 3d 3c 2f 48 6f |3Hvvu4jecGY=</Ho|\n
00000150 73 74 49 64 3e 3c 2f 45 72 72 6f 72 3e |stId></Error>|\n\n
caused by: unknown error response tag, {{ Error} []}
"}
github.com/splunk/splunk-operator/pkg/splunk/enterprise.GetAppListFromRemoteBucket
        /workspace/pkg/splunk/enterprise/util.go:867
...

Expected behavior

  • The apps from S3 should be listed by the operator Pod with no errors

Splunk setup on K8S

  • Distributed Clustered Deployment + SHC with Multi-Site, deployed with Helm

Reproduction/Testing steps

  • Use IRSA and link the service account to an IAM role with full permissions on the S3 bucket used to store the Splunk apps
  • Configure the cluster manager with the following Helm values:
clusterManager:
  enabled: true
  appRepo:
    appsRepoPollIntervalSeconds: 60
    defaults:
      volumeName: app_vol
    appSources:
    - name: esCMApps
      location: es-cm/
      scope: cluster
      volumeName: app_vol
    volumes:
    - name: app_vol
      storageType: s3
      provider: aws
      path: <bucket-name>/path/to/splunk/apps/
      endpoint: https://s3.<region>.amazonaws.com
      region: <region>
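Before looking at the operator itself, it can help to confirm that the IRSA step actually took effect in the Pod. For a Pod whose service account carries the `eks.amazonaws.com/role-arn` annotation, the EKS Pod Identity Webhook injects `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE`. The Go sketch below checks for those two variables; `checkIRSA` and `irsaEnv` are hypothetical names for this illustration, and the check only covers the environment, not the IAM policy on the bucket.

```go
package main

import (
	"fmt"
	"os"
)

// irsaEnv holds the variables the EKS Pod Identity Webhook injects into
// pods whose service account is annotated with eks.amazonaws.com/role-arn.
type irsaEnv struct {
	RoleARN   string
	TokenFile string
}

// checkIRSA reports which IRSA variables are missing. getenv is injected so
// the check can be exercised without a real pod (pass os.Getenv in a pod).
func checkIRSA(getenv func(string) string) (irsaEnv, []string) {
	env := irsaEnv{
		RoleARN:   getenv("AWS_ROLE_ARN"),
		TokenFile: getenv("AWS_WEB_IDENTITY_TOKEN_FILE"),
	}
	var missing []string
	if env.RoleARN == "" {
		missing = append(missing, "AWS_ROLE_ARN")
	}
	if env.TokenFile == "" {
		missing = append(missing, "AWS_WEB_IDENTITY_TOKEN_FILE")
	}
	return env, missing
}

func main() {
	env, missing := checkIRSA(os.Getenv)
	if len(missing) > 0 {
		fmt.Println("IRSA not configured for this pod; missing:", missing)
		return
	}
	// The projected service account token must also be readable.
	if _, err := os.Stat(env.TokenFile); err != nil {
		fmt.Println("web identity token file not readable:", err)
		return
	}
	fmt.Println("IRSA role:", env.RoleARN)
}
```

Passing `getenv` as a parameter rather than calling `os.Getenv` directly keeps the check testable with a fake environment.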

K8s environment

  • Running on AWS EKS
  • Kubernetes v1.23.17

Proposed changes (optional)

  • Proposed change, if any.

K8s collector data (optional)

  • n/a

Additional context (optional)

@vivekr-splunk I see you opened a PR for this issue. Assuming this gets merged, when can we expect to get a new release of the operator image to include this fix?

@tnycum Thank you for your code changes. I have created a new PR with your changes because the pipeline tests cannot be run on your PR; it should be merged into the develop branch soon. I will update you on the release dates.

I am having a similar issue, although neither Splunk nor Kubernetes reports any errors.
This is my config file for the ClusterManager. The CM comes up with no issues, but it does not even try to pull the apps from S3.
The secret is defined and the apps are present in S3. What am I doing wrong?

---
apiVersion: enterprise.splunk.com/v4
kind: ClusterManager
metadata:
  name: cm
  namespace: splunk-operator
  finalizers:
  - enterprise.splunk.com/delete-pvc
spec:
  serviceTemplate:
    spec:
      type: LoadBalancer
  appRepo:
    appsRepoPollIntervalSeconds: 900
    defaults:
      volumeName: volume_app_repo_us
      scope: cluster
    appSources:
      - name: networkApps
        location: CLUSTER_MASTER/
        scope: cluster
        volumeName: ccss-dev-splunk
      #- name: networkApps
      #  location: networkAppsLoc/
      #- name: adminApps
      #  location: adminAppsLoc/
      #  scope: local
    volumes:
      - name: ccss-dev-splunk
        storageType: s3
        provider: aws
        path: ccss-dev-splunk/CLUSTER_MASTER
        endpoint: https://ccss-dev-splunk
        region: us-east-1
        secretRef: s3-secret
---