splunk / ansible-role-for-splunk

Splunk's Ansible role for installing Splunk, upgrading Splunk, and installing apps/addons on Splunk deployments (VM/bare metal)

Playbook fails at ERROR: Misconfiguration detected! Unable to proceed as handlers will fail in the play later

Ces-Ces opened this issue

commented

My play fails every time I run multiple deployment task files from one playbook.
I have the variables splunk_use_initd and splunk_use_systemd defined in group_vars/all.yml as follows:

splunk_use_initd: false  # If set to true, the system will use init.d. Default false
splunk_use_systemd: true # DO NOT EDIT. To use init.d, set `splunk_use_initd` to true.

My playbook contains:

- hosts:
    - clustermanager
    - indexer
  roles:
    - ../roles/splunk
  vars:
    - deployment_task: check_splunk.yml

- hosts:
    - clustermanager
  roles:
    - ../roles/splunk
  vars:
    - deployment_task: configure_idxc_manager.yml

- hosts:
    - indexer
  roles:
    - ../roles/splunk
  vars:
    - deployment_task: configure_idxc_member.yml

This fails every time during the configure_idxc_manager.yml play, at the fail task in main.yml:
ERROR: Misconfiguration detected! Unable to proceed as handlers will fail in the play later.

The wording in roles/splunk/tasks/main.yml is also confusing. The task name says:

Fail the play if the currently configured boot-start method does match the expected state or boot-start is not enabled

while the error message says:

Either splunk boot-start is not enabled on this host, or its current boot-start method does not matched the expected value of splunk_use_initd/splunk_use_systemd

Even when the Set boot-start misconfigured var task is skipped, the Fail the play if the currently configured boot-start... task still fails.

TASK [../roles/splunk : Set boot-start misconfigured var] **********************
skipping: [splunk-host-indexer-manager]

Then it fails:
ERROR: Misconfiguration detected! Unable to proceed as handlers will fail in the play later.

It seems the fail task is picking up the value that set_fact assigned in the previous play (check_splunk.yml), since these are the values returned for its conditions, even when Set boot-start misconfigured var is skipped:

ok: [splunk-host-indexer-manager] => {
    "msg": [
        true,
        true,
        "configure_idxc_manager.yml"
    ]
}

Am I missing something?
Please let me know if any other information is needed.

Thanks in advance.

You should not set the splunk_use_systemd variable in group_vars/all.yml; the comment right next to it says not to edit it.

None of the variables set by main.yml should be changed manually. They exist so that, if you want to reconfigure boot-start, configure_boot_start.yml knows what to do. main.yml checks how your host is configured so that it knows how to handle certain operations later in the play, such as restarting the service, and if the host is not configured correctly, some tasks will fail.

The only time you would want to set splunk_use_initd is when you have a system that supports systemd but, for some reason, you want it to use SysV instead. In that case, set splunk_use_initd: True.
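For illustration, a minimal group_vars/all.yml following that advice might look like the sketch below. It assumes you deliberately want init.d on a systemd-capable host; if you want the default systemd behavior, you can leave both variables out of group_vars entirely and let the role's defaults apply.

```yaml
# group_vars/all.yml (sketch)
# Set splunk_use_initd only when you deliberately want SysV init.d on a host
# that supports systemd. splunk_use_systemd is intentionally NOT set here:
# the role manages it and its own comment says not to edit it.
splunk_use_initd: true
```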

commented

Hi @dtwersky,

Even with the variables below left as they are in defaults/main.yml, the issue persists:

splunk_use_initd: false  # If set to true, the system will use init.d. Default false
splunk_use_systemd: true # DO NOT EDIT. To use init.d, set `splunk_use_initd` to true.

I still get this:

The Set boot-start misconfigured var task is skipped:

TASK [../roles/splunk : Set boot-start misconfigured var] **********************
skipping: [splunk-host-indexer-manager]

My debug output shows the conditions as follows:

ok: [splunk-host-indexer-manager] => {
    "msg": [
        true,
        true,
        "configure_idxc_manager.yml"
    ]
}

Then the task fails because all of its conditions were met, including configure_boot_start being defined:

ERROR: Misconfiguration detected! Unable to proceed as handlers will fail in the play later.

According to variable precedence, unless I'm missing something, group_vars/all.yml should override the variables in defaults/main.yml, so that doesn't seem to be the issue.

As mentioned previously, it seems the configure_boot_start fact set by set_fact persists between plays in the same playbook (in this case, the second play, configure_idxc_manager.yml, picks up the fact that was set in the previous play, check_splunk.yml).

Please let me know if any other information is needed.

@cesar-ayuuk Please add the following at line 125 of tasks/main.yml, run it again, and share the output of that task.

- debug:
    msg:
      - "current_start_method is {{ current_start_method }}"
      - "desired_start_method is {{ desired_start_method }}"
      - "configure_boot_start is {{ configure_boot_start }}"
      - "splunk_use_initd is {{ splunk_use_initd }}"
      - "splunk_use_systemd is {{ splunk_use_systemd }}"
      - "OS is {{ ansible_distribution }} version {{ ansible_distribution_version }}"
  failed_when: False

commented

Hi @jewnix,

Thanks for your quick response:
At first I got the error below, because current_start_method is not defined yet while Splunk is not installed:

The task includes an option with an undefined variable. The error was: 'current_start_method' is undefined

I commented out the current_start_method line, and this is the output when the first task (check_splunk.yml) runs:

TASK [../roles/splunk : debug] *************************************************
ok: [splunk-host-indexer-manager] => {
    "msg": [
        "desired_start_method is systemd",
        "configure_boot_start is True",
        "splunk_use_initd is False",
        "splunk_use_systemd is True",
        "OS is CentOS version 7.9"
    ]
}

Here is the output when the second task (configure_idxc_manager.yml) runs:

TASK [../roles/splunk : debug] *************************************************
ok: [splunk-host-indexer-manager] => {
    "msg": [
        "desired_start_method is systemd",
        "configure_boot_start is True",
        "splunk_use_initd is False",
        "splunk_use_systemd is True",
        "OS is CentOS version 7.9"
    ]
}

The output is the same, and the second task then fails with the same error.

Run it with -vv and paste the output of the Check active boot-start configuration task. It's on line 86 in tasks/main.yml.

Also, is it configured to run at boot? You can check by running $SPLUNK_HOME/bin/splunk display boot-start.

commented

Below is the output of the Check active boot-start configuration block when the second task (configure_idxc_manager.yml) runs. When the first task (check_splunk.yml) runs, all of these tasks are skipped because splunkd_found.stat.exists is false:

TASK [../roles/splunk : Check if active boot-start configuration is systemd] ***
task path: /home/user/ansible-role-for-splunk/roles/splunk/tasks/main.yml:79
ok: [splunk-host-indexer-manager] => {"changed": false, "stat": {"atime": 1669241598.447853, "attr_flags": "", "attributes": [], "block_size": 4096, "blocks": 8, "charset": "us-ascii", "checksum": "26fd2b4793fc1ddd814390a10ebf588e152a997d", "ctime": 1669241587.3930836, "dev": 2050, "device_type": 0, "executable": false, "exists": true, "gid": 0, "gr_name": "root", "inode": 51966280, "isblk": false, "ischr": false, "isdir": false, "isfifo": false, "isgid": false, "islnk": false, "isreg": true, "issock": false, "isuid": false, "mimetype": "text/plain", "mode": "0644", "mtime": 1669241587.3920836, "nlink": 1, "path": "/etc/systemd/system/Splunkd.service", "pw_name": "root", "readable": true, "rgrp": true, "roth": true, "rusr": true, "size": 1013, "uid": 0, "version": "18446744073672008787", "wgrp": false, "woth": false, "writeable": true, "wusr": true, "xgrp": false, "xoth": false, "xusr": false}}

TASK [../roles/splunk : Set current_start_method var to systemd if systemd is being used for splunk] ***
task path: /home/user/ansible-role-for-splunk/roles/splunk/tasks/main.yml:86
ok: [splunk-host-indexer-manager] => {"ansible_facts": {"current_start_method": "systemd"}, "changed": false}

TASK [../roles/splunk : Check if active boot-start method is init.d] ***********
task path: /home/user/ansible-role-for-splunk/roles/splunk/tasks/main.yml:91
ok: [splunk-host-indexer-manager] => {"changed": false, "stat": {"exists": false}}

TASK [../roles/splunk : Set current_start_method to initd if initd is being used for splunk] ***
task path: /home/user/ansible-role-for-splunk/roles/splunk/tasks/main.yml:98
skipping: [splunk-host-indexer-manager] => {"changed": false, "skip_reason": "Conditional result was False"}

TASK [../roles/splunk : set current_start_method to disabled if boot-start is disabled] ***
task path: /home/user/ansible-role-for-splunk/roles/splunk/tasks/main.yml:103
skipping: [splunk-host-indexer-manager] => {"changed": false, "skip_reason": "Conditional result was False"}

Yes, it was configured when the first task check_splunk.yml ran:

[root@splunk-host-indexer-manager bin]# ./splunk display boot-start
Checking if init.d script exists:
File is not installed (checked: /etc/init.d/splunk).
Init script is not configured to run at boot.

Checking if systemd unit file exists:
Systemd unit file installed at /etc/systemd/system/Splunkd.service.
Polkit rules are not configured.
Configured as systemd managed service.

I can see that it does set current_start_method, so I'm not sure why it is treated as undefined later on. If it really were undefined, it should have been set to disabled on line 113. This is strange.
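To make the detection sequence easier to follow, here is a rough sketch reconstructed only from the task names and paths in the -vv log above and the display boot-start output. The register names (systemd_unit, initd_script) are hypothetical, the unit file name is taken from this host, and the "disabled" condition is inferred from the remark about line 113; the role's actual tasks may differ.

```yaml
# Sketch of how current_start_method appears to be detected (not the role's exact code).
- name: Check if active boot-start configuration is systemd
  ansible.builtin.stat:
    path: /etc/systemd/system/Splunkd.service  # unit name as seen on this host
  register: systemd_unit  # hypothetical register name

- name: Set current_start_method var to systemd if systemd is being used for splunk
  ansible.builtin.set_fact:
    current_start_method: systemd
  when: systemd_unit.stat.exists

- name: Check if active boot-start method is init.d
  ansible.builtin.stat:
    path: /etc/init.d/splunk  # path reported by "splunk display boot-start"
  register: initd_script  # hypothetical register name

- name: Set current_start_method to initd if initd is being used for splunk
  ansible.builtin.set_fact:
    current_start_method: initd
  when: initd_script.stat.exists

# Assumed condition: fall back to "disabled" when neither method was detected.
# Note that a fact left over from an earlier play would also count as "defined".
- name: Set current_start_method to disabled if boot-start is disabled
  ansible.builtin.set_fact:
    current_start_method: disabled
  when: current_start_method is not defined
```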

Did you pull the most recent version of the repo?

What command are you using to run this?
Try running:
ansible-playbook playbooks/splunk_install_or_upgrade.yml -i <your_inventory.yml> --limit indexer

commented

I just made a fresh clone of the repo to test this. I have been running the command like this:

ansible-playbook -i ../environments/production/inventory.yml splunk_idxc_deploy.yml

The playbook splunk_idxc_deploy.yml content:

- hosts:
    - clustermanager
    - indexer
  roles:
    - ../roles/splunk
  vars:
    - deployment_task: check_splunk.yml

- hosts:
    - clustermanager
  roles:
    - ../roles/splunk
  vars:
    - deployment_task: configure_idxc_manager.yml

- hosts:
    - indexer
  roles:
    - ../roles/splunk
  vars:
    - deployment_task: configure_idxc_member.yml

This failed again with the same error.
ERROR: Misconfiguration detected! Unable to proceed as handlers will fail in the play later.

Since splunk_install_or_upgrade.yml only runs the check_splunk.yml task, it will never fail. Are you suggesting I include the configure_idxc_manager.yml and configure_idxc_member.yml tasks in it?

@Ces-Ces does this only happen on a fresh install? What happens if you run the playbook again after it fails?

I was able to reproduce it, and identified the issue. When check_splunk.yml runs for the first time, it sets configure_boot_start: true on line 120, since splunk is not yet installed.
After Splunk is installed and the next deployment task in the playbook runs, the fail task fires because its conditions are met: splunkd_found.stat.exists is now true, and configure_boot_start is still defined, since the configure_boot_start fact is not reset after the install.
I tried running it twice, and it succeeded on the second run.
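The root cause can be reproduced outside the role with a minimal playbook like the sketch below: a fact set with set_fact in one play remains defined for that host in later plays of the same run, and a skipped set_fact does not clear it. The file name repro_fact_persistence.yml is hypothetical; run it with ansible-playbook -i localhost, -c local repro_fact_persistence.yml.

```yaml
# repro_fact_persistence.yml (hypothetical name)
# Play 1 stands in for check_splunk.yml on a fresh host.
- hosts: all
  gather_facts: false
  tasks:
    - name: Set the flag, as check_splunk.yml does before Splunk is installed
      ansible.builtin.set_fact:
        configure_boot_start: true

# Play 2 stands in for configure_idxc_manager.yml.
- hosts: all
  gather_facts: false
  tasks:
    - name: This set_fact is skipped, but that does not undefine the earlier fact
      ansible.builtin.set_fact:
        configure_boot_start: true
      when: false

    - name: The fact from play 1 is still defined here
      ansible.builtin.debug:
        msg: "configure_boot_start is defined: {{ configure_boot_start is defined }}"
```

The debug task should report True, which is exactly the state that makes a "configure_boot_start is defined" condition match in the second play.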

commented

As @jewnix mentioned, it passes the second time you run the playbook. However, configure_boot_start should not be set when the second task runs, so the Fail the play... task should be skipped.

My workaround to skip the Fail the play... task was to register the result of the Set boot-start misconfigured var task and only fail when that task was not skipped, as below:

- name: Set boot-start misconfigured var
  set_fact:
    configure_boot_start: true
  when: >
    not splunkd_found.stat.exists or
    current_start_method != desired_start_method or
    current_start_method == "disabled"
  register: set_fact_problem

# Workaround: only fail if the set_fact above actually ran in this play.
# "is not skipped" is used because a task that ran typically has no "skipped"
# key, so "not set_fact_problem.skipped" could error out instead of failing cleanly.
- name: Fail the play if the currently configured boot-start method does not match the expected state or boot-start is not enabled
  fail:
    msg:
      - "ERROR: Misconfiguration detected! Unable to proceed as handlers will fail in the play later."
      - "Either splunk boot-start is not enabled on this host, or its current boot-start method does not matched the expected value of splunk_use_initd/splunk_use_systemd."
      - "To correct this: Either run configure_splunk_boot.yml or update the value of splunk_use_initd/splunk_use_systemd in your group_vars."
  when:
    - splunkd_found.stat.exists
    - configure_boot_start is defined
    - not deployment_task == "configure_splunk_boot.yml"
    - set_fact_problem is not skipped

I created a PR that does it a bit differently, but both solutions will work.
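Another possible approach, sketched here as a hypothetical alternative and not necessarily what the PR does, is to stop relying on "is defined" and instead reset the flag explicitly each time tasks/main.yml runs, then test its value. This assumes that any other task in the role that checks "configure_boot_start is defined" would need the same switch to "configure_boot_start | bool".

```yaml
# Hypothetical alternative for tasks/main.yml (sketch only).
# Facts set with set_fact persist for the host across plays, so clear the flag
# at the start of each run of main.yml instead of relying on "is defined".
- name: Reset boot-start misconfigured flag for this play
  ansible.builtin.set_fact:
    configure_boot_start: false

# ... existing "Check active boot-start configuration" tasks go here ...

- name: Set boot-start misconfigured var
  ansible.builtin.set_fact:
    configure_boot_start: true
  when: >
    not splunkd_found.stat.exists or
    current_start_method != desired_start_method or
    current_start_method == "disabled"

- name: Fail the play if the currently configured boot-start method does not match the expected state or boot-start is not enabled
  ansible.builtin.fail:
    msg:
      - "ERROR: Misconfiguration detected! Unable to proceed as handlers will fail in the play later."
      - "To correct this: Either run configure_splunk_boot.yml or update the value of splunk_use_initd/splunk_use_systemd in your group_vars."
  when:
    - splunkd_found.stat.exists
    - configure_boot_start | bool  # value from THIS play, not a leftover fact
    - deployment_task != "configure_splunk_boot.yml"
```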