flatcar / Flatcar

Flatcar project repository for issue tracking, project documentation, etc.

Home Page: https://www.flatcar.org/

Installation issue with Dell server

melaz opened this issue

Description

I'm trying to install Flatcar using PXE boot via Typhoon. Two of the three servers boot perfectly fine and join the cluster. On the Dell rx740xd, the installation fails: instead of creating multiple partitions in a GPT, it creates a PMBR with only 2 partitions.

Impact

Can't bring the cluster up.

Environment and steps to reproduce

  1. Set-up: Dell rx740xd, HBA330 Mini (also tried with PERC in eHBA mode), 2x 1.92 TB SSD, UEFI (no Secure Boot)
  2. Task: installer service fails
Apr 17 14:11:30 k8s-worker-01 installer[2393]: mount: /tmp/flatcar-install.XuA4mjZEqr/oemfs: wrong fs type, bad option, bad superblock on , missing codepage or helper program, or other error.
Apr 17 14:11:30 k8s-worker-01 installer[2393]:        dmesg(1) may have more information after failed mount system call.
Apr 17 14:11:30 k8s-worker-01 installer[2394]: ERROR: mount check: cannot open : No such file or directory
Apr 17 14:11:30 k8s-worker-01 installer[2251]: Error: return code 1 from [[ -n "${IGNITION}" ]]
Apr 17 14:11:30 k8s-worker-01 installer[2395]: wipefs: failed to create a signature backup, $HOME undefined

Disk /dev/sda: 1.75 TiB, 1920383410176 bytes, 468843606 sectors
Disk model: PX05SRQ192      
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device     Boot Start    End Sectors Size Id Type
/dev/sda1  *     4096 266239  262144   1G  c W95 FAT32 (LBA)
/dev/sda2           1   4095    4095  16M ee GPT

Partition table entries are not in disk order.

  3. Action(s): Made multiple changes in the BIOS; none of them changed the behaviour
  4. Error: Failed Units: 1
    installer.service

Expected behavior

It should install and boot like the other two servers.

Can you give the output of journalctl -k for the timespan where the service failed?

Apr 17 14:11:00 localhost kernel: i40e 0000:19:00.1: fw 8.84.66032 api 1.14 nvm 8.40 0x8000af80 20.5.13 [8086:1572] [1028:1f99]
Apr 17 14:11:00 localhost kernel: ice 0000:5e:00.1 ens3f1: renamed from eth0
Apr 17 14:11:00 localhost kernel: ice 0000:5e:00.0 ens3f0: renamed from eth1
Apr 17 14:11:00 localhost kernel: ice 0000:5e:00.0 ens3f0: NIC Link is up 100 Gbps Full Duplex, Requested FEC: RS-FEC, Negotiated FEC: RS-FEC, Autoneg Advertised: On, Autoneg Negotiated: False, Flow Control: None
Apr 17 14:11:00 localhost kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ens3f0: link becomes ready
Apr 17 14:11:00 localhost kernel: scsi host7: ahci
Apr 17 14:11:00 localhost kernel: scsi host8: ahci
Apr 17 14:11:00 localhost kernel: scsi host9: ahci
Apr 17 14:11:00 localhost kernel: scsi host10: ahci
Apr 17 14:11:00 localhost kernel: scsi host11: ahci
Apr 17 14:11:00 localhost kernel: i40e 0000:19:00.1: MAC address: 24:6e:96:6e:68:be
Apr 17 14:11:00 localhost kernel: scsi host12: ahci
Apr 17 14:11:00 localhost kernel: scsi host13: ahci
Apr 17 14:11:00 localhost kernel: scsi host14: ahci
Apr 17 14:11:00 localhost kernel: ata7: SATA max UDMA/133 abar m524288@0x92c00000 port 0x92c00100 irq 667
Apr 17 14:11:00 localhost kernel: ata8: SATA max UDMA/133 abar m524288@0x92c00000 port 0x92c00180 irq 667
Apr 17 14:11:00 localhost kernel: ata9: SATA max UDMA/133 abar m524288@0x92c00000 port 0x92c00200 irq 667
Apr 17 14:11:00 localhost kernel: ata10: SATA max UDMA/133 abar m524288@0x92c00000 port 0x92c00280 irq 667
Apr 17 14:11:00 localhost kernel: ata11: SATA max UDMA/133 abar m524288@0x92c00000 port 0x92c00300 irq 667
Apr 17 14:11:00 localhost kernel: ata12: SATA max UDMA/133 abar m524288@0x92c00000 port 0x92c00380 irq 667
Apr 17 14:11:00 localhost kernel: ata13: SATA max UDMA/133 abar m524288@0x92c00000 port 0x92c00400 irq 667
Apr 17 14:11:00 localhost kernel: ata14: SATA max UDMA/133 abar m524288@0x92c00000 port 0x92c00480 irq 667
Apr 17 14:11:00 localhost kernel: i40e 0000:19:00.1: PCI-Express: Speed 8.0GT/s Width x8
Apr 17 14:11:00 localhost kernel: usb 1-14: new high-speed USB device number 2 using xhci_hcd
Apr 17 14:11:00 localhost kernel: i40e 0000:19:00.1: Features: PF-id[1] VFs: 64 VSIs: 2 QP: 96 RSS FD_ATR FD_SB NTUPLE VxLAN Geneve PTP VEPA
Apr 17 14:11:00 localhost kernel: hub 1-14:1.0: USB hub found
Apr 17 14:11:00 localhost kernel: hub 1-14:1.0: 4 ports detected
Apr 17 14:11:00 localhost kernel: ata12: SATA link down (SStatus 0 SControl 300)
Apr 17 14:11:00 localhost kernel: ata10: SATA link down (SStatus 0 SControl 300)
Apr 17 14:11:00 localhost kernel: ata7: SATA link down (SStatus 0 SControl 300)
Apr 17 14:11:00 localhost kernel: ata11: SATA link down (SStatus 0 SControl 300)
Apr 17 14:11:00 localhost kernel: ata13: SATA link down (SStatus 0 SControl 300)
Apr 17 14:11:00 localhost kernel: ata14: SATA link down (SStatus 0 SControl 300)
Apr 17 14:11:00 localhost kernel: ata9: SATA link down (SStatus 0 SControl 300)
Apr 17 14:11:00 localhost kernel: ata8: SATA link down (SStatus 0 SControl 300)
Apr 17 14:11:00 localhost kernel: i40e 0000:19:00.1 eno2: renamed from eth0
Apr 17 14:11:00 localhost kernel: i40e 0000:19:00.0 eno1: renamed from eth2
Apr 17 14:11:01 localhost kernel: usb 1-14.1: new high-speed USB device number 3 using xhci_hcd
Apr 17 14:11:01 localhost kernel: hub 1-14.1:1.0: USB hub found
Apr 17 14:11:01 localhost kernel: hub 1-14.1:1.0: 4 ports detected
Apr 17 14:11:01 localhost kernel: usb 1-14.2: new high-speed USB device number 4 using xhci_hcd
Apr 17 14:11:01 localhost kernel: hid: raw HID events driver (C) Jiri Kosina
Apr 17 14:11:01 localhost kernel: usbcore: registered new interface driver usbhid
Apr 17 14:11:01 localhost kernel: usbhid: USB HID core driver
Apr 17 14:11:01 localhost kernel: input: DELLEMC DRAC 5 Virtual Keyboard and Mouse as /devices/pci0000:00/0000:00:14.0/usb1/1-14/1-14.2/1-14.2:1.0/0003:413C:0000.0001/input/input0
Apr 17 14:11:01 localhost kernel: usb 1-14.4: new high-speed USB device number 5 using xhci_hcd
Apr 17 14:11:01 localhost kernel: hid-generic 0003:413C:0000.0001: input,hidraw0: USB HID v1.01 Mouse [DELLEMC DRAC 5 Virtual Keyboard and Mouse] on usb-0000:00:14.0-14.2/input0
Apr 17 14:11:01 localhost kernel: input: DELLEMC DRAC 5 Virtual Keyboard and Mouse as /devices/pci0000:00/0000:00:14.0/usb1/1-14/1-14.2/1-14.2:1.1/0003:413C:0000.0002/input/input1
Apr 17 14:11:01 localhost kernel: hid-generic 0003:413C:0000.0002: input,hidraw1: USB HID v1.01 Keyboard [DELLEMC DRAC 5 Virtual Keyboard and Mouse] on usb-0000:00:14.0-14.2/input1
Apr 17 14:11:01 localhost kernel: hub 1-14.4:1.0: USB hub found
Apr 17 14:11:01 localhost kernel: hub 1-14.4:1.0: 4 ports detected
Apr 17 14:11:01 localhost kernel: mlx5_core 0000:18:00.0: Port module event: module 0, Cable unplugged
Apr 17 14:11:03 localhost kernel: audit: type=1130 audit(1713363063.058:28): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=ignition-fetch comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Apr 17 14:11:03 localhost kernel: audit: type=1130 audit(1713363063.146:29): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=ignition-kargs comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Apr 17 14:11:03 localhost kernel: audit: type=1130 audit(1713363063.241:30): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=ignition-disks comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Apr 17 14:11:03 localhost kernel: loop0: Can't mount, would change RO state
Apr 17 14:11:03 localhost kernel: audit: type=1130 audit(1713363063.913:31): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=initrd-setup-root comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Apr 17 14:11:04 localhost kernel: audit: type=1130 audit(1713363064.021:32): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=ignition-mount comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Apr 17 14:11:04 localhost kernel: audit: type=1130 audit(1713363064.160:33): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=ignition-files comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Apr 17 14:11:04 localhost kernel: audit: type=1130 audit(1713363064.227:34): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=initrd-setup-root-after-ignition comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Apr 17 14:11:04 localhost kernel: audit: type=1130 audit(1713363064.365:35): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=initrd-parse-etc comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Apr 17 14:11:04 localhost kernel: audit: type=1131 audit(1713363064.365:36): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=initrd-parse-etc comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Apr 17 14:11:04 localhost kernel: audit: type=1130 audit(1713363064.558:37): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=dracut-pre-pivot comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Apr 17 14:11:07 localhost systemd-journald[725]: Received SIGTERM from PID 1 (systemd).
Apr 17 14:11:09 localhost kernel: SELinux:  Permission cmd in class io_uring not defined in policy.
Apr 17 14:11:09 localhost kernel: SELinux: the above unknown classes and permissions will be allowed
Apr 17 14:11:09 localhost kernel: SELinux:  policy capability network_peer_controls=1
Apr 17 14:11:09 localhost kernel: SELinux:  policy capability open_perms=1
Apr 17 14:11:09 localhost kernel: SELinux:  policy capability extended_socket_class=1
Apr 17 14:11:09 localhost kernel: SELinux:  policy capability always_check_network=0
Apr 17 14:11:09 localhost kernel: SELinux:  policy capability cgroup_seclabel=1
Apr 17 14:11:09 localhost kernel: SELinux:  policy capability nnp_nosuid_transition=1
Apr 17 14:11:09 localhost kernel: SELinux:  policy capability genfs_seclabel_symlinks=0
Apr 17 14:11:09 localhost kernel: SELinux:  policy capability ioctl_skip_cloexec=0
Apr 17 14:11:09 localhost systemd[1]: Successfully loaded SELinux policy in 142.410ms.
Apr 17 14:11:09 localhost systemd[1]: Relabelled /dev, /dev/shm, /run, /sys/fs/cgroup in 7.889ms.
Apr 17 14:11:09 localhost systemd[1]: systemd 252 running in system mode (+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL -ACL +BLKID +CURL +ELFUTILS -FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY -P11KIT -QRENCODE -TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Apr 17 14:11:09 localhost systemd[1]: Detected architecture x86-64.
Apr 17 14:11:09 localhost systemd[1]: Detected first boot.
Apr 17 14:11:09 localhost systemd[1]: Initializing machine ID from random generator.
Apr 17 14:11:09 localhost systemd-gpt-auto-generator[1788]: EFI loader partition unknown, exiting.
Apr 17 14:11:09 localhost systemd-gpt-auto-generator[1788]: (The boot loader did not set EFI variable LoaderDevicePartUUID.)
Apr 17 14:11:09 localhost systemd[1]: Populated /etc with preset unit settings.
Apr 17 14:11:09 localhost systemd[1]: iscsiuio.service: Deactivated successfully.
Apr 17 14:11:09 localhost systemd[1]: Stopped iscsiuio.service - iSCSI UserSpace I/O driver.
Apr 17 14:11:09 localhost systemd[1]: iscsid.service: Deactivated successfully.
Apr 17 14:11:09 localhost systemd[1]: Stopped iscsid.service - Open-iSCSI.
Apr 17 14:11:09 localhost systemd[1]: initrd-switch-root.service: Deactivated successfully.
Apr 17 14:11:09 localhost systemd[1]: Stopped initrd-switch-root.service - Switch Root.
Apr 17 14:11:09 localhost systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 1.
Apr 17 14:11:09 localhost systemd[1]: Created slice system-addon\x2dconfig.slice - Slice /system/addon-config.
Apr 17 14:11:09 localhost systemd[1]: Created slice system-addon\x2drun.slice - Slice /system/addon-run.
Apr 17 14:11:09 localhost systemd[1]: Created slice system-getty.slice - Slice /system/getty.
Apr 17 14:11:09 localhost systemd[1]: Created slice system-modprobe.slice - Slice /system/modprobe.
Apr 17 14:11:09 localhost systemd[1]: Created slice system-system\x2dcloudinit.slice - Slice /system/system-cloudinit.
Apr 17 14:11:09 localhost systemd[1]: Created slice system-systemd\x2dfsck.slice - Slice /system/systemd-fsck.
Apr 17 14:11:09 localhost systemd[1]: Created slice user.slice - User and Session Slice.
Apr 17 14:11:09 localhost systemd[1]: Started systemd-ask-password-console.path - Dispatch Password Requests to Console Directory Watch.
Apr 17 14:11:09 localhost systemd[1]: Started systemd-ask-password-wall.path - Forward Password Requests to Wall Directory Watch.
Apr 17 14:11:09 localhost systemd[1]: boot.automount - Boot partition Automount Point was skipped because of an unmet condition check (ConditionPathExists=!/usr/.noupdate).
Apr 17 14:11:09 localhost systemd[1]: Set up automount proc-sys-fs-binfmt_misc.automount - Arbitrary Executable File Formats File System Automount Point.
Apr 17 14:11:09 localhost systemd[1]: dev-disk-by\x2dlabel-OEM.device - /dev/disk/by-label/OEM was skipped because of an unmet condition check (ConditionPathExists=!/usr/.noupdate).
Apr 17 14:11:09 localhost systemd[1]: Stopped target initrd-switch-root.target - Switch Root.
Apr 17 14:11:09 localhost systemd[1]: Stopped target initrd-fs.target - Initrd File Systems.
Apr 17 14:11:09 localhost systemd[1]: Stopped target initrd-root-fs.target - Initrd Root File System.
Apr 17 14:11:09 localhost systemd[1]: Reached target integritysetup.target - Local Integrity Protected Volumes.
Apr 17 14:11:09 localhost systemd[1]: Reached target remote-cryptsetup.target - Remote Encrypted Volumes.
Apr 17 14:11:09 localhost systemd[1]: Reached target remote-fs.target - Remote File Systems.
Apr 17 14:11:09 localhost systemd[1]: Reached target slices.target - Slice Units.
Apr 17 14:11:09 localhost systemd[1]: Reached target swap.target - Swaps.
Apr 17 14:11:09 localhost systemd[1]: Reached target veritysetup.target - Local Verity Protected Volumes.
Apr 17 14:11:09 localhost systemd[1]: Listening on systemd-coredump.socket - Process Core Dump Socket.
Apr 17 14:11:09 localhost systemd[1]: Listening on systemd-initctl.socket - initctl Compatibility Named Pipe.
Apr 17 14:11:09 localhost systemd[1]: Listening on systemd-networkd.socket - Network Service Netlink Socket.
Apr 17 14:11:09 localhost systemd[1]: Listening on systemd-udevd-control.socket - udev Control Socket.
Apr 17 14:11:09 localhost systemd[1]: Listening on systemd-udevd-kernel.socket - udev Kernel Socket.
Apr 17 14:11:09 localhost systemd[1]: Listening on systemd-userdbd.socket - User Database Manager Socket.
Apr 17 14:11:09 localhost systemd[1]: Mounting dev-hugepages.mount - Huge Pages File System...
Apr 17 14:11:09 localhost systemd[1]: Mounting dev-mqueue.mount - POSIX Message Queue File System...
Apr 17 14:11:09 localhost systemd[1]: Mounting media.mount - External Media Directory...
Apr 17 14:11:09 localhost systemd[1]: proc-xen.mount - /proc/xen was skipped because of an unmet condition check (ConditionVirtualization=xen).
Apr 17 14:11:09 localhost systemd[1]: Mounting sys-kernel-debug.mount - Kernel Debug File System...
Apr 17 14:11:09 localhost systemd[1]: Mounting sys-kernel-tracing.mount - Kernel Trace File System...
Apr 17 14:11:09 localhost systemd[1]: Mounting tmp.mount - Temporary Directory /tmp...
Apr 17 14:11:09 localhost systemd[1]: Starting flatcar-tmpfiles.service - Create missing system files...
Apr 17 14:11:09 localhost systemd[1]: ignition-delete-config.service - Ignition (delete config) was skipped because no trigger condition checks were met.
Apr 17 14:11:09 localhost systemd[1]: Starting kmod-static-nodes.service - Create List of Static Device Nodes...
Apr 17 14:11:09 localhost systemd[1]: Starting modprobe@configfs.service - Load Kernel Module configfs...
Apr 17 14:11:09 localhost systemd[1]: Starting modprobe@dm_mod.service - Load Kernel Module dm_mod...
Apr 17 14:11:09 localhost systemd[1]: Starting modprobe@drm.service - Load Kernel Module drm...
Apr 17 14:11:09 localhost systemd[1]: Starting modprobe@efi_pstore.service - Load Kernel Module efi_pstore...
Apr 17 14:11:09 localhost systemd[1]: Starting modprobe@fuse.service - Load Kernel Module fuse...
Apr 17 14:11:09 localhost kernel: ACPI: bus type drm_connector registered
Apr 17 14:11:09 localhost kernel: fuse: init (API version 7.37)
Apr 17 14:11:09 localhost systemd[1]: Starting modprobe@loop.service - Load Kernel Module loop...
Apr 17 14:11:09 localhost systemd[1]: setup-nsswitch.service - Create /etc/nsswitch.conf was skipped because of an unmet condition check (ConditionPathExists=!/etc/nsswitch.conf).
Apr 17 14:11:09 localhost systemd[1]: Stopped systemd-journald.service - Journal Service.
Apr 17 14:11:09 localhost systemd[1]: systemd-journald.service: Consumed 5.840s CPU time.
Apr 17 14:11:09 localhost kernel: kauditd_printk_skb: 57 callbacks suppressed
Apr 17 14:11:09 localhost kernel: audit: type=1130 audit(1713363069.172:95): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:kernel_t:s0 msg='unit=systemd-journald comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Apr 17 14:11:09 localhost kernel: audit: type=1131 audit(1713363069.172:96): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:kernel_t:s0 msg='unit=systemd-journald comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Apr 17 14:11:09 localhost kernel: audit: type=1334 audit(1713363069.185:97): prog-id=18 op=LOAD
Apr 17 14:11:09 localhost kernel: audit: type=1334 audit(1713363069.218:98): prog-id=19 op=LOAD
Apr 17 14:11:09 localhost kernel: audit: type=1334 audit(1713363069.266:99): prog-id=20 op=LOAD
Apr 17 14:11:09 localhost kernel: audit: type=1334 audit(1713363069.277:100): prog-id=16 op=UNLOAD
Apr 17 14:11:09 localhost kernel: audit: type=1334 audit(1713363069.277:101): prog-id=17 op=UNLOAD
Apr 17 14:11:09 localhost systemd[1]: Starting systemd-journald.service - Journal Service...
Apr 17 14:11:09 localhost kernel: audit: type=1305 audit(1713363069.337:102): op=set audit_enabled=1 old=1 auid=4294967295 ses=4294967295 subj=system_u:system_r:kernel_t:s0 res=1
Apr 17 14:11:09 localhost kernel: audit: type=1300 audit(1713363069.337:102): arch=c000003e syscall=46 success=yes exit=60 a0=5 a1=7ffc963b4e50 a2=4000 a3=7ffc963b4eec items=0 ppid=1 pid=1856 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="systemd-journal" exe="/usr/lib/systemd/systemd-journald" subj=system_u:system_r:kernel_t:s0 key=(null)
Apr 17 14:11:09 localhost kernel: audit: type=1327 audit(1713363069.337:102): proctitle="/usr/lib/systemd/systemd-journald"
Apr 17 14:11:09 localhost systemd[1]: Starting systemd-modules-load.service - Load Kernel Modules...
Apr 17 14:11:09 localhost systemd[1]: Starting systemd-network-generator.service - Generate network units from Kernel command line...
Apr 17 14:11:09 localhost systemd[1]: Starting systemd-remount-fs.service - Remount Root and Kernel File Systems...
Apr 17 14:11:09 localhost systemd[1]: Starting systemd-udev-trigger.service - Coldplug All udev Devices...
Apr 17 14:11:09 localhost systemd[1]: xenserver-pv-version.service - Set fake PV driver version for XenServer was skipped because of an unmet condition check (ConditionVirtualization=xen).
Apr 17 14:11:09 localhost systemd[1]: Started systemd-journald.service - Journal Service.
Apr 17 14:11:09 localhost systemd-journald[1856]: Received client request to flush runtime journal.
Apr 17 14:11:10 localhost kernel: input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input2
Apr 17 14:11:10 localhost kernel: ACPI: button: Power Button [PWRF]
Apr 17 14:11:10 localhost kernel: mousedev: PS/2 mouse device common for all mice
Apr 17 14:11:10 localhost kernel: IPMI message handler: version 39.2
Apr 17 14:11:10 localhost kernel: ipmi device interface
Apr 17 14:11:10 localhost kernel: i801_smbus 0000:00:1f.4: SPD Write Disable is set
Apr 17 14:11:10 localhost kernel: i801_smbus 0000:00:1f.4: SMBus using PCI interrupt
Apr 17 14:11:10 localhost kernel: i2c i2c-2: 12/24 memory slots populated (from DMI)
Apr 17 14:11:10 localhost kernel: ses 0:0:2:0: Attached Enclosure device
Apr 17 14:11:10 localhost kernel: i2c i2c-2: Systems with more than 4 memory slots not supported yet, not instantiating SPD
Apr 17 14:11:10 localhost kernel: ipmi_si: IPMI System Interface driver
Apr 17 14:11:10 localhost kernel: ipmi_si dmi-ipmi-si.0: ipmi_platform: probing via SMBIOS
Apr 17 14:11:10 localhost kernel: ipmi_platform: ipmi_si: SMBIOS: io 0xca8 regsize 1 spacing 4 irq 10
Apr 17 14:11:10 localhost kernel: ipmi_si: Adding SMBIOS-specified kcs state machine
Apr 17 14:11:10 localhost kernel: ipmi_si IPI0001:00: ipmi_platform: probing via ACPI
Apr 17 14:11:10 localhost kernel: ipmi_si IPI0001:00: ipmi_platform: [io  0x0ca8] regsize 1 spacing 4 irq 10
Apr 17 14:11:10 localhost kernel: mei_me 0000:00:16.0: Device doesn't have valid ME Interface
Apr 17 14:11:10 localhost kernel: iTCO_vendor_support: vendor-support=0
Apr 17 14:11:10 localhost kernel: dcdbas dcdbas: Dell Systems Management Base Driver (version 5.6.0-3.4)
Apr 17 14:11:10 localhost kernel: ipmi_si dmi-ipmi-si.0: Removing SMBIOS-specified kcs state machine in favor of ACPI
Apr 17 14:11:10 localhost kernel: ipmi_si: Adding ACPI-specified kcs state machine
Apr 17 14:11:10 localhost kernel: ipmi_si: Trying ACPI-specified kcs state machine at i/o address 0xca8, slave address 0x20, irq 10
Apr 17 14:11:10 localhost kernel: iTCO_wdt iTCO_wdt: Found a Intel PCH TCO device (Version=4, TCOBASE=0x0400)
Apr 17 14:11:10 localhost kernel: iTCO_wdt iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
Apr 17 14:11:10 localhost kernel: ipmi_si IPI0001:00: The BMC does not support setting the recv irq bit, compensating, but the BMC needs to be fixed.
Apr 17 14:11:10 localhost kernel: ipmi_si IPI0001:00: Using irq 10
Apr 17 14:11:10 localhost kernel: ipmi_si IPI0001:00: IPMI message handler: Found new BMC (man_id: 0x0002a2, prod_id: 0x0100, dev_id: 0x20)
Apr 17 14:11:10 localhost kernel: ipmi_si IPI0001:00: IPMI kcs interface initialized
Apr 17 14:11:10 localhost kernel: intel_rapl_common: Found RAPL domain package
Apr 17 14:11:10 localhost kernel: intel_rapl_common: Found RAPL domain dram
Apr 17 14:11:10 localhost kernel: intel_rapl_common: DRAM domain energy unit 15300pj
Apr 17 14:11:10 localhost kernel: intel_rapl_common: Found RAPL domain package
Apr 17 14:11:10 localhost kernel: ipmi_ssif: IPMI SSIF Interface driver
Apr 17 14:11:10 localhost kernel: intel_rapl_common: Found RAPL domain dram
Apr 17 14:11:10 localhost kernel: intel_rapl_common: DRAM domain energy unit 15300pj
Apr 17 14:11:11 localhost kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ens3f0: link becomes ready
Apr 17 14:11:11 localhost kernel: loop1: detected capacity change from 0 to 139360
Apr 17 14:11:11 localhost kernel: loop2: detected capacity change from 0 to 80584
Apr 17 14:11:11 localhost kernel: loop3: detected capacity change from 0 to 139360
Apr 17 14:11:11 localhost kernel: loop4: detected capacity change from 0 to 80584
Apr 17 14:11:11 localhost systemd-gpt-auto-generator[2093]: EFI loader partition unknown, exiting.
Apr 17 14:11:11 localhost systemd-gpt-auto-generator[2093]: (The boot loader did not set EFI variable LoaderDevicePartUUID.)
Apr 17 14:11:30 k8s-worker-01 kernel: : Can't open blockdev
Apr 17 14:11:30 k8s-worker-01 kernel: fuseblk: Bad value for 'source'

The logs don't have a value for the drive. What is the value set for install_disk in Terraform?

Also, I hoped to see more in the kernel logs. Maybe it helps to run lsblk and journalctl --all -e when the failure appears.
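
A minimal sketch of gathering the output requested so far into one file, so it can be pasted in a single reply. The function name collect_diagnostics and the default report path are hypothetical; the commands themselves are the ones named in this thread. blkid and journalctl may need root to show everything.

```shell
#!/usr/bin/env bash
# Collect the diagnostics requested in this thread into a single report file.
# collect_diagnostics and the default path are hypothetical names.
collect_diagnostics() {
  local out="${1:-/tmp/flatcar-install-report.txt}"
  local cmd
  : > "$out"
  for cmd in \
    "cat /etc/os-release" \
    "lsblk" \
    "blkid" \
    "journalctl -k --no-pager" \
    "journalctl -u installer.service --no-pager" \
    "journalctl --all -e --no-pager"
  do
    printf '===== %s =====\n' "$cmd" >> "$out"
    # Skip tools that are not installed instead of aborting the whole report.
    if command -v "${cmd%% *}" >/dev/null 2>&1; then
      # Word splitting of $cmd is intentional: each entry is a plain command line.
      $cmd >> "$out" 2>&1 || printf '(command exited with status %s)\n' "$?" >> "$out"
    else
      printf '(command not available)\n' >> "$out"
    fi
  done
  # Print the report path so the caller can attach or cat it.
  printf '%s\n' "$out"
}
```

Running collect_diagnostics right after installer.service fails keeps the kernel log and the unit log from the same boot together.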

lsblk:

NAME  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0   7:0    0 354.1M  0 loop /usr
loop3   7:3    0    68M  1 loop 
loop4   7:4    0  39.3M  1 loop 
sda     8:0    0   1.7T  0 disk 
sdb     8:16   0   1.7T  0 disk 

blkid:

/dev/loop4: TYPE="squashfs"
/dev/loop0: TYPE="squashfs"
/dev/sda: PTTYPE="PMBR"
/dev/loop3: TYPE="squashfs"

I didn't set the install drive, so it should default to sda?

I wonder whether it wrote the disk image but couldn't list the partitions, either temporarily due to a race or permanently, and whether it wrote the right image. On error it ran wipefs, which might explain why we don't see the partitions afterwards.

In the installer OS, can you provide cat /etc/os-release and journalctl -u installer.service and also check what happens if you run sudo /opt/installer?

One more question: is oem_type in Typhoon also left at the default?
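
One way to tell a race from a permanent failure is to poll for the partitions for a while after the image is written. This is only a sketch: wait_until and has_partitions are hypothetical helper names, and the 30-second timeout in the example is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Retry a check once per second until it succeeds or the timeout (seconds) expires.
# wait_until and has_partitions are hypothetical helper names.
wait_until() {
  local timeout="$1"
  local waited=0
  shift
  until "$@"; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
}

# True when lsblk lists at least one child (partition) below the disk itself.
has_partitions() {
  [ "$(lsblk -n -o NAME "$1" 2>/dev/null | wc -l)" -gt 1 ]
}

# Example (not run here): wait_until 30 has_partitions /dev/sda
```

If the check still fails after the timeout, a race becomes unlikely and the partition table itself is the thing to inspect.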

image
I'll post the information you asked for in a moment, but right now I've found an interesting message in the IPMI console.

cat /etc/os-release

NAME="Flatcar Container Linux by Kinvolk"
ID=flatcar
ID_LIKE=coreos
VERSION=3815.2.1
VERSION_ID=3815.2.1
BUILD_ID=2024-03-17-2158
SYSEXT_LEVEL=1.0
PRETTY_NAME="Flatcar Container Linux by Kinvolk 3815.2.1 (Oklo)"
ANSI_COLOR="38;5;75"
HOME_URL="https://flatcar.org/"
BUG_REPORT_URL="https://issues.flatcar.org"
FLATCAR_BOARD="amd64-usr"
CPE_NAME="cpe:2.3:o:flatcar-linux:flatcar_linux:3815.2.1:*:*:*:*:*:*:*"

journalctl -u installer.service

Apr 18 11:50:24 k8s-worker-01 systemd[1]: Started installer.service.
Apr 18 11:50:24 k8s-worker-01 installer[2246]: + curl --retry 10 'http://[URL_REMOVED]:8080/ignition?mac=40:a6:b7:5c:50:08&os=installed' -o ignition.json
Apr 18 11:50:24 k8s-worker-01 installer[2249]:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Apr 18 11:50:24 k8s-worker-01 installer[2249]:                                  Dload  Upload   Total   Spent    Left  Speed
Apr 18 11:50:24 k8s-worker-01 installer[2249]: [158B blob data]
Apr 18 11:50:24 k8s-worker-01 installer[2246]: + flatcar-install -d /dev/sda -C stable -V 3815.2.1 -b [URL_REMOVED]/assets/flatcar -i ignition.json
Apr 18 11:50:24 k8s-worker-01 installer[2252]: Downloading the signature for [URL_REMOVED]flatcar_production_image.bin.bz2...
Apr 18 11:50:24 k8s-worker-01 installer[2269]: 2024-04-18 11:50:24 URL:http://[URL_REMOVED]:8080/assets/flatcar/3815.2.1/flatcar_production_image.bin.bz2.sig [594/594] -> "/tmp/flatcar-install.W0zmh4i8Al/flatcar_>
Apr 18 11:50:24 k8s-worker-01 installer[2252]: Downloading, writing and verifying flatcar_production_image.bin.bz2...
Apr 18 11:50:26 k8s-worker-01 installer[2271]: 2024-04-18 11:50:26 URL:http://[URL_REMOVED]:8080/assets/flatcar/3815.2.1/flatcar_production_image.bin.bz2 [467398635/467398635] -> "-" [1]
Apr 18 11:50:26 k8s-worker-01 installer[2274]: gpg: Signature made Sun Mar 17 22:15:01 2024 UTC
Apr 18 11:50:26 k8s-worker-01 installer[2274]: gpg:                using RSA key E9426D8B67E35DF476BD048185F7C8868837E271
Apr 18 11:50:26 k8s-worker-01 installer[2274]: gpg:                issuer "buildbot@flatcar-linux.org"
Apr 18 11:50:26 k8s-worker-01 installer[2274]: gpg: key E25D9AED0593B34A marked as ultimately trusted
Apr 18 11:50:26 k8s-worker-01 installer[2274]: gpg: checking the trustdb
Apr 18 11:50:26 k8s-worker-01 installer[2274]: gpg: marginals needed: 3  completes needed: 1  trust model: pgp
Apr 18 11:50:26 k8s-worker-01 installer[2274]: gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u
Apr 18 11:50:26 k8s-worker-01 installer[2274]: gpg: Good signature from "Flatcar Buildbot (Official Builds) <buildbot@flatcar-linux.org>" [ultimate]
Apr 18 11:52:56 k8s-worker-01 installer[2419]: mount: /tmp/flatcar-install.W0zmh4i8Al/oemfs: wrong fs type, bad option, bad superblock on , missing codepage or helper program, or other error.
Apr 18 11:52:56 k8s-worker-01 installer[2419]:        dmesg(1) may have more information after failed mount system call.
Apr 18 11:52:56 k8s-worker-01 installer[2420]: ERROR: mount check: cannot open : No such file or directory
Apr 18 11:52:56 k8s-worker-01 installer[2252]: Error: return code 1 from [[ -n "${IGNITION}" ]]
Apr 18 11:52:56 k8s-worker-01 installer[2421]: wipefs: failed to create a signature backup, $HOME undefined
Apr 18 11:52:57 k8s-worker-01 systemd[1]: installer.service: Main process exited, code=exited, status=1/FAILURE
Apr 18 11:52:57 k8s-worker-01 systemd[1]: installer.service: Failed with result 'exit-code'.
Apr 18 11:52:57 k8s-worker-01 systemd[1]: installer.service: Consumed 1min 13.463s CPU time.

sudo /opt/installer

+ curl --retry 10 'http://matchbox.[URL_REMOVED]:8080/ignition?mac=40:a6:b7:5c:50:08&os=installed' -o ignition.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4636    0  4636    0     0   897k      0 --:--:-- --:--:-- --:--:-- 1131k
+ flatcar-install -d /dev/sda -C stable -V 3815.2.1 -b http://matchbox.[URL_REMOVED]:8080/assets/flatcar -i ignition.json
Downloading the signature for http://matchbox.[URL_REMOVED]:8080/assets/flatcar/3815.2.1/flatcar_production_image.bin.bz2...
2024-04-18 12:44:18 URL:http://matchbox.[URL_REMOVED]:8080/assets/flatcar/3815.2.1/flatcar_production_image.bin.bz2.sig [594/594] -> "/tmp/flatcar-install.2GMlnnvs3y/flatcar_production_image.bin.bz2.sig" [1]
Downloading, writing and verifying flatcar_production_image.bin.bz2...
2024-04-18 12:44:20 URL:http://matchbox.[URL_REMOVED]:8080/assets/flatcar/3815.2.1/flatcar_production_image.bin.bz2 [467398635/467398635] -> "-" [1]
gpg: Signature made Sun Mar 17 22:15:01 2024 UTC
gpg:                using RSA key E9426D8B67E35DF476BD048185F7C8868837E271
gpg:                issuer "buildbot@flatcar-linux.org"
gpg: key E25D9AED0593B34A marked as ultimately trusted
gpg: checking the trustdb
gpg: marginals needed: 3  completes needed: 1  trust model: pgp
gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u
gpg: Good signature from "Flatcar Buildbot (Official Builds) <buildbot@flatcar-linux.org>" [ultimate]
mount: /tmp/flatcar-install.2GMlnnvs3y/oemfs: wrong fs type, bad option, bad superblock on , missing codepage or helper program, or other error.
       dmesg(1) may have more information after failed mount system call.
ERROR: mount check: cannot open : No such file or directory
Error: return code 1 from [[ -n "${IGNITION}" ]]
/dev/sda: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
/dev/sda: calling ioctl to re-read partition table: Success

oem_type = empty

Thanks for the logs. Since there were kernel errors for this drive, does installation work for /dev/sdb?

To understand the error on /dev/sda: can you run dd if=flatcar_production_image.bin of=/dev/sda and check whether the partitions eventually show up? Maybe run blockdev --rereadpt /dev/sda first, before lsblk /dev/sda and fdisk -l /dev/sda.
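
The manual test above can be sketched as a small script. It only prints the commands unless RUN=1 is set, because dd overwrites the target disk. write_and_inspect and _step are hypothetical names, and the image is assumed to be already decompressed (the installer log shows it being downloaded as a .bin.bz2 file).

```shell
#!/usr/bin/env bash
# Print one step, or execute it when RUN=1 is set in the environment.
# _step and write_and_inspect are hypothetical helper names.
_step() {
  if [ "${RUN:-0}" = "1" ]; then
    echo "+ $*"
    "$@"
  else
    echo "would run: $*"
  fi
}

# Write the raw image to the disk, ask the kernel to re-read the partition
# table, then show what lsblk and fdisk see.
# WARNING: with RUN=1 this destroys all data on the target disk.
write_and_inspect() {
  local image="$1" disk="$2"
  _step dd "if=$image" "of=$disk" bs=4M conv=fsync || return 1
  _step blockdev --rereadpt "$disk" || return 1
  _step lsblk "$disk" || return 1
  _step fdisk -l "$disk" || return 1
}

# Dry run:   write_and_inspect flatcar_production_image.bin /dev/sda
# Real run:  sudo RUN=1 bash -c '. ./this-script.sh; write_and_inspect flatcar_production_image.bin /dev/sda'
```

The conv=fsync option makes dd flush the data to the device before it exits, so the re-read of the partition table does not race with pending writes.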

OK, new update: I did what you said, but it didn't solve the issue. Still the same result when using just dd:

Disk /dev/sda: 1.75 TiB, 1920383410176 bytes, 468843606 sectors
Disk model: PX05SRQ192      
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device     Boot Start    End Sectors Size Id Type
/dev/sda1  *     4096 266239  262144   1G  c W95 FAT32 (LBA)
/dev/sda2           1   4095    4095  16M ee GPT

With /dev/sdb it's the same story: wipefs -a /dev/sdb followed by dd of the image produces the same result in fdisk.
Going further, I plugged a USB stick into that server and ran dd on it, and it produced a perfectly fine layout:

Disk /dev/sdc: 28.65 GiB, 30765219840 bytes, 60088320 sectors
Disk model:  SanDisk 3.2Gen1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 00000000-0000-0000-0000-000000000001

Device       Start     End Sectors  Size Type
/dev/sdc1     4096  266239  262144  128M EFI System
/dev/sdc2   266240  270335    4096    2M BIOS boot
/dev/sdc3   270336 2367487 2097152    1G unknown
/dev/sdc4  2367488 4464639 2097152    1G unknown
/dev/sdc6  4464640 4726783  262144  128M Linux filesystem
/dev/sdc7  4726784 4857855  131072   64M unknown
/dev/sdc9  4857856 9285631 4427776  2.1G unknown

So my next idea is to take these disks out of the machine and replace them with brand-new drives. Maybe, since they were previously RAID drives in this server before I installed the HBA card in place of the RAID controller, their firmware still holds some data?
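
One detail that stands out when comparing the two fdisk listings: /dev/sda reports 4096-byte logical sectors and the USB stick 512-byte ones, while the first partition has the same start (4096) and length (262144) in sectors on both. The arithmetic below only restates those numbers to show why fdisk prints 1G in one case and 128M in the other. The GPT header lives at LBA 1, so its byte offset also depends on the logical sector size. Whether that explains the failure here is an assumption the thread has not verified. `size_mib` is a helper name made up for this sketch.

```shell
# size_mib SECTORS SECTOR_BYTES: print the size in MiB of a run of sectors.
size_mib() {
  echo $(( $1 * $2 / 1048576 ))
}

# Length of the first partition (262144 sectors) under each sector size.
size_mib 262144 512     # USB stick, 512-byte sectors  -> 128
size_mib 262144 4096    # SSD, 4096-byte sectors       -> 1024

# Start of the first partition (sector 4096) under each sector size.
size_mib 4096 512       # -> 2
size_mib 4096 4096      # -> 16
```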

Strange…
One more idea in case it's the kernel driver/firmware: You can try PXE-booting from Flatcar LTS or Alpha and then run the installer. Typhoon has os_channel but since it only allows Alpha, Beta, Stable, you need to patch this a bit, or do a manual change in your PXE setup.
Edit: Or set the Typhoon os_version for Stable to 3510.2.8 which the LTS is based on.

Hello, this looks like a hardware issue: it seems the data is not actually synced even though the controller reports that it was (otherwise the dd would have failed). Maybe you can try to clean up the partitions on /dev/sda with fdisk, create a new partition, format it with mkfs.ext4, mount it, write something to it such as a file, and then reboot to verify that the data was actually written. If that works, you can exclude a hardware issue.

If the above works, I would also suggest a zero clean of the device: dd of=/dev/sda if=/dev/zero && sync, before trying again to install Flatcar. This way, you can be sure there is no hardware corruption at cell / sector level.
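
The write-then-reboot check described above could be made a bit more rigorous with a checksum. The sketch below assumes the new ext4 partition is already mounted somewhere. The function names, the marker file names and the 1 MiB size are made up for illustration.

```shell
# write_marker DIR: write 1 MiB of random data plus its checksum under DIR,
# then flush everything to disk.
write_marker() {
  head -c 1048576 /dev/urandom > "$1/marker.bin"
  (cd "$1" && sha256sum marker.bin > marker.sha256)
  sync
}

# verify_marker DIR: recompute the checksum and compare it with the stored one.
# Exits non-zero if the data read back differs from what was written.
verify_marker() {
  (cd "$1" && sha256sum --check --quiet marker.sha256)
}

# Intended use (not executed here; /mnt and /dev/sda1 are placeholders):
#   mount /dev/sda1 /mnt && write_marker /mnt
#   reboot
#   mount /dev/sda1 /mnt && verify_marker /mnt && echo "data survived reboot"
```

Verifying only after a reboot matters because a read immediately after the write may be served from the page cache rather than from the drive.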

Strange… One more idea in case it's the kernel driver/firmware: You can try PXE-booting from Flatcar LTS or Alpha and then run the installer. Typhoon has os_channel but since it only allows Alpha, Beta, Stable, you need to patch this a bit, or do a manual change in your PXE setup. Edit: Or set the Typhoon os_version for Stable to 3510.2.8 which the LTS is based on.

Tried just wget of Stable 3510.2.8 and dd'ing it to the drive: still the same issue, getting a PMBR and a dos partition table.

If the above works, I would also suggest a zero clean of the device: dd of=/dev/sda if=/dev/zero && sync, before trying again to install Flatcar. This way, you can be sure there is no hardware corruption at cell / sector level.

Tried dd from /dev/zero: same issue. I'm getting brand-new drives today to see if the issue persists.
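
A zero wipe like this can be spot-checked by reading the start of the device back and comparing it with /dev/zero. This is a minimal sketch: `is_zeroed` is a hypothetical helper, and it relies on the `-n` (byte limit) option of GNU cmp. A read right after the write may come from the page cache, so it is a weak check unless done after a reboot.

```shell
# is_zeroed PATH BYTES: succeed only if the first BYTES bytes of PATH are all zero.
is_zeroed() {
  cmp -s -n "$2" "$1" /dev/zero
}

# Intended use (placeholder device, not executed here):
#   is_zeroed /dev/sda 1048576 && echo "first MiB reads back as zeros"
```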

While waiting for the new drives I tried installing Ubuntu (went perfectly) and then tried hdparm:
root@kube-dell:~# hdparm -I /dev/sdb

/dev/sdb:
SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 28 00 00 00 00 20 00 00 00 00 00 00 00 62 01 01 00 00 00 00 00 00 00 7a 00
root@kube-dell:~# hdparm -I /dev/sdc

/dev/sdc:
SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 28 00 00 00 00 20 00 00 00 00 00 00 00 62 01 01 00 00 00 00 00 00 00 7a 00
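
The sense buffer hdparm prints can be decoded by hand. In fixed-format SCSI sense data (response code 0x70), the low nibble of byte 2 is the sense key and bytes 12 and 13 are the additional sense code and its qualifier. The sketch below pulls those fields out of the bytes shown above. `decode_sense` is a helper written for this illustration. Sense key 0x5 is ILLEGAL REQUEST and ASC/ASCQ 0x20/0x00 is INVALID COMMAND OPERATION CODE, which would be consistent with the drive rejecting the ATA command hdparm sends. That is typical for SAS drives, though it is only an assumption that these drives are SAS.

```shell
# decode_sense "HEX BYTES...": print response code, sense key, ASC and ASCQ
# from fixed-format SCSI sense data given as space-separated hex bytes.
decode_sense() {
  # Split the string into positional parameters: $1 is byte 0, $3 is byte 2, ...
  set -- $1
  key=$(( 0x$3 & 0x0f ))
  printf 'response=0x%s sense_key=0x%x asc=0x%s ascq=0x%s\n' "$1" "$key" "${13}" "${14}"
}

decode_sense "70 00 05 00 00 00 00 28 00 00 00 00 20 00 00 00"
# prints: response=0x70 sense_key=0x5 asc=0x20 ascq=0x00
```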