[BUG] Incomplete regex in get_mount_point()
rwagnergit opened this issue · comments
Describe the bug
On at least one of our Azure Linux VMs, waagent is incorrectly identifying the mount point of the ephemeral disk. In our particular case, the ephemeral disk is present at /dev/sda, the OS disk is present at /dev/sdac, and the boot partition /dev/sdac1 is mounted at /boot. When waagent starts, get_mount_point() in /usr/lib/python3.6/site-packages/azurelinuxagent/common/osutil/default.py returns /boot, which causes waagent to (among other things) try to create the swapfile under /boot, where it runs out of space (/boot is only 500MB in size and we are attempting to create a 16GB swapfile). I dug into the code and I believe the problem is that the regex in get_mount_point() is not sufficiently specific. Instead of:
def get_mount_point(self, mountlist, device):
    """
    Example of mountlist:
        /dev/sda1 on / type ext4 (rw)
        proc on /proc type proc (rw)
        sysfs on /sys type sysfs (rw)
        devpts on /dev/pts type devpts (rw,gid=5,mode=620)
        tmpfs on /dev/shm type tmpfs
        (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
        none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
        /dev/sdb1 on /mnt/resource type ext4 (rw)
    """
    if (mountlist and device):
        for entry in mountlist.split('\n'):
            if(re.search(device, entry)):
                tokens = entry.split()
                # Return the 3rd column of this line
                return tokens[2] if len(tokens) > 2 else None
    return None
we should have:
def get_mount_point(self, mountlist, device):
    """
    Example of mountlist:
        /dev/sda1 on / type ext4 (rw)
        proc on /proc type proc (rw)
        sysfs on /sys type sysfs (rw)
        devpts on /dev/pts type devpts (rw,gid=5,mode=620)
        tmpfs on /dev/shm type tmpfs
        (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
        none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
        /dev/sdb1 on /mnt/resource type ext4 (rw)
    """
    if (mountlist and device):
        for entry in mountlist.split('\n'):
            if(re.search(device + '[0-9 ]', entry)):  # Note change here
                tokens = entry.split()
                # Return the 3rd column of this line
                return tokens[2] if len(tokens) > 2 else None
    return None
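As an aside, a stricter variant would avoid regex entirely and compare the device field of each mount entry exactly. This is only a sketch of the idea, not the agent's actual code; the helper name and sample strings are illustrative:

```python
def get_mount_point(mountlist, device):
    # Regex-free sketch: compare the mount entry's device field exactly
    # against the whole disk or one of its numbered partitions,
    # instead of substring-matching with re.search().
    if mountlist and device:
        for entry in mountlist.split('\n'):
            tokens = entry.split()
            if len(tokens) > 2:
                dev_field = tokens[0]
                # Accept /dev/sda or /dev/sda1, /dev/sda2, ...
                # but reject /dev/sdac1, whose suffix "c1" is not numeric.
                if dev_field == device or (
                    dev_field.startswith(device)
                    and dev_field[len(device):].isdigit()
                ):
                    return tokens[2]
    return None

mount_list = (
    "/dev/sdac1 on /boot type xfs (rw)\n"
    "/dev/sda1 on /mnt/resource type ext4 (rw)\n"
)
print(get_mount_point(mount_list, "/dev/sda"))  # /mnt/resource
print(get_mount_point("/dev/sdac1 on /boot type xfs (rw)\n", "/dev/sda"))  # None
```

This sidesteps the separate (minor) issue that the device string is currently interpreted as a regex pattern rather than a literal.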
After making that change, waagent functions correctly; get_mount_point() returns None and the ephemeral disk is properly partitioned, mounted, and the swapfile created. I know there's a lot here, so I'll try to explain below, and note that this was discovered as part of Azure case #2206030040004173.
When the problem occurs, here's what we see in waagent.log:
2022-02-14T15:20:03.190695Z INFO Daemon Daemon Azure Linux Agent Version:2.2.49.2
2022-02-14T15:20:03.206265Z INFO Daemon Daemon OS: redhat 8.3
2022-02-14T15:20:03.211102Z INFO Daemon Daemon Python: 3.6.8
2022-02-14T15:20:03.217851Z INFO Daemon Daemon Run daemon
2022-02-14T15:20:03.223552Z INFO Daemon Daemon No RDMA handler exists for distro='Red Hat Enterprise Linux' version='8.3'
2022-02-14T15:20:03.244831Z INFO Daemon Daemon Error getting cloud-init enabled status from systemctl: Command '['systemctl', 'is-enabled', 'cloud-init-local.service']' returned non-zero exit status 1.
2022-02-14T15:20:06.560272Z INFO Daemon Daemon Error getting cloud-init enabled status from service: Command '['service', 'cloud-init', 'status']' returned non-zero exit status 3.
2022-02-14T15:20:06.570553Z INFO Daemon Daemon cloud-init is enabled: False
2022-02-14T15:20:06.574833Z INFO Daemon Daemon Using waagent for provisioning
2022-02-14T15:20:06.579962Z INFO Daemon Daemon Activate resource disk
2022-02-14T15:20:06.584232Z INFO Daemon Daemon Searching gen1 prefix 00000000-0001 or gen2 f8b3781a-1e82-4818-a1c3-63d806ec15bb
2022-02-14T15:20:06.595730Z INFO Daemon Daemon Found device: sda
2022-02-14T15:20:06.606048Z INFO Daemon Daemon Resource disk [/dev/sda1] is already mounted [/boot]
2022-02-14T15:20:06.612663Z INFO Daemon Daemon Enable swap
2022-02-14T15:20:06.625758Z INFO Daemon Daemon Create swap file
2022-02-14T15:20:06.633544Z ERROR Daemon Daemon Command: [umask 0077 && fallocate -l 17301504000 '/boot/swapfile'], return code: [1], result: [fallocate: fallocate failed: No space left on device
]
2022-02-14T15:20:06.645125Z INFO Daemon Daemon fallocate unsuccessful, falling back to dd
2022-02-14T15:20:09.242501Z ERROR Daemon Daemon Command: [umask 0077 && dd if=/dev/zero bs=67108864 count=257 conv=notrunc of='/boot/swapfile'], return code: [1], result: [dd: error writing '/boot/swapfile': No space left on device
6+0 records in
5+0 records out
351076352 bytes (351 MB, 335 MiB) copied, 2.58573 s, 136 MB/s
]
2022-02-14T15:20:09.326793Z ERROR Daemon Daemon dd unsuccessful
2022-02-14T15:20:10.687323Z INFO Daemon Daemon Enabled 16896000KB of swap at /boot/swapfile
And here is lsblk:
rowagn@kmb14au:~#> lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 112G 0 disk
+-sda1 8:1 0 112G 0 part
sdb 8:16 0 256G 0 disk
+-vg_sso_data_oracle_backup-ssodataoraclebackup 253:18 0 1.2T 0 lvm /sso/data/oracle/backup
sdc 8:32 0 256G 0 disk
+-vg_sso_data_oracle_backup-ssodataoraclebackup 253:18 0 1.2T 0 lvm /sso/data/oracle/backup
sdd 8:48 0 256G 0 disk
+-vg_sso_data_oracle_backup-ssodataoraclebackup 253:18 0 1.2T 0 lvm /sso/data/oracle/backup
sde 8:64 0 256G 0 disk
+-vg_sso_data_oracle_backup-ssodataoraclebackup 253:18 0 1.2T 0 lvm /sso/data/oracle/backup
sdf 8:80 0 256G 0 disk
+-vg_sso_data_oracle_backup-ssodataoraclebackup 253:18 0 1.2T 0 lvm /sso/data/oracle/backup
sdg 8:96 0 256G 0 disk
+-vg_sso_data_oracle_backup-ssodataoraclebackup 253:18 0 1.2T 0 lvm /sso/data/oracle/backup
sdh 8:112 0 256G 0 disk
+-vg_sso_data_oracle_backup-ssodataoraclebackup 253:18 0 1.2T 0 lvm /sso/data/oracle/backup
sdi 8:128 0 64G 0 disk
+-vg_sso_data_oracle_data01-ssodataoracledata01 253:19 0 623G 0 lvm /sso/data/oracle/data01
sdj 8:144 0 64G 0 disk
+-vg_sso_data_oracle_data01-ssodataoracledata01 253:19 0 623G 0 lvm /sso/data/oracle/data01
sdk 8:160 0 64G 0 disk
+-vg_sso_data_oracle_data01-ssodataoracledata01 253:19 0 623G 0 lvm /sso/data/oracle/data01
sdl 8:176 0 64G 0 disk
+-vg_sso_data_oracle_data01-ssodataoracledata01 253:19 0 623G 0 lvm /sso/data/oracle/data01
sdm 8:192 0 64G 0 disk
+-vg_sso_data_oracle_data01-ssodataoracledata01 253:19 0 623G 0 lvm /sso/data/oracle/data01
sdn 8:208 0 64G 0 disk
+-vg_sso_data_oracle_data01-ssodataoracledata01 253:19 0 623G 0 lvm /sso/data/oracle/data01
sdo 8:224 0 64G 0 disk
+-vg_sso_data_oracle_data01-ssodataoracledata01 253:19 0 623G 0 lvm /sso/data/oracle/data01
sdp 8:240 0 64G 0 disk
+-vg_sso_data_oracle_data01-ssodataoracledata01 253:19 0 623G 0 lvm /sso/data/oracle/data01
sdq 65:0 0 64G 0 disk
+-vg_sso_data_oracle_data01-ssodataoracledata01 253:19 0 623G 0 lvm /sso/data/oracle/data01
sdr 65:16 0 64G 0 disk
+-vg_sso_data_oracle_data01-ssodataoracledata01 253:19 0 623G 0 lvm /sso/data/oracle/data01
sds 65:32 0 64G 0 disk
+-vg_sso_data_oracle_data01-ssodataoracledata01 253:19 0 623G 0 lvm /sso/data/oracle/data01
sdt 65:48 0 128G 0 disk
+-vg_sso_data_oracle_flash01-ssodataoracleflash01 253:17 0 748G 0 lvm /sso/data/oracle/flash01
sdu 65:64 0 128G 0 disk
+-vg_sso_data_oracle_flash01-ssodataoracleflash01 253:17 0 748G 0 lvm /sso/data/oracle/flash01
sdv 65:80 0 128G 0 disk
+-vg_sso_data_oracle_flash01-ssodataoracleflash01 253:17 0 748G 0 lvm /sso/data/oracle/flash01
sdw 65:96 0 128G 0 disk
+-vg_sso_data_oracle_flash01-ssodataoracleflash01 253:17 0 748G 0 lvm /sso/data/oracle/flash01
sdx 65:112 0 128G 0 disk
+-vg_sso_data_oracle_flash01-ssodataoracleflash01 253:17 0 748G 0 lvm /sso/data/oracle/flash01
sdy 65:128 0 128G 0 disk
+-vg_sso_data_oracle_flash01-ssodataoracleflash01 253:17 0 748G 0 lvm /sso/data/oracle/flash01
sdz 65:144 0 128G 0 disk
+-vg_sso_data_oracle_flash01-ssodataoracleflash01 253:17 0 748G 0 lvm /sso/data/oracle/flash01
sdaa 65:160 0 64G 0 disk
+-vg_sso_sfw_oracle-ssosfworacle 253:16 0 50G 0 lvm /sso/sfw/oracle
sdab 65:176 0 256G 0 disk
+-vg_standard-opt 253:5 0 10G 0 lvm /opt
+-vg_standard-tmp 253:6 0 10G 0 lvm /tmp
+-vg_standard-var 253:7 0 20G 0 lvm /var
+-vg_standard-sso 253:8 0 50G 0 lvm /sso
+-vg_standard-home 253:9 0 10G 0 lvm /home
+-vg_standard-vartmp 253:10 0 2G 0 lvm /var/tmp
+-vg_standard-varlog 253:11 0 50G 0 lvm /var/log
+-vg_standard-varcache 253:12 0 10G 0 lvm /var/cache
+-vg_standard-varlogaudit 253:13 0 10G 0 lvm /var/log/audit
+-vg_standard-ssomonitoring 253:14 0 20G 0 lvm /sso/monitoring
+-vg_standard-varlogjournal 253:15 0 10G 0 lvm /var/log/journal
sdac 65:192 0 64G 0 disk
+-sdac1 65:193 0 500M 0 part /boot
+-sdac2 65:194 0 63G 0 part
¦ +-rootvg-tmplv 253:0 0 2G 0 lvm
¦ +-rootvg-usrlv 253:1 0 10G 0 lvm /usr
¦ +-rootvg-homelv 253:2 0 1G 0 lvm
¦ +-rootvg-varlv 253:3 0 8G 0 lvm
¦ +-rootvg-rootlv 253:4 0 42G 0 lvm /
+-sdac14 65:206 0 4M 0 part
+-sdac15 65:207 0 495M 0 part /boot/efi
If we look at the relevant portion of azurelinuxagent/daemon/resourcedisk/default.py:
 95     def mount_resource_disk(self, mount_point):
 96         device = self.osutil.device_for_ide_port(1)
 97         if device is None:
 98             raise ResourceDiskError("unable to detect disk topology")
 99
100         device = "/dev/{0}".format(device)
101         partition = device + "1"
102         mount_list = shellutil.run_get_output("mount")[1]
103         existing = self.osutil.get_mount_point(mount_list, device)
104
105         if existing:
106             logger.info("Resource disk [{0}] is already mounted [{1}]",
107                         partition,
108                         existing)
109             return existing
110
And the relevant portion of azurelinuxagent/common/osutil/default.py:
1111     def get_mount_point(self, mountlist, device):
1112         """
1113         Example of mountlist:
1114             /dev/sda1 on / type ext4 (rw)
1115             proc on /proc type proc (rw)
1116             sysfs on /sys type sysfs (rw)
1117             devpts on /dev/pts type devpts (rw,gid=5,mode=620)
1118             tmpfs on /dev/shm type tmpfs
1119             (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
1120             none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
1121             /dev/sdb1 on /mnt/resource type ext4 (rw)
1122         """
1123         if (mountlist and device):
1124             for entry in mountlist.split('\n'):
1125                 if(re.search(device, entry)):
1126                     tokens = entry.split()
1127                     # Return the 3rd column of this line
1128                     return tokens[2] if len(tokens) > 2 else None
1129         return None
We can trace the problem:
rowagn@kmb14au:/usr/lib/python3.6/site-packages#> python
Python 3.6.8 (default, Mar 18 2021, 08:58:41)
[GCC 8.4.1 20200928 (Red Hat 8.4.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> import re
>>> import stat
>>> import sys
>>> import threading
>>> from time import sleep
>>> import azurelinuxagent.common.logger as logger
>>> from azurelinuxagent.common.future import ustr
>>> import azurelinuxagent.common.conf as conf
>>> from azurelinuxagent.common.event import add_event, WALAEventOperation
>>> import azurelinuxagent.common.utils.fileutil as fileutil
>>> import azurelinuxagent.common.utils.shellutil as shellutil
>>> from azurelinuxagent.common.exception import ResourceDiskError
>>> from azurelinuxagent.common.osutil import get_osutil
>>> from azurelinuxagent.common.version import AGENT_NAME
>>>
# step through lines 96-102 of mount_resource_disk() just to see the results of the calls
>>> osutil = get_osutil()
>>> device = osutil.device_for_ide_port(1)
>>> device
'sda'
>>> device = "/dev/{0}".format(device)
>>> device
'/dev/sda'
>>> partition = device + "1"
>>> partition
'/dev/sda1'
>>> mount_list = shellutil.run_get_output("mount")[1]
>>> mount_list
'sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)\nproc on /proc type proc (rw,nosuid,nodev,noexec,relatime)\ndevtmpfs on /dev type devtmpfs (rw,nosuid,size=28731092k,nr_inodes=7182773,mode=755)\nsecurityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)\ntmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,noexec)\ndevpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)\ntmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)\ntmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)\ncgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)\npstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)\nbpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)\ncgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)\ncgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)\ncgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)\ncgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)\ncgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)\ncgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)\ncgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)\ncgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)\ncgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)\ncgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)\ncgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)\nnone on /sys/kernel/tracing type tracefs (rw,relatime)\nconfigfs on /sys/kernel/config type configfs (rw,relatime)\n/dev/mapper/rootvg-rootlv on / type xfs 
(rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\n/dev/mapper/rootvg-usrlv on /usr type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\nmqueue on /dev/mqueue type mqueue (rw,relatime)\nsystemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=38,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=26069)\nhugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)\ndebugfs on /sys/kernel/debug type debugfs (rw,relatime)\nbinfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)\n/dev/sdac1 on /boot type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\n/dev/sdac15 on /boot/efi type vfat (rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro)\n/dev/mapper/vg_standard-home on /home type xfs (rw,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\n/dev/mapper/vg_standard-opt on /opt type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\n/dev/mapper/vg_standard-sso on /sso type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\n/dev/mapper/vg_standard-tmp on /tmp type xfs (rw,nosuid,nodev,noexec,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\n/dev/mapper/vg_sso_sfw_oracle-ssosfworacle on /sso/sfw/oracle type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\n/dev/mapper/vg_sso_data_oracle_backup-ssodataoraclebackup on /sso/data/oracle/backup type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=256,swidth=1792,noquota)\n/dev/mapper/vg_sso_data_oracle_flash01-ssodataoracleflash01 on /sso/data/oracle/flash01 type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=256,swidth=1792,noquota)\n/dev/mapper/vg_standard-var on /var type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\n/dev/mapper/vg_sso_data_oracle_data01-ssodataoracledata01 on /sso/data/oracle/data01 type xfs 
(rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=256,swidth=2816,noquota)\n/dev/mapper/vg_standard-varcache on /var/cache type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\n/dev/mapper/vg_standard-vartmp on /var/tmp type xfs (rw,nosuid,nodev,noexec,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\n/dev/mapper/vg_standard-varlog on /var/log type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\n/dev/mapper/vg_standard-ssomonitoring on /sso/monitoring type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\n/dev/mapper/vg_standard-varlogaudit on /var/log/audit type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\n/dev/mapper/vg_standard-varlogjournal on /var/log/journal type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\nsunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)\ntmpfs on /run/user/18448 type tmpfs (rw,nosuid,nodev,relatime,size=5750048k,mode=700,uid=18448,gid=10029)\ntracefs on /sys/kernel/debug/tracing type tracefs (rw,relatime)\ntmpfs on /run/user/83249 type tmpfs (rw,nosuid,nodev,relatime,size=5750048k,mode=700,uid=83249,gid=18251)\ntmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=5750048k,mode=700)\ntmpfs on /run/user/47006 type tmpfs (rw,nosuid,nodev,relatime,size=5750048k,mode=700,uid=47006,gid=100)\n'
# now define get_mount_point() so we can trace line 103:
>>> def get_mount_point(mountlist, device):
...     if (mountlist and device):
...         for entry in mountlist.split('\n'):
...             if(re.search(device, entry)):
...                 tokens = entry.split()
...                 return tokens[2] if len(tokens) > 2 else None
...     return None
...
# Now, call line 103 and note that we get back /boot
>>> existing = get_mount_point(mount_list, device)
>>> existing
'/boot'
# Now, redefine get_mount_point() with a more specific regex:
>>> def get_mount_point(mountlist, device):
...     if (mountlist and device):
...         for entry in mountlist.split('\n'):
...             if(re.search(device + '[0-9 ]', entry)):
...                 tokens = entry.split()
...                 return tokens[2] if len(tokens) > 2 else None
...     return None
...
# And note that None was returned with this new regex:
>>> existing = get_mount_point(mount_list, device)
>>> existing
>>> type(existing)
<class 'NoneType'>
>>>
After making the above change, restarting waagent yields a better waagent.log:
2022-06-28T13:15:06.767313Z INFO Daemon Daemon Azure Linux Agent Version:2.2.49.2
2022-06-28T13:15:06.767847Z INFO Daemon Daemon OS: redhat 8.3
2022-06-28T13:15:06.770140Z INFO Daemon Daemon Python: 3.6.8
2022-06-28T13:15:06.770433Z INFO Daemon Daemon Run daemon
2022-06-28T13:15:06.771324Z INFO Daemon Daemon No RDMA handler exists for distro='Red Hat Enterprise Linux' version='8.3'
2022-06-28T13:15:06.795482Z INFO Daemon Daemon Error getting cloud-init enabled status from systemctl: Command '['systemctl', 'is-enabled', 'cloud-init-local.service']' returned non-zero exit status 1.
2022-06-28T13:15:06.847602Z INFO Daemon Daemon Error getting cloud-init enabled status from service: Command '['service', 'cloud-init', 'status']' returned non-zero exit status 3.
2022-06-28T13:15:06.848149Z INFO Daemon Daemon cloud-init is enabled: False
2022-06-28T13:15:06.850281Z INFO Daemon Daemon Using waagent for provisioning
2022-06-28T13:15:06.851542Z INFO Daemon Daemon Activate resource disk
2022-06-28T13:15:06.852096Z INFO Daemon Daemon Searching gen1 prefix 00000000-0001 or gen2 f8b3781a-1e82-4818-a1c3-63d806ec15bb
2022-06-28T13:15:06.854842Z INFO Daemon Daemon Found device: sda
2022-06-28T13:15:07.218629Z INFO Daemon Daemon Examining partition table
2022-06-28T13:15:07.234279Z INFO Daemon Daemon GPT not detected, determining filesystem
2022-06-28T13:15:07.241598Z INFO Daemon Daemon sfdisk --part-type -f /dev/sda 1 -n succeeded
2022-06-28T13:15:07.243245Z INFO Daemon Daemon The partition type is 83
2022-06-28T13:15:07.244988Z INFO Daemon Daemon Mount resource disk [mount -t ext4 /dev/sda1 /mnt/resource]
2022-06-28T13:15:07.357079Z INFO Daemon Daemon Resource disk /dev/sda is mounted at /mnt/resource with ext4
2022-06-28T13:15:07.358579Z INFO Daemon Daemon Enable swap
2022-06-28T13:15:07.925162Z INFO Daemon Daemon Enabled 16896000KB of swap at /mnt/resource/swapfile
Distro and WALinuxAgent details (please complete the following information):
rowagn@kmb14au:~#> uname -a
Linux kmb14au.vsp.sas.com 4.18.0-240.22.1.el8_3.x86_64 #1 SMP Thu Mar 25 14:36:04 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
rowagn@kmb14au:~#> cat /etc/redhat-release
Red Hat Enterprise Linux release 8.3 (Ootpa)
rowagn@kmb14au:~#> waagent --version
WALinuxAgent-2.2.49.2 running on redhat 8.3
Python: 3.6.8
Goal state agent: 2.7.3.0
rowagn@kmb14au:~#>
Additional context
I believe a necessary prerequisite for this problem is that the Azure VM has more than 26 disks attached, so that the /dev/sd* device names roll over to /dev/sdaX (where X is a letter). Until that rollover occurs, a regex looking for /dev/sda cannot match anything inappropriately. In sum, the existing regex matches any device name that has /dev/sda as a PREFIX, which can only go wrong once more than 26 drives are attached.
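The prefix-match behavior is easy to demonstrate in isolation (the mount entry below is illustrative, modeled on the output above):

```python
import re

entry = "/dev/sdac1 on /boot type xfs (rw)"

# The current code treats the device name as an unanchored regex, so
# /dev/sda matches the prefix of /dev/sdac1.
print(bool(re.search("/dev/sda", entry)))             # True

# Requiring a digit or space immediately after the device name
# rejects /dev/sdac1, since "c" follows /dev/sda there.
print(bool(re.search("/dev/sda" + "[0-9 ]", entry)))  # False
```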
Log file attached
I provided the relevant portions of the log, above. That said, if having the entire log is helpful, I'm happy to provide it.
@anhvoms could you take a look?
@rwagnergit
RHEL 8.1+ and RHEL 7.7+ on Azure should be using cloud-init for provisioning and the resource disk formatting/partitioning should be handled by cloud-init. Cloud-init does a better job at discovering the resource disk by looking at the alias /dev/disk/cloud/resource instead.
@rwagnergit walinuxagent provisioning is not deprecated. It is, however, considered to be in maintenance mode (we will only release patches for security related bugs)