Azure / WALinuxAgent

Microsoft Azure Linux Guest Agent

Home Page:http://azure.microsoft.com/

[BUG] Cgroup monitoring doesn't support RHEL-9

yuxisun1217 opened this issue

Describe the bug:
On RHEL-9, the log collector cannot be enabled because cgroup monitoring cannot be enabled. There appear to be two reasons:

  1. Cgroup monitoring is not supported on ['rhel', '9.1', 'Plow', 'Red Hat Enterprise Linux']. The distro_name is 'rhel', not 'redhat', so it does not match the condition in the following function:
class CGroupsApi(object):
...
        return ((distro_name.lower() == 'ubuntu' and distro_version.major >= 16) or
                (distro_name.lower() in ("centos", "redhat") and
                 ((distro_version.major == 7 and distro_version.minor >= 8) or distro_version.major >= 8)))
  2. RHEL-9 uses cgroup v2, and WALA fails to find the cpu and memory paths:
2022-06-20T10:02:13.301737Z INFO ExtHandler ExtHandler [CGW] The CPU cgroup controller is not mounted
2022-06-20T10:02:13.304658Z INFO ExtHandler ExtHandler [CGW] The memory cgroup controller is not mounted
2022-06-20T10:02:13.309621Z INFO ExtHandler ExtHandler [CGI] cgroups v2 mounted at /sys/fs/cgroup.  Controllers: [cpuset cpu io memory hugetlb pids rdma misc]
2022-06-20T10:02:13.310793Z INFO ExtHandler ExtHandler [CGW] The agent's process is not within a CPU cgroup
2022-06-20T10:02:13.311341Z INFO ExtHandler ExtHandler [CGW] The agent's process is not within a memory cgroup
2022-06-20T10:02:13.311828Z INFO ExtHandler ExtHandler [CGI] Agent cgroups enabled: False
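
The first reason above can be reproduced in isolation. The following is a minimal sketch, not the agent's actual code: it mirrors the quoted condition using a plain (major, minor) parse instead of the agent's own version class, and the function names `parse_major_minor`, `cgroups_supported_original`, and `cgroups_supported_with_rhel` are hypothetical. The second function shows one possible way to also accept the 'rhel' ID; it would not address the cgroup v2 problem described in the second reason.

```python
def parse_major_minor(version):
    """Return (major, minor) from a version string such as '9.1' or '8'."""
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def cgroups_supported_original(distro_name, distro_version):
    """Mirror of the condition quoted in the issue; 'rhel' is not matched."""
    major, minor = parse_major_minor(distro_version)
    name = distro_name.lower()
    return ((name == "ubuntu" and major >= 16) or
            (name in ("centos", "redhat") and
             ((major == 7 and minor >= 8) or major >= 8)))


def cgroups_supported_with_rhel(distro_name, distro_version):
    """Hypothetical variant that also accepts the 'rhel' distro ID."""
    major, minor = parse_major_minor(distro_version)
    name = distro_name.lower()
    return ((name == "ubuntu" and major >= 16) or
            (name in ("centos", "redhat", "rhel") and
             ((major == 7 and minor >= 8) or major >= 8)))


if __name__ == "__main__":
    # Values taken from the distro tuple reported in this issue.
    print(cgroups_supported_original("rhel", "9.1"))
    print(cgroups_supported_with_rhel("rhel", "9.1"))
```

With the distro name and version reported here ('rhel', '9.1'), the original condition evaluates to False, while the variant that includes 'rhel' evaluates to True.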

Distro and WALinuxAgent details (please complete the following information):

  • Distro and Version: RHEL-9.1
  • WALinuxAgent version:
    WALinuxAgent-2.7.0.6 running on rhel 9.1
    Python: 3.9.10
    Goal state agent: 2.7.0.6

@yuxisun1217 Do you happen to know why cgroup v1 support was removed from these distros? Is this specific to RHEL images, or do all images after a particular kernel version lack cgroup v1?

@yuxisun1217 Since you pointed out the distro name change, we wonder how the agent is packaged into the RHEL image, given that agent setup would need a change to copy the agent unit, config, and other files. Don't you use the agent setup when packaging? Or do you modify/customize the agent to include the necessary files? If so, we would like to have those changes upstream, so that it won't break for customers who build the agent from source here.

Hi @nagworld9 ,
Sorry, I don't know about other distros. Starting with RHEL-9.0, the system mounts and uses cgroups-v2 by default. I don't think that means cgroup-v1 is unsupported, only that cgroups-v2 is the default (see https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/managing_monitoring_and_updating_the_kernel/setting-limits-for-applications_managing-monitoring-and-updating-the-kernel).
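
To check which hierarchy a given VM is using, a small probe like the following may help. It is a rough sketch under stated assumptions, not part of WALinuxAgent: it relies on the fact that the root of a cgroup v2 mount contains a `cgroup.controllers` file, and it treats per-controller `cpu` or `memory` directories under the root as a heuristic sign of v1. The function names `detect_cgroup_version` and `read_v2_controllers` are hypothetical, and hybrid setups (v1 plus a separate v2 mount) are not handled.

```python
import os


def detect_cgroup_version(root="/sys/fs/cgroup"):
    """Return 2 for a cgroup v2 unified hierarchy, 1 for a v1-style layout, else None."""
    # A cgroup v2 mount exposes 'cgroup.controllers' at its root.
    if os.path.isfile(os.path.join(root, "cgroup.controllers")):
        return 2
    # Heuristic: v1 mounts each controller in its own subdirectory.
    if any(os.path.isdir(os.path.join(root, c)) for c in ("cpu", "memory")):
        return 1
    return None


def read_v2_controllers(root="/sys/fs/cgroup"):
    """Return the list of controllers listed in the v2 root's 'cgroup.controllers'."""
    with open(os.path.join(root, "cgroup.controllers")) as f:
        return f.read().split()


if __name__ == "__main__":
    version = detect_cgroup_version()
    print("cgroup version:", version)
    if version == 2:
        print("controllers:", read_v2_controllers())
```

On the RHEL-9 system from the log above, the controllers file would be expected to list entries such as cpu and memory, matching the "[CGI] cgroups v2 mounted at /sys/fs/cgroup" line.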

@nagworld9 About the distro name change: perhaps it's because RHEL-9.0 ships Python 3.9, where the "linux_distribution" function has been removed from the platform module, so it calls the distro.linux_distribution function to get the distro name.
Here is the distro ID list:

def id():
    """
    Return the distro ID of the current distribution, as a
    machine-readable string.

    For a number of OS distributions, the returned distro ID value is
    *reliable*, in the sense that it is documented and that it does not change
    across releases of the distribution.

    This package maintains the following reliable distro ID values:

    ==============  =========================================
    Distro ID       Distribution
    ==============  =========================================
    "ubuntu"        Ubuntu
    "debian"        Debian
    "rhel"          RedHat Enterprise Linux
    "centos"        CentOS
    "fedora"        Fedora
    "sles"          SUSE Linux Enterprise Server
    "opensuse"      openSUSE
    "amazon"        Amazon Linux
    "arch"          Arch Linux
    "cloudlinux"    CloudLinux OS
    "exherbo"       Exherbo Linux
    "gentoo"        GenToo Linux
    "ibm_powerkvm"  IBM PowerKVM
    "kvmibm"        KVM for IBM z Systems
    "linuxmint"     Linux Mint
    "mageia"        Mageia
    "mandriva"      Mandriva Linux
    "parallels"     Parallels
    "pidora"        Pidora
    "raspbian"      Raspbian
    "oracle"        Oracle Linux (and Oracle Enterprise Linux)
    "scientific"    Scientific Linux
    "slackware"     Slackware
    "xenserver"     XenServer
    "openbsd"       OpenBSD
    "netbsd"        NetBSD
    "freebsd"       FreeBSD
    "midnightbsd"   MidnightBSD
    ==============  =========================================