CVE-2024-43908: Resolving Linux Kernel Vulnerability - drm/amdgpu: Fix the Null Pointer Dereference to ras_manager

In this post, we look at a recently disclosed vulnerability in the Linux kernel's Direct Rendering Manager (DRM) subsystem, specifically the amdgpu driver for AMD GPUs. The issue is a potential null pointer dereference, which can crash the kernel on affected setups. We walk through the relevant driver code, show a code snippet with the fix, and discuss exploit details. As always, keep your Linux kernel up to date to limit exposure to this and other security risks.

To start, let's go over the function used to illustrate this issue: amdgpu_ras_sysfs_create(struct amdgpu_device *adev). It belongs to the RAS (Reliability, Availability, Serviceability) support code in the amdgpu DRM driver, which tracks and reports GPU hardware errors. As its name suggests, it sets up the sysfs entries through which RAS information is exposed. The snippets in this post are simplified for illustration and should not be read as verbatim upstream source.

The Vulnerability

The underlying issue is a possible null pointer dereference involving a ras_manager pointer. When the driver handles a hardware error, the ras_manager object might not have been instantiated, which leaves the pointer NULL. Code that dereferences it in that state makes the kernel oops (crash) instead of failing gracefully.

Here's the relevant code snippet before the fix:

static int amdgpu_ras_sysfs_create(struct amdgpu_device *adev)
{
    struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
    struct ras_manager *ras_manager;
    int r;

    ras_manager = kmalloc(sizeof(struct ras_manager), GFP_KERNEL);
    if (!ras_manager)
        return -ENOMEM;

    r = amdgpu_ras_sysfs_init(adev, &ras->sysfs);
    if (r)
        kfree(ras_manager);

    return r;
}

Notice how ras_manager is allocated with kmalloc(). If the allocation fails (if (!ras_manager)), the function returns -ENOMEM before ras_manager is touched, so this particular path is guarded. The danger lies in any code path that uses a ras_manager pointer without an equivalent check.
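The allocate, check, and clean-up-on-error pattern discussed here can be sketched in user space. This is a minimal sketch, not driver code: every name ending in _sketch is hypothetical, calloc() and free() stand in for the kernel's kzalloc() and kfree(), and the force_failure parameter exists only so the error path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver bookkeeping object. */
struct ras_manager_sketch {
    int initialized;
};

/* Hypothetical init step; fails on demand so the error path can be tested. */
int sketch_sysfs_init(struct ras_manager_sketch *mgr, int force_failure)
{
    if (force_failure)
        return -EIO;
    mgr->initialized = 1;
    return 0;
}

/*
 * Allocate and initialize an object, then hand it to the caller.
 * On any failure *out is set to NULL and nothing is leaked.
 */
int sketch_create(struct ras_manager_sketch **out, int force_failure)
{
    struct ras_manager_sketch *mgr;
    int r;

    *out = NULL;

    mgr = calloc(1, sizeof(*mgr));  /* user-space analogue of kzalloc() */
    if (!mgr)
        return -ENOMEM;             /* bail out before mgr is ever used */

    r = sketch_sysfs_init(mgr, force_failure);
    if (r) {
        free(mgr);                  /* undo the allocation on the error path */
        return r;
    }

    *out = mgr;                     /* success: ownership moves to the caller */
    return 0;
}
```

A caller only needs to test the return value: on a non-zero result the output pointer is NULL, and on success it owns the object and is responsible for freeing it.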

The Fix

The upstream patch description is terse: "Check ras_manager before using it." In other words, the pointer must be verified before anything accesses it. In our simplified example, that means guarding the call to amdgpu_ras_sysfs_init(adev, &ras->sysfs) with an explicit null check, like so:

static int amdgpu_ras_sysfs_create(struct amdgpu_device *adev)
{
    struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
    struct ras_manager *ras_manager;
    int r = 0;  /* defined return value even if the guarded block is skipped */

    ras_manager = kmalloc(sizeof(struct ras_manager), GFP_KERNEL);
    if (!ras_manager)
        return -ENOMEM;

    /* Explicit null check before ras_manager is used (defensive: the early return above already covers a failed allocation) */
    if (ras_manager) {
        r = amdgpu_ras_sysfs_init(adev, &ras->sysfs);
        if (r)
            kfree(ras_manager);
    }

    return r;
}

With this change, every use of ras_manager in the function sits behind an explicit null check, so a missing object can never reach code that dereferences the pointer.
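The same "check before use" rule applies to any pointer returned by a lookup that can fail: test the pointer first, and only then derive anything from it. The following is a minimal user-space sketch of that ordering. All identifiers are hypothetical and chosen for illustration; none of them are taken from the amdgpu source.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_BLOCK_COUNT 4

/* Hypothetical per-block interrupt state. */
struct ih_data_sketch {
    int inuse;
    int pending;
};

/* Hypothetical per-block manager object. */
struct ras_obj_sketch {
    int registered;
    struct ih_data_sketch ih_data;
};

struct ras_obj_sketch sketch_objs[SKETCH_BLOCK_COUNT];

/* Lookup that can fail: returns NULL for unknown or unregistered blocks. */
struct ras_obj_sketch *sketch_find_obj(int block)
{
    if (block < 0 || block >= SKETCH_BLOCK_COUNT)
        return NULL;
    if (!sketch_objs[block].registered)
        return NULL;
    return &sketch_objs[block];
}

/*
 * Returns -EINVAL if no object exists for the block, 0 if the block's
 * interrupt handling is not in use, and 1 if an event was queued.
 */
int sketch_dispatch(int block)
{
    struct ras_obj_sketch *obj;
    struct ih_data_sketch *data;

    obj = sketch_find_obj(block);
    if (!obj)
        return -EINVAL;     /* check the pointer first ... */

    data = &obj->ih_data;   /* ... and only then derive pointers from it */
    if (!data->inuse)
        return 0;

    data->pending++;
    return 1;
}
```

The point of the ordering is that no expression involving obj is evaluated until the NULL case has been ruled out, which keeps the function safe even if the lookup later gains new failure modes.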

Exploit Details

To exploit this vulnerability, an attacker would have to steer the driver into the affected code path, for example by provoking a GPU hardware error on a system with an AMD GPU and a vulnerable Linux kernel. A null pointer dereference in kernel code typically ends in a kernel oops or panic, so the realistic impact is denial of service. Data corruption or disclosure of sensitive information is not a typical outcome of this class of bug.

As long as you keep your Linux kernel up to date and apply the latest security patches, you should not be at risk. The fix landed in mainline Linux 6.11 and has been backported to the stable branch kernels. It is available in the official Linux kernel repository:

- Linux mainline git commit
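To get a rough idea of whether a machine runs a kernel at or above a given fixed version, the release string can be parsed and compared. The sketch below is one way to do that in C. The struct and function names are hypothetical, the fixed version is passed in by the caller, and suffixes such as "-rc1" or "-generic" are ignored. Treat the result as a hint only: distribution kernels often carry backported fixes under an older version number.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper type: a major.minor.patch kernel version. */
struct kver {
    int major, minor, patch;
};

/* Parse the leading "major.minor[.patch]" of a release string.
 * Returns 0 on success, -1 if fewer than two numbers were found. */
int kver_parse(const char *release, struct kver *out)
{
    int n;

    out->major = out->minor = out->patch = 0;
    n = sscanf(release, "%d.%d.%d", &out->major, &out->minor, &out->patch);
    return (n >= 2) ? 0 : -1;
}

/* Three-way comparison: negative, zero, or positive like strcmp(). */
int kver_cmp(const struct kver *a, const struct kver *b)
{
    if (a->major != b->major)
        return (a->major < b->major) ? -1 : 1;
    if (a->minor != b->minor)
        return (a->minor < b->minor) ? -1 : 1;
    if (a->patch != b->patch)
        return (a->patch < b->patch) ? -1 : 1;
    return 0;
}

/* Returns 1 if release >= fixed, 0 if it is older, -1 if unparsable. */
int kver_at_least(const char *release, struct kver fixed)
{
    struct kver cur;

    if (kver_parse(release, &cur) != 0)
        return -1;
    return kver_cmp(&cur, &fixed) >= 0;
}

/* Same check against the running kernel, using uname(2). */
int running_kernel_at_least(struct kver fixed)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    return kver_at_least(u.release, fixed);
}
```

Calling running_kernel_at_least() with the fixed version for your branch gives a quick first answer. For a definitive one, check your distribution's security tracker or the changelog of the installed kernel package.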

In conclusion, be sure to update your Linux kernel regularly, apply security patches, and stay informed about potential vulnerabilities affecting your system. Knowing the details of a vulnerability like CVE-2024-43908 will not only help you protect your servers and systems, but it will also deepen your understanding of Linux and cybersecurity in general. Stay safe, stay informed, and happy patching!

Timeline

Published on: 08/26/2024 11:15:05 UTC
Last modified on: 08/27/2024 13:41:55 UTC