A race condition has been identified and fixed in Nouveau, the Linux kernel's open-source driver for NVIDIA GPUs. When several threads use the driver at the same time, the bug could cause a NULL pointer dereference and a kernel crash. This post covers the details of the vulnerability, its impact, and the code changes made to address it.

Vulnerability Details

The vulnerability, tracked as CVE-2024-26984, was found while running the Vulkan CTS (Conformance Test Suite) in parallel against the Nouveau driver. Occasionally the kernel hit a NULL pointer dereference, producing an Oops and a crash.

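The root cause is a race around the 'ptrs' value in the 'nv50_instobj_acquire()' function. The function has a lockless fast path: if the reference count can be incremented from a non-zero value, the object is treated as already set up and 'ptrs' is returned directly. If Thread A is still setting up the object while Thread B enters the function, Thread B can observe the non-zero reference count before Thread A's store to 'ptrs' is visible to it. Thread B then reads a NULL pointer, and the kernel crashes when that pointer is dereferenced.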
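The interleaving described above can be replayed by hand in a small single-threaded model. This is only an illustrative sketch, not driver code: 'struct model_obj', 'model_inc_not_zero()', 'model_fast_path()' and 'replay_bad_interleaving()' are hypothetical names, the refcount is a plain int rather than the kernel's refcount_t, and the "reordering" is simulated by performing Thread A's two stores in the wrong order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver object; names follow this post. */
struct model_obj {
    int refcount;  /* 0 means "not mapped yet" */
    void *ptrs;    /* NULL until the mapping pointer is stored */
};

/* Simplified, non-atomic model of refcount_inc_not_zero(). */
bool model_inc_not_zero(int *refcount)
{
    if (*refcount == 0)
        return false;
    (*refcount)++;
    return true;
}

/*
 * Fast path as Thread B runs it, without any barrier.
 * Sets *hit when the fast path was taken and returns the ptrs value it read.
 */
void *model_fast_path(struct model_obj *o, bool *hit)
{
    assert(o != NULL && hit != NULL);
    *hit = model_inc_not_zero(&o->refcount);
    return *hit ? o->ptrs : NULL;
}

/*
 * Replays the bad ordering by hand: Thread A's refcount store becomes
 * visible before its ptrs store, and Thread B runs in between.
 * Returns true when Thread B took the fast path and still read NULL.
 */
bool replay_bad_interleaving(void)
{
    static int mapping;  /* dummy target standing in for the real mapping */
    struct model_obj o = { 0, NULL };
    bool hit = false;

    o.refcount = 1;                          /* A: refcount visible first */
    void *seen = model_fast_path(&o, &hit);  /* B: runs in the window */
    o.ptrs = &mapping;                       /* A: ptrs visible too late */

    return hit && seen == NULL;
}
```

Calling 'replay_bad_interleaving()' returns true in this model: Thread B passes the refcount check yet walks away with a NULL pointer. On real hardware the window is far narrower and depends on the CPU's memory ordering, which fits the crash showing up only occasionally.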

The fix adds memory barriers that enforce the ordering of these memory operations. An 'smp_mb()' full memory barrier was introduced at first; the updated code uses a paired 'smp_rmb()' (read memory barrier) and 'smp_wmb()' (write memory barrier) instead. The write barrier sits between the store to 'ptrs' and the update of the reference count, and the read barrier sits between the successful reference count increment and the load of 'ptrs'.

Below is a simplified view of the relevant code before the fix:

// In nv50_instobj_acquire()

if (refcount_inc_not_zero(&data->refcount)) {
    // ptrs value may not be stored yet, leading to a race condition
    return data->ptrs;
}

And here is the same simplified view after the fix:

// In nv50_instobj_acquire()

if (refcount_inc_not_zero(&data->refcount)) {
    smp_rmb(); // Read barrier: pairs with the smp_wmb() below
    return data->ptrs;
}

// ... (slow path: the ptrs value is stored here)

smp_wmb(); // Write barrier: ptrs must be visible before the refcount is set
refcount_set(&data->refcount, 1);

This change ensures that the 'ptrs' value is correctly stored and visible to other threads before any other thread can successfully increment the reference count and access it.
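The same publish-then-acquire pattern can be sketched in portable userspace C with C11 atomics. This is an analogy under stated assumptions, not the kernel code: 'struct pub_obj', 'pub_init()', 'pub_publish()' and 'pub_try_acquire()' are hypothetical names, and the C11 release and acquire fences are only the closest portable counterparts to 'smp_wmb()' and 'smp_rmb()', since the kernel's memory model is not identical to C11's.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the driver object. */
struct pub_obj {
    atomic_int refcount;  /* 0 until the pointer has been published */
    void *ptrs;           /* plain field, ordered only by the fences */
};

void pub_init(struct pub_obj *o)
{
    atomic_init(&o->refcount, 0);
    o->ptrs = NULL;
}

/* Writer side: store ptrs, fence, then make the refcount non-zero. */
void pub_publish(struct pub_obj *o, void *mapping)
{
    assert(mapping != NULL);
    o->ptrs = mapping;
    atomic_thread_fence(memory_order_release);  /* plays the role of smp_wmb() */
    atomic_store_explicit(&o->refcount, 1, memory_order_relaxed);
}

/*
 * Reader side: increment the refcount only if it is non-zero, fence,
 * then load ptrs. Returns NULL when nothing has been published yet
 * (the driver would fall through to its slow path instead).
 */
void *pub_try_acquire(struct pub_obj *o)
{
    int old = atomic_load_explicit(&o->refcount, memory_order_relaxed);

    while (old != 0) {
        /* On failure, old is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak_explicit(&o->refcount, &old, old + 1,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
            atomic_thread_fence(memory_order_acquire);  /* role of smp_rmb() */
            return o->ptrs;
        }
    }
    return NULL;
}
```

With this pairing, a reader that wins the increment is guaranteed to see the 'ptrs' value stored before the refcount became non-zero, so 'pub_try_acquire()' returns either NULL (nothing published yet) or the published pointer. Dropping either fence reopens the window in which a reader can pass the refcount check and still load a stale 'ptrs'.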

Conclusion

The race condition (CVE-2024-26984) in the Linux kernel's Nouveau driver has been resolved, preventing the crashes seen when running Vulkan CTS tests in parallel. The fix uses a paired write and read memory barrier so that the 'ptrs' store is ordered before the reference count update, which removes the NULL pointer dereference. Keep your systems on a kernel version that includes this fix, and watch future kernel updates and security announcements for related vulnerabilities and fixes.

Timeline

Published on: 05/01/2024 06:15:15 UTC
Last modified on: 07/03/2024 01:50:12 UTC