CVE-2024-50082: Linux Kernel Vulnerability Resolved - blk-rq-qos: Fix Crash on rq_qos_wait vs. rq_qos_wake_function Race
The Linux kernel recently fixed a vulnerability in the block layer's blk-rq-qos code, where a race condition between rq_qos_wait() and rq_qos_wake_function() could crash the kernel. Such a crash can lead to data loss or system instability. In this post, we look at how the issue arises, what causes it, and how it was resolved.
Problem Statement
The problem showed up as crashes in rq_qos_wake_function() with the following diagnostic output (source):
BUG: unable to handle page fault for address: ffffafe180a40084
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 100000067 P4D 100000067 PUD 10027c067 PMD 10115d067 PTE
...
In plain terms, the kernel took a page fault while writing, in kernel mode, to address ffffafe180a40084. The page at that address was not mapped, so the fault could not be handled.
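To make the trace easier to read, here is a minimal sketch that decodes the low three bits of an x86 page-fault error code such as the value 2 shown above. The bit meanings are the architectural ones; the names PF_PROT, PF_WRITE, PF_USER, pf_info, and decode_pf_error are illustrative and are not taken from the kernel source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low bits of the x86 page-fault error code. Names are illustrative. */
#define PF_PROT  (1u << 0)  /* 0: page not present, 1: protection violation */
#define PF_WRITE (1u << 1)  /* 0: read access,      1: write access         */
#define PF_USER  (1u << 2)  /* 0: supervisor mode,  1: user mode            */

/* Hypothetical helper type holding the decoded flags. */
struct pf_info {
    bool not_present;  /* the faulting page was not mapped        */
    bool write;        /* the faulting access was a write         */
    bool supervisor;   /* the access came from kernel (ring 0)    */
};

/* Decode the three low bits of a page-fault error code. */
struct pf_info decode_pf_error(uint32_t code)
{
    struct pf_info info = {
        .not_present = (code & PF_PROT) == 0,
        .write       = (code & PF_WRITE) != 0,
        .supervisor  = (code & PF_USER) == 0,
    };
    return info;
}
```

Calling decode_pf_error(0x0002) sets all three fields, which lines up with the "supervisor write access in kernel mode" and "not-present page" lines in the trace.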
Cause
The root cause is a race condition between rq_qos_wait() and rq_qos_wake_function(). rq_qos_wait() puts the calling task to sleep until a token is available, and rq_qos_wake_function() hands over the token and wakes that task. The crash occurs when rq_qos_wait() returns before rq_qos_wake_function() has finished executing. The sequence is as follows:
1. rq_qos_wait() calls prepare_to_wait_exclusive() to put its waitqueue entry on the wait queue.
2. rq_qos_wake_function() sets data->got_token = true; and deletes the waitqueue entry with list_del_init(&curr->entry);.
3. rq_qos_wait() sees that the token has been received (data.got_token), breaks out of its wait loop, and calls finish_wait(). Because the entry has already been deleted, the list_empty_careful() check in finish_wait() succeeds, so finish_wait() returns immediately without waiting for the waker, and rq_qos_wait() returns.
4. rq_qos_wake_function() then calls wake_up_process(data->task). The data structure lived on the stack of rq_qos_wait(), which has already returned, so the task reference is no longer valid and the kernel crashes.
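The interleaving above can be replayed in a small single-threaded model. This is only a sketch under stated assumptions, not kernel code: the struct and function names are hypothetical, a boolean stands in for "the waitqueue entry is on the list", and frame_alive is a model-only flag marking whether the stack frame of rq_qos_wait() still exists. The model also assumes that a still-queued entry keeps the waiter from returning.

```c
#include <stdbool.h>

/* Model of the waiter's on-stack data. All names are hypothetical. */
struct wait_data {
    bool got_token;    /* set by the waker when a token is granted       */
    bool queued;       /* true while the waitqueue entry is on the list  */
    bool frame_alive;  /* model-only: false once the waiter has returned */
};

/* Waker, first half in the pre-fix order: grant the token, then dequeue. */
void waker_grant_and_dequeue(struct wait_data *d)
{
    d->got_token = true;
    d->queued = false;   /* models list_del_init(&curr->entry) */
}

/* Waiter: returns true if it could return from the wait function. */
bool waiter_try_return(struct wait_data *d)
{
    if (!d->got_token)
        return false;    /* no token yet, keep sleeping */
    if (d->queued)
        return false;    /* model assumption: must wait for the waker */
    d->frame_alive = false;  /* waiter returns, its stack frame is gone */
    return true;
}

/* Waker, second half: models wake_up_process(data->task).
 * Returns true if the access would touch a dead stack frame. */
bool waker_wake_is_stale(const struct wait_data *d)
{
    return !d->frame_alive;
}

/* Replay steps 1 to 4 in order; returns true if the stale access occurs. */
bool replay_old_order(void)
{
    struct wait_data d = { .got_token = false, .queued = true, .frame_alive = true };

    waker_grant_and_dequeue(&d);     /* step 2 */
    waiter_try_return(&d);           /* step 3 */
    return waker_wake_is_stale(&d);  /* step 4 */
}
```

In this model replay_old_order() reports the stale access, because nothing prevents the waiter from returning between the two halves of the waker.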
Solution
To address this issue, the developers changed the order of operations in rq_qos_wake_function(): it now calls wake_up_process(data->task) before removing the waitqueue entry, and it removes the entry with list_del_init_careful(), which pairs with the list_empty_careful() check in finish_wait(). As a result, finish_wait() cannot see an empty entry until the waker has finished using data, which closes the race.
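The corrected ordering can be illustrated with a rough userspace analogy based on C11 atomics. This is a sketch, not the kernel patch: the names waiter_state, waiter_state_init, wake_fixed, and entry_released are hypothetical, and the release store and acquire load only stand in for list_del_init_careful() and list_empty_careful().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the waiter's on-stack data. */
struct waiter_state {
    bool got_token;      /* written by the waker before the release store       */
    int wakeups;         /* stands in for the effect of wake_up_process()       */
    atomic_bool queued;  /* true while the waitqueue entry is still on the list */
};

void waiter_state_init(struct waiter_state *w)
{
    w->got_token = false;
    w->wakeups = 0;
    atomic_init(&w->queued, true);
}

/* Waker in the fixed order: every access to the waiter's data happens
 * before the entry is released. */
void wake_fixed(struct waiter_state *w)
{
    w->got_token = true;
    w->wakeups++;  /* "wake" while the waiter's data is still guaranteed valid */
    /* Release store: analogous in spirit to list_del_init_careful(). */
    atomic_store_explicit(&w->queued, false, memory_order_release);
}

/* Waiter-side check: analogous in spirit to list_empty_careful().
 * Once this returns true, the waker's earlier writes are visible and
 * the waker no longer touches the waiter's data. */
bool entry_released(struct waiter_state *w)
{
    return !atomic_load_explicit(&w->queued, memory_order_acquire);
}
```

The design point is that dequeuing becomes the waker's last action, so the waiter can treat an empty entry as a signal that its data is no longer in use.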
With this fix applied, the kernel no longer crashes because of this race condition. Details about the Linux kernel versions containing the fix can be found in the official patch submission.
Conclusion
The resolution of CVE-2024-50082 highlights the importance of diligence and collaboration in the open-source community for maintaining the security and stability of the Linux kernel. By identifying, analyzing, and addressing such vulnerabilities, kernel developers give all users a more stable and secure system.
Keep your Linux kernel up to date with the latest releases and patches to benefit from fixes like this one and stay protected against known vulnerabilities. Happy computing!
Timeline
Published on: 10/29/2024 01:15:05 UTC
Last modified on: 10/30/2024 15:44:05 UTC