In the Linux kernel, the issue tracked as CVE-2024-57888 (Common Vulnerabilities and Exposures) has been resolved. It was in the workqueue subsystem and caused spurious warnings in amdgpu, the kernel driver for AMD Radeon graphics cards. This blog post gives an overview of the issue, explains the fix, and lists references for further reading.
Background
After commit 746ae46c1113 ("drm/sched: Mark scheduler work queues with WQ_MEM_RECLAIM"), amdgpu started triggering the following warning:
[ ] workqueue: WQ_MEM_RECLAIM sdma0:drm_sched_run_job_work [gpu_sched] is flushing !WQ_MEM_RECLAIM events:amdgpu_device_delay_enable_gfx_off [amdgpu]
...
[ ] Workqueue: sdma0 drm_sched_run_job_work [gpu_sched]
...
[ ] Call Trace:
[ ]  <TASK>
...
[ ]  ? check_flush_dependency+0xf5/0x110
...
[ ]  cancel_delayed_work_sync+0x6e/0x80
[ ]  amdgpu_gfx_off_ctrl+0xab/0x140 [amdgpu]
[ ]  amdgpu_ring_alloc+0x40/0x50 [amdgpu]
[ ]  amdgpu_ib_schedule+0xf4/0x810 [amdgpu]
[ ]  ? drm_sched_run_job_work+0x22c/0x430 [gpu_sched]
[ ]  amdgpu_job_run+0xaa/0x1f0 [amdgpu]
[ ]  drm_sched_run_job_work+0x257/0x430 [gpu_sched]
[ ]  process_one_work+0x217/0x720
...
[ ]  </TASK>
The check in check_flush_dependency() is meant to ensure forward progress during memory reclaim. It flags cases where a task performing memory reclaim, or a work item running on a WQ_MEM_RECLAIM workqueue, flushes work that belongs to a workqueue not marked as memory reclaim safe (!WQ_MEM_RECLAIM).
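The rule described above can be sketched as a small userspace model in C. This is a simplified illustration, not the kernel's code: the struct wq_model type and the flush_would_warn() function are hypothetical names invented for this post, and the real check handles further details that are left out here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a workqueue; not the kernel's type. */
struct wq_model {
    const char *name;
    bool mem_reclaim;   /* models the WQ_MEM_RECLAIM flag */
};

/*
 * Returns true when this simplified check would warn.
 * current_wq is the workqueue the caller runs on, or NULL when the
 * caller is not a workqueue worker. task_in_reclaim models a task
 * that is itself performing memory reclaim.
 */
static bool flush_would_warn(const struct wq_model *current_wq,
                             bool task_in_reclaim,
                             const struct wq_model *target_wq)
{
    /* Waiting on a reclaim-safe workqueue is always acceptable. */
    if (target_wq->mem_reclaim)
        return false;

    /* A task doing memory reclaim must not wait on an unsafe workqueue. */
    if (task_in_reclaim)
        return true;

    /* Neither may a work item running on a reclaim-safe workqueue. */
    return current_wq != NULL && current_wq->mem_reclaim;
}
```

With a reclaim-safe workqueue named sdma0 and an unsafe one named events, as in the log above, flush_would_warn(&sdma0, false, &events) returns true in this model, which corresponds to the warning amdgpu users saw.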
This is correct when flushing, but on the cancel_work_sync() and cancel_delayed_work_sync() paths it produces a false positive. There, the work is either already running or will not run at all, so the caller does not depend on the target workqueue starting new work under memory pressure. Canceling is therefore safe, and the warning criteria can be relaxed by telling the helper which context it is called from.
Solution
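To resolve this issue, the warning criteria in check_flush_dependency() were relaxed: the helper is now told when it is called from a cancel path and does not warn in that case. Flush paths are still checked as before, so genuine forward-progress problems continue to be reported.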
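As a rough illustration of the relaxed criteria, the following C sketch models a check that takes the calling context as an extra argument. It is a userspace approximation under stated assumptions: struct wq_sketch, check_would_warn(), and the from_cancel parameter name are chosen for this example, and the upstream patch should be consulted for the exact change.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a workqueue; not the kernel's type. */
struct wq_sketch {
    const char *name;
    bool mem_reclaim;   /* models the WQ_MEM_RECLAIM flag */
};

/*
 * Returns true when this simplified check would warn.
 * from_cancel (assumed name) is true when the caller is canceling
 * work rather than flushing it.
 */
static bool check_would_warn(const struct wq_sketch *current_wq,
                             const struct wq_sketch *target_wq,
                             bool from_cancel)
{
    /*
     * Canceled work is either already running or will never run,
     * so the cancel path is exempt from the warning. A reclaim-safe
     * target is acceptable regardless of the calling context.
     */
    if (from_cancel || target_wq->mem_reclaim)
        return false;

    /* Flush path: a reclaim-safe worker waits on an unsafe queue. */
    return current_wq != NULL && current_wq->mem_reclaim;
}
```

In this model, the amdgpu case from the log (a worker on sdma0 canceling delayed work queued on events) no longer warns once from_cancel is true, while the same pair still warns on a real flush.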
Exploit Details
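The issue is a false-positive warning rather than a memory-safety bug, and there are no known exploits targeting it. The practical impact is a spurious warning and call trace in the kernel log.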
For further reading, refer to the following references:
- Commit 746ae46c1113: The commit after which the warning started appearing in amdgpu.
- Linux kernel source code: The source code of the Linux kernel, where the issue was found and resolved.
In conclusion, CVE-2024-57888 has been resolved by relaxing the warning criteria in the check_flush_dependency() function so that cancel paths no longer trigger it. This removes the spurious warnings seen with the amdgpu driver. Users are encouraged to keep their systems up to date to benefit from the latest kernel fixes.
Timeline
Published on: 01/15/2025 13:15:13 UTC
Last modified on: 01/20/2025 06:28:56 UTC