A recently resolved vulnerability in the Linux kernel's Intel QAT crypto driver (patch prefix "crypto: qat") exposed a race condition during PCI AER (Advanced Error Reporting) recovery. It is tracked as CVE-2024-26974. This post covers the specifics of the vulnerability, a kernel log excerpt that shows the problem, and links to the references that describe the issue and the subsequent fix.

Vulnerability Details

During PCI AER error recovery, the QAT driver can race with itself when freeing the memory of the reset_data structure. If a device restart takes longer than 10 seconds, the function that scheduled the restart stops waiting because of a timeout and frees reset_data. The reset worker, however, still uses that structure to signal completion once the restart finishes. The result is a use-after-free (UAF) bug, which KFENCE reports as in the following log excerpt:

BUG: KFENCE: use-after-free read in adf_device_reset_worker+0x38/0xa0 [intel_qat]
Use-after-free read at 0x00000000bc56fddf (in kfence-#142):
adf_device_reset_worker+0x38/0xa0 [intel_qat]
process_one_work+0x173/0x340

The root cause is that the memory of the structure containing the work_struct (reset_data) is freed by the scheduling function while the reset worker is still running and about to use it.
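The ordering problem can be illustrated with a small user-space model. This is only a sketch and not driver code: the struct and function names are hypothetical, and "freeing" is modelled by setting a flag so that the demonstration never actually touches freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for reset_data; not the kernel structure. */
struct reset_model {
    bool released;   /* set instead of calling free(), so the model stays well defined */
    bool completed;  /* set when the worker signals completion */
};

/* Scheduler side of the unpatched flow: releases the data whenever its wait ends. */
static void model_scheduler_returns(struct reset_model *d)
{
    assert(d != NULL);
    d->released = true;
}

/*
 * Worker side: signals completion through the same structure.
 * Returns false when the structure was already released, which would be
 * a use-after-free in the real driver.
 */
static bool model_worker_notifies(struct reset_model *d)
{
    assert(d != NULL);
    if (d->released)
        return false;
    d->completed = true;
    return true;
}

/* Replays both orderings; returns true when the worker's access is safe. */
static bool model_buggy_reset(unsigned restart_ms, unsigned timeout_ms)
{
    struct reset_model d = { false, false };
    bool safe;

    if (restart_ms <= timeout_ms) {
        /* Restart finishes in time: worker notifies, then scheduler releases. */
        safe = model_worker_notifies(&d);
        model_scheduler_returns(&d);
    } else {
        /* Timeout fires first: scheduler releases, worker notifies afterwards. */
        model_scheduler_returns(&d);
        safe = model_worker_notifies(&d);
    }
    return safe;
}
```

In this model, model_buggy_reset(2000, 10000) is safe, while model_buggy_reset(12000, 10000) reaches the worker's notification after the release, which is the ordering that KFENCE flagged.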

Solution

To address this race condition, the fix changes which side frees the memory of the structure containing the work_struct. If the timeout has expired, the worker frees the memory; otherwise, the function that scheduled the worker frees it. The worker detects the timeout by checking whether the caller is still waiting for completion, using the completion_done() function.
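The ownership hand-off described above can be sketched in user-space C. This is an illustration under stated assumptions, not the kernel patch: it swaps in a C11 compare-and-swap on a hypothetical state field instead of completion_done(), the names reset_ctx, worker_finish and scheduler_finish are invented for the example, and the actual wake-up of the waiting caller is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical states; the driver itself relies on completion_done(). */
enum handoff_state { CALLER_WAITING, CALLER_TIMED_OUT, WORKER_DONE };

/* Hypothetical stand-in for reset_data; not the kernel structure. */
struct reset_ctx {
    _Atomic int state;
};

static struct reset_ctx *reset_ctx_new(void)
{
    struct reset_ctx *ctx = malloc(sizeof(*ctx));

    assert(ctx != NULL);
    atomic_init(&ctx->state, CALLER_WAITING);
    return ctx;
}

/*
 * Worker side, called once the restart has finished.
 * Returns true if the worker freed ctx because the caller had timed out.
 */
static bool worker_finish(struct reset_ctx *ctx)
{
    int expected = CALLER_WAITING;

    if (atomic_compare_exchange_strong(&ctx->state, &expected, WORKER_DONE))
        return false;   /* caller still waiting: it frees ctx; do not touch ctx again */

    free(ctx);          /* caller gave up: ownership passed to the worker */
    return true;
}

/*
 * Scheduler side, called when its wait ends.
 * Returns true if the scheduler freed ctx.
 */
static bool scheduler_finish(struct reset_ctx *ctx, bool timed_out)
{
    int expected = CALLER_WAITING;

    if (timed_out &&
        atomic_compare_exchange_strong(&ctx->state, &expected, CALLER_TIMED_OUT))
        return false;   /* worker still running: it frees ctx when it finishes */

    free(ctx);          /* worker already marked the reset as done */
    return true;
}
```

Whichever side loses the compare-and-swap knows the other side has already finished with the structure, so exactly one of the two calls frees it in every ordering. For example, calling scheduler_finish(ctx, true) before worker_finish(ctx) leaves the free to the worker, which mirrors the timeout case described above.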

For a more in-depth understanding of the vulnerability, readers can refer to the following sources:

1. Linux Kernel Commit: 5d31ec36b18d
2. Linux Kernel Mailing List: ["[PATCH] crypto: qat - resolve race condition"](https://lore.kernel.org/lkml/20220315062300.407040-1-kristen@iteas.at/)

Conclusion

CVE-2024-26974 is a race condition in the QAT crypto driver's PCI AER recovery path. The fix resolves it by letting the worker free reset_data when the scheduling function has already timed out, and by letting the scheduling function free it otherwise, so that the completion notification never touches freed memory. Drivers that hand a structure to a worker and then wait on it with a timeout deserve the same scrutiny.

Timeline

Published on: 05/01/2024 06:15:14 UTC
Last modified on: 12/19/2024 08:51:32 UTC