A recent Linux kernel vulnerability, CVE-2024-26859, was found in the bnx2x network driver (net/bnx2x). It affects the handling of EEH (Enhanced Error Handling) errors and can crash the system. The vulnerability stems from a race condition between two threads that both try to free the same memory, so that one of them ends up accessing memory that has already been freed. This article covers the details of the vulnerability, its potential impact, and the fix that prevents it.

Vulnerability Details

The vulnerability lies in the interaction between the bnx2x driver's transmit timeout logic and EEH error recovery. bnx2x_tx_timeout() schedules a reset task, bnx2x_sp_rtnl_task(), which ultimately calls bnx2x_nic_unload(). bnx2x_nic_unload() frees the SGEs (scatter-gather elements) through bnx2x_free_rx_sge_range(). This can overlap with the EEH driver's own attempt to reset the device through bnx2x_io_slot_reset(), which also tries to free the SGEs.

This race condition can crash the system. The crash occurs in bnx2x_free_rx_sge(), where sw_buf (a pointer to a struct sw_rx_page) has already been set to NULL by the preceding thread after its call to dma_unmap_page(). The beginning of the affected function is as follows:

static inline void bnx2x_free_rx_sge(struct bnx2x *bp,
                                     struct bnx2x_fastpath *fp, u16 index)
{
    struct sw_rx_page *sw_buf = &fp->rx_page_ring[index];
    struct page *page = sw_buf->page;

The call trace of the resulting crash is shown below:

Call Trace:
[c000000003c67a20] [c00800000250658c] bnx2x_io_slot_reset+0x204/0x610 [bnx2x] (unreliable)
[c000000003c67af0] [c0000000000518a8] eeh_report_reset+0xb8/0xf0
[c000000003c67b60] [c000000000052130] eeh_pe_report+0x180/0x550
[c000000003c67c70] [c00000000005318c] eeh_handle_normal_event+0x84c/0xa60
[c000000003c67d50] [c000000000053a84] eeh_event_handler+0xf4/0x170
[c000000003c67da0] [c000000000194c58] kthread+0x1c8/0x1d0
[c000000003c67e10] [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64

Solution

To resolve this vulnerability and avoid the crash, the driver must verify page pool allocations before freeing them. The fix adds checks to the SGE free path so that bnx2x_free_rx_sge() does not touch memory that the other path has already released.

In summary, CVE-2024-26859 is a Linux kernel vulnerability that can crash the system because memory is freed twice during EEH error recovery in the bnx2x driver. To mitigate it, administrators and kernel maintainers should run a kernel that includes the fix described above.

Timeline

Published on: 04/17/2024 11:15:08 UTC
Last modified on: 06/27/2024 12:15:21 UTC