The Linux kernel recently patched a vulnerability (CVE-2024-41009) in its BPF (Berkeley Packet Filter) ring buffer that could lead to memory corruption and kernel crashes. The issue was reported by Bing-Jhong Billy Jheng and Muhammad Ramdhan, who found that a second allocated memory chunk could be made to overlap the first, allowing the BPF program to edit the first chunk's header. The fix prevents overrunning reservations in the ring buffer and so preserves the stability of the kernel.

Vulnerability Details

The vulnerability stems from the BPF ring buffer implementation, which is a power-of-2 sized circular buffer. The buffer tracks two logical, ever-increasing counters: consumer_pos, the logical position up to which the consumer has consumed data, and producer_pos, the amount of data reserved by all producers. The data area is mapped twice, back-to-back, in contiguous virtual memory, so a record that wraps around the end of the buffer still appears contiguous. This simplifies the implementation and speeds up both producers and consumers.
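The counter arithmetic described above can be sketched in a few lines of C. This is an illustrative user-space model, not kernel code: the helper names ring_offset() and ring_in_flight() are hypothetical, and the 0x4000 size used in the usage note is only an example value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: map an ever-increasing logical position to a byte
 * offset inside a data area whose size is a power of 2. */
static inline uint64_t ring_offset(uint64_t pos, uint64_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0); /* power of 2 only */
    return pos & (size - 1);
}

/* Hypothetical helper: bytes reserved by producers and not yet consumed. */
static inline uint64_t ring_in_flight(uint64_t producer_pos,
                                      uint64_t consumer_pos)
{
    return producer_pos - consumer_pos;
}
```

With a 0x4000-byte data area, ring_offset(0x4000, 0x4000) is 0 and ring_offset(0x4008, 0x4000) is 0x8, so logical positions one buffer size apart land on the same bytes. The double mapping exposes exactly this aliasing through virtual addresses.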

Each record is preceded by a struct bpf_ringbuf_hdr { u32 len; u32 pg_off; } header that records its length and page offset. The header is meant to be inaccessible to the BPF program: helpers such as bpf_ringbuf_reserve() return (void *)hdr + BPF_RINGBUF_HDR_SZ, the address just past the header, for the BPF program to use. The vulnerability arises when a BPF program makes a second allocated memory chunk overlap the first and thereby gains write access to the first chunk's header.
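The relationship between a record's header and the pointer handed to the BPF program can be modeled as follows. This is a user-space sketch under assumptions: struct ringbuf_hdr_model mirrors the two u32 fields named above, while record_payload() and record_header() are hypothetical names chosen for illustration, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in for the two-field record header described above. */
struct ringbuf_hdr_model {
    uint32_t len;    /* record length */
    uint32_t pg_off; /* page offset used for bookkeeping */
};

#define HDR_SZ sizeof(struct ringbuf_hdr_model)

static_assert(sizeof(struct ringbuf_hdr_model) == 8, "header is 8 bytes");

/* Hypothetical helper: the address a reserve call would hand out,
 * that is, the first byte after the header. */
static inline void *record_payload(struct ringbuf_hdr_model *hdr)
{
    return (uint8_t *)hdr + HDR_SZ;
}

/* Hypothetical helper: step back from a payload pointer to its header,
 * as a submit or discard call has to do. */
static inline struct ringbuf_hdr_model *record_header(void *payload)
{
    return (struct ringbuf_hdr_model *)((uint8_t *)payload - HDR_SZ);
}
```

Because the header sits at a fixed distance before the payload, anything that can write to those 8 bytes controls what a later submit or discard reads.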

Exploit Example

Consider the creation of a BPF_MAP_TYPE_RINGBUF map with a size of 0x4000. Next, consumer_pos is modified to 0x3000 before calling bpf_ringbuf_reserve(). The call allocates a chunk A in the range [0x0, 0x3008], of which the BPF program is allowed to edit [0x8, 0x3008]. Now allocate a chunk B with size 0x3000. This succeeds because consumer_pos was edited ahead of time to pass the new_prod_pos - cons_pos > rb->mask check. Chunk B is in the range [0x3008, 0x6010], of which the BPF program is allowed to edit [0x3010, 0x6010].

Due to the ring buffer memory layout, the ranges [0x0, 0x4000] and [0x4000, 0x8000] point to the same data pages. This means that the part of chunk B at [0x4000, 0x4008] is actually chunk A's header. bpf_ringbuf_submit() and bpf_ringbuf_discard() use the header's pg_off to locate the ring buffer itself, so once chunk B has overwritten chunk A's header, they refer to the wrong page and can crash the kernel.
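The arithmetic of this scenario can be replayed with a small user-space model. It is a sketch only: struct ring_model, reserve_unfixed(), and first_alias() are hypothetical names, the space check is reduced to the new_prod_pos - cons_pos > rb->mask comparison quoted above, and record lengths are assumed to be the requested size plus an 8-byte header, rounded up to 8 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HDR_SZ 8u /* size of the per-record header in this model */

/* Simplified model of the ring buffer state (not the kernel struct). */
struct ring_model {
    uint64_t mask;         /* data size - 1; data size is a power of 2 */
    uint64_t consumer_pos; /* can be modified before reserving, as above */
    uint64_t producer_pos;
};

/* Space check as described for the vulnerable code: only consumer_pos
 * limits how far the producer may advance. */
static inline bool reserve_unfixed(struct ring_model *rb, uint64_t size,
                                   uint64_t *rec_start)
{
    uint64_t len = (size + HDR_SZ + 7) & ~(uint64_t)7; /* round up to 8 */
    uint64_t new_prod_pos = rb->producer_pos + len;

    assert((rb->mask & (rb->mask + 1)) == 0); /* mask must be 2^n - 1 */
    if (len > rb->mask + 1)
        return false;
    if (new_prod_pos - rb->consumer_pos > rb->mask)
        return false;
    *rec_start = rb->producer_pos;
    rb->producer_pos = new_prod_pos;
    return true;
}

/* First logical position at or after `from` that lands on the same
 * buffer offset as `target`. */
static inline uint64_t first_alias(uint64_t from, uint64_t target,
                                   uint64_t mask)
{
    return from + ((target - from) & mask);
}
```

Starting from mask 0x3fff, consumer_pos 0x3000, and producer_pos 0, two reservations of size 0x3000 both succeed and begin at 0x0 and 0x3008. first_alias(0x3010, 0x0, 0x3fff) then yields 0x4000, which lies inside chunk B's writable range and maps onto chunk A's header. With consumer_pos left at 0, the second reservation is rejected.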

The Fix

The fix calculates the oldest pending position (pending_pos) and checks whether the range from the oldest outstanding record to the newest would span beyond the ring buffer size. If it would, the reservation request is rejected. The ring buffer benchmark in the BPF selftests (./benchs/run_bench_ringbufs.sh) was run before and after the fix. Some benchmarks appear slightly slower, but the difference is not significant enough to matter.
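A minimal sketch of the additional check, again as a user-space model rather than the kernel patch itself. It assumes the oldest pending position is already known and stored in a pending_pos field that the caller updates once a record is committed. How the kernel derives that value is not modeled here, and struct ring_model and reserve_fixed() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HDR_SZ 8u /* size of the per-record header in this model */

/* Simplified model of the ring buffer state (not the kernel struct). */
struct ring_model {
    uint64_t mask;         /* data size - 1; data size is a power of 2 */
    uint64_t consumer_pos; /* can be modified from outside the kernel */
    uint64_t pending_pos;  /* assumed: start of the oldest uncommitted record */
    uint64_t producer_pos;
};

/* Reserve with both limits: the consumer position and the oldest
 * outstanding record must each be within one buffer size of the new
 * producer position. */
static inline bool reserve_fixed(struct ring_model *rb, uint64_t size,
                                 uint64_t *rec_start)
{
    uint64_t len = (size + HDR_SZ + 7) & ~(uint64_t)7; /* round up to 8 */
    uint64_t new_prod_pos = rb->producer_pos + len;

    assert((rb->mask & (rb->mask + 1)) == 0); /* mask must be 2^n - 1 */
    if (len > rb->mask + 1)
        return false;
    if (new_prod_pos - rb->consumer_pos > rb->mask ||
        new_prod_pos - rb->pending_pos > rb->mask)
        return false; /* would overlap a record that is still pending */
    *rec_start = rb->producer_pos;
    rb->producer_pos = new_prod_pos;
    return true;
}
```

Replaying the exploit scenario against this model (mask 0x3fff, consumer_pos 0x3000, pending_pos 0), chunk A is still granted, but chunk B is refused because 0x6010 - 0x0 exceeds the mask. The second limit does not depend on consumer_pos, so editing that counter no longer helps.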

Explore the original patch and vulnerability details in the following resources:

- Linux Kernel Patch
- CVE Details

In summary, the Linux kernel has addressed the CVE-2024-41009 vulnerability in the BPF ring buffer implementation, preventing potential memory corruption and kernel crashes. By ensuring the proper validation of memory chunks and their headers, this fix helps maintain the stability and security of systems running on Linux.

Timeline

Published on: 07/17/2024 07:15:01 UTC
Last modified on: 07/19/2024 15:06:23 UTC