A security vulnerability, CVE-2024-26943, has been identified and resolved in the Linux kernel. It stems from an unchecked kcalloc() call in nouveau_dmem_evict_chunk(), part of the nouveau/dmem code. If physical memory is exhausted, kcalloc() returns NULL, and the function then dereferences src_pfns, dst_pfns or dma_addrs anyway, causing a null pointer dereference.

This article explains the vulnerability, its consequences, and how it was resolved, together with the relevant code changes and pointers to the original references.

Exploit Details

The Linux kernel's nouveau/dmem module manages device memory for NVIDIA GPUs (nouveau is the open-source NVIDIA GPU driver). The vulnerability is in nouveau_dmem_evict_chunk(), which does not handle the case where kcalloc() cannot allocate memory. This happens when physical memory runs out and the kcalloc() call returns a null pointer.

The original code did not check the return value of kcalloc(). When kcalloc() fails and returns NULL, the subsequent accesses through src_pfns, dst_pfns or dma_addrs dereference a null pointer.
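
The bug class described above can be illustrated in userspace. The following is a minimal sketch, not the driver code: demo_calloc(), simulate_oom and fill_pfns_checked() are hypothetical names, and calloc() stands in for kcalloc(). It shows the check that the original function lacked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical switch: when nonzero, demo_calloc() fails the way
 * kcalloc() does when physical memory is exhausted. */
static int simulate_oom = 0;

/* Hypothetical stand-in for kcalloc(): may return NULL. */
static void *demo_calloc(size_t n, size_t size)
{
    if (simulate_oom)
        return NULL;
    return calloc(n, size);
}

/* Checked variant: returns 0 on success, -1 if the allocation failed. */
static int fill_pfns_checked(size_t npages)
{
    unsigned long *src_pfns;
    size_t i;

    assert(npages > 0);   /* calloc(0, ...) may legitimately return NULL */

    src_pfns = demo_calloc(npages, sizeof(*src_pfns));
    if (!src_pfns)
        return -1;        /* bail out instead of dereferencing NULL */

    for (i = 0; i < npages; i++)
        src_pfns[i] = i;  /* safe: src_pfns is known to be valid here */

    free(src_pfns);
    return 0;
}
```

Without the `if (!src_pfns)` check, the loop would write through a null pointer whenever the allocation fails, which is the userspace analogue of the kernel bug.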

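Moreover, the arrays allocated with kcalloc() are bookkeeping needed for the eviction itself: if the allocation fails, the pages that were supposed to be evicted from the GPU cannot be evicted. Simply returning an error would therefore leave the eviction incomplete, so the fix is to call the allocator with the __GFP_NOFAIL flag rather than add an error path.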
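
To make the idea behind __GFP_NOFAIL concrete, here is a small userspace sketch of a "retry until it succeeds" allocator. It is an analogy only: flaky_calloc(), failures_left, attempts and calloc_nofail() are hypothetical names, and the real kernel allocator reclaims memory and sleeps between attempts rather than spinning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical failure injection: the next `failures_left` calls fail. */
static int failures_left = 0;
static int attempts = 0;

/* Hypothetical allocator that fails a configurable number of times. */
static void *flaky_calloc(size_t n, size_t size)
{
    attempts++;
    if (failures_left > 0) {
        failures_left--;
        return NULL;
    }
    return calloc(n, size);
}

/* Keeps retrying, so the caller never sees NULL. The price is that the
 * call may block for an unbounded time if memory never becomes available. */
static void *calloc_nofail(size_t n, size_t size)
{
    void *p;

    assert(n > 0 && size > 0);
    while ((p = flaky_calloc(n, size)) == NULL)
        ;   /* the kernel would try to reclaim memory here */
    return p;
}
```

The design trade-off is visible in the loop: the caller no longer needs an error path, but it gives up any bound on how long the allocation can take.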

Proposed Solution

The patch addresses this vulnerability in two ways. First, it adds the __GFP_NOFAIL flag to the allocation call, which tells the allocator to keep retrying rather than return NULL. This covers the case where physical memory has run out, at the cost of the call possibly blocking until memory becomes available.

Second, the patch replaces kcalloc() with kvcalloc(). kvcalloc() first attempts a regular physically contiguous allocation and, if that fails, falls back to a virtually contiguous one, so the arrays no longer require physical contiguity. This reduces the chance of allocation failure in the first place. Memory obtained from kvcalloc() must be released with kvfree() rather than kfree().
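
The fallback behaviour can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel implementation: CONTIG_LIMIT, struct kv_buf, demo_kvcalloc() and demo_kvfree() are hypothetical, and calloc() stands in for both the contiguous and the virtually contiguous allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cap on what the "physically contiguous" path will serve. */
#define CONTIG_LIMIT 4096

enum backing { BACKING_CONTIG, BACKING_VIRTUAL };

/* Records which path served the request, so the free side can match it. */
struct kv_buf {
    enum backing backing;
    void *data;
};

static struct kv_buf demo_kvcalloc(size_t n, size_t size)
{
    struct kv_buf buf = { BACKING_CONTIG, NULL };

    /* Reject n * size overflow, as the calloc-style kernel helpers do. */
    if (size != 0 && n > SIZE_MAX / size)
        return buf;

    if (n * size <= CONTIG_LIMIT)
        buf.data = calloc(n, size);   /* stands in for the contiguous path */

    if (!buf.data) {
        buf.backing = BACKING_VIRTUAL;
        buf.data = calloc(n, size);   /* stands in for the fallback path */
    }
    return buf;
}

/* One free routine for both paths, mirroring the role of kvfree(). */
static void demo_kvfree(struct kv_buf *buf)
{
    assert(buf != NULL);
    free(buf->data);
    buf->data = NULL;
}
```

A small request is served by the first path, while a request above the limit skips it and uses the fallback. In both cases the caller frees through the same routine.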

Here's the code snippet reflecting the changes applied to the nouveau_dmem_evict_chunk() function:

...
- src_pfns = kcalloc(chunk->npages, sizeof(*src_pfns), GFP_KERNEL);
+ src_pfns = kvcalloc(chunk->npages, sizeof(*src_pfns), GFP_KERNEL | __GFP_NOFAIL);
...
- dst_pfns = kcalloc(chunk->npages, sizeof(*dst_pfns), GFP_KERNEL);
+ dst_pfns = kvcalloc(chunk->npages, sizeof(*dst_pfns), GFP_KERNEL | __GFP_NOFAIL);
...
- dma_addrs = kcalloc(chunk->npages, sizeof(*dma_addrs), GFP_KERNEL);
+ dma_addrs = kvcalloc(chunk->npages, sizeof(*dma_addrs), GFP_KERNEL | __GFP_NOFAIL);
...

References

1. Linux Kernel Mailing List (LKML) - Vulnerability Report
2. LKML - Proposed Patch

Conclusion

CVE-2024-26943 in the Linux kernel's nouveau/dmem module could cause a null pointer dereference, and leave GPU pages unevicted, when physical memory runs out. It has been resolved by replacing the three kcalloc() calls in nouveau_dmem_evict_chunk() with kvcalloc() and adding the __GFP_NOFAIL flag. Users are advised to update to a kernel version that includes this fix.

Timeline

Published on: 05/01/2024 06:15:09 UTC
Last modified on: 05/29/2024 05:25:36 UTC