A Linux kernel vulnerability in memory migration, identified as CVE-2023-52490, has been resolved, improving system stability. The bug affected the mm: migrate subsystem and caused system crashes during certain high-load operations, such as stress-ng testing and memory hotplug.

Description

The vulnerability showed up as a kernel NULL pointer dereference at virtual address 0000000000000000. Analysis of the resulting crash dump traced the problem to page migration. While a page is being migrated, the target page's ->mapping field is used to hold the anon_vma pointer, but the PAGE_MAPPING_ANON flag is not set. Code that inspects the page during this window, such as __dump_page(), therefore treats the anon_vma pointer as a file mapping (struct address_space) and crashes while printing it. The missing flag also causes trouble for other code that reads ->mapping without holding the page lock, including PFN walkers and compaction.
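
To make the flag's role concrete, here is a simplified userspace model in Python. It is not kernel code. It treats pointers as plain integers and uses the kernel's low-bit tags for ->mapping (PAGE_MAPPING_ANON is 0x1, PAGE_MAPPING_MOVABLE is 0x2). The `Page` class, the `page_mapping_model()` helper and the addresses are hypothetical. The model shows that an anon_vma pointer stored without the flag cannot be told apart from a file mapping.

```python
from dataclasses import dataclass
from typing import Optional

# Low-bit tags carried in page->mapping (values as in the kernel's page-flags.h).
PAGE_MAPPING_ANON = 0x1
PAGE_MAPPING_MOVABLE = 0x2
PAGE_MAPPING_FLAGS = PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE


@dataclass
class Page:
    """Hypothetical stand-in for struct page; only ->mapping is modeled."""
    mapping: int = 0


def page_mapping_model(page: Page) -> Optional[int]:
    """Return the value a caller would treat as a struct address_space pointer.

    Loosely mirrors page_mapping(): any tagged value is not a file mapping,
    so None is returned and callers must not dereference it as one.
    """
    if page.mapping & PAGE_MAPPING_FLAGS:
        return None
    return page.mapping or None


ADDRESS_SPACE = 0xFFFF0000_10000000  # made-up, aligned addresses
ANON_VMA = 0xFFFF0000_20000000

file_page = Page(mapping=ADDRESS_SPACE)
anon_page = Page(mapping=ANON_VMA | PAGE_MAPPING_ANON)  # correctly tagged
migration_dst = Page(mapping=ANON_VMA)  # anon_vma stored without the flag

for name, page in [("file", file_page), ("anon", anon_page), ("dst", migration_dst)]:
    result = page_mapping_model(page)
    print(name, None if result is None else hex(result))
```

The last case is the bug. The untagged anon_vma comes back looking like a file mapping.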

The following crash was reported after several hours of stress-ng testing:

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
pc : dentry_name+0xd8/0x224
lr : pointer+0x22c/0x370
sp : ffff800025f134c0
...
Call trace:
dentry_name+0xd8/0x224
pointer+0x22c/0x370
vsnprintf+0x1ec/0x730
vscnprintf+0x2c/0x60
vprintk_store+0x70/0x234
vprintk_emit+0xe0/0x24c
vprintk_default+0x3c/0x44
vprintk_func+0x84/0x2d0
printk+0x64/0x88
__dump_page+0x52c/0x530
dump_page+0x14/0x20
set_migratetype_isolate+0x110/0x224
start_isolate_page_range+0xc4/0x20c
offline_pages+0x124/0x474
memory_block_offline+0x44/0xf4
memory_subsys_offline+0x3c/0x70
device_offline+0xf0/0x120
...

Exploit Details

The system crashes if a concurrent memory-offline operation reaches the target page while it is still being migrated and dumps its state. The operation may be triggered by memory hotplug or by a stress-ng thread. This happens because the target page's ->mapping field holds only the anon_vma pointer, without the PAGE_MAPPING_ANON flag set.
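
The timing matters, because the bad ->mapping value is only visible while migration is in flight. The sketch below is again a hypothetical Python model, not kernel code. A generator pauses the migration after each step, and an offline-style scan runs in between, which reproduces the interleaving deterministically. The step names and the three-step migration sequence are simplifying assumptions; the real migration path has more stages.

```python
from typing import Iterator, List, Optional

PAGE_MAPPING_ANON = 0x1
ANON_VMA = 0xFFFF0000_20000000  # made-up, aligned anon_vma address


class Page:
    """Hypothetical stand-in for struct page; only ->mapping is modeled."""

    def __init__(self) -> None:
        self.mapping = 0


def migrate_pre_fix(dst: Page) -> Iterator[str]:
    """Simplified pre-fix migration: yields after each step so others can run."""
    dst.mapping = ANON_VMA  # anon_vma parked in ->mapping, flag NOT set
    yield "anon_vma recorded"
    yield "contents copied"  # unmap/copy details elided
    dst.mapping = ANON_VMA | PAGE_MAPPING_ANON  # final, correctly tagged value
    yield "mapping finalized"


def offline_scan(dst: Page) -> Optional[int]:
    """What a concurrent dump would treat as a file address_space, if anything."""
    if dst.mapping == 0 or dst.mapping & PAGE_MAPPING_ANON:
        return None
    return dst.mapping


def interleave(dst: Page) -> List[Optional[int]]:
    """Run one offline scan after every migration step; record what it saw."""
    return [offline_scan(dst) for _step in migrate_pre_fix(dst)]


observed = interleave(Page())
print([None if v is None else hex(v) for v in observed])
```

Only the scans that land inside the window misread the anon_vma. This is why the crash needed hours of stress testing to reproduce.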

Researchers proposed three possible fixes:

1. Set the PAGE_MAPPING_ANON flag in the target page's ->mapping when saving the anon_vma. However, this could cause PFN walkers to misbehave, because the target page's mappings have not been established yet.
2. Acquire the page lock before calling page_mapping() in __dump_page(), which prevents the crash. However, other PFN walkers, such as those used during compaction, still call page_mapping() without taking the page lock and could run into the same problem.
3. Save the anon_vma pointer together with a 2-bit page state in the target page's ->private field instead of ->mapping. This avoids the PFN-walker problems of option 1 and is simpler than option 2.
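
Option 3 relies on pointer tagging. An anon_vma is at least 4-byte aligned, so the two low bits of its address are always zero and can carry the page state. Below is a minimal Python model of a record/extract pair. It assumes hypothetical names (`record`, `extract`, `STATE_WAS_MAPPED`, `STATE_WAS_MLOCKED`) that are not taken from the kernel patch.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical labels for the two state bits stored next to the pointer.
STATE_WAS_MAPPED = 0x1
STATE_WAS_MLOCKED = 0x2
STATE_MASK = STATE_WAS_MAPPED | STATE_WAS_MLOCKED


@dataclass
class Page:
    """Hypothetical stand-in for struct page: only ->mapping and ->private."""
    mapping: int = 0
    private: int = 0


def record(dst: Page, anon_vma: int, state: int) -> None:
    """Pack anon_vma and a 2-bit state into dst->private; ->mapping is untouched."""
    if anon_vma & STATE_MASK:
        raise ValueError("anon_vma must be at least 4-byte aligned")
    if state & ~STATE_MASK:
        raise ValueError("state must fit in two bits")
    dst.private = anon_vma | state


def extract(dst: Page) -> Tuple[int, int]:
    """Unpack (anon_vma, state) from dst->private and clear the field."""
    anon_vma = dst.private & ~STATE_MASK
    state = dst.private & STATE_MASK
    dst.private = 0
    return anon_vma, state


ANON_VMA = 0xFFFF0000_20000000  # made-up, aligned address

dst = Page()
record(dst, ANON_VMA, STATE_WAS_MAPPED)
# In this model ->mapping stays 0 during migration, so a page_mapping()-style
# reader sees "no mapping" rather than a misread anon_vma.
print(dst.mapping, hex(dst.private))
anon_vma, state = extract(dst)
print(hex(anon_vma), state)
```

Packing works only because alignment guarantees the low bits are free. It is the same idea ->mapping already uses for PAGE_MAPPING_ANON, moved to ->private so that ->mapping no longer carries an untagged pointer during migration.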

Developers chose the third option as the most suitable fix. Besides resolving CVE-2023-52490, it also prevents the same problem from surfacing in other code paths that read ->mapping under similar conditions, such as compaction.

Timeline

Published on: 03/11/2024 18:15:16 UTC
Last modified on: 11/21/2024 08:39:53 UTC