The Linux kernel has resolved a vulnerability in the dm-raid456 and md/raid456 code: a deadlock in dm-raid456 that occurs when IO runs concurrently with a reshape. With the fix in place, IO no longer hangs waiting on a reshape that cannot make progress.

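For raid456, while a reshape is in progress, IO that crosses the reshape position waits for the reshape to make progress. For dm-raid, however, the reshape can never make progress in the following three cases, so that IO hangs:

- the array is read-only
- MD_RECOVERY_WAIT is set
- MD_RECOVERY_FROZEN is set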

Commit c467e97f079f ("md/raid6: use valid sector values to determine if an I/O should wait on the reshape") fixed the problem that IO across the reshape position did not wait for the reshape. After that commit, the dm-raid test shell/lvconvert-raid-reshape.sh started to hang: the reshape could not make progress, so the IO waiting on it never completed. The hang is visible in the stack of the blocked task, read from /proc/979/stack:

[<0>] wait_woken+0x7d/0x90
[<0>] raid5_make_request+0x929/0x1d70 [raid456]
[<0>] md_handle_request+0xc2/0x3b0 [md_mod]
[<0>] raid_map+0x2c/0x50 [dm_raid]
[<0>] __map_bio+0x251/0x380 [dm_mod]
[<0>] dm_submit_bio+0x1f0/0x760 [dm_mod]
[<0>] __submit_bio+0xc2/0x1c0
[<0>] submit_bio_noacct_nocheck+0x17f/0x450
[<0>] submit_bio_noacct+0x2bc/0x780
[<0>] submit_bio+0x70/0xc0
[<0>] mpage_readahead+0x169/0x1f0
[<0>] blkdev_readahead+0x18/0x30
[<0>] read_pages+0x7c/0x3b0
[<0>] page_cache_ra_unbounded+0x1ab/0x280
[<0>] force_page_cache_ra+0x9e/0x130
[<0>] page_cache_sync_ra+0x3b/0x110
[<0>] filemap_get_pages+0x143/0xa30
[<0>] filemap_read+0xdc/0x4b0
[<0>] blkdev_read_iter+0x75/0x200
[<0>] vfs_read+0x272/0x460
[<0>] ksys_read+0x7a/0x170
[<0>] __x64_sys_read+0x1c/0x30
[<0>] do_syscall_64+0xc6/0x230
[<0>] entry_SYSCALL_64_after_hwframe+0x6c/0x74
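
As a rough diagnostic aid, a trace like the one above can be matched mechanically. The following is a minimal Python sketch, not part of the kernel fix: it parses text in the /proc/<pid>/stack format and reports whether the task is sleeping in wait_woken, called from raid5_make_request by way of raid_map. The names Frame, parse_stack and looks_like_reshape_wait are hypothetical, and a match is only a hint, since IO can also wait here briefly during a healthy reshape.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    symbol: str
    module: Optional[str]  # None for symbols built into the kernel image


# One frame looks like "[<0>] symbol+0xOFF/0xSIZE [module]"; the module is optional.
_FRAME_RE = re.compile(r"^\[<[0-9a-fA-F]*>\]\s+([^\s+]+)\+\S+(?:\s+\[(\w+)\])?$")


def parse_stack(text: str) -> List[Frame]:
    """Parse /proc/<pid>/stack style text, skipping lines that are not frames."""
    frames = []
    for line in text.splitlines():
        match = _FRAME_RE.match(line.strip())
        if match:
            frames.append(Frame(match.group(1), match.group(2)))
    return frames


def looks_like_reshape_wait(frames: List[Frame]) -> bool:
    """True if the task sleeps in wait_woken, reached from raid5_make_request via dm-raid."""
    symbols = [frame.symbol for frame in frames]
    return (
        bool(symbols)
        and symbols[0] == "wait_woken"
        and "raid5_make_request" in symbols
        and "raid_map" in symbols
    )


# Shortened sample input for illustration only.
SAMPLE = """\
[<0>] wait_woken+0x7d/0x90
[<0>] raid5_make_request+0x929/0x1d70 [raid456]
[<0>] md_handle_request+0xc2/0x3b0 [md_mod]
[<0>] raid_map+0x2c/0x50 [dm_raid]
"""

print(looks_like_reshape_wait(parse_stack(SAMPLE)))  # True
```

On a live system the input would come from reading /proc/<pid>/stack, which normally requires root privileges.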

The problem does not exist for md/raid, because registering a new sync_thread there does not rely on IO completing. A read-only array can be switched to read-write through ioctl or sysfs, md/raid never sets MD_RECOVERY_WAIT, and if MD_RECOVERY_FROZEN is set it can be cleared through the sysfs API 'sync_action' so that the reshape continues.
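
For a native md array, the read-only and frozen conditions can be inspected from user space. The sketch below is an illustration under stated assumptions: it takes the contents of the md sysfs attributes array_state and sync_action (for example under /sys/block/md0/md/, where md0 is a placeholder device name) and lists which of the two conditions hold. The helper names read_md_attr and reshape_blockers are hypothetical, and MD_RECOVERY_WAIT is left out because md/raid never sets it.

```python
from pathlib import Path
from typing import List


def read_md_attr(md_dir: Path, name: str) -> str:
    """Read one md sysfs attribute, e.g. md_dir = Path('/sys/block/md0/md')."""
    return (md_dir / name).read_text().strip()


def reshape_blockers(array_state: str, sync_action: str) -> List[str]:
    """List the conditions under which a reshape would not make progress."""
    blockers = []
    if array_state == "readonly":
        blockers.append("array is read-only")
    if sync_action == "frozen":
        blockers.append("MD_RECOVERY_FROZEN is set")
    return blockers


print(reshape_blockers("readonly", "frozen"))
# ['array is read-only', 'MD_RECOVERY_FROZEN is set']
print(reshape_blockers("clean", "reshape"))
# []
```

On a real array the two arguments would come from read_md_attr(md_dir, "array_state") and read_md_attr(md_dir, "sync_action").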

The patch addresses dm-raid in two ways. First, it makes sure that raid_message() cannot change the sync_thread after presuspend(). Second, it detects the three cases in which a reshape cannot progress (the array is read-only, MD_RECOVERY_WAIT is set, or MD_RECOVERY_FROZEN is set) before dm_suspend() waits for IO to complete, and lets dm-raid requeue that IO instead.
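
The decision described above can be pictured with a toy user-space model. This is a sketch under simplifying assumptions, not kernel code: ArrayState, Action, reshape_can_progress and io_across_reshape are hypothetical names, the boolean fields merely stand in for the kernel's read-only state and recovery flags, and suspend sequencing and locking are ignored entirely.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    WAIT = "wait for reshape"
    REQUEUE = "requeue IO"


@dataclass
class ArrayState:
    read_only: bool = False
    recovery_wait: bool = False    # stands in for MD_RECOVERY_WAIT
    recovery_frozen: bool = False  # stands in for MD_RECOVERY_FROZEN


def reshape_can_progress(state: ArrayState) -> bool:
    """A reshape is stuck if any one of the three conditions holds."""
    return not (state.read_only or state.recovery_wait or state.recovery_frozen)


def io_across_reshape(state: ArrayState) -> Action:
    """Decide what to do with IO that crosses the reshape position."""
    if reshape_can_progress(state):
        return Action.WAIT   # the reshape will advance, so waiting terminates
    return Action.REQUEUE    # waiting would never end, so hand the IO back


print(io_across_reshape(ArrayState()).value)                      # wait for reshape
print(io_across_reshape(ArrayState(recovery_frozen=True)).value)  # requeue IO
```

The point of the model is only that waiting is safe when the reshape can advance, and that requeueing replaces waiting in every other case.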

With this patch in place, the deadlock in dm-raid456 is resolved. The issue is tracked as CVE-2024-26962. Update your Linux systems to a kernel that includes the fix to avoid running into this deadlock.

Original references

- Commit c467e97f079f
- dm-raid: detect and handle the deadlock in dm_suspend()
- md/raid6: use valid sector values to determine if an I/O should wait on the reshape

Timeline

Published on: 05/01/2024 06:15:12 UTC
Last modified on: 12/23/2024 13:39:33 UTC