A vulnerability in the Linux kernel's IB/ipoib subsystem has been discovered and resolved. Tracked as CVE-2023-52587, it allowed a race condition in ipoib_mcast_join_task(), which iterated priv->multicast_list while temporarily releasing the lock that protects it, priv->lock. The result could be a hard lockup, as reported on the RHEL 4.18.0-372.75.1.el8_6 kernel.
Affected Components
- Linux kernel's IB/ipoib subsystem
Exploit Details
To understand the vulnerability and how it was fixed, consider the following scenario with two tasks running concurrently:
Task A (kworker/u72:2 below)          | Task B (kworker/u72:0 below)
--------------------------------------+----------------------------------------------
ipoib_mcast_join_task(work)           | ipoib_ib_dev_flush_light(work)
  spin_lock_irq(&priv->lock)          | __ipoib_ib_dev_flush(priv, ...)
  list_for_each_entry(mcast,          | ipoib_mcast_dev_flush(dev = priv->dev)
      &priv->multicast_list, list)    |
    ipoib_mcast_join(dev, mcast)      |
      spin_unlock_irq(&priv->lock)    |
                                      |   spin_lock_irqsave(&priv->lock, flags)
                                      |   list_for_each_entry_safe(mcast, tmcast,
                                      |       &priv->multicast_list, list)
                                      |     list_del(&mcast->list);
                                      |     list_add_tail(&mcast->list, &remove_list)
                                      |   spin_unlock_irqrestore(&priv->lock, flags)
      spin_lock_irq(&priv->lock)      |
                                      |   ipoib_mcast_remove_list(&remove_list)
  (Here, mcast is no longer on the    |     list_for_each_entry_safe(mcast, tmcast,
  priv->multicast_list and we keep    |         remove_list, list)
  spinning on the remove_list of      |   >>>  wait_for_completion(&mcast->done)
  the other thread which is blocked   |
  and the list is still valid on      |
  its stack.)                         |
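The race arises in the window where Task A drops priv->lock with spin_unlock_irq() in the middle of its iteration over priv->multicast_list. In that window, Task B acquires the lock and moves the entries from priv->multicast_list onto its own remove_list. When Task A re-acquires the lock and resumes, its current mcast entry is linked into remove_list, so following the next pointers never leads back to the head of priv->multicast_list. The loop's termination condition is never met, and Task A spins with the lock held and interrupts disabled, which produces the hard lockup.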
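The never-ending iteration can be replayed deterministically in user space. The following is a minimal single-threaded sketch, not kernel code: `struct list_node` and its helpers are hypothetical stand-ins modeled on the kernel's circular `struct list_head`, and the walk is capped at `max_steps` so that the demonstration terminates where the real loop would not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's circular, doubly linked list_head. */
struct list_node {
    struct list_node *next, *prev;
};

static void node_list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

static void node_add_tail(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static void node_del(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/*
 * Follow next pointers from cursor until head is reached, which is how a
 * list_for_each_entry()-style loop decides to stop. Gives up after max_steps
 * so the sketch terminates; the real loop has no such bound.
 */
static bool walk_reaches_head(const struct list_node *cursor,
                              const struct list_node *head, int max_steps)
{
    for (int i = 0; i < max_steps; i++) {
        cursor = cursor->next;
        if (cursor == head)
            return true;
    }
    return false;
}

/* Replays the interleaving; returns true if Task A's walk terminates. */
static bool replay_walk(bool flush_during_walk)
{
    struct list_node multicast_list, remove_list, items[3];

    node_list_init(&multicast_list);
    node_list_init(&remove_list);
    for (size_t i = 0; i < 3; i++)
        node_add_tail(&items[i], &multicast_list);

    /* Task A: paused at the first entry with the lock dropped. */
    const struct list_node *cursor = &items[0];

    if (flush_during_walk) {
        /* Task B: move every entry onto its private remove_list. */
        for (size_t i = 0; i < 3; i++) {
            node_del(&items[i]);
            node_add_tail(&items[i], &remove_list);
        }
        assert(multicast_list.next == &multicast_list); /* now empty */
    }

    /* Task A resumes and keeps following next pointers. */
    return walk_reaches_head(cursor, &multicast_list, 1000);
}
```

Calling `replay_walk(false)` returns true because the walk reaches the head of the original list. Calling `replay_walk(true)` returns false: the cursor now circulates on `remove_list`, whose ring never contains the head that the loop is waiting for.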
Fix
The fix keeps priv->lock held for the entire iteration over priv->multicast_list. Because the lock is now held across the memory allocation, the GFP_KERNEL allocation flag was replaced with GFP_ATOMIC so that the allocation cannot sleep while the spinlock is held:
- spin_unlock_irq(&priv->lock);
- if (!ret)
- wait_for_completion(&mcast->done);
- spin_lock_irq(&priv->lock);
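The effect of holding the lock for the whole iteration can be illustrated with a small user-space sketch. It is an illustration under stated assumptions rather than the kernel implementation: `struct toy_priv` and the `toy_*` functions are hypothetical names, a C11 `atomic_flag` plays the role of priv->lock, and the competing flush is replayed inline on a single thread. The GFP_ATOMIC aspect of the fix is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_GROUPS 4

/* Hypothetical stand-in for the driver state: a lock and the list it guards. */
struct toy_priv {
    atomic_flag lock;          /* plays the role of priv->lock */
    bool on_list[NUM_GROUPS];  /* true while entry i is still on the list */
};

static bool toy_trylock(struct toy_priv *p)
{
    return !atomic_flag_test_and_set(&p->lock);
}

static void toy_unlock(struct toy_priv *p)
{
    atomic_flag_clear(&p->lock);
}

/* Flusher: empties the list, but only if it can take the lock. */
static bool toy_flush(struct toy_priv *p)
{
    if (!toy_trylock(p))
        return false; /* a real spinlock would wait here for the walker */
    for (size_t i = 0; i < NUM_GROUPS; i++)
        p->on_list[i] = false;
    toy_unlock(p);
    return true;
}

/*
 * Walker: visits every entry and counts how many were still on the list
 * when visited. drop_lock_per_entry selects the old behaviour (unlock and
 * relock around each entry) or the fixed one (lock held throughout).
 */
static size_t replay_join(bool drop_lock_per_entry)
{
    struct toy_priv p = { .lock = ATOMIC_FLAG_INIT };
    size_t still_listed = 0;

    for (size_t i = 0; i < NUM_GROUPS; i++)
        p.on_list[i] = true;

    bool locked = toy_trylock(&p);
    assert(locked);
    (void)locked;

    for (size_t i = 0; i < NUM_GROUPS; i++) {
        if (drop_lock_per_entry) {
            toy_unlock(&p);
            toy_flush(&p);               /* flusher wins the lock in the gap */
            locked = toy_trylock(&p);
            assert(locked);
        } else {
            bool flushed = toy_flush(&p); /* flusher is kept out */
            assert(!flushed);
            (void)flushed;
        }
        if (p.on_list[i])
            still_listed++;
    }

    toy_unlock(&p);
    return still_listed;
}
```

With the lock held throughout, `replay_join(false)` finds all `NUM_GROUPS` entries intact. With the unlock and relock in place, `replay_join(true)` finds none, because the flusher empties the list during the first gap.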
Although the lockup could not be reproduced consistently, code review and the applied fix suggest that the vulnerability is resolved.
To avoid a potential hard lockup from this race condition, upgrade to a kernel version that includes the patch.
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c3aa5371c1904c3252937a2dc6e5ac201a16b134
Timeline
Published on: 03/06/2024 07:15:07 UTC
Last modified on: 06/27/2024 12:15:14 UTC