A use-after-free (UAF) vulnerability in the Linux kernel's DRM/i915 VMA handling has been resolved. Object debugging tools were sporadically reporting illegal attempts to free a still-active i915 VMA object when parking a GT believed to be idle. The full details are available in the commit message of the fix: drm/i915/vma: Fix UAF on destroy against retire race.

The issue occurred when another thread was deactivating the VMA inside the __active_retire() helper function, after the VMA's active counter had already been decremented to 0 but before the deactivation of the VMA's object was reported to the object debugging tool. Rather than fixing the issue at the i915_active level, the patch addresses it at the VMA level.
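The window described above can be replayed deterministically in a small userspace model. This is only an illustrative sketch, not kernel code: struct model_vma, the two retire steps, and park_and_destroy() are hypothetical names that stand in for the active counter, the debug-object report, and the parking path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a VMA; not the kernel's struct i915_vma. */
struct model_vma {
    int  active_count;  /* models the i915_active counter */
    bool debug_active;  /* what the object debugging tool believes */
    bool freed;         /* set once the parking path destroyed the VMA */
};

/* Retire, step 1: drop the last active reference. */
static void retire_step_dec(struct model_vma *vma)
{
    vma->active_count--;
}

/* Retire, step 2: report the object as inactive to the debug tool. */
static void retire_step_report(struct model_vma *vma)
{
    vma->debug_active = false;
}

/*
 * Parking path: destroys a VMA whose counter reads zero.
 * Returns true if the debug tool would flag a free of an active object.
 */
static bool park_and_destroy(struct model_vma *vma)
{
    if (vma->active_count != 0)
        return false;               /* still busy, nothing destroyed */
    bool violation = vma->debug_active;
    vma->freed = true;
    return violation;
}

/* Problematic interleaving: parking lands between the two retire steps. */
bool replay_race(void)
{
    struct model_vma vma = { .active_count = 1, .debug_active = true };

    retire_step_dec(&vma);                    /* thread A: counter hits 0 */
    bool violation = park_and_destroy(&vma);  /* thread B: parks in the window */
    retire_step_report(&vma);                 /* thread A: too late, VMA is gone */
    return violation;
}

/* Safe interleaving: retire completes before parking starts. */
bool replay_ordered(void)
{
    struct model_vma vma = { .active_count = 1, .debug_active = true };

    retire_step_dec(&vma);
    retire_step_report(&vma);
    return park_and_destroy(&vma);
}
```

In this model, replay_race() reports a violation while replay_ordered() does not, which mirrors why the fix aims to keep parking from starting until the retire path has finished.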

The fix holds the GT wakeref long enough for __active_retire() to complete before that wakeref is released and the GT is parked. The issue is believed to have been introduced by commit d93939730347 ("drm/i915: Remove the vma refcount"), which moved a call to i915_active_fini() from the now-dropped i915_vma_release(), called on the last put of the removed VMA kref, to the i915_vma_parked() processing path, called on the last put of a GT wakeref.

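Accordingly, the patch releases the wakeref when the VMA is retired, with a call like the following (excerpt only, surrounding code elided):

/* i915_vma.c */
{
   /* Async put: the retire path may be called from atomic contexts. */
   intel_gt_pm_put_async(vma->vm->gt);
}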

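Before the fix, a VMA associated with a request did not acquire a GT wakeref by itself. It relied instead on a wakeref held by the request's active intel_context for the GT associated with its VM, and indirectly on that intel_context's engine wakeref. Those wakerefs can be released before the VMA has finished deactivating, so the patch makes the VMA take a wakeref for its GT when it is activated and release it only after it has been deactivated.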

The patch also documents the justification for each of its design choices: it uses the untracked variants of the GT pm_get/put functions, uses the asynchronous variant of the wakeref put, excludes the global GTT from this processing path, and takes care to keep locking dependencies correct.
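The get-on-activate, put-on-retire pattern, including the global GTT exclusion, can be sketched in userspace as follows. This is a simplified model under stated assumptions rather than the driver's implementation: struct model_gt, gt_get(), gt_put(), vma_activate() and vma_retire() are hypothetical names, and the model parks immediately on the last put instead of deferring the work as an asynchronous put would.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a GT with a wakeref counter; not kernel code. */
struct model_gt {
    int  wakeref;   /* number of outstanding wakerefs */
    bool parked;    /* true once the last wakeref has been dropped */
};

struct model_vma {
    struct model_gt *gt;
    bool is_ggtt;   /* global GTT VMAs must not hold the GT awake */
    bool active;
};

static void gt_get(struct model_gt *gt)
{
    gt->wakeref++;
    gt->parked = false;
}

/* Parks on the last put (immediately in this model). */
static void gt_put(struct model_gt *gt)
{
    assert(gt->wakeref > 0);
    if (--gt->wakeref == 0)
        gt->parked = true;
}

/* Activation: a non-GGTT VMA takes its own wakeref. */
void vma_activate(struct model_vma *vma)
{
    if (!vma->is_ggtt)
        gt_get(vma->gt);
    vma->active = true;
}

/* Retire: deactivation finishes first, then the wakeref is dropped. */
void vma_retire(struct model_vma *vma)
{
    vma->active = false;
    if (!vma->is_ggtt)
        gt_put(vma->gt);
}

/*
 * Returns true if the GT stayed unparked while the VMA was active
 * and parked only after the VMA retired.
 */
bool gt_stays_awake_until_retire(bool is_ggtt)
{
    struct model_gt gt = { .wakeref = 0, .parked = true };
    struct model_vma vma = { .gt = &gt, .is_ggtt = is_ggtt, .active = false };

    gt_get(&gt);            /* wakeref held on behalf of the request */
    vma_activate(&vma);
    gt_put(&gt);            /* request's wakeref released early */
    bool awake_while_active = !gt.parked;
    vma_retire(&vma);
    return awake_while_active && gt.parked;
}
```

With a non-GGTT VMA the model GT remains unparked until vma_retire() returns. With a GGTT VMA the early put parks the GT at once, which illustrates the trade-off behind excluding the global GTT: those VMAs do not keep the GT awake.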

For the full details and commit history, refer to the original patch submission and its discussion.

Exploiting this vulnerability would have required an attacker to win a race between threads, and could potentially have led to a use-after-free (UAF). The vulnerability has now been patched, so systems running affected Linux kernel versions are protected once they move to a kernel that includes the fix.

Timeline

Published on: 05/01/2024 06:15:09 UTC
Last modified on: 07/03/2024 01:50:03 UTC