vLLM is a high-throughput, memory-efficient inference and serving engine for large language models (LLMs). It is designed to process prompts quickly, minimizing latency and maximizing efficiency when serving many clients simultaneously.

However, a security issue has been identified in vLLM: maliciously constructed prompts can lead to hash collisions, resulting in cache reuse. This can interfere with subsequent responses and cause unintended behavior.

This issue has been assigned the identifier CVE-2025-25183. This article explains the vulnerability, provides a code snippet demonstrating its impact, links to the original references, and describes how to upgrade and mitigate the issue.

Exploit Details

The core issue stems from the prefix caching mechanism in vLLM, which is built on Python's built-in hash() function. As of Python 3.12, hash(None) returns a predictable constant value, which makes it considerably more feasible for an attacker to engineer hash collisions.
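
As a quick illustration (a minimal sketch, not vLLM code), the snippet below contrasts hash(None), which is a fixed constant on Python 3.12 and later, with string hashes, which are randomized per process via PYTHONHASHSEED:

import sys

print(sys.version)          # e.g. 3.12.x
print(hash(None))           # constant on Python 3.12+, predictable across runs
print(hash("some prompt"))  # randomized per process unless PYTHONHASHSEED is pinned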

An attacker with prior knowledge of the prompts in use, combined with this predictable hashing behavior, could intentionally populate the cache with a prompt crafted to collide with another prompt. A successful collision would cause the engine to serve a cache entry generated from different content, potentially leading to incorrect responses and undesirable behavior.

Here's a simplified example demonstrating the issue:

# Simplified simulation of the vulnerable pattern: the cache is keyed on
# Python's built-in hash() of the prompt, which is not collision-resistant.
cache = {}

def run_model(prompt):
    # Placeholder standing in for a real vLLM generation call.
    return f"response for {prompt!r}"

def process_prompt(prompt):
    hashed = hash(prompt)        # weak cache key
    if hashed in cache:
        return cache[hashed]     # this entry may belong to a *different* prompt
    response = run_model(prompt)
    cache[hashed] = response
    return response

# Attack sketch: on Python 3.12+, hash(None) is a predictable constant, so an
# attacker who pre-populates the cache under a key they know will collide with
# a victim's key gets their cached response served for the victim's prompt.
cache[hash(None)] = run_model("attacker-controlled prompt")

Mitigation

This vulnerability has been addressed in vLLM 0.7.2. All users are advised to upgrade to this version; there are no known workarounds. To upgrade vLLM, use the following command:

pip install vllm==0.7.2
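
For context, the general hardening direction (a sketch of the idea, not the exact patch applied in vLLM 0.7.2) is to derive cache keys from a collision-resistant digest of the full prompt content rather than from Python's built-in hash(). The run_model() call below is the placeholder from the earlier example:

import hashlib

cache = {}

def cache_key(prompt):
    # Collision-resistant key: SHA-256 over the full prompt bytes instead of
    # the non-collision-resistant built-in hash().
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

def process_prompt(prompt):
    key = cache_key(prompt)
    if key in cache:
        return cache[key]
    response = run_model(prompt)   # placeholder vLLM call from the earlier example
    cache[key] = response
    return response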

Original References

- CVE-2025-25183 - Official CVE Details from MITRE
- vLLM Project - vLLM Repository, where you can find release notes, source code, and more information about the project
- Python 3.12 Hash Change - Python 3.12 Release Notes, containing information about the change in the hash function behavior

Conclusion

The CVE-2025-25183 vulnerability in vLLM stems from its use of Python's built-in hash() for prefix caching, made easier to exploit by the change in hash(None) behavior in Python 3.12, and can lead to cache reuse and unintended behavior. The issue is fixed in vLLM 0.7.2, and all users should upgrade to this version to mitigate potential exploits.

Timeline

Published on: 02/07/2025 20:15:34 UTC