CVE-2021-46849: XXE Vulnerability in pikepdf before 2.10.0 Affecting PDF XMP Metadata Parsing

A security vulnerability has been discovered in pikepdf, a Python library for reading and writing PDF files. The vulnerability is an XML External Entity (XXE) attack against PDF XMP metadata parsing. It is tracked as CVE-2021-46849 and affects pikepdf versions prior to 2.10.0. In this post, we will delve into the details of this vulnerability, look at code snippets that demonstrate the issue, and discuss the risks and mitigation strategies.

Background

Pikepdf (https://github.com/pikepdf/pikepdf) is a popular Python library for working with PDF files. It provides users with a simple, clean, and efficient way to read, write, and modify PDF files with Python. However, before version 2.10.0, pikepdf was susceptible to XXE attacks against its XMP metadata parsing functionality.

An XML External Entity (XXE) attack is a type of security vulnerability that abuses the XML parser’s ability to retrieve external entities when parsing an XML document. The attacker prepares a malicious XML file, which includes entities that reference external resources, such as a local file or a remote host, and sends it to the victim. When the XML parser processes the file, it reads the external entities, potentially disclosing sensitive information or allowing for other malicious activity. More details about XXE attacks can be found here: https://owasp.org/www-community/vulnerabilities/XML_External_Entity_(XXE)_Processing

Exploit Details

To exploit the vulnerability, the attacker would create a malicious PDF file whose XMP metadata declares external entities. XMP metadata is stored as an XML packet inside the PDF file. When pikepdf's XMP metadata parser reads this packet, it would attempt to load the external entities, allowing the attacker to access sensitive information or potentially mount further attacks.

Here's a code snippet showing application code that reaches the vulnerable parser in pikepdf:

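import pikepdf

def parse_xmp_metadata(path):
    # Open the PDF; the context manager closes the file when done.
    with pikepdf.open(path) as pdf:
        # open_metadata() parses the XMP packet embedded in the PDF.
        # In affected versions, this is where external entities get resolved.
        xmp_metadata = pdf.open_metadata()
        # ... process the metadata ...

parse_xmp_metadata('malicious_pdf.pdf')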

In the example above, opening a malicious PDF file and subsequently calling pdf.open_metadata() would trigger the XXE attack. The malicious PDF could be constructed with the following XMP packet:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xpacket [
  <!ELEMENT xpacket (ANY)>
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description>
      <dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">&xxe;</dc:title>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>

This XMP packet includes an external entity named "xxe", which would try to load the contents of "/etc/passwd" when parsed.
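To make the declaration concrete, the following is a minimal sketch that lists the external entities a packet declares, using only Python's standard-library expat bindings. It is not how pikepdf parses XMP. The function name find_external_entities and the two sample packets are illustrative assumptions. Because no external entity handler is installed, expat reports the declaration here without fetching the referenced resource.

```python
from xml.parsers import expat

# A cleaned-up copy of the XXE packet discussed in this post.
MALICIOUS_XMP = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xpacket [
  <!ELEMENT xpacket (ANY)>
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description>
      <dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">&xxe;</dc:title>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
"""

# A hypothetical harmless packet with no DOCTYPE at all.
BENIGN_XMP = b"""<?xml version="1.0" encoding="UTF-8"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description>
      <dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Report</dc:title>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
"""


def find_external_entities(xmp_bytes):
    """Return (name, system_id) pairs for external entities declared in the packet."""
    found = []

    def on_entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        # Internal entities carry a literal value; external ones carry a
        # system identifier (and optionally a public identifier) instead.
        if system_id is not None or public_id is not None:
            found.append((name, system_id))

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    try:
        parser.Parse(xmp_bytes, True)
    except expat.ExpatError:
        # Malformed input: still report whatever was declared before the error.
        pass
    return found


print(find_external_entities(MALICIOUS_XMP))  # [('xxe', 'file:///etc/passwd')]
print(find_external_entities(BENIGN_XMP))     # []
```

A check like this could be used to flag suspicious packets during triage, but it is not a substitute for upgrading, since the vulnerable parsing happens inside the library.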

Mitigation

To remediate this vulnerability, pikepdf users should update their library to version 2.10.0 or later. This version includes a security fix that prevents XXE attacks while parsing PDF XMP metadata. Specifically, pikepdf now configures its XML parser to not process external entities when parsing XMP metadata.

Here's the link to the pikepdf 2.10.0 release: https://github.com/pikepdf/pikepdf/releases/tag/2.10.
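One way to confirm that an environment is no longer affected is to compare the installed pikepdf version against the first fixed release. The sketch below assumes 2.10.0 is that release, as stated in the CVE description. The helper names parse_release, is_patched, and installed_pikepdf_is_patched are hypothetical, and the version parsing is deliberately simplified: it ignores pre-release suffixes, so a library such as packaging would be a more robust choice in production.

```python
from importlib import metadata

# Assumed first fixed release, taken from the CVE description.
FIXED_VERSION = (2, 10, 0)


def parse_release(version_string):
    """Turn '2.10.0' into (2, 10, 0); stops at the first non-numeric component."""
    parts = []
    for piece in version_string.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    # Pad so that '2.10' compares equal to '2.10.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_patched(version_string):
    """True if the given version is at or above the assumed fixed release."""
    return parse_release(version_string) >= FIXED_VERSION


def installed_pikepdf_is_patched():
    """Return True/False for the installed pikepdf, or None if it is not installed."""
    try:
        return is_patched(metadata.version("pikepdf"))
    except metadata.PackageNotFoundError:
        return None


print(is_patched("2.9.2"))   # False
print(is_patched("2.10.0"))  # True
```

Calling installed_pikepdf_is_patched() in a startup check or CI job gives a quick signal; a result of False means the environment should be upgraded, for example with pip install --upgrade pikepdf.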

Conclusion

CVE-2021-46849 is a security vulnerability that affects pikepdf versions before 2.10.0. This XXE attack targets PDF XMP metadata parsing and could lead to serious consequences, such as unauthorized access to sensitive information or further exploitation. By updating pikepdf to version 2.10.0 or newer, users can protect their Python applications against this vulnerability when processing untrusted PDF files.

Timeline

Published on: 10/24/2022 14:15:00 UTC
Last modified on: 10/24/2022 16:15:00 UTC