How To Convert Bytes Array to JSON in Python

In this tutorial, you’ll learn several methods of converting byte arrays to JSON in Python.

We’ll deal with standard UTF-8 encoded JSON, non-UTF encodings, Byte Order Marks (BOM), escaped JSON strings, and even create a custom decoder for specialized data.

Table of Contents

1 Basic Conversion Using json.loads()
2 Handling Non-UTF Encodings
3 Dealing with Byte Order Mark (BOM) in JSON Conversion
4 Converting Byte Array Containing Escaped JSON String
5 Using ast.literal_eval() for Safe Conversion
6 Using a Custom Decoder

Basic Conversion Using json.loads()

Decode the bytes to a string with decode('utf-8'), then pass that string to json.loads() to get a Python dictionary.

import json
bytes_data = b'{"calls": 150, "messages": 40, "data": 12.5}'
json_data = json.loads(bytes_data.decode('utf-8'))
print(json_data)

Output:

{'calls': 150, 'messages': 40, 'data': 12.5}

This output shows a Python dictionary with keys and values extracted from the JSON object.
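As a small aside to the example above: since Python 3.6, json.loads() also accepts bytes and bytearray objects directly and detects UTF-8, UTF-16, or UTF-32 by itself. The sketch below reuses the tutorial's sample data; the variable names are only illustrative.

```python
import json

raw = b'{"calls": 150, "messages": 40, "data": 12.5}'

# json.loads() accepts bytes or bytearray and detects the Unicode encoding itself.
from_bytes = json.loads(raw)
from_bytearray = json.loads(bytearray(raw))

# The same JSON text re-encoded as UTF-16 also parses without an explicit decode().
from_utf16 = json.loads(raw.decode("utf-8").encode("utf-16"))

print(from_bytes)
```

An explicit decode() is still the clearer choice when the encoding is known, and it is required for non-Unicode encodings such as ISO-8859-1.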

Handling Non-UTF Encodings

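Imagine you have a bytes array encoded in a different encoding, such as ISO-8859-1 (commonly used in certain European data sets).

Pass the matching encoding name to the decode method to handle a non-UTF bytes array. The sample below contains only ASCII characters, so it would decode identically under UTF-8; the choice of codec starts to matter once the data contains bytes above 0x7F, such as accented letters: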

import json
bytes_data = b'{\x22service_calls\x22: 75, \x22network_issues\x22: 5}'  # Encoded in ISO-8859-1
json_data = json.loads(bytes_data.decode('iso-8859-1'))
print(json_data)

Output:

{'service_calls': 75, 'network_issues': 5}

This output shows the JSON object successfully decoded and converted into a Python dictionary.
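To see where the encoding argument actually changes the result, here is a minimal sketch with a non-ASCII character. The helper name load_json_bytes and the sample record are assumptions made for illustration, not part of the original example.

```python
import json

def load_json_bytes(data: bytes, encoding: str = "utf-8"):
    """Decode bytes with the given codec, then parse the text as JSON."""
    return json.loads(data.decode(encoding))

# "Zürich" encoded as ISO-8859-1: the ü becomes the single byte 0xFC.
latin1_bytes = '{"city": "Z\u00fcrich", "service_calls": 75}'.encode("iso-8859-1")

record = load_json_bytes(latin1_bytes, encoding="iso-8859-1")
print(record)

# Decoding the same bytes as UTF-8 fails, because 0xFC is not a valid UTF-8 start byte.
try:
    load_json_bytes(latin1_bytes)
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
```

Calling json.loads() directly on these bytes would fail in the same way, because its automatic detection only covers UTF-8, UTF-16, and UTF-32.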

Dealing with Byte Order Mark (BOM) in JSON Conversion

When working with JSON data from various sources, you may encounter a Byte Order Mark (BOM).

This is common in files originating from systems where UTF-16 or UTF-8 with BOM is the standard encoding.

The BOM can cause issues during the conversion process if not handled correctly.

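You can remove the BOM (BOM_UTF8) from the start of the bytes array before decoding and loading it as JSON. Avoid lstrip() for this: it treats its argument as a set of individual bytes to strip rather than as a prefix, so it can remove more than the BOM.

import json
from codecs import BOM_UTF8
bytes_data = BOM_UTF8 + b'{"feedback": "Excellent", "usage": 23.7}'
# Drop the BOM only if it is present as an exact prefix
clean_data = bytes_data[len(BOM_UTF8):] if bytes_data.startswith(BOM_UTF8) else bytes_data
json_data = json.loads(clean_data.decode('utf-8'))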
print(json_data)

Output:

{'feedback': 'Excellent', 'usage': 23.7}

This output shows a successful conversion to a Python dictionary, free from any issues caused by the BOM.
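An alternative to removing the BOM by hand is Python's built-in utf-8-sig codec, which skips a leading UTF-8 BOM if one is present and otherwise behaves like utf-8. The sketch below also covers the UTF-16 case mentioned earlier; the variable names are illustrative only.

```python
import json
from codecs import BOM_UTF8

payload = b'{"feedback": "Excellent", "usage": 23.7}'

# utf-8-sig consumes a leading BOM when present and is harmless when absent.
with_bom = json.loads((BOM_UTF8 + payload).decode("utf-8-sig"))
without_bom = json.loads(payload.decode("utf-8-sig"))

# For UTF-16 data, the utf-16 codec reads the BOM to pick the byte order.
utf16_bytes = payload.decode("utf-8").encode("utf-16")
from_utf16 = json.loads(utf16_bytes.decode("utf-16"))

print(with_bom)
```

This keeps the decoding step to a single call and avoids any manual slicing of the input.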

Converting Byte Array Containing Escaped JSON String

Sometimes, you might encounter a byte array that contains an escaped JSON string.

Here’s how to handle and convert an escaped JSON string in a byte array:

import json
bytes_data = b'"{\\"customer_id\\": 123, \\"plan\\": \\"premium\\", \\"active\\": true}"'
string_data = bytes_data.decode('utf-8')
unescaped_json = json.loads(string_data)
json_data = json.loads(unescaped_json)
print(json_data)

Output:

{'customer_id': 123, 'plan': 'premium', 'active': True}

Here, the byte array is first decoded into a string. That string is itself a JSON document whose only value is a string, so the first json.loads() call returns the inner, unescaped JSON text.

The second json.loads() call then converts that text into a Python dictionary.
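Escaped JSON like this usually appears when data has been serialized twice. The following sketch reproduces that situation and unwraps it with a small loop; loads_nested and its max_depth parameter are hypothetical names introduced for illustration.

```python
import json

def loads_nested(data, max_depth=5):
    """Parse JSON, then keep parsing while the result is still a JSON-encoded string."""
    value = json.loads(data)
    for _ in range(max_depth):
        if not isinstance(value, str):
            break
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            # A plain string that is not JSON: return it unchanged.
            break
    return value

original = {"customer_id": 123, "plan": "premium", "active": True}

# Serializing twice produces the escaped form shown in the example above.
double_encoded = json.dumps(json.dumps(original)).encode("utf-8")

print(loads_nested(double_encoded))
```

One caveat of this approach: a genuine string value that happens to look like JSON, such as "123", would also be parsed again, so it fits best when you know the payload is an object or array.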

Using ast.literal_eval() for Safe Conversion

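json.loads() only parses data and never executes code, so it is already safe to use on untrusted input. It does, however, reject anything that is not strict JSON, such as the text form of a Python dictionary with single-quoted keys.

For bytes that contain a Python literal rather than JSON, you can use ast.literal_eval() from Python’s ast module, which evaluates a string containing a Python literal without running arbitrary code.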

Check the following example:

import ast
bytes_data = b"{'user_id': 200, 'status': 'active', 'balance': 45.30}"
string_data = bytes_data.decode('utf-8')
dict_data = ast.literal_eval(string_data)
print(dict_data)

Output:

{'user_id': 200, 'status': 'active', 'balance': 45.3}

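ast.literal_eval() parses only Python literals (strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None), so unlike eval() it cannot run arbitrary code. It is not a JSON parser, though: JSON’s true, false, and null are not Python literals and raise a ValueError.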
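The difference between the two parsers can be sketched as follows. The helper parse_json_or_literal is a hypothetical name; it tries strict JSON first and falls back to ast.literal_eval() only when that fails.

```python
import ast
import json

def parse_json_or_literal(data: bytes):
    """Try strict JSON first; fall back to a Python literal if that fails."""
    text = data.decode("utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Raises ValueError or SyntaxError if the text is not a literal either.
        return ast.literal_eval(text)

json_style = parse_json_or_literal(b'{"user_id": 200, "active": true}')
python_style = parse_json_or_literal(b"{'user_id': 200, 'active': True}")

# literal_eval alone cannot read JSON keywords such as true.
try:
    ast.literal_eval('{"active": true}')
    literal_rejected_json = False
except ValueError:
    literal_rejected_json = True

print(json_style, python_style)
```

If the producer of the data is under your control, emitting real JSON with json.dumps() is generally the simpler fix than accepting both formats.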

Using a Custom Decoder

If you have unique data formats or specific parsing requirements, you may need to go beyond the standard JSON decoding.

Here’s how you can create and use a custom JSON decoder:

import json
class CustomDecoder(json.JSONDecoder):
    def decode(self, s):
        # Custom decoding logic goes here
        # For demonstration, let's convert all string values to uppercase
        result = super().decode(s)
        return {k: v.upper() if isinstance(v, str) else v for k, v in result.items()}
bytes_data = b'{"account": "user123", "service": "mobile", "active": true}'
json_data = json.loads(bytes_data.decode('utf-8'), cls=CustomDecoder)
print(json_data)

Output:

{'account': 'USER123', 'service': 'MOBILE', 'active': True}

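In this output, you can see the custom decoder in action. The decoder class inherits from json.JSONDecoder and overrides the decode method.

In our example, the custom logic converts string values at the top level of the JSON object to uppercase. Nested objects are left unchanged, and a top-level JSON array would raise an AttributeError because lists have no items() method.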
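When the same transformation should also reach nested objects, one option is the object_hook parameter of json.loads(), which is called for every decoded JSON object, innermost first. This is a sketch of that alternative, not the approach used above; upper_strings and the nested sample data are illustrative assumptions.

```python
import json

def upper_strings(obj: dict) -> dict:
    """Uppercase the string values of one decoded JSON object."""
    return {k: v.upper() if isinstance(v, str) else v for k, v in obj.items()}

bytes_data = b'{"account": "user123", "profile": {"service": "mobile", "active": true}}'

# object_hook runs for the inner "profile" object first, then for the outer one.
json_data = json.loads(bytes_data.decode("utf-8"), object_hook=upper_strings)
print(json_data)
```

Strings stored inside JSON arrays are not touched by this hook, since it only receives objects; handling those would need extra logic.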

Mokhtar Ebrahim

Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.
