How To Convert Bytes Array to JSON in Python

In this tutorial, you’ll learn several methods of converting byte arrays to JSON in Python.

We’ll deal with standard UTF-8 encoded JSON, non-UTF encodings, Byte Order Marks (BOM), escaped JSON strings, and even create a custom decoder for specialized data.

 

 

Basic Conversion Using json.loads()

You can decode the bytes to a string with decode('utf-8') and then pass the result to json.loads(), which parses the JSON text into a Python dictionary.

import json
bytes_data = b'{"calls": 150, "messages": 40, "data": 12.5}'
json_data = json.loads(bytes_data.decode('utf-8'))
print(json_data)

Output:

{'calls': 150, 'messages': 40, 'data': 12.5}

This output shows a Python dictionary with keys and values extracted from the JSON object.
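Since Python 3.6, json.loads() also accepts bytes and bytearray objects directly and detects UTF-8, UTF-16, or UTF-32 on its own. The sketch below uses that to wrap the conversion in basic error handling. The helper name parse_json_bytes is hypothetical, and returning None on failure is just one possible design choice:

```python
import json

def parse_json_bytes(raw):
    """Parse JSON from bytes; return None if the payload is not valid JSON.

    parse_json_bytes is a hypothetical helper name used for illustration.
    """
    try:
        # json.loads() accepts bytes directly (Python 3.6+) and
        # detects UTF-8, UTF-16, or UTF-32 on its own.
        return json.loads(raw)
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None

good = parse_json_bytes(b'{"calls": 150, "messages": 40, "data": 12.5}')
bad = parse_json_bytes(b'{"calls": 150,')  # truncated payload
print(good)  # {'calls': 150, 'messages': 40, 'data': 12.5}
print(bad)   # None
```

If you need to tell bad encoding apart from bad JSON, catch the two exceptions separately instead of collapsing them into None.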

 

Handling Non-UTF Encodings

Imagine you have a bytes array encoded in a different encoding, such as ISO-8859-1 (commonly used in certain European data sets).

You can use the appropriate encoding in the decode method to handle a non-UTF bytes array:

import json
bytes_data = b'{\x22service_calls\x22: 75, \x22network_issues\x22: 5}'  # \x22 is the double-quote byte; the payload is treated as ISO-8859-1
json_data = json.loads(bytes_data.decode('iso-8859-1'))
print(json_data)

Output:

{'service_calls': 75, 'network_issues': 5}

This output shows the JSON object successfully decoded and converted into a Python dictionary.
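The bytes in the example above are all ASCII, so they would decode the same way under UTF-8 and ISO-8859-1. The choice of codec only matters once the data contains a non-ASCII character. The following sketch uses a made-up payload with the city name "Montréal" to show the difference:

```python
import json

# "Montréal" encoded as ISO-8859-1: the letter é is the single byte 0xE9.
bytes_data = b'{"city": "Montr\xe9al", "service_calls": 75}'

try:
    # 0xE9 followed by "a" is not a valid UTF-8 sequence.
    json.loads(bytes_data.decode('utf-8'))
except UnicodeDecodeError as exc:
    print('UTF-8 decoding failed:', exc.reason)

# Decoding with the encoding the data was actually written in succeeds.
json_data = json.loads(bytes_data.decode('iso-8859-1'))
print(json_data)
```

Keep in mind that ISO-8859-1 maps every possible byte to a character, so decoding with it never raises an error. If the source encoding is actually something else, you get garbled text rather than an exception.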

 

Dealing with Byte Order Mark (BOM) in JSON Conversion

When working with JSON data from various sources, you may encounter a Byte Order Mark (BOM).

This is common in files originating from systems where UTF-16 or UTF-8 with BOM is the standard encoding.

The BOM can cause issues during the conversion process if not handled correctly.

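You can decode with the utf-8-sig codec, which removes a leading UTF-8 BOM (BOM_UTF8) if one is present, before loading the data as JSON. Avoid bytes.lstrip(BOM_UTF8) for this: lstrip() treats its argument as a set of individual bytes rather than a prefix, so it can strip more than the BOM.

import json
from codecs import BOM_UTF8
bytes_data = BOM_UTF8 + b'{"feedback": "Excellent", "usage": 23.7}'
# utf-8-sig drops the BOM if present and otherwise behaves like utf-8
json_data = json.loads(bytes_data.decode('utf-8-sig'))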
print(json_data)

Output:

{'feedback': 'Excellent', 'usage': 23.7}

This output shows a successful conversion to a Python dictionary, free from any issues caused by the BOM.
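A BOM can also appear in front of UTF-16 data, as mentioned earlier. The sketch below builds the same payload as UTF-16 with a BOM and shows two ways to read it. It also relies on the fact that json.loads() detects a BOM by itself when it is given bytes instead of a string (Python 3.6+):

```python
import json
from codecs import BOM_UTF8, BOM_UTF16_LE

payload = '{"feedback": "Excellent", "usage": 23.7}'
expected = {'feedback': 'Excellent', 'usage': 23.7}

utf8_bom = BOM_UTF8 + payload.encode('utf-8')
utf16_bom = BOM_UTF16_LE + payload.encode('utf-16-le')

# The 'utf-16' codec reads the BOM to pick the byte order, then drops it.
from_utf16 = json.loads(utf16_bom.decode('utf-16'))

# Passing bytes straight to json.loads() lets it detect the BOM itself.
from_utf8_bytes = json.loads(utf8_bom)
from_utf16_bytes = json.loads(utf16_bom)

print(from_utf16 == from_utf8_bytes == from_utf16_bytes == expected)  # True
```

Note that json.loads() raises a JSONDecodeError if you hand it a string that still starts with the BOM character, so the BOM has to be removed during decoding rather than afterwards.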

 

Converting Byte Array Containing Escaped JSON String

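Sometimes, you might encounter a byte array that contains an escaped JSON string, that is, a JSON string value whose content is itself a JSON document. This often happens when data has been JSON-encoded twice.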

Here’s how to handle and convert an escaped JSON string in a byte array:

import json
bytes_data = b'"{\\"customer_id\\": 123, \\"plan\\": \\"premium\\", \\"active\\": true}"'
string_data = bytes_data.decode('utf-8')
unescaped_json = json.loads(string_data)
json_data = json.loads(unescaped_json)
print(json_data)

Output:

{'customer_id': 123, 'plan': 'premium', 'active': True}

In this example, the byte array is first decoded into a string. That string is a JSON string literal, so the first json.loads() call unescapes it and returns the inner JSON text as a Python string.

Finally, the second json.loads() call converts this unescaped text into a Python dictionary.
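When you do not know in advance how many times a payload was encoded, the two-step approach can be generalized into a loop. This is a minimal sketch. The helper name loads_nested and its max_depth parameter are hypothetical, not part of the json module:

```python
import json

def loads_nested(raw, max_depth=5):
    """Decode JSON that may have been string-encoded more than once.

    loads_nested and max_depth are hypothetical names for this sketch.
    """
    value = json.loads(raw)
    depth = 0
    # Keep parsing while the result is still a string that holds JSON.
    while isinstance(value, str) and depth < max_depth:
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            break  # A plain string, not another layer of JSON.
        depth += 1
    return value

bytes_data = b'"{\\"customer_id\\": 123, \\"plan\\": \\"premium\\", \\"active\\": true}"'
print(loads_nested(bytes_data))
# {'customer_id': 123, 'plan': 'premium', 'active': True}
```

One caveat of this design: a string value that happens to look like JSON, such as "123", is also converted (here to the integer 123). Use the explicit two-step version when the number of layers is known.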

 

Using ast.literal_eval() for Safe Conversion

json.loads() never executes code, but it rejects input that is not valid JSON, such as a Python dictionary literal written with single quotes.

For that kind of data, you can use ast.literal_eval() from Python’s ast module, which evaluates a string containing only Python literals without running arbitrary code.

Check the following example:

import ast
bytes_data = b"{'user_id': 200, 'status': 'active', 'balance': 45.30}"
string_data = bytes_data.decode('utf-8')
dict_data = ast.literal_eval(string_data)
print(dict_data)

Output:

{'user_id': 200, 'status': 'active', 'balance': 45.3}

ast.literal_eval() is designed to parse strings containing Python literals and is much safer than using eval(), as it avoids the security risks associated with the latter.
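The sketch below puts the pieces together: it shows that the single-quoted payload is not valid JSON, converts it with ast.literal_eval(), serializes the result to real JSON with json.dumps(), and checks that a non-literal expression is rejected. The expression string is a made-up example of hostile input:

```python
import ast
import json

bytes_data = b"{'user_id': 200, 'status': 'active', 'balance': 45.30}"

# Single-quoted keys make this a Python literal, not valid JSON.
try:
    json.loads(bytes_data)
except json.JSONDecodeError:
    print('Not valid JSON')

dict_data = ast.literal_eval(bytes_data.decode('utf-8'))

# json.dumps() turns the dictionary into a real JSON string.
json_text = json.dumps(dict_data)
print(json_text)  # {"user_id": 200, "status": "active", "balance": 45.3}

# literal_eval() refuses anything that is not a plain literal.
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    print('Rejected non-literal expression')
```

If the bytes already contain valid JSON, json.loads() remains the better choice, since ast.literal_eval() does not understand JSON keywords such as true, false, and null.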

 

Using a Custom Decoder

If you have unique data formats or specific parsing requirements, you may need to go beyond the standard JSON decoding.

Here’s how you can create and use a custom JSON decoder:

import json
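class CustomDecoder(json.JSONDecoder):
    def decode(self, s):
        # Parse with the standard decoder first, then post-process the result
        result = super().decode(s)
        # For demonstration, convert top-level string values to uppercase
        if isinstance(result, dict):
            return {k: v.upper() if isinstance(v, str) else v for k, v in result.items()}
        # Leave lists, numbers, and other top-level values unchanged
        return result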
bytes_data = b'{"account": "user123", "service": "mobile", "active": true}'
json_data = json.loads(bytes_data.decode('utf-8'), cls=CustomDecoder)
print(json_data)

Output:

{'account': 'USER123', 'service': 'MOBILE', 'active': True}

In this output, you can see the custom decoder in action. The decoder class inherits from json.JSONDecoder and overrides the decode method.

In our example, the custom logic converts the top-level string values in the JSON data to uppercase. Strings inside nested objects or lists are left unchanged.
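Overriding decode() only post-processes the final result. As an alternative, json.loads() accepts an object_hook callback that is called for every decoded JSON object, including nested ones. The sketch below applies the same uppercase rule that way. The function name uppercase_strings and the nested payload are made up for illustration:

```python
import json

def uppercase_strings(obj):
    """object_hook callback: uppercase the string values of one JSON object."""
    return {k: v.upper() if isinstance(v, str) else v for k, v in obj.items()}

bytes_data = b'{"account": "user123", "plan": {"service": "mobile", "active": true}}'
json_data = json.loads(bytes_data, object_hook=uppercase_strings)
print(json_data)
# {'account': 'USER123', 'plan': {'service': 'MOBILE', 'active': True}}
```

The hook only sees JSON objects (dictionaries), so strings stored directly inside a JSON array would still need separate handling.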
