How To Convert Bytes Array to JSON in Python
In this tutorial, you’ll learn several methods of converting byte arrays to JSON in Python.
We’ll deal with standard UTF-8 encoded JSON, non-UTF encodings, Byte Order Marks (BOM), escaped JSON strings, and even create a custom decoder for specialized data.
Basic Conversion Using json.loads()
You can decode the bytes to a string and pass the result to json.loads() to convert the data into a Python dictionary.
import json

bytes_data = b'{"calls": 150, "messages": 40, "data": 12.5}'
json_data = json.loads(bytes_data.decode('utf-8'))
print(json_data)
Output:
{'calls': 150, 'messages': 40, 'data': 12.5}
This output shows a Python dictionary with keys and values extracted from the JSON object.
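As a side note, json.loads() has accepted bytes and bytearray input directly since Python 3.6, as long as the data is encoded as UTF-8, UTF-16, or UTF-32. The sketch below illustrates this; the variable names are only for illustration.

```python
import json

# json.loads() accepts bytes directly (Python 3.6+), so the explicit
# decode step is optional for UTF-8, UTF-16, and UTF-32 input.
utf8_bytes = b'{"calls": 150, "messages": 40, "data": 12.5}'
from_utf8 = json.loads(utf8_bytes)

# The same document encoded as UTF-16 is detected automatically.
utf16_bytes = '{"calls": 150, "messages": 40, "data": 12.5}'.encode('utf-16')
from_utf16 = json.loads(utf16_bytes)

print(from_utf8)
print(from_utf8 == from_utf16)
```

Decoding explicitly, as in the example above, is still the clearer choice when you know the encoding or when it is not one of the UTF family.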
Handling Non-UTF Encodings
Imagine you have a bytes array encoded in a different encoding, such as ISO-8859-1 (commonly used in certain European data sets).
You can pass the appropriate encoding to the decode() method to handle a non-UTF-8 bytes array:
import json

# Encoded in ISO-8859-1
bytes_data = b'{\x22service_calls\x22: 75, \x22network_issues\x22: 5}'
json_data = json.loads(bytes_data.decode('iso-8859-1'))
print(json_data)
Output:
{'service_calls': 75, 'network_issues': 5}
This output shows the JSON object successfully decoded and converted into a Python dictionary.
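The bytes in the example above happen to be plain ASCII, so they would decode the same way under UTF-8. The following sketch uses a byte that is valid in ISO-8859-1 but not in UTF-8 (0xE1, the letter á) and tries a list of candidate encodings in order. The helper name loads_with_fallback and the sample data are assumptions made for illustration, not part of any library.

```python
import json

def loads_with_fallback(raw, encodings=("utf-8", "iso-8859-1")):
    """Try each candidate encoding in order and parse the first that decodes.

    ISO-8859-1 maps every byte to a character, so it never raises; keep it
    last, and be aware that a wrong guess yields garbled text, not an error.
    """
    for encoding in encodings:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the bytes; try the next
        return json.loads(text)
    raise ValueError("none of the candidate encodings could decode the data")

# 0xE1 is 'á' in ISO-8859-1; followed by 'l' it is an invalid UTF-8 sequence.
latin1_bytes = b'{"city": "M\xe1laga", "service_calls": 75}'
result = loads_with_fallback(latin1_bytes)
print(result)
```

When the source documents its encoding (for example in an HTTP header), prefer that over guessing.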
Dealing with Byte Order Mark (BOM) in JSON Conversion
When working with JSON data from various sources, you may encounter a Byte Order Mark (BOM).
This is common in files originating from systems where UTF-16 or UTF-8 with BOM is the standard encoding.
The BOM can cause issues during the conversion process if not handled correctly.
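You can decode with the utf-8-sig codec, which drops a leading BOM (BOM_UTF8) if one is present, before loading the data as JSON. Avoid bytes.lstrip(BOM_UTF8) for this: lstrip() treats its argument as a set of byte values to strip, not as a prefix.
import json
from codecs import BOM_UTF8

bytes_data = BOM_UTF8 + b'{"feedback": "Excellent", "usage": 23.7}'
# utf-8-sig removes the BOM if present and otherwise behaves like utf-8
json_data = json.loads(bytes_data.decode('utf-8-sig'))
print(json_data)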
Output:
{'feedback': 'Excellent', 'usage': 23.7}
This output shows a successful conversion to a Python dictionary, free from any issues caused by the BOM.
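To see why the BOM matters, the sketch below decodes the same bytes with plain utf-8, which keeps the BOM as the character U+FEFF and makes json.loads() raise an error. It also shows one way to remove the BOM strictly as a prefix. The helper name strip_utf8_bom is hypothetical.

```python
import json
from codecs import BOM_UTF8

def strip_utf8_bom(raw: bytes) -> bytes:
    """Remove a single leading UTF-8 BOM, treating it as a prefix."""
    if raw.startswith(BOM_UTF8):
        return raw[len(BOM_UTF8):]
    return raw

raw = BOM_UTF8 + b'{"feedback": "Excellent", "usage": 23.7}'

# Plain utf-8 keeps the BOM as '\ufeff', which json.loads() rejects.
try:
    json.loads(raw.decode("utf-8"))
    bom_rejected = False
except json.JSONDecodeError:
    bom_rejected = True

parsed = json.loads(strip_utf8_bom(raw).decode("utf-8"))
print(bom_rejected)
print(parsed)
```

On Python 3.9 and later, raw.removeprefix(BOM_UTF8) does the same job as the helper.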
Converting Byte Array Containing Escaped JSON String
Sometimes, you might encounter a byte array that contains an escaped JSON string.
Here’s how to handle and convert an escaped JSON string in a byte array:
import json

bytes_data = b'"{\\"customer_id\\": 123, \\"plan\\": \\"premium\\", \\"active\\": true}"'
string_data = bytes_data.decode('utf-8')
unescaped_json = json.loads(string_data)
json_data = json.loads(unescaped_json)
print(json_data)
Output:
{'customer_id': 123, 'plan': 'premium', 'active': True}
Here, the byte array is first decoded into a string. That string is itself a JSON string literal whose content is escaped JSON, so the first json.loads() call unescapes it.
The second json.loads() call then converts the unescaped string into a Python dictionary.
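If you don't know in advance whether the payload was encoded once or twice, one possible approach is to keep parsing while the result is still a string. This is a minimal sketch under that assumption; loads_unwrapping and max_depth are hypothetical names. Note the ambiguity: a genuine string value such as "123" would be re-parsed into the number 123, so only use this where embedded JSON is expected.

```python
import json

def loads_unwrapping(raw: bytes, max_depth: int = 3):
    """Parse JSON bytes, re-parsing while the result is a JSON-looking string."""
    value = json.loads(raw.decode("utf-8"))
    for _ in range(max_depth):
        if not isinstance(value, str):
            break  # reached a dict, list, number, bool, or None
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            break  # an ordinary string, not embedded JSON
    return value

escaped = b'"{\\"customer_id\\": 123, \\"plan\\": \\"premium\\"}"'
plain = b'{"customer_id": 123, "plan": "premium"}'

print(loads_unwrapping(escaped))
print(loads_unwrapping(plain))
```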
Using ast.literal_eval() for Safe Conversion
Sometimes the bytes hold a Python literal, such as a dictionary written with single-quoted strings, rather than valid JSON, and json.loads() rejects it.
In that case, you can use ast.literal_eval() from Python's ast module, which evaluates a string containing a Python literal without executing arbitrary code.
Check the following example:
import ast

bytes_data = b"{'user_id': 200, 'status': 'active', 'balance': 45.30}"
string_data = bytes_data.decode('utf-8')
dict_data = ast.literal_eval(string_data)
print(dict_data)
Output:
{'user_id': 200, 'status': 'active', 'balance': 45.3}
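ast.literal_eval() only parses strings containing Python literals (strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None), which makes it much safer than eval(), as it avoids the security risks of executing arbitrary code. It can still exhaust memory or hit recursion limits on very large or deeply nested input, so don't treat it as fully safe for untrusted data.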
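Keep in mind that Python literals and JSON are different syntaxes: ast.literal_eval() does not understand the JSON keywords true, false, and null, and json.loads() does not accept single-quoted strings. A minimal sketch of trying JSON first and falling back to a Python literal might look like this; parse_json_or_literal is a hypothetical helper name.

```python
import ast
import json

def parse_json_or_literal(raw: bytes):
    """Parse bytes as JSON, falling back to a Python literal if that fails."""
    text = raw.decode("utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Not valid JSON; literal_eval raises ValueError or SyntaxError
        # if the text is not a valid Python literal either.
        return ast.literal_eval(text)

from_json = parse_json_or_literal(b'{"status": "active", "verified": true}')
from_literal = parse_json_or_literal(b"{'user_id': 200, 'status': 'active'}")

print(from_json)
print(from_literal)
```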
Using a Custom Decoder
If you have unique data formats or specific parsing requirements, you may need to go beyond the standard JSON decoding.
Here’s how you can create and use a custom JSON decoder:
import json

class CustomDecoder(json.JSONDecoder):
    def decode(self, s):
        # Custom decoding logic goes here
        # For demonstration, let's convert all string values to uppercase
        result = super().decode(s)
        return {k: v.upper() if isinstance(v, str) else v for k, v in result.items()}

bytes_data = b'{"account": "user123", "service": "mobile", "active": true}'
json_data = json.loads(bytes_data.decode('utf-8'), cls=CustomDecoder)
print(json_data)
Output:
{'account': 'USER123', 'service': 'MOBILE', 'active': True}
In this output, you can see the custom decoder in action. The decoder class inherits from json.JSONDecoder and overrides the decode method.
In our example, the custom logic converts all string values in the JSON data to uppercase.
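For simple per-object transformations, the object_hook parameter of json.loads() is a lighter alternative to subclassing. The hook is called with every decoded JSON object, including nested ones, so the sketch below also uppercases strings inside nested dictionaries (but not strings inside lists). The function name uppercase_strings and the sample data are assumptions for illustration.

```python
import json

def uppercase_strings(obj):
    """Called for each decoded JSON object; uppercases its string values."""
    return {k: v.upper() if isinstance(v, str) else v for k, v in obj.items()}

bytes_data = b'{"account": "user123", "plan": {"service": "mobile", "active": true}}'
json_data = json.loads(bytes_data.decode('utf-8'), object_hook=uppercase_strings)
print(json_data)
```

Unlike the decode() override above, this version does not fail when the top-level JSON value is a list or a scalar, because the hook only runs for objects.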
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.