8+ Examples for Merging JSON Arrays in Python
In this tutorial, you will learn how to merge JSON arrays in Python, from basic concatenation using the + operator to advanced methods with reduce() and the jsonmerge library.
Basic Merging of JSON Arrays
There are two basic ways to merge simple JSON arrays: a shallow merge using the + operator and a deep-copy merge using the copy module.
Shallow Copying
A shallow copy means the merged array contains references to the original objects.
Suppose you have two JSON arrays:
array1 = [{"id": 1, "name": "Customer A"}, {"id": 2, "name": "Customer B"}]
array2 = [{"id": 3, "name": "Customer C"}, {"id": 4, "name": "Customer D"}]
To merge these arrays, you can use the + operator:
merged_array = array1 + array2
print(merged_array)
Output:
[{'id': 1, 'name': 'Customer A'}, {'id': 2, 'name': 'Customer B'}, {'id': 3, 'name': 'Customer C'}, {'id': 4, 'name': 'Customer D'}]
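In this case, merged_array is a new list that holds references to the same dictionaries as array1 and array2.
If you modify one of those dictionaries through the original arrays, the change will be visible in merged_array. Adding items to or removing items from array1 or array2 afterwards does not change merged_array, because it is a separate list.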
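To make the difference between the list and the objects it references concrete, here is a minimal sketch. The renamed value and the extra item with id 99 are made up for illustration.

```python
array1 = [{"id": 1, "name": "Customer A"}]
array2 = [{"id": 2, "name": "Customer B"}]

merged_array = array1 + array2

# Mutating a dictionary that both lists reference is visible through merged_array.
array1[0]["name"] = "Customer A (renamed)"
print(merged_array[0]["name"])  # Customer A (renamed)

# Appending to the original list does not change merged_array,
# because + already built a separate list object.
array1.append({"id": 99, "name": "Late arrival"})
print(len(merged_array))  # 2
```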
Deep Copying
If you want to merge arrays but keep them independent (changes in the original arrays should not affect the merged array), you need to create copies of the original objects.
You can use the copy module for this purpose:
import copy

array1 = [{"id": 1, "name": "Customer A"}, {"id": 2, "name": "Customer B"}]
array2 = [{"id": 3, "name": "Customer C"}, {"id": 4, "name": "Customer D"}]

# Copy every dictionary so the merged list shares nothing with the inputs
deep_merged_array = copy.deepcopy(array1) + copy.deepcopy(array2)
print(deep_merged_array)
Output:
[{'id': 1, 'name': 'Customer A'}, {'id': 2, 'name': 'Customer B'}, {'id': 3, 'name': 'Customer C'}, {'id': 4, 'name': 'Customer D'}]
This creates a new array where the objects are independent of those in the original arrays.
Changes in array1 or array2 will not affect deep_merged_array.
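The following sketch contrasts the two approaches side by side. The tags field is a hypothetical nested value added only to show that deepcopy() also copies nested lists; calling deepcopy() once on the combined list is one possible variation on copying each array separately.

```python
import copy

array1 = [{"id": 1, "name": "Customer A", "tags": ["new"]}]
array2 = [{"id": 2, "name": "Customer B", "tags": []}]

shallow = array1 + array2
deep = copy.deepcopy(array1 + array2)  # one deepcopy call over the combined list

# Change a nested list through the original array.
array1[0]["tags"].append("vip")

print(shallow[0]["tags"])  # ['new', 'vip']
print(deep[0]["tags"])     # ['new']
```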
Merge Nested JSON Arrays
Consider the following two nested JSON arrays:
array1 = [
    {"id": 1, "name": "Customer A", "subscriptions": [{"plan": "Basic", "status": "Active"}]},
    {"id": 2, "name": "Customer B", "subscriptions": [{"plan": "Premium", "status": "Inactive"}]},
]
array2 = [
    {"id": 1, "name": "Customer A", "subscriptions": [{"plan": "Data", "status": "Active"}]},
    {"id": 3, "name": "Customer C", "subscriptions": [{"plan": "Basic", "status": "Active"}]},
]
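You can use a helper function that indexes the first array by id and extends the subscriptions list whenever the same id appears in the second array:
def merge_nested_arrays(arr1, arr2):
    # Index the first array by customer id
    merged = {item['id']: item for item in arr1}
    for item in arr2:
        if item['id'] in merged:
            # Same customer: extend the existing subscriptions list (in place)
            merged[item['id']]['subscriptions'] += item['subscriptions']
        else:
            merged[item['id']] = item
    return list(merged.values())

merged_customers = merge_nested_arrays(array1, array2)
print(merged_customers)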
Output:
[{'id': 1, 'name': 'Customer A', 'subscriptions': [{'plan': 'Basic', 'status': 'Active'}, {'plan': 'Data', 'status': 'Active'}]}, {'id': 2, 'name': 'Customer B', 'subscriptions': [{'plan': 'Premium', 'status': 'Inactive'}]}, {'id': 3, 'name': 'Customer C', 'subscriptions': [{'plan': 'Basic', 'status': 'Active'}]}]
This code iterates over both arrays, merging the subscriptions for customers with the same ID.
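Note that merge_nested_arrays() extends the subscriptions lists of the dictionaries in the first array in place, so the input is changed as a side effect. If that is not wanted, one option is to deep-copy the items before merging. This is only a sketch, and merge_nested_arrays_copy is a hypothetical name:

```python
import copy

def merge_nested_arrays_copy(arr1, arr2):
    # Deep-copy each item so the inputs are never modified.
    merged = {item["id"]: copy.deepcopy(item) for item in arr1}
    for item in arr2:
        if item["id"] in merged:
            merged[item["id"]]["subscriptions"].extend(copy.deepcopy(item["subscriptions"]))
        else:
            merged[item["id"]] = copy.deepcopy(item)
    return list(merged.values())

array1 = [{"id": 1, "name": "Customer A", "subscriptions": [{"plan": "Basic", "status": "Active"}]}]
array2 = [{"id": 1, "name": "Customer A", "subscriptions": [{"plan": "Data", "status": "Active"}]}]

merged_customers = merge_nested_arrays_copy(array1, array2)
print(len(merged_customers[0]["subscriptions"]))  # 2
print(len(array1[0]["subscriptions"]))            # 1, the input is unchanged
```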
Merging with Custom Rules
Imagine you have two JSON arrays with customer data, and you need to merge them based on specific rules:
array1 = [{"id": 1, "name": "Customer A", "status": "Active"}, {"id": 2, "name": "Customer B", "status": "Inactive"}]
array2 = [{"id": 2, "name": "Customer B", "status": "Active"}, {"id": 3, "name": "Customer C", "status": "Inactive"}]
Here are the rules for merging:
- Exclude customers with the status “Inactive”.
- Remove duplicate customers based on their ID.
- If there are duplicates, the entry in array2 should overwrite the one in array1.
Now, let’s write a function to apply these rules:
def merge_with_rules(arr1, arr2):
    # Keep only active customers from the first array, indexed by id
    merged = {item['id']: item for item in arr1 if item['status'] == 'Active'}
    for item in arr2:
        if item['status'] == 'Active':
            # Active entries from the second array overwrite duplicates
            merged[item['id']] = item
    return list(merged.values())

merged_customers = merge_with_rules(array1, array2)
print(merged_customers)
Output:
[{'id': 1, 'name': 'Customer A', 'status': 'Active'}, {'id': 2, 'name': 'Customer B', 'status': 'Active'}]
This function first creates a dictionary from array1 and excludes inactive customers.
Then it iterates over array2 and applies the same exclusion rule, overwriting any existing entries in the dictionary.
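One consequence of filtering before overwriting is that an Active entry in array1 survives even when array2 marks the same ID as Inactive. If the array2 entry should win first and the status filter should apply to the final result, a possible variation looks like this (merge_then_filter is a hypothetical name, and the sample data is made up to show the edge case):

```python
def merge_then_filter(arr1, arr2):
    # Let arr2 overwrite arr1 for duplicate ids first...
    merged = {item["id"]: item for item in arr1}
    merged.update({item["id"]: item for item in arr2})
    # ...then drop every entry whose final status is not Active.
    return [item for item in merged.values() if item["status"] == "Active"]

array1 = [{"id": 1, "name": "Customer A", "status": "Active"}]
array2 = [{"id": 1, "name": "Customer A", "status": "Inactive"}]

print(merge_then_filter(array1, array2))  # []
```

Which order is right depends on what "overwrite" should mean for your data.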
Using reduce() for Complex Merges
The Python reduce() function from the functools module applies a two-argument function cumulatively to the items of a sequence to produce a single result.
Here’s an example of how reduce() can be used:
from functools import reduce

array1 = [{"id": 1, "name": "Customer A"}, {"id": 2, "name": "Customer B"}]
array2 = [{"id": 2, "name": "Customer B Updated"}, {"id": 3, "name": "Customer C"}]
array3 = [{"id": 3, "name": "Customer C"}, {"id": 4, "name": "Customer D"}]

def merge_arrays(arr1, arr2):
    merged = {item['id']: item for item in arr1}
    for item in arr2:
        # Custom merging logic (e.g., updating existing entries)
        merged[item['id']] = item
    return list(merged.values())

result = reduce(merge_arrays, [array1, array2, array3])
print(result)
Output:
[{'id': 1, 'name': 'Customer A'}, {'id': 2, 'name': 'Customer B Updated'}, {'id': 3, 'name': 'Customer C'}, {'id': 4, 'name': 'Customer D'}]
In this example, reduce() applies the merge_arrays function cumulatively: it first merges array1 with array2, then merges that result with array3.
The custom merging logic inside merge_arrays ensures that if an ID already exists, the entry from the later array overwrites the earlier one.
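Because merge_arrays() converts between list and dictionary on every step, a possible alternative is to give reduce() an empty dictionary as its initial value and convert to a list only once at the end. This is a sketch rather than the only way to do it, and index_by_id is a hypothetical helper name:

```python
from functools import reduce

def index_by_id(acc, array):
    # acc maps id -> item; entries from later arrays overwrite earlier ones.
    acc.update({item["id"]: item for item in array})
    return acc

arrays = [
    [{"id": 1, "name": "Customer A"}, {"id": 2, "name": "Customer B"}],
    [{"id": 2, "name": "Customer B Updated"}, {"id": 3, "name": "Customer C"}],
]

# The third argument ({}) is the initial accumulator.
result = list(reduce(index_by_id, arrays, {}).values())
print(result)
```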
Using a Loop
Let’s consider an example where you have three JSON arrays:
array1 = [{"id": 1, "name": "Customer A"}, {"id": 2, "name": "Customer B"}]
array2 = [{"id": 2, "name": "Customer B", "status": "Active"}, {"id": 3, "name": "Customer C"}]
array3 = [{"id": 3, "name": "Customer C", "location": "City X"}, {"id": 4, "name": "Customer D"}]
To merge these arrays, you can use a loop:
def merge_arrays(*arrays):
    merged = {}
    for array in arrays:
        for item in array:
            if item['id'] in merged:
                # Known id: add or overwrite fields on the existing entry
                merged[item['id']].update(item)
            else:
                merged[item['id']] = item
    return list(merged.values())

merged_customers = merge_arrays(array1, array2, array3)
print(merged_customers)
Output:
[{'id': 1, 'name': 'Customer A'}, {'id': 2, 'name': 'Customer B', 'status': 'Active'}, {'id': 3, 'name': 'Customer C', 'location': 'City X'}, {'id': 4, 'name': 'Customer D'}]
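Keep in mind that this loop stores the original dictionaries in merged and then calls update() on them, so items in the earlier arrays gain the extra fields as a side effect. If the inputs should stay untouched, one option is to collect the fields into fresh dictionaries. A minimal sketch, with merge_arrays_copy as a hypothetical name:

```python
def merge_arrays_copy(*arrays):
    merged = {}
    for array in arrays:
        for item in array:
            # setdefault inserts an empty dict the first time an id is seen,
            # so update() only ever writes to dictionaries owned by merged.
            merged.setdefault(item["id"], {}).update(item)
    return list(merged.values())

array1 = [{"id": 2, "name": "Customer B"}]
array2 = [{"id": 2, "name": "Customer B", "status": "Active"}]

print(merge_arrays_copy(array1, array2))  # [{'id': 2, 'name': 'Customer B', 'status': 'Active'}]
print(array1)                             # [{'id': 2, 'name': 'Customer B'}]
```

The copy is shallow at the field level: nested lists or dictionaries inside an item would still be shared with the input.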
Merging JSON Arrays with Different Keys
Merging JSON arrays with different keys into a single dataset requires standardizing the keys or accommodating the differences in the merged result.
Suppose you have two JSON arrays with customer data, but the keys in these arrays are not identical:
array1 = [{"customer_id": 1, "customer_name": "Customer A"}, {"customer_id": 2, "customer_name": "Customer B"}]
array2 = [{"id": 2, "name": "Customer B", "status": "Active"}, {"id": 3, "name": "Customer C", "status": "Active"}]
Here, array1 uses customer_id and customer_name, while array2 uses id, name, and status.
To merge these arrays, you need to align these keys. One method is to standardize the keys and then merge them:
def standardize_keys(array, key_map):
    standardized = []
    for item in array:
        # Rename keys found in key_map; keep all other keys as they are
        new_item = {key_map.get(k, k): v for k, v in item.items()}
        standardized.append(new_item)
    return standardized

# Key mappings for standardization
key_map1 = {"customer_id": "id", "customer_name": "name"}
key_map2 = {}  # No changes needed for array2

# Standardizing arrays
standardized_array1 = standardize_keys(array1, key_map1)
standardized_array2 = standardize_keys(array2, key_map2)

merged_array = standardized_array1 + standardized_array2
print(merged_array)
Output:
[{'id': 1, 'name': 'Customer A'}, {'id': 2, 'name': 'Customer B'}, {'id': 2, 'name': 'Customer B', 'status': 'Active'}, {'id': 3, 'name': 'Customer C', 'status': 'Active'}]
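In this example, the standardize_keys function renames the keys of each item according to key_map and leaves unmapped keys unchanged. Because the standardized arrays are then simply concatenated with +, the customer with id 2 appears twice in the output.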
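If duplicate IDs should collapse into a single entry after the keys are aligned, the renaming step can be combined with an ID-based merge. The sketch below is one way to do that; standardize_and_merge is a hypothetical name, and it assumes every item ends up with an id key after renaming.

```python
def standardize_and_merge(arrays_with_maps):
    merged = {}
    for array, key_map in arrays_with_maps:
        for item in array:
            # Rename keys first, then merge fields into the entry for this id.
            renamed = {key_map.get(k, k): v for k, v in item.items()}
            merged.setdefault(renamed["id"], {}).update(renamed)
    return list(merged.values())

array1 = [{"customer_id": 1, "customer_name": "Customer A"}, {"customer_id": 2, "customer_name": "Customer B"}]
array2 = [{"id": 2, "name": "Customer B", "status": "Active"}, {"id": 3, "name": "Customer C", "status": "Active"}]

merged_array = standardize_and_merge([
    (array1, {"customer_id": "id", "customer_name": "name"}),
    (array2, {}),  # No renaming needed for array2
])
print(merged_array)
```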
Using jsonmerge
The jsonmerge module allows you to merge JSON data and provides a range of features to handle various merge cases, including handling conflicts in keys, complex merge strategies, and schema validation.
First, ensure you have jsonmerge installed:
pip install jsonmerge
Now, suppose you have two JSON arrays representing customer data:
array1 = [{"id": 1, "name": "Customer A"}, {"id": 2, "name": "Customer B"}]
array2 = [{"id": 2, "name": "Customer B", "location": "City X"}, {"id": 3, "name": "Customer C"}]
You want to merge these arrays so that entries with the same id are merged into a single entry.
Here’s how to do it using jsonmerge:
from jsonmerge import merge
import json

obj1 = {"customers": array1}
obj2 = {"customers": array2}

# Define the merge schema: match array items by their "id" field
schema = {
    "properties": {
        "customers": {
            "mergeStrategy": "arrayMergeById",
            "mergeOptions": {"idRef": "id"}
        }
    }
}

result = merge(obj1, obj2, schema)
merged_array = result['customers']
print(json.dumps(merged_array, indent=4))
Output:
[
    {
        "id": 1,
        "name": "Customer A"
    },
    {
        "id": 2,
        "name": "Customer B",
        "location": "City X"
    },
    {
        "id": 3,
        "name": "Customer C"
    }
]
In this example, jsonmerge takes two JSON objects and a merge schema. The schema specifies the merge strategy for the customers array, which in this case is arrayMergeById with id as the identifying field. The append strategy, by contrast, would simply concatenate the two arrays and keep both entries for id 2.
Benchmark Test
In this section, we’ll set up a simple benchmark to compare the performance of plain Python code for merging JSON arrays against the reduce() function and the jsonmerge library.
For the benchmark, we’ll measure the time taken to merge JSON arrays using each of the three methods.
First, ensure you have jsonmerge installed (timeit is part of the Python standard library and does not need to be installed):
pip install jsonmerge
Now, let’s define our sample data and the merging functions:
import timeit
from functools import reduce

from jsonmerge import merge

array1 = [{"id": i, "name": f"Customer {i}"} for i in range(10000)]
array2 = [{"id": i, "location": f"City {i}"} for i in range(500, 15000)]

# Function using normal Python code
def merge_standard(arr1, arr2):
    merged = {item['id']: item for item in arr1}
    for item in arr2:
        if item['id'] in merged:
            merged[item['id']].update(item)
        else:
            merged[item['id']] = item
    return list(merged.values())

# Using jsonmerge
def merge_jsonmerge(arr1, arr2):
    obj1 = {"customers": arr1}
    obj2 = {"customers": arr2}
    schema = {"properties": {"customers": {"mergeStrategy": "append"}}}
    result = merge(obj1, obj2, schema)
    return result['customers']

# Using reduce()
def merge_reduce(arr1, arr2):
    def reducer(merged, item):
        if item['id'] in merged:
            merged[item['id']].update(item)
        else:
            merged[item['id']] = item
        return merged
    return list(reduce(reducer, arr2, {item['id']: item for item in arr1}).values())

# Benchmark
standard_time = timeit.timeit(lambda: merge_standard(array1, array2), number=100)
jsonmerge_time = timeit.timeit(lambda: merge_jsonmerge(array1, array2), number=100)
reduce_time = timeit.timeit(lambda: merge_reduce(array1, array2), number=100)

print(f"Standard Merge Time: {standard_time} seconds")
print(f"jsonmerge Merge Time: {jsonmerge_time} seconds")
print(f"reduce() Merge Time: {reduce_time} seconds")
Output:
Standard Merge Time: 0.6578992000140715 seconds
jsonmerge Merge Time: 0.10612290000426583 seconds
reduce() Merge Time: 0.9299761000147555 seconds
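The number=100 argument makes timeit call each merge function 100 times and report the total elapsed time in seconds, not an average.
In this run, the jsonmerge version finished fastest. Note, however, that with the append strategy jsonmerge only concatenates the two arrays, while the other two functions merge entries by id, so the three timings do not measure the same amount of work.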
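Since timeit.timeit() returns the total time for all calls, dividing by number gives a per-call figure, and it is worth checking that the functions being compared actually return equivalent results. The sketch below uses only the standard library; merge_by_id and concatenate are hypothetical stand-ins for an ID-based merge and a plain append, and the smaller arrays are chosen just to keep the run short.

```python
import timeit

array1 = [{"id": i, "name": f"Customer {i}"} for i in range(1000)]
array2 = [{"id": i, "location": f"City {i}"} for i in range(500, 1500)]

def merge_by_id(arr1, arr2):
    # Collect fields into fresh dictionaries so the inputs are not modified
    # between timing runs.
    merged = {}
    for item in arr1 + arr2:
        merged.setdefault(item["id"], {}).update(item)
    return list(merged.values())

def concatenate(arr1, arr2):
    return arr1 + arr2

number = 20
for label, func in [("merge by id", merge_by_id), ("concatenate", concatenate)]:
    # repeat() returns one total per round; the minimum is the least noisy.
    best_total = min(timeit.repeat(lambda: func(array1, array2), number=number, repeat=3))
    item_count = len(func(array1, array2))
    print(f"{label}: {best_total / number:.6f} seconds per call, {item_count} items")
```

The two functions return lists of different lengths (1500 versus 2000 items here), which is a quick sign that their timings should not be compared directly.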
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.