8+ Examples for Merging JSON Arrays in Python

In this tutorial, you will learn how to merge JSON arrays in Python, from basic concatenation using the + operator to more advanced methods with reduce() and the jsonmerge library.

 

 

Basic Merging of JSON Arrays

There are two basic ways to merge simple JSON arrays: a shallow merge using the + operator, and a deep copy using the copy module.

Shallow Copying

A shallow copy means the merged array contains references to the original objects.

Suppose you have two JSON arrays:

array1 = [{"id": 1, "name": "Customer A"}, {"id": 2, "name": "Customer B"}]
array2 = [{"id": 3, "name": "Customer C"}, {"id": 4, "name": "Customer D"}]

To merge these arrays, you can use the + operator:

merged_array = array1 + array2
print(merged_array)

Output:

[{'id': 1, 'name': 'Customer A'}, {'id': 2, 'name': 'Customer B'}, {'id': 3, 'name': 'Customer C'}, {'id': 4, 'name': 'Customer D'}]

In this case, merged_array is a new list containing elements from both array1 and array2.

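Because the new list holds references to the same dictionaries, modifying one of those dictionaries through array1 or array2 is also visible in merged_array. Adding or removing items in the original lists afterwards does not change merged_array.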
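The following minimal sketch illustrates what a shallow merge shares and what it does not. The sample values ("Customer A (renamed)", the extra item with id 99) are made up for illustration.

```python
array1 = [{"id": 1, "name": "Customer A"}]
array2 = [{"id": 2, "name": "Customer B"}]
merged_array = array1 + array2

# Both lists reference the same dictionary, so this change shows up in both.
array1[0]["name"] = "Customer A (renamed)"
print(merged_array[0]["name"])  # Customer A (renamed)

# Changing the original list itself does not touch the merged list.
array1.append({"id": 99, "name": "Extra"})
print(len(merged_array))  # 2
```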

Deep Copying

If you want to merge arrays but keep them independent (changes in the original arrays should not affect the merged array), you need to create copies of the original objects.

You can use the copy module for this purpose:

import copy
array1 = [{"id": 1, "name": "Customer A"}, {"id": 2, "name": "Customer B"}]
array2 = [{"id": 3, "name": "Customer C"}, {"id": 4, "name": "Customer D"}]
deep_merged_array = copy.deepcopy(array1) + copy.deepcopy(array2)
print(deep_merged_array)

Output:

[{'id': 1, 'name': 'Customer A'}, {'id': 2, 'name': 'Customer B'}, {'id': 3, 'name': 'Customer C'}, {'id': 4, 'name': 'Customer D'}]

This creates a new array where the objects are independent of those in the original arrays.

Changes in array1 or array2 will not affect deep_merged_array.
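As a quick check of that independence, here is a small sketch with shortened sample data. The value "Changed" is only an illustration.

```python
import copy

array1 = [{"id": 1, "name": "Customer A"}]
array2 = [{"id": 2, "name": "Customer B"}]
deep_merged_array = copy.deepcopy(array1) + copy.deepcopy(array2)

# The merged list holds its own copies, so this edit stays local to array1.
array1[0]["name"] = "Changed"
print(deep_merged_array[0]["name"])  # Customer A
```

Calling copy.deepcopy(array1 + array2) should give an equivalent result with a single deepcopy() call.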

 

Merge Nested JSON Arrays

Consider the following two nested JSON arrays:

array1 = [{"id": 1, "name": "Customer A", "subscriptions": [{"plan": "Basic", "status": "Active"}]},
          {"id": 2, "name": "Customer B", "subscriptions": [{"plan": "Premium", "status": "Inactive"}]}]

array2 = [{"id": 1, "name": "Customer A", "subscriptions": [{"plan": "Data", "status": "Active"}]},
          {"id": 3, "name": "Customer C", "subscriptions": [{"plan": "Basic", "status": "Active"}]}]

You can use a function that indexes the first array by ID and then merges in the second:

def merge_nested_arrays(arr1, arr2):
    merged = {item['id']: item for item in arr1}
    for item in arr2:
        if item['id'] in merged:
            merged[item['id']]['subscriptions'] += item['subscriptions']
        else:
            merged[item['id']] = item
    return list(merged.values())
merged_customers = merge_nested_arrays(array1, array2)
print(merged_customers)

Output:

[{'id': 1, 'name': 'Customer A', 'subscriptions': [{'plan': 'Basic', 'status': 'Active'}, {'plan': 'Data', 'status': 'Active'}]},
 {'id': 2, 'name': 'Customer B', 'subscriptions': [{'plan': 'Premium', 'status': 'Inactive'}]},
 {'id': 3, 'name': 'Customer C', 'subscriptions': [{'plan': 'Basic', 'status': 'Active'}]}]

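This code iterates over both arrays, merging the subscriptions for customers with the same ID. Note that += extends the existing subscriptions list in place, so the matching entries in array1 are modified as well.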
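If the input arrays must stay untouched, one possible variant copies every item before merging. This is a sketch, not part of the original function; the name merge_nested_arrays_copy is hypothetical and the sample data is shortened.

```python
import copy

def merge_nested_arrays_copy(arr1, arr2):
    # Deep-copy each item so the inputs are never modified.
    merged = {item["id"]: copy.deepcopy(item) for item in arr1}
    for item in arr2:
        if item["id"] in merged:
            merged[item["id"]]["subscriptions"] += copy.deepcopy(item["subscriptions"])
        else:
            merged[item["id"]] = copy.deepcopy(item)
    return list(merged.values())

array1 = [{"id": 1, "name": "Customer A", "subscriptions": [{"plan": "Basic", "status": "Active"}]}]
array2 = [{"id": 1, "name": "Customer A", "subscriptions": [{"plan": "Data", "status": "Active"}]}]

merged_customers = merge_nested_arrays_copy(array1, array2)
print(len(merged_customers[0]["subscriptions"]))  # 2
print(len(array1[0]["subscriptions"]))            # 1, the input is unchanged
```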

 

Merging with Custom Rules

Imagine you have two JSON arrays with customer data, and you need to merge them based on specific rules:

array1 = [{"id": 1, "name": "Customer A", "status": "Active"},
          {"id": 2, "name": "Customer B", "status": "Inactive"}]

array2 = [{"id": 2, "name": "Customer B", "status": "Active"},
          {"id": 3, "name": "Customer C", "status": "Inactive"}]

Here are the rules for merging:

  1. Exclude customers with the status “Inactive”.
  2. Remove duplicate customers based on their ID.
  3. If there are duplicates, the entry in array2 should overwrite the one in array1.

Now, let’s write a function to apply these rules:

def merge_with_rules(arr1, arr2):
    merged = {item['id']: item for item in arr1 if item['status'] == 'Active'}
    for item in arr2:
        if item['status'] == 'Active':
            merged[item['id']] = item
    return list(merged.values())

merged_customers = merge_with_rules(array1, array2)
print(merged_customers)

Output:

[{'id': 1, 'name': 'Customer A', 'status': 'Active'},
 {'id': 2, 'name': 'Customer B', 'status': 'Active'}]

This function first creates a dictionary from array1 and excludes inactive customers.

Then it iterates over array2, applies the same exclusion rule, and overwrites any existing entries in the dictionary.
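The same rules can also be written as a single dictionary comprehension over both arrays. The sketch below assumes the behavior described above; the function name merge_with_rules_chain is hypothetical.

```python
from itertools import chain

def merge_with_rules_chain(arr1, arr2):
    # Inactive entries are skipped; a later entry replaces an earlier one with the same id.
    merged = {item["id"]: item
              for item in chain(arr1, arr2)
              if item["status"] == "Active"}
    return list(merged.values())

array1 = [{"id": 1, "name": "Customer A", "status": "Active"},
          {"id": 2, "name": "Customer B", "status": "Inactive"}]
array2 = [{"id": 2, "name": "Customer B", "status": "Active"},
          {"id": 3, "name": "Customer C", "status": "Inactive"}]

print(merge_with_rules_chain(array1, array2))
```

Because array2 comes second in chain(), its active entries win whenever both arrays contain the same ID.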

 

Using reduce() for Complex Merges

The Python reduce() function from the functools module applies a function to items of a sequence to produce a single result.

Here’s an example of how reduce() can be used:

from functools import reduce
array1 = [{"id": 1, "name": "Customer A"}, {"id": 2, "name": "Customer B"}]
array2 = [{"id": 2, "name": "Customer B Updated"}, {"id": 3, "name": "Customer C"}]
array3 = [{"id": 3, "name": "Customer C"}, {"id": 4, "name": "Customer D"}]
def merge_arrays(arr1, arr2):
    merged = {item['id']: item for item in arr1}
    for item in arr2:
        # Custom merging logic (e.g., updating existing entries)
        merged[item['id']] = item
    return list(merged.values())
result = reduce(merge_arrays, [array1, array2, array3])
print(result)

Output:

[{'id': 1, 'name': 'Customer A'}, {'id': 2, 'name': 'Customer B Updated'}, {'id': 3, 'name': 'Customer C'}, {'id': 4, 'name': 'Customer D'}]

In this example, reduce() applies the merge_arrays function to merge array1, array2, and array3.

The custom merging logic inside merge_arrays ensures that if an ID already exists, the new information from the subsequent arrays overwrites the old.
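To make the folding order explicit, this sketch compares the reduce() call with the equivalent manual steps. Passing [] as the initial value is an addition that is not in the example above; it lets the call also work for an empty list of arrays.

```python
from functools import reduce

def merge_arrays(arr1, arr2):
    merged = {item["id"]: item for item in arr1}
    for item in arr2:
        merged[item["id"]] = item
    return list(merged.values())

array1 = [{"id": 1, "name": "Customer A"}, {"id": 2, "name": "Customer B"}]
array2 = [{"id": 2, "name": "Customer B Updated"}, {"id": 3, "name": "Customer C"}]
array3 = [{"id": 3, "name": "Customer C"}, {"id": 4, "name": "Customer D"}]

# reduce() folds from the left: the result of the first merge becomes
# the first argument of the next call.
step1 = merge_arrays(array1, array2)
manual = merge_arrays(step1, array3)

folded = reduce(merge_arrays, [array1, array2, array3], [])
print(manual == folded)  # True
```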

 

Using a Loop

Let’s consider an example where you have three JSON arrays:

array1 = [{"id": 1, "name": "Customer A"}, {"id": 2, "name": "Customer B"}]
array2 = [{"id": 2, "name": "Customer B", "status": "Active"}, {"id": 3, "name": "Customer C"}]
array3 = [{"id": 3, "name": "Customer C", "location": "City X"}, {"id": 4, "name": "Customer D"}]

To merge these arrays, you can use a loop:

def merge_arrays(*arrays):
    merged = {}
    for array in arrays:
        for item in array:
            if item['id'] in merged:
                merged[item['id']].update(item)
            else:
                merged[item['id']] = item
    return list(merged.values())
merged_customers = merge_arrays(array1, array2, array3)
print(merged_customers)

Output:

[{'id': 1, 'name': 'Customer A'},
 {'id': 2, 'name': 'Customer B', 'status': 'Active'},
 {'id': 3, 'name': 'Customer C', 'location': 'City X'},
 {'id': 4, 'name': 'Customer D'}]
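The loop above calls update() on dictionaries taken from the input arrays, so those inputs gain the merged keys too. A possible variant, sketched below, builds fresh dictionaries with setdefault() instead. The function name merge_arrays_by_key and its key parameter are hypothetical.

```python
def merge_arrays_by_key(*arrays, key="id"):
    merged = {}
    for array in arrays:
        for item in array:
            # setdefault() creates a new dict the first time a key is seen,
            # so the input dictionaries are never modified.
            merged.setdefault(item[key], {}).update(item)
    return list(merged.values())

array1 = [{"id": 1, "name": "Customer A"}, {"id": 2, "name": "Customer B"}]
array2 = [{"id": 2, "name": "Customer B", "status": "Active"}, {"id": 3, "name": "Customer C"}]
array3 = [{"id": 3, "name": "Customer C", "location": "City X"}, {"id": 4, "name": "Customer D"}]

merged_customers = merge_arrays_by_key(array1, array2, array3)
print(merged_customers)
print(array1[1])  # still {'id': 2, 'name': 'Customer B'}
```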

 

Merging JSON Arrays with Different Keys

Merging JSON arrays with different keys into a single dataset requires standardizing the keys or accommodating the differences in the merged result.

Suppose you have two JSON arrays with customer data, but the keys in these arrays are not identical:

array1 = [{"customer_id": 1, "customer_name": "Customer A"}, {"customer_id": 2, "customer_name": "Customer B"}]
array2 = [{"id": 2, "name": "Customer B", "status": "Active"}, {"id": 3, "name": "Customer C", "status": "Active"}]

Here, array1 uses customer_id and customer_name, while array2 uses id, name, and status.

To merge these arrays, you need to align these keys. One method is to standardize the keys and then merge them:

def standardize_keys(array, key_map):
    standardized = []
    for item in array:
        new_item = {key_map.get(k, k): v for k, v in item.items()}
        standardized.append(new_item)
    return standardized

# Key mappings for standardization
key_map1 = {"customer_id": "id", "customer_name": "name"}
key_map2 = {}  # No changes needed for array2

# Standardizing arrays
standardized_array1 = standardize_keys(array1, key_map1)
standardized_array2 = standardize_keys(array2, key_map2)
merged_array = standardized_array1 + standardized_array2
print(merged_array)

Output:

[{'id': 1, 'name': 'Customer A'}, {'id': 2, 'name': 'Customer B'}, {'id': 2, 'name': 'Customer B', 'status': 'Active'}, {'id': 3, 'name': 'Customer C', 'status': 'Active'}]

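In this example, the standardize_keys function renames the keys of each item according to the given key map and leaves unmapped keys unchanged. Note that the result still contains two entries with id 2, because the + operator only concatenates the arrays.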
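If entries with the same ID should end up as one record, the standardized arrays can be merged by ID instead of concatenated. The following is a sketch under that assumption; merge_by_id is a hypothetical helper, and standardize_keys is repeated in compact form so the block runs on its own.

```python
def standardize_keys(array, key_map):
    # Rename mapped keys; keep all other keys as they are.
    return [{key_map.get(k, k): v for k, v in item.items()} for item in array]

def merge_by_id(*arrays):
    merged = {}
    for array in arrays:
        for item in array:
            # Build a new dict each time so the inputs stay unchanged.
            merged[item["id"]] = {**merged.get(item["id"], {}), **item}
    return list(merged.values())

array1 = [{"customer_id": 1, "customer_name": "Customer A"}, {"customer_id": 2, "customer_name": "Customer B"}]
array2 = [{"id": 2, "name": "Customer B", "status": "Active"}, {"id": 3, "name": "Customer C", "status": "Active"}]

key_map1 = {"customer_id": "id", "customer_name": "name"}
merged_array = merge_by_id(standardize_keys(array1, key_map1), array2)
print(merged_array)
```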

 

Using jsonmerge

The jsonmerge module allows you to merge JSON data and provides a range of features to handle various merge cases, including conflicts in keys, complex merge strategies, and schema validation.

First, ensure you have jsonmerge installed:

pip install jsonmerge

Now, suppose you have two JSON arrays representing customer data:

array1 = [{"id": 1, "name": "Customer A"}, {"id": 2, "name": "Customer B"}]
array2 = [{"id": 2, "name": "Customer B", "location": "City X"}, {"id": 3, "name": "Customer C"}]

You want to merge these arrays, where entries with the same id should be merged into a single entry.

Here’s how to do it using jsonmerge:

from jsonmerge import merge
import json
obj1 = {"customers": array1}
obj2 = {"customers": array2}

# Define the merge schema: match array items by their "id" field
schema = {
    "properties": {
        "customers": {
            "mergeStrategy": "arrayMergeById",
            "mergeOptions": {"idRef": "id"}
        }
    }
}
result = merge(obj1, obj2, schema)
merged_array = result['customers']
print(json.dumps(merged_array, indent=4))

Output:

[
    {
        "id": 1,
        "name": "Customer A"
    },
    {
        "id": 2,
        "name": "Customer B",
        "location": "City X"
    },
    {
        "id": 3,
        "name": "Customer C"
    }
]

In this example, jsonmerge takes two JSON objects and a merge schema. The schema specifies the merge strategy for the customers array, which in this case is arrayMergeById: items whose id values match are merged into a single entry, and all other items are kept. The append strategy, by contrast, would only concatenate the two arrays and keep both entries for id 2.

 

Benchmark Test

In this section, we’ll set up a simple benchmark that compares plain Python code for merging JSON arrays against the reduce() function and the jsonmerge library.

For the benchmark, we’ll measure the time each method takes to merge the same two arrays.

The timeit module is part of the Python standard library, so only jsonmerge needs to be installed:

pip install jsonmerge

Now, let’s define our sample data and the merging functions:

import timeit
from jsonmerge import merge
from functools import reduce
import copy
array1 = [{"id": i, "name": f"Customer {i}"} for i in range(10000)]
array2 = [{"id": i, "location": f"City {i}"} for i in range(500, 15000)]

# Function using normal Python code
def merge_standard(arr1, arr2):
    merged = {item['id']: item for item in arr1}
    for item in arr2:
        if item['id'] in merged:
            merged[item['id']].update(item)
        else:
            merged[item['id']] = item
    return list(merged.values())

# Using jsonmerge
def merge_jsonmerge(arr1, arr2):
    obj1 = {"customers": arr1}
    obj2 = {"customers": arr2}
    schema = {"properties": {"customers": {"mergeStrategy": "append"}}}
    result = merge(obj1, obj2, schema)
    return result['customers']

# Using reduce()
def merge_reduce(arr1, arr2):
    def reducer(merged, item):
        if item['id'] in merged:
            merged[item['id']].update(item)
        else:
            merged[item['id']] = item
        return merged
    return list(reduce(reducer, arr2, {item['id']: item for item in arr1}).values())

# Benchmark
standard_time = timeit.timeit(lambda: merge_standard(array1, array2), number=100)
jsonmerge_time = timeit.timeit(lambda: merge_jsonmerge(array1, array2), number=100)
reduce_time = timeit.timeit(lambda: merge_reduce(array1, array2), number=100)
print(f"Standard Merge Time: {standard_time} seconds")
print(f"jsonmerge Merge Time: {jsonmerge_time} seconds")
print(f"reduce() Merge Time: {reduce_time} seconds")

Output:

Standard Merge Time: 0.6578992000140715 seconds
jsonmerge Merge Time: 0.10612290000426583 seconds
reduce() Merge Time: 0.9299761000147555 seconds

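The number=100 argument specifies that each merge function is called 100 times. The value returned by timeit() is the total time for all 100 calls, not an average.

In this run, jsonmerge reports the lowest time. Keep in mind that the comparison is not like-for-like: with the append strategy used here, merge_jsonmerge only concatenates the two arrays, while merge_standard and merge_reduce combine entries that share an id. In addition, merge_standard and merge_reduce update the dictionaries of array1 in place, so every repetition after the first operates on data that is already merged.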
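To turn a timeit total into a per-call figure, divide it by number. The sketch below also uses timeit.repeat() to take the best of several rounds and copies each item so every run starts from the same input. The smaller array sizes and the values number=20 and repeat=3 are arbitrary choices for illustration, not the settings used for the results above.

```python
import timeit

def merge_standard(arr1, arr2):
    # Copy each item so repeated runs always start from unmodified input.
    merged = {item["id"]: dict(item) for item in arr1}
    for item in arr2:
        merged.setdefault(item["id"], {}).update(item)
    return list(merged.values())

array1 = [{"id": i, "name": f"Customer {i}"} for i in range(1000)]
array2 = [{"id": i, "location": f"City {i}"} for i in range(500, 1500)]

number = 20
# repeat() returns one total per round; the minimum is usually the least noisy.
totals = timeit.repeat(lambda: merge_standard(array1, array2), number=number, repeat=3)
per_call = min(totals) / number
print(f"Best per-call time: {per_call:.6f} seconds")
```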
