Get Unique Values from JSON Array in Python
In this tutorial, we’ll explore several methods to extract unique values (that is, remove duplicates) from a JSON array in Python.
We’ll cover methods ranging from Python sets and list comprehensions to more advanced approaches such as collections.OrderedDict, custom functions, Pandas, and itertools.
Using Python Sets
A Python set is an unordered collection of items where every element is unique (i.e., no duplicates).
First, import the JSON module and load your data:
import json

json_data = '["iPhone 12", "Galaxy S20", "iPhone 12", "Galaxy S20", "Pixel 5"]'
phone_models = json.loads(json_data)
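In practice, the JSON array often lives in a file rather than a string. json.load() reads from a file object, while json.loads() parses a string. The sketch below uses io.StringIO as a stand-in for a real file. The filename phones.json mentioned in the comment is hypothetical.

```python
import io
import json

# io.StringIO stands in for a real file, e.g. open("phones.json") (hypothetical filename)
json_file = io.StringIO('["iPhone 12", "Galaxy S20", "iPhone 12", "Galaxy S20", "Pixel 5"]')

# json.load() reads from a file object; json.loads() parses a string
phone_models = json.load(json_file)
print(type(phone_models).__name__, len(phone_models))  # list 5
```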
Now, convert this list into a set to automatically remove duplicates:
unique_models = set(phone_models)
print(unique_models)
Output:
{'Galaxy S20', 'iPhone 12', 'Pixel 5'}
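In this output, duplicates such as ‘iPhone 12’ and ‘Galaxy S20’, which appeared twice in the original list, are now represented only once. Keep in mind that the result is a set, not a list, and because sets are unordered, the element order you see may differ between runs.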
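If you need a list back with a predictable order, one simple option is to pass the set to sorted(). sorted() accepts any iterable and always returns a new list. Note that the sort is by code point, so uppercase letters come before lowercase ones.

```python
import json

json_data = '["iPhone 12", "Galaxy S20", "iPhone 12", "Galaxy S20", "Pixel 5"]'
phone_models = json.loads(json_data)

# set() drops duplicates; sorted() returns a list in a deterministic (alphabetical) order
unique_sorted = sorted(set(phone_models))
print(unique_sorted)  # ['Galaxy S20', 'Pixel 5', 'iPhone 12']
```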
Using List Comprehension
Assuming you have already loaded the JSON data into a Python list named phone_models (as shown in the previous section), you can build a set with a comprehension and convert it back into a list to filter out duplicates:
unique_models = list({model for model in phone_models})
print(unique_models)
Output:
['Galaxy S20', 'Pixel 5', 'iPhone 12']
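In this output, the comprehension inside curly braces, {model for model in phone_models}, is strictly a set comprehension. It builds a set from the list, which removes the duplicates. Wrapping it in list(...) then converts the unique values back into a list. As with a plain set, the order of the resulting list is not guaranteed.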
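If you want a genuine list comprehension that also keeps the original order, one option is to keep an item only if it does not appear earlier in the list. This is a minimal sketch rather than the tutorial's method. Because each check slices and scans the preceding items, it is quadratic and only suits short lists.

```python
import json

json_data = '["iPhone 12", "Galaxy S20", "iPhone 12", "Galaxy S20", "Pixel 5"]'
phone_models = json.loads(json_data)

# Keep a model only if it has not already appeared at an earlier index
unique_in_order = [
    model for i, model in enumerate(phone_models) if model not in phone_models[:i]
]
print(unique_in_order)  # ['iPhone 12', 'Galaxy S20', 'Pixel 5']
```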
Using collections.OrderedDict
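An OrderedDict remembers the order in which keys were inserted, so you can use it to remove duplicates while keeping the original order. Here’s how to do it: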
from collections import OrderedDict

unique_models_ordered = list(OrderedDict.fromkeys(phone_models))
print(unique_models_ordered)
Output:
['iPhone 12', 'Galaxy S20', 'Pixel 5']
In this output, OrderedDict.fromkeys(phone_models) creates an OrderedDict where each phone model from the phone_models list is a key. Since keys in a dictionary are unique, this removes any duplicates, and the order in which elements are inserted is preserved. Converting it back to a list with list(...) provides a sequence of unique values in the order they first appeared in the original list.
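Since Python 3.7, regular dict objects are also guaranteed to preserve insertion order. That means the built-in dict.fromkeys() gives the same order-preserving result without importing OrderedDict. Here is a minimal sketch that assumes Python 3.7 or newer.

```python
import json

json_data = '["iPhone 12", "Galaxy S20", "iPhone 12", "Galaxy S20", "Pixel 5"]'
phone_models = json.loads(json_data)

# Plain dicts keep insertion order (Python 3.7+), so dict.fromkeys
# deduplicates while preserving first-seen order, like OrderedDict.fromkeys
unique_models_dict = list(dict.fromkeys(phone_models))
print(unique_models_dict)  # ['iPhone 12', 'Galaxy S20', 'Pixel 5']
```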
Using a Custom Function
Creating a custom function to extract unique values from a JSON array is useful when you want more control over the process, such as adding additional conditions for uniqueness.
Here’s the custom function:
def get_unique_values(data_list):
    unique_list = []
    for item in data_list:
        # Append only items not seen yet, so the first occurrence keeps its position
        if item not in unique_list:
            unique_list.append(item)
    return unique_list

unique_models_custom = get_unique_values(phone_models)
print(unique_models_custom)
Output:
['iPhone 12', 'Galaxy S20', 'Pixel 5']
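This output shows that the function get_unique_values iterates through phone_models, adding each model to unique_list only if it is not already present. As a result, unique_models_custom contains all unique phone models in their original order. Note that the item not in unique_list check scans the list on every iteration, so this approach slows down as the number of unique values grows.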
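To illustrate the "additional conditions for uniqueness" idea, the sketch below adds a key function that decides when two items count as the same. It also tracks seen keys in a set, which avoids the repeated list scans. The helper name get_unique_values_by and its key parameter are hypothetical and not part of the original tutorial. The example treats models that differ only in letter case as duplicates.

```python
import json

json_data = '["iPhone 12", "iphone 12", "Galaxy S20", "GALAXY S20", "Pixel 5"]'
phone_models = json.loads(json_data)

def get_unique_values_by(data_list, key=lambda item: item):
    """Hypothetical helper: keep the first item for each distinct key(item)."""
    seen = set()        # keys already encountered; set lookups are fast on average
    unique_list = []
    for item in data_list:
        k = key(item)
        if k not in seen:
            seen.add(k)
            unique_list.append(item)
    return unique_list

# Case-insensitive uniqueness: "iPhone 12" and "iphone 12" count as the same model
print(get_unique_values_by(phone_models, key=str.lower))
# ['iPhone 12', 'Galaxy S20', 'Pixel 5']
```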
Using Pandas
First, you’ll need to install Pandas if you haven’t already:
pip install pandas
Then, import Pandas and use it to handle the JSON data:
import pandas as pd

df = pd.DataFrame(phone_models, columns=['Model'])
unique_models_pandas = df['Model'].drop_duplicates().tolist()
print(unique_models_pandas)
Output:
['iPhone 12', 'Galaxy S20', 'Pixel 5']
By converting the JSON array into a DataFrame, you can apply the drop_duplicates() method to the ‘Model’ column. This method removes duplicate values and by default keeps the first occurrence of each. The tolist() method then converts the result back into a list.
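For a flat array, a full DataFrame is not strictly necessary. A pandas Series has a unique() method that returns the distinct values in order of first appearance. The sketch below wraps the result in list() so it works regardless of the array type unique() returns in your pandas version.

```python
import json

import pandas as pd

json_data = '["iPhone 12", "Galaxy S20", "iPhone 12", "Galaxy S20", "Pixel 5"]'
phone_models = json.loads(json_data)

# Series.unique() keeps values in order of first appearance
unique_models_series = list(pd.Series(phone_models).unique())
print(unique_models_series)  # ['iPhone 12', 'Galaxy S20', 'Pixel 5']
```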
Using Python itertools
We can use itertools.groupby() to group consecutive duplicate elements, and then extract one element from each group to get the unique values.
Here’s how you can use itertools along with a list comprehension:
import itertools

# groupby() only groups adjacent identical items, so sort first.
# sorted() returns a new list and leaves phone_models unchanged.
sorted_models = sorted(phone_models)
unique_models_itertools = [model for model, group in itertools.groupby(sorted_models)]
print(unique_models_itertools)
Output:
['Galaxy S20', 'Pixel 5', 'iPhone 12']
In this output, the groupby function groups the sorted list by consecutive identical elements. The list comprehension iterates over these groups and takes each group’s key, resulting in a list of unique values. Because the list was sorted first, the result is in sorted order rather than in the original order.
Note that this method will only remove consecutive duplicates. If the list is not sorted, identical items may not be adjacent and hence won’t be grouped.
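The pitfall is easy to see side by side. In the sample list, no two identical models are adjacent, so groupby() on the unsorted list removes nothing. Sorting first makes the duplicates adjacent.

```python
import itertools

phone_models = ["iPhone 12", "Galaxy S20", "iPhone 12", "Galaxy S20", "Pixel 5"]

# Without sorting, groupby() only collapses adjacent runs, so repeats survive
unsorted_result = [model for model, _ in itertools.groupby(phone_models)]
print(unsorted_result)  # ['iPhone 12', 'Galaxy S20', 'iPhone 12', 'Galaxy S20', 'Pixel 5']

# Sorting first makes identical items adjacent, so each appears once
sorted_result = [model for model, _ in itertools.groupby(sorted(phone_models))]
print(sorted_result)  # ['Galaxy S20', 'Pixel 5', 'iPhone 12']
```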
Extracting Unique Values from Nested JSON Array
Imagine the JSON data now includes a list of phone models along with their features. The goal is to extract unique phone models from this nested structure.
import json

nested_json_data = '''
[
    {"model": "iPhone 12", "features": ["5G", "Dual Camera"]},
    {"model": "Galaxy S20", "features": ["AMOLED Display", "Water Resistant"]},
    {"model": "iPhone 12", "features": ["5G", "Dual Camera"]},
    {"model": "Pixel 5", "features": ["Night Sight", "Reverse Charging"]}
]
'''
data = json.loads(nested_json_data)

unique_models_nested = list({item['model'] for item in data})
print(unique_models_nested)
Output:
['Galaxy S20', 'iPhone 12', 'Pixel 5']
In this output, the set comprehension {item['model'] for item in data} goes through each dictionary in the loaded JSON array and extracts the ‘model’ value. The set automatically removes any duplicates. Finally, converting it to a list with list(...) gives you a list of unique phone models, in no guaranteed order.
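Two related nested-data tasks follow from this: removing duplicate objects as a whole, and collecting unique values from the inner features lists. Dictionaries are not hashable, so set(data) raises a TypeError. The sketch below works around this by serializing each object with json.dumps(..., sort_keys=True) to get a hashable key. This assumes two objects are duplicates only when their JSON matches exactly. sort_keys does not reorder list contents, so ["5G", "Dual Camera"] and ["Dual Camera", "5G"] would count as different.

```python
import json

nested_json_data = '''
[
    {"model": "iPhone 12", "features": ["5G", "Dual Camera"]},
    {"model": "Galaxy S20", "features": ["AMOLED Display", "Water Resistant"]},
    {"model": "iPhone 12", "features": ["5G", "Dual Camera"]},
    {"model": "Pixel 5", "features": ["Night Sight", "Reverse Charging"]}
]
'''
data = json.loads(nested_json_data)

# Deduplicate whole objects: serialize each dict (keys sorted) to a hashable string
seen = set()
unique_records = []
for item in data:
    key = json.dumps(item, sort_keys=True)
    if key not in seen:
        seen.add(key)
        unique_records.append(item)
print(len(unique_records))  # 3

# Flatten the nested "features" lists and keep each feature once, in first-seen order
unique_features = list(dict.fromkeys(f for item in data for f in item["features"]))
print(unique_features)
```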
Benchmark Test
To perform a benchmark test, we’ll use Python’s timeit module, which provides a simple way to time small bits of Python code.
We’ll pass each method to timeit.timeit() as a statement string to test the performance of each method discussed earlier.
Let’s write a benchmark test for the different methods of extracting unique values:
import timeit

# 5,000-item sample list with only 3 unique values
data = ["iPhone 12", "Galaxy S20", "iPhone 12", "Galaxy S20", "Pixel 5"] * 1000

# Test using Python sets
time_sets = timeit.timeit('set(data)', globals=globals(), number=1000)

# Test using list comprehension (a set comprehension wrapped in list())
time_list_comp = timeit.timeit('list({model for model in data})', globals=globals(), number=1000)

# Test using OrderedDict
time_ordered_dict = timeit.timeit('list(OrderedDict.fromkeys(data))', globals=globals(),
                                  setup='from collections import OrderedDict', number=1000)

# Test using custom function (defined in the setup code)
setup_custom_func = '''
def get_unique_values(data_list):
    unique_list = []
    for item in data_list:
        if item not in unique_list:
            unique_list.append(item)
    return unique_list
'''
time_custom_func = timeit.timeit('get_unique_values(data)', globals=globals(),
                                 setup=setup_custom_func, number=1000)

# Test using Pandas
time_pandas = timeit.timeit('pd.DataFrame(data, columns=["Model"]).drop_duplicates().Model.tolist()',
                            globals=globals(), setup='import pandas as pd', number=1000)

# Test using itertools (the sort is included in the measured time)
time_itertools = timeit.timeit('[model for model, group in itertools.groupby(sorted(data))]',
                               globals=globals(), setup='import itertools', number=1000)

print(f"Time using sets: {time_sets:.5f} seconds")
print(f"Time using list comprehension: {time_list_comp:.5f} seconds")
print(f"Time using OrderedDict: {time_ordered_dict:.5f} seconds")
print(f"Time using custom function: {time_custom_func:.5f} seconds")
print(f"Time using Pandas: {time_pandas:.5f} seconds")
print(f"Time using itertools: {time_itertools:.5f} seconds")
Output:
Time using sets: 0.05806 seconds
Time using list comprehension: 0.16215 seconds
Time using OrderedDict: 0.28626 seconds
Time using custom function: 0.29265 seconds
Time using Pandas: 1.11520 seconds
Time using itertools: 0.59240 seconds
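This script runs each method 1,000 times and prints the total time for all 1,000 runs, not the average per run. Divide by 1,000 to get a per-call figure.
In this run, plain sets were the fastest and Pandas was the slowest, likely because constructing a DataFrame adds overhead that outweighs the deduplication itself on a list of this size.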
It’s important to note that the performance might vary based on the size and complexity of the data, and different methods may be preferable in different cases.
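A single timeit.timeit() call can be noisy. A common refinement is timeit.repeat(), which returns one total per round, and then taking the minimum round as the estimate. timeit also accepts a zero-argument callable in place of a statement string. The sketch below compares only the standard-library, order-handling variants. The repeat and number values are arbitrary choices, and your timings will differ.

```python
import timeit
from collections import OrderedDict

data = ["iPhone 12", "Galaxy S20", "iPhone 12", "Galaxy S20", "Pixel 5"] * 1000

# Callables instead of statement strings: timeit accepts any zero-argument callable
candidates = {
    "sets": lambda: list(set(data)),
    "dict.fromkeys": lambda: list(dict.fromkeys(data)),
    "OrderedDict": lambda: list(OrderedDict.fromkeys(data)),
}

number = 200
results = {}
for name, func in candidates.items():
    # repeat() returns one total time per round; the minimum is least affected by noise
    best_total = min(timeit.repeat(func, repeat=5, number=number))
    results[name] = best_total / number  # seconds per call
    print(f"{name}: {results[name] * 1e6:.1f} µs per call")
```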
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.