Search JSON Keys and Values Using Regex in Python
In this tutorial, you’ll learn various methods to use regex for searching JSON data in Python.
We’ll cover how to match keys and values based on specific patterns, lengths, and formats, including email addresses, phone numbers, and dates.
Also, you’ll learn how to navigate nested JSON structures using recursive functions.
Matching the Start or End of Keys
To match the start of a key, you can use ^, and to match the end of a key, you can use $.
First, let’s import the necessary modules and set up a sample JSON data.
import json
import re

json_data = '''
{
    "router1Location": "New York",
    "router2Location": "San Francisco",
    "userAccount123": "active",
    "userAccount456": "inactive",
    "serviceTypeMobile": "4G",
    "serviceTypeFiber": "100Mbps"
}
'''
data = json.loads(json_data)
Now, let’s write a function to search for keys that start with ‘router’, end with ‘Location’, or contain ‘Account’:
def search_keys(data, pattern):
    regex = re.compile(pattern)
    return [key for key in data.keys() if regex.search(key)]

keys_starting_with_router = search_keys(data, r'^router')
keys_ending_with_location = search_keys(data, r'Location$')
keys_containing_account = search_keys(data, r'Account')
print("Keys starting with 'router':", keys_starting_with_router)
print("Keys ending with 'Location':", keys_ending_with_location)
print("Keys containing 'Account':", keys_containing_account)
Output:
Keys starting with 'router': ['router1Location', 'router2Location']
Keys ending with 'Location': ['router1Location', 'router2Location']
Keys containing 'Account': ['userAccount123', 'userAccount456']
In the output, you see the keys filtered based on the specified patterns.
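The anchors matter here because re.search() scans the whole string. As a minimal sketch (the sample_keys list is a hypothetical stand-in for the keys above), this is how re.search(), re.match(), and re.fullmatch() differ in where they anchor:

```python
import re

# Hypothetical sample keys, mirroring the ones used above
sample_keys = ["router1Location", "userAccount123", "serviceTypeFiber"]

# re.search() looks anywhere in the string, so ^ is needed to pin the start
starts_with_router = [k for k in sample_keys if re.search(r"^router", k)]

# re.match() only tries at the beginning of the string, so ^ is implied
match_router = [k for k in sample_keys if re.match(r"router", k)]

# re.fullmatch() requires the entire key to fit the pattern
router_location = [k for k in sample_keys if re.fullmatch(r"router\d+Location", k)]

print(starts_with_router)
print(match_router)
print(router_location)
```

Which function to use is mostly a matter of taste; explicit anchors with re.search() keep the intent visible in the pattern itself.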
Matching Keys with a Certain Length
You can match JSON keys or values based on their length using the {m,n} quantifier in Python regex.
Let’s target keys with a length between 12 and 14 characters:
def search_by_length(data, key_pattern):
    key_regex = re.compile(key_pattern)
    result = {}
    for key, value in data.items():
        if key_regex.match(key):
            result[key] = value
    return result

filtered_data = search_by_length(data, r'^.{12,14}$')
print("Filtered data:", filtered_data)
Output:
Filtered data: {'userAccount123': 'active', 'userAccount456': 'inactive'}
Keys like ‘userAccount123’ and ‘userAccount456’, which have lengths within the specified range, are selected.
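The same idea can be applied to values. The following is a rough sketch rather than part of the original example: filter_values_by_length and the sample dictionary are hypothetical names introduced here for illustration.

```python
import re

# Hypothetical sample data with the same shape as the JSON used above
sample = {
    "router1Location": "New York",
    "userAccount123": "active",
    "serviceTypeFiber": "100Mbps",
}

def filter_values_by_length(data, min_len, max_len):
    # .{m,n} with fullmatch() accepts strings of m to n characters
    # (note that "." does not match newlines by default)
    length_regex = re.compile(r".{%d,%d}" % (min_len, max_len))
    return {
        key: value
        for key, value in data.items()
        if isinstance(value, str) and length_regex.fullmatch(value)
    }

short_values = filter_values_by_length(sample, 6, 7)
print(short_values)
```

The isinstance() check is there because JSON values can also be numbers, booleans, or null, which a regex cannot be applied to directly.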
Extracting All Keys or Values Matching a Pattern
You can use re.findall() in Python to extract all keys or values from JSON data that match a specific regex pattern.
Suppose you want to extract all keys that contain numbers or all values that are types of services:
def find_all_matching(data, key_pattern, value_pattern):
    key_matches = []
    value_matches = []
    for key, value in data.items():
        if re.findall(key_pattern, key):
            key_matches.append(key)
        if re.findall(value_pattern, value):
            value_matches.append(value)
    return key_matches, value_matches

keys_with_numbers, service_type_values = find_all_matching(data, r'\d+', r'^(4G|100Mbps)$')
print("Keys with numbers:", keys_with_numbers)
print("Service type values:", service_type_values)
Output:
Keys with numbers: ['router1Location', 'router2Location', 'userAccount123', 'userAccount456']
Service type values: ['4G', '100Mbps']
re.findall() identifies keys like ‘userAccount123’ and ‘userAccount456’, which contain numbers, and values like ‘4G’ and ‘100Mbps’, which match the specified service types.
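re.findall() returns the matched substrings themselves, not just a yes/no answer, so it can also pull the matching pieces out of each key. A small sketch, using a hypothetical sample_keys list:

```python
import re

# Hypothetical sample keys, mirroring the ones used above
sample_keys = ["router1Location", "userAccount123", "serviceTypeFiber"]

# For each key, collect every run of digits it contains
digits_by_key = {key: re.findall(r"\d+", key) for key in sample_keys}
print(digits_by_key)

# An empty list is falsy, which is why findall() also works as a filter
keys_with_digits = [key for key, digits in digits_by_key.items() if digits]
print(keys_with_digits)
```

If you only need to know whether a key matches, re.search() does the same job without building a list.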
Matching Multiple Patterns Using |
You can match multiple different patterns in JSON keys or values using the | operator.
This operator acts as a logical OR, allowing you to search for keys or values that match any one of several patterns.
Imagine you need to find keys or values that match different service types or account statuses.
Let’s define patterns for service types like ‘4G’ or ‘100Mbps’ and account statuses like ‘active’ or ‘inactive’:
def find_by_multiple_patterns(data, pattern):
    regex = re.compile(pattern)
    matched_items = {key: value for key, value in data.items()
                     if regex.search(key) or regex.search(value)}
    return matched_items

pattern = r'^(4G|100Mbps|active|inactive)$'
matched_data = find_by_multiple_patterns(data, pattern)
print("Matched data:", matched_data)
Output:
Matched data: {'userAccount123': 'active', 'userAccount456': 'inactive', 'serviceTypeMobile': '4G', 'serviceTypeFiber': '100Mbps'}
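When the alternatives come from a list rather than being typed by hand, the pattern can be assembled programmatically. This is a sketch under that assumption; build_alternation is a hypothetical helper, and re.escape() is used so that any regex metacharacters in the terms are treated literally.

```python
import re

def build_alternation(terms):
    # Join the escaped terms with | inside a non-capturing group,
    # anchored so the whole string must equal one of the terms
    joined = "|".join(re.escape(term) for term in terms)
    return re.compile(r"^(?:%s)$" % joined)

status_regex = build_alternation(["active", "inactive", "4G", "100Mbps"])

# Hypothetical sample data with the same shape as the JSON used above
sample = {
    "userAccount123": "active",
    "router1Location": "New York",
    "serviceTypeFiber": "100Mbps",
}

matched = {key: value for key, value in sample.items() if status_regex.search(value)}
print(matched)
```

The anchors also keep ‘active’ from matching inside longer strings that merely contain it.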
Matching a Pattern Occurring N Times
You can use the {n} quantifier in regex to match a pattern that repeats a specific number of times in JSON keys or values.
Imagine you need to identify keys or values that contain a run of three digits:
def find_pattern_occurrences(data, key_pattern, value_pattern):
    key_regex = re.compile(key_pattern)
    value_regex = re.compile(value_pattern)
    matched_items = {}
    for key, value in data.items():
        if key_regex.search(key):
            matched_items[key] = value
        elif value_regex.search(value):
            matched_items[key] = value
    return matched_items

pattern_for_keys = r'\d{3}'
pattern_for_values = r'\d{3}'
matched_data = find_pattern_occurrences(data, pattern_for_keys, pattern_for_values)
print("Matched data based on the pattern:", matched_data)
Output:
Matched data based on the pattern: {'userAccount123': 'active', 'userAccount456': 'inactive', 'serviceTypeFiber': '100Mbps'}
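The function find_pattern_occurrences selects keys like ‘userAccount123’ and ‘userAccount456’, which contain three consecutive digits, and the ‘serviceTypeFiber’ entry, whose value ‘100Mbps’ does too. Because re.search() finds the pattern anywhere in the string, \d{3} would also match inside a longer run of digits.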
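If the requirement really is exactly three digits and no more, one option is to surround the quantifier with lookarounds. A minimal sketch, with a hypothetical candidates list for comparison:

```python
import re

# \d{3} alone matches any three consecutive digits, even inside a longer run
loose_three = re.compile(r"\d{3}")

# Lookarounds reject a match that has another digit directly before or after it
exactly_three = re.compile(r"(?<!\d)\d{3}(?!\d)")

# Hypothetical keys used only to show the difference
candidates = ["userAccount123", "userAccount12345", "router1Location"]

loose = [c for c in candidates if loose_three.search(c)]
strict = [c for c in candidates if exactly_three.search(c)]

print(loose)
print(strict)
```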
Using Regex to Identify Email/Phone/Date
Regex is also useful when you need to identify JSON values that are email addresses, phone numbers, or dates in a specific format (e.g., YYYY-MM-DD).
Here’s how you can do this:
import json
import re

json_data = '''
{
    "customer1Email": "john.doe@example.com",
    "customer2Email": "jane.smith@domain.net",
    "supportContact": "+1-800-555-0199",
    "lastServiceDate": "2023-06-15",
    "nextBillingDate": "2023-07-01",
    "miscellaneousInfo": "Some other data"
}
'''
data = json.loads(json_data)

def find_special_formats(data, email_pattern, phone_pattern, date_pattern):
    email_regex = re.compile(email_pattern)
    phone_regex = re.compile(phone_pattern)
    date_regex = re.compile(date_pattern)
    matched_items = {"emails": [], "phone_numbers": [], "dates": []}
    for key, value in data.items():
        if email_regex.search(value):
            matched_items["emails"].append(value)
        elif phone_regex.search(value):
            matched_items["phone_numbers"].append(value)
        elif date_regex.search(value):
            matched_items["dates"].append(value)
    return matched_items

email_pattern = r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+'
phone_pattern = r'\+\d{1,3}-\d{3}-\d{3}-\d{4}'
date_pattern = r'\d{4}-\d{2}-\d{2}'
matched_data = find_special_formats(data, email_pattern, phone_pattern, date_pattern)
print("Matched data:", matched_data)
Output:
Matched data: {'emails': ['john.doe@example.com', 'jane.smith@domain.net'], 'phone_numbers': ['+1-800-555-0199'], 'dates': ['2023-06-15', '2023-07-01']}
In this output, the function find_special_formats identifies email addresses, phone numbers, and dates in the specific YYYY-MM-DD format from the JSON data.
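A pattern like \d{4}-\d{2}-\d{2} only checks the shape of the string, so it would also accept something like ‘2023-13-45’. One possible refinement, sketched here with a hypothetical is_real_date helper, is to confirm the candidate with datetime.strptime() after the regex has matched:

```python
import re
from datetime import datetime

date_regex = re.compile(r"\d{4}-\d{2}-\d{2}")

def is_real_date(text):
    # fullmatch() rejects values that merely contain a date-like substring
    if not date_regex.fullmatch(text):
        return False
    try:
        # strptime() raises ValueError for impossible dates such as month 13
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True

# Hypothetical values used only to show the difference
checks = {value: is_real_date(value)
          for value in ["2023-06-15", "2023-13-45", "Due 2023-07-01"]}
print(checks)
```

Whether to use fullmatch() or search() depends on whether the JSON value is expected to be the date itself or free text that contains one.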
Searching Nested JSON Using Regex
You can use a recursive function to search for keys or values that match a regex pattern at multiple levels of depth.
Let’s assume you’re dealing with a nested JSON structure like the following and you want to search for IP addresses and dates:
import json
import re

json_data = '''
{
    "network": {
        "router1": {
            "location": "New York",
            "ip": "192.168.1.1"
        },
        "router2": {
            "location": "San Francisco",
            "ip": "192.168.2.1"
        }
    },
    "users": {
        "user123": {
            "email": "user123@example.com",
            "phone": "+123-456-7890"
        },
        "user456": {
            "status": "active",
            "lastLogin": "2023-06-15"
        }
    }
}
'''
data = json.loads(json_data)

def search_nested_json(data, pattern, results=None):
    if results is None:
        results = []
    if isinstance(data, dict):
        # Recurse into every value of the object
        for value in data.values():
            search_nested_json(value, pattern, results)
    elif isinstance(data, list):
        # Recurse into every item, including plain strings inside arrays
        for item in data:
            search_nested_json(item, pattern, results)
    elif re.search(pattern, str(data)):
        # Scalar leaf: keep it if its text matches the pattern
        results.append(data)
    return results

# Regex pattern to find IP addresses and dates (tailored to the sample values)
pattern = r'(\d{3}\.\d{3}\.\d{1}\.\d{1})|(\d{4}-\d{2}-\d{2})'
matched_values = search_nested_json(data, pattern)
print("Matched values in nested JSON:", matched_values)
Output:
Matched values in nested JSON: ['192.168.1.1', '192.168.2.1', '2023-06-15']
The function search_nested_json recursively traverses the nested JSON structure and finds both the IP addresses and the date string.
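In practice it often helps to know where a match was found, not only what matched. The sketch below is one way to do that; search_with_paths, the dotted path format, and the sample data are assumptions made for illustration. It also uses a more general IPv4-shaped pattern, which checks the layout of the address but not that each number is between 0 and 255.

```python
import re

def search_with_paths(node, pattern, path=""):
    """Yield (path, value) pairs for scalar leaves whose text matches."""
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}" if path else key
            yield from search_with_paths(value, pattern, child_path)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from search_with_paths(item, pattern, f"{path}[{index}]")
    elif re.search(pattern, str(node)):
        yield path, node

# Hypothetical nested data, including a list to exercise that branch
sample = {
    "network": {"router1": {"location": "New York", "ip": "192.168.1.1"}},
    "users": [{"lastLogin": "2023-06-15"}, {"status": "active"}],
}

# Four dot-separated groups of 1-3 digits, or a YYYY-MM-DD shaped date
pattern = r"\b(?:\d{1,3}\.){3}\d{1,3}\b|\b\d{4}-\d{2}-\d{2}\b"

hits = list(search_with_paths(sample, pattern))
print(hits)
```

Using a generator keeps the function free of the mutable results argument; callers can wrap it in list() or iterate lazily.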
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.