Search JSON Keys and Values Using Regex in Python

In this tutorial, you’ll learn various methods to use regex for searching JSON data in Python.

We’ll cover how to match keys and values based on specific patterns, lengths, and formats, including email addresses, phone numbers, and dates.

Also, you’ll learn how to navigate nested JSON structures using recursive functions.

 

 

Matching the Start or End of a Key

To match the start of a key, anchor the pattern with ^; to match the end of a key, anchor it with $.

First, let’s import the necessary modules and set up some sample JSON data.

import json
import re
json_data = '''
{
    "router1Location": "New York",
    "router2Location": "San Francisco",
    "userAccount123": "active",
    "userAccount456": "inactive",
    "serviceTypeMobile": "4G",
    "serviceTypeFiber": "100Mbps"
}
'''
data = json.loads(json_data)

Now, let’s write a function to search for keys that start with ‘router’, end with ‘Location’, or contain ‘Account’:

def search_keys(data, pattern):
    regex = re.compile(pattern)
    return [key for key in data.keys() if regex.search(key)]
keys_starting_with_router = search_keys(data, r'^router')
keys_ending_with_location = search_keys(data, r'Location$')
keys_containing_account = search_keys(data, r'Account')
print("Keys starting with 'router':", keys_starting_with_router)
print("Keys ending with 'Location':", keys_ending_with_location)
print("Keys containing 'Account':", keys_containing_account)

Output:

Keys starting with 'router': ['router1Location', 'router2Location']
Keys ending with 'Location': ['router1Location', 'router2Location']
Keys containing 'Account': ['userAccount123', 'userAccount456']

In the output, you see the keys filtered based on the specified patterns.
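As a side note, the anchors are not the only way to pin a pattern to the start or end of a key. The sketch below, which uses a small hypothetical list of keys rather than the full sample data, compares re.match(), re.fullmatch(), and re.search() with the re.IGNORECASE flag:

```python
import re

# Hypothetical subset of the sample keys, used only for illustration.
keys = ["router1Location", "router2Location", "userAccount123"]

# re.match() only matches at the start of the string, so ^ is not needed.
starts_with_router = [k for k in keys if re.match(r"router", k)]

# re.fullmatch() must consume the whole key, so neither ^ nor $ is needed.
router_location_keys = [k for k in keys if re.fullmatch(r"router\d+Location", k)]

# re.search() scans the whole key; re.IGNORECASE ignores letter case.
ends_with_location = [k for k in keys if re.search(r"location$", k, re.IGNORECASE)]

print(starts_with_router)
print(router_location_keys)
print(ends_with_location)
```

All three lists contain ‘router1Location’ and ‘router2Location’. Which function you pick is mostly a matter of readability: re.search() with explicit anchors keeps the intent visible in the pattern itself.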

 

Matching Keys with a Certain Length

You can match JSON keys or values based on their length using the {m,n} regex syntax in Python.

Let’s target keys with a length between 12 and 14 characters:

def search_by_length(data, key_pattern):
    key_regex = re.compile(key_pattern)
    result = {}
    for key, value in data.items():
        if key_regex.match(key):
            result[key] = value
    return result
filtered_data = search_by_length(data, r'^.{12,14}$')
print("Filtered data:", filtered_data)

Output:

Filtered data: {'userAccount123': 'active', 'userAccount456': 'inactive'}

Keys like ‘userAccount123’ and ‘userAccount456’, which have lengths within the specified range, are selected.
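The same idea can be applied to values instead of keys. The following is a minimal sketch, assuming a hypothetical helper named search_values_by_length and a shortened copy of the sample data; it builds the {m,n} pattern from two numbers and uses re.fullmatch() so that no anchors are required:

```python
import re

# Shortened, hypothetical copy of the sample data.
data = {
    "router1Location": "New York",
    "userAccount123": "active",
    "serviceTypeFiber": "100Mbps",
}

def search_values_by_length(data, min_len, max_len):
    """Return the items whose string value has between min_len and max_len characters."""
    # re.DOTALL lets '.' count newline characters as well.
    length_regex = re.compile(r".{%d,%d}" % (min_len, max_len), re.DOTALL)
    return {
        key: value
        for key, value in data.items()
        if isinstance(value, str) and length_regex.fullmatch(value)
    }

short_values = search_values_by_length(data, 6, 7)
print("Values with 6 to 7 characters:", short_values)
```

Here ‘active’ (6 characters) and ‘100Mbps’ (7 characters) are kept, while ‘New York’ (8 characters) is filtered out.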

 

Extracting All Keys or Values Matching a Pattern

You can use re.findall() in Python to extract all keys or values from JSON data that match a specific regex pattern.

Suppose you want to extract all keys that contain numbers or all values that are types of services:

def find_all_matching(data, key_pattern, value_pattern):
    key_matches = []
    value_matches = []
    for key, value in data.items():
        if re.findall(key_pattern, key):
            key_matches.append(key)
        if re.findall(value_pattern, value):
            value_matches.append(value)
    return key_matches, value_matches
keys_with_numbers, service_type_values = find_all_matching(data, r'\d+', r'^(4G|100Mbps)$')
print("Keys with numbers:", keys_with_numbers)
print("Service type values:", service_type_values)

Output:

Keys with numbers: ['router1Location', 'router2Location', 'userAccount123', 'userAccount456']
Service type values: ['4G', '100Mbps']

Used as a truth test, re.findall() returns a non-empty list for every key that contains a number (‘router1Location’, ‘router2Location’, ‘userAccount123’, and ‘userAccount456’) and for the values ‘4G’ and ‘100Mbps’, which match the specified service types.
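The function above only checks whether re.findall() found anything. If you also want the matched text itself, you can keep the list it returns. Here is a small sketch on a hypothetical subset of the keys:

```python
import re

# Hypothetical subset of the sample keys.
keys = ["router1Location", "userAccount123", "serviceTypeMobile"]

# Without capturing groups, re.findall() returns the matched substrings.
digit_runs = {key: re.findall(r"\d+", key) for key in keys}
print(digit_runs)

# With two or more capturing groups, it returns one tuple per match.
name_and_number = re.findall(r"([A-Za-z]+?)(\d+)", "userAccount123")
print(name_and_number)
```

The first print shows ‘1’ for ‘router1Location’, ‘123’ for ‘userAccount123’, and an empty list for ‘serviceTypeMobile’. The second shows how capturing groups change the shape of the result, which is a common source of surprises with re.findall().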

 

Matching Multiple Patterns Using |

You can match multiple different patterns in JSON keys or values using the | operator.

This operator acts as a logical OR, allowing you to search for keys or values that match any one of several patterns.

Imagine you need to find keys or values that match different service types or account statuses.

Let’s define patterns for service types like ‘4G’ or ‘100Mbps’ and account statuses like ‘active’ or ‘inactive’:

def find_by_multiple_patterns(data, pattern):
    regex = re.compile(pattern)
    matched_items = {key: value for key, value in data.items() if regex.search(key) or regex.search(value)}
    return matched_items
pattern = r'^(4G|100Mbps|active|inactive)$'
matched_data = find_by_multiple_patterns(data, pattern)
print("Matched data:", matched_data)

Output:

Matched data: {'userAccount123': 'active', 'userAccount456': 'inactive', 'serviceTypeMobile': '4G', 'serviceTypeFiber': '100Mbps'}
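When the list of alternatives comes from a variable rather than being typed by hand, you can build the | pattern programmatically. The sketch below assumes a hypothetical helper named build_alternation and an invented ‘5G+’ service type; re.escape() makes sure that special characters such as + are treated literally:

```python
import re

def build_alternation(options):
    """Compile a pattern that matches any one of the given literal strings in full."""
    escaped = "|".join(re.escape(option) for option in options)
    # (?:...) groups the alternatives so that ^ and $ apply to all of them.
    return re.compile("^(?:" + escaped + ")$")

status_regex = build_alternation(["active", "inactive"])
service_regex = build_alternation(["4G", "100Mbps", "5G+"])  # '5G+' is a made-up value

# Hypothetical subset of the sample data.
data = {
    "userAccount123": "active",
    "serviceTypeMobile": "4G",
    "router1Location": "New York",
}

statuses = {k: v for k, v in data.items() if status_regex.search(v)}
services = {k: v for k, v in data.items() if service_regex.search(v)}
print("Statuses:", statuses)
print("Services:", services)
```

Without the grouping parentheses, ^ would bind only to the first alternative and $ only to the last one, so the group is what keeps the match exact.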

 

Matching a Pattern Occurring N Times

You can use the {n} quantifier in regex to match an element that repeats exactly n times in a row within JSON keys or values.

Imagine you need to identify keys or values that contain a run of three consecutive digits:

def find_pattern_occurrences(data, key_pattern, value_pattern):
    key_regex = re.compile(key_pattern)
    value_regex = re.compile(value_pattern)
    matched_items = {}
    for key, value in data.items():
        if key_regex.search(key):
            matched_items[key] = value
        elif value_regex.search(value):
            matched_items[key] = value
    return matched_items
pattern_for_keys = r'\d{3}'
pattern_for_values = r'\d{3}'
matched_data = find_pattern_occurrences(data, pattern_for_keys, pattern_for_values)
print("Matched data based on the pattern:", matched_data)

Output:

Matched data based on the pattern: {'userAccount123': 'active', 'userAccount456': 'inactive', 'serviceTypeFiber': '100Mbps'}

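The function find_pattern_occurrences selects the keys ‘userAccount123’ and ‘userAccount456’, each of which contains a run of three digits, and the entry ‘serviceTypeFiber’, whose value ‘100Mbps’ does as well. Because search() looks for the pattern anywhere in the string, \d{3} also matches inside a longer run of digits such as ‘2024’.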
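If you need a run of exactly three digits and no more, one option is to surround the quantifier with lookarounds. This is a minimal sketch; the sample strings, including ‘order2024’, are invented for illustration:

```python
import re

# Matches any three consecutive digits, even inside a longer run.
at_least_three = re.compile(r"\d{3}")

# (?<!\d) and (?!\d) require that no digit comes directly before or after.
exactly_three = re.compile(r"(?<!\d)\d{3}(?!\d)")

samples = ["userAccount123", "order2024", "100Mbps"]

loose_matches = [s for s in samples if at_least_three.search(s)]
strict_matches = [s for s in samples if exactly_three.search(s)]

print("At least three digits:", loose_matches)
print("Exactly three digits:", strict_matches)
```

The loose pattern accepts all three strings, while the strict one rejects ‘order2024’ because its digits form a run of four.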

 

Using Regex to Identify Email/Phone/Date

Suppose you need to identify JSON values that are email addresses, phone numbers, or dates in a specific format (e.g., YYYY-MM-DD).

Here’s how you can do this:

import json
import re
json_data = '''
{
    "customer1Email": "john.doe@example.com",
    "customer2Email": "jane.smith@domain.net",
    "supportContact": "+1-800-555-0199",
    "lastServiceDate": "2023-06-15",
    "nextBillingDate": "2023-07-01",
    "miscellaneousInfo": "Some other data"
}
'''
data = json.loads(json_data)
def find_special_formats(data, email_pattern, phone_pattern, date_pattern):
    email_regex = re.compile(email_pattern)
    phone_regex = re.compile(phone_pattern)
    date_regex = re.compile(date_pattern)
    matched_items = {"emails": [], "phone_numbers": [], "dates": []}
    for key, value in data.items():
        if email_regex.search(value):
            matched_items["emails"].append(value)
        elif phone_regex.search(value):
            matched_items["phone_numbers"].append(value)
        elif date_regex.search(value):
            matched_items["dates"].append(value)
    return matched_items
email_pattern = r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+'
phone_pattern = r'\+\d{1,3}-\d{3}-\d{3}-\d{4}'
date_pattern = r'\d{4}-\d{2}-\d{2}'
matched_data = find_special_formats(data, email_pattern, phone_pattern, date_pattern)
print("Matched data:", matched_data)

Output:

Matched data: {'emails': ['john.doe@example.com', 'jane.smith@domain.net'], 'phone_numbers': ['+1-800-555-0199'], 'dates': ['2023-06-15', '2023-07-01']}

In this output, the function find_special_formats identifies email addresses, phone numbers, and dates in the specific YYYY-MM-DD format from the JSON data.
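Keep in mind that these patterns check only the shape of a value. The date pattern, for example, would also accept a string like ‘2023-13-45’. One possible way to tighten this, sketched below with a hypothetical helper named is_valid_date, is to combine re.fullmatch() with datetime.strptime() from the standard library:

```python
import re
from datetime import datetime

date_regex = re.compile(r"\d{4}-\d{2}-\d{2}")

def is_valid_date(value):
    """Return True if value is a YYYY-MM-DD string that names a real calendar date."""
    # fullmatch() rejects values that merely contain a date somewhere inside.
    if not date_regex.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        # The shape was right, but the month or day is out of range.
        return False
    return True

print(is_valid_date("2023-06-15"))
print(is_valid_date("2023-13-45"))
print(is_valid_date("on 2023-06-15"))
```

Only the first call returns True. The regex acts as a cheap first filter, and the datetime check catches values that look like dates but are not.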

 

Searching Nested JSON Using Regex

You can use a recursive function to search for keys or values that match a regex pattern at multiple levels of depth.

Let’s assume you’re dealing with a nested JSON structure like the following and you want to search for IP addresses and dates:

import json
import re
json_data = '''
{
    "network": {
        "router1": {
            "location": "New York",
            "ip": "192.168.1.1"
        },
        "router2": {
            "location": "San Francisco",
            "ip": "192.168.2.1"
        }
    },
    "users": {
        "user123": {
            "email": "user123@example.com",
            "phone": "+123-456-7890"
        },
        "user456": {
            "status": "active",
            "lastLogin": "2023-06-15"
        }
    }
}
'''
data = json.loads(json_data)
def search_nested_json(data, pattern, results=None):
    if results is None:
        results = []
    if isinstance(data, dict):
        for key, value in data.items():
            if isinstance(value, (dict, list)):
                search_nested_json(value, pattern, results)
            elif re.search(pattern, str(value)):
                results.append(value)
    elif isinstance(data, list):
        for item in data:
            search_nested_json(item, pattern, results)
    return results

# Regex pattern to find IPv4-style addresses (four groups of 1 to 3 digits) and YYYY-MM-DD dates
pattern = r'(\b(?:\d{1,3}\.){3}\d{1,3}\b)|(\d{4}-\d{2}-\d{2})'
matched_values = search_nested_json(data, pattern)
print("Matched values in nested JSON:", matched_values)

Output:

Matched values in nested JSON: ['192.168.1.1', '192.168.2.1', '2023-06-15']

The function search_nested_json recursively traverses the nested JSON structure and finds both the IP addresses and the date string.
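The function above checks only values. Keys can be searched with the same recursive approach. The sketch below assumes a hypothetical generator named find_matching_keys and a simplified nested structure; it also records the path to each match so you can tell where in the document a key was found:

```python
import re

def find_matching_keys(node, regex, path=""):
    """Yield the dotted path of every key that matches regex, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}" if path else key
            if regex.search(key):
                yield child_path
            # Keep descending, because nested keys may match as well.
            yield from find_matching_keys(value, regex, child_path)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_matching_keys(item, regex, f"{path}[{index}]")

# Simplified, hypothetical structure with both a dict and a list.
nested = {
    "network": {"router1": {"ip": "192.168.1.1"}, "router2": {"ip": "192.168.2.1"}},
    "users": [{"user123": {"status": "active"}}],
}

key_regex = re.compile(r"^(router|user)\d+$")
key_paths = list(find_matching_keys(nested, key_regex))
print("Matching key paths:", key_paths)
```

This finds ‘network.router1’, ‘network.router2’, and ‘users[0].user123’. The key ‘users’ itself is skipped because the pattern requires digits after the prefix.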
