Introduction to Python defaultdict: dictionary on steroids

In Python, defaultdict is a dictionary-like class from the collections module that allows us to define a default value for keys that have not been explicitly set in the dictionary. It is a subclass of the built-in dict class.

Both dict and defaultdict store data as key-value pairs, and a defaultdict supports every operation that a regular dictionary does.

In this tutorial, we will explore the features and use cases of defaultdict and see how it differs from the standard dictionary.

 

 

Difference between dict and defaultdict

The main difference is that a regular dictionary raises a KeyError when you access a nonexistent key with square brackets, while a defaultdict creates and returns a default value.
Let’s consider this scenario with a regular dictionary:

regular_dict = dict()
print(regular_dict['non_existent_key'])

Output:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'non_existent_key'

The code above raises a KeyError because the key ‘non_existent_key’ doesn’t exist in the dictionary. This is where defaultdict comes in.

from collections import defaultdict

default_dict = defaultdict(int)
print(default_dict['non_existent_key'])

Output:

0

In the case of defaultdict, when we try to access a key that doesn’t exist in the dictionary, instead of a KeyError, the defaultdict returns the default value as set by the default factory function (in this case, int which means a default value of 0).

 

Creating defaultdict

To create a defaultdict, we need to import the defaultdict class from the collections module.

The constructor takes a callable, known as the default factory, as its first argument. defaultdict calls it with no arguments to produce the value for a missing key.
Here’s a simple example:

from collections import defaultdict

# Initializing defaultdict with int
dd_int = defaultdict(int)
print(dd_int)

# Initializing defaultdict with list
dd_list = defaultdict(list)
print(dd_list)

Output:

defaultdict(<class 'int'>, {})
defaultdict(<class 'list'>, {})

In this code, we’ve created two instances of defaultdict: dd_int with int as the default factory function and dd_list with list as the default factory function.

When printed, each instance shows its default factory function and its current data, which is an empty dictionary.
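
A defaultdict does not have to start empty. After the factory, the constructor accepts the same arguments as dict(), so it can be seeded with existing data. The following is a minimal sketch; the names stock and prices are made up for illustration.

```python
from collections import defaultdict

# Start from an existing mapping; the factory only applies to keys not present
stock = defaultdict(int, {'apple': 3, 'pear': 5})

# Keyword arguments work as they do with dict()
prices = defaultdict(float, apple=0.5, pear=0.75)

print(stock['apple'])   # existing key: 3
print(stock['kiwi'])    # missing key: int() gives 0
print(prices['pear'])   # 0.75
```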

 

Default factory function

Several built-in types can serve as the default factory, such as list, int, set, dict, str, float, bool, and tuple. You pass the type itself, without parentheses, and defaultdict calls it with no arguments whenever it needs a default value.
Let’s explore some of these:

# Default factory function is list
dd_list = defaultdict(list)
dd_list['key1'].append(1)
print(dd_list)

# Default factory function is int
dd_int = defaultdict(int)
dd_int['key1'] += 1
print(dd_int)

# Default factory function is set
dd_set = defaultdict(set)
dd_set['key1'].add(1)
print(dd_set)

# Default factory function is dict
dd_dict = defaultdict(dict)
dd_dict['key1']['inner_key1'] = 1
print(dd_dict)

# Default factory function is str
dd_str = defaultdict(str)
dd_str['key1'] += 'a'
print(dd_str)

Output:

defaultdict(<class 'list'>, {'key1': [1]})
defaultdict(<class 'int'>, {'key1': 1})
defaultdict(<class 'set'>, {'key1': {1}})
defaultdict(<class 'dict'>, {'key1': {'inner_key1': 1}})
defaultdict(<class 'str'>, {'key1': 'a'})

In the above examples, we’re using different default factory functions. With list, when we try to append an element to an unknown key, it automatically creates an empty list and appends the item. Similarly, with int, it initializes the value to 0 and adds 1 to it.

With set, it creates an empty set for an unknown key. With dict, it creates an empty dictionary for an unknown key.

Lastly, with str, it initializes an empty string and concatenates the string ‘a’ to it.
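
One detail is easy to miss: the factory is called again for every missing key, so mutable defaults such as lists are never shared between keys. A small sketch (the variable name groups is only illustrative):

```python
from collections import defaultdict

groups = defaultdict(list)

# Each missing key triggers a separate call to list(),
# so the two keys receive independent list objects
groups['a'].append(1)
groups['b'].append(2)

print(groups['a'] is groups['b'])  # False
print(dict(groups))                # {'a': [1], 'b': [2]}
```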

 

Creating a defaultdict with a custom default function

Besides the built-in functions, you can also pass a custom function as the default factory.

from collections import defaultdict

# Define a function that will be used as the default factory
def my_func():
    return 'Default Value'

# Create a defaultdict with the custom default factory
dd = defaultdict(my_func)

# Access a key that doesn't exist
print(dd['non_existent_key'])

Output:

Default Value

In this code, we first define a custom function my_func that returns ‘Default Value’. We then create a defaultdict dd, passing my_func as the argument.

When we try to access a key that does not exist in the defaultdict, it returns the value returned by our custom default factory function, which is ‘Default Value’.
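
A custom factory is most useful when the default is more than a single constant. As a sketch, assume a hypothetical game in which every new player should start with the same record; new_player and players are invented names.

```python
from collections import defaultdict

# Hypothetical factory: every new player starts with this record
def new_player():
    return {'score': 0, 'lives': 3}

players = defaultdict(new_player)

players['alice']['score'] += 10   # the record for 'alice' is created on first access
players['bob']['lives'] -= 1      # 'bob' gets an independent record

print(players['alice'])  # {'score': 10, 'lives': 3}
print(players['bob'])    # {'score': 0, 'lives': 2}
```

Because new_player builds a fresh dictionary on every call, changing one player's record does not affect any other.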

 

 

defaultdict with Lambda

We can use a lambda function as the default factory in defaultdict. A lambda function is a small anonymous function that is defined using the lambda keyword.
Let’s create a defaultdict with a lambda function that returns a string ‘default’ as the default value.

from collections import defaultdict

# Create a defaultdict with lambda as default factory
dd = defaultdict(lambda: 'default')
print(dd['missing_key'])

Output:

default

In this example, we create a defaultdict dd, passing a lambda function as the argument. This lambda function returns the string ‘default’ when called.

When we try to access a missing key ‘missing_key’, defaultdict does not raise a KeyError. Instead, it calls the lambda function to provide a default value, so ‘default’ is returned.
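
A lambda is also a handy way to get a numeric default other than 0. Note that the factory takes no arguments, so it cannot see which key was requested. In this sketch, STARTING_CREDIT and balances are assumed names for illustration.

```python
from collections import defaultdict

STARTING_CREDIT = 100  # hypothetical starting value for every new account

balances = defaultdict(lambda: STARTING_CREDIT)

balances['alice'] -= 30   # starts from 100, then subtracts 30

print(balances['alice'])  # 70
print(balances['bob'])    # 100
```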

 

Accessing Elements in defaultdict

Just like a regular dictionary, you can access elements in a defaultdict by using keys. You can also use the get() method, which is a built-in method of the dict class.

from collections import defaultdict

# Create a defaultdict with default factory int
dd = defaultdict(int)

# Add some elements
dd['key1'] = 10
dd['key2'] = 20

# Accessing elements using keys
print(dd['key1'])
print(dd['key2'])

# Accessing elements using get() method
print(dd.get('key1'))
print(dd.get('key2'))

Output:

10
20
10
20

In this code, we first create a defaultdict dd with int as the default factory function. We then add some elements to the defaultdict.

When we access the elements using their keys or the get() method, we get the corresponding values.
If you use the get() method with a key that does not exist, it returns None instead of raising a KeyError.

print(dd.get('non_existent_key'))

Output:

None

The get() method works the same way as in a standard dictionary: it returns None when the key does not exist and never calls the default factory.
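
The practical consequence is that get() and the in operator leave the dictionary untouched, while square-bracket access stores the default. A short sketch of the difference:

```python
from collections import defaultdict

dd = defaultdict(int)
dd['key1'] = 10

# get() and the in operator never call the default factory
print(dd.get('missing'))       # None
print(dd.get('missing', -1))   # -1
print('missing' in dd)         # False
print(len(dd))                 # 1

# Square-bracket access does call it, and stores the result
print(dd['missing'])           # 0
print('missing' in dd)         # True
print(len(dd))                 # 2
```

If you only want to check for a key without growing the dictionary, prefer in or get().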

 

Adding a new element

Adding a new element to a defaultdict is straightforward. You can assign a value to a new key, just like with a standard dictionary, or you can operate on the default value directly, as in the example below.

from collections import defaultdict

# Create a defaultdict with default factory list
dd = defaultdict(list)

# Add a new element
dd['key1'].append('Python')

print(dd)

Output:

defaultdict(<class 'list'>, {'key1': ['Python']})

In this example, we first create a defaultdict dd with list as the default factory function. Then we add a new element to the defaultdict using the key ‘key1’.

Since the default factory function is list, we can directly use the append() method to add the value ‘Python’ to the list corresponding to ‘key1’.

 

Updating an existing element

You can directly assign a new value to an existing key:

from collections import defaultdict
dd = defaultdict(int)
dd['key1'] = 10
print(dd)
dd['key1'] = 20
print(dd)

Output:

defaultdict(<class 'int'>, {'key1': 10})
defaultdict(<class 'int'>, {'key1': 20})

In this example, we first create a defaultdict dd with int as the default factory function. Then we add a new element with ‘key1’ as the key and 10 as the value.

To update the element, we simply assign a new value 20 to the same key ‘key1’. The final defaultdict has ‘key1’ with 20 as its value.

 

How defaultdict handles missing keys

A defaultdict works exactly like a regular dictionary, but it is initialized with a function (default_factory function) that takes no arguments and provides the default value for a nonexistent key.

If you access a key that doesn’t exist in a defaultdict, it will invoke its default_factory function and use the result as the new value for that key.

This behavior is managed by the __missing__ method.

A plain dict does not define a __missing__ method, but its lookup logic supports one. When a subclass of dict defines __missing__ and a key is not found, d[key] calls __missing__ with the key as its argument instead of raising a KeyError directly.

defaultdict is such a subclass, and its __missing__ method works as follows.

If the default_factory attribute is None, __missing__ raises a KeyError with the key as its argument.

If default_factory is not None, it is called without arguments to produce a default value for the given key. That value is inserted into the dictionary under the key and then returned.

Here’s an example:

from collections import defaultdict

# Function to return a default value for missing keys
def default_factory():
    return 'Default Value'

ddict = defaultdict(default_factory)
print(ddict['key1'])  # key1 does not exist in the dictionary

Output:

Default Value

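In this example, key1 does not exist in the dictionary. When we try to access key1, defaultdict calls its default_factory function, inserts key1 into the dictionary with the value returned by default_factory, and then returns this value. As long as default_factory is set, looking up a key with square brackets does not raise a KeyError. Operations that bypass __missing__, such as pop() without a default or del on a missing key, still do.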
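
To make the mechanism concrete, the sketch below shows the __missing__ hook on a hand-written dict subclass and the KeyError raised by a defaultdict that has no factory. EchoDict is a made-up class name used only for this illustration.

```python
from collections import defaultdict

# A plain dict subclass can opt in to the same hook (hypothetical class)
class EchoDict(dict):
    def __missing__(self, key):
        # Called by dict lookup when the key is absent; nothing is stored here
        return f'<{key}>'

echo = EchoDict()
print(echo['x'])      # <x>
print('x' in echo)    # False

# With no factory, a defaultdict behaves like a regular dict on lookup
no_factory = defaultdict()
try:
    no_factory['x']
except KeyError as exc:
    print('KeyError:', exc)   # KeyError: 'x'
```

Unlike this EchoDict, the __missing__ method of defaultdict also stores the generated value under the key.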

 

Nested defaultdict

A nested defaultdict is a defaultdict that contains other defaultdicts as values, which allows you to create a dictionary of dictionaries (or even more complex structures).

Creating a nested defaultdict

Here is how you can create a nested defaultdict:

from collections import defaultdict

# Function to return a defaultdict with int as default factory
def nested_dd():
    return defaultdict(int)

# Create a nested defaultdict
dd = defaultdict(nested_dd)

# Add values to the nested defaultdict
dd['key1']['inner_key1'] += 1
dd['key1']['inner_key2'] += 2
dd['key2']['inner_key1'] += 3

print(dd)

Output:

defaultdict(<function nested_dd at 0x7f15c3f7edc0>, {'key1': defaultdict(<class 'int'>, {'inner_key1': 1, 'inner_key2': 2}), 'key2': defaultdict(<class 'int'>, {'inner_key1': 3})})

In this code, we first define a function nested_dd that returns a defaultdict with int as the default factory function.

We then create a defaultdict dd, passing nested_dd as the argument.

When we add values, any key that does not exist yet is created automatically: a missing outer key gets a new defaultdict(int) from nested_dd, and a missing inner key gets the int default, 0, before the += is applied.
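
The same idea extends to any depth if the factory returns another defaultdict of the same kind. This is a common idiom rather than anything built in; tree and config are names chosen for the sketch.

```python
from collections import defaultdict

# Hypothetical helper: a defaultdict whose missing values are more of the same
def tree():
    return defaultdict(tree)

config = tree()
config['server']['http']['port'] = 8080
config['server']['http']['host'] = 'localhost'
config['client']['timeout'] = 30

print(config['server']['http']['port'])  # 8080
print(sorted(config.keys()))             # ['client', 'server']
```

Keep in mind that merely reading a path that does not exist creates empty branches along the way.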

Real-world use-cases for nested defaultdict

A nested defaultdict can be useful when dealing with complex data structures.

For example, it can be used to represent a tree or a graph, where each node is a dictionary with keys representing connected nodes and values representing the weights of the connections.

Let’s consider a practical use case: you’re working with data representing sales in an e-commerce store. The data contains information about sales amounts grouped by year, month, and product category.

We can use a nested defaultdict to organize and access this data efficiently.

from collections import defaultdict

# Function to return a defaultdict with float as default factory
def nested_dd():
    return defaultdict(float)

# Create a nested defaultdict for sales data
sales = defaultdict(lambda: defaultdict(nested_dd))

# Add values to the nested defaultdict
sales[2022]['January']['Electronics'] = 12000.50
sales[2022]['January']['Books'] = 3500.25
sales[2022]['February']['Electronics'] = 10500.75
sales[2023]['March']['Books'] = 5000.00

print(sales)

Output:

defaultdict(<function <lambda> at 0x7f15c3f7edc0>, {2022: defaultdict(<function nested_dd at 0x7f15c3f7ee50>, {'January': defaultdict(<class 'float'>, {'Electronics': 12000.5, 'Books': 3500.25}), 'February': defaultdict(<class 'float'>, {'Electronics': 10500.75})}), 2023: defaultdict(<function nested_dd at 0x7f15c3f7ee50>, {'March': defaultdict(<class 'float'>, {'Books': 5000.0})})})

In this example, we first define a function nested_dd that returns a defaultdict with float as the default factory function.

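We then create a defaultdict sales whose default factory is a lambda. The lambda returns a defaultdict that uses nested_dd as its own factory, so each value at the second level is a defaultdict with float as its default factory.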

This gives us a three-level nested defaultdict. We then add sales data to the defaultdict.

The first level keys represent the years, the second level keys represent the months, and the third level keys represent the product categories.

The values represent the sales amounts.
This structure allows us to easily access the sales amount of a specific product category for a specific month of a specific year.

For example, sales[2022]['January']['Electronics'] would return the sales amount of electronics in January 2022.
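
The float default pays off when sales arrive as individual records that have to be accumulated. The sketch below assumes a hypothetical list of (year, month, category, amount) tuples; the figures are invented for illustration.

```python
from collections import defaultdict

# Hypothetical raw records: (year, month, category, amount)
records = [
    (2022, 'January', 'Electronics', 7000.25),
    (2022, 'January', 'Electronics', 5000.25),
    (2022, 'January', 'Books', 3500.25),
    (2023, 'March', 'Books', 5000.00),
]

sales = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))

# += works on the first sale of a category because float() gives 0.0
for year, month, category, amount in records:
    sales[year][month][category] += amount

print(sales[2022]['January']['Electronics'])  # 12000.5

# Total for one year, summed across all months and categories
total_2022 = sum(sum(categories.values()) for categories in sales[2022].values())
print(total_2022)  # 15500.75
```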

 

defaultdict Methods

A defaultdict supports all the methods provided by the standard Python dictionary.

Methods such as keys(), values(), items(), get(), pop(), clear(), and many others, all work similarly with defaultdicts as they do with standard dictionaries.
Let’s look at some examples:

from collections import defaultdict
dd = defaultdict(int)
dd['key1'] = 10
dd['key2'] = 20

# Print all keys
print(dd.keys())

# Print all values
print(dd.values())

# Print all items
print(dd.items())

# Get the value of a key
print(dd.get('key1'))

# Remove and return a key-value pair
print(dd.pop('key1'))
print(dd)

Output:

dict_keys(['key1', 'key2'])
dict_values([10, 20])
dict_items([('key1', 10), ('key2', 20)])
10
10
defaultdict(<class 'int'>, {'key2': 20})

All these methods work as expected. They perform operations on the defaultdict just like they would on a standard dictionary.

Attribute specific to defaultdict (default_factory)

default_factory is an attribute, not a method. It holds the callable that is used to create default values.

print(dd.default_factory)

Output:

<class 'int'>

Here, default_factory is <class 'int'>, which is the callable we passed in to generate default values for the defaultdict.
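
The default_factory attribute can also be reassigned after creation. One possible use, sketched here with an invented counts dictionary, is to set it to None once the data has been collected so that later typos raise a KeyError instead of silently creating entries.

```python
from collections import defaultdict

counts = defaultdict(int)
counts['a'] += 1
counts['b'] += 2

# default_factory is a writable attribute
counts.default_factory = None   # missing keys now raise KeyError again

try:
    counts['typo']
except KeyError:
    print('no default created')

print(dict(counts))  # {'a': 1, 'b': 2}
```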

 

Iterating over defaultdict

Iterating over a defaultdict is similar to iterating over a standard dictionary. You can iterate over the keys, values, or items (key-value pairs).

Iterating over keys

from collections import defaultdict
dd = defaultdict(int)
dd['key1'] = 10
dd['key2'] = 20

# Iterate over keys
for key in dd:
    print(key)

Output:

key1
key2

In this example, we create a defaultdict with two keys: ‘key1’ and ‘key2’. We then iterate over the keys using a simple for loop. This prints each key on a separate line.

Iterating over values

for value in dd.values():
    print(value)

Output:

10
20

We use the values() method of defaultdict to get an iterable of all values and then print each value on a separate line.

Iterating over items (key-value pairs)

for key, value in dd.items():
    print(f'{key}: {value}')

Output:

key1: 10
key2: 20

The items() method of defaultdict returns an iterable of tuples, where each tuple contains a key-value pair.
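
One pitfall when iterating: a square-bracket lookup of a missing key inserts that key, and changing the size of a dictionary while iterating over it raises a RuntimeError. The sketch below triggers the error on purpose and then shows a safer pattern; the '_copy' suffix is arbitrary.

```python
from collections import defaultdict

dd = defaultdict(int)
dd['key1'] = 10
dd['key2'] = 20

try:
    for key in dd:
        dd[key + '_copy']  # looks like a read, but it inserts a new key
except RuntimeError:
    print('RuntimeError raised')

# Safer: iterate over a snapshot of the keys and read with get()
for key in list(dd):
    print(key, dd.get(key + '_other', 0))
```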

 

Sorting defaultdict

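A defaultdict, like a regular dictionary, has no sort method of its own. Instead, pass its keys, values, or items to the built-in sorted() function, which returns a new sorted list and leaves the defaultdict unchanged.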

Sorting by keys

from collections import defaultdict
dd = defaultdict(int)
dd['banana'] = 10
dd['apple'] = 20
dd['cherry'] = 15

# Sort by keys
sorted_dd = sorted(dd.items())
print(sorted_dd)

Output:

[('apple', 20), ('banana', 10), ('cherry', 15)]

In this example, we create a defaultdict with three keys: ‘banana’, ‘apple’, and ‘cherry’. We then use the sorted function with dd.items() as the argument.

This sorts the items by keys and returns a list of tuples, where each tuple contains a key-value pair. The list is sorted in ascending order by the keys.

Sorting by values

sorted_dd = sorted(dd.items(), key=lambda x: x[1])
print(sorted_dd)

Output:

[('banana', 10), ('cherry', 15), ('apple', 20)]

We use the sorted function with dd.items() and a key function as the arguments.

The key function is a lambda function that returns the second element of a tuple (the value of a key-value pair), so the sorted function sorts the items by values.
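
Two variations on the examples above may be useful: sorting in descending order with reverse=True, and turning the sorted pairs back into a dictionary. This sketch relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 onward.

```python
from collections import defaultdict

dd = defaultdict(int)
dd['banana'] = 10
dd['apple'] = 20
dd['cherry'] = 15

# Highest value first
by_value_desc = sorted(dd.items(), key=lambda item: item[1], reverse=True)
print(by_value_desc)  # [('apple', 20), ('cherry', 15), ('banana', 10)]

# Rebuild a plain dict in key order
ordered = dict(sorted(dd.items()))
print(ordered)  # {'apple': 20, 'banana': 10, 'cherry': 15}
```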

 

Real-world examples of defaultdict Usage

Represent graph data structures

A common use case of defaultdict is to represent a graph data structure where the keys are nodes and the values are lists of adjacent nodes.

Let’s say we are building a simple social network and we want to represent the friend connections between different users.

Here is how we might do that using defaultdict:

from collections import defaultdict

# Each pair represents a connection between two users
friend_connections = [('Anna', 'Beth'), ('Anna', 'Chad'), ('Beth', 'Dave'), ('Chad', 'Dave'), ('Dave', 'Emily'), ('Emily', 'Frank'), ('Frank', 'Anna')]

# Initialize our graph with a defaultdict of type list
friend_graph = defaultdict(list)

# Populate the graph
for user1, user2 in friend_connections:
    friend_graph[user1].append(user2)
    friend_graph[user2].append(user1)  # Friendship is mutual, so store the reverse link too

# Print friend connections of each user
for user, friends in friend_graph.items():
    print(f"{user} is friends with {', '.join(friends)}")

Output:

Anna is friends with Beth, Chad, Frank
Beth is friends with Anna, Dave
Chad is friends with Anna, Dave
Dave is friends with Beth, Chad, Emily
Emily is friends with Dave, Frank
Frank is friends with Emily, Anna

This way, we can easily represent complex data structures like graphs with defaultdict.
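
Once the graph is stored this way, standard graph algorithms apply directly. As an illustration, the sketch below runs a breadth-first search to count the hops between two users. The function name degrees_of_separation is invented for this example, and 'Zed' is a user who is deliberately not in the data.

```python
from collections import defaultdict, deque

friend_connections = [('Anna', 'Beth'), ('Anna', 'Chad'), ('Beth', 'Dave'),
                      ('Chad', 'Dave'), ('Dave', 'Emily'), ('Emily', 'Frank'),
                      ('Frank', 'Anna')]

friend_graph = defaultdict(list)
for user1, user2 in friend_connections:
    friend_graph[user1].append(user2)
    friend_graph[user2].append(user1)

def degrees_of_separation(graph, start, goal):
    """Return the number of hops from start to goal, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        user, hops = queue.popleft()
        if user == goal:
            return hops
        # get() avoids inserting unknown users into the graph
        for friend in graph.get(user, []):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return None

print(degrees_of_separation(friend_graph, 'Anna', 'Dave'))   # 2
print(degrees_of_separation(friend_graph, 'Beth', 'Emily'))  # 2
print(degrees_of_separation(friend_graph, 'Anna', 'Zed'))    # None
```

Reading neighbors with get() rather than square brackets keeps the search from adding empty entries for users that are not in the graph.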

Counting Word Frequencies

Another common use case of defaultdict, especially in natural language processing and text analytics, is counting how often each word occurs in a document or a collection of documents.

The idea is to create a defaultdict with the default factory as int and use each word in the document as a key.

The defaultdict will automatically handle missing keys and return the default value 0.
Let’s write a small script to demonstrate this:

from collections import defaultdict
text = """
The Python defaultdict type is a dictionary-like class available in Python's collections module. 
Unlike standard Python dictionaries, defaultdict lets you specify a default value type at the 
time of its creation. When you try to access or modify keys in the defaultdict that do not exist,
instead of a KeyError, you get a default value of the type you specified during creation. 
The defaultdict type is useful in situations where you want to avoid unnecessary KeyError exceptions 
and make your code more readable and clean. It's especially handy when working with collections of 
data where some keys might not exist.
"""

word_freq = defaultdict(int)

# Normalize the text
normalized_text = text.lower().split()

# Count the frequency of each word
for word in normalized_text:
    word_freq[word] += 1

for word, freq in word_freq.items():
    print(f'Word: {word}, Frequency: {freq}')

Output:

Word: the, Frequency: 5
Word: python, Frequency: 2
Word: defaultdict, Frequency: 4
Word: type, Frequency: 4
Word: is, Frequency: 2
Word: a, Frequency: 4
Word: dictionary-like, Frequency: 1
Word: class, Frequency: 1
Word: available, Frequency: 1
Word: in, Frequency: 3
Word: python's, Frequency: 1
Word: collections, Frequency: 2
Word: module., Frequency: 1
Word: unlike, Frequency: 1
Word: standard, Frequency: 1
Word: dictionaries,, Frequency: 1
Word: lets, Frequency: 1
Word: you, Frequency: 5
Word: specify, Frequency: 1
Word: default, Frequency: 2
Word: value, Frequency: 2
Word: at, Frequency: 1
Word: time, Frequency: 1
Word: of, Frequency: 4
Word: its, Frequency: 1
Word: creation., Frequency: 2
Word: when, Frequency: 2
Word: try, Frequency: 1
Word: to, Frequency: 2
Word: access, Frequency: 1
Word: or, Frequency: 1
Word: modify, Frequency: 1
Word: keys, Frequency: 2
Word: that, Frequency: 1
Word: do, Frequency: 1
Word: not, Frequency: 2
Word: exist,, Frequency: 1
Word: instead, Frequency: 1
Word: keyerror,, Frequency: 1
Word: get, Frequency: 1
Word: specified, Frequency: 1
Word: during, Frequency: 1
Word: useful, Frequency: 1
Word: situations, Frequency: 1
Word: where, Frequency: 2
Word: want, Frequency: 1
Word: avoid, Frequency: 1
Word: unnecessary, Frequency: 1
Word: keyerror, Frequency: 1
Word: exceptions, Frequency: 1
Word: and, Frequency: 2
Word: make, Frequency: 1
Word: your, Frequency: 1
Word: code, Frequency: 1
Word: more, Frequency: 1
Word: readable, Frequency: 1
Word: clean., Frequency: 1
Word: it's, Frequency: 1
Word: especially, Frequency: 1
Word: handy, Frequency: 1
Word: working, Frequency: 1
Word: with, Frequency: 1
Word: data, Frequency: 1
Word: some, Frequency: 1
Word: might, Frequency: 1
Word: exist., Frequency: 1

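This code prints the frequency of each word. The count is case-insensitive because we convert the text to lowercase with lower() before counting.

Called without arguments, split() separates the text into words on any run of whitespace, including newlines. It does not strip punctuation, which is why ‘keyerror,’ and ‘keyerror’ are counted as separate words in the output above.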
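
If punctuation should not be part of the words, one option is to extract the words with a regular expression instead of split(). The sketch below uses a short made-up sentence, and the pattern [a-z']+ is only one reasonable choice of what counts as a word.

```python
import re
from collections import defaultdict

text = "The defaultdict type avoids a KeyError. A KeyError, in a plain dict, stops the program."

word_freq = defaultdict(int)

# [a-z']+ keeps letters and apostrophes, so punctuation no longer sticks to words
for word in re.findall(r"[a-z']+", text.lower()):
    word_freq[word] += 1

print(word_freq['keyerror'])  # 2
print(word_freq['a'])         # 3
print(word_freq['the'])       # 2
```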

 

Resources

https://docs.python.org/3/library/collections.html#collections.defaultdict
