Introduction to Python defaultdict: dictionary on steroids
In Python, defaultdict is a dictionary-like class from the collections module that allows us to define a default value for keys that have not been explicitly set in the dictionary. It is a subclass of the built-in dict class.
Both dict and defaultdict are used to store and manage data in a key-value pair format (known as a dictionary in Python).
In this tutorial, we will explore various features and use cases of defaultdict and see how it differs from the standard dictionary.
- 1 Difference between dict and defaultdict
- 2 Creating defaultdict
- 3 Default factory function
- 4 Creating a defaultdict with a custom default function
- 5 defaultdict with Lambda
- 6 Accessing Elements in defaultdict
- 7 Adding a new element
- 8 Updating an existing element
- 9 How defaultdict handles missing keys?
- 10 Nested defaultdict
- 11 defaultdict Methods
- 12 Iterating over defaultdict
- 13 Sorting defaultdict
- 14 Real-world examples of defaultdict Usage
- 15 Resources
Difference between dict and defaultdict
The main difference is that a regular dictionary raises a KeyError when you access a key that does not exist, while defaultdict returns a default value.
Let’s consider this scenario with a regular dictionary:
regular_dict = dict()
print(regular_dict['non_existent_key'])
Output:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'non_existent_key'
The code above raises a KeyError because the key ‘non_existent_key’ doesn’t exist in the dictionary. This is where defaultdict comes in.
from collections import defaultdict

default_dict = defaultdict(int)
print(default_dict['non_existent_key'])
Output:
0
In the case of defaultdict, when we try to access a key that doesn’t exist in the dictionary, instead of raising a KeyError, defaultdict returns the default value produced by the default factory function (in this case int, which gives a default value of 0).
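For comparison, here is a small sketch of the same counting logic written with a plain dict and with a defaultdict. The sample list letters is made up for illustration; the point is only that the defaultdict version does not need to supply the fallback value by hand.

```python
from collections import defaultdict

letters = ["a", "b", "a", "c", "a"]  # hypothetical sample data

# Plain dict: supply the fallback value by hand on every access
plain_counts = {}
for letter in letters:
    plain_counts[letter] = plain_counts.get(letter, 0) + 1

# defaultdict: the int factory supplies 0 for unseen keys
dd_counts = defaultdict(int)
for letter in letters:
    dd_counts[letter] += 1

print(plain_counts)               # {'a': 3, 'b': 1, 'c': 1}
print(dd_counts == plain_counts)  # True
```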
Creating defaultdict
To create a defaultdict, we need to import the defaultdict class from the collections module.
The constructor takes a callable (a function or a type) as its first argument; defaultdict calls it with no arguments to produce the default value for a missing key.
Here’s a simple example:
from collections import defaultdict

# Initializing defaultdict with int
dd_int = defaultdict(int)
print(dd_int)

# Initializing defaultdict with list
dd_list = defaultdict(list)
print(dd_list)
Output:
defaultdict(<class 'int'>, {})
defaultdict(<class 'list'>, {})
In this code, we’ve created two instances of defaultdict: dd_int with int as the default factory function and dd_list with list as the default factory function.
When printed, each instance shows its default factory function and its current data, which is an empty dictionary.
Default factory function
Python provides several built-in types that can be used as a default factory, such as list, int, set, dict, str, float, bool, and tuple. Pass the type itself, without parentheses; defaultdict calls it with no arguments whenever it needs a default value.
Let’s explore some of these:
from collections import defaultdict

# Default factory function is list
dd_list = defaultdict(list)
dd_list['key1'].append(1)
print(dd_list)

# Default factory function is int
dd_int = defaultdict(int)
dd_int['key1'] += 1
print(dd_int)

# Default factory function is set
dd_set = defaultdict(set)
dd_set['key1'].add(1)
print(dd_set)

# Default factory function is dict
dd_dict = defaultdict(dict)
dd_dict['key1']['inner_key1'] = 1
print(dd_dict)

# Default factory function is str
dd_str = defaultdict(str)
dd_str['key1'] += 'a'
print(dd_str)
Output:
defaultdict(<class 'list'>, {'key1': [1]})
defaultdict(<class 'int'>, {'key1': 1})
defaultdict(<class 'set'>, {'key1': {1}})
defaultdict(<class 'dict'>, {'key1': {'inner_key1': 1}})
defaultdict(<class 'str'>, {'key1': 'a'})
In the above examples, we’re using different default factory functions. With list, when we try to append an element to an unknown key, it automatically creates an empty list and appends the item. Similarly, with int, it initializes the value to 0 and adds 1 to it.
With set, it creates an empty set for an unknown key. With dict, it creates an empty dictionary for an unknown key.
Lastly, with str, it initializes an empty string and appends the string ‘a’ to it.
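A common slip is to call the factory instead of passing it. The sketch below is our own illustration (the variable names are hypothetical) of what happens in each case: passing list works, while passing list() hands the constructor an empty list, which is not callable.

```python
from collections import defaultdict

# Correct: pass the type itself; defaultdict calls list() when needed
groups = defaultdict(list)
groups["a"].append(1)

# Incorrect: list() is evaluated immediately and passes an empty list,
# which is not callable, so the constructor rejects it with a TypeError
try:
    defaultdict(list())
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(groups["a"])    # [1]
print(error_message)  # the TypeError message from the constructor
```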
Creating a defaultdict with a custom default function
Besides the built-in functions, you can also pass a custom function as the default factory.
from collections import defaultdict

# Define a function that will be used as the default factory
def my_func():
    return 'Default Value'

# Create a defaultdict with the custom default factory
dd = defaultdict(my_func)

# Access a key that doesn't exist
print(dd['non_existent_key'])
Output:
Default Value
In this code, we first define a custom function my_func that returns ‘Default Value’. We then create a defaultdict dd, passing my_func as the argument.
When we try to access a key that does not exist in the defaultdict, it returns the value returned by our custom default factory function, which is ‘Default Value’.
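A custom factory does not have to return a constant. As a sketch, assuming we want to hand out a new integer ID to every previously unseen name, we can use the __next__ method of itertools.count() as the factory. The names next_id and user_ids are hypothetical and only serve the illustration.

```python
from collections import defaultdict
from itertools import count

# count().__next__ is a zero-argument callable that returns 0, 1, 2, ...
next_id = count().__next__
user_ids = defaultdict(next_id)

print(user_ids["alice"])  # 0
print(user_ids["bob"])    # 1
print(user_ids["alice"])  # 0 (already stored, so the factory is not called again)
```

Because the factory is only called for missing keys, each name keeps the ID it received the first time it was looked up.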
defaultdict with Lambda
We can use a lambda function as the default factory in defaultdict. A lambda function is a small anonymous function that is defined using the lambda keyword.
Let’s create a defaultdict with a lambda function that returns a string ‘default’ as the default value.
from collections import defaultdict

# Create a defaultdict with lambda as default factory
dd = defaultdict(lambda: 'default')
print(dd['missing_key'])
Output:
default
In this example, we create a defaultdict dd, passing a lambda function as the argument. This lambda function returns the string ‘default’ when called.
When we try to access a missing key ‘missing_key’, defaultdict does not raise a KeyError. Instead, it calls the lambda function to provide a default value, so ‘default’ is returned.
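A lambda is also handy when the default is a small structure rather than a single value. The following sketch uses a hypothetical records dictionary that tracks wins and losses per team; because the lambda runs once per missing key, every team gets its own independent dict.

```python
from collections import defaultdict

# Each missing key gets its own new dict, because the lambda runs per key
records = defaultdict(lambda: {"wins": 0, "losses": 0})

records["team_a"]["wins"] += 1
records["team_b"]["losses"] += 1

print(records["team_a"])  # {'wins': 1, 'losses': 0}
print(records["team_b"])  # {'wins': 0, 'losses': 1}
```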
Accessing Elements in defaultdict
Just like a regular dictionary, you can access elements in a defaultdict by using keys. You can also use the get() method, which defaultdict inherits from the dict class.
from collections import defaultdict

# Create a defaultdict with default factory int
dd = defaultdict(int)

# Add some elements
dd['key1'] = 10
dd['key2'] = 20

# Accessing elements using keys
print(dd['key1'])
print(dd['key2'])

# Accessing elements using get() method
print(dd.get('key1'))
print(dd.get('key2'))
Output:
10
20
10
20
In this code, we first create a defaultdict dd with int as the default factory function. We then add some elements to the defaultdict.
When we access the elements using their keys or the get() method, it returns the corresponding values.
If you use the get() method with a key that does not exist, it will return None instead of raising a KeyError.
print(dd.get('non_existent_key'))
Output:
None
The get() method here works the same way as in a standard dictionary: it returns None (or the fallback passed as its second argument) when the key does not exist. It does not call the default factory and does not insert the key.
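To make the difference between the lookup styles concrete, here is a short sketch comparing get(), the in operator, and square-bracket access on the same missing key. Only the square-bracket lookup triggers the default factory and stores the key.

```python
from collections import defaultdict

dd = defaultdict(int)

print(dd.get("missing"))      # None
print(dd.get("missing", -1))  # -1
print("missing" in dd)        # False
print(len(dd))                # 0

value = dd["missing"]         # calls int() and stores the result
print(value)                  # 0
print("missing" in dd)        # True
print(len(dd))                # 1
```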
Adding a new element
Adding a new element to a defaultdict is straightforward. Just like with a standard dictionary, you can use the assignment operation. With a mutable default such as list, you can also modify the value of a missing key directly, because the default is created on first access.
from collections import defaultdict

# Create a defaultdict with default factory list
dd = defaultdict(list)

# Add a new element
dd['key1'].append('Python')
print(dd)
Output:
defaultdict(<class 'list'>, {'key1': ['Python']})
In this example, we first create a defaultdict dd with list as the default factory function. Then we add a new element to the defaultdict using the key ‘key1’.
Since the default factory function is list, we can directly use the append function to add the value ‘Python’ to the list corresponding to ‘key1’.
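This append-on-first-access pattern is what makes grouping so compact. As a sketch with made-up sample data, the code below groups words by their first letter and then adds one more entry with plain assignment.

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]  # sample data

by_letter = defaultdict(list)
for word in words:
    # The list for a new letter is created automatically on first access
    by_letter[word[0]].append(word)

# Plain assignment works exactly as with a regular dict
by_letter["d"] = ["date"]

print(dict(by_letter))
```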
Updating an existing element
You can directly assign a new value to an existing key:
from collections import defaultdict

dd = defaultdict(int)
dd['key1'] = 10
print(dd)

dd['key1'] = 20
print(dd)
Output:
defaultdict(<class 'int'>, {'key1': 10})
defaultdict(<class 'int'>, {'key1': 20})
In this example, we first create a defaultdict dd with int as the default factory function. Then we add a new element with ‘key1’ as the key and 10 as the value.
To update the element, we simply assign a new value 20 to the same key ‘key1’. The final defaultdict has ‘key1’ with 20 as its value.
How defaultdict handles missing keys?
A defaultdict works like a regular dictionary, but it is initialized with a function (the default_factory function) that takes no arguments and provides the default value for a nonexistent key.
If you access a key that doesn’t exist in a defaultdict using square brackets, it invokes its default_factory function and stores the result as the new value for that key.
This behavior is managed by the __missing__ method.
The built-in dict class does not define __missing__ itself. However, when a key is not found during a dict[key] lookup on a dict subclass that defines __missing__, Python calls that method with the key as its argument instead of raising a KeyError directly.
defaultdict implements __missing__ as follows:
If the default_factory attribute is None, it raises a KeyError exception with the key as the argument.
If default_factory is not None, it is called without arguments to provide a default value for the given key; this value is inserted in the dictionary for the key and returned.
Here’s an example:
from collections import defaultdict

# Function to return a default value for missing keys
def default_factory():
    return 'Default Value'

ddict = defaultdict(default_factory)
print(ddict['key1'])  # key1 does not exist in the dictionary
Output:
Default Value
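In this example, key1 does not exist in the dictionary. When we try to access key1, defaultdict calls its default_factory function, inserts key1 into the dictionary with the value returned by default_factory, and then returns this value. Thus, indexing a missing key on a defaultdict does not raise a KeyError as long as default_factory is set. Methods such as get() and pop() do not go through __missing__ and behave as they do on a regular dictionary.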
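To see the __missing__ hook in action, here is a simplified pure-Python approximation of the mechanism. The class name SimpleDefaultDict is hypothetical, and this is only a sketch: the real defaultdict is implemented in C and offers more (a custom repr, copying support, and so on).

```python
class SimpleDefaultDict(dict):
    """A rough, illustrative stand-in for collections.defaultdict."""

    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # Called by dict's square-bracket lookup only when the key is absent
        if self.default_factory is None:
            raise KeyError(key)
        value = self.default_factory()
        self[key] = value  # store the default so later lookups find it
        return value


d = SimpleDefaultDict(list)
d["a"].append(1)
print(d)             # {'a': [1]}
print(d.get("zzz"))  # None, because get() bypasses __missing__

try:
    SimpleDefaultDict()["x"]  # no factory set, so the lookup fails
    raised = False
except KeyError:
    raised = True
print(raised)        # True
```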
Nested defaultdict
A nested defaultdict is a defaultdict that contains other defaultdicts as values, which allows you to create a dictionary of dictionaries (or even more complex structures).
Creating a nested defaultdict
Here is how you can create a nested defaultdict:
from collections import defaultdict

# Function to return a defaultdict with int as default factory
def nested_dd():
    return defaultdict(int)

# Create a nested defaultdict
dd = defaultdict(nested_dd)

# Add values to the nested defaultdict
dd['key1']['inner_key1'] += 1
dd['key1']['inner_key2'] += 2
dd['key2']['inner_key1'] += 3
print(dd)
Output:
defaultdict(<function nested_dd at 0x7f15c3f7edc0>, {'key1': defaultdict(<class 'int'>, {'inner_key1': 1, 'inner_key2': 2}), 'key2': defaultdict(<class 'int'>, {'inner_key1': 3})})
In this code, we first define a function nested_dd that returns a defaultdict with int as the default factory function.
We then create a defaultdict dd, passing nested_dd as the argument.
When we add values to the nested defaultdict, if the keys do not exist, defaultdict automatically creates them and initializes them with the default value (0 in this case).
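When the nesting depth is not known in advance, a factory can refer to itself. The sketch below uses a hypothetical function tree() whose missing values are themselves trees, so any number of levels can be created on the fly. The config keys are made up for illustration.

```python
from collections import defaultdict

def tree():
    """Return a defaultdict whose missing values are themselves trees."""
    return defaultdict(tree)

config = tree()
config["server"]["http"]["port"] = 8080
config["server"]["http"]["host"] = "localhost"

print(config["server"]["http"]["port"])  # 8080
print(list(config["server"]["http"]))    # ['port', 'host']
```

Keep in mind that merely reading a missing path with square brackets also creates the intermediate levels.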
Real-world use-cases for nested defaultdict
A nested defaultdict can be useful when dealing with complex data structures.
For example, it can be used to represent a tree or a graph, where each node is a dictionary with keys representing connected nodes and values representing the weights of the connections.
Let’s consider a practical use case: you’re working with data representing sales in an e-commerce store. The data contains information about sales amounts grouped by year, month, and product category.
We can use a nested defaultdict to organize and access this data efficiently.
from collections import defaultdict

# Function to return a defaultdict with float as default factory
def nested_dd():
    return defaultdict(float)

# Create a nested defaultdict for sales data
sales = defaultdict(lambda: defaultdict(nested_dd))

# Add values to the nested defaultdict
sales[2022]['January']['Electronics'] = 12000.50
sales[2022]['January']['Books'] = 3500.25
sales[2022]['February']['Electronics'] = 10500.75
sales[2023]['March']['Books'] = 5000.00
print(sales)
Output:
defaultdict(<function <lambda> at 0x7f15c3f7edc0>, {2022: defaultdict(<function nested_dd at 0x7f15c3f7ee50>, {'January': defaultdict(<class 'float'>, {'Electronics': 12000.5, 'Books': 3500.25}), 'February': defaultdict(<class 'float'>, {'Electronics': 10500.75})}), 2023: defaultdict(<function nested_dd at 0x7f15c3f7ee50>, {'March': defaultdict(<class 'float'>, {'Books': 5000.0})})})
In this example, we first define a function nested_dd that returns a defaultdict with float as the default factory function.
We then create a defaultdict sales, passing a lambda that returns a defaultdict whose default factory is nested_dd.
This gives us a three-level nested defaultdict. We then add sales data to the defaultdict.
The first level keys represent the years, the second level keys represent the months, and the third level keys represent the product categories.
The values represent the sales amounts.
This structure allows us to easily access the sales amount of a specific product category for a specific month of a specific year.
For example, sales[2022]['January']['Electronics'] would return the sales amount of electronics in January 2022.
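The float default becomes useful when amounts are accumulated with += rather than assigned. The sketch below totals a few made-up records and then converts the result to plain dicts with a hypothetical helper, to_plain_dict, which gives a more readable printout than the nested defaultdict repr.

```python
from collections import defaultdict

def to_plain_dict(obj):
    """Recursively convert nested defaultdicts into regular dicts."""
    if isinstance(obj, dict):
        return {key: to_plain_dict(value) for key, value in obj.items()}
    return obj

# Hypothetical sales records: (year, month, category, amount)
records = [
    (2022, "January", "Books", 100.0),
    (2022, "January", "Books", 50.5),
    (2023, "March", "Electronics", 200.0),
]

totals = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
for year, month, category, amount in records:
    # Missing levels are created automatically; amounts start at 0.0
    totals[year][month][category] += amount

print(to_plain_dict(totals))
# {2022: {'January': {'Books': 150.5}}, 2023: {'March': {'Electronics': 200.0}}}
```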
defaultdict Methods
A defaultdict supports all the methods provided by the standard Python dictionary.
Methods such as keys(), values(), items(), get(), pop(), clear(), and many others, all work similarly with defaultdicts as they do with standard dictionaries.
Let’s look at some examples:
from collections import defaultdict

dd = defaultdict(int)
dd['key1'] = 10
dd['key2'] = 20

# Print all keys
print(dd.keys())

# Print all values
print(dd.values())

# Print all items
print(dd.items())

# Get the value of a key
print(dd.get('key1'))

# Remove and return a key-value pair
print(dd.pop('key1'))
print(dd)
Output:
dict_keys(['key1', 'key2'])
dict_values([10, 20])
dict_items([('key1', 10), ('key2', 20)])
10
10
defaultdict(<class 'int'>, {'key2': 20})
All these methods work as expected. They perform operations on the defaultdict just like they would on a standard dictionary.
Special attribute specific to defaultdict (default_factory)
This attribute holds the function that is used to create default values.
print(dd.default_factory)
Output:
<class 'int'>
Here, default_factory is <class 'int'>, which is the function we used to generate default values for the defaultdict.
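default_factory can also be reassigned after the dictionary has been created. One possible use, sketched below with a hypothetical counts dictionary, is to set it to None once the data has been collected, so that later typos in key names raise a KeyError instead of silently creating new entries.

```python
from collections import defaultdict

counts = defaultdict(int)
counts["seen"] += 1

# Disable default creation once the data has been collected
counts.default_factory = None

try:
    counts["typo"]
    raised = False
except KeyError:
    raised = True

print(raised)        # True
print(dict(counts))  # {'seen': 1}
```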
Iterating over defaultdict
Iterating over a defaultdict is similar to iterating over a standard dictionary. You can iterate over the keys, values, or items (key-value pairs).
Iterating over keys
from collections import defaultdict

dd = defaultdict(int)
dd['key1'] = 10
dd['key2'] = 20

# Iterate over keys
for key in dd:
    print(key)
Output:
key1
key2
In this example, we create a defaultdict with two keys: ‘key1’ and ‘key2’. We then iterate over the keys using a simple for loop. This prints each key on a separate line.
Iterating over values
for value in dd.values():
    print(value)
Output:
10
20
We use the values() method of defaultdict to get an iterable of all values and then print each value on a separate line.
Iterating over items (key-value pairs)
for key, value in dd.items():
    print(f'{key}: {value}')
Output:
key1: 10
key2: 20
The items() method of defaultdict returns a view of tuples, where each tuple contains a key-value pair.
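One thing to watch for while iterating: looking up a missing key with square brackets inserts it, and changing the size of a dictionary during iteration raises a RuntimeError. The sketch below, with a hypothetical scores dictionary, shows the problem and one way around it.

```python
from collections import defaultdict

scores = defaultdict(int)
scores["alice"] = 3

# Reading a missing key inside the loop inserts it and changes the dict size
try:
    for name in scores:
        scores[name + "_bonus"]
    failed = False
except RuntimeError:
    failed = True

print(failed)  # True

# Safer: iterate over a snapshot of the keys and read with get()
for name in list(scores):
    bonus = scores.get(name + "_bonus", 0)
    print(name, bonus)
```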
Sorting defaultdict
You can use the sorted() function, which returns a new sorted list and leaves the defaultdict itself unchanged.
Sorting by keys
from collections import defaultdict

dd = defaultdict(int)
dd['banana'] = 10
dd['apple'] = 20
dd['cherry'] = 15

# Sort by keys
sorted_dd = sorted(dd.items())
print(sorted_dd)
Output:
[('apple', 20), ('banana', 10), ('cherry', 15)]
In this example, we create a defaultdict with three keys: ‘banana’, ‘apple’, and ‘cherry’. We then use the sorted function with dd.items() as the argument.
This sorts the items by keys and returns a list of tuples, where each tuple contains a key-value pair. The list is sorted in ascending order by the keys.
Sorting by values
sorted_dd = sorted(dd.items(), key=lambda x: x[1])
print(sorted_dd)
Output:
[('banana', 10), ('cherry', 15), ('apple', 20)]
We use the sorted function with dd.items() and a key function as the arguments.
The key function is a lambda function that returns the second element of a tuple (the value of a key-value pair), so the sorted function sorts the items by values.
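Since sorted() gives back a list, you may want a dictionary again afterwards. As a sketch, the code below sorts in descending order with reverse=True and then rebuilds a defaultdict from the sorted pairs, reusing the original default_factory so that missing keys still get a default.

```python
from collections import defaultdict

dd = defaultdict(int)
dd["banana"] = 10
dd["apple"] = 20
dd["cherry"] = 15

# Highest value first
by_value_desc = sorted(dd.items(), key=lambda item: item[1], reverse=True)
print(by_value_desc)  # [('apple', 20), ('cherry', 15), ('banana', 10)]

# Rebuild a defaultdict in key order, keeping the same default factory
sorted_dd = defaultdict(dd.default_factory, sorted(dd.items()))
print(list(sorted_dd))    # ['apple', 'banana', 'cherry']
print(sorted_dd["kiwi"])  # 0
```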
Real-world examples of defaultdict Usage
Represent graph data structures
A common use case of defaultdict is to represent a graph data structure where the keys are nodes and the values are lists of adjacent nodes.
Let’s say we are building a simple social network and we want to represent the friend connections between different users.
Here is how we might do that using defaultdict:
from collections import defaultdict

# Each pair represents a connection between two users
friend_connections = [('Anna', 'Beth'), ('Anna', 'Chad'), ('Beth', 'Dave'), ('Chad', 'Dave'), ('Dave', 'Emily'), ('Emily', 'Frank'), ('Frank', 'Anna')]

# Initialize our graph with a defaultdict of type list
friend_graph = defaultdict(list)

# Populate the graph
for user1, user2 in friend_connections:
    friend_graph[user1].append(user2)
    friend_graph[user2].append(user1)  # Add this line if the friendship is mutual

# Print friend connections of each user
for user, friends in friend_graph.items():
    print(f"{user} is friends with {', '.join(friends)}")
Output:
Anna is friends with Beth, Chad, Frank
Beth is friends with Anna, Dave
Chad is friends with Anna, Dave
Dave is friends with Beth, Chad, Emily
Emily is friends with Dave, Frank
Frank is friends with Emily, Anna
This way, we can easily represent complex data structures like graphs with defaultdict.
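Once the graph is stored as an adjacency list, standard graph algorithms apply directly. As an illustration, the sketch below runs a breadth-first search to count the fewest friendship hops between two users. The function hops_between and the shortened connection list are our own additions, not part of the example above.

```python
from collections import defaultdict, deque

connections = [("Anna", "Beth"), ("Beth", "Dave"), ("Dave", "Emily")]  # sample

graph = defaultdict(list)
for a, b in connections:
    graph[a].append(b)
    graph[b].append(a)

def hops_between(graph, start, goal):
    """Return the fewest friendship hops from start to goal, or None."""
    queue = deque([(start, 0)])
    visited = {start}
    while queue:
        user, distance = queue.popleft()
        if user == goal:
            return distance
        # get() avoids inserting unknown users into the defaultdict
        for friend in graph.get(user, []):
            if friend not in visited:
                visited.add(friend)
                queue.append((friend, distance + 1))
    return None

print(hops_between(graph, "Anna", "Emily"))  # 3
print(hops_between(graph, "Anna", "Zoe"))    # None
```

Using get() inside the search keeps lookups for unknown users from adding empty entries to the graph.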
Counting Word Frequencies
Another common use case of defaultdict in the field of natural language processing or text analytics is counting the frequency of words in a document or a collection of documents.
The idea is to create a defaultdict with the default factory as int and use each word in the document as a key.
The defaultdict will automatically handle missing keys and return the default value 0.
Let’s write a small script to demonstrate this:
from collections import defaultdict

text = """
The Python defaultdict type is a dictionary-like class available in Python's collections module.
Unlike standard Python dictionaries, defaultdict lets you specify a default value type at the time of its creation.
When you try to access or modify keys in the defaultdict that do not exist, instead of a KeyError, you get a default value of the type you specified during creation.
The defaultdict type is useful in situations where you want to avoid unnecessary KeyError exceptions and make your code more readable and clean.
It's especially handy when working with collections of data where some keys might not exist.
"""

word_freq = defaultdict(int)

# Normalize the text
normalized_text = text.lower().split()

# Count the frequency of each word
for word in normalized_text:
    word_freq[word] += 1

for word, freq in word_freq.items():
    print(f'Word: {word}, Frequency: {freq}')
Output:
Word: the, Frequency: 5
Word: python, Frequency: 2
Word: defaultdict, Frequency: 4
Word: type, Frequency: 4
Word: is, Frequency: 2
Word: a, Frequency: 4
Word: dictionary-like, Frequency: 1
Word: class, Frequency: 1
Word: available, Frequency: 1
Word: in, Frequency: 3
Word: python's, Frequency: 1
Word: collections, Frequency: 2
Word: module., Frequency: 1
Word: unlike, Frequency: 1
Word: standard, Frequency: 1
Word: dictionaries,, Frequency: 1
Word: lets, Frequency: 1
Word: you, Frequency: 5
Word: specify, Frequency: 1
Word: default, Frequency: 2
Word: value, Frequency: 2
Word: at, Frequency: 1
Word: time, Frequency: 1
Word: of, Frequency: 4
Word: its, Frequency: 1
Word: creation., Frequency: 2
Word: when, Frequency: 2
Word: try, Frequency: 1
Word: to, Frequency: 2
Word: access, Frequency: 1
Word: or, Frequency: 1
Word: modify, Frequency: 1
Word: keys, Frequency: 2
Word: that, Frequency: 1
Word: do, Frequency: 1
Word: not, Frequency: 2
Word: exist,, Frequency: 1
Word: instead, Frequency: 1
Word: keyerror,, Frequency: 1
Word: get, Frequency: 1
Word: specified, Frequency: 1
Word: during, Frequency: 1
Word: useful, Frequency: 1
Word: situations, Frequency: 1
Word: where, Frequency: 2
Word: want, Frequency: 1
Word: avoid, Frequency: 1
Word: unnecessary, Frequency: 1
Word: keyerror, Frequency: 1
Word: exceptions, Frequency: 1
Word: and, Frequency: 2
Word: make, Frequency: 1
Word: your, Frequency: 1
Word: code, Frequency: 1
Word: more, Frequency: 1
Word: readable, Frequency: 1
Word: clean., Frequency: 1
Word: it's, Frequency: 1
Word: especially, Frequency: 1
Word: handy, Frequency: 1
Word: working, Frequency: 1
Word: with, Frequency: 1
Word: data, Frequency: 1
Word: some, Frequency: 1
Word: might, Frequency: 1
Word: exist., Frequency: 1
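This code outputs the frequency of each word. The counts are case-insensitive because we use the lower() method to normalize the text.
The split() method separates the text into words on whitespace. Punctuation stays attached to the words, which is why ‘creation.’, ‘exist,’ and ‘exist.’ appear as their own keys in the output.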
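If punctuation should not count as part of a word, one option is to extract words with a regular expression instead of split(). The sketch below uses a short made-up sentence and a deliberately simple pattern (lowercase letters and apostrophes only), so treat it as a starting point rather than a complete tokenizer. It also uses Counter from the same collections module as a cross-check.

```python
import re
from collections import Counter, defaultdict

text = "The cat sat. The cat ran, and the dog sat."  # sample sentence

# Keep only runs of letters and apostrophes, so "sat." and "sat" match
words = re.findall(r"[a-z']+", text.lower())

word_freq = defaultdict(int)
for word in words:
    word_freq[word] += 1

print(word_freq["the"])  # 3
print(word_freq["sat"])  # 2

# Counter from the same module reports the same most frequent word
print(Counter(words).most_common(1))  # [('the', 3)]
```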
Resources
https://docs.python.org/3/library/collections.html#collections.defaultdict
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.