Python defaultdict tutorial
A Python dictionary stores key-value pairs and is Python’s built-in implementation of a hash map.
Each key in a Python dictionary is unique and must be hashable, which in practice usually means an immutable type such as a string, int, or tuple (a tuple is hashable only if all of its elements are).
There is no restriction on the values; they can be of any data type.
If you try to access a key that does not exist in a Python dictionary, you will get a “KeyError”.
```python
d1 = {"Ashley": 42, "Jacob": 24, "Katherine": 31}

print(d1["Ashley"])     # key exists, OK
print(d1["Katherine"])  # key exists, OK
print(d1["Melanie"])    # key absent, raises KeyError
```
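For comparison, a plain dict can also sidestep the KeyError without defaultdict. The minimal sketch below (the names melanie_age and caught are illustrative) uses dict.get with a fallback value and an explicit try/except; note that neither approach adds the missing key to the dictionary.

```python
d1 = {"Ashley": 42, "Jacob": 24, "Katherine": 31}

# dict.get returns a fallback value instead of raising KeyError
melanie_age = d1.get("Melanie", 0)
print(melanie_age)

# catching the KeyError explicitly
caught = False
try:
    print(d1["Melanie"])
except KeyError as err:
    print("Missing key:", err)
    caught = True
```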
To overcome this problem and handle missing keys more gracefully, Python provides an alternative called defaultdict, which is part of its built-in collections module.
What is defaultdict?
defaultdict is a subclass of Python’s standard dict class and works almost exactly like the standard dictionary, with the additional provision of specifying default values for missing keys.
Let’s reimplement the dictionary from the previous example, this time using defaultdict with a default value of 0.
```python
from collections import defaultdict

d2 = defaultdict(int)  # setting the default callable to int()
print("Defaultdict d2 initialized:", d2)

# assigning key-value pairs
d2["Ashley"] = 42
d2["Jacob"] = 24
d2["Katherine"] = 31
print("d2 after setting some keys:", d2)

# accessing existent and non-existent keys
print(d2["Ashley"])     # key exists, returns corresponding value
print(d2["Katherine"])  # key exists, returns corresponding value
print(d2["Melanie"])    # key absent, returns default value using int()
```
The defaultdict constructor takes as its first parameter a ‘default_factory’ callable, which is called whenever a missing key is accessed on the dictionary.
In the above example, we pass int as the default_factory. Whenever int() is called with no arguments, it returns 0. Hence, when we access the key ‘Melanie’, we get the value 0.
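A small sketch to make the mechanism concrete: int() with no arguments returns 0, and, as a side effect worth knowing, the default value produced for a missing key is also stored in the defaultdict. The default_factory attribute holds the callable that was passed to the constructor.

```python
from collections import defaultdict

# int() called with no arguments returns 0, which is why int works as a factory
print(int())

d2 = defaultdict(int)
value = d2["Melanie"]   # missing key: default_factory (int) is called
print(value)

# the default value has also been inserted into the dictionary
print(dict(d2))
print(d2.default_factory)
```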
Note that if we don’t pass a default_factory, it is set to None, in which case our defaultdict works like the standard dict and raises a KeyError whenever a missing key is accessed.
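The sketch below illustrates the None case, assuming a defaultdict created with no arguments (d_no_factory is an illustrative name). It also shows that default_factory is an ordinary writable attribute, so a factory can be attached after construction.

```python
from collections import defaultdict

d_no_factory = defaultdict()          # default_factory defaults to None
print(d_no_factory.default_factory)   # None

try:
    d_no_factory["Melanie"]
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)

# default_factory can be set later; missing keys then get a default
d_no_factory.default_factory = int
print(d_no_factory["Melanie"])
```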
We could also define our own custom function, or pass a lambda function, that returns any other desired value to be used as the default value for our dictionary.
Let’s take the same example and set the default value to 99, this time using our custom callable.
```python
from collections import defaultdict

# our default function, called whenever a missing key is accessed
def get_default_value():
    return 99

d3 = defaultdict(get_default_value, {"Ashley": 42, "Jacob": 24, "Katherine": 31})
print("Dictionary d3:", d3)

# accessing existent and non-existent keys
print(d3["Ashley"])     # key exists, returns corresponding value
print(d3["Katherine"])  # key exists, returns corresponding value
print(d3["Melanie"])    # key absent, returns default value using get_default_value()
```
This time, when we accessed the key ‘Melanie’, our user-defined function get_default_value was called to return the default value.
Note that the callable passed as default_factory is called with no arguments, so make sure you define your function with a matching signature.
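If the default depends on some parameter, one option is to bind it in advance so the resulting callable takes no arguments, either with a lambda or with functools.partial. In this sketch, make_default is a hypothetical helper; passing it directly fails with a TypeError because defaultdict calls it without arguments, and the failed lookup leaves the dictionary unchanged.

```python
from collections import defaultdict
from functools import partial

def make_default(value):
    # hypothetical helper that requires one argument
    return value

# passing make_default directly fails on a missing key,
# because defaultdict calls the factory with no arguments
bad = defaultdict(make_default)
try:
    bad["x"]
    failed = False
except TypeError:
    failed = True

# binding the argument up front gives a zero-argument callable
d_lambda = defaultdict(lambda: make_default(99))
d_partial = defaultdict(partial(make_default, 99))
print(failed, d_lambda["Melanie"], d_partial["Melanie"])
```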
How does defaultdict work?
Whenever we access a value of a dictionary using the subscript operator [ ], both Python’s standard dict and defaultdict objects internally call the __getitem__ method.
If the dictionary has the specified key, the __getitem__ method returns the value of that key.
If the key does not exist, __getitem__ checks whether the dictionary’s class defines a __missing__ method and, if so, calls it.
The standard dict class does not define __missing__, so it simply raises a KeyError. defaultdict does define __missing__: if its default_factory is None, __missing__ raises the KeyError as well.
If default_factory is not None, __missing__ calls it with no arguments, stores the returned value in the dictionary under the missing key, and returns that value.
You can test this by calling these methods directly on the defaultdict object.
```python
from collections import defaultdict

# specifying a lambda function as the default callable
d4 = defaultdict(lambda: 99, {"Ashley": 42, "Jacob": 24, "Katherine": 31})
print("Dictionary d4:", d4)

print(d4.__getitem__("Ashley"))  # key exists, returns 42
print(d4.__getitem__("Jacob"))   # key exists, returns 24
# key does not exist: calls __missing__, which in turn calls the lambda we passed
print(d4.__getitem__("Ashton"))

# directly calling the __missing__ method
print("d4.__missing__('Ashton') =", d4.__missing__("Ashton"))
```
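The __missing__ hook is not special to defaultdict: any dict subclass can define it. As an illustration only (this is not the actual CPython implementation, which is written in C), here is a minimal sketch of a hypothetical subclass, DefaultingDict, that imitates defaultdict’s behavior. Note that dict.get does not go through __missing__.

```python
from collections import defaultdict

class DefaultingDict(dict):
    """Hypothetical dict subclass imitating defaultdict's __missing__ hook."""

    def __init__(self, factory, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.factory = factory

    def __missing__(self, key):
        # dict.__getitem__ calls this only when the key is absent
        if self.factory is None:
            raise KeyError(key)
        value = self.factory()
        self[key] = value  # store the default, as defaultdict does
        return value

d5 = DefaultingDict(lambda: 99, {"Ashley": 42})
print(d5["Ashley"])
print(d5["Ashton"])      # absent: __missing__ supplies and stores 99
print(d5.get("Nobody"))  # get() does not call __missing__
print(hasattr(dict, "__missing__"), hasattr(defaultdict, "__missing__"))
```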
Appending to list values in defaultdict
With a standard Python dict, if you use lists as values and want to update them dynamically, say in a loop, you always have to check whether the key exists before appending values to the corresponding list.
If the key doesn’t exist, you create a new list; otherwise, you append to the existing list.
Let’s make a dictionary that groups the even and odd numbers below 20. The even numbers are stored under the key 0, and the odd numbers under the key 1.
```python
d_even_odd = dict()  # empty dictionary

for i in range(20):
    key = i % 2
    if key in d_even_odd:
        # key exists, list has already been created
        d_even_odd[key].append(i)
    else:
        # key doesn't exist, create one and assign a list with 1 element
        d_even_odd[key] = [i]

for k in d_even_odd:
    print(f"{k}: {d_even_odd[k]}")
```
Avoiding this hassle of always checking whether the key exists before performing an operation is exactly where defaultdict is most useful.
We can simply define a defaultdict with the callable list.
This way, whenever we access a key that doesn’t exist, an empty list is created and stored under that key, and we can append the desired value to it directly.
```python
from collections import defaultdict

dd_even_odd = defaultdict(list)  # empty defaultdict with list() as default callable

for i in range(20):
    key = i % 2
    # no if condition, missing keys handled implicitly
    dd_even_odd[key].append(i)

for k in dd_even_odd:
    print(f"{k}: {dd_even_odd[k]}")
```
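For comparison, the same grouping pattern can be written with a plain dict and its setdefault method. This sketch (the words list is made-up sample data) groups words by their first letter both ways and checks that the results match. One design difference: setdefault evaluates its default argument, here a new empty list, on every call, whereas defaultdict only calls list() when a key is actually missing.

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# defaultdict(list): missing keys get a fresh empty list
by_letter = defaultdict(list)
for w in words:
    by_letter[w[0]].append(w)

# equivalent grouping with a plain dict and dict.setdefault
by_letter_plain = {}
for w in words:
    by_letter_plain.setdefault(w[0], []).append(w)

print(dict(by_letter))
print(by_letter == by_letter_plain)
```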
Length of defaultdict
The length of a defaultdict, i.e., the number of key-value pairs in the dictionary, can be computed by passing the defaultdict object to the built-in len function.
This is the same as we would do for the standard dict.
```python
from collections import defaultdict

dd_powers = defaultdict(list)
for i in range(8):
    dd_powers[i].extend([i**2, i**0.5, i**3])  # appending square, square root and cube

for k in dd_powers:
    print(f"{k}: {dd_powers[k]}")

print("\nlength of the defaultdict:", len(dd_powers))
```
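One consequence of the missing-key behavior is that simply reading an absent key changes the length of a defaultdict, because the default value gets inserted. A minimal sketch with a smaller dd_powers:

```python
from collections import defaultdict

dd_powers = defaultdict(list)
for i in range(3):
    dd_powers[i].append(i ** 2)

len_before = len(dd_powers)
print(len_before)  # 3

# merely reading a missing key inserts it with an empty list
_ = dd_powers[10]
len_after = len(dd_powers)
print(len_after)   # 4
```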
Removing an item from defaultdict
We can remove elements from a defaultdict the same way we do in standard Python dictionaries, i.e., using the del statement or the pop method.
```python
from collections import defaultdict

name_lengths = defaultdict(int)
names = ["Aman", "Shanaya", "Harris", "Alwyn"]
for n in names:
    name_lengths[n] = len(n)

print("Current dictionary:")
print(name_lengths)

del name_lengths["Shanaya"]               # removing "Shanaya"
deleted_val = name_lengths.pop("Harris")  # removing "Harris", returns deleted value
print("\nDeleted value:", deleted_val)

print("\nAfter deleting two keys:")
print(name_lengths)
```
If the requested key doesn’t exist, the del statement raises a KeyError.
The pop method returns the deleted value.
If the key does not exist, pop raises a KeyError, unless a default value was passed as its optional second argument (named d in the method’s docstring), in which case that value is returned.
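Note that pop and del do not consult default_factory: the fallback applies only to subscript access through __missing__. This sketch (sample data taken from the example above) shows pop with and without a default, and del on a missing key.

```python
from collections import defaultdict

name_lengths = defaultdict(int, {"Aman": 4, "Alwyn": 5})

# pop() with a second argument returns it when the key is missing
fallback = name_lengths.pop("Melanie", -1)
print(fallback)  # -1

# without that argument, pop() raises KeyError even though default_factory is int
try:
    name_lengths.pop("Melanie")
    pop_raised = False
except KeyError:
    pop_raised = True

# del on a missing key raises KeyError as well
try:
    del name_lengths["Melanie"]
    del_raised = False
except KeyError:
    del_raised = True

print(pop_raised, del_raised)  # True True
print(dict(name_lengths))      # {'Aman': 4, 'Alwyn': 5}
```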
Get a list of keys in defaultdict
To get the keys in a defaultdict, we can call the keys() method on the defaultdict object.
The method returns a dict_keys object containing all the keys of the dictionary.
The dict_keys object is iterable, so we can iterate over it to get the individual keys, or convert it to a Python list using the list constructor.
The keys method is defined in Python’s dict class, which is the parent class of defaultdict.
```python
from collections import defaultdict

name_lengths = defaultdict(int)
names = ["Aman", "Shanaya", "Harris", "Alwyn"]
for n in names:
    name_lengths[n] = len(n)

print("Current dictionary:")
print(name_lengths)

print(name_lengths.keys())
keys_list = list(name_lengths.keys())
print("\nKeys:", keys_list)
```
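Keep in mind that keys() returns a live view rather than a copy: it reflects keys added later, including keys created implicitly by accessing a missing key, while a list built from it is a snapshot. A small sketch:

```python
from collections import defaultdict

name_lengths = defaultdict(int, {"Aman": 4})
keys_view = name_lengths.keys()  # a live view, not a copy
keys_list = list(keys_view)      # a snapshot taken now

name_lengths["Shanaya"] = 7
name_lengths["Melanie"]          # missing-key access also adds a key

print(keys_view)  # dict_keys(['Aman', 'Shanaya', 'Melanie'])
print(keys_list)  # ['Aman']
```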
Checking the existence of keys in defaultdict
Although we don’t need to check for the existence of a key before accessing it in a defaultdict, we might still want to find out whether a certain key exists in the dictionary.
To do this, we use Python’s in operator, which works with almost all kinds of containers in Python to check whether a certain element is present.
```python
from collections import defaultdict

divisibility_by_4 = defaultdict(list)
for i in range(21):
    divisibility_by_4[i % 4].append(i)

print("Current dictionary:", divisibility_by_4)

print("3 exists?")
print(3 in divisibility_by_4)  # True, division by 4 can leave remainder 3
print("6 exists?")
print(6 in divisibility_by_4)  # False, divisor 4 can never produce remainder 6
```
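The in operator, like dict.get, only looks up the key and never calls default_factory, whereas subscript access with [ ] inserts the missing key. The sketch below rebuilds divisibility_by_4 and shows the difference through the dictionary’s length.

```python
from collections import defaultdict

divisibility_by_4 = defaultdict(list)
for i in range(21):
    divisibility_by_4[i % 4].append(i)

# membership test and get() do not create keys
print(6 in divisibility_by_4)    # False
print(divisibility_by_4.get(6))  # None
print(len(divisibility_by_4))    # 4

# subscript access calls default_factory and inserts the key
print(divisibility_by_4[6])      # []
print(6 in divisibility_by_4)    # True
print(len(divisibility_by_4))    # 5
```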
Sort a Python defaultdict
Since Python 3.7, dictionaries preserve the insertion order of their keys, but they still have no notion of a numeric ‘position’: you cannot index them like a list, and neither a standard dict nor a defaultdict has a method to sort itself in place.
However, we can obtain the key-value pairs as an iterable dict_items object using the items() method, and pass it to Python’s built-in sorted() function, which returns a new sorted list of (key, value) tuples.

```python
from collections import defaultdict

def count_vowels(string):
    '''function to count number of vowels in a string'''
    count = 0
    for c in string.lower():
        if c in "aeiou":
            count += 1
    return count

vowels_counter = defaultdict(int)  # maps names to no. of vowels in them
names = ["Ashneer", "Pamella", "Aaliya", "Wright", "Jennifer", "Iglesias"]
for n in names:
    vowels_counter[n] = count_vowels(n)  # assigning vowel count to each name
print("Current defaultdict:\n", vowels_counter)

items = vowels_counter.items()  # get key-value pairs
print("\ndefaultdict items:\n", items)
print("type:", type(items))

items_sorted = sorted(items)  # sort key-value pairs
print("\nSorted defaultdict items:\n", items_sorted)
```

If we want a defaultdict whose iteration order follows this sorted order, we can build a new one from the sorted items, since the new dictionary remembers the order in which its keys were inserted.

```python
from collections import defaultdict

def count_vowels(string):
    '''function to count number of vowels in a string'''
    count = 0
    for c in string.lower():
        if c in "aeiou":
            count += 1
    return count

vowels_counter = defaultdict(int)  # maps names to no. of vowels in them
names = ["Ashneer", "Pamella", "Aaliya", "Wright", "Jennifer", "Iglesias"]
for n in names:
    vowels_counter[n] = count_vowels(n)  # assigning vowel count to each name
print("Current defaultdict:\n", vowels_counter)

items = vowels_counter.items()  # get key-value pairs
items_sorted = sorted(items)    # sort key-value pairs
print("\nSorted defaultdict items:\n", items_sorted)

# creating a new defaultdict from the sorted items;
# it keeps the sorted insertion order (Python 3.7+)
vowels_counter_1 = defaultdict(int, items_sorted)
print("\ndefaultdict from sorted items:\n", vowels_counter_1)
```
In these examples, we relied on the default sorting, which compares the (key, value) tuples element by element, starting with the first element, i.e., the key.
So the result is sorted by keys.
If we want to sort the items by values instead, we can pass a lambda function indicating the basis of sorting to the key parameter of the sorted function.
```python
from collections import defaultdict

def count_vowels(string):
    '''function to count number of vowels in a string'''
    count = 0
    for c in string.lower():
        if c in "aeiou":
            count += 1
    return count

vowels_counter = defaultdict(int)  # maps names to no. of vowels in them
names = ["Ashneer", "Pamella", "Aaliya", "Wright", "Jennifer", "Iglesias"]
for n in names:
    vowels_counter[n] = count_vowels(n)  # assigning vowel count to each name
print("Current defaultdict:\n", vowels_counter)

items = vowels_counter.items()  # get key-value pairs
items_sorted = sorted(items)    # sort key-value pairs
print("\nSorted defaultdict items:\n", items_sorted)

items_sorted_by_value = sorted(items, key=lambda x: x[1])  # value is at pos. 1 of key-val pair
print("\ndefaultdict items sorted by value:\n", items_sorted_by_value)
```
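The key function can also return a tuple to sort on several criteria. As a sketch, assuming hand-entered vowel counts for a few of the names above, the code below sorts by value in descending order, breaking ties alphabetically by key, and then builds a new defaultdict from the sorted pairs; because dictionaries preserve insertion order (Python 3.7+), the new one iterates in that sorted order.

```python
from collections import defaultdict

# hand-entered vowel counts for a few of the names above
vowels_counter = defaultdict(int, {"Wright": 1, "Aaliya": 4, "Pamella": 3, "Ashneer": 3})

# sort by value (descending), breaking ties alphabetically by key
items_sorted = sorted(vowels_counter.items(), key=lambda kv: (-kv[1], kv[0]))
print(items_sorted)

# the new defaultdict keeps this order because dicts remember insertion order
vowels_sorted = defaultdict(int, items_sorted)
print(list(vowels_sorted))
```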
defaultdict to JSON
JSON, or JavaScript Object Notation, is a popular format for data exchange over the internet.
It can represent structures similar to both Python lists and dictionaries.
Web APIs often accept requests and send responses in JSON format.
Files containing JSON data typically have the extension .json.
Python provides the built-in json module to parse JSON data from files and strings, and to easily write data to JSON files.
A defaultdict object (as well as a standard dict object) can be dumped to JSON using the dump or dumps function of Python’s json module.
The json.dumps function converts the defaultdict object into a JSON string. We can then write this string to a file using the write method of a Python file object.
We can also dump the defaultdict data directly as JSON using the json.dump function, which accepts the dictionary and a file object opened in ‘write’ mode.
For both functions, we can optionally set the indent parameter to an integer to pretty-print the output JSON with that many spaces of indentation per nesting level.
We can also have these functions sort the output JSON data by keys, using the optional boolean parameter sort_keys. Let’s use all these options in an example.
```python
import json
from collections import defaultdict

names = ["Ashneer", "Pamella", "Aaliya", "Wright", "Jennifer", "Iglesias", "Shanaya", "Harris", "Alwyn"]
ages = [21, 23, 23, 26, 28, 19, 21, 22, 24]
courses = ["CS", "Law", "Environment", "CS", "CS", "Environment", "Law", "Music", "CS"]

students = defaultdict(dict)  # creating defaultdict with dict callable

# adding students data to defaultdict
for i in range(len(names)):
    # the first access returns an empty dict, to which we assign the key 'name'
    students[i+100]["name"] = names[i]
    students[i+100]["age"] = ages[i]
    students[i+100]["course"] = courses[i]

print("Current student data:")
print(students)

# converting to JSON string
students_json = json.dumps(students, indent=3)  # add indent of 3
print("\nStudents data as JSON string:")
print(students_json)
print("type:", type(students_json))

# dumping the string
with open("students.json", "w") as f1:
    f1.write(students_json)
print("JSON string dumped in students.json")

# dumping JSON without string conversion
with open("students_1.json", "w") as f2:
    json.dump(students, f2, indent=3, sort_keys=True)  # sort the defaultdict keys in output JSON
print("defaultdict directly dumped as JSON in students_1.json")
```
Our student data stored as a defaultdict will be dumped as JSON in the files students.json and students_1.json.
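Two details of this conversion are easy to miss: JSON object keys are always strings, so the integer student IDs come back as "100", "101", and so on, and json.loads returns a plain dict rather than a defaultdict. A minimal round-trip sketch with a single student from the data above:

```python
import json
from collections import defaultdict

students = defaultdict(dict)
students[100]["name"] = "Ashneer"
students[100]["age"] = 21

students_json = json.dumps(students)
print(students_json)  # {"100": {"name": "Ashneer", "age": 21}}

loaded = json.loads(students_json)
print(type(loaded))   # <class 'dict'>: a plain dict, not a defaultdict
print(list(loaded))   # ['100']: the integer key came back as a string

# rebuild a defaultdict with integer keys if the rest of the code expects that
restored = defaultdict(dict, {int(k): v for k, v in loaded.items()})
print(restored[100]["name"])
```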
Defaultdict to Pandas DataFrame
Pandas is one of the most popular libraries for storing and manipulating 2D tabular data, and its DataFrame allows each column to have a different data type.
Pandas provides ways to convert a dictionary into a DataFrame.
We can pass our defaultdict object directly to the pandas.DataFrame constructor as its first argument, data, in which case the row and column labels are inferred from the given data.
A more flexible way is to use the pd.DataFrame.from_dict method, whose orient parameter lets us choose whether the dictionary keys become the columns (the default) or the row indices of the table.
Let us convert our student data from the previous example into a Pandas DataFrame.
```python
import pandas as pd
from collections import defaultdict

names = ["Ashneer", "Pamella", "Aaliya", "Wright", "Jennifer", "Iglesias", "Shanaya", "Harris", "Alwyn"]
ages = [21, 23, 23, 26, 28, 19, 21, 22, 24]
courses = ["CS", "Law", "Environment", "CS", "CS", "Environment", "Law", "Music", "CS"]

students = defaultdict(dict)  # creating defaultdict with dict callable

# adding students data to defaultdict
for i in range(len(names)):
    # the first access returns an empty dict, to which we assign the key 'name'
    students[i+100]["name"] = names[i]
    students[i+100]["age"] = ages[i]
    students[i+100]["course"] = courses[i]

print("Current student data:")
print(students)

# creating a dataframe from the defaultdict object, using its keys as row indices
df_students = pd.DataFrame.from_dict(students, orient='index')
print("\nStudents data as DataFrame:")
print(df_students)
```
We can also dump the defaultdict object into a CSV file by converting it to a DataFrame and then using Pandas’ to_csv method.
```python
import pandas as pd
from collections import defaultdict

names = ["Ashneer", "Pamella", "Aaliya", "Wright", "Jennifer", "Iglesias", "Shanaya", "Harris", "Alwyn"]
ages = [21, 23, 23, 26, 28, 19, 21, 22, 24]
courses = ["CS", "Law", "Environment", "CS", "CS", "Environment", "Law", "Music", "CS"]

students = defaultdict(dict)  # creating defaultdict with dict callable

# adding students data to defaultdict
for i in range(len(names)):
    # the first access returns an empty dict, to which we assign the key 'name'
    students[i+100]["name"] = names[i]
    students[i+100]["age"] = ages[i]
    students[i+100]["course"] = courses[i]

print("Current student data:")
print(students)

# creating a dataframe from the defaultdict object, using its keys as row indices
df_students = pd.DataFrame.from_dict(students, orient='index')

df_students.to_csv("students.csv", index_label="id")
print("\nStudent data dumped to students.csv")
```
With the parameter value index_label="id", we give the column holding the row indices, which to_csv writes by default, the label “id” in the output CSV file.
Defaultdict to normal dict
Finally, let’s look at how to convert a defaultdict into the standard dict type.
It is straightforward: we can simply pass the defaultdict object to the dict constructor.
```python
from collections import defaultdict

names = ["Ashneer", "Pamella", "Aaliya", "Wright", "Jennifer", "Iglesias", "Shanaya", "Harris", "Alwyn"]
ages = [21, 23, 23, 26, 28, 19, 21, 22, 24]
courses = ["CS", "Law", "Environment", "CS", "CS", "Environment", "Law", "Music", "CS"]

students = defaultdict(dict)  # creating defaultdict with dict callable

# adding students data to defaultdict
for i in range(len(names)):
    # the first access returns an empty dict, to which we assign the key 'name'
    students[i+100]["name"] = names[i]
    students[i+100]["age"] = ages[i]
    students[i+100]["course"] = courses[i]

print("Current student data:")
print(students)
print("type:", type(students))

students_d = dict(students)
print("\nAfter converting to dict:")
print(students_d)
print("type:", type(students_d))
```
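The dict constructor performs a shallow conversion: if the values are themselves defaultdict objects, they stay defaultdicts. The sketch below uses a hypothetical helper, to_plain_dict, to convert nested defaultdicts recursively; the two-level course-to-student structure is illustrative.

```python
from collections import defaultdict

def to_plain_dict(obj):
    """Hypothetical helper: recursively convert nested defaultdicts to dicts."""
    if isinstance(obj, dict):  # defaultdict is a dict subclass
        return {k: to_plain_dict(v) for k, v in obj.items()}
    return obj

# a two-level defaultdict: course -> student id -> age
nested = defaultdict(lambda: defaultdict(int))
nested["CS"][100] = 21
nested["Law"][101] = 23

shallow = dict(nested)
print(type(shallow["CS"]))  # <class 'collections.defaultdict'>

deep = to_plain_dict(nested)
print(type(deep["CS"]))     # <class 'dict'>
print(deep)
```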
Mokhtar is the founder of LikeGeeks.com. He has worked as a Linux system administrator since 2010. He is responsible for maintaining, securing, and troubleshooting Linux servers for multiple clients around the world. He loves writing shell and Python scripts to automate his work.