
Python defaultdict tutorial

A Python dictionary stores key-value pairs and is Python’s implementation of a hash map.
Each key in a dictionary is unique and must be hashable, which in practice usually means an immutable data type such as a string, int, or tuple.

There is no restriction on the values; they can be of any data type.
If you try to access a key that does not exist in a dictionary, Python raises a KeyError.

d1 = {"Ashley":42, "Jacob":24, "Katherine":31}
print(d1["Ashley"]) #key exists, OK
print(d1["Katherine"]) #key exists, OK
print(d1["Melanie"]) #key absent, Error    

Output:

normal dict usage and KeyError
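With a plain dict, the usual workarounds are to catch the exception or to call get with a fallback. The following is a minimal sketch that reuses the d1 data from above; the fallback value 0 is an assumption chosen only for illustration.

```python
d1 = {"Ashley": 42, "Jacob": 24, "Katherine": 31}

# Option 1: catch the KeyError explicitly
try:
    age = d1["Melanie"]
except KeyError:
    age = 0  # fallback chosen for this example

# Option 2: dict.get returns a fallback instead of raising
age_via_get = d1.get("Melanie", 0)

print(age, age_via_get)   # 0 0
print("Melanie" in d1)    # False, neither approach adds the key
```

Both approaches work, but they have to be repeated at every lookup.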

To handle missing keys more gracefully, Python provides an alternative called defaultdict, which is part of the built-in collections module.

 

 

What is defaultdict?

defaultdict is a subclass of Python’s standard dict class and works almost exactly like it, with the added ability to supply default values for missing keys.
Let’s reimplement the dictionary from the previous example, this time using defaultdict with a default value of 0.

from collections import defaultdict
d2 = defaultdict(int) #setting the default callable to int()
print("Defaultdict d2 initialized:", d2)
#Assigning key-value pairs
d2["Ashley"]=42
d2["Jacob"]=24
d2["Katherine"]=31
print("d2 after setting some keys:",d2)
#accessing existent and non-existent keys
print(d2["Ashley"]) #key exists, returns corresponding value
print(d2["Katherine"]) #key exists, returns corresponding value
print(d2["Melanie"]) #key absent, returns default value using int()

Output:

using defaultdict

The defaultdict constructor takes as its first parameter a ‘default_factory’ callable, which is called whenever a missing key is accessed on the dictionary.
In the above example, we pass int as the default_factory. Calling int() with no arguments returns 0. Hence, when we access the key ‘Melanie’, we get the value 0.

Note that if we don’t pass a default_factory, it is set to None, in which case our defaultdict behaves like a standard dict and raises a KeyError when a missing key is accessed.
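A short sketch of this fallback behavior, using a hypothetical variable name no_factory:

```python
from collections import defaultdict

no_factory = defaultdict()  # no callable passed, so default_factory is None
no_factory["Ashley"] = 42

try:
    no_factory["Melanie"]   # missing key, no factory to call
    raised = False
except KeyError:
    raised = True

print(no_factory.default_factory)  # None
print("KeyError raised:", raised)  # KeyError raised: True
```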

We could also define our own function or pass a lambda function that returns any other desired value to be used as the default value for our dictionary.

Let’s take the same example and set the default value to 99, this time using our custom callable.

from collections import defaultdict
# our default method that will be called in case of missing key access
def get_default_value(): 
    return 99
d3 = defaultdict(get_default_value, {"Ashley":42, "Jacob":24, "Katherine":31}) 
print("Dictionary d3:", d3)
#accessing existent and non-existent keys
print(d3["Ashley"]) #key exists, returns corresponding value
print(d3["Katherine"]) #key exists, returns corresponding value
print(d3["Melanie"]) #key absent, returns default value using get_default_value()

Output:

using defaultdict with custom callable

This time, when we accessed the key ‘Melanie’, our user-defined function get_default_value was called to return the default value.
Note that the callable passed as default_factory is called with no arguments, so make sure your function can be called without any.
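To see why the signature matters, here is a small sketch in which a hypothetical factory, default_for, wrongly expects an argument. Creating the defaultdict succeeds; the error only appears when a missing key is first accessed.

```python
from collections import defaultdict

def default_for(name):  # hypothetical factory that wrongly requires an argument
    return len(name)

bad = defaultdict(default_for)  # accepted, because default_for is callable
try:
    bad["Melanie"]              # default_for() is called with no arguments
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__)     # TypeError

good = defaultdict(lambda: 99)  # a zero-argument callable works
print(good["Melanie"])          # 99
```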

 

How does defaultdict work?

Whenever we access a value using the subscript operator [ ], both Python’s standard dict and defaultdict objects internally call the __getitem__ method.
If the dictionary has the specified key, __getitem__ returns the value of that key.

If the key does not exist, __getitem__ calls the __missing__ method when the class defines one.
The standard dict does not define __missing__, so it simply raises a KeyError. defaultdict does define it, and also raises a KeyError if default_factory is None.
If default_factory is not None, __missing__ calls it with no arguments, stores the result in the dictionary under the missing key, and returns it.

You can test this by directly calling these methods on the defaultdict object.

from collections import defaultdict
d4 = defaultdict(lambda : 99, {"Ashley":42, "Jacob":24, "Katherine":31})  #specifying a lambda function as the default callable
print("Dictionary d4:", d4)
print(d4.__getitem__("Ashley")) #key exists, returns 42
print(d4.__getitem__("Jacob")) #key exists, returns 24
print(d4.__getitem__("Ashton")) #key does not exist, calls __missing__, which in turn calls the lambda method we passed.
#directly calling the __missing__ method
print("d4.__missing__('Ashton') = ",d4.__missing__("Ashton"))

Output:

demonstrating internal working of defaultdict
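One consequence worth checking is that a lookup of a missing key with [ ] also stores the default in the dictionary. The sketch below illustrates this, and additionally shows a hypothetical dict subclass, UpperFallback, to illustrate that __missing__ is a general hook that any dict subclass can define.

```python
from collections import defaultdict

d = defaultdict(lambda: 99, {"Ashley": 42})
print(len(d))         # 1
value = d["Ashton"]   # missing key: __missing__ calls the lambda and stores 99
print(value)          # 99
print(len(d))         # 2, the lookup added the key
print("Ashton" in d)  # True

class UpperFallback(dict):
    """Hypothetical dict subclass with its own __missing__ hook."""
    def __missing__(self, key):
        # Returns a computed value without storing it
        return key.upper()

u = UpperFallback({"a": 1})
print(u["a"])    # 1
print(u["abc"])  # ABC
print(len(u))    # 1, this __missing__ does not store anything
```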

 

Appending to list values in defaultdict

With a standard dict, if you use lists as values and want to update them dynamically, say in a loop, you have to check whether the key exists before appending to the corresponding list.
If the key doesn’t exist, you create a new list; otherwise, you append to the existing one.
Let’s make a dictionary that groups the integers from 0 up to (but excluding) 20 by parity. The even values are stored under the key 0, and the odd values under the key 1.

d_even_odd = dict() #empty dictionary
for i in range(20):
    key = i%2
    if key in d_even_odd:
        #key exists, list has already been created
        d_even_odd[key].append(i)
    else:
        #key doesn't exist, create one and assign a list with 1 element
        d_even_odd[key] = [i]
for k in d_even_odd:
    print(f"{k}: {d_even_odd[k]}")

Output:

standard dict with list values

Avoiding this hassle of checking whether the key exists before every operation is exactly where defaultdict is most useful.
We can simply define a defaultdict with list as the default callable.
This way, whenever we access a key that doesn’t exist, an empty list is created and stored under that key, and we can append the desired value to it right away.

from collections import defaultdict
dd_even_odd = defaultdict(list) #empty defaultdict with list() as default callable.
for i in range(20):
    key = i%2
    # no if condition, missing keys handled implicitly
    dd_even_odd[key].append(i)
for k in dd_even_odd:
    print(f"{k}: {dd_even_odd[k]}")

Output:

append to defaultdict list values
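The same pattern works for counting with int as the default callable. A minimal sketch, where the sample sentence is made up for illustration:

```python
from collections import defaultdict

# Made-up sample text for this example
words = "the quick brown fox jumps over the lazy dog the end".split()

word_counts = defaultdict(int)  # missing words start at int() == 0
for w in words:
    word_counts[w] += 1         # no need to initialise the counter first

print(word_counts["the"])       # 3
print(len(word_counts))         # 9 distinct words
```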

 

Length of defaultdict

The length of a defaultdict, i.e. the number of key-value pairs it holds, can be computed by passing the defaultdict object to the built-in len function.
This works the same way as for the standard dict.

from collections import defaultdict
dd_powers = defaultdict(list) 
for i in range(8):
    dd_powers[i].extend([i**2, i**0.5, i**3]) #appending square, square root and cube
for k in dd_powers:
    print(f"{k}: {dd_powers[k]}")
print("\nlength of the defaultdict:", len(dd_powers))

Output:

extend defaultdict list and compute its length

 

Removing an item from defaultdict

We can remove elements from a defaultdict the same way we do with standard Python dictionaries, i.e. using the del statement or the pop method.

from collections import defaultdict
name_lengths = defaultdict(int) 
names = ["Aman", "Shanaya", "Harris", "Alwyn"]
for n in names:
    name_lengths[n] = len(n)
print(f"Current dictionary:")
print(name_lengths)
del name_lengths["Shanaya"] #removing "Shanaya"
deleted_val = name_lengths.pop("Harris") #removing "Harris", returns deleted value
print(f"\nDeleted value:",deleted_val)
print(f"\nAfter deleting two keys:")
print(name_lengths)

Output:

removing element from defaultdict

If the requested key doesn’t exist, the del statement raises a KeyError.
The pop method removes the key and returns its value.

If the key does not exist, pop raises a KeyError unless a default is passed as its optional second argument, in which case it returns that default.
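The sketch below exercises the missing-key cases. Note that neither del nor pop consults default_factory; the fallback None passed to pop is an arbitrary choice for this example.

```python
from collections import defaultdict

name_lengths = defaultdict(int, {"Aman": 4, "Alwyn": 5})

removed = name_lengths.pop("Aman")           # key exists, returns 4
missing = name_lengths.pop("Shanaya", None)  # key absent, returns the given default

try:
    name_lengths.pop("Shanaya")              # key absent, no default given
    pop_raised = False
except KeyError:
    pop_raised = True

try:
    del name_lengths["Shanaya"]              # del never falls back to a default
    del_raised = False
except KeyError:
    del_raised = True

print(removed, missing, pop_raised, del_raised)  # 4 None True True
print(list(name_lengths))                        # ['Alwyn']
```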

 

Get a list of keys in defaultdict

To get the list of keys in a defaultdict, we can call the keys() method on the defaultdict object.
The method returns a dict_keys view object containing all the keys of the dictionary.
The dict_keys object is iterable, so we can loop over it to get the individual keys or convert it to a Python list using the list constructor.
The keys method is inherited from Python’s dict class, which is the parent class of defaultdict.

from collections import defaultdict
name_lengths = defaultdict(int) 
names = ["Aman", "Shanaya", "Harris", "Alwyn"]
for n in names:
    name_lengths[n] = len(n)
print(f"Current dictionary:")
print(name_lengths)
print(name_lengths.keys())
keys_list = list(name_lengths.keys())
print("\nKeys:",keys_list)

Output:

getting list of defaultdict keys
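One difference between the dict_keys object and the list is easy to overlook: the former is a live view, while the latter is a snapshot. A small sketch with shortened sample data:

```python
from collections import defaultdict

name_lengths = defaultdict(int, {"Aman": 4, "Shanaya": 7})

keys_view = name_lengths.keys()   # live view onto the dictionary
keys_snapshot = list(keys_view)   # independent copy taken now

name_lengths["Harris"] = 6        # add a key afterwards

print(list(keys_view))   # ['Aman', 'Shanaya', 'Harris'], the view sees the new key
print(keys_snapshot)     # ['Aman', 'Shanaya'], the list does not
```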

 

Checking the existence of keys in defaultdict

Although we don’t need to check for the existence of a key before accessing it in a defaultdict, we might still want to find out if a certain key exists in the dictionary or not.
To do this, we use Python’s in operator, which works with almost all kinds of containers in Python to check whether a certain element is present. For dictionaries, it checks the keys.

from collections import defaultdict
divisibility_by_4 = defaultdict(list)
for i in range(21):
    divisibility_by_4[i%4].append(i)
print(f"Current dictionary:",divisibility_by_4)
print("3 exists?")
print(3 in divisibility_by_4) #True, divisibility by 4 can leave remainder 3
print("6 exists?")
print(6 in divisibility_by_4) #False, divisor 4 can never produce remainder 6

Output:

finding if key exists in defaultdict
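Unlike a lookup with [ ], neither the in operator nor the get method triggers the default callable, so they are safe for read-only checks. A minimal sketch of the difference, using hypothetical variable names:

```python
from collections import defaultdict

remainders = defaultdict(list)
for i in range(21):
    remainders[i % 4].append(i)

size_before = len(remainders)        # 4 keys: 0, 1, 2, 3

has_six = 6 in remainders            # False, and no key is created
got = remainders.get(6)              # None, and no key is created
size_after_checks = len(remainders)  # still 4

_ = remainders[6]                    # subscript access creates key 6 with []
size_after_index = len(remainders)   # now 5

print(size_before, size_after_checks, size_after_index)  # 4 4 5
```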

 

Sort a Python defaultdict

Since Python 3.7, dictionaries preserve the order in which keys were inserted, but they are not sequences: you cannot index them by position, and they have no sort method.
So a dictionary, whether a standard dict or a defaultdict, cannot be sorted in place.
However, we can obtain the key-value pairs as an iterable dict_items object using the items() method and sort them with Python’s built-in sorted() function, which returns a list of (key, value) tuples.

from collections import defaultdict
def count_vowels(string):
    '''function to count number of vowels in a string'''
    count = 0
    for c in str.lower(string):
        if c in "aeiou":
            count+=1
    return count 
vowels_counter = defaultdict(int) #maps names to no. of vowels in them
names = ["Ashneer", "Pamella", "Aaliya", "Wright", "Jennifer", "Iglesias"]
for n in names:
    vowels_counter[n] = count_vowels(n) #assigning vowel count to each name
print("Current defaultdict:\n",vowels_counter)
items = vowels_counter.items() #get key-value pairs 
print("\ndefaultdict items:\n", items)
print("type:",type(items))
items_sorted = sorted(items) #sort key-value pairs
print("\nSorted defaultdict items:\n", items_sorted)

Output:

sorting defaultdict items

If we create a new defaultdict from these sorted items, it keeps them in sorted order, because dictionaries preserve insertion order (Python 3.7 and later).

from collections import defaultdict
def count_vowels(string):
    '''function to count number of vowels in a string'''
    count = 0
    for c in str.lower(string):
        if c in "aeiou":
            count+=1
    return count 
vowels_counter = defaultdict(int) #maps names to no. of vowels in them
names = ["Ashneer", "Pamella", "Aaliya", "Wright", "Jennifer", "Iglesias"]
for n in names:
    vowels_counter[n] = count_vowels(n) #assigning vowel count to each name
print("Current defaultdict:\n",vowels_counter)
items = vowels_counter.items() #get key-value pairs 
items_sorted = sorted(items) #sort key-value pairs
print("\nSorted defaultdict items:\n", items_sorted)
# creating new defaultdict using sorted items
vowels_counter_1 = defaultdict(int, items_sorted) #new defaultdict, keys inserted in sorted order
print("\ndefaultdict from sorted items:\n",vowels_counter_1) 

Output:

defaultdict created from sorted items

In these examples, we relied on the default sort order, which compares the (key, value) tuples element by element, starting with the key.
Since keys are unique, the result is sorted by keys.
If we want to sort the items by values, we can pass a lambda function as the key parameter of the sorted function to specify the basis of sorting.

from collections import defaultdict
def count_vowels(string):
    '''function to count number of vowels in a string'''
    count = 0
    for c in str.lower(string):
        if c in "aeiou":
            count+=1
    return count 
vowels_counter = defaultdict(int) #maps names to no. of vowels in them
names = ["Ashneer", "Pamella", "Aaliya", "Wright", "Jennifer", "Iglesias"]
for n in names:
    vowels_counter[n] = count_vowels(n) #assigning vowel count to each name
print("Current defaultdict:\n",vowels_counter)
items = vowels_counter.items() #get key-value pairs 
items_sorted = sorted(items) #sort key-value pairs
print("\nSorted defaultdict items:\n", items_sorted)
items_sorted_by_value = sorted(items, key=lambda x: x[1]) #value is at pos.1 of key-val pair
print("\ndefaultdict items sorted by value:\n", items_sorted_by_value)

Output:

defaultdict items sorted by value
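As a variation, the key function can return a tuple to sort by value in descending order and break ties alphabetically by name. This is a sketch under the same vowel-counting setup, shortened to four names; the variable name ranked is hypothetical.

```python
from collections import defaultdict

names = ["Ashneer", "Pamella", "Aaliya", "Wright"]
vowels_counter = defaultdict(int)
for n in names:
    # count the vowels in each name
    vowels_counter[n] = sum(c in "aeiou" for c in n.lower())

# Negate the count for descending order; the name breaks ties in ascending order
ranked = sorted(vowels_counter.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked)
# [('Aaliya', 4), ('Ashneer', 3), ('Pamella', 3), ('Wright', 1)]
```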

 

defaultdict to JSON

JSON, or JavaScript Object Notation, is a popular format for data exchange over the internet.
It can comprise structures similar to both Python lists and dictionaries.
You often find internet APIs sending requests and receiving responses in JSON format.
A file containing JSON data has the extension .json.

Python provides the json library to parse JSON data from files and to write data to JSON files.
The defaultdict object (as well as the standard dict object) can be serialized to JSON using the dump or dumps function of the json module.
The json.dumps function converts the defaultdict object into a JSON string. We can write this string to a file using the write method of a Python file object.
We can also dump the defaultdict data directly as JSON using the json.dump function, which accepts the dictionary and a file object opened in ‘write’ mode.
We can optionally set the parameter indent for both these methods to an integer value to pretty print the output JSON with the specified indent level for each data element in JSON.
We can also direct these methods to sort the output JSON data by keys, using the optional boolean parameter sort_keys. Let’s use all these options in an example.

import json
from collections import defaultdict
names = ["Ashneer", "Pamella", "Aaliya", "Wright", "Jennifer", "Iglesias", "Shanaya", "Harris", "Alwyn"]
ages = [21, 23, 23, 26, 28, 19, 21, 22, 24]
courses = ["CS", "Law", "Environment", "CS", "CS", "Environment", "Law", "Music", "CS"]
students = defaultdict(dict) #creating defaultdict with dict callable
#adding students data to defaultdict
for i in range(len(names)):
    students[i+100]["name"] = names[i] #would first return an empty dict to which we assign key 'name'
    students[i+100]["age"] = ages[i]
    students[i+100]["course"] = courses[i]
print(f"Current student data:")
print(students)
#converting to JSON string
students_json = json.dumps(students, indent=3) #add indent of 3
print("\nStudents data as JSON string:")
print(students_json)
print("type:", type(students_json))
# dumping the string
with open("students.json", "w") as f1:
    f1.write(students_json)
print("JSON string dumped in students.json")
#dumping json without string conversion
with open("students_1.json", "w") as f2:
    json.dump(students, f2, indent=3, sort_keys=True) #sort the defaultdict keys in output json
print("defaultdict directly dumped as JSON in students_1.json")

Output:

demo of defaultdict to JSON

Our student data stored as defaultdict will be dumped as JSON in the files students.json and students_1.json.
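Reading the data back is not a perfect round trip: json.loads returns a plain dict, and the integer keys come back as strings because JSON object keys are always strings. The sketch below shows one possible way to restore a defaultdict with integer keys; it uses a shortened data set and no files.

```python
import json
from collections import defaultdict

students = defaultdict(dict)
students[100]["name"] = "Ashneer"
students[101]["name"] = "Pamella"

text = json.dumps(students)   # integer keys are written as strings
loaded = json.loads(text)     # comes back as a plain dict

print(type(loaded).__name__)  # dict
print(list(loaded))           # ['100', '101']

# Rebuild a defaultdict and convert the keys back to int
restored = defaultdict(dict, {int(k): v for k, v in loaded.items()})
print(restored[100]["name"])  # Ashneer
```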

 

Defaultdict to Pandas DataFrame

Pandas is one of the most popular libraries for storing and manipulating 2D tabular data in DataFrames, where each column can have a different data type.
Pandas provides a way to convert a dictionary into a DataFrame.
We can pass our defaultdict object directly to the pandas.DataFrame constructor as its first argument, data, in which case the row and column indices are determined implicitly from the given data.
A more flexible way is the pd.DataFrame.from_dict method, which lets us choose the orientation of the table.
Let us convert our student data from the previous example into a Pandas DataFrame.

import pandas as pd
from collections import defaultdict
names = ["Ashneer", "Pamella", "Aaliya", "Wright", "Jennifer", "Iglesias", "Shanaya", "Harris", "Alwyn"]
ages = [21, 23, 23, 26, 28, 19, 21, 22, 24]
courses = ["CS", "Law", "Environment", "CS", "CS", "Environment", "Law", "Music", "CS"]
students = defaultdict(dict) #creating defaultdict with dict callable
#adding students data to defaultdict
for i in range(len(names)):
    students[i+100]["name"] = names[i] #would first return an empty dict to which we assign key 'name'
    students[i+100]["age"] = ages[i]
    students[i+100]["course"] = courses[i]
print(f"Current student data:")
print(students)
#creating a dataframe from defaultdict object
df_students = pd.DataFrame.from_dict(students, orient='index') #using defaultdict keys as row indices
print(f"\nStudents data as DataFrames:")
print(df_students)

Output:

converting defaultdict to dataframe

We can also write the defaultdict data to a CSV file by converting it to a DataFrame and calling Pandas’ to_csv method.

import pandas as pd
from collections import defaultdict
names = ["Ashneer", "Pamella", "Aaliya", "Wright", "Jennifer", "Iglesias", "Shanaya", "Harris", "Alwyn"]
ages = [21, 23, 23, 26, 28, 19, 21, 22, 24]
courses = ["CS", "Law", "Environment", "CS", "CS", "Environment", "Law", "Music", "CS"]
students = defaultdict(dict) #creating defaultdict with dict callable
#adding students data to defaultdict
for i in range(len(names)):
    students[i+100]["name"] = names[i] #would first return an empty dict to which we assign key 'name'
    students[i+100]["age"] = ages[i]
    students[i+100]["course"] = courses[i]
print(f"Current student data:")
print(students)
#creating a dataframe from defaultdict object
df_students = pd.DataFrame.from_dict(students, orient='index') #using defaultdict keys as row indices
df_students.to_csv("students.csv", index_label="id")
print("\nStudent data dumped to students.csv")

With index_label="id", the row index, which to_csv writes as the first column by default, gets the column label “id” in the output CSV file.

Output:

converting defaultdict to csv

 

Defaultdict to normal dict

Finally, let’s also look at how to convert a defaultdict into the standard dict type.
This is straightforward: we simply pass the defaultdict object to the dict constructor to get a standard dictionary.

from collections import defaultdict
names = ["Ashneer", "Pamella", "Aaliya", "Wright", "Jennifer", "Iglesias", "Shanaya", "Harris", "Alwyn"]
ages = [21, 23, 23, 26, 28, 19, 21, 22, 24]
courses = ["CS", "Law", "Environment", "CS", "CS", "Environment", "Law", "Music", "CS"]
students = defaultdict(dict) #creating defaultdict with dict callable
#adding students data to defaultdict
for i in range(len(names)):
    students[i+100]["name"] = names[i] #would first return an empty dict to which we assign key 'name'
    students[i+100]["age"] = ages[i]
    students[i+100]["course"] = courses[i]
print(f"Current student data:")
print(students)
print("type:",type(students))
students_d = dict(students)
print(f"\nAfter converting to dict:")
print(students_d)
print("type:",type(students_d))

Output:

converting defaultdict to normal dict
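The dict constructor makes a shallow copy, so any values that are themselves defaultdict objects stay that way. The sketch below uses a hypothetical helper, to_plain_dict, to convert nested levels as well, and also shows that setting the default_factory attribute to None turns off defaults on an existing defaultdict.

```python
from collections import defaultdict

def to_plain_dict(obj):
    """Hypothetical helper: recursively convert dict subclasses to plain dicts."""
    if isinstance(obj, dict):
        return {k: to_plain_dict(v) for k, v in obj.items()}
    return obj

# Nested defaultdict: course -> student -> count
scores = defaultdict(lambda: defaultdict(int))
scores["CS"]["Ashneer"] += 1

shallow = dict(scores)        # only the outer level is converted
deep = to_plain_dict(scores)  # every level is converted

print(type(shallow["CS"]).__name__)  # defaultdict
print(type(deep["CS"]).__name__)     # dict

# Alternative: keep the object but disable the default callable
scores.default_factory = None
try:
    scores["Law"]
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)    # KeyError raised: True
```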
