Using ETags for Efficient Data Transfer in aiohttp

ETags are unique identifiers assigned to specific versions of a resource. They help determine if the content has changed so you can avoid downloading unchanged data.

In this tutorial, you’ll learn how to implement ETag support in aiohttp servers and clients to reduce unnecessary data transfer.

You’ll also explore advanced methods like optimistic locking and concurrency control using ETags.

 

 

Generate ETags

You can generate ETags using various methods, such as content hashing or timestamp-based approaches:

import hashlib
import time


def generate_etag(content):
    # Content-based: identical content always yields the same ETag
    return hashlib.md5(content.encode()).hexdigest()


def generate_timestamp_etag():
    # Timestamp-based: reflects when the ETag was generated, not the content
    return str(int(time.time()))


content = "Hello, World!"
print(f"Content-based ETag: {generate_etag(content)}")
print(f"Timestamp-based ETag: {generate_timestamp_etag()}")

Output:

Content-based ETag: 65a8e27d8879283831b664bd8b7f0ad4
Timestamp-based ETag: 1726045905

The content-based ETag stays the same as long as the content is unchanged, while the timestamp-based ETag changes whenever it is regenerated in a later second, even if the content is identical.
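
On the wire, HTTP carries ETag values as quoted strings, optionally prefixed with W/ to mark a weak validator. The following is a minimal sketch of wrapping a digest in that format; the helper name format_etag and its weak parameter are assumptions made for illustration and are not part of aiohttp.

```python
import hashlib


def format_etag(content: bytes, weak: bool = False) -> str:
    """Return a header-ready ETag for content (hypothetical helper)."""
    digest = hashlib.md5(content).hexdigest()
    tag = f'"{digest}"'  # ETag values are quoted strings
    return f"W/{tag}" if weak else tag


if __name__ == "__main__":
    print(format_etag(b"Hello, World!"))
    print(format_etag(b"Hello, World!", weak=True))
```

A strong ETag promises byte-for-byte identity, while a weak one only promises that the two versions are equivalent for practical purposes.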

 

Implement ETag Support in aiohttp Server

To generate ETags for aiohttp responses, you can create a middleware:

from aiohttp import web
import hashlib


@web.middleware
async def etag_middleware(request, handler):
    response = await handler(request)
    # Only web.Response keeps its payload in memory; streamed responses
    # (such as web.StreamResponse) have no `body` attribute to hash.
    body = getattr(response, 'body', None)
    if isinstance(body, bytes):
        # ETag values are sent as quoted strings
        response.headers['ETag'] = f'"{hashlib.md5(body).hexdigest()}"'
    return response


async def hello(request):
    return web.Response(text="Hello, world!")


app = web.Application(middlewares=[etag_middleware])
app.router.add_get('/', hello)

if __name__ == '__main__':
    web.run_app(app)

This middleware automatically generates and sets ETags for all responses with a body.

If you inspect the response headers (for example, with curl -i http://127.0.0.1:8080/), you’ll see the ETag value.

Handle If-None-Match requests

To handle If-None-Match requests, compare the client’s If-None-Match header with the current ETag and return a 304 Not Modified response if they match:

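import hashlib
from aiohttp import web

async def handle_request(request):
    content = "Hello, World!"
    # ETag values are sent as quoted strings
    etag = f'"{hashlib.md5(content.encode()).hexdigest()}"'
    # A matching If-None-Match means the client's copy is still current
    if request.headers.get('If-None-Match') == etag:
        return web.Response(status=304, headers={'ETag': etag})
    return web.Response(text=content, headers={'ETag': etag})

# Use this in place of the hello route from the previous example
app.router.add_get('/', handle_request)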
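
The equality check above covers a client that sends back exactly one ETag. An If-None-Match header may also carry a comma-separated list of ETags or the wildcard *, and the comparison ignores any W/ weak prefix. Here is a rough sketch of such a matcher using only the standard library; the name if_none_match_matches is hypothetical, and the parsing assumes ETag values contain no commas.

```python
def _strip_weak(tag: str) -> str:
    """Drop surrounding whitespace and a leading W/ weak marker."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def if_none_match_matches(header_value, current_etag: str) -> bool:
    """Return True when the header indicates the client copy is current."""
    if not header_value:
        return False  # header absent: always send the full response
    if header_value.strip() == "*":
        return True  # wildcard matches any existing representation
    current = _strip_weak(current_etag)
    # Simplification: split on commas, assuming none appear inside a tag
    candidates = (_strip_weak(part) for part in header_value.split(","))
    return any(candidate == current for candidate in candidates)
```

In a handler, you could call if_none_match_matches(request.headers.get('If-None-Match'), etag) in place of the == comparison.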

 

ETag-aware Caching in aiohttp Clients

To implement ETag-aware caching in aiohttp clients, you can use a dictionary to store ETags:

import aiohttp
import asyncio

etag_cache = {}  # maps URL -> last ETag seen for it


async def fetch_with_etag(url):
    async with aiohttp.ClientSession() as session:
        headers = {'If-None-Match': etag_cache[url]} if url in etag_cache else {}
        async with session.get(url, headers=headers) as response:
            if response.status == 304:
                print(f"Resource not modified: {url}")
                return None
            etag = response.headers.get('ETag')
            if etag is not None:
                # Only remember an ETag when the server actually sent one;
                # a None header value would break the next request
                etag_cache[url] = etag
            content = await response.text()
            print(f"Received content: {content}")
            return content


async def main():
    # First request: downloads the content and stores its ETag
    await fetch_with_etag('http://127.0.0.1:8080')

    # Second request: sends If-None-Match and should receive a 304
    await fetch_with_etag('http://127.0.0.1:8080')


asyncio.run(main())

Output:

Received content: Hello, World!
Resource not modified: http://127.0.0.1:8080

This function stores ETags in a dictionary and sends them with subsequent requests. It returns None on a 304 because it caches only the ETag, not the content.

Update Local Cache Based on ETag Changes

To reuse previously downloaded data on a 304, store the response body alongside its ETag and update both whenever the resource changes:

import json
import asyncio
import aiohttp


async def fetch_and_update_cache(url, cache):
    async with aiohttp.ClientSession() as session:
        headers = {'If-None-Match': cache[url]['etag']} if url in cache else {}
        async with session.get(url, headers=headers) as response:
            if response.status == 304:
                print("Cache is up to date")
                return cache[url]['data']
            # Do not cache error responses (for example, rate limiting)
            response.raise_for_status()
            data = await response.json()
            etag = response.headers.get('ETag')
            if etag is not None:
                cache[url] = {'etag': etag, 'data': data}
                print("Cache updated")
            else:
                # No ETag to revalidate against, so drop any stale entry
                cache.pop(url, None)
            return data


cache = {}
result = asyncio.run(fetch_and_update_cache('https://api.github.com/users/octocat', cache))
print(json.dumps(result, indent=2))

Output:

Cache updated
{
  "login": "octocat",
  "id": 583231,
  "node_id": "MDQ6VXNlcjU4MzIzMQ==",
  "avatar_url": "https://avatars.githubusercontent.com/u/583231?v=4",
  "gravatar_id": "",
  "url": "https://api.github.com/users/octocat",
  "html_url": "https://github.com/octocat",
  "followers_url": "https://api.github.com/users/octocat/followers",
  "following_url": "https://api.github.com/users/octocat/following{/other_user}",
  "gists_url": "https://api.github.com/users/octocat/gists{/gist_id}",
  "starred_url": "https://api.github.com/users/octocat/starred{/owner}{/repo}",
  "subscriptions_url": "https://api.github.com/users/octocat/subscriptions",
  "organizations_url": "https://api.github.com/users/octocat/orgs",
  "repos_url": "https://api.github.com/users/octocat/repos",
  "events_url": "https://api.github.com/users/octocat/events{/privacy}",
  "received_events_url": "https://api.github.com/users/octocat/received_events",
  "type": "User",
  "site_admin": false,
  "name": "The Octocat",
  "company": "@github",
  "blog": "https://github.blog",
  "location": "San Francisco",
  "email": null,
  "hireable": null,
  "bio": null,
  "twitter_username": null,
  "public_repos": 8,
  "public_gists": 8,
  "followers": 14882,
  "following": 9,
  "created_at": "2011-01-25T18:44:36Z",
  "updated_at": "2024-08-22T11:25:04Z"
}

This function updates the cache with new data and ETags when the resource changes.
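
The caching rule itself can be separated from the network call, which makes it easy to check in isolation. The sketch below does that; apply_response and its parameters are hypothetical names introduced here, and the cache layout mirrors the {'etag': ..., 'data': ...} entries used above.

```python
def apply_response(cache: dict, url: str, status: int, etag, data):
    """Return the data to use for url and update cache in place."""
    if status == 304:
        # Server confirmed the cached copy is current; `data` is ignored
        return cache[url]["data"]
    if etag is not None:
        cache[url] = {"etag": etag, "data": data}
    else:
        # No validator was sent: drop any stale entry so it is not reused
        cache.pop(url, None)
    return data


if __name__ == "__main__":
    cache = {}
    print(apply_response(cache, "/u", 200, '"v1"', {"n": 1}))  # fresh download
    print(apply_response(cache, "/u", 304, None, None))        # served from cache
    print(apply_response(cache, "/u", 200, '"v2"', {"n": 2}))  # resource changed
```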

 

Optimize ETags for Static Files

For static files such as font and CSS files, you can use file metadata to generate ETags:

import os


def generate_static_file_etag(file_path):
    # os.stat reads metadata only, so the file content is never hashed
    stat = os.stat(file_path)
    return f"{stat.st_mtime}-{stat.st_size}"


file_path = "style.css"
etag = generate_static_file_etag(file_path)
print(f"ETag for {file_path}: {etag}")

Output:

ETag for style.css: 1726048509.5163145-250

This function generates an ETag from the file’s modification time and size, which suits static files because it avoids reading and hashing the file on every request.
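
A variation on the same idea, offered as a sketch rather than a recommendation: using the integer st_mtime_ns field avoids float formatting, hexadecimal keeps the value short, and the surrounding quotes make it ready to place in an ETag header. The function name static_file_etag and the temporary file are assumptions for the demo.

```python
import os
import tempfile


def static_file_etag(path: str) -> str:
    """Build a quoted ETag from a file's mtime and size (hypothetical helper)."""
    st = os.stat(path)
    # Integer nanoseconds avoid float formatting; hex keeps the value compact
    return f'"{st.st_mtime_ns:x}-{st.st_size:x}"'


if __name__ == "__main__":
    # Create a throwaway CSS file so the example runs anywhere
    with tempfile.NamedTemporaryFile("w", suffix=".css", delete=False) as f:
        f.write("body { margin: 0; }")
    try:
        print(static_file_etag(f.name))
    finally:
        os.remove(f.name)
```

Because the ETag depends only on metadata, two files with the same size that are modified within the same timestamp resolution would share an ETag; content hashing remains the safer choice when that matters.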
