alecxe

Checking HTTP headers with asyncio and aiohttp

This is one of my first attempts to do something practical with asyncio. The task is simple:

Given a list of URLs, determine if the content type is HTML for every URL.

I've used aiohttp with a single "session", ignoring SSL errors and issuing HEAD requests so that the response bodies are never downloaded. I then simply check whether text/html appears in the Content-Type header string:

import asyncio

import aiohttp


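@asyncio.coroutine
def is_html(session, url):
    # HEAD returns only the headers, so the response body is never downloaded.
    response = yield from session.head(url, compress=True)
    # Substring test, so values such as "text/html; charset=utf-8" also match.
    print(url, "text/html" in response.headers["Content-Type"])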


if __name__ == '__main__':
    links = ["https://httpbin.org/html",
             "https://httpbin.org/image/png",
             "https://httpbin.org/image/svg",
             "https://httpbin.org/image"]
    loop = asyncio.get_event_loop()

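    # verify_ssl=False turns off certificate verification ("ignoring SSL errors").
    conn = aiohttp.TCPConnector(verify_ssl=False)
    # A single session is shared by every request.
    with aiohttp.ClientSession(connector=conn, loop=loop) as session:
        f = asyncio.wait([is_html(session, link) for link in links])
        loop.run_until_complete(f)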

The code works and prints:

https://httpbin.org/image/svg False
https://httpbin.org/image False
https://httpbin.org/image/png False
https://httpbin.org/html True
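
The header check itself can be tried in isolation, without any network access. Below is a minimal sketch, assuming a hypothetical helper named looks_like_html (it is not part of aiohttp or of the code above). Note that it is a slightly different check from the one in is_html: it compares the media type exactly instead of doing a substring test, and it treats a missing header as "not HTML" rather than raising a KeyError.

```python
def looks_like_html(content_type):
    """Return True if a Content-Type header value names the text/html media type.

    Hypothetical helper for illustration only; `content_type` is the raw
    header value, or None when the header is absent.
    """
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8", then normalise case and whitespace.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/html"


if __name__ == "__main__":
    for value in ("text/html; charset=utf-8", "image/png", None):
        print(repr(value), looks_like_html(value))
```

Inside is_html, this could be called as looks_like_html(response.headers.get("Content-Type")), assuming the headers object supports a dict-style get.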

I'm not sure whether I'm using the asyncio loop, wait and coroutines, and aiohttp's connection and session objects, appropriately. What would you recommend improving?
