This is one of my first attempts to do something practical with asyncio. The task is simple:
Given a list of URLs, determine for each URL whether its content type is HTML.
I've used aiohttp: I initialize a single "session", ignore SSL errors, and issue HEAD requests to avoid downloading the whole response body. Then I simply check whether text/html appears in the Content-Type header string:
import asyncio
import aiohttp


@asyncio.coroutine
def is_html(session, url):
    response = yield from session.head(url, compress=True)
    print(url, "text/html" in response.headers["Content-Type"])


if __name__ == '__main__':
    links = ["https://httpbin.org/html",
             "https://httpbin.org/image/png",
             "https://httpbin.org/image/svg",
             "https://httpbin.org/image"]
    loop = asyncio.get_event_loop()
    conn = aiohttp.TCPConnector(verify_ssl=False)
    with aiohttp.ClientSession(connector=conn, loop=loop) as session:
        f = asyncio.wait([is_html(session, link) for link in links])
        loop.run_until_complete(f)
The code works and prints the following (the output order varies between runs, of course):
https://httpbin.org/image/svg False
https://httpbin.org/image False
https://httpbin.org/image/png False
https://httpbin.org/html True
However, I'm not sure whether I'm using the asyncio loop, wait, and coroutines, as well as aiohttp's connection and session objects, appropriately. What would you recommend improving?
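For comparison, here is a minimal sketch of how the same flow might look with async def/await and asyncio.gather, which returns results in input order instead of printing them as they arrive. To keep it runnable without a network, the HEAD request is replaced by a stub: FAKE_HEADERS, fetch_headers, looks_like_html and check_all are hypothetical names introduced only for this illustration, and a real version would await session.head(url) from aiohttp in place of the stub.

```python
import asyncio

# Hypothetical canned headers standing in for real HEAD responses;
# a real version would call session.head(url) from aiohttp instead.
FAKE_HEADERS = {
    "https://httpbin.org/html": {"Content-Type": "text/html; charset=utf-8"},
    "https://httpbin.org/image/png": {"Content-Type": "image/png"},
    "https://httpbin.org/image": {},  # no Content-Type header at all
}


def looks_like_html(headers):
    """Return True if the Content-Type header mentions text/html."""
    # .get() avoids a KeyError when the header is missing.
    return "text/html" in headers.get("Content-Type", "").lower()


async def fetch_headers(url):
    """Stub for the network call: yield to the loop, then return canned headers."""
    await asyncio.sleep(0)
    return FAKE_HEADERS.get(url, {})


async def is_html(url):
    """Return the URL together with the result of the HTML check."""
    headers = await fetch_headers(url)
    return url, looks_like_html(headers)


async def check_all(urls):
    # gather() returns results in the same order as its arguments,
    # regardless of which coroutine finishes first.
    return await asyncio.gather(*(is_html(url) for url in urls))


if __name__ == "__main__":
    for url, html in asyncio.run(check_all(list(FAKE_HEADERS))):
        print(url, html)
```

Returning the result from is_html rather than printing inside it keeps the check separate from the reporting, and using headers.get means a response without a Content-Type header counts as "not HTML" instead of raising a KeyError.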