httpx is great, but it has a long-standing issue: it doesn’t cache the SSL context. That’s fine if you aren’t creating many clients, but for various reasons, Kodiak creates a ton of HTTP clients.

Narrowing in on the fix

The first step is creating a simple test case that we can run reliably to reproduce the issue:

import asyncio
import httpx

async def main() -> None:
    for _ in range(0, 10_000):
        async with httpx.AsyncClient() as client:
            r = await client.get("https://example.com")
            print(r.status_code)

if __name__ == '__main__':
    asyncio.run(main())

And then we can start it up and run py-spy on it:

httpx ssl context creation before

The profile clearly shows load_ssl_context_verify taking up a large portion of the trace.

If we remove the actual network calls, and instead just instantiate the client:

import asyncio
import httpx


async def main() -> None:
    while True:
        async with httpx.AsyncClient() as client:
            print("foo")

if __name__ == '__main__':
    asyncio.run(main())

Then the issue is even more pronounced:

httpx ssl context creation before no network
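
To get a rough feel for why building a context per client adds up, here is a minimal sketch that times context creation on its own. It uses the standard library's ssl.create_default_context() as a stand-in (the same call used in the workaround later in this post); httpx's own loading code may do different work, so treat the numbers as indicative only. The function name time_context_creation is made up for this example.

```python
import ssl
import time


def time_context_creation(iterations: int = 20) -> float:
    """Return the average number of seconds spent building one SSL context."""
    start = time.perf_counter()
    for _ in range(iterations):
        # Each call builds a fresh context and sets up the default trust store.
        ssl.create_default_context()
    return (time.perf_counter() - start) / iterations


if __name__ == "__main__":
    per_call = time_context_creation()
    print(f"ssl.create_default_context(): {per_call * 1000:.2f} ms per call")
```

Multiplying the per-call figure by the number of clients created gives a ballpark for the time spent on context setup alone.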

The Fix

The proper fix is to update httpx to cache the SSL context. As a quick workaround in the meantime, looking around in the innards of load_ssl_context_verify reveals an early return path that’s taken when an existing ssl.SSLContext is passed as verify to the client’s __init__.

Here’s the code updated with the verify argument:

import asyncio
import httpx
import ssl

# "cache" at module level
context = ssl.create_default_context()

async def main() -> None:
    while True:
        async with httpx.AsyncClient(verify=context) as client:
            print("foo")

if __name__ == '__main__':
    asyncio.run(main())

And the same change applied to the version that makes network calls:

import asyncio
import httpx
import ssl

URL = "https://example.com"

# "cache" at module level
context = ssl.create_default_context()

async def main() -> None:
    while True:
        async with httpx.AsyncClient(verify=context) as client:
            r = await client.get(URL)
            print(r.status_code)

if __name__ == '__main__':
    asyncio.run(main())
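
If a module-level global feels awkward, one alternative is to build the context lazily and memoize it. The sketch below is an assumption about how that could look, not something Kodiak or httpx ships; get_ssl_context is a hypothetical helper name, and it only relies on the standard library.

```python
import functools
import ssl


@functools.lru_cache(maxsize=1)
def get_ssl_context() -> ssl.SSLContext:
    """Build the default SSL context on first use and reuse it afterwards."""
    return ssl.create_default_context()


if __name__ == "__main__":
    first = get_ssl_context()
    second = get_ssl_context()
    # Both names refer to the same cached object.
    print(first is second)
```

The result can be passed as verify=get_ssl_context() wherever a client is constructed, in the same way the snippets above pass the module-level context. The trade-off is that the cost of building the context moves from import time to the first call.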

The final results

HTTP calls before

httpx client creation with network calls before the fix

HTTP calls after

httpx client creation with network calls after the fix

HTTP client creation before

httpx client creation flamegraph before the fix

HTTP client creation after

httpx client creation flamegraph after the fix

And finally, after rolling out the change to Kodiak’s production servers, we saw a 50% drop in peak CPU usage:

digital ocean cpu utilization graph after showing 50% reduction in peak usage