Lately I’ve been noticing that recipeyak’s API is kind of slow, and after having success using asyncio with kodiak, I thought it would be good to benchmark sync Python against async Python.

Also I threw in a Rust implementation for kicks.

setup

architecture:

client (oha) -> haproxy -> http server -> postgres

client: my laptop (in Boston)

servers:

service                                       vCPUs  Memory  Storage  Price   Region
haproxy                                       2      4 GB    80 GB    $28/mo  NYC
http server                                   2      4 GB    80 GB    $28/mo  NYC
postgres (v14) w/ pgbouncer (25 connections)  2      4 GB    38 GB    $60/mo  NYC

haproxy config:

# /etc/haproxy/haproxy.cfg
global
    stats timeout 30s
    daemon

defaults
    log	global
    mode http
    option httplog
    timeout connect 5s
    timeout client 30s
    timeout server 30s

frontend front
    bind *:80
    use_backend api if { path_beg /api/v1/ }

backend api
    server recipes 10.0.0.1:8080 maxconn 25

Note: I took the maxconn value from the haproxy docs, but raising or lowering it didn’t noticeably change the results, since the servers never got overloaded.

Also, I didn’t set up TLS, but I don’t think it would affect the relative results much, since haproxy would handle the termination.

client command, using oha (version 0.5.2):

oha -z 1m -c 50 --no-tui \
    -H "Cookie: sessionid=$SESSION_ID" "http://$IP/api/v1/recipes"

I also used psrecord to record the server’s CPU and memory usage:

psrecord --interval 1 --include-children --log $logname.log $PID

results

python (sync): django + psycopg

I chose 5 workers, per the gunicorn docs’ suggestion of (2 * os.cpu_count()) + 1, which works out to 5 on a 2 vCPU box.

./.venv/bin/gunicorn \
    -w 5 \
    main:app \
    -b 0.0.0.0:8080 \
    --access-logfile - \
    --error-logfile - \
    --capture-output \
    --enable-stdio-inheritance \
    --access-logformat 'request="%(r)s" request_time=%(L)s remote_addr="%(h)s" request_id=%({X-Request-Id}i)s response_id=%({X-Response-Id}i)s method=%(m)s protocol=%(H)s status_code=%(s)s response_length=%(b)s referer="%(f)s" process_id=%(p)s user_agent="%(a)s"'

I also tried using --threads, but it didn’t make a noticeable difference in performance.
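
For context, here’s a minimal sketch of the kind of sync view being benchmarked. It isn’t the actual handler, which isn’t shown in this post; the Recipe model, field names, and module paths are assumptions for illustration only.

# views.py: hypothetical sketch of the sync endpoint (names are made up)
from django.http import JsonResponse
from recipes.models import Recipe  # assumed model

def recipes_list(request):
    # the ORM query blocks this gunicorn worker until postgres responds,
    # so concurrency is capped at the worker count (5 here)
    recipes = list(Recipe.objects.values("id", "name", "author"))
    return JsonResponse(recipes, safe=False)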

Summary:
  Success rate:	1.0000
  Total:	60.0026 secs
  Slowest:	1.6509 secs
  Fastest:	0.2181 secs
  Average:	0.8512 secs
  Requests/sec:	58.3474

  Total data:	11.95 MiB
  Size/request:	3.49 KiB
  Size/sec:	203.91 KiB

Response time histogram:
  0.348 [6]    |
  0.479 [18]   |
  0.609 [114]  |■■■
  0.739 [1131] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.869 [1015] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  1.000 [335]  |■■■■■■■■■
  1.130 [495]  |■■■■■■■■■■■■■■
  1.260 [253]  |■■■■■■■
  1.390 [86]   |■■
  1.521 [35]   |
  1.651 [13]   |

Latency distribution:
  10% in 0.6521 secs
  25% in 0.7046 secs
  50% in 0.7903 secs
  75% in 1.0018 secs
  90% in 1.1429 secs
  95% in 1.2335 secs
  99% in 1.4102 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0345 secs, 0.0252 secs, 0.0416 secs
  DNS-lookup:	0.0000 secs, 0.0000 secs, 0.0001 secs

Status code distribution:
  [200] 3501 responses

perf log

# Elapsed time   CPU (%)     Real (MB)
       0.000        0.000      227.293
       1.004        0.000      227.293
       2.008        0.000      227.293
       3.012        0.000      227.293
       4.016        0.000      227.293
       5.020        0.000      227.293
       6.025        0.000      227.293
       7.031        0.000      227.293
       8.035        0.000      227.293
       9.040        0.000      227.293
      10.044        0.000      227.293
      11.048       48.000      274.422
      12.052       38.000      274.422
      13.056       40.000      274.422
      14.060       34.000      274.422
      15.064       41.000      274.422
      16.068       35.000      274.422
      17.072       35.000      274.422
      18.077       36.000      274.422
      19.081       37.000      274.422
      20.085       36.000      274.422
      21.089       32.000      274.422
      22.093       42.900      274.422
      23.099       34.000      274.422
      24.104       39.000      274.422
      25.108       35.000      274.422
      26.112       27.000      274.422
      27.117       32.000      274.422
      28.121       40.000      274.422
      29.127       43.000      274.422
      30.132       42.000      274.422
      31.136       39.000      274.422
      32.141       37.000      274.422
      33.145       37.000      274.422
      34.149       35.000      274.422
      35.153       28.000      274.422
      36.157       38.000      274.422
      37.161       29.000      274.422
      38.165       37.000      274.672
      39.169       38.000      274.672
      40.173       36.000      275.176
      41.177       36.000      275.176
      42.182       32.000      275.176
      43.187       33.000      275.176
      44.191       34.000      275.422
      45.195       37.000      275.672
      46.199       32.000      275.672
      47.202       35.000      275.672
      48.207       36.000      275.672
      49.211       37.000      275.672
      50.215       37.000      275.672
      51.219       44.000      275.672
      52.224       47.800      275.672
      53.229       46.500      275.672
      54.235       43.000      275.672
      55.240       37.000      275.672
      56.244       31.000      275.672
      57.249       32.000      275.672
      58.253       40.000      275.672
      59.257       36.000      275.672
      60.261       39.000      275.672
      61.265       36.000      275.672
      62.269       35.000      275.672
      63.273       35.000      275.672
      64.277       40.000      275.672
      65.281       29.000      275.672
      66.287       36.000      275.672
      67.291       40.000      275.672
      68.297       37.000      275.672
      69.301       36.000      275.672
      70.305       39.000      275.672
      71.309       19.000      275.672
      72.313        0.000      275.672

python (async): starlette + asyncpg

./.venv/bin/gunicorn \
    -w 2 \
    -k uvicorn.workers.UvicornWorker \
    -b 0.0.0.0:3000 main:app \
    --access-logfile - \
    --error-logfile - \
    --capture-output \
    --enable-stdio-inheritance \
    --access-logformat 'request="%(r)s" request_time=%(L)s remote_addr="%(h)s" request_id=%({X-Request-Id}i)s response_id=%({X-Response-Id}i)s method=%(m)s protocol=%(H)s status_code=%(s)s response_length=%(b)s referer="%(f)s" process_id=%(p)s user_agent="%(a)s"'
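
For context, here’s a minimal sketch of what a Starlette + asyncpg endpoint like this can look like. It isn’t the code that was benchmarked; the DSN, pool size, query, and field names are assumptions for illustration.

# main.py: hypothetical sketch of the async endpoint (names, query, and DSN are made up)
import asyncpg
from starlette.applications import Starlette
from starlette.responses import JSONResponse
from starlette.routing import Route

pool = None

async def startup():
    global pool
    # asyncpg keeps its own connection pool; its size should stay under pgbouncer's 25-connection limit
    pool = await asyncpg.create_pool(dsn="postgres://app:secret@db.internal/recipeyak", max_size=10)

async def recipes_list(request):
    # while the query is awaited, the event loop keeps serving other requests
    async with pool.acquire() as conn:
        rows = await conn.fetch("SELECT id, name, author FROM recipes")
    return JSONResponse([dict(row) for row in rows])

app = Starlette(routes=[Route("/api/v1/recipes", recipes_list)], on_startup=[startup])
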
Summary:
  Success rate:	1.0000
  Total:	60.0037 secs
  Slowest:	0.3148 secs
  Fastest:	0.0424 secs
  Average:	0.1009 secs
  Requests/sec:	495.3030

  Total data:	103.01 MiB
  Size/request:	3.55 KiB
  Size/sec:	1.72 MiB

Response time histogram:
  0.066 [1809]  |■■■■■
  0.090 [8589]  |■■■■■■■■■■■■■■■■■■■■■■■
  0.113 [11554] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.137 [5272]  |■■■■■■■■■■■■■■
  0.161 [1661]  |■■■■
  0.184 [499]   |■
  0.208 [207]   |
  0.232 [84]    |
  0.255 [32]    |
  0.279 [9]     |
  0.303 [4]     |

Latency distribution:
  10% in 0.0697 secs
  25% in 0.0812 secs
  50% in 0.0991 secs
  75% in 0.1143 secs
  90% in 0.1332 secs
  95% in 0.1479 secs
  99% in 0.1866 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0308 secs, 0.0253 secs, 0.0376 secs
  DNS-lookup:	0.0000 secs, 0.0000 secs, 0.0001 secs

Status code distribution:
  [200] 29720 responses

perf log

# Elapsed time   CPU (%)     Real (MB)
       0.000        0.000      112.832
       1.004        0.000      112.832
       2.010        0.000      112.832
       3.014        1.000      112.832
       4.018        0.000      112.832
       5.022        0.000      112.832
       6.026        1.000      112.832
       7.029        0.000      112.832
       8.034        4.000      114.289
       9.037        0.000      114.289
      10.041        1.000      114.289
      11.045        7.000      115.516
      12.049       38.800      116.289
      13.053       81.600      117.574
      14.057       94.600      118.863
      15.061      106.600      120.152
      16.065      141.400      121.957
      17.070      131.400      123.504
      18.075      143.200      125.051
      19.079      154.400      126.598
      20.083      142.400      128.402
      21.087      148.400      129.949
      22.091      143.400      131.496
      23.096      138.400      133.043
      24.100      141.400      134.074
      25.104      142.500      134.590
      26.108      122.400      135.105
      27.112      121.600      135.363
      28.116      140.400      135.879
      29.120      133.600      135.879
      30.124      136.400      135.879
      31.127      136.500      136.395
      32.131      132.400      136.652
      33.136      134.400      136.652
      34.140      143.400      136.910
      35.144      135.400      137.168
      36.147      143.400      137.941
      37.152      130.600      138.457
      38.156      134.400      138.715
      39.160      130.400      139.230
      40.164      133.400      139.746
      41.168      142.600      140.004
      42.172      135.400      140.262
      43.176      147.300      140.262
      44.180      163.400      140.520
      45.184      158.200      141.035
      46.189      161.400      141.809
      47.193      119.500      141.809
      48.198      119.500      141.809
      49.201      105.600      141.809
      50.205      120.600      141.809
      51.208      132.600      142.066
      52.212      137.400      142.066
      53.217      134.200      142.324
      54.222      135.400      142.324
      55.226      138.400      142.324
      56.230      128.600      142.324
      57.234      127.500      142.324
      58.238      143.400      142.582
      59.242      136.500      142.840
      60.246      134.600      142.840
      61.249      145.400      142.840
      62.253      132.300      142.840
      63.259      128.400      142.840
      64.264      139.400      142.840
      65.268      138.400      142.840
      66.272      159.200      143.098
      67.278      160.200      143.098
      68.282      106.500      143.098
      69.287        0.000      143.098
      70.291        2.000      143.098
      71.295        0.000      143.098

rust (async): axum + tokio-postgres + bb8 (for the connection pool)

./target/release/async_rust

Summary:
  Success rate:	1.0000
  Total:	60.0030 secs
  Slowest:	0.3792 secs
  Fastest:	0.0758 secs
  Average:	0.1870 secs
  Requests/sec:	267.0533

  Total data:	58.45 MiB
  Size/request:	3.73 KiB
  Size/sec:	997.51 KiB

Response time histogram:
  0.103 [22]   |
  0.131 [145]  |
  0.159 [1753] |■■■■■■■
  0.186 [7096] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.214 [4543] |■■■■■■■■■■■■■■■■■■■■
  0.241 [1629] |■■■■■■■
  0.269 [545]  |■■
  0.296 [225]  |■
  0.324 [40]   |
  0.352 [17]   |
  0.379 [9]    |

Latency distribution:
  10% in 0.1566 secs
  25% in 0.1679 secs
  50% in 0.1820 secs
  75% in 0.2010 secs
  90% in 0.2249 secs
  95% in 0.2424 secs
  99% in 0.2794 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0353 secs, 0.0288 secs, 0.0377 secs
  DNS-lookup:	0.0000 secs, 0.0000 secs, 0.0001 secs

Status code distribution:
  [200] 16024 responses

perf log

# Elapsed time   CPU (%)     Real (MB)
       0.000        0.000       14.461
       1.004        0.000       14.461
       2.007        0.000       14.461
       3.010        4.000       14.461
       4.014       46.800       14.578
       5.018       50.800       14.578
       6.024       45.700       14.578
       7.029       38.800       14.578
       8.033       39.800       14.578
       9.037       35.900       14.578
      10.041       37.900       14.578
      11.045       37.800       14.578
      12.049       35.900       14.578
      13.053       39.800       14.578
      14.057       36.800       14.578
      15.061       38.900       14.578
      16.065       35.900       14.578
      17.068       37.900       14.578
      18.072       34.900       14.578
      19.076       36.900       14.586
      20.079       34.900       14.586
      21.083       36.900       14.586
      22.086       36.900       14.586
      23.090       37.800       14.586
      24.094       43.800       14.586
      25.098       35.900       14.586
      26.102       38.800       14.621
      27.106       50.800       14.621
      28.112       40.700       14.621
      29.117       41.800       14.621
      30.121       44.800       14.621
      31.125       38.800       14.621
      32.129       37.800       14.621
      33.133       35.900       14.621
      34.136       38.900       14.621
      35.140       40.900       14.621
      36.144       36.800       14.621
      37.148       30.900       14.629
      38.152       33.900       14.621
      39.156       35.900       14.621
      40.160       37.900       14.621
      41.163       36.900       14.621
      42.167       37.900       14.621
      43.171       33.900       14.621
      44.174       35.900       14.598
      45.178       39.900       14.598
      46.182       37.900       14.598
      47.185       37.900       14.598
      48.190       39.800       14.617
      49.193       37.900       14.617
      50.198       48.800       14.617
      51.202       52.800       14.617
      52.207       47.800       14.617
      53.212       47.800       14.625
      54.216       41.800       14.617
      55.219       37.900       14.617
      56.224       35.800       14.617
      57.227       37.900       14.617
      58.231       27.900       14.617
      59.235       30.900       14.617
      60.239       37.900       14.617
      61.242       36.900       14.617
      62.247       34.900       14.633
      63.250       26.900       14.633
      64.254        0.000       14.633

As an aside, the Rust compiler errors sometimes look like this:

error[E0277]: the trait bound `fn(Extension<Pool<PostgresConnectionManager<MakeTlsConnector>>>, CookieJar) -> impl Future<Output = Result<Json<[type error]>, (axum::http::StatusCode, std::string::String)>> {recipes_list}: Handler<_, _>` is not satisfied
   --> src/main.rs:48:39
    |
48  |         .route("/api/v1/recipes", get(recipes_list))
    |                                   --- ^^^^^^^^^^^^ the trait `Handler<_, _>` is not implemented for `fn(Extension<Pool<PostgresConnectionManager<MakeTlsConnector>>>, CookieJar) -> impl Future<Output = Result<Json<[type error]>, (axum::http::StatusCode, std::string::String)>> {recipes_list}`
    |                                   |
    |                                   required by a bound introduced by this call
    |
    = help: the trait `Handler<T, ReqBody>` is implemented for `axum::handler::Layered<S, T>`
note: required by a bound in `axum::routing::get`
   --> /Users/steve/.cargo/registry/src/github.com-1ecc6299db9ec823/axum-0.5.16/src/routing/method_routing.rs:395:1
    |
395 | top_level_handler_fn!(get, GET);
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `axum::routing::get`
    = note: this error originates in the macro `top_level_handler_fn` (in Nightly builds, run with -Z macro-backtrace for more info)

conclusion

name          RPS  peak CPU  peak memory
sync Python   58   48%       276 MB
async Python  495  160%      143 MB
async Rust    267  53%       15 MB

Switching from sync Python to async Python gives us roughly an 8.5x boost in RPS (58 -> 495) and cuts our memory usage nearly in half (276 MB -> 143 MB).

Compared to async Python, async Rust gives us roughly a 10x decrease in memory usage (143 MB -> 15 MB) and about a 2/3 decrease in peak CPU usage (160% -> 53%), but we get about 46% less RPS (495 -> 267).
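
For anyone double-checking those ratios, here’s the arithmetic, using the numbers from the table above:

# sanity check on the ratios quoted above
sync_rps, async_py_rps, async_rust_rps = 58, 495, 267
print(round(async_py_rps / sync_rps, 1))            # 8.5  (async Python vs sync Python RPS)
print(round(143 / 15, 1))                           # 9.5  (async Python vs async Rust memory)
print(round(1 - async_rust_rps / async_py_rps, 2))  # 0.46 (async Rust serves ~46% fewer RPS)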

Overall, async Python is an order of magnitude more efficient than sync Python, and async Rust is an order of magnitude more efficient than async Python.

I’m guessing there is a way to get Tokio to use more of the available CPU / memory, as there are a lot more resources available on the box.

Also the code is available on GitHub.