Testing Tenets

Some thoughts on testing I’ve been mulling over.

Style: No nesting

Makes search easier. With nesting you’ll have tests named it "success" do, and there will be a bunch of them.

Style: No before each

before_each and friends introduce globals – they’re often nullable so you have ! (or equivalent) sprinkled everywhere.

Style: Use factory functions for test data

This is mostly calling out pytest fixtures.

While JetBrains’ IDEs and VSCode (via Pylance) support navigating to the fixture definitions from their usages, fixtures in general are globals that are hard to maintain as projects get larger.

Much easier to have factory functions, e.g., make_user(name: str) -> User. But they can also have issues.
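A minimal sketch of what such a factory might look like. The User dataclass, its fields, and the defaults are all hypothetical, picked only for illustration:

```python
import itertools
from dataclasses import dataclass


# Hypothetical model, not taken from any particular codebase.
@dataclass
class User:
    id: int
    name: str
    email: str
    is_admin: bool = False


_ids = itertools.count(1)


def make_user(name: str = "alice", *, is_admin: bool = False) -> User:
    """Build a User with sensible defaults; every call gets a fresh id and email."""
    user_id = next(_ids)
    return User(
        id=user_id,
        name=name,
        email=f"{name}-{user_id}@example.com",
        is_admin=is_admin,
    )


def test_admin_flag_defaults_to_false():
    # The test states only what it cares about; everything else is defaulted.
    user = make_user("bob")
    assert user.name == "bob"
    assert not user.is_admin
```

Because the factory is a plain function, go-to-definition works everywhere and each test can see exactly which arguments it overrides.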

Style: suffix test files with `_test.`, `.test.`, `.spec.` instead of prefixing them

I like that foo_test.py sorts next to foo.py; test_foo.py doesn’t.

Style: Tests exist next to the code they’re testing

Different ecosystems have different approaches around this, but having the tests close to the code they’re testing makes it easier to gauge coverage at a glance.

DX: Editor Support

Running a one-off test via your editor of choice should be easy.

Debugging and assertion diffs should also be easy.

Vitest has a good VSCode integration.

DX: Watch mode

There should be an option to run tests in watch mode and rerun on file changes. Vitest does this well.

Correctness: No order dependencies

Ensures you can run a one-off test and believe its results.

Also makes sharding at the test level easier.

See Minitest’s i_suck_and_my_tests_are_order_dependent! and related discussion.
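One way to check for order dependencies is to run the tests in a seeded random order. This is a rough sketch, not how any particular runner does it; run_shuffled and the two sample tests are made up for illustration:

```python
import random
from typing import Callable


def test_append():
    items = []  # each test builds its own state instead of sharing it
    items.append(1)
    assert items == [1]


def test_empty():
    items = []
    assert len(items) == 0


def run_shuffled(tests: list[Callable[[], None]], seed: int) -> list[str]:
    """Run tests in a seeded random order and return the order that was used."""
    order = list(tests)
    random.Random(seed).shuffle(order)
    for test in order:
        test()
    return [test.__name__ for test in order]
```

Keeping the seed around means a failing order can be replayed exactly.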

Correctness: Tests actually commit transactions

If your project involves database transactions you’ll want to actually commit them in your tests. Django doesn’t do this by default, and it bites you when you have more complicated queries and setups. This also means you shouldn’t mock your database.
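To see why committing matters, here is a small sketch using the standard library's sqlite3 module as a stand-in for a real database. The table name and helper functions are assumptions for the example. A write that hasn't been committed is invisible to every other connection, so anything that reads through a second connection behaves differently from production:

```python
import os
import sqlite3
import tempfile


def visible_rows(path: str) -> int:
    """Count rows as seen from a brand-new connection."""
    reader = sqlite3.connect(path)
    try:
        return reader.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    finally:
        reader.close()


def rows_before_and_after_commit() -> tuple[int, int]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.db")
        writer = sqlite3.connect(path)
        try:
            writer.execute("CREATE TABLE users (name TEXT)")
            writer.commit()
            # sqlite3 implicitly opens a transaction for this INSERT.
            writer.execute("INSERT INTO users VALUES ('alice')")
            before = visible_rows(path)  # other connections can't see the row yet
            writer.commit()
            after = visible_rows(path)  # now they can
        finally:
            writer.close()
        return before, after
```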

Correctness: Easy way to mock common components

If your tests use S3, SQS, etc. there should be an easy and robust way to mock them.
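One possible shape for this is a small in-memory fake behind an interface. The ObjectStore protocol, InMemoryObjectStore, and save_report below are hypothetical names for the sketch, not part of any AWS SDK:

```python
from typing import Protocol


class ObjectStore(Protocol):
    """Hypothetical interface the application codes against."""

    def put(self, key: str, body: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryObjectStore:
    """Test double that keeps objects in a dict instead of calling S3."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> None:
        self._objects[key] = body

    def get(self, key: str) -> bytes:
        return self._objects[key]


def save_report(store: ObjectStore, report_id: str, text: str) -> str:
    """Application code under test; it only knows about the interface."""
    key = f"reports/{report_id}.txt"
    store.put(key, text.encode("utf-8"))
    return key


def test_save_report_round_trips():
    store = InMemoryObjectStore()
    key = save_report(store, "42", "all good")
    assert store.get(key) == b"all good"
```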

Perf: Don’t clean up after each test

Dropping the database, truncating tables, clearing directories, etc. are too slow to run after each test.

Let the test data stick around. Tests should be robust enough to not require a clean slate.
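A sketch of what "robust enough" could mean in practice: each test creates data under a unique identifier and asserts only on rows it owns. The members table and the helper names are assumptions for illustration, with an in-memory SQLite database standing in for the real one:

```python
import sqlite3
import uuid


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE members (org TEXT, name TEXT)")
    # Simulate leftovers from earlier tests that were never cleaned up.
    conn.execute("INSERT INTO members VALUES ('old-org', 'leftover')")
    return conn


def make_org_name() -> str:
    """Unique per call, so tests never collide on data."""
    return f"org-{uuid.uuid4().hex}"


def count_members(conn: sqlite3.Connection, org: str) -> int:
    row = conn.execute("SELECT COUNT(*) FROM members WHERE org = ?", (org,)).fetchone()
    return row[0]


def check_adding_a_member(conn: sqlite3.Connection) -> None:
    org = make_org_name()
    conn.execute("INSERT INTO members VALUES (?, ?)", (org, "alice"))
    # Scoped to this test's org, not the whole table.
    assert count_members(conn, org) == 1
```

Asserting on SELECT COUNT(*) over the whole table would break as soon as any other test left a row behind.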

Perf: All tests run in parallel

This requires tests to be okay with other data existing in the database, which seems like a fair trade-off: production will have a bunch of preexisting data anyways.

Flakiness might be a reason to avoid running everything in parallel, but that’s an issue that can be handled on its own.

Also the test runner should take advantage of multiple cores!

Perf: Sharding

Shard at the test level, not file/module.

Sharding also enables spreading tests across VMs in CI.
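A minimal sketch of test-level sharding, assuming each test has a stable string ID (the IDs below are made up). A stable hash assigns every test to exactly one shard, regardless of which file it lives in:

```python
import hashlib


def shard_for(test_id: str, total_shards: int) -> int:
    """Map a test ID to a shard using a hash that is stable across processes."""
    # hashlib rather than hash(): Python's built-in str hash is randomized per process.
    digest = hashlib.sha256(test_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % total_shards


def select_tests(test_ids: list[str], shard_index: int, total_shards: int) -> list[str]:
    """Return the tests that this shard (for example, one CI VM) should run."""
    return [t for t in test_ids if shard_for(t, total_shards) == shard_index]
```

Hashing spreads tests evenly by count, not by duration. A runner with timing data could balance shards by runtime instead.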

Perf: Asyncio support

Somewhat Python specific, but async support should be built in.
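As a reference point, the standard library's unittest ships unittest.IsolatedAsyncioTestCase, which lets test methods be coroutines without any plugin. The fetch_greeting function is a made-up stand-in for real async code:

```python
import asyncio
import unittest


async def fetch_greeting(name: str) -> str:
    """Hypothetical async function under test."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return f"hello {name}"


class GreetingTest(unittest.IsolatedAsyncioTestCase):
    async def test_fetch_greeting(self):
        # The test method is itself a coroutine; the runner manages the event loop.
        self.assertEqual(await fetch_greeting("bob"), "hello bob")
```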

Perf: Purity

By default, any Sentry errors, warnings, or unmocked network requests generated in a test should fail the test.

Perf: Limits

Each test should have a time limit, which should be adjustable on a one-off basis.

Perf: Flakiness detection and quarantining

Requires integration into CI, but test flakiness should be automatically detected and the tests should be disabled until they’re fixed.
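The detection half can be sketched as "rerun the test and see whether the outcomes disagree". The classify and quarantine helpers are hypothetical. A real setup would more likely aggregate results across CI runs than rerun in-process:

```python
import itertools
from typing import Callable


def classify(test: Callable[[], None], runs: int = 6) -> str:
    """Run a test several times; mixed outcomes mean it is flaky."""
    outcomes = set()
    for _ in range(runs):
        try:
            test()
            outcomes.add("pass")
        except AssertionError:
            outcomes.add("fail")
    if len(outcomes) == 2:
        return "flaky"
    return "pass" if outcomes == {"pass"} else "fail"


def quarantine(tests: list[Callable[[], None]], runs: int = 6) -> set[str]:
    """Names of tests that should be disabled until someone fixes them."""
    return {t.__name__ for t in tests if classify(t, runs) == "flaky"}


_calls = itertools.count()


def test_sometimes_fails():
    # Deterministic stand-in for a flaky test: fails on every other call.
    assert next(_calls) % 2 == 0


def test_always_passes():
    assert True
```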

Perf: Easy performance info

It should be easy to find slow tests.
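pytest's --durations=N flag already reports the N slowest tests. The underlying idea is small enough to sketch by hand. The slowest helper and the sample tests below are made up for illustration:

```python
import time
from typing import Callable


def slowest(tests: list[Callable[[], None]], top: int = 3) -> list[tuple[float, str]]:
    """Time each test and return the `top` slowest as (seconds, name) pairs."""
    timings = []
    for test in tests:
        start = time.perf_counter()
        test()
        timings.append((time.perf_counter() - start, test.__name__))
    return sorted(timings, reverse=True)[:top]


def test_quick():
    assert sum(range(10)) == 45


def test_sluggish():
    time.sleep(0.05)  # stand-in for slow I/O
```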

Perf: Speed

Tests should be fast to run.