Three CI Optimizations That Cut Python Test Execution by 81%

Trail of Bits cut PyPI's test suite from 163s to 30s. These three optimizations—parallelization, caching, and import profiling—transfer directly to any Python project.

A slow CI pipeline is a tax on every developer on your team. Each wasted minute multiplied by every commit adds up to hours of lost productivity per week — and that’s before you account for the cost of GitHub Actions runner minutes.

In May 2025, Trail of Bits published a case study showing how they reduced PyPI’s Warehouse test suite from 163 seconds to 30 seconds — an 81% improvement — while test count grew from 3,900 to 4,700+ [1]. No tests were removed, no coverage sacrificed.

This post breaks down the three most impactful, broadly applicable optimizations from that work, with code you can copy into your own pyproject.toml and CI config today.


Optimization 1: Parallelize with pytest-xdist (67% reduction)

The single biggest gain: adding --numprocesses=auto to pytest.

# pyproject.toml
[tool.pytest.ini_options]
addopts = [
  "--disable-socket",
  "--durations=20",
  "--numprocesses=auto",  # the added line; requires the pytest-xdist plugin
]

Result: 191s → 63s on a 32-core GCP machine [1].

This works because most test suites are I/O-bound: tests spend time waiting on databases, filesystems, and network calls. pytest-xdist fans test execution out across the available CPUs, usually without changes to the tests themselves.

Database isolation

The one gotcha: parallel workers can’t share the same database. PyPI fixed this by appending the worker_id to each worker’s database name:

# conftest.py
import os
worker_id = os.environ.get("PYTEST_XDIST_WORKER", "master")
pg_db = f"tests-{worker_id}"
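To make the isolation idea concrete, here is a minimal sketch of deriving one database name per worker. The helper name worker_database_name and the base name "tests" are illustrative assumptions, not code from Warehouse; only the PYTEST_XDIST_WORKER variable comes from pytest-xdist.

```python
import os


def worker_database_name(base: str = "tests", environ=None) -> str:
    """Return a database name unique to the current pytest-xdist worker.

    pytest-xdist sets PYTEST_XDIST_WORKER to values such as "gw0" or "gw1"
    inside each worker process. The variable is absent in a non-parallel
    run, so we fall back to "master", mirroring the snippet above.
    """
    env = os.environ if environ is None else environ
    worker_id = env.get("PYTEST_XDIST_WORKER", "master")
    return f"{base}-{worker_id}"


if __name__ == "__main__":
    # Simulate two parallel workers and one plain, non-parallel run.
    for env in ({"PYTEST_XDIST_WORKER": "gw0"}, {"PYTEST_XDIST_WORKER": "gw1"}, {}):
        print(worker_database_name(environ=env))
```

Because every worker gets a distinct name, a fixture that creates and drops its database can run in all workers at once without one worker deleting another's tables.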

Coverage in parallel mode

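Coverage reporting from parallel workers needs a per-worker startup hook. Note that coverage.process_startup() only starts measurement when the COVERAGE_PROCESS_START environment variable points at your coverage configuration file: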

# sitecustomize.py
try:
    import coverage
    coverage.process_startup()
except ImportError:
    pass

Choosing the right worker count

| Workers | CPU-bound suite | I/O-bound suite |
|---|---|---|
| auto (all cores) | Risk of thrashing | Best performance |
| --numprocesses=4 | Balanced | Underutilized |
| --numprocesses=logical | Good for 8+ core CI | Good for everything |

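On a 2-core GitHub-hosted runner, --numprocesses=auto resolves to 2 workers. On self-hosted runners it follows the machine's core count, so there is usually no need to hard-code a number. Note that --numprocesses=logical needs the psutil package installed; without it, pytest-xdist falls back to the auto behavior.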

Trail of Bits measured a 67% reduction (191s → 63s) on a 32-core GCP n2-highcpu-32 machine [1]. That is roughly a 3x speedup, so expect gains to grow with core count but to fall well short of linear scaling.
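The measurement above can be turned into a speedup and a per-worker efficiency figure, which is a handy sanity check before paying for a bigger runner. This is a small worked calculation, assuming one worker per core (32 workers); the function names are made up for illustration.

```python
def speedup(before_s: float, after_s: float) -> float:
    """How many times faster the suite became."""
    return before_s / after_s


def reduction(before_s: float, after_s: float) -> float:
    """Fraction of wall-clock time removed (0.5 means half the time)."""
    return 1 - after_s / before_s


def parallel_efficiency(before_s: float, after_s: float, workers: int) -> float:
    """Speedup divided by worker count; 1.0 would be perfect linear scaling."""
    return speedup(before_s, after_s) / workers


if __name__ == "__main__":
    before, after, workers = 191, 63, 32  # figures from [1]; worker count assumed
    print(f"speedup:    {speedup(before, after):.2f}x")
    print(f"reduction:  {reduction(before, after):.1%}")
    print(f"efficiency: {parallel_efficiency(before, after, workers):.1%}")
```

With these inputs the speedup is about 3x and the per-worker efficiency is under 10%, which suggests that fixed costs such as collection and database setup dominate long before all cores are busy.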


Optimization 2: Cache dependencies with content-hash keys (up to 61% reduction)

The built-in cache: pip option of actions/setup-python@v5 is a good start, but Adam Johnson’s pattern of caching the entire virtual environment yields bigger wins [2]:

- uses: actions/setup-python@v5
  id: setup_python
  with:
    python-version: '3.12'

- name: Restore cached virtualenv
  id: cache-venv
  uses: actions/cache@v4
  with:
    path: .venv
    key: venv-${{ runner.os }}-${{ steps.setup_python.outputs.python-version }}-${{ hashFiles('requirements/*.txt', 'pyproject.toml') }}

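- name: Install dependencies
  if: steps.cache-venv.outputs.cache-hit != 'true'
  run: |
    python -m venv .venv
    source .venv/bin/activate
    pip install -e ".[dev,test]"

# Activation does not persist between steps, so expose the venv to later steps.
- name: Add virtualenv to PATH
  run: echo "$PWD/.venv/bin" >> "$GITHUB_PATH"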

The key insight: hash against lock files or pinned requirements, not loose version ranges. A pyproject.toml change triggers a cache miss, but an unrelated source commit hits the cache.
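To see why a content-hash key behaves this way, here is a minimal Python sketch of the idea. It is an illustration only: GitHub's hashFiles() has its own implementation, and the function name dependency_cache_key and the 16-character truncation are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def dependency_cache_key(prefix: str, files: list[Path]) -> str:
    """Build a cache key from the contents of the given dependency files.

    Only the listed files feed the hash, so edits to any other file in the
    repository leave the key unchanged.
    """
    digest = hashlib.sha256()
    for path in sorted(files):  # sort so the key does not depend on argument order
        digest.update(path.name.encode())
        digest.update(path.read_bytes())
    return f"{prefix}-{digest.hexdigest()[:16]}"


if __name__ == "__main__":
    candidates = [Path("pyproject.toml"), *Path("requirements").glob("*.txt")]
    existing = [p for p in candidates if p.is_file()]
    print(dependency_cache_key("venv", existing))
```

Run it before and after editing a source file and the key stays the same; change a pinned version in a requirements file and the key changes, which is what forces a fresh install.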

What to cache

| Cache target | Size | Hit speedup | Best for |
|---|---|---|---|
| ~/.cache/pip | ~200MB | Saves download, not install | Simple projects |
| .venv/ (full venv) | ~500MB | Saves download + install (1-3 min) | Large dependency trees |
| pip freeze > deps.txt snapshot | ~2KB | Quick comparison only | Debugging |

The full-venv caching approach reduced install time by 39% vs. the pip-only cache and 61% vs. no cache in benchmarks documented by Adam Johnson and Simon Willison [2][3]. For projects with 50+ dependencies (Django, FastAPI, scientific Python), this is the difference between a 4-minute install and a 45-second restore.

Poisoned cache prevention

Cache poisoning happens when pip partially fails mid-install and leaves an inconsistent .venv. Mitigate with:

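- name: Validate cache integrity
  if: steps.cache-venv.outputs.cache-hit == 'true'
  # MAIN_PACKAGE is assumed to be set in the job's env to your package's import name
  run: |
    .venv/bin/python -c "import pytest, mypy, $MAIN_PACKAGE" 2>/dev/null ||
      (rm -rf .venv && python -m venv .venv && .venv/bin/pip install -e ".[dev,test]")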
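If the shell one-liner gets unwieldy, the same check can live in a small Python script that tries each import and reports exactly which ones failed. This is a sketch under assumptions: the function name missing_modules and the idea of passing module names on the command line are illustrative, not part of any cited workflow.

```python
import importlib
import sys


def missing_modules(names):
    """Return the subset of `names` that fail to import in this interpreter."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except Exception:  # a half-installed package can fail in many ways
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Example: .venv/bin/python check_venv.py pytest mypy your_package
    broken = missing_modules(sys.argv[1:])
    if broken:
        print("Cache invalid, missing:", ", ".join(broken))
        sys.exit(1)
    print("Cache OK")
```

A non-zero exit code lets the workflow decide what to do next, for example deleting .venv and reinstalling, instead of silently continuing with a broken environment.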

Optimization 3: Cut startup overhead with testpaths + import profiling (5.4% combined)

Small wins compound. These two changes each take only minutes to implement:

testpaths: 2s saved (50s → 48s)

If all your tests live in one directory, tell pytest to look only there:

[tool.pytest.ini_options]
testpaths = ["tests/"]

This cut collection time from 7.84s to 2.60s — a 66% reduction in discovery overhead [1].

Import profiling — find hidden startup costs

PyPI discovered that ddtrace (a production monitoring import) was adding 1.19s to every pytest --help invocation, even though it was never called in tests:

python -X importtime -c "import ddtrace" 2> import-profile.log

# visualize the log with tuna (pip install tuna)
tuna import-profile.log

Removing ddtrace from the test dependency group saved 3.4% of remaining test time [1].
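If you would rather rank imports in a script than open a viewer, the -X importtime log is simple enough to parse directly. The sketch below assumes the log format CPython writes to stderr ("import time: self | cumulative | package", in microseconds); the function names and the sample numbers are invented for illustration and are not measurements from PyPI.

```python
import re

# Matches data rows such as "import time:       120 |        120 |   pkg.fast".
# The header row has no digits in the first column, so it is skipped.
_ROW = re.compile(r"^import time:\s+(\d+)\s+\|\s+(\d+)\s+\|\s*(\S+)\s*$")


def parse_importtime(log_text):
    """Return (module, self_us, cumulative_us) tuples from an importtime log."""
    rows = []
    for line in log_text.splitlines():
        match = _ROW.match(line)
        if match is None:  # header or unrelated stderr output
            continue
        self_us, cumulative_us, module = match.groups()
        rows.append((module, int(self_us), int(cumulative_us)))
    return rows


def slowest(rows, n=5):
    """Return the n rows with the largest cumulative import time."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]


# Made-up sample log, only to demonstrate the format.
SAMPLE = """import time: self [us] | cumulative | imported package
import time:       120 |        120 |   pkg.fast
import time:      1000 |       1190 | pkg
import time:        50 |         50 | other
"""

if __name__ == "__main__":
    for module, self_us, cumulative_us in slowest(parse_importtime(SAMPLE), n=2):
        print(f"{cumulative_us / 1000:8.2f} ms  {module}")
```

Cumulative time includes everything a module pulls in, so sorting on it surfaces the top-level import that is worth questioning first.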

Tools for import profiling:

| Tool | Command | Output |
|---|---|---|
| python -X importtime | python -X importtime -c "import your_package" | Per-module timing to stderr |
| tuna | tuna import-profile.log | Interactive flame graph |
| import-time CLI | pip install import-time && import-time your_package | Summary table |

Run this once per quarter to catch dependency bloat. Dependencies added for a single function call in a rarely-used submodule pull in their own transitive deps, and that tax is paid on every pytest startup.


How to apply this

These three optimizations take about 2 hours total to implement and test:

Week 1 — Parallelization (1 hour)

  1. pip install pytest-xdist
  2. Add --numprocesses=auto to addopts in pyproject.toml
  3. Run tests locally with pytest -n auto — if database fixtures clash, add worker_id isolation
  4. If using coverage, add sitecustomize.py with coverage.process_startup()
  5. Merge, run CI, verify flaky count didn’t increase

Week 1 — Caching (30 minutes)

  1. Switch from cache: pip to full venv cache using the workflow above
  2. Set up hash keys against requirements/*.txt or pyproject.toml
  3. Add the integrity validation step
  4. Merge, verify first run is a cache miss (full install), second is a hit

Week 2 — Startup profiling (30 minutes)

  1. Run python -X importtime -c "import your_package" 2> profile.log
  2. Open profile.log in tuna to identify outliers
  3. Move large dependencies that the tests never use out of the test dependency group, for example into a separate optional extra
  4. Add testpaths config

Key takeaways

  • pytest-xdist with --numprocesses=auto is the single highest-impact change. Trail of Bits saw roughly 3x (191s → 63s) on 32 cores [1]; your speedup depends on core count and how I/O-bound the suite is.
  • Full venv caching with content-hash keys beats cache: pip by ~39% on large projects, and costs nothing to implement [2].
  • Import profiling catches bloat that nobody notices — production dependencies leak into test suites silently.
  • The three optimizations stack — Trail of Bits proved 81% total reduction with zero test modifications or coverage loss [1].
  • Do the cheap things first: testpaths is a one-line config change that saves ~2s. Not exciting, but 5 minutes of work for perpetual returns.

References

[1] Trail of Bits, “Making PyPI’s test suite 81% faster,” May 2025. https://blog.trailofbits.com/2025/05/01/making-pypis-test-suite-81-faster/

[2] Adam Johnson, “GitHub Actions: Faster Python runs with cached virtual environments,” Nov 2023. https://adamj.eu/tech/2023/11/02/github-actions-faster-python-virtual-environments/

[3] Simon Willison, “GitHub Actions: Faster Python runs with cached virtual environments,” Jul 2024. https://simonwillison.net/2024/Jul/19/github-actions-faster-python/
