Three CI Optimizations That Cut Python Test Execution by 81%
Trail of Bits cut PyPI's test suite from 163s to 30s. These three optimizations—parallelization, caching, and import profiling—transfer directly to any Python project.
A slow CI pipeline is a tax on every developer on your team. Each wasted minute multiplied by every commit adds up to hours of lost productivity per week — and that’s before you account for the cost of GitHub Actions runner minutes.
In May 2025, Trail of Bits published a case study showing how they reduced PyPI’s Warehouse test suite from 163 seconds to 30 seconds — an 81% improvement — while test count grew from 3,900 to 4,700+ [1]. No tests were removed, no coverage sacrificed.
This post breaks down the three most impactful, broadly applicable optimizations from that work, with code you can copy into your own pyproject.toml and CI config today.
Optimization 1: Parallelize with pytest-xdist (67% reduction)
The single biggest gain: adding --numprocesses=auto to pytest.
# pyproject.toml
[tool.pytest.ini_options]
addopts = [
    "--disable-socket",
    "--durations=20",
    "--numprocesses=auto",  # added: pytest-xdist picks the worker count from the available CPUs
]
Result: 191s → 63s on a 32-core GCP machine [1].
This works because the tests in most suites are independent of one another, and many spend a large share of their time waiting on databases, filesystems, and network calls. pytest-xdist distributes test execution across worker processes on the available CPUs, usually without any change to the tests themselves.
Database isolation
The one gotcha: parallel workers can’t share the same database. PyPI fixed this by appending the worker_id to each worker’s database name:
# conftest.py
import os
worker_id = os.environ.get("PYTEST_XDIST_WORKER", "master")
pg_db = f"tests-{worker_id}"
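A slightly fuller sketch of the same idea, assuming the database name is built in one helper. The function name worker_database_name and the base prefix are illustrative and not taken from Warehouse's code. PYTEST_XDIST_WORKER is the environment variable pytest-xdist sets inside each worker (gw0, gw1, and so on); it is absent when tests run without xdist.

```python
import os
from typing import Mapping, Optional


def worker_database_name(
    base: str = "tests", environ: Optional[Mapping[str, str]] = None
) -> str:
    """Return a database name that is unique per pytest-xdist worker.

    `environ` is injectable so the helper can be exercised without
    touching the real process environment (an illustrative choice).
    """
    env = os.environ if environ is None else environ
    # Falls back to "master" when the suite is run without xdist.
    worker_id = env.get("PYTEST_XDIST_WORKER", "master")
    return f"{base}-{worker_id}"
```

Each worker then creates and drops its own database, so no two workers write to the same tables.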
Coverage in parallel mode
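Coverage reporting from parallel workers needs a per-worker startup hook. Note that coverage.process_startup() does nothing unless the COVERAGE_PROCESS_START environment variable points at your coverage configuration file:
# sitecustomize.py
try:
    import coverage
    coverage.process_startup()
except ImportError:
    pass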
Choosing the right worker count
| Workers | CPU-bound suite | I/O-bound suite |
|---|---|---|
| --numprocesses=auto (all cores) | Risk of thrashing | Best performance |
| --numprocesses=4 | Balanced | Underutilized |
| --numprocesses=logical | Good for 8+ core CI | Good for everything |
For GitHub-hosted runners, --numprocesses=auto resolves to the runner's core count, which is small on the standard runners (2 to 4 cores, depending on runner type). For self-hosted runners, match the core count.
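Trail of Bits measured a 67% reduction (191s → 63s) on a 32-core GCP n2-highcpu-32 machine [1]. That is roughly a 3x speedup on 32 cores, so do not expect gains to scale linearly with core count: test collection, worker startup, and the slowest individual tests typically put a floor on wall-clock time.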
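As a rough check on how parallel speedup relates to core count, the sketch below computes speedup and per-worker efficiency from the 191s → 63s figures, then uses Amdahl's law to estimate the share of the run that behaved as if it were serial. Amdahl's law is a simplifying model brought in here for illustration and is not part of the Trail of Bits analysis. The function names are hypothetical, and the worker count of 32 assumes one worker per core.

```python
def speedup(before_s: float, after_s: float) -> float:
    """How many times faster the run became."""
    return before_s / after_s


def parallel_efficiency(before_s: float, after_s: float, workers: int) -> float:
    """Speedup divided by worker count; 1.0 would be perfectly linear scaling."""
    return speedup(before_s, after_s) / workers


def amdahl_serial_fraction(before_s: float, after_s: float, workers: int) -> float:
    """Serial fraction s that solves Amdahl's law, 1/S = s + (1 - s) / N."""
    if workers < 2:
        raise ValueError("need at least 2 workers to estimate a serial fraction")
    inverse_speedup = after_s / before_s
    return (inverse_speedup - 1 / workers) / (1 - 1 / workers)
```

With before_s=191, after_s=63, and workers=32, this gives a speedup of about 3.03, an efficiency of about 9.5%, and an implied serial fraction of about 0.31. Under this model, adding more cores beyond a certain point buys very little.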
Optimization 2: Cache dependencies with content-hash keys (up to 61% reduction)
The cache: pip option built into actions/setup-python@v5 is a good start, but Adam Johnson’s pattern of caching the entire virtual environment yields bigger wins [2]:
- uses: actions/setup-python@v5
  id: setup_python
  with:
    python-version: '3.12'
- name: Restore cached virtualenv
  id: cache-venv
  uses: actions/cache@v4
  with:
    path: .venv
    key: venv-${{ runner.os }}-${{ steps.setup_python.outputs.python-version }}-${{ hashFiles('requirements/*.txt', 'pyproject.toml') }}
- name: Install dependencies
  if: steps.cache-venv.outputs.cache-hit != 'true'
  run: |
    python -m venv .venv
    source .venv/bin/activate
    pip install -e ".[dev,test]"
The key insight: hash against lock files or pinned requirements, not loose version ranges. A pyproject.toml change triggers a cache miss, but an unrelated source commit hits the cache.
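To see why a content-hash key behaves this way, here is a minimal sketch that derives a key from the bytes of the dependency files. It mimics the shape of the workflow key above but is not GitHub's hashFiles implementation; venv_cache_key and its parameters are hypothetical names.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def venv_cache_key(
    os_name: str, python_version: str, dependency_files: Iterable[Path]
) -> str:
    """Build a cache key that changes only when dependency files change."""
    digest = hashlib.sha256()
    # Sort so the key does not depend on the order the files are listed in.
    for path in sorted(Path(p) for p in dependency_files):
        digest.update(path.name.encode("utf-8"))
        digest.update(b"\0")  # separator between file name and contents
        digest.update(path.read_bytes())
        digest.update(b"\0")  # separator between files
    return f"venv-{os_name}-{python_version}-{digest.hexdigest()}"
```

Because source files are never fed into the digest, editing application code leaves the key unchanged, while any byte-level change to a listed requirements file produces a new key and therefore a cache miss.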
What to cache
| Cache target | Size | Hit speedup | Best for |
|---|---|---|---|
| ~/.cache/pip | ~200MB | Saves download, not install | Simple projects |
| .venv/ (full venv) | ~500MB | Saves download + install (1-3 min) | Large dependency trees |
| pip freeze > deps.txt snapshot | ~2KB | Quick comparison only | Debugging |
The full-venv caching approach reduced install time by 39% vs. the pip-only cache and 61% vs. no cache in benchmarks documented by Adam Johnson and Simon Willison [2][3]. For projects with 50+ dependencies (Django, FastAPI, scientific Python), this is the difference between a 4-minute install and a 45-second restore.
Poisoned cache prevention
Cache poisoning happens when pip partially fails mid-install and leaves an inconsistent .venv. Mitigate with:
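- name: Validate cache integrity
  if: steps.cache-venv.outputs.cache-hit == 'true'
  # MAIN_PACKAGE must be set in the job's env; rebuild the venv if any import fails
  run: |
    .venv/bin/python -c "import pytest, mypy, $MAIN_PACKAGE" 2>/dev/null \
      || { echo "Cache invalid, reinstalling"; rm -rf .venv; python -m venv .venv; .venv/bin/pip install -e ".[dev,test]"; }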
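The same integrity check can be written as a small Python script, which reports exactly which imports failed and signals the result through its return code. This is a sketch: failed_imports and main are hypothetical helpers, and the module list you pass in is an assumption about what your tests need.

```python
import importlib
from typing import List, Sequence


def failed_imports(module_names: Sequence[str]) -> List[str]:
    """Return the names of modules that cannot be imported."""
    failures = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception:
            # A half-installed package can raise more than ImportError.
            failures.append(name)
    return failures


def main(argv: Sequence[str]) -> int:
    """Return 0 if every named module imports, 1 otherwise."""
    broken = failed_imports(argv)
    if broken:
        print("Cache invalid, cannot import: " + ", ".join(broken))
        return 1
    return 0
```

Saved as a file that ends with sys.exit(main(sys.argv[1:])) and run with .venv/bin/python, a non-zero exit code can trigger a rebuild of the virtual environment in the workflow.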
Optimization 3: Cut startup overhead with testpaths + import profiling (5.4% combined)
Small wins compound. Two changes cost < 5 minutes to implement:
testpaths: 50s → 48s (2s saved)
If all your tests live in one directory, tell pytest to look only there:
[tool.pytest.ini_options]
testpaths = ["tests/"]
This cut collection time from 7.84s to 2.60s — a 66% reduction in discovery overhead [1].
Import profiling — find hidden startup costs
PyPI discovered that ddtrace (a production monitoring import) was adding 1.19s to every pytest --help invocation, even though it was never called in tests:
python -X importtime -c "import ddtrace" 2> import-profile.log
# visualize the log with tuna (pip install tuna)
tuna import-profile.log
Removing ddtrace from the test dependency group saved 3.4% of remaining test time [1].
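If you prefer a quick textual summary to a visualization, the log can be parsed directly. The sketch below assumes the line layout that python -X importtime writes to stderr (a header row followed by rows of self time, cumulative time, and module name, separated by pipes, in microseconds); parse_importtime and slowest_imports are hypothetical helper names.

```python
from typing import List, Tuple

PREFIX = "import time:"


def parse_importtime(log_text: str) -> List[Tuple[str, int, int]]:
    """Parse -X importtime output into (module, self_us, cumulative_us) rows."""
    rows = []
    for line in log_text.splitlines():
        if not line.startswith(PREFIX):
            continue  # anything else written to stderr is ignored
        parts = line[len(PREFIX):].split("|")
        if len(parts) != 3:
            continue
        self_us, cumulative_us, name = (part.strip() for part in parts)
        if not (self_us.isdigit() and cumulative_us.isdigit()):
            continue  # skips the header row
        rows.append((name, int(self_us), int(cumulative_us)))
    return rows


def slowest_imports(
    rows: List[Tuple[str, int, int]], limit: int = 10
) -> List[Tuple[str, int, int]]:
    """Return the rows with the largest cumulative import time first."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:limit]
```

Sorting by cumulative time surfaces top-level packages whose whole import tree is expensive, which is the pattern the ddtrace finding illustrates.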
Tools for import profiling:
| Tool | Command | Output |
|---|---|---|
| python -X importtime | python -X importtime -c "import your_package" | Per-module timing to stderr |
| tuna | tuna import-profile.log | Interactive flame graph |
| import-time CLI | pip install import-time && import-time your_package | Summary table |
Run this once per quarter to catch dependency bloat. Dependencies added for a single function call in a rarely-used submodule pull in their own transitive deps, and that tax is paid on every pytest startup.
How to apply this
These three optimizations take about 2 hours total to implement and test:
Week 1 — Parallelization (1 hour)
- pip install pytest-xdist
- Add --numprocesses=auto to addopts in pyproject.toml
- Run tests locally with pytest -n auto; if database fixtures clash, add worker_id isolation
- If using coverage, add sitecustomize.py with coverage.process_startup()
- Merge, run CI, verify flaky count didn’t increase
Week 1 — Caching (30 minutes)
- Switch from cache: pip to full venv cache using the workflow above
- Set up hash keys against requirements/*.txt or pyproject.toml
- Add the integrity validation step
- Merge, verify first run is a cache miss (full install), second is a hit
Week 2 — Startup profiling (30 minutes)
- Run python -X importtime -c "import your_package" 2> profile.log
- Open profile.log with tuna to identify outliers
- Move large dependencies that the tests never use out of the test dependency group, for example into a separate optional extra
- Add testpaths config
Key takeaways
- pytest-xdist with --numprocesses=auto is the single highest-impact change. Trail of Bits measured roughly 3x (191s → 63s) on 32 cores [1]; your gain depends on core count and on how independent your tests are.
- Full venv caching with content-hash keys beats cache: pip by ~39% on large projects and takes only a few workflow lines to set up [2].
- Import profiling catches bloat that nobody notices: production dependencies leak into test suites silently.
- The optimizations stack: Trail of Bits reached an 81% total reduction without removing tests or losing coverage [1].
- Do the cheap things first: testpaths is a one-line config change that saves ~2s. Not exciting, but 5 minutes of work for perpetual returns.
References
[1] Trail of Bits, “Making PyPI’s test suite 81% faster,” May 2025. https://blog.trailofbits.com/2025/05/01/making-pypis-test-suite-81-faster/
[2] Adam Johnson, “GitHub Actions: Faster Python runs with cached virtual environments,” Nov 2023. https://adamj.eu/tech/2023/11/02/github-actions-faster-python-virtual-environments/
[3] Simon Willison, “GitHub Actions: Faster Python runs with cached virtual environments,” Jul 2024. https://simonwillison.net/2024/Jul/19/github-actions-faster-python/