Testing strategy for boutique software studios: TDD as house policy

Why a 2-person studio writes more tests than most 20-person agencies — and how disciplined testing makes solo and small-team builders ship faster, not slower.

Most small software studios don’t write tests. The reasoning is usually: we’re too small, our clients won’t pay for it, we’ll do TDD when we’re bigger.

We do the opposite. As a 2-person boutique studio shipping multiple production systems in parallel, disciplined testing is what makes the pace sustainable. This post explains why and how.

The economics of testing for a 2-person studio

The standard objection to testing in a small studio: “we don’t have time.”

The standard reality: studios that don’t test spend their time differently — they spend it on regression bugs, on manual QA passes before every deploy, on “fix it Monday morning” calls from clients, on rewrites of code that nobody trusts to modify.

The math works like this. Averaged across our active client projects in a typical month:

  • Without tests: ~30% of working hours spent on regression, manual QA, and rework.
  • With tests: ~10% of working hours spent on test maintenance, ~3% on regression and rework.

Net result: testing costs ~13% of our hours instead of ~30%, a saving of ~17% of total capacity. For a 2-person studio that’s a little under one extra day per person per week, every week. That day is what lets us run multiple projects without burning out.
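
The arithmetic can be checked directly. The sketch below uses the percentages quoted above; the 40-hour week per person is an assumption for illustration, not a figure from our timesheets.

```typescript
// Percentages are the ones quoted in the text.
// hoursPerPersonPerWeek (40) is an assumed value for illustration.
interface Overhead {
  testMaintenancePct: number;
  regressionAndReworkPct: number;
}

const withoutTests: Overhead = { testMaintenancePct: 0, regressionAndReworkPct: 30 };
const withTests: Overhead = { testMaintenancePct: 10, regressionAndReworkPct: 3 };

function totalOverheadPct(o: Overhead): number {
  return o.testMaintenancePct + o.regressionAndReworkPct;
}

// Share of capacity recovered by testing, in percentage points.
function savedPct(before: Overhead, after: Overhead): number {
  return totalOverheadPct(before) - totalOverheadPct(after);
}

// Convert the saving into hours per week for a studio of a given size.
function savedHoursPerWeek(pct: number, people: number, hoursPerPersonPerWeek: number): number {
  return (pct * people * hoursPerPersonPerWeek) / 100;
}

const saved = savedPct(withoutTests, withTests); // 30 - (10 + 3) = 17
const hours = savedHoursPerWeek(saved, 2, 40); // 17% of 80 hours = 13.6
console.log(`${saved}% of capacity, about ${hours} hours per week`);
```

Under that assumption the saving is 13.6 hours a week across two people, or 6.8 hours each.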

TDD where it matters; tests-after where it doesn’t

We don’t apply TDD uniformly. We apply it where it actually pays off.

TDD (red-green-refactor) for:

  • Business logic. Any function with conditional behavior, calculations, validations. Writing the test first forces the function’s API to be designed for use, not for implementation convenience.
  • API handlers. Routes that take input, do something, return output. Tests-first ensures the contract is right.
  • Data transformations. Functions that map between formats, schemas, or systems. Critical to test invariants before implementation.
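
As a sketch of what test-first looks like for business logic, the example below writes the expected behavior as a table of cases before the function exists, then adds the smallest implementation that satisfies it. The lateFeeCents function and its rules (a 5% fee after a 14-day grace period, capped at 50.00) are hypothetical, invented for illustration. In a real suite the cases would live in a Vitest test file rather than a plain loop.

```typescript
// Hypothetical rule, invented for this sketch: invoices more than 14 days
// overdue incur a 5% fee, capped at 50.00. Amounts are whole cents.
interface LateFeeCase {
  name: string;
  amountCents: number;
  daysOverdue: number;
  expectedFeeCents: number;
}

// Step 1 (red): the cases are written first and define the contract.
const cases: LateFeeCase[] = [
  { name: "charges nothing inside the grace period", amountCents: 10000, daysOverdue: 14, expectedFeeCents: 0 },
  { name: "charges 5% once the grace period has passed", amountCents: 10000, daysOverdue: 15, expectedFeeCents: 500 },
  { name: "caps the fee at 50.00", amountCents: 500000, daysOverdue: 30, expectedFeeCents: 5000 },
  { name: "rounds fractional cents down", amountCents: 1999, daysOverdue: 20, expectedFeeCents: 99 },
];

// Step 2 (green): the smallest implementation that makes every case pass.
function lateFeeCents(amountCents: number, daysOverdue: number): number {
  const GRACE_DAYS = 14;
  const FEE_CAP_CENTS = 5000;
  if (daysOverdue <= GRACE_DAYS) return 0;
  const fee = Math.floor((amountCents * 5) / 100);
  return Math.min(fee, FEE_CAP_CENTS);
}

// Step 3: run the cases and collect the names of any that fail.
function failingCases(): string[] {
  return cases
    .filter((c) => lateFeeCents(c.amountCents, c.daysOverdue) !== c.expectedFeeCents)
    .map((c) => c.name);
}

console.log(failingCases()); // [] when every case passes
```

Writing the case table first settled the signature (cents in, cents out, no currency object) before any implementation detail could influence it. In Vitest, the same table could be passed to test.each.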

Tests-after for:

  • UI components. The UI changes too much in the first iterations to be worth locking down with tests. We test components once the design has stabilized.
  • Integration glue. Connecting third-party APIs (Stripe, OpenAI, Auth0) — write the integration first, then write tests that pin down what we expect from the third party.
  • Migrations. SQL schema changes: write them, verify them against a local D1, then add a regression test if a future bug looks likely.
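
One way to pin down what we expect from a third party is a small runtime check on the response shape, exercised against a recorded payload. The sketch below is illustrative only: the ChargeResult type and its fields are hypothetical and do not mirror any real provider’s API.

```typescript
// Hypothetical shape of a payment provider response. Field names are
// assumptions for this sketch, not taken from a real API.
interface ChargeResult {
  id: string;
  status: "succeeded" | "failed";
  amountCents: number;
}

// Type guard: true only if the payload has the fields our code relies on.
function isChargeResult(payload: unknown): payload is ChargeResult {
  if (typeof payload !== "object" || payload === null) return false;
  const p = payload as Record<string, unknown>;
  const amount = p["amountCents"];
  return (
    typeof p["id"] === "string" &&
    (p["status"] === "succeeded" || p["status"] === "failed") &&
    typeof amount === "number" &&
    Number.isInteger(amount)
  );
}

// A payload recorded from a sandbox call would go here; this one is made up.
const recorded: unknown = JSON.parse('{"id":"ch_123","status":"succeeded","amountCents":4200}');

// A drifted payload: the provider now sends the amount as a string.
const drifted: unknown = JSON.parse('{"id":"ch_124","status":"succeeded","amountCents":"4200"}');

console.log(isChargeResult(recorded)); // true
console.log(isChargeResult(drifted)); // false
```

A test built on a check like this fails when the provider’s response stops matching what the integration code assumes.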

No tests for:

  • Spikes and prototypes. Anything that’s going to be thrown away within a week.
  • One-off scripts. Migration scripts, data backfills, one-time imports. The test is “did the data look right when it landed in production?”
  • Marketing site content changes. A blog post doesn’t need a test.

The stack we use in 2026

  • Vitest for unit and integration tests. Fast, native ESM, mostly Jest-compatible. Replaced Jest in our new projects in 2024.
  • Playwright for end-to-end browser tests. Headless Chromium by default, Firefox and WebKit for cross-browser when needed.
  • Wrangler’s local dev with Miniflare for testing Cloudflare Workers against a local D1, R2, KV stack. We run Worker tests in CI against a fresh local environment.
  • Snapshot tests sparingly. Useful for stable structural output (a generated HTML email, a CSV export); harmful for UI snapshots that change often.
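
For the stable structural output case, a snapshot amounts to comparing generated text against a stored, human-reviewed copy. This is a minimal sketch, assuming a hypothetical toCsv helper and an inline expected string standing in for the snapshot file. Vitest’s toMatchInlineSnapshot plays this role in a real suite.

```typescript
interface ExportRow {
  invoice: string;
  client: string;
  totalCents: number;
}

// Quote a field only when it contains a comma, a double quote, or a newline.
function csvField(value: string): string {
  return /[",\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Hypothetical export helper: a header row plus one line per record.
function toCsv(rows: ExportRow[]): string {
  const header = "invoice,client,total";
  const lines = rows.map((r) =>
    [r.invoice, csvField(r.client), (r.totalCents / 100).toFixed(2)].join(",")
  );
  return [header].concat(lines).join("\n");
}

// The stored "snapshot": reviewed once by a human, then compared verbatim.
const expectedSnapshot = [
  "invoice,client,total",
  'INV-001,"Acme, Inc.",120.00',
  "INV-002,Globex,89.50",
].join("\n");

const actualCsv = toCsv([
  { invoice: "INV-001", client: "Acme, Inc.", totalCents: 12000 },
  { invoice: "INV-002", client: "Globex", totalCents: 8950 },
]);

console.log(actualCsv === expectedSnapshot); // true
```

The same comparison applied to a component’s rendered markup would break on every design tweak, which is why we keep snapshots to output formats that rarely change.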

We don’t use Cypress for new work — Playwright has overtaken it for our purposes (faster, better DX, better cross-browser support).

Coverage targets we actually hit

Realistic numbers from our recent client production systems:

  • Business logic & API handlers: 85–95% line coverage. Most of this is the core domain logic where bugs would be most expensive.
  • UI components: 40–60% line coverage. We test behavior (clicks, form submissions, error states), not implementation details.
  • Critical paths: 100%. Auth flows, payment processing, anything that touches client billing or regulated data has full coverage with end-to-end tests.
  • Total project coverage: typically 70–85%.

We don’t chase 100%. The last 10% of coverage is disproportionately expensive to maintain and rarely catches real bugs. Better to use that time writing higher-value tests for paths that matter.

Specific patterns we’ve learned

A few non-obvious things that have made our test suites better:

Tests are documentation

Every test should read like a small spec. The test name describes the behavior in plain English. A new developer reading the test file should understand what the function does without reading the implementation.

Bad: test('user creation', ...)

Good: test('creates a new user with hashed password and a confirmation token', ...)

Test the contract, not the implementation

A test that breaks every time you refactor a function, even though its behavior didn’t change, is a bad test. It’s coupled to implementation details. Rewrite it to test the input/output contract instead.
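
To make the distinction concrete, the sketch below checks a hypothetical slugify function only through its inputs and outputs. Two different implementations pass the same checks, which is the property that lets a refactor go through without touching the tests.

```typescript
type Slugify = (title: string) => string;

// First implementation: a chain of regex replacements.
const slugifyV1: Slugify = (title) =>
  title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");

// Refactored implementation: split into words, then join.
const slugifyV2: Slugify = (title) =>
  title
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 0)
    .join("-");

// The contract: input/output pairs only, with no knowledge of how the work is done.
const contract: Array<[string, string]> = [
  ["Hello World", "hello-world"],
  ["  Testing, for Studios!  ", "testing-for-studios"],
  ["already-a-slug", "already-a-slug"],
  ["", ""],
];

function satisfiesContract(fn: Slugify): boolean {
  return contract.every(([input, expected]) => fn(input) === expected);
}

console.log(satisfiesContract(slugifyV1), satisfiesContract(slugifyV2)); // true true
```

A test that instead asserted "replace was called twice" would pass for the first version and fail for the second, even though callers see identical behavior.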

One-deep mocks; never mock-the-world

We mock at the boundary of our system (the third-party API, the database) — never inside our own code. If you’re mocking your own functions to test other functions, the design is wrong; refactor instead of mocking.
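
A minimal sketch of mocking only at the boundary, assuming a hypothetical PaymentGateway interface that wraps the third-party API. The application logic (orderTotalCents and chargeOrder, both invented for this example) runs for real; only the gateway is replaced by a hand-written fake.

```typescript
// Boundary of the system: the only thing the test replaces.
interface PaymentGateway {
  charge(amountCents: number, customerId: string): Promise<{ ok: boolean; reference: string }>;
}

interface Order {
  customerId: string;
  items: Array<{ priceCents: number; quantity: number }>;
}

// Our own logic: never mocked, always exercised for real.
function orderTotalCents(order: Order): number {
  return order.items.reduce((sum, item) => sum + item.priceCents * item.quantity, 0);
}

async function chargeOrder(order: Order, gateway: PaymentGateway): Promise<string> {
  const total = orderTotalCents(order);
  if (total <= 0) throw new Error("Cannot charge an empty order");
  const result = await gateway.charge(total, order.customerId);
  if (!result.ok) throw new Error("Payment declined");
  return result.reference;
}

// Hand-written fake for the boundary. It records calls so a test can inspect them.
class FakeGateway implements PaymentGateway {
  calls: Array<{ amountCents: number; customerId: string }> = [];

  async charge(amountCents: number, customerId: string) {
    this.calls.push({ amountCents, customerId });
    return { ok: true, reference: `fake-${this.calls.length}` };
  }
}

async function demo(): Promise<{ reference: string; charged: number }> {
  const gateway = new FakeGateway();
  const order: Order = {
    customerId: "cus_1",
    items: [
      { priceCents: 1500, quantity: 2 },
      { priceCents: 250, quantity: 1 },
    ],
  };
  const reference = await chargeOrder(order, gateway);
  return { reference, charged: gateway.calls[0].amountCents };
}

demo().then((result) => console.log(result)); // { reference: 'fake-1', charged: 3250 }
```

Because the gateway is passed in as a parameter, the test needs no module-level mocking, and the total calculation is covered by the same test that covers the charge.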

Failures should diagnose themselves

A good assertion failure tells you what went wrong without making you read the test. Use expect(actual).toEqual({...full expected shape...}) rather than asserting individual properties — the diff output is dramatically more useful.
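
To illustrate why a whole-shape comparison diagnoses better, the sketch below implements a simplified stand-in for the diff a test runner prints. The describeMismatches helper is hypothetical and much cruder than Vitest’s real output. The point is that one comparison reports every differing field at once, whereas a sequence of single-property assertions stops at the first failure.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function isPlainObject(value: Json): value is { [key: string]: Json } {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns one line per path where actual and expected differ.
function describeMismatches(actual: Json, expected: Json, path: string = "$"): string[] {
  if (isPlainObject(actual) && isPlainObject(expected)) {
    // Expected keys first, then any extra keys found only on the actual value.
    const keys = Object.keys(expected).concat(
      Object.keys(actual).filter((key) => !(key in expected))
    );
    const lines: string[] = [];
    for (const key of keys) {
      const childPath = `${path}.${key}`;
      if (!(key in actual)) lines.push(`${childPath}: missing`);
      else if (!(key in expected)) lines.push(`${childPath}: unexpected`);
      else lines.push(...describeMismatches(actual[key], expected[key], childPath));
    }
    return lines;
  }
  // Arrays and primitives are compared by their JSON form to keep the sketch short.
  return JSON.stringify(actual) === JSON.stringify(expected)
    ? []
    : [`${path}: expected ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`];
}

const actualUser: Json = { id: 7, email: "ada@example.com", role: "admin", confirmed: false };
const expectedUser: Json = { id: 7, email: "ada@example.com", role: "member", confirmed: true };

// Both wrong fields are reported in one pass.
console.log(describeMismatches(actualUser, expectedUser));
```

Asserting role and confirmed separately would surface only the role failure on the first run, and the confirmed failure only after the first was fixed.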

Run tests in CI on every push

Cloudflare Pages + GitHub Actions makes this trivial. Tests run in CI on every push and every PR, and we don’t merge until they pass. The discipline matters more than the specific config.

What this looks like for clients

Clients don’t usually see the tests directly. They see the second-order effects:

  • Fewer “we found a bug” calls in the months after launch.
  • Confident refactoring when requirements change — no “we can’t touch that part of the code without risking everything.”
  • Fast incident response when something does go wrong — we can write a failing test that reproduces the issue, fix it, and ship the fix in hours rather than days.
  • Smooth handoff if the client ever wants to switch developers. A test suite is the most useful documentation a codebase can carry.

Most clients don’t ask whether we test. The good ones do, and they’re the ones we want to work with long-term.

When testing fails

The honest part. Tests aren’t a panacea. We’ve shipped bugs with passing test suites. The failure modes:

  • Tests that test the wrong thing. A test passes because we tested implementation, not behavior.
  • Untested combinations. Each test passes; two of them together produce a bug.
  • Mocked services that have changed. The third-party API updated its behavior; our mock didn’t.
  • Environmental drift. Tests pass on the laptop, fail on the actual Cloudflare runtime.

These are all caught faster when you have tests than when you don’t. They’re not arguments against testing; they’re reminders that tests are a tool, not a guarantee.

The cultural piece

The hardest part of testing isn’t technical. It’s developing the habit and protecting it under deadline pressure. The pattern we see in small studios that try to “add testing later”: they never do. Testing as a culture has to be present from the first commit of a project, not retrofitted in month 6.

For a boutique studio, this means: every new project starts with a test runner configured, a CI pipeline running on every push, and the first feature ships with tests. After three projects of doing it this way, it’s automatic.

If you’re a fellow studio thinking about how to improve your testing practice, the single highest-leverage move is: ship your next new project with tests from day one. Don’t try to backfill old projects. Just start the next one right.

If you’re a client wondering whether your software project has a healthy test suite, ask the engineering team to show you the CI pipeline. The answer is right there.

Tagged #testing #tdd #quality #methodology #boutique

Frequently asked questions

The questions clients ask most after reading this.

Doesn’t writing tests slow a small studio down?
Short-term, slightly. Long-term, the opposite. Tests slow you down in week 1 and speed you up from week 2 onward, because they let you refactor confidently and ship without manual regression sweeps. For a 2-person studio shipping multiple projects in parallel, this leverage is the difference between burnout and sustainable pace.

Do you write tests before code (true TDD) or after?
We write tests before code for any non-trivial business logic. We write tests after code for UI and integration glue. The distinction matters: TDD for logic gives clean designs and high coverage; tests-after for UI lets us iterate on the design before locking in expectations.

What test coverage do you actually achieve in client projects?
85–95% on business logic and API handlers. 40–60% on UI components. 100% on critical paths (auth, payment, anything regulatory). We don’t chase 100% overall: the last 10% of coverage costs disproportionately more to maintain than the value it provides.

What’s your test stack in 2026?
Vitest for unit and integration tests (fast, native ESM, mostly Jest-compatible API). Playwright for end-to-end browser tests. Wrangler’s local dev with Miniflare for Cloudflare Worker tests against a local D1. We’ve largely moved on from Jest and Cypress for new projects.

Do you actually write tests for everything you ship?
Yes for client production systems. Yes for our own internal tools. No for spikes, prototypes, or one-off scripts. The rule is: if it’s going to run on its own in production with real users or real money, it has tests. If it’s exploratory or one-off, the test is your judgement.
