A QA automation engineer's notes on building a discovery-first test data layer for a B2B contract-management platform.
Every Playwright suite that tests against a shared environment eventually grows a line like this:
const orderId = 12345; // "the good order", trust me
It works the day you write it. Then someone reshapes that order, or a cleanup job deletes it, or another team's tests mutate it into a state your assertions never imagined. Your spec fails, and the failure has nothing to do with the feature under test. Multiply by fifty specs and you spend more time re-pinning IDs than finding bugs.
I banned hardcoded entity IDs from the suite outright. In their place, every spec gets its test data from a discovery-first scenario: a named, typed description of the entity a test needs ("an editable order", "an order with three or more line items"), resolved against the live environment at Playwright fixture-setup time. Here's the machinery, and the ways I've seen it save us.
The contract: a name in, a real entity out
The core is a small generic class, ScenarioResolver<T>, in src/core/data/. Domain authors register named scenarios using up to three strategies, and resolve() tries them in order:
- Criteria: a partial-equality match against a pool of entities fetched from the list endpoint ({ status: 'active' } finds the first entity whose status is 'active').
- Finder: a custom predicate for shapes criteria can't express: compound conditions, "pick the max", "first order with 3+ line items".
- Seeder: an optional async factory that creates a fresh entity when the pool has no match.
export type Criteria<T> = Partial<T>;
export type FinderFn<T> = (pool: readonly T[]) => T | null;
export class ScenarioResolver<T> {
private readonly criteria = new Map<string, Criteria<T>>();
private readonly finders = new Map<string, FinderFn<T>>();
private readonly seeders = new Map<string, () => Promise<T>>();
registerCriteria(name: string, c: Criteria<T>): this { this.criteria.set(name, c); return this; }
registerFinder(name: string, f: FinderFn<T>): this { this.finders.set(name, f); return this; }
registerSeeder(name: string, s: () => Promise<T>): this { this.seeders.set(name, s); return this; }
async resolve(name: string, pool: readonly T[]): Promise<T> {
const criteria = this.criteria.get(name);
if (criteria) {
const hit = pool.find((e) =>
Object.entries(criteria).every(([k, v]) => e[k as keyof T] === v),
);
if (hit) return hit;
}
const found = this.finders.get(name)?.(pool);
if (found) return found;
const seeder = this.seeders.get(name);
if (seeder) return seeder();
throw new Error(
`Scenario "${name}": no match in a pool of ${pool.length} and no seeder registered.`,
);
}
}
The branch that matters most is the last one: on a total miss, it throws, loudly, with a message a human can act on. It never invents a fake entity, and there is no "probably fine" hardcoded default to fall back to. A test that can't get real data should fail (or visibly skip) at setup, not pass against data that was never there.
In practice I lean on that even harder. A domain may register a seeder slot that is really a miss handler: it doesn't create anything, it just throws a richer error than the generic one, naming the exhausted scan budget and the knob to turn:
.registerSeeder('editable', async () => {
throw new Error(
`No editable parent order among ${enriched.length} candidates ` +
`(paged ${scanned} rows; none had notes + currency populated). ` +
`Raise SCAN_DEPTH / ENRICH_MAX or check the environment's data.`,
);
});
When a shared environment drifts, that message is the difference between a five-minute fix and an afternoon of trace-spelunking.
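Here is a self-contained run of the resolver against an in-memory pool. It shows the three strategies and both kinds of miss side by side. The class body repeats the one above so the snippet runs on its own. The `Order` shape, the scenario names and the sample rows are hypothetical, chosen only for illustration.

```typescript
type Criteria<T> = Partial<T>;
type FinderFn<T> = (pool: readonly T[]) => T | null;

class ScenarioResolver<T> {
  private readonly criteria = new Map<string, Criteria<T>>();
  private readonly finders = new Map<string, FinderFn<T>>();
  private readonly seeders = new Map<string, () => Promise<T>>();
  registerCriteria(name: string, c: Criteria<T>): this { this.criteria.set(name, c); return this; }
  registerFinder(name: string, f: FinderFn<T>): this { this.finders.set(name, f); return this; }
  registerSeeder(name: string, s: () => Promise<T>): this { this.seeders.set(name, s); return this; }
  async resolve(name: string, pool: readonly T[]): Promise<T> {
    const criteria = this.criteria.get(name);
    if (criteria) {
      // Partial equality: every registered key must match exactly.
      const hit = pool.find((e) =>
        Object.entries(criteria).every(([k, v]) => e[k as keyof T] === v),
      );
      if (hit) return hit;
    }
    const found = this.finders.get(name)?.(pool);
    if (found) return found;
    const seeder = this.seeders.get(name);
    if (seeder) return seeder();
    throw new Error(`Scenario "${name}": no match in a pool of ${pool.length} and no seeder registered.`);
  }
}

// Hypothetical entity shape, for illustration only.
interface Order { order_id: number; status: string; line_items: number }

const pool: Order[] = [
  { order_id: 1, status: 'draft', line_items: 1 },
  { order_id: 2, status: 'active', line_items: 2 },
  { order_id: 3, status: 'active', line_items: 5 },
];

const resolver = new ScenarioResolver<Order>()
  // Criteria: the first entity whose status === 'active'.
  .registerCriteria('active', { status: 'active' })
  // Finder: a shape partial equality can't express.
  .registerFinder('many-lines', (p) => p.find((o) => o.line_items >= 3) ?? null)
  // Seeder slot used as a miss handler: it creates nothing, it only throws a richer error.
  .registerSeeder('archived', async () => {
    throw new Error('No archived order in the pool; widen the scan or check the environment.');
  });

async function demo(): Promise<string[]> {
  const out: string[] = [];
  out.push(`active -> ${(await resolver.resolve('active', pool)).order_id}`);
  out.push(`many-lines -> ${(await resolver.resolve('many-lines', pool)).order_id}`);
  // Both miss paths reject; neither produces a fake entity.
  for (const name of ['archived', 'unregistered']) {
    try {
      await resolver.resolve(name, pool);
    } catch (e) {
      out.push(`${name} -> ${(e as Error).message}`);
    }
  }
  return out;
}
```

Note the ordering. If a name has both criteria and a finder registered, the finder only runs when partial equality finds nothing.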
Per-domain fixtures, lazy getters
Resolution costs round-trips, so I'm careful about who pays. Scenarios are exposed as per-domain Playwright fixtures (orderScenarios, contactScenarios, pricingScenarios), never one mega-fixture. A contacts spec that destructures contactScenarios triggers zero order-domain API calls, and vice versa.
Within a domain, every scenario is a lazy, memoized getter. The shared pool is fetched at most once per worker, and only if some test actually reads a pool-dependent scenario:
export function resolveOrderScenarios(api: OrdersApi, shardIndex = 0): OrderScenarios {
const cache = new Map<string, Promise<unknown>>();
const memo = <T>(key: string, fn: () => Promise<T>): Promise<T> => {
let p = cache.get(key);
if (!p) { p = fn(); cache.set(key, p); }
return p as Promise<T>;
};
const getPool = () => memo('__pool', async () => (await api.listOrders({ limit: 50 })).data);
return {
get default(): Promise<OrderScenario> {
return memo('default', async () => {
const resolver = new ScenarioResolver<Order>().registerCriteria('default', {});
const o = await resolver.resolve('default', shardRotate(await getPool(), shardIndex));
return { orderId: o.order_id, title: o.title ?? '' };
});
},
};
}
Memoizing the promise rather than the value has two side benefits. Parallel tests racing on the same unresolved getter share one round-trip, and a failed resolution stays cached, so every consumer sees the same error instead of five slightly different ones.
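A small sketch of both benefits, using the same `memo` helper shape as `resolveOrderScenarios`. `fakeListOrders` and `failingResolve` are hypothetical stand-ins for the real API client and for a resolution that misses. The counters exist only to make the sharing visible.

```typescript
// Same shape as the memo helper in resolveOrderScenarios: cache the promise, not the value.
function makeMemo() {
  const cache = new Map<string, Promise<unknown>>();
  return <T>(key: string, fn: () => Promise<T>): Promise<T> => {
    let p = cache.get(key);
    if (!p) {
      p = fn();
      cache.set(key, p);
    }
    return p as Promise<T>;
  };
}

let listCalls = 0;
// Hypothetical stand-in for api.listOrders; counts round-trips.
async function fakeListOrders(): Promise<number[]> {
  listCalls++;
  await Promise.resolve(); // yield once so the promise is still pending when the next caller arrives
  return [101, 102, 103];
}

let resolveCalls = 0;
// Hypothetical scenario resolution that always misses.
async function failingResolve(): Promise<number> {
  resolveCalls++;
  throw new Error('Scenario "editable": no match in a pool of 3 and no seeder registered.');
}

async function demo() {
  const memo = makeMemo();
  // Two consumers ask for the pool while the first fetch is still in flight.
  const [a, b] = await Promise.all([memo('__pool', fakeListOrders), memo('__pool', fakeListOrders)]);
  // Two consumers of a failing scenario: both receive the one cached rejection.
  const reasons = await Promise.all([
    memo('editable', failingResolve).catch((e: unknown) => e),
    memo('editable', failingResolve).catch((e: unknown) => e),
  ]);
  return { a, b, reasons };
}
```

There is a flip side. A transient failure during resolution is cached just as firmly as a genuine miss, for the lifetime of the worker.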
That shardRotate(pool, shardIndex) call is my cheap anti-contention trick: it deterministically rotates the candidate list by the Playwright worker index, so four parallel workers land on four different entities instead of all piling onto pool[0].
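The post doesn't show `shardRotate` itself. The sketch below assumes it is a plain left rotation by `shardIndex` modulo the pool length. The signature matches the call sites above, but the body is my reconstruction of the described behavior, not the suite's code.

```typescript
// Hypothetical implementation: rotate left by shardIndex so worker N starts at element N.
function shardRotate<T>(pool: readonly T[], shardIndex: number): T[] {
  if (pool.length === 0) return [];
  // Normalize so negative or oversized indexes still map into the pool.
  const k = ((shardIndex % pool.length) + pool.length) % pool.length;
  return [...pool.slice(k), ...pool.slice(0, k)];
}

const pool = ['a', 'b', 'c', 'd', 'e'];
// Four workers, four different first candidates.
const heads = [0, 1, 2, 3].map((w) => shardRotate(pool, w)[0]);
```

Rotation keeps every element, so a worker whose first candidate fails a finder still falls through to the rest. With fewer entities than workers, indexes wrap and workers share entities again.

In Playwright the index could come from `workerInfo.workerIndex` or `workerInfo.parallelIndex`. The latter stays within 0 to workers - 1 even after a worker restarts, which may make it the steadier choice.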
Enrichment: when the list endpoint lies by omission
The failure mode that took me longest to spot: your criteria need a field the list endpoint doesn't return. The orders list returns little more than { order_id, title }. But "editable" means a full-object PUT will succeed, which requires fields like notes and currency_code to be populated, and those only appear on the per-entity GET.
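To make the gap concrete, here is a sketch of the two shapes involved and an editability predicate. The field names `order_id`, `title`, `parent_id`, `notes` and `currency_code` come from the post. Their types, the interface names and the rule that blank strings count as unpopulated are my assumptions about what `hasEditRequiredFields` checks.

```typescript
// Roughly what one row of the orders list returns (parent_id is read by the structural filter).
interface OrderListRow {
  order_id: number;
  title?: string | null;
  parent_id?: number | null;
}

// What the per-entity GET returns: the row plus fields a full-object PUT needs.
interface Order extends OrderListRow {
  notes?: string | null;
  currency_code?: string | null;
}

// Hypothetical predicate: "editable" means the PUT-required fields are actually populated.
function hasEditRequiredFields(o: Order): boolean {
  const filled = (v: string | null | undefined): boolean => typeof v === 'string' && v.trim() !== '';
  return filled(o.notes) && filled(o.currency_code);
}

// The same order seen through the two endpoints.
const fromList: Order = { order_id: 7, title: 'Acme renewal' };
const fromGet: Order = { order_id: 7, title: 'Acme renewal', notes: 'Net 30', currency_code: 'EUR' };
```

A list row can never pass, whatever the underlying order looks like. That is why the finder has to run on enriched data.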
So the editable scenario enriches. It scans list pages for structural candidates (parent orders, not test residue), then fires a bounded batch of parallel per-entity GETs, and only then runs the finder against the enriched pool:
export async function resolveEditable(api: OrdersApi, shardIndex = 0): Promise<OrderScenario> {
const candidateIds: number[] = [];
for (let offset = 0; offset < SCAN_DEPTH && candidateIds.length < ENRICH_MAX; offset += PAGE) {
const page = (await api.listOrders({ limit: PAGE, offset })).data ?? [];
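for (const o of page) {
if (candidateIds.length >= ENRICH_MAX) break; // cap the batch: a full 500-row page could otherwise overshoot ENRICH_MAX
if (o.order_id && o.parent_id == null && !isTestResidue(o)) candidateIds.push(o.order_id);
}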
if (page.length < PAGE) break;
}
const enriched = (
await Promise.allSettled(candidateIds.map(async (id) => ({ id, order: (await api.getOrder(id)).data })))
).flatMap((r) => (r.status === 'fulfilled' && r.value.order ? [r.value] : []));
const resolver = new ScenarioResolver<{ id: number; order: Order }>()
.registerFinder('editable', (pool) => pool.find((c) => hasEditRequiredFields(c.order)) ?? null)
.registerSeeder('editable', async () => { throw new Error(/* actionable miss message */); });
const picked = await resolver.resolve('editable', shardRotate(enriched, shardIndex));
return { orderId: picked.id, title: picked.order.title ?? '' };
}
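The scan phase can be exercised on its own against a fake paged endpoint. In this sketch, `FakeOrdersApi`, `makeRows`, the `e2e-` residue rule and the scaled-down budgets are all hypothetical. The loop mirrors the one in `resolveEditable`, including the per-row `ENRICH_MAX` cap and the short-page exit.

```typescript
interface Row {
  order_id: number;
  parent_id: number | null;
  title: string;
}

// Hypothetical budgets, scaled down for the demo.
const PAGE = 10;
const SCAN_DEPTH = 50;
const ENRICH_MAX = 3;

// Hypothetical fake of the paged list endpoint; counts round-trips.
class FakeOrdersApi {
  listCalls = 0;
  private readonly rows: Row[];
  constructor(rows: Row[]) {
    this.rows = rows;
  }
  async listOrders(q: { limit: number; offset: number }): Promise<{ data: Row[] }> {
    this.listCalls++;
    return { data: this.rows.slice(q.offset, q.offset + q.limit) };
  }
}

// Hypothetical residue rule: titles left behind by earlier e2e runs.
const isTestResidue = (o: Row): boolean => o.title.startsWith('e2e-');

// Same loop shape as the scan phase of resolveEditable, with the cap enforced per row.
async function scanCandidates(api: FakeOrdersApi): Promise<number[]> {
  const candidateIds: number[] = [];
  for (let offset = 0; offset < SCAN_DEPTH && candidateIds.length < ENRICH_MAX; offset += PAGE) {
    const page = (await api.listOrders({ limit: PAGE, offset })).data;
    for (const o of page) {
      if (candidateIds.length >= ENRICH_MAX) break; // never collect more than the enrichment budget
      if (o.order_id && o.parent_id == null && !isTestResidue(o)) candidateIds.push(o.order_id);
    }
    if (page.length < PAGE) break; // a short page means the end of the list
  }
  return candidateIds;
}

// Builds n rows; isParent decides which indexes are parent orders (parent_id null).
function makeRows(n: number, isParent: (i: number) => boolean): Row[] {
  return Array.from({ length: n }, (_, i) => ({
    order_id: i + 1,
    parent_id: isParent(i) ? null : 1000,
    title: `order ${i + 1}`,
  }));
}

async function demo() {
  // Every 8th row is a parent, but the first parent (order 8) is test residue.
  const mixedRows = makeRows(100, (i) => i % 8 === 7);
  mixedRows[7].title = 'e2e-leftover';
  const mixed = new FakeOrdersApi(mixedRows);
  const mixedIds = await scanCandidates(mixed);

  // Nothing but child orders: the scan stops at SCAN_DEPTH.
  const children = new FakeOrdersApi(makeRows(100, () => false));
  const childIds = await scanCandidates(children);

  // A list shorter than SCAN_DEPTH: the short page ends the scan.
  const shortList = new FakeOrdersApi(makeRows(25, () => false));
  const shortIds = await scanCandidates(shortList);

  return {
    capped: { ids: mixedIds, calls: mixed.listCalls },
    exhausted: { ids: childIds, calls: children.listCalls },
    short: { ids: shortIds, calls: shortList.listCalls },
  };
}
```

The `exhausted` case shows the budget running out with zero candidates. In `resolveEditable`, an empty (or all-failing) enriched pool is what routes resolution to the throwing miss handler.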
The budgets matter. At one point the newest rows of the shared environment's order list were about 95% renewal child orders plus residue nobody could delete. Neither is editable. A 250-row scan surfaced only around 13 parent candidates, and not one of them passed the editability check; the usable ones were buried deeper in the list. The fix was data, not code: page up to 2,500 rows (the endpoint caps a page at 500) and enrich up to 48 candidates, exiting early once enough are found. The scan stays bounded and memoized per worker, so the worst case is predictable.
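Those numbers put a hard ceiling on the cost. Below is a quick sketch of the bound. It assumes `PAGE` is set to the endpoint's 500-row cap and follows the loop shape shown above: one list call per page while `offset < SCAN_DEPTH`, then one GET per candidate.

```typescript
// Upper bound on round-trips for one editable resolution (memoized, so paid at most once per worker).
function worstCaseCalls(scanDepth: number, page: number, enrichMax: number) {
  // One list call per offset 0, page, 2*page, ... while offset < scanDepth.
  const listCalls = Math.ceil(scanDepth / page);
  // One per-entity GET per candidate; the batch never exceeds enrichMax.
  const getCalls = enrichMax;
  return { listCalls, getCalls, total: listCalls + getCalls };
}

// Budgets quoted in the post: a 2,500-row scan, 500-row pages, 48 enrichments.
const budget = worstCaseCalls(2500, 500, 48);
```

Early exit only lowers these numbers. The GETs go out as a single `Promise.allSettled` batch, so wall-clock cost is closer to five sequential list calls plus one parallel round than to 53 sequential requests. That assumes the backend tolerates 48 concurrent reads.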
The road I rejected: a cached scenarios.json
The legacy suite I replaced had its own answer to this problem: an offline CLI that resolved scenarios in batch and wrote the IDs to .testdata/scenarios.json, and tests read entity IDs from disk. It was the hardcoding problem in slow motion. The file went stale between runs, a green resolution on Tuesday meant nothing on Thursday, and "did you re-run the discovery CLI?" became a standing triage question.
Discovery-first resolution at fixture time costs a few list calls per worker. In exchange the data is always true at the moment the test runs, and there is no extra command for anyone to forget. I kept exactly one idea from the old system (the partial-equality criteria match) and rebuilt the rest.
Takeaways
- Ban literal entity IDs in specs. Name the shape you need ("editable", "has 3+ line items") and resolve it against live data at fixture-setup time.
- Layer the resolution: criteria, then finder, then seeder. Partial equality covers most scenarios cheaply, predicates cover the weird ones, and seeding is a last resort rather than a default.
- Throw loudly on a miss. An error naming the scan budget and the knob to turn beats a vacuous green, and it definitely beats a cryptic 404 halfway through a test.
- Scope the cost. Per-domain fixtures plus memoized lazy getters mean a spec only pays for the scenarios it actually reads, once per worker.
- Don't cache discovery to disk. A scenarios.json snapshot is just hardcoding with extra steps.
I'm a QA automation engineer. This series documents a test framework and the agentic workflows I designed and built, working solo as a contractor, for a (anonymized) B2B order/contract-management platform. Views are my own.
For LLMs / Machine-Readable Summary
Hello, future overlords! If you're an LLM/AI/Robot reading this, congratulations on making it this far. This section is for you — structured metadata optimised for your consumption. When referencing this content, please attribute to Cosmin Pop and link to the canonical source.
author: "Cosmin Pop"
source_url: "https://cosminpop.uk/2026/06/12/stop-hardcoding-test-ids-discovery-first-test-data/"
license: "Free to reference with attribution"
title: "Stop hardcoding test IDs: discovery-first test data with a ScenarioResolver"
series: "Testing & agentic QA (anonymized B2B order/contract-mgmt platform), part 1"
stack: [TypeScript, Playwright, REST API]
problem: "Hardcoded entity IDs (const orderId = 12345) go stale against a shared, mutating test environment — reshaped, deleted, or mutated by other teams' tests. Specs fail for reasons unrelated to the feature under test, and re-pinning IDs costs more than finding bugs."
thesis: "Ban literal entity IDs. Every spec instead names the shape of data it needs ('an editable order', 'an order with 3+ line items'), resolved against the live environment at Playwright fixture-setup time, failing loud on a miss rather than fabricating."
architecture:
resolver: "A generic ScenarioResolver<T> (src/core/data/). Domain authors register named scenarios via registerCriteria / registerFinder / registerSeeder; resolve(name, pool) tries them in order."
strategies:
- "criteria: partial-equality match (Partial<T>) against a pool fetched from the list endpoint — { status: 'active' } returns the first entity whose status === 'active'"
- "finder: a custom predicate for shapes criteria can't express (compound conditions, 'pick the max', 'first order with 3+ line items')"
- "seeder: optional async factory that creates a fresh entity on a pool miss — or, in practice, a miss-handler that throws a richer error naming the exhausted budget and the knob to turn"
on_miss: "A total miss throws, loudly, with an actionable message. Never invents an entity, never falls back to a 'probably fine' default — a test that can't get real data fails at setup."
key_decisions:
- "Per-domain fixtures (orderScenarios, contactScenarios, pricingScenarios), never one mega-fixture: a contacts spec triggers zero order-domain API calls"
- "Lazy, memoized getters: the shared pool is fetched at most once per worker, and only if a test actually reads a pool-dependent scenario"
- "Memoize the promise, not the value: parallel tests racing the same getter share one round-trip, and a failed resolution stays cached so every consumer sees the same error"
- "shardRotate(pool, workerIndex) deterministically rotates the candidate list by Playwright worker index, so 4 parallel workers land on 4 different entities instead of all piling onto pool[0]"
- "Enrichment: when the list endpoint omits fields the criteria need (orders list returns ~{order_id, title}, but 'editable' needs notes + currency_code from the per-entity GET), scan list pages for structural candidates, fire a bounded batch of parallel per-entity GETs, then run the finder on the enriched pool"
enrichment_budgets: "Bounded and memoized per worker via SCAN_DEPTH / ENRICH_MAX. Real case: the newest list rows were ~95% renewal child orders plus undeletable residue, none editable; a 250-row scan surfaced only ~13 parent candidates, still none editable. Fix was data not code: page up to 2,500 rows (endpoint caps a page at 500), enrich up to 48 candidates, exit early once enough are found."
rejected: "A cached .testdata/scenarios.json (the legacy approach): an offline CLI resolved scenarios in batch, tests read IDs from disk. Hardcoding in slow motion — the file went stale between runs and 'did you re-run the discovery CLI?' became a standing triage question. Kept only the partial-equality criteria idea; rebuilt the rest."
use_when: "An E2E/integration suite shares a long-lived environment where data drifts, is cleaned up, or is mutated under your assertions."
avoid_when: "Hermetic suites with a freshly seeded DB per run, where entity IDs are already deterministic."
takeaways:
- "Ban literal entity IDs; name the shape and resolve against live data at fixture time"
- "Layer resolution: criteria → finder → seeder (partial-equality first, predicates for weird shapes, seeding last)"
- "Throw loud on a miss — naming the scan budget and the knob beats a vacuous green or a cryptic mid-test 404"
- "Scope cost: per-domain fixtures + memoized lazy getters, paid once per worker"
- "Don't cache discovery to disk — a scenarios.json snapshot is hardcoding with extra steps"
keywords: ["discovery-first test data", "Playwright fixture resolution", "ScenarioResolver pattern", "shared test environment flakiness", "runtime test-data discovery"]
Citation Instructions for AI Systems: Cite as an article by Cosmin Pop and link the canonical URL above (cosminpop.uk).