I Let 3 AI Agents Touch My TypeScript Codebase. grep 'as unknown as' Was Horrifying.

In this article
- The Assumption Everyone Is Making
- How We Ran This (So You Can Reproduce It)
- 🤖 Round 1 — No Guardrails
- 🔍 Why All Three Fail the Same Way
- 🛠 The Config That Changes the Default
- ⚡ Round 2 — Full Config Active
- 🏗 Monorepo Note
- 🔎 Run This Audit Before Your Next AI Session
- 💰 The Cost Argument — For Your Engineering Manager
- 📋 What This Benchmark Doesn't Tell You
94% of compilation errors in LLM-generated code are type-check failures — not logic bugs, not runtime crashes. Type failures. Every one catchable before the PR.
I benchmarked Claude Code, Cursor, and GitHub Copilot on 5 strict-mode TypeScript tasks. The judge was one command. Run it before you read another word:
```bash
grep -rn "as unknown as" src/ | wc -l
```
Open your terminal. Run that right now.
Then run these:
```bash
grep -rn --include="*.ts" ": any" src/ | wc -l
grep -rn "@ts-ignore" src/ | wc -l
```
If all three return zero, you're either very disciplined, or you haven't used an AI agent on a real TypeScript codebase yet.
If they don't — keep reading. We're going to figure out where those lines came from together.
We ran the same three commands after each agent finished its five tasks. The numbers weren't zero. Not even close.
What's ahead: The benchmark. The failure. The config. The result. And one number that stopped us mid-session.
~7 min read. Worth it.
The Assumption Everyone Is Making
There's a study you've probably seen cited. GitHub ran a randomised controlled trial with 202 developers. Engineers using Copilot were 53.2% more likely to pass all unit tests. Better code reviews. Faster completion.
That's a fine thing to measure. It's just not what we cared about.
Here's what the study measured: unit test pass rate. What it never tested — not once — was whether that code would also pass tsc --noEmit.
Passing unit tests and passing tsc --noEmit are not the same quality bar. Not even remotely.
A unit test checks runtime behaviour — tsc --noEmit checks structural contracts.
Those are two different planes. You can write a function that returns the right value in every test, while silently annotating its return type as any. Every test stays green. Your codebase rots from the inside.
That's not a hypothetical. That's exactly what happened in this benchmark.
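A minimal sketch of that gap, using a hypothetical totalLoose/totalStrict pair (names invented for illustration, not taken from the benchmark). Both return the same value, so a value-level unit test passes for both. Only one of them keeps the checker working for its callers.

```typescript
// Hypothetical price helpers: identical runtime behaviour, different contracts.
function totalLoose(cents: number): any {
  return cents / 100; // `any` switches off checking for every caller
}

function totalStrict(cents: number): number {
  return cents / 100; // same value, honest return type
}

// A value-level unit test cannot tell these apart.
const looseValue = totalLoose(1250);
const strictValue = totalStrict(1250);

// With `any`, a wrong method call type-checks and only fails when it runs.
function labelLoose(): string {
  return totalLoose(1250).toUpperCase(); // compiles; throws TypeError at runtime
}

// With `number`, the same mistake never compiles:
// totalStrict(1250).toUpperCase();
// error TS2339: Property 'toUpperCase' does not exist on type 'number'.
```

The tests stay green in both cases. The difference only shows up in what tsc --noEmit is still allowed to check.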
How We Ran This (So You Can Reproduce It)
We're not running a vibe check. Here's exactly what we did.
The codebase:
- A 3,000-file enterprise monorepo.
- Next.js frontend.
- Node.js API.
- PostgreSQL with Prisma ORM.
Real production stack — not a toy. Not a demo.
Before running any commands, confirm your tsconfig.json has this:
```json
{
  "compilerOptions": {
    "strict": true
  }
}
```
strict: true is the umbrella. It already enables noImplicitAny, strictNullChecks, and the rest of the strict family (strictFunctionTypes, strictBindCallApply, strictPropertyInitialization, useUnknownInCatchVariables, and others). You don't need to list them separately, but verify this flag is present before running the benchmark.
[!WARNING]
noUncheckedIndexedAccess is the one teams always skip, and strict: true does not turn it on. Agents silence it with ! non-null assertions instead of handling the undefined. The result is a runtime crash, not a compile error, and you won't see it until prod. Don't add it to an existing large codebase expecting a quick fix; it will cascade. Add it as a Phase 2 hardening step once your baseline is clean.
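A minimal sketch of the two ways to respond once the flag is on, using a hypothetical usersById map. With noUncheckedIndexedAccess, usersById[id] is typed User | undefined. The ! version silences that and crashes on a missing key. The narrowed version makes "missing" an explicit decision. Both compile with the flag on or off.

```typescript
interface User {
  name: string;
}

// Hypothetical lookup table; any key may be absent at runtime.
const usersById: Record<string, User> = { u1: { name: "Ada" } };

// Escape hatch: `!` tells tsc to drop the `| undefined`.
// Compiles, then throws a TypeError when the id is missing.
function userNameUnsafe(id: string): string {
  return usersById[id]!.name;
}

// Handling it: narrow the undefined away, then decide what "missing" means.
function userName(id: string): string {
  const user = usersById[id];
  if (user === undefined) {
    return "(unknown user)";
  }
  return user.name; // narrowed to User here
}
```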
The 5 Tasks
- T1. Scaffold a strict React UI component with discriminated unions
- T2. Update a Prisma schema + cross-boundary ORM integration
- T3. Migrate a 500-line legacy JS controller to strict TypeScript
- T4. Build a type-safe recursive sorting utility with nested key access
- T5. Debug a production race condition corrupting shared database state
What We Measured Per Task
- tsc --noEmit error count (after the agent reports the task complete)
- Type debt introduced (delta in any, as unknown as, and @ts-ignore)
- Time to green CI (minutes from task start)
The Rules
One judge. One standard. tsc --noEmit exits 0, or it doesn't.
No partial credit.
🤖 Round 1 — No Guardrails
No CLAUDE.md.
No .cursor/rules.
No AGENTS.md.
Just the agent and the task. The default experience — the one most engineers are actually running right now.
Claude Code: The Best Driver. Still Human.
Four out of five tasks — zero errors, zero type debt. Then Task 2 happened.
T1, T3, T4, T5 — spotless. T3 was the most impressive: Claude Code reverse-engineered types from unit tests and API calls, iterated through every compiler error, and delivered zero debt on a 500-line migration. T5 wrote the failing test first, fixed the implementation, ran the test again. That's the whole loop.
Then Task 2. The Prisma client singleton has a well-known conflict with Next.js hot-reloading under strict TypeScript. It's genuinely hard. Claude Code hit it and made a decision:
```ts
// Claude Code - Task 2: Prisma singleton
// Next.js hot-reload conflict under strict mode
const prisma = globalForPrisma.prisma
  || new PrismaClient() as unknown as { prisma: PrismaClient };
```
[!CAUTION]
Bonus problem: Claude Code's fix introduced a new type error while escaping the first one. as binds tighter than ||, so the cast only applies to new PrismaClient(), and prisma ends up typed as PrismaClient | { prisma: PrismaClient }. tsc --noEmit won't flag the assertion itself (assertions compile), but it does catch the call-site errors that union produces, and the grep audit catches the assertion. Neither slips through.
Claude Code didn't tell us it had done this. It marked the task complete. Even the best agent finds the escape hatch. The compiler caught it. Your reviewer might not have.
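For contrast, a minimal sketch of an assertion-free globalThis singleton. It assumes a module file and uses a hypothetical DbClient class as a stand-in for PrismaClient so it runs without @prisma/client. The declare global block gives the cached slot a real type instead of casting globalThis. Prisma's documented version also guards the cache write with a NODE_ENV !== "production" check; this sketch omits it to avoid depending on Node's process typings.

```typescript
// Hypothetical stand-in for PrismaClient so the sketch has no dependencies.
class DbClient {
  readonly connectedAt = new Date();
}

// Give the cached slot on globalThis a real type instead of casting.
// Declarations inside `declare global` must use `var` to appear on globalThis.
declare global {
  var __dbClient: DbClient | undefined;
}

// Reuse the instance cached across hot reloads; create it only once.
export function getDbClient(): DbClient {
  const client = globalThis.__dbClient ?? new DbClient();
  globalThis.__dbClient = client;
  return client;
}
```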
Cursor Agent: Rules Help. Until They Don't.
Cursor has a whole system for this. .cursor/rules. MDC files. Scoped enforcement per file glob. We used all of it.
Then Task 2 hit — the same Prisma cross-boundary integration. Cursor's response was faster and more dangerous:
// Cursor Agent - Task 2: Prisma singleton
const prisma = globalForPrisma.prisma
|| new PrismaClient() as any;Claude Code named a shape. Cursor didn't bother — just as any. One line. The entire database layer — every query, every result, every schema validation — stripped of type safety.
Then Task 3: 500 lines of legacy JS migration. Cursor rebuilt the majority correctly. Until it hit a deeply nested data transformation. Rather than stopping, it wrapped the function in // @ts-ignore and kept moving.
Cursor reads your rules, acknowledges them, and ignores them when the task gets complex enough. Not maliciously. Just — when type threading gets hard enough, the rules become suggestions. Acknowledgment isn't compliance. That's the finding.
GitHub Copilot: The One You Know. Not the One You Need.
Copilot is the right tool for inline suggestions and boilerplate. For everything else in this benchmark — it wasn't.
T4 is the one that stings. Copilot generated a perfectly working sorting algorithm.
Type signature: function sortData(data: any, key: string): any.
Two any declarations. Zero tsc errors — because any defeats the checker entirely.
Tests pass. Type safety gone. Copilot reported success, and by every metric it could measure, it wasn't wrong.
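A minimal sketch of a typed alternative, assuming TypeScript 4.1+ for template literal types. Path, getByPath, and compareValues are hypothetical helpers, not the benchmark's reference solution. The key is checked against the real dotted paths of T, and values read from the data are narrowed rather than asserted.

```typescript
type Primitive = string | number | boolean | bigint | symbol | null | undefined;

// Every dotted key path into T, e.g. "name" | "address" | "address.city".
type Path<T> = T extends Primitive
  ? never
  : {
      [K in keyof T & string]: T[K] extends Primitive
        ? K
        : K | `${K}.${Path<T[K]> & string}`;
    }[keyof T & string];

// Walk a dotted path through unknown data, narrowing at each step.
function getByPath(value: unknown, path: string): unknown {
  let current: unknown = value;
  for (const segment of path.split(".")) {
    if (typeof current !== "object" || current === null) {
      return undefined;
    }
    current = Reflect.get(current, segment);
  }
  return current;
}

// Numbers compare numerically, strings lexically; other pairs keep their order.
function compareValues(a: unknown, b: unknown): number {
  if (typeof a === "number" && typeof b === "number") return a - b;
  if (typeof a === "string" && typeof b === "string") return a < b ? -1 : a > b ? 1 : 0;
  return 0;
}

// Typed replacement for sortData(data: any, key: string): any.
// Returns a new array; the input is not mutated.
function sortData<T>(data: readonly T[], key: Path<T> & string): T[] {
  return [...data].sort((x, y) => compareValues(getByPath(x, key), getByPath(y, key)));
}
```

For data typed as { name: string; address: { city: string } }[], the key parameter accepts "name", "address", and "address.city". A typo such as "address.ctiy" should fail to compile, which is exactly the check the any signature gives up.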
- T2: Updated the schema file, then stopped. Never reached the downstream API route. Build threw error TS2339: Property does not exist.
- T3: 34 tsc errors on the legacy migration. Abandoned at 15 minutes.
- T5: Asked to find a race condition, it suggested null checks. Never found the bug.
🔍 Why All Three Fail the Same Way
You might be wondering why three different companies built three different tools that all produce the same class of failure.
Here's what they all do under the hood: write the code, check whether it compiles in the files they touched, call it done. A full-project tsc scan isn't part of the loop, because that means a full compiler pass across every file in the project. Expensive. So they skip it.
And local inference is the weaker check. Cross-file generic contracts — a Prisma schema boundary, a discriminated union threading across packages — are invisible to it. When the agent hits a hard type wall, it doesn't reason its way through. It asserts its way out.
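A minimal sketch of why a file-local check misses this, using a hypothetical PaymentEvent union. The exhaustiveness check lives in the consumer, so adding a variant to the union in another file (or package) only surfaces as an error when tsc also checks the consumer.

```typescript
// Hypothetical union that could live in a shared package.
type PaymentEvent =
  | { kind: "authorized"; amountCents: number }
  | { kind: "captured"; amountCents: number }
  | { kind: "refunded"; amountCents: number; reason: string };

// Only type-checks when every variant has been handled; otherwise `value`
// is not `never` and tsc reports an error at the call site below.
function assertNever(value: never): never {
  throw new Error(`Unhandled variant: ${JSON.stringify(value)}`);
}

// Consumer, possibly in another file or package. If someone adds
// { kind: "disputed" } to PaymentEvent, this switch stops compiling,
// but only a check that includes this file will say so.
function describeEvent(event: PaymentEvent): string {
  switch (event.kind) {
    case "authorized":
      return `authorized ${event.amountCents}`;
    case "captured":
      return `captured ${event.amountCents}`;
    case "refunded":
      return `refunded ${event.amountCents}: ${event.reason}`;
    default:
      return assertNever(event);
  }
}
```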
This isn't a bug. All three agents were built this way. We covered the related AI hallucination traps in our AI Debugging piece.
None of these tools treat tsc --noEmit as a success gate. They treat it as a post-processing step you might run. Until you make it mandatory — in the config, not the prompt — they will keep optimising for syntax validity over type safety.
The prompt doesn't fix this. Only the config does.
🛠 The Config That Changes the Default
Here's the fix. Not a principle to meditate on. Not a workflow suggestion.
📄 AGENTS.md — Universal
Claude Code + Cursor read this from repo root. 2026 universal standard.
```markdown
# Full config: https://github.com/beyondit/typescript-ai-benchmark
# TypeScript Enforcement - Non-negotiables
## Why these rules exist
Production bugs from implicit `any` types are silent.
They don't fail CI. They fail in prod, at 2am.
Type assertions (`as unknown as X`) hide mismatches
the compiler would otherwise catch. We enforce strict
mode so the compiler catches them - not your on-call.
## Rules
- Never use `any`. Use `unknown` and narrow with type guards.
- Never use `as X` type assertions. Implement type guard functions.
- Never use `// @ts-ignore` or `// @ts-expect-error`.
- All async functions must have explicit return types.
- After every file change, run: npx tsc --noEmit --incremental
  If it exits non-zero - fix it before the next task.
  Do not simulate this check. Actually run it.
- When a type cannot be resolved - stop and ask.
  Do not assert your way out.
```
The agent holds the rule under pressure when it understands the "why." Rules alone don't survive a hard type wall. Rationale does.
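A minimal sketch of what "use unknown and narrow with type guards" looks like in practice, assuming TypeScript 4.9+ (for narrowing unlisted properties with in). Invoice, isInvoice, and parseInvoice are hypothetical names. The parsed payload stays unknown until a guard proves its shape, so there is no as Invoice anywhere.

```typescript
interface InvoiceLine {
  sku: string;
  quantity: number;
}

interface Invoice {
  id: string;
  lines: InvoiceLine[];
}

// Type guards: runtime checks that also tell tsc what the value is.
function isInvoiceLine(value: unknown): value is InvoiceLine {
  return (
    typeof value === "object" && value !== null &&
    "sku" in value && typeof value.sku === "string" &&
    "quantity" in value && typeof value.quantity === "number"
  );
}

function isInvoice(value: unknown): value is Invoice {
  return (
    typeof value === "object" && value !== null &&
    "id" in value && typeof value.id === "string" &&
    "lines" in value && Array.isArray(value.lines) &&
    value.lines.every(isInvoiceLine)
  );
}

function parseInvoice(json: string): Invoice {
  const data: unknown = JSON.parse(json);
  if (!isInvoice(data)) {
    throw new TypeError("Payload is not an Invoice");
  }
  return data; // narrowed by the guard, no assertion needed
}
```

This is also the shape of the fix for a nested data transformation that tempts an agent toward // @ts-ignore: validate once at the boundary, and everything downstream is typed.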
📄 .github/copilot-instructions.md — GitHub Copilot
Copilot users: create .github/copilot-instructions.md with the same rules above. One source of truth, two files. When a rule changes, update AGENTS.md first. The fallback files are two-minute copy-paste updates, and the consistency is worth those two minutes.
📄 CLAUDE.md — Claude Code fallback
Skip if using AGENTS.md; Claude Code reads it natively. Either way, an instruction file alone isn't enough. The instruction works most of the time. This works all of the time:
⚙️ .claude/settings.json — Claude Code gate
Fires automatically after every file edit. Claude Code can't skip it.
```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write|MultiEdit",
        "hooks": [
          {
            "type": "command",
            "command": "npx tsc --noEmit --incremental"
          }
        ]
      }
    ]
  }
}
```
[!TIP]
The matcher value is a regex pattern: | means OR. It's not a typo for a JSON array. Each matcher entry carries its own hooks array of commands to run. For incremental checking, either keep the --incremental flag in the command or set "incremental": true in tsconfig. If tsconfig already has it, the CLI flag is redundant; drop it. Combining incremental with noEmit requires TypeScript 4.0+.
On our 3,000-file repo, the hook adds ~8 seconds per edit. Without --incremental, engineers disabled it within the first hour. The math on 8 seconds vs 23 minutes of debugging is not close.
⚠️ Disable the hook for greenfield scaffolding. New projects, fresh branches with no existing types. Re-enable once the skeleton is in place.
In CI, cache .tsbuildinfo between runs (on GitHub Actions, point actions/cache at that file path). Without the cache, every CI run starts with no build info, so incremental has nothing to reuse and the check runs cold every time.
📄 .cursor/rules/typescript.mdc — Cursor fallback
Skip if using AGENTS.md. Cursor v0.45+. Older: .cursorrules in repo root.
```markdown
---
description: TypeScript strict mode enforcement
globs: ["**/*.ts", "**/*.tsx"]
alwaysApply: true
---
# Full config: https://github.com/beyondit/typescript-ai-benchmark
- NEVER use `any`. Use `unknown`.
- NEVER use `as unknown as X`. Implement a type guard.
- NEVER suppress errors with `@ts-ignore`.
- ALWAYS run `tsc --noEmit` before marking a task done.
- When a type can't be threaded - stop and ask. Don't assert.
```
The frontmatter has to be the first thing in the file, so the config link sits below it. (On Cursor < v0.45, alwaysApply: true may conflict with globs; use one or the other, not both.)
That's the full stack. Every agent, every format, one source of truth. All config files, task stubs, and both rounds of raw data: github.com/codeverseproo/typescript-ai-benchmark
Fork it. Run your own five tasks. Drop your numbers in the comments.
⚡ Round 2 — Full Config Active
Same 5 tasks. Same 3 agents.
AGENTS.md in root. .claude/settings.json hook live. .cursor/rules/typescript.mdc active.
Here's what actually changed.
Claude Code — Round 2
- T2: Used the correct globalThis singleton pattern from AGENTS.md. The as unknown as is gone. Time increased from 4.6 → 7.2 min: it actually solved the problem instead of asserting around it. Result: 5 for 5. Zero debt. The hook didn't let it cheat.
Cursor Agent — Round 2
- T2: Same globalThis pattern. No as any. Clean. Round 1 Cursor had debt on T1, T2, and T3. Round 2: clean on all five. The config held the rule where the rule alone didn't. Result: 5 for 5. The rules stopped being suggestions.
GitHub Copilot — Round 2
- T2: Correct Prisma pattern applied. 9.3 min: slower, but clean. 3 errors remain on T3 even with config active. And T5? Still couldn't find the race condition. Copilot doesn't work from the full project context, so it can't reason about concurrent state across files. The config helps, but it can't fix a missing capability. Result: The config made Copilot better. It didn't make Copilot Claude Code.
Round 2 Head-to-Head
Slower across the board — because they're actually solving the type problems instead of asserting around them. That's the right trade.
🏗 Monorepo Note
```bash
# Single app:
npx tsc --noEmit --incremental

# Monorepo with project references (Turborepo, Nx, pnpm):
npx tsc --build --noEmit   # type-checks all referenced packages

# ❌ NOT this - --dry only previews what would build:
# npx tsc --build --dry   ← wrong for type checking
```
Project references tsconfig? Use tsc --build --noEmit. Running tsc --noEmit against a root tsconfig that only lists references skips them: zero errors, false reassurance. Requires TypeScript 5.5+. On earlier versions, type-check per package: cd packages/api && npx tsc --noEmit.
🔎 Run This Audit Before Your Next AI Session
```bash
echo "=== Type Escape Audit ==="
echo "as unknown as: $(grep -rn "as unknown as" src/ | wc -l)"
echo ": any:         $(grep -rn --include="*.ts" --include="*.tsx" ": any" src/ | wc -l)"
echo "@ts-ignore:    $(grep -rn "@ts-ignore" src/ | wc -l)"
# Also catches the parenthesised variant: (value as unknown) as X
echo "as unknown:    $(grep -rn "as unknown" src/ | wc -l)"
```
These counts are directional: they include false positives from comments and strings, and the last count overlaps the first. The trend matters more than the exact number. Double digits in an AI-assisted repo means you have a type debt problem worth investigating.
💰 The Cost Argument — For Your Engineering Manager
On complex architectural refactors, Cursor consumed ~188,000 tokens per session. Claude Code consumed ~33,000, roughly 5.7× fewer. For inline completions, the economics reverse: Cursor wins on cost-per-outcome at that task tier.
For BYOK API billing, that's a direct line-item cost argument. For Cursor Business, aggressive Composer use on complex tasks shows up as overage charges at the end of the billing cycle.
The config files aren't just a quality gate. They're a budget gate. For deeper insights into token limits, see our MCP alternative breakdown.
📋 What This Benchmark Doesn't Tell You
Before you @ Anthropic — some honest caveats.
This is five tasks on one specific stack. Next.js frontend. Node.js API. Prisma ORM. Your codebase will vary. The failure patterns won't.
Developers aren't choosing TypeScript despite AI agents. They're choosing it because of them — using tsc as a deterministic check on non-deterministic output. TypeScript just surpassed Python and JavaScript on GitHub because of exactly this shift.
And the feedback loop is about to get much faster. TypeScript 7.0, Project Corsa, is a native port of the compiler to Go. A 45-second tsc check on a massive monorepo drops to 4 seconds. When an AI agent can iterate a compiler loop in 4 seconds, it stops asserting its way out and starts reasoning its way through.
The hook won't be optional forever. Right now, it is.
The answer isn't one tool. It's knowing which task needs which agent — and having the config that keeps all three honest.
tsc --noEmit doesn't lie. Your terminal doesn't clap for effort.
