The CLI-vs-MCP debate is one of the most active conversations in AI engineering right now. The token math is one-sided. The reliability data is one-sided. And yet a lot of vendors are still shipping MCP-first.
This is the case for why Deepline is CLI-first, what we ship for the cases where the CLI is wrong, and why the second wave of agent-native infrastructure will look more like a typed TypeScript SDK than a JSON-RPC server.
What just happened
In late March 2026, a head-to-head benchmark from StackOne compared the official GitHub MCP server against the gh CLI on identical tasks. MCP used 4-32x more tokens per operation. MCP failed where CLI succeeded — reliability sat at roughly 100% for CLI versus 72% for MCP. Same prompts, same agent, dramatically different outcomes.
A week later, Firecrawl published its own benchmark confirming the pattern: MCP cost 4-32x more tokens than the CLI. Same tasks, much higher bills.
Independently, Geoffrey Huntley measured the context cost of the official GitHub MCP server: roughly 55,000 tokens added to the agent's context window, before any work happens, just from the tool definitions. Adding three popular MCP servers (GitHub, Playwright, IDE integration) consumed 143,000 of a 200,000-token context window — 72% of the agent's working memory eaten by tool descriptions it mostly never touched.
The community responded by building MCP-to-CLI converters. The mcp2cli project crossed 1.9K GitHub stars by the end of April, claiming 96-99% token savings on common workflows. Atlassian published an MCP compressor that reduces tool-description overhead by 70-97% without changing how the agent calls tools.
Even Claude Code itself is structured this way. Per a public deep-dive of the Claude Code architecture, Claude Code's native tools are Read, Write, Bash, Grep, and Glob — Bash handles git, npm, and other common shell operations. MCP servers are reserved for specialized capabilities like Playwright (browser automation) where the tool surface is genuinely complex enough to justify the context cost. Not "everything is an MCP." Bash for the common case.
This is not an "MCP is bad" story. MCP is the right answer for some problems. But the default for AI coding agents has flipped — and it flipped fast.
What Anthropic shipped to close the gap
This is the part of the story most write-ups miss: Anthropic has been actively shipping context-efficiency tools for MCP since November 2025. Three matter:
- Code execution with MCP (November 2025): Instead of emitting JSON tool calls one at a time, Claude writes code that calls MCP tools inside a sandboxed container. The model only sees the final result. Anthropic's testing showed a 98.7% token reduction on a representative workflow — from 150,000 tokens down to 2,000.
- Tool Search Tool (November 2025): A new `defer_loading: true` flag lets Claude discover tools on demand instead of loading all definitions upfront (see the sketch after this list). Anthropic reports an 85% token reduction, plus internal MCP-eval accuracy gains from 49% → 74% on Opus 4 and 79.5% → 88.1% on Opus 4.5.
- Programmatic Tool Calling (November 2025 beta): Combines the two — tools are deferred AND invoked from code Claude writes. Anthropic's initial release reported a 37% token reduction.
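Here is a minimal sketch of what the deferred-loading surface looks like from the Anthropic SDK. The `defer_loading` flag is the documented piece; the tool-search type string and the example tool are assumptions for illustration, so check Anthropic's current beta docs before copying this.

```typescript
// Sketch: opting one tool into deferred loading (Anthropic beta surface).
// The tool-search type string and the example tool are illustrative
// assumptions; a `betas` flag may also be required. Check the beta docs.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.beta.messages.create({
  model: "claude-opus-4-5",
  max_tokens: 1024,
  tools: [
    // Claude queries this index on demand instead of reading every schema upfront.
    { type: "tool_search_tool_20251119", name: "tool_search_tool" },
    {
      name: "create_issue",
      description: "Create an issue in a GitHub repository.",
      input_schema: {
        type: "object",
        properties: {
          repo: { type: "string" },
          title: { type: "string" },
        },
        required: ["repo", "title"],
      },
      // This schema stays out of context until Claude searches for it.
      defer_loading: true,
    },
  ],
  messages: [{ role: "user", content: "File an issue about the flaky auth test." }],
});

console.log(response.usage); // compare input tokens with and without deferral
```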
The catch: these features are opt-in. They live in the Anthropic SDK and require explicit `tool_search` or `defer_loading` flags. Most off-the-shelf MCP servers — including the popular GitHub, Playwright, and Slack servers — don't yet emit code-execution-friendly schemas or mark themselves deferred. The Claude Code GitHub issue requesting these betas is still open.
So the practical question hasn't fully flipped. When a developer chooses between giving an agent a CLI vs an MCP server today, the CLI still uses fewer tokens by default. The gap will close as more MCP servers adopt deferred loading and code execution — and informed teams using Anthropic's beta features can already get most of the way there. But "we ship a CLI" remains the simpler ergonomic answer for the next several quarters.
The four primitives an agent can call
Every API exposed to an AI agent has to choose how it presents itself to the model. There are four real options.
Direct API (raw HTTP/JSON)
The agent crafts each call from scratch — auth headers, retry logic, pagination, schema mapping, error handling. This is fine for one-off work, but disastrous for repeated workflows. Every retry is a new reasoning step. Every page of results is a new tool call. The agent burns tokens reinventing what a library should encapsulate.
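To make that concrete, here is a sketch of what the agent ends up hand-rolling for a single paginated read. The endpoint, auth scheme, and response shape are hypothetical; the point is that all of this logic lives in the agent's reasoning loop instead of in a library.

```typescript
// Sketch: what an agent must hand-roll for one "simple" paginated read.
// The endpoint, auth header, and response shape are hypothetical.
async function fetchAllContacts(apiKey: string): Promise<unknown[]> {
  const results: unknown[] = [];
  let cursor: string | null = null;

  while (true) {
    const url = new URL("https://api.example.com/v1/contacts");
    if (cursor) url.searchParams.set("cursor", cursor);

    const res = await fetch(url, {
      headers: { Authorization: `Bearer ${apiKey}` },
    });

    // Rate limits, retries, and error mapping all live in agent-land too.
    if (res.status === 429) {
      await new Promise((r) => setTimeout(r, 1000)); // naive backoff
      continue;
    }
    if (!res.ok) throw new Error(`HTTP ${res.status}`);

    const page = (await res.json()) as { data: unknown[]; next: string | null };
    results.push(...page.data);
    if (!page.next) break;
    cursor = page.next;
  }

  return results;
}
```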
Best for: ad-hoc exploration, one-shot integrations. Worst for: any workflow the agent runs more than twice.
MCP server
A standardized protocol where the agent reads a tool catalog from a server and invokes tools by name with structured payloads. Beautiful in theory: protocol-level uniformity, consistent error semantics, no per-vendor adapter required.
In practice, the tool catalog is the problem. Every tool definition consumes context tokens. Every parameter description, every example, every authentication note. By the time the agent finishes loading tool schemas, a meaningful fraction of its context is already gone — and it has to do this on every session.
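For a sense of scale, this is roughly what a single tool definition looks like once it lands in context. The shape follows MCP's tool schema; the specific tool is illustrative.

```typescript
// Sketch: one MCP tool definition, as the agent's context sees it.
// Shape per the MCP spec; the tool itself is illustrative.
const createPullRequestTool = {
  name: "create_pull_request",
  description:
    "Create a new pull request in a GitHub repository. Requires push " +
    "access. Returns the PR number and URL on success.",
  inputSchema: {
    type: "object",
    properties: {
      owner: { type: "string", description: "Repository owner (user or org)" },
      repo: { type: "string", description: "Repository name" },
      title: { type: "string", description: "Pull request title" },
      head: { type: "string", description: "Branch containing the changes" },
      base: { type: "string", description: "Branch to merge into" },
      body: { type: "string", description: "Markdown body for the PR" },
      draft: { type: "boolean", description: "Open as a draft PR" },
    },
    required: ["owner", "repo", "title", "head", "base"],
  },
};
// Roughly 150-200 tokens for ONE tool. A typical server ships dozens,
// and every one is loaded on every session.
```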
Best for: non-shell agent runtimes (embedded copilots, ChatGPT plugins, certain enterprise environments). Worst for: coding agents that have a Bash tool and just want to run a command.
CLI (used via the agent's Bash/terminal tool)
The agent shells out. The CLI parses arguments, handles auth, runs the operation, prints a result. The agent reads the result as text — the same way it reads git log or gh pr view. Zero tool-definition cost. Composable with awk, jq, grep, and other CLIs. Failures are observable through stderr.
The trade-off is that the agent runtime must support shell execution. Most coding agents do. ChatGPT plugins and certain embedded environments do not.
Best for: Claude Code, Codex, Cursor, Windsurf, Cline, Aider, Plandex, OpenCode — basically every coding agent shipped in the last year. Worst for: runtimes without shell access.
TypeScript SDK (with auto-generated types)
The fourth option, less discussed but underrated. The agent imports a typed SDK and calls functions. TypeScript types and JSDoc become inline documentation in the IDE. Compile-time errors catch bugs before they hit credit-burning runtime calls. When the agent generates application code that will run later, the types are already in scope — no separate tool catalog to load.
This is the Supabase pattern: auto-generated types from the schema, saved into the codebase, reading like canonical documentation to the agent. Once the types are in the codebase, switching SDKs becomes a structural cost — call it the "type lock-in" effect.
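A minimal sketch of that workflow, assuming a hypothetical generated client. `DeeplineClient`, `enrichContact`, and every field name here are placeholders, not a published API.

```typescript
// Sketch of the typed-SDK pattern. DeeplineClient and all field names
// are hypothetical placeholders, not a published API.
import { DeeplineClient } from "./deepline.generated"; // auto-generated types

const client = new DeeplineClient({ apiKey: process.env.DEEPLINE_API_KEY! });

const result = await client.enrichContact({
  email: "jane@example.com",
  fields: ["title", "company", "linkedin"],
});

// A misspelled method or field is a compile-time error in the editor,
// not a runtime 400 after a billed request.
console.log(result.fields);
```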
Best for: an agent generating production application code that will run later. Worst for: exploratory shell work or one-off scripts.
Why we shipped CLI first
The audit was simple. Our buyer is a GTM engineer or RevOps operator using Claude Code, Codex, Cursor, or Windsurf — all CLI-capable runtimes. Our value proposition is "one command does what would be 6 provider SDK calls plus retry logic plus dedup logic." The cheapest way to deliver that value is the cheapest way for the agent to call us.
We measured. A typical Deepline waterfall via our CLI consumes a few hundred tokens of context — the command syntax the agent already knows. The same waterfall via a hypothetical MCP server would have added thousands of tokens of tool definitions before the first call.
So we shipped CLI first. Then we shipped the Claude Code skill bundle — but the skill teaches Claude about Deepline workflows, it does not inject MCP-style tool schemas into the context window. The agent still reaches the CLI through Bash. Token cost stays low.
Why we still ship MCP
Our MCP server exists for the cases where the CLI is wrong:
- Embedded copilots that wrap Claude or GPT in a UI layer without shell access.
- Certain ChatGPT plugin integrations that cannot invoke arbitrary commands.
- Enterprise environments with security policies that disable shell execution but allow signed MCP tool catalogs.
The MCP server exposes a small, intentional set of operations — not every Deepline tool, just the highest-leverage ones. We do not want to win the "most MCP tools" leaderboard. We want to win "the MCP server that does not blow up your context window."
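As a sketch of what "small and intentional" means, here is the shape of a deliberately minimal server, written with the public TypeScript MCP SDK (`@modelcontextprotocol/sdk`). The tool name and payload are illustrative, not our actual catalog.

```typescript
// Sketch: a deliberately small MCP server using the public TypeScript SDK.
// The tool name and payload are illustrative, not Deepline's actual catalog.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "deepline", version: "1.0.0" });

// One high-leverage operation, not the whole surface area.
server.tool(
  "enrich_contact",
  "Run a waterfall enrichment for a single contact.",
  { email: z.string().email() },
  async ({ email }) => {
    const enriched = await runWaterfall(email); // placeholder for the real call
    return { content: [{ type: "text", text: JSON.stringify(enriched) }] };
  }
);

await server.connect(new StdioServerTransport());

// Placeholder so the sketch typechecks; the real server calls Deepline's API.
async function runWaterfall(email: string): Promise<object> {
  return { email, status: "matched" };
}
```

A handful of tools like this costs hundreds of context tokens, not tens of thousands.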
Why the TypeScript SDK is the second wave
The next thing we are shipping is an auto-generated TypeScript SDK — every tool, every payload, every result type, fully typed and JSDoc-documented. Run `deepline gen types typescript` (planned) and get a .ts file with the entire Deepline surface area. Save it into your repo. Now the agent has IDE autocomplete and inline docs for every Deepline call, with compile-time errors before the first runtime credit is spent.
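The generated file might read something like the following. All names and fields are illustrative, since the generator has not shipped yet.

```typescript
// deepline.generated.ts — sketch of the planned output. All names and
// fields are illustrative; the generator has not shipped yet.

/** A single enrichment request routed through the provider waterfall. */
export interface WaterfallRequest {
  /** Contact email to enrich. At least one identifier is required. */
  email?: string;
  /** LinkedIn profile URL, used as a fallback identifier. */
  linkedinUrl?: string;
  /** Fields to resolve, e.g. "title", "company", "phone". */
  fields: string[];
}

/** Result of a waterfall run. `matched` is false when no provider hit. */
export interface WaterfallResult {
  matched: boolean;
  /** Which provider supplied the winning record, if any. */
  provider?: string;
  fields: Record<string, string>;
  /** Credits consumed; zero when matched is false (pay-on-match). */
  creditsUsed: number;
}

export declare function runWaterfall(
  req: WaterfallRequest
): Promise<WaterfallResult>;
```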
The Supabase pattern proves this works at scale. Once the types are in the codebase, the agent stops guessing — and the cost of switching providers becomes real architectural work, not a one-line change. That is what we want for the agent-generated app code path.
What this means for you
If you are choosing between Deepline and a competitor right now, the question is not "do they have an MCP server." The question is "what is the cheapest path for my agent to call them and get the right answer."
For coding agents in 2026, that is the CLI. For app developers building agent-driven services, that will increasingly be a typed SDK. MCP is the fallback for runtimes that cannot do either.
Deepline ships all three because each has a real audience. But the canonical interface — the one we measure ourselves against — is the CLI.
Install in 30 seconds:
```bash
curl -s "https://code.deepline.com/api/v2/cli/install" | bash
deepline auth register
```
Then run a waterfall. Tell your agent what you want in plain English. Read the result.
That is the whole pitch.
Frequently asked questions
Why is a CLI better than an MCP server for AI agents? Recent benchmarks (StackOne, Firecrawl, 2026) show MCP servers use 4-32x more tokens than equivalent CLI calls for the same task. MCP tool definitions consume context window before any work begins — the official GitHub MCP server alone adds ~55K tokens. CLIs add zero tool-definition cost because the agent already knows how to use Bash.
Does Deepline have an MCP server? Yes. Deepline ships an MCP server for agent runtimes that cannot shell out (e.g., embedded copilots, certain ChatGPT integrations). For coding agents like Claude Code, Codex, Cursor, and Windsurf — all of which can call Bash — the CLI is the recommended primitive.
When should I use the TypeScript SDK instead of the CLI? Use the SDK when an agent is generating application code that will run later in production. Auto-generated TypeScript types provide IDE autocomplete and JSDoc inline documentation that catch errors at compile time. Use the CLI for exploratory shell work, ad-hoc enrichment, and workflow scripts.
What does Anthropic do internally with Claude Code? Per a public deep-dive of the Claude Code architecture (alexop.dev), Claude Code's native tools are Read, Write, Bash, Grep, and Glob. Bash handles git, npm, and common shell operations. MCP servers are reserved for specialized capabilities like Playwright (browser automation) where the tool surface is genuinely complex enough to justify the context cost.
What is the "lethal trifecta" in MCP security? Simon Willison coined "lethal trifecta" (June 2025) to describe agent configurations with all three of: access to private data, exposure to untrusted content, and the ability to communicate externally. This is a property of the agent runtime, not MCP specifically — but CVE-2025-6514 (CVSS 9.6) in mcp-remote, an npm package with 437,000+ downloads, showed how malicious MCP servers can achieve full remote code execution by exploiting the OAuth authorization_endpoint URL.
Why does Deepline ship CLI, SDK, MCP, AND REST? Each interface serves a different audience. CLI for coding agents. SDK for app developers generating production code. MCP for non-shell agent runtimes. REST for direct integrations with existing services. All four share identical semantics — the same waterfall returns the same fields regardless of how it was called.
How much does Deepline cost? Free with Bring Your Own Key (BYOK) — pay providers directly with zero Deepline markup. Managed credits are pay-on-match (no result, no charge) with transparent per-operation pricing. See pricing.
What if I am not using a coding agent yet? The CLI works standalone. You can run Deepline from any terminal — no Claude Code, no Cursor, no agent required. Most teams adopt the CLI first and add an agent later when they want plain-English workflow definition.
Skip the MCP context tax — use the CLI
Install Deepline in 30 seconds. Claude Code, Codex, Cursor, and Windsurf can run waterfall enrichment from a single Bash invocation, no MCP overhead.