9Router: Free AI Coding With Auto-Fallback and 40% Token Savings


TL;DR

  • What it solves: Rate limits and quota exhaustion cutting you off mid-session, even on $20–200/month AI subscriptions
  • Why it matters: Your tools don’t know what happened - you just stop getting responses and have to switch providers manually
  • Best for: Developers running Claude Code, Cursor, Cline, Codex, or any OpenAI-compatible CLI tool who want zero-downtime AI access
  • Main differentiator: RTK Token Saver compresses git diff, grep, and ls output before it is sent to the LLM - 20–40% fewer tokens, same context, same answer
  • Start here: npm install -g 9router → dashboard at localhost:20128 → connect Kiro AI → code for free

The error showed up on a Friday afternoon: rate_limit_exceeded. I had $20 of Claude Pro credit. I’d used it in two days.

I switched to OpenAI. That one hit its daily limit by Tuesday.

By Wednesday I had three browser tabs open - one per provider - and a mental map of which one still had quota. Switching between them meant changing the API key and base URL in my tool’s settings. Not difficult. Just friction at exactly the wrong moment, every time.

What 9Router Is

9Router is a local HTTP proxy that sits between your AI coding tool and your AI provider. You install it once, point your tool at localhost:20128/v1, and 9Router handles the rest: routing, fallback, format translation, and token compression.

One sentence: 9Router acts as a smart gateway between your coding tools and 40+ AI providers, automatically falling back to cheaper or free alternatives when quotas run out - and compressing token-heavy tool outputs before they ever reach the LLM.

The architecture is straightforward:

Claude Code / Cursor / Cline / Codex / Antigravity

         ↓  http://localhost:20128/v1
   ┌─────────────────────────────────────┐
   │  9Router                            │
   │  • RTK token compression            │
   │  • Format translation               │
   │  • Quota tracking                   │
   │  • Auto token refresh               │
   └─────┬───────────────────────────────┘

         ├─→ Tier 1: SUBSCRIPTION  (Claude Pro, Codex, Copilot)
         │          ↓ quota hit
         ├─→ Tier 2: CHEAP         (GLM $0.6/1M, MiniMax $0.2/1M)
         │          ↓ budget limit
         └─→ Tier 3: FREE          (Kiro, OpenCode Free, Vertex $300 credits)

Your tool thinks it’s talking to one provider. It’s actually talking to a tiered fallback stack.
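
To make the tier walk concrete, here is a minimal sketch of the selection logic the diagram implies. It is not 9Router's actual code: the TIERS table, the provider ids, and the pickProvider helper are all hypothetical names I made up for illustration, and real quota tracking is certainly more involved than a set of exhausted ids.

```javascript
// Hypothetical tier table, ordered as in the diagram: subscription, cheap, free.
const TIERS = [
  { name: "subscription", providers: ["claude-pro", "codex", "copilot"] },
  { name: "cheap", providers: ["glm", "minimax"] },
  { name: "free", providers: ["kiro", "opencode-free"] },
];

// Walks the tiers in order and returns the first provider that still has quota.
// `exhausted` is a Set of provider ids whose quota or budget has run out.
function pickProvider(tiers, exhausted) {
  for (const tier of tiers) {
    for (const provider of tier.providers) {
      if (!exhausted.has(provider)) {
        return { tier: tier.name, provider };
      }
    }
  }
  return null; // every provider in every tier is exhausted
}

// All subscriptions are out of quota, so the cheap tier takes over.
const outOfQuota = new Set(["claude-pro", "codex", "copilot"]);
console.log(pickProvider(TIERS, outOfQuota));
// { tier: 'cheap', provider: 'glm' }
```

The point of the ordering is that paid quota you already own is used up first, and the free tier is only a last resort.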

RTK Token Saver - The Part That Surprises People

When Claude Code or Cline runs a tool - git diff, grep, find, ls, a log dump - the full output goes back to the LLM as a tool_result. For a moderately complex codebase, that’s often 20,000–50,000 tokens just in tool noise.

RTK intercepts every tool_result before it reaches the provider. It detects the content type and applies the right lossless compression filter: diff compression, dedup-log, smart-truncate, search-list collapse. If the filter fails or makes the output larger, it silently keeps the original. Your request never breaks.

Without RTK: 47,000 tokens sent to LLM
With RTK:    28,000 tokens sent to LLM   (40% saved · same context · same answer)

On a Claude Pro subscription with 5-hour quota windows, that difference often means finishing the session instead of hitting the wall.
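
The "keep the original if the filter fails or makes things larger" rule is easy to sketch. What follows is my own illustration, not RTK's source: dedupLog is a toy stand-in for the dedup-log filter, and compressSafely is a hypothetical wrapper name.

```javascript
// Toy dedup filter: collapses runs of identical consecutive lines into "line  (xN)".
function dedupLog(text) {
  const out = [];
  let prev = null;
  let count = 0;
  const flush = () => {
    if (prev === null) return;
    out.push(count > 1 ? `${prev}  (x${count})` : prev);
  };
  for (const line of text.split("\n")) {
    if (line === prev) {
      count += 1;
    } else {
      flush();
      prev = line;
      count = 1;
    }
  }
  flush();
  return out.join("\n");
}

// Applies a filter, but falls back to the original text if the filter
// throws or does not actually make the text shorter.
function compressSafely(text, filter) {
  try {
    const result = filter(text);
    return result.length < text.length ? result : text;
  } catch {
    return text;
  }
}

const log = "ERROR timeout\nERROR timeout\nERROR timeout\ndone";
console.log(compressSafely(log, dedupLog));
// ERROR timeout  (x3)
// done
```

For the figures above, the arithmetic is (47,000 - 28,000) / 47,000 ≈ 0.40, which is where the 40% comes from.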

Getting Started

The three-minute path:

npm install -g 9router
9router
# Dashboard opens at http://localhost:20128

In the dashboard: Providers → Connect Kiro AI (free, unlimited Claude Sonnet 4.5 - just OAuth) or OpenCode Free (no auth at all, model list auto-fetches).

Then in your CLI tool settings:

Endpoint:  http://localhost:20128/v1
API Key:   [copy from 9Router dashboard]
Model:     kr/claude-sonnet-4.5   # Kiro AI
           or
           oc/...                 # OpenCode Free models

That’s it. No subscriptions needed to start.
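
If you want to check the endpoint without a coding tool in the loop, a request can be sketched by hand. This assumes 9Router exposes the usual OpenAI-style /chat/completions route under /v1 and accepts the dashboard key as a Bearer token; I am inferring both from "OpenAI-compatible", so treat them as assumptions. buildChatRequest and ask are hypothetical helper names, and the global fetch needs Node 18 or newer.

```javascript
const BASE_URL = "http://localhost:20128/v1";

// Builds an OpenAI-style chat completion request (assumed route and auth scheme).
function buildChatRequest(apiKey, model, prompt) {
  return {
    url: `${BASE_URL}/chat/completions`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Sends the request and returns the first choice's text.
// Only works while 9Router is running locally.
async function ask(apiKey, model, prompt) {
  const { url, options } = buildChatRequest(apiKey, model, prompt);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`9Router returned HTTP ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}

// Usage, with the key copied from the dashboard:
// ask("YOUR_9ROUTER_KEY", "kr/claude-sonnet-4.5", "Say hello").then(console.log);
```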

Or run from source (Docker):

git clone https://github.com/decolua/9router.git
cd 9router
docker compose up -d

Provider Map

Free (zero cost):

Provider        Models                              Auth
Kiro AI         Claude Sonnet 4.5, GLM-5, MiniMax   OAuth (one-time)
OpenCode Free   Auto-fetched list                   None
Vertex AI       Gemini 3 Pro, DeepSeek, GLM-5       GCP account ($300 free credits)

Cheap backup:

Provider            Price             Notes
GLM-5.1 / GLM-4.7   $0.60/1M tokens   Daily reset 10AM
MiniMax M2.7        $0.20/1M tokens   Cheapest option
Kimi K2.5           $9/month flat     10M tokens/month, predictable

Subscriptions you already have: Claude Code Pro/Max, Codex Plus/Pro, GitHub Copilot, Cursor IDE - all work via OAuth. 9Router maximizes their quotas before touching cheaper tiers.

What Gets Harder

Free tier providers change without notice. Kiro AI and OpenCode Free are current; iFlow, Qwen, and Gemini CLI free tiers were discontinued in early 2026. If a provider closes its free access, the fallback chain degrades until you update your config. The README keeps a current list - check it before trusting any specific free provider long-term.

The dashboard “cost” display is a savings tracker, not an actual bill. 9Router never charges. But that number can be misleading until you understand it: “$290 cost” on the dashboard means $290 you didn’t spend, not $290 owed.

Caveman Mode - the optional feature that rewrites LLM prompts into terse caveman-speak to cut output tokens by up to 65% - works, but it changes the character of responses. Fine for code. Poor for anything where the explanation matters.

Cloud Sync stores your provider configs across devices. That means API keys and OAuth tokens transit a 9Router cloud service. For most users that’s fine; for security-sensitive environments, self-host the Docker image and disable sync.

Under the Hood: What’s Actually There

People have raised legitimate questions about 9Router’s transparency. I read the source. Here is what is verifiable and what warrants caution - no marketing, no FUD.

What is genuinely open. The routing logic, RTK compression, fallback tiers, DB schema, and per-provider interceptors are all readable on GitHub. The code does what it says.

MITM with a system root CA - this is real, and it is not prominently disclosed.

For OAuth subscription providers (Copilot, Cursor, Kiro, Antigravity), 9Router does not act as a simple HTTP proxy. It runs a full MITM (man-in-the-middle) HTTPS server in src/mitm/. On first run it generates a local root CA certificate and installs it into your system’s trust store. On Windows it needs elevated privileges (winElevated.js) to do so. After that, it intercepts and decrypts outgoing HTTPS calls from those tools - reads the request, replaces the model, then re-routes to your configured provider.

This is how it transparently redirects traffic from tools that don’t support custom endpoints. It works. But it means 9Router has the technical ability to read every request and response going to Copilot, Cursor, Kiro, and Antigravity - including your code context and conversation history - as cleartext on your machine.

There is no evidence in the source that this content is logged or exfiltrated. But installing a root CA exposes you to any future vulnerability in 9Router’s CA key management. You are making a trust decision, and the README does not make it prominent.

The npm package cannot be audited against the source.

The package.json in the repo sets "private": true and "name": "9router-app". npm refuses to publish a package marked private, and the name does not match 9router, so the npm-published 9router package is not produced directly from this manifest: the GitHub source and the npm package are separately maintained. What you install with npm install -g 9router cannot be directly verified against what you see on GitHub. For a tool that intercepts HTTPS traffic and stores OAuth tokens, this gap matters.

HOST_REWRITE routes through a Google internal endpoint.

The code in src/mitm/server.js contains:

const HOST_REWRITE = {
  "cloudcode-pa.googleapis.com": "daily-cloudcode-pa.googleapis.com",
};

daily-cloudcode-pa is a Google developer/staging endpoint, not a production API. 9Router rewrites Antigravity requests to this endpoint to bypass production rate limits. This is likely a violation of Google’s terms of service and may stop working without warning if Google restricts it.
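
For readers who have not met this pattern, a rewrite table like this is typically consulted just before the proxy opens its upstream connection. The sketch below shows the general idea only; resolveUpstreamHost is a hypothetical helper of mine, and the way src/mitm/server.js actually applies the table may differ.

```javascript
// The table as quoted from the source.
const HOST_REWRITE = {
  "cloudcode-pa.googleapis.com": "daily-cloudcode-pa.googleapis.com",
};

// Hypothetical helper: returns the host the proxy would connect to for an
// intercepted request. Hosts without an entry pass through unchanged.
function resolveUpstreamHost(host) {
  return Object.prototype.hasOwnProperty.call(HOST_REWRITE, host)
    ? HOST_REWRITE[host]
    : host;
}

console.log(resolveUpstreamHost("cloudcode-pa.googleapis.com"));
// daily-cloudcode-pa.googleapis.com
console.log(resolveUpstreamHost("api.github.com"));
// api.github.com
```

If you audit the MITM code yourself, searching for where HOST_REWRITE is read is a quick way to confirm which requests end up at the rewritten host.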

Cloud Sync sends credentials to 9Router’s servers.

The providerConnections table stores OAuth tokens and API keys. When Cloud Sync is enabled, this data goes to 9Router’s cloud. The README says it uses encryption; the key management details are not visible in the public source. Disable sync if you are handling sensitive credentials.

The bottom line. 9Router is not malware in the traditional sense. The routing, compression, and fallback features work as documented. But it is not the “simple local proxy” it presents itself as. It installs a system root CA, routes through an unofficial Google endpoint, and syncs credentials to a cloud service whose internals are not auditable from the public source. If you choose to use it, run it in Docker with Cloud Sync disabled, and audit the CA it installs into your trust store.

The Mental Model

Think of 9Router as a circuit breaker for your AI tool spend.

When quota runs out, normal behaviour is hard stop: the tool errors, you notice, you manually switch providers, you lose your train of thought. 9Router makes the failure silent and automatic. The next tier picks up. You keep typing.

The RTK compression is orthogonal to the fallback story - it reduces how fast you burn through any tier. The two features together mean you spend less money per session and hit limits less frequently.

The repo has picked up 10,731 stars and 1,652 forks in the four months since January 2026. The trajectory suggests the pain point is real and widely shared.

decolua/9router · MIT · JavaScript · 9router.com

Hoang Yell


A software developer and technical storyteller. I spend my time exploring the most interesting open-source repositories on GitHub and presenting them as accessible stories for everyone.