Token Scout
Open source · MIT license · Rust CLI + MCP server

Free and cheap LLM inference.
Found live. Routed safely.

Free LLM models exist right now. Tomorrow the list will be different. Token Scout discovers what's free and cheap in real time — across OpenRouter, Groq, Cerebras, and your local Ollama GPUs — so your agent always routes to the cheapest safe option.

Dynamic pricing. Compatibility filtering. Cost ceiling. One tool you wire in once — your agent does the rest.

View on GitHub

Stop paying for inference your agent can get for free.

Find what's free — right now, not last week
token_scout(query="reasoning code")
→ live results: free and cheap models matching your query, with pricing
Results change as providers add, remove, and reprice models.
Set a cost ceiling — never overspend
TOKEN_SCOUT_MAX_COST=0     # free models only
TOKEN_SCOUT_MAX_COST=0.001 # free + cheap (~$1/M tokens)
Your agent can't accidentally route to a $15/M model.
Route safely — compatibility filtering prevents broken calls
token_scout(query="reasoning", require={"reasoning_format": "inline_tags", "min_context": 32000})
→ only models that match your constraints. No context clipping. No tag corruption.
Your local GPUs count too
token_scout(query="fast", prefer="speed")
→ fastest available models across cloud providers and your local GPUs
Ollama instances on your network are discovered automatically. Zero cost.

Three discovery layers. One query.

Token Scout combines cloud, local, and fallback discovery — every query reflects what's available right now.

Layer What it discovers How
OpenRouter Live Hundreds of models with real-time pricing. Free tiers come and go — Token Scout catches them. GET /api/v1/models — no key needed
Ollama Constellation Every model on your local network. Free, unlimited. Probes configured hosts via /api/tags
Static Fallback Groq, Cerebras, Mistral, GitHub, Google — curated free tiers. Always available, even offline

Free models on OpenRouter change hourly. Token Scout catches them as they appear and disappear.

Five minutes to first query

Install (Rust CLI or Python MCP)
$ git clone https://github.com/jackccrawford/token-scout.git
$ cd token-scout
$ cargo build --release   # Rust (recommended)
$ pip install -e .        # or Python MCP server
Add your keys (whichever providers you have)
$ export GROQ_API_KEY=gsk_...
$ export OPENROUTER_API_KEY=sk-or-...
$ export GITHUB_TOKEN=ghp_...
Add to Claude Code
$ claude mcp add token-scout -- token-scout
Or add to Claude Desktop
// claude_desktop_config.json
  "mcpServers":
    "token-scout":
      "command": "token-scout"
    
  

Why not just search OpenRouter yourself?

You can. But your AI agent can't browse a website. Token Scout gives your agent the same market awareness you have — what's free, what's cheap, what just appeared, what disappeared since yesterday. Plus three things OpenRouter's UI doesn't tell you:

Compatibility prevents three common failures
Broken tool calls
Anthropic, OpenAI, and Ollama handle function calling differently. Token Scout profiles the format so your agent never routes to an incompatible model.
Context clipping
Sending 200K tokens to a 32K model isn't graceful degradation — it's data loss. The min_context filter prevents it.
Reasoning tag corruption
Claude, DeepSeek R1, and Qwen3 all expose thinking differently. Mixing formats mid-workflow corrupts the session. Token Scout profiles reasoning_format so you match correctly.

The free tier changes every hour.

A model that's free today gets priced tomorrow. A new model drops with a 48-hour promotional window. A provider quietly adds a trillion-parameter model to their free tier. If your agent is hardcoding model IDs or reading from a static config, it's missing these windows entirely.

Token Scout checks the market on every query. Your agent always routes to the cheapest option that's compatible with the task. The savings compound — 50 subagent calls a day on free models instead of paid ones adds up fast.

Your agent is paying retail.
It doesn't have to.

One tool. Five minutes. Free and cheap inference, found live.

MIT License · Rust + Python · Works with Claude Code, OpenClaw, LangChain, CrewAI & any MCP client