模型成本路由
AI 模型价格计算器
搜索并比较 300+ 公共 AI 模型价格,按请求量、输入 token、输出 token、缓存输入和联网搜索成本估算每月预算。
中文说明
这页适合在接入官方 API、模型 Router 或中转站之前做预算。先用真实请求量估算成本,再决定哪些任务给旗舰模型,哪些任务可以路由到便宜模型或缓存。
第一版中文页保留部分英文 API 字段、模型名和表单标签,方便和官方文档、价格表、开发工具配置项对应。计算结果只做预算和选型参考,最终价格、限额和条款以官方后台或服务商当前公开说明为准。
Searchable directory
Browse 300+ public AI model prices
This directory uses a local snapshot of the open-source models.dev database captured on May 23, 2026. It includes 59 providers and 364 matched. The table shows the first 80 shown rows after filtering so the page stays fast on mobile.
| Model | Provider | Input / 1M | Output / 1M | Cached input / 1M | Context | Modality |
|---|
How to compare model prices without fooling yourself
AI model pricing is not just one number. A model with cheap input tokens can still be expensive if it produces long outputs. A model with expensive output can be efficient if it solves the task in one call and avoids retries. A model with a large context window can reduce engineering work, but long-context requests may quietly turn a cheap prototype into a large monthly bill. This calculator therefore asks for request count, input tokens, output tokens, cached input share, and optional web-search calls instead of asking for a vague monthly budget.
The first decision is model class. Use flagship reasoning models when the task needs deep planning, high-quality code review, hard debugging, complex tool use, or decisions that affect revenue. Use mid-tier models for normal chat, summarization, extraction, support drafts, and agent steps that need decent reasoning but not top-tier judgment. Use small or open models for routing, classification, simple rewriting, safety pre-checks, and bulk background jobs. The cheapest reliable architecture is usually a router: a small model handles easy traffic and escalates hard cases to a premium model.
The second decision is output control. Many teams underestimate output tokens because they only think about the user's prompt. In real products, output includes explanations, JSON, tool calls, code, tests, citations, and retries. A task that asks for a long answer can be more expensive than a task with a long input and a short structured output. If your app can use concise JSON, short bullet points, or a two-step summarize-then-expand flow, you can often reduce monthly cost without changing providers.
The third decision is caching. OpenAI, Anthropic, Google, and several routing providers expose cached input or prompt-cache pricing for some models. Caching helps when system prompts, schemas, tools, style guides, project instructions, or large reference documents repeat across many requests. It does not help much when every request is unique. Do not put random user text into the stable cached block. Keep the reusable context separate from user-specific content so the provider can actually reuse it.
Coverage and limitations
The broad directory is generated from models.dev, which tracks model IDs, context windows, modalities, and token prices across many public providers and OpenRouter-style endpoints. That gives this page much wider coverage than a hand-written table. At the same time, it is still a snapshot. Some providers change prices, rename models, add preview versions, remove endpoints, or apply regional and plan-specific terms. The page should be used as a fast comparison surface, not as a contract.
For production, always verify three things directly with the provider: whether the model is available in your account region, whether the listed price applies to your plan or endpoint, and whether extra charges apply for web search, code execution, audio, image generation, video, storage, fine-tuning, batch processing, or priority latency. If a provider bills through a router, compare the router price with the first-party API price. Routers can simplify integration and failover, but they may include markups, routing behavior, or availability differences.
Common pricing patterns by provider
OpenAI models usually make a clear distinction between input, cached input, and output tokens, with premium models carrying much higher output prices. Anthropic Claude models add prompt cache write and read behavior for some models, which matters for agent workflows with repeated project context. Google Gemini pricing often varies by model, mode, cache, grounding, and media type. xAI Grok models are notable for large context windows on some endpoints and separate web-search pricing in some data. DeepSeek and Qwen models often offer very low token prices, which can make them attractive for high-volume background work if quality is sufficient. Mistral, Cohere, Perplexity, Meta Llama, MiniMax, Moonshot, Z.ai, NVIDIA, Amazon, and many open or hosted models can be useful in specialized niches, especially when you care about language, deployment region, open-weight availability, search, or latency.
The practical rule is simple: do not pick a model only by benchmark or only by price. For each workload, run a small eval set with real prompts, measure retry rate, answer length, latency, and human acceptance. A model that is half the price but needs twice as many retries is not cheaper. A premium model that eliminates manual review may be cheaper in business terms even if the API line item is higher.
Build a routing plan from the calculator
- Use the calculator with your expected monthly request volume.
- Sort the directory by provider and search for the model family you already trust.
- Compare the cheapest scenario list, then remove models that do not support the modality or context window you need.
- Test three candidate models on the same prompt set: one small, one mid-tier, one flagship.
- Record cost per accepted answer, not just cost per request.
- Route easy requests to the cheapest acceptable model and escalate failures to the stronger model.
FAQ
Does this page really include every model on the market?
No page can guarantee every private, regional, enterprise, preview, or recently renamed model. This page includes a broad public snapshot from models.dev with hundreds of model entries and many providers, then links official pricing pages for verification.
Why are some models listed as free?
Some public routing datasets include free tiers, promotional endpoints, or zero-priced open model routes. Treat those as discovery signals. Production availability, rate limits, and acceptable use terms still need provider verification.
Should I use the cheapest model in the result list?
Only after testing quality. The cheapest model is a candidate, not a recommendation. Run real prompts, measure retry rate, and compare total cost per accepted output.
How often should this page be refreshed?
For active model pricing work, refresh weekly. For production routing, check official pricing before any launch, budget change, or provider migration.
Sources
- models.dev open-source model database
- OpenAI API pricing
- Anthropic Claude pricing
- Google Gemini API pricing
- Mistral AI documentation
- Cohere model documentation
- DeepSeek API pricing
- xAI model documentation
- Perplexity API documentation
- GPT-5.5 API Pricing Calculator
- Gemini 3.5 Flash Pricing Calculator
AI Code Limits is independent and is not affiliated with OpenAI, Anthropic, Google, xAI, DeepSeek, Qwen, Mistral, Cohere, Perplexity, Meta, or any listed provider. Prices are estimates; verify official pricing before production spend.