Google API pricing

Gemini 3.5 Flash pricing calculator

Estimate Gemini 3.5 Flash cost from request volume, input and output tokens, cache share, processing mode, and Grounded Search call volume.

Checked May 21, 2026

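Notes on the Chinese version

This page is for estimating a budget before you use Gemini Flash for content generation, search augmentation, multimodal processing, or batch jobs, and for comparing the cost of Standard, Batch, Flex, and Priority.

The first version of the Chinese page keeps some English API fields, model names, and form labels so that they map directly to the official documentation, price tables, and developer tool settings. The results are for budgeting and model selection only. Final prices, limits, and terms are whatever the official console or the provider's current public documentation states.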

Why this page exists now

Google I/O 2026 pushed Gemini 3.5 Flash into the center of Google's developer stack. Google describes it as a fast frontier model for agentic workflows, and the Gemini API pricing page now lists Standard, Batch, Flex, and Priority prices. That leaves developers with one practical question: is Gemini 3.5 Flash cheaper than the model they already use?

What the calculator includes

The estimate includes paid-tier input tokens, output tokens including thinking tokens, context cache reads, and grounded search overage. It does not include separate application hosting, database, vector storage, or your own orchestration costs.

Gemini 3.5 Flash prices used here

The calculator uses Google AI Developer pricing checked on May 21, 2026. All prices are in USD per 1M tokens on the paid tier for Gemini 3.5 Flash. Standard is listed at $1.50 input, $9.00 output, and $0.15 cached input. Batch is listed at $0.75 input, $4.50 output, and $0.075 cached input. Flex is listed at $0.75 input, $4.50 output, and $0.08 cached input. Priority is listed at $2.70 input, $16.20 output, and $0.27 cached input.

| Mode     | Input / 1M | Cached input / 1M | Output / 1M | Use it when |
|----------|------------|-------------------|-------------|-------------|
| Standard | $1.50      | $0.15             | $9.00       | You need predictable production latency. |
| Batch    | $0.75      | $0.075            | $4.50       | Offline jobs, enrichment, reports, and evaluations. |
| Flex     | $0.75      | $0.08             | $4.50       | Lower-priority requests that can tolerate variable latency. |
| Priority | $2.70      | $0.27             | $16.20      | Latency-sensitive user-facing paths where speed is worth the premium. |
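The token part of the estimate can be sketched in a few lines of Python. This is a minimal illustration of the arithmetic, not the calculator's actual code: the `PRICES` dictionary copies the rates listed above, while `request_cost`, `monthly_cost`, and the `cache_share` parameter are hypothetical names chosen for this example. It models only fresh input, cached input, and output tokens.

```python
# USD per 1M tokens, copied from the prices listed on this page (checked May 21, 2026).
PRICES = {
    "standard": {"input": 1.50, "cached": 0.15, "output": 9.00},
    "batch": {"input": 0.75, "cached": 0.075, "output": 4.50},
    "flex": {"input": 0.75, "cached": 0.08, "output": 4.50},
    "priority": {"input": 2.70, "cached": 0.27, "output": 16.20},
}


def request_cost(mode, input_tokens, output_tokens, cache_share=0.0):
    """Token cost in USD for one request (hypothetical helper).

    cache_share is the fraction of input tokens billed at the cached rate.
    output_tokens should already include thinking tokens.
    """
    if not 0.0 <= cache_share <= 1.0:
        raise ValueError("cache_share must be between 0 and 1")
    price = PRICES[mode]
    cached = input_tokens * cache_share
    fresh = input_tokens - cached
    total = (
        fresh * price["input"]
        + cached * price["cached"]
        + output_tokens * price["output"]
    )
    return total / 1_000_000


def monthly_cost(mode, requests, input_tokens, output_tokens, cache_share=0.0):
    """Monthly token cost in USD, assuming every request has the same shape."""
    return requests * request_cost(mode, input_tokens, output_tokens, cache_share)


standard = monthly_cost("standard", 100_000, 2_000, 500)
batch = monthly_cost("batch", 100_000, 2_000, 500)
print(f"Standard: ${standard:,.2f}  Batch: ${batch:,.2f}")
```

With 100,000 requests of 2,000 input and 500 output tokens and no caching, this works out to roughly $750 on Standard and $375 on Batch.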

How to use this estimate

Start with the traffic shape you actually expect. A developer tool might have a few thousand large requests per month, while a consumer app might have millions of short requests. Gemini 3.5 Flash has a wide gap between input and output price, so a chatbot with long answers will behave differently from a classifier or router that returns short JSON.
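To see how the input and output price gap plays out, the sketch below compares two made-up request shapes at the Standard rates quoted on this page. The token counts for the "chatbot" and "classifier" cases are illustrative assumptions, not measurements, and `cost_breakdown` is a hypothetical helper.

```python
INPUT_PRICE = 1.50   # USD per 1M input tokens (Standard, as listed on this page)
OUTPUT_PRICE = 9.00  # USD per 1M output tokens (Standard, as listed on this page)


def cost_breakdown(input_tokens, output_tokens):
    """Return (total USD per request, fraction of that cost due to output)."""
    input_cost = input_tokens * INPUT_PRICE / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE / 1_000_000
    total = input_cost + output_cost
    output_share = output_cost / total if total else 0.0
    return total, output_share


# Assumed shapes: a chatbot with long answers, a classifier returning short JSON.
shapes = {"chatbot": (500, 800), "classifier": (1_500, 20)}
for name, (input_tokens, output_tokens) in shapes.items():
    total, share = cost_breakdown(input_tokens, output_tokens)
    print(f"{name}: ${total:.5f} per request, {share:.0%} from output")
```

Under these assumptions about 91% of the chatbot's cost comes from output tokens, against about 7% for the classifier, so trimming answer length matters far more for the first workload than for the second.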

Then set a realistic cache share. Cached input is valuable when the same system prompt, policy, schema, tool list, or project context repeats across requests. If every request is unique, use a low cache share. If you run many requests against the same instruction and reference material, 50% or more can be realistic, but only real usage logs can confirm it.
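The effect of cache share on input cost is a weighted average of the fresh and cached rates. The sketch below uses the Standard prices quoted on this page as defaults; `effective_input_price` is a hypothetical helper, and it assumes cached tokens are billed only at the cached-input rate.

```python
def effective_input_price(cache_share, input_price=1.50, cached_price=0.15):
    """Blended USD price per 1M input tokens for a given cache share (0 to 1)."""
    if not 0.0 <= cache_share <= 1.0:
        raise ValueError("cache_share must be between 0 and 1")
    return (1 - cache_share) * input_price + cache_share * cached_price


for share in (0.0, 0.25, 0.5, 0.75):
    price = effective_input_price(share)
    saving = 1 - price / 1.50
    print(f"cache share {share:.0%}: ${price:.4f} per 1M input tokens ({saving:.1%} saved)")
```

At a 50% cache share the blended Standard input price is about $0.825 per 1M tokens, a 45% reduction on the input side. Output tokens are unaffected, so the saving on the whole bill is smaller for output-heavy workloads.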

For a first production budget, run the calculator three times: one conservative case, one expected case, and one spike case. The spike case should include retries, longer answers, and any grounded search calls that might happen during a launch week. This is especially important for agents because they often create extra intermediate tokens before the user sees the final answer.
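One way to run the three cases side by side is sketched below. It is an illustration under stated assumptions: all scenario numbers are invented, retries are modeled as a flat multiplier on request count, the Standard rates and the grounding allowance come from the figures quoted on this page, and `Scenario` and `monthly_estimate` are hypothetical names.

```python
from dataclasses import dataclass

# Standard rates in USD per 1M tokens, as listed on this page.
INPUT_PRICE, CACHED_PRICE, OUTPUT_PRICE = 1.50, 0.15, 9.00
FREE_SEARCH_QUERIES = 5_000    # monthly allowance quoted on this page (unit assumed to be queries)
SEARCH_PRICE_PER_1000 = 14.00  # USD per 1,000 search queries beyond the allowance


@dataclass
class Scenario:
    requests: int
    input_tokens: int
    output_tokens: int          # include thinking and intermediate agent tokens
    cache_share: float = 0.0
    retry_rate: float = 0.0     # assumption: 0.10 means 10% of requests are repeated
    search_queries: int = 0


def monthly_estimate(s):
    """Monthly USD estimate: token cost (with retries) plus search overage."""
    billed_requests = s.requests * (1 + s.retry_rate)
    cached = s.input_tokens * s.cache_share
    fresh = s.input_tokens - cached
    per_request = (
        fresh * INPUT_PRICE + cached * CACHED_PRICE + s.output_tokens * OUTPUT_PRICE
    ) / 1_000_000
    overage = max(0, s.search_queries - FREE_SEARCH_QUERIES)
    return billed_requests * per_request + overage / 1_000 * SEARCH_PRICE_PER_1000


scenarios = {
    "conservative": Scenario(50_000, 1_500, 300, cache_share=0.5),
    "expected": Scenario(100_000, 2_000, 500, cache_share=0.3, retry_rate=0.02),
    "spike": Scenario(250_000, 2_000, 900, cache_share=0.1, retry_rate=0.10,
                      search_queries=40_000),
}
for name, scenario in scenarios.items():
    print(f"{name}: ${monthly_estimate(scenario):,.2f}")
```

Keeping the three cases in one structure makes it easy to change a single assumption, such as the retry rate, and see how far the spike case moves from the expected case.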

Which mode should you model first?

Use Standard as the default baseline when the request is part of a user-facing product. It gives you a clean reference point before you optimize latency or price. If the Standard estimate is already affordable, you can keep the launch simple and focus on prompt quality, logging, and user experience. If the estimate is too high, then test Batch or Flex for background jobs before changing the product design.

Batch is usually the easiest discount to reason about because it fits scheduled work: nightly enrichment, content refreshes, evaluation runs, and large data cleanup tasks. Flex is more useful when variable latency is acceptable but the job still belongs in the normal API path. Priority should be modeled as a deliberate premium. Put only the requests that directly affect conversion, retention, or support quality into that scenario. Otherwise the blended monthly cost can outweigh the product value.
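The blended cost of a mode mix can be sketched as a weighted sum. The example below assumes every request has the same token shape and ignores caching to keep the comparison simple. The rates are the ones listed on this page, while `blended_monthly_cost` and the traffic split are hypothetical.

```python
# (input, output) USD per 1M tokens for each mode, as listed on this page.
MODE_PRICES = {
    "standard": (1.50, 9.00),
    "batch": (0.75, 4.50),
    "flex": (0.75, 4.50),
    "priority": (2.70, 16.20),
}


def blended_monthly_cost(requests, input_tokens, output_tokens, mix):
    """Monthly USD cost when traffic is split across modes.

    mix maps mode name to its share of requests; shares must sum to 1.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mode shares must sum to 1")
    total = 0.0
    for mode, share in mix.items():
        input_price, output_price = MODE_PRICES[mode]
        per_request = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
        total += requests * share * per_request
    return total


all_standard = blended_monthly_cost(100_000, 2_000, 500, {"standard": 1.0})
mixed = blended_monthly_cost(
    100_000, 2_000, 500, {"standard": 0.6, "batch": 0.3, "priority": 0.1}
)
print(f"All Standard: ${all_standard:,.2f}  Mixed: ${mixed:,.2f}")
```

In this made-up split, moving 30% of traffic to Batch more than pays for putting 10% on Priority: the mix comes to about $697.50 against $750 for all Standard.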

For teams comparing Gemini against other AI APIs, keep the same token assumptions across every calculator page on this site. Change one variable at a time: model, output length, cache share, or search grounding. That makes the result a decision tool instead of a spreadsheet full of unrelated guesses.

Grounding with Google Search can dominate cost

The pricing page says a customer-submitted request to Gemini may result in one or more Google Search queries, and each individual search query can be charged. The published pricing for the Gemini 3 line includes 5,000 grounded prompts or requests per month, then charges $14 per 1,000 search queries. If your product grounds every user request, that extra line item can overtake token costs.
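A rough sketch of the overage arithmetic follows. It rests on two assumptions that the text above does not settle: that the 5,000 monthly allowance is counted in the same unit as billed search queries, and that overage is charged pro rata per query and not in blocks of 1,000. `grounding_cost` is a hypothetical helper.

```python
def grounding_cost(search_queries, free_queries=5_000, price_per_1000=14.00):
    """USD overage for grounded search in one month.

    Assumes the free allowance and the billed unit are both search queries
    and that billing is pro rata. Check the official pricing page for both.
    """
    billable = max(0, search_queries - free_queries)
    return billable / 1_000 * price_per_1000


# 100,000 requests at 2,000 input and 500 output tokens on Standard rates.
token_cost = 100_000 * (2_000 * 1.50 + 500 * 9.00) / 1_000_000
# Assumption: each request triggers exactly one search query.
search_cost = grounding_cost(100_000)
print(f"Tokens: ${token_cost:,.2f}  Grounding: ${search_cost:,.2f}")
```

With one search query per request, grounding comes to about $1,330 against about $750 in token cost for this workload. Requests that fan out into several queries widen the gap further.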

A practical routing pattern is to separate requests that truly need live web grounding from requests that only need model reasoning over your own data. For example, price checks, current events, and local availability may need grounding, but classification, extraction, rewriting, and deterministic workflow steps usually do not. That single product decision can change the monthly bill more than small prompt tweaks.
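The routing idea can be sketched as a lookup on task type. The task labels below are hypothetical names derived from the examples in the paragraph above. Treating unknown task types as ungrounded is a design assumption, so a real product might prefer to fail loudly on them.

```python
# Task types that plausibly need live web data, taken from the examples above.
NEEDS_GROUNDING = {"price_check", "current_events", "local_availability"}


def needs_grounding(task_type):
    """True if the request should be sent with Google Search grounding.

    Assumption: anything not explicitly listed runs without grounding.
    """
    return task_type in NEEDS_GROUNDING


def grounded_request_count(task_counts):
    """Number of requests per month that would be grounded."""
    return sum(count for task, count in task_counts.items() if needs_grounding(task))


# Invented monthly traffic by task type.
traffic = {
    "price_check": 8_000,
    "current_events": 2_000,
    "classification": 60_000,
    "extraction": 30_000,
}
grounded = grounded_request_count(traffic)
total = sum(traffic.values())
print(f"{grounded:,} of {total:,} requests grounded ({grounded / total:.0%})")
```

In this invented traffic mix only 10,000 of 100,000 requests are grounded, which keeps the search line item small compared with grounding every request.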

FAQ

Is Gemini 3.5 Flash cheaper than GPT-5.5?

For many high-volume text workloads, yes, based on the public API rates checked here. The right comparison still depends on quality, retries, output length, and whether you need Search grounding or Priority latency.

Should I use Batch or Flex for production?

Use them for work that does not block the user. If a request is part of an interactive flow, Standard is a safer baseline. Priority should be reserved for latency-sensitive paths.

Is this an official Google calculator?

No. It is an independent planning tool that uses public Google pricing. Always verify final prices on the official Gemini API pricing page before committing to production spend.

AI Code Limits is independent and is not affiliated with Google, Gemini, or Google AI Studio.