Gemini 3.5 Flash prices used here
The calculator uses Google AI Developer pricing checked on May 21, 2026. All prices are in USD per 1M tokens. The paid tier for Gemini 3.5 Flash lists Standard at $1.50 input, $9.00 output, and $0.15 cached input. Batch is listed at $0.75 input, $4.50 output, and $0.075 cached input. Flex is listed at $0.75 input, $4.50 output, and $0.08 cached input. Priority is listed at $2.70 input, $16.20 output, and $0.27 cached input.
| Mode | Input / 1M | Cached input / 1M | Output / 1M | Use it when |
| --- | --- | --- | --- | --- |
| Standard | $1.50 | $0.15 | $9.00 | You need predictable production latency. |
| Batch | $0.75 | $0.075 | $4.50 | Offline jobs, enrichment, reports, and evaluations. |
| Flex | $0.75 | $0.08 | $4.50 | Lower-priority requests that can tolerate variable latency. |
| Priority | $2.70 | $0.27 | $16.20 | Latency-sensitive user-facing paths where speed is worth the premium. |
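The listed rates can be turned into a rough monthly estimate with a few lines of arithmetic. The sketch below is not the calculator's actual code. The function name `monthly_token_cost` and the example workload are illustrative assumptions, and it ignores grounding charges and any cache storage fees that may apply.

```python
# Prices in USD per 1M tokens, copied from the pricing table (checked May 21, 2026).
RATES = {
    "standard": {"input": 1.50, "cached": 0.15, "output": 9.00},
    "batch": {"input": 0.75, "cached": 0.075, "output": 4.50},
    "flex": {"input": 0.75, "cached": 0.08, "output": 4.50},
    "priority": {"input": 2.70, "cached": 0.27, "output": 16.20},
}


def monthly_token_cost(mode, requests, input_tokens, output_tokens, cache_share=0.0):
    """Estimate monthly token spend in USD for one mode.

    cache_share is the fraction of input tokens billed at the cached rate.
    Grounding and cache storage are deliberately left out of this sketch.
    """
    if not 0.0 <= cache_share <= 1.0:
        raise ValueError("cache_share must be between 0 and 1")
    rate = RATES[mode]
    total_input = requests * input_tokens
    total_output = requests * output_tokens
    cached = total_input * cache_share
    fresh = total_input - cached
    usd = fresh * rate["input"] + cached * rate["cached"] + total_output * rate["output"]
    return usd / 1_000_000


# Hypothetical workload: 100,000 requests, 2,000 input and 500 output tokens each.
for mode in RATES:
    cost = monthly_token_cost(mode, 100_000, 2_000, 500)
    print(f"{mode:>8}: ${cost:,.2f}")
```

For this hypothetical workload with no caching, Standard comes to about $750 per month and Batch to about $375.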
How to use this estimate
Start with the traffic shape you actually expect. A developer tool might have a few thousand large requests per month, while a consumer app might have millions of short requests. Gemini 3.5 Flash has a wide gap between input and output price: at Standard rates, output costs six times as much per token as input ($9.00 versus $1.50 per 1M). A chatbot with long answers will typically be dominated by output cost, while a classifier or router that returns short JSON is usually dominated by input cost.
Then set a realistic cache share. Cached input is valuable when the same system prompt, policy, schema, tool list, or project context repeats across requests. If every request is unique, use a low cache share. If you run many requests against the same instruction and reference material, 50% or more can be realistic, but only real usage logs can confirm it.
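To see how much the cache share matters, it can help to compute the effective input price per 1M tokens at a few different shares. This is a minimal sketch using the Standard rates listed on this page. The helper name `blended_input_rate` is an illustrative assumption, and the shares are placeholders until real usage logs are available.

```python
def blended_input_rate(input_rate, cached_rate, cache_share):
    """Effective USD price per 1M input tokens at a given cache share (0 to 1)."""
    if not 0.0 <= cache_share <= 1.0:
        raise ValueError("cache_share must be between 0 and 1")
    return (1 - cache_share) * input_rate + cache_share * cached_rate


# Standard rates from this page: $1.50 input, $0.15 cached input per 1M tokens.
for share in (0.0, 0.25, 0.5, 0.8):
    rate = blended_input_rate(1.50, 0.15, share)
    print(f"cache share {share:.0%}: ${rate:.3f} per 1M input tokens")
```

At a 50% cache share the effective Standard input price is about $0.825 per 1M tokens, a little over half the uncached rate.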
For a first production budget, run the calculator three times: one conservative case, one expected case, and one spike case. The spike case should include retries, longer answers, and any grounded search calls that might happen during a launch week. This is especially important for agents because they often create extra intermediate tokens before the user sees the final answer.
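The three-case budget can be sketched as a small script. Everything in the scenario list below is a hypothetical placeholder, including the retry rates and token counts, and the `Scenario` class is illustrative rather than part of any API. Grounded search charges are left out to keep the sketch short, so they would need to be added as a separate line item.

```python
from dataclasses import dataclass

# Standard-tier prices in USD per 1M tokens, from the pricing section of this page.
INPUT_USD, CACHED_USD, OUTPUT_USD = 1.50, 0.15, 9.00


@dataclass
class Scenario:
    name: str
    requests: int        # requests per month, before retries
    input_tokens: int    # average input tokens per request
    output_tokens: int   # average output tokens per request
    cache_share: float   # fraction of input billed at the cached rate
    retry_rate: float    # extra requests caused by retries, as a fraction


def monthly_cost(s: Scenario) -> float:
    """Token-only monthly cost in USD, with retries billed as full requests."""
    fresh = s.input_tokens * (1 - s.cache_share) * INPUT_USD
    cached = s.input_tokens * s.cache_share * CACHED_USD
    output = s.output_tokens * OUTPUT_USD
    per_request = (fresh + cached + output) / 1_000_000
    return s.requests * (1 + s.retry_rate) * per_request


# Hypothetical numbers: replace them with values from your own logs.
scenarios = [
    Scenario("conservative", 50_000, 1_500, 300, cache_share=0.5, retry_rate=0.0),
    Scenario("expected", 100_000, 2_000, 500, cache_share=0.3, retry_rate=0.02),
    Scenario("spike", 300_000, 2_000, 800, cache_share=0.1, retry_rate=0.10),
]

for s in scenarios:
    print(f"{s.name:>12}: ${monthly_cost(s):,.2f}")
```

Treating each retry as a full extra request is a simplification. Agents that emit intermediate tokens would need a higher output figure in the spike case.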
Which mode should you model first?
Use Standard as the default baseline when the request is part of a user-facing product. It gives you a clean reference point before you optimize latency or price. If the Standard estimate is already affordable, you can keep the launch simple and focus on prompt quality, logging, and user experience. If the estimate is too high, then test Batch or Flex for background jobs before changing the product design.
Batch is usually the easiest discount to reason about because it fits scheduled work: nightly enrichment, content refreshes, evaluation runs, and large data cleanup tasks. Flex is more useful when variable latency is acceptable but the job still belongs in the normal API path. Priority should be modeled as a deliberate premium. Put only the requests that directly affect conversion, retention, or support quality into that scenario. Otherwise the blended monthly cost can outweigh the value the product delivers.
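One way to sanity-check a mixed deployment is to express each mode as a multiplier on the Standard price and blend by traffic share. The multipliers below are derived from the listed input and output prices. The traffic mix is hypothetical, and the sketch assumes every mode sees the same input, output, and cache mix. Flex cached input is listed at $0.08 rather than exactly half of $0.15, so heavily cached Flex traffic will deviate slightly.

```python
# Cost multipliers relative to Standard, derived from the listed input and
# output prices (for example 2.70 / 1.50 = 1.8 and 0.75 / 1.50 = 0.5).
MULTIPLIER = {"standard": 1.0, "batch": 0.5, "flex": 0.5, "priority": 1.8}


def blended_multiplier(traffic_share):
    """Blended cost relative to running all token volume on Standard.

    traffic_share maps mode name to its fraction of token volume; the
    fractions must sum to 1.
    """
    total = sum(traffic_share.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(MULTIPLIER[mode] * share for mode, share in traffic_share.items())


# Hypothetical mix: most traffic on Standard, background jobs on Batch,
# and a small latency-critical slice on Priority.
mix = {"standard": 0.6, "batch": 0.3, "priority": 0.1}
print(f"{blended_multiplier(mix):.2f}x the all-Standard bill")
```

With this mix the blended bill is roughly 0.93 times the all-Standard figure, which shows how a small Priority slice can cancel part of a Batch discount.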
For teams comparing Gemini against other AI APIs, keep the same token assumptions across every calculator page on this site. Change one variable at a time: model, output length, cache share, or search grounding. That makes the result a decision tool instead of a spreadsheet full of unrelated guesses.
Grounding with Google Search can dominate cost
The pricing page says a customer-submitted request to Gemini may result in one or more Google Search queries, and each individual search query can be charged. The published Gemini 3 line includes 5,000 grounded prompts or requests per month, then $14 per 1,000 search queries. If your product grounds every user request, that extra line item can overtake token costs.
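A quick calculation shows how grounding can overtake token spend. The sketch below assumes, for simplicity, that the 5,000 monthly allowance is counted in search queries and that every request triggers a fixed number of queries. Both are assumptions, since the pricing page describes the allowance in prompts and the number of queries per request can vary. The function name `grounding_cost` is illustrative.

```python
def grounding_cost(search_queries, included=5_000, usd_per_1000=14.0):
    """Monthly grounding charge in USD after the included allowance."""
    billable = max(0, search_queries - included)
    return billable / 1000 * usd_per_1000


# Hypothetical workload: 100,000 requests, each triggering 2 search queries.
requests = 100_000
queries = requests * 2

# Token cost for the same requests at Standard rates ($1.50 input, $9.00 output
# per 1M tokens), assuming 2,000 input and 500 output tokens per request.
token_usd = requests * (2_000 * 1.50 + 500 * 9.00) / 1_000_000

print(f"tokens:    ${token_usd:,.2f}")
print(f"grounding: ${grounding_cost(queries):,.2f}")
```

Under these assumptions the grounding line comes to about $2,730 against roughly $750 in tokens.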
A practical routing pattern is to separate requests that truly need live web grounding from requests that only need model reasoning over your own data. For example, price checks, current events, and local availability may need grounding, but classification, extraction, rewriting, and deterministic workflow steps usually do not. That single product decision can change the monthly bill more than small prompt tweaks.
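The routing decision can be as simple as a lookup on task type. The sketch below uses hypothetical task labels and request counts. The grounded set mirrors the examples in the paragraph above, and a real product would need its own taxonomy.

```python
# Hypothetical task labels; the grounded set mirrors the examples in the text.
GROUNDED_TASKS = {"price_check", "current_events", "local_availability"}


def needs_grounding(task_type: str) -> bool:
    """Return True only for task types that depend on live web data."""
    return task_type in GROUNDED_TASKS


# Hypothetical monthly request counts per task type.
monthly_mix = {
    "classification": 400_000,
    "extraction": 250_000,
    "rewriting": 150_000,
    "price_check": 60_000,
    "current_events": 40_000,
}

grounded = sum(n for task, n in monthly_mix.items() if needs_grounding(task))
total = sum(monthly_mix.values())
print(f"{grounded:,} of {total:,} requests ({grounded / total:.0%}) would be grounded")
```

In this made-up mix only about one request in nine is grounded, so the search line item applies to a small fraction of traffic instead of all of it.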
FAQ
Is Gemini 3.5 Flash cheaper than GPT-5.5?
For many high-volume text workloads, yes, based on the public API rates checked here. The right comparison still depends on quality, retries, output length, and whether you need Search grounding or Priority latency.
Should I use Batch or Flex for production?
Use them for work that does not block the user. If a request is part of an interactive flow, Standard is a safer baseline. Priority should be reserved for latency-sensitive paths.
Is this an official Google calculator?
No. It is an independent planning tool using public Google pricing. Always verify final prices on the official Gemini API pricing page before production spend.
Sources
AI Code Limits is independent and is not affiliated with Google, Gemini, or Google AI Studio.