How does the GPT-5.5 API pricing calculator work?

It multiplies monthly requests by average input, cached input, and output tokens, then applies the selected model price, processing tier, regional uplift, and long-context surcharge when applicable.

Does GPT-5.5 Pro have cached input pricing?

The calculator treats GPT-5.5 Pro as having no cached input discount because the OpenAI model documentation does not list one for GPT-5.5 Pro.

When should I use Batch or Flex?

Use Batch or Flex for non-urgent jobs such as offline evaluation, data enrichment, or report generation. Keep standard or priority processing for user-facing requests where latency matters.

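GPT-5.5 API Pricing Calculator

Notes on the Chinese version

Use this page to estimate costs before wiring GPT-5.5 into a product, agent, coding tool, or batch job. Output tokens and long context are often what push a budget over its limit.

The first version of the Chinese page keeps some English API fields, model names, and form labels so they map directly to the official documentation, price tables, and developer-tool configuration options. Results are for budgeting and model selection only; final prices, limits, and terms are whatever the official dashboard or the provider's current public documentation states.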

When this page is useful

Use it before routing agent workloads to GPT-5.5. Long prompts, repeated repository context, and expensive output tokens can move a project from a small test to a four-figure monthly bill faster than a simple per-request estimate suggests.

Important pricing notes

GPT-5.5 standard text pricing is modeled as $5 input, $0.50 cached input, and $30 output per 1M tokens.
GPT-5.5 Pro is modeled as $30 input and $180 output per 1M tokens, with no cached input discount.
For GPT-5.5 prompts above 272K input tokens, the calculator applies the published 2x input and 1.5x output long-context surcharge to the session.
Batch and Flex are modeled at half the standard rate. Priority is modeled at 2.5x.

Pricing model used by this calculator

This page is for people who already have a rough workload in mind: coding agents, document analysis, customer support routing, batch enrichment, or long-context repository work. The calculator does not guess your usage from a plan name. It asks for the variables that actually drive token billing: request count, input size, cached input share, output size, model, and processing mode.

Prices were checked on May 20, 2026 against OpenAI's API pricing and model documentation. GPT-5.5 is listed with a 1,050,000 token context window and 128,000 max output tokens. GPT-5.5 Pro uses the same published context and max output limits but costs more because it spends more compute on difficult requests. The calculator keeps both choices visible because many real workloads should route only a small fraction of traffic to the Pro model.

Model	Input / 1M	Cached input / 1M	Output / 1M	Best fit
GPT-5.5	$5.00	$0.50	$30.00	Hard coding, agent workflows, long professional tasks.
GPT-5.5 Pro	$30.00	No cached discount	$180.00	Small volume of the hardest tasks where accuracy matters more than latency or cost.
GPT-5.4	$2.50	$0.25	$15.00	Cost-controlled coding and professional work.
GPT-5.4 mini	$0.75	$0.075	$4.50	High-volume simpler tasks, classification, extraction, and routing.
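
The table can be turned into a small lookup plus a base cost formula. The following is a minimal sketch rather than the calculator's actual source: the dictionary keys are hypothetical labels (not official API model identifiers), the function name `monthly_cost` is made up for illustration, and it covers standard processing only, with no regional uplift or long-context surcharge.

```python
# Prices in USD per 1M tokens, copied from the table on this page.
# A cached rate of None means no cached-input discount is modeled.
RATES = {
    "gpt-5.5": {"input": 5.00, "cached": 0.50, "output": 30.00},
    "gpt-5.5-pro": {"input": 30.00, "cached": None, "output": 180.00},
    "gpt-5.4": {"input": 2.50, "cached": 0.25, "output": 15.00},
    "gpt-5.4-mini": {"input": 0.75, "cached": 0.075, "output": 4.50},
}


def monthly_cost(model, requests, input_tokens, output_tokens, cached_share=0.0):
    """Standard-mode monthly cost in USD, before mode, regional, or long-context adjustments."""
    rate = RATES[model]
    # Assumption: without a cached rate, cached tokens are billed at the full input rate.
    cached_rate = rate["cached"] if rate["cached"] is not None else rate["input"]
    cached_tokens = requests * input_tokens * cached_share
    fresh_tokens = requests * input_tokens * (1.0 - cached_share)
    output_total = requests * output_tokens
    return (
        fresh_tokens * rate["input"]
        + cached_tokens * cached_rate
        + output_total * rate["output"]
    ) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 10,000 requests, 20,000 input and 2,000 output tokens each, half cached.
    for model in RATES:
        cost = monthly_cost(model, 10_000, 20_000, 2_000, cached_share=0.5)
        print(f"{model}: ${cost:,.2f}")
```

For that hypothetical workload the sketch gives $1,150.00 on GPT-5.5 and $9,600.00 on GPT-5.5 Pro, which shows how quickly the Pro rates dominate once volume grows.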

How to estimate your token inputs

For a coding agent, input tokens are usually the dominant cost driver because the same repository context is sent repeatedly. A small bug fix may only need a few thousand tokens. A multi-file feature that includes instructions, file excerpts, test output, and a review loop can easily reach tens of thousands of input tokens per turn. If the agent keeps a long conversation open, later turns carry earlier messages too, so the average input size rises during the session.

Use conservative assumptions before production launch. For a first estimate, separate your workload into three groups: simple requests that can use GPT-5.4 mini, regular requests that need GPT-5.4 or GPT-5.5, and rare hard requests that justify GPT-5.5 Pro. Then run the calculator for each group and add the totals. This is more accurate than pricing every request as if it used the most expensive model.
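
The three-group approach can be sketched as follows. The traffic split and token sizes are hypothetical placeholders, the rates come from the table on this page, and caching and processing modes are left out to keep the comparison simple.

```python
# Hypothetical traffic split; rates are USD per 1M tokens from the table on this page.
GROUPS = [
    # (label, requests, input tokens, output tokens, input rate, output rate)
    ("simple -> GPT-5.4 mini", 80_000, 1_000, 200, 0.75, 4.50),
    ("regular -> GPT-5.5", 18_000, 10_000, 1_000, 5.00, 30.00),
    ("hard -> GPT-5.5 Pro", 2_000, 20_000, 2_000, 30.00, 180.00),
]
PRO_INPUT_RATE, PRO_OUTPUT_RATE = 30.00, 180.00


def group_cost(requests, input_tokens, output_tokens, input_rate, output_rate):
    """Monthly USD cost for one group, standard mode, no caching."""
    return requests * (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Total when each group is priced on its own model.
routed_total = sum(group_cost(*group[1:]) for group in GROUPS)

# Total when the same tokens are all priced at GPT-5.5 Pro rates.
all_pro_total = sum(
    group_cost(requests, inp, out, PRO_INPUT_RATE, PRO_OUTPUT_RATE)
    for _, requests, inp, out, _, _ in GROUPS
)

for label, *params in GROUPS:
    print(f"{label}: ${group_cost(*params):,.2f}")
print(f"routed total: ${routed_total:,.2f}")
print(f"everything on Pro: ${all_pro_total:,.2f}")
```

With these placeholder numbers the routed total is $3,492.00, against $15,840.00 if every request were priced at Pro rates.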

Batch, Flex, Priority, and regional processing

Standard processing is the baseline. Batch is for asynchronous work with a 24-hour completion window and lower cost. Flex also targets lower-priority work and may be slower or temporarily unavailable, while Priority is for user-facing latency-sensitive traffic. The calculator models Batch and Flex at half the standard token rate and Priority at a 2.5x multiplier. Regional processing is modeled as a 10% uplift for the supported GPT-5.5 family models.
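
The multipliers described above can be applied to a standard-mode cost like this. This is a sketch under one explicit assumption: the page does not say how the mode multiplier and the regional uplift combine, so the code multiplies them together. The names `adjust_cost`, `MODE_MULTIPLIER`, and `REGIONAL_UPLIFT` are hypothetical.

```python
# Multipliers as modeled on this page.
MODE_MULTIPLIER = {"standard": 1.0, "batch": 0.5, "flex": 0.5, "priority": 2.5}
REGIONAL_UPLIFT = 1.10  # +10% for supported GPT-5.5 family models


def adjust_cost(standard_cost, mode="standard", regional=False):
    """Apply the processing-mode multiplier, then the optional regional uplift.

    Assumption: the two adjustments multiply together.
    """
    if mode not in MODE_MULTIPLIER:
        raise ValueError(f"unknown processing mode: {mode!r}")
    cost = standard_cost * MODE_MULTIPLIER[mode]
    if regional:
        cost *= REGIONAL_UPLIFT
    return cost


if __name__ == "__main__":
    base = 1_000.00  # hypothetical standard-mode monthly cost in USD
    for mode in MODE_MULTIPLIER:
        plain = adjust_cost(base, mode)
        with_region = adjust_cost(base, mode, regional=True)
        print(f"{mode}: ${plain:,.2f} (regional: ${with_region:,.2f})")
```

Keeping the multipliers in a single table makes it easy to update them when the published rates change.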

Long-context surcharge and cache behavior

OpenAI's GPT-5.5 model documentation states that prompts above 272K input tokens are priced with a 2x input and 1.5x output surcharge for the full session for standard, batch, and flex. The calculator applies that threshold when the average input tokens per request cross 272,000. If you are close to the threshold, test whether summarizing old context, using file search, or splitting the job into smaller requests keeps quality high enough without crossing into long-context pricing.
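
The threshold behavior can be illustrated with a per-request sketch for GPT-5.5 at the standard rates quoted on this page. It deliberately omits caching, because the page does not say how cached input interacts with the surcharge, and the function name `gpt55_request_cost` is hypothetical.

```python
LONG_CONTEXT_THRESHOLD = 272_000  # input tokens per prompt
INPUT_RATE, OUTPUT_RATE = 5.00, 30.00  # GPT-5.5, USD per 1M tokens


def gpt55_request_cost(input_tokens, output_tokens):
    """Per-request USD cost on GPT-5.5, standard mode, no caching."""
    # The surcharge applies only strictly above the threshold.
    over = input_tokens > LONG_CONTEXT_THRESHOLD
    input_mult, output_mult = (2.0, 1.5) if over else (1.0, 1.0)
    return (
        input_tokens * INPUT_RATE * input_mult
        + output_tokens * OUTPUT_RATE * output_mult
    ) / 1_000_000


if __name__ == "__main__":
    for tokens in (100_000, 272_000, 272_001, 300_000):
        print(f"{tokens:>7} input tokens: ${gpt55_request_cost(tokens, 4_000):.2f}")
```

With 4,000 output tokens, a prompt of exactly 272,000 input tokens costs $1.48, while one token more costs about $2.90. That step is why trimming a prompt that sits just above the threshold can pay off.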

Prompt caching matters most when a stable system prompt, schema, project guide, or reference document repeats across many requests. A higher cache share can reduce input cost sharply for GPT-5.5 and GPT-5.4. It does not help the output side, and the calculator disables cached input for GPT-5.5 Pro because the model page does not list a cached input discount for Pro.
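
One way to see the effect of cache share is to compute a blended input rate: the effective price per 1M input tokens once part of the input is billed at the cached rate. The sketch below uses the rates from this page; `blended_input_rate` is a hypothetical helper, and it takes the cached share as a fraction between 0 and 1 rather than a percentage.

```python
def blended_input_rate(input_rate, cached_rate, cached_share):
    """Effective USD per 1M input tokens for a given cached share (0.0 to 1.0)."""
    if not 0.0 <= cached_share <= 1.0:
        raise ValueError("cached_share must be between 0 and 1")
    if cached_rate is None:
        # No cached discount, as modeled for GPT-5.5 Pro: the share has no effect.
        return input_rate
    return (1.0 - cached_share) * input_rate + cached_share * cached_rate


if __name__ == "__main__":
    for share in (0.0, 0.5, 0.9):
        gpt55 = blended_input_rate(5.00, 0.50, share)
        pro = blended_input_rate(30.00, None, share)
        print(f"cache share {share:.0%}: GPT-5.5 ${gpt55:.2f}, GPT-5.5 Pro ${pro:.2f} per 1M input tokens")
```

At a 90% cache share the GPT-5.5 input rate falls from $5.00 to about $0.95 per 1M tokens, while the Pro rate stays at $30.00. Output pricing is unchanged in both cases.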

FAQ

Why is my GPT-5.5 estimate much higher than a simple request count?

Request count alone is not enough. A workload with 10,000 short prompts and a workload with 10,000 repository-sized prompts have completely different costs. Input tokens, output tokens, cache share, and long-context thresholds drive the final number.

Should I use GPT-5.5 Pro for every coding-agent request?

Usually no. GPT-5.5 Pro is useful for a small set of hard planning, debugging, and architecture tasks. Routine edits, extraction, and review loops should be routed to cheaper models first, then escalated when the result fails.
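
The escalation pattern can be costed with a simple expected-value sketch. It assumes every request first runs on the cheaper model, that a hypothetical fraction of results fail and are rerun on GPT-5.5 Pro, and that the rerun uses the same token counts. The failure rate and token sizes below are illustrative, not measured.

```python
def request_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Per-request USD cost at the given per-1M-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def expected_cost_with_escalation(cheap_cost, pro_cost, failure_rate):
    """Expected per-request cost when a fraction of cheap attempts is rerun on Pro."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    # Every request pays for the cheap attempt; failed ones also pay for the Pro attempt.
    return cheap_cost + failure_rate * pro_cost


if __name__ == "__main__":
    # Hypothetical request: 10,000 input and 1,000 output tokens.
    cheap = request_cost(10_000, 1_000, 2.50, 15.00)   # GPT-5.4 rates from this page
    pro = request_cost(10_000, 1_000, 30.00, 180.00)   # GPT-5.5 Pro rates from this page
    escalated = expected_cost_with_escalation(cheap, pro, failure_rate=0.10)
    print(f"GPT-5.4 only: ${cheap:.3f}")
    print(f"GPT-5.5 Pro only: ${pro:.3f}")
    print(f"GPT-5.4 first, 10% escalated: ${escalated:.3f}")
```

Under these assumptions the escalated path costs about $0.088 per request against $0.48 for sending everything to Pro. In practice an escalated prompt often carries the failed attempt as extra context, so the real saving will be somewhat smaller.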

Is the estimate a guaranteed bill?

No. It is a planning estimate based on public token prices and your assumptions. Actual billing can differ because of tool calls, retries, image inputs, provider-side changes, and differences between measured and assumed tokens.

Sources

AI Code Limits is independent and is not affiliated with OpenAI. Prices can change; check the official pricing page before production spend.