Codex rate card used here
The calculator uses the OpenAI Help Center Codex rate card checked on May 21, 2026. OpenAI describes the current rate card as token-based: Codex usage is priced in credits per million input tokens, cached input tokens, and output tokens. The model table below is copied into the calculator as planning data, not as a guarantee of future rates.
| Model | Input / 1M | Cached input / 1M | Output / 1M | Use it when |
| --- | --- | --- | --- | --- |
| GPT-5.5 | 125 credits | 12.50 credits | 750 credits | Harder tasks where higher quality justifies burn. |
| GPT-5.4 | 62.50 credits | 6.250 credits | 375 credits | Default serious coding work with lower burn than GPT-5.5. |
| GPT-5.4-Mini | 18.75 credits | 1.875 credits | 113 credits | Routine edits, triage, and high-volume lightweight tasks. |
| GPT-5.3 Codex / GPT-5.2 | 43.75 credits | 4.375 credits | 350 credits | Cost-controlled agent workflows where available. |
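For readers who want to reproduce the arithmetic outside the calculator, here is a minimal Python sketch of the rate card as planning data. The names `RATE_CARD` and `estimate_credits` are illustrative, not part of any OpenAI API, and the sketch assumes cached input tokens are billed at the cached rate instead of the regular input rate.

```python
# Credits per 1M tokens, copied from the table above (planning data only).
RATE_CARD = {
    "GPT-5.5": {"input": 125.0, "cached_input": 12.5, "output": 750.0},
    "GPT-5.4": {"input": 62.5, "cached_input": 6.25, "output": 375.0},
    "GPT-5.4-Mini": {"input": 18.75, "cached_input": 1.875, "output": 113.0},
    "GPT-5.3 Codex / GPT-5.2": {"input": 43.75, "cached_input": 4.375, "output": 350.0},
}


def estimate_credits(model, uncached_input, cached_input, output):
    """Estimate credits for one task from its token counts.

    Assumption: uncached and cached input are counted separately,
    and each is charged at its own per-million rate.
    """
    rates = RATE_CARD[model]
    weighted_tokens = (
        uncached_input * rates["input"]
        + cached_input * rates["cached_input"]
        + output * rates["output"]
    )
    return weighted_tokens / 1_000_000


# Example: 60k uncached input and 8k output tokens on GPT-5.4.
print(estimate_credits("GPT-5.4", 60_000, 0, 8_000))
```

With these rates, that example task comes to 6.75 credits: 3.75 for input and 3.00 for output.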
How to estimate a Codex task
A local bug fix might include project instructions, a few files, test output, and a short diff. A cloud task or pull-request review can include larger context, multiple file reads, long logs, repeated planning, and more output tokens. If you do not have logs yet, start with 40k-80k input tokens and 4k-12k output tokens for a normal task, then update the numbers after checking Codex Settings > Usage.
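As a rough illustration, the starting range above can be turned into a low and high credit estimate. This is a sketch under a simplifying assumption: no input is cached, so every input token is charged at the full input rate. The names `RATES`, `STARTING_RANGE`, and `starting_range` are hypothetical.

```python
# (input rate, output rate) in credits per 1M tokens, from the rate card.
RATES = {"GPT-5.4": (62.5, 375.0), "GPT-5.5": (125.0, 750.0)}

# Starting assumptions for a normal task: (input tokens, output tokens).
STARTING_RANGE = {"low": (40_000, 4_000), "high": (80_000, 12_000)}


def starting_range(model):
    """Return low and high credit estimates for one task, with no caching."""
    input_rate, output_rate = RATES[model]
    return {
        label: (inp * input_rate + out * output_rate) / 1_000_000
        for label, (inp, out) in STARTING_RANGE.items()
    }


for model in RATES:
    print(model, starting_range(model))
```

Under these assumptions a normal task lands between 4.0 and 9.5 credits on GPT-5.4, and between 8.0 and 19.0 credits on GPT-5.5. Replace the token counts once real usage figures are available.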
Cached input is the largest lever when the same repository instructions, AGENTS.md, tool schemas, and stable context repeat across tasks. It does not reduce output cost. If the model is producing long explanations, diffs, reviews, or generated tests, output tokens can dominate credit burn even with good caching.
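To see why caching helps input but not output, here is a small sketch that splits one task into input and output credits at two cache shares. It uses the GPT-5.4 rates from the rate card. The task size (60k input, 8k output) and the 70% cache share are illustrative values, and `task_breakdown` is a hypothetical helper.

```python
# GPT-5.4 rates in credits per 1M tokens, from the rate card.
INPUT_RATE, CACHED_RATE, OUTPUT_RATE = 62.5, 6.25, 375.0


def task_breakdown(total_input, output, cache_share):
    """Return (input credits, output credits) for one task.

    cache_share is the fraction of input tokens served from cache (0.0 to 1.0).
    """
    cached = total_input * cache_share
    uncached = total_input - cached
    input_credits = (uncached * INPUT_RATE + cached * CACHED_RATE) / 1_000_000
    output_credits = output * OUTPUT_RATE / 1_000_000
    return input_credits, output_credits


for share in (0.0, 0.7):
    input_credits, output_credits = task_breakdown(60_000, 8_000, share)
    total = input_credits + output_credits
    print(
        f"cache {share:.0%}: input {input_credits:.2f}, "
        f"output {output_credits:.2f}, output share {output_credits / total:.0%}"
    )
```

In this example, moving from no caching to a 70% cache share drops input from 3.75 to roughly 1.39 credits, while output stays at 3.00 credits and becomes about two thirds of the total.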
The most common underestimation mistake is counting only the final answer. Real coding tasks often include hidden planning, file reads, failed test output, command logs, retry prompts, and review messages. If the task touches a large repository or asks for broad refactoring, use the spike case rather than the average case until you have actual usage history.
A practical monthly budgeting workflow
Start by dividing Codex usage into three buckets instead of one blended average. The first bucket is quick interactive work: small bugs, single-file edits, command output review, and direct questions about a repository. The second bucket is medium implementation work: multi-file features, test repair, review feedback, and documentation changes. The third bucket is autonomous or cloud work: long-running tasks, scheduled jobs, broad refactors, and repeated verification runs. Each bucket has a different token shape.
Use the calculator once per bucket, then add the three totals. This avoids the common mistake of treating twenty small questions and twenty cloud tasks as the same kind of request. For small tasks, output length and model choice often matter most. For medium tasks, cached repository instructions and focused file scope can reduce input burn. For autonomous work, retries and hidden command output can dominate, so a planning buffer is reasonable.
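The per-bucket approach can be sketched as follows. Every task count and token figure in `BUCKETS` is a made-up placeholder chosen only to show the shape of the calculation. The model assigned to each bucket is likewise an assumption, not a recommendation from OpenAI.

```python
# Credits per 1M tokens: (input, cached input, output), from the rate card.
RATES = {
    "GPT-5.5": (125.0, 12.5, 750.0),
    "GPT-5.4": (62.5, 6.25, 375.0),
    "GPT-5.4-Mini": (18.75, 1.875, 113.0),
}

# Hypothetical monthly plan; each bucket lists per-task token counts.
BUCKETS = {
    "quick": {"model": "GPT-5.4-Mini", "tasks": 100,
              "uncached": 10_000, "cached": 10_000, "output": 2_000},
    "medium": {"model": "GPT-5.4", "tasks": 40,
               "uncached": 30_000, "cached": 30_000, "output": 8_000},
    "autonomous": {"model": "GPT-5.5", "tasks": 10,
                   "uncached": 200_000, "cached": 100_000, "output": 40_000},
}


def bucket_credits(bucket):
    """Monthly credits for one bucket: per-task cost times task count."""
    input_rate, cached_rate, output_rate = RATES[bucket["model"]]
    per_task = (
        bucket["uncached"] * input_rate
        + bucket["cached"] * cached_rate
        + bucket["output"] * output_rate
    ) / 1_000_000
    return per_task * bucket["tasks"]


monthly = {name: bucket_credits(bucket) for name, bucket in BUCKETS.items()}
for name, credits in monthly.items():
    print(f"{name}: {credits:.1f} credits")
print(f"total: {sum(monthly.values()):.1f} credits")
```

With these placeholder numbers, ten autonomous tasks cost more than the other 140 tasks combined, which is the kind of imbalance a single blended average would hide.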
After the first billing week, compare the calculator's estimate to real usage. If the estimate is too low, do not simply raise the budget. Look for preventable causes: vague prompts, repeated failing tests, oversized AGENTS.md instructions, unnecessary paste-heavy context, or requests that ask for explanation when a concise patch is enough. That review turns the calculator into an operating habit rather than a one-time guess.
Ways to reduce credit burn
- Keep AGENTS.md concise so every task does not carry unnecessary instructions.
- Point Codex at specific files, failing tests, or error snippets instead of broad vague requests.
- Use cheaper models for routine refactors, formatting, and known fixes.
- Split unrelated work into separate tasks to avoid carrying stale context.
- Review generated output size; long explanations and repeated test logs can be expensive.
- Check Settings > Usage before enabling automations or repeated cloud tasks.
After a few days of usage, replace the default calculator assumptions with your own medians per task type: median input tokens, median output tokens, and median cache share. Keep separate rows for quick local fixes, larger cloud tasks, and pull-request reviews. A single average across all tasks is less useful than knowing which task type drains the budget.
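One way to derive those per-task-type figures is sketched below. It assumes you have copied your usage into rows by hand. The row layout and the sample numbers are hypothetical and do not reflect any export format from Codex Settings > Usage. Here `input_tokens` is assumed to be total input, including the cached portion.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows: (task_type, input_tokens, output_tokens, cached_input_tokens).
usage_rows = [
    ("quick_fix", 18_000, 1_500, 9_000),
    ("quick_fix", 25_000, 2_500, 15_000),
    ("quick_fix", 90_000, 3_000, 20_000),  # one spike that would skew a mean
    ("pr_review", 120_000, 10_000, 60_000),
    ("pr_review", 150_000, 14_000, 90_000),
]


def medians_by_task_type(rows):
    """Group rows by task type and take the median of each measure."""
    grouped = defaultdict(list)
    for task_type, input_tokens, output_tokens, cached_tokens in rows:
        cache_share = cached_tokens / input_tokens if input_tokens else 0.0
        grouped[task_type].append((input_tokens, output_tokens, cache_share))

    summary = {}
    for task_type, samples in grouped.items():
        inputs, outputs, shares = zip(*samples)
        summary[task_type] = {
            "input_tokens": median(inputs),
            "output_tokens": median(outputs),
            "cache_share": median(shares),
        }
    return summary


summary = medians_by_task_type(usage_rows)
for task_type, stats in summary.items():
    print(task_type, stats)
```

The median is used rather than the mean so that one spike task, such as the 90k-token quick fix in the sample rows, does not inflate the typical figure. Budget for spikes separately.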
FAQ
Why is this different from the legacy per-message rate card?
The legacy table gave rough average credits per message or pull request. OpenAI now documents a token-based rate card for most plans, so input, cached input, and output volumes determine actual burn more directly.
What does the fast mode checkbox do?
It adds a planning buffer for workflows that run more aggressively, create more intermediate output, or execute parallel automations. It is not an official OpenAI multiplier.
Where do I confirm real Codex usage?
Use Codex Settings > Usage inside the official OpenAI product. Depending on plan and role, you may also be able to add credits or manage auto-reload there.
AI Code Limits is independent and is not affiliated with OpenAI or Codex.