
Copilot AI Credits: Why Agents Burn Cash

Aditya Kumar Jha · June 14, 2026 · 11 min read

GitHub Copilot AI Credits billing: how token metering works, why agents burn quotas fast, and how to estimate and cap spend.

Short answer: GitHub Copilot AI Credits usage-based billing meters input, output, and cached context on every agent step

GitHub Copilot now charges by tokens, not by requests. Every chat turn, agent step, and code review run consumes input tokens, output tokens, and cached context tokens. Each is priced at the underlying model's API rate and converted into GitHub AI Credits, where one credit equals one cent.

That is why a single agent session can spend a real slice of a monthly quota in an afternoon. Nothing is broken. An agent that reads files, plans, edits, runs tests, and re-reads output fires many billable steps in a loop, and each step pays for everything it sends to the model and everything the model sends back.

Insight

The mental model that breaks budgets: under request billing you paid per action. Under token billing you pay per token in and per token out, multiplied by how many times the agent loops. Long context times many steps equals real money.

What changed in June 2026: from premium requests to AI Credits

On June 1, 2026, GitHub replaced premium request units with GitHub AI Credits across paid Copilot plans. Credits drain on token usage, counting input, output, and cached tokens at the published API rate for each model. One AI credit equals 0.01 US dollars, so a 10 dollar allowance is 1,000 credits.
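The conversion is simple enough to sketch. The helper below assumes only the rate stated above (one credit equals 0.01 US dollars); the function names are illustrative and not part of any GitHub API.

```python
# One AI credit equals 0.01 US dollars, so there are 100 credits per dollar.
CREDITS_PER_DOLLAR = 100


def dollars_to_credits(dollars: float) -> float:
    """Convert a US dollar amount into AI Credits."""
    return dollars * CREDITS_PER_DOLLAR


def credits_to_dollars(credits: float) -> float:
    """Convert an AI Credit count back into US dollars."""
    return credits / CREDITS_PER_DOLLAR


# A 10 dollar allowance expressed in credits, and 3,900 credits in dollars.
print(dollars_to_credits(10))
print(credits_to_dollars(3900))
```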

Each paid plan still bundles a monthly allowance. Copilot Pro is 10 dollars per month with 10 dollars of included AI Credits, Pro+ is 39 dollars per month with 39 dollars included, Business is 19 dollars per user per month, and Enterprise is 39 dollars per user per month, each carrying credits matched to the plan price. As of June 2026, existing Business and Enterprise organizations also get a larger introductory allowance for the first three months: 30 dollars per user on Business and 70 dollars per user on Enterprise, running June 1 to September 1, 2026.

Two things stay free. Code completions and Next Edit Suggestions remain included on all paid plans and never draw down credits. Everything agentic spends tokens that convert to credits: Copilot Chat, agent mode, code review, and the CLI.

The backlash was immediate. Developers reported burning large fractions of their monthly quota in hours, with some single agentic sessions eating 30 to 40 dollars in credits, and described the cost as hard to predict. Power users projected bill increases of roughly 10x to 50x versus the old flat plans. GitHub and Microsoft have not independently verified those community figures. The sharpest example: one engineering manager who ran a month of real team usage through the new rates told Visual Studio Magazine the bill would climb from under 1,000 dollars to over 18,000 dollars.

One detail magnified the shock. Under the old system, exhausting premium requests dropped you to a lower-cost fallback model so work could continue. GitHub confirmed that fallback experiences are no longer available. Once included credits are spent, your remaining credits and any admin budget govern usage until you add budget or the next cycle begins.

Pro Tip

On an annual Pro or Pro+ plan, you stay on the legacy request-based model with its model multipliers until the plan expires. The token meter applies to monthly paid plans now, and to annual plans at renewal.

Why agentic loops drain quotas: tokens compound across steps

Agents cost more because the cost structure is multiplicative, not additive. A single chat question is one input plus one output. An agent task is a loop: read context, reason, act, observe the result, then feed that result back as input for the next step. Every pass re-sends a growing pile of context.

The three token streams you pay for

  • Input tokens: everything sent to the model on each step, including your prompt, system instructions, tool definitions, and the file contents or command output the agent pulled in.
  • Output tokens: everything the model writes back, including reasoning, code, and tool calls. Output usually costs more per token than input.
  • Cached context tokens: repeated context reused across steps. Caching costs less per token than fresh input, but it is not free, and a long session still pays for it repeatedly.

Per-model rates make the spread wide. GitHub's pricing lists GPT-5 mini at 0.25 dollars per million input tokens, 0.025 dollars per million cached input tokens, and 2.00 dollars per million output tokens. Larger frontier models cost several times more per token, so the same agent task can differ by an order of magnitude depending on which model you pick.
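To make the three streams concrete, here is a minimal sketch that prices one agent step at the GPT-5 mini rates quoted above. The `ModelRates` type, the `step_cost_usd` function, and the token counts in the example are assumptions for illustration; only the three rates come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRates:
    """Dollar price per one million tokens for each billed stream."""
    input_per_m: float
    cached_per_m: float
    output_per_m: float


# GPT-5 mini rates as quoted above.
GPT5_MINI = ModelRates(input_per_m=0.25, cached_per_m=0.025, output_per_m=2.00)


def step_cost_usd(rates: ModelRates, fresh_input: int, cached_input: int, output: int) -> float:
    """Price one step: fresh input, cached input, and output each at its own rate."""
    total_per_m = (
        fresh_input * rates.input_per_m
        + cached_input * rates.cached_per_m
        + output * rates.output_per_m
    )
    return total_per_m / 1_000_000


# Hypothetical step: 5,000 fresh input, 10,000 cached, 800 output tokens.
cost = step_cost_usd(GPT5_MINI, fresh_input=5_000, cached_input=10_000, output=800)
print(f"{cost * 100:.2f} credits")
```

With these made-up token counts the step comes to about a third of a credit, and the 800 output tokens cost more than the 5,000 fresh input tokens because of the higher output rate.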

Insight

The compounding trap: if an agent runs 20 steps and each step re-sends 15,000 tokens of accumulated context, that is 300,000 input tokens before a single line of output is counted. Context that grows with the conversation is the silent cost driver, not the answer length.
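The arithmetic is easy to reproduce. This sketch sums input tokens over a loop, first with a flat 15,000 tokens per step as in the example above, then with context that grows each step. The linear growth model and the 1,000 tokens per step figure are assumptions chosen for illustration.

```python
def total_input_tokens(steps: int, base_context: int, growth_per_step: int = 0) -> int:
    """Sum the input tokens sent across an agent loop.

    Step i (starting at 0) sends base_context + i * growth_per_step tokens.
    """
    return sum(base_context + i * growth_per_step for i in range(steps))


flat = total_input_tokens(steps=20, base_context=15_000)
growing = total_input_tokens(steps=20, base_context=15_000, growth_per_step=1_000)

print(flat)     # 300000
print(growing)  # 490000
```

Adding just 1,000 tokens of new history per step raises the total by more than half, which is why growing context matters more than the size of any single step.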

Worked example: where a single agent run's cost actually goes

The output is rarely the big number. The accumulated input is. Here is an illustrative breakdown of where credits go in one multi-step agent task on a mid-tier model. The token counts are a representative scenario, not a GitHub-published figure, and your real numbers depend on the model and codebase.

| Cost component | What it represents | Share of a typical agent run | Why it adds up |
| --- | --- | --- | --- |
| Accumulated input | Files, tool output, and history re-sent each step | Largest share | Context is resent every loop and grows over time |
| Cached context | Reused system and project context | Moderate share | Cheaper per token but billed across many steps |
| Model output | Reasoning, code, and tool calls written back | Smaller share | Priced higher per token but fewer tokens total |
| Retries and dead ends | Failed steps, re-reads, wrong turns | Hidden share | Each retry pays full input and output again |

Read the table as a pattern. The output you actually wanted, the diff, is often the cheapest part. The expensive parts are the context the agent dragged along on every step and the steps that went nowhere and had to repeat. A task that takes the agent ten tries costs roughly ten times the tokens of a task it nails on the first attempt.

Pro Tip

Watch output token pricing on reasoning-heavy models. Hidden reasoning tokens count as output, so a model that thinks at length before answering can spend more on a short final reply than a direct model spends on a long one.

How to estimate spend before you run an agent

Estimate from steps and context size, not from how the task feels. The rough formula: expected cost is about the number of steps, times the average context tokens per step, times the model's input rate, plus output tokens times the output rate. Cached context bills at the lower cached rate.

A quick estimating method

  • Estimate steps: a focused edit might be 3 to 8 steps; a broad refactor across many files can be 30 or more.
  • Estimate context per step: a tight task keeps a few thousand tokens in play; a whole-repo task can carry tens of thousands.
  • Multiply by the model's input rate, then add output. Convert dollars to credits at one credit per cent.
  • Add a margin for retries. Doubling the raw estimate is a reasonable buffer for non-trivial tasks.
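The method above can be sketched as a small function. It treats all context as fresh input, so it gives an upper bound; cached context would bill lower. The function name, parameters, and the example task sizes are assumptions, and the rates in the example are the GPT-5 mini figures quoted earlier.

```python
def estimate_agent_cost_credits(
    steps: int,
    context_tokens_per_step: int,
    output_tokens_total: int,
    input_rate_per_m: float,
    output_rate_per_m: float,
    retry_margin: float = 2.0,
) -> float:
    """Rough pre-run estimate in AI Credits (one credit equals one cent).

    Rates are dollars per million tokens. retry_margin multiplies the raw
    estimate; 2.0 doubles it as a buffer for non-trivial tasks.
    """
    input_usd = steps * context_tokens_per_step * input_rate_per_m / 1_000_000
    output_usd = output_tokens_total * output_rate_per_m / 1_000_000
    return (input_usd + output_usd) * retry_margin * 100


# Hypothetical broad refactor: 30 steps, 20,000 context tokens per step,
# 15,000 output tokens in total, priced at GPT-5 mini rates.
credits = estimate_agent_cost_credits(
    steps=30,
    context_tokens_per_step=20_000,
    output_tokens_total=15_000,
    input_rate_per_m=0.25,
    output_rate_per_m=2.00,
)
print(f"{credits:.1f} credits")
```

On a small model this hypothetical refactor lands in the tens of credits. Swapping in a frontier model's rates changes only the two rate arguments, which makes the per-model spread easy to compare before running anything.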

Then convert against your allowance. Pro includes 1,000 credits a month and Pro+ includes 3,900, so a task you estimate at 4 dollars (400 credits) is a meaningful chunk of a Pro month and a smaller bite of Pro+. Here is the part most cost guides skip: on a small model like GPT-5 mini, a tight chat turn can cost a fraction of a single credit, while one sprawling frontier-model agent run can cost hundreds. The same plan stretches across thousands of cheap turns or burns out in a dozen expensive ones. Estimating first turns a surprise bill into a budget decision.
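A quick way to see what an estimate means against a plan is to express it as a share of the monthly allowance. The allowances below are the Pro and Pro+ figures from the text; the function names are illustrative.

```python
# Included monthly credits per plan, as stated above.
PLAN_CREDITS = {"Pro": 1_000, "Pro+": 3_900}


def share_of_allowance(task_credits: float, plan: str) -> float:
    """Fraction of the plan's monthly allowance that one task would use."""
    return task_credits / PLAN_CREDITS[plan]


def tasks_per_month(task_credits: int, plan: str) -> int:
    """How many whole tasks of this size fit inside the included allowance."""
    return PLAN_CREDITS[plan] // task_credits


for plan in PLAN_CREDITS:
    share = share_of_allowance(400, plan)
    print(f"{plan}: {share:.1%} of the month, {tasks_per_month(400, plan)} such tasks fit")
```

A 400-credit task is 40 percent of a Pro month, so only two fit before the allowance runs out, while Pro+ has room for nine.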

Practical caps: scope tasks, cache context, watch the meter

The fastest way to cut agent cost is to shrink context and steps, then set hard spend limits. GitHub's budget controls let you cap additional AI Credits beyond the included allowance, with user-level budgets that always enforce a hard stop and a usage view in your billing settings so you can track consumption as it happens.
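To illustrate what a hard stop means in practice, here is a minimal local sketch of a budget that refuses any charge that would exceed the cap. It is not GitHub's budget API and does not talk to GitHub's billing; `CreditBudget` and `BudgetExceeded` are hypothetical names, and the step costs are invented.

```python
class BudgetExceeded(Exception):
    """Raised when a charge would push spend past the cap."""


class CreditBudget:
    """Tracks spend locally and enforces a hard cap, in AI Credits."""

    def __init__(self, cap_credits: float) -> None:
        self.cap_credits = cap_credits
        self.spent_credits = 0.0

    @property
    def remaining(self) -> float:
        return self.cap_credits - self.spent_credits

    def charge(self, credits: float) -> None:
        """Record a charge, or raise without recording if it would exceed the cap."""
        if self.spent_credits + credits > self.cap_credits:
            raise BudgetExceeded(
                f"charge of {credits} exceeds remaining {self.remaining} credits"
            )
        self.spent_credits += credits


# Hypothetical agent loop where each step costs 30 credits against a 100-credit cap.
budget = CreditBudget(cap_credits=100)
for step in range(1, 6):
    try:
        budget.charge(30)
    except BudgetExceeded as err:
        print(f"stopped before step {step}: {err}")
        break
```

The point of the hard stop is that the fourth step never runs: the loop halts with 10 credits left instead of overshooting to 120.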

Tactics that move the bill

  • Scope narrowly: point the agent at specific files or a folder instead of the whole repo. Less context per step is the single biggest win.
  • Pick the model deliberately: use a small model for boilerplate and reserve frontier models for genuinely hard reasoning. The per-token gap is large.
  • Keep sessions short: start a fresh session per task so history does not pile up and re-bill. Long chats carry their entire past as input.
  • Reuse cached context: stable project instructions and reference material cost less when cached than when re-pasted as fresh input each time.
  • Avoid the retry spiral: give clear, complete instructions up front. Vague prompts cause dead-end steps that each pay full freight.
  • Set a budget cap and check usage often, especially in the first weeks, so a runaway loop cannot quietly drain the month.

Insight

The single habit that pays off most: control what context the agent carries. Every token in the context window is billed on every step it survives, so trimming context beats trimming output almost every time.
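The principle can be shown with a simple trimming pass that keeps only the most recent items fitting a token budget. This is a sketch of the idea, not how Copilot manages context internally; the message list and token counts are invented, and a real tool would usually pin system instructions rather than let them be trimmed.

```python
def trim_to_budget(messages: list[tuple[str, int]], max_tokens: int) -> list[tuple[str, int]]:
    """Keep the newest (label, token_count) items whose total fits in max_tokens.

    Walks from newest to oldest and stops at the first item that does not fit,
    then returns the kept items in their original order.
    """
    kept: list[tuple[str, int]] = []
    total = 0
    for label, tokens in reversed(messages):
        if total + tokens > max_tokens:
            break
        kept.append((label, tokens))
        total += tokens
    kept.reverse()
    return kept


# Hypothetical history, oldest first.
history = [
    ("system instructions", 500),
    ("file A contents", 6_000),
    ("test output", 3_000),
    ("latest prompt", 400),
]

trimmed = trim_to_budget(history, max_tokens=4_000)
print(sum(t for _, t in history), "tokens before")
print(sum(t for _, t in trimmed), "tokens after")
```

Here the next step would send 3,400 tokens instead of 9,900, and that saving repeats on every later step the dropped items would have survived.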

MemX: a memory layer that keeps repeated context out of the meter

Repeated context is the part of token billing you can actually shrink, and that is where an external memory layer helps. MemX (memx.app) is a model-agnostic AI memory layer that stores durable facts, decisions, and project context outside any single tool, so you retrieve the few relevant pieces a task needs instead of re-pasting large blocks into every session.

Because the relevant context gets fetched on demand rather than carried in full conversation history, the input you send per step stays smaller, and that is the exact variable that drives token cost. MemX is private by architecture: per-user isolation, encryption at rest, and on-device options. It is not end-to-end encrypted and not zero-knowledge, and it does not replace Copilot. It feeds agents tighter context so the meter has less to count.

When usage-based pricing is cheaper than flat plans

Here is the contrarian read most of the outrage misses: for a lot of developers, the meter is actually cheaper. Usage billing wins for light and bursty users and loses for steady heavy agent users. If your work is mostly completions plus occasional chat, the metered model can cost less than a flat fee, because completions are free and your token use stays low. If you run long agent sessions across large codebases all day, flat pricing was the better deal and the new model will cost more.

  • Mostly completions, light chat: metered wins. Completions are free and low token use keeps spend small, while a flat plan charged the full fee regardless.
  • Occasional, well-scoped agent tasks: roughly even. Metered spend stays predictable against a fixed flat cost.
  • All-day agent sessions on big repos: flat plan wins. Metered spend runs high and variable; a fixed cost capped the damage.
  • Team with mixed usage: depends on the mix. A pooled allowance plus budget caps can beat per-seat fixed pricing, or lose to it, by where the team's heaviest users land.
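One way to see where heavy usage starts to cost more is to model the monthly bill as the plan price plus any usage above the included allowance. That billing shape is an assumption drawn from the allowance and budget descriptions above, not a GitHub-published formula, and the usage figures are invented.

```python
def monthly_bill_usd(plan_price: float, included_usd: float, usage_usd: float) -> float:
    """Plan price plus any metered usage beyond the included allowance (assumed model)."""
    overage = max(0.0, usage_usd - included_usd)
    return plan_price + overage


# Copilot Pro: 10 dollars per month with 10 dollars of included credits.
light = monthly_bill_usd(plan_price=10, included_usd=10, usage_usd=4)
heavy = monthly_bill_usd(plan_price=10, included_usd=10, usage_usd=40)

print(light)  # 10.0
print(heavy)  # 40.0
```

Under this model a light user never sees more than the plan price, while every dollar of agent usage past the allowance lands directly on the bill unless a budget cap stops it first.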

The honest takeaway: the change rewards discipline. Developers who scope tasks, choose models with intent, and trim context will often pay less than before. Developers who let agents roam a whole repository on a frontier model will pay much more, and the meter makes every step of that habit visible.

Frequently Asked Questions

01. What are GitHub Copilot AI Credits?

GitHub AI Credits are the unit of usage-based Copilot billing introduced June 1, 2026. One credit equals one cent. Credits are spent on token usage, counting input, output, and cached context at each model's published API rate. Paid plans include a monthly credit allowance matched to the plan price.

02. Why did my Copilot credits run out so fast?

Agent mode and chat meter every step. An agent reads files, reasons, acts, and re-reads results in a loop, re-sending growing context each pass. That accumulated input, plus output and retries, compounds fast, so a long session on a frontier model can spend a large share of a monthly quota in hours.

03. Do code completions use AI Credits?

No. Code completions and Next Edit Suggestions stay included on all paid plans and never consume AI Credits. Credits are spent by Copilot Chat, agent mode, code review, and the CLI, which send and receive tokens that convert to credits at per-model rates.

04. How do I estimate the cost of a Copilot agent task?

Multiply expected steps by average context tokens per step by the model's input rate, then add output tokens times the output rate, and convert dollars to credits at one credit per cent. Add a margin for retries. Steps and context size matter more than the length of the final answer.

05. Is Copilot usage-based billing cheaper than the old plan?

It depends on usage. Light users who rely on completions and occasional chat often pay less, since completions are free and token use is low. Heavy agent users running large codebases all day usually pay more. Scoping tasks and choosing smaller models lowers the bill.


Written by Aditya Kumar Jha

Core software engineer at MemX, where he builds the website, backend, and data systems. Also a published author of six books on Amazon KDP, writing on AI, memory, and behavior.
