The road to COMP4020: token management by proxy

22 Apr 26

comp4020

Tip

This post is part of a series I’m writing as I develop COMP4020: Agentic Coding Studio. See all posts in the series. This one is a direct follow-up to managing the strategic token reserve—now that Anthropic have come to the party, we can design the quota-enforcement tooling against the actual controls their API provides.

So: we have a $500k pool of Claude credits sitting behind a workspace-level API key, and we have ~200 students who need to share that pool without eating each other’s lunch. The Anthropic Admin API will get us part of the way—you can create and revoke keys programmatically, set monthly workspace spend caps in the Console, pull usage reports—but the controls are too coarse for what we actually want to do: per-student weekly token allocations with predictable resets and optional carryover, an audit log with enough detail to give the course policy around token use some actual teeth, and a safety net that catches leaked API keys in plaintext before they escape onto the open internet.

The only way to get all of that is to put a proxy between the students’ Claude Code sessions and the Anthropic API, and enforce the class-specific policy there. Happily, the School of Computing’s infrastructure team has signed on to build and host it—this isn’t a solo project, and the scope gets a lot more realistic with their help. The thing has four jobs.

Authentication and transparent passthrough. One real Anthropic API key (tied to our COMP4020 workspace) on the egress side. On the ingress side, each student has their own virtual API key—issued by us, revocable by us, completely separate from Anthropic’s auth system. Students point their Claude Code config at the proxy with their virtual key; from their side, it looks like the Anthropic API. Claude Code talks to /v1/messages and gets native Anthropic-shaped responses back; streaming, tool use, and prompt caching all work unmodified.
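From the student’s side, the setup might look something like the following—though the proxy URL here is invented, and whether we distribute this as environment variables or via Claude Code’s settings file is still an open choice:

```shell
# Point Claude Code at the class proxy instead of api.anthropic.com.
# The URL is a placeholder; the virtual key is the one we issue.
export ANTHROPIC_BASE_URL="https://claude-proxy.comp.example.edu.au"
export ANTHROPIC_AUTH_TOKEN="vk-<your-virtual-key>"
```

(`ANTHROPIC_AUTH_TOKEN` sends the key as a bearer token, which is what a LiteLLM-style proxy expects; `ANTHROPIC_API_KEY` would send it as an `x-api-key` header instead.)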

Per-student quotas with time-based reset. The proxy counts tokens against each student’s allocation, stops serving requests when they hit their limit, and resets on whatever cadence we settle on—probably weekly, probably with some carryover, but that’s still an open question.
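The accounting itself is simple enough to sketch. Everything below is illustrative—the allocation and carryover numbers are placeholders, not the real policy, and the real thing would live in the proxy’s database rather than in memory:

```python
# Sketch of the per-student quota check: weekly reset, capped carryover.
# WEEKLY_ALLOCATION and CARRYOVER_CAP are invented placeholder numbers.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

WEEKLY_ALLOCATION = 2_000_000   # tokens per student per week (placeholder)
CARRYOVER_CAP = 500_000         # max unused tokens that roll forward

@dataclass
class Quota:
    balance: int                # tokens remaining this period
    period_start: datetime

    def maybe_reset(self, now: datetime) -> None:
        # Roll the period forward, carrying at most CARRYOVER_CAP tokens.
        while now - self.period_start >= timedelta(days=7):
            carryover = min(max(self.balance, 0), CARRYOVER_CAP)
            self.balance = WEEKLY_ALLOCATION + carryover
            self.period_start += timedelta(days=7)

    def try_spend(self, tokens: int, now: datetime) -> bool:
        # True if the request fits within the remaining allocation.
        self.maybe_reset(now)
        if self.balance < tokens:
            return False
        self.balance -= tokens
        return True
```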

Full-traffic logging to a local database. Every request, every response, tied to the student identity that the virtual key resolves to. This is the foundation for the audit trail and (with consent) the research corpus, which I’ll come back to.
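As a rough shape for that database—SQLite here purely for illustration (a real build would presumably sit on Postgres), and every column name a placeholder rather than a final schema:

```python
# Minimal sketch of the proxy-side request log. Column names are
# placeholders; the point is the student-identity join and the full bodies.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE request_log (
        id INTEGER PRIMARY KEY,
        ts TEXT NOT NULL,              -- request timestamp (ISO 8601)
        student_id TEXT NOT NULL,      -- resolved from the virtual key
        virtual_key TEXT NOT NULL,
        model TEXT,
        input_tokens INTEGER,
        output_tokens INTEGER,
        request_body TEXT,             -- full JSON, for the audit trail
        response_body TEXT
    )
""")

def log_request(row: dict) -> None:
    conn.execute(
        "INSERT INTO request_log (ts, student_id, virtual_key, model, "
        "input_tokens, output_tokens, request_body, response_body) "
        "VALUES (:ts, :student_id, :virtual_key, :model, "
        ":input_tokens, :output_tokens, :request_body, :response_body)",
        row,
    )

log_request({
    "ts": "2026-04-22T09:00:00Z",
    "student_id": "u1234567",
    "virtual_key": "vk-demo",
    "model": "claude-sonnet",
    "input_tokens": 1200,
    "output_tokens": 350,
    "request_body": json.dumps({"messages": "..."}),
    "response_body": json.dumps({"content": "..."}),
})
```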

Leaked-credential detection. If a student pastes their virtual key into a prompt—or any secret matching a known pattern like sk-ant-api...—the proxy detects it, auto-suspends the offending key, and alerts us. Accidents happen, especially when students are new to agentic workflows. Better to catch them before the key ends up in a public GitLab repo.
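The scanning pass is a pattern match over outbound prompt text. A minimal sketch, assuming two patterns—Anthropic-style keys, plus whatever prefix we end up minting virtual keys with (`vk-` here is invented):

```python
# Sketch of the secret-scanning pass. The exact regexes are illustrative;
# the virtual-key prefix "vk-" is a hypothetical choice, not decided.
import re

SECRET_PATTERNS = [
    re.compile(r"sk-ant-api[A-Za-z0-9\-_]{8,}"),  # Anthropic-style API keys
    re.compile(r"vk-[A-Za-z0-9]{16,}"),           # our virtual keys (hypothetical prefix)
]

def find_leaked_secrets(text: str) -> list[str]:
    """Return any substrings of `text` that look like credentials."""
    return [m.group(0) for p in SECRET_PATTERNS for m in p.finditer(text)]
```

A hit would trigger the auto-suspend and the alert; the scan itself is cheap enough to run on every request.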

Of those four jobs, the logging is the piece that takes the most thought, because it cuts in a few directions at once. The simple answer is that it’s the enforcement mechanism. Course policies—use only for coursework, no on-selling, no harassment, no circumventing academic integrity—are only as real as our ability to check that they’re being honoured. Students will be told this explicitly, at the start of the course: traffic through the class proxy is logged. If they want to use Claude Code outside class for personal projects, nothing stops them; that’s their business, on their own Anthropic key, not ours.

It also dovetails with the assessment design I wrote about last week. Students are already handing in their Claude Code JSONL session logs as part of each assignment—those logs live on the student’s machine and capture the full local harness state (their CLAUDE.md, subagent dispatches, slash command expansions, and so on). The proxy-side logs are a server-side counterpart. They don’t replace the JSONL logs, but they do make certain claims checkable that the client-side logs alone don’t. A student can’t quietly delete a JSONL and tell me a different story about what happened; the proxy saw the traffic.

With consent and anonymisation, aggregated proxy logs also become a research corpus. What does the token-usage curve look like across a 200-student cohort working on the same weekly provocation? When do students hit context limits? What does session activity look like in the hours right before the aha moment?

That leaves one practical question: how much of this do we actually have to build from scratch? LiteLLM is the obvious candidate—the Claude Code docs themselves point at it as a supported LLM gateway. It’s MIT-licensed, self-hostable, and Python. Virtual keys, spend tracking, and Postgres-backed logging are all first-class. More importantly, there’s an Anthropic-native passthrough at /anthropic/v1/messages that lets Claude Code talk the native protocol rather than being coerced through an OpenAI-compatible translation layer. That last bit matters more than it sounds—a proxy that makes Claude Code work almost like the real thing is worse than no proxy at all.

For roughly 80% of the brief, it’s a drop-in: virtual key CRUD via /key/generate, /key/block, /key/delete; the Anthropic-native passthrough; dollar-denominated budgets with budget_duration set to “7d” or “1mo” that reset automatically; and a LiteLLM_SpendLogs Postgres table capturing per-request metadata.
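Provisioning a cohort’s keys would then be a loop over a payload like this—parameter names follow my reading of the LiteLLM proxy docs, and the budget figure and metadata shape are placeholders:

```python
# Sketch of the per-student payload for LiteLLM's /key/generate endpoint.
# Parameter names are from the LiteLLM docs as I read them; the dollar
# figure and metadata are illustrative.
def build_key_request(student_id: str, weekly_budget_usd: float) -> dict:
    return {
        "key_alias": f"comp4020-{student_id}",
        "max_budget": weekly_budget_usd,   # dollar-denominated cap
        "budget_duration": "7d",           # resets automatically each week
        "metadata": {"course": "COMP4020", "student_id": student_id},
    }

# The payload would be POSTed to the proxy with the admin (master) key:
#   requests.post(f"{PROXY_URL}/key/generate",
#                 headers={"Authorization": f"Bearer {MASTER_KEY}"},
#                 json=build_key_request("u1234567", 25.0))
```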

The remaining 20% is where it gets interesting, because it’s the class-specific policy stuff that probably should be ours: token-denominated (rather than dollar-denominated) allocations with the carryover rules, the leaked-credential scanning and auto-suspend, and the mapping from virtual keys back to university identities for the audit trail.

Bottom line: LiteLLM for the plumbing, custom hooks for the policy. The alternative—hand-rolling the whole thing in, say, Elixir or Go—would mean reimplementing the virtual-key lifecycle, the Anthropic passthrough, the spend accounting, and the admin endpoints. That’s not work anyone wants to sign up for when a reasonable baseline already exists. The plan is to start with LiteLLM and write the class-specific bits on top, falling back to hand-rolling only if we hit a wall we can’t climb with a custom callback.
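The natural home for that policy layer is LiteLLM’s custom-callback mechanism. The sketch below shows the shape without the dependency: in a real deployment this would subclass `litellm.integrations.custom_logger.CustomLogger` and be registered in the proxy config, and the hook signature mirrors `async_pre_call_hook` as I understand it—check the current docs before relying on any of this:

```python
# Sketch of a class-specific pre-call hook, LiteLLM-style. Written as a
# plain class here so the shape is visible without the litellm dependency;
# the signature is my reading of async_pre_call_hook, not gospel.
class Comp4020PolicyHook:
    async def async_pre_call_hook(self, user_api_key_dict, cache, data, call_type):
        # `data` is the inbound request body; scan the prompt text for
        # anything that looks like a leaked credential before it goes out.
        for message in data.get("messages", []):
            content = message.get("content", "")
            if isinstance(content, str) and "sk-ant-api" in content:
                # In the real hook: suspend the virtual key and alert staff.
                raise ValueError("leaked credential detected; key suspended")
        return data
```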

There are a few things I’m still thinking through—the reset cadence and carryover rules chief among them—and no doubt more questions will surface once the team sits down to actually build the thing.

Cite this post
@online{swift2026comp4020TokenManagementByProxy,
  author = {Ben Swift},
  title = {The road to COMP4020: token management by proxy},
  url = {https://benswift.me/blog/2026/04/22/comp4020-token-management-by-proxy/},
  year = {2026},
  month = {04},
  note = {AT-URI: at://did:plc:tevykrhi4kibtsipzci76d76/site.standard.document/2026-04-22-comp4020-token-management-by-proxy},
}