The road to COMP4020: what's the theory here?

Ben Swift

30 Mar 26

Tip

This post is part of a series I’m writing as I develop COMP4020: Agentic Coding Studio. See all posts in the series.

I was chatting with the CIO of a government agency last week and they asked about this course. They’re wrestling with this very issue: in the age of coding agents, how do I structure my teams, development workflows and QA processes?

The answer I gave them, which I believe to be true in my bones, is that nobody really knows for sure. But a lot of folks are trying to figure it out: some quietly within their organisations, some very loudly on (ugh!) LinkedIn, and lots in between¹.

The question was really one of theory. Many in the hard sciences would criticise software development/engineering for the lamentably loose relationship between its theory (agile! scrum! 10x developers!) and what works in practice, but at least there are theories about what works and what doesn’t, and there’s enough agreement about what those theories are for people to write textbooks and run university degrees. Since Claude Code was released in May 2025, though, you can feel the ground shifting.

Here’s my attempt at a survey. Some of these are genuinely impressive; others I’m still not sure about. I’ve split them into rough categories, though there’s plenty of overlap—the best frameworks come with tools, and the best tools embody a theory about how work should flow.

Info

None of the resources below are “getting started” guides—if you haven’t actually used any of these tools yet, each vendor has official onboarding docs: Claude Code, Codex CLI, Gemini CLI, and GitHub Copilot CLI. These are practical “here’s how to install and use our thing” resources, not theories about how agentic coding should work—but they’re worth running through before diving into the methodological debates below.

#Frameworks and methodologies

These are the structured approaches—the ones with a name, a thesis, and (usually) a manifesto. They’re trying to answer “how should we work with agents?” rather than just “how do we work with agents?”

John Regehr’s zero degree-of-freedom approach—constrain the agent so tightly that there’s only one correct output
Simon Willison’s Agentic Engineering guide—ongoing, “kind of book-shaped”, and probably the most comprehensive single resource right now
spec-driven development—Harper Reed’s “hero’s journey” post is the canonical intro, and the thesis is blunt: “the spec is the godhead”
Jesse Vincent’s Superpowers—enforces mandatory design, planning, and TDD via composable markdown “skills” that you bolt onto your agent
Anthropic’s own practices, documented in their Claude Code best practices guide and the how Anthropic teams use Claude Code blog post
Birgitta Böckeler’s harness engineering—a framework of “guides” (feedforward controls) and “sensors” (feedback controls) for steering coding agents toward better output while reducing human supervision
the Deer Valley retreat consensus, written up by Martin Fowler—about 50 luminaries (Beck, Yegge, Gene Kim, etc.) locked in a room hashing out the future of software development. Chad Fowler’s framing: “the rigour has to go somewhere”
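
To make one of these ideas concrete, here is a minimal sketch of the “guides” and “sensors” split from the harness engineering entry above. This is my own toy illustration, not code from Böckeler’s article: the `Harness` class, the `needs_docstring` sensor and the `toy_agent` stand-in are all hypothetical names, and a real harness would call an actual model and run real checks (tests, linters, type checkers) instead.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical types for illustration only; none of these names come from
# any real agent SDK.
Agent = Callable[[str], str]            # prompt in, candidate code out
Sensor = Callable[[str], Optional[str]] # returns a complaint, or None if happy


@dataclass
class Harness:
    guides: list[str]      # feedforward: instructions given before the agent acts
    sensors: list[Sensor]  # feedback: checks run on what the agent produced
    max_attempts: int = 3

    def run(self, agent: Agent, task: str) -> tuple[str, int]:
        """Return (output, attempts used); raise if the sensors never pass."""
        prompt = "\n".join([*self.guides, task])
        for attempt in range(1, self.max_attempts + 1):
            output = agent(prompt)
            findings = (sensor(output) for sensor in self.sensors)
            complaints = [msg for msg in findings if msg is not None]
            if not complaints:
                return output, attempt
            # Feed sensor findings back to the agent instead of asking a human.
            prompt = "\n".join(
                [*self.guides, task, "Fix these problems:", *complaints]
            )
        raise RuntimeError(
            f"sensors still failing after {self.max_attempts} attempts"
        )


def needs_docstring(code: str) -> Optional[str]:
    """A deliberately crude sensor: complain if there is no docstring."""
    return None if '"""' in code else "add a docstring"


def toy_agent(prompt: str) -> str:
    # Stand-in for a real model call: it only adds a docstring when told to.
    if "add a docstring" in prompt:
        return 'def add(a, b):\n    """Add two numbers."""\n    return a + b\n'
    return "def add(a, b):\n    return a + b\n"


harness = Harness(
    guides=["Write small, pure functions."],
    sensors=[needs_docstring],
)
result, attempts = harness.run(toy_agent, "Write add(a, b).")
```

Here the toy agent fails the sensor on its first attempt and passes on its second, with no human in the loop. The point of the sketch is only the shape: guides go in before the agent acts, sensors judge what comes out, and the loop decides when a person needs to look.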

#Tools

Things you can actually pip install or npm install and try right now. These overlap with the frameworks above—the best tools are opinionated about workflow.

Deciduous—decision trees for AI coding agents, so your agent’s choices are queryable and persistent
Chainlink—a CLI issue tracker designed specifically for AI agent workflows
Gas Town—Steve Yegge’s multi-agent orchestrator for running 20–30 Claude Code instances in parallel (named after Mad Max, naturally)
GitHub’s spec-kit—an official toolkit for spec-driven development, agent-agnostic
Plandex—plan-first CLI agent with version-controlled plans and a sandbox for reviewing diffs before applying
Jeremy Howard’s Solveit—explicitly “the opposite of vibe coding”, all about small steps and deep understanding

#Reflections from practitioners

And then there’s the “I tried it and here’s what I think” genre. These are valuable precisely because the authors have enough credibility and experience to say something beyond “wow, cool”—though several of them do also say “wow, cool”.

Andrej Karpathy coined “vibe coding” and later refined his thinking toward “agentic engineering”
Ryan Dahl (Node.js, Deno) declared “the era of humans writing code is over”—2.3 million views and counting
Mitchell Hashimoto documented the full arc from scepticism to productive use
Charity Majors argues 2025 was for AI what 2010 was for cloud—but writing code was always the easy part; observability and ops are where it gets real
Armin Ronacher—practical agentic coding recommendations from the Flask/Sentry creator
Maggie Appleton on agent orchestration patterns and why design and critical thinking are the new bottlenecks, not code generation
Cassidy Williams found vibe coding effective but joyless—“there’s no ‘YAY I am a GENIUS’ feeling”
Kent Beck on TDD as counterbalance to AI agents (“agents keep trying to delete tests to make them pass”)
antirez (Redis creator) on “automatic programming” vs vibe coding—“LLMs are good amplifiers and bad one-man-band workers”
Jeremy Howard—sceptical of autonomous agents, built Solveit as the antidote
Paige Bailey on AI developer tools at Google and why Gemini 3 is built for “acting and coding”, not just chatting
DHH flipped from sceptic to enthusiast and made it look dramatic
Donald Knuth opened with “Shock! Shock!” after Claude solved an open graph theory problem he’d been working on for weeks
Terry Tao on AI-assisted mathematical exploration at scale
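
Kent Beck’s complaint above (agents deleting tests to make them pass) suggests one cheap check a team could run on every agent-authored change. The sketch below is an assumption of mine, not something Beck proposes: `collect_test_names` and `deleted_tests` are hypothetical helpers, and they only look at Python functions whose names start with `test_`.

```python
import ast


def collect_test_names(source: str) -> set[str]:
    """Collect the names of functions starting with test_ in Python source."""
    tree = ast.parse(source)
    return {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.name.startswith("test_")
    }


def deleted_tests(before: str, after: str) -> set[str]:
    """Names of tests present before the agent's change but missing after it."""
    return collect_test_names(before) - collect_test_names(after)


# Toy example: the "agent" made the suite pass by removing test_sub.
before = (
    "def test_add():\n    assert 1 + 1 == 2\n\n"
    "def test_sub():\n    assert 2 - 1 == 1\n"
)
after = "def test_add():\n    assert 1 + 1 == 2\n"
missing = deleted_tests(before, after)
```

This is crude on purpose: a renamed test counts as deleted, and a test whose assertions were quietly weakened slips straight through. It catches only the bluntest version of the problem.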

So the issue isn’t so much that there’s no theory for agentic coding, but that there are lots of nascent (and unverified) theories and it’s hard to know which ones are legit.

But I think this is an opportunity for my class; I’ll have 100–200 (maybe more! estimating student numbers is hard) switched-on final-year and postgraduate students to try out these theories and see what works. One of the weekly provocations is explicitly about that: finding one theory, using it to build a prototype, and reporting the results back to the class.

What will we find? Who knows? The models will also be six months further on by the end of the course, so the strengths/weaknesses and bottlenecks may shift further from where they are now. But I’ll share the results of this experiment on this blog—stay tuned.

¹ I mean, here I am writing a blog series about this course, so I can’t exactly claim to be one of the quiet ones.

Cite this post

@online{swift2026comp4020WhatsTheTheoryHere,
  author = {Ben Swift},
  title = {The road to COMP4020: what's the theory here?},
  url = {https://benswift.me/blog/2026/03/30/comp4020-whats-the-theory-here/},
  year = {2026},
  month = {03},
  note = {AT-URI: at://did:plc:tevykrhi4kibtsipzci76d76/site.standard.document/2026-03-30-comp4020-whats-the-theory-here},
}
