Ask Ben (powered by Gemma 4)

3 Apr 26

aiweb

Google just released the Gemma 4 model family—their most capable open models yet, and the smallest variants are small enough to run entirely in your browser. The E2B model has 2.3 billion effective parameters (5.1B total), where the “E” stands for “efficient”—it uses a technique called Per-Layer Embeddings that reduces compute at inference time while keeping the model’s full representational capacity.1

So naturally I had to see if I could get it running on this site. The widget below loads the LiteRT web build of Gemma 4 E2B via Google’s MediaPipe LLM Inference API and WebGPU, then primes it with content from this site—bio, CV, research interests, and blog post titles—as its system prompt. You can ask it questions about me and my work, and it’ll do its best to answer from that context.

Warning

Some caveats, because they matter. This is a 2.3B parameter language model, not actually me. It will hallucinate. It will get things wrong. It may confidently tell you I have opinions I don’t hold or have done things I haven’t done. Treat it as a fun experiment in on-device inference, not as a reliable source of information about me.

It requires a desktop browser with WebGPU support (Chrome, Edge, Safari 17+, or Firefox 141+), and the initial model download is around 2 GB—though it’s cached in your browser after the first visit. Speed depends on your GPU—on a decent discrete GPU it should be fairly responsive, but on integrated graphics expect it to be slower. This is a 2B model doing inference in your browser, not a cloud API.

#Footnotes

  1. Each decoder layer gets its own small embedding table rather than sharing a single large one. This means the total parameter count is higher than the effective count, but the model only activates the embeddings it needs for each layer. Clever trick.

Cite this post
@online{swift2026askBenPoweredByGemma4,
  author = {Ben Swift},
  title = {Ask Ben (powered by Gemma 4)},
  url = {https://benswift.me/blog/2026/04/03/ask-ben-powered-by-gemma-4/},
  year = {2026},
  month = {04},
  note = {AT-URI: at://did:plc:tevykrhi4kibtsipzci76d76/site.standard.document/2026-04-03-ask-ben-powered-by-gemma-4},
}