Self-Organizing AI Agent Swarms. On Demand.
k8s orchestrates containers. a8s orchestrates AI agents. Inspired by Kubernetes.

Agentnetes

Zero to a Self-Discovering AI Agency.

One goal. A swarm of agents spins up, each in its own sandbox with its own coding harness. They research, build, collaborate, and deliver together.

RLM Pattern · MIT CSAIL
AutoResearch Loop · Karpathy
Two-Tool MCP · search() + execute()
A2A Protocol · Google Agent-to-Agent
$ agentnetes run
$ npm install -g agentnetes
then run on any git repo
$ agentnetes run "add dark mode to this app"
parallel agents · 2 tools per agent (MCP) · thousands of lines explored · one goal to a full team

How It Works

From one goal to a working team.

Type a goal. The system researches, decomposes, builds, and delivers — autonomously.

01 · Goal
$ agentnetes run "Add billing with Stripe"

One prompt. No config, no YAML, no agent definitions.

02 · Decompose
Root Agent · RLM
search → analyze → plan → split
UI Eng
API Eng
DB Eng
Tester

Root agent auto-researches the codebase and invents the right team.

03 · Sandboxes
UI Eng · microVM
API Eng · microVM
DB Eng · microVM
Tester · microVM
Firecracker · Coding harness
04 · Auto-Research
API Engineer · RLM loop 3/5
search···
read···
think···
execute···
verify···

Each agent runs a tight RLM loop — search, read, think, execute, verify — repeating until solved.

05 · A2A Collaborate
Tester
Webhook sig check fails
API Eng
Patched env config — re-run
Tester
12/12 tests passing ✓

Agents coordinate via A2A protocol — no shared memory or context window.

06 · Deliver
Complete · 52s
9 files modified
Stripe webhooks + billing UI
12/12 tests passing
4 agents · 4 sandboxes · 5 RLM loops · 52s

The Problem

Single agents break on real codebases.

Teams have tried everything: single coding agents, prompt-stuffed RAG pipelines, hand-wired MCP tools, elaborate harnesses. They all hit the same wall. Real codebases are too large for one context window, too complex for one agent, and too dynamic for static tool configs. Nobody stopped to ask: what if the agents invented their own team?

Single agent + tools
Cursor, Copilot, custom harnesses
  • Entire codebase stuffed into context
  • Context rot beyond ~50K tokens
  • Sequential: one task at a time
  • One failure halts everything
  • Static tool lists, no emergent roles
Agentnetes
vRLM runtime · RLM pattern
  • Context lives in sandboxes, not prompts
  • Agents explore via grep, find, cat
  • Specialists run in parallel
  • Agents catch and fix each other's errors
  • Roles invented per goal, never hardcoded
Teams using AI tools
Copilot, Cursor, RAG, MCP harnesses
  • Try different coding agent harnesses
  • Hand-wire MCP tools and hope they compose
  • Build RAG pipelines to stuff context
  • Tune prompts and pray context holds
  • Never break the problem down RLM-style

Who Is This For

Built for developers who move fast.

🧑‍💻
Solo developers
Get the leverage of a full engineering team on your personal projects. Ship faster without hiring.
🏗️
Monorepo teams
Point the swarm at your large codebase. Agents explore only what they need · no context limits.
🔐
Security engineers
Spin up a dedicated audit swarm to scan, analyze, and fix vulnerabilities across the entire codebase.
⚙️
Platform engineers
Automate cross-cutting concerns · test coverage, observability, migrations · across all services.

Differentiation

Not another AI coding assistant.

Agentnetes is not autocomplete. It is not a chatbot. It is a swarm that investigates, builds, tests, and delivers.

01
Self-organizing teams
The model invents the roles it needs. A provider task spawns a Scout, Engineer, Tester. A security audit spawns an entirely different team. Nothing is hardcoded.
02
Real code execution
Agents run real shell commands in real sandboxes. Tests actually execute. Build failures get fixed. This is not a simulation of engineering · it is engineering.
03
Any model, same swarm
Swap Gemini for Claude or GPT by changing one env var. The swarm architecture is model-agnostic. Route through Vercel AI Gateway or call Google directly.
04
Two-tool MCP strategy
Each agent has exactly two tools: search() and execute(). ~1,000 token footprint regardless of codebase size. No tool bloat, no context waste.
05
Context stays external
Files never enter the prompt. Agents write code to explore the codebase · grep, find, cat. Proven by the MIT CSAIL RLM paper to outperform context-stuffing 2×.
06
Agents fix each other
When the Tester finds a bug, it routes back to the Engineer automatically. The swarm has a built-in try → test → fix loop that runs until tests pass.

Architecture

One goal. A recursive agent swarm.

The root agent decomposes your goal, invents the right team of specialists, and orchestrates them across isolated sandboxes. Roles are fully emergent. Nothing is hardcoded.

goal: "Add @ai-sdk/deepseek provider"
Root Agent / Tech Lead
Gemini 2.5 Pro via AI Gateway
vRLM orchestrator
🔍
Architecture Scout
Gemini 2.5 Flash
search()
⚙️
Provider Engineer
Gemini 2.5 Flash
execute()
🧪
Test Engineer
Gemini 2.5 Flash
execute()
📦
Package Engineer
Gemini 2.5 Flash
execute()
Firecracker microVM per agent · search() + execute() MCP only · SSE event stream to UI
Emergent team formation
The root agent reads the codebase, understands the task, and invents the right specialist roles. A provider task gets a Scout, Engineer, Tester, and Packager. A security audit gets an entirely different team.
Isolated Firecracker sandboxes
Each agent runs in its own Vercel Sandbox (Firecracker microVM). Pre-warmed from a repo snapshot for near-instant startup. Agents cannot interfere with each other.
Context externalized, not stuffed
Agents do not receive hundreds of files in their prompts. They write code to explore context: grep, find, cat. This is the RLM Pattern from MIT CSAIL, proven 2x more effective.
Agents collaborate at runtime
When the Test Engineer finds a type error, that finding routes back to the Provider Engineer automatically. The vRLM runtime handles inter-agent communication.
Live Preview

Watch the swarm execute

Real output from a simulated run. The actual runtime produces identical event streams.

agentnetes — vRLM runtime
Root orchestrator
Shell execution
Worker agent
Inter-agent collaboration
Completion

Research Foundations

Five ideas. One system.

Agentnetes combines five research-backed patterns that individually improve agent performance and together create something qualitatively different.

MIT CSAIL

RLM Pattern: Context lives in sandboxes

Recursive Language Model runtime. Instead of stuffing files into an agent's context window, context is externalized into a filesystem that agents explore programmatically.

Agents write small shell scripts to grep, find, and read exactly what they need. This keeps token footprints tiny regardless of codebase size and has been shown to outperform naive context-stuffing by 2x on software engineering benchmarks.

agent.execute('grep -r LanguageModelV1 packages/ -l')
Based on: RLM paper, MIT CSAIL — externalized context via code execution
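A minimal sketch of an agent-side RLM loop under this pattern. The `Tools` interface and the step structure are illustrative stand-ins, not the actual runtime API:

```typescript
// Hypothetical tool surface; the real agents expose search() and execute().
interface Tools {
  search: (query: string) => string;   // find a relevant file
  read: (path: string) => string;      // read a candidate file
  execute: (cmd: string) => number;    // run a command, return exit code
}

// One RLM-style loop: search, read, think, execute, verify — capped at maxSteps.
function rlmLoop(goal: string, tools: Tools, maxSteps = 5): boolean {
  for (let step = 0; step < maxSteps; step++) {
    const hit = tools.search(goal);                 // search
    const context = tools.read(hit);                // read
    const plan = `apply fix for: ${goal} given ${context.length} bytes`; // think (stub)
    tools.execute(plan);                            // execute
    if (tools.execute('run verify') === 0) return true; // verify: exit code 0 = solved
  }
  return false; // budget exhausted — escalate to the root agent
}
```

The point of the shape: the codebase never enters the prompt; every iteration pulls only what the current step needs.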
Karpathy

AutoResearch Loop: Try, measure, keep

Agents do not write code and hope for the best. They write code, run tests, measure the result, and either keep the change or discard it and try again.

The Test Engineer runs vitest after every change. Type errors trigger re-implementation. Test failures trigger targeted patches. The loop runs until everything passes or the agent escalates to root.

run tests → check failures → patch → repeat
Based on: AutoResearch pattern — Andrej Karpathy
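The loop can be sketched as a failure-driven fixer. `runTests` and `patch` are hypothetical stand-ins for the real harness (vitest plus the agent's patching step):

```typescript
// Hypothetical harness hooks: runTests returns the names of failing tests,
// patch attempts a targeted fix for one failure.
type RunTests = () => string[];
type Patch = (failure: string) => void;

// Try, measure, keep: re-run the suite after every patch; stop when green,
// or give up after maxAttempts and escalate.
function testFixLoop(runTests: RunTests, patch: Patch, maxAttempts = 10): boolean {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const failures = runTests();            // measure
    if (failures.length === 0) return true; // keep — suite is green
    patch(failures[0]);                     // targeted fix for the first failure
  }
  return false; // escalate to the root agent
}
```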
MCP v2

Two-Tool MCP: search() and execute()

Each agent has exactly two tools exposed via MCP: search() for finding things in the codebase, and execute() for running shell commands in its sandbox.

This keeps the agent's tool surface to roughly 1,000 tokens regardless of task complexity. The agent writes arbitrary code against these two primitives. No tool proliferation. Just two powerful primitives that compose into anything.

tools: [search(), execute()] // ~1000 token footprint
MCP protocol — two-tool minimal surface strategy
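An illustrative, in-memory sketch of the two primitives — not the real MCP wiring or the actual tool implementations. A record of file paths stands in for the sandbox filesystem:

```typescript
// Illustrative sandbox contents (hypothetical paths and code).
const files: Record<string, string> = {
  'src/billing.ts': 'export function charge() {}',
  'src/ui/DarkMode.tsx': 'export const DarkMode = () => null;',
};

// search(): find files whose contents match a pattern.
function search(pattern: string): string[] {
  return Object.keys(files).filter((p) => files[p].includes(pattern));
}

// execute(): run a "command" against the sandbox (only `cat` is modeled here).
function execute(cmd: string): string {
  const [bin, arg] = cmd.split(' ');
  if (bin === 'cat' && arg in files) return files[arg];
  return '';
}
```

Everything else — grep, find, builds, tests — composes out of these two calls, which is why the tool surface stays constant as the codebase grows.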
A2A Protocol

A2A-Ready: Every agent is a publishable service

Every agent Agentnetes spawns generates a standard A2A Agent Card describing its capabilities, skills, and endpoints.

Today these cards are internal. Tomorrow any specialist agent can be published as an independent, discoverable service. A Provider Engineer becomes a reusable service any other system can call.

GET /agents/provider-engineer/.well-known/agent.json
A2A Protocol v1.0 — Google Agent-to-Agent standard
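An illustrative card for a hypothetical Provider Engineer. The field names here loosely follow the A2A Agent Card shape; consult the A2A spec for the authoritative schema:

```typescript
// Illustrative only — a sketch of an Agent Card, not a validated A2A instance.
const agentCard = {
  name: 'provider-engineer',
  description: 'Implements AI SDK provider packages',
  url: 'https://example.com/agents/provider-engineer', // hypothetical endpoint
  version: '1.0.0',
  capabilities: { streaming: true },
  skills: [
    {
      id: 'implement-provider',
      name: 'Implement provider',
      description: 'Adds a new @ai-sdk/* provider package to the repo',
    },
  ],
};
```

Served from a well-known path, a card like this is what lets other systems discover and call the agent without prior coordination.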
Kubernetes-inspired

Load Balancing: Work distributed across the swarm

All specialist agents run concurrently via Promise.allSettled. Work is load-balanced across the agent pool automatically. One agent failing never blocks the others.

maxWorkers caps concurrency like Kubernetes resource limits on a node. Promise.allSettled provides fault-tolerant dispatch: a failing Engineer never kills the Scout or Tester. The swarm delivers what it can regardless of individual failures.

Promise.allSettled(workers.map(task => runWorker(task)))
Kubernetes-inspired: parallel execution + fault-tolerant dispatch
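A batch-based sketch of capped, fault-tolerant dispatch with `Promise.allSettled` — an illustration of the stated pattern, not the runtime's actual scheduler:

```typescript
// Dispatch tasks in batches of at most maxWorkers — the cap acts like a
// Kubernetes resource limit. allSettled means a rejected worker never takes
// down its batch-mates; the caller sees every outcome.
async function dispatch<T, R>(
  tasks: T[],
  runWorker: (task: T) => Promise<R>,
  maxWorkers = 6,
): Promise<PromiseSettledResult<R>[]> {
  const settled: PromiseSettledResult<R>[] = [];
  for (let i = 0; i < tasks.length; i += maxWorkers) {
    const batch = tasks.slice(i, i + maxWorkers);
    settled.push(...(await Promise.allSettled(batch.map((t) => runWorker(t)))));
  }
  return settled;
}
```

A true worker pool would start a new task the moment any slot frees up; batching is the simplest form that still bounds concurrency and isolates failures.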
Stack

Built on the bleeding edge

Every layer is the latest available. No legacy versions. No compatibility shims.

AI Runtime
ai (Vercel AI SDK)
v7.0.0-beta.33
ToolLoopAgent
beta agent primitive
@ai-sdk/google
v3.0.52
ToolLoopAgent is the core primitive in AI SDK v7 beta. Each worker runs two MCP tools: search() and execute().
Sandbox
Docker
node:20-alpine · local default
@vercel/sandbox
Firecracker microVMs
Local shell
no-install fallback
Docker is the default sandbox for local runs. Vercel Firecracker available for cloud self-hosting.
Models
Gemini 3.1 Pro
planner · latest
Gemini 2.5 Flash
planner + worker · default
Gemini 3.1 Flash-Lite
worker · fastest
Full Gemini 2.0 · 2.5 · 3.x lineup supported. Separate planner and worker models configurable in the UI.
lib/vrlm/runtime.ts
// Each worker: two tools, one sandbox, emergent role
const agent = new ToolLoopAgent({
  model: google('gemini-2.5-flash'),
  tools: { search, execute },
  stopWhen: stepCountIs(40),
  instructions: buildWorkerPrompt(task),
});
// Start the agent (beta API shape), then drive the tool loop;
// events stream to the UI via SSE
const result = agent.stream({ prompt: task.goal });
for await (const _ of result.fullStream) { ... }
Workflow

One sentence. Dynamic team. Real results.

From goal to dynamic team formation to delivered artifacts in isolated sandboxes.

01

You give a goal

Type what you want built. Not instructions on how. Just what. Agentnetes figures out the rest. The goal can reference a GitHub repo or an uploaded codebase.

02

Root agent explores

The Tech Lead spawns a Firecracker sandbox pre-warmed with the target repo. It uses grep, find, and cat to map the architecture. Context lives in the filesystem, not the prompt.

03

Team self-assembles

Based on what it finds, the root agent invents the team. Roles, goals, and dependencies are all emergent. A provider task gets different specialists than a security audit.

04

Agents work in parallel

Each specialist runs concurrently in its own Firecracker microVM. They explore, write code, run tests, and fix failures. No sequential bottlenecks.

05

Agents collaborate

Test failures and findings route back to the right specialist automatically. The vRLM runtime handles inter-agent communication. No human needed to relay.

06

Synthesis and delivery

The root agent collects all artifacts, verifies completeness, and streams a structured summary to the UI with every generated file. Each agent publishes an A2A card.

Runtime

vRLM: The Orchestration Runtime

Virtual Recursive Language Model Runtime. The engine between your goal and the agents. Inspired by the RLM pattern from MIT CSAIL.

01Plan

Root agent explores the repo with grep and find, then calls the Gemini planner to invent a specialist team. Roles are fully emergent. Nothing is hardcoded.

02Execute

Workers run in parallel. Each gets an isolated Docker container with the repo pre-cloned, and two tools: search() to grep the codebase and execute() to run any shell command.

03Synthesize

When all workers complete, the root agent reads their findings and artifacts, then produces a structured summary. Every generated file is collected and streamed to the UI.

Event Stream · SSE

Every phase emits typed events over Server-Sent Events. The UI subscribes and renders agent activity in real time with no polling.

task-created · new agent spawned
task-updated · status or progress change
task-completed · agent finished with artifacts
task-failed · agent error
finding · agent discovered something
terminal · shell command + output
artifact · file produced by an agent
collaboration · inter-agent finding shared
synthesis · root agent final summary
done · run complete
error · runtime error
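A subset of those event names modeled as a discriminated union; the payload shapes are illustrative guesses, not the runtime's actual types:

```typescript
// Illustrative event payloads — field names beyond `type` are assumptions.
type VrlmEvent =
  | { type: 'task-created'; agent: string }
  | { type: 'task-completed'; agent: string; artifacts: string[] }
  | { type: 'task-failed'; agent: string; error: string }
  | { type: 'finding'; agent: string; text: string }
  | { type: 'done' };

// A UI can fold the stream into a live summary — no polling required.
function summarize(events: VrlmEvent[]): { spawned: number; completed: number; failed: number } {
  const s = { spawned: 0, completed: 0, failed: 0 };
  for (const e of events) {
    if (e.type === 'task-created') s.spawned++;
    else if (e.type === 'task-completed') s.completed++;
    else if (e.type === 'task-failed') s.failed++;
  }
  return s;
}
```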
lib/vrlm/types.ts
interface VrlmConfig {
  maxWorkers: number;        // default 6
  maxStepsPerAgent: number;  // default 20
  plannerModel: string;      // orchestrator
  workerModel: string;       // specialists
  repoUrl: string;           // cloned per agent
  sandboxProvider: 'docker' | 'vercel' | 'e2b' | 'daytona' | 'local';
  googleApiKey?: string;     // UI override
}
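A small helper sketch (hypothetical, not part of the package) that fills a partial config with the defaults noted in the interface comments; the model names are illustrative:

```typescript
type SandboxProvider = 'docker' | 'vercel' | 'e2b' | 'daytona' | 'local';

interface VrlmConfig {
  maxWorkers: number;
  maxStepsPerAgent: number;
  plannerModel: string;
  workerModel: string;
  repoUrl: string;
  sandboxProvider: SandboxProvider;
  googleApiKey?: string;
}

// Apply the documented defaults (maxWorkers 6, maxStepsPerAgent 20,
// Docker as the local sandbox) to a partial config. Only repoUrl is required.
function withDefaults(partial: Partial<VrlmConfig> & { repoUrl: string }): VrlmConfig {
  return {
    maxWorkers: partial.maxWorkers ?? 6,
    maxStepsPerAgent: partial.maxStepsPerAgent ?? 20,
    plannerModel: partial.plannerModel ?? 'gemini-2.5-pro',  // illustrative
    workerModel: partial.workerModel ?? 'gemini-2.5-flash',  // illustrative
    repoUrl: partial.repoUrl,
    sandboxProvider: partial.sandboxProvider ?? 'docker',    // local default
    googleApiKey: partial.googleApiKey,
  };
}
```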
Get Started

Three ways to run

From instant browser demo to full local execution with Docker sandboxes

no setup
Simulation

Watch the full agent lifecycle in your browser. No API key, no Docker, no install. Pre-scripted scenarios replay real event sequences.

Open /demo → toggle Simulation → watch
Try Demo
recommended
Self-host · Local

Clone the repo and run locally. Both simulation and real agent execution available. Configure your Google API key and target repo directly in the UI. No .env file needed.

$ git clone · npm install
$ docker pull node:20-alpine
$ npm run dev → open /demo
Read the docs
CLI
CLI · Any Repo

Install the npm package globally and run against any local git repo. Requires Docker running locally and a Google API key.

$ npm install -g agentnetes
$ export GOOGLE_API_KEY=...
$ agentnetes run "your goal"
npm package
Models

Any model. Same swarm.

Bring your own Google API key and swap models from the UI. No config file needed. Separate planner and worker models let you balance quality vs cost.

Gemini 3.x, 2.5, 2.0 all supported
Separate planner and worker models
BYOK: paste API key directly in the UI
Claude / GPT-4o support · soon
Gemini 3.1 Pro
Planner · latest
new
Gemini 2.5 Flash
Planner + Worker · default
recommended
Gemini 3.1 Flash-Lite
Worker · fastest
fast
Gemini 2.0 Flash
Worker · budget
createGoogleGenerativeAI({
  apiKey: process.env.GOOGLE_API_KEY,
});
Demo

See it in action

One goal in, a swarm of self-discovering agents out. Watch the full pipeline from goal to delivered code.

Agentnetes
Try it now

See the swarm execute

Give it a goal. Watch specialist agents spawn, explore, implement, test, and deliver inside isolated sandboxes. Under 90 seconds.