The Role
You'll spend the term building inside the intelligence layer between our models and the real world. Vecna's Virtual Workers execute long, multi-step operations across hundreds of tools without losing coherence, context, or intent — and you'll work alongside our engineers on the orchestration, memory, tool design, and reasoning architectures that make that possible.
Single-task agents are table stakes. The hard problem is the layer above them: how dozens of agents coordinate, hand off work, share context, recover from failure, and stay aligned with a strategic objective across hours of autonomous operation. Most agent systems collapse somewhere between step 50 and step 200 — context gets corrupted, plans drift, tools fail silently. We're building the system that doesn't, and you'll own a real, scoped piece of it.
This is a hands-on engineering internship: you'll work directly with the founding team and ship code that goes into the platform, not a side project that gets thrown away at the end.
What You'll Work On
You'll take ownership of a focused project within one of these areas:
- Multi-agent orchestration — supervisor and worker patterns, role-based delegation, sub-agent spawning, and the coordination primitives that let agents collaborate without stepping on each other.
- Procedural environment generation — the harnesses that programmatically build synthetic gyms and self-mutating environments.
- Async event bus and message passing — the substrate over which agents publish, subscribe, hand off work, and react to environmental changes, with retries and ordering guarantees that hold up under load.
- Context management — long-horizon coherence through summarization, relevance scoring, pruning, and memory externalization across sessions and worker boundaries.
- Persistent memory and bitemporal graph state — knowledge and spatial graphs that model the operational environment, asset relationships, and cross-session state. They track both when a fact was true and when the system learned it, so agents can reason about state as of any point in time.
- Tool design and execution environments — browser automation, computer use, sandboxed terminals, and code interpreters that agents can invoke safely.
- Self-reflection and recovery — agents that detect their own failures, critique their reasoning mid-task, backtrack, and retry with improved strategies instead of failing silently or compounding a bad plan.
- Self-improvement loops — agents that learn from their own trajectories: mining past runs for what worked, distilling successful strategies into reusable behavior, and getting better at a task across episodes rather than starting cold each time.
- Long-horizon planning and reasoning — OODA-loop execution, dynamic plan decomposition, confidence-gated action selection, and the architectures that keep an agent aligned with a strategic objective across hours of operation and hundreds of steps without drifting.
Who Thrives Here
We care less about pedigree than evidence. Three traits are non-negotiable:
- Hacker mindset — You see systems as something to take apart and understand, not consume as designed. You read the source, poke at edges, and assume there is a faster or more correct way. The instinct to question how things work is the floor.
- Break things, then stabilize them — Robustness in long-horizon agent systems isn't a feature you add at the end; it's the product. You push systems until they fail, study the failure honestly, and rebuild stronger. We don't punish breakage that produces learning — we punish hiding it.
- Try harder — When the obvious approach fails, you find the second, third, and fifth. You re-read the paper, instrument the system, ask a smarter question. Most ceilings here are self-imposed.
You Might Be a Fit If You
- Are pursuing a BS, MS, or PhD in computer science, engineering, or a related field — or are skipping that path and have the repos to back it up.
- Have built something real with LLMs or agents — a project, research prototype, open-source contribution, or hackathon build you can walk us through. We want builder evidence: shipped systems, repos, papers — not slide decks.
- Are strong in Python, comfortable with async, and have written code other people depend on.
- Have exposure to one or more of: distributed systems, event-driven architectures, simulation or procedural environment generation, long-horizon planning or self-reflection loops, graph data models, or tool/API integration.
- Operate with high agency, hold strong opinions you update quickly, and are comfortable in ambiguity.
A lodging stipend is provided for interns based outside Chicago.