Vecna AI · Applied Scientist Intern · Chicago, IL · Intern

Applied Scientist Intern

About Vecna AI

Vecna is an early-stage, VC-backed team building agent orchestration infrastructure for autonomous, long-horizon operations. We work on the hard parts (coordination, memory, reasoning, and recovery) at the edge of what current agent systems can do. We are headquartered in Chicago.

Description

The Role

You'll spend the term working on the models that sit at the core of the platform. Vecna's models execute multi-step task chains across complex environments, route to the right tools and reasoning strategies, and improve from real-world trajectories. You'll work alongside our research team on a focused piece of the post-training, evaluation, or serving stack.


The hard part of training agents isn't fitting a curve to a benchmark — it's training models that stay coherent across hundreds of tool calls in environments they've never seen, reason over structured context that doesn't fit in any prompt, and recover when their plan breaks at step 173. Static benchmarks don't measure this. Static datasets don't teach it. So we generate our own: procedurally generated synthetic gyms and complex, self-mutating environments that change under the agent's feet. We post-train on real trajectories, reward at the trajectory level, and evaluate against environments that fight back.


This is a research-heavy internship working directly with the founding team. Your experiments and infrastructure feed into how our models get trained and evaluated; this is not a sandbox project that gets shelved at the end. We create pathways for our team to publish and contribute to the community; credibility in this field is earned by advancing it.


What You'll Work On

You'll take ownership of a focused research project within one of these areas:


  • Post-training and alignment — SFT, DPO, GRPO, and trajectory-level rewards over real agent runs, built on a distributed training stack designed for long-horizon, tool-using behavior.
  • Procedural environment generation — synthetic gyms and self-mutating environments built programmatically rather than hand-authored: domain randomization, non-stationary dynamics, and adversarial perturbations that span task distributions no fixed benchmark could cover, with reward shaping that resists reward hacking.
  • Agentic online RL — models that learn from live tool interactions and trajectory outcomes rather than static datasets, with reward shaping that captures partial progress, recovery quality, and strategic coherence.
  • Bitemporal context graphs as model substrate — how models read from, write to, and reason over persistent graphs that track both when a fact was true in the world and when the system learned it, including subgraph retrieval at inference time and graph-aware prompt construction.
  • Graph-based reasoning — path traversal, link prediction, and next-hop selection across complex topologies, and how models learn to plan multi-step trajectories through graph-structured state.
  • Evaluation infrastructure — offline benchmarks, LLM-as-judge frameworks, trajectory scoring, red-teaming, and automated regression testing that catches capability drift before it ships.
  • Inference optimization and serving — quantization, speculative decoding, KV-cache management, and deployment tuned for agentic, long-context workloads.


Who Thrives Here

We care less about pedigree than about evidence. Three traits are non-negotiable:


  • Hacker mindset — You see systems as something to take apart and understand, not consume as designed. You read the source, poke at edges, and assume there is a faster or more correct way. The instinct to question how things work is the floor.
  • Break things, then stabilize them — Robustness in long-horizon agent systems isn't a feature you add at the end; it's the product. You push systems until they fail, study the failure honestly, and rebuild stronger. We don't punish breakage that produces learning — we punish hiding it.
  • Try harder — When the obvious approach fails, you find the second, third, and fifth. You re-read the paper, instrument the system, ask a smarter question. Most ceilings here are self-imposed.


You Might Be a Fit If You

  • Are pursuing an MS or PhD in ML, CS, or a related field — or have equivalent research depth earned outside that path.
  • Have done real research in one or more of: post-training, reinforcement learning, alignment, evaluation, or agentic systems — coursework, a publication, a preprint, or a substantial project you can walk us through. We want builder evidence: repos, papers, trained models — not slide decks.
  • Are proficient with PyTorch and have trained or fine-tuned models, even if not yet at production scale.
  • Have exposure to one or more of: RLHF/DPO/GRPO, LLM evaluation, procedural environment or task generation, graph-structured context (retrieval over knowledge/memory graphs, GNNs), or inference optimization.
  • Reason from first principles, are intellectually honest about what you don't know, and treat being wrong as information rather than failure.


A lodging stipend is provided for interns based outside Chicago.