Skip to content

Glossary

Agent

An agent is a piece of software, written by a Vivaria user, that takes actions to complete a task. Most agents use one or more LLMs to decide which action to take next. Agents often take actions like running shell commands, executing Python code, and editing files. Agents can also contain more complicated logic, like generating multiple possible next actions and choosing between them.

In Vivaria, agents must have a Python file called main.py that acts as an entrypoint. Otherwise, users can write agents in any programming language, using any libraries, as long as the agent can run on a Debian 12 system.

Task

Under the METR Task Standard, a task is Python code that specifies:

  1. A computational environment for the agent to interact with (e.g. a Docker container)
  2. Instructions for the agent
  3. Optionally, code for automatically scoring the agent's submission and the state of the computational environment

Task environment

A task environment is an instantiation of a particular METR Task Standard task definition. The viv CLI has commands for creating and interacting with task environments, under viv task.

Run

A run is when a user creates a task environment and start an agent in it, using the viv run command.

Agent container

"Agent container" is a term for a task environment in which an agent is running or has run.

Agent branch

Vivaria supports running multiple agents inside the same agent container. This lets users quickly start up new instances of a particular agent, with different settings or prompts, to explore how the agent would behave in that situation. These agents aren't supposed to interact with each other, although they can theoretically, e.g. by writing files to the agent container's filesystem.

(At METR, we only use this feature for agent elicitation research, not for rigorously evaluating agents. When conducting evals, we run each agent inside its own agent container. That way, agents have no way to interfere with each other.)

An agent branch refers to one agent process running in a particular task environment, potentially alongside other, independent agent branches.

Trace

A trace is a recording of what happened on a particular agent branch. Basically, it's an array of events that the agent or Vivaria logged during branch execution.

Trace entry

A trace entry is one event in a trace. It could track a generation request from an agent, an action taken by an agent, the results of an action, an error encountered by an agent or by Vivaria, or many other things.

pyhooks

pyhooks is a Python library that Vivaria agents use to communicate with Vivaria. See here for more details.