How to run an agent on a task

To run an agent on a specific task, use the viv run command. Vivaria "runs" (created with viv run) are performed by Vivaria agents, whereas "task environments" (created with viv task start) are used for manual testing. Vivaria agents are usually powered by LLMs. However, there is also a headless-human agent that can be used to perform runs manually.

Note: for running agents in full_internet tasks without human supervision, you'll need to grant the model access using the NON_INTERVENTION_FULL_INTERNET_MODELS environment variable described here. You can set this in .env.server.

A simple example

Get the agent code

Agents are distributed as Git repositories. We'll use the modular-public agent:

git clone https://github.com/poking-agents/modular-public

Run the agent

viv run count_odds/main --task-family-path vivaria/examples/count_odds --agent-path path/to/modular-public

This will output a link and run number. Follow the link to see the run's trace and track the agent's progress on the task. You can also connect to the run using viv ssh <run_number> or docker exec:

viv ssh <run_number> --user agent  # omit '--user agent' to connect as root
docker exec -it <container_name> bash -l

Running your own agent and task

There are two ways to run agents on tasks, depending on if your Vivaria instance has Git support set up or not.

Upload your task and agent directly to Vivaria

This works whether or not your Vivaria instance has Git support, and is the method used in our example above.

viv run count_odds/main --task-family-path path/to/count_odds --agent-path path/to/my-agent-repo

Vivaria will create two zip files, one containing the task code in the folder path/to/count_odds and another containing the agent in path/to/my-agent-repo. It'll upload both zip files to Vivaria, which will start a task environment based on the task code and run the agent in it.

Push your agent to a Git remote

This only works if your Vivaria instance has Git support.

cd path/to/my-agent-repo
viv run count_odds/main

Vivaria will commit and push any uncommitted agent changes from your computer to your Git hosting service. Then, it'll look up the task count_odds/main in your Vivaria instance's tasks Git repo, start a task environment based on that task, and run your agent in it.

Running the specific version of a task

If the task repository you're running from has version tags enabled, then you can run specific versions of tasks. The current task versioning scheme is per family, and the required tag format is [email protected].

To run a task on a specific version, you can run with

viv run count_odds/main@count_odds/v1.2.3

Other features

Run viv run --help to see a full list of flags for viv run.

Intervention mode

You can use viv run <task> -i (or --intervention) to enable human input on certain agent actions, if the agent code supports this by calling the rate_options or get_input functions from pyhooks.

Run-time agent settings arguments

You can pass arbitrary run-time arguments to your agent in several ways. The following are equivalent:

viv run count_odds/main --agent_settings_override="\"{\"actor\": {\"model\": \"gpt-4o\"}\""

echo "{\"actor\": {\"model\": \"gpt-4o\"}" > settings_override.json
viv run count_odds/main --agent_settings_override "settings_override.json"

You can also store this information inside a manifest.json file inside the agent (see modular-public/manifest.json for an example)

// manifest.json
{
  "defaultSettingsPack": "my_settings",
  "settingsPacks": {
    "my_settings": {
      "actor": {
        "model": "gpt-4o"
      }
    },
    ...
  }
}

And refer to it like this:

viv run count_odds/main --agent_settings_pack my_settings

Lastly, you can an agent from a previous state. You can copy the state by clicking "Copy agent state json" in the Vivaria UI and then pasting it into some file (state.json in this example). the agent will then reload this state if you use the following argument:

viv run count_odds/main --agent_starting_state_file state.json

If you use multiple of these options, the override takes highest priority, then the settings pack, then the agent state, and lastly the default settings pack.