Skip to content

Setting up Vivaria using Docker Compose

We've tested that this works on Linux, macOS and Windows.

Known issues

  • On Linux, you must run these setup steps as the root user.
  • On Windows, you must run the shell commands in a PowerShell prompt.
  • On Linux and macOS, this setup assumes that a Docker socket exists at /var/run/docker.sock. This isn't true for Docker in rootless mode on Linux. You may be able to work around this by creating a symlink from /var/run/docker.sock to the actual location of the Docker socket.

Start Vivaria

  1. Install Docker and Docker Compose. (The Docker Desktop distribution includes both.)
  2. Clone https://github.com/METR/vivaria.
  3. In the clone's root directory, run ./scripts/setup-docker-compose.sh (or .\scripts\setup-docker-compose.ps1 on Windows). This generates .env files containing environment variables for the Vivaria server and database.
  4. Add an OPENAI_API_KEY to .env.server.
  5. (Optional) If you want to start task environments containing aux VMs, add a TASK_AWS_REGION, TASK_AWS_ACCESS_KEY_ID, and TASK_AWS_SECRET_ACCESS_KEY to .env.server.
  6. (On macOS) Docker Desktop on macOS doesn't allow easy access to containers over IP. Therefore, viv ssh/scp/code and viv task ssh/scp/code don't work out of the box. The Docker Compose setup defines a proxy container on MacOS to get round this, but for it work correctly you will need to make sure it can access your keys. By default it assumes this is ~/.ssh/id_rsa.pub, but you can override this by setting SSH_PUBLIC_KEY_PATH in .env.
  7. Run docker compose up --detach --wait
    • By default, Docker Compose uses the directory name of the docker-compose file as the project name. docker-compose.yml is written assuming the project name is vivaria. If you want to use a different project name, you'll need to use a docker-compose.override.yml file to e.g. change the values of FULL_INTERNET_NETWORK_NAME and NO_INTERNET_NETWORK_NAME.
    • If the scripts hangs or you get the error The system cannot find the file specified, make sure the Docker Engine/daemon is running and not paused or in "Resource Saver" mode.
  8. Run docker compose ps to check that the containers are up and running.

Now you can:

  • Visit https://localhost:4000 to see the Vivaria UI
    • You'll probably see a certificate error from your browser, which we suggest ignoring
    • You'll be asked to provide an access token and ID token (get them from .env.server)
  • Run curl http://localhost:4001/health to check that the server is running

Install the viv CLI

(Optional) Create a virtualenv:

mkdir ~/.venvs && python3 -m venv ~/.venvs/viv && source ~/.venvs/viv/bin/activate

Or, on Windows:

mkdir $home\.venvs && python3 -m venv $home\.venvs\viv && & "$home\.venvs\viv\scripts\activate.ps1"

Install the CLI and its dependencies:

pip install -e cli

In the root directory of your https://github.com/METR/vivaria clone, run:

./scripts/configure-cli-for-docker-compose.sh

Or, on Windows:

.\scripts\configure-cli-for-docker-compose.ps1

Note that this could override the viv CLI's existing settings. If you like, you can back up ~/.config/viv-cli/config.json before running this script.

To have Vivaria give you access SSH access to task environments and agent containers:

viv register-ssh-public-key path/to/ssh/public/key

Create your first task environment

viv task start reverse_hash/abandon --task-family-path task-standard/examples/reverse_hash

# Note that this doesn't work on macOS. Instead, use docker exec to access the container.
viv task ssh --user agent

Inside the task environment, run cat ~/instructions.txt to see the task's instructions.

To score a solution to the task:

viv task score --submission abandon
viv task score --submission "another word"

Start your first run

cd ..
git clone https://github.com/poking-agents/modular-public
cd vivaria

viv run reverse_hash/abandon --task-family-path task-standard/examples/reverse_hash --agent-path ../modular-public

The last command prints a link to https://localhost:4000. Follow that link to see the run's trace and track the agent's progress on the task.