Skip to content

Setting up Vivaria using Docker Compose

We've tested that this works on Linux, macOS and Windows.

Known issues

  • On Linux, you must run these setup steps as the root user.
  • On Windows, you must run the shell commands in a PowerShell prompt.
  • On Linux, this setup assumes that a Docker socket exists at /var/run/docker.sock. This isn't true for Docker in rootless mode on Linux. You may be able to work around this by creating a symlink from /var/run/docker.sock to the actual location of the Docker socket.

Prerequisite: Install a container runtime (once per computer)

Mac

We recommend OrbStack over Docker Desktop. OrbStack runs containers with faster filesystem I/O and lower memory usage than Docker Desktop.

Problems with docker login? (if you did that)

On macOS, multiple simultaneous docker login calls will result in

Error saving credentials: error storing credentials - err: exit status 1, out: `error storing credentials - err: exit status 1, out: `The specified item already exists in the keychain.`

This currently only comes up as a race condition when using Depot and building multiple images simultaneously.

Linux + Windows

Use the official Docker Installation.

Set Docker to run at computer startup

Settings (top right gear) --> General --> "Start Docker Desktop when you sign in to your computer". Ref

Install Script (macOS and Linux only)

curl -fsSL https://raw.githubusercontent.com/METR/vivaria/main/scripts/install.sh | bash -

Manual Setup (macOS, Linux and Windows)

  1. Clone Vivaria: https://github.com/METR/vivaria
  2. Enter the vivaria directory: cd vivaria
  3. Generate .env.db and .env.server
    • Unix shells (Mac / Linux): ./scripts/generate-env-files.sh
    • Windows PowerShell: .\scripts\generate-env-files.ps1
  4. (Optional) Add LLM provider API keys
    • This will allow you to run one of METR's agents (e.g. modular-public) to solve a task using an LLM. If you don't do this, you can still try to solve the task manually or run a non-METR agent with its own LLM API credentials.
    • OpenAI: docs
      • You can also add OPENAI_ORGANIZATION and OPENAI_PROJECT
    • Gemini: docs
      • Add the line GEMINI_API_KEY=AIza... to .env.server
    • Anthropic: docs
      • Add the line ANTHROPIC_API_KEY=sk-... to .env.server
  5. (Optional, not recommended for local development) Support aux VMs
    • This will let Vivaria set up a VM in AWS to run a task. Learn more.
    • Add TASK_AWS_REGION, TASK_AWS_ACCESS_KEY_ID, and TASK_AWS_SECRET_ACCESS_KEY to .env.server.
  6. (Docker Desktop only) Give the jumphost container your public key
    • Long explanation on why this is needed: (On macOS) Docker Desktop on macOS doesn't allow direct access to containers using their IP addresses on Docker networks. Therefore, viv ssh/scp/code and viv task ssh/scp/code don't work out of the box. docker-compose.dev.yml defines a jumphost container on macOS to get around this. For it to work correctly, you need to provide it with a public key for authentication. By default it assumes your public key is at ~/.ssh/id_rsa.pub, but you can override this by setting SSH_PUBLIC_KEY_PATH in .env.
    • Generate an SSH key: You can use the GitHub tutorial. However:
      • You don't need to "Add the SSH public key to your account on GitHub".
      • You do need ~/.ssh/id_ed25519 to exist and be added to your keychain.
    • Add SSH_PUBLIC_KEY_PATH=~/.ssh/id_ed25519 to .env
      • This isn't the default because of legacy reasons.
  7. Start Vivaria: docker compose up --pull always --detach --wait

See the Vivaria logs

If you want to

docker compose logs -f

FAQ

Q: The scripts hangs or you get the error The system cannot find the file specified

A: Make sure the Docker Engine/daemon is running and not paused or in "Resource Saver" mode. (did you install Docker in the recommended way above?)

Q: The migration container gets an error when it tries to run

A: TL;DR: Try removing the DB container (and then rerunning Docker Compose)

docker compose down
docker container ls # expecting to see the vivaria-database-1 container running. If not, edit the next line
docker rm vivaria-database-1 --force

Then try running Docker Compose again again.

If that didn't work, you can remove the Docker volumes too, which would also reset the DB:

docker compose down --volumes

Why: If setup-docker-compose.sh ran after the DB container was created, it might have randomized a new DB_READONLY_PASSWORD (or maybe something else randomized for the DB), and if the DB container wasn't recreated, then it might still be using the old password.

Q: Can't connect to the Docker socket

A: Options:

  1. Docker isn't running (see the section about installing and running Docker).
  2. There's a permission issue accessing the Docker socket, solved in the docker-compose.dev.yml section.

Make sure Vivaria is running correctly

docker compose ps

You should at least have these containers (their names usually end with -1):

  1. vivaria-server
  2. vivaria-database
  3. vivaria-ui
  4. vivaria-background-process-runner

If you still have vivaria-run-migrations and you don't yet have vivaria-server, then you might have to wait 20 seconds, or perhaps look at the logs to see if the migrations are stuck (see FAQ above).

Visit the UI

Open https://localhost:4000 in your browser.

  1. Certificate error: That's expected, bypass it to access the UI.
    1. Why this error happens: Because Vivaria generates a self-signed certificate for itself on startup.
  2. You'll be asked to provide an access token and ID token (get them from .env.server)

Install the viv CLI

Why: The viv CLI can connect to the Vivaria server and tell it to, for example, run a task or start an agent that will try solving the task.

Create a virtualenv

Make sure you have python3.11 or above used in your shell

Why: cli/pyproject.toml requires python=">=3.11,<4".

How:

python3 --version # or `python` instead of `python3`, but then also edit the commands below

If you need a newer python version and you're using Mac, we recommend using pyenv.

Create virtualenv: Unix shells (Mac / Linux)

mkdir ~/.venvs && python3 -m venv ~/.venvs/viv && source ~/.venvs/viv/bin/activate

Create virtualenv: Windows PowerShell

mkdir $home\.venvs && python3 -m venv $home\.venvs\viv && & "$home\.venvs\viv\scripts\activate.ps1"

Update pip

pip install --upgrade pip

Install the CLI and its dependencies

pip install -e cli

Configure the CLI to use Docker Compose

Optional: Backup the previous configuration

If your CLI is already installed and pointing somewhere else, you can back up the current configuration, which is in ~/.config/viv-cli/config.json.

Configure the CLI

In the root of vivaria:

Configure the CLI: Unix shells (Mac / Linux)

./scripts/configure-cli-for-docker-compose.sh

Configure the CLI: Windows PowerShell

.\scripts\configure-cli-for-docker-compose.ps1

To have Vivaria give you access SSH access to task environments and agent containers:

viv register-ssh-public-key path/to/ssh/public/key

Create your first task environment

What this means: Start a Docker container that contains a task, in our example, the task is "Find the number of odd digits in this list: ...". After that, either an agent (that uses an LLM) or a human can try solving the task.

Create task

viv task start count_odds/main --task-family-path task-standard/examples/count_odds

Access the task environment

Why: It will let you see the task (from inside the Docker container) similarly to how an agent (powered by an LLM) would see it.

  1. Find the container name
    docker container ls
    
  2. Access the container
    docker exec -it --user agent <container_name> bash -l
    

Option 2: Using SSH through the CLI (doesn't work for macOS)

viv task ssh --user agent

Read the task instructions

Inside the task environment,

cat ~/instructions.txt

Submit a solution (and get a score)

Using the CLI (outside of the task environment)

For example, submit the correct solution (which happens to be "2") and see what score you get:

viv task score --submission "2"

For example, submit an incorrect solution and see what score you get:

viv task score --submission "99"

Start your first run

This means: Start an agent (powered by an LLM) to try solving the task:

Get the agent code

This means: Scaffolding. Code that will prompt the LLM to try solving the task, and will let the LLM do things like running bash commands. We'll use the "modular public" agent:

cd ..
git clone https://github.com/poking-agents/modular-public
cd vivaria
viv run count_odds/main --task-family-path task-standard/examples/count_odds --agent-path ../modular-public

The last command prints a link to https://localhost:4000. Follow that link to see the run's trace and track the agent's progress on the task.

Writing new code?

See CONTRIBUTING.md for instructions for configuring this Docker Compose setup for Vivaria development.