Setting up Vivaria using Docker Compose

We've tested that this works on Linux and macOS. Windows may require some additional setup.

Prerequisites

Container runtime installation

macOS: We recommend OrbStack for better filesystem performance and lower memory usage compared to Docker Desktop.
Linux & Windows: Use the official Docker Installation Guide.

Install script (macOS and Linux only)

curl -fsSL https://raw.githubusercontent.com/METR/vivaria/main/scripts/install.sh | bash -

Manual setup (macOS, Linux and Windows)

Clone Vivaria: https://github.com/METR/vivaria
Enter the vivaria directory: cd vivaria
Generate .env.db and .env.server
- macOS/Linux: ./scripts/setup-docker-compose.sh
- Windows PowerShell: .\scripts\setup-docker-compose.ps1
Add an LLM provider's API key to .env.server (use only one provider)
- For OpenAI, add OPENAI_API_KEY=...
- For Gemini, add GEMINI_API_KEY=...
- For Anthropic, add ANTHROPIC_API_KEY=...
Start Vivaria: docker compose up --pull always --detach --wait (set VIVARIA_DOCKER_GID first if needed; see "Docker GID on macOS/Linux" under Known issues)
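
Before starting Vivaria, the "only one provider" rule from the step above can be checked with a small helper. This is a minimal sketch, not part of Vivaria: count_provider_keys is a hypothetical function name, and it assumes .env.server consists of plain KEY=VALUE lines.

```shell
# Count how many provider API keys are set (non-empty) in an env file.
# Hypothetical helper, not shipped with Vivaria.
# Usage: count_provider_keys path/to/.env.server
count_provider_keys() {
  # grep -c prints the number of matching lines; it exits non-zero when
  # the count is 0, so "|| true" keeps the function from failing.
  grep -c -E '^(OPENAI_API_KEY|GEMINI_API_KEY|ANTHROPIC_API_KEY)=.' "$1" || true
}

# Example check, run from the vivaria directory:
#   [ "$(count_provider_keys .env.server)" -eq 1 ] || echo "expected exactly one provider key"
```

A key with an empty value (for example ANTHROPIC_API_KEY= with nothing after it) is not counted.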

Make sure Vivaria is running correctly

Check that the containers are running:

docker compose ps

You should have at least these containers (their names usually end with -1):

vivaria-server
vivaria-database
vivaria-ui
vivaria-background-process-runner

If you still have vivaria-run-migrations and don't yet have vivaria-server, wait about 20 seconds. If nothing changes, check the logs to see whether the migrations are stuck (see "The migration container gets an error when it tries to run" below).
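
The container check can be scripted roughly as follows. This is a sketch under assumptions: missing_containers is a hypothetical helper that only looks for the four names listed above anywhere in the text it reads, so it does not verify container health.

```shell
# Read `docker compose ps` output on stdin and print each expected
# container name that does not appear in it. Prints nothing if all
# four names are present. Hypothetical helper, not shipped with Vivaria.
missing_containers() {
  output=$(cat)
  for name in vivaria-server vivaria-database vivaria-ui vivaria-background-process-runner; do
    printf '%s\n' "$output" | grep -q -e "$name" || echo "$name"
  done
}

# Example, run from the vivaria directory:
#   docker compose ps | missing_containers
```

An empty result means all four names were found in the listing.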

Visit the UI

Open https://localhost:4000 in your browser.

Certificate errors are expected since Vivaria generates a self-signed certificate for local use.
You'll be asked to provide an access token and an ID token (get them from .env.server).
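
To read the tokens without opening the file, something like the following may help. It is a sketch: env_value is a hypothetical helper, and the variable names ACCESS_TOKEN and ID_TOKEN are assumptions about what the setup script writes, so check your own .env.server if they print nothing.

```shell
# Print the value of KEY from an env file made of KEY=VALUE lines.
# If KEY appears more than once, the last assignment is printed.
# Hypothetical helper, not shipped with Vivaria.
# Usage: env_value path/to/.env.server KEY
env_value() {
  sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Example, run from the vivaria directory
# (ACCESS_TOKEN and ID_TOKEN are assumed variable names):
#   env_value .env.server ACCESS_TOKEN
#   env_value .env.server ID_TOKEN
```

Only the KEY= prefix is stripped, so values that themselves contain = characters are printed in full.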

Install the viv CLI

You can use the viv CLI to start task environments and agent runs.

Create a virtualenv

Make sure your shell uses Python 3.11 or above.

If you need a newer Python version and you're using macOS or Linux, we recommend using pyenv.
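
One way to check the version requirement from a script is sketched below. python_version_ok is a hypothetical helper that compares a "major.minor[.patch]" string against 3.11 using plain shell arithmetic tests.

```shell
# Succeed when a "major.minor[.patch]" version string is 3.11 or newer.
# Hypothetical helper, not shipped with Vivaria.
python_version_ok() {
  version=$1
  major=${version%%.*}   # text before the first dot
  rest=${version#*.}     # text after the first dot
  minor=${rest%%.*}      # text before the next dot, if any
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 11 ]; }
}

# Example:
#   python_version_ok "$(python3 -c 'import platform; print(platform.python_version())')" \
#     || echo "Python 3.11 or above is needed"
```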

Create virtualenv: macOS/Linux

mkdir -p ~/.venvs && python3 -m venv ~/.venvs/viv && source ~/.venvs/viv/bin/activate

Create virtualenv: Windows PowerShell

mkdir $home\.venvs && python3 -m venv $home\.venvs\viv && & "$home\.venvs\viv\scripts\activate.ps1"

Install the CLI and its dependencies

pip install -e cli

Configure the CLI to use Docker Compose

Optional: back up the previous configuration

If your CLI is already installed and pointing somewhere else, you can back up the current configuration, which is in ~/.config/viv-cli/config.json.
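
A minimal sketch of such a backup, assuming a plain copy next to the original is enough. backup_config and the .bak suffix are illustrative choices, not something the CLI provides.

```shell
# Copy FILE to FILE.bak if FILE exists; do nothing otherwise.
# An existing FILE.bak is overwritten.
# Hypothetical helper, not shipped with Vivaria.
backup_config() {
  if [ -f "$1" ]; then
    cp -p "$1" "$1.bak"
  fi
}

# Example:
#   backup_config ~/.config/viv-cli/config.json
```

To restore the earlier configuration later, copy the .bak file back over config.json.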

Configure the CLI: macOS/Linux

./scripts/configure-cli-for-docker-compose.sh

Configure the CLI: Windows PowerShell

.\scripts\configure-cli-for-docker-compose.ps1

SSH

To have Vivaria give you access to task environments and agent containers via SSH:

viv register-ssh-public-key path/to/ssh/public/key

This will let you run viv ssh and viv task ssh to access the containers. Alternatively, you can use docker exec to access the containers directly.

Next steps

Known issues

Rootless Docker mode in Linux

On Linux, Vivaria expects a Docker socket at /var/run/docker.sock. If you're running Docker in rootless mode, create a symlink from there to the actual Docker socket location.
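
As a sketch, the symlink can be created as shown below. It assumes the rootless socket is at its usual location, $XDG_RUNTIME_DIR/docker.sock (typically /run/user/<uid>/docker.sock); rootless_docker_socket is a hypothetical helper, so verify the path on your system before linking.

```shell
# Print the usual rootless Docker socket path for the current user.
# Falls back to /run/user/<uid> when XDG_RUNTIME_DIR is not set.
# Hypothetical helper, not shipped with Vivaria.
rootless_docker_socket() {
  printf '%s/docker.sock\n' "${XDG_RUNTIME_DIR:-/run/user/$(id -u)}"
}

# Example (needs root to write under /var/run):
#   sudo ln -s "$(rootless_docker_socket)" /var/run/docker.sock
```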

Docker GID on macOS/Linux (logs say `permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock`)

On macOS/Linux, you may need to set VIVARIA_DOCKER_GID to the group ID of your system's docker group before running docker compose up. On Linux, you can look this up with getent group docker (the group ID is the third colon-separated field). Once you have the group ID, either export it as an environment variable or pass it inline when starting Vivaria:

VIVARIA_DOCKER_GID=<number> docker compose up --pull always --detach --wait
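
On Linux, the lookup and the start command can be combined. The sketch below assumes the docker group exists and that getent prints the usual name:password:gid:members entry; gid_from_entry is a hypothetical helper name.

```shell
# Extract the numeric group ID (third colon-separated field) from a
# group database entry read on stdin, such as the line printed by
# `getent group docker`. Hypothetical helper, not shipped with Vivaria.
gid_from_entry() {
  cut -d: -f3
}

# Example on Linux, run from the vivaria directory:
#   VIVARIA_DOCKER_GID=$(getent group docker | gid_from_entry) docker compose up --pull always --detach --wait
```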

macOS Docker Desktop and SSH access

On macOS, Docker Desktop doesn't allow direct access to containers using their IP addresses on Docker networks. Therefore, viv ssh/scp/code and viv task ssh/scp/code don't work out of the box. docker-compose.dev.yml defines a jumphost container on macOS to get around this. For it to work correctly, you need to provide it with a public key for authentication.

By default it assumes your public key is at ~/.ssh/id_rsa.pub, but you can override this by setting SSH_PUBLIC_KEY_PATH in .env.
Generate an SSH key: You can use the GitHub tutorial. However:
- You don't need to "Add the SSH public key to your account on GitHub".
- You do need ~/.ssh/id_ed25519 to exist and be added to your keychain.
Add SSH_PUBLIC_KEY_PATH=~/.ssh/id_ed25519.pub to .env
- This isn't the default for legacy reasons.

A script hangs or you get the error `The system cannot find the file specified`

Make sure the Docker Engine/daemon is running and not paused or in "Resource Saver" mode. (Did you install Docker in the recommended way above?)

The migration container gets an error when it tries to run

Try removing the DB container:

docker container ls --all # get the container name, e.g. vivaria-database-1
docker rm vivaria-database-1 --force

Then try running docker compose up again.

If that didn't work, you can remove the Docker volumes too, which would also reset the DB:

docker compose down --volumes

Why: If setup-docker-compose.sh runs again after the DB container has been created, it may generate a new random DB_READONLY_PASSWORD (or other randomized DB settings). If the DB container isn't recreated afterwards, it may still be using the old password.

Browser error: `Unable to transform response from server`

Clear your browser's local storage for https://localhost:4000, then refresh https://localhost:4000 and re-enter the access and ID tokens from .env.server.

Can't start runs with CLI because `x-evals-token is incorrect`

If you can access the web interface at https://localhost:4000, copy the evals token using the button in the top right corner. Then set it with the CLI:

viv config set evalsToken <token>

`overlay over xfs with 'pquota' mount option`

If you get Error response from daemon: --storage-opt is supported only for overlay over xfs with 'pquota' mount option, this can be fixed by setting the TASK_ENVIRONMENT_STORAGE_GB environment variable to -1 in .env.server. See the Agent sandboxing options for details.
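
Editing .env.server by hand works fine; for scripted setups, a sketch like the following sets the variable without leaving duplicate lines. set_env_var is a hypothetical helper and assumes the file consists of plain KEY=VALUE lines.

```shell
# Set KEY=VALUE in an env file: drop any existing KEY lines, then
# append the new assignment. Hypothetical helper, not shipped with Vivaria.
# Usage: set_env_var FILE KEY VALUE
set_env_var() {
  _file=$1
  _key=$2
  _value=$3
  _tmp=$(mktemp)
  # Keep every line except existing assignments of KEY.
  # grep -v exits non-zero if nothing is kept, hence "|| true".
  grep -v "^$_key=" "$_file" > "$_tmp" || true
  printf '%s=%s\n' "$_key" "$_value" >> "$_tmp"
  # Write back in place so the original file's permissions are kept.
  cat "$_tmp" > "$_file"
  rm -f "$_tmp"
}

# Example, run from the vivaria directory:
#   set_env_var .env.server TASK_ENVIRONMENT_STORAGE_GB -1
```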

`agents can only use this model on full_internet tasks if the run is interactive`

For full_internet tasks without human supervision, you'll need to grant the model you're using access by putting it in the NON_INTERVENTION_FULL_INTERNET_MODELS environment variable, which you can set in .env.server. See the Agent sandboxing options for details.