Setting up Vivaria using Docker Compose
We've tested that this works on Linux, macOS and Windows.
Known issues
- On Linux, you must run these setup steps as the root user.
- On Windows, you must run the shell commands in a PowerShell prompt.
- On Linux, this setup assumes that a Docker socket exists at
/var/run/docker.sock
. This isn't true for Docker in rootless mode on Linux. You may be able to work around this by creating a symlink from/var/run/docker.sock
to the actual location of the Docker socket.
Prerequisite: Install a container runtime (once per computer)
Mac
We recommend OrbStack over Docker Desktop. OrbStack runs containers with faster filesystem I/O and lower memory usage than Docker Desktop.
Problems with docker login? (if you did that)
On macOS, multiple simultaneous docker login
calls will result in
Error saving credentials: error storing credentials - err: exit status 1, out: `error storing credentials - err: exit status 1, out: `The specified item already exists in the keychain.`
This currently only comes up as a race condition when using Depot and building multiple images simultaneously.
Linux + Windows
Use the official Docker Installation.
Set Docker to run at computer startup
Settings (top right gear) --> General --> "Start Docker Desktop when you sign in to your computer". Ref
Install Script (macOS and Linux only)
curl -fsSL https://raw.githubusercontent.com/METR/vivaria/main/scripts/install.sh | bash -
Manual Setup (macOS, Linux and Windows)
- Clone Vivaria: https://github.com/METR/vivaria
- Enter the vivaria directory:
cd vivaria
- Generate
.env.db
and.env.server
- Unix shells (Mac / Linux):
./scripts/setup-docker-compose.sh
- Windows PowerShell:
.\scripts\setup-docker-compose.ps1
- Unix shells (Mac / Linux):
- (Optional) Add LLM provider's API keys to
.env.server
- This will allow you to run one of METR's agents (e.g. modular-public) to solve a task using an LLM. If you don't do this, you can still try to solve the task manually or run a non-METR agent with its own LLM API credentials.
- OpenAI: docs
- You can also add
OPENAI_ORGANIZATION
andOPENAI_PROJECT
- You can also add
- Gemini: docs
- Add the line
GEMINI_API_KEY=AIza...
to.env.server
- Add the line
- Anthropic: docs
- Add the line
ANTHROPIC_API_KEY=sk-...
to.env.server
- Add the line
- (Optional, not recommended for local development) Support aux VMs
- This will let Vivaria set up a VM in AWS to run a task. Learn more.
- Add
TASK_AWS_REGION
,TASK_AWS_ACCESS_KEY_ID
, andTASK_AWS_SECRET_ACCESS_KEY
to.env.server
.
- (Docker Desktop only) Give the jumphost container your public key
- Long explanation on why this is needed: (On macOS) Docker Desktop on macOS doesn't allow direct access to containers using their IP addresses on Docker networks. Therefore,
viv ssh/scp/code
andviv task ssh/scp/code
don't work out of the box.docker-compose.dev.yml
defines a jumphost container on macOS to get around this. For it to work correctly, you need to provide it with a public key for authentication. By default it assumes your public key is at~/.ssh/id_rsa.pub
, but you can override this by settingSSH_PUBLIC_KEY_PATH
in.env
. - Generate an SSH key: You can use the GitHub tutorial. However:
- You don't need to "Add the SSH public key to your account on GitHub".
- You do need
~/.ssh/id_ed25519
to exist and be added to your keychain.
- Add
SSH_PUBLIC_KEY_PATH=~/.ssh/id_ed25519
to.env
- This isn't the default because of legacy reasons.
- Long explanation on why this is needed: (On macOS) Docker Desktop on macOS doesn't allow direct access to containers using their IP addresses on Docker networks. Therefore,
- Start Vivaria:
docker compose up --pull always --detach --wait
See the Vivaria logs
If you want to
docker compose logs -f
FAQ
Q: The scripts hangs or you get the error The system cannot find the file specified
A: Make sure the Docker Engine/daemon is running and not paused or in "Resource Saver" mode. (did you install Docker in the recommended way above?)
Q: The migration container gets an error when it tries to run
A: TL;DR: Try removing the DB container (and then rerunning Docker Compose)
docker compose down
docker container ls # expecting to see the vivaria-database-1 container running. If not, edit the next line
docker rm vivaria-database-1 --force
Then try running Docker Compose again again.
If that didn't work, you can remove the Docker volumes too, which would also reset the DB:
docker compose down --volumes
Why: If setup-docker-compose.sh
ran after the DB container was created, it might have randomized a new
DB_READONLY_PASSWORD
(or maybe something else randomized for the DB), and if the DB container
wasn't recreated, then it might still be using the old password.
Q: Can't connect to the Docker socket
A: Options:
- Docker isn't running (see the section about installing and running Docker).
- There's a permission issue accessing the Docker socket, solved in the
docker-compose.dev.yml
section.
Make sure Vivaria is running correctly
docker compose ps
You should at least have these containers (their names usually end with -1
):
- vivaria-server
- vivaria-database
- vivaria-ui
- vivaria-background-process-runner
If you still have vivaria-run-migrations
and you don't yet have vivaria-server
, then you might
have to wait 20 seconds, or perhaps look at the logs to see if the migrations are stuck (see FAQ above).
Visit the UI
Open https://localhost:4000 in your browser.
- Certificate error: That's expected, bypass it to access the UI.
- Why this error happens: Because Vivaria generates a self-signed certificate for itself on startup.
- You'll be asked to provide an access token and ID token (get them from
.env.server
)
Install the viv CLI
Why: The viv CLI can connect to the Vivaria server and tell it to, for example, run a task or start an agent that will try solving the task.
Create a virtualenv
Make sure you have python3.11 or above used in your shell
Why: cli/pyproject.toml
requires python=">=3.11,<4"
.
How:
python3 --version # or `python` instead of `python3`, but then also edit the commands below
If you need a newer python version and you're using Mac, we recommend using pyenv.
Create virtualenv: Unix shells (Mac / Linux)
mkdir ~/.venvs && python3 -m venv ~/.venvs/viv && source ~/.venvs/viv/bin/activate
Create virtualenv: Windows PowerShell
mkdir $home\.venvs && python3 -m venv $home\.venvs\viv && & "$home\.venvs\viv\scripts\activate.ps1"
Update pip
pip install --upgrade pip
Install the CLI and its dependencies
pip install -e cli
Configure the CLI to use Docker Compose
Optional: Backup the previous configuration
If your CLI is already installed and pointing somewhere else, you can back up the current
configuration, which is in ~/.config/viv-cli/config.json
.
Configure the CLI
In the root of vivaria:
Configure the CLI: Unix shells (Mac / Linux)
./scripts/configure-cli-for-docker-compose.sh
Configure the CLI: Windows PowerShell
.\scripts\configure-cli-for-docker-compose.ps1
SSH (not recommended when running a local Vivaria instance)
To have Vivaria give you access SSH access to task environments and agent containers:
viv register-ssh-public-key path/to/ssh/public/key
Create your first task environment
What this means: Start a Docker container that contains a task, in our example, the task is "Find the number of odd digits in this list: ...". After that, either an agent (that uses an LLM) or a human can try solving the task.
Create task
viv task start count_odds/main --task-family-path task-standard/examples/count_odds
Access the task environment
Why: It will let you see the task (from inside the Docker container) similarly to how an agent (powered by an LLM) would see it.
Option 1: Using docker exec (recommended)
- Find the container name
docker container ls
- Access the container
docker exec -it --user agent <container_name> bash -l
Option 2: Using SSH through the CLI (doesn't work for macOS)
viv task ssh --user agent
Read the task instructions
Inside the task environment,
cat ~/instructions.txt
Submit a solution (and get a score)
Using the CLI (outside of the task environment)
For example, submit the correct solution (which happens to be "2") and see what score you get:
viv task score --submission "2"
For example, submit an incorrect solution and see what score you get:
viv task score --submission "99"
Start your first run
This means: Start an agent (powered by an LLM) to try solving the task:
Get the agent code
This means: Scaffolding. Code that will prompt the LLM to try solving the task, and will let the LLM do things like running bash commands. We'll use the "modular public" agent:
cd ..
git clone https://github.com/poking-agents/modular-public
cd vivaria
viv run count_odds/main --task-family-path task-standard/examples/count_odds --agent-path ../modular-public
The last command prints a link to https://localhost:4000. Follow that link to see the run's trace and track the agent's progress on the task.
Writing new code?
See CONTRIBUTING.md for instructions for configuring this Docker Compose setup for Vivaria development.