Setting up Vivaria using Docker Compose
We've tested that this works on Linux, macOS and Windows.
Known issues
- On Linux, you must run these setup steps as the root user.
- On Windows, you must run the shell commands in a PowerShell prompt.
- On Linux and macOS, this setup assumes that a Docker socket exists at
/var/run/docker.sock
. This isn't true for Docker in rootless mode on Linux. You may be able to work around this by creating a symlink from/var/run/docker.sock
to the actual location of the Docker socket.
Start Vivaria
- Install Docker and Docker Compose. (The Docker Desktop distribution includes both.)
- Clone https://github.com/METR/vivaria.
- In the clone's root directory, run
./scripts/setup-docker-compose.sh
(or.\scripts\setup-docker-compose.ps1
on Windows). This generates.env
files containing environment variables for the Vivaria server and database. - Add an
OPENAI_API_KEY
to.env.server
. - (Optional) If you want to start task environments containing aux VMs, add a
TASK_AWS_REGION
,TASK_AWS_ACCESS_KEY_ID
, andTASK_AWS_SECRET_ACCESS_KEY
to.env.server
. - (On macOS) Docker Desktop on macOS doesn't allow easy access to containers over IP. Therefore,
viv ssh/scp/code
andviv task ssh/scp/code
don't work out of the box. The Docker Compose setup defines a proxy container on MacOS to get round this, but for it work correctly you will need to make sure it can access your keys. By default it assumes this is~/.ssh/id_rsa.pub
, but you can override this by settingSSH_PUBLIC_KEY_PATH
in.env
. - Run
docker compose up --detach --wait
- By default, Docker Compose uses the directory name of the docker-compose file as the project name.
docker-compose.yml
is written assuming the project name isvivaria
. If you want to use a different project name, you'll need to use adocker-compose.override.yml
file to e.g. change the values ofFULL_INTERNET_NETWORK_NAME
andNO_INTERNET_NETWORK_NAME
. - If the scripts hangs or you get the error
The system cannot find the file specified
, make sure the Docker Engine/daemon is running and not paused or in "Resource Saver" mode.
- By default, Docker Compose uses the directory name of the docker-compose file as the project name.
- Run
docker compose ps
to check that the containers are up and running.
Now you can:
- Visit https://localhost:4000 to see the Vivaria UI
- You'll probably see a certificate error from your browser, which we suggest ignoring
- You'll be asked to provide an access token and ID token (get them from
.env.server
)
- Run
curl http://localhost:4001/health
to check that the server is running
Install the viv CLI
(Optional) Create a virtualenv:
mkdir ~/.venvs && python3 -m venv ~/.venvs/viv && source ~/.venvs/viv/bin/activate
Or, on Windows:
mkdir $home\.venvs && python3 -m venv $home\.venvs\viv && & "$home\.venvs\viv\scripts\activate.ps1"
Install the CLI and its dependencies:
pip install -e cli
In the root directory of your https://github.com/METR/vivaria clone, run:
./scripts/configure-cli-for-docker-compose.sh
Or, on Windows:
.\scripts\configure-cli-for-docker-compose.ps1
Note that this could override the viv CLI's existing settings. If you like, you can back up ~/.config/viv-cli/config.json
before running this script.
To have Vivaria give you access SSH access to task environments and agent containers:
viv register-ssh-public-key path/to/ssh/public/key
Create your first task environment
viv task start reverse_hash/abandon --task-family-path task-standard/examples/reverse_hash
# Note that this doesn't work on macOS. Instead, use docker exec to access the container.
viv task ssh --user agent
Inside the task environment, run cat ~/instructions.txt
to see the task's instructions.
To score a solution to the task:
viv task score --submission abandon
viv task score --submission "another word"
Start your first run
cd ..
git clone https://github.com/poking-agents/modular-public
cd vivaria
viv run reverse_hash/abandon --task-family-path task-standard/examples/reverse_hash --agent-path ../modular-public
The last command prints a link to https://localhost:4000. Follow that link to see the run's trace and track the agent's progress on the task.