Using distributed tasks requires:
- A private worker or full private installation of Valohai
- The infrastructure security should freely allow communication between workers in the subnet
- Then distributed features can be enabled by Valohai staff
If you have any questions regarding the setup, contact Valohai support and we'll get it sorted!
Prepare Your Project Work Distributedly
Running your steps as distributed tasks doesn't automatically turn your code to work in a distributed way. You should prepare your project to support the distributed mindset by following the conventions provided by the frameworks you use and/or checking out our distributed examples.
The publicly available
valohai-utils Python package contain distributed tooling for that.
To get started, you might want to split your initialization code something on the lines:
The rest of the preparation largely depends on the tools used, but somewhere down the line you probably want to get connection information of all workers in the distributed task.
members = valohai.distributed.members()
member_local_ips = ",".join([m.primary_local_ip for m in members])
Or, alternatively, connection information for one of the workers that the rest use to communicate through.
master = valohai.distributed.master()
Some distributed tool chains also require you to run a separate program to wrap the job code like OpenMPI. In these cases, we recommend wrapping that call to a Python script to better control the parameters inserted. Here is one such example.
If you want further direction or tips regarding specific distributed frameworks or technologies, be sure to contact Valohai support or checking out our distributed examples.
Run a Distributed Task
After your code supports distributed processing, you can create a distributed task:
- Click on your Project’s Task tab
- Click the Create task button
- Select the Source and Runtime values you want to use
- Under Configuration, select Distributed as your Task type
- Edit the Execution count below to define how many total executions will be in the task
- Optionally use Environment variables to customize network configuration:
- VH_DOCKER_NETWORK=host can be used to expose all ports to the worker subnet
- VH_EXPOSE_PORTS=1234 can be used to expose certain ports or port ranges
- These values can also be predefined in the valohai.yaml through step.environment-variables like most of our distributed examples do
Distributed tasks are created by selecting task type as Distributed.
Network configuration of distributed workers can be further customized with environment variables.