There are two configuration files generated on all distributed task executions: /valohai/config/distributed.json and /valohai/config/distributed.yaml. Both of these files contain the same information; we are providing two formats for convenience.
These configuration files are only present on executions that are part of a distributed task.
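Because the files exist only on distributed task executions, a script that may also run as a standalone execution can check for the file before reading it. The following is a minimal sketch using only the Python standard library; the function name load_distributed_config is hypothetical and is not part of valohai-utils.

```python
import json
import os

# Path taken from the text above; the function name below is illustrative only.
DISTRIBUTED_CONFIG_PATH = "/valohai/config/distributed.json"


def load_distributed_config(path=DISTRIBUTED_CONFIG_PATH):
    """Return the parsed configuration, or None when the file is absent."""
    if not os.path.isfile(path):
        # No file: this execution is not part of a distributed task.
        return None
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    config = load_distributed_config()
    if config is None:
        print("Not part of a distributed task")
    else:
        print("Group:", config["config"]["group_name"])
```

Returning None instead of raising lets the same script fall back to single-machine behavior when it runs outside a distributed task.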
The publicly available valohai-utils Python package contains helpers under valohai.distributed to utilize the distributed task configuration, but you are also free to parse and use the configuration files yourself.
If you choose to interpret the configuration yourself, here are short descriptions of the values:
- config.group_name: the distributed group name, usually derived from the task identifier
- config.member_id: identifier of the member running on the machine where this configuration is read
- config.required_count: how many workers will be in this group
- members: a list of all the members, i.e. executions
- member.announce_time: when the member joined the group
- member.identity: machine identifier of the member, depending on the infrastructure used
- member.job_id: execution identifier of the member, used for queueing
- member.member_id: member identifier, an arbitrary unique string, currently a simple number as a string
- member.network.exposed_ports: a mapping of host port to container port for the ports that are exposed
  - if all ports are exposed, e.g. by VH_DOCKER_NETWORK=host, this could be empty
- member.network.local_ips: a list of known local IP addresses for accessing this member
- member.network.public_ips: a list of known public IP addresses for accessing this member, if any exist
- self: a helper object that duplicates the member object of the currently running machine
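The fields described above are enough to derive the values that distributed frameworks typically ask for: a rank, a world size, and the address of one coordinating member. The sketch below is one possible reading, not a prescribed method. It assumes member_id stays a numeric string, and it picks the member with the lowest member_id as coordinator, which is a convention of this sketch rather than something the configuration defines. The name describe_group is hypothetical.

```python
def describe_group(config):
    """Derive rank, world size and a coordinator IP from the parsed config."""
    # member_id is a string that is currently a simple number.
    rank = int(config["self"]["member_id"])
    world_size = config["config"]["required_count"]

    # Assumption of this sketch: the lowest member_id acts as coordinator.
    # The members list is not necessarily ordered by member_id, so sort first.
    members = sorted(config["members"], key=lambda m: int(m["member_id"]))
    coordinator = members[0]

    # local_ips is a list; this sketch simply takes the first entry.
    coordinator_ip = coordinator["network"]["local_ips"][0]
    return {
        "rank": rank,
        "world_size": world_size,
        "coordinator_ip": coordinator_ip,
    }


# Hand-written stand-in for a parsed configuration file.
EXAMPLE = {
    "config": {"group_name": "task-example", "member_id": "2", "required_count": 3},
    "members": [
        {"member_id": "2", "network": {"local_ips": ["10.0.16.60"]}},
        {"member_id": "0", "network": {"local_ips": ["10.0.16.61"]}},
        {"member_id": "1", "network": {"local_ips": ["10.0.16.59"]}},
    ],
    "self": {"member_id": "2", "network": {"local_ips": ["10.0.16.60"]}},
}

print(describe_group(EXAMPLE))
```

Every member computes the same coordinator from the same members list, so no extra communication is needed to agree on it.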
The rest of this document is an example of what a full configuration file might look like:
{
  "config": {
    "group_name": "task-0180f5a9-9ffe-4e09-d5a7-9a0a507019d4",
    "member_id": "0",
    "required_count": 3
  },
  "members": [
    {
      "announce_time": "2022-05-24T10:42:57",
      "identity": "happy-yjaqaqlx",
      "job_id": "exec-0180f5a9-a002-45a0-f0e6-8e98720eeaad",
      "member_id": "0",
      "network": {
        "exposed_ports": {
          "1234": "1234"
        },
        "local_ips": [
          "10.0.16.61"
        ],
        "public_ips": [
          "34.121.32.110"
        ]
      }
    },
    {
      "announce_time": "2022-05-24T10:42:58",
      "identity": "happy-kwfncqxe",
      "job_id": "exec-0180f5a9-a007-633b-8af3-e11593482653",
      "member_id": "2",
      "network": {
        "exposed_ports": {
          "1234": "1234"
        },
        "local_ips": [
          "10.0.16.60"
        ],
        "public_ips": [
          "34.134.18.149"
        ]
      }
    },
    {
      "announce_time": "2022-05-24T10:42:57",
      "identity": "happy-tcuaezxm",
      "job_id": "exec-0180f5a9-a005-f2ef-693a-3b4c4c115ed8",
      "member_id": "1",
      "network": {
        "exposed_ports": {
          "1234": "1234"
        },
        "local_ips": [
          "10.0.16.59"
        ],
        "public_ips": [
          "35.194.55.255"
        ]
      }
    }
  ],
  "self": {
    "announce_time": "2022-05-24T10:42:57",
    "identity": "happy-yjaqaqlx",
    "job_id": "exec-0180f5a9-a002-45a0-f0e6-8e98720eeaad",
    "member_id": "0",
    "network": {
      "exposed_ports": {
        "1234": "1234"
      },
      "local_ips": [
        "10.0.16.61"
      ],
      "public_ips": [
        "34.121.32.110"
      ]
    }
  }
}
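As an illustration of reading the example above, the following sketch lists the addresses at which the current member could reach its peers on a given container port. It uses a trimmed copy of the example that keeps only the fields it reads. The function name peer_endpoints is hypothetical. Falling back to the container port when exposed_ports is empty is an assumption of this sketch, meant to cover the host-networking case mentioned earlier.

```python
import json

# Trimmed copy of the example above, keeping only the fields this sketch reads.
SAMPLE = json.loads("""
{
  "members": [
    {"member_id": "0",
     "network": {"exposed_ports": {"1234": "1234"}, "local_ips": ["10.0.16.61"]}},
    {"member_id": "2",
     "network": {"exposed_ports": {"1234": "1234"}, "local_ips": ["10.0.16.60"]}},
    {"member_id": "1",
     "network": {"exposed_ports": {"1234": "1234"}, "local_ips": ["10.0.16.59"]}}
  ],
  "self": {"member_id": "0"}
}
""")


def peer_endpoints(config, container_port):
    """Return (local_ip, host_port) pairs for every member except this one."""
    own_id = config["self"]["member_id"]
    endpoints = []
    for member in config["members"]:
        if member["member_id"] == own_id:
            continue
        network = member["network"]
        # Assumption: with no mapping entry, the container port is reachable as-is.
        host_port = container_port
        # exposed_ports maps host port -> container port; look up the host side.
        for host, container in network["exposed_ports"].items():
            if int(container) == container_port:
                host_port = int(host)
                break
        endpoints.append((network["local_ips"][0], host_port))
    return endpoints


print(peer_endpoints(SAMPLE, 1234))
# → [('10.0.16.60', 1234), ('10.0.16.59', 1234)]
```

The peers come back in the order they appear in members, which in the example is not sorted by member_id.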