Task nodes inside a pipeline come in handy when you need to run a hyperparameter optimization or a parameter sweep as part of a pipeline. All outputs of the Task's executions are passed to the next node.
Define a Task node in the app
You can easily convert any existing execution node to a Task node in the web UI.
- Open your project’s pipelines tab
- Create a new pipeline
- Select the right blueprint from the drop-down menu
- Click on a node that has parameters
- Click on Convert to task (below the graph)
- Scroll down to the Parameters section and configure your Task
- Create the pipeline
Define a Task node in YAML
You can define a Task node in your valohai.yaml config file by setting the node's execution type to task.
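Stripped down to the essentials, the only change compared to a regular execution node is the type field. A minimal sketch (node and step names are illustrative):

```yaml
- pipeline:
    name: Training Pipeline
    nodes:
      - name: train
        type: task              # "execution" would run a single job instead
        step: Train model (MNIST)
```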
```yaml
- pipeline:
    name: Training Pipeline
    nodes:
      - name: preprocess
        type: execution
        step: Preprocess dataset (MNIST)
      - name: train
        type: task
        step: Train model (MNIST)
        on-error: stop-all
        override:
          inputs:
            - name: training-set-images
            - name: training-set-labels
            - name: test-set-images
            - name: test-set-labels
      - name: evaluate
        type: execution
        step: Batch inference (MNIST)
    edges:
      - [preprocess.output.*train-images*, train.input.training-set-images]
      - [preprocess.output.*train-labels*, train.input.training-set-labels]
      - [preprocess.output.*test-images*, train.input.test-set-images]
      - [preprocess.output.*test-labels*, train.input.test-set-labels]
      - [train.output.model*, evaluate.input.model]
```
By default, the whole pipeline will stop if a single execution in the Task node errors. You can change this default behavior by setting the on-error property on the node.
The options are:
- stop-all: The default behavior. If one execution in the Task node fails, the whole node is marked as errored and the pipeline stops.
- continue: Continue executing the Task node even if an execution inside the Task errors. The expectation is that at least one of the executions in the Task completes successfully.
- stop-next: Stop only the nodes that follow the errored node.
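For example, to let a Task node keep running after individual failures, set on-error on the node itself (node and step names here are illustrative):

```yaml
- pipeline:
    name: Training Pipeline
    nodes:
      - name: train
        type: task
        step: Train model (MNIST)
        on-error: continue      # keep the Task running even if some executions fail
```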
The example below shows a pipeline with two parallel Task nodes.
- train is defined with on-error: stop-next
- train2 is defined with on-error: continue
Each of the Task nodes runs two executions, and in each Task one of the executions fails. With the on-error rules defined in valohai.yaml, the pipeline won't execute the evaluate node, because the train node had a failed execution and is set to stop-next. The evaluate2 node will be executed, because on-error for train2 is set to continue.

The valohai.yaml used for the pipeline looks like this:
```yaml
- pipeline:
    name: Training Pipeline
    nodes:
      - name: preprocess
        type: execution
        step: preprocess-dataset
      - name: train
        type: task
        on-error: stop-next
        step: train-model
        override:
          inputs:
            - name: dataset
      - name: evaluate
        type: execution
        step: batch-inference
      - name: train2
        type: task
        on-error: continue
        step: train-model
        override:
          inputs:
            - name: dataset
      - name: evaluate2
        type: execution
        step: batch-inference
    edges:
      - [preprocess.output.preprocessed_mnist.npz, train.input.dataset]
      - [preprocess.output.preprocessed_mnist.npz, train2.input.dataset]
      - [train.output.model*, evaluate.input.model]
      - [train2.output.model*, evaluate2.input.model]
```