Task nodes inside a pipeline come in handy when you need to run a hyperparameter optimization or a parameter sweep as part of a pipeline. All outputs of the Task's executions are passed to the next node.
Define a Task node in the app
You can easily convert any existing execution node to a Task node in the web UI.
- Open your project’s pipelines tab
- Create a new pipeline
- Select the right blueprint from the drop-down menu
- Click on a node that has parameters
- Click on Convert to task (below the graph)
- Scroll down to the Parameters section and configure your Task
- Create the pipeline
Define a Task node in YAML
You can define a Task node in your valohai.yaml config file by setting the node's execution type to task.
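Stripped down to the essentials, the only change compared to a regular execution node is the type field. A minimal sketch (node and step names are illustrative):

```yaml
- pipeline:
    name: Training Pipeline
    nodes:
      - name: train
        type: task              # "execution" would run a single job instead
        step: Train model (MNIST)
```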
```yaml
- pipeline:
    name: Training Pipeline
    nodes:
      - name: preprocess
        type: execution
        step: Preprocess dataset (MNIST)
      - name: train
        type: task
        step: Train model (MNIST)
        on-error: stop-all
        override:
          inputs:
            - name: training-set-images
            - name: training-set-labels
            - name: test-set-images
            - name: test-set-labels
      - name: evaluate
        type: execution
        step: Batch inference (MNIST)
    edges:
      - [preprocess.output.*train-images*, train.input.training-set-images]
      - [preprocess.output.*train-labels*, train.input.training-set-labels]
      - [preprocess.output.*test-images*, train.input.test-set-images]
      - [preprocess.output.*test-labels*, train.input.test-set-labels]
      - [train.output.model*, evaluate.input.model]
```
By default, the whole pipeline will stop if a single execution in the Task node errors. You can change this default behavior by setting the on-error property on the node.
The options are:
- stop-all: The default behavior. If one execution in the Task node fails, the whole node is marked as errored and the pipeline stops.
- continue: Continue executing the Task node even if an execution inside the Task errors. The expectation is that at least one of the executions in the Task completes successfully.
- stop-next: Stop only the nodes that follow the errored node.
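For example, to let a Task node keep running after individual failures, set on-error on the node itself (node and step names here are illustrative):

```yaml
- pipeline:
    name: Training Pipeline
    nodes:
      - name: train
        type: task
        step: Train model (MNIST)
        on-error: continue      # keep the Task running even if some executions fail
```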
The example below shows a pipeline with two parallel Task nodes.
- train is defined with on-error: stop-next
- train2 is defined with on-error: continue
Each of the Task nodes runs two executions, and in each Task one of the executions fails. With the on-error rules defined in valohai.yaml, the pipeline won't execute the evaluate node, because the train node had a failed execution and is set to stop-next. The evaluate2 node will be executed, because on-error for train2 is set to continue.

The valohai.yaml used for the pipeline looks like this:
```yaml
- pipeline:
    name: Training Pipeline
    nodes:
      - name: preprocess
        type: execution
        step: preprocess-dataset
      - name: train
        type: task
        on-error: stop-next
        step: train-model
        override:
          inputs:
            - name: dataset
      - name: evaluate
        type: execution
        step: batch-inference
      - name: train2
        type: task
        on-error: continue
        step: train-model
        override:
          inputs:
            - name: dataset
      - name: evaluate2
        type: execution
        step: batch-inference
    edges:
      - [preprocess.output.preprocessed_mnist.npz, train.input.dataset]
      - [preprocess.output.preprocessed_mnist.npz, train2.input.dataset]
      - [train.output.model*, evaluate.input.model]
      - [train2.output.model*, evaluate2.input.model]
```