Pipelines automate your machine learning operations on the Valohai ecosystem.
A pipeline definition has 3 required properties:
- name: name for the pipeline
- nodes: list of all nodes (executions and deployments) in the pipeline
- edges: list of all edges (requirements) between the nodes
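The three required properties can be checked mechanically once a definition has been parsed into a dictionary. The following is a minimal sketch, not part of Valohai's tooling: the helper `missing_pipeline_keys` and the constant `REQUIRED_KEYS` are hypothetical names, and the dictionary is assumed to be what a YAML parser would produce from a pipeline block.

```python
# Hypothetical helper: Valohai performs its own validation; this only
# illustrates the "3 required properties" rule on a parsed definition.
REQUIRED_KEYS = ("name", "nodes", "edges")


def missing_pipeline_keys(pipeline: dict) -> list:
    """Return the required properties that are absent from the definition."""
    return [key for key in REQUIRED_KEYS if key not in pipeline]


# A pipeline definition as it might look after parsing the YAML into a dict.
pipeline = {
    "name": "simple-pipeline",
    "nodes": [
        {"name": "generate-node", "type": "execution", "step": "generate-dataset"},
        {"name": "train-node", "type": "execution", "step": "train-model"},
    ],
    "edges": [
        ["generate-node.output.images*", "train-node.input.dataset-images"],
    ],
}

print(missing_pipeline_keys(pipeline))                # []
print(missing_pipeline_keys({"name": "incomplete"}))  # ['nodes', 'edges']
```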
A simple pipeline could look something like this:
---
- step:
    name: generate-dataset
    image: python:3.6
    command: python preprocess.py
- step:
    name: train-model
    image: tensorflow/tensorflow:2.2.0-gpu
    command: python train.py
    inputs:
      - name: dataset-images
        default: http://...
      - name: dataset-labels
        default: http://...
- pipeline:
    name: simple-pipeline
    nodes:
      - name: generate-node
        type: execution
        step: generate-dataset
      - name: train-node
        type: execution
        step: train-model
      - name: deploy-node
        type: deployment
        deployment: mydeployment
        endpoints:
          - predict-digit
    edges:
      - [generate-node.output.images*, train-node.input.dataset-images]
      - [generate-node.output.labels*, train-node.input.dataset-labels]
      - [train-node.output.model*, deploy-node.file.predict-digit.model]
- endpoint:
    name: predict-digit
    description: predict digits from image inputs ("file" parameter)
    image: tensorflow/tensorflow:1.13.1-py3
    wsgi: predict_wsgi:predict_wsgi
    files:
      - name: model
        description: Model output file from TensorFlow
        path: model.pb
Here we have a pipeline with 3 nodes. The second node, train-node, waits for its inputs to be generated by generate-node. The third node, deploy-node, deploys the model output by train-node. All files in /valohai/outputs whose names start with either images or labels are passed between the executions.
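To make the file selection concrete, the sketch below models how an edge pattern such as `images*` picks files out of a node's outputs. It assumes the trailing `*` behaves like a shell-style wildcard, which matches the "start with" description above but is not confirmed here as Valohai's exact matching rule. The function `files_for_edge` and the file names are hypothetical.

```python
from fnmatch import fnmatchcase


def files_for_edge(pattern: str, output_files: list) -> list:
    """Return the output file names that match an edge's wildcard pattern.

    Assumption: the pattern is treated as a shell-style glob.
    """
    return [name for name in output_files if fnmatchcase(name, pattern)]


# Hypothetical contents of /valohai/outputs after the generate node finishes.
outputs = ["images-train.npz", "labels-train.npz", "preprocess.log"]

print(files_for_edge("images*", outputs))  # ['images-train.npz']
print(files_for_edge("labels*", outputs))  # ['labels-train.npz']
```

In this model, `preprocess.log` matches neither pattern, so it would not be handed to the next node.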
Override default inputs in a pipeline
In the above example:
- The train-model step has two inputs, each with its own default value.
- The pipeline defines that the train-model node should use the outputs of generate-dataset as its inputs.
By default, Valohai includes both the files from the default input location and the files generated by the pipeline as the step’s inputs. If you want the files from the pipeline to replace the defaults instead, specify an override in the pipeline:
- name: train
  type: execution
  step: train-model
  override:
    inputs:
      - name: dataset-images
      - name: dataset-labels
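The difference between the default behaviour and an override can be modelled in a few lines. This is only a sketch of the behaviour described in this section, not Valohai's implementation: `resolve_inputs` is a hypothetical function, and the file names stand in for the real default URLs and pipeline outputs.

```python
def resolve_inputs(defaults: dict, from_pipeline: dict, override: set) -> dict:
    """Model which files each input receives.

    Without an override, an input gets its default files plus the files
    passed in by the pipeline. With an override, the defaults are dropped
    and only the pipeline's files remain.
    """
    resolved = {}
    for name, default_files in defaults.items():
        upstream = list(from_pipeline.get(name, []))
        if name in override:
            resolved[name] = upstream
        else:
            resolved[name] = list(default_files) + upstream
    return resolved


# Placeholder file names; real values would be the step's default URLs
# and the files produced by the upstream node.
defaults = {
    "dataset-images": ["default-images"],
    "dataset-labels": ["default-labels"],
}
from_pipeline = {
    "dataset-images": ["images-train.npz"],
    "dataset-labels": ["labels-train.npz"],
}

merged = resolve_inputs(defaults, from_pipeline, override=set())
overridden = resolve_inputs(
    defaults, from_pipeline, override={"dataset-images", "dataset-labels"}
)

print(merged["dataset-images"])      # ['default-images', 'images-train.npz']
print(overridden["dataset-images"])  # ['images-train.npz']
```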