A pipeline is a version-controlled collection of executions, some of which rely on the results of previous executions, thus forming a directed graph. These pipeline graphs consist of nodes and edges, both of which are discussed further down.
For example, consider the following sequence of data science operations:
- Preprocess dataset on a memory-optimized machine
- Train multiple machine learning models on GPU machines using the preprocessed data
- Evaluate all of the trained models
- Compare the models to find the best one (the find-best-model step)
- Deploy a new version of your trained model for online inference
Our pipeline would have five or more nodes: at least one for each step listed above, including the deployment.
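To make the directed-graph idea concrete, the sequence above can be sketched as a plain dependency mapping. This is only an illustration in standard-library Python, not the platform's own API or configuration format; the node names and the choice of three training nodes are assumptions made for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical node names. Each node maps to the set of nodes whose
# results it depends on.
dependencies = {
    "preprocess": set(),
    "train-1": {"preprocess"},
    "train-2": {"preprocess"},
    "train-3": {"preprocess"},
    "evaluate-1": {"train-1"},
    "evaluate-2": {"train-2"},
    "evaluate-3": {"train-3"},
    "find-best-model": {"evaluate-1", "evaluate-2", "evaluate-3"},
    "deploy": {"find-best-model"},
}

# static_order() yields every node after all of its dependencies and
# raises graphlib.CycleError if the graph contains a cycle.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
```

The exact order among the parallel training and evaluation nodes is not fixed; the only guarantee is that a node never appears before the nodes it depends on.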
In the example below we’ll train three different models in parallel and compare them to find the best-performing model, which we can then deploy to an HTTP endpoint. Note that when multiple trained models are evaluated inside a pipeline, the comparison that picks the best model is not done automatically. It’s up to you to define the comparison according to your models' criteria and to run it in its own node (the find-best-model node in the figure below).
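Since the comparison is yours to define, here is a minimal sketch of what the logic inside a find-best-model node might look like. The function name, the metric names, and the numbers are all hypothetical; how the metrics actually reach the node depends on how you wire the edges.

```python
def find_best_model(metrics, key="accuracy", higher_is_better=True):
    """Return the name of the model with the best value for `key`."""
    if not metrics:
        raise ValueError("no models to compare")
    pick = max if higher_is_better else min
    return pick(metrics, key=lambda name: metrics[name][key])


# Made-up evaluation results, for illustration only.
example_metrics = {
    "model-a": {"accuracy": 0.91, "loss": 0.31},
    "model-b": {"accuracy": 0.94, "loss": 0.22},
    "model-c": {"accuracy": 0.89, "loss": 0.35},
}

best = find_best_model(example_metrics)
print(best)
```

Keeping the criterion as a parameter makes it easy to switch from "highest accuracy" to, say, "lowest loss" by passing `key="loss", higher_is_better=False`.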
You can manage pipelines under the Pipelines tab on the web app if the feature has been enabled for your account. Your project will need to connect to a Git repository and have a pipelines section defined in your
Nodes of the pipeline (the circles that receive something and/or produce something):
- Nodes can be either executions or deployments.
- It is also possible to run Tasks inside pipelines by converting an execution node into a task node.
- Each node will start automatically when all of its requirements have been met.
- Each node has a list of “edges” (explained below).
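The "start when all requirements are met" rule can be illustrated with a small simulation. This is a sketch of the idea only, not the platform's scheduler: the `simulate` function and the node names are hypothetical, and every started node is assumed to succeed.

```python
def simulate(requirements):
    """Return the waves in which nodes would start.

    `requirements` maps each node name to the set of nodes that must
    finish before it can start.
    """
    finished = set()
    waves = []
    while len(finished) < len(requirements):
        # A node is ready once every node it requires has finished.
        ready = sorted(
            node
            for node, needs in requirements.items()
            if node not in finished and needs <= finished
        )
        if not ready:
            raise ValueError("unmet or circular requirements")
        waves.append(ready)
        finished.update(ready)  # assume every started node succeeds
    return waves


waves = simulate({
    "preprocess": set(),
    "train-1": {"preprocess"},
    "train-2": {"preprocess"},
    "train-3": {"preprocess"},
    "find-best-model": {"train-1", "train-2", "train-3"},
    "deploy": {"find-best-model"},
})
for wave in waves:
    print(wave)
```

Nodes that share a wave have no requirements on each other, which is why the three training nodes can run in parallel.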
Edges of the pipeline (the lines between nodes) are one of the following:
- output files used as an input of an upcoming execution or deployment.
- input files, to allow for copying inputs from one node to another and ensure multiple pipeline nodes use the same inputs.
- parameters that are passed from one node to another.
- metadata that is passed from the node where it was created to a parameter in the next node.
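One way to picture the four edge types listed above is as a small data structure that records what is passed and where it lands. The `Edge` class, its field names, and the example keys below are hypothetical and simplified; they do not reproduce the platform's actual edge syntax.

```python
from dataclasses import dataclass

# (source kind, target kind) pairs taken from the list above.
ALLOWED_EDGE_KINDS = {
    ("output", "input"),         # output files used as an input
    ("input", "input"),          # inputs copied from one node to another
    ("parameter", "parameter"),  # parameters passed between nodes
    ("metadata", "parameter"),   # metadata passed into a parameter
}


@dataclass(frozen=True)
class Edge:
    source_node: str
    source_kind: str
    source_key: str
    target_node: str
    target_kind: str
    target_key: str

    def __post_init__(self):
        # Reject combinations that the list above does not describe.
        pair = (self.source_kind, self.target_kind)
        if pair not in ALLOWED_EDGE_KINDS:
            raise ValueError(f"unsupported edge: {pair[0]} -> {pair[1]}")


# Illustrative edges with made-up node names and keys.
edges = [
    Edge("preprocess", "output", "dataset", "train-1", "input", "dataset"),
    Edge("train-1", "metadata", "accuracy", "find-best-model", "parameter", "accuracy-1"),
]
print(len(edges))
```

Validating the kind pair up front means a mistyped edge fails when the pipeline is described rather than halfway through a run.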