There are two ways to run inference on Valohai. Choose the option that best suits your use case:
- Valohai Deployments push new deployment versions to your Kubernetes cluster. You are responsible for writing your own RESTful APIs (for example with FastAPI, Flask, etc.) and for configuring the Kubernetes cluster's node groups and scaling rules. See the first sketch after this list for a minimal endpoint example.
- Valohai executions for inference let you specify an inference job as a standard Valohai execution. You can use the Valohai APIs to launch a new inference job with the specified data and model file(s), as shown in the second sketch after this list.
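With the Deployments option, the endpoint code is yours to write. The sketch below shows one possible FastAPI endpoint, assuming a pickled model with a scikit-learn-style `predict` method; the file name `model.pkl`, the `/predict` route, and the request schema are hypothetical placeholders rather than anything Valohai requires.

```python
# Minimal FastAPI endpoint sketch for a Valohai Deployment.
# The model path, route, and request schema below are hypothetical examples.
import pickle
from typing import List

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the model once at startup. The path is a placeholder; in practice you
# define where the model file is mounted in the endpoint's valohai.yaml entry.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)


class PredictionRequest(BaseModel):
    features: List[float]


@app.post("/predict")
def predict(request: PredictionRequest):
    # Run inference and return a JSON-serializable response.
    prediction = model.predict([request.features])
    return {"prediction": prediction.tolist()}
```

The same approach works with Flask or any other framework that exposes an HTTP server, as the comparison table below notes.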
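With executions for inference, a prediction job is started through Valohai's REST API instead. The sketch below assumes an API token and a step (here called `batch-inference`) already defined in your project's valohai.yaml; the token, project ID, commit, input names, and URLs are placeholder values, and the exact payload fields should be checked against the Valohai API documentation.

```python
# Sketch: launch an inference execution through the Valohai REST API.
# The token, project ID, commit, step name, and input URLs are placeholders.
import requests

VALOHAI_API_TOKEN = "YOUR_API_TOKEN"

payload = {
    "project": "YOUR_PROJECT_ID",
    "commit": "main",                   # commit identifier to run against
    "step": "batch-inference",          # step name defined in valohai.yaml
    "inputs": {
        "model": ["datum://production-model"],    # e.g. a datum alias
        "data": ["s3://my-bucket/new-data.csv"],  # data to predict on
    },
}

response = requests.post(
    "https://app.valohai.com/api/v0/executions/",
    headers={"Authorization": f"Token {VALOHAI_API_TOKEN}"},
    json=payload,
)
response.raise_for_status()
print("Launched execution:", response.json().get("counter"))
```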
See the tutorials below to learn how to build your deployments:
Comparing Valohai deployment options
| Requirement | Valohai Deployments (real-time inference & custom APIs) | Valohai executions for inference |
|---|---|---|
| Latency & inference time | I need low latency and prediction results in (sub)seconds. | I don't have strict latency requirements and my predictions can take minutes. |
| API | Write your own RESTful APIs and routing (FastAPI, Flask, etc.). | Use Valohai's own RESTful APIs to launch a prediction with your data & model(s). |
| Versioning | Yes. Valohai keeps track of the different versions of your code and model file(s) through deployment versions. | Yes. Valohai keeps track of the different versions of your code and model file(s) through execution versioning. |
| Metrics | Yes. Collect custom logs from your endpoints by dumping JSON that can then be visualized with Valohai. See Monitor your deployment endpoints for details. | Yes. Collect custom metrics from your executions by dumping JSON that can then be visualized with Valohai. See Collect metadata for details and the sketch after this table. |
| Aliases | Yes. You can use deployment aliases to give friendly names to your endpoints. | Yes. Each of your model files is versioned automatically by Valohai. You can also create datum aliases to easily keep track of the exact model version(s) that are currently in production, QA, or elsewhere. |
| Configuration & management | You are responsible for creating and managing the Kubernetes cluster. This means setting up node groups, access control, and the Kubernetes scaling rules. | Valohai manages the virtual machine used for the inference. The machine can be scaled up and down, or you can keep a set of machines warm at all times. |
| What do you need to provide? | | |
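Both columns above rely on the same basic logging pattern: the code prints one JSON object per line to standard output, and Valohai collects it as metrics/metadata that can be plotted in the UI. A minimal sketch, with hypothetical metric names:

```python
# Sketch: print one JSON object per line to standard output so Valohai
# can collect it as metadata and plot it. Metric names here are hypothetical.
import json

print(json.dumps({"processed_rows": 1280, "inference_time_seconds": 3.2}))
```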