A data alias, also called a datum alias, is a named pointer to a Valohai datum URL.
You can create a data alias in your project and use the alias as the input for your executions.
There are many different use cases for data aliases. For example:
- Create an alias model-prod and point it to the version of a model that should be used for batch inference in production.
- Create an alias train-images that points to the latest preprocessed dataset of a specific use case. The rest of the team can then point their inputs to this alias instead of always searching for the latest version of the preprocessed data.
Valohai keeps track of the change history of each alias, so you can see when the latest changes happened.
Datum aliases are resolved when an execution or task is created. Copying an execution therefore reuses the data that was originally resolved, even if the alias has been changed in the meantime.
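The resolution behaviour can be illustrated with a small toy model. This is only a sketch of the idea, not how Valohai is implemented: the AliasRegistry and Execution classes, the helper functions, and the datum-v1 / datum-v2 identifiers are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Execution:
    # The input is stored as a resolved datum identifier, not as the alias name.
    resolved_input: str


class AliasRegistry:
    """Hypothetical in-memory stand-in for a project's alias table."""

    def __init__(self):
        self._targets = {}

    def set(self, alias, datum_id):
        self._targets[alias] = datum_id

    def resolve(self, alias):
        return self._targets[alias]


def create_execution(registry, alias):
    # Resolution happens once, at creation time.
    return Execution(resolved_input=registry.resolve(alias))


def copy_execution(execution):
    # A copy reuses the already resolved input; the alias is not looked up again.
    return Execution(resolved_input=execution.resolved_input)


registry = AliasRegistry()
registry.set("train-images", "datum-v1")
first = create_execution(registry, "train-images")

# The alias is moved to a newer dataset after the first execution was created.
registry.set("train-images", "datum-v2")
copied = copy_execution(first)
fresh = create_execution(registry, "train-images")

print(copied.resolved_input, fresh.resolved_input)
```

In this model the copied execution still uses datum-v1, while an execution created from scratch after the change picks up datum-v2.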
Create and update an alias
You can create or update an alias in the web UI, programmatically when you create a new output, or through the Valohai APIs.
In the Web UI
- Open your project on app.valohai.com
- Open the Data tab
- Open the Aliases tab
- Select Create new datum alias
- Give the alias a name and select which file the alias should point to
On the same page you can also modify existing aliases and view the change history of each alias.
In the code
You can also create or update an alias automatically when you save a file within your executions. Once you've stored an alias using metadata, it becomes visible on the Data -> Aliases tab.
import json

import valohai

metadata = {
    "valohai.alias": "model-prod",  # creates or updates a Valohai data alias to point to this output file
}

# `model` is assumed to be a trained model object created earlier in your script
save_path = valohai.outputs().path('model.h5')
model.save(save_path)

# The metadata file is named after the output file, with a .metadata.json suffix
metadata_path = valohai.outputs().path('model.h5.metadata.json')
with open(metadata_path, 'w') as outfile:
    json.dump(metadata, outfile)
Read more about the .metadata.json file: Store arbitrary metadata with your file
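If several outputs need aliases, the sidecar-writing step can be wrapped in a small helper. The sketch below uses only the standard library; write_alias_metadata is a hypothetical helper name, and the temporary directory merely stands in for the Valohai outputs directory so the example can be run locally.

```python
import json
import os
import tempfile


def write_alias_metadata(output_file_path, alias):
    """Write a `<file>.metadata.json` sidecar that sets `valohai.alias` for the file."""
    metadata_path = output_file_path + ".metadata.json"
    with open(metadata_path, "w") as outfile:
        json.dump({"valohai.alias": alias}, outfile)
    return metadata_path


# Local demonstration: a temporary directory replaces the real outputs directory.
demo_dir = tempfile.mkdtemp()
demo_model_path = os.path.join(demo_dir, "model.h5")
with open(demo_model_path, "wb") as f:
    f.write(b"placeholder model bytes")

demo_metadata_path = write_alias_metadata(demo_model_path, "model-prod")
with open(demo_metadata_path) as f:
    print(f.read())
```

Inside an execution you would pass the path returned by valohai.outputs().path('model.h5') instead of the temporary path.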
Use an alias as an input
In the Web UI
You can select an alias as the input for your execution by searching for the alias name in the inputs data browser.
In valohai.yaml
You can set the default input of a step as a datum alias. Every time you run that step Valohai will fetch the data file that the alias is pointing to and use it to run the execution.
- step:
    name: Train Model
    image: tensorflow/tensorflow:1.13.1
    command: python myfile.py
    inputs:
      - name: mydata
        default: datum://train-images
      - name: mymodel
        default: datum://model-prod
In this setup, the "Train Model" step uses the datum alias "train-images" as the default for its mydata input and "model-prod" as the default for its mymodel input (the model weights). When the step is executed, Valohai automatically retrieves the files that the two aliases point to.
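Inside myfile.py, the resolved files can then be read like any other input. The following is a minimal sketch under stated assumptions: it assumes inputs are placed under /valohai/inputs/<input-name>/ (optionally overridden by a VH_INPUTS_DIR environment variable), input_files is a hypothetical helper, and the file names images.tar and model.h5 are invented for the local demonstration.

```python
import os
import tempfile


def input_files(input_name, inputs_root=None):
    """Return the sorted file paths found in the directory of one named input."""
    # Assumption: inputs live under /valohai/inputs/<input-name>/ unless
    # VH_INPUTS_DIR points somewhere else.
    root = inputs_root or os.environ.get("VH_INPUTS_DIR", "/valohai/inputs")
    input_dir = os.path.join(root, input_name)
    return sorted(
        os.path.join(input_dir, name)
        for name in os.listdir(input_dir)
        if os.path.isfile(os.path.join(input_dir, name))
    )


# Local demonstration with a fake inputs directory and made-up file names.
fake_root = tempfile.mkdtemp()
for input_name, file_name in [("mydata", "images.tar"), ("mymodel", "model.h5")]:
    os.makedirs(os.path.join(fake_root, input_name))
    with open(os.path.join(fake_root, input_name, file_name), "wb") as f:
        f.write(b"placeholder")

data_files = input_files("mydata", inputs_root=fake_root)
model_files = input_files("mymodel", inputs_root=fake_root)
print(data_files, model_files)
```

The script only refers to the input names mydata and mymodel, so it does not need to change when an alias is pointed at a new file.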