We suggest using an alias to keep track of the latest version of a specific file.
A data alias is a named pointer to a Valohai datum URL. It lets you change which file your executions use without changing any code. Create a data alias in your project and use it as the input for your executions.
Create an alias in the UI
- Open your project on app.valohai.com
- Open the Data tab
- Open the Aliases tab
- Select Create new datum alias
- Give the alias a name (for example dataset-a-latest) and select which file the alias should point to
Now you can use datum://dataset-a-latest as an input for your execution. Valohai resolves it to the file the alias currently points to and runs the job with that specific file.
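Inside the execution, the file behind the alias arrives like any other input. The following is a minimal sketch of locating it using only the standard library. It assumes the step defines an input named dataset (a hypothetical name) whose value is datum://dataset-a-latest, and that inputs are placed in a folder per input name under the directory given by the VH_INPUTS_DIR environment variable, falling back to /valohai/inputs. The helper input_files is illustrative and not part of any Valohai library.

```python
import os
from pathlib import Path


def input_files(input_name, inputs_dir=None):
    """Return the files downloaded for one named input, sorted by name."""
    # Assumption: each input is stored under <inputs dir>/<input name>/.
    base = Path(inputs_dir or os.environ.get("VH_INPUTS_DIR", "/valohai/inputs"))
    input_dir = base / input_name
    if not input_dir.is_dir():
        return []
    return sorted(p for p in input_dir.iterdir() if p.is_file())


if __name__ == "__main__":
    # "dataset" is a hypothetical input name used only for this example.
    for path in input_files("dataset"):
        print(path)
```

The code never names a specific file, so pointing the alias at a different file changes what the next execution reads without touching the script.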
Update an alias programmatically
You can create an alias, or update an existing one to point to a new file, whenever you save a file to Valohai outputs.
In addition to saving the file, you'll need to create and save a JSON file that tells Valohai to update the alias to this new file.
The JSON file is always named yourfile.ext.metadata.json. If you're saving a file called dataset.csv, the JSON file needs to be called dataset.csv.metadata.json
import valohai
import json
import pandas
# Some sample data
data = {'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'Age': [20, 21, 19, 18]}
df = pandas.DataFrame(data)
save_path = valohai.outputs().path('dataset.csv')
df.to_csv(save_path) # save your dataframe as a CSV file
# Create a sidecar file for Valohai
# Tell Valohai to update the dataset-a-latest alias to point to this file
metadata = {
    "valohai.alias": "dataset-a-latest",  # creates or updates a Valohai data alias to point to this output file
}
metadata_path = valohai.outputs().path('dataset.csv.metadata.json')
with open(metadata_path, 'w') as outfile:
    json.dump(metadata, outfile)
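If several outputs need aliases, the sidecar step can be wrapped in a small helper. The following is a sketch using only the standard library. The function name write_alias_sidecar is hypothetical, and it assumes the sidecar is saved in the same outputs directory as the file it describes, following the naming rule above.

```python
import json
from pathlib import Path


def write_alias_sidecar(output_file, alias):
    """Write <output_file>.metadata.json next to output_file and return its path."""
    output_file = Path(output_file)
    # The sidecar name is the full output file name plus ".metadata.json".
    sidecar = output_file.with_name(output_file.name + ".metadata.json")
    with open(sidecar, "w") as outfile:
        json.dump({"valohai.alias": alias}, outfile)
    return sidecar
```

For example, calling write_alias_sidecar(save_path, "dataset-a-latest") after df.to_csv(save_path) would produce the same dataset.csv.metadata.json as the code above.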