Let’s assume you have something similar to the following setup in your Valohai YAML:
- step:
    # ...
    name: train-model
    # ...
    inputs:
      - name: training-set-images
      - name: training-set-labels
If you also have a local project linked to Valohai, you can run the step with the following command:
vh exec run train-model
But this will fail, because the inputs don’t point to any data yet.
So, how can you refer to various datasets?
Option #1: Custom Store URL
You can connect private data stores to Valohai projects.
If a connected store contains files that Valohai doesn’t know about, such as files you uploaded there yourself, you can refer to them with the following syntax:
- Azure Blob Storage: azure://{account_name}/{container_name}/{blob_name}
- Google Cloud Storage: gs://{bucket}/{key}
- Amazon S3: s3://{bucket}/{key}
- OpenStack Swift: swift://{project}/{container}/{key}
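To make the structure of these URLs concrete, here is a minimal Python sketch that splits a store URL back into the named parts of its template. The parse_store_url function and the STORE_TEMPLATES table are hypothetical helpers written for illustration, not part of any Valohai library; the component names are copied from the templates above.

```python
from urllib.parse import urlparse

# Component names for each store scheme, copied from the URL templates above.
# Hypothetical helper table, not part of any Valohai API.
STORE_TEMPLATES = {
    "azure": ("account_name", "container_name", "blob_name"),
    "gs": ("bucket", "key"),
    "s3": ("bucket", "key"),
    "swift": ("project", "container", "key"),
}


def parse_store_url(url: str) -> dict:
    """Split a custom store URL into the named parts of its template."""
    parsed = urlparse(url)
    names = STORE_TEMPLATES.get(parsed.scheme)
    if names is None:
        raise ValueError(f"unsupported store scheme: {parsed.scheme!r}")
    # The first component is the host part of the URL. The remaining path is
    # split at most len(names) - 2 times, so the last component (the key or
    # blob name) keeps its own slashes, e.g. "dataset/images/train.zip".
    # Note: urlparse treats "?" and "#" as query/fragment markers, so keys
    # containing those characters are not handled by this sketch.
    path = parsed.path.lstrip("/")
    parts = [parsed.netloc, *path.split("/", len(names) - 2)]
    if len(parts) != len(names) or not all(parts):
        raise ValueError(f"malformed {parsed.scheme} URL: {url!r}")
    return {"provider": parsed.scheme, **dict(zip(names, parts))}


print(parse_store_url("s3://my-bucket/dataset/images/train.zip"))
```

Splitting with a bounded maxsplit is the design choice that matters here: object keys routinely contain slashes, so only the leading components are separated out.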
This syntax also supports wildcards for downloading multiple files:
- s3://my-bucket/dataset/images/*.jpg for all .jpg (JPEG) files
- s3://my-bucket/dataset/image-sets/**.jpg for all .jpg (JPEG) files, recursing into subdirectories
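The difference between * and ** can be illustrated with a small Python sketch that translates a wildcard pattern into a regular expression and filters a list of object URLs. This only approximates the behavior described above and is not Valohai's actual matching code: it assumes * matches within a single directory level while ** may cross directory boundaries. The URL listing is made up for the example.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a store-URL wildcard pattern into a compiled regex.

    Assumed semantics (an approximation of the description above):
    "**" matches across directory levels, "*" stays within one level.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")       # recursive: may match "/" characters
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")    # single level: never matches "/"
            i += 1
        else:
            out.append(re.escape(pattern[i]))  # everything else is literal
            i += 1
    return re.compile("".join(out))


def match_keys(pattern: str, urls: list[str]) -> list[str]:
    """Return the URLs that the whole pattern matches, in input order."""
    regex = wildcard_to_regex(pattern)
    return [u for u in urls if regex.fullmatch(u)]


# Hypothetical bucket listing used only for this example.
urls = [
    "s3://my-bucket/dataset/images/cat.jpg",
    "s3://my-bucket/dataset/images/cat.png",
    "s3://my-bucket/dataset/images/extra/dog.jpg",
    "s3://my-bucket/dataset/image-sets/a/b/bird.jpg",
    "s3://my-bucket/dataset/image-sets/fish.jpg",
]
print(match_keys("s3://my-bucket/dataset/images/*.jpg", urls))
print(match_keys("s3://my-bucket/dataset/image-sets/**.jpg", urls))
```

Under these assumptions the single-star pattern skips images/extra/dog.jpg, while the double-star pattern picks up files at any depth under image-sets/.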
You can also interpolate execution parameters into input URIs:
s3://my-bucket/dataset/images/{parameter:user-id}/*.jpeg
During an execution, {parameter:user-id} is replaced with the value of the user-id parameter.
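Valohai performs this substitution itself when the execution starts; the Python sketch below only mimics the idea so you can preview which URI a given parameter value would produce. The interpolate_parameters function is a hypothetical helper, and it assumes values are inserted as plain strings without any URL escaping.

```python
import re

# Matches placeholders such as {parameter:user-id}; group 1 is the name.
PARAMETER_PLACEHOLDER = re.compile(r"\{parameter:([^{}]+)\}")


def interpolate_parameters(uri: str, parameters: dict) -> str:
    """Replace every {parameter:NAME} in uri with str(parameters[NAME])."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in parameters:
            raise KeyError(f"no value given for parameter {name!r}")
        return str(parameters[name])

    return PARAMETER_PLACEHOLDER.sub(substitute, uri)


print(interpolate_parameters(
    "s3://my-bucket/dataset/images/{parameter:user-id}/*.jpeg",
    {"user-id": 42},
))
```

Raising on a missing parameter, rather than leaving the placeholder in place, makes a misspelled parameter name show up immediately instead of producing a URI that matches no files.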
Usage example:
vh exec run train-model \
--training-set-images=s3://my-bucket/dataset/images/train.zip \
--training-set-labels=s3://my-bucket/dataset/labels/train.zip
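If you launch executions from a script, the same pattern (each input passed as --<input-name>=<url>) can be assembled programmatically. This is a minimal sketch assuming the vh CLI is installed and on your PATH; build_vh_command is a hypothetical helper that only reproduces the argument layout of the usage example above.

```python
import shlex


def build_vh_command(step: str, inputs: dict[str, str]) -> list[str]:
    """Build the argument list for `vh exec run`, passing each input as
    --<input-name>=<url> like the usage examples in this article."""
    argv = ["vh", "exec", "run", step]
    for name, url in inputs.items():
        argv.append(f"--{name}={url}")
    return argv


cmd = build_vh_command("train-model", {
    "training-set-images": "s3://my-bucket/dataset/images/train.zip",
    "training-set-labels": "s3://my-bucket/dataset/labels/train.zip",
})
# Print the equivalent shell command line for inspection.
print(shlex.join(cmd))
# To actually launch the execution, pass the list to subprocess:
#   import subprocess; subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids shell quoting problems when URLs contain characters such as * or braces.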
Option #2: Datum URI
You can use the datum://<identifier> syntax to refer to specific files that the Valohai platform already knows about.
A file has a datum identifier if it was uploaded to Valohai either:
- by another execution, or
- by using the Valohai web interface uploader under the “Data” tab of the project
You can find a file’s datum URI with the “datum://” button under the “Data” tab of your project.
Usage example:
vh exec run train-model \
--training-set-images=datum://01685ff1-5a7a-c36b-e79e-80623acea29f \
--training-set-labels=datum://01685ff1-5930-8c09-83d1-cd174c9770ab
Option #3: Public HTTP(S) URL
If your data is available through an HTTP(S) address, use the URL as-is.
Usage example:
vh exec run train-model \
--training-set-images=https://example.com/train-images.zip \
--training-set-labels=https://example.com/train-labels.zip
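When scripting launches, a quick check that each input value uses one of the three forms covered in this article can catch typos before an execution fails. The Python sketch below classifies a URI by its scheme only; input_kind is a hypothetical helper, and it does not verify that the file actually exists or that Valohai accepts no other forms.

```python
from urllib.parse import urlparse

# Schemes taken from the three options described in this article.
STORE_SCHEMES = {"azure", "gs", "s3", "swift"}


def input_kind(uri: str) -> str:
    """Classify an input URI by which of the three options it uses."""
    scheme = urlparse(uri).scheme  # urlparse lowercases the scheme
    if scheme in STORE_SCHEMES:
        return "custom store URL"
    if scheme == "datum":
        return "datum URI"
    if scheme in {"http", "https"}:
        return "public HTTP(S) URL"
    raise ValueError(f"unrecognized input URI scheme: {scheme!r}")


for uri in (
    "s3://my-bucket/dataset/images/train.zip",
    "datum://01685ff1-5a7a-c36b-e79e-80623acea29f",
    "https://example.com/train-images.zip",
):
    print(f"{input_kind(uri):<20} {uri}")
```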