By defining inputs, you can easily download data from a public address or from your private cloud storage.
In this section you will learn:
- How to define Valohai inputs
- How to change inputs between executions both in the CLI and in the UI
- Valohai will download data files from your private cloud storage. Data can come from, for example, AWS S3, Azure Storage, GCP Cloud Storage, or a public source (HTTP/HTTPS).
- Valohai handles both authentication with your cloud storage and downloading, uploading, and caching data. This means that you don’t need to manage keys, handle authentication, or use tools like `boto3`, `gsutil`, or `BlobClient` in your code. Instead, you can always treat the data as local files.
- All Valohai machines have a local directory `/valohai/inputs/` where all your inputs are downloaded. Each input gets its own subdirectory, for example `/valohai/inputs/images/` and `/valohai/inputs/model/` (see the sketch after this list).
- Each step in your `valohai.yaml` can contain one or multiple input definitions, and each input can contain one or multiple files. For example, in a batch inference step you could have a trained model file and a set of images you want to run the inference on.
- Each input in `valohai.yaml` can have a default value. These values can be overridden any time you run a new execution, for example to change the set of images you want to run batch inference on.
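To make this concrete, here is a minimal sketch of how an execution can read its inputs as ordinary local files. The input name `images` is a hypothetical example, not something defined in this tutorial:

```python
from pathlib import Path

# Hypothetical input named "images": Valohai downloads its files to
# /valohai/inputs/images/ before your command runs.
input_dir = Path("/valohai/inputs/images")

# The downloaded files behave like any other local files.
for file_path in sorted(input_dir.glob("*")):
    print(file_path.name, file_path.stat().st_size, "bytes")
```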
Let’s start by defining the inputs for our train-model step. Update `valohai.yaml` to define the new input:
```yaml
- step:
    name: train-model
    command:
      - pip install -r requirements.txt
      - python train.py {parameters}
    image: tensorflow/tensorflow:2.6.0
    parameters:
      - name: epoch
        type: integer
        default: 5
      - name: learning_rate
        type: float
        default: 0.001
    inputs:
      - name: dataset
        default: https://valohaidemo.blob.core.windows.net/mnist/mnist.npz
```
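When the execution starts, Valohai downloads the default file to `/valohai/inputs/dataset/mnist.npz`, so your code can read it from there like any local file.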
Update `train.py` to point the MNIST file path to the Valohai inputs. You should also remove `mnist.npz` from your local machine, as Valohai now downloads it for you.
```python
import numpy as np
import tensorflow as tf
import valohai

# Read the dataset from the Valohai inputs directory
# (/valohai/inputs/dataset/) instead of a local file.
input_path = valohai.inputs('dataset').path()
with np.load(input_path, allow_pickle=True) as f:
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']

# Scale pixel values to the 0-1 range.
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

optimizer = tf.keras.optimizers.Adam(learning_rate=valohai.parameters('learning_rate').value)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

model.compile(optimizer=optimizer,
              loss=loss_fn,
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=valohai.parameters('epoch').value)
model.evaluate(x_test, y_test, verbose=2)

# Save the trained model to the Valohai outputs directory.
output_path = valohai.outputs().path('model.h5')
model.save(output_path)
```
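Everything saved under `/valohai/outputs/` is uploaded to your project’s data store when the execution finishes, so the trained `model.h5` becomes available as data you can later use as an input in other executions.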
Run in Valohai
Finally, run a new Valohai execution:

```bash
vh exec run train-model --adhoc
```
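The `--adhoc` flag packages your current working directory and sends it to Valohai, so you can run your latest changes without pushing them to version control first.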
Rerun an execution with different input data
1. Open your project on app.valohai.com.
2. Open the latest execution.
3. Click Copy.
4. Scroll down to the Inputs section and remove the current input.
5. Either pass in a new URI or select an input from the Data list (for example, if you’ve uploaded a file).
6. Click Create execution.
You can also run a new execution with a different input value from the command line:

```bash
vh exec run train-model --adhoc --dataset=https://myurl.com/differentfile.npz
```
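Parameters defined in `valohai.yaml` can be overridden the same way; for example, adding `--learning_rate=0.01` would change the learning rate for that run.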