Valohai lets you easily collect metadata, such as key performance metrics, from your executions, visualize it, and compare it across multiple executions.
In this section you will learn:
- How to collect metadata
- How to visualize metadata in the UI
- How to compare metadata between executions
A few things to know about metadata in Valohai:
- You can easily keep track of your executions' key metrics and sort executions based on your custom metrics.
- Anything that you print as JSON from your code gets collected as potential Valohai metadata (see the sketch after this list).
- You can visualize and compare execution metrics as a time series, a scatter plot, or a confusion matrix in the UI.
- You can set early stopping rules for executions and Tasks, or stop a pipeline from proceeding, based on metadata values.
- You can download the metadata as a CSV or JSON file from the execution's Metadata tab.
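As a minimal sketch of the JSON point above: any JSON object your code prints to standard output can be picked up as metadata. The metric names here are purely illustrative:

import json

# Printing a JSON object to standard output is enough for Valohai to
# collect it as metadata; 'step' and 'accuracy' are illustrative names,
# not required keys.
print(json.dumps({"step": 1, "accuracy": 0.95}))

The valohai.logger() helper used later in this section is a convenience wrapper around this same idea.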
Update train.py to add metadata logging:
- Create a new function log_metadata that will log metadata
- Create a TensorFlow LambdaCallback to trigger the log_metadata function every time an epoch ends
- Pass the new callback to the model.fit method
import numpy as np
import tensorflow as tf
import valohai


def log_metadata(epoch, logs):
    with valohai.logger() as logger:
        logger.log('epoch', epoch)
        logger.log('accuracy', logs['accuracy'])
        logger.log('loss', logs['loss'])


input_path = valohai.inputs('dataset').path()
with np.load(input_path, allow_pickle=True) as f:
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']

x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

optimizer = tf.keras.optimizers.Adam(learning_rate=valohai.parameters('learning_rate').value)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

model.compile(optimizer=optimizer,
              loss=loss_fn,
              metrics=['accuracy'])

callback = tf.keras.callbacks.LambdaCallback(on_epoch_end=log_metadata)
model.fit(x_train, y_train, epochs=valohai.parameters('epoch').value, callbacks=[callback])

model.evaluate(x_test, y_test, verbose=2)

output_path = valohai.outputs().path('model.h5')
model.save(output_path)
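When this runs, each epoch's with block should print a single JSON line that Valohai collects as one metadata row, roughly like this (values illustrative):

{"epoch": 3, "accuracy": 0.9721, "loss": 0.0893}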
Collect test metrics
- Save the model's test accuracy and test loss into variables
- Log the test metrics with the Valohai logger
import numpy as np
import tensorflow as tf
import valohai


def log_metadata(epoch, logs):
    with valohai.logger() as logger:
        logger.log('epoch', epoch)
        logger.log('accuracy', logs['accuracy'])
        logger.log('loss', logs['loss'])


input_path = valohai.inputs('dataset').path()
with np.load(input_path, allow_pickle=True) as f:
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']

x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

optimizer = tf.keras.optimizers.Adam(learning_rate=valohai.parameters('learning_rate').value)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

model.compile(optimizer=optimizer,
              loss=loss_fn,
              metrics=['accuracy'])

callback = tf.keras.callbacks.LambdaCallback(on_epoch_end=log_metadata)
model.fit(x_train, y_train, epochs=valohai.parameters('epoch').value, callbacks=[callback])

test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=2)

with valohai.logger() as logger:
    logger.log('test_accuracy', test_accuracy)
    logger.log('test_loss', test_loss)

output_path = valohai.outputs().path('model.h5')
model.save(output_path)
Run in Valohai
Adding or changing metadata doesn’t require any changes to the valohai.yaml config file.
You can immediately launch a new execution and view the collected metadata.
vh exec run train-model --adhoc
View metrics
- Go to your project’s executions
- Click on the Show columns button on the right side, above the table
- Select accuracy and loss to show them in the table.
- Open the latest execution
- Go to the Metadata tab to view metrics from that execution
- Select epoch for the X-axis and accuracy and loss for the Y-axis
The metadata value displayed in the executions table is always the latest printed value for each metric. In your script, make sure the last value you print for a metric such as accuracy is the one you want to compare on, as in the sketch below.
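One way to do that, sketched here, is to track the running best inside the callback and log it under its own key, so its latest value is always the best seen so far. The best_accuracy name is illustrative and not part of the tutorial code:

# Illustrative sketch: keep the running best accuracy so the value shown
# in the executions table is always the best one seen so far.
best_accuracy = 0.0

def log_metadata(epoch, logs):
    global best_accuracy
    best_accuracy = max(best_accuracy, logs['accuracy'])
    with valohai.logger() as logger:
        logger.log('epoch', epoch)
        logger.log('accuracy', logs['accuracy'])
        logger.log('best_accuracy', best_accuracy)
        logger.log('loss', logs['loss'])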