In this tutorial you will learn how to create and run a Batch Inference execution in Valohai. This execution will use TensorFlow 2.5.0 to run new CSV data through a previously trained model.
For this tutorial you will need:
- Python 3.6 or newer
- Valohai command-line client (run pip install --upgrade valohai-cli)
We’re also going to need two files:
- a model trained with TensorFlow 2.5.0
- some new data in a single CSV file
To make things easy, both files are hosted publicly, so there is no need to download them:
- Model: s3://valohai-public-files/tutorials/batch-inference/csv-batch-inference/model.zip
- Data: s3://valohai-public-files/tutorials/batch-inference/csv-batch-inference/data.csv
If you want to, you can train the required model by following the Keras example here: https://keras.io/examples/structured_data/structured_data_classification_from_scratch/.
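If you do train your own model that way, here is a minimal sketch of how you might package it into a model.zip like the one above. Note that the variable model is an assumption: it stands for the Keras model you just trained in the linked example.
import shutil

# `model` is assumed to be the Keras model trained in the linked Keras example
model.save('model')  # writes the TensorFlow SavedModel to ./model

# Zip the contents of the SavedModel directory so the files sit at the root
# of the archive, matching the unpacking step later in this tutorial
shutil.make_archive('model', 'zip', root_dir='model')  # produces model.zip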
Running on Valohai
To make it easy to run our batch inference on Valohai, we will use Valohai to run our code from the very beginning.
If you don’t already have a Valohai account, go to https://app.valohai.com/ to create one for yourself.
Create a new folder for our project, then run the following commands in the project folder:
vh login
# fill in your username
# and your password
vh init
# Answer the wizard questions like this:
# "First, let's..." -> y
# "Looks like..." -> python batch_inference.py, then y to confirm
# "Choose a number or..." -> tensorflow/tensorflow:2.5.0, then y to confirm
# "Write this to..." -> y
# "Do you want to link..." -> C, then give a name for your project, then select your user
Edit the generated valohai.yaml so that it looks like this:
---
- step:
    name: Batch Inference
    image: tensorflow/tensorflow:2.5.0
    command:
      - pip install pandas valohai-utils
      - python batch_inference.py
    inputs:
      - name: model
        default: s3://valohai-public-files/tutorials/batch-inference/csv-batch-inference/model.zip
      - name: data
        default: s3://valohai-public-files/tutorials/batch-inference/csv-batch-inference/data.csv
What we are doing here is defining a single step for our machine learning pipeline: the Batch Inference step. It runs on top of the official tensorflow/tensorflow:2.5.0 Docker image, first installs pandas and the valohai-utils Python library, and then runs our batch inference code. Let’s test that everything is set up correctly by running on Valohai:
vh exec run --adhoc "Batch Inference"
If everything went as planned, we should see our Valohai execution end after finding out that batch_inference.py is missing.
Unpacking the Model
This time we will unpack the model ourselves. Let’s get started by creating batch_inference.py and opening it up in your favorite editor!
Add these imports to the beginning of the file:
import json
from zipfile import ZipFile
import pandas as pd
import tensorflow as tf
import valohai as vh
For unpacking the model, we will only need zipfile and valohai, but we will use the rest of the imports soon enough.
Next, unpack the model to a folder called model in the current working directory:
with ZipFile(vh.inputs('model').path(process_archives=False), 'r') as f:
    f.extractall('model')
Done!
Loading and Using Our Model
Begin by loading our model:
model = tf.keras.models.load_model('model')
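If you want to double-check that the SavedModel was restored correctly, printing the architecture is a quick, optional sanity check:
model.summary()  # prints the layers and parameter counts of the restored model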
Easy, huh? Let’s load up the data:
csv = pd.read_csv(vh.inputs('data').path())
labels = csv.pop('target')
data = tf.data.Dataset.from_tensor_slices((dict(csv), labels))
batch_data = data.batch(batch_size=32)
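If you’d like to confirm that the CSV columns were parsed into the feature dictionary the model expects, a quick, optional peek at the first batch could look like this:
# Optional: inspect the shapes of one batch of features and targets
for features, targets in batch_data.take(1):
    print({name: tensor.shape for name, tensor in features.items()})
    print(targets.shape)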
Aaand we are almost done. Run the model with the loaded data. While we’re at it, let’s log the results and save them as a JSON file:
results = model.predict(batch_data)

# Let's build a dictionary out of the results,
# e.g. {"1": 0.375, "2": 0.76}
flattened_results = results.flatten()
indexed_results = enumerate(flattened_results, start=1)
metadata = dict(indexed_results)

for value in metadata.values():
    with vh.logger() as logger:
        logger.log("result", value)

with open(vh.outputs().path('results.json'), 'w') as f:
    # The JSON library doesn't know how to print
    # NumPy float32 values, so we stringify them
    json.dump(metadata, f, default=lambda v: str(v))
Let’s run the batch inference on Valohai:
vh exec run --adhoc "Batch Inference"
If everything went according to plan, you can now preview the results in the Outputs tab of the execution.
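Once the execution has finished, you can also download results.json from the Outputs tab and inspect it locally. A minimal sketch, assuming the file has been downloaded to your working directory:
import json

with open('results.json') as f:
    results = json.load(f)

# Print the first five predictions, e.g. 1 -> 0.375
for key in list(results)[:5]:
    print(key, '->', results[key])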