Keras - Serving a Keras Model Quickly with TensorFlow Serving and Docker
This blog post is a Keras companion to the TensorFlow Serving setup featured in the following blog post: Serving ML Quickly with TensorFlow Serving and Docker. Please read that post first, as it explains and links to relevant topics (e.g. how to install Docker).
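If Docker is already installed, a quick sanity check from the notebook (via the shell escape) looks like this:
# Verify Docker is installed and available
!docker --version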
In this post I'll show how to:
- Save a Keras model in the format expected by TensorFlow Serving
- Load the model into TensorFlow Serving
- Query the model with an HTTP client and verify it returns the same output as the local model
We'll use Keras's InceptionV3 model, pretrained on ImageNet, as our example Keras model and serve it in production.
import os
import tensorflow as tf
import keras
from IPython.display import Image, display
from IPython.core.display import HTML
print(keras.__version__, tf.__version__)
# Install additional packages
!pip3 install requests
import requests
1. Save a Keras model in the format expected by TensorFlow Serving¶
In this example, we will load an inception_v3 model pretrained on ImageNet, but it can be replaced with any other Keras model that you have trained yourself.
### Load a pretrained inception_v3
inception_model = keras.applications.inception_v3.InceptionV3(weights='imagenet')
# Define a destination path for the model
MODEL_EXPORT_DIR = '/tmp/inception_v3'
MODEL_VERSION = 1
MODEL_EXPORT_PATH = os.path.join(MODEL_EXPORT_DIR, str(MODEL_VERSION))
print("Model dir: ", MODEL_EXPORT_PATH)
# Before saving the model, let's print the input tensors of the model:
print(inception_model.inputs)
# We'll need to create an input mapping, and name each of the input tensors.
# In the inception_v3 Keras model, there is only a single input and we'll name it 'image'
input_names = ['image']
name_to_input = {name: t_input for name, t_input in zip(input_names, inception_model.inputs)}
print(name_to_input)
# Save the model to the MODEL_EXPORT_PATH
# Note using 'name_to_input' mapping, the names defined here will also be used for querying the service later
tf.saved_model.simple_save(
    keras.backend.get_session(),
    MODEL_EXPORT_PATH,
    inputs=name_to_input,
    outputs={t.name: t for t in inception_model.outputs})
# Show the saved resources
!ls -la {MODEL_EXPORT_PATH}
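Optionally, you can inspect the exported signatures with the saved_model_cli tool that ships with the TensorFlow pip package, as a quick sanity check that the export is valid:
# Inspect the exported SavedModel's signatures
!saved_model_cli show --dir {MODEL_EXPORT_PATH} --all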
When running in production, you will usually want to save these model resources and distribute them to multiple machines behind a load balancer, e.g. by packaging the export directory as sketched below.
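As a minimal, hypothetical sketch (the archive path below is an assumption for illustration), the exported directory could be packaged for distribution like this:
# Hypothetical packaging step: archive the exported model directory so it can be
# copied to the serving machines (the archive path is an assumption)
import shutil
archive_path = shutil.make_archive('/tmp/inception_v3_export', 'gztar', root_dir=MODEL_EXPORT_DIR)
print(archive_path)
Next, we will load this model into a TensorFlow Serving Docker container.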
2. Load the model into TensorFlow Serving¶
All you need for serving this model is to run a TensorFlow Serving Docker container as described in Serving ML Quickly with TensorFlow Serving and Docker.
In this context, the source should be the directory we saved the model to (i.e. '/tmp/inception_v3'):
- Copy the saved model to the host's specified directory (source=/tmp/inception_v3 in this example).
- Run the docker:
docker run -d -p 8501:8501 --name keras_inception_v3 --mount type=bind,source=/tmp/inception_v3,target=/models/inception_v3 -e MODEL_NAME=inception_v3 -t tensorflow/serving
- Verify that there's network access to the TensorFlow Serving container. To get the local Docker IP (172.*.*.*) for testing, run:
docker inspect -f '{{ .NetworkSettings.IPAddress }}' keras_inception_v3
# Verify we have network access to the TF service
HOST_IP = "172.17.0.3" # Local docker ip
HOST_PORT = "8501"
!curl {HOST_IP}:{HOST_PORT}
After verifying the service is up and there are no issues, let's write a client and start querying the service.
3. Query the model with an HTTP client and verify the same output as the local model¶
Download a Dog (Tibetan Mastiff) and a Cat (Ragdoll) image for our tests¶
def download_file(url, filename):
    response = requests.get(url)
    response.raise_for_status()
    with open(filename, 'wb') as f:
        f.write(response.content)
cat_url = "https://upload.wikimedia.org/wikipedia/commons/c/c0/Ragdoll_Blue_Colourpoint.jpg"
dog_url = "https://3milliondogs.com/blog-assets-two/2015/06/11407122_926045227460335_778769622795895821_n.jpg"
cat_filename = os.path.join("/tmp", "cat.jpg")
dog_filename = os.path.join("/tmp", "dog.jpg")
download_file(cat_url, cat_filename)
download_file(dog_url, dog_filename)
Define a simple method to query the model locally¶
import requests
import numpy as np
from keras.preprocessing.image import load_img, img_to_array
from keras.applications.inception_v3 import preprocess_input
from keras.applications.imagenet_utils import decode_predictions
INCEPTIONV3_TARGET_SIZE = (299, 299)
def predict(image_path):
    x = img_to_array(load_img(image_path, target_size=INCEPTIONV3_TARGET_SIZE))
    x = np.expand_dims(x, axis=0)  # add a batch dimension
    x = preprocess_input(x)
    return inception_model.predict(x)
Define a client class for TensorFlow Serving¶
# Define a base client class for TensorFlow Serving
class TFServingClient:
    """
    This is a base class that implements a TensorFlow Serving client
    """
    TF_SERVING_URL_FORMAT = '{protocol}://{hostname}:{port}/v1/models/{endpoint}:predict'

    def __init__(self, hostname, port, endpoint, protocol="http"):
        self.protocol = protocol
        self.hostname = hostname
        self.port = port
        self.endpoint = endpoint

    def _query_service(self, req_json):
        """
        :param req_json: dict (as defined in https://cloud.google.com/ml-engine/docs/v1/predict-request)
        :return: np.array of predictions
        """
        server_url = self.TF_SERVING_URL_FORMAT.format(protocol=self.protocol,
                                                       hostname=self.hostname,
                                                       port=self.port,
                                                       endpoint=self.endpoint)
        response = requests.post(server_url, json=req_json)
        response.raise_for_status()
        return np.array(response.json()['predictions'])
# Define a specific client for our inception_v3 model
class InceptionV3Client(TFServingClient):
    # INPUT_NAME is the name we used when saving the model (the only value in the `input_names` list)
    INPUT_NAME = "image"
    TARGET_SIZE = INCEPTIONV3_TARGET_SIZE

    def load_image(self, image_path):
        """Load an image from path and preprocess it for InceptionV3"""
        img = img_to_array(load_img(image_path, target_size=self.TARGET_SIZE))
        return preprocess_input(img)

    def predict(self, image_paths):
        imgs = [self.load_image(image_path) for image_path in image_paths]
        # Create a request json dict in the "instances" format
        req_json = {
            "instances": [{self.INPUT_NAME: img.tolist()} for img in imgs]
        }
        # print(req_json)  # uncomment to inspect the request payload (it is very large)
        return self._query_service(req_json)
# Instantiate a client
hostname = "172.17.0.3"
port = "8501"
endpoint="inception_v3"
client = InceptionV3Client(hostname=hostname, port=port, endpoint=endpoint)
Validate Results¶
Query the images using both the local model and the TF Serving model, and validate that the results are the same.
cat_local_preds = predict(cat_filename)
cat_remote_preds = client.predict([cat_filename])
dog_local_preds = predict(dog_filename)
dog_remote_preds = client.predict([dog_filename])
# Validate the prediction values are the same for the local and remote models
assert np.allclose(cat_local_preds, cat_remote_preds, atol=1e-07)
assert np.allclose(dog_local_preds, dog_remote_preds, atol=1e-07)
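Since the remote predictions match the local ones, we can also decode them into human-readable ImageNet labels as a quick sanity check:
# Decode the remote predictions into (class_id, class_name, score) tuples
print(decode_predictions(cat_remote_preds, top=3))
print(decode_predictions(dog_remote_preds, top=3))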
We have now served a Keras model with TensorFlow Serving. This also works for models with multiple inputs, as long as you map each input to a name when saving and use the same names in the TensorFlow Serving HTTP request, as sketched below.
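As a hypothetical sketch (the two-input model and its input names below are assumptions, not part of this example), saving and querying a multi-input model would look like this:
# Hypothetical multi-input example - 'multi_input_model' and the input names
# are assumptions for illustration only:
# input_names = ['image', 'metadata']
# name_to_input = {name: t for name, t in zip(input_names, multi_input_model.inputs)}
# tf.saved_model.simple_save(keras.backend.get_session(), export_path,
#                            inputs=name_to_input,
#                            outputs={t.name: t for t in multi_input_model.outputs})
# Then use the same names in each request instance:
# req_json = {"instances": [{"image": img.tolist(), "metadata": meta.tolist()}]}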
Now that we have an InceptionV3 model, let's try to query a few images and see how good the classifications are.
Show model predictions¶
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from PIL import Image
def grid_iterator(n_rows, n_cols, axis_size=2.5, total=None, fig_title=None):
    """
    matplotlib helper method for plotting a grid of graphs
    :param n_rows: int
    :param n_cols: int
    :param axis_size: float (size of each subplot axis, in inches)
    :param total: int (stopping condition - stop before n_rows * n_cols subplots are filled in the grid)
    :param fig_title: str
    """
    fig = plt.figure()
    gs = gridspec.GridSpec(n_rows, n_cols, top=1., bottom=0., right=1., left=0., hspace=0.15, wspace=0.1)
    index = 0
    for r in range(n_rows):
        for c in range(n_cols):
            # Allow for not filling all spots in the grid
            # (checked before yielding, so no extra subplots are created)
            if total is not None and index >= total:
                break
            ax = fig.add_subplot(gs[r, c])
            yield index, ax
            index += 1
    width, height = n_cols * axis_size, n_rows * axis_size
    fig.set_size_inches(width, height)
    # Add a title to the figure
    if fig_title is not None:
        fig.suptitle(fig_title, fontsize=16, y=1.05)
def get_titles(predicts):
    """
    Helper method that concatenates the top classification with all classifications that have a score >= 0.1
    """
    titles = []
    for decoded_pred in decode_predictions(predicts):
        res = []
        for _, cls, score in decoded_pred:
            if score >= 0.1 or not res:
                res.append("%s(%.2f)" % (cls, score))
        title = ",".join(res)
        titles.append(title)
    return titles
# Define a list of image urls
image_urls = [
dog_url,
cat_url,
"https://upload.wikimedia.org/wikipedia/commons/1/1f/Oryctolagus_cuniculus_Rcdo.jpg",
"https://cdn.omlet.co.uk/images/originals/healthy-rabbits.jpg",
"https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQjfw20fp-pcOqDMzBU-kOSgU7LL5myy69bDzatFTukdh4U3DV28A",
"https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcR-_q5kjh1vrsHkwQQi52_xYAA72TonOGseWg2y9kp5DcLAda4p",
"https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRVrGcFxHwHcIFWSZuCmdtRAlz45oGCY91E3ymr20UsjqVwhoGS",
"https://i.pinimg.com/originals/9b/e4/41/9be44139b01b18221fa35340e8da50fc.jpg",
"https://wwwselwomarinaes-jrzviv0e.stackpathdns.com/sites/selwomarina.es/files/uploads/styles/adaptive/public/animales/media/web_img_0001.jpg?itok=2Jchrtuf",
"https://upload.wikimedia.org/wikipedia/commons/4/40/Canada_goose_on_Seedskadee_NWR_%2827826185489%29.jpg",
"https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSyojh-pUD9QUSpbcVcUKGcUQC_2exNPliU1ddkiinZUA3kf_KPCw",
"https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQz0QAkiDOwzVBkRVcjRHtiJceYTymUa3uBXEFBjPxj9Bkbi5jUBA",
"https://i0.wp.com/learnaboutanimals.co/wp-content/uploads/2016/07/green-mamba.jpg?resize=567%2C375",
]
# Download files
img_files = []
for i, image_url in enumerate(image_urls):
    filename = os.path.join("/tmp", "%d.jpg" % i)
    download_file(image_url, filename)
    img_files.append(filename)
# Run predictions
%time predicts = client.predict(img_files)
assert len(predicts) == len(img_files)
titles = get_titles(predicts)
for i, ax in grid_iterator(n_rows=5, n_cols=3, axis_size=4, total=len(img_files)):
    title = titles[i]
    img = Image.open(img_files[i])
    ax.imshow(img)
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_aspect('auto')
    ax.set_title(title)