The Wayback Machine - https://web.archive.org/web/20200525093130/https://github.com/tensorflow/serving/issues/1555

TensorFlow Serving "version_labels" do not work as documented for the HTTP REST API #1555

Open
ericmclachlan opened this issue Feb 17, 2020 · 15 comments


@ericmclachlan commented Feb 17, 2020

Bug Report

System information

  • OS Platform and Distribution: Windows 10 and Linux Ubuntu 18.04 in WSL
  • TensorFlow Serving installed from: Docker image (binary)
  • TensorFlow Serving version: Docker image tensorflow/serving:latest (downloaded around January 20th)

Describe the problem

At a high level, the version_labels feature appears not to work correctly. (This feature is illustrated with the "canary" vs "stable" version labels example in the TensorFlow documentation).

More specifically, it may only work with simple numeric version labels.

The following URL will be rejected by TensorFlow Serving: host/v1/models/my_model/versions/1.0:predict
with a Malformed request... error.

Exact Steps to Reproduce

I believe that any kind of non-numeric version label will fail. (e.g., the "canary" vs "stable" example itself should fail.)

Source code / logs

http_rest_api_handler.cc seems to define a regular expression used to parse the URL. Note that, after "versions", the regular expression defines an expectation for a numeric version.

prediction_api_regex_(
          R"((?i)/v1/models/([^/:]+)(?:/versions/(\d+))?:(classify|regress|predict))")

It's possible that I misread the code, but this interpretation agrees with an observation on StackOverflow that numeric versions are handled correctly whereas text version labels are not.
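The mismatch can be reproduced outside the server. The pattern below is the one from http_rest_api_handler.cc transcribed into Python's `re` (RE2 and `re` agree on this subset of syntax): only purely numeric versions pass, which is consistent with the behavior reported above.

```python
import re

# Regex from http_rest_api_handler.cc; note the \d+ after "versions/",
# which only admits digit-string versions.
url_regex = re.compile(
    r"(?i)/v1/models/([^/:]+)(?:/versions/(\d+))?:(classify|regress|predict)"
)

# A numeric version matches; a label or dotted version does not.
assert url_regex.fullmatch("/v1/models/model1/versions/1579843026:predict")
assert url_regex.fullmatch("/v1/models/model1/versions/test:predict") is None
assert url_regex.fullmatch("/v1/models/model1/versions/1.0:predict") is None
```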

@rmothukuru rmothukuru self-assigned this Feb 18, 2020
@rmothukuru commented Feb 18, 2020

@ericmclachlan,
Can you please confirm whether you have tried inference using both numeric and non-numeric version labels?

If you observed that numeric labels work fine while non-numeric version labels result in an error, can you please provide the commands you used in both cases (numeric and non-numeric), along with the respective output or error? Thanks!

@rmothukuru commented Feb 18, 2020

Also, you can look at this article for more information about Configuring Different Versions for Serving.

@ericmclachlan (Author) commented Feb 18, 2020

Hi @rmothukuru

Thank you for the link but we already have TensorFlow Serving working in production. The problem lies specifically in trying to use the version_labels feature as described in the documentation.

I have just confirmed that sending requests like the following yields predictions:

http://localhost:8501/v1/models/model1/versions/1579843026:predict

However, using a URL like this does not:

http://localhost:8501/v1/models/model1/versions/test:predict

The only difference between these two is that the second uses the version_label instead of the version number itself.

The exact error message I receive in the POST response is:

{ "error": "Malformed request: POST /v1/models/model1/versions/test:predict" }

This is despite the model.config file containing the following version_labels definition:

config {
    name: 'model1'
    base_path: '/models/model1/'
    model_platform: "tensorflow"
    model_version_policy {
      specific {
        versions: 1579843026
      }
    }
    version_labels {
      key: 'test'
      value: 1579843026
    }
  }

I'm 99% sure this specific error message is being reported because this line in the code is failing, causing the "Malformed request" error defined a few lines earlier to be reported for this POST request.

Thanks for your help in investigating this problem.

@rmothukuru commented Feb 18, 2020

@ericmclachlan,
Can you please confirm that you have invoked the Tensorflow Model Server along with the Config File, as shown below. Thanks!

tensorflow_model_server --port=8500 --rest_api_port=8501 --model_config_file=/home/configs/models.conf

@rmothukuru rmothukuru added the type:bug label Feb 18, 2020
@ericmclachlan (Author) commented Feb 18, 2020

@rmothukuru: I'm deploying TensorFlow Serving using docker-compose. Below is a simplified version:

tensorflow-servings:
    image: tensorflow/serving:latest
    ports:
      - 8500:8500
      - 8501:8501
    command:
      - --allow_version_labels_for_unavailable_models
      - --batching_parameters_file=/config/batching_parameters.txt
      - --enable_batching=true
      - --model_config_file=/config/all_models.config
      - --model_config_file_poll_wait_seconds=10
      - --monitoring_config_file=/config/monitoring_config.txt
      - --rest_api_timeout_in_ms=30000

This issue on GitHub confirms my observation and suggests:

Re: not being able to access the version using labels via HTTP - this is something that's not possible today (AFAIR) - only through the grpc interface can you declare labels :(

This makes me sad. The limitations of the HTTP implementation should be made more transparent. This investigation has needlessly cost my company money, as I'm sure it will continue to do for others.

I'm not upset with you as an individual of course. But it is nonetheless a frustrating situation.

@Arnold1 commented Feb 18, 2020

@ericmclachlan just out of curiosity - what does the /config/batching_parameters.txt look like?

@ericmclachlan (Author) commented Feb 19, 2020

Hi @Arnold1

The batching_parameters.txt looks like this:

max_batch_size { value: 1024 }
batch_timeout_micros { value: 100 }
num_batch_threads { value: 4 }
pad_variable_length_inputs: true

Please let me know if anything seems off.

@Liu-Da commented Feb 20, 2020

Same problem!

@gowthamkpr commented Mar 5, 2020

TensorFlow Serving "version_labels" only work using gRPC, not the REST API.

@ericmclachlan (Author) commented Mar 6, 2020

@gowthamkpr: It would probably be helpful if the documentation describing labels mentioned this limitation upfront.

It's less an issue of "This doesn't work" and more an issue of "I've just implemented REST and now I've discovered that this documented feature doesn't work unless I don't use REST."

@misterpeddy (Member) commented Mar 7, 2020

@ericmclachlan thanks for pointing out the lack of documentation on this - I've added a note [1] clarifying that version labels don't work for REST.

@christisg how do we feel about fixing this? It's not a lot of work and I'd be happy to take it. We'd have 2 options:

  1. Keep the current path [2]: try to parse the version as an int; if that fails, use it as a version_label.
  2. Add a new path (model/<>/version_label/<>) for directing a request to a model using its version label.

Option 1) has the drawback of kinda odd behavior in that it implicitly disallows having version labels that can be parsed as int64 - which is currently undocumented behavior.
Option 2) has the drawback of deviating greatly from REST principles as the path model/<>/version_label/<> URI no longer really represents a resource hosted by the server (the version_label is an attribute of a specific version of the model, not an identifier).
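Option 1's fallback could be sketched roughly as follows (hypothetical Python, not actual TF Serving code), which also makes its drawback concrete: any label that parses as an integer becomes unreachable.

```python
def resolve_version_token(token: str):
    """Interpret a URL version token: numeric -> version, otherwise -> label."""
    try:
        return ("version", int(token))
    except ValueError:
        return ("label", token)

# The implicit restriction: a label like "2" can never be reached,
# because it always parses as the integer version 2.
assert resolve_version_token("1579843026") == ("version", 1579843026)
assert resolve_version_token("canary") == ("label", "canary")
assert resolve_version_token("2") == ("version", 2)
```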

WDYT?

/cc @netfs in case he's thought about this before.

[1] 9781ed1
[2] R"((?i)/v1/models/([^/:]+)(?:/versions/(\d+))?:(classify|regress|predict))"

@christisg (Member) commented Mar 7, 2020

Thanks @misterpeddy!

How about Option 3): keep the current path and use an explicit prefix for labels, i.e. models/<model_name>/versions/label=<version_label>?
The explicit prefix makes it less error-prone, and it also allows assigning numerical labels like "1" and "2" if one prefers to.
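Option 3 could extend the existing regex along these lines (an illustrative sketch of the proposal, not the shipped implementation):

```python
import re

# Hypothetical extension of the current pattern: "versions/" followed by
# either a numeric version or an explicit label=<name> prefix.
url_regex = re.compile(
    r"(?i)/v1/models/([^/:]+)"
    r"(?:/versions/(?:(\d+)|label=([^/:]+)))?"
    r":(classify|regress|predict)"
)

# Group 2 carries a numeric version; group 3 carries a label.
m = url_regex.fullmatch("/v1/models/model1/versions/label=test:predict")
assert m and m.group(3) == "test"

m = url_regex.fullmatch("/v1/models/model1/versions/1579843026:predict")
assert m and m.group(2) == "1579843026"
```

Because the `label=` prefix disambiguates the two cases, a numeric label such as "1" would route through group 3 rather than being misread as a version number.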

@ericmclachlan (Author) commented Mar 7, 2020

@misterpeddy, @christisg: I don't want to dilute the conversation too much; I just wanted to say thanks for seriously considering the suggestion. And thanks for all the work you're doing for the community.

@christisg christisg assigned misterpeddy and unassigned christisg Mar 19, 2020
@misterpeddy (Member) commented Mar 19, 2020

Just a note that I have a preliminary implementation for the awesome idea @christisg mentioned and will resume with testing and merging it once we get some of our build breakage (due to incompatibilities with recent changes in upstream TF) under control.

@aaur0 commented Apr 16, 2020

@misterpeddy Any update on this? Also, we have not moved to TensorFlow 2.0 yet, so I was wondering if you will backport this fix to older versions of TF Serving as well.
