The Wayback Machine - https://web.archive.org/web/20200525093130/https://github.com/tensorflow/serving/issues/1555

TensorFlow Serving "version_labels" do not work as documented for the HTTP REST API #1555

Open
ericmclachlan opened this issue Feb 17, 2020 · 15 comments


@ericmclachlan commented Feb 17, 2020

Bug Report

System information

  • OS Platform and Distribution: Windows 10 and Linux Ubuntu 18.04 in WSL
  • TensorFlow Serving installed from: Docker image (binary)
  • TensorFlow Serving version: Docker image tensorflow/serving:latest (downloaded around January 20th)

Describe the problem

At a high level, the version_labels feature appears not to work correctly. (This feature is illustrated with the "canary" vs "stable" version labels example in the TensorFlow documentation).

More specifically, it may only work with simple numeric version labels.

The following URL will be rejected by TensorFlow Serving: host/v1/models/my_model/versions/1.0:predict
with a Malformed request... error.

Exact Steps to Reproduce

I believe that any kind of non-numeric version label will fail. (e.g., the "canary" vs "stable" example itself should fail.)

Source code / logs

http_rest_api_handler.cc seems to define a regular expression used to parse the URL. Note that, after "versions", the regular expression defines an expectation for a numeric version.

prediction_api_regex_(
          R"((?i)/v1/models/([^/:]+)(?:/versions/(\d+))?:(classify|regress|predict))")

It's possible that I misread the code, but this interpretation agrees with an observation on StackOverflow that numeric versions are handled correctly whereas text version labels are not.
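The mismatch can be reproduced outside the server. The pattern below is the one from http_rest_api_handler.cc transcribed into Python's `re` (RE2 and `re` agree on this subset of syntax): only purely numeric versions pass, which is consistent with the behavior reported above.

```python
import re

# Regex from http_rest_api_handler.cc; note the \d+ after "versions/",
# which only admits digit-string versions.
url_regex = re.compile(
    r"(?i)/v1/models/([^/:]+)(?:/versions/(\d+))?:(classify|regress|predict)"
)

# A numeric version matches; a label or dotted version does not.
assert url_regex.fullmatch("/v1/models/model1/versions/1579843026:predict")
assert url_regex.fullmatch("/v1/models/model1/versions/test:predict") is None
assert url_regex.fullmatch("/v1/models/model1/versions/1.0:predict") is None
```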

@rmothukuru rmothukuru self-assigned this Feb 18, 2020
@rmothukuru commented Feb 18, 2020

@ericmclachlan,
Can you please confirm whether you have tried inference using both numeric and non-numeric version labels?

If you observed that numeric labels work fine while non-numeric version labels result in an error, can you please provide the commands you used in both cases (numeric and non-numeric), along with the respective output or error? Thanks!

@rmothukuru commented Feb 18, 2020

Also, you can look at this article for more information about Configuring Different Versions for Serving.

@ericmclachlan (Author) commented Feb 18, 2020

Hi @rmothukuru

Thank you for the link but we already have TensorFlow Serving working in production. The problem lies specifically in trying to use the version_labels feature as described in the documentation.

I have just confirmed that sending requests like the following yields predictions:

http://localhost:8501/v1/models/model1/versions/1579843026:predict

However, using a URL like this does not:

http://localhost:8501/v1/models/model1/versions/test:predict

The only difference between these two is that the second uses the version_label instead of the version number itself.

The exact error message I receive in the POST response is:

{ "error": "Malformed request: POST /v1/models/model1/versions/test:predict" }

This is despite the model.config file containing the following version_labels definition:

config {
    name: 'model1'
    base_path: '/models/model1/'
    model_platform: "tensorflow"
    model_version_policy {
      specific {
        versions: 1579843026
      }
    }
    version_labels {
      key: 'test'
      value: 1579843026
    }
  }

I'm 99% sure this specific error message is being reported because this line in the code is failing, causing the "Malformed request" error defined a few lines earlier to be reported for this POST request.

Thanks for your help in investigating this problem.

@rmothukuru commented Feb 18, 2020

@ericmclachlan,
Can you please confirm that you have invoked the Tensorflow Model Server along with the Config File, as shown below. Thanks!

tensorflow_model_server --port=8500 --rest_api_port=8501 --model_config_file=/home/configs/models.conf

@rmothukuru rmothukuru added the type:bug label Feb 18, 2020
@ericmclachlan (Author) commented Feb 18, 2020

@rmothukuru: I'm deploying TensorFlow Serving using docker-compose. Below is a simplified version:

tensorflow-servings:
    image: tensorflow/serving:latest
    ports:
      - 8500:8500
      - 8501:8501
    command:
      - --allow_version_labels_for_unavailable_models
      - --batching_parameters_file=/config/batching_parameters.txt
      - --enable_batching=true
      - --model_config_file=/config/all_models.config
      - --model_config_file_poll_wait_seconds=10
      - --monitoring_config_file=/config/monitoring_config.txt
      - --rest_api_timeout_in_ms=30000

This issue on GitHub confirms my observation and suggests:

Re: not being able to access the version using labels via HTTP - this is something that's not possible today (AFAIR) - only through the grpc interface can you declare labels :(

This makes me sad. The limitations of the HTTP implementation should be made more transparent. This investigation has needlessly cost my company money, as I'm sure it will continue to do for others.

I'm not upset with you as an individual of course. But it is nonetheless a frustrating situation.

@Arnold1 commented Feb 18, 2020

@ericmclachlan just out of curiosity - what does the /config/batching_parameters.txt look like?

@ericmclachlan (Author) commented Feb 19, 2020

Hi @Arnold1

The batching_parameters.txt looks like this:

max_batch_size { value: 1024 }
batch_timeout_micros { value: 100 }
num_batch_threads { value: 4 }
pad_variable_length_inputs: true

Please let me know if anything seems off.

@Liu-Da commented Feb 20, 2020

Same problem!

@gowthamkpr commented Mar 5, 2020

TensorFlow Serving "version_labels" only work using gRPC, not the REST API.

@ericmclachlan (Author) commented Mar 6, 2020

@gowthamkpr: It would probably be helpful if the documentation describing labels mentioned this limitation upfront.

It's less an issue of "This doesn't work" and more an issue of "I've just implemented REST and now I've discovered that this documented feature doesn't work unless I don't use REST."

@misterpeddy (Member) commented Mar 7, 2020

@ericmclachlan thanks for pointing out the lack of documentation on this - I've added a note [1] clarifying that version labels don't work for REST.

@christisg how do we feel about fixing this? It's not a lot of work and I'd be happy to take it. We'd have 2 options:

  1. Keep the current path [2]: try to parse the version as an int; if that fails, use it as a version_label.
  2. Add a new path (model/<>/version_label/<>) for directing a request to a model using its version label.

Option 1) has the drawback of kinda odd behavior in that it implicitly disallows having version labels that can be parsed as int64 - which is currently undocumented behavior.
Option 2) has the drawback of deviating greatly from REST principles as the path model/<>/version_label/<> URI no longer really represents a resource hosted by the server (the version_label is an attribute of a specific version of the model, not an identifier).
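Option 1's fallback could be sketched roughly as follows (hypothetical Python, not actual TF Serving code), which also makes its drawback concrete: any label that parses as an integer becomes unreachable.

```python
def resolve_version_token(token: str):
    """Interpret a URL version token: numeric -> version, otherwise -> label."""
    try:
        return ("version", int(token))
    except ValueError:
        return ("label", token)

# The implicit restriction: a label like "2" can never be reached,
# because it always parses as the integer version 2.
assert resolve_version_token("1579843026") == ("version", 1579843026)
assert resolve_version_token("canary") == ("label", "canary")
assert resolve_version_token("2") == ("version", 2)
```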

WDYT?

/cc @netfs in case he's thought about this before.

[1] 9781ed1
[2] R"((?i)/v1/models/([^/:]+)(?:/versions/(\d+))?:(classify|regress|predict))"

@christisg (Member) commented Mar 7, 2020

Thanks @misterpeddy!

How about Option 3): keep the current path and use an explicit prefix for labels, i.e. models/<model_name>/versions/label=<version_label>?
The explicit prefix makes it less error-prone, and it also allows assigning numerical labels like "1" and "2" if one prefers to.
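Option 3 could extend the existing regex along these lines (an illustrative sketch of the proposal, not the shipped implementation):

```python
import re

# Hypothetical extension of the current pattern: "versions/" followed by
# either a numeric version or an explicit label=<name> prefix.
url_regex = re.compile(
    r"(?i)/v1/models/([^/:]+)"
    r"(?:/versions/(?:(\d+)|label=([^/:]+)))?"
    r":(classify|regress|predict)"
)

# Group 2 carries a numeric version; group 3 carries a label.
m = url_regex.fullmatch("/v1/models/model1/versions/label=test:predict")
assert m and m.group(3) == "test"

m = url_regex.fullmatch("/v1/models/model1/versions/1579843026:predict")
assert m and m.group(2) == "1579843026"
```

Because the `label=` prefix disambiguates the two cases, a numeric label such as "1" would route through group 3 rather than being misread as a version number.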

@ericmclachlan (Author) commented Mar 7, 2020

@misterpeddy, @christisg: I don't want to dilute the conversation too much; I just wanted to say thanks for seriously considering the suggestion. And thanks for all the work you're doing for the community.

@christisg christisg assigned misterpeddy and unassigned christisg Mar 19, 2020
@misterpeddy (Member) commented Mar 19, 2020

Just a note that I have a preliminary implementation for the awesome idea @christisg mentioned and will resume with testing and merging it once we get some of our build breakage (due to incompatibilities with recent changes in upstream TF) under control.

@aaur0 commented Apr 16, 2020

@misterpeddy Any update on this? Also, we have not moved to TensorFlow 2.0 yet, so I was wondering if you will backport this fix to older versions of TF Serving as well.
