The Wayback Machine - https://web.archive.org/web/20210815102722/https://github.com/etcd-io/etcd/issues/11713
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

incomplete put wipes values #11713

Open
xaki23 opened this issue Mar 23, 2020 · 18 comments
Open

incomplete put wipes values #11713

xaki23 opened this issue Mar 23, 2020 · 18 comments

Comments

@xaki23
Copy link

@xaki23 xaki23 commented Mar 23, 2020

"found" by the @discordapp troops the hard way: https://status.discordapp.com/incidents/62gt9cgjwdgf
pinging @zorkian who pointed it out to me.

trivial to repro, this is against a 3.3.13 on fedora-31.

$ etcdctl set /baa schnorp
$ etcdctl get /baa
schnorp
$ echo -ne 'PUT /v2/keys/baa HTTP/1.1\r\nHost: boo\r\nContent-Length: 123\r\n\r\n' | nc localhost 2379
HTTP/1.1 200 OK
...
$ etcdctl get /baa
$

the key part here is the PUT that announces to send 123 bytes of body, then doesnt send anything.

while i agree (see link) this needs to be fixed in the used http lib, it may be worth adding some sanity checks in etcd because with etcd there might be more impact then a lost blogpost.

@stale
Copy link

@stale stale bot commented Jun 21, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jun 21, 2020
@mezz
Copy link

@mezz mezz commented Jun 22, 2020

I don't think this issue has been addressed, so it should not be closed yet.

@stale stale bot removed the stale label Jun 22, 2020
@readall
Copy link

@readall readall commented Aug 17, 2020

Is there a possibility of using different http lib to avoid this?
If so which library is recommended?

@stale
Copy link

@stale stale bot commented Nov 15, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Nov 15, 2020
@zorkian
Copy link

@zorkian zorkian commented Nov 15, 2020

Still an issue, shouldn't be closed.

@stale stale bot removed the stale label Nov 15, 2020
@stale
Copy link

@stale stale bot commented Feb 13, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Feb 13, 2021
@mezz
Copy link

@mezz mezz commented Feb 14, 2021

Sorry I don't think this has been addressed so it should stay open.
@ptabor could I please bring this issue to your attention?

@stale stale bot removed the stale label Feb 14, 2021
@ptabor
Copy link
Contributor

@ptabor ptabor commented Feb 14, 2021

Happy to review contribution that fixes this (with test).

I believe this piece of code requires tweaking:

func unmarshalRequest(lg *zap.Logger, r *http.Request, req json.Unmarshaler, w http.ResponseWriter) bool {

@preetham
Copy link

@preetham preetham commented Feb 28, 2021

@ptabor @mezz I would like to take this up, Is there any specific thing I should know before diving into this?

@mezz
Copy link

@mezz mezz commented Feb 28, 2021

@preetham I think the repro steps at the top of this issue do a good job of targeting this issue, it covers the issue completely so should be good to dive in.
To complete the task I think you will just need to include a test and show that the repro steps are resolved by your fix.

It is not necessary to read, but for a more complete context around this bug, you can scroll down a bit on this page to see our technical analysis when we ran into this bug in production at Discord: https://discordstatus.com/incidents/62gt9cgjwdgf

@preetham
Copy link

@preetham preetham commented Feb 28, 2021

@mezz I have started the server locally:
If I try to use v2 APIs using etcdctl I am getting the following error:

$ ./etcdctl set /message Hello                                      
Error:  client: response is invalid json. The endpoint is probably not valid etcd cluster endpoint.

If I try via curl:

$ curl -X PUT http://127.0.0.1:2379/v2/keys/message -d value="Hello"
404 page not found

Version request is working as expected:

$ curl -L http://127.0.0.1:2379/version
{"etcdserver":"3.5.0-alpha.0","etcdcluster":"3.5.0"}% 

v3 APIs are working as expected:

$ ./etcdctl put /message Hello
OK
$ ./etcdctl get /message   
/message
Hello

Is there something I am missing? Anything I need to setup before trying to access these APIs?

@mezz
Copy link

@mezz mezz commented Mar 2, 2021

@preetham I am not a maintainer of this project, so I am not sure what issue you are running into.
You may be able to find help on the mailing list or IRC: https://github.com/etcd-io/etcd#contact

@xaki23
Copy link
Author

@xaki23 xaki23 commented Mar 3, 2021

it seems possible this is actualy "fixed" in more recent versions.

perhaps someone more familiar with the changes to etcd over the last 1-2 years can confirm whether the "plain RESTy v2 HTTP API" was removed somewhere between 3.3.x and 3.4.x.

this would make it a case of "taking your etcd to the vet to get it fixed", but still "fixed".

@ptabor
Copy link
Contributor

@ptabor ptabor commented Mar 3, 2021

The v3 HTTP API is:
curl -X POST http://127.0.0.1:2379/v3/kv/put -d '{"key": "k", "value": "v"}'

@fabjan
Copy link

@fabjan fabjan commented May 9, 2021

This is reproducible in the current version using the v2 API:

~ $ cd /tmp
/tmp $ git clone https://github.com/etcd-io/etcd.git
...
/tmp $ cd etcd
/tmp/etcd $ ./build.sh
...
/tmp/etcd $ ./bin/etcd --enable-v2
...
~ $ curl http://127.0.0.1:2379/v2/keys/baa -XPUT -d value="schnorp"
{"action":"set","node":{"key":"/baa","value":"schnorp","modifiedIndex":4,"createdIndex":4}}
~ $ curl http://127.0.0.1:2379/v2/keys/baa
{"action":"get","node":{"key":"/baa","value":"schnorp","modifiedIndex":4,"createdIndex":4}}
~ $ echo -ne 'PUT /v2/keys/baa HTTP/1.1\r\nHost: boo\r\nContent-Length: 123\r\n\r\n' | nc localhost 2379
HTTP/1.1 200 OK
...
~ $ curl http://127.0.0.1:2379/v2/keys/baa
{"action":"get","node":{"key":"/baa","value":"","modifiedIndex":5,"createdIndex":5}}

With the v3 API, you get a 400 Bad Request back if the data is shorter than the provided content length header (and it requires a proto-encoded message I presume, it did not accept json from me).

@fabjan
Copy link

@fabjan fabjan commented May 9, 2021

I looked into reproducing this and arrived at the same conclusion as in the Discord incident report.

However, I think something might be missing from the repro @xaki23: the incident report says a url encoded form body is set using PUT, but the Content-Type is empty in your curl example, so the request would not be interpreted as a form at all.

If the content type identifies the request as having form data, or if the headers are not completely sent, an error response comes back:

~ $ echo -ne 'PUT /v2/keys/baa HTTP/1.1\r\nHost: boo\r\nContent-Length: 123\r\nContent-Type: application/x-www-form-urlencoded\r\n\r\n' | nc localhost 2379
HTTP/1.1 400 Bad Request
...
{"errorCode":210,"message":"Invalid POST form","cause":"unexpected EOF","index":0}                                                                                                                                  
~ $ echo -ne 'PUT /v2/keys/baa HTTP/1.1\r\nHost: boo\r\nContent-Length: 123\r\nContent-Type: application/x-www-form-urlenc' | nc localhost 2379 
HTTP/1.1 400 Bad Request
...
400 Bad Request

The cause of the reported issue seems to be that Go's http.Request.FormValue treats value="" and nothing at all in the form data as the same thing. application/x-www-form-urlencoded is really only defined in terms of HTML and POSTing forms as far as I can gather, but according to the HTTP rfc it is the responsibility of the origin server to verify that the new content is valid for the given resource, so one fix could be to return status 415 if the content-type is neither application/x-www-form-urlencoded or multipart/form-data, and possibly also if value=... is not present in the request (i.e. if len(r.Form["value"]) < 1 when "value" is extracted from the form into the photo request in parseKeyRequest). Note that the "value" field is optional in the etcd v2 protocol, parseKeyRequest even disallows its existence if the "refresh" field is in the form.

Verifying that the content type is indeed a form would catch the error that cause this issue to be created, but it is not a backwards compatible change, as up until now clients have not had to specify anything. Well they have had to specify a form content type in order to set any values at all except the empty string (since it's the only way to get the Go http library to pick the values up), so perhaps it's not a bad change and there is not many clients that rely on the fact that you can set a key to the empty string by PUTing an empty request (it's like the only thing one can do with an incorrect content type).

Side note: @xaki23 I know this issue is old now and I guess Discord devs worked around it, but it seems to me to be impossible to set any values at all without sending a content-type header identifying a form. And when a form content type is present, an error comes back if the body data is shorter than the content length header.

@fabjan
Copy link

@fabjan fabjan commented May 9, 2021

@preetham: I'm not working on a fix btw, so keep on hacking if you found a good solution.

@readall, @zorkian did you also see the issue? How did you trigger it and which http client are you using? (It seems to me that the only way to trigger this is to not send a content type, and if that is not sent there is no way to set any values ever except for the empty string).

@stale
Copy link

@stale stale bot commented Aug 7, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Aug 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment