Fix Lambda invoke in user networks #3509

awilkins · 2021-12-01T17:00:06Z

Which issue(s) does this change fix?

#2837
(at least for the subset of cases that are caused by using a user defined network in a docker stack)

Why is this change necessary?

The Docker CLI client won't let you attach containers to a user
network and the bridge at the same time, presumably for a good
reason.

One good reason seems to be that name resolution doesn't
work - which means that if you pass the service name of e.g.
localstack on the same user network to provide endpoints for AWS
APIs for testing, it's not going to work.

Disconnecting the container from the bridge network seems to fix this.

Secondly, if SAM is running on a container inside the same user
network (the likeliest scenario for using one?) the Lambda ports are
not available on localhost, and the remapped ports are not available
inside the network.

How does it address the issue?

Therefore if the container is in a user network :

Detach it from the bridge network
Set its hostname to the first 12 chars of the container ID
Set its rapid port to 8080

Then invocations work properly.

What side effects does this change have?

Detaches containers in a user network from the bridge adapter
- Not sure if this has any deleterious effects, because I can't imagine a scenario where someone would want their Lambda invokers on both networks
Stops lamba executors in a user network from working if SAM is executing outside the docker stack
- Probably needs fixing. Not sure if the code has an existing method of working this out.

Checklist

Add input/output type hints to new functions/methods
Write design document (Do I need to write a design document?)
Write unit tests
Write/update functional tests
Write/update integration tests
make pr passes
make update-reproducible-reqs if dependencies were changed
Write documentation

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

The Docker CLI client won't let you attach containers to a user network and the bridge at the same time, presumably for a good reason. One good reason seems to be that name resolution doesn't work - which means that if you pass the service name of e.g. `localstack` on the same user network to provide endpoints for AWS APIs for testing, it's not going to work. Disconnecting the container from the bridge network seems to fix this. Secondly, if SAM is running on a container inside the same user network (the likeliest scenario for using one?) the Lambda ports are not available on `localhost`, and the remapped ports are not available inside the network. Therefore if the container is in a user network : - Detach it from the bridge network - Set its hostname to the first 12 chars of the container ID - Set its rapid port to 8080 Then invocations work properly.

jfuss · 2021-12-01T17:22:15Z

@awilkins I hesitate with this. I get your reasoning but I don't think we should assume it is bad. We use a docker client, which eventually communicates to the docker daemon. If it was bad or not allowed, I assume docker wouldn't let you do it at all.

What do you mean by: "Stops lamba executors in a user network from working if SAM is executing outside the docker stack"? Are you saying that SAM CLI won't be able to call a container over localhost if a network is provided?

It's been a while since I have dug into the docker networking but I am unsure what side effects this may have. Customers may be using networks and docker.host.internal. We would need to add integration with cases we can think of and validate this is actually safe to be doing.

awilkins · 2021-12-01T17:42:33Z

I think hesitation is entirely correct, because clearly this is a core bit of SAM's APIG support. But I figured opening a PR was a good context to discuss what needs fixing before it would be ready to merge.

If it was bad or not allowed, I assume docker wouldn't let you do it at all.

The CLI client refuses to let you attach containers to both networks :

❯ docker run --net lambda --net bridge -it bash
docker: conflicting options: cannot attach both user-defined and non-user-defined network-modes.
See 'docker run --help'.

The client library doesn't seem to check this though (it literally just posts the request to the daemon, and I assume the daemon doesn't check either).

What do you mean by: "Stops lamba executors in a user network from working if SAM is executing outside the docker stack"

If you've set --docker-network to a value other than host, the patch will make SAM try to invoke the lambdas over their "real" host/port {container_id[:12]}:8080 rather than their exposed host/port localhost:{remapped_port} ; really this should only happen if SAM is running on a container in a network that can reach them (ie, the same user network).

Customers may be using networks and docker.host.internal. We would need to add integration with cases we can think of and validate this is actually safe to be doing.

Agreed. It's working great for my case (which is a docker-compose stack of api-tests --> sam-api --> [invoke_containers] --> localstack using the --exit-code-from api-tests flag to get the test outcome), but I don't want to spoil anyone else's fun.

Guess there's some design docs in my future.

jfuss · 2021-12-02T18:36:00Z

@awilkins

The CLI client refuses to let you attach containers to both networks :

The command you provided isn't what SAM CLI does. We first create the container and then attach it to a network if one is provided. Not sure if that matters but could be a difference that impacts how the docker cli (or daemon) works.

If you've set --docker-network to a value other than host, the patch will make SAM try to invoke the lambdas over their "real" host/port {container_id[:12]}:8080 rather than their exposed host/port localhost:{remapped_port} ; really this should only happen if SAM is running on a container in a network that can reach them (ie, the same user network).

I need to think more here but --docker-network is for putting the local Lambda Container into a specific network. SAM CLI still needs access to call the container from localhost (we have to make sure not to break the non SAM CLI running in a container case).

Guess there's some design docs in my future.

Maybe not :). What I want to make sure is if we do add this we can add tests that verify it works and doesn't regress what we already support. SAM CLI was never really built for running in a container, as it needs Docker. There are ways around it but not something we really design for. Everything that doesn't use docker is fine but Docker in Docker + networking is overhead (and frankly much easier if you don't run in docker but I digress).

I am not going to be able to get back to this until next week. I am going to assign it to myself and come back to this. Again, I don't want to block these docker in docker cases but we have to be very careful because we don't want to break customers using the CLI on their system and not within docker.

github-actions bot added pr/external area/local/invoke area/local/start-api area/local/start-invoke labels Dec 1, 2021

awilkins mentioned this pull request Dec 1, 2021

No response from invoke container for Lambda inside docker-compose #2837

Open

jfuss self-assigned this Dec 2, 2021

Merge branch 'develop' into network/fix-invoke-in-user-networks

064b6c4

Feb	MAR	Apr
	15
2021	2022	2023

aws / aws-sam-cli Public

Fix Lambda invoke in user networks #3509

Fix Lambda invoke in user networks #3509

awilkins commented Dec 1, 2021 •

edited

jfuss commented Dec 1, 2021

awilkins commented Dec 1, 2021

jfuss commented Dec 2, 2021

aws / aws-sam-cli Public

Fix Lambda invoke in user networks #3509

Are you sure you want to change the base?

Fix Lambda invoke in user networks #3509

Conversation

awilkins commented Dec 1, 2021 • edited

Which issue(s) does this change fix?

Why is this change necessary?

How does it address the issue?

What side effects does this change have?

Checklist

jfuss commented Dec 1, 2021

awilkins commented Dec 1, 2021

jfuss commented Dec 2, 2021

awilkins commented Dec 1, 2021 •

edited