The Wayback Machine - https://web.archive.org/web/20240226191637/https://github.com/psf/requests/pull/5665

Add support for HTTPS proxies. #5665

Open
wants to merge 1 commit into base: main

Conversation

jalopezsilva
Copy link

@jalopezsilva jalopezsilva commented Nov 19, 2020

Starting with 1.26, urllib3 supports HTTPS proxies. Enabling that support requires two changes:

  • An additional proxies_kwargs argument that can be passed from the session. This dictionary is used to pass any extra arguments to the underlying adapter. The parameter is optional.
  • When the use_forwarding_for_https mode is enabled, we must send the absolute URI.

The resulting API is very similar; it just takes an additional parameter when needed. An example:

    session = requests.Session()
    proxies = {"http": "https://proxy.example", "https": "https://proxy.example"}

    proxies_kwargs = {
        "proxy_ssl_context": proxy_ssl_context(),
    }

    response = session.get(
        "https://www.google.com", proxies=proxies, proxies_kwargs=proxies_kwargs
    )
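The example above calls a proxy_ssl_context() helper that the PR never defines. A minimal sketch of what such a helper might look like, using only the standard library (the name and its contents are assumptions, not part of the PR):

```python
import ssl

def proxy_ssl_context():
    # Hypothetical helper matching the name in the example above: builds
    # a TLS context for the connection *to the proxy itself*, as opposed
    # to the context used for the origin server.
    ctx = ssl.create_default_context()
    # A client certificate or a private CA bundle could be loaded here, e.g.:
    # ctx.load_cert_chain("client.crt", "client.key")
    return ctx

context = proxy_ssl_context()
```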
Signed-off-by: Jorge Lopez Silva <jalopezsilva@gmail.com>
@sigmavirus24
Copy link
Contributor

Requests is under feature freeze which means we will not add any additional keyword arguments to the public API like this.

@jalopezsilva
Copy link
Author

@sigmavirus24 thanks for the feedback! Is adding an additional keyword argument to the public API the only concern? I've been reviewing the options available, and there's no easy way around this unless we modify the proxy_manager_for method in HTTPAdapter.

Could we add a proxy_kwargs attribute to HTTPAdapter? Rather than passing the proxy_kwargs dictionary through multiple top level APIs we could expose it like this:

    s = requests.Session()
    s.adapters['https://'].proxy_kwargs = { ... }

That minimizes some of the changes needed. Would that work?

Facebook needs these changes upstreamed, so I would love it if we can find a path forward here. I could create our own HTTPAdapter internally, but that leaves the requests community unable to benefit from the work I put in to expand HTTPS proxy support in urllib3.
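For reference, the internal-adapter workaround mentioned above could look roughly like the following. HTTPSProxyAdapter and its proxy_kwargs plumbing are hypothetical, not part of this PR; only proxy_manager_for is an existing HTTPAdapter hook:

```python
from requests.adapters import HTTPAdapter

class HTTPSProxyAdapter(HTTPAdapter):
    """Hypothetical adapter that injects extra kwargs (e.g. a
    proxy_ssl_context) into the urllib3 proxy manager."""

    def __init__(self, proxy_kwargs=None, **kwargs):
        self._proxy_kwargs = dict(proxy_kwargs or {})
        super(HTTPSProxyAdapter, self).__init__(**kwargs)

    def proxy_manager_for(self, proxy, **proxy_kwargs):
        # Merge our stored kwargs into whatever requests passes down
        # before the urllib3 ProxyManager is constructed.
        proxy_kwargs.update(self._proxy_kwargs)
        return super(HTTPSProxyAdapter, self).proxy_manager_for(
            proxy, **proxy_kwargs)
```

A session would then opt in with something like session.mount('https://', HTTPSProxyAdapter(proxy_kwargs={'proxy_ssl_context': ...})).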

@nateprewitt nateprewitt added this to the 2.26.0 milestone May 20, 2021
@sethmlarson
Copy link
Member

@nateprewitt Are we still planning on doing this for 2.26.0? It doesn't look like it's moved since being added to the milestone.

@nateprewitt nateprewitt removed this from the 2.26.0 milestone Jul 7, 2021
@nateprewitt
Copy link
Member

I popped it out of 2.26.0 so we can get a release unblocked. I'll follow up with @jalopezsilva on getting this in the next minor release.

@jalopezsilva
Copy link
Author

That would be amazing @nateprewitt. Let me know which changes are required.

@@ -470,7 +470,8 @@ def prepare_request(self, request):

     def request(self, method, url,
                 params=None, data=None, headers=None, cookies=None, files=None,
                 auth=None, timeout=None, allow_redirects=True, proxies=None,
-                hooks=None, stream=None, verify=None, cert=None, json=None):
+                proxies_kwargs=None, hooks=None, stream=None, verify=None,
+                cert=None, json=None):
Copy link
Member


I'm a little worried about putting this in the middle of the function signature. People shouldn't be passing these as positional args, but I'm sure they are.

Copy link
Contributor


Because we can't prevent them from passing them positionally, I agree that this should be tacked onto the end.
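For what it's worth, Python's keyword-only syntax (a bare * in the parameter list) is the standard guard against positional misuse, though adopting it in requests would itself break callers that already pass trailing arguments positionally. A hypothetical sketch, not the PR's actual signature:

```python
# Hypothetical signature, not the PR's actual change: everything after
# the bare * can only be passed by keyword.
def send(request, stream=False, timeout=None, verify=True,
         cert=None, proxies=None, *, proxies_kwargs=None):
    return proxies_kwargs

send("req", proxies_kwargs={"proxy_ssl_context": None})  # OK
# send("req", False, None, True, None, None, {...})      # TypeError
```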

     using_socks_proxy = False
     if proxy:
         proxy_scheme = urlparse(proxy).scheme.lower()
         using_socks_proxy = proxy_scheme.startswith('socks')
+        is_using_https_proxy = proxy_scheme.startswith('https')
Copy link
Member


We support a number of possible socks* schemes which is the reason for using startswith. I'm wondering if we want to be more explicit here by checking for exactly https.

Copy link
Contributor


Yeah, I agree this could more simply be proxy_scheme == 'https', but I'm wondering if there are cases I'm unaware of where someone might use https+<somethingelse>://proxy or <somethingelse>+https://proxy. It shouldn't be too much trouble to add support for those later if necessary, and if they turn out not to be easily supportable with this work, we won't have accidentally opted them in.
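As a standalone illustration of the prefix matching under discussion (this helper is not code from the PR):

```python
from urllib.parse import urlparse

def classify_proxy_scheme(proxy_url):
    # Mirrors the diff's matching: socks4/socks5/socks5h all start with
    # "socks", while startswith('https') would also match hybrid schemes
    # such as "https+foo" -- the exact concern raised in this thread.
    scheme = urlparse(proxy_url).scheme.lower()
    return {
        "socks": scheme.startswith("socks"),
        "https": scheme.startswith("https"),
    }

classify_proxy_scheme("socks5h://proxy.example")  # {'socks': True, 'https': False}
classify_proxy_scheme("https://proxy.example")    # {'socks': False, 'https': True}
```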


     url = request.path_url
-    if is_proxied_http_request and not using_socks_proxy:
+    if (is_proxied_http_request and not using_socks_proxy or
+            (is_using_https_proxy and is_using_https_forwarding)):
Copy link
Member


nit: this conditional is getting a bit hard to parse at first glance. Perhaps we could define a couple of variables like is_forwarded_https_proxy and is_nonsocks_http_proxy to simplify the concepts.

e.g.

    if is_nonsocks_http_proxy or is_forwarded_https_proxy:
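The suggested refactor could be sketched as a small predicate; only the variable names come from the review, while the function name and packaging are illustrative:

```python
def should_use_absolute_uri(is_proxied_http_request, using_socks_proxy,
                            is_using_https_proxy, is_using_https_forwarding):
    # Named intermediates make the combined condition easier to scan
    # than the inline parenthesized version in the diff.
    is_nonsocks_http_proxy = is_proxied_http_request and not using_socks_proxy
    is_forwarded_https_proxy = is_using_https_proxy and is_using_https_forwarding
    return is_nonsocks_http_proxy or is_forwarded_https_proxy
```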
@@ -59,7 +59,7 @@ def __init__(self):
         super(BaseAdapter, self).__init__()

     def send(self, request, stream=False, timeout=None, verify=True,
-             cert=None, proxies=None):
+             cert=None, proxies=None, proxies_kwargs=None):
Copy link
Member


This is probably the more controversial part of this change. Requests has considered its top-level APIs more-or-less frozen since 2.0, and adding this is a deviation from that philosophy.

While we should continue to uphold that stance, I don't see another way for us to expose this urllib3 functionality. Given we have some legitimately broken use cases, I think a rare exception may be warranted.

Copy link
Contributor

@sigmavirus24 sigmavirus24 left a comment


Yeah, I think there are a lot of public functions whose signatures are being altered here in a way that is going to cause significant pain for users.

Adapters are one point of extension that many users take advantage of. Some subclass the default HTTPAdapter in order to customize certain things and may have added some custom logic to get_connection or even request_url (although I'd hope not).

If they are, however, we're unconditionally sending that parameter, and we're not passing it as a keyword argument below in send, which means people would need to add proxies_kwargs to their signatures before upgrading or they may see things break unexpectedly.

While many subclass HTTPAdapter, an order of magnitude more subclass Session, and many override request and send. I'm very hesitant to accept these changes as-is.
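The breakage described here is easy to demonstrate in isolation (a sketch, not requests code): a subclass written against the old send signature fails as soon as the caller forwards the new parameter.

```python
class OldStyleAdapter:
    # Hypothetical subclass frozen at the pre-change signature,
    # standing in for user code that overrides HTTPAdapter.send.
    def send(self, request, stream=False, timeout=None, verify=True,
             cert=None, proxies=None):
        return "sent"

adapter = OldStyleAdapter()
adapter.send("req")  # fine today

# Once the session starts forwarding proxies_kwargs, the same override
# raises TypeError: send() got an unexpected keyword argument.
try:
    adapter.send("req", proxies_kwargs={})
except TypeError as exc:
    print(exc)
```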

Labels: None yet
5 participants