classification
Title: urllib2 authentication problem
Type: Stage:
Components: Library (Lib) Versions: Python 2.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: gazzadee, georg.brandl, ghaering, jjlee
Priority: normal Keywords:

Created on 2003-02-05 00:22 by gazzadee, last changed 2022-04-10 16:06 by admin. This issue is now closed.

Files
File name Uploaded Description
urllib2_proxy_auth.py gazzadee, 2003-12-16 03:10 Demonstrates authentication problem
Messages (9)
msg14438 - (view) Author: GaryD (gazzadee) Date: 2003-02-05 00:22
I've found a problem using the authentication in urllib2.

When the password manager matches host names to find a
password, including the protocol (scheme) in the address
makes it look like a different address. For example:

I create an HTTPBasicAuthHandler with an
HTTPPasswordMgrWithDefaultRealm, and add the tuple
(None, "http://proxy.blah.com:17828", "foo", "bar") to it.

I then set up the proxy to use
http://proxy.blah.com:17828 (which requires
authentication).

When I connect, the password lookup fails, because it
is trying to find a match for "proxy.blah.com:17828"
rather than "http://proxy.blah.com:17828".

This problem doesn't exist if I pass
"proxy.blah.com:17828" to the password manager.

There seems to be some stuff in HTTPPasswordMgr to deal
with variations on site names, but I guess it's not
working in this case (unless this is intentional).
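
The lookup described above can be sketched as follows. This is a minimal illustration that assumes Python 3's `urllib.request` (the successor to `urllib2`), not the Python 2.2 code this report was filed against; the host name and credentials are the placeholders from the report. On a version that includes the eventual fix, both spellings of the address should resolve to the same entry.

```python
from urllib.request import HTTPPasswordMgrWithDefaultRealm

# Register the credentials under the address *with* the scheme,
# as in the report.
mgr = HTTPPasswordMgrWithDefaultRealm()
mgr.add_password(None, "http://proxy.blah.com:17828", "foo", "bar")

# Look the credentials up by bare host:port (the form the report says
# the lookup used) and by the full URL.
by_host = mgr.find_user_password(None, "proxy.blah.com:17828")
by_url = mgr.find_user_password(None, "http://proxy.blah.com:17828")
print(by_host, by_url)
```

If the two results differ, the manager is treating the scheme-qualified and bare forms as different addresses, which is the symptom reported here.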

Version Info:
Python 2.2 (#1, Feb 24 2002, 16:21:58)
[GCC 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)]
on linux-i386
msg14439 - (view) Author: Gerhard Häring (ghaering) * (Python committer) Date: 2003-02-07 23:21
Can you please retry with Python 2.2.2?

It seems that a related bug was fixed for 2.2.2:
http://python.org/2.2.2/NEWS.txt has an entry:

"""
- In urllib2.py: fix proxy config with user+pass
authentication.  [SF
  patch 527518]
"""
msg14440 - (view) Author: GaryD (gazzadee) Date: 2003-02-09 23:17
Okay, the same problem crops up in Python 2.2.2 running
under Cygwin on Windows XP.

Version Info:
Python 2.2.2 (#1, Dec 31 2002, 12:24:34) 
[GCC 3.2 20020927 (prerelease)] on cygwin

Here's the pertinent section of my test file (passwords and
URL changed to protect the innocent):


    # Excerpt from inside a function; these are the imports it relies on
    from urllib2 import ProxyHandler, ProxyBasicAuthHandler, HTTPBasicAuthHandler
    from urllib2 import HTTPPasswordMgrWithDefaultRealm, HTTPHandler, HTTPError
    from urllib2 import Request, build_opener, install_opener, urlopen

    # Set up proxy
    proxy_handler = ProxyHandler({"http": "http://blah.com:17828"})

    # Set up authentication
    pass_mgr = HTTPPasswordMgrWithDefaultRealm()
    for passwd in [
                   (None, "http://blah.com:17828", "foo", "bar"),
#                   (None, "blah.com:17828", "foo", "bar"),  # Works if this line is uncommented
                   (None, "blah.com", "foo", "bar"),
                  ]:
        print("Adding password set (%s, %s, %s, %s)" % passwd)
        pass_mgr.add_password(*passwd)
    auth_handler = HTTPBasicAuthHandler(pass_mgr)
    proxy_auth_handler = ProxyBasicAuthHandler(pass_mgr)

    # Now build a new URL opener and install it
    opener = build_opener(proxy_handler, proxy_auth_handler,
                          auth_handler, HTTPHandler)
    install_opener(opener)

    # Now try to open a file and see what happens
    request = Request("http://www.google.com")
    try:
        remotefile = urlopen(request)
    except HTTPError, ex:
        print("Unable to download file due to HTTP Error %d (%s)." % (ex.code, ex.msg))
        return

msg14441 - (view) Author: John J Lee (jjlee) Date: 2003-12-01 00:14
The problem seems to be with the port (:17828), not the URL 
scheme (http:), because HTTPPasswordMgr.reduce_uri() 
removes the scheme. 
 
RFC 2617 (top of page 3) says nothing about removing the 
port from the URI.  urllib2 does not remove the port, so this 
doesn't appear to be a bug. 
 
I guess gazzadee was doing a urlopen with a different 
canonical root URI (RFC 2617, top of page 3 again) to the one 
he gave in add_password (ie. the URL he passed to urlopen() 
had no explicit port number). 
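
To illustrate the point about the port being part of the root URI, here is a small sketch. It assumes Python 3's `urllib.request` rather than the `urllib2` discussed in this thread, and the host and credentials are placeholders. A password registered for an explicit port is not expected to match a URL on the same host that names no port.

```python
from urllib.request import HTTPPasswordMgrWithDefaultRealm

mgr = HTTPPasswordMgrWithDefaultRealm()
mgr.add_password(None, "http://blah.com:17828", "foo", "bar")

# Same host, but no explicit port: treated as a different root URI.
no_port = mgr.find_user_password(None, "http://blah.com/")

# Same host and port, deeper path: falls under the registered root URI.
same_port = mgr.find_user_password(None, "http://blah.com:17828/some/page")
print(no_port, same_port)
```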
msg14442 - (view) Author: GaryD (gazzadee) Date: 2003-12-16 02:08
This was a while ago, and my memory has faded. I'll try to
respond intelligently.

I think the question was with the way the password manager
looks up passwords, rather than anything else.

I am pretty sure that the problem is not to do with the URI
passed to urlopen(). In the code I posted earlier, the problem
depended solely on whether I added the line:
    (None, "blah.com:17828", "foo", "bar")
...to the HTTPPasswordMgrWithDefaultRealm object.

If that password set was added, then the password lookup for
the proxy was successful, and urlopen() worked. If that
password set was not included, then the password lookup for
the proxy was unsuccessful (despite the inclusion of the
other 2, similar, password sets - "http://blah.com:17828"
and "blah.com"), and urlopen() would fail. Hence my
suspicion that the password manager did not fully remove the
scheme, despite attempts to do so.

I'll see if I can set it up on the latest python and get it
to happen again.

Just as an explanation: I was running an authenticating proxy
on a non-standard port (to avoid clashing with the normal
proxy) so that I could test how my download code would work
through an authenticating proxy.
msg14443 - (view) Author: GaryD (gazzadee) Date: 2003-12-16 03:10
Okay, I have attached a file that replicates this problem.

If you run it as is (replacing the proxy name and address
with something suitable), then it will fail (requiring proxy
authentication).

If you uncomment line 23 (which specifies the password
without the scheme), then it will work successfully.

Technical Info:
 * For a proxy, I am using Squid Cache version 2.4.STABLE7
for i586-mandrake-linux-gnu...
 * I have replicated the problem with Python 2.2.2 on Linux,
and Python 2.3.2 on Windows XP.
msg14444 - (view) Author: John J Lee (jjlee) Date: 2003-12-16 12:49
Thanks! 
 
It seems .reduce_uri() tries to cope with hostnames as well as 
absoluteURIs.  I don't understand why it wants to do that, but it 
fails, because it doesn't anticipate what urlparse does when a 
port is present: 
 
>>> urlparse.urlparse("foo.bar.com") 
('', '', 'foo.bar.com', '', '', '') 
>>> urlparse.urlparse("foo.bar.com:80") 
('foo.bar.com', '', '80', '', '', '') 
 
I haven't checked, but I assume it's just incorrect use of 
urlparse to pass it a hostname. 
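
The same pitfall can be checked on a current interpreter. The sketch below assumes Python 3's `urllib.parse.urlsplit` in place of the Python 2 `urlparse` module used above. How a bare "host:port" string is divided between scheme and path has differed across Python versions, so the sketch only looks at the network location, which stays empty unless the string is written as a URL.

```python
from urllib.parse import urlsplit

# A bare host, a bare host:port, and an absolute URI.
for text in ("foo.bar.com", "foo.bar.com:80", "http://foo.bar.com:80"):
    parts = urlsplit(text)
    print(repr(text), "-> netloc:", repr(parts.netloc))

bare = urlsplit("foo.bar.com:80")
absolute = urlsplit("http://foo.bar.com:80")
```

Only the absolute URI yields a non-empty `netloc`, so code that reads the host from the parsed result has to special-case bare host names.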
 
Of course, if it's "fixed" to only accept absoluteURIs, it will 
break existing code, so I guess it must be fixed for 
hostnames. :-(( 
 
Also, I think .is_suburi("/foo/spam", "/foo/eggs") should return 
False, but returns True, and .http_error_40x() use 
req.get_host() when they should be using req.get_full_url() 
(from a quick look at RFC 2617). 
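
One way to get the behaviour described in this message is sketched below. Both functions are hypothetical helpers written for illustration in Python 3; they are not the urllib2 implementation and not the patch that eventually fixed this issue. The sketch assumes that a string without "://" is a bare host or host:port, and that a base path should only match itself or paths beneath it.

```python
from urllib.parse import urlsplit


def reduce_uri(uri):
    """Return (authority, path) for an absolute URI or a bare host[:port].

    Hypothetical helper: bare host names are never handed to the URL parser.
    """
    if "://" in uri:
        parts = urlsplit(uri)
        return parts.netloc, parts.path or "/"
    # Bare "host", "host:port", or "host:port/path".
    host, _, rest = uri.partition("/")
    return host, "/" + rest


def is_suburi(base, test):
    """True if `test` is at or below `base`; both are (authority, path)."""
    if base == test:
        return True
    if base[0] != test[0]:
        return False
    # Compare against the base path as a directory prefix, so that
    # "/foo/spam" does not match its sibling "/foo/eggs".
    prefix = base[1] if base[1].endswith("/") else base[1] + "/"
    return test[1].startswith(prefix)


print(reduce_uri("http://proxy.blah.com:17828"))
print(reduce_uri("proxy.blah.com:17828"))
print(is_suburi(("h", "/foo/spam"), ("h", "/foo/eggs")))
```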
msg14445 - (view) Author: John J Lee (jjlee) Date: 2006-04-15 18:45
This issue is fixed by patch 1470846.
msg14446 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2006-05-03 05:33
Closing accordingly.
History
Date User Action Args
2022-04-10 16:06:31 admin set github: 37910
2003-02-05 00:22:00 gazzadee create