This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: urllib2 dloads failing through HTTP proxy w/ auth
Type: Stage:
Components: Library (Lib) Versions: Python 2.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: jjlee, jwpye, loewis, mfleetwo
Priority: normal Keywords: patch

Created on 2005-04-18 21:07 by mfleetwo, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
python_lib_fix.patch mfleetwo, 2005-04-18 21:07 fix
Messages (4)
msg48236 - (view) Author: Mike Fleetwood (mfleetwo) Date: 2005-04-18 21:07
When using urllib2 to download through an HTTP proxy which requires authorisation, a broken HTTP request is sent.  The initial request might work, but subsequent requests sent using the same socket definitely fail.

The problem occurs on Fedora Core 3 with Python 2.3.4.  The buggy code still exists in the Python library in 2.4.1.

Found the problem using yum to download files via my company's Microsoft ISA web proxy.  The proxy requires authorisation.  I set the HTTP_PROXY environment variable to define the proxy like this:
  export HTTP_PROXY=http://username:password@proxy.example.com:8080/

Analysis from my yum bugzilla report, http://devel.linux.duke.edu/bugzilla/show_bug.cgi?id=441, follows:

Location is:
  File:     urllib2.py
  Class:    ProxyHandler
  Function: proxy_open()

The basic proxy authorisation string is created using base64.encodestring() and passed to the add_header() method of a Request object.  However, base64.encodestring() specifically adds a trailing '\n', and when the headers are sent over the socket each is followed by '\r\n'.  The server sees this double newline as the end of the HTTP request and treats the rest of the HTTP headers as a second, invalid request.

The broken request looks like this:
  GET ...
  Host: ...
  Accept-Encoding: identity
  Proxy-authorization: Basic xxxxxxxxxxxxxxxx
                           <-- Blank line which shouldn't be there
  User-agent: urlgrabber/2.9.2
                           <-- Blank line ending HTTP request
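The stray blank line is easy to reproduce: the base64 encoder appends a trailing '\n' to its output, so when each header is then terminated with '\r\n' an empty line appears mid-request.  A minimal sketch (using Python 3's base64.encodebytes(), the renamed encodestring(); the credentials are placeholders):

```python
import base64

# encodebytes() (encodestring() in Python 2) appends a trailing '\n'.
token = base64.encodebytes(b"username:password").decode("ascii")

# Serialize headers roughly the way httplib does: each one ends in CRLF.
headers = [
    ("Proxy-authorization", "Basic " + token),
    ("User-agent", "urlgrabber/2.9.2"),
]
wire = "".join("%s: %s\r\n" % (name, value) for name, value in headers)

# The token's own '\n' plus the CRLF terminator leaves a blank line,
# which an HTTP server reads as the end of the request.
print(repr(wire))
```

The "\n\r\n" sequence in the output is the blank line shown in the broken request above.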

The fix is just to remove the '\n' which base64.encodestring() added before calling add_header().  Just use the string method strip(), as is done in the only other place base64.encodestring() is used in urllib2.py.
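The change amounts to one strip() before the value reaches add_header().  A sketch of the corrected construction (not the exact committed patch; written against Python 3, where encodestring() is encodebytes(), with placeholder credentials):

```python
import base64

user_pass = b"username:password"  # placeholder credentials

# The encoder appends a trailing '\n'; strip it so no newline
# ends up inside the header value.
creds = base64.encodebytes(user_pass).decode("ascii").strip()
header_value = "Basic %s" % creds

# header_value is now newline-free and safe to pass to
# Request.add_header("Proxy-authorization", header_value).
print(header_value)
```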
msg48237 - (view) Author: James William Pye (jwpye) Date: 2005-05-04 01:15
Seems like a valid issue to me.
Each header in HTTP must be followed by a CRLF, not an LF CRLF:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.2
http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4

And I don't think it constitutes a continuation of the field-content either, as the LF is followed by a CRLF rather than at least one SP or HT (per LWS).
msg48238 - (view) Author: John J Lee (jjlee) Date: 2006-04-14 22:12
This was fixed in revision 39377.
msg48239 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2006-04-15 08:28
Closing as fixed.
History
Date User Action Args
2022-04-11 14:56:10 admin set github: 41874
2005-04-18 21:07:48 mfleetwo create