This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: urllib2 dloads failing through HTTP proxy w/ auth
Type: Stage:
Components: Library (Lib) Versions: Python 2.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: jjlee, jwpye, loewis, mfleetwo
Priority: normal Keywords: patch

Created on 2005-04-18 21:07 by mfleetwo, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
python_lib_fix.patch mfleetwo, 2005-04-18 21:07 fix
Messages (4)
msg48236 - (view) Author: Mike Fleetwood (mfleetwo) Date: 2005-04-18 21:07
When using urllib2 to download through an HTTP proxy which requires authorisation, a broken HTTP request is sent.  The initial request might work, but subsequent requests sent using the same socket definitely fail.

The problem occurs on Fedora Core 3 with Python 2.3.4.  The buggy code still exists in the Python library in 2.4.1.

Found the problem using yum to download files via my company's Microsoft ISA web proxy.  The proxy requires authorisation.  I set the HTTP_PROXY environment variable to define the proxy like this:
  export HTTP_PROXY=http://username:password@proxy.example.com:8080/

Analysis from my yum bugzilla report, http://devel.linux.duke.edu/bugzilla/show_bug.cgi?id=441, follows:

Location is:
  File:     urllib2.py
  Class:    ProxyHandler
  Function: proxy_open()

The basic proxy authorisation string is created using base64.encodestring() and passed to the add_header() method of a Request object.  However, base64.encodestring() specifically adds a trailing '\n', and when the headers are sent over the socket each is followed by '\r\n'.  The server sees this double newline as the end of the HTTP request and treats the rest of the HTTP headers as a second, invalid request.

The broken request looks like this:
  GET ...
  Host: ...
  Accept-Encoding: identity
  Proxy-authorization: Basic xxxxxxxxxxxxxxxx
                           <-- Blank line which shouldn't be there
  User-agent: urlgrabber/2.9.2
                           <-- Blank line ending HTTP request
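The stray blank line is easy to reproduce: the base64 encoder appends a trailing '\n' to its output, so when each header is then terminated with '\r\n' an empty line appears mid-request.  A minimal sketch (using Python 3's base64.encodebytes(), the renamed encodestring(); the credentials are placeholders):

```python
import base64

# encodebytes() (encodestring() in Python 2) appends a trailing '\n'.
token = base64.encodebytes(b"username:password").decode("ascii")

# Serialize headers roughly the way httplib does: each one ends in CRLF.
headers = [
    ("Proxy-authorization", "Basic " + token),
    ("User-agent", "urlgrabber/2.9.2"),
]
wire = "".join("%s: %s\r\n" % (name, value) for name, value in headers)

# The token's own '\n' plus the CRLF terminator leaves a blank line,
# which an HTTP server reads as the end of the request.
print(repr(wire))
```

The "\n\r\n" sequence in the output is the blank line shown in the broken request above.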

The fix is just to remove the '\n' which base64.encodestring() added before calling add_header().  Just use the string method strip(), as is done in the only other place base64.encodestring() is used in urllib2.py.
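The change amounts to one strip() before the value reaches add_header().  A sketch of the corrected construction (not the exact committed patch; written against Python 3, where encodestring() is encodebytes(), with placeholder credentials):

```python
import base64

user_pass = b"username:password"  # placeholder credentials

# The encoder appends a trailing '\n'; strip it so no newline
# ends up inside the header value.
creds = base64.encodebytes(user_pass).decode("ascii").strip()
header_value = "Basic %s" % creds

# header_value is now newline-free and safe to pass to
# Request.add_header("Proxy-authorization", header_value).
print(header_value)
```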
msg48237 - (view) Author: James William Pye (jwpye) Date: 2005-05-04 01:15
Seems like a valid issue to me.
Each header in HTTP must be followed by a CRLF, not an LF CRLF:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.2
http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4

And I don't think it constitutes a continuation of the field-content either, as the LF is followed by a CRLF rather than at least one SP or HT (per LWS).
msg48238 - (view) Author: John J Lee (jjlee) Date: 2006-04-14 22:12
This was fixed in revision 39377.
msg48239 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2006-04-15 08:28
Closing as fixed.
History
Date User Action Args
2022-04-11 14:56:10 admin set github: 41874
2005-04-18 21:07:48 mfleetwo create