
Title: urllib2 can't cope with error response
Status: closed    Resolution: fixed
Assigned To: jhylton    Nosy List: edemaine, jhylton
Priority: normal

Created on 2002-06-02 22:28 by edemaine, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Files
out (uploaded by edemaine, 2002-06-02 22:28): Traceback caused by the simple example
Messages (4)
msg11028 - Author: Erik Demaine (edemaine) Date: 2002-06-02 22:28
This looks similar to SF bug 216649, but with somewhat
different symptoms.  Redirection seems to cause an
AttributeError (attempt to access self.fp.read when
self.fp is None).  Simple example:

python -c "import urllib2; urllib2.urlopen('http://www.yahoo.com/promotions/mom_com97/supermom.html')"

Traceback from Python 2.2.1 attached.  Same behavior
appears with Python 2.2.
msg11029 - Author: Jeremy Hylton (jhylton) (Python triager) Date: 2002-06-03 16:17

I haven't looked at 216649 yet, but this particular
traceback is caused by a problem loading the redirected URL.
If you load
http://promotions.yahoo.com/promotions/mom_com97/supermom.html
directly, you'll see the same failure without invoking any
redirect machinery.

My first guess is that the Yahoo server is sending an
invalid response and httplib isn't being generous enough
in skipping the garbage and looking for the valid response
data.  Here's a brief trace of httplib activity:
>>> import httplib
>>> h = httplib.HTTP('promotions.yahoo.com')
>>> h.set_debuglevel(2)
>>> h.putrequest("GET", "/promotions/mom_com97/supermom.html")
connect: (promotions.yahoo.com, 80)
send: 'GET /promotions/mom_com97/supermom.html HTTP/1.0\r\n'
>>> h.endheaders()
send: '\r\n'
>>> h.getreply()
reply: '#\x0f\x01yhh00000011\x010\x01HTTP/1.0 200 OK\n'
(-1, '#\x0f\x01yhh00000011\x010\x01HTTP/1.0 200 OK\n', None)

Not sure what the text starting with a hash is all about.
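The `-1` errcode in the trace is what you get when the first reply line fails the status-line check: it must start with `HTTP/`, but this server prefixed it with bytes like `#\x0f\x01...`. Below is a simplified, hypothetical sketch of that kind of check in Python 3. `BadStatusLine` here is a local stand-in for httplib's exception of the same name, and `parse_status_line` and `errcode_for` are illustrative names, not httplib's API.

```python
# Simplified sketch of a strict HTTP status-line check (not httplib's actual code).


class BadStatusLine(Exception):
    """Local stand-in for httplib's exception of the same name."""

    def __init__(self, line):
        super().__init__(line)
        self.line = line


def parse_status_line(line):
    """Split 'HTTP/1.0 200 OK' into (version, status, reason) or raise BadStatusLine."""
    parts = line.rstrip("\r\n").split(None, 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise BadStatusLine(line)
    version, status = parts[0], parts[1]
    if len(status) != 3 or not status.isdigit():
        raise BadStatusLine(line)
    reason = parts[2] if len(parts) == 3 else ""
    return version, int(status), reason


def errcode_for(line):
    """Return the status code, or -1 like the errcode in the getreply() trace."""
    try:
        return parse_status_line(line)[1]
    except BadStatusLine:
        return -1


good = "HTTP/1.0 200 OK\n"
junk = "#\x0f\x01yhh00000011\x010\x01HTTP/1.0 200 OK\n"
print(errcode_for(good))  # 200
print(errcode_for(junk))  # -1: first token starts with '#', not 'HTTP/'
```

Note that `\x0f` and `\x01` are not whitespace, so the junk and `HTTP/1.0` end up fused into one token, and a strict check has nothing valid to match.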

Of course, urllib2 has a bug that prevents it from reporting
anything useful about this error.  That needs to be fixed.
msg11030 - Author: Jeremy Hylton (jhylton) (Python triager) Date: 2002-06-03 16:55

Fixed the urllib2 part of the problem in CVS as rev 1.31 of
urllib2.py.  You'll now get a better error message about
what went wrong.
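The gist of "a better error message" is to check for the missing file object up front instead of letting a later `.read()` fail with an opaque `AttributeError`. The sketch below illustrates that idea only; it is not the rev 1.31 change. `BadResponseError` and `response_body` are hypothetical names, and `code`/`msg` simply mirror the `(errcode, errmsg, ...)` values seen in the trace.

```python
# Hypothetical sketch: report a malformed reply instead of reading from None.
import io


class BadResponseError(Exception):
    """Hypothetical error that reports what the server actually sent."""


def response_body(code, msg, fp):
    """Return the body bytes, or raise a descriptive error if there is no body.

    code and msg mirror getreply()'s errcode/errmsg; fp may be None when
    the reply could not be parsed, as in this report.
    """
    if code == -1 or fp is None:
        # Calling fp.read() here would raise
        # "AttributeError: 'NoneType' object has no attribute 'read'",
        # hiding the real problem: the server's reply was malformed.
        raise BadResponseError("invalid HTTP response line: %r" % (msg,))
    return fp.read()


print(response_body(200, "OK", io.BytesIO(b"<html>...</html>")))
try:
    response_body(-1, "#\x0f\x01yhh00000011\x010\x01HTTP/1.0 200 OK\n", None)
except BadResponseError as exc:
    print(exc)  # shows the raw junk line the server sent
```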

Still not sure what httplib should do differently.  I notice
that Mozilla renders this page with the raw HTTP response in
the text, including the junk at the very beginning of the
response.  (The server is clearly broken.)

It would probably be best if httplib treated this as an
HTTP/0.9 response if there appears to be a valid message
body.  It looks like that's what Mozilla is doing.
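One way to express the HTTP/0.9 idea: if the first line isn't a plausible status line, give up on headers and hand back the whole response, junk included, as the body. The sketch below assumes that rule. `read_response` is a hypothetical helper, reporting status 200 for the 0.9 case is a choice made here (HTTP/0.9 has no status code on the wire), and the header parsing is deliberately minimal (no continuation lines or repeated headers).

```python
# Sketch of an HTTP/0.9 fallback (assumed behavior, not httplib's code).
import io


def read_response(raw):
    """Return (version, status, headers, body) parsed from raw response bytes.

    If the first line is not a plausible 'HTTP/x.y NNN ...' status line,
    treat the whole response as HTTP/0.9: no status line, no headers, and
    every byte (including that first line) is body.
    """
    stream = io.BytesIO(raw)
    first = stream.readline()
    parts = first.split(None, 2)
    if len(parts) >= 2 and parts[0].startswith(b"HTTP/") and parts[1].isdigit():
        headers = {}
        for line in stream:
            line = line.rstrip(b"\r\n")
            if not line:
                break  # a blank line ends the header block
            name, _, value = line.partition(b":")
            headers[name.strip().lower().decode("latin-1")] = value.strip().decode("latin-1")
        return parts[0].decode("latin-1"), int(parts[1]), headers, stream.read()
    # HTTP/0.9 fallback: 200 is assumed because there is no status code to read.
    return "HTTP/0.9", 200, {}, raw


ok = b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<p>hi</p>"
junk = b"#\x0f\x01yhh00000011\x010\x01HTTP/1.0 200 OK\n<html></html>"
print(read_response(ok))
print(read_response(junk))
```

With this rule the junk prefix and the would-be status line both end up in the body, which matches the Mozilla rendering described above.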
msg11031 - Author: Jeremy Hylton (jhylton) (Python triager) Date: 2002-07-06 18:49

httplib.py 1.55 now treats the page as an HTTP/0.9 response,
just like Mozilla.
History
Date                 User      Action  Args
2022-04-10 16:05:23  admin     set     github: 36688
2002-06-02 22:28:57  edemaine  create