This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: AssertionError from urllib.urlretrieve / httplib
Type:                          Stage:
Components: Library (Lib)      Versions: Python 2.3

process
Status: closed                 Resolution: fixed
Dependencies:                  Superseder:
Assigned To:                   Nosy List: jhylton, jmoses, terry.reedy, zenzen
Priority: normal               Keywords:

Created on 2003-06-16 00:37 by zenzen, last changed 2022-04-10 16:09 by admin. This issue is now closed.

Messages (9)
msg16423 - (view) Author: Stuart Bishop (zenzen) Date: 2003-06-16 00:37
The following statement is occasionally generating
AssertionErrors:
    current_page = urllib.urlopen(action,data).read()

Traceback (most recent call last):
  File "/Users/zen/bin/autospamrep.py", line 161, in ?
    current_page = handle_spamcop_page(current_page)
  File "/Users/zen/bin/autospamrep.py", line 137, in handle_spamcop_page
    current_page = urllib.urlopen(action,data).read()
  File "/sw/lib/python2.3/httplib.py", line 1150, in read
    assert not self._line_consumed and self._line_left
AssertionError


Fix may be to do the following in 
LineAndFileWrapper.__init__ (last two lines are new):

def __init__(self, line, file):
    self._line = line
    self._file = file
    self._line_consumed = 0
    self._line_offset = 0
    self._line_left = len(line)
    if not self._line_left:
        self._done()
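For reference, the failing call in the traceback is a Python 2 form POST via urllib.urlopen(action, data). A minimal sketch of the same pattern in modern Python, where urllib was split into urllib.request and urllib.parse (the form fields here are placeholders, not from the report):

```python
from urllib.parse import urlencode

def build_post_body(fields):
    """Encode a dict of form fields as an application/x-www-form-urlencoded body."""
    return urlencode(fields).encode("ascii")

body = build_post_body({"track": "spam report", "page": 1})
# Passing a bytes body to urllib.request.urlopen() issues a POST,
# mirroring urllib.urlopen(action, data) in Python 2:
#   from urllib.request import urlopen
#   current_page = urlopen(action, body).read()
```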
msg16424 - (view) Author: Stuart Bishop (zenzen) Date: 2003-06-16 00:55
Logged In: YES 
user_id=46639

My suggested fix is wrong.
msg16425 - (view) Author: Jeremy Hylton (jhylton) (Python triager) Date: 2003-06-16 19:40

Can you reproduce this problem easily?  We've seen something
like it before, but have had trouble figuring out what goes
wrong.
msg16426 - (view) Author: Stuart Bishop (zenzen) Date: 2003-06-24 12:46

I've been unable to repeat the problem through a tcpwatch.py 
proxy, so I'm guessing the trigger is connecting to a fairly loaded 
server over a 56k modem - possibly the socket is in a bad state 
and nothing noticed?

I'll try not going through tcpwatch.py for a bit and see if I can still 
trigger the problem, in case it was caused by a server-side problem 
that has since been fixed.
msg16427 - (view) Author: Jon Moses (jmoses) Date: 2003-09-22 20:38

I also experience this problem, and it's repeatable.  When
trying to talk with the CrossRef (www.crossref.com) server, I
get this same error.  I don't know why.  All the crossref
server does is spit back text.  It normally takes between 10
and 20 seconds to receive all the data.  I've successfully
viewed the results with mozilla and with wget.

I'd post the URL i'm hitting, but it's a for-pay service. 
This is the code I'm using:

...
(name, headers) = urllib.urlretrieve(url)
...

While attempting to receive this data, I tried doing a:

...
u = urllib.urlopen(url)
for line in u.readlines():
  print line
...

but program execution seemed to continue while the data was
being received, which is not cool.  I'm not sure if that's
expected behaviour or not.

Let me know if I can provide you with any more information.

-jon
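As a hedged sketch of what urlretrieve does (Python 3 spells it urllib.request.urlretrieve): it saves the URL's contents to a local file and returns a (filename, headers) pair. A file:// URL keeps the demonstration off the network; the file contents here are invented:

```python
import pathlib
import tempfile
import urllib.request

# Create a small local file to stand in for the remote resource.
src = pathlib.Path(tempfile.mkdtemp()) / "result.txt"
src.write_text("|Canadian Journal of Fisheries and Aquatic Sciences|\n")

# urlretrieve() returns the local filename holding the data
# and the response headers.
name, headers = urllib.request.urlretrieve(src.as_uri())
print(open(name).read())
```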
msg16428 - (view) Author: Jeremy Hylton (jhylton) (Python triager) Date: 2003-09-23 03:09

jmoses: Are you seeing this problem with Python 2.3?  I
thought we had fixed the problem in the original report.

Also, I'm not sure what you mean by program execution
continuing.  Do you mean that the for loop finished and the
rest of the program continued executing, even though there
was data left to read?  

What would probably help most is a trace of the session with
httplib's set_debuglevel() enabled.  If that's got sensitive
data, you can email it to me privately.
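The trace Jeremy asks for comes from httplib's per-connection debug switch. A minimal sketch, assuming modern Python where the module is named http.client (no request is actually sent here, and the query path is a placeholder):

```python
import http.client

conn = http.client.HTTPConnection("doi.crossref.org")
conn.set_debuglevel(1)   # echoes send:/reply:/header: lines to stdout
# conn.request("GET", "/servlet/query")   # would print the full exchange
# resp = conn.getresponse()
```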
msg16429 - (view) Author: Jon Moses (jmoses) Date: 2003-09-23 11:52

Whups, my bad, I just assumed (and we know what happens
then) that this was for Python 2.2, since that's what I was
having the problem with.  My next step was to try with
Python 2.3.  I'll let you know if it works (since it sounds
like it should).

And yes, that's what I meant.  Data from the http read was
still being output to the screen, while other output from
_past_ where the read was occurring was also appearing.
I'd end up with output like this:

[data from http read]
[data from after]
[data from http read]

and the data was from the same connection.

Hopefully the switch to 2.3 makes my issues moot.  Thanks
msg16430 - (view) Author: Jon Moses (jmoses) Date: 2003-09-23 12:35

I switched from using urllib.urlretrieve / urllib.urlopen to
using httplib, since I can debug with it.  I no longer get
the error this bug is about.

The other problem I seemed to be having was related to the
data I was receiving, which was generated in part from the
data I was passing to the server.  I changed the data I was
sending (changed ' ' to '%20') and everything works fine,
even using urllib.urlopen().  Sorry for the confusion.
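The manual ' ' to '%20' replacement described here is percent-encoding, which urllib provides directly (urllib.quote in Python 2, urllib.parse.quote in Python 3). A sketch, using a query fragment shaped like the one in this report:

```python
from urllib.parse import quote

qdata = "|Canadian Journal of Fisheries and Aquatic Sciences|Adkison|52|"
# Encode spaces (and other unsafe characters) but keep the '|' separators.
print(quote(qdata, safe="|"))
```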

The data that the server was sending back to the broken
request was output like this, using
httplib.http.set_debuglevel(1):

------start
Getting: doi.crossref.org
connect: (doi.crossref.org, 80)
send: 'GET
/servlet/query?usr=<deleted>&pwd=<deleted>&qdata=|Canadian
Journal of Fisheries and Aquatic
Sciences|Adkison|52||2762||full_text|1|<snip> HTTP/1.0\r\n\r\n'
reply: '\n'

|Canadian

-----------end

I don't know if that helps, but maybe.

Thanks much.
msg16431 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2005-04-26 22:05

Closing since this appears to have been fixed in 2.3.  If I am 
mistaken, reopen.
History
Date User Action Args
2022-04-10 16:09:14  admin   set     github: 38654
2003-06-16 00:37:03  zenzen  create