This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: urlopen object's read() doesn't read to EOF
Type: Stage:
Components: Documentation Versions: Python 2.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: fdrake Nosy List: fdrake, smichr
Priority: normal Keywords:

Created on 2003-04-21 20:49 by smichr, last changed 2022-04-10 16:08 by admin. This issue is now closed.

Messages (3)
msg15569 - Author: Christopher Smith (smichr) Date: 2003-04-21 20:49
On http://python.org/doc/current/lib/module-urllib.html it says that 
the object returned by urlopen supports the read() method and that 
this and other methods "have the same interface as for file objects 
-- see section 2.2.8".  In that section on page 
http://python.org/doc/current/lib/bltin-file-objects.html it says about 
the read() method that "if the size argument is negative or omitted, 
[read should] read all data until EOF is reached."
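The file-object contract quoted above can be checked with an in-memory stream. This is a minimal sketch in Python 3, assuming `io.BytesIO` is an acceptable stand-in for a regular file opened in binary mode (the report itself concerns Python 2.2 file objects):

```python
import io

# io.BytesIO stands in for a regular file opened in binary mode.
f = io.BytesIO(b"first line\nsecond line\n")

partial = f.read(5)   # a positive size returns at most 5 bytes
rest = f.read()       # size omitted: read everything up to EOF
at_eof = f.read()     # once at EOF, read() returns an empty bytes object

f.seek(0)
everything = f.read(-1)  # a negative size behaves like an omitted one

print(partial, rest, at_eof, everything)
```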

I was a bit surprised when a project my students were 
working on failed when they tried to process the data 
obtained by the read() method on a connection made to a web 
page.  The problem, apparently, is that a single read() may not 
obtain all of the data, so the full response has to be built up 
with something like the following:

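import urllib

c = urllib.urlopen("http://www.blakeschool.org")
chunks = []
while True:
    packet = c.read()    # may return only part of the response
    if not packet:       # an empty string signals EOF
        break
    chunks.append(packet)
c.close()
data = ''.join(chunks)
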
I'm not sure if this is a feature or a bug.  Could a file's read method 
fail to obtain the whole file in one read(), too?  It seems that either 
the documentation should be changed or the read() method for at 
least urllib objects should be changed.
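To put the question above in modern terms: in Python 3, a raw (unbuffered) stream's `read(n)` is allowed to return fewer than `n` bytes, so a loop like the one above is the portable way to collect everything. A minimal sketch follows; `ShortReadStream` and `read_to_eof` are hypothetical names, and the 3-byte limit per call is an arbitrary assumption chosen to imitate a socket that delivers data in pieces:

```python
import io


class ShortReadStream(io.RawIOBase):
    """Hypothetical raw stream that returns at most 3 bytes per read call,
    imitating a network connection that delivers data in small pieces."""

    def __init__(self, payload: bytes) -> None:
        super().__init__()
        self._buf = io.BytesIO(payload)

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        # Copy at most 3 bytes into the caller's buffer; 0 means EOF.
        chunk = self._buf.read(min(len(b), 3))
        b[:len(chunk)] = chunk
        return len(chunk)


def read_to_eof(stream, chunk_size: int = 8192) -> bytes:
    """Call stream.read(chunk_size) repeatedly until it returns b'' (EOF)."""
    chunks = []
    while True:
        packet = stream.read(chunk_size)
        if not packet:
            break
        chunks.append(packet)
    return b"".join(chunks)


payload = b"<html>hello</html>"
print(len(ShortReadStream(payload).read(8192)))  # only 3 bytes come back
print(read_to_eof(ShortReadStream(payload)) == payload)
```

Note that `io.RawIOBase.read()` with no size argument falls back to `readall()`, which itself loops until EOF; passing an explicit `chunk_size`, as `read_to_eof` does, keeps the loop from depending on that behavior.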

/c

Christopher P. Smith
The Blake School
Minneapolis, MN
msg15570 - Author: Fred Drake (fdrake) (Python committer) Date: 2004-03-25 17:04

This is an issue with reading from a socket; there's no way
to recognize the end of the stream until the remote end
actually closes the connection.
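The limitation described above is visible at the socket level: `recv()` returns an empty bytes object only once the peer has closed or shut down its sending side. A minimal sketch in Python 3, assuming a local `socket.socketpair()` is an acceptable stand-in for a connection to a web server (the helper name `recv_until_close` is hypothetical):

```python
import socket


def recv_until_close(sock: socket.socket, bufsize: int = 4096) -> bytes:
    """Collect bytes until recv() returns b'', meaning the peer has
    closed (or shut down) its sending side of the connection."""
    chunks = []
    while True:
        data = sock.recv(bufsize)
        if not data:  # empty result: orderly shutdown by the peer
            break
        chunks.append(data)
    return b"".join(chunks)


# A connected pair of local sockets plays the roles of server and client.
server, client = socket.socketpair()
server.sendall(b"HTTP/1.0 200 OK\r\n\r\n<html>hello</html>")
server.shutdown(socket.SHUT_WR)  # tell the reader no more data is coming
received = recv_until_close(client)
server.close()
client.close()
print(received)
```

Without the `shutdown()` (or a `close()`) on the sending side, the reader's loop would block in `recv()`, which is exactly why the end of the stream cannot be recognized earlier.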

I've documented this limitation in Doc/lib/liburllib.tex
revision 1.52.  Someone should backport the patch to Python 2.3.x and
close this report.
msg15571 - Author: Fred Drake (fdrake) (Python committer) Date: 2004-04-01 04:23


Backported to Python 2.3.4 as Doc/lib/liburllib.tex revision 1.50.8.2.
History
Date                 User    Action  Args
2022-04-10 16:08:16  admin   set     github: 38346
2003-04-21 20:49:37  smichr  create