Issue 725265: urlopen object's read() doesn't read to EOF

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/38346

classification

Title:	urlopen object's read() doesn't read to EOF
Components:	Documentation
Versions:	Python 2.2

process

Status:	closed
Resolution:	fixed
Assigned To:	fdrake
Nosy List:	fdrake, smichr
Priority:	normal

Created on 2003-04-21 20:49 by smichr, last changed 2022-04-10 16:08 by admin. This issue is now closed.

Messages (3)
msg15569 - Author: Christopher Smith (smichr)	Date: 2003-04-21 20:49
On http://python.org/doc/current/lib/module-urllib.html it says that the object returned by urlopen supports the read() method and that this and other methods "have the same interface as for file objects -- see section 2.2.8". In that section, on page http://python.org/doc/current/lib/bltin-file-objects.html, it says of the read() method that "if the size argument is negative or omitted, [read should] read all data until EOF is reached."

I was a bit surprised when a project my students were working on failed when they tried to process the data obtained by the read() method on a connection made to a web page. The problem, apparently, is that read() may not obtain all of the requested data on the first call, and the full response has to be built up something like this:

    import urllib

    c = urllib.urlopen("http://www.blakeschool.org")
    data = ''
    while 1:
        packet = c.read()   # may return only part of the response
        if packet == '':    # an empty string means EOF was reached
            break
        data += packet
    c.close()

I'm not sure whether this is a feature or a bug. Could a file's read() method fail to obtain the whole file in one call, too? It seems that either the documentation should be changed or the read() method of at least urllib objects should be changed.

/c

Christopher P. Smith
The Blake School
Minneapolis, MN
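
The accumulation loop from the report can be written in present-day Python 3 as a helper that works on any file-like object whose read() may return fewer bytes than requested. This is a minimal sketch, not urllib's own code: `ShortReadStream` is a hypothetical stand-in that deliberately returns short reads so the example runs without a network connection, and `read_to_eof` is likewise a name chosen only for illustration.

```python
class ShortReadStream:
    """Hypothetical stand-in for a network stream: every read() call returns
    at most `max_chunk` bytes, even when more data is still available."""

    def __init__(self, data: bytes, max_chunk: int = 4) -> None:
        self._data = data
        self._pos = 0
        self._max_chunk = max_chunk

    def read(self, size: int = -1) -> bytes:
        # A negative or omitted size still yields only one short chunk,
        # mimicking the behaviour described in the report.
        limit = self._max_chunk if size < 0 else min(size, self._max_chunk)
        chunk = self._data[self._pos:self._pos + limit]
        self._pos += len(chunk)
        return chunk  # b"" once all data has been consumed (EOF)


def read_to_eof(stream, chunk_size: int = 8192) -> bytes:
    """Call stream.read() until it returns b"" and return everything read."""
    parts = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty result signals EOF
            break
        parts.append(chunk)
    return b"".join(parts)


payload = b"<html>" + b"x" * 100 + b"</html>"
print(len(ShortReadStream(payload).read()))              # → 4 (a single read is short)
print(read_to_eof(ShortReadStream(payload)) == payload)  # → True
```

Collecting the chunks in a list and joining them once at the end avoids copying the growing buffer on every iteration, which `data += packet` can do with immutable strings.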
msg15570 - Author: Fred Drake (fdrake)	Date: 2004-03-25 17:04
This is an issue with reading from a socket; there's no way to recognize the end of the stream until the remote end of the socket actually closes the socket. I've documented this limitation in Doc/lib/liburllib.tex 1.52. Someone should backport the patch to Python 2.3.x and close this report.
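
The socket limitation described above can be observed locally. This is a sketch under stated assumptions, not the documentation fix itself: `socket.socketpair()` stands in for a real HTTP connection, and `recv_until_closed` and `demo` are hypothetical helpers. While the writing end stays open, recv() cannot tell "no data yet" from "no more data", so the first loop ends only on a timeout; once the writer closes, recv() returns b"". HTTP-level framing such as a Content-Length header is not modeled here.

```python
import socket


def recv_until_closed(sock: socket.socket, bufsize: int = 4096) -> bytes:
    """Hypothetical helper: read until recv() returns b"" (peer closed)."""
    parts = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # b"" is the only end-of-stream signal a socket gives
            break
        parts.append(chunk)
    return b"".join(parts)


def demo(payload: bytes, wait: float = 0.2):
    """Hypothetical demo returning (eof_seen_while_writer_open, bytes_received)."""
    reader, writer = socket.socketpair()
    try:
        writer.sendall(payload)

        # Phase 1: the writer is still open. recv() returns the data that has
        # arrived and then blocks; it cannot report EOF, so only the timeout
        # ends this loop.
        reader.settimeout(wait)
        received = []
        eof_while_open = False
        try:
            while True:
                chunk = reader.recv(4096)
                if not chunk:
                    eof_while_open = True
                    break
                received.append(chunk)
        except socket.timeout:
            pass

        # Phase 2: the writer closes, and the reader now sees EOF (b"").
        writer.close()
        reader.settimeout(None)
        received.append(recv_until_closed(reader))
        return eof_while_open, b"".join(received)
    finally:
        reader.close()
        writer.close()


print(demo(b"all of the data"))  # → (False, b'all of the data')
```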
msg15571 - Author: Fred Drake (fdrake)	Date: 2004-04-01 04:23
Backported to Python 2.3.4 as Doc/lib/liburllib.tex 1.50.8.2.

History
Date	User	Action	Args
2022-04-10 16:08:16	admin	set	github: 38346
2003-04-21 20:49:37	smichr	create