classification
Title: urllib problems
Type:
Stage:
Components: macOS
Versions: Python 2.2

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: jackjansen
Nosy List: jackjansen, ybenita
Priority: normal
Keywords:

Created on 2002-01-31 07:25 by ybenita, last changed 2022-04-10 16:04 by admin. This issue is now closed.

Messages (3)
msg9069 - Author: Yair Benita (ybenita) Date: 2002-01-31 07:25
When using urllib.urlopen("url") and then reading the file with handle.read(), I get only part of the page. It works for short web pages, but when I use it to download large pages the result always comes up short. To me it looks as if it tries to read the file before it has finished downloading.

Jack Jansen said: "MacPython may do short reads on sockets. I've always maintained that this was correct (which reasoning was quietly accepted by everyone here), but last year I finally admitted that it may actually be incorrect (which was again quietly accepted :-)"

Example:
import urllib
# On MacPython 2.2, read() returns only part of this page.
x = urllib.urlopen("http://www.ebi.ac.uk/cgi-bin/emblfetch?db=embl&format=fasta&style=raw&id=AB002378")
print x.read()

Compare the file downloaded by any web browser with the file fetched by MacPython.
msg9070 - Author: Jack Jansen (jackjansen) * (Python committer) Date: 2002-02-06 00:34
I probably found the cause for this, now the only task remaining is finding out who to blame:-)

httplib explicitly disables buffering on the file object that wraps the socket, by calling
self.fp = socket.makefile("rb", 0).

MSL, the CodeWarrior I/O library, has an optimization (or bug :-) whereby an fread() on a binary
file with buffering turned off calls the underlying read() straight away and returns whatever that single call delivers.

Python's file_read() in fileobject.c reacts to a short fread() return value by returning what it has so far instead of reading again.

One of these three is wrong, apparently.
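
For readers hitting similar symptoms, the short-read behaviour described above can be sketched in a few lines. This is only an illustration, not the fix that was applied: ShortReadStream and read_fully are hypothetical names, the stream merely simulates an unbuffered file whose read() may return fewer bytes than requested, and the code is written for current Python rather than 2.2. The loop assumes a blocking stream, where an empty result means end of file.

```python
import io


class ShortReadStream(object):
    """Hypothetical stand-in for an unbuffered socket file.

    Each read() returns at most max_chunk bytes, even if more were
    requested, which mimics the short reads discussed in this report.
    """

    def __init__(self, payload, max_chunk):
        self._buf = io.BytesIO(payload)
        self._max_chunk = max_chunk

    def read(self, size=-1):
        # Cap every request at max_chunk to force short reads.
        if size is None or size < 0 or size > self._max_chunk:
            size = self._max_chunk
        return self._buf.read(size)


def read_fully(fp, chunk_size=8192):
    """Keep calling fp.read() until it returns an empty result (EOF)."""
    chunks = []
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


payload = b"x" * 10000

# A single read() call comes back short, like the truncated pages.
partial = ShortReadStream(payload, 512).read(10000)

# Looping until EOF recovers the whole payload.
complete = read_fully(ShortReadStream(payload, 512))
```

A caller that cannot rely on the lower layers retrying for it can use a loop like read_fully as a workaround; whether the retry belongs in the application, in file_read(), or in the C library is exactly the question raised above.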
msg9071 - Author: Jack Jansen (jackjansen) * (Python committer) Date: 2002-04-22 13:24
This was fixed some time ago (the fix made it into 2.2.1) by modifying the underlying GUSI I/O library. Apparently I forgot to close the bug report, so I'm doing so now.
History
Date User Action Args
2022-04-10 16:04:56 admin set github: 36005
2002-01-31 07:25:04 ybenita create