This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Fix for off-by-one bug in urllib.URLopener.retrieve
Type: Stage:
Components: Library (Lib) Versions: Python 2.4
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: georg.brandl Nosy List: georg.brandl, georg.brandl, jlmuir, rhettinger, titus
Priority: normal Keywords: patch

Created on 2003-09-21 05:13 by jlmuir, last changed 2022-04-10 16:11 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
reporthook-HEAD-jlmuir-v1.diff jlmuir, 2003-09-21 05:13 Patch against HEAD
Messages (6)
msg44669 - (view) Author: J. Lewis Muir (jlmuir) Date: 2003-09-21 05:13
This patch fixes an off-by-one bug in the reporthook part of 
urllib.URLopener.retrieve and adds test cases to verify the 
fix.

The retrieve method reports reading one more block than it 
actually does. Here's an example. The file is empty. I expect 
the initial reporthook call on establishment of the network 
connection but not an additional call since the file is empty 
(no block was read):

>>> import urllib
>>> def reporter(count, blockSize, fileSize):
...     print "c: %i, bs: %i, fs: %i" % (count, blockSize, 
fileSize)
...
>>> srcFile = file("/tmp/empty.txt", "wb")
>>> srcFile.close()
>>> urllib.urlretrieve("file:///tmp/empty.txt", "/tmp/new-
empty.txt", reporter)
c: 0, bs: 8192, fs: 0
c: 1, bs: 8192, fs: 0
('/tmp/new-empty.txt', <mimetools.Message instance at 
0x5a4f58>)
>>>

As a second example, if the file contains 1 byte, the retrieve 
method claims it read 2 blocks when it really only read 1:

>>> srcFile = file("/tmp/empty.txt", "wb")
>>> srcFile.write("x")
>>> srcFile.close()
>>> urllib.urlretrieve("file:///tmp/empty.txt", "/tmp/new-
empty.txt", reporter)
c: 0, bs: 8192, fs: 1
c: 1, bs: 8192, fs: 1
c: 2, bs: 8192, fs: 1
('/tmp/new-empty.txt', <mimetools.Message instance at 
0x5a50d0>)
>>>

This patch also includes some changes to 
test_urllib.urlretrieve_FileTests (where I added the test 
cases) to support creating more than one temporary file and 
making sure any created temporary files get deleted when 
the test case fixture is torn down.
msg44670 - (view) Author: Titus Brown (titus) Date: 2004-12-19 08:51
Logged In: YES 
user_id=23486

Patch needed slight modification to work against current HEAD; new 
patch at 

http://issola.caltech.edu/~t/transfer/transfer/patch-810023-
reporthook.diff

The new regression tests look reasonable.  I verified that the new 
regression tests FAILED against old urllib.py, and worked against patched 
urllib.py.  Recommend apply.
msg44671 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-12-20 03:55
Logged In: YES 
user_id=80475

Will review and apply after Xmas vacation.
msg44672 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-12-21 00:29
Logged In: YES 
user_id=80475

Titus, I'm having difficulty with your link, please attach
the revised patch or just email it to me (python at rcn.com).
msg44673 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2005-08-26 08:20
Logged In: YES 
user_id=80475

Reinhold, do you have time for this one?
msg44674 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2005-08-26 08:52
Logged In: YES 
user_id=1188172

Patch looks good, committed as
Lib/urllib.py r1.168 r1.165.2.1
Lib/test/test_urllib.py r1.17 r1.16.4.1
History
Date User Action Args
2022-04-10 16:11:15adminsetgithub: 39265
2003-09-21 05:13:46jlmuircreate