This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: urllib reporthook could be more informative
Type: Stage:
Components: Library (Lib) Versions: Python 2.3
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: allanbwilson, gvanrossum, loewis
Priority: normal Keywords: patch

Created on 2003-11-26 03:41 by allanbwilson, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
urllibdiff.txt allanbwilson, 2003-11-27 20:03 diff -c ... >urllibdiff.txt
Messages (4)
msg44947 - (view) Author: Allan B. Wilson (allanbwilson) Date: 2003-11-26 03:41
A reporthook in urllib.urlretrieve() (in 2.3.2) is
given the max number of characters accepted ("bs") per
.read() as its second argument. It would be more
helpful to receive the number of characters actually
retrieved in the most recent block.

While perhaps this would break some existing code
(though I can't imagine how), the minor patches below
will allow giving progess updates, etc. that are accurate.

Thanks

Allan Wilson

------------

*** urllib.py.old Tue Nov 25 17:42:55 2003
--- urllib.py Tue Nov 25 18:00:50 2003
***************
*** 236,248 ****
              reporthook(0, bs, size)
          block = fp.read(bs)
          if reporthook:
!             reporthook(1, bs, size)
          while block:
              tfp.write(block)
              block = fp.read(bs)
              blocknum = blocknum + 1
              if reporthook:
!                 reporthook(blocknum, bs, size)
          fp.close()
          tfp.close()
          del fp
--- 236,248 ----
              reporthook(0, bs, size)
          block = fp.read(bs)
          if reporthook:
!             reporthook(1, len(block), size)
          while block:
              tfp.write(block)
              block = fp.read(bs)
              blocknum = blocknum + 1
              if reporthook:
!                 reporthook(blocknum, len(block), size)
          fp.close()
          tfp.close()
          del fp
msg44948 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2003-11-27 19:41
Logged In: YES 
user_id=21627

Can you please attach the patch, instead of pasting it?
msg44949 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2006-11-19 16:44
I notice that the patch doesn't apply to the svn head (2.6a0).  But that's easily fixed and the idea still applies.

As the original author of the code being patched I believe my reason for doing it the old way was that I wanted the report hook to be called before the first block, which would let a GUI open up a dialog box before anything was read.  The idea was that if the reads are really slow, you'd want the dialog box there right from the start.  But this was rather naive, since the most likely source of delay is making the connection and getting the response header back, and the report hook isn't being called at all until all the headers have been seen.

The changed API to reporthook() needs to be documented very clearly.  There's one call to reporthook() that still passes the block size instead of the actual data size.  A naive implementation could be confused by this call, although it is easily recognized because it is the first call and the only one with blocknum equal to zero.

I think this is a fine change -- as long as it isn't backported, since it is clearly a feature change.  I do wonder "why bother", since most people using urllib don't care all that much about extreme details (I can't remember the last time I specified a reporthook), and most people caring about details don't like urllib and use something else (e.g. httplib, or urllib2).

So I guess I'm somewhere between +0 and -0 on this on this.
msg44950 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2006-11-21 18:29
Discussion on python-dev revealed that read() on a socket will always give you blocksize data, except for the last block. So this doesn't really change anything in practice; applications that find that the data read (blocksize*blocknumber) exceeds the amount of data expected should conclude that they saw the last block.

Rejecting this patch.
History
Date User Action Args
2022-04-11 14:56:01adminsetgithub: 39609
2003-11-26 03:41:59allanbwilsoncreate