classification
Title: zlib.decompressobj under-described.
Type: Stage:
Components: None Versions:
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: Nosy List: brett.cannon, loewis, scott_daniels
Priority: normal Keywords: patch

Created on 2002-11-18 18:48 by scott_daniels, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Messages (4)
msg41695 - Author: Scott David Daniels (scott_daniels) * Date: 2002-11-18 18:48
While trying to implement some decompression code (working from the
docs), I believed this was the appropriate code:

    dco = zlib.decompressobj()
    for bytes in compressed_data:
        s = dco.decompress(bytes, limit)
        while s:
            handle_some_data(s)
            s = dco.decompress('', limit)

After eventually chasing back through the test_zlib.py
code, I now believe the proper analog of this code 
should be:

    dco = zlib.decompressobj()
    for bytes in compressed_data:
        s = dco.decompress(bytes, limit)
        while s:
            handle_some_data(s)
            # Feed back the input that the limit left unprocessed.
            s = dco.decompress(dco.unconsumed_tail, limit)

The meanings of both unconsumed_tail and unused_data need a bit of
explanation in the docs.
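
For readers on Python 3, the second loop above can be sketched as a small generator. This is a minimal illustration, not the code from test_zlib.py: the helper name `iter_decompressed`, the 64-byte chunking, and the sample data are assumptions made for the example, and the final `flush()` call is added here to collect any output still buffered once the input runs out.

```python
import zlib

def iter_decompressed(compressed_chunks, limit):
    """Yield decompressed pieces, asking for at most `limit` bytes per call."""
    dco = zlib.decompressobj()
    for chunk in compressed_chunks:
        piece = dco.decompress(chunk, limit)
        while piece:
            yield piece
            # Input held back by the limit must be passed in again explicitly;
            # passing an empty string here would silently drop it.
            piece = dco.decompress(dco.unconsumed_tail, limit)
    # Collect anything still buffered after the last chunk.
    tail = dco.flush()
    if tail:
        yield tail

# Hypothetical sample data, split into 64-byte compressed chunks.
original = b"spam and eggs " * 1000
compressed = zlib.compress(original)
chunks = [compressed[i:i + 64] for i in range(0, len(compressed), 64)]

pieces = list(iter_decompressed(chunks, limit=256))
restored = b"".join(pieces)
```

Joining the yielded pieces should reproduce the original data, whereas the first loop (which passes an empty string) would lose whatever was left in unconsumed_tail.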

msg41696 - Author: Scott David Daniels (scott_daniels) * Date: 2002-12-12 17:46
Here is some alternative text, with a bit more explanation.

\begin{memberdesc}{unused_data}
A string which contains any bytes past the end of the compressed data.
That is, this remains \code{""} until the last byte that contains
compression data is available.  If the whole string turned out to
contain compressed data, this is \code{""}, the empty string.

The only way to determine where a string of compressed data ends is by
actually decompressing it.  This means that when compressed data is
contained as part of a larger file, you can only find the end of it by
reading data and feeding it, followed by some non-empty string, into a
decompression object's \method{decompress} method until the
\member{unused_data} attribute is no longer the empty string.
\end{memberdesc}
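
As a rough Python 3 illustration of the unused_data description above, the sketch below embeds a compressed stream in a larger byte string and locates its end by decompressing. The payload, the trailer bytes, and the 16-byte read size are all made up for the example.

```python
import zlib

# Hypothetical container: a zlib stream followed by unrelated bytes.
payload = b"hello, zlib " * 50
trailer = b"TRAILING-BYTES"
blob = zlib.compress(payload) + trailer

dco = zlib.decompressobj()
out = []
pos = 0
# Keep feeding input until something shows up in unused_data, which
# only happens once bytes past the end of the compressed stream arrive.
while not dco.unused_data and pos < len(blob):
    out.append(dco.decompress(blob[pos:pos + 16]))
    pos += 16

# Bytes past the stream: those already handed to the decompressor
# plus whatever was never read from the container.
leftover = dco.unused_data + blob[pos:]
decompressed = b"".join(out)
```

Here `leftover` should equal the trailer and `decompressed` the payload, without knowing the compressed length in advance.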

\begin{memberdesc}{unconsumed_tail}
A string that contains any data that was not consumed by the last
\method{decompress} call because it exceeded the limit for the
uncompressed data buffer.  This data has not yet been seen by the zlib
machinery, so you must feed it (possibly with further data concatenated
to it) back to a subsequent \method{decompress} method call in order to
get correct output.
\end{memberdesc}
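
The "possibly with further data concatenated to it" case can be sketched in Python 3 as follows. The sample data, the split into two halves, and the 1024-byte limit are arbitrary choices for illustration only.

```python
import zlib

# Hypothetical input arriving in two blocks.
original = bytes(range(256)) * 200
compressed = zlib.compress(original)
half = len(compressed) // 2
first, second = compressed[:half], compressed[half:]

dco = zlib.decompressobj()
out = [dco.decompress(first, 1024)]

# Whatever the 1024-byte cap left unread sits in unconsumed_tail;
# prepend it to the next block of input instead of discarding it.
pending = dco.unconsumed_tail + second
while pending:
    out.append(dco.decompress(pending, 1024))
    pending = dco.unconsumed_tail

# Pick up any output still buffered inside the decompressor.
out.append(dco.flush())
restored = b"".join(out)
```

Dropping the `dco.unconsumed_tail +` prefix would leave a hole in the compressed stream, so the second block could no longer be decoded correctly.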
msg41697 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2003-05-22 03:39
I don't see much of a difference between the current docs for unused_data
and the ones Scott posted here.  The unconsumed_tail text does seem to be
more informative.  I have no issue with adding the extra info that Scott
included in his version.  Does anyone else think it is worth adding?

Since the comment basically has a patch, I am making this tracker item a 
patch.
msg41698 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2003-06-21 14:15
Thanks for the patch, committed as libzlib.tex 1.28.
History
Date                 User           Action  Args
2022-04-10 16:05:54  admin          set     github: 37497
2002-11-18 18:48:51  scott_daniels  create