Issue 1074261: gzip dies on gz files with many appended headers


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/41236

classification

Title:	gzip dies on gz files with many appended headers
Type:		Stage:
Components:	Library (Lib)	Versions:	Python 2.4

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	akuchling	Nosy List:	akuchling, eichin
Priority:	normal	Keywords:

Created on 2004-11-27 17:29 by eichin, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
make_gz_thing.py	eichin, 2004-11-27 17:29	test case that demonstrates the bug
gzip-patch	eichin, 2004-11-27 17:48	patch for GzipFile header-reading bug

Messages (4)
msg23346	Author: Mark Eichin (eichin)	Date: 2004-11-27 17:29
One of the virtues of the gzip format is that a file can be reopened for append and still be valid as a whole: each reopen simply adds a new header. gzip.py (tested on 2.1, 2.3, and a fresh build of 2.4rc1) doesn't cope with more than a certain number of appended headers. The attached test case uses gzip.py to generate such a file, runs gzip -tv on it to show that it is valid, and then tries to read it back with gzip.py, which blows up with "OverflowError: long int too large to convert to int" in earlier releases and with MemoryError in 2.4rc1.

What's going on is that gzip.GzipFile.read keeps doubling readsize and calling _read again, and each _read call invokes _read_gzip_header and consumes one header. Since readsize doubles once per header, older Pythons fail once readsize grows past 2**32 and can't be converted back to an int, and 2.4 fails trying to call file.read with a huge size. In practice, more than 30 or so headers and it fails. The attached test case is based on a real-world queueing setup that generates over 200 appended headers, and that count isn't bounded in any useful way. I'll think about ways to make GzipFile smarter, but I don't have a patch yet.
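To make the mechanism concrete, here is a minimal sketch in modern Python 3 (not the 2.4 gzip.py code). append_records and unclamped_readsize are hypothetical helpers: the first reproduces the append pattern in which every reopen adds one header, and the second models the unbounded doubling of readsize, assuming a 1024-byte starting size (an assumption for illustration, not taken from the report).

```python
import gzip
import os
import tempfile


def append_records(path, n):
    """Reopen path for append n times; each reopen writes a new gzip member
    (its own header, compressed data, and trailer) after the existing ones."""
    for i in range(n):
        with gzip.open(path, "ab") as gz:
            gz.write(b"record %d\n" % i)


def unclamped_readsize(headers, start=1024):
    """Hypothetical model of the pre-fix read() loop: readsize doubles on every
    _read call, and each call consumes one appended header.
    The 1024-byte starting size is an assumption."""
    return start * 2 ** headers


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "queue.gz")  # hypothetical file name
    append_records(path, 200)             # roughly the header count in the report
    with open(path, "rb") as f:
        raw = f.read()

# The concatenated members still decompress as one valid stream.
assert gzip.decompress(raw).count(b"\n") == 200

# After 30 headers the model asks file.read for 2**40 bytes,
# far beyond what fits in a 32-bit int.
assert unclamped_readsize(30) == 2 ** 40
```

Current Python gzip reads such a multi-member file without trouble; the model only shows how quickly an unclamped doubling outruns any reasonable read size.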
msg23347	Author: Mark Eichin (eichin)	Date: 2004-11-27 17:48
Oh, this is actually easy to fix: just clamp readsize. After all, you don't actually want to try to read gigabyte chunks most of the time. (The supplied patch allows one to override gzip.GzipFile.max_read_chunk if one really does.) Tested on 2.4rc1, and a version backported to 2.1 works there too.
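A minimal sketch of the clamping idea, written against the modern zlib module rather than the 2.4 gzip.py internals. The function name read_all_members and the 10 MiB cap are assumptions for illustration; the cap plays the role of gzip.GzipFile.max_read_chunk from the patch, but its value here is not taken from the patch. The chunk size still doubles after each read, as in the old loop, but can never exceed the cap, so many appended headers no longer translate into an enormous read request.

```python
import gzip
import io
import zlib

MAX_READ_CHUNK = 10 * 1024 * 1024  # assumed cap, standing in for max_read_chunk


def read_all_members(fileobj, start=1024, max_chunk=MAX_READ_CHUNK):
    """Decompress every member of a (possibly multi-member) gzip stream,
    doubling the read size after each read but clamping it to max_chunk."""
    out = []
    readsize = min(start, max_chunk)
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)
    pending = b""
    while True:
        chunk = pending or fileobj.read(readsize)
        pending = b""
        if not chunk:
            break
        out.append(d.decompress(chunk))
        if d.eof:
            # One member finished; leftover bytes begin the next header.
            pending = d.unused_data
            d = zlib.decompressobj(16 + zlib.MAX_WBITS)
        # The fix: grow the request, but never past the cap.
        readsize = min(readsize * 2, max_chunk)
    return b"".join(out)


# Usage: 200 concatenated members, as produced by repeated appends.
buf = io.BytesIO()
for i in range(200):
    # Closing a GzipFile does not close the BytesIO it wraps,
    # so each iteration appends one more member to buf.
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        gz.write(b"line %d\n" % i)
data = buf.getvalue()
assert read_all_members(io.BytesIO(data)) == gzip.decompress(data)
```

Clamping keeps memory use bounded per read at a small cost in extra read calls for very large files, which is the trade-off the message describes; anyone who really wants huge chunks can raise the cap.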
msg23348	Author: Mark Eichin (eichin)	Date: 2004-11-27 23:28
Patch submitted to the patch tracker as 1074381.
msg23349	Author: A.M. Kuchling (akuchling) *	Date: 2005-06-09 14:23
Patch applied to both HEAD and 2.4-maint branches; thanks!

History
Date	User	Action	Args
2022-04-11 14:56:08	admin	set	github: 41236
2004-11-27 17:29:30	eichin	create