classification
Title: Patch for #1586414 to avoid fragmentation on Windows
Type: Stage:
Components: Library (Lib) Versions: Python 2.6
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: lars.gustaebel Nosy List: enochjul, josiahcarlson, lars.gustaebel
Priority: normal Keywords: patch

Created on 2006-10-31 05:05 by enochjul, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
tarfile_set_length.patch enochjul, 2006-10-31 05:05
Messages (8)
msg51299 - (view) Author: Enoch Julias (enochjul) Date: 2006-10-31 05:05
Add a call to file.truncate() in makefile() to inform
Windows of the size of the target file. This helps
guide cluster allocation in NTFS to avoid fragmentation.
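
The idea can be sketched roughly as follows. This is not the attached patch, only an illustration of calling truncate() with the known size before any data is written; the helper name copy_preallocated and its parameters are hypothetical.

```python
def copy_preallocated(source, target_path, size, bufsize=64 * 1024):
    """Copy exactly `size` bytes from `source` into a new file at `target_path`.

    Hypothetical helper for illustration; it is not part of tarfile.
    """
    with open(target_path, "wb") as target:
        # Announce the final length before writing any data. The stream
        # position stays at 0, so the writes below fill the file in order.
        target.truncate(size)
        remaining = size
        while remaining > 0:
            chunk = source.read(min(bufsize, remaining))
            if not chunk:
                raise EOFError("source ended before %d bytes were copied" % size)
            target.write(chunk)
            remaining -= len(chunk)
```

Whether the early truncate() actually changes cluster allocation depends on the operating system and filesystem; the sketch only shows where the call would go.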
msg51300 - (view) Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2006-11-01 15:27
Is this merely an NTFS problem or is it the same with FAT fs?
How do you detect file fragmentation?
Doesn't this problem apply to all other modules or scripts
that write to file objects as well?
Shouldn't a decent filesystem be able to handle growing
files in a correct manner?
msg51301 - (view) Author: Enoch Julias (enochjul) Date: 2006-11-06 17:19
I have not really tested FAT/FAT32 yet as I don't use these 
filesystems now.

The Disk Defragmenter tool in Windows 2000/XP shows the number of 
files/directories fragmented in its report.

NTFS does handle growing files, but the operating system can only do
so much without knowing the final size of the file. Extracting from
archives consisting of only a few files does not cause
fragmentation. However, if the archive has many files, it is much
more likely that the default algorithm will fail to allocate
contiguous clusters for some files. It may also depend on the amount
of free-space fragmentation on a particular partition and on whether
other processes are writing to other files in the same partition.

Some details of the cluster allocation algorithm used in Windows can 
be found at http://support.microsoft.com/kb/841551.
msg51302 - (view) Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2006-11-06 21:57
Personally, I think disk defragmenters are evil ;-) They
create the need that they are supposed to satisfy at the
same time. On Linux we have no defragmenters, so we don't
bother about it.

I think your proposal is some kind of a performance hack for
a particular filesystem. In principle, this problem exists
for all filesystems on all platforms. Fragmentation is IMO a
filesystem's problem and is not so much a state but more
like a process. Filesystems fragment over time and you can't
do anything about it. For those people who care, disk
defragmenters were invented. It is not tarfile.py's job to
care about a fragmented filesystem; that's simply too low-level.

I admit that it is a small patch, but I'm -1 on having this
applied.
msg51303 - (view) Author: Josiah Carlson (josiahcarlson) * (Python triager) Date: 2006-11-08 16:33
I disagree with user gustaebel. We should be adding
automatic truncate calls for all possible supported
platforms, in all places where it could make sense, be it
in tarfile, zipfile, or wherever we can. It would make sense
to write a function that can be called by all of those
modules so that there is only one place to update if/when
changes occur. If the function were not part of the public
Python API, then it wouldn't need to wait until 2.6, unless
it were considered a feature addition rather than a bugfix.
One would have to wait on a response from Martin or Anthony
to know which it was, though I couldn't say for sure whether
operations that are generally performance enhancing are
bugfixes or feature additions.
msg51304 - (view) Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2006-11-08 21:30
You both still fail to convince me and I still don't see the
need for action. The only case ATM where this addition makes
sense (in your opinion) is the Windows OS when using the
NTFS filesystem and certain conditions are met. NTFS has a
preallocation algorithm to deal with this. We don't know if
there is any advantage on FAT filesystems.

On Linux for example there is a plethora of supported
filesystems. Some of them may take advantage, others may
not. Who knows? We can't even detect which filesystem type
we are currently writing to. Apart from that, the behaviour
of truncate(arg) with arg > filesize seems to be
system-dependent.
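
One way to see what truncate(arg) with arg > filesize does on a given system is to probe it on a scratch file. The following is only a sketch; the function name truncate_extends and the probe size of 4096 bytes are arbitrary choices made for illustration.

```python
import os
import tempfile


def truncate_extends(directory=None, probe_size=4096):
    """Return True if truncate() past the end of a file grows it to that size here.

    `directory` selects where the scratch file is created, which matters
    because the result may differ between filesystems.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as scratch:
            scratch.write(b"abc")
            # Ask for a length larger than the three bytes written so far.
            scratch.truncate(probe_size)
        return os.path.getsize(path) == probe_size
    finally:
        os.remove(path)
```

The probe reports only the resulting file size. It says nothing about whether the filesystem reserved contiguous space for the file.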

So, IMO this is a very special optimization targeted at a
single platform. The TarFile class is easily subclassable,
just override the makefile() method and add the two lines of
code. I think that's what ActiveState's Python Cookbook is for.
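
A subclass along those lines might look like the sketch below. It is written against the Python 3 tarfile module, where makefile() takes (tarinfo, targetpath), and it assumes the attributes TarFile.fileobj, TarInfo.offset_data and TarInfo.sparse behave as in current CPython. The class name PreallocatingTarFile and the 64 KiB chunk size are made up for this example.

```python
import tarfile


class PreallocatingTarFile(tarfile.TarFile):
    """TarFile variant that sets the target file's length before writing it."""

    def makefile(self, tarinfo, targetpath):
        # Leave sparse members to the stock implementation.
        if tarinfo.sparse is not None:
            return super().makefile(tarinfo, targetpath)

        source = self.fileobj
        source.seek(tarinfo.offset_data)
        with open(targetpath, "wb") as target:
            # The added step: announce the final size up front.
            target.truncate(tarinfo.size)
            remaining = tarinfo.size
            while remaining > 0:
                chunk = source.read(min(64 * 1024, remaining))
                if not chunk:
                    raise tarfile.ReadError("unexpected end of data")
                target.write(chunk)
                remaining -= len(chunk)
```

Because TarFile.open() is a classmethod, PreallocatingTarFile.open("archive.tar") returns an instance of the subclass, and extract() or extractall() then go through the overridden makefile().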

BTW, I like my files to grow bit by bit. In case of an
error, I can detect if a file was not extracted completely
by comparing the file sizes. Furthermore, a file that grows
is more common and more what a programmer who uses this
module might expect.
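
The size comparison mentioned above can be sketched like this. The function name incomplete_members is hypothetical, and the check assumes members were extracted under dest_dir using their archive names.

```python
import os
import tarfile


def incomplete_members(archive_path, dest_dir):
    """Return names of regular-file members whose extracted size is wrong.

    A member counts as incomplete if its file is missing under `dest_dir`
    or if the size on disk differs from the size recorded in the archive.
    """
    mismatched = []
    with tarfile.open(archive_path) as tar:
        for member in tar.getmembers():
            if not member.isreg():
                continue
            target = os.path.join(dest_dir, member.name)
            if not os.path.isfile(target) or os.path.getsize(target) != member.size:
                mismatched.append(member.name)
    return mismatched
```

This check only works when files grow as they are written. If the full length is set with truncate() before the data arrives, a partially written file already has its final size and would pass.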
msg51305 - (view) Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2006-12-23 19:03
Any progress on this one?
msg51306 - (view) Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2007-01-22 19:08
Closed due to lack of interest.
History
Date User Action Args
2022-04-11 14:56:21 admin set github: 44180
2006-10-31 05:05:25 enochjul create