msg60709 - (view) |
Author: Adam Olsen (Rhamphoryncus) |
Date: 2005-04-01 04:48 |
$ python -c 'open("/dev/zero").read()'
Traceback (most recent call last):
File "<string>", line 1, in ?
SystemError: ../Objects/stringobject.c:3316: bad
argument to internal function
Compare with these two variants:
$ python -c 'open("/dev/zero").read(2**31-1)'
Traceback (most recent call last):
File "<string>", line 1, in ?
MemoryError
$ python -c 'open("/dev/zero").read(2**31)'
Traceback (most recent call last):
File "<string>", line 1, in ?
OverflowError: long int too large to convert to int
The unsized read should produce either MemoryError or
OverflowError instead of SystemError.
Tested with Python 2.2, 2.3, and 2.4.
|
msg60710 - (view) |
Author: Armin Rigo (arigo) * |
Date: 2005-04-01 09:58 |
I think that file.read() with no argument needs to be more conservative. Currently it asks and trusts the stat() to get the file size, but this can lead to just plain wrong results on special devices. (I had the problem that open('/dev/null').read() would give a MemoryError!)
We can argue whether plain read() on special devices is a good idea or not, but I guess that not blindly trusting stat() if it returns huge values could be a good idea.
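Armin's "don't blindly trust stat()" idea can be sketched in Python (illustrative only; the real fix would live in C in fileobject.c, and the 1 MiB cap and helper name here are arbitrary assumptions, not CPython's actual values):

```python
import os

# Hypothetical cap on how much of st_size we are willing to believe.
MAX_INITIAL_BUFFER = 1 << 20  # 1 MiB

def initial_read_buffer_size(fd):
    """Pick a starting buffer size for an unsized read without
    blindly trusting st_size, which is meaningless for devices."""
    try:
        size = os.fstat(fd).st_size
    except OSError:
        return 8192
    # Clamp zero or suspiciously huge values to a small default;
    # the read loop can always grow the buffer later.
    if size <= 0 or size > MAX_INITIAL_BUFFER:
        return 8192
    return size
```

A regular file of known size still gets its exact size; a device reporting 0 or garbage falls back to a small chunk.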
|
msg60711 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2005-04-01 21:42 |
I think it should trust the stat result, and then find that
it cannot allocate that much memory.
Actually, os.stat("/dev/zero").st_size is 0, so something
else must be going on.
|
msg60712 - (view) |
Author: Armin Rigo (arigo) * |
Date: 2005-04-02 12:31 |
os.stat() doesn't always give consistent results on device files. On my machine, for some reason, os.stat('/dev/null').st_size appears to be random (and extremely large). I suspect that on the OP's machine os.stat('/dev/zero').st_size is not 0 either, but a random number that turns out to be negative, hence the "bad argument" SystemError.
|
msg60713 - (view) |
Author: Sean Reifschneider (jafo) * |
Date: 2005-04-06 03:39 |
I am able to reproduce this on a Fedora Core 3 Linux system:
>>> fp = open('/dev/zero', 'rb')
>>> d = fp.read()
Traceback (most recent call last):
File "<stdin>", line 1, in ?
MemoryError
>>> print os.stat('/dev/zero').st_size
0
What about only trusting st_size if the file is a regular
file, not a directory or other type of special file?
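That check can be sketched in Python for illustration (the actual change would go in new_buffersize in fileobject.c; the helper name is hypothetical):

```python
import os
import stat

def trustworthy_size(path):
    """Return st_size only for regular files.

    POSIX leaves st_size undefined for character devices such as
    /dev/zero, so None here means "don't preallocate from st_size,
    just read in chunks"."""
    st = os.stat(path)
    if stat.S_ISREG(st.st_mode):
        return st.st_size
    return None
```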
Sean
|
msg60714 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2005-04-06 06:40 |
The problem is different. Instead, _PyString_Resize
complains that the new buffer size of the string is negative.
This in turn happens because the string manages to grow
larger than 2GB, which happens because buffersize is a
size_t, yet _PyString_Resize expects an int.
I don't know how Linux manages to allocate such a large
string without thrashing.
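The signed/unsigned mix-up can be simulated from Python (a sketch of the C behaviour on a platform with a 32-bit int; the helper is hypothetical, not a CPython API):

```python
def as_c_int32(value):
    """Reinterpret the low 32 bits of `value` as a signed C int,
    mimicking the implicit size_t -> int narrowing in the old
    _PyString_Resize call path."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value

# A buffer that has grown just past 2 GiB turns negative once
# narrowed, which is why _PyString_Resize sees a "bad argument".
```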
There is a minor confusion with stat() as well:
new_buffersize tries to find out how many bytes are left before
the end of the file. In the case of /dev/zero, both fstat
and lseek are "lying" by returning 0. As lseek returns 0,
ftell is invoked and returns non-zero. Then, new_buffersize does
not trust the values, and just adds BIGCHUNK.
|
msg60715 - (view) |
Author: Sean Reifschneider (jafo) * |
Date: 2005-04-06 06:52 |
Linux can do a very fast allocation if it has swap
available. It reserves space, but does not actually assign
the memory until you try to use it. In my case, I have 1GB
of RAM, around 700MB free, and another 2GB in swap. So, I
have plenty unless I use it. In C I can malloc 1GB and
unless I write every page in that block the system doesn't
really give the pages to the process.
|
msg60716 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2005-04-06 07:06 |
The surprising aspect is that the memory *is* being used.
Python allocates 2GB of memory, and then passes this to
read(2) (through stdio) to fill it with the contents of
/dev/zero. This should cause a write operation to the memory
pages, which in turn should cause them to consume actual
memory. For some reason, this doesn't happen.
|
msg60717 - (view) |
Author: Sean Reifschneider (jafo) * |
Date: 2005-04-06 07:17 |
I'm quite sure that the 2GB is not getting filled when doing
this. After running the first command, and checking
/proc/meminfo, I see that only 46MB is shown as free, which
means that there was no more than this amount of RAM consumed.
|
msg60718 - (view) |
Author: Jeff Epler (jepler) |
Date: 2005-04-07 14:33 |
To my surprise, the st_size field *is* undefined for
devices, according to
http://www.opengroup.org/onlinepubs/009695399/basedefs/sys/stat.h.html
My Linux consistently gives 0 for device nodes, but I guess
other Unixes might not.
jafo, loewis: read() from /dev/zero is special-cased in
Linux. In 2.4.20-8, in the function read_zero_pagealigned, a
comment says "for private mappings, just map in zero pages",
which is very fast and does not actually write any memory or
blocks of swap.
|
msg62577 - (view) |
Author: A.M. Kuchling (akuchling) * |
Date: 2008-02-20 00:01 |
This seems like an easy fix: just check that S_ISREG(st_mode) is true in
new_buffersize in fileobject.c.
|
msg62781 - (view) |
Author: David Christian (dugan) |
Date: 2008-02-23 17:29 |
I don't see any change to the return value of new_buffersize that could
alleviate this problem - the problem being that because Linux is
extremely efficient at reading bytes from /dev/zero, some other code
inconsistencies are exposed.
The problem that is being hit is that the new_buffersize value is
allowed to grow without bounds and is never rechecked for sanity, then
is passed in to PyString_Resize where it is converted from unsigned int
to signed int.
I suggest adding a check of new_buffersize against PY_SSIZE_T_MAX. If
it is exceeded, we could raise an OverflowError - "unbounded read consumed
more bytes than a Python string can hold"
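David's proposal can be sketched in pure Python (a stand-in for the C-level PY_SSIZE_T_MAX check; the function name, chunk size, and error message are illustrative):

```python
import io
import sys

def bounded_read_all(f, limit=sys.maxsize):
    """Read from `f` until EOF, but refuse to accumulate more
    than `limit` bytes, raising OverflowError instead of letting
    the buffer grow without bound."""
    chunks = []
    total = 0
    while True:
        chunk = f.read(65536)
        if not chunk:
            return b"".join(chunks)
        total += len(chunk)
        if total > limit:
            raise OverflowError(
                "unbounded read consumed more bytes than allowed")
        chunks.append(chunk)
```

With a sane limit, reading an endless device fails cleanly with OverflowError instead of a SystemError or a crash.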
|
msg62793 - (view) |
Author: David Christian (dugan) |
Date: 2008-02-23 18:26 |
Raise OverflowError if unbounded read() exceeds PY_SSIZE_T_MAX bytes.
|
msg66558 - (view) |
Author: Dustin J. Mitchell (djmitche) * |
Date: 2008-05-10 18:43 |
While it is a sensible fix for the signed/unsigned problem, David's patch
still fails regrtest on my system (amd64), after OOM-killing random other
processes :(
Andrew's suggestion makes a lot of sense here. Does it make sense for
new_buffersize to return (PY_SSIZE_T_MAX+1) in this case, to trigger an
overflow (in addition to David's patch)?
|
msg66561 - (view) |
Author: Dustin J. Mitchell (djmitche) * |
Date: 2008-05-10 18:53 |
Improved fix; this passes test_file on my system.
|
msg66569 - (view) |
Author: Adam Olsen (Rhamphoryncus) |
Date: 2008-05-10 19:27 |
new_buffersize returns a size_t. You should use SIZE_MAX instead
(although I don't see it used elsewhere in CPython, so there may be
portability problems).
The call to _PyString_Resize implicitly casts the size_t to Py_ssize_t.
The check against PY_SSIZE_T_MAX does make this safe, but a comment
would make it more obvious.
The latest patch uses spaces for indentation, which don't match up with
the existing tabs.
|
msg66578 - (view) |
Author: Dustin J. Mitchell (djmitche) * |
Date: 2008-05-10 19:59 |
Thanks, Adam -- requested changes made
|
msg66581 - (view) |
Author: Adam Olsen (Rhamphoryncus) |
Date: 2008-05-10 20:16 |
The indentation still needs tweaking. You have only one tab where you
should have two, and one line uses a mix of tabs and spaces.
|
msg66593 - (view) |
Author: Dustin J. Mitchell (djmitche) * |
Date: 2008-05-11 00:28 |
Ack, sorry. My 'vi' settings should now be correct.
|
msg84296 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2009-03-28 04:03 |
I applied the patch against the trunk, and 'make' failed:
....
File "/home/rdmurray/python/Issue1174606/Lib/platform.py", line 932,
in _syscmd_uname
output = string.strip(f.read())
OverflowError: requested number of bytes is more than a Python string
can hold
I can confirm that the issue still exists on the trunk.
Py3k doesn't benefit from the Linux /dev/zero optimization because it
has its own I/O layer, so it takes a _lot_ longer to get to the
failure point...which is more pathological than the py2 behavior:
rdmurray@partner:~/python/py3k>./python -c 'open("/dev/zero").read()'
zsh: segmentation fault ./python -c 'open("/dev/zero").read()'
Which I think makes this a 'crash' bug on py3k.
|
msg84346 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2009-03-28 23:05 |
Antoine, I added you to the nosy list for this because it turns out the
new io.c segfaults in this case.
|
msg84349 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-03-28 23:29 |
The reason py3k takes a lot longer to crash is a slight bug in
_fileio.c :-) I'm going to correct this one first.
|
msg84351 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-03-29 00:46 |
A fix for py3k was committed in r70664.
|
msg84363 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2009-03-29 03:02 |
Antoine, since your fix also goes into the io module in 2.6, and the
buggish behavior even of the old code is more cosmetic than problematic,
I'm thinking we can just close this as accepted. Do you concur?
|
msg84380 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2009-03-29 12:01 |
Well, it should be backported to 2.6 first, then.
(which is not necessarily trivial since a lot of bug fixes in 3.0/3.1
weren't backported to the 2.6/2.7 "io" module, AFAIK)
|
msg112205 - (view) |
Author: Georg Brandl (georg.brandl) * |
Date: 2010-07-31 21:09 |
Don't think this qualifies for 2.6 anymore.
|
Date | User | Action | Args
2022-04-11 14:56:10 | admin | set | github: 41784
2010-07-31 21:09:12 | georg.brandl | set | status: open -> closed; nosy: + georg.brandl; messages: + msg112205
2009-03-29 12:01:44 | pitrou | set | resolution: accepted; messages: + msg84380; stage: needs patch -> resolved
2009-03-29 03:02:07 | r.david.murray | set | messages: + msg84363
2009-03-29 00:46:18 | pitrou | set | messages: + msg84351; versions: - Python 3.1
2009-03-28 23:29:08 | pitrou | set | messages: + msg84349
2009-03-28 23:05:13 | r.david.murray | set | keywords: - patch; nosy: + pitrou; messages: + msg84346
2009-03-28 04:03:27 | r.david.murray | set | versions: + Python 2.6, Python 3.1, Python 2.7; nosy: + r.david.murray, benjamin.peterson; messages: + msg84296; type: crash; stage: needs patch
2008-05-11 00:28:39 | djmitche | set | files: + 1174606-3.patch; messages: + msg66593
2008-05-10 20:16:52 | Rhamphoryncus | set | messages: + msg66581
2008-05-10 19:59:46 | djmitche | set | files: + 1174606-2.patch; messages: + msg66578
2008-05-10 19:27:21 | Rhamphoryncus | set | nosy: + Rhamphoryncus; messages: + msg66569
2008-05-10 18:53:57 | djmitche | set | files: + 1174606.patch; keywords: + patch; messages: + msg66561
2008-05-10 18:43:15 | djmitche | set | nosy: + djmitche; messages: + msg66558
2008-02-23 18:40:21 | dugan | set | files: + issue1174606.diff
2008-02-23 18:26:36 | dugan | set | files: + issue1174606.diff; messages: + msg62793
2008-02-23 17:29:31 | dugan | set | nosy: + dugan; messages: + msg62781
2008-02-20 00:01:41 | akuchling | set | keywords: + easy; nosy: + akuchling; messages: + msg62577
2005-04-01 04:48:46 | rhamphoryncus.historic | create |