This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: [PATCH] mmap fails on AMD64
Type: Stage:
Components: Extension Modules Versions: Python 2.4
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: nnorwitz Nosy List: arigo, jepler, michaelurman, nnorwitz, piman
Priority: normal Keywords: patch

Created on 2005-11-24 22:22 by piman, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (17)
msg49105 - (view) Author: Joe Wreschnig (piman) Date: 2005-11-24 22:22
mmap_move_method is not 64-bit safe, and fails on the
following code:

from mmap import mmap
f = open('test.out', 'ab+')
f.write('ABCDEabcde')
f.flush()
m = mmap(f.fileno(), 10)
m.move(5, 0, 5)
m.read(10)

To fix this, mmapmodule.c:538 needs to use "LLL:move"
instead of "iii:move" (this also fixes #1921169, which
changed it from "iii" to "III").

This is in both Python 2.3 and 2.4.
msg49106 - (view) Author: Jeff Epler (jepler) Date: 2005-11-27 19:25
Logged In: YES 
user_id=2772

I haven't tested the patch, but the above testcase does fail
on my linux x86_64 machine with "ValueError: source or
destination out of range" on the call to m.move(), but
succeed on my linux x86 machine.
msg49107 - (view) Author: Joe Wreschnig (piman) Date: 2005-11-28 21:36
Logged In: YES 
user_id=796

This is causing data corruption. There is also a trivial
patch. There is no good reason not to fix it immediately.
msg49108 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2005-12-13 11:09
Logged In: YES 
user_id=4771

Could you do a complete review of the i/I/l/L issue in
mmapmodule.c?  There seems to be a fair amount of
inconsistencies between them and the declared types of
the variables -- for example, I don't think that assigning
to size_t variables is a good idea; PyArg_ParseTuple()
should only assign to int/unsigned int/long/unsigned long
variables.
msg49109 - (view) Author: Michael Urman (michaelurman) Date: 2005-12-14 17:11
Logged In: YES 
user_id=1404983

Hi Armin,
Joe finally convinced me to get the SF account myself. I've
done some further analysis on this already at
http://www.tortall.net/mu/blog/2005/12/01 and would be
willing to create relevant patches for whatever change
coverage level you would like to see.  I.e. just for
mmapmodule, just for glaring errors, or for creating
temporary local variables to ensure all format string
passing is okay in all ParseTuple instances known to be bad.
Please let me know what patch or patches you would like me
to create.
msg49110 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2005-12-14 17:40
Logged In: YES 
user_id=4771

Nice results, great work.  My humble opinion on this is that all known instances should be fixed.

The type size_t could be an int or a long depending on compiler/platform (or maybe even something else), so it should just not be used.  Any signed/unsigned conflict should also be investigated; most but not all seem harmless.

Of course, tests would be nice, but I don't completely see the point of spending too much efforts testing dark corners that we are pretty confident are getting fixed, when the code is under-tested to start with.

The Py_BuildValue() logs are more tedious to look through.  There are definitely some int-vs-long conflicts.  There are signed/unsigned conflicts, but some are intended, e.g. in binascii -- in these cases, though, an explicit cast would not hurt.

In any case if you embark on this clean-up, I volunteer to review and apply it (be sure to work on the subversion head).
msg49111 - (view) Author: Michael Urman (michaelurman) Date: 2005-12-14 19:11
Logged In: YES 
user_id=1404983

Alright, I'll work on creating a patch against HEAD. I'll
see what I can do about writing tests for this, but in
general I expect the only place to test it will be at the
higher level with samples like the one in this report; it's
certainly not worth writing ways to invasively check if a
ParseTuple the right values in most scenarios.

The BuildValue support from my script is very rough, but
I'll see what I can do. If you know of any easy to use C
parsers in Python, I wouldn't mind modifying the checker to
get better results with it.
msg49112 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2005-12-14 20:06
Logged In: YES 
user_id=4771

No, I'd also be interested in hearing about a decent reusable parser for C.  The only one I know about is gccxml, and it seems to be an incredible mess to compile and to be broken on every other gcc version.
msg49113 - (view) Author: Michael Urman (michaelurman) Date: 2005-12-15 05:38
Logged In: YES 
user_id=1404983

Issues I've run into, and would like pre-patch feedback:

Regarding signed/unsigned in general - is it python-code
safe to change a signed format to an unsigned? Reading
getargs.c it looks like bounds are only checked for the
signed formats. This could be either good or bad.

fcntlmodule.c: the reported char* type of arg on line 174 is
due to a shadow variable (already scoped-out) earlier in the
function. Is this worth changing?

xxmodule.c: the use of # with a long is preceded by a
comment about testing bad format characters. I'm unconvinced
whether this is what it meant.
msg49114 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2005-12-15 05:46
Logged In: YES 
user_id=33168

Michael, I'm not sure I understand the issues in detail,
however I will try to respond.  signed/unsigned issues are
probably used and checked inconsitently--be wary of them. 
There are potential issues when the high bit is set and
whether that means a long or unsigned value.  The meaning
has changed in some versions of python.

I would definitely like to see shadowed variables fixed.

If you don't believe a comment is correct, you should fix
the problem whatever it is.  So fix the comment and/or code
as you see fit.

If the patch gets too large, feel free to break it up into
several related components if that makes sense.

I'm just starting to read over your analysis.  Thanks for
looking into these issues.
msg49115 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2005-12-15 06:16
Logged In: YES 
user_id=33168

I read your blog and I entirely agree that we need more unit
tests.  The more you can provide the better.  I have an
AMD64 so can test it works before checkin.
msg49116 - (view) Author: Michael Urman (michaelurman) Date: 2005-12-15 06:43
Logged In: YES 
user_id=1404983

Neal, thanks for the additional encouragement. The issues
I'm worried about with regard to signed/unsigned are on the
user code side. In most cases it's a signed format string
matching to an unsigned variable, and often is for a
positive only value (count or length, or similar).

I doubt more than a couple python programs try to use values
large enough for it to be relevant (outside of semantically
questionable -1 markers), however before arbitrarily
changing format strings, I want to be sure such a -1 value
parsed with an unsigned format string will not start raising
ValueErrors - I want my patch to be a candidate for
backporting if desired. Better to leave well enough alone if
it's mostly safe.

There are also tons of unsigned char * variables passed for
string formats; since I don't know why they are explicitly
unsigned, and most of Python works as is, I'd rather just
leave those alone. It would suck to hit one of the special
cases for which it really needs to be an unsigned char *.
msg49117 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2005-12-15 07:04
Logged In: YES 
user_id=33168

Thanks for your comments.  It made a lot more sense after I
finished reading your blog.  

I think the signed-ness issues which relate to ("i",
unsigned int) or ("l", unsigned long) should be fixed.  I
agree with your concern about fixing and backporting, though
I don't know how best to address it.

I'm not concerned about unsigned vs signed char * for 's'
and 't' formats.  I don't think they are a big deal which we
seem to agree on.

I fixed the bltinmodule.c:1910 problem.  That wasn't really
an issue since the variable was larger and not actually
used.  The problem with xxmodule.c:207 is in error checking
code, so I'm not concerned about that.  (It's checking that
O# is invalid.)

Can you run your script on Objects too?  It would be great
if we got your script into the python tree too so we can
search for these things in the future.  (Ideally we would
have complete unittests, but I'm not holding my breath for
that to happen.)

For the record is your name Michael Urman?  I would like to
add you to the ACKS file.
msg49118 - (view) Author: Michael Urman (michaelurman) Date: 2005-12-15 07:40
Logged In: YES 
user_id=1404983

Objects/ is clean on the ParseTuple, and all I see on
BuildValue are due to limitations in my script. The script
itself is available at
http://www.tortall.net/mu/static/fmtcheck.py (linked via the
blog entry as well if it moves later), and if GPL terms are
useless, I'm willing to consider others. Regardless the
script is currently too false-positive happy for automated
use (i hope not false-negative happy).

I am indeed Michael Urman. I don't know how quickly I'll
have a final patch---perhaps by the end of the week---a lot
of it is mind-numbing detail work. Feel free to preemptively
obsolete parts of the patch as you see fit.
msg49119 - (view) Author: Michael Urman (michaelurman) Date: 2005-12-17 21:39
Logged In: YES 
user_id=1404983

I've got a patch for the ParseTuple cases, but I don't seem
to be able to attach it. For now it is available from
http://www.tortall.net/mu/static/fmtcheck_parsetuple_41738_diff?raw

It consists mostly of replacing signed formats with unsigned
formats, and adding parse_foo temporary variables for
parsing to a known size. There's also a couple cases where I
just change the variable type to match the format when I'm
fairly certain it's safe. I added the test on this ticket to
test_mmap.py but added no other tests.
msg49120 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2005-12-18 03:34
Logged In: YES 
user_id=33168

Michael, I checked in a variation of your patch.  The reason
you can't attach a file to this report is because SF is
annoying.  You can only attach files to reports you own.  So
it would be great if you can open a new report and attach
your patch against svn HEAD.  Thanks.

Committed revision 41748.

msg49121 - (view) Author: Jeff Epler (jepler) Date: 2005-12-18 15:22
Logged In: YES 
user_id=2772

Just did an svn update and built on my FC4/x86_64 system. 
The test code seems to now give the expected result instead
of an exception.

>>> m.read(10)
'ABCDEABCDE'
History
Date User Action Args
2022-04-11 14:56:14adminsetgithub: 42621
2005-11-24 22:22:05pimancreate