This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Concatenation on a long string breaks
Type: Stage:
Components: Interpreter Core Versions: Python 2.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: arigo Nosy List: arigo, exarkun, mwh, nnorwitz, tim.peters
Priority: release blocker Keywords:

Created on 2006-07-21 17:18 by exarkun, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
big-str.diff nnorwitz, 2006-07-25 03:46 v1
big-str-2.diff arigo, 2006-07-25 18:25 Fix namespace corruption problem too.
Messages (16)
msg29234 - (view) Author: Jean-Paul Calderone (exarkun) * (Python committer) Date: 2006-07-21 17:18
Consider this transcript:

exarkun@charm:~/Projects/python/trunk$ ./python
Python 2.5b2 (trunk:50698, Jul 18 2006, 10:08:36)
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for
more information.
>>> x = 'x' * (2 ** 31 - 1)
>>> x = x + 'x'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: Objects/stringobject.c:4103: bad argument
to internal function
>>> len(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
>>> 


I would expect some exception other than SystemError
and for the locals namespace to not become corrupted.
msg29235 - (view) Author: Michael Hudson (mwh) (Python committer) Date: 2006-07-22 09:00
Logged In: YES 
user_id=6656

Confirmed with 2.4.  Ouch.
msg29236 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2006-07-22 14:11
Logged In: YES 
user_id=31435

Part of the problem appears to be that ceval.c's
string_concatenate() doesn't check for overflow in the
second argument here:

if (_PyString_Resize(&v, v_len + w_len) != 0)

The Windows malloc on my box returns NULL (and so Python
raises MemoryError) on the initial:

x = 'x' * (2 ** 31 - 1)

attempt, so I never get that far.  I'm afraid this is
muddier in strange ways on Linux because the Linux malloc()
is pathologically optimistic (can return a non-NULL value
pointing at "memory" that can't actually be used for anything).
msg29237 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2006-07-25 03:46
Logged In: YES 
user_id=33168

Attached a patch against 2.5.  The patch should work against
2.4 too, but you will need to change all Py_ssize_t to int.
 Tim since we know both lens are >= 0, is this test
sufficient for verifying overflow:

      Py_ssize_t new_len = v_len + w_len;
      if (new_len < 0) {

Jp, can you test the patch?
msg29238 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2006-07-25 03:57
Logged In: YES 
user_id=31435

[Neal]
>  Tim since we know both lens are >= 0, is this test
> sufficient for verifying overflow:
>
>     Py_ssize_t new_len = v_len + w_len;
>     if (new_len < 0) {

Right!  That's all the new checking needed in
string_concatenate().
msg29239 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2006-07-25 18:25
Logged In: YES 
user_id=4771

The check should be done before the variable is removed
from the namespace, so that 'x' doesn't just disappear.
Attached a patch that does this.  Also, let's produce
an exception consistent with what PyString_Concat()
raises in the same case, which is an OverflowError
instead of a MemoryError.
msg29240 - (view) Author: Jean-Paul Calderone (exarkun) * (Python committer) Date: 2006-07-25 20:06
Logged In: YES 
user_id=366566

Tested Armin's patch, looks like it addresses the issue. 
One thing I noticed though, ('x' * (2 ** 32 - 2) + 'x')
fails the same way ('x' * (2 ** 32 - 1) + 'x').  Don't
really understand why, just thought I'd mention it.
msg29241 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2006-07-26 04:19
Logged In: YES 
user_id=33168

Armin, can you check in your patch and backport?  Also a
news entry and a test in test_bigmem would be great. 
Thanks.  If not let me know (assign to me) and I'll finish
this up.

This fix should be backported as well.
msg29242 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2006-07-26 09:01
Logged In: YES 
user_id=4771

I'm unsure about how the bigmem tests should be used.
I think that I am not supposed to set a >2G limit on
a 32-bit machine, right?  When I do, I get 9 failures:
8 OverflowErrors and a stranger AssertionError in
test_hash.  I think that these tests are meant to
test the int/Py_ssize_t difference on 64-bit 
machines instead.  The bug this tracker is about
only shows up on 32-bit machines.

My concern is that if we add a test for the current
bug in test_bigmem, then with a memory limit < 2GB
the test is essentially skipped, and with a memory
limit > 2GB then 9 other tests fail anyway.
msg29243 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2006-07-26 15:50
Logged In: YES 
user_id=33168

You're correct that bigmem is primarily for testing
int/Py_ssize_t.  But it doesn't have to be.  It has support
for machines with largish amounts of memory (and limiting
test runs).  I didn't know where else to put such a test.  I
agree that this bug would only occur on 32-bit platforms. 
Most machines can't run it, so about the only other option I
can think of would be to put it in it's own file and add a
-u option.  That seemed like even more work.

I'm not tied to bigmem at all, but it would be nice to have
a test for this somewhere.  I'm sure there are a bunch of
other places we have this sort of overflow and it would be
good to test them somewhere.  Do whatever you think is best.
msg29244 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2006-07-27 08:38
Logged In: YES 
user_id=4771

We could reuse the --memlimit option of regrtest in the
following way:

At the moment it makes no sense to specify a --memlimit
larger than Py_ssize_t, like 3GB on 32-bit systems.  At
least test_bigmem fails completely in this case.  From this
it seems that the --memlimit actually tells, more precisely,
how much of its *address space* the Python test process is
allowed to consume.  So the value should be clamped to a
maximum of MAX(Py_ssize_t).  This would solve the current
test_bigmem issue.

If we do so, then the condition "--memlimit >=
MAX(Py_ssize_t)" is precisely what should be checked to know
if we can run the test for the bug in the present tracker,
and other tests of the same kind, which check what occurs
when the *address space* is exhausted.

In this way, specifying --memlimit=3G would enable either
test_bigmem (on 64-bit systems) or some new
test_filladdressspace (on 32-bit systems), as appropriate.

Sounds reasonable?
msg29245 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2006-07-27 08:48
Logged In: YES 
user_id=4771

Almost missed kuran's note.  Kuran: I suppose you meant to
use 2**31 instead of 2**32, but you've found another
important bug:

>>> s = 'x' * (2**32-2)
>>> N = len(s)
>>> N
2147483647
>>> 2**32
4294967296L

Argh!  Another check is missing somewhere.
msg29246 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2006-08-04 05:27
Logged In: YES 
user_id=33168

Armin, yes that sounds reasonable.  Please checkin as soon
as possible now that the trunk is not frozen.
msg29247 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2006-08-08 09:24
Logged In: YES 
user_id=4771

I was away.  I will try to get around to it before
release candidate one.
msg29248 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2006-08-09 15:39
Logged In: YES 
user_id=4771

Committed in rev 51178.

Closing this report; the repetition problem is in
another tracker and is mentioned in PEP 356
(the 2.5 release schedule).
msg29249 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2006-08-09 15:42
Logged In: YES 
user_id=4771

Committed in rev 51178.

Closing this report; the repetition problem is in
another tracker and is mentioned in PEP 356
(the 2.5 release schedule).
History
Date User Action Args
2022-04-11 14:56:19adminsetgithub: 43706
2006-07-21 17:18:58exarkuncreate