This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: re doesn't like (^$)*
Type: Stage:
Components: Regular Expressions Versions: Python 2.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: nnorwitz Nosy List: dalke, mkc, nnorwitz
Priority: normal Keywords:

Created on 2003-08-11 20:21 by dalke, last changed 2022-04-10 16:10 by admin. This issue is now closed.

Messages (3)
msg17720 - (view) Author: Andrew Dalke (dalke) * (Python committer) Date: 2003-08-11 20:21
Nor, for that matter, does it like "(^)*"

% python
Python 2.3 (#1, Aug  3 2003, 02:47:49) 
[GCC 3.1 20020420 (prerelease)] on darwin
>>> import re
>>> re.compile("(^$)*").match("")
Segmentation fault
%

It's trying real hard to match 0 characters an infinite
number of time.  :)

The segfault is caused in part by the low stacksize limit
on my OS X machine,

% limit stacksize
stacksize       512 kbytes
% limit stacksize 2000kbytes
% limit stacksize
stacksize       2000 kbytes
% python
Python 2.3 (#1, Aug  3 2003, 02:47:49) 
[GCC 3.1 20020420 (prerelease)] on darwin
Type "help", "copyright", "credits" or "license" for more 
information.
>>> import re
>>> re.compile("(^$)*").match("")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
RuntimeError: maximum recursion limit exceeded
>>> 

which suggests that the stack recursion limit
test for the re library is not the same as the one
used for the rest of Python.  (def f(): f() gives
me the expected recursion limit, and not a
segfault)

Seems like the bug could be in several places:
  - the compiler doesn't handle infinite loops of
      zero-character tests well (it could convert
      them to a finite-loop test)
  - the re matcher doesn't check that it's been
      in the same place several times without
      advancing any character positions
  - the re matcher doesn't use the same stack
     check used elsewhere in Python
  - the Mac stacksize default is too low for
     Python's 

BTW, checking pcre ...

>>> import pre
/usr/local/lib/python2.3/pre.py:94: DeprecationWarning: 
Please use the 're' module, not the 'pre' module
  DeprecationWarning)
>>> pre.compile("(^$)*").match("")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python2.3/pre.py", line 251, in compile
    code=pcre_compile(pattern, flags, groupindex)
pcre.error: ('operand of unlimited repeat could match the 
empty string', 4)
>>> 

which is true, but the pattern I used should (IMHO)
be allowed to match the empty string.
msg17721 - (view) Author: Mike Coleman (mkc) Date: 2004-07-30 15:04
Logged In: YES 
user_id=555

I was able to reproduce this under Linux (by setting the
stack limit to 512k) under Python 2.3.2.  The specific case
seems to be fixed in the current CVS head; I don't get
problems until I reduce the stack limit below 64k, at which
point the import of 're' fails.  So, to the degree that this
was caused by heavy use of the C stack, perhaps this has
been fixed.

As the submitter suggests, the limit being bumped into here
is not the Python recursion limit, but the underlying C
stack limit.  AFAIK, the Python recursion limit is not
checked within C modules (and I doubt it would be reasonable
to add this).

In principle it seems like it would be nice if Python could
throw a MemoryException when the C stack limit is exceeded,
but it's not clear how this could be done or whether it
would be worthwhile.  Guido is already on record (I think)
as being against longjmp, which seems like the only portable
way to implement it.  It might be possible to emit a short
diagnostic (e.g., 'stack limit exceeded') before aborting,
but this would entail adding some sigaltstack mechanism that
probably wouldn't be portable to non-POSIX and might well
make debugging real SIGSEGV's or runaway C recursion more
difficult.

My suggestion would be to close this bug, perhaps after
adding a bit of documentation regarding how Python handles
this case (i.e., it relies on the default behavior, like 99%
of other programs).  If there's a desire to add
functionality to handle the stack limit in a different way,
that could be proposed on python-dev or as an RFE.
msg17722 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2005-10-02 05:59
Logged In: YES 
user_id=33168

I can't reproduce this and there have been changes to re
that probably would have improved the situation.  Please
reopen if there is evidence to the contrary.
History
Date User Action Args
2022-04-10 16:10:35adminsetgithub: 39049
2003-08-11 20:21:43dalkecreate