This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Fix for sre bug 470582
Type: Stage:
Components: Extension Modules Versions:
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: effbot Nosy List: effbot, glchapman, niemeyer, nnorwitz, tim.peters
Priority: release blocker Keywords: patch

Created on 2002-03-08 13:14 by glchapman, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
_sre.c.diff glchapman, 2002-03-08 13:14
re_tests.py.diff glchapman, 2002-03-08 13:20
Messages (9)
msg39186 - (view) Author: Greg Chapman (glchapman) Date: 2002-03-08 13:14
Bug report 470582 points out that nested groups can 
produces matches in sre even if the groups within 
which they are nested do not match:

>>> m = sre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d)
$", "34.123")
>>> m.groups()
(None, '3', '34', '123')
>>> m = pre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d)
$", "34.123")
>>> m.groups()
(None, None, '34', '123')

I believe this is because in the handling of 
SRE_OP_MAX_UNTIL, state->lastmark is being reduced 
(after "((\d)\:)" fails) without NULLing out the now-
invalid entries at the end of the state->mark array.  
In the other two cases where state->lastmark is 
reduced (specifically in SRE_OP_BRANCH and 
SRE_OP_REPEAT_ONE) memset is used to NULL out the 
entries at the end of the array.  The attached patch 
does the same thing for the SRE_OP_MAX_UNTIL case.  
This fixes the above case and does not break anything 
in test_re.py.
msg39187 - (view) Author: Greg Chapman (glchapman) Date: 2002-03-08 13:20
Logged In: YES 
user_id=86307

I forgot: here's a patch for re_tests.py which adds the 
case from the bug report as a test.
msg39188 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2002-03-08 13:29
Logged In: YES 
user_id=33168

Confirmed that the test w/o fix fails
and the test passes with the fix to _sre.c.

But I'm not sure if the memset can go too far:

  memset(state->mark + lastmark + 1, 0, 
         (state->lastmark - lastmark) * sizeof(void*));

I can try under purify, but that doesn't guarantee anything.
msg39189 - (view) Author: Greg Chapman (glchapman) Date: 2002-03-08 15:23
Logged In: YES 
user_id=86307

I'm pretty sure the memset is correct; state->lastmark is 
the index of last mark written to (not the index of the 
next potential write).

Also, it occurred to me that there is another related error 
here:

>>> m = sre.search(r'^((\d)\:)?\d\d\.\d\d\d$', '34.123')
>>> m.groups()
(None, None)
>>> m.lastindex
2

In other words, lastindex claims that group 2 was the last 
that matched, even though it didn't really match.  Since 
lastindex is undocumented, this probably doesn't matter too 
much.  Still, it probably should be reset if it is pointing 
to a group which gets "unmatched" when state->lastmark is 
reduced.  Perhaps a function like the following should be 
added for use in the three places where state->lastmark is 
reset to a previous value:

void lastmark_restore(SRE_STATE *state, int lastmark)
{
    assert(lastmark >= 0);
    if (state->lastmark > lastmark) {
        int lastvalidindex = 
            (lastmark == 0) ? -1 : (lastmark-1)/2+1;
        if (state->lastindex > lastvalidindex)
            state->lastindex = lastvalidindex;
        memset(
            state->mark + lastmark + 1, 0,
            (state->lastmark - lastmark) * sizeof(void*)
        );
    }
    state->lastmark = lastmark;
}
 
msg39190 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-03-08 18:28
Logged In: YES 
user_id=31435

Assigned to /F -- he's the expert here.
msg39191 - (view) Author: Fredrik Lundh (effbot) * (Python committer) Date: 2002-07-12 11:11
Logged In: YES 
user_id=38376

(bumped priority as a reminder to self) /F
msg39192 - (view) Author: Greg Chapman (glchapman) Date: 2002-08-12 21:17
Logged In: YES 
user_id=86307

I noticed recently that the lastindex attribute of match objects 
is now documented, so I believe that the lastindex problem I 
described in my March 8 posting needs to be fixed.  Simply, 
lastindex may claim that a group matched when in fact it 
didn't (because lastindex does not get updated when 
lastmark is reset to a lower value):

>>> m = sre.match('(\d)?\d\d', '12')
>>> m.groups()
(None,)
>>> m.lastindex
1
msg39193 - (view) Author: Greg Chapman (glchapman) Date: 2002-10-04 16:51
Logged In: YES 
user_id=86307

Assuming this patch is acceptable (I see it has not yet been 
applied to _sre.c), I wonder if it would be a good candidate for a 
backport to 2.2.2?  (Though it still lacks a fix for the lastindex 
problem.)
msg39194 - (view) Author: Gustavo Niemeyer (niemeyer) * (Python committer) Date: 2002-11-06 14:24
Logged In: YES 
user_id=7887

Applied as:

Lib/test/re_tests.py:1.30->1.31 
Lib/test/test_sre.py:1.37->1.38 
Misc/NEWS:1.511->1.512 
Modules/_sre.c:2.83->2.84 

Thank you very much!
History
Date User Action Args
2022-04-10 16:05:04adminsetgithub: 36223
2002-03-08 13:14:04glchapmancreate