classification
Title: Inconsistent behaviour in re grouping
Components: Regular Expressions
process
Status: closed
Resolution: accepted
Assigned To: effbot
Nosy List: effbot, glchapman, mbrierst, pedro_rodriguez, tim.peters
Priority: normal

Created on 2002-07-01 18:22 by pedro_rodriguez, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Files
File name: re_pb.py
Uploaded: pedro_rodriguez, 2002-07-01 18:22
Messages (7)
msg11418 - (view) Author: Pedro Rodriguez (pedro_rodriguez) Date: 2002-07-01 18:22
The expressions (?P<name>.*) and (?P<name>(.*)) don't
behave in the same way.

When the group does not take part in the match, the first
form yields None, but the second yields an empty string.

The problem occurs with Python 2.1.1 and 2.2 (and the
latest CVS for 2.3).

Python 1.5.2, on the other hand, works fine.


(example file attached)
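
A minimal sketch of the kind of comparison described above, written for Python 3. The attached re_pb.py is not reproduced here, so the two patterns below are assumptions: they wrap the reported named groups in an optional group so that the named group takes no part in the match. On an interpreter that includes the eventual fix, both forms should report None.

```python
import re

# Hypothetical patterns built around the two reported forms;
# they are NOT taken from the attached re_pb.py.
plain = re.compile(r"((?P<name>.*)x)?(y)")
nested = re.compile(r"((?P<name>(.*))x)?(y)")

for pat in (plain, nested):
    m = pat.match("y")
    # The optional group is skipped, so "name" never participates.
    print(pat.pattern, "->", repr(m.group("name")))
```

If the two lines printed differ, the interpreter shows the inconsistency reported here.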
msg11419 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-07-01 18:36
Here's a simpler example:

import re
pat1 = re.compile(r"((.*)x)?(y)")
pat2 = re.compile(r"(((.*))x)?(y)")
print pat1.match('y').groups()
print pat2.match('y').groups()

That prints

(None, None, 'y')
(None, '', None, 'y')

If (y) in the regexps is changed to plain y:

pat1 = re.compile(r"((.*)x)?y")
pat2 = re.compile(r"(((.*))x)?y")
print pat1.match('y').groups()
print pat2.match('y').groups()

the output changes to (the expected):

(None, None)
(None, None, None)

So it's not *just* the extra level of parens -- whether there's 
a capturing group "to the right" also affects the outcome.

FWIW, I agree it's a buglet.
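
For anyone re-running this on a current interpreter, here is a small sketch in Python 3 syntax that checks the four patterns above against the consistent results (unmatched groups reported as None). The names CASES and check_groups are made up for this example, and the expected tuples assume an interpreter that includes the fix.

```python
import re

# Patterns copied from the message above, paired with the
# consistent result: every group that did not participate is None.
CASES = [
    (r"((.*)x)?(y)", (None, None, "y")),
    (r"(((.*))x)?(y)", (None, None, None, "y")),
    (r"((.*)x)?y", (None, None)),
    (r"(((.*))x)?y", (None, None, None)),
]


def check_groups(cases=CASES, text="y"):
    """Return the (pattern, actual, expected) triples that disagree."""
    mismatches = []
    for pattern, expected in cases:
        actual = re.match(pattern, text).groups()
        if actual != expected:
            mismatches.append((pattern, actual, expected))
    return mismatches


print(check_groups())
```

An empty list means all four patterns behave consistently; any entry names the pattern that still shows the buglet.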
msg11420 - (view) Author: Greg Chapman (glchapman) Date: 2002-07-07 16:09
I believe this is another example of the bug at which Patch 
527371 was aimed.  With that patch applied to the 2.2.1 
_sre.c, I get this:

>>> pat2 = re.compile(r"(((.*))x)?(y)")
>>> print pat2.match('y').groups()
(None, None, None, 'y')

I see that the patch is marked as accepted, but it does not 
yet appear to have made it into _sre.c even in CVS (at least 
it's not in version 2.80).  Perhaps this is an oversight?
msg11421 - (view) Author: Pedro Rodriguez (pedro_rodriguez) Date: 2002-07-09 07:20
Greg,

I tried your patch and it fixes the problems as Tim and
I reported them.

If the 'memset' operation should be performed each time the
'lastmark' field is set to a value lower than the current
one, then your patch makes sense IMO, and is consistent
with other places in the code.
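
To make the 'lastmark' point concrete, here is a toy model in Python. It is not the code in _sre.c and not the patch itself: the class ToyMarks, its methods, and the sequence of mark writes in run() are all invented for illustration. It only shows why lowering 'lastmark' without clearing the abandoned slots can let a stale group value resurface once a later group raises 'lastmark' again.

```python
class ToyMarks:
    """Toy stand-in for capture-mark bookkeeping (hypothetical)."""

    def __init__(self, ngroups):
        # Two slots per group: start offset and end offset.
        self.mark = [None] * (2 * ngroups)
        self.lastmark = -1  # highest slot index considered valid

    def set_mark(self, i, pos):
        self.mark[i] = pos
        self.lastmark = max(self.lastmark, i)

    def restore_lastmark(self, saved, clear):
        # Lowering lastmark abandons the slots above `saved`; clearing
        # them plays the role of the memset discussed above.
        if clear and saved < self.lastmark:
            for j in range(saved + 1, self.lastmark + 1):
                self.mark[j] = None
        self.lastmark = saved

    def groups(self, text):
        out = []
        for g in range(len(self.mark) // 2):
            start, end = self.mark[2 * g], self.mark[2 * g + 1]
            valid = (2 * g + 1 <= self.lastmark
                     and start is not None and end is not None)
            out.append(text[start:end] if valid else None)
        return tuple(out)


def run(clear):
    state = ToyMarks(ngroups=4)
    saved = state.lastmark
    # Invented trace: a failed optional branch leaves group 2 spanning "".
    state.set_mark(2, 0)
    state.set_mark(3, 0)
    state.restore_lastmark(saved, clear)
    # The trailing group then matches "y", raising lastmark past the
    # slots that the failed branch wrote.
    state.set_mark(6, 0)
    state.set_mark(7, 1)
    return state.groups("y")


print(run(clear=False))  # (None, '', None, 'y')
print(run(clear=True))   # (None, None, None, 'y')
```

Without the clearing step the toy reproduces the shape of the output Tim reported; with it, the abandoned group reads as None.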
msg11422 - (view) Author: Fredrik Lundh (effbot) * (Python committer) Date: 2002-07-12 11:09
Makes sense to me.  I'll check it in soonish.

Thanks /F
msg11423 - (view) Author: Michael Stone (mbrierst) Date: 2003-02-05 01:07
This was fixed when patch 527371 was checked in.  Someone close this bug.
msg11424 - (view) Author: Fredrik Lundh (effbot) * (Python committer) Date: 2003-02-05 08:06
Done.  Thanks /F
History
Date User Action Args
2022-04-10 16:05:28 admin set github: 36834
2002-07-01 18:22:41 pedro_rodriguez create