
classification
Title: replace groups doesn't work in this special case
Type: Stage:
Components: Regular Expressions Versions: Python 2.4
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: niemeyer Nosy List: georg.brandl, niemeyer, tomek74
Priority: normal Keywords:

Created on 2006-11-06 11:49 by tomek74, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (5)
msg30467 - (view) Author: Thomas K. (tomek74) Date: 2006-11-06 11:49
If you have a regular expression like this:
([0-9])([a-z])?
matching this string:
1 1a
and replacing with this:
yx
you get what is expected:
yx yx

BUT:
If you replace with this:
\1\2
you get nothing replaced, because the group \2 
doesn't exist for the pattern "1".
But it does exist for the pattern "1a"!

We have multiple possibilities here:
1.) The string "1" gives no result, because \2 
doesn't exist. The string "1a" gives a result, so the 
output should be: 1a
2.) The string "1" gives a result, because \2 is 
handled like an empty string. The string "1a" gives a 
result, so the output should be: 1 1a


I think the case where the string "1" has no result, 
but affects the string "1a", which would normally have a 
result, is bad.

What are your thoughts on it?


Test code:
import re

# common variables

rawstr = r"""([0-9])([a-z])?"""
embedded_rawstr = r"""([0-9])([a-z])?"""
matchstr = """1 1a"""

# method 1: using a compile object
compile_obj = re.compile(rawstr)
match_obj = compile_obj.search(matchstr)

# method 2: using search function (w/ external flags)
match_obj = re.search(rawstr, matchstr)

# method 3: using search function (w/ embedded flags)
match_obj = re.search(embedded_rawstr, matchstr)

# Retrieve group(s) from match_obj
all_groups = match_obj.groups()

# Retrieve group(s) by index
group_1 = match_obj.group(1)
group_2 = match_obj.group(2)

# Replace string
newstr = compile_obj.subn('\1\2', 0)
msg30468 - (view) Author: Gustavo Niemeyer (niemeyer) * (Python committer) Date: 2006-11-06 12:17

Hello Thomas,

I don't understand exactly what you mean here.

This doesn't work:

  >>> re.compile("([0-9])([a-z])?").subn(r"<\1\2>", "1 1a")
  Traceback (most recent call last):
  ...
  sre_constants.error: unmatched group

And this works fine:

  >>> re.compile("([0-9])([a-z]?)").subn(r"<\1\2>", "1 1a")
  ('<1> <1a>', 2)

The example code you provided doesn't run here, because
'subn()' is being provided bad data (check
http://docs.python.org/lib/node46.html for docs). It's also
being passed '\1\2', which is really '\x01\x02', and won't
do what you want.
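
Editorial sketch of the two points above, not part of the original message. It checks that a non-raw '\1\2' is two control characters, and compares the quantifier inside the group with the quantifier outside it. The helper name `fill` is hypothetical. A function replacement is used for the second pattern so that an unmatched group 2 is handled explicitly; later Python releases (3.5 and newer, per the `re` documentation) substitute an empty string for unmatched groups, so the error shown above is specific to older versions.

```python
import re

# Without the r prefix, '\1\2' is two octal escapes (control characters),
# not two group references.
assert '\1\2' == '\x01\x02'
assert r'\1\2' == '\\1\\2'

text = "1 1a"

# Quantifier inside the group: group 2 always takes part in the match,
# possibly as an empty string, so the template is always safe.
inside = re.compile(r"([0-9])([a-z]?)").subn(r"<\1\2>", text)

# Quantifier outside the group: group 2 may be None. A function
# replacement (hypothetical helper) treats None as an empty string.
def fill(match):
    return "<%s%s>" % (match.group(1), match.group(2) or "")

outside = re.compile(r"([0-9])([a-z])?").subn(fill, text)

print(inside)
print(outside)
```

Both calls should give the same result here; the function form only matters when the optional group does not take part in a match.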
msg30469 - (view) Author: Thomas K. (tomek74) Date: 2006-11-07 10:36

I verified your code. It works for me, too.
Sorry.
msg30470 - (view) Author: Thomas K. (tomek74) Date: 2006-11-08 16:56

I have tried it again with my original regexp and the
search string. In this case I have to put the “?” after the “)”.


-> RegEx:
([1-9][a-z][a-z][0-9])([ \-\r\n\t]*([0-9])(([0-9])(([0-9])([ \-\r\n\t]*([0-9])(([a-z])(([a-z])(([0-9])(([0-9])([ \-\r\n\t]*([0-9])(([a-z])(([a-z])(([0-9]))?)?)?)?)?)?)?)?)?)?)?)?
IGNORECASE is switched on.

-> ReplaceString:
\1\3\5\7\9\11\13\15\17\19\21\23\25 

-> Searchstring 1):
6ES5894-0MA63-0UG5

Result:
6ES58940MA630UG5


-> Searchstring 2):
6ES5894-0MA03; 6ES5864-0MA03; 6ES5894-0MA63-0UG5; 6ES58860MA03

Result:
NO Result!

-> The problem is that I get no results with searchstring 2.

Thomas 
msg30471 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2006-11-18 19:22
I do get a match with your regex and search string 2.
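
Editorial sketch, not part of the original thread. It rebuilds the regex from the previous message piece by piece and joins the odd-numbered groups with a function replacement, so groups that did not take part in a match become empty strings. It assumes each wrapped line of the posted regex lost a single space inside the character class, and the names `SEP`, `PATTERN` and `join_odd_groups` are hypothetical. The "NO Result" reported for search string 2 would be consistent with the "unmatched group" error shown earlier in the thread, since the shorter part numbers leave groups 19 to 25 unmatched, but the thread does not confirm that.

```python
import re

# Optional separators between the blocks of a part number.
SEP = r"[ \-\r\n\t]*"

# 25 groups in total; every optional tail is wrapped in "( ... )?".
PATTERN = (
    r"([1-9][a-z][a-z][0-9])"
    + r"(" + SEP + r"([0-9])(([0-9])(([0-9])"
    + r"(" + SEP + r"([0-9])(([a-z])(([a-z])(([0-9])(([0-9])"
    + r"(" + SEP + r"([0-9])(([a-z])(([a-z])(([0-9])"
    + r")?" * 12
)
regex = re.compile(PATTERN, re.IGNORECASE)

def join_odd_groups(match):
    # Equivalent to the template \1\3\5...\25, with None treated as "".
    return "".join(match.group(i) or "" for i in range(1, 26, 2))

s1 = "6ES5894-0MA63-0UG5"
s2 = "6ES5894-0MA03; 6ES5864-0MA03; 6ES5894-0MA63-0UG5; 6ES58860MA03"

out1 = regex.sub(join_odd_groups, s1)
out2 = regex.sub(join_odd_groups, s2)
print(out1)
print(out2)
```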
History
Date                 User     Action  Args
2022-04-11 14:56:21  admin    set     github: 44202
2006-11-06 11:49:55  tomek74  create